ENH: predict tools, helper function using pandas for exog grid #8439

josef-pkt · 2022-10-13T16:00:34Z

helper function for lsmeans, emmeans, predict exog
related: #5387 ...

Currently we don't have any functions to create a grid of exog values for predict that affect more than a single column.
Relevant for predict and marginal/derivative effects.

lsmeans, emmeans and other margin packages in R (and SAS, ...) have functions that create grids of exog values.

For us, the models do not have enough information about the original data; trying to build the grid inside the model is too late.

So what we need are helper functions that let the user process the original DataFrame, assuming it contains no irrelevant columns.
These would use only pandas DataFrames and the corresponding methods for categorical variables and quantiles.
The formula transform then converts the grid to a design matrix for predict and other methods, consistent with the model specification.
(Do we have _get_predict_exog in general? AFAIR, I only added it in a few models.)

bonus:
for purely categorical exog, we might also want to have freq or prob weights for cell frequencies or probabilities in the original sample.
(new get_prediction in discrete allows for aggregation weights.)
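
A minimal sketch of the cell-frequency idea for purely categorical exog, using only pandas. The column names and data are made up for illustration, and how the weights would be passed on to get_prediction is not shown here.

```python
import pandas as pd

# Hypothetical sample with two purely categorical explanatory variables.
df = pd.DataFrame({
    "sex": ["f", "f", "m", "m", "m", "f"],
    "region": ["north", "south", "north", "north", "south", "north"],
})

# One row per observed cell, with its frequency in the original sample.
cells = (
    df.groupby(["sex", "region"], observed=True)
    .size()
    .reset_index(name="freq")
)

# Probability weights are the frequencies normalized to sum to one.
cells["prob"] = cells["freq"] / cells["freq"].sum()
print(cells)
```

Cells that never occur in the sample get no row in this version, so the grid is the set of observed combinations rather than the full product of levels.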


josef-pkt · 2022-10-13T17:51:45Z

a not very quick try:
I didn't find any useful pandas methods, or I don't know pandas well enough to figure it out.

The following works in my example (speed is the endog).
get_col_values is a generator that works with Python's itertools.product and can be extended to handle other dtypes like categorical.
Not sure how to handle count data in exog, e.g. the user provides a list of count varnames, then we either use all values or round quantiles to int.


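from itertools import product

import numpy as np
import pandas as pd

# data2 is the original DataFrame; `included` collects the names of the
# columns that end up in the grid (reset it before calling again)
included = []
def get_col_values(data2, exclude=("const", "speed")):
    for col in data2:
        ser = data2[col]
        if ser.name in exclude:
            continue
        if ser.dtype == np.float64:
            # continuous column: a few representative quantiles
            values = ser.quantile(q=[0.1, 0.5, 0.9]).to_list()
        elif ser.dtype == object:
            # string column: all observed levels (unique returns ndarray)
            values = ser.unique().tolist()
        else:
            # other dtypes (int, categorical, ...) are not handled yet;
            # skip them instead of reusing stale `values`
            continue

        included.append(ser.name)
        yield values

# based on https://stackoverflow.com/a/37755303/333700
# preserves dtypes of values (but not meta info, e.g. categorical)
result = pd.DataFrame(list(product(*(get_col_values(data2)))), columns=included)
result.head()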
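
One way the open points above (categorical dtype, count columns) might be handled, as a rough sketch. The function name exog_grid, its arguments and the example data are hypothetical and not part of statsmodels. Count columns are named by the user and get quantiles rounded to int, and categorical dtypes are restored after the product.

```python
from itertools import product

import pandas as pd


def exog_grid(data, exclude=(), count_cols=(), q=(0.1, 0.5, 0.9)):
    """Cartesian grid of representative values per column (hypothetical helper)."""
    names, levels, cat_dtypes = [], [], {}
    for name in data.columns:
        if name in exclude:
            continue
        ser = data[name]
        if isinstance(ser.dtype, pd.CategoricalDtype):
            # All declared categories, including ones not observed in the sample.
            values = list(ser.cat.categories)
            cat_dtypes[name] = ser.dtype
        elif name in count_cols:
            # Round quantiles to integers and drop duplicates, keeping order.
            rounded = ser.quantile(list(q)).round().astype(int).tolist()
            values = list(dict.fromkeys(rounded))
        elif pd.api.types.is_numeric_dtype(ser):
            values = ser.quantile(list(q)).tolist()
        else:
            values = ser.unique().tolist()
        names.append(name)
        levels.append(values)
    grid = pd.DataFrame(list(product(*levels)), columns=names)
    # Restore the categorical dtypes that are lost when going through tuples.
    return grid.astype(cat_dtypes) if cat_dtypes else grid


# Made-up example data; speed plays the role of the endog and is excluded.
data = pd.DataFrame({
    "speed": [4.0, 7.0, 8.0, 9.0],
    "dist": [2.0, 4.0, 16.0, 10.0],
    "visits": [0, 1, 1, 5],
    "group": pd.Categorical(["a", "b", "a", "b"]),
})
grid = exog_grid(data, exclude=["speed"], count_cols=["visits"])
print(grid.head())
```

Collecting names and levels in local lists avoids the module-level `included` list, so the helper can be called repeatedly without resetting state.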

josef-pkt · 2023-03-19T17:53:21Z

another question for predicted means at some exog values
Stata's margins command defaults to predicted marginal means, not to a marginal effect where "marginal" means a derivative or difference.

https://stackoverflow.com/questions/75772170/produce-predictive-margins-in-statsmodels-output-for-logistic-regression
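
To illustrate the distinction, here is a rough sketch of a predicted marginal mean: fix one variable at a value for every observation, predict, and average over the sample. The predict function below is a stand-in with made-up logistic coefficients, not a fitted statsmodels model, and predictive_margins is a hypothetical name.

```python
import numpy as np
import pandas as pd


def predictive_margins(predict, data, name, values):
    """Average prediction over the sample with `name` fixed at each value."""
    margins = {}
    for v in values:
        # Same sample, but every observation gets the value v for `name`.
        counterfactual = data.assign(**{name: v})
        margins[v] = float(np.mean(predict(counterfactual)))
    return pd.Series(margins, name="margin")


def predict(df):
    # Hypothetical logistic model with made-up coefficients.
    linpred = -1.0 + 0.5 * df["x"] + 1.0 * df["treated"]
    return 1.0 / (1.0 + np.exp(-linpred))


data = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0], "treated": [0, 1, 0, 1]})
margins = predictive_margins(predict, data, "treated", [0, 1])
print(margins)
```

The difference between the two entries would be an average discrete effect; the entries themselves are the predicted means at each level.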

josef-pkt · 2023-03-19T18:45:23Z

something that would give us an original array

predict_at, where the user needs to provide a DataFrame that includes all original variables used in exog and only those.
Then we automatically construct a grid from whatever options are specified for at.
Then we call get_prediction with the constructed "exog", which will still be formula transformed in get_prediction.

The same would be possible if we have the original DataFrame attached to the model and we can identify columns that were used in the formula.

Possible ambiguity: we need to know which variables are categorical if they have numeric levels, i.e. C(cat) in the formula. Those should use unique values instead of the mean.
Fractional exog for categorical variables (like mean gender in the population) would not be possible.
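
A rough sketch of the grid construction step for such a predict_at, under the assumptions above. make_at_grid, the at and categorical arguments and the example data are all hypothetical. Variables listed in at use the given values, categorical variables use their unique values, and everything else is held at its mean.

```python
from itertools import product

import pandas as pd


def make_at_grid(data, at=None, categorical=()):
    """Grid of exog values from the original DataFrame (hypothetical helper)."""
    at = dict(at or {})
    levels = {}
    for name in data.columns:
        ser = data[name]
        if name in at:
            # User-specified values take precedence.
            values = list(at[name])
        elif name in categorical or not pd.api.types.is_numeric_dtype(ser):
            # Categorical, including numeric-coded levels named by the user.
            values = ser.unique().tolist()
        else:
            # Remaining numeric variables are held at their sample mean.
            values = [ser.mean()]
        levels[name] = values
    return pd.DataFrame(list(product(*levels.values())), columns=list(levels))


# Made-up data: dose has numeric levels but is categorical in the formula.
data = pd.DataFrame({
    "age": [20.0, 30.0, 40.0],
    "dose": [1, 2, 1],
    "sex": ["f", "m", "f"],
})
grid = make_at_grid(data, at={"age": [25, 35]}, categorical=["dose"])
print(grid)
```

The resulting DataFrame would then be the "exog" passed to get_prediction as described above. The explicit categorical argument is one way to resolve the ambiguity for numeric-coded levels without parsing the formula.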

josef-pkt added type-enh comp-base pandas-integration comp-tools topic-predict labels Oct 13, 2022

josef-pkt mentioned this issue Mar 19, 2023

SUMM/REF: create exog for predict at for margins and predictions #7071

Open
