helper function for lsmeans, emmeans, predict exog
related: #5387 ...
Currently we don't have any functions to create a grid of exog values for predict that affect more than a single column.
Relevant for predict and marginal/derivative effects.
lsmeans, emmeans and other margin packages in R (and SAS, ...) have functions that create grids of exog values.
For us, the models do not have enough information about the original data, so trying to build the grid inside the model is too late.
So, what we need are helper functions that let the user process the original dataframe, assuming it contains no irrelevant columns.
This would all use pandas dataframes and the corresponding methods for categorical variables and quantiles.
Using formulas and formula transform will convert this to a design matrix for predict and other methods consistent with the model specification.
(Do we have _get_predict_exog in general? AFAIR, I only added it in a few models.)
bonus:
for purely categorical exog, we might also want to have freq or prob weights for cell frequencies or probabilities in the original sample.
(new get_prediction in discrete allows for aggregation weights.)
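A rough sketch of how the cell frequencies and probabilities could be computed from the original dataframe. `cell_weights` and the example column names are hypothetical, not existing statsmodels functions, and this only uses plain pandas:

```python
import pandas as pd


def cell_weights(df, cat_cols):
    """Return one row per observed cell with its count and sample proportion.

    Hypothetical helper, not part of statsmodels.
    """
    counts = (
        df.groupby(cat_cols, observed=True)
        .size()
        .rename("freq")
        .reset_index()
    )
    counts["prob"] = counts["freq"] / counts["freq"].sum()
    return counts


# made-up example data with two categorical columns
df = pd.DataFrame({
    "sex": ["f", "f", "m", "m", "m", "f"],
    "group": ["a", "b", "a", "a", "b", "b"],
})
cells = cell_weights(df, ["sex", "group"])
```

Only cells that occur in the sample show up here. A full product grid can contain unobserved cells, which would need weight zero after merging.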
a not very quick try:
I didn't find any useful pandas methods, or I don't know pandas well enough to figure it out.
The following works in my example (speed is the endog).
get_col_values is a generator that works with itertools.product and can be extended to handle other dtypes like categorical.
Not sure how to handle count data in exog. e.g. user provides list of count varnames, then either use all or round quantiles to int.
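One possible way to handle the two options for count columns, as a sketch only. `count_col_values` and its `use_all` flag are hypothetical names, and the example series is made up:

```python
import pandas as pd


def count_col_values(ser, q=(0.1, 0.5, 0.9), use_all=False):
    """Grid values for a count column (hypothetical helper).

    Either all observed counts, or quantiles rounded to integers.
    """
    if use_all:
        return sorted(ser.unique().tolist())
    rounded = ser.quantile(q=list(q)).round().astype(int)
    # rounding can map several quantiles to the same integer
    return rounded.drop_duplicates().tolist()


ser = pd.Series([0, 0, 1, 1, 2, 3, 5, 8], name="visits")
values_q = count_col_values(ser)
values_all = count_col_values(ser, use_all=True)
```

A rounded quantile is not necessarily a count that was observed in the sample, so `use_all=True` might be the safer choice when the number of distinct counts is small.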
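    from itertools import product

    import numpy as np
    import pandas as pd

    included = []  # names of the columns that end up in the grid

    def get_col_values(data2, exclude=("const", "speed")):
        # yield the list of grid values for each column that is not excluded
        for col in data2:
            ser = data2[col]
            if ser.name in exclude:
                continue
            if ser.dtype == np.float64:
                # float column: a few representative quantiles
                values = ser.quantile(q=[0.1, 0.5, 0.9]).to_list()
            elif ser.dtype == object:
                # string column: all observed levels
                values = ser.unique().tolist()  # unique() returns ndarray
            else:
                # other dtypes (int, categorical, ...) are not handled yet
                continue
            included.append(ser.name)
            yield values

    # based on https://stackoverflow.com/a/37755303/333700
    # preserves dtypes of values (but not meta info, e.g. categorical)
    result = pd.DataFrame(list(product(*get_col_values(data2))), columns=included)
    result.head()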
Another question: predicted means at some exog values.
Stata's margins command defaults to predicted marginal means, not to a marginal effect with "marginal" meaning a derivative or difference.
We could add a predict_at where the user needs to provide a DataFrame that includes all original variables used in exog and only those.
Then we automatically construct a grid with whatever options for at are specified.
Then we call get_prediction with the constructed "exog", which will still be formula transformed in get_prediction.
The same would be possible if we have the original DataFrame attached to the model and we can identify the columns that were used in the formula.
Possible ambiguity: we need to know which variables are categorical if they have numeric levels, e.g. C(cat) in the formula. Those should use unique instead of mean.
Fractional exog for categorical variables (like mean gender in the population) would not be possible.
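A minimal sketch of the grid construction step for such a predict_at, assuming the user passes the `at` values as a dict and lists numerically coded categorical columns explicitly. `make_at_grid`, its arguments and the example data are all hypothetical. The call to get_prediction is left out.

```python
from itertools import product

import pandas as pd


def make_at_grid(df, at=None, categorical=()):
    """Build a grid of exog values from the original dataframe (hypothetical).

    at : dict mapping column name to the values to use for that column
    categorical : names of numeric columns that should be treated as categorical
    """
    at = {} if at is None else at
    names, levels = [], []
    for name in df.columns:
        ser = df[name]
        if name in at:
            values = list(at[name])          # user-specified values
        elif name in categorical or not pd.api.types.is_numeric_dtype(ser):
            values = ser.unique().tolist()   # all observed levels
        else:
            values = [ser.mean()]            # numeric default: sample mean
        names.append(name)
        levels.append(values)
    return pd.DataFrame(list(product(*levels)), columns=names)


# made-up example: "region" has numeric codes but is categorical
df = pd.DataFrame({
    "dose": [1.0, 2.0, 3.0, 4.0],
    "region": [1, 2, 1, 2],
    "sex": ["f", "m", "f", "m"],
})
grid = make_at_grid(df, at={"dose": [1.0, 4.0]}, categorical=["region"])
grid_mean_region = make_at_grid(df, at={"dose": [1.0, 4.0]})
```

Without the `categorical` argument, "region" is averaged to 1.5, which is exactly the ambiguity described above. The resulting grid would then be passed on as the "exog" DataFrame.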