Description
When doing exploratory testing with small subsets of data, I sometimes had situations where one of the columns in the test data was constant, i.e. every row had the same value for that subset of the data.
There is code in the MRA modeling method which adds a constant column to each dataframe, to represent the intercept value in the model:
openavmkit/openavmkit/modeling.py, line 1657 in 295c080:
    if intercept:
With its default setting (has_constant='skip'), the statsmodels add_constant function does not add a new column if the dataframe already contains a constant column. Because of this, if one of the columns happens to be constant in the X_test dataframe but not in the other dataframes, X_test ends up with (n) columns while the other dataframes have (n + 1). The mismatched column count then causes the prediction phase to fail after the model has been fitted.
I think using the has_constant='add' argument when running add_constant would prevent this issue - see https://www.statsmodels.org/stable/generated/statsmodels.tools.tools.add_constant.html