MRA modeling fails when one of the datasets has a constant field #150

@connorschwartz

When doing exploratory testing with small subsets of data, I sometimes hit cases where one of the columns in the test data was constant, i.e. every row in that subset had the same value.

There is code in the MRA modeling method which adds a constant column to each dataframe, to represent the intercept value in the model:

if intercept:

The add_constant function does not add a new column if the dataframe already contains a constant column (its default is has_constant='skip'). Because of this, if a column happens to be constant in the X_test dataframe but not in the other dataframes, X_test ends up with (n) columns while the other dataframes have (n + 1). The mismatched column counts cause failures in the prediction phase, after the model has been fitted.

I think using the has_constant='add' argument when running add_constant would prevent this issue - see https://www.statsmodels.org/stable/generated/statsmodels.tools.tools.add_constant.html
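A minimal sketch of the mismatch and the proposed fix, calling statsmodels' add_constant directly. The arrays here are made-up stand-ins for the real train and test data (NumPy arrays rather than dataframes, to keep it short), and this is not the project's actual modeling code:

```python
import numpy as np
from statsmodels.tools.tools import add_constant

# Hypothetical data: two features per row.
X_train = np.array([[1.0, 2.0], [2.0, 3.5], [3.0, 1.0], [4.0, 6.0]])
# In this small test subset the second feature is constant (always 5.0).
X_test = np.array([[1.5, 5.0], [2.5, 5.0], [3.5, 5.0]])

# Default behaviour (has_constant='skip'): X_test already has a constant
# column, so no intercept column is added and the widths diverge.
train_default = add_constant(X_train)
test_default = add_constant(X_test)
print(train_default.shape[1], test_default.shape[1])

# With has_constant='add' a column of ones is always added,
# so both arrays get the same number of columns.
train_fixed = add_constant(X_train, has_constant="add")
test_fixed = add_constant(X_test, has_constant="add")
print(train_fixed.shape[1], test_fixed.shape[1])
```

With the default, the train array should come out one column wider than the test array. With has_constant='add', both gain the intercept column, so the fitted model and the prediction input agree on the column count.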
