
BinomialBayesMixedGLM predict function returns non-1D array #6158

Open
mnky9800n opened this issue Sep 10, 2019 · 7 comments

@mnky9800n

Describe the bug

I tried to use BinomialBayesMixedGLM to predict some values so that I can assess the predictive capability of the model. However, when putting in sample data, the output is not a 1-D array of predictions but an array with the same shape as the input data. This seems to be a bug, since the expected output is a 1-dimensional array of predicted values.

Code Sample, a copy-pastable example if possible

In[1] : from statsmodels.genmod import bayes_mixed_glm as b
        import pandas as pd
        data = pd.DataFrame(my_private_data, columns=['graduates_next_semester', 'first_course_year', 'cumulative_avg_grade', 'hs_gpa', 'female'])
        formula = 'graduates_next_semester ~ C(first_course_year) + cumulative_avg_grade + hs_gpa + C(female)'
        random = {'a':'0 + C(first_course_year)', 'b':'0 + C(first_course_year)*hs_gpa'}
        model = b.BinomialBayesMixedGLM.from_formula(formula, random, data)
        results = model.fit_vb()

In[2] : data.values
Out: array([[ 0.        ,  0.        , -0.16846258, -3.26481235,  0.        ],
       [ 0.        ,  0.        ,  0.25580621,  0.60181999,  1.        ],
       [ 1.        ,  4.        ,  1.64888846,  1.66018495,  0.        ],
       ...,
       [ 0.        , 20.        ,  0.88437497,  0.15070583,  1.        ],
       [ 0.        , 20.        , -1.72232004, -1.76500254,  0.        ],
       [ 1.        , 20.        ,  0.82043374,  0.27632605,  1.        ]])
In[3] : model.predict(data)
Out : array([[0.5       , 0.5       , 0.60994004, 0.00872048, 0.45798368],
       [0.5       , 0.5       , 0.7986418 , 0.01236763, 0.77830333],
       [0.5       , 0.5       , 0.14909222, 0.00507625, 0.93394245],
       ...,
       [0.73105858, 0.5       , 0.72314575, 0.00128766, 0.86811284],
       [0.73105858, 0.5       , 0.34848906, 0.01597394, 0.15157257],
       [0.73105858, 0.5       , 0.71658333, 0.00144936, 0.86061816]])
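For context on the expected shape: a binomial model's predictions should be one fitted probability per observation, i.e. the inverse logit of the linear predictor. A minimal numpy sketch with synthetic data and hypothetical coefficients (none of these values come from the thread):

```python
import numpy as np
from scipy.special import expit  # inverse logit link

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))              # design matrix: 6 rows, 4 predictors
beta = np.array([0.5, -1.0, 0.3, 0.8])   # hypothetical coefficients

# One fitted probability per row: expit of the linear predictor.
p = expit(X @ beta)
print(p.shape)   # (6,) -- 1-D, one prediction per observation
```

An output shaped like the input data, as above, suggests the inverse link was applied to the wrong array rather than to the linear predictor.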
@bashtage
Member

What is the dtype of graduates_next_semester? This is probably happening because Patsy is encoding graduates_next_semester as a categorical variable.

@mnky9800n
Author

It's int64. It is categorical, though: just 1 or 0 depending on graduation.

@bashtage
Member

Turn it into a plain int64.

@mnky9800n
Author

mnky9800n commented Sep 10, 2019

As in, don't have the formula be C(graduates_next_semester) ~ ...?

The current formula already lacks the C() designation.

As a note:

In : model.endog.dtype
Out : dtype('float64')

@bashtage
Member

You are getting float because Patsy is encoding your categorical as two columns of floating-point 0.0 or 1.0 values.
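The encoding described here is easy to inspect directly. A small sketch using patsy (the formula engine statsmodels uses); the data are made up for illustration:

```python
import pandas as pd
import patsy

df = pd.DataFrame({"y": [0, 1, 0, 1], "x": [1.0, 2.0, 3.0, 4.0]})

# Wrapping the outcome in C() makes patsy dummy-encode it as
# floating-point indicator columns, one per level:
y_cat, X1 = patsy.dmatrices("C(y) ~ x", df)
print(y_cat.shape)   # (4, 2)

# A plain numeric outcome stays a single column:
y_num, X2 = patsy.dmatrices("y ~ x", df)
print(y_num.shape)   # (4, 1)
```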

@mnky9800n
Author

Sorry, I'm unclear on what you're suggesting as the fix. Converting the graduates_next_semester column to int64 doesn't seem to make a difference:

In[1] : data = df[df.semester_idx==10][[ 'first_course_year', 'cumulative_avg_grade', 'hs_gpa', 'female', 
        'graduates_next_semester',]].copy()
        data['graduates_next_semester'] = data.graduates_next_semester.astype(np.int64)

        formula = 'graduates_next_semester ~ C(first_course_year) + cumulative_avg_grade + hs_gpa + C(female)'

        # formula = 'C(graduates_next_semester) ~ C(first_course_year) + cumulative_avg_grade + hs_gpa + C(female)'

        random = {'a':'1 + C(first_course_year)', 'b':'1 + C(first_course_year)*hs_gpa'}
        model = b.BinomialBayesMixedGLM.from_formula(formula, random, data)
        results = model.fit_vb()
In[2] : model.predict(data[data.columns[:-1]])
Out   : array([[5.00000000e-01, 4.97043420e-01, 3.27626316e-01, 3.12715703e-02],
       [5.00000000e-01, 2.09075156e-01, 2.01295331e-02, 9.68658863e-01],
       [5.00000000e-01, 1.62070571e-01, 7.78242678e-04, 7.46860763e-01],
       ...,
       [5.00000000e-01, 3.45092835e-01, 2.62984333e-02, 8.90587849e-01],
       [5.00000000e-01, 3.11531895e-01, 5.49629386e-03, 7.12442971e-01],
       [5.00000000e-01, 6.60760922e-01, 1.14419101e-01, 9.37076501e-01]])
In[3] : model.endog
Out   : array([0., 0., 0., ..., 1., 1., 0.])
In[4] : model.endog.dtype
Out[4]: dtype('float64')

@sean00002

I just had the same issue. Instead of using model.predict, you probably want to try results.predict?
