BUG: Correct issue if patsy handles missing. Closes #2083. #2084

Merged
merged 11 commits into from Nov 7, 2014


2 participants

@jseabold
Member
jseabold commented Nov 7, 2014

No description provided.

@jseabold jseabold added this to the 0.6.1 milestone Nov 7, 2014
@josef-pkt
Member

Does not yet close #2083.
I just saw that offset and exposure in Poisson are not handled, at least not in the example. I haven't looked at the code path yet.

@josef-pkt josef-pkt and 1 other commented on an outdated diff Nov 7, 2014
statsmodels/discrete/tests/test_discrete.py
+ d = {'Foo': [1, 2, 10, 149], 'Bar': [1, 2, 3, np.nan],
+ 'constant': [1] * 4, 'exposure' : np.random.uniform(size=4),
+ 'x': [1, 3, 2, 1.5]}
+ df = pd.DataFrame(d)
+
+ # should work
+ mod1 = smf.poisson('Foo ~ Bar', data=df, exposure=df['exposure'])
+
+ # should work, lines up on index, exposure should be array after
+ exposure = pd.Series(np.random.uniform(size=5))
+ mod2 = smf.poisson('Foo ~ Bar', data=df, exposure=exposure)
+ assert_(type(mod2.exposure) is np.ndarray, msg='Exposure is not ndarray')
+
+ # make sure this raises
+ assert_raises(ValueError, sm.Poisson, df.Foo, df[['constant', 'Bar']],
+ exposure=exposure)
@josef-pkt
josef-pkt Nov 7, 2014 Member

Why does this raise?
I thought this would just break somewhere with the NaNs, if missing is the default (not 'drop').

@jseabold
jseabold Nov 7, 2014 Member

The default non-formula missing handling does nothing. It raises because we end up with an exposure longer than endog. I made it length 5 above to make sure the _check_inputs call still worked.
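The length mismatch described above can be sketched roughly as follows. This is a hypothetical helper illustrating the idea, not statsmodels' actual _check_inputs: after patsy drops the NaN row from endog, an extra array that was never trimmed no longer lines up, so a shape check has to raise.

```python
import numpy as np

def check_extra_length(endog, extra, name="exposure"):
    # Hypothetical helper: endog may have been shortened by formula
    # handling, while the extra array still has its original length.
    endog = np.asarray(endog)
    extra = np.asarray(extra)
    if extra.shape[0] != endog.shape[0]:
        raise ValueError("%s has length %d, expected %d"
                         % (name, extra.shape[0], endog.shape[0]))
    return extra

endog = np.array([1.0, 2.0, 10.0])    # 3 rows left after dropping NaNs
exposure = np.random.uniform(size=5)  # still the original 5 rows
try:
    check_extra_length(endog, exposure)
except ValueError as exc:
    print("raised:", exc)
```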

@josef-pkt
josef-pkt Nov 7, 2014 Member

OK, I didn't see the definition. (I mixed it up with df['exposure'].)

@josef-pkt josef-pkt commented on the diff Nov 7, 2014
statsmodels/base/data.py
combined_2d_names += [key]
else:
raise ValueError("Arrays with more than 2 dimensions "
"aren't yet handled")
if missing_idx is not None:
- nan_mask = missing_idx | _nan_rows(*combined)
@josef-pkt
josef-pkt Nov 7, 2014 Member

I don't understand why it's not necessary.

A general question on this: is there currently any handling of extra NaNs in the extra arrays?
Simple case: suppose endog and exog have no NaNs, but we have NaNs in exposure or offset, or in WLS weights.

@jseabold
jseabold Nov 7, 2014 Member

Fixed in a fixup commit. Patsy's behavior makes this more difficult than it needs to be. Yes, adding _nan_rows(*combined) back in (when combined is not empty) handles the missing values in the extra arrays.
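The combined NaN-row mask works roughly like this. This is a minimal sketch of the idea behind _nan_rows, assuming all arrays are row-aligned; it is not the actual statsmodels implementation:

```python
import numpy as np

def nan_rows(*arrs):
    # A row is flagged if ANY of the combined arrays (endog, exog,
    # exposure, offset, weights, ...) has a NaN in that row.
    mask = np.zeros(np.asarray(arrs[0]).shape[0], dtype=bool)
    for arr in arrs:
        arr = np.asarray(arr, dtype=float)
        if arr.ndim == 1:
            mask |= np.isnan(arr)
        else:
            mask |= np.isnan(arr).any(axis=1)
    return mask

endog = np.array([1.0, 2.0, 3.0, 4.0])
exog = np.array([[1, 1], [1, 2], [1, np.nan], [1, 4]])
exposure = np.array([0.5, np.nan, 0.7, 0.9])
print(nan_rows(endog, exog, exposure))  # rows 1 and 2 are flagged
```

The point of OR-ing the masks is exactly the case raised above: endog and exog may be clean while an extra array like exposure carries the NaN.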

@josef-pkt
josef-pkt Nov 7, 2014 Member

needs a unit test.

Are endog and exog here still the full length? I thought they would be the ones shortened by patsy. In that case, NaN rows caused by NaNs in the extra arrays would require dropping additional rows from endog and exog.

(It's a pain that we cannot turn off patsy's missing handling. I don't think it's currently very common to have extra NaNs in the extra arrays, but we have two multi-equation PRs, betareg and heckman, and more will be coming, for which we will have to check NaNs across several separate design matrices.)
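For the multi-equation case, joint handling would amount to something like the following: build one shared mask across all design matrices, then trim every array with it so the matrices stay row-aligned. A sketch under the assumption that all arrays already share row alignment (drop_nan_rows is an illustrative name, not an existing statsmodels function):

```python
import numpy as np

def drop_nan_rows(*arrs):
    # One mask across all arrays; trim each array with the same mask
    # so every design matrix stays row-aligned after dropping.
    arrs = [np.asarray(a, dtype=float) for a in arrs]
    mask = np.zeros(arrs[0].shape[0], dtype=bool)
    for a in arrs:
        mask |= np.isnan(a).any(axis=-1) if a.ndim > 1 else np.isnan(a)
    keep = ~mask
    return [a[keep] for a in arrs]

y = np.array([1.0, 2.0, np.nan, 4.0])
X1 = np.array([[1.0, 0.5], [1.0, np.nan], [1.0, 2.0], [1.0, 3.0]])
X2 = np.array([[1.0], [2.0], [3.0], [4.0]])
y2, X1b, X2b = drop_nan_rows(y, X1, X2)  # each keeps the same 2 rows
```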

@josef-pkt
josef-pkt Nov 7, 2014 Member

What I mean is that, if this is starting to get too messy, then we can postpone extra NaNs in extra arrays as a new feature for 0.7.

@josef-pkt
Member

I'm working on a unit test for from_formula for GEE; it doesn't fail with the old code.
GEE is similar to WLS in that it always requires an extra array, and it wasn't affected by #2083. MixedLM is the same; it always requires groups. Neither of them does the _check_inputs, AFAICS.

@josef-pkt
Member

Two test errors on TravisCI: a shape mismatch and a missing asarray.

@jseabold
Member
jseabold commented Nov 7, 2014

I'm going to merge this. It's becoming a mess, but the technical debt will have to wait. Re-open #2083 if you find more issues.

@jseabold jseabold merged commit c8e980d into statsmodels:master Nov 7, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
@jseabold jseabold deleted the jseabold:fix-2083 branch Nov 7, 2014