DataFrame with bool type not cast correctly. #880

Closed
jseabold opened this Issue Jun 20, 2013 · 7 comments

@jseabold
Member

From the mailing list. To reproduce:

sm.Logit(np.random.randint(2, size=50), pandas.DataFrame(sm.add_constant(np.random.randint(2, size=50).astype(bool)))).fit()
@jseabold
Member

X has dtype of object.

I wonder if this worked with older versions of pandas.

@jburroni

this is not a step to reproduce. Sorry

df = pd.DataFrame(data=np.random.random(100), columns=['rand'])
# assign with brackets so 'boolean' becomes a column, not an attribute
df['boolean'] = df.rand > 0.5
sm.Logit(df.rand, sm.add_constant(df['boolean'], prepend=True)).fit()
@josef-pkt
Member

I'm using pandas 0.7.3 and get the same casting

>>> np.asarray(zdf)
array([[0.0, True],
       [2.0, False]], dtype=object)
>>> np.asarray(zdf, float)
array([[ 0.,  1.],
       [ 2.,  0.]])

casting to at least float
>>> np.asarray(zdf, np.find_common_type(tuple(zdf.dtypes) + (float,), []))
array([[ 0.,  1.],
       [ 2.,  0.]])
>>> np.find_common_type(tuple(zdf.dtypes) + (float,), [])
dtype('float64')

For numpy >= 1.6, np.result_type(arr, np.float64) can be used to find the common dtype.
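
A minimal sketch of the np.result_type approach mentioned above, in case it helps. The two arrays are made-up stand-ins for the columns of zdf, and including np.float64 in the call is what sets float64 as the floor:

```python
import numpy as np

# Hypothetical stand-ins for the float and bool columns of zdf
float_col = np.array([0.0, 2.0])
bool_col = np.array([True, False])

# Apply NumPy's promotion rules; np.float64 acts as the minimum dtype
common = np.result_type(float_col.dtype, bool_col.dtype, np.float64)

# Cast each column to the common dtype before stacking into a 2-D array
stacked = np.column_stack([float_col.astype(common), bool_col.astype(common)])
print(common, stacked.dtype)
```

Casting each column before stacking avoids ever building the intermediate object array.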

@josef-pkt
Member

one possibility with pandas dataframes:

>>> common_dtype = np.find_common_type(list(xdf.dtypes), [])
>>> if common_dtype == np.dtype('object'):
...     raise ValueError('dtype cannot be converted to numeric')
... else:
...     x = np.asarray(xdf, dtype=common_dtype)

This doesn't enforce a minimum numeric dtype. Maybe we want at least float32, in case someone wants to calculate in single precision.
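
One way the idea above could look as a standalone function, with a float32 floor. This is only a sketch: to_numeric_array and its min_dtype parameter are hypothetical names, not statsmodels API, and it assumes the columns use plain NumPy dtypes. It uses np.result_type rather than np.find_common_type.

```python
import numpy as np
import pandas as pd


def to_numeric_array(frame, min_dtype=np.float32):
    """Hypothetical helper: cast a DataFrame to one numeric dtype or raise."""
    # Refuse columns (e.g. object) that are neither bool nor numeric
    bad = [name for name, dtype in frame.dtypes.items()
           if not pd.api.types.is_numeric_dtype(dtype)]
    if bad:
        raise ValueError("columns cannot be converted to numeric: %r" % bad)
    # Promote all column dtypes together with the requested minimum dtype
    common = np.result_type(*frame.dtypes, min_dtype)
    return frame.to_numpy(dtype=common)


xdf = pd.DataFrame({"x": [0.0, 2.0], "flag": [True, False]})
x = to_numeric_array(xdf)
print(x.dtype)
```

Here an object column raises instead of being cast silently. Whether to cast or raise is still the open question in this thread.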

@jseabold jseabold added this to the 0.6 milestone May 28, 2014
@jseabold
Member

Bumping this. Contacted off-list about the same issue. I guess we need to cast object dtypes or maybe we should just raise?

X = np.random.random((40,2))
df = pd.DataFrame(X)
df[2] = np.random.randint(2, size=40).astype('object')
df['constant'] = 1

y = pd.Series(np.random.randint(2, size=40))

sm.Logit(y, df).fit()
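
Until this is handled inside the models, a possible user-side workaround is to cast the object columns before building the design matrix. A rough sketch on the same kind of frame as above (it assumes the object column really holds numbers; the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((40, 2)))
df[2] = rng.integers(2, size=40).astype("object")
df["constant"] = 1

# Find the columns stored as object and cast them to float explicitly
object_cols = df.select_dtypes(include="object").columns
cleaned = df.astype({col: float for col in object_cols})

# The cleaned frame now converts to a plain numeric array
exog = np.asarray(cleaned)
print(exog.dtype)
```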
@jseabold
Member

Really we should have comprehensive handling of dtypes, including single precision, etc. Might be a decent amount of work though. We'd have to check.

@josef-pkt josef-pkt added the prio-high label May 28, 2014
@josef-pkt
Member

Asked on the mailing list: automatic conversion of dates in exog.

@jseabold jseabold added a commit to jseabold/statsmodels that referenced this issue Sep 26, 2014
@jseabold jseabold ENH: Better error message on object dtype. Closes #880 eb55b15
@jseabold jseabold added a commit to jseabold/statsmodels that referenced this issue Sep 26, 2014
@jseabold jseabold ENH: Better error message on object dtype. Closes #880 00f6558
@jseabold jseabold closed this in #2013 Sep 26, 2014
@yarikoptic yarikoptic added a commit to yarikoptic/statsmodels that referenced this issue Oct 23, 2014
@yarikoptic yarikoptic Merge commit 'v0.5.0-1491-g850e0e4' into debian-experimental
* commit 'v0.5.0-1491-g850e0e4': (178 commits)
  DOC: Fix versions to match other docs.
  REF/ENH: Use clip pattern. Use it for resid_dev in Poisson.
  STY: Pep-8
  ENH: More numerically stable inv. nbinom.
  STY: Pep-8
  ENH: More numerically stable version of invlogit.
  TST: Test invlogit stability.
  BUG: Fix prediction for ARIMA d > 1. Closes #1562.
  TST: Test predict for ARIMA with d > 1
  TST: Test forecast with ARIMA d > 1.
  BUG: Fix ARIMA.forecast for d > 1.
  ENH: Cleanup unintegrate. Add unintegrate_levels
  STY: Cleanup imports
  ENH: Better error message on object dtype. Closes #880
  TST: Test dtype object error
  TST: Test DataFrame ACF with FFT.
  BUG: 2d 1 columns -> 1d. Closes #322.
  TST: Silence convergence warnings in tests.
  ENH: Do not warn on intermediate results convergence.
  TST: Silence test warnings.
  ...
7e3fe95