DataFrame with bool type not cast correctly. #880

Closed
jseabold opened this issue Jun 20, 2013 · 7 comments

3 participants
@jseabold
Member

commented Jun 20, 2013

From the mailing list. To replicate:

sm.Logit(np.random.randint(2, size=50), pandas.DataFrame(sm.add_constant(np.random.randint(2, size=50).astype(bool)))).fit()
@jseabold

Member Author

commented Jun 20, 2013

X has dtype of object.

I wonder if this worked with older versions of pandas.

@jburroni

commented Jun 20, 2013

this is not a step to reproduce. Sorry

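import numpy as np
import pandas as pd
import statsmodels.api as sm
df = pd.DataFrame(data=np.random.random(100), columns=['rand'])
df['boolean'] = df.rand > 0.5  # df.boolean = ... would set an attribute, not add a column
sm.Logit(df.rand, sm.add_constant(df['boolean'], prepend=True)).fit()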
@josef-pkt

Member

commented Jun 20, 2013

I'm using pandas 0.7.3 and get the same casting

>>> np.asarray(zdf)
array([[0.0, True],
       [2.0, False]], dtype=object)
>>> np.asarray(zdf, float)
array([[ 0.,  1.],
       [ 2.,  0.]])

Casting to at least float:
>>> np.asarray(zdf, np.find_common_type(tuple(zdf.dtypes) + (float,), []))
array([[ 0.,  1.],
       [ 2.,  0.]])
>>> np.find_common_type(tuple(zdf.dtypes) + (float,), [])
dtype('float64')

For numpy >= 1.6, np.result_type(arr, np.float64) can be used to find the common dtype.
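
A minimal sketch of the casting described above, written with np.result_type because np.find_common_type was deprecated in NumPy 1.25 and removed in NumPy 2.0. The frame zdf is reconstructed from the quoted output; its column names are assumptions, not part of the original report.

```python
import numpy as np
import pandas as pd

# Hypothetical frame matching the quoted output: one float column, one bool column.
zdf = pd.DataFrame({"x": [0.0, 2.0], "flag": [True, False]})

# pandas has no common numeric dtype for float + bool columns, so this is object.
as_object = np.asarray(zdf)

# Promote the column dtypes together with float64 to get "at least float".
common = np.result_type(*zdf.dtypes, np.float64)
as_float = np.asarray(zdf, dtype=common)

print(as_object.dtype, common)
print(as_float)
```

Passing the column dtypes rather than the object array itself matters here: the array returned by np.asarray(zdf) is already object, so promoting it with float64 would still give object.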

@josef-pkt

Member

commented Jun 20, 2013

one possibility with pandas dataframes:

>>> common_dtype = np.find_common_type(list(xdf.dtypes), [])
>>> if common_dtype == np.dtype('object'):
...     raise ValueError('dtype cannot be converted to numeric')
... else:
...     x = np.asarray(xdf, dtype=common_dtype)

This doesn't require a minimum numeric dtype. Maybe we want at least float32, in case someone wants to calculate in single precision.
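
One way the float32 floor could look, as a sketch only. The helper name to_numeric_array and its min_dtype parameter are hypothetical and not part of statsmodels. It assumes all columns have plain NumPy dtypes; pandas extension dtypes (categorical, nullable, string) would need separate handling because np.result_type does not accept them.

```python
import numpy as np
import pandas as pd


def to_numeric_array(frame, min_dtype=np.float32):
    """Hypothetical helper: cast a DataFrame to one numeric dtype, at least min_dtype."""
    # Promote all column dtypes together with the requested floor.
    common = np.result_type(*frame.dtypes, min_dtype)
    if common == np.dtype(object):
        raise ValueError("dtype cannot be converted to numeric")
    return np.asarray(frame, dtype=common)


xdf = pd.DataFrame({
    "x": np.array([0.5, 1.5], dtype=np.float32),
    "flag": [True, False],
})
x = to_numeric_array(xdf)  # float32 + bool promotes to float32, not float64
print(x.dtype)
```

With the default floor, single-precision input stays single precision, while an int64 or float64 column would still promote the whole array to float64.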

@jseabold jseabold added this to the 0.6 milestone May 28, 2014

@jseabold

Member Author

commented May 28, 2014

Bumping this. I was contacted off-list about the same issue. I guess we need to cast object dtypes, or maybe we should just raise?

X = np.random.random((40,2))
df = pd.DataFrame(X)
df[2] = np.random.randint(2, size=40).astype('object')
df['constant'] = 1

y = pd.Series(np.random.randint(2, size=40))

sm.Logit(y, df).fit()
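
The two options raised here (cast or raise) can be sketched without statsmodels on a frame built the same way. The error wording and the variable names below are illustrative assumptions, not what the library does.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((40, 2)))
df[2] = rng.integers(2, size=40).astype("object")
df["constant"] = 1

# Option 1: cast. This works here only because the object column holds integers;
# a column of arbitrary strings would make the cast fail.
casted = np.asarray(df, dtype=float)

# Option 2: raise, naming the offending columns so the user can fix them.
object_cols = [col for col, dtype in df.dtypes.items() if dtype == object]
message = f"exog has object-dtype columns: {object_cols}"
print(casted.dtype, message)
```

Naming the columns in the message costs little and tells the user exactly which column to convert before fitting.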
@jseabold

Member Author

commented May 28, 2014

Really we should have comprehensive handling of dtypes, including single precision, etc. Might be a decent amount of work though. We'd have to check.

@josef-pkt josef-pkt added the prio-high label May 28, 2014

@josef-pkt

Member

commented Jun 3, 2014

Asked on the mailing list: automatic conversion of dates in exog.

jseabold added a commit to jseabold/statsmodels that referenced this issue Sep 26, 2014

jseabold added a commit to jseabold/statsmodels that referenced this issue Sep 26, 2014

yarikoptic added a commit to yarikoptic/statsmodels that referenced this issue Oct 23, 2014

Merge commit 'v0.5.0-1491-g850e0e4' into debian-experimental
* commit 'v0.5.0-1491-g850e0e4': (178 commits)
  DOC: Fix versions to match other docs.
  REF/ENH: Use clip pattern. Use it for resid_dev in Poisson.
  STY: Pep-8
  ENH: More numerically stable inv. nbinom.
  STY: Pep-8
  ENH: More numerically stable version of invlogit.
  TST: Test invlogit stability.
  BUG: Fix prediction for ARIMA d > 1. Closes statsmodels#1562.
  TST: Test predict for ARIMA with d > 1
  TST: Test forecast with ARIMA d > 1.
  BUG: Fix ARIMA.forecast for d > 1.
  ENH: Cleanup unintegrate. Add unintegrate_levels
  STY: Cleanup imports
  ENH: Better error message on object dtype. Closes statsmodels#880
  TST: Test dtype object error
  TST: Test DataFrame ACF with FFT.
  BUG: 2d 1 columns -> 1d. Closes statsmodels#322.
  TST: Silence convergence warnings in tests.
  ENH: Do not warn on intermediate results convergence.
  TST: Silence test warnings.
  ...