Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Infer dtype from non-nulls when pushing to SQL #8973

Merged
merged 1 commit into from
Dec 8, 2014

Conversation

artemyk
Copy link
Contributor

@artemyk artemyk commented Dec 3, 2014

Closes #8778

This infers dtype from non-null values for insertion into SQL database. See #8778 . I had to alter @tiagoantao test a bit.

Like @tiagoantao, I skipped writing tests for legacy MySQL. Support for this will be removed soon, right?

As a side note, lib.infer_dtype throws an exception for categorical data -- probably shouldn't happen, right?

@jorisvandenbossche
@jahfet
@tiagoantao

@jreback
Copy link
Contributor

jreback commented Dec 3, 2014

can u show what u r passing that raises on Categorical
infer_dtypes is internal and cannot be called with certain types of things
but sounds like a bug

@artemyk
Copy link
Contributor Author

artemyk commented Dec 3, 2014

Yes, here's the snippet. Interestingly, it works for a dataframe but not for a series.

In [11]: import pandas as pd

In [12]: import pandas.lib as lib

In [13]: df = pd.DataFrame({'a':pd.Series(['A','B'], dtype='category')})

In [14]: lib.infer_dtype(df)
Out[14]: 'string'

In [15]: lib.infer_dtype(df['a'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-20fdf746850b> in <module>()
----> 1 lib.infer_dtype(df['a'])

/Users/artemy/dev/pandas/pandas/lib.so in pandas.lib.infer_dtype (pandas/lib.c:41462)()

TypeError: Cannot convert Categorical to numpy.ndarray

@jreback
Copy link
Contributor

jreback commented Dec 3, 2014

that's not really valid input, it has to be an ndarray (though we have relaxed it a bit). But I will add this as it should be able to infer (its easy to do), it just checks the dtype (and doesn't do anything more)

@jreback
Copy link
Contributor

jreback commented Dec 3, 2014

@artemyk ok, fixed up by #8975

@artemyk
Copy link
Contributor Author

artemyk commented Dec 3, 2014

@jreback Great, thanks! Simplified this PR a bit (no longer doing special check for categorical type)

@@ -884,37 +884,49 @@ def _harmonize_columns(self, parse_dates=None):
except KeyError:
pass # this column not in results

def _get_notnull_col_dtype(self, col):
col_for_inference = col
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a docstring to explain why this is needed?

@jorisvandenbossche
Copy link
Member

@artemyk put some comments, but in general it looks very good!
ok for skipping the tests on mysql legacy

@artemyk
Copy link
Contributor Author

artemyk commented Dec 3, 2014

@jorisvandenbossche Thanks, made the fixes.

Regarding lib.infer_dtype(com._ensure_object(col)) --- this is from #6932 , not sure why _ensure_object was used , I thought that code path would occur when col is already dtype object (@jreback may know) . In any case, the new dtype inference code takes care of inferring date and time.

And yes, I removed the previous testing for dtype, lines like elif issubclass(pytype, np.integer): &c., and now rely on lib.infer_dtype to do this. Not sure what you mean by should we also do that for sqlalchemy + test it.

@jorisvandenbossche jorisvandenbossche added the IO SQL to_sql, read_sql, read_sql_query label Dec 3, 2014
@jorisvandenbossche jorisvandenbossche added this to the 0.15.2 milestone Dec 3, 2014
@artemyk
Copy link
Contributor Author

artemyk commented Dec 3, 2014

@jorisvandenbossche Oh, that makes sense! Added check / test for complex numbers .

@jreback jreback added the Dtype Conversions Unexpected or buggy dtype conversions label Dec 3, 2014
@jreback
Copy link
Contributor

jreback commented Dec 6, 2014

@jorisvandenbosscher ready to go?

@jorisvandenbossche
Copy link
Member

yep, looking good! @artemyk can you add a release note and squash?

@artemyk artemyk force-pushed the notnulldtype_sql branch 2 times, most recently from ee4f172 to 944b28b Compare December 7, 2014 19:27
@artemyk
Copy link
Contributor Author

artemyk commented Dec 7, 2014

@jorisvandenbossche @jreback Done

@jreback
Copy link
Contributor

jreback commented Dec 7, 2014

looks ok to me, any docs need to be updated for this? (maybe a note somewhere explaining what this is doing), e.g. maybe add a 'dtypes' sub-section for SQL (as getting more traction in this area).

Could be done in separate issue.

@jorisvandenbossche
Copy link
Member

@artemyk hmm, can you rebase again?

Docs can go in a seperate PR I think (more general than this)

@jorisvandenbossche
Copy link
Member

@jreback did a PR for your doc suggestion

Minor cleanup

Minor rename

Simplifying

Code review fixes

Complex numbers

Release note

Release note
@artemyk
Copy link
Contributor Author

artemyk commented Dec 8, 2014

@jorisvandenbossche Rebased on master, let me if I still need to fix something. And thanks for doing the docs!

jorisvandenbossche added a commit that referenced this pull request Dec 8, 2014
ENH: Infer dtype from non-nulls when pushing to SQL
@jorisvandenbossche jorisvandenbossche merged commit 67ec0a8 into pandas-dev:master Dec 8, 2014
@jorisvandenbossche
Copy link
Member

@artemyk Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

Successfully merging this pull request may close these issues.

problem with to_sql with NA
3 participants