Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add dtype argument to read_sql_query (GH10285) #37546

Merged
merged 12 commits into from
Dec 23, 2020

Conversation

avinashpancham
Copy link
Contributor

@avinashpancham avinashpancham commented Oct 31, 2020

@avinashpancham avinashpancham changed the title ENH: Add dtype argument to read_sql_query ENH: Add dtype argument to read_sql_query (GH10285) Oct 31, 2020
@avinashpancham
Copy link
Contributor Author

avinashpancham commented Nov 13, 2020

@jorisvandenbossche tagging you since you were involved in the original issue. Could you have a look at this imp and lmk whether this is what you had in mind?

@avinashpancham
Copy link
Contributor Author

@jorisvandenbossche could you (or maybe someone else) have a look at this PR? Thanks

@avinashpancham
Copy link
Contributor Author

@jreback could you (or maybe someone else) have a look at this PR? Thanks

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls always add tests. w/o them its impossible to tell if this does anything

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Enhancement IO SQL to_sql, read_sql, read_sql_query labels Nov 28, 2020
@avinashpancham
Copy link
Contributor Author

Updated PR. Only Linux py38_np_dev fails, but due to other reasons.

@avinashpancham
Copy link
Contributor Author

Merged master, everything succeeds.

I also saw a similar enhancement request for dtypes for read_sql. Once we agree on this imp we can easily extend it to that function as well

@avinashpancham
Copy link
Contributor Author

@jreback could you have another look? Will fix the merge conflict in the docs later today

pandas/io/sql.py Outdated
@@ -295,6 +301,7 @@ def read_sql_query(
params=None,
parse_dates=None,
chunksize: None = None,
dtype: Optional[Union[Dtype, Dict[str, Dtype]]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this type actually is pretty usefule, can you define in _typing, call it DtypeOrDictDtype / DtypeTable and add a comment about it. cc @simonjayhawkins @WillAyd @jorisvandenbossche for the name here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dtype: Optional[Union[Dtype, Dict[str, Dtype]]] = None,
dtype: Optional[Union[Dtype, Dict[Label, Dtype]]] = None,

Maybe DtypeArg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, added DtypeArg to _typing

pandas/io/sql.py Outdated
index_col=None,
coerce_float=True,
parse_dates=None,
dtype=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you type anywhere you are adding type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -307,6 +307,7 @@ Other enhancements
- Improve numerical stability for :meth:`.Rolling.skew`, :meth:`.Rolling.kurt`, :meth:`Expanding.skew` and :meth:`Expanding.kurt` through implementation of Kahan summation (:issue:`6929`)
- Improved error reporting for subsetting columns of a :class:`.DataFrameGroupBy` with ``axis=1`` (:issue:`37725`)
- Implement method ``cross`` for :meth:`DataFrame.merge` and :meth:`DataFrame.join` (:issue:`5401`)
- :func:`pandas.read_sql_query` now accepts a ``dtype`` argument to cast the columnar data from the SQL database based on user input (:issue:`10285`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to 1.3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

]
DtypeArg = Optional[Union[Dtype, Dict[Label, Dtype]]]
DtypeObj = Union[np.dtype, "ExtensionDtype"]

# For functions like rename that convert one label to another
Copy link
Contributor Author

@avinashpancham avinashpancham Dec 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the dtype block since Label was defined later in the file

@avinashpancham
Copy link
Contributor Author

In #13049 it is proposed to add the dtype argument to read_sql. This can be done in similar manner as in this PR.

Dtype = Union[
"ExtensionDtype", str, np.dtype, Type[Union[str, float, int, complex, bool, object]]
]
DtypeArg = Optional[Union[Dtype, Dict[Label, Dtype]]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use Optional in the spec itself

pandas/io/sql.py Outdated
@@ -132,10 +133,14 @@ def _wrap_result(
index_col=None,
coerce_float: bool = True,
parse_dates=None,
dtype: DtypeArg = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional[DtypeArg] for all of these

@@ -100,6 +93,14 @@
JSONSerializable = Optional[Union[PythonScalar, List, Dict]]
Axes = Collection

# dtypes

Dtype = Union[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment on what DtypeArg is / supposed to be used

@@ -1361,6 +1376,9 @@ def read_query(
chunksize : int, default None
If specified, return an iterator where `chunksize` is the number
of rows to include in each chunk.
dtype : Type name or dict of columns
Data type for data or columns. E.g. np.float64 or
{‘a’: np.float64, ‘b’: np.int32, ‘c’: ‘Int64’}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

versionadded 1.3

result = sql.read_sql_query(
"SELECT SepalLength, SepalWidth FROM iris", self.conn, dtype=dtype
)
assert result.dtypes.to_dict() == {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you constructed an expected frame and use tm.assert_frame_equal

@jreback jreback added this to the 1.3 milestone Dec 23, 2020
@jreback
Copy link
Contributor

jreback commented Dec 23, 2020

lgtm ping on greenish

@jreback
Copy link
Contributor

jreback commented Dec 23, 2020

as a followup :-> can type any dtype= args in the codebase

@avinashpancham
Copy link
Contributor Author

@jreback CI is greenish,Travis failed due to other reasons

Will work on dtype follow up tomorrow.

@@ -371,6 +379,9 @@ def read_sql_query(
chunksize : int, default None
If specified, return an iterator where `chunksize` is the number of
rows to include in each chunk.
dtype : Type name or dict of columns
Data type for data or columns. E.g. np.float64 or
{‘a’: np.float64, ‘b’: np.int32, ‘c’: ‘Int64’}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need a versionadded 1.3 here. ok to add in next PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I see i didn't commit that change. But will indeed add it to the follow on

@jreback jreback merged commit 5fecf47 into pandas-dev:master Dec 23, 2020
@jreback
Copy link
Contributor

jreback commented Dec 23, 2020

thanks @avinashpancham

small comment for followon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Enhancement IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add dtype keyword to read_sql_query to control per column dtypes.
3 participants