ENH: Add dtype argument to read_sql_query (GH10285) #37546

avinashpancham · 2020-10-31T21:44:46Z

closes #10285 (Add dtype keyword to read_sql_query to control per column dtypes)
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
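
For reviewers, a minimal sketch of the intended usage. It assumes pandas 1.3 or later (the milestone this PR was later assigned to) and uses a hypothetical in-memory SQLite table named `iris`; the table contents are made up for illustration.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iris (SepalLength REAL, SepalWidth REAL)")
conn.executemany("INSERT INTO iris VALUES (?, ?)", [(5.1, 3.5), (4.9, 3.0)])
conn.commit()

query = "SELECT SepalLength, SepalWidth FROM iris"

# A single dtype is applied to every column of the result.
df_all = pd.read_sql_query(query, conn, dtype=np.float32)

# A dict casts only the listed columns; the others keep their inferred dtype.
df_some = pd.read_sql_query(query, conn, dtype={"SepalLength": np.float32})
```

The dict form mirrors how `dtype` behaves in `read_csv`, which is the behaviour the linked issue asks for.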

avinashpancham · 2020-11-13T14:42:39Z

@jorisvandenbossche tagging you since you were involved in the original issue. Could you have a look at this imp and lmk whether this is what you had in mind?

avinashpancham · 2020-11-22T16:01:52Z

@jorisvandenbossche could you (or maybe someone else) have a look at this PR? Thanks

avinashpancham · 2020-11-28T15:34:47Z

@jreback could you (or maybe someone else) have a look at this PR? Thanks

jreback

pls always add tests. w/o them it's impossible to tell if this does anything

avinashpancham · 2020-12-01T22:14:16Z

Updated PR. Only Linux py38_np_dev fails, but due to other reasons.

avinashpancham · 2020-12-03T11:58:25Z

Merged master, everything succeeds.

I also saw a similar enhancement request for dtypes for read_sql. Once we agree on this imp we can easily extend it to that function as well

avinashpancham · 2020-12-06T16:38:43Z

@jreback could you have another look? Will fix the merge conflict in the docs later today

jreback · 2020-12-14T23:51:41Z

pandas/io/sql.py

@@ -295,6 +301,7 @@ def read_sql_query(
    params=None,
    parse_dates=None,
    chunksize: None = None,
+    dtype: Optional[Union[Dtype, Dict[str, Dtype]]] = None,


this type actually is pretty useful, can you define it in _typing, call it DtypeOrDictDtype / DtypeTable and add a comment about it. cc @simonjayhawkins @WillAyd @jorisvandenbossche for the name here.

WillAyd · 2020-12-15

Suggested change

-    dtype: Optional[Union[Dtype, Dict[str, Dtype]]] = None,
+    dtype: Optional[Union[Dtype, Dict[Label, Dtype]]] = None,

Maybe DtypeArg?

avinashpancham · 2020-12-16

Done, added DtypeArg to _typing

jreback · 2020-12-14T23:52:18Z

pandas/io/sql.py

+        index_col=None,
+        coerce_float=True,
+        parse_dates=None,
+        dtype=None,


can you add type annotations anywhere you are adding dtype

jreback · 2020-12-14T23:52:44Z

doc/source/whatsnew/v1.2.0.rst

@@ -307,6 +307,7 @@ Other enhancements
 - Improve numerical stability for :meth:`.Rolling.skew`, :meth:`.Rolling.kurt`, :meth:`Expanding.skew` and :meth:`Expanding.kurt` through implementation of Kahan summation (:issue:`6929`)
 - Improved error reporting for subsetting columns of a :class:`.DataFrameGroupBy` with ``axis=1`` (:issue:`37725`)
 - Implement method ``cross`` for :meth:`DataFrame.merge` and :meth:`DataFrame.join` (:issue:`5401`)
+- :func:`pandas.read_sql_query` now accepts a ``dtype`` argument to cast the columnar data from the SQL database based on user input (:issue:`10285`)


move to 1.3

avinashpancham · 2020-12-15T20:00:02Z

pandas/_typing.py

+]
+DtypeArg = Optional[Union[Dtype, Dict[Label, Dtype]]]
+DtypeObj = Union[np.dtype, "ExtensionDtype"]
+
 # For functions like rename that convert one label to another


Moved the dtype block since Label was defined later in the file

avinashpancham · 2020-12-20T16:09:16Z

In #13049 it is proposed to add the dtype argument to read_sql. This can be done in a similar manner as in this PR.

jreback · 2020-12-23T18:47:50Z

pandas/_typing.py

+Dtype = Union[
+    "ExtensionDtype", str, np.dtype, Type[Union[str, float, int, complex, bool, object]]
+]
+DtypeArg = Optional[Union[Dtype, Dict[Label, Dtype]]]


don't use Optional in the spec itself

jreback · 2020-12-23T18:48:19Z

pandas/io/sql.py

@@ -132,10 +133,14 @@ def _wrap_result(
    index_col=None,
    coerce_float: bool = True,
    parse_dates=None,
+    dtype: DtypeArg = None,


Optional[DtypeArg] for all of these
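
A rough sketch of what the two comments above ask for: the alias itself carries no `Optional`, and each signature that defaults to `None` wraps it in `Optional[...]`. The `Dtype` alias below is a simplified stand-in (the real one also includes `"ExtensionDtype"`), `Hashable` stands in for `Label`, and `expand_dtype_arg` is a hypothetical helper that is not part of this PR.

```python
from typing import Dict, Hashable, List, Optional, Type, Union

import numpy as np

# Simplified stand-ins for the aliases discussed in the review (assumption:
# the real pandas definitions also cover ExtensionDtype).
Dtype = Union[str, np.dtype, Type[Union[str, float, int, complex, bool, object]]]

# A single dtype for all columns, or a per-column mapping.
# Note: no Optional in the alias itself.
DtypeArg = Union[Dtype, Dict[Hashable, Dtype]]


def expand_dtype_arg(
    columns: List[Hashable], dtype: Optional[DtypeArg] = None
) -> Dict[Hashable, Dtype]:
    """Hypothetical helper: expand a DtypeArg into a per-column mapping."""
    if dtype is None:
        return {}
    if isinstance(dtype, dict):
        # Keep only the entries that refer to known columns.
        return {col: dtype[col] for col in columns if col in dtype}
    # A single dtype applies to every column.
    return {col: dtype for col in columns}
```

Keeping `Optional` out of the alias lets functions that require a dtype reuse `DtypeArg` unchanged.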

jreback · 2020-12-23T18:48:46Z

pandas/_typing.py

@@ -100,6 +93,14 @@
 JSONSerializable = Optional[Union[PythonScalar, List, Dict]]
 Axes = Collection

+# dtypes
+
+Dtype = Union[


can you add a comment on what DtypeArg is / supposed to be used

jreback · 2020-12-23T18:48:57Z

pandas/io/sql.py

@@ -1361,6 +1376,9 @@ def read_query(
        chunksize : int, default None
            If specified, return an iterator where `chunksize` is the number
            of rows to include in each chunk.
+        dtype : Type name or dict of columns
+            Data type for data or columns. E.g. np.float64 or
+            {'a': np.float64, 'b': np.int32, 'c': 'Int64'}


versionadded 1.3

jreback · 2020-12-23T18:49:26Z

pandas/tests/io/test_sql.py

+        result = sql.read_sql_query(
+            "SELECT SepalLength, SepalWidth FROM iris", self.conn, dtype=dtype
+        )
+        assert result.dtypes.to_dict() == {


can you construct an expected frame and use tm.assert_frame_equal
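
A sketch of the requested test shape, under a few assumptions: it uses the public `pandas.testing.assert_frame_equal` rather than the internal `tm` alias, a hypothetical in-memory SQLite table instead of the test suite's `self.conn` fixture, and made-up row values. It needs pandas 1.3 or later.

```python
import sqlite3

import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical stand-in for the iris fixture used in test_sql.py.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iris (SepalLength REAL, SepalWidth REAL)")
conn.executemany("INSERT INTO iris VALUES (?, ?)", [(5.0, 3.5), (4.0, 3.0)])
conn.commit()

dtype = {"SepalLength": "int64", "SepalWidth": "float32"}
result = pd.read_sql_query(
    "SELECT SepalLength, SepalWidth FROM iris", conn, dtype=dtype
)

# Build the expected frame explicitly so that both values and dtypes are checked.
expected = pd.DataFrame(
    {
        "SepalLength": np.array([5, 4], dtype="int64"),
        "SepalWidth": np.array([3.5, 3.0], dtype="float32"),
    }
)
assert_frame_equal(result, expected)
```

Comparing whole frames catches wrong values and wrong index or column order, which a dtype-only assertion would miss.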

jreback · 2020-12-23T21:10:34Z

lgtm ping on greenish

jreback · 2020-12-23T21:10:51Z

as a followup :-> can type any dtype= args in the codebase

avinashpancham · 2020-12-23T23:26:35Z

@jreback CI is greenish, Travis failed due to other reasons

Will work on dtype follow up tomorrow.

jreback · 2020-12-23T23:27:54Z

pandas/io/sql.py

@@ -371,6 +379,9 @@ def read_sql_query(
    chunksize : int, default None
        If specified, return an iterator where `chunksize` is the number of
        rows to include in each chunk.
+    dtype : Type name or dict of columns
+        Data type for data or columns. E.g. np.float64 or
+        {'a': np.float64, 'b': np.int32, 'c': 'Int64'}


need a versionadded 1.3 here. ok to add in next PR

Sorry, I see I didn't commit that change. But will indeed add it to the follow-on
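
For the follow-on, a sketch of where the directive would sit in a numpydoc-style docstring. `read_sql_query_stub` is a hypothetical placeholder, and the exact version string (`1.3.0` here) is an assumption based on the "versionadded 1.3" request.

```python
def read_sql_query_stub(sql, con, chunksize=None, dtype=None):
    """
    Hypothetical stub that only illustrates the docstring layout.

    Parameters
    ----------
    chunksize : int, default None
        If specified, return an iterator where `chunksize` is the number of
        rows to include in each chunk.
    dtype : Type name or dict of columns
        Data type for data or columns. E.g. np.float64 or
        {'a': np.float64, 'b': np.int32, 'c': 'Int64'}

        .. versionadded:: 1.3.0
    """
    raise NotImplementedError("illustrative stub only")
```

The directive is indented under the parameter it documents and separated from the description by a blank line.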

jreback · 2020-12-23T23:28:32Z

thanks @avinashpancham

small comment for followon

avinashpancham changed the title ~~ENH: Add dtype argument to read_sql_query~~ ENH: Add dtype argument to read_sql_query (GH10285) Oct 31, 2020

jreback requested changes Nov 28, 2020

jreback added the Dtype Conversions, Enhancement, and IO SQL labels Nov 28, 2020

avinashpancham added 4 commits December 1, 2020 22:25

ENH: Add dtype argument to read_sql_query

bcbe5ea

Update sql unit tests

9c4f034

Update type hinting and update doc

620c0ab

Add test

bcef60e

avinashpancham force-pushed the GH10285 branch from 5258da9 to bcef60e Compare December 1, 2020 21:28

Merge remote-tracking branch 'upstream/master' into GH10285

5c88e5c

avinashpancham added 2 commits December 10, 2020 21:39

Merge remote-tracking branch 'upstream/master' into GH10285

24308c4

Merge remote-tracking branch 'upstream/master' into GH10285

d6cc4b7

jreback requested changes Dec 14, 2020

avinashpancham added 2 commits December 15, 2020 20:56

Address comments

5de64f2

Merge remote-tracking branch 'upstream/master' into GH10285

e9be344

avinashpancham commented Dec 15, 2020

Merge remote-tracking branch 'upstream/master' into GH10285

d7d4439

Merge remote-tracking branch 'upstream/master' into GH10285

dbf1f5f

jreback requested changes Dec 23, 2020

Address comments

a4e7cdf

jreback added this to the 1.3 milestone Dec 23, 2020

jreback approved these changes Dec 23, 2020

jreback reviewed Dec 23, 2020

jreback merged commit 5fecf47 into pandas-dev:master Dec 23, 2020

This was referenced Dec 24, 2020

CLN: Add typing for dtype argument in io/sql.py #38680

Merged

CLN: Add typing for dtype argument in codebase #38808

Closed

Improve type handling in read_sql and read_sql_table #13049

Open

luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021

ENH: Add dtype argument to read_sql_query (GH10285) (pandas-dev#37546)

c995254

zaneselvans mentioned this pull request Jan 13, 2022

Research persisting to DB with Pandas/Dask/Prefect catalyst-cooperative/pudl#1399

Closed

ParfaitG mentioned this pull request Jan 23, 2022

ENH: Add dtypes/converters arguments for pandas.read_xml #45411

Merged

4 tasks
