pd.read_sql_query() does not convert NULLs to NaN #14319

Closed

Gerenuk opened this issue Sep 29, 2016 · 12 comments

Gerenuk commented Sep 29, 2016

A small, complete example of the issue

from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('sqlite://')
conn = engine.connect()
conn.execute("create table test (a float)")
for _ in range(5):
    conn.execute("insert into test values (NULL)")

df = pd.read_sql_query("select * from test", engine, coerce_float=True)
print(df.a)

Expected Output

In pandas 0.18.1 this results in a column of dtype object holding None values, whereas I need float("nan"). The coerce_float=True option makes no difference. This matters most when reading a float column chunk-wise, since there may be long sequences of NULLs (an entire chunk can be all NULL).
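For reference, a sketch of the output of print(df.a) as observed (per the description above) versus as expected:

# Observed (pandas 0.18.1):
0    None
1    None
2    None
3    None
4    None
Name: a, dtype: object

# Expected:
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
Name: a, dtype: float64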

(also http://stackoverflow.com/questions/30652457/adjust-pandas-read-sql-query-null-value-treatment/)

TomAugspurger (Contributor) commented

Seems to be the same as #14314 (though your example is better, thanks; I'll copy it over there).

The coerce_float option is for things like Decimal to float conversion, not null to NaN.
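A hedged illustration of what coerce_float is for (this assumes a database whose driver returns NUMERIC values as decimal.Decimal, e.g. PostgreSQL; the table items, column price, and pg_engine are hypothetical):

df = pd.read_sql_query("select price from items", pg_engine, coerce_float=False)
type(df.price[0])  # decimal.Decimal
df = pd.read_sql_query("select price from items", pg_engine, coerce_float=True)
df.price.dtype     # float64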

Does your actual use case use select * from test? If so, you can do pd.read_sql_table('test', engine, chunksize=chunksize), which will get the type information from the database. read_sql_query only sees the result set, so it has no such information to use.
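A minimal sketch of that suggestion, reusing the sqlite example above (the chunksize of 2 is arbitrary):

for chunk in pd.read_sql_table('test', engine, chunksize=2):
    # The dtype comes from the table schema, so even an all-NULL chunk
    # should come back as float64 with NaN rather than object with None.
    print(chunk.a.dtype)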

TomAugspurger (Contributor) commented

Let's actually reopen this: is it worth adding a coerce_null parameter to read_sql_query to handle cases like this?

TomAugspurger reopened this Sep 29, 2016
TomAugspurger added the API Design and IO SQL labels Sep 29, 2016
Gerenuk (Author) commented Sep 29, 2016

My actual example was slightly more complex. I used an SQL query and had a way to do chunk-wise reading, which made this issue more likely.

jorisvandenbossche (Member) commented

I am not sure it is worth adding such a parameter. The issue is also that this should not necessarily result in a float column, since in SQL NULLs can occur in a column of any type.

To solve the dtype issues with multiple chunks, we could also think about a dtype kwarg where you could specify dtypes for certain columns explicitly (similar to the dtype argument in e.g. read_csv), although that of course means manually specifying the exact columns. A sketch of the idea follows below.
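A hedged sketch of the proposal next to today's closest equivalent (the dtype argument shown first does not exist in read_sql_query; it is the idea being floated):

# Proposed (hypothetical) API:
# df = pd.read_sql_query("select * from test", engine, dtype={'a': 'float64'})

# Workaround available today: cast explicitly after reading.
df = pd.read_sql_query("select * from test", engine).astype({'a': 'float64'})
print(df.a.dtype)  # float64, with the NULLs now NaN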

The problem with queries is that we do not have any information about the database table layout; we just get back result sets. But maybe we could also have a new kwarg to specify a table name from which to get the type information. Or a helper function to get this information and automatically construct a dict that could be passed to a possible dtype argument. Just brainstorming here.

jorisvandenbossche removed the Duplicate Report label Sep 29, 2016
TomAugspurger (Contributor) commented

Agreed about the coerce_null parameter not making sense.

A helper method for getting dtypes from the table might be useful. I've written similar methods in the past that get the column info from a sqlalchemy MetaData object and transform them to a dictionary of {column_name: pandas type}. That could be passed to a dtype keyword for read_sql_query.
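A hedged sketch of such a helper (the name sql_dtypes and the type mapping are illustrative, and the mapping is far from exhaustive):

import pandas as pd
from sqlalchemy import MetaData, Table
from sqlalchemy.types import Float, Integer, Numeric, String

def sql_dtypes(table_name, engine):
    # Reflect the table's schema from the database.
    table = Table(table_name, MetaData(), autoload_with=engine)
    mapping = {}
    for col in table.columns:
        if isinstance(col.type, (Float, Numeric)):
            mapping[col.name] = 'float64'
        elif isinstance(col.type, Integer):
            mapping[col.name] = 'int64'  # NULLs would need a nullable type
        elif isinstance(col.type, String):
            mapping[col.name] = 'object'
    return mapping

# Until a dtype keyword exists, apply the mapping after reading:
df = pd.read_sql_query("select * from test", engine).astype(sql_dtypes('test', engine))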

threemonks commented

Is there any workaround for this while the issue is being addressed? I tried the following, but neither works:

df.replace(None, np.nan, inplace=True)
df.fillna(value=np.nan, inplace=True)

TomAugspurger (Contributor) commented Aug 12, 2018

Does to_numeric work? (with errors='coerce')
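A minimal sketch of that suggestion, with a Series standing in for a column as read_sql_query returns it:

import pandas as pd

s = pd.Series([None, None, 1.5], dtype=object)
print(pd.to_numeric(s, errors='coerce'))
# 0    NaN
# 1    NaN
# 2    1.5
# dtype: float64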

puttkraidej commented

Same issue here: pandas forced Decimal to float dtype when using read_sql_query() :(

annette987 commented

Was this ever resolved? I am having the same issue. See also: https://stackoverflow.com/questions/53315035/pandas-read-sql-query-returning-none-for-all-values-in-some-columns

I am using pandas read_sql_query to read data from a MySQL database table into a pandas dataframe. Some columns in this table have all NULL values. For those columns the pandas dataframe contains None in every row. For all other columns the dataframe contains NaN where there was a NULL value. Can anyone explain why None is returned for the all-NULL columns? And how do I make sure I get all NaNs, hopefully without doing manual conversions? I should add that two of the columns causing this problem are of type float and the third is of type double.

Here is an example. The columns pef and fer contain all NULLs in the database.

from sqlalchemy import create_engine
import pandas as pd

querystr = "SELECT * FROM dbname.mytable"
engine = create_engine('mysql+pymysql://username:password@localhost/' + "dbname")
df = pd.read_sql_query(querystr, engine)
df.head()

    sys     dias    pef     fer
0   NaN     NaN     None    None
1   159.0   92.666  None    None
2   NaN     NaN     None    None
3   NaN     NaN     None    None
4   102.0   63.333  None    None

In the MySQL database these columns are defined as:

Columns: 
    sys float 
    dias float 
    pef float 
    fer float

I would expect the columns pef and fer to contain NaN in each row, not None.

TomAugspurger (Contributor) commented

Still open.

Is your actual query SELECT * FROM ...? If so, you could use pd.read_sql_table.

Pandas can't use the dtypes from the table for arbitrary SQL expressions passed to read_sql_query, since the expression can change the column names and types (see the sketch below). But pandas does use the table's types for read_sql_table.
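A hedged illustration, reusing the sqlite table from the first example in this thread: the query renames the column and derives a new one, so no table schema describes the result set:

df = pd.read_sql_query("select a as b, a is not null as has_a from test", engine)
# The result set's columns (b, has_a) and their types are determined by
# the expression, not by the schema of table test.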

annette987 commented

My actual query is more complicated than that and involves multiple tables. So I can't just use pd.read_sql_table.
What I am doing at the moment is just converting None to NaN in the dataframe:
df.replace([None], np.nan, inplace=True)
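A hedged note on that workaround: depending on the pandas version, the replaced columns may still end up with object dtype (NaN stored in an object column), so an explicit cast may also be needed, e.g.:

df['pef'] = df['pef'].astype('float64')
df['fer'] = df['fer'].astype('float64')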

TomAugspurger (Contributor) commented

Closing in favor of #10285
