DataFrame.from_records() should optionally convert None to NaN #893

gerigk · 2012-03-09T18:19:30Z

I still frequently run into problems with NaN and None.
For example:
I retrieve data from a SQL database (Postgres in my case) and end up with a list of tuples containing None values (NULL in SQL).
I then convert it to a DataFrame using DataFrame.from_records().
To avoid object Series, it would be nice to have None automatically mapped to NaN, the same way from_csv does.
Right now I have to do something like DF.fillna(np.nan) and then convert the numeric columns back to their original type, which is kind of ugly.

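    import numpy as np
    from datetime import datetime
    from pandas import DataFrame

    # results: list of row tuples from the SQL query; columns: the column names
    time0 = datetime.now()
    result_dataframe = DataFrame.from_records(results, columns=columns)
    time1 = datetime.now()
    # Replace None with NaN in every object column
    for col in result_dataframe:
        if result_dataframe.dtypes[col] == object:
            result_dataframe[col].fillna(np.nan, inplace=True)
    # Rebuild the frame so the filled columns are re-inferred as numeric
    result_dataframe = DataFrame.from_records(result_dataframe)
    print(datetime.now() - time1, time1 - time0)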

0:00:07.477906 0:00:01.971754

This example has 750k rows. The fillna step (together with the rebuild) takes about 7.5 s, much longer than the roughly 2 s needed to construct the DataFrame in the first place.


wesm · 2012-03-13T03:49:03Z

Hi, could you give a self-contained example showing the original tuples and how they end up not being converted to a numeric dtype by from_records? I just added a test case (see linked commit) that shows that everything checks out in a simple case. from_records is intended to convert a "SQL column" containing only None and numeric values into a float64 column. Let me know so I can get this sorted out and close the issue.
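The simple case described above can be sketched as follows. This is an illustration only, assuming a recent pandas release; the sample rows and the column names "a" and "b" are made up and are not taken from the linked test case.

```python
import pandas as pd

# Made-up rows standing in for a SQL result set; None plays the role of NULL.
rows = [(1, 2.5), (None, None), (3, 4.0)]

df_numeric = pd.DataFrame.from_records(rows, columns=["a", "b"])

# Both columns hold only numbers and None, so they are expected to be
# inferred as float64, with None showing up as NaN.
print(df_numeric.dtypes)
print(df_numeric)
```

If a column comes back as object instead, it most likely contains something other than plain numbers and None.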

gerigk · 2012-03-14T15:36:20Z

So this is apparently my fault, but for my specific case it would still be nice to have this option:

psycopg2 returns type decimal.Decimal for columns of type "numeric".

    from decimal import Decimal
    from pandas import DataFrame

    x = [(Decimal(10), 5), (None, None)]
    y = DataFrame.from_records(x)

    y
          0    1
    0    10    5
    1  None  NaN

But with the current setup there is no bijective mapping from NULL (SQL) to NaN (pandas) if I have both object and numeric columns. If it were possible to say "convert None to NaN for object columns", and have that use a fast conversion while constructing the DataFrame, it would be helpful in my opinion.

wesm · 2012-03-15T16:50:16Z

Hi Arthur, I added an option coerce_float (in the above commit) that converts Decimal -> float and fills None with NaN. Converting Decimal to float is still really slow. It will be part of 0.7.2, to be released soon.
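A minimal sketch of how the coerce_float option might be used with the Decimal example from earlier in the thread. It assumes a recent pandas release, where coerce_float is a keyword argument of DataFrame.from_records; the variable names df_default and df_coerced are made up for illustration.

```python
from decimal import Decimal

import pandas as pd

# Same shape of data as in the thread: a Decimal column with a NULL (None).
x = [(Decimal(10), 5), (None, None)]

# Default behavior: the Decimal column is kept as an object column.
df_default = pd.DataFrame.from_records(x)

# With coerce_float=True, Decimal values are converted to float and
# None becomes NaN, so the column is numeric.
df_coerced = pd.DataFrame.from_records(x, coerce_float=True)

print(df_default.dtypes)
print(df_coerced.dtypes)
```

Note that the conversion goes through float, so any precision beyond what a float64 can hold is lost. Whether that is acceptable depends on what the numeric column stores.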

gerigk · 2012-03-15T17:19:06Z

Thanks a lot!

wesm added a commit that referenced this issue Mar 13, 2012

TST: unit test for #893

8ec6236

wesm closed this as completed in 39efc7b Mar 15, 2012
