DataFrame.from_records() should optionally convert None to NaN #893

Closed
gerigk opened this issue Mar 9, 2012 · 4 comments

@gerigk

gerigk commented Mar 9, 2012

I still frequently run into problems with NaN versus None.
For example:
I retrieve data from an SQL database (Postgres in my case) and get back a list of tuples containing None values (NULL in SQL).
I then convert it to a DataFrame using DataFrame.from_records().
To avoid object-dtype Series, it would be nice to have None automatically mapped to NaN, the same way from_csv does.
Right now I have to do something like DF.fillna(np.nan) and then convert the numeric columns back to their original type, which is kind of ugly.

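        from datetime import datetime
        import numpy as np
        from pandas import DataFrame

        # results: list of row tuples from the DB cursor; columns: list of column names
        time0 = datetime.now()
        result_dataframe = DataFrame.from_records(results, columns=columns)
        time1 = datetime.now()
        # replace None with NaN in every object-dtype column
        for col in result_dataframe:
            if result_dataframe.dtypes[col] == object:
                result_dataframe[col].fillna(np.nan, inplace=True)
        # rebuild the frame to convert the filled columns back to numeric dtypes
        result_dataframe = result_dataframe.from_records(result_dataframe)
        print(datetime.now() - time1, time1 - time0)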

0:00:07.477906 0:00:01.971754

The fillna pass (first timing, about 7.5 s) takes much longer than constructing the DataFrame itself (second timing, about 2 s). This is with 750k rows.

wesm added a commit that referenced this issue Mar 13, 2012
@wesm
Member

wesm commented Mar 13, 2012

Hi, could you give a self-contained example showing what the original tuples look like and how they end up not being converted to a numeric dtype by from_records? I just added a test case (see linked commit) that shows that everything checks out in a simple case. from_records is intended to convert a "SQL column" containing only None and other numeric values into a float64 column. Let me know so I can get this sorted out and close the issue.
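For illustration, the simple case described above might look like the following minimal sketch. The row values and the variable names rows and df are made up for the example and are not taken from the linked test case.

```python
from pandas import DataFrame

# Hypothetical rows as a DB driver might return them: plain ints/floats,
# with None standing in for SQL NULL.
rows = [(1, 2.5), (None, None), (3, 4.0)]

df = DataFrame.from_records(rows)

# Columns holding only numbers and None are expected to come out as float64,
# with None replaced by NaN.
print(df.dtypes)
print(df)
```

Columns that contain other Python objects alongside None would not fall under this simple case.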

@gerigk
Author

gerigk commented Mar 14, 2012

So this is apparently my fault, but for my specific case it would still be nice to have this option:

psycopg2 returns decimal.Decimal values for columns of type "numeric".

        from decimal import Decimal
        from pandas import DataFrame

        x = [(Decimal(10), 5), (None, None)]
        y = DataFrame.from_records(x)

        y
              0    1
        0    10    5
        1  None  NaN

But with the current setup there is no bijective mapping from NULL (SQL) to NaN (pandas) if I have both object and numeric columns. If it were possible to say "convert None to NaN for object columns", using some fast conversion while constructing the DataFrame, that would be helpful in my opinion.

wesm closed this as completed in 39efc7b Mar 15, 2012
@wesm
Member

wesm commented Mar 15, 2012

Hi Arthur, I added an option, coerce_float (in the above commit), that converts Decimal to float and fills None with NaN. Converting Decimal to float is still really slow. This will be part of 0.7.2, to be released soon.
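As a rough sketch of how the option might be used, here is the Decimal example from earlier in the thread with and without coerce_float. It assumes a pandas version that includes the option; the names raw and converted are chosen only for this illustration.

```python
from decimal import Decimal

from pandas import DataFrame

# Same data as the Decimal example earlier in the thread: one Decimal column
# and one integer column, with None standing in for SQL NULL.
x = [(Decimal(10), 5), (None, None)]

# Default behaviour: the Decimal column stays object dtype and keeps None.
raw = DataFrame.from_records(x)

# With coerce_float=True, Decimal values are converted to float and
# None is filled with NaN.
converted = DataFrame.from_records(x, coerce_float=True)

print(raw.dtypes)
print(converted.dtypes)
```

Since the Decimal-to-float conversion is described as slow, it may be worth enabling the option only when the query actually returns "numeric" columns.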

@gerigk
Author

gerigk commented Mar 15, 2012

Thanks a lot!
