I still frequently run into problems with NaN and None.
For example:
I retrieve data from a SQL database (Postgres in my case) and end up with a list of tuples containing None (SQL NULL) values.
I then convert it to a DataFrame using DataFrame.from_records().
Now, to avoid object Series it would be nice to have None automatically mapped to NaN, the same way from_csv does.
Right now I have to do something like df.fillna(np.nan) and then convert the numeric columns back to their original types, which is kind of ugly.
time0 = datetime.now()
result_dataframe = DataFrame.from_records(results, columns=columns)
time1 = datetime.now()
for col in result_dataframe:
    if result_dataframe.dtypes[col] == object:
        result_dataframe[col].fillna(np.nan, inplace=True)
result_dataframe = DataFrame.from_records(result_dataframe)
print datetime.now() - time1, time1 - time0
# 0:00:07.477906 0:00:01.971754
The fillna takes much longer than constructing the DataFrame itself (example with 750k rows).
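A minimal sketch of the situation described above, assuming hypothetical rows shaped like a psycopg2 result set (Decimal for SQL "numeric" columns, None for NULL); the column names are made up for illustration:

```python
from decimal import Decimal

import numpy as np
import pandas as pd

# Hypothetical rows as a driver such as psycopg2 might return them:
# Decimal for a SQL "numeric" column, None for SQL NULL.
rows = [(Decimal("10.5"), "a"), (None, None)]
df = pd.DataFrame.from_records(rows, columns=["amount", "label"])

# Decimal values force an object column, so None survives instead of
# becoming NaN:
print(df["amount"].dtype)  # object

# The workaround described above: cast the numeric column back to float,
# which also turns None into NaN.
df["amount"] = df["amount"].astype(float)
print(df["amount"].dtype)  # float64
```

The cast handles both conversions at once, but as noted it has to walk an object column element by element, which is what makes it slow on large frames.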
Hi, could you give a self-contained example showing the original tuples and how they end up not being converted to a numeric dtype by from_records? I just added a test case (see the linked commit) showing that everything checks out in a simple case: from_records is intended to convert a "SQL column" containing only None and numeric values into a float64 column. Let me know so I can get this sorted out and close the issue.
So this is apparently my fault, but for my specific case it would still be nice to have this option:
psycopg2 returns decimal.Decimal for columns of type "numeric".
from decimal import Decimal
x = [(Decimal(10), 5), (None, None)]
y = DataFrame.from_records(x)
y
      0    1
0    10    5
1  None  NaN
But with the current setup there is no bijective mapping from NULL (SQL) to NaN (pandas) when I have both object and numeric columns. If it were possible to say "convert None to NaN for object columns", using a fast conversion at DataFrame construction time, that would be helpful in my opinion.
Hi Arthur, I added a coerce_float option (in the above commit) that converts Decimal to float and fills None with NaN. Converting Decimal to float is still really slow. This will be part of 0.7.2, to be released soon.
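With that option, the Decimal example from above can be sketched like this (same data, just passing coerce_float=True to from_records):

```python
from decimal import Decimal

import pandas as pd

x = [(Decimal(10), 5), (None, None)]

# coerce_float=True converts non-string, non-datetime objects such as
# Decimal to float, so None ends up as NaN in a float64 column instead
# of surviving in an object column.
y = pd.DataFrame.from_records(x, coerce_float=True)
print(y.dtypes)  # both columns come out as float64
```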