
fast_to_sql alters source data and fails to load NaN #13

Closed
chris-swenson opened this issue Dec 27, 2020 · 8 comments

@chris-swenson

chris-swenson commented Dec 27, 2020

I've been testing out various ways to upload data from Python to SQL Server, and I tried fast_to_sql as well. I noticed two issues. First, fast_to_sql fails on missing values when loading to SQL Server. Second, it alters the source dataframe in place, converting it for use in an INSERT statement (or something similar). Mutating the caller's data should not happen; it would be better practice to make a copy and transform the copy instead. As written, if any further work is needed on the uploaded data, it has to be re-loaded from the source because fast_to_sql has altered it.

from datetime import datetime
import pandas as pd
import numpy as np
import pyodbc
from fast_to_sql import fast_to_sql as fts

# Test Dataframe for insertion
df = pd.DataFrame({
    "Col1": [1, 2, np.NaN],
    "Col2": ["A", "B", "C"],
    "Col3": [True, False, True],
    "Col4": [datetime(2020,1,1),datetime(2020,1,2),datetime(2020,1,3)]
})

# Create a pyodbc connection
conn = pyodbc.connect(
    """
    Driver={ODBC Driver 17 for SQL Server};
    Server=[SERVER];
    Database=[DATABASE];
    Trusted_Connection=yes;
    """
)

# If a table is created, the generated sql is returned
create_statement = fts.fast_to_sql(df, "[TABLE_NAME]", conn, if_exists="replace", temp=False)

# Commit upload actions and close connection
conn.commit()
conn.close()
@jdglaser
Owner

jdglaser commented Jan 7, 2021

Can you elaborate on which values you notice are missing when loading data to SQL Server?

Good point about the function altering the original df. I'll make a change to have it copy the df first before applying any transformations to it.

@chris-swenson
Author

The issue regarding missing values is not about what gets loaded to SQL Server. The code fails entirely when np.NaN is present, loading nothing. The value np.NaN should be converted to a NULL.

Specifically, I get the error "Unknown object type: NAType" when the executemany runs.
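A minimal sketch of a workaround for the NAType error (my own suggestion, not part of fast_to_sql): replace every pandas/numpy missing value with plain None before uploading, since drivers like pyodbc bind None as SQL NULL:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Col1": [1, 2, np.nan],
    "Col2": ["A", "B", "C"],
})

# Convert every missing value (np.nan, pd.NaT, pd.NA) to plain None so a
# DB driver such as pyodbc can bind it as SQL NULL during executemany.
clean = df.astype(object).where(pd.notnull(df), None)

print(clean["Col1"].tolist())
```

The astype(object) step is needed so the integer/float column can hold None at all; without it, pandas would coerce None back to NaN.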

@chris-swenson
Author

Regarding making a copy, a new argument that defaults to copying (e.g., copy=True) would be the most flexible. Running in place as currently written conserves memory, but may not always be ideal or expected. An option to switch between the two would be useful.
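The suggested copy=True argument could be sketched like this (a hypothetical helper for illustration, not fast_to_sql's actual API; the function name and transformation are assumptions):

```python
import pandas as pd

def prepare_for_upload(df: pd.DataFrame, copy: bool = True) -> pd.DataFrame:
    """Work on a copy by default so the caller's dataframe stays untouched;
    copy=False opts back into the memory-saving in-place behavior."""
    work = df.copy() if copy else df
    # ...the in-place transformations for the INSERT statement would go
    # here; stringifying a column stands in for them in this sketch...
    work["Col2"] = work["Col2"].astype(str)
    return work

df = pd.DataFrame({"Col2": [1, 2]})
out = prepare_for_upload(df)
print(df["Col2"].tolist())   # source data is unchanged
print(out["Col2"].tolist())  # only the copy was transformed
```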

@jdglaser
Owner

I've added the ability to copy the pandas dataframe. That will be included in the next release.

However, I was unable to replicate the error you are seeing with the np.NaN values. When I load np.NaN values that are in a dataframe I get nulls properly loaded to the db.

Can you please provide a code snippet or example so I can try to reproduce the error?

@chris-swenson
Author

chris-swenson commented Feb 19, 2021 via email

@Doug-GithubTech

For the new pandas version, we should consider two new data types, "Int64" and "string", to deal with the missing values.

@jdglaser
Owner

> For the new pandas version, we should consider 2 new data types: "Int64" and "string" to deal with the missing values

Int64 should already be supported, although please let me know if you find that it isn't. string is still an experimental type according to the Pandas documentation, so I'm going to hold off on adding support for that for now. I added issue #16 in case someone wants to do that.
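For context, the nullable dtypes being discussed behave like this (a short illustration of pandas itself, not of fast_to_sql):

```python
import pandas as pd

# The nullable "Int64" extension dtype keeps integers as integers and
# represents missing values as pd.NA, instead of silently promoting the
# column to float64 with np.nan as plain int64 would.
s = pd.Series([1, 2, None], dtype="Int64")

print(s.dtype)         # Int64
print(s.isna().sum())  # one missing value

# The "string" extension dtype works the same way for text columns.
t = pd.Series(["A", None], dtype="string")
print(t.isna().tolist())
```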

@jdglaser
Owner

The copy command was added in Release 2.1.14. Closing.


3 participants