
fast_to_sql alters source data and fails to load NaN #13

Closed
chris-swenson opened this issue Dec 27, 2020 · 8 comments

@chris-swenson

chris-swenson commented Dec 27, 2020

I've been testing out various ways to upload data from Python to SQL Server, and I tried fast_to_sql as well. I noticed two issues. First, fast_to_sql fails on missing values when loading to SQL Server. Second, it alters the source dataframe in place, converting it for use in an INSERT statement (or something similar). Mutating the caller's data should not happen; it would be better practice to make a copy and transform the copy instead. As written, if any further work is needed on the uploaded data, it has to be re-loaded from the source because fast_to_sql has altered it.

from datetime import datetime
import pandas as pd
import numpy as np
import pyodbc
from fast_to_sql import fast_to_sql as fts

# Test Dataframe for insertion
df = pd.DataFrame({
    "Col1": [1, 2, np.NaN],
    "Col2": ["A", "B", "C"],
    "Col3": [True, False, True],
    "Col4": [datetime(2020,1,1),datetime(2020,1,2),datetime(2020,1,3)]
})

# Create a pyodbc connection
conn = pyodbc.connect(
    """
    Driver={ODBC Driver 17 for SQL Server};
    Server=[SERVER];
    Database=[DATABASE];
    Trusted_Connection=yes;
    """
)

# If a table is created, the generated sql is returned
create_statement = fts.fast_to_sql(df, "[TABLE_NAME]", conn, if_exists="replace", temp=False)

# Commit upload actions and close connection
conn.commit()
conn.close()
@jdglaser
Owner

jdglaser commented Jan 7, 2021

Can you elaborate on which values you notice are missing when loading data to SQL Server?

Good point about the function altering the original df. I'll make a change to have it copy the df first before applying any transformations to it.

@chris-swenson
Author

The issue regarding missing values is not about what gets loaded to SQL Server. The code fails entirely when np.NaN is present, loading nothing. The value np.NaN should be converted to a NULL.

Specifically, I get the error "Unknown object type: NAType" when the executemany runs.
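A minimal sketch of a workaround for the NAType error (my own suggestion, not part of fast_to_sql): replace every pandas/numpy missing value with plain None before uploading, since drivers like pyodbc bind None as SQL NULL:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Col1": [1, 2, np.nan],
    "Col2": ["A", "B", "C"],
})

# Convert every missing value (np.nan, pd.NaT, pd.NA) to plain None so a
# DB driver such as pyodbc can bind it as SQL NULL during executemany.
clean = df.astype(object).where(pd.notnull(df), None)

print(clean["Col1"].tolist())
```

The astype(object) step is needed so the integer/float column can hold None at all; without it, pandas would coerce None back to NaN.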

@chris-swenson
Author

Regarding making a copy, a new argument that defaults to copying (e.g., copy=True) would be the most flexible. Running in place as currently written conserves memory, but may not always be ideal or expected. An option to switch between the two would be useful.
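The suggested copy=True argument could be sketched like this (a hypothetical helper for illustration, not fast_to_sql's actual API; the function name and transformation are assumptions):

```python
import pandas as pd

def prepare_for_upload(df: pd.DataFrame, copy: bool = True) -> pd.DataFrame:
    """Work on a copy by default so the caller's dataframe stays untouched;
    copy=False opts back into the memory-saving in-place behavior."""
    work = df.copy() if copy else df
    # ...the in-place transformations for the INSERT statement would go
    # here; stringifying a column stands in for them in this sketch...
    work["Col2"] = work["Col2"].astype(str)
    return work

df = pd.DataFrame({"Col2": [1, 2]})
out = prepare_for_upload(df)
print(df["Col2"].tolist())   # source data is unchanged
print(out["Col2"].tolist())  # only the copy was transformed
```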

@jdglaser
Owner

I've added the ability to copy the pandas dataframe. That will be included in the next release.

However, I was unable to replicate the error you are seeing with the np.NaN values. When I load np.NaN values that are in a dataframe I get nulls properly loaded to the db.

Can you please provide a code snippet or example so I can try to reproduce the error?

@chris-swenson
Author

chris-swenson commented Feb 19, 2021 via email

@Doug-GithubTech

For the new pandas version, we should consider two new data types, "Int64" and "string", to deal with the missing values.

@jdglaser
Owner

> For the new pandas version, we should consider 2 new data types: "Int64" and "string" to deal with the missing values

Int64 should already be supported, although please let me know if you find that it isn't. string is still an experimental type according to the Pandas documentation, so I'm going to hold off on adding support for that for now. I added issue #16 in case someone wants to do that.
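For context, the nullable dtypes being discussed behave like this (a short illustration of pandas itself, not of fast_to_sql):

```python
import pandas as pd

# The nullable "Int64" extension dtype keeps integers as integers and
# represents missing values as pd.NA, instead of silently promoting the
# column to float64 with np.nan as plain int64 would.
s = pd.Series([1, 2, None], dtype="Int64")

print(s.dtype)         # Int64
print(s.isna().sum())  # one missing value

# The "string" extension dtype works the same way for text columns.
t = pd.Series(["A", None], dtype="string")
print(t.isna().tolist())
```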

@jdglaser
Owner

The copy command was added in Release 2.1.14. Closing.


3 participants