Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Pandas.to_sql has the following issues #52488

Closed
1 task done
majormj opened this issue Apr 6, 2023 · 4 comments
Closed
1 task done

DOC: Pandas.to_sql has the following issues #52488

majormj opened this issue Apr 6, 2023 · 4 comments
Labels
IO SQL to_sql, read_sql, read_sql_query Needs Info Clarification about behavior needed to assess issue

Comments

@majormj
Copy link

majormj commented Apr 6, 2023

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

Pandas.to_sql

Documentation problem

1、Unable to successfully write data when MySQL table has primary key

2、Can you add write methods for "updata" and "Insert IGNORE" so that when writing to a MySQL data table with a primary key, it can successfully overwrite the updated data or incrementally write the data

3、When the fields in the data table do not exist in the MySQL table, can the corresponding fields be automatically added to the MySQL table and the data be written to the MySQL table

Suggested fix for documentation

1、Unable to successfully write data when MySQL table has primary key

2、Can you add write methods for "updata" and "Insert IGNORE" so that when writing to a MySQL data table with a primary key, it can successfully overwrite the updated data or incrementally write the data

3、When the fields in the data table do not exist in the MySQL table, can the corresponding fields be automatically added to the MySQL table and the data be written to the MySQL table

@majormj majormj added Docs Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 6, 2023
@phofl
Copy link
Member

phofl commented Apr 6, 2023

Please add reproducible examples

@phofl phofl added IO SQL to_sql, read_sql, read_sql_query Needs Info Clarification about behavior needed to assess issue and removed Docs Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 6, 2023
@majormj
Copy link
Author

majormj commented Apr 7, 2023

`import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy import text

def mysqlConnetNew(db):
HOSTNAME = 'mysql.xxxx.cn'
PORT = '22484'
DATABASE = db
USERNAME = 'root'
PASSWORD = '**'
DB_URI = 'mysql+pymysql://{}:{}@{}:{}/{}'.format(USERNAME,PASSWORD,HOSTNAME,PORT,DATABASE)
try:
engine = create_engine(DB_URI,
echo=False ,
pool_size=100,
pool_recycle=3600,
pool_pre_ping=True,
)
return engine
except Exception as error:
print(repr(error))

def run_sql_query(engine,query_sql_str):
"""
run sql query
"""
with engine.begin() as cnn:
result = cnn.execute(query_sql_str)
return result

data1 = [
{'column1':'content_a_1','column2':'content_b_1'},
{'column1':'content_a_2','column2':'content_b_2'}
]
df1 = pd.DataFrame(data1)

data2 = [
{'column1':'content_a_1','column2':'content_b_1'},
{'column1':'content_a_3','column2':'content_b_3'}
]
df2 = pd.DataFrame(data2)

data3 = [
{'column1':'content_a_11','column2':'content_b_01'},
{'column1':'content_a_12','column2':'content_b_02'}
]
df3 = pd.DataFrame(data2)

Link MySQL

engine = mysqlConnetNew('TB_HPXX')

CREATE TABLE

strsql = "CREATE TABLE if not exists test_table (column1 VARCHAR(50) PRIMARY KEY,column2 VARCHAR(30) )"
run_sql_query(engine,text(strsql))

run test

df1.to_sql(name = 'test_table',con=engine,if_exists='append',index=False)
df2.to_sql(name = 'test_table',con=engine,if_exists='append',index=False)
"""
sqlalchemy.exc.IntegrityError: (pymysql.err.IntegrityError) (1062, "Duplicate entry 'content_a_1' for key 'PRIMARY'")
[SQL: INSERT INTO test_table (column1, column2) VALUES (%(column1)s, %(column2)s)]
[parameters: [{'column1': 'content_a_1', 'column2': 'content_b_1'}, {'column1': 'content_a_3', 'column2': 'content_b_3'}]]
(Background on this error at: https://sqlalche.me/e/20/gkpj)
PS Microsoft.PowerShell.Core\FileSystem::\GX\work$\Tools\vsCode\pip_packge\strmysql>
"""`

@gy-mate
Copy link

gy-mate commented Dec 31, 2023

If I understand it correctly this issue is rather an ENH: than DOC:. I guess @majormj wants a parameter to be added to DataFrame.to_sql() which would enable adding the IGNORE modifier to the INSERT SQL command the function uses—so e.g. duplicate-key errors could be ignored.

But this feature request was already dismissed in #15988 so this one may be closed as well.

@gy-mate
Copy link

gy-mate commented Feb 13, 2024

@phofl I think this issue might be closed due to the reasons above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query Needs Info Clarification about behavior needed to assess issue
Projects
None yet
Development

No branches or pull requests

4 participants