Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: .to_sql(method=mysql_update_on_dupe_key) fails because of change from 1.3.1 to 1.4.2. #46891

Closed
3 tasks done
ericbmoreira opened this issue Apr 28, 2022 · 4 comments · Fixed by #47474
Closed
3 tasks done
Labels
Bug IO SQL to_sql, read_sql, read_sql_query Needs Triage Issue that has not been reviewed by a pandas team member
Milestone

Comments

@ericbmoreira
Copy link

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import sqlalchemy
import datetime as dt

# custom modules
import muni_config

def get_df():
    df = pd.DataFrame(
        [{
            'fills_id': 105802017,
            'trade_date': dt.datetime(2022,4,28,9,41,14),
            'tsy_id': 'CT5',
            'avg_prc': 99.545089,
            'yr_string': 'five',
            'ytw': 2.848337
        }]
    )

    return df

def write_fills_tsy_spots(df: pd.DataFrame):
    write_data = df[['fills_id', 'tsy_id', 'avg_prc', 'ytw', 'yr_string']]

    with muni_config.new_get_db_conn(muni_config.muni_write) as conn:
        write_data.to_sql(
            'fills_tsy_spots',
            conn,
            if_exists='append',
            index=False,
            method=mysql_update_on_dupe_key
        )

    return df

def mysql_update_on_dupe_key(table, conn, keys, data_iter):
    data = [dict(zip(keys, val)) for val in data_iter]
    stmt = sqlalchemy.dialects.mysql.insert(table.table).values(data)
    update_stmt = stmt.on_duplicate_key_update(**dict(zip(stmt.inserted.keys(),
        stmt.inserted.values())))
    conn.execute(update_stmt)

    return mysql_update_on_dupe_key

if __name__ == '__main__':
    df = get_df()
    write_fills_tsy_spots(df)

Issue Description

df.to_sql fails when a method is passed for updating on duplicate keys. New code was added to check the total_inserted against the num_inserted between 1.3.1 and 1.4.2. I suppose I should have reproducible code that would show a sqllite table creation, etc.. Hopefully this is enough code to understand my bug report.

A TypeError is returned:

File "C:\Users\eric.moreira\Miniconda3\envs\mquant\lib\site-packages\pandas\core\generic.py", line 2951, in to_sql
    return sql.to_sql(
  File "C:\Users\eric.moreira\Miniconda3\envs\mquant\lib\site-packages\pandas\io\sql.py", line 697, in to_sql
    return pandas_sql.to_sql(
  File "C:\Users\eric.moreira\Miniconda3\envs\mquant\lib\site-packages\pandas\io\sql.py", line 1739, in to_sql
    total_inserted = sql_engine.insert_records(
  File "C:\Users\eric.moreira\Miniconda3\envs\mquant\lib\site-packages\pandas\io\sql.py", line 1322, in insert_records
    return table.insert(chunksize=chunksize, method=method)
  File "C:\Users\eric.moreira\Miniconda3\envs\mquant\lib\site-packages\pandas\io\sql.py", line 954, in insert
    total_inserted += num_inserted
TypeError: unsupported operand type(s) for +=: 'int' and 'function'

Because this code block at line 951 in sql.py.

                if num_inserted is None:
                    total_inserted = None
                else:
                    total_inserted += num_inserted

If I change it to the following, it works.:

                if num_inserted is None or callable(num_inserted):
                    total_inserted = None
                else:
                    total_inserted += num_inserted

Expected Behavior

Not raise a TypeError when a method is passed.

Change line 951 in sql.py.
if num_inserted is None or callable(num_inserted):

Installed Versions

Working pd.show_versions:

INSTALLED VERSIONS ------------------ commit : c7f7443 python : 3.9.6.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.18362 machine : AMD64 processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel byteorder : little LC_ALL : None LANG : en_US.UTF-8 LOCALE : English_United States.1252

pandas : 1.3.1
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.2
setuptools : 52.0.0.post20210125
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : 1.0.2
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : 1.3.2
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 4.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.4.22
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

Not working pd.show_versions():
Failed:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\eric.moreira\Miniconda3\envs\mquant\lib\site-packages\pandas\util\_print_versions.py", line 109, in show_versions
    deps = _get_dependency_info()
  File "C:\Users\eric.moreira\Miniconda3\envs\mquant\lib\site-packages\pandas\util\_print_versions.py", line 88, in _get_dependency_info
    mod = import_optional_dependency(modname, errors="ignore")
  File "C:\Users\eric.moreira\Miniconda3\envs\mquant\lib\site-packages\pandas\compat\_optional.py", line 138, in import_optional_dependency
    module = importlib.import_module(name)
  File "C:\Users\eric.moreira\Miniconda3\envs\mquant\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 790, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "C:\Users\eric.moreira\Miniconda3\envs\mquant\lib\site-packages\setuptools\__init__.py", line 8, in <module>
    import _distutils_hack.override  # noqa: F401
  File "C:\Users\eric.moreira\Miniconda3\envs\mquant\lib\site-packages\_distutils_hack\override.py", line 1, in <module>
    __import__('_distutils_hack').do_override()
  File "C:\Users\eric.moreira\Miniconda3\envs\mquant\lib\site-packages\_distutils_hack\__init__.py", line 72, in do_override
    ensure_local_distutils()
  File "C:\Users\eric.moreira\Miniconda3\envs\mquant\lib\site-packages\_distutils_hack\__init__.py", line 59, in ensure_local_distutils
    assert '_distutils' in core.__file__, core.__file__
AssertionError: C:\Users\eric.moreira\Miniconda3\envs\mquant\lib\distutils\core.py
conda list
apscheduler               3.9.1            py39hcbf5309_0    conda-forge
blas                      1.0                         mkl    anaconda
blpapi                    3.17.1          py39_blpapicpp3.16.1.1_0    conda-forge
boa-lib                   1.0.26                   pypi_0    pypi
ca-certificates           2021.10.8            h5b45459_0    conda-forge
certifi                   2021.10.8        py39hcbf5309_2    conda-forge
charset-normalizer        2.0.12                   pypi_0    pypi
et_xmlfile                1.1.0            py39haa95532_0    anaconda
greenlet                  1.1.2            py39h415ef7b_2    conda-forge
icc_rt                    2019.0.0             h0cc432a_1    anaconda
idna                      3.3                      pypi_0    pypi
intel-openmp              2021.4.0          haa95532_3556    anaconda
mkl                       2021.4.0           haa95532_640    anaconda
mkl-service               2.4.0            py39h2bbff1b_0    anaconda
mkl_fft                   1.3.1            py39h277e83a_0    anaconda
mkl_random                1.2.2            py39hf11a4ad_0    anaconda
numpy                     1.21.5           py39h7a0a035_1    anaconda
numpy-base                1.21.5           py39hca35cd5_1    anaconda
openpyxl                  3.0.9              pyhd3eb1b0_0    anaconda
openssl                   1.1.1n               h8ffe710_0    conda-forge
pandas                    1.4.2            py39h2e25243_1    conda-forge
pip                       21.2.4           py39haa95532_0
pymysql                   1.0.2              pyhd8ed1ab_0    conda-forge
pyodbc                    4.0.32           py39h415ef7b_1    conda-forge
pyside6                   6.3.0                    pypi_0    pypi
pyside6-addons            6.3.0                    pypi_0    pypi
pyside6-essentials        6.3.0                    pypi_0    pypi
python                    3.9.2                h6244533_0
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.9                      2_cp39    conda-forge
pytz                      2022.1             pyhd8ed1ab_0    conda-forge
pywin32                   303              py39hb82d6ee_0    conda-forge
quantlib                  1.26                     pypi_0    pypi
requests                  2.27.1                   pypi_0    pypi
scipy                     1.7.3            py39h0a974cb_0    anaconda
setuptools                61.2.0           py39haa95532_0
shiboken6                 6.3.0                    pypi_0    pypi
six                       1.16.0             pyh6c4a22f_0    conda-forge
sqlalchemy                1.4.36           py39hb82d6ee_0    conda-forge
sqlite                    3.38.2               h2bbff1b_0
tbb                       2021.5.0             h2d74725_1    conda-forge
tzdata                    2022a                hda174b7_0
tzlocal                   2.1                pyh9f0ad1d_0    conda-forge
urllib3                   1.26.9                   pypi_0    pypi
vc                        14.2                 h21ff451_1
vs2015_runtime            14.27.29016          h5e58377_2
wheel                     0.37.1             pyhd3eb1b0_0
wincertstore              0.2              py39haa95532_2
xlwings                   0.27.6           py39hcbf5309_0    conda-forge
@ericbmoreira ericbmoreira added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 28, 2022
@simonjayhawkins simonjayhawkins added the IO SQL to_sql, read_sql, read_sql_query label May 4, 2022
@ericbmoreira ericbmoreira changed the title BUG: BUG: .to_sql(method=mysql_update_on_dupe_key) fails because of change from 1.3.1 to 1.4.2. May 5, 2022
@simonjayhawkins simonjayhawkins added this to the 1.4.3 milestone May 18, 2022
@simonjayhawkins
Copy link
Member

moving to 1.4.4

@simonjayhawkins simonjayhawkins modified the milestones: 1.4.3, 1.4.4 Jun 22, 2022
@jorisvandenbossche
Copy link
Member

This was probably caused by #45137 cc @mroeschke

@jorisvandenbossche
Copy link
Member

@ericbmoreira is there a reason that your insert function mysql_update_on_dupe_key returns itself (the function object)? I think that if it does not return anything, it should still work?

@ericbmoreira
Copy link
Author

@jorisvandenbossche , no there was not a good reason it was returning itself. It was because I did not fully understand how it worked, and got the code from somewhere else. I just tested the same code above, except without the return, and it works with no errors on 1.4.2. It also works on 1.3.1. This is my first bug report, so please let me know if you would like me to close it or if you need anything else.

Thank you very much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO SQL to_sql, read_sql, read_sql_query Needs Triage Issue that has not been reviewed by a pandas team member
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants