enable multivalues insert #19664

danfrankj · 2018-02-12T19:47:31Z

Summary

Currently, when writing a DataFrame to a database, rows are inserted one by one. This change enables multi-row ("multivalues") inserts.
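For readers unfamiliar with the distinction, here is a minimal sketch that uses Python's built-in sqlite3 module instead of SQLAlchemy (an assumption made so the example is self-contained). The row-by-row path runs the same one-row INSERT once per row via executemany. The multivalues path sends a single INSERT ... VALUES (...), (...) statement, which SQLite accepts from version 3.7.11 onward. The table name demo and the helper multivalues_insert_sql are hypothetical.

```python
import sqlite3

rows = [(1, 0.5), (2, -1.25), (3, 3.0)]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE demo ("A" INTEGER, "B" REAL)')

# Row-by-row: one single-row INSERT, executed once per parameter tuple.
conn.executemany('INSERT INTO demo ("A", "B") VALUES (?, ?)', rows)


def multivalues_insert_sql(table, columns, nrows):
    """Build one INSERT statement with ``nrows`` VALUES groups of ``?`` placeholders."""
    cols = ", ".join('"{}"'.format(c) for c in columns)
    group = "(" + ", ".join("?" for _ in columns) + ")"
    values = ", ".join(group for _ in range(nrows))
    return 'INSERT INTO "{}" ({}) VALUES {}'.format(table, cols, values)


# Multivalues: a single statement whose parameters are the rows flattened in order.
sql = multivalues_insert_sql("demo", ["A", "B"], len(rows))
params = [value for row in rows for value in row]
conn.execute(sql, params)

print(sql)
# INSERT INTO "demo" ("A", "B") VALUES (?, ?), (?, ?), (?, ?)
print(conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0])
# 6
```

Both paths insert the same three rows; the difference is one database round trip per row versus one per statement.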

TODO

release note
address chunksize behavior

Reference

http://docs.sqlalchemy.org/en/rel_0_9/core/dml.html?highlight=insert%20values#sqlalchemy.sql.expression.Insert.values

TomAugspurger · 2018-02-12T19:52:14Z

xref #14315 and #8953

I think the previous issues were with SQLAlchemy dialects that don't support multi-row inserts, so we'll need to test that.

Still needed: docs, a release note, and tests.

danfrankj · 2018-02-12T19:54:47Z

For reference, I'm using a SQLAlchemy dialect that sets supports_multivalues_insert, and inserts still happen row by row. Will add tests though.

codecov · 2018-02-13T00:46:31Z

Codecov Report

❗ No coverage uploaded for pull request base (master@f33e84c).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #19664   +/-   ##
=========================================
  Coverage          ?   91.71%           
=========================================
  Files             ?      150           
  Lines             ?    49104           
  Branches          ?        0           
=========================================
  Hits              ?    45035           
  Misses            ?     4069           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`90.09% <ø> (?)`
#single	`41.87% <ø> (?)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f33e84c...f298de1.

TomAugspurger · 2018-02-13T11:39:52Z

Ideally, in the tests we'll be able to introspect the sqlalchemy engine somehow to assert that multi-row inserts are actually happening.
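One driver-agnostic way to make that assertion is to wrap the DBAPI connection in a small recording proxy and count the INSERT executions it receives. The sketch below is a hypothetical test helper (RecordingConnection and count_inserts are not part of pandas or SQLAlchemy) built on the standard-library sqlite3 module. Against a real SQLAlchemy engine, listening for the before_cursor_execute engine event would be the more natural hook.

```python
import sqlite3


class RecordingConnection:
    """Hypothetical test helper: forwards to a sqlite3 connection and logs each call."""

    def __init__(self, conn):
        self._conn = conn
        self.calls = []  # (method name, SQL text, number of parameter sets)

    def execute(self, sql, params=()):
        self.calls.append(("execute", sql, 1))
        return self._conn.execute(sql, params)

    def executemany(self, sql, seq_of_params):
        seq_of_params = list(seq_of_params)
        self.calls.append(("executemany", sql, len(seq_of_params)))
        return self._conn.executemany(sql, seq_of_params)


def count_inserts(calls):
    """Count INSERT executions; an executemany runs once per parameter set."""
    return sum(n for _, sql, n in calls if sql.lstrip().upper().startswith("INSERT"))


rows = [(1, 2), (3, 4), (5, 6)]

# Row-by-row path.
row_conn = RecordingConnection(sqlite3.connect(":memory:"))
row_conn.execute("CREATE TABLE t (a, b)")
row_conn.executemany("INSERT INTO t (a, b) VALUES (?, ?)", rows)

# Multivalues path: one statement carrying all three rows.
multi_conn = RecordingConnection(sqlite3.connect(":memory:"))
multi_conn.execute("CREATE TABLE t (a, b)")
multi_conn.execute(
    "INSERT INTO t (a, b) VALUES (?, ?), (?, ?), (?, ?)",
    [v for row in rows for v in row],
)

assert count_inserts(row_conn.calls) == 3
assert count_inserts(multi_conn.calls) == 1
```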

jreback

this needs a test

pep8speaks · 2018-02-18T07:54:02Z

Hello @danfrankj! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 07, 2018 at 21:54 Hours UTC

danfrankj · 2018-02-18T16:51:00Z

@jreback @TomAugspurger added first stab at a test. Let me know what you think!

TomAugspurger

Test seems good, thanks.

How does this interact with chunksize? Can we say that you'll have len(df) // chunksize inserts? Is there any way to test that?

Also need a release note.
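On the chunksize question: assuming each chunk is sent as one multi-row INSERT, the statement count is the ceiling of len(df) / chunksize rather than the floor, because a final partial chunk still needs its own statement. A minimal sketch of that arithmetic (the helper names are hypothetical):

```python
def chunk_bounds(nrows, chunksize):
    """Yield (start, stop) row offsets for each chunk, like slicing df[start:stop]."""
    for start in range(0, nrows, chunksize):
        yield start, min(start + chunksize, nrows)


def n_insert_statements(nrows, chunksize):
    """Number of multi-row INSERTs if every chunk is sent as one statement."""
    return -(-nrows // chunksize)  # ceiling division without floats


print(list(chunk_bounds(1000, 300)))
# [(0, 300), (300, 600), (600, 900), (900, 1000)]
print(n_insert_statements(1000, 300), 1000 // 300)
# 4 3
```

So len(df) // chunksize undercounts by one whenever len(df) is not a multiple of chunksize.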

TomAugspurger · 2018-02-18T17:00:17Z

pandas/tests/io/test_sql.py

        res2 = self.pandasSQL.read_query('SELECT * FROM test_trans')
        assert len(res2) == 1

+    def _test_insert_multivalues(self):


Add a comment with the GitHub issue number.

And since this is SQLAlchemy-specific, could you just define the test directly on _TestSQLAlchemy?

Added the GitHub issue numbers as comments and moved the test.

jreback · 2018-02-18T17:58:35Z

pandas/io/sql.py


-    def insert_statement(self):
-        return self.table.insert()
+    def insert_statement(self, data, conn):


Can you add a docstring here?

Could you add a parameters and returns section as well. http://numpydoc.readthedocs.io/en/latest/format.html
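A sketch of what a numpydoc docstring with Parameters and Returns sections might look like, written against a standalone version of insert_statement that mirrors the dispatch logic in the later diff of this method (the table is passed explicitly instead of via self). FakeTable and the SimpleNamespace connections are hypothetical stand-ins so the example runs without SQLAlchemy.

```python
from types import SimpleNamespace


def insert_statement(table, data, conn):
    """
    Build the positional arguments for ``conn.execute`` to insert ``data``.

    Parameters
    ----------
    table : sqlalchemy.Table
        Target table; only its ``insert()`` method is used.
    data : list of dict
        One dict per row, mapping column names to values.
    conn : SQLAlchemy Connection or Engine
        Its ``dialect`` decides which insert style is used.

    Returns
    -------
    tuple
        ``(table.insert(data),)`` for a single multi-row INSERT when the
        dialect sets ``supports_multivalues_insert``, otherwise
        ``(table.insert(), data)`` for a DBAPI ``executemany``.
    """
    dialect = getattr(conn, "dialect", None)
    if dialect is not None and getattr(dialect, "supports_multivalues_insert", False):
        return (table.insert(data),)
    return (table.insert(), data)


class FakeTable:
    """Hypothetical stand-in for sqlalchemy.Table: records what insert() received."""

    def insert(self, values=None):
        return ("INSERT", values)


rows = [{"A": 1}, {"A": 2}]
multi = SimpleNamespace(dialect=SimpleNamespace(supports_multivalues_insert=True))
single = SimpleNamespace(dialect=SimpleNamespace(supports_multivalues_insert=False))

print(insert_statement(FakeTable(), rows, multi))
# (('INSERT', [{'A': 1}, {'A': 2}]),)
print(insert_statement(FakeTable(), rows, single))
# (('INSERT', None), [{'A': 1}, {'A': 2}])
```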

jreback · 2018-02-18T17:58:41Z

pandas/io/sql.py

    def _execute_insert(self, conn, keys, data_iter):
        data = [{k: v for k, v in zip(keys, row)} for row in data_iter]
-        conn.execute(self.insert_statement(), data)
+        conn.execute(*self.insert_statement(data, conn))


here as well
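Along the same lines, a documented standalone version of _execute_insert could look like the sketch below. To stay runnable it targets the standard-library sqlite3 module directly (an assumption; the pandas method goes through SQLAlchemy) and covers only the row-by-row path, using named :column placeholders so each row dict binds by key. The function and table names are hypothetical.

```python
import sqlite3


def execute_insert(conn, table, keys, data_iter):
    """
    Insert rows into ``table`` one parameter set at a time.

    Parameters
    ----------
    conn : sqlite3.Connection
        Open DBAPI connection.
    table : str
        Target table name.
    keys : list of str
        Column names, in the same order as the values in each row.
    data_iter : iterable of tuple
        Row values; consumed once.

    Returns
    -------
    int
        Number of rows passed to ``executemany``.
    """
    # Same shape as the pandas code: one {column: value} dict per row.
    data = [dict(zip(keys, row)) for row in data_iter]
    cols = ", ".join('"{}"'.format(k) for k in keys)
    placeholders = ", ".join(":" + k for k in keys)  # assumes keys are valid identifiers
    conn.executemany(
        'INSERT INTO "{}" ({}) VALUES ({})'.format(table, cols, placeholders), data
    )
    return len(data)


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE demo ("A", "B")')
n = execute_insert(conn, "demo", ["A", "B"], iter([(1, "x"), (2, "y")]))
print(n, conn.execute('SELECT * FROM demo ORDER BY "A"').fetchall())
# 2 [(1, 'x'), (2, 'y')]
```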

jreback

need a whatsnew note that lists the backends where this would work.

jreback · 2018-02-18T18:00:41Z

pandas/tests/io/test_sql.py


        tm.assert_frame_equal(df, expected)

+    def test_insert_multivalues(self):


Can you explicitly test which backends support this?

@jreback I believe I've done this by adding a class variable to the classes below. Let me know if that addresses this concern.

danfrankj · 2018-02-23T18:28:57Z

doc/source/whatsnew/v0.23.0.txt


 - ``IntervalIndex.astype`` now supports conversions between subtypes when passed an ``IntervalDtype`` (:issue:`19197`)
 - :class:`IntervalIndex` and its associated constructor methods (``from_arrays``, ``from_breaks``, ``from_tuples``) have gained a ``dtype`` parameter (:issue:`19262`)
+- :func:`pd.io.sql.to_sql` now performs a multivalue insert if the underlying connection supports this rather than inserting row by row (:issue:`14315`, :issue: `8953`)


Not sure whether this should be an enhancement or a bug fix.

danfrankj · 2018-02-24T17:50:38Z

@jreback @TomAugspurger I believe I've addressed your concerns above, when you get a chance PTAL

TomAugspurger

Just some minor comments. Looks good overall.

Can you run a quick benchmark to see how things look compared to master? How much faster are things?

TomAugspurger · 2018-02-28T03:33:48Z

pandas/io/sql.py


-    def insert_statement(self):
-        return self.table.insert()
+    def insert_statement(self, data, conn):


Could you add a parameters and returns section as well. http://numpydoc.readthedocs.io/en/latest/format.html

TomAugspurger · 2018-02-28T03:33:59Z

pandas/io/sql.py

+        dialect = getattr(conn, 'dialect', None)
+        if dialect and getattr(dialect, 'supports_multivalues_insert', False):
+            return (self.table.insert(data),)
+        return (self.table.insert(), data)


Shouldn't need the parentheses here.
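Presumably the point is that the trailing comma, not the parentheses, is what makes a one-element tuple in Python, so both return forms hand conn.execute(*args) the same arguments. A tiny illustration with hypothetical function names:

```python
def with_parens(stmt):
    return (stmt,)


def without_parens(stmt):
    return stmt,  # the trailing comma alone builds the 1-tuple


def without_comma(stmt):
    return (stmt)  # parentheses alone do NOT make a tuple


print(with_parens("INSERT"), without_parens("INSERT"), without_comma("INSERT"))
# ('INSERT',) ('INSERT',) INSERT
```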

TomAugspurger · 2018-02-28T03:36:03Z

doc/source/whatsnew/v0.23.0.txt

 - Added :func:`SeriesGroupBy.is_monotonic_increasing` and :func:`SeriesGroupBy.is_monotonic_decreasing` (:issue:`17015`)
 - :func:`DataFrame.from_dict` now accepts a ``columns`` argument that can be used to specify the column names when ``orient='index'`` is used (:issue:`18529`)
+- :func:`pd.io.sql.to_sql` now performs a multivalue insert if the underlying connection supports this rather than inserting row by row.
+  SQL dialects supporting multivalue inserts include mysql, postgresql, sqlite and any dialect with `supports_multivalues_insert`. (:issue:`14315`, :issue:`8953`)


"dialect" -> "SQLAlchemy dialect"

TomAugspurger · 2018-02-28T03:36:50Z

doc/source/whatsnew/v0.23.0.txt

 - :class:`IntervalIndex` and its associated constructor methods (``from_arrays``, ``from_breaks``, ``from_tuples``) have gained a ``dtype`` parameter (:issue:`19262`)
 - Added :func:`SeriesGroupBy.is_monotonic_increasing` and :func:`SeriesGroupBy.is_monotonic_decreasing` (:issue:`17015`)
 - :func:`DataFrame.from_dict` now accepts a ``columns`` argument that can be used to specify the column names when ``orient='index'`` is used (:issue:`18529`)
+- :func:`pd.io.sql.to_sql` now performs a multivalue insert if the underlying connection supports this rather than inserting row by row.


I don't know whether io.sql.to_sql is part of the API list. Better to just use

:meth:`DataFrame.to_sql`

danfrankj · 2018-03-02T20:31:04Z

Some profiling

Presto


In [5]: df = pd.DataFrame(np.random.randn(1000, 4), columns=list('ABCD'))

In [6]: %time df.to_sql('multi_insert_profile', presto_engine, schema='dan_frank', index=False)
INFO:pyhive.presto:SHOW COLUMNS FROM "dan_frank"."multi_insert_profile"
INFO:pyhive.presto:
CREATE TABLE "dan_frank"."multi_insert_profile" (
    "A" DOUBLE,
    "B" DOUBLE,
    "C" DOUBLE,
    "D" DOUBLE
)


INFO:pyhive.presto:INSERT INTO "dan_frank"."multi_insert_profile" ("A", "B", "C", "D") VALUES (-0.4425108530531213, -0.4582021047086419, 0.17242001384630398, -1.2917653645626361), (-0.9715964127007015, -0.1458055798883143, 0.3444250700373072, -0.35869901840257923), (0.6732070093449385, 0.3371601918897362, -0.49645678476330574, -0.8241023338536242), (-0.4845513289740901, -1.4860936235542728, 0.19123940403655423, -0.32166319533058985), (2.72221337305179, 0.31572155167450705, -0.5522159042533455, -0.28023622560479866), (-2.2406710854261345, 0.8005522925313067, -0.5762370339886204, 1.1784968768877826), (-0.06826129801094293, 0.2760723638718846, 0.526970720133034, 
... LOG TRUNCATED


CPU times: user 131 ms, sys: 9.06 ms, total: 140 ms
Wall time: 13.1 s



In [7]: presto_engine.dialect.supports_multivalues_insert = False

In [8]: %time df.to_sql('sequential_insert_profile', presto_engine, schema='dan_frank', index=False)
INFO:pyhive.presto:SHOW COLUMNS FROM "dan_frank"."sequential_insert_profile"
INFO:pyhive.presto:
CREATE TABLE "dan_frank"."sequential_insert_profile" (
    "A" DOUBLE,
    "B" DOUBLE,
    "C" DOUBLE,
    "D" DOUBLE
)


INFO:pyhive.presto:INSERT INTO "dan_frank"."sequential_insert_profile" ("A", "B", "C", "D") VALUES (-0.4425108530531213, -0.4582021047086419, 0.17242001384630398, -1.2917653645626361)
INFO:pyhive.presto:INSERT INTO "dan_frank"."sequential_insert_profile" ("A", "B", "C", "D") VALUES (-0.9715964127007015, -0.1458055798883143, 0.3444250700373072, -0.35869901840257923)
INFO:pyhive.presto:INSERT INTO "dan_frank"."sequential_insert_profile" ("A", "B", "C", "D") VALUES (0.6732070093449385, 0.3371601918897362, -0.49645678476330574, -0.8241023338536242)
INFO:pyhive.presto:INSERT INTO "dan_frank"."sequential_insert_profile" ("A", "B", "C", "D") VALUES (-0.4845513289740901, -1.4860936235542728, 0.19123940403655423, -0.32166319533058985)
INFO:pyhive.presto:INSERT INTO "dan_frank"."sequential_insert_profile" ("A", "B", "C", "D") VALUES (2.72221337305179, 0.31572155167450705, -0.5522159042533455, -0.28023622560479866)
INFO:pyhive.presto:INSERT INTO "dan_frank"."sequential_insert_profile" ("A", "B", "C", "D") VALUES (-2.2406710854261345, 0.8005522925313067, -0.5762370339886204, 1.1784968768877826)
... LOG TRUNCATED 


CPU times: user 15.7 s, sys: 1.44 s, total: 17.1 s
Wall time: 14min 57s
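Putting the two Presto runs above side by side (1000 rows × 4 columns; the numbers are copied from the %time output, nothing is re-measured):

```python
# Timings copied from the %time output above.
multi_wall_s = 13.1            # multi-row insert
seq_wall_s = 14 * 60 + 57      # row-by-row insert: 14min 57s = 897 s
multi_cpu_s = 0.140            # multi-row total CPU: 140 ms
seq_cpu_s = 17.1               # row-by-row total CPU

print("wall-clock speedup: {:.0f}x".format(seq_wall_s / multi_wall_s))
# wall-clock speedup: 68x
print("CPU speedup: {:.0f}x".format(seq_cpu_s / multi_cpu_s))
# CPU speedup: 122x
```

This is a single run against one cluster, so treat the ratios as indicative rather than a benchmark.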

MySQL

Comparable insert times with and without multi-row inserts.

danfrankj · 2018-03-05T19:24:34Z

@TomAugspurger did some brief profiling and added additional documentation. Anything else you think is needed for this PR?

TomAugspurger · 2018-03-05T23:15:54Z

@jorisvandenbossche any thoughts? This seems harmless to me, and the results are... impressive :)

jreback · 2018-03-07T14:11:07Z

doc/source/whatsnew/v0.23.0.txt

 - :meth:`Timestamp.month_name`, :meth:`DatetimeIndex.month_name`, and :meth:`Series.dt.month_name` are now available (:issue:`12805`)
 - :meth:`Timestamp.day_name` and :meth:`DatetimeIndex.day_name` are now available to return day names with a specified locale (:issue:`12806`)
+- :meth:`DataFrame.to_sql` now performs a multivalue insert if the underlying connection supports this rather than inserting row by row.
+  SQLAlchemy dialects supporting multivalue inserts include mysql, postgresql, sqlite and any dialect with `supports_multivalues_insert`. (:issue:`14315`, :issue:`8953`)


Use double backticks. Can you also add a 'note' in io.rst about this?

danfrankj · 2018-03-07T18:03:58Z

@jreback backquotes fixed and note added to io.rst

jreback · 2018-03-07T21:54:49Z

thanks!

danfrankj · 2018-03-07T22:03:26Z

Thank you guys for the reviews! Excited for my first pandas contribution :)

@TomAugspurger

commit 7c7bd56 Author: Daniel Frank <danfrankj@gmail.com> Date: Wed Mar 7 13:54:46 2018 -0800 enable multivalues insert (pandas-dev#19664)

tripkane · 2018-05-17T10:22:33Z

Hi all, in pandas 0.22 I could write a dataframe of reasonable size to SQL without error. With pandas 0.23 I now receive this error: "OperationalError: (sqlite3.OperationalError) too many SQL variables". I am converting a dataframe with 20k+ rows to SQL. After looking around, I suspect the problem lies in the limit set by SQLite, SQLITE_MAX_VARIABLE_NUMBER, which is 999 by default. This can apparently be changed by recompiling SQLite with a different value. I also noticed that adjusting the chunksize in DataFrame.to_sql has no effect, which perhaps confirms this is the root cause.
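For context on the limit described above: a multi-row INSERT binds one parameter per cell, so a statement covering n rows of a c-column frame needs n × c bound variables. With SQLite's default SQLITE_MAX_VARIABLE_NUMBER of 999 (raised to 32766 in SQLite 3.32.0), at most 999 // c rows fit in a single statement. The sketch below is a minimal illustration using only Python's standard-library sqlite3 module, not pandas' implementation; the helpers max_rows_per_insert and insert_multivalues are hypothetical names, and it assumes plain table and column names that need no quoting.

```python
import sqlite3

# Assumption: SQLite's compile-time default for SQLITE_MAX_VARIABLE_NUMBER
# before 3.32.0. The limit of the SQLite library actually linked may differ.
SQLITE_MAX_VARIABLE_NUMBER = 999


def max_rows_per_insert(n_columns, max_variables=SQLITE_MAX_VARIABLE_NUMBER):
    """Return how many rows fit in one multi-row INSERT without exceeding
    the bound-parameter limit (one parameter per cell)."""
    if n_columns < 1:
        raise ValueError("need at least one column")
    rows = max_variables // n_columns
    if rows == 0:
        raise ValueError("a single row already exceeds the variable limit")
    return rows


def insert_multivalues(conn, table, columns, rows, chunksize=None):
    """Hypothetical helper: insert `rows` (a list of tuples) with one
    multi-row INSERT per chunk and return the number of statements executed.

    Assumes `table` and `columns` are plain identifiers that need no quoting.
    """
    limit = max_rows_per_insert(len(columns))
    # Clamp the requested chunksize so no statement exceeds the limit.
    chunksize = limit if chunksize is None else max(1, min(chunksize, limit))
    row_placeholder = "(" + ", ".join(["?"] * len(columns)) + ")"
    column_list = ", ".join(columns)
    n_statements = 0
    for start in range(0, len(rows), chunksize):
        chunk = rows[start:start + chunksize]
        sql = "INSERT INTO {} ({}) VALUES {}".format(
            table, column_list, ", ".join([row_placeholder] * len(chunk))
        )
        # Flatten the chunk row by row so parameters line up with placeholders.
        params = [value for row in chunk for value in row]
        conn.execute(sql, params)
        n_statements += 1
    conn.commit()
    return n_statements


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b REAL, c TEXT)")
    data = [(i, i * 0.5, str(i)) for i in range(20000)]
    # 3 columns -> 999 // 3 = 333 rows per statement -> 61 statements.
    print(insert_multivalues(conn, "t", ["a", "b", "c"], data))
    print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])
```

Clamping chunksize to the variable limit, rather than trusting the caller's value, is the design choice that keeps wide frames under the cap: with 3 columns this sketch issues 333-row statements, i.e. 61 statements for 20,000 rows.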

TomAugspurger · 2018-05-17T11:09:54Z

@tripkane maybe open a new issue (linking back here) with a reproducible example.

tripkane · 2018-05-17T11:28:07Z

@TomAugspurger: ok thanks, will do

danfrankj force-pushed the df_multivalues_insert branch from 06fcfcc to af65ea5 on February 12, 2018 19:51

TomAugspurger added the Performance (Memory or execution speed performance) and IO SQL (to_sql, read_sql, read_sql_query) labels Feb 12, 2018

danfrankj force-pushed the df_multivalues_insert branch 3 times, most recently from 89fad1c to 4cc8890 on February 12, 2018 22:27

jreback requested changes Feb 14, 2018

danfrankj force-pushed the df_multivalues_insert branch from 4cc8890 to 6bd1086 on February 18, 2018 07:53

danfrankj force-pushed the df_multivalues_insert branch 2 times, most recently from f7f1c3d to 9b50c47 on February 18, 2018 16:05

TomAugspurger reviewed Feb 18, 2018

danfrankj force-pushed the df_multivalues_insert branch from 9b50c47 to 951b74c on February 18, 2018 17:42

jreback requested changes Feb 18, 2018

danfrankj force-pushed the df_multivalues_insert branch 2 times, most recently from c875d87 to 0db5d5c on February 23, 2018 18:28

danfrankj commented Feb 23, 2018

danfrankj force-pushed the df_multivalues_insert branch 3 times, most recently from e3953c6 to c62a9c1 on February 23, 2018 19:20

TomAugspurger reviewed Feb 28, 2018

danfrankj mentioned this pull request Feb 28, 2018

Add new hive _push method that supports cli operation and table properties airbnb/omniduct#43

Merged

danfrankj force-pushed the df_multivalues_insert branch from 2dc22da to 09691d8 on March 5, 2018 19:09

jreback requested changes Mar 7, 2018

danfrankj force-pushed the df_multivalues_insert branch from 09691d8 to 616935b on March 7, 2018 17:50

ENH: enable multivalues insert

b8cbc2e

danfrankj force-pushed the df_multivalues_insert branch from 616935b to b8cbc2e on March 7, 2018 18:03

jreback added this to the 0.23.0 milestone Mar 7, 2018

jreback approved these changes Mar 7, 2018

jreback added 2 commits March 7, 2018 16:54

Merge branch 'master' into PR_TOOL_MERGE_PR_19664

ec40a08

doc

f298de1

jreback merged commit 7c7bd56 into pandas-dev:master Mar 7, 2018

pandres pushed a commit to pandres/pandas that referenced this pull request Mar 15, 2018

enable multivalues insert (pandas-dev#19664)

e25e69d

jorisvandenbossche mentioned this pull request Mar 20, 2018

Use multi-row inserts for massive speedups on to_sql over high latency connections #8953

Closed

tripkane mentioned this pull request May 17, 2018

"too many SQL variables" Error with pandas 0.23 - enable multivalues insert #19664 issue #21103

Closed

schettino72 mentioned this pull request May 21, 2018

to_sql() performance regression (#19664) when DF contains many columns #21146

Closed

jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this pull request Jun 7, 2018

Revert "enable multivalues insert (pandas-dev#19664)"

91d24c3

This reverts commit 7c7bd56.

jorisvandenbossche mentioned this pull request Jun 7, 2018

Revert "enable multivalues insert (#19664)" #21355

Merged

jorisvandenbossche added a commit that referenced this pull request Jun 7, 2018

Revert "enable multivalues insert (#19664)" (#21355)

c460710

This reverts commit 7c7bd56.

daminisatya pushed a commit to daminisatya/pandas that referenced this pull request Jun 8, 2018

Revert "enable multivalues insert (pandas-dev#19664)" (pandas-dev#21355)

2fa7818

This reverts commit 7c7bd56.

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Jun 12, 2018

Revert "enable multivalues insert (pandas-dev#19664)" (pandas-dev#21355)

222dff8

This reverts commit 7c7bd56. (cherry picked from commit c460710)

TomAugspurger pushed a commit that referenced this pull request Jun 12, 2018

Revert "enable multivalues insert (#19664)" (#21355)

1391fba

This reverts commit 7c7bd56. (cherry picked from commit c460710)

david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018

Revert "enable multivalues insert (pandas-dev#19664)" (pandas-dev#21355)

fba9f1f

This reverts commit 7c7bd56.


        tm.assert_frame_equal(df, expected)

    def test_insert_multivalues(self):

enable multivalues insert #19664

Conversation

danfrankj commented Feb 12, 2018

Summary

Reference

TomAugspurger commented Feb 12, 2018

danfrankj commented Feb 12, 2018

codecov bot commented Feb 13, 2018

Codecov Report

TomAugspurger commented Feb 13, 2018

jreback left a comment

pep8speaks commented Feb 18, 2018

Comment last updated on March 07, 2018 at 21:54 Hours UTC

danfrankj commented Feb 18, 2018

TomAugspurger left a comment

jreback left a comment

danfrankj commented Feb 24, 2018

TomAugspurger left a comment

danfrankj commented Mar 2, 2018

Presto

MySQL

danfrankj commented Mar 5, 2018

TomAugspurger commented Mar 5, 2018

danfrankj commented Mar 7, 2018