ENH: sql support via SQLAlchemy, with legacy fallback #5950

mangecoeur · 2014-01-15T17:35:55Z

Code, tests, and docs for an updated io.sql module which uses SQLAlchemy as a DB abstraction layer. A legacy fallback is provided for SQLite and MySQL. The api should be backwards compatible, with depreciated names marked as such.

jreback · 2014-01-15T17:45:21Z

@mangecoeur

need to rebase away the merged branchs (makes history cleaner)

do this:

git fetch upstream
git rebase -i upstream/master

delete the tracked branch

git push upstream thisbranch -f

can also combine/squash branch as you see fit (you can do this later if you want)...post comments

https://github.com/pydata/pandas/wiki/Using-Git

jreback · 2014-01-15T17:50:46Z

doc/source/io.rst


-For example, suppose you want to query some data with different types from a
-table such as:
+If SQLAlchemy is not installed a legacy fallback is provided for sqlite and mysql.


add a new in version 0.14 tag (it might make it to 0.13.1...let's see)

jreback · 2014-01-15T18:31:43Z

@mangecoeur instead of unittest.SkipTest, use nose.SkipTest (these fail in py2.6)

ghost · 2014-01-15T18:33:26Z

@jtratner , @hayd you guys were involved at the design stage for this, is there something
we need to be aware of before moving forward based on @mangecoeur 's work? any
remaining issues that need to be ironed out before we commit to an implementation?

ghost · 2014-01-15T18:42:07Z

@mangecoeur , please update the optional dependencies secion of the README and
set a reasonable minimum on the supported version of SQLAlchemy. Nothing that would
force most users to upgrade nor get us tangled in a legacy whack-a-mole, if possible.
if 0.9.1 is not it, One of the builds should test with the minimum version.

@jreback, should we ask for a @Sql decorator on these tests, what do you think?

jreback · 2014-01-15T19:11:17Z

also pls edit install.rst with the same requirements

u may want to test with an older version of sqlalchemy that is popular
(and make one of the requirements. for that)

mangecoeur · 2014-01-16T12:07:35Z

Ok I slept on it, I now agree that automatic datatime conversion needs to be part of this feature.

I will try testing with SQLAlchemy 0.8 , I think that's probably the most widespread version. I don't anticipate any issues, their core api is stable.

Please add any more comments/suggestions and I will implement them in the coming week or so.

ghost · 2014-01-20T21:59:55Z

@mangecoeur, please don't merge master into your PR branches.

mangecoeur · 2014-01-20T22:38:19Z

Sorry, didn't realize, was just doing the workflow i was used to... can try
to revert if its a big problem...

On Monday, January 20, 2014, y-p notifications@github.com wrote:

@mangecoeur https://github.com/mangecoeur, please don't merge master
into your PR branches.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/5950#issuecomment-32801129
.

ghost · 2014-01-20T22:42:17Z

Legitimate, but we're settled on using rebase in pandas.

jreback · 2014-01-23T14:58:52Z

ref to #5936

jreback · 2014-01-23T15:00:05Z

@mangecoeur can you rebase and squash a bit......

my brief look says that this does not break API's...

thoughts on inclusion in 0.13.1 ?

@y-p @hayd @danielballan

danielballan · 2014-01-23T15:09:46Z

I don't see any problems, but I'd like to see it sit on master for at least a week (two weeks?) before it's rolled into a release. Just one man's opinion though. How firm is that 6-day timeline on the 0.13.1 milestone?

jreback · 2014-01-23T15:09:49Z

@mangecoeur also need to remove all tracking/merge branches

see https://github.com/pydata/pandas/wiki/Using-Git

ghost · 2014-01-23T15:10:20Z

This is a big feature, which I would normally associate with a major release.
It hasn't lived in master for a couple of months, which I'd say is required to get it stable.

If you'd like to ship it as experimental in 0.13.1 and if the old code is still working,
No strong objection here.

jreback · 2014-01-23T15:10:37Z

@danielballan will be at least that long....0.13 really just rolled out...

ghost · 2014-01-23T15:12:22Z

The point of quickly releasing 0.13.1 is to ensure a stable version is available (see various
perf regressions fixed) during the longer release cycle of the next major release (0.14), so we
can start merging big features into master.

That's the idea anyway.

danielballan · 2014-01-23T15:14:15Z

OK, just basing that on the milestone. Then I don't see a problem.

Has anyone done benchmarks on SQLA-mediated connections vs. MySQLdb / sqlite as it's set up now? I volunteer, once it's in master.

jreback · 2014-01-23T15:14:31Z

@y-p is right here....(I wanted the docs updated)...to reflect 0.13 changes...

maybe can split off the doc updates to satisfy #5936 then roll this to master right after 0.13.1

I am sure that the dtype inference needs some more work and that will happen in 0.14 (hopefully)

ghost · 2014-01-23T15:15:23Z

It's become obvious that the RC's really don't get that much extra testing.
Users wait for the official release. So we only get the bug reports right after a release
which we can then address in a point release. hopefully quickly.

I'd love a better solution, do you have a hypnotic ray I can borrow to get people to test our RC's?

TST Import sqlalchemy on Travis. DOC add docstrings to read sql ENH read_sql connects via Connection, Engine, file path, or :memory: string CLN Separate legacy code into new file, and fallback so that all old tests pass. TST to use sqlachemy syntax in tests CLN sql into classes, legacy passes FIX few engine vs con calls CLN pep8 cleanup add postgres support for pandas.io.sql.get_schema WIP: cleaup of sql io module - imported correct SQLALCHEMY type, delete redundant PandasSQLWithCon TODO: renamed _engine_read_table, need to think of a better name. TODO: clean up get_conneciton function ENH: cleanup of SQL io TODO: check that legacy mode works TODO: run tests correctly enabled coerce_float option Cleanup and bug-fixing mainly on legacy mode sql. IMPORTANT - changed legacy to require connection rather than cursor. This is still not yet finalized. TODO: tests and doc Added Test coverage for basic functionality using in-memory SQLite database Simplified API by automatically distinguishing between engine and connection. Added warnings

Initial draft of doc updates minor doc updates Added tests and reduced code repetition. Updated Docs. Added test coverage for legacy names Documentation updates, more tests Added depreciation warnings for legacy names. Updated docs and test doc build ENH #4163 - finalized tests and docs, ready for wider use… TST added sqlalchemy to TravisCI build dep for py 2.7 and 3.3 TST Import sqlalchemy on Travis. DOC add docstrings to read sql ENH read_sql connects via Connection, Engine, file path, or :memory: string CLN Separate legacy code into new file, and fallback so that all old tests pass. ENH #4163 added version added coment ENH #4163 added depreciation warning for tquery and uquery ENH #4163 Documentation and tests

…e date options. Updated optional dependancies Added columns optional arg to read_table, removed failing legacy tests. Added columns to doc ENH #4163 Fixed class renaming, expanded docs ENH #4163 Fixed tests in legacy mode

mangecoeur · 2014-01-23T16:07:24Z

Squashed the history into 3 commits with more or less informative messages (not so used to using git this way, but hopefully shouldn't have screwed anything up)

Let me know how to finish this to get this merged to master, it certainly needs to be exercised in the real world to see how people get on, this should generate some new issues for improving in the next release.

On the docs - because they've been changed quite a bit to cover sqlalchemy and the read_table function I don't think it makes sense to split them out as they stand. I feel confident we could push the whole thing to master and see how people get on.

jorisvandenbossche · 2014-01-23T16:55:48Z

How do we envisage the user API?
In the docs, at the moment there is from pandas.io import sql and then sql.read_sql/to_sql/read_table

What do we want top-level? At the moment there is pd.read_sql and pd.DataFrame.to_sql. Is this to stay? Should it then be used like that in the docs? (but then it contrasts with sql.read_table)
What other functions do we regard as public?
- There were votes against sql.tquery and sql.uquery and this has been removed from the docs and deprecation warnings added , so OK.
- sql.has_table
- sql.execute seems also useful? If so, also mention it in the docs?

To finish summary: also deprecated are sql.read_frame and sql.write_frame (they should be removed from api.rst)

jreback · 2014-02-06T16:16:44Z

@mangecoeur @jorisvandenbossche

we can merge this as soon as you are ready (at least passes green)...don't worry about extra features atm

mangecoeur · 2014-02-06T16:18:10Z

@jreback @jorisvandenbossche ok just want to fix the bug where specifying the same name in index_col and parse_dates causes an error, then we should be good to go.

…for problem

mangecoeur · 2014-02-06T17:04:28Z

@jreback @jorisvandenbossche Done. Read to merge when you are. Might need some instructions on how to actually do this...

jreback · 2014-02-06T17:13:58Z

@mangecoeur need you to rebase on master
(you don't need to squash), just rebase

https://github.com/pydata/pandas/wiki/Using-Git

if you are having issues, ping back

TST Import sqlalchemy on Travis. DOC add docstrings to read sql ENH read_sql connects via Connection, Engine, file path, or :memory: string CLN Separate legacy code into new file, and fallback so that all old tests pass. TST to use sqlachemy syntax in tests CLN sql into classes, legacy passes FIX few engine vs con calls CLN pep8 cleanup add postgres support for pandas.io.sql.get_schema WIP: cleaup of sql io module - imported correct SQLALCHEMY type, delete redundant PandasSQLWithCon TODO: renamed _engine_read_table, need to think of a better name. TODO: clean up get_conneciton function ENH: cleanup of SQL io TODO: check that legacy mode works TODO: run tests correctly enabled coerce_float option Cleanup and bug-fixing mainly on legacy mode sql. IMPORTANT - changed legacy to require connection rather than cursor. This is still not yet finalized. TODO: tests and doc Added Test coverage for basic functionality using in-memory SQLite database Simplified API by automatically distinguishing between engine and connection. Added warnings

Initial draft of doc updates minor doc updates Added tests and reduced code repetition. Updated Docs. Added test coverage for legacy names Documentation updates, more tests Added depreciation warnings for legacy names. Updated docs and test doc build ENH #4163 - finalized tests and docs, ready for wider use… TST added sqlalchemy to TravisCI build dep for py 2.7 and 3.3 TST Import sqlalchemy on Travis. DOC add docstrings to read sql ENH read_sql connects via Connection, Engine, file path, or :memory: string CLN Separate legacy code into new file, and fallback so that all old tests pass. ENH #4163 added version added coment ENH #4163 added depreciation warning for tquery and uquery ENH #4163 Documentation and tests

…e date options. Updated optional dependancies Added columns optional arg to read_table, removed failing legacy tests. Added columns to doc ENH #4163 Fixed class renaming, expanded docs ENH #4163 Fixed tests in legacy mode

Conflicts: ci/requirements-2.6.txt doc/source/io.rst pandas/io/sql.py pandas/io/tests/test_sql.py

mangecoeur · 2014-02-06T17:31:46Z

@jreback done, waiting for travis builds had to fix a py3.3 compat thing, should be ok

jreback · 2014-02-06T18:32:50Z

once you pass lmk; I can fix the rebase (e.g. you shouldn't have a merge of master branch in their).....

mangecoeur · 2014-02-06T18:33:47Z

Ok will do, i had trouble getting the rebase to work, loads of merge conflicts...

mangecoeur · 2014-02-06T19:12:04Z

@jreback Travis builds passing, enjoy!

jreback · 2014-02-06T20:30:51Z

@mangecoeur take a look see at this rebased branch.

looks ok? (and the tests all pass, but skipping some, though I see that yours did too).

jreback@f47f9fe

[goat-jreback-~/pandas] nosetests pandas/io/tests/test_sql.py
SSSSSSSSSSSSSSSSSSSSSSSSS.....................................
----------------------------------------------------------------------
Ran 62 tests in 0.842s

OK (SKIP=25)

hayd · 2014-02-06T20:53:26Z

@jreback perhaps needs to add pymysql to travis (which is the only mysql driver used currently in the tests..)

jorisvandenbossche · 2014-02-06T20:55:35Z

@jreback Did you squash it down to one commit? Because it would maybe be nice to keep the commits of mangeceour to give him the credit

jreback · 2014-02-06T20:58:07Z

I agree
prob was rebase was problematic

let me do it again
and I'll leave all the sub commits in place
8-10 or so

course if u would like to rebase it
go ahead!

jreback · 2014-02-06T21:35:56Z

merged via c7e3058

thanks!

jreback · 2014-02-06T21:37:46Z

ok.....if someone would form a new / consolidate SQL page with the todos:

obviously more testing with postgres / MySQL

and need to skip if the module is installed but no server is up (e.g. installing pymsql but not running it shows an Operationalerror)

jorisvandenbossche · 2014-02-06T21:42:14Z

@jreback I will open a new 'meta' issue with a to do list (or better open seperate issues for the different tasks/bugs?)

And there are still some merge problems to clean up in the docs, see eg c7e3058#diff-bb262b5e18f651a4357476b250d4b255R3111

jreback · 2014-02-06T21:46:10Z

@jorisvandenbossche that would be gr8

rebase wasn't perfect...prob did miss some doc stuff.....go ahead and fix!

@hayd yep...that's next on tap making travis test everything

pls consolidate / close SQL issues as needed (as he merge didn't do it because I pushed to master)

jorisvandenbossche · 2014-02-06T22:17:17Z

Fixed the merge conflicts in the docs: db2f6d9

@mangecoeur Maybe you should also have a look to check that all your latest changes are correctly commited to master.

jorisvandenbossche · 2014-02-06T22:31:08Z

And opened a new issue #6292 for the follow-up.

mangecoeur mentioned this pull request Jan 15, 2014

ENH: sql support #4163

Closed

20 tasks

jreback reviewed Jan 15, 2014
View reviewed changes

ghost mentioned this pull request Jan 18, 2014

Sql alchemy tests #4475

Closed

danielballan and others added 3 commits January 23, 2014 15:47

ghost mentioned this pull request Jan 23, 2014

io.rst needs updating to reflect api sql API methods introduced in 0.13 #5936

Closed

jreback added the SQL label Feb 6, 2014

ENH #4163 Fix bug in index + parse date interaction, added test case …

32b493a

…for problem

danielballan and others added 5 commits February 6, 2014 17:20

Merge branch 'master' of https://github.com/mangecoeur/pandas

fbc14fa

Conflicts: ci/requirements-2.6.txt doc/source/io.rst pandas/io/sql.py pandas/io/tests/test_sql.py

ENH #4163 Fixed missing basestring import for py3.3 compat

9a1972a

ENH #4163 Fixed missing string_types import for py3.3 compat

527fe56

jreback closed this Feb 6, 2014

jorisvandenbossche mentioned this pull request Feb 6, 2014

ENH: sql support with SQLAlchemy - follow-up #6292

Closed

17 tasks

This was referenced Feb 7, 2014

SQLAlchemy in io.sql to manage different SQL dialects #2717

Closed

API: user api for new sql functionality #6300

Closed

ENH: sql support via SQLAlchemy, with legacy fallback #5950

ENH: sql support via SQLAlchemy, with legacy fallback #5950

Conversation

mangecoeur commented Jan 15, 2014

jreback commented Jan 15, 2014

jreback Jan 15, 2014

Choose a reason for hiding this comment

jreback commented Jan 15, 2014

ghost commented Jan 15, 2014

ghost commented Jan 15, 2014

jreback commented Jan 15, 2014

mangecoeur commented Jan 16, 2014

ghost commented Jan 20, 2014

mangecoeur commented Jan 20, 2014

ghost commented Jan 20, 2014

jreback commented Jan 23, 2014

jreback commented Jan 23, 2014

danielballan commented Jan 23, 2014

jreback commented Jan 23, 2014

ghost commented Jan 23, 2014

jreback commented Jan 23, 2014

ghost commented Jan 23, 2014

danielballan commented Jan 23, 2014

jreback commented Jan 23, 2014

ghost commented Jan 23, 2014

mangecoeur commented Jan 23, 2014

jorisvandenbossche commented Jan 23, 2014

jreback commented Feb 6, 2014

mangecoeur commented Feb 6, 2014

mangecoeur commented Feb 6, 2014

jreback commented Feb 6, 2014

mangecoeur commented Feb 6, 2014

jreback commented Feb 6, 2014

mangecoeur commented Feb 6, 2014

mangecoeur commented Feb 6, 2014

jreback commented Feb 6, 2014

hayd commented Feb 6, 2014

jorisvandenbossche commented Feb 6, 2014

jreback commented Feb 6, 2014

jreback commented Feb 6, 2014

jreback commented Feb 6, 2014

jorisvandenbossche commented Feb 6, 2014

jreback commented Feb 6, 2014

jorisvandenbossche commented Feb 6, 2014

jorisvandenbossche commented Feb 6, 2014