Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: sql support via SQLAlchemy, with legacy fallback #5950

Closed
wants to merge 14 commits into from
Closed

ENH: sql support via SQLAlchemy, with legacy fallback #5950

wants to merge 14 commits into from

Conversation

mangecoeur
Copy link
Contributor

Code, tests, and docs for an updated io.sql module which uses SQLAlchemy as a DB abstraction layer. A legacy fallback is provided for SQLite and MySQL. The api should be backwards compatible, with depreciated names marked as such.

@mangecoeur mangecoeur mentioned this pull request Jan 15, 2014
20 tasks
@jreback
Copy link
Contributor

jreback commented Jan 15, 2014

@mangecoeur

need to rebase away the merged branchs (makes history cleaner)

do this:

git fetch upstream
git rebase -i upstream/master

delete the tracked branch

git push upstream thisbranch -f

can also combine/squash branch as you see fit (you can do this later if you want)...post comments

https://github.com/pydata/pandas/wiki/Using-Git


For example, suppose you want to query some data with different types from a
table such as:
If SQLAlchemy is not installed a legacy fallback is provided for sqlite and mysql.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a new in version 0.14 tag (it might make it to 0.13.1...let's see)

@jreback
Copy link
Contributor

jreback commented Jan 15, 2014

@mangecoeur instead of unittest.SkipTest, use nose.SkipTest (these fail in py2.6)

@ghost
Copy link

ghost commented Jan 15, 2014

@jtratner , @hayd you guys were involved at the design stage for this, is there something
we need to be aware of before moving forward based on @mangecoeur 's work? any
remaining issues that need to be ironed out before we commit to an implementation?

@ghost
Copy link

ghost commented Jan 15, 2014

@mangecoeur , please update the optional dependencies secion of the README and
set a reasonable minimum on the supported version of SQLAlchemy. Nothing that would
force most users to upgrade nor get us tangled in a legacy whack-a-mole, if possible.
if 0.9.1 is not it, One of the builds should test with the minimum version.

@jreback, should we ask for a @Sql decorator on these tests, what do you think?

@jreback
Copy link
Contributor

jreback commented Jan 15, 2014

also pls edit install.rst with the same requirements

u may want to test with an older version of sqlalchemy that is popular
(and make one of the requirements. for that)

@mangecoeur
Copy link
Contributor Author

Ok I slept on it, I now agree that automatic datatime conversion needs to be part of this feature.

I will try testing with SQLAlchemy 0.8 , I think that's probably the most widespread version. I don't anticipate any issues, their core api is stable.

Please add any more comments/suggestions and I will implement them in the coming week or so.

@ghost ghost mentioned this pull request Jan 18, 2014
@ghost
Copy link

ghost commented Jan 20, 2014

@mangecoeur, please don't merge master into your PR branches.

@mangecoeur
Copy link
Contributor Author

Sorry, didn't realize, was just doing the workflow i was used to... can try
to revert if its a big problem...

On Monday, January 20, 2014, y-p notifications@github.com wrote:

@mangecoeur https://github.com/mangecoeur, please don't merge master
into your PR branches.


Reply to this email directly or view it on GitHubhttps://github.com//pull/5950#issuecomment-32801129
.

@ghost
Copy link

ghost commented Jan 20, 2014

Legitimate, but we're settled on using rebase in pandas.

@jreback
Copy link
Contributor

jreback commented Jan 23, 2014

ref to #5936

@jreback
Copy link
Contributor

jreback commented Jan 23, 2014

@mangecoeur can you rebase and squash a bit......

my brief look says that this does not break API's...

thoughts on inclusion in 0.13.1 ?

@y-p @hayd @danielballan

@danielballan
Copy link
Contributor

I don't see any problems, but I'd like to see it sit on master for at least a week (two weeks?) before it's rolled into a release. Just one man's opinion though. How firm is that 6-day timeline on the 0.13.1 milestone?

@jreback
Copy link
Contributor

jreback commented Jan 23, 2014

@mangecoeur also need to remove all tracking/merge branches

see https://github.com/pydata/pandas/wiki/Using-Git

@ghost
Copy link

ghost commented Jan 23, 2014

This is a big feature, which I would normally associate with a major release.
It hasn't lived in master for a couple of months, which I'd say is required to get it stable.

If you'd like to ship it as experimental in 0.13.1 and if the old code is still working,
No strong objection here.

@jreback
Copy link
Contributor

jreback commented Jan 23, 2014

@danielballan will be at least that long....0.13 really just rolled out...

@ghost
Copy link

ghost commented Jan 23, 2014

The point of quickly releasing 0.13.1 is to ensure a stable version is available (see various
perf regressions fixed) during the longer release cycle of the next major release (0.14), so we
can start merging big features into master.

That's the idea anyway.

@danielballan
Copy link
Contributor

OK, just basing that on the milestone. Then I don't see a problem.

Has anyone done benchmarks on SQLA-mediated connections vs. MySQLdb / sqlite as it's set up now? I volunteer, once it's in master.

@jreback
Copy link
Contributor

jreback commented Jan 23, 2014

@y-p is right here....(I wanted the docs updated)...to reflect 0.13 changes...

maybe can split off the doc updates to satisfy #5936 then roll this to master right after 0.13.1

I am sure that the dtype inference needs some more work and that will happen in 0.14 (hopefully)

@ghost
Copy link

ghost commented Jan 23, 2014

It's become obvious that the RC's really don't get that much extra testing.
Users wait for the official release. So we only get the bug reports right after a release
which we can then address in a point release. hopefully quickly.

I'd love a better solution, do you have a hypnotic ray I can borrow to get people to test our RC's?

danielballan and others added 3 commits January 23, 2014 15:47
TST Import sqlalchemy on Travis.

DOC add docstrings to read sql

ENH read_sql connects via Connection, Engine, file path, or :memory: string

CLN Separate legacy code into new file, and fallback so that all old tests pass.

TST to use sqlachemy syntax in tests

CLN sql into classes, legacy passes

FIX few engine vs con calls

CLN pep8 cleanup

add postgres support for pandas.io.sql.get_schema

WIP: cleaup of sql io module - imported correct SQLALCHEMY type, delete redundant PandasSQLWithCon

TODO: renamed _engine_read_table, need to think of a better name.
TODO: clean up get_conneciton function

ENH: cleanup of SQL io

TODO: check that legacy mode works
TODO: run tests

correctly enabled coerce_float option

Cleanup and bug-fixing mainly on legacy mode sql.
IMPORTANT - changed legacy to require connection rather than cursor. This is still not yet finalized.
TODO: tests and doc

Added Test coverage for basic functionality using in-memory SQLite database

Simplified API by automatically distinguishing between engine and connection. Added warnings
Initial draft of doc updates

minor doc updates

Added tests and reduced code repetition. Updated Docs. Added test coverage for legacy names

Documentation updates, more tests

Added depreciation warnings for legacy names.

Updated docs and test doc build

ENH #4163 - finalized tests and docs, ready for wider use…

TST added sqlalchemy to TravisCI build dep for py 2.7 and 3.3

TST Import sqlalchemy on Travis.

DOC add docstrings to read sql

ENH read_sql connects via Connection, Engine, file path, or :memory: string

CLN Separate legacy code into new file, and fallback so that all old tests pass.

ENH #4163 added version added coment

ENH #4163 added depreciation warning for tquery and uquery

ENH #4163 Documentation and tests
…e date options. Updated optional dependancies

Added columns optional arg to read_table, removed failing legacy tests.

Added columns to doc

ENH #4163 Fixed class renaming, expanded docs

ENH #4163 Fixed tests in legacy mode
@mangecoeur
Copy link
Contributor Author

Squashed the history into 3 commits with more or less informative messages (not so used to using git this way, but hopefully shouldn't have screwed anything up)

Let me know how to finish this to get this merged to master, it certainly needs to be exercised in the real world to see how people get on, this should generate some new issues for improving in the next release.

On the docs - because they've been changed quite a bit to cover sqlalchemy and the read_table function I don't think it makes sense to split them out as they stand. I feel confident we could push the whole thing to master and see how people get on.

@jorisvandenbossche
Copy link
Member

How do we envisage the user API?
In the docs, at the moment there is from pandas.io import sql and then sql.read_sql/to_sql/read_table

  • What do we want top-level? At the moment there is pd.read_sql and pd.DataFrame.to_sql. Is this to stay? Should it then be used like that in the docs? (but then it contrasts with sql.read_table)
  • What other functions do we regard as public?
    • There were votes against sql.tquery and sql.uquery and this has been removed from the docs and deprecation warnings added , so OK.
    • sql.has_table
    • sql.execute seems also useful? If so, also mention it in the docs?

To finish summary: also deprecated are sql.read_frame and sql.write_frame (they should be removed from api.rst)

@jreback jreback added the SQL label Feb 6, 2014
@jreback
Copy link
Contributor

jreback commented Feb 6, 2014

@mangecoeur @jorisvandenbossche

we can merge this as soon as you are ready (at least passes green)...don't worry about extra features atm

@mangecoeur
Copy link
Contributor Author

@jreback @jorisvandenbossche ok just want to fix the bug where specifying the same name in index_col and parse_dates causes an error, then we should be good to go.

@mangecoeur
Copy link
Contributor Author

@jreback @jorisvandenbossche Done. Read to merge when you are. Might need some instructions on how to actually do this...

@jreback
Copy link
Contributor

jreback commented Feb 6, 2014

@mangecoeur need you to rebase on master
(you don't need to squash), just rebase

https://github.com/pydata/pandas/wiki/Using-Git

if you are having issues, ping back

danielballan and others added 5 commits February 6, 2014 17:20
TST Import sqlalchemy on Travis.

DOC add docstrings to read sql

ENH read_sql connects via Connection, Engine, file path, or :memory: string

CLN Separate legacy code into new file, and fallback so that all old tests pass.

TST to use sqlachemy syntax in tests

CLN sql into classes, legacy passes

FIX few engine vs con calls

CLN pep8 cleanup

add postgres support for pandas.io.sql.get_schema

WIP: cleaup of sql io module - imported correct SQLALCHEMY type, delete redundant PandasSQLWithCon

TODO: renamed _engine_read_table, need to think of a better name.
TODO: clean up get_conneciton function

ENH: cleanup of SQL io

TODO: check that legacy mode works
TODO: run tests

correctly enabled coerce_float option

Cleanup and bug-fixing mainly on legacy mode sql.
IMPORTANT - changed legacy to require connection rather than cursor. This is still not yet finalized.
TODO: tests and doc

Added Test coverage for basic functionality using in-memory SQLite database

Simplified API by automatically distinguishing between engine and connection. Added warnings
Initial draft of doc updates

minor doc updates

Added tests and reduced code repetition. Updated Docs. Added test coverage for legacy names

Documentation updates, more tests

Added depreciation warnings for legacy names.

Updated docs and test doc build

ENH #4163 - finalized tests and docs, ready for wider use…

TST added sqlalchemy to TravisCI build dep for py 2.7 and 3.3

TST Import sqlalchemy on Travis.

DOC add docstrings to read sql

ENH read_sql connects via Connection, Engine, file path, or :memory: string

CLN Separate legacy code into new file, and fallback so that all old tests pass.

ENH #4163 added version added coment

ENH #4163 added depreciation warning for tquery and uquery

ENH #4163 Documentation and tests
…e date options. Updated optional dependancies

Added columns optional arg to read_table, removed failing legacy tests.

Added columns to doc

ENH #4163 Fixed class renaming, expanded docs

ENH #4163 Fixed tests in legacy mode
Conflicts:
	ci/requirements-2.6.txt
	doc/source/io.rst
	pandas/io/sql.py
	pandas/io/tests/test_sql.py
@mangecoeur
Copy link
Contributor Author

@jreback done, waiting for travis builds had to fix a py3.3 compat thing, should be ok

@jreback
Copy link
Contributor

jreback commented Feb 6, 2014

once you pass lmk; I can fix the rebase (e.g. you shouldn't have a merge of master branch in their).....

@mangecoeur
Copy link
Contributor Author

Ok will do, i had trouble getting the rebase to work, loads of merge conflicts...

@mangecoeur
Copy link
Contributor Author

@jreback Travis builds passing, enjoy!

@jreback
Copy link
Contributor

jreback commented Feb 6, 2014

@mangecoeur take a look see at this rebased branch.

looks ok? (and the tests all pass, but skipping some, though I see that yours did too).

jreback@f47f9fe

[goat-jreback-~/pandas] nosetests pandas/io/tests/test_sql.py
SSSSSSSSSSSSSSSSSSSSSSSSS.....................................
----------------------------------------------------------------------
Ran 62 tests in 0.842s

OK (SKIP=25)

@hayd
Copy link
Contributor

hayd commented Feb 6, 2014

@jreback perhaps needs to add pymysql to travis (which is the only mysql driver used currently in the tests..)

@jorisvandenbossche
Copy link
Member

@jreback Did you squash it down to one commit? Because it would maybe be nice to keep the commits of mangeceour to give him the credit

@jreback
Copy link
Contributor

jreback commented Feb 6, 2014

I agree
prob was rebase was problematic

let me do it again
and I'll leave all the sub commits in place
8-10 or so

course if u would like to rebase it
go ahead!

@jreback
Copy link
Contributor

jreback commented Feb 6, 2014

merged via c7e3058

thanks!

@jreback jreback closed this Feb 6, 2014
@jreback
Copy link
Contributor

jreback commented Feb 6, 2014

ok.....if someone would form a new / consolidate SQL page with the todos:

obviously more testing with postgres / MySQL

and need to skip if the module is installed but no server is up (e.g. installing pymsql but not running it shows an Operationalerror)

@jorisvandenbossche
Copy link
Member

@jreback I will open a new 'meta' issue with a to do list (or better open seperate issues for the different tasks/bugs?)

And there are still some merge problems to clean up in the docs, see eg c7e3058#diff-bb262b5e18f651a4357476b250d4b255R3111

@jreback
Copy link
Contributor

jreback commented Feb 6, 2014

@jorisvandenbossche that would be gr8

rebase wasn't perfect...prob did miss some doc stuff.....go ahead and fix!

@hayd yep...that's next on tap making travis test everything

pls consolidate / close SQL issues as needed (as he merge didn't do it because I pushed to master)

@jorisvandenbossche
Copy link
Member

Fixed the merge conflicts in the docs: db2f6d9

@mangecoeur Maybe you should also have a look to check that all your latest changes are correctly commited to master.

@jorisvandenbossche
Copy link
Member

And opened a new issue #6292 for the follow-up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants