Conditional Database Import / Docos #692

jlaura · 2015-09-10T04:17:00Z

Two-parter, so feel free to ask me to split it up.

1. Trivial fixes to ESDA documentation.
2. I added a database driver to the current FileIO module. This uses a conditional import (in core/IOHandlers/__init__.py) and depends on both sqlalchemy (ships with anaconda) and geomet.

The main reason for the PR is to continue the conversation re: FileIO / Database access / Conditional imports.

To test (conda install gdal - only to get the test data generated, not necessary for the PR):

# Slurp all the pysal example shapefiles into a single SQLite database
import glob
import subprocess

psexamples = glob.glob('/Users/jay/github/pysal/pysal/examples/*/*.shp')
for shp in psexamples:
    # Pass the arguments as a list so paths containing spaces are not split
    cmd = ['ogr2ogr', '-f', 'SQLite', '-append',
           '/Users/jay/Desktop/psexamples.sqlite', shp]
    subprocess.check_call(cmd)

Then

import pysal as ps
# Run from the directory holding psexamples.sqlite: /// is a relative sqlite path, //// is an absolute path
db = ps.open('sqlite:///psexamples.sqlite')
db.tables  # property listing the available tables

# List of tables printed to screen 

virginia_geojson = ps.read('virginia', attributes=True)  #Defaults to attributes=False, or just the

Some things to discuss (maybe in a hangout?)

1. FileIO looks like it would be much more streamlined using ABC. Having said that, it works, and adding a new driver is trivial once the learning curve is stumbled up.
2. FileIO docos claim to require read, seek, and next methods, but a DB is not a file object. How can the interface be either generalized or the differences hidden? (The current hack feels like a hack, so zero love from here.)
3. Conditional import a la @ljwolf's suggestion via gitter. Thoughts? Is it throwing a proper error on your side if you do not have Geomet?
4. What about passing an SQL statement to the reader? Totally possible, but do we want to support that?
5. The user is required to know the input data type in order to write the sqlite:/// prefix (or the postgresql one with username/password/port). Do we want to try and abstract that from the user, or just accept that this is the way SQLAlchemy manages interfacing with databases?
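For the conditional import question, a minimal sketch of one way the check could look. It is an assumption for discussion, not the code in this PR: `optional_import`, `available_drivers`, and the `REQUIREMENTS` table are hypothetical names, and only the standard library is used.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def available_drivers(requirements):
    """Map each driver name to True if all of its dependencies import."""
    return {
        driver: all(optional_import(dep) is not None for dep in deps)
        for driver, deps in requirements.items()
    }


# Hypothetical driver table: names and dependency lists are illustrative only.
REQUIREMENTS = {
    "textfile": ["json"],                  # standard library, always present
    "database": ["sqlalchemy", "geomet"],  # the soft dependencies from this PR
}

drivers = available_drivers(REQUIREMENTS)
```

With this shape, a driver whose dependencies are missing is simply reported as unavailable rather than breaking the package import; whether to raise a descriptive error when the user asks for it is a separate choice.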

ljwolf · 2015-09-10T06:35:50Z

This looks awesome! Cloning now.

What you've put are great starting points for a discussion. That said, I may be a bit more outspoken about FileIO...

1. This is a great implementation of a db interface leaning on the current mature standard for database interaction in Python. Awesome. IMHO, we should extend this approach: FileIO started before there was a "good" Python wrapper around OGR IO stuff. Moving forward, we should take advantage of new IO packages with soft dependencies while providing ways to access old behavior. Our comparative advantage is analytics, not feature-complete Python clones of OGR. Using SQLAlchemy here is definitely 👍
2. True... this also applies more broadly to streams of GeoJSON/semistructured data. No clue how to do this. Ties to 4 quite strongly.
3. Freaking awesome, exactly how this should work at the module level! Now, how can we get nose to acknowledge this and avoid running tests on disabled modules?
4. So, if the connection is opened as conn, then a conn.read(<SQL>) would just pass the query string and kwargs down to the sqlalchemy engine? Seems fine in theory.
5. I've seen some solutions that make it simple to connect to databases (and avoid plaintext passwords in demo code), but they involved a persistent local store of hashed credentials and a set of "accessor" functions to connect to a database using the stored creds. So, something like pysal.db.add_database('USCensus', 'username', 'password', 'host', 'port', 'protocol') would store and pysal.db.connect('USCensus') would attach. Not the best, but definitely should be discussed. I have code to do this lying around if a prototype is wanted.
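For item 4, a rough sketch of what passing a query string straight through might look like. This is an assumption, not the driver in this PR: it uses the standard library's sqlite3 as a stand-in for a SQLAlchemy engine, `SQLiteReader` is a hypothetical class name, and the two table rows are illustrative.

```python
import sqlite3


class SQLiteReader:
    """Hypothetical reader; sqlite3 stands in for the SQLAlchemy engine."""

    def __init__(self, connection):
        self._conn = connection

    @property
    def tables(self):
        """List table names, similar in spirit to the .tables property."""
        rows = self._conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [name for (name,) in rows]

    def read(self, sql, params=()):
        """Pass the query string and bound parameters down unchanged."""
        return self._conn.execute(sql, params).fetchall()


# Build a throwaway in-memory database so the sketch is self-contained.
raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE virginia (fips TEXT, name TEXT)")
raw.executemany(
    "INSERT INTO virginia VALUES (?, ?)",
    [("51001", "Accomack"), ("51003", "Albemarle")],
)

conn = SQLiteReader(raw)
rows = conn.read("SELECT name FROM virginia WHERE fips = ?", ("51003",))
```

Taking bound parameters separately from the SQL text keeps user-supplied values out of the query string, which matters if arbitrary statements are going to be supported.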

jlaura · 2015-09-10T14:21:33Z

For 3, I think that the solution could be as easy as using a unittest.skipIf() decorator.
https://docs.python.org/2/library/unittest.html#skipping-tests-and-expected-failures

So perhaps something like:

test_db.py

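import logging
import unittest
from .. import db

logger = logging.getLogger(__name__)

# Record whether the soft dependencies are importable
try:
    import sqlalchemy
    import geomet
    missing = False
except ImportError:
    logger.debug('some message')
    missing = True

class TestDB(unittest.TestCase):

    # skipIf takes the condition and a reason that is shown in the test report
    @unittest.skipIf(missing, 'sqlalchemy or geomet is not installed')
    def setUp(self):
        # Fire up the db connection
        pass

    @unittest.skipIf(missing, 'sqlalchemy or geomet is not installed')
    def test_listtables(self):
        # Test table listing
        pass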

Then it looks like nose has support for printing an 'S' for skipped tests. http://nose.readthedocs.org/en/latest/plugins/skip.html
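A self-contained variant of the idea, as a sketch under assumptions: it decorates the whole class instead of each method, and it checks for a deliberately nonexistent module (`pysal_demo_missing_dependency`, a made-up name) so that the skip path is the one exercised when it runs.

```python
import importlib.util
import unittest

# Hypothetical module name, chosen so that the dependency is reported missing.
MISSING = importlib.util.find_spec("pysal_demo_missing_dependency") is None


@unittest.skipIf(MISSING, "optional database dependencies are not installed")
class TestDB(unittest.TestCase):
    def test_listtables(self):
        # Only reached when the dependency is importable.
        self.fail("this body should be skipped in the sketch")


class TestAlwaysRuns(unittest.TestCase):
    def test_truth(self):
        self.assertTrue(True)


# Run both cases programmatically and collect the outcome.
loader = unittest.TestLoader()
suite = unittest.TestSuite(
    [
        loader.loadTestsFromTestCase(TestDB),
        loader.loadTestsFromTestCase(TestAlwaysRuns),
    ]
)
result = unittest.TestResult()
suite.run(result)
```

A class-level decorator skips every test in the case with one line, which avoids repeating the condition on setUp and on each test method.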

As an aside: check out some of the other decorators and the ability to roll your own unittest decorators. Not something I had thought about before, but super nice to see that these exist.

For 4:
That would be the idea. I will extend the class to support that. At the same time, I need to explore SQLAlchemy a bit more to see how much metadata I can get off of a mapped table. For example, it would be a nice bit of functionality to print table metadata via a mechanism similar to the .tables attribute.

ljwolf · 2015-09-10T19:43:54Z

Okay, cool. @TaylorOshan and I were dealing with some of the same issues in skipping tests based on conditions while trying to make moving contrib/spint into core work correctly, with testing and conditional import.

We should write conditional import stuff into the styleguide, so that it's consistent across the different sets of potential soft dependencies.

jlaura · 2015-09-11T14:25:41Z

So I played around a bit more with FileIO and how this might look:

If a class-based architecture with strict interface enforcement is the desired approach, I think using the abc module to define an abstract base class and then a dispatcher similar to the current implementation might work. That should be more explicit than the existing implementation, but follow the same model.

This maintains the fragility in guessing input type by extension, but enforces a consistent interface. From a user perspective, the development of drivers that are roughly similar to those that already exist is quite straightforward. Additionally, the user can expect that ps.open will succeed in opening the file. Unfortunately, this approach introduces significant difficulty when the input format does not conform so well to the interface, e.g. the current example of a file-based approach interfacing with a database. I am also not 100% sure how a dispatcher might look without a metaclass. ABC enforces structure, but does not help with dispatching.
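One possible shape for an ABC plus a dispatcher that needs no custom metaclass, sketched under assumptions: it relies on `__init_subclass__` (Python 3.6+, so not available to the Python 2 code base of the time), and `Driver`, `GalDriver`, `BrokenDriver`, and `dispatch` are hypothetical names rather than the existing FileIO classes.

```python
import os
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical base class; not the actual pysal FileIO."""

    extensions = ()  # file extensions this driver claims
    _registry = {}   # extension -> driver class, shared by all subclasses

    def __init_subclass__(cls, **kwargs):
        # Called once per subclass definition: record its extensions.
        super().__init_subclass__(**kwargs)
        for ext in cls.extensions:
            Driver._registry[ext] = cls

    def __init__(self, path):
        self.path = path

    @abstractmethod
    def read(self):
        """Return the full contents of the data source."""

    @classmethod
    def dispatch(cls, path):
        """Pick a driver by file extension and instantiate it."""
        ext = os.path.splitext(path)[1].lower()
        if ext not in cls._registry:
            raise ValueError("no driver registered for %r" % ext)
        return cls._registry[ext](path)


class GalDriver(Driver):
    extensions = (".gal",)

    def read(self):
        return "weights from %s" % self.path


class BrokenDriver(Driver):
    extensions = (".broken",)  # registers, but never implements read()
```

Here `Driver.dispatch("w.gal")` returns a `GalDriver`, while `Driver.dispatch("x.broken")` raises `TypeError` at instantiation because the abstract `read` is unimplemented. The extension guessing stays exactly as fragile as before; only the interface check becomes explicit.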

Looking to pandas, the burden of input type identification is pushed to the user, e.g. pandas.read_json(). This allows the IO code structure to be significantly simpler. The excel driver takes a class-based approach, while the sql driver is organized as a module of functions. This approach supports freedom for the developer and freedom in defining the interface, i.e. the sql driver can focus on treating the database as a database. Pandas does, in some cases, define writers using ABC, but inconsistently. This works for pandas because all the drivers create the same set of internal data representations (data frame, series).
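To make the contrast concrete, a sketch of the function-per-format style, under assumptions: `read_gal` and `read_sql` are hypothetical names, the GAL parsing is simplified (one header line, then an "id count" line followed by a neighbor line, with every observation listing at least one neighbor), and sqlite3 stands in for a real database driver.

```python
import sqlite3


def read_gal(text):
    """Parse simplified GAL text into {id: [neighbor ids]}."""
    lines = [line.split() for line in text.strip().splitlines()]
    neighbors = {}
    # Skip the header, then consume (id, count) / neighbor-list line pairs.
    for head, body in zip(lines[1::2], lines[2::2]):
        ident, count = head[0], int(head[1])
        neighbors[ident] = body[:count]
    return neighbors


def read_sql(connection, query):
    """Run a query on an open sqlite3 connection and return all rows."""
    return connection.execute(query).fetchall()


# The caller picks the reader, so each function can have its own signature.
gal_text = "3\na 1\nb\nb 2\na c\nc 1\nb\n"
weights = read_gal(gal_text)

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE counties (name TEXT)")
connection.execute("INSERT INTO counties VALUES ('Accomack')")
rows = read_sql(connection, "SELECT name FROM counties")
```

Nothing forces the two readers to share an interface, which is the freedom described above; the cost is that the user must know which function to call.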

Some questions that we need to answer:

1. Where does the burden of driver selection lie? I see this as a balancing act between fragility and user responsibility, but perhaps that is not the best approach?
2. Given the range of drivers, does it make sense to try and strictly enforce a uniform interface? Consider .shp, .gal, and .sqlite as three potentially divergent formats.

Looking at FileIO.py, the documentation at the top suggests that all drivers will support read, seek, and next. Take a look at the GAL driver: seek is just a placeholder, since seeking makes little sense, and next does not exist. This suggests that the defined interface is not representative of the interface we need. (Or that ABC should be used to throw an error that we do not conform to our own docos.)
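A small sketch of how that mismatch could be surfaced automatically. Everything here is an assumption for illustration: `missing_methods` is a hypothetical helper, and `GalLikeDriver` only imitates the shape described above (a placeholder seek and no next); it is not the real GAL driver.

```python
DOCUMENTED_INTERFACE = ("read", "seek", "next")


def missing_methods(driver_cls, required=DOCUMENTED_INTERFACE):
    """Return the documented method names that driver_cls does not provide."""
    return [
        name
        for name in required
        if not callable(getattr(driver_cls, name, None))
    ]


class GalLikeDriver:
    """Hypothetical stand-in shaped like the GAL driver described above."""

    def read(self):
        return []

    def seek(self, position):
        pass  # placeholder: seeking has no meaning for this format


gaps = missing_methods(GalLikeDriver)
```

A check like this could run in the test suite over every registered driver, which would flag the gap without committing to a full ABC refactor. Note that it cannot tell a placeholder seek from a real one.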

Am I missing some other essential issue?

ljwolf · 2016-01-14T17:15:23Z

While we probably want to finally finish this (& weights construction stuff), I think we might benefit from:

1. Moving the discussion of improving/restructuring FileIO around plug-and-play drivers to a "release + ∞" issue. I think it requires quite a bit of sub-issuing to make attainable.
2. Inaugurating an "OPTIONAL" tag for soft dependency introductions that are intended for core.
3. Merging this PR as "OPT: Database driver in FileIO" OR redirecting this to contrib.

Thoughts?


coveralls · 2016-04-28T17:13:05Z

Changes Unknown when pulling e1b54ee on jlaura:master into pysal:master.

ljwolf · 2016-07-16T22:54:11Z

@jlaura would it be alright to cherrypick your commits from this & redirect to dev? I've rebased this in ljwolf/pysal/pr692 and in #841

[REBASE & REDIRECT] Conditional Database Imports & Docos, #692

jlaura · 2016-07-17T15:55:28Z

👍


ljwolf · 2016-07-17T20:26:56Z

closed by #841

jlaura added 8 commits August 31, 2015 14:21

getis doc updates

aabd01e

doc updates to mapclassify

97001f9

Merge branch 'master' of https://github.com/jlaura/pysal

e20c1b4

merge upstream

1e6d778

Added a database IO driver based on SQLAlchemy and Geomet

e870fba

upstream

9d85c8b

removed printing from FileIO

d70a277

fixed typo in docos

0c55f47

sjsrey added the Discussion label Sep 10, 2015

levi.john.wolf@gmail.com and others added 16 commits April 1, 2016 09:51

add isKDTree typecomparison to handle divergent cKDTree and KDTree types

c2329ad

adding ckdtree

b54d0ef

add spint to contrib_docs

6c3ab6b

adding people to travis email list

ede9d5c

Merge pull request pysal#11 from dfolch/kdtree

96dc347

adding @dfolch ckdtree work 🎉

fixing travis email syntax

8e473af

adding 3.5 to travis

f6560ca

adding one more person to travis

0c0e043

adding yet another person to travis

5bb1725

adding cartodb

e83c706

Merge pull request pysal#788 from CartoDB/add-cartodb-to-projects

9d7a8a9

adding cartodb to projects

Merge pull request pysal#786 from ljwolf/kdtree

65a8812

fixing KDTree/cKDTree changes to API & adding ckdtree to Arc_KDTree

Merge pull request pysal#785 from dfolch/dev

f819b3c

adding people to travis email list

Merge pull request pysal#784 from TaylorOshan/contrib_docs

c26e770

add spint to contrib_docs

Merge branch 'master' into dev

7eb6361

clean up release instructions

c054473

sjsrey and others added 12 commits April 6, 2016 15:10

moving dev to 1.11.2dev

8f427d5

Merge pull request pysal#789 from sjsrey/dev

0d68f1a

Dev

adding note on support for Python-3 pysal#787

d608613

Local Moran was using the incorrect moments for the pseudo p-values i…

f1d9086

…n the standard normal approximation. Means and standard errors were taken over all local permutations rather than specific to each local value.

Merge pull request pysal#792 from sjsrey/b/lisaMoments

564047c

Local Moran was using the incorrect moments in z_sim and p_z_sim

Merge pull request pysal#791 from sjsrey/doc/rolling

da3adce

Doc/rolling

Getting weights doctests to pass

22516c4

Merge pull request pysal#793 from sjsrey/e/id

ccc0fa2

add changes to make doctests in weights submodule pass

fix for css problem on rtd pysal#790

818708a

fix for css problem on rtd pysal#790 @sjsrey

29b289b

Merge remote-tracking branch 'upstream/dev'

cd8fdad

Adding coveralls back in to the travis reqs.

e1b54ee

ljwolf mentioned this pull request Jul 16, 2016

[REBASE & REDIRECT] Conditional Database Imports & Docos, #692 #841

Merged

sjsrey added a commit that referenced this pull request Jul 16, 2016

Merge pull request #841 from ljwolf/pr692

d8bb749

[REBASE & REDIRECT] Conditional Database Imports & Docos, #692

ljwolf closed this Jul 17, 2016
