
pandas.io.gbq Version 2 #6937

Merged
merged 2 commits into from
Jun 30, 2014

Conversation

jacobschaer
Contributor

closes #5840 (as new interface makes it obsolete)
closes #6096

@jreback : We still have some documentation to work on, but we would like your initial thoughts on what we have so far. The key change for this version is the removal of bq.py as a dependency (except as a setup method for a test case). Instead, we rely entirely on the BigQuery Python API. We also simplified to_gbq() significantly. Though it cost a few features, the code is much more manageable and less error prone (thanks Google!). Test cases are much more granular and run significantly faster. To use the test cases fully, a BigQuery project_id is still required, though some of the unit tests can run offline.

@jreback
Contributor

jreback commented Apr 23, 2014

can u rebase on master?

also flip github switch for Travis

@jacobschaer
Contributor Author

I rebased to master, and Travis was turned on. I'm still working to get Travis passing - mostly just a matter of getting the right dependencies in place and being sure integration tests are skipped.

@jreback jreback added this to the 0.14.0 milestone Apr 23, 2014
@@ -4,8 +4,10 @@ python-dateutil==1.5
pytz==2013b
http://www.crummy.com/software/BeautifulSoup/bs4/download/4.2/beautifulsoup4-4.2.0.tar.gz
html5lib==1.0b2
bigquery==2.0.17
Contributor

don't you still need the bigquery package so that bq is installed? (or is that in the google-api-python-client?) What a horrible package name, Google!

Contributor

also if requirements are changing, pls update install.rst as well

Contributor Author

bigquery is only required for the to_gbq() test suite, which can't be run in CI anyway due to the lack of a valid project id. Will update install.rst soon.

@jreback
Contributor

jreback commented Apr 23, 2014

Is this going to close #5840?

@jacobschaer
Contributor Author

@jreback : I believe #5840 had already stopped reproducing due to a backend change in BigQuery. In any case, it's no longer an issue with our code, since we track the page tokens we've seen so far and ensure we don't get a duplicate.
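The duplicate-page guard described above can be sketched as follows. This is a hedged illustration, not the actual pandas.io.gbq code: `fetch_all_pages` and its `get_page` callback are hypothetical names standing in for the real paged-fetch loop against the BigQuery API.

```python
# Illustrative sketch of tracking seen page tokens so a duplicate page from
# the service is detected instead of silently duplicating rows.
def fetch_all_pages(get_page):
    """get_page(token) -> (rows, next_token); next_token is None on the last page."""
    rows, seen_tokens, token = [], set(), None
    while True:
        page_rows, next_token = get_page(token)
        rows.extend(page_rows)
        if next_token is None:
            break
        if next_token in seen_tokens:
            # The service handed back a page we already consumed: bail out
            # rather than append the same rows twice.
            raise RuntimeError("duplicate page token: %r" % next_token)
        seen_tokens.add(next_token)
        token = next_token
    return rows

# Simulated paged responses for illustration:
pages = {None: ([1, 2], "t1"), "t1": ([3, 4], "t2"), "t2": ([5], None)}
print(fetch_all_pages(pages.__getitem__))  # -> [1, 2, 3, 4, 5]
```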

self.assertEqual(len(df.drop_duplicates()), 200005)

@unittest.skipIf(missing_bq(), "Cannot run to_gbq tests without bq command line client")
@unittest.skipIf(PROJECT_ID is None, "Cannot run integration tests without a project id")
Contributor Author

These @unittest.skipIf decorators seem to cause backward-compatibility issues... Perhaps I should switch all of these over to whatever the nose equivalent is?

Contributor

pandas doesn't implement unittest.skipIf properly; make these regular tests and just raise nose.SkipTest("skipping message") when you skip. Skipped tests are indicated by Travis at the very end (so you can see that you skipped them).
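The skip pattern jreback suggests can be sketched like this. It is a hedged illustration: the test name and PROJECT_ID placeholder are hypothetical, and the sketch falls back to unittest.SkipTest when nose is not installed so it stays runnable.

```python
# Instead of @unittest.skipIf decorators, raise SkipTest inside the test
# body; nose reports these as skips at the end of the Travis run.
try:
    from nose import SkipTest
except ImportError:
    from unittest import SkipTest  # fallback so this sketch runs without nose

PROJECT_ID = None  # would normally come from the test configuration

def test_to_gbq_roundtrip():
    if PROJECT_ID is None:
        raise SkipTest("Cannot run integration tests without a project id")
    # ... real integration test body would go here ...
```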

@jreback
Contributor

jreback commented Apr 28, 2014

@jacobschaer are you doing the small doc change to close #6096 as well?

@jreback
Contributor

jreback commented Apr 28, 2014

@jacobschaer would like to get this in soon... how's it coming?

@jreback
Contributor

jreback commented May 1, 2014

@jacobschaer how's this coming?

@jreback jreback modified the milestones: 0.14.1, 0.14.0 May 1, 2014
@jreback
Contributor

jreback commented Jun 3, 2014

@jacobschaer pls rebase on master

@jacobschaer
Contributor Author

OK, all rebased and mostly squashed. Were there any documentation things we're missing?

into BigQuery and pull it into a DataFrame.
As an example, suppose you want to load all data from an existing BigQuery
table: `test_dataset.test_table` into a DataFrame using the :func:`~pandas.io.read_gbq`
function.

.. code-block:: python

from pandas.io import gbq
Member

I think read_gbq is already available as a top-level function. So also use it here as such?

@jreback
Contributor

jreback commented Jun 5, 2014

@jacobschaer

the 3.4 and numpy-dev builds are failing because they are not installing httplib2, which is now a de facto requirement of gbq. Those tests should be skipping (because of the import failure).

those builds need to pass (they are experimental just because the build matrix comes back faster)

@jacobschaer
Contributor Author

@jreback Seems 3.4 and numpy-dev builds now pass.

- ``io.gbq.read_gbq`` and ``io.gbq.to_gbq`` were refactored to remove the
dependency on the Google ``bq.py`` command line client. This submodule
now uses ``httplib2`` and the Google ``apiclient`` and ``oauth2client`` API client
libraries which should be more stable and, therefore, reliable than
Contributor
reference the issue number here (this PR number is ok)

Contributor

put this in the API changes section as well

Member

or in an 'experimental' (sub)section? (as it was and still is tagged as experimental?)

Contributor

yes...put that in the experimental section (down a little on the page in v0.14.1.txt)


Moved this to the experimental section and added issue number with this PR number.

@jreback
Contributor

jreback commented Jun 19, 2014

@jacobschaer pls squash down to a small number of commits (and remove merge branches), and rebase on master

@jreback
Contributor

jreback commented Jun 26, 2014

@jacobschaer can you rebase and squash?

…ng except unit testing. Minor API changes were also introduced.
@jacobschaer
Contributor Author

All rebased and squashed.

jreback added a commit that referenced this pull request Jun 30, 2014
@jreback jreback merged commit 5a07b7b into pandas-dev:master Jun 30, 2014
@jreback
Contributor

jreback commented Jun 30, 2014

@jacobschaer thanks for this!

pls review the docs when they are built in case a followup is needed. http://pandas-docs.github.io/pandas-docs-travis/

@jreback
Contributor

jreback commented Jul 8, 2014

@jacobschaer If you have a chance, can you break the big paragraph into, say, bullet points:

Finally, you can append data to a BigQuery table from a pandas DataFrame using the to_gbq() function. This function uses the Google streaming API, which requires that your destination table exists in BigQuery. Given that the BigQuery table already exists, your DataFrame should match the destination table in column order, structure, and data types. DataFrame indexes are not supported. By default, rows are streamed to BigQuery in chunks of 10,000 rows, but you can pass other chunk values via the chunksize argument. You can also see the progress of your post via the verbose flag, which defaults to True. The HTTP response code from Google BigQuery can be successful (200) even if the append failed. For this reason, if there is a failure to append to the table, the complete error response from BigQuery is returned, which can be quite long given that it provides a status for each row. You may want to start with smaller chunks to test that the size and types of your DataFrame match your destination table to make debugging simpler.

http://pandas-docs.github.io/pandas-docs-travis/io.html#io-bigquery
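The default chunking behaviour described in that paragraph can be sketched in a few lines. This is an illustrative stand-in only; `iter_chunks` is a hypothetical helper, and the real to_gbq streams each chunk to the BigQuery streaming API rather than just slicing a list.

```python
# Minimal sketch of splitting rows into chunks of at most `chunksize`
# (10,000 by default, matching the documented default for to_gbq).
def iter_chunks(rows, chunksize=10000):
    """Yield consecutive slices of `rows`, each at most `chunksize` long."""
    for start in range(0, len(rows), chunksize):
        yield rows[start:start + chunksize]

# 25,000 rows with the default chunk size yields chunks of 10000/10000/5000:
sizes = [len(chunk) for chunk in iter_chunks(list(range(25000)))]
print(sizes)  # -> [10000, 10000, 5000]
```

Starting with a smaller chunksize, as the paragraph suggests, simply means more, shorter slices, which makes a per-row error response from BigQuery easier to read.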

Development

Successfully merging this pull request may close these issues.

DOC: track status for stale comment in gbq docs
pandas.io.gbq.read_gbq() returns incorrect results
5 participants