ENH: Add parameter to download BigQuery results with the BigQuery Storage API #26104

Merged
jreback merged 1 commit into pandas-dev:master from tswast:pandas-gbq-0.10.0 on Apr 20, 2019

@tswast
Contributor

commented Apr 15, 2019

pandas-gbq 0.10.0 adds a new use_bqstorage_api parameter to speed up downloads of large dataframes.

  • closes #xxxx
  • tests added / passed
$ pytest pandas/tests/io/test_gbq.py
=============================== test session starts ================================
platform darwin -- Python 3.6.4, pytest-4.4.1, py-1.5.3, pluggy-0.9.0
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/swast/src/pandas/pandas/.hypothesis/examples')
rootdir: /Users/swast/src/pandas/pandas, inifile: setup.cfg
plugins: xdist-1.22.2, forked-0.2, cov-2.5.1, hypothesis-3.70.3
collected 1 item                                                                   

pandas/tests/io/test_gbq.py .                                                [100%]

============================= 1 passed in 9.84 seconds =============================
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@tswast tswast force-pushed the tswast:pandas-gbq-0.10.0 branch from 454cd4e to 31bb4a4 Apr 15, 2019

@tswast tswast marked this pull request as ready for review Apr 15, 2019

@@ -191,6 +191,8 @@ Optional libraries below the lowest tested version may still work, but are not c
+-----------------+-----------------+
| openpyxl        | 2.4.0           |
+-----------------+-----------------+
| pandas-gbq      | 0.10.0          |
+-----------------+-----------------+

@jreback

jreback Apr 15, 2019

Contributor

Why are we requiring this? Shouldn't the older versions still work OK? Forcing users to upgrade is very disruptive.

@tswast

tswast Apr 15, 2019

Author Contributor

The problem was that older versions don't know about use_bqstorage_api. I've updated the PR to populate a dictionary for new and deprecated kwargs only if they are explicitly set.
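
A minimal sketch of that idea, not the code from this PR: the wrapper and both stand-in backends below are hypothetical names used only for illustration. The optional keyword is forwarded only when the caller sets it, so a backend that predates `use_bqstorage_api` keeps working for existing calls.

```python
def old_backend_read_gbq(query, project_id=None):
    """Stand-in for an older backend that predates `use_bqstorage_api`."""
    return {"query": query, "project_id": project_id}


def new_backend_read_gbq(query, project_id=None, use_bqstorage_api=False):
    """Stand-in for a newer backend that accepts `use_bqstorage_api`."""
    return {
        "query": query,
        "project_id": project_id,
        "use_bqstorage_api": use_bqstorage_api,
    }


def read_gbq(query, project_id=None, use_bqstorage_api=None,
             backend=new_backend_read_gbq):
    """Hypothetical wrapper: forward new kwargs only if explicitly set."""
    kwargs = {}
    # None means "not set by the caller", so the keyword is not forwarded
    # and the backend falls back to its own default.
    if use_bqstorage_api is not None:
        kwargs["use_bqstorage_api"] = use_bqstorage_api
    return backend(query, project_id=project_id, **kwargs)
```

With this shape, `read_gbq("SELECT 1", backend=old_backend_read_gbq)` succeeds, while passing `use_bqstorage_api=True` to the old stand-in raises `TypeError`, which is the signal that an upgrade is needed only for the new feature.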

@jreback

jreback Apr 15, 2019

Contributor

Of course, but the point is that we cannot break prior installs.

So you need to make things backward compatible in pandas-gbq itself.

We are not going to constantly raise the minimum supported version.

@tswast

tswast Apr 15, 2019

Author Contributor

It is backwards compatible so long as we do as I've done in 9581ddf

Or do you mean old versions of pandas-gbq should accept and ignore arbitrary kwargs?
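
For comparison, the alternative raised in that question could look roughly like the sketch below. This is an assumption about what "accept and ignore" might mean, with a hypothetical function name; it is not how pandas-gbq is known to behave.

```python
import warnings


def tolerant_read_gbq(query, project_id=None, **unknown_kwargs):
    """Hypothetical backend that tolerates keywords it does not know."""
    if unknown_kwargs:
        # Warn rather than fail, so callers learn the option had no effect.
        warnings.warn(
            "Ignoring unsupported arguments: "
            + ", ".join(sorted(unknown_kwargs)),
            UserWarning,
        )
    return {"query": query, "project_id": project_id}
```

The trade-off is that a caller who passes `use_bqstorage_api=True` gets a result without the feature actually being used, which is why forwarding the keyword only when it is set may be the less surprising choice.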

@codecov

commented Apr 15, 2019

Codecov Report

Merging #26104 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #26104      +/-   ##
==========================================
- Coverage   91.99%   91.98%   -0.01%     
==========================================
  Files         175      175              
  Lines       52384    52387       +3     
==========================================
- Hits        48189    48188       -1     
- Misses       4195     4199       +4

Flag        Coverage Δ
#multiple   90.53% <100%> (ø) ⬆️
#single     40.73% <0%> (-0.14%) ⬇️

Impacted Files         Coverage Δ
pandas/io/gbq.py       78.94% <100%> (-8.56%) ⬇️
pandas/core/frame.py   96.9% <0%> (-0.12%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8b60ebb...c08c7b6.

@tswast tswast changed the title from "ENH: Update pandas-gbq to 0.10.0." to "ENH: Add parameter to download BigQuery results with the BigQuery Storage API" Apr 17, 2019

ENH: Add parameter to download BigQuery results with the BigQuery Storage API

Adds new `use_bqstorage_api` parameter to `read_gbq`. This can speed up
downloads of large data frames.

@tswast tswast force-pushed the tswast:pandas-gbq-0.10.0 branch from 6786682 to c08c7b6 Apr 17, 2019

@tswast

Contributor Author

commented Apr 17, 2019

@jreback I've updated the PR to only populate the new kwargs if they are explicitly set. This should prevent forced upgrades unless someone wants to use the new feature. Ready for another review.

@tswast

Contributor Author

commented Apr 19, 2019

All tests and linters are passing. Ready for another review.

@jreback

Contributor

commented Apr 19, 2019

Make sure we have a CI run for both the oldest supported version and the newest.

@tswast

Contributor Author

commented Apr 19, 2019

The GBQ tests are running on Travis.

@jreback jreback added this to the 0.25.0 milestone Apr 20, 2019

@jreback jreback merged commit 7706741 into pandas-dev:master Apr 20, 2019

11 checks passed

  • continuous-integration/travis-ci/pr: The Travis CI build passed
  • pandas-dev.pandas: Build #20190417.25 succeeded
  • pandas-dev.pandas (Checks_and_doc): Checks_and_doc succeeded
  • pandas-dev.pandas (Linux py35_compat): Linux py35_compat succeeded
  • pandas-dev.pandas (Linux py36_locale_slow): Linux py36_locale_slow succeeded
  • pandas-dev.pandas (Linux py36_locale_slow_old_np): Linux py36_locale_slow_old_np succeeded
  • pandas-dev.pandas (Linux py37_locale): Linux py37_locale succeeded
  • pandas-dev.pandas (Linux py37_np_dev): Linux py37_np_dev succeeded
  • pandas-dev.pandas (Windows py36_np15): Windows py36_np15 succeeded
  • pandas-dev.pandas (Windows py37_np141): Windows py37_np141 succeeded
  • pandas-dev.pandas (macOS py35_macos): macOS py35_macos succeeded

@jreback

Contributor

commented Apr 20, 2019

thanks @tswast

yhaque1213 added a commit to yhaque1213/pandas that referenced this pull request Apr 22, 2019

ENH: Add parameter to download BigQuery results with the BigQuery Storage API (pandas-dev#26104)

Adds new `use_bqstorage_api` parameter to `read_gbq`. This can speed up
downloads of large data frames.

ryanreh99 added a commit to ryanreh99/pandas that referenced this pull request Apr 22, 2019

ENH: Add parameter to download BigQuery results with the BigQuery Storage API (pandas-dev#26104)

Adds new `use_bqstorage_api` parameter to `read_gbq`. This can speed up
downloads of large data frames.

@tswast tswast deleted the tswast:pandas-gbq-0.10.0 branch Apr 22, 2019
