ENH: Add parameter to download BigQuery results with the BigQuery Storage API #26104

Merged
merged 1 commit into pandas-dev:master on Apr 20, 2019

Contributor

@tswast tswast commented Apr 15, 2019

pandas-gbq 0.10.0 adds a new use_bqstorage_api parameter to speed up downloads of large dataframes.
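For illustration, here is a minimal sketch of opting in to the new parameter when calling pandas-gbq directly. The `download_table` helper and its `reader` argument are hypothetical names introduced here; only `pandas_gbq.read_gbq` and its `use_bqstorage_api` keyword come from the library. A real run assumes valid BigQuery credentials and that the extra BigQuery Storage client dependencies are installed.

```python
def download_table(query, project_id, reader=None):
    """Run `query` and return the result, opting in to the BigQuery Storage API.

    `reader` is a hypothetical injection point so the call can be exercised
    without network access; by default pandas_gbq.read_gbq is used.
    """
    if reader is None:
        # Imported lazily so this sketch loads even without pandas-gbq installed.
        import pandas_gbq

        reader = pandas_gbq.read_gbq

    # The use_bqstorage_api keyword requires pandas-gbq >= 0.10.0.
    return reader(query, project_id=project_id, use_bqstorage_api=True)
```

With a placeholder query and project ID, a call would look like `download_table("SELECT ...", "my-project")` and return a DataFrame.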

  • closes #xxxx
  • tests added / passed
$ pytest pandas/tests/io/test_gbq.py
=============================== test session starts ================================
platform darwin -- Python 3.6.4, pytest-4.4.1, py-1.5.3, pluggy-0.9.0
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/swast/src/pandas/pandas/.hypothesis/examples')
rootdir: /Users/swast/src/pandas/pandas, inifile: setup.cfg
plugins: xdist-1.22.2, forked-0.2, cov-2.5.1, hypothesis-3.70.3
collected 1 item                                                                   

pandas/tests/io/test_gbq.py .                                                [100%]

============================= 1 passed in 9.84 seconds =============================
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@tswast tswast marked this pull request as ready for review April 15, 2019 21:25
@@ -191,6 +191,8 @@ Optional libraries below the lowest tested version may still work, but are not c
+-----------------+-----------------+
| openpyxl | 2.4.0 |
+-----------------+-----------------+
| pandas-gbq | 0.10.0 |
+-----------------+-----------------+
Contributor

Why are we requiring this? Shouldn't the older versions still work? Forcing users to upgrade is very disruptive.

Contributor Author

The problem was that older versions don't know about use_bqstorage_api. I've updated the PR to populate a dictionary for new and deprecated kwargs only if they are explicitly set.
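The approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: `read_gbq_compat`, `old_backend`, and `new_backend` are hypothetical stand-ins for the pandas wrapper and for older and newer pandas-gbq releases.

```python
def read_gbq_compat(backend, query, project_id=None,
                    use_bqstorage_api=None, verbose=None):
    """Forward optional keywords to `backend` only when the caller set them."""
    kwargs = {}

    # New kwargs: leave out unless explicitly set, so older backends
    # never see a keyword they do not understand.
    if use_bqstorage_api is not None:
        kwargs["use_bqstorage_api"] = use_bqstorage_api

    # Deprecated kwargs: leave out unless explicitly set, so newer
    # backends that dropped them keep working.
    if verbose is not None:
        kwargs["verbose"] = verbose

    return backend(query, project_id=project_id, **kwargs)


def old_backend(query, project_id=None, verbose=None):
    """Hypothetical older reader that predates use_bqstorage_api."""
    return ("old", query, project_id)


def new_backend(query, project_id=None, use_bqstorage_api=False):
    """Hypothetical newer reader that knows use_bqstorage_api."""
    return ("new", query, use_bqstorage_api)
```

With this shape, `read_gbq_compat(old_backend, "SELECT 1")` works unchanged, while passing `use_bqstorage_api=True` to the old backend raises `TypeError`. An upgrade is therefore only needed by callers who ask for the new feature.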

Contributor

Of course, but the point is that we cannot break prior installs.

So you need to make things backward compatible in pandas-gbq itself.

We are not going to constantly raise the minimum supported version.

Contributor Author

It is backwards compatible so long as we do as I've done in 9581ddf

Or do you mean old versions of pandas-gbq should accept and ignore arbitrary kwargs?
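To make the alternative in that question concrete, here is a hedged sketch of a reader that accepts and ignores keywords it does not recognize. `tolerant_read_gbq` is a hypothetical function, not part of pandas-gbq, and the warning text is an assumption.

```python
import warnings


def tolerant_read_gbq(query, project_id=None, **unknown_kwargs):
    """Hypothetical reader that tolerates keywords it does not know about."""
    if unknown_kwargs:
        # Warn instead of raising TypeError so newer callers keep working.
        names = ", ".join(sorted(unknown_kwargs))
        warnings.warn(
            "Ignoring unsupported keyword arguments: " + names,
            UserWarning,
        )
    return (query, project_id)
```

The trade-off is that a misspelled or unsupported option is silently dropped apart from the warning, and a change like this could only help releases made after it.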


codecov bot commented Apr 15, 2019

Codecov Report

Merging #26104 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #26104      +/-   ##
==========================================
- Coverage   91.99%   91.98%   -0.01%     
==========================================
  Files         175      175              
  Lines       52384    52387       +3     
==========================================
- Hits        48189    48188       -1     
- Misses       4195     4199       +4
| Flag | Coverage Δ |
|---|---|
| #multiple | 90.53% <100%> (ø) ⬆️ |
| #single | 40.73% <0%> (-0.14%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/io/gbq.py | 78.94% <100%> (-8.56%) ⬇️ |
| pandas/core/frame.py | 96.9% <0%> (-0.12%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8b60ebb...c08c7b6.

@tswast tswast changed the title from "ENH: Update pandas-gbq to 0.10.0." to "ENH: Add parameter to download BigQuery results with the BigQuery Storage API" on Apr 17, 2019
…rage API

Adds new `use_bqstorage_api` parameter to `read_gbq`. This can speed up
downloads of large data frames.
Contributor Author

tswast commented Apr 17, 2019

@jreback I've updated the PR to only populate the new kwargs if they are explicitly set. This should prevent forced upgrades unless someone wants to use the new feature. Ready for another review.

Contributor Author

tswast commented Apr 19, 2019

All tests and linters are passing. Ready for another review.

Contributor

jreback commented Apr 19, 2019

Make sure we have a CI run for both the oldest supported version and the newest.

Contributor Author

tswast commented Apr 19, 2019

The GBQ tests are running on Travis.

@jreback jreback added this to the 0.25.0 milestone Apr 20, 2019
@jreback jreback merged commit 7706741 into pandas-dev:master Apr 20, 2019
Contributor

jreback commented Apr 20, 2019

thanks @tswast

yhaque1213 pushed a commit to yhaque1213/pandas that referenced this pull request Apr 22, 2019
…rage API (pandas-dev#26104)

ryanreh99 pushed a commit to ryanreh99/pandas that referenced this pull request Apr 22, 2019
…rage API (pandas-dev#26104)

@tswast tswast deleted the pandas-gbq-0.10.0 branch April 22, 2019 15:54