to_gbq: Allow creation of new tables from DataFrame (and generate schema) #8325

Closed
jtratner opened this issue Sep 19, 2014 · 5 comments

@jtratner
Contributor

commented Sep 19, 2014

A small extension on top of to_gbq so that you can create new tables given only an existing DataFrame. Given an arbitrary DataFrame with a non-hierarchical index, create a schema from it. For now, we'd likely assume that object dtype columns are strings, and maybe allow specifying some or all columns of the schema so that int columns with nulls come out correctly (otherwise they'd be coerced to float columns, because NaN forces a float dtype).

E.g.:

In [6]: import pandas as pd

In [7]: import pandas.util.testing as testing

In [8]: df = testing.makeMixedDataFrame()

In [9]: df
Out[9]:
   A  B     C          D
0  0  0  foo1 2009-01-01
1  1  1  foo2 2009-01-02
2  2  0  foo3 2009-01-05
3  3  1  foo4 2009-01-06
4  4  0  foo5 2009-01-07

In [10]: df.dtypes
Out[10]:
A           float64
B           float64
C            object
D    datetime64[ns]
dtype: object

Then you could do something like:

In [11]: generate_bq_schema(df)
Out[11]:
{'fields': [{'name': 'A', 'type': 'FLOAT'},
  {'name': 'B', 'type': 'FLOAT'},
  {'name': 'C', 'type': 'STRING'},
  {'name': 'D', 'type': 'TIMESTAMP'}]}
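
A minimal sketch of what such a function could look like, assuming a simple lookup from NumPy dtype kind to BigQuery type name. The mapping table, the `overrides` parameter, and the fallback to STRING are illustrative assumptions, not the implementation that was eventually merged into pandas:

```python
import pandas as pd

# Assumed mapping from NumPy dtype "kind" codes to BigQuery type names.
_KIND_TO_BQ_TYPE = {
    "i": "INTEGER",
    "u": "INTEGER",
    "b": "BOOLEAN",
    "f": "FLOAT",
    "O": "STRING",  # object columns are assumed to hold strings
    "M": "TIMESTAMP",
}


def generate_bq_schema(df, overrides=None):
    """Build a BigQuery-style schema dict from a DataFrame's column dtypes.

    `overrides` is a hypothetical parameter mapping column name -> BigQuery
    type, e.g. to declare an int column that pandas stores as float64
    because it contains nulls.
    """
    overrides = overrides or {}
    fields = []
    for name, dtype in df.dtypes.items():
        # Unknown kinds fall back to STRING in this sketch.
        inferred = _KIND_TO_BQ_TYPE.get(dtype.kind, "STRING")
        fields.append({"name": str(name), "type": overrides.get(name, inferred)})
    return {"fields": fields}


df = pd.DataFrame({
    "A": [0.0, 1.0, 2.0],
    "C": ["foo1", "foo2", "foo3"],
    "D": pd.to_datetime(["2009-01-01", "2009-01-02", "2009-01-05"]),
    "E": [1, None, 3],  # ints with a null: pandas stores this as float64
})

# Without an override, "E" is reported as FLOAT; the override restores INTEGER.
schema = generate_bq_schema(df, overrides={"E": "INTEGER"})
```

The override only changes the declared type in the schema; the column values themselves would still need converting before upload.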

With a named index, that could be added to the schema as well. For now, we could stick to requiring a non-hierarchical index (no MultiIndex), but maybe we could use record types for a MultiIndex in the future?
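
One way the named-index case might be handled, as a sketch: turn the index into an ordinary column with `reset_index()` before inferring types, and reject a MultiIndex outright. The function name `schema_with_index` and the dtype mapping are hypothetical, chosen only for this example:

```python
import pandas as pd

# Assumed dtype-kind -> BigQuery type mapping, for illustration only.
KIND_TO_BQ = {"i": "INTEGER", "u": "INTEGER", "b": "BOOLEAN",
              "f": "FLOAT", "M": "TIMESTAMP"}


def schema_with_index(df):
    """Schema dict that includes a named, non-hierarchical index as a field."""
    if isinstance(df.index, pd.MultiIndex):
        raise NotImplementedError("MultiIndex is not supported in this sketch")
    # A named index becomes the first column; an unnamed index is left out.
    flat = df.reset_index() if df.index.name is not None else df
    return {"fields": [
        {"name": str(col), "type": KIND_TO_BQ.get(dtype.kind, "STRING")}
        for col, dtype in flat.dtypes.items()
    ]}


df = pd.DataFrame({"value": [1.5, 2.5]}, index=pd.Index([10, 20], name="id"))
schema = schema_with_index(df)
```

Here the `id` index ends up as the first field of the schema, ahead of `value`.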

@jtratner

Contributor Author

commented Sep 19, 2014

cc @jacobschaer - I think you're the main one to ask on this?

@jtratner

Contributor Author

commented Sep 19, 2014

@jacobschaer for context - there's a Bloomberg Hackathon that's happening next Saturday and I'm thinking this could be a good project for someone who uses BigQuery and/or Pandas

@jacobschaer

Contributor

commented Sep 19, 2014

@jtratner

Contributor Author

commented Sep 19, 2014

yeah, it's certainly not that complicated, it would just make it easier for
people to do slap-dash writes to BigQuery. That SO post is probably
80% of the way there too.

On Fri, Sep 19, 2014 at 3:46 PM, Jacob Schaer notifications@github.com
wrote:

Sounds like something someone asked a while ago on Stack. See

http://stackoverflow.com/questions/21886742/convert-pandas-dtypes-to-bigquery-type-representation


@ghost

commented Aug 17, 2015

I'm currently using pandas for a project and would really like to see a feature that lets users create new tables in Google BigQuery using to_gbq. I noticed that the ability to create tables from a schema was removed in #6937.

I would like to try and develop this feature if no one else is working on it.

jreback added a commit that referenced this issue Sep 13, 2015

Merge pull request #10857 from parthea/allow-creation-of-gbq-tables
ENH: #8325 Add ability to create tables using the gbq module.

yarikoptic added a commit to neurodebian/pandas that referenced this issue Sep 16, 2015

Merge commit 'v0.17.0rc1-40-gd1feb49' into debian
* commit 'v0.17.0rc1-40-gd1feb49': (394 commits)
  DOC: fix ref to template for plot accessor
  ENH Move check for inferred compression to before `get_filepath_or_buffer`
  CI: add py3.5 build
  ENH Enable streaming from S3
  Fix Series.nunique groupby with object
  DOC: Update perf doc for 10953
  TST: Fix skipped unit tests in test_ga. Install python-gflags using pip. pandas-dev#11090
  ENH Recognize 's3n' and 's3a' as an S3 address
  DOC: Comparison with SAS
  BUG: Use StrictVersion instead of LooseVersion when testing for minimum google api client version pandas-dev#10652
  BLD: Install google-api-python-client and httplib2 using pip
  ENH: Add ability to create tables using the gbq module. pandas-dev#8325
  TST: make sure to close stata readers
  asv bench cleanup - groupby
  DOC: fix plot submethods whatsnew example
  CI: support *.pip for installations
  DOC: Modified incorrect doc-string for DataFrameFormatter and removed outdated doc-string (+1 squashed commit) Squashed commits: [068b1fd] DOC: Modified incorrect doc-string for DataFrameFormatter using new doc-string design  (+1 squashed commit) Squashed commits: [12e032d] DOC: Updated doc-string using new doc-string design for DataFrameFormatter
  ENH Enable bzip2 streaming for Python 3
  DOC: update release.rst with the highlites
  DOC: Categorize whatsnew
  ...