CLN: remove pandas/io/gbq.py and tests and replace with pandas-gbq

closes pandas-dev#15347
jreback · Feb 27, 2017 · 3222de1 · 3222de1
1 parent fb7dc7d
commit 3222de1
Show file tree

Hide file tree

Showing 11 changed files with 110 additions and 2,679 deletions.
diff --git a/ci/requirements-2.7.pip b/ci/requirements-2.7.pip
@@ -1,8 +1,5 @@
 blosc
-httplib2
-google-api-python-client==1.2
-python-gflags==2.0
-oauth2client==1.5.0
+pandas-gbq
 pathlib
 backports.lzma
 py

diff --git a/ci/requirements-3.4.pip b/ci/requirements-3.4.pip
@@ -1,5 +1,2 @@
 python-dateutil==2.2
 blosc
-httplib2
-google-api-python-client
-oauth2client
diff --git a/ci/requirements-3.4_SLOW.pip b/ci/requirements-3.4_SLOW.pip
diff --git a/ci/requirements-3.5.pip b/ci/requirements-3.5.pip
@@ -1 +1,2 @@
 xarray==0.9.1
+pandas-gbq
diff --git a/doc/source/io.rst b/doc/source/io.rst
@@ -4652,293 +4652,12 @@ And then issue the following queries:
 Google BigQuery
 ---------------
 
-.. versionadded:: 0.13.0
-
-The :mod:`pandas.io.gbq` module provides a wrapper for Google's BigQuery
-analytics web service to simplify retrieving results from BigQuery tables
-using SQL-like queries. Result sets are parsed into a pandas
-DataFrame with a shape and data types derived from the source table.
-Additionally, DataFrames can be inserted into new BigQuery tables or appended
-to existing tables.
-
-.. warning::
-
-   To use this module, you will need a valid BigQuery account. Refer to the
-   `BigQuery Documentation <https://cloud.google.com/bigquery/what-is-bigquery>`__
-   for details on the service itself.
-
-The key functions are:
-
-.. currentmodule:: pandas.io.gbq
-
-.. autosummary::
-   :toctree: generated/
-
-   read_gbq
-   to_gbq
-
-.. currentmodule:: pandas
-
-
-Supported Data Types
-''''''''''''''''''''
-
-Pandas supports all these `BigQuery data types <https://cloud.google.com/bigquery/data-types>`__:
-``STRING``, ``INTEGER`` (64bit), ``FLOAT`` (64 bit), ``BOOLEAN`` and
-``TIMESTAMP`` (microsecond precision). Data types ``BYTES`` and ``RECORD``
-are not supported.
-
-Integer and boolean ``NA`` handling
-'''''''''''''''''''''''''''''''''''
-
-.. versionadded:: 0.20
-
-Since all columns in BigQuery queries are nullable, and NumPy lacks of ``NA``
-support for integer and boolean types, this module will store ``INTEGER`` or
-``BOOLEAN`` columns with at least one ``NULL`` value as ``dtype=object``.
-Otherwise those columns will be stored as ``dtype=int64`` or ``dtype=bool``
-respectively.
-
-This is opposite to default pandas behaviour which will promote integer
-type to float in order to store NAs. See the :ref:`gotchas<gotchas.intna>`
-for detailed explaination.
-
-While this trade-off works well for most cases, it breaks down for storing
-values greater than 2**53. Such values in BigQuery can represent identifiers
-and unnoticed precision lost for identifier is what we want to avoid.
-
-.. _io.bigquery_deps:
-
-Dependencies
-''''''''''''
-
-This module requires following additional dependencies:
-
-- `httplib2 <https://github.com/httplib2/httplib2>`__: HTTP client
-- `google-api-python-client <http://github.com/google/google-api-python-client>`__: Google's API client
-- `oauth2client <https://github.com/google/oauth2client>`__: authentication and authorization for Google's API
-
-.. _io.bigquery_authentication:
-
-Authentication
-''''''''''''''
-
-.. versionadded:: 0.18.0
-
-Authentication to the Google ``BigQuery`` service is via ``OAuth 2.0``.
-Is possible to authenticate with either user account credentials or service account credentials.
-
-Authenticating with user account credentials is as simple as following the prompts in a browser window
-which will be automatically opened for you. You will be authenticated to the specified
-``BigQuery`` account using the product name ``pandas GBQ``. It is only possible on local host.
-The remote authentication using user account credentials is not currently supported in pandas.
-Additional information on the authentication mechanism can be found
-`here <https://developers.google.com/identity/protocols/OAuth2#clientside/>`__.
-
-Authentication with service account credentials is possible via the `'private_key'` parameter. This method
-is particularly useful when working on remote servers (eg. jupyter iPython notebook on remote host).
-Additional information on service accounts can be found
-`here <https://developers.google.com/identity/protocols/OAuth2#serviceaccount>`__.
-
-Authentication via ``application default credentials`` is also possible. This is only valid
-if the parameter ``private_key`` is not provided. This method also requires that
-the credentials can be fetched from the environment the code is running in.
-Otherwise, the OAuth2 client-side authentication is used.
-Additional information on
-`application default credentials <https://developers.google.com/identity/protocols/application-default-credentials>`__.
-
-.. versionadded:: 0.19.0
-
-.. note::
-
-   The `'private_key'` parameter can be set to either the file path of the service account key
-   in JSON format, or key contents of the service account key in JSON format.
-
-.. note::
-
-   A private key can be obtained from the Google developers console by clicking
-   `here <https://console.developers.google.com/permissions/serviceaccounts>`__. Use JSON key type.
-
-.. _io.bigquery_reader:
-
-Querying
-''''''''
-
-Suppose you want to load all data from an existing BigQuery table : `test_dataset.test_table`
-into a DataFrame using the :func:`~pandas.io.gbq.read_gbq` function.
-
-.. code-block:: python
-
-   # Insert your BigQuery Project ID Here
-   # Can be found in the Google web console
-   projectid = "xxxxxxxx"
-
-   data_frame = pd.read_gbq('SELECT * FROM test_dataset.test_table', projectid)
-
-
-You can define which column from BigQuery to use as an index in the
-destination DataFrame as well as a preferred column order as follows:
-
-.. code-block:: python
-
-   data_frame = pd.read_gbq('SELECT * FROM test_dataset.test_table',
-                             index_col='index_column_name',
-                             col_order=['col1', 'col2', 'col3'], projectid)
-
-
-Starting with 0.20.0, you can specify the query config as parameter to use additional options of your job.
-For more information about query configuration parameters see
-`here <https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query>`__.
-
-.. code-block:: python
-
-   configuration = {
-      'query': {
-        "useQueryCache": False
-      }
-   }
-   data_frame = pd.read_gbq('SELECT * FROM test_dataset.test_table',
-                             configuration=configuration, projectid)
-
-
-.. note::
-
-   You can find your project id in the `Google developers console <https://console.developers.google.com>`__.
-
-
-.. note::
-
-   You can toggle the verbose output via the ``verbose`` flag which defaults to ``True``.
-
-.. note::
-
-    The ``dialect`` argument can be used to indicate whether to use BigQuery's ``'legacy'`` SQL
-    or BigQuery's ``'standard'`` SQL (beta). The default value is ``'legacy'``. For more information
-    on BigQuery's standard SQL, see `BigQuery SQL Reference
-    <https://cloud.google.com/bigquery/sql-reference/>`__
-
-.. _io.bigquery_writer:
-
-Writing DataFrames
-''''''''''''''''''
-
-Assume we want to write a DataFrame ``df`` into a BigQuery table using :func:`~pandas.DataFrame.to_gbq`.
-
-.. ipython:: python
-
-   df = pd.DataFrame({'my_string': list('abc'),
-                      'my_int64': list(range(1, 4)),
-                      'my_float64': np.arange(4.0, 7.0),
-                      'my_bool1': [True, False, True],
-                      'my_bool2': [False, True, False],
-                      'my_dates': pd.date_range('now', periods=3)})
-
-   df
-   df.dtypes
-
-.. code-block:: python
-
-   df.to_gbq('my_dataset.my_table', projectid)
-
-.. note::
-
-   The destination table and destination dataset will automatically be created if they do not already exist.
-
-The ``if_exists`` argument can be used to dictate whether to ``'fail'``, ``'replace'``
-or ``'append'`` if the destination table already exists. The default value is ``'fail'``.
-
-For example, assume that ``if_exists`` is set to ``'fail'``. The following snippet will raise
-a ``TableCreationError`` if the destination table already exists.
-
-.. code-block:: python
-
-   df.to_gbq('my_dataset.my_table', projectid, if_exists='fail')
-
-.. note::
-
-   If the ``if_exists`` argument is set to ``'append'``, the destination dataframe will
-   be written to the table using the defined table schema and column types. The
-   dataframe must match the destination table in structure and data types.
-   If the ``if_exists`` argument is set to ``'replace'``, and the existing table has a
-   different schema, a delay of 2 minutes will be forced to ensure that the new schema
-   has propagated in the Google environment. See
-   `Google BigQuery issue 191 <https://code.google.com/p/google-bigquery/issues/detail?id=191>`__.
-
-Writing large DataFrames can result in errors due to size limitations being exceeded.
-This can be avoided by setting the ``chunksize`` argument when calling :func:`~pandas.DataFrame.to_gbq`.
-For example, the following writes ``df`` to a BigQuery table in batches of 10000 rows at a time:
-
-.. code-block:: python
-
-   df.to_gbq('my_dataset.my_table', projectid, chunksize=10000)
-
-You can also see the progress of your post via the ``verbose`` flag which defaults to ``True``.
-For example:
-
-.. code-block:: python
-
-   In [8]: df.to_gbq('my_dataset.my_table', projectid, chunksize=10000, verbose=True)
-
-           Streaming Insert is 10% Complete
-           Streaming Insert is 20% Complete
-           Streaming Insert is 30% Complete
-           Streaming Insert is 40% Complete
-           Streaming Insert is 50% Complete
-           Streaming Insert is 60% Complete
-           Streaming Insert is 70% Complete
-           Streaming Insert is 80% Complete
-           Streaming Insert is 90% Complete
-           Streaming Insert is 100% Complete
-
-.. note::
-
-   If an error occurs while streaming data to BigQuery, see
-   `Troubleshooting BigQuery Errors <https://cloud.google.com/bigquery/troubleshooting-errors>`__.
-
-.. note::
-
-   The BigQuery SQL query language has some oddities, see the
-   `BigQuery Query Reference Documentation <https://cloud.google.com/bigquery/query-reference>`__.
-
-.. note::
-
-   While BigQuery uses SQL-like syntax, it has some important differences from traditional
-   databases both in functionality, API limitations (size and quantity of queries or uploads),
-   and how Google charges for use of the service. You should refer to `Google BigQuery documentation <https://cloud.google.com/bigquery/what-is-bigquery>`__
-   often as the service seems to be changing and evolving. BiqQuery is best for analyzing large
-   sets of data quickly, but it is not a direct replacement for a transactional database.
-
-.. _io.bigquery_create_tables:
-
-Creating BigQuery Tables
-''''''''''''''''''''''''
-
 .. warning::
 
-   As of 0.17, the function :func:`~pandas.io.gbq.generate_bq_schema` has been deprecated and will be
-   removed in a future version.
-
-As of 0.15.2, the gbq module has a function :func:`~pandas.io.gbq.generate_bq_schema` which will
-produce the dictionary representation schema of the specified pandas DataFrame.
-
-.. code-block:: ipython
-
-   In [10]: gbq.generate_bq_schema(df, default_type='STRING')
-
-   Out[10]: {'fields': [{'name': 'my_bool1', 'type': 'BOOLEAN'},
-            {'name': 'my_bool2', 'type': 'BOOLEAN'},
-            {'name': 'my_dates', 'type': 'TIMESTAMP'},
-            {'name': 'my_float64', 'type': 'FLOAT'},
-            {'name': 'my_int64', 'type': 'INTEGER'},
-            {'name': 'my_string', 'type': 'STRING'}]}
-
-.. note::
-
-   If you delete and re-create a BigQuery table with the same name, but different table schema,
-   you must wait 2 minutes before streaming data into the table. As a workaround, consider creating
-   the new table with a different name. Refer to
-   `Google BigQuery issue 191 <https://code.google.com/p/google-bigquery/issues/detail?id=191>`__.
+   Starting in 0.20.0, pandas has split off Google BigQuery support into the
+   separate package ``pandas-gbq``. You can ``pip install pandas-gbq`` to get it.
 
+   Documentation is now hosted `here <https://pandas-gbq.readthedocs.io/>`__
 
 .. _io.stata:
 

diff --git a/doc/source/whatsnew/v0.20.0.txt b/doc/source/whatsnew/v0.20.0.txt
@@ -360,6 +360,15 @@ New Behavior:
   In [5]: df['a']['2011-12-31 23:59:59']
   Out[5]: 1
 
+.. _whatsnew_0200.api_breaking.gbq:
+
+Pandas Google BigQuery support has moved
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+pandas has split off Google BigQuery support into a separate package ``pandas-gbq``. You can ``pip install pandas-gbq`` to get it.
+The functionality of ``pd.read_gbq()`` and ``.to_gbq()`` remains the same with the currently released version of ``pandas-gbq=0.1.2``. (:issue:`15347`)
+Documentation is now hosted `here <https://pandas-gbq.readthedocs.io/>`__
+
 .. _whatsnew_0200.api_breaking.memory_usage:
 
 Memory Usage for Index is more Accurate

diff --git a/pandas/core/frame.py b/pandas/core/frame.py
@@ -77,7 +77,8 @@
                            OrderedDict, raise_with_traceback)
 from pandas import compat
 from pandas.compat.numpy import function as nv
-from pandas.util.decorators import deprecate_kwarg, Appender, Substitution
+from pandas.util.decorators import (deprecate_kwarg, Appender,
+                                    Substitution, docstring_wrapper)
 from pandas.util.validators import validate_bool_kwarg
 
 from pandas.tseries.period import PeriodIndex
@@ -941,6 +942,12 @@ def to_gbq(self, destination_table, project_id, chunksize=10000,
                           chunksize=chunksize, verbose=verbose, reauth=reauth,
                           if_exists=if_exists, private_key=private_key)
 
+    def _f():
+        from pandas.io.gbq import _try_import
+        return _try_import().to_gbq.__doc__
+    to_gbq = docstring_wrapper(
+        to_gbq, _f, default='the pandas_gbq package is not installed')
+
     @classmethod
     def from_records(cls, data, index=None, exclude=None, columns=None,
                      coerce_float=False, nrows=None):