# Python BigQuery Client
## < IN PROGRESS >

Resources:
- Using BigQuery From Python, Notebooks in This Repository
    - [01 - Data Sources/01 - BigQuery - Table Data Sources](../01%20-%20Data%20Sources/01%20-%20BigQuery%20-%20Table%20Data%20Source.ipynb)
    - [03 - BigQuery ML (BQML)/03 - Introduction to BigQuery ML (BQML)](../03%20-%20BigQuery%20ML%20(BQML)/03%20-%20Introduction%20to%20BigQuery%20ML%20(BQML).ipynb)
    - [Applied Forecasting/1 - BigQuery Time Series Forecasting Data Review and Preparation](../Applied%20Forecasting/1%20-%20BigQuery%20Time%20Series%20Forecasting%20Data%20Review%20and%20Preparation.ipynb)


---
## Setup

inputs:

In [1]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [2]:
# source data
BQ_PROJECT = 'bigquery-public-data'
BQ_DATASET = 'ml_datasets'
BQ_TABLE = 'ulb_fraud_detection'

packages:

In [3]:
import pandas as pd
from google.cloud import bigquery

clients:

In [4]:
bq = bigquery.Client()

parameters:

In [5]:
BQ_SOURCE = f'{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}'

---
## Using BigQuery From Jupyter and Python

> **Notes:**
> - The `LIMIT 5` clause restricts the number of rows returned by BigQuery to 5, but BigQuery still does a full table scan.  For a quick review of a table larger than 1GB, replacing `LIMIT 5` with `TABLESAMPLE SYSTEM (1 PERCENT)` is more efficient because it limits the rows scanned.  For tables under 1GB, `TABLESAMPLE` will still return the full table.  More on [Table Sampling](https://cloud.google.com/bigquery/docs/table-sampling)
> - Each of the examples below runs the same query in BigQuery.  The results are cached on the first run for up to 24 hours, so subsequent identical queries do not scan the data and instead use the cached results table.  More information on [Using cached query results](https://cloud.google.com/bigquery/docs/cached-results).
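
The two notes above can be checked from Python. The following is a minimal sketch, not part of the original notebook: the helper names `sampled_preview_sql` and `cache_report` are hypothetical, and `client` is assumed to be a `bigquery.Client` such as `bq` from the setup. `cache_hit` and `total_bytes_processed` are attributes of the client library's `QueryJob`.

```python
def sampled_preview_sql(table: str, percent: int = 1) -> str:
    """Build a preview query that samples data blocks instead of using LIMIT."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return f"SELECT * FROM `{table}` TABLESAMPLE SYSTEM ({percent} PERCENT)"


def cache_report(client, sql: str) -> dict:
    """Run `sql` with `client.query` and report cache use and bytes processed."""
    job = client.query(sql)
    job.result()  # block until the query job finishes
    return {
        "cache_hit": job.cache_hit,
        "bytes_processed": job.total_bytes_processed,
    }


# Example usage in this notebook (requires BigQuery access, so left commented):
# cache_report(bq, f"SELECT * FROM `{BQ_SOURCE}` LIMIT 5")
# cache_report(bq, sampled_preview_sql(BQ_SOURCE))
```

Running the `LIMIT 5` query twice should show `cache_hit` change from `False` to `True` on the second run if caching applies. The sampled query is expected to behave differently, since the table sampling documentation linked above describes sampled results as not cached.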

---
## BigQuery From Jupyter!

### BigQuery Cell Magic

In [6]:
%%bigquery
SELECT *
FROM bigquery-public-data.ml_datasets.ulb_fraud_detection # this cannot be parameterized with magics
LIMIT 5

Query complete after 0.01s: 100%|██████████| 2/2 [00:00<00:00, 1018.28query/s]                        
Downloading: 100%|██████████| 5/5 [00:02<00:00,  2.35rows/s]


| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 282.0 | -0.356466 | 0.725418 | 1.971749 | 0.831343 | 0.369681 | -0.107776 | 0.75161 | -0.120166 | -0.420675 | ... | 0.020804 | 0.424312 | -0.015989 | 0.466754 | -0.809962 | 0.657334 | -0.04315 | -0.046401 | 0.0 | 0 |
| 1 | 380.0 | -1.299837 | 0.881817 | 1.452842 | -1.293698 | -0.025105 | -1.170103 | 0.86161 | -0.193934 | 0.592001 | ... | -0.272563 | -0.360853 | 0.223911 | 0.59893 | -0.397705 | 0.637141 | 0.234872 | 0.021379 | 0.0 | 0 |
| 2 | 820.0 | -0.937481 | 0.401649 | 1.882689 | -0.362001 | 0.751088 | -0.899262 | 0.880557 | -0.18165 | -0.211657 | ... | -0.001757 | 0.097379 | -0.32405 | 0.436521 | 0.509674 | 0.454116 | -0.201804 | -0.175439 | 0.0 | 0 |
| 3 | 1193.0 | 1.130646 | 0.625391 | 0.837987 | 2.506543 | -0.107116 | -0.245548 | 0.099603 | -0.041457 | -0.867319 | ... | -0.017154 | -0.014311 | 0.086559 | 0.393496 | 0.332062 | -0.066378 | 0.013858 | 0.025382 | 0.0 | 0 |
| 4 | 2371.0 | -0.878833 | 0.133657 | 2.534047 | 2.609811 | 1.510839 | 2.075778 | -0.384729 | 0.2303 | -0.367956 | ... | -0.296422 | -0.255485 | -0.583298 | -1.677514 | 0.050524 | 0.250409 | -0.223149 | -0.420764 | 0.0 | 0 |


### BigQuery Python Client

In [7]:
query = f"""
    SELECT * 
    FROM `{BQ_SOURCE}`
    LIMIT 5
"""
preview = bq.query(query = query).to_dataframe()
preview

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 282.0 | -0.356466 | 0.725418 | 1.971749 | 0.831343 | 0.369681 | -0.107776 | 0.75161 | -0.120166 | -0.420675 | ... | 0.020804 | 0.424312 | -0.015989 | 0.466754 | -0.809962 | 0.657334 | -0.04315 | -0.046401 | 0.0 | 0 |
| 1 | 380.0 | -1.299837 | 0.881817 | 1.452842 | -1.293698 | -0.025105 | -1.170103 | 0.86161 | -0.193934 | 0.592001 | ... | -0.272563 | -0.360853 | 0.223911 | 0.59893 | -0.397705 | 0.637141 | 0.234872 | 0.021379 | 0.0 | 0 |
| 2 | 820.0 | -0.937481 | 0.401649 | 1.882689 | -0.362001 | 0.751088 | -0.899262 | 0.880557 | -0.18165 | -0.211657 | ... | -0.001757 | 0.097379 | -0.32405 | 0.436521 | 0.509674 | 0.454116 | -0.201804 | -0.175439 | 0.0 | 0 |
| 3 | 1193.0 | 1.130646 | 0.625391 | 0.837987 | 2.506543 | -0.107116 | -0.245548 | 0.099603 | -0.041457 | -0.867319 | ... | -0.017154 | -0.014311 | 0.086559 | 0.393496 | 0.332062 | -0.066378 | 0.013858 | 0.025382 | 0.0 | 0 |
| 4 | 2371.0 | -0.878833 | 0.133657 | 2.534047 | 2.609811 | 1.510839 | 2.075778 | -0.384729 | 0.2303 | -0.367956 | ... | -0.296422 | -0.255485 | -0.583298 | -1.677514 | 0.050524 | 0.250409 | -0.223149 | -0.420764 | 0.0 | 0 |


### BigQuery Python Client: Helper Function

In [8]:
def bq_runner(query):
    return bq.query(query = query)

In [9]:
bq_runner(
    query = f"""
        SELECT * 
        FROM `{BQ_SOURCE}`
        LIMIT 5
    """
).to_dataframe()

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 282.0 | -0.356466 | 0.725418 | 1.971749 | 0.831343 | 0.369681 | -0.107776 | 0.75161 | -0.120166 | -0.420675 | ... | 0.020804 | 0.424312 | -0.015989 | 0.466754 | -0.809962 | 0.657334 | -0.04315 | -0.046401 | 0.0 | 0 |
| 1 | 380.0 | -1.299837 | 0.881817 | 1.452842 | -1.293698 | -0.025105 | -1.170103 | 0.86161 | -0.193934 | 0.592001 | ... | -0.272563 | -0.360853 | 0.223911 | 0.59893 | -0.397705 | 0.637141 | 0.234872 | 0.021379 | 0.0 | 0 |
| 2 | 820.0 | -0.937481 | 0.401649 | 1.882689 | -0.362001 | 0.751088 | -0.899262 | 0.880557 | -0.18165 | -0.211657 | ... | -0.001757 | 0.097379 | -0.32405 | 0.436521 | 0.509674 | 0.454116 | -0.201804 | -0.175439 | 0.0 | 0 |
| 3 | 1193.0 | 1.130646 | 0.625391 | 0.837987 | 2.506543 | -0.107116 | -0.245548 | 0.099603 | -0.041457 | -0.867319 | ... | -0.017154 | -0.014311 | 0.086559 | 0.393496 | 0.332062 | -0.066378 | 0.013858 | 0.025382 | 0.0 | 0 |
| 4 | 2371.0 | -0.878833 | 0.133657 | 2.534047 | 2.609811 | 1.510839 | 2.075778 | -0.384729 | 0.2303 | -0.367956 | ... | -0.296422 | -0.255485 | -0.583298 | -1.677514 | 0.050524 | 0.250409 | -0.223149 | -0.420764 | 0.0 | 0 |


### BigQuery Python Client: Using Query Job Attributes and Methods

Query Jobs have Methods and Attributes that can benefit the Python workflow:
- Query Job [Methods](https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.QueryJob.html#google.cloud.bigquery.job.QueryJob:~:text=for%20accurate%20signature.-,Methods,-__init__(job_id%2C%C2%A0query)
- Query Job [Attributes](https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.QueryJob.html#google.cloud.bigquery.job.QueryJob:~:text=from%20a%20QueryJob-,Attributes,-allow_large_results)

BigQuery Query Job (using helper function):

In [10]:
job = bq_runner(
    query = f"""
        SELECT * 
        FROM `{BQ_SOURCE}`
        LIMIT 5
    """
)

Using Query Job Attributes to get timing:

In [11]:
job.result() # wait for the query job to finish so that `ended` is populated
(job.ended - job.started).total_seconds()

0.124
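
The timing calculation above can be wrapped in a small function that also reports how long the job waited before it started. This is a minimal sketch, not part of the original notebook: the name `job_timing` is hypothetical, and `job` is assumed to be a finished `QueryJob` whose `created`, `started`, and `ended` attributes are datetimes.

```python
def job_timing(job) -> dict:
    """Summarize the timestamps of a finished job as durations in seconds."""
    if job.created is None or job.started is None or job.ended is None:
        raise ValueError("job has not finished; call job.result() first")
    return {
        # time between job creation and the start of execution
        "queued_seconds": (job.started - job.created).total_seconds(),
        # time between the start and end of execution (the value computed above)
        "run_seconds": (job.ended - job.started).total_seconds(),
    }


# Example usage in this notebook (requires BigQuery access, so left commented):
# job.result()
# job_timing(job)
```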

Using Query Job Methods to retrieve result to Pandas dataframe:

In [12]:
job.to_dataframe()

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 282.0 | -0.356466 | 0.725418 | 1.971749 | 0.831343 | 0.369681 | -0.107776 | 0.75161 | -0.120166 | -0.420675 | ... | 0.020804 | 0.424312 | -0.015989 | 0.466754 | -0.809962 | 0.657334 | -0.04315 | -0.046401 | 0.0 | 0 |
| 1 | 380.0 | -1.299837 | 0.881817 | 1.452842 | -1.293698 | -0.025105 | -1.170103 | 0.86161 | -0.193934 | 0.592001 | ... | -0.272563 | -0.360853 | 0.223911 | 0.59893 | -0.397705 | 0.637141 | 0.234872 | 0.021379 | 0.0 | 0 |
| 2 | 820.0 | -0.937481 | 0.401649 | 1.882689 | -0.362001 | 0.751088 | -0.899262 | 0.880557 | -0.18165 | -0.211657 | ... | -0.001757 | 0.097379 | -0.32405 | 0.436521 | 0.509674 | 0.454116 | -0.201804 | -0.175439 | 0.0 | 0 |
| 3 | 1193.0 | 1.130646 | 0.625391 | 0.837987 | 2.506543 | -0.107116 | -0.245548 | 0.099603 | -0.041457 | -0.867319 | ... | -0.017154 | -0.014311 | 0.086559 | 0.393496 | 0.332062 | -0.066378 | 0.013858 | 0.025382 | 0.0 | 0 |
| 4 | 2371.0 | -0.878833 | 0.133657 | 2.534047 | 2.609811 | 1.510839 | 2.075778 | -0.384729 | 0.2303 | -0.367956 | ... | -0.296422 | -0.255485 | -0.583298 | -1.677514 | 0.050524 | 0.250409 | -0.223149 | -0.420764 | 0.0 | 0 |


### Indirect use with pandas-gbq

When working with [Pandas](https://pandas.pydata.org/docs/user_guide/index.html#user-guide), the methods above show the client returning data to pandas dataframes.  This section shows a pandas module, [pandas-gbq](https://pandas-gbq.readthedocs.io/en/latest/), that wraps the BigQuery client so that pandas can retrieve BigQuery data to dataframes.

References:
- [Comparison of BigQuery Client with pandas-gbq](https://cloud.google.com/bigquery/docs/pandas-gbq-migration)

#### Package Install (if needed)

In [13]:
try:
    import pandas_gbq
except ImportError:
    print('You need to pip install pandas-gbq')
    !pip install pandas-gbq -q

#### Using pandas-gbq

In [15]:
query = f"""
SELECT * 
FROM `{BQ_SOURCE}`
LIMIT 5
"""
# pd.read_gbq is deprecated as of pandas 2.2; pandas_gbq.read_gbq is the direct equivalent
df = pd.read_gbq(query, project_id = PROJECT_ID)

In [16]:
df

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 403.0 | 1.237413 | 0.512365 | 0.687746 | 1.693872 | -0.236323 | -0.650232 | 0.118066 | -0.230545 | -0.808523 | ... | -0.077543 | -0.17822 | 0.038722 | 0.471218 | 0.289249 | 0.871803 | -0.066884 | 0.012986 | 0.0 | 0 |
| 1 | 430.0 | -1.860258 | -0.629859 | 0.96657 | 0.844632 | 0.759983 | -1.481173 | -0.509681 | 0.540722 | -0.733623 | ... | 0.268028 | 0.125515 | -0.225029 | 0.586664 | -0.031598 | 0.570168 | -0.043007 | -0.223739 | 0.0 | 0 |
| 2 | 804.0 | 1.181697 | -0.007908 | -0.066845 | 1.532223 | 1.728832 | 4.409885 | -1.138816 | 1.164645 | -0.020578 | ... | -0.13992 | -0.399563 | 0.0102 | 0.992235 | 0.457015 | -0.027924 | 0.045273 | 0.028868 | 0.0 | 0 |
| 3 | 1444.0 | 1.040958 | 0.216092 | 1.535953 | 2.536507 | -0.481435 | 0.966522 | -0.686046 | 0.407032 | -0.269952 | ... | 0.001437 | 0.195375 | 0.053458 | 0.040637 | 0.221507 | -0.033785 | 0.060665 | 0.023782 | 0.0 | 0 |
| 4 | 1444.0 | -0.960403 | 1.355316 | 2.501171 | 3.036488 | -0.71282 | 1.057343 | -0.518911 | 0.927235 | -0.656639 | ... | -0.127068 | -0.131587 | -0.114279 | 0.070171 | 0.00757 | 0.245045 | 0.261886 | 0.093203 | 0.0 | 0 |
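
The same preview can also be requested by calling `pandas_gbq.read_gbq` directly rather than through pandas. This is a minimal sketch, assuming pandas-gbq is installed. The helper name `read_preview` and its `reader` argument are hypothetical and introduced only for illustration; the `reader` hook lets the function be exercised without BigQuery access.

```python
def read_preview(table: str, project_id: str, limit: int = 5, reader=None):
    """Return the first `limit` rows of `table` using a read_gbq-style callable."""
    if reader is None:
        # imported lazily so the helper can be defined before pandas-gbq is installed
        import pandas_gbq
        reader = pandas_gbq.read_gbq
    query = f"SELECT * FROM `{table}` LIMIT {int(limit)}"
    return reader(query, project_id=project_id)


# Example usage in this notebook (requires BigQuery access, so left commented):
# df = read_preview(BQ_SOURCE, PROJECT_ID)
```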
