# Data collection directly from BigQuery

There are two libraries we can use to get data directly from BigQuery:
* pandas-gbq: open source library maintained by PyData
* google-cloud-bigquery: open source library maintained by Google

google-cloud-bigquery has newer features, full API functionality, and can also run queries more quickly.
However, authentication with google-cloud-bigquery requires a service account, which we are not allowed to have under the DH default settings. We are contacting the data engineering team to get one. Until then, we can use pandas-gbq to run queries.

For a comparison of the two libraries, see Google's migration guide: https://cloud.google.com/bigquery/docs/pandas-gbq-migration

## 1. Data collection with pandas-gbq

The following code shows how to import the libraries, define a query, run it, and get the data.
More info about the library:
* Documentation: https://pandas-gbq.readthedocs.io/en/latest/
* GitHub: https://github.com/pydata/pandas-gbq/

In [1]:
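# load packages
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# pandas.io.gbq delegates to the pandas-gbq package, which must be installed
from pandas.io import gbq
cwd = os.getcwd()  # get working directory
cwd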

'C:\\Users\\y.zhang\\Documents\\Projects\\bigquery'

The following example shows how to define a query and then fetch the data through the API. The result is returned as a data frame. The first time you run a query, you need to log in to your Google account and authorize pandas-gbq to access the data.

In [2]:
# define the query you want to run
query = """select * from `fulfillment-dwh-production.cl.orders` limit 10"""
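When the query is reused with different columns or row limits, it can help to build the string with a small helper. The sketch below is only an illustration: `build_query` and its parameters are hypothetical names, not part of pandas-gbq, and the column names are taken from the sample output in this notebook. Naming only the columns you need is generally worthwhile because BigQuery bills on-demand queries by the columns read, and `limit` on its own usually does not reduce that.

```python
def build_query(table, columns, limit=10):
    """Return a standard SQL query string for `table` (hypothetical helper)."""
    if not columns:
        raise ValueError("columns must not be empty")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    # Join the requested columns instead of using `select *`.
    column_list = ", ".join(columns)
    # Backticks are needed because the project name contains hyphens.
    return f"select {column_list} from `{table}` limit {limit}"


query = build_query(
    "fulfillment-dwh-production.cl.orders",
    ["country_code", "order_id", "created_date"],
    limit=10,
)
```

The resulting `query` string can be passed to `gbq.read_gbq` in the same way as the hand-written one.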

In [4]:
# run the query and save the results as a data frame named df
df = gbq.read_gbq(query, project_id="fulfillment-dwh-production")
# show the first 5 rows of the result (head() defaults to 5 rows)
df.head()

Downloading: 100%|███████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 23.38rows/s]


Unnamed: 0,country_code,region,order_id,platform_order_id,platform_order_code,platform,entity,created_date,created_at,order_placed_at,...,tags,order_value,capacity,vendor_order_number,customer,vendor,porygon,deliveries,cancellation,timings
0,uy,Americas,1231758,34145238,102298690,PedidosYa,"{'id': 'PY_UY', 'display_name': 'PY - Uruguay'...",2019-06-14,2019-06-14 23:50:44.806628+00:00,2019-06-14 23:50:44.593000+00:00,...,[preorder],100000,50,,{'location': 'POINT(-56.1547 -34.9246)'},"{'id': 93, 'vendor_code': '49162', 'name': 'Ea...",[],"[{'id': 1279934, 'created_at': 2019-06-14 23:5...","{'source': 'api', 'performed_by': None, 'reaso...","{'updated_prep_time': 0, 'hold_back_time': 0, ..."
1,uy,Americas,1231156,34137590,102277460,PedidosYa,"{'id': 'PY_UY', 'display_name': 'PY - Uruguay'...",2019-06-14,2019-06-14 23:02:54.229943+00:00,2019-06-14 23:02:53.943000+00:00,...,[existing_address],69500,34,,{'location': 'POINT(-56.1472 -34.8849)'},"{'id': 593, 'vendor_code': '66736', 'name': 'F...",[],"[{'id': 1279313, 'created_at': 2019-06-14 23:0...","{'source': 'api', 'performed_by': None, 'reaso...","{'updated_prep_time': 2100, 'hold_back_time': ..."
2,uy,Americas,1231342,34139778,102283249,PedidosYa,"{'id': 'PY_UY', 'display_name': 'PY - Uruguay'...",2019-06-14,2019-06-14 23:18:43.445022+00:00,2019-06-14 23:18:43.233000+00:00,...,[existing_address],28500,14,,{'location': 'POINT(-56.0426 -34.8718)'},"{'id': 442, 'vendor_code': '63176', 'name': 'M...",[],"[{'id': 1279501, 'created_at': 2019-06-14 23:1...","{'source': 'api', 'performed_by': None, 'reaso...","{'updated_prep_time': 1200, 'hold_back_time': ..."
3,uy,Americas,1231347,34139868,102283509,PedidosYa,"{'id': 'PY_UY', 'display_name': 'PY - Uruguay'...",2019-06-14,2019-06-14 23:19:28.782375+00:00,2019-06-14 23:19:28.485000+00:00,...,[existing_address],91000,44,,{'location': 'POINT(-56.1498 -34.9117)'},"{'id': 200, 'vendor_code': '53702', 'name': 'L...",[],"[{'id': 1279506, 'created_at': 2019-06-14 23:1...","{'source': 'dispatcher', 'performed_by': None,...","{'updated_prep_time': 1994, 'hold_back_time': ..."
4,uy,Americas,1231808,34145932,102300915,PedidosYa,"{'id': 'PY_UY', 'display_name': 'PY - Uruguay'...",2019-06-14,2019-06-14 23:54:53.402779+00:00,2019-06-14 23:54:53.129000+00:00,...,[existing_address],91000,44,,{'location': 'POINT(-56.1528 -34.901)'},"{'id': 658, 'vendor_code': '88448', 'name': 'L...",[],"[{'id': 1279989, 'created_at': 2019-06-14 23:5...","{'source': 'dispatcher', 'performed_by': None,...","{'updated_prep_time': 600, 'hold_back_time': 1..."


You can then continue to manipulate and visualize the data with pandas and other libraries.
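As one example of such manipulation, nested columns like `entity` arrive as Python dictionaries, which are awkward to filter or group on. The sketch below expands them into ordinary columns with `pd.json_normalize`. It uses a small hand-made frame shaped like the output above rather than the real query result, so the values are illustrative only.

```python
import pandas as pd

# Hypothetical two-row sample shaped like the `entity` column in the output above.
sample = pd.DataFrame({
    "order_id": [1231758, 1231156],
    "entity": [
        {"id": "PY_UY", "display_name": "PY - Uruguay"},
        {"id": "PY_UY", "display_name": "PY - Uruguay"},
    ],
})

# Expand each dict into its own columns, prefixed with the source column name.
entity_cols = pd.json_normalize(sample["entity"].tolist()).add_prefix("entity_")

# Replace the nested column with the flat ones; both frames share the default index.
flat = pd.concat([sample.drop(columns="entity"), entity_cols], axis=1)
```

The same pattern should apply to the other dictionary-valued columns, assuming every row holds a dict with the same keys.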

## 2. Data collection with google-cloud-bigquery

The following code shows how to import the libraries, define a query, run it, and get the data. More info about the library:
* Documentation: https://googleapis.dev/python/bigquery/latest/index.html
* GitHub: https://github.com/googleapis/python-bigquery

In [5]:
# load packages
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from google.cloud import bigquery
from google.oauth2 import service_account
cwd=os.getcwd() # get working directory
cwd

'C:\\Users\\y.zhang\\Documents\\Projects\\bigquery'

### There are two ways to authenticate
* service account: https://cloud.google.com/docs/authentication/getting-started
* end user credentials (OAuth client ID, client JSON): https://cloud.google.com/bigquery/docs/authentication/end-user-installed

However, we currently do not have permission to create credentials. We are contacting Riccardo to get this resolved. google-cloud-bigquery should return results more quickly and offers more features.
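For reference, once a service account key is available, the flow could look roughly like the sketch below. This is untested against our project: `make_client` and `query_to_dataframe` are hypothetical helper names, and the key file path is a placeholder. The google-cloud-bigquery calls used (`service_account.Credentials.from_service_account_file`, `bigquery.Client`, `client.query(...).to_dataframe()`) come from the library's public API.

```python
def make_client(key_path, project_id):
    """Build a BigQuery client from a service account key file (placeholder path)."""
    # Imported lazily so the helper below can be used without these packages.
    from google.cloud import bigquery
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(key_path)
    return bigquery.Client(credentials=credentials, project=project_id)


def query_to_dataframe(client, sql):
    """Run `sql` with the given client and return the result as a data frame."""
    # client.query() starts the job; to_dataframe() waits for it and downloads the rows.
    return client.query(sql).to_dataframe()
```

A call would then be something like `query_to_dataframe(make_client("key.json", "fulfillment-dwh-production"), query)`, where `key.json` stands in for the real key file.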

I will update this file once I get an answer from Riccardo.