# Retrieving data from BigQuery

This notebook shows how to retrieve your data from BigQuery.

If `gbq` is not working, you may need to update the Python API client library:

```bash
sudo pip install --upgrade google-api-python-client
```

In [None]:
import pandas as pd
import pandas_gbq as gbq 
import json
import matplotlib.pyplot as plt
%matplotlib inline
# hide warnings; `gbq.read_gbq()` emits some
import warnings
warnings.filterwarnings('ignore')

In [None]:
# project specifics
PRIVATE_KEY = '../google-credentials/gsdk-credentials.json'
# read the project id from the service account key file, closing the file afterwards
with open(PRIVATE_KEY) as f:
    PROJECT_ID = json.load(f)['project_id']

## Inspecting and selecting tables

In [None]:
!bq ls iens

In [None]:
# dataset specifics
city = 'dongen'
date = '20180124'
bq_table = '_'.join(['iens.iens', city, date])  # use iens.iens_comments when querying on the comments table
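
The naming convention used above can be wrapped in a small helper. This is only a sketch: `build_table_name` and its `comments` flag are hypothetical names introduced here, and the date check assumes the suffix is always `YYYYMMDD`, as in `20180124`.

```python
from datetime import datetime


def build_table_name(city, date, comments=False):
    """Return '<prefix>_<city>_<date>', following the naming used in this notebook."""
    # Assumption: the date suffix is exactly eight digits in YYYYMMDD form.
    if len(date) != 8 or not date.isdigit():
        raise ValueError('date must look like YYYYMMDD, got {!r}'.format(date))
    # strptime rejects impossible calendar dates such as 20181345.
    datetime.strptime(date, '%Y%m%d')
    prefix = 'iens.iens_comments' if comments else 'iens.iens'
    return '_'.join([prefix, city, date])


print(build_table_name('dongen', '20180124'))
print(build_table_name('dongen', '20180124', comments=True))
```

Failing early on a malformed date is a design choice for this sketch; it avoids sending a query for a table that cannot exist.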

## Reading from BigQuery

To load a BigQuery table into a pandas DataFrame, all you need is a query, the `project_id`, and a way to authenticate.

Note that repeated fields are exploded when read into pandas. For example, one restaurant with three tags becomes three records in the resulting DataFrame. Because our data has not one but two repeated fields (`tags` and `image_urls`), we always select explicitly which columns to query. Avoid `SELECT *`!
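
To make the effect concrete, here is a minimal pure-Python sketch of what exploding one repeated field does to a single record. It only mimics the outcome described above and is not how BigQuery or `pandas_gbq` implement it; the `flatten_repeated` helper and the example record are made up for illustration.

```python
def flatten_repeated(record, field):
    """Yield one flat row per value in record[field], copying the other fields."""
    for value in record[field]:
        row = dict(record)   # shallow copy so every row keeps the non-repeated fields
        row[field] = value   # replace the list by a single value
        yield row


# Made-up record for illustration only; not taken from the real table.
restaurant = {'name': 'Example Bistro', 'tags': ['pizza', 'italian', 'takeaway']}

rows = list(flatten_repeated(restaurant, 'tags'))
for row in rows:
    print(row)
print(len(rows))  # → 3
```

Every non-repeated value (here `name`) is duplicated on each row, which is why row counts in the DataFrame can exceed the number of restaurants.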

In [None]:
# select all info fields, plus image_urls
query = "SELECT info.*, image_urls FROM {}".format(bq_table)

# note: newer pandas-gbq releases deprecate `private_key` in favour of a `credentials` argument
df = gbq.read_gbq(query, project_id=PROJECT_ID, private_key=PRIVATE_KEY)
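
If you build queries in several places, one way to enforce the "no `SELECT *`" rule is a small query builder. The sketch below is an assumption about how you might do that; `build_query` is a hypothetical helper, not part of `pandas_gbq`. It still allows struct expansions such as `info.*` and only rejects a bare `*`.

```python
def build_query(table, columns):
    """Build a SELECT statement from an explicit, non-empty list of columns."""
    if isinstance(columns, str):
        raise TypeError('pass columns as a list, for example ["info.*", "image_urls"]')
    if not columns:
        raise ValueError('list at least one column')
    if '*' in columns:
        # A bare '*' would pull in every repeated field; 'info.*' is still allowed.
        raise ValueError("select columns explicitly instead of using '*'")
    return 'SELECT {} FROM {}'.format(', '.join(columns), table)


query = build_query('iens.iens_dongen_20180124', ['info.*', 'image_urls'])
print(query)  # → SELECT info.*, image_urls FROM iens.iens_dongen_20180124
```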

In [None]:
df.shape