<a href="https://colab.research.google.com/github/nurfnick/Data_Viz/blob/main/Content/Data_Collecting/075CreatingTables.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Adding Data To BigQuery

## Create A Dataset

What if we want to create our own table on BigQuery, and how would we add data to it?  First we need to authenticate and create a client.

In [1]:
from google.colab import auth
from google.cloud import bigquery

# Authenticate this Colab session before creating the client.
auth.authenticate_user()
print('Authenticated')

# Construct a BigQuery client object for the project.
client = bigquery.Client(project='pic-math')

Authenticated


Next we'll create a new dataset.  I have an established project name, `pic-math`, and will call the new dataset `titanicData`.

In [2]:
# TODO(developer): Set dataset_id to the ID of the dataset to create.
# The full ID has the form "project.dataset"; the project comes from the client.
dataset_id = "{}.titanicData".format(client.project)



The next piece of code creates the dataset.



In [3]:
# Construct a full Dataset object to send to the API.
dataset = bigquery.Dataset(dataset_id)

# TODO(developer): Specify the geographic location where the dataset should reside.
dataset.location = "US"

# Send the dataset to the API for creation, with an explicit timeout.
# Raises google.api_core.exceptions.Conflict if the Dataset already
# exists within the project.
dataset = client.create_dataset(dataset, timeout=30)  # Make an API request.
print("Created dataset {}.{}".format(client.project, dataset.dataset_id))

Created dataset pic-math.titanicData


I can check in the sandbox that the dataset was created.

![titanicData Created](https://raw.githubusercontent.com/nurfnick/Data_Viz/main/Images/titanicData.png)
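
The same check can be done from code instead of the sandbox. This is a minimal sketch, assuming the authenticated `client` from the first cell; `dataset_exists` is a hypothetical helper name, not something from the BigQuery library. It relies on `client.list_datasets()`, which yields items that expose a `dataset_id` attribute.

```python
def dataset_exists(client, dataset_name):
    """Return True if `dataset_name` is among the datasets of the client's project."""
    # list_datasets() yields lightweight items, each with a `dataset_id` attribute.
    return any(item.dataset_id == dataset_name for item in client.list_datasets())


# Example call (needs the authenticated `client` created earlier):
# dataset_exists(client, "titanicData")
```

Note that the comparison uses the short dataset name (`titanicData`), not the full `project.dataset` ID.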

## Specifying Schema

To keep going, we need some data to add to the dataset we just created.  Below I load the Titanic training dataset from Kaggle.

In [4]:
import pandas as pd

df = pd.read_csv("https://github.com/nurfnick/Data_Sets_For_Stats/raw/master/CuratedDataSets/titanic_data_train.csv")

df.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


So we have a bunch of different columns.  We'll have to specify a schema entry for each.  I am going to start by looking at the data type pandas assigned to each column.

In [5]:
df.dtypes

PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object

We will make the int64 dtypes into INT64, float64 into FLOAT64 and object into STRING.  I'll do this with a dictionary.

In [6]:
convert = {"int64":"INT64", "object":"STRING", "float64":"FLOAT64"}


Now I am ready to build the schema.

In [7]:
schema = []

# Build one SchemaField per column from its pandas dtype.
for name, dtype in df.dtypes.items():
  schema.append(bigquery.SchemaField(name, convert[str(dtype)]))


In [8]:
schema

[SchemaField('PassengerId', 'INT64', 'NULLABLE', None, None, (), None),
 SchemaField('Survived', 'INT64', 'NULLABLE', None, None, (), None),
 SchemaField('Pclass', 'INT64', 'NULLABLE', None, None, (), None),
 SchemaField('Name', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('Sex', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('Age', 'FLOAT64', 'NULLABLE', None, None, (), None),
 SchemaField('SibSp', 'INT64', 'NULLABLE', None, None, (), None),
 SchemaField('Parch', 'INT64', 'NULLABLE', None, None, (), None),
 SchemaField('Ticket', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('Fare', 'FLOAT64', 'NULLABLE', None, None, (), None),
 SchemaField('Cabin', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('Embarked', 'STRING', 'NULLABLE', None, None, (), None)]
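
One thing to watch with the dictionary approach: a dtype that is not in the mapping (for example `bool` or `datetime64[ns]`) fails with a bare `KeyError`. The sketch below shows one way to make that failure more readable. The names `PANDAS_TO_BIGQUERY` and `schema_dicts` are hypothetical, and the `bool` entry is my own addition to the mapping used above.

```python
# Assumed extension of the `convert` dictionary; the "bool" entry is an addition.
PANDAS_TO_BIGQUERY = {
    "int64": "INT64",
    "float64": "FLOAT64",
    "object": "STRING",
    "bool": "BOOL",
}


def schema_dicts(dtypes):
    """Map {column name: pandas dtype string} to BigQuery field descriptions."""
    fields = []
    for name, dtype in dtypes.items():
        if dtype not in PANDAS_TO_BIGQUERY:
            # Fail with the column name so the gap in the mapping is easy to find.
            raise ValueError(f"No BigQuery type chosen for column {name!r} ({dtype})")
        fields.append({"name": name, "type": PANDAS_TO_BIGQUERY[dtype], "mode": "NULLABLE"})
    return fields


print(schema_dicts({"PassengerId": "int64", "Name": "object", "Age": "float64"}))
```

With the dataframe from this notebook, the input could come from `df.dtypes.astype(str).to_dict()`, and each resulting dictionary can be turned into a field with `bigquery.SchemaField.from_api_repr`.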

## Create a Table

With the schema made, we need to create a table in our dataset.

In [13]:
table_id = 'pic-math.titanicData.train'


table = bigquery.Table(table_id, schema=schema)
table = client.create_table(table)  # Make an API request.
print(
    "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
)

Conflict: 409 POST https://bigquery.googleapis.com/bigquery/v2/projects/pic-math/datasets/titanicData/tables?prettyPrint=false: Already Exists: Table pic-math:titanicData.train

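I called the table `train`.  The `Conflict: 409` above appears because I reran the cell after the table had already been created.  We see in the sandbox that we have the table now and seem to have gotten the schema correct.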

![table has been created](https://raw.githubusercontent.com/nurfnick/Data_Viz/main/Images/titanicDataSchema.png)
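
Rerunning the table-creation cell raises `Conflict: 409` once the table exists. A small sketch of how to make the cell safe to rerun, using the `exists_ok` argument of `client.create_table`; the wrapper name `create_table_if_missing` is hypothetical.

```python
def create_table_if_missing(client, table):
    """Create `table`, or return the existing table instead of raising Conflict."""
    # exists_ok=True tells the client library to ignore the "already exists" error.
    return client.create_table(table, exists_ok=True)


# Example call (needs `client`, `table_id` and `schema` from the cells above):
# table = create_table_if_missing(client, bigquery.Table(table_id, schema=schema))
```

If the table already exists, its schema is left as it is, so a changed `schema` list will not be applied by this call.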

## Load the Data

There are a couple of approaches we could take here.  One would be to convert our data into JSON-style records: one dictionary per row, keyed by column name, with that row's values filled in.

In [10]:
# One dictionary per row, keyed by column name (every value is stored as a string).
json0 = {str(col): str(df[col][0]) for col in df.columns}
json1 = {str(col): str(df[col][1]) for col in df.columns}

In [11]:
rows_to_insert = [
    json0,
    json1,
]
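
Because the rows above are built with `str(...)`, numbers become strings and a missing value becomes the text `'nan'`. A hedged alternative is to keep the native values and replace NaN with `None`, which serializes as JSON `null`. The helper name `to_json_rows` is hypothetical, and I have not run it against the streaming API here.

```python
import math


def to_json_rows(records):
    """Return copies of the row dictionaries with NaN floats replaced by None."""
    cleaned = []
    for record in records:
        cleaned.append({
            key: None if isinstance(value, float) and math.isnan(value) else value
            for key, value in record.items()
        })
    return cleaned


sample = [{"PassengerId": 1, "Age": 22.0, "Cabin": float("nan"), "Embarked": "S"}]
print(to_json_rows(sample))
# prints [{'PassengerId': 1, 'Age': 22.0, 'Cabin': None, 'Embarked': 'S'}]
```

With the dataframe, `df.head(2).to_dict(orient="records")` could supply the input. Another route is `json.loads(df.head(2).to_json(orient="records"))`, which writes missing values as `null` directly.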



In [12]:
errors = client.insert_rows_json(
    table_id, rows_to_insert, row_ids=[None] * len(rows_to_insert)
)  # Make an API request.
if errors == []:
    print("New rows have been added.")
else:
    print("Encountered errors while inserting rows: {}".format(errors))

Forbidden: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/pic-math/datasets/titanicData/tables/train/insertAll?prettyPrint=false: Access Denied: BigQuery BigQuery: Streaming insert is not allowed in the free tier

Streaming inserts are not allowed on the free tier, so I could not run this.  I think I have the call set up correctly though.  Let's see if I can do it a different way.

In [14]:
json0

{'PassengerId': '1',
 'Survived': '0',
 'Pclass': '3',
 'Name': 'Braund, Mr. Owen Harris',
 'Sex': 'male',
 'Age': '22.0',
 'SibSp': '1',
 'Parch': '0',
 'Ticket': 'A/5 21171',
 'Fare': '7.25',
 'Cabin': 'nan',
 'Embarked': 'S'}

In [16]:
%%bigquery --project pic-math

INSERT INTO `pic-math.titanicData.train` (PassengerId, Survived, Pclass, Sex)
VALUES
(1, 0, 3, 'male')

Executing query with job ID: fbdd3dc1-72aa-4e34-8dbf-3ca9e06c22a3
Query executing: 0.88s


ERROR:
 403 Billing has not been enabled for this project. Enable billing at https://console.cloud.google.com/billing. DML queries are not allowed in the free tier. Set up a billing account to remove this restriction.

Location: US
Job ID: fbdd3dc1-72aa-4e34-8dbf-3ca9e06c22a3



DML statements such as `INSERT` also require billing to be enabled.  If you do enable billing, the steps above should work, and you'll be able to get further than I did.
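
One route I have not tried here is a batch load job. My understanding is that load jobs are treated differently from streaming inserts and DML on the free tier, but that is an assumption to verify for your own project. The sketch below uses `client.load_table_from_dataframe` (which needs `pyarrow` installed); the wrapper name `load_dataframe` is hypothetical.

```python
def load_dataframe(client, df, table_id):
    """Load `df` into `table_id` with a batch load job and return the table's row count."""
    job = client.load_table_from_dataframe(df, table_id)  # starts a load job
    job.result()  # wait for the job to finish; raises if the job failed
    return client.get_table(table_id).num_rows


# Example call (needs `client`, `df` and `table_id` from the cells above):
# load_dataframe(client, df, table_id)
```

The dataframe's column names and types should match the schema of the existing table, otherwise the load job may fail.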

## Your Turn

Create a dataset and a table for data of your own.  Enable billing and load some data.