# Get Started with Jupyter on Google Cloud
This notebook collects code examples that show how to access data and start ML routines on Google Cloud resources such as CPUs, GPUs, or TPUs.

We start by importing pandas, which several examples below use. The Google Cloud client libraries are imported in the sections that need them.

In [4]:
import pandas as pd

## Access Data on Google Cloud Storage


Cloud Storage is a storage service in the Google Cloud. It can store virtually infinite amounts of data. Typically, Cloud Storage is used to store files with unstructured data, such as images, text files, and semi-structured file formats, such as CSV, Avro, Parquet, and TFRecords.

We start by creating a Cloud Storage client in Python. The client allows us to interact with the Cloud Storage service. With the client we can download and upload files.

In [3]:
from google.cloud import storage
client = storage.Client()
print("Client created using default project: {}".format(client.project))

Client created using default project: ecb-fsf-hackathon-base


To explicitly specify a project when constructing the client, set the `project` parameter:

In [23]:
# client = storage.Client(project='your-project-id')

First, we work with a bucket, the top-level container for objects in Cloud Storage. A bucket can hold many files and has practically no size limit. Here is how we access the bucket for the hackathon:

In [6]:
bucket_name = "ecb-fsf-hackathon-base-data"
bucket = client.get_bucket(bucket_name)

print("Bucket name: {}".format(bucket.name))
print("Bucket location: {}".format(bucket.location))
print("Bucket storage class: {}".format(bucket.storage_class))

Bucket name: ecb-fsf-hackathon-base-data
Bucket location: EU
Bucket storage class: STANDARD


Let's list all files in the bucket:

In [7]:
blobs = bucket.list_blobs()

print("Blobs in {}:".format(bucket.name))
for item in blobs:
    print("\t" + item.name)

Blobs in ecb-fsf-hackathon-base-data:
	data.csv
	sample.csv
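
When a bucket holds many objects, it can help to restrict the listing to a name prefix and to total the object sizes. The helper below is a minimal sketch of that idea: `summarize_blobs` is a hypothetical name, and it assumes it is given a bucket object like the `bucket` created above.

```python
def summarize_blobs(bucket, prefix=None):
    """Return (object_count, total_bytes) for objects whose names start with prefix."""
    count = 0
    total_bytes = 0
    # list_blobs() returns an iterator; prefix=None lists every object.
    for blob in bucket.list_blobs(prefix=prefix):
        count += 1
        total_bytes += blob.size or 0  # size may be None if metadata is missing
    return count, total_bytes


# Usage with the bucket from the cell above (needs access to the bucket):
# count, total_bytes = summarize_blobs(bucket, prefix="sample")
```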


We can also list the files with the `gsutil` command-line tool:

In [29]:
!gsutil ls gs://{bucket_name}

gs://ecb-fsf-hackathon-base-data/data.csv
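
`gsutil` addresses objects with `gs://bucket/object` URIs, whereas the client library takes the bucket name and the object name separately. The helper below is a small sketch for converting one form into the other. `split_gcs_uri` is a hypothetical name and not part of any Google library.

```python
def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket_name, blob_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError("Expected a URI starting with gs://, got: {!r}".format(uri))
    # Everything up to the first slash is the bucket, the rest is the object name.
    bucket_name, _, blob_name = uri[len(prefix):].partition("/")
    if not bucket_name:
        raise ValueError("Missing bucket name in: {!r}".format(uri))
    return bucket_name, blob_name


print(split_gcs_uri("gs://ecb-fsf-hackathon-base-data/sample.csv"))
```

The two parts can then be passed to `client.get_bucket(...)` and `bucket.get_blob(...)`.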


Now we can get details about one of the files and download it:

In [8]:
blob_name = "sample.csv"
blob = bucket.get_blob(blob_name)

print("ID: {}".format(blob.id))
print("Size: {} bytes".format(blob.size))
print("Content type: {}".format(blob.content_type))
print("Public URL: {}".format(blob.public_url))

output_file_name = "/tmp/sample.csv"
blob.download_to_filename(output_file_name)

print("Downloaded blob {} to {}.".format(blob.name, output_file_name))

ID: ecb-fsf-hackathon-base-data/sample.csv/1568832969971160
Size: 19537490 bytes
Content type: application/octet-stream
Public URL: https://storage.googleapis.com/ecb-fsf-hackathon-base-data/sample.csv
Downloaded blob sample.csv to /tmp/sample.csv.


Again, the same can be achieved using the gsutil command line tool:

In [32]:
!gsutil cp gs://{bucket_name}/{blob_name} /tmp/{blob_name}

Copying gs://ecb-fsf-hackathon-base-data/data.csv...
/ [1 files][  4.1 KiB/  4.1 KiB]                                                
Operation completed over 1 objects/4.1 KiB.                                      


With the file stored locally, we can load it into a pandas dataframe:

In [10]:
df = pd.read_csv(output_file_name, header=None)
df.describe()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,22,23,24,25,26,27,28,29,30,31
count,34821,34821,34821,20167,34821,34821,33825,33686,33326,34821.0,...,22486,24888,28162,28344,6914,6914,6914,6914.0,179.0,34821
unique,942,34821,34171,16073,28417,30794,10810,758,9,1688.0,...,92,24,7,5,116,29,9,6.0,3.0,5
top,Lebensmittel Frühstück,https://www.edeka24.de/Lebensmittel/Beilagen/B...,Lorenz Clubs Party Cracker,"Hergestellt für: REWE Markt GmbH, D-50603 Köln.",6sefe1ab19922722eafb8eddfde2a119,62deba79afd41398d7f002419d1225b7,100g,500,gram,1.99,...,1191,111,11,1,1114,111,11,1.0,1.0,0
freq,590,1,4,290,590,46,879,2466,22183,1111.0,...,1261,3580,14301,14323,367,1008,2504,3416.0,114.0,17666


Let's look at the first rows of the dataframe. Because we passed `header=None`, the CSV's header line appears as row 0:

In [11]:
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,22,23,24,25,26,27,28,29,30,31
0,category,url,product_name,product_description,product_id_store,product_id,volume,qty,unit,price,...,Coicop5_Suggested,Coicop4_Suggested,Coicop3_Suggested,Coicop2_Suggested,Coicop5_Final,Coicop4_Final,Coicop3_Final,Coicop2_Final,Controversial_Classification,Sample_Indicator
1,"Wein, Spirituosen & Tabak Spirituosen & -misch...",https://shop.rewe.de/p/siderit-gingerlime-lond...,Siderit Gingerlime London Dry Gin 700ml,"Siderit Gingerlime ist ein Citric Gin, der in ...",p/siderit-gingerlime-london-dry-gin-700ml/SIAE...,cf166b3dc4aef0f0ea0226566017b8a3,"0,7 L (1 L = 68,77 €)",07,liter,48.14,...,02111,0211,021,02,02111,0211,021,02,,1
2,"Wein, Spirituosen & Tabak Wein Rotwein Frankreich",https://shop.rewe.de/p/ch-teau-haut-terre-fort...,Château Haut Terre Fort rouge Bordeaux trocken...,Weinfreunde.de empfiehlt: Château Haut-Terre-F...,p/ch-teau-haut-terre-fort-rouge-bordeaux-trock...,cb30ae128226bcd945cef02927fd558a,"0,75l (1 l = 8,67 €)",075,liter,6.5,...,02121,0212,021,02,02121,0212,021,02,,1
3,"Wein, Spirituosen & Tabak Tabak & Zigaretten T...",https://shop.rewe.de/p/tipi-ohne-zusaetze-30g/...,Tipi Ohne Zusätze 30g,,p/tipi-ohne-zusaetze-30g/3101979,1ca5c87fb28c3f9b1143a33c0decc6c8,"30g (100 g = 13,17 €)",30,gram,3.95,...,02203,0220,022,02,02203,0220,022,02,,1
4,"Wein, Spirituosen & Tabak Fruchtwein & Weinmis...",https://shop.rewe.de/p/katlenburger-waldmeiste...,Katlenburger Waldmeister Weinbowle 1l,Die Weinbowle aus 75% Fruchtwein.,p/katlenburger-waldmeister-weinbowle-1l/N9E0J3LC,4edb1991aff450c436c9f974f0860e20,"1 L (1 L = 2,93 €)",1,liter,2.93,...,02124,0212,021,02,02124,0212,021,02,,1
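
With `header=None`, pandas treats the CSV's header line as data, which is why the column names show up as row 0 above. The sketch below contrasts this with `header=0` on a tiny inline sample (three of the columns, with values copied from rows 1 and 4 of the output above). Passing `dtype` for the COICOP column keeps the code as text so its leading zero survives. Whether these settings suit the full file is an assumption to verify.

```python
import io

import pandas as pd

# Tiny inline sample; the real file has more columns and rows.
csv_text = (
    "product_name,price,Coicop5_Final\n"
    "Siderit Gingerlime London Dry Gin 700ml,48.14,02111\n"
    "Katlenburger Waldmeister Weinbowle 1l,2.93,02124\n"
)

# header=None: the header line becomes data row 0, as in the cell above.
df_raw = pd.read_csv(io.StringIO(csv_text), header=None)

# header=0: column names come from the first line; dtype keeps the code as text.
df_named = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    dtype={"Coicop5_Final": str},
)

print(df_raw.iloc[0].tolist())
print(df_named.dtypes)
```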


pandas can also read directly from Cloud Storage when given a `gs://` path (this relies on the `gcsfs` package being installed):

In [12]:
df = pd.read_csv('gs://ecb-fsf-hackathon-base-data/sample.csv', header=None)
df.describe()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,22,23,24,25,26,27,28,29,30,31
count,34821,34821,34821,20167,34821,34821,33825,33686,33326,34821.0,...,22486,24888,28162,28344,6914,6914,6914,6914.0,179.0,34821
unique,942,34821,34171,16073,28417,30794,10810,758,9,1688.0,...,92,24,7,5,116,29,9,6.0,3.0,5
top,Lebensmittel Frühstück,https://www.edeka24.de/Lebensmittel/Beilagen/B...,Lorenz Clubs Party Cracker,"Hergestellt für: REWE Markt GmbH, D-50603 Köln.",6sefe1ab19922722eafb8eddfde2a119,62deba79afd41398d7f002419d1225b7,100g,500,gram,1.99,...,1191,111,11,1,1114,111,11,1.0,1.0,0
freq,590,1,4,290,590,46,879,2466,22183,1111.0,...,1261,3580,14301,14323,367,1008,2504,3416.0,114.0,17666


And .head() should return the same lines as with our manual download:

In [13]:
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,22,23,24,25,26,27,28,29,30,31
0,category,url,product_name,product_description,product_id_store,product_id,volume,qty,unit,price,...,Coicop5_Suggested,Coicop4_Suggested,Coicop3_Suggested,Coicop2_Suggested,Coicop5_Final,Coicop4_Final,Coicop3_Final,Coicop2_Final,Controversial_Classification,Sample_Indicator
1,"Wein, Spirituosen & Tabak Spirituosen & -misch...",https://shop.rewe.de/p/siderit-gingerlime-lond...,Siderit Gingerlime London Dry Gin 700ml,"Siderit Gingerlime ist ein Citric Gin, der in ...",p/siderit-gingerlime-london-dry-gin-700ml/SIAE...,cf166b3dc4aef0f0ea0226566017b8a3,"0,7 L (1 L = 68,77 €)",07,liter,48.14,...,02111,0211,021,02,02111,0211,021,02,,1
2,"Wein, Spirituosen & Tabak Wein Rotwein Frankreich",https://shop.rewe.de/p/ch-teau-haut-terre-fort...,Château Haut Terre Fort rouge Bordeaux trocken...,Weinfreunde.de empfiehlt: Château Haut-Terre-F...,p/ch-teau-haut-terre-fort-rouge-bordeaux-trock...,cb30ae128226bcd945cef02927fd558a,"0,75l (1 l = 8,67 €)",075,liter,6.5,...,02121,0212,021,02,02121,0212,021,02,,1
3,"Wein, Spirituosen & Tabak Tabak & Zigaretten T...",https://shop.rewe.de/p/tipi-ohne-zusaetze-30g/...,Tipi Ohne Zusätze 30g,,p/tipi-ohne-zusaetze-30g/3101979,1ca5c87fb28c3f9b1143a33c0decc6c8,"30g (100 g = 13,17 €)",30,gram,3.95,...,02203,0220,022,02,02203,0220,022,02,,1
4,"Wein, Spirituosen & Tabak Fruchtwein & Weinmis...",https://shop.rewe.de/p/katlenburger-waldmeiste...,Katlenburger Waldmeister Weinbowle 1l,Die Weinbowle aus 75% Fruchtwein.,p/katlenburger-waldmeister-weinbowle-1l/N9E0J3LC,4edb1991aff450c436c9f974f0860e20,"1 L (1 L = 2,93 €)",1,liter,2.93,...,02124,0212,021,02,02124,0212,021,02,,1
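
As mentioned at the start of this section, the client can upload files as well as download them. The function below is a minimal sketch of an upload. `upload_file` is a hypothetical helper, and it assumes a bucket object like the `bucket` created above plus write permission on that bucket.

```python
def upload_file(bucket, local_path, blob_name):
    """Upload the file at local_path to the bucket under the name blob_name."""
    # bucket.blob() only creates a local handle; no request is sent yet.
    blob = bucket.blob(blob_name)
    # upload_from_filename() performs the actual upload.
    blob.upload_from_filename(local_path)
    return blob


# Usage with the bucket from above; the target object name is only an example:
# upload_file(bucket, "/tmp/sample.csv", "results/sample_copy.csv")
```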


**Learn more about interacting with Cloud Storage in the following tutorials:**
- [Cloud Storage client library](../tutorials/storage/Cloud%20Storage%20client%20library.ipynb)
- [Storage command-line tool](../tutorials/storage/Storage%20command-line%20tool.ipynb)

## Access Tables & Views on Google BigQuery

BigQuery is Google Cloud's serverless data warehouse. As with Cloud Storage, we start by creating a client; here we set its default location to `EU`.



In [4]:
from google.cloud import bigquery
client = bigquery.Client(location="EU")
print("Client created using default project: {}".format(client.project))

Client created using default project: ecb-fsf-hackathon-base


To explicitly specify a project when constructing the client, set the `project` parameter:

In [51]:
# client = bigquery.Client(location="US", project="your-project-id")

In [5]:
query = """
    SELECT `Set`, URL, Label
    FROM `ecb-fsf-hackathon-base.hackathon_dataset.data_table`
    LIMIT 60
"""
query_job = client.query(query, location="EU")
df = query_job.to_dataframe()
df.describe()

Unnamed: 0,Set,URL,Label
count,60,60,60
unique,2,60,3
top,TRAIN,gs://sandbox-michael-menzel-vcm/clouds/cumulus...,cirrus
freq,45,1,20


In [53]:
df.head()

Unnamed: 0,Set,URL,Label
0,TEST,gs://sandbox-michael-menzel-vcm/clouds/cirrus/...,cirrus
1,TEST,gs://sandbox-michael-menzel-vcm/clouds/cirrus/...,cirrus
2,TEST,gs://sandbox-michael-menzel-vcm/clouds/cirrus/...,cirrus
3,TEST,gs://sandbox-michael-menzel-vcm/clouds/cirrus/...,cirrus
4,TEST,gs://sandbox-michael-menzel-vcm/clouds/cirrus/...,cirrus


In [6]:
query = """
    SELECT category, url, product_name
    FROM `ecb-fsf-hackathon-base.hackathon_dataset.food_unique_products_classified`
    LIMIT 60
"""
query_job = client.query(query, location="EU")
df = query_job.to_dataframe()
df.describe()

Unnamed: 0,category,url,product_name
count,60,60,60
unique,2,60,60
top,Getränke Tee,https://www.edeka24.de/Getraenke/Tee/Alnatura-...,Milford Tee Ingwer pur 28x 2 g
freq,57,1,1


In [17]:
df.head()

Unnamed: 0,category,url,product_name
0,Tiefkühl,https://shop.rewe.de/p/rewe-beste-wahl-zitrone...,REWE Beste Wahl Zitronensauce 200g
1,Tiefkühl,https://shop.rewe.de/p/rewe-beste-wahl-curry-k...,REWE Beste Wahl Curry Kokos Sauce 200g
2,Tiefkühl,https://shop.rewe.de/p/rewe-beste-wahl-pfeffer...,REWE Beste Wahl Pfeffersauce 200g
3,Getränke Tee,https://www.edeka24.de/Getraenke/Tee/Teekanne-...,"Teekanne Grüner Tee 20x 1,75 g"
4,Getränke Tee,https://www.edeka24.de/Getraenke/Tee/Bad-Heilb...,Bad Heilbrunner Magen Mild Kräutertee 20x 2 g
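
Both query cells above repeat the same two steps: submit the query, then convert the result to a dataframe. A small wrapper can remove that repetition. This is a sketch only; `query_to_dataframe` is a hypothetical helper that assumes a client like the `bigquery.Client` created above.

```python
def query_to_dataframe(client, sql, location="EU"):
    """Run sql with the given BigQuery client and return the result as a dataframe."""
    # client.query() submits the query job.
    query_job = client.query(sql, location=location)
    # to_dataframe() waits for the job to finish and downloads the rows.
    return query_job.to_dataframe()


# Usage with the client and query string from the cells above:
# df = query_to_dataframe(client, query)
```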


You can also execute a query using the BigQuery magic expression in a cell:

In [14]:
%%bigquery --verbose df
SELECT category, Count(*) as Occurence
FROM `ecb-fsf-hackathon-base.hackathon_dataset.food_unique_products_classified`
GROUP BY category
ORDER BY Occurence DESC
LIMIT 10

Executing query with job ID: 31175d58-4007-4b64-be73-2cf4736387d2
Query executing: 0.43s
Query complete after 0.67s


In [15]:
df.head()

Unnamed: 0,category,Occurence
0,Lebensmittel Frühstück,590
1,"Frische & Kühlung Joghurt, Pudding & Milchsnac...",448
2,Lebensmittel Backzutaten,390
3,Lebensmittel Schokolade/Riegel,372
4,Lebensmittel Fertiggerichte,365


**Learn more about interacting with BigQuery in the following tutorials:**
- [BigQuery basics](../tutorials/bigquery/BigQuery%20basics.ipynb)
- [BigQuery command-line tool](../tutorials/bigquery/BigQuery%20command-line%20tool.ipynb)
- [BigQuery query magic](../tutorials/bigquery/BigQuery%20query%20magic.ipynb)

## Cloud AI APIs and Cloud AutoML

Some useful resources to get started with our Cloud APIs for NLP and [AutoML](https://cloud.google.com/automl/) for NLP:
* [Cloud NLP Intro](https://cloud.google.com/natural-language/)
* [Cloud Natural Language API Docs](https://cloud.google.com/natural-language/docs/)
* [Cloud AutoML Get Started Guides](https://cloud.google.com/natural-language/overview/docs/get-started)
* [Cloud AutoML NLP in the Console](https://console.cloud.google.com/natural-language)

There is also [Cloud AutoML Tables](https://cloud.google.com/automl-tables/) to build ML models on tabular data (e.g. from BigQuery):
* [Cloud AutoML Tables Intro](https://cloud.google.com/automl-tables/)
* [Cloud AutoML Tables Docs](https://cloud.google.com/automl-tables/docs/)
* [Cloud AutoML Tables in the Console](https://console.cloud.google.com/automl-tables)


## Data Transformation with Apache Beam (and Cloud Dataflow)

In [None]:
!pip3 install apache-beam[gcp]

In [3]:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

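pipeline_options = PipelineOptions.from_dictionary({
    # DirectRunner executes the pipeline locally in the notebook.
    # To run it massively parallel on Dataflow, use 'runner': 'DataflowRunner'
    # (which needs further options such as the project and a temp location).
    'runner': 'DirectRunner',
    'job_name': 'notebook',
    'streaming': True
})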

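# Collect results in a Python list. This side effect only works because the
# DirectRunner executes in this process; remote workers would not see `output`.
def collect(i):
    output.append(i)
    return True

output = []

p = beam.Pipeline(options=pipeline_options)

pipeline = (
    p
    | 'generate' >> beam.Create(range(1000))
    | 'square' >> beam.Map(lambda x: x**2)
    | 'collect' >> beam.Map(collect)
)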

result = p.run()
result.wait_until_finish()

output[:10]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
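
As a sanity check, the collected values can be compared with the same computation in plain Python. Beam does not guarantee element order within a PCollection in general, so the sketch below compares sorted values. `squares_match` is a hypothetical helper name.

```python
def squares_match(collected, n=1000):
    """Check that collected holds exactly the squares of 0..n-1, in any order."""
    expected = [x ** 2 for x in range(n)]  # already in ascending order
    return sorted(collected) == expected


# Usage with the `output` list filled by the pipeline above:
# squares_match(output)
```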

## Train Models with Google Cloud AI Platform Training

First, we enable the AI Platform Training (`ml.googleapis.com`) and Container Registry APIs in our project:

In [71]:
!gcloud services enable ml.googleapis.com
!gcloud services enable containerregistry.googleapis.com

Next, we create a bucket for staging files and training results. Replace `[YOUR_GCS_BUCKET]` with a name of your choice; bucket names must be globally unique:

In [None]:
!gsutil mb gs://[YOUR_GCS_BUCKET]

Now we are ready to start the training job. Replace `[YOUR_GCS_BUCKET]` with your bucket name wherever it appears in brackets. You can change the `--model_dir` parameter to store the training output elsewhere.

In [None]:
%%bash
# JOB_NAME must be set in the environment to a name that is unique within the project.
gcloud ml-engine jobs submit training $JOB_NAME \
    --staging-bucket gs://[YOUR_GCS_BUCKET] \
    --runtime-version 1.8 \
    --scale-tier BASIC_TPU \
    --module-name resnet.resnet_main \
    --package-path resnet/ \
    --region us-central1 \
    -- \
    --data_dir=gs://cloud-tpu-test-datasets/fake_imagenet \
    --model_dir=gs://[YOUR_GCS_BUCKET]/training_result/ \
    --resnet_depth=50 \
    --train_steps=1024
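
The command above expects a `JOB_NAME` variable, which this notebook does not define. One way to provide it is to build a timestamped name in Python and export it as an environment variable. This is a sketch under assumptions: `make_job_name` and the `resnet` prefix are hypothetical, and the pattern check reflects the naming rule as we understand it (a leading letter followed by letters, digits, or underscores).

```python
import os
import re
from datetime import datetime, timezone


def make_job_name(prefix="resnet"):
    """Build a job name from a prefix and the current UTC timestamp."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    name = "{}_{}".format(prefix, stamp)
    # Assumed naming rule: a letter first, then letters, digits, or underscores.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        raise ValueError("Invalid job name: {!r}".format(name))
    return name


# Environment variables set here are inherited by shell commands started from the notebook.
os.environ["JOB_NAME"] = make_job_name()
print(os.environ["JOB_NAME"])
```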

**Learn more about AI Platform Training & Serving with ML Engine in the following resources:**
- [Training & Serving on ML Engine with scikit-learn](../tutorials/cloud-ml-engine/Training%20and%20prediction%20with%20scikit-learn.ipynb)
- [GitHub repo full of Training & Prediction Examples](https://github.com/GoogleCloudPlatform/cloudml-samples)

## Evaluate your Model

**Visit the notebook [evaluation.ipynb](./evaluation.ipynb).**