# Get Started with Jupyter on Google Cloud
Below you will find code examples that show how to access data and run ML workloads on Google Cloud resources such as CPUs, GPUs, and TPUs.

First, we import pandas, which is used throughout the examples below. Other libraries are imported in the sections that need them.

In [None]:
import pandas as pd

## Access Data on Google Cloud Storage


Cloud Storage is Google Cloud's object storage service and can hold practically unlimited amounts of data. It is typically used for files containing unstructured data, such as images and text files, and for structured or semi-structured file formats such as CSV, Avro, Parquet, and TFRecords.

We start by creating a Cloud Storage client in Python. The client lets us interact with the Cloud Storage service, for example to download and upload files. Without arguments, it uses the default project and credentials of the current environment:

In [None]:
from google.cloud import storage
client = storage.Client()
print("Client created using default project: {}".format(client.project))

To explicitly specify a project when constructing the client, set the `project` parameter:

In [None]:
# client = storage.Client(project='your-project-id')

Next, we work with a bucket, the top-level container for files (objects) in Cloud Storage. A bucket can hold any number of objects and has no practical size limit. Here is how we access the bucket for the hackathon:

In [None]:
bucket_name = "ecb-fsf-hackathon-base-data"
bucket = client.get_bucket(bucket_name)

print("Bucket name: {}".format(bucket.name))
print("Bucket location: {}".format(bucket.location))
print("Bucket storage class: {}".format(bucket.storage_class))

Let's list all files in the bucket:

In [None]:
blobs = bucket.list_blobs()

print("Blobs in {}:".format(bucket.name))
for item in blobs:
    print("\t" + item.name)

We can also list the bucket's contents with the `gsutil` command-line tool. IPython replaces `{bucket_name}` with the value of the Python variable:

In [None]:
!gsutil ls gs://{bucket_name}
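Cloud Storage has a flat namespace: a "folder" such as `images/` is only a common prefix shared by object names. The following minimal sketch, using a hypothetical helper `top_level_prefixes` and made-up object names, shows how a listing like the one above can be grouped into top-level "folders" and files with the standard library. The client library offers a similar grouping through the `prefix` and `delimiter` arguments of `list_blobs`.

```python
def top_level_prefixes(names, delimiter="/"):
    """Split object names into top-level pseudo-folders and top-level files.

    Hypothetical helper that mimics how a delimiter-based listing
    groups the flat Cloud Storage namespace.
    """
    folders = set()
    files = []
    for name in names:
        head, sep, _ = name.partition(delimiter)
        if sep:
            # Everything up to the first delimiter forms a pseudo-folder
            folders.add(head + delimiter)
        else:
            files.append(name)
    return sorted(folders), sorted(files)


# Made-up object names, for illustration only
names = ["sample.csv", "images/cat.png", "images/dog.png", "raw/2019/data.avro"]
folders, files = top_level_prefixes(names)
print(folders)
print(files)
```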

Next, we fetch the metadata of one of the files and download it to the local disk:

In [None]:
blob_name = "sample.csv"
blob = bucket.get_blob(blob_name)

print("ID: {}".format(blob.id))
print("Size: {} bytes".format(blob.size))
print("Content type: {}".format(blob.content_type))
print("Public URL: {}".format(blob.public_url))

output_file_name = "/tmp/sample.csv"
blob.download_to_filename(output_file_name)

print("Downloaded blob {} to {}.".format(blob.name, output_file_name))
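Cloud Storage objects are addressed either as `gs://BUCKET/OBJECT` URIs (as used by `gsutil` and pandas) or, if the object is publicly readable, via an `https://storage.googleapis.com/BUCKET/OBJECT` URL of the same form as `blob.public_url` above. The hypothetical helpers `split_gcs_uri` and `public_url` below sketch how to convert between the two with the standard library; they are not part of the client library and only percent-encode the object name with `urllib.parse.quote`.

```python
from urllib.parse import quote


def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError("not a gs:// URI: {}".format(uri))
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError("URI needs a bucket and an object name: {}".format(uri))
    return bucket, object_name


def public_url(bucket, object_name):
    """Build the https URL under which a publicly readable object is served."""
    # quote() keeps '/' unescaped, so pseudo-folders stay readable
    return "https://storage.googleapis.com/{}/{}".format(bucket, quote(object_name))


uri_bucket, uri_object = split_gcs_uri("gs://ecb-fsf-hackathon-base-data/sample.csv")
print(uri_bucket, uri_object)
print(public_url(uri_bucket, uri_object))
```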

Again, the same can be achieved using the gsutil command line tool:

In [None]:
!gsutil cp gs://{bucket_name}/{blob_name} /tmp/{blob_name}

With the file stored locally, we can load it into a pandas DataFrame. `header=None` tells pandas not to treat the first line as column names:

In [None]:
df = pd.read_csv(output_file_name, header=None)
df.describe()

Let's also look at the first rows of the DataFrame:

In [None]:
df.head()
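With `header=None`, pandas labels the columns with integers `0, 1, 2, ...`. If you know what the columns contain, you can name them with the `names=` argument. The sketch below uses a small made-up CSV string (not the actual `sample.csv`) and hypothetical column names to show the difference:

```python
import io

import pandas as pd

# Made-up CSV content without a header row (not the actual sample.csv)
csv_text = "1,apple,0.5\n2,banana,0.25\n3,cherry,2.0\n"

# Without a header row, pandas numbers the columns 0, 1, 2, ...
df_numbered = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(df_numbered.columns))

# Hypothetical column names, supplied explicitly
df_named = pd.read_csv(io.StringIO(csv_text), header=None, names=["id", "fruit", "price"])
print(df_named.head())
```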

We can also let pandas read the file directly from Cloud Storage via a `gs://` URL (this requires the `gcsfs` package):

In [None]:
df = pd.read_csv('gs://ecb-fsf-hackathon-base-data/sample.csv', header=None)
df.describe()

`df.head()` should return the same rows as after our manual download:

In [None]:
df.head()

**Learn more about interacting with Cloud Storage in the following tutorials:**
- [Cloud Storage client library](../tutorials/storage/Cloud%20Storage%20client%20library.ipynb)
- [Storage command-line tool](../tutorials/storage/Storage%20command-line%20tool.ipynb)

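## Access Tables & Views on Google BigQuery

BigQuery is Google Cloud's serverless data warehouse for analyzing large tables with SQL. As with Cloud Storage, we start by creating a client, here for the `EU` location in which the queries below run:
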
In [1]:
from google.cloud import bigquery
client = bigquery.Client(location="EU")
print("Client created using default project: {}".format(client.project))

Client created using default project: ecb-fsf-hackathon-base


To explicitly specify a project when constructing the client, set the `project` parameter:

In [None]:
# client = bigquery.Client(location="EU", project="your-project-id")

In [3]:
query = """
    SELECT category, url, product_name
    FROM `ecb-fsf-hackathon-base.hackathon_dataset.web_scraped_data`
    LIMIT 60
"""
query_job = client.query(query, location="EU")
df = query_job.to_dataframe()
df.describe()

| | category | url | product_name |
|---|---|---|---|
| count | 60 | 60 | 60 |
| unique | 29 | 60 | 60 |
| top | Wein, Spirituosen & Tabak Fruchtwein & Weinmis... | https://shop.rewe.de/p/kunzmann-bio-gluehwein-... | Die Weinmacher Deidesheimer Hofstück Portugies... |
| freq | 12 | 1 | 1 |
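For columns of strings (`object` dtype), `describe()` reports `count` (non-null values), `unique` (distinct values), `top` (the most frequent value) and `freq` (how often `top` occurs) instead of means and quantiles. In the output above, for instance, the 60 rows contain 29 distinct categories, the most frequent of which appears 12 times. A toy example with made-up values:

```python
import pandas as pd

# Made-up values for illustration; None counts as a missing value
toy = pd.DataFrame({"category": ["wine", "wine", "spirits", "wine", None]})

# count, unique, top and freq are computed over the non-null values only
summary = toy["category"].describe()
print(summary)
```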


In [4]:
df.head()

| | category | url | product_name |
|---|---|---|---|
| 0 | Wein, Spirituosen & Tabak Spirituosen & -misch... | https://shop.rewe.de/p/ramazzotti-amaro-1l/138... | Ramazzotti Amaro 1l |
| 1 | Wein, Spirituosen & Tabak Spirituosen & -misch... | https://shop.rewe.de/p/aperol-aperitivo-italia... | Aperol Aperitivo Italiano 1l |
| 2 | Wein, Spirituosen & Tabak Fruchtwein & Weinmis... | https://shop.rewe.de/p/kunzmann-bio-gluehwein-... | Kunzmann Bio Glühwein weiß 1l |
| 3 | Wein, Spirituosen & Tabak Fruchtwein & Weinmis... | https://shop.rewe.de/p/gerstacker-apfelpunsch-... | Gerstacker Apfelpunsch 1l |
| 4 | Wein, Spirituosen & Tabak Spirituosen & -misch... | https://shop.rewe.de/p/loerch-williams-christ-... | Lörch Williams Christ Birne 1l |


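You can also run a query with the `%%bigquery` cell magic, which stores the result in the DataFrame named after it (here `df`). If the magic is not available in your environment, load it first with `%load_ext google.cloud.bigquery`: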

In [7]:
%%bigquery --verbose df
SELECT category, COUNT(*) AS occurrences
FROM `ecb-fsf-hackathon-base.hackathon_dataset.web_scraped_data`
GROUP BY category
ORDER BY occurrences DESC
LIMIT 10

Executing query with job ID: ba9b9c6f-18b5-4cbd-8ac9-f1b8db501321
Query executing: 0.40s
Query complete after 0.64s


In [8]:
df.head()

| | category | occurrences |
|---|---|---|
| 0 | Lebensmittel Frühstück | 590 |
| 1 | Frische & Kühlung Joghurt, Pudding & Milchsnac... | 448 |
| 2 | Lebensmittel Backzutaten | 390 |
| 3 | Lebensmittel Schokolade/Riegel | 372 |
| 4 | Lebensmittel Fertiggerichte | 365 |
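The `GROUP BY ... ORDER BY ... DESC LIMIT 10` query counts the rows per category. For data that already fits into a DataFrame, pandas produces the same kind of result with `value_counts()`, which sorts the counts in descending order. The sketch below uses made-up categories standing in for the table; for the full table it is generally preferable to let BigQuery aggregate and download only the result rows, as the query above does.

```python
import pandas as pd

# Made-up rows standing in for the web_scraped_data table
rows = pd.DataFrame({"category": ["bakery", "dairy", "bakery", "drinks", "bakery", "dairy"]})

top_categories = (
    rows["category"]
    .value_counts()           # rows per category, sorted descending (GROUP BY + ORDER BY)
    .head(10)                 # equivalent of LIMIT 10
    .rename_axis("category")  # name the index so it becomes the 'category' column
    .reset_index(name="occurrences")
)
print(top_categories)
```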


**Learn more about interacting with BigQuery in the following tutorials:**
- [BigQuery basics](../tutorials/bigquery/BigQuery%20basics.ipynb)
- [BigQuery command-line tool](../tutorials/bigquery/BigQuery%20command-line%20tool.ipynb)
- [BigQuery query magic](../tutorials/bigquery/BigQuery%20query%20magic.ipynb)

## Cloud AI APIs and Cloud AutoML

Some useful resources for getting started with the Cloud Natural Language API and [AutoML](https://cloud.google.com/automl/) for natural language processing (NLP):
* [Cloud NLP Intro](https://cloud.google.com/natural-language/)
* [Cloud Natural Language API Docs](https://cloud.google.com/natural-language/docs/)
* [Cloud AutoML Get Started Guides](https://cloud.google.com/natural-language/overview/docs/get-started)
* [Cloud AutoML NLP in the Console](https://console.cloud.google.com/natural-language)

There is also [Cloud AutoML Tables](https://cloud.google.com/automl-tables/) to build ML models on tabular data (e.g. from BigQuery):
* [Cloud AutoML Tables Intro](https://cloud.google.com/automl-tables/)
* [Cloud AutoML Tables Docs](https://cloud.google.com/automl-tables/docs/)
* [Cloud AutoML Tables in the Console](https://console.cloud.google.com/automl-tables)


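## Data Transformation with Apache Beam (and Cloud Dataflow)

Apache Beam lets you write batch and streaming data pipelines once and run them on different runners: locally with the `DirectRunner`, or massively parallel on Cloud Dataflow with the `DataflowRunner`. First, install the Beam SDK with the Google Cloud extras:

In [None]:
!pip3 install "apache-beam[gcp]"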

In [None]:
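import glob

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

pipeline_options = PipelineOptions.from_dictionary({
    # Run locally; to run massively parallel on Cloud Dataflow, use
    #   'runner': 'DataflowRunner'
    # together with the 'project', 'region' and 'temp_location' options.
    'runner': 'DirectRunner',
    'job_name': 'notebook',
})

# Local output prefix; on Dataflow, write to a gs:// path instead,
# because the workers cannot access the notebook's local disk.
output_prefix = '/tmp/beam_squares'

# Leaving the `with` block runs the pipeline and waits until it finishes.
with beam.Pipeline(options=pipeline_options) as p:
    (
        p
        | 'generate' >> beam.Create(range(1000))
        | 'square' >> beam.Map(lambda x: x ** 2)
        | 'write' >> beam.io.WriteToText(output_prefix)
    )

# Read the output shards back; element order in a PCollection is not guaranteed.
squares = []
for path in glob.glob(output_prefix + '-*'):
    with open(path) as f:
        squares.extend(int(line) for line in f)

sorted(squares)[:10]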

## Train Models with Google Cloud AI Platform Training

First, we enable the AI Platform Training (`ml.googleapis.com`) and Container Registry APIs in our project:

In [None]:
!gcloud services enable ml.googleapis.com
!gcloud services enable containerregistry.googleapis.com

Then, we create a bucket for the staging files and training results. Replace `[YOUR_GCS_BUCKET]` with a name of your choice (bucket names must be globally unique):

In [None]:
!gsutil mb gs://[YOUR_GCS_BUCKET]
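Bucket creation fails if the name violates Cloud Storage's naming rules. The following sketch, built around a hypothetical helper `looks_like_valid_bucket_name`, checks a simplified subset of those rules before you call `gsutil mb`: 3 to 63 characters; only lowercase letters, digits, dashes, underscores and dots; starting and ending with a letter or digit; not starting with `goog` and not containing `google`. It is an approximation (it ignores, for example, the separate rules for dotted names and IP-address-like names), so consult the official naming guidelines for the full list. Global uniqueness can only be checked by actually creating the bucket.

```python
import re

# Simplified subset of the Cloud Storage bucket naming rules (an approximation)
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name):
    """Return True if `name` passes a simplified bucket-name check."""
    # fullmatch enforces the 3-63 character length and allowed characters
    if not _BUCKET_RE.fullmatch(name):
        return False
    # Names reserved for Google
    if name.startswith("goog") or "google" in name:
        return False
    return True


for candidate in ["my-hackathon-results", "My_Bucket", "ab", "google-data"]:
    print(candidate, looks_like_valid_bucket_name(candidate))
```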

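Now we are ready to start the training job. Replace `[YOUR_GCS_BUCKET]` with your bucket name, and make sure `JOB_NAME` is unique within your project. The command expects the ResNet trainer package in a local `resnet/` directory (`--package-path`). You can change the `--model_dir` parameter to choose where the training output is stored.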

In [None]:
%%bash
# Job names must be unique within the project, so append a timestamp
JOB_NAME=resnet_tpu_$(date +%Y%m%d_%H%M%S)

gcloud ml-engine jobs submit training $JOB_NAME \
    --staging-bucket gs://[YOUR_GCS_BUCKET] \
    --runtime-version 1.8 \
    --scale-tier BASIC_TPU \
    --module-name resnet.resnet_main \
    --package-path resnet/ \
    --region us-central1 \
    -- \
    --data_dir=gs://cloud-tpu-test-datasets/fake_imagenet \
    --model_dir=gs://[YOUR_GCS_BUCKET]/training_result/ \
    --resnet_depth=50 \
    --train_steps=1024
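If you prefer to build the job name in Python (for example to pass it to a `!gcloud ...` command as `$JOB_NAME`, which IPython expands from the Python variable), the hypothetical helper `make_job_name` below sketches one way to do it. It assumes that job names must start with a letter, may contain only letters, digits and underscores, and are limited to 128 characters; treat these rules as an assumption and check them against the current AI Platform documentation.

```python
import re
from datetime import datetime


def make_job_name(prefix, now=None):
    """Build a job name like 'resnet_tpu_20190101_120000' (hypothetical helper)."""
    if now is None:
        now = datetime.now()
    # Replace every character other than letters, digits and underscores
    clean = re.sub(r"[^A-Za-z0-9_]", "_", prefix)
    # Assumed rule: the name must start with a letter
    if not clean or not clean[0].isalpha():
        clean = "job_" + clean
    name = "{}_{}".format(clean, now.strftime("%Y%m%d_%H%M%S"))
    # Assumed maximum length
    return name[:128]


JOB_NAME = make_job_name("resnet-tpu")
print(JOB_NAME)
```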

Learn more about AI Platform Training & Serving with ML Engine:
- [Training & Serving on ML Engine with scikit-learn](../tutorials/cloud-ml-engine/Training%20and%20prediction%20with%20scikit-learn.ipynb)
- [GitHub repo full of training & prediction examples](https://github.com/GoogleCloudPlatform/cloudml-samples)

## Evaluate your Model

**Visit the notebook [evaluation.ipynb](./evaluation.ipynb).**