title | theme | highlightTheme | separator | verticalSeparator | revealOptions |
---|---|---|---|---|---|
Google Colab & AI Notebook | solarized | solarized-dark | `---` | `--` | |
Toulouse Data Science #38 - June 18th 2019
Florient CHOUTEAU
--
-
ML Engineer @ Airbus Defence and Space (Space Systems)
-
Training Neural Networks on remote sensing imagery since 2016
-
Delair, Magellium & Airbus Intelligence (w/ Jeff, spoilers), Airbus DS...
-
torch, tf, keras, pytorch, ...
-
a lot of time spent installing instances
-
Contact: @foxchouteau or on Slack
--
-
easy access to configured development environment for ML
-
from Google but not limited to their tech
-
jupyter-based products
-
one free, one paid: different use cases, similar principles
--
This talk is not sponsored by Google ;)
There may be better alternatives: Feel free to comment after :)
https://colab.research.google.com
--
-
Jupyter Notebook + Google Drive
-
Full python data science environment
-
12h max session lifetime
--
-
Students, people learning ML/DS
-
Teachers, share courses, get assignments
-
Quick experiments / sharing
--
-
Can use your data: gdrive, gsheet, local filesystem
-
Jupyter-based: All the power of interactive & visualisations
-
You can `apt-get` and `pip install` what you need
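
A minimal sketch of the data-access point above, assuming the `google.colab` helper package (only importable inside Colab) and its `drive.mount` call. The `resolve_data_dir` name, the `datasets` folder and the local fallback are hypothetical, and the Drive root folder name may differ on your account.

```python
import importlib.util
from pathlib import Path


def in_colab() -> bool:
    """Return True when the google.colab helper package is importable."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # Raised when the parent "google" package is missing or malformed.
        return False


def resolve_data_dir(local_dir: str = "./data") -> Path:
    """Pick a data directory: Google Drive inside Colab, a local folder elsewhere."""
    if in_colab():
        from google.colab import drive  # only exists inside Colab

        drive.mount("/content/drive")  # asks for authorization on first use
        # Assumption: datasets live in a "datasets" folder at the Drive root.
        return Path("/content/drive/MyDrive/datasets")
    return Path(local_dir)


if __name__ == "__main__":
    print("Reading data from:", resolve_data_dir())
```

Inside a notebook cell, shell commands such as `apt-get` and `pip install` are run by prefixing them with `!`.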
--
-
GPU ! (Nvidia Tesla T4, 16 GB GPU RAM, a card worth about 3000$)
-
Collaboration ! (share and co-edit notebooks)
-
Open notebook from github to colab !
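
Opening a GitHub notebook in Colab comes down to a URL rewrite: Colab serves notebooks under `https://colab.research.google.com/github/<user>/<repo>/blob/<branch>/<path>`. A small sketch of that rewrite; the `github_to_colab` helper and the example repository are hypothetical.

```python
from urllib.parse import urlparse

COLAB_GITHUB_PREFIX = "https://colab.research.google.com/github"


def github_to_colab(url: str) -> str:
    """Turn a github.com notebook URL into the matching Colab URL."""
    parsed = urlparse(url)
    if parsed.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    if "/blob/" not in parsed.path or not parsed.path.endswith(".ipynb"):
        raise ValueError("expected a link to a .ipynb file (…/blob/<branch>/<path>)")
    # The GitHub path (/user/repo/blob/branch/file.ipynb) is reused as-is.
    return COLAB_GITHUB_PREFIX + parsed.path


if __name__ == "__main__":
    # Hypothetical repository, for illustration only.
    print(github_to_colab("https://github.com/some-user/some-repo/blob/master/demo.ipynb"))
```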
--
-
End-to-end training w/ GPU. pytorch and pytorch-ignite
-
Notebook on github, Data on Google Drive
--
-
Long calculations w/ guarantees (you can checkpoint your models on colab though)
-
Code syncing / huge codebase & huge datasets
-
Full control over installation and data
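
Checkpointing is what makes the 12h session limit bearable: save the training state after every epoch and resume from it in the next session. A stdlib-only sketch of that pattern; the JSON `state` dict and the fake loss stand in for real model weights and a real training step (with PyTorch you would typically save the model's `state_dict()` with `torch.save`), and all function names are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(state: dict, path: Path) -> None:
    """Write the state to a temp file, then swap it in, to avoid half-written files."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_name, path)


def load_checkpoint(path: Path) -> dict:
    """Return the saved state, or a fresh one when no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"epoch": 0, "loss_history": []}


def train(path: Path, total_epochs: int) -> dict:
    """Resume from the last saved epoch and checkpoint after every epoch."""
    state = load_checkpoint(path)
    for epoch in range(state["epoch"], total_epochs):
        fake_loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        state["loss_history"].append(fake_loss)
        state["epoch"] = epoch + 1
        save_checkpoint(state, path)
    return state


if __name__ == "__main__":
    # On Colab, point this at a mounted Google Drive folder so it survives the session.
    print(train(Path("checkpoints/run1.json"), total_epochs=3))
```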
https://cloud.google.com/deep-learning-vm/
--
-
Cloud Provider, very nice VM instances options
-
300$ of free credit; GPUs and unlocked bandwidth require a paid account
-
Rather easy to use for ML / DS
--
-
Pre configured paid Cloud Virtual Machines (Google Compute Engine)
-
With jupyter lab auto launched & ready
-
Papermill pre installed for scheduling
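
A hedged sketch of what running a notebook with Papermill can look like from Python, assuming the `papermill` package is installed (as on these VMs) and using its `execute_notebook(input_path, output_path, parameters=...)` call. The `runs` folder, the run id and the helper names are hypothetical.

```python
from pathlib import Path


def output_path_for(input_path: str, run_id: str, out_dir: str = "runs") -> str:
    """Build the path of the executed copy, e.g. runs/<notebook>-<run_id>.ipynb."""
    stem = Path(input_path).stem
    return str(Path(out_dir) / f"{stem}-{run_id}.ipynb")


def run_notebook(input_path: str, run_id: str, parameters: dict = None) -> str:
    """Execute a notebook top to bottom and keep the executed copy."""
    import papermill as pm  # imported lazily: only needed when actually running

    output_path = output_path_for(input_path, run_id)
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    # Parameters are injected into the notebook's cell tagged "parameters".
    pm.execute_notebook(input_path, output_path, parameters=parameters or {})
    return output_path


if __name__ == "__main__":
    print(output_path_for("ai-notebook-demo.ipynb", "001"))
```

Keeping one executed copy per run gives a record of outputs and plots for each scheduled execution.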
--
-
Jupyter only ("AI Notebook")
-
Pre-configured instance for Data Science ("Deep Learning VM")
--
-
Creating an instance
-
Connecting to jupyter lab (with or without ssh !)
https://console.cloud.google.com
--
-
Using the DL VM as a preconfigured headless code runner
-
Executing a notebook on a deep-learning-vm
-
Install this: https://github.com/gclouduniverse/gcp-notebook-executor
```bash
# Notebook to run, and the bucket that receives the executed copy
INPUT_NOTEBOOK="gs://{your-storage}/ai-notebook-demo.ipynb"
GCP_BUCKET="gs://{your-storage}/runs"
# Deep Learning VM image and machine configuration
IMAGE_FAMILY_NAME="pytorch-latest-gpu"
INSTANCE_TYPE="n1-standard-8"
GPU_TYPE="k80"
GPU_COUNT=1
ZONE="europe-west1-b"

execute_notebook -i "${INPUT_NOTEBOOK}" \
    -o "${GCP_BUCKET}" \
    -f "${IMAGE_FAMILY_NAME}" \
    -t "${INSTANCE_TYPE}" \
    -z "${ZONE}" \
    -g "${GPU_TYPE}" \
    -c "${GPU_COUNT}"
```
--
-
Use "preemptible" (spot in AWS terminology)*
-
CLI creation for more customization
*5x less expensive, but the instance runs for at most 24h and can be reclaimed at any time
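
For the CLI route, a sketch that assembles a `gcloud compute instances create` command for a preemptible Deep Learning VM. The flags follow the Deep Learning VM documentation as I understand it (images live in the `deeplearning-platform-release` project); the instance name and default values are assumptions to adapt, and GPU types offered vary by zone.

```python
import shlex


def dlvm_create_command(
    name: str,
    zone: str = "europe-west1-b",
    image_family: str = "pytorch-latest-gpu",
    machine_type: str = "n1-standard-8",
    gpu_type: str = "nvidia-tesla-k80",
    gpu_count: int = 1,
    preemptible: bool = True,
) -> list:
    """Return the gcloud argument list for creating a Deep Learning VM."""
    cmd = [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--image-family={image_family}",
        "--image-project=deeplearning-platform-release",
        f"--machine-type={machine_type}",
        f"--accelerator=type={gpu_type},count={gpu_count}",
        # GPU instances cannot live-migrate, so they must terminate on maintenance.
        "--maintenance-policy=TERMINATE",
        # Ask the image to install the NVIDIA driver on first boot.
        "--metadata=install-nvidia-driver=True",
    ]
    if preemptible:
        cmd.append("--preemptible")
    return cmd


if __name__ == "__main__":
    # "my-dl-vm" is a hypothetical instance name; paste the output in a shell to run it.
    print(shlex.join(dlvm_create_command("my-dl-vm")))
```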
--
Google Colab | Google AI Notebook |
---|---|
Learn, experiment | Can scale compute |
Single notebook / Clone from github | Upload own code |
Simple jupyter env. | Full jupyter lab or SSH access |
Data from anywhere / google drive | Fully owned cloud environment |
Short runtimes (12h max) | Cheap 24h runtimes (preemptible) or arbitrary runtimes |
**free** | **[paid](https://cloud.google.com/compute/pricing)** (by minute of computing + storage) |
--
-
Kaggle Kernels: made for Kaggle, similar to Colab, free, 9h sessions, P100 GPU
-
Amazon Sagemaker: can someone tell me about it ?
-
A lot of smaller entities... floydhub...
-
Build your own machine ? opinion: last step for individual use (be sure of what you need !)
--