# Google Colab Introduction 

## Cells

Python instructions:

In [0]:
print("Hello world")

Hello world


Linux commands (bash):

In [0]:
!ls /

bin   content  dev  home  lib32  media	opt   root  sbin  swift  tmp	usr
boot  datalab  etc  lib   lib64  mnt	proc  run   srv   sys	 tools	var
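
The `!` prefix is IPython syntax, so it only works inside a notebook cell. As a minimal sketch of the same listing in plain Python (assuming Python 3.7 or newer and that `ls` is on the path; `list_directory` is a name made up for this example):

```python
import subprocess

def list_directory(path="/"):
    """Return the entries of `path` as reported by `ls`, one per list item."""
    result = subprocess.run(
        ["ls", path], capture_output=True, text=True, check=True
    )
    # When its output is captured, `ls` prints one entry per line.
    return result.stdout.splitlines()

entries = list_directory("/")
print(entries)
```

Unlike the `!` form, this returns the listing as a Python list that later cells can filter or loop over.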


### This is a heading

This is some text

- this
- is
- a 
- list

Formula:

$$ \int_0^\infty x^2 dx $$

Visualization:

In [0]:
# load an example dataset
from vega_datasets import data
cars = data.cars()

import altair as alt

interval = alt.selection_interval()

base = alt.Chart(cars).mark_point().encode(
  y='Miles_per_Gallon',
  color=alt.condition(interval, 'Origin', alt.value('lightgray'))
).properties(
  selection=interval
)

base.encode(x='Acceleration') | base.encode(x='Horsepower')

## VM Info

### Software info (OS, Python Version)

In [0]:
!lsb_release -a

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.1 LTS
Release:	18.04
Codename:	bionic


In [0]:
!python3 --version

Python 3.6.7
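
The same kind of information can also be read from inside Python, which is handy when a notebook needs to branch on the version. A small sketch using only the standard library (`software_info` is a hypothetical helper name):

```python
import platform
import sys

def software_info():
    """Collect basic OS and interpreter details into a dict."""
    return {
        "system": platform.system(),          # e.g. "Linux"
        "release": platform.release(),        # kernel release string
        "python": platform.python_version(),  # "major.minor.patch"
        "executable": sys.executable,         # path of the running interpreter
    }

info = software_info()
for key, value in info.items():
    print(f"{key}: {value}")
```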


### Hardware Info

**CPU:**

In [0]:
!lscpu

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               63
Model name:          Intel(R) Xeon(R) CPU @ 2.30GHz
Stepping:            0
CPU MHz:             2300.000
BogoMIPS:            4600.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            46080K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti ssbd i

**Memory:**

In [0]:
!free -m

              total        used        free      shared  buff/cache   available
Mem:          13022         438       10568           0        2015       12326
Swap:             0           0           0
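
For use in code rather than for display, the CPU count and memory totals can be read with the standard library. The sketch below assumes a Linux VM where `/proc/meminfo` exists. `parse_meminfo` and `SAMPLE` are names invented for this example, and the sample values are simply the `free -m` numbers above converted back to kB.

```python
import os

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in MiB."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        # Only convert entries that carry a kB unit; skip plain counters.
        if len(fields) == 2 and fields[1] == "kB" and fields[0].isdigit():
            values[name.strip()] = int(fields[0]) // 1024
    return values

# Illustrative input matching the totals shown by `free -m` above.
SAMPLE = "MemTotal:       13334528 kB\nMemFree:        10821632 kB\n"

print("CPUs:", os.cpu_count())
print(parse_meminfo(SAMPLE))

# On a Linux VM the real file can be parsed the same way.
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print("MemTotal (MiB):", parse_meminfo(f.read()).get("MemTotal"))
```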


**GPU:**

In [0]:
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install gputil humanize --quiet

import humanize
import GPUtil as GPU

gpu = GPU.getGPUs()[0]
print(gpu.name)
print("GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB".format(gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil*100, gpu.memoryTotal))

  Building wheel for gputil (setup.py) ... done
Tesla K80
GPU RAM Free: 11441MB | Used: 0MB | Util   0% | Total 11441MB
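
`GPU.getGPUs()[0]` raises an `IndexError` when the runtime has no GPU attached. As a hedged alternative, the sketch below calls `nvidia-smi` directly and returns an empty list when the tool is missing or fails. `query_gpus` is a hypothetical helper, and it assumes `nvidia-smi` is on the path when a GPU is present.

```python
import shutil
import subprocess

def query_gpus():
    """Return one dict per GPU reported by nvidia-smi, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    gpus = []
    for line in result.stdout.strip().splitlines():
        try:
            # Split from the right so a comma in the name cannot break parsing.
            name, total, used = [f.strip() for f in line.rsplit(",", 2)]
            gpus.append({"name": name, "total_mb": int(total), "used_mb": int(used)})
        except ValueError:
            continue  # skip lines that do not have the expected three fields
    return gpus

gpus = query_gpus()
print(gpus if gpus else "No GPU detected")
```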


## Data Import

### Using `wget`:

In [0]:
!wget https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/Titanic.csv

--2019-02-19 13:18:21--  https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/Titanic.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1170 (1.1K) [text/plain]
Saving to: ‘Titanic.csv’


2019-02-19 13:18:21 (212 MB/s) - ‘Titanic.csv’ saved [1170/1170]



In [0]:
import pandas as pd
pd.read_csv('Titanic.csv')

Unnamed: 0.1,Unnamed: 0,Class,Sex,Age,Survived,Freq
0,1,1st,Male,Child,No,0
1,2,2nd,Male,Child,No,0
2,3,3rd,Male,Child,No,35
3,4,Crew,Male,Child,No,0
4,5,1st,Female,Child,No,0
5,6,2nd,Female,Child,No,0
6,7,3rd,Female,Child,No,17
7,8,Crew,Female,Child,No,0
8,9,1st,Male,Adult,No,118
9,10,2nd,Male,Adult,No,154
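
The first column of this file is an unnamed row index, which is why pandas labels it `Unnamed: 0`; passing `index_col=0` to `pd.read_csv` should treat it as the index instead. For a quick look without pandas, here is a standard-library sketch that sums `Freq` per group over the ten rows displayed above. The header line in `SAMPLE_CSV` is an assumption about the file layout, and `total_freq_by` is a made-up helper.

```python
import csv
import io
from collections import defaultdict

# The ten rows displayed above; the first column is an unnamed row index.
SAMPLE_CSV = """\
,Class,Sex,Age,Survived,Freq
1,1st,Male,Child,No,0
2,2nd,Male,Child,No,0
3,3rd,Male,Child,No,35
4,Crew,Male,Child,No,0
5,1st,Female,Child,No,0
6,2nd,Female,Child,No,0
7,3rd,Female,Child,No,17
8,Crew,Female,Child,No,0
9,1st,Male,Adult,No,118
10,2nd,Male,Adult,No,154
"""

def total_freq_by(csv_text, column):
    """Sum the Freq column for each distinct value of `column`."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[column]] += int(row["Freq"])
    return dict(totals)

print(total_freq_by(SAMPLE_CSV, "Age"))
print(total_freq_by(SAMPLE_CSV, "Sex"))
```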


### Using Git

In [0]:
!git clone https://github.com/Jakobovski/free-spoken-digit-dataset.git

Cloning into 'free-spoken-digit-dataset'...
remote: Enumerating objects: 535, done.
remote: Counting objects:  19% (102/535)

### Upload from your computer

In [0]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(uploaded[fn])))

Saving test.py to test.py
User uploaded file "test.py" with length 42 bytes
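
As the loop above suggests, `files.upload()` returns a dict that maps each file name to its content as bytes. The sketch below works on a stand-in dict of that shape, so it runs outside Colab too, and adds a SHA-256 checksum that can be compared with the local file. `describe_uploads` and the example content are hypothetical.

```python
import hashlib

def describe_uploads(uploaded):
    """Map each file name to its size in bytes and SHA-256 hex digest."""
    return {
        name: {"bytes": len(content), "sha256": hashlib.sha256(content).hexdigest()}
        for name, content in uploaded.items()
    }

# Stand-in for the dict returned by files.upload().
example_upload = {"test.py": b"print('This is a script from my computer')\n"}

for name, details in describe_uploads(example_upload).items():
    print(name, details["bytes"], details["sha256"])
```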


In [0]:
!python3 test.py

This is a script from my computer


### Download a file from Colab

In [0]:
from google.colab import files
import time

with open('colab_file.txt', 'w') as f:
  f.write('This file was created in the current Colab instance at ' + str(time.time()))

files.download('colab_file.txt')

### Using Google Drive

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive


In [0]:
!apt install tree -y > /dev/null
!tree "/content/drive/My Drive"



/content/drive/My Drive
├── Colab Notebooks
│   └── Colab Intro.ipynb
├── datasets
│   └── biostats.csv
├── Untitled form.gform
└── Untitled form (Responses).gsheet

2 directories, 4 files


In [0]:
from pathlib import Path
GDRIVE_DIR = Path("/content/drive/My Drive")

pd.read_csv(GDRIVE_DIR / 'datasets' / 'biostats.csv')

Unnamed: 0,Name,"""Sex""","""Age""","""Height (in)""","""Weight (lbs)"""
0,Alex,"""M""",41,74,170
1,Bert,"""M""",42,68,166
2,Carl,"""M""",32,70,155
3,Dave,"""M""",39,72,167
4,Elly,"""F""",30,66,124
5,Fran,"""F""",33,66,115
6,Gwen,"""F""",26,64,121
7,Hank,"""M""",30,71,158
8,Ivan,"""M""",53,72,175
9,Jake,"""M""",32,69,143


In [0]:
!mkdir -p  "/content/drive/My Drive/colab_result"
!echo "accuracy is 100%" > "/content/drive/My Drive/colab_result/result_01.txt"
!tree "/content/drive/My Drive"

/content/drive/My Drive
├── Colab Notebooks
│   └── Colab Intro.ipynb
├── colab_result
│   └── result_01.txt
├── datasets
│   └── biostats.csv
├── Untitled form.gform
└── Untitled form (Responses).gsheet

3 directories, 5 files
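
The same write can be done from Python with `pathlib`, which avoids shell quoting around the space in `My Drive`. This is a minimal sketch: `save_result` is a hypothetical helper, and it falls back to a temporary directory when Drive is not mounted so the cell still runs.

```python
import tempfile
from pathlib import Path

def save_result(base_dir, filename, text):
    """Write `text` to <base_dir>/colab_result/<filename> and return the path."""
    result_dir = Path(base_dir) / "colab_result"
    result_dir.mkdir(parents=True, exist_ok=True)
    path = result_dir / filename
    path.write_text(text + "\n", encoding="utf-8")
    return path

drive_dir = Path("/content/drive/My Drive")
# Fall back to a temporary directory if Drive has not been mounted.
base_dir = drive_dir if drive_dir.is_dir() else Path(tempfile.mkdtemp())

saved = save_result(base_dir, "result_01.txt", "accuracy is 100%")
print(saved)
```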


## Train MLP Model

Always install all of your dependencies at the beginning of your notebook!

In [0]:
!apt-get -qq install -y libarchive-dev 
!pip install -q -U libarchive

Selecting previously unselected package libarchive-dev:amd64.
(Reading database ... 131359 files and directories currently installed.)
Preparing to unpack .../libarchive-dev_3.2.2-3.1ubuntu0.3_amd64.deb ...
Unpacking libarchive-dev:amd64 (3.2.2-3.1ubuntu0.3) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Setting up libarchive-dev:amd64 (3.2.2-3.1ubuntu0.3) ...
  Building wheel for libarchive (setup.py) ... done

In [0]:
import libarchive
print(libarchive.__version__)

0.4.6
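
Not every package exposes `__version__`. A sketch that checks several dependencies in one cell with `importlib.metadata` (this assumes Python 3.8 or newer; the package list is only an illustration, and `installed_version` is a made-up helper):

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Illustrative list; replace with the packages your notebook needs.
for package in ["numpy", "tensorflow", "keras"]:
    print(package, installed_version(package) or "not installed")
```

Running such a check right after the install cell makes a missing or mismatched dependency visible before training starts.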


Then import all of your prerequisites:

In [0]:
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (7,9) # Make the figures a bit bigger


import os
import tensorflow as tf
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.utils import np_utils, to_categorical
from keras.callbacks import ModelCheckpoint

Using TensorFlow backend.


Build the network

In [0]:
nb_classes = 10

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

x_train = train_images.reshape((60000, 28 * 28))
x_train = x_train.astype('float32') / 255
x_test = test_images.reshape((10000, 28 * 28))
x_test = x_test.astype('float32') / 255

y_train = to_categorical(train_labels)
y_test = to_categorical(test_labels)

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(28 * 28,)))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

# Store checkpoints in the Google Drive folder
model_dir = GDRIVE_DIR / 'models' / 'mnist'
model_dir.mkdir(parents=True, exist_ok=True)

checkpoint_path = model_dir / "model-{acc:02f}.hdf5"

# Save a checkpoint after every epoch; the file name embeds the training accuracy ('acc').
checkpoint = ModelCheckpoint(str(checkpoint_path),
                             monitor='acc',
                             verbose=1)

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
Instructions for updating:
Colocations handled automatically by placer.
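
The checkpoint file name is a format template: as far as I know, `ModelCheckpoint` fills it in with `str.format`, passing the epoch number and the logged metrics. The sketch below reproduces that step in plain Python with made-up metric values, which shows why the saved files look like `model-0.865600.hdf5`.

```python
# Template used in the cell above.
template = "model-{acc:02f}.hdf5"

# Hypothetical metrics for one epoch; real values come from training.
logs = {"acc": 0.8656, "loss": 0.5}

# Unused keys such as `epoch` and `loss` are simply ignored by str.format.
name = template.format(epoch=1, **logs)
print(name)  # model-0.865600.hdf5

# An alternative template that also records the epoch and uses 4 decimals.
alt_name = "model-{epoch:02d}-{acc:.4f}.hdf5".format(epoch=1, **logs)
print(alt_name)  # model-01-0.8656.hdf5
```

The `02f` spec leaves the default precision of six decimals, so adding the epoch number or shortening the precision is a matter of editing the template.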


Train the model from scratch or resume previous training:

In [0]:
def get_checkpoints():
  # List checkpoint files, sorted so the highest accuracy in the file name comes first.
  saved_checkpoints = [f for f in os.listdir(model_dir) if f.startswith('model-')]
  saved_checkpoints.sort(reverse=True)
  return saved_checkpoints

In [0]:
saved_checkpoints = get_checkpoints()
print(saved_checkpoints)

['model-0.865600.hdf5', 'model-0.840783.hdf5', 'model-0.751167.hdf5', 'model-0.502367.hdf5']
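
The reverse sort above compares file names as strings, which works here because every accuracy has the same `0.xxxxxx` shape. A slightly more defensive sketch parses the number out of the name and picks the maximum (`CHECKPOINT_RE` and `best_checkpoint` are names invented for this example):

```python
import re

# Matches names such as "model-0.865600.hdf5" and captures the accuracy.
CHECKPOINT_RE = re.compile(r"^model-(\d+\.\d+)\.hdf5$")

def best_checkpoint(filenames):
    """Return the file name with the highest embedded accuracy, or None."""
    scored = []
    for name in filenames:
        match = CHECKPOINT_RE.match(name)
        if match:
            scored.append((float(match.group(1)), name))
    return max(scored)[1] if scored else None

names = ['model-0.865600.hdf5', 'model-0.840783.hdf5',
         'model-0.751167.hdf5', 'model-0.502367.hdf5']
print(best_checkpoint(names))  # model-0.865600.hdf5
```

Note that both approaches select the most accurate checkpoint, which is not necessarily the most recently written one.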


In [0]:
if len(saved_checkpoints) > 0:
  last_checkpoint = saved_checkpoints[0]
  print("Resume training from " + last_checkpoint)
  model.load_weights(str(model_dir / last_checkpoint))
else:
  print("Training from scratch!")

model.fit(x_train, y_train, epochs=4, batch_size=15000, callbacks=[checkpoint])

Resume training from model-0.865600.hdf5
Epoch 1/4

Epoch 00001: saving model to /content/drive/My Drive/models/mnist/model-0.864567.hdf5
Epoch 2/4

Epoch 00002: saving model to /content/drive/My Drive/models/mnist/model-0.886300.hdf5
Epoch 3/4

Epoch 00003: saving model to /content/drive/My Drive/models/mnist/model-0.883550.hdf5
Epoch 4/4

Epoch 00004: saving model to /content/drive/My Drive/models/mnist/model-0.897650.hdf5


<keras.callbacks.History at 0x7fba19830f98>

## Long Running Job

In [0]:
score = model.evaluate(x_test, y_test)
print('Test accuracy:', score[1])

Test accuracy: 0.8962
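
To make the reported number concrete: with one-hot labels, accuracy is the fraction of samples whose highest predicted probability falls on the true class. The following plain-Python sketch illustrates that definition on three made-up predictions. It is not the Keras implementation, and all helper names are hypothetical.

```python
def one_hot(label, num_classes):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(y_true, y_pred):
    """Fraction of rows where the predicted class matches the true class."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Three made-up samples with true classes 2, 0 and 1.
y_true = [one_hot(label, 3) for label in [2, 0, 1]]
y_pred = [[0.1, 0.2, 0.7],   # predicts class 2 (correct)
          [0.6, 0.3, 0.1],   # predicts class 0 (correct)
          [0.5, 0.4, 0.1]]   # predicts class 0 (wrong, true class is 1)

acc = accuracy(y_true, y_pred)
print(f"accuracy: {acc:.3f}")  # accuracy: 0.667
```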


In [31]:
model.fit(x_train, y_train, epochs=200, batch_size=15000)

Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200
Epoch 23/200
Epoch 24/200
Epoch 25/200
Epoch 26/200
Epoch 27/200
Epoch 28/200
Epoch 29/200
Epoch 30/200
Epoch 31/200
Epoch 32/200
Epoch 33/200
Epoch 34/200
Epoch 35/200
Epoch 36/200
Epoch 37/200
Epoch 38/200
Epoch 39/200
Epoch 40/200
Epoch 41/200
Epoch 42/200
Epoch 43/200
Epoch 44/200
Epoch 45/200
Epoch 46/200
Epoch 47/200
Epoch 48/200
Epoch 49/200
Epoch 50/200
Epoch 51/200
Epoch 52/200
Epoch 53/200
Epoch 54/200
Epoch 55/200
Epoch 56/200
Epoch 57/200
Epoch 58/200
Epoch 59/200
Epoch 60/200
Epoch 61/200
Epoch 62/200
Epoch 63/200
Epoch 64/200
Epoch 65/200
Epoch 66/200
Epoch 67/200
Epoch 68/200
Epoch 69/200
Epoch 70/200
Epoch 71/200
Epoch 72/200
Epoch 73/200
Epoch 74/200
Epoch 75/200
Epoch 76/200
Epoch 77/200
Epoch 78

<keras.callbacks.History at 0x7fba19830fd0>
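
A run this long is worth making resumable, in the same spirit as the checkpoints used earlier. The sketch below records the number of finished epochs in a small JSON file and continues from there on the next call. `run_epochs` and `train_one_epoch` are hypothetical stand-ins; in a real notebook the callback would train and checkpoint the model, and the state file would live on Google Drive rather than in a temporary directory.

```python
import json
import tempfile
from pathlib import Path

def run_epochs(state_path, total_epochs, train_one_epoch):
    """Run the remaining epochs and return how many were executed."""
    state_path = Path(state_path)
    done = 0
    if state_path.exists():
        done = json.loads(state_path.read_text())["epochs_done"]
    for epoch in range(done, total_epochs):
        train_one_epoch(epoch)  # stand-in for one epoch of real training
        # Persist progress after every epoch so an interruption loses little.
        state_path.write_text(json.dumps({"epochs_done": epoch + 1}))
    return max(total_epochs - done, 0)

state_file = Path(tempfile.mkdtemp()) / "training_state.json"
trained = []

first_run = run_epochs(state_file, 3, trained.append)   # runs epochs 0, 1, 2
second_run = run_epochs(state_file, 5, trained.append)  # resumes with 3, 4
print(first_run, second_run, trained)
```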

In [32]:
score = model.evaluate(x_test, y_test)
print('Test accuracy:', score[1])

Test accuracy: 0.971
