
#### Copyright 2019 Google LLC.

In [0]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Introduction to Colab

In this unit we will explore [Colaboratory](https://colab.research.google.com/) (Colab for short). Colab is a tool for notebook-based programming. This style of programming turns out to be a great platform for machine learning education and research!

A **notebook** is a file that contains code, documentation, and output from the execution of the code. [Jupyter](https://jupyter.org/) is currently the most popular form of notebook.

Jupyter notebooks can be edited and executed locally. It turns out that they are also very effective when they are hosted in the cloud. **Colab** is just that. Colab is a platform that stores notebooks in Google Drive and executes the code in the notebooks on virtual machines in the cloud. This allows for a zero-setup instant environment for working on data science problems.

## Overview

### Estimated Duration

60 minutes

# Notebooks

Colab notebooks are *almost* exactly the same as Jupyter notebooks. You can open a notebook created in Jupyter in Colab, and vice versa. Because Colab runs online and is built by Google, it has a few extra features that integrate with Google products. In practice this shouldn't affect you during this course. If you download the notebooks and run them directly in Jupyter, you might have to adapt any cells that use a Google-specific feature.

# Cells

A notebook is a list of cells. There are two types of cells: code cells and text cells. We'll work with both in this lab.

## Code cells

Below is a **code cell**. You can click in the cell to select it. To execute the code in the cell you have a few options:

1. Click the **Play icon** in the left gutter of the cell.
1. Type **Cmd/Ctrl+Enter** to run the cell in place.
1. Type **Shift+Enter** to run the cell and move focus to the next cell (adding one if none exists).
1. Type **Alt+Enter** to run the cell and insert a new code cell immediately below it.
1. Click the **Runtime** menu and select **Run the focused cell**.


You'll notice that the **Runtime** menu also has options for running all cells, the cells before or after the focused cell, and the selected cells.


In [0]:
a = 10
a

After you run a cell, you can see the output displayed immediately after the cell.

You might have also noticed a little delay the first time you ran the code cell. This is related to *runtimes*, which we will talk about soon.

When you run code, Colab passes that code to [IPython](https://ipython.org/) to execute. The IPython session doesn't restart for every code cell, so you can do things like set a variable in one block:

In [0]:
a = 1000

And then use that variable in another block.

In [0]:
print(a)

The order in which code cells appear in a notebook doesn't matter. What matters is the order in which they are executed.

In the two code blocks below, we define a variable in one and print it in the other. If you run the first block before the second, you'll get a `NameError` because `b` hasn't been defined yet. If you run the second and then the first, everything will work fine.

In [0]:
# This cell should be executed after the subsequent cell.

print(b)

In [0]:
# This cell should be executed first since it defines 'b'.

b = 1234

Most programs execute in a well-defined order. An interactive programming environment, by contrast, lets you run code cells in whatever order fits your workflow. This can be very convenient, but it can also lead to tricky bugs in which variables hold unexpected values because of the order in which cells were run during exploration.
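To make the ordering problem concrete, here is a small sketch that imitates notebook cells as source strings run with `exec` in one shared namespace. It is only an analogy, assuming a single dictionary behaves like one long-lived IPython session; it is not how Colab is implemented, and the names `cells` and `run` are hypothetical.

```python
# Hypothetical "cells": each value is the source code of one code cell.
cells = {
    "define": "total = 0",
    "increment": "total += 5",
}


def run(order):
    """Execute the named cells in the given order in one shared namespace.

    The single dictionary plays the role of the session: state created by
    earlier cells is still visible to later ones.
    """
    namespace = {}
    for name in order:
        exec(cells[name], namespace)
    return namespace["total"]


print(run(["define", "increment"]))               # 5
print(run(["define", "increment", "increment"]))  # 10: rerunning a cell changed the state

try:
    run(["increment", "define"])
except NameError as err:
    # 'total' is used before any cell has defined it.
    print("Out of order:", err)
```

The same two cells give different results depending only on how many times and in what order they were run, which is exactly the kind of hidden state to watch for in a notebook.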

### Exercise 1: Writing and Running Code in Colab

This Colab runs Python 3 code. Let's write and execute some Python in the cell below.

Write the following code in the student solution cell:

```python
print("Hello Colab!")
```

And then run the cell.

**Student Solution**

In [0]:
# Your Solution Goes Here

---

#### Answer Key

In [0]:
print("Hello Colab!")

---

### Shell Commands

When you run code through a notebook, that code is running on a computer somewhere. With Colab, that machine is likely a virtual machine in one of Google's data centers. These machines are full [Linux](https://www.linux.org/) installations.

Linux offers many powerful commands through what is known as a **shell**. You can access these commands using the exclamation point, `!`, at the start of a line.

For example, the shell command to list all of the files in a directory is `ls`. To run `ls` in the shell simply type `!ls` into a code cell, and execute it.

You can see `ls` in action below.

In [0]:
!ls

There are many more shell commands. The University of Washington has a nice [reference card](https://courses.cs.washington.edu/courses/cse390a/14au/bash.html) listing some of the more common commands.
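The `!` prefix is an IPython feature, so it won't work in a plain Python script. As a rough sketch (not an exact equivalent), the standard-library `subprocess` module can run the same `ls` command and capture its output as a Python string. This assumes a Linux or macOS machine where `ls` is available.

```python
import subprocess

# Run `ls` and capture its standard output as text instead of printing it directly.
# check=True raises CalledProcessError if the command exits with a non-zero status.
result = subprocess.run(["ls"], capture_output=True, text=True, check=True)

# Split the captured output into one entry per line (one file or directory name each).
names = result.stdout.splitlines()
print(f"{len(names)} entries found")
print(names)
```

Capturing the output as a list like this makes it easy to keep working with the results in Python, rather than only viewing them.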

#### Exercise 2: Running Shell Commands

Find the shell command that displays the current working directory. Execute that command in a code block.

**Student Solution**

In [0]:
# Your Solution Goes Here

---

##### Answer Key

In [0]:
!pwd

---

### Magics

We've seen Python code and shell commands in code blocks. There are other commands called **magics** that can change how a line or cell works. Magics are a feature of IPython, the engine that Jupyter and Colab use to run Python code, so you can consult the [documentation for Jupyter's magics](http://nbviewer.jupyter.org/github/ipython/ipython/blob/1.x/examples/notebooks/Cell%20Magics.ipynb) to see a list of magics.

Line magics start with a single percent sign, `%`, and operate on only one line of the cell.

For instance, the `%timeit` magic below runs the line of code passed to it many times and reports how long it took.


In [0]:
import numpy as np

%timeit np.linalg.eigvals(np.random.rand(100, 100))
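Outside a notebook, the standard-library `timeit` module can make a comparable measurement. The sketch below approximates what `%timeit` reports rather than reproducing it exactly; the function name `compute_eigenvalues` and the `repeat=5`, `number=10` settings are arbitrary choices for illustration.

```python
import timeit

import numpy as np


def compute_eigenvalues():
    # Same work as the %timeit line above: eigenvalues of a random 100x100 matrix.
    return np.linalg.eigvals(np.random.rand(100, 100))


# 5 trials of 10 calls each; every entry is the total seconds for one trial.
trial_times = timeit.repeat(compute_eigenvalues, repeat=5, number=10)

# Turn the fastest trial into an average time per call.
best_per_call = min(trial_times) / 10
print(f"best of 5: {best_per_call * 1000:.3f} ms per loop")
```

Taking the fastest trial is a common convention because slower trials are usually slowed by other activity on the machine, not by the code being measured.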

Cell magics start with two percent signs, `%%`, and apply to all of the remaining content in the cell. For example, the `%%html` magic interprets the cell as HTML and displays the rendered result instead of raw text.

In [0]:
%%html
<marquee style='width: 30%; color: blue;'><b>Whee!</b></marquee>

#### Exercise 3: Magics for Shell Commands

We have seen how to run a shell command using the exclamation point, `!`. You can also use a shell magic to run shell commands without having to prefix each with `!`.

Find a magic that tells Colab to interpret a code cell as Bash (a common Linux shell), then run the `ls` command using that magic.

**Student Solution**

In [0]:
# Your Solution Goes Here

---

##### Answer Key

In [0]:
%%bash
ls

---

### Charts and Graphs

Colab can also display charts and graphs inline. Don't pay too much attention to the code below, but notice that when you run the code block, a chart appears in the output.

In [0]:
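import numpy as np
from matplotlib import pyplot as plt

# 100 noisy values centered around 200.
ys = 200 + np.random.randn(100)
# x-coordinates 0, 1, ..., 99, one per value.
x = np.arange(len(ys))

plt.plot(x, ys, '-')
# Shade the area between the line and y=195 wherever the values exceed 195.
plt.fill_between(x, ys, 195, where=(ys > 195), facecolor='g', alpha=0.6)

plt.title("Fills and Alpha Example")
plt.show()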

### Getting Help

Colab provides hints for exploring the attributes of Python objects, as well as a quick way to view documentation strings (docstrings). As an example, first run the following cell to import the [`numpy`](http://www.numpy.org) module.

In [0]:
import numpy as np

Now, insert your cursor after `np.random` below and type a dot: `.`. Wait just a moment, and you should see a drop-down list of all of the attributes of `np.random`.

If you don't see the drop-down list, try pressing the **Tab** key. Some platforms have slightly different triggers for the suggested completions.

In [0]:
np.random

If you type an opening parenthesis after `np.random.rand` below, you should see a pop-up with documentation about the function. If not, try pressing the **Tab** key.

In [0]:
np.random.rand

To open the documentation in a persistent pane at the bottom or right-hand side of your screen, add a **?** after the object or method name and execute the cell using **Cmd/Ctrl+Enter**:

In [0]:
np.random?
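The drop-down and documentation pane are notebook conveniences, but similar information is available from plain Python through the built-in `dir()` function and an object's `__doc__` attribute, which `help()` also draws on. A short sketch:

```python
import numpy as np

# Public attributes of np.random, similar to the completion drop-down.
public_names = [name for name in dir(np.random) if not name.startswith("_")]
print(public_names[:10])

# First few lines of the docstring for np.random.rand, similar to the pop-up.
doc = np.random.rand.__doc__
print("\n".join(doc.strip().splitlines()[:5]))
```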

## Text cells

Now that we've seen an example of a code cell, this is a **text cell**.

To edit a text cell, **double-click** on the cell.  When you do, you'll notice there are extra asterisk characters around the word "double-click".  That's because Colab text cells use markdown syntax (described more below).

### Markdown

There are two ways of specifying format within a text cell. The first is to use the icons that are visible when you're editing text. They let you set the size of the text, make the text bold or italic, format the text as code, turn the text into a clickable link, display an image, indent text, create bulleted lists, or insert a horizontal separator line. The second is to type the markdown syntax yourself. A reference to the markdown supported by Colab can be found in [this notebook](https://colab.research.google.com/notebooks/markdown_guide.ipynb).



### Basic Formatting

Bold text can be achieved with double-asterisks like `**bold text**` or double-underscores like `__bold text__`.

```
The **runtime** supports __Python__ programming.
```

> The **runtime** supports __Python__ programming.

Italic text can be achieved with single-asterisks like `*italic text*` or single-underscores like `_italic text_`.


```
The *runtime* supports _Python_ programming.
```

> The *runtime* supports _Python_ programming.

Strikethrough can be achieved with double tildes like `~~strikethrough~~`.

```
The runtime supports ~~Python~~ programming.
```

> The runtime supports ~~Python~~ programming.

#### Exercise 4: Basic Formatting With Markdown

In the student solution cell below replace:

```
Your Solution Goes Here
```

With the following text:

```
The quick brown fox jumps over the very extremely lazy dog.
```

Then add markdown that does the following:

1. Italicize the words `quick`, `jumps`, and `lazy`.
1. Strike through the words `very extremely` with one piece of markdown.
1. Bold the words `fox` and `dog`.

**Student Solution**

Your Solution Goes Here

---

##### Answer Key

```
The *quick* brown **fox** *jumps* over the ~~very extremely~~ *lazy* **dog**.
```

or 

```
The _quick_ brown __fox__ _jumps_ over the ~~very extremely~~ _lazy_ __dog__.
```

---

### Lists

Lists can be added to text cells. For **ordered lists** use numbers:

```
1. Apples
1. Oranges
1. Kiwi
```

Becomes:

> 1. Apples
> 1. Oranges
> 1. Kiwi

Notice that every item is numbered `1`. This doesn't have to be the case. You could get the same effect with:

```
1. Apples
2. Oranges
3. Kiwi
```

However, this can get messy when you want to reorder the list. Just stick to using `1` and make future maintenance of your text cell easier.
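One reason the all-`1` style is easy to maintain: markdown renderers number the items sequentially starting from the first item's number, so if list text is ever reordered or generated programmatically, nothing needs renumbering. The helper `to_ordered_list` below is hypothetical, written only to illustrate this.

```python
def to_ordered_list(items):
    """Return a markdown ordered list in which every item is numbered "1."

    When rendered, the items are displayed as 1, 2, 3, ..., so the source
    never needs renumbering after a reorder.
    """
    return "\n".join(f"1. {item}" for item in items)


fruits = ["Apples", "Oranges", "Kiwi"]
print(to_ordered_list(fruits))

# Reordering changes only the item order; every line still starts with "1."
print(to_ordered_list(sorted(fruits, reverse=True)))
```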

If ordinals don't matter, you can create a bulleted list using asterisks:

```
* Apples
* Oranges
* Kiwi
```

Becomes:

> * Apples
> * Oranges
> * Kiwi

And finally, you can nest lists using indentation:

```
* Apples
  1. Fuji
  1. Gala
  1. Cosmic Crisp
* Oranges
  1. Valencia
  1. Navel
* Kiwi
  * Arctic
  * Hardy
```

Becomes:

> * Apples
>   1. Fuji
>   1. Gala
>   1. Cosmic Crisp
> * Oranges
>   1. Valencia
>   1. Navel
> * Kiwi
>   * Arctic
>   * Hardy

#### Exercise 5: Making Lists

Create the following text in markdown.

![List to create in markdown: Copyright Google](https://i.imgur.com/MKGHfhN.png)

**Student Solution**

Your Solution Goes Here

---

##### Answer Key

```
**Data Science Checklist**

* Data
  1. Fetch
  1. Decompress
  1. Analyze
  1. Clean
* Model
  1. Build
  1. Test 
* Deployment
  1. Deploy
  1. Monitor
```

---

### Links

It is possible to link to external resources in markdown using square brackets and parentheses:

`[...](...)`

The text to be displayed is placed in the square brackets and a URL is placed in the parentheses like:

`[Colaboratory](https://research.google.com/colaboratory)`

Which becomes:

[Colaboratory](https://research.google.com/colaboratory)

#### Exercise 6: Links to Resources

We will be using a toolkit called `scikit-learn` in this course. The URL for the user guide is `https://scikit-learn.org/stable/user_guide.html`. Create a link with the text `scikit-learn User Guide` that links to `https://scikit-learn.org/stable/user_guide.html`.

**Student Solution**

Your Solution Goes Here

---

##### Answer Key

```
  [scikit-learn User Guide](https://scikit-learn.org/stable/user_guide.html)
```

---

### Tables

It is also possible to represent data in text cells in a tabular format. Tables consist of rows and columns. Columns are separated by vertical bars:

```
Column 1 | Column 2 | Column 3
```

Each new line in a table is another row:

```
Row 1 Column 1 | Row 1 Column 2
Row 2 Column 1 | Row 2 Column 2
```

You can also add a header row by separating the first row (containing the header data) from the remaining rows with a row containing three dashes, `---`, per column.

Putting it all together:


```
Language | Creator(s)
--- | ---
Python | Guido van Rossum
R | Ross Ihaka, Robert Gentleman
Java | James Gosling
```

Becomes:

Language | Creator(s)
--- | ---
Python | Guido van Rossum
R | Ross Ihaka, Robert Gentleman
Java | James Gosling
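
Typing the pipes and dashes by hand gets tedious for larger tables. As a sketch, a small hypothetical helper, `to_markdown_table`, can build the same syntax from Python lists; this is an illustration, not part of Colab.

```python
def to_markdown_table(header, rows):
    """Build pipe-separated markdown table syntax.

    `header` is a list of column names; `rows` is a list of rows, each a
    list with one value per column.
    """
    lines = [" | ".join(header), " | ".join("---" for _ in header)]
    for row in rows:
        lines.append(" | ".join(str(value) for value in row))
    return "\n".join(lines)


table = to_markdown_table(
    ["Language", "Creator(s)"],
    [
        ["Python", "Guido van Rossum"],
        ["R", "Ross Ihaka, Robert Gentleman"],
        ["Java", "James Gosling"],
    ],
)
print(table)
```

The printed text matches the table source above. This sketch does not escape `|` characters inside values, which would otherwise be read as column separators.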


#### Exercise 7: Creating Tables

Using markdown, create the following table:

![Table](https://i.imgur.com/RunAbi1.png)

**Student Solution**

Your Solution Goes Here

---

##### Answer Key

```
Toolkit | Year Created | Language(s)
--- | --- | ---
TensorFlow | 2015 | C++, Python, CUDA
scikit-learn | 2010 | Python
Pytorch | 2016 | C++, Python, CUDA
```

---

### Math Notations

In the fields of data science and machine learning, it is sometimes useful to express ideas as mathematical equations. Colab supports inline equations written in [LaTeX](http://www.latex-project.org/) syntax and rendered with [MathJax](https://www.mathjax.org).

Simply enclose valid LaTeX in a pair of **\$** signs.

For example: 

```
$\sqrt{3x-1}+(1+x)^2$
```

Becomes:

$\sqrt{3x-1}+(1+x)^2$


#### Exercise 8: Write a Formula

The Pythagorean Theorem is represented in two forms below. Create these with inline LaTeX.

![Pythagorean Theorem](https://i.imgur.com/F5eKQEQ.png)

**Student Solution**

Your Solution Goes Here

---

##### Answer Key

```

$ a^2 + b^2 = c^2 $

$ c = \sqrt{a^2 + b^2} $ 

```
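To connect the formula to code, the second form can be checked numerically in a code cell. This sketch uses the classic 3-4-5 right triangle; the values are just an example.

```python
import math

a, b = 3, 4

# c = sqrt(a^2 + b^2), written out directly.
c = math.sqrt(a**2 + b**2)

# math.hypot computes the same Euclidean length.
print(c, math.hypot(a, b))  # 5.0 5.0
```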

---

## Adding and moving cells
You can add new cells by using the **+ CODE** and **+ TEXT** buttons that show when you hover between cells. These buttons are also in the toolbar above the notebook where they can be used to add a cell *below* the currently selected cell.

You can move a cell by selecting it and clicking **Cell Up** or **Cell Down** in the top toolbar. 

Consecutive cells can be selected with a "lasso selection": drag from outside one cell through the group. Non-adjacent cells can be selected at the same time by clicking one and then holding down **Cmd/Ctrl** while clicking another. Similarly, holding **Shift** instead of **Cmd/Ctrl** selects all of the cells in between.

## Commenting on a cell
You can comment on a Colaboratory notebook like you would on a Google Doc. Comments are attached to cells and are displayed next to the cell they refer to. If you have **comment-only** permissions, you will see a comment button on the top right of the cell when you hover over it.

If you have edit or comment permissions, you can comment on a cell in one of three ways:

1. Select a cell and click the comment button in the toolbar above the top-right corner of the cell.
1. Right-click a text cell and select **Add a comment** from the context menu.
1. Use the shortcut **Cmd/Ctrl+Shift+M** to add a comment to the currently selected cell.

You can resolve and reply to comments, and you can target comments to specific collaborators by typing *+[email address]* (e.g., `+user@domain.com`). Addressed collaborators will be emailed. 

The Comment button in the top-right corner of the page shows all comments attached to the notebook.

# Integration with Drive

Colaboratory is integrated with Google Drive. It allows you to share, comment, and collaborate on the same document with multiple people:

* The **SHARE** button (top-right of the toolbar) allows you to share the notebook and control permissions set on it.

* **File->Save a Copy in Drive...** creates a copy of the notebook in Drive.

* **File->Save** saves the file to Drive. **File->Save and pin revision** pins the version so it doesn't get deleted from the revision history.

* **File->Revision history** shows the notebook's revision history. 

* Multiple people can **collaboratively edit** the same notebook at the same time. Like Google Docs, you can see collaborators both within the document (top right, left of the comments button) and within a cell (right of the cell). 

# Runtimes

In this course, we will typically use the default Python 3 runtime provided by Colab. But what is a runtime anyway?

By default, code you run through Colab does not execute on the machine you are directly using. Instead, Colab sends your code to a **virtual machine** running on Google Cloud Platform.

This machine has a specific version of Python and supporting libraries installed on it. The machine has processors, RAM, disk space, and other attributes you'd expect on any computer.
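You can inspect some of these attributes from Python itself. This sketch uses only the standard library; RAM is left out to keep it portable, and the disk path `/` assumes a Linux machine like the ones Colab provides.

```python
import os
import platform
import shutil
import sys

# Python version installed on the runtime.
print("Python:", sys.version.split()[0])

# Operating system the runtime is running.
print("OS:", platform.platform())

# Number of logical CPUs (os.cpu_count() returns None if it can't be determined).
print("CPUs:", os.cpu_count())

# Total and free space on the root filesystem, in gigabytes.
usage = shutil.disk_usage("/")
print(f"Disk: {usage.free / 1e9:.1f} GB free of {usage.total / 1e9:.1f} GB")
```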

When you first run a code cell in a Colab notebook, a new runtime is created for you. Colab keeps your virtual machine running for a few hours and then turns it off.

There are ways to connect to custom runtimes, but that is beyond the scope of this introductory lab.

## Runtime Actions

There are several runtime actions you can perform. In the **Runtime** menu in the toolbar, you can find options to:

* Run the current cell
* Run all code cells
* Run all code cells before/after the current cell
* Run a selected subset of code cells

You can also find options to restart the runtime. This is handy when you've been working in a notebook for a while and aren't sure what state it is in. A restart clears the memory and, optionally, reruns all of the cells.

If you have installed modules, downloaded files, or changed the non-RAM state of the runtime in any way, you can even "Factory reset" the runtime to get it back into a pristine state.

## Runtime Configuration

In the **Runtime** menu you can also click **Change runtime type**, where you can enable hardware acceleration (GPUs or TPUs) and choose whether to store runtime output in the notebook. Older versions of Colab also let you toggle between Python 2 and Python 3, but Colab now offers only Python 3 runtimes.