# Jupyter Notebooks Overview

For this class, we will be using Jupyter Notebooks for most or all of our work.

Jupyter Notebooks consist of:
- Markdown (formatted text) cells
- Code cells (Python in our case, although notebooks can also run other languages)
- Output from code cells

Google Colab is a Jupyter environment. JupyterLab can also be installed and used locally on your computer.

Jupyter notebooks are ideal for data science because we can seamlessly switch between explaining what we are doing and using code.

For example, this is a text cell.

In [None]:
# and this is a code cell where we'll multiply some things
x = 9
y = 8

for i in range(x):
  print(i*y)


0
8
16
24
32
40
48
56
64


**A big part of data science is communicating your results and explaining your process.**

# Helpful Colab Basics

## Keyboard Shortcuts
| Key Combo | Action |
|---|---|
| Shift + Enter | Run cell |
| Ctrl + Shift + S | Select a cell |
| Ctrl + Shift + Enter | Run Selection |
| Ctrl + Space | Code Completion |
| Ctrl + M + H | Open Keyboard Shortcuts |
| Ctrl + M + Z | Undo Last Cell Action |

## Help


Using `help` or `?` will show you the docstring for a function, method, or object.

In [None]:
help(len)

Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.



This works on functions

In [None]:
len?

And methods

In [None]:
L = [1, 2, 3]
L.insert?

And objects

In [None]:
L?
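
The same help is available for code you write yourself, as long as it has a docstring. A minimal sketch, using a hypothetical function `rectangle_area` that is not part of the course material:

```python
import inspect


def rectangle_area(width, height):
    """Return the area of a rectangle with the given width and height."""
    return width * height


# inspect.getdoc returns the function's docstring with indentation cleaned up.
doc = inspect.getdoc(rectangle_area)
print(doc)
```

In a notebook, `help(rectangle_area)` or `rectangle_area?` would display this same docstring text.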

## Tab Completion

In Colab, 'Tab Completion' is actually triggered with `Ctrl + Space`, unless you turn off auto-completion in settings.  
It shows the methods and attributes available on an object.


In [None]:
# Use ctrl + space after the .
my_string = "I like Python"
my_string.

Using tab completion when importing packages

In [None]:
# use ctrl + space after import
import matplotlib.pyplot as plt


In [None]:
# use ctrl + space after r
from numpy import recarray
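
A similar list can also be produced in code. As a rough sketch, the built-in `dir` returns the attribute names of an object, which is approximately what the completion popup displays. Filtering out names that start with an underscore is a choice made here for readability:

```python
my_string = "I like Python"

# dir() lists every attribute name; keep only the public ones.
public_names = [name for name in dir(my_string) if not name.startswith("_")]
print(public_names)
```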

## Colab as a virtual environment
Colab is a Jupyter environment that runs in the cloud.

Beyond just editing a Jupyter notebook, you get a runtime that the notebook is connected to (essentially a virtual machine, or VM).

You can take advantage of this to:
- run shell commands
- save files to the VM (these are temporary and are lost when the notebook is closed)

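Run shell commands by prefixing them with `!`:
  - `!ls`
  - `!pip install ...`

Some common commands, such as `ls`, `pwd`, `cat`, and `cd`, also work without the `!` because IPython provides them as magic commands.

(Use `%cd` rather than `!cd`: `!cd` runs in a temporary subshell, so the directory change does not persist.)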

Read more [here](https://jakevdp.github.io/PythonDataScienceHandbook/01.05-ipython-and-shell-commands.html#Shell-Related-Magic-Commands).
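
The difference between the two forms of `cd` can be imitated in plain Python. This is only a sketch of the idea, not how IPython is implemented: `subprocess.run` stands in for `!`, and `os.chdir` stands in for `%cd`.

```python
import os
import shlex
import subprocess
import tempfile

start = os.getcwd()

with tempfile.TemporaryDirectory() as tmp:
    # Roughly what `!cd` does: the change happens in a child shell and is
    # discarded when that shell exits.
    subprocess.run(f"cd {shlex.quote(tmp)}", shell=True, check=True)
    after_shell_cd = os.getcwd()

    # Roughly what `%cd` does: change the directory of the Python process.
    os.chdir(tmp)
    moved = os.path.samefile(os.getcwd(), tmp)

    # Go back before the temporary directory is removed.
    os.chdir(start)

print(after_shell_cd == start)  # True: the shell's cd had no lasting effect
print(moved)                    # True: os.chdir really moved the process
```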

In [None]:
ls -la

total 16
drwxr-xr-x 1 root root 4096 Sep 20 13:22 ./
drwxr-xr-x 1 root root 4096 Sep 23 16:36 ../
drwxr-xr-x 4 root root 4096 Sep 20 13:21 .config/
drwxr-xr-x 1 root root 4096 Sep 20 13:22 sample_data/


In [None]:
pwd

'/content'

In [None]:
%cd sample_data

/content/sample_data


In [None]:
!ls

anscombe.json		     california_housing_train.csv  mnist_train_small.csv
california_housing_test.csv  mnist_test.csv		   README.md
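
These sample files can be read like any other local file. A minimal sketch using only the standard library: the helper name `preview_csv` is hypothetical, and the path in the usage lines assumes the default Colab layout shown above.

```python
import csv
from itertools import islice
from pathlib import Path


def preview_csv(path, n_rows=3):
    """Return the header row and the first n_rows data rows of a CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(islice(reader, n_rows))
    return header, rows


# Assumed location of the Colab sample data; skipped if the file is absent.
sample = Path("/content/sample_data/california_housing_train.csv")
if sample.exists():
    header, rows = preview_csv(sample)
    print(header)
    print(rows)
```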


In [None]:
!pwd


/content/sample_data


In [None]:
!ps faux


USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root          37  0.0  0.0   5808   996 ?        Ss   16:36   0:00 tail -n +0 -F /root/.config/Googl
root          31  0.0  0.0   5808  1012 ?        Ss   16:36   0:00 tail -n +0 -F /root/.config/Googl
root           1  0.0  0.0   1076     8 ?        Ss   16:36   0:00 /sbin/docker-init -- /datalab/run
root           7  0.2  0.4 897768 57676 ?        Sl   16:36   0:02 /tools/node/bin/node /datalab/web
root          16  0.0  0.0   7376  3552 ?        S    16:36   0:00  \_ /bin/bash -e /usr/local/colab
root        3861  0.0  0.0   5776  1004 ?        S    16:51   0:00  |   \_ sleep 1
root          75  0.8  0.0      0     0 ?        Z    16:36   0:08  \_ [python3] <defunct>
root          76  0.1  0.3  68028 52904 ?        S    16:36   0:00  \_ python3 /usr/local/bin/colab-
root          94  0.5  1.2 565428 169140 ?       Sl   16:36   0:05  \_ /usr/bin/python3 /usr/local/b
root         626  0.9  0.8 1126628 112784 ? 

In [None]:
!id

uid=0(root) gid=0(root) groups=0(root)


## Sharing your notebooks

Sharing works the same way as it does elsewhere in Google Drive.

Sharing Options:
- Private
- Public
- Specific people
- People with the link

Anything you put in a shared folder is automatically shared too.


## Saving your notebooks & revision history

*   You can save your notebook by going to File -> Save.
*   File -> Save and Pin Revision will pin the version so it doesn't get deleted from the revision history.
*   File -> Revision History shows your notebook's revision history. This is useful if you need to revert to a previous version of your notebook.


## Moving your notebooks
- Demo how to move notebooks in Drive
- To keep your own copy of somebody else's notebook, use File -> Save a copy in Drive


# Sample JupyterLab Notebooks

[NBViewer](https://nbviewer.jupyter.org/)

[A Gallery of Interesting Jupyter Notebooks](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks)



# More Info
[Overview of Colab](https://colab.research.google.com/notebooks/basic_features_overview.ipynb)  
[Python Data Science Handbook - Help and Documentation](https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.01-Help-And-Documentation.ipynb)

In [17]:
cd /content/


/content


In [18]:
!ls -la


total 16
drwxr-xr-x 1 root root 4096 Sep 20 13:22 .
drwxr-xr-x 1 root root 4096 Sep 23 16:36 ..
drwxr-xr-x 4 root root 4096 Sep 20 13:21 .config
drwxr-xr-x 1 root root 4096 Sep 20 13:22 sample_data


In [19]:
cd sample_data/


/content/sample_data


In [20]:
pwd

'/content/sample_data'

In [21]:
ls -la


total 55512
drwxr-xr-x 1 root root     4096 Sep 20 13:22 ./
drwxr-xr-x 1 root root     4096 Sep 20 13:22 ../
-rwxr-xr-x 1 root root     1697 Jan  1  2000 anscombe.json*
-rw-r--r-- 1 root root   301141 Sep 20 13:22 california_housing_test.csv
-rw-r--r-- 1 root root  1706430 Sep 20 13:22 california_housing_train.csv
-rw-r--r-- 1 root root 18289443 Sep 20 13:22 mnist_test.csv
-rw-r--r-- 1 root root 36523880 Sep 20 13:22 mnist_train_small.csv
-rwxr-xr-x 1 root root      930 Jan  1  2000 README.md*


In [22]:
!head README.md

This directory includes a few sample datasets to get you started.

*   `california_housing_data*.csv` is California housing data from the 1990 US
    Census; more information is available at:
    https://developers.google.com/machine-learning/crash-course/california-housing-data-description

*   `mnist_*.csv` is a small sample of the
    [MNIST database](https://en.wikipedia.org/wiki/MNIST_database), which is
    described at: http://yann.lecun.com/exdb/mnist/



In [23]:
cat README.md

This directory includes a few sample datasets to get you started.

*   `california_housing_data*.csv` is California housing data from the 1990 US
    Census; more information is available at:
    https://developers.google.com/machine-learning/crash-course/california-housing-data-description

*   `mnist_*.csv` is a small sample of the
    [MNIST database](https://en.wikipedia.org/wiki/MNIST_database), which is
    described at: http://yann.lecun.com/exdb/mnist/

*   `anscombe.json` contains a copy of
    [Anscombe's quartet](https://en.wikipedia.org/wiki/Anscombe%27s_quartet); it
    was originally described in

    Anscombe, F. J. (1973). 'Graphs in Statistical Analysis'. American
    Statistician. 27 (1): 17-21. JSTOR 2682899.

    and our copy was prepared by the
    [vega_datasets library](https://github.com/altair-viz/vega_datasets/blob/4f67bdaad10f45e3549984e17e1b3088c731503d/vega_datasets/_data/anscombe.json).


In [24]:
!which file

/usr/bin/file


In [25]:
!file README.md

README.md: ASCII text


In [26]:
!wc README.md

 19  80 930 README.md
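
The three numbers are the line, word, and byte counts. A rough Python equivalent, assuming plain whitespace-separated words; the function name `wc` is only for illustration, and locale handling in the real tool is ignored:

```python
def wc(data: bytes) -> tuple[int, int, int]:
    """Return (lines, words, bytes) for a bytes object, similar to `wc`."""
    lines = data.count(b"\n")   # wc counts newline characters
    words = len(data.split())   # runs of non-whitespace bytes
    return lines, words, len(data)


print(wc(b"hello world\nsecond line\n"))  # (2, 4, 24)
```

Applied to the bytes of `README.md`, this should reproduce the counts shown above, provided the file is unchanged.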


In [27]:
%%bash

wc README.md
file README.md
head README.md | cat -n


 19  80 930 README.md
README.md: ASCII text
     1	This directory includes a few sample datasets to get you started.
     2	
     3	*   `california_housing_data*.csv` is California housing data from the 1990 US
     4	    Census; more information is available at:
     5	    https://developers.google.com/machine-learning/crash-course/california-housing-data-description
     6	
     7	*   `mnist_*.csv` is a small sample of the
     8	    [MNIST database](https://en.wikipedia.org/wiki/MNIST_database), which is
     9	    described at: http://yann.lecun.com/exdb/mnist/
    10	
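
The pipeline `head README.md | cat -n` takes the first ten lines of the file and numbers them. A small Python sketch of the same idea, with a hypothetical helper `numbered_head`; the six-character number column followed by a tab imitates the output above:

```python
from itertools import islice


def numbered_head(lines, n=10):
    """Return the first n lines, each prefixed with a right-aligned number and a tab."""
    return [f"{i:6d}\t{line}" for i, line in enumerate(islice(lines, n), start=1)]


for row in numbered_head(["first", "second", "third"], n=2):
    print(row)
```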
