<a href="https://colab.research.google.com/github/mahynski/chemometric-carpentry/blob/main/notebooks/1.1_The_Jupyter_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---
❓ ***Objective***: This notebook will introduce you to using the [Jupyter Notebook](https://jupyter.org/) in Google Colab.  

🔁 ***Remember***: You can always revisit this notebook for reference again in the future.  Ideas and best practices will be reinforced in future notebooks, so don't worry about remembering everything the first time you see something new.

🧑 Author: Nathan A. Mahynski

📆 Date: May 1, 2024

---

# The Basics

This is a [Jupyter](https://jupyter.org/) notebook.  These notebooks are the *de facto* standard tool for data science, machine learning, and artificial intelligence work these days.  They are simple, easy to use, and very powerful since they blend:

* computations
* outputs
* explanatory text
* mathematics
* images
* media
* *generative AI* for code suggestions (on paid tiers 💰, use with caution ⚠)

Because of this, these notebooks are great ways to teach, document methods, and share code; here are some suggested [best practices](https://www.kaggle.com/code/alejopaullier/make-your-notebooks-look-better) for keeping your notebook looking neat, clean, and presentable to others.

There are 2 types of "cells" in notebooks:
1. 💻 Code (defaults to Python 🐍)
2. 📜 Text (some editors call it [Markdown](https://www.markdownguide.org/) because that is the language used to render the text)

---
> ❗ We will use indented text with this icon to indicate you should interact with the notebook.  For example, you can add a new cell from ```Insert > Code cell```; in Colab you can also hover over the bottom edge of a cell. Cells are executed by pressing `Shift`+`Enter` simultaneously.
---

In [None]:
# Example code cell - in a code cell, you can add comments by adding a "#" to
# the start of the line.
pi = 3.14159
2*pi

6.28318

The cells with text use a language called [Markdown](https://www.markdownguide.org/). Markdown can help you organize your thoughts and work by creating all sorts of nice text and structure.  Here is a [cheat sheet](https://www.markdownguide.org/cheat-sheet/) for easy reference.  Some examples of things you can do include:

```markdown
# Headers

## Subheaders

**bolded words**

[Create a hyperlink](https://www.nist.gov)

Tables are easy, too!

| Header | Column 1 |
| --- | --- |
| Sample 1 | 1.23 |
| Sample 2 | 2.34 |
```
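
Markdown tables need a separator row of dashes between the header and the body, which is easy to forget when typing them by hand. As a rough sketch, a short helper can build the table text for you. Here `markdown_table` is a hypothetical name used only for this illustration, and the sample values are the ones from the example above.

```python
def markdown_table(headers, rows):
    """Build a Markdown table as a string (illustrative helper, not a library function)."""
    def format_row(cells):
        return "| " + " | ".join(str(cell) for cell in cells) + " |"

    lines = [format_row(headers)]
    # The separator row is what tells Markdown that the first line is a header.
    lines.append(format_row(["---"] * len(headers)))
    lines.extend(format_row(row) for row in rows)
    return "\n".join(lines)


table = markdown_table(
    ["Header", "Column 1"],
    [["Sample 1", 1.23], ["Sample 2", 2.34]],
)
print(table)
```

You can paste the printed text into a text cell to render the table.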

👉 Note that "#" means something different in text/Markdown cells than it does in code cells.

You can also write nice equations in text cells with [LaTeX](https://www.overleaf.com/learn/how-to/Writing_Markdown_in_LaTeX_Documents): $E = mc^2$
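
Single dollar signs keep an equation inline with the text. For longer expressions, a display equation set off by double dollar signs is often easier to read. As a small illustration (the sample mean is an example chosen here, not something used earlier in this notebook), a text cell containing the following renders as a centered equation:

```latex
$$
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i
$$
```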

We will make use of these capabilities throughout the course.  


---
> ❗ Try double clicking this cell to see all the text options that Colab provides!
---

In [None]:
# Thanks to the magic of Python you can even display YouTube videos right in
# your notebook using a code cell!
from IPython.display import YouTubeVideo

YouTubeVideo('HW29067qVWk', width="560", height="315")

You can set up and run Jupyter notebooks from a server on your personal machine, a remote server, or right from [Google Drive](https://drive.google.com) using [Google Colab](https://colab.research.google.com/).  These can be configured to display notebooks in different ways and include different features.  For the sake of simplicity and ease we will work from Colab for this course.

---
> ❗ Headers are particularly helpful because they enable code-folding (try clicking next to headers!) and automatically generate a Table of Contents (see icon at top left).  Colab enables these extensions automatically.
---

# Google [Colab](https://colab.research.google.com/)

In [None]:
# An additional future reference on Colab.
YouTubeVideo('RLYoEyIHL6A', width="560", height="315")

[Google Colab](https://colab.research.google.com/) is free for anyone with a Google account.  You can purchase paid tiers 💰 of service, which come with access to more powerful GPUs and other features.  At the time of writing, Colab comes with free access to CPUs, GPUs, and [tensor processing units](https://cloud.google.com/tpu) (TPUs).

---
> ❗ Check out the runtime capabilities in the icon at the top right! Go to ```Runtime > Change runtime type```.  You can also use this to run an R kernel instead of a Python one.
---

The advantage of this is you can test out code, then scale up the resources behind your notebook as needed.  The free tier is plenty powerful for all the analysis we will do in this course and for many chemometric applications.

You can also change the notebook's look and feel from ```Tools > Settings```.

---
> ❗ Perhaps most importantly, you can directly connect this to your [Google Drive](https://drive.google.com) and store, save, and process data directly in the cloud.
> Option 1: Copy/paste the code below
> ```python
>     from google.colab import drive
>     drive.mount('/content/drive')
> ```
>
> Option 2: Select the "Files" tab on the left and the code will populate automatically.
---

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
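
The `google.colab` package only exists inside a Colab runtime, so the import above fails if the same notebook is run on a local Jupyter server. One possible way to keep a notebook usable in both places is to guard the import, as in this minimal sketch. The flag name `IN_COLAB` is an arbitrary choice made for this example.

```python
try:
    # google.colab is only importable inside a Colab runtime.
    from google.colab import drive
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    # Same mount point as used elsewhere in this notebook.
    drive.mount('/content/drive')
else:
    print("Not running in Colab; skipping the Google Drive mount.")
```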


In [None]:
# You can search your mounted drive using the tools on the left or by using
# common Linux commands: a few (like ls) work with a "%" prefix, and any shell
# command can be run with a "!" prefix.
%ls ./drive/MyDrive

ls: cannot access './drive/MyDrive': No such file or directory
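
An error like the one above usually means either that the Drive is not mounted yet or that the current working directory is not the one the relative path assumes. As a hedged alternative, the sketch below lists a folder from Python using an absolute path and reports a missing folder instead of failing. The helper name `list_folder` is hypothetical, and the path assumes the `/content/drive` mount point used in this notebook.

```python
from pathlib import Path


def list_folder(folder):
    """Return the sorted entry names in `folder`, or an empty list if it is missing."""
    folder = Path(folder)
    if not folder.is_dir():
        print(f"{folder} does not exist; is your Drive mounted?")
        return []
    return sorted(entry.name for entry in folder.iterdir())


# An absolute path does not depend on the current working directory.
entries = list_folder("/content/drive/MyDrive")
print(entries)
```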


Google also provides "code snippets" (see `<>` on the left) which can help you find publicly available examples of code that performs certain tasks. This can help you write code faster, but be wary of running code you do not fully understand ⚠.



---
> ❗ When running, this notebook "lives" on a Google server somewhere.  To save your work when done, go to `File > Save a copy in Drive`.  If you use [GitHub](https://github.com/) you can also save directly to a repository!
---

Another advantage of using Colab is that it tries to intelligently give you information about functions, variables, and other objects.

---
> ❗ Try mousing over different things in a code cell to see what happens.
---

# Managing your Session

👉 The order of execution matters in your notebook.  Values are updated as they are (re)assigned, so you can easily overwrite or change values unexpectedly.

---
> ❗ Try executing the cells below in different orders.
---

In [None]:
# The variable "a" has not been defined yet.
print(a)

In [None]:
a = 1

In [None]:
a = 2

In [None]:
# What is the value now?
print(a)

1
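
The value printed depends only on which assignment ran last, not on where the cells sit in the notebook. Running the first cell in a fresh session raises a `NameError` because `a` does not exist yet. The sketch below imitates this behavior outside a notebook by treating each "cell" as a string of code run in a shared namespace. The function `run_cells` and the cell labels are made up for this illustration.

```python
def run_cells(order):
    """Execute hypothetical 'cells' in the given order within a fresh namespace."""
    cells = {
        "set_1": "a = 1",
        "set_2": "a = 2",
    }
    namespace = {}  # plays the role of a freshly restarted session
    for name in order:
        exec(cells[name], namespace)
    return namespace["a"]


print(run_cells(["set_1", "set_2"]))  # prints 2
print(run_cells(["set_2", "set_1"]))  # prints 1
```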


* If you are unsure, you can restart the runtime by going to `Runtime > Restart session` in Colab.  This will wipe all saved variables, calculations, etc.

  * Changing a Colab runtime type will have the same effect.

* You can also use `Runtime > Restart and run all`, which will restart the runtime and then go cell-by-cell and execute each one in order until the end of the notebook or an error occurs.

* Restarting will NOT unmount your Google Drive, nor will it uninstall any packages you might have already installed in your runtime.

# Installing Python Packages

❓ Q: If you are running a notebook on a local machine you can control the environment and, for example, what is installed.  How can we do this on a remote Colab server?

🙋 A: Use [pip](https://pip.pypa.io/en/stable/)

[pip](https://pip.pypa.io/en/stable/) is the package installer for Python and can be used to install things in your runtime environment. Packages can be found in the [Python Package Index](https://pypi.org/). Many scientific and data science packages are automatically installed in Colab so you only need to `import` them. We've seen some examples of this so far, and we will dive into this in more detail later. For now, you can install a new package in a `code` cell using the following command.

```
%pip install name_of_package
```
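
Since many packages are already present in Colab, it can be worth checking before installing. The following is a minimal sketch using only the standard library. The helper name `package_status` is hypothetical, and it assumes the name you `import` matches the name the package is distributed under, which is not always true.

```python
import importlib.util
from importlib import metadata


def package_status(name):
    """Return the installed version of `name`, or None if it cannot be imported."""
    if importlib.util.find_spec(name) is None:
        return None
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under this exact name
        # (for example, a standard library module).
        return "unknown"


for name in ["numpy", "watermark", "json"]:
    print(name, package_status(name))
```

A result of `None` suggests the package still needs to be installed with `%pip install`.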

---
> ❗ Let's install [watermark](https://github.com/rasbt/watermark) which is a tool that will help us keep track of the versions of packages installed.  Run the code cell below.
---

In [None]:
%pip install watermark

Collecting watermark
  Downloading watermark-2.4.3-py2.py3-none-any.whl (7.6 kB)
Collecting jedi>=0.16 (from ipython>=6.0->watermark)
  Downloading jedi-0.19.1-py2.py3-none-any.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 12.4 MB/s eta 0:00:00
Installing collected packages: jedi, watermark
Successfully installed jedi-0.19.1 watermark-2.4.3


In [None]:
# Let's try it out by importing the package.
import watermark

In [None]:
# We can see a function's signature and "Docstring" by using a single "?" before
# or after the command.
watermark.watermark?

In [None]:
# This is equivalent to calling help() on a function.
help(watermark.watermark)

In [None]:
# We can see the exact code with 2 question marks.
watermark.watermark??
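
The `?` and `??` shortcuts are IPython features, so they work in notebooks and IPython shells but not in plain Python scripts. In a script, the standard `inspect` module offers roughly the same information. The sketch below uses `textwrap.shorten` purely as a stand-in function so that it runs without any extra packages.

```python
import inspect
import textwrap

# Roughly what a single "?" shows: the signature and the docstring.
signature = inspect.signature(textwrap.shorten)
doc = inspect.getdoc(textwrap.shorten)
print(f"shorten{signature}")
print(doc.splitlines()[0])

# Roughly what "??" shows: the source code, when it is available.
try:
    source = inspect.getsource(textwrap.shorten)
except (OSError, TypeError):
    # Source is unavailable for compiled or built-in functions.
    source = None
print("source found" if source else "source not available")
```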

In [None]:
print(watermark.watermark())

Last updated: 2024-04-22T18:19:54.681265+00:00

Python implementation: CPython
Python version       : 3.10.12
IPython version      : 7.34.0

Compiler    : GCC 11.4.0
OS          : Linux
Release     : 6.1.58+
Machine     : x86_64
Processor   : x86_64
CPU cores   : 2
Architecture: 64bit



In [None]:
# We can also use the watermark magic extension by loading it.
%load_ext watermark

In [None]:
# This is a convenient command since it will print basic information about the
# machine you are running on and what versions of packages you have loaded.
%watermark -t -m -v --iversions

Python implementation: CPython
Python version       : 3.10.12
IPython version      : 7.34.0

Compiler    : GCC 11.4.0
OS          : Linux
Release     : 6.1.58+
Machine     : x86_64
Processor   : x86_64
CPU cores   : 2
Architecture: 64bit

google   : 2.0.3
watermark: 2.4.3



In [None]:
# After importing a new package it will automatically show up.
import numpy as np

%watermark -t -m -v --iversions

Python implementation: CPython
Python version       : 3.10.12
IPython version      : 7.34.0

Compiler    : GCC 11.4.0
OS          : Linux
Release     : 6.1.58+
Machine     : x86_64
Processor   : x86_64
CPU cores   : 2
Architecture: 64bit

google   : 2.0.3
numpy    : 1.25.2
watermark: 2.4.3



It is good practice to have one cell at the top of your notebook where you load all the packages you need, then call watermark to make these visible.

```
import numpy as np
import scipy as sp
import pandas as pd

%watermark -t -m -v --iversions
```
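
If watermark is not available, a similar (though less complete) record can be assembled from the standard library. This is only a sketch of the idea: the function name `environment_report` is invented here, and it does not list package versions the way `--iversions` does.

```python
import os
import platform


def environment_report():
    """Collect basic details about the interpreter and machine."""
    return {
        "Python implementation": platform.python_implementation(),
        "Python version": platform.python_version(),
        "OS": platform.system(),
        "Release": platform.release(),
        "Machine": platform.machine(),
        "CPU cores": os.cpu_count(),
    }


report = environment_report()
for key, value in report.items():
    print(f"{key:<22}: {value}")
```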

# Saving Code

👉 This is another best practice suggestion. Your notebook and functions can change quickly over time, especially when writing new code and debugging.  Remember how your variables change? 🤔

As a result, restarting your runtime and re-running your calculations one last time from start to finish is a good idea.  In addition, you can export (or just copy and paste) Python code to a `.py` file, then import it.  Ideally, you should version control such files with [git](https://github.com/).  This will also allow you to re-use the code easily in the future.

---
> ❗ Let's try running the cells below to compare code created in the notebook vs. code imported from a separate file.
---

In [None]:
def Fibonacci(n):
    """
    This is where your Docstring goes.

    Input
    -----
    n : int
        Fibonacci number to get.

    Returns
    -------
    number : int
        The nth Fibonacci number.
    """
    # Negative indices are not defined here, so fail loudly instead of
    # printing a message and silently returning None.
    if n < 0:
        raise ValueError("Incorrect input")

    elif n == 0:
        return 0

    elif n == 1 or n == 2:
        return 1

    else:
        return Fibonacci(n-1) + Fibonacci(n-2)

In [None]:
# Try out your Docstring!
Fibonacci?

In [None]:
for i in range(10):
    print(Fibonacci(i))

0
1
1
2
3
5
8
13
21
34
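
The recursive definition recomputes the same smaller values many times, so it slows down quickly as `n` grows. As an aside that is not part of the original exercise, here is a sketch of an iterative alternative that returns the same values. The name `fibonacci_iterative` is chosen only for this example.

```python
def fibonacci_iterative(n):
    """Return the nth Fibonacci number using a loop instead of recursion."""
    if n < 0:
        raise ValueError("n must be non-negative")
    previous, current = 0, 1
    for _ in range(n):
        # Slide the pair (F(k), F(k+1)) forward by one step.
        previous, current = current, previous + current
    return previous


print([fibonacci_iterative(i) for i in range(10)])
```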


---
> ❗ Now save the Fibonacci function above to a separate file named `fibonacci.py` in the `Colab Notebooks` folder of your Google Drive.
---

To import this file you need to add the directory the file lives in to Python's "path".  When a module is imported, Python looks through all the directories on this path to find a file with the same name as the module (with a .py extension).  By default, the path covers the installed packages and the current working directory, but not the folders on your Google Drive.

In [None]:
# This is one way to add a new directory to Python's path.
import sys, os
sys.path.append(
    os.path.join(os.path.abspath('./'), 'drive/MyDrive/Colab Notebooks')
)

In [None]:
from fibonacci import Fibonacci as file_Fibonacci

In [None]:
for i in range(10):
    print(file_Fibonacci(i))

0
1
1
2
3
5
8
13
21
34
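
The save-then-import workflow can also be rehearsed without Google Drive. In the sketch below a temporary folder stands in for the Drive folder, and the module is called `fibonacci_demo` (a made-up name) so it cannot clash with your own `fibonacci.py`. The function body is an iterative variant used only to keep the example short.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source for a small stand-in module.
MODULE_SOURCE = '''
def Fibonacci(n):
    """Return the nth Fibonacci number."""
    previous, current = 0, 1
    for _ in range(n):
        previous, current = current, previous + current
    return previous
'''

# A temporary folder plays the role of 'drive/MyDrive/Colab Notebooks'.
folder = Path(tempfile.mkdtemp())
(folder / "fibonacci_demo.py").write_text(MODULE_SOURCE)

# Python only searches the directories listed in sys.path.
if str(folder) not in sys.path:
    sys.path.append(str(folder))

importlib.invalidate_caches()  # make sure the newly created file is noticed
fibonacci_demo = importlib.import_module("fibonacci_demo")

print([fibonacci_demo.Fibonacci(i) for i in range(10)])
```

If you edit the file later, `importlib.reload(fibonacci_demo)` re-reads it without restarting the session.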
