# Jupyter Notebooks

In this notebook, we will work with the following:

- Importing packages and namespaces.
- Using alternative interfaces to Python.
- Doing cool things with Jupyter.
- Seeing some examples of visualization.
- Considering some challenges of Jupyter.

## Importing packages

By convention, imports go at the top of a Python script or notebook (see [PEP 8](https://www.python.org/dev/peps/pep-0008/#imports)). In relevant part:

>Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

>Imports should be grouped in the following order:

>    - Standard library imports.
>    - Related third party imports.
>    - Local application/library specific imports.

>You should put a blank line between each group of imports.

In [None]:
# standard library
import sys
import time

# third party
import numpy as np
import pandas as pd
import plotly.express as px
from textblob import TextBlob

In [None]:
pd.set_option("mode.copy_on_write", True)

Note a few things in the block above.

1. The `import sys` is the simplest version.
1. For packages we use a lot, we abbreviate the name when importing. For example, `pandas` is imported as `pd` by convention (see [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/dsintro.html)).
1. We can also import particular things from a package, like the class `TextBlob` from the package `textblob`.
1. The lines that start with `#` are comments. Those lines are not executed by Python, and they are useful for us to make notes about what we are doing.


Let's see how these work in action.

In [None]:
print(sys.executable)

Note that to find the contents of the `executable` attribute within the `sys` module, we have to go through the module namespace `sys`. More about namespaces below.
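As a small sketch of the three import styles noted above, the cell below uses only standard-library modules. The modules chosen here (`math`, `statistics`, `pathlib`) and the `st` alias are assumptions for illustration, not part of the course setup.

```python
# Three import styles, shown with standard-library modules only.
import math                # plain import: names live under `math.`
import statistics as st    # aliased import: names live under `st.` (illustrative alias)
from pathlib import Path   # import a single class directly into our namespace

root = math.sqrt(16)                # accessed through the module namespace
average = st.mean([1, 2, 3])        # accessed through the alias
csv_path = Path("data") / "raw.csv"  # accessed with no prefix (hypothetical path)

print(root)
print(average)
print(csv_path.name)
```

In each case, the prefix we type (or the lack of one) reflects which namespace the name was placed in by the import.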

## Namespaces

Somewhat abstractly, the [Python docs](https://docs.python.org/3.7/tutorial/classes.html#python-scopes-and-namespaces) define a namespace as follows.

> A namespace is a mapping from names to objects.

For our purposes, we can think of them as paths to get to tools of interest. This topic goes (much) deeper, but a more instrumental understanding is fine for our use.
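To make the "mapping from names to objects" idea concrete, here is a minimal sketch using the built-in `vars()` function on the standard-library `math` module (chosen only for illustration).

```python
import math

# vars() returns the module's namespace as an ordinary dictionary.
namespace = vars(math)

# Looking up a name in the mapping gives the very same object
# that attribute access (math.pi) gives us.
same_object = namespace["pi"] is math.pi
has_sqrt = "sqrt" in namespace

print(same_object)
print(has_sqrt)
```

So `math.pi` is, roughly speaking, a convenient way of looking up the name `pi` in the mapping that belongs to `math`.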

If we want to know what is contained in a namespace, we can easily find out with the `dir()` built-in function.

In [None]:
dir(sys)

While we might think of namespaces as synonymous with packages, the concept is more general than that.
Individual objects have their own namespaces, like the `TextBlob` class we imported earlier.

In [None]:
dir(TextBlob)

We can also look at what is in the global namespace.

In [None]:
dir()

Note that we see the `sys` and `pd` packages that we imported, and we also see the `TextBlob` class that we imported from its package.
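The output of `dir()` can be long. One possible way to shorten it, sketched below, is to keep only the names that do not start with an underscore. The variable name `public_names` is just an illustrative choice.

```python
import sys

# Keep only names that do not start with an underscore,
# which by convention are the ones meant for everyday use.
public_names = [name for name in dir(sys) if not name.startswith("_")]

print(len(public_names))
print("executable" in public_names)
```

The same filter works on any object, such as a class we have imported.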

## import this

A fun import is The Zen of Python philosophy, which can be accessed by importing `this`. 
Note that I'm slightly breaking the rules above for the purposes of illustration.

In [None]:
import this  # noqa: E402, F401

# Alternative interfaces for Python

As we'll talk about later, Jupyter notebooks are a great interface for working with Python (or R or a number of other kernels).
However, they are not the only game in town.

1. **Python interpreter.** We can access the Python interpreter from the terminal.
2. **Running a script from the terminal.** We can also make our own script and run it from the terminal.
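As a rough sketch of the second option, a script is just a text file of Python code. The file name `hello.py` and the function `main` below are hypothetical, chosen only for illustration.

```python
# hello.py (hypothetical file name)
import sys


def main(argv):
    """Build a greeting from the first command-line argument, if any."""
    name = argv[1] if len(argv) > 1 else "world"
    return f"Hello, {name}!"


if __name__ == "__main__":
    # This branch runs when the file is executed directly,
    # but not when it is imported from another file.
    print(main(sys.argv))
```

Assuming the file is saved as `hello.py`, running `python hello.py Ada` in the terminal would print a greeting for Ada.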

# Jupyter

The Jupyter lab interface and notebooks provide a number of conveniences for research.

- Jupyter Lab: text editor, terminal, window layouts.
- Rich text: bold, italic, headings.
- Bullets, lists, and code (non-executing).
- Links, images, and equations.
- Display of graphics.
- Convenience items with cell magics.

## Rich text

Jupyter uses the simple [markdown](https://daringfireball.net/projects/markdown/) syntax for formatting text. There are some extensions and differences from original markdown, so you may find the [Jupyter Notebooks docs](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html) to be a better reference.

- **Bold** a word by enclosing it in pairs of asterisks: `**Bold**`.
- *Italicize* a word by enclosing it in single asterisks: `*Italicize*`.
- ***Do both*** with three asterisks: `***Do both***`.
- We can also use headings by starting the line with one or more pound signs, where one is a top-level heading: `#`.

# First heading
## Second heading
### Third heading

```
# First heading
## Second heading
### Third heading
```

## Bullets and lists

- Bullets can be made by beginning a line with a hyphen and a space: ` - Bullets. . .`.

Numbered lists start with a number, period, and a space:

1. First
1. Second
1. Third

Note that they all start with `1. `, and markdown handles numbering for us. 
We could, of course, number them ourselves.

```
1. First
1. Second
1. Third
```

We can also nest lists and types by indenting:

- Bullet
    1. Nested list item
    1. Another one
- Another bullet
    1. More lists
        - More bullets
        
```
- Bullet
    1. Nested list item
    1. Another one
- Another bullet
    1. More lists
        - More bullets
```

## Code

We can reference code in two ways.
First, we can use inline code like `import this` by using backticks `` ` `` to enclose the code: `` `import this` ``.
Second, we can make code blocks by using beginning and ending lines with three backticks: ```` ``` ````.
Do note that I'm having to be tricky to display backticks inside of code.

```
def f_to_c(temp_f):
    return (temp_f - 32) * 5/9
```

We can make it a little nicer (with syntax highlighting) by adding the code type to the first line: ```` ```python ````.

```python
def f_to_c(temp_f):
    return (temp_f - 32) * 5/9
```
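The blocks above only display the function. As a quick sketch of what it does when actually run in a code cell, we can check it against two well-known reference points.

```python
def f_to_c(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


freezing = f_to_c(32)   # freezing point of water
boiling = f_to_c(212)   # boiling point of water

print(freezing)  # 0.0
print(boiling)   # 100.0
```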

## Links and images

We can add links, like one to my [github page](https://github.com/jtkiley), using the text in brackets followed by the link in parentheses: `[github page](https://github.com/jtkiley)`.

We can add images by using similar syntax to point to an image: `![alt text](../_img/pandas_logo.svg)`.

![alt text](../_img/pandas_logo.svg)

## Equations

Similar to code, we can also use math and equations inline and in blocks.
For inline math, like the union of a set $S \cup T = \{x \mid x \in S \vee x \in T\}$, we can use a single dollar sign to denote math: `$S \cup T = \{x \mid x \in S \vee x \in T\}$`.

We can also use blocks by using beginning and ending lines with two dollar signs: `$$`.

$$
\operatorname{MSE}=\frac{1}{n}\sum_{i=1}^n(Y_i-\hat{Y_i})^2
$$

```
$$
\operatorname{MSE}=\frac{1}{n}\sum_{i=1}^n(Y_i-\hat{Y_i})^2
$$
```
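To connect the formula to code, here is a minimal plain-Python sketch of the MSE calculation. The function name `mean_squared_error` and the sample values are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n


observed = [3.0, -0.5, 2.0, 7.0]   # illustrative Y_i values
predicted = [2.5, 0.0, 2.0, 8.0]   # illustrative predicted values

mse = mean_squared_error(observed, predicted)
print(mse)  # 0.375
```

The squared errors here are 0.25, 0.25, 0, and 1, which sum to 1.5; dividing by $n = 4$ gives 0.375.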

There are many math features, including matrices:

$$ A = \begin{pmatrix}
\underbrace{\begin{matrix} a_{0,0} \\ a_{1,0} \\ \vdots \\ a_{m-1,0} \end{matrix}}_{a_0} &
\underbrace{\begin{matrix} a_{0,1} \\ a_{1,1} \\ \vdots \\ a_{m-1,1} \end{matrix}}_{a_1} &
\begin{matrix} \dots \\ \dots \\ \ddots \\ \dots \end{matrix} &
\underbrace{\begin{matrix} a_{0,n-1} \\ a_{1,n-1} \\ \vdots \\ a_{m-1,n-1} \end{matrix}}_{a_{n-1}} \\
\end{pmatrix}
$$

```
$$ A = \begin{pmatrix}
\underbrace{\begin{matrix} a_{0,0} \\ a_{1,0} \\ \vdots \\ a_{m-1,0} \end{matrix}}_{a_0} &
\underbrace{\begin{matrix} a_{0,1} \\ a_{1,1} \\ \vdots \\ a_{m-1,1} \end{matrix}}_{a_1} &
\begin{matrix} \dots \\ \dots \\ \ddots \\ \dots \end{matrix} &
\underbrace{\begin{matrix} a_{0,n-1} \\ a_{1,n-1} \\ \vdots \\ a_{m-1,n-1} \end{matrix}}_{a_{n-1}} \\
\end{pmatrix}
$$
```

# Visualization

We can also display graphics that are output from our work with data.

In [None]:
# Create some random data
data1 = pd.DataFrame(np.random.rand(200, 4), columns=list("ABCD"))

In [None]:
# Display the top of the dataframe
data1.head()
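Because the data above are random, the values change on every run. If we want the same values each time, one option is a seeded random number generator. This is a sketch that assumes `numpy` and `pandas` are installed; the seed value `42` and the name `data_seeded` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# A seeded generator produces the same "random" numbers on every run.
rng = np.random.default_rng(seed=42)

# Same shape and column names as above: 200 rows, columns A through D.
data_seeded = pd.DataFrame(rng.random((200, 4)), columns=list("ABCD"))

print(data_seeded.shape)
```

Rerunning the cell (or restarting the kernel) should then give an identical dataframe, which makes results easier to compare.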

In [None]:
# Make a histogram of column A
px.histogram(data1, x="A").show()

In [None]:
# Keep the figure object, then display it (.show() itself returns None)
fig2 = px.scatter_matrix(data1)
fig2.show()

In [None]:
px.scatter_3d(data1, x="A", y="B", z="C", color="D").show()

For many examples of really cool visualizations that are easy to do (and have code samples), see the [plotly express documentation](https://plot.ly/python/plotly-express/).

## Cell Magics

There are many forms of [cell magics](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-time) that provide convenience features.

If you find yourself getting errors for a file not being found, it may help to know where the working directory is.
You can use the `%pwd` magic to print it. This one is a line magic, so it takes a single percent sign.

In [None]:
%pwd
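Magics only work inside Jupyter and IPython. As a sketch of a plain-Python counterpart that also works in scripts, the standard-library `pathlib` module can report the working directory. The file name `data.csv` is hypothetical.

```python
from pathlib import Path

# The current working directory, as an absolute path.
cwd = Path.cwd()
print(cwd)

# Relative file names are resolved against the working directory,
# so this is where Python would look for "data.csv" (hypothetical file).
target = cwd / "data.csv"
print(target.exists())
```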

A really common issue with large text datasets is that some things take a long time to run.
To know how long that is, we can use the `%%time` magic to get the time a cell takes to run.
Do note how we're using two percent signs: `%%`.
That makes the magic apply to the cell, instead of just the rest of the line.

In [None]:
%%time

# Use time.sleep() to make this cell take some time.
time.sleep(2)
print("Done!")
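Outside of a notebook, or when we only want to time part of a cell, a similar measurement can be sketched with `time.perf_counter()` from the standard library. The short sleep below simply stands in for slow work.

```python
import time

start = time.perf_counter()

# Stand-in for a slow step, such as processing a large text dataset.
time.sleep(0.2)

elapsed = time.perf_counter() - start
print(f"Elapsed: {elapsed:.2f} seconds")
```

Unlike `%%time`, this reports only wall-clock time for the lines between the two `perf_counter()` calls.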

# Sharing notebooks

We have a few options to share our notebooks with others.

1. If we want them to be able to run the code themselves, we should share the notebook file (`.ipynb`). Often, we will also need to send information about the environment (e.g., `requirements.txt`, `devcontainer.json`) and any data files that we rely on. Since data files may be large, you will often want to use a service like Dropbox to send them.
1. If we only want to show the contents, we can export a version in HTML.
    1. In the menu bar at the top of the Jupyter tab, click ". . .", then "Export".
    1. In the resulting "Export As" popover menu at the top, click "HTML".
    1. In the next popover, click "OK". Alternatively, change the location or name of the file as desired before clicking "OK". Note that, in a container or in Codespaces, the location will be there and not (yet) in your local file system.
    1. In the bottom left corner popup, click "No" when prompted "Would you like to open the exported file?" Choosing "Yes" does not work in containers or Codespaces.
    1. Download the file to your local file system by clicking the "Explorer" tab on the left, right-clicking the HTML file, and selecting "Download". Depending on whether you are in a container or Codespaces, you may either get a file download to your local Downloads folder or a pop-up asking where to save the file.


# Jupyter challenges

There are a few challenges when using Jupyter.

1. The cell structure is flexible, but it will allow you to do things out of order, and anything you import or assign is still there, even if you change the code. To avoid this issue:
    1. Purposefully try to keep your code in the order that it should be run.
    1. Periodically, use "Restart Kernel and Run All Cells..." to make sure that your notebook runs in order.
1. If you want to version control projects, you might remove cell outputs before committing (like the course repository does) to keep comparisons readable.
1. It is not a good format for a manuscript (unlike [R notebooks](https://bookdown.org/yihui/rmarkdown/notebook.html)). However, it is much more capable. For something more like R notebooks, [Quarto](https://quarto.org) is shaping up to be a high-quality Python alternative.
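On the version control point above, a notebook file is JSON, so removing outputs amounts to emptying a couple of fields in each code cell. The sketch below is a minimal illustration under that assumption, not necessarily how the course repository does it; the function name `strip_outputs` and the tiny example notebook are made up. Dedicated tools such as `nbstripout` exist for real projects.

```python
import json


def strip_outputs(notebook):
    """Clear outputs and execution counts from every code cell (in place)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


# A tiny, hand-written stand-in for the contents of an .ipynb file.
example = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["Done!\n"]}],
            "source": ["print('Done!')"],
        },
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = strip_outputs(example)
print(json.dumps(cleaned["cells"][0], indent=2))
```

With a real file, we would load it with `json.load()`, pass the result to the function, and write it back with `json.dump()`.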

Overall, Jupyter notebooks are a great tool.
Once you have some experience using them, you may find them fairly natural to work with.

# Breakout Exercises

Let's do a few exercises to reinforce the concepts we learned above.


1. import and namespace
1. markdown
1. Exporting HTML

## EX1: import and namespace

We saw above how to import a package and inspect the namespace of it.
Later in the course, we will be using the `pynytimes` package.
Let's use it for an example here.

1. Import the `pynytimes` package.
1. Inspect the namespace. Which object do you think helps us create a connection to the article API?

In [None]:
# 1-1 code

In [None]:
# 1-2 code

## EX2: markdown

Remember that we can use markdown to have rich text features. Let's try it.

1. Make sure the cell below is a markdown cell.
1. Enter the following sentence: "Getting free excerpts from the New York Times is cool, and the readme for the package can be found here."
1. Make the word "free" italicized and the word "cool" bold.
1. Make the phrase "found here" a link to `https://github.com/michadenheijer/pynytimes`.

## EX3: Exporting HTML

Many coauthors will be unfamiliar with using Jupyter notebooks, and it may not be a good time investment to have them set it up and learn how it works, only to review your work.
However, if they can read it, a lot of the code will make sense.
An easy way to share it is to export an HTML file that they can view in a web browser.

1. Export this page as an HTML file.
1. Using your computer's file browsing app, find the exported HTML file and double-click it to open it in your browser.