# Authoring Jupyter Notebooks for Data Science Education

This demo notebook shows how to author Jupyter notebooks for teaching. It was prepared for the 2023 National Workshop on Data Science Education at UC Berkeley.

## Importing Modules

First, we import the libraries used in this notebook. Select the following cell and press `shift` + `enter` to run it. You can also run the cell by clicking the "Run" button in the Jupyter toolbar above. When working with students, a common source of errors is forgetting to run this cell.

In [None]:
# Don't change this cell; just run it. 
import numpy as np
from datascience import *

# These lines set up matplotlib so that plots render inline in the notebook.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt

## Adding Markdown Cells

Instructions and background information can be added to Jupyter notebooks with Markdown cells. Double-click this cell and experiment with editing it. The `#` characters at the start of a heading line set the heading level, which controls its font size and style.

### Some Useful Features

You can insert equations with LaTeX:

\begin{equation} 
e^{\pi i} + 1 = 0
\end{equation}

You can add tables:

| Function | Description                                                   |
|----------|---------------------------------------------------------------|
| `abs`      | Returns the absolute value of its argument                    |
| `max`      | Returns the maximum of all its arguments                      |
| `min`      | Returns the minimum of all its arguments                      |
| `pow`      | Raises its first argument to the power of its second argument |
| `round`    | Rounds its argument to the nearest integer                     |
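
As a quick check of the table above, the sketch below calls each of these built-in functions on small inputs. The input values are arbitrary and chosen only for illustration.

```python
# Call each built-in function from the table on a small example input.
results = {
    "abs": abs(-3),        # absolute value of -3
    "max": max(2, 7, 5),   # largest of the three arguments
    "min": min(2, 7, 5),   # smallest of the three arguments
    "pow": pow(2, 10),     # 2 raised to the power 10
    "round": round(3.7),   # nearest integer to 3.7
}

for name, value in results.items():
    print(name, value)
```

Note that in Python 3, `round` sends exact halves to the nearest even integer, so `round(2.5)` evaluates to `2`.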


You can denote important information in **bold**, *italics*, or <span style="color: #BC412B">**color**</span>.

You can add [links](https://www.espn.com/college-football/game/_/gameId/401404041).

You can insert images (from filepath):

<img src="images/data8.png">

## Loading Data into a Table

We can create tables in several different ways with the `datascience` module. One such way is to do it manually via `with_columns`:

In [None]:
t = Table().with_columns(
    'Player', ['Curry', 'James', 'Jokic', 'Butler'],
    'Points', [31, 25, 29, 19],
    'Assists', [6, 9, 12, 10],
)
t

More commonly, we will want to load data from an existing CSV file.

In [None]:
power_plants = Table.read_table('data/California_Power_Plants.csv')
power_plants.show(5)

You can also load data from a URL.

In [None]:
sat = Table.read_table('https://www.inferentialthinking.com/data/sat2014.csv')
sat.show(5)
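
To make concrete what loading a CSV involves, here is a minimal sketch that uses only Python's standard `csv` module instead of `datascience`. It is not how `read_table` is implemented. The helper `read_columns` is a hypothetical name introduced for illustration, and the CSV text simply repeats the small player table built earlier.

```python
import csv
import io

# In-memory CSV text; the values repeat the player table created above.
csv_text = """Player,Points,Assists
Curry,31,6
James,25,9
Jokic,29,12
Butler,19,10
"""


def read_columns(text):
    """Parse CSV text into a dict mapping each header to a list of strings."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns


columns = read_columns(csv_text)

# The csv module returns every value as a string, so convert numbers explicitly.
points = [int(p) for p in columns["Points"]]
print(points)
```

The explicit `int` conversion is the main difference students would notice if they parsed files by hand: the standard `csv` module leaves every field as text.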

## Workflow

Below we demonstrate an example workflow, excerpted from Data 8 Homework 5, Spring 2023.

*First, we start with some background about the dataset and load in the dataset.*

James is trying to analyze how well the Cal football team performed in the 2021 season. A football game is divided into four periods, called quarters. The number of points Cal scored in each quarter and the number of points their opponent scored in each quarter are stored in a file called `cal_fb.csv`.

In [None]:
# Just run this cell
# Read in the cal_fb csv file as a table called games
games = Table.read_table("data/cal_fb.csv")
games.show()

*An example question. The skeleton code cell below can have varying levels of scaffolding depending on the audience and learning goals.*

Let's start by finding the total points each team scored in a game.

**Question 1.** Write a function called `sum_scores`. It should take four arguments, each an integer giving the team's score in one quarter. It should return the team's total score for that game.

*Hint:* Don't overthink this question!


<!--
BEGIN QUESTION
name: q1_1
manual: false
points:
 - 1
 - 1
-->

In [None]:
def sum_scores(..., ..., ..., ...):
    '''Returns the total score calculated by adding up the score of each quarter'''
    ...

sum_scores(14, 7, 3, 0) #DO NOT CHANGE THIS LINE

In [None]:
grader.check("q1_1")
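
For instructors, the sketch below illustrates the kind of checks an autograder test for this question might run. It is an assumption for illustration, not the actual Data 8 test file: `sum_scores_reference` is a hypothetical reference solution, and the two checks are only loosely modeled on the two one-point entries in the question metadata.

```python
def sum_scores_reference(q1, q2, q3, q4):
    """Hypothetical reference solution: add the scores from the four quarters."""
    return q1 + q2 + q3 + q4


# Check 1 (assumed): the call shown in the skeleton cell gives the expected total.
total = sum_scores_reference(14, 7, 3, 0)
assert total == 24

# Check 2 (assumed): a scoreless game totals zero.
assert sum_scores_reference(0, 0, 0, 0) == 0

print(total)
```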

*A question that requires more critical thinking with very little scaffolding provided in the skeleton.*

**Question 2.** Create a new table `final_scores` with three columns in this *specific* order: `Opponent`, `Cal Score`, `Opponent Score`. You will have to create the `Cal Score` and `Opponent Score` columns. Use the function `sum_scores` you just defined in the previous question for this problem. **(5 Points)**

*Hint:* If you want to apply a function that takes in multiple arguments, you can pass multiple column names as arguments in `tbl.apply()`. The column values will be passed into the corresponding arguments of the function. Take a look at the Python Reference Sheet and Lecture 13's demo for syntax.

*Note:* If you’re running into issues creating `final_scores`, check that `cal_scores` and `opp_scores` output what you want. If you're encountering `TypeError`s, check the [Python Reference](https://www.data8.org/sp23/reference/) to see if the inputs/outputs of the function are what you expect.
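
The hint above can be pictured with a plain-Python model of applying a multi-argument function across several columns. This is only a conceptual sketch: `apply_rowwise` and `add_two` are hypothetical names, the columns are ordinary lists, and the real `tbl.apply()` takes column labels and returns an array rather than a list.

```python
def apply_rowwise(func, *columns):
    """Call func once per row, passing one value from each column as arguments."""
    return [func(*row) for row in zip(*columns)]


def add_two(x, y):
    """Example two-argument function used only for this illustration."""
    return x + y


first = [1, 2, 3]
second = [10, 20, 30]

# The first column feeds the first argument, the second column the second.
combined = apply_rowwise(add_two, first, second)
print(combined)
```

The order of the columns matters: each column is matched to the function argument in the same position.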

<!--
BEGIN QUESTION
name: q1_2
manual: false
points:
 - 0
 - 0
 - 5
-->

In [None]:
cal_scores = ...
opp_scores = ...
final_scores = ...
final_scores

In [None]:
grader.check("q1_2")

*An example of a written question where students answer by entering text into a markdown cell.*

**Question 3.** James attempts Question 2, but he does not pass any column names into `apply`. Explain why this approach results in an error and how he can fix it.

_Type your answer here, replacing this text._

## Common Student Errors

In [None]:
sat = Table.read_table('https://www.inferentialthinking.com/data/sat2014.csv')
sat.show(5)

One common student error is destructively modifying data and then running cells out of order or more than once. For example, try running the cell below twice. The first run works, but the second raises an error because the `State` column no longer exists in the table!

In [None]:
states_arr = sat.column('State')
sat = sat.drop('State')
sat.show(5)
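
One way to avoid this class of error is to give the modified table a new name instead of overwriting the original. The sketch below models the idea with a plain dict of columns standing in for a table. The helper `drop_column` is hypothetical and the values are placeholders, not real SAT data.

```python
def drop_column(columns, label):
    """Return a new dict without `label`, leaving the input dict untouched."""
    if label not in columns:
        raise KeyError(label)
    return {name: values for name, values in columns.items() if name != label}


# Placeholder values standing in for a table with a "State" column.
original = {"State": ["AA", "BB"], "Combined": [100, 200]}

# Rerun-safe: the result gets a new name, so `original` keeps its "State" column.
for _ in range(2):  # simulate running the cell twice
    states = original["State"]
    without_state = drop_column(original, "State")

# Not rerun-safe: overwriting the same name fails on the second run.
working = dict(original)
errors = []
for _ in range(2):  # simulate running the cell twice
    try:
        states = working["State"]
        working = drop_column(working, "State")
    except KeyError as err:
        errors.append(err)

print(len(errors))
```

Applied to the notebook, the same idea would mean assigning the result of `sat.drop('State')` to a different name than `sat`.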

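Another common error is calling a package that was never imported (or was imported incorrectly), or running a cell before the import cell has been run. The latter is especially common when students return to an assignment that has been left open for several hours, because the kernel may have restarted and lost its state. For example, the cell below raises a `NameError` because `sns` (the conventional alias for seaborn) is never imported in this notebook.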

In [None]:
sns.histplot(sat.column("Combined"))
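
A simple way to surface this problem early is a cell that checks whether the expected names are defined before later cells rely on them. The sketch below is one possible approach, not a feature of Jupyter or `datascience`. `missing_names` is a hypothetical helper, and `math` stands in for a library the notebook did import.

```python
import math  # stands in for a library that the notebook has imported


def missing_names(names, namespace):
    """Return the entries of `names` that are not defined in `namespace`."""
    return [name for name in names if name not in namespace]


# In a notebook, globals() would be passed instead of this hand-built dict.
namespace = {"math": math}

missing = missing_names(["math", "sns"], namespace)
print(missing)
```

If the returned list is not empty, the message to students is the same in every case: rerun the import cell at the top of the notebook.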

## Keyboard Shortcuts

Jupyter provides keyboard shortcuts that make authoring, navigating, and running notebooks faster. Some useful ones are listed below; a more comprehensive guide is available [here](https://towardsdatascience.com/jypyter-notebook-shortcuts-bf0101a98330).

The following shortcuts work at any time: 
- `shift` + `enter` : Run selected cell and move cursor to the next cell directly below
- `ctrl` / `cmd` + `enter` : Run selected cell and stay on the same cell (useful if you need to run the same cell multiple times) 

The following shortcuts only work when in command mode (i.e. not editing the contents of a specific cell): 
- `enter` : Edit the selected cell
- `a` : Insert a new code cell above selected cell  
- `b` : Insert a new code cell below selected cell
- `d, d` : Delete selected cell 
- `y` : Convert selected cell to Code cell 
- `m` : Convert selected cell to Markdown cell