# Lab 01

## Introduction to Jupyter Notebooks

In this assignment, you’ll be introduced to Jupyter Notebooks and learn how to navigate and use the notebook interface effectively.

## Guidelines

- Follow good programming practices by using descriptive variable names, maintaining appropriate spacing for readability, and adding comments to clarify your code.

- Ensure written responses use correct spelling, complete sentences, and proper grammar.

**Name:**

**Section:**

**Date:**

Let's get started!

## What is a Jupyter Notebook?

A Jupyter Notebook is an interactive coding environment that allows you to write and run code, display visualizations, and document your work with text, equations, and images, all in the same place. It's especially popular in data science because of its ability to combine code execution and explanation.

### Edit Mode vs. Command Mode 

Jupyter Notebook has a modal user interface. This means that the keyboard does different things depending on which mode the Notebook is in. There are two modes: **"Edit mode"** and **"Command mode"**. 

Edit mode allows you to type and modify the content within a cell, similar to a text editor.

Command mode lets you manage the notebook as a whole (e.g., adding, deleting, or moving cells), but it doesn't allow you to type directly into individual cells.

### The Kernel

The **kernel** is the program that runs the code inside your notebook and displays the results. In the top-right corner of your window, you'll see the name `[Python 3 (ipykernel)]` and a circle that indicates the status of your kernel. An empty circle (⚪) means the kernel is idle and ready to execute code, while a filled circle (⚫) means the kernel is busy running code.

### Cells

Jupyter Notebooks are made up of cells. As covered in lecture, there are two types of cells in a Jupyter Notebook: code cells and Markdown cells. 

- Code cells are where we write all of our Python code.

- Markdown cells allow us to write text, like the text you're reading right now. In Homework 1, you'll get a chance to learn a bit of basic Markdown.

#### Code cells

Running a code cell executes all the code within it, and any output will appear directly below the cell. Notice the brackets `[ ]:` on the left side of the cells.

Before running the cell, this will be empty (`[ ]`). While the cell is running, you'll see `[*]`, indicating that the code is still processing. If the asterisk (`*`) remains for too long, it may mean the code is taking longer than expected, and you might need to interrupt the kernel (explained below). Once the cell has finished running, a number will appear inside the brackets, such as `[1]`, representing the order in which cells have been executed. The first cell you run will display a `1`, the second will display a `2`, and so on.

If your kernel becomes unresponsive, your notebook slows down significantly, or the kernel disconnects, you can try the following steps:

1. At the top of your screen, click **Kernel**, then **Interrupt Kernel**. Try running your code again.

1. If that doesn't help, click **Kernel**, then **Restart Kernel**. If you do this, you will have to run your code cells from the start of your notebook up until where you paused your work.

After you run a cell, the brackets will be populated with a number, indicating how many times you've executed code cells in this session, including the current one.

To run the code in a code cell, first click on that cell to activate it.  It'll be highlighted with a little green or blue rectangle.  Next, hold down the `shift` key and press `return` or `enter`. You could also click the "Run cell" button ( ▶| ) in the cell toolbar above.

**Note:** After you make changes to a Markdown cell, don't forget to run it, either by clicking the "Run cell" button at the top that looks like ▶| or by holding down `shift` and pressing `return`, to view the changes.

#### Errors

Whenever you write code, you'll make mistakes.  When you run a code cell that has errors, Python will sometimes produce error messages to tell you what you did wrong. Errors are okay; even experienced programmers make many errors.  When you make an error, you just have to find the source of the problem, fix it, and move on.

There is an error in the next cell. 

Run it and see what happens.

In [None]:
print("This line is missing something."

The error message you're seeing:

```python
  Cell In[1], line 1
    print("This line is missing something."
                                           ^
SyntaxError: incomplete input
```

##### Explanation

1. This part indicates where the error occurred:

   - `Cell In[1]` refers to the cell whose bracket shows `[1]`, that is, the first cell you ran in this session.
   
   - `line 1` refers to the first line within that cell.

1. The caret (`^`) points to the end of the line, marking the spot where Python expected more input: in this case, a closing parenthesis `)`.

1. A `SyntaxError` means there is something wrong with the structure of your code, and Python cannot understand or execute it as written. 

1. The message "incomplete input" is telling you that Python reached the end of the code (or the end of the line) but didn't find what it expected to complete the `print()` function call. Here, the missing piece is the closing parenthesis `)`.

To fix the error, you simply need to add the closing parenthesis to complete the function call:

```python
print("This line is missing something.")
```
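Optional: Python's built-in `compile` function parses code without running it, so it can be used to check whether a line contains a syntax error. A minimal sketch; the helper name `has_syntax_error` is hypothetical and not something this lab requires:

```python
def has_syntax_error(source):
    """Return True if `source` is not valid Python syntax (hypothetical helper)."""
    try:
        # compile() parses the code but does not run it
        compile(source, "<cell>", "exec")
    except SyntaxError:
        return True
    return False

broken = 'print("This line is missing something."'
fixed = 'print("This line is missing something.")'

print(has_syntax_error(broken))  # True
print(has_syntax_error(fixed))   # False
```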

**Question 1.** Fix the syntax error in the cell below by completing the input.

In [None]:
print("This line is missing something."

### Markdown cells

To edit an existing Markdown cell, simply double-click on it and make your changes. After editing, **"run"** the cell just like you would with a code cell to render the Markdown correctly.

**Question 2.** Now, edit the following cell to include your two favorite colors.

_TYPE YOUR ANSWER HERE REPLACING THIS TEXT_

**Question 3.** Create an ordered list of steps for conducting the [Data Investigation Process](https://iase-pub.org/ojs/SERJ/article/view/41/457). Your list should include these steps: Frame the Problem, Consider & Gather Data, Process Data, Explore & Visualize Data, Consider Models, and Communicate & Propose Action. 

**Note:** Although the [Data Investigation Process](https://iase-pub.org/ojs/SERJ/article/view/41/457) isn’t strictly linear, we’ll number the steps from one to six in this exercise to practice formatting text using Markdown.

**Hint:** Click [here](https://www.markdownguide.org/basic-syntax/#ordered-lists) to learn what an ordered list is and how to create one in Markdown.

_TYPE YOUR ANSWER HERE REPLACING THIS TEXT_

**Note:** Make sure to review your response to the previous question to ensure it is properly formatted.

### Converting Cells

To convert a cell from Markdown to code (or vice versa), you should use the dropdown in the menu bar. For instance, to convert the following cell from Markdown to code, you should:

1. Click the cell.
1. Click "Markdown" in the menu bar, and change it to "Code."

Your menu bar should then look like this:

<img src="images/menu.png" alt="Menu" width="400">

**Question 4.** Change the following cell from a Markdown cell to a code cell and run the cell. You should get 6 as an output.

# Change this to a code cell
1 + 2 + 3 

You can also delete cells by clicking the scissors ✄ button in the menu bar, or pressing 🗑️ on the highlighted cell. 

**Be Careful:** There's no undo button when you delete a cell, so use this command wisely.

**Question 5.** Delete the cell below this one.

In [None]:
# This cell should be deleted!

### Numbers

Quantitative information arises everywhere in data science. In addition to representing commands to print out lines, expressions can represent numbers and methods of combining numbers. The expression 3.2500 evaluates to the number 3.25. 

Run the cell and see.

In [None]:
3.2500

Notice that we didn't have to print. When you run a notebook cell, if the last line has a value, then Jupyter helpfully prints out that value for you. However, it won't print out prior lines automatically.

In [None]:
print(2)
3
4

Above, you should see that 4 is the value of the last expression, 2 is printed, but 3 is lost forever because it was neither printed nor last.

You don't want to print everything all the time anyway. But if you feel sorry for 3, change the cell above to print it.

### Arithmetic

Many basic arithmetic operations are built into Python. The table below describes all the arithmetic operators used in the course. 

| Expression Type  | Operator | Example     | Value    |
|------------------|----------|-------------|----------|
| Addition         | `+`      | `2 + 3`     | `5`      |
| Subtraction      | `-`      | `2 - 3`     | `-1`     |
| Multiplication   | `*`      | `2 * 3`     | `6`      |
| Division         | `/`      | `7 / 3`     | `2.33333`|
| Remainder        | `%`      | `7 % 3`     | `1`      |
| Exponentiation   | `**`     | `2 ** 0.5`  | `1.41421`|
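Each row of the table can be checked in a code cell. A minimal sketch; Python prints more decimal places than the rounded values shown in the table, and dividing whole numbers with `/` always produces a decimal (floating-point) number, even when the division is exact:

```python
addition = 2 + 3           # 5
subtraction = 2 - 3        # -1
multiplication = 2 * 3     # 6
division = 7 / 3           # 2.3333333333333335
remainder = 7 % 3          # 1, because 7 = 2 * 3 + 1
exponentiation = 2 ** 0.5  # 1.4142135623730951

print(addition, subtraction, multiplication, division, remainder, exponentiation)
print(6 / 3)               # 2.0, still a decimal number
```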


The common operator that differs from typical math notation is `**`, which raises one number to the power of the other. So, `2**3` stands for $2^3$ and evaluates to 8.
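One pitfall worth knowing: many calculators and spreadsheets use `^` for powers, but in Python `^` is a different operator (bitwise exclusive or), so it quietly gives an unrelated answer instead of an error:

```python
power_result = 2 ** 3  # 8: two raised to the power three
xor_result = 2 ^ 3     # 1: NOT a power; ^ is bitwise XOR in Python

print(power_result, xor_result)
```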

The order of operations is the same as what you learned in elementary school, and Python also has parentheses. For example, compare the outputs of the cells below. The second cell uses parentheses for a happy new year!

In [None]:
9 + 6 * 5 - 6 * 3**2 * 2**3 / 4 * 7

In [None]:
9 + (6 * 5 - (6 * 3))**2 * ((2**3) / 4 * 7)

In standard math notation, the first expression is,

$$9 + 6 \times 5 - 6 \times 3^2 \times \frac{2^3}{4} \times 7,$$

while the second expression is,

$$9 + (6 \times 5 - (6 \times 3))^2 \times \left(\frac{2^3}{4} \times 7\right).$$
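To see how Python applies the order of operations to the first expression, here is a sketch of the same calculation with every implicit grouping written out as explicit parentheses: exponents first, then `*` and `/` from left to right, then `+` and `-` from left to right.

```python
first = 9 + 6 * 5 - 6 * 3**2 * 2**3 / 4 * 7

# The same expression with Python's implicit grouping made explicit
first_grouped = (9 + (6 * 5)) - ((((6 * (3**2)) * (2**3)) / 4) * 7)

print(first == first_grouped)  # True
```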

**Question 6.** Write a Python expression in this next cell that's equal to $\displaystyle 5 \times \left(3 \frac{10}{11}\right) - 50 \frac{1}{3} + 2^{0.5 \times 22} - \frac{7}{33} + 8$.  

That's five times three and ten elevenths, minus fifty and a third, plus two to the power of half of twenty-two, minus seven thirty-thirds, plus eight.

Replace the ellipsis (`...`) with your expression. Try to use parentheses only when necessary.


**Note:** By "$\displaystyle 3 \frac{10}{11}$", we mean $\displaystyle 3+\frac{10}{11}$, not $\displaystyle 3 \times \frac{10}{11}$.
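Python has no mixed-number notation, so "three and ten elevenths" has to be written as a sum. A small sketch of the difference (this is only one piece of Question 6, not the full answer):

```python
three_and_ten_elevenths = 3 + 10 / 11    # about 3.909
three_times_ten_elevenths = 3 * 10 / 11  # about 2.727, a different number

print(three_and_ten_elevenths, three_times_ten_elevenths)
```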

**Hint:** Be sure to test your code to ensure it runs without syntax errors and produces a meaningful result that makes sense in the context of the question. The correct output should start with a familiar number.

In [None]:
...

### Names

In natural language, we have terminology that lets us quickly reference very complicated concepts.  We don't say, "That's a large mammal with brown fur and sharp teeth!"  Instead, we just say, "Bear!"

In Python, we do this with *assignment statements*. An assignment statement has a name on the left side of an `=` sign and an expression to be evaluated on the right.

In [None]:
ten = 3 * 2 + 4

When you run that cell, Python first computes the value of the expression on the right-hand side, `3 * 2 + 4`, which is the number 10.  Then it assigns that value to the name `ten`.  At that point, the code in the cell is done running.

After you run that cell, the value 10 is bound to the name `ten`:

In [None]:
ten

The statement `ten = 3 * 2 + 4` is not asserting that `ten` is already equal to `3 * 2 + 4`, as we might expect by analogy with math notation.  Rather, that line of code changes what `ten` means; it now refers to the value 10, whereas before it meant nothing at all.

If the designers of Python had been ruthlessly pedantic, they might have made us write

```
define the name ten to hereafter have the value of 3 * 2 + 4 
```
instead.  You will probably appreciate the brevity of "`=`"!  But keep in mind that this is the real meaning.
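Because `=` means "bind this name to this value" rather than "these two things are equal," a name can be re-bound, even to a value computed from its own old value. A short illustration (the name `total` is just an example):

```python
total = 3 * 2 + 4
print(total)  # 10

# Not a false equation: Python evaluates the right side (10 + 1) first,
# then re-binds the name total to the result
total = total + 1
print(total)  # 11
```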

Run the following cell, which uses the name `eleven` before anything has been assigned to it.

You'll see an error.

In [None]:
eleven
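The error you see should be a `NameError`, which is Python's way of saying it doesn't recognize a name. As an optional illustration (the `try`/`except` statement is not covered in this lab), the sketch below catches that error and prints its message:

```python
try:
    eleven  # this name has not been assigned yet
except NameError as error:
    message = str(error)
    print("NameError:", message)  # NameError: name 'eleven' is not defined
```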

**Question 7.** Create a variable called `eleven` and assign it an expression that evaluates to 11. Your expression must include all five arithmetic operators: `+`, `-`, `*`, `/`, and `**`.

In [None]:
...

A common pattern in Jupyter notebooks is to assign a value to a name and then immediately evaluate the name in the last line in the cell so that the value is displayed as output. 

In [None]:
close_to_pi = 355 / 113
close_to_pi

Another common pattern is that a series of lines in a single cell will build up a complex computation in stages, naming the intermediate results.

In [None]:
semimonthly_salary = 843 + 3/4 # 843.75
monthly_salary = 2 * semimonthly_salary
number_of_months_in_a_year = 12
yearly_salary = number_of_months_in_a_year * monthly_salary
yearly_salary

Names in Python can have letters (upper- and lower-case letters are both okay and count as different letters), underscores, and numbers.  The first character can't be a number (otherwise a name might look like a number).  And names can't contain spaces, since spaces are used to separate pieces of code from each other.

Other than those rules, what you name something doesn't matter **to Python**.  For example, this cell does the same thing as the above cell, except everything has a different name:

In [None]:
a = 843.75
b = 2 * a
c = 12
d = c * b
d

**However**, names are very important for making your code *readable* to yourself and others.  The cell above is shorter, but it's totally useless without an explanation of what it does.
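The naming rules described earlier in this section can be checked with the string method `str.isidentifier()`, which reports whether a piece of text is a valid Python name. A short sketch; note that `isidentifier()` does not catch reserved words such as `if`, which also can't be used as names:

```python
print("yearly_salary".isidentifier())  # True: letters and underscores
print("Ten".isidentifier())            # True, and a different name from "ten"
print("month_12".isidentifier())       # True: digits are fine after the first character
print("12_months".isidentifier())      # False: starts with a digit
print("yearly salary".isidentifier())  # False: contains a space
```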

### Comments

You may have noticed the text after the `#` on the first line of the salary cell above:

    semimonthly_salary = 843 + 3/4 # 843.75
    
This is called a *comment*. It doesn't make anything happen in Python; Python ignores anything on a line after a `#`.  Instead, it's there to communicate something about the code to you, the human reader. Comments are extremely useful. 
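A small illustration of that rule, with one exception worth knowing: a `#` inside quotation marks is part of the text, not the start of a comment.

```python
seconds_per_minute = 60  # Python ignores everything after the #
# seconds_per_minute = 100   <- this whole line is a comment and never runs
print(seconds_per_minute)    # 60

# A # inside quotes is part of the text, not the start of a comment
print("We're #1!")           # We're #1!
```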

### Calling Functions

The most common way to combine or manipulate values in Python is by calling functions. Python comes with many built-in functions that perform common operations.

For example, the `abs` function takes a single number as its argument and returns the absolute value of that number. Run the next two cells and see if you understand the output.

In [None]:
abs(5)

In [None]:
abs(-5)

#### Application: Computing Walking Distances

Loujane is on the corner of 7th Avenue and 42nd Street in Midtown Manhattan, and she wants to know how far she'd have to walk to get to Gramercy School on the corner of 10th Avenue and 34th Street.

She can't cut across blocks diagonally, since there are buildings in the way.  She has to walk along the sidewalks.  Using the map below, she sees she'd have to walk 3 avenues (long blocks) and 8 streets (short blocks).  In terms of the given numbers, she computed 3 as the difference between 7 and 10, *in absolute value*, and 8 similarly.  

Loujane also knows that blocks in Manhattan are all about 80 m by 274 m (avenues are farther apart than streets). So in total, she'd have to walk $(80 \times |42 - 34| + 274 \times |7 - 10|)$ meters to get to the school.

<img src="images/map.jpg"/>
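The same calculation works for any two corners. A minimal sketch, assuming the approximate block sizes above; the function `walking_distance_m` and the second pair of corners are hypothetical examples, and defining functions with `def` is not covered in this lab:

```python
STREET_BLOCK_M = 80   # approximate distance between neighboring streets
AVENUE_BLOCK_M = 274  # approximate distance between neighboring avenues

def walking_distance_m(street_a, avenue_a, street_b, avenue_b):
    """Approximate sidewalk distance (in meters) between two Manhattan corners."""
    streets_away = abs(street_a - street_b)
    avenues_away = abs(avenue_a - avenue_b)
    return STREET_BLOCK_M * streets_away + AVENUE_BLOCK_M * avenues_away

# Hypothetical trip: 5th Avenue & 50th Street to 8th Avenue & 46th Street
print(walking_distance_m(50, 5, 46, 8))  # 80 * 4 + 274 * 3 = 1142
```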

**Question 8.** Fill in the line `num_avenues_away = ...` in the next cell so that the cell calculates the distance Loujane must walk and gives it the name `manhattan_distance`.  Everything else has been filled in for you.  **Use the `abs` function.** 

**Note:** Be sure to test your code to ensure it runs without syntax errors and produces a meaningful result that makes sense in the context of the question.

In [None]:
# The number of streets away
num_streets_away = abs(42 - 34)

# Compute the number of avenues away
num_avenues_away = ...

street_length_m = 80
avenue_length_m = 274

# Compute the total distance Loujane must walk
manhattan_distance = street_length_m * num_streets_away + avenue_length_m * num_avenues_away

# Display the distance that was computed 
manhattan_distance

### Nested Expressions

Function calls and arithmetic expressions can themselves contain expressions. You saw an example in the last question:

`abs(42 - 34)`

has 2 number expressions in a subtraction expression in a function call expression. And you probably wrote something like `abs(7 - 10)` to compute `num_avenues_away`.

Nested expressions can turn into complicated-looking code. However, the way in which complicated expressions break down is very regular.

Suppose we are interested in lengths of cats that are very unusual. We'll say that a length is unusual to the extent that it's far away on the number line from the average cat length. An estimate of the average cat length (averaging, we hope, over all cats on Earth today) is **18.2** inches.

So if Ravioli is 21.7 inches long, then her length is $|21.7 - 18.2|$, or **3.5** inches, away from the average. Here's a picture of that:

<img src="images/lengths.png">

The source for average cat length is [Wikipedia](https://en.wikipedia.org/wiki/Cat#:~:text=The%20domestic%20cat%20has%20a,(9%20and%2011%20lb).). The listed lengths for cats are not real and may not be plausible (but the names are of real cats!).

And here's how we'd write that expression in one line of Python code:

In [None]:
abs(21.7 - 18.2)

What's going on here?  `abs` takes just one argument, so the stuff inside the parentheses is all part of that **single argument**.  Specifically, the argument is the value of the expression `21.7 - 18.2`.  The value of that expression is `3.5`.  That value is the argument to `abs`.  The absolute value of that is `3.5`, so `3.5` is the value of the full expression `abs(21.7 - 18.2)`.

Picture simplifying the expression in several steps:

1. `abs(21.7 - 18.2)`
2. `abs(3.5)`
3. `3.5`

In fact, that's basically what Python does to compute the value of the expression.
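One way to mimic that step-by-step simplification is to give each intermediate value its own name:

```python
step_1 = 21.7 - 18.2  # evaluate the inner subtraction first
step_2 = abs(step_1)  # then pass that single value to abs

print(step_1, step_2)              # 3.5 3.5
print(step_2 == abs(21.7 - 18.2))  # True
```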

**Question 9.** Say that Genghis's length is 16.7 inches.  In the next cell, use `abs` to compute the absolute value of the difference between Genghis's length and the average cat length.  Give that value the name `genghis_distance_from_average_in`.

In [None]:
# Replace the ... with an expression 
# to compute the absolute value 
# of the difference between Genghis's length (16.7 in) and the average cat length.
genghis_distance_from_average_in = ...

# Again, we've written this here 
# so that the distance you compute will get printed 
# when you run this cell.
genghis_distance_from_average_in

#### More Nesting

Now say that we want to compute the more unusual of the two cat lengths. We'll use the function `max`, which takes two numbers as arguments and returns the larger of the two. Combining that with the `abs` function, we can compute the larger distance from the average among the two lengths.

Just read and run the cell below.

In [1]:
cat1_length_in = 21.7
cat2_length_in = 16.7
average_cat_length = 18.2

# The larger distance from the average cat length, among the two lengths:
larger_distance_in = max(abs(cat1_length_in - average_cat_length), abs(cat2_length_in - average_cat_length))

# Print out our results in a nice readable format:
print("The larger distance from the average length among these two cats is", larger_distance_in, "inches.")

The larger distance from the average length among these two cats is 3.5 inches.


The line where `larger_distance_in` is computed looks complicated, but we can break it down into simpler components just like we did before.

The basic recipe is to repeatedly simplify small parts of the expression:

* **Basic expressions:** Start with expressions whose values we know, like names or numbers.

    Examples: `cat2_length_in` or `16.7`

* **Find the next simplest group of expressions:** Look for basic expressions that are directly connected to each other. This can be by arithmetic or as arguments to a function call. 
    
    Example: `cat2_length_in - average_cat_length`.

* **Evaluate that group:** Evaluate the arithmetic expression or function call. Use the value computed to replace the group of expressions.  

    Example: `cat2_length_in - average_cat_length` becomes `-1.5`.

* **Repeat:** Continue this process, using the value of the previously evaluated expression as a new basic expression. Stop when we've evaluated the entire expression.

    Example: `abs(-1.5)` becomes `1.5`, and `max(3.5, 1.5)` becomes `3.5`.
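The same recipe applied to the `larger_distance_in` line can be sketched with each intermediate value stored under its own (illustrative) name, innermost groups first:

```python
cat1_length_in = 21.7
cat2_length_in = 16.7
average_cat_length = 18.2

# Innermost groups: the two subtractions
cat1_difference = cat1_length_in - average_cat_length  # 3.5
cat2_difference = cat2_length_in - average_cat_length  # -1.5

# Next level out: abs of each difference
cat1_distance = abs(cat1_difference)  # 3.5
cat2_distance = abs(cat2_difference)  # 1.5

# Outermost call: max of the two distances
larger_distance_in = max(cat1_distance, cat2_distance)
print(larger_distance_in)  # 3.5
```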

You can run the next cell to see a slideshow of that process.

In [None]:
from IPython.display import IFrame
IFrame('https://docs.google.com/presentation/d/14EKhQjGe0bUpLwKGo6BrPCtjsORSkY8h4BzAqvXoLTM/embed?start=false&loop=false&delayms=3000', 800, 600)

**Question 10.** Given the lengths of Ayyash's cats Hummus, Pita, and Lentil, write an expression that computes the smallest difference between any of the three lengths. Your expression shouldn't have any numbers in it, only function calls and the names `hummus`, `pita`, and `lentil`. Give the value of your expression the name `min_length_difference`.


In [None]:
# The three cats' lengths, in inches:
hummus = 24.5  # Hummus is 24.5 inches long
pita = 19.7  # Pita is 19.7 inches long
lentil = 15.8  # Lentil is 15.8 inches long
             
# We'd like to look at all 3 pairs of lengths, 
# compute the absolute difference between each pair, 
# and then find the smallest of those 3 absolute differences.  

# This is left to you.
# If you're stuck, try computing the value for each step of the process 
# (like the difference between Hummus's length and Pita's length) 
# on a separate line and giving it a name (like hummus_pita_length_diff)
min_length_difference = ...

## Submission

Make sure that all cells in your assignment have been executed to display all output, images, and graphs in the final document.

**Note:** Save the assignment before proceeding to download the file.

After downloading, locate the `.ipynb` file and upload **only** this file to Moodle. The assignment will be automatically submitted to Gradescope for grading.