# Lab 1: Notebooks and Expressions

In today's lab, you'll learn how to:

* navigate Jupyter notebooks (like this one);
* write and evaluate some basic *expressions* in Python, the computer language of the course;
* call *functions* to use code other people have written; and
* break down Python code into smaller parts to understand it.

Sections 1-6 are to help you get started writing Python code. Sections 7-10 are questions exercising your knowledge of those topics.

This lab covers parts of [Chapter 3](http://www.inferentialthinking.com/chapters/03/programming-in-python.html) of the online textbook. You should read the examples in the book, but not right now. Instead, let's get started!

## 1. Jupyter Notebooks
This webpage is called a Jupyter notebook. A notebook is a place to write programs and view their results, and also to write text.

### 1.1. Text Cells
In a notebook, each rectangle containing text or code is called a *cell*.

Text cells (like this one) can be edited by double-clicking on them. They're written in a simple format called [Markdown](http://daringfireball.net/projects/markdown/syntax) to add formatting and section headings.  You don't need to learn Markdown, but you might want to.

After you edit a text cell, click the "run cell" button at the top that looks like ▶ or hold down `shift` + `return` to confirm any changes. (Try not to delete the instructions of the lab.)

#### Exercise

This next paragraph is in its own text cell.  Edit it to add your name, and then click the "run cell" ▶ button or hold down `shift` + `return`.  This sentence, for example, should be deleted.  So should this one.

My name is YOUR-NAME-HERE!

### 1.2. Code Cells

Cells can also contain code in the Python 3 language. Running a code cell will execute all of the code it contains.

To run the code in a code cell, first click on that cell to activate it.  It'll be highlighted with a little green or blue rectangle.  Next, either press ▶ or hold down the `shift` key and press `return` or `enter`.

Try running the next cell:

In [None]:
print("Hello, World!")

The fundamental building block of Python code is an expression. Cells can contain multiple lines with multiple expressions. When you run a cell, the lines of code are executed in the order in which they appear. Every `print` expression prints a line. Run the next cell and notice the order of the output.

In [None]:
print("First this line is printed,")
print("and then this one.")

### 1.3. Writing Jupyter Notebooks
You can use Jupyter notebooks for your own projects or documents.  When you make your own notebook, you'll need to create your own cells for text and code.

To add a cell, click the + button in the menu bar.  It'll start out as a text cell.  You can change it to a code cell by clicking inside it so it's highlighted, clicking the drop-down box next to the restart (⟳) button in the menu bar, and choosing "Code".

#### Exercise

Add a code cell below this one.  Write code in it that prints out:
   
    A whole new cell!

Run your cell to verify that it works.

### 1.4. Errors

Python is a language, and like natural human languages, it has rules.  It differs from natural language in two important ways:
1. The rules are *simple*.  You can learn most of them in a few weeks and gain reasonable proficiency with the language in a semester.
2. The rules are *rigid*.  If you're proficient in a natural language, you can understand a non-proficient speaker, glossing over small mistakes.  A computer running Python code is not smart enough to do that.

Whenever you write code, you'll make mistakes.  When you run a code cell that has errors, Python will sometimes produce error messages to tell you what you did wrong.

Errors are okay; even experienced programmers make many errors.  When you make an error, you just have to find the source of the problem, fix it, and move on.

We have made an error in the next cell.  Run it and see what happens.

In [None]:
print("This line is missing something."

**Note:** In the toolbar, there is the option to click `Cell > Run All`, which will run all the code cells in this notebook in order. However, the notebook stops running code cells if it hits an error, like the one in the cell above.

You should see something like this (minus our annotations):

<img src="error.jpg"/>

The last line of the error output attempts to tell you what went wrong. The *syntax* of a language is its structure, and this `SyntaxError` tells you that Python cannot interpret your expression because it does not follow Python's syntax. "`EOF`" means "end of file," so the message is saying Python expected you to write something more (in this case, a right parenthesis) before finishing the cell. (In Python 3.10 and later, the same mistake is reported as `'(' was never closed`, which points at the unmatched parenthesis directly.)
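If you're curious, the short sketch below asks Python to check the broken line's syntax without running it and shows the message it would report. You don't need to write code like this yet: it uses `try`/`except`, which this lab doesn't cover, and the exact wording of the message depends on your Python version.

```python
# The broken line from the cell above, stored as a string so it is not run directly.
broken_code = 'print("This line is missing something."'

message = None
try:
    # compile() checks the syntax of the code without running it.
    compile(broken_code, "<cell>", "exec")
except SyntaxError as err:
    # err.msg holds the text that appears after "SyntaxError:" in the error output.
    message = err.msg

print(message)
```

On older versions of Python this prints `unexpected EOF while parsing`; on newer ones it prints `'(' was never closed`.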

There's a lot of terminology in programming languages, but you don't need to know it all in order to program effectively. If you see a cryptic message like this, you can often get by without deciphering it.  (Of course, if you're frustrated, ask a neighbor or a staff member for help.)

#### Exercise

Try to fix the code above so that you can run the cell and see the intended message instead of an error.

### 1.5. The Kernel
The kernel is a program that executes the code inside your notebook and outputs the results. In the top right of your window, you can see a circle that indicates the status of your kernel. If the circle is empty (⚪), the kernel is idle and ready to execute code. If the circle is filled in (⚫), the kernel is busy running some code. 

Next to every code cell, you'll see some text that says `[...]`. Before you run the cell, you'll see `[ ]`. When the cell is running, you'll see `[*]`. If you see an asterisk (\*) next to a cell that doesn't go away, it's likely that the code inside the cell is taking too long to run, and it might be a good time to interrupt the kernel (discussed below). When a cell is finished running, you'll see a number inside the brackets, like so: `[1]`. The number corresponds to the order in which you run the cells; so, the first cell you run will show a 1 when it's finished running, the second will show a 2, and so on. 

You may run into problems where your kernel is stuck for an excessive amount of time, your notebook is very slow and unresponsive, or your kernel loses its connection.  Oftentimes, the problem can be fixed by selecting **Kernel -> Interrupt Kernel** from the menu, or pressing the stop button (◼️) from the toolbar at the top of the window.  Fix the issue in your code and then run the cell again.

See our [troubleshooting page](https://www.cs.williams.edu/~cs104/docs/how-to-jupyter/troubleshooting.html) for fixing other problems related to "stuck" kernels.

### 1.6. Submitting your work
All assignments in the course will be distributed as notebooks like this one, and you will submit your work from the notebook. We will use a system called Otter that checks your work and helps you submit. At the top of each assignment, you'll see a cell like the one below. Always run it when you open up one of our notebooks.

In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab01.ipynb")

You will submit your finished solution by running the export command below to generate a zip file containing your solution that you will then upload to Gradescope. It's fine to submit multiple times.  We will only grade your final submission.  You can run the export command now to see what it does, but don't worry about uploading anything yet.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)

## 2. Numbers

Quantitative information arises everywhere in data science. In addition to representing commands to print out lines, expressions can represent numbers and methods of combining numbers. The expression `3.2500` evaluates to the number 3.25. (Run the cell and see.)

In [None]:
3.2500

Notice that we didn't have to `print`. When you run a notebook cell, if the last line has a value, then Jupyter helpfully prints out that value for you. However, it won't print out prior lines automatically.

In [None]:
print(2)
3
4

Above, you should see that 4 is the value of the last expression, 2 is printed, but 3 is lost forever because it was neither printed nor last.

You don't want to print everything all the time anyway.  But if you feel sorry for 3, change the cell above to print it.

### 2.1. Arithmetic
The line in the next cell subtracts.  Its value is what you'd expect.  Run it.

In [None]:
3.25 - 1.5

Many basic arithmetic operations are built into Python.  The textbook section on [Expressions](http://www.inferentialthinking.com/chapters/03/1/expressions.html) describes all the arithmetic operators used in the course.  The common operator that differs from typical math notation is `**`, which raises one number to the power of the other. So, `2**3` stands for $2^3$ and evaluates to 8. 
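Here is a small sketch of a few operators in action. It stores each result under a name, which Section 3 explains; the particular numbers are arbitrary.

```python
power = 2 ** 3          # exponentiation: 2 raised to the 3rd power
quotient = 7 / 2        # division produces a decimal number
precedence = 7 - 2 * 3  # multiplication happens before subtraction
print(power, quotient, precedence)
```

Notice that `7 / 2` is `3.5`, not `3`: in Python 3, `/` always produces a decimal number, even when both inputs are whole numbers.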

The order of operations is the same as what you learned in elementary school, and Python also has parentheses.  For example, compare the outputs of the cells below. The second cell uses parentheses for a happy new year!

In [None]:
3+6*5-6*3**2*2**3/4*7

In [None]:
6+(6*5-(6*3))**2*((2**3)/4*7)

In standard math notation, the first expression is

$$3 + 6 \times 5 - 6 \times 3^2 \times \frac{2^3}{4} \times 7,$$

while the second expression is

$$6 + (6 \times 5 - (6 \times 3))^2 \times (\frac{(2^3)}{4} \times 7).$$
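To see how the order of operations plays out, here is the first expression worked by hand: exponents first, then multiplication and division from left to right, then addition and subtraction.

$$
\begin{aligned}
3 + 6 \times 5 - 6 \times 3^2 \times \frac{2^3}{4} \times 7
&= 3 + 6 \times 5 - 6 \times 9 \times \frac{8}{4} \times 7 \\
&= 3 + 30 - 6 \times 9 \times 2 \times 7 \\
&= 3 + 30 - 756 \\
&= -723
\end{aligned}
$$

Python shows this result as `-723.0` because `/` always produces a decimal number. Working the second expression the same way gives $6 + 12^2 \times 14 = 2022$.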

#### Exercise

Write a Python expression in this next cell that's equal to $5 \times (3 \frac{10}{11}) - 50 \frac{1}{3} + 2^{.5 \times 22} - \frac{7}{33} + 4$. That's five times three and ten elevenths, minus fifty and a third, plus two to the power of half twenty-two, minus seven thirty-thirds, plus four. By "$3 \frac{10}{11}$" we mean $3+\frac{10}{11}$, not $3 \times \frac{10}{11}$.

Replace the ellipses (`...`) with your expression.  Try to use parentheses only when necessary.

*Hint:* The correct output should start with a familiar number.

In [None]:
...

## 3. Names
In natural language, we have terminology that lets us quickly reference very complicated concepts.  We don't say, "That's a large ruminant with horns, hoofs, and a tail!"  Instead, we just say, "cow!"

In Python, we do this with *assignment statements*. An assignment statement has a name on the left side of an `=` sign and an expression to be evaluated on the right.

In [None]:
ten = 3 * 2 + 4

When you run that cell, Python first computes the value of the expression on the right-hand side, `3 * 2 + 4`, which is the number 10.  Then it assigns that value to the name `ten`.  At that point, the code in the cell is done running.

After you run that cell, the value 10 is bound to the name `ten`:

In [None]:
ten

The statement `ten = 3 * 2 + 4` is not asserting that `ten` is already equal to `3 * 2 + 4`, as we might expect by analogy with math notation.  Rather, that line of code changes what `ten` means; it now refers to the value 10, whereas before it meant nothing at all.

If the designers of Python had been ruthlessly pedantic, they might have made us write

    define the name ten to hereafter have the value of 3 * 2 + 4 

instead.  You will probably appreciate the brevity of "`=`"!  But keep in mind that this is the real meaning.
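Because `=` means "compute the right side, then give that value a name," a name can even appear on both sides of the same assignment. This is a small illustration (the values are arbitrary):

```python
ten = 3 * 2 + 4   # evaluate the right side (10), then bind the name ten to it
ten = ten + 1     # evaluate ten + 1 using the current value (11), then rebind ten
ten
```

In math, an equation like $x = x + 1$ has no solution; in Python, `ten = ten + 1` is an ordinary update. (It also shows why `ten` is a poor name for a value that might change.)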

#### Exercise

Try writing code that uses a name (like `eleven`) that hasn't been assigned to anything.  You'll see an error!

In [None]:
...

A common pattern in Jupyter notebooks is to assign a value to a name and then immediately evaluate the name in the last line in the cell so that the value is displayed as output. 

In [None]:
close_to_pi = 355/113
close_to_pi

Another common pattern is that a series of lines in a single cell will build up a complex computation in stages, naming the intermediate results.

In [None]:
semimonthly_salary = 841.25
monthly_salary = 2 * semimonthly_salary
number_of_months_in_a_year = 12
yearly_salary = number_of_months_in_a_year * monthly_salary
yearly_salary

Names in Python can have letters (upper- and lower-case letters are both okay and count as different letters), underscores, and numbers.  The first character can't be a number (otherwise a name might look like a number).  And names can't contain spaces, since spaces are used to separate pieces of code from each other.
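Python strings have an `isidentifier` method that checks these character rules, so you can test candidate names yourself. This sketch uses a `for` loop, which the course covers later, and the candidate names are made up for illustration. Note that `isidentifier` only checks characters, so it won't flag words Python reserves for itself, such as `for`.

```python
# Made-up candidate names, checked against Python's naming rules.
candidate_names = ["yearly_salary", "Salary2", "2nd_salary", "yearly salary"]

for candidate in candidate_names:
    # isidentifier() is True only for strings made of letters, digits, and
    # underscores that do not start with a digit.
    print(candidate, candidate.isidentifier())
```

The first two are valid names; `2nd_salary` starts with a digit and `yearly salary` contains a space, so both are rejected.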

Other than those rules, what you name something doesn't matter *to Python*.  For example, this cell does the same thing as the above cell, except everything has a different name:

In [None]:
a = 841.25
b = 2 * a
c = 12
d = c * b
d

**However, names are very important for making your code *readable* to yourself and others.**  The cell above is shorter, but it's totally useless without an explanation of what it does.

## 4. Checking your Code


Now that you know how to name things, you can start using the built-in *tests* to check whether your work is correct. Sometimes, there are multiple tests for a single question, and passing all of them is required to receive credit for the question. Please don't change the contents of the test cells. 

Go ahead and attempt the next question. Running the cell directly after it will test whether you have assigned `seconds_in_a_decade` correctly. If you haven't, this test will tell you the correct answer. Resist the urge to just copy it, and instead try to adjust your expression.

#### Exercise

Assign the name `seconds_in_a_decade` to the number of seconds between midnight January 1, 2010 and midnight January 1, 2020. Note that there are two leap years in this span of a decade. A non-leap year has 365 days and a leap year has 366 days.

*Hint:* If you're stuck, ask a neighbor or a staff member for help.

In [None]:
# Change the next line 
# so that it computes the number of seconds in a decade 
# and assigns that number the name, seconds_in_a_decade.

seconds_in_a_decade = ...

# We've put this line in this cell 
# so that it will print the value you've given to seconds_in_a_decade when you run it.  
# You don't need to change this.
seconds_in_a_decade

In [None]:
grader.check("q4.1")

## 5. Comments
You may have noticed these lines in the cell in which you answered the previous question:

    # Change the next line 
    # so that it computes the number of seconds in a decade 
    # and assigns that number the name, seconds_in_a_decade.
    
This is called a *comment*. It doesn't make anything happen in Python; Python ignores anything on a line after a `#`.  Instead, it's there to communicate something about the code to you, the human reader. Comments are extremely useful. 
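Comments can appear on their own line or at the end of a line of code. In this small example (the quantities are made up), Python runs the code and skips both comments:

```python
inches_per_foot = 12  # a comment can follow code on the same line
# A comment can also fill a whole line; Python skips it entirely.
height_in_feet = 5
height_in_inches = height_in_feet * inches_per_foot
height_in_inches
```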

## 6. Calling Functions

The most common way to combine or manipulate values in Python is by calling functions. Python comes with many built-in functions that perform common operations.

For example, the `abs` function takes a single number as its argument and returns the absolute value of that number. Run the next two cells and see if you understand the output.

In [None]:
abs(5)

In [None]:
abs(-5)

#### Exercise 

Chunhua is on the corner of 7th Avenue and 42nd Street in Midtown Manhattan, and she wants to know how far she'd have to walk to get to Gramercy School on the corner of 10th Avenue and 34th Street.

She can't cut across blocks diagonally, since there are buildings in the way.  She has to walk along the sidewalks.  Using the map below, she sees she'd have to walk 3 avenues (long blocks) and 8 streets (short blocks).  In terms of the given numbers, she computed 3 as the difference between 7 and 10, *in absolute value*, and 8 similarly.  

Chunhua also knows that blocks in Manhattan are all about 80m by 274m (avenues are farther apart than streets).  So in total, she'd have to walk $(80 \times |42 - 34| + 274 \times |7 - 10|)$ meters to get to the school.

<img src="map.jpg"/>

Fill in the line `num_avenues_away = ...` in the next cell so that the cell calculates the distance Chunhua must walk and gives it the name `manhattan_distance`.  Everything else has been filled in for you.  **Use the `abs` function.** Also, be sure to run the test cell afterward to test your code.


In [None]:
# Here's the number of streets away:
num_streets_away = abs(42-34)

# Compute the number of avenues away in a similar way:
num_avenues_away = ...

street_length_m = 80
avenue_length_m = 274

# Now we compute the total distance Chunhua must walk.
manhattan_distance = street_length_m*num_streets_away + avenue_length_m*num_avenues_away

# We've included this line so that you see the distance you've computed 
# when you run this cell.  
# You don't need to change it, but you can if you want.
manhattan_distance

In [None]:
grader.check("q6.1")

#### Multiple arguments
Some functions take multiple arguments, separated by commas. For example, the built-in `max` function returns the maximum argument passed to it.

In [None]:
max(2, -3, 4, -5)
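Python also has a built-in `min` function that works like `max` but returns the smallest argument. Function calls can also be nested: Python evaluates the inner call first and passes its result as an argument to the outer one. The numbers in this sketch are arbitrary.

```python
lowest = min(2, -3, 4, -5)          # min is the counterpart of max
largest_size = max(abs(-7), 3, -1)  # abs(-7) runs first, so this is max(7, 3, -1)
print(lowest, largest_size)
```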

## 7. Interpreting Graphs (10 pts)



The textbook describes counting the number of times that the literary characters were named in each chapter of the classic book, [*Little Women*](https://www.inferentialthinking.com/chapters/01/3/1/literary-characters). In computer science, the word "character" also refers to a letter, digit, space, or punctuation mark: any single element of a text. The following code generates a scatter plot in which each dot corresponds to a chapter of *Little Women*. The horizontal position of a dot measures the number of periods in the chapter. The vertical position measures the total number of characters.

In [None]:
# This cell contains code that hasn't yet been covered in the course,
# but you should be able to interpret the scatter plot it generates.

from datascience import *
from urllib.request import urlopen
import numpy as np
%matplotlib inline

little_women_url = 'https://www.inferentialthinking.com/data/little_women.txt'
chapters = urlopen(little_women_url).read().decode().split('CHAPTER ')[1:]
text = Table().with_column('Chapters', chapters)
Table().with_columns(
    'Periods',    np.char.count(chapters, '.'),
    'Characters', text.apply(len, 0)
    ).scatter(0)
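To make the two axes concrete, the sketch below applies the same two measurements to one short, made-up string instead of a real chapter. Python's `str.count` method counts occurrences of `'.'` in a single string, much as `np.char.count` does for each chapter above, and `len` counts every character, including spaces.

```python
# A made-up, very short "chapter" (not from Little Women).
sample_chapter = "Jo laughed. Meg sighed. Beth smiled."

num_periods = sample_chapter.count(".")  # like the horizontal axis
num_characters = len(sample_chapter)     # like the vertical axis; spaces count too
characters_per_period = num_characters / num_periods
print(num_periods, num_characters, characters_per_period)
```

The last line shows one way to think about "characters per period," the idea behind Part 7.2.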

#### Part 7.1 (5 pts)


Around how many periods are there in the chapter with the most characters? Assign either 1, 2, 3, 4, or 5 to the name `characters_q1` below.

1. 250
2. 390
3. 440
4. 32,000
5. 40,000

In [None]:
characters_q1 = ...

In [None]:
grader.check("q7.1")

The test above checks that your answers are in the correct format. **This test does not check that you answered correctly**, only that you assigned a number successfully in each multiple-choice answer cell.

#### Part 7.2 (5 pts)


Which of the following chapters has the most characters per period? Assign either 1, 2, or 3 to the name `characters_q2` below.
1. The chapter with about 60 periods
2. The chapter with about 350 periods
3. The chapter with about 440 periods

In [None]:
characters_q2 = ...

In [None]:
grader.check("q7.2")

Again, the test above checks that your answers are in the correct format, but not that you have answered correctly.

To discover more interesting facts from this plot, read [Section 1.3.2](https://inferentialthinking.com/chapters/01/3/2/Another_Kind_Of_Character.html) of the textbook.

## 8. Names and Assignment Statements (15 pts)



#### Part 8.1 (5 pts)


When you run the following cell, Python produces a cryptic error message.

In [None]:
4 = 2 + 2

Choose the best explanation of what's wrong with the code, and then assign 1, 2, or 3 to `names_q1` below to indicate your answer.

1. Python is smart and already knows `4 = 2 + 2`.

2. `4` is already a defined number, and it doesn't make sense to make a number be a name for something else. In Python, "`x = 2 + 2`" means "assign `x` as the name for the value of `2 + 2`."

3. It should be `2 + 2 = 4`.

In [None]:
names_q1 = ...

In [None]:
grader.check("q8.1")

#### Part 8.2 (5 pts)


When you run the following cell, Python will produce another cryptic error message.

In [None]:
two = 3
six = two plus two

Choose the best explanation of what's wrong with the code and assign 1, 2, 3, or 4 to `names_q2` below to indicate your answer.

1. The `plus` operation only applies to numbers, not the word "two".

2. The name "two" cannot be assigned to the number 3.

3. Two plus two is four, not six.

4. Python cannot interpret the name `plus`, as it has not been defined.


In [None]:
names_q2 = ...

In [None]:
grader.check("q8.2")

#### Part 8.3 (5 pts)


When you run the following cell, Python will, yet again, produce another cryptic error message.

In [None]:
x = print(5)
y = x + 2

Choose the best explanation of what's wrong with the code and assign 1, 2, or 3 to `names_q3` below to indicate your answer.

1. Python doesn't want `y` to be assigned.

2. The `print` operation is meant for displaying values to the programmer, not for assigning values!

3. Python can’t do addition between one name and one number. It has to be 2 numbers or 2 predefined names.


In [None]:
names_q3 = ...

In [None]:
grader.check("q8.3")

## 9. Differences between Majors (20 pts)



Adapted from information found on the Williams website, the table below displays the average number of degree recipients in three majors for the years 2008-2012 and 2018-2021.

| Major                              | 2008-2012    | 2018-2021   |
|------------------------------------|--------------|-------------|
| Comparative Literature             |  5           | 11          |
| Psychology                         | 62           | 45          |
| Mathematics                        | 53           | 61          |


#### Part 9.1 (5 pts)


Suppose you want to find the **biggest** absolute difference between the numbers of degree recipients in the two time periods, among the three majors.

In the cell below, compute this value and call it `biggest_change`. Use a single expression (a single line of code) to compute the answer. Let Python perform all the arithmetic (like subtracting 5 from 11) rather than simplifying the expression yourself. The built-in `abs` function takes a numerical input and returns the absolute value. The built-in `max` function can take in 3 arguments and returns the maximum of the three numbers.

In [None]:
biggest_change = ...
biggest_change

In [None]:
grader.check("q9.1")

#### Part 9.2 (5 pts)


Which of the three majors had the **smallest** absolute difference? Assign `smallest_change_major` to 1, 2, or 3 where each number corresponds to the following major:

1. Comparative Literature
2. Psychology
3. Mathematics

Choose the number that corresponds to the major with the smallest absolute difference.

You should be able to answer by rough mental arithmetic, without having to calculate the exact value for each major.

In [None]:
smallest_change_major = ...
smallest_change_major

In [None]:
grader.check("q9.2")

#### Part 9.3 (5 pts)


For each major, define the “relative change” to be the following: $\large{\frac{\text{absolute difference}}{\text{value in 2008-2012}} \times 100}$
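As a worked illustration of the formula, here is the relative change for a hypothetical major (the numbers 20 and 25 are invented and do not come from the table):

```python
# Hypothetical major, not one from the table: 20 recipients earlier, 25 later.
earlier_value = 20
later_value = 25

# Absolute difference divided by the earlier (2008-2012) value, as a percentage.
relative_change = abs(later_value - earlier_value) / earlier_value * 100
relative_change
```

The difference is 5, and 5 out of an earlier value of 20 is a 25% change.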

Fill in the code below such that `complit_relative_change`, `psych_relative_change`, and `math_relative_change` are assigned to the relative changes for their respective majors.

In [None]:
complit_relative_change = (abs(...) / 5) * 100

psych_relative_change = ...
math_relative_change = ...
complit_relative_change, psych_relative_change, math_relative_change

In [None]:
grader.check("q9.3")

#### Part 9.4 (5 pts)


Assign `biggest_rel_change_major` to 1, 2, or 3 where each number corresponds to the following:

1. Comparative Literature
2. Psychology
3. Mathematics

Choose the number that corresponds to the major with the biggest relative change. 

In [None]:
# Assign biggest_rel_change_major to the number corresponding to the major with the biggest relative change.
biggest_rel_change_major = ...
biggest_rel_change_major

In [None]:
grader.check("q9.4")


## 10. Welcome Survey (5 pts)



#### Part 10.1 (5 pts)


Before you submit, please complete the CS 104 welcome survey.

Assign `survey` to the secret string given at the end of the welcome survey:

In [None]:
survey = ...

## 11. You're Done!


**Important submission information:** Follow these steps to submit your work:
* Run the tests and verify that they pass as you expect. 
* Choose **Save Notebook** from the **File** menu.
* **Run the final cell** and click the link below to download the zip file. 

Once you have downloaded that file, go to [Gradescope](https://www.gradescope.com/) and submit the zip file to the corresponding assignment. The name of this assignment is "Lab 1 Autograder". **Be sure your work is saved before running the last cell!**

Once you have submitted, your Gradescope assignment should look something like the following image if you have passed all tests.

NOTE: *This is an image of a generic Gradescope submission result. It does not include the same test numbers as this assignment.*

<img src="gradescope.png">

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export()