# Python Basics

Please complete this notebook by filling in the cells provided.

**Recommended Readings:**

- [Programming in Python](http://www.inferentialthinking.com/chapters/03/programming-in-python.html)

For every problem that asks for a written explanation, you **must** provide your answer in the designated space. Moreover, throughout this homework and all future ones, please do not reassign variables! For example, if you use `max_temperature` in your answer to one question, do not reassign it later on.

**Deadline:**

This assignment is **due Friday, 10/5 at 9:00 PM**.

Directly sharing answers is not okay, but discussing problems with the instructor or with other students is encouraged.

You should **start early**, as the assignment is **huge**, so that you have time to get help if you're stuck. Email Wenxi if needed. I'm happy to meet and discuss any questions you have; just email me to make an appointment.

## 1. Names and Assignment Statements

**Question 1.1.** When you run the following cell, Python produces a cryptic error message.

In [None]:
4 = 2 + 2

Choose the best explanation of what's wrong with the code, and then assign 1, 2, 3, or 4 to `names_q1` below to indicate your answer. 

1. Python is smart and already knows `4 = 2 + 2`.

2. In Python, it's a rule that the `=` sign must have a variable name to its left, and `4` isn't a variable name.

3. It should be `2 + 2 = 4`.

4. I don't get an error message. This is a trick question.


In [None]:
names_q1 = ...

**Question 1.2.** When you run the following cell, Python will produce another cryptic error message.

In [None]:
two = 3
six = two plus two

Choose the best explanation of what's wrong with the code and assign 1, 2, 3, or 4 to `names_q2` below to indicate your answer. 

1. The `plus` operation only applies to numbers, not the word "two".

2. The name "two" cannot be assigned to the number 3.

3. Two plus two is four, not six.

4. The name `plus` isn't a built-in operator; instead, addition uses `+`.


In [None]:
names_q2 = ...

**Question 1.3.** Run the following cell.

In [None]:
x = 2
y = 3 * x
x = 4

What is `y` after running this cell, and why? Choose the best explanation and assign 1, 2, 3, or 4 to `names_q3` below to indicate your answer. 

1. `y` is equal to 6, because the second `x = 4` has no effect since `x` was already defined.

2. `y` is equal to 6, because `x` was 2 when `y` was assigned, and 3 * 2 is 6.

3. `y` is equal to 12, because `x` is 4 and 3 * 4 is 12.

4. `y` is equal to 12, because assigning `x` to 4 will update `y` to 12 since `y` was defined in terms of `x`.


In [None]:
names_q3 = ...

## 2. Differences Between Majors

Berkeley’s Office of Planning and Analysis (OPA) provides data on numerous aspects of the campus. Adapted from the OPA website, the table below displays the number of degree recipients in three majors in the 2008-2009 and 2017-2018 academic years.

| Major                              | 2008-2009    | 2017-2018   |
|------------------------------------|--------------|-------------|
| Gender and Women's Studies         |      17      |    28       |
| Linguistics                        |      49      |    67       |
| Rhetoric                           |      113     |    56       |



**Question 2.1.** Suppose you want to find the **biggest** absolute difference between the number of degree recipients in the two years, among the three majors.

In the cell below, compute this value and call it `biggest_change`. Use a single expression (a single line of code) to compute the answer. Let Python perform all the arithmetic (like subtracting 49 from 67) rather than simplifying the expression yourself. The built-in `abs` function takes a numerical input and returns the absolute value. The built-in `max` function can take in 3 arguments and returns the maximum of the three numbers. 
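As a rough illustration of how `abs` and `max` combine in one expression, here is a minimal sketch. The numbers are made up for the example and are unrelated to the table above; the name `largest_gap` is hypothetical.

```python
# Hypothetical values, not the degree counts from the table.
# abs() turns each difference into a non-negative number,
# and max() picks the largest of its three arguments.
largest_gap = max(abs(3 - 10), abs(8 - 6), abs(5 - 5))
print(largest_gap)  # 7
```

Python evaluates each `abs(...)` call first and then passes the three results to `max`.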


In [None]:
biggest_change = ...
biggest_change

**Question 2.2.** Which of the three majors had the **smallest** absolute difference? Assign `smallest_change_major` to 1, 2, or 3 where each number corresponds to the following major:

1. Gender and Women's Studies  
2. Linguistics  
3. Rhetoric

Choose the number that corresponds to the major with the smallest absolute difference. 

_Hint:_ You should be able to answer by rough mental arithmetic, without having to calculate the exact value for each major.


In [None]:
smallest_change_major = ...
smallest_change_major

**Question 2.3.**  For each major, define the “relative change” to be the following: $\large{\frac{\text{absolute difference}}{\text{value in 2008-2009}} * 100}$ 
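To see how the formula behaves, here is a small sketch with made-up values (40 and 50 are not taken from the table, and the variable names are hypothetical). Because the numerator is an absolute difference, an increase and a decrease of the same size give the same relative change.

```python
# Hypothetical counts chosen so the arithmetic is easy to check by hand.
old_value = 40
new_value = 50

# absolute difference / original value * 100
relative_change_up = (abs(new_value - old_value) / old_value) * 100
relative_change_down = (abs(30 - old_value) / old_value) * 100

print(relative_change_up)    # 25.0
print(relative_change_down)  # 25.0
```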

Fill in the code below such that `gws_relative_change`, `linguistics_relative_change` and `rhetoric_relative_change` are assigned to the relative changes for their respective majors.


In [None]:
gws_relative_change = (abs(...) / 17) * 100
linguistics_relative_change = ...
rhetoric_relative_change = ...
gws_relative_change, linguistics_relative_change, rhetoric_relative_change

**Question 2.4.** Assign `biggest_rel_change_major` to 1, 2, or 3 where each number corresponds to the following:

1. Gender and Women's Studies  
2. Linguistics  
3. Rhetoric

Choose the number that corresponds to the major with the biggest relative change.


In [None]:
biggest_rel_change_major = ...
biggest_rel_change_major

## 3. Creating Arrays

In [None]:
# Run this cell to set up the notebook, but please don't change it.

import numpy as np
from datascience import *
import warnings
warnings.simplefilter('ignore', FutureWarning)

**Question 3.1.** Make an array called `weird_numbers` containing the following numbers (in the given order):

1. -2
2. the floor of 17.6
3. 3
4. 5 to the power of the ceil of 5.3

*Hint:* `floor` and `ceil` are functions in the `math` module. Importing modules is covered in Exercise 1!

*Note:* Python lists are different/behave differently than NumPy arrays. In Data 8, we use NumPy arrays, so please make an **array**, not a Python list.
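The sketch below shows, on made-up inputs, what `math.floor` and `math.ceil` return and one way a list and an array behave differently. It assumes plain `np.array` in place of the `make_array` helper from the `datascience` package, only to keep the example self-contained.

```python
import math
import numpy as np

# floor rounds down to an integer, ceil rounds up.
print(math.floor(2.7))  # 2
print(math.ceil(2.1))   # 3

# The same values stored as a Python list and as a NumPy array.
a_list = [1, 2, 3]
an_array = np.array([1, 2, 3])

# Multiplying a list repeats it; multiplying an array scales each element.
print(a_list * 2)    # [1, 2, 3, 1, 2, 3]
print(an_array * 2)  # [2 4 6]
```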


In [None]:
# Our solution involved one extra line of code before creating weird_numbers.
...
weird_numbers = ...
weird_numbers

**Question 3.2.** Make an array called `book_title_words` containing the following three strings: "Eats", "Shoots", and "and Leaves".


In [None]:
book_title_words = ...
book_title_words

Strings have a method called `join`.  `join` takes one argument, an array of strings.  It returns a single string.  Specifically, the value of `a_string.join(an_array)` is a single string that's the [concatenation](https://en.wikipedia.org/wiki/Concatenation) ("putting together") of all the strings in `an_array`, **except** `a_string` is inserted in between each string.
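Here is a minimal sketch of `join` on a made-up array of letters. It assumes `np.array` rather than `make_array` so that it runs without the `datascience` package, and the names `letters`, `dashed`, and `squashed` are hypothetical.

```python
import numpy as np

letters = np.array(["a", "b", "c"])

# The string before .join is placed between each pair of elements.
dashed = "-".join(letters)
# Joining on the empty string simply glues the elements together.
squashed = "".join(letters)

print(dashed)    # a-b-c
print(squashed)  # abc
```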

**Question 3.3.** Use the array `book_title_words` and the method `join` to make two strings:

1. "Eats, Shoots, and Leaves" (call this one `with_commas`)
2. "Eats Shoots and Leaves" (call this one `without_commas`)

*Hint:* If you're not sure what `join` does, first try just calling, for example, `"MIS3020".join(book_title_words)`.


In [None]:
with_commas = ...
without_commas = ...

# These lines are provided just to print out your answers.
print('with_commas:', with_commas)
print('without_commas:', without_commas)

## 4. Indexing Arrays

The following questions let you practice accessing individual elements of arrays.  In Python (and in many programming languages), each element is accessed by its *index*; for example, the first element is the element at index 0. Indices must be **integers**.
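A short sketch of zero-based indexing on a made-up array (again assuming `np.array` in place of `make_array`; the names are hypothetical). Both square brackets and the `item` method take an integer index.

```python
import numpy as np

colors = np.array(["red", "green", "blue"])

# Index 0 is the first element, index 1 the second, and so on.
first_color = colors[0]
second_color = colors.item(1)

print(first_color)   # red
print(second_color)  # green
```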

**Question 4.1.** The cell below creates an array of some numbers.  Set `third_element` to the third element of `some_numbers`.


In [None]:
some_numbers = make_array(-1, -3, -6, -10, -15)

third_element = ...
third_element

**Question 4.2.** The next cell creates a table that displays some information about the elements of `some_numbers` and their order.  Run the cell to see the partially-completed table, then fill in the missing information (the cells that say "Ellipsis") by assigning `blank_a`, `blank_b`, `blank_c`, and `blank_d` to the correct elements in the table.

*Hint:* Replace the `...` with strings or numbers. As a reminder, indices should be **integers**.


In [None]:
blank_a = ...
blank_b = ...
blank_c = ...
blank_d = ...
elements_of_some_numbers = Table().with_columns(
    "English name for position", make_array("first", "second", blank_a, blank_b, "fifth"),
    "Index",                     make_array(blank_c, 1, 2, blank_d, 4),
    "Element",                   some_numbers)
elements_of_some_numbers

**Question 4.3.** You'll sometimes want to find the **last** element of an array.  Suppose an array has 142 elements.  What is the index of its last element?

*Note:* Your answer must be a positive number.


In [None]:
index_of_last_element = ...

More often, you don't know the number of elements in an array, its *length*.  (For example, it might be a large dataset you found on the Internet.)  The function `len` takes a single argument, an array, and returns an integer that represents the `len`gth of that array.
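As a small sketch of how `len` relates to indices, consider this made-up array (built with `np.array` as an assumption, to avoid depending on the `datascience` package).

```python
import numpy as np

readings = np.array([12, 15, 9, 21])

# len counts the elements; indices start at 0, so they stop one short of len.
number_of_readings = len(readings)
last_reading = readings[number_of_readings - 1]

print(number_of_readings)  # 4
print(last_reading)        # 21
```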

**Question 4.4.** The cell below loads an array called `president_birth_years`.  Calling `tbl.column(...)` on a table returns an array of the column specified, in this case the `Birth Year` column of the `president_births` table. The last element in that array is the most recent among the birth years of all the deceased Presidents. Assign that year to `most_recent_birth_year`.

**Note:** Avoid Googling the answer. You should be able to answer this question only using table methods.


In [None]:
president_birth_years = Table.read_table("president_births.csv").column('Birth Year')

most_recent_birth_year = ...
most_recent_birth_year

**Question 4.5.** Finally, assign `min_of_birth_years` to the minimum of the first, sixteenth, and last birth years listed in `president_birth_years`.

**Note:** Use the Python `min` function and table methods to find the answer. Avoid manually calculating the result yourself!


In [None]:
min_of_birth_years = ...
min_of_birth_years

## 5. Basic Array Arithmetic

**Question 5.1.** Multiply the numbers 42, -4224, 424224242, and 250 by 157. Assign each variable below such that `first_product` is assigned to the result of $42 * 157$, `second_product` is assigned to the result of $-4224 * 157$, and so on.

*Note*: For this question, **don't** use arrays.


In [None]:
first_product = ...
second_product = ...
third_product = ...
fourth_product = ...
print("First Product:", first_product)
print("Second Product:", second_product)
print("Third Product:", third_product)
print("Fourth Product:", fourth_product)

**Question 5.2.** Now, do the same calculation, but using an array called `numbers` and only a single multiplication (`*`) operator.  Store the 4 results in an array named `products`. 


In [None]:
numbers = ...
products = ...
products

**Question 5.3.** Oops, we made a typo!  Instead of 157, we wanted to multiply each number by 1577.  Compute the correct products in the cell below using array arithmetic.  Notice that your job is really easy if you previously defined an array containing the 4 numbers.
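The sketch below illustrates why keeping the values in an array pays off: changing the multiplier means changing one number, not every line. The values and names are made up, and `np.array` stands in for `make_array` as an assumption.

```python
import numpy as np

widths = np.array([1, 2, 3])

# One * applies the multiplication to every element of the array.
scaled_by_ten = widths * 10
# Reusing the same array with a different multiplier is a one-character change.
scaled_by_hundred = widths * 100

print(scaled_by_ten)      # [10 20 30]
print(scaled_by_hundred)  # [100 200 300]
```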


In [None]:
correct_products = ...
correct_products

**Question 5.4.** We've loaded an array of temperatures in the next cell.  Each number is the highest temperature observed on a day at a climate observation station, mostly from the US.  Since they're from the US government agency [NOAA](https://www.noaa.gov/), all the temperatures are in Fahrenheit.

Convert all the temperatures to Celsius by first subtracting 32 from them, then multiplying the results by $\frac{5}{9}$, i.e. $C = (F - 32) * \frac{5}{9}$. After converting the temperatures to Celsius, make sure to **ROUND** the final result  to the nearest integer using the `np.round` function.
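A brief sketch of how `np.round` behaves on an array of made-up values. Two details are worth knowing: the result is still an array of floats, and values exactly halfway between two integers are rounded to the nearest even integer.

```python
import numpy as np

values = np.array([1.4, 1.6, 2.5, 3.5])

# Rounds every element; halfway cases go to the nearest even integer.
rounded = np.round(values)

print(rounded)  # [1. 2. 2. 4.]
```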


In [None]:
max_temperatures = Table.read_table("temperatures.csv").column("Daily Max Temperature")

max_temperatures_celsius = ...
celsius_temps_rounded = ...
celsius_temps_rounded

**Question 5.5.** The cell below loads all the *lowest* temperatures from each day (in Fahrenheit).  Compute the daily temperature range for each day. That is, compute the difference between each daily maximum temperature and the corresponding daily minimum temperature.  **Pay attention to the units and give your answer in Celsius!** <span style="color:red">Make sure **NOT** to round your answer for this question!</span> 

*Hint:* Use `min_temperatures` and/or `max_temperatures`, and be careful about when you perform your unit conversions. Write out the mathematical computation by hand if you're stuck!



In [None]:
min_temperatures = Table.read_table("temperatures.csv").column("Daily Min Temperature")

celsius_temperature_ranges = ...
celsius_temperature_ranges

## 6. Tables

**Question 6.1.** Suppose you have 4 apples, 3 oranges, and 3 pineapples.  (Perhaps you're using Python to solve a high school Algebra problem.)  Create a table that contains this information.  It should have two columns: `fruit name` and `amount`.  Assign the new table to the variable `fruits`.

*Note:* Use lower-case and singular words for the name of each fruit, like `"apple"`.


In [None]:
# Our solution uses 1 statement split over 3 lines.
fruits = ...
        ...
        ...
fruits

**Question 6.2.** The file `inventory.csv` contains information about the inventory at a fruit stand.  Each row represents the contents of one box of fruit. Load it as a table named `inventory` using the `Table.read_table()` function. `Table.read_table(...)` takes one argument (data file name in string format) and returns a table. 


In [None]:
inventory = ...
inventory

**Question 6.3.** Does each box at the fruit stand contain a different fruit? Set `all_different` to `True` if each box contains a different fruit or to `False` if multiple boxes contain the same fruit.

*Hint:* You don't have to write code to calculate the True/False value for `all_different`. Just look at the `inventory` table and assign `all_different` to either `True` or `False` according to what you can see from the table in answering the question.


In [None]:
all_different = ...
all_different

**Question 6.4.** The file `sales.csv` contains the number of fruit sold from each box last Saturday.  It has an extra column called `price per fruit ($)` that's the price *per item of fruit* for fruit in that box.  The rows are in the same order as the `inventory` table.  Load these data into a table called `sales`. 


In [None]:
sales = ...
sales

**Question 6.5.** How many fruits did the store sell in total on that day?


In [None]:
total_fruits_sold = ...
total_fruits_sold

**Question 6.6.** What was the store's total revenue (the total price of all fruits sold) on that day?

*Hint:* If you're stuck, think first about how you would compute the total revenue from just the grape sales.


In [None]:
total_revenue = ...
total_revenue

**Question 6.7.** Make a new table called `remaining_inventory`.  It should have the same rows and columns as `inventory`, except that the amount of fruit sold from each box should be subtracted from that box's **original** count, so that the `count` column is **updated to be** the amount of fruit remaining after Saturday.


In [None]:
remaining_inventory = ...
    ...
    ...
    ...
remaining_inventory

## 7. Functions and Iteration

Next, let's practice some more with functions and iteration.


**Recommended Readings**: 

* [Applying Functions](https://www.inferentialthinking.com/chapters/08/1/Applying_a_Function_to_a_Column.html)
* [Conditionals](https://www.inferentialthinking.com/chapters/09/1/Conditional_Statements.html)
* [Iteration](https://www.inferentialthinking.com/chapters/09/2/Iteration.html)

James is trying to analyze how well the Cal football team performed in the 2021 season. A football game is divided into four periods, called quarters. The number of points Cal scored in each quarter and the number of points their opponent scored in each quarter are stored in a file called `cal_fb.csv`.

In [None]:
# Just run this cell
# Read in the cal_fb csv file
games = Table().read_table("cal_fb.csv")
games.show()

Let's start by finding the total points each team scored in a game.

**Question 7.1.** Write a function called `sum_scores`.  It should take four arguments, where each argument is an integer giving the team's score in one quarter. It should return the team's total score for that game.

*Hint:* Don't overthink this question!
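As a reminder of the general shape of a function with several arguments, here is a minimal sketch on an unrelated problem. The function `rectangle_perimeter` and its argument names are hypothetical.

```python
def rectangle_perimeter(width, height):
    '''Returns the perimeter of a rectangle with the given side lengths'''
    # The value after `return` is what a call to the function evaluates to.
    return 2 * (width + height)

perimeter = rectangle_perimeter(3, 4)
print(perimeter)  # 14
```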



In [None]:
def sum_scores(..., ..., ..., ...):
    '''Returns the total score calculated by adding up the score of each quarter'''
    ...

sum_scores(14, 7, 3, 0) #DO NOT CHANGE THIS LINE

We can get specific row objects from a table. `tbl.row(n)` returns the row at index `n` of a table (so `tbl.row(0)` is the first row), and `row.item("column_name")` selects the element that corresponds to `column_name` in that row. Here's an example:

In [None]:
# Just run this cell
# We got the Axe!
games.row(9) # <-- this will return a row object

In [None]:
# Just run this cell
games.row(9).item("Cal 4Q") # <-- this will return an item (e.g. an int) from a row object

In [None]:
cal_scores = games.apply(sum_scores, 'Cal 1Q', 'Cal 2Q', 'Cal 3Q', 'Cal 4Q')
opp_scores = games.apply(sum_scores, 'Opp 1Q', 'Opp 2Q', 'Opp 3Q', 'Opp 4Q')
final_scores = Table().with_columns(
    'Opponent', games.column('Opponent'),
    'Cal Score', cal_scores,
    'Opponent Score', opp_scores)

final_scores

**Question 7.2.** We want to see for a particular game whether or not Cal lost. Write a function called `did_cal_lose`.  It should take one argument: a **row object** from the `final_scores` table. It should return `True` if Cal's score was less than the Opponent's score, and `False` otherwise.

*Note 1*: "Row object" means a row from the table extracted (behind the scenes) using `tbl.row(index)` that contains all the data for that specific row. It is **not** the index of a row. Do not try to call `final_scores.row(row)` inside of the function.

*Note 2*: If you're still confused by row objects, try printing out `final_scores.row(1)` in a new cell to visually see what it looks like! This piece of code is pulling out the row object located at index 1 of the `final_scores` table and returning it. When you display it in a cell, you'll see that it is not located within a table, but is instead a standalone row object!


In [None]:
def did_cal_lose(row):
    ...

did_cal_lose(final_scores.row(1)) #DO NOT CHANGE THIS LINE

## 8. Unrolling Loops

"Unrolling" a `for` loop means to manually write out all the code that it executes.  The result is code that does the same thing as the loop, but without the structure of the loop.  For example, for the following loop:

    for num in np.arange(3):
        print("The number is", num)

The unrolled version would look like this:

    print("The number is", 0)
    print("The number is", 1)
    print("The number is", 2)


Unrolling a `for` loop is a great way to understand what the loop is doing during each step. In this exercise, you'll practice unrolling a `for` loop.


In the question below, write code that does the same thing as the given code, but with any `for` loops unrolled.  It's a good idea to run both your answer and the original code to verify that they do the same thing.  (Of course, if the code does something random, you'll get a different random outcome than the original code!)

**Question 8.1.** Unroll the code below.


In [None]:
for joke_iteration in range(4):
    print("Knock. " * (joke_iteration + 1))
    print("Who's there?")
    print("Banana.")
    print("Banana who?")
print("Knock, knock.")
print("Who's there?")
print("Orange.")
print("Orange who?")
print("Orange you glad I didn't say banana?")

In [None]:
...

**Question 8.2.** Unroll the code below.


In [None]:
for joke_iteration in range(4):
    print("Knock. " * (joke_iteration + 1))
    print("Who's there?")
    print("Banana.")
    if joke_iteration == 2:
        break
    print("Banana who?")
print("Knock, knock.")
print("Who's there?")
print("Orange.")
print("Orange who?")
print("Orange you glad I didn't say banana?")

**Question 8.3.** Unroll the code below.


In [None]:
for joke_iteration in range(4):
    if joke_iteration == 2:
        continue
    print("Knock. " * (joke_iteration + 1))
    print("Who's there?")
    print("Banana.")
    print("Banana who?")
print("Knock, knock.")
print("Who's there?")
print("Orange.")
print("Orange who?")
print("Orange you glad I didn't say banana?")

**Question 8.4.** Translate the `for` loop in Question 8.1 into a `while` loop.
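As a sketch of the general pattern, here is the small loop from the unrolling example at the start of this section written both ways. It uses `range(3)` instead of `np.arange(3)` and collects the lines in lists (hypothetical names `for_lines` and `while_lines`) so the two versions can be compared directly.

```python
# for version: the loop supplies 0, 1, 2 automatically.
for_lines = []
for num in range(3):
    for_lines.append("The number is " + str(num))

# while version: we set up the counter, test it, and advance it ourselves.
while_lines = []
num = 0
while num < 3:
    while_lines.append("The number is " + str(num))
    num = num + 1

print(for_lines == while_lines)  # True
```

Forgetting the `num = num + 1` line would make the `while` loop run forever, which is the main thing to watch for in this kind of translation.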

In [None]:
...

You're done with this Assignment!!! 

**Important submission information:** Be sure to rename your work as **Assignment_YourFirstName_YourLastName** and submit the notebook file (.ipynb) to UM Learn. **It is your responsibility to make sure your work is properly saved before submission.**

**Kim** offers congrats on finishing this epic assignment! Just a few questions after all! 

<img src="./kim.jpg" width="30%" alt="Close up picture of dog smiling."/>

