# Lab 2: Arrays and Tables

Please complete this notebook by filling in the cells provided. Before you begin, execute the previous cell to load the provided tests.

**Helpful Resource:**
- [Python Reference](https://www.cs.williams.edu/~cs104/auto/python-library-ref.html)

**Recommended Readings:**
- [Arrays](https://inferentialthinking.com/chapters/05/1/Arrays.html)
- [What is Data Science?](http://www.inferentialthinking.com/chapters/01/what-is-data-science.html)
- [Causality and Experiments](http://www.inferentialthinking.com/chapters/02/causality-and-experiments.html) 
- [Programming in Python](http://www.inferentialthinking.com/chapters/03/programming-in-python.html)

For every problem that asks for a written explanation, you **must** provide your answer in the designated space. Moreover, throughout this lab and all future ones, please do not reassign variables! For example, if you use `max_temperature` in your answer to one question, do not reassign it later on. Otherwise, you may fail tests that you thought you were passing previously!

**Note:** We will run more tests than the ones we provide to you. You may pass all of the tests in this notebook, but your final grade will be 100% only if your work passes all of our tests too. We will run these additional correctness tests once everyone has turned in the lab.

## 1. Creating Arrays

In [None]:
# Run this cell to set up the notebook, but please don't change it.

import numpy as np
from datascience import *
import warnings
warnings.simplefilter('ignore', FutureWarning)

#### Part 1.1

Make an array called `numbers` containing the following numbers (in the given order)

1. -2
2. the floor of 12.6
3. 3
4. 5 to the power of the ceil of 5.3

*Hint:* `floor` and `ceil` are functions in the `math` module. Importing modules is covered in 2.1 of Lab 2!

*Note:* Python lists are different/behave differently than NumPy arrays. In Data 8, we use NumPy arrays, so please make an **array**, not a Python list.
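As a quick illustration of the hint above, here is a minimal sketch of `math.floor`, `math.ceil`, and the `**` operator. The numbers are hypothetical and chosen only for illustration; they are not the ones used in this exercise.

```python
import math

# Hypothetical values, not the ones from the exercise.
rounded_down = math.floor(7.9)   # largest integer less than or equal to 7.9
rounded_up = math.ceil(2.1)      # smallest integer greater than or equal to 2.1
power = 2 ** rounded_up          # ** raises 2 to the power rounded_up

print(rounded_down, rounded_up, power)
```

Both functions return plain Python integers, so their results can be used directly as exponents or as array elements.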

In [None]:
# Our solution involved one extra line of code before creating
# numbers.
import math # SOLUTION
numbers = make_array(-2, math.floor(12.6), 3, 5 ** math.ceil(5.3)) # SOLUTION
numbers

In [None]:
""" # BEGIN TEST CONFIG
points: 0
hidden: false
failure_message: Be sure to create an array with four elements.
""" # END TEST CONFIG
len(numbers) == 4

In [None]:
# HIDDEN 
numbers.item(0) == -2

In [None]:
# HIDDEN
numbers.item(1) == 12

In [None]:
# HIDDEN
numbers.item(2) == 3

In [None]:
# HIDDEN
numbers.item(3) == 15625

#### Part 1.2

Make an array called `book_title_words` containing the following three strings: "Eats", "Shoots", and "and Leaves".

In [None]:
book_title_words = make_array("Eats", "Shoots", "and Leaves") # SOLUTION
book_title_words

In [None]:
""" # BEGIN TEST CONFIG
points: 0
hidden: false
failure_message: Be sure to use make_array to create an array.
""" # END TEST CONFIG
import numpy as np
type(book_title_words) == np.ndarray

In [None]:
len(book_title_words) == 3

In [None]:
book_title_words.item(0) == 'Eats'

In [None]:
book_title_words.item(1) == 'Shoots'

In [None]:
book_title_words.item(2) == 'and Leaves'

#### Part 1.3

Strings have a method called `join`.  `join` takes one argument, an array of strings.  It returns a single string.  Specifically, the value of `a_string.join(an_array)` is a single string that's the [concatenation](https://en.wikipedia.org/wiki/Concatenation) ("putting together") of all the strings in `an_array`, **with** `a_string` inserted in between each string.
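The behavior of `join` described above can be sketched on a small example. The words and separators here are hypothetical, and a plain Python list stands in for the array purely to keep the sketch self-contained; `join` accepts an array of strings in the same way.

```python
# Hypothetical words; a list is used here only for brevity.
words = ["red", "green", "blue"]

dashed = "-".join(words)   # "-" is inserted between each pair of strings
spaced = " ".join(words)   # a single space is inserted instead

print(dashed)
print(spaced)
```

Note that the separator appears only *between* elements, never before the first one or after the last one.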

Use the array `book_title_words` and the method `join` to make two strings:

1. "Eats, Shoots, and Leaves" (call this one `with_commas`)
2. "Eats Shoots and Leaves" (call this one `without_commas`)

In [None]:
with_commas = ", ".join(book_title_words) # SOLUTION
without_commas = " ".join(book_title_words) # SOLUTION

# These lines are provided just to print out your answers.
print('with_commas:', with_commas)
print('without_commas:', without_commas)

In [None]:
with_commas == 'Eats, Shoots, and Leaves'

In [None]:
without_commas == 'Eats Shoots and Leaves'

## 2. Indexing Arrays

These exercises give you practice accessing individual elements of arrays with the `array.item(index)` method.  In Python, each element is accessed by its *index*; for example, the first element is the element at index 0. Indices must be **integers**.

***Note:* If you have previous coding experience, you may be familiar with bracket notation. DO NOT use bracket notation when indexing (i.e. `arr[0]`), as this can yield different data type outputs than what we will be expecting. This can cause you to fail an autograder test.**

Be sure to refer to the [Python Reference](http://data8.org/fa21/python-reference.html) on the website if you feel stuck!
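To make the note about bracket notation concrete, here is a small sketch comparing the two ways of reading an element. It builds the array with `np.array` instead of `make_array` (an assumption made so the example depends only on NumPy), and the values are hypothetical.

```python
import numpy as np

arr = np.array([10, 20, 30])   # hypothetical example array

via_item = arr.item(0)         # .item returns a plain Python int
via_brackets = arr[0]          # brackets return a NumPy scalar type

print(type(via_item))
print(type(via_brackets))
```

The two values compare as equal, but their types differ, which is why a type-sensitive autograder test can treat them differently.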

#### Part 2.1

The cell below creates an array of some numbers.  Set `third_element` to the third element of `some_numbers`.

In [None]:
some_numbers = make_array(-1, -3, -6, -10, -15)

third_element = some_numbers.item(2) # SOLUTION
third_element

In [None]:
""" # BEGIN TEST CONFIG
points: 0
hidden: false
failure_message: Remember that indices start at 0...
""" # END TEST CONFIG
third_element != -3

In [None]:
# HIDDEN
third_element == -6

#### Part 2.2

The next cell creates a table that displays some information about the elements of `some_numbers` and their order.  Run the cell to see the partially-completed table, then fill in the missing information (the cells that say "Ellipsis") by assigning `blank_a`, `blank_b`, `blank_c`, and `blank_d` to the correct elements in the table.

*Hint:* Replace the `...` with strings or numbers. As a reminder, indices should be **integers**.

In [None]:
blank_a = "third" # SOLUTION
blank_b = "fourth" # SOLUTION
blank_c = 0 # SOLUTION
blank_d = 3 # SOLUTION
elements_of_some_numbers = Table().with_columns(
    "English name for position", make_array("first", "second", blank_a, blank_b, "fifth"),
    "Index",                     make_array(blank_c, 1, 2, blank_d, 4),
    "Element",                   some_numbers)
elements_of_some_numbers

In [None]:
elements_of_some_numbers.column(0).item(2) == 'third'

In [None]:
elements_of_some_numbers.column(0).item(3) == 'fourth'

In [None]:
elements_of_some_numbers.column(1).item(0) == 0

In [None]:
elements_of_some_numbers.column(1).item(3) == 3

#### Part 2.3

You'll sometimes want to find the **last** element of an array.  Suppose an array has 142 elements.  What is the index of its last element?

In [None]:
index_of_last_element = 141 # SOLUTION

In [None]:
index_of_last_element == 141

#### Part 2.4

More often, you don't know the number of elements in an array, its *length*.  (For example, it might be a large dataset you found on the Internet.)  The function `len` takes a single argument, an array, and returns the `len`gth of that array (an integer).
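As a rough sketch of how `len` and zero-based indexing fit together, the example below finds the last element of a short array. The array contents are hypothetical, and `np.array` is used in place of `make_array` as an assumption to keep the example self-contained.

```python
import numpy as np

values = np.array([5, 8, 13, 21])     # hypothetical array with 4 elements

last_index = len(values) - 1          # indices run from 0 to len - 1
last_value = values.item(last_index)  # element at the final index

print(last_index, last_value)
```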

The cell below loads an array called `president_birth_years`.  Calling `tbl.column(...)` on a table returns an array of the column specified, in this case the `Birth Year` column of the `president_births` table. The last element in that array is the most recent among the birth years of all the deceased Presidents. Assign that year to `most_recent_birth_year`.

In [None]:
president_birth_years = Table.read_table("president_births.csv").column('Birth Year')

most_recent_birth_year = president_birth_years.item(len(president_birth_years) - 1) # SOLUTION
most_recent_birth_year

In [None]:
most_recent_birth_year == 1917

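#### Part 2.5

Finally, assign `min_of_birth_years` to the minimum of the first, sixteenth, and last birth years listed in `president_birth_years`.
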
In [None]:
min_of_birth_years = min(president_birth_years.item(0), president_birth_years.item(15), most_recent_birth_year)
min_of_birth_years

In [None]:
""" # BEGIN TEST CONFIG
points: 0
hidden: false
""" # END TEST CONFIG
min_of_birth_years > 0

In [None]:
# HIDDEN
min_of_birth_years == 1732

## 3. Basic Array Arithmetic

#### Part 3.1

Multiply the numbers 42, -4224, 424224242, and 250 by 157. Assign each variable below such that `first_product` is assigned to the result of $42 * 157$, `second_product` is assigned to the result of $-4224 * 157$, and so on.

For this question, **don't** use arrays.

In [None]:
first_product = 42 * 157 # SOLUTION
second_product = -4224 * 157 # SOLUTION
third_product = 424224242 * 157 # SOLUTION
fourth_product = 250 * 157 # SOLUTION
print(first_product, second_product, third_product, fourth_product)

In [None]:
first_product == 6594

In [None]:
second_product == -663168

In [None]:
third_product == 66603205994

In [None]:
fourth_product == 39250

#### Part 3.2

Now, do the same calculation, but using an array called `numbers` and only a single multiplication (`*`) operator.  Store the 4 results in an array named `products`.
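A minimal sketch of what "a single multiplication operator" means for arrays: multiplying an array by a number multiplies every element at once. The values below are hypothetical, and `np.array` stands in for `make_array` as an assumption.

```python
import numpy as np

lengths = np.array([1, 2, 3])   # hypothetical values
tripled = lengths * 3           # one * applies to every element

print(tripled)
```

The result is a new array of the same length; the original array is left unchanged.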

In [None]:
numbers = make_array(42, -4224, 424224242, 250) # SOLUTION
products = numbers * 157 # SOLUTION
products

In [None]:
np.array_equal(products, [6594, -663168, 66603205994, 39250])

#### Part 3.3 (4 points)

Oops, we made a typo!  Instead of 157, we wanted to multiply each number by 1577.  Compute the correct products in the cell below using array arithmetic.  Notice that your job is really easy if you previously defined an array containing the 4 numbers.

In [None]:
correct_products = numbers * 1577 # SOLUTION
correct_products

In [None]:
np.array_equal(correct_products, [66234, -6661248, 669001629634, 394250])

#### Part 3.4

We've loaded an array of temperatures in the next cell.  Each number is the highest temperature observed on a day at a climate observation station, mostly from the US.  Since they're from the US government agency [NOAA](https://www.noaa.gov/), all the temperatures are in Fahrenheit.  Convert them all to Celsius by first subtracting 32 from them, then multiplying the results by $\frac{5}{9}$. Make sure to **ROUND** the final result after converting to Celsius to the nearest integer using the `np.round` function.
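The conversion steps described above (subtract 32, multiply by $\frac{5}{9}$, then round) can be sketched on a few well-known reference temperatures. The input values are hypothetical and are not taken from `temperatures.csv`; `np.array` is used in place of a table column as an assumption.

```python
import numpy as np

# Hypothetical Fahrenheit readings: freezing point, a mild day, boiling point.
fahrenheit = np.array([32, 50, 212])

# Subtract 32 first, then scale by 5/9, then round to the nearest integer.
celsius = np.round((fahrenheit - 32) * 5 / 9)

print(celsius)
```

The parentheses matter: the subtraction has to happen before the multiplication, otherwise only the scaled value is shifted by 32.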

In [None]:
max_temperatures = Table.read_table("temperatures.csv").column("Daily Max Temperature")

celsius_max_temperatures = np.round((max_temperatures - 32) * 5 / 9) # SOLUTION
celsius_max_temperatures

In [None]:
celsius_max_temperatures.item(0) == -4

In [None]:
celsius_max_temperatures.item(len(celsius_max_temperatures) - 1) == 16

In [None]:
celsius_max_temperatures.item(50) == 11

#### Part 3.5 

The cell below loads all the *lowest* temperatures from each day (in Fahrenheit).  Compute the daily temperature range for each day. That is, compute the difference between each daily maximum temperature and the corresponding daily minimum temperature.  **Pay attention to the units, give your answer in Celsius!** Make sure **NOT** to round your answer for this question! 

*Note:* Remember that in the previous part, `celsius_max_temperatures` was rounded, so you might not want to use that in this question.

In [None]:
min_temperatures = Table.read_table("temperatures.csv").column("Daily Min Temperature")

celsius_temperature_ranges = (max_temperatures - min_temperatures) * 5 / 9 # SOLUTION
celsius_temperature_ranges

In [None]:
np.isclose(celsius_temperature_ranges.item(0), 6.666666667)

In [None]:
np.isclose(celsius_temperature_ranges.item(len(celsius_temperature_ranges) - 1), 11.111111111)

In [None]:
np.isclose(celsius_temperature_ranges.item(50), 11.666666667)

## 4. Old Faithful

[Old Faithful](https://en.wikipedia.org/wiki/Old_Faithful) is a geyser in Yellowstone that erupts every 44 to 125 minutes. People are [often told that the geyser erupts every hour](http://yellowstone.net/geysers/old-faithful/), but in fact the waiting time between eruptions is more variable. Let's take a look.

#### Part 4.1

The first line below assigns `waiting_times` to an array of 272 consecutive waiting times between eruptions, taken from a classic 1938 dataset. Assign the names `shortest`, `longest`, and `average` so that the `print` statement is correct. **(4 Points)**

In [None]:
waiting_times = Table.read_table('old_faithful.csv').column('waiting')

shortest = min(waiting_times) # SOLUTION
longest = max(waiting_times) # SOLUTION
average = np.mean(waiting_times) # SOLUTION

print("Old Faithful erupts every", shortest, "to", longest, "minutes and every", average, "minutes on average.")

In [None]:
shortest > 40

In [None]:
longest > shortest

In [None]:
shortest == 43

In [None]:
longest == 96

In [None]:
np.isclose(average,70.897058823)

#### Part 4.2

Assign `biggest_decrease` to the biggest decrease in waiting time between two consecutive eruptions. For example, the third eruption occurred after 74 minutes and the fourth after 62 minutes, so the decrease in waiting time was 74 - 62 = 12 minutes.

*Hint*: We want to return the absolute value of the biggest decrease.
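To see how the sign of a difference relates to a decrease, here is a small sketch using the first four waiting times quoted in Part 4.4 (79, 54, 74, and 62). It uses `np.array` rather than the table column, as an assumption to keep it self-contained.

```python
import numpy as np

first_four = np.array([79, 54, 74, 62])   # first four waiting times from the lab text

# Each entry is (next value) - (current value), so decreases are negative.
changes = np.diff(first_four)

# The most negative change is the biggest decrease; abs reports its size.
largest_drop = abs(min(changes))

print(changes, largest_drop)
```

Among these four values the largest drop is from 79 to 54, which is larger than the 12-minute decrease used as the example above.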

In [None]:
# np.diff() calculates the difference between subsequent values  
# in a NumPy array.
differences = np.diff(waiting_times) 
biggest_decrease = abs(min(differences)) # SOLUTION
biggest_decrease

In [None]:
biggest_decrease > 0

In [None]:
biggest_decrease == 45

#### Part 4.3

Suppose the surveyors started watching Old Faithful at the start of the first eruption. Assume that they watch until the end of the tenth eruption. For some of that time they will be watching eruptions, and for the rest of the time they will be waiting for Old Faithful to erupt. How many minutes will they spend waiting for eruptions?

*Hint:* One way to approach this problem is to use the `take` or `where` method on the table `faithful`. 

*Another Hint:* `first_nine_waiting_times` must be an array.

In [None]:
faithful = Table.read_table('old_faithful.csv')

faithful_with_eruption_nums = faithful.take(np.arange(9)) # SOLUTION
first_nine_waiting_times = faithful_with_eruption_nums.column("waiting") # SOLUTION
total_waiting_time_until_tenth = sum(first_nine_waiting_times) # SOLUTION
total_waiting_time_until_tenth

In [None]:
total_waiting_time_until_tenth > 0

In [None]:
total_waiting_time_until_tenth == 633

#### Part 4.4

Let’s imagine your guess for the next waiting time was always just the length of the previous waiting time. If you always guessed the previous waiting time, how big would your error in guessing the waiting times be, on average? **(4 Points)**

For example, since the first four waiting times are 79, 54, 74, and 62, the average difference between your guess and the actual time for just the second, third, and fourth eruptions would be $\frac{|79-54|+ |54-74|+ |74-62|}{3} = 19$.
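The worked example above can be checked with a short sketch. It assumes only NumPy and hard-codes the four waiting times from the text instead of reading them from `old_faithful.csv`.

```python
import numpy as np

first_four = np.array([79, 54, 74, 62])   # waiting times quoted in the text

# Guessing "same as last time" means each error is the size of the change.
errors = np.abs(np.diff(first_four))
mean_error = np.mean(errors)

print(errors, mean_error)
```

There are three errors for four waiting times, because the first eruption has no previous waiting time to use as a guess.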

In [None]:
differences = np.diff(waiting_times)
average_error = sum(abs(differences)) / len(differences) # SOLUTION
average_error

In [None]:
average_error > 0

In [None]:
np.isclose(average_error, 20.52029520295203)

## 5. Tables

#### Part 5.1

Suppose you have 4 apples, 3 oranges, and 3 pineapples.  Create a table that contains this information.  It should have two columns: `fruit name` and `count`.  Assign the new table to the variable `fruits`.

**Note:** Use lower-case and singular words for the name of each fruit, like `"apple"`.

In [None]:
# Our solution uses 1 statement split over several lines.
fruits = Table().with_columns({  # SOLUTION
    "fruit name" : make_array("apple", "orange", "pineapple"), # SOLUTION
    "count": make_array(4, 3, 3) # SOLUTION
}) # SOLUTION
fruits

In [None]:
fruits.labels == ('fruit name', 'count')

In [None]:
fruits.num_rows == 3

In [None]:
np.array_equiv(fruits.column("fruit name"), ['apple', 'orange', 'pineapple'])

In [None]:
np.array_equiv(fruits.column("count"), [4, 3, 3])

#### Part 5.2

The file `inventory.csv` contains information about the inventory at a fruit stand.  Each row represents the contents of one box of fruit. Load it as a table named `inventory` using the `Table.read_table()` function. `Table.read_table(...)` takes one argument (data file name in string format) and returns a table.

In [None]:
inventory = Table.read_table('inventory.csv') # SOLUTION
inventory

In [None]:
inventory.num_rows == 8

In [None]:
inventory.labels == ('box ID', 'fruit name', 'count')

#### Part 5.3

Does each box at the fruit stand contain a different fruit? Set `all_different` to `True` if each box contains a different fruit or to `False` if multiple boxes contain the same fruit.

*Hint:* You don't have to write code to calculate the True/False value for `all_different`. Just look at the `inventory` table and assign `all_different` to either `True` or `False` according to what you can see from the table in answering the question.

In [None]:
all_different = False # SOLUTION
all_different

In [None]:
type(all_different) == bool

In [None]:
all_different == False

#### Part 5.4

The file `sales.csv` contains the number of fruit sold from each box in one day.  It has an extra column called "price per fruit (\$)" that's the price *per item of fruit* for fruit in that box.  The rows are in the same order as the `inventory` table.  Load these data into a table called `sales`.

In [None]:
sales = Table.read_table("sales.csv") # SOLUTION
sales

In [None]:
sales.num_rows == 8

In [None]:
sales.row(0).item("count sold") == 3

In [None]:
sales.row(4).item("price per fruit ($)") == 0.15

#### Part 5.5

How many fruits did the store sell in total on that day?

In [None]:
total_fruits_sold = sum(sales.column("count sold")) # SOLUTION
total_fruits_sold

In [None]:
total_fruits_sold >= 0

In [None]:
total_fruits_sold == 638

#### Part 5.6

What was the store's total revenue (the total price of all fruits sold) on that day?

*Hint:* If you're stuck, think first about how you would compute the total revenue from just the grape sales.
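Following the hint, one way to think about revenue is box by box: each box earns its count sold times its price, and the total is the sum over boxes. The sketch below uses two hypothetical boxes with made-up counts and prices (not values from `sales.csv`), and `np.array` in place of table columns as an assumption.

```python
import numpy as np

counts_sold = np.array([10, 4])          # hypothetical items sold per box
price_per_item = np.array([0.5, 2.0])    # hypothetical price per item ($)

# Multiplying two arrays of equal length pairs them up element by element.
revenue_per_box = counts_sold * price_per_item
total = sum(revenue_per_box)

print(revenue_per_box, total)
```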

In [None]:
total_revenue = sum(sales.column("count sold") * sales.column("price per fruit ($)")) # SOLUTION
total_revenue

In [None]:
total_revenue >= 0

In [None]:
np.isclose(total_revenue, 106.85)

#### Part 5.7

Make a new table called `remaining_inventory`.  It should have the same rows and columns as `inventory`, except that the amount of fruit sold from each box should be subtracted from that box's **original** count, so that the "count" is **updated to be** the amount of fruit remaining after that day's sales.

In [None]:
remaining_inventory = inventory.with_columns({  # SOLUTION 
    "count": inventory.column("count") - sales.column("count sold") # SOLUTION
}) # SOLUTION
# SOLUTION

remaining_inventory

In [None]:
remaining_inventory.num_rows == 8

In [None]:
remaining_inventory.labels == ('box ID', 'fruit name', 'count')

In [None]:
remaining_inventory.column("count").item(0) == 42

In [None]:
remaining_inventory.column("count").item(2) == 20

## You're Done!

**Important submission information:** Follow these steps to submit your work:
* Run the tests and verify that they pass as you expect. 
* Choose **Save Notebook** from the **File** menu.
* **Run the final cell** and click the link below to download the zip file. 

Once you have downloaded that file, go to [Gradescope](https://www.gradescope.com/) and submit the zip file to the corresponding assignment. The name of this assignment is "Lab 2". **Be sure your work is saved before running the last cell!**

Once you have submitted, your Gradescope assignment should show you passing all the tests you passed in your assignment notebook.