# Run the cell below

To run a code cell (i.e., execute the Python code inside a Jupyter notebook), click the play button on the ribbon underneath the name of the notebook. Before you begin, run the cell below by clicking the "Run cell" button at the top that looks like ▶| or by holding down `Shift` and pressing `Return`.

In [1]:
import numpy as np
from datascience import *

# Homework 02: Arrays and Tables

Please complete this notebook by filling in the cells provided. Before you begin, execute the following cell to load the provided tests. Each time you start your server, you will need to execute this cell again to load the tests.

This assignment is due by the deadline listed in Canvas/Gradescope. Start early so that you can come to office hours if you're stuck. Check the course website for the office hours schedule. Late work will not be accepted as per the course expectations.

Directly sharing answers is not okay, but discussing problems with the course staff or with other students is encouraged. Refer to the course expectations document to learn more about how to learn cooperatively.

For all problems that ask you to write out explanations and sentences, you **must** provide your answer in the designated space. Moreover, throughout this homework and all future ones, please do not re-assign variables! For example, if you use `max_temperature` in your answer to one question, do not reassign it later on.

**Helpful Resource:**
- [Python Reference](http://data8.org/sp21/python-reference.html): Cheat sheet of helpful array & table methods used in this course.

**Recommended Reading:**
- [What is Data Science](http://www.inferentialthinking.com/chapters/01/what-is-data-science.html)
- [Causality and Experiments](http://www.inferentialthinking.com/chapters/02/causality-and-experiments.html) 
- [Programming in Python](http://www.inferentialthinking.com/chapters/03/programming-in-python.html)

## 1. Creating Arrays

**Question 1.** Make an **array** called `weird_numbers` containing the following numbers (in the given order):

1. $-2$
2. $\sin(12)$
3. $3$
4. $5^{\cos(12)}$

**Hint:** `sin` and `cos` are functions in the `math` module. Importing modules is covered in a previous lab! You should import the module in your solution cell below.

**Note:** Python lists behave differently from NumPy arrays. In this course, we use NumPy arrays, so please make an **array**, not a Python list.
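One way to see the difference between a list and an array is to multiply each by a number. The sketch below uses made-up values, and it builds the array with `np.array` as a stand-in for `make_array` so that it runs without the `datascience` package.

```python
import numpy as np

# Hypothetical example values, not part of the assignment.
as_list = [1, 2, 3]
as_array = np.array([1, 2, 3])

# Multiplying a list by an integer repeats the list.
doubled_list = as_list * 2

# Multiplying an array by a number multiplies every element.
doubled_array = as_array * 2

print(doubled_list)   # [1, 2, 3, 1, 2, 3]
print(doubled_array)  # [2 4 6]
```

The element-wise behavior of arrays is what the array arithmetic questions later in this homework rely on.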


In [2]:
# Our solution involved one extra line of code before creating
# weird_numbers.
import math # SOLUTION
weird_numbers = make_array(-2, math.sin(12), 3, 5**math.cos(12) ) # SOLUTION
weird_numbers

array([-2.        , -0.53657292,  3.        ,  3.88891638])

In [3]:
isinstance(weird_numbers, np.ndarray)

True

In [4]:
len(weird_numbers)

4

In [5]:
np.allclose(weird_numbers, np.array([-2., -0.53657292,  3.,  3.88891638]), rtol=1e-03, atol=1e-03)

True

**Question 2.** Make an array called `book_title_words` containing the following three strings: "Eats", "Shoots", and "and Leaves".


In [6]:
book_title_words = make_array('Eats', 'Shoots', 'and Leaves') # SOLUTION
book_title_words

array(['Eats', 'Shoots', 'and Leaves'],
      dtype='<U10')

In [7]:
type(book_title_words) == np.ndarray

True

In [8]:
not any([',' in text for text in book_title_words])

True

In [9]:
'and ' in book_title_words.item(2)

True

In [10]:
len(book_title_words)

3

In [11]:
book_title_words

array(['Eats', 'Shoots', 'and Leaves'],
      dtype='<U10')

### `join`
Strings have a method called `join`.  `join` takes one argument, an array of strings.  It returns a single string.  Specifically, the value of `a_string.join(an_array)` is a single string that's the [concatenation](https://en.wikipedia.org/wiki/Concatenation) ("putting together") of all the strings in `an_array`, **except** `a_string` is inserted in between each string in the array.
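As a small illustration of that description, here is `join` applied to a made-up array of words (the words and variable names are hypothetical, and `np.array` stands in for `make_array`):

```python
import numpy as np

# Hypothetical words chosen only for illustration.
words = np.array(['red', 'green', 'blue'])

# The separator appears between elements, never before the first
# element or after the last one.
dashed = '-'.join(words)

# An empty separator simply concatenates the strings.
squashed = ''.join(words)

print(dashed)    # red-green-blue
print(squashed)  # redgreenblue
```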

**Question 3.** Use the array `book_title_words` and the method `join` to make two strings:

1. "Eats, Shoots, and Leaves" (call this one `with_commas`)

2. "Eats Shoots and Leaves" (call this one `without_commas`)

**Hint:** If you're not sure what `join` does, first try just calling, for example, `"foo".join(book_title_words)` .


In [12]:
with_commas = ', '.join(book_title_words) # SOLUTION
without_commas = ' '.join(book_title_words) # SOLUTION

print('with commas: ', with_commas)
print('without commas: ', without_commas)

with commas:  Eats, Shoots, and Leaves
without commas:  Eats Shoots and Leaves


In [13]:
',' in with_commas

True

In [14]:
',' not in without_commas

True

In [15]:
len(with_commas)

24

In [16]:
len(without_commas)

22

In [17]:
with_commas == 'Eats, Shoots, and Leaves'

True

In [18]:
without_commas == 'Eats Shoots and Leaves'

True

## 2. Indexing Arrays

These exercises give you practice accessing individual elements of arrays.  In Python (and in many programming languages), elements are accessed by *index*, so the first element is the element at index 0.

**Note:** If you have previous coding experience, you may be familiar with bracket notation. DO NOT use bracket notation when indexing (e.g. `arr[0]`), as this can yield a different data type than the one we expect, which can cause you to fail an autograder test.

Be sure to refer to the [Python Reference](http://data8.org/fa20/python-reference.html) on the website if you feel stuck!
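The data type difference mentioned in the note can be checked directly. This is a minimal sketch on a made-up array, with `np.array` standing in for `make_array`; the exact NumPy integer type may vary by platform.

```python
import numpy as np

values = np.array([10, 20, 30])   # hypothetical example array

with_item = values.item(0)      # a plain Python int
with_brackets = values[0]       # a NumPy integer type such as numpy.int64

print(type(with_item))
print(type(with_brackets))
```

Both expressions give the value 10, but only `.item` returns a built-in Python number.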

**Question 4.** The cell below creates an array of some numbers.  Set `third_element` to the third element of `some_numbers`.


In [19]:
some_numbers = make_array(-1, -3, -6, -10, -15)

third_element = some_numbers.item(2) # SOLUTION
third_element

-6

In [20]:
type(third_element) == int or type(third_element) == np.int64

True

In [21]:
# It would appear you wrote:
# some_numbers.item(3)
# But the third element has index 2,
# not an index of 3
third_element != -10

True

In [22]:
third_element

-6

**Question 5.** The next cell creates a table that displays some information about the elements of `some_numbers` and their order.  Run the cell to see the partially-completed table, then fill in the missing information (the cells that say "Ellipsis") by assigning `blank_a`, `blank_b`, `blank_c`, and `blank_d` to the correct elements in the table.

**Hint:** Replace the `...` with strings or numbers.


In [23]:
blank_a = 'third' # SOLUTION
blank_b = 'fourth' # SOLUTION
blank_c = 0 # SOLUTION
blank_d = 3 # SOLUTION

# Don't change anything below this line!
elements_of_some_numbers = Table().with_columns(
    "English name for position", make_array("first", "second", blank_a, blank_b, "fifth"),
    "Index",                     make_array(blank_c, 1, 2, blank_d, 4),
    "Element",                   some_numbers)
elements_of_some_numbers

English name for position,Index,Element
first,0,-1
second,1,-3
third,2,-6
fourth,3,-10
fifth,4,-15


In [24]:
elements_of_some_numbers.column(0).item(2)

'third'

In [25]:
elements_of_some_numbers.column(0).item(3)

'fourth'

In [26]:
elements_of_some_numbers.column(1).item(0)

0

In [27]:
elements_of_some_numbers.column(1).item(3)

3

**Question 6.** You'll sometimes want to find the *last* element of an array.  Suppose an array has 142 elements.  What is the index of its last element?


In [28]:
index_of_last_element = 141 # SOLUTION

In [29]:
index_of_last_element

141

More often, you don't know the number of elements in an array, its *length*.  (For example, it might be a large dataset you found on the Internet.)  The function `len` takes a single argument, an array, and returns the `len`gth of that array (an integer).
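Because indices start at 0, the last index is always one less than the length. A short sketch on a made-up array (the names and values are hypothetical, and `np.array` stands in for `make_array`):

```python
import numpy as np

readings = np.array([4, 8, 15, 16, 23, 42])   # hypothetical data

# Indices run from 0 to len(readings) - 1.
last_index = len(readings) - 1
last_value = readings.item(last_index)

print(last_index, last_value)   # 5 42
```

The same two lines work unchanged if `readings` grows or shrinks.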

**Question 7.** The cell below loads an array called `president_birth_years`.  Calling `.column(...)` on a table returns an array of the column specified, in this case the `Birth Year` column of the `president_births` table. The last element in that array is the most recent birth year of any deceased president. Assign that year to `most_recent_birth_year`. Your solution should use both the `len(...)` and `.item(...)` functions, and you should not have to know the number of rows in `president_birth_years` to answer this question. Your code should work no matter how big or small `president_birth_years` might be.


In [30]:
president_birth_years = Table.read_table("data/president_births.csv").column('Birth Year')

most_recent_birth_year = president_birth_years.item( len( president_birth_years ) - 1 ) # SOLUTION
most_recent_birth_year

1917

In [31]:
most_recent_birth_year

1917

**Question 8.** Finally, assign `sum_of_birth_years` to the sum of the first, sixteenth, and last birth year in `president_birth_years`. Your solution should not include the actual years in the calculation, but rather references to the array where those years are stored.


In [32]:
sum_of_birth_years = president_birth_years.item(0) + president_birth_years.item(15) + most_recent_birth_year # SOLUTION
sum_of_birth_years

5457

In [33]:
sum_of_birth_years

5457

## 3. Basic Array Arithmetic

**Question 9.** Multiply the numbers 42, -4224, 424224242, and 250 by 157. Assign each variable below such that `first_product` is assigned to the result of $42 \cdot 157$, `second_product` is assigned to the result of $-4224 \cdot 157$, and so on. 

For this question, **don't** use arrays.


In [34]:
first_product = 42 * 157 # SOLUTION
second_product = -4224 * 157 # SOLUTION
third_product = 424224242 * 157 # SOLUTION
fourth_product = 250 * 157 # SOLUTION
print(first_product, second_product, third_product, fourth_product)

6594 -663168 66603205994 39250


In [35]:
first_product

6594

In [36]:
second_product

-663168

In [37]:
third_product

66603205994

In [38]:
fourth_product

39250

**Question 10.** Now, do the same calculation, but using an array called `numbers` that contains the 4 original numbers (42, -4224, 424224242, and 250) and only a single multiplication (`*`) operator.  Store the 4 results in an array named `products`.


In [39]:
numbers = make_array( 42, -4224, 424224242, 250) # SOLUTION
products = numbers * 157 # SOLUTION
products

array([       6594,     -663168, 66603205994,       39250])

In [40]:
np.allclose( products, np.array([6594, -663168, 66603205994, 39250]))

True

**Question 11.** Oops, we made a typo!  Instead of 157, we wanted to multiply each number by 1577.  Compute the correct products in the cell below using array arithmetic.  Notice that your job is really easy if you previously defined an array containing the 4 numbers.


In [41]:
correct_products = numbers * 1577 # SOLUTION
correct_products

array([       66234,     -6661248, 669001629634,       394250])

In [42]:
np.allclose( correct_products, np.array([66234, -6661248, 669001629634, 394250]))

True

**Question 12.** We've loaded an array of temperatures in the next cell.  Each number is the highest temperature observed on a day at a climate observation station, mostly from the US.  Since they're from the US government agency [NOAA](https://www.noaa.gov), all the temperatures are in Fahrenheit.  Convert them all to Celsius by first subtracting 32 from them, then multiplying the results by $\frac{5}{9}$. Make sure to **ROUND** the final result after converting to Celsius to the nearest integer using the `np.round` function. `np.round` works a lot like the built-in `round` function in Python, but it can operate on arrays.
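The conversion formula can be tried on a few made-up readings before applying it to the full dataset. The values below are hypothetical, and `np.array` stands in for a table column.

```python
import numpy as np

fahrenheit = np.array([32, 212, 50, 99])   # hypothetical readings

# Subtract 32 from every element, then scale every element by 5/9.
celsius = (fahrenheit - 32) * (5 / 9)

# np.round rounds every element of the array in one call.
rounded = np.round(celsius)

print(rounded)
```

Here 32 °F and 212 °F map to 0 and 100, the freezing and boiling points of water, which is a quick sanity check on the formula.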


In [43]:
max_temperatures = Table.read_table("data/temperatures.csv").column("Daily Max Temperature")

celsius_max_temperatures = np.round(( max_temperatures - 32) * (5/9)) # SOLUTION
celsius_max_temperatures

array([ -4.,  31.,  32., ...,  17.,  23.,  16.])

In [44]:
sum( celsius_max_temperatures ) != 356705.0

True

In [45]:
sum(celsius_max_temperatures)

1280677.0

In [46]:
len( celsius_max_temperatures )

65000

In [47]:
celsius_max_temperatures.item(2003)

20.0

**Question 13.** The cell below loads all the *lowest* temperatures from each day (in Fahrenheit).  Compute the size of the daily temperature range for each day.  That is, compute the difference between each daily maximum temperature and the corresponding daily minimum temperature.  **Pay attention to the units and give your answer in Celsius!** Make sure **NOT** to round your answer for this question! Note: Remember that in Question 12, `celsius_max_temperatures` was rounded, so you probably don't want to use that in this question.


In [48]:
min_temperatures = Table.read_table("data/temperatures.csv").column("Daily Min Temperature")

celsius_temperature_ranges = (5/9) * (max_temperatures - min_temperatures) # SOLUTION
celsius_temperature_ranges

array([  6.66666667,  10.        ,  12.22222222, ...,  17.22222222,
        11.66666667,  11.11111111])

In [49]:
np.round(sum(celsius_temperature_ranges), 0)

768487.0

In [50]:
len(celsius_temperature_ranges)

65000

In [51]:
celsius_temperature_ranges.item(1)

10.0

## 4. World Population


The tests from this point on will **not** necessarily tell you whether or not your answers are correct.

The cell below loads a table of estimates of the world population for different years, starting in 1950. The estimates come from the [US Census Bureau website](https://www.census.gov/en.html).

In [52]:
world = Table.read_table("data/world_population.csv").select('Year', 'Population')
world.show(4)

Year,Population
1950,2557628654
1951,2594939877
1952,2636772306
1953,2682053389


The name `population` is assigned to an array of population estimates.

In [53]:
population = world.column('Population')
population

array([2557628654, 2594939877, 2636772306, 2682053389, 2730228104,
       2782098943, 2835299673, 2891349717, 2948137248, 3000716593,
       3043001508, 3083966929, 3140093217, 3209827882, 3281201306,
       3350425793, 3420677923, 3490333715, 3562313822, 3637159050,
       3712697742, 3790326948, 3866568653, 3942096442, 4016608813,
       4089083233, 4160185010, 4232084578, 4304105753, 4379013942,
       4451362735, 4534410125, 4614566561, 4695736743, 4774569391,
       4856462699, 4940571232, 5027200492, 5114557167, 5201440110,
       5288955934, 5371585922, 5456136278, 5538268316, 5618682132,
       5699202985, 5779440593, 5857972543, 5935213248, 6012074922,
       6088571383, 6165219247, 6242016348, 6318590956, 6395699509,
       6473044732, 6551263534, 6629913759, 6709049780, 6788214394,
       6866332358, 6944055583, 7022349283, 7101027895, 7178722893,
       7256490011])

In this question, you will apply some built-in NumPy functions to this array. NumPy is a module that is often used in data science!

<img src="images/array_diff.png" style="width: 600px;"/>

The difference function `np.diff` subtracts each element in an array from the element after it within the array. As a result, the length of the array `np.diff` returns will always be one less than the length of the input array.
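A quick sketch of `np.diff` on a small made-up array (the values are hypothetical):

```python
import numpy as np

values = np.array([3, 7, 12, 10])   # hypothetical example array

# Each output element is (next element) - (current element):
# 7 - 3, 12 - 7, 10 - 12
changes = np.diff(values)

print(changes)                      # [ 4  5 -2]
print(len(values), len(changes))    # 4 3
```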

<img src="images/array_cumsum.png" style="width: 700px;"/>

The cumulative sum function `np.cumsum` outputs an array of partial sums. For example, the third element in the output array corresponds to the sum of the first, second, and third elements.
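A matching sketch of `np.cumsum` on a small made-up array (the values are hypothetical):

```python
import numpy as np

values = np.array([3, 7, 12, 10])   # hypothetical example array

# Partial sums: 3, 3 + 7, 3 + 7 + 12, 3 + 7 + 12 + 10
running_total = np.cumsum(values)

print(running_total)   # [ 3 10 22 32]
```

Unlike `np.diff`, the output of `np.cumsum` has the same length as its input.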

**Note:** This homework aims to get you comfortable with array arithmetic using methods like `np.diff` and `np.cumsum`, but they are not heavily used later in the course.

**Question 14.** Very often in data science, we are interested in understanding how values change with time. Use `np.diff` and `np.max` (or just `max`) to calculate the largest annual change in population between any two consecutive years.


In [54]:
largest_population_change = max( np.diff(population) ) # SOLUTION
largest_population_change

87515824

In [55]:
largest_population_change

87515824

**Question 15.** What do the values in the array produced by the cell below represent? Choose one of the following options, and assign the corresponding number to the name `cumulative_sum_answer`.

**Tip:** Look at the population array and compute the actual population changes for the first couple years of each option.

In [56]:
np.cumsum(np.diff(population))

array([  37311223,   79143652,  124424735,  172599450,  224470289,
        277671019,  333721063,  390508594,  443087939,  485372854,
        526338275,  582464563,  652199228,  723572652,  792797139,
        863049269,  932705061, 1004685168, 1079530396, 1155069088,
       1232698294, 1308939999, 1384467788, 1458980159, 1531454579,
       1602556356, 1674455924, 1746477099, 1821385288, 1893734081,
       1976781471, 2056937907, 2138108089, 2216940737, 2298834045,
       2382942578, 2469571838, 2556928513, 2643811456, 2731327280,
       2813957268, 2898507624, 2980639662, 3061053478, 3141574331,
       3221811939, 3300343889, 3377584594, 3454446268, 3530942729,
       3607590593, 3684387694, 3760962302, 3838070855, 3915416078,
       3993634880, 4072285105, 4151421126, 4230585740, 4308703704,
       4386426929, 4464720629, 4543399241, 4621094239, 4698861357])

1) The total population change between consecutive years, starting at 1951.

2) The total population change between 1950 and each later year, starting at 1951.

3) The total population change between 1950 and each later year, starting inclusively at 1950.
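Following the tip, one way to compare the options is to try the same expression on a tiny made-up series where the arithmetic can be checked by hand. The values and names below are hypothetical.

```python
import numpy as np

series = np.array([100, 103, 110, 108])   # hypothetical yearly values

# Consecutive changes are [3, 7, -2]; their running totals are [3, 10, 8].
running_change = np.cumsum(np.diff(series))

# Change of every value relative to the first one: [0, 3, 10, 8].
from_start = series - series.item(0)

print(running_change)
print(from_start)
```

In this small case the running totals of the consecutive changes match the changes measured from the first value, with the leading 0 left out.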


In [57]:
# Assign cumulative_sum_answer to 1, 2, or 3
cumulative_sum_answer = 2 # SOLUTION

In [58]:
type(cumulative_sum_answer) == int

True

In [59]:
cumulative_sum_answer in [1, 2, 3]

True

In [60]:
# HIDDEN
cumulative_sum_answer

2

## 5. Old Faithful


Old Faithful is a geyser in Yellowstone that erupts every 44 to 125 minutes (according to [Wikipedia](https://en.wikipedia.org/wiki/Old_Faithful)). People are [often told that the geyser erupts every hour](http://yellowstone.net/geysers/old-faithful/), but in fact the waiting time between eruptions is more variable. Let's take a look.

**Question 16.** The first line below assigns `waiting_times` to an array of 272 consecutive waiting times between eruptions, taken from a classic 1938 dataset. Assign the names `shortest`, `longest`, and `average` so that the `print` statement is correct.


In [61]:
waiting_times = Table.read_table('data/old_faithful.csv').column('waiting')

shortest = min(waiting_times) # SOLUTION
longest = max(waiting_times) # SOLUTION
average = np.average(waiting_times) # SOLUTION

print("Old Faithful erupts every", shortest, "to", longest, "minutes and every", average, "minutes on average.")

Old Faithful erupts every 43 to 96 minutes and every 70.8970588235 minutes on average.


In [62]:
shortest <= average <= longest

True

In [63]:
# HIDDEN
shortest

43

In [64]:
# HIDDEN
longest

96

In [65]:
# HIDDEN
np.isclose(average, 70.8970588235)

True

**Question 17.** Assign `biggest_decrease` to the biggest decrease in waiting time between two consecutive eruptions. For example, the third eruption occurred after 74 minutes and the fourth after 62 minutes, so the decrease in waiting time was $62 - 74 = -12$ minutes, for a decrease of 12 minutes.

**Hint 1**: You'll need an array arithmetic function [mentioned in the textbook](https://www.inferentialthinking.com/chapters/05/1/arrays.html#Functions-on-Arrays). You have also seen this function earlier in the homework!

**Hint 2**: We want to return the absolute value of the biggest decrease.
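The difference between the biggest *decrease* and the biggest *change* shows up clearly on a small made-up array. The values and names below are hypothetical.

```python
import numpy as np

waits = np.array([50, 80, 60, 70])   # hypothetical waiting times

steps = np.diff(waits)               # [30, -20, 10]

# The most negative step is the biggest decrease; abs makes it positive.
largest_drop = abs(min(steps))

# Taking abs first would instead find the biggest change in either direction.
largest_change = max(np.abs(steps))

print(largest_drop, largest_change)  # 20 30
```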


In [66]:
biggest_decrease = abs( min( np.diff(waiting_times) ) ) # SOLUTION
biggest_decrease

45

In [67]:
# Hint: If you are getting 47, you may be computing
# the biggest change rather than the biggest decrease
biggest_decrease != 47

True

In [68]:
30 <= biggest_decrease < 47

True

In [69]:
# HIDDEN
biggest_decrease

45

**Question 18.** If you expected Old Faithful to erupt every hour, you would expect to wait a total of `60 * k` minutes to see `k` eruptions. Use the `np.arange` function to create a range of values where each element represents the total expected time (in minutes) one would spend waiting for `k` eruptions to occur, under the assumption that eruptions occur once an hour. Set `expected_wait` to this range of values. You can read about `np.arange` as needed in this [textbook section](https://www.inferentialthinking.com/chapters/05/2/Ranges.html).

Then, set `difference_from_expected` to an array with 272 elements, where the element at index `i` is the absolute difference between the actual and expected total amount of waiting time to see the first `i+1` eruptions. 

**Hint 1**: You'll need to compare a cumulative sum to a range. 

For example, since the first three waiting times are 79, 54, and 74, the total waiting time for 3 eruptions is 79 + 54 + 74 = 207. The expected waiting time for 3 eruptions is 60 * 3 = 180. Therefore, `difference_from_expected.item(2)` should be $|207 - 180| = 27$. 

**Hint 2**: When using the absolute value function, use `np.abs` instead of `abs`, since `numpy` is already loaded in this notebook, and that's what the autograder is going to use.
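The worked example above can be reproduced on just the first three waiting times (79, 54, and 74). This is a sketch with hypothetical variable names; `np.array` stands in for the table column.

```python
import numpy as np

first_three = np.array([79, 54, 74])   # first three waiting times from the text

# Expected total wait for 1, 2, and 3 eruptions at one per hour.
expected_small = np.arange(1, 4) * 60      # [60, 120, 180]

# Actual total wait after each eruption.
actual_small = np.cumsum(first_three)      # [79, 133, 207]

gap_small = np.abs(actual_small - expected_small)

print(gap_small)   # [19 13 27]
```

The last element, 27, matches the value of $|207 - 180|$ computed by hand.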


In [70]:
expected_wait = np.arange(1, 273) * 60 # SOLUTION
difference_from_expected = np.abs( np.cumsum(waiting_times) - expected_wait ) # SOLUTION
difference_from_expected

array([  19,   13,   27,   29,   54,   49,   77,  102,   93,  118,  112,
        136,  154,  141,  164,  156,  158,  182,  174,  193,  184,  171,
        189,  198,  212,  235,  230,  246,  264,  283,  296,  313,  319,
        339,  353,  345,  333,  353,  352,  382,  402,  400,  424,  422,
        435,  458,  462,  455,  477,  476,  491,  521,  515,  535,  529,
        552,  563,  567,  584,  605,  604,  628,  616,  638,  638,  670,
        688,  706,  711,  724,  746,  742,  761,  772,  774,  790,  790,
        808,  824,  847,  862,  884,  894,  899,  912,  940,  956,  976,
        964,  990,  990, 1020, 1010, 1028, 1031, 1043, 1067, 1082, 1073,
       1095, 1097, 1125, 1114, 1137, 1158, 1145, 1169, 1161, 1187, 1208,
       1223, 1222, 1251, 1270, 1269, 1290, 1280, 1305, 1304, 1331, 1324,
       1333, 1350, 1346, 1374, 1395, 1380, 1402, 1397, 1427, 1412, 1435,
       1431, 1460, 1446, 1468, 1459, 1485, 1478, 1497, 1518, 1518, 1540,
       1557, 1573, 1572, 1592, 1581, 1617, 1610, 16

In [71]:
difference_from_expected.size

272

In [72]:
difference_from_expected.item(271) == np.abs( 60 * 272 - sum( waiting_times ) )

True

**Question 20.** Let’s imagine your guess for the next wait time was always just the length of the previous waiting time. If you always guessed the previous waiting time, how big would your error in guessing the waiting times be, on average?

For example, the first three waiting times are 79, 54, and 74. After the first eruption you guess the next eruption will happen after 79 minutes, but it occurs after 54 resulting in an error of $\lvert 79 - 54 \rvert$. For the next eruption you now guess it will take 54 minutes, but it takes 74, so the error is $\lvert 54 - 74 \rvert$. Therefore, the average difference between your guess and the actual time for just the second and third eruption would be $\frac{|79-54|+ |54-74|}{2} = 22.5$.
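The same hand calculation can be written with array functions. The sketch below uses only the first three waiting times, and the variable names are hypothetical.

```python
import numpy as np

first_three = np.array([79, 54, 74])   # first three waiting times from the text

# Each guess is the previous waiting time, so each error is the
# absolute difference between consecutive waiting times.
errors_small = np.abs(np.diff(first_three))   # [25, 20]

average_small = np.average(errors_small)

print(average_small)   # 22.5
```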


In [73]:
average_error = np.average( np.abs( np.diff(waiting_times) ) ) # SOLUTION
average_error

20.520295202952031

In [74]:
15 <= average_error <= 25

True

In [75]:
# HIDDEN
np.isclose( average_error, 20.52029520295203)

True

## 6. Tables


**Question 21.** Suppose you have 4 apples, 3 oranges, and 3 pineapples.  (Perhaps you're using Python to solve a high school Algebra problem.)  Create a table that contains this information.  It should have two columns: `fruit name` and `count`.  Assign the new table to the variable `fruits`.

**Note:** Use lower-case and singular words for the name of each fruit, like `"apple"`.

Your table should look like:

| fruit name | count |
|------------|-------|
| apple      | 4     |
| orange     | 3     |
| pineapple  | 3     |


In [76]:
# Our solution uses 1 statement split over 3 lines. You can write yours on 1 line if you wish.
fruits = Table().with_columns( # SOLUTION
    'fruit name', make_array('apple', 'orange', 'pineapple'), # SOLUTION
    'count', make_array(4, 3, 3) ) # SOLUTION
fruits

fruit name,count
apple,4
orange,3
pineapple,3


In [77]:
fruits.sort(0)

fruit name,count
apple,4
orange,3
pineapple,3


**Question 22.** The file `inventory.csv` contains information about the inventory at a fruit stand.  Each row represents the contents of one box of fruit. Load it as a table named `inventory` using the `Table.read_table()` function. `Table.read_table(...)` takes one argument (data file name in string format) and returns a table.


In [78]:
inventory = Table.read_table('data/inventory.csv') # SOLUTION
inventory

box ID,fruit name,count
53686,kiwi,45
57181,strawberry,123
25274,apple,20
48800,orange,35
26187,strawberry,255
57930,grape,517
52357,strawberry,102
43566,peach,40


In [79]:
inventory.sort(0).column(0).item(0)

25274

**Question 23.** Does each box at the fruit stand contain a different fruit? Set `all_different` to `True` if each box contains a different fruit or to `False` if multiple boxes contain the same fruit.

**Hint:** You don't have to write code to calculate the True/False value for `all_different`. Just look at the `inventory` table and assign `all_different` to either `True` or `False` according to what you can see from the table in answering the question.


In [80]:
all_different = False # SOLUTION
all_different

False

In [81]:
all_different in {True, False}

True

In [82]:
# HIDDEN
all_different == False

True

**Question 24.** The file `sales.csv` contains the number of fruit sold from each box last Saturday.  It has an extra column called "price per fruit (\$)" that's the price *per item of fruit* for fruit in that box.  The rows are in the same order as the `inventory` table.  

Load these data into a table called `sales`.


In [83]:
sales = Table.read_table('data/sales.csv') # SOLUTION
sales

box ID,fruit name,count sold,price per fruit ($)
53686,kiwi,3,0.5
57181,strawberry,101,0.2
25274,apple,0,0.8
48800,orange,35,0.6
26187,strawberry,25,0.15
57930,grape,355,0.06
52357,strawberry,102,0.25
43566,peach,17,0.8


In [84]:
sales.sort(0)

box ID,fruit name,count sold,price per fruit ($)
25274,apple,0,0.8
26187,strawberry,25,0.15
43566,peach,17,0.8
48800,orange,35,0.6
52357,strawberry,102,0.25
53686,kiwi,3,0.5
57181,strawberry,101,0.2
57930,grape,355,0.06


**Question 25.** How many fruits did the store sell in total on that day?


In [85]:
total_fruits_sold = sum( sales.column('count sold' ) ) # SOLUTION
total_fruits_sold

638

In [86]:
# We're looking for the total number of *pieces* sold
# Not the number of kinds of fruit or the number of boxes
total_fruits_sold > 10

True

In [87]:
type(total_fruits_sold) == np.int64

True

In [88]:
# HIDDEN
total_fruits_sold

638

**Question 26.** What was the store's total revenue (the total price of all fruits sold) on that day?

**Hint:** If you're stuck, think first about how you would compute the total revenue from just the grape sales.
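Extending the single-fruit idea to every box at once is a matter of element-wise multiplication followed by a sum. The sketch below uses made-up counts and prices rather than the `sales` table, and all names in it are hypothetical.

```python
import numpy as np

# Hypothetical data: items sold per box and price per item in dollars.
counts_sold = np.array([2, 5, 10])
prices = np.array([0.5, 0.2, 0.1])

# Revenue for each box: count times price, element by element.
revenue_per_box = counts_sold * prices

# Total revenue is the sum over all boxes.
revenue_total = sum(revenue_per_box)

print(revenue_per_box)
print(revenue_total)
```

Each box here brings in about one dollar, so the total is about three dollars. Floating-point arithmetic can introduce tiny rounding differences, which is why the tests in this notebook compare revenue with `np.isclose`.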


In [89]:
total_revenue = sum( sales.column('count sold') * sales.column('price per fruit ($)') ) # SOLUTION
total_revenue

106.84999999999999

In [90]:
# If you're stuck, here's a hint: you want to multiply the count
# sold in each box by the per-item price of fruits in that box.
# You can use element wise multiplication for that.
# Then you want the sum of those products, Use sum().
50 <= total_revenue <= 150

True

In [91]:
# HIDDEN
np.isclose(total_revenue, 106.85)

True

**Question 27.** Make a new table called `remaining_inventory`.  It should have the same rows and columns as `inventory`, except that the amount of fruit sold from each box should be subtracted from that box's count, so that the "count" is the amount of fruit remaining after Saturday. Remember, the rows in `inventory` and `sales` are already in the same order.


In [92]:
remaining_inventory = Table().with_columns( # SOLUTION
    'box ID', inventory.column('box ID'), # SOLUTION
    'fruit name', inventory.column('fruit name'), # SOLUTION
    'count', inventory.column('count') - sales.column('count sold') ) # SOLUTION
remaining_inventory

box ID,fruit name,count
53686,kiwi,42
57181,strawberry,22
25274,apple,20
48800,orange,0
26187,strawberry,230
57930,grape,162
52357,strawberry,0
43566,peach,23


In [93]:
# Your table doesn't have all 3 columns
remaining_inventory.num_columns

3

In [94]:
# You forgot to subtract off the sales
remaining_inventory.column('count').item(0) != 45

True

In [95]:
remaining_inventory.where('fruit name', 'grape')

box ID,fruit name,count
57930,grape,162
