In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("hw06.ipynb")

# Homework 6 – Advanced Table Methods 🍔

## Data 94, Spring 2021

This homework is due on **Sunday, March 21st at 11:59PM. Though the assignment is due after the quiz, you should finish it before the quiz as all of the content on it is in scope.** You must submit the assignment to Gradescope. Submission instructions can be found at the bottom of this notebook. See the [syllabus](http://data94.org/syllabus/#late-policy-and-extensions) for our late submission policy.

In [None]:
# Run this cell.
from datascience import *
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.express as px
import seaborn as sns

## The Data

In this assignment we will explore several 2020 restaurant rankings from [Restaurant Business Online](https://www.restaurantbusinessonline.com) (RBO).

Our data is split across three different CSV files, each of which contains a different set of restaurants and different ranking methodology. Run the cell below to load in our data as tables.

In [None]:
future = Table.read_table('data/Future50.csv')

independents = Table.read_table('data/Independents100.csv')
# This fixes the format of the Sales column; don't worry about it.
independents = independents.with_columns(
    'Sales', independents.apply(int, 'Sales')
)

chains = Table.read_table('data/Top250.csv')

The `future` table contains information about the 50 fastest growing restaurant chains in the US whose yearly sales are between \\$25 and \\$50 million. The table is sorted by percentage change in sales from 2019 to 2020 (`'YOY_Sales'`; YOY stands for "Year-over-Year") in decreasing order. The `'Location'` column refers to the city where the chain is headquartered, not to any specific location.

Run the cell below to take a look at the `future` table, and [click here](https://www.restaurantbusinessonline.com/future-50) to see the ranking on RBO's website.

In [None]:
future

The `independents` table contains information about the 100 highest-grossing independent restaurants in the US, sorted by sales (`'Sales'`) in decreasing order. A restaurant is classified as "independent" if it has fewer than five locations; unlike in the `future` table, the `'City'` column here corresponds to the location of the restaurant. 

Run the cell below to take a look at the `independents` table, and [click here](https://www.restaurantbusinessonline.com/top-100-independents-2019) to see the ranking on RBO's website.

In [None]:
independents

Finally, the `chains` table contains information about the 250 largest restaurant chains in the US, sorted by sales (`'Sales'`) in decreasing order. 

Run the cell below to take a look at the `chains` table, and [click here](https://www.restaurantbusinessonline.com/top-500-chains) to see the ranking on RBO's website.

(Note: Here, sales are measured in millions of dollars, so McDonald's sales value of `40412` really means \$40.4 billion.)

In [None]:
chains

## Question 1 – Chains 🍟

Let's start by asking questions about the `chains` table; we'll take a closer look at our other two tables later.

### Question 1a

The `chains` table has many columns that we aren't going to look at. Below, modify `chains` so that it only has the columns `'Rank'`, `'Restaurant'`, `'Sales'`, `'YOY_Sales'`, and `'Segment_Category'`. (We told you to avoid doing this in lecture, but in this case we are certainly not going to use any of the other columns, so dropping them is fine.) 

<!--
BEGIN QUESTION
name: q1a
points: 1
-->

In [None]:
chains = ...
chains

In [None]:
grader.check("q1a")

### Question 1b

What are the most popular segment categories in `chains`?

Below, assign `segment_counts` to a table with two columns, `'Segment_Category'` and `'count'`. Each row should correspond to one `'Segment_Category'`, and `'count'` should describe the number of restaurants with that `'Segment_Category'` in `chains`. Your table should be sorted by `'count'` in decreasing order, so that the first row corresponds to the most common segment category.

The first five rows of `segment_counts` should look like this:

| Segment_Category       |   count |
|-----------------------:|--------:|
| Varied Menu            |      22 |
| Mexican                |      14 |
| Quick Service & Burger |      13 |
| Burger                 |      10 |
| Family Style           |      10 |

_Hint: Use `.group` and then `.sort`._
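
If the group-then-sort idea feels abstract, here is a small sketch of the same pattern in plain Python. It uses made-up labels rather than values from `chains`, and `Counter` rather than Table methods, so treat it as an illustration of the idea and not as a solution.

```python
from collections import Counter

# Made-up labels for illustration only; these are not values from `chains`.
categories = ['Pizza', 'Burger', 'Pizza', 'Coffee', 'Pizza', 'Burger']

# Counter tallies how many times each label appears (the "group" step).
counts = Counter(categories)

# most_common() lists (label, count) pairs from largest to smallest count
# (the "sort in decreasing order" step).
sorted_counts = counts.most_common()
print(sorted_counts)
```

The most frequent label ends up first, which is exactly what the first row of `segment_counts` should capture.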

<!--
BEGIN QUESTION
name: q1b
points: 2
-->

In [None]:
segment_counts = ...
segment_counts

In [None]:
grader.check("q1b")

### Question 1c

In the previous question, we determined that the segment category that appeared most often was `'Varied Menu'`. It's not immediately obvious what that means!

Below, assign `varied_menu_only` to a table with only the rows in `chains` where the segment category was `'Varied Menu'`. Don't sort or make any other modifications.

<!--
BEGIN QUESTION
name: q1c
points: 1
-->

In [None]:
varied_menu_only = ...
varied_menu_only

In [None]:
grader.check("q1c")

<!-- BEGIN QUESTION -->

### Question 1d

Comment on the restaurants you see in the `varied_menu_only` table above. Touch on the following points:
- What is the overall rank of the highest ranked `'Varied Menu'` restaurant?
- Have you heard of any of these restaurants before? Have you been to any of them?
- Google some of the top few restaurants to get a sense of the kind of food and drink they serve and whether they're fast-food or sit-down. Come up with a single sentence that describes the majority of these restaurants.

We're not looking for anything specific here – we really just want to make sure you're thinking about what the data represents, rather than viewing it as a bunch of numbers.

<!--
BEGIN QUESTION
name: q1d
points: 2
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

### Question 1e

In the last two subparts we looked at the most common segment categories in `chains`. But what if we're interested in determining the segment categories that averaged the most sales?

Below, assign `top_selling_segments` to a table with two columns, `'Segment_Category'` and `'Average Sales'`, such that:
- Each row corresponds to a single segment category, and the `'Average Sales'` column contains the average (mean) sales for each category.
- Only the segment categories with average sales of over \\$2.5 billion are included. (Note: \\$2.5 billion is equivalent to `2500` in our data's units.)
- Rows are sorted by `'Average Sales'` in decreasing order.

The first few rows of `top_selling_segments` should look like this:

| Segment_Category            |   Average Sales |
|----------------------------:|----------------:|
| Quick Service & Coffee Cafe |         7972.25 |
| Quick Service & Burger      |         6106.46 |
| Quick Service & Mexican     |         6071.5  |

_Hint: Our answer uses `.group`, `.sort`, `.where`, `.select`, and `.relabel`. There's no need to fit everything on one line – as long as your final answer is stored in `top_selling_segments`, you can take as many lines as you need and do things in whatever order you like._

<!--
BEGIN QUESTION
name: q1e
points: 3
-->

In [None]:
top_selling_segments = ...
top_selling_segments

In [None]:
grader.check("q1e")

### Question 1f

So far, we haven't really looked at the `'YOY_Sales'` column in `chains`. Remember, the values in `'YOY_Sales'` tell us the percentage change in sales from 2019 to 2020 for each restaurant chain (YOY means "Year-over-Year"); a `'YOY_Sales'` value of 8.6% means the restaurant earned 8.6% more in sales in 2020 than it did in 2019.

In [None]:
# Returns an array of the first five elements in the YOY_Sales column just for us to see
chains.column('YOY_Sales').take(np.arange(5))

Since the values in the `'YOY_Sales'` column are stored as strings, not numbers, we can't reliably sort by `'YOY_Sales'`. (Try it out – if you sort by `'YOY_Sales'` in decreasing order, it will tell you the highest `'YOY_Sales'` any restaurant had was 9.9%, though there are several restaurants with `'YOY_Sales'` values of over 10%.)
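
To see why this happens, here is a small sketch in plain Python with a handful of made-up percentage strings (not taken from `chains`). Strings are compared character by character, so `'9.9%'` beats `'13.0%'` because `'9'` comes after `'1'`.

```python
# Made-up percentage strings for illustration only.
yoy = ['4.9%', '13.0%', '9.9%', '-15.8%', '10.2%']

# Sorting the raw strings compares them one character at a time.
as_strings = sorted(yoy, reverse=True)

# Sorting by the numeric value behind each string gives the order we actually want.
as_numbers = sorted(yoy, key=lambda s: float(s.strip('%')), reverse=True)

print(as_strings[0])   # largest according to string comparison
print(as_numbers[0])   # largest according to numeric comparison
```

Here the string sort puts `'9.9%'` first, while the numeric sort puts `'13.0%'` first.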

In [None]:
# Use this cell for experimentation, if you want!

We're going to try something different. Instead of converting `'YOY_Sales'` into a number, we're going to place each value in `'YOY_Sales'` into one of five categories:

| Growth Category | Year-over-Year Sales (\%) |
| --- | --- |
| `'rapid increase'` | $\geq 10$ |
| `'steady increase'` | $[2.5, 10)$ |
| `'stagnant'` | $[-2.5, 2.5)$ | 
| `'steady decrease'` | $[-10, -2.5)$ |
| `'rapid decrease'` | $< -10$ |

Remember, $[a, b)$ means greater than or equal to $a$ and less than $b$.

We'll break this question into a few more parts, since there's a lot going on.

#### Part 1

First, write a function `str_to_cat` that takes in a percentage string and returns the corresponding growth category according to the above table. Example behavior is shown below.

```py
>>> str_to_cat('-15.8%')
'rapid decrease'

>>> str_to_cat('4.8%')
'steady increase'
```

_Hint: You'll need to use string methods from earlier in the semester in order to convert `pct_str` to a float. To find the correct growth category, don't use five `if` statements: instead, try and use a dictionary. See how you can adapt the grade calculator example from Lecture 19; you can assume that there's no percentage change smaller than -100\%._
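
As a refresher on the dictionary idea, here is a rough sketch of a grade calculator. The cutoffs and the name `letter_grade` are our own assumptions for illustration and may not match the Lecture 19 code exactly. Each key is the smallest score that earns the corresponding letter, and keys are listed from highest to lowest.

```python
# Hypothetical cutoffs: each key is the lowest score that earns that letter.
# Keys are listed from highest to lowest on purpose.
cutoffs = {90: 'A', 80: 'B', 70: 'C', 60: 'D', 0: 'F'}

def letter_grade(score):
    # Dictionaries keep their insertion order (Python 3.7+), so we check
    # the highest cutoff first and return at the first one the score meets.
    for cutoff in cutoffs:
        if score >= cutoff:
            return cutoffs[cutoff]

print(letter_grade(85))
print(letter_grade(90))
```

Because the check is `>=`, a score of exactly 90 lands in the `'A'` bucket and 85 lands in `'B'`; each bucket behaves like a half-open interval such as $[80, 90)$.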

<!--
BEGIN QUESTION
name: q1f-part1
points: 2
-->

In [None]:
def str_to_cat(pct_str):
    ...

In [None]:
grader.check("q1f-part1")

#### Part 2

Now, apply `str_to_cat` to the appropriate column in `chains` to get an array of growth categories. Create a new table called `chains_growth` with all of the columns in `chains` plus a new sixth column, `'Growth Category'`, with the aforementioned array. The first few rows of `chains_growth` should look like this:

|   Rank | Restaurant   |   Sales | YOY_Sales   | Segment_Category            | Growth Category   |
|-------:|-------------:|--------:|------------:|----------------------------:|------------------:|
|      1 | McDonald's   |   40412 | 4.9%        | Quick Service & Burger      | steady increase   |
|      2 | Starbucks    |   21380 | 8.6%        | Quick Service & Coffee Cafe | steady increase   |
|      3 | Chick-fil-A  |   11320 | 13.0%       | Quick Service & Chicken     | rapid increase    |
|      4 | Taco Bell    |   11293 | 9.0%        | Quick Service & Mexican     | steady increase   |
|      5 | Burger King  |   10204 | 2.7%        | Quick Service & Burger      | steady increase   |

`chains` itself should not be modified!

<!--
BEGIN QUESTION
name: q1f-part2
points: 2
-->

In [None]:
chains_growth = ...
chains_growth

In [None]:
grader.check("q1f-part2")

### Question 1g

Now that our table has two categorical columns – `'Segment_Category'` and `'Growth Category'` – we can pivot it! Remember, pivoting is an alternative to grouping by two columns, which we can also do.
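
To make the relationship between the two concrete, here is a sketch in plain Python using made-up (segment, growth) pairs. Grouping by two columns gives one entry per pair that actually occurs; pivoting rearranges those same counts into a grid and fills in 0 for combinations that never occur.

```python
from collections import Counter

# Made-up (segment, growth) pairs for illustration only.
rows = [
    ('Burger', 'stagnant'),
    ('Burger', 'rapid increase'),
    ('Pizza', 'stagnant'),
    ('Burger', 'stagnant'),
]

# "Group by two columns": one count per pair that appears in the data.
pair_counts = Counter(rows)

# "Pivot": one row per segment, one column per growth category.
segments = sorted({segment for segment, _ in rows})
growths = sorted({growth for _, growth in rows})
pivoted = {s: [pair_counts[(s, g)] for g in growths] for s in segments}

print(growths)
print(pivoted)
```

Note that the growth categories come out in alphabetical order here, since we sorted them.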

In [None]:
# Run this cell; you don't need to change anything.
# If you see a warning, ignore it.

chains_growth.group(['Segment_Category', 'Growth Category'])

If we were to pivot and specify that we want `'Segment_Category'` to make up the rows in our new table, we'd have 48 rows in the pivoted table, since there are 48 unique segment categories.

In [None]:
# Run this cell; you don't need to change anything.

chains_growth.group('Segment_Category').num_rows

It would be hard to gain anything meaningful from a pivoted table with 48 rows, so instead we're going to filter down to just a subset of segment categories and then pivot.

Again, we'll break this problem into a few subparts.

#### Part 1

Below, assign `common_chains` to a table with the same labels as `chains_growth`, but only with chains whose segment category is shared by at least 9 other chains (i.e. only the segment categories that have 10 or more restaurants). The first five rows of `common_chains` should look like this:

|   Rank | Restaurant      |   Sales | YOY_Sales   | Segment_Category       | Growth Category   |
|-------:|----------------:|--------:|------------:|-----------------------:|------------------:|
|      1 | McDonald's      |   40412 | 4.9%        | Quick Service & Burger | steady increase   |
|      5 | Burger King     |   10204 | 2.7%        | Quick Service & Burger | steady increase   |
|      7 | Wendy's         |    9762 | 4.2%        | Quick Service & Burger | steady increase   |
|     13 | Sonic Drive-In  |    4687 | 4.6%        | Quick Service & Burger | steady increase   |
|     24 | Jack in the Box |    3504 | 1.1%        | Quick Service & Burger | stagnant          |

To do this, you may want to:
1. Create an array called `common_segments` that contains the names of the segment categories with 10 or more restaurants. You can create this array by grouping, filtering, and calling `.column` on `chains_growth`. (This array will only have 6 elements.)
2. Filter `chains_growth` using `are.contained_in` and `common_segments`.

Note: This is just one of several possible approaches. We will not use the array `common_segments` when grading your work, so if you want to take another approach that's fine.

Regardless, `chains_growth` should not be modified!

<!--
BEGIN QUESTION
name: q1g-part1
points: 2
-->

In [None]:
common_segments = ...
common_chains = ...

common_chains

In [None]:
grader.check("q1g-part1")

#### Part 2

Now, pivot `common_chains` to create a new table, `common_chains_pivoted`, with a row for each segment category and a column for each growth category. The entries in `common_chains_pivoted` should describe the number of chains with a given combination of segment category and growth category. The first few rows of `common_chains_pivoted` should look like this:

| Segment_Category       |   rapid decrease |   steady decrease |   stagnant |   steady increase |   rapid increase |
|-----------------------:|-----------------:|------------------:|-----------:|------------------:|-----------------:|
| Burger                 |                1 |                 3 |          3 |                 2 |                1 |
| Family Style           |                1 |                 4 |          2 |                 1 |                2 |
| Italian/Pizza          |                1 |                 3 |          2 |                 4 |                0 |


An issue you will face is that the columns of your pivoted table will be in alphabetical order by default; you must fix this so that they are in the order shown above. Use `.select` to help you here; try and see if you can do this using column indexes instead of column labels!

<!--
BEGIN QUESTION
name: q1g-part2
points: 2
-->

In [None]:
common_chains_pivoted = ...
common_chains_pivoted

In [None]:
grader.check("q1g-part2")

### Question 1h

After you've completed Question 1g, run the cell below to see a heatmap.

In [None]:
# Run this cell.
sns.heatmap(common_chains_pivoted.to_df().set_index('Segment_Category'), cmap="YlGnBu");

<!-- BEGIN QUESTION -->

This heatmap is a visual depiction of the `common_chains_pivoted` table that you created in the last subpart. Darker colors represent larger values, as per the legend on the right. (If you have questions about how to interpret this plot, don't hesitate to ask!) 

In the cell below, comment on what you see in the heatmap above. Touch on the following points:
- Which column has the darkest values? What does that tell you about the overall growth trend for chains in the last year?
- Did many restaurants grow or shrink rapidly? Or were most changes gradual?
- Which combination of segment category and growth category was the most common?

<!--
BEGIN QUESTION
name: q1h
points: 2
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



## Question 2 – Cities 🌆

Awesome! Now that we've gotten a feel for the `chains` dataset, let's move on to the `future` and `independents` datasets. In this section, we'll focus on identifying "hot food cities", since both datasets give us locations. 

Remember, the `future` table contains information about the fastest growing chains with sales between \\$25 and \\$50 million – there's no overlap between the chains in `future` and the chains in `chains` since the "smallest" chain in `chains` made over \\$125 million last year. In `future` we're given the locations where the chains are headquartered. In the `independents` table, the restaurants are not chains, so we're given their actual locations.

### Question 2a

We'll start by looking at the `independents` table. Run the following cell and scroll through the output to get a feel for the cities that appear in the table.

In [None]:
independents.show()

Some cities appear many times, like `'New York'` and `'Washington'`, and some appear only once, like `'Bal Harbour'`.

Below, assign `city_checks` to a table with two columns, `'City'` and `'Average Check median'`, containing the median `'Average Check'` in all cities with at least 3 restaurants in the dataset, sorted by median `'Average Check'` in decreasing order. The first few rows of `city_checks` should look like this:

| City          |   Average Check median |
|--------------:|-----------------------:|
| Las Vegas     |                     99 |
| Miami         |                     98 |
| New York      |                     84 |

Unlike in Question 1, we're not going to give you any guidance, save for the fact that this task is very similar to something you already did in Question 1. As always, take as many lines as you need.

<!--
BEGIN QUESTION
name: q2a
points: 3
-->

In [None]:
city_checks = ...
city_checks

In [None]:
grader.check("q2a")

### Question 2b

Let's now take a look at the `future` table.

In [None]:
future

We're eventually going to want to join `independents` and `future` by city, but right now we can't do that, since `future` does not have a column with just the name of the city in it. `future`'s `'Location'` column also includes the state's name.

Below, create a table `future_with_city` with the `'Rank'`, `'Restaurant'`, `'Sales'`, and `'YOY_Sales'` columns from `future` but with an additional column, `'City'`, that contains the name of the city in which the restaurant is headquartered. The first few rows of `future_with_city` are shown below; the order of the columns in `future_with_city` must match the output below.

|   Rank | Restaurant   | City             |   Sales | YOY_Sales   |
|-------:|-------------:|-----------------:|--------:|------------:|
|      1 | Evergreens   | Seattle          |      24 | 130.5%      |
|      2 | Clean Juice  | Charlotte        |      44 | 121.9%      |
|      3 | Slapfish     | Huntington Beach |      21 | 81.0%       |
|      4 | Clean Eatz   | Wilmington       |      25 | 79.7%       |
|      5 | Pokeworks    | Irvine           |      49 | 77.1%       |

This involves calling `.apply` with a function that you write on your own. Your solution will certainly take more than one line. Work one step at a time!

<!--
BEGIN QUESTION
name: q2b
points: 3
-->

In [None]:
future_with_city = ...
future_with_city

In [None]:
grader.check("q2b")

### Question 2c

We will say a "hot food city" is a city with at least one restaurant in the Independents 100 (`independents`) and at least one restaurant in the Future 50 (`future_with_city`).

Below, create a table `hot_cities` with three columns – `'City'`, `'Independents 100'`, and `'Future 50'`. Each row should correspond to one city, and the values in the latter two columns should contain the number of restaurants in that city that are on the `'Independents 100'` and `'Future 50'` lists, respectively. The first few rows of `hot_cities` should look like this:

| City          |   Independents 100 |   Future 50 |
|--------------:|-------------------:|------------:|
| Atlanta       |                  2 |           1 |
| Denver        |                  1 |           1 |
| Los Angeles   |                  1 |           1 |
| New York      |                 21 |           8 |

Note: This is the first time in this assignment that you're required to use `.join`, so here's some advice:
- Start by creating two tables, `independents_counts` and `future_with_city_counts`. `independents_counts` should contain the name of each city and the number of restaurants in the Independents 100 ranking in that city; `future_with_city_counts` should contain the same information for the Future 50 ranking. You've done this several times throughout this assignment; each of these two tables can be created with a single method.
- Join the two aforementioned tables on the appropriate column and relabel. Make sure that you relabel the columns correctly – the order that you join the two tables matters!
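
To see why a join fits the definition of a "hot food city", here is a sketch in plain Python with made-up city names and counts (not values from our tables). `.join` keeps only the rows whose value in the shared column appears in both tables, which we mimic below with two dictionaries.

```python
# Made-up counts for illustration only.
independents_by_city = {'Springfield': 5, 'Shelbyville': 2, 'Ogdenville': 1}
future_by_city = {'Springfield': 3, 'Ogdenville': 1, 'Capital City': 4}

# Keep only the cities that appear in BOTH dictionaries.
# The first number comes from the "left" dictionary, the second from the "right".
joined = {
    city: (independents_by_city[city], future_by_city[city])
    for city in independents_by_city
    if city in future_by_city
}

print(joined)
```

`'Shelbyville'` and `'Capital City'` each appear on only one list, so they drop out. The order of the two counts depends on which table is on the left, which is why relabeling carefully matters.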

<!--
BEGIN QUESTION
name: q2c
points: 3
-->

In [None]:
independents_counts = ...
future_with_city_counts = ...
hot_cities = independents_counts.join('City', future_with_city_counts) \
                                ...
hot_cities

In [None]:
grader.check("q2c")

### Fun Demo

Run the following cell to see a plot that visualizes `hot_cities`.

In [None]:
# Run this cell.
import pandas as pd
t1 = hot_cities.to_df().set_index('City').iloc[:, [0]]
t1['Category'] = 'Independents 100'
t1.columns = ['Count', 'Category']
t2 = hot_cities.to_df().set_index('City').iloc[:, [1]]
t2['Category'] = 'Future 50'
t2.columns = ['Count', 'Category']
plt.figure(figsize = (8, 5))
sns.barplot(data = pd.concat([t1, t2]).reset_index(), x = 'City', y = 'Count', hue = 'Category', palette='Set2');

Nice work! What's the hottest food city, according to this bar plot? There's no need to write your answer anywhere; this is just something to think about.

## Question 3 – Stars ⭐️

So far, we haven't looked at the ratings for each restaurant – and that's because none of the three tables we have right now contain ratings.

However, there's a fourth CSV file we have access to that can provide us with Yelp ratings for most – but not all – restaurants in `independents`.

### Question 3a

In the cell below, load in the CSV `'data/independents_ratings.csv'` as a table and assign it to the variable name `ratings`.

<!--
BEGIN QUESTION
name: q3a
points: 1
-->

In [None]:
ratings = ...
ratings

In [None]:
grader.check("q3a")

### Question 3b

It's worth thinking about why we have to include the `'City'` column in the above table – isn't it enough to just have the name of the restaurant and its rating?

Well, no – not if there are multiple restaurants of the same name! Below, assign `duplicate_restaurants` to an array containing the names of restaurants in `independents` that appear more than once. You don't need to sort them in any particular order.

<!--
BEGIN QUESTION
name: q3b
points: 2
-->

In [None]:
duplicate_restaurants = ...
duplicate_restaurants

In [None]:
grader.check("q3b")

### Question 3c

To make clear why the above is a problem, consider the following:

In [None]:
# Run this cell.
independents.where('Restaurant', are.contained_in(duplicate_restaurants))

Above, we have a table with the rows for each restaurant whose name is not unique. There are 8 rows in this table. Look at what happens if we try to join this table with `ratings` on just `'Restaurant'`, **without using the city column**:

In [None]:
# Run this cell.
independents.where('Restaurant', are.contained_in(duplicate_restaurants)) \
            .join('Restaurant', ratings) \
            .show()

We want the table that results from this join to have at most 8 rows, corresponding to each of the 8 rows in the table we saw before. (There may be fewer than 8 since we don't have ratings for every single restaurant.) However, it has 19 rows, and many of them don't make sense! For example, the second-to-last row has `'City'` as `'Las Vegas'` but `'City_2'` as `'New York'`, meaning that the row clearly doesn't correspond to a real restaurant.

There's a solution. Instead of joining `independents` and `ratings` on just `'Restaurant'`, we can join on both `'Restaurant'` and `'City'`. This combines rows in `independents` and `ratings` where the `'Restaurant'` is the same **and** the `'City'` is the same, avoiding the issue above where we have rows with multiple cities.
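
Here is a sketch of both situations in plain Python, using a made-up restaurant name that appears in two cities. Matching on the name alone pairs every copy with every rating; matching on the name and the city together pairs each restaurant with only its own rating.

```python
# Made-up rows for illustration only: (restaurant, city) and (restaurant, city, rating).
independents_rows = [("Joe's", 'New York'), ("Joe's", 'Las Vegas')]
ratings_rows = [("Joe's", 'New York', 4.0), ("Joe's", 'Las Vegas', 3.5)]

# Match on restaurant name only: every copy pairs with every rating.
by_name = [
    (name, city, rating_city, rating)
    for (name, city) in independents_rows
    for (rating_name, rating_city, rating) in ratings_rows
    if name == rating_name
]

# Match on restaurant name AND city: each restaurant gets only its own rating.
by_name_and_city = [
    (name, city, rating)
    for (name, city) in independents_rows
    for (rating_name, rating_city, rating) in ratings_rows
    if (name, city) == (rating_name, rating_city)
]

print(len(by_name), len(by_name_and_city))
```

Matching on the name alone produces 4 rows from 2 restaurants, including mismatched city pairs; matching on both columns produces the 2 rows we want.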

Below, assign `independents_with_ratings` to the result of joining `independents` and `ratings` on both `'Restaurant'` and `'City'`. To join on multiple columns, pass a list as the first argument to `join` rather than just a string. After joining, re-order the columns so that the column order is the same as in `independents`, just with `'Rating'` attached on the end, and sort the table so that `'Rank'` is in increasing order.

The first few rows of your result should look like this (notice there is no row with rank 3 since `ratings` doesn't have a rating for that restaurant):

|   Rank | Restaurant                          |    Sales |   Average Check | City     | State   |   Meals Served |   Rating |
|-------:|------------------------------------:|---------:|----------------:|---------:|--------:|---------------:|---------:|
|      1 | Carmine's (Times Square)            | 39080335 |              40 | New York | N.Y.    |         469803 |      4   |
|      2 | The Boathouse Orlando               | 35218364 |              43 | Orlando  | Fla.    |         820819 |      4   |
|      4 | LAVO Italian Restaurant & Nightclub | 26916180 |              90 | New York | N.Y.    |         198500 |      2.5 |
|      5 | Bryant Park Grill & Cafe            | 26900000 |              62 | New York | N.Y.    |         403000 |      3.5 |
|      6 | Gibsons Bar & Steakhouse            | 25409952 |              80 | Chicago  | Ill.    |         348567 |      4   |

_Hint: The `join` part of this problem is the only part that is new, and we tell you exactly how to do it above. Also, to select the columns in the right order, the snippet `list(independents.labels) + ['Rating']` might be helpful._

<!--
BEGIN QUESTION
name: q3c
points: 2
-->

In [None]:
independents_with_ratings = independents.join(['Restaurant', 'City'], ratings) \
                                        .select(list(independents.labels) + ['Rating']) \
                                        ...
independents_with_ratings

In [None]:
grader.check("q3c")

### Question 3d

Finally, assign `city_ratings` to a table where each column corresponds to a star rating and each row corresponds to a city. Each entry in `city_ratings` should be the median `'Average Check'` for a given combination of star rating and city. Only include the cities that appear in the `city_checks` table from Question 2a.

The first few rows of `city_ratings` should look like this:

| City          |   2.5 |   3.0 |   3.5 |   4.0 |   4.5 |
|--------------:|------:|------:|------:|------:|------:|
| Chicago       |     0 |     0 |  83.5 |    80 |    95 |
| Las Vegas     |     0 |     0 |  97   |   101 |    92 |
| Miami         |     0 |     0 |  88   |    98 |     0 |

As a reminder, the above says that the median `'Average Check'` price at 3.5 star restaurants in Chicago is \\$83.50.
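
You might wonder how a value like 83.5 can show up when every `'Average Check'` is a whole number. With an even number of values, the median is the mean of the two middle values. Here is a quick check with made-up check amounts; it uses Python's built-in `statistics.median` in place of `np.median`, which treats this case the same way.

```python
from statistics import median

# Made-up check amounts for illustration only.
two_checks = [80, 87]
three_checks = [80, 95, 101]

# Even count: the median is the mean of the two middle values.
print(median(two_checks))

# Odd count: the median is the single middle value.
print(median(three_checks))
```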

<!--
BEGIN QUESTION
name: q3d
points: 3
-->

In [None]:
city_ratings = independents_with_ratings.pivot('Rating', 'City', 'Average Check', np.median) \
                                        ...

city_ratings

In [None]:
grader.check("q3d")

What trends do you notice in `city_ratings`? Again, there's nowhere you need to write your answer; this is just something for you to think about.

That's it – you should now be well-versed in manipulating tables to gain insight about real data. 🍔

# Done!

Congrats! You've finished another Data 94 homework assignment!

To submit your work, follow the steps outlined on Ed.

The point breakdown for this assignment is given in the table below:

| **Category** | Points |
| --- | --- |
| Autograder | 32 |
| Written | 4 |
| **Total** | 36 |

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export()