In [46]:
# Initialize Otter
import otter
grader = otter.Notebook("lab.ipynb")

# Lab 1 – Python, NumPy, and Pandas

## DSC 80, Fall 2024

### Due Date: Friday, October 4th at 11:59PM

## Instructions

Welcome to the first assignment in DSC 80 this quarter!

Much like in DSC 10, this Jupyter Notebook contains the statements of the problems and provides code and Markdown cells to display your answers to the problems. Unlike DSC 10, the notebook is *only* for displaying a readable version of your final answers. The coding will be done in an accompanying `lab.py` file that is imported into the current notebook, and **you will only submit that `lab.py` file**, not this notebook!

Some additional guidelines:
- **Unlike in DSC 10, labs will have both public tests and hidden tests.** The bulk of your grade will come from your scores on hidden tests, which you will only see on Gradescope after the assignment deadline.
- **Do not change the function names in the `lab.py` file!** The functions in the `lab.py` file are how your assignment is graded, and they are graded by their name. If you changed something you weren't supposed to, you can find the original code in the [course GitHub repository](https://github.com/dsc-courses/dsc80-2024-fa).
- Notebooks are nice for testing and experimenting with different implementations before designing your function in your `lab.py` file. You can write code here, but make sure that all of your real work is in the `lab.py` file, since that's all you're submitting.
- You are encouraged to write your own additional helper functions to solve the lab, as long as they also end up in `lab.py`.

**To ensure that all of the work you want to submit is in `lab.py`, we've included a script named `lab-validation.py` in the lab folder. You shouldn't edit it, but instead, you should call it from the command line (e.g. the Terminal) to test your work.** More details on its usage are given at the bottom of this notebook.

**Importing code from `lab.py`**:

* Below, we import the `.py` file that's contained in the same directory as this notebook.
* We use the `autoreload` notebook extension to make changes to our `lab.py` file immediately available in our notebook. Without this extension, we would need to restart the notebook kernel to see any changes to `lab.py` in the notebook.
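    - `autoreload` is necessary because Python runs a module's code only the first time it is imported in a session. The resulting module object is cached (in `sys.modules`), so subsequent imports of `lab` simply reuse it instead of re-reading `lab.py`. (Python also saves compiled bytecode in the `__pycache__` directory, but that cache is refreshed automatically when the source file changes.)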
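The module caching that makes `autoreload` useful can be seen with the standard library alone. The sketch below writes a throwaway module (the name `demo_mod` is hypothetical), imports it, edits it on disk, and shows that a second `import` returns the cached module while `importlib.reload` re-executes the new source. `%autoreload 2` automates roughly this kind of reload for you before each cell runs.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module, `demo_mod` (hypothetical name), into a temporary directory.
tmp_dir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp_dir))
module_path = tmp_dir / 'demo_mod.py'
module_path.write_text('VALUE = 1\n')
importlib.invalidate_caches()  # let the import system notice the newly created file

import demo_mod
first = demo_mod.VALUE         # 1

# Change the source on disk (different length, so stale bytecode can't be reused).
module_path.write_text('VALUE = 100\n')

import demo_mod                # no re-execution: returns the module cached in sys.modules
second = demo_mod.VALUE        # still 1

importlib.reload(demo_mod)     # re-executes demo_mod.py and updates the module object
third = demo_mod.VALUE         # 100

print(first, second, third)    # 1 1 100
```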

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
from lab import *

In [4]:
from pathlib import Path
import io
import pandas as pd
import numpy as np

### Infrastructure Summary

Run the following cell to see a [video 🎥](https://www.loom.com/share/0ea254b85b2745e59322b5e5a8692e91?sid=b77c5c2d-0c24-40fb-8cfc-8574d49d9019) that summarizes the above information and walks you through how to
- set up your programming environment (see the instructions in [Tech Support](https://dsc80.com/tech_support) for more details),
- access assignments,
- work on and test assignments, and
- submit assignments.

The video is also linked on the [Resources tab of the course website](https://dsc80.com/resources).

In [5]:
from IPython.display import IFrame
IFrame(src="https://www.loom.com/embed/0ea254b85b2745e59322b5e5a8692e91", width=750, height=500)

Let's get started! 🎉

## Part 1: Python Basics 🐍

### Question 0 – Consecutive Integers

Complete the implementation of the function `consecutive_ints`, which takes in a possibly empty list of integers (`ints`) and returns `True` if there exist two adjacent list elements that are consecutive integers and `False` otherwise.

For example, `consecutive_ints([5, 3, 6, 4, 9, 8])` should evaluate to `True`, since `9` and `8` are adjacent in the list and are consecutive integers. On the other hand, `consecutive_ints([1, 3, 5, 7, 9])` should evaluate to `False`.

***Note***: If you look at `lab.py`, you'll notice that the solution to this problem is already there. This question is done for you to show you what a completed homework problem looks like.

In [6]:
# The cells below are here for you to write scratch work in. 
# You should write the code for your answer in `lab.py`, not here.

In [7]:
consecutive_ints([5, 3, 6, 4, 9, 8])

True

In [8]:
consecutive_ints([1, 3, 5, 7, 9])

False

To run the public tests on your code for a given question, run the cell containing a call to `grader.check` that immediately follows it. 

Remember, your grade will primarily be determined by hidden tests, which are **not** run when you run `grader.check`, so it's important to test your functions extensively on your own by calling them on different inputs. Do they work for edge cases? Real-world data is **very messy**, and you should expect your data processing code to break without thorough testing!

You can write custom tests either by calling your functions on different inputs here in the notebook, or by writing doctests in `lab.py`, as you did in DSC 20.
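As an illustration of the doctest style, here is one possible way to write `consecutive_ints` with its examples embedded in the docstring. It is a sketch and is not necessarily identical to the version already provided in `lab.py`. Running `python -m doctest -v lab.py` from the command line (or calling `doctest.testmod()` as below) executes every `>>>` example and reports any whose output doesn't match.

```python
import doctest


def consecutive_ints(ints):
    """
    Return True if any two adjacent elements of ints differ by exactly 1.

    >>> consecutive_ints([5, 3, 6, 4, 9, 8])
    True
    >>> consecutive_ints([1, 3, 5, 7, 9])
    False
    >>> consecutive_ints([])
    False
    """
    # zip pairs each element with its right neighbor; an empty or
    # single-element list produces no pairs, so any(...) is False.
    return any(abs(a - b) == 1 for a, b in zip(ints, ints[1:]))


if __name__ == '__main__':
    doctest.testmod()  # prints nothing when every example passes
```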

In [9]:
grader.check("q0")

### Question 1 – Median vs. Mean

Complete the implementation of the function `median_vs_mean`, which takes in a non-empty list of numbers (`nums`) and returns `True` if the median of the list is less than or equal to the mean of the list and `False` otherwise.

Recall that if a list has even length, the median is the mean of the middle two elements (after sorting).

***Note:*** In this question, you may only use built-in functions and methods in Python. You should not use `numpy` or `pandas` at all, nor should you import any additional packages.
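A minimal sketch that respects the note by using only built-ins (`sorted`, `len`, `sum`); it is one possible approach, not the reference solution.

```python
def median_vs_mean(nums):
    s = sorted(nums)          # sorting puts the middle element(s) in the middle
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        median = s[mid]
    else:
        # Even length: average the two middle elements.
        median = (s[mid - 1] + s[mid]) / 2
    return median <= sum(s) / n
```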

In [10]:
median_vs_mean([5, 3, 6, 4, 9])

True

In [11]:
median_vs_mean([5, 3, 6, 4, 0, 0])


False

In [12]:
grader.check("q1")

## Part 2: Strings and Files 🧵

The following questions will familiarize you with the basics of working with strings and reading data from files. Remember that by default, data from files are stored as strings in Python.

### Question 2 – $n$ Prefixes

Complete the implementation of the function `n_prefixes`, which takes a string `s` and a positive integer `n`. It returns a string containing the first `n` consecutive prefixes of `s` in reverse order.

For example, let's suppose `s` is the string `'Billy!'` and `n` is `4`. The consecutive prefixes of `'Billy!'` are:
- `'B'`
- `'Bi'`
- `'Bil'`
- `'Bill'`
- `'Billy'`
- `'Billy!'`

The first 4 of these are `'B'`, `'Bi'`, `'Bil'`, and `'Bill'`. If we combine these 4 in reverse order, we get `'BillBilBiB'`, which is what `n_prefixes('Billy!', 4)` should return. As another example, `n_prefixes('Marina', 3)` should return `'MarMaM'`. **You may assume that `n` is no larger than the length of `s`.**

***Hint:*** Recall that [strings may be sliced](https://docs.python.org/3/tutorial/introduction.html#strings), like lists.
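Following the hint, `s[:i]` is the prefix of length `i`, so one possible sketch simply counts `i` down from `n` to 1 and joins the slices:

```python
def n_prefixes(s, n):
    # range(n, 0, -1) yields n, n-1, ..., 1, giving the prefixes longest first.
    return ''.join(s[:i] for i in range(n, 0, -1))


print(n_prefixes('Billy!', 4))  # BillBilBiB
```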

In [13]:
n_prefixes('Billy!', 4)

'BillBilBiB'

In [14]:
n_prefixes('Marina', 3)

'MarMaM'

In [15]:
grader.check("q2")

### Question 3 – Exploded Numbers 💣

Complete the implementation of the function `exploded_numbers`, which takes in a list of integers (`ints`) and a non-negative integer (`n`) and **returns a list of strings**, one per element of `ints`. Each string contains that number "exploded" by `n` numbers in both directions (from the number minus `n` to the number plus `n`), separated by spaces. Each integer should be [zero padded](https://www.tutorialspoint.com/python/string_zfill.htm) so that all outputted integers have the same length.

For example, consider `exploded_numbers([3, 8, 15], 2)`.
- If we explode 3 by 2 numbers in both directions, we get 1, 2, 3, 4, 5.
- If we explode 8 by 2 numbers in both directions, we get 6, 7, 8, 9, 10.
- If we explode 15 by 2 numbers in both directions, we get 13, 14, 15, 16, 17.

The longest length of any of the exploded numbers above is 2, so all of the outputted integers should have length 2.

- The string corresponding to 3 in the input is `'01 02 03 04 05'`.
- The string corresponding to 8 in the input is `'06 07 08 09 10'`.
- The string corresponding to 15 in the input is `'13 14 15 16 17'`.

So, `exploded_numbers([3, 8, 15], 2)` should return `['01 02 03 04 05', '06 07 08 09 10', '13 14 15 16 17']`. 

As another example, `exploded_numbers([9, 99], 3)` should return `['006 007 008 009 010 011 012', '096 097 098 099 100 101 102']`.

***Note***: You can assume that negative numbers will never be encountered. That is, when testing your code, we will never explode a number so much that it becomes negative.
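A minimal sketch, assuming `ints` is non-empty (the spec doesn't say what to return for an empty list). Since no exploded number is negative, the one with the most digits is `max(ints) + n`, which sets the padding width; `str.zfill` then adds the leading zeros.

```python
def exploded_numbers(ints, n):
    # Width of the largest exploded number determines the padding for all of them.
    width = len(str(max(ints) + n))
    return [
        ' '.join(str(k).zfill(width) for k in range(x - n, x + n + 1))
        for x in ints
    ]


print(exploded_numbers([3, 8, 15], 2))
```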

In [16]:
exploded_numbers([3, 8, 15], 2)

['01 02 03 04 05', '06 07 08 09 10', '13 14 15 16 17']

In [17]:
exploded_numbers([9, 99], 3)

['006 007 008 009 010 011 012', '096 097 098 099 100 101 102']

In [18]:
exploded_numbers([2, 10], 2)
# ['00 01 02 03 04', '08 09 10 11 12']

['00 01 02 03 04', '08 09 10 11 12']

In [19]:
grader.check("q3")

### Question 4 – Reading Files

[Recall](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files) that the built-in function `open` takes in a file path and returns *a file object* (sometimes called a *file handle*). Below are a few properties of file objects:

* `open(path)` opens the file at location `path` for reading.
* The file object returned by `open(path)` is an *iterable*; iterating over it yields successive lines of the file (each line keeps its trailing newline, if it has one).
* Once you're done with a file object, it should be closed to avoid leaking resources. To ensure a file is closed when you're done, you should use a *context manager* as follows:
```py
with open(path) as fh:
    for line in fh:
        process_line(line)
```
* To read the entire file into a string, use the `read` method:
```py
with open(path) as fh:
    s = fh.read()
```

However, when reading an entire file into memory, be careful that the file isn't too big! *You should avoid this whenever possible!*

Complete the implementation of the function `last_chars`, which takes in a file object (`fh`) and returns a string consisting of the last character of each line. Note that you don't need to use `open` at all; the argument given to you is a file object, not a file path.

***Note:*** A newline (`'\n'`) is the "delimiter" of the lines of a file, and doesn't count as part of the line (as the tests imply). Every other character is part of the line. For more info on this, see the Wikipedia section on [how newlines are interpreted](https://en.wikipedia.org/wiki/Newline#Interpretation), in which a file is treated as a sequence of newline-delimited lines.

If your implementation is correct, you should see `'hrg'` when running the cell below:

In [20]:
# You'll see the Path(...) / syntax a lot.
# It creates the correct path to your file, 
# whether you're using Windows, macOS, or Linux.
# (Note that macOS and Linux use / to denote separate folders in paths,
# while Windows uses \.)

fp = Path('data') / 'chars.txt'
with open(fp) as fh:
    result = last_chars(fh)
result

'hrg'
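One possible sketch of `last_chars`. Because it only iterates over `fh`, it works with any file object, including the in-memory `io.StringIO` used in the usage line. It assumes that an empty line contributes no character, a case the description above doesn't address.

```python
import io


def last_chars(fh):
    chars = []
    for line in fh:
        line = line.rstrip('\n')  # the newline delimits lines and isn't part of them
        if line:                  # assumption: an empty line contributes nothing
            chars.append(line[-1])
    return ''.join(chars)


# Usage with an in-memory file object (made-up contents):
print(last_chars(io.StringIO('ab\ncd\nef')))  # bdf
```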

In [21]:
grader.check("q4")

## Part 3: `numpy` exercises 🥧

For a refresher on `numpy` and arrays, refer to the relevant section of the [DSC 10 course notes](https://notes.dsc10.com/02-data_sets/arrays.html).

### Question 5 – Array Methods

Complete the implementations of the functions `add_root` and `where_square`. Specifications are given below. Your solutions should **not** contain any loops or list comprehensions.

#### `add_root`

`add_root` should take in a `numpy` array, `A`, and return a new `numpy` array that contains the element-wise sum of the elements in `A` and the _square roots of their positions (indices) in `A`_.

For instance, if `A` contains the values 5, 9, and 4, the output array should contain the values 5 (5 + $\sqrt{0}$), 10 (9 + $\sqrt{1}$), and 5.4142... (4 + $\sqrt{2}$).

<br>

#### `where_square`

`where_square` should take in a `numpy` array, `A`, and return a new `numpy` array of Booleans whose `i`th element is `True` if and only if the `i`th element of `A` is a perfect square. 

For instance, `where_square(np.array([2, 9, 16, 15]))` should return `array([False, True, True, False])`.
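A loop-free sketch of both functions, assuming `A` holds non-negative integers of moderate size (very large values could run into floating-point rounding in `np.sqrt`). `np.arange(len(A))` supplies the positions 0, 1, 2, ..., and a value is treated as a perfect square when its rounded square root squares back to it exactly.

```python
import numpy as np


def add_root(A):
    # Positions 0, 1, 2, ... of the elements, square-rooted and added element-wise.
    return A + np.sqrt(np.arange(len(A)))


def where_square(A):
    # Round each square root to the nearest integer, square it back,
    # and compare with the original values.
    roots = np.round(np.sqrt(A))
    return roots ** 2 == A


print(add_root(np.array([5, 9, 4])))
print(where_square(np.array([2, 9, 16, 15])))  # [False  True  True False]
```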

In [22]:
test1 = np.array([5,9,4])
add_root(test1)

array([ 5.        , 10.        ,  5.41421356])

In [23]:
test2 = np.array([2,9,16,15])
where_square(test2)

array([False,  True,  True, False])

In [24]:
# Don't change this cell -- it is needed for the tests to work
A_1 = np.array([2, 4, 6, 7])
out_1 = add_root(A_1)

A_2 = np.array([1, 2, 16, 17, 32, 49])
out_2 = where_square(A_2)

In [25]:
grader.check("q5")

### Question 6 – Filtering Matrices

Complete the implementations of the functions `filter_cutoff_loop` and `filter_cutoff_np`.

#### Part 1: `filter_cutoff_loop`

`filter_cutoff_loop` should take in a matrix (2-dimensional `numpy` array) and a `float` cutoff. The function should return a 2-dimensional `numpy` array containing only the columns whose column mean is (strictly) greater than the cutoff value. **This function should be implemented with loops or list comprehensions, not using `numpy` or `pandas` operations.** For example, given the matrix: 

$$
a = \begin{bmatrix}
    1 & -2 & 3 & -3 \\
    0 & 1 & 3 & 6 \\
\end{bmatrix}
$$
The column means are computed as follows: 
$$
\begin{bmatrix}
1\\
0
\end{bmatrix} 
= \frac{1+0}{2} = 0.5 \text{, }
\begin{bmatrix}
-2\\
1
\end{bmatrix} 
= \frac{-2+1}{2} = -0.5 \text{, }
\begin{bmatrix}
3\\
3
\end{bmatrix} 
= \frac{3+3}{2} = 3 \text{, }
\begin{bmatrix}
-3\\
6
\end{bmatrix} 
= \frac{-3+6}{2} = 1.5
$$

If the cutoff value is 1, only the last two columns have means above 1, thus only the last two columns are kept, as shown in the first example below. 

```py
>>> a = np.array([[1,-2, 3,-3], [0, 1, 3, 6]])
>>> filter_cutoff_loop(a, 1.0)
# the column means are [0.5, -0.5, 3, 1.5]
# only columns 3 & 4 have column means higher than 1
# we only return columns 3 & 4
array([[ 3, -3],
       [ 3,  6]])
```

```py
>>> a = np.array([[1,-2, 3,-3], [0, 1, 3, 6]])
>>> filter_cutoff_loop(a, 2.0)
# the column means are [0.5, -0.5, 3, 1.5]
# only column 3 has a column mean higher than 2
# we only return column 3
array([[3],
       [3]])
```
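One possible loop-based sketch. The column means are computed with plain Python loops; `numpy` is used only to read the matrix's shape and to package the kept columns back into an array, which the function must return.

```python
import numpy as np


def filter_cutoff_loop(matrix, cutoff):
    n_rows, n_cols = matrix.shape
    keep = []
    for j in range(n_cols):
        # Mean of column j, computed by summing down the rows.
        col_mean = sum(matrix[i][j] for i in range(n_rows)) / n_rows
        if col_mean > cutoff:  # strictly greater, per the spec
            keep.append(j)
    # Rebuild each row using only the kept column indices.
    return np.array([[matrix[i][j] for j in keep] for i in range(n_rows)])


a = np.array([[1, -2, 3, -3], [0, 1, 3, 6]])
print(filter_cutoff_loop(a, 1.0))
```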

In [26]:
# so i have to take the same index for each list and then sum them up and divide by the amount of numbers. then i have to compare the value to the other input. 
# if the value is higher than the input, then the columns are kept.

In [27]:
a = np.array([[1,-2, 3,-3], [0, 1, 3, 6]])
filter_cutoff_loop(a, 1.0)

array([[ 3, -3],
       [ 3,  6]])

In [28]:
a = np.array([[1,-2, 3,-3], [0, 1, 3, 6]])
filter_cutoff_loop(a, 2.0)

array([[3],
       [3]])

In [29]:
grader.check("q6")

### Question 6

#### Part 2: `filter_cutoff_np`

`filter_cutoff_np` should take in the same inputs and output the same results as `filter_cutoff_loop`. **Do not use loops or list comprehensions for this implementation, only `numpy` operations.**

***Hints:***
- What does the `axis` argument do in `np.mean`?
- Remember that we can index arrays using Boolean values (a Boolean mask). How does the code below work? 
```py
>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
>>> a[:, [True, False, True, False]]
```
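Putting the two hints together gives a short loop-free sketch: `axis=0` averages down each column, and the resulting Boolean mask selects columns while `:` keeps every row.

```python
import numpy as np


def filter_cutoff_np(matrix, cutoff):
    col_means = np.mean(matrix, axis=0)    # one mean per column
    return matrix[:, col_means > cutoff]   # all rows, only the columns where the mask is True


a = np.array([[1, -2, 3, -3], [0, 1, 3, 6]])
print(filter_cutoff_np(a, 2.0))
```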


In [30]:
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
a[:, [True, False, True, False]]

array([[1, 3],
       [5, 7]])

In [31]:
np.mean(a, axis=0)  # column means 

array([3., 4., 5., 6.])

In [32]:
a = np.array([[1,-2, 3,-3], [0, 1, 3, 6]])
filter_cutoff_np(a, 1.0)

array([[ 3, -3],
       [ 3,  6]])

In [33]:
a = np.array([[1,-2, 3,-3], [0, 1, 3, 6]])
filter_cutoff_np(a, 2.0)

array([[3],
       [3]])

In [34]:
grader.check("q6.2")

### Question 7 – Stock Prices 📈

Complete the implementations of the functions `growth_rates` and `with_leftover`. Specifications are given below. Your solutions should **not** contain any loops or list comprehensions.

#### `growth_rates`

`growth_rates` should take in a `numpy` array, `A`, of [stock prices](https://en.wikipedia.org/wiki/Stock) (in USD) for a single stock on successive days. It should return an array of growth rates. That is, the `i`th number of the returned array should contain the rate of growth in stock price from the $i^{th}$ day to the $(i+1)^{th}$ day, so the returned array has one fewer element than `A`. The growth rate between two values is defined as $\frac{\text{final} - \text{initial}}{\text{initial}}$. You should return growth rates as **proportions, rounded to two decimal places**.
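A loop-free sketch of the formula above: `np.diff(A)` gives each "final minus initial" difference, and `A[:-1]` lines up the matching "initial" prices.

```python
import numpy as np


def growth_rates(A):
    # np.diff(A)[i] == A[i + 1] - A[i]; divide by the earlier day's price.
    return np.round(np.diff(A) / A[:-1], 2)


print(growth_rates(np.array([10.0, 12.0, 6.0])))  # [ 0.2 -0.5]
```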

<br>

#### `with_leftover`

Again, suppose `A` is a `numpy` array of stock prices. Consider the following scheme: 

- Suppose that you start each day with \$20 to purchase stocks. 
- Each day, you purchase as many shares as possible of the stock. (The price changes each day, according to `A`.)
- Any money left-over after a given day is saved for possibly buying stock on a future day.

The function `with_leftover` should take in `A` and return the first day (as an `int`) on which you can buy at least one full share using just "left-over" money. If this never happens, return `-1`. Note that the first stock purchase occurs on Day 0, and that you cannot purchase fractions of a share of a stock.

For example, if the stock price is \$3 every day, then the answer is `1` (corresponding to Day 1):
- Day 0: Buy 6 shares with \$20, and \$2 is added to the leftover. Your total leftover is currently \$2. This is not enough to buy one extra share, so you continue.
- Day 1: Buy 6 shares with \$20, and another \$2 is added to the leftover. Your total leftover is now \$4, so you can now buy one extra share. Hence, the answer is Day 1, and `with_leftover` should return `1`.

***Hint:*** `np.cumsum` may be helpful.
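A sketch along the lines of the hint, under the reading used in the example above: a day qualifies when the leftover accumulated through that day (including that day's own leftover) covers that day's price. `DAILY_BUDGET` is a hypothetical constant name for the \$20.

```python
import numpy as np

DAILY_BUDGET = 20  # hypothetical name for the $20 available each day


def with_leftover(A):
    # Money that can't be spent on whole shares each day: 20 % price.
    daily_leftover = DAILY_BUDGET % A
    # Running total of saved money through each day.
    total_leftover = np.cumsum(daily_leftover)
    # Day i qualifies if the saved money alone covers that day's price.
    can_buy = total_leftover >= A
    if can_buy.any():
        return int(np.argmax(can_buy))  # argmax on Booleans returns the first True
    return -1


print(with_leftover(np.array([3, 3, 3, 3])))  # 1
```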

In [35]:
# When the total leftover is greater than or equal to the stock price of the same day, return that day (the first such day).
# np.cumsum() should be used to accumulate the money that is left over.
# 20 % stock price gives each day's leftover,
# then add up the leftovers with cumsum,
# but still check every day whether you have enough left over to buy a share.
# If no day satisfies the condition, return -1.

In [36]:
# Don't change this cell -- it is needed for the tests to work
fp = Path('data') / 'stocks.csv'
stocks = np.array([float(x) for x in open(fp)])
out_3_stocks = growth_rates(stocks)

A_4 = np.array([3, 3, 3, 3])
out_4 = with_leftover(A_4)
print(out_4)

1


In [37]:
grader.check("q7")

## Part 4: Introduction to `pandas` 🐼

This part will help build familiarity with DataFrames in `pandas`. Fortunately, you've already used a version of `pandas` in DSC 10, called `babypandas`! Review the [DSC 10 course notes](https://notes.dsc10.com/02-data_sets/dataframes.html) as necessary.

One key difference between `babypandas` and `pandas` is the idiomatic way of accessing a column. In `babypandas`, to access column `'x'` in DataFrame `df`, you used `df.get('x')`. In `pandas`, the more common way is `df['x']`.

As always for `pandas` questions:
1. Avoid writing loops through the rows of the DataFrame to do the problem, and
2. Test the output/correctness of your code with the help of the dataset given, but be sure your code will also run on data that is similar to but different from the dataset given. (One way to do this is to sample rows from the provided DataFrame using the `.sample` method).
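To follow the second suggestion, you can draw several random subsets with `DataFrame.sample` and re-run your code on each. The sketch below uses a small made-up DataFrame in place of the real data; `random_state` makes each draw reproducible. Sampled rows keep their original index labels, so code that assumes the index runs 0, 1, 2, ... (for example, `.loc[0]`) may break on a sample unless you use `.iloc` or `.reset_index(drop=True)`.

```python
import pandas as pd

# Made-up stand-in for a real dataset.
df = pd.DataFrame({
    'Player': ['A One', 'B Two', 'C Three', 'D Four', 'E Five', 'F Six'],
    'Salary': [10, 20, 30, 40, 50, 60],
})

# Three reproducible random subsets of 4 rows each.
subsets = [df.sample(n=4, random_state=seed) for seed in range(3)]

for sub in subsets:
    # Reset the index so label-based access like .loc[0] refers to the first sampled row.
    sub = sub.reset_index(drop=True)
    print(sub.loc[0, 'Player'], sub['Salary'].sum())
```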

The file `data/salary.csv` contains salary information for the 2021-22 National Basketball Association (NBA) season 🏀. Specifically, it contains the name, position, team, and salary of all players who played at least 15 games that season. We will load this file and store it as a DataFrame named `salary`.

In [38]:
# Do not edit this cell -- it is needed for the tests
salary_fp = Path('data') / 'salary.csv'
salary = pd.read_csv(salary_fp)
salary.head()

|   | Player | Position | Team | Salary |
|---|---|---|---|---|
| 0 | John Collins | PF | Atlanta Hawks | 23000000 |
| 1 | Danilo Gallinari | PF | Atlanta Hawks | 20475000 |
| 2 | Bogdan Bogdanović | SG | Atlanta Hawks | 18000000 |
| 3 | Clint Capela | C | Atlanta Hawks | 17103448 |
| 4 | Delon Wright | SG | Atlanta Hawks | 8526316 |


### Question 8 – `pandas` Basics

Your job is to complete the implementation of the function `salary_stats`, which takes in a DataFrame like `salary` and returns a **Series** containing the following statistics:
- `'num_players'`: The number of players.
- `'num_teams'`: The number of teams.
- `'total_salary'`: The total salary amount for all players.
- `'highest_salary'`: The name of the player with the highest salary. **Assume there are no ties.**
- `'avg_los'`: The average salary of the `'Los Angeles Lakers'`, rounded to two decimal places.
- `'fifth_lowest'`: The name and team of the player who has the fifth lowest salary, separated by a comma and a space (e.g. `'Billy Triton, Cleveland Cavaliers'`). **Assume there are no ties.**
- `'duplicates'`: A Boolean that is `True` if there are any duplicate last names, and `False` otherwise. Note that some players may have a suffix on their name, such as "Jr." or "III" -- you should ignore these. That is, "Billy Triton Jr." and "Tyler Triton" should be considered to have the same last name.
- `'total_highest'`: The total salary of the team that has the highest paid player.

The index of each element in the outputted Series is specified above.

***Notes***: 
- Your function should work on a dataset of the same format that contains information from other years. This means that `salary_stats` should not "hard-code" any numbers or strings, but should compute them all programmatically. In all cases, you may assume that none of the answers involving ranking involves a tie.
- The public tests don't test to see if your function actually returns the right numbers. You should manually inspect your result to make sure that all values seem appropriate.
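Two of the statistics need a little more reasoning than a single aggregation, so here is a hedged sketch of each on a made-up DataFrame with the same `Player`, `Team`, and `Salary` columns. The helper names and the `SUFFIXES` set are hypothetical; check the real data for any other suffixes (for example, "Jr" without a period).

```python
import pandas as pd

# Hypothetical suffix list; extend it if your data contains others.
SUFFIXES = {'Jr.', 'Sr.', 'II', 'III', 'IV'}


def last_name(full_name):
    parts = full_name.split()
    if len(parts) > 1 and parts[-1] in SUFFIXES:
        parts = parts[:-1]  # drop a trailing suffix such as "Jr." or "III"
    return parts[-1]


def has_duplicate_last_names(players):
    # players: a Series of full names, e.g. df['Player']
    return bool(players.apply(last_name).duplicated().any())


def total_salary_of_top_team(df):
    # idxmax gives the index label of the highest salary; .loc uses it to find that player's team.
    top_team = df.loc[df['Salary'].idxmax(), 'Team']
    return df.loc[df['Team'] == top_team, 'Salary'].sum()


toy = pd.DataFrame({
    'Player': ['Billy Triton Jr.', 'Tyler Triton', 'Ann Lee'],
    'Team': ['Team A', 'Team B', 'Team B'],
    'Salary': [5, 30, 10],
})
print(has_duplicate_last_names(toy['Player']))  # True
print(total_salary_of_top_team(toy))            # 40
```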

In [39]:
# 'num_players': count the number of rows (one row per player)
# 'num_teams': count the number of unique teams (e.g. groupby team and count the groups)
# 'total_salary': sum up the salary column
# 'highest_salary': find the row with the max salary and return that player's name
# 'avg_los': select the Lakers' rows, take the mean salary, and round it to two decimal places
# 'fifth_lowest': sort by salary, take the fifth row, and concatenate its name and team in string format
# 'duplicates': extract each player's last name (ignoring suffixes like Jr. or III) and return True if any last name appears more than once
# 'total_highest': find the team of the player with the max salary, then groupby team and sum that team's salaries

In [40]:
# Do not edit this cell -- it is needed for the tests
salary_fp = Path('data') / 'salary.csv'
salary = pd.read_csv(salary_fp)
stats = salary_stats(salary)

salary_sample_fp = Path('data') / 'salary_sample.csv'
salary_sample = pd.read_csv(salary_sample_fp)
sample_stats = salary_stats(salary_sample)

In [41]:
stats

num_players                                   381
num_teams                                      30
total_salary                           3433118794
highest_salary                      Stephen Curry
avg_los                               13266896.82
fifth_lowest      Miye Oni, Oklahoma City Thunder
duplicates                                   True
total_highest                           130428103
dtype: object

In [42]:
grader.check("q8")

### Question 9 – Reading Malformed `.csv` Files

`data/malformed.csv` is a file of comma-separated values, containing the following fields:


|column name|description|type|
|---|---|---|
|`'first'`|first name of person|`str`|
|`'last'`|last name of person|`str`|
|`'weight'`|weight of person (lbs)|`float`|
|`'height'`|height of person (in)|`float`|
|`'geo'`|location of person; comma-separated latitude/longitude|`str`|

Unfortunately, some entries contain errors in the placement of commas (`,`) and quotes (`"`) that cause `pandas`' `read_csv` function to fail to parse the file with the default settings. Don't believe us? Try using `pd.read_csv` on `data/malformed.csv` and look at what happens.

As a result, instead of using `pd.read_csv`, you must read in the file manually using Python's built-in `open` function.

Complete the implementation of the function `parse_malformed`, which takes in a file path (`fp`) and returns a parsed, properly-typed DataFrame with the information in the corresponding file. For example, `fp` may be `'data/malformed.csv'`. The DataFrame should contain the columns described in the data description table above (with the specified types).

***Note:*** 
- The only kinds of issues you need your function to handle are comma and quote misplacements; don't try to find any other issues with the CSV. 
- With that said, you should assume that `data/malformed.csv` is a sample of a larger file that has the same sorts of errors, but potentially in different lines. For example, `data/malformed.csv` has an unnecessary quote `"` in line 4, but your function may be called on another CSV that has a perfectly fine line 4 but an unnecessary quote on some other line.
- So, **don't** implement `parse_malformed` assuming that the commas and quotes are mispositioned on specific lines; rather, implement `parse_malformed` such that it can handle these issues on every single line they appear in.
- A good way to proceed is to open `data/malformed.csv` and look carefully at the comma and quote placements.
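One possible line-by-line strategy, sketched under stated assumptions: the file has a single header line, and the only defects are stray double quotes and extra commas, so that after removing every quote and discarding empty fields each data line reduces to exactly six pieces (first, last, weight, height, latitude, longitude). These assumptions should be checked against the actual file; the unpacking deliberately raises an error if a line doesn't fit them. The usage runs on a tiny made-up file, not on `data/malformed.csv`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def parse_malformed(fp):
    rows = []
    with open(fp) as fh:
        next(fh)  # skip the header line (assumed to be the first line)
        for line in fh:
            # Remove every quote, split on commas, and drop empty fields left by doubled commas.
            fields = [f for f in line.strip().replace('"', '').split(',') if f]
            if not fields:
                continue  # ignore blank lines
            first, last, weight, height, lat, lon = fields  # raises if a line doesn't fit
            rows.append({
                'first': first,
                'last': last,
                'weight': float(weight),
                'height': float(height),
                'geo': f'{lat},{lon}',  # rejoin latitude and longitude into one string
            })
    return pd.DataFrame(rows, columns=['first', 'last', 'weight', 'height', 'geo'])


# Usage on a tiny made-up file with a stray quote and a doubled comma.
sample_fp = Path(tempfile.mkdtemp()) / 'tiny_malformed.csv'
sample_fp.write_text(
    'first,last,weight,height,geo\n'
    'Julia,Wagner,142.0,86.0,"39.8,15.4"\n'
    '"Angelica",Rija,,155.0,56.0,"38.2,-71.7"\n'
)
parsed = parse_malformed(sample_fp)
print(parsed)
```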


The first few rows of `parse_malformed('data/malformed.csv')` should be:

<img src="./imgs/example-df.png" width=45%>

In [43]:
# Do not edit -- needed for tests
fp = Path('data') / 'malformed.csv'
cols = ['first', 'last', 'weight', 'height', 'geo']
df = parse_malformed(fp)
dg = pd.read_csv(fp, nrows=4, skiprows=10, names=cols)

df

|   | first | last | weight | height | geo |
|---|---|---|---|---|---|
| 0 | Julia | Wagner | 142.0 | 86.0 | 39.8,15.4 |
| 1 | Angelica | Rija | 155.0 | 56.0 | 38.2,-71.7 |
| 2 | Tyler | Micajah | 116.0 | 73.0 | 38.0,6.9 |
| 3 | Kathleen | Nakea | 163.0 | 69.0 | 36.3,-86.8 |
| 4 | Axel | Ronit | 95.0 | 74.0 | 36.8,128.2 |
| ... | ... | ... | ... | ... | ... |
| 95 | Yasmeen | Jahron | 135.0 | 84.0 | 38.3,-127.3 |
| 96 | Meghan | Carlyann | 101.0 | 66.0 | 36.6,80.5 |
| 97 | Tess | Shree | 146.0 | 68.0 | 38.8,64.9 |
| 98 | Maria | Kalvyn | 115.0 | 51.0 | 37.1,-90.4 |


In [44]:
grader.check("q9")

## Congratulations! You're done with Lab 1! 🏁

As a reminder, all of the work you want to submit needs to be in `lab.py`.

To ensure that all of the work you want to submit is in `lab.py`, we've included a script named `lab-validation.py` in the lab folder. You shouldn't edit it, but instead, you should call it from the command line (e.g. the Terminal) to test your work.

Once you've finished the lab, you should open the command line and run, in the directory for this lab:

```
python lab-validation.py
```

**This will run all of the `grader.check` cells that you see in this notebook, but only using the code in `lab.py` – that is, it doesn't look at any of the code in this notebook. If all of your `grader.check` cells pass in this notebook but not all of them pass in your command line with the above command, then you likely have code in your notebook that isn't in your `lab.py`!**

You can also use `lab-validation.py` to test individual questions. For instance,

```
python lab-validation.py q1 q4 q7 q8
```

will run the `grader.check` cells for Questions 1, 4, 7, and 8 – again, only using the code in `lab.py`. The [video](#Infrastructure-Summary) linked above shows you how to use the script as well.

Once `python lab-validation.py` shows that you're passing all test cases, you're ready to submit your `lab.py` (and only your `lab.py`) to Gradescope. After submitting to Gradescope, make sure to stick around until the autograder finishes and confirm that all test cases pass.

There is also a call to `grader.check_all()` below in _this_ notebook, but make sure to also follow the steps above.

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [45]:
grader.check_all()

q0 results: All test cases passed!

q1 results: All test cases passed!

q2 results: All test cases passed!

q3 results: All test cases passed!

q4 results: All test cases passed!

q5 results: All test cases passed!

q6 results: All test cases passed!

q6.2 results: All test cases passed!

q7 results: All test cases passed!

q8 results: All test cases passed!

q9 results: All test cases passed!