<img src="img/dsci511_header.png" width="600">

# Lab 1: Reading in and wrangling data

## Instructions
rubric={mechanics:5}

Check off that you have read and followed each of these instructions:

- [ ] All files necessary to run your work must be pushed to your GitHub.ubc.ca repository for this lab.
- [ ] You need to have a minimum of 3 commit messages associated with your GitHub.ubc.ca repository for this lab.
- [ ] You must also submit the `.ipynb` file and the rendered PDF for this lab to Gradescope. The entire notebook must be executed so the TAs can see the results of your work.
- [ ] **There is autograding in this lab, so please do not move or rename this file. Also, do not copy and paste cells; if you need to add new cells, create them via the "Insert a cell below" button instead.**
- [ ] To ensure you do not break the autograder, remove all code for installing packages (i.e., DO NOT have `! conda install ...` or `! pip install ...` in your homework!).
- [ ] Follow the [MDS general lab instructions](https://ubc-mds.github.io/resources_pages/general_lab_instructions/).
- [ ] <mark>This lab has hidden tests. In this lab, the visible tests are just there to ensure you create an object with the correct name. The remaining tests are hidden intentionally. This is so you get practice deciding when you have written the correct code and created the correct data object. This is a necessary skill for data scientists, and if we were to provide robust visible tests for all questions you would not develop this skill, or at least not to its full potential.</mark>
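
As a rough illustration of what "deciding your job is done" can look like, here is a minimal sketch of self-checks on a freshly read data frame. The inline CSV text and the name `toy` are made up for this example and are not part of the lab data; the checks you actually need will depend on each question.

```python
import io

import pandas as pd

# Hypothetical stand-in for a file on disk, so the sketch is self-contained.
raw_csv = io.StringIO(
    "category,language,mother_tongue\n"
    "Official languages,English,100\n"
    "Official languages,French,40\n"
)
toy = pd.read_csv(raw_csv)

# Checks you might run yourself before trusting the result.
shape_ok = toy.shape == (2, 3)
columns_ok = list(toy.columns) == ["category", "language", "mother_tongue"]
dtype_ok = pd.api.types.is_integer_dtype(toy["mother_tongue"])
no_missing = not toy.isna().any().any()

print(shape_ok, columns_ok, dtype_ok, no_missing)
```

Looking at `toy.head()`, `toy.tail()`, and `toy.dtypes` alongside checks like these is often enough to catch stray header rows, footers, or columns parsed as text.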


## Code Quality
rubric={quality:5}

The code that you write for this assignment will be given one overall grade for code quality; see our code quality rubric as a guide to what we are looking for. For this course (and other MDS courses that use Python), we follow the PEP 8 code style. There is a guide you can refer to: https://peps.python.org/pep-0008/
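
For orientation only, the short sketch below shows a few PEP 8 conventions in practice: `snake_case` names, a docstring, no spaces around `=` in keyword arguments, and spaces around binary operators. The function `percent_of_total` and the numbers are invented for illustration and are not part of the lab.

```python
import pandas as pd


def percent_of_total(counts: pd.Series) -> pd.Series:
    """Return each value as a percentage of the series total."""
    return (counts * 100 / counts.sum()).round(2)


# No spaces around "=" for keyword arguments; spaces around binary operators.
language_counts = pd.Series([30, 10], index=["English", "French"])
language_share = percent_of_total(language_counts)
print(language_share)
```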

Each code question will also be assessed for code accuracy (i.e., does it do what it is supposed to do?).

## Writing 
rubric={writing:5}

To get the marks for this writing component, you should:

- Use proper English, spelling, and grammar throughout your submission (the non-coding parts).
- Be succinct. This means being specific about what you want to communicate, without being superfluous.


## Let's get started!

Run the cell below to load the packages needed for this lab.

In [2]:
import pandas as pd
import numpy as np
import altair as alt
pd.set_option('display.max_rows', 6)

## Exercise 1: Reading in Data

Read the data files listed in the table below, and store them as pandas data frames with the names provided in the table. We will use hidden tests to grade this, so you will get to practice deciding that your job is done, and done correctly.

**Note: if the column names are missing from any data set, you need to add them yourself programmatically via Python.**

| File  | Name for Data Frame | File location |
|---|---|----|
| `abbotsford_lang.xlsx`  | `abbotsford` | `data` directory of this repo |
| `calgary_lang.csv`  | `calgary`  | `data` directory of this repo |
| `edmonton_lang.xlsx`  | `edmonton`  | https://github.com/ttimbers/canlang/blob/master/inst/extdata/edmonton_lang.xlsx?raw=true |
|  `kelowna_lang.csv` | `kelowna`  | `data` directory of this repo |
| `vancouver_lang.csv`  | `vancouver`  | `data` directory of this repo |
| `victoria_lang.tsv`  | `victoria`  | https://github.com/ttimbers/canlang/raw/master/inst/extdata/victoria_lang.tsv |
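
Some of these files may need extra arguments to read cleanly. The sketch below uses made-up inline text (not one of the lab files) to show how `pd.read_csv` can skip metadata lines, use a non-comma separator, and attach column names when the file has none. The layout of `raw_text` is an assumption chosen for illustration; inspect each real file to see what it actually needs.

```python
import io

import pandas as pd

# Hypothetical file contents: two metadata lines, no header row,
# semicolon-separated fields, and a trailing footer line.
raw_text = (
    "Source: made-up example\n"
    "Retrieved: never\n"
    "Official languages;English;100\n"
    "Official languages;French;40\n"
    "End of file\n"
)

column_names = ["category", "language", "mother_tongue"]
toy = pd.read_csv(
    io.StringIO(raw_text),
    sep=";",
    skiprows=2,          # skip the metadata lines at the top
    skipfooter=1,        # skip the footer line at the bottom
    names=column_names,  # supply the missing column names
    engine="python",     # skipfooter needs the Python parser engine
)
print(toy)
```

The same arguments apply when the first argument is a path or a URL instead of an in-memory buffer.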


### The Data

The data you will be working with in this first exercise is language data from the 2016 Canadian Census for cities in Western Canada. If you are unfamiliar with Western Canadian geography, here’s a map to help you start to get more familiar:

<img src="https://www.canadatours.com/images/maps/Canada_W.gif" width=500>

Image source: https://www.canadatours.com/canada_maps.cfm?#W 

### Exercise 1.1: Read in the Abbotsford language Data
rubric={autograde:5}

In [2]:
abbotsford = pd.read_excel("data/abbotsford_lang.xlsx", sheet_name = "data") # SOLUTION
abbotsford

Unnamed: 0,category,language,mother_tongue,most_at_home,most_at_work,lang_known
0,Aboriginal languages,"Aboriginal languages, n.o.s.",5,5,0,0
1,Non-Official & Non-Aboriginal languages,Afrikaans,135,75,0,285
2,Non-Official & Non-Aboriginal languages,"Afro-Asiatic languages, n.i.e.",0,0,0,10
...,...,...,...,...,...,...
211,Non-Official & Non-Aboriginal languages,Wu (Shanghainese),15,5,0,10
212,Non-Official & Non-Aboriginal languages,Yiddish,5,0,0,0
213,Non-Official & Non-Aboriginal languages,Yoruba,20,0,0,50


In [3]:
# TEST

# Checking if Data Frame is defined
assert abbotsford is not None, "The Data Frame abbotsford has not been defined"

# Check if abbotsford is a DataFrame
assert isinstance(abbotsford, pd.core.frame.DataFrame), "abbotsford is not a pandas Data Frame"


In [4]:
# HIDDEN TEST

nrows = 214
ncols = 6
assert abbotsford.shape == (nrows,ncols), "Data Frame has incorrect shape"

msum = 173570
assert sum(abbotsford['mother_tongue']) == msum

# Checking column names
expected_columns = ['category', 'language', 'mother_tongue', 'most_at_home', 'most_at_work', 'lang_known']
assert list(abbotsford.columns) == expected_columns, f"Column names are incorrect. Expected: {expected_columns}, but got: {list(abbotsford.columns)}."

# Check dtype for 'lang_known'
assert abbotsford['lang_known'].dtype == np.dtype('int64'), f"lang_known column is not int64"


### Exercise 1.2: Read in the Calgary language Data
rubric={autograde:5}

In [2]:
calgary = pd.read_csv("data/calgary_lang.csv") # SOLUTION
calgary

Unnamed: 0,category,language,mother_tongue,most_at_home,most_at_work,lang_known
0,Aboriginal languages,"Aboriginal languages, n.o.s.",20,5,0,15
1,Non-Official & Non-Aboriginal languages,Afrikaans,960,505,15,1955
2,Non-Official & Non-Aboriginal languages,"Afro-Asiatic languages, n.i.e.",45,15,0,170
...,...,...,...,...,...,...
211,Non-Official & Non-Aboriginal languages,Wu (Shanghainese),380,210,0,580
212,Non-Official & Non-Aboriginal languages,Yiddish,80,10,0,175
213,Non-Official & Non-Aboriginal languages,Yoruba,1430,350,0,3460


In [3]:
# TEST

# Checking if Data Frame is defined
assert calgary is not None, "The Data Frame calgary has not been defined"

# Check if calgary is a DataFrame
assert isinstance(calgary, pd.core.frame.DataFrame), "calgary is not a pandas Data Frame"


In [8]:
# HIDDEN TEST

nrows = 214
ncols = 6
assert calgary.shape == (nrows,ncols), "Data Frame has incorrect shape"

msum = 1341120
assert sum(calgary['mother_tongue']) == msum

# Checking column names
expected_columns = ['category', 'language', 'mother_tongue', 'most_at_home', 'most_at_work', 'lang_known']
assert list(calgary.columns) == expected_columns, f"Column names are incorrect. Expected: {expected_columns}, but got: {list(calgary.columns)}."

# Check dtype for 'lang_known'
assert calgary['lang_known'].dtype == np.dtype('int64'), f"lang_known column is not int64"

### Exercise 1.3: Read in the Edmonton language Data
rubric={autograde:5}

In [8]:
url = "https://github.com/ttimbers/canlang/blob/master/inst/extdata/edmonton_lang.xlsx?raw=true" # SOLUTION
edmonton = pd.read_excel(url, sheet_name = "data") # SOLUTION
edmonton

Unnamed: 0,category,language,mother_tongue,most_at_home,most_at_work,lang_known
0,Aboriginal languages,"Aboriginal languages, n.o.s.",25,10,0,0
1,Non-Official & Non-Aboriginal languages,Afrikaans,575,300,0,1220
2,Non-Official & Non-Aboriginal languages,"Afro-Asiatic languages, n.i.e.",65,20,0,155
...,...,...,...,...,...,...
211,Non-Official & Non-Aboriginal languages,Wu (Shanghainese),235,120,0,260
212,Non-Official & Non-Aboriginal languages,Yiddish,55,0,0,65
213,Non-Official & Non-Aboriginal languages,Yoruba,700,280,0,1600


In [9]:
# TEST

# Checking if Data Frame is defined
assert edmonton is not None, "The Data Frame edmonton has not been defined"

# Check if edmonton is a DataFrame
assert isinstance(edmonton, pd.core.frame.DataFrame), "edmonton is not a Data Frame"


In [10]:
# HIDDEN TEST

nrows = 214
ncols = 6
assert edmonton.shape == (nrows,ncols), "Data Frame has incorrect shape"

msum = 1273005
assert sum(edmonton['mother_tongue']) == msum

# Checking column names
expected_columns = ['category', 'language', 'mother_tongue', 'most_at_home', 'most_at_work', 'lang_known']
assert list(edmonton.columns) == expected_columns, f"Column names are incorrect. Expected: {expected_columns}, but got: {list(edmonton.columns)}."

# Check dtype for 'lang_known'
assert edmonton['lang_known'].dtype == np.dtype('int64'), f"lang_known column is not int64"


### Exercise 1.4: Read in the Kelowna language Data
rubric={autograde:5}

In [11]:
# BEGIN SOLUTION
names_for_cols = ["category", "language", "mother_tongue", "most_at_home", "most_at_work", "lang_known"]
kelowna = pd.read_csv('data/kelowna_lang.csv', sep = ';', decimal = ',', skiprows = [0,1,2,3,4,5,220], names = names_for_cols)
# END SOLUTION
kelowna

Unnamed: 0,category,language,mother_tongue,most_at_home,most_at_work,lang_known
0,Aboriginal languages,"Aboriginal languages, n.o.s.",0,0,0,10
1,Non-Official & Non-Aboriginal languages,Afrikaans,175,75,0,280
2,Non-Official & Non-Aboriginal languages,"Afro-Asiatic languages, n.i.e.",5,0,0,0
...,...,...,...,...,...,...
211,Non-Official & Non-Aboriginal languages,Wu (Shanghainese),10,0,0,0
212,Non-Official & Non-Aboriginal languages,Yiddish,0,5,0,0
213,Non-Official & Non-Aboriginal languages,Yoruba,5,0,0,0


In [12]:
# TEST

# Checking if Data Frame is defined
assert kelowna is not None, "The Data Frame kelowna has not been defined"

# Check if kelowna is a DataFrame
assert isinstance(kelowna, pd.core.frame.DataFrame), "kelowna is not a pandas Data Frame"

In [13]:
# HIDDEN TEST

nrows = 214
ncols = 6
assert kelowna.shape == (nrows,ncols), "Data Frame has incorrect shape"

msum = 190845
assert sum(kelowna['mother_tongue']) == msum

# Checking column names
expected_columns = ['category', 'language', 'mother_tongue', 'most_at_home', 'most_at_work', 'lang_known']
assert list(kelowna.columns) == expected_columns, f"Column names are incorrect. Expected: {expected_columns}, but got: {list(kelowna.columns)}."

# Check dtype for 'lang_known'
assert kelowna['lang_known'].dtype == np.dtype('int64'), f"lang_known column is not int64"

### Exercise 1.5: Read in the Vancouver language Data
rubric={autograde:5}

In [14]:
vancouver = pd.read_csv('data/vancouver_lang.csv') # SOLUTION
vancouver

Unnamed: 0,category,language,mother_tongue,most_at_home,most_at_work,lang_known
0,Aboriginal languages,"Aboriginal languages, n.o.s.",70,15,0,35
1,Non-Official & Non-Aboriginal languages,Afrikaans,1435,520,10,4225
2,Non-Official & Non-Aboriginal languages,"Afro-Asiatic languages, n.i.e.",45,10,0,95
...,...,...,...,...,...,...
211,Non-Official & Non-Aboriginal languages,Wu (Shanghainese),4330,2495,45,5385
212,Non-Official & Non-Aboriginal languages,Yiddish,220,10,0,385
213,Non-Official & Non-Aboriginal languages,Yoruba,190,40,0,505


In [15]:
# TEST

# Checking if Data Frame is defined
assert vancouver is not None, "The Data Frame vancouver has not been defined"

# Check if vancouver is a DataFrame
assert isinstance(vancouver, pd.core.frame.DataFrame), "vancouver is not a pandas Data Frame"

In [16]:
# HIDDEN TEST

nrows = 214
ncols = 6
assert vancouver.shape == (nrows,ncols), "Data Frame has incorrect shape"

msum = 2361925
assert sum(vancouver['mother_tongue']) == msum

# Checking column names
expected_columns = ['category', 'language', 'mother_tongue', 'most_at_home', 'most_at_work', 'lang_known']
assert list(vancouver.columns) == expected_columns, f"Column names are incorrect. Expected: {expected_columns}, but got: {list(vancouver.columns)}."

# Check dtype for 'lang_known'
assert vancouver['lang_known'].dtype == np.dtype('int64'), f"lang_known column is not int64"

### Exercise 1.6: Read in the Victoria language Data
rubric={autograde:5}

In [17]:
url = "https://github.com/ttimbers/canlang/raw/master/inst/extdata/victoria_lang.tsv" # SOLUTION
victoria = pd.read_csv(url, sep = '\t') # SOLUTION
victoria

Unnamed: 0,category,language,mother_tongue,most_at_home,most_at_work,lang_known
0,Aboriginal languages,"Aboriginal languages, n.o.s.",10,0,0,25
1,Non-Official & Non-Aboriginal languages,Afrikaans,175,50,0,580
2,Non-Official & Non-Aboriginal languages,"Afro-Asiatic languages, n.i.e.",0,0,0,20
...,...,...,...,...,...,...
211,Non-Official & Non-Aboriginal languages,Wu (Shanghainese),125,65,0,135
212,Non-Official & Non-Aboriginal languages,Yiddish,35,0,0,55
213,Non-Official & Non-Aboriginal languages,Yoruba,20,0,0,90


In [18]:
# TEST

# Checking if Data Frame is defined
assert victoria is not None, "The Data Frame victoria has not been defined"

# Check if victoria is a DataFrame
assert isinstance(victoria, pd.core.frame.DataFrame), "victoria is not a pandas Data Frame"

In [19]:
# HIDDEN TEST

nrows = 214
ncols = 6
assert victoria.shape == (nrows,ncols), "Data Frame has incorrect shape"

msum = 357030
assert sum(victoria['mother_tongue']) == msum

# Checking column names
expected_columns = ['category', 'language', 'mother_tongue', 'most_at_home', 'most_at_work', 'lang_known']
assert list(victoria.columns) == expected_columns, f"Column names are incorrect. Expected: {expected_columns}, but got: {list(victoria.columns)}."

# Check dtype for 'lang_known'
assert victoria['lang_known'].dtype == np.dtype('int64'), f"lang_known column is not int64"

## Exercise 2: Basic Data Wrangling

rubric={autograde:10}

Read the file `region_lang.csv` (located in the `data` directory of this repo) into a pandas data frame. We will use this data frame to uncover the name of the Canadian census metropolitan area which has the second greatest number of people who claim that the language they speak most often at home is **Spanish**. Return the region name as a string and assign this string to a variable named `spanish2`.

In [3]:
# BEGIN SOLUTION
region_lang = pd.read_csv("data/region_lang.csv", usecols = range(1,8))
spanish = region_lang[region_lang["language"] == "Spanish"].sort_values(by = 'most_at_home', ascending = False)
spanish2 = spanish.iloc[1]['region']
# END SOLUTION

In [21]:
spanish2

'Toronto'

In [22]:
# TEST

assert spanish2 is not None, "The variable spanish2 has not been assigned a value."

assert isinstance(spanish2, str), "The variable spanish2 is not a string."

In [23]:
# HIDDEN TEST

assert spanish2 == 'Toronto'
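
Sorting and taking the second row is one way to get a runner-up. As an alternative sketch, `DataFrame.nlargest` can do the same on a toy data frame; the regions and counts below are invented and are not census figures.

```python
import pandas as pd

# Made-up counts; these are not census figures.
toy = pd.DataFrame(
    {
        "region": ["A", "B", "C", "D"],
        "language": ["Spanish", "Spanish", "Spanish", "French"],
        "most_at_home": [50, 200, 120, 900],
    }
)

spanish_only = toy[toy["language"] == "Spanish"]
# nlargest(2, ...) keeps the two largest rows, ordered from largest down,
# so the last of them is the second largest.
second_region = spanish_only.nlargest(2, "most_at_home").iloc[-1]["region"]
print(second_region)
```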

## Exercise 3: More Data Wrangling

rubric={accuracy:20}

For this exercise, we want you to choose a Canadian census metropolitan area from the `region_lang` data set you encountered in the previous question and find the top 5 languages spoken most often at home from that area. Your final result should be a data frame with two columns: 1. `language` 2. `perc_pop`.

The column `perc_pop` should be the percentage of the area’s population who reported that they speak that language most often at home. You can find the population size for each Canadian census metropolitan area in the file `region_data.csv` located in the `data` directory of this repo.

In [24]:
# BEGIN SOLUTION
region_pop = pd.read_csv("data/region_data.csv", usecols = ["region", "population"])
ottawa_pop = region_pop[region_pop["region"] == "Ottawa - Gatineau"].iloc[0]['population']

ottawa_top5_langs = (
          region_lang[region_lang["region"] == "Ottawa - Gatineau"]
          .sort_values(by = 'most_at_home', ascending = False)
          .iloc[:5]
          [['language', 'most_at_home']]
         )

ottawa_top5_langs['perc_pop'] = (ottawa_top5_langs['most_at_home']*100/ottawa_pop).round(2)
ottawa_top5_langs.drop(columns = 'most_at_home', inplace = True)

ottawa_top5_langs
# END SOLUTION

Unnamed: 0,language,perc_pop
1920,English,58.03
2095,French,26.7
345,Arabic,1.73
4125,Mandarin,1.1
6260,Spanish,0.71
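
If you would rather not pull out a single population value by hand, a merge-based variant is another option. The sketch below uses two invented regions and made-up counts, so the names `toy_lang`, `toy_pop`, and `top_a` are illustrative only.

```python
import pandas as pd

# Made-up numbers for two hypothetical regions.
toy_lang = pd.DataFrame(
    {
        "region": ["A", "A", "B", "B"],
        "language": ["English", "French", "English", "French"],
        "most_at_home": [800, 150, 300, 600],
    }
)
toy_pop = pd.DataFrame({"region": ["A", "B"], "population": [1000, 1200]})

# Attach each region's population to its language rows, then convert to percent.
merged = toy_lang.merge(toy_pop, on="region")
merged["perc_pop"] = (merged["most_at_home"] * 100 / merged["population"]).round(2)

top_a = (
    merged[merged["region"] == "A"]
    .nlargest(5, "most_at_home")[["language", "perc_pop"]]
    .reset_index(drop=True)
)
print(top_a)
```

Because the percentage is computed for every region at once, switching to a different region only requires changing the filter.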


## Exercise 4: Tidying Data

rubric={autograde:10}

Let’s load a data set that is not tidy, because it is too wide for the statistical question being asked, and then use pandas to tidy it.

This next data set that we will be looking at contains environmental data from 1914 to 2018. The data was collected by the DFO (Canada’s Department of Fisheries and Oceans) at the Pacific Biological Station (Departure Bay). Daily sea surface temperatures were recorded. Original data source: http://www.pac.dfo-mpo.gc.ca/science/oceans/data-donnees/lightstations-phares/index-eng.html

A statistical question we might want to answer with this data set is: has sea surface temperature been changing over time, and is there an association between time of year (i.e., month) and this change? Read the `departure_bay_temperature.csv` data set in from the `data` directory, decide what tidying you will have to do, and then get to work and tidy it!

Assign the tidy data frame you create to the variable `tidy_temps`. Name the second column `month` and the third column `temp`.
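
To make "too wide" concrete, here is a small sketch with invented temperatures where each month is its own column. `DataFrame.melt` gathers those columns into one `month` column and one `temp` column; the values and the names `wide` and `long` are made up for illustration.

```python
import pandas as pd

# Made-up temperatures in a wide layout: one column per month.
wide = pd.DataFrame(
    {"Year": [2000, 2001], "Jan": [5.0, 6.0], "Feb": [7.0, 8.0]}
)

# Every column other than "Year" becomes a (month, temp) pair of values.
long = wide.melt(id_vars="Year", var_name="month", value_name="temp")
print(long)
```

Two years by two months gives four rows, one per observation, which is the shape a per-month analysis or faceted plot expects.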

In [25]:
# BEGIN SOLUTION
untidy_temps = pd.read_csv("data/departure_bay_temperature.csv", skiprows = 2)
tidy_temps = pd.melt(untidy_temps, id_vars = "Year",
                     var_name = "month", value_name = "temp")
# END SOLUTION

In [26]:
tidy_temps

Unnamed: 0,Year,month,temp
0,1914,Jan,7.2
1,1915,Jan,5.6
2,1916,Jan,1.2
...,...,...,...
1257,2016,Dec,5.5
1258,2017,Dec,6.9
1259,2018,Dec,


In [27]:
# TEST

# Check the data frame is defined
assert tidy_temps is not None, "Data Frame tidy_temps has not been defined."

# Check that tidy_temps is a Data Frame
assert isinstance(tidy_temps, pd.core.frame.DataFrame), "tidy_temps is not a Data Frame"

In [28]:
# HIDDEN TEST

nrows = 1260
ncols = 3
assert tidy_temps.shape == (nrows,ncols), "tidy_temps has incorrect shape"

# Check column names
expect = 'monthtempyear'
actual = "".join(sorted([c.lower() for c in list(tidy_temps.columns)]))

assert expect == actual, "Column names not as expected"

# Sum of temperatures
expected_sum = 10508
actual_sum = round(  sum(tidy_temps['temp'].dropna())   )   
tolerance = 3

assert np.abs(expected_sum - actual_sum) <= tolerance

### Reward: Visualizing the data

Let’s take a look and see whether sea surface temperature has been changing over time at Departure Bay, BC. Given that time of year is a factor that influences temperature, we’ll plot this for each month separately:

In [29]:
alt.Chart(tidy_temps).mark_point().encode(
    alt.X('Year:N', axis = alt.Axis(labels=False, ticks = False), title = 'Year'),
    alt.Y('temp:Q', title= 'Temperature')
).properties(
    width=200,
    height=200
).facet(alt.Facet('month:N', sort = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']),
    columns = 4
).interactive()

## Exercise 5: More Tidying

rubric={autograde:10}

Use one of the `pandas` functions to tidy the data that you will load in from the `language_diversity.csv` file located in the data directory. This data was collected to answer research questions, such as what factors are associated with language diversity (as measured by the number of languages spoken in a country). Read in the `language_diversity.csv` data set and decide what tidying you will have to do, and then get to work and tidy it! Assign the tidy data frame you create to the variable `tidy_lang`.
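
The opposite problem from "too wide" is data that is too long, where several variables are stacked in a single column. The sketch below uses invented values to show how `DataFrame.pivot` spreads such a column back out; the countries and numbers are made up, and the column names are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up long-format data: one row per (country, measurement) pair.
long = pd.DataFrame(
    {
        "Country": ["X", "X", "Y", "Y"],
        "Measurement": ["Langs", "Population", "Langs", "Population"],
        "Value": [10, 500, 3, 80],
    }
)

wide = long.pivot(
    index="Country", columns="Measurement", values="Value"
).reset_index()
wide.columns.name = None  # drop the leftover "Measurement" label
print(wide)
```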

In [30]:
# BEGIN SOLUTION
untidy_lang = pd.read_csv("data/language_diversity.csv", sep = '\t')
tidy_lang = pd.pivot(untidy_lang,
                      columns = "Measurement", values = "Value",
                      index = ["Continent","Country"])
tidy_lang.reset_index(inplace = True)
tidy_lang.columns.name = None
# END SOLUTION

In [31]:
tidy_lang

Unnamed: 0,Continent,Country,Area,Langs,MGS,Population,Stations,Std
0,Africa,Algeria,2381741.0,18.0,6.60,25660.0,102.0,2.29
1,Africa,Angola,1246700.0,42.0,6.22,10303.0,50.0,1.87
2,Africa,Benin,112622.0,52.0,7.14,4889.0,7.0,0.99
...,...,...,...,...,...,...,...,...
71,Oceania,Papua New Guinea,462840.0,862.0,10.88,3772.0,8.0,1.96
72,Oceania,Solomon Islands,28896.0,66.0,12.00,3301.0,1.0,0.00
73,Oceania,Vanuatu,12189.0,111.0,12.00,163.0,4.0,0.00


In [32]:
# TEST

assert tidy_lang is not None, "Data Frame tidy_lang has not been defined"

assert isinstance(tidy_lang, pd.core.frame.DataFrame), "tidy_lang is not a Data Frame"

In [33]:
# HIDDEN TEST

nrows = 74
ncols = 8
assert tidy_lang.shape == (nrows,ncols), f"tidy_lang has incorrect shape. Expected (74,8), got: {tidy_lang.shape}"

# Check column names
expectname = 'areacontinentcountrylangsmgspopulationstationsstd'
actualname = "".join(sorted([c.lower() for c in list(tidy_lang.columns)]))

assert actualname == expectname, "Column names not as expected"

# Check sum of Langs column

expectedlangs = 6640.0
actuallangs = sum(tidy_lang['Langs'].astype('float64'))

tolerance = 2

assert np.abs(expectedlangs - actuallangs) <= tolerance

### Let's plot!

Now that we have this data in a tidy format, let’s explore it and plot the number of languages spoken in each country in the data set against the country’s population:

In [34]:
alt.Chart(tidy_lang).mark_point().encode(
    x=alt.X('Population').scale(type="log"),
    y=alt.Y('Langs').scale(type="log"),
    color='Continent:N',
    shape='Continent:N',
).interactive()

## Exercise 6 (Challenging)

rubric={accuracy:5}

(This exercise may be more time consuming than the previous ones. Attempt it only if you finish the previous questions early and want a bit more of a challenge.)

The file `data/beach_data.xlsx` contains data from the Narrabeen beach survey program in Sydney, Australia. The survey program started in the 1970s and has continued to the present day. The program aims to measure the width of the beach every few weeks. Measurements are made at five locations along the beach, from location 1 at the northern end to location 5 at the southern end. All the data is available [here](http://narrabeen.wrl.unsw.edu.au/explore_data/time_series/).

Your tasks:

* Determine the largest absolute deviation in width for each beach location in 2010, relative to the mean beach width at that location across all time.
* Determine the standard deviation in width for each beach location in 2010.
* Present the results in a single Data Frame and sort it in descending order of maximum absolute deviation. For example, the corresponding data frame for the year 2011 would look like this:


| Location  | Abs Max | Std |
| --- | --- | --- |
|  3 | 35.805778 | 13.258404 |
|  4 | 31.611717 | 9.892066 |
|  5 | 26.424559 | 7.463615 |
|  2 | 25.652488 | 8.867712 |
|  1 | 23.943018 | 8.244922 |
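
The key detail in the first task is that the mean is taken over all years, while the maximum is taken only within the year of interest. A minimal sketch on invented widths follows; the column names `loc_1` and `loc_2` are hypothetical and may not match the real spreadsheet.

```python
import pandas as pd

# Made-up widths for two hypothetical locations over two years.
toy = pd.DataFrame(
    {
        "Year": [2009, 2009, 2010, 2010],
        "loc_1": [10.0, 20.0, 30.0, 40.0],
        "loc_2": [5.0, 5.0, 5.0, 45.0],
    }
)
locations = ["loc_1", "loc_2"]

# Deviation of every measurement from that location's all-time mean.
deviations = toy[locations] - toy[locations].mean()
in_2010 = toy["Year"] == 2010

summary = pd.DataFrame(
    {
        "Abs Max": deviations[in_2010].abs().max(),
        "Std": toy.loc[in_2010, locations].std(),
    }
).sort_values(by="Abs Max", ascending=False)
print(summary)
```

Subtracting a constant does not change the standard deviation, so it can be computed from either the raw widths or the deviations.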

In [35]:
# BEGIN SOLUTION
beach_df = pd.read_excel('data/beach_data.xlsx', sheet_name = 'Data')
# Subtract the mean of each location to get a DataFrame of deviations from the mean
beach_df.iloc[:,-5:] -= beach_df.iloc[:,-5:].mean()
# Look only at dates from 2010
filter_df = beach_df[beach_df['Year']==2010]
# Ignore the columns for day/month/year
filter_df = filter_df.drop(labels=["Day", "Month", "Year"], axis = 1)
# Make a Data Frame of maximum absolute deviations
ans_df = pd.DataFrame(filter_df.abs().max(), columns=["Abs Max"])
# Add a column for the standard deviations
ans_df["Std"] = filter_df.std()
# Sort by descending order for Abs Max
ans_df.sort_values(by="Abs Max", ascending= False, inplace=True)
# END SOLUTION

**Congratulations!** You are done with the lab! Pat yourself on the back, convert the notebook to PDF, and submit your lab to **GitHub** and Gradescope! Make sure you have at least 3 Git commits!