<div class="alert alert-success" markdown="1">

#### Homework -1

# Example Homework

### EECS 398-003: Practical Data Science, Fall 2024

#### **No Due Date**
    
</div>

## Instructions

Welcome!

**This is not a real assignment**, and is not due. Instead, it exists just to make sure you're able to access and run Jupyter Notebooks locally, along with all of the packages necessary for this class. It also walks you through how your work is autograded, both locally in your notebook and on Gradescope.

To access this notebook, you'll need to clone our [public GitHub repository](https://github.com/practicaldsc/fa24/). The [⚙️ Environment Setup](https://practicaldsc.org/env-setup) page on the course website walks you through the necessary steps.

## Imports and Getting Started

**To start, run the cell below**, either by clicking it and hitting `SHIFT + ENTER` on your keyboard or the Play ▶️ button in the toolbar at the top of the page.

If it runs without an error, that's a great sign. All it's doing is importing several Python libraries and configuring our notebook so that it's ready to go for data analysis. If you'd like, you can open the file `lec_utils.py` – all of the code there is run when we run the cell below. 

(Don't worry if it says "Matplotlib is building the font cache; this may take a moment.")

In [None]:
from lec_utils import *

If everything was successful, you should see a `[1]` to the left of the cell above. This is telling us that the cell above is the first cell we've run so far. If you run the cell above again, the `[1]` will change to a `[2]`. Try it!

Now that all of our packages are loaded, we can use them!

Run the cell below to load in a dataset containing the latitude and longitude of every state's capital.

In [None]:
capitals = pd.read_csv('data/us-state-capitals.csv')
capitals.head()

If you see a table above with 5 rows and 4 columns, you're in good shape. The table above is called a DataFrame, which is `pandas`' name for tables. `pandas` is a popular Python library in the data science ecosystem for working with **tab**ular data (that is, data that looks like a **tab**le).

Note that the bolded values at the far left – that is, **0**, **1**, **2**, ... – are **not part of a column**! Instead, they form the **index** of the DataFrame. We'll hear more about the index in lecture.

Where's Michigan? Above, we ran `capitals.head()`, which is showing us just the head, or first 5 rows, of the DataFrame. We can find the row corresponding to Michigan by **querying** for it:

In [None]:
capitals[capitals['name'] == 'Michigan']
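
If you're curious how the query above works, here's a minimal sketch on a tiny stand-in table. The DataFrame `toy` is made up for illustration (it is not the course dataset, and its coordinates are rounded approximations); only the column names match `capitals`.

```python
import pandas as pd

# Hand-made stand-in for `capitals`; coordinates are rounded approximations.
toy = pd.DataFrame({
    'name': ['Maine', 'Michigan', 'Alaska'],
    'latitude': [44.31, 42.73, 58.30],
    'longitude': [-69.78, -84.56, -134.42],
})

# Comparing a column to a value produces one True/False per row.
mask = toy['name'] == 'Michigan'

# Indexing the DataFrame with that mask keeps only the rows marked True.
michigan = toy[mask]
```

Notice that the row kept in `michigan` still carries its original index label; querying doesn't renumber the index.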

We can also do things like find the state whose capital is the furthest north:

In [None]:
capitals.sort_values('latitude', ascending=False)['name'].iloc[0]

Or furthest east:

In [None]:
capitals.sort_values('longitude')['name'].iloc[-1]
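
To see why one cell above uses `.iloc[0]` and the other uses `.iloc[-1]`, here's a small sketch on a made-up three-row table (again a stand-in with rounded coordinates, not the real `capitals`). The `idxmax` line at the end is just one possible alternative, not something you're expected to use.

```python
import pandas as pd

# Hand-made stand-in for `capitals`; coordinates are rounded approximations.
toy = pd.DataFrame({
    'name': ['Maine', 'Michigan', 'Alaska'],
    'latitude': [44.31, 42.73, 58.30],
    'longitude': [-69.78, -84.56, -134.42],
})

# Descending latitude puts the northernmost row first, so .iloc[0] picks it.
northernmost = toy.sort_values('latitude', ascending=False)['name'].iloc[0]

# Ascending order (the default) puts the largest longitude last. US longitudes
# are negative, so the largest one is the easternmost; .iloc[-1] picks it.
easternmost = toy.sort_values('longitude')['name'].iloc[-1]

# Alternative without sorting: look up the index label of the largest latitude.
northernmost_alt = toy.loc[toy['latitude'].idxmax(), 'name']
```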

We can also create powerful visualizations, like a map of the 50 states with a circle at each capital:

In [None]:
fig = px.scatter_geo(capitals, lat='latitude', lon='longitude', hover_name='description')
fig.update_layout(geo_scope='usa', template='plotly', title='Locations of US State Capitals')

Cool! Note that the map is interactive, meaning you can hover over each dot to see the name of the capital.

## The Autograder

As you may have seen in other programming classes, your work in this class will be autograded – that is, automatically graded by the computer. The Python module we'll be using for autograding is called [Otter Grader](https://otter-grader.readthedocs.io/en/latest/), and it was developed at UC Berkeley specifically for use in data science classes like ours.

Run the cell below to import `otter` and initialize it for this notebook. In most homeworks, this cell will be the very first cell in your notebook.

In [None]:
import otter
grader = otter.Notebook("example-hw.ipynb")

Let's work through a few example questions to get a feel for how it works.

### Question 1: Seconds in an Hour

Below, you should see a question, a place to write your answer, and another cell containing `grader.check('q1')`. Running this last cell will check your answer to the question. If it's wrong, you'll see an error message. Try putting in a really small number, like 15, just to see what happens.

Assign `seconds_in_an_hour` to the number of seconds in an hour.

In [None]:
seconds_in_an_hour = ...
seconds_in_an_hour

In [None]:
grader.check("q1")

### Question 2: Furthest South Capital

Unlike the question above, most questions we ask you will involve writing multiple lines of code. You're always free to define intermediate variables before the final answer, as long as your final answer is assigned to the correct variable name.

Typically, the tests that you have access to in your notebook only verify that your answer is of the right data type and on the right track. These tests **will not** guarantee that your answer is correct. We will run hidden tests on your code once you submit to Gradescope. Here's an example of how that may work.

Assign `state_capital_furthest_south` to the name of the state whose capital is furthest south. You **shouldn't** hard-code the answer – that is, don't type `'Texas'` if you think the answer is Texas – rather, you should use Python code to arrive at the answer. (Note that we already did something very similar right before we drew the map above – you can start with that code.) 

In [None]:
state_capital_furthest_south = ...
state_capital_furthest_south

In [None]:
grader.check("q2")

Notice above that no matter what state you give as your answer, `grader.check('q2')` tells you your answer passes all of the test cases. (Try putting `'Michigan'` as your answer and see what happens.) That's because we're only verifying that your answer is indeed a state, but not necessarily the correct state. When you submit this notebook to Gradescope, it'll verify that your answer is actually correct.
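
The difference between the two kinds of tests can be sketched in plain Python. Everything below is hypothetical: `public_check`, `hidden_check`, and the three-name `state_names` set are invented for illustration and are not how Otter Grader or our actual tests are written.

```python
# Hypothetical stand-in: pretend these are the only valid state names.
state_names = {'Michigan', 'Ohio', 'Maine'}

def public_check(answer):
    """Passes for any valid state name, whether or not it's the right one."""
    return isinstance(answer, str) and answer in state_names

def hidden_check(answer, expected):
    """Passes only when the answer is valid and matches the expected value."""
    return public_check(answer) and answer == expected
```

With these definitions, `public_check('Michigan')` passes no matter what the true answer is, while `hidden_check('Michigan', 'Ohio')` fails. That's the same gap you see between `grader.check` in your notebook and the hidden tests on Gradescope.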

### Question 3: First Letter

Often, you'll be asked to define a function rather than a standalone variable. In such questions, our test cases will assess the behavior of your function on various inputs, including edge cases. Here's one such question.

Complete the implementation of the function `starts_with`, which takes in `letter`, a string of length one, and returns a list containing the names of all the states whose first letter is `letter`, in any order. If there are no states that begin with `letter`, return the string `'No such states'`. Example behavior is shown below.

```python
>>> starts_with('P')
['Pennsylvania']

>>> starts_with('A')
['Alabama', 'Alaska', 'Arizona', 'Arkansas']

>>> starts_with('X')
'No such states'
```

Note that you're not supposed to already know how to do this! Try if you'd like, and if you get stuck, click the box below to see the answer. You're welcome to create additional cells above the one below to experiment with code on the way to writing your solution.
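
One way to experiment is to turn the example behavior above into quick checks of your own. The helper below, `check_examples`, is a hypothetical name we made up for this sketch; it isn't part of the autograder, and passing it doesn't guarantee that the hidden tests pass.

```python
def check_examples(fn):
    """Raise an AssertionError if fn disagrees with the examples shown above."""
    # The order of the returned names doesn't matter, so sort before comparing.
    assert sorted(fn('A')) == ['Alabama', 'Alaska', 'Arizona', 'Arkansas']
    assert fn('P') == ['Pennsylvania']
    # The edge case: a string comes back instead of an empty list.
    assert fn('X') == 'No such states'
```

Once you've defined `starts_with`, you could run `check_examples(starts_with)` in a new cell; if nothing is printed and no error appears, your function agrees with all three examples.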

<details>
    <summary><b><span style="background-color: #FFCB05; color: #00274C">Click me</span> to see the solution, which you can copy-paste into the function definition below.</b></summary>

```python
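    names = capitals['name']
    # Keep only the names whose first character matches `letter`.
    names_starting = list(names[names.str[0] == letter])
    if len(names_starting) == 0:
        # Return (don't print) the message, as the question specifies.
        return 'No such states'
    return names_starting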
```

</details>

In [None]:
def starts_with(letter):
    ...

In [None]:
grader.check("q3")

## Finish Line 🏁

Congratulations! You're ready to submit the Example Homework. Again, this isn't required, but it's a good idea to walk through this before Homework 1 rolls around.

To submit your homework:

1. Select `Kernel -> Restart & Run All` to ensure that you have executed all cells, including the test cells.
2. Read through the notebook to make sure everything is fine and all tests passed.
3. Run the cell below to run all tests, and make sure that they all pass.
4. Download your notebook using `File -> Download as -> Notebook (.ipynb)`, then upload your notebook to Gradescope under "Example Homework (Not Due)".
5. Stick around while the Gradescope autograder grades your work. Make sure you see that all **public tests** have passed on Gradescope. **Remember that homeworks have hidden tests, which you will not see your scores on until a few days after the deadline!**
6. Check that you have a confirmation email from Gradescope and save it as proof of your submission.

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()