In [2]:
## Please make sure "project.py" is in your "lab5" folder.
## We will use the same API for both the lab and the project
import project
import datetime

----------------------------------
## Segment 2: Learning the API
### Task 2.1: Examine the `hurricanes` CSV file

The `project.py` file will allow you to access the dataset you'll use this week, `hurricanes.csv`. Start by looking at the hurricane dataset [here](https://github.com/msyamkumar/cs220-s22-projects/blob/main/lab-p5/hurricanes.csv) pulled from the [List of United States hurricanes](https://en.wikipedia.org/wiki/List_of_United_States_hurricanes) on Wikipedia.

Look at a hurricane in the dataset, such as Hurricane Baker, and briefly familiarize yourself with each of the columns. The data shows:
* name
* the date of formation
* the date of dissipation
* max wind speed (in MPH)
* damage (in US dollars)
* deaths

Often, we'll organize data by assigning numbers (called indexes) to different parts of the data (e.g., rows or columns in a table). In Computer Science, indexing typically starts with the number `0`; i.e., when you have a sequence of things, you'll start counting them from `0` instead of `1`. Thus, you should **ignore the numbers shown by GitHub to the left of the rows**. From the perspective of `project.py`, the indexes of Baker, Camille, and Eloise are 0, 1, and 2 respectively (and so on).

For example, consider this excerpt from `hurricanes.csv`:

<img src="https://github.com/msyamkumar/cs220-s22-projects/raw/main/lab-p5/table.png" width="240" alt="Hurricanes outlined with position and name: 1: Baker, 2: Camille, 3: Eloise, 4: Frederic, 5: Elena">

The **index** of Hurricane Eloise is 2, but its actual **location** (counting from 1) is 3.
Therefore, you must follow the zero-based convention for all the questions
asking for the value at a particular index.
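
The difference between index and location can be sketched with a plain Python list. The list `names` below is a hypothetical stand-in for the five rows shown in the table above; it is not necessarily how `project.py` stores its data.

```python
# Hypothetical stand-in for the five rows shown in the table.
names = ["Baker", "Camille", "Eloise", "Frederic", "Elena"]

index = names.index("Eloise")   # zero-based index
position = index + 1            # one-based location, as a person would count

print("index:", index)          # index: 2
print("position:", position)    # position: 3
print(names[2])                 # Eloise
```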


### Task 2.2: Explore the API
Use the inspection process we learned in Lab-P3 and Lab-P4 to learn more about the `project` API. In Lab-P4, we saw how to use `dir` and `help` to learn an API. Run the following cells to explore the API:

In [3]:
dir(project)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__hurricane__',
 '__init__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'count',
 'get_damage',
 'get_deaths',
 'get_dissipated',
 'get_formed',
 'get_mph',
 'get_name']

Spend some time reading about each of the seven functions that don't begin with two underscores. For example, `help(project.count)` describes `count`; the cell below calls `help` on all seven:

In [15]:
help(project.count)
help(project.get_deaths)
help(project.get_formed)
help(project.get_mph)
help(project.get_dissipated)
help(project.get_name)
help(project.get_damage)

Help on function count in module project:

count()
    This function will return the number of records in the dataset

Help on function get_deaths in module project:

get_deaths(idx)
    get_deaths(idx) returns the deaths of the hurricane in row idx

Help on function get_formed in module project:

get_formed(idx)
    get_formed(idx) returns the date of formation of the hurricane in row idx

Help on function get_mph in module project:

get_mph(idx)
    get_mph(idx) returns the mph of the hurricane in row idx

Help on function get_dissipated in module project:

get_dissipated(idx)
    get_dissipated(idx) returns the date of dissipation of the hurricane in row idx

Help on function get_name in module project:

get_name(idx)
    get_name(idx) returns the name of the hurricane in row idx

Help on function get_damage in module project:

get_damage(idx)
    get_damage(idx) returns the damage in dollars of the hurricane in row idx



Alternatively, you could run the following to see just the function's documentation:

In [6]:
print(project.count.__doc__)

This function will return the number of records in the dataset


You may also open up the `project.py` file directly to learn about the functions provided. E.g., you might see this:

```python
def count():
    """This function will return the number of records in the dataset"""
    return len(__hurricane__)
```

You don't need to understand the code in the functions, but the strings in triple quotes (called *docstrings*) explain what each function does. As it turns out, `project.count.__doc__` simply gives you the docstring of the `count` function.
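
As a minimal sketch of how docstrings, `__doc__`, and `help` fit together, consider the function below. The function `double` is a hypothetical example and is not part of `project.py`.

```python
def double(x):
    """Return twice the value of x."""
    return 2 * x

# __doc__ holds the docstring as an ordinary string.
print(double.__doc__)   # Return twice the value of x.

# help() prints the function signature together with the same docstring.
help(double)
```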

Try to learn about the other functions in `project.py` by using the `help` function. For example, you may try:


In [7]:
help(project.get_name)

Help on function get_name in module project:

get_name(idx)
    get_name(idx) returns the name of the hurricane in row idx



In [None]:
# TODO: Try getting help for each of the functions.

Complete the following TODOs, and check your results against what you see in `hurricanes.csv`.

**Remember:** In Computer Science, we start indexing at 0. GitHub numbers the first data row as 2 (the row following the header row); ignore this.

In [8]:
# Get the name of the hurricane at index 0.
# This one is done for you.
project.get_name(0)

'Baker'

In [9]:
# TODO: Get the name of the hurricane at index 1.
# Your answer should be 'Camille'. Verify this in the CSV as well.

project.get_name(1)

'Camille'

In [11]:
# Get the wind speed of the hurricane at index 2.
# This one is done for you.
project.get_mph(2)

125

In [12]:
# TODO: Get the wind speed of the hurricane at index 7.

project.get_mph(7)


165

In [13]:
# TODO: Get the damage of the hurricane at index 5.
project.get_damage(5)

'4.7B'

Notice that the damage amount ends with a "B". In this dataset, "K" represents one thousand, "M" represents one million, and "B" represents one billion. For P5, you'll need to convert these strings to the appropriate ints (e.g., `"1.5K"` will become `1500`, `"2.55M"` will become `2550000`).

In [104]:
# Get the name of the hurricane at the end of the dataset.
# This one is done for you.
project.get_name(project.count() - 1)



142


In [18]:
# Try getting the name at project.count() instead. What happens? Why?
project.get_name(project.count())

IndexError: list index out of range
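
The error above follows from zero-based indexing: a sequence with `n` items has valid indexes `0` through `n - 1`, so index `n` is one past the end. Here is a small sketch using a hypothetical three-item list named `rows` (an assumption for illustration, not the real dataset):

```python
rows = ["Baker", "Camille", "Eloise"]   # hypothetical three-row dataset

last_idx = len(rows) - 1
print(rows[last_idx])        # Eloise

try:
    rows[len(rows)]          # one past the last valid index
except IndexError as err:
    print("IndexError:", err)
    caught = True
else:
    caught = False
```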

----------------------------------
## Segment 3: Working with strings

### Task 3.1: Indexing / slicing Strings

Stepping back from the Hurricane data, Tasks 3.1 and 3.2 introduce us to performing operations with strings. While this will be covered in more detail during Friday's lecture, we will cover the essentials now.

We can think of a string as a sequence of characters. For example, the string `my_str = 'hello_world!'` can be written as...

| Index  | 0    | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   | 11   |
| ------ | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| String | h    | e    | l    | l    | o    | _    | w    | o    | r    | l    | d    | !    |

... where we can then access specific characters of the string by an index, e.g. `my_str[0]` which returns `'h'` or `my_str[8]` which returns `'r'`.

Furthermore, we can "slice" strings -- that is, get a particular section of characters. For example,

- `my_str[1:5]` returns `'ello'`
- `my_str[:8]` returns `'hello_wo'`
- `my_str[5:]` returns `'_world!'`
- `my_str[:]` returns `'hello_world!'`

Try running this in the cell below.

In [52]:
my_str = 'hello_world!'
print("my_str[0] returns", my_str[0])
print("my_str[8] returns", my_str[8])
print("my_str[1:5] returns", my_str[1:5])
print("my_str[:8] returns", my_str[:8])
print("my_str[5:] returns", my_str[5:])
print("my_str[:] returns", my_str[:])

my_str[0] returns h
my_str[8] returns r
my_str[1:5] returns ello
my_str[:8] returns hello_wo
my_str[5:] returns _world!
my_str[:] returns hello_world!


Notice that slicing is *inclusive* on the lower bound and *exclusive* on the upper bound. We can also leave out the lower bound to start from the beginning (e.g. `my_str[:6]`) or the upper bound to continue through the end (e.g. `my_str[8:]`). Lastly, a negative index counts *backwards* from the *end* of the string.

In [None]:
print("my_str[-1] returns", my_str[-1])
print("my_str[-4:-1] returns", my_str[-4:-1])

**Your Turn!** Try slicing the below phone number! Can you extract the area code (first 3 digits), exchange code (middle 3 digits), and line number (last 4 digits) of the given phone number?

In [None]:
phone_number = "608-555-1234"
area_code = phone_number[:3]
exchange_code = phone_number[4:7]
line_number = phone_number[8:]
print("area_code:", area_code)
print("exchange_code:", exchange_code)
print("line_number:", line_number)

In [None]:
# TODO: use slicing to extract just the last digit of the phone number
last_digit = phone_number[-1:]
print("last digit:", last_digit)

### Task 3.2 Case-Sensitivity

Other helpful string functions include `upper` and `lower`. `upper` converts a string to all UPPERCASE letters, while `lower` converts a string to all lowercase letters.

In [24]:
print('helLO wOrLd'.upper())
print('helLO wOrLd'.lower())

HELLO WORLD
hello world


If we want to see if the user typed in `cs220`, we should also accept `cS220`, `Cs220`, and `CS220`. We can use `upper` or `lower` to do that! 

In [None]:
my_class = input("What class are you in? ")
if my_class.upper() == "CS220": # notice we compare this to an uppercase CS220
    print("Right on!")
else:
    print("That must be some other class...")

**Your Turn!** Ask the user to type in their name. If their name matches your name, tell them so!  **This should be case-insensitive.**

In [28]:
my_name = input("What's your name? ")
if my_name.lower() == "naved": # TODO: Check if they typed in your name! Use all lowercase.
    print("We share names!")
else:
    print("Oh well...")

What's your name? naved
We share names!


### Task 3.3: Calculating Damage Costs

Task 2.2 showed us that damage costs are calculated in thousands, millions, and billions. It would be helpful to have code that converts this string into an integer.

We can slice off the *last* character by using the index `:-1` (that is the entire string *up to* the last character).

Complete the code to print what the cost ends in (e.g. `K`, `M`, or `B`).

In [31]:
my_cost = "3.3B"
print("Cost Amount:", my_cost[:-1])
print("Ending In:  ", my_cost[-1])

Cost Amount: 3.3
Ending In:   B
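
Combining the two slices gives one possible way to turn a damage string into an int. The sketch below is only an illustration under stated assumptions: the helper name `damage_to_int` and the handling of strings without a suffix are hypothetical, and the approach you are asked to use in P5 may differ.

```python
def damage_to_int(damage):
    """Convert a damage string such as '1.5K' to an int (hypothetical helper)."""
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    suffix = damage[-1]               # last character, e.g. 'K'
    if suffix in multipliers:
        amount = float(damage[:-1])   # everything before the suffix
        # round() absorbs floating-point error and returns an int
        return round(amount * multipliers[suffix])
    # Assumption: a string with no suffix is already a plain number.
    return round(float(damage))

print(damage_to_int("1.5K"))    # 1500
print(damage_to_int("2.55M"))   # 2550000
print(damage_to_int("3.3B"))    # 3300000000
```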


### Task 3.4: Extracting from a Date

Run the below cell which prints the formation and dissipation date of the first hurricane.

In [33]:
print(project.get_formed(0))
print(project.get_dissipated(0))

08/18/1950
09/01/1950


The dates are represented as a string in `mm/dd/yyyy` notation. Two digits are used to represent the month and day even when they can be represented with a single digit, that is, `'9/1/1950'` is represented as `'09/01/1950'`.

To extract the month, we could run the following code...

In [34]:
project.get_formed(0)[:2]

'08'

Notice, however, that this is the *string* `'08'`.

Write the code to get this as the *int* (e.g. `8`).

In [38]:
# TODO: Get the month of the first hurricane as an integer.
int(project.get_formed(0)[:2])


8

### Task 3.5: Helper Functions for Month, Day, and Year

The below functions will be useful in p5. Complete the TODOs for getting the month, day, and year as an int.

In [77]:
def get_month(date):
    """Returns the month when the date is in the 'mm/dd/yyyy' format"""
    return int(date[0:2])

def get_day(date):
    """Returns the day when the date is in the 'mm/dd/yyyy' format"""
    return int(date[3:5])

def get_year(date):
    """Returns the year when the date is in the 'mm/dd/yyyy' format"""
    return int(date[6:10])

Write some test cases (e.g., `get_year("10/02/2022")`) to check if your functions are correct.

In [84]:
# TODO Write a test case for get_month (expected: 5)
print(get_month("05/10/2022"))

# TODO Write a test case for get_day (expected: 10)
print(get_day("05/10/2022"))

# TODO Write a test case for get_year (expected: 2022)
print(get_year("05/10/2022"))

5
10
2022
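
Another way to write test cases is with `assert`, which stays silent when a check passes and raises an `AssertionError` when it fails. The sketch below repeats the three helper definitions from Task 3.5 so that it runs on its own; the use of `assert` is a suggestion rather than something the lab requires.

```python
def get_month(date):
    """Returns the month when the date is in the 'mm/dd/yyyy' format"""
    return int(date[0:2])

def get_day(date):
    """Returns the day when the date is in the 'mm/dd/yyyy' format"""
    return int(date[3:5])

def get_year(date):
    """Returns the year when the date is in the 'mm/dd/yyyy' format"""
    return int(date[6:10])

# Each assert compares a helper's result with the value read off the string.
assert get_month("10/02/2022") == 10
assert get_day("10/02/2022") == 2
assert get_year("10/02/2022") == 2022
print("all helper tests passed")
```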

### Task 3.6: Using Helper Functions

Using the helper functions you made above, complete the following...

**Hint:** You'll use these helper functions in combination with `project.get_formed(idx)` and `project.get_dissipated(idx)`!

Print the *day* that the hurricane at index `10` *formed*.

In [86]:
# TODO: Print the day that the hurricane at index 10 formed.
#       This should be 7
get_day(project.get_formed(10))

7

Print the *year* that the hurricane at index `7` *formed*.

In [89]:
# TODO: Print the year that the hurricane at index 7 formed.
#       This should be 2004

get_year(project.get_formed(7))

2004

Print the *month* that the hurricane at index `2` *dissipated*.

In [90]:
# TODO: Print the month that the hurricane at index 2 dissipated.
#       This should be 9

get_month(project.get_dissipated(2))

9

----------------------------------
## Segment 4: Looping

### Task 4.1: `while` and `for` loops

Run the below code and observe the output.

In [19]:
i = 0
while i < 5:
    print(i)
    i += 1

0
1
2
3
4


Equivalently, we can use `for` and `range(n)`. The `range(n)` function returns a sequence of numbers, from `0` to `n` but not including `n`.

In [20]:
for i in range(5):
    print(i)

0
1
2
3
4
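
Because `range(n)` stops just before `n`, an inclusive upper bound needs `n + 1`. A short sketch, converting each range to a list so its contents are visible (the variable names are only illustrative):

```python
print(list(range(5)))        # [0, 1, 2, 3, 4]
print(list(range(2, 6)))     # [2, 3, 4, 5]

n = 5
inclusive = list(range(n + 1))   # add 1 so that n itself is included
print(inclusive)             # [0, 1, 2, 3, 4, 5]
```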


Now, write the code that will print the numbers from 0 to 25 *inclusive* as both a `while` and `for` loop.

In [21]:
# TODO Write a while loop that prints the numbers from 0 to 25 inclusive.
 
j = 0 
while j < 26:
    print(j)
    j += 1

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25


In [23]:
# TODO Write a for loop that prints the numbers from 0 to 25 inclusive.

for i in range(26):
    print(i)


0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25


### Task 4.2: Print Hurricane Data

Print the index, name, and wind speed of each hurricane. Your output should show all the entries in the dataset.

In [42]:
# range takes in a number. We want to iterate over all entries.
# how do we get the number of entries without hardcoding it?
for idx in range(project.count()):
    name = project.get_name(idx)
    wind_speed = project.get_mph(idx)
    print(idx, name, wind_speed, sep='\t')

0	Baker	105
1	Camille	175
2	Eloise	125
3	Frederic	130
4	Elena	125
5	Opal	150
6	Danny	80
7	Ivan	165
8	Dennis	150
9	Katrina	175
10	Michael	160
11	SaLLy	110
12	Zeta	115
13	Carol	115
14	Donna	145
15	Gloria	145
16	Bob	115
17	1893 Sea Islands Hurricane	120
18	1919 Florida Keys Hurricane	150
19	Tampa Bay Hurricane of 1921	140
20	Nassau Hurricane of 1926	140
21	1926 Miami hurricane	150
22	Labor Day	185
23	Easy	120
24	King	130
25	Florence	115
26	Flossy	90
27	Cleo	150
28	Dora	130
29	Isbell	115
30	Betsy	140
31	Alma	115
32	Inez	165
33	Gladys	100
34	Agnes	85
35	David	175
36	Kate	120
37	Floyd	75
38	Andrew	175
39	Erin	100
40	Earl	100
41	Georges	155
42	Irene	110
43	Charley	150
44	Frances	145
45	Jeanne	120
46	Rita	180
47	Wilma	185
48	Hermine	80
49	Matthew	165
50	Irma	180
51	Ethel	115
52	Hilda	140
53	Edith	160
54	Carmen	150
55	Babe	75
56	Bob	75
57	Danny	90
58	Juan	85
59	Florence	80
60	Lili	145
61	Cindy	75
62	Humberto	90
63	Gustav	155
64	Isaac	80
65	Nate	90
66	Barry	75
67	Delta	140
68	Ida	150
69	Belle	12

### Task 4.3: Filter Hurricanes by Speed

Print the names of all hurricanes with a speed under 80mph. There are 8 such hurricanes.

In [51]:
# TODO: Print the names of all hurricanes with a speed under 80mph.

for idx in range(project.count()):
    name = project.get_name(idx)
    speed = project.get_mph(idx)
    if speed < 80:
        print(name)

Floyd
Babe
Bob
Cindy
Barry
Cindy
Bob
Gaston


### Task 4.4: Filter Hurricanes by Deaths

Print the names of all hurricanes with over 1000 deaths. There are 5 such hurricanes.

In [96]:
# Print the names of all hurricanes with over 1000 deaths.

for idx in range(project.count()):
    name_hurricanes = project.get_name(idx)
    death_number = project.get_deaths(idx)
    if death_number > 1000:
        print(name_hurricanes)



Katrina
David
Jeanne
San Ciriaco
Maria


### Task 4.5: Filter Hurricanes by Name

Print the names of all hurricanes that start with letter "D".

In [54]:
# Print the names of all hurricanes that start with letter "D". There are 12 such hurricanes, counting repeats.

for idx in range(project.count()):
    name = project.get_name(idx)
    if name[0] == "D":
        print(name)


Danny
Dennis
Donna
Dora
David
Danny
Delta
Diana
Debra
Dolly
Dot
Dolphin


### Task 4.6: Find the Fastest Hurricane

Print the name of the hurricane which has the fastest wind speed.

*Special Note*: `None` is a Python keyword which denotes *nothing*. At the beginning of this loop, by saying `fastest_hurr_idx = None`, we make no assumptions about what the fastest hurricane is. Inside the loop, if the `fastest_hurr_idx` is `None`, we know that is our first (and currently fastest) hurricane.

In [55]:
fastest_hurr_idx = None
max_speed = 0
for idx in range(project.count()):
    current_speed = project.get_mph(idx)
    # The first hurricane always becomes the initial "fastest so far".
    if fastest_hurr_idx is None or current_speed > max_speed:
        max_speed = current_speed
        fastest_hurr_idx = idx

if fastest_hurr_idx is not None:
    print(project.get_name(fastest_hurr_idx), 'had the fastest speed of', max_speed)

Allen had the fastest speed of 190
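
The same "best so far" pattern can be tried on a small list without the dataset. In this sketch the list `speeds` is a hypothetical stand-in for the values returned by `project.get_mph`, and the variable names are illustrative only.

```python
speeds = [105, 175, 125, 130]   # hypothetical wind speeds in MPH

best_idx = None      # no assumption yet about which entry is fastest
best_speed = None
for idx in range(len(speeds)):
    # The first entry is accepted unconditionally; later ones must beat it.
    if best_idx is None or speeds[idx] > best_speed:
        best_speed = speeds[idx]
        best_idx = idx

print(best_idx, best_speed)     # 1 175
```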


### Task 4.7: Find the Slowest Hurricane

Print the name of the hurricane which has the slowest wind speed.

In [None]:
slowest_hurr_idx = None
min_speed = 0
for idx in range(project.count()):
    current_speed = project.get_mph(idx)
    # The first hurricane always becomes the initial "slowest so far".
    if slowest_hurr_idx is None or current_speed < min_speed:
        min_speed = current_speed
        slowest_hurr_idx = idx

if slowest_hurr_idx is not None:
    print(project.get_name(slowest_hurr_idx), 'had the slowest speed of', min_speed)

### Task 4.8: Print Hurricanes Between

Given `start_year` and `end_year`, print the names of all hurricanes that *were formed* in between (inclusive).

In [94]:
def print_hurricanes_between(start_year, end_year):
    for i in range(project.count()):
        # TODO: Check if the year the hurricane formed is in range.
        # HINT: use get_year to get the year of the current hurricane
        if start_year <= get_year(project.get_formed(i)) <= end_year:
            print(project.get_name(i), "happened on", project.get_formed(i))

print_hurricanes_between(2017, 2021)

Michael happened on 10/07/2018
SaLLy happened on 09/11/2020
Zeta happened on 10/24/2020
Irma happened on 08/30/2017
Nate happened on 10/04/2017
Barry happened on 07/11/2019
Delta happened on 10/04/2020
Ida happened on 08/26/2021
Florence happened on 08/31/2018
Isaias happened on 07/30/2020
Harvey happened on 08/17/2017
Hanna happened on 07/23/2020
Mangkhut happened on 09/06/2018
Yutu happened on 10/21/2018
Maria happened on 09/16/2017


----------------------------------
## Segment 5: Working with the datetime module

The code below uses Python's [datetime module](https://docs.python.org/3/library/datetime.html), which will be used further in p5.

Execute the below function definition and its calls. It will calculate the number of days between 2 dates.

In [105]:
def get_number_of_days(start_date, end_date):
    """Gets the number of days between the start_date (in 'mm/dd/yyyy' format) and end_date 
    (in 'mm/dd/yyyy' format)"""
    # The second argument is a format string to tell the function how to process the date string
    day1 = datetime.datetime.strptime(start_date, '%m/%d/%Y') 
    day2 = datetime.datetime.strptime(end_date, '%m/%d/%Y')
    delta = day2 - day1
    return delta.days

In [106]:
print(get_number_of_days('02/21/2022', '02/23/2022'))
print(get_number_of_days('01/01/2021', '01/01/2022'))
print(get_number_of_days('04/20/2022', '08/12/2022'))

2
365
114


The function `get_number_of_days` uses the `datetime` module to calculate this for us in two steps:

1. Convert the dates into a datetime object (we'll talk about objects later in the semester) using `datetime.datetime.strptime`.

2. Subtract the objects `day2 - day1` and return the difference in days `delta.days`.
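
The two steps can be seen one at a time in the sketch below, which uses the formation and dissipation dates printed in Task 3.4. The variable names are illustrative.

```python
import datetime

# Step 1: convert each 'mm/dd/yyyy' string into a datetime object.
day1 = datetime.datetime.strptime("08/18/1950", "%m/%d/%Y")
day2 = datetime.datetime.strptime("09/01/1950", "%m/%d/%Y")

# Step 2: subtracting two datetime objects gives a timedelta object.
delta = day2 - day1
print(type(delta).__name__)    # timedelta
print(delta.days)              # 14
```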

### Task 5.1: Calculating Hurricane Duration

We can calculate how long a hurricane lasts as the number of days between `project.get_formed(idx)` and `project.get_dissipated(idx)`. Complete the function to calculate this duration.

In [109]:
def get_hurricane_duration(hurricane_idx):
    # Calculate the duration between when the hurricane formed and dissipated.
    duration = get_number_of_days(project.get_formed(hurricane_idx), project.get_dissipated(hurricane_idx))
    return duration

Test your code using the below. Hurricane Karen should last 11 days and Hurricane Cindy should last 6 days.

In [110]:
# Hurricane Karen
hurricane1_idx = 118
hurricane1_name = project.get_name(hurricane1_idx)
hurricane1_duration = get_hurricane_duration(hurricane1_idx)
print(hurricane1_name, 'lasts', hurricane1_duration, 'days.')

Karen lasts 11 days.


In [111]:
# Hurricane Cindy
hurricane2_idx = 90
hurricane2_name = project.get_name(hurricane2_idx)
hurricane2_duration = get_hurricane_duration(hurricane2_idx)
print(hurricane2_name, 'lasts', hurricane2_duration, 'days.')

Cindy lasts 6 days.


### Task 5.2: Finding Hurricane with Longest Duration

Using an algorithm similar to Task 4.6 or 4.7, find the hurricane that has the longest duration.

In [125]:
# TODO: Use an algorithm similar to 4.6 or 4.7.
# HINT: Use the get_hurricane_duration function used in 5.1!
longest_hurr_idx = None
longest_duration = 0

for idx in range(project.count()):
    current_duration = get_hurricane_duration(idx)
    # The first hurricane always becomes the initial "longest so far".
    if longest_hurr_idx is None or current_duration > longest_duration:
        longest_duration = current_duration
        longest_hurr_idx = idx

if longest_hurr_idx is not None:
    print(project.get_name(longest_hurr_idx))



San Ciriaco


----------------------------------
You are now ready to work on [p5](https://github.com/msyamkumar/cs220-s22-projects/tree/main/p5)!
Remember to work on p5 only with your partner from this point on. Have fun!