# Coding Discussion 02 - Week 3
## 13 September 2020
##### Kryslette Bunyi

***
## Instructions
In the repository, I have provided a dataset from the New York Times on the number of COVID cases and deaths by day, from January 21 through September 2. Note that states do not appear in the dataset before the day on which they recorded their first confirmed case.

The chunk of code below imports the data for this assignment. We'll talk more in Week 5 about what this code is doing.

In [1]:
## Read in the data (we would provide this)
import csv
with open('us-states.csv') as file:
    state_covid_data = []
    for row in csv.reader(file):
        state_covid_data.append(row)

len(state_covid_data)

10080

Note that the data is imported as a nested list. Below I've sliced the list to print off the first 5 rows of the data. Note that the first row contains the variable names.

In [2]:
state_covid_data[:5]

[['date', 'state', 'fips', 'cases', 'deaths'],
 ['2020-01-21', 'Washington', '53', '1', '0'],
 ['2020-01-22', 'Washington', '53', '1', '0'],
 ['2020-01-23', 'Washington', '53', '1', '0'],
 ['2020-01-24', 'Illinois', '17', '1', '0']]
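
Since the header and the data rows share one list, it can help to separate them before doing any analysis, and to keep in mind that `csv.reader` yields every field as a string by default. The following is a minimal sketch, assuming a hypothetical `sample_rows` list that copies the five rows printed above and stands in for `state_covid_data`:

```python
# Stand-in for state_covid_data, copied from the five rows printed above
sample_rows = [
    ['date', 'state', 'fips', 'cases', 'deaths'],
    ['2020-01-21', 'Washington', '53', '1', '0'],
    ['2020-01-22', 'Washington', '53', '1', '0'],
    ['2020-01-23', 'Washington', '53', '1', '0'],
    ['2020-01-24', 'Illinois', '17', '1', '0'],
]

# Starred unpacking puts the first row in `header` and the rest in `rows`
header, *rows = sample_rows

# Look up a column position by name instead of hard-coding the index
cases_idx = header.index('cases')

# Fields are strings, so convert before doing arithmetic or comparisons
first_cases = int(rows[0][cases_idx])

print(header)
print(len(rows), "data rows; first case count:", first_cases)
```

The names `header`, `rows`, `cases_idx`, and `first_cases` are illustrative only; the cells below keep working with `state_covid_data` directly.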

***
## Questions

### (1) Count up the number of _unique_ dates in the data. 

In [3]:
# Initialize an empty list that will contain the dates in the data
dates=[]

# Iterate through the rows of the dataset to compile the dates.
for row in range(1,len(state_covid_data)): # Skip the 1st row since it's a header
    dates.append(state_covid_data[row][0]) # Extract the date and append it to our list

# Convert the list to a set to keep only the unique dates, then count its elements with len().
print("There are",len(set(dates)),"unique dates in the data.")

There are 225 unique dates in the data.
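
The same count can also be written more compactly with a set comprehension that skips the header by slicing. This is a sketch only, run on a small hypothetical `sample_rows` list in the same nested-list format rather than on the full dataset:

```python
# Stand-in for state_covid_data. The first five rows match the rows printed
# earlier; the last row is an illustrative addition so that one date repeats.
sample_rows = [
    ['date', 'state', 'fips', 'cases', 'deaths'],
    ['2020-01-21', 'Washington', '53', '1', '0'],
    ['2020-01-22', 'Washington', '53', '1', '0'],
    ['2020-01-23', 'Washington', '53', '1', '0'],
    ['2020-01-24', 'Illinois', '17', '1', '0'],
    ['2020-01-24', 'Washington', '53', '1', '0'],
]

# Slice off the header with [1:], then collect each row's date into a set;
# a set stores each distinct value once, so repeated dates collapse
unique_dates = {row[0] for row in sample_rows[1:]}

print("There are", len(unique_dates), "unique dates in the sample.")
```

Iterating over the rows themselves (rather than over `range(1, len(...))`) avoids index bookkeeping while producing the same set of dates.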


### (2) Find the first date in which the District of Columbia recorded a case. 

In [4]:
## Preparation Part 1 of 2
# Check the state data for the exact name format of the District of Columbia.
# (The header row is not skipped here, so the column name 'state' also appears in the output.)

states = [i[1] for i in state_covid_data]
print("States covered in the dataset:\n", set(states))

States covered in the dataset:
 {'Tennessee', 'District of Columbia', 'Hawaii', 'Minnesota', 'Massachusetts', 'Idaho', 'Virginia', 'Kansas', 'Alaska', 'Rhode Island', 'North Carolina', 'Guam', 'California', 'state', 'North Dakota', 'Wyoming', 'Texas', 'Montana', 'Arizona', 'New Jersey', 'Vermont', 'Utah', 'Illinois', 'New York', 'Connecticut', 'Mississippi', 'Missouri', 'Puerto Rico', 'West Virginia', 'Virgin Islands', 'Washington', 'Maine', 'Maryland', 'Oklahoma', 'Iowa', 'Arkansas', 'Louisiana', 'South Carolina', 'Pennsylvania', 'Northern Mariana Islands', 'Georgia', 'New Hampshire', 'Michigan', 'South Dakota', 'Florida', 'Indiana', 'Ohio', 'Delaware', 'Wisconsin', 'Colorado', 'Nevada', 'Kentucky', 'Oregon', 'Alabama', 'Nebraska', 'New Mexico'}


In [5]:
# The above result confirms that the correct name format is "District of Columbia".

## Preparation Part 2 of 2
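# Inspect the distinct dates in the data. Note that a set does not preserve row order,
# so this shows which dates are present rather than the order in which the rows were encoded.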

set(dates)

{'2020-01-21',
 '2020-01-22',
 '2020-01-23',
 '2020-01-24',
 '2020-01-25',
 '2020-01-26',
 '2020-01-27',
 '2020-01-28',
 '2020-01-29',
 '2020-01-30',
 '2020-01-31',
 '2020-02-01',
 '2020-02-02',
 '2020-02-03',
 '2020-02-04',
 '2020-02-05',
 '2020-02-06',
 '2020-02-07',
 '2020-02-08',
 '2020-02-09',
 '2020-02-10',
 '2020-02-11',
 '2020-02-12',
 '2020-02-13',
 '2020-02-14',
 '2020-02-15',
 '2020-02-16',
 '2020-02-17',
 '2020-02-18',
 '2020-02-19',
 '2020-02-20',
 '2020-02-21',
 '2020-02-22',
 '2020-02-23',
 '2020-02-24',
 '2020-02-25',
 '2020-02-26',
 '2020-02-27',
 '2020-02-28',
 '2020-02-29',
 '2020-03-01',
 '2020-03-02',
 '2020-03-03',
 '2020-03-04',
 '2020-03-05',
 '2020-03-06',
 '2020-03-07',
 '2020-03-08',
 '2020-03-09',
 '2020-03-10',
 '2020-03-11',
 '2020-03-12',
 '2020-03-13',
 '2020-03-14',
 '2020-03-15',
 '2020-03-16',
 '2020-03-17',
 '2020-03-18',
 '2020-03-19',
 '2020-03-20',
 '2020-03-21',
 '2020-03-22',
 '2020-03-23',
 '2020-03-24',
 '2020-03-25',
 '2020-03-26',
 '2020-03-

In [6]:
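# The dates displayed above run from oldest to newest, but a set does not record row order.
# We therefore assume the rows are encoded in ascending date order, as the first rows of the data suggest.
# Under that assumption, the first row containing DC corresponds to its first recorded case.
# We now proceed to determining the date of DC's first recorded case.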

# Iterate through the rows until data for DC appears
for row in range(1,len(state_covid_data)): #skip the 1st row since it's a header
    if state_covid_data[row][1] == "District of Columbia":
        print("The first case in the District of Columbia was noted on", state_covid_data[row][0],".")
        break

The first case in the District of Columbia was noted on 2020-03-07 .
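
The answer above relies on the rows being in ascending date order. One way to check this directly is to compare the list of dates with a sorted copy of itself, since dates in `YYYY-MM-DD` form sort chronologically as plain strings. A minimal sketch on a hypothetical `sample_rows` list (a stand-in for `state_covid_data`):

```python
# Stand-in for state_covid_data, copied from the five rows printed earlier
sample_rows = [
    ['date', 'state', 'fips', 'cases', 'deaths'],
    ['2020-01-21', 'Washington', '53', '1', '0'],
    ['2020-01-22', 'Washington', '53', '1', '0'],
    ['2020-01-23', 'Washington', '53', '1', '0'],
    ['2020-01-24', 'Illinois', '17', '1', '0'],
]

# Keep the dates as a list (not a set) so the original row order is preserved
sample_dates = [row[0] for row in sample_rows[1:]]

# The rows are in ascending order exactly when sorting changes nothing
is_ascending = sample_dates == sorted(sample_dates)
print("Dates in ascending order:", is_ascending)

# For contrast, a reversed copy fails the same check
reversed_dates = list(reversed(sample_dates))
print("Reversed copy in ascending order:", reversed_dates == sorted(reversed_dates))
```

Running the same comparison on the `dates` list built in Question 1 would test the full dataset; that result is not shown here.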


### (3) Write a function that takes in a _state name_ as input (e.g. "Wisconsin") and outputs the date of its first case.

In [7]:
def date_first_case(state):
    
    """Print the date of the first recorded case in a state.

    Args:
        state (str): name of the state, spelled as it appears in the dataset.

    Returns:
        None. The date of the first recorded case is printed if the state name is found;
        otherwise, an error message is printed.

    """
    
    # Initialize an empty string that will contain the date of the first recorded case.
    datefirstcase = ""
    
    # Iterate through the rows until data for the state appears
    for row in range(1,len(state_covid_data)): #Skip the 1st row since it's a header
        if state_covid_data[row][1] == state:
            datefirstcase = state_covid_data[row][0] #The string takes on the date indicated in the pertinent row
            print("The first case in",state, "was noted on", datefirstcase,".")
            break
    if datefirstcase == "": #The string remains empty if no data was found on the state.
        print("Either no case has been recorded in",state,"or you entered an invalid name format.")

        
        
# Testing of the function
date_first_case("Wisconsin")
date_first_case("California")
date_first_case("District of Columbia")
date_first_case("Northern Mariana Islands")
date_first_case("DC")

The first case in Wisconsin was noted on 2020-02-05 .
The first case in California was noted on 2020-01-25 .
The first case in District of Columbia was noted on 2020-03-07 .
The first case in Northern Mariana Islands was noted on 2020-03-28 .
Either no case has been recorded in DC or you entered an invalid name format.
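
Because the function above prints its result, the date cannot be reused in later computations. A possible variant returns the date instead. This is a sketch only: the name `find_first_case_date` and the explicit `rows` parameter are assumptions introduced for illustration, and `sample_rows` is a small stand-in for `state_covid_data`.

```python
def find_first_case_date(rows, state):
    """Return the date of the first row for `state`, or None if the state is absent.

    Assumes `rows` is a nested list whose first row is the header and whose
    data rows are in ascending date order.
    """
    # next() yields the first matching date, or the default (None) if no row matches
    return next((row[0] for row in rows[1:] if row[1] == state), None)


# Stand-in for state_covid_data, copied from the five rows printed earlier
sample_rows = [
    ['date', 'state', 'fips', 'cases', 'deaths'],
    ['2020-01-21', 'Washington', '53', '1', '0'],
    ['2020-01-22', 'Washington', '53', '1', '0'],
    ['2020-01-23', 'Washington', '53', '1', '0'],
    ['2020-01-24', 'Illinois', '17', '1', '0'],
]

print(find_first_case_date(sample_rows, "Illinois"))
print(find_first_case_date(sample_rows, "DC"))
```

Returning `None` for an unknown name leaves the choice of error message to the caller, which keeps the lookup separate from the reporting.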


### (Optional) Bonus

Write a function that takes in a _state name_ as input (e.g. "Wisconsin") and outputs the date when the number of reported cases within the state exceeded 1000.

In [8]:
def date_exceeded_1000_cases(state):
    
    """Print the date when the number of recorded cases in a state exceeded 1,000.

    Args:
        state (str): name of the state, spelled as it appears in the dataset.

    Returns:
        None. The date when the state's cases first exceeded 1,000 is printed.
        If the state's cases have not (yet) exceeded 1,000, a message saying so is printed.
        If no data is found on the state, an error message is printed.

    """
    
    # Initialize variables
    dateexceeded1000 = ""
    statecounter=0
    
    # Iterate through the rows until the number of cases exceeds 1,000.
    # At the same time, note if the state appears on the dataset.
    for row in range(1,len(state_covid_data)): #Skip the 1st row since it's a header
        if state_covid_data[row][1] == state:
            statecounter+=1 #Add 1 to the statecounter if case data is found on the state.
            if int(state_covid_data[row][3]) > 1000:
                dateexceeded1000 = state_covid_data[row][0] #Update the string to reflect date when 1,000 mark was breached
                print("Reported cases in",state, "exceeded 1,000 on", dateexceeded1000, ".")
                break
    # If we finish iterating through the rows and find zero case data on the state, print an error message.
    if statecounter==0:
        print("Either no case has been recorded in",state,"or you entered an invalid name format.")
    # If there is data on the state but the date string remains empty,
    # the state input was valid but cases have not yet exceeded 1,000.
    elif dateexceeded1000 == "":
        print("Reported cases in",state, "have not yet exceeded 1,000.")

        
        
# Testing of the function
date_exceeded_1000_cases("Wisconsin")
date_exceeded_1000_cases("California")
date_exceeded_1000_cases("District of Columbia")
date_exceeded_1000_cases("Northern Mariana Islands")
date_exceeded_1000_cases("DC")

Reported cases in Wisconsin exceeded 1,000 on 2020-03-28 .
Reported cases in California exceeded 1,000 on 2020-03-19 .
Reported cases in District of Columbia exceeded 1,000 on 2020-04-06 .
Reported cases in Northern Mariana Islands have not yet exceeded 1,000.
Either no case has been recorded in DC or you entered an invalid name format.
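
The same search generalizes to any threshold. The sketch below is an illustration, not part of the assignment: the function name `find_date_exceeding`, its `rows` and `threshold` parameters, and the made-up `sample_rows` values (including the placeholder state "Examplestate") are all assumptions.

```python
def find_date_exceeding(rows, state, threshold=1000):
    """Return the first date on which `state` has more than `threshold` cases.

    Returns None if the state is absent or never exceeds the threshold.
    Assumes the first row of `rows` is the header and that data rows are
    in ascending date order.
    """
    for row in rows[1:]:
        # Case counts are stored as strings, so convert before comparing
        if row[1] == state and int(row[3]) > threshold:
            return row[0]
    return None


# Made-up rows for illustration only; these are not values from the dataset
sample_rows = [
    ['date', 'state', 'fips', 'cases', 'deaths'],
    ['2020-03-01', 'Examplestate', '00', '900', '0'],
    ['2020-03-02', 'Examplestate', '00', '1000', '1'],
    ['2020-03-03', 'Examplestate', '00', '1200', '1'],
]

print(find_date_exceeding(sample_rows, "Examplestate"))
print(find_date_exceeding(sample_rows, "Examplestate", threshold=500))
print(find_date_exceeding(sample_rows, "Examplestate", threshold=5000))
```

With the strict `>` comparison, the row holding exactly 1,000 cases does not count as exceeding the threshold, which matches the comparison used in `date_exceeded_1000_cases`.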
