## Procedural Overview
To make COVID-19 data more digestible, this procedure aggregates county-level data by state. The .csv files it produces summarize COVID-19 cases, deaths, and mask use. The work relies on Python for loops and on pandas tools for filtering, summation, concatenation, and merging.

## Setting up the Data
1. Import pandas under the alias pd, so that its functions and methods can be used to manipulate the data
1. Load each of the datasets into its own dataframe using pd.read_csv()
    1. The datasets used in this example were found at [COVID-19 Data](https://github.com/nytimes/covid-19-data)
    1. The first file, imported as "covid", is titled "us-counties-2022.csv" and was downloaded from [here](https://github.com/nytimes/covid-19-data/blob/master/us-counties-2022.csv)
    1. The second file, imported as "mask", is titled "mask-use-by-county.csv" and was downloaded from [here](https://github.com/nytimes/covid-19-data/blob/master/mask-use/mask-use-by-county.csv)

In [100]:
import pandas as pd
covid = pd.read_csv("us-counties-2022.csv")
mask = pd.read_csv("mask-use-by-county.csv")

## Creating a Reference List of States
* Create a list called "states" that contains the names of the 50 states as strings, in alphabetical order
* This list will be used to compile the data into a more digestible, state-level form

In [31]:
states = ["Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming"]
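
Because the loops that follow depend on this list, a quick check for length, duplicates, and ordering can catch typing mistakes early. The sketch below is one possible check, not part of the original procedure; the helper name check_reference_list and the short sample list are hypothetical.

```python
def check_reference_list(names, expected_count):
    """Return a list of problems found in a reference list of names."""
    problems = []
    if len(names) != expected_count:
        problems.append(f"expected {expected_count} names, found {len(names)}")
    if len(set(names)) != len(names):
        problems.append("list contains duplicates")
    if names != sorted(names):
        problems.append("list is not in alphabetical order")
    return problems


# Hypothetical three-item sample; with the full list, pass expected_count=50
sample = ["Alabama", "Alaska", "Arizona"]
print(check_reference_list(sample, expected_count=3))  # prints []
```

An empty result means none of the three checks found a problem.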

## Creating a Compiled List of COVID Data by State
Create an empty dataframe using pandas that has the columns "State", "Cases", and "Deaths"
- This provides an empty container in which to accumulate the aggregated rows

In [32]:
combined = pd.DataFrame({"State":[], "Cases":[], "Deaths":[]})

Use a for loop to iterate through the list of states, adding a row to the dataframe for each iteration
1. To create the for loop, start by writing the line "for state in states:"
1. Afterwards, write the code for each iteration, making sure to indent each line included in the loop
    1. The first line of this section should create a dataframe object with three values, one for "State", one for "Cases", and one for "Deaths"
    1. "State" should be set to the state variable in the for loop, which will use each string from the reference list as the name of the state
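    1. "Cases" should be the sum of all of the case values for a given state. This can be accomplished by filtering the large covid dataset's "state" column with the state variable from the for loop and then using the .sum() method to add the values together. Note that the NYT "cases" and "deaths" columns are cumulative counts, so this sum adds together the running totals reported on every date rather than counting new cases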
    1. "Deaths" should be defined similarly to "Cases", through the filtering of the large dataset and the summation of each value in the deaths column
    1. After this line, the temporary dataframe created in the for loop should be appended to the cumulative dataframe as a new row, that is, stacked along the vertical (row) axis. This should be accomplished using the pd.concat() function with axis=0

In [None]:
for state in states:
    # Select the rows for the current state once, then sum each column
    rows = covid[covid["state"] == state]
    current = pd.DataFrame({"State": [state], "Cases": [rows.cases.sum()], "Deaths": [rows.deaths.sum()]})
    # axis=0 appends the new row beneath the existing rows
    combined = pd.concat([combined, current], axis=0, ignore_index=True, sort=False)
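
As an alternative sketch, the same per-state totals can usually be produced without an explicit loop by using pandas groupby. The toy dataframe below stands in for the real covid data and its values are made up for illustration; the names covid_demo, states_demo, and by_state are hypothetical.

```python
import pandas as pd

# Toy stand-in for the county-level data (made-up values)
covid_demo = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Alaska", "Guam"],
    "cases": [10, 20, 5, 7],
    "deaths": [1, 2, 0, 1],
})
states_demo = ["Alabama", "Alaska"]  # shortened reference list

# Keep only rows whose state appears in the reference list,
# then sum the numeric columns within each state
by_state = (
    covid_demo[covid_demo["state"].isin(states_demo)]
    .groupby("state", as_index=False)[["cases", "deaths"]]
    .sum()
    .rename(columns={"state": "State", "cases": "Cases", "deaths": "Deaths"})
)
print(by_state)
```

One difference from the loop: groupby returns rows only for states that actually appear in the data, whereas the loop writes a row (with sums of 0) for every name in the reference list.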

Write the cumulative dataframe to a .csv file using the pandas .to_csv() method

In [None]:
combined.to_csv("StateData-22.csv", index=False)

## Creating a Comparison of COVID and Mask Data from January-June 2022
Begin by renaming the "COUNTYFP" column of the mask dataframe to "fips" using the .rename() method, so that both dataframes share a key column with the same name

In [104]:
mask = mask.rename(columns={"COUNTYFP" : "fips"})

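Next, trim the covid dataframe so that it contains only the rows from January through June, using the .iloc indexer. The cutoff row used below, 589051, applies to the file as downloaded for this example and assumes the rows are in date order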

In [None]:
covid = covid.iloc[0:589051, :]
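
Slicing by position depends on the exact number of rows in the downloaded file. A sketch of an alternative that filters on the date itself is shown here, assuming the data has a "date" column in YYYY-MM-DD format. The toy values and the names covid_demo and first_half are hypothetical.

```python
import pandas as pd

# Toy stand-in for the county-level data (made-up values)
covid_demo = pd.DataFrame({
    "date": ["2022-01-01", "2022-06-30", "2022-07-01"],
    "county": ["A", "A", "A"],
    "cases": [1, 2, 3],
})

# Parse the text dates so they can be compared as dates
covid_demo["date"] = pd.to_datetime(covid_demo["date"])

# Keep every row dated before July 1, 2022 (January through June)
first_half = covid_demo[covid_demo["date"] < pd.Timestamp("2022-07-01")]
print(first_half)
```

This version does not rely on row order or on a row count that changes whenever the source file is updated.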

Create a dataframe that contains every column of combined data from the covid and mask dataframes
- Use the pd.merge() function, merge on the "fips" column, and set how = "right" so that every row of the mask dataframe (the right-hand dataframe, which has fewer rows) is kept and covid rows without a matching county are dropped

In [None]:
tg = pd.merge(covid, mask, on = "fips", how = "right")
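
To illustrate what how = "right" does, the sketch below merges two small made-up dataframes. The names covid_demo, mask_demo, and merged are hypothetical, and the indicator=True argument is an addition used here only to show where each row came from.

```python
import pandas as pd

# Toy stand-ins for the two datasets (made-up values)
covid_demo = pd.DataFrame({
    "fips": [1001, 1001, 1003],
    "state": ["Alabama", "Alabama", "Alabama"],
    "cases": [10, 12, 20],
})
mask_demo = pd.DataFrame({
    "fips": [1001, 1005],
    "NEVER": [0.05, 0.07],
    "ALWAYS": [0.44, 0.38],
})

# A right merge keeps every row of mask_demo; covid rows with no
# matching fips (1003 here) are dropped
merged = pd.merge(covid_demo, mask_demo, on="fips", how="right", indicator=True)
print(merged)
```

County 1005 appears in the result with missing values in the covid columns, because it exists only in the mask data. Rows like this have no "state" value, so they are skipped by any later filter on the state name.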

Create an empty dataframe using pandas that has the columns "State", "Cases", "Deaths", "Never", and "Always"
- This will provide an empty space to aggregate the data we choose

In [None]:
compare = pd.DataFrame({"State":[], "Cases":[], "Deaths":[], "Never":[], "Always":[]})

Fill the dataframe with a for loop, using the steps from "Creating a Compiled List of COVID Data by State" as a reference if necessary
1. "State" should be defined in the for loop using the reference list of states
1. For "Cases" and "Deaths", the summed value (demonstrated above) should be divided by the number of values for that state, which yields the average. The number of values can be obtained by applying the len() function to the filtered data
1. "Never" should be defined similarly to "Cases" and "Deaths", by filtering the merged dataset and averaging the values in the NEVER column
1. "Always" should be defined in the same way as "Never", using the ALWAYS column

In [None]:
for state in states:
    # Select the rows for the current state once, then average each column (sum / row count)
    rows = tg[tg["state"] == state]
    n = len(rows)
    current = pd.DataFrame({"State": [state], "Cases": [rows.cases.sum() / n], "Deaths": [rows.deaths.sum() / n], "Never": [rows.NEVER.sum() / n], "Always": [rows.ALWAYS.sum() / n]})
    compare = pd.concat([compare, current], axis=0, ignore_index=True, sort=False)
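
The per-state averages can also be sketched with groupby and .mean() instead of a loop. The toy dataframe below stands in for the merged data and its values are made up; the names tg_demo and averages are hypothetical.

```python
import pandas as pd

# Toy stand-in for the merged covid/mask data (made-up values)
tg_demo = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Alaska"],
    "cases": [10, 30, 6],
    "deaths": [1, 3, 0],
    "NEVER": [0.1, 0.3, 0.2],
    "ALWAYS": [0.5, 0.7, 0.6],
})

# Average each numeric column within each state
averages = (
    tg_demo.groupby("state", as_index=False)[["cases", "deaths", "NEVER", "ALWAYS"]]
    .mean()
    .rename(columns={"state": "State", "cases": "Cases", "deaths": "Deaths",
                     "NEVER": "Never", "ALWAYS": "Always"})
)
print(averages)
```

The two approaches can differ when values are missing: .mean() leaves missing values out of both the total and the count, while dividing .sum() by len() still counts those rows in the denominator.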

Write the cumulative dataframe to a .csv file using the pandas .to_csv() method

In [None]:
compare.to_csv("Jan-JunData.csv", index=False)