# Sorting Vaccination Data
### May 9, 2023

This script parses a CSV file of vaccination rates for Wisconsin counties designated as "Urban" or "Rural" and classifies each county's rate on each date as "above", "below", or "equal to" the median vaccination rate across all counties. The input CSV must have the following format:

Column 1: "Date"
    row format: Mmm-yy
    example: Apr-21

Columns 2-73: Counties (one column per county, 72 in total)
    format: float vaccination rate
    example: 43.1117

Column 74: "Median"
    format: float vaccination rate
    example: 51.6733
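
The layout above can be checked before running the rest of the notebook. The following is a minimal sketch, not part of the original script: the helper name `check_vax_csv` is hypothetical, and the sample rows are illustrative values (only 43.1117 and 51.6733 come from the examples above).

```python
import io

import pandas as pd


def check_vax_csv(df: pd.DataFrame) -> None:
    """Raise ValueError if df does not follow the expected column layout."""
    if df.columns[0] != "Date":
        raise ValueError("the first column must be 'Date'")
    if "Median" not in df.columns:
        raise ValueError("a 'Median' column is required")

    # Dates must look like Apr-21 (Mmm-yy); to_datetime raises ValueError otherwise
    pd.to_datetime(df["Date"], format="%b-%y")

    # Every remaining column (counties and Median) must hold numeric rates
    rates = df.drop(columns="Date")
    non_numeric = [c for c in rates.columns
                   if not pd.api.types.is_numeric_dtype(rates[c])]
    if non_numeric:
        raise ValueError(f"non-numeric columns: {non_numeric}")


# Tiny illustrative input with two counties instead of all 72
sample = io.StringIO(
    "Date,Adams,Ashland,Median\n"
    "Apr-21,43.1117,55.2,51.6733\n"
    "May-21,45.0,57.9,53.1\n"
)
sample_df = pd.read_csv(sample)
check_vax_csv(sample_df)  # passes silently when the layout is as expected
```

Running a check like this on the real file first makes format problems surface as a clear error rather than as unexpected comparison results later on.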
    
The comparison results are written to a CSV file named "county_comparison_results.csv". These results can then be run through the 2-Grouping_High_Low_Urb_Rur.ipynb notebook to determine whether a county is mostly above the median (i.e., "highly vaccinated") or mostly below it (i.e., "lowly vaccinated").

In [50]:
import pandas as pd

# Load vaccination data into a dataframe
df = pd.read_csv('/Users/mavoeg/Desktop/SARS/Wisconsin/WI_Data_Counties/input_files/allseries_vax.csv', header=[0])

# Choose the median column to compare vaccination rates against
column_to_compare = df['Median']

# Set the index to be the date column
df.set_index('Date', inplace=True)

# Store the index values in a variable
row_index = df.index[0:28]

In [51]:
# Create an empty list to store the comparison results for every county and date
county_comparison_results = []

# Iterate through each row (one row per date) using iterrows.
# A single pass over the rows and county columns covers every (date, county) pair,
# so no outer loop over counties and dates is needed.
for index, row in df.iterrows():

    # Compare each county's vaccination rate to the median rate for that date
    for column in df.columns[0:72]:
        value_to_compare = row[column]
        other_value = row['Median']

        if value_to_compare > other_value:
            compare_result = "above"
        elif value_to_compare < other_value:
            compare_result = "below"
        else:
            compare_result = "equal to"

        # Append the comparison result to the list
        county_comparison_results.append([index, column, compare_result])

In [52]:
# Create a dataframe from the list with the appropriate columns
county_comparison_resultsdf = pd.DataFrame(county_comparison_results, columns = ["Date", "County", "Comparison"])

# export the dataframe to a csv
county_comparison_resultsdf.to_csv('/Users/mavoeg/computational_folder/gh_folder/county_comparison_results.csv')
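
The row-by-row comparison above can also be expressed without explicit loops. The following is a minimal sketch on a small made-up frame, not the notebook's actual data: the names `toy`, `labels`, and `toy_results` are hypothetical, and the result is ordered by county and then date rather than by date and then county.

```python
import pandas as pd

# Hypothetical two-county, two-date frame with the same layout as df
toy = pd.DataFrame(
    {"Adams": [40.0, 52.0], "Ashland": [55.0, 50.0], "Median": [50.0, 52.0]},
    index=pd.Index(["Apr-21", "May-21"], name="Date"),
)

# Subtract each date's median from every county column (axis=0 aligns on Date)
counties = toy.drop(columns="Median")
diff = counties.sub(toy["Median"], axis=0)

# Start from "equal to" and overwrite where the rate is above or below the median
labels = pd.DataFrame("equal to", index=counties.index, columns=counties.columns)
labels = labels.mask(diff > 0, "above").mask(diff < 0, "below")

# Reshape to one row per (Date, County), matching the Date/County/Comparison columns
toy_results = labels.reset_index().melt(
    id_vars="Date", var_name="County", value_name="Comparison"
)
print(toy_results)
```

For a table of this size the loop is fast enough, so this is mainly a readability choice. The vectorized form becomes more attractive as the number of dates grows.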

The .value_counts() method returns how many times "above" or "below" occurs for each value of 'County'.

The Index.union() method is used here to make sure that every county appearing in either count_greater_than or count_less_than is included. Reindexing both counts on the combined index then gives them matching indices (counties), with 0 filled in where a county has no entries.

union() returns all labels found in either index, with duplicates removed, much like the union of two Python sets.
For example:
    set1.union(set2)
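
The combination of union and reindex described above can be sketched on two small Series. The names `above`, `below`, and `all_counties` are hypothetical stand-ins for the notebook's count variables, using a few of the counts printed further down.

```python
import pandas as pd

# A county with no "above" (or no "below") entries is simply missing from that count
above = pd.Series({"Adams": 2, "Bayfield": 27})
below = pd.Series({"Adams": 25, "Waupaca": 27})

# All county labels that occur in either index
all_counties = above.index.union(below.index)

# Reindex both counts on the combined labels, filling missing counties with 0
above_full = above.reindex(all_counties, fill_value=0)
below_full = below.reindex(all_counties, fill_value=0)

print(pd.DataFrame({"Count_Greater_Than": above_full,
                    "Count_Less_Than": below_full}))
```

Without the reindex step, Bayfield would be absent from the "below" counts and Waupaca from the "above" counts, so the two columns could not be lined up county by county.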

In [53]:
# Use the .value_counts() method to count the "above" and "below" results for each county.
# Counties with no matching rows are absent here; the zeros are added by reindex below.

count_greater_than = county_comparison_resultsdf[county_comparison_resultsdf["Comparison"] == 'above']['County'].value_counts(dropna=False, sort=False)
count_less_than = county_comparison_resultsdf[county_comparison_resultsdf['Comparison'] == 'below']['County'].value_counts(dropna=False, sort=False)

# Use the union method to collect the unique counties from both counts, then convert to a list
unique_counties = list(count_greater_than.index.union(count_less_than.index))

# Use reindex with fill_value=0 so that every county appears in both counts
# Create a DataFrame with the two count columns, including counts equal to 0
df_counts = pd.DataFrame({
    'County': unique_counties,
    'Count_Greater_Than': count_greater_than.reindex(unique_counties, fill_value=0),
    'Count_Less_Than': count_less_than.reindex(unique_counties, fill_value=0)
})
print(df_counts)

                County  Count_Greater_Than  Count_Less_Than
Adams            Adams                   2               25
Ashland        Ashland                  26                1
Barron          Barron                   1               26
Bayfield      Bayfield                  27                0
Brown           Brown                   26                1
...                ...                 ...              ...
Waukesha     Waukesha                   27                0
Waupaca        Waupaca                   0               27
Waushara     Waushara                    0               27
Winnebago   Winnebago                   24                3
Wood              Wood                  26                1

[72 rows x 3 columns]


In [54]:
rural = ["Ashland County",
"Bayfield County",
"Crawford County",
"Door County",
"Forest County",
"Iron County",
"Lafayette County", 
"Manitowoc County",
"Menominee County", 
"Oneida County",
"Portage County",
"Price County",
"Richland County",
"Sauk County",
"Trempealeau County", 
"Vilas County",
"Washburn County",
"Wood County", "Adams County", "Barron County", 
"Buffalo County",
"Burnett County",
"Clark County",
"Dodge County", 
"Dunn County",
"Florence County",
"Grant County",
"Green Lake County",
"Jackson County",
"Jefferson County", 
"Juneau County",
"Langlade County",
"Lincoln County",
"Marinette County",
"Marquette County", 
"Monroe County",
"Pepin County",
"Polk County",
"Rusk County",
"Sawyer County",
"Shawano County",
"Taylor County",
"Vernon County",
"Walworth County",
"Waupaca County",
"Waushara County"]

urban = ['Brown County',
'Columbia County',
'Dane County',
'Eau Claire County',
'Green County',
'Iowa County',
'Kenosha County',
'La Crosse County',
'Marathon County',
'Milwaukee County',
'Outagamie County',
'Ozaukee County',
'Racine County',
'Rock County',
'Sheboygan County',
'Waukesha County',
'Winnebago County',
'Calumet County',
'Chippewa County',
'Douglas County',
'Fond du Lac County',
'Kewaunee County',
'Oconto County',
'Pierce County',
'Saint Croix County',
'Washington County']

In [91]:
# Strip the " County" suffix so the names match the 'County' column of df_counts
# (the spelling must match the county headers in the input CSV exactly)
urban_names = {county.replace(' County', '').strip() for county in urban}

# Label a county "Urban" only on an exact name match; every other county is "Rural".
# A substring test would wrongly match, for example, "Green" against "Green Lake".
is_urban = df_counts['County'].str.strip().isin(urban_names)

# Create the Urb_Rur column in the DataFrame
df_counts['Urb_Rur'] = is_urban.map({True: 'Urban', False: 'Rural'})


In [92]:
df_counts.to_csv('/Users/mavoeg/computational_folder/gh_folder/counts.csv', index=False)