# Urban/Rural, High/Low Metadata 

### July 22, 2023

This script takes a metadata TSV/CSV file (from GISAID or GenBank) and adds a new column, "urban_rural_high_low". The new column categorizes each sequence by matching the county in its "location" column against the county lists "rural_high", "rural_low", "urban_high", "urban_low", and "mid_vax". A location that appears in none of the lists gets an empty value.

Use this script whenever you want to run a Nextstrain build with tips colored by these categories.


In [92]:
import pandas as pd

# Load the existing tab-separated metadata file into a DataFrame
df = pd.read_csv('/Users/mavoeg/computational_folder/gh_folder/ncov/data/clean_ready_for_nextstrain/clean_wicounties_05_18_2023.tsv', sep='\t')



In [93]:
# Assigning counties to Rural High-Low categories
rural_high = ["Ashland",
"Bayfield",
"Crawford",
"Door",
"Forest",
"Iron",
"Lafayette", 
"Manitowoc",
"Menominee", 
"Oneida",
"Portage",
"Price",
"Richland",
"Sauk",
"Trempealeau", 
"Vilas",
"Washburn",
"Wood"]

rural_low = ["Adams", "Barron", 
"Buffalo",
"Burnett",
"Clark",
"Dodge", 
"Dunn",
"Florence",
"Grant",
"Green Lake",
"Jackson",
"Juneau",
"Langlade",
"Lincoln",
"Marinette",
"Marquette", 
"Monroe",
"Pepin",
"Polk",
"Rusk",
"Sawyer",
"Shawano",
"Taylor",
"Vernon",
"Walworth",
"Waupaca",
"Waushara"]


In [94]:
# Assigning counties to Urban High-Low categories
urban_high = ['Brown',
'Columbia',
'Dane',
'Eau Claire',
'Green',
'Iowa',
'Kenosha',
'La Crosse',
'Marathon',
'Milwaukee',
'Outagamie',
'Ozaukee',
'Racine',
'Rock',
'Sheboygan',
'Waukesha',
'Winnebago']

urban_low = ['Calumet',
'Chippewa',
'Douglas',
'Fond du Lac',
'Kewaunee',
'Oconto',
'Pierce',
'St.Croix']

mid_vax = ['Washington', "Jefferson"]
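
County lists typed by hand can end up with a county entered twice or placed in two categories, and the lookup further down would silently use whichever list it checks first. The cell below is a minimal sketch of a sanity check for that. The helper name `find_repeated_counties` and the toy lists are hypothetical and only for illustration; in this notebook you would pass the real lists instead.

```python
from collections import Counter


def find_repeated_counties(groups):
    """Return, sorted, every county that appears more than once across all lists."""
    counts = Counter(
        county for counties in groups.values() for county in counties
    )
    return sorted(county for county, n in counts.items() if n > 1)


# Toy lists for illustration only (not the real category assignments)
example_groups = {
    "rural_low": ["Adams", "Jackson", "Jackson"],
    "urban_high": ["Dane", "Adams"],
}

repeated = find_repeated_counties(example_groups)
print(repeated)
```

An empty result means every county is listed exactly once.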

In [99]:
# Define the category labels that will be written to the new metadata column
urbhigh_value = 'Urban High'
urblow_value = 'Urban Low'
rurhigh_value = "Rural High"
rurlow_value = "Rural Low"
midvax_value = "Mid Vax"


In [100]:
def urb_rur_high_low(location):
    """Return the category label for a county name, or "" if it is in none of the lists."""
    if location in rural_high:
        return rurhigh_value
    elif location in rural_low:
        return rurlow_value
    elif location in urban_high:
        return urbhigh_value
    elif location in urban_low:
        return urblow_value
    elif location in mid_vax:
        return midvax_value
    else:
        return ""  # county is not in any list, so leave the category empty

# apply the function to add a new column with the categories assigned based on the location column
df["urban_rural_high_low"] = df["location"].apply(urb_rur_high_low)
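
The matching is an exact string comparison, so a location spelled differently from the list entry (extra spacing, a "County" suffix, a missing value) ends up with an empty category. The sketch below shows one possible way to do the same assignment with a dictionary lookup and then list the locations that did not match. It runs on a toy DataFrame; the names `categories`, `county_to_label`, and `toy` are assumptions made for this example and are not part of the notebook.

```python
import pandas as pd

# Toy category lists for illustration only (the real lists are much longer)
categories = {
    "Rural High": ["Ashland", "Door"],
    "Urban High": ["Dane", "Milwaukee"],
}

# Invert the lists into a single county -> label lookup table
county_to_label = {
    county: label
    for label, counties in categories.items()
    for county in counties
}

toy = pd.DataFrame({"location": ["Dane", "Door", "Dane County", None]})

# map() leaves NaN where a location has no entry; replace that with ""
toy["urban_rural_high_low"] = toy["location"].map(county_to_label).fillna("")

# Non-missing locations that did not match any list
unmatched = (
    toy.loc[toy["urban_rural_high_low"] == "", "location"]
    .dropna()
    .unique()
    .tolist()
)
print(unmatched)
```

Reviewing the unmatched names before running the build helps catch spelling differences between the metadata and the county lists.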


In [74]:
# Save the updated df to a new TSV file (sep='\t' is required; to_csv writes commas by default)
df.to_csv('/Users/mavoeg/computational_folder/gh_folder/ncov/data/clean_ready_for_nextstrain/updated_file.tsv', sep='\t', index=False)
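
Because `to_csv` writes comma-separated output unless `sep` is passed, it can be worth confirming that a file named `.tsv` really is tab-delimited before handing it to a build. The following is a small sketch of such a check on a toy DataFrame written to a temporary directory; the file name `toy_metadata.tsv` and the column values are made up for the example.

```python
import os
import tempfile

import pandas as pd

# Toy metadata for illustration only
toy = pd.DataFrame(
    {"strain": ["A/1", "B/2"], "location": ["Dane", "Green Lake"]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "toy_metadata.tsv")

    # Write with an explicit tab separator and no index column
    toy.to_csv(out_path, sep="\t", index=False)

    # Read the raw header line to confirm the delimiter is a tab
    with open(out_path) as handle:
        header = handle.readline().rstrip("\n")

    # Load the file back the same way the first cell loads the input
    reloaded = pd.read_csv(out_path, sep="\t")

print(header.split("\t"))
print(reloaded.columns.tolist())
```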