# US_Measles_Risk_UVR

### Task 1. Raw measles risk
Calculate raw risk for each county with $$ r_{ij}^{t} = C_{i}^{t} \times V_{ij}^{t} \times UVR_{j}^{t} \times P_{j}^{t} $$
where <br/>
$i$ is the origin country, <br/>
$j$ is the US county, <br/>
$t$ is the year, <br/>
$r_{ij}^{t}$ is the measles risk from country $i$ to county $j$ in year $t$, <br/>
$C_{i}^{t}$ is the case incidence (reported cases divided by population) in country $i$ in year $t$, <br/>
$V_{ij}^{t}$ is the travel volume (million) from country $i$ to county $j$ in year $t$, <br/>
$UVR_{j}^{t}$ is the unvaccinated rate (1 - vaccination rate) in county $j$ in year $t$, <br/>
$P_{j}^{t}$ is the county $j$ population in year $t$. <br/>
$$ r_{j}^{t} = \sum_{i} r_{ij}^{t} = (\sum_{i} C_{i}^{t} \times V_{ij}^{t}) \times UVR_{j}^{t} \times P_{j}^{t}$$
where <br/>
$r_{j}^{t}$ is the measles risk of county $j$ in year $t$. <br/>
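
As a worked check of the formula, the sketch below computes $r_{ij}^{t}$ for one route in plain Python. The input values are copied from the Philippines to Los Angeles row that appears in the route-level output later in this notebook; the helper names `route_risk` and `county_risk` are illustrative assumptions, not part of the original script. The UN population figures are in thousands, so the incidence here is cases per thousand people.

```python
def route_risk(cases, origin_pop, pax_volume, uvr, county_pop):
    """Return r_ij = C_i * V_ij * UVR_j * P_j for one origin-county route."""
    incidence = cases / origin_pop  # C_i: cases per unit of origin population
    return incidence * pax_volume * uvr * county_pop


def county_risk(routes, uvr, county_pop):
    """Return r_j by summing r_ij over (cases, origin_pop, pax_volume) tuples."""
    return sum(route_risk(c, p, v, uvr, county_pop) for c, p, v in routes)


phl_to_la = route_risk(
    cases=42467.0,          # confirmed cases in the Philippines
    origin_pop=108116.615,  # UN population, in thousands
    pax_volume=367297,      # passengers on the route
    uvr=0.055,              # 1 - vaccination rate in Los Angeles County
    county_pop=10163507.0,  # Los Angeles County population
)
print(f"{phl_to_la:.3e}")
```

The result should agree with the `Route_Risk` value of roughly 8.06e10 reported for that route. Because $UVR_{j}^{t}$ and $P_{j}^{t}$ do not depend on $i$, `county_risk` could equally factor them out of the sum, as the second equation does.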

### Task 2. Rearrange travel volume by population
For counties with no international travel, update $V_{ij}^{t}$. <br/>
Task 2.1: Calculate the average raw risk of the neighboring counties. <br/>
Task 2.2: Allocate travel volume in proportion to population. <br/>

### Task 3. Rearrange travel volume by Voronoi diagram

#### Goal:
Update $V_{ij}^{t}$ for all US counties.
#### Preparation: 
Create Thiessen polygons for all known __675__ airports in the US (in `Voronoi.mxd`).
1. Make sure the airports layer contains IATA codes and coordinates.
2. `Create Thiessen Polygons` for `US_airports_675` to create `US_airports_Thiessen` (Output Fields: ALL).
3. `Dissolve` `us_states` to create `US_Boundary` as the mask.
4. `Clip` `US_airports_Thiessen` with `US_Boundary` to make sure all Thiessen polygons are within the US. Output: `US_airports_Thiessen_Clip`.
5. Calculate geometry (`ThiessenAreaKM2`) for each Thiessen polygon.
6. `Intersect` `US_airports_Thiessen_Clip` and `us_states` to get `Thiessen_County_Intersect`.
7. Calculate geometry (`IntersectAreaKM2`) for each polygon in `Thiessen_County_Intersect`.
8. Calculate the percentage of each airport Thiessen polygon covered by each intersected polygon (`ThiessenAreaPct = [IntersectAreaKM2] * 100 / [ThiessenAreaKM2]`).
9. Export `Thiessen_County_Intersect` as `Thiessen_County_Intersect_Pct.csv`.
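
The percentage step can be sketched outside ArcGIS as follows. This is a minimal illustration in plain Python with made-up areas; the airport codes `AAA` and `BBB`, the FIPS values, and the variable names are placeholders and do not come from the real layers.

```python
# Hypothetical areas in km^2; the airport codes and FIPS values are placeholders.
thiessen_area_km2 = {"AAA": 400.0, "BBB": 250.0}
intersect_rows = [
    ("AAA", 1001, 300.0),  # (airport code, county FIPS, IntersectAreaKM2)
    ("AAA", 1003, 100.0),
    ("BBB", 1003, 250.0),
]

# ThiessenAreaPct = IntersectAreaKM2 * 100 / ThiessenAreaKM2
area_pct = {
    (code, fips): area * 100 / thiessen_area_km2[code]
    for code, fips, area in intersect_rows
}

# Sanity check: the pieces of each airport polygon should add up to 100%.
pct_sum = {}
for (code, _fips), pct in area_pct.items():
    pct_sum[code] = pct_sum.get(code, 0.0) + pct

print(area_pct)
print(pct_sum)
```

A per-airport sum noticeably below 100 would suggest that part of the polygon fell outside the intersected layer.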

#### Method:
Diffuse the incoming international travel volume ($V_{ij}^{t}$) from each airport to the counties that its Thiessen polygon overlaps, in proportion to `ThiessenAreaPct`.

## Task 1: Calculate measles risk at the county level

In [1]:
# environment setting
import pandas as pd
import datetime
t = datetime.datetime.now()
year = 2019 # only support 2019 in this script
year_pop = 'pop2019'
s_or_c = 'confirmed' # confirmed, suspected
year_iata = 2018 # IATA data covers 2007 to 2018; the 2018 data is used for both 2018 and 2019
out_folder = r'C:\Users\Ensheng\Desktop\mapping\Data_Output\\'
#pd.set_option("display.max_rows", 999)

#### Import US measles cases

In [2]:
in_table = r"C:\Users\Ensheng\Desktop\mapping\Data_Cases\Reported_Measles_US_Counties\reported_cases.csv"
real_case = pd.read_csv(in_table)
real_case = real_case[['FIPS','# of Cases','intl_case']]
print(len(real_case))
real_case.head(3)

97


Unnamed: 0,FIPS,# of Cases,intl_case
0,2122,1.0,
1,4019,1.0,1.0
2,6001,1.0,


#### Import world population

In [3]:
# ref: https://population.un.org/wpp/Download/Standard/Population/
in_table = r'C:\Users\Ensheng\Desktop\mapping\scripts\world_pop.xlsx'
df_pop = pd.read_excel(in_table)
print(len(df_pop))
df_pop.head(5)

235


Unnamed: 0,name,Country code,pop1950,pop1951,pop1952,pop1953,pop1954,pop1955,pop1956,pop1957,...,pop2011,pop2012,pop2013,pop2014,pop2015,pop2016,pop2017,pop2018,pop2019,pop2020
0,Afghanistan,4,7752.118,7840.156,7935.997,8039.694,8151.317,8270.991,8398.875,8535.163,...,30117.413,31161.376,32269.589,33370.794,34413.603,35383.032,36296.113,37171.921,38041.754,38928.346
1,Albania,8,1263.174,1287.5,1316.093,1348.112,1382.898,1419.994,1459.12,1500.181,...,2928.592,2914.096,2903.79,2896.305,2890.513,2886.438,2884.169,2882.74,2880.917,2877.797
2,Algeria,12,8872.247,9023.269,9186.138,9364.371,9560.149,9774.283,10006.147,10253.778,...,36661.445,37383.895,38140.133,38923.692,39728.025,40551.392,41389.189,42228.408,43053.054,43851.044
3,American Samoa,16,18.94,19.293,19.542,19.695,19.753,19.754,19.709,19.667,...,55.759,55.667,55.713,55.791,55.812,55.741,55.62,55.465,55.312,55.191
4,Andorra,20,6.196,6.689,7.247,7.865,8.525,9.232,9.989,10.779,...,83.747,82.427,80.774,79.213,78.011,77.297,77.001,77.006,77.142,77.265


In [4]:
# ref: http://worldpopulationreview.com/country-codes/
# ref: https://www.iban.com/country-codes
# note: add BLM manually
in_table = r'C:\Users\Ensheng\Desktop\mapping\diffusion_model\country_code.csv'
df_code = pd.read_csv(in_table)
print(len(df_code))
df_code.head(5)

238


Unnamed: 0,name,alpha2,alpha3,num3
0,Afghanistan,AF,AFG,4
1,Albania,AL,ALB,8
2,Algeria,DZ,DZA,12
3,American Samoa,AS,ASM,16
4,Andorra,AD,AND,20


In [5]:
df_pop3 = pd.merge(df_pop, df_code, how='left', left_on='Country code',right_on='num3')
print("Info: " + str(len(df_pop3)) + " countries in UN dataset.")
print("Warning: " + str(len(df_pop3.loc[df_pop3['num3'].isnull()])) + " countries mismatched.")
df_pop3.head(3)

Info: 235 countries in UN dataset.


Unnamed: 0,name_x,Country code,pop1950,pop1951,pop1952,pop1953,pop1954,pop1955,pop1956,pop1957,...,pop2015,pop2016,pop2017,pop2018,pop2019,pop2020,name_y,alpha2,alpha3,num3
0,Afghanistan,4,7752.118,7840.156,7935.997,8039.694,8151.317,8270.991,8398.875,8535.163,...,34413.603,35383.032,36296.113,37171.921,38041.754,38928.346,Afghanistan,AF,AFG,4
1,Albania,8,1263.174,1287.5,1316.093,1348.112,1382.898,1419.994,1459.12,1500.181,...,2890.513,2886.438,2884.169,2882.74,2880.917,2877.797,Albania,AL,ALB,8
2,Algeria,12,8872.247,9023.269,9186.138,9364.371,9560.149,9774.283,10006.147,10253.778,...,39728.025,40551.392,41389.189,42228.408,43053.054,43851.044,Algeria,DZ,DZA,12


#### Import WHO data

In [6]:
# if year != 2019:
print (year, s_or_c)
# ref: https://www.who.int/immunization/monitoring_surveillance/burden/vpd/surveillance_type/active/measles_monthlydata/en/
in_table = r'C:\Users\Ensheng\Desktop\mapping\Data_Cases\WHO_Measles\WHO_Measles_Dec2019.xlsx'
df_who = pd.read_excel(in_table, sheet_name=s_or_c)
print(len(df_who), "countries")
df_outbreak_raw = df_who
df_outbreak_raw.head(3)

2019 confirmed
194 countries


Unnamed: 0,ISO3,Country,Total
0,MDG,Madagascar,127575.0
1,UKR,Ukraine,56986.0
2,PHL,Philippines,42467.0


In [7]:
df_outbreak_raw[df_outbreak_raw['Country']=='Tonga']

Unnamed: 0,ISO3,Country,Total
105,TON,Tonga,440.0


In [8]:
# if year != 2019:
print (year)
# ref: https://www.who.int/immunization/monitoring_surveillance/burden/vpd/surveillance_type/active/measles_monthlydata/en/
# in_table = r'C:\Users\Ensheng\Desktop\mapping\diffusion_model\measlescasesbycountrybymonth.xls'
# df_who = pd.read_excel(in_table,sheet_name='WEB')
# df_who = df_who.loc[df_who['Year'] == year]

# col_list= list(df_who)
# col_list.remove('Year')
# df_who['Total'] = df_who[col_list].sum(axis=1)
# print(len(df_who))
# df_outbreak_raw = df_who[['ISO3','Country','Total']]
    
# df_outbreak_raw.head(3)

2019


In [9]:
df_outbreak = pd.merge(df_outbreak_raw, df_pop3, how='left', left_on='ISO3',right_on='alpha3')
print(len(df_outbreak))
df_outbreak = df_outbreak[['alpha3', 'Country', 'Total', year_pop]]
print(str(len(df_outbreak_raw) - df_outbreak.alpha3.notnull().sum()) + " row(s) have NaN as ISO 3 (alpha3).")
df_outbreak.sort_values(by='alpha3').head(5)

194
0 row(s) have NaN as ISO 3 (alpha3).


Unnamed: 0,alpha3,Country,Total,pop2019
78,AFG,Afghanistan,166.0,38041.754
16,AGO,Angola,2962.0,31825.295
62,ALB,Albania,481.0,2880.917
170,AND,Andorra,0.0,77.142
79,ARE,United Arab Emirates,164.0,9770.529


#### Import $V_{ij}^{t}$

In [10]:
# IATA data
in_table = r'C:\Users\Ensheng\Desktop\mapping\IATA\flow_XY.csv'
df_iata = pd.read_csv(in_table)
df_iata = df_iata.loc[df_iata['year'] == year_iata] # slice for certain year
df_iata = df_iata[['FIPS', 'ISO', 'paxVolume']]
print(len(df_iata))
df_iata.head(5)

38265


Unnamed: 0,FIPS,ISO,paxVolume
370971,46013.0,ARE,17
370972,48441.0,ARE,16
370973,17167.0,ARE,8
370974,5119.0,ARE,496
370975,39153.0,ARE,81


#### Import $UVR_{j}^{t}$ and $P_{j}^{t}$

In [11]:
in_table = r'C:\Users\Ensheng\Desktop\mapping\Data_NME\Data_VR_AllCounties.csv'
df_nme = pd.read_csv(in_table)
print(len(df_nme))
df_nme.head(5)

3085


Unnamed: 0,FIPS,County,VR,UVR,Population
0,1001,"Autauga, AL",0.9639,0.0361,55504
1,1003,"Baldwin, AL",0.9653,0.0347,212628
2,1005,"Barbour, AL",0.8827,0.1173,25270
3,1007,"Bibb, AL",0.9454,0.0546,22668
4,1009,"Blount, AL",0.973,0.027,58013


In [12]:
# in_table = r'C:\Users\Ensheng\Desktop\mapping\diffusion_model\ModelInputOutputAll 4_23.csv'
# df_fipspop = pd.read_csv(in_table)
# df_fipspop = df_fipspop[['FIPS','Population']]
# print(len(df_fipspop))
# df_fipspop.head(5)

In [13]:
# # merge county population
# df_temp = pd.merge(df_nme, df_fipspop, how='left', left_on='FIPS',right_on='FIPS')
# df_temp.head(5)
# df_nme = df_temp
# print(len(df_nme))
# df_nme.head(5)

#### Calculate $r_{ij}^{t}$

In [14]:
df_temp = pd.merge(df_iata, df_outbreak, how='left', left_on='ISO',right_on='alpha3')
df_factors = pd.merge(df_temp, df_nme, how='left', left_on='FIPS',right_on='FIPS')
df_factors.head(5)

Unnamed: 0,FIPS,ISO,paxVolume,alpha3,Country,Total,pop2019,County,VR,UVR,Population
0,46013.0,ARE,17,ARE,United Arab Emirates,164.0,9770.529,"Brown, SD",0.974,0.026,39178.0
1,48441.0,ARE,16,ARE,United Arab Emirates,164.0,9770.529,"Taylor, TX",0.983322,0.016678,136290.0
2,17167.0,ARE,8,ARE,United Arab Emirates,164.0,9770.529,"Sangamon, IL",0.982722,0.017278,196452.0
3,5119.0,ARE,496,ARE,United Arab Emirates,164.0,9770.529,"Pulaski, AR",0.942,0.058,393956.0
4,39153.0,ARE,81,ARE,United Arab Emirates,164.0,9770.529,"Summit, OH",0.916,0.084,541228.0


In [15]:
# rename and reorder col.
df_factors['FIPS_Pop'] = df_factors['Population']
df_factors['ISO_Case'] = df_factors['Total']
df_factors['ISO_Pop'] = df_factors[year_pop]
df_factors = df_factors[['FIPS','County','UVR','FIPS_Pop','ISO','Country','ISO_Case','ISO_Pop','paxVolume']]
print(len(df_factors))
df_factors.head(5)

38265


Unnamed: 0,FIPS,County,UVR,FIPS_Pop,ISO,Country,ISO_Case,ISO_Pop,paxVolume
0,46013.0,"Brown, SD",0.026,39178.0,ARE,United Arab Emirates,164.0,9770.529,17
1,48441.0,"Taylor, TX",0.016678,136290.0,ARE,United Arab Emirates,164.0,9770.529,16
2,17167.0,"Sangamon, IL",0.017278,196452.0,ARE,United Arab Emirates,164.0,9770.529,8
3,5119.0,"Pulaski, AR",0.058,393956.0,ARE,United Arab Emirates,164.0,9770.529,496
4,39153.0,"Summit, OH",0.084,541228.0,ARE,United Arab Emirates,164.0,9770.529,81


In [16]:
# slice
df_factors = df_factors.loc[df_factors['ISO_Case'].notnull()]
print(len(df_factors))
df_factors = df_factors.loc[df_factors['paxVolume'].notnull()]
print(len(df_factors))

32905
32905
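
The first filter drops every route whose origin has no matching WHO case count (here 38265 rows shrink to 32905). A quick way to see which origins are lost is to count the unmatched codes before filtering. The sketch below uses plain Python and hypothetical rows; `routes` and `cases_by_iso` are illustrative names, and the omission of ABW from the case table is an assumption made for the example.

```python
from collections import Counter

# Hypothetical routes: (county FIPS, origin ISO3, passengers).
routes = [
    (1001, "ABW", 13.2),
    (1001, "AFG", 0.8),
    (6037, "ABW", 120.0),
    (6037, "PHL", 367297),
]
# Hypothetical case table keyed by ISO3; ABW is deliberately absent.
cases_by_iso = {"AFG": 166.0, "PHL": 42467.0}

# Routes that survive the "ISO_Case is not null" filter.
kept = [r for r in routes if r[1] in cases_by_iso]
# Number of dropped routes per origin code.
dropped = Counter(iso for _, iso, _ in routes if iso not in cases_by_iso)

print(len(kept), dict(dropped))
```

Reviewing the dropped codes helps distinguish origins that genuinely have no WHO record from codes that simply failed to match.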


#### Calculate $r_{j}^{t}$
##### (Update FIPS_NME to UVR)

In [17]:
df_factors['Route_Risk'] = (df_factors['ISO_Case'] / df_factors['ISO_Pop']) * df_factors['paxVolume'] * df_factors['UVR'] * df_factors['FIPS_Pop']

In [18]:
df_risk = df_factors.groupby(['FIPS','County']).agg({'Route_Risk':'sum', 'paxVolume':'sum','UVR':'mean'}).reset_index()
df_risk['FIPS_RawRisk'] = df_risk['Route_Risk']
df_risk.head(5)

Unnamed: 0,FIPS,County,Route_Risk,paxVolume,UVR,FIPS_RawRisk
0,1033.0,"Colbert, AL",334.0325,22,0.1061,334.0325
1,1045.0,"Dale, AL",288198.2,4754,0.0636,288198.2
2,1073.0,"Jefferson, AL",83249750.0,115308,0.0683,83249750.0
3,1089.0,"Madison, AL",19894190.0,54155,0.0417,19894190.0
4,1097.0,"Mobile, AL",45442420.0,32531,0.0808,45442420.0


#### Normalize and list the Top 25

In [19]:
# import county seats
# ref: https://en.wikipedia.org/wiki/List_of_the_most_populous_counties_in_the_United_States
in_table = r'C:\Users\Ensheng\Desktop\mapping\Data_NME\County_CitySeats.xlsx'
df_seat = pd.read_excel(in_table)
print(len(df_seat))
df_seat.head(3)

64


Unnamed: 0,FIPS,City
0,35001,Albuquerque
1,48453,Austin
2,6029,Bakersfield


In [20]:
highest_risk = df_risk['FIPS_RawRisk'].max()
df_risk['Risk'] = df_risk['FIPS_RawRisk'] / highest_risk
df_risk['FIPS_Rank'] = df_risk['Risk'].rank(ascending=False)
df_risk = pd.merge(df_risk, df_seat, how='left', left_on='FIPS',right_on='FIPS')
df_risk['Year'] = year
df_risk = df_risk[['FIPS','County','City','FIPS_RawRisk','Risk','FIPS_Rank','Year','paxVolume','UVR']]
df_risk = df_risk.sort_values('Risk',ascending = False).reset_index(drop=True)
df_risk = pd.merge(df_risk, real_case, how='left', left_on='FIPS',right_on='FIPS')
df_risk.head(25)

Unnamed: 0,FIPS,County,City,FIPS_RawRisk,Risk,FIPS_Rank,Year,paxVolume,UVR,# of Cases,intl_case
0,6037.0,"Los Angeles, CA",Los Angeles,239943000000.0,1.0,1.0,2019,10930679,0.055,22.0,1.0
1,53033.0,"King, WA",Seattle,53704500000.0,0.223822,2.0,2019,2195272,0.29,13.0,1.0
2,12086.0,"Miami-Dade, FL",Miami,20103160000.0,0.083783,3.0,2019,7100255,0.051628,,
3,17031.0,"Cook, IL",Chicago,16726930000.0,0.069712,4.0,2019,5444790,0.020475,2.0,1.0
4,15003.0,"Honolulu, HI",Honolulu,14129150000.0,0.058885,5.0,2019,2713840,0.085,1.0,
5,36081.0,"Queens, NY","Queens, NYC",11234620000.0,0.046822,6.0,2019,15428483,0.01,7.0,1.0
6,25025.0,"Suffolk, MA",Boston,9364420000.0,0.039028,7.0,2019,3882798,0.14,1.0,
7,48201.0,"Harris, TX",Houston,8384134000.0,0.034942,8.0,2019,3178338,0.025232,4.0,1.0
8,4013.0,"Maricopa, AZ",Phoenix,7963476000.0,0.033189,9.0,2019,1276180,0.073,,
9,6081.0,"San Mateo, CA",Redwood City,6754690000.0,0.028151,10.0,2019,6125305,0.034,6.0,1.0
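
The normalization divides every county's raw risk by the largest raw risk, so the top county scores 1.0 and the rest are fractions of it. A minimal sketch in plain Python, using three `FIPS_RawRisk` values from the table above (the variable names are illustrative, and ties are not handled the way `pandas` rank averaging would handle them):

```python
raw_risk = {
    "Los Angeles, CA": 239943000000.0,
    "King, WA": 53704500000.0,
    "Miami-Dade, FL": 20103160000.0,
}

highest = max(raw_risk.values())
normalized = {county: value / highest for county, value in raw_risk.items()}

# Rank 1 is the highest risk; this simple ordering assumes no exact ties.
ordered = sorted(normalized, key=normalized.get, reverse=True)
rank = {county: position + 1 for position, county in enumerate(ordered)}

print(normalized)
print(rank)
```

Because the scale is relative to the maximum, `Risk` values are comparable within one run but not across runs with different inputs.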


In [21]:
result = df_risk
output_csv = out_folder + 'MeaslesRisk_US_' +  str(year) + '_raw_' + t.strftime('%m%d%y%H%M') + '.csv'
#result.to_csv(output_csv, index=False, encoding='utf-8')
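
The output name is built by string concatenation, which depends on `out_folder` ending with a path separator. One possible alternative is `pathlib`, sketched below. The helper `output_path` and the folder `C:\Data_Output` are hypothetical, and `PureWindowsPath` is used only so the example behaves the same on any operating system.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def output_path(folder, year, tag, stamp):
    """Build <folder>\\MeaslesRisk_US_<year>_<tag>_<mmddyyHHMM>.csv."""
    name = f"MeaslesRisk_US_{year}_{tag}_{stamp.strftime('%m%d%y%H%M')}.csv"
    return PureWindowsPath(folder) / name


# Hypothetical folder and a fixed timestamp so the result is reproducible.
p = output_path(r"C:\Data_Output", 2019, "raw", datetime(2019, 12, 31, 14, 5))
print(p.name)
```

In the notebook itself, `pathlib.Path` with the real `out_folder` and the existing timestamp `t` would play the same role.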

In [22]:
df_complete = pd.merge(df_factors, df_risk , how='left', left_on='FIPS',right_on='FIPS')
df_complete = df_complete.sort_values(by=['Risk','Route_Risk'], ascending=False)
# rank routes within each county; grouping by FIPS keeps counties with tied FIPS_Rank separate
df_complete['Route_Rank'] = df_complete.groupby('FIPS')['Route_Risk'].rank(ascending=False,method='dense')
df_complete = df_complete.rename(index=str, columns={"County_x": "County"})
df_complete = df_complete.drop(columns=['County_y'])
df_complete = df_complete.rename(index=str, columns={"paxVolume_x": "paxVolume_route"})
df_complete = df_complete.rename(index=str, columns={"paxVolume_y": "paxVolume_county"})
df_complete = df_complete.rename(index=str, columns={"UVR_y": "UVR"})
df_complete = df_complete.drop(columns=['UVR_x'])
print(len(df_complete))
df_complete.head(5)

32905


Unnamed: 0,FIPS,County,FIPS_Pop,ISO,Country,ISO_Case,ISO_Pop,paxVolume_route,Route_Risk,City,FIPS_RawRisk,Risk,FIPS_Rank,Year,paxVolume_county,UVR,# of Cases,intl_case,Route_Rank
25447,6037.0,"Los Angeles, CA",10163507.0,PHL,Philippines,42467.0,108116.615,367297,80645990000.0,Los Angeles,239943000000.0,1.0,1.0,2019.0,10930679.0,0.055,22.0,1.0,1.0
32267,6037.0,"Los Angeles, CA",10163507.0,WSM,Samoa,5580.0,197.097,2219,35117030000.0,Los Angeles,239943000000.0,1.0,1.0,2019.0,10930679.0,0.055,22.0,1.0,2.0
24384,6037.0,"Los Angeles, CA",10163507.0,NZL,New Zealand,1997.0,4783.063,114512,26725710000.0,Los Angeles,239943000000.0,1.0,1.0,2019.0,10930679.0,0.055,22.0,1.0,3.0
31196,6037.0,"Los Angeles, CA",10163507.0,UKR,Ukraine,56986.0,43993.638,19077,13813210000.0,Los Angeles,239943000000.0,1.0,1.0,2019.0,10930679.0,0.055,22.0,1.0,4.0
14782,6037.0,"Los Angeles, CA",10163507.0,ISR,Israel,998.0,8519.377,137271,8988926000.0,Los Angeles,239943000000.0,1.0,1.0,2019.0,10930679.0,0.055,22.0,1.0,5.0


In [23]:
result = df_complete
output_csv = out_folder + 'MeaslesRisk_US_' +  str(year) + '_raw_route_' + t.strftime('%m%d%y%H%M') + '.csv'
#result.to_csv(output_csv, index=False, encoding='utf-8')

## Task 2: Travel volume proportional to the population (or pop density)

#### Import neighboring relationship table

In [24]:
in_table = r'C:\Users\Ensheng\Desktop\mapping\diffusion_model\nbr.csv'
df_nbr = pd.read_csv(in_table)
df_nbr = df_nbr[['src_FIPS', 'nbr_FIPS']]
print(len(df_nbr))
df_nbr.head(5)

18680


Unnamed: 0,src_FIPS,nbr_FIPS
0,1001.0,1021.0
1,1001.0,1047.0
2,1001.0,1051.0
3,1001.0,1085.0
4,1001.0,1101.0


In [25]:
# find all counties with IATA data
df_iataCounty = df_iata.groupby(['FIPS'])['paxVolume'].sum().reset_index()
df_iataCounty = df_iataCounty.loc[df_iataCounty['paxVolume'].notnull()]
print(str(len(df_nme)) + " counties in the US.")
print(str(len(df_iataCounty)) + " counties have IATA travel data.")

3085 counties in the US.
401 counties have IATA travel data.


In [26]:
# subset of df_nbr to show only src_FIPS with IATA data
df_temp = pd.merge(df_nbr, df_iataCounty, how='left', left_on='src_FIPS',right_on='FIPS')
df_hub = df_temp.loc[df_temp['paxVolume'].notnull()]
print(str(len(df_hub)) + " neighboring relationships remain.") # we will only work with these counties and their neighbors
print(str(df_hub.src_FIPS.nunique()) + " hub counties.")
df_hub.head(10)

2260 neighboring relationships remain.
396 hub counties.


Unnamed: 0,src_FIPS,nbr_FIPS,FIPS,paxVolume
97,1033.0,1059.0,1033.0,22.0
98,1033.0,1077.0,1033.0,22.0
99,1033.0,1079.0,1033.0,22.0
100,1033.0,28141.0,1033.0,22.0
131,1045.0,1005.0,1045.0,5020.0
132,1045.0,1031.0,1045.0,5020.0
133,1045.0,1061.0,1045.0,5020.0
134,1045.0,1067.0,1045.0,5020.0
135,1045.0,1069.0,1045.0,5020.0
136,1045.0,1109.0,1045.0,5020.0


In [27]:
print("The following (island) counties have IATA data but no neighboring counties: ")
print(set(df_iataCounty.FIPS.unique()) - set(df_hub.src_FIPS.unique()))

The following (island) counties have IATA data but no neighboring counties: 
{15007.0, 15001.0, 25019.0, 15003.0, 53055.0}


#### Update hub county list

In [28]:
# src_FIPS is the hub county; nbr_FIPS lists each neighboring county plus the hub county itself
# adding the self-pairs also keeps the island hub counties, which have no neighbors
df_iataCounty["src_FIPS"] = df_iataCounty["FIPS"]
df_iataCounty["nbr_FIPS"] = df_iataCounty["FIPS"]
df_iataCounty = df_iataCounty[["src_FIPS","nbr_FIPS"]]
df_hub = df_hub[["src_FIPS","nbr_FIPS"]]
df_hub = pd.concat([df_hub, df_iataCounty])
print(str(len(df_hub)) + " neighboring relationships remain.")
print(str(df_hub.src_FIPS.nunique()) + " hub counties.")
df_hub = df_hub.sort_values(["src_FIPS","nbr_FIPS"]).reset_index()
df_hub.head(10)

2661 neighboring relationships remain.
401 hub counties.


Unnamed: 0,index,src_FIPS,nbr_FIPS
0,0,1033.0,1033.0
1,97,1033.0,1059.0
2,98,1033.0,1077.0
3,99,1033.0,1079.0
4,100,1033.0,28141.0
5,131,1045.0,1005.0
6,132,1045.0,1031.0
7,1,1045.0,1045.0
8,133,1045.0,1061.0
9,134,1045.0,1067.0
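
The effect of the self-pairs can be seen on a toy input. The sketch below is plain Python with hypothetical adjacency data: hub 15003 stands in for an island county with no neighbors, and the variable names are illustrative.

```python
# Hypothetical adjacency pairs (hub FIPS, neighbor FIPS); 2001 is not a hub.
neighbor_pairs = [(1033, 1059), (1033, 1077), (2001, 2003)]
# Hypothetical hub counties; 15003 has no entry in neighbor_pairs.
hub_counties = [1033, 15003]

hub_set = set(hub_counties)
# Keep only relationships whose source is a hub county.
pairs = [(src, nbr) for src, nbr in neighbor_pairs if src in hub_set]
# Add each hub as its own "neighbor" so it keeps a share of its travel volume.
pairs += [(hub, hub) for hub in hub_counties]
pairs.sort()

print(pairs)
```

Without the self-pairs, the island hub would vanish from the relationship table and its travel volume would be lost in the later merge.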


#### Merge county population

In [29]:
df_pop = pd.merge(df_hub, df_nme , how='left', left_on='nbr_FIPS',right_on='FIPS')
df_pop.head(5)

Unnamed: 0,index,src_FIPS,nbr_FIPS,FIPS,County,VR,UVR,Population
0,0,1033.0,1033.0,1033.0,"Colbert, AL",0.8939,0.1061,54500.0
1,97,1033.0,1059.0,1059.0,"Franklin, AL",0.9775,0.0225,31495.0
2,98,1033.0,1077.0,1077.0,"Lauderdale, AL",0.9354,0.0646,92538.0
3,99,1033.0,1079.0,1079.0,"Lawrence, AL",0.968,0.032,33049.0
4,100,1033.0,28141.0,28141.0,"Tishomingo, MS",0.992,0.008,19542.0


#### Calculate population percentage

In [30]:
# Sum population per (hub, neighbor) pair, then express it as a percentage of the hub group's total
df_pop_tmp = df_pop.groupby(['src_FIPS', 'nbr_FIPS'], as_index=False)['Population'].sum()
group_total = df_pop_tmp.groupby('src_FIPS')['Population'].transform('sum')
df_poppct = df_pop_tmp.assign(POPPCT=100 * df_pop_tmp['Population'] / group_total)
df_poppct = df_poppct[['src_FIPS', 'nbr_FIPS', 'POPPCT']]

In [31]:
print(len(df_poppct)) # should be the same as len(df_hub), the count of neighboring pairs + the count of hub counties
df_poppct.head(15)

2661


Unnamed: 0,src_FIPS,nbr_FIPS,POPPCT
0,1033.0,1033.0,23.580416
1,1033.0,1059.0,13.626884
2,1033.0,1077.0,40.038248
3,1033.0,1079.0,14.299251
4,1033.0,28141.0,8.455202
5,1045.0,1005.0,8.216523
6,1045.0,1031.0,16.866796
7,1045.0,1045.0,16.005801
8,1045.0,1061.0,8.59077
9,1045.0,1067.0,5.575335
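
The percentages for one hub can be checked by hand. The sketch below, in plain Python, uses the populations of hub county 1033 (Colbert, AL) and its four neighbors from the merged table above; the variable names are illustrative.

```python
# Population of hub county 1033 and its neighbors (the hub is included as its own neighbor).
populations = {
    1033: 54500.0,   # Colbert, AL (hub)
    1059: 31495.0,   # Franklin, AL
    1077: 92538.0,   # Lauderdale, AL
    1079: 33049.0,   # Lawrence, AL
    28141: 19542.0,  # Tishomingo, MS
}

total = sum(populations.values())
poppct = {fips: 100 * pop / total for fips, pop in populations.items()}

for fips, pct in poppct.items():
    print(fips, round(pct, 6))
```

The shares should match the `POPPCT` column for `src_FIPS` 1033 and add up to 100.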


#### Calculate travel volume for each route

In [32]:
df_iata.head(5)

Unnamed: 0,FIPS,ISO,paxVolume
370971,46013.0,ARE,17
370972,48441.0,ARE,16
370973,17167.0,ARE,8
370974,5119.0,ARE,496
370975,39153.0,ARE,81


In [33]:
df_route = pd.merge(df_iata, df_poppct, how='left', left_on='FIPS',right_on='src_FIPS')
print(len(df_route))
df_route.head(15)

258283


Unnamed: 0,FIPS,ISO,paxVolume,src_FIPS,nbr_FIPS,POPPCT
0,46013.0,ARE,17,46013.0,38021.0,6.631108
1,46013.0,ARE,17,46013.0,38081.0,5.262871
2,46013.0,ARE,17,46013.0,46013.0,53.444466
3,46013.0,ARE,17,46013.0,46037.0,7.531444
4,46013.0,ARE,17,46013.0,46045.0,5.346084
5,46013.0,ARE,17,46013.0,46049.0,3.177093
6,46013.0,ARE,17,46013.0,46089.0,3.309415
7,46013.0,ARE,17,46013.0,46091.0,6.553352
8,46013.0,ARE,17,46013.0,46115.0,8.744168
9,48441.0,ARE,16,48441.0,48059.0,6.718859


In [34]:
df_route["IncomingTravel"] = df_route["paxVolume"] * df_route["POPPCT"] / 100
df_route.head(15)

Unnamed: 0,FIPS,ISO,paxVolume,src_FIPS,nbr_FIPS,POPPCT,IncomingTravel
0,46013.0,ARE,17,46013.0,38021.0,6.631108,1.127288
1,46013.0,ARE,17,46013.0,38081.0,5.262871,0.894688
2,46013.0,ARE,17,46013.0,46013.0,53.444466,9.085559
3,46013.0,ARE,17,46013.0,46037.0,7.531444,1.280345
4,46013.0,ARE,17,46013.0,46045.0,5.346084,0.908834
5,46013.0,ARE,17,46013.0,46049.0,3.177093,0.540106
6,46013.0,ARE,17,46013.0,46089.0,3.309415,0.562601
7,46013.0,ARE,17,46013.0,46091.0,6.553352,1.11407
8,46013.0,ARE,17,46013.0,46115.0,8.744168,1.486509
9,48441.0,ARE,16,48441.0,48059.0,6.718859,1.075017


In [35]:
df_iata_new = df_route.groupby(['nbr_FIPS','ISO'])['IncomingTravel'].sum().reset_index()
print(len(df_iata_new))
df_iata_new.head(5)

211613


Unnamed: 0,nbr_FIPS,ISO,IncomingTravel
0,1001.0,ABW,13.182564
1,1001.0,AFG,0.790954
2,1001.0,AGO,0.922779
3,1001.0,ARE,18.45559
4,1001.0,ARG,5.536677
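
A county that borders more than one hub receives a contribution from each of them, which is why the allocated volumes are summed by `nbr_FIPS` and `ISO`. The following plain-Python sketch shows the idea with hypothetical hubs 100 and 200 that share neighbor 300; all numbers and names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (hub FIPS, origin ISO3, passengers).
hub_routes = [(100, "ARE", 200.0), (200, "ARE", 50.0)]
# Hypothetical (hub FIPS, receiving FIPS) -> population share in percent.
# Each hub's shares sum to 100.
shares = {
    (100, 100): 60.0,
    (100, 300): 40.0,
    (200, 200): 80.0,
    (200, 300): 20.0,
}

# Allocate each hub's passengers by share, then sum per (receiving county, origin).
incoming = defaultdict(float)
for hub, iso, pax in hub_routes:
    for (src, nbr), pct in shares.items():
        if src == hub:
            incoming[(nbr, iso)] += pax * pct / 100

print(dict(incoming))
```

Since every hub's shares sum to 100, the total volume is conserved: the allocated values add up to the original 250 passengers.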


In [36]:
# update df_iata with travel volume for more counties
df_iata_new["FIPS"] = df_iata_new["nbr_FIPS"]
df_iata_new["paxVolume"] = df_iata_new["IncomingTravel"]
df_iata = df_iata_new[["FIPS","ISO","paxVolume"]]
df_iata.head(5)

Unnamed: 0,FIPS,ISO,paxVolume
0,1001.0,ABW,13.182564
1,1001.0,AFG,0.790954
2,1001.0,AGO,0.922779
3,1001.0,ARE,18.45559
4,1001.0,ARG,5.536677


#### Calculate risk (same as Task 1)

#### Calculate $r_{ij}^{t}$

In [37]:
df_temp = pd.merge(df_iata, df_outbreak, how='left', left_on='ISO',right_on='alpha3')
df_factors = pd.merge(df_temp, df_nme, how='left', left_on='FIPS',right_on='FIPS')
df_factors.head(5)

Unnamed: 0,FIPS,ISO,paxVolume,alpha3,Country,Total,pop2019,County,VR,UVR,Population
0,1001.0,ABW,13.182564,,,,,"Autauga, AL",0.9639,0.0361,55504.0
1,1001.0,AFG,0.790954,AFG,Afghanistan,166.0,38041.754,"Autauga, AL",0.9639,0.0361,55504.0
2,1001.0,AGO,0.922779,AGO,Angola,2962.0,31825.295,"Autauga, AL",0.9639,0.0361,55504.0
3,1001.0,ARE,18.45559,ARE,United Arab Emirates,164.0,9770.529,"Autauga, AL",0.9639,0.0361,55504.0
4,1001.0,ARG,5.536677,ARG,Argentina,64.0,44780.677,"Autauga, AL",0.9639,0.0361,55504.0


In [38]:
# rename and reorder col.
df_factors['FIPS_Pop'] = df_factors['Population']
df_factors['ISO_Case'] = df_factors['Total']
df_factors['ISO_Pop'] = df_factors[year_pop]
df_factors = df_factors[['FIPS','County','UVR','FIPS_Pop','ISO','Country','ISO_Case','ISO_Pop','paxVolume']]
print(len(df_factors))
df_factors.head(5)

211613


Unnamed: 0,FIPS,County,UVR,FIPS_Pop,ISO,Country,ISO_Case,ISO_Pop,paxVolume
0,1001.0,"Autauga, AL",0.0361,55504.0,ABW,,,,13.182564
1,1001.0,"Autauga, AL",0.0361,55504.0,AFG,Afghanistan,166.0,38041.754,0.790954
2,1001.0,"Autauga, AL",0.0361,55504.0,AGO,Angola,2962.0,31825.295,0.922779
3,1001.0,"Autauga, AL",0.0361,55504.0,ARE,United Arab Emirates,164.0,9770.529,18.45559
4,1001.0,"Autauga, AL",0.0361,55504.0,ARG,Argentina,64.0,44780.677,5.536677


In [39]:
# slice
df_factors = df_factors.loc[df_factors['ISO_Case'].notnull()]
print(len(df_factors))
df_factors = df_factors.loc[df_factors['paxVolume'].notnull()]
print(len(df_factors))

181923
181923


#### Calculate $r_{j}^{t}$

In [40]:
df_factors['Route_Risk'] = (df_factors['ISO_Case'] / df_factors['ISO_Pop']) * df_factors['paxVolume'] * df_factors['UVR'] * df_factors['FIPS_Pop']

In [41]:
#df_risk = df_factors.groupby(['FIPS','County'])['Route_Risk'].sum().reset_index()
df_risk = df_factors.groupby(['FIPS','County']).agg({'Route_Risk':'sum', 'paxVolume':'sum','UVR':'mean'}).reset_index()
df_risk['FIPS_RawRisk'] = df_risk['Route_Risk']
df_risk.head(5)

Unnamed: 0,FIPS,County,Route_Risk,paxVolume,UVR,FIPS_RawRisk
0,1001.0,"Autauga, AL",60295.52,1587.576203,0.0361,60295.52
1,1003.0,"Baldwin, AL",6402396.0,27993.318657,0.0347,6402396.0
2,1005.0,"Barbour, AL",22419.76,390.613524,0.1173,22419.76
3,1007.0,"Bibb, AL",39492.91,1989.87458,0.0546,39492.91
4,1009.0,"Blount, AL",127912.9,5092.579583,0.027,127912.9


#### Normalize and list the Top 25

In [42]:
highest_risk = df_risk['FIPS_RawRisk'].max()
df_risk['Risk'] = df_risk['FIPS_RawRisk'] / highest_risk
df_risk['FIPS_Rank'] = df_risk['Risk'].rank(ascending=False)
df_risk = pd.merge(df_risk, df_seat, how='left', left_on='FIPS',right_on='FIPS')
df_risk['Year'] = year
df_risk = df_risk[['FIPS','County','City','FIPS_RawRisk','Risk','FIPS_Rank','Year','paxVolume','UVR']]
df_risk = df_risk.sort_values('Risk',ascending = False).reset_index(drop=True)
df_risk = pd.merge(df_risk, real_case, how='left', left_on='FIPS',right_on='FIPS')
df_risk.head(25)

Unnamed: 0,FIPS,County,City,FIPS_RawRisk,Risk,FIPS_Rank,Year,paxVolume,UVR,# of Cases,intl_case
0,6037.0,"Los Angeles, CA",Los Angeles,142218600000.0,1.0,1.0,2019,6601098.0,0.055,22.0,1.0
1,53033.0,"King, WA",Seattle,26117590000.0,0.183644,2.0,2019,1068388.0,0.29,13.0,1.0
2,15003.0,"Honolulu, HI",Honolulu,14129150000.0,0.099348,3.0,2019,2713840.0,0.085,1.0,
3,12086.0,"Miami-Dade, FL",Miami,12311230000.0,0.086566,4.0,2019,5100034.0,0.051628,,
4,6059.0,"Orange, CA",Santa Ana,12080460000.0,0.084943,5.0,2019,2478160.0,0.043,5.0,1.0
5,17031.0,"Cook, IL",Chicago,9830657000.0,0.069124,6.0,2019,3199981.0,0.020475,2.0,1.0
6,6071.0,"San Bernardino, CA",San Bernardino,8056249000.0,0.056647,7.0,2019,2688638.0,0.049,2.0,1.0
7,4013.0,"Maricopa, AZ",Phoenix,6086527000.0,0.042797,8.0,2019,964501.5,0.073,,
8,48201.0,"Harris, TX",Houston,5684577000.0,0.039971,9.0,2019,2154964.0,0.025232,4.0,1.0
9,51059.0,"Fairfax County, VA",,5651831000.0,0.03974,10.0,2019,1327771.0,0.123,,


In [43]:
result = df_risk
output_csv = out_folder + 'MeaslesRisk_US_' +  str(year) + '_pop_' + t.strftime('%m%d%y%H%M') + '.csv'
result.to_csv(output_csv, index=False, encoding='utf-8')

In [44]:
df_complete = pd.merge(df_factors, df_risk , how='left', left_on='FIPS',right_on='FIPS')
df_complete = df_complete.sort_values(by=['Risk','Route_Risk'], ascending=False)
# rank routes within each county; grouping by FIPS keeps counties with tied FIPS_Rank separate
df_complete['Route_Rank'] = df_complete.groupby('FIPS')['Route_Risk'].rank(ascending=False,method='dense')
df_complete = df_complete.rename(index=str, columns={"County_x": "County"})
df_complete = df_complete.drop(columns=['County_y'])
df_complete = df_complete.rename(index=str, columns={"paxVolume_x": "paxVolume_route"})
df_complete = df_complete.rename(index=str, columns={"paxVolume_y": "paxVolume_county"})
df_complete = df_complete.rename(index=str, columns={"UVR_y": "UVR"})
df_complete = df_complete.drop(columns=['UVR_x'])
print(len(df_complete))
df_complete.head(5)

181923


Unnamed: 0,FIPS,County,FIPS_Pop,ISO,Country,ISO_Case,ISO_Pop,paxVolume_route,Route_Risk,City,FIPS_RawRisk,Risk,FIPS_Rank,Year,paxVolume_county,UVR,# of Cases,intl_case,Route_Rank
12592,6037.0,"Los Angeles, CA",10163507.0,PHL,Philippines,42467.0,108116.615,217667.133001,47792340000.0,Los Angeles,142218600000.0,1.0,1.0,2019.0,6601098.0,0.055,22.0,1.0,1.0
12632,6037.0,"Los Angeles, CA",10163507.0,WSM,Samoa,5580.0,197.097,1307.343368,20689510000.0,Los Angeles,142218600000.0,1.0,1.0,2019.0,6601098.0,0.055,22.0,1.0,2.0
12587,6037.0,"Los Angeles, CA",10163507.0,NZL,New Zealand,1997.0,4783.063,67648.590206,15788360000.0,Los Angeles,142218600000.0,1.0,1.0,2019.0,6601098.0,0.055,22.0,1.0,3.0
12627,6037.0,"Los Angeles, CA",10163507.0,UKR,Ukraine,56986.0,43993.638,11299.525482,8181723000.0,Los Angeles,142218600000.0,1.0,1.0,2019.0,6601098.0,0.055,22.0,1.0,4.0
12542,6037.0,"Los Angeles, CA",10163507.0,ISR,Israel,998.0,8519.377,82088.136639,5375383000.0,Los Angeles,142218600000.0,1.0,1.0,2019.0,6601098.0,0.055,22.0,1.0,5.0


In [45]:
result = df_complete
output_csv = out_folder + 'MeaslesRisk_US_' +  str(year) + '_pop_route_' + t.strftime('%m%d%y%H%M') + '.csv'
result.to_csv(output_csv, index=False, encoding='utf-8')

## Task 3: Travel volume proportional to Voronoi (Thiessen) polygon area

In [46]:
# environment setting
v_folder = r'C:\Users\Ensheng\Desktop\mapping\Voronoi\\'

#### Import original $V_{ij}^{t}$

In [47]:
# IATA data
in_table = r'C:\Users\Ensheng\Desktop\mapping\IATA\flow_XY.csv'
# Note: CSL and SBP are the same airport. CSL -> SBP (Airport count 676 -> 675)
df_iata = pd.read_csv(in_table)
df_iata = df_iata.loc[df_iata['year'] == year_iata] # slice for certain year
# note: FIPS is the county where the airport (IATA code) is located. Each airport has exactly one associated county (FIPS).
df_iata = df_iata[['ISO', 'Code', 'FIPS', 'paxVolume']]
print(len(df_iata))
df_iata.head(5)

38265


Unnamed: 0,ISO,Code,FIPS,paxVolume
370971,ARE,ABR,46013.0,17
370972,ARE,ABI,48441.0,16
370973,ARE,SPI,17167.0,8
370974,ARE,LIT,5119.0,496
370975,ARE,CAK,39153.0,81


In [48]:
print("Warning: " + str(len(df_iata.loc[df_iata['Code'].isnull()])) + " airport(s) missing info.")



#### Update incoming travel volume data

In [49]:
# Thiessen data
in_table = v_folder + 'Thiessen_County_Intersect_Pct.csv'
df_tpct = pd.read_csv(in_table)
# note: FIPS_1 lists every county that an airport's Thiessen polygon intersects. Each airport (Code) has at least one associated county (FIPS_1).
# The ThiessenAreaPct values for one airport should sum to 100%.
df_tpct = df_tpct[['Code', 'FIPS_1', 'ThiessenAreaPct']]
print(len(df_tpct))
df_tpct.sort_values(by='Code').head(15)

7137


Unnamed: 0,Code,FIPS_1,ThiessenAreaPct
6071,ABE,42095,14.27621
6070,ABE,42089,11.460134
6069,ABE,42011,14.898813
6068,ABE,42077,13.235343
6067,ABE,42017,8.398984
6066,ABE,42091,5.618685
6065,ABE,42103,0.115441
6064,ABE,34041,8.097077
6063,ABE,42107,10.020279
6061,ABE,42025,10.374824


In [50]:
# diffuse the travel volume to each county (make sure there are no null values after the left join)
df_temp = pd.merge(df_iata, df_tpct, how='left', on='Code')
df_temp.loc[df_temp['FIPS_1'].isnull()]

Unnamed: 0,ISO,Code,FIPS,paxVolume,FIPS_1,ThiessenAreaPct


In [51]:
df_temp['travelVolume'] = df_temp['paxVolume'] * df_temp['ThiessenAreaPct'] / 100
print(len(df_temp))
df_temp.sort_values(by=['ISO','Code']).head(15)

483595


Unnamed: 0,ISO,Code,FIPS,paxVolume,FIPS_1,ThiessenAreaPct,travelVolume
29360,ABW,ABE,42077.0,534,42025,10.374824,55.40156
29361,ABW,ABE,42077.0,534,34019,3.324779,17.754319
29362,ABW,ABE,42077.0,534,42107,10.020279,53.508292
29363,ABW,ABE,42077.0,534,34041,8.097077,43.238391
29364,ABW,ABE,42077.0,534,42103,0.115441,0.616457
29365,ABW,ABE,42077.0,534,42091,5.618685,30.003779
29366,ABW,ABE,42077.0,534,42017,8.398984,44.850574
29367,ABW,ABE,42077.0,534,42077,13.235343,70.676731
29368,ABW,ABE,42077.0,534,42011,14.898813,79.559661
29369,ABW,ABE,42077.0,534,42089,11.460134,61.197117
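
The diffusion is a straight proportional split of one route's passengers across the counties in the airport's Thiessen polygon. The sketch below reproduces three of the ABW to ABE rows above in plain Python; only a subset of the airport's counties is listed, so these values do not sum to the full 534 passengers. The variable names are illustrative.

```python
pax_volume = 534  # passengers on the ABW -> ABE route

# ThiessenAreaPct for three of the counties in ABE's polygon (percent).
area_pct = {
    42025: 10.374824,
    42077: 13.235343,
    42011: 14.898813,
}

# travelVolume = paxVolume * ThiessenAreaPct / 100
travel_volume = {fips: pax_volume * pct / 100 for fips, pct in area_pct.items()}

for fips, volume in travel_volume.items():
    print(fips, round(volume, 6))
```

The values should agree with the `travelVolume` column for the same counties, for example about 55.40 for FIPS 42025.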


In [52]:
df = df_temp.groupby(['ISO','FIPS_1'])['travelVolume'].sum().reset_index()
# update df_iata with travel volume for more counties
df["FIPS"] = df["FIPS_1"]
df["paxVolume"] = df["travelVolume"]
df = df[["FIPS","ISO","paxVolume"]]
df.head(5)

Unnamed: 0,FIPS,ISO,paxVolume
0,1001,ABW,7.354981
1,1003,ABW,34.524634
2,1005,ABW,0.491665
3,1007,ABW,85.46993
4,1009,ABW,82.550875


In [53]:
df_iata = df
print(len(df_iata))
df_iata.head(5)

360692


Unnamed: 0,FIPS,ISO,paxVolume
0,1001,ABW,7.354981
1,1003,ABW,34.524634
2,1005,ABW,0.491665
3,1007,ABW,85.46993
4,1009,ABW,82.550875


#### Calculate risk (same as Task 1)

#### Calculate $r_{ij}^{t}$

In [54]:
df_temp = pd.merge(df_iata, df_outbreak, how='left', left_on='ISO',right_on='alpha3')
df_factors = pd.merge(df_temp, df_nme, how='left', left_on='FIPS',right_on='FIPS')
df_factors.head(5)

Unnamed: 0,FIPS,ISO,paxVolume,alpha3,Country,Total,pop2019,County,VR,UVR,Population
0,1001,ABW,7.354981,,,,,"Autauga, AL",0.9639,0.0361,55504.0
1,1003,ABW,34.524634,,,,,"Baldwin, AL",0.9653,0.0347,212628.0
2,1005,ABW,0.491665,,,,,"Barbour, AL",0.8827,0.1173,25270.0
3,1007,ABW,85.46993,,,,,"Bibb, AL",0.9454,0.0546,22668.0
4,1009,ABW,82.550875,,,,,"Blount, AL",0.973,0.027,58013.0


In [55]:
# Copy to descriptive column names, then select and reorder the columns
df_factors['FIPS_Pop'] = df_factors['Population']
df_factors['ISO_Case'] = df_factors['Total']
df_factors['ISO_Pop'] = df_factors[year_pop]
df_factors = df_factors[['FIPS','County','UVR','FIPS_Pop','ISO','Country','ISO_Case','ISO_Pop','paxVolume']]
print(len(df_factors))
df_factors.head(5)

360692


Unnamed: 0,FIPS,County,UVR,FIPS_Pop,ISO,Country,ISO_Case,ISO_Pop,paxVolume
0,1001,"Autauga, AL",0.0361,55504.0,ABW,,,,7.354981
1,1003,"Baldwin, AL",0.0347,212628.0,ABW,,,,34.524634
2,1005,"Barbour, AL",0.1173,25270.0,ABW,,,,0.491665
3,1007,"Bibb, AL",0.0546,22668.0,ABW,,,,85.46993
4,1009,"Blount, AL",0.027,58013.0,ABW,,,,82.550875


In [56]:
# Keep only routes with a reported case count and a known travel volume
df_factors = df_factors.loc[df_factors['ISO_Case'].notnull()]
print(len(df_factors))
df_factors = df_factors.loc[df_factors['paxVolume'].notnull()]
print(len(df_factors))

309581
309581
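
The first filter removes the routes whose origin has no case record (such as the ABW rows shown earlier, where `ISO_Case` is empty), which takes the table from 360692 to 309581 rows; the second filter removes nothing further. A minimal sketch on hypothetical toy rows, showing that the two-step filter is equivalent to a single `dropna` on both columns:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one without a case count, one complete, one without volume
toy = pd.DataFrame({
    'ISO': ['ABW', 'PHL', 'NZL'],
    'ISO_Case': [np.nan, 42467.0, 1997.0],
    'paxVolume': [7.35, 120.0, np.nan],
})

# Two-step filter, as in the cell above
two_step = toy.loc[toy['ISO_Case'].notnull()]
two_step = two_step.loc[two_step['paxVolume'].notnull()]

# Single-call equivalent
kept = toy.dropna(subset=['ISO_Case', 'paxVolume'])
print(kept)
```

Keeping the two steps separate has the advantage of printing the row count after each filter.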


#### Calculate $r_{j}^{t}$

In [57]:
# Route-level risk r_ij = incidence (cases / population) x travel volume x UVR x county population
df_factors['Route_Risk'] = (df_factors['ISO_Case'] / df_factors['ISO_Pop']) * df_factors['paxVolume'] * df_factors['UVR'] * df_factors['FIPS_Pop']
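
As a worked check of the formula, the sketch below recomputes one route by hand. The input values are copied from the Los Angeles / Philippines row of the route-level output later in this notebook; the helper `route_risk` is a name introduced here for illustration. The magnitude of `ISO_Pop` (108116.615 for the Philippines) suggests it is stored in thousands, which is an inference, not something the notebook states.

```python
def route_risk(iso_case, iso_pop, pax_volume, uvr, fips_pop):
    """Risk of one route: incidence x travel volume x UVR x county population."""
    incidence = iso_case / iso_pop   # cases per unit of ISO_Pop (apparently thousands)
    return incidence * pax_volume * uvr * fips_pop

# Values copied from the Los Angeles / Philippines row of the route-level table
r_lax_phl = route_risk(iso_case=42467.0, iso_pop=108116.615,
                       pax_volume=367335.431398, uvr=0.055, fips_pop=10163507.0)
print(f"{r_lax_phl:.4e}")
```

The result agrees with the tabulated `Route_Risk` of 80654430000.0 for that row to the displayed precision.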

In [58]:
#df_risk = df_factors.groupby(['FIPS','County'])['Route_Risk'].sum().reset_index()
df_risk = df_factors.groupby(['FIPS','County']).agg({'Route_Risk':'sum', 'paxVolume':'sum','UVR':'mean'}).reset_index()
df_risk.loc[:,('FIPS_RawRisk')] = df_risk['Route_Risk']
df_risk.head(5)

Unnamed: 0,FIPS,County,Route_Risk,paxVolume,UVR,FIPS_RawRisk
0,1001,"Autauga, AL",33640.83,885.76034,0.0361,33640.83
1,1003,"Baldwin, AL",1571583.0,5521.720125,0.0347,1571583.0
2,1005,"Barbour, AL",46698.38,737.650168,0.1173,46698.38
3,1007,"Bibb, AL",218095.3,10985.952798,0.0546,218095.3
4,1009,"Blount, AL",304344.3,11312.195605,0.027,304344.3
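
Because $UVR_{j}^{t}$ and $P_{j}^{t}$ do not depend on the origin country, summing route risks per county gives the same result as the factored form $(\sum_{i} C_{i}^{t} \times V_{ij}^{t}) \times UVR_{j}^{t} \times P_{j}^{t}$ from Task 1. A small sketch with made-up numbers illustrating the equivalence:

```python
import pandas as pd

# Made-up routes: two into county 1001, one into county 1003
toy = pd.DataFrame({
    'FIPS': [1001, 1001, 1003],
    'ISO_Case': [100.0, 50.0, 100.0],
    'ISO_Pop': [1000.0, 500.0, 1000.0],
    'paxVolume': [10.0, 20.0, 5.0],
    'UVR': [0.04, 0.04, 0.10],
    'FIPS_Pop': [50000.0, 50000.0, 200000.0],
})

# Route by route, then sum per county (as in the cell above)
toy['Route_Risk'] = toy['ISO_Case'] / toy['ISO_Pop'] * toy['paxVolume'] * toy['UVR'] * toy['FIPS_Pop']
summed = toy.groupby('FIPS')['Route_Risk'].sum()

# Factored form: sum C x V per county first, then multiply by UVR and population
toy['CV'] = toy['ISO_Case'] / toy['ISO_Pop'] * toy['paxVolume']
by_county = toy.groupby('FIPS').agg(CV=('CV', 'sum'),
                                    UVR=('UVR', 'first'),
                                    FIPS_Pop=('FIPS_Pop', 'first'))
factored = by_county['CV'] * by_county['UVR'] * by_county['FIPS_Pop']
print(pd.DataFrame({'summed': summed, 'factored': factored}))
```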


#### Normalize and list the Top 25

In [59]:
highest_risk = df_risk['FIPS_RawRisk'].max()
df_risk['Risk'] = df_risk['FIPS_RawRisk'] / highest_risk
df_risk['FIPS_Rank'] = df_risk['Risk'].rank(ascending=False)
df_risk = pd.merge(df_risk, df_seat, how='left', left_on='FIPS',right_on='FIPS')
df_risk['Year'] = year
df_risk = df_risk[['FIPS','County','City','FIPS_RawRisk','Risk','FIPS_Rank','Year','paxVolume','UVR']]
df_risk = df_risk.sort_values('Risk',ascending = False).reset_index(drop=True)
df_risk = pd.merge(df_risk, real_case, how='left', left_on='FIPS',right_on='FIPS')
df_risk.head(25)

Unnamed: 0,FIPS,County,City,FIPS_RawRisk,Risk,FIPS_Rank,Year,paxVolume,UVR,# of Cases,intl_case
0,6037,"Los Angeles, CA",Los Angeles,239930600000.0,1.0,1.0,2019,10929440.0,0.055,22.0,1.0
1,15003,"Honolulu, HI",Honolulu,14128400000.0,0.058885,2.0,2019,2713697.0,0.085,1.0,
2,12086,"Miami-Dade, FL",Miami,13103570000.0,0.054614,3.0,2019,4676139.0,0.051628,,
3,53033,"King, WA",Seattle,9903356000.0,0.041276,4.0,2019,404521.8,0.29,13.0,1.0
4,53053,"Pierce, WA",,8880764000.0,0.037014,5.0,2019,690366.8,0.379,2.0,
5,4013,"Maricopa, AZ",Phoenix,7515875000.0,0.031325,6.0,2019,1196394.0,0.073,,
6,6081,"San Mateo, CA",Redwood City,6755072000.0,0.028154,7.0,2019,6126696.0,0.034,6.0,1.0
7,6073,"San Diego, CA",San Diego,5645237000.0,0.023529,8.0,2019,960673.4,0.075,2.0,1.0
8,34025,"Monmouth, NJ",,5512127000.0,0.022974,9.0,2019,7086960.0,0.04,2.0,
9,32003,"Clark, NV",Las Vegas,3327249000.0,0.013868,10.0,2019,1735306.0,0.049,1.0,1.0


In [60]:
result = df_risk
output_csv = out_folder + 'MeaslesRisk_US_' +  str(year) + '_voronoi_' + t.strftime('%m%d%y%H%M') + '.csv'
#result.to_csv(output_csv, index=False, encoding='utf-8')
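
The output name concatenates the folder, the year, and a `%m%d%y%H%M` timestamp, so `out_folder` must end with a path separator for the concatenation to work. A sketch of the same naming with `os.path.join`, which does not depend on the trailing separator; the folder name and the fixed timestamp below are hypothetical (in the notebook, `t` is defined elsewhere and is presumably the current time).

```python
import os
from datetime import datetime

out_folder = 'output'                 # hypothetical folder name
year = 2019
t = datetime(2020, 1, 15, 9, 30)      # hypothetical fixed timestamp for illustration

# Same pattern as above: MeaslesRisk_US_<year>_voronoi_<mmddyyHHMM>.csv
file_name = 'MeaslesRisk_US_{}_voronoi_{}.csv'.format(year, t.strftime('%m%d%y%H%M'))
output_csv = os.path.join(out_folder, file_name)
print(output_csv)
```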

In [61]:
df_complete = pd.merge(df_factors, df_risk , how='left', left_on='FIPS',right_on='FIPS')
df_complete = df_complete.sort_values(by=['Risk','Route_Risk'], ascending=False)
df_complete['Route_Rank'] = df_complete.groupby('FIPS_Rank')['Route_Risk'].rank(ascending=False,method='dense')
# Resolve the merge suffixes: _x columns come from df_factors (route level), _y from df_risk (county level)
df_complete = df_complete.drop(columns=['County_y', 'UVR_x'])
df_complete = df_complete.rename(columns={'County_x': 'County', 'paxVolume_x': 'paxVolume_route', 'paxVolume_y': 'paxVolume_county', 'UVR_y': 'UVR'})
print(len(df_complete))
df_complete.head(5)

309581


Unnamed: 0,FIPS,County,FIPS_Pop,ISO,Country,ISO_Case,ISO_Pop,paxVolume_route,Route_Risk,City,FIPS_RawRisk,Risk,FIPS_Rank,Year,paxVolume_county,UVR,# of Cases,intl_case,Route_Rank
238812,6037,"Los Angeles, CA",10163507.0,PHL,Philippines,42467.0,108116.615,367335.431398,80654430000.0,Los Angeles,239930600000.0,1.0,1.0,2019.0,10929440.0,0.055,22.0,1.0,1.0
303140,6037,"Los Angeles, CA",10163507.0,WSM,Samoa,5580.0,197.097,2219.867414,35130760000.0,Los Angeles,239930600000.0,1.0,1.0,2019.0,10929440.0,0.055,22.0,1.0,2.0
227613,6037,"Los Angeles, CA",10163507.0,NZL,New Zealand,1997.0,4783.063,114705.664906,26770910000.0,Los Angeles,239930600000.0,1.0,1.0,2019.0,10929440.0,0.055,22.0,1.0,3.0
294013,6037,"Los Angeles, CA",10163507.0,UKR,Ukraine,56986.0,43993.638,19098.806728,13829000000.0,Los Angeles,239930600000.0,1.0,1.0,2019.0,10929440.0,0.055,22.0,1.0,4.0
145317,6037,"Los Angeles, CA",10163507.0,ISR,Israel,998.0,8519.377,137553.806728,9007445000.0,Los Angeles,239930600000.0,1.0,1.0,2019.0,10929440.0,0.055,22.0,1.0,5.0
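
`Route_Rank` orders the origin countries within each county by their route risk, with `method='dense'` so that tied routes share a rank and the next rank is not skipped. A sketch on made-up values; it groups on `FIPS` instead of `FIPS_Rank`, an assumption made here so that counties sharing a tied county rank are never pooled into one group.

```python
import pandas as pd

# Made-up route risks for two counties, with one tie inside county 6037
toy = pd.DataFrame({
    'FIPS': [6037, 6037, 6037, 15003, 15003],
    'ISO': ['PHL', 'WSM', 'NZL', 'JPN', 'PHL'],
    'Route_Risk': [8.0e10, 3.5e10, 3.5e10, 9.0e9, 4.0e9],
})

# Dense rank within each county: the highest route risk gets rank 1
toy['Route_Rank'] = toy.groupby('FIPS')['Route_Risk'].rank(ascending=False, method='dense')
print(toy)
```

In the toy frame the two tied routes into county 6037 both receive rank 2, and ranking restarts at 1 for county 15003.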


In [62]:
result = df_complete
output_csv = out_folder + 'MeaslesRisk_US_' +  str(year) + '_voronoi_route_' + t.strftime('%m%d%y%H%M') + '.csv'
#result.to_csv(output_csv, index=False, encoding='utf-8')