# Gather_Population_And_Other_Mortality_Cause_Data 

This notebook loads and cleans population data for Kearney, the state of Nebraska, and the US for the years 2006-2018. It also loads data on causes of death in Nebraska for 2006-2018 and calculates estimated statistics for Kearney based on a Kearney-to-Nebraska population ratio. Results are saved to CSV files.


# License
This code was developed by Susan Boyd for use in HW1 assigned in DATA 512, a course in the UW MS Data Science degree program. This code is provided under an MIT license.


# Chat GPT Attribution
Some functions or code blocks in this notebook were created with assistance from Chat GPT (https://chat.openai.com/). The affected code is isolated in a function and the use of Chat GPT is noted there. The prompts used to query Chat GPT are listed at the end of the notebook.




# Step 0 Prepare Notebook

In [1]:
# import needed libraries 
import numpy as np
import pandas as pd

# Step 1 Load US Population Data 

In [2]:
#2000-2010
# Source: "Annual Estimates of the Resident Population for the United States, Regions, States, 
# the District of Columbia, and Puerto Rico: April 1, 2010 to July 1, 2019; April 1, 2020; and July 1, 2020 (NST-EST2020)"
# US Census website: https://www.census.gov/data/tables/time-series/demo/popest/intercensal-2000-2010-national.html

f = "Data/Pop_Raw/us_pop_2000_2010.xls"
US_pop2000_2010 = pd.read_excel(f)
US_pop2000_2009 = US_pop2000_2010.iloc[0:10]
#US_pop2000_2009



In [3]:
#2010-2020
# Source: "Annual Estimates of the Resident Population for the United States, Regions, States, the District of Columbia, 
# and Puerto Rico: April 1, 2010 to July 1, 2019; April 1, 2020; and July 1, 2020 (NST-EST2020)"
# Website: https://www.census.gov/programs-surveys/popest/technical-documentation/research/evaluation-estimates . . .
# /2020-evaluation-estimates/2010s-totals-national.html

f= "Data/Pop_Raw/us_pop_2010_2020.xlsx"
US_pop2010_2020 = pd.read_excel(f)
#US_pop2010_2020


In [4]:
# Combine Years 
US_pop = pd.concat([US_pop2000_2009, US_pop2010_2020], axis = 0, ignore_index=True)
US_pop.head()

Unnamed: 0,Year,US POP
0,2000,282162411
1,2001,284968955
2,2002,287625193
3,2003,290107933
4,2004,292805298


In [5]:
# Save to CSV file 
f_out =  "Data/us_pop_2000_2020.csv"
US_pop.to_csv(f_out, index = False)

# Step 2 Load And Calculate Data for Kearney

For Kearney, the US Census provides census counts (2000, 2010, 2020) but not intercensal estimates. I'll estimate annual populations by assuming a constant compound annual growth rate (CAGR) between censuses.

In [6]:
# Kearney census counts from the US Census
# Source: https://www.census.gov/quickfacts/fact/table/kearneycitynebraska/PST045222
# and https://en.wikipedia.org/wiki/Kearney,_Nebraska
KEARNEY_POP_2020 = 33790
KEARNEY_POP_2010 = 30787
KEARNEY_POP_2000 = 27431

In [7]:
# Function to calculate the compound annual growth rate (CAGR)
# See Chat GPT attribution at end of notebook 

def calculate_cagr(beginning_value, ending_value, number_of_years):
    # CAGR = (ending / beginning) ** (1 / years) - 1
    cagr = ((ending_value / beginning_value) ** (1 / number_of_years)) - 1
    return cagr


In [8]:
# calculate CAGRs
CAGR_2000_2010 = calculate_cagr(KEARNEY_POP_2000, KEARNEY_POP_2010, 10)
CAGR_2010_2020 = calculate_cagr(KEARNEY_POP_2010, KEARNEY_POP_2020, 10)
print(f"Growth Rate 2000-2010 is_{CAGR_2000_2010}; Growth Rate 2010_2020 is_{CAGR_2010_2020}")

Growth Rate 2000-2010 is_0.011608740676639862; Growth Rate 2010_2020 is_0.009350684800372155
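
As a quick sanity check on the growth-rate calculation, the sketch below re-derives the 2000-2010 rate and confirms that compounding it for ten years recovers the 2010 census count. It restates `calculate_cagr` and the census constants from the cells above so that it runs on its own. The name `recovered_2010` is introduced here only for illustration.

```python
def calculate_cagr(beginning_value, ending_value, number_of_years):
    # CAGR = (ending / beginning) ** (1 / years) - 1
    return (ending_value / beginning_value) ** (1 / number_of_years) - 1

KEARNEY_POP_2000 = 27431
KEARNEY_POP_2010 = 30787

rate = calculate_cagr(KEARNEY_POP_2000, KEARNEY_POP_2010, 10)

# Compounding the rate for the full 10 years should land back on the 2010 count
# (up to floating-point error).
recovered_2010 = KEARNEY_POP_2000 * (1 + rate) ** 10
print(round(rate, 6), round(recovered_2010))
```

If the recovered value did not match the census count, that would point to a wrong year span or swapped arguments.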


In [9]:
# Create Estimates 

## 00 to 10
Years_00 = [2000]
Pop_00 = [KEARNEY_POP_2000]
Years_to_Calc = [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009]
prior_yr = KEARNEY_POP_2000
for yr in Years_to_Calc: 
    Years_00.append(yr)
    new = prior_yr*(1+CAGR_2000_2010)
    Pop_00.append(round(new))
    prior_yr = new

## 10_to 20 
Years_10 = [2010]
Pop_10 = [KEARNEY_POP_2010]
Years_to_Calc = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019]
prior_yr = KEARNEY_POP_2010
for yr in Years_to_Calc: 
    Years_10.append(yr)
    new = prior_yr*(1+CAGR_2010_2020)
    Pop_10.append(round(new))
    prior_yr = new

## 21_and_after
## Assume same CAGR as 2010 to 2020 
Years_20 = [2020]
Pop_20 = [KEARNEY_POP_2020]
Years_to_Calc = list(range(2021, 2046))
prior_yr = KEARNEY_POP_2020
for yr in Years_to_Calc: 
    Years_20.append(yr)
    new = prior_yr*(1+CAGR_2010_2020)
    Pop_20.append(round(new))
    prior_yr = new

all_yrs = Years_00 + Years_10 + Years_20
all_pop = Pop_00 + Pop_10 + Pop_20

Kearney_pop = pd.DataFrame({
    "Year": all_yrs,
    "Pop": all_pop
})

print(Kearney_pop.head())
print(Kearney_pop.tail())


   Year    Pop
0  2000  27431
1  2001  27749
2  2002  28072
3  2003  28397
4  2004  28727
    Year    Pop
41  2041  41084
42  2042  41468
43  2043  41856
44  2044  42247
45  2045  42642
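
The three loops above apply the same compounding step with different starting points and rates. One possible way to express that once is a small helper. This is only a sketch: `project_population` and its parameter names are hypothetical and not part of the notebook. It carries the unrounded value forward and rounds only what it stores, as the loops above do.

```python
def project_population(start_year, start_pop, rate, n_years):
    """Return (years, pops) for start_year plus the next n_years years.

    Hypothetical helper: compounds start_pop by `rate` once per year,
    rounding only the stored values, not the running value.
    """
    years, pops = [start_year], [start_pop]
    current = start_pop
    for offset in range(1, n_years + 1):
        current = current * (1 + rate)  # keep the unrounded running value
        years.append(start_year + offset)
        pops.append(round(current))
    return years, pops

# 2000-2010 growth rate from the census counts used above
rate_2000_2010 = (30787 / 27431) ** (1 / 10) - 1

years, pops = project_population(2000, 27431, rate_2000_2010, 9)
print(list(zip(years, pops))[:5])
```

Calling it three times (2000 with 9 years, 2010 with 9 years, 2020 with 25 years) and concatenating the results would reproduce the same year range as the cell above.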


In [10]:
# save to file 
f_out = "Data/Kearney_Pop.csv"
Kearney_pop.to_csv(f_out, index = False)

# Step 3 - Calculate Kearney_To_Neb Pop Ratio 

To translate mortality estimates reported for the state of Nebraska to be specific to Kearney, we'll scale the all-Nebraska stats by the population ratio between Kearney and Nebraska.  For this purpose, we'll use 2020 population for both Kearney and Nebraska.  

In [11]:
# Nebraska Pop 2020
# Source: https://www.census.gov/quickfacts/fact/table/NE,US/PST045222
NEBRASKA_POP_2020 =  1961504

In [12]:
# Kearney to Nebraska Pop Ratio
kearney_to_neb_ratio = KEARNEY_POP_2020/NEBRASKA_POP_2020
print("Kearney to NE pop ratio is:", round(kearney_to_neb_ratio, 3))

Kearney to NE pop ratio is: 0.017
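
To make the scaling concrete, here is a minimal sketch of how a statewide count is converted into a Kearney estimate. The helper `scale_to_kearney` and the input of 1,000 deaths are hypothetical and used only for illustration. The approach assumes Kearney has the same per-capita rate as Nebraska as a whole.

```python
KEARNEY_POP_2020 = 33790
NEBRASKA_POP_2020 = 1961504

kearney_to_neb_ratio = KEARNEY_POP_2020 / NEBRASKA_POP_2020

def scale_to_kearney(state_count, ratio=kearney_to_neb_ratio):
    # Hypothetical helper: assumes Kearney's per-capita rate equals the state's.
    return state_count * ratio

# Illustrative input only: a made-up statewide count of 1,000 deaths.
example = scale_to_kearney(1000)
print(round(kearney_to_neb_ratio, 5), round(example, 1))
```

Because the ratio is fixed at 2020 values, estimates for earlier years do not reflect any change in Kearney's share of the state population over time.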


# Step 4 - Other Causes of Premature Death

In [13]:
# Traffic fatalities all of Nebraska 
#https://www-fars.nhtsa.dot.gov/States/StatesFatalitiesFatalityRates.aspx

tf_2006 = 269
tf_2007 = 256
tf_2008 = 208
tf_2009 = 223
tf_2010 = 190
tf_2011 = 181
tf_2012 = 212
tf_2013 = 211
tf_2014 = 225
tf_2015 = 246
tf_2016 = 218
tf_2017 = 228
tf_2018 = 230

traffic_fatality_NE_per_year = [tf_2006,tf_2007, tf_2008, tf_2009, tf_2010, tf_2011, tf_2012, tf_2013, \
                   tf_2014, tf_2015, tf_2016, tf_2017, tf_2018]

traffic_fatality_KE_per_year =[x * kearney_to_neb_ratio for x in traffic_fatality_NE_per_year]
traffic_fatality_KE_cum = sum(traffic_fatality_KE_per_year)

print(f"For 2006-2018, Estimated Traffic Fatalities in Kearney is _{round(traffic_fatality_KE_cum, 1)}")

For 2006-2018, Estimated Traffic Fatalities in Kearney is _49.9


In [14]:
#  Firearm Deaths in Nebraska
# From https://usafacts.org/data/topics/security-safety/crime-and-justice/firearms/firearm-deaths/

fd_2006 = 141
fd_2007 = 142
fd_2008 = 150
fd_2009 = 132
fd_2010 = 152
fd_2011 = 161
fd_2012 = 167
fd_2013 = 168
fd_2014 = 179
fd_2015 = 168
fd_2016 = 171
fd_2017 = 160
fd_2018 = 183

fire_arm_deaths_NE_per_year = [fd_2006,fd_2007, fd_2008, fd_2009, fd_2010, fd_2011, fd_2012, fd_2013, \
                   fd_2014, fd_2015, fd_2016, fd_2017, fd_2018]

fire_arm_deaths_KE_per_yr =  [x * kearney_to_neb_ratio for x in fire_arm_deaths_NE_per_year]

fire_arm_deaths_KE_cum = sum(fire_arm_deaths_KE_per_yr)

print(f"For 2006-2018, estimated fire arms deaths in Kearney is_ {round(fire_arm_deaths_KE_cum,1)}")

For 2006-2018, estimated fire arms deaths in Kearney is_ 35.7
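
The traffic and firearm cells follow the same pattern: list the yearly Nebraska counts, scale each by the population ratio, and sum. A sketch of one way to share that logic is shown below. It keys the counts by year in dictionaries so that each value stays attached to its year. The names `cumulative_kearney_estimate`, `traffic_fatalities_NE`, and `firearm_deaths_NE` are hypothetical; the counts are the ones entered in the two cells above.

```python
KEARNEY_TO_NEB_RATIO = 33790 / 1961504  # 2020 populations, as in Step 3

# Nebraska counts by year, copied from the cells above
traffic_fatalities_NE = {
    2006: 269, 2007: 256, 2008: 208, 2009: 223, 2010: 190, 2011: 181, 2012: 212,
    2013: 211, 2014: 225, 2015: 246, 2016: 218, 2017: 228, 2018: 230,
}
firearm_deaths_NE = {
    2006: 141, 2007: 142, 2008: 150, 2009: 132, 2010: 152, 2011: 161, 2012: 167,
    2013: 168, 2014: 179, 2015: 168, 2016: 171, 2017: 160, 2018: 183,
}

def cumulative_kearney_estimate(counts_by_year, ratio):
    # Scale each year's statewide count to Kearney, then sum across years.
    return sum(count * ratio for count in counts_by_year.values())

traffic_KE = cumulative_kearney_estimate(traffic_fatalities_NE, KEARNEY_TO_NEB_RATIO)
firearm_KE = cumulative_kearney_estimate(firearm_deaths_NE, KEARNEY_TO_NEB_RATIO)
print(round(traffic_KE, 1), round(firearm_KE, 1))
```

Keeping the year as a dictionary key also makes a missing or duplicated year easy to spot, for example by checking `len(counts_by_year)`.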


In [15]:
# Homicide deaths in Nebraska (only available for 2014-2018, so take the average
# and apply it to all 13 years, 2006-2018)
# https://www.cdc.gov/nchs/pressroom/sosmap/homicide_mortality/homicide.htm


ho_NE_2014 = 63
ho_NE_2015 = 75
ho_NE_2016 = 60
ho_NE_2017 = 50
ho_NE_2018 = 35
ave_ho_NE_per_yr = np.mean([ho_NE_2014, ho_NE_2015, ho_NE_2016, ho_NE_2017, ho_NE_2018])
# Extrapolate the 5-year average to the 13 years 2006-2018
cum_NE = ave_ho_NE_per_yr*13
# Scale the Nebraska total to Kearney by population ratio
cum_homicide_KE = cum_NE*kearney_to_neb_ratio
cum_homicide_KE 


12.675315472209082
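
The homicide estimate involves two approximations stacked together: a five-year average stands in for all thirteen years, and the state total is then scaled to Kearney. The sketch below walks through the arithmetic step by step using only the standard library. The variable names are illustrative, and the inputs are the values from the cell above.

```python
from statistics import mean

KEARNEY_TO_NEB_RATIO = 33790 / 1961504  # 2020 populations, as in Step 3
N_YEARS = 13                            # 2006 through 2018

# Nebraska homicide values for the years available, from the cell above
homicides_NE = {2014: 63, 2015: 75, 2016: 60, 2017: 50, 2018: 35}

avg_per_year_NE = mean(list(homicides_NE.values()))   # 283 / 5
cum_NE = avg_per_year_NE * N_YEARS                    # extrapolate to 13 years
cum_KE = cum_NE * KEARNEY_TO_NEB_RATIO                # scale to Kearney

print(avg_per_year_NE, round(cum_NE, 1), round(cum_KE, 2))
```

This treats 2006-2013 as if they matched the 2014-2018 average, so the result is a rough figure rather than a count.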

In [16]:
# Other causes of death 
#https://www.cdc.gov/nchs/data-visualization/mortality-leading-causes/index.htm

# load death per Year  # Note that this is only through 2017! 
f = "Data/Other_Mortality_Raw/NCHS_-_Leading_Causes_of_Death__United_States (1).csv"
death_US_per_year = pd.read_csv(f)

# filter to only NE
death_NE_per_year = death_US_per_year[death_US_per_year["State"]=="Nebraska"]
death_NE_per_year

# Filter years 
YEARS = [2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018]
death_NE_per_year = death_NE_per_year[death_NE_per_year["Year"].isin(YEARS)]

# Drop unneeded columns 
death_NE_per_year = death_NE_per_year[death_NE_per_year.columns.drop(["113 Cause Name", "State", "Age-adjusted Death Rate"])]

# Coerce the "Deaths" column to integers (strip thousands separators first)
death_NE_per_year['Deaths'] = death_NE_per_year['Deaths'].str.replace(',', '').astype(int)


death_NE_per_year.head()

Unnamed: 0,Year,Cause Name,Deaths
28,2017,Unintentional injuries,811
80,2017,All causes,16878
132,2017,Alzheimer's disease,698
184,2017,Stroke,760
236,2017,CLRD,1224


In [17]:
# Totals 2006-2017
cum_NE_2006_2017 = death_NE_per_year.groupby('Cause Name')['Deaths'].sum().reset_index()
cum_NE_2006_2017 

Unnamed: 0,Cause Name,Deaths
0,All causes,188306
1,Alzheimer's disease,6929
2,CLRD,12629
3,Cancer,41359
4,Diabetes,5751
5,Heart disease,40799
6,Influenza and pneumonia,4016
7,Kidney disease,2964
8,Stroke,9966
9,Suicide,2577
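
The cleaning and aggregation steps above can be tried on a small frame before running them on the full file. The sketch below uses the five 2017 rows shown in the earlier output. Writing the larger values with thousands separators (`"16,878"`, `"1,224"`) is an assumption about the raw file, inferred from the comma-stripping step. The names `raw`, `cleaned`, `totals`, and `all_causes` are illustrative.

```python
import pandas as pd

# Five Nebraska rows for 2017, taken from the output shown earlier.
# Assumption: the raw "Deaths" column holds strings with thousands separators.
raw = pd.DataFrame({
    "Year": [2017, 2017, 2017, 2017, 2017],
    "Cause Name": ["Unintentional injuries", "All causes",
                   "Alzheimer's disease", "Stroke", "CLRD"],
    "Deaths": ["811", "16,878", "698", "760", "1,224"],
})

# Work on a copy so the assignment does not touch a view of another frame.
cleaned = raw.copy()
cleaned["Deaths"] = cleaned["Deaths"].str.replace(",", "", regex=False).astype(int)

# Sum deaths per cause across years (only one year in this toy frame).
totals = cleaned.groupby("Cause Name")["Deaths"].sum().reset_index()

# Pull a single scalar out of the filtered column with .iloc[0].
all_causes = totals.loc[totals["Cause Name"] == "All causes", "Deaths"].iloc[0]
print(all_causes)
```

Selecting the scalar with `.iloc[0]` avoids passing a one-element Series to `int()`, which newer pandas versions warn about.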


In [18]:
# Heart disease cumulative for NE including 2018
# 2018 value from https://www.cdc.gov/nchs/pressroom/sosmap/heart_disease_mortality/heart_disease.htm
heart_disease_NE_2018 = 3539
# .iloc[0] extracts the single matching value as a scalar
heart_disease_cum_NE_2006_2018 = cum_NE_2006_2017[cum_NE_2006_2017["Cause Name"]=="Heart disease"]["Deaths"].iloc[0] + heart_disease_NE_2018

heart_disease_KE_cum = int(heart_disease_cum_NE_2006_2018)*kearney_to_neb_ratio
print("Estimated cumulative heart disease deaths 2006-2018 in Kearney is:", round(heart_disease_KE_cum, 2))



Estimated cumulative heart disease deaths 2006-2018 in Kearney is: 763.79


In [19]:
# Create a dataframe of non-pollution causes and estimated cumulative Kearney deaths, 2006-2018


causes = ["Fire Arms", "Traffic Fatalities", "Heart Disease", "Homocide"]
deaths = [fire_arm_deaths_KE_cum, traffic_fatality_KE_cum, heart_disease_KE_cum, cum_homicide_KE] 

mortality_other = pd.DataFrame ({
    'Cause': causes,
    "Deaths": deaths })
    
mortality_other
    
    

Unnamed: 0,Cause,Deaths
0,Fire Arms,35.727921
1,Traffic Fatalities,49.905394
2,Heart Disease,763.791978
3,Homocide,12.675315


In [20]:
# Save to file 
f_out = "Data/Other_Cause_Mortality.csv"
mortality_other.to_csv(f_out, index= False)

# Chat GPT Attribution 

The following function(s) or code block(s) contained in this notebook were written with assistance from Chat GPT, available at https://chat.openai.com/. In some cases, code suggested by Chat GPT was then further modified by the notebook author, Sue Boyd.

***
For assistance writing the "calculate_cagr" function, Chat GPT was given the following prompt:
    
"I have two variables pop2000 and pop2010 representing population in 2000 and 2010 respectively.  In python calculate the compound annual growth rate"    

These are age-adjusted figures, so some differ a bit from those above, but the general ballparks look right.