# **NOTEBOOK 3:**
## Cleaning and Exploring Complete County Statistics Data

**Source:** https://www.openintro.org/data/?data=county_complete

This notebook explores a file of statistics for U.S. counties covering multiple years. Because the diabetes prevalence data from Notebook 1 is for 2017, I will focus on 2017.

The goal is to narrow the county data down to measurements of education level, employment, insurance, and income, so that they can be connected to the 2017 diabetes prevalence data.

After cleaning the data and splitting it into smaller data frames, I will merge the 2017 county data with the simplified diabetes prevalence files from Notebook 1 (overall, plus the low, medium, and high prevalence subsets) for data exploration in the following notebooks.

### Set up data

In [159]:
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np

import warnings
warnings.simplefilter('ignore')

### Load and inspect the data

In [160]:
#import complete county file
counties_df = pd.read_csv('../data/county_complete.csv')

In [161]:
#what is the shape of the data frame?
nrows, ncols = counties_df.shape

print(f'Data frame has {nrows} rows and {ncols} columns.')

Data frame has 3142 rows and 188 columns.


In [162]:
#what variables are measured for each county?
print(counties_df.columns)

Index(['fips', 'state', 'name', 'pop2000', 'pop2010', 'pop2011', 'pop2012',
       'pop2013', 'pop2014', 'pop2015',
       ...
       'poverty_under_18_2019', 'two_plus_races_2019',
       'unemployment_rate_2019', 'uninsured_2019',
       'uninsured_65_and_older_2019', 'uninsured_under_19_2019',
       'uninsured_under_6_2019', 'veterans_2019', 'white_2019',
       'white_not_hispanic_2019'],
      dtype='object', length=188)


In [165]:
#rename 'name' to 'county_name' and lowercase the values so they match the diabetes data when merging
counties_df = counties_df.rename(columns={'name': 'county_name'})
counties_df['county_name'] = counties_df['county_name'].str.lower()

In [167]:
#add a column for state abbreviations, to facilitate data frame merging
state_abbrev_dict = {
    'Alabama':'AL',
    'Alaska':'AK',
    'Arizona':'AZ',
    'Arkansas':'AR',
    'California':'CA',
    'Colorado':'CO',
    'Connecticut':'CT',
    'Delaware':'DE',
    'District of Columbia':'DC',
    'Florida':'FL',
    'Georgia':'GA',
    'Hawaii':'HI',
    'Idaho':'ID',
    'Illinois':'IL',
    'Indiana':'IN',
    'Iowa':'IA',
    'Kansas':'KS',
    'Kentucky':'KY',
    'Louisiana':'LA',
    'Maine':'ME',
    'Maryland':'MD',
    'Massachusetts':'MA',
    'Michigan':'MI',
    'Minnesota':'MN',
    'Mississippi':'MS',
    'Missouri':'MO',
    'Montana':'MT',
    'Nebraska':'NE',
    'Nevada':'NV', 
    'New Hampshire':'NH',
    'New Jersey':'NJ',
    'New Mexico':'NM',
    'New York':'NY',
    'North Carolina':'NC',
    'North Dakota':'ND',
    'Ohio':'OH',
    'Oklahoma':'OK',
    'Oregon':'OR',
    'Pennsylvania':'PA',
    'Rhode Island':'RI',
    'South Carolina':'SC',
    'South Dakota':'SD',
    'Tennessee':'TN',
    'Texas':'TX',
    'Utah':'UT',
    'Vermont':'VT',
    'Virginia':'VA',
    'Washington':'WA',
    'West Virginia':'WV',
    'Wisconsin':'WI',
    'Wyoming':'WY'
}

counties_df['state_abbr'] = counties_df['state'].replace(state_abbrev_dict)

#make sure values match
counties_df.sample(10)[['state', 'state_abbr']]

Unnamed: 0,state,state_abbr
1431,Mississippi,MS
1289,Michigan,MI
2977,Washington,WA
2104,Ohio,OH
2562,Texas,TX
866,Iowa,IA
2382,South Dakota,SD
1660,Nebraska,NE
2637,Texas,TX
208,California,CA
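
The sample above spot-checks ten rows. A fuller check is sketched here on a toy frame (the names `toy_df`, `toy_abbrev`, and `unmapped` are illustrative, not part of the notebook). It uses `.map()` instead of `.replace()`, so any state missing from the dictionary shows up as `NaN` rather than keeping its full name.

```python
import pandas as pd

# Toy frame standing in for counties_df; 'Puerto Rico' is deliberately
# absent from the mapping below.
toy_df = pd.DataFrame({'state': ['Alabama', 'Texas', 'Puerto Rico']})

# Small subset of the mapping, for illustration only.
toy_abbrev = {'Alabama': 'AL', 'Texas': 'TX'}

# .map() returns NaN for any state missing from the dictionary,
# whereas .replace() would silently keep the full state name.
toy_df['state_abbr'] = toy_df['state'].map(toy_abbrev)

# List the state names that did not receive an abbreviation.
unmapped = toy_df.loc[toy_df['state_abbr'].isna(), 'state'].unique().tolist()
print(unmapped)
```

An empty list would mean every state name was mapped.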


In [168]:
#list only the columns relevant to 2017
col_2017 = ['state_abbr','county_name','pop2017','hs_grad_2017', 'some_college_2017', 'bachelors_2017','per_capita_income_2017','median_household_income_2017', 'poverty_2017', 'employed_2017','unemployed_2017','unemployment_rate_2017', 'uninsured_2017']
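
With 188 columns, it can help to list every 2017 column before hand-picking a subset like the one above. This is a minimal sketch on an empty toy frame whose column names are copied from the notebook output; `toy_df` and `cols_2017` are illustrative names.

```python
import pandas as pd

# Empty toy frame with a few column names taken from counties_df.
toy_df = pd.DataFrame(columns=['fips', 'state', 'pop2010', 'pop2017',
                               'hs_grad_2017', 'uninsured_2019'])

# Keep every column whose name ends in 2017; the pattern matches both
# 'pop2017' and names of the form '<measure>_2017'.
cols_2017 = toy_df.filter(regex='2017$').columns.tolist()
print(cols_2017)
```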

In [169]:
#make a new data frame for relevant 2017 data
counties_2017_df = counties_df[col_2017]
counties_2017_df

Unnamed: 0,state_abbr,county_name,pop2017,hs_grad_2017,some_college_2017,bachelors_2017,per_capita_income_2017,median_household_income_2017,poverty_2017,employed_2017,unemployed_2017,unemployment_rate_2017,uninsured_2017
0,AL,autauga county,55504.0,87.7,29.1,25.0,27841.70,55317.0,13.7,24908.0,1001.0,3.86,8.8
1,AL,baldwin county,212628.0,90.2,31.6,30.7,27779.85,52562.0,11.8,87915.0,3652.0,3.99,10.8
2,AL,barbour county,25270.0,73.1,25.5,12.0,17891.73,33368.0,27.2,7750.0,486.0,5.90,12.3
3,AL,bibb county,22668.0,82.1,25.0,13.2,20572.05,43404.0,15.2,8133.0,373.0,4.39,8.1
4,AL,blount county,58013.0,79.8,34.4,13.1,21367.39,47412.0,15.6,23509.0,985.0,4.02,11.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3137,WY,sweetwater county,43534.0,91.3,35.5,22.2,30282.59,71083.0,12.0,20599.0,982.0,4.55,12.4
3138,WY,teton county,23265.0,95.1,28.0,54.1,48557.37,80049.0,6.8,15005.0,462.0,2.99,11.7
3139,WY,uinta county,20495.0,91.8,37.0,17.4,27048.12,54672.0,14.9,8727.0,411.0,4.50,13.5
3140,WY,washakie county,8064.0,88.5,38.6,21.0,27494.83,51362.0,12.8,3947.0,168.0,4.08,16.8


In [170]:
#create a new csv file for simplified data frame
counties_2017_df.to_csv('../data/counties_2017_stats.csv', index=False)

## Creating smaller data frames

In [171]:
#for population data (2000, 2010, 2017)
col_pop = ['state_abbr','county_name','pop2000','pop2010','pop2017','white_not_hispanic_2010','black_2010','native_2010','asian_2010','pac_isl_2010','two_plus_races_2010','hispanic_2010','white_not_hispanic_2017','black_2017','native_2017','asian_2017','pac_isl_2017','two_plus_races_2017','hispanic_2017']

In [172]:
counties_pop_df = counties_df[col_pop]
counties_pop_df

Unnamed: 0,state_abbr,county_name,pop2000,pop2010,pop2017,white_not_hispanic_2010,black_2010,native_2010,asian_2010,pac_isl_2010,two_plus_races_2010,hispanic_2010,white_not_hispanic_2017,black_2017,native_2017,asian_2017,pac_isl_2017,two_plus_races_2017,hispanic_2017
0,AL,autauga county,43671.0,54571,55504.0,77.2,17.7,0.4,0.9,,1.6,2.4,75.42,9.55,0.15,0.47,0.04,0.84,2.67
1,AL,baldwin county,140415.0,182265,212628.0,83.5,9.4,0.7,0.7,,1.5,4.4,83.08,4.77,0.41,0.35,0.00,0.82,4.44
2,AL,barbour county,29038.0,27457,25270.0,46.8,46.9,0.4,0.4,,0.9,5.1,45.74,24.02,0.10,0.31,0.00,0.41,4.21
3,AL,bibb county,20826.0,22915,22668.0,75.0,22.0,0.3,0.1,,0.9,1.8,74.62,11.03,0.18,0.00,0.00,0.42,2.35
4,AL,blount county,51024.0,57322,58013.0,88.9,1.3,0.5,0.2,,1.2,8.1,87.37,0.79,0.18,0.07,0.00,0.85,9.01
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3137,WY,sweetwater county,37613.0,43806,43534.0,80.9,1.0,1.0,0.8,,2.3,15.3,79.64,0.41,0.51,0.29,0.23,1.24,16.01
3138,WY,teton county,18251.0,21294,23265.0,82.2,0.2,0.5,1.1,,1.6,15.0,81.49,0.32,0.20,1.10,0.00,0.32,15.00
3139,WY,uinta county,19742.0,21118,20495.0,88.5,0.3,0.8,0.3,0.2,2.0,8.8,87.66,0.07,0.51,0.06,0.00,1.56,9.07
3140,WY,washakie county,8289.0,8533,8064.0,83.9,0.3,1.1,0.6,,2.4,13.6,82.19,0.15,0.32,0.07,0.00,1.83,14.24


In [173]:
#create a new csv file for simplified data frame
counties_pop_df.to_csv('../data/counties_pop_2010_2017stats.csv', index=False)

In [174]:
#for education level (2010, 2017)
col_edu = ['state_abbr','county_name','hs_grad_2010','hs_grad_2017','bachelors_2010','bachelors_2017']

In [175]:
counties_edu_df = counties_df[col_edu]
counties_edu_df

Unnamed: 0,state_abbr,county_name,hs_grad_2010,hs_grad_2017,bachelors_2010,bachelors_2017
0,AL,autauga county,85.3,87.7,21.7,25.0
1,AL,baldwin county,87.6,90.2,26.8,30.7
2,AL,barbour county,71.9,73.1,13.5,12.0
3,AL,bibb county,74.5,82.1,10.0,13.2
4,AL,blount county,74.7,79.8,12.5,13.1
...,...,...,...,...,...,...
3137,WY,sweetwater county,89.9,91.3,17.1,22.2
3138,WY,teton county,95.1,95.1,49.7,54.1
3139,WY,uinta county,88.3,91.8,17.4,17.4
3140,WY,washakie county,89.6,88.5,24.5,21.0


In [176]:
#create a new csv file for simplified data frame
counties_edu_df.to_csv('../data/counties_edu_2010_2017stats.csv', index=False)

In [177]:
#for income and employment (2010, 2017)
col_econ = ['state_abbr','county_name','per_capita_income_2010','per_capita_income_2017','median_household_income_2010','median_household_income_2017','poverty_2010','poverty_2017','employed_2010','unemployed_2010','unemployment_rate_2010','employed_2017','unemployed_2017','unemployment_rate_2017']

In [178]:
counties_econ_df = counties_df[col_econ]
counties_econ_df

Unnamed: 0,state_abbr,county_name,per_capita_income_2010,per_capita_income_2017,median_household_income_2010,median_household_income_2017,poverty_2010,poverty_2017,employed_2010,unemployed_2010,unemployment_rate_2010,employed_2017,unemployed_2017,unemployment_rate_2017
0,AL,autauga county,24568,27841.70,53255,55317.0,10.6,13.7,23431.0,2282.0,8.87,24908.0,1001.0,3.86
1,AL,baldwin county,26469,27779.85,50147,52562.0,12.2,11.8,75120.0,8339.0,9.99,87915.0,3652.0,3.99
2,AL,barbour county,15875,17891.73,33219,33368.0,25.0,27.2,8959.0,1262.0,12.35,7750.0,486.0,5.90
3,AL,bibb county,19918,20572.05,41770,43404.0,12.6,15.2,7914.0,1020.0,11.42,8133.0,373.0,4.39
4,AL,blount county,21070,21367.39,45549,47412.0,13.4,15.6,22460.0,2446.0,9.82,23509.0,985.0,4.02
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3137,WY,sweetwater county,30961,30282.59,69828,71083.0,8.2,12.0,21608.0,1526.0,6.60,20599.0,982.0,4.55
3138,WY,teton county,42224,48557.37,70271,80049.0,8.2,6.8,12569.0,1042.0,7.66,15005.0,462.0,2.99
3139,WY,uinta county,24460,27048.12,58346,54672.0,12.1,14.9,9815.0,742.0,7.03,8727.0,411.0,4.50
3140,WY,washakie county,28557,27494.83,48379,51362.0,5.6,12.8,4318.0,271.0,5.91,3947.0,168.0,4.08


In [179]:
#create a new csv file for simplified data frame
counties_econ_df.to_csv('../data/counties_econ_2010_2017stats.csv', index=False)

## Merging 2017 County Data Frame with Diabetes Prevalence by County Data (2017), for cross-comparisons

In [180]:
#import simplified diabetes data frame from Notebook 1
diabetes_prev_2017_df = pd.read_csv('../data/diabetes_prev_edited.csv')
diabetes_prev_2017_df

Unnamed: 0,metro_nonmetro,county_name,state_abbr,db_prev,prev_level,prev_colorcode,metro_colorcode
0,metropolitan,anchorage municipality,AK,8.400030,low,green,blue
1,metropolitan,fairbanks north star borough,AK,7.100060,low,green,blue
2,metropolitan,matanuska-susitna borough,AK,8.900071,low,green,blue
3,metropolitan,autauga county,AL,12.700204,high,orange,blue
4,metropolitan,baldwin county,AL,10.300056,medium,yellow,blue
...,...,...,...,...,...,...,...
3139,nonmetropolitan,sweetwater county,WY,8.300101,low,green,red
3140,nonmetropolitan,teton county,WY,2.400087,very low,blue,red
3141,nonmetropolitan,uinta county,WY,10.300429,medium,yellow,red
3142,nonmetropolitan,washakie county,WY,10.700487,medium,yellow,red


In [181]:
#merge data to create one single 2017 county stat data frame
all_county_stats_2017_df = pd.merge(diabetes_prev_2017_df, counties_2017_df, on=['county_name','state_abbr'], how='inner')
all_county_stats_2017_df

Unnamed: 0,metro_nonmetro,county_name,state_abbr,db_prev,prev_level,prev_colorcode,metro_colorcode,pop2017,hs_grad_2017,some_college_2017,bachelors_2017,per_capita_income_2017,median_household_income_2017,poverty_2017,employed_2017,unemployed_2017,unemployment_rate_2017,uninsured_2017
0,metropolitan,anchorage municipality,AK,8.400030,low,green,blue,294356.0,93.4,35.0,34.6,38324.82,82271.0,8.1,146175.0,9287.0,5.97,13.2
1,metropolitan,fairbanks north star borough,AK,7.100060,low,green,blue,99703.0,94.6,39.8,33.1,34968.82,76250.0,7.7,43768.0,2940.0,6.29,10.2
2,metropolitan,matanuska-susitna borough,AK,8.900071,low,green,blue,106532.0,92.1,39.4,20.6,27306.48,74887.0,9.8,43292.0,4105.0,8.66,16.3
3,metropolitan,autauga county,AL,12.700204,high,orange,blue,55504.0,87.7,29.1,25.0,27841.70,55317.0,13.7,24908.0,1001.0,3.86,8.8
4,metropolitan,baldwin county,AL,10.300056,medium,yellow,blue,212628.0,90.2,31.6,30.7,27779.85,52562.0,11.8,87915.0,3652.0,3.99,10.8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3129,nonmetropolitan,sweetwater county,WY,8.300101,low,green,red,43534.0,91.3,35.5,22.2,30282.59,71083.0,12.0,20599.0,982.0,4.55,12.4
3130,nonmetropolitan,teton county,WY,2.400087,very low,blue,red,23265.0,95.1,28.0,54.1,48557.37,80049.0,6.8,15005.0,462.0,2.99,11.7
3131,nonmetropolitan,uinta county,WY,10.300429,medium,yellow,red,20495.0,91.8,37.0,17.4,27048.12,54672.0,14.9,8727.0,411.0,4.50,13.5
3132,nonmetropolitan,washakie county,WY,10.700487,medium,yellow,red,8064.0,88.5,38.6,21.0,27494.83,51362.0,12.8,3947.0,168.0,4.08,16.8
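
The diabetes frame above ends at index 3142 while the merged frame ends at index 3132, so the inner merge appears to drop some rows whose county name or state does not match exactly. One way to find them, sketched here on toy data (the frames `left` and `right` and their unmatched rows are made up for illustration), is an outer merge with `indicator=True`.

```python
import pandas as pd

# Toy stand-ins for diabetes_prev_2017_df and counties_2017_df.
# The third row of each frame is invented so that it fails to match.
left = pd.DataFrame({
    'county_name': ['autauga county', 'baldwin county', 'example borough'],
    'state_abbr': ['AL', 'AL', 'AK'],
    'db_prev': [12.7, 10.3, 8.0],
})
right = pd.DataFrame({
    'county_name': ['autauga county', 'baldwin county', 'example census area'],
    'state_abbr': ['AL', 'AL', 'AK'],
    'pop2017': [55504.0, 212628.0, 1000.0],
})

# An outer merge keeps every row; indicator=True adds a '_merge' column
# with the value 'both', 'left_only', or 'right_only' for each row.
check = pd.merge(left, right, on=['county_name', 'state_abbr'],
                 how='outer', indicator=True)

# Rows that an inner merge would have dropped.
unmatched = check[check['_merge'] != 'both']
print(unmatched[['county_name', 'state_abbr', '_merge']])
```

Inspecting the unmatched names usually shows whether the cause is a spelling difference or a county that exists in only one file.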


In [183]:
#save to new file
all_county_stats_2017_df.to_csv('../data/all_county_stats_2017.csv', index=False)

In [184]:
#import low diabetes prevalence county data to merge with county stats
low_prev_df = pd.read_csv('../data/low_db_prev.csv')
low_prev_df

Unnamed: 0,metro_nonmetro,county_name,state_abbr,db_prev,prev_level,prev_colorcode,metro_colorcode
0,metropolitan,anchorage municipality,AK,8.400030,low,green,blue
1,metropolitan,fairbanks north star borough,AK,7.100060,low,green,blue
2,metropolitan,matanuska-susitna borough,AK,8.900071,low,green,blue
3,metropolitan,coconino county,AZ,7.700018,low,green,blue
4,metropolitan,maricopa county,AZ,8.700003,low,green,blue
...,...,...,...,...,...,...,...
594,nonmetropolitan,niobrara county,WY,8.200103,low,green,red
595,nonmetropolitan,park county,WY,8.900219,low,green,red
596,nonmetropolitan,sheridan county,WY,8.700035,low,green,red
597,nonmetropolitan,sweetwater county,WY,8.300101,low,green,red


In [185]:
all_lowprev_county_stats_2017_df = pd.merge(low_prev_df, counties_2017_df, on=['county_name','state_abbr'], how='inner')
all_lowprev_county_stats_2017_df

Unnamed: 0,metro_nonmetro,county_name,state_abbr,db_prev,prev_level,prev_colorcode,metro_colorcode,pop2017,hs_grad_2017,some_college_2017,bachelors_2017,per_capita_income_2017,median_household_income_2017,poverty_2017,employed_2017,unemployed_2017,unemployment_rate_2017,uninsured_2017
0,metropolitan,anchorage municipality,AK,8.400030,low,green,blue,294356.0,93.4,35.0,34.6,38324.82,82271.0,8.1,146175.0,9287.0,5.97,13.2
1,metropolitan,fairbanks north star borough,AK,7.100060,low,green,blue,99703.0,94.6,39.8,33.1,34968.82,76250.0,7.7,43768.0,2940.0,6.29,10.2
2,metropolitan,matanuska-susitna borough,AK,8.900071,low,green,blue,106532.0,92.1,39.4,20.6,27306.48,74887.0,9.8,43292.0,4105.0,8.66,16.3
3,metropolitan,coconino county,AZ,7.700018,low,green,blue,140776.0,89.7,33.0,35.4,27266.04,53523.0,21.0,71195.0,4186.0,5.55,13.7
4,metropolitan,maricopa county,AZ,8.700003,low,green,blue,4307033.0,87.1,32.9,31.4,29379.27,58580.0,15.7,2045885.0,89102.0,4.17,12.3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
590,nonmetropolitan,niobrara county,WY,8.200103,low,green,red,2397.0,89.5,42.7,18.3,24926.58,36793.0,14.9,1283.0,36.0,2.73,15.9
591,nonmetropolitan,park county,WY,8.900219,low,green,red,29568.0,95.1,35.3,32.2,32426.97,60828.0,7.0,15024.0,664.0,4.23,10.4
592,nonmetropolitan,sheridan county,WY,8.700035,low,green,red,30210.0,95.3,40.3,31.3,32651.82,56455.0,6.8,15164.0,627.0,3.97,9.4
593,nonmetropolitan,sweetwater county,WY,8.300101,low,green,red,43534.0,91.3,35.5,22.2,30282.59,71083.0,12.0,20599.0,982.0,4.55,12.4


In [187]:
#save as a new file
all_lowprev_county_stats_2017_df.to_csv('../data/all_lowprev_county_stats_2017.csv', index=False)

In [188]:
#import high diabetes prevalence county data to merge with 2017 county stats
high_prev_df = pd.read_csv('../data/high_db_prev.csv')
high_prev_df

Unnamed: 0,metro_nonmetro,county_name,state_abbr,db_prev,prev_level,prev_colorcode,metro_colorcode
0,metropolitan,autauga county,AL,12.700204,high,orange,blue
1,metropolitan,bibb county,AL,13.600642,high,orange,blue
2,metropolitan,blount county,AL,14.600135,high,orange,blue
3,metropolitan,calhoun county,AL,17.000104,very high,red,blue
4,metropolitan,chilton county,AL,19.000031,very high,red,blue
...,...,...,...,...,...,...,...
1500,nonmetropolitan,wetzel county,WV,19.400998,very high,red,red
1501,nonmetropolitan,wyoming county,WV,15.900696,very high,red,red
1502,nonmetropolitan,big horn county,WY,14.400186,high,orange,red
1503,nonmetropolitan,hot springs county,WY,12.101911,high,orange,red


In [189]:
all_highprev_county_stats_2017_df = pd.merge(high_prev_df, counties_2017_df, on=['county_name','state_abbr'], how='inner')
all_highprev_county_stats_2017_df

Unnamed: 0,metro_nonmetro,county_name,state_abbr,db_prev,prev_level,prev_colorcode,metro_colorcode,pop2017,hs_grad_2017,some_college_2017,bachelors_2017,per_capita_income_2017,median_household_income_2017,poverty_2017,employed_2017,unemployed_2017,unemployment_rate_2017,uninsured_2017
0,metropolitan,autauga county,AL,12.700204,high,orange,blue,55504.0,87.7,29.1,25.0,27841.70,55317.0,13.7,24908.0,1001.0,3.86,8.8
1,metropolitan,bibb county,AL,13.600642,high,orange,blue,22668.0,82.1,25.0,13.2,20572.05,43404.0,15.2,8133.0,373.0,4.39,8.1
2,metropolitan,blount county,AL,14.600135,high,orange,blue,58013.0,79.8,34.4,13.1,21367.39,47412.0,15.6,23509.0,985.0,4.02,11.0
3,metropolitan,calhoun county,AL,17.000104,very high,red,blue,114728.0,83.2,33.2,17.9,23609.64,43686.0,18.6,43333.0,2249.0,4.93,10.4
4,metropolitan,chilton county,AL,19.000031,very high,red,blue,44067.0,81.8,23.3,15.1,22793.82,43501.0,19.4,18344.0,774.0,4.05,14.8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1498,nonmetropolitan,wetzel county,WV,19.400998,very high,red,red,15437.0,83.4,22.1,11.8,20224.81,40694.0,23.2,6550.0,507.0,7.18,7.9
1499,nonmetropolitan,wyoming county,WV,15.900696,very high,red,red,21210.0,78.4,20.6,9.3,20608.30,37644.0,22.2,6508.0,490.0,7.00,9.3
1500,nonmetropolitan,big horn county,WY,14.400186,high,orange,red,11906.0,90.5,42.4,19.2,22947.31,51716.0,13.0,5173.0,223.0,4.13,16.3
1501,nonmetropolitan,hot springs county,WY,12.101911,high,orange,red,4696.0,92.5,44.6,21.6,31335.27,48403.0,12.5,2239.0,90.0,3.86,11.4


In [195]:
#save to a new file
all_highprev_county_stats_2017_df.to_csv('../data/all_highprev_county_stats_2017.csv', index=False)

In [191]:
#import all medium-level diabetes prevalence county data to merge with 2017 county stats
med_prev_df = pd.read_csv('../data/med_db_prev.csv')
med_prev_df

Unnamed: 0,metro_nonmetro,county_name,state_abbr,db_prev,prev_level,prev_colorcode,metro_colorcode
0,metropolitan,baldwin county,AL,10.300056,medium,yellow,blue
1,metropolitan,lee county,AL,10.300029,medium,yellow,blue
2,metropolitan,shelby county,AL,10.500051,medium,yellow,blue
3,metropolitan,benton county,AR,9.600004,medium,yellow,blue
4,metropolitan,faulkner county,AR,10.400071,medium,yellow,blue
...,...,...,...,...,...,...,...
1035,nonmetropolitan,lincoln county,WY,9.200293,medium,yellow,red
1036,nonmetropolitan,sublette county,WY,10.401088,medium,yellow,red
1037,nonmetropolitan,uinta county,WY,10.300429,medium,yellow,red
1038,nonmetropolitan,washakie county,WY,10.700487,medium,yellow,red


In [192]:
all_medprev_county_stats_2017_df = pd.merge(med_prev_df, counties_2017_df, on=['county_name','state_abbr'], how='inner')
all_medprev_county_stats_2017_df

Unnamed: 0,metro_nonmetro,county_name,state_abbr,db_prev,prev_level,prev_colorcode,metro_colorcode,pop2017,hs_grad_2017,some_college_2017,bachelors_2017,per_capita_income_2017,median_household_income_2017,poverty_2017,employed_2017,unemployed_2017,unemployment_rate_2017,uninsured_2017
0,metropolitan,baldwin county,AL,10.300056,medium,yellow,blue,212628.0,90.2,31.6,30.7,27779.85,52562.0,11.8,87915.0,3652.0,3.99,10.8
1,metropolitan,lee county,AL,10.300029,medium,yellow,blue,161604.0,89.9,30.5,34.9,26160.00,47564.0,22.0,70643.0,2877.0,3.91,8.7
2,metropolitan,shelby county,AL,10.500051,medium,yellow,blue,213605.0,92.1,28.9,42.2,34709.33,74063.0,8.3,105804.0,3527.0,3.23,7.4
3,metropolitan,benton county,AR,9.600004,medium,yellow,blue,266300.0,87.5,26.1,31.7,30406.57,61271.0,10.5,128647.0,3849.0,2.90,10.3
4,metropolitan,faulkner county,AR,10.400071,medium,yellow,blue,123654.0,91.2,31.1,29.7,25076.32,50316.0,16.3,59253.0,2035.0,3.32,10.8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1031,nonmetropolitan,lincoln county,WY,9.200293,medium,yellow,red,19265.0,92.9,41.1,20.5,29580.38,63756.0,9.3,8402.0,325.0,3.72,11.8
1032,nonmetropolitan,sublette county,WY,10.401088,medium,yellow,red,9799.0,96.2,33.6,25.4,28810.97,84911.0,7.4,4066.0,186.0,4.37,13.5
1033,nonmetropolitan,uinta county,WY,10.300429,medium,yellow,red,20495.0,91.8,37.0,17.4,27048.12,54672.0,14.9,8727.0,411.0,4.50,13.5
1034,nonmetropolitan,washakie county,WY,10.700487,medium,yellow,red,8064.0,88.5,38.6,21.0,27494.83,51362.0,12.8,3947.0,168.0,4.08,16.8


In [196]:
#save as new file
all_medprev_county_stats_2017_df.to_csv('../data/all_medprev_county_stats_2017.csv', index=False)
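
The low, high, and medium merges above repeat the same call. As a possible refactor, assuming the same merge keys, the step could be wrapped in a small helper and applied in a loop. The helper name `merge_with_county_stats` and the toy frames are hypothetical, not part of the notebook.

```python
import pandas as pd

def merge_with_county_stats(prev_df, stats_df):
    """Inner-merge a prevalence frame with county stats on county and state."""
    return pd.merge(prev_df, stats_df,
                    on=['county_name', 'state_abbr'], how='inner')

# Toy stand-ins for counties_2017_df and two of the prevalence frames.
stats = pd.DataFrame({
    'county_name': ['autauga county', 'baldwin county'],
    'state_abbr': ['AL', 'AL'],
    'pop2017': [55504.0, 212628.0],
})
prev_frames = {
    'high': pd.DataFrame({'county_name': ['autauga county'],
                          'state_abbr': ['AL'], 'prev_level': ['high']}),
    'med': pd.DataFrame({'county_name': ['baldwin county'],
                         'state_abbr': ['AL'], 'prev_level': ['medium']}),
}

# Apply the same merge to every prevalence level.
merged = {level: merge_with_county_stats(df, stats)
          for level, df in prev_frames.items()}

for level, df in merged.items():
    print(level, df.shape)
```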

## For personal interest: statistics for my own county

In [194]:
#create bergen county filter
bergen_filter = all_county_stats_2017_df['county_name'] == 'bergen county'

#create bergen county data frame
bergen_df = all_county_stats_2017_df[bergen_filter]
bergen_df

Unnamed: 0,metro_nonmetro,county_name,state_abbr,db_prev,prev_level,prev_colorcode,metro_colorcode,pop2017,hs_grad_2017,some_college_2017,bachelors_2017,per_capita_income_2017,median_household_income_2017,poverty_2017,employed_2017,unemployed_2017,unemployment_rate_2017,uninsured_2017
675,metropolitan,bergen county,NJ,8.7,low,green,blue,948406.0,92.0,20.3,47.9,46158.76,91572.0,7.2,464527.0,18797.0,3.89,9.2
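
Filtering on `county_name` alone returns a single row here, but many county names repeat across states. A minimal sketch of a filter that also checks the state, using a toy frame (`toy_df` and `mask` are illustrative names):

```python
import pandas as pd

# Toy frame: 'washington county' appears in more than one state.
toy_df = pd.DataFrame({
    'county_name': ['bergen county', 'washington county', 'washington county'],
    'state_abbr': ['NJ', 'AL', 'OR'],
})

# Combine both conditions with & so that same-named counties
# in other states are excluded.
mask = (toy_df['county_name'] == 'washington county') & (toy_df['state_abbr'] == 'OR')
print(toy_df[mask])
```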


## Next Steps

I will compare diabetes prevalence with education level, income, employment, and insurance coverage to see whether any of them are correlated.

I will also check whether counties with low, medium, and high diabetes prevalence differ on those same factors.
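
As a rough preview of the first step, pairwise correlations can be computed with `DataFrame.corr()`. The sketch below uses five rows copied (rounded) from the merged output above, purely to show the call; five counties are far too few to draw any conclusion, and the later notebooks may take a different approach.

```python
import pandas as pd

# Five rows taken from the merged 2017 output above (db_prev rounded).
preview_df = pd.DataFrame({
    'county_name': ['autauga county', 'baldwin county', 'anchorage municipality',
                    'teton county', 'calhoun county'],
    'db_prev': [12.7, 10.3, 8.4, 2.4, 17.0],
    'poverty_2017': [13.7, 11.8, 8.1, 6.8, 18.6],
})

# Pearson correlation between the numeric columns.
corr_matrix = preview_df[['db_prev', 'poverty_2017']].corr()
print(corr_matrix)
```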