# Project 1

The purpose of this notebook is to prepare the DiscGolf and postcode_level_averages data for analysis.

P1_data_prep:  DiscGolf.csv was sourced from https://www.kaggle.com/datasets/lanekatris/pdga-united-states-disc-golf-courses

P1_data_prep:  postcode_level_averages.csv was sourced from https://www.kaggle.com/datasets/hamishgunasekara/average-income-per-zip-code-usa-2018

P1_data_prep2:  Location information (reference CSV from Jason's map) was sourced from 

P1_data_prep2: 2010_and_2014_county_population_density.csv County population density information was sourced from https://odn.data.socrata.com/dataset/2010-and-2014-county-population-density/mmzq-86sd

P1_data_prep2: Unemployment.csv Median household income information was sourced from https://www.ers.usda.gov/data-products/county-level-data-sets/


In [1]:
#Import dependencies
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
import json
from config import gkey
import geopy
from geopy.geocoders import Nominatim

# # Build the endpoint URL
# baseurl = 'https://maps.googleapis.com/maps/api/geocode/json?'

In [2]:
#read CSVs into Pandas DataFrames (read_csv already returns a DataFrame)
popdf = pd.read_csv('Resources/2010_and_2014_county_population_density.csv')
unemploymentdf = pd.read_csv('Resources/Unemployment.csv')

In [3]:
# view unemployment df format
unemploymentdf.head(2)

Unnamed: 0,FIPS_Code,State,Area_name,Attribute,Value
0,0,US,United States,Civilian_labor_force_2000,142601576.0
1,0,US,United States,Employed_2000,136904853.0


In [4]:
# create dataframe with only median household income from 2020
countyincomedf = unemploymentdf.loc[(unemploymentdf['Attribute']=='Median_Household_Income_2020')]
countyincomedf

Unnamed: 0,FIPS_Code,State,Area_name,Attribute,Value
88,0,US,United States,Median_Household_Income_2020,67340.0
177,1000,AL,Alabama,Median_Household_Income_2020,53958.0
270,1001,AL,"Autauga County, AL",Median_Household_Income_2020,67565.0
363,1003,AL,"Baldwin County, AL",Median_Household_Income_2020,71135.0
456,1005,AL,"Barbour County, AL",Median_Household_Income_2020,38866.0
...,...,...,...,...,...
296295,56037,WY,"Sweetwater County, WY",Median_Household_Income_2020,70583.0
296388,56039,WY,"Teton County, WY",Median_Household_Income_2020,92488.0
296481,56041,WY,"Uinta County, WY",Median_Household_Income_2020,71246.0
296574,56043,WY,"Washakie County, WY",Median_Household_Income_2020,58532.0


In [6]:
#split county and state info into two columns (if Area_name is a state or the country, Area_2 will be None)
countyincomedf[['Area_1','Area_2']]= countyincomedf['Area_name'].str.split(',', expand = True)
countyincomedf

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[k1] = value[k2]


Unnamed: 0,FIPS_Code,State,Area_name,Attribute,Value,Area_1,Area_2
88,0,US,United States,Median_Household_Income_2020,67340.0,United States,
177,1000,AL,Alabama,Median_Household_Income_2020,53958.0,Alabama,
270,1001,AL,"Autauga County, AL",Median_Household_Income_2020,67565.0,Autauga County,AL
363,1003,AL,"Baldwin County, AL",Median_Household_Income_2020,71135.0,Baldwin County,AL
456,1005,AL,"Barbour County, AL",Median_Household_Income_2020,38866.0,Barbour County,AL
...,...,...,...,...,...,...,...
296295,56037,WY,"Sweetwater County, WY",Median_Household_Income_2020,70583.0,Sweetwater County,WY
296388,56039,WY,"Teton County, WY",Median_Household_Income_2020,92488.0,Teton County,WY
296481,56041,WY,"Uinta County, WY",Median_Household_Income_2020,71246.0,Uinta County,WY
296574,56043,WY,"Washakie County, WY",Median_Household_Income_2020,58532.0,Washakie County,WY
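The SettingWithCopyWarning above appears because countyincomedf was created by filtering unemploymentdf, so pandas cannot tell whether the new columns are being written to a copy. A minimal sketch of one way to avoid it, assuming a small stand-in frame (sample_df and income are hypothetical names; the rows are copied from the output above):

```python
import pandas as pd

# Illustrative rows copied from the output above; sample_df is a hypothetical name.
sample_df = pd.DataFrame({
    "Area_name": ["United States", "Alabama", "Autauga County, AL"],
    "Attribute": ["Median_Household_Income_2020"] * 3,
    "Value": [67340.0, 53958.0, 67565.0],
})

# .copy() makes the filtered result an independent DataFrame,
# so later column assignments do not trigger SettingWithCopyWarning.
income = sample_df.loc[sample_df["Attribute"] == "Median_Household_Income_2020"].copy()

# n=1 splits on the first comma only, so the result has at most two columns.
income[["Area_1", "Area_2"]] = income["Area_name"].str.split(",", n=1, expand=True)

# Remove the space that followed the comma.
income["Area_2"] = income["Area_2"].str.strip()

print(income[["Area_name", "Area_1", "Area_2"]])
```

Rows without a comma (the country and state rows) end up with a missing Area_2, which is what the later dropna step relies on.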


In [84]:
# strip leading whitespace from Area_2
countyincomedf['Area_2'] = countyincomedf['Area_2'].str.strip(' ')

In [85]:
#rename value column to be more descriptive (median household income)
countyincomedf = countyincomedf.rename(columns = {'Value':'MedianHHIncome'})
countyincomedf

Unnamed: 0,FIPS_Code,State,Area_name,Attribute,MedianHHIncome,Area_1,Area_2
88,0,US,United States,Median_Household_Income_2020,67340.0,United States,
177,1000,AL,Alabama,Median_Household_Income_2020,53958.0,Alabama,
270,1001,AL,"Autauga County, AL",Median_Household_Income_2020,67565.0,Autauga County,AL
363,1003,AL,"Baldwin County, AL",Median_Household_Income_2020,71135.0,Baldwin County,AL
456,1005,AL,"Barbour County, AL",Median_Household_Income_2020,38866.0,Barbour County,AL
...,...,...,...,...,...,...,...
296295,56037,WY,"Sweetwater County, WY",Median_Household_Income_2020,70583.0,Sweetwater County,WY
296388,56039,WY,"Teton County, WY",Median_Household_Income_2020,92488.0,Teton County,WY
296481,56041,WY,"Uinta County, WY",Median_Household_Income_2020,71246.0,Uinta County,WY
296574,56043,WY,"Washakie County, WY",Median_Household_Income_2020,58532.0,Washakie County,WY


In [96]:
#dropna will remove the country and state rows (Area_2 is NaN) and any other rows with gaps
countyincomedf = countyincomedf.dropna()  
countyincomedf

Unnamed: 0,FIPS_Code,State,Area_name,Attribute,MedianHHIncome,Area_1,Area_2
270,1001,AL,"Autauga County, AL",Median_Household_Income_2020,67565.0,Autauga County,AL
363,1003,AL,"Baldwin County, AL",Median_Household_Income_2020,71135.0,Baldwin County,AL
456,1005,AL,"Barbour County, AL",Median_Household_Income_2020,38866.0,Barbour County,AL
549,1007,AL,"Bibb County, AL",Median_Household_Income_2020,50907.0,Bibb County,AL
642,1009,AL,"Blount County, AL",Median_Household_Income_2020,55203.0,Blount County,AL
...,...,...,...,...,...,...,...
296295,56037,WY,"Sweetwater County, WY",Median_Household_Income_2020,70583.0,Sweetwater County,WY
296388,56039,WY,"Teton County, WY",Median_Household_Income_2020,92488.0,Teton County,WY
296481,56041,WY,"Uinta County, WY",Median_Household_Income_2020,71246.0,Uinta County,WY
296574,56043,WY,"Washakie County, WY",Median_Household_Income_2020,58532.0,Washakie County,WY
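Note that dropna() with no arguments removes a row if any column is missing, not only Area_2. The sketch below contrasts that with restricting the check to one column; demo and its Notes column are hypothetical and exist only to show the difference.

```python
import pandas as pd

# Hypothetical rows: one state-level row (no Area_2) and one county row
# with a missing value in an unrelated column.
demo = pd.DataFrame({
    "Area_1": ["Alabama", "Autauga County", "Baldwin County"],
    "Area_2": [None, "AL", "AL"],
    "Notes": ["x", None, "y"],
})

drop_any = demo.dropna()                      # drops rows with NaN in ANY column
drop_subset = demo.dropna(subset=["Area_2"])  # drops only rows missing Area_2

print(len(drop_any), len(drop_subset))
```

If the intent is only to discard the country and state rows, the subset form may be the safer choice, since it keeps county rows that have gaps elsewhere.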


In [97]:
#write data to CSV file
countyincomedf.to_csv('Resources/CountyIncome.csv')

In [9]:
#check df
popdf.head(2)

Unnamed: 0,id,name,type,variable,year,value
0,1600000US2639360,"Houghton city, Michigan",place,land_area,2010,4.445
1,1600000US1220650,"El Portal village, Florida",place,land_area,2009,0.378


In [10]:
#pare down dataset to include only county density data for the latest year available (2018)
countydensdf = popdf.loc[(popdf['type']=='county') & (popdf['variable']=='density')&(popdf['year']==2018)]
countydensdf

Unnamed: 0,id,name,type,variable,year,value
859794,0500000US28151,"Washington County, Mississippi",county,density,2018,64.969
859797,0500000US28111,"Perry County, Mississippi",county,density,2018,18.583
859800,0500000US28019,"Choctaw County, Mississippi",county,density,2018,19.898
859803,0500000US28057,"Itawamba County, Mississippi",county,density,2018,44.070
859806,0500000US28015,"Carroll County, Mississippi",county,density,2018,16.123
...,...,...,...,...,...,...
869199,0500000US19043,"Clayton County, Iowa",county,density,2018,22.699
869202,0500000US19021,"Buena Vista County, Iowa",county,density,2018,35.240
869205,0500000US19077,"Guthrie County, Iowa",county,density,2018,18.073
869208,0500000US19091,"Humboldt County, Iowa",county,density,2018,22.024
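The year 2018 is hard-coded in the filter above. A small sketch of looking up the latest year from the data instead, assuming a made-up frame with the same column names as popdf (demo_pop and its values are illustrative only):

```python
import pandas as pd

# Hypothetical long-format rows mirroring the columns of popdf.
demo_pop = pd.DataFrame({
    "name": ["A County, Iowa", "A County, Iowa", "B city, Iowa"],
    "type": ["county", "county", "place"],
    "variable": ["density", "density", "density"],
    "year": [2017, 2018, 2018],
    "value": [22.5, 22.7, 300.0],
})

# Keep county-level density rows first, then find the most recent year among them.
county_density = demo_pop[(demo_pop["type"] == "county") & (demo_pop["variable"] == "density")]
latest_year = county_density["year"].max()
latest = county_density[county_density["year"] == latest_year].copy()

print(latest_year, len(latest))
```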


In [11]:
#split name into county and state columns
countydensdf['Area_1'] = countydensdf['name']
countydensdf[['Area_1','Area_2']]= countydensdf['Area_1'].str.split(',', expand = True)
countydensdf

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[k1] = value[k2]


Unnamed: 0,id,name,type,variable,year,value,Area_1,Area_2
859794,0500000US28151,"Washington County, Mississippi",county,density,2018,64.969,Washington County,Mississippi
859797,0500000US28111,"Perry County, Mississippi",county,density,2018,18.583,Perry County,Mississippi
859800,0500000US28019,"Choctaw County, Mississippi",county,density,2018,19.898,Choctaw County,Mississippi
859803,0500000US28057,"Itawamba County, Mississippi",county,density,2018,44.070,Itawamba County,Mississippi
859806,0500000US28015,"Carroll County, Mississippi",county,density,2018,16.123,Carroll County,Mississippi
...,...,...,...,...,...,...,...,...
869199,0500000US19043,"Clayton County, Iowa",county,density,2018,22.699,Clayton County,Iowa
869202,0500000US19021,"Buena Vista County, Iowa",county,density,2018,35.240,Buena Vista County,Iowa
869205,0500000US19077,"Guthrie County, Iowa",county,density,2018,18.073,Guthrie County,Iowa
869208,0500000US19091,"Humboldt County, Iowa",county,density,2018,22.024,Humboldt County,Iowa


In [89]:
# strip leading whitespace from Area_2
countydensdf['Area_2'] = countydensdf['Area_2'].str.strip(' ')

In [12]:
#rename value column to be more descriptive for population density 
countydensdf = countydensdf.rename(columns = {'value':'PopDensity'})
countydensdf

Unnamed: 0,id,name,type,variable,year,PopDensity,Area_1,Area_2
859794,0500000US28151,"Washington County, Mississippi",county,density,2018,64.969,Washington County,Mississippi
859797,0500000US28111,"Perry County, Mississippi",county,density,2018,18.583,Perry County,Mississippi
859800,0500000US28019,"Choctaw County, Mississippi",county,density,2018,19.898,Choctaw County,Mississippi
859803,0500000US28057,"Itawamba County, Mississippi",county,density,2018,44.070,Itawamba County,Mississippi
859806,0500000US28015,"Carroll County, Mississippi",county,density,2018,16.123,Carroll County,Mississippi
...,...,...,...,...,...,...,...,...
869199,0500000US19043,"Clayton County, Iowa",county,density,2018,22.699,Clayton County,Iowa
869202,0500000US19021,"Buena Vista County, Iowa",county,density,2018,35.240,Buena Vista County,Iowa
869205,0500000US19077,"Guthrie County, Iowa",county,density,2018,18.073,Guthrie County,Iowa
869208,0500000US19091,"Humboldt County, Iowa",county,density,2018,22.024,Humboldt County,Iowa


In [90]:
countydensdf.to_csv('Resources/CountyPopDensity.csv')

In [15]:
# #comment out so code isn't accidentally rerun - TAKES OVER AN HOUR TO COMPLETE CSV CREATION
# #read CSV and create Pandas DataFrame
# dgolf = pd.DataFrame(pd.read_csv('Resources/draft_dgolf_avincome.csv'))
# dgolf.head(2)

Unnamed: 0.1,Unnamed: 0,id,name,city,state_x,zip,holeCount,rating,latitude,longitude,Modified,Comments,state_y,zipcode,total_pop,total_income,country,avg_income
0,0,adventist-discovery-park,Adventist DISCovery Park,Opelika,Alabama,36804,3,,32.645412,-85.37828,no,none,AL,36804,8240,417346,USA,50648.786408
1,1,agape-disc-golf-course,Agape Disc Golf Course,Scottsboro,Alabama,35769,9,,34.622819,-86.080692,no,none,AL,35769,4170,316625,USA,75929.256595


In [72]:
# #comment out so code isn't accidentally rerun- TAKES OVER AN HOUR TO COMPLETE CSV CREATION
# dgolfall = dgolf.copy()
# dgolfall['city'] = dgolfall['city'].str.strip(' ')
# dgolfall['cityURL'] = dgolfall['city'].str.replace(' ','%20')
# dgolfall['county'] = pd.Series(dtype = 'string')
# dgolfall['latstr'] = dgolfall['latitude'].astype(str)
# dgolfall['lonstr'] = dgolfall['longitude'].astype(str)
# dgolfall.head(2)

Unnamed: 0.1,Unnamed: 0,id,name,city,state_x,zip,holeCount,rating,latitude,longitude,...,state_y,zipcode,total_pop,total_income,country,avg_income,cityURL,county,latstr,lonstr
0,0,adventist-discovery-park,Adventist DISCovery Park,Opelika,Alabama,36804,3,,32.645412,-85.37828,...,AL,36804,8240,417346,USA,50648.786408,Opelika,,32.6454116,-85.3782795
1,1,agape-disc-golf-course,Agape Disc Golf Course,Scottsboro,Alabama,35769,9,,34.622819,-86.080692,...,AL,35769,4170,316625,USA,75929.256595,Scottsboro,,34.6228192,-86.0806919


In [74]:
# #comment out so code isn't accidentally rerun- TAKES OVER AN HOUR TO COMPLETE CSV CREATION
# https://geopy.readthedocs.io/en/stable/
# https://www.geeksforgeeks.org/get-the-city-state-and-country-names-from-latitude-and-longitude-using-python/

# #  initialize Nominatim API
# geolocator = Nominatim(user_agent="geoapiExercises")

# for index, row in dgolfall.iterrows():
#     # get latitude, longitude from the DataFrame

#     rowlat = row['latstr']
#     rowlon = row['lonstr']
#     rowloc = geolocator.reverse(rowlat+","+rowlon)
#     rowaddress = rowloc.raw['address']
#     rowcounty = rowaddress.get('county','')
    
#     try:
#         dgolfall.loc[index, "county"] = rowcounty
#     except (KeyError, IndexError):
#         # If no county is found, set the county as no county found".
#         dgolfall.loc[index, "county"] = "No county found"

# dgolfall

Unnamed: 0.1,Unnamed: 0,id,name,city,state_x,zip,holeCount,rating,latitude,longitude,...,state_y,zipcode,total_pop,total_income,country,avg_income,cityURL,county,latstr,lonstr
0,0,adventist-discovery-park,Adventist DISCovery Park,Opelika,Alabama,36804,3,,32.645412,-85.378280,...,AL,36804,8240,417346,USA,50648.786408,Opelika,Lee County,32.6454116,-85.3782795
1,1,agape-disc-golf-course,Agape Disc Golf Course,Scottsboro,Alabama,35769,9,,34.622819,-86.080692,...,AL,35769,4170,316625,USA,75929.256595,Scottsboro,Jackson County,34.6228192,-86.0806919
2,2,aggieland-disc-golf,Aggieland Disc Golf,Hamilton,Alabama,35570,18,,34.142324,-87.988644,...,AL,35570,3930,174684,USA,44448.854962,Hamilton,Marion County,34.1423235,-87.98864379999998
3,3,agricultural-heritage-park,Agricultural Heritage Park,Auburn,Alabama,36830,9,,32.594459,-85.492334,...,AL,36830,16600,1485585,USA,89493.072289,Auburn,Lee County,32.5944595,-85.4923344
4,4,arab-city-park-844-shoal-creek-trail-35016,Arab City Park,Arab,Alabama,35016,18,,34.317835,-86.481828,...,AL,35016,7210,385033,USA,53402.635229,Arab,Marshall County,34.31783529999999,-86.48182779999998
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6765,6765,whippin-post,Whippin' Post,Great Cacapon,West Virginia,25422,18,4.0,39.571376,-78.403217,...,WV,25422,670,34481,USA,51464.179104,Great%20Cacapon,Morgan County,39.5713757,-78.4032172
6766,6766,twin-oaks-disc-golf-course,Twin Oaks Disc Golf Course,Crab Orchard,West Virginia,25827,9,,37.740256,-81.254532,...,WV,25827,1210,61560,USA,50876.033058,Crab%20Orchard,Raleigh County,37.740256,-81.254532
6767,6767,valley-park-dgc-0,Valley Park Disc Golf Course,Hurricane,West Virginia,25526,18,,38.443878,-81.994517,...,WV,25526,9880,712514,USA,72116.801619,Hurricane,Putnam County,38.4438782,-81.9945167
6768,6768,veterans-memorial-park-0,Veterans Memorial Park,Clarksburg,West Virginia,26301,9,3.0,39.272521,-80.360143,...,WV,26301,12350,627014,USA,50770.364372,Clarksburg,Harrison County,39.2725211,-80.3601434
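In the commented-out loop above, the try/except wraps only the assignment, while the calls that can actually fail are the lookup itself and the access to its result. The following is a sketch of a more defensive version, not the code that produced the output above. It assumes the reverse geocoder is passed in as a callable, so the helper names county_from_location and add_counties are hypothetical, and the geopy wiring is left in comments because it needs network access.

```python
import pandas as pd

def county_from_location(location):
    """Return the county from a geopy-style Location, or None if unavailable."""
    if location is None:          # reverse() can return None when nothing is found
        return None
    return location.raw.get("address", {}).get("county")

def add_counties(df, reverse):
    """Fill a 'county' column by calling reverse((lat, lon)) for each row."""
    out = df.copy()
    out["county"] = [
        county_from_location(reverse((lat, lon)))
        for lat, lon in zip(out["latitude"], out["longitude"])
    ]
    return out

# Assumed wiring with geopy (not run here, needs network access):
#   from geopy.geocoders import Nominatim
#   from geopy.extra.rate_limiter import RateLimiter
#   geolocator = Nominatim(user_agent="my-app-name")   # hypothetical agent name
#   reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)
#   dgolfall = add_counties(dgolfall, reverse)
```

Passing the coordinates as a (latitude, longitude) tuple also removes the need for the latstr and lonstr helper columns.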


In [76]:
# #comment out so code isn't accidentally rerun- TAKES OVER AN HOUR TO COMPLETE CSV CREATION
# dgolfall = dgolfall.drop(['Unnamed: 0'],axis = 1)
# dgolfall

Unnamed: 0,id,name,city,state_x,zip,holeCount,rating,latitude,longitude,Modified,...,state_y,zipcode,total_pop,total_income,country,avg_income,cityURL,county,latstr,lonstr
0,adventist-discovery-park,Adventist DISCovery Park,Opelika,Alabama,36804,3,,32.645412,-85.378280,no,...,AL,36804,8240,417346,USA,50648.786408,Opelika,Lee County,32.6454116,-85.3782795
1,agape-disc-golf-course,Agape Disc Golf Course,Scottsboro,Alabama,35769,9,,34.622819,-86.080692,no,...,AL,35769,4170,316625,USA,75929.256595,Scottsboro,Jackson County,34.6228192,-86.0806919
2,aggieland-disc-golf,Aggieland Disc Golf,Hamilton,Alabama,35570,18,,34.142324,-87.988644,no,...,AL,35570,3930,174684,USA,44448.854962,Hamilton,Marion County,34.1423235,-87.98864379999998
3,agricultural-heritage-park,Agricultural Heritage Park,Auburn,Alabama,36830,9,,32.594459,-85.492334,no,...,AL,36830,16600,1485585,USA,89493.072289,Auburn,Lee County,32.5944595,-85.4923344
4,arab-city-park-844-shoal-creek-trail-35016,Arab City Park,Arab,Alabama,35016,18,,34.317835,-86.481828,no,...,AL,35016,7210,385033,USA,53402.635229,Arab,Marshall County,34.31783529999999,-86.48182779999998
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6765,whippin-post,Whippin' Post,Great Cacapon,West Virginia,25422,18,4.0,39.571376,-78.403217,no,...,WV,25422,670,34481,USA,51464.179104,Great%20Cacapon,Morgan County,39.5713757,-78.4032172
6766,twin-oaks-disc-golf-course,Twin Oaks Disc Golf Course,Crab Orchard,West Virginia,25827,9,,37.740256,-81.254532,no,...,WV,25827,1210,61560,USA,50876.033058,Crab%20Orchard,Raleigh County,37.740256,-81.254532
6767,valley-park-dgc-0,Valley Park Disc Golf Course,Hurricane,West Virginia,25526,18,,38.443878,-81.994517,no,...,WV,25526,9880,712514,USA,72116.801619,Hurricane,Putnam County,38.4438782,-81.9945167
6768,veterans-memorial-park-0,Veterans Memorial Park,Clarksburg,West Virginia,26301,9,3.0,39.272521,-80.360143,no,...,WV,26301,12350,627014,USA,50770.364372,Clarksburg,Harrison County,39.2725211,-80.3601434


In [77]:
#cells above are commented out so code isn't accidentally rerun- TAKES OVER AN HOUR TO COMPLETE CSV CREATION
#export dgolfall to csv
dgolfall.to_csv('Resources/dgolfall.csv',index = False)

In [82]:
#read dgolfall.csv; index_col=0 is needed, otherwise the dataframe ends up with an additional column "Unnamed: 0"

dgolfdf = pd.read_csv('Resources/dgolfall.csv',index_col=0)
dgolfdf.head(2)

Unnamed: 0,id,name,city,state_x,zip,holeCount,rating,latitude,longitude,Modified,...,state_y,zipcode,total_pop,total_income,country,avg_income,cityURL,county,latstr,lonstr
0,adventist-discovery-park,Adventist DISCovery Park,Opelika,Alabama,36804,3,,32.645412,-85.37828,no,...,AL,36804,8240,417346,USA,50648.786408,Opelika,Lee County,32.645412,-85.37828
1,agape-disc-golf-course,Agape Disc Golf Course,Scottsboro,Alabama,35769,9,,34.622819,-86.080692,no,...,AL,35769,4170,316625,USA,75929.256595,Scottsboro,Jackson County,34.622819,-86.080692
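The "Unnamed: 0" column comes from writing the index to the CSV and then reading it back as ordinary data. A self-contained sketch of the three combinations, using an in-memory buffer and a hypothetical two-row frame (demo) instead of the real files:

```python
import io
import pandas as pd

demo = pd.DataFrame({"id": ["a", "b"], "holeCount": [3, 9]})

# Case 1: index written, then read back without index_col -> extra column.
buf = io.StringIO()
demo.to_csv(buf)
buf.seek(0)
with_extra = pd.read_csv(buf)

# Case 2: index written, read back with index_col=0 -> no extra column.
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)

# Case 3: index not written at all.
buf2 = io.StringIO()
demo.to_csv(buf2, index=False)
buf2.seek(0)
no_index = pd.read_csv(buf2)

print(list(with_extra.columns))
print(list(restored.columns))
print(list(no_index.columns))
```

One caveat: if a file was written with index=False, reading it with index_col=0 turns the first data column into the index, so the two options are best used as alternatives rather than together.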


In [99]:
dgolfdf['county_clean'] = pd.Series(dtype = 'string')
dgolfdf.head(2)


Unnamed: 0,id,name,city,state_x,zip,holeCount,rating,latitude,longitude,Modified,...,zipcode,total_pop,total_income,country,avg_income,cityURL,county,latstr,lonstr,county_clean
0,adventist-discovery-park,Adventist DISCovery Park,Opelika,Alabama,36804,3,,32.645412,-85.37828,no,...,36804,8240,417346,USA,50648.786408,Opelika,Lee County,32.645412,-85.37828,
1,agape-disc-golf-course,Agape Disc Golf Course,Scottsboro,Alabama,35769,9,,34.622819,-86.080692,no,...,35769,4170,316625,USA,75929.256595,Scottsboro,Jackson County,34.622819,-86.080692,


In [104]:
#UPDATED the CSV file (dgolfdf.csv) to replace Saint with St. in the county_clean column
# for index, row in dgolfdf.iterrows():
#     # get latitude, longitude from the DataFrame
#     rowcounty = row['county']
#     if'Saint' not in rowcounty and "St." in rowcounty:
#         rowcounty.replace('St.','Saint')
#         row['county_clean'] = rowcounty
#     else:
#         row['county_clean'] = row['county']

# dgolfdf

TypeError: argument of type 'float' is not iterable
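The TypeError comes from rows where county is NaN (a float), so the `in` test has no string to search. The loop has two further issues: str.replace returns a new string that is never stored, and assigning to row inside iterrows does not write back to the DataFrame. A vectorized sketch that sidesteps all three is shown below. The sample rows are hypothetical, and the direction of the replacement ("St." to "Saint") follows the commented-out code rather than the comment, so it is an assumption; swap the arguments for the other direction.

```python
import pandas as pd

# Hypothetical sample; the NaN row mimics a course with no county returned.
demo_counties = pd.DataFrame({
    "county": ["St. Clair County", "Saint Louis County", "Lee County", float("nan")]
})

# Assumption: the target spelling is "Saint".
# regex=False treats "St." literally, so the dot is not a wildcard.
# Missing values pass through .str methods unchanged instead of raising.
demo_counties["county_clean"] = demo_counties["county"].str.replace(
    "St.", "Saint", regex=False
)

print(demo_counties)
```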

In [105]:
# dgolfdf.to_csv('Resources/dgolfdf.csv',index = False)

In [109]:
dgolfdf2 = pd.read_csv('Resources/dgolfdf.csv')
dgolfdf2.head(2)

Unnamed: 0,id,name,city,state_x,zip,holeCount,rating,latitude,longitude,Modified,...,zipcode,total_pop,total_income,country,avg_income,cityURL,county,latstr,lonstr,county_clean
0,adventist-discovery-park,Adventist DISCovery Park,Opelika,Alabama,36804,3,,32.645412,-85.37828,no,...,36804,8240,417346,USA,50648.78641,Opelika,Lee County,32.645412,-85.37828,Lee County
1,agape-disc-golf-course,Agape Disc Golf Course,Scottsboro,Alabama,35769,9,,34.622819,-86.080692,no,...,35769,4170,316625,USA,75929.25659,Scottsboro,Jackson County,34.622819,-86.080692,Jackson County


In [121]:

dgolfstr = dgolfdf2.copy()
# dgolfstr['state_x'] = dgolfstr['state_x'].astype(str)
# dgolfstr.dtypes
dgolfstr['county state_x'] = dgolfstr['county'] + ', '+ dgolfstr['state_x']
dgolfstr['county state_y'] = dgolfstr['county'] + ', '+ dgolfstr['state_y']
dgolfstr.head(2)

Unnamed: 0,id,name,city,state_x,zip,holeCount,rating,latitude,longitude,Modified,...,total_income,country,avg_income,cityURL,county,latstr,lonstr,county_clean,county state_x,county state_y
0,adventist-discovery-park,Adventist DISCovery Park,Opelika,Alabama,36804,3,,32.645412,-85.37828,no,...,417346,USA,50648.78641,Opelika,Lee County,32.645412,-85.37828,Lee County,"Lee County, Alabama","Lee County, AL"
1,agape-disc-golf-course,Agape Disc Golf Course,Scottsboro,Alabama,35769,9,,34.622819,-86.080692,no,...,316625,USA,75929.25659,Scottsboro,Jackson County,34.622819,-86.080692,Jackson County,"Jackson County, Alabama","Jackson County, AL"


In [118]:
#merge this df using county state_x
countydensdf.head(2)

Unnamed: 0,id,name,type,variable,year,PopDensity,Area_1,Area_2
859794,0500000US28151,"Washington County, Mississippi",county,density,2018,64.969,Washington County,Mississippi
859797,0500000US28111,"Perry County, Mississippi",county,density,2018,18.583,Perry County,Mississippi


In [127]:
#merge dgolfstr and countydensdf on county and full state name
dgolfCounty1 = pd.merge(dgolfstr,countydensdf,how = 'inner',left_on='county state_x',right_on='name')
dgolfCounty1.head()

Unnamed: 0,id_x,name_x,city,state_x,zip,holeCount,rating,latitude,longitude,Modified,...,county state_x,county state_y,id_y,name_y,type,variable,year,PopDensity,Area_1,Area_2
0,adventist-discovery-park,Adventist DISCovery Park,Opelika,Alabama,36804,3,,32.645412,-85.37828,no,...,"Lee County, Alabama","Lee County, AL",0500000US01081,"Lee County, Alabama",county,density,2018,262.185,Lee County,Alabama
1,agricultural-heritage-park,Agricultural Heritage Park,Auburn,Alabama,36830,9,,32.594459,-85.492334,no,...,"Lee County, Alabama","Lee County, AL",0500000US01081,"Lee County, Alabama",county,density,2018,262.185,Lee County,Alabama
2,halawaka-park,Halawaka Park,Salem,Alabama,36874,18,3.0,32.596803,-85.238553,no,...,"Lee County, Alabama","Lee County, AL",0500000US01081,"Lee County, Alabama",county,density,2018,262.185,Lee County,Alabama
3,smiths-station-disc-golf-course,Smiths Station Disc Golf Course,Smiths Station,Alabama,36877,18,,32.521926,-85.117811,no,...,"Lee County, Alabama","Lee County, AL",0500000US01081,"Lee County, Alabama",county,density,2018,262.185,Lee County,Alabama
4,tumble-tree-disc-golf-course,Tumble Tree Disc Golf Course,Opelika,Alabama,36801,18,4.0,32.675108,-85.349505,no,...,"Lee County, Alabama","Lee County, AL",0500000US01081,"Lee County, Alabama",county,density,2018,262.185,Lee County,Alabama
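An inner merge silently drops courses whose county string has no exact match (for example a spelling difference or a missing county). One possible way to see what would be lost is a left merge with an indicator column. This is a sketch on hypothetical miniature tables (courses and density), and the second PopDensity value is made up for illustration.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables.
courses = pd.DataFrame({
    "name": ["Course A", "Course B", "Course C"],
    "county state_x": ["Lee County, Alabama", "Saint Clair County, Alabama", None],
})
density = pd.DataFrame({
    "name": ["Lee County, Alabama", "St. Clair County, Alabama"],
    "PopDensity": [262.185, 140.0],  # second value is illustrative only
})

check = pd.merge(
    courses, density,
    how="left",
    left_on="county state_x", right_on="name",
    suffixes=("_course", "_county"),
    validate="many_to_one",   # raises if a county appears twice on the right
    indicator=True,           # adds a _merge column: 'both' or 'left_only'
)

# Courses that an inner merge would have dropped.
unmatched = check[check["_merge"] == "left_only"]
print(unmatched[["name_course", "county state_x"]])
```

Inspecting the unmatched rows before switching back to how='inner' makes it easier to judge how many courses are excluded from the final dataset.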


In [119]:
# merge this df using county state_y
countyincomedf.head(2)

Unnamed: 0,FIPS_Code,State,Area_name,Attribute,MedianHHIncome,Area_1,Area_2
270,1001,AL,"Autauga County, AL",Median_Household_Income_2020,67565.0,Autauga County,AL
363,1003,AL,"Baldwin County, AL",Median_Household_Income_2020,71135.0,Baldwin County,AL


In [140]:
dgolfCounty2 = pd.merge(dgolfCounty1,countyincomedf,how = 'inner',left_on='county state_y',right_on='Area_name')
dgolfCounty2.head(2)

Unnamed: 0,id_x,name_x,city,state_x,zip,holeCount,rating,latitude,longitude,Modified,...,PopDensity,Area_1_x,Area_2_x,FIPS_Code,State,Area_name,Attribute,MedianHHIncome,Area_1_y,Area_2_y
0,adventist-discovery-park,Adventist DISCovery Park,Opelika,Alabama,36804,3,,32.645412,-85.37828,no,...,262.185,Lee County,Alabama,1081,AL,"Lee County, AL",Median_Household_Income_2020,58963.0,Lee County,AL
1,agricultural-heritage-park,Agricultural Heritage Park,Auburn,Alabama,36830,9,,32.594459,-85.492334,no,...,262.185,Lee County,Alabama,1081,AL,"Lee County, AL",Median_Household_Income_2020,58963.0,Lee County,AL


In [139]:
dgolfCounty2.columns


Index(['id_x', 'name_x', 'city', 'state_x', 'zip', 'holeCount', 'rating',
       'latitude', 'longitude', 'Modified', 'Comments', 'state_y', 'zipcode',
       'total_pop', 'total_income', 'country', 'avg_income', 'cityURL',
       'county', 'latstr', 'lonstr', 'county_clean', 'county state_x',
       'county state_y', 'id_y', 'name_y', 'type', 'variable', 'year',
       'PopDensity', 'Area_1_x', 'Area_2_x', 'FIPS_Code', 'State', 'Area_name',
       'Attribute', 'MedianHHIncome', 'Area_1_y', 'Area_2_y'],
      dtype='object')

In [141]:
dgolfcounty = dgolfCounty2[['id_x', 'name_x', 'city','county','Area_name', 'State','country', 'holeCount', 'rating','latitude', 'longitude', 'zipcode','PopDensity','total_pop', 'total_income', 'avg_income','MedianHHIncome']]
dgolfcounty.head(2)

Unnamed: 0,id_x,name_x,city,county,Area_name,State,country,holeCount,rating,latitude,longitude,zipcode,PopDensity,total_pop,total_income,avg_income,MedianHHIncome
0,adventist-discovery-park,Adventist DISCovery Park,Opelika,Lee County,"Lee County, AL",AL,USA,3,,32.645412,-85.37828,36804,262.185,8240,417346,50648.78641,58963.0
1,agricultural-heritage-park,Agricultural Heritage Park,Auburn,Lee County,"Lee County, AL",AL,USA,9,,32.594459,-85.492334,36830,262.185,16600,1485585,89493.07229,58963.0


In [142]:
dgolfcounty.to_csv('Resources/dgolfcounty.csv',index = False)

In [143]:

dgc = pd.read_csv('Resources/dgolfcounty.csv')
dgc.head()

Unnamed: 0,id_x,name_x,city,county,Area_name,State,country,holeCount,rating,latitude,longitude,zipcode,PopDensity,total_pop,total_income,avg_income,MedianHHIncome
0,adventist-discovery-park,Adventist DISCovery Park,Opelika,Lee County,"Lee County, AL",AL,USA,3,,32.645412,-85.37828,36804,262.185,8240,417346,50648.78641,58963.0
1,agricultural-heritage-park,Agricultural Heritage Park,Auburn,Lee County,"Lee County, AL",AL,USA,9,,32.594459,-85.492334,36830,262.185,16600,1485585,89493.07229,58963.0
2,halawaka-park,Halawaka Park,Salem,Lee County,"Lee County, AL",AL,USA,18,3.0,32.596803,-85.238553,36874,262.185,3660,219136,59873.22404,58963.0
3,smiths-station-disc-golf-course,Smiths Station Disc Golf Course,Smiths Station,Lee County,"Lee County, AL",AL,USA,18,,32.521926,-85.117811,36877,262.185,5060,280151,55365.81028,58963.0
4,tumble-tree-disc-golf-course,Tumble Tree Disc Golf Course,Opelika,Lee County,"Lee County, AL",AL,USA,18,4.0,32.675108,-85.349505,36801,262.185,9790,559225,57122.06333,58963.0


# create summary chart by county in P1_data_analysis_chris2.ipynb

FOR REFERENCE ONLY - code snippets and data pulls 

In [38]:
#keep this commented out as a reference for what you can pull from geopy by latitude/longitude
#https://geopy.readthedocs.io/en/stable/#usage-with-pandas
# # trythis = location.raw
# trythis

{'place_id': 214458217,
 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
 'osm_type': 'way',
 'osm_id': 482881497,
 'lat': '32.64581435',
 'lon': '-85.37933547380712',
 'display_name': 'Lee County Courthouse, South 9th Street, Opelika, Lee County, Alabama, 36801, United States',
 'address': {'amenity': 'Lee County Courthouse',
  'road': 'South 9th Street',
  'town': 'Opelika',
  'county': 'Lee County',
  'state': 'Alabama',
  'ISO3166-2-lvl4': 'US-AL',
  'postcode': '36801',
  'country': 'United States',
  'country_code': 'us'},
 'boundingbox': ['32.645538', '32.646093', '-85.3797587', '-85.3790976']}

In [56]:
# keep this commented out - reference for how to write a string variable to a new dataframe column, then change it to a float, then round
# df['latreturn'] = latreturn
# df['lngreturn'] = lngreturn
# df['latreturn'] = df['latreturn'].astype(float)
# df['lngreturn'] = df['lngreturn'].astype(float)
# df['latreturn'] = df['latreturn'].round(6)
# df['lngreturn'] = df['lngreturn'].round(6)
# df

Unnamed: 0,amenity,road,town,county,state,ISO3166-2-lvl4,postcode,country,country_code,latreturn,lngreturn
0,Lee County Courthouse,South 9th Street,Opelika,Lee County,Alabama,US-AL,36801,United States,us,32.645814,-85.379335


In [None]:
# Create DraftMerge - inner join DiscGolfcleandf and zipcodedf
#nice reference https://www.shanelynn.ie/merge-join-dataframes-python-pandas-index-1/
# draft_dgolf_avincome = pd.merge(discgolfcleandf,zipcodedf,how = 'inner',left_on='zip',right_on='zipcode')
# draft_dgolf_avincome.head()
