# New COVID-19 Testing Center for Charlotte

In this project, data from the North Carolina Department of Health and Human Services and FourSquare API will be used to determine which neighborhood should an COVID-19 testing center that may not have one currently.  This decision will be based on clusters of COVID-19 cases within Charlotte based on positivity rates and neighborhoods and current locations of testing centers.

Most of the data is Http data and will need to be scraped using Beautiful Soup.  

## Description of Imported Data

The data that will be used in this project comes from:

 - North Carolina Department of Health and Human Services -->COVID-19 Data--> About the Data-->Zip Code Cases and Death Table https://covid19.ncdhhs.gov/dashboard/about-data
 - North Carolina Department of Health and Human Services COVID-19 --> Test Site Finder https://covid19.ncdhhs.gov/about-covid-19/testing/find-my-testing-place/test-site-finder
 - FourSquare Locator API Data
 - Public OpenDataSoft.com https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/?rows=77

## Libraries Needed for this Project

The libraries that will be used for this project are:

 - Pandas
 - Numpy
 - JSON
 - Geopy
 - Requests
 - Matplotlib
 - Sci-kit
 - Folium

Begin with importing libraries

In [1]:
import numpy as np

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json

from geopy.geocoders import Nominatim

import requests
from pandas.io.json import json_normalize

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

!pip install folium
import folium

print('Libraries imported')

Collecting folium
  Downloading folium-0.11.0-py2.py3-none-any.whl (93 kB)
[K     |████████████████████████████████| 93 kB 2.4 MB/s  eta 0:00:01
Collecting branca>=0.3.0
  Downloading branca-0.4.1-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0
Libraries imported


Import Data from North Carolina Department of Health and Human Services
 - COVID Cases and Deaths by Zip Code
 - Existing Testing Centers
 

In [2]:
##Added missing values function due to my initialinspection of data prior to importing, I knew data would be missing##
missing_values = ["n/a", "na", "--"]
df=pd.read_csv(r'https://raw.githubusercontent.com/kimberlycoxfisher0330/Coursera_Capstone/main/TABLE_ZIPCODE_data.csv',na_values = missing_values)
df

Unnamed: 0,ZIP Code,Measure Names,Measure Values
0,28909,Deaths,0.0
1,28906,Deaths,9.0
2,28905,Deaths,1.0
3,28904,Deaths,5.0
4,28902,Deaths,0.0
5,28901,Deaths,5.0
6,28806,Deaths,34.0
7,28805,Deaths,14.0
8,28804,Deaths,9.0
9,28803,Deaths,16.0


Begin data scrubbing by inspection

In [3]:
df.dtypes

ZIP Code            int64
Measure Names      object
Measure Values    float64
dtype: object

In [4]:
df.describe()

Unnamed: 0,ZIP Code,Measure Values
count,3116.0,3094.0
mean,28064.228498,1138.680995
std,517.96007,1760.3416
min,27006.0,0.0
25%,27608.0,13.0
50%,28110.0,325.0
75%,28526.0,1539.75
max,28909.0,15468.0


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3116 entries, 0 to 3115
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   ZIP Code        3116 non-null   int64  
 1   Measure Names   3116 non-null   object 
 2   Measure Values  3094 non-null   float64
dtypes: float64(1), int64(1), object(1)
memory usage: 73.2+ KB


In [6]:
df.isnull().sum()

ZIP Code           0
Measure Names      0
Measure Values    22
dtype: int64

As there is data missing in some rows, will proceed with deleting those rows from analysis

In [7]:
df1 = df.dropna(axis=0, subset=['Measure Values'])
df1

Unnamed: 0,ZIP Code,Measure Names,Measure Values
0,28909,Deaths,0.0
1,28906,Deaths,9.0
2,28905,Deaths,1.0
3,28904,Deaths,5.0
4,28902,Deaths,0.0
5,28901,Deaths,5.0
6,28806,Deaths,34.0
7,28805,Deaths,14.0
8,28804,Deaths,9.0
9,28803,Deaths,16.0


In [8]:
col=np.array(df['Measure Values'], np.int16)
df['Measure Values']=col
df

Unnamed: 0,ZIP Code,Measure Names,Measure Values
0,28909,Deaths,0
1,28906,Deaths,9
2,28905,Deaths,1
3,28904,Deaths,5
4,28902,Deaths,0
5,28901,Deaths,5
6,28806,Deaths,34
7,28805,Deaths,14
8,28804,Deaths,9
9,28803,Deaths,16


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3116 entries, 0 to 3115
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   ZIP Code        3116 non-null   int64 
 1   Measure Names   3116 non-null   object
 2   Measure Values  3116 non-null   int16 
dtypes: int16(1), int64(1), object(1)
memory usage: 54.9+ KB


In [10]:
df['ZIP Code']=df['ZIP Code'].astype('str')
df

Unnamed: 0,ZIP Code,Measure Names,Measure Values
0,28909,Deaths,0
1,28906,Deaths,9
2,28905,Deaths,1
3,28904,Deaths,5
4,28902,Deaths,0
5,28901,Deaths,5
6,28806,Deaths,34
7,28805,Deaths,14
8,28804,Deaths,9
9,28803,Deaths,16


In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3116 entries, 0 to 3115
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   ZIP Code        3116 non-null   object
 1   Measure Names   3116 non-null   object
 2   Measure Values  3116 non-null   int16 
dtypes: int16(1), object(2)
memory usage: 54.9+ KB


Transformed dataframe to create independent columns from the "Measure Names" column

In [12]:
df1 = df.set_index(['ZIP Code','Measure Names'])['Measure Values'].unstack()
df1.reset_index(inplace = True)
df1

Measure Names,ZIP Code,Cases,"Cases Per 10,000 Residents","Cases Per 100,000 Residents",Deaths
0,27006,447,301,3006,3
1,27007,79,370,3699,0
2,27009,85,289,2894,1
3,27011,244,451,4509,4
4,27012,1128,390,3896,19
5,27013,214,329,3295,4
6,27014,27,282,2821,0
7,27016,58,345,3448,0
8,27017,497,523,5232,3
9,27018,318,387,3868,3


In [13]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 779 entries, 0 to 778
Data columns (total 5 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   ZIP Code                     779 non-null    object
 1   Cases                        779 non-null    int16 
 2   Cases Per 10,000 Residents   779 non-null    int16 
 3   Cases Per 100,000 Residents  779 non-null    int16 
 4   Deaths                       779 non-null    int16 
dtypes: int16(4), object(1)
memory usage: 12.3+ KB


In [14]:
df1.rename_axis(columns = {'Measure Names':'Row'}, inplace = True)
df1

Row,ZIP Code,Cases,"Cases Per 10,000 Residents","Cases Per 100,000 Residents",Deaths
0,27006,447,301,3006,3
1,27007,79,370,3699,0
2,27009,85,289,2894,1
3,27011,244,451,4509,4
4,27012,1128,390,3896,19
5,27013,214,329,3295,4
6,27014,27,282,2821,0
7,27016,58,345,3448,0
8,27017,497,523,5232,3
9,27018,318,387,3868,3


The Zip Code Cases and Death Dataframe is ready for analysis now

Next step will import testing center table from the North Carolina Department of Health and Human Services into Python using Beautiful Soup

In [15]:
import requests
import urllib.request
import time
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
from urllib.request import urlopen
url = 'https://covid19.ncdhhs.gov/about-covid-19/testing/find-my-testing-place/pop-testing-sites'
html = urlopen(url) 
soup = BeautifulSoup(html, 'html.parser')
req = requests.get("https://covid19.ncdhhs.gov/about-covid-19/testing/find-my-testing-place/pop-testing-sites")

soup = BeautifulSoup(req.content,'lxml')

table = soup.find_all('table')[0]

df = pd.read_html(str(table))

tc=pd.DataFrame(df[0])

In [16]:
tc

Unnamed: 0,Date,City,Name,County,Time,DHHS Event?,Site Tests Children?,Address (click for Google Maps result),Contact Info
0,12/24/2020,Burlington,OptumServe - Fairchild Community Center,Alamance,8am-2pm,Yes,Yes,"827 S Graham Hopedale Rd, Burlington, NC 27217",(877) 562-4850; Register Online
1,12/24/2020,Pelham,OptumServe - Pelham Community Center,Caswell,8am-1pm,Yes,Yes,"161 Community Center Rd, Pelham, NC 27311",(877) 562-4850; Register Online
2,12/24/2020,Yanceyville,OptumServe - Caswell County Health Department,Caswell,8am-1pm,Yes,Yes,"189 County Park Rd, Yanceyville, NC 27379",(877) 562-4850; Register Online
3,12/24/2020,Edenton,OptumServe - American Legion,Chowan,10am-2pm,Yes,Yes,"1317 W Queen St, Edenton, NC 27932",(877) 562-4850; Register Online
4,12/24/2020,Greensboro,StarMed - Coliseum,Guilford,3-7pm,Yes,Yes,"1921 W Gate City Blvd, Greensboro, NC 27405",(704) 615-7754; Register online
5,12/24/2020,Mooresville,StarMed - Mooresville High School,Iredell,1-5pm,Yes,Yes,"659 E Center Ave, Mooresville, NC 28115",(704) 615-7754; Register online
6,12/26/2020,Burlington,OptumServe - Fairchild Community Center,Alamance,9am-6pm,Yes,Yes,"827 S Graham Hopedale Rd, Burlington, NC 27217",(877) 562-4850; Register Online
7,12/26/2020,Newland,OptumServe - Newland Pool Complex Parking lot,Avery,9am-6pm,Yes,Yes,"244 Shady St, Newland, NC 28657",(877) 562-4850; Register Online
8,12/26/2020,Yanceyville,OptumServe - Caswell County Health Department,Caswell,8am-5pm,Yes,Yes,"189 County Park Rd, Yanceyville, NC 27379",(877) 562-4850; Register Online
9,12/26/2020,Edenton,OptumServe - American Legion,Chowan,10am-2pm,Yes,Yes,"1317 W Queen St, Edenton, NC 27932",(877) 562-4850; Register Online


Filter "Zip Code Cases and Death Table" by Charlotte Zip Codes and "Testing Center Table" using Charlotte 

In [17]:
tc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 240 entries, 0 to 239
Data columns (total 9 columns):
 #   Column                                  Non-Null Count  Dtype 
---  ------                                  --------------  ----- 
 0   Date                                    240 non-null    object
 1   City                                    240 non-null    object
 2   Name                                    240 non-null    object
 3   County                                  240 non-null    object
 4   Time                                    240 non-null    object
 5   DHHS Event?                             240 non-null    object
 6   Site Tests Children?                    240 non-null    object
 7   Address (click for Google Maps result)  240 non-null    object
 8   Contact Info                            239 non-null    object
dtypes: object(9)
memory usage: 17.0+ KB


In [18]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 779 entries, 0 to 778
Data columns (total 5 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   ZIP Code                     779 non-null    object
 1   Cases                        779 non-null    int16 
 2   Cases Per 10,000 Residents   779 non-null    int16 
 3   Cases Per 100,000 Residents  779 non-null    int16 
 4   Deaths                       779 non-null    int16 
dtypes: int16(4), object(1)
memory usage: 12.3+ KB


In [19]:
df2=df1
df2

Row,ZIP Code,Cases,"Cases Per 10,000 Residents","Cases Per 100,000 Residents",Deaths
0,27006,447,301,3006,3
1,27007,79,370,3699,0
2,27009,85,289,2894,1
3,27011,244,451,4509,4
4,27012,1128,390,3896,19
5,27013,214,329,3295,4
6,27014,27,282,2821,0
7,27016,58,345,3448,0
8,27017,497,523,5232,3
9,27018,318,387,3868,3


In [22]:
##Filter Zip Code Cases and Death Table First based on Charlotte zip code (begins with 282)##
df3=df2.loc[df2['ZIP Code'].str.startswith('282')]
df3

Row,ZIP Code,Cases,"Cases Per 10,000 Residents","Cases Per 100,000 Residents",Deaths
422,28202,989,590,5903,0
423,28203,1073,563,5627,3
424,28204,533,610,6100,3
425,28205,2544,491,4909,36
426,28206,693,455,4546,5
427,28207,366,375,3750,3
428,28208,2057,515,5150,18
429,28209,1113,428,4279,5
430,28210,2136,478,4778,25
431,28211,1160,371,3712,29


In [23]:
###Filter Charlotte Testing Centers out of Testing Center data using City 'Charlotte'###
tc1=tc.loc[tc['City']=='Charlotte']
tc1

Unnamed: 0,Date,City,Name,County,Time,DHHS Event?,Site Tests Children?,Address (click for Google Maps result),Contact Info
23,12/26/2020,Charlotte,StarMed - Home Depot,Mecklenburg,12pm-5pm,Yes,Yes,"8135 University City Blvd, Charlotte, NC 28213",(704) 615-7754; Register online
24,12/26/2020,Charlotte,StarMed - Home Depot,Mecklenburg,12pm-5pm,Yes,Yes,"14310 Rivergate Pkwy, Charlotte, NC 28273",(704) 615-7754; Register online
38,12/27/2020,Charlotte,StarMed - Home Depot,Mecklenburg,9am-4pm,Yes,Yes,"8135 University City Blvd, Charlotte, NC 28213",(704) 615-7754; Register online
39,12/27/2020,Charlotte,StarMed - Home Depot,Mecklenburg,9am-4pm,Yes,Yes,"14310 Rivergate Pkwy, Charlotte, NC 28273",(704) 615-7754; Register online
67,12/28/2020,Charlotte,StarMed - Archdale Park and Ride Lot,Mecklenburg,10am-2pm,Yes,Yes,"6230 South Blvd, Charlotte, NC 28217",(704) 615-7754; Register online
68,12/28/2020,Charlotte,StarMed - Keith Clinic North,Mecklenburg,10am-2pm,Yes,Yes,"402 E Sugar Creek Rd, Charlotte, NC 28213",(704) 615-7754; Register online
70,12/28/2020,Charlotte,StarMed - Mecklenburg Health Department,Mecklenburg,10am-5pm,Yes,Yes,"2845 Beatties Ford Rd, Charlotte, NC 28216",(704) 615-7754; Register online
113,12/29/2020,Charlotte,StarMed - Archdale Park and Ride Lot,Mecklenburg,10am-2pm,Yes,Yes,"6230 South Blvd, Charlotte, NC 28217",(704) 615-7754; Register online
114,12/29/2020,Charlotte,StarMed - Keith Clinic North,Mecklenburg,10am-2pm,Yes,Yes,"402 E Sugar Creek Rd, Charlotte, NC 28213",(704) 615-7754; Register online
115,12/29/2020,Charlotte,StarMed - Mecklenburg Health Department,Mecklenburg,10am-5pm,Yes,Yes,"2845 Beatties Ford Rd, Charlotte, NC 28216",(704) 615-7754; Register online


In [24]:
from project_lib import Project
project = Project(project_id='423d7789-6221-4165-bd44-701f8baaad3e', project_access_token='p-5fb1cc2a2b84655fb5777f81c0e2f3cec644847a')
pc = project.project_context

tc1.to_csv('testingcenters.csv')
print('DataFrame is written successfully to CSV.')

DataFrame is written successfully to CSV.


After extracting the testing center from the notebook, I removed the duplicates and added the geo-coordinates of the addresses manually.  Now, read the new csv file back in

In [25]:
tc2=pd.read_csv('https://raw.githubusercontent.com/kimberlycoxfisher0330/Coursera_Capstone/main/testingcenters.csv')
tc2

Unnamed: 0,Date,City,Name,County,Time,DHHS Event?,Site Tests Children?,Address,City.1,State,Contact Info,Latitude,Longitude
0,12/23/2020,Charlotte,StarMed - Archdale Park and Ride Lot,Mecklenburg,10am-2pm,Yes,Yes,6230 South Blvd,Charlotte,NC 28217,(704) 615-7754,35.152778,-80.8775
1,12/23/2020,Charlotte,StarMed - Keith Clinic North,Mecklenburg,10am-2pm,Yes,Yes,402 E Sugar Creek Rd,Charlotte,NC 28213,(704) 615-7754,35.217571,-80.793317
2,12/23/2020,Charlotte,StarMed - Mecklenburg Health Department,Mecklenburg,10am-5pm,Yes,Yes,2845 Beatties Ford Rd,Charlotte,NC 28216,(704) 615-7754,35.220368,-80.837936
3,12/26/2020,Charlotte,StarMed - Home Depot,Mecklenburg,12pm-5pm,Yes,Yes,8135 University City Blvd,Charlotte,NC 28213,(704) 615-7754,35.296423,-80.749265
4,12/26/2020,Charlotte,StarMed - Home Depot,Mecklenburg,12pm-5pm,Yes,Yes,14310 Rivergate Pkwy,Charlotte,NC 28273,(704) 615-7754,35.101065,-80.98414


In [26]:
tc2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Date                  5 non-null      object 
 1   City                  5 non-null      object 
 2   Name                  5 non-null      object 
 3   County                5 non-null      object 
 4   Time                  5 non-null      object 
 5   DHHS Event?           5 non-null      object 
 6   Site Tests Children?  5 non-null      object 
 7   Address               5 non-null      object 
 8   City.1                5 non-null      object 
 9   State                 5 non-null      object 
 10  Contact Info          5 non-null      object 
 11  Latitude              5 non-null      float64
 12  Longitude             5 non-null      float64
dtypes: float64(2), object(11)
memory usage: 648.0+ bytes


Import Geospatial data csv that will be used to add geospatial coordinates to the Zip Code Cases and Deaths in Charlotte 

In [27]:
##Import geospatial data via csv###
gl=pd.read_csv('https://raw.githubusercontent.com/kimberlycoxfisher0330/Coursera_Capstone/main/us-zip-code-latitude-and-longitude%20Charlotte.csv')
gl

Unnamed: 0,Zip,City,State,Latitude,Longitude
0,28261,Charlotte,NC,35.26002,-80.804151
1,28219,Charlotte,NC,35.26002,-80.804151
2,28260,Charlotte,NC,35.26002,-80.804151
3,28229,Charlotte,NC,35.26002,-80.804151
4,28213,Charlotte,NC,35.280464,-80.75678
5,28269,Charlotte,NC,35.329235,-80.80486
6,28243,Charlotte,NC,35.26002,-80.804151
7,28234,Charlotte,NC,35.26002,-80.804151
8,28203,Charlotte,NC,35.208992,-80.85539
9,28283,Charlotte,NC,35.26002,-80.804151


Merge the geospatial data with the Zip Codes Cases and Deaths Table

In [28]:
gl.rename(columns = {'Zip':'ZIP Code'}, inplace=True)
gl

Unnamed: 0,ZIP Code,City,State,Latitude,Longitude
0,28261,Charlotte,NC,35.26002,-80.804151
1,28219,Charlotte,NC,35.26002,-80.804151
2,28260,Charlotte,NC,35.26002,-80.804151
3,28229,Charlotte,NC,35.26002,-80.804151
4,28213,Charlotte,NC,35.280464,-80.75678
5,28269,Charlotte,NC,35.329235,-80.80486
6,28243,Charlotte,NC,35.26002,-80.804151
7,28234,Charlotte,NC,35.26002,-80.804151
8,28203,Charlotte,NC,35.208992,-80.85539
9,28283,Charlotte,NC,35.26002,-80.804151


In [29]:
gl.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 77 entries, 0 to 76
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   ZIP Code   77 non-null     int64  
 1   City       77 non-null     object 
 2   State      77 non-null     object 
 3   Latitude   77 non-null     float64
 4   Longitude  77 non-null     float64
dtypes: float64(2), int64(1), object(2)
memory usage: 3.1+ KB


In [30]:
gl['ZIP Code']=gl['ZIP Code'].astype('str')
gl

Unnamed: 0,ZIP Code,City,State,Latitude,Longitude
0,28261,Charlotte,NC,35.26002,-80.804151
1,28219,Charlotte,NC,35.26002,-80.804151
2,28260,Charlotte,NC,35.26002,-80.804151
3,28229,Charlotte,NC,35.26002,-80.804151
4,28213,Charlotte,NC,35.280464,-80.75678
5,28269,Charlotte,NC,35.329235,-80.80486
6,28243,Charlotte,NC,35.26002,-80.804151
7,28234,Charlotte,NC,35.26002,-80.804151
8,28203,Charlotte,NC,35.208992,-80.85539
9,28283,Charlotte,NC,35.26002,-80.804151


In [31]:
gl.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 77 entries, 0 to 76
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   ZIP Code   77 non-null     object 
 1   City       77 non-null     object 
 2   State      77 non-null     object 
 3   Latitude   77 non-null     float64
 4   Longitude  77 non-null     float64
dtypes: float64(2), object(3)
memory usage: 3.1+ KB


In [32]:
zip_loc = pd.merge(df3, gl)
zip_loc

Unnamed: 0,ZIP Code,Cases,"Cases Per 10,000 Residents","Cases Per 100,000 Residents",Deaths,City,State,Latitude,Longitude
0,28202,989,590,5903,0,Charlotte,NC,35.227192,-80.84419
1,28203,1073,563,5627,3,Charlotte,NC,35.208992,-80.85539
2,28204,533,610,6100,3,Charlotte,NC,35.214693,-80.82665
3,28205,2544,491,4909,36,Charlotte,NC,35.2224,-80.79221
4,28206,693,455,4546,5,Charlotte,NC,35.248292,-80.82748
5,28207,366,375,3750,3,Charlotte,NC,35.197643,-80.82752
6,28208,2057,515,5150,18,Charlotte,NC,35.235791,-80.89295
7,28209,1113,428,4279,5,Charlotte,NC,35.178543,-80.85375
8,28210,2136,478,4778,25,Charlotte,NC,35.13451,-80.85632
9,28211,1160,371,3712,29,Charlotte,NC,35.170094,-80.79857


Import remaining libraries to explore and cluster data from Charlotte neighborhoods

In [33]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json

from geopy.geocoders import Nominatim

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

!pip install folium

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [34]:
address = 'Charlotte, North Carolina'

geolocator = Nominatim(user_agent="charlotte_explorer")
location=geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Charlotte are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of Charlotte are 35.2272086, -80.8430827.


Import data from FourSquare API on Charlotte neighborhoods

In [35]:
CLIENT_ID = '1VOZODD4LEIZKKKTOAD5LAHEHHZWC4IV0FPQYEIVO4GG2IJE' # your Foursquare ID
CLIENT_SECRET = 'CIZWXCTGZJMTQJOWAZLB41MWUSLNWIE25PVL5DEVZCZQKCRB' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 1VOZODD4LEIZKKKTOAD5LAHEHHZWC4IV0FPQYEIVO4GG2IJE
CLIENT_SECRET:CIZWXCTGZJMTQJOWAZLB41MWUSLNWIE25PVL5DEVZCZQKCRB


In [36]:
# create map of Charlotte and positive test cases and deaths using latitude and longitude values
map_charlotte = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, Cases, Deaths in zip(zip_loc['Latitude'], zip_loc['Longitude'], zip_loc['Cases'], zip_loc['Deaths']):
    label = '{}, {}'.format(Cases,Deaths)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_charlotte)  
    
map_charlotte

Use FourSquare API to identify neighborhoods in Charlotte

In [37]:
neighborhood_latitude = zip_loc.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = zip_loc.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = zip_loc.loc[0, 'ZIP Code'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of 28202 are 35.227191999999995, -80.84419.


In [38]:
LIMIT = 100 
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=1VOZODD4LEIZKKKTOAD5LAHEHHZWC4IV0FPQYEIVO4GG2IJE&client_secret=CIZWXCTGZJMTQJOWAZLB41MWUSLNWIE25PVL5DEVZCZQKCRB&v=20180605&ll=35.227191999999995,-80.84419&radius=500&limit=100'

In [39]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5fe4d3384bde931d9e690e10'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Third Ward',
  'headerFullLocation': 'Third Ward, Charlotte',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 121,
  'suggestedBounds': {'ne': {'lat': 35.2316920045, 'lng': -80.83869145595618},
   'sw': {'lat': 35.22269199549999, 'lng': -80.84968854404381}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bad5829f964a52071483be3',
       'name': 'Blumenthal Performing Arts Center',
       'location': {'address': '130 N Tryon St',
        'crossStreet': 'at 5th St',
        'lat': 35.22792953956913,
        'lng': 

In [40]:

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [41]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

  app.launch_new_instance()


Unnamed: 0,name,categories,lat,lng
0,Blumenthal Performing Arts Center,Performing Arts Venue,35.22793,-80.841951
1,The Capital Grille,American Restaurant,35.228216,-80.841974
2,Belk Theater,Concert Hall,35.227711,-80.841663
3,Not Just Coffee,Café,35.226891,-80.846126
4,Charlotte Athletic Club: Trade & Tryon,Gym / Fitness Center,35.226238,-80.842897


Create new map with the positive test cases and deaths and overlaying the testing center data 

In [42]:
# create map of Charlotte and positive test cases and deaths using latitude and longitude values
map_charlotte2 = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, Cases, Deaths in zip(zip_loc['Latitude'], zip_loc['Longitude'], zip_loc['Cases'], zip_loc['Deaths']):
    label = '{}, {}'.format(Cases,Deaths)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_charlotte2)  
for lat, lng in zip(tc2['Latitude'], tc2['Longitude']):
    label = '{}, {}'.format(Cases,Deaths)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_charlotte2)    
map_charlotte2

##Analysis

Now will segment and cluster zip codes with testing centers and those without and make comparisons. Observations will include number of positive cases and deaths within the selected zip codes and whether or not there is a testing center available.  Based on that, will make recommendation to add a testing center for access to those living in the zip code.

First, briefly review the venues using FourSquare data.  Based on that information, may be able to draw conclusion the population of that zip code and, if there is a testing center there, why the city chose that location for the center.  

In [43]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [44]:
nearby_venues = getNearbyVenues(names=zip_loc['ZIP Code'],
                                   latitudes=zip_loc['Latitude'],
                                   longitudes=zip_loc['Longitude']
                                  )

28202
28203
28204
28205
28206
28207
28208
28209
28210
28211
28212
28213
28214
28215
28216
28217
28223
28226
28227
28253
28262
28269
28270
28273
28277
28278


In [45]:
print(nearby_venues.shape)
nearby_venues.head()

(272, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,28202,35.227192,-80.84419,Blumenthal Performing Arts Center,35.22793,-80.841951,Performing Arts Venue
1,28202,35.227192,-80.84419,The Capital Grille,35.228216,-80.841974,American Restaurant
2,28202,35.227192,-80.84419,Belk Theater,35.227711,-80.841663,Concert Hall
3,28202,35.227192,-80.84419,Not Just Coffee,35.226891,-80.846126,Café
4,28202,35.227192,-80.84419,Charlotte Athletic Club: Trade & Tryon,35.226238,-80.842897,Gym / Fitness Center


In [46]:
nearby_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
28202,100,100,100,100,100,100
28203,32,32,32,32,32,32
28204,23,23,23,23,23,23
28205,2,2,2,2,2,2
28206,4,4,4,4,4,4
28207,17,17,17,17,17,17
28209,3,3,3,3,3,3
28210,7,7,7,7,7,7
28211,2,2,2,2,2,2
28212,4,4,4,4,4,4


In [47]:
print('There are {} uniques categories.'.format(len(nearby_venues['Neighborhood'].unique())))

There are 24 uniques categories.


In [48]:
# one hot encoding
nearby_venues_onehot = pd.get_dummies(nearby_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
nearby_venues_onehot['Neighborhood'] = nearby_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [nearby_venues_onehot.columns[-1]] + list(nearby_venues_onehot.columns[:-1])
nearby_venues_onehot = nearby_venues_onehot[fixed_columns]

nearby_venues_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Antique Shop,Art Gallery,Art Museum,Art Studio,Asian Restaurant,Athletics & Sports,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Bed & Breakfast,Bike Rental / Bike Share,Bistro,Botanical Garden,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Business Service,Café,Caribbean Restaurant,Chinese Restaurant,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,College Basketball Court,College Bookstore,College Cafeteria,College Gym,College Library,College Rec Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cupcake Shop,Dance Studio,Deli / Bodega,Discount Store,Dive Bar,Doctor's Office,Donut Shop,Event Space,Fast Food Restaurant,Food,Food Court,French Restaurant,Fried Chicken Joint,Garden Center,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,High School,Historic Site,History Museum,Home Service,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Latin American Restaurant,Liquor Store,Locksmith,Lounge,Mexican Restaurant,Middle Eastern Restaurant,Movie Theater,Moving Target,Music Venue,Nail Salon,New American Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pharmacy,Picnic Area,Pizza Place,Playground,Plaza,Pool,Pool Hall,Pub,Restaurant,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shipping Store,Shopping Mall,Smoke Shop,Smoothie Shop,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Court,Thai Restaurant,Theater,Trail,Wine Bar,Wings Joint,Women's Store
0,28202,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,28202,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,28202,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,28202,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,28202,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [49]:
nearby_venues_onehot.shape

(272, 136)

In [50]:
dc_grouped = nearby_venues_onehot.groupby('Neighborhood').mean().reset_index()
dc_grouped

Unnamed: 0,Neighborhood,American Restaurant,Antique Shop,Art Gallery,Art Museum,Art Studio,Asian Restaurant,Athletics & Sports,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Bed & Breakfast,Bike Rental / Bike Share,Bistro,Botanical Garden,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Business Service,Café,Caribbean Restaurant,Chinese Restaurant,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,College Basketball Court,College Bookstore,College Cafeteria,College Gym,College Library,College Rec Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cupcake Shop,Dance Studio,Deli / Bodega,Discount Store,Dive Bar,Doctor's Office,Donut Shop,Event Space,Fast Food Restaurant,Food,Food Court,French Restaurant,Fried Chicken Joint,Garden Center,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,High School,Historic Site,History Museum,Home Service,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Latin American Restaurant,Liquor Store,Locksmith,Lounge,Mexican Restaurant,Middle Eastern Restaurant,Movie Theater,Moving Target,Music Venue,Nail Salon,New American Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pharmacy,Picnic Area,Pizza Place,Playground,Plaza,Pool,Pool Hall,Pub,Restaurant,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shipping Store,Shopping Mall,Smoke Shop,Smoothie Shop,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Court,Thai Restaurant,Theater,Trail,Wine Bar,Wings Joint,Women's Store
0,28202,0.04,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.03,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.04,0.02,0.0,0.0,0.01,0.03,0.02,0.01,0.0,0.0,0.0,0.0,0.03,0.02,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.03,0.01,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.0,0.03,0.04,0.0,0.02,0.01,0.04,0.0,0.01,0.01,0.03,0.0,0.02,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.05,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.04,0.0,0.01,0.0,0.0
1,28203,0.0625,0.0,0.03125,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.03125,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.03125,0.0,0.03125,0.0,0.03125,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.03125,0.0,0.03125,0.03125,0.0,0.0,0.03125,0.0,0.03125,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.03125,0.03125,0.0,0.03125
2,28204,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.130435,0.0,0.043478,0.0,0.043478,0.0,0.0,0.130435,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.043478,0.043478,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,28205,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,28206,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,28207,0.117647,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0
6,28209,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,28210,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,28211,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,28212,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [51]:
dc_grouped.shape

(24, 136)

In [52]:
num_top_venues = 5

for hood in dc_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = dc_grouped[dc_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(int)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----28202----
                       venue  freq
0        American Restaurant     0
1  Middle Eastern Restaurant     0
2                      Plaza     0
3                 Playground     0
4                Pizza Place     0


----28203----
                       venue  freq
0        American Restaurant     0
1  Middle Eastern Restaurant     0
2                      Plaza     0
3                 Playground     0
4                Pizza Place     0


----28204----
                       venue  freq
0        American Restaurant     0
1  Middle Eastern Restaurant     0
2                      Plaza     0
3                 Playground     0
4                Pizza Place     0


----28205----
                       venue  freq
0        American Restaurant     0
1  Middle Eastern Restaurant     0
2                      Plaza     0
3                 Playground     0
4                Pizza Place     0


----28206----
                       venue  freq
0        American Restaurant     0
1  Middle Ea

In [53]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [54]:

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = dc_grouped['Neighborhood']

for ind in np.arange(dc_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dc_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,28202,Steakhouse,Pizza Place,American Restaurant,Theater,Restaurant
1,28203,American Restaurant,Bar,Ice Cream Shop,Pizza Place,Pharmacy
2,28204,Pizza Place,New American Restaurant,School,Sandwich Place,Rock Club
3,28205,Auto Workshop,Southern / Soul Food Restaurant,Women's Store,Cosmetics Shop,Dive Bar
4,28206,American Restaurant,Coffee Shop,Grocery Store,Discount Store,College Cafeteria


In [55]:
# set number of clusters
kclusters = 10

dc_grouped_clustering = dc_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dc_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 7, 1, 1, 1, 1, 5, 4], dtype=int32)

In [61]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

charlotte_merged= zip_loc


charlotte_merged = charlotte_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='ZIP Code')

charlotte_merged # check the last columns!

Unnamed: 0,ZIP Code,Cases,"Cases Per 10,000 Residents","Cases Per 100,000 Residents",Deaths,City,State,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,28202,989,590,5903,0,Charlotte,NC,35.227192,-80.84419,1.0,Steakhouse,Pizza Place,American Restaurant,Theater,Restaurant
1,28203,1073,563,5627,3,Charlotte,NC,35.208992,-80.85539,1.0,American Restaurant,Bar,Ice Cream Shop,Pizza Place,Pharmacy
2,28204,533,610,6100,3,Charlotte,NC,35.214693,-80.82665,1.0,Pizza Place,New American Restaurant,School,Sandwich Place,Rock Club
3,28205,2544,491,4909,36,Charlotte,NC,35.2224,-80.79221,7.0,Auto Workshop,Southern / Soul Food Restaurant,Women's Store,Cosmetics Shop,Dive Bar
4,28206,693,455,4546,5,Charlotte,NC,35.248292,-80.82748,1.0,American Restaurant,Coffee Shop,Grocery Store,Discount Store,College Cafeteria
5,28207,366,375,3750,3,Charlotte,NC,35.197643,-80.82752,1.0,American Restaurant,Italian Restaurant,Nail Salon,Sculpture Garden,Thai Restaurant
6,28208,2057,515,5150,18,Charlotte,NC,35.235791,-80.89295,,,,,,
7,28209,1113,428,4279,5,Charlotte,NC,35.178543,-80.85375,1.0,Pool,Antique Shop,Grocery Store,Convenience Store,Discount Store
8,28210,2136,478,4778,25,Charlotte,NC,35.13451,-80.85632,1.0,Event Space,Pool,Art Studio,Music Venue,Park
9,28211,1160,371,3712,29,Charlotte,NC,35.170094,-80.79857,5.0,Historic Site,Doctor's Office,College Cafeteria,College Gym,College Library


In [62]:
charlotte_merged1=charlotte_merged.dropna()
charlotte_merged1

Unnamed: 0,ZIP Code,Cases,"Cases Per 10,000 Residents","Cases Per 100,000 Residents",Deaths,City,State,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,28202,989,590,5903,0,Charlotte,NC,35.227192,-80.84419,1.0,Steakhouse,Pizza Place,American Restaurant,Theater,Restaurant
1,28203,1073,563,5627,3,Charlotte,NC,35.208992,-80.85539,1.0,American Restaurant,Bar,Ice Cream Shop,Pizza Place,Pharmacy
2,28204,533,610,6100,3,Charlotte,NC,35.214693,-80.82665,1.0,Pizza Place,New American Restaurant,School,Sandwich Place,Rock Club
3,28205,2544,491,4909,36,Charlotte,NC,35.2224,-80.79221,7.0,Auto Workshop,Southern / Soul Food Restaurant,Women's Store,Cosmetics Shop,Dive Bar
4,28206,693,455,4546,5,Charlotte,NC,35.248292,-80.82748,1.0,American Restaurant,Coffee Shop,Grocery Store,Discount Store,College Cafeteria
5,28207,366,375,3750,3,Charlotte,NC,35.197643,-80.82752,1.0,American Restaurant,Italian Restaurant,Nail Salon,Sculpture Garden,Thai Restaurant
7,28209,1113,428,4279,5,Charlotte,NC,35.178543,-80.85375,1.0,Pool,Antique Shop,Grocery Store,Convenience Store,Discount Store
8,28210,2136,478,4778,25,Charlotte,NC,35.13451,-80.85632,1.0,Event Space,Pool,Art Studio,Music Venue,Park
9,28211,1160,371,3712,29,Charlotte,NC,35.170094,-80.79857,5.0,Historic Site,Doctor's Office,College Cafeteria,College Gym,College Library
10,28212,2546,596,5956,24,Charlotte,NC,35.189544,-80.74742,4.0,Playground,Business Service,Home Service,Gift Shop,Deli / Bodega


In [63]:
charlotte_merged.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 26 entries, 0 to 25
Data columns (total 15 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ZIP Code                     26 non-null     object 
 1   Cases                        26 non-null     int16  
 2   Cases Per 10,000 Residents   26 non-null     int16  
 3   Cases Per 100,000 Residents  26 non-null     int16  
 4   Deaths                       26 non-null     int16  
 5   City                         26 non-null     object 
 6   State                        26 non-null     object 
 7   Latitude                     26 non-null     float64
 8   Longitude                    26 non-null     float64
 9   Cluster Labels               24 non-null     float64
 10  1st Most Common Venue        24 non-null     object 
 11  2nd Most Common Venue        24 non-null     object 
 12  3rd Most Common Venue        24 non-null     object 
 13  4th Most Common Venue 

In [64]:
charlotte_merged.fillna(0)

Unnamed: 0,ZIP Code,Cases,"Cases Per 10,000 Residents","Cases Per 100,000 Residents",Deaths,City,State,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,28202,989,590,5903,0,Charlotte,NC,35.227192,-80.84419,1.0,Steakhouse,Pizza Place,American Restaurant,Theater,Restaurant
1,28203,1073,563,5627,3,Charlotte,NC,35.208992,-80.85539,1.0,American Restaurant,Bar,Ice Cream Shop,Pizza Place,Pharmacy
2,28204,533,610,6100,3,Charlotte,NC,35.214693,-80.82665,1.0,Pizza Place,New American Restaurant,School,Sandwich Place,Rock Club
3,28205,2544,491,4909,36,Charlotte,NC,35.2224,-80.79221,7.0,Auto Workshop,Southern / Soul Food Restaurant,Women's Store,Cosmetics Shop,Dive Bar
4,28206,693,455,4546,5,Charlotte,NC,35.248292,-80.82748,1.0,American Restaurant,Coffee Shop,Grocery Store,Discount Store,College Cafeteria
5,28207,366,375,3750,3,Charlotte,NC,35.197643,-80.82752,1.0,American Restaurant,Italian Restaurant,Nail Salon,Sculpture Garden,Thai Restaurant
6,28208,2057,515,5150,18,Charlotte,NC,35.235791,-80.89295,0.0,0,0,0,0,0
7,28209,1113,428,4279,5,Charlotte,NC,35.178543,-80.85375,1.0,Pool,Antique Shop,Grocery Store,Convenience Store,Discount Store
8,28210,2136,478,4778,25,Charlotte,NC,35.13451,-80.85632,1.0,Event Space,Pool,Art Studio,Music Venue,Park
9,28211,1160,371,3712,29,Charlotte,NC,35.170094,-80.79857,5.0,Historic Site,Doctor's Office,College Cafeteria,College Gym,College Library


In [65]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(charlotte_merged1['Latitude'], charlotte_merged1['Longitude'], charlotte_merged1['ZIP Code'], charlotte_merged1['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

By observation of location of venues and number of positive cases and deaths, it appears that most of the testing centers are appropriately located towards the center of the city.  However, when looking at the positive cases located outside the city center, it appears that those particular neighborhoods are somewhat neglected.  

Examine the first Five Clusters to see if a more definite conclusion can be drawn and recommendation made

In [66]:
charlotte_merged1.loc[charlotte_merged1['Cluster Labels'] == 0, charlotte_merged1.columns[[0] + list(range(5, charlotte_merged1.shape[1]))]]

Unnamed: 0,ZIP Code,City,State,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
25,28278,Charlotte,NC,35.119012,-81.02213,0.0,Bike Rental / Bike Share,Women's Store,Cosmetics Shop,Discount Store,Deli / Bodega


In [67]:
charlotte_merged1.loc[charlotte_merged1['Cluster Labels'] == 1, charlotte_merged1.columns[[0] + list(range(5, charlotte_merged1.shape[1]))]]

Unnamed: 0,ZIP Code,City,State,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,28202,Charlotte,NC,35.227192,-80.84419,1.0,Steakhouse,Pizza Place,American Restaurant,Theater,Restaurant
1,28203,Charlotte,NC,35.208992,-80.85539,1.0,American Restaurant,Bar,Ice Cream Shop,Pizza Place,Pharmacy
2,28204,Charlotte,NC,35.214693,-80.82665,1.0,Pizza Place,New American Restaurant,School,Sandwich Place,Rock Club
4,28206,Charlotte,NC,35.248292,-80.82748,1.0,American Restaurant,Coffee Shop,Grocery Store,Discount Store,College Cafeteria
5,28207,Charlotte,NC,35.197643,-80.82752,1.0,American Restaurant,Italian Restaurant,Nail Salon,Sculpture Garden,Thai Restaurant
7,28209,Charlotte,NC,35.178543,-80.85375,1.0,Pool,Antique Shop,Grocery Store,Convenience Store,Discount Store
8,28210,Charlotte,NC,35.13451,-80.85632,1.0,Event Space,Pool,Art Studio,Music Venue,Park
12,28214,Charlotte,NC,35.276639,-80.96111,1.0,Supermarket,Pharmacy,Spa,Chinese Restaurant,Women's Store
16,28223,Charlotte,NC,35.305552,-80.73303,1.0,Fast Food Restaurant,Restaurant,Botanical Garden,College Academic Building,College Bookstore
17,28226,Charlotte,NC,35.107804,-80.82139,1.0,Golf Course,Gym Pool,Tennis Court,Convenience Store,Discount Store


In [68]:
charlotte_merged1.loc[charlotte_merged1['Cluster Labels'] == 2, charlotte_merged1.columns[[0] + list(range(5, charlotte_merged1.shape[1]))]]

Unnamed: 0,ZIP Code,City,State,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
18,28227,Charlotte,NC,35.192919,-80.66822,2.0,Baseball Field,Women's Store,Cosmetics Shop,Discount Store,Deli / Bodega


In [69]:
charlotte_merged1.loc[charlotte_merged1['Cluster Labels'] == 3, charlotte_merged1.columns[[0] + list(range(5, charlotte_merged1.shape[1]))]]

Unnamed: 0,ZIP Code,City,State,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
11,28213,Charlotte,NC,35.280464,-80.75678,3.0,Moving Target,Women's Store,Cosmetics Shop,Discount Store,Deli / Bodega


In [70]:
charlotte_merged1.loc[charlotte_merged1['Cluster Labels'] == 4, charlotte_merged1.columns[[0] + list(range(5, charlotte_merged1.shape[1]))]]

Unnamed: 0,ZIP Code,City,State,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
10,28212,Charlotte,NC,35.189544,-80.74742,4.0,Playground,Business Service,Home Service,Gift Shop,Deli / Bodega
19,28253,Charlotte,NC,35.26002,-80.804151,4.0,Locksmith,Home Service,IT Services,Business Service,Cosmetics Shop
