# Trending FourSquare Venues in COVID-19 Hot Spot Zip Codes in Virginia, USA

H. Diana McSpadden

2020-12-29

## Introduction
According to the Virginia Department of Health, COVID-19 is spread through respiratory droplets and is more likely to spread when people are in close contact. "Community spread" is the public health term for someone becoming infected without knowing when or where they were in contact with a sick person. Community spread may be occurring in Virginia communities at indoor public venues such as restaurants, bars, coffee shops, and grocery stores.

Determining which categories of venues are correlated with high per capita COVID-19 infection rates can help identify community risk factors. This project uses FourSquare venues by Virginia zip code to determine whether such a correlation exists. Machine learning, specifically the Support Vector Machine method, is used to determine whether high per capita COVID-19 infection rates can be predicted from the types of trending venues in each zip code.


## The Data

Four data sources are utilized in this project:

1. **World Population Review** provides a csv of Virginia Zip Codes and estimates of the 2020 population for each zip code.
  * These data are required to acquire a dataset of Virginia zip codes, and to determine the per capita COVID-19 infections per zip code.
  * https://worldpopulationreview.com/zips/virginia
2. **Open Data Standard API** will be used to acquire a latitude and longitude for each zip code.
  * **It should be noted** that the FourSquare API can use zip codes with the ***&near*** querystring parameter, but for additional practice I will retrieve latitudes and longitudes.
  * https://public.opendatasoft.com/api/records/1.0/search/?dataset=us-zip-code-latitude-and-longitude&q=VA&rows=1275&facet=state&facet=timezone&facet=dst&refine.state=VA
3. **Virginia's Open Data Portal of COVID-19 Counts by Zip Code** provides cumulative COVID-19 counts per Virginia zip code.
  * These data are required to calculate the per capita COVID-19 infections per zip code.
  * Data are included for 5/15/2020 - 12/27/2020. I will use **December 1, 2020 - December 27, 2020** in order to focus on the most current infections and to best match the venue data, which are from December 28, 2020.
  * For each zip code, I subtract the lowest cumulative case count reported in that window (normally the December 1 report) from the highest (normally the December 27 report).
  * I divide the December case count by the zip code's population to get the per capita COVID-19 case count.
  * I then determine the mean per capita case count for the state and assign each zip code a categorical label, "above" or "below", based on whether its December per capita case count is above or below that mean.
  * https://data.virginia.gov/Government/VDH-COVID-19-PublicUseDataset-ZIPCode/8bkr-zfqv

4. **FourSquare Venues Endpoint** will provide venues within 2000m (2 km) of each zip code's latitude and longitude.
  * The FourSquare API will return categories for each venue. **The top 10 categories for each zip code will be calculated, and the resulting data will be used in the Support Vector Machine to determine how accurately we can predict high or low per capita COVID-19 infections from venue categories.**
  * **It should be noted** that many venues in a zip code will lie beyond the 2000m radius of the FourSquare search and so will not be returned. This project is for example purposes only.
  * https://developer.foursquare.com/docs/places-api/endpoints/
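
The labeling steps listed for the COVID-19 counts can be sketched on a few made-up rows. The column names mirror those used later in this notebook; the zip codes, the numbers, and the helper name `label_zip_codes` are illustrative assumptions, not project data.

```python
import numpy as np
import pandas as pd

def label_zip_codes(df, threshold):
    """Return a copy of df with per capita cases and an above/below label."""
    out = df.copy()
    out['DecCOVIDPerCap'] = out['DecNumCases'] / out['pop']
    # values under the threshold are 'below', everything else is 'above'
    out['aboveorbelow'] = np.where(out['DecCOVIDPerCap'] < threshold, 'below', 'above')
    return out

# made-up cumulative counts at the start and end of the window
toy = pd.DataFrame({
    'zip': ['00001', '00002', '00003'],
    'pop': [1000, 2000, 500],
    'cases_start': [40, 10, 5],
    'cases_end': [60, 20, 6],
})
toy['DecNumCases'] = toy['cases_end'] - toy['cases_start']

labeled = label_zip_codes(toy, threshold=0.011)
print(labeled[['zip', 'DecNumCases', 'DecCOVIDPerCap', 'aboveorbelow']])
```

Only the first toy zip code (20 new cases among 1000 residents) lands above the 0.011 threshold in this example.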
  

### Collect the Data

Import needed libraries

In [4]:
# import needed libraries
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner

#!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# transforming json file into a pandas dataframe library
from pandas.io.json import json_normalize


#!pip install folium==0.5.0
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


#### Collect Population For Each Virginia Zip Code

Load the csv of population by Virginia zip code from World Population Review.

In [5]:
df_ZipPop = pd.read_csv(r'Data/VirginiaPopByZipCode.csv')
print(df_ZipPop.shape)
df_ZipPop.head()

(892, 4)


Unnamed: 0,zip,city,county,pop
0,22193,Woodbridge,Prince William,82573
1,23464,Virginia Beach,Virginia Beach City,76495
2,22191,Woodbridge,Prince William,70641
3,23322,Chesapeake,Chesapeake City,65603
4,20147,Ashburn,Loudoun,64197


The zip column needs to be converted to a string data type so it can be used as a merge key later.

In [6]:
# need to change data type for later merging
df_ZipPop['zip'] = df_ZipPop['zip'].astype(str)
print(df_ZipPop.dtypes)

zip       object
city      object
county    object
pop        int64
dtype: object
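
As a rough illustration of why the cast matters: in the pandas versions I am aware of, merging an integer key against a string key is rejected with a `ValueError`, while two string keys merge cleanly. The latitude values below are placeholders, not real coordinates.

```python
import pandas as pd

# zip read from csv as int64 on one side, as str on the other
left = pd.DataFrame({'zip': [22193, 23464], 'pop': [82573, 76495]})
right = pd.DataFrame({'zip': ['22193', '23464'], 'Latitude': [1.0, 2.0]})  # placeholder latitudes

try:
    pd.merge(left, right, on='zip')
    mismatch_error = None
except ValueError as err:
    # pandas typically complains about merging int64 and object columns
    mismatch_error = err

# cast the integer key to str, then the merge keys line up
left['zip'] = left['zip'].astype(str)
merged = pd.merge(left, right, on='zip', how='left')
print(merged)
```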


#### Collect Latitude and Longitude for each zip code

From Open Data Standard, retrieve all Virginia zip codes and their corresponding latitudes and longitudes. Save the results in a DataFrame and merge that DataFrame with the population DataFrame.

In [7]:
#https://public.opendatasoft.com/api/records/1.0/search/?dataset=us-zip-code-latitude-and-longitude&q=VA&rows=1275&facet=state&facet=timezone&facet=dst&refine.state=VA

zip_url = 'https://public.opendatasoft.com/api/records/1.0/search/?dataset=us-zip-code-latitude-and-longitude&q=VA&rows=1275&facet=state&facet=timezone&facet=dst&refine.state=VA'
            
# make the GET request
zip_results = requests.get(zip_url).json()['records']
#print(zip_results)
# collect (zip, latitude, longitude) for every returned record
zip_latlon_list = [(z['fields']['zip'], z['fields']['latitude'], z['fields']['longitude']) for z in zip_results]

df_ZipLatLon = pd.DataFrame(zip_latlon_list, columns=['zip', 'Latitude', 'Longitude'])

In [8]:
print(df_ZipLatLon.shape)
df_ZipLatLon['zip'] = df_ZipLatLon['zip'].astype(str)
print(df_ZipLatLon.dtypes)
df_ZipLatLon.head()

(1275, 3)
zip           object
Latitude     float64
Longitude    float64
dtype: object


Unnamed: 0,zip,Latitude,Longitude
0,23887,36.563755,-77.82652
1,24656,37.198005,-82.12193
2,24149,37.011934,-80.41897
3,23125,37.342721,-76.27989
4,22652,38.823987,-78.4531


* Merge the DataFrames
* Drop NA values from the merged DataFrame

In [9]:
df_ZipUnion = df_ZipLatLon.merge(df_ZipPop, on='zip', how='left')

In [10]:
df_ZipUnion.dropna(inplace=True)
print(df_ZipUnion.shape)
df_ZipUnion.head()

(891, 6)


Unnamed: 0,zip,Latitude,Longitude,city,county,pop
0,23887,36.563755,-77.82652,Valentines,Brunswick,387.0
1,24656,37.198005,-82.12193,Vansant,Buchanan,2812.0
2,24149,37.011934,-80.41897,Riner,Montgomery,3530.0
3,23125,37.342721,-76.27989,New Point,Mathews,255.0
4,22652,38.823987,-78.4531,Fort Valley,Shenandoah,1339.0
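
Before dropping the unmatched rows, it can be useful to see which zip codes failed to find a population. The sketch below uses the `indicator` option of `merge` on toy data; `99999` is a made-up zip code standing in for one that is missing from the population file.

```python
import pandas as pd

latlon = pd.DataFrame({'zip': ['23887', '24656', '99999'],
                       'Latitude': [36.563755, 37.198005, 0.0],
                       'Longitude': [-77.82652, -82.12193, 0.0]})
population = pd.DataFrame({'zip': ['23887', '24656'], 'pop': [387, 2812]})

# indicator=True adds a _merge column: 'both' or 'left_only' for a left merge
checked = latlon.merge(population, on='zip', how='left', indicator=True)

unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched['zip'].tolist())

# keep only the rows that found a population, then drop the helper column
kept = checked[checked['_merge'] == 'both'].drop(columns='_merge')
```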


Create a new **areaName** column in the DataFrame with CITY, COUNTY (ZIPCODE) for display purposes.

In [11]:
df_ZipUnion['areaName'] = df_ZipUnion['city'] + ', ' + df_ZipUnion['county'] + " (" + df_ZipUnion['zip'] + ')'
df_ZipUnion.head()

Unnamed: 0,zip,Latitude,Longitude,city,county,pop,areaName
0,23887,36.563755,-77.82652,Valentines,Brunswick,387.0,"Valentines, Brunswick (23887)"
1,24656,37.198005,-82.12193,Vansant,Buchanan,2812.0,"Vansant, Buchanan (24656)"
2,24149,37.011934,-80.41897,Riner,Montgomery,3530.0,"Riner, Montgomery (24149)"
3,23125,37.342721,-76.27989,New Point,Mathews,255.0,"New Point, Mathews (23125)"
4,22652,38.823987,-78.4531,Fort Valley,Shenandoah,1339.0,"Fort Valley, Shenandoah (22652)"


#### Collect Virginia COVID-19 Infection Counts By Zip Code

Load the csv of COVID-19 counts by zip code from Virginia's Open Data Portal.

In [53]:
df_ZipCOVID = pd.read_csv(r'Data/VDH-COVID-19-PublicUseDataset-ZIPCode.csv')
print(df_ZipCOVID.shape)
df_ZipCOVID.head()

(199668, 5)


Unnamed: 0,Report Date,ZIP Code,Number of Cases,Number of Testing Encounters,Number of PCR Testing Encounters
0,10/29/2020,20105,327,,6316
1,10/29/2020,20106,78,,792
2,10/29/2020,20109,1868,,12598
3,10/29/2020,20110,2327,,15054
4,10/29/2020,20111,1566,,10305


In [54]:
df_ZipCOVID.dtypes

Report Date                          object
ZIP Code                             object
Number of Cases                      object
Number of Testing Encounters        float64
Number of PCR Testing Encounters     object
dtype: object

Set Report Date to datetime format.

In [55]:
df_ZipCOVID['Report Date'] =  pd.to_datetime(df_ZipCOVID['Report Date'], infer_datetime_format=True)
df_ZipCOVID.dtypes

Report Date                         datetime64[ns]
ZIP Code                                    object
Number of Cases                             object
Number of Testing Encounters               float64
Number of PCR Testing Encounters            object
dtype: object

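* I only need the Report Date, ZIP Code, and Number of Cases columns.
* Drop NaN records and records whose case count is reported as "Suppressed".
* Convert Number of Cases to a numeric type.
* Rename the columns to ReportDate, zip, and NumCases.
* I only need the rows with report dates after November 30th.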

In [56]:
df_ZipCOVID.drop(['Number of Testing Encounters','Number of PCR Testing Encounters'], axis=1, inplace=True)
df_ZipCOVID.dropna(inplace=True)
df_ZipCOVID = df_ZipCOVID[df_ZipCOVID['Number of Cases'] != 'Suppressed']
df_ZipCOVID = df_ZipCOVID[df_ZipCOVID['Number of Cases'] != 'Suppressed*']
print(df_ZipCOVID.shape)


(169113, 3)


In [57]:
# set Num of Cases to int
df_ZipCOVID['Number of Cases'] =  pd.to_numeric(df_ZipCOVID['Number of Cases'])
df_ZipCOVID.dtypes

Report Date        datetime64[ns]
ZIP Code                   object
Number of Cases             int64
dtype: object
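
An alternative sketch for the same cleanup, on toy rows: `pd.to_numeric` with `errors='coerce'` turns every non-numeric entry into NaN in one pass, so the "Suppressed" variants do not have to be listed one by one. Note that this would also discard any other non-numeric value, which may or may not be what is wanted.

```python
import pandas as pd

raw = pd.DataFrame({'ZIP Code': ['20105', '20106', '20109'],
                    'Number of Cases': ['327', 'Suppressed', 'Suppressed*']})

# non-numeric strings become NaN instead of raising an error
raw['Number of Cases'] = pd.to_numeric(raw['Number of Cases'], errors='coerce')

# drop the coerced rows and restore an integer dtype
clean = raw.dropna(subset=['Number of Cases']).copy()
clean['Number of Cases'] = clean['Number of Cases'].astype('int64')
print(clean)
```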

In [58]:
# renaming the columns 
df_ZipCOVID.rename({"Report Date": "ReportDate",  
           "ZIP Code": "zip",  
           "Number of Cases": "NumCases"},  
          axis = "columns", inplace = True) 
print(df_ZipCOVID.shape)
df_ZipCOVID.head()

(169113, 3)


Unnamed: 0,ReportDate,zip,NumCases
0,2020-10-29,20105,327
1,2020-10-29,20106,78
2,2020-10-29,20109,1868
3,2020-10-29,20110,2327
4,2020-10-29,20111,1566


In [59]:
# keep only rows with report dates after 2020-11-30
df_ZipCOVIDDecember = df_ZipCOVID[df_ZipCOVID.ReportDate > '2020-11-30']
print(df_ZipCOVIDDecember.shape)
df_ZipCOVIDDecember.head()

(22760, 3)


Unnamed: 0,ReportDate,zip,NumCases
31712,2020-12-01,20105,455
31713,2020-12-01,20106,106
31714,2020-12-01,20109,2266
31715,2020-12-01,20110,2703
31716,2020-12-01,20111,1922


In [60]:
# groupby zip code, subtract the minimum NumCases from the maximum NumCases to get the December Number Cases
df_ZIPDecCaseCount = df_ZipCOVIDDecember.groupby('zip').agg({'NumCases': ['min', 'max']}) 
print(df_ZIPDecCaseCount.shape)
df_ZIPDecCaseCount.head()

(867, 2)


Unnamed: 0_level_0,NumCases,NumCases
Unnamed: 0_level_1,min,max
zip,Unnamed: 1_level_2,Unnamed: 2_level_2
20105,455,643
20106,106,140
20109,2266,2786
20110,2703,3367
20111,1922,2487


In [61]:
#rename columns 
df_ZIPDecCaseCount.columns = ['min', 'max']
# calculate the December case counts
df_ZIPDecCaseCount['DecNumCases'] = df_ZIPDecCaseCount['max'] - df_ZIPDecCaseCount['min']

# drop the min and max columns
df_ZIPDecCaseCount.drop(['min','max'], axis=1, inplace=True)
print(df_ZIPDecCaseCount.shape)
df_ZIPDecCaseCount.head()

(867, 1)


Unnamed: 0_level_0,DecNumCases
zip,Unnamed: 1_level_1
20105,188
20106,34
20109,520
20110,664
20111,565


In [62]:
# reset the index
df_ZIPDecCaseCount.reset_index(inplace=True)
print(df_ZIPDecCaseCount.shape)
df_ZIPDecCaseCount.head()

(867, 2)


Unnamed: 0,zip,DecNumCases
0,20105,188
1,20106,34
2,20109,520
3,20110,664
4,20111,565
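
The min/max steps above can also be collapsed into a single aggregation. This is only a compact sketch of the same idea; the two zip codes and their counts are the first two rows shown in the outputs above, and it assumes the cumulative counts never decrease within the window.

```python
import pandas as pd

toy = pd.DataFrame({
    'ReportDate': pd.to_datetime(['2020-12-01', '2020-12-27', '2020-12-01', '2020-12-27']),
    'zip': ['20105', '20105', '20106', '20106'],
    'NumCases': [455, 643, 106, 140],
})

# per zip code: highest cumulative count minus lowest cumulative count
dec_cases = (toy.groupby('zip')['NumCases']
                .agg(lambda s: s.max() - s.min())
                .rename('DecNumCases')
                .reset_index())
print(dec_cases)
```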


**Merge the Num Cases DataFrame with the df_ZipUnion DataFrame**

In [64]:
print("df_ZipUnion Shape: ",df_ZipUnion.shape)
print("df_ZIPDecCaseCount Shape: ",df_ZIPDecCaseCount.shape)

df_ZipUnion Shape:  (891, 7)
df_ZIPDecCaseCount Shape:  (867, 2)


Some zip codes have no December case records. The merge must not eliminate those zip codes, so I use a left merge and then set the resulting NaNs to 0.

In [65]:
df_ZipCOVIDUnion = df_ZipUnion.merge(df_ZIPDecCaseCount, on='zip', how='left')
df_ZipCOVIDUnion.shape

(891, 8)

In [67]:
df_ZipCOVIDUnion.head(10)

Unnamed: 0,zip,Latitude,Longitude,city,county,pop,areaName,DecNumCases
0,23887,36.563755,-77.82652,Valentines,Brunswick,387.0,"Valentines, Brunswick (23887)",0.0
1,24656,37.198005,-82.12193,Vansant,Buchanan,2812.0,"Vansant, Buchanan (24656)",46.0
2,24149,37.011934,-80.41897,Riner,Montgomery,3530.0,"Riner, Montgomery (24149)",52.0
3,23125,37.342721,-76.27989,New Point,Mathews,255.0,"New Point, Mathews (23125)",1.0
4,22652,38.823987,-78.4531,Fort Valley,Shenandoah,1339.0,"Fort Valley, Shenandoah (22652)",7.0
5,22311,38.837312,-77.12064,Alexandria,Alexandria City,18952.0,"Alexandria, Alexandria City (22311)",260.0
6,24174,37.357587,-79.66552,Thaxton,Bedford,2762.0,"Thaxton, Bedford (24174)",18.0
7,23601,37.053346,-76.45948,Newport News,Newport News City,25578.0,"Newport News, Newport News City (23601)",255.0
8,22578,37.64424,-76.36034,White Stone,Lancaster,2315.0,"White Stone, Lancaster (22578)",23.0
9,23389,37.661513,-75.8315,Harborton,Accomack,258.0,"Harborton, Accomack (23389)",


The tenth row has a NaN; I need to set this, and all other NaNs, to 0.

In [68]:
df_ZipCOVIDUnion.fillna(0, inplace=True)
df_ZipCOVIDUnion.head(10)

Unnamed: 0,zip,Latitude,Longitude,city,county,pop,areaName,DecNumCases
0,23887,36.563755,-77.82652,Valentines,Brunswick,387.0,"Valentines, Brunswick (23887)",0.0
1,24656,37.198005,-82.12193,Vansant,Buchanan,2812.0,"Vansant, Buchanan (24656)",46.0
2,24149,37.011934,-80.41897,Riner,Montgomery,3530.0,"Riner, Montgomery (24149)",52.0
3,23125,37.342721,-76.27989,New Point,Mathews,255.0,"New Point, Mathews (23125)",1.0
4,22652,38.823987,-78.4531,Fort Valley,Shenandoah,1339.0,"Fort Valley, Shenandoah (22652)",7.0
5,22311,38.837312,-77.12064,Alexandria,Alexandria City,18952.0,"Alexandria, Alexandria City (22311)",260.0
6,24174,37.357587,-79.66552,Thaxton,Bedford,2762.0,"Thaxton, Bedford (24174)",18.0
7,23601,37.053346,-76.45948,Newport News,Newport News City,25578.0,"Newport News, Newport News City (23601)",255.0
8,22578,37.64424,-76.36034,White Stone,Lancaster,2315.0,"White Stone, Lancaster (22578)",23.0
9,23389,37.661513,-75.8315,Harborton,Accomack,258.0,"Harborton, Accomack (23389)",0.0


### Calculate the per capita number of December COVID-19 cases for each Virginia zip code

In [70]:
df_ZipCOVIDUnion['DecCOVIDPerCap'] = df_ZipCOVIDUnion['DecNumCases'] / df_ZipCOVIDUnion['pop']
df_ZipCOVIDUnion.head(10)

Unnamed: 0,zip,Latitude,Longitude,city,county,pop,areaName,DecNumCases,DecCOVIDPerCap
0,23887,36.563755,-77.82652,Valentines,Brunswick,387.0,"Valentines, Brunswick (23887)",0.0,0.0
1,24656,37.198005,-82.12193,Vansant,Buchanan,2812.0,"Vansant, Buchanan (24656)",46.0,0.016358
2,24149,37.011934,-80.41897,Riner,Montgomery,3530.0,"Riner, Montgomery (24149)",52.0,0.014731
3,23125,37.342721,-76.27989,New Point,Mathews,255.0,"New Point, Mathews (23125)",1.0,0.003922
4,22652,38.823987,-78.4531,Fort Valley,Shenandoah,1339.0,"Fort Valley, Shenandoah (22652)",7.0,0.005228
5,22311,38.837312,-77.12064,Alexandria,Alexandria City,18952.0,"Alexandria, Alexandria City (22311)",260.0,0.013719
6,24174,37.357587,-79.66552,Thaxton,Bedford,2762.0,"Thaxton, Bedford (24174)",18.0,0.006517
7,23601,37.053346,-76.45948,Newport News,Newport News City,25578.0,"Newport News, Newport News City (23601)",255.0,0.00997
8,22578,37.64424,-76.36034,White Stone,Lancaster,2315.0,"White Stone, Lancaster (22578)",23.0,0.009935
9,23389,37.661513,-75.8315,Harborton,Accomack,258.0,"Harborton, Accomack (23389)",0.0,0.0


In [92]:
df_ZipCOVIDUnion.replace([np.inf, -np.inf], np.nan)
df_ZipCOVIDUnion.fillna(0, inplace=True)
df_ZipCOVIDUnion.dtypes

zip                object
Latitude          float64
Longitude         float64
city               object
county             object
pop               float64
areaName           object
DecNumCases       float64
DecCOVIDPerCap    float64
dtype: object

In [95]:
df_ZipCOVIDUnion.describe()


Unnamed: 0,Latitude,Longitude,pop,DecNumCases,DecCOVIDPerCap
count,891.0,891.0,891.0,891.0,891.0
mean,37.655585,-78.352604,9307.725028,101.529742,inf
std,0.741489,1.770358,13765.797134,153.62516,
min,36.554164,-83.48322,0.0,0.0,0.0
25%,37.023567,-79.36043,831.0,7.0,0.006452
50%,37.528701,-77.99875,2777.0,31.0,0.009658
75%,38.23487,-77.10044,11063.5,134.0,0.013966
max,39.345906,-75.3678,82573.0,1232.0,inf
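
The `inf` entries in the DecCOVIDPerCap summary appear because at least one zip code has a population of 0 (the `pop` minimum is 0.0), and because `DataFrame.replace` returns a new DataFrame rather than changing the original unless the result is assigned. A minimal sketch on made-up numbers of one way to neutralise those values; treating zero-population zip codes as 0 per capita is an assumption here, and dropping them would be an equally reasonable choice.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'pop': [100.0, 0.0, 0.0], 'DecNumCases': [5.0, 3.0, 0.0]})
toy['DecCOVIDPerCap'] = toy['DecNumCases'] / toy['pop']   # 0.05, inf, NaN

# replace() returns a new DataFrame, so the result has to be assigned
toy = toy.replace([np.inf, -np.inf], np.nan)
toy['DecCOVIDPerCap'] = toy['DecCOVIDPerCap'].fillna(0)
print(toy['DecCOVIDPerCap'].tolist())
```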


In [101]:
# calculate the threshold separating above average from below average December cases per capita
threshold_DecCasesPerCap = df_ZipCOVIDUnion['DecNumCases'].mean() / df_ZipCOVIDUnion['pop'].mean()
threshold_DecCasesPerCap

0.010908115737950074
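
Note that this threshold is the mean case count divided by the mean population, which equals total cases divided by total population: a population-weighted rate. That is not the same quantity as the simple mean of the per zip code rates, which small zip codes can pull strongly. A toy comparison with made-up numbers:

```python
import pandas as pd

# one small zip code with a high rate, one large zip code with a low rate
toy = pd.DataFrame({'pop': [100, 10000], 'DecNumCases': [5, 100]})
per_capita = toy['DecNumCases'] / toy['pop']          # 0.05 and 0.01

# what the cell above computes: total cases over total population
ratio_of_means = toy['DecNumCases'].mean() / toy['pop'].mean()

# the unweighted average of the per zip code rates
mean_of_ratios = per_capita.mean()

print(ratio_of_means, mean_of_ratios)
```

In this example the weighted rate stays close to the large zip code's 1%, while the unweighted mean is 3%.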

### Threshold For Above or Below December COVID-19 Cases Per Capita in Virginia Zip Codes: 1.1% of zip code population

Add a column to the df_ZipCOVIDUnion DataFrame with 'below' when DecCOVIDPerCap is < 0.011 and 'above' otherwise.

In [102]:
df_ZipCOVIDUnion['aboveorbelow'] = np.where(df_ZipCOVIDUnion['DecCOVIDPerCap'] < 0.011, 'below', 'above')
df_ZipCOVIDUnion.head()

Unnamed: 0,zip,Latitude,Longitude,city,county,pop,areaName,DecNumCases,DecCOVIDPerCap,aboveorbelow
0,23887,36.563755,-77.82652,Valentines,Brunswick,387.0,"Valentines, Brunswick (23887)",0.0,0.0,below
1,24656,37.198005,-82.12193,Vansant,Buchanan,2812.0,"Vansant, Buchanan (24656)",46.0,0.016358,above
2,24149,37.011934,-80.41897,Riner,Montgomery,3530.0,"Riner, Montgomery (24149)",52.0,0.014731,above
3,23125,37.342721,-76.27989,New Point,Mathews,255.0,"New Point, Mathews (23125)",1.0,0.003922,below
4,22652,38.823987,-78.4531,Fort Valley,Shenandoah,1339.0,"Fort Valley, Shenandoah (22652)",7.0,0.005228,below


For the remaining work I only need the following columns:
* zip
* Latitude
* Longitude
* areaName
* aboveorbelow

In [103]:
df_ZipsToUse = df_ZipCOVIDUnion.drop(['city','county','pop','DecNumCases','DecCOVIDPerCap'], axis=1)
print(df_ZipsToUse.shape)
df_ZipsToUse.head()

(891, 5)


Unnamed: 0,zip,Latitude,Longitude,areaName,aboveorbelow
0,23887,36.563755,-77.82652,"Valentines, Brunswick (23887)",below
1,24656,37.198005,-82.12193,"Vansant, Buchanan (24656)",above
2,24149,37.011934,-80.41897,"Riner, Montgomery (24149)",above
3,23125,37.342721,-76.27989,"New Point, Mathews (23125)",below
4,22652,38.823987,-78.4531,"Fort Valley, Shenandoah (22652)",below


#### Collect the venues near the latitude, longitude of the zip codes
For each zip code, query the FourSquare API for up to 100 venues (the LIMIT value set in the next cell) within 2000m (2 km) of the zip code's latitude and longitude.

In [104]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # replace with your own FourSquare client ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # replace with your own FourSquare client secret
VERSION = '20180605'
LIMIT = 100 # maximum number of venues requested per zip code

In [105]:
names = df_ZipsToUse['areaName']
latitudes = df_ZipsToUse['Latitude']
longitudes = df_ZipsToUse['Longitude']
radius = 2000

In [106]:
explore_venues_list = []

for name, lat, lng in zip(names, latitudes, longitudes):
     
    # create the API request URL
    explore_url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)

    try:
        # make the GET request
        explore_results = requests.get(explore_url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        explore_venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in explore_results])
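    except Exception:
        # the request failed or the response lacked the expected fields: record a placeholder row
        explore_venues_list.append([('NaN','NaN','NaN','NaN','NaN','NaN','NaN')])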
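
The loop above can be split into two small, separately testable pieces: building the request URL and turning a parsed response into rows. The sketch below is one possible arrangement, not the code that produced the results in this notebook. The function names `build_explore_url` and `extract_venue_rows` are hypothetical, and the sample payload is made up to mirror only the fields the loop reads.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL with URL-encoded query parameters."""
    query = urlencode({
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',
        'radius': radius,
        'limit': limit,
    })
    return f'https://api.foursquare.com/v2/venues/explore?{query}'

def extract_venue_rows(payload, name, lat, lng):
    """Turn one parsed response into (area, lat, lng, venue, vlat, vlng, category) rows."""
    rows = []
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    for item in items:
        venue = item['venue']
        # a venue without any category gets None instead of raising IndexError
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# made-up payload shaped like the fields read in the loop above
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Tower Coffee',
               'location': {'lat': 37.353405, 'lng': -76.285563},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Unlabelled Place',
               'location': {'lat': 37.0, 'lng': -76.0},
               'categories': []}},
]}]}}

rows = extract_venue_rows(sample, 'New Point, Mathews (23125)', 37.342721, -76.27989)
url = build_explore_url('ID', 'SECRET', '20180605', 37.342721, -76.27989, 2000, 100)
print(url)
print(rows)
```

Handling a missing category inside the parser means one incomplete venue no longer discards every venue returned for that zip code.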

In [109]:
nearby_venues = pd.DataFrame([item for venue_list in explore_venues_list for item in venue_list])
nearby_venues.columns = ['areaName', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']

In [144]:
print(nearby_venues.shape)
nearby_venues.head()

(15250, 7)


Unnamed: 0,areaName,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Riner, Montgomery (24149)",37.011934,-80.41897,bread basket,37.018349,-80.428886,Bakery
1,"Riner, Montgomery (24149)",37.011934,-80.41897,B & B Quality Fencing Inc,37.023586,-80.426829,Moving Target
2,"New Point, Mathews (23125)",37.342721,-76.27989,New Point RV Resort,37.34438,-76.276341,RV Park
3,"New Point, Mathews (23125)",37.342721,-76.27989,Warung Kopi Fauzan Lampineung,37.353405,-76.285563,Café
4,"New Point, Mathews (23125)",37.342721,-76.27989,Tower Coffee,37.353405,-76.285563,Coffee Shop


## Methodology 

### Exploratory Data Analysis

Before training and testing the Support Vector Machine I would like to explore the two DataFrames I have created:

1. df_ZipsToUse: contains the Virginia zip codes, areaNames, and the categorical variable aboveorbelow, which designates whether the zip code's December 2020 per capita COVID-19 infections are above or below the Virginia per capita mean.
2. nearby_venues: contains venues near the areas and the venue category 

#### Exploring df_ZipsToUse

How many zip codes have above average per capita infections, and how many have below average per capita infections?

In [111]:
df_ZipsToUse.groupby('aboveorbelow').count()

Unnamed: 0_level_0,zip,Latitude,Longitude,areaName
aboveorbelow,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
above,361,361,361,361
below,530,530,530,530


**361** zip codes are ***above*** the mean December 2020 per capita COVID-19 infections.

**530** zip codes are ***below*** the mean December 2020 per capita COVID-19 infections.
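
These counts also give a simple reference point for any classifier trained on the labels: a model that always predicts the larger class would already be correct for that class's share of zip codes. A small arithmetic sketch using the two counts above (the variable names are illustrative):

```python
counts = {'above': 361, 'below': 530}
total = sum(counts.values())

# share of zip codes in each class
shares = {label: n / total for label, n in counts.items()}

# accuracy of always predicting the majority class
majority_baseline = max(shares.values())
print(total, round(shares['above'], 3), round(majority_baseline, 3))
```

Always predicting "below" would be right for roughly 59% of zip codes, so a model's accuracy is best read against that figure rather than against 50%.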

**Where are the above's and below's in Virginia?**

Red dots are zip codes with above average per capita infections in December.

Blue dots are zip codes with below average per capita infections in December.


In [129]:
map_abovebelow = folium.Map(location=[37.926868, -79.124902], zoom_start=7)

# add markers to map
for lat, lng, label, colorcat in zip(df_ZipsToUse['Latitude'], df_ZipsToUse['Longitude'], df_ZipsToUse['areaName'],df_ZipsToUse['aboveorbelow']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color= 'red' if colorcat == 'above' else 'blue',
        fill=True,
        fill_color = 'lightcoral' if colorcat == 'above' else 'lightblue',
        fill_opacity=0.7,
        parse_html=False).add_to(map_abovebelow)  

map_abovebelow

#### Exploring nearby_venues

What are the counts of the venues in each category?


In [147]:
nearby_venues.groupby('Venue Category').count()

Unnamed: 0_level_0,areaName,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
ATM,14,14,14,14,14,14
Accessories Store,13,13,13,13,13,13
Adult Boutique,4,4,4,4,4,4
Advertising Agency,3,3,3,3,3,3
Afghan Restaurant,7,7,7,7,7,7
...,...,...,...,...,...,...
Wings Joint,46,46,46,46,46,46
Women's Store,27,27,27,27,27,27
Yoga Studio,54,54,54,54,54,54
Zoo,4,4,4,4,4,4


Here are the top 25 categories of venues in the state of Virginia:

In [137]:
nearby_venues.groupby('Venue Category').count().sort_values(by='areaName', ascending=False).head(25)

Unnamed: 0_level_0,areaName,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Pizza Place,595,595,595,595,595,595
Fast Food Restaurant,530,530,530,530,530,530
Convenience Store,494,494,494,494,494,494
American Restaurant,492,492,492,492,492,492
Sandwich Place,484,484,484,484,484,484
Coffee Shop,386,386,386,386,386,386
Discount Store,330,330,330,330,330,330
Mexican Restaurant,328,328,328,328,328,328
Hotel,291,291,291,291,291,291
Park,289,289,289,289,289,289
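
Because `groupby(...).count()` repeats the same count in every column, a more compact way to get the same ranking is `value_counts()` on the category column alone. A sketch on toy rows (the area names and venues are made up):

```python
import pandas as pd

toy_venues = pd.DataFrame({
    'areaName': ['A', 'A', 'B', 'B', 'B'],
    'Venue Category': ['Pizza Place', 'Coffee Shop', 'Pizza Place', 'Park', 'Pizza Place'],
})

# one count per category, sorted from most to least frequent
top_categories = toy_venues['Venue Category'].value_counts().head(2)
print(top_categories)
```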


Here are the top 50 venue categories when the areaName has an **above average** per capita infection rate:

In [154]:
df_topVenuesAbove = nearby_venues[nearby_venues['areaName'].isin(df_ZipsToUse[df_ZipsToUse['aboveorbelow'] == 'above']['areaName'])].groupby('Venue Category').count().sort_values(by='areaName', ascending=False).head(50)
df_topVenuesAbove.reset_index(inplace=True)
df_topVenuesAbove.head(50)

Unnamed: 0,Venue Category,areaName,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude
0,Fast Food Restaurant,233,233,233,233,233,233
1,Pizza Place,224,224,224,224,224,224
2,American Restaurant,187,187,187,187,187,187
3,Sandwich Place,185,185,185,185,185,185
4,Convenience Store,176,176,176,176,176,176
5,Discount Store,145,145,145,145,145,145
6,Hotel,118,118,118,118,118,118
7,Mexican Restaurant,115,115,115,115,115,115
8,Coffee Shop,108,108,108,108,108,108
9,Pharmacy,107,107,107,107,107,107


Here are the top 50 venue categories when the areaName has a **below average** per capita infection rate:

In [155]:
df_bottomVenuesAbove = nearby_venues[nearby_venues['areaName'].isin(df_ZipsToUse[df_ZipsToUse['aboveorbelow'] == 'below']['areaName'])].groupby('Venue Category').count().sort_values(by='areaName', ascending=False).head(50)
df_bottomVenuesAbove.reset_index(inplace=True)
df_bottomVenuesAbove.head(50)

Unnamed: 0,Venue Category,areaName,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude
0,Pizza Place,371,371,371,371,371,371
1,Convenience Store,318,318,318,318,318,318
2,American Restaurant,305,305,305,305,305,305
3,Sandwich Place,299,299,299,299,299,299
4,Fast Food Restaurant,297,297,297,297,297,297
5,Coffee Shop,278,278,278,278,278,278
6,Mexican Restaurant,213,213,213,213,213,213
7,Park,199,199,199,199,199,199
8,Discount Store,185,185,185,185,185,185
9,Hotel,173,173,173,173,173,173


#### Findings after EDA

There is not much difference in the categories that make up the two top 50 lists.

I will focus only on venues whose categories appear in a top 50 list, so I need to filter the nearby_venues DataFrame to include only those venues.

**First**, I will get the union of the above average and below average top 50 venue categories.
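
One way this union and filter could look, sketched on toy data. In the notebook the two inputs would presumably be the 'Venue Category' columns of df_topVenuesAbove and df_bottomVenuesAbove; that mapping, and the variable names below, are assumptions.

```python
import pandas as pd

# stand-ins for the two top-category lists
top_above = pd.Series(['Fast Food Restaurant', 'Pizza Place', 'Pharmacy'])
top_below = pd.Series(['Pizza Place', 'Convenience Store', 'Park'])

# set union removes the categories that appear in both lists
categories_to_keep = sorted(set(top_above) | set(top_below))

toy_venues = pd.DataFrame({
    'Venue': ['v1', 'v2', 'v3'],
    'Venue Category': ['Pizza Place', 'Zoo', 'Park'],
})

# keep only venues whose category is in the union
filtered = toy_venues[toy_venues['Venue Category'].isin(categories_to_keep)]
print(categories_to_keep)
print(filtered)
```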

## Results

**TO DO**
Discuss the results.

## Discussion

**TO DO**
Discuss any observations you noted and any recommendations you can make based on the results.

## Conclusion

**TO DO**
Conclude the report.

## References
Virginia Department of Health: https://www.vdh.virginia.gov/coronavirus/frequently-asked-questions/general-questions/