<h1>Coursera Data Science Capstone Project</h1>

<h2>Introduction</h2>
COVID-19 cases have risen sharply across India, and especially in the state of Andhra Pradesh.
To understand why this is happening in my state, we analyze the number of active, confirmed and deceased cases together with the most common venues (obtained via the Foursquare API) for each district, looking for factors that may explain the steady rise in cases.

<h2>Business Problem</h2>
Using this analysis, companies can identify which categories of business may be contributing to the rise in cases, and implement the protocols and measures needed to reduce their role in spreading the virus.

<h2>Data</h2>
In this project I will be using data on the districts of Andhra Pradesh (name, population and area), scraped from Wikipedia using BeautifulSoup. This data provides a few of the parameters for the analysis.<br>
We obtain the COVID-19 statistics (active, confirmed and deceased cases) for each district via the covid19india.org API, which serves the main objective of our exploration.<br>
The latitude and longitude of each district are obtained using the Nominatim geocoder from the geopy package.<br>
We use the explore endpoint of the Foursquare API to obtain venues around each district.<br>


Importing required Packages

In [1]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import numpy as np
import requests, json
import folium
from geopy import Nominatim
from bs4 import BeautifulSoup
from pandas import json_normalize
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

We will import the COVID-19 case data from the <strong>covid19india.org</strong> API, which is a reliable and regularly updated database.

In [2]:
covidInfoURL = 'https://api.covid19india.org/state_district_wise.json'

In [3]:
covid = pd.read_json(covidInfoURL)
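
The nested layout of this JSON is easiest to see on a small example. The sketch below assumes the payload maps each state name to a districtData dictionary whose entries carry active, confirmed and deceased counts, which is how the notebook indexes it later. The sample dictionary and the district_rows helper are illustrative names, not part of the API, and the figures are the Anantapur and Chittoor values shown further down.

```python
# Hypothetical miniature of the state_district_wise.json payload.
# Only the keys this notebook reads are included.
sample = {
    "Andhra Pradesh": {
        "districtData": {
            "Anantapur": {"active": 2206, "confirmed": 6266, "deceased": 80},
            "Chittoor": {"active": 2521, "confirmed": 5668, "deceased": 64},
        }
    }
}


def district_rows(payload, state):
    """Flatten one state's districtData into a list of row dictionaries."""
    rows = []
    for name, stats in payload[state]["districtData"].items():
        rows.append({
            "District": name,
            "Active": stats["active"],
            "Confirmed": stats["confirmed"],
            "Deceased": stats["deceased"],
        })
    return rows


rows = district_rows(sample, "Andhra Pradesh")
for row in rows:
    print(row)
```

A list of dictionaries like this can be passed straight to pd.DataFrame to obtain one row per district.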

Scraping Andhra Pradesh district data from Wikipedia using <strong>BeautifulSoup</strong>

In [4]:
# Download the Wikipedia page that lists the districts of Andhra Pradesh
districtHTML = requests.get('https://en.wikipedia.org/wiki/List_of_districts_of_Andhra_Pradesh').text
soup = BeautifulSoup(districtHTML, 'lxml')
districtsTable = soup.find("table", {"class": "wikitable"})

# Collect one dictionary per table row (DataFrame.append was removed in pandas 2.0)
rows = []
for row in districtsTable.find_all('tr')[1:]:
    cells = row.find_all('td')
    rows.append({
        'Districts': str(cells[1].find(string = True)).strip(),
        'Population': str(cells[5].find(string = True)).strip(),
        'Area': str(cells[6].find(string = True)).strip(),
    })
districtsDf = pd.DataFrame(rows, columns = ['Districts', 'Population', 'Area'])

# Rename two districts so that they match the names used in the COVID-19 data
districtsDf.replace(to_replace = {'YSR Kadapa district': "Y.S.R. Kadapa", "Nellore": 'S.P.S. Nellore'}, inplace = True)
districtsDf.head(13)

Unnamed: 0,Districts,Population,Area
0,Anantapur,4083315,19130
1,Chittoor,4170468,15152
2,East Godavari,5151549,10807
3,Guntur,4889230,11391
4,Y.S.R. Kadapa,2884524,15359
5,Krishna,4529009,8727
6,Kurnool,4046601,17658
7,S.P.S. Nellore,2966082,13076
8,Prakasam,3392764,17626
9,Srikakulam,2699471,5837


Creating a new DataFrame <strong><i>districtCovidDf</i></strong> to hold all the district information along with the COVID-19 stats for each district.

In [5]:
# Look up the COVID-19 figures for each district and collect the rows in a list
# (DataFrame.append was removed in pandas 2.0)
rows = []
for district, pop, area in zip(districtsDf['Districts'], districtsDf['Population'], districtsDf['Area']):
    d = covid['Andhra Pradesh']['districtData'][district]
    rows.append({'District': district, 'Active': d['active'], 'Confirmed': d['confirmed'], 'Deceased': d['deceased'], 'Population': pop, 'Area': area})
districtCovidDf = pd.DataFrame(rows, columns = ['District', 'Active', 'Confirmed', 'Deceased', 'Population', 'Area'])

# Switch back to the short district names for geocoding and display
districtCovidDf.replace(to_replace = {"Y.S.R. Kadapa": 'Kadapa', 'S.P.S. Nellore': 'Nellore'}, inplace = True)
districtCovidDf.head(13)

Unnamed: 0,District,Active,Confirmed,Deceased,Population,Area
0,Anantapur,2206,6266,80,4083315,19130
1,Chittoor,2521,5668,64,4170468,15152
2,East Godavari,5768,8647,82,5151549,10807
3,Guntur,3960,6913,78,4889230,11391
4,Kadapa,1655,3349,28,2884524,15359
5,Krishna,1679,4252,118,4529009,8727
6,Kurnool,3030,7797,135,4046601,17658
7,Nellore,2000,3010,22,2966082,13076
8,Prakasam,1004,2433,42,3392764,17626
9,Srikakulam,1858,3215,39,2699471,5837
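
Raw counts are hard to compare between districts of different size, so population and area can be used to normalise them. A minimal sketch, assuming the scraped Population and Area values arrive as strings (possibly with thousands separators) and that Area is in square kilometres as on the Wikipedia page. The helper names to_int and rates are hypothetical.

```python
def to_int(text):
    """Convert a scraped number such as '4,083,315' or '4083315' to an int."""
    return int(str(text).replace(",", "").strip())


def rates(confirmed, deceased, population, area_km2):
    """Return simple normalised indicators for one district."""
    population = to_int(population)
    area_km2 = to_int(area_km2)
    return {
        "confirmed_per_100k": confirmed / population * 100_000,
        "crude_fatality_pct": deceased / confirmed * 100,
        "people_per_km2": population / area_km2,
    }


# Anantapur figures from the table above
anantapur = rates(confirmed=6266, deceased=80, population="4083315", area_km2="19130")
for key, value in anantapur.items():
    print(f"{key}: {value:.2f}")
```

Indicators of this kind make it possible to compare a large district such as Anantapur with a small one such as Srikakulam on equal terms.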


Obtaining latitude and longitude data from geopy and inserting them into our districtCovidDf

In [6]:
# Create the geocoder once and reuse it for every district
geolocator = Nominatim(user_agent = 'Coursera')
latlon = dict(Latitude = [], Longitude = [])
for d in districtCovidDf['District']:
    g = geolocator.geocode("{}, India".format(d))
    latlon['Latitude'].append(g.latitude)
    latlon['Longitude'].append(g.longitude)

# Append the coordinates as two new columns
districtCovidDf = pd.concat([districtCovidDf, pd.DataFrame(latlon)], axis = 1)
districtCovidDf.head(13)

Unnamed: 0,District,Active,Confirmed,Deceased,Population,Area,Latitude,Longitude
0,Anantapur,2206,6266,80,4083315,19130,14.654623,77.55626
1,Chittoor,2521,5668,64,4170468,15152,13.160105,79.155551
2,East Godavari,5768,8647,82,5151549,10807,17.233496,81.722599
3,Guntur,3960,6913,78,4889230,11391,16.291519,80.454159
4,Kadapa,1655,3349,28,2884524,15359,14.475294,78.821686
5,Krishna,1679,4252,118,4529009,8727,16.669152,80.719002
6,Kurnool,3030,7797,135,4046601,17658,15.830925,78.042537
7,Nellore,2000,3010,22,2966082,13076,14.449372,79.987376
8,Prakasam,1004,2433,42,3392764,17626,15.5,79.5
9,Srikakulam,1858,3215,39,2699471,5837,18.320022,83.916077
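
The geocoding loop assumes every lookup succeeds. geopy's geocode returns None when nothing matches, in which case reading the latitude raises an AttributeError. One way to guard against this is sketched below with a stand-in lookup function so that it runs without network access. fake_geocode and safe_coordinates are hypothetical names; in the notebook the real geocode method would be passed instead.

```python
from collections import namedtuple

# Stand-in for the object geopy returns; only the two attributes used here.
Location = namedtuple("Location", ["latitude", "longitude"])


def fake_geocode(query):
    """Hypothetical offline geocoder: knows one place, returns None otherwise."""
    known = {"Anantapur, India": Location(14.654623, 77.55626)}
    return known.get(query)


def safe_coordinates(geocode, district):
    """Return (latitude, longitude), or (None, None) if the lookup finds nothing."""
    result = geocode("{}, India".format(district))
    if result is None:
        return (None, None)
    return (result.latitude, result.longitude)


print(safe_coordinates(fake_geocode, "Anantapur"))
print(safe_coordinates(fake_geocode, "No Such District"))
```

Districts that come back as (None, None) can then be inspected or geocoded again with a more specific query before they are plotted.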


Getting the coordinates for the state of Andhra Pradesh

In [7]:
ll = Nominatim(user_agent = 'Coursera').geocode("Andhra Pradesh, India")
AP = [ll.latitude, ll.longitude]

Let's plot all the districts on the map of Andhra Pradesh using folium and CircleMarker

In [53]:
APMap = folium.Map(AP, zoom_start = 7, tiles ="OpenStreetMap")
for district, lat, lon, active, confirmed, deceased in zip(districtCovidDf['District'], districtCovidDf['Latitude'], districtCovidDf['Longitude'],districtCovidDf['Active'], districtCovidDf['Confirmed'], districtCovidDf['Deceased']):
    tool= folium.Tooltip(text = '{} Active : {} Confirmed : {} Deceased : {}'.format(district, active,confirmed, deceased))
    folium.CircleMarker(
        location = [lat, lon],
        radius = 10,
        tooltip = tool
        
    ).add_to(APMap)
APMap

Credentials necessary to call the Foursquare API

In [9]:
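import os

# Read the Foursquare credentials from environment variables rather than hard-coding them.
# The variable names below are an assumption of this notebook; set them before running.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605'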

Using the <strong>get_category_type</strong> function we can obtain the category of each venue

In [11]:
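def get_category_type(row):
    # The category list is stored under 'categories' or, in a flattened frame, under 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # Return the name of the first category, or None if the venue has none
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']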

Using the <strong>getNearbyVenues</strong> function we can iterate through the districts and store the venue data that the Foursquare API returns for each district.

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT = 250):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
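
Building the query string by hand with str.format works, but letting the standard library encode the parameters avoids small mistakes such as the stray ?& at the start of the query above. A sketch of the same request URL built with urllib.parse.urlencode follows. build_explore_url is a hypothetical helper and the credentials shown are placeholders.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, radius, limit, client_id, client_secret, version):
    """Return the explore URL with the same parameters the notebook sends."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes reserved characters, e.g. the comma in "ll"
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


url = build_explore_url(14.654623, 77.55626, 65000, 250,
                        client_id="YOUR_ID", client_secret="YOUR_SECRET",
                        version="20180605")
print(url)
```

requests.get also accepts such a dictionary directly through its params argument, which removes the need to assemble the URL at all.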

#### Let's call the getNearbyVenues function on the districtCovidDf DataFrame to obtain venues within a 65 km radius of each district, with a limit of 250 results per district

In [57]:
AP_venues = getNearbyVenues(names=districtCovidDf['District'],
                                   latitudes=districtCovidDf['Latitude'],
                                   longitudes=districtCovidDf['Longitude'],
                                  radius = 65000, LIMIT = 250)


Anantapur
Chittoor
East Godavari
Guntur
Kadapa
Krishna
Kurnool
Nellore
Prakasam
Srikakulam
Visakhapatnam
Vizianagaram
West Godavari


### Categories
Let's see all the categories of businesses the Foursquare API has given us.<br>
There are a total of 70 unique categories.

In [60]:
AP_venues['Venue Category'].value_counts()

Indian Restaurant                 75
Hotel                             33
Café                              22
Multiplex                         20
Train Station                     15
Ice Cream Shop                    14
Pizza Place                       13
Coffee Shop                       11
Fast Food Restaurant              10
Bakery                             9
Shopping Mall                      9
Food Court                         8
Restaurant                         8
Movie Theater                      7
Beach                              7
Vegetarian / Vegan Restaurant      7
Department Store                   6
Indie Movie Theater                6
Andhra Restaurant                  5
Breakfast Spot                     5
Mountain                           5
Bus Station                        5
Snack Place                        5
Diner                              4
Rest Area                          3
Clothing Store                     3
South Indian Restaurant            3
C

In [15]:
print(AP_venues.shape)

(384, 7)


In [16]:
AP_venues.groupby('District').count()

District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Anantapur,7,7,7,7,7,7
Chittoor,69,69,69,69,69,69
East Godavari,29,29,29,29,29,29
Guntur,64,64,64,64,64,64
Kadapa,5,5,5,5,5,5
Krishna,63,63,63,63,63,63
Kurnool,4,4,4,4,4,4
Nellore,22,22,22,22,22,22
Prakasam,4,4,4,4,4,4
Srikakulam,5,5,5,5,5,5


In [17]:
print('There are {} unique categories in Andhra Pradesh.'.format(len(AP_venues['Venue Category'].unique())))

There are 70 unique categories in Andhra Pradesh.


#### Analyzing each district

In [18]:
AP_onehot = pd.get_dummies(AP_venues[['Venue Category']], prefix="", prefix_sep="")

AP_onehot['District'] = AP_venues['District'] 

fixed_columns = [AP_onehot.columns[-1]] + list(AP_onehot.columns[:-1])
AP_onehot = AP_onehot[fixed_columns]

AP_onehot

Unnamed: 0,District,Airport,Airport Terminal,American Restaurant,Andhra Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,Bakery,Bar,Beach,Bed & Breakfast,Boarding House,Bookstore,Breakfast Spot,Buffet,Burger Joint,Bus Station,Café,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Cricket Ground,Department Store,Diner,Electronics Store,Fabric Shop,Fast Food Restaurant,Food,Food Court,Fried Chicken Joint,Gym,Harbor / Marina,Historic Site,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Juice Bar,Kids Store,Lounge,Market,Mattress Store,Mediterranean Restaurant,Men's Store,Motel,Motorcycle Shop,Mountain,Movie Theater,Multicuisine Indian Restaurant,Multiplex,Outdoors & Recreation,Park,Pier,Pizza Place,Pub,Resort,Rest Area,Restaurant,Science Museum,Shopping Mall,Snack Place,South Indian Restaurant,Stadium,Steakhouse,Trail,Train Station,Vegetarian / Vegan Restaurant,Watch Shop
0,Anantapur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Anantapur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
2,Anantapur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
3,Anantapur,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Anantapur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
5,Anantapur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,Anantapur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
7,Chittoor,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,Chittoor,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,Chittoor,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [19]:
AP_onehot.shape

(384, 71)

#### Next, let's group the rows by district and take the mean of the frequency of occurrence of each category

In [20]:
AP_grouped = AP_onehot.groupby('District').mean().reset_index()
AP_grouped

Unnamed: 0,District,Airport,Airport Terminal,American Restaurant,Andhra Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,Bakery,Bar,Beach,Bed & Breakfast,Boarding House,Bookstore,Breakfast Spot,Buffet,Burger Joint,Bus Station,Café,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Cricket Ground,Department Store,Diner,Electronics Store,Fabric Shop,Fast Food Restaurant,Food,Food Court,Fried Chicken Joint,Gym,Harbor / Marina,Historic Site,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Juice Bar,Kids Store,Lounge,Market,Mattress Store,Mediterranean Restaurant,Men's Store,Motel,Motorcycle Shop,Mountain,Movie Theater,Multicuisine Indian Restaurant,Multiplex,Outdoors & Recreation,Park,Pier,Pizza Place,Pub,Resort,Rest Area,Restaurant,Science Museum,Shopping Mall,Snack Place,South Indian Restaurant,Stadium,Steakhouse,Trail,Train Station,Vegetarian / Vegan Restaurant,Watch Shop
0,Anantapur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.142857,0.285714,0.0
1,Chittoor,0.0,0.014493,0.014493,0.0,0.0,0.014493,0.014493,0.0,0.014493,0.0,0.014493,0.014493,0.0,0.014493,0.014493,0.014493,0.014493,0.101449,0.028986,0.0,0.0,0.0,0.0,0.028986,0.0,0.0,0.0,0.0,0.014493,0.014493,0.0,0.0,0.0,0.014493,0.086957,0.0,0.362319,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014493,0.0,0.0,0.0,0.028986,0.0,0.0,0.014493,0.0,0.0,0.0,0.014493,0.043478,0.0,0.0,0.014493,0.043478,0.014493,0.0
2,East Godavari,0.0,0.0,0.0,0.0,0.034483,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.034483,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.068966,0.0,0.206897,0.0,0.0,0.0,0.0,0.034483,0.0,0.034483,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.137931,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.034483,0.0
3,Guntur,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.046875,0.0,0.015625,0.0,0.0,0.0,0.015625,0.0,0.0,0.03125,0.03125,0.0,0.015625,0.03125,0.046875,0.0,0.015625,0.015625,0.015625,0.0,0.0625,0.0,0.0,0.0,0.015625,0.0,0.0,0.09375,0.078125,0.15625,0.015625,0.0,0.0,0.0,0.0,0.015625,0.0,0.015625,0.0,0.0,0.0,0.0,0.046875,0.0,0.0625,0.015625,0.0,0.0,0.03125,0.015625,0.0,0.0,0.015625,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0
4,Kadapa,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0
5,Krishna,0.0,0.0,0.0,0.031746,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.047619,0.0,0.015873,0.015873,0.047619,0.0,0.015873,0.015873,0.015873,0.0,0.063492,0.0,0.0,0.0,0.015873,0.0,0.0,0.095238,0.079365,0.15873,0.015873,0.0,0.0,0.0,0.0,0.015873,0.0,0.015873,0.0,0.0,0.0,0.0,0.047619,0.0,0.063492,0.015873,0.0,0.0,0.031746,0.015873,0.0,0.0,0.015873,0.0,0.031746,0.0,0.0,0.015873,0.0,0.0,0.015873,0.015873,0.0
6,Kurnool,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0
7,Nellore,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.045455,0.0,0.045455,0.0,0.136364,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.045455,0.090909,0.0,0.0,0.0,0.0,0.0,0.136364,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455
8,Prakasam,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0
9,Srikakulam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0


In [21]:
AP_grouped.shape

(13, 71)
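
Taking the mean of the one-hot columns within a district is the same as computing each category's share of that district's venues. The plain-Python sketch below reproduces the Anantapur row from its seven venues, as implied by the grouped table above. It uses collections.Counter rather than pandas purely to make the arithmetic visible.

```python
from collections import Counter

# The seven Anantapur venues by category, as implied by the grouped row above
anantapur_categories = [
    "Vegetarian / Vegan Restaurant",
    "Vegetarian / Vegan Restaurant",
    "Bed & Breakfast",
    "Historic Site",
    "Movie Theater",
    "Shopping Mall",
    "Train Station",
]

counts = Counter(anantapur_categories)
total = len(anantapur_categories)

# Share of venues per category: identical to the mean of the one-hot column
frequencies = {category: n / total for category, n in counts.items()}

for category, share in sorted(frequencies.items()):
    print("{:<32}{:.6f}".format(category, share))
```

Because each row sums to 1, districts with very few venues (Kurnool and Prakasam have four each) get coarse frequencies such as 0.25 and 0.5, which is worth keeping in mind when comparing them with districts that have dozens of venues.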

#### Let's print each district along with the top 5 most common venues

In [22]:
num_top_venues = 5

for dist in AP_grouped['District']:
    print("----"+dist+"----")
    temp = AP_grouped[AP_grouped['District'] == dist].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Anantapur----
                           venue  freq
0  Vegetarian / Vegan Restaurant  0.29
1                Bed & Breakfast  0.14
2                  Train Station  0.14
3                  Movie Theater  0.14
4                  Historic Site  0.14


----Chittoor----
                     venue  freq
0        Indian Restaurant  0.36
1                     Café  0.10
2                    Hotel  0.09
3            Train Station  0.04
4  South Indian Restaurant  0.04


----East Godavari----
               venue  freq
0  Indian Restaurant  0.21
1          Multiplex  0.14
2        Pizza Place  0.07
3      Train Station  0.07
4              Hotel  0.07


----Guntur----
                  venue  freq
0     Indian Restaurant  0.16
1                 Hotel  0.09
2        Ice Cream Shop  0.08
3             Multiplex  0.06
4  Fast Food Restaurant  0.06


----Kadapa----
            venue  freq
0   Train Station   0.4
1       Multiplex   0.2
2     Coffee Shop   0.2
3  Mattress Store   0.2
4     Men's

In [23]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 5 venues for each district.

In [61]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
districts_venues_sorted = pd.DataFrame(columns=columns)
districts_venues_sorted['District'] = AP_grouped['District']

for ind in np.arange(AP_grouped.shape[0]):
    districts_venues_sorted.iloc[ind, 1:] = return_most_common_venues(AP_grouped.iloc[ind, :], num_top_venues)

districts_venues_sorted.head(13)

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Anantapur,Vegetarian / Vegan Restaurant,Historic Site,Train Station,Shopping Mall,Movie Theater
1,Chittoor,Indian Restaurant,Café,Hotel,Train Station,South Indian Restaurant
2,East Godavari,Indian Restaurant,Multiplex,Hotel,Department Store,Train Station
3,Guntur,Indian Restaurant,Hotel,Ice Cream Shop,Fast Food Restaurant,Multiplex
4,Kadapa,Train Station,Multiplex,Coffee Shop,Mattress Store,Department Store
5,Krishna,Indian Restaurant,Hotel,Ice Cream Shop,Multiplex,Fast Food Restaurant
6,Kurnool,Train Station,Indian Restaurant,Shopping Mall,Watch Shop,Diner
7,Nellore,Indian Restaurant,Shopping Mall,Pizza Place,Food Court,Multiplex
8,Prakasam,Hotel,Train Station,Food Court,Arts & Crafts Store,Diner
9,Srikakulam,Indian Restaurant,Train Station,Burger Joint,Motorcycle Shop,Watch Shop
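
Several categories often share the same frequency, and the order among tied values is not guaranteed by sort_values with its default sorting algorithm. That is likely why the Anantapur entries in this table differ from the printed top-5 list earlier. A small sketch of a deterministic alternative that breaks ties alphabetically is shown below. top_categories is a hypothetical helper, written in plain Python.

```python
def top_categories(frequencies, n):
    """Return the n most frequent categories, ties broken alphabetically."""
    ranked = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, _ in ranked[:n]]


# Anantapur frequencies from the grouped table (1/7 and 2/7)
anantapur = {
    "Bed & Breakfast": 1 / 7,
    "Historic Site": 1 / 7,
    "Movie Theater": 1 / 7,
    "Shopping Mall": 1 / 7,
    "Train Station": 1 / 7,
    "Vegetarian / Vegan Restaurant": 2 / 7,
}

print(top_categories(anantapur, 5))
```

With five categories tied at 1/7, which of them appear in the top 5 is arbitrary unless a tie-breaking rule like this one is fixed.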


## Clustering Districts

Run *k*-means to cluster the districts into 6 clusters.

In [39]:
kclusters = 6

# Drop the label column so that only the numeric frequencies are clustered
AP_grouped_clustering = AP_grouped.drop(columns=['District'])

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(AP_grouped_clustering)


kmeans.labels_[0:10] 

array([3, 0, 0, 0, 2, 0, 1, 0, 4, 5])
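
With only 13 districts, six clusters leave little room per cluster, and tallying the labels makes this concrete. The first ten labels below are the ones printed above. The last three are taken from the Cluster 1 table later in this notebook, where Visakhapatnam, Vizianagaram and West Godavari all carry label 0.

```python
from collections import Counter

# Labels for the 13 districts in the (alphabetical) order of AP_grouped
labels = [3, 0, 0, 0, 2, 0, 1, 0, 4, 5, 0, 0, 0]

sizes = Counter(labels)
for label in sorted(sizes):
    print("cluster label {}: {} district(s)".format(label, sizes[label]))

# Clusters that contain exactly one district
singletons = sorted(label for label, size in sizes.items() if size == 1)
print("single-district clusters:", singletons)
```

Eight districts share one label and the other five each form their own cluster, which suggests that a smaller number of clusters might summarise this data just as well.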

Let's create a new dataframe that includes the cluster label as well as the top 5 venues for each district.

In [42]:
# add clustering labels
districts_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

AP_merged = districtCovidDf

AP_merged = AP_merged.join(districts_venues_sorted.set_index('District'), on='District')

AP_merged.head()

Unnamed: 0,District,Active,Confirmed,Deceased,Population,Area,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Anantapur,2206,6266,80,4083315,19130,14.654623,77.55626,3,Vegetarian / Vegan Restaurant,Historic Site,Train Station,Shopping Mall,Movie Theater
1,Chittoor,2521,5668,64,4170468,15152,13.160105,79.155551,0,Indian Restaurant,Café,Hotel,Train Station,South Indian Restaurant
2,East Godavari,5768,8647,82,5151549,10807,17.233496,81.722599,0,Indian Restaurant,Multiplex,Hotel,Department Store,Train Station
3,Guntur,3960,6913,78,4889230,11391,16.291519,80.454159,0,Indian Restaurant,Hotel,Ice Cream Shop,Fast Food Restaurant,Multiplex
4,Kadapa,1655,3349,28,2884524,15359,14.475294,78.821686,2,Train Station,Multiplex,Coffee Shop,Mattress Store,Department Store


Finally, let's visualize the resulting clusters

In [62]:
map_clusters = folium.Map(location= AP, zoom_start=7)

# One distinct colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Draw one marker per district, coloured by its zero-based cluster label
for lat, lon, poi, cluster in zip(AP_merged['Latitude'], AP_merged['Longitude'], AP_merged['District'], AP_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Cluster Breakdown

Now we can examine each cluster and identify the venue categories that distinguish it from the others. Note that the headings below number the clusters from 1, while the Cluster Labels column is zero-based, so Cluster 1 corresponds to label 0.

#### Cluster 1

In [46]:
AP_merged.loc[AP_merged['Cluster Labels'] == 0, AP_merged.columns[[0] + list(range(5, AP_merged.shape[1]))]]

Unnamed: 0,District,Area,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Chittoor,15152,13.160105,79.155551,0,Indian Restaurant,Café,Hotel,Train Station,South Indian Restaurant
2,East Godavari,10807,17.233496,81.722599,0,Indian Restaurant,Multiplex,Hotel,Department Store,Train Station
3,Guntur,11391,16.291519,80.454159,0,Indian Restaurant,Hotel,Ice Cream Shop,Fast Food Restaurant,Multiplex
5,Krishna,8727,16.669152,80.719002,0,Indian Restaurant,Hotel,Ice Cream Shop,Multiplex,Fast Food Restaurant
7,Nellore,13076,14.449372,79.987376,0,Indian Restaurant,Shopping Mall,Pizza Place,Food Court,Multiplex
10,Visakhapatnam,11161,17.723128,83.301284,0,Indian Restaurant,Hotel,Café,Restaurant,Beach
11,Vizianagaram,6539,18.112082,83.40522,0,Indian Restaurant,Hotel,Café,Mountain,Restaurant
12,West Godavari,7742,17.0,81.166667,0,Indian Restaurant,Multiplex,Bakery,Café,Hotel


#### Cluster 2

In [48]:
AP_merged.loc[AP_merged['Cluster Labels'] == 1, AP_merged.columns[[0] + list(range(5, AP_merged.shape[1]))]]

Unnamed: 0,District,Area,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
6,Kurnool,17658,15.830925,78.042537,1,Train Station,Indian Restaurant,Shopping Mall,Watch Shop,Diner


#### Cluster 3

In [49]:
AP_merged.loc[AP_merged['Cluster Labels'] == 2, AP_merged.columns[[0] + list(range(5, AP_merged.shape[1]))]]

Unnamed: 0,District,Area,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
4,Kadapa,15359,14.475294,78.821686,2,Train Station,Multiplex,Coffee Shop,Mattress Store,Department Store


#### Cluster 4

In [50]:
AP_merged.loc[AP_merged['Cluster Labels'] == 3, AP_merged.columns[[0] + list(range(5, AP_merged.shape[1]))]]

Unnamed: 0,District,Area,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Anantapur,19130,14.654623,77.55626,3,Vegetarian / Vegan Restaurant,Historic Site,Train Station,Shopping Mall,Movie Theater


#### Cluster 5

In [51]:
AP_merged.loc[AP_merged['Cluster Labels'] == 4, AP_merged.columns[[0] + list(range(5, AP_merged.shape[1]))]]

Unnamed: 0,District,Area,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
8,Prakasam,17626,15.5,79.5,4,Hotel,Train Station,Food Court,Arts & Crafts Store,Diner


#### Cluster 6

In [52]:
AP_merged.loc[AP_merged['Cluster Labels'] == 5, AP_merged.columns[[0] + list(range(5, AP_merged.shape[1]))]]

Unnamed: 0,District,Area,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
9,Srikakulam,5837,18.320022,83.916077,5,Indian Restaurant,Train Station,Burger Joint,Motorcycle Shop,Watch Shop
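
To connect the clusters back to the case counts, the figures can be averaged per cluster label. The sketch below does this in plain Python for the ten districts whose confirmed counts appear in the tables above. The remaining three districts are not shown in this export, so the average for label 0 is incomplete, and the block illustrates the aggregation step rather than a finding.

```python
from collections import defaultdict

# (district, cluster label, confirmed cases) for the ten districts listed above
records = [
    ("Anantapur", 3, 6266), ("Chittoor", 0, 5668), ("East Godavari", 0, 8647),
    ("Guntur", 0, 6913), ("Kadapa", 2, 3349), ("Krishna", 0, 4252),
    ("Kurnool", 1, 7797), ("Nellore", 0, 3010), ("Prakasam", 4, 2433),
    ("Srikakulam", 5, 3215),
]

# Group the confirmed counts by cluster label
confirmed_by_cluster = defaultdict(list)
for _, label, confirmed in records:
    confirmed_by_cluster[label].append(confirmed)

# Average within each label
mean_confirmed = {
    label: sum(values) / len(values)
    for label, values in confirmed_by_cluster.items()
}

for label in sorted(mean_confirmed):
    print("label {}: mean confirmed = {:.1f}".format(label, mean_confirmed[label]))
```

On the full AP_merged frame the same aggregation would be a groupby on the Cluster Labels column, after making sure the case columns are numeric.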
