# Capstone Project - Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera

# Predicting Covid Intensive Zones in Delhi

## Table of contents
* [Introduction: Business Problem](#intro)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction/Business Problem <a name="intro"></a>

Covid-19 is an infectious disease that has caused havoc in the modern world by disrupting daily life for people everywhere. The virus, which primarily affects the lungs, has infected 4,543,390 people globally, and 303,711 people have lost their lives to it (as of 15th May, 2020). The pandemic has forced world leaders to adopt stringent measures such as nation-wide lockdowns to curb the spread of the virus. But lockdowns keep civilians from working and earning a living, and their effect troubles not only the poor but every section of society, even the governments themselves. In such a situation, where economies are falling and countries are heading towards recession, people might be forced to head out, work, and live alongside the virus. 
This poses a huge risk to countries like mine, India, with an enormous population density.   
To foresee the effects of lifting lockdowns, this notebook aims to help people understand which neighborhoods might see a surge in Covid-19 cases. I will use population density data together with location data on popular venues and their frequency of occurrence (such as popular marketplaces) to estimate the amount of interaction occurring at a particular place. I will focus the predictions mainly on my city, Delhi, the capital of India.  
With this, I hope readers can better understand potentially risky areas, and that authorities can place restrictions on such areas beforehand to reduce the spread of the virus and thus the suffering in civilian life.  
I hope and pray whoever reads this is safe.

## Data <a name="data"></a>

To solve the above problem, I have made use of the following data:
* Neighborhoods of Delhi (and the basis of their segmentation) - https://en.wikipedia.org/wiki/Neighbourhoods_of_Delhi  
There are 9 districts in Delhi, and each neighborhood belongs to one of them.  
This data is used to define the neighborhoods when plotting the results on the map.
I used the 'search' option in Google Maps to find the approximate coordinates of each neighborhood and created my own dataset.
     
     
* Location data of popular venues - provided by Foursquare API  
The Foursquare API provides data on the frequency of occurrence of different venues at a particular place.  
I will use the location data to identify clusters where venues with high footfall (like markets) lie close together.  
These will be identified as hotspot neighborhoods.


* Population density of various districts - Census 2011 data - https://www.census2011.co.in/census/state/districtlist/delhi.html  
The population density data is used to mark districts with a potentially high risk of community transmission because people live in close proximity.


* District-wise Covid cases (as of 15th May, 2020) - https://www.covid19india.org/state/DL  
The presently available data does not break down all 8,895 cases (as of 15th May, 2020) by district.  
So, we will make do with the 788 cases that are assigned to a district. This will also help link the present situation with the future predictions.

### Let's get these datasets

In [3]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim
import folium

In [4]:
url = 'https://en.wikipedia.org/wiki/Neighbourhoods_of_Delhi'
html_content = requests.get(url).text

soup = BeautifulSoup(html_content, "lxml")

In [5]:
# Section headings on the page; the first nine are the district names
delhi_table = soup.find_all("span", attrs={"class": "mw-headline"})
districts = [v.text for v in delhi_table[:9]]

In [6]:
districts

['North West Delhi',
 'North Delhi',
 'North East Delhi',
 'Central Delhi',
 'New Delhi',
 'East Delhi',
 'South Delhi',
 'South West Delhi',
 'West Delhi']

In [7]:
address = 'Delhi, India'

geolocator = Nominatim(user_agent="del_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Delhi are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Delhi are 28.6517178, 77.2219388.


In [8]:
# The code was removed by Watson Studio for sharing.

|  | District | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | North West Delhi | Adarsh Nagar | 28.71939 | 77.17327 |
| 1 | North West Delhi | Ashok Vihar | 28.68726 | 77.177689 |
| 2 | North West Delhi | Azadpur | 28.712997 | 77.17736 |
| 3 | North West Delhi | Bawana | 28.797247 | 77.048331 |
| 4 | North West Delhi | Begum Pur | 28.726457 | 77.064246 |


In [9]:
df_delhi = df_data_0

In [10]:
df_delhi.shape

(177, 4)
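
Because the loading cell was removed before sharing, `df_data_0` is undefined for anyone re-running the notebook. Below is a minimal sketch of one way to load the hand-built dataset, assuming it was saved as a CSV with the four columns shown above. The helper `load_neighborhoods` and the file name `delhi_neighborhoods.csv` are hypothetical, not part of the original notebook.

```python
import io
import pandas as pd

EXPECTED_COLUMNS = ["District", "Neighborhood", "Latitude", "Longitude"]

def load_neighborhoods(source):
    """Read the neighborhood CSV and check it has the expected columns.

    `source` may be a file path or any file-like object (hypothetical helper).
    """
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Keep only the expected columns, in a fixed order.
    return df[EXPECTED_COLUMNS]

# Two rows copied from the output above, standing in for the real file.
sample_csv = io.StringIO(
    "District,Neighborhood,Latitude,Longitude\n"
    "North West Delhi,Adarsh Nagar,28.71939,77.17327\n"
    "North West Delhi,Ashok Vihar,28.68726,77.177689\n"
)
df_sample = load_neighborhoods(sample_csv)
print(df_sample.shape)
```

With the real file, `df_data_0 = load_neighborhoods('delhi_neighborhoods.csv')` would play the role of the removed cell, under the file-name assumption stated above.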

In [15]:
map_delhi = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, dist, neighborhood in zip(df_delhi['Latitude'], df_delhi['Longitude'], df_delhi['District'], df_delhi['Neighborhood']):
    label = '{}, {}'.format(neighborhood, dist)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_delhi)  
    
map_delhi

#### Let us now get the venues in each neighborhood

In [12]:
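# Foursquare credentials: placeholders only; never publish real keys in a shared notebook
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'
VERSION = '20200511'  # API version date
LIMIT = 50  # maximum number of venues per request
radius = 1000  # metres; getNearbyVenues defaults to 500 unless this value is passed in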

In [13]:
# Function for getting venues by neighborhood name, latitude, longitude and radius

def getNearbyVenues(names, latitudes, longitudes, radius=500):

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)  # progress indicator

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep only the list of recommended items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one row per venue
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood',
                 'Neighborhood Latitude',
                 'Neighborhood Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues
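
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which raises an `IndexError` if a venue ever comes back with an empty category list. Below is a minimal sketch of a more defensive row extractor, assuming the same response layout that the function above relies on. The helper name `venue_rows` and the `'Unknown'` fallback are illustrative choices, not part of the original notebook.

```python
def venue_rows(name, lat, lng, items):
    """Turn Foursquare-style 'items' into flat tuples for one neighborhood."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to a placeholder instead of failing on categories[0].
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Hand-written items mimicking the response shape (not real API output).
fake_items = [
    {"venue": {"name": "Market A", "location": {"lat": 28.70, "lng": 77.10},
               "categories": [{"name": "Market"}]}},
    {"venue": {"name": "Place B", "location": {"lat": 28.71, "lng": 77.11},
               "categories": []}},
]
rows = venue_rows("Azadpur", 28.712997, 77.17736, fake_items)
```

Keeping the parsing separate from the HTTP request also makes it possible to check the row layout without calling the API.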

In [101]:
delhi_venues = getNearbyVenues(names=df_delhi['Neighborhood'],
                                   latitudes=df_delhi['Latitude'],
                                   longitudes=df_delhi['Longitude']
                                  )


Naraina
Palam
Rama Krishna Puram
Rajokri
Rangpuri
Sagar Pur
Vasant Kunj
Vasant Kunj Mall Road
Vasant Vihar
Ashok Nagar
Bali Nagar
Fateh Nagar
Janakpuri
Kirti Nagar
Meera Bagh
Moti Nagar
Partap Nagar
Paschim Vihar
Patel Nagar
Punjabi Bagh
Rajouri Garden
Shivaji Place
Tihar Village
Tilak Nagar
Uttam Nagar
Vikas Nagar
Vikaspuri


In [22]:
map_delhi_venues = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, ven, cat in zip(delhi_venues['Venue Latitude'], delhi_venues['Venue Longitude'], delhi_venues['Venue'], delhi_venues['Venue Category']):
    label = '{}, {}'.format(ven,cat)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='yellow',
        fill_opacity=0.7,
        parse_html=False).add_to(map_delhi_venues)  
    
map_delhi_venues

###### Note that the Foursquare API does not have much detail regarding venues in Delhi, so this is only an approximate picture of the real world. We will treat it as a reasonable approximation for our purposes.
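
To connect the venue data back to the footfall idea from the Data section, one option is to count venues per neighborhood and per category. Below is a minimal sketch on a small hand-made frame with the same column names as `delhi_venues`. The rows are made up for illustration, and using a plain count as a footfall proxy is an assumption rather than the notebook's stated method.

```python
import pandas as pd

# Made-up rows using the same column names as delhi_venues.
toy_venues = pd.DataFrame({
    "Neighborhood": ["Janakpuri", "Janakpuri", "Janakpuri", "Palam"],
    "Venue": ["V1", "V2", "V3", "V4"],
    "Venue Category": ["Market", "Cafe", "Market", "Hotel"],
})

# Number of venues returned for each neighborhood, highest first.
venue_counts = (
    toy_venues.groupby("Neighborhood")["Venue"]
    .count()
    .sort_values(ascending=False)
    .rename("Venue Count")
)

# Category frequencies (rows: neighborhoods, columns: categories).
category_counts = pd.crosstab(
    toy_venues["Neighborhood"], toy_venues["Venue Category"]
)

print(venue_counts)
```

Because `LIMIT` caps each request at 50 venues, such counts cannot exceed 50 per neighborhood, so they are best read as a rough ranking.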

### Now let's scrape the district-wise population density data

In [35]:
url = 'https://www.census2011.co.in/census/state/districtlist/delhi.html'
html_content = requests.get(url).text

soup = BeautifulSoup(html_content, "lxml")

In [36]:
density_t = soup.find("table")
density_headings = density_t.find_all("th")
den_headings = []
for v in density_headings:
    den_headings.append(v.text)

In [66]:
delhi_table_data = density_t.find_all("tr")
table_data = []
# skip the header row and map each cell to its column heading
for v in delhi_table_data[1:]:
    t_row = {}
    for td, h in zip(v.find_all("td"), den_headings):
        t_row[h] = td.text.replace('\n', '').strip()
    table_data.append(t_row)

In [67]:
den_delhi = pd.DataFrame(table_data)
den_delhi

|  | # | Density | District | Increase | Literacy | Population | Sex Ratio | Sub-Districts |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 8254.0 | North West Delhi | 27.81 % | 84.45 % | 3656539.0 | 865.0 | List |
| 1 | 2 | 11060.0 | South Delhi | 20.51 % | 86.57 % | 2731929.0 | 862.0 | List |
| 2 | 3 | 19563.0 | West Delhi | 19.46 % | 86.98 % | 2543243.0 | 875.0 | List |
| 3 | 4 | 5446.0 | South West Delhi | 30.65 % | 88.28 % | 2292958.0 | 840.0 | List |
| 4 | 5 | 36155.0 | North East Delhi | 26.78 % | 83.09 % | 2241624.0 | 886.0 | List |
| 5 | 6 | 27132.0 | East Delhi | 16.79 % | 89.31 % | 1709346.0 | 884.0 | List |
| 6 | 7 | 14557.0 | North Delhi | 13.62 % | 86.85 % | 887978.0 | 869.0 | List |
| 7 | (adsbygoogle = window.adsbygoogle \|\| []).push(... |  |  |  |  |  |  |  |
| 8 | 8 | 27730.0 | Central Delhi | -9.91 % | 85.14 % | 582320.0 | 892.0 | List |
| 9 | 9 | 4057.0 | New Delhi | -20.72 % | 88.34 % | 142004.0 | 822.0 | List |


Drop the row that contains ad markup. We also only want the district-wise population density.

In [68]:
den_delhi.dropna(inplace=True)
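
An alternative to dropping the ad row afterwards is to skip it while parsing, by keeping only rows that fill every column. Below is a minimal sketch on a small hand-written HTML snippet. The snippet only mimics the shape of the census table and is not the real page. It uses the built-in `html.parser` so that it does not depend on `lxml`.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the scraped table, including one ad-like row.
html = """
<table>
  <tr><th>#</th><th>District</th><th>Density</th></tr>
  <tr><td>1</td><td>North West Delhi</td><td>8254</td></tr>
  <tr><td><script>ads()</script></td></tr>
  <tr><td>2</td><td>South Delhi</td><td>11060</td></tr>
</table>
"""
table = BeautifulSoup(html, "html.parser").find("table")
headings = [th.get_text(strip=True) for th in table.find_all("th")]

rows = []
for tr in table.find_all("tr")[1:]:
    cells = tr.find_all("td")
    # Skip rows (such as injected ad markup) that do not fill every column.
    if len(cells) != len(headings):
        continue
    rows.append({h: td.get_text(strip=True) for h, td in zip(headings, cells)})
```

Filtering on the cell count removes only malformed rows, whereas `dropna` would also discard a genuine district row that happened to have a missing value.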

In [69]:
den_delhi = den_delhi[['District','Density']]
den_delhi

|  | District | Density |
|---|---|---|
| 0 | North West Delhi | 8254 |
| 1 | South Delhi | 11060 |
| 2 | West Delhi | 19563 |
| 3 | South West Delhi | 5446 |
| 4 | North East Delhi | 36155 |
| 5 | East Delhi | 27132 |
| 6 | North Delhi | 14557 |
| 8 | Central Delhi | 27730 |
| 9 | New Delhi | 4057 |
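
The scraped `Density` values come from `td.text`, so they may still be strings. They need converting to numbers before they can be compared or joined to the neighborhood table. Below is a minimal sketch, assuming the column names shown above. The toy frames reuse a few values from the outputs in this notebook, and the left join on `District` is an illustrative choice.

```python
import pandas as pd

den_toy = pd.DataFrame({
    "District": ["North West Delhi", "West Delhi"],
    "Density": ["8254", "19563"],  # strings, as scraped text would be
})
nbh_toy = pd.DataFrame({
    "District": ["North West Delhi", "North West Delhi", "West Delhi"],
    "Neighborhood": ["Adarsh Nagar", "Azadpur", "Janakpuri"],
})

# Strip thousands separators (if any) and convert to numbers;
# anything unparseable becomes NaN instead of raising.
den_toy["Density"] = pd.to_numeric(
    den_toy["Density"].astype(str).str.replace(",", "", regex=False),
    errors="coerce",
)

# Attach each district's density to its neighborhoods.
nbh_density = nbh_toy.merge(den_toy, on="District", how="left")
print(nbh_density)
```

A left join keeps every neighborhood even if its district name does not match a row in the density table, which makes spelling mismatches show up as `NaN` rather than as silently missing rows.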


### Finally, let's obtain the current district-wise distribution of Covid-19 cases

In [70]:
url = 'https://www.covid19india.org/state/DL'
html_content = requests.get(url).text

soup = BeautifulSoup(html_content, "lxml")

In [None]:
cases = soup.find("table")
cases_head = cases.find_all("tr",attrs={"class":"tr-heading"})
cases_headings = []
for v in cases_head:
    cases_headings.append(v.text)

## Methodology <a name="methodology"></a>

## Analysis <a name="analysis"></a>

## Results and Discussion <a name="results"></a>

## Conclusion <a name="conclusion"></a>