# Coursera Capstone Project - Battle of Neighborhoods
This is the capstone project created to fulfill the requirements of the Data Science Professional Certificate offered by IBM through Coursera. The idea is to use the techniques learned throughout the specialization to solve a real-world problem using data science. 

## Contents
### 1. Introduction - Five Ws
&emsp; 1.1 What is the problem? <br>
&emsp; 1.2 Where is this? <br>
&emsp; 1.3 When is this applicable? <br>
&emsp; 1.4 Why do we do this? <br>
&emsp; 1.5 Who cares? <br>

### 2. Data Selection and clean up


## 1. Introduction - Five Ws
#### &emsp; 1.1 What is the problem?
I work in downtown Memphis, TN, and live about 30 minutes from work when there is no traffic. However, because I commute at the same time as most other people, there is almost always traffic on the roads, so a one-way trip easily takes 45-60 minutes. Being optimistic and assuming 45 minutes each way, that is 90 minutes per round trip. Over a 5-day work week that is 7.5 hours; over a 4-week month, 30 hours; and over a 52-week year, about 390 hours, or roughly 16 days. So every year I waste 16 full days, counting days and nights, sitting in my car.
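
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the constant and function names are my own, and it assumes a 4-week month and a 52-week working year, which is what the figures above imply.

```python
# Commute-time arithmetic from the paragraph above.
ONE_WAY_MINUTES = 45      # optimistic one-way commute
WORKDAYS_PER_WEEK = 5
WEEKS_PER_MONTH = 4       # assumption: a month is treated as 4 work weeks
WEEKS_PER_YEAR = 52       # assumption: no vacation weeks


def commute_hours(weeks):
    """Return the hours spent commuting (round trips) over the given number of weeks."""
    minutes = 2 * ONE_WAY_MINUTES * WORKDAYS_PER_WEEK * weeks
    return minutes / 60


hours_per_week = commute_hours(1)
hours_per_month = commute_hours(WEEKS_PER_MONTH)
days_per_year = commute_hours(WEEKS_PER_YEAR) / 24

print(hours_per_week, hours_per_month, days_per_year)
```

With these assumptions the yearly total comes out slightly above 16 full days.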

**So the problem is: where should I move to cut my driving time while still having good amenities such as restaurants, cafes, parks, and shopping within easy reach?**

#### &emsp; 1.2 Where is this?
It is Memphis, Tennessee, where birds sing and elephants bathe. Just kidding. But it definitely is a fantastic place to live. There are tons of things to do around here. The population is roughly 700k and rising. The job market is good. Most people are friendly. Have some BBQ around town and you won't want to leave Memphis.

#### &emsp; 1.3 When is this applicable?
I know this is a changing world, and time changes everything. This analysis was done in August 2019, so don't blame me if you decide to move in 2050 based on it. The good news is that the program pulls the latest data, so if you re-run it in 2050, you should be (maybe...) fine.

#### &emsp; 1.4 Why do we do this?
It is primarily to save time. I am spending so much time on the road, 16 full days per year, just to commute. People say time is money, so this should save me some money. If you are in the same boat, following along might save you some money too. Who doesn't like saving money for the next cruise trip? Wait, is someone paying me when I save my own time? Nay. I will use the saved time to play with my daughters. Not everything is money. Clearly, I can't make up my mind.

#### &emsp; 1.5 Who cares?
Do you even hear me? It is to save money (really, time). If you care about saving money (time), you should read on. If you have so much of both lying around that you don't know what to do with it, this is not for you. You should buy a boat and travel the world instead of reading this.


## 2. Data Selection and clean up
Obviously, we need data to analyze. This section will gather all the required data and do the clean-up job so that the data are usable. 

### 2.1 Load the required libraries.
I will start by importing some libraries. Not all of them are necessarily used in this section, but importing everything up front keeps the notebook clean.

In [1]:
# Install Beautiful Soup 4 (if not already installed).
# This package will be used for web scraping.
try:
    from bs4 import BeautifulSoup as bs
    print('Beautiful Soup is ready for your service!')
except ImportError:
    # Notebook-only shell escape: install the package, then retry the import.
    !conda install -c anaconda beautifulsoup4 -y
    from bs4 import BeautifulSoup as bs
    print('Beautiful Soup is installed and ready for your service!')

Beautiful Soup is ready for your service!


In [2]:
# Install geopy to convert an address into latitude and longitude values
try:
    from geopy.geocoders import Nominatim
    from geopy import distance
    print('Geocoder is ready for your service!')
except ImportError:
    !conda install -c conda-forge geopy --yes
    from geopy.geocoders import Nominatim
    from geopy import distance
    print('Geocoder is installed and ready for your service!')

Geocoder is ready for your service!


In [3]:
# Install Folium - the map rendering library
try:
    import folium
    print('Folium is ready for your service!')
except ImportError:
    !conda install -c conda-forge folium=0.5.0 --yes
    import folium
    print('Folium is installed and ready for your service!')

Folium is ready for your service!


In [4]:
# Import libraries
import pandas as pd
import requests as rs
import numpy as np

import matplotlib.cm as cm
import matplotlib.colors as colors

import json  # library to handle JSON files
# Transform a JSON file into a pandas dataframe.
# On pandas older than 1.0, use: from pandas.io.json import json_normalize
from pandas import json_normalize


### 2.2 Get the required data
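Let's start with the data. We need three things: the list of Memphis neighborhoods, the coordinates of each neighborhood, and its distance from my workplace.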


In [5]:
# The following Wikipedia page lists the Memphis neighborhoods. Let's use Beautiful Soup to get the required data.
url = 'https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Memphis,_Tennessee'
source = rs.get(url)
soup = bs(source.content, 'lxml')
# print(soup.prettify())

In [6]:
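# Each table-of-contents entry on the page is a <span class="toctext">.
tocs = soup.body.find_all('span', {'class': 'toctext'})

# Collect the borough (area) names, stored in `burrow`
burrow = []

for item in tocs:
    burrow.append(item.text)

print(burrow)
# Remove the last item ('See also'), which is not a borough
burrow = burrow[0:-1]
print(burrow)
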
['Downtown', 'Midtown', 'University District', 'East Memphis', 'North Memphis', 'Northeast', 'South Memphis', 'Southeast', 'East Parkway District', 'See also']
['Downtown', 'Midtown', 'University District', 'East Memphis', 'North Memphis', 'Northeast', 'South Memphis', 'Southeast', 'East Parkway District']


In [7]:
# Every neighborhood list on the page is a <ul>.
contents = soup.body.find_all('ul')

# Build [burrow, neighborhood] pairs. The code assumes that the i-th <ul>
# (counting from index 1) belongs to the i-th entry in `burrow`.
neighborhood = []

for i in range(1, len(burrow) + 1):
    for litag in contents[i].find_all('li'):
        neighborhood.append([burrow[i - 1], litag.text])

df = pd.DataFrame(neighborhood, columns=['Burrow', 'Neighborhood'])
df

| | Burrow | Neighborhood |
|---|---|---|
| 0 | Downtown | Central Business District |
| 1 | Downtown | Edge District |
| 2 | Downtown | Harbor Town |
| 3 | Downtown | Linden |
| 4 | Downtown | Medical District |
| 5 | Downtown | Pinch District |
| 6 | Downtown | South Forum |
| 7 | Downtown | South Main Arts District |
| 8 | Downtown | Speedway Terrace |
| 9 | Downtown | Uptown/Greenlaw |
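
The index arithmetic in the loop above (`contents[i]` paired with `burrow[i-1]`) amounts to walking two sequences in step. Here is a minimal sketch of the same pairing on plain Python lists, assuming the neighborhood lists have already been extracted and are in the same order as the borough names. The function `pair_boroughs` and the shortened sample input are hypothetical and only illustrate the idea.

```python
def pair_boroughs(boroughs, neighborhood_lists):
    """Pair each borough name with every neighborhood in its list."""
    rows = []
    # zip stops at the shorter sequence, so extra lists are ignored.
    for borough, names in zip(boroughs, neighborhood_lists):
        for name in names:
            rows.append([borough, name])
    return rows


# Hypothetical, shortened input. The Downtown names come from the output
# above; the Midtown entry is a placeholder, not real data.
boroughs = ['Downtown', 'Midtown']
neighborhood_lists = [['Edge District', 'Harbor Town'], ['(placeholder)']]

rows = pair_boroughs(boroughs, neighborhood_lists)
print(rows)
```

The resulting list of `[borough, neighborhood]` rows has the same shape as the `neighborhood` list that is passed to `pd.DataFrame`.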


In [9]:
# Geocode the work address; its coordinates are the reference point for all distances.
# format_string appends ", Memphis TN" to every query. It is a geopy 1.x option that geopy 2.0 removed.
geolocator = Nominatim(user_agent="foursquare_agent", format_string="%s, Memphis TN")

address = '1003 Monroe Ave'
location = geolocator.geocode(address)
lat0 = location.latitude
lon0 = location.longitude
address0 = location.address
print(lat0, lon0)
print(address0)


35.1389307755102 -90.0281921632653
1003, Monroe Avenue, Medical District, Memphis, Shelby County, Tennessee, 38104, USA
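
The `format_string` argument used above is, as far as I know, not available on geopy 2.x. The same effect can be had by appending the city to each query by hand. Below is a minimal sketch of that idea; `geocode_in_city` and `fake_geocode` are hypothetical names, and the fake function stands in for `geolocator.geocode` so that the example runs without network access.

```python
def geocode_in_city(geocode, query, suffix=", Memphis TN"):
    """Append the city suffix to the query, then call the supplied geocode function."""
    return geocode(f"{query}{suffix}")


def fake_geocode(full_query):
    """Stand-in for geolocator.geocode: echo the query it receives."""
    return full_query


result = geocode_in_city(fake_geocode, "1003 Monroe Ave")
print(result)
```

With a real geocoder, the call would look like `geocode_in_city(geolocator.geocode, '1003 Monroe Ave')`.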


In [10]:
def find_distance(lat, lon):
    """Return the distance in miles from the work address (lat0, lon0) to (lat, lon)."""
    p1 = (lat0, lon0)
    p2 = (lat, lon)
    return distance.distance(p1, p2).miles
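
For intuition about what `find_distance` returns, here is a rough stand-alone approximation. geopy's `distance.distance` works on an ellipsoidal model of the Earth; the sketch below uses the simpler haversine (spherical) formula instead, so it is a different technique and the numbers will differ slightly. The function name, the radius constant, and the rounded coordinates (taken from the notebook output for the work address and the Medical District) are my own choices.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; a spherical approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Work address vs. Medical District, coordinates rounded from the output.
d = haversine_miles(35.1389, -90.0282, 35.142, -90.0303)
print(round(d, 2))
```

For short distances like these the spherical approximation is usually close enough to sanity-check the geopy result.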

In [11]:


# Geocode every neighborhood and record its coordinates, address, and distance to work.
Lat = []
Lon = []
address = []
distance_to_work = []

for name in df['Neighborhood']:
    try:
        location = geolocator.geocode(name)
        print('.', end=' ')
    except Exception:
        # Treat a failed lookup (timeout, service error) as "not found".
        location = None

    if location is not None:
        Lat.append(location.latitude)
        Lon.append(location.longitude)
        address.append(location.address)
        distance_to_work.append(find_distance(location.latitude, location.longitude))
    else:
        # np.nan (not the string 'NaN') keeps the numeric columns numeric.
        Lat.append(np.nan)
        Lon.append(np.nan)
        address.append(np.nan)
        distance_to_work.append(np.nan)



. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
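
The loop above sends one request per neighborhood with no pause in between. The public Nominatim service asks clients to stay at or below one request per second, so a throttle is worth considering. geopy ships a `RateLimiter` helper in `geopy.extra.rate_limiter` for this purpose. The sketch below shows the underlying idea with the standard library only; `throttled` is a hypothetical helper, and `str.upper` stands in for `geolocator.geocode` so that the example runs without network access.

```python
import time


def throttled(func, min_delay_seconds=1.0):
    """Wrap func so that consecutive calls start at least min_delay_seconds apart."""
    last_call = [None]  # mutable cell holding the start time of the previous call

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        last_call[0] = time.monotonic()
        return func(*args, **kwargs)

    return wrapper


# A short delay keeps the demo fast; a real run would use about 1 second.
slow_upper = throttled(str.upper, min_delay_seconds=0.05)

start = time.monotonic()
results = [slow_upper(name) for name in ['linden', 'edge district', 'harbor town']]
elapsed = time.monotonic() - start
print(results)
```

In the notebook, the wrapped function would replace the direct `geolocator.geocode` call inside the loop.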

In [12]:
df['Latitude'] = Lat
df['Longitude'] = Lon
df['Address'] = address
df['miles_to_work'] = distance_to_work
df

| | Burrow | Neighborhood | Latitude | Longitude | Address | miles_to_work |
|---|---|---|---|---|---|---|
| 0 | Downtown | Central Business District | | | | |
| 1 | Downtown | Edge District | 35.1598 | -90.0543 | Harbor Edge Drive, Cotton Row Historic Distric... | 2.05913 |
| 2 | Downtown | Harbor Town | | | | |
| 3 | Downtown | Linden | 35.1344 | -90.0127 | Linden Avenue, Medical District, Memphis, Shel... | 0.933456 |
| 4 | Downtown | Medical District | 35.142 | -90.0303 | Medical District, Memphis, Shelby County, Tenn... | 0.242472 |
| 5 | Downtown | Pinch District | | | | |
| 6 | Downtown | South Forum | | | | |
| 7 | Downtown | South Main Arts District | | | | |
| 8 | Downtown | Speedway Terrace | | | | |
| 9 | Downtown | Uptown/Greenlaw | | | | |
