# Battle of Neighbourhoods

# Introduction/ Business Problem

## Background: Booming consumerism in Pakistan

Pakistan is classified as an emerging economy, with an average population age of 24. With rising per-capita income and a young population, consumer spending is growing at double-digit rates and is likely to keep doing so in the coming years. Culturally, South Asians love food and enjoy socialising over lunch and dinner, so a sizeable share of disposable income is spent on eating out. I intend to open a Thai restaurant in Islamabad, the capital of Pakistan. With its diverse population of diplomats, government officials, businesspeople and students, Thai cuisine is likely to do well there.

## Business Problem: Where to open restaurant in Islamabad

To succeed in the food business, it is important to choose a location with a large population and few competing restaurants. We aim to identify the location best suited for a new restaurant.

## Interested Audience

Anyone looking to open a new restaurant is the target audience. The analysis can also help diners see which restaurants are located where and what options they have when eating out.

# Data section

## Data Sources

1) Neighbourhoods/councils in Islamabad, used to find the number, type and concentration of restaurants: https://en.wikipedia.org/wiki/Islamabad_Capital_Territory
2) Foursquare API data

## How data will be used

Using the Foursquare API, councils will be clustered by their concentration of restaurants and cuisines. The neighbourhood data will be scraped from a Wikipedia page (https://en.wikipedia.org/wiki/Islamabad_Capital_Territory). Processing the data will help identify 1) which councils have a low concentration of restaurants, 2) which have the most offices and universities, and 3) which types of cuisine are available.

### First, importing the required libraries:

In [10]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
!pip install geocoder
import geocoder

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON into a flat pandas DataFrame (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


### Scraping data

In [4]:
# download the Wikipedia page listing Karachi's towns
html = requests.get('https://en.wikipedia.org/wiki/Towns_in_Karachi#Karachi_Towns').text
soup = BeautifulSoup(html, 'html.parser')
# the town names are the links inside the multi-column table
table = soup.find('table', class_='multicol')
data = table.find_all('a')
Councils = [link.text.strip() for link in data]

Councils

['Baldia Town',
 'Bin Qasim Town',
 'Gadap Town',
 'Gulberg Town',
 'Gulshan Town',
 'Jamshed Town',
 'Kiamari Town',
 'Korangi Town',
 'Landhi Town',
 'Liaquatabad Town',
 'Lyari Town',
 'Malir Town',
 'New Karachi Town',
 'North Nazimabad Town',
 'Orangi Town',
 'Saddar Town',
 'Shah Faisal Town',
 'SITE Town']

In [5]:
df_Khi= pd.DataFrame({"Neighbourhood":Councils})
df_Khi.head()

| | Neighbourhood |
|---|---|
| 0 | Baldia Town |
| 1 | Bin Qasim Town |
| 2 | Gadap Town |
| 3 | Gulberg Town |
| 4 | Gulshan Town |


In [202]:
df_Khi.shape

(18, 1)

### Getting coordinates

In [6]:
# define a function to get coordinates
def get_latlng(neighborhood, max_tries=5):
    # query the ArcGIS geocoder, retrying a limited number of times
    for _ in range(max_tries):
        g = geocoder.arcgis('{}, Karachi, Pakistan'.format(neighborhood))
        if g.latlng is not None:
            return g.latlng
    # give up instead of looping forever when no result comes back
    raise ValueError('No coordinates found for {}'.format(neighborhood))

In [11]:
coordinates= [get_latlng(neighborhood) for neighborhood in df_Khi["Neighbourhood"].tolist()]
coordinates

[[24.928600000000074, 66.99470000000008],
 [24.90560000000005, 67.08220000000006],
 [24.90560000000005, 67.08220000000006],
 [24.90560000000005, 67.08220000000006],
 [24.90560000000005, 67.08220000000006],
 [24.90560000000005, 67.08220000000006],
 [24.90560000000005, 67.08220000000006],
 [24.90560000000005, 67.08220000000006],
 [24.90560000000005, 67.08220000000006],
 [24.900600000000054, 67.04750000000007],
 [24.90560000000005, 67.08220000000006],
 [24.90560000000005, 67.08220000000006],
 [24.90560000000005, 67.08220000000006],
 [24.927440000000047, 67.03479000000004],
 [24.958150000000046, 66.99139000000008],
 [24.90560000000005, 67.08220000000006],
 [24.893500000000074, 67.17510000000004],
 [24.90560000000005, 67.08220000000006]]

### Adding coordinates 

In [12]:
df_coords=pd.DataFrame(coordinates, columns=['Latitude', 'Longitude'])

df_Khi['Latitude']=df_coords['Latitude']
df_Khi['Longitude']=df_coords['Longitude']
df_Khi.head()

| | Neighbourhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Baldia Town | 24.9286 | 66.9947 |
| 1 | Bin Qasim Town | 24.9056 | 67.0822 |
| 2 | Gadap Town | 24.9056 | 67.0822 |
| 3 | Gulberg Town | 24.9056 | 67.0822 |
| 4 | Gulshan Town | 24.9056 | 67.0822 |


In [24]:
df_Khi.nunique()

Neighbourhood    18
Latitude          6
Longitude         6
dtype: int64
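The counts above show only 6 distinct coordinate pairs for 18 neighbourhoods, which suggests the geocoder returned the same fallback point for several names. A minimal sketch of how the affected neighbourhoods could be listed is given below. It uses plain Python and the five rows printed earlier rather than the full `df_Khi`, and the helper name `shared_coordinates` is hypothetical.

```python
# First five rows as printed by df_Khi.head() above.
rows = [
    ("Baldia Town", 24.9286, 66.9947),
    ("Bin Qasim Town", 24.9056, 67.0822),
    ("Gadap Town", 24.9056, 67.0822),
    ("Gulberg Town", 24.9056, 67.0822),
    ("Gulshan Town", 24.9056, 67.0822),
]

def shared_coordinates(rows, ndigits=4):
    """Map each rounded (lat, lng) pair used more than once to its neighbourhoods."""
    groups = {}
    for name, lat, lng in rows:
        key = (round(lat, ndigits), round(lng, ndigits))
        groups.setdefault(key, []).append(name)
    # keep only coordinate pairs that more than one neighbourhood maps to
    return {key: names for key, names in groups.items() if len(names) > 1}

suspects = shared_coordinates(rows)
for (lat, lng), names in suspects.items():
    print(f"{len(names)} neighbourhoods share ({lat}, {lng}): {', '.join(names)}")
```

Any neighbourhood that appears in `suspects` is a candidate for manual correction.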

### Saving to CSV and double-checking the coordinates, as the ArcGIS geocoder sometimes returns wrong ones. In this case it returned only 6 distinct coordinates for 18 neighbourhoods

In [25]:
df_Khi.to_csv("df_Khi.csv", index= False)

### Opening the cleaned file with the corrected coordinates

In [28]:
df_khi_cleaned= pd.read_csv('df_Khi.csv')
df_khi_cleaned.head()

| | Neighbourhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Baldia Town | 24.9525 | 66.955 |
| 1 | Bin Qasim Town | 24.8596 | 67.4005 |
| 2 | Gadap Town | 25.0023 | 67.1321 |
| 3 | Gulberg Town | 24.9368 | 67.076 |
| 4 | Gulshan Town | 24.918 | 67.0971 |


#### Now we have 18 coordinates for 18 neighbourhoods

In [29]:
df_khi_cleaned.nunique()

Neighbourhood    18
Latitude         18
Longitude        18
dtype: int64
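Distinct coordinates are not necessarily correct ones, so a further sanity check could confirm that every point lies within a plausible distance of the city. The sketch below uses the haversine formula, the map centre used in this notebook (24.8607, 67.0011), and the five cleaned rows shown above. The 60 km threshold and the helper name `haversine_km` are assumptions for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

CITY_CENTRE = (24.8607, 67.0011)  # map centre used in this notebook
MAX_KM = 60                       # hypothetical threshold, tune as needed

# First five rows of the cleaned file, as printed above.
cleaned = [
    ("Baldia Town", 24.9525, 66.955),
    ("Bin Qasim Town", 24.8596, 67.4005),
    ("Gadap Town", 25.0023, 67.1321),
    ("Gulberg Town", 24.9368, 67.076),
    ("Gulshan Town", 24.918, 67.0971),
]

# flag any neighbourhood whose point is further than MAX_KM from the centre
too_far = [name for name, lat, lng in cleaned
           if haversine_km(*CITY_CENTRE, lat, lng) > MAX_KM]
print(too_far)
```

An empty list means none of the sampled points is further than the chosen threshold from the centre.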

### Create a map of Karachi with the neighbourhoods superimposed

In [35]:
latitude= 24.8607
longitude= 67.0011

In [36]:
map_khi=folium.Map(location=[latitude, longitude], zoom_start=10)

#superimposing neighbourhoods on the map

for lat,lng,label in zip(df_khi_cleaned['Latitude'], df_khi_cleaned['Longitude'],df_khi_cleaned['Neighbourhood']):
    label= folium.Popup(label, parse_html=True)
    folium.CircleMarker(
    [lat,lng],
    radius=5,
    popup=label,
    color='blue',
    fill=True,
    fill_color='#3816cc',
    fill_opacity=0.7).add_to(map_khi)

map_khi



### Connecting to the Foursquare API

In [44]:
CLIENT_ID='YOUR_FOURSQUARE_CLIENT_ID' # replace with your own Foursquare client ID
CLIENT_SECRET='YOUR_FOURSQUARE_CLIENT_SECRET' # never publish real credentials in a notebook
VERSION= '20180605'

LIMIT=50
radius=500
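Rather than hard-coding API credentials in a shared notebook, they can be read from environment variables. The sketch below shows one way to do this. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical and not part of any Foursquare tooling.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical; use whatever your setup defines.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        # fail early with a readable message instead of sending an empty key
        raise RuntimeError(
            f"Set the {missing.args[0]} environment variable first"
        ) from None
```

In the notebook this would be used as `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`, after exporting both variables in the shell that launches Jupyter.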

In [39]:
#Creating function to get nearby venues
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
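The flattening step inside `getNearbyVenues` can be tried without calling the API. The sketch below applies the same field lookups to a hand-written stand-in for `response['groups'][0]['items']`. The mock data is not real API output, and the helper name `flatten_items` is hypothetical. It also skips any venue whose `categories` list is empty, a case in which `categories[0]` would raise an `IndexError`.

```python
def flatten_items(name, lat, lng, items):
    """Turn Foursquare-style 'items' into flat rows, skipping venues with no category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue  # categories[0] would raise IndexError here
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     categories[0]["name"]))
    return rows

# Hand-written stand-in for response["groups"][0]["items"]; not real API output.
mock_items = [
    {"venue": {"name": "Venue A", "location": {"lat": 24.9364, "lng": 67.0762},
               "categories": [{"name": "Bakery"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 24.9346, "lng": 67.0732},
               "categories": []}},
]

rows = flatten_items("Gulberg Town", 24.9368, 67.076, mock_items)
print(rows)
```

Only "Venue A" survives, because "Venue B" has no category to report.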

In [46]:
khi_venues=getNearbyVenues(names=df_khi_cleaned['Neighbourhood'],
                              latitudes=df_khi_cleaned['Latitude'],
                              longitudes=df_khi_cleaned['Longitude'])

Baldia Town
Bin Qasim Town
Gadap Town
Gulberg Town
Gulshan Town
Jamshed Town
Kiamari Town
Korangi Town
Landhi Town
Liaquatabad Town
Lyari Town
Malir Town
New Karachi Town
North Nazimabad Town
Orangi Town
Saddar Town
Shah Faisal Town
SITE Town


In [47]:
khi_venues

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Gulberg Town | 24.9368 | 67.076 | Mehmood Sweets | 24.936394 | 67.0762 | Bakery |
| 1 | Gulberg Town | 24.9368 | 67.076 | Khan Broast | 24.93463 | 67.07323 | Fast Food Restaurant |
| 2 | Gulberg Town | 24.9368 | 67.076 | Khan Snacks | 24.935323 | 67.073149 | Burger Joint |
| 3 | Gulberg Town | 24.9368 | 67.076 | Mehfooz Sheermal House | 24.934723 | 67.07373 | Bakery |
| 4 | Gulshan Town | 24.918 | 67.0971 | Habitt | 24.919327 | 67.095432 | Furniture / Home Store |
| 5 | Gulshan Town | 24.918 | 67.0971 | Chase Up | 24.917045 | 67.096304 | Department Store |
| 6 | Gulshan Town | 24.918 | 67.0971 | Sindbad Amusement Park | 24.915115 | 67.09823 | Theme Park |
| 7 | Gulshan Town | 24.918 | 67.0971 | Aziz Bhatti Park | 24.914074 | 67.094879 | Park |
| 8 | Jamshed Town | 24.8702 | 67.0524 | Noorani Kabab House | 24.867698 | 67.052259 | BBQ Joint |
| 9 | Jamshed Town | 24.8702 | 67.0524 | Ridan House of Mandi | 24.871807 | 67.05174 | Falafel Restaurant |
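The ten rows shown above start at Gulberg Town, so the three neighbourhoods queried before it appear to have returned no venues within the 500 m radius. A sketch of a per-neighbourhood summary that keeps such zero-venue neighbourhoods visible follows. It uses only the ten rows shown, and the keyword list for deciding what counts as a food venue is an assumption.

```python
from collections import Counter

# (neighbourhood, venue category) pairs copied from the ten rows shown above.
venue_rows = [
    ("Gulberg Town", "Bakery"),
    ("Gulberg Town", "Fast Food Restaurant"),
    ("Gulberg Town", "Burger Joint"),
    ("Gulberg Town", "Bakery"),
    ("Gulshan Town", "Furniture / Home Store"),
    ("Gulshan Town", "Department Store"),
    ("Gulshan Town", "Theme Park"),
    ("Gulshan Town", "Park"),
    ("Jamshed Town", "BBQ Joint"),
    ("Jamshed Town", "Falafel Restaurant"),
]
# Neighbourhoods queried up to and including those rows, in request order.
queried = ["Baldia Town", "Bin Qasim Town", "Gadap Town",
           "Gulberg Town", "Gulshan Town", "Jamshed Town"]
# Hypothetical keyword list for deciding what counts as a food venue.
FOOD_KEYWORDS = ("restaurant", "joint", "bakery")

def is_food(category):
    """True if the category name contains any of the food keywords."""
    return any(word in category.lower() for word in FOOD_KEYWORDS)

totals = Counter(name for name, _ in venue_rows)
food = Counter(name for name, cat in venue_rows if is_food(cat))
# Counter returns 0 for missing keys, so zero-venue neighbourhoods are kept
summary = {name: (totals[name], food[name]) for name in queried}
for name, (n_total, n_food) in summary.items():
    print(f"{name}: {n_total} venues, {n_food} food-related")
```

A count of zero here may reflect sparse Foursquare coverage rather than a genuine absence of restaurants, so such neighbourhoods are worth checking against another source before treating them as low-competition locations.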


# END OF WEEK 1

# Now Moving to Methodology and Analysis