## The Battle of Neighborhoods - Week 1
------

### Problem:

An enthusiastic entrepreneur wants to open his own coffee shop in the city of Bogotá, Colombia. Colombia both produces and consumes coffee, so he wants to sell his products to people who love drinking coffee and to people who want to learn more about it: how it is prepared and which types of coffee exist.

To start his business, he is interested in opening his first store in an area with a high volume of people. His goal is to study the city's localities and his possible competitors in the market, so that he can make an informed decision about where to open his first store.

### Data Sources:

#### City Localities
The city used in this exercise is ***Bogotá, Colombia***. Information about the localities of this city can be found on Wikipedia at the following link: <https://es.wikipedia.org/wiki/Anexo:Localidades_de_Bogot%C3%A1>

#### Foursquare
The ***Foursquare API*** will be used to obtain information about the competitors (nearby coffee shops, their locations, distances, among others). For more information about the Foursquare API, visit <https://developer.foursquare.com/>
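
Each Foursquare request in this notebook is an HTTP GET against the v2 `venues/explore` endpoint, with the credentials, the search point (`ll`), a `radius` in meters, a result `limit`, an API version date (`v`) and, optionally, a `categoryId` passed as query parameters. A minimal sketch of assembling that URL with the standard library's `urllib.parse.urlencode`, which escapes every value; the helper name `build_explore_url` and the placeholder credentials are hypothetical:

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, lat, lng, version,
                      radius=900, limit=100, category_id=None):
    """Hypothetical helper: build a Foursquare v2 explore URL with escaped query values."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),  # "lat,lng" of the search centre
        'v': version,                      # API version date, YYYYMMDD
        'radius': radius,                  # search radius in meters
        'limit': limit,                    # maximum number of results
    }
    if category_id is not None:
        params['categoryId'] = category_id  # optional venue category filter
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# Usage with placeholder credentials and Usaquén's coordinates
url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET',
                        4.6950465, -74.0314929, '20190427',
                        category_id='4bf58dd8d48988d16d941735')
query = parse_qs(urlparse(url).query)
print(query['ll'])
```

Building the query from a dict keeps optional parameters such as `categoryId` out of the URL entirely when they are not needed, instead of maintaining two format strings.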

#### Libraries
- Pandas - Library for data analysis
- NumPy - Library to handle data in a vectorized manner
- JSON - Library to handle JSON files
- Folium - Map rendering library
- Matplotlib - Python plotting module
- Geopy - To retrieve location data
- Requests - Library to handle HTTP requests
- bs4 (Beautiful Soup) - Scraping tool

## Solution
------

#### Libraries

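```python
import folium
import requests
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim
# json_normalize is exposed at the top level of pandas since 1.0;
# the old pandas.io.json import path is deprecated
from pandas import json_normalize
```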

#### Methods

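```python
def scrape_site(url):
    """Download a web page and return it parsed with BeautifulSoup."""
    headers = requests.utils.default_headers()
    headers.update({
        'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0',
    })
    # headers must be passed by keyword: the second positional argument of requests.get is `params`
    r = requests.get(url, headers=headers, timeout=30)
    r.raise_for_status()
    return BeautifulSoup(r.content, 'html.parser')
```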

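```python
def get_table(soup, class_name):
    """Return the rows of the first <table> with the given CSS class as lists of strings."""
    information = []
    table = soup.find("table", class_=class_name)
    for row in table.find_all('tr'):
        # Cells are separated by newlines in row.text; drop the leading and trailing pieces
        info = row.text.split('\n')[1:-1]
        information.append(info)
    return information
```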
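
`get_table` relies on the cells being separated by newlines in `row.text`, which is why empty columns have to be dropped later. A sketch of an alternative that collects the text of each `<th>`/`<td>` cell directly, written with the standard library's `html.parser` so it can be tried without network access; the class name `TableExtractor` and the sample HTML are illustrative assumptions, not part of the original notebook:

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell texts of the first <table> whose class list contains `class_name`."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while inside the target table (counts nested tables)
        self.done = False   # True once the target table has been closed
        self.rows = []
        self.cell = None    # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            if self.depth:
                self.depth += 1
            elif not self.done and self.class_name in (dict(attrs).get('class') or '').split():
                self.depth = 1
        elif self.depth == 1 and tag == 'tr':
            self.rows.append([])
        elif self.depth == 1 and tag in ('td', 'th') and self.rows:
            self.cell = []

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == 'table':
            self.depth -= 1
            self.done = self.depth == 0
        elif self.depth == 1 and tag in ('td', 'th') and self.cell is not None:
            self.rows[-1].append(''.join(self.cell).strip())
            self.cell = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


# Made-up sample shaped like the localities table
sample = '''
<table class="wikitable sortable">
  <tr><th>Nº</th><th>Neighborhood</th><th>Surface</th></tr>
  <tr><td>1</td><td>Usaquén</td><td>65.31</td></tr>
  <tr><td>2</td><td>Chapinero</td><td>38.15</td></tr>
</table>
'''
parser = TableExtractor('wikitable')
parser.feed(sample)
print(parser.rows)
```

Because each row is built from real cells, no empty placeholder columns appear, so a `drop("")` step would no longer be needed.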

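```python
def get_city_coodinates(neighborhood, city_name):
    # Nominatim's usage policy asks for an identifying User-Agent;
    # `params` lets requests URL-encode accents and spaces in the query
    params = {'q': neighborhood + ', ' + city_name, 'format': 'json', 'limit': 1}
    headers = {'User-Agent': 'Bogota_Clustering'}
    response = requests.get('https://nominatim.openstreetmap.org/search',
                            params=params, headers=headers, timeout=30).json()[0]
    return float(response['lat']), float(response['lon'])
```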
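
The geocoding loop calls Nominatim once per locality, and the public Nominatim service's usage policy asks for at most one request per second. A sketch of a small decorator that enforces a minimum interval between calls; the decorator name `throttled` is hypothetical, and `fake_geocode` is a stand-in that returns a fixed point instead of making a real request:

```python
import functools
import time


def throttled(min_interval):
    """Decorator: wait so that consecutive calls start at least `min_interval` seconds apart."""
    def decorator(func):
        last_call = [None]  # start time of the previous call (mutable cell for the closure)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if last_call[0] is not None:
                wait = min_interval - (time.monotonic() - last_call[0])
                if wait > 0:
                    time.sleep(wait)
            last_call[0] = time.monotonic()
            return func(*args, **kwargs)
        return wrapper
    return decorator


@throttled(min_interval=1.0)
def fake_geocode(query):
    """Stand-in for get_city_coodinates; does not contact Nominatim."""
    return 4.6950465, -74.0314929


start = time.monotonic()
for name in ['Usaquén', 'Chapinero', 'Santa Fe']:
    fake_geocode(name)
elapsed = time.monotonic() - start  # about 2 s: the 2nd and 3rd calls each wait ~1 s
```

With geopy, `geopy.extra.rate_limiter.RateLimiter(geolocator.geocode, min_delay_seconds=1)` provides the same kind of spacing for the `Nominatim` geocoder used later in the notebook.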

```python
def get_request_data(url):
    """Send a GET request and return the decoded JSON body."""
    return requests.get(url, timeout=30).json()
```

```python
# Foursquare v2 "explore" request restricted to CATEGORY_ID; radius is in meters
def foursquare_format_url(lat, lng, radius=900, limit=100):
    return ('https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}'
            '&ll={},{}&v={}&radius={}&limit={}&categoryId={}').format(
                CLIENT_ID, CLIENT_SECRET, lat, lng, VERSION, radius, limit, CATEGORY_ID)

# Same request without a category filter (venues of every type)
def fsqfull_format_url(lat, lng, radius=900, limit=100):
    return ('https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}'
            '&ll={},{}&v={}&radius={}&limit={}').format(
                CLIENT_ID, CLIENT_SECRET, lat, lng, VERSION, radius, limit)
```
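
The `radius=900` argument asks Foursquare for venues within roughly 900 meters of each locality's coordinates. To check how far a returned venue actually is from that point, the great-circle (haversine) distance can be computed. A sketch assuming a spherical Earth with a mean radius of 6,371 km, so the results are approximate; the function name `haversine_m` is hypothetical:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Usaquén's geocoded point and the "Amor Perfecto" venue from this notebook's results
d = haversine_m(4.6950465, -74.0314929, 4.695686, -74.029604)
print(round(d))  # about 220 m, well inside the 900 m search radius
```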

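```python
def get_venues_info(neighborhood, lat, lng, fsq_response):
    """Return a DataFrame with the name and coordinates of each venue in a
    Foursquare explore response, or None if the response contains no venues."""
    try:
        venues = fsq_response['response']['groups'][0]['items']
    except (KeyError, IndexError):
        # e.g. an API error payload without 'groups'
        return None
    if not venues:
        return None
    nearby_venues = json_normalize(venues)
    filtered_columns = ['venue.name', 'venue.location.lat', 'venue.location.lng']
    nearby_venues = nearby_venues.loc[:, filtered_columns]
    nearby_venues['Neighborhood'] = neighborhood  # lat and lng are not used here
    return nearby_venues
```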
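
`get_venues_info` reads the venues from `fsq_response['response']['groups'][0]['items']`; each item holds a `venue` object whose `name`, `location.lat` and `location.lng` fields become the `venue.name`, `venue.location.lat` and `venue.location.lng` columns after `json_normalize`. A dependency-free sketch of the same extraction; the function name `extract_venues` is hypothetical and the sample response is a trimmed, made-up payload containing only the fields this notebook uses:

```python
def extract_venues(neighborhood, fsq_response):
    """Return (name, lat, lng, neighborhood) tuples from a Foursquare explore response.

    Returns an empty list when the response has no venue groups (e.g. an error payload).
    """
    try:
        items = fsq_response['response']['groups'][0]['items']
    except (KeyError, IndexError):
        return []
    rows = []
    for item in items:
        venue = item['venue']
        location = venue['location']
        rows.append((venue['name'], location['lat'], location['lng'], neighborhood))
    return rows


sample_response = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Amor Perfecto', 'location': {'lat': 4.695686, 'lng': -74.029604}}},
        {'venue': {'name': 'Juan Valdez', 'location': {'lat': 4.696101, 'lng': -74.032189}}},
    ]}]}
}
print(extract_venues('Usaquén', sample_response))
```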

#### Constants

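```python
import os

# Foursquare API configuration; the credentials are read from environment
# variables (hypothetical names) instead of being hard-coded in the notebook
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')
CATEGORY_ID = '4bf58dd8d48988d16d941735'  # Foursquare venue category used to filter coffee venues
VERSION = '20190427'  # API version date, formatted as YYYYMMDD
```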

### Get Neighborhood information from Wikipedia

```python
neighborhoods_url = "https://es.wikipedia.org/wiki/Anexo:Localidades_de_Bogot%C3%A1"
soup = scrape_site(neighborhoods_url)
bogota_df = get_table(soup, 'wikitable')
# The first row holds the headers; the remaining rows are the localities
bogota_df = pd.DataFrame(bogota_df[1:], columns=bogota_df[0])
# Give the relevant columns English names
bogota_df.rename(columns={bogota_df.columns[2]: 'Neighborhood',
                          bogota_df.columns[4]: 'Postal Codes',
                          bogota_df.columns[6]: 'Surface',
                          bogota_df.columns[8]: 'Population',
                          bogota_df.columns[10]: 'Density'
                          },
                 inplace=True)
# Drop the empty columns produced by splitting the row text on newlines
bogota_df = bogota_df.drop("", axis=1)
bogota_df
```

| | Nº | Neighborhood | Postal Codes | Surface | Population | Density |
|---|---|---|---|---|---|---|
| 0 | 1 | Usaquén | 110111-110151 | 65.31 | 501 999 | 7 686.4 |
| 1 | 2 | Chapinero | 110211-110231 | 38.15 | 139 701 | 3 661.88 |
| 2 | 3 | Santa Fe | 110311-110321 | 45.17 | 110 048 | 2 436.3 |
| 3 | 4 | San Cristóbal | 110411-110441 | 49.09 | 404 697 | 8 243.98 |
| 4 | 5 | Usme | 110511-110571 | 215.06 | 457 302 | 2 126.39 |
| 5 | 6 | Tunjuelito | 110611-110621 | 9.91 | 199 430 | 20 124.11 |
| 6 | 7 | Bosa | 110711-110741 | 23.93 | 673 077 | 28 126.91 |
| 7 | 8 | Kennedy | 110811-110881 | 38.59 | 1 088 443 | 28 205.31 |
| 8 | 9 | Fontibón | 110911-110931 | 33.28 | 394 648 | 11 858.41 |
| 9 | 10 | Engativá | 111011-111071 | 35.88 | 887 080 | 24 723.52 |
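
The `Population` and `Density` columns arrive from Wikipedia as text that uses a space as the thousands separator (`501 999`, `7 686.4`), so every separator must be removed before converting them to numbers. A small sketch of such a converter, assuming the decimal separator is a point as in the table above; the helper name `parse_number` is hypothetical. As a sanity check, population divided by surface (in km²) reproduces the density column:

```python
def parse_number(text):
    """Convert strings such as '1 088 443' or '7 686.4' to float.

    str.split() with no arguments splits on any whitespace, including the
    non-breaking space often used as a thousands separator, so joining the
    pieces removes every separator.
    """
    return float(''.join(str(text).split()))


population = parse_number('501 999')  # Usaquén
surface = parse_number('65.31')       # km²
density = parse_number('7 686.4')

print(population, round(population / surface, 1), density)  # 501999.0 7686.4 7686.4
```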


### Get Localities detail information

Add the latitude and longitude of each neighborhood.

```python
import time

bogota_df['Latitude'] = 0.0
bogota_df['Longitude'] = 0.0
city_name = 'Bogota, Bogota Capital District'
for index, row in bogota_df.iterrows():
    lat, lon = get_city_coodinates(row['Neighborhood'], city_name)
    bogota_df.loc[index, 'Latitude'] = float(lat)
    bogota_df.loc[index, 'Longitude'] = float(lon)
    time.sleep(1)  # Nominatim's usage policy allows at most one request per second
bogota_df
```

| | Nº | Neighborhood | Postal Codes | Surface | Population | Density | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Usaquén | 110111-110151 | 65.31 | 501 999 | 7 686.4 | 4.6950465 | -74.0314929 |
| 1 | 2 | Chapinero | 110211-110231 | 38.15 | 139 701 | 3 661.88 | 4.6471197 | -74.0634584 |
| 2 | 3 | Santa Fe | 110311-110321 | 45.17 | 110 048 | 2 436.3 | 4.59376555 | -74.0343138404862 |
| 3 | 4 | San Cristóbal | 110411-110441 | 49.09 | 404 697 | 8 243.98 | 4.5486579 | -74.0474729042694 |
| 4 | 5 | Usme | 110511-110571 | 215.06 | 457 302 | 2 126.39 | 4.41113565 | -74.1291076491203 |
| 5 | 6 | Tunjuelito | 110611-110621 | 9.91 | 199 430 | 20 124.11 | 4.5601479 | -74.1289223837083 |
| 6 | 7 | Bosa | 110711-110741 | 23.93 | 673 077 | 28 126.91 | 4.62549175 | -74.2002798089739 |
| 7 | 8 | Kennedy | 110811-110881 | 38.59 | 1 088 443 | 28 205.31 | 4.62968195 | -74.1499354214614 |
| 8 | 9 | Fontibón | 110911-110931 | 33.28 | 394 648 | 11 858.41 | 4.67873705 | -74.1469881692 |
| 9 | 10 | Engativá | 111011-111071 | 35.88 | 887 080 | 24 723.52 | 4.69662765 | -74.1061199041956 |


### Get Localities Venues information

```python
address = 'Bogotá, Colombia'
geolocator = Nominatim(user_agent="Bogota_Clustering")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Bogotá, Colombia are {}, {}'.format(latitude, longitude))
```

```
The geographical coordinates of Bogotá, Colombia are 4.5980772, -74.0761028
```


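```python
# Query Foursquare around each locality and keep the non-empty results
frames = []
for index, row in bogota_df.iterrows():
    api_url = foursquare_format_url(row['Latitude'], row['Longitude'])
    api_response = get_request_data(api_url)
    venues_data = get_venues_info(row['Neighborhood'], row['Latitude'], row['Longitude'], api_response)
    if venues_data is not None:
        frames.append(venues_data)
# DataFrame.append was removed in pandas 2.0; concatenate once at the end instead
venues_df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```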

```python
venues_df
```

| | venue.name | venue.location.lat | venue.location.lng | Neighborhood |
|---|---|---|---|---|
| 0 | Amor Perfecto | 4.695686 | -74.029604 | Usaquén |
| 1 | Balsámico | 4.693048 | -74.032413 | Usaquén |
| 2 | Catación Pública | 4.695898 | -74.028142 | Usaquén |
| 3 | Oma | 4.693270 | -74.032466 | Usaquén |
| 4 | Juan Valdez | 4.696101 | -74.032189 | Usaquén |
| 5 | Shake It Funny Bar | 4.694474 | -74.030150 | Usaquén |
| 6 | Myriam Camhi | 4.693697 | -74.034421 | Usaquén |
| 7 | Oma Restaurante | 4.692047 | -74.034598 | Usaquén |
| 8 | Amarti | 4.695503 | -74.030637 | Usaquén |
| 9 | La Folie Boulangerie | 4.694615 | -74.031656 | Usaquén |


```python
# Keep only the last part of each dotted column name: venue.location.lat -> lat
venues_df.columns = [col.split(".")[-1] for col in venues_df.columns]
venues_df.head()
```

| | name | lat | lng | Neighborhood |
|---|---|---|---|---|
| 0 | Amor Perfecto | 4.695686 | -74.029604 | Usaquén |
| 1 | Balsámico | 4.693048 | -74.032413 | Usaquén |
| 2 | Catación Pública | 4.695898 | -74.028142 | Usaquén |
| 3 | Oma | 4.69327 | -74.032466 | Usaquén |
| 4 | Juan Valdez | 4.696101 | -74.032189 | Usaquén |


```python
venues_df.shape
```

```
(79, 4)
```

```python
# Base map centred on Bogotá
bogota_map = folium.Map(location=[latitude, longitude], tiles="OpenStreetMap", zoom_start=11)

# One small circle per venue; the popup shows "<neighborhood>, <venue name>"
for lat, lng, venue_name, neighborhood in zip(venues_df['lat'], venues_df['lng'], venues_df['name'], venues_df['Neighborhood']):
    label = folium.Popup('{}, {}'.format(neighborhood, venue_name), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='#373F51',
        fill=True,
        fill_color='#373F51',
        fill_opacity=0.3).add_to(bogota_map)
bogota_map
```

Total number of coffee shops by neighborhood

```python
venues_df.groupby(['Neighborhood']).size().sort_values(ascending=False)
```

```
Neighborhood
La Candelaria    37
Chapinero        21
Usaquén          18
Teusaquillo       3
dtype: int64
```
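
`groupby(...).size()` only lists localities that returned at least one venue, so localities with no coffee shops in the Foursquare results (for example Bosa or Kennedy) are simply absent from the output above. A sketch that reports them explicitly with a zero count, using `collections.Counter`; the two lists are a small illustrative subset, not the full results:

```python
from collections import Counter

localities = ['Usaquén', 'Chapinero', 'Bosa', 'Kennedy', 'La Candelaria']
venue_localities = ['Usaquén', 'Usaquén', 'Chapinero',
                    'La Candelaria', 'La Candelaria', 'La Candelaria']

counts = Counter(venue_localities)
# Counter returns 0 for missing keys, so every locality gets a count
per_locality = sorted(((name, counts[name]) for name in localities),
                      key=lambda pair: pair[1], reverse=True)
for name, n in per_locality:
    print('{:<15}{}'.format(name, n))
```

In the notebook itself, a pandas equivalent would be `venues_df.groupby('Neighborhood').size().reindex(bogota_df['Neighborhood'], fill_value=0)`.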

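Convert `Density` to a number. The values use a space as the thousands separator (for example `28 126.91`), so all whitespace is removed before the conversion:

```python
# str.split() with no arguments also splits on non-breaking spaces;
# joining the pieces removes every thousands separator
bogota_df['Density'] = bogota_df['Density'].apply(lambda x: float(''.join(str(x).split()).replace(',', '')))
```

```python
bogota_df.nlargest(5, 'Density')
```

The output below was produced with a conversion that kept only the text before the first space (`x.split()[0]`), so `Density` shows only the thousands (Bosa's 28 126.91 appears as 28.0):

| | Nº | Neighborhood | Postal Codes | Surface | Population | Density | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 6 | 7 | Bosa | 110711-110741 | 23.93 | 673 077 | 28.0 | 4.62549175 | -74.2002798089739 |
| 7 | 8 | Kennedy | 110811-110881 | 38.59 | 1 088 443 | 28.0 | 4.62968195 | -74.1499354214614 |
| 17 | 18 | Rafael Uribe Uribe | 111811-111841 | 13.83 | 374 246 | 27.0 | 4.5734901 | -74.1192075 |
| 9 | 10 | Engativá | 111011-111071 | 35.88 | 887 080 | 24.0 | 4.69662765 | -74.1061199041956 |
| 14 | 15 | Antonio Nariño | 111511 | 4.88 | 109 176 | 22.0 | 4.5882529 | -74.0974547 |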


### Partial Results
------

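Based on these results, none of the localities with the highest population density listed above (Bosa, Kennedy, Rafael Uribe Uribe, Engativá and Antonio Nariño) appears among the localities where the Foursquare API returned coffee shops, which are concentrated in La Candelaria, Chapinero, Usaquén and Teusaquillo. Comparing population density with the Foursquare data therefore suggests that the most densely populated localities are not likely to be the areas that best meet the objective of this project.

A new study could focus on the localities where more coffee shops were found and select one of them, taking into account other types of businesses nearby that could bring customers to the shop.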
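
For that follow-up study, the unfiltered request built by `fsqfull_format_url` returns venues of every type. A sketch of tallying the venue categories around a locality, assuming each venue in the v2 explore response carries a `categories` list whose entries have `name` and `primary` fields; the function name `count_categories` and the sample payload are made up for illustration:

```python
from collections import Counter


def count_categories(fsq_response):
    """Count venues per primary category name in a Foursquare explore response (assumed layout)."""
    try:
        items = fsq_response['response']['groups'][0]['items']
    except (KeyError, IndexError):
        return Counter()
    counts = Counter()
    for item in items:
        categories = item['venue'].get('categories', [])
        # Prefer the category flagged as primary; fall back to the first one listed
        primary = next((c for c in categories if c.get('primary')),
                       categories[0] if categories else None)
        if primary is not None:
            counts[primary['name']] += 1
    return counts


sample_response = {'response': {'groups': [{'items': [
    {'venue': {'name': 'A', 'categories': [{'name': 'Café', 'primary': True}]}},
    {'venue': {'name': 'B', 'categories': [{'name': 'Bakery', 'primary': True}]}},
    {'venue': {'name': 'C', 'categories': [{'name': 'Café', 'primary': True}]}},
    {'venue': {'name': 'D', 'categories': []}},
]}]}}
print(count_categories(sample_response).most_common())
```

Venues without any category are skipped rather than counted under a placeholder name.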