# Where to go in Washington D.C.

By Jaime Lopez - Mar. 2019

**Problem description**

Washington D.C. has many interesting places to discover. Visitors would like advice on where to go in Washington D.C. based on preferences such as coffee shops, museums, or shopping. Using the Foursquare API, a search tool for neighborhoods to visit will be built. The application works by clustering and ranking neighborhoods according to how similar their venues are to the user's preferences for things to do. Moreover, for the most similar neighborhoods, the application will recommend top venues to visit there.

In [1]:
# Library configuration
import pandas as pd
import folium
import requests
import json
import ipywidgets as widgets
import numpy as np
import re
from geopy.geocoders import Nominatim
from ipywidgets import interact

## 1. Getting neighborhood data for Washington D.C.

The D.C.gov open data portal hosts the [Neighborhood Labels](https://opendata.dc.gov/datasets/neighborhood-labels/data) dataset. It contains the X and Y positions (longitude and latitude) as well as neighborhood labels.

In [2]:
neighborhood_labels_url = 'https://opendata.arcgis.com/datasets/c4b0cd43d50949e98e57de9f22b455fc_35.csv'
neighborhood_labels_df = pd.read_csv(neighborhood_labels_url)

Exploring the Neighborhood Labels dataset

In [3]:
neighborhood_labels_df.columns

Index(['X', 'Y', 'OBJECTID', 'GIS_ID', 'NAME', 'WEB_URL', 'LABEL_NAME',
       'DATELASTMODIFIED'],
      dtype='object')

In [4]:
neighborhood_labels_df.shape

(131, 8)

In [5]:
neighborhood_labels_df.head()

| | X | Y | OBJECTID | GIS_ID | NAME | WEB_URL | LABEL_NAME | DATELASTMODIFIED |
|---|---|---|---|---|---|---|---|---|
| 0 | -77.007767 | 38.925776 | 1 | nhood_132 | Stronghold | http://op.dc.gov | Stronghold | 2003-04-10T00:00:00.000Z |
| 1 | -76.972367 | 38.901336 | 2 | nhood_134 | Langston | http://op.dc.gov | Langston | 2003-04-10T00:00:00.000Z |
| 2 | -77.014234 | 38.895428 | 3 | nhood_137 | Downtown East | http://op.dc.gov | Downtown \r\nEast | 2003-04-10T00:00:00.000Z |
| 3 | -77.041094 | 38.98679 | 4 | nhood_029 | Colonial Village | http://op.dc.gov | Colonial \r\nVillage | 2003-04-10T00:00:00.000Z |
| 4 | -77.032126 | 38.98298 | 5 | nhood_109 | Shepherd Park | http://op.dc.gov | Shepherd \r\nPark | 2003-04-10T00:00:00.000Z |


As shown above, the Neighborhood Labels dataset has 131 records. Columns `X`, `Y`, and `NAME` will be used for analysis.
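
A minimal sketch of that column selection, using two rows copied from the preview above. The names `sample_df` and `coords_df`, and the renamed columns `latitude` and `longitude`, are assumptions introduced here for illustration; the notebook itself keeps the original `X` and `Y` names.

```python
import pandas as pd

# Two rows copied from the preview of the Neighborhood Labels dataset
sample_df = pd.DataFrame({
    'X': [-77.007767, -76.972367],
    'Y': [38.925776, 38.901336],
    'NAME': ['Stronghold', 'Langston'],
    'WEB_URL': ['http://op.dc.gov', 'http://op.dc.gov'],
})

# Keep only the columns used in the analysis; X is longitude and Y is latitude
coords_df = sample_df[['NAME', 'Y', 'X']].rename(
    columns={'Y': 'latitude', 'X': 'longitude'})
print(coords_df)
```

Renaming is optional, but it makes the latitude-first order expected by mapping libraries harder to get wrong.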

In [6]:
# Save a local copy of the Neighborhood Labels dataset to disk
neighborhood_labels_df.to_csv('washington_dc_neighborhood_labels.csv', index=False)

## 2. Visualizing Washington D.C. neighborhoods

In this section, a map of Washington D.C. will be created showing its neighborhoods.

In [7]:
neighborhood_labels_df = pd.read_csv('washington_dc_neighborhood_labels.csv')

In [8]:
# Getting longitude and latitude for Washington D.C.
address = 'District of Columbia'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Washington D.C. are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Washington D.C. are 38.89366125, -76.987883253882.


In [9]:
city_map = folium.Map(location=[latitude, longitude], zoom_start=11)
# Add one marker per neighborhood; Y is the latitude and X is the longitude
for row in neighborhood_labels_df.itertuples():
    folium.CircleMarker(
        [row.Y, row.X],
        radius=3,
        popup=folium.Popup(row.NAME, parse_html=True),
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(city_map)
city_map

## 3. Getting venues for neighborhoods

In this section, a list of venues for each neighborhood is retrieved from the Foursquare API.

In [10]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [11]:
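def getNearbyVenues(lat, long, radius=1000, limit=500):
    # Query the Foursquare "explore" endpoint for venues around a point
    url = 'https://api.foursquare.com/v2/venues/explore'
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'll': '{},{}'.format(lat, long),
        'radius': radius,
        'limit': limit,
    }
    response = requests.get(url, params=params)
    response.raise_for_status()  # fail early on HTTP errors
    # The recommended venues are in the first group of the response
    return response.json()['response']['groups'][0]['items']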
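
The function above assumes that the response always contains at least one group. A minimal sketch of a more defensive extraction is shown below. The helper name `extract_items` and the `sample_response` dictionary are hypothetical; the sample only mimics the nesting that `getNearbyVenues` relies on (`response`, then `groups`, then `items`) and is not a full Foursquare payload.

```python
def extract_items(payload):
    """Return the venue items of the first group, or an empty list."""
    groups = payload.get('response', {}).get('groups', [])
    if not groups:
        return []
    return groups[0].get('items', [])

# Hypothetical payload with the nesting used in this notebook
sample_response = {
    'response': {
        'groups': [
            {'items': [{'venue': {'name': 'Trinity Deli'}}]}
        ]
    }
}

items = extract_items(sample_response)
print([item['venue']['name'] for item in items])

# A response without groups yields an empty list instead of raising
empty = extract_items({'response': {}})
print(empty)
```

Returning an empty list lets the loop over neighborhoods continue when a location has no results.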

In [13]:
venues = []
for idx in range(len(neighborhood_labels_df)):
    row = neighborhood_labels_df.iloc[idx]
    # print(row.NAME)
    lat = row.Y
    long = row.X
    r = getNearbyVenues(lat, long)
    venues.append({'name': row.NAME, 'data': r})

In [14]:
complete_venues = []
for entry in venues:
    for item in entry['data']:
        v = item['venue']
        # Collect all category names of the venue
        categories = [c['name'] for c in v['categories']]
        complete_venues.append({
            'neighborhood': entry['name'],
            'name': v['name'],
            'id': v['id'],
            'lat': v['location']['lat'],
            'long': v['location']['lng'],
            'address': v['location']['formattedAddress'][0],
            'categories': '|'.join(categories)
        })
venues_df = pd.DataFrame(complete_venues)

In [15]:
venues_df.columns

Index(['address', 'categories', 'id', 'lat', 'long', 'name', 'neighborhood'], dtype='object')

In [16]:
venues_df.shape

(6811, 7)

In [17]:
venues_df.head()

| | address | categories | id | lat | long | name | neighborhood |
|---|---|---|---|---|---|---|---|
| 0 | 2501 N Capitol St NE (North Capitol & Channing... | Grocery Store | 4b565191f964a520000b28e3 | 38.922529 | -77.008847 | Fairway Market | Stronghold |
| 1 | Washington, D.C. | Coffee Shop | 51c18cf2498e825fe3e0053e | 38.92936 | -77.009999 | VA Medical Center Patriot Coffee | Stronghold |
| 2 | Moving Target (Check Twitter for current locat... | Food Truck | 4ffeef4de4b03c980fdd520e | 38.926482 | -77.014386 | PhoWheels | Stronghold |
| 3 | 1st Street and Michigan Avenue NW | Park | 4e4195768130d0f89744065e | 38.926247 | -77.012122 | McMillan Park Reservoir | Stronghold |
| 4 | 200 Michigan Ave NE | Deli / Bodega | 4b6b0b67f964a520f2ee2be3 | 38.929414 | -77.004347 | Trinity Deli | Stronghold |


In [18]:
venues_df.to_csv('washington_dc_venues.csv', index=False)

The venues dataset has 6,811 records, as reported by `venues_df.shape` above.

## 4. Building a neighborhood by categories dataset

To prepare for clustering, a neighborhood-by-category dataset will be built: each column represents a category, and its value is the number of venues of that category in each neighborhood.

In [19]:
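# Count venues per neighborhood, using only the first category of each venue (lowercased)
first_category = venues_df['categories'].apply(lambda x: x.split('|')[0].lower())
neighborhood_categories = pd.crosstab(venues_df['neighborhood'], first_category)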
neighborhood_categories.head()

| neighborhood | accessories store | afghan restaurant | african restaurant | airport lounge | american restaurant | antique shop | arcade | arepa restaurant | art gallery | art museum | ... | whisky bar | wine bar | wine shop | winery | wings joint | women's store | xinjiang restaurant | yoga studio | zoo | zoo exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16th Street Heights | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Adams Morgan | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | ... | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
| American University Park | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 |
| Arboretum | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Barnaby Woods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


In [20]:
neighborhood_categories.to_csv('washington_dc_neighborhood_categories.csv')
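
To make the counting step concrete, here is a small sketch of `pd.crosstab` on a hand-made table. The `toy_venues` data is invented for illustration and is not part of the Foursquare results.

```python
import pandas as pd

# Hypothetical venues: two neighborhoods, '|'-separated category strings
toy_venues = pd.DataFrame({
    'neighborhood': ['A', 'A', 'A', 'B'],
    'categories': ['Coffee Shop', 'Coffee Shop', 'Park', 'Art Museum|Park'],
})

# Use the first category of each venue, lowercased, as in this section
first_category = toy_venues['categories'].apply(lambda x: x.split('|')[0].lower())
toy_counts = pd.crosstab(toy_venues['neighborhood'], first_category)
print(toy_counts)
```

Only the first category is counted, so the venue tagged `Art Museum|Park` adds to `art museum` and not to `park`.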

## 5. Creating a categories menu for user selection

In this section, a menu of categories will be created for the user selection interface, together with a function that matches a query word against the category names.

In [21]:
neighborhood_categories = pd.read_csv('washington_dc_neighborhood_categories.csv')

In [22]:
categories = neighborhood_categories.columns[1:]
categories

Index(['accessories store', 'afghan restaurant', 'african restaurant',
       'airport lounge', 'american restaurant', 'antique shop', 'arcade',
       'arepa restaurant', 'art gallery', 'art museum',
       ...
       'whisky bar', 'wine bar', 'wine shop', 'winery', 'wings joint',
       'women's store', 'xinjiang restaurant', 'yoga studio', 'zoo',
       'zoo exhibit'],
      dtype='object', length=344)

In [23]:
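def search_categories(word, precision=0.8):
    # Match on a prefix of the query (80% of its letters by default, at least one),
    # so that a plural such as 'museums' still matches 'art museum'
    level = max(1, int(len(word) * precision))
    prefix = word[:level].lower()
    # 1 where the category name contains the prefix, 0 elsewhere
    return np.array([1 if prefix in category else 0 for category in categories])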
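
As a worked example of the matching rule: with the default value of 0.8, the six-letter query 'coffee' is cut to its first `int(6 * 0.8) = 4` letters, 'coff', and every category containing that fragment is flagged. The sketch below reproduces this on a short, hand-picked list; `toy_categories` and `match_vector` are hypothetical names, not part of the notebook.

```python
import numpy as np

# Hand-picked sample of category names (assumption, for illustration only)
toy_categories = ['art museum', 'coffee shop', 'park', 'wine bar']

def match_vector(word, names, fraction=0.8):
    # Keep the leading fraction of the query, e.g. 'coffee' -> 'coff'
    prefix = word[:int(len(word) * fraction)].lower()
    return np.array([1 if prefix in name else 0 for name in names])

print(match_vector('coffee', toy_categories))
print(match_vector('Museums', toy_categories))
```

The truncation is what lets the plural 'Museums' (cut to 'museu') still match 'art museum'.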

## 6. Finding similar neighborhood

A function is built to find the neighborhood whose venues are most similar to the user's query word.

In [24]:
def find_similar_neighborhood(word):
    # Binary query vector: 1 for every category matching the word
    x = search_categories(word)
    # Venue counts per category, one row per neighborhood
    y = neighborhood_categories[categories].to_numpy(dtype=float)
    # Euclidean distance from the query vector to each neighborhood
    dist = np.sqrt(((y - x) ** 2).sum(axis=1))
    res = pd.DataFrame({'neighborhood': neighborhood_categories['neighborhood'], 'dist': dist})
    return res.sort_values('dist').iloc[0]
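
The measure used is the Euclidean distance between the 0/1 query vector and each neighborhood's vector of venue counts. A small sketch with invented numbers shows how the ranking comes out; `toy_counts`, `query`, and `ranking` are hypothetical names used only here.

```python
import numpy as np
import pandas as pd

# Hypothetical counts for three neighborhoods over three categories
toy_counts = pd.DataFrame(
    [[0, 1, 0],   # A: one coffee shop only
     [2, 3, 1],   # B: several venues of every kind
     [0, 0, 0]],  # C: no venues at all
    index=['A', 'B', 'C'],
    columns=['art museum', 'coffee shop', 'park'],
)
query = np.array([0, 1, 0])  # the user asked for 'coffee'

# Euclidean distance from the query to each row, smallest first
dist = np.sqrt(((toy_counts.to_numpy() - query) ** 2).sum(axis=1))
ranking = pd.Series(dist, index=toy_counts.index).sort_values()
print(ranking)
```

In this toy case the distances are 0 for A, 1 for C, and 3 for B. The empty neighborhood C ranks ahead of B, which has three coffee shops, because every additional venue moves a row further away from a 0/1 query.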

The next function is the user interface: it maps the venues of the most similar neighborhood.

In [25]:
from IPython.display import display

def get_recommendations(word):
    # Venues of the neighborhood most similar to the query
    best = find_similar_neighborhood(word)
    recommendations = venues_df[venues_df['neighborhood'] == best['neighborhood']]
    # recommendations = recommendations[recommendations['categories'].apply(lambda x: x.lower().find(word) != -1)]
    # Center the map on the mean position of the recommended venues
    city_map = folium.Map(location=[recommendations.lat.mean(), recommendations.long.mean()], zoom_start=15)
    for row in recommendations.itertuples():
        label = "<strong>%s</strong><br>%s" % (row.name, row.address)
        folium.CircleMarker(
            [row.lat, row.long],
            radius=5,
            popup=folium.Popup(label, parse_html=False),
            color='green',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7).add_to(city_map)
    display(city_map)
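
The commented-out line in `get_recommendations` hints at narrowing the venues to those whose category matches the query. A minimal sketch of that filter, on invented data and with the hypothetical helper name `filter_by_word`, could look like this:

```python
import pandas as pd

# Hypothetical venues of a single neighborhood
toy_recommendations = pd.DataFrame({
    'name': ['Bean Bar', 'City Park', 'Roast House'],
    'categories': ['Coffee Shop', 'Park', 'Bakery|Coffee Shop'],
})

def filter_by_word(df, word):
    # Case-insensitive literal match of the word inside the category string
    mask = df['categories'].str.lower().str.contains(word.lower(), regex=False)
    return df[mask]

coffee_places = filter_by_word(toy_recommendations, 'coffee')
print(coffee_places['name'].tolist())
```

Whether to apply such a filter is a design choice: without it, the map shows every venue in the selected neighborhood, not only those matching the query.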

Testing the search with places related to coffee.

In [31]:
get_recommendations('coffee')