<h2>Applied Data Science - Capstone Project: Entertainment in Neighborhoods of New Orleans</h2>

<h3>Introduction and problem description</h3>

For this project, I would like to group the neighborhoods of New Orleans, LA, based on the accessibility and popularity of art and entertainment venues in those neighborhoods. The goal is to recommend places to hang out and spend leisure time to people who would like to visit the city, based on their entertainment preferences.

<h3>Data description</h3>

I will use data obtained from Wikipedia: a list of New Orleans neighborhoods with coordinates for each. There are 72 neighborhoods listed on the Wiki page. From the Foursquare API I will get the most common types of art and entertainment venues in each neighborhood and then apply the k-means algorithm to cluster the neighborhoods. For each cluster I will try to assess some characteristics and, based on them, recommend which areas would be most suitable for tourists and visitors given their preferences.

Importing libraries

In [7]:
import pandas as pd
import numpy as np

<h3>Data scraping - Wikipedia - New Orleans neighborhoods with coordinates</h3>

Importing BeautifulSoup package and opening Wikipedia URL with the table

In [1]:
from bs4 import BeautifulSoup
import urllib
from urllib.request import urlopen

In [2]:
url = "https://en.wikipedia.org/wiki/Neighborhoods_in_New_Orleans"
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')

Scraping the Wiki page: locating the table object on the page and saving it to a dataframe

In [9]:

table=soup.find_all('table', class_="wikitable")[0] 


rows = table.find_all('tr')

import re
list_rows = []
for row in rows:
    # strip the row/cell tags and newlines, then turn each closing </td> into a ':' separator
    text = re.sub(r'</?tr>|<td>|\n', '', str(row))
    list_rows.append(text.replace('</td>', ':'))

# drop the header row
list_rows = list_rows[1:]


df_list = pd.DataFrame(list_rows)

df = df_list[0].str.split(':', expand=True)
df=df.iloc[0:,0:3]

df.columns=['Neighborhood', 'Longitude', 'Latitude']
df['Longitude'] = pd.to_numeric(df['Longitude'])
df['Latitude'] = pd.to_numeric(df['Latitude'] )

df.head(10)

Unnamed: 0,Neighborhood,Longitude,Latitude
0,U.S. NAVAL BASE,-90.026093,29.946085
1,ALGIERS POINT,-90.051606,29.952462
2,WHITNEY,-90.042357,29.9472
3,AUDUBON,-90.12145,29.932994
4,OLD AURORA,-90.0,29.92444
5,B. W. COOPER,-90.091753,29.951774
6,BAYOU ST. JOHN,-90.086517,29.976071
7,BEHRMAN,-90.026436,29.934817
8,BLACK PEARL,-90.134883,29.935895
9,BROADMOOR,-90.103812,29.946568
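
The regex-based cleanup above depends on the exact tag strings in each row. As an alternative, here is a minimal sketch of pulling the cell text out of table rows with the standard-library HTML parser. The `TableRowParser` class and the `sample_html` markup are illustrative assumptions; the real Wikipedia table may be laid out differently.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # rows with only <th> cells (headers) stay empty
                self.rows.append(self._row)
            self._row = None


# Hypothetical sample markup for illustration only.
sample_html = """
<table class="wikitable">
<tr><th>Neighborhood</th><th>Longitude</th><th>Latitude</th></tr>
<tr><td>ALGIERS POINT</td><td>-90.051606</td><td>29.952462</td></tr>
<tr><td>WHITNEY</td><td>-90.042357</td><td>29.9472</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(sample_html)
records = [(name, float(lon), float(lat)) for name, lon, lat in parser.rows]
```

Because the header row contains only `<th>` cells, it is skipped without slicing, and a neighborhood name containing a colon would not be split by mistake.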


<h3>Analyzing neighborhoods of New Orleans with Foursquare API</h3>

From Foursquare API I will obtain information about venues related to art and entertainment in each neighborhood.
Importing required libraries.

In [10]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library


Plotting map of New Orleans Neighborhoods

In [11]:
#New Orleans Coordinates
latitude = 29.9511
longitude = -90.0715

map_orleans = folium.Map(location=[latitude, longitude], zoom_start=13)


# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_orleans)  
    
map_orleans

Neighborhood exploration with Foursquare API: importing libraries and setting up API credentials

In [15]:
#Foursquare Data
import requests # library to handle requests
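from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

#Foursquare credentials: CLIENT_ID and CLIENT_SECRET are used below but are not shown in this notebook; define them before running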
VERSION = '20180604'
LIMIT = 100
CATEGORYID="4d4b7104d754a06370d81259" #Art&Entertainment Category

Obtaining information about venues in each neighborhood of New Orleans from Foursquare API

In [13]:
#Function to explore neighborhood in New Orleans
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            CATEGORYID,
            LIMIT)
         
        # make the GET request
        results = requests.get(url).json()["response"]['venues']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['name'], 
            v['location']['lat'], 
            v['location']['lng'],  
            v['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#calling function on df dataframe
new_orleans_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

new_orleans_venues.head()

U.S. NAVAL BASE
ALGIERS POINT
WHITNEY
AUDUBON
OLD AURORA
B. W. COOPER
BAYOU ST. JOHN
BEHRMAN
BLACK PEARL
BROADMOOR
MARLYVILLE - FONTAINEBLEAU
GERT TOWN
MID-CITY
ST. CLAUDE
CENTRAL BUSINESS DISTRICT
FRENCH QUARTER
CENTRAL CITY
LAKE CATHERINE
VILLAGE DE LEST
VIAVANT - VENETIAN ISLES
NEW AURORA - ENGLISH TURN
TALL TIMBERS - BRECHTEL
FISCHER DEV
McDONOGH
LOWER GARDEN DISTRICT
ST. THOMAS DEV
EAST RIVERSIDE
IRISH CHANNEL
TOURO
MILAN
UPTOWN
WEST RIVERSIDE
EAST CARROLLTON
FRERET
GARDEN DISTRICT
LEONIDAS
HOLLYGROVE
TULANE - GRAVIER
TREME - LAFITTE
SEVENTH WARD
MARIGNY
ST. ROCH
DIXON
LAKEWOOD
NAVARRE
CITY PARK
LAKEVIEW
WEST END
LAKESHORE - LAKE VISTA
FILMORE
ST. BERNARD AREA
DILLARD
ST.   ANTHONY
LAKE TERRACE & OAKS
MILNEBURG
PONTCHARTRAIN PARK
GENTILLY WOODS
GENTILLY TERRACE
DESIRE AREA
FLORIDA AREA
FLORIDA DEV
LOWER NINTH WARD
BYWATER
HOLY CROSS
PINES VILLAGE
PLUM ORCHARD
READ BLVD WEST
READ BLVD EAST
WEST LAKE FOREST
LITTLE WOODS
FAIRGROUNDS
IBERVILLE


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,U.S. NAVAL BASE,29.946085,-90.026093,Olive Branch B.C.,29.942959,-90.021953,General Entertainment
1,ALGIERS POINT,29.952462,-90.051606,Algiers Point,29.95141,-90.052207,Historic Site
2,ALGIERS POINT,29.952462,-90.051606,NOLA Potter,29.951946,-90.051021,Art Gallery
3,ALGIERS POINT,29.952462,-90.051606,Algiers Point Tours,29.952759,-90.052636,Tour Provider
4,ALGIERS POINT,29.952462,-90.051606,Jazz Walk Of Fame,29.952244,-90.05508,Historic Site


Dataframe preparation for k-means: grouping the dataset by neighborhood and taking the mean of the frequency of occurrence of each venue category.
For the summary table, only the top 10 venue types in each neighborhood are listed.

In [14]:
#Data frame preps for K-means
# one hot encoding
new_orleans_onehot = pd.get_dummies(new_orleans_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
new_orleans_onehot['Neighborhood'] = new_orleans_venues['Neighborhood'] 

# move neighborhood column to the first column
first_col = new_orleans_onehot .pop('Neighborhood')
new_orleans_onehot .insert(0, 'Neighborhood', first_col)

#group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
new_orleans_grouped = new_orleans_onehot.groupby('Neighborhood').mean().reset_index()

#function to sort venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#Top 10 venues per each neighborhood	
num_top_venues = 10

indicators = ['st', 'nd', 'rd']#used to address 1st, 2nd or 3rd

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = new_orleans_grouped['Neighborhood']

for ind in np.arange(new_orleans_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(new_orleans_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ALGIERS POINT,Historic Site,Art Gallery,Arcade,Jazz Club,Music Venue,History Museum,Theater,Tour Provider,College Academic Building,Gas Station
1,AUDUBON,Arcade,General Entertainment,Theater,Concert Hall,History Museum,Rock Club,Indie Theater,College Arts Building,Historic Site,Art Gallery
2,B. W. COOPER,Theme Park,Zoo Exhibit,Country Dance Club,Coffee Shop,College Academic Building,College Arts Building,College Theater,Comedy Club,Concert Hall,Dance Studio
3,BAYOU ST. JOHN,Art Gallery,History Museum,Concert Hall,Tour Provider,General Entertainment,Casino,Public Art,College Academic Building,College Arts Building,College Theater
4,BEHRMAN,Art Museum,Concert Hall,Zoo Exhibit,Dance Studio,College Academic Building,College Arts Building,College Theater,Comedy Club,Country Dance Club,Dive Bar
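
To make the "mean of the one-hot columns" step concrete: for each neighborhood it amounts to the share of that neighborhood's venues falling in each category. Below is a small standard-library sketch of that computation and of picking the top categories. The `venues` pairs and both helper functions are hypothetical and are not taken from the Foursquare results.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, venue category) pairs, not real API output.
venues = [
    ("A", "Art Gallery"),
    ("A", "Art Gallery"),
    ("A", "Jazz Club"),
    ("A", "Historic Site"),
    ("B", "Theater"),
    ("B", "Theater"),
]


def category_frequencies(pairs):
    """Map each neighborhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for hood, counter in counts.items()
    }


def top_categories(freqs, n):
    """Return the n categories with the highest share, ties broken by name."""
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return [cat for cat, _ in ranked[:n]]


frequencies = category_frequencies(venues)
top_a = top_categories(frequencies["A"], 2)
```

Unlike the notebook cell, this sketch only ranks categories that actually occur in a neighborhood. In the table above, a neighborhood with fewer than ten distinct categories appears to have its remaining slots filled with zero-frequency categories, which is worth keeping in mind when reading the later columns.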


Running k-means with k=3. Neighborhoods are grouped into clusters based on similarities in the availability of venue types.
After the clustering, a dataframe is created that lists all neighborhoods along with their most common venue types and their assigned k-means cluster.

In [16]:
#K-means
# set number of clusters
kclusters = 3

new_orleans_grouped_clustering = new_orleans_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(new_orleans_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

#new dataframe that includes the cluster as well as the top 10 venues for each neighborhood
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

new_orleans_merged = df

# merge the neighborhood coordinates (df) with the sorted venues and cluster labels for each neighborhood
new_orleans_merged = new_orleans_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
new_orleans_merged= new_orleans_merged[new_orleans_merged['1st Most Common Venue'].notnull()]
new_orleans_merged['Cluster Labels'] = pd.to_numeric(new_orleans_merged['Cluster Labels'],downcast='integer')
new_orleans_merged.head() 

Unnamed: 0,Neighborhood,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,U.S. NAVAL BASE,-90.026093,29.946085,1,General Entertainment,Zoo Exhibit,Dance Studio,College Academic Building,College Arts Building,College Theater,Comedy Club,Concert Hall,Country Dance Club,Dive Bar
1,ALGIERS POINT,-90.051606,29.952462,0,Historic Site,Art Gallery,Arcade,Jazz Club,Music Venue,History Museum,Theater,Tour Provider,College Academic Building,Gas Station
2,WHITNEY,-90.042357,29.9472,0,Performing Arts Venue,General Entertainment,Art Gallery,Country Dance Club,Coffee Shop,College Academic Building,College Arts Building,College Theater,Comedy Club,Concert Hall
3,AUDUBON,-90.12145,29.932994,2,Arcade,General Entertainment,Theater,Concert Hall,History Museum,Rock Club,Indie Theater,College Arts Building,Historic Site,Art Gallery
4,OLD AURORA,-90.0,29.92444,2,Theme Park,Concert Hall,Zoo Exhibit,Country Dance Club,Coffee Shop,College Academic Building,College Arts Building,College Theater,Comedy Club,Dance Studio
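
For readers unfamiliar with what `KMeans` does with the frequency vectors, here is a minimal pure-Python sketch of the basic assign-and-update loop (Lloyd's algorithm). It is a simplification under stated assumptions: the first k points seed the centroids, whereas scikit-learn uses a smarter initialisation (k-means++ by default), so the two will not necessarily produce the same labels. The `points` data is hypothetical.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster equal-length numeric tuples with Lloyd's algorithm.

    Simplified initialisation: the first k points seed the centroids.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Hypothetical two-category frequency vectors for four neighborhoods.
points = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
labels, centroids = kmeans(points, k=2)
```

The cluster numbers themselves carry no meaning; only which neighborhoods share a label matters, which is why the clusters are interpreted afterwards by inspecting their most common venue types.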


Creating map with New Orleans neighborhoods clusters.

In [17]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(new_orleans_merged['Latitude'], new_orleans_merged['Longitude'], new_orleans_merged['Neighborhood'], new_orleans_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters

<h3>Conclusions</h3>

Based on the cluster listings below, cluster #1 is characterized by art galleries, public art, exhibits, jazz clubs, historic sites and museums. Visitors interested in more formal entertainment, art or history may find neighborhoods in cluster #1 suitable places to visit or stay in.
In cluster #2, zoo exhibits, comedy clubs, concert halls and college-related venues such as theaters and arts buildings are popular. Neighborhoods in this cluster may be of interest to students, concert lovers and animal enthusiasts.
Neighborhoods in cluster #3 may appeal to people looking for more casual entertainment such as music, dancing and movies, since the following venue types are popular there: dance studios, music venues, concert halls, indie theaters, movie theaters and country dance clubs.

Exploring cluster #1

In [18]:
#Exploring cluster #1
new_orleans_merged.loc[new_orleans_merged['Cluster Labels'] == 0, new_orleans_merged.columns[[0] + list(range(4, new_orleans_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,ALGIERS POINT,Historic Site,Art Gallery,Arcade,Jazz Club,Music Venue,History Museum,Theater,Tour Provider,College Academic Building,Gas Station
2,WHITNEY,Performing Arts Venue,General Entertainment,Art Gallery,Country Dance Club,Coffee Shop,College Academic Building,College Arts Building,College Theater,Comedy Club,Concert Hall
6,BAYOU ST. JOHN,Art Gallery,History Museum,Concert Hall,Tour Provider,General Entertainment,Casino,Public Art,College Academic Building,College Arts Building,College Theater
9,BROADMOOR,General Entertainment,Indie Theater,Music Venue,Zoo Exhibit,Country Dance Club,College Academic Building,College Arts Building,College Theater,Comedy Club,Concert Hall
14,CENTRAL BUSINESS DISTRICT,Art Gallery,Music Venue,General Entertainment,Historic Site,Jazz Club,Concert Hall,Outdoor Sculpture,Performing Arts Venue,Museum,Plaza
15,FRENCH QUARTER,Art Gallery,Jazz Club,Historic Site,History Museum,Tour Provider,Museum,Performing Arts Venue,Bar,Cajun / Creole Restaurant,Church
20,NEW AURORA - ENGLISH TURN,Theater,Music Venue,Zoo Exhibit,Country Dance Club,College Academic Building,College Arts Building,College Theater,Comedy Club,Concert Hall,Dance Studio
23,McDONOGH,Performing Arts Venue,Art Gallery,Concert Hall,Dance Studio,Gas Station,Farmers Market,Exhibit,Event Space,Dive Bar,Cocktail Bar
24,LOWER GARDEN DISTRICT,General Entertainment,Dance Studio,Performing Arts Venue,Music Venue,Historic Site,Concert Hall,Art Gallery,Rock Club,Public Art,Event Space
25,ST. THOMAS DEV,Piano Bar,Shrine,Music Venue,Public Art,Historic Site,Art Gallery,Art Museum,Coffee Shop,Gas Station,Farmers Market


Exploring cluster #2

In [19]:
new_orleans_merged.loc[new_orleans_merged['Cluster Labels'] == 1, new_orleans_merged.columns[[0] + list(range(4, new_orleans_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,U.S. NAVAL BASE,General Entertainment,Zoo Exhibit,Dance Studio,College Academic Building,College Arts Building,College Theater,Comedy Club,Concert Hall,Country Dance Club,Dive Bar
11,GERT TOWN,General Entertainment,Bowling Alley,Zoo Exhibit,Country Dance Club,College Academic Building,College Arts Building,College Theater,Comedy Club,Concert Hall,Dance Studio
45,CITY PARK,Performing Arts Venue,General Entertainment,Country Dance Club,Coffee Shop,College Academic Building,College Arts Building,College Theater,Comedy Club,Concert Hall,Zoo Exhibit
53,LAKE TERRACE & OAKS,General Entertainment,College Academic Building,College Theater,Zoo Exhibit,Dance Studio,College Arts Building,Comedy Club,Concert Hall,Country Dance Club,Dive Bar


Exploring cluster #3

In [20]:
new_orleans_merged.loc[new_orleans_merged['Cluster Labels'] == 2,new_orleans_merged.columns[[0] + list(range(4, new_orleans_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,AUDUBON,Arcade,General Entertainment,Theater,Concert Hall,History Museum,Rock Club,Indie Theater,College Arts Building,Historic Site,Art Gallery
4,OLD AURORA,Theme Park,Concert Hall,Zoo Exhibit,Country Dance Club,Coffee Shop,College Academic Building,College Arts Building,College Theater,Comedy Club,Dance Studio
5,B. W. COOPER,Theme Park,Zoo Exhibit,Country Dance Club,Coffee Shop,College Academic Building,College Arts Building,College Theater,Comedy Club,Concert Hall,Dance Studio
7,BEHRMAN,Art Museum,Concert Hall,Zoo Exhibit,Dance Studio,College Academic Building,College Arts Building,College Theater,Comedy Club,Country Dance Club,Dive Bar
8,BLACK PEARL,Movie Theater,General Entertainment,Baseball Stadium,Performing Arts Venue,Country Dance Club,College Academic Building,College Arts Building,College Theater,Comedy Club,Concert Hall
10,MARLYVILLE - FONTAINEBLEAU,Zoo Exhibit,Movie Theater,Arcade,Stadium,Salsa Club,College Academic Building,College Arts Building,College Theater,Comedy Club,Concert Hall
12,MID-CITY,Historic Site,General Entertainment,Movie Theater,Building,Jazz Club,Pub,Basketball Stadium,Music Venue,Bar,Concert Hall
13,ST. CLAUDE,Museum,General Entertainment,Indie Theater,Zoo Exhibit,College Academic Building,College Arts Building,College Theater,Comedy Club,Concert Hall,Country Dance Club
16,CENTRAL CITY,Public Art,Basketball Stadium,Indie Theater,Performing Arts Venue,Concert Hall,Coffee Shop,College Academic Building,College Arts Building,College Theater,Comedy Club
29,MILAN,General Entertainment,Arcade,Theater,Public Art,Zoo Exhibit,Country Dance Club,College Academic Building,College Arts Building,College Theater,Comedy Club
