# Capstone Project - Venues near NBA arenas (Week 5)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction](#introduction)
* [Data](#data)
    * [Highest and lowest capacity](#capacity)
    * [Oldest and newest arenas](#age)
    * [States represented](#state)
    * [Visualization](#visual)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction <a name="introduction"></a>

This project examines whether certain types of venues tend to be located near the arenas where **National Basketball Association** teams play their **home games**. Besides finding the **most popular venues**, we also want to see which teams, although they represent big cities, play their home games in less built-up areas.

This could help certain kinds of restaurants or bars judge whether there is demand close to big arenas and other sports attractions. It could also point real estate businesses toward opportunities.

To answer these questions we need the geographic location of each arena, and we use Foursquare location data to find the venues around them.

## Data <a name="data"></a>

In this part we describe the dataset that will be used in this project. The list of the National Basketball Association arenas has been downloaded from Wikipedia (https://en.wikipedia.org/wiki/List_of_National_Basketball_Association_arenas) and the geographic coordinates have been provided by https://www.alltopsights.com/.

We start by reading the CSV file and looking at the first rows of the dataset.

```python
import pandas as pd

# The file is semicolon-separated and Latin-1 (ISO-8859-1) encoded
nba = pd.read_csv("NBA_coord.csv", encoding="ISO-8859-1", sep=";")
nba.head(10)
```

| | Arena | Location | State | Latitude | Longitude | Team(s) | Capacity | Opened |
|---|---|---|---|---|---|---|---|---|
| 0 | American Airlines Arena | Miami | Florida | 25.781389 | -80.188056 | Miami Heat | 19600 | 1999 |
| 1 | American Airlines Center | Dallas | Texas | 32.790556 | -96.810278 | Dallas Mavericks | 19200 | 2001 |
| 2 | Amway Center | Orlando | Florida | 28.539167 | -81.383611 | Orlando Magic | 18846 | 2010 |
| 3 | AT&T Center | San Antonio | Texas | 29.426944 | -98.435832 | San Antonio Spurs | 18418 | 2002 |
| 4 | Bankers Life Fieldhouse | Indianapolis | Indiana | 39.763889 | -86.155556 | Indiana Pacers | 17923 | 1999 |
| 5 | Barclays Center | Brooklyn | New York | 40.682661 | -73.975225 | Brooklyn Nets | 17732 | 2012 |
| 6 | Capital One Arena | Washington | D.C. | 38.898125 | -77.02088 | Washington Wizards | 20356 | 1997 |
| 7 | Chase Center | San Francisco | California | 37.768056 | -122.3875 | Golden State Warriors | 18064 | 2019 |
| 8 | Chesapeake Energy Arena | Oklahoma City | Oklahoma | 35.457998 | -97.508998 | Oklahoma City Thunder | 18203 | 2002 |
| 9 | FedExForum | Memphis | Tennessee | 35.138333 | -90.050556 | Memphis Grizzlies | 17794 | 2004 |


The dataset contains the **name** of each arena, the **city** and **state** where it is located, its **coordinates** and the team(s) playing there. It also gives the arena's **capacity** and the **year** it opened. Every team has its own arena except the Los Angeles Lakers and Clippers, who share Staples Center, so the table has 29 rows for 30 teams. Many of these arenas are also shared with teams from other leagues.

### Highest and lowest capacity <a name="capacity"></a>

The **Chicago Bulls'** home arena, United Center, has the largest capacity (20,917). The **New Orleans Pelicans'** Smoothie King Center has the smallest (16,867).

```python
cols = ['Arena', 'Team(s)', 'Location', 'State', 'Capacity', 'Opened']
# idxmax/idxmin return the row label of the largest/smallest capacity
print("The arena with the biggest capacity is:", nba.loc[nba['Capacity'].idxmax(), cols], sep='\n')
print("----")
print("The arena with the smallest capacity is:", nba.loc[nba['Capacity'].idxmin(), cols], sep='\n')
```

```
The arena with the biggest capacity is:
Arena       United Center
Team(s)     Chicago Bulls
Location          Chicago
State            Illinois
Capacity            20917
Opened               1994
Name: 26, dtype: object
----
The arena with the smallest capacity is:
Arena       Smoothie King Center
Team(s)     New Orleans Pelicans
Location             New Orleans
State                  Louisiana
Capacity                   16867
Opened                      1999
Name: 18, dtype: object
```
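`idxmax` and `idxmin` each return the label of a single row (the first one in case of ties). To see the few largest or smallest arenas at once, pandas provides `DataFrame.nlargest` and `DataFrame.nsmallest`. The sketch below runs them on the ten rows displayed earlier. The extremes it finds are therefore only those of that subset, not of the whole league:

```python
import pandas as pd

# The ten arena rows displayed above (a subset of the full 29-row table)
sample = pd.DataFrame({
    "Arena": ["American Airlines Arena", "American Airlines Center", "Amway Center",
              "AT&T Center", "Bankers Life Fieldhouse", "Barclays Center",
              "Capital One Arena", "Chase Center", "Chesapeake Energy Arena", "FedExForum"],
    "Capacity": [19600, 19200, 18846, 18418, 17923, 17732, 20356, 18064, 18203, 17794],
    "Opened": [1999, 2001, 2010, 2002, 1999, 2012, 1997, 2019, 2002, 2004],
})

# Three largest and three smallest arenas by capacity, already sorted
largest = sample.nlargest(3, "Capacity")[["Arena", "Capacity"]]
smallest = sample.nsmallest(3, "Capacity")[["Arena", "Capacity"]]

# The same idea works for age: the smallest "Opened" year is the oldest arena
oldest = sample.nsmallest(1, "Opened")[["Arena", "Opened"]]

print(largest)
print(smallest)
print(oldest)
```

On the full dataset the same calls would read `nba.nlargest(3, 'Capacity')` and `nba.nsmallest(3, 'Opened')`.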


### Oldest and newest arenas <a name="age"></a>

The **Golden State Warriors'** Chase Center in San Francisco is the newest arena, opened in **2019**. The **New York Knicks'** legendary Madison Square Garden is the oldest, opened in **1968**.

```python
print("The newest arena is:",
      nba[['Arena','Team(s)','Location','State','Capacity','Opened']].loc[nba['Opened'].idxmax()],sep='\n')
print("----")
print("The oldest arena is:",
      nba[['Arena','Team(s)','Location','State','Capacity','Opened']].loc[nba['Opened'].idxmin()],sep='\n')
```

```
The newest arena is:
Arena                Chase Center
Team(s)     Golden State Warriors
Location            San Francisco
State                  California
Capacity                    18064
Opened                       2019
Name: 7, dtype: object
----
The oldest arena is:
Arena       Madison Square Garden
Team(s)           New York Knicks
Location            New York City
State                    New York
Capacity                    19812
Opened                       1968
Name: 13, dtype: object
```


### States represented <a name="state"></a>

Below we see that **23 states** are represented in the league. This count also includes Washington, D.C. and the Toronto Raptors' home in Canada. **Texas** and **California** have three arenas each, although California has four teams because the Lakers and Clippers share Staples Center. **Florida** and **New York** have two arenas each, and each remaining state has one.

```python
grouped = nba.groupby("State",sort=False)["Team(s)"].count()
print("Number of states represented:",len(grouped),"\n")
print("The states in descending order:\n",grouped.sort_values(ascending=False).head(6))
```

```
Number of states represented: 23 

The states in descending order:
 State
Texas         3
California    3
Florida       2
New York      2
Oregon        1
Indiana       1
Name: Team(s), dtype: int64
```
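The Lakers and Clippers share a single row, so `groupby("State")["Team(s)"].count()` counts arenas rather than teams. A minimal sketch on a hand-built subset (only the `State` and `Team(s)` columns) splits shared arenas on the `/` used in this dataset and then counts teams:

```python
import pandas as pd

# Hand-built subset of the arena table: California and Texas rows only
sample = pd.DataFrame({
    "State": ["California", "California", "California", "Texas", "Texas", "Texas"],
    "Team(s)": ["Golden State Warriors", "Sacramento Kings",
                "Los Angeles Clippers/Los Angeles Lakers",
                "Dallas Mavericks", "San Antonio Spurs", "Houston Rockets"],
})

# One row per arena: this is what the count above measures
arenas_per_state = sample.groupby("State")["Team(s)"].count()

# Split shared arenas into one row per team, then count teams
teams = sample.assign(Team=sample["Team(s)"].str.split("/")).explode("Team")
teams_per_state = teams.groupby("State")["Team"].count()

print(arenas_per_state.to_dict())  # {'California': 3, 'Texas': 3}
print(teams_per_state.to_dict())   # {'California': 4, 'Texas': 3}
```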


### Visualization <a name="visual"></a>

Let's plot the arenas on a map of **North America**, with a red marker for each arena. At a glance, most arenas are located towards the **East Coast**.

```python
import folium

# Map centered on the continental US; one red circle marker per arena
map_nba = folium.Map(location=[40, -100], zoom_start=4)

for lat, lng, state, team in zip(nba['Latitude'], nba['Longitude'],
                                 nba['State'], nba['Team(s)']):
    # Popup text shows the team(s) and the state
    label = folium.Popup('{}, {}'.format(team, state), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#FF4040',
        fill_opacity=0.7,
    ).add_to(map_nba)

map_nba
```

## Methodology <a name="methodology"></a>

In this project we study the areas around the arenas where NBA teams play their home games. We use the **Foursquare categorization** to keep only venues in the category **"Food"**. Because we are looking for business opportunities, we want to know how densely restaurants are packed around each arena. We limit the search to a radius of **500 meters** and retrieve at most the top 20 venues in the **"Food"** category per arena.

In the first step we use the arena coordinates from our dataset to find venues within 500 meters of an arena, with the limit set to 20. We start with a single example, the **United Center** of the **Chicago Bulls**.

Second, we loop over all arenas and repeat the same query for each one. We first look for obvious differences, such as arenas that do not reach the limit within the radius.

Finally, we compute how often each venue category occurs around each arena. We then combine the results to find out which categories are the most popular overall.

## Analysis <a name="analysis"></a>

### Example

Before continuing, we load the packages required for Foursquare and enter our personal credentials. We start with an example: the area around the home arena of the Chicago Bulls.

```python
import requests  # library to handle HTTP requests
from pandas import json_normalize  # transform JSON into a pandas DataFrame (was pandas.io.json.json_normalize in older pandas)
```

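```python
# Foursquare credentials (hidden in the published notebook)
CLIENT_ID = 'your-client-id'          # placeholder
CLIENT_SECRET = 'your-client-secret'  # placeholder
VERSION = '20180605'  # API version date
```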
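Hard-coding credentials in a notebook makes them easy to leak, for example through a printed request URL. One common alternative is to read them from environment variables. A minimal sketch follows. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are our own hypothetical convention, not something Foursquare prescribes:

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are hypothetical names
    chosen for this sketch.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None


# Example with a plain dict standing in for os.environ
client_id, client_secret = load_foursquare_credentials(
    {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
)
```

In the notebook, one would set the two variables in the shell and then write `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`.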


We obtain the latitude and longitude values for the arena.

```python
# Select the Chicago Bulls' arena by team name rather than by row number
bulls = nba.loc[nba['Team(s)'] == 'Chicago Bulls', ['Arena', 'Latitude', 'Longitude']]
lat, lon = bulls.iloc[0][['Latitude', 'Longitude']]
bulls
```



| | Arena | Latitude | Longitude |
|---|---|---|---|
| 26 | United Center | 41.880556 | -87.674167 |


```python
LIMIT = 20 # limit of number of venues returned by Foursquare API

radius = 500 # define radius
#Category ID for "Food" taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):
categoryID = '4d4b7105d754a06374d81259'
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    lat, 
    lon,
    categoryID,
    radius, 
    LIMIT)
url
```


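```
'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180605&ll=41.880556,-87.67416700000001&categoryId=4d4b7105d754a06374d81259&radius=1000&limit=20'
```

This URL, and the example results below, come from a run with `radius=1000` rather than the 500 m set in the code above. With the 500 m radius used for all arenas later on, only 12 venues are found around the United Center.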
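Formatting the query string by hand makes it easy for the request to drift from the code: the URL printed above carries `radius=1000`, while the cell sets `radius = 500`. The sketch below builds the same query with the standard library's `urllib.parse.urlencode`. It then parses the URL back to check which values are actually sent. The credentials are placeholders, and `explore_url` is a hypothetical helper:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://api.foursquare.com/v2/venues/explore"


def explore_url(client_id, client_secret, version, lat, lng,
                category_id, radius=500, limit=20):
    """Build the explore-endpoint URL with the same parameters as above."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "categoryId": category_id,
        "radius": radius,
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = explore_url("CLIENT_ID", "CLIENT_SECRET", "20180605",
                  41.880556, -87.674167, "4d4b7105d754a06374d81259")

# Parse the query back to see exactly what would be sent
query = parse_qs(urlsplit(url).query)
print(query["radius"], query["ll"])
```

With `requests`, you can pass such a parameter dictionary directly as `requests.get(BASE_URL, params=...)` and let the library do the encoding.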

```python
results = requests.get(url).json()
results
```

```
{'meta': {'code': 200, 'requestId': '5f0436fb3ff71342711127fc'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': '$-$$$$', 'key': 'price'},
    {'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Near West Side',
  'headerFullLocation': 'Near West Side, Chicago',
  'headerLocationGranularity': 'neighborhood',
  'query': 'food',
  'totalResults': 48,
  'suggestedBounds': {'ne': {'lat': 41.88955600900001,
    'lng': -87.66210152683232},
   'sw': {'lat': 41.87155599099999, 'lng': -87.6862324731677}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '587049ab4f417a4e33681969',
       'name': 'Eden',
       'location': {'address': '1748 W Lake St',
        'lat': 41.88530693017767,
        'lng': -87.671171341429,
        'labeled
```
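In the response, the venues sit under `response` → `groups[0]` → `items`, and each item nests its venue under a `venue` key. That is why the flattened columns carry a `venue.` prefix. The sketch below shows what `pd.json_normalize` does with one item. Older pandas versions exposed the function as `pandas.io.json.json_normalize`. The item is hand-built from the Eden values shown above, not a real API response, and a lambda stands in for `get_category_type`:

```python
import pandas as pd

# Hand-built item shaped like results['response']['groups'][0]['items'][0]
# (values taken from the Eden entry; not a real API response)
items = [{
    "venue": {
        "id": "587049ab4f417a4e33681969",
        "name": "Eden",
        "location": {"address": "1748 W Lake St",
                     "lat": 41.88530693017767,
                     "lng": -87.671171341429},
        "categories": [{"name": "New American Restaurant"}],
    }
}]

flat = pd.json_normalize(items)
# Nested keys become dotted column names; the categories list stays a list
print(list(flat.columns))

# Keep the same four columns as the notebook, then shorten the names
cols = ["venue.name", "venue.categories", "venue.location.lat", "venue.location.lng"]
venues = flat.loc[:, cols].copy()
venues["venue.categories"] = venues["venue.categories"].map(
    lambda cats: cats[0]["name"] if cats else None  # primary category or None
)
venues.columns = [c.split(".")[-1] for c in venues.columns]
print(venues)
```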

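```python
def get_category_type(row):
    """Return the name of a venue's primary category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        # columns still carry the 'venue.' prefix from json_normalize
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']
```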

```python
import warnings
warnings.filterwarnings('ignore')

venues = results['response']['groups'][0]['items']

nearby_venues = json_normalize(venues)  # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

nearby_venues
```

```
20 venues were returned by Foursquare.
```

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Eden | New American Restaurant | 41.885307 | -87.671171 |
| 1 | Pita Heaven | Mediterranean Restaurant | 41.879511 | -87.66878 |
| 2 | Palace Grill | Diner | 41.881625 | -87.662672 |
| 3 | Big Delicious Planet Catering & Canteen | Café | 41.889174 | -87.674844 |
| 4 | Park Tavern | Gastropub | 41.877401 | -87.668705 |
| 5 | La Lagartija Taqueria | Taco Place | 41.879228 | -87.666831 |
| 6 | Kaiser Tiger | Gastropub | 41.884057 | -87.663051 |
| 7 | Sinha | Brazilian Restaurant | 41.878893 | -87.67739 |
| 8 | Subway | Sandwich Place | 41.881274 | -87.665121 |
| 9 | Chipotle Mexican Grill | Mexican Restaurant | 41.872368 | -87.677238 |
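One way to check which radius the Chicago example actually used is to measure how far each returned venue is from the arena. The sketch below applies the haversine great-circle formula to the ten venues listed above. Haversine is a standard approximation and not necessarily how Foursquare measures distance. Only two of the venues lie within 500 m, which fits the `radius=1000` in the printed URL rather than the 500 m set in the code:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


arena = (41.880556, -87.674167)  # United Center
venues = {
    "Eden": (41.885307, -87.671171),
    "Pita Heaven": (41.879511, -87.66878),
    "Palace Grill": (41.881625, -87.662672),
    "Big Delicious Planet Catering & Canteen": (41.889174, -87.674844),
    "Park Tavern": (41.877401, -87.668705),
    "La Lagartija Taqueria": (41.879228, -87.666831),
    "Kaiser Tiger": (41.884057, -87.663051),
    "Sinha": (41.878893, -87.67739),
    "Subway": (41.881274, -87.665121),
    "Chipotle Mexican Grill": (41.872368, -87.677238),
}

# Distance from the arena to every venue, printed nearest first
distances = {name: haversine_m(*arena, *pos) for name, pos in venues.items()}
for name, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{name:42s} {d:6.0f} m")

beyond = [name for name, d in distances.items() if d > 500]
print(len(beyond), "of", len(distances), "venues are farther than 500 m")
```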


### Performing on the entire dataset

We wrap the previous step in a function and run it for every arena.

```python
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query Foursquare for Food venues around each arena; one row per venue."""
    venues_list = []
    for name, lati, lng in zip(names, latitudes, longitudes):
        print(name)  # progress indicator

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lati,
            lng,
            categoryID,
            radius,
            LIMIT)

        # make the GET request and keep only the list of recommended items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lati,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-arena lists into one DataFrame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Team',
                             'Team Latitude',
                             'Team Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
```
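The list comprehension above assumes every venue has at least one category. If one does not, `v['venue']['categories'][0]` raises an `IndexError`, and that single venue aborts the whole loop. Below is a sketch of a more tolerant extraction step. It is written as a hypothetical pure function, `venue_rows`, so it can be tried on hand-made items without calling the API. In a real loop, one might also check the `meta` code of each response (200 in the example response shown earlier) before reading `groups`:

```python
def venue_rows(team, lat, lng, items):
    """Turn the 'items' list of one explore response into row tuples.

    Venues without any category get None instead of raising an IndexError.
    (Hypothetical helper for illustration.)
    """
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((team, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hand-made items: one with a category, one without (not real API output)
items = [
    {"venue": {"name": "Eden", "location": {"lat": 41.885307, "lng": -87.671171},
               "categories": [{"name": "New American Restaurant"}]}},
    {"venue": {"name": "Hypothetical stand", "location": {"lat": 41.881, "lng": -87.674},
               "categories": []}},
]
rows = venue_rows("Chicago Bulls", 41.880556, -87.674167, items)
print(rows)
```

An arena with an empty `items` list simply yields no rows, which is worth keeping in mind when counting venues per team later.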

```python
venues = getNearbyVenues(names=nba['Team(s)'],
                         latitudes=nba['Latitude'],
                         longitudes=nba['Longitude'])
```

```
Miami Heat
Dallas Mavericks
Orlando Magic
San Antonio Spurs
Indiana Pacers
Brooklyn Nets
Washington Wizards
Golden State Warriors
Oklahoma City Thunder
Memphis Grizzlies
Milwaukee Bucks
Sacramento Kings
Detroit Pistons
New York Knicks
Portland Trail Blazers
Denver Nuggets
Cleveland Cavaliers
Toronto Raptors
New Orleans Pelicans
Charlotte Hornets
Los Angeles Clippers/Los Angeles Lakers
Atlanta Hawks
Phoenix Suns
Minnesota Timberwolves
Boston Celtics
Houston Rockets
Chicago Bulls
Utah Jazz
Philadelphia 76ers
```


```python
print(venues.shape)
venues.head()
```

```
(513, 7)
```

| | Team | Team Latitude | Team Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Miami Heat | 25.781389 | -80.188056 | Biscayne Bayfront | 25.78003 | -80.188185 | American Restaurant |
| 1 | Miami Heat | 25.781389 | -80.188056 | Five Guys | 25.778438 | -80.186768 | Burger Joint |
| 2 | Miami Heat | 25.781389 | -80.188056 | Pucci's Pizza | 25.782944 | -80.189953 | Pizza Place |
| 3 | Miami Heat | 25.781389 | -80.188056 | Tuyo Restaurant | 25.778542 | -80.190433 | New American Restaurant |
| 4 | Miami Heat | 25.781389 | -80.188056 | Bubba Gump Shrimp Co. | 25.778315 | -80.187068 | Seafood Restaurant |


Five arenas do not reach the limit of 20 food venues within the 500 m radius.

```python
venues.groupby("Team",sort=False)["Venue"].count().sort_values(ascending=False)
```

```
Team
Philadelphia 76ers                         20
Utah Jazz                                  20
Dallas Mavericks                           20
Orlando Magic                              20
Indiana Pacers                             20
Brooklyn Nets                              20
Washington Wizards                         20
Golden State Warriors                      20
Memphis Grizzlies                          20
Milwaukee Bucks                            20
Sacramento Kings                           20
New York Knicks                            20
Portland Trail Blazers                     20
Cleveland Cavaliers                        20
Toronto Raptors                            20
Charlotte Hornets                          20
Los Angeles Clippers/Los Angeles Lakers    20
Atlanta Hawks                              20
Phoenix Suns                               20
Minnesota Timberwolves                     20
Boston Celtics                             20
Houston Rockets
```

These are the five arenas:

```python
venues.groupby("Team",sort=False)["Venue"].count().sort_values(ascending=False).tail(5)
```

```
Team
Detroit Pistons         18
Denver Nuggets          15
Chicago Bulls           12
San Antonio Spurs        5
New Orleans Pelicans     3
Name: Venue, dtype: int64
```
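A team for which Foursquare returns no food venues at all contributes no rows to `venues`, so it silently drops out of these counts. The totals hint that this may have happened here. The 21 arenas listed with 20 venues account for 420 rows and the bottom five for 53. That leaves 40 of the 513 rows for the Houston Rockets, Miami Heat and Oklahoma City Thunder. Any arena ranked above Detroit has at least 18 venues, so at most two of those three can be present. Both the Rockets and the Heat appear in the output, so the Thunder's arena most likely returned nothing. The sketch below uses hypothetical toy data to make such gaps visible by reindexing against the full team list:

```python
import pandas as pd

# Hypothetical toy data: "Team C" got no venues, so it has no rows at all
all_teams = pd.Series(["Team A", "Team B", "Team C"])
venues = pd.DataFrame({
    "Team": ["Team A", "Team A", "Team B"],
    "Venue": ["Venue 1", "Venue 2", "Venue 3"],
})

counts = venues.groupby("Team")["Venue"].count()    # "Team C" is missing here
complete = counts.reindex(all_teams, fill_value=0)  # "Team C" now shows 0

print(complete.sort_values())
```

Applied to the notebook, this would be `venues.groupby('Team')['Venue'].count().reindex(nba['Team(s)'], fill_value=0)`.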

```python
print('There are {} unique categories.'.format(venues['Venue Category'].nunique()))
```

```
There are 69 unique categories.
```


Next we compute how often each venue category occurs around each arena, which shows that there are many different options.

```python
# one-hot encode the venue categories
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")

# add the team column back and move it to the front
onehot['Team'] = venues['Team']
fixed_columns = [onehot.columns[-1]] + list(onehot.columns[:-1])
onehot = onehot[fixed_columns]

# share of each category among the venues around each arena
grouped = onehot.groupby('Team').mean().reset_index()
grouped
```



| | Team | American Restaurant | Asian Restaurant | BBQ Joint | Bagel Shop | Bakery | Bistro | Brasserie | Brazilian Restaurant | Breakfast Spot | ... | Spanish Restaurant | Steakhouse | Sushi Restaurant | Taco Place | Tapas Restaurant | Thai Restaurant | Theme Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | Wings Joint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Atlanta Hawks | 0.15 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Boston Celtics | 0.1 | 0.1 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Brooklyn Nets | 0.05 | 0.0 | 0.0 | 0.05 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.05 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.05 | 0.0 |
| 3 | Charlotte Hornets | 0.1 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.15 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Chicago Bulls | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.083333 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.166667 | 0.0 | 0.0 | 0.0 | 0.083333 | 0.0 | 0.0 |
| 5 | Cleveland Cavaliers | 0.1 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.1 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.05 | 0.05 |
| 6 | Dallas Mavericks | 0.1 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 |
| 7 | Denver Nuggets | 0.0 | 0.066667 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | Detroit Pistons | 0.222222 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.055556 | 0.0 | 0.0 | 0.0 | 0.0 | 0.055556 | 0.0 |
| 9 | Golden State Warriors | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
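One-hot encoding followed by `groupby(...).mean()` gives, for each team, the share of its venues in each category. `pd.crosstab` with `normalize='index'` computes the same table directly. Here is a sketch on hypothetical toy data that checks the two approaches agree:

```python
import pandas as pd

# Hypothetical toy data in the same shape as `venues`
venues = pd.DataFrame({
    "Team": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Pizza Place", "Pizza Place", "Diner", "Café",
                       "Diner", "Taco Place"],
})

# Approach used above: one-hot encode, then average per team
onehot = pd.get_dummies(venues["Venue Category"]).astype(float)
onehot["Team"] = venues["Team"]
via_dummies = onehot.groupby("Team").mean()

# Equivalent: counts per (team, category), divided by each team's row total
via_crosstab = pd.crosstab(venues["Team"], venues["Venue Category"], normalize="index")

print(via_crosstab)
```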


To spot the most prominent categories across all arenas, we take each category's highest frequency at any single arena and list the top 10. This ranks categories by their peak share at one arena, not by how common they are overall.

```python
frequencies = grouped.loc[:, grouped.columns != 'Team'].max().sort_values(ascending=False).to_frame()
frequencies.head(10)
```

| | 0 |
|---|---|
| Pizza Place | 0.4 |
| Restaurant | 0.333333 |
| American Restaurant | 0.333333 |
| Food Truck | 0.333333 |
| Burger Joint | 0.2 |
| Steakhouse | 0.2 |
| New American Restaurant | 0.2 |
| Mexican Restaurant | 0.2 |
| Fast Food Restaurant | 0.2 |
| Taco Place | 0.166667 |
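This ranking uses each category's highest share at any single arena, so arenas with few venues weigh heavily. A share of 0.4 could mean 8 of 20 venues, but also 2 of 5, and the San Antonio Spurs' and New Orleans Pelicans' arenas returned only 5 and 3 venues. The sketch below uses hypothetical toy data to contrast the peak share with the overall number of venues per category (`value_counts`):

```python
import pandas as pd

# Hypothetical toy data: arena "Small" returns 2 venues, arena "Big" returns 10
venues = pd.DataFrame({
    "Team": ["Small"] * 2 + ["Big"] * 10,
    "Venue Category": ["Food Truck", "Pizza Place"]
                      + ["Pizza Place"] * 4 + ["Burger Joint"] * 6,
})

shares = pd.crosstab(venues["Team"], venues["Venue Category"], normalize="index")

# Ranking used above: highest share at any single arena
max_share = shares.max().sort_values(ascending=False)

# Alternative: how many venues of each category exist across all arenas
overall = venues["Venue Category"].value_counts()

print(max_share)  # Food Truck ties for second place (0.5, from a single venue)
print(overall)    # Food Truck comes last (1 venue)
```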


For a fuller picture, we also look at the least common categories to see whether anything stands out.

```python
frequencies.tail(10)
```

| | 0 |
|---|---|
| Eastern European Restaurant | 0.05 |
| Ethiopian Restaurant | 0.05 |
| Falafel Restaurant | 0.05 |
| Latin American Restaurant | 0.05 |
| Food | 0.05 |
| Kebab Restaurant | 0.05 |
| Food Stand | 0.05 |
| Irish Pub | 0.05 |
| Indian Restaurant | 0.05 |
| Wings Joint | 0.05 |


## Results and Discussion <a name="results"></a>

Our results suggest that the popular food places around sports arenas are not the healthiest ones. Pizza places top the list, followed by generic restaurants, American restaurants and food trucks, with burger joints and fast food restaurants close behind. Category names do not tell the whole story about what a venue serves, and the category "Restaurant" in particular could mean many different things. Still, it seems logical that spectators before or after a game are not looking for anything too fancy. More exotic places, such as Latin American and Indian restaurants, sit at the bottom of the list. This might suggest that people attending an American game also want their food in the same style.

Why some arenas did not return even 20 restaurants would require further exploration. One guess is that these arenas are in areas with few other activities, so people going to a game do not expect to find much else there. That does not rule out demand for a restaurant chain expanding to one of these locations.

There are 69 unique venue categories, which is a lot. Some categories may be typical of a certain state and would probably not work as well in other states. That possibility exists, but Foursquare data alone cannot answer the question. The areas around the home arenas of the San Antonio Spurs and New Orleans Pelicans stand out. With only five and three venues respectively, they offer a very limited range of options.

## Conclusion <a name="conclusion"></a>

If you want to open a restaurant near a sports arena, places like **Madison Square Garden** and **Staples Center** might not be the best options, since there is already plenty of choice there. The areas around the arenas in **New Orleans**, **San Antonio** and **Chicago** look like better places to start.

If you run a more exotic place, you would probably do better elsewhere. But if you offer something fast and typically American, these locations might be where you find your next hot spot.