# Clustering Toronto Neighbourhoods
#### Part 2: Venue Categories

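This notebook retrieves the venues near each Toronto neighbourhood from the Foursquare API, maps each venue to its top-level category, and saves a per-neighbourhood frequency table of venue categories for use in clustering.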

## Load libraries

In [1]:
import requests # HTTP requests
import pandas as pd # Data structures

## Load data

Here we load the dataframe, tor_boro.csv, which we saved previously in the Toronto_data_prep notebook.

In [2]:
tor_boro = pd.read_csv('tor_boro.csv')

## Define functions and objects

### Initialise cat_list

cat_list is a nested list of all top-level categories and their child categories available from the Foursquare API.

It is initialised once here so that the API is not called again every time a category is looked up.

This cell also defines the Foursquare API credentials, which are required for the calls made by getNearbyVenueCats.


In [3]:
CLIENT_ID = 'JHD12LNKPLUI4FCOPN1Q0QAZS2CCIYWXXXYHUBOOYPD3LBRR' # Foursquare ID
CLIENT_SECRET = 'L1TC5ELHY2RAETCJOCMOF3OFA4KHWLXIHOJDFYNU0NNG14KS' # Foursquare Secret
VERSION = '20190425' # Foursquare API version

# Create categories endpoint URL
url = ('https://api.foursquare.com/v2/venues/categories'
       '?client_id={}&client_secret={}&v={}').format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION)

# Create lists of venue categories from Foursquare endpoint URL 
cat_list = requests.get(url).json()['response']['categories']
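Query strings assembled with string formatting and a backslash line continuation can pick up stray whitespace or duplicated separators. As a minimal sketch, the same kind of URL could be built with the standard library's `urllib.parse.urlencode`. The helper name `build_url` is hypothetical, and the credential values are placeholders rather than real keys.

```python
from urllib.parse import urlencode

def build_url(endpoint: str, **params) -> str:
    """Hypothetical helper: join a Foursquare v2 endpoint and its query parameters."""
    base = 'https://api.foursquare.com/v2/' + endpoint
    return base + '?' + urlencode(params)

# Placeholder credentials, for illustration only
example_url = build_url(
    'venues/categories',
    client_id='YOUR_CLIENT_ID',
    client_secret='YOUR_CLIENT_SECRET',
    v='20190425',
)
print(example_url)
```

Because the parameters are encoded in one step, the resulting string contains exactly one `?` and no embedded spaces.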

### Define _getParentCat

_getParentCat is a function which returns the top-level parent category of a venue's category, as returned by the Foursquare API. The Foursquare Venue Category Hierarchy can be found in the Foursquare [docs](https://developer.foursquare.com/docs/build-with-foursquare/categories/).

Note that cat_list must already be defined, as in the cell above, for _getParentCat (and therefore getNearbyVenueCats) to work. As mentioned above, cat_list is not built inside the function so that the API is not called unnecessarily.

In [4]:
def _getParentCat(cat_list: list, target_id: str) -> str:
    '''Return the top-level parent category of a Foursquare venue category.

    cat_list is the nested list of all Foursquare categories and their child
    categories, and must be defined prior to using this function.
    target_id is matched against category names. Returns None if no match.
    '''

    for element in cat_list:
        # Direct match at this level
        if element['name'] == target_id:
            return element['name']
        # Otherwise search the children; a match anywhere below means
        # this element is the ancestor to report
        children = element.get('categories')
        if children and _getParentCat(children, target_id):
            return element['name']
    return None
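To make the recursion concrete, here is a small sketch run against a hand-written, heavily trimmed stand-in for the category tree. The `sample_cats` data and the name `top_level_category` are illustrative assumptions; the lookup logic mirrors the function above. Note that the result is the top-level ancestor, not the immediate parent.

```python
# Hypothetical, heavily trimmed stand-in for the Foursquare category tree
sample_cats = [
    {'name': 'Food', 'categories': [
        {'name': 'Asian Restaurant', 'categories': [
            {'name': 'Sushi Restaurant', 'categories': []},
        ]},
        {'name': 'Bakery', 'categories': []},
    ]},
    {'name': 'Shop & Service', 'categories': [
        {'name': 'Bookstore', 'categories': []},
    ]},
]

def top_level_category(cat_list, target_name):
    """Return the top-level ancestor of target_name, or None if it is absent."""
    for element in cat_list:
        if element['name'] == target_name:
            return element['name']
        # A match anywhere in the subtree makes this element the ancestor
        if top_level_category(element.get('categories', []), target_name):
            return element['name']
    return None

print(top_level_category(sample_cats, 'Sushi Restaurant'))  # Food
print(top_level_category(sample_cats, 'Bookstore'))         # Shop & Service
print(top_level_category(sample_cats, 'Casino'))            # None
```

A nested category such as 'Sushi Restaurant' resolves to 'Food' even though its immediate parent is 'Asian Restaurant', which keeps the number of distinct categories small.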


### Define getNearbyVenueCats

getNearbyVenueCats uses the Foursquare explore endpoint and the _getParentCat function to return a frequency table of venue categories for each neighbourhood.

The names, latitudes and longitudes arguments must be passed as iterables of equal length. The limit and radius arguments are optional and default to 100 and 1000 respectively.

More information on the Foursquare explore endpoint can be found in the [docs](https://developer.foursquare.com/docs/api-reference/venues/explore/).
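The function relies on a particular nesting in the explore response: the venues sit under `response -> groups[0] -> items`, and each venue's first listed category is used. The sketch below walks that path on a hand-written dictionary. The `sample_response` data is not real API output, and `primary_categories` is a hypothetical helper name.

```python
# Hand-written stand-in for requests.get(url).json(); not real API output
sample_response = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Venue A', 'categories': [{'name': 'Bakery'}]}},
        {'venue': {'name': 'Venue B', 'categories': []}},
        {'venue': {'name': 'Venue C', 'categories': [{'name': 'Bookstore'}]}},
    ]}]}
}

def primary_categories(payload):
    """Return the first category name of each venue, skipping venues without one."""
    items = payload['response']['groups'][0]['items']
    return [item['venue']['categories'][0]['name']
            for item in items if item['venue']['categories']]

print(primary_categories(sample_response))  # ['Bakery', 'Bookstore']
```

Guarding against an empty `categories` list avoids an `IndexError` on venues that carry no category.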

In [5]:
def getNearbyVenueCats(names, latitudes, longitudes, limit=100, radius=1000):

    '''
    Uses the Foursquare explore endpoint and the _getParentCat function to return
    a frequency table of venue categories for each neighbourhood
    '''

    url = 'https://api.foursquare.com/v2/venues/explore'
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # Query parameters for the API request; requests handles the encoding
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'limit': limit,
            'radius': radius,
            'sortByPopularity': 1,
        }

        results = requests.get(url, params=params).json()['response']['groups'][0]['items']

        # Pair the neighbourhood with each venue's parent category,
        # skipping venues that have no category
        venues_list.extend(
            (name, _getParentCat(cat_list, v['venue']['categories'][0]['name']))
            for v in results if v['venue']['categories'])

    # Build the dataframe once, after all neighbourhoods have been queried
    cat_df = pd.DataFrame(venues_list, columns=['Neighbourhood', 'Venue_Category'])

    # Create frequency table of venue categories for each neighbourhood

    # Change category names to dummy values
    cat_freq_dum = pd.get_dummies(cat_df.Venue_Category, prefix='', prefix_sep='')

    # Sum dummy values for each category by neighbourhood
    cat_freq = pd.concat([cat_df.Neighbourhood, cat_freq_dum],
                         axis=1).groupby('Neighbourhood').sum().reset_index()

    return cat_freq
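The frequency-table step at the end of the function can be tried without calling the API. This sketch applies the same `get_dummies` and `groupby` pattern to a few made-up rows. The neighbourhood labels and category values are illustrative, and `dtype=int` is added here only so that the dummy columns are integer counts.

```python
import pandas as pd

# Made-up (neighbourhood, top-level category) pairs for illustration
toy = pd.DataFrame(
    [('Neighbourhood A', 'Food'),
     ('Neighbourhood A', 'Food'),
     ('Neighbourhood A', 'Shop & Service'),
     ('Neighbourhood B', 'Shop & Service')],
    columns=['Neighbourhood', 'Venue_Category'])

# One indicator column per category, named after the category itself
dummies = pd.get_dummies(toy['Venue_Category'], prefix='', prefix_sep='', dtype=int)

# Summing the indicators per neighbourhood gives venue counts per category
freq = (pd.concat([toy['Neighbourhood'], dummies], axis=1)
          .groupby('Neighbourhood').sum().reset_index())
print(freq)
```

Each row of `freq` is one neighbourhood and each remaining column holds the number of venues found in that category.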

The functions above produce the dataframe we will use for analysis, which is saved as a CSV for further exploration. The optional limit and radius arguments of getNearbyVenueCats can also be changed below.

In [6]:
names = tor_boro.Neighbourhood
lats = tor_boro.Latitude
lngs = tor_boro.Longitude
# limit = 100
# radius = 500

toronto_venues = getNearbyVenueCats(names, lats, lngs)

In [8]:
toronto_venues.to_csv('toronto_venues.csv',index=False)