### Introduction

The objective is to compare the venues around the University of Chicago and the Georgia Institute of Technology (Georgia Tech), since these are universities that people like me would want to attend for a degree. Both institutions have good reputations, so the problem to solve was to find out which venues are available nearby without requiring long travel. Many people who live near a campus ride a bike or walk, and they need nearby places for daily activities and community.


To get started, the required libraries had to be imported before running any other code.

In [None]:
import pandas as pd
import numpy as np
import requests

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim  # converts an address into latitude and longitude

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print("Libraries Imported")

## Data Collection

The venue data was retrieved with the Foursquare Places API, which provides real-time access to Foursquare's global database of venue data and user content.

### Foursquare Credentials

In [None]:
CLIENT_ID = 'MWBMLLHDCHCNOFVSO405PNGLHJP0EW0MAD1MEEO0SKMOFSJX'
CLIENT_SECRET = '3J3OZ5F4I5WW5OTRZOOWNMREHZEOANOANIOUC3VG2AN5DIM3'
VERSION = '20200825'

print('Credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
# The client secret is not printed so that it does not end up in the notebook output
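
Hard-coded credentials are easy to leak when a notebook is shared. As a minimal sketch of one alternative, the values could be read from environment variables instead. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical names chosen for this example; they are not part of the Foursquare API.

```python
import os


def load_foursquare_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are hypothetical
    variable names chosen for this sketch.
    """
    # Allow a plain dict to be passed in, which makes the helper easy to test
    env = os.environ if env is None else env
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Environment variable {missing} is not set") from None
```

With the variables set in the shell that starts the notebook, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would replace the two assignments above.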

### Coordinates

The addresses of both institutions were taken from Google's Knowledge Graph search results. The Nominatim geocoder from geopy was used to convert each address into coordinates.

Get nearby venues in Chicago - University of Chicago
Address: 5801 S Ellis Ave, Chicago, IL 60637

In [None]:
address = "5801 S Ellis Ave, Chicago, IL 60637"

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
chi_latitude = location.latitude
chi_longitude = location.longitude
print('The geographical coordinates of the University of Chicago are {}, {}.'.format(chi_latitude, chi_longitude))

In [None]:
#Map of Chicago using latitude and longitude values
map_chicago = folium.Map(location=[chi_latitude, chi_longitude], zoom_start=10)
map_chicago

Get nearby venues in Atlanta - Georgia Institute of Technology
Address: North Ave NW, Atlanta, GA 30332

In [None]:
address = "North Ave NW, Atlanta, GA 30332"

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
atl_latitude = location.latitude
atl_longitude = location.longitude
print('The geographical coordinates of Georgia Tech are {}, {}.'.format(atl_latitude, atl_longitude))

In [None]:
#Map of Atlanta using latitude and longitude values
map_atlanta = folium.Map(location=[atl_latitude, atl_longitude], zoom_start=10)
map_atlanta
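
Nominatim's `geocode` returns `None` when it finds no match, in which case `location.latitude` in the cells above would fail with an `AttributeError`. The sketch below wraps the lookup with an explicit check. The helper name `coordinates_for` is hypothetical, and it takes the geocoding function as an argument so that it can be tried without a network connection.

```python
def coordinates_for(geocode, address):
    """Return (latitude, longitude) for an address, or raise if nothing was found.

    `geocode` is any callable shaped like Nominatim's geocode method: it takes
    an address string and returns an object with .latitude and .longitude
    attributes, or None when there is no match.
    """
    location = geocode(address)
    if location is None:
        raise ValueError(f"No geocoding result for address: {address!r}")
    return location.latitude, location.longitude
```

In this notebook it would be called as `chi_latitude, chi_longitude = coordinates_for(geolocator.geocode, address)`.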

## Explore Venues 

At first I used a smaller radius and then increased it to 500 meters in order to return enough venues near the universities. According to Google's Knowledge Graph, the University of Chicago campus covers 217 acres, and the Georgia Institute of Technology (Georgia Tech) campus is almost twice as large at over 400 acres. I had forgotten how large universities can be, so it made sense to extend the radius to 500. Even within that radius there were only 13 venue listings near the University of Chicago and 17 near Georgia Tech. I set the venue limit to 100, although the cap turned out to be unnecessary because so few venues were returned for either institution.
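
To put the campus sizes and the search radius in the same units, the acreages can be converted to the radius of a circle with the same area. This is only a rough sketch: it assumes a circular campus, which neither campus is, and the helper name `equivalent_radius_m` is made up for this example.

```python
import math

SQUARE_METERS_PER_ACRE = 4046.8564224  # definition of the international acre


def equivalent_radius_m(acres):
    """Radius in meters of a circle with the same area as a campus of `acres`."""
    area_m2 = acres * SQUARE_METERS_PER_ACRE
    return math.sqrt(area_m2 / math.pi)


# Campus sizes quoted in the text above
for name, acres in [("University of Chicago", 217), ("Georgia Tech", 400)]:
    radius = equivalent_radius_m(acres)
    print(f"{name}: {acres} acres is roughly a circle of radius {radius:.0f} m")
```

Under that assumption the University of Chicago campus corresponds to a radius of a little over 500 meters and Georgia Tech to roughly 700 meters, so a 500 meter search circle is about the size of the campus itself. That may be part of the reason so few venues were returned.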

In [None]:
# Limit: at most 100 venues
# Radius: 500 meters (the Foursquare radius parameter is given in meters, not miles)

LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    chi_latitude, 
    chi_longitude, 
    radius, 
    LIMIT)

# send the request and parse the JSON response into a Python dictionary
chi_results = requests.get(url).json()
chi_results

### Explore Venues nearby Georgia Institute of Technology

In [None]:
# Limit: at most 100 venues
# Radius: 500 meters (the Foursquare radius parameter is given in meters, not miles)

LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    atl_latitude, 
    atl_longitude, 
    radius, 
    LIMIT)

# send the request and parse the JSON response into a Python dictionary
atl_results = requests.get(url).json()
atl_results
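
The two request cells differ only in the coordinates, so the URL construction could be shared. The sketch below builds the same explore URL with `urllib.parse.urlencode`, which takes care of escaping. The helper name `explore_url` and its parameter names are hypothetical; the endpoint and query parameters are the ones already used above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def explore_url(client_id, client_secret, version, latitude, longitude,
                radius_m=500, limit=100):
    """Build the venues/explore URL for one location."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{latitude},{longitude}",  # latitude and longitude as one value
        "radius": radius_m,               # search radius in meters
        "limit": limit,                   # maximum number of venues to return
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"
```

Each request would then read, for example, `chi_results = requests.get(explore_url(CLIENT_ID, CLIENT_SECRET, VERSION, chi_latitude, chi_longitude)).json()`. Note that `urlencode` escapes the comma in `ll` as `%2C`, which is the standard encoding of the same value.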

## Data Preparation

### Venues Nearby 

The venues were grouped by Foursquare category to see which types are listed around each university, and then ordered by how close they are to the university. After previewing the fields available in the JSON response, I saw that not all of them were needed. The name, category, and coordinates are sufficient for the rest of the analysis.

The venue results for the University of Chicago and Georgia Institute of Technology (Georgia Tech) were stored in pandas dataframes.

Since there were so few venues, I did not have to run any code to find unique or common categories and venues. The two universities had few category types in common.

In [None]:
chi_venues = chi_results['response']['groups'][0]['items']

# Keep only the name, first category, and coordinates of each venue
chi_rows = [
    (
        v['venue']['name'],
        v['venue']['categories'][0]['name'],
        v['venue']['location']['lat'],
        v['venue']['location']['lng'],
    )
    for v in chi_venues
]

chi_df = pd.DataFrame(chi_rows, columns=['Name', 'Category Type', 'Latitude', 'Longitude'])
chi_df


In [None]:
print(chi_df.shape)

In [None]:
chi_df.groupby('Category Type').count()

### Categories for Venues nearby Georgia Tech

In [None]:
atl_venues = atl_results['response']['groups'][0]['items']

# Keep only the name, first category, and coordinates of each venue
atl_rows = [
    (
        v['venue']['name'],
        v['venue']['categories'][0]['name'],
        v['venue']['location']['lat'],
        v['venue']['location']['lng'],
    )
    for v in atl_venues
]

atl_df = pd.DataFrame(atl_rows, columns=['Name', 'Category Type', 'Latitude', 'Longitude'])
atl_df

In [None]:
print(atl_df.shape)

In [None]:
atl_df.groupby('Category Type').count()
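
The extraction above indexes `categories[0]` directly, which would raise an `IndexError` for any venue that comes back without a category. As a hedged sketch, the same four fields can be pulled out with a fallback of `None`. The helper name `venue_rows` is hypothetical, and the sample item below is a made-up placeholder, not data from the actual responses.

```python
def venue_rows(items):
    """Return (name, category, latitude, longitude) tuples from explore items."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use the first category when there is one, otherwise leave it empty
        category = categories[0]["name"] if categories else None
        location = venue["location"]
        rows.append((venue["name"], category, location["lat"], location["lng"]))
    return rows


# Made-up placeholder item in the same shape as the response
sample_items = [
    {"venue": {"name": "Example Cafe", "categories": [],
               "location": {"lat": 0.0, "lng": 0.0}}},
]
print(venue_rows(sample_items))
```

The result can be passed to `pd.DataFrame(..., columns=['Name', 'Category Type', 'Latitude', 'Longitude'])` exactly as in the cells above.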

## Data Exploration (Results)

The University of Chicago had only two category types that appeared more than once: parks and coffee shops.

There is a bus station near the University of Chicago, so someone living there has less need for a car or Uber. There is also a gym, although most universities already have a recreation center that students can use. For a break, there is more than one park for walking or running. For anyone who wants something heartier than coffee shop food, there is a BBQ joint nearby.

Georgia Institute of Technology (Georgia Tech) has many different kinds of restaurants nearby that are not on campus. One of the greatest perks is a music venue, where people can probably hear local bands. Anyone who does not want to stay in a dorm can spend a night at the nearby Hampton Inn, and not far from the hotel is a bank.

Based on these venues, there could be a shopping center in the area.
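
With only a handful of venues the categories can be compared by eye, as was done here. For longer lists, a sketch along the following lines could find the repeated and shared category types. The function names and the sample category lists are illustrative placeholders, not the actual results.

```python
from collections import Counter


def repeated_categories(categories):
    """Return {category: count} for categories that appear more than once."""
    counts = Counter(categories)
    return {name: n for name, n in counts.items() if n > 1}


def shared_categories(first, second):
    """Return the sorted category names that appear in both lists."""
    return sorted(set(first) & set(second))


# Illustrative placeholder lists, not the real Foursquare results
campus_a = ["Park", "Coffee Shop", "Park", "Gym"]
campus_b = ["Hotel", "Coffee Shop", "Bank"]
print(repeated_categories(campus_a))
print(shared_categories(campus_a, campus_b))
```

In this notebook the inputs would be the `'Category Type'` columns of `chi_df` and `atl_df`.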

## Data Analysis

### Maps with Markers

After looking at the venues for each university, maps were created with a marker for each venue. The 'Name' and 'Category Type' columns are dropped because k-means clustering only needs the coordinates. The maps are not ordinary marker maps: the markers are colored by k-means cluster, with an initial k of 3 clusters.

After creating the dataframes with the cluster labels, we can see that the clusters are color-coded and grouped by distance. The elbow method was not necessary since there are very few data points. A zoom level of 16 lets us see some data points that look like outliers, although k-means has no notion of outliers and assigns every point to a cluster. For instance, according to Google Maps, Leon's Barbecue is 19 minutes away by car and up to two hours on foot; it is about 6.6 miles from the University of Chicago. The venue farthest from Georgia Institute of Technology (Georgia Tech) is Coca Cola Mainstreet at AOC, although GT Stinger Shop came close and has its own cluster. Coca Cola @ AOC was not easy to find by name in Google Maps, so the restaurant's address was taken from Foursquare. Again, Georgia Tech wins, with the farthest venue less than 1.7 miles away, only a 35 minute walk.
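
The distances above were looked up in Google Maps. As a rough cross-check, the straight-line distance between two coordinates can be computed with the haversine formula, and a walking time can be estimated from a distance. This is a sketch under stated assumptions: the Earth radius is an approximate mean value, the 3 mph walking pace is an assumed default, and straight-line distances are shorter than real walking routes.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def walking_minutes(miles, mph=3.0):
    """Walking time in minutes at an assumed constant pace."""
    return miles / mph * 60
```

For each row, `haversine_miles(chi_latitude, chi_longitude, row_latitude, row_longitude)` would give the distance of a venue from the geocoded campus address. At the assumed 3 mph pace, 1.7 miles works out to about 34 minutes, in line with the 35 minute walk quoted above.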

In [None]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 3

# Passing axis as a positional argument no longer works in recent pandas versions
chi_df_clustering = chi_df.drop(columns=['Name', 'Category Type'])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(chi_df_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:13]
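
k-means measures Euclidean distance, and here it is applied to raw latitude and longitude in degrees. A degree of longitude covers less ground than a degree of latitude away from the equator (roughly three quarters as much at Chicago's latitude), so east-west gaps count for slightly less than north-south gaps. One option, sketched below and not used in this notebook, is to convert the coordinates to approximate local meters first. The helper name `to_local_meters` and the Earth radius constant are assumptions of this example.

```python
import math

EARTH_RADIUS_M = 6371000.0  # approximate mean Earth radius in meters


def to_local_meters(latitude, longitude, origin_latitude, origin_longitude):
    """Approximate east/north offsets in meters from an origin point.

    Uses an equirectangular approximation, which is reasonable over the
    short distances around a campus.
    """
    east = (math.radians(longitude - origin_longitude)
            * math.cos(math.radians(origin_latitude)) * EARTH_RADIUS_M)
    north = math.radians(latitude - origin_latitude) * EARTH_RADIUS_M
    return east, north
```

The resulting east and north columns could be passed to `KMeans` in place of the latitude and longitude columns, with the campus coordinates as the origin.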

We then need to create a new dataframe with the cluster labels.

In [None]:
# Work on a copy so that re-running this cell does not try to insert the column twice
chi_df_merged = chi_df_clustering.copy()
chi_df_merged.insert(0, 'Cluster Labels', kmeans.labels_)
name_column = chi_df['Name']
chi_df_with_clusters = pd.concat([chi_df_merged, name_column], axis=1)
chi_df_with_clusters

Two of the most distant locations are restaurants in the same cluster (green). Many would have thought the bus station would be closer to the University of Chicago. Midway Plaisance Park is the closest data point to the university, while Huckleberry Park is one of the farthest.

In [None]:
# create map
map_clusters = folium.Map(location=[chi_latitude, chi_longitude], zoom_start=16)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(chi_df_with_clusters['Latitude'], chi_df_with_clusters['Longitude'], chi_df_with_clusters['Name'], chi_df_with_clusters['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [None]:
# set number of clusters
kclusters = 3

# Passing axis as a positional argument no longer works in recent pandas versions
atl_df_clustering = atl_df.drop(columns=['Name', 'Category Type'])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(atl_df_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:17]

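We then need to create a new dataframe with the cluster labels.

In [None]:
# Work on a copy so that re-running this cell does not try to insert the column twice
atl_df_merged = atl_df_clustering.copy()
# The map cell reads the 'Cluster Labels' column, so it has to be inserted here
atl_df_merged.insert(0, 'Cluster Labels', kmeans.labels_)
name_column = atl_df['Name']
atl_df_with_clusters = pd.concat([atl_df_merged, name_column], axis=1)
atl_df_with_clusters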

As the clusters show, the prediction was correct: based on the venues, there is a shopping center near Georgia Institute of Technology. The data points are color-coded by cluster, and one outlier is visible on the map: GT Stinger Shop, which forms a cluster of its own with a single data point.

In [None]:
# create map
map_clusters = folium.Map(location=[atl_latitude, atl_longitude], zoom_start=16)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(atl_df_with_clusters['Latitude'], atl_df_with_clusters['Longitude'], atl_df_with_clusters['Name'], atl_df_with_clusters['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Conclusion

If the only option was to choose one of the two universities, then based on mobility it would definitely be Georgia Institute of Technology (Georgia Tech), since there is a shopping center within walking distance of the institution. As a foodie, I would have preferred Chicago, but two of the nice restaurants there are quite far away. Chicago does have the Divvy station and a bus route nearby for easy transportation if you do not want to travel on foot.

People have different thresholds for what is too far to travel without a bike, vehicle, or public transportation. Overall, the venues surrounding both institutions (University of Chicago and Georgia Institute of Technology (Georgia Tech)) are good for pedestrians who need to get around within a reasonable distance.