# Coursera Capstone Finale

For this project, I will cluster the top 300 universities in the United States by the venues around them, and then use those clusters to find the university towns most similar to my hometown. I will also build a tool so that anyone can look up which university towns are most similar to any town they choose.

In [1]:
'''The code in this notebook is not immediately ready to run. 
   If you want to run this, you will need to enter your API keys, 
   but the outputs from when I ran it should still be displayed'''

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json # library to handle JSON files
from geopy.geocoders import Bing # convert an address into latitude and longitude values
from geopy.geocoders import Nominatim
import requests # library to handle requests
import matplotlib.cm as cm # Matplotlib and associated plotting modules
import matplotlib.colors as colors
from sklearn.cluster import KMeans # import k-means from clustering stage
import folium # map rendering library
from bs4 import BeautifulSoup #web-scraping library

print('Libraries imported.')

Libraries imported.


### Collecting list of Universities

The first step is to fetch the page's HTML and extract the university names from it:

In [2]:
'''Retrieving HTML'''
url = 'https://www.4icu.org/us/' # Complete list of accredited U.S. universities, ordered by rank

response = requests.get(url, timeout=15)

In [3]:
'''Extracting the University Names'''
soup = BeautifulSoup(response.content, features='html5lib')

items = soup.find_all('tbody') # Finds the table of Colleges/Universities
uni_soup = BeautifulSoup(str(items[0]), features='html5lib')

unis=[]    # initialize list of universities


for uni in uni_soup.find_all('a'):  # loop to extract name of university from all rows
    unis.append(uni.string)

In [4]:
'''Convert List to Dataframe'''
uni_df = pd.DataFrame(unis, columns=['University']) # convert list to pandas dataframe

uni_df = uni_df.head(n=300) # For efficiency, I will take just the top 300 colleges

print(uni_df.shape)
uni_df.head()

(300, 1)


Unnamed: 0,University
0,Massachusetts Institute of Technology
1,Harvard University
2,Stanford University
3,Cornell University
4,"University of California, Berkeley"


### Finding Coordinates

In [5]:
key = 'enter-your-Bing-key-here'

In [6]:
'''Collecting Coordinates for each University with Bing Maps API'''

lats = []
longs = []
fails = []

locator = Bing(key)  # create the geocoder once instead of on every iteration

for univ in uni_df['University']:
    # print(univ)  # optional: confirm that the loop is iterating through the universities
    try:
        # geocode() returns None when Bing finds no match, so g.latitude
        # raises and the university is recorded as a failure
        g = locator.geocode(univ)
        lat_lng_coords = (g.latitude, g.longitude)
        lats.append(lat_lng_coords[0])
        longs.append(lat_lng_coords[1])
    except Exception:
        # a timeout or a missing match: skip this university instead of stopping the run
        fails.append(univ)

'Collecting Coordinates for each University with Bing Maps API'
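
The loop above makes a single attempt per university, so a transient timeout lands a university in `fails` just like a genuine missing match. One way to separate the two cases is a small retry helper. The sketch below is an illustration, not the code that produced the outputs in this notebook: `geocode_with_retry`, its `retries` and `delay` parameters, and the `flaky_geocode` stand-in are all hypothetical names, and the stand-in replaces the real `locator.geocode` so the example runs without an API key.

```python
import time
from types import SimpleNamespace


def geocode_with_retry(geocode, query, retries=3, delay=1.0):
    """Call geocode(query) up to `retries` times; return (lat, lng) or None."""
    for attempt in range(retries):
        try:
            result = geocode(query)
        except Exception:
            # transient failure (for example a timeout): wait, then try again
            time.sleep(delay * (attempt + 1))
            continue
        if result is None:
            # the service answered but found no match; retrying will not help
            return None
        return (result.latitude, result.longitude)
    return None


# Hypothetical stand-in for locator.geocode: fails twice, then succeeds.
calls = {'n': 0}


def flaky_geocode(query):
    calls['n'] += 1
    if calls['n'] < 3:
        raise TimeoutError('simulated timeout')
    return SimpleNamespace(latitude=42.360001, longitude=-71.092003)


coords = geocode_with_retry(flaky_geocode, 'Massachusetts Institute of Technology', delay=0)
print(coords)
```

In the notebook, `locator.geocode` would take the place of `flaky_geocode`, and a `None` result would send the university to `fails`.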

In [7]:
'''Remove rows for which Bing could not find coordinates'''

fails_index = []
for fail in fails:
    fails_index.append(uni_df.index[uni_df['University']==fail].tolist()[0])

uni_df.drop(fails_index, axis=0, inplace=True)

'Remove columns for which Bing could not find coordinates'

In [8]:
'''Add Coordinates to Dataframe'''

uni_df['Latitude'] = lats
uni_df['Longitude'] = longs

uni_df.head()

'Add Coordinates to Dataframe'

In [9]:
'''Save to CSV so I don't have to rerun the lengthy geocoding step later'''
uni_df.to_csv('universities_with_coords.csv')

uni_df = pd.read_csv('universities_with_coords.csv', index_col=0)
uni_df.head()

Unnamed: 0,University,Latitude,Longitude
0,Massachusetts Institute of Technology,42.360001,-71.092003
1,Harvard University,42.374203,-71.116272
2,Stanford University,37.431564,-122.163628
3,Cornell University,42.433624,-76.465393
4,"University of California, Berkeley",37.869999,-122.259003


In [10]:
'''Map of USA with University Locations'''

location = 'United States'

geolocator = Nominatim(user_agent="uni_explorer")
location = geolocator.geocode(location)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of the United States are {}, {}.'.format(latitude, longitude))

uni_map = folium.Map(location=[latitude, longitude], zoom_start=5)

# add markers to map
for lat, lng, uni in zip(uni_df['Latitude'], uni_df['Longitude'], uni_df['University']):
    label = folium.Popup(str(uni), parse_html=True)  # parse_html belongs to the popup, not the marker
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(uni_map)
    
uni_map

The geographical coordinates of the United States are 39.7837304, -100.4458825.


### Using Foursquare API to find venues near universities

In [11]:
'''Foursquare Credentials'''

CLIENT_ID = 'enter-ID-here'
CLIENT_SECRET = 'enter-secret-here'
VERSION = '20180605'
LIMIT = 100

In [12]:
'''Venue search function'''

def getNearbyVenues(names, latitudes, longitudes, radius=3000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(name, lat, lng, v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'], v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['University', 'University Latitude', 'University Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    
    return nearby_venues

In [13]:
'''Search for venues around universities'''

uni_venues = getNearbyVenues(uni_df['University'], uni_df['Latitude'], uni_df['Longitude'])
uni_venues.head()

'Search for venues around universities'

In [14]:
'''Save search to CSV so I don't have to run the lengthy Foursquare API again'''

uni_venues.to_csv('raw_uni_venue_search.csv')

uni_venues = pd.read_csv('raw_uni_venue_search.csv', index_col=0)
uni_venues.head()

Unnamed: 0,University,University Latitude,University Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Massachusetts Institute of Technology,42.360001,-71.092003,Area Four,42.363096,-71.092185,Pizza Place
1,Massachusetts Institute of Technology,42.360001,-71.092003,Flour Bakery + Cafe,42.361123,-71.096521,Bakery
2,Massachusetts Institute of Technology,42.360001,-71.092003,Darwin's Ltd.,42.362402,-71.098514,Sandwich Place
3,Massachusetts Institute of Technology,42.360001,-71.092003,Izzy's Restaurant,42.366181,-71.095754,Latin American Restaurant
4,Massachusetts Institute of Technology,42.360001,-71.092003,Mamaleh's,42.365933,-71.09118,Jewish Restaurant
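
`getNearbyVenues` indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so one venue without a category, or one response without groups, would stop the whole search. The sketch below shows a more defensive way to pull the same four fields out of a payload. The helper name `extract_venues` and the `sample` payload are hypothetical; the payload only mirrors the keys that the function above already reads and is not a real Foursquare response.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # keep the venue even when it has no category, using None as a placeholder
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hypothetical payload shaped like the keys read in getNearbyVenues.
sample = {
    'response': {
        'groups': [{
            'items': [
                {'venue': {'name': 'Area Four',
                           'location': {'lat': 42.363096, 'lng': -71.092185},
                           'categories': [{'name': 'Pizza Place'}]}},
                {'venue': {'name': 'Uncategorized Venue',
                           'location': {'lat': 42.36, 'lng': -71.09},
                           'categories': []}},
            ]
        }]
    }
}

rows = extract_venues(sample)
print(rows)
```

Rows whose category is `None` can then be dropped or relabeled before the one-hot encoding step.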


In [15]:
'''Get dummies and insert Universities back into dataframe'''

uni_dummies = pd.get_dummies(uni_venues[['Venue Category']], prefix="", prefix_sep="")

uni_dummies.insert(0, 'Universities', uni_venues['University'])

In [16]:
'''Group venues by university'''

uni_grouped = uni_dummies.groupby('Universities').mean().reset_index()
uni_grouped.head()

Unnamed: 0,Universities,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,...,Yoga Studio,Zoo,Zoo Exhibit
0,American University,0.0,0.0,0.0,0.0,0.0,...,0.02,0.0,0.0
1,Amherst College,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0
2,Appalachian State University,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0
3,Arizona State University,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0
4,Auburn University,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0
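
Each value in the grouped table is the mean of a one-hot column, which is the share of a university's venues that fall in that category. The sketch below reproduces that calculation with the standard library instead of `pd.get_dummies` and `groupby`, purely to make the meaning of the numbers explicit. The MIT pairs come from the first rows of the venue table shown earlier; `Example University` and its venues are made up for illustration, and `category_frequencies` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# (university, venue category) pairs
pairs = [
    ('Massachusetts Institute of Technology', 'Pizza Place'),
    ('Massachusetts Institute of Technology', 'Bakery'),
    ('Massachusetts Institute of Technology', 'Sandwich Place'),
    ('Massachusetts Institute of Technology', 'Latin American Restaurant'),
    ('Massachusetts Institute of Technology', 'Jewish Restaurant'),
    # hypothetical rows, not from the real search results
    ('Example University', 'Coffee Shop'),
    ('Example University', 'Coffee Shop'),
    ('Example University', 'Park'),
    ('Example University', 'Pizza Place'),
]


def category_frequencies(pairs):
    """Return {university: {category: share of that university's venues}}."""
    counts = defaultdict(Counter)
    for university, category in pairs:
        counts[university][category] += 1
    return {
        university: {cat: n / sum(c.values()) for cat, n in c.items()}
        for university, c in counts.items()
    }


freqs = category_frequencies(pairs)
print(freqs['Example University'])
```

Because every row sums to 1, universities with few returned venues get large per-category values, which is worth keeping in mind when reading the clusters.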


In [17]:
'''Clustering with K means'''

kmean = KMeans(n_clusters=8, random_state=10)

uni_grouped_cluster = uni_grouped.drop('Universities', axis=1)

k = kmean.fit(uni_grouped_cluster)

k.labels_

array([5, 5, 5, 1, 6, 1, 6, 2, 5, 5, 5, 1, 2, 2, 5, 6, 6, 5, 6, 6, 5, 2,
       6, 5, 1, 1, 2, 6, 5, 5, 6, 1, 6, 6, 6, 1, 5, 6, 6, 2, 5, 5, 5, 1,
       1, 0, 5, 5, 5, 5, 1, 2, 6, 1, 2, 6, 5, 6, 5, 6, 6, 2, 2, 6, 5, 5,
       5, 5, 6, 1, 5, 2, 6, 5, 5, 1, 2, 2, 2, 5, 6, 2, 1, 1, 1, 6, 5, 6,
       6, 6, 5, 5, 5, 6, 5, 5, 5, 1, 2, 6, 5, 6, 5, 6, 2, 1, 6, 6, 2, 6,
       0, 5, 2, 5, 7, 6, 5, 6, 6, 6, 2, 6, 2, 1, 1, 0, 2, 5, 1, 5, 2, 2,
       2, 2, 1, 6, 2, 6, 0, 6, 5, 0, 6, 6, 2, 5, 6, 5, 1, 6, 2, 2, 6, 6,
       1, 6, 2, 0, 2, 2, 2, 6, 2, 2, 6, 6, 5, 2, 5, 6, 2, 5, 1, 5, 6, 5,
       5, 1, 3, 6, 1, 6, 5, 1, 2, 5, 1, 1, 6, 6, 5, 0, 1, 1, 6, 5, 2, 5,
       5, 2, 2, 6, 5, 2, 6, 6, 5, 1, 2, 2, 2, 2, 6, 6, 5, 5, 5, 1, 5, 6,
       1, 5, 5, 2, 2, 5, 2, 2, 2, 2, 2, 6, 2, 2, 6, 2, 2, 1, 5, 5, 6, 5,
       6, 5, 0, 5, 6, 1, 6, 5, 2, 6, 2, 1, 5, 6, 2, 2, 6, 6, 5, 4, 6, 5,
       5, 6, 1, 0, 5, 6, 6, 6, 6, 1, 1, 6, 5, 6, 5, 6, 2, 0])
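
The label array is hard to read at a glance, and labels 3, 4 and 7 each appear only once in it. Counting the members of each cluster makes such imbalances obvious. The sketch below does this for the first row of the output above (22 of the labels); `min_size` is a hypothetical threshold chosen only for the example.

```python
from collections import Counter

# first row of the k.labels_ output shown above
labels = [5, 5, 5, 1, 6, 1, 6, 2, 5, 5, 5, 1, 2, 2, 5, 6, 6, 5, 6, 6, 5, 2]

cluster_sizes = Counter(labels)
for cluster, size in sorted(cluster_sizes.items()):
    print('Cluster {}: {} universities'.format(cluster, size))

# clusters below a (hypothetical) minimum size may deserve a closer look
min_size = 4
small_clusters = sorted(c for c, n in cluster_sizes.items() if n < min_size)
print('Small clusters:', small_clusters)
```

With the fitted model in memory, `Counter(k.labels_.tolist())` would give the sizes for all universities.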

In [18]:
'''Finding the most common venues near each university'''

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['University']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # only the first three ranks take 'st', 'nd', 'rd'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
uni_venues_sorted = pd.DataFrame(columns=columns)
uni_venues_sorted['University'] = uni_grouped['Universities']

for ind in np.arange(uni_grouped.shape[0]):
    uni_venues_sorted.iloc[ind, 1:] = return_most_common_venues(uni_grouped.iloc[ind, :], num_top_venues)

uni_venues_sorted.insert(0, 'Cluster Labels', k.labels_)

uni_merged = uni_df

print(uni_df.shape)

uni_merged = uni_merged.join(uni_venues_sorted.set_index('University'), on='University')

uni_merged.dropna(inplace=True)
uni_merged.head()

(285, 3)


Unnamed: 0,University,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Massachusetts Institute of Technology,42.360001,-71.092003,5.0,Spa,Bakery,Seafood Restaurant,Hotel,Ice Cream Shop,Coffee Shop,American Restaurant,Restaurant,Plaza,Wine Shop
1,Harvard University,42.374203,-71.116272,5.0,Café,New American Restaurant,Bakery,Pizza Place,Pub,Coffee Shop,Gym,Vegetarian / Vegan Restaurant,Brewery,Ice Cream Shop
2,Stanford University,37.431564,-122.163628,5.0,Coffee Shop,Park,French Restaurant,Cosmetics Shop,Gym / Fitness Center,Ice Cream Shop,Sandwich Place,Greek Restaurant,Gym,Monument / Landmark
3,Cornell University,42.433624,-76.465393,5.0,American Restaurant,Bagel Shop,Sandwich Place,New American Restaurant,Coffee Shop,Bed & Breakfast,Pizza Place,Thai Restaurant,College Quad,Latin American Restaurant
4,"University of California, Berkeley",37.869999,-122.259003,5.0,Japanese Restaurant,Theater,Bookstore,Chinese Restaurant,Yoga Studio,Coffee Shop,Ice Cream Shop,Café,Pizza Place,Scenic Lookout
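
The column names above rely on a three-item suffix list plus an exception for every later rank, which works for ten venues but would produce "21th" if `num_top_venues` grew past 20. A small helper avoids both the exception handling and that edge case. This is a sketch of an alternative, not the code that produced the table; `ordinal` is a hypothetical helper name.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 10
columns = ['University'] + [
    '{} Most Common Venue'.format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns)
```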


In [19]:
'''Mapping Clusters'''

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=4.5)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
kclusters = 8  # must match n_clusters passed to KMeans
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(uni_merged['Latitude'], uni_merged['Longitude'], uni_merged['University'], uni_merged['Cluster Labels']):
    cluster = int(cluster)  # labels became floats when the join introduced NaNs
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # index by label directly; labels run from 0 to 7
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
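
The map only needs one distinct color per cluster label. The sketch below builds such a palette with the standard-library `colorsys` module rather than matplotlib's `rainbow` colormap, so it is a swapped-in technique shown for illustration and will not reproduce the exact colors of the map above. `cluster_palette` and `color_for` are hypothetical names.

```python
import colorsys


def cluster_palette(n_clusters):
    """Return n_clusters hex colors with evenly spaced hues."""
    palette = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            int(r * 255), int(g * 255), int(b * 255)))
    return palette


palette = cluster_palette(8)


def color_for(label):
    """Look up the color for a cluster label, accepting floats such as 5.0."""
    return palette[int(label)]


print(palette)
print(color_for(5.0))
```

Indexing by the label itself keeps the mapping from cluster to color easy to read back from the map legend or popups.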

### The Clusters

In [20]:
uni_merged.loc[uni_merged['Cluster Labels'] == 0, uni_merged.columns[[0] + list(range(4, uni_merged.shape[1]))]]

Unnamed: 0,University,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Columbia University in the City of New York,Park,Italian Restaurant,Coffee Shop,Playground,Cocktail Bar,Pizza Place,Dog Run,Southern / Soul Food Restaurant,American Restaurant,Farmers Market
15,New York University,Wine Shop,Bookstore,Bakery,Italian Restaurant,Thai Restaurant,Park,Mediterranean Restaurant,Gourmet Shop,Wine Bar,Sandwich Place
46,Washington University in St. Louis,Zoo,Park,American Restaurant,Thai Restaurant,Taco Place,Wine Shop,Sandwich Place,Rock Club,Vegetarian / Vegan Restaurant,Gastropub
77,"University of California, San Francisco",Park,Coffee Shop,Garden,Bookstore,Scenic Lookout,Bakery,Playground,Gym,Sandwich Place,Breakfast Spot
105,San Francisco State University,Park,Chinese Restaurant,Sandwich Place,Japanese Restaurant,Coffee Shop,Bakery,Mobile Phone Shop,Grocery Store,Ice Cream Shop,Clothing Store
138,The New School,Bakery,Gourmet Shop,Gym / Fitness Center,Bookstore,Wine Shop,Park,Gym,Italian Restaurant,Ice Cream Shop,Spa
153,University of San Francisco,Bakery,Bookstore,Garden,Park,Pizza Place,Sushi Restaurant,New American Restaurant,Scenic Lookout,Spa,Gift Shop
187,Pace University,Park,Bakery,Coffee Shop,Hotel,Wine Shop,Italian Restaurant,Café,Mediterranean Restaurant,Garden,Spa
191,Yeshiva University,Park,Pizza Place,Latin American Restaurant,Spanish Restaurant,Café,Wine Shop,Bakery,Mexican Restaurant,Deli / Bodega,Sandwich Place
282,School of Visual Arts,Park,Bookstore,Gourmet Shop,Bakery,Wine Shop,Gym / Fitness Center,Japanese Restaurant,Gym,Spa,Ice Cream Shop
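
The column selector `uni_merged.columns[[0] + list(range(4, uni_merged.shape[1]))]` keeps the first column and everything from the fifth column onward. A minimal sketch on a toy frame shows the effect; the column layout (name, latitude, longitude, cluster label, then venue columns) is an assumption inferred from the output above, and the rows are made up.

```python
import pandas as pd

# Toy frame with an assumed layout: name, coordinates, label, then venue columns.
toy = pd.DataFrame({
    'University': ['A', 'B', 'C'],
    'Latitude': [40.8, 42.3, 37.4],
    'Longitude': [-73.9, -83.7, -122.2],
    'Cluster Labels': [0, 1, 0],
    '1st Most Common Venue': ['Park', 'Coffee Shop', 'Bakery'],
    '2nd Most Common Venue': ['Bakery', 'Pizza Place', 'Park'],
})

# Position 0 plus positions 4..end: skips the coordinate and label columns.
keep = toy.columns[[0] + list(range(4, toy.shape[1]))]

# Boolean row mask selects one cluster; `keep` selects the display columns.
cluster_0 = toy.loc[toy['Cluster Labels'] == 0, keep]
print(cluster_0)
```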


In [21]:
uni_merged.loc[uni_merged['Cluster Labels'] == 1, uni_merged.columns[[0] + list(range(4, uni_merged.shape[1]))]]

Unnamed: 0,University,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,University of Michigan,Park,Sandwich Place,Pizza Place,Sushi Restaurant,Hotel,Coffee Shop,Korean Restaurant,Café,Pharmacy,Chinese Restaurant
17,Carnegie Mellon University,Coffee Shop,Pizza Place,American Restaurant,Ice Cream Shop,Bakery,Bar,Sandwich Place,New American Restaurant,Toy / Game Store,Fried Chicken Joint
18,University of Southern California,Pizza Place,Coffee Shop,Sandwich Place,Mexican Restaurant,Science Museum,Taco Place,Café,American Restaurant,Food Truck,Food Court
21,Arizona State University,Sandwich Place,Coffee Shop,Pizza Place,Thai Restaurant,Breakfast Spot,Mexican Restaurant,American Restaurant,Bar,Park,Farmers Market
22,University of Illinois at Urbana-Champaign,Coffee Shop,Mexican Restaurant,Pizza Place,Chinese Restaurant,Park,Fast Food Restaurant,American Restaurant,Grocery Store,BBQ Joint,Bar
23,Michigan State University,Coffee Shop,Sandwich Place,Hotel,Fast Food Restaurant,Golf Course,Gym / Fitness Center,Indian Restaurant,Theater,College Cafeteria,Train Station
25,"University of California, Irvine",Coffee Shop,Sandwich Place,Fast Food Restaurant,American Restaurant,Pizza Place,Shopping Mall,Grocery Store,Park,Mexican Restaurant,Ice Cream Shop
27,"University of California, Davis",Coffee Shop,Mexican Restaurant,Grocery Store,Pizza Place,Ice Cream Shop,Stables,Park,Sandwich Place,Burger Joint,Café
36,The University of Utah,Coffee Shop,Pizza Place,New American Restaurant,Grocery Store,Sandwich Place,Bakery,Bank,Video Store,Hotel,Zoo Exhibit
38,University of Virginia,Coffee Shop,Pizza Place,Hotel,Café,BBQ Joint,American Restaurant,Park,Italian Restaurant,Mediterranean Restaurant,Movie Theater


In [22]:
uni_merged.loc[uni_merged['Cluster Labels'] == 2, uni_merged.columns[[0] + list(range(4, uni_merged.shape[1]))]]

Unnamed: 0,University,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Johns Hopkins University,American Restaurant,Pizza Place,Brewery,Bar,Park,Coffee Shop,Italian Restaurant,Pharmacy,Breakfast Spot,Bakery
10,Penn State University,Pizza Place,Ice Cream Shop,Breakfast Spot,Sandwich Place,Salon / Barbershop,Gym / Fitness Center,Bar,Indian Restaurant,Sports Bar,Asian Restaurant
12,Yale University,Pizza Place,Coffee Shop,Indian Restaurant,American Restaurant,Italian Restaurant,Bar,Café,Ice Cream Shop,Food Truck,Sandwich Place
13,Purdue University,Coffee Shop,Pizza Place,Sandwich Place,Video Store,Bar,Mexican Restaurant,Hotel,Ice Cream Shop,Café,Italian Restaurant
28,Duke University,Bar,Hotel,Pizza Place,Brewery,Cocktail Bar,Gym,Coffee Shop,Ice Cream Shop,French Restaurant,BBQ Joint
30,University of North Carolina at Chapel Hill,Bar,Coffee Shop,Trail,Breakfast Spot,Gift Shop,Sandwich Place,Brewery,American Restaurant,Pizza Place,Hotel
33,The University of Arizona,Pizza Place,Sandwich Place,Bar,Ice Cream Shop,Coffee Shop,American Restaurant,Deli / Bodega,Hotel,Brewery,Video Store
39,The Ohio State University,Bar,Pizza Place,Salon / Barbershop,Breakfast Spot,Sandwich Place,Grocery Store,Hotel,Bakery,Sporting Goods Shop,Video Store
43,Indiana University Bloomington,Pizza Place,Ice Cream Shop,Fast Food Restaurant,Bar,Clothing Store,Coffee Shop,Grocery Store,Bagel Shop,Bakery,Restaurant
49,Florida State University,Sandwich Place,Pizza Place,Bar,Café,Discount Store,Coffee Shop,Fast Food Restaurant,Restaurant,Grocery Store,Gym


In [23]:
uni_merged.loc[uni_merged['Cluster Labels'] == 3, uni_merged.columns[[0] + list(range(4, uni_merged.shape[1]))]]

Unnamed: 0,University,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
209,United States Military Academy,Lake,Campground,Fried Chicken Joint,Mountain,Exhibit,Eye Doctor,Event Space,Fabric Shop,Filipino Restaurant,Factory


In [24]:
uni_merged.loc[uni_merged['Cluster Labels'] == 4, uni_merged.columns[[0] + list(range(4, uni_merged.shape[1]))]]

Unnamed: 0,University,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
66,Vanderbilt University,American Restaurant,Art Gallery,Intersection,Pizza Place,Fair,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Zoo Exhibit


In [25]:
uni_merged.loc[uni_merged['Cluster Labels'] == 5, uni_merged.columns[[0] + list(range(4, uni_merged.shape[1]))]]

Unnamed: 0,University,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Massachusetts Institute of Technology,Spa,Bakery,Seafood Restaurant,Hotel,Ice Cream Shop,Coffee Shop,American Restaurant,Restaurant,Plaza,Wine Shop
1,Harvard University,Café,New American Restaurant,Bakery,Pizza Place,Pub,Coffee Shop,Gym,Vegetarian / Vegan Restaurant,Brewery,Ice Cream Shop
2,Stanford University,Coffee Shop,Park,French Restaurant,Cosmetics Shop,Gym / Fitness Center,Ice Cream Shop,Sandwich Place,Greek Restaurant,Gym,Monument / Landmark
3,Cornell University,American Restaurant,Bagel Shop,Sandwich Place,New American Restaurant,Coffee Shop,Bed & Breakfast,Pizza Place,Thai Restaurant,College Quad,Latin American Restaurant
4,"University of California, Berkeley",Japanese Restaurant,Theater,Bookstore,Chinese Restaurant,Yoga Studio,Coffee Shop,Ice Cream Shop,Café,Pizza Place,Scenic Lookout
8,University of Washington,Park,Italian Restaurant,Café,Vietnamese Restaurant,Seafood Restaurant,Burger Joint,Pizza Place,Ice Cream Shop,Grocery Store,Pet Store
9,University of Minnesota-Twin Cities,American Restaurant,Mobile Phone Shop,Snack Place,Sandwich Place,Coffee Shop,Shoe Store,Chinese Restaurant,Kids Store,Supplement Shop,Fast Food Restaurant
11,University of Pennsylvania,Coffee Shop,Hotel,Park,Sushi Restaurant,Seafood Restaurant,Mediterranean Restaurant,Pizza Place,Ice Cream Shop,Science Museum,Israeli Restaurant
14,The University of Texas at Austin,Sandwich Place,Bar,Food Truck,Cocktail Bar,American Restaurant,History Museum,Ice Cream Shop,BBQ Joint,Hotel,Yoga Studio
19,Princeton University,Pizza Place,Hotel,Trail,Coffee Shop,Rental Car Location,Park,Ice Cream Shop,Sushi Restaurant,New American Restaurant,Clothing Store


In [26]:
uni_merged.loc[uni_merged['Cluster Labels'] == 6, uni_merged.columns[[0] + list(range(4, uni_merged.shape[1]))]]

Unnamed: 0,University,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,University of Wisconsin-Madison,Convenience Store,Bowling Alley,Gym,Discount Store,Sandwich Place,General Entertainment,Post Office,Hotel,American Restaurant,Falafel Restaurant
32,"Rutgers, The State University of New Jersey",Pizza Place,Hotel,Diner,American Restaurant,Convenience Store,Sports Bar,Breakfast Spot,Mobile Phone Shop,Deli / Bodega,Chinese Restaurant
34,University of Maryland,Pizza Place,Gas Station,Sandwich Place,Mobile Phone Shop,Video Game Store,Department Store,Gym,Grocery Store,Convenience Store,Cosmetics Shop
37,Texas A&M University,Mexican Restaurant,Fast Food Restaurant,Sandwich Place,Hotel,Pizza Place,Smoothie Shop,Bakery,Bar,Music Store,Park
51,Brigham Young University,Sandwich Place,Pizza Place,Video Store,Fast Food Restaurant,Mexican Restaurant,Ice Cream Shop,Discount Store,Café,Asian Restaurant,ATM
57,University of Rochester,Pizza Place,Sandwich Place,Coffee Shop,Fast Food Restaurant,Chinese Restaurant,Hotel,Burger Joint,Park,Discount Store,Sporting Goods Shop
59,University of South Florida,Sandwich Place,Fast Food Restaurant,Clothing Store,Theme Park,Coffee Shop,Theme Park Ride / Attraction,Pizza Place,Lingerie Store,Supplement Shop,Bar
63,University of Houston,Fried Chicken Joint,Fast Food Restaurant,Sandwich Place,Bakery,Food Service,Pizza Place,Pharmacy,Burger Joint,Discount Store,Video Store
67,University of Delaware,Pizza Place,Coffee Shop,Video Store,Ice Cream Shop,American Restaurant,Sandwich Place,Rental Car Location,Bank,Discount Store,Mexican Restaurant
71,"University of California, Riverside",Sandwich Place,Pizza Place,Mexican Restaurant,Coffee Shop,Video Store,Bank,Burger Joint,Rental Car Location,Park,Fast Food Restaurant


In [27]:
uni_merged.loc[uni_merged['Cluster Labels'] == 7, uni_merged.columns[[0] + list(range(4, uni_merged.shape[1]))]]

Unnamed: 0,University,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
145,Northern Arizona University,Theater,Home Service,Garden Center,Fair,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Zoo Exhibit,Field
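
Clusters 3, 4 and 7 each display a single university above, so it can help to check cluster sizes before reading too much into any one group. The following is a minimal sketch on made-up labels; with the real data the same two lines could be run on `uni_merged['Cluster Labels']`.

```python
import pandas as pd

# Made-up labels for illustration only.
toy = pd.DataFrame({
    'University': ['A', 'B', 'C', 'D', 'E', 'F'],
    'Cluster Labels': [0, 1, 1, 2, 1, 0],
})

# Number of universities per cluster, ordered by cluster label.
sizes = toy['Cluster Labels'].value_counts().sort_index()

# Clusters with exactly one member behave like outliers.
singletons = sizes[sizes == 1].index.tolist()
print(sizes)
print(singletons)
```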


In [31]:
import plotly.express as px
from jupyter_dash import JupyterDash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output


app = JupyterDash(__name__)
app.layout = html.Div([                        # overall layout of the dashboard
    html.H1("Find Universities Similar to Your Favorite City"),
    dcc.Markdown(id='unis'),
    html.Br(),
    html.H4('Please format the city as "City, State"'),
    html.Label([
        "City: ",
        dcc.Input(
            id='city',
            value='',
            placeholder='Enter town here...',
            )
    ]),
])
@app.callback(                        # callback function to update list
    Output('unis', 'children'),
    [Input("city", "value")]
)
def update_list(city):         # function that finds the cluster of the entered city
    try:
        locator = Bing(key)            # find coordinates of city
        g = locator.geocode(city)
        lat = g.latitude
        lng = g.longitude

        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(  # get venues in city
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, 3000, LIMIT)
            
        results = requests.get(url, timeout=15).json()["response"]['groups'][0]['items']  # timeout avoids hanging the callback on a stalled request

        # one (city, venue category) row per venue returned for the city
        nearby_venues = pd.DataFrame(
            [(city, v['venue']['categories'][0]['name']) for v in results],
            columns=['University', 'Venue Category'])

        combined_df = pd.concat([uni_venues[['University','Venue Category']], nearby_venues]).reset_index(drop=True) # add city to dataframe of universities (DataFrame.append was removed in pandas 2.0)

        dummies = pd.get_dummies(combined_df[['Venue Category']], prefix="", prefix_sep="")  # make dummies for dataframe
        dummies.insert(0, 'City', combined_df['University'])
        city_grouped = dummies.groupby('City').mean().reset_index()

        for col in city_grouped.columns:      # checks and deletes any columns from the city that were not in the training data
            if col == 'City':
                continue
            elif col not in uni_dummies.columns:
                city_grouped.drop(col, axis=1, inplace=True)

        city_data = city_grouped[city_grouped['City']==city]  # get only the row of the city that was entered

        city_k = k.predict(city_data.drop('City', axis=1))[0]  # find the cluster for the city
        uni_list = uni_merged.loc[uni_merged['Cluster Labels'] == city_k, uni_merged.columns[0]].head(25) # find the top 25 universities in that cluster

        # Markdown (rendered by dcc.Markdown) that lists the universities; named `markdown` to avoid shadowing the imported `html` module
        markdown = '#### The 25 top universities with venues most similar to {} are in cluster {}:\n'.format(city, city_k)
        for i, uni in enumerate(uni_list, start=1):
            markdown += '\n' + str(i) + '. ' + uni

        return markdown

    except Exception:
        return 'Please enter a valid city'  # shown whenever the lookup fails for the entered string
        
    
# Run app and display result inline in the notebook
app.run_server(mode='inline', port=1050)
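
The callback lines up the city's venue columns with the training data by concatenating the city's venues with `uni_venues` and then dropping unseen categories. An alternative that avoids re-encoding every university on each request is to compute the city's category frequencies directly and `reindex` them to the training columns. This is only a sketch of that swapped-in approach, not the method used above; `train_columns` and `encode_city` are hypothetical names, and the category list is made up.

```python
import pandas as pd

# Hypothetical stand-in for the venue-category columns used to fit the model.
train_columns = ['Bakery', 'Coffee Shop', 'Park', 'Pizza Place']

def encode_city(categories, columns):
    """Return a one-row frame of category frequencies aligned to `columns`."""
    # Share of the city's venues falling in each category.
    freqs = pd.Series(categories).value_counts(normalize=True)
    # reindex drops categories the model never saw and fills absent ones with 0.
    return freqs.reindex(columns, fill_value=0.0).to_frame().T

city_row = encode_city(['Park', 'Coffee Shop', 'Park', 'Surf Spot'], train_columns)
print(city_row)
```

The resulting row could then be passed to `k.predict` in place of `city_data.drop('City', axis=1)`, provided `train_columns` lists the model's feature columns in their original order.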