<h1 align=center><font size = 5> Capstone Project: The Battle of Neighborhoods </font></h1>

<h2 align=center><font size = 4> Question: Where is the best place to visit a museum in the State of Ohio, United States? </font></h2>

<h2 align=left><font size = 4> Introduction </font></h2>

In this report, museums are defined as institutions (nonprofit organizations, government entities, and private businesses) that collect and care for objects of cultural, artistic, scientific, or historical interest and make their collections or related exhibits available for public viewing.  Museums that exist only on the Internet (i.e., virtual museums) are not included.

When I was in college 30 years ago, when video games, the internet, and web surfing were in their infancy, physically visiting a museum was a big deal.  Even today, museums in metropolitan areas such as Chicago, New York, and Washington, D.C., still draw many visitors, especially on weekends.  In contrast, with the exception of the National Museum of the United States Air Force in Dayton (https://en.wikipedia.org/wiki/National_Museum_of_the_United_States_Air_Force), few museums in Ohio are well known, even to people who live in the state.  While searching for a project for the Applied Data Science Capstone, I thought it would be useful to explore which museums exist in Ohio and what they offer.

An online search shows 468 museums listed across the state of Ohio, and a museum guide covering more than 250 of them is available at http://www.museumstuff.com/states/OH.php.  In this report, I used data analysis and visualization, including the Foursquare API, to analyze a small subset of the data.

In [149]:
# Include libraries for the project
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON data

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize 

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# for webscraping import Beautiful Soup 
import bs4   # from http://beautiful-soup-4.readthedocs.io/en/latest/
from bs4 import BeautifulSoup

import xml

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Libraries imported.


<h2 align=left><font size = 4> Data </font></h2>

In order to visualize the museums in different counties, two sets of data were chosen for this project.  One is the list of museums in the state of Ohio and the other is the coordinates of each county.  The coordinates of the township of each county were chosen to represent the coordinates of that county.

In [150]:
url = 'https://en.wikipedia.org/wiki/List_of_museums_in_Ohio'

In [151]:
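import urllib.request  # needed by urlopen below

def scrape_table_bs4(cname,cols):
    """Scrape the first table with CSS class `cname` from the global `url`."""
    page  = urllib.request.urlopen(url).read()
    soup  = bs4.BeautifulSoup(page,'lxml')
    table = soup.find("table",class_=cname)
    # Column names come from the <th> cells of the table
    header = [head.findAll(text=True)[0].strip() for head in table.find_all("th")]
    # Only the first text node of each cell is kept, so cells that contain
    # links or other markup are truncated at the first tag
    data   = [[td.findAll(text=True)[0].strip() for td in tr.find_all("td")]
              for tr in table.find_all("tr")]
    # Drop the header row and any row that does not have exactly `cols` cells
    data    = [row for row in data if len(row) == cols]
    # Store data to this temporary dataframe
    raw_df = pd.DataFrame(data,columns=header)
    return raw_df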

In [152]:
# Parsing using xpath
import lxml.html as lh
import urllib.request

def scrape_table_lxml(XPATH,cols):
    page = requests.get(url)
    doc = lh.fromstring(page.content)
    table_content = doc.xpath(XPATH)
    for table in table_content:
        # Relative paths ('.//') keep the search inside the current table;
        # '//th' would match header cells anywhere in the document.
        headers = [th.text_content().strip() for th in table.xpath('.//th')]
        # Keep as many header names as there are columns in the data rows
        headers = headers[:cols]
        data    = [[td.text_content().strip() for td in tr.xpath('td')] 
                   for tr in table.xpath('.//tr')]
        data    = [row for row in data if len(row) == cols]
        # Transform the data in the table on Wikipedia into a pandas dataframe.
        raw_df = pd.DataFrame(data,columns = headers)  
        return raw_df  # only the first matching table is returned

In [153]:
# Test with BeautifulSoup
raw_MuseumsInOhio = scrape_table_bs4("wikitable",6)
print("# Museums in Ohio")
print(raw_MuseumsInOhio.info(verbose=True))

# Museums in Ohio
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 469 entries, 0 to 468
Data columns (total 6 columns):
Name         469 non-null object
Town/city    469 non-null object
County       469 non-null object
Region       469 non-null object
Type         469 non-null object
Summary      469 non-null object
dtypes: object(6)
memory usage: 22.1+ KB
None
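
The row filter used by both scraping functions (keep only rows with exactly `cols` cells) can be illustrated without any network access. The sketch below swaps in Python's standard-library `html.parser` instead of BeautifulSoup or lxml, and the sample HTML and the `TableParser` and `parse_table` names are made up for illustration; it is not the code used to produce the output above.

```python
from html.parser import HTMLParser

# Hypothetical miniature of a "wikitable"; not copied from Wikipedia.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Name</th><th>Town/city</th><th>County</th></tr>
  <tr><td>Museum A</td><td>Town X</td><td>County 1</td></tr>
  <tr><td colspan="3">Section note spanning the table</td></tr>
  <tr><td>Museum B</td><td>Town Y</td><td>County 2</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect header cells (<th>) and data rows (<td>) of a simple table."""

    def __init__(self):
        super().__init__()
        self.header = []   # text of every <th> cell
        self.rows = []     # one list of <td> texts per <tr>
        self._cell = None  # tag of the cell currently being read, or None
        self._text = []    # text fragments of the current cell
        self._row = []     # <td> texts collected for the current row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell, self._text = tag, []

    def handle_data(self, data):
        # Text outside a cell (whitespace between tags) is ignored.
        if self._cell is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell == tag:
            text = "".join(self._text).strip()
            (self.header if tag == "th" else self._row).append(text)
            self._cell = None
        elif tag == "tr":
            self.rows.append(self._row)


def parse_table(html, cols):
    """Return (header, rows), keeping only rows with exactly `cols` cells."""
    parser = TableParser()
    parser.feed(html)
    rows = [row for row in parser.rows if len(row) == cols]
    return parser.header, rows


header, rows = parse_table(SAMPLE_HTML, cols=3)
print(header)
print(rows)
```

The header row contributes an empty list of `<td>` cells and the note row has a single cell, so both are dropped by the length check, which is the same reason the scraped dataframe contains only complete museum rows.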


Clean up the Data

In [147]:
MuseumsInOhio = raw_MuseumsInOhio
MuseumsInOhio.head()

Unnamed: 0,Museums,Town_city,County,Region,Type,Summary
0,103rd Ohio Volunteer Infantry Museum,Sheffield Lake,Lorain,Northeast,Civil War,Open by appointment
1,1810 House,Portsmouth,Scioto,Southwest,Historic house,Operated by the Scioto County Historical Society
2,A.B. Graham Memorial Center,Conover,Miami,Southwest,Biographical,Honors
3,Adena Mansion,Chillicothe,Ross,Southwest,Historic house,Early 19th-century mansion on 300 acres (1.2 km
4,AHA! A Hands-On Adventure,Lancaster,Fairfield,Central,Children's,


In [7]:
# Rename 'Name' to 'Museums' and 'Town/city' to 'Town_city'
MuseumsInOhio.rename(columns={'Name':'Museums', 'Town/city':'Town_city'}, inplace=True)
MuseumsInOhio.head()

Unnamed: 0,Museums,Town_city,County,Region,Type,Summary
0,103rd Ohio Volunteer Infantry Museum,Sheffield Lake,Lorain,Northeast,Civil War,Open by appointment
1,1810 House,Portsmouth,Scioto,Southwest,Historic house,Operated by the Scioto County Historical Society
2,A.B. Graham Memorial Center,Conover,Miami,Southwest,Biographical,Honors
3,Adena Mansion,Chillicothe,Ross,Southwest,Historic house,Early 19th-century mansion on 300 acres (1.2 km
4,AHA! A Hands-On Adventure,Lancaster,Fairfield,Central,Children's,


In [8]:
MuseumsInOhio.shape

(469, 6)

In [9]:
MuseumsInOhio.dtypes.to_frame(name='Data Type')

Unnamed: 0,Data Type
Museums,object
Town_city,object
County,object
Region,object
Type,object
Summary,object


In [10]:
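# Group museums by county, town/city, and region, joining the museum names into one string
MuseumsInOhio = MuseumsInOhio.groupby(['County', 'Town_city', 'Region'])['Museums'].apply(', '.join).reset_index()
# groupby already returns the rows sorted by County, so no extra sort is needed
MuseumsInOhio.head()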

Unnamed: 0,County,Town_city,Region,Museums
0,Adams,Peebles,Southwest,Serpent Mound
1,Adams,Winchester,Southwest,National Barber Museum and Hall of Fame
2,Allen,Delphos,Northwest,"Delphos Canal Commission Museum, Delphos Museu..."
3,Allen,Lima,Northwest,"Allen County Museum, ArtSpace Lima"
4,Ashland,Ashland,Northeast,Ashland County Historical Society Museum


In [11]:
MuseumsInOhio.shape

(253, 4)
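
Joining the names collapses 469 museum rows into 253 town-level rows, so the number of museums per town is no longer visible in the table. A minimal sketch on toy data (the museum names are placeholders, and the extra `Count` column is an addition that the notebook itself does not compute) shows how the same grouping could keep that number:

```python
import pandas as pd

# Toy data with placeholder museum names.
museums = pd.DataFrame({
    "County": ["Adams", "Adams", "Allen", "Allen", "Allen"],
    "Town_city": ["Peebles", "Winchester", "Lima", "Lima", "Delphos"],
    "Museums": ["Museum A", "Museum B", "Museum C", "Museum D", "Museum E"],
})

# One row per (County, Town_city): joined names plus the number of museums.
per_town = (
    museums.groupby(["County", "Town_city"])["Museums"]
    .agg(Museums=", ".join, Count="size")
    .reset_index()
)
print(per_town)
```

The `Count` values add up to the number of rows in the original table, which gives a quick check that no museum was lost in the grouping.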

In [12]:
# Museums in each county
MuseumsInOhio1 = MuseumsInOhio.groupby(['County'])['Museums'].apply(', '.join).reset_index()
MuseumsInOhio1.head()

Unnamed: 0,County,Museums
0,Adams,"Serpent Mound, National Barber Museum and Hall..."
1,Allen,"Delphos Canal Commission Museum, Delphos Museu..."
2,Ashland,"Ashland County Historical Society Museum, Cleo..."
3,Ashtabula,"Ashtabula Maritime Museum, Blakeslee Log Cabin..."
4,Athens,"Dairy Barn Arts Center, Kennedy Museum of Art,..."


In [13]:
MuseumsInOhio1.shape

(76, 2)

In [14]:
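# Load the county coordinates from a CSV file on GitHub (approach suggested by other people);
# read_html parses the table that GitHub renders for the file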
OhioCounty = pd.read_html('https://github.com/jotang1/Capstone_Project/blob/master/County_Coords_2.csv')  
CountyCoord = OhioCounty[0]
CountyCoord = CountyCoord.dropna(how='all', axis='columns')
CountyCoord.head()

Unnamed: 0,County,Latitude,Longitude
0,Adams,39.0317,-83.4221
1,Allen,40.5318,-84.235
2,Ashland,41.0358,-82.25
3,Ashtabula,41.584,-81.0
4,Athens,39.3322,-82.1606


In [15]:
# Checking the data type
CountyCoord.dtypes.to_frame(name='Data Type')

Unnamed: 0,Data Type
County,object
Latitude,float64
Longitude,float64


<h2 align=left><font size = 4> Methodologies </font></h2> 

In order to visualize the approximate location of museums in the counties of Ohio, I merged the museum list and the county coordinates into one table.

In [16]:
# Merging the two tables
OhioMuseums = MuseumsInOhio.merge(CountyCoord, on='County', how='left')
OhioMuseums.head()

Unnamed: 0,County,Town_city,Region,Museums,Latitude,Longitude
0,Adams,Peebles,Southwest,Serpent Mound,39.0317,-83.4221
1,Adams,Winchester,Southwest,National Barber Museum and Hall of Fame,39.0317,-83.4221
2,Allen,Delphos,Northwest,"Delphos Canal Commission Museum, Delphos Museu...",40.5318,-84.235
3,Allen,Lima,Northwest,"Allen County Museum, ArtSpace Lima",40.5318,-84.235
4,Ashland,Ashland,Northeast,Ashland County Historical Society Museum,41.0358,-82.25


In [17]:
OhioMuseums.shape

(253, 6)
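
Because this is a left merge, a county whose name has no match in the coordinate table would keep its row but receive missing coordinates. A minimal sketch of how that could be checked, on toy data (the "Made-up County" row and the museum names are placeholders, and the notebook itself does not run this check):

```python
import pandas as pd

museums = pd.DataFrame({
    "County": ["Adams", "Allen", "Made-up County"],
    "Museums": ["Museum A", "Museum B", "Museum C"],
})
coords = pd.DataFrame({
    "County": ["Adams", "Allen"],
    "Latitude": [39.0317, 40.5318],
    "Longitude": [-83.4221, -84.235],
})

# indicator=True adds a "_merge" column that says where each row came from;
# validate="many_to_one" raises if a county appears twice in `coords`.
merged = museums.merge(
    coords, on="County", how="left", indicator=True, validate="many_to_one"
)

# Counties that found no coordinates in the right-hand table.
missing = merged.loc[merged["_merge"] == "left_only", "County"].tolist()
print(missing)
```

An empty `missing` list on the real tables would confirm that every museum row can be placed on the map.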

In [18]:
OhioMuseums.to_csv('OhioMuseums.csv',index=False)

In [19]:
# First, find the coordinates of Ohio
address = 'Ohio, United States'
geolocator = Nominatim(user_agent='ohio_museums')  # geopy expects a user_agent; the name is arbitrary
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Ohio are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Ohio are 40.2253569, -82.6881395.


In [20]:
Map_Ohio = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, county, town_city, region, museums in zip(OhioMuseums['Latitude'], OhioMuseums['Longitude'], OhioMuseums['County'], 
                                           OhioMuseums['Town_city'], OhioMuseums['Region'], OhioMuseums['Museums']):
    # the zip order must match the loop variables; all four fields go into the popup
    label = '{}, {}, {}, {}'.format(county, museums, town_city, region)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#87cefa',
        fill_opacity=0.5).add_to(Map_Ohio)

Map_Ohio

Since there are more than 450 museums in Ohio, in this report I analyzed a small subset of the data by selecting counties that have museums with "Ohio" as part of their name.

In [134]:
# Find out how many counties have museums with "Ohio" in the name
OhioData = OhioMuseums[OhioMuseums['Museums'].str.contains("Ohio")].reset_index(drop=True)
print(OhioData.shape)
OhioData.head(17)

(18, 6)


Unnamed: 0,County,Town_city,Region,Museums,Latitude,Longitude
0,Athens,Athens,Southeast,"Dairy Barn Arts Center, Kennedy Museum of Art,...",39.3322,-82.1606
1,Brown,Ripley,Southwest,"John P. Parker House, John Rankin House, Ohio ...",39.1515,-84.02
2,Darke,Bradford,Southwest,Bradford Ohio Railroad Museum,40.0208,-84.4843
3,Fairfield,Lancaster,Central,"AHA! A Hands-On Adventure, Decorative Arts Cen...",39.5627,-82.5039
4,Franklin,Bexley,Central,"Ohio Governor's Mansion, Schumacher Gallery",40.0835,-83.1538
5,Franklin,Canal Winchester,Central,Mid-Ohio Historical Museum,40.0835,-83.1538
6,Franklin,Hilliard,Central,"Early Television Museum, Hilliard Ohio Histori...",40.0835,-83.1538
7,Franklin,Worthington,Central,"Doll Museum at the Old Rectory, Ohio Railway M...",40.0835,-83.1538
8,Greene,Xenia,Southwest,"Greene County Ohio Historical Society Museum, ...",39.5102,-84.0648
9,Lake,Mentor,Northeast,"Indian Museum of Lake County, Ohio, James A. G...",41.5113,-81.292


Using the Foursquare API to find venues around these museums:

In [21]:
# Foursquare credentials (redacted here; supply your own values):
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 30 # maximum number of venues returned per request
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


In [78]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Museum', 
                  'Museum Latitude', 
                  'Museum Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
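
The chained indexing in `getNearbyVenues` (`["response"]['groups'][0]['items']` and `['categories'][0]['name']`) raises an exception as soon as a response has no groups or a venue has no category. A more tolerant way to flatten one response is sketched below; `extract_venues` is a hypothetical helper, and the sample payload is hand-written to mirror only the keys that the function above reads, not a real Foursquare response.

```python
def extract_venues(payload, museum, lat, lng):
    """Flatten one explore-style response into a list of row tuples.

    Missing keys yield None (or 'Unknown' for the category) instead of
    raising KeyError or IndexError.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or [{}]
        rows.append((
            museum,
            lat,
            lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0].get("name", "Unknown"),
        ))
    return rows


# Hand-written sample payload (hypothetical values).
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Example Cafe",
                   "location": {"lat": 39.0, "lng": -83.0},
                   "categories": [{"name": "Coffee Shop"}]}},
        {"venue": {"name": "Unlabeled Spot",
                   "location": {"lat": 39.1, "lng": -83.1},
                   "categories": []}},
    ]}]}
}

rows = extract_venues(sample_payload, "Example Museum", 39.0, -83.0)
empty = extract_venues({"response": {}}, "Example Museum", 39.0, -83.0)
print(rows)
print(empty)
```

With a helper of this kind, a museum location that returns no venues simply contributes no rows instead of stopping the loop.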

In [79]:
OhioMuseums = OhioData
ohio_venues = getNearbyVenues(names = OhioMuseums['Museums'],
                                   latitudes = OhioMuseums['Latitude'],
                                   longitudes = OhioMuseums['Longitude']
                                  )

Dairy Barn Arts Center, Kennedy Museum of Art, Southeast Ohio History Center
John P. Parker House, John Rankin House, Ohio Tobacco Museum, Ripley Museum
Bradford Ohio Railroad Museum
AHA! A Hands-On Adventure, Decorative Arts Center of Ohio, Georgian Museum, Ohio Glass Museum, Sherman House Museum
Ohio Governor's Mansion, Schumacher Gallery
Mid-Ohio Historical Museum
Early Television Museum, Hilliard Ohio Historical Society Museum
Doll Museum at the Old Rectory, Ohio Railway Museum, Orange Johnson House
Greene County Ohio Historical Society Museum, Xenia Station
Indian Museum of Lake County, Ohio, James A. Garfield National Historic Site, Mentor Safety Village
103rd Ohio Volunteer Infantry Museum
Northern Ohio Railway Museum
Ohio Valley River Museum
Ohio's Small Town Museum
Bible Walk, Kingwood Center, Mansfield Art Center, Mansfield Fire Museum, Mansfield Memorial Museum, Oak Hill Cottage Museum, Ohio State Reformatory
1810 House, Southern Ohio Museum
American Civil War Museum of Ohio

In [80]:
print(ohio_venues.shape)
ohio_venues.head()

(25, 7)


Unnamed: 0,Museum,Museum Latitude,Museum Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"AHA! A Hands-On Adventure, Decorative Arts Cen...",39.5627,-82.5039,Starbucks Coffee,39.562269,-82.505975,Coffee Shop
1,"AHA! A Hands-On Adventure, Decorative Arts Cen...",39.5627,-82.5039,Rest Area 10-27 (Eastbound),39.563629,-82.509339,Rest Area
2,"AHA! A Hands-On Adventure, Decorative Arts Cen...",39.5627,-82.5039,Rest Area 10-28 (Westbound),39.564337,-82.508093,Rest Area
3,"AHA! A Hands-On Adventure, Decorative Arts Cen...",39.5627,-82.5039,old mans caves,39.5637,-82.5081,Trail
4,"AHA! A Hands-On Adventure, Decorative Arts Cen...",39.5627,-82.5039,Rest Area,39.563676,-82.509205,Rest Area
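
Each search is centered on the county coordinate rather than on a museum's own address, and only venues within the 500 m `radius` are returned. The distance between the search point and a returned venue can be checked with the haversine formula; the sketch below uses the coordinates from the first row of the table above, and the `haversine_m` name and the mean Earth radius of 6,371 km are assumptions of this example.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius_m * asin(sqrt(a))


search_point = (39.5627, -82.5039)   # coordinate used for the Lancaster group
starbucks = (39.562269, -82.505975)  # first venue in the table above

distance = haversine_m(*search_point, *starbucks)
print(round(distance), "m from the search point")
```

The same function could be applied to a museum's actual address, if one were available, to see whether the museum itself lies inside the area being searched.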


In [81]:
ohio_venues.groupby('Museum').count()

Unnamed: 0_level_0,Museum Latitude,Museum Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Museum,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"AHA! A Hands-On Adventure, Decorative Arts Center of Ohio, Georgian Museum, Ohio Glass Museum, Sherman House Museum",5,5,5,5,5,5
"Doll Museum at the Old Rectory, Ohio Railway Museum, Orange Johnson House",5,5,5,5,5,5
"Early Television Museum, Hilliard Ohio Historical Society Museum",5,5,5,5,5,5
Mid-Ohio Historical Museum,5,5,5,5,5,5
"Ohio Governor's Mansion, Schumacher Gallery",5,5,5,5,5,5


In [82]:
print('There are {} unique categories.'.format(len(ohio_venues['Venue Category'].unique())))

There are 8 unique categories.


In [128]:
# one hot encoding
ohio_onehot = pd.get_dummies(ohio_venues[['Venue Category']], prefix="", prefix_sep="")

# add the Museum column back to the dataframe
ohio_onehot['Museum'] = ohio_venues['Museum'] 

# move the Museum column to the first column
fixed_columns = [ohio_onehot.columns[-1]] + list(ohio_onehot.columns[:-1])
ohio_onehot = ohio_onehot[fixed_columns]

print(ohio_onehot.shape)
ohio_onehot.head(25)

(25, 9)


Unnamed: 0,Museum,Coffee Shop,Disc Golf,Electronics Store,Gym,IT Services,Park,Rest Area,Trail
0,"AHA! A Hands-On Adventure, Decorative Arts Cen...",1,0,0,0,0,0,0,0
1,"AHA! A Hands-On Adventure, Decorative Arts Cen...",0,0,0,0,0,0,1,0
2,"AHA! A Hands-On Adventure, Decorative Arts Cen...",0,0,0,0,0,0,1,0
3,"AHA! A Hands-On Adventure, Decorative Arts Cen...",0,0,0,0,0,0,0,1
4,"AHA! A Hands-On Adventure, Decorative Arts Cen...",0,0,0,0,0,0,1,0
5,"Ohio Governor's Mansion, Schumacher Gallery",0,1,0,0,0,0,0,0
6,"Ohio Governor's Mansion, Schumacher Gallery",0,0,0,0,0,1,0,0
7,"Ohio Governor's Mansion, Schumacher Gallery",0,0,1,0,0,0,0,0
8,"Ohio Governor's Mansion, Schumacher Gallery",0,0,0,1,0,0,0,0
9,"Ohio Governor's Mansion, Schumacher Gallery",0,0,0,0,1,0,0,0


In [130]:
ohio_grouped = ohio_onehot.groupby('Museum').mean().reset_index()
print(ohio_grouped.shape)
ohio_grouped

(5, 9)


Unnamed: 0,Museum,Coffee Shop,Disc Golf,Electronics Store,Gym,IT Services,Park,Rest Area,Trail
0,"AHA! A Hands-On Adventure, Decorative Arts Cen...",0.2,0.0,0.0,0.0,0.0,0.0,0.6,0.2
1,"Doll Museum at the Old Rectory, Ohio Railway M...",0.0,0.2,0.2,0.2,0.2,0.2,0.0,0.0
2,"Early Television Museum, Hilliard Ohio Histori...",0.0,0.2,0.2,0.2,0.2,0.2,0.0,0.0
3,Mid-Ohio Historical Museum,0.0,0.2,0.2,0.2,0.2,0.2,0.0,0.0
4,"Ohio Governor's Mansion, Schumacher Gallery",0.0,0.2,0.2,0.2,0.2,0.2,0.0,0.0
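
The frequencies above are the mean of the one-hot columns within each museum group, so each row sums to 1. A minimal sketch on toy data (the `M1` and `M2` labels and the venue list are made up) shows the arithmetic:

```python
import pandas as pd

# Toy venue list: three venues near M1, two near M2.
venues = pd.DataFrame({
    "Museum": ["M1", "M1", "M1", "M2", "M2"],
    "Venue Category": ["Rest Area", "Rest Area", "Trail", "Park", "Gym"],
})

# One indicator column per category (dtype=int gives 0/1 instead of booleans).
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Museum", venues["Museum"])

# Mean of the 0/1 indicators = share of each category among a museum's venues.
freq = onehot.groupby("Museum").mean()
print(freq)
```

Here two of the three venues near `M1` are rest areas, so its `Rest Area` frequency is 2/3, just as 3 of the 5 venues in the first row above give 0.6.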


In [92]:
num_top_venues = 5
for neigh in ohio_grouped['Museum']:
    print("----"+neigh+"----")
    temp = ohio_grouped[ohio_grouped['Museum'] == neigh].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----AHA! A Hands-On Adventure, Decorative Arts Center of Ohio, Georgian Museum, Ohio Glass Museum, Sherman House Museum----
               venue  freq
0          Rest Area   0.6
1        Coffee Shop   0.2
2              Trail   0.2
3          Disc Golf   0.0
4  Electronics Store   0.0


----Doll Museum at the Old Rectory, Ohio Railway Museum, Orange Johnson House----
               venue  freq
0          Disc Golf   0.2
1  Electronics Store   0.2
2                Gym   0.2
3        IT Services   0.2
4               Park   0.2


----Early Television Museum, Hilliard Ohio Historical Society Museum----
               venue  freq
0          Disc Golf   0.2
1  Electronics Store   0.2
2                Gym   0.2
3        IT Services   0.2
4               Park   0.2


----Mid-Ohio Historical Museum----
               venue  freq
0          Disc Golf   0.2
1  Electronics Store   0.2
2                Gym   0.2
3        IT Services   0.2
4               Park   0.2


----Ohio Governor's Mansion, S

In [93]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [120]:
num_top_venues = 8

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Museum']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond 1st, 2nd, and 3rd, fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
museums_venues_sorted = pd.DataFrame(columns=columns)
museums_venues_sorted['Museum'] = ohio_grouped['Museum']

for ind in np.arange(ohio_grouped.shape[0]):
   museums_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ohio_grouped.iloc[ind, :], num_top_venues)

museums_venues_sorted.shape

(5, 9)

In [121]:
museums_venues_sorted.head()

Unnamed: 0,Museum,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,"AHA! A Hands-On Adventure, Decorative Arts Cen...",Rest Area,Trail,Coffee Shop,Park,IT Services,Gym,Electronics Store,Disc Golf
1,"Doll Museum at the Old Rectory, Ohio Railway M...",Park,IT Services,Gym,Electronics Store,Disc Golf,Trail,Rest Area,Coffee Shop
2,"Early Television Museum, Hilliard Ohio Histori...",Park,IT Services,Gym,Electronics Store,Disc Golf,Trail,Rest Area,Coffee Shop
3,Mid-Ohio Historical Museum,Park,IT Services,Gym,Electronics Store,Disc Golf,Trail,Rest Area,Coffee Shop
4,"Ohio Governor's Mansion, Schumacher Gallery",Park,IT Services,Gym,Electronics Store,Disc Golf,Trail,Rest Area,Coffee Shop


In [122]:
# set number of clusters
kclusters = 5

ohio_grouped_clustering = ohio_grouped.drop(columns='Museum')
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=1).fit(ohio_grouped_clustering)

# check cluster labels generated for each row in the dataframe
print(kmeans.labels_[0:5])
print(len(kmeans.labels_))

[1 0 0 0 0]
5
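
The labels `[1 0 0 0 0]` show only two clusters in use even though `kclusters` is 5. One way to see why is to count the distinct frequency profiles before choosing the number of clusters; the sketch below re-enters the five rows of the frequency table shown earlier by hand, and the `n_distinct` and `suggested_k` names are made up for this example.

```python
import pandas as pd

categories = ["Coffee Shop", "Disc Golf", "Electronics Store", "Gym",
              "IT Services", "Park", "Rest Area", "Trail"]

# The five frequency profiles from the grouped table, typed in by hand.
profiles = pd.DataFrame(
    [
        [0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6, 0.2],
        [0.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0, 0.0],
        [0.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0, 0.0],
        [0.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0, 0.0],
        [0.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0, 0.0],
    ],
    columns=categories,
)

# Identical rows can never be separated by k-means, so the number of
# distinct rows is an upper bound on the number of meaningful clusters.
n_distinct = len(profiles.drop_duplicates())
kclusters = 5
suggested_k = min(kclusters, n_distinct)
print(n_distinct, suggested_k)
```

Four of the five groups share the same county coordinate and therefore the same venue profile, which is why a larger `kclusters` cannot produce more than two distinct groups on this subset.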


In [124]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

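# add markers to the map
# NOTE: OhioMuseumProject is not defined in the cells shown here; it must hold one row per
# clustered museum group, in the same order as ohio_grouped, with County, Latitude, and Longitude columns.
markers_colors = []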
for lat, lon, poi, cluster in zip(OhioMuseumProject['Latitude'], OhioMuseumProject['Longitude'], OhioMuseumProject['County'], kmeans.labels_):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<h2 align=left><font size = 5> Discussion </font></h2>

<font size = 4> As an Ohioan who has raised a family here, I began this project mainly intending to find out where the museums in the state of Ohio are located, since nowadays my family and I spend most of our outings at state and metro parks, theme and water parks, and shopping.  Admittedly, the small subset of data used in this project, the counties containing museums with “Ohio” in their name, is not optimal, since most of Ohio’s museums do not have “Ohio” in their name.  More thorough analyses could be performed and more information could be extracted.  Nevertheless, my analyses demonstrate that the Foursquare API can be used to analyze more than neighborhoods in metropolitan areas, and that there is a lot to learn about the museums in Ohio through Python and the Foursquare API. </font>

<h2 align=left><font size = 5> Conclusion </font></h2>

It was a bit challenging to put things together for this Capstone project, as I purposely picked a less popular topic with little data available online.  Using the knowledge I gained through the courses, I set out in this report to explore the museums in the State of Ohio as an Ohioan, although the state is not a typical “neighborhood” as people usually define it.  To me, physically visiting museums can never be replaced by virtual museums, and data analysis and visualization with Python and APIs definitely help us discover more museums before visiting them in person.