# Capstone Project: The Battle of Neighborhoods (Week 1)

## Instructions
Now that you have been equipped with the skills and the tools to use location data to explore a geographical location, over the course of two weeks, you will have the opportunity to be as creative as you want and come up with an idea to leverage the Foursquare location data to explore or compare neighborhoods or cities of your choice or to come up with a problem that you can use the Foursquare location data to solve. If you cannot think of an idea or a problem, here are some ideas to get you started:

In Module 3, we explored New York City and the city of Toronto and segmented and clustered their neighborhoods. Both cities are very diverse and are the financial capitals of their respective countries. One interesting idea would be to compare the neighborhoods of the two cities and determine how similar or dissimilar they are. Is New York City more like Toronto or Paris or some other multicultural city? I will leave it to you to refine this idea.
In a city of your choice, if someone is looking to open a restaurant, where would you recommend that they open it? Similarly, if a contractor is trying to start their own business, where would you recommend that they set up their office?

The U.S. has been a land of opportunity for many who seek a better life and a fresh start.  Not all migrants have technical skills or are willing to join a corporate world whose culture is very different from their own.  Starting one's own business, such as a restaurant, may be more viable for such individuals.
This project demonstrates how the fundamental law of supply and demand can be used to analyze statistical data.
We will use the greater Dallas area and narrow it down to a few feasible neighborhoods.  The same process can be repeated for other areas of interest.

# Start of Battle of Neighborhoods - code section (part of Week 2)

In [1]:
# Import libraries (the conda installs, if enabled, can take a long time to solve the environment)
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# uncomment the next line if geopy is not installed; keep the comment on its own line,
# because a trailing comment on a "!conda" line is passed to conda as extra arguments
# !conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle HTTP requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# uncomment the next line if folium is not installed
# !conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported!')


Libraries imported!

## Import data and prepare a dataframe of Dallas neighborhoods

Next, we import the postal (zip) codes for Texas, together with each zip code's city and latitude/longitude, so that we can extract the neighborhoods of Dallas.

In [2]:
# Import zip codes, lat/longs and cities/neighborhoods of whole TX 
raw_zip = pd.read_csv('texas_zips.csv')
#raw_zip.set_index("city")

print(raw_zip.dtypes)
##print(raw_zip.shape)
##print(raw_zip.columns)
raw_zip.head()
#raw_zip

zip                     int64
lat                   float64
lng                   float64
city                   object
state_id               object
state_name             object
zcta                     bool
parent_zcta           float64
population              int64
density               float64
county_fips             int64
county_name            object
all_county_weights     object
imprecise                bool
military                 bool
timezone               object
dtype: object


Unnamed: 0,zip,lat,lng,city,state_id,state_name,zcta,parent_zcta,population,density,county_fips,county_name,all_county_weights,imprecise,military,timezone
0,75001,32.96,-96.83847,Addison,TX,Texas,True,,12414,1250.2,48113,Dallas,{'48113':100},False,False,America/Chicago
1,75002,33.08966,-96.60751,Allen,TX,Texas,True,,63140,655.6,48085,Collin,{'48085':100},False,False,America/Chicago
2,75006,32.96188,-96.89701,Carrollton,TX,Texas,True,,46364,1065.0,48113,Dallas,{'48113':100},False,False,America/Chicago
3,75007,33.00462,-96.89714,Carrollton,TX,Texas,True,,51624,1709.9,48121,Denton,"{'48113':5.79,'48121':94.21}",False,False,America/Chicago
4,75009,33.34028,-96.75033,Celina,TX,Texas,True,,8785,35.5,48085,Collin,"{'48085':94.8,'48121':5.2}",False,False,America/Chicago


Clean up the raw data of Texas and create a new dataframe with only Dallas neighborhoods

In [3]:
# Drop the columns we do not need: zcta, parent_zcta, density, county_fips, all_county_weights, imprecise, military, timezone
raw_zip.drop(['zcta','parent_zcta','density','county_fips','all_county_weights','imprecise','military','timezone'], axis=1, inplace=True)
raw_zip=raw_zip[['state_name', 'state_id', 'county_name', 'city', 'zip', 'lat', 'lng', 'population']]

# Rename city -> neighborhood, county_name -> city, lat/lng -> latitude/longitude, state_name -> state
raw_zip.rename(columns={'lat':'latitude','lng':'longitude','city':'neighborhood'}, inplace=True)
raw_zip.rename(columns={'county_name':'city','state_name':'state'}, inplace=True)
#raw_zip.set_index("city")


In [4]:
# create a new dataframe with only the Dallas county neighborhoods
dallas_zip = raw_zip[raw_zip.city == 'Dallas'].copy()   # .copy() avoids chained-assignment warnings later

print(dallas_zip.dtypes)
print(dallas_zip.shape)
#print(raw_zip.columns)
dallas_zip.head()       # this is the dataframe to work with

state            object
state_id         object
city             object
neighborhood     object
zip               int64
latitude        float64
longitude       float64
population        int64
dtype: object
(84, 8)


Unnamed: 0,state,state_id,city,neighborhood,zip,latitude,longitude,population
0,Texas,TX,Dallas,Addison,75001,32.96,-96.83847,12414
2,Texas,TX,Dallas,Carrollton,75006,32.96188,-96.89701,46364
7,Texas,TX,Dallas,Coppell,75019,32.96329,-96.98553,38666
18,Texas,TX,Dallas,Irving,75038,32.87458,-96.99758,27802
19,Texas,TX,Dallas,Irving,75039,32.88752,-96.94225,11032
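
Several neighborhoods span more than one zip code (Irving appears twice in the rows above). If a single point per neighborhood is wanted, the zip-level rows can be collapsed. The sketch below is an optional step that the notebook itself does not perform. It re-enters the five displayed rows by hand and takes the unweighted mean of the zip coordinates, which is an assumption and not a true geographic center. Named aggregation requires pandas 0.25 or later.

```python
import pandas as pd

# The five rows of dallas_zip.head(), re-entered by hand for illustration
sample = pd.DataFrame({
    'neighborhood': ['Addison', 'Carrollton', 'Coppell', 'Irving', 'Irving'],
    'zip': [75001, 75006, 75019, 75038, 75039],
    'latitude': [32.96, 32.96188, 32.96329, 32.87458, 32.88752],
    'longitude': [-96.83847, -96.89701, -96.98553, -96.99758, -96.94225],
    'population': [12414, 46364, 38666, 27802, 11032],
})

# One row per neighborhood: average the coordinates, sum the population
per_neighborhood = (
    sample.groupby('neighborhood')
          .agg(latitude=('latitude', 'mean'),
               longitude=('longitude', 'mean'),
               population=('population', 'sum'),
               zip_count=('zip', 'count'))
          .reset_index()
)
print(per_neighborhood)
```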


### Now to render a map of the neighborhoods (cities) in the county of Dallas

To create an instance of the geocoder, we need to define a user_agent. We will name our agent *dallas_explorer*, as shown below.

In [5]:
address = 'Dallas, TX'
geolocator = Nominatim(user_agent="dallas_explorer")
location = geolocator.geocode(address)
d_latitude = location.latitude
d_longitude = location.longitude
print('The geographical coordinate of Dallas city is {}, {}.'.format(d_latitude, d_longitude))


The geographical coordinate of Dallas city is 32.7762719, -96.7968559.


Create a map of Dallas county (Borough) with cities (neighborhood) superimposed on top.

In [6]:
# create map of Dallas using latitude and longitude values
map_dallas = folium.Map(location=[d_latitude, d_longitude], zoom_start=10)

# add markers to map
for lat, lng, city, neighborhood in zip(dallas_zip['latitude'], dallas_zip['longitude'], dallas_zip['city'], dallas_zip['neighborhood']):
    label = '{}, {}'.format(neighborhood, city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_dallas)  
    
map_dallas

## Explore Neighborhoods (cities) venues in the county of Dallas using Foursquare

To explore and cluster the cities in Dallas county, first define the Foursquare credentials, the API version, and the search parameters.

In [11]:
# replace the placeholders with your own Foursquare credentials
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'         # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605'                # Foursquare API version

radius = 3000                       # search radius in meters around each lat/long
LIMIT = 500                         # maximum number of venues requested per location

Create a function that repeats the venue exploration for every city in Dallas county

In [12]:
def getNearbyVenues(names, latitudes, longitudes):   # names -> cities in Dallas county
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?categoryId=4d4b7105d754a06374d81259&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,                # radius = 3000 as set in previous cell
            LIMIT)                 # LIMIT  =  500 as set in previous cell
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-location lists into one dataframe, one row per venue
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
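
A hand-formatted query string is easy to get subtly wrong (a stray space, an unescaped character). As a minimal sketch, the standard library's `urllib.parse.urlencode` can assemble the query from a dictionary instead. The helper name `build_explore_url` and the placeholder credentials are illustrative assumptions and not part of the notebook; the endpoint and parameter names are the ones used in the request above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius, limit, category_id):
    """Return the explore URL for one location (hypothetical helper)."""
    params = {
        'categoryId': category_id,
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),   # latitude,longitude pair
        'radius': radius,
        'limit': limit,
    }
    # urlencode escapes every value, so no stray spaces reach the URL
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20180605',
                        32.96, -96.83847, 3000, 500,
                        '4d4b7105d754a06374d81259')
print(url)
```

The comma in `ll` is percent-encoded as `%2C`, which is the standard escaping for a query value.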

Run the above function for each city and store it in a new dataframe called dallas_venues

In [13]:
dallas_venues = getNearbyVenues(names=dallas_zip['neighborhood'],
                                   latitudes=dallas_zip['latitude'],
                                   longitudes=dallas_zip['longitude']
                                  )

Addison
Carrollton
Coppell
Irving
Irving
Garland
Garland
Garland
Garland
Garland
Sachse
Grand Prairie
Grand Prairie
Grand Prairie
Irving
Irving
Irving
Irving
Richardson
Richardson
Rowlett
Rowlett
Cedar Hill
Desoto
Duncanville
Lancaster
Duncanville
Hutchins
Lancaster
Mesquite
Mesquite
Seagoville
Wilmer
Balch Springs
Mesquite
Sunnyvale
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
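
The list comprehension in `getNearbyVenues` assumes that every response contains `groups[0]['items']` and that every venue has at least one category. A more defensive extraction could look like the sketch below. `extract_venues` is a hypothetical helper, and the sample payload is hand-made to mimic only the fields the notebook reads; it is not a real API response.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style payload into row tuples (hypothetical helper)."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        # Fall back to None instead of raising when a venue has no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows

# Hand-made payload with one complete venue and one venue lacking categories
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Venue A',
               'location': {'lat': 32.95, 'lng': -96.83},
               'categories': [{'name': 'Pizza Place'}]}},
    {'venue': {'name': 'Venue B',
               'location': {'lat': 32.96, 'lng': -96.84},
               'categories': []}},
]}]}}

rows = extract_venues(fake_payload, 'Addison', 32.96, -96.83847)
empty = extract_venues({'response': {}}, 'Addison', 32.96, -96.83847)
print(rows)
print(empty)
```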


Let's do a sanity check on the resulting data

In [14]:
##print(dallas_venues.shape)
##dallas_venues.head()

Let's check how many venues were returned for each city (neighborhood)

In [15]:
dallas_venues.groupby('Neighborhood').count()        

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Addison,100,100,100,100,100,100
Balch Springs,38,38,38,38,38,38
Carrollton,100,100,100,100,100,100
Cedar Hill,45,45,45,45,45,45
Coppell,76,76,76,76,76,76
Dallas,3495,3495,3495,3495,3495,3495
Desoto,43,43,43,43,43,43
Duncanville,121,121,121,121,121,121
Garland,310,310,310,310,310,310
Grand Prairie,140,140,140,140,140,140


In [16]:
# Let's find out how many unique categories can be curated from all the returned venues - debug 
print('There are {} unique categories.'.format(len(dallas_venues['Venue Category'].unique())))

There are 89 unique categories.


### Analyze Each Neighborhood

In [17]:
# one hot encoding
dallas_onehot = pd.get_dummies(dallas_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dallas_onehot['Neighborhood'] = dallas_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [dallas_onehot.columns[-1]] + list(dallas_onehot.columns[:-1])
dallas_onehot = dallas_onehot[fixed_columns]

dallas_onehot.shape   # examine the new dataframe size

(5481, 90)

Next, let's group the rows by neighborhood and take the mean of the frequency of occurrence of each category

In [18]:
dallas_grouped = dallas_onehot.groupby('Neighborhood').mean().reset_index()
dallas_grouped.shape                   # Let's confirm the new size

(19, 90)
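
To see why the mean of one-hot columns is a category frequency, here is a toy version of the two steps above. The four venues and their categories are made up for illustration, and `dtype=float` is passed only so that the indicator columns are numeric regardless of the pandas version.

```python
import pandas as pd

toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Pizza Place', 'Pizza Place', 'Bakery', 'Bakery'],
})

# One indicator column per category, one row per venue
toy_onehot = pd.get_dummies(toy_venues[['Venue Category']],
                            prefix="", prefix_sep="", dtype=float)
toy_onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# Averaging the indicators gives each category's share within a neighborhood
toy_grouped = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)
```

Neighborhood A has two pizza places out of three venues, so its `Pizza Place` value is 2/3; each row of the grouped frame sums to 1.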

Let's print each neighborhood along with its 20 most common venue categories

In [19]:
num_top_venues = 20     # normally only the top 10 

for hood in dallas_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = dallas_grouped[dallas_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 3})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')


----Addison----
                      venue  freq
0        Mexican Restaurant  0.09
1        Italian Restaurant  0.08
2       American Restaurant  0.08
3               Pizza Place  0.04
4                Steakhouse  0.04
5                     Diner  0.04
6                    Bakery  0.04
7        Seafood Restaurant  0.04
8              Burger Joint  0.04
9                Restaurant  0.04
10           Sandwich Place  0.03
11     Fast Food Restaurant  0.03
12              Wings Joint  0.03
13           Breakfast Spot  0.03
14         Asian Restaurant  0.03
15         Sushi Restaurant  0.02
16      Japanese Restaurant  0.02
17            Deli / Bodega  0.02
18  New American Restaurant  0.02
19       Chinese Restaurant  0.02


----Balch Springs----
                   venue   freq
0   Fast Food Restaurant  0.237
1    Fried Chicken Joint  0.132
2     Chinese Restaurant  0.105
3            Pizza Place  0.105
4             Taco Place  0.079
5     Mexican Restaurant  0.053
6                   Fo

First, let's write a function to sort the venues in descending order.

In [20]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
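
As a quick check of what the function returns, the sketch below applies it to a single hand-made row shaped like a row of `dallas_grouped` (neighborhood name first, then category frequencies). The frequencies are invented.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]                      # skip the Neighborhood entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Invented row: neighborhood name followed by category frequencies
toy_row = pd.Series({'Neighborhood': 'A', 'Bakery': 0.2,
                     'Pizza Place': 0.5, 'Steakhouse': 0.3})

top_two = list(return_most_common_venues(toy_row, 2))
print(top_two)
```

Categories with a frequency of zero still fill the remaining slots when `num_top_venues` exceeds the number of categories actually present, so the lower ranks of a sorted table can name venue types that a neighborhood does not have.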

Now let's create the new dataframe and display the top 20 venues for each neighborhood.

In [21]:
num_top_venues = 20     # normally only the top 10 

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:      # past 'st', 'nd', 'rd': use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = dallas_grouped['Neighborhood']

for ind in np.arange(dallas_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dallas_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted   # now to eyeball our data

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,Addison,Mexican Restaurant,Italian Restaurant,American Restaurant,Pizza Place,Steakhouse,Diner,Bakery,Seafood Restaurant,Burger Joint,Restaurant,Sandwich Place,Fast Food Restaurant,Wings Joint,Breakfast Spot,Asian Restaurant,Sushi Restaurant,Japanese Restaurant,Deli / Bodega,New American Restaurant,Chinese Restaurant
1,Balch Springs,Fast Food Restaurant,Fried Chicken Joint,Chinese Restaurant,Pizza Place,Taco Place,Mexican Restaurant,Food,Bakery,Sandwich Place,Bagel Shop,Burger Joint,Donut Shop,Café,Diner,Asian Restaurant,Seafood Restaurant,Restaurant,Ramen Restaurant,Poke Place,African Restaurant
2,Carrollton,Fast Food Restaurant,Mexican Restaurant,Korean Restaurant,Pizza Place,Sandwich Place,Chinese Restaurant,Sushi Restaurant,Burger Joint,Fried Chicken Joint,Café,Indian Restaurant,Vietnamese Restaurant,Asian Restaurant,Cajun / Creole Restaurant,Bakery,Thai Restaurant,Donut Shop,Diner,Japanese Restaurant,Italian Restaurant
3,Cedar Hill,Mexican Restaurant,American Restaurant,Fast Food Restaurant,Pizza Place,Fried Chicken Joint,Burger Joint,Donut Shop,Breakfast Spot,Seafood Restaurant,Chinese Restaurant,Italian Restaurant,Restaurant,Food,Sandwich Place,Southern / Soul Food Restaurant,Sushi Restaurant,Wings Joint,Tex-Mex Restaurant,Asian Restaurant,BBQ Joint
4,Coppell,Pizza Place,American Restaurant,Fast Food Restaurant,Sandwich Place,Burger Joint,Mexican Restaurant,Donut Shop,Tex-Mex Restaurant,Bakery,Indian Restaurant,Café,Salad Place,Restaurant,Seafood Restaurant,Sushi Restaurant,Peking Duck Restaurant,New American Restaurant,Taco Place,Mediterranean Restaurant,Japanese Restaurant
5,Dallas,Mexican Restaurant,Fast Food Restaurant,American Restaurant,Pizza Place,Sandwich Place,Burger Joint,Fried Chicken Joint,Taco Place,Italian Restaurant,Seafood Restaurant,Bakery,BBQ Joint,Chinese Restaurant,Steakhouse,Restaurant,New American Restaurant,Breakfast Spot,Sushi Restaurant,Café,Thai Restaurant
6,Desoto,Donut Shop,Pizza Place,Fast Food Restaurant,American Restaurant,Fried Chicken Joint,Sandwich Place,Mexican Restaurant,Burger Joint,Seafood Restaurant,Chinese Restaurant,Fish & Chips Shop,Restaurant,Snack Place,Wings Joint,Bakery,New American Restaurant,Peruvian Restaurant,Peking Duck Restaurant,Pakistani Restaurant,Turkish Restaurant
7,Duncanville,Fast Food Restaurant,Pizza Place,American Restaurant,Mexican Restaurant,Fried Chicken Joint,Chinese Restaurant,Seafood Restaurant,Italian Restaurant,Donut Shop,Sandwich Place,Wings Joint,BBQ Joint,Burger Joint,Restaurant,Bagel Shop,Taco Place,Sushi Restaurant,Breakfast Spot,Bakery,Asian Restaurant
8,Garland,Fast Food Restaurant,Mexican Restaurant,Pizza Place,Chinese Restaurant,American Restaurant,Burger Joint,Sandwich Place,Donut Shop,Fried Chicken Joint,Taco Place,Seafood Restaurant,BBQ Joint,Wings Joint,Vietnamese Restaurant,Breakfast Spot,Restaurant,Italian Restaurant,Food Truck,Bakery,Snack Place
9,Grand Prairie,Fast Food Restaurant,Mexican Restaurant,Pizza Place,Sandwich Place,Fried Chicken Joint,American Restaurant,Bakery,Donut Shop,Wings Joint,Chinese Restaurant,BBQ Joint,Taco Place,Burger Joint,Seafood Restaurant,Italian Restaurant,Restaurant,Breakfast Spot,Latin American Restaurant,Buffet,Asian Restaurant
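
The `indicators` lookup works for 20 columns because 11, 12, and 13 correctly fall through to 'th', but it would label a 21st column '21th'. If more columns are ever needed, a small helper can handle the general case. `ordinal` is a hypothetical name, not something the notebook defines.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st' (hypothetical helper)."""
    if 10 <= n % 100 <= 20:          # 11th, 12th, 13th and the rest of the teens
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, 24)]
print(columns[1:4], columns[11:14], columns[21:24])
```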


In [22]:
neighborhoods_venues_sorted.to_csv(r'Dallas Venues.csv')

## Now to explore Asian food venues in the county of Dallas using Foursquare

In [23]:
def getNearbyAsianVenues(names, latitudes, longitudes):   # names -> cities in Dallas county
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?categoryId=4bf58dd8d48988d142941735&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,                # radius = 3000 as set in an earlier cell
            LIMIT)                 # LIMIT  =  500 as set in an earlier cell
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [24]:
dallas_asian_venues = getNearbyAsianVenues(names=dallas_zip['neighborhood'],
                                   latitudes=dallas_zip['latitude'],
                                   longitudes=dallas_zip['longitude']
                                  )

Addison
Carrollton
Coppell
Irving
Irving
Garland
Garland
Garland
Garland
Garland
Sachse
Grand Prairie
Grand Prairie
Grand Prairie
Irving
Irving
Irving
Irving
Richardson
Richardson
Rowlett
Rowlett
Cedar Hill
Desoto
Duncanville
Lancaster
Duncanville
Hutchins
Lancaster
Mesquite
Mesquite
Seagoville
Wilmer
Balch Springs
Mesquite
Sunnyvale
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas
Dallas


In [25]:
dallas_asian_venues.groupby('Neighborhood').count()     

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Addison,36,36,36,36,36,36
Balch Springs,5,5,5,5,5,5
Carrollton,71,71,71,71,71,71
Cedar Hill,3,3,3,3,3,3
Coppell,9,9,9,9,9,9
Dallas,1218,1218,1218,1218,1218,1218
Desoto,3,3,3,3,3,3
Duncanville,18,18,18,18,18,18
Garland,65,65,65,65,65,65
Grand Prairie,16,16,16,16,16,16
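
Putting these counts next to the all-food counts obtained earlier gives a first impression of how saturated each neighborhood is with Asian restaurants. The sketch below hard-codes the ten rows displayed in the two count tables. It is a rough indicator only: counts for multi-zip neighborhoods come from overlapping search radii and may include the same venue more than once.

```python
# Counts copied from the two groupby tables (all food venues vs. Asian venues)
all_food = {'Addison': 100, 'Balch Springs': 38, 'Carrollton': 100,
            'Cedar Hill': 45, 'Coppell': 76, 'Dallas': 3495, 'Desoto': 43,
            'Duncanville': 121, 'Garland': 310, 'Grand Prairie': 140}
asian = {'Addison': 36, 'Balch Springs': 5, 'Carrollton': 71,
         'Cedar Hill': 3, 'Coppell': 9, 'Dallas': 1218, 'Desoto': 3,
         'Duncanville': 18, 'Garland': 65, 'Grand Prairie': 16}

# Share of returned food venues that are Asian, per neighborhood
asian_share = {hood: asian[hood] / all_food[hood] for hood in all_food}

for hood, share in sorted(asian_share.items(), key=lambda kv: kv[1], reverse=True):
    print('{:<14} {:.1%}'.format(hood, share))
```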


In [26]:
# one hot encoding
dallas_asian_onehot = pd.get_dummies(dallas_asian_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dallas_asian_onehot['Neighborhood'] = dallas_asian_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [dallas_asian_onehot.columns[-1]] + list(dallas_asian_onehot.columns[:-1])
dallas_asian_onehot = dallas_asian_onehot[fixed_columns]

dallas_asian_onehot.shape   # examine the new dataframe size

(1723, 40)

In [27]:
dallas_asian_grouped = dallas_asian_onehot.groupby('Neighborhood').mean().reset_index()
dallas_asian_grouped.shape                   # Let's confirm the new size

(18, 40)

In [28]:
num_top_venues = 20     # normally only the top 10 

for hood in dallas_asian_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = dallas_asian_grouped[dallas_asian_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 3})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')


----Addison----
                            venue   freq
0                Asian Restaurant  0.389
1              Chinese Restaurant  0.167
2                 Thai Restaurant  0.139
3             Japanese Restaurant  0.111
4                Sushi Restaurant  0.083
5           Vietnamese Restaurant  0.056
6               Korean Restaurant  0.028
7                    Noodle House  0.028
8                  Sandwich Place  0.000
9                      Poke Place  0.000
10               Ramen Restaurant  0.000
11                     Restaurant  0.000
12                    Salad Place  0.000
13                     Soup Place  0.000
14             Seafood Restaurant  0.000
15                  Shopping Mall  0.000
16            Szechuan Restaurant  0.000
17           Taiwanese Restaurant  0.000
18                Udon Restaurant  0.000
19  Vegetarian / Vegan Restaurant  0.000


----Balch Springs----
                            venue  freq
0              Chinese Restaurant   0.6
1                As

In [29]:
def return_most_common_asian_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [30]:
num_top_venues = 20     # normally only the top 10 

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:      # past 'st', 'nd', 'rd': use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_asian_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_asian_venues_sorted['Neighborhood'] = dallas_asian_grouped['Neighborhood']

for ind in np.arange(dallas_asian_grouped.shape[0]):
    neighborhoods_asian_venues_sorted.iloc[ind, 1:] = return_most_common_asian_venues(dallas_asian_grouped.iloc[ind, :], num_top_venues)

neighborhoods_asian_venues_sorted   # now to eyeball our data

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,Addison,Asian Restaurant,Chinese Restaurant,Thai Restaurant,Japanese Restaurant,Sushi Restaurant,Vietnamese Restaurant,Korean Restaurant,Noodle House,Sandwich Place,Poke Place,Ramen Restaurant,Restaurant,Salad Place,Soup Place,Seafood Restaurant,Shopping Mall,Szechuan Restaurant,Taiwanese Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant
1,Balch Springs,Chinese Restaurant,Asian Restaurant,Shopping Mall,Peking Duck Restaurant,Poke Place,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,American Restaurant,Noodle House,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Thai Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Soup Place
2,Carrollton,Korean Restaurant,Asian Restaurant,Chinese Restaurant,Sushi Restaurant,Vietnamese Restaurant,Ramen Restaurant,Thai Restaurant,Japanese Restaurant,Korean BBQ Restaurant,Karaoke Bar,Noodle House,Soup Place,Taiwanese Restaurant,Poke Place,Seafood Restaurant,Sandwich Place,Salad Place,Restaurant,Szechuan Restaurant,Udon Restaurant
3,Cedar Hill,Asian Restaurant,Chinese Restaurant,Sushi Restaurant,Shopping Mall,Peking Duck Restaurant,Poke Place,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,American Restaurant,Noodle House,Szechuan Restaurant,Taiwanese Restaurant,Thai Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Soup Place
4,Coppell,Vietnamese Restaurant,Chinese Restaurant,Peking Duck Restaurant,Asian Restaurant,Thai Restaurant,Japanese Restaurant,Sushi Restaurant,Sandwich Place,Poke Place,Ramen Restaurant,Restaurant,Salad Place,American Restaurant,Seafood Restaurant,Shopping Mall,Noodle House,Szechuan Restaurant,Taiwanese Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant
5,Dallas,Asian Restaurant,Chinese Restaurant,Sushi Restaurant,Thai Restaurant,Japanese Restaurant,Vietnamese Restaurant,Ramen Restaurant,Korean Restaurant,Food Truck,Noodle House,Fast Food Restaurant,Seafood Restaurant,Diner,Taiwanese Restaurant,Sandwich Place,Fried Chicken Joint,Vegetarian / Vegan Restaurant,American Restaurant,Buffet,Poke Place
6,Desoto,Chinese Restaurant,Thai Restaurant,American Restaurant,Shopping Mall,Peking Duck Restaurant,Poke Place,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Soup Place,Mongolian Restaurant,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Noodle House
7,Duncanville,Chinese Restaurant,Asian Restaurant,Vietnamese Restaurant,Sushi Restaurant,Mongolian Restaurant,Seafood Restaurant,Peking Duck Restaurant,Poke Place,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,American Restaurant,Shopping Mall,Noodle House,Szechuan Restaurant,Taiwanese Restaurant,Thai Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant
8,Garland,Chinese Restaurant,Vietnamese Restaurant,Asian Restaurant,Thai Restaurant,Sushi Restaurant,Korean Restaurant,Udon Restaurant,Taiwanese Restaurant,Szechuan Restaurant,Vegetarian / Vegan Restaurant,Noodle House,Shopping Mall,Seafood Restaurant,Sandwich Place,Salad Place,Restaurant,Ramen Restaurant,Poke Place,Peking Duck Restaurant,Soup Place
9,Grand Prairie,Chinese Restaurant,Asian Restaurant,Japanese Restaurant,Filipino Restaurant,Café,Sushi Restaurant,Shopping Mall,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,American Restaurant,Soup Place,Peking Duck Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Thai Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant


In [31]:
neighborhoods_asian_venues_sorted.to_csv(r'Dallas Asian Venues.csv') 

End here for Asian Venues

### Visualization of Cluster venues

Run *k*-means to group the 18 neighborhoods into 5 clusters based on their Asian venue profiles.

In [32]:

kclusters = 5           # set number of clusters

# new data frame for clustering: keep only the numeric frequency columns
dallas_asian_grouped_clustering = dallas_asian_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dallas_asian_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:18] 

array([1, 0, 1, 4, 1, 1, 3, 0, 1, 0, 4, 1, 0, 0, 1, 3, 1, 2])
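
A quick way to see how balanced the clusters are is to count the labels. The snippet below uses the label array printed above. Note that k-means label numbers are arbitrary identifiers, so only the group sizes are meaningful.

```python
from collections import Counter

# Cluster labels as printed above (one per neighborhood)
labels = [1, 0, 1, 4, 1, 1, 3, 0, 1, 0, 4, 1, 0, 0, 1, 3, 1, 2]

cluster_sizes = Counter(labels)
for cluster, size in sorted(cluster_sizes.items()):
    print('cluster {}: {} neighborhoods'.format(cluster, size))
```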

Let's create a new dataframe that includes the cluster as well as the top 20 venues for each neighborhood.

In [33]:
# add clustering labels
#backup_neighborhoods_asian_venues_sorted=neighborhoods_asian_venues_sorted # debug - create a backup 
neighborhoods_asian_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

dallas_merged = dallas_zip

# join the sorted Asian venue table (with its cluster labels) onto dallas_zip, which carries latitude/longitude for each neighborhood
dallas_merged = dallas_merged.join(neighborhoods_asian_venues_sorted.set_index('Neighborhood'), on='neighborhood')
dallas_merged.dropna(axis = 0, inplace = True)             # in case NaN shows up after merging
dallas_merged['Cluster Labels'] = dallas_merged['Cluster Labels'].astype('int64') # Cluster Labels must be int

print(dallas_merged.dtypes)
dallas_merged                      # check the columns!  


state                      object
state_id                   object
city                       object
neighborhood               object
zip                         int64
latitude                  float64
longitude                 float64
population                  int64
Cluster Labels              int64
1st Most Common Venue      object
2nd Most Common Venue      object
3rd Most Common Venue      object
4th Most Common Venue      object
5th Most Common Venue      object
6th Most Common Venue      object
7th Most Common Venue      object
8th Most Common Venue      object
9th Most Common Venue      object
10th Most Common Venue     object
11th Most Common Venue     object
12th Most Common Venue     object
13th Most Common Venue     object
14th Most Common Venue     object
15th Most Common Venue     object
16th Most Common Venue     object
17th Most Common Venue     object
18th Most Common Venue     object
19th Most Common Venue     object
20th Most Common Venue     object
dtype: object


Unnamed: 0,state,state_id,city,neighborhood,zip,latitude,longitude,population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,Texas,TX,Dallas,Addison,75001,32.96,-96.83847,12414,1,Asian Restaurant,Chinese Restaurant,Thai Restaurant,Japanese Restaurant,Sushi Restaurant,Vietnamese Restaurant,Korean Restaurant,Noodle House,Sandwich Place,Poke Place,Ramen Restaurant,Restaurant,Salad Place,Soup Place,Seafood Restaurant,Shopping Mall,Szechuan Restaurant,Taiwanese Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant
2,Texas,TX,Dallas,Carrollton,75006,32.96188,-96.89701,46364,1,Korean Restaurant,Asian Restaurant,Chinese Restaurant,Sushi Restaurant,Vietnamese Restaurant,Ramen Restaurant,Thai Restaurant,Japanese Restaurant,Korean BBQ Restaurant,Karaoke Bar,Noodle House,Soup Place,Taiwanese Restaurant,Poke Place,Seafood Restaurant,Sandwich Place,Salad Place,Restaurant,Szechuan Restaurant,Udon Restaurant
7,Texas,TX,Dallas,Coppell,75019,32.96329,-96.98553,38666,1,Vietnamese Restaurant,Chinese Restaurant,Peking Duck Restaurant,Asian Restaurant,Thai Restaurant,Japanese Restaurant,Sushi Restaurant,Sandwich Place,Poke Place,Ramen Restaurant,Restaurant,Salad Place,American Restaurant,Seafood Restaurant,Shopping Mall,Noodle House,Szechuan Restaurant,Taiwanese Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant
18,Texas,TX,Dallas,Irving,75038,32.87458,-96.99758,27802,1,Asian Restaurant,Chinese Restaurant,Thai Restaurant,Japanese Restaurant,Sushi Restaurant,Korean Restaurant,Vietnamese Restaurant,Himalayan Restaurant,Bakery,Sandwich Place,Indian Restaurant,Filipino Restaurant,Wings Joint,Fried Chicken Joint,Food Truck,Buffet,Vegetarian / Vegan Restaurant,Udon Restaurant,Burmese Restaurant,Taiwanese Restaurant
19,Texas,TX,Dallas,Irving,75039,32.88752,-96.94225,11032,1,Asian Restaurant,Chinese Restaurant,Thai Restaurant,Japanese Restaurant,Sushi Restaurant,Korean Restaurant,Vietnamese Restaurant,Himalayan Restaurant,Bakery,Sandwich Place,Indian Restaurant,Filipino Restaurant,Wings Joint,Fried Chicken Joint,Food Truck,Buffet,Vegetarian / Vegan Restaurant,Udon Restaurant,Burmese Restaurant,Taiwanese Restaurant
20,Texas,TX,Dallas,Garland,75040,32.92766,-96.62008,59406,1,Chinese Restaurant,Vietnamese Restaurant,Asian Restaurant,Thai Restaurant,Sushi Restaurant,Korean Restaurant,Udon Restaurant,Taiwanese Restaurant,Szechuan Restaurant,Vegetarian / Vegan Restaurant,Noodle House,Shopping Mall,Seafood Restaurant,Sandwich Place,Salad Place,Restaurant,Ramen Restaurant,Poke Place,Peking Duck Restaurant,Soup Place
21,Texas,TX,Dallas,Garland,75041,32.88091,-96.65147,30700,1,Chinese Restaurant,Vietnamese Restaurant,Asian Restaurant,Thai Restaurant,Sushi Restaurant,Korean Restaurant,Udon Restaurant,Taiwanese Restaurant,Szechuan Restaurant,Vegetarian / Vegan Restaurant,Noodle House,Shopping Mall,Seafood Restaurant,Sandwich Place,Salad Place,Restaurant,Ramen Restaurant,Poke Place,Peking Duck Restaurant,Soup Place
22,Texas,TX,Dallas,Garland,75042,32.9139,-96.67493,37881,1,Chinese Restaurant,Vietnamese Restaurant,Asian Restaurant,Thai Restaurant,Sushi Restaurant,Korean Restaurant,Udon Restaurant,Taiwanese Restaurant,Szechuan Restaurant,Vegetarian / Vegan Restaurant,Noodle House,Shopping Mall,Seafood Restaurant,Sandwich Place,Salad Place,Restaurant,Ramen Restaurant,Poke Place,Peking Duck Restaurant,Soup Place
23,Texas,TX,Dallas,Garland,75043,32.85707,-96.57941,58094,1,Chinese Restaurant,Vietnamese Restaurant,Asian Restaurant,Thai Restaurant,Sushi Restaurant,Korean Restaurant,Udon Restaurant,Taiwanese Restaurant,Szechuan Restaurant,Vegetarian / Vegan Restaurant,Noodle House,Shopping Mall,Seafood Restaurant,Sandwich Place,Salad Place,Restaurant,Ramen Restaurant,Poke Place,Peking Duck Restaurant,Soup Place
24,Texas,TX,Dallas,Garland,75044,32.96264,-96.65323,40811,1,Chinese Restaurant,Vietnamese Restaurant,Asian Restaurant,Thai Restaurant,Sushi Restaurant,Korean Restaurant,Udon Restaurant,Taiwanese Restaurant,Szechuan Restaurant,Vegetarian / Vegan Restaurant,Noodle House,Shopping Mall,Seafood Restaurant,Sandwich Place,Salad Place,Restaurant,Ramen Restaurant,Poke Place,Peking Duck Restaurant,Soup Place
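The join above explains two things visible in the table: every Garland zip code carries the same venue ranking, and rows without a match are dropped. The sketch below reproduces that behavior on a tiny made-up example. The frames `zips` and `venues` and the neighborhood "Nowhere" are hypothetical stand-ins, not the notebook's actual `dallas_zip` or `neighborhoods_asian_venues_sorted` data.

```python
import pandas as pd

# Hypothetical stand-in for dallas_zip: one row per zip code
zips = pd.DataFrame({
    "neighborhood": ["Addison", "Garland", "Garland", "Nowhere"],
    "zip": [75001, 75040, 75041, 70000],
})

# Hypothetical stand-in for the sorted venue table: one row per neighborhood
venues = pd.DataFrame({
    "Neighborhood": ["Addison", "Garland"],
    "Cluster Labels": [1, 1],
    "1st Most Common Venue": ["Asian Restaurant", "Chinese Restaurant"],
})

# Left join on the neighborhood name; each zip row inherits its neighborhood's venue row
merged = zips.join(venues.set_index("Neighborhood"), on="neighborhood")

# "Nowhere" has no venue row, so its venue columns are NaN
# and 'Cluster Labels' has been upcast to float64
unmatched = merged.loc[merged["Cluster Labels"].isna(), "neighborhood"].tolist()

# Drop the unmatched rows and restore the integer dtype
merged = merged.dropna(subset=["Cluster Labels"]).copy()
merged["Cluster Labels"] = merged["Cluster Labels"].astype("int64")

print(unmatched)
print(merged)
```

Because the venue table is keyed by neighborhood rather than by zip code, both Garland rows receive identical venue columns, which matches the repeated Garland rows in the output above.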


In [34]:
# create map of Dallas using latitude and longitude values
#map_clusters = folium.Map(location=[d_latitude, d_longitude], zoom_start=10)
map_dallas = folium.Map(location=[d_latitude, d_longitude], zoom_start=10)   # try adding cluster to the map_dallas. Was map_clusters

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dallas_merged['latitude'], dallas_merged['longitude'], dallas_merged['neighborhood'], dallas_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],          # labels run 0..kclusters-1, so index the palette directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_dallas)         #fill_opacity=0.7).add_to(map_clusters)
       
map_dallas
#map_clusters
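One detail worth checking when coloring markers by cluster: k-means labels start at 0, so a palette indexed with `label - 1` sends cluster 0 to the last entry through Python's negative indexing. The sketch below compares the two indexings. It uses placeholder color names rather than the notebook's hex values, and `palette`, `direct`, and `shifted` are illustrative names.

```python
kclusters = 5

# Placeholder palette (assumption: stands in for the list of hex colors)
palette = ["c0", "c1", "c2", "c3", "c4"]

# k-means labels run from 0 to kclusters - 1
labels = range(kclusters)

# Direct indexing: label i gets palette entry i
direct = {label: palette[label] for label in labels}

# Shifted indexing: label 0 becomes index -1, which wraps to the last entry
shifted = {label: palette[label - 1] for label in labels}

print(direct)
print(shifted)
```

Both mappings give every cluster a distinct color; they differ only in which color each cluster receives, which matters when a legend or written description refers to specific colors.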

# End of Battle of Neighborhoods - code section (part of Week 2)