## Identifying the Best Location for a Motel in Calcasieu Parish, Louisiana



Table of Contents
1. Introduction
2. Description of the Data
3. Methodology
4. Results
5. Discussion
6. Conclusion

## 1. Introduction
The main objective of this project is to identify a suitable location for a new motel among the neighborhoods of Calcasieu Parish.

#### Stakeholders
Econo Lodge is an economy motel chain based in the United States and Canada, and the second-largest brand in the Choice Hotels system. Choice Hotels plans to add more new-construction properties to the Econo Lodge brand and would like to open a new Econo Lodge in or near Calcasieu Parish, Louisiana. My goal is therefore to determine which neighborhood of Calcasieu Parish is best suited to a new motel business.

#### Location
Calcasieu Parish is part of the Lake Charles metropolitan area, which has a population of 194,138. Lake Charles has the biggest casino market in the state of Louisiana. Calcasieu Parish was created on March 24, 1840, from the parish of Saint Landry, one of the original nineteen civil parishes established by the Louisiana Legislature in 1807. It is also located near the Beaumont–Port Arthur (Texas), Lafayette, and Alexandria metropolitan areas. As the population in this area grew over the years, the original Calcasieu Parish was divided into five smaller parishes. The dataset used in this project covers 36 parishes (treated here as boroughs) and 304 municipalities (treated here as neighborhoods).

The stakeholders want to find the ideal location among the neighborhoods of Calcasieu Parish. There are 6 neighborhoods in Calcasieu Parish: DeQuincy, Iowa, Lake Charles, Sulphur, Vinton and Westlake. I will use the Foursquare API to explore these neighborhoods and the Folium library to visualize them. In addition, we will build a k-means clustering model that groups similar neighborhoods and helps us identify a suitable location for a motel business. Finally, the overall process and the methodologies used to obtain the results will be presented to the stakeholders at Choice Hotels.

## 2. Description of the Data
This project requires two datasets. The first is the list of municipalities of Louisiana, available on Wikipedia at https://en.wikipedia.org/wiki/List_of_municipalities_in_Louisiana. We can load it into a pandas DataFrame by scraping the HTML table at that address. This dataset contains all the neighborhoods of Louisiana together with their populations and population densities. It covers 36 boroughs and their 304 neighborhoods.

The second dataset contains the latitude and longitude of each neighborhood of Calcasieu Parish. It is built by combining the neighborhoods scraped from the web page above with geographical information from a CSV file.

Using the Foursquare API, we will explore and collect additional data, such as the number of motels, casinos and restaurants in the neighborhoods of Calcasieu Parish. This data will be merged with the other two datasets, and the final dataset will be passed to the clustering model.

In [3]:
# Import all the dependencies required for this project.
# library to handle data in a vectorized manner
import numpy as np
# library for data analysis
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
# library to handle JSON files
import json
# install geopy; comment this line out if geopy is already installed
!conda install -c conda-forge geopy --yes
# convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

# install folium; comment this line out if folium is already installed
!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')


Loading data from https://en.wikipedia.org/wiki/List_of_municipalities_in_Louisiana

In [108]:
link = "https://en.wikipedia.org/wiki/List_of_municipalities_in_Louisiana"
df = pd.read_html(link,header=0)[0]

Loading the geographical data (latitude and longitude) from a CSV file into `df_geo`

In [109]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Neighborhood,Latitude,Longitude,Unnamed: 3
0,DeQuincy,30.4505,-93.4332,
1,Iowa,30.2369,-93.4332,
2,Lake Charles,30.2266,-93.2174,
3,Sulphur,30.2366,-93.3774,
4,Vinton,30.1908,-93.5813,


Cleaning and Pre-Processing the data

In [110]:

# Select only the required columns of the data frame
df = df[['Parish[1]','Name','Population(2000)[1]','Population(2010)[1]', 'Population density']]
# Rename the columns
df = df.rename(columns = {
    'Parish[1]':'Borough',
    'Name':'Neighborhood',
    'Population(2010)[1]':'Population_2010',
    'Population(2000)[1]':'Population_2000',
    'Population density':'Population_density'})
# Remove the empty rows of the data frame
df = df.iloc[1:305].reset_index(drop=True)

In [111]:
# Create the population data frame for Calcasieu Parish, LA
population_data = df[df['Borough'] == 'Calcasieu'].reset_index(drop=True)
population_data.head()

Unnamed: 0,Borough,Neighborhood,Population_2000,Population_2010,Population_density
0,Calcasieu,DeQuincy,3398.0,3235.0,8.2
1,Calcasieu,Iowa,2663.0,2996.0,8.2
2,Calcasieu,Lake Charles,71757.0,71993.0,108.9
3,Calcasieu,Sulphur,20512.0,20410.0,25.9
4,Calcasieu,Vinton,3338.0,3212.0,12.9


In [112]:
# Create the data frame for Calcasieu Parish, LA
calcasieu_data = df[df['Borough'] == 'Calcasieu'].reset_index(drop=False)
# Select the columns Borough and Neighborhood
calcasieu_data = calcasieu_data[['Borough','Neighborhood']]
# Keep only the geographical columns that are needed
df_geo = df_geo[['Neighborhood','Latitude','Longitude']]
# Merge the two data frames on Neighborhood to attach the coordinates
calcasieu_data = pd.merge(calcasieu_data, df_geo, on =['Neighborhood'])
calcasieu_data

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Calcasieu,DeQuincy,30.4505,-93.4332
1,Calcasieu,Iowa,30.2369,-93.4332
2,Calcasieu,Lake Charles,30.2266,-93.2174
3,Calcasieu,Sulphur,30.2366,-93.3774
4,Calcasieu,Vinton,30.1908,-93.5813
5,Calcasieu,Westlake,30.2421,-93.2507


## 3. Methodology

Foursquare Credentials and Version

In [113]:
# The code was removed by Watson Studio for sharing.

In [114]:
#geographical coordinates of Calcasieu Parish, La
address = 'Calcasieu, LA'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Calcasieu Parish, LA are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Calcasieu Parish, LA are 30.2232015, -93.3434067.


In [115]:
# create map of Calcasieu Parish using latitude and longitude values
map_calcasieu = folium.Map(location=[latitude, longitude], zoom_start=11)
# add markers to map
for lat, lng, label in zip(calcasieu_data['Latitude'], calcasieu_data['Longitude'], calcasieu_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_calcasieu)  
    
map_calcasieu

In [116]:
# Parameters for Foursquare API
LIMIT = 500
address = 'Calcasieu, LA'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
radius = 50000

In this section we explore the attributes that play a vital role in solving the business problem and collect data about them using the Foursquare API. For Calcasieu Parish and its surroundings, we collect:
- a. Number of motels
- b. Number of casinos
- c. Number of restaurants
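
The three searches listed above share the same request shape and the same counting step, so they can be sketched as two small helpers. This is only an illustrative sketch, not the notebook's actual code: the function names, the placeholder credentials, the version string and the sample venues are all assumptions, and no request is sent. The endpoint and parameter names are the ones used in the cells that follow.

```python
from collections import Counter
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(client_id, client_secret, version, lat, lng, query, radius, limit):
    """Build the venue search URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)

def count_venues_by_city(venues):
    """Count venues per city, skipping venues whose location has no city."""
    cities = (venue.get('location', {}).get('city') for venue in venues)
    return Counter(city for city in cities if city)

# Hypothetical sample shaped like results['response']['venues'] (not real data)
sample_venues = [
    {'name': 'Motel A', 'location': {'city': 'Lake Charles'}},
    {'name': 'Motel B', 'location': {'city': 'Lake Charles'}},
    {'name': 'Motel C', 'location': {'city': 'Sulphur'}},
    {'name': 'Motel D', 'location': {}},
]

# 'ID', 'SECRET' and '20180605' are placeholders, not real credentials
url = build_search_url('ID', 'SECRET', '20180605',
                       30.2232015, -93.3434067, 'Motel', 50000, 500)
motel_counts = count_venues_by_city(sample_venues)
print(url)
print(motel_counts)
```

Building the query string with `urlencode` avoids hand-formatting the URL, and changing only the `query` argument covers the motel, casino and restaurant searches.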

#### Motels

In [117]:
# Search Query
search_query = 'Motel'
# url
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
# Search results
results = requests.get(url).json()
# assign relevant part of JSON to venues
venues = results['response']['venues']
# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe = dataframe[['name','location.city']]
# Rename the city column so it can be merged on Neighborhood
dataframe = dataframe.rename(columns = {'location.city':'Neighborhood'})
# Drop rows where every value is NaN
dataframe = dataframe.dropna(axis = 0, how='all')
# Count the venues in each city
dataframe_motel = dataframe.groupby('Neighborhood').count().reset_index()
dataframe_motel = dataframe_motel.rename(columns = {'name':'Num_Motel'})
# Convert the count to integer (the result must be assigned back to take effect)
dataframe_motel['Num_Motel'] = dataframe_motel['Num_Motel'].astype(int)
final_data = pd.merge(calcasieu_data, dataframe_motel, on = 'Neighborhood', how = 'outer')
dataframe_motel.head()

Unnamed: 0,Neighborhood,Num_Motel
0,Groves,1
1,Kinder,1
2,Lake Charles,3
3,Orange,1
4,Port Arthur,1


#### Casinos

In [118]:
# Search Query
search_query = 'Casino'
# url
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
# Search results
results = requests.get(url).json()
# assign relevant part of JSON to venues
venues = results['response']['venues']
# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe = dataframe[['name','location.city']]
# Rename the city column so it can be merged on Neighborhood
dataframe = dataframe.rename(columns = {'location.city':'Neighborhood'})
# Drop rows where every value is NaN
dataframe = dataframe.dropna(axis = 0, how='all')
# Count the venues in each city
dataframe_casino = dataframe.groupby('Neighborhood').count().reset_index()
dataframe_casino = dataframe_casino.rename(columns = {'name':'Num_Casino'})
# Convert the count to integer (the result must be assigned back to take effect)
dataframe_casino['Num_Casino'] = dataframe_casino['Num_Casino'].astype(int)
final_data = pd.merge(final_data, dataframe_casino, on = 'Neighborhood', how = 'outer')
dataframe_casino.head()

Unnamed: 0,Neighborhood,Num_Casino
0,Kinder,3
1,Lake Charles,17
2,Sulphur,2
3,Vinton,4
4,Welsh,1


#### Restaurants

In [119]:
# Search Query
search_query = 'Restaurant'
# url
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
# Search results
results = requests.get(url).json()
# assign relevant part of JSON to venues
venues = results['response']['venues']
# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe = dataframe[['name','location.city']]
# Rename the city column so it can be merged on Neighborhood
dataframe = dataframe.rename(columns = {'location.city':'Neighborhood'})
# Drop rows where every value is NaN
dataframe = dataframe.dropna(axis = 0, how='all')
# Count the venues in each city
dataframe_rest = dataframe.groupby('Neighborhood').count().reset_index()
dataframe_rest = dataframe_rest.rename(columns = {'name':'Num_Restaurant'})
# Convert the count to integer (the result must be assigned back to take effect)
dataframe_rest['Num_Restaurant'] = dataframe_rest['Num_Restaurant'].astype(int)
final_data = pd.merge(final_data, dataframe_rest, on = 'Neighborhood', how = 'outer')
dataframe_rest.head()

Unnamed: 0,Neighborhood,Num_Restaurant
0,Bridge City,1
1,DeQuincy,1
2,Lake Charles,29
3,Moss Bluff,1
4,Orange,7


#### Merge data

In [120]:
# Keep only the six neighborhoods of Calcasieu Parish
calcasieu_neighborhoods = ['DeQuincy', 'Iowa', 'Lake Charles', 'Sulphur', 'Vinton', 'Westlake']
final_data = final_data.loc[final_data['Neighborhood'].isin(calcasieu_neighborhoods)]
final_data = final_data.fillna(0)
# Merge the final data with population data
calcasieu_final_data = pd.merge(final_data, population_data, on =['Neighborhood','Borough'])
calcasieu_final_data

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Num_Motel,Num_Casino,Num_Restaurant,Population_2000,Population_2010,Population_density
0,Calcasieu,DeQuincy,30.4505,-93.4332,0.0,0.0,1.0,3398.0,3235.0,8.2
1,Calcasieu,Iowa,30.2369,-93.4332,0.0,0.0,0.0,2663.0,2996.0,8.2
2,Calcasieu,Lake Charles,30.2266,-93.2174,3.0,17.0,29.0,71757.0,71993.0,108.9
3,Calcasieu,Sulphur,30.2366,-93.3774,3.0,2.0,5.0,20512.0,20410.0,25.9
4,Calcasieu,Vinton,30.1908,-93.5813,1.0,4.0,3.0,3338.0,3212.0,12.9
5,Calcasieu,Westlake,30.2421,-93.2507,0.0,5.0,1.0,4668.0,4568.0,9.5
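
The merge step above combines an outer merge, a filter on the target neighborhoods and a fill of missing counts with zero. The sketch below replays that pattern on a tiny toy frame; the frames and the `Orange` row are made up for illustration and are not the Foursquare results.

```python
import pandas as pd

# Toy inputs (illustrative values only)
neighborhoods = pd.DataFrame({'Neighborhood': ['DeQuincy', 'Iowa', 'Lake Charles']})
venue_counts = pd.DataFrame({'Neighborhood': ['Lake Charles', 'Orange'],
                             'Num_Motel': [3, 1]})

# The outer merge keeps rows from both sides, so cities outside the parish
# (here 'Orange') appear and neighborhoods without venues get NaN counts.
merged = pd.merge(neighborhoods, venue_counts, on='Neighborhood', how='outer')

# Keep only the target neighborhoods, then treat missing counts as zero.
targets = ['DeQuincy', 'Iowa', 'Lake Charles']
merged = merged[merged['Neighborhood'].isin(targets)].fillna(0).reset_index(drop=True)

counts = merged.set_index('Neighborhood')['Num_Motel'].to_dict()
print(counts)
```

Because NaN forces the count column to floating point, the counts appear as `0.0` and `3.0` in the merged table even when the per-city counts were integers.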


#### K-means Clustering

In [121]:
# Drop the columns that are not needed for clustering
calcasieu_final_data = calcasieu_final_data.drop(['Borough', 'Latitude', 'Longitude'], axis=1)
calcasieu_final_data

Unnamed: 0,Neighborhood,Num_Motel,Num_Casino,Num_Restaurant,Population_2000,Population_2010,Population_density
0,DeQuincy,0.0,0.0,1.0,3398.0,3235.0,8.2
1,Iowa,0.0,0.0,0.0,2663.0,2996.0,8.2
2,Lake Charles,3.0,17.0,29.0,71757.0,71993.0,108.9
3,Sulphur,3.0,2.0,5.0,20512.0,20410.0,25.9
4,Vinton,1.0,4.0,3.0,3338.0,3212.0,12.9
5,Westlake,0.0,5.0,1.0,4668.0,4568.0,9.5


In [122]:
from sklearn.preprocessing import StandardScaler
# set number of clusters
kclusters = 3
calcasieu_grouped_clustering = calcasieu_final_data.drop('Neighborhood', axis=1)
# Build the feature matrix. Note: the [:, 1:] slice skips the first remaining
# column (Num_Motel), so the motel count is not used as a clustering feature.
X = calcasieu_grouped_clustering.values[:,1:]
X = np.nan_to_num(X)
# Standardize each feature to zero mean and unit variance
cluster_dataset = StandardScaler().fit_transform(X)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(cluster_dataset)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 1, 2, 0, 0], dtype=int32)
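
Before clustering, `StandardScaler` rescales each feature to zero mean and unit variance so that large-valued columns such as population do not dominate the distance computation. As a rough illustration of that transformation, the plain-Python sketch below standardizes the population density column from the table above using the population standard deviation, which is the convention `StandardScaler` follows. The helper name `standardize` is an assumption made for this example.

```python
from statistics import mean, pstdev

def standardize(column):
    """Return z-scores computed with the population standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    return [(value - mu) / sigma for value in column]

# Population density values from the merged table, in row order:
# DeQuincy, Iowa, Lake Charles, Sulphur, Vinton, Westlake
population_density = [8.2, 8.2, 108.9, 25.9, 12.9, 9.5]

z_scores = standardize(population_density)
print([round(z, 2) for z in z_scores])
```

After standardization, Lake Charles still stands far from the other neighborhoods on this feature, but its raw magnitude no longer outweighs the other columns.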

In [123]:
# start from a copy of the neighborhood data so calcasieu_data is not modified
calcasieu_merged = calcasieu_data.copy()

# add clustering labels
calcasieu_merged['Cluster Labels'] = kmeans.labels_

# join calcasieu_final_data to add the venue counts and population figures for each neighborhood
calcasieu_merged = calcasieu_merged.join(calcasieu_final_data.set_index('Neighborhood'), on='Neighborhood')

calcasieu_merged # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,Num_Motel,Num_Casino,Num_Restaurant,Population_2000,Population_2010,Population_density
0,Calcasieu,DeQuincy,30.4505,-93.4332,0,0.0,0.0,1.0,3398.0,3235.0,8.2
1,Calcasieu,Iowa,30.2369,-93.4332,0,0.0,0.0,0.0,2663.0,2996.0,8.2
2,Calcasieu,Lake Charles,30.2266,-93.2174,1,3.0,17.0,29.0,71757.0,71993.0,108.9
3,Calcasieu,Sulphur,30.2366,-93.3774,2,3.0,2.0,5.0,20512.0,20410.0,25.9
4,Calcasieu,Vinton,30.1908,-93.5813,0,1.0,4.0,3.0,3338.0,3212.0,12.9
5,Calcasieu,Westlake,30.2421,-93.2507,0,0.0,5.0,1.0,4668.0,4568.0,9.5


## 4. Results

#### Cluster 1

In [124]:
cluster_one = calcasieu_merged.loc[calcasieu_merged['Cluster Labels'] == 0, calcasieu_merged.columns[[1] + list(range(5, calcasieu_merged.shape[1]))]]
cluster_one.head()

Unnamed: 0,Neighborhood,Num_Motel,Num_Casino,Num_Restaurant,Population_2000,Population_2010,Population_density
0,DeQuincy,0.0,0.0,1.0,3398.0,3235.0,8.2
1,Iowa,0.0,0.0,0.0,2663.0,2996.0,8.2
4,Vinton,1.0,4.0,3.0,3338.0,3212.0,12.9
5,Westlake,0.0,5.0,1.0,4668.0,4568.0,9.5


#### Cluster 2

In [125]:
cluster_two = calcasieu_merged.loc[calcasieu_merged['Cluster Labels'] == 1, calcasieu_merged.columns[[1] + list(range(5, calcasieu_merged.shape[1]))]]
cluster_two

Unnamed: 0,Neighborhood,Num_Motel,Num_Casino,Num_Restaurant,Population_2000,Population_2010,Population_density
2,Lake Charles,3.0,17.0,29.0,71757.0,71993.0,108.9


#### Cluster 3

In [126]:
cluster_three = calcasieu_merged.loc[calcasieu_merged['Cluster Labels'] == 2, calcasieu_merged.columns[[1] + list(range(5, calcasieu_merged.shape[1]))]]
cluster_three

Unnamed: 0,Neighborhood,Num_Motel,Num_Casino,Num_Restaurant,Population_2000,Population_2010,Population_density
3,Sulphur,3.0,2.0,5.0,20512.0,20410.0,25.9


In [127]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(calcasieu_merged['Latitude'], calcasieu_merged['Longitude'], calcasieu_merged['Neighborhood'], calcasieu_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5. Discussion

Calcasieu Parish, Louisiana has 6 neighborhoods, and the goal was to identify the best one in which to open a motel business. The k-means model groups the neighborhoods into three clusters:

- Cluster 1: DeQuincy, Iowa, Vinton, Westlake
- Cluster 2: Lake Charles
- Cluster 3: Sulphur
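
The grouping listed above follows directly from the label array printed in the methodology section. The short sketch below rebuilds it from the labels and the row order of the neighborhoods. Mapping label `0` to "Cluster 1" (and so on) is the numbering convention used in the results section.

```python
from collections import defaultdict

# Neighborhoods in row order, with the labels printed by kmeans.labels_
neighborhoods = ['DeQuincy', 'Iowa', 'Lake Charles', 'Sulphur', 'Vinton', 'Westlake']
labels = [0, 0, 1, 2, 0, 0]

clusters = defaultdict(list)
for name, label in zip(neighborhoods, labels):
    clusters[label + 1].append(name)  # label 0 is reported as "Cluster 1"

for number in sorted(clusters):
    print('Cluster {}: {}'.format(number, ', '.join(clusters[number])))
```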

### Observation and Recommendations

The results obtained with the k-means algorithm in the Results section can be interpreted as follows:


#### Cluster 1
The results suggest that cluster 1 has fewer casinos and restaurants than cluster 2. Population density is also very low in every neighborhood of cluster 1. Between 2000 and 2010 the population grew only in Iowa and declined slightly in DeQuincy, Vinton and Westlake.

Hence, it would not be a wise decision to start a motel in the neighborhoods of this cluster.

#### Cluster 2
Cluster 2 has more casinos and restaurants than the other clusters, and its population density is also considerably higher. Its population, however, barely changed between 2000 and 2010 (71,757 to 71,993). Although the number of motels is the same as in cluster 3, the numbers of casinos and restaurants are much higher.

Hence, I would recommend the cluster 2 neighborhood as an ideal destination to start a motel business.

#### Cluster 3
The results suggest that cluster 3 has fewer casinos and restaurants than cluster 2. Population density in Sulphur, the only neighborhood in cluster 3, is also much lower than in cluster 2.

Hence, it would not be a wise decision to start a motel in the neighborhood of this cluster.
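
The population comparisons in the observations above can be checked directly from the 2000 and 2010 figures in the merged table. The sketch below computes the percentage change for each neighborhood. The helper name `percent_change` is an assumption made for this example.

```python
# (Population_2000, Population_2010) pairs taken from the merged table
populations = {
    'DeQuincy': (3398, 3235),
    'Iowa': (2663, 2996),
    'Lake Charles': (71757, 71993),
    'Sulphur': (20512, 20410),
    'Vinton': (3338, 3212),
    'Westlake': (4668, 4568),
}

def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100

growth = {name: percent_change(p2000, p2010)
          for name, (p2000, p2010) in populations.items()}

# Print from fastest growing to fastest shrinking
for name, change in sorted(growth.items(), key=lambda item: item[1], reverse=True):
    print('{:<13}{:+.1f}%'.format(name, change))
```

Only Iowa and Lake Charles show an increase over the decade, and Lake Charles is nearly flat, so the recommendation rests on its venue counts and population density rather than on growth.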


## 6. Conclusion

The main objective of this project was to demonstrate how recommendation techniques can be used to solve real business problems. In our case, we built a k-means clustering model that helps determine the best location in Calcasieu Parish, Louisiana to start a motel business. The results of the model indicate that **Lake Charles, LA**, the only neighborhood in cluster 2, is the ideal location to start the motel business.