## IBM Applied Data Science Capstone Project

### Opening a New Gym in Austin, TX

### Introduction:

Austin is located in Central Texas within the greater Texas Hill Country and is home to numerous lakes, rivers, and waterways. It was recently voted the No. 1 place to live in America and was also named the fastest-growing large city in the U.S. Austin and its suburbs have an estimated population of 2.2 million. The city is a hotbed for technology, startups, and innovation. A number of Fortune 500 companies have headquarters or regional offices in Austin, including Dell, 3M, Amazon, Apple, Google, IBM, Intel, Oracle, Texas Instruments, and Whole Foods Market.

Many residents are affluent, health-conscious, and willing to spend generously on their wellbeing. That is why I have selected Austin for my project on opening a new gym.


### Business Problem:

The objective of this project is to analyze and identify a location in Austin, TX with good potential for opening a new gym. To give the new gym the best chance of success, it is important to choose a location where there are few or no existing gyms. Using data science methodology and machine learning techniques such as clustering, we should be able to identify such locations.


### Target Audience:

The target audience is anyone looking to open a new gym in the Austin, TX area. Whether it is a single location for an individual entrepreneur or multiple locations for a large business, a gym is a good investment that serves the health and wellbeing needs of a modern, health-conscious population.


## Data Description:

### Data Required

1. List of neighborhoods in Austin, TX. This defines the scope of the project, which is confined to the city of Austin, TX.
2. Latitude and longitude of each neighborhood. These are required to plot the map and to query venues.
3. Venue data, specifically for gyms. This data will be used to cluster the neighborhoods.

### Sources and Methods to Extract Data

We'll scrape the list of Austin neighborhoods from Wikipedia (https://en.wikipedia.org/wiki/List_of_Austin_neighborhoods) using pandas. Next, we'll get the geographical coordinates of each neighborhood with the Python Geocoder library, which returns the latitude and longitude. With the list of neighborhoods and their coordinates, we'll use the Foursquare API to get venue information and select the gym category for further analysis. We'll use k-means clustering (a machine learning technique) to determine suitable locations for the new business, and the Folium library to show them on a map. This processing will help us identify which neighborhoods have a lower concentration of gyms and are therefore suitable locations for a new one.
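The clustering idea described above can be sketched as follows. This is a minimal illustration, not the notebook's actual analysis: the neighborhood names and `gym_share` values are made-up stand-ins for the fraction of venues in each neighborhood that are gyms, and the choice of three clusters is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: fraction of each neighborhood's venues that are gyms.
neighborhoods = ["A", "B", "C", "D", "E", "F", "G"]
gym_share = np.array([0.00, 0.00, 0.01, 0.05, 0.06, 0.12, 0.13])

# KMeans expects a 2-D array of shape (n_samples, n_features).
X = gym_share.reshape(-1, 1)

# k=3 is an assumed value; random_state makes the run repeatable.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_

# The cluster whose centre has the smallest gym share holds the candidates.
low_cluster = int(np.argmin(kmeans.cluster_centers_.ravel()))
candidates = [n for n, lab in zip(neighborhoods, labels) if lab == low_cluster]
print(candidates)
```

Neighborhoods in the lowest-share cluster would be the first places to look at for a new gym, since they have the least existing competition relative to their other venues.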


### 1. Import libraries

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis

import json # library to handle JSON files

# run the next line if geopy is not already installed (for example, if you haven't completed the Foursquare API lab)
!conda install -c conda-forge geopy --yes

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests

from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas >= 1.0)

!pip install geocoder
import geocoder

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
# folium is installed with pip below; the conda alternative is kept for reference
# (a trailing comment on a `!conda` line is passed to conda as arguments and fails)
# !conda install -c conda-forge folium=0.5.0 --yes
!pip install folium
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries imported.


### 2. Scrape data from the Wikipedia page into a DataFrame

In [3]:
url = 'https://en.wikipedia.org/wiki/List_of_Austin_neighborhoods'

In [39]:
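df1 = pd.read_html(url, header=None)  # returns a list of DataFrames, one per HTML table

In [67]:
df = df1[0]  # the first table on the page lists the neighborhoods
df = df.drop(columns=['COA ID#[nb 1]'])  # drop the city ID column, which is not needed here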

In [69]:
df_austin=df.rename(columns={'Name':'Neighborhoods'})
df_austin.head()

|   | Neighborhoods |
|---|---|
| 0 | Bryker Woods |
| 1 | Caswell Heights |
| 2 | Downtown Austin |
| 3 | Eastwoods |
| 4 | Hancock |


### 3. Get the geographical coordinates

In [73]:
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Austin, TX'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
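The loop above keeps retrying until the geocoder returns a result, so it never finishes if a neighborhood cannot be resolved. A bounded variant is sketched below as one possible safeguard. The function name `get_latlng_bounded`, its parameters, and the injected `geocode` callable are all hypothetical; in the notebook the callable could be something like `lambda q: geocoder.arcgis(q).latlng`.

```python
import time

def get_latlng_bounded(query, geocode, max_attempts=5, delay_seconds=1.0):
    """Return [lat, lng] for query, or None after max_attempts failures.

    `geocode` is any callable that takes the query string and returns
    a [lat, lng] pair or None.
    """
    for attempt in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords
        # Pause before retrying, but not after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    return None

# Stand-in geocoder that fails twice, then succeeds (for illustration only).
calls = {"n": 0}

def flaky_geocode(query):
    calls["n"] += 1
    return None if calls["n"] < 3 else [30.27, -97.74]

result = get_latlng_bounded("Hancock, Austin, TX", flaky_geocode, delay_seconds=0)
print(result)
```

Passing the geocoder in as an argument keeps the retry logic testable without network access. A `None` result can then be handled explicitly, for example by dropping that neighborhood.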

In [74]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [get_latlng(neighborhood) for neighborhood in df_austin["Neighborhoods"].tolist()]

In [75]:
coords[:5]

[[30.305015660387273, -97.75420440854235],
 [30.307883086657483, -97.71940278965468],
 [30.271220062178976, -97.75418003332545],
 [30.290490000000034, -97.73166999999995],
 [30.297150000000045, -97.72661999999997]]

In [76]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
df_coords.head()

|   | Latitude | Longitude |
|---|---|---|
| 0 | 30.305016 | -97.754204 |
| 1 | 30.307883 | -97.719403 |
| 2 | 30.27122 | -97.75418 |
| 3 | 30.29049 | -97.73167 |
| 4 | 30.29715 | -97.72662 |


In [77]:
# merge the coordinates into the original dataframe
df_austin['Latitude'] = df_coords['Latitude']
df_austin['Longitude'] = df_coords['Longitude']

In [78]:
df_austin.head()

|   | Neighborhoods | Latitude | Longitude |
|---|---|---|---|
| 0 | Bryker Woods | 30.305016 | -97.754204 |
| 1 | Caswell Heights | 30.307883 | -97.719403 |
| 2 | Downtown Austin | 30.27122 | -97.75418 |
| 3 | Eastwoods | 30.29049 | -97.73167 |
| 4 | Hancock | 30.29715 | -97.72662 |


In [79]:
df_austin.shape

(22, 3)

In [80]:
# save the DataFrame as CSV file
df_austin.to_csv("df_austin.csv", index=False)

### 4. Create a map of Austin with neighborhoods superimposed on top

In [81]:
address = 'Austin, TX, USA'
geolocator = Nominatim(user_agent="austin_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Austin are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Austin are 30.2711286, -97.7436995.


In [83]:
map_austin = folium.Map(location=[latitude, longitude], zoom_start=11)
# add markers to map
for lat, lng, label in zip(df_austin['Latitude'], df_austin['Longitude'], df_austin['Neighborhoods']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_austin)

map_austin

### 5. Use the Foursquare API to explore the neighborhoods

In [84]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


In [85]:
# save the map as HTML file
map_austin.save('map_austin.html')

#### Now, let's get up to 100 venues within a radius of 2,000 meters of each neighborhood.

In [86]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(df_austin['Latitude'], df_austin['Longitude'], df_austin['Neighborhoods']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
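The loop above assumes every response contains at least one group and every venue has at least one category; if either is missing, the indexing raises an error. A more defensive version of the same steps is sketched here. The helper names `build_explore_params` and `extract_venues` are hypothetical, and the second venue in the sample payload is invented to exercise the fallback.

```python
def build_explore_params(client_id, client_secret, version, lat, lng, radius, limit):
    """Query parameters for the venues/explore request used above."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }

def extract_venues(payload, neighborhood, lat, lng):
    """Turn one explore response (already decoded from JSON) into row tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Fall back to None when a venue has no category listed.
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Kerbey Lane Café",
                   "location": {"lat": 30.30803, "lng": -97.75047},
                   "categories": [{"name": "Café"}]}},
        # Hypothetical venue with no category, to exercise the fallback.
        {"venue": {"name": "Example Venue",
                   "location": {"lat": 30.306, "lng": -97.750},
                   "categories": []}},
    ]}]}
}
rows = extract_venues(sample_payload, "Bryker Woods", 30.305016, -97.754204)
params = build_explore_params("ID", "SECRET", "20180605", 30.305016, -97.754204, 2000, 100)
print(rows)
```

In the notebook, the parameters could be passed as `requests.get(endpoint, params=params, timeout=10)`, which lets `requests` handle URL encoding and avoids hanging indefinitely on a slow response.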

In [87]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)
# venues_df.head()
# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(1865, 7)


|   | Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | Bryker Woods | 30.305016 | -97.754204 | Kerbey Lane Café | 30.30803 | -97.75047 | Café |
| 1 | Bryker Woods | 30.305016 | -97.754204 | Tiny Boxwoods | 30.306058 | -97.749789 | American Restaurant |
| 2 | Bryker Woods | 30.305016 | -97.754204 | Anderson's Coffee Co | 30.308382 | -97.750355 | Coffee Shop |
| 3 | Bryker Woods | 30.305016 | -97.754204 | Austin Flower Company | 30.307787 | -97.751224 | Flower Shop |
| 4 | Bryker Woods | 30.305016 | -97.754204 | Olive & June | 30.30745 | -97.751046 | Italian Restaurant |
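With venue rows in this shape, one way to narrow the data to gyms is to count, per neighborhood, the venues whose category mentions a gym. The sketch below is illustrative only: `gym_counts` is a hypothetical helper, matching on the substring "gym" is an assumption about how Foursquare names its categories, and the sample rows are invented rather than taken from the results above.

```python
def gym_counts(rows, neighborhoods, keyword="gym"):
    """Count venues per neighborhood whose category contains keyword."""
    # Start every neighborhood at zero so places without gyms still appear.
    counts = {name: 0 for name in neighborhoods}
    for neighborhood, category in rows:
        if category and keyword in category.lower():
            counts[neighborhood] = counts.get(neighborhood, 0) + 1
    return counts

# Illustrative (neighborhood, category) pairs, not real Foursquare results.
sample_rows = [
    ("Bryker Woods", "Café"),
    ("Bryker Woods", "Gym / Fitness Center"),
    ("Hancock", "Coffee Shop"),
    ("Downtown Austin", "Gym"),
    ("Downtown Austin", "Gym / Fitness Center"),
]
counts = gym_counts(sample_rows, ["Bryker Woods", "Hancock", "Downtown Austin"])
print(counts)
```

Keeping zero-count neighborhoods in the result matters here, because those are exactly the areas the project is looking for.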


### End of Week 1

### Methodology and Analyses