## Table of Contents

<div>
        <a href="#item1">1. Business Statement</a>
        <br>
        <a href="#item2">2. Data Source</a>
        <br>
        <a href="#item3">3. Analyze Data</a>
        <br>
        <a href="#item4">4. Visualize Data</a>
        <br>
        <a href="#item5">5. Conclusions</a>
        <br>
</div>

## 1. Business Statement

A new fitness chain would like to expand into Denmark. Opening the first gym will give the stakeholders an idea of how profitable their business model is, and how it is received may dictate future plans for expansion in the Danish market.

Data analysis will be used to find the best location based on the following criteria:

- Proximity to high-traffic areas.
- A minimum distance to existing fitness centers.
- A lower density of gym facilities in the area.
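The three criteria can be expressed as a simple check on a candidate site. The sketch below is not the author's method: the thresholds (`min_gym_km`, `max_hub_km`, `radius_km`, `max_gyms_nearby`), the helper names `haversine_km` and `meets_criteria`, and the "hub" coordinates are hypothetical, chosen only to illustrate the idea. Distances use the haversine great-circle formula.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def meets_criteria(candidate, gyms, hubs, min_gym_km=1.0, max_hub_km=0.5,
                   radius_km=2.0, max_gyms_nearby=2):
    """Hypothetical check of a candidate (lat, lng) against the three criteria."""
    # 1. close to at least one high-traffic hub (e.g. a shopping centre)
    near_hub = min(haversine_km(*candidate, *h) for h in hubs) <= max_hub_km
    # 2. not too close to any existing gym
    gym_dists = [haversine_km(*candidate, *g) for g in gyms]
    far_from_gyms = min(gym_dists) >= min_gym_km
    # 3. few gyms within the wider catchment radius
    low_density = sum(d <= radius_km for d in gym_dists) <= max_gyms_nearby
    return near_hub and far_from_gyms and low_density


# Two gyms taken from the Foursquare results in section 3
gyms = [(55.681080, 12.576632), (55.683154, 12.552322)]
# Hypothetical high-traffic hub; the candidate is placed right next to it
hubs = [(55.60, 12.40)]
print(meets_criteria((55.60, 12.40), gyms, hubs))  # True
```

In practice the hub coordinates could come from a second Foursquare search (for example for shopping centres), and the thresholds would need tuning against real footfall data.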




## 2. Data Source

The main data source will be the Foursquare API, used to identify existing gyms and map them over the central Copenhagen zip codes.
Data from Statistics Denmark will be used to identify higher-density areas and popular venues.
People are more likely to choose a gym close to a shopping center than one in a remote area, so this should be taken into account.

Knowing the existing gyms and their areas of coverage, unsupervised learning techniques will then be applied to determine the optimum location for a new fitness center.
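The text does not name a specific unsupervised technique. k-means, which the notebook imports from scikit-learn in its first code cell, is one plausible choice; the sketch below clusters the ten gym coordinates returned by Foursquare (see the table in section 3) into spatial groups. `n_clusters=3` is an arbitrary assumption, and clustering raw latitude/longitude with Euclidean distance slightly distorts east-west distances at Copenhagen's latitude, which is tolerable for a city-scale sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

# Coordinates of the ten gyms returned by the Foursquare search (section 3 table)
gym_coords = np.array([
    [55.681080, 12.576632],
    [55.683154, 12.552322],
    [55.670680, 12.556421],
    [55.697399, 12.540378],
    [55.705169, 12.544592],
    [55.696596, 12.551058],
    [55.661443, 12.610136],
    [55.706723, 12.587448],
    [55.711560, 12.560360],
    [55.705184, 12.537586],
])

# Group the gyms into k spatial clusters (k = 3 is an assumption, not a tuned value)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(gym_coords)

# Cluster sizes and centres show where existing coverage is concentrated
labels, counts = np.unique(kmeans.labels_, return_counts=True)
for label, count in zip(labels, counts):
    lat, lng = kmeans.cluster_centers_[label]
    print(f"cluster {label}: {count} gyms around ({lat:.4f}, {lng:.4f})")
```

Large clusters suggest saturated areas; candidate sites could then be sought away from the biggest cluster centres.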

Using Foursquare, let's fetch the fitness centers that are registered in Copenhagen.

### A. Import the fitness gyms registered for Copenhagen on Foursquare

```python
# Jupyter shell commands: install geopy and folium from conda-forge
!conda install -c conda-forge geopy --yes
!conda install -c conda-forge folium=0.5.0 --yes

import random  # library for random number generation

import numpy as np  # library to handle data in a vectorized manner
import pandas as pd  # library for data analysis
import requests  # library to handle requests

import folium  # plotting library
from folium.plugins import HeatMap  # heat map layer for folium maps
from geopy.geocoders import Nominatim  # converts an address into latitude and longitude values
from sklearn.cluster import KMeans  # k-means for the clustering stage

# libraries for displaying images
from IPython.display import HTML, Image

# transforms a JSON file into a pandas dataframe
try:
    from pandas import json_normalize  # pandas >= 1.0
except ImportError:
    from pandas.io.json import json_normalize  # older pandas releases

print('Libraries imported.')
```


```python
# Foursquare credentials: replace the placeholders with your own and never publish real secrets
CLIENT_ID = 'YOUR_CLIENT_ID'  # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # your Foursquare secret
VERSION = '20180604'  # API version date

# geocode Copenhagen; Nominatim expects an identifying user agent
address = 'Copenhagen'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

# search for fitness venues around the city centre
search_query = 'Fitness'
radius = 10000  # search radius in metres
LIMIT = 100  # limit of number of venues returned by Foursquare API

url = 'https://api.foursquare.com/v2/venues/search'
params = {
    'client_id': CLIENT_ID,
    'client_secret': CLIENT_SECRET,
    'll': '{},{}'.format(latitude, longitude),
    'v': VERSION,
    'query': search_query,
    'radius': radius,
    'limit': LIMIT,
}

# make the request and display the results
results = requests.get(url, params=params).json()
results
```

```
55.6867243 12.5700724
```


```
{'meta': {'code': 200, 'requestId': '5eb097909388d7001b9a85fb'},
 'response': {'venues': [{'id': '4b9bc30bf964a520402236e3',
    'name': 'Fitness World',
    'location': {'address': 'Købmagergade 48',
     'lat': 55.681079503234926,
     'lng': 12.57663229853562,
     'labeledLatLngs': [{'label': 'display',
       'lat': 55.681079503234926,
       'lng': 12.57663229853562}],
     'distance': 751,
     'postalCode': '1150',
     'cc': 'DK',
     'city': 'København K',
     'state': 'Region Hovedstaden',
     'country': 'Danmark',
     'formattedAddress': ['Købmagergade 48', '1150 København K', 'Danmark']},
    'categories': [{'id': '4bf58dd8d48988d175941735',
      'name': 'Gym / Fitness Center',
      'pluralName': 'Gyms or Fitness Centers',
      'shortName': 'Gym / Fitness',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/building/gym_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1588631581',
    'hasPerk': False},
   {'id': '4d2ac6eb8292
```
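Before indexing into `results['response']['venues']`, it can help to confirm the call succeeded, for example when the credentials are invalid. The sketch below assumes, as in the response shown above, that a successful call reports `meta.code == 200`; the helper name `extract_venues` is hypothetical.

```python
def extract_venues(results):
    """Return the venue list from a Foursquare v2 search response, or raise if the call failed."""
    meta = results.get('meta', {})
    if meta.get('code') != 200:
        raise RuntimeError(f"Foursquare request failed: {meta}")
    return results.get('response', {}).get('venues', [])


# A trimmed copy of the successful response above
sample_response = {
    'meta': {'code': 200, 'requestId': '5eb097909388d7001b9a85fb'},
    'response': {'venues': [{'id': '4b9bc30bf964a520402236e3', 'name': 'Fitness World'}]},
}
print([venue['name'] for venue in extract_venues(sample_response)])  # ['Fitness World']
```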

## 3. Analyze Data


#### Clean the Data and Transform It into a DataFrame

```python
# assign the relevant part of the JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
print(dataframe.shape)

# keep only the venue name, categories, id and anything associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # copy avoids SettingWithCopyWarning

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# keep only the first category name for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only the last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]
```
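To see what the cleaning step does to a single venue, the sketch below runs the same three operations (flatten, keep the first category name, strip the `location.` prefix) on a trimmed copy of the first venue from the response above. It assumes pandas >= 1.0 for `pd.json_normalize` and uses a lambda in place of `get_category_type` for brevity.

```python
import pandas as pd

# A trimmed copy of the first venue in the Foursquare response
venues = [{
    'id': '4b9bc30bf964a520402236e3',
    'name': 'Fitness World',
    'location': {'address': 'Købmagergade 48', 'lat': 55.681079503234926,
                 'lng': 12.57663229853562, 'postalCode': '1150'},
    'categories': [{'name': 'Gym / Fitness Center', 'primary': True}],
}]

# nested 'location' keys become columns such as 'location.lat';
# the 'categories' list is left as a single list-valued column
df = pd.json_normalize(venues)
print(sorted(df.columns))

# keep the first category name, then strip the 'location.' prefix
df['categories'] = df['categories'].apply(lambda cats: cats[0]['name'] if cats else None)
df.columns = [col.split('.')[-1] for col in df.columns]
print(df[['name', 'categories', 'lat', 'lng']])
```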


```python
# let's see the list of gyms
dataframe_filtered
```

|   | name | categories | address | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | crossStreet | neighborhood | id |
|---|------|------------|---------|-----|-----|----------------|----------|------------|----|------|-------|---------|------------------|-------------|--------------|----|
| 0 | Fitness World | Gym / Fitness Center | Købmagergade 48 | 55.68108 | 12.576632 | [{'label': 'display', 'lat': 55.68107950323492... | 751 | 1150.0 | DK | København K | Region Hovedstaden | Danmark | [Købmagergade 48, 1150 København K, Danmark] | | | 4b9bc30bf964a520402236e3 |
| 1 | Fitness World | Gym | Worsaaesvej 17 | 55.683154 | 12.552322 | [{'label': 'display', 'lat': 55.68315419305056... | 1182 | 1972.0 | DK | Frederiksberg | Region Hovedstaden | Danmark | [Worsaaesvej 17, 1972 Frederiksberg, Danmark] | | | 4d2ac6eb8292236adf772ebb |
| 2 | Fitness World | Gym / Fitness Center | Gasværksvej 16 | 55.67068 | 12.556421 | [{'label': 'display', 'lat': 55.6706799, 'lng'... | 1980 | 1656.0 | DK | København V | Region Hovedstaden | Danmark | [Gasværksvej 16, 1656 København V, Danmark] | | | 4b3c9071f964a520298525e3 |
| 3 | Fitness World | Gym / Fitness Center | Esromgade 15 | 55.697399 | 12.540378 | [{'label': 'display', 'lat': 55.69739900482128... | 2209 | 2200.0 | DK | København N | Nørrebro | Danmark | [Esromgade 15, 2200 København N, Danmark] | | | 5549c800498ef76f3c5f958d |
| 4 | Fitness World | Gym / Fitness Center | Tagensvej 129 | 55.705169 | 12.544592 | [{'label': 'display', 'lat': 55.70516915274733... | 2602 | 2200.0 | DK | København N | | Danmark | [Tagensvej 129, 2200 København N, Danmark] | | | 5a5cc156ad17890e6bbd35ef |
| 5 | Fitness World | Gym / Fitness Center | Jagtvej 113-115 | 55.696596 | 12.551058 | [{'label': 'display', 'lat': 55.6965957, 'lng'... | 1621 | 2200.0 | DK | København N | | Danmark | [Jagtvej 113-115, 2200 København N, Danmark] | | | 5020058ae4b061ae1add2500 |
| 6 | Fitness World | Gym / Fitness Center | Lyongade 23-25 | 55.661443 | 12.610136 | [{'label': 'display', 'lat': 55.66144308636816... | 3774 | 2300.0 | DK | København S | Region Hovedstaden | Danmark | [Lyongade 23-25, 2300 København S, Danmark] | | | 4b44e3bbf964a52066ff25e3 |
| 7 | Fitness World | Gym / Fitness Center | Århusgade 102 | 55.706723 | 12.587448 | [{'label': 'display', 'lat': 55.7067226, 'lng'... | 2478 | 2100.0 | DK | København Ø | Region Hovedstaden | Danmark | [Århusgade 102, 2100 København Ø, Danmark] | | | 4b14c89cf964a52082a623e3 |
| 8 | Fitness World | Gym / Fitness Center | Lyngbyvej 62 | 55.71156 | 12.56036 | [{'label': 'display', 'lat': 55.71156014926032... | 2831 | 2100.0 | DK | København Ø | Region Hovedstaden | Danmark | [Lyngbyvej 62, 2100 København Ø, Danmark] | | | 4b292f2af964a520949a24e3 |
| 9 | Fitness World Lygten | Gym | Rentemestervej 2 | 55.705184 | 12.537586 | [{'label': 'display', 'lat': 55.70518371468917... | 2894 | 2400.0 | DK | København NV | | Danmark | [Rentemestervej 2 (Lygten 41), 2400 København ... | Lygten 41 | | 4b7fbfbbf964a520053c30e3 |


## 4. Visualize Data


#### Plot the Gym Addresses on the Map

```python
# convert coordinates to float before plotting
dataframe_filtered['lat'] = dataframe_filtered['lat'].astype(float)
dataframe_filtered['lng'] = dataframe_filtered['lng'].astype(float)

# map centred on Copenhagen
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13)

# one blue circle per gym, with its category as popup
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill=True,
        fill_color='blue',
        fill_opacity=0.1,
    ).add_to(venues_map)

# heat map layer: red marks the areas with the highest concentration of gyms
heat_data = dataframe_filtered[['lat', 'lng']].values.tolist()
HeatMap(heat_data, radius=75, gradient={.01: 'blue', .3: 'lime', 1: 'red'}).add_to(venues_map)

# display map
venues_map
```
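The heat map shows density visually; counting gyms per postal code gives the same information as numbers. The sketch below uses the postal codes from the table in section 3 as strings; in the notebook itself, `dataframe_filtered['postalCode'].value_counts()` would do the same on the live data.

```python
import pandas as pd

# postalCode values of the ten gyms in the section 3 table
postal_codes = pd.Series(
    ['1150', '1972', '1656', '2200', '2200', '2200', '2300', '2100', '2100', '2400'],
    name='postalCode',
)

# number of gyms per postal code, most crowded first
gyms_per_zip = postal_codes.value_counts()
print(gyms_per_zip)
```

With this sample, 2200 (København N) has the most gyms, which matches the red area on the heat map around Nørrebro.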


## 5. Conclusions


Based on the map above, the gym locations and their areas of influence can be
adjusted for population density, which gives a better picture of the training
facilities available per person.
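A per-person measure could be computed by dividing the gym count in each postal code by its population, which section 2 proposes to take from Statistics Denmark. The helper `gyms_per_10k` is hypothetical, and the population figures below are placeholders for illustration only, not real statistics.

```python
def gyms_per_10k(gym_counts, population):
    """Gyms per 10,000 residents for every postal code that has a population figure."""
    return {
        zip_code: 10_000 * count / population[zip_code]
        for zip_code, count in gym_counts.items()
        if population.get(zip_code)  # skip zip codes with no (or zero) population data
    }


# Placeholder inputs: these population numbers are NOT real statistics
gym_counts = {'2200': 3, '2100': 2}
population = {'2200': 60_000, '2100': 40_000}
print(gyms_per_10k(gym_counts, population))  # {'2200': 0.5, '2100': 0.5}
```

Lower values point to areas that are under-served relative to their population.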



A gym in a red (high-concentration) area is not guaranteed to underperform,
but the marketing effort will probably need to be greater to make the gym
known and to attract members away from other gyms.



Competition is already high where 3-4 gyms are clustered in one area, so
free offers, no sign-up fee, and personalised training advice and programs
could be considered as a marketing strategy.
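Clusters of 3-4 gyms can be flagged automatically by counting, for each gym, how many gyms lie within a given radius. In this sketch the radius of 1.5 km and the threshold of 3 are assumptions, and `crowded_gyms` is a hypothetical helper; distances use the haversine formula.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def crowded_gyms(coords, radius_km=1.5, threshold=3):
    """Indices of gyms with at least `threshold` gyms (counting themselves) within radius_km."""
    crowded = []
    for i, (lat1, lng1) in enumerate(coords):
        nearby = sum(haversine_km(lat1, lng1, lat2, lng2) <= radius_km for lat2, lng2 in coords)
        if nearby >= threshold:
            crowded.append(i)
    return crowded


# Gym coordinates from the section 3 table
gym_coords = [
    (55.681080, 12.576632), (55.683154, 12.552322), (55.670680, 12.556421),
    (55.697399, 12.540378), (55.705169, 12.544592), (55.696596, 12.551058),
    (55.661443, 12.610136), (55.706723, 12.587448), (55.711560, 12.560360),
    (55.705184, 12.537586),
]
print(crowded_gyms(gym_coords))
```

Gyms returned by this check sit in the most competitive areas, where the marketing measures above would matter most.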
