# Yoga Studio Corporate Raider

## Introduction / Business Problem

I have been approached by a self-described "Corporate Raider" who dreams of owning a chain of yoga studios.  Rather than build them from the ground up, he'd like to find a group of at least 5 yoga studios that he can purchase and put together into a chain.  The customer lives in Southern California and wants the yoga chain to be based in the Los Angeles area.  

The customer is willing to build his yoga empire inland if that is all he can find, but would prefer to have it closer to the coast.  The 5 yoga studios need to be relatively close to each other, preferably in the same neighborhood.  This customer is not concerned about price because he plans to make the current owners an offer they can't refuse.

## Data

The data that I will use to solve this problem will primarily come from the Foursquare API.  From Foursquare, I will need to filter results for venues in categories containing the word "Yoga" and pull the following information for each venue:

<ul>
<li>Name</li>
<li>Rating</li>
<li>Location (latitude / longitude)</li>
<li>Address</li>
</ul>

I will also pull in a zip code list of the LA area from this website:

http://www.laalmanac.com/communications/cm02_communities.php

This data will help determine which neighborhood each yoga studio is in.
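Foursquare returns a postal code with each venue's location, so the neighborhood lookup amounts to joining on ZIP code. The following is a minimal sketch of that join, not a finished implementation. The `ZIP_TO_COMMUNITY` dictionary and the helpers `normalize_zip` and `community_for` are hypothetical names, and the dictionary entries are illustrative placeholders rather than rows copied from the LA Almanac table.

```python
import math

# Hypothetical, illustrative lookup; a real one would be built from the
# LA Almanac ZIP code table referenced above.
ZIP_TO_COMMUNITY = {
    "90012": "Downtown Los Angeles",
    "90013": "Downtown Los Angeles",
    "90017": "Downtown Los Angeles",
}


def normalize_zip(value):
    """Return a 5-digit ZIP string, or None when the value is missing."""
    if value is None:
        return None
    if isinstance(value, float):
        if math.isnan(value):
            return None  # pandas represents a missing postal code as NaN
        value = int(value)  # 90013.0 -> 90013
    text = str(value).strip()
    if not text:
        return None
    # Drop any ZIP+4 suffix and restore leading zeros
    return text.split("-")[0].zfill(5)


def community_for(value, lookup=ZIP_TO_COMMUNITY):
    """Map a raw postal code to a community name, or 'Unknown'."""
    return lookup.get(normalize_zip(value), "Unknown")


print(community_for(90013.0))
print(community_for(float("nan")))
```

Normalizing first matters because a column that contains missing postal codes is stored as floats, so `90013` can appear as `90013.0`.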

## Methodology

In this section, I will explain and execute the steps of the analysis.  A key assumption is that yoga studios with lower ratings are more likely to be run by owners who are willing to sell their business to my client.  The analysis follows these high-level steps:
<ul>
    <li>Find coordinates for Los Angeles, CA</li>
    <li>Use Foursquare to pull information about every yoga studio within a 20 km radius of those coordinates</li>
    <li>Perform segmentation to classify each yoga studio into High, Medium, Low ratings</li>
    <li>Plot the yoga studios on a map, color-coding their rating category</li>
    <li>Review the map to determine a set of yoga studios that meet the business requirements</li>
</ul>

### Begin by Importing Libraries

In [1]:
import pandas as pd
import json
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated
import requests
!pip install geopy
from geopy.geocoders import Nominatim
!pip install folium

Collecting folium
  Downloading https://files.pythonhosted.org/packages/fd/a0/ccb3094026649cda4acd55bf2c3822bb8c277eb11446d13d384e5be35257/folium-0.10.1-py2.py3-none-any.whl (91kB)
     |████████████████████████████████| 92kB 8.9MB/s  eta 0:00:01
Collecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/81/6d/31c83485189a2521a75b4130f1fee5364f772a0375f81afff619004e5237/branca-0.4.0-py3-none-any.whl
Installing collected packages: branca, folium
Successfully installed branca-0.4.0 folium-0.10.1


### Find Coordinates for Los Angeles, CA

In [2]:
address = 'Los Angeles, CA'

geolocator = Nominatim(user_agent="yoga_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

34.0536909 -118.2427666


### Download Foursquare Data

In [3]:
CLIENT_ID = 'FEPR0QJEJKEQ5YCJC5V20CKVOTTJIDJZZTPM1OU5SQZP2I4X' # your Foursquare ID
CLIENT_SECRET = 'JYHQDN2N0OLF2GFKTEPJW2VVFA5OIVTD4DRKUB2TWERBTJIG' # your Foursquare Secret
VERSION = '20180604'

search_query = 'Yoga'
radius = 20000

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e9c5773b4b684001cbe50df'},
 'response': {'venues': [{'id': '4d9b5a549298b1f7d0e85538',
    'name': 'Yoga Circle Downtown',
    'location': {'address': '400 S Main St',
     'crossStreet': 'W 4th St.',
     'lat': 34.047788059871195,
     'lng': -118.24724435806274,
     'labeledLatLngs': [{'label': 'display',
       'lat': 34.047788059871195,
       'lng': -118.24724435806274},
      {'label': 'entrance', 'lat': 34.048003, 'lng': -118.247196}],
     'distance': 776,
     'postalCode': '90013',
     'cc': 'US',
     'city': 'Los Angeles',
     'state': 'CA',
     'country': 'United States',
     'formattedAddress': ['400 S Main St (W 4th St.)',
      'Los Angeles, CA 90013',
      'United States']},
    'categories': [{'id': '4bf58dd8d48988d102941735',
      'name': 'Yoga Studio',
      'pluralName': 'Yoga Studios',
      'shortName': 'Yoga Studio',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/gym_yogastudio_',
       'suff

In [4]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,categories,delivery.id,delivery.provider.icon.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.name,delivery.url,hasPerk,id,location.address,...,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d102941735', 'name': 'Y...",,,,,,,False,4d9b5a549298b1f7d0e85538,400 S Main St,...,776,"[400 S Main St (W 4th St.), Los Angeles, CA 90...","[{'label': 'display', 'lat': 34.04778805987119...",34.047788,-118.247244,90013,CA,Yoga Circle Downtown,v-1587304540,65047312.0
1,"[{'id': '4bf58dd8d48988d102941735', 'name': 'Y...",,,,,,,False,4b48a660f964a5209a5126e3,700 W 1st St,...,738,"[700 W 1st St (at Hope St), Los Angeles, CA 90...","[{'label': 'display', 'lat': 34.05597374706711...",34.055974,-118.250287,90012,CA,Bikram Yoga Downtown LA,v-1587304540,83943050.0
2,"[{'id': '4bf58dd8d48988d102941735', 'name': 'Y...",,,,,,,False,5c2047601953f3002b8ed3ba,"14622 Ventura Blvd., Ste. 2038",...,210,"[14622 Ventura Blvd., Ste. 2038, Los Angeles, ...","[{'label': 'display', 'lat': 34.05297942802767...",34.052979,-118.244884,91403,CA,Calm With Yoga,v-1587304540,
3,"[{'id': '4bf58dd8d48988d102941735', 'name': 'Y...",,,,,,,False,5ae837af6bdee6002cb3c9eb,360 E 2nd St Ste 150,...,707,"[360 E 2nd St Ste 150 (Central), Los Angeles, ...","[{'label': 'display', 'lat': 34.04778917, 'lng...",34.047789,-118.239925,90012,CA,Sweat Yoga,v-1587304540,
4,"[{'id': '4bf58dd8d48988d102941735', 'name': 'Y...",,,,,,,False,4d91f35a80d337043107a806,,...,830,"[Los Angeles, CA 90012, United States]","[{'label': 'display', 'lat': 34.05835069715267...",34.058351,-118.249799,90012,CA,DWP Yoga Classes,v-1587304540,


In [5]:
dataframe.shape

(30, 24)

In [6]:
filtered_columns = ['name','location.postalCode','location.distance','location.lat','location.lng','id']
# copy so that columns added later do not trigger SettingWithCopyWarning
dataframe_2 = dataframe.loc[:, filtered_columns].copy()

dataframe_2

Unnamed: 0,name,location.postalCode,location.distance,location.lat,location.lng,id
0,Yoga Circle Downtown,90013.0,776,34.047788,-118.247244,4d9b5a549298b1f7d0e85538
1,Bikram Yoga Downtown LA,90012.0,738,34.055974,-118.250287,4b48a660f964a5209a5126e3
2,Calm With Yoga,91403.0,210,34.052979,-118.244884,5c2047601953f3002b8ed3ba
3,Sweat Yoga,90012.0,707,34.047789,-118.239925,5ae837af6bdee6002cb3c9eb
4,DWP Yoga Classes,90012.0,830,34.058351,-118.249799,4d91f35a80d337043107a806
5,Modo Yoga LA,90036.0,9462,34.067658,-118.343974,4e66f7c9922e45a1f7e88451
6,Jivamukti Yoga,90013.0,1356,34.042692,-118.23644,5e3d8b70d30bfd0008fd73b6
7,Yoga West,90035.0,13056,34.051378,-118.384304,4b71a7ddf964a52008542de3
8,Evoke Yoga,,1390,34.043827,-118.25202,52488fb711d2b17bfa85e57b
9,CorePower Yoga,90017.0,1800,34.045945,-118.259908,5b7e71ffe57ca60039449f4a


### Map Studios

This first map simply shows where the studios are located in the LA area.

In [7]:
import folium

# create map of LA using latitude and longitude values
map_la = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(dataframe_2['location.lat'], dataframe_2['location.lng'], dataframe_2['name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_la)
    
map_la

### Pull Ratings Data from Foursquare and add to dataframe

In [12]:
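ratings = []
for v_id in dataframe_2['id']:
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(v_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    venue_result = requests.get(url).json()
    try:
        rating = venue_result['response']['venue']['rating']
    except KeyError:
        # no rating in the response (unrated venue or a failed call): default to 0
        rating = 0
    ratings.append(rating)

# assign one rating per row; assigning inside the loop would overwrite the whole column
dataframe_2['rating'] = ratings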

dataframe_2

Unnamed: 0,name,location.postalCode,location.distance,location.lat,location.lng,id,rating
0,Yoga Circle Downtown,90013.0,776,34.047788,-118.247244,4d9b5a549298b1f7d0e85538,0
1,Bikram Yoga Downtown LA,90012.0,738,34.055974,-118.250287,4b48a660f964a5209a5126e3,0
2,Calm With Yoga,91403.0,210,34.052979,-118.244884,5c2047601953f3002b8ed3ba,0
3,Sweat Yoga,90012.0,707,34.047789,-118.239925,5ae837af6bdee6002cb3c9eb,0
4,DWP Yoga Classes,90012.0,830,34.058351,-118.249799,4d91f35a80d337043107a806,0
5,Modo Yoga LA,90036.0,9462,34.067658,-118.343974,4e66f7c9922e45a1f7e88451,0
6,Jivamukti Yoga,90013.0,1356,34.042692,-118.23644,5e3d8b70d30bfd0008fd73b6,0
7,Yoga West,90035.0,13056,34.051378,-118.384304,4b71a7ddf964a52008542de3,0
8,Evoke Yoga,,1390,34.043827,-118.25202,52488fb711d2b17bfa85e57b,0
9,CorePower Yoga,90017.0,1800,34.045945,-118.259908,5b7e71ffe57ca60039449f4a,0


### Spot-check a few individual ratings

In [11]:
venue_id = '5c2047601953f3002b8ed3ba'
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
venue_result = requests.get(url).json()
    
#venue_result['response']['venue']['rating']
venue_result

{'meta': {'code': 429,
  'errorType': 'quota_exceeded',
  'errorDetail': 'Quota exceeded',
  'requestId': '5e9c576b40a7ea001b1592bc'},
 'response': {}}

It turns out that fetching venue details, which include the rating, is a "premium" call on Foursquare, and the response above shows that the quota has been exceeded, so I cannot obtain ratings.  We will have to complete the job without rating data.
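One way to keep a quota error from silently turning into a rating is to inspect the `meta` block of the response before reading the venue. The sketch below assumes the response shape shown in this notebook (`meta.code` plus `response.venue.rating`). The helper name `extract_rating` is hypothetical, and the successful response in the example is made up for illustration.

```python
def extract_rating(venue_result):
    """Return the venue rating as a float, or None when it is unavailable."""
    meta = venue_result.get("meta", {})
    if meta.get("code") != 200:
        # e.g. code 429 with errorType 'quota_exceeded'
        return None
    venue = venue_result.get("response", {}).get("venue", {})
    rating = venue.get("rating")
    return float(rating) if rating is not None else None


# The quota-exceeded response shown above
quota_error = {
    "meta": {
        "code": 429,
        "errorType": "quota_exceeded",
        "errorDetail": "Quota exceeded",
        "requestId": "5e9c576b40a7ea001b1592bc",
    },
    "response": {},
}

# Made-up successful response, for illustration only
ok_response = {"meta": {"code": 200}, "response": {"venue": {"rating": 8.1}}}

print(extract_rating(quota_error))
print(extract_rating(ok_response))
```

Returning `None` instead of 0 keeps "no data" distinct from "very low rating", which matters once ratings drive the takeover ranking.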

### Categorize Yoga Studios by Rating

If we could obtain ratings data, we would color-code each yoga studio using the following logic:
<ul>
    <li>Green - low ratings = prime target for takeover</li>
    <li>Orange - medium ratings = possible target for takeover</li>
    <li>Red - high ratings = probably not a takeover target</li>
</ul>

In [13]:
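colors = []
for r in dataframe_2['rating']:
    if r >= 7.0:
        color = 'red'
    elif r < 5.0:
        color = 'green'
    else:
        color = 'orange'
    colors.append(color)

# assign one color per row rather than overwriting the whole column on each pass
dataframe_2['color'] = colors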
    
dataframe_2.head()

Unnamed: 0,name,location.postalCode,location.distance,location.lat,location.lng,id,rating,color
0,Yoga Circle Downtown,90013,776,34.047788,-118.247244,4d9b5a549298b1f7d0e85538,0,green
1,Bikram Yoga Downtown LA,90012,738,34.055974,-118.250287,4b48a660f964a5209a5126e3,0,green
2,Calm With Yoga,91403,210,34.052979,-118.244884,5c2047601953f3002b8ed3ba,0,green
3,Sweat Yoga,90012,707,34.047789,-118.239925,5ae837af6bdee6002cb3c9eb,0,green
4,DWP Yoga Classes,90012,830,34.058351,-118.249799,4d91f35a80d337043107a806,0,green


In our case, without ratings data, every studio shows as a good target, because the rating defaults to 0 whenever a venue has none.
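A possible refinement is to treat a missing rating as its own category instead of as a rating of 0, so unrated studios are not mistaken for prime targets. The sketch below keeps the thresholds used above (7.0 and 5.0). The function name `rating_to_color`, the `'gray'` category, and the sample ratings are assumptions added for illustration.

```python
def rating_to_color(rating):
    """Map a rating to a marker color; None means no rating was available."""
    if rating is None:
        return "gray"    # unknown: no rating data
    if rating >= 7.0:
        return "red"     # high rating: probably not a takeover target
    if rating < 5.0:
        return "green"   # low rating: prime target
    return "orange"      # medium rating: possible target


# Made-up ratings, for illustration only
sample_ratings = [None, 4.2, 6.0, 8.5]
print([rating_to_color(r) for r in sample_ratings])
```

With this mapping, a run that fails to fetch any ratings would produce a map of gray markers rather than a map that looks full of green targets.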

### Map Again to look for Clusters of Green Circles

In [14]:
import folium

# create map of LA using latitude and longitude values
map_la = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label,color in zip(dataframe_2['location.lat'], dataframe_2['location.lng'], dataframe_2['name'], dataframe_2['color']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color=color,  # fill with the rating color so green clusters stand out
        fill_opacity=0.7).add_to(map_la)
    
map_la

### Results

The results from this method were inconclusive, due to data limitations.  While it is not possible to target yoga studios with low ratings, visualizing them on the map can give us a good idea of which yoga studios to target based solely on geography.
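The geographic screening can also be done numerically rather than by eye. The sketch below uses the coordinates of the ten studios listed in the table earlier in this notebook and looks for studios that have at least five studios (including themselves) within a fixed distance. The 2 km radius, the helper names, and the use of the haversine formula are assumptions made for illustration. The minimum group size of 5 comes from the customer's requirement.

```python
from math import asin, cos, radians, sin, sqrt

# (name, latitude, longitude) taken from the table shown earlier
STUDIOS = [
    ("Yoga Circle Downtown", 34.047788, -118.247244),
    ("Bikram Yoga Downtown LA", 34.055974, -118.250287),
    ("Calm With Yoga", 34.052979, -118.244884),
    ("Sweat Yoga", 34.047789, -118.239925),
    ("DWP Yoga Classes", 34.058351, -118.249799),
    ("Modo Yoga LA", 34.067658, -118.343974),
    ("Jivamukti Yoga", 34.042692, -118.23644),
    ("Yoga West", 34.051378, -118.384304),
    ("Evoke Yoga", 34.043827, -118.25202),
    ("CorePower Yoga", 34.045945, -118.259908),
]


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def candidate_groups(studios, max_km=2.0, min_size=5):
    """Return {center name: nearby studio names} for centers with enough neighbors."""
    groups = {}
    for center_name, clat, clng in studios:
        members = [
            name for name, lat, lng in studios
            if haversine_km(clat, clng, lat, lng) <= max_km
        ]
        if len(members) >= min_size:
            groups[center_name] = members
    return groups


groups = candidate_groups(STUDIOS)
for center, members in groups.items():
    print(center, len(members))
```

On these ten rows, Yoga Circle Downtown has all eight downtown studios within 2 km, while Modo Yoga LA and Yoga West do not anchor a group. Running the same function on the full set of venues would be needed to evaluate other areas.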

### Discussion

This project failed due to a lack of available data.  This was a failure in the planning stages, as I should have known that ratings data requires premium Foursquare calls and that those are limited.  It may have been possible to complete the analysis with other data sources, such as other rating systems or real estate pricing data.

### Conclusion

The customer's requirements were to find 5 yoga studios in the Los Angeles area that are relatively close to each other, preferably closer to the coast, and prime targets for being purchased.  I attempted to use ratings data from Foursquare to find yoga studios matching those requirements but with low ratings, indicating they may be willing to sell.

Unfortunately, rating information requires premium Foursquare calls, and I am limited to 2/day with a free account, so I was not able to use ratings data.  This leaves us with only the customer requirements.  I can use the map to see that there is a grouping of studios in West Hollywood and Beverly Hills that are closer to the coast than the remaining studios, and I recommend that my client start there.  If that doesn't work out, there seem to be a lot of yoga studios in the downtown LA area, and he can probably build his empire there.