### Fitness4All Business Location Prospecting Project

##### Introduction/Business Problem 

Fitness4All operates gyms in the state of New Jersey. The company is looking to expand into Pennsylvania and would like to identify a shortlist of locations (townships) with the best prospects for profitability.  

##### Data 

Fitness4All believes that the most profitable locations for expansion are townships with a population between 20,000 and 50,000 that are currently under-served by commercial gym facilities. This project will use readily available population data from Wikipedia to identify a baseline list of possible locations (https://en.wikipedia.org/wiki/List_of_populated_places_in_Pennsylvania).

It will then use Foursquare location data to explore existing gym facilities in these townships and map the competitive landscape. We will start by retrieving venue information for every township on the baseline list so that we can calculate the density of gyms relative to population size. It also seems reasonable to set a search radius for each town based on its square mileage: towns with higher populations tend to cover a larger geographic area, so when we compare gym counts with population the search for gym venues should widen accordingly. Square mileage information for PA towns can be found at https://www.indexmundi.com/facts/united-states/quick-facts/pennsylvania/land-area/cities#table. This portion of the analysis should let us winnow the initial list of about 20 towns down to a smaller focus list of potential locations. 
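
For reference, the radius derivation described above can be sketched as follows. This is a minimal illustration, assuming each town is approximated as a circle of the same land area and that the Foursquare `radius` parameter is given in metres; the function name `search_radius_m` is hypothetical and not part of the notebook.

```python
import math

METERS_PER_MILE = 1609.344


def search_radius_m(land_area_sq_miles: float) -> float:
    """Radius (in metres) of a circle whose area equals the town's land area."""
    # area = pi * r^2  =>  r = sqrt(area / pi), here in miles
    radius_miles = math.sqrt(land_area_sq_miles / math.pi)
    return radius_miles * METERS_PER_MILE


# Altoona's land area of 9.91 square miles is taken from the square-mileage table
altoona_radius = search_radius_m(9.91)
print(round(altoona_radius))
```

A real town is rarely circular, so a radius computed this way is only a rough guide to how wide a venue search should be.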

For the focus list, we can then evaluate potential cross-street sites that are i. somewhat remote from competing facilities and ii. close to complementary amenities (e.g. shops and restaurants) that make it easy to combine gym sessions with other frequent daily activities. Below is a plot of gym facilities in Altoona, PA to illustrate the Foursquare data we will be using. Altoona may or may not be a good candidate for the focus list of prospective locations; the full analysis will determine this.

Update note: In the course of completing the analysis I realized that determining township square mileage and deriving a search radius from it was not really necessary, because i. Foursquare automatically calibrates the search area if a fixed radius value is not specified, and ii. the scope of the venue searches had to be artificially constrained to limit the volume of data retrieved. The code that combines township information with square mileage and derives the radius was left in, but is commented out in some places.

In [1]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# transforming json file into a pandas dataframe library
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library
print('Folium installed')

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Folium installed


In [2]:
# Placeholder for Foursquare credentials


In [3]:
address = 'Altoona,PA'
geolocator = Nominatim(user_agent="banana")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

40.518681 -78.394736


In [4]:
search_query = 'Gym'
radius = 10000
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
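
The search URL above is assembled with `str.format`, which leaves any URL-encoding to the caller. The sketch below shows one alternative using only the standard library, so that queries containing spaces are encoded automatically. The helper name `build_search_url` and the credential, version and limit values are placeholders for illustration, not the values used in this notebook.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, version, lat, lng, query, radius, limit):
    """Build a venue-search URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    # safe=',' keeps the comma in the lat/lng pair readable
    return f"{SEARCH_ENDPOINT}?{urlencode(params, safe=',')}"


# placeholder credentials, version date and limit (assumptions for the example)
url = build_search_url("YOUR_ID", "YOUR_SECRET", "20180605",
                       40.518681, -78.394736, "Gym", 10000, 30)
print(url)
```

Keeping the parameters in a dictionary also makes it easier to vary a single value, such as `radius`, from one town to the next.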

In [5]:
# code to create a cleaned up df of gym venues
results = requests.get(url).json()
venues = results['response']['venues']
dataframe = json_normalize(venues)
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id
0,St Marys Gym,Basketball Court,,US,Altoona,United States,,1086,"[Altoona, PA, United States]","[{'label': 'display', 'lat': 40.50918114323374...",40.509181,-78.397668,,PA,524850bb498e9a683978460e
1,AAHS Fieldhouse,Sports Club,1415 6th Ave,US,Altoona,United States,btwn 13th & 14th St.,900,"[1415 6th Ave (btwn 13th & 14th St.), Altoona,...","[{'label': 'display', 'lat': 40.51077335430656...",40.510773,-78.396994,16602.0,PA,4b8d9d28f964a5204a0433e3
2,Gym At Homewood Suites,Gym,Poydras,US,Hollidaysburg,United States,Barrone,9626,"[Poydras (Barrone), Hollidaysburg, PA 16648, U...","[{'label': 'display', 'lat': 40.435085, 'lng':...",40.435085,-78.365639,16648.0,PA,4f20b8cae4b0467cd70d9403
3,Not-So-Average Joe's Gym,Gym,Woodview Drive,US,Hollidaysburg,United States,,7899,"[Woodview Drive, Hollidaysburg, PA 16648, Unit...","[{'label': 'display', 'lat': 40.484025, 'lng':...",40.484025,-78.313299,16648.0,PA,5017d10ee4b015cce931e8ae
4,Gymboree,Kids Store,Logan Valley Mall,US,Altoona,United States,2nd Floor,5682,"[Logan Valley Mall (2nd Floor), Altoona, PA 16...","[{'label': 'display', 'lat': 40.46924120905203...",40.469241,-78.411452,16602.0,PA,4c2b834ef7acef3bde4fed0c
5,Gemini Gymnastics,Gymnastics Gym,1885 E Pleasant Valley Blvd,US,Altoona,United States,,5636,"[1885 E Pleasant Valley Blvd, Altoona, PA 1660...","[{'label': 'display', 'lat': 40.55230352254357...",40.552304,-78.344922,16602.0,PA,4bd6014c6798ef3b6265648d
6,Uzelac Gymnastics,Gymnastics Gym,3519 Rte. 764,US,Duncansville,United States,,6617,"[3519 Rte. 764, Duncansville, PA 16635, United...","[{'label': 'display', 'lat': 40.46469481538818...",40.464695,-78.42747,16635.0,PA,4e49953cd164a7c8b69c2de0


In [6]:
# the query finds some non-gym venues with the str gym in the name or address, 
# so we'll maintain a list of valid 'categories' to include in the results
myList=['Gym','Gymnastics Gym','Sports Club','College Gym','Gym / Fitness Center']

gym_list=pd.DataFrame({'categories':myList}).merge(dataframe_filtered)
gym_list

Unnamed: 0,categories,name,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id
0,Gym,Gym At Homewood Suites,Poydras,US,Hollidaysburg,United States,Barrone,9626,"[Poydras (Barrone), Hollidaysburg, PA 16648, U...","[{'label': 'display', 'lat': 40.435085, 'lng':...",40.435085,-78.365639,16648,PA,4f20b8cae4b0467cd70d9403
1,Gym,Not-So-Average Joe's Gym,Woodview Drive,US,Hollidaysburg,United States,,7899,"[Woodview Drive, Hollidaysburg, PA 16648, Unit...","[{'label': 'display', 'lat': 40.484025, 'lng':...",40.484025,-78.313299,16648,PA,5017d10ee4b015cce931e8ae
2,Gymnastics Gym,Gemini Gymnastics,1885 E Pleasant Valley Blvd,US,Altoona,United States,,5636,"[1885 E Pleasant Valley Blvd, Altoona, PA 1660...","[{'label': 'display', 'lat': 40.55230352254357...",40.552304,-78.344922,16602,PA,4bd6014c6798ef3b6265648d
3,Gymnastics Gym,Uzelac Gymnastics,3519 Rte. 764,US,Duncansville,United States,,6617,"[3519 Rte. 764, Duncansville, PA 16635, United...","[{'label': 'display', 'lat': 40.46469481538818...",40.464695,-78.42747,16635,PA,4e49953cd164a7c8b69c2de0
4,Sports Club,AAHS Fieldhouse,1415 6th Ave,US,Altoona,United States,btwn 13th & 14th St.,900,"[1415 6th Ave (btwn 13th & 14th St.), Altoona,...","[{'label': 'display', 'lat': 40.51077335430656...",40.510773,-78.396994,16602,PA,4b8d9d28f964a5204a0433e3
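
The merge against a one-column frame of valid categories works, but it reorders the rows to follow the category list. A minimal sketch of an alternative is boolean filtering with `Series.isin`, which keeps the original row order. The small `venues` frame below reuses a few rows from the Altoona output above purely as sample data.

```python
import pandas as pd

# a few rows from the Altoona results, used as sample data
venues = pd.DataFrame({
    "name": ["St Marys Gym", "AAHS Fieldhouse", "Gymboree", "Gemini Gymnastics"],
    "categories": ["Basketball Court", "Sports Club", "Kids Store", "Gymnastics Gym"],
})

valid_categories = ["Gym", "Gymnastics Gym", "Sports Club", "College Gym", "Gym / Fitness Center"]

# keep only rows whose category is in the list of valid gym categories
gyms_only = venues[venues["categories"].isin(valid_categories)].reset_index(drop=True)
print(gyms_only)
```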


In [7]:
# display gym venues on a map of Altoona
gym_map = folium.Map(location=[latitude, longitude], zoom_start=12) # generate map centred around the co-ordinates for Altoona, PA

# add a red circle marker to represent the centre of Altoona
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Altoona',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(gym_map)

# add the gyms as blue circle markers
for lat, lng, label in zip(gym_list.lat, gym_list.lng, gym_list.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(gym_map)

# display map
gym_map

## Let's start the full analysis

In [8]:
# Create a df for towns in PA that meet our filter criteria
df=pd.read_html('https://en.wikipedia.org/wiki/List_of_populated_places_in_Pennsylvania')[0]
population=df[['Place Name','Population (2010 census)']]
population.columns=['Town','Population']
towns=population.query('Population>20000 & Population<50000')
towns.reset_index(inplace=True,drop=True)
towns.head()

Unnamed: 0,Town,Population
0,Harrisburg,49528
1,Altoona,46320
2,York,43718
3,State College,42034
4,Wilkes-Barre,41498


In [9]:
towns.tail()

Unnamed: 0,Town,Population
17,Allison Park,21552
18,Johnstown,20978
19,West Mifflin,20313
20,Chambersburg,20268
21,Murrysville,20079


In [10]:
# Retrieve square mileage information
square_miles=pd.read_html('https://www.indexmundi.com/facts/united-states/quick-facts/pennsylvania/land-area/cities#table')[1]
square_miles.columns=['Town','Sq_Miles']
#square_miles['Radius']=square_miles.apply(lambda row: ((row.Sq_Miles)/3.14)**(0.5),axis=1)
square_miles.head()

Unnamed: 0,Town,Sq_Miles
0,Aliquippa,4.19
1,Allentown,17.55
2,Allison Park,13.84
3,Altoona,9.91
4,Ambler,0.85


In [11]:
# Add square miles information, drop rows where this is NA ('Wilkes-Barre')
towns = pd.merge(towns,square_miles,on='Town',how='left')
towns.dropna(subset=['Sq_Miles'],axis=0,inplace=True)
towns.reset_index(inplace=True,drop=True)
towns

Unnamed: 0,Town,Population,Sq_Miles
0,Harrisburg,49528,8.13
1,Altoona,46320,9.91
2,York,43718,5.29
3,State College,42034,4.56
4,Norristown,34324,3.52
5,Chester,33972,4.84
6,Bethel Park,32313,11.67
7,Williamsport,29381,8.73
8,Monroeville,28386,19.74
9,Drexel Hill,28043,3.19


In [12]:
# create column with extended address for co-ordinates search 
state="PA"
towns['State']=state
towns["Full_address"]=towns['Town'].str.cat(towns['State'],sep = ", ")
towns.head()

Unnamed: 0,Town,Population,Sq_Miles,State,Full_address
0,Harrisburg,49528,8.13,PA,"Harrisburg, PA"
1,Altoona,46320,9.91,PA,"Altoona, PA"
2,York,43718,5.29,PA,"York, PA"
3,State College,42034,4.56,PA,"State College, PA"
4,Norristown,34324,3.52,PA,"Norristown, PA"


In [13]:
towns.shape

(21, 5)

In [14]:
# find co-ordinates for towns
geolocator = Nominatim(user_agent="banana")  # create the geocoder once, outside the loop

rows = []
for full_address in towns['Full_address']:
    location = geolocator.geocode(full_address)
    latitude = location.latitude
    longitude = location.longitude
    rows.append({'Full_address': full_address,
                 'Latitude': latitude,
                 'Longitude': longitude})

# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows
coordinates = pd.DataFrame(rows, columns=['Full_address','Latitude','Longitude'])
coordinates

Unnamed: 0,Full_address,Latitude,Longitude
0,"Harrisburg, PA",40.266311,-76.886112
1,"Altoona, PA",40.518681,-78.394736
2,"York, PA",39.90675,-76.700895
3,"State College, PA",40.79445,-77.861639
4,"Norristown, PA",40.121497,-75.339905
5,"Chester, PA",39.982931,-75.765242
6,"Bethel Park, PA",40.32757,-80.039498
7,"Williamsport, PA",41.249329,-77.002767
8,"Monroeville, PA",40.42118,-79.788102
9,"Drexel Hill, PA",39.947057,-75.29213
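
The geocoding loop assumes every address is found. Nominatim's `geocode` returns `None` when there is no match, in which case reading `.latitude` would fail. The sketch below shows one way to tolerate that; the helper `geocode_towns`, the `Location` tuple and the stand-in `fake_geocode` are hypothetical names used so the example runs without a network call.

```python
from collections import namedtuple

# stand-in for the object returned by a geocoder (assumption for this example)
Location = namedtuple("Location", ["latitude", "longitude"])


def geocode_towns(addresses, geocode):
    """Return one row per address; unmatched addresses get empty coordinates."""
    rows = []
    for address in addresses:
        location = geocode(address)
        if location is None:
            # no match: keep the row so the gap is visible later
            rows.append({"Full_address": address, "Latitude": None, "Longitude": None})
        else:
            rows.append({"Full_address": address,
                         "Latitude": location.latitude,
                         "Longitude": location.longitude})
    return rows


def fake_geocode(address):
    """Offline stand-in: knows one address, returns None for anything else."""
    known = {"Altoona, PA": Location(40.518681, -78.394736)}
    return known.get(address)


rows = geocode_towns(["Altoona, PA", "Nowhere, PA"], fake_geocode)
print(rows)
```

In the notebook, the `geocode` argument would be the `geolocator.geocode` method, and the returned list can be passed straight to `pd.DataFrame`.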


In [15]:
# merge co-ordinates information into main towns df
towns = pd.merge(towns,coordinates,on='Full_address',how='left')
towns.head()

Unnamed: 0,Town,Population,Sq_Miles,State,Full_address,Latitude,Longitude
0,Harrisburg,49528,8.13,PA,"Harrisburg, PA",40.266311,-76.886112
1,Altoona,46320,9.91,PA,"Altoona, PA",40.518681,-78.394736
2,York,43718,5.29,PA,"York, PA",39.90675,-76.700895
3,State College,42034,4.56,PA,"State College, PA",40.79445,-77.861639
4,Norristown,34324,3.52,PA,"Norristown, PA",40.121497,-75.339905


In [16]:
# check rows as expected
towns.shape

(21, 7)

In [17]:
#display possible locations for a new gym on a map of PA using the 'towns' df
possible_locations = folium.Map(location=[latitude, longitude], zoom_start=7) # generate map centred on the last town geocoded above (latitude/longitude were overwritten in that loop)

# add the towns as blue circle markers
for lat, lng, label in zip(towns.Latitude, towns.Longitude, towns.Town):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(possible_locations)

# display map
possible_locations

#### Reviewing the above map (and wanting to reduce the volume of data for analysis to something more manageable), I decided to stay out of major metropolitan areas, e.g. the Pittsburgh suburbs, where rents are likely to be higher...

In [18]:
#use .isin to drop rows for towns in designated list
exclude_list=['Allison Park','Bethel Park','West Mifflin','Monroeville','Plum','Murrysville','Drexel Hill','Harrisburg']
towns=towns[~towns['Town'].isin(exclude_list)]
towns.reset_index(inplace=True,drop=True)
towns

Unnamed: 0,Town,Population,Sq_Miles,State,Full_address,Latitude,Longitude
0,Altoona,46320,9.91,PA,"Altoona, PA",40.518681,-78.394736
1,York,43718,5.29,PA,"York, PA",39.90675,-76.700895
2,State College,42034,4.56,PA,"State College, PA",40.79445,-77.861639
3,Norristown,34324,3.52,PA,"Norristown, PA",40.121497,-75.339905
4,Chester,33972,4.84,PA,"Chester, PA",39.982931,-75.765242
5,Williamsport,29381,8.73,PA,"Williamsport, PA",41.249329,-77.002767
6,Easton,26800,4.07,PA,"Easton, PA",40.691608,-75.209987
7,Lebanon,25477,4.17,PA,"Lebanon, PA",40.375713,-76.462612
8,Hazleton,25340,6.01,PA,"Hazleton, PA",40.964687,-75.985279
9,New Castle,23273,8.31,PA,"New Castle, PA",40.99992,-80.347186


In [19]:
#redraw the revised map of prospective locations
possible_locations = folium.Map(location=[latitude, longitude], zoom_start=7)

# add the towns as blue circle markers
for lat, lng, label in zip(towns.Latitude, towns.Longitude, towns.Town):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(possible_locations)

# display map
possible_locations


In [20]:
#get Foursquare data on existing gym venue locations in focus list of prospective towns

def getNearbyGymVenues(names, latitudes, longitudes,LIMIT=3,search_query='Gym'):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name,lat,lng)          
        
        # create the API request URL
        url='https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&query={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET,
            VERSION, 
            lat, 
            lng,
            search_query,
            radius,
            LIMIT)
        #print(url)
            
        # make the GET request
        results = requests.get(url).json()['response']['venues']
        #print(results)
        
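        #build list with all data needed
        #(a venue can have an empty 'categories' list, so guard the [0] lookup)
        venues_list.append([(
            name, 
            lat, 
            lng,
            v['id'],
            v['name'], 
            v['location']['lat'], 
            v['location']['lng'],  
            v['categories'][0]['name'] if v['categories'] else None) for v in results])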
        #print(venues_list)
 
    PA_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    PA_venues.columns = ['Town', 
                  'TownLatitude', 
                  'TownLongitude', 
                  'VenueID',
                  'Venue', 
                  'VenueLat', 
                  'VenueLong', 
                  'VenueCat']
    return(PA_venues)

In [21]:
PA_possible_gyms = getNearbyGymVenues(names=towns['Full_address'],latitudes=towns['Latitude'],longitudes=towns['Longitude'])

#Note: this call initially ran through the loop for some of the towns in the focus list, then produced an
#'index out of range' error message. Limiting the size of each call to Foursquare made the error go away.
#According to the Foursquare documentation, a search query returns at most 50 venues at a time, but exceeding
#that limit should simply return fewer results rather than raise an error, so the limit itself is probably
#not the cause. A more likely cause is a venue returned with an empty 'categories' list: looking up
#categories[0] on such a venue raises exactly this kind of error, and a small LIMIT just makes it less
#likely that such a venue is among the results. Any comments on this appreciated.

In [22]:
PA_possible_gyms.head()

Unnamed: 0,Town,TownLatitude,TownLongitude,VenueID,Venue,VenueLat,VenueLong,VenueCat
0,"Altoona, PA",40.518681,-78.394736,524850bb498e9a683978460e,St Marys Gym,40.509181,-78.397668,Basketball Court
1,"Altoona, PA",40.518681,-78.394736,4b8d9d28f964a5204a0433e3,AAHS Fieldhouse,40.510773,-78.396994,Sports Club
2,"Altoona, PA",40.518681,-78.394736,4f20b8cae4b0467cd70d9403,Gym At Homewood Suites,40.435085,-78.365639,Gym
3,"York, PA",39.90675,-76.700895,4bbfc7072a89ef3be694ef88,Gold's Gym,39.936469,-76.681822,Gym
4,"York, PA",39.90675,-76.700895,5b2aeeccefa82a002c3e009b,Gold’s Gym,39.939679,-76.691803,Gym / Fitness Center


In [23]:
#eliminate non-gym venues with the str gym in the name or address
myList=['Gym','Gymnastics Gym','Sports Club','College Gym','Gym / Fitness Center']

PA_gyms=pd.DataFrame({'VenueCat':myList}).merge(PA_possible_gyms)
PA_gyms.head()

Unnamed: 0,VenueCat,Town,TownLatitude,TownLongitude,VenueID,Venue,VenueLat,VenueLong
0,Gym,"Altoona, PA",40.518681,-78.394736,4f20b8cae4b0467cd70d9403,Gym At Homewood Suites,40.435085,-78.365639
1,Gym,"York, PA",39.90675,-76.700895,4bbfc7072a89ef3be694ef88,Gold's Gym,39.936469,-76.681822
2,Gym,"York, PA",39.90675,-76.700895,4f5f3d14e4b040463d8565c6,Gym,39.984843,-76.645341
3,Gym,"Norristown, PA",40.121497,-75.339905,4e43140762e1a67a6e595c28,The Gym,40.114706,-75.335801
4,Gym,"Norristown, PA",40.121497,-75.339905,4e2d67f262e144b5d3c01fce,24hour Fitness Gym,40.114715,-75.335862


In [24]:
#plot focus towns and associated gym locations
address = 'Pennsylvania'
geolocator = Nominatim(user_agent="banana")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

40.9699889 -77.7278831


In [25]:
#display gym venues on a map of PA
gym_map = folium.Map(location=[latitude, longitude], zoom_start=7) # generate map centred around the co-ordinates for Pennsylvania

# add a red circle marker to represent the centre of each town
for lat, lng, label in zip(PA_gyms.TownLatitude, PA_gyms.TownLongitude, PA_gyms.Town):
    folium.features.CircleMarker(
        [lat, lng],
        radius=3,
        color='red',
        popup=label,
        fill = True,
        fill_color = 'red',
        fill_opacity = 0.6
    ).add_to(gym_map)
    
# add the gyms as blue circle markers
for lat, lng, label in zip(PA_gyms.VenueLat, PA_gyms.VenueLong, PA_gyms.Venue):
    folium.features.CircleMarker(
        [lat, lng],
        radius=2,
        color='blue',
        #popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(gym_map)

# display map
gym_map

#### Unfortunately, every one of my focus towns already has at least one gym nearby (a town with none was too much to hope for!). So let's look at the number of gyms in relation to town population

In [26]:
#count gym venues retrieved by town
gym_summary=PA_gyms.groupby('Town').count()
gym_summary

Unnamed: 0_level_0,VenueCat,TownLatitude,TownLongitude,VenueID,Venue,VenueLat,VenueLong
Town,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
"Altoona, PA",2,2,2,2,2,2,2
"Chester, PA",2,2,2,2,2,2,2
"Easton, PA",3,3,3,3,3,3,3
"Hazleton, PA",3,3,3,3,3,3,3
"Johnstown, PA",3,3,3,3,3,3,3
"Lebanon, PA",3,3,3,3,3,3,3
"New Castle, PA",1,1,1,1,1,1,1
"Norristown, PA",2,2,2,2,2,2,2
"Pottstown, PA",2,2,2,2,2,2,2
"State College, PA",1,1,1,1,1,1,1


In [27]:
town_pop=towns[['Full_address','Population']]
town_pop.head()

Unnamed: 0,Full_address,Population
0,"Altoona, PA",46320
1,"York, PA",43718
2,"State College, PA",42034
3,"Norristown, PA",34324
4,"Chester, PA",33972


In [28]:
#create df combining number of gyms with population for each focus town
gym_perhead=gym_summary[['VenueID']].merge(town_pop,left_on='Town',right_on='Full_address',how='left')
gym_perhead

Unnamed: 0,VenueID,Full_address,Population
0,2,"Altoona, PA",46320
1,2,"Chester, PA",33972
2,3,"Easton, PA",26800
3,3,"Hazleton, PA",25340
4,3,"Johnstown, PA",20978
5,3,"Lebanon, PA",25477
6,1,"New Castle, PA",23273
7,2,"Norristown, PA",34324
8,2,"Pottstown, PA",22377
9,1,"State College, PA",42034


In [29]:
#calculate population per existing gym
gym_perhead['Heads_per_gym']=gym_perhead.apply(lambda row: (row.Population/row.VenueID),axis=1).round()
gym_perhead_sorted=gym_perhead.sort_values(by='Heads_per_gym',ascending=False)
gym_perhead_sorted

Unnamed: 0,VenueID,Full_address,Population,Heads_per_gym
9,1,"State College, PA",42034,42034.0
6,1,"New Castle, PA",23273,23273.0
0,2,"Altoona, PA",46320,23160.0
7,2,"Norristown, PA",34324,17162.0
1,2,"Chester, PA",33972,16986.0
10,2,"Williamsport, PA",29381,14690.0
11,3,"York, PA",43718,14573.0
8,2,"Pottstown, PA",22377,11188.0
2,3,"Easton, PA",26800,8933.0
5,3,"Lebanon, PA",25477,8492.0


#### New Castle looks relatively under-served. (State College tops the list, but may be less favorable because a large share of its population is presumably affiliated with Penn State and has access to its subsidized athletic facilities.) Let's look at New Castle and decide where to site our new business by looking at the competing gyms and the location of other amenities. In particular, I read that there's a strong correlation between people who belong to gyms and people who frequent coffee shops, so I'm going to look at venues in both categories

In [30]:
address = 'New Castle,PA'
geolocator = Nominatim(user_agent="banana")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

40.9999202 -80.3471856


In [31]:
search_query='gym,coffee'
LIMIT=50
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)

In [32]:
#code to create a cleaned up df of New Castle gym and coffee venues
results = requests.get(url).json()
venues = results['response']['venues']
dataframe = json_normalize(venues)
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
NewCastle_filtered = dataframe.loc[:, filtered_columns]

def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

NewCastle_filtered['categories'] = NewCastle_filtered.apply(get_category_type, axis=1)
NewCastle_filtered.columns = [column.split('.')[-1] for column in NewCastle_filtered.columns]
NewCastle_filtered

Unnamed: 0,name,categories,address,cc,city,country,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id
0,Two Rivers Artisan Coffee,Coffee Shop,11 S Mill St,US,New Castle,United States,255,"[11 S Mill St, New Castle, PA 16101, United St...","[{'label': 'display', 'lat': 40.99935292977494...",40.999353,-80.344234,16101.0,PA,523c700f11d2ce2b487df78f
1,Gym,Casino,,US,,United States,809,"[Pennsylvania, United States]","[{'label': 'display', 'lat': 41.004532, 'lng':...",41.004532,-80.339738,,Pennsylvania,4f577668e4b084fcb1f2f4d6
2,Iron Works Gym,College Gym,,US,New Castle,United States,2049,"[New Castle, PA, United States]","[{'label': 'display', 'lat': 41.007294, 'lng':...",41.007294,-80.324825,,PA,514bab89e4b0cda38e870121
3,Yoki's Italian Cafe,Italian Restaurant,1402 E Washington St,US,New Castle,United States,2440,"[1402 E Washington St, New Castle, PA 16101, U...","[{'label': 'display', 'lat': 40.984926, 'lng':...",40.984926,-80.325989,16101.0,PA,4e977a979adf6a4ff09cf861
4,Steamers,Café,,US,New Castle,United States,3270,"[New Castle, PA 16105, United States]","[{'label': 'display', 'lat': 41.0284529671706,...",41.028453,-80.337904,16105.0,PA,4cab636814c33704f03eea3b
5,Off Limits Gymnastics And Cheer,Gymnastics Gym,,US,Neshannock,United States,8938,"[Neshannock, PA, United States]","[{'label': 'display', 'lat': 41.07908134529420...",41.079081,-80.365009,,PA,4e7206381f6ecfe82c111ea0
6,Mohawk Coffee House,Coffee Shop,E Poland Avenue,US,Bessemer,United States,12441,"[E Poland Avenue, Bessemer, PA 16112, United S...","[{'label': 'display', 'lat': 40.97497399999999...",40.974974,-80.491516,16112.0,PA,51f18ecb2fc6db494fa19917


In [33]:
#filter out non-gym venues with the str gym in the name or address, and some venues that aren't really coffee shops
myList=['Gym','Gymnastics Gym','Sports Club','College Gym','Gym / Fitness Center','Coffee Shop','Café']

NewCastle=pd.DataFrame({'categories':myList}).merge(NewCastle_filtered)
NewCastle

Unnamed: 0,categories,name,address,cc,city,country,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id
0,Gymnastics Gym,Off Limits Gymnastics And Cheer,,US,Neshannock,United States,8938,"[Neshannock, PA, United States]","[{'label': 'display', 'lat': 41.07908134529420...",41.079081,-80.365009,,PA,4e7206381f6ecfe82c111ea0
1,College Gym,Iron Works Gym,,US,New Castle,United States,2049,"[New Castle, PA, United States]","[{'label': 'display', 'lat': 41.007294, 'lng':...",41.007294,-80.324825,,PA,514bab89e4b0cda38e870121
2,Coffee Shop,Two Rivers Artisan Coffee,11 S Mill St,US,New Castle,United States,255,"[11 S Mill St, New Castle, PA 16101, United St...","[{'label': 'display', 'lat': 40.99935292977494...",40.999353,-80.344234,16101.0,PA,523c700f11d2ce2b487df78f
3,Coffee Shop,Mohawk Coffee House,E Poland Avenue,US,Bessemer,United States,12441,"[E Poland Avenue, Bessemer, PA 16112, United S...","[{'label': 'display', 'lat': 40.97497399999999...",40.974974,-80.491516,16112.0,PA,51f18ecb2fc6db494fa19917
4,Café,Steamers,,US,New Castle,United States,3270,"[New Castle, PA 16105, United States]","[{'label': 'display', 'lat': 41.0284529671706,...",41.028453,-80.337904,16105.0,PA,4cab636814c33704f03eea3b


In [34]:
# generate map centred around the co-ordinates for New Castle, PA
NC_map = folium.Map(location=[latitude, longitude], zoom_start=12) 

# add a red circle marker to represent the centre of New Castle
folium.features.CircleMarker(
    [latitude, longitude],
    radius=2,
    color='red',
    popup='New Castle',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(NC_map)

# add the gyms and coffee shops as blue circle markers
for lat, lng, label in zip(NewCastle.lat, NewCastle.lng, NewCastle.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=3,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(NC_map)

# display map
NC_map

### OK, so the only existing gym in New Castle itself is some way from the town center, about 2 km (and it is a College Gym, not a commercial gym). There's an artisanal coffee shop in the center of town, so I'll look to open a gym location close by!
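
To put numbers on "remote from competitors, close to amenities", a candidate site can be compared with nearby venues by great-circle distance. The sketch below uses the haversine formula with the New Castle coordinates from the tables above; the function name `haversine_m` and the choice of a spherical Earth with a 6,371 km radius are assumptions of this example.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# coordinates taken from the New Castle output above
town_centre = (40.9999202, -80.3471856)
coffee_shop = (40.999353, -80.344234)      # Two Rivers Artisan Coffee
iron_works_gym = (41.007294, -80.324825)   # Iron Works Gym

centre_to_gym = haversine_m(*town_centre, *iron_works_gym)
coffee_to_gym = haversine_m(*coffee_shop, *iron_works_gym)
print(round(centre_to_gym), round(coffee_to_gym))
```

The same function could be applied to each candidate cross-street to rank sites by their distance to the nearest gym and the nearest coffee shop.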