# Market Opportunities

In this project we use FourSquare data to find market opportunities for a vegetarian/vegan restaurant in Madrid, Spain. The code is written to be general, so it can be applied to any city in the world and to other types of business as well.

## Collecting Data

In this first part we gather the data to be used. FourSquare provides all sorts of venues, but it limits how much data a basic account can fetch in each request. We work around this by making multiple requests around the given city and then combining all the information into one big dataset. Let's start by importing the necessary libraries.

In [1]:
import pandas as pd
import requests
import math
import folium
import numpy as np
import branca.colormap as cm

### Creating a grid over the city

Let's create a square grid centred on a given point. The grid contains the coordinates that will be used to fetch the data.

These are the coordinates of central Madrid, but they can be changed to any other place.

In [2]:
lat_searched=40.41683
lon_searched=-3.7038

Now we can determine how much distance we want to cover between points (in kilometres):

In [3]:
distance_btw_points=0.5

And finally, the number of points along each side of the square:

In [4]:
total_points=15

The size of each side of the square will be (in km):

In [5]:
total_points*distance_btw_points

7.5

The code below calculates this grid and puts it into a dataframe. Each area searched is also numbered.

In [6]:
r_lat=40000 # approximate circumference of the Earth along a meridian (km)
r_lon=r_lat*math.cos(math.radians(lat_searched))  # circumference of the parallel at the given latitude (km)
deg_lat=r_lat/360 # distance for each degree latitude (km)
deg_lon=r_lon/360 # distance for each degree longitude (km)
size=distance_btw_points*total_points # size of square in km
coordinates=[] # a list to store the results from the iteration
area_num=0  # area number starting with 0
lat_init=lat_searched-(size/2)/deg_lat # the given point is the centre of the square,
lon_init=lon_searched-(size/2)/deg_lon # so the starting corner needs to be calculated for the iteration
# np.linspace includes both endpoints, so consecutive points are size/(total_points-1) km apart
for i in np.linspace(lat_init,lat_init+(size/deg_lat),total_points):
    for j in np.linspace(lon_init,lon_init+(size/deg_lon),total_points):
        coordinates.append([area_num,i,j])
        area_num=area_num+1

coordinates=pd.DataFrame(coordinates,columns=['area_num','lat','lon'])
coordinates.head()

|   | area_num | lat | lon |
|---|---|---|---|
| 0 | 0 | 40.38308 | -3.748129 |
| 1 | 1 | 40.38308 | -3.741797 |
| 2 | 2 | 40.38308 | -3.735464 |
| 3 | 3 | 40.38308 | -3.729131 |
| 4 | 4 | 40.38308 | -3.722798 |
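Because `np.linspace` includes both endpoints, the grid above spaces its points `size/(total_points-1)` km apart (about 0.536 km here) rather than exactly `distance_btw_points`. The following is a minimal sketch of a variant that keeps the spacing exact; the function name `make_grid` and its parameters are illustrative assumptions, not part of the original notebook, and the grid it builds spans 7.0 km per side instead of 7.5 km.

```python
import math

EARTH_CIRCUMFERENCE_KM = 40000  # same round figure the notebook uses


def make_grid(lat_center, lon_center, spacing_km, points_per_side):
    """Return (area_num, lat, lon) tuples for a square grid centred on a point."""
    km_per_deg_lat = EARTH_CIRCUMFERENCE_KM / 360
    km_per_deg_lon = km_per_deg_lat * math.cos(math.radians(lat_center))
    # Offsets run from -half to +half so neighbours are exactly spacing_km apart
    half = (points_per_side - 1) / 2
    grid = []
    area_num = 0
    for i in range(points_per_side):
        lat = lat_center + (i - half) * spacing_km / km_per_deg_lat
        for j in range(points_per_side):
            lon = lon_center + (j - half) * spacing_km / km_per_deg_lon
            grid.append((area_num, lat, lon))
            area_num += 1
    return grid


grid = make_grid(40.41683, -3.7038, 0.5, 15)
print(len(grid))   # 225
print(grid[112])   # the middle point is the centre that was passed in
```

The list of tuples can be turned into the same dataframe with `pd.DataFrame(grid, columns=['area_num','lat','lon'])`.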


The dataframe contains the following number of areas to be searched:

In [7]:
coordinates.shape[0]

225

Let's take a look at this grid over the city using Folium maps:

In [8]:
_map=folium.Map(location=[lat_searched, lon_searched])
sw = coordinates[['lat', 'lon']].min().values.tolist()
ne = coordinates[['lat', 'lon']].max().values.tolist()
_map.fit_bounds([sw, ne])

# label each grid point with its area number
for area,lat,lon in zip(coordinates['area_num'],coordinates['lat'],coordinates['lon']):
    folium.map.Marker(
        [lat, lon],
        icon=folium.DivIcon(
            icon_size=(50,50),
            icon_anchor=(8,8),
            html='<div style="color:black"><b>'+str(int(area))+'</b></div>',
            )
        ).add_to(_map)
_map

## Area data

Now we can proceed to collect data around each area using the FourSquare API:

In [9]:
# replace the placeholders with your own FourSquare credentials (never publish real keys)
ACCESS_TOKEN='YOUR_ACCESS_TOKEN'
CLIENT_ID='YOUR_CLIENT_ID'
CLIENT_SECRET='YOUR_CLIENT_SECRET'
LIMIT=200
VERSION='20191220' # chosen date

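The search radius around each coordinate point depends on how far apart the grid points are. 
We start from half the distance between points and, to reduce the gaps left between 
the resulting circles, divide this value by 0.85. Note that 0.85 is not sin 45° (which is about 0.707): 
for points 0.5 km apart, dividing by sin 45° instead would give about 354 m, the radius needed 
to reach the cell corners, so with 0.85 small gaps remain there.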

In [10]:
radius =1000*(distance_btw_points/2)/0.85 # radius in meters
radius

294.11764705882354
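For comparison, here is a small sketch of the radius that would leave no gaps on a square grid. The point farthest from every grid point is the middle of four neighbouring points, at half the cell diagonal. The function name `covering_radius_m` is an assumption made for this illustration.

```python
import math


def covering_radius_m(spacing_km):
    """Smallest search radius (metres) so circles on a square grid leave no gaps."""
    half_spacing_m = 1000 * spacing_km / 2
    # Half the cell diagonal: (spacing / 2) / sin(45 degrees)
    return half_spacing_m / math.sin(math.radians(45))


print(round(covering_radius_m(0.5), 1))  # 353.6
```

A larger radius means more overlap between neighbouring searches, so more duplicate venues have to be removed afterwards.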

In [11]:
dataframe=pd.DataFrame() # creating an empty dataframe to store all the results

In [12]:
for area, lat, lon in zip(coordinates['area_num'],coordinates['lat'],coordinates['lon']):
    print('searching area: ', area)
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(
        CLIENT_ID, CLIENT_SECRET, lat, lon, VERSION, radius, LIMIT)
    results = requests.get(url).json()
    try:
        items = results['response']['groups'][0]['items'] # getting the relevant part
        items_df = pd.json_normalize(items)
        items_df['area_num']=area # assigning the area number to the search
        dataframe=pd.concat([dataframe,items_df])
    except (KeyError, IndexError): # the response did not have the expected structure
        print('area with problem: ', area)

searching area:  0
searching area:  1
searching area:  2
searching area:  3
searching area:  4
searching area:  5
searching area:  6
searching area:  7
searching area:  8
searching area:  9
searching area:  10
searching area:  11
searching area:  12
searching area:  13
searching area:  14
searching area:  15
searching area:  16
searching area:  17
searching area:  18
searching area:  19
searching area:  20
searching area:  21
searching area:  22
searching area:  23
searching area:  24
searching area:  25
searching area:  26
searching area:  27
searching area:  28
searching area:  29
searching area:  30
searching area:  31
searching area:  32
searching area:  33
searching area:  34
searching area:  35
searching area:  36
searching area:  37
searching area:  38
searching area:  39
searching area:  40
searching area:  41
searching area:  42
searching area:  43
searching area:  44
searching area:  45
searching area:  46
searching area:  47
searching area:  48
searching area:  49
searching 
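The request URL can also be assembled from a dictionary of parameters, which avoids mistakes in a long format string. This is a sketch using only the standard library; the helper name `build_explore_url` and the placeholder credentials are assumptions, while the endpoint and parameter names are the ones used in the loop above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'  # endpoint used in the notebook


def build_explore_url(client_id, client_secret, lat, lon, version, radius, limit):
    """Build the explore URL for one grid point."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lon),
        'v': version,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the ll value instead of encoding it as %2C
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')


url = build_explore_url('MY_ID', 'MY_SECRET', 40.41683, -3.7038, '20191220', 294, 200)
print(url)
```

With `requests`, the same dictionary could instead be passed through the `params` argument of `requests.get`.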

Taking an initial look at the dataframe:

In [13]:
print ('Our dataframe has : ', dataframe.shape[0],' results.')

Our dataframe has :  3579  results.


In [14]:
dataframe.head()

|   | referralId | reasons.count | reasons.items | venue.id | venue.name | venue.location.lat | venue.location.lng | venue.location.labeledLatLngs | venue.location.distance | venue.location.cc | ... | venue.location.address | venue.location.crossStreet | venue.location.postalCode | venue.location.city | venue.location.state | venue.venuePage.id | area_num | venue.location.neighborhood | venue.events.count | venue.events.summary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | e-0-4d8ba29cace98cfae1285a9a-0 | 0.0 | [{'summary': 'This spot is popular', 'type': '... | 4d8ba29cace98cfae1285a9a | Polideportivo La Mina | 40.381581 | -3.746746 | [{'label': 'display', 'lat': 40.38158122477921... | 203.0 | ES | ... |  |  |  |  |  |  | 0 |  |  |  |
| 1 | e-0-4d82625c4bbaa0939a0bc1ac-1 | 0.0 | [{'summary': 'This spot is popular', 'type': '... | 4d82625c4bbaa0939a0bc1ac | Pastelería-Panadería La Gallega | 40.384049 | -3.746432 | [{'label': 'display', 'lat': 40.38404903690427... | 179.0 | ES | ... | C. Espinar, 1 | C. Camino de los Ingenieros | 28047.0 | Madrid | Madrid |  | 0 |  |  |  |
| 2 | e-0-4c82a9c72f1c236ad1e13b43-2 | 0.0 | [{'summary': 'This spot is popular', 'type': '... | 4c82a9c72f1c236ad1e13b43 | Devinums | 40.382068 | -3.748143 | [{'label': 'display', 'lat': 40.38206778157163... | 112.0 | ES | ... | C. Monseñor Óscar Romero, 59-67 |  | 28047.0 | Madrid | Madrid |  | 0 |  |  |  |
| 3 | e-0-4cf1865b7bf3b60c9a4e607f-3 | 0.0 | [{'summary': 'This spot is popular', 'type': '... | 4cf1865b7bf3b60c9a4e607f | Pulperia Caracoleria | 40.383472 | -3.747245 | [{'label': 'display', 'lat': 40.38347226438317... | 86.0 | ES | ... | Av de Nuestra Señora de Fátima, 55 |  | 28047.0 | Madrid | Madrid | 64532020.0 | 0 |  |  |  |
| 4 | e-0-4b7c5023f964a520a78b2fe3-4 | 0.0 | [{'summary': 'This spot is popular', 'type': '... | 4b7c5023f964a520a78b2fe3 | La Sala Live! | 40.383423 | -3.747477 | [{'label': 'display', 'lat': 40.38342322328142... | 67.0 | ES | ... | Avda. de Nuestra Señora de Fátima, 42 |  | 28047.0 | Madrid | Madrid |  | 0 |  |  |  |


We can see the dataframe has a lot of information that is not readily visible, so let's clean up the results:

In [16]:
def get_category_type(row): # function that extracts the category of the venue
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter columns
filtered_columns = ['venue.name', 'venue.categories','area_num'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # work on a copy so the assignment below is safe

# replace the list of categories with the name of the first category in each row
dataframe_filtered['venue.categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean columns: keep only the last part of each dotted name
dataframe_filtered.columns = [col.split('.')[-1] for col in dataframe_filtered.columns]
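To make the clean-up step concrete, here is a minimal sketch of the same two ideas on hand-written data: taking the first category name from a nested list, and shortening a dotted column name. The sample records and the helper name `first_category_name` are hypothetical and only mimic the shape of the `venue.categories` field.

```python
def first_category_name(categories):
    """Return the name of the first category, or None when the list is empty."""
    if not categories:
        return None
    return categories[0]['name']


# Hypothetical records shaped like the 'venue.categories' field
sample = [
    [{'name': 'Hotel'}, {'name': 'Bar'}],
    [],
]
print([first_category_name(c) for c in sample])  # ['Hotel', None]

# Keeping only the last part of a dotted column name
print('venue.location.lat'.split('.')[-1])  # lat
```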



During the search, some venues might have been picked up by more than one area, 
so let's remove these duplicates from our dataframe using the column 'id', 
keeping for each venue the record closest to its area centre:

In [19]:
dataframe_filtered=dataframe_filtered.sort_values(by=['id','distance'])

In [20]:
dataframe_unique=dataframe_filtered.drop_duplicates(subset='id', keep='first')

In [21]:
print('The dataframe with unique values contains: ',dataframe_unique.shape[0],' rows')

The dataframe with unique values contains:  3372  rows
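The sort followed by `drop_duplicates(keep='first')` keeps, for every venue id, the record with the smallest distance. The same rule can be sketched in plain Python; the function name `keep_closest` and the sample records are hypothetical and are only meant to show the logic.

```python
def keep_closest(venues):
    """Keep one record per venue id: the one with the smallest distance."""
    closest = {}
    for venue in venues:
        current = closest.get(venue['id'])
        if current is None or venue['distance'] < current['distance']:
            closest[venue['id']] = venue
    return list(closest.values())


# Hypothetical records: venue 'a' was found from two neighbouring areas
venues = [
    {'id': 'a', 'distance': 250.0, 'area_num': 10},
    {'id': 'a', 'distance': 90.0, 'area_num': 11},
    {'id': 'b', 'distance': 40.0, 'area_num': 11},
]
unique = keep_closest(venues)
print([(v['id'], v['area_num']) for v in unique])  # [('a', 11), ('b', 11)]
```

Venue 'a' ends up assigned to area 11, the area whose centre is nearer to it.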


Before we continue, let's keep only the data that is useful for this study 
by selecting the relevant columns: name, categories, lat, lng, id, distance and area_num.

In [22]:
dataframe_df=dataframe_unique[['name','categories','lat','lng','id','distance','area_num']].copy()

In [23]:
dataframe_df.reset_index(inplace=True, drop=True)

Let's take a look at our dataframe now:

In [24]:
dataframe_df.head()

|   | name | categories | lat | lng | id | distance | area_num |
|---|---|---|---|---|---|---|---|
| 0 | The Westin Palace | Hotel | 40.415423 | -3.695585 | 4adcda33f964a5206c3a21e3 | 223.0 | 113 |
| 1 | Hotel Ritz | Hotel | 40.415758 | -3.692607 | 4adcda33f964a5206e3a21e3 | 172.0 | 114 |
| 2 | Hotel Santo Mauro, Autograph Collection | Hotel | 40.430934 | -3.693156 | 4adcda33f964a5206f3a21e3 | 175.0 | 159 |
| 3 | Hotel Wellington | Hotel | 40.422255 | -3.684231 | 4adcda33f964a520713a21e3 | 82.0 | 130 |
| 4 | Hotel Agumar | Hotel | 40.406855 | -3.683119 | 4adcda33f964a520733a21e3 | 147.0 | 85 |


## Exploring the Dataset

### Extracting eateries information

As we can see in the cells below, FourSquare has many categories which we can use to select the market that interests us. Before we proceed any further, let's explore the 10 most common categories in the selected location.

In [25]:
dataframe_df['categories'].value_counts().head(10)

Spanish Restaurant    324
Restaurant            226
Tapas Restaurant      147
Bar                   144
Hotel                 112
Café                  105
Coffee Shop            87
Plaza                  80
Bakery                 79
Italian Restaurant     67
Name: categories, dtype: int64

Let's see which categories are contained in our dataset:

In [26]:
dataframe_df['categories'].unique()

array(['Hotel', 'Wine Bar', 'Spanish Restaurant', 'Tapas Restaurant',
       'Café', 'Restaurant', 'Cocktail Bar', 'Nightclub', 'Bar',
       'Ice Cream Shop', 'Performing Arts Venue', 'Jazz Club',
       'Music Venue', 'Moroccan Restaurant', 'Pub', 'Chocolate Shop',
       'Concert Hall', 'Art Museum', 'Palace', 'Museum',
       'Monument / Landmark', 'Park', 'Plaza', 'Historic Site', 'Church',
       'Planetarium', 'General Entertainment', 'Cable Car', 'Exhibit',
       'Scenic Lookout', 'Garden', 'Science Museum', 'History Museum',
       'Art Gallery', 'Event Space', 'Steakhouse', 'German Restaurant',
       'French Restaurant', 'Paella Restaurant',
       'Mediterranean Restaurant', 'Seafood Restaurant', 'Pizza Place',
       'Opera House', 'Theater', 'Indie Theater', 'Multiplex',
       'Movie Theater', 'Basketball Stadium', 'Boutique',
       'Electronics Store', 'Bookstore', 'Soccer Stadium',
       "Women's Store", 'Fish Market', "Men's Store", 'Market',
       'Farmers Market

There are many categories but, as mentioned beforehand, the scope of this project is restaurants or, more broadly speaking, eateries. FourSquare also has many categories for eateries besides 'Restaurant', so we gather them into a list below:

In [27]:
categories_list=[ 'Coffee Shop','Pub','Eastern European Restaurant','Fast Food Restaurant', 'Creperie', 'Brewery',  'Diner', 'Breakfast Spot', 
       'Gastropub', 'Café', 'Middle Eastern Restaurant','Bakery','Churrascaria' ,'Falafel Restaurant', 'Argentinian Restaurant', 'Deli / Bodega', 'French Restaurant',
       'Moroccan Restaurant', 'Mediterranean Restaurant','Thai Restaurant', 'Pizza Place', 'Sushi Restaurant',
      'Japanese Restaurant','Persian Restaurant', 'Fish & Chips Shop','Bar', 'Brasserie', 'Indian Restaurant',
       'Italian Restaurant', 'Portuguese Restaurant',  'Lebanese Restaurant',
        'Pastry Shop', 'Halal Restaurant',  'Korean Restaurant','Modern European Restaurant','Chinese Restaurant', 
        'Burger Joint', 'Greek Restaurant','Turkish Restaurant', 'Caribbean Restaurant', 'Spanish Restaurant', 
        'Polish Restaurant','Tapas Restaurant', 'American Restaurant', 'Seafood Restaurant',
       'Restaurant',  'Vegetarian / Vegan Restaurant','Latin American Restaurant', 'Vietnamese Restaurant', 
       'Himalayan Restaurant', 'African Restaurant', 'Bistro','Sandwich Place',  'Asian Restaurant','Pakistani Restaurant', 'Steakhouse', 
      'English Restaurant', 'Caucasian Restaurant', 'Mexican Restaurant', 'Kebab Restaurant', 'Tea Room','Pie Shop', 
      'Indonesian Restaurant', 'Fried Chicken Joint', 'Irish Pub', 'Souvlaki Shop', 'Bagel Shop', 'BBQ Joint', 
       'Food Truck', 'Empanada Restaurant', 'Food Stand','Sri Lankan Restaurant', 'Australian Restaurant',
       'Okonomiyaki Restaurant','Ramen Restaurant', 'South American Restaurant','Dumpling Restaurant', 'New American Restaurant',
       'Malay Restaurant', 'Hungarian Restaurant','Doner Restaurant','Brazilian Restaurant',
       'Burrito Place', 'Buffet','Cafeteria','Ethiopian Restaurant','Hunan Restaurant',
       'Afghan Restaurant','Shaanxi Restaurant','Taiwanese Restaurant', 'Veneto Restaurant',
       'Snack Place', 'North Indian Restaurant', 'Scandinavian Restaurant','South Indian Restaurant', 'Shabu-Shabu Restaurant',
       'Kosher Restaurant', 'Udon Restaurant','Arepa Restaurant', 'Xinjiang Restaurant', 'Salad Place', 'Noodle House',
       'Southern / Soul Food Restaurant', 'Taco Place', 'Peruvian Restaurant', 'Cantonese Restaurant', 
        'Comfort Food Restaurant', 'German Restaurant', 'Cajun / Creole Restaurant', 'Grilled Meat Restaurant', 
       'Theme Restaurant', 'Dim Sum Restaurant', 'Indian Chinese Restaurant','Cigkofte Place','Dosa Place',
       'Russian Restaurant', 'Chaat Place','Pub','Coffee Shop','Café','Pizza Place','Sandwich Place','Burger Joint','Gastropub','Fish & Chips Shop',
        'Steakhouse','Food Truck','Street Food','Diner','Tea Room','Fried Chicken Joint','Bistro','Bagel Shop','Burrito Place',
        'Salad Place','Creperie','BBQ Joint','Pie Shop','Food Stand','Poke Place','Soup Place','Brasserie','Souvlaki Shop',
        'Pastry Shop','Cigkofte Place','Dosa Place' ]
 

We can now create a dataframe that contains all eateries (restaurants and all categories listed above):

In [28]:
eateries_df=dataframe_df[dataframe_df['categories'].isin(categories_list)].reset_index(drop=True)

Let's take a look at this dataframe. Now we are ready to extract some useful information from it.

In [29]:
eateries_df.head()

|   | name | categories | lat | lng | id | distance | area_num |
|---|---|---|---|---|---|---|---|
| 0 | La Giralda IV | Spanish Restaurant | 40.423349 | -3.686757 | 4adcda34f964a520043b21e3 | 251.0 | 130 |
| 1 | Casa Lucas | Tapas Restaurant | 40.412315 | -3.709437 | 4adcda34f964a520073b21e3 | 68.0 | 96 |
| 2 | Restaurante Café El Botánico | Café | 40.412798 | -3.691181 | 4adcda34f964a5200c3b21e3 | 87.0 | 99 |
| 3 | Stop Madrid | Tapas Restaurant | 40.42099 | -3.700632 | 4adcda34f964a520103b21e3 | 278.0 | 127 |
| 4 | Taberna Albur | Spanish Restaurant | 40.428613 | -3.703724 | 4adcda34f964a520263b21e3 | 238.0 | 142 |


In [30]:
num_eateries=eateries_df.shape[0]

In [31]:
print('The eateries dataframe has a total of: ', num_eateries, ' unique eateries.')

The eateries dataframe has a total of:  1975  unique eateries.


Using a map we can see where these eateries are located:

In [32]:
eateries_map=folium.Map(location=[lat_searched, lon_searched]) # centre on the searched point, not on leftover loop variables
sw = coordinates[['lat', 'lon']].min().values.tolist()
ne = coordinates[['lat', 'lon']].max().values.tolist()
eateries_map.fit_bounds([sw, ne])
for lat,lon in zip(eateries_df['lat'],eateries_df['lng']):
    folium.CircleMarker(
        [lat,lon],
        radius=2,
        color='cornflowerblue',
        fill = True,
        fill_opacity = 0.01
        ).add_to(eateries_map)
eateries_map

Exploring these eateries further, we can see the most common categories of venues:

In [33]:
eateries_df['categories'].value_counts().head(10)

Spanish Restaurant          324
Restaurant                  226
Tapas Restaurant            147
Bar                         144
Café                        105
Coffee Shop                  87
Bakery                       79
Italian Restaurant           67
Pizza Place                  56
Mediterranean Restaurant     52
Name: categories, dtype: int64

Let's now see what percentage of the total number of eateries these categories represent:

In [34]:
100*eateries_df['categories'].value_counts(normalize=True).head(10)

Spanish Restaurant          16.405063
Restaurant                  11.443038
Tapas Restaurant             7.443038
Bar                          7.291139
Café                         5.316456
Coffee Shop                  4.405063
Bakery                       4.000000
Italian Restaurant           3.392405
Pizza Place                  2.835443
Mediterranean Restaurant     2.632911
Name: categories, dtype: float64

Let's explore the types of restaurants further, first in absolute values:

In [35]:
eateries_df[eateries_df['categories'].str.contains('Restaurant')]['categories'].value_counts()

Spanish Restaurant               324
Restaurant                       226
Tapas Restaurant                 147
Italian Restaurant                67
Mediterranean Restaurant          52
Japanese Restaurant               39
Seafood Restaurant                35
Asian Restaurant                  31
Mexican Restaurant                31
Chinese Restaurant                26
Fast Food Restaurant              25
Argentinian Restaurant            17
Sushi Restaurant                  17
Vegetarian / Vegan Restaurant     16
American Restaurant               13
Indian Restaurant                 12
Peruvian Restaurant               11
Thai Restaurant                   10
Korean Restaurant                  8
French Restaurant                  7
Greek Restaurant                   5
Middle Eastern Restaurant          5
South American Restaurant          4
Falafel Restaurant                 4
Latin American Restaurant          4
German Restaurant                  3
Brazilian Restaurant               3
K

Now in percentage:

In [36]:
100*eateries_df[eateries_df['categories'].str.contains('Restaurant')]['categories'].value_counts(normalize=True)

Spanish Restaurant               27.692308
Restaurant                       19.316239
Tapas Restaurant                 12.564103
Italian Restaurant                5.726496
Mediterranean Restaurant          4.444444
Japanese Restaurant               3.333333
Seafood Restaurant                2.991453
Asian Restaurant                  2.649573
Mexican Restaurant                2.649573
Chinese Restaurant                2.222222
Fast Food Restaurant              2.136752
Argentinian Restaurant            1.452991
Sushi Restaurant                  1.452991
Vegetarian / Vegan Restaurant     1.367521
American Restaurant               1.111111
Indian Restaurant                 1.025641
Peruvian Restaurant               0.940171
Thai Restaurant                   0.854701
Korean Restaurant                 0.683761
French Restaurant                 0.598291
Greek Restaurant                  0.427350
Middle Eastern Restaurant         0.427350
South American Restaurant         0.341880
Falafel Res

## Vegetarian eateries

What about vegetarian food? Let's see:

In [37]:
vegetarian_df=eateries_df[eateries_df['categories'].str.contains('Vegetarian')]

In [38]:
vegetarian_df.head()

|   | name | categories | lat | lng | id | distance | area_num |
|---|---|---|---|---|---|---|---|
| 62 | Sopa | Vegetarian / Vegan Restaurant | 40.448126 | -3.672487 | 4b2913d4f964a520429824e3 | 265.0 | 207 |
| 133 | EcoCentro | Vegetarian / Vegan Restaurant | 40.442858 | -3.704664 | 4b6c2831f964a520b6262ce3 | 225.0 | 187 |
| 244 | Viva Burger | Vegetarian / Vegan Restaurant | 40.41263 | -3.711683 | 4b9d4647f964a520319f36e3 | 148.0 | 96 |
| 949 | El Triángulo de las Verduras | Vegetarian / Vegan Restaurant | 40.413453 | -3.728349 | 4e0397de45ddb464557678c5 | 173.0 | 93 |
| 1224 | B13 Bar | Vegetarian / Vegan Restaurant | 40.421921 | -3.703402 | 50dc6f4ae4b08a4c254573ce | 45.0 | 127 |


In [39]:
num_veg=vegetarian_df.shape[0]
print('The number of vegetarian/vegan places is: ',num_veg)

The number of vegetarian/vegan places is:  16


In [40]:
print('The percentage of vegetarian places is : ', round(100*num_veg/num_eateries,2), '%' )

The percentage of vegetarian places is :  0.81 %


As we can see, overall the percentage of purely vegetarian places is very low. Let's see where these places are located:

In [41]:
veg_map=folium.Map(location=[lat_searched, lon_searched], zoom_start=13) 
sw = vegetarian_df[['lat', 'lng']].min().values.tolist()
ne = vegetarian_df[['lat', 'lng']].max().values.tolist()
veg_map.fit_bounds([sw, ne])
for lat,lon,name in zip(vegetarian_df['lat'],vegetarian_df['lng'],vegetarian_df['name']):
    folium.CircleMarker(
        [lat,lon],
        radius=10,
        color='black',
        fill = True,
        fill_color = 'green',
        fill_opacity = 0.5
        ).add_to(veg_map) 
    
veg_map

As we can see, the few vegetarian/vegan restaurants are spread out around the city centre.

## Eateries in each area

Let's explore a bit more and see how many eateries are located in each area:

In [42]:
eateries_area_df=pd.DataFrame(eateries_df.groupby(by='area_num')['name'].count())
eateries_area_df.reset_index(inplace=True)

In [43]:
eateries_area_df.columns=['area_num','num_eateries']

In [44]:
eateries_area_df=pd.merge(eateries_area_df,coordinates,on='area_num', how='inner')

In [45]:
eateries_area_df.sort_values(by='num_eateries', ascending=False).head(20)

|   | area_num | num_eateries | lat | lon |
|---|---|---|---|---|
| 81 | 98 | 57 | 40.412009 | -3.697467 |
| 143 | 175 | 47 | 40.436116 | -3.684802 |
| 79 | 96 | 45 | 40.412009 | -3.710133 |
| 107 | 131 | 44 | 40.421651 | -3.678469 |
| 106 | 130 | 42 | 40.421651 | -3.684802 |
| 176 | 219 | 41 | 40.45058 | -3.691134 |
| 165 | 203 | 40 | 40.445759 | -3.697467 |
| 126 | 155 | 40 | 40.431294 | -3.716466 |
| 128 | 157 | 38 | 40.431294 | -3.7038 |
| 142 | 174 | 38 | 40.436116 | -3.691134 |


By plotting on a map we can see the busiest areas. The colour of the dots relates to the number of eateries in the area, and only areas with more than 10 eateries are shown:

In [46]:
eateries_area_map=folium.Map(location=[lat_searched, lon_searched])
sw = eateries_area_df[['lat', 'lon']].min().values.tolist()
ne = eateries_area_df[['lat', 'lon']].max().values.tolist()
eateries_area_map.fit_bounds([sw, ne])
colormap = cm.LinearColormap(colors=['yellow','red'],vmin=0,vmax=eateries_area_df['num_eateries'].max())
for lat,lon,num in zip(eateries_area_df['lat'],eateries_area_df['lon'],eateries_area_df['num_eateries']):
    if num > 10: # only show areas with more than 10 eateries
        folium.CircleMarker(
            [lat,lon],
            radius=10,
            color=colormap(num),
            fill = True,
            fill_opacity = 0.3
            ).add_to(eateries_area_map)
        folium.map.Marker(
            [lat, lon],
            icon=folium.DivIcon(
                icon_size=(50,50),
                icon_anchor=(8,8),
                html='<div style="color:grey"><b>'+str(int(num))+'</b></div>',
                )
            ).add_to(eateries_area_map)
eateries_area_map

## Vegetarian Eateries in each area

Let's now see in which areas the vegetarian/vegan eateries are located:

In [47]:
veg_area_df=pd.DataFrame(vegetarian_df['area_num'].value_counts())
veg_area_df.reset_index(inplace=True)

In [48]:
veg_area_df.columns=['area_num','num_veg']

In [49]:
veg_area_df=pd.merge(veg_area_df,coordinates,on='area_num', how='inner')

Taking a glimpse at the areas with the highest number of vegetarian/vegan places we have:

In [50]:
veg_area_df.head(5)

|   | area_num | num_veg | lat | lon |
|---|---|---|---|---|
| 0 | 127 | 2 | 40.421651 | -3.7038 |
| 1 | 128 | 2 | 40.421651 | -3.697467 |
| 2 | 96 | 2 | 40.412009 | -3.710133 |
| 3 | 95 | 1 | 40.412009 | -3.716466 |
| 4 | 157 | 1 | 40.431294 | -3.7038 |


As we did with the eateries, we can see which areas have vegetarian venues:

In [51]:
veg_area_map=folium.Map(location=[lat_searched, lon_searched])
sw = coordinates[['lat', 'lon']].min().values.tolist()
ne = coordinates[['lat', 'lon']].max().values.tolist()
veg_area_map.fit_bounds([sw, ne])
colormap = cm.LinearColormap(colors=['yellow','red'],vmin=0,vmax=veg_area_df['num_veg'].max())
for lat,lon,num in zip(veg_area_df['lat'],veg_area_df['lon'],veg_area_df['num_veg']):
    if num>0:
        folium.CircleMarker(
            [lat,lon],
            radius=10,
            color=colormap(num),
            fill = True,
            fill_opacity = 0.2
            ).add_to(veg_area_map)
        folium.map.Marker(
            [lat, lon],
            icon=folium.DivIcon(
                icon_size=(10,10),
                icon_anchor=(5,5),
                html='<div style="color:black"><b>'+str(int(num))+'</b></div>',
                )
            ).add_to(veg_area_map)
veg_area_map

# Market Opportunities

Now we have the values we need to determine which areas represent a bigger market for vegetarian and vegan places. We will compare the number of eateries with the number of vegetarian places in each area.

First we combine the 'eateries_area_df' and 'veg_area_df' dataframes:

In [52]:
comparison_df=pd.merge(eateries_area_df,veg_area_df,on=['area_num','lat','lon'], how='outer')
comparison_df.fillna(0, inplace=True)
comparison_df=comparison_df[['area_num','lat','lon','num_eateries','num_veg']]
comparison_df.head()

|   | area_num | lat | lon | num_eateries | num_veg |
|---|---|---|---|---|---|
| 0 | 0 | 40.38308 | -3.748129 | 4 | 0.0 |
| 1 | 1 | 40.38308 | -3.741797 | 3 | 0.0 |
| 2 | 2 | 40.38308 | -3.735464 | 1 | 0.0 |
| 3 | 3 | 40.38308 | -3.729131 | 3 | 0.0 |
| 4 | 4 | 40.38308 | -3.722798 | 1 | 0.0 |


## Veggie Hot Spots
Let's see where the vegetarian/vegan places are located in proportion to the number of eateries:

In [53]:
comparison_df['veg_share']=round(100*comparison_df['num_veg']/comparison_df['num_eateries'],2)

In [54]:
comparison_df.sort_values(by='veg_share',ascending=False, inplace=True)

We can look at where these vegetarian/vegan 'hotspots' are located:

In [55]:
comparison_df.head()

|   | area_num | lat | lon | num_eateries | num_veg | veg_share |
|---|---|---|---|---|---|---|
| 152 | 187 | 40.440937 | -3.7038 | 6 | 1.0 | 16.67 |
| 96 | 116 | 40.41683 | -3.678469 | 8 | 1.0 | 12.5 |
| 76 | 93 | 40.412009 | -3.729131 | 10 | 1.0 | 10.0 |
| 80 | 97 | 40.412009 | -3.7038 | 11 | 1.0 | 9.09 |
| 114 | 140 | 40.426473 | -3.716466 | 13 | 1.0 | 7.69 |


In [57]:
veg_hotspot_map=folium.Map(location=[lat_searched, lon_searched])
sw = comparison_df[['lat', 'lon']].min().values.tolist()
ne = comparison_df[['lat', 'lon']].max().values.tolist()
veg_hotspot_map.fit_bounds([sw, ne])
colormap = cm.LinearColormap(colors=['white','green'],vmin=5,vmax=comparison_df['veg_share'].max())
for lat,lon,share in zip(comparison_df['lat'],comparison_df['lon'],comparison_df['veg_share']):
    if share > 5: # only show areas where more than 5% of eateries are vegetarian/vegan
        folium.CircleMarker(
            [lat,lon],
            radius=20,
            color=colormap(share),
            fill = True,
            fill_color = 'green',
            fill_opacity = 0.2
            ).add_to(veg_hotspot_map)
        folium.map.Marker(
            [lat, lon],
            icon=folium.DivIcon(
                icon_size=(10,10),
                icon_anchor=(5,5),
                html='<div style="text-align: center; font-size: 10pt"><b>'+str(int(share))+'%</b></div>',
                )
            ).add_to(veg_hotspot_map)

veg_hotspot_map

## Business Opportunities 

Let's now explore where there is a market for vegetarian places. 
We do this by comparing the total number of eateries in each area 
with the number of vegetarian venues, which gives an estimate of the market available.

Because many areas have ZERO vegetarian places, we add 1 
to the number of vegetarian places in every area 
so that the division can be carried out. We then subtract 1 
from the result after the division has been done.
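As a quick worked example of this score, the sketch below applies the formula `num_eateries / (num_veg + 1) - 1` to two areas from the tables above: area 98 (57 eateries, no vegetarian places) and area 96 (45 eateries, 2 vegetarian places). The function name `opportunity` is chosen here for illustration only.

```python
def opportunity(num_eateries, num_veg):
    """Eateries per vegetarian place, with 1 added to the divisor to avoid dividing by zero."""
    return num_eateries / (num_veg + 1) - 1


print(opportunity(57, 0))  # 56.0
print(opportunity(45, 2))  # 14.0
```

An area with many eateries and no vegetarian places scores highest, while each existing vegetarian place reduces the score sharply.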

In [58]:
comparison_df['opportunity']=(comparison_df['num_eateries']/(comparison_df['num_veg']+1))-1
comparison_df.sort_values(by='opportunity',ascending=False,inplace=True)
comparison_df.reset_index(inplace=True)

In [59]:
comparison_df.head()

|   | index | area_num | lat | lon | num_eateries | num_veg | veg_share | opportunity |
|---|---|---|---|---|---|---|---|---|
| 0 | 81 | 98 | 40.412009 | -3.697467 | 57 | 0.0 | 0.0 | 56.0 |
| 1 | 143 | 175 | 40.436116 | -3.684802 | 47 | 0.0 | 0.0 | 46.0 |
| 2 | 107 | 131 | 40.421651 | -3.678469 | 44 | 0.0 | 0.0 | 43.0 |
| 3 | 106 | 130 | 40.421651 | -3.684802 | 42 | 0.0 | 0.0 | 41.0 |
| 4 | 176 | 219 | 40.45058 | -3.691134 | 41 | 0.0 | 0.0 | 40.0 |


In [64]:

opportunities_map=folium.Map(location=[lat_searched, lon_searched])
sw = comparison_df[['lat', 'lon']].min().values.tolist()
ne = comparison_df[['lat', 'lon']].max().values.tolist()
opportunities_map.fit_bounds([sw, ne])
colormap = cm.LinearColormap(colors=['white','green'],vmin=0,vmax=comparison_df['opportunity'].max())
for position,lat,lon,opportunity,area in zip(comparison_df.index,comparison_df['lat'],comparison_df['lon'],comparison_df['opportunity'],comparison_df['area_num']):
    if position < 10: # only show the ten best-ranked areas
        folium.CircleMarker(
            [lat,lon],
            radius=10,
            color=colormap(opportunity),
            fill = True,
            popup=str(position+1)+' position',
            fill_opacity = 0.3
            ).add_to(opportunities_map)
        folium.map.Marker(
            [lat, lon],
            icon=folium.DivIcon(
                icon_size=(10,10),
                icon_anchor=(5,5),
                html='<div style="text-align: center; font-size: 10pt"><b>'+str(int(position+1))+'</b></div>',
                )
            ).add_to(opportunities_map)
opportunities_map

The numbered areas are the ten with the highest ratio of eateries to vegetarian places (the greener the circle, the higher the ratio), which makes them likely places to set up a vegetarian/vegan eatery.

## Further analysis

In this notebook we have analysed the possible market for vegetarian places in Madrid, but this code can be applied to many other business opportunities anywhere in the world. We could, for example, evaluate which areas of Moscow have a market niche for Mexican restaurants, bars or even other types of venues, such as hotels. This is a purely quantitative analysis, and further research exploring the area can be done before committing to any investment.
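To illustrate how the same ranking could be reused for another type of venue, here is a minimal sketch in plain Python that takes a category keyword as a parameter. The function name `rank_opportunities` and the sample venues are hypothetical; the score is the same `num_eateries / (num_target + 1) - 1` ratio used above.

```python
from collections import defaultdict


def rank_opportunities(venues, keyword):
    """Rank areas by venues per target venue, highest opportunity first."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    for venue in venues:
        totals[venue['area_num']] += 1
        if keyword in venue['categories']:
            matches[venue['area_num']] += 1
    # Same score as in the notebook: add 1 to the divisor, subtract 1 afterwards
    scores = {area: totals[area] / (matches[area] + 1) - 1 for area in totals}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical venues: area 1 already has a Mexican restaurant, area 2 has none
venues = [
    {'area_num': 1, 'categories': 'Mexican Restaurant'},
    {'area_num': 1, 'categories': 'Bar'},
    {'area_num': 1, 'categories': 'Café'},
    {'area_num': 1, 'categories': 'Spanish Restaurant'},
    {'area_num': 2, 'categories': 'Bar'},
    {'area_num': 2, 'categories': 'Tapas Restaurant'},
    {'area_num': 2, 'categories': 'Bakery'},
]
print(rank_opportunities(venues, 'Mexican'))  # [(2, 2.0), (1, 1.0)]
```

Area 2 ranks first because it has three eateries and no Mexican restaurant, whereas area 1 has four eateries but already includes one.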