# Centers of coffee - data

The data comes from the Foursquare API. I'll collect geolocation data for all coffee shops and cafés in Rome, and then apply the k-means algorithm to this data to find the locations that minimize the distance to those coffee shops and cafés.

In this part I'll gather the data.

In [1]:
#importing pandas, numpy and relevant libraries
import pandas as pd
import numpy as np

In [2]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.



In [3]:
# import geocoder - a library for geolocation data
!conda install -c conda-forge geocoder --yes 
import geocoder 
import time
import requests 


Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.



In [4]:
from pandas.io.json import json_normalize # transform JSON into a pandas DataFrame (also available as pd.json_normalize in pandas >= 1.0)

In [5]:
# hidden cell
# Foursquare credentials (CLIENT_ID, CLIENT_SECRET and VERSION are defined here)

In [6]:
# using the API to try to get all coffee shops and cafés
radius = 3500 # search radius in meters
LIMIT = 1500 # limit on the number of venues returned by the Foursquare API
categoryId = "4bf58dd8d48988d16d941735,4bf58dd8d48988d1e0931735" # Café and Coffee Shop category IDs in the Foursquare API
location = "Rome"

# create URL

url = 'https://api.foursquare.com/v2/venues/explore?&categoryId={}&client_id={}&client_secret={}&v={}&near={}&radius={}&limit={}'.format(
    categoryId,
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION,
    location,
    radius, 
    LIMIT)
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e1c7de2542890001b22a9fd'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'geocode': {'what': '',
   'where': 'rome',
   'center': {'lat': 41.89193, 'lng': 12.51133},
   'displayString': 'Rome, Latium, Italy',
   'cc': 'IT',
   'geometry': {'bounds': {'ne': {'lat': 41.98942097197372,
      'lng': 12.615249866611695},
     'sw': {'lat': 41.79461305370221, 'lng': 12.379309999999998}}},
   'slug': 'roma',
   'longId': '72057594041097006'},
  'headerLocation': 'Rome',
  'headerFullLocation': 'Rome',
  'headerLocationGranularity': 'city',
  'query': 'cafe',
  'totalResults': 212,
  'suggestedBounds': {'ne': {'lat': 41.92708938319744,
    'lng': 12.560106323406826},
   'sw': {'lat': 41.85587073838316, 'lng': 12.456626864462619}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is po

In [7]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [8]:
#making a data frame 
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # normalize JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

|   | name | categories | lat | lng |
|---|------|------------|-----|-----|
| 0 | Bar Trani | Café | 41.893034 | 12.505343 |
| 1 | Molino | Café | 41.89659 | 12.499325 |
| 2 | The British Corner | Tea Room | 41.888204 | 12.531285 |
| 3 | Nespresso Boutique | Coffee Shop | 41.885203 | 12.509556 |
| 4 | Caffè Ciamei | Café | 41.892193 | 12.50564 |


In [9]:
# looking at the venues
Location = [41.9, 12.5]
map_cafe = folium.Map(location=Location, zoom_start=11)

# add markers to map
for lat, lng, name in zip(nearby_venues['lat'],
                          nearby_venues['lng'],
                          nearby_venues['name']):
    label = name
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_cafe) 
map_cafe

It looks like we are not getting all of Rome.

In [10]:
nearby_venues.shape

(100, 4)

Only 100 venues came back, not the 1500 I set as the limit. The Foursquare API returned at most 100 venues for a single query, so I think I need multiple queries to get all of them.

I'll build a grid of points, run one search per point, and then merge the results.

In [11]:
grid = [(i/50 - 0.1 ,j/50 - 0.1) for i in range(11) for j in range(11)]
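As a rough check on this grid, the step of 1/50 of a degree can be converted to meters and compared with the 3500 m search radius. This is only a sketch: the figure of about 111,320 m per degree of latitude is an approximation I am assuming, and the constant names are made up for the example.

```python
import math

CENTER_LAT = 41.9          # latitude used as the grid center in this notebook
STEP_DEG = 1 / 50          # grid step in degrees (0.02)
RADIUS_M = 3500            # search radius passed to Foursquare
M_PER_DEG_LAT = 111_320    # approximate meters per degree of latitude (assumption)

# north-south and east-west spacing between neighbouring grid points
step_ns = STEP_DEG * M_PER_DEG_LAT
step_ew = STEP_DEG * M_PER_DEG_LAT * math.cos(math.radians(CENTER_LAT))

# the point farthest from any grid node is the center of a grid cell
half_diagonal = math.hypot(step_ns, step_ew) / 2

print(f"N-S spacing: {step_ns:.0f} m, E-W spacing: {step_ew:.0f} m")
print(f"farthest point from a grid node: {half_diagonal:.0f} m")
print("radius covers every cell:", RADIUS_M > half_diagonal)
```

Under these assumptions the search circles overlap heavily, so the same venue can come back from several neighbouring queries and the results need to be merged without duplicates.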

In [12]:
# looking at the grid points
Location = [41.9, 12.5]
map_cafe = folium.Map(location=Location, zoom_start=11)

# add markers to map
for ll in grid:
    label = "name"
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [Location[0] + ll[0], Location[1] + ll[1]],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_cafe) 
map_cafe

The grid covers Rome like I wanted. Now I'll run multiple searches in Foursquare to get all the coffee shops in Rome.
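Each search is centered on one grid point through the `ll` parameter. As a minimal sketch, the URL for a single point could also be assembled with `urllib.parse.urlencode`, which takes care of escaping. The helper name `build_explore_url` and the credential and version values are placeholders I made up for illustration; they are not part of this notebook.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lng, client_id, client_secret, version,
                      category_id, radius=3500, limit=100):
    """Return the explore URL for one grid point (hypothetical helper)."""
    params = {
        "categoryId": category_id,
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        # round to avoid float noise such as 41.800000000000004
        "ll": f"{lat:.4f},{lng:.4f}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"

# placeholder credentials and version, for illustration only
url = build_explore_url(41.9, 12.5, "ID", "SECRET", "20180605",
                        "4bf58dd8d48988d16d941735")
print(url)
```

Note that `urlencode` escapes the comma in `ll` as `%2C`, which is the standard URL encoding of the same value.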

In [13]:
# using the API to get all coffee shops and cafés
radius = 3500 # search radius in meters
LIMIT = 500 # limit on the number of venues returned by the Foursquare API
categoryId = "4bf58dd8d48988d16d941735,4bf58dd8d48988d1e0931735" # Café and Coffee Shop category IDs in the Foursquare API
location = "Rome"
df = pd.DataFrame(columns =  ['name', 'categories', 'lat', 'lng'])

# create URL
for ll in grid:
    url = 'https://api.foursquare.com/v2/venues/explore?&categoryId={}&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        categoryId,
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION,
        ll[0] + 41.9,
        ll[1] + 12.5,
        radius, 
        LIMIT)
    results = requests.get(url).json()
    #time.sleep(1)

    #making a data frame 
    venues = results['response']['groups'][0]['items']

    nearby_venues = json_normalize(venues) # normalize JSON
    
    # filter columns
    filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
    if 'venue.name' in nearby_venues.columns:
        nearby_venues = nearby_venues.loc[:, filtered_columns]

        # filter the category for each row
        nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

        # clean columns
        nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
        df = pd.merge(df, nearby_venues, on= ['name', 'categories', 'lat', 'lng'],how='outer')


df.head()

|   | name | categories | lat | lng |
|---|------|------------|-----|-----|
| 0 | Bar Dal Pannocchia | Café | 41.79627 | 12.407022 |
| 1 | Mondo Gelo | Café | 41.783271 | 12.435756 |
| 2 | BarCode | Café | 41.796178 | 12.432368 |
| 3 | Bar Bucchi | Café | 41.786176 | 12.38365 |
| 4 | Red Cafe At Un World Food Programme | Café | 41.821181 | 12.408381 |
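The outer merge on all four columns works as a union of rows, so a venue returned by several overlapping searches appears only once. A possible alternative, sketched here on two toy batches rather than real API responses, is to stack the batches with `pd.concat` and drop identical rows with `drop_duplicates`. The names `batch_a`, `batch_b` and `combined` are made up for the example.

```python
import pandas as pd

cols = ["name", "categories", "lat", "lng"]

# two toy result sets standing in for neighbouring grid queries
batch_a = pd.DataFrame([["Bar Trani", "Café", 41.893034, 12.505343],
                        ["Molino", "Café", 41.89659, 12.499325]], columns=cols)
batch_b = pd.DataFrame([["Molino", "Café", 41.89659, 12.499325],
                        ["Caffè Ciamei", "Café", 41.892193, 12.50564]], columns=cols)

# stack all batches once, then keep the first copy of each identical row
combined = (pd.concat([batch_a, batch_b], ignore_index=True)
              .drop_duplicates(subset=cols)
              .reset_index(drop=True))
print(combined)
```

In the query loop this would mean collecting each batch in a list and concatenating once at the end, instead of merging into `df` on every iteration.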


In [14]:
# looking at the venues
Location = [41.9, 12.5]
map_cafe = folium.Map(location=Location, zoom_start=11)

# add markers to map
for lat, lng, name in zip(df['lat'],
                          df['lng'],
                          df['name']):
    label = name
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_cafe) 
map_cafe

Good! We have all the data. The next part is to apply k-means to this data, which belongs to next week's assignment.