<h1>Opening an Italian restaurant</h1>

<h2>Introduction: Business problem</h2>

<p>This project describes the search for the best district in which to open an Italian restaurant in Ljubljana. 
New restaurant owners must decide what type of restaurant to open, and this research will also help them decide which district is the most promising location.
    
Ljubljana has many sights and attracts many tourists, so we first identify the country most tourists come from. We assume that tourists are most likely to visit a restaurant that serves dishes from their home country. We also show where the most important sights and monuments are located and which district attracts the most tourists. Tourists concentrate around sights, so the district with the largest number of sights should be the most interesting one for them. We assume that Italians are the largest group of tourists in Ljubljana, which makes an Italian restaurant the logical choice.
At the end of the project we will check this assumption.</p>

<h2>Data description</h2>
<p>Based on the definition of our problem, the factors that will influence our decision are:</p>
<ul>
    <li> list of districts in Ljubljana,</li>
    <li> number of tourists in Ljubljana by country, to find the most numerous group,</li>
    <li> location of the most important sights in Ljubljana,</li>
    <li> number of restaurants serving dishes from the country most tourists come from (example: if Italians are the most numerous tourists, we will query Italian restaurants),</li>
    <li> list of restaurants of the chosen type located in each city district.</li>
</ul>

<p>The following data sources will be needed to extract or generate the required information:</p>
<ul>
    <li> centers of candidate districts will be generated using the OpenStreetMap (Nominatim) API,</li>
    <li> the list of restaurants in every district, with their type and location, will be obtained using the Foursquare API,</li>
    <li> coordinates of the Ljubljana center will be obtained using Nominatim geocoding,</li>
    <li> the list of Ljubljana districts will be web scraped using Beautiful Soup,</li>
    <li> the list of Ljubljana sights and the list of Ljubljana visitors/tourists will be obtained from pre-prepared statistical files.</li>
</ul>

<h3>Data Prep and Pull</h3>
<p>We will import our necessary packages and start pulling our data for data prep and usage.</p>

In [2]:
#import necessary packages and libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup
import numpy as np # library to handle data in a vectorized manner

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata: done
Solving environment: done

# All requested packages already installed.

Collecting package metadata: done
Solving environment: done

# All requested packages already installed.

Libraries imported.


In [3]:
#web site of the districts in Ljubljana
website_url = requests.get("https://sl.wikipedia.org/wiki/Četrtna_skupnost_Ljubljane").text

In [4]:
#web scraping the Ljubljana districts
soup = BeautifulSoup(website_url,'lxml')
My_table = soup.find('table',{'class':'nowraplinks collapsible autocollapse navbox-inner'})
links = My_table.findAll('a')
links

[<a href="/wiki/Predloga:Ljubljana" title="Predloga:Ljubljana"><span style=";;background:none transparent;border:none;-moz-box-shadow:none;-webkit-box-shadow:none;box-shadow:none;" title="Prikaži to predlogo">p</span></a>,
 <a href="/wiki/Pogovor_o_predlogi:Ljubljana" title="Pogovor o predlogi:Ljubljana"><span style=";;background:none transparent;border:none;-moz-box-shadow:none;-webkit-box-shadow:none;box-shadow:none;" title="Pogovor o tej predlogi">p</span></a>,
 <a class="external text" href="//sl.wikipedia.org/w/index.php?title=Predloga:Ljubljana&amp;action=edit"><span style=";;background:none transparent;border:none;-moz-box-shadow:none;-webkit-box-shadow:none;box-shadow:none;" title="Uredi to predlogo">u</span></a>,
 <a class="image" href="/wiki/Slika:Flag_of_Ljubljana.svg"><img alt="Flag of Ljubljana.svg" data-file-height="270" data-file-width="675" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/5/52/Flag_of_Ljubljana.svg/38px-Flag_of_Ljubljana.

In [5]:
Districts = []
for link in links:
    Districts.append(link.get('title'))   
del Districts[0:2]
Districts=Districts[5:22]
df = pd.DataFrame()
df['Districts'] = Districts
df

Unnamed: 0,Districts
0,Četrtna skupnost Bežigrad
1,Četrtna skupnost Center
2,Četrtna skupnost Črnuče
3,Četrtna skupnost Dravlje
4,Četrtna skupnost Golovec
5,Četrtna skupnost Jarše
6,Četrtna skupnost Moste
7,Četrtna skupnost Polje
8,Četrtna skupnost Posavje
9,Četrtna skupnost Rožnik


In [6]:
def get_coords_local(district, output_as='center'):
    """
    Get the center or the bounding box of a Ljubljana district in WGS84, given its name.

    Parameters
    ----------
    district : str
        name of the district, e.g. 'Bežigrad'
    output_as : str
        choose from 'boundingbox' or 'center'.
         - 'boundingbox' for [latmin, latmax, lonmin, lonmax]
         - 'center' for [latcenter, loncenter]

    Returns
    -------
    output : list
        list with coordinates as float
    """
    # create url
    url = '{0}{1}{2}'.format('http://nominatim.openstreetmap.org/search.php?q=',
                             district+', Ljubljana',
                             '&format=json&polygon=0')
    response = requests.get(url).json()[0]

    # parse response to list
    if output_as == 'boundingbox':
        lst = response[output_as]
    elif output_as == 'center':
        lst = [response.get(key) for key in ['lat','lon']]
    else:
        # fail early instead of returning an undefined value
        raise ValueError("output_as must be 'boundingbox' or 'center'")
    output = [float(i) for i in lst]
    return output
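
<p>District names such as Črnuče contain non-ASCII letters, and the query also contains a comma and a space. The sketch below shows one way to percent-encode the query before it is sent. The helper name <code>build_nominatim_url</code> and the constant <code>NOMINATIM_SEARCH</code> are assumptions made for this illustration; they are not part of the notebook.</p>

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# same endpoint as in get_coords_local, written with https (assumption)
NOMINATIM_SEARCH = 'https://nominatim.openstreetmap.org/search.php'

def build_nominatim_url(district, city='Ljubljana'):
    """Hypothetical helper: return a percent-encoded Nominatim search URL."""
    query = urlencode({'q': '{}, {}'.format(district, city),
                       'format': 'json',
                       'polygon': 0})
    return '{}?{}'.format(NOMINATIM_SEARCH, query)

url = build_nominatim_url('Črnuče')
print(url)

# decoding the query string gives back the original search text
print(parse_qs(urlsplit(url).query)['q'][0])
```

<p>Passing the same dictionary to <code>requests.get(url, params=...)</code> should have the same effect, because requests encodes the parameters itself.</p>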

In [7]:
#list of districts in Ljubljana with geographical coordinates (latitude & longitude)
df2 = df.copy()
latitudeCln = []
longitudeCln = []
districtCln=[]

for index, row in df2.iterrows():
    # strip the 'Četrtna skupnost ' prefix to get the bare district name
    lok = row['Districts'].replace('Četrtna skupnost ','')
    lat, long = get_coords_local(district=lok, output_as='center')
    # hard-coded coordinates replace the geocoded result for these two districts
    if lok == 'Rožnik':
        lat = 46.05999
        long = 14.46779
    if lok == 'Šmarna gora':
        lat = 46.119496
        long = 14.4611

    districtCln.append(lok)
    latitudeCln.append(lat)
    longitudeCln.append(long)
    
df2['Latitude'] = latitudeCln
df2['Longitude'] = longitudeCln
df2['Districts'] = districtCln
# make a copy of the districts dataframe to get it simply back if needed
df3 = df2.copy()
df3

Unnamed: 0,Districts,Latitude,Longitude
0,Bežigrad,46.071523,14.509137
1,Center,46.049815,14.506782
2,Črnuče,46.105006,14.532862
3,Dravlje,46.08114,14.475201
4,Golovec,46.034705,14.535686
5,Jarše,46.080596,14.5456
6,Moste,46.057282,14.536756
7,Polje,46.056122,14.580008
8,Posavje,46.089738,14.508915
9,Rožnik,46.05999,14.46779


In [8]:
#geographical coordinates (latitude & longitude) of Ljubljana city
address = 'Ljubljana, Slovenia'

geolocator = Nominatim(user_agent="capstoneProject")
location = geolocator.geocode(address, timeout=60, exactly_one=True)
latitude = location.latitude
longitude = location.longitude
print('The decimal coordinates of Ljubljana are {}, {}.'.format(latitude, longitude))

The decimal coordinates of Ljubljana are 46.0498146, 14.5067824.


In [9]:
# create map of Ljubljana using latitude and longitude values with the displayed centers of the relevant districts
map_ljubljana = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, local in zip(df3['Latitude'], df3['Longitude'], df3['Districts']):
    label = '{}'.format(local)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_ljubljana)  
    
map_ljubljana

In [74]:
#table shows the number of Ljubljana visitors in the years 2008-2017 by country
#on_bad_lines='skip' needs pandas >= 1.3; older versions use error_bad_lines=False
ljubljana_tourists = pd.read_csv("http://www.onbria.com/wp-content/uploads/2019/04/Tourists.csv",  sep=';', on_bad_lines='skip')
ljubljana_tourists = ljubljana_tourists.drop('ID', axis=1)
ljubljana_tourists

Unnamed: 0,Country,Arrivals,Overnights
0,Italy,615752,1016681
1,Germany,393852,715091
2,United Kingdom,293308,623496
3,United States,248841,538523
4,Slovenia,272515,445841
5,France,230181,433961
6,Austria,233303,369222
7,Spain,180749,354484
8,other Asian countries,189312,346650
9,Croatia,190177,334662


In [51]:
#Country with the most numerous visitors in Ljubljana
ljubljana_tourists.loc[ljubljana_tourists['Arrivals'].idxmax()]


Country         Italy
Arrivals       615752
Overnights    1016681
Name: 0, dtype: object

In [10]:
# function to repeat the venue search for every district in Ljubljana
def getNearbyVenues(names, latitudes, longitudes, radius=5000, categoryIds=''):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        # (CLIENT_ID, CLIENT_SECRET, VERSION and limit are notebook globals)
        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, limit)

        if categoryIds != '':
            url = url + '&categoryId={}'.format(categoryIds)

        # make the GET request
        response = requests.get(url).json()
        try:
            results = response['response']['venues']
        except KeyError:
            # show the failing request and continue with the next district
            print(url)
            print(response)
            continue

        # keep only relevant information for each venue that has a category
        for v in results:
            if not v.get('categories'):
                continue
            venues_list.append((
                name,
                lat,
                lng,
                v['name'],
                v['location']['lat'],
                v['location']['lng'],
                v['categories'][0]['name']
            ))

    nearby_venues = pd.DataFrame(venues_list, columns=['District',
                                                       'District Latitude',
                                                       'District Longitude',
                                                       'Venue',
                                                       'Venue Latitude',
                                                       'Venue Longitude',
                                                       'Venue Category'])
    return nearby_venues



In [11]:
limit = 500 # limit of number of venues returned by Foursquare API
radius = 800 # define radius
# replace the placeholders with your own Foursquare credentials; do not publish real keys
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'
VERSION = '20190406'
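
<p>API credentials can be kept out of the notebook altogether. The sketch below reads them from environment variables. The helper <code>load_foursquare_credentials</code> and the variable names <code>FOURSQUARE_CLIENT_ID</code> and <code>FOURSQUARE_CLIENT_SECRET</code> are assumptions chosen for this example, not names defined by Foursquare or by this project.</p>

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Hypothetical helper: read the Foursquare credentials from a mapping,
    by default the process environment."""
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError(
            'Set the environment variable {} before running the notebook'.format(missing))

# demonstration with a stand-in mapping instead of the real environment
demo_id, demo_secret = load_foursquare_credentials(
    {'FOURSQUARE_CLIENT_ID': 'my-id', 'FOURSQUARE_CLIENT_SECRET': 'my-secret'})
print(demo_id)
```

<p>In the notebook the call would be made without arguments, so that the values come from the environment of the Jupyter process.</p>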

In [54]:
#tourist sights
#we will load the list of Ljubljana sights
#on_bad_lines='skip' needs pandas >= 1.3; older versions use error_bad_lines=False
ljubljana_data = pd.read_csv("http://www.onbria.com/wp-content/uploads/2019/03/Ljubljana.csv",  sep=',', on_bad_lines='skip')
ljubljana_data

Unnamed: 0,Place,Lat,Long
0,Ljubljana Castle,46.049021,14.508629
1,Triple Bridge,46.051198,14.5062
2,Tivoli City park,46.052333,14.491331
3,Dragon Bridge,46.052134,14.510351
4,Presern monument,46.051416,14.506177
5,Robba Fountain,46.050114,14.506978
6,Krizanke,46.046465,14.503256
7,Metelkova,46.0564,14.5167
8,Ljubljana Central Market,46.05144,14.509869
9,National and University Library,46.047549,14.503796


In [79]:
#let's find the centroid of the Ljubljana sights
lj_sights = ljubljana_data.reset_index(drop=True)

lj_sights_lat_lng = lj_sights[['Lat','Long']]
#with a single cluster, the k-means center is the mean of all points
k_means = KMeans(init = "k-means++", n_clusters = 1, n_init = 10)
k_means.fit(lj_sights_lat_lng)

k_means_cluster_centers = k_means.cluster_centers_
df_k_sight = pd.DataFrame(k_means_cluster_centers)
df_k_sight['Name'] = ['Centroid sight']
df_4_sight = df_k_sight
df_4_sight.columns = ['Latitude', 'Longitude', 'Centroid']
df_4_sight

Unnamed: 0,Latitude,Longitude,Centroid
0,46.051653,14.503397,Centroid sight
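
<p>K-means minimizes the sum of squared distances to the cluster centers, so with a single cluster the center is simply the arithmetic mean of the points. The sketch below computes such a mean directly. The function name <code>centroid</code> is an assumption for this illustration, and only three of the sights are used, so the printed value is not expected to match the centroid of the full file.</p>

```python
def centroid(points):
    """Return the arithmetic mean of (latitude, longitude) pairs."""
    lats, lngs = zip(*points)
    return sum(lats) / len(lats), sum(lngs) / len(lngs)

# three of the sights from the table above, used only as an illustration
sights = [
    (46.049021, 14.508629),  # Ljubljana Castle
    (46.051198, 14.506200),  # Triple Bridge
    (46.052134, 14.510351),  # Dragon Bridge
]

sample_lat, sample_lng = centroid(sights)
print(round(sample_lat, 6), round(sample_lng, 6))
```

<p>With pandas the same mean could be obtained from <code>lj_sights[['Lat','Long']].mean()</code>, which avoids fitting a model for a single cluster.</p>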


In [75]:
# list of the Italian restaurants in the districts
# Foursquare category IDs used to find Italian-style venues
italian_restaurant_categories = ['4bf58dd8d48988d1c0941735','4bf58dd8d48988d110941735', '4bf58dd8d48988d1ca941735']
# query every category once and combine the results (DataFrame.append was removed in pandas 2.0)
frames = [getNearbyVenues(names=df3['Districts'], latitudes=df3['Latitude'], longitudes=df3['Longitude'], radius=800, categoryIds=it_restoran)
          for it_restoran in italian_restaurant_categories]
ljubljana_venues_italian = pd.concat(frames)

# lower-case copies of the columns, used in the analysis below
ljubljana_venues_italian['venue']=ljubljana_venues_italian['Venue']
ljubljana_venues_italian['district']=ljubljana_venues_italian['District']
ljubljana_venues_italian['venue category']=ljubljana_venues_italian['Venue Category']
# keep one row per venue name
ljubljana_venues_italian=ljubljana_venues_italian.groupby('Venue').first() 
print('There are {} Italian restaurants in Ljubljana.'.format(ljubljana_venues_italian.shape[0]))
ljubljana_venues_italian


There are 67 Italian restaurants in Ljubljana.


Venue,District,District Latitude,District Longitude,Venue Latitude,Venue Longitude,Venue Category,venue,district,venue category
Alegria,Bežigrad,46.071523,14.509137,46.06603,14.514992,Italian Restaurant,Alegria,Bežigrad,Italian Restaurant
Apertivo,Center,46.049815,14.506782,46.051403,14.504608,Italian Restaurant,Apertivo,Center,Italian Restaurant
As Aperitivo,Center,46.049815,14.506782,46.0514,14.50473,Mediterranean Restaurant,As Aperitivo,Center,Mediterranean Restaurant
Bežigrajski dvor,Bežigrad,46.071523,14.509137,46.065758,14.510303,Pizza Place,Bežigrajski dvor,Bežigrad,Pizza Place
Brazzera,Dravlje,46.08114,14.475201,46.074207,14.475384,Pizza Place,Brazzera,Dravlje,Pizza Place
Capriccio,Center,46.049815,14.506782,46.05235,14.511394,Pizza Place,Capriccio,Center,Pizza Place
Emonska klet,Center,46.049815,14.506782,46.050612,14.502283,Pizza Place,Emonska klet,Center,Pizza Place
Enjoy Italy,Center,46.049815,14.506782,46.05276,14.502481,Italian Restaurant,Enjoy Italy,Center,Italian Restaurant
Family'sPizzaExpress,Moste,46.057282,14.536756,46.056124,14.534386,Pizza Place,Family'sPizzaExpress,Moste,Pizza Place
Fany & Mary,Center,46.049815,14.506782,46.051823,14.508512,Bar,Fany & Mary,Center,Bar


In [15]:
print('There are {}  venues.'.format(len(ljubljana_venues_italian['venue'].unique())))

# Analyze each district
# one hot encoding
ljubljana_onehot = pd.get_dummies(ljubljana_venues_italian[['venue']], prefix="", prefix_sep="")
# add district column back to dataframe
ljubljana_onehot['district'] = ljubljana_venues_italian['district'] 

# move district column to the first column
fixed_columns = [ljubljana_onehot.columns[-1]] + list(ljubljana_onehot.columns[:-1])
ljubljana_onehot = ljubljana_onehot[fixed_columns]

#examine the new dataframe size after one hot encoding
print('{} rows were returned after one hot encoding.'.format(ljubljana_onehot.shape[0]))

#group rows by district, taking the mean of each restaurant column
ljubljana_grouped = ljubljana_onehot.groupby('district').mean().reset_index()

#examine the new dataframe size after grouping
print('{} rows were returned after grouping.'.format(ljubljana_grouped.shape[0]))
ljubljana_grouped

There are 67  venues.
67 rows were returned after one hot encoding.
14 rows were returned after grouping.


Unnamed: 0,district,Alegria,Apertivo,As Aperitivo,Bežigrajski dvor,Brazzera,Capriccio,Emonska klet,Enjoy Italy,Family'sPizzaExpress,Fany & Mary,Favola,Foculus,Foodie,Garaža,Gostilna Dubočica,Gostilna in pizzerija Rogovilc,Halo Janez,Halo Pinki - Moste,Julija,Kot Barbe Dimaria,La Storia Trattoria,Maxim,Mediterraneo,Medo bar,Meta In Bazilika,Mexico Mediterra Place,Mirjams Pub,Nolito,Paninoteka +,Piazza Del Papa,Picerija Osmica,Picestavracija Boccaccio,Pinki,Pinsa Rustika,Pivnica Kratochwill,Pizza Cutty,Pizza Hutt,Pizza dostava Novak,Pizzeria Barjan,Pizzeria Ljubljanski dvor,Pizzeria Luigi,Pizzeria Parma,Pizzeria Tunnel,Pizzeria Šestinka,Pizzerija Gregorino,Pizzerija Laterna,Pizzerija Papirus,Pizzerija Savlje,Pizzerija Soncek,Pizzerija Trnovski zvon,Prince of Orange,Promenada Pizza,Restavracija Allegria,Restavracija Angel,Restavracija Klub 300,Restavracija Tartuf,Restavracija in kavarna Element,Robin Food,Trappa,Trappica,Trta,Verace,Volta cafe,Za Pumpo,Zlata Ribica,pizza delivery,pr gapetu
0,Bežigrad,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667
1,Center,0.0,0.03125,0.03125,0.0,0.0,0.03125,0.03125,0.03125,0.0,0.03125,0.0,0.03125,0.03125,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.03125,0.03125,0.03125,0.0,0.03125,0.0,0.03125,0.0,0.03125,0.0,0.03125,0.0,0.03125,0.03125,0.0,0.03125,0.0,0.0,0.0,0.03125,0.0,0.03125,0.03125,0.03125,0.0,0.0,0.0,0.0,0.0,0.03125,0.03125,0.03125,0.03125,0.0,0.0,0.03125,0.0,0.03125,0.0,0.0,0.03125,0.03125,0.0,0.0,0.03125,0.0,0.0
2,Dravlje,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Golovec,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Jarše,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Moste,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0
6,Polje,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Posavje,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Rudnik,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Trnovo,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
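
<p>Every venue name occurs once after the grouping by venue, so the mean of a one-hot column within a district equals 1 / (number of venues in that district). For example, the value 0.03125 in the Center row is 1/32. The sketch below reproduces this relationship with plain counting on a handful of rows. The variable names are assumptions for this illustration, and the six rows are only a sample of the full result.</p>

```python
from collections import Counter

# (venue, district) pairs taken from the first rows of the restaurant table
venues = [
    ('Alegria', 'Bežigrad'),
    ('Apertivo', 'Center'),
    ('As Aperitivo', 'Center'),
    ('Bežigrajski dvor', 'Bežigrad'),
    ('Brazzera', 'Dravlje'),
    ('Capriccio', 'Center'),
]

# number of venues per district
counts = Counter(district for _, district in venues)

# share of a single venue inside its district, i.e. the value in the grouped table
shares = {district: 1 / n for district, n in counts.items()}

for district, n in counts.most_common():
    print(district, n, round(shares[district], 2))
```

<p>A smaller share therefore means more venues in the district, which is the reading used in the next step.</p>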


In [18]:
#print each district along with the top 5 most common venues
num_top_venues = 5

for district in ljubljana_grouped['district']:
    print("----"+district+"----")
    temp = ljubljana_grouped[ljubljana_grouped ['district'] == district].T.reset_index()
    temp.columns = ['venue','distribution']
    temp = temp.iloc[1:]
    temp['distribution'] = temp['distribution'].astype(float)
    temp = temp.round({'distribution': 2})
    print(temp.sort_values('distribution', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bežigrad----
                 venue  distribution
0              Alegria          0.17
1               Favola          0.17
2  Pizzerija Gregorino          0.17
3           Halo Janez          0.17
4            pr gapetu          0.17


----Center----
                 venue  distribution
0        Pinsa Rustika          0.03
1  La Storia Trattoria          0.03
2  Restavracija Tartuf          0.03
3     Meta In Bazilika          0.03
4          Mirjams Pub          0.03


----Dravlje----
                      venue  distribution
0  Picestavracija Boccaccio          0.25
1                  Brazzera          0.25
2        Restavracija Angel          0.25
3     Restavracija Klub 300          0.25
4                   Alegria          0.00


----Golovec----
               venue  distribution
0             Garaža           1.0
1            Alegria           0.0
2  Pizzeria Šestinka           0.0
3   Pizzerija Soncek           0.0
4   Pizzerija Savlje           0.0


----Jarše----
        

In [76]:
#each value above is a venue's share within its district (1 / number of venues in the district),
#so the district with the smallest value, Center (0.03, i.e. 1/32), has the most Italian restaurants
#the table lists the Italian restaurants in district Center
df4 = ljubljana_venues_italian
df4_c = df4[df4['district'] == 'Center'].reset_index(drop=True)
df5 = df4_c.drop(['District', 'District Latitude', 'District Longitude', 'Venue Category', 'district','venue category'], axis=1)
df5

Unnamed: 0,Venue Latitude,Venue Longitude,venue
0,46.051403,14.504608,Apertivo
1,46.0514,14.50473,As Aperitivo
2,46.05235,14.511394,Capriccio
3,46.050612,14.502283,Emonska klet
4,46.05276,14.502481,Enjoy Italy
5,46.051823,14.508512,Fany & Mary
6,46.048011,14.502157,Foculus
7,46.055904,14.504173,Foodie
8,46.047964,14.506177,Julija
9,46.055786,14.501945,La Storia Trattoria


In [81]:
# create map of Ljubljana using latitude and longitude values
# the map shows:
#  - all Italian restaurants in district Center
map_ljubljana_italian_center = folium.Map(location=[latitude, longitude], zoom_start=16)
 
# add markers to map: all Italian restaurants in district Center   
for lat, lng, local in zip(df5['Venue Latitude'], df5['Venue Longitude'], df5['venue']):
    label = '{}'.format(local)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=7,
        popup=label,
        color='green',
        fill=True,
        fill_color='red',
        fill_opacity=0.9).add_to(map_ljubljana_italian_center)  
    
map_ljubljana_italian_center    

In [78]:
#let's find the centroid of the Italian restaurants in district Center
lj_it_center = ljubljana_venues_italian[ljubljana_venues_italian['district'] == 'Center'].reset_index(drop=True)

lj_lat_lng_center = lj_it_center[['Venue Latitude','Venue Longitude']]
#with a single cluster, the k-means center is the mean of all points
k_means = KMeans(init = "k-means++", n_clusters = 1, n_init = 10)
k_means.fit(lj_lat_lng_center)

k_means_cluster_centers = k_means.cluster_centers_
df_k = pd.DataFrame(k_means_cluster_centers)
df_k['Name'] = ['Centroid Italian restaurant']
df_4 = df_k
df_4.columns = ['Latitude', 'Longitude', 'Centroid']
df_4


Unnamed: 0,Latitude,Longitude,Centroid
0,46.050697,14.506184,Centroid Italian restaurant
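
<p>The two centroids can be compared numerically as well as on a map. The sketch below estimates the great-circle distance between them with the haversine formula. The function name <code>haversine_m</code> and the Earth radius constant are assumptions of this example; the coordinates are the two centroid values printed in this notebook.</p>

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (assumed value)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

restaurants_centroid = (46.050697, 14.506184)  # centroid of Italian restaurants in Center
sights_centroid = (46.051653, 14.503397)       # centroid of the sights
distance = haversine_m(*restaurants_centroid, *sights_centroid)
print('{:.0f} m'.format(distance))
```

<p>A distance of a few hundred metres would indicate that the existing Italian restaurants in Center cluster close to the main sights.</p>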


In [71]:
# create map of Ljubljana using latitude and longitude values

# the map shows:
#  - the centroid of Italian restaurants in district Center,
#  - the centroid of the most important tourist sights in Ljubljana,
#  - all Italian restaurants in district Center,
#  - the most important tourist sights in Ljubljana


map_ljubljana_italian_centroid = folium.Map(location=[latitude, longitude], zoom_start=16)

# add markers to map: centroid of Italian restaurants in district Center    
for lat, lng, local in zip(df_4['Latitude'], df_4['Longitude'], df_4['Centroid']):
    label = '{}'.format(local)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=30,
        popup=label,
        color='red',
        fill=True,
        fill_color='black',
        fill_opacity=0.3).add_to(map_ljubljana_italian_centroid)      


# add markers to map: centroid of the most important touristic sights in Ljubljana
for lat, lng, local in zip(df_4_sight['Latitude'], df_4_sight['Longitude'], df_4_sight['Centroid']):
    label = '{}'.format(local)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=30,
        popup=label,
        color='yellow',
        fill=True,
        fill_color='black',
        fill_opacity=0.3).add_to(map_ljubljana_italian_centroid)  
    
# add markers to map: all Italian restaurants in district Center   
for lat, lng, local in zip(df5['Venue Latitude'], df5['Venue Longitude'], df5['venue']):
    label = '{}'.format(local)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=7,
        popup=label,
        color='green',
        fill=True,
        fill_color='red',
        fill_opacity=0.9).add_to(map_ljubljana_italian_centroid)  
    

# add markers to map: the most important tourist sights in Ljubljana
for lat, lng, place in zip(ljubljana_data['Lat'], ljubljana_data['Long'], ljubljana_data['Place']):
    label = folium.Popup(place, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=7,
        popup=label,
        color='green',
        fill=True,
        fill_color='yellow',
        fill_opacity=0.9).add_to(map_ljubljana_italian_centroid)      
    
map_ljubljana_italian_centroid