# Budapest restaurants and cuisines

This short project serves as practice with geospatial data, as well as a cool way to visualize the dominant cuisines in each district of Budapest. Data is obtained via [`OSMnx`](https://osmnx.readthedocs.io/en/stable/index.html); visualizations are created with `matplotlib.pyplot` and `geopandas`.

The question I'd like to answer isn't based on restaurant counts alone: some restaurants serve several cuisines, so each cuisine a restaurant lists is counted separately.

#### Import libraries

In [1]:
import pandas as pd
import osmnx as ox
import inflect # to create the district numbers

### Data gathering
#### Create a list of the districts of Budapest

This will be used to create the map, as well as to get all restaurants through `osmnx`.

In [2]:
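# Budapest has 23 districts, so the range must run from 1 to 23 inclusive
ordinal = inflect.engine().ordinal  # create the engine once, not on every iteration
budapest_districts = [ordinal(i) + ' District, Budapest, Hungary' for i in range(1, 24)]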
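
To make the generated query strings concrete, here is a minimal dependency-free sketch of what `inflect`'s `ordinal` produces for small integers. The helper `ordinal` below is a hypothetical stand-in written for illustration, not part of `inflect`.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 take 'th' even though they end in 1, 2 and 3.
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


# A few sample district names in the format used for geocoding
for i in (1, 2, 3, 11, 22):
    print(ordinal(i) + ' District, Budapest, Hungary')
```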

#### Get all restaurants from each district of Budapest

* Get all "restaurant" and "fast food" amenities with `osmnx` for each district
* Keep the `unique_id` and `cuisine` columns and add a flag for the districts

In [None]:
bp_cuisines = []
for district in budapest_districts:
    amenities = ox.geometries_from_place(district, tags={'amenity': ['restaurant', 'fast_food']})
    # .copy() so that adding a column below does not write to a slice of `amenities`
    cuisines = amenities[['unique_id', 'cuisine']].copy()
    cuisines['district'] = district
    bp_cuisines.append(cuisines)

bp_cuisines = pd.concat(bp_cuisines)
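
The next cell reads `bp_cuisines.csv`, but the notebook does not show how that file was written. Below is a minimal sketch of such a caching step. The frame `toy` is a hypothetical stand-in for the gathered data, and passing `index=False` is my assumption; saving with the default index would instead produce an extra `Unnamed: 0` column on reload.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the frame built by the loop above
toy = pd.DataFrame({
    'unique_id': ['node/339287577', 'node/412953469'],
    'cuisine': ['burger', None],
    'district': ['1st District, Budapest, Hungary'] * 2,
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'bp_cuisines.csv')
    toy.to_csv(path, index=False)   # skip the index so no extra column appears
    reloaded = pd.read_csv(path)    # missing cuisines come back as NaN

print(reloaded.columns.tolist())
```

Caching the result this way avoids querying OpenStreetMap again every time the notebook is rerun.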

In [3]:
bp_cuisines = pd.read_csv('bp_cuisines.csv')

In [4]:
bp_cuisines.head()

Unnamed: 0,unique_id,cuisine,district
0,node/339287577,burger,"1st District, Budapest, Hungary"
1,node/412953469,,"1st District, Budapest, Hungary"
2,node/412953471,chinese,"1st District, Budapest, Hungary"
3,node/427003905,,"1st District, Budapest, Hungary"
4,node/432490789,,"1st District, Budapest, Hungary"


* Some rows have no `cuisine` value (`NaN`); I'll drop those.

In [5]:
bp_cuisines.dropna(inplace=True)
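
The call above drops a row if any column is missing. A slightly narrower variant, sketched here on hypothetical toy data, restricts the check to the `cuisine` column with `subset`, so rows are removed only when the cuisine itself is absent.

```python
import pandas as pd

# Hypothetical toy data: the second restaurant has no cuisine tag
toy = pd.DataFrame({
    'unique_id': ['node/1', 'node/2', 'node/3'],
    'cuisine': ['burger', None, 'chinese'],
    'district': ['1st District, Budapest, Hungary'] * 3,
})

# Only a missing `cuisine` causes a row to be dropped
cleaned = toy.dropna(subset=['cuisine']).reset_index(drop=True)
print(cleaned)
```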

In [6]:
bp_cuisines['cuisine'].unique()

array(['burger', 'chinese', 'crepe', 'regional',
       'hungarian;breakfast;regional', 'hungarian;italian',
       'international', 'hungarian', 'vietnamese', 'korean;asian',
       'turkish', 'mexican', 'french', 'italian', 'hungarian;regional',
       'cake', 'italian_pizza', 'pizza;sandwich', 'regional;fine_dining',
       'burger;pizza;sandwich',
       'hungarian;burger;steak_house;coffee_shop;local', 'pizza;regional',
       'pizza', 'sandwich', 'greek', 'asian;chinese;deli',
       'hungarian;langos;burger', 'italian;pizza', 'japanese',
       'pizza;italian', 'italian;olasz;pizza',
       'pizza;ice_cream;italian;coffee;soup;pasta;dessert',
       'regional;coffee_shop;international', 'seafood',
       'burger;sandwich;grilled chicken;chicken slice;chips',
       'coffee_shop;pasta;sandwich', 'tapas;spanish', 'bbq', 'hot_dog',
       'indian', 'fish;fish_and_chips', 'international;burger',
       'hungarian;italian;mediterranean', 'chinese;japanese', 'kebab',
       'american'

* Many records concatenate multiple cuisines with `;`, so I'll have to break those out into separate rows.

In [14]:
print(bp_cuisines[bp_cuisines['cuisine'] == 'pizza;italian'])

            unique_id        cuisine                          district
136   node/2401121415  pizza;italian   2nd District, Budapest, Hungary
708   node/4340423047  pizza;italian   5th District, Budapest, Hungary
1886  node/6749969867  pizza;italian  12th District, Budapest, Hungary
2125  node/5650234396  pizza;italian  14th District, Budapest, Hungary
2277  node/5298281628  pizza;italian  18th District, Budapest, Hungary


In [16]:
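# Split on ';' into one column per cuisine (indexed by unique_id), then stack() the columns into rows
cuisine_split = pd.DataFrame(bp_cuisines['cuisine'].str.split(';').tolist(), index=bp_cuisines['unique_id']).stack()
# Move unique_id out of the index and name the two resulting columns
cuisine_split = cuisine_split.reset_index([0, 'unique_id'])
cuisine_split.columns = ['unique_id', 'cuisine_split']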
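
As an alternative to the split-and-stack approach above, the same reshaping can be sketched with `Series.str.split` followed by `DataFrame.explode` (available since pandas 0.25). This is not the method the notebook uses; the toy frame is hypothetical and only illustrates the result.

```python
import pandas as pd

# Hypothetical toy data with one multi-cuisine restaurant
toy = pd.DataFrame({
    'unique_id': ['node/1', 'node/2'],
    'cuisine': ['burger', 'pizza;italian'],
    'district': ['1st District, Budapest, Hungary',
                 '2nd District, Budapest, Hungary'],
})

exploded = (
    toy.assign(cuisine=toy['cuisine'].str.split(';'))  # string -> list of cuisines
       .explode('cuisine')                             # one row per list element
       .reset_index(drop=True)
)
print(exploded)
```

Because `explode` keeps the other columns on each new row, this variant would not need the separate merge step.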

* Now I'll join this dataframe back to the original `bp_cuisines` dataframe.

In [18]:
bp_cuisines = bp_cuisines.merge(cuisine_split, how='left', on='unique_id')
bp_cuisines.head()

Unnamed: 0,unique_id,cuisine,district,cuisine_split
0,node/339287577,burger,"1st District, Budapest, Hungary",burger
1,node/412953471,chinese,"1st District, Budapest, Hungary",chinese
2,node/648549895,crepe,"1st District, Budapest, Hungary",crepe
3,node/735178492,regional,"1st District, Budapest, Hungary",regional
4,node/735179122,hungarian;breakfast;regional,"1st District, Budapest, Hungary",hungarian


* I can get rid of the concatenated `cuisine` column and rename the `cuisine_split` column.

In [21]:
bp_cuisines = bp_cuisines.drop(columns=['cuisine']).rename(columns={'cuisine_split': 'cuisine'})
bp_cuisines.head()

Unnamed: 0,unique_id,district,cuisine
0,node/339287577,"1st District, Budapest, Hungary",burger
1,node/412953471,"1st District, Budapest, Hungary",chinese
2,node/648549895,"1st District, Budapest, Hungary",crepe
3,node/735178492,"1st District, Budapest, Hungary",regional
4,node/735179122,"1st District, Budapest, Hungary",hungarian


In [30]:
print(
    'There are {0} different restaurants in the dataset that serve {1} cuisines altogether, out of which {2} are unique.'.format(
        len(bp_cuisines['unique_id'].unique()),
        bp_cuisines.shape[0],
        len(bp_cuisines['cuisine'].unique())
    )
)

There are 1436 different restaurants in the dataset that serve 1854 cuisines altogether, out of which 133 are unique.


In [None]:
bp_cuisine_counts = bp_cuisines.groupby(['district', 'cuisine']).size().to_frame(name='count').reset_index()

In [None]:
bp_cuisine_counts[bp_cuisine_counts['district'] == '5th District, Budapest, Hungary'].sort_values('count', ascending=False)

In [None]:
# Pass 'max' as a string; passing the builtin `max` raises a FutureWarning in recent pandas
max_indices = bp_cuisine_counts.groupby('district')['count'].transform('max') == bp_cuisine_counts['count']
bp_cuisine_counts[max_indices].sort_values('district')
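
The selection above keeps every cuisine whose count equals its district's maximum, so a district with a tie appears more than once. A small sketch on hypothetical toy data (districts `A` and `B` are placeholders) shows this behaviour:

```python
import pandas as pd

# Hypothetical toy data: pizza wins in A, pizza and burger tie in B
toy = pd.DataFrame({
    'district': ['A', 'A', 'A', 'B', 'B'],
    'cuisine': ['pizza', 'pizza', 'burger', 'pizza', 'burger'],
})

counts = (
    toy.groupby(['district', 'cuisine'])
       .size()
       .to_frame(name='count')
       .reset_index()
)

# True where a row's count equals the maximum count within its district
is_max = counts['count'] == counts.groupby('district')['count'].transform('max')
top = counts[is_max].sort_values(['district', 'cuisine']).reset_index(drop=True)
print(top)
```

If exactly one cuisine per district is needed for the map, the ties would have to be resolved explicitly, for example by picking one alphabetically.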

### Visualization
#### Plot
Plot the district boundaries of Budapest.

In [None]:
places = ox.geocode_to_gdf(budapest_districts)
places = ox.project_gdf(places)
ax = places.plot()
_ = ax.axis('off')