# Dashboard

- Map with top 10 restaurants in the city by rating
- Map with top 10 restaurants in the city by review counts
- Does the rating change by postal code? Prove it
- Graph the distribution of restaurants rated > 4.5* with > 10 reviews, by category



In [0]:
data = spark.read.csv('/FileStore/tables/df_silver.csv', header = True)
display(data)

name,review_count,categories,rating,latitude,price,distance,longitude,postal_code
El Sur,769,Spanish,4.5,40.41105,2.0,1323.9351440693945,-3.6995454,28012
Alhambra,448,Spanish,4.0,40.41587,2.0,847.3790987480967,-3.7016,28012
La Bodega de Los Secretos,31,Spanish,4.5,40.410683,3.0,1375.936331500051,-3.694688,28014
Casa Julio,42,Spanish,4.5,40.42427,2.0,546.9655482728417,-3.70368,28004
Celso y Manolo,182,Spanish,4.5,40.420185,2.0,297.6040361411656,-3.6974769,28004
Taberna El Sur de Huertas,41,Spanish,5.0,40.41366,,1036.183034652163,-3.69943,28014
Malacatín,24,Spanish,4.5,40.41031,,1638.679922735727,-3.70765,28005
Más Al Sur,126,Spanish,4.5,40.41002,2.0,1409.3953170255295,-3.69693,28012
La Casa del Abuelo,161,Spanish,4.0,40.415768,2.0,859.0034856078098,-3.701606,28012
Oink,16,Spanish,4.5,40.42004,1.0,398.07076724778057,-3.700481,28013


In [0]:
data.dtypes

Out[31]: [('name', 'string'),
 ('review_count', 'string'),
 ('categories', 'string'),
 ('rating', 'string'),
 ('latitude', 'string'),
 ('price', 'string'),
 ('distance', 'string'),
 ('longitude', 'string'),
 ('postal_code', 'string')]

In [0]:
data = data.withColumn("rating", data["rating"].cast('float'))
data = data.withColumn("review_count", data["review_count"].cast('float'))
data = data.withColumn("price", data["price"].cast('int'))
data = data.withColumn("latitude", data["latitude"].cast('float'))
data = data.withColumn("longitude", data["longitude"].cast('float'))
data = data.withColumn("postal_code", data["postal_code"].cast('int'))
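
The casts above matter because every column is read as a string, and strings sort character by character rather than by value. A minimal plain-Python sketch, using the `review_count` values from the ten rows displayed earlier (the variable names are illustrative and not part of the notebook):

```python
# review_count values copied from the ten rows displayed above, still as strings
review_counts = ["769", "448", "31", "42", "182", "41", "24", "126", "161", "16"]

# Sorting strings compares character by character, so "42" ranks above "182"
as_strings = sorted(review_counts, reverse=True)

# Converting to numbers first gives the ordering the queries actually need
as_numbers = sorted((int(value) for value in review_counts), reverse=True)

print(as_strings)
print(as_numbers)
```

Without the cast, a "top 10 by review count" query would therefore rank restaurants by their leading digits instead of by how many reviews they have.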

In [0]:
! pip install folium
import folium



### Query 1
Map with top 10 restaurants in the city by rating.

In [0]:
# Sort by rating (highest first) and keep only the top rows before converting to pandas
limit = 10
q1 = data.sort("rating", ascending=False)
df1 = q1.limit(limit).toPandas()

In [0]:
df1

Unnamed: 0,name,review_count,categories,rating,latitude,price,distance,longitude,postal_code
0,Taberna El Sur de Huertas,41.0,Spanish,5.0,40.413658,,1036.183034652163,-3.69943,28014
1,Vinoteca Moratín,33.0,Spanish,5.0,40.412521,3.0,1168.8084348720097,-3.69541,28014
2,Algarabía,16.0,Spanish,5.0,40.41732,2.0,1245.5718114622448,-3.710374,28013
3,La Garriga,2.0,Spanish,5.0,40.426544,,560.7903673495854,-3.69306,28004
4,El Barril,21.0,Spanish,5.0,40.421989,3.0,630.0871640719441,-3.69023,28001
5,Los Arcos,6.0,Spanish,5.0,40.43787,3.0,1675.278430632665,-3.699283,28010
6,La Tasqueria,21.0,Spanish,5.0,40.422279,3.0,2019.6198202824544,-3.67374,28009
7,Angelita,29.0,Spanish,5.0,40.420277,3.0,372.5374556274898,-3.700387,28004
8,Lakasa,48.0,Spanish,5.0,40.440876,2.0,2020.5902312080937,-3.700715,28003
9,Moline's Grill,1.0,Burgers,5.0,40.408627,,1636.697580764222,-3.692657,28012
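
All ten rows above share the maximum rating of 5.0, so sorting by rating alone leaves the order among them arbitrary, and restaurants with one or two reviews rank alongside well-reviewed ones. One option, sketched here in plain Python on the names and review counts from the table above, is to break ties by review count. The tie-break rule is an assumption for illustration, not something the task specifies:

```python
# (name, review_count, rating) copied from the Query 1 output above
top_rated = [
    ("Taberna El Sur de Huertas", 41, 5.0),
    ("Vinoteca Moratín", 33, 5.0),
    ("Algarabía", 16, 5.0),
    ("La Garriga", 2, 5.0),
    ("El Barril", 21, 5.0),
    ("Los Arcos", 6, 5.0),
    ("La Tasqueria", 21, 5.0),
    ("Angelita", 29, 5.0),
    ("Lakasa", 48, 5.0),
    ("Moline's Grill", 1, 5.0),
]

# Sort by rating first, then by review count, both descending
ranked = sorted(top_rated, key=lambda row: (row[2], row[1]), reverse=True)

for name, review_count, rating in ranked:
    print(f"{rating:.1f}  {review_count:>3}  {name}")
```

In the notebook, the same idea would amount to passing both columns to the sort, for example `data.sort(["rating", "review_count"], ascending=False)`.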


In [0]:
# Madrid coordinates
latitude = 40.416775
longitude = -3.703790

# create map
madrid_map = folium.Map(location=[latitude, longitude], zoom_start=12)

In [0]:
# instantiate a feature group for the restaurants in the dataframe
restaurants = folium.map.FeatureGroup()

# loop through the top 10 restaurants and add each to the feature group
for lat, lng in zip(df1.latitude, df1.longitude):
    restaurants.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='blue',
            fill=True,
            fill_color='yellow',
            fill_opacity=0.6
        )
    )

# add pop-up text to each marker on the map
latitudes = list(df1.latitude)
longitudes = list(df1.longitude)
labels = list(df1.name)
for lat, lng, label in zip(latitudes, longitudes, labels):
    folium.Marker([lat, lng], popup=label).add_to(madrid_map) 
 
# add restaurants to map
madrid_map.add_child(restaurants)

### Query 2
Map with top 10 restaurants in the city by review counts

In [0]:
# Sort by review count (highest first) and keep only the top rows before converting to pandas
limit = 10
q2 = data.sort("review_count", ascending=False)
df2 = q2.limit(limit).toPandas()

In [0]:
df2

Unnamed: 0,name,review_count,categories,rating,latitude,price,distance,longitude,postal_code
0,El Sur,769.0,Spanish,4.5,40.411049,2.0,1323.9351440693945,-3.699545,28012
1,Botín,615.0,Spanish,4.0,40.414059,3.0,1304.2170679660362,-3.70803,28005
2,Alhambra,448.0,Spanish,4.0,40.415871,2.0,847.3790987480967,-3.7016,28012
3,Takos al Pastor,418.0,Mexican,4.5,40.418964,1.0,671.6237992836653,-3.703647,28013
4,El Tigre,366.0,Spanish,3.5,40.420254,1.0,290.9915674469935,-3.697913,28004
5,Juana la Loca,216.0,Mexican,4.5,40.411358,3.0,1715.626437811679,-3.711094,28005
6,Malaspina,210.0,Spanish,4.0,40.415821,1.0,880.0271748434934,-3.70253,28012
7,Celso y Manolo,182.0,Spanish,4.5,40.420185,2.0,297.6040361411656,-3.697477,28004
8,Rosi La Loca,180.0,Spanish,4.5,40.415813,3.0,906.7764467348136,-3.702979,28012
9,Federal,180.0,Burgers,4.0,40.427052,2.0,1090.3118768624156,-3.70923,28015


In [0]:
# Madrid coordinates
latitude = 40.416775
longitude = -3.703790

# create map
madrid_map = folium.Map(location=[latitude, longitude], zoom_start=12)

In [0]:
# instantiate a feature group for the restaurants in the dataframe
restaurants = folium.map.FeatureGroup()

# loop through the top 10 restaurants and add each to the feature group
for lat, lng in zip(df2.latitude, df2.longitude):
    restaurants.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='blue',
            fill=True,
            fill_color='yellow',
            fill_opacity=0.6
        )
    )

# add pop-up text to each marker on the map
latitudes = list(df2.latitude)
longitudes = list(df2.longitude)
labels = list(df2.name)
for lat, lng, label in zip(latitudes, longitudes, labels):
    folium.Marker([lat, lng], popup=label).add_to(madrid_map) 
 
# add restaurants to map
madrid_map.add_child(restaurants)
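
The two map cells above differ only in the dataframe they read. A minimal sketch of a shared helper follows; `marker_rows` and `build_restaurant_map` are hypothetical names introduced here, and the sketch assumes the dataframe has the `latitude`, `longitude` and `name` columns shown in the outputs above.

```python
# Madrid coordinates, as used in the cells above
MADRID = (40.416775, -3.703790)


def marker_rows(latitudes, longitudes, names):
    """Convert three columns into plain (lat, lng, label) tuples."""
    return [
        (float(lat), float(lng), str(name))
        for lat, lng, name in zip(latitudes, longitudes, names)
    ]


def build_restaurant_map(df, zoom_start=12):
    """Return a folium map with a circle and a pop-up marker per restaurant."""
    import folium  # imported here so marker_rows can be used without folium

    restaurant_map = folium.Map(location=list(MADRID), zoom_start=zoom_start)
    restaurants = folium.FeatureGroup(name="restaurants")
    rows = marker_rows(df["latitude"], df["longitude"], df["name"])
    for lat, lng, name in rows:
        # circle marker goes into the feature group
        folium.CircleMarker(
            location=[lat, lng],
            radius=5,
            color="blue",
            fill=True,
            fill_color="yellow",
            fill_opacity=0.6,
        ).add_to(restaurants)
        # pop-up marker with the restaurant name goes directly on the map
        folium.Marker([lat, lng], popup=name).add_to(restaurant_map)
    restaurants.add_to(restaurant_map)
    return restaurant_map
```

With this in place, each query would only need `build_restaurant_map(df1)` or `build_restaurant_map(df2)` as the last line of its cell.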

### Query 3
Does the rating change by postal code? Prove it

As the graph shows, the average rating does not change much with the postal code. Among the ten highest averages listed below, the values range only from 4.5 to 4.75, and eight of the ten are exactly 4.5.

In [0]:
q3 = data.withColumn("postal_code", data["postal_code"].cast('string'))
q3 = q3.groupBy('postal_code').mean('rating')
q3 = q3.sort("avg(rating)", ascending=False)
display(q3)

postal_code,avg(rating)
28045,4.75
28009,4.571428571428571
28002,4.5
28036,4.5
28007,4.5
28042,4.5
28006,4.5
28028,4.5
28003,4.5
28016,4.5


Databricks visualization. Run in Databricks to view.
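
A mean per postal code says nothing about how many restaurants it rests on or how spread out their ratings are, which matters when arguing that ratings are uniform. The following plain-Python sketch computes count, mean and standard deviation per postal code for the ten sample rows displayed at the top of the notebook. It is an illustration on that sample only, not a result for the full dataset.

```python
from collections import defaultdict
from statistics import mean, pstdev

# (postal_code, rating) pairs copied from the ten sample rows displayed earlier
sample = [
    ("28012", 4.5), ("28012", 4.0), ("28014", 4.5), ("28004", 4.5),
    ("28004", 4.5), ("28014", 5.0), ("28005", 4.5), ("28012", 4.5),
    ("28012", 4.0), ("28013", 4.5),
]

# Group the ratings by postal code
ratings_by_code = defaultdict(list)
for postal_code, rating in sample:
    ratings_by_code[postal_code].append(rating)

# count, mean and population standard deviation for each postal code
stats = {
    code: (len(ratings), mean(ratings), pstdev(ratings))
    for code, ratings in ratings_by_code.items()
}

for code, (count, avg, spread) in sorted(stats.items()):
    print(f"{code}: n={count}, mean={avg:.2f}, std={spread:.2f}")
```

In Spark, a comparable summary could be built with `groupBy('postal_code').agg(...)` using `count`, `avg` and `stddev_pop` from `pyspark.sql.functions`. A postal code whose high average comes from one or two restaurants is weaker evidence than one backed by dozens.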

### Query 4
Graph the distribution of restaurants rated > 4.5* with > 10 reviews, by category

In [0]:
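# Keep restaurants with a rating of at least 4.5 and at least 10 reviews (both bounds inclusive)
q4 = data.filter((data.rating >= 4.5) & (data.review_count >= 10))
# Count the remaining restaurants per category
q4 = q4.groupBy('categories').count()
display(q4)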

categories,count
Mexican,12
Spanish,33
Burgers,14
Italian,15


Databricks visualization. Run in Databricks to view.

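Spanish is the largest category among restaurants with a rating of at least 4.5 and at least 10 reviews: 33 of the 74 restaurants counted above (about 45%), ahead of Italian (15), Burgers (14) and Mexican (12).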
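
The task statement asks for ratings above 4.5 and more than 10 reviews, while the filter in the cell above uses `>=` for both. Because the ratings shown in this notebook come in half-star steps, the two readings can give very different counts. A small plain-Python sketch on the ten sample rows displayed at the top of the notebook (illustrative only, not the full dataset):

```python
from collections import Counter

# (category, rating, review_count) copied from the ten sample rows displayed earlier
sample = [
    ("Spanish", 4.5, 769), ("Spanish", 4.0, 448), ("Spanish", 4.5, 31),
    ("Spanish", 4.5, 42), ("Spanish", 4.5, 182), ("Spanish", 5.0, 41),
    ("Spanish", 4.5, 24), ("Spanish", 4.5, 126), ("Spanish", 4.0, 161),
    ("Spanish", 4.5, 16),
]

# Inclusive thresholds, as in the notebook's filter
inclusive = Counter(
    category for category, rating, reviews in sample
    if rating >= 4.5 and reviews >= 10
)

# Strict thresholds, as in the literal wording of the task
strict = Counter(
    category for category, rating, reviews in sample
    if rating > 4.5 and reviews > 10
)

print(inclusive)
print(strict)
```

On this sample the inclusive reading keeps eight restaurants and the strict reading keeps one, so it is worth stating which of the two the chart is based on.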