I am a foodie looking for a place to stay in Bahrain. I want to study certain areas in Bahrain and the kinds of restaurants that surround them.

I think that a lot of people, not just the youth, could benefit from this, since the issue isn't just finding a decent place to stay in Bahrain, but finding one that best serves their culinary interests.

There are obviously more important factors to look at besides food. However, for this problem I want to stick to what I can gain from **Foursquare** with a free license. By neatly categorizing areas based on attributes such as the frequency of coffee shops, closeness to malls, etc., I can make a better guesstimate of where they might stay.

Foursquare allows us to grab information on venues surrounding a given location. We will therefore look at the most frequent kinds of venues surrounding each area, and cluster the areas on that basis.

So let's get started!


<!-- ## Methodology

Since our first and ultimate goal revolves around working with data which are vastly numeric in type such as latitude and longitude values for coordinates, I will leverage the [Pandas](https://pandas.pydata.org/) library to store, load and process this data. We gather the areas in a pandas dataframe for easy processing, and attempt to collect coordinates that can be processed efficiently using such a library.
In order to get the most trusted list of places in Bahrain, we will refer to this [Wikipedia page](https://en.wikipedia.org/wiki/Category:Populated_places_in_Bahrain) as a source to begin with. 

We can extract pieces of text from websites using a popular HTML-parsing library called [Beautiful Soup](https://beautiful-soup-4.readthedocs.io/en/latest/). We can use it to filter the text from the site for areas in Bahrain. After doing this, we form our first dataframe, called `bh_data`, with that data.

For further consideration, we will need to ask questions such as, "Does it cover all areas mentioned on the Wikipedia page?" or "Are there ambiguous or confusing elements in those locations, like repeated points or missing locations?". Upon answering these, we can proceed further.

After we are confident that we have the names of all major areas in Bahrain, we have to geocode them: a process that converts them from a string to latitude & longitude values marking their location on a map. We can use popular geocoding APIs such as [Open Street Map](https://www.openstreetmap.org/) and [Map Quest](https://www.mapquest.com/). We use both of them, since they're the least difficult to get a free account with, and so that we have a fallback if the name of a particular location doesn't return a valid result. After geocoding, we store the coordinates in our existing dataframe `bh_data` as seen below. Our process of data collection has now ended.

| Index |    Area   |  Latitude |  Longitude  |
|:-----:|:---------:|:---------:|:-----------:|
|   0   | A'ali     | 26.154454 | 50.527364   |
|   1   | Abu Baham | 26.205737 | 50.541668   |
|   2   | Abu Saiba | 30.325299 | 48.266157   |
|   3   | Al Garrya | 20.639623 | -100.477387 |
|   4   | Al Hajar  | 26.225405 | 50.59013    |

For the sake of visual analysis, we must utilize mapping libraries like [Folium](https://python-visualization.github.io/folium/) to plot our areas as points on a map, so we can better visualize our findings. This will aid us in our analysis as we can better synthesize our results.

For the second part, we will look at [Foursquare](https://foursquare.com/) to get us the surrounding food places within 500m of the area's center (an arbitrary radius to avoid collision with venues present in other areas).

Foursquare is a service that provides location-based data on venues, their reviews, and the users who wrote those reviews.
For our use-case, we will look only at the list of food places within each area in Bahrain and their food category.

For the third part of our project, our business problem relies on segmenting areas based on the most common type of food places within the area. This gives us an idea about the type of area it is from a culinary point-of-view, and allows us to make judgments on whether the food is ideal to our taste or not. We also want to factor in the total number of food places within an area since some places in Bahrain may not be ideal to live in if they don't even have enough places to eat.
 -->

In [35]:
#| include: false
# IMPORTING LIBRARIES
import os
import yaml # Load config file

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from bs4 import BeautifulSoup # To scrape data from Wikipedia

import requests # library to handle requests

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

from IPython.display import Markdown as md # Convert Python Strings to Markdown text


print('Libraries imported.')

Libraries imported.


In [2]:
#| include: false
def MapRender(save_folder):
    """Return a function that saves each Folium map as HTML and embeds it in an iframe."""
    count = 1
    def inner(m):
        nonlocal count
        filename = str(count) + '.html'
        count += 1
        load_path = os.path.join('assets', save_folder)
        save_path = '../' + load_path

        # Create the folder the map is actually saved to (no error if it already exists)
        os.makedirs(save_path, exist_ok=True)

        # Fall back to an empty base URL when the site config file is missing
        baseurl = ''
        if os.path.exists('../_config.yml'):
            with open('../_config.yml', 'r') as file:
                configvars = yaml.load(file, Loader=yaml.FullLoader)
                baseurl = configvars['baseurl']

        m.save(os.path.join(save_path, filename))

        iframe = f'<iframe src="{baseurl}/{os.path.join(load_path, filename)}" width="100%" height="500px"></iframe>'

        return md(iframe)
    return inner
render = MapRender(save_folder='bahrain-segmentation')
    

## Scrape Bahrain Cities/Towns Data from Wikipedia

I need to scrape data from **Wikipedia** to look up towns and cities in Bahrain. We're going to use the popular HTML-parsing library **Beautiful Soup** to do that.

In [3]:
url = 'https://en.wikipedia.org/wiki/Category:Populated_places_in_Bahrain'
html_doc = requests.get(url).text # Get HTML Doc
soup = BeautifulSoup(html_doc, 'html.parser') # Parse using bs4
blocks = soup.find_all("div", {"class": "mw-category-group"})[1:]

bh_data=[]
for block in blocks:
    places = block.find('ul').find_all('li')
    for place in places:
        bh_data.append(place.a.text.split(',')[0])

bh_data = pd.DataFrame(bh_data, columns=['Area'])
remove_places = ['Rifa and Southern Region', 'Northern City'] # Exclude these places
bh_data = bh_data[bh_data['Area'].apply(lambda item : item not in remove_places)].reset_index(drop=True)
bh_data.head(5)

Unnamed: 0,Area
0,A'ali
1,Abu Baham
2,Abu Saiba
3,Al Garrya
4,Al Hajar


In [4]:
#| echo: false
md(f"So there are about {bh_data.shape[0]} areas in Bahrain to study.")

So there are about 82 areas in Bahrain to study.

## Retrieving Coordinates via a Geocoder

After that, we need to geocode them, i.e. convert each place name to *latitude & longitude* values.

Two popular geocoders, OpenStreetMap and MapQuest, will be used: OpenStreetMap is tried first, with MapQuest as the fallback.

In [5]:
#| output: false
import geocoder

apikey = "API-KEY-XXXXXXXXXXX"

lats = []
lngs = []
for city in bh_data['Area']:
    geocoder_type = 'osm'
    try:
        # Try OpenStreetMap first
        g = geocoder.osm(f"{city}, Bahrain", key=apikey)
        geodata = g.json
        lats.append(geodata['lat'])
    except Exception:
        # Fall back to MapQuest if the OpenStreetMap lookup fails
        geocoder_type = 'MAPQUEST'
        g = geocoder.mapquest(f"{city}, Bahrain", key=apikey)
        geodata = g.json
        lats.append(geodata['lat'])
    lngs.append(geodata['lng'])
    print(city, "|", geocoder_type)
bh_data['Latitude'] = lats
bh_data['Longitude'] = lngs

A'ali | osm
Abu Baham | osm
Abu Saiba | osm
Al Garrya | MAPQUEST
Al Hajar | osm
Al Kharijiya | MAPQUEST
Al Markh | osm
Al Musalla | osm
Al Qadam | MAPQUEST
Al Qala | osm
Al Qurayyah | MAPQUEST
Amwaj Islands | osm
Arad | osm
Askar | osm
Awali | osm
Budaiya | osm
Jid Ali | osm
Bahrain Bay | osm
Bani Jamra | MAPQUEST
Barbar | osm
Bilad Al Qadeem | osm
Bu Quwah | osm
Buri | osm
Busaiteen | osm
Al Daih | osm
Al Dair | osm
Dar Kulaib | osm
Diplomatic Area | osm
Diraz | osm
Diyar Al Muharraq | osm
Dumistan | MAPQUEST
Durrat Al Bahrain | osm
East Hidd City | osm
Eker | osm
Galali | osm
Al Hidd | osm
Halat Bu Maher | osm
Halat Nuaim | MAPQUEST
Hamad Town | osm
Hamala | osm
Hawar Islands | osm
Hillat Abdul Saleh | MAPQUEST
Isa Town | osm
Janabiyah | osm
Jannusan | osm
Jasra | osm
Jaww | osm
Jid Al-Haj | osm
Jidhafs | osm
Jurdab | MAPQUEST
Karbabad | osm
Karrana | MAPQUEST
Karzakan | osm
Khamis | osm
Ma'ameer | osm
Mahazza | MAPQUEST
Malkiya | osm
Manama | osm
Marquban | MAPQUEST
Muharraq | osm
M

These are the first few of them that were geocoded!

In [6]:
#| echo: false
bh_data.head()

Unnamed: 0,Area,Latitude,Longitude
0,A'ali,26.154454,50.527364
1,Abu Baham,26.205737,50.541668
2,Abu Saiba,30.325299,48.266157
3,Al Garrya,26.23269,50.57811
4,Al Hajar,26.225405,50.590138
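
Geocoders can match a name to a place in another country, and the row for Abu Saiba above (latitude 30.3, longitude 48.3) looks like such a case. A minimal sketch of a sanity check follows, assuming a rough bounding box for Bahrain. The bounds and the helper name `flag_outside_bbox` are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd

# Rough bounding box for Bahrain (assumed values, padded generously)
BH_LAT_RANGE = (25.5, 26.4)
BH_LNG_RANGE = (50.2, 51.0)

def flag_outside_bbox(df, lat_col="Latitude", lng_col="Longitude"):
    """Return the rows whose coordinates fall outside the assumed bounding box."""
    inside = (
        df[lat_col].between(*BH_LAT_RANGE)
        & df[lng_col].between(*BH_LNG_RANGE)
    )
    return df[~inside]

# Sample rows copied from the table above
sample = pd.DataFrame({
    "Area": ["A'ali", "Abu Baham", "Abu Saiba", "Al Garrya", "Al Hajar"],
    "Latitude": [26.154454, 26.205737, 30.325299, 26.23269, 26.225405],
    "Longitude": [50.527364, 50.541668, 48.266157, 50.57811, 50.590138],
})
suspicious = flag_outside_bbox(sample)
print(suspicious)
```

Rows flagged this way could be re-geocoded with the other provider or corrected by hand before querying Foursquare.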


In [7]:
#| include: false
# Let's store them as part of the project so we don't have to re-do this step

# Store coordinates as CSV
# bh_data.to_csv("assets/bahrain-locations.csv", index=False)

In [36]:
#| include: false
# Load Coordinates Data
bh_data = pd.read_csv("bahrain-locations.csv")

## Visualization on a Map

We will now use **Folium** to visualize the map of Bahrain, with each area plotted as a point on the map.

In [37]:
# create map of Bahrain using latitude and longitude values
latitude, longitude = 26.0766404, 50.334118

map_bahrain = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, city in zip(bh_data['Latitude'], bh_data['Longitude'],
                                           bh_data['Area']):
    
    label = folium.Popup(city, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_bahrain)
map_bahrain

## Foursquare: Exploring Areas for Food Places

Furthermore, we'll leverage the Foursquare API to gather the most common types of restaurants within 500m of each area's center. We'll then look at the various food places and restaurants and extract their types for further analysis.

Note: To filter only restaurants & food places, we will use the specific "Food" category ID: `4d4b7105d754a06374d81259`

In [11]:
food_categoryId = "4d4b7105d754a06374d81259"

In [12]:
#| include: false
CLIENT_ID = 'CLIENT_ID_XXXXXXXXXXXXXXXXXXXXXXX' # My Foursquare ID
CLIENT_SECRET = 'CLIENT_SECRET_XXXXXXXXXXXXXX' # My Foursquare Secret
ACCESS_TOKEN = 'ACCESS_TOKEN_XXXXXXXXXXXX' # My FourSquare Access Token
VERSION = '20180604'
LIMIT = 1000

Alright, let's look at **all food places** surrounding the first area within a **500m** radius

In [38]:
#| echo: false
md(f"... which happens to be {bh_data.loc[0,'Area']}")

... which happens to be A'ali

In [14]:
radius = 500
lat, lng = bh_data[['Latitude', 'Longitude']].iloc[0].values

url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, food_categoryId, radius, LIMIT)
results = requests.get(url).json()
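
The query string can also be assembled from a dictionary, which keeps each parameter readable and takes care of URL encoding. This is a sketch using only the standard library; `build_search_url` is a hypothetical helper and the credentials are placeholders.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"  # endpoint used above

def build_search_url(lat, lng, category_id, client_id, client_secret,
                     version="20180604", radius=500, limit=1000):
    """Build the venue-search URL with the same parameters as the request above."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "categoryId": category_id,
        "radius": radius,
        "limit": limit,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"

example_url = build_search_url(
    26.154454, 50.527364,
    category_id="4d4b7105d754a06374d81259",
    client_id="CLIENT_ID_PLACEHOLDER",
    client_secret="CLIENT_SECRET_PLACEHOLDER",
)
print(example_url)
```

Passing the same dictionary as `params=` to `requests.get` should produce an equivalent request without building the string by hand.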

Looking at the first food place in the `results` json, we get this output:

In [15]:
results['response']['venues'][0]

{'id': '4e99da8f8231878c15393aa2',
 'name': 'Costa Coffee',
 'location': {'lat': 26.157464331750106,
  'lng': 50.52587327276449,
  'labeledLatLngs': [{'label': 'display',
    'lat': 26.157464331750106,
    'lng': 50.52587327276449}],
  'distance': 366,
  'cc': 'BH',
  'city': 'Madīnat ‘Īsá',
  'state': 'al Muḩāfaz̧ah Al Janūbīyah',
  'country': 'البحرين',
  'formattedAddress': ['Madīnat ‘Īsá', 'البحرين']},
 'categories': [{'id': '4bf58dd8d48988d1e0931735',
   'name': 'Coffee Shop',
   'pluralName': 'Coffee Shops',
   'shortName': 'Coffee Shop',
   'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/coffeeshop_',
    'suffix': '.png'},
   'primary': True}],
 'referralId': 'v-1631999729',
 'hasPerk': False}

In [16]:
#| echo: false
venue = results['response']['venues'][0]
name = venue['name']
category = venue['categories'][0]['name']
md(f"The first venue is {name}, and has the category `{category}`")

The first venue is Costa Coffee, and has the category `Coffee Shop`

So now let's build a helpful function to **extract the category** of each food place. We'll use the same area as an example.

In [17]:
# function that extracts the category of the restaurant
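def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the 'venue.categories' key when 'categories' is absent
        categories_list = row['venue.categories']

    # a venue may have no category at all
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']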
    
venues = results['response']['venues']
    
nearby_food = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['name', 'categories', 'location.lat', 'location.lng']
nearby_food = nearby_food.loc[:, filtered_columns]

# filter the category for each row
nearby_food['categories'] = nearby_food.apply(get_category_type, axis=1)

# clean columns
nearby_food.columns = [col.split(".")[-1] for col in nearby_food.columns]

nearby_food.head()

Unnamed: 0,name,categories,lat,lng
0,Costa Coffee,Coffee Shop,26.157464,50.525873
1,Chilis Aali,Diner,26.152996,50.526268
2,Loop Cafe,Café,26.156017,50.531527
3,Hospital Resturant (كافيتيريا المستشفى),Restaurant,26.153012,50.526232
4,كفتيريا المستشفى,Restaurant,26.153455,50.528375


In [18]:
#| echo: false
md(f"These are some of them, in total it returns **{nearby_food.shape[0]}** "
"food places around "
f"**{bh_data.loc[0,'Area']}**.")

These are some of them, in total it returns **19** food places around **A'ali**.

## Exploring All Areas

In [39]:
#| echo: false
md(f"We've still got {bh_data.shape[0]} places to explore, so let's create a function to do this task much faster.")

We've still got 82 places to explore, so let's create a function to do this task much faster.

In [20]:
def getNearbyFoods(names, latitudes, longitudes, radius=500):
    food_categoryId = "4d4b7105d754a06374d81259"
    foods_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            food_categoryId,
            radius, 
            LIMIT)
            
        # make the GET request
        response = requests.get(url).json()
        try:
            results = response["response"]['venues']
        except KeyError:
            # show the raw API response (e.g. an error message) before failing
            print(response)
            raise
        
        venue_list = []
        # return only relevant information for each nearby food place
        for v in results:
            vname, vlat, vlng = v['name'], v['location']['lat'], v['location']['lng']
            try:
                vcategory = v['categories'][0]['name']
                venue_list.append((name, 
                                    lat, 
                                    lng,
                                    vname, 
                                    vlat,
                                    vlng,
                                    vcategory))
            except (IndexError, KeyError):
                # skip venues that have no category
                continue
        foods_list.append(venue_list)
    nearby_foods = pd.DataFrame([item for venue_list in foods_list for item in venue_list])
    nearby_foods.columns = ['Area', 
                  'Area Latitude', 
                  'Area Longitude',
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_foods

We run the above function on each area and create a new dataframe called `bh_food`.

In [21]:
#| output: false
bh_food = getNearbyFoods(bh_data['Area'], bh_data['Latitude'], 
                                   bh_data['Longitude'], 500)

A'ali
Abu Baham
Abu Saiba
Al Garrya
Al Hajar
Al Kharijiya
Al Markh
Al Musalla
Al Qadam
Al Qala
Al Qurayyah
Amwaj Islands
Arad
Askar
Awali
Budaiya
Jid Ali
Bahrain Bay
Bani Jamra
Barbar
Bilad Al Qadeem
Bu Quwah
Buri
Busaiteen
Al Daih
Al Dair
Dar Kulaib
Diplomatic Area
Diraz
Diyar Al Muharraq
Dumistan
Durrat Al Bahrain
East Hidd City
Eker
Galali
Al Hidd
Halat Bu Maher
Halat Nuaim
Hamad Town
Hamala
Hawar Islands
Hillat Abdul Saleh
Isa Town
Janabiyah
Jannusan
Jasra
Jaww
Jid Al-Haj
Jidhafs
Jurdab
Karbabad
Karrana
Karzakan
Khamis
Ma'ameer
Mahazza
Malkiya
Manama
Marquban
Muharraq
Muqaba
Muqsha
Nabih Saleh
Nurana Islands
Nuwaidrat
Riffa
Reef Island
Sadad
Sakhir
Salmabad
Samaheej
Sanad
Sar
Sehla
Shahrakan
Shakhura
Sitra
Sufala
Tashan
Tubli
Umm an Nasan
Zallaq


In [22]:
bh_food.head()

Unnamed: 0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,A'ali,26.154454,50.527364,Costa Coffee,26.157464,50.525873,Coffee Shop
1,A'ali,26.154454,50.527364,Chilis Aali,26.152996,50.526268,Diner
2,A'ali,26.154454,50.527364,Loop Cafe,26.156017,50.531527,Café
3,A'ali,26.154454,50.527364,كفتيريا المستشفى,26.153455,50.528375,Restaurant
4,A'ali,26.154454,50.527364,Hospital Resturant (كافيتيريا المستشفى),26.153012,50.526232,Restaurant


In [40]:
#| include: false
# Load Coordinates Data
# bh_food = pd.read_csv("bahrain-foods.csv")

In [42]:
#| echo: false
md(f"This gives us a whopping {bh_food.shape[0]} food places")

This gives us a whopping 1962 food places

We can study the count for each area...


In [24]:
bh_food.groupby('Area').count().head()

Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
A'ali,19,19,19,19,19,19
Abu Baham,9,9,9,9,9,9
Al Daih,50,50,50,50,50,50
Al Dair,9,9,9,9,9,9
Al Garrya,50,50,50,50,50,50


We've trimmed out the remaining areas for brevity's sake.

In [25]:
#| echo: false
md(f"What's interesting to notice from this data, is that there are "
    f"{len(bh_food['Venue Category'].unique())} unique categories for food.")

What's interesting to notice from this data, is that there are 88 unique categories for food.

In [26]:
#| echo: false
md(f"Some of them include: {', '.join(bh_food['Venue Category'].unique()[:5])} "
  "and so on.")

Some of them include: Coffee Shop, Diner, Café, Restaurant, Breakfast Spot and so on.

## Most Common Food Places
Our solution relies on segmenting areas based on the most common type of food places within that area. This gives us an idea about the kind of area it is from a culinary point-of-view, and allows us to make judgments on whether the food is ideal to our taste or not. We also want to factor in the total number of food places within an area since some places in Bahrain may not be ideal to live in if they don't even have enough places to eat.

Using the dataframe `bh_food`, we one-hot encode the `Venue Category` field, which produces a new column for each category. Each record in this table corresponds to a single venue, with a `1` placed in the column of that venue's category and `0` everywhere else. The only other field that is retained is the area name. We will call this `bh_onehot`.

In [27]:
# one hot encoding
bh_onehot = pd.get_dummies(bh_food[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
bh_onehot = pd.concat([bh_food[['Area']], bh_onehot], axis=1) 

bh_onehot.head()

Unnamed: 0,Area,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bistro,Breakfast Spot,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Cafeteria,Café,Chaat Place,Chinese Restaurant,Coffee Shop,College Lab,Comfort Food Restaurant,Creperie,Cuban Restaurant,Cupcake Shop,Deli / Bodega,Dessert Shop,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Food,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Gas Station,Gastropub,Greek Restaurant,Halal Restaurant,Hookah Bar,Hot Dog Joint,Ice Cream Shop,Indian Restaurant,Iraqi Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Korean Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Moroccan Restaurant,Movie Theater,New American Restaurant,Noodle House,Pastry Shop,Persian Restaurant,Pie Shop,Pizza Place,Portuguese Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Shawarma Place,Snack Place,South Indian Restaurant,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,A'ali,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,A'ali,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,A'ali,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,A'ali,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,A'ali,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [28]:
#| include: false
# And let's examine the new dataframe size.
bh_onehot.shape

(1962, 89)

Now, let's group the rows by area, taking the mean of each category column to get its frequency of occurrence, and add the number of food places surrounding each area (`NumberOfFoodPlaces`).


Looking at the number of food places is significant, since some areas have far fewer restaurants than others; it could be a valid factor to consider if a "foodie" is looking for a place to stay.

In [29]:
bh_grouped = bh_onehot.groupby(['Area']).mean().reset_index()
# groupby sorts the areas the same way as the mean above, so the counts line up row by row
bh_grouped['NumberOfFoodPlaces'] = bh_onehot.groupby('Area').size().values
bh_grouped.head()

Unnamed: 0,Area,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bistro,Breakfast Spot,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Cafeteria,Café,Chaat Place,Chinese Restaurant,Coffee Shop,College Lab,Comfort Food Restaurant,Creperie,Cuban Restaurant,Cupcake Shop,Deli / Bodega,Dessert Shop,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Food,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Gas Station,Gastropub,Greek Restaurant,Halal Restaurant,Hookah Bar,Hot Dog Joint,Ice Cream Shop,Indian Restaurant,Iraqi Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Korean Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Moroccan Restaurant,Movie Theater,New American Restaurant,Noodle House,Pastry Shop,Persian Restaurant,Pie Shop,Pizza Place,Portuguese Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Shawarma Place,Snack Place,South Indian Restaurant,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint,NumberOfFoodPlaces
0,A'ali,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.210526,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.157895,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,19
1,Abu Baham,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9
2,Al Daih,0.0,0.0,0.02,0.0,0.02,0.02,0.0,0.14,0.0,0.1,0.0,0.0,0.04,0.0,0.02,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.04,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.16,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.06,0.0,0.02,0.0,0.0,0.0,0.0,0.04,0.02,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,50
3,Al Dair,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.555556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9
4,Al Garrya,0.0,0.0,0.02,0.02,0.0,0.02,0.0,0.02,0.0,0.12,0.0,0.0,0.02,0.0,0.02,0.06,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.06,0.0,0.0,0.02,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.18,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,50
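
To see why the mean of one-hot columns gives a frequency, here is a toy example with made-up venues (the data is illustrative only): an area with two cafes and one bakery ends up with 2/3 and 1/3 in the respective columns.

```python
import pandas as pd

# Made-up venues for two hypothetical areas
toy = pd.DataFrame({
    "Area": ["X", "X", "X", "Y"],
    "Venue Category": ["Cafe", "Cafe", "Bakery", "Bakery"],
})

# One column per category, 1 where the venue belongs to it
toy_onehot = pd.get_dummies(toy[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
toy_onehot = pd.concat([toy[["Area"]], toy_onehot], axis=1)

# Averaging the 0/1 columns per area gives each category's share of venues
toy_grouped = toy_onehot.groupby("Area").mean()
toy_grouped["NumberOfFoodPlaces"] = toy_onehot.groupby("Area").size()
print(toy_grouped)
```

Because the shares in each row sum to 1, two areas with very different venue counts can still look identical to the clustering step, which is why the count is kept as a separate column.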


Let's call this `bh_grouped`. Now that we have this processed information, we can analyze the data more clearly by reordering it so that only the 10 most common types of food places for each area are retained.

In [30]:
#| include: false
# Let's confirm the new size
bh_grouped.shape

(72, 90)

In [31]:
# Function to sort venues by most common ones
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:-1]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Area', 'NumberOfFoodPlaces']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Food Place'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond 3rd, fall back to the 'th' suffix
        columns.append('{}th Most Common Food Place'.format(ind+1))

# create a new dataframe
foods_sorted = pd.DataFrame(columns=columns)
foods_sorted[['Area','NumberOfFoodPlaces']] = bh_grouped[['Area','NumberOfFoodPlaces']]

for ind in np.arange(bh_grouped.shape[0]):
    foods_sorted.iloc[ind, 2:] = return_most_common_venues(bh_grouped.iloc[ind, :], num_top_venues)

# Preview the result
foods_sorted.head()

Unnamed: 0,Area,NumberOfFoodPlaces,1st Most Common Food Place,2nd Most Common Food Place,3rd Most Common Food Place,4th Most Common Food Place,5th Most Common Food Place,6th Most Common Food Place,7th Most Common Food Place,8th Most Common Food Place,9th Most Common Food Place,10th Most Common Food Place
0,A'ali,19,Café,Restaurant,Coffee Shop,Cupcake Shop,Breakfast Spot,Food,Sandwich Place,Falafel Restaurant,Middle Eastern Restaurant,Bakery
1,Abu Baham,9,Middle Eastern Restaurant,Cafeteria,Ice Cream Shop,Donut Shop,BBQ Joint,Restaurant,Fish & Chips Shop,Mediterranean Restaurant,Afghan Restaurant,New American Restaurant
2,Al Daih,50,Middle Eastern Restaurant,Bakery,Breakfast Spot,Dessert Shop,Café,Restaurant,Turkish Restaurant,Burger Joint,Diner,Steakhouse
3,Al Dair,9,Bakery,Restaurant,BBQ Joint,Italian Restaurant,Fast Food Restaurant,Afghan Restaurant,Lebanese Restaurant,Noodle House,New American Restaurant,Movie Theater
4,Al Garrya,50,Restaurant,Breakfast Spot,Indian Restaurant,Coffee Shop,Filipino Restaurant,Café,Fried Chicken Joint,Fast Food Restaurant,Diner,Middle Eastern Restaurant


Let's call this table `foods_sorted`.

## Cluster Areas


Now we are ready for further analysis and clustering. We will use the `bh_grouped` dataframe since it contains the necessary numerical values for machine learning.

Our feature set comprises all the food categories (88 features, one per category).

We are excluding the `NumberOfFoodPlaces` feature as input to the ML model, since our problem requires segmenting areas by the type of food available. This quantity is only relevant to us to finally decide whether to live in an area or not.

A more concrete reason to exclude it is that there are all sorts of factors we're **neglecting** due to lack of data, such as living costs, access to public transport, etc.

This is a foodie's guide to finding a place, and this venture shouldn't be bogged down by the fact that there are sometimes fewer restaurants than one would expect.

The model's output will be **cluster labels**.

For our machine learning analysis, we will use one of the simplest clustering algorithms to separate the areas: **K-Means Clustering**, an unsupervised machine learning approach. We'll use the popular machine learning library scikit-learn to do that in Python.

We'll run _k_-means to group the areas into 5 clusters. We pick this number for the sake of examination. We'll fit the model on the entire data to learn these clusters.

In [32]:
# set number of clusters
kclusters = 5

bh_grouped_clustering = bh_grouped.drop(columns=['Area', 'NumberOfFoodPlaces'])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bh_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 0, 2, 0, 0, 0, 0, 1, 1], dtype=int32)
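
The choice of 5 clusters is arbitrary here. One common way to compare candidate values of _k_ is the silhouette score. The sketch below runs on synthetic data rather than `bh_grouped_clustering`, and `score_k_values` is a hypothetical helper, so treat it as an illustration and not as a result of this analysis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def score_k_values(X, k_values, random_state=0):
    """Fit k-means for each k and return {k: silhouette score}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores

# Synthetic data: three well-separated blobs of 30 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

scores = score_k_values(X, k_values=range(2, 7))
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

A higher score means tighter, better-separated clusters. On real venue frequencies the peak is usually far less clear-cut than on synthetic blobs, so the score is a guide rather than a rule.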

Let's create a new dataframe `bh_merged` that includes the cluster as well as the top 10 food places for each area.


In [33]:
# add clustering labels
try:
    foods_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
except ValueError:
    # Allows me to retry if the Cluster Labels column exists
    foods_sorted['Cluster Labels'] = kmeans.labels_

bh_merged = bh_data

# join foods_sorted onto bh_data to add latitude/longitude for each area
bh_merged = bh_merged.join(foods_sorted.set_index('Area'), on='Area')
bh_merged.dropna(how='any', axis=0, inplace=True)
bh_merged['Cluster Labels'] = bh_merged['Cluster Labels'].astype(np.int32)
bh_merged.head() # check the last columns!

Unnamed: 0,Area,Latitude,Longitude,Cluster Labels,NumberOfFoodPlaces,1st Most Common Food Place,2nd Most Common Food Place,3rd Most Common Food Place,4th Most Common Food Place,5th Most Common Food Place,6th Most Common Food Place,7th Most Common Food Place,8th Most Common Food Place,9th Most Common Food Place,10th Most Common Food Place
0,A'ali,26.154454,50.527364,1,19.0,Café,Restaurant,Coffee Shop,Cupcake Shop,Breakfast Spot,Food,Sandwich Place,Falafel Restaurant,Middle Eastern Restaurant,Bakery
1,Abu Baham,26.205737,50.541668,0,9.0,Middle Eastern Restaurant,Cafeteria,Ice Cream Shop,Donut Shop,BBQ Joint,Restaurant,Fish & Chips Shop,Mediterranean Restaurant,Afghan Restaurant,New American Restaurant
3,Al Garrya,26.23269,50.57811,0,50.0,Restaurant,Breakfast Spot,Indian Restaurant,Coffee Shop,Filipino Restaurant,Café,Fried Chicken Joint,Fast Food Restaurant,Diner,Middle Eastern Restaurant
4,Al Hajar,26.225405,50.590138,0,49.0,Café,Filipino Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Coffee Shop,Asian Restaurant,Indian Restaurant,Pizza Place,BBQ Joint,Restaurant
5,Al Kharijiya,26.16023,50.60914,0,16.0,Cafeteria,Asian Restaurant,Fast Food Restaurant,Bakery,Wings Joint,Pizza Place,Falafel Restaurant,Middle Eastern Restaurant,Café,Food Court
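Notice that index 2 is missing from the output above: an area with no matching row in `foods_sorted` gets empty values from the join and is then removed by `dropna`. A rough plain-Python sketch of that join-then-drop logic follows. The first two areas and their coordinates come from the table, while "Area X" and its coordinates are hypothetical.

```python
# Coordinates per area, playing the role of bh_data
coordinates = {
    "A'ali": (26.154454, 50.527364),
    "Abu Baham": (26.205737, 50.541668),
    "Area X": (26.0, 50.5),  # hypothetical area with no venue data
}

# Cluster label per area, playing the role of foods_sorted
cluster_by_area = {"A'ali": 1, "Abu Baham": 0}

# Left join: every area is kept, and a missing label becomes None
joined = [
    {
        "Area": area,
        "Latitude": lat,
        "Longitude": lon,
        "Cluster Labels": cluster_by_area.get(area),
    }
    for area, (lat, lon) in coordinates.items()
]

# Equivalent of dropna(how='any'): drop rows with any missing value
merged = [row for row in joined if None not in row.values()]

for row in merged:
    print(row)
```

The check is against `None` rather than truthiness, so an area in cluster 0 is not dropped by mistake.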


Finally, let's visualize the resulting clusters on a map.


In [None]:
#| echo: false
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bh_merged['Latitude'], bh_merged['Longitude'], bh_merged['Area'], bh_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels are 0-based, so index the palette directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [35]:
#| echo: false
map_clusters
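The idea behind the color scheme is simply one distinct color per cluster label. As a sketch that needs nothing beyond the standard library, the palette could also be built with `colorsys` and indexed by the zero-based label. The helper name `cluster_palette` and the saturation and brightness values are my own assumptions, not part of the map code.

```python
import colorsys


def cluster_palette(n_clusters, saturation=0.85, value=0.9):
    """Return n_clusters hex colors with evenly spaced hues."""
    palette = []
    for i in range(n_clusters):
        hue = i / n_clusters  # spread the hues around the color wheel
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        palette.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return palette


palette = cluster_palette(5)

# The first few labels printed by kmeans.labels_ earlier
labels = [1, 0, 0, 2, 0]
marker_colors = [palette[label] for label in labels]
print(marker_colors)
```

Each hex string can be passed as the `color` and `fill_color` of a marker, so areas in the same cluster share a color.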

## Examine Clusters & Final Conclusion


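Now we can examine each cluster and work out what distinguishes it. The headings below are numbered from 1, while the labels produced by _k_-means start at 0, so Cluster 1 corresponds to label 0.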

#### Cluster 1


In [36]:
cluster1 = bh_merged.loc[bh_merged['Cluster Labels'] == 0, bh_merged.columns[[0] + list(range(4, bh_merged.shape[1]))]]
cluster1

Unnamed: 0,Area,NumberOfFoodPlaces,1st Most Common Food Place,2nd Most Common Food Place,3rd Most Common Food Place,4th Most Common Food Place,5th Most Common Food Place,6th Most Common Food Place,7th Most Common Food Place,8th Most Common Food Place,9th Most Common Food Place,10th Most Common Food Place
1,Abu Baham,9.0,Middle Eastern Restaurant,Cafeteria,Ice Cream Shop,Donut Shop,BBQ Joint,Restaurant,Fish & Chips Shop,Mediterranean Restaurant,Afghan Restaurant,New American Restaurant
3,Al Garrya,50.0,Restaurant,Breakfast Spot,Indian Restaurant,Coffee Shop,Filipino Restaurant,Café,Fried Chicken Joint,Fast Food Restaurant,Diner,Middle Eastern Restaurant
4,Al Hajar,49.0,Café,Filipino Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Coffee Shop,Asian Restaurant,Indian Restaurant,Pizza Place,BBQ Joint,Restaurant
5,Al Kharijiya,16.0,Cafeteria,Asian Restaurant,Fast Food Restaurant,Bakery,Wings Joint,Pizza Place,Falafel Restaurant,Middle Eastern Restaurant,Café,Food Court
12,Arad,50.0,Middle Eastern Restaurant,Restaurant,Dessert Shop,Burger Joint,Café,Fast Food Restaurant,Diner,Ice Cream Shop,Bakery,Sandwich Place
15,Budaiya,36.0,Middle Eastern Restaurant,Bakery,Cafeteria,Burger Joint,Seafood Restaurant,Tea Room,Café,Sandwich Place,Ice Cream Shop,Restaurant
16,Jid Ali,48.0,Restaurant,Middle Eastern Restaurant,Café,Italian Restaurant,Coffee Shop,Breakfast Spot,Dessert Shop,Pizza Place,Diner,Seafood Restaurant
18,Bani Jamra,19.0,Breakfast Spot,Restaurant,Café,Cafeteria,Middle Eastern Restaurant,Vegetarian / Vegan Restaurant,Bakery,Snack Place,Indian Restaurant,Fast Food Restaurant
19,Barbar,6.0,Pizza Place,BBQ Joint,Bakery,Sandwich Place,Middle Eastern Restaurant,Juice Bar,Lebanese Restaurant,Noodle House,New American Restaurant,Movie Theater
21,Bu Quwah,12.0,Turkish Restaurant,Cafeteria,Food,Pizza Place,Asian Restaurant,Bakery,Restaurant,Falafel Restaurant,Seafood Restaurant,Coffee Shop


In [37]:
#| echo: false
md(f"This cluster has {cluster1.shape[0]} areas")

This cluster has 34 areas

#### Cluster 2


In [38]:
cluster2 = bh_merged.loc[bh_merged['Cluster Labels'] == 1, bh_merged.columns[[0] + list(range(4, bh_merged.shape[1]))]]
cluster2

Unnamed: 0,Area,NumberOfFoodPlaces,1st Most Common Food Place,2nd Most Common Food Place,3rd Most Common Food Place,4th Most Common Food Place,5th Most Common Food Place,6th Most Common Food Place,7th Most Common Food Place,8th Most Common Food Place,9th Most Common Food Place,10th Most Common Food Place
0,A'ali,19.0,Café,Restaurant,Coffee Shop,Cupcake Shop,Breakfast Spot,Food,Sandwich Place,Falafel Restaurant,Middle Eastern Restaurant,Bakery
6,Al Markh,17.0,Café,Fast Food Restaurant,Burger Joint,Ice Cream Shop,Juice Bar,Coffee Shop,Middle Eastern Restaurant,Dessert Shop,Bakery,BBQ Joint
7,Al Musalla,17.0,Fast Food Restaurant,Café,Coffee Shop,Middle Eastern Restaurant,Seafood Restaurant,Steakhouse,Food Court,Restaurant,Pizza Place,Japanese Restaurant
8,Al Qadam,4.0,Pizza Place,Café,Cafeteria,Lebanese Restaurant,Noodle House,New American Restaurant,Movie Theater,Moroccan Restaurant,Middle Eastern Restaurant,Mexican Restaurant
9,Al Qala,50.0,Café,Burger Joint,Restaurant,Dessert Shop,Coffee Shop,Sandwich Place,Bakery,Juice Bar,Ice Cream Shop,Pizza Place
10,Al Qurayyah,11.0,Restaurant,Café,Mediterranean Restaurant,Middle Eastern Restaurant,Comfort Food Restaurant,Coffee Shop,Diner,Japanese Restaurant,Juice Bar,Pastry Shop
11,Amwaj Islands,37.0,Café,Middle Eastern Restaurant,American Restaurant,Indian Restaurant,Restaurant,Asian Restaurant,Pizza Place,Deli / Bodega,Diner,Portuguese Restaurant
13,Askar,4.0,Cafeteria,Burger Joint,Café,Pie Shop,Pastry Shop,Noodle House,New American Restaurant,Movie Theater,Moroccan Restaurant,Middle Eastern Restaurant
17,Bahrain Bay,50.0,Coffee Shop,Café,Indian Restaurant,Pizza Place,Steakhouse,Burger Joint,Middle Eastern Restaurant,Restaurant,Fried Chicken Joint,American Restaurant
20,Bilad Al Qadeem,46.0,Café,Middle Eastern Restaurant,Ice Cream Shop,Breakfast Spot,Sandwich Place,Fast Food Restaurant,Pizza Place,Burger Joint,Restaurant,Bakery


In [39]:
#| echo: false
md(f"This cluster has {cluster2.shape[0]} areas")

This cluster has 29 areas

#### Cluster 3


In [40]:
cluster3 = bh_merged.loc[bh_merged['Cluster Labels'] == 2, bh_merged.columns[[0] + list(range(4, bh_merged.shape[1]))]]
cluster3

Unnamed: 0,Area,NumberOfFoodPlaces,1st Most Common Food Place,2nd Most Common Food Place,3rd Most Common Food Place,4th Most Common Food Place,5th Most Common Food Place,6th Most Common Food Place,7th Most Common Food Place,8th Most Common Food Place,9th Most Common Food Place,10th Most Common Food Place
25,Al Dair,9.0,Bakery,Restaurant,BBQ Joint,Italian Restaurant,Fast Food Restaurant,Afghan Restaurant,Lebanese Restaurant,Noodle House,New American Restaurant,Movie Theater
44,Jannusan,11.0,Bakery,Gastropub,Cafeteria,Indian Restaurant,Sandwich Place,Fish & Chips Shop,Seafood Restaurant,Snack Place,New American Restaurant,Movie Theater
46,Jaww,1.0,Bakery,Afghan Restaurant,Korean Restaurant,Noodle House,New American Restaurant,Movie Theater,Moroccan Restaurant,Middle Eastern Restaurant,Mexican Restaurant,Mediterranean Restaurant
54,Ma'ameer,3.0,Creperie,Diner,Bakery,Afghan Restaurant,Lebanese Restaurant,Noodle House,New American Restaurant,Movie Theater,Moroccan Restaurant,Middle Eastern Restaurant
76,Sitra,2.0,Turkish Restaurant,Bakery,Afghan Restaurant,Korean Restaurant,Noodle House,New American Restaurant,Movie Theater,Moroccan Restaurant,Middle Eastern Restaurant,Mexican Restaurant


In [41]:
#| echo: false
md(f"This cluster has {cluster3.shape[0]} areas")

This cluster has 5 areas

#### Cluster 4


In [42]:
cluster4 = bh_merged.loc[bh_merged['Cluster Labels'] == 3, bh_merged.columns[[0] + list(range(4, bh_merged.shape[1]))]]
cluster4

Unnamed: 0,Area,NumberOfFoodPlaces,1st Most Common Food Place,2nd Most Common Food Place,3rd Most Common Food Place,4th Most Common Food Place,5th Most Common Food Place,6th Most Common Food Place,7th Most Common Food Place,8th Most Common Food Place,9th Most Common Food Place,10th Most Common Food Place
14,Awali,1.0,Café,Afghan Restaurant,Persian Restaurant,Noodle House,New American Restaurant,Movie Theater,Moroccan Restaurant,Middle Eastern Restaurant,Mexican Restaurant,Mediterranean Restaurant


In [43]:
#| echo: false
md(f"This cluster has {cluster4.shape[0]} area")

This cluster has 1 area

#### Cluster 5

In [44]:
cluster5 = bh_merged.loc[bh_merged['Cluster Labels'] == 4, bh_merged.columns[[0] + list(range(4, bh_merged.shape[1]))]]
cluster5

Unnamed: 0,Area,NumberOfFoodPlaces,1st Most Common Food Place,2nd Most Common Food Place,3rd Most Common Food Place,4th Most Common Food Place,5th Most Common Food Place,6th Most Common Food Place,7th Most Common Food Place,8th Most Common Food Place,9th Most Common Food Place,10th Most Common Food Place
26,Dar Kulaib,4.0,Restaurant,Coffee Shop,Sandwich Place,Breakfast Spot,Afghan Restaurant,Lebanese Restaurant,New American Restaurant,Movie Theater,Moroccan Restaurant,Middle Eastern Restaurant
56,Malkiya,2.0,Ice Cream Shop,Coffee Shop,Afghan Restaurant,Lebanese Restaurant,Noodle House,New American Restaurant,Movie Theater,Moroccan Restaurant,Middle Eastern Restaurant,Mexican Restaurant
77,Sufala,3.0,Coffee Shop,Restaurant,Afghan Restaurant,Lebanese Restaurant,Noodle House,New American Restaurant,Movie Theater,Moroccan Restaurant,Middle Eastern Restaurant,Mexican Restaurant


In [45]:
#| echo: false
md(f"This cluster has {cluster5.shape[0]} areas")

This cluster has 3 areas
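One quick way to spot what sets a cluster apart is to count which categories show up most often near the top of its areas' rankings. The sketch below does this for the two smaller clusters using rows copied from the tables above. Only the top two categories per area are used, and I am assuming that entries ranked beyond an area's number of food places are just filler from the sort.

```python
from collections import Counter

# (cluster label, area, number of food places, top two categories)
rows = [
    (2, "Al Dair", 9, ["Bakery", "Restaurant"]),
    (2, "Jannusan", 11, ["Bakery", "Gastropub"]),
    (2, "Jaww", 1, ["Bakery", "Afghan Restaurant"]),
    (2, "Ma'ameer", 3, ["Creperie", "Diner"]),
    (2, "Sitra", 2, ["Turkish Restaurant", "Bakery"]),
    (4, "Dar Kulaib", 4, ["Restaurant", "Coffee Shop"]),
    (4, "Malkiya", 2, ["Ice Cream Shop", "Coffee Shop"]),
    (4, "Sufala", 3, ["Coffee Shop", "Restaurant"]),
]

counts_by_cluster = {}
for label, area, n_places, top_categories in rows:
    # Assumption: an area with one venue has only one real category,
    # so ignore ranks beyond its number of food places.
    real_categories = top_categories[:n_places]
    counts_by_cluster.setdefault(label, Counter()).update(real_categories)

for label, counts in sorted(counts_by_cluster.items()):
    # Headings are 1-based while the k-means labels are 0-based
    print(f"Cluster {label + 1}: {counts.most_common(2)}")
```

On these rows, bakeries stand out in Cluster 3 and coffee shops in Cluster 5, which matches what the tables suggest at a glance.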

### Conclusion

Phew! We've found our clusters and worked out which areas fall into each one.
For the constraints of this approach and a fuller discussion of the results, please refer to the report in my [GitHub repo](https://github.com/isados/neighborhood-clusters/blob/main/report.md).

I hope you've enjoyed reading & learning something new from this post. Doing this was part of my data-science course, and I hope you can do the same with your hobby projects.

Until next time, cheers!