<a href="https://colab.research.google.com/github/sahu-vishal/Clustering-Algorithm/blob/main/Clustering_Algorithm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Problem Statement**

This project works with geolocation data. As a logistics company, TheLorry needs to deliver parcels to its customers, and the information we start from is each customer's address. The pre-processing step translates an address into a geolocation (latitude and longitude). We have a service called Geo Parser that does this conversion more efficiently: it uses Artificial Intelligence (AI), and it does not need to call the Google Geocoding API when the address already exists in our database, which saves money.
For simplicity, this tutorial uses the Google Geocoding API service instead.
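
The cost saving described above comes from looking an address up locally before paying for an external call. The sketch below illustrates that idea only; it is not the Geo Parser implementation. The names `geocode_with_cache` and `fake_geocoder`, the in-memory dictionary used as the "database", and the sample address are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


def geocode_with_cache(
    address: str,
    cache: Dict[str, Coordinates],
    geocode_fn: Callable[[str], Coordinates],
) -> Coordinates:
    """Return coordinates for an address, calling geocode_fn only on a cache miss."""
    key = " ".join(address.lower().split())  # normalise case and whitespace
    if key not in cache:
        cache[key] = geocode_fn(address)  # the external (paid) lookup happens only here
    return cache[key]


# Hypothetical stand-in for a real geocoding call; it records every invocation.
calls: List[str] = []


def fake_geocoder(address: str) -> Coordinates:
    calls.append(address)
    return (3.108459, 101.671521)  # fixed sample point, for illustration only


cache: Dict[str, Coordinates] = {}
first = geocode_with_cache("12 Example Street", cache, fake_geocoder)
second = geocode_with_cache("12  example street", cache, fake_geocoder)
```

In this sketch the second lookup differs only in case and spacing, so it is served from the cache and `fake_geocoder` is called once. A real service would need a more careful address normalisation than this.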

We run a clustering algorithm on our customer geolocation data to obtain several clusters whose members are close to one another. We assume each cluster contains the parcels that a single driver should deliver, so each driver only needs to travel within one compact area.
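
To make "close to one another" concrete, the sketch below shows the assignment step that k-means repeats: each point is attached to its nearest cluster centre by Euclidean distance. The three points are copied from the first rows of the dataset output in this notebook; the two centroids and the helper name `nearest_centroid` are hypothetical and chosen only for illustration.

```python
import math


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


# Coordinates in the same column order as the dataset ('lat', 'lng').
points = [
    (101.671521, 3.108459),
    (101.642770, 3.166217),
    (101.674980, 3.058430),
]
centroids = [(101.65, 3.16), (101.67, 3.08)]  # hypothetical cluster centres

assignments = [nearest_centroid(p, centroids) for p in points]
print(assignments)  # [1, 0, 1]
```

Euclidean distance on raw degrees is only an approximation of distance on the ground. Close to the equator, where a degree of longitude and a degree of latitude cover nearly the same distance, the distortion is small.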

In [5]:
import pandas as pd

file_url = "https://raw.githubusercontent.com/tribasuki74/clustering01/main/dataset/geolocation.csv"

data = pd.read_csv(file_url)
features = data[['lat', 'lng']]
print(features)

            lat       lng
0    101.671521  3.108459
1    101.642770  3.166217
2    101.674980  3.058430
3    101.706784  3.146544
4    101.700153  3.072845
..          ...       ...
448  101.666171  3.110972
449  101.714256  3.124467
450  101.642516  3.168135
451  101.708522  3.156022
452  101.676220  3.121262

[453 rows x 2 columns]
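
In the output above, the `lat` column holds values around 101.67, which is outside the valid latitude range of -90 to 90, while the `lng` values around 3.1 would fit. This suggests the two column names are swapped in this dataset. A minimal sanity check, assuming a hypothetical helper named `columns_look_swapped` and using three rows copied from the output, could look like this:

```python
def columns_look_swapped(lat_values, lng_values):
    """Return True when 'lat' values leave [-90, 90] while 'lng' values would fit there."""
    lat_out_of_range = any(abs(v) > 90 for v in lat_values)
    lng_fits_latitude = all(abs(v) <= 90 for v in lng_values)
    return lat_out_of_range and lng_fits_latitude


# First three rows of the printed dataframe.
lat_column = [101.671521, 101.642770, 101.674980]
lng_column = [3.108459, 3.166217, 3.058430]

swapped = columns_look_swapped(lat_column, lng_column)
print(swapped)  # True
```

The swap does not affect the clustering itself, because k-means treats both columns as plain numeric features, but it matters wherever a library expects latitude first.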


In [3]:
from sklearn.cluster import KMeans

# create kmeans model/object
kmeans = KMeans(
    init="random",
    n_clusters=16,
    n_init=10,
    max_iter=300,
    random_state=42
)

In [6]:
# do clustering
kmeans.fit(features)

# save results
labels = kmeans.labels_

In [7]:
# attach the cluster labels to the dataframe
data['cluster'] = labels

# display the number of members in each cluster
_clusters = data.groupby('cluster')['no'].count()
print(_clusters)

cluster
0     26
1     20
2     24
3     28
4     30
5     29
6     34
7     38
8     30
9     21
10    28
11    14
12    56
13    17
14    32
15    26
Name: no, dtype: int64
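
The counts above are uneven, which matters if each cluster is meant to be one driver's workload. The sketch below summarises the spread; the dictionary is copied by hand from the printed output, and the variable names are illustrative.

```python
# Cluster sizes copied from the output above (cluster id -> number of parcels).
cluster_sizes = {
    0: 26, 1: 20, 2: 24, 3: 28, 4: 30, 5: 29, 6: 34, 7: 38,
    8: 30, 9: 21, 10: 28, 11: 14, 12: 56, 13: 17, 14: 32, 15: 26,
}

total = sum(cluster_sizes.values())
average = total / len(cluster_sizes)
largest = max(cluster_sizes, key=cluster_sizes.get)
smallest = min(cluster_sizes, key=cluster_sizes.get)

print(total)     # 453
print(average)   # 28.3125
print(largest, cluster_sizes[largest])    # 12 56
print(smallest, cluster_sizes[smallest])  # 11 14
```

The total matches the 453 rows in the dataset, and the largest cluster holds four times as many parcels as the smallest, so the clusters are geographically compact but not balanced by size.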


In [8]:
import folium

# 18 base colors, repeated once so that up to 36 clusters can be indexed
base_colors = ['red', 'blue', 'green', 'purple', 'orange', 'darkred', 'lightred',
               'beige', 'darkblue', 'darkgreen', 'cadetblue', 'darkpurple',
               'pink', 'lightblue', 'lightgreen', 'gray', 'black', 'lightgray']
colors = base_colors * 2

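# folium expects [latitude, longitude]; in this dataset the latitude values
# (about 3.1) sit in the 'lng' column, so 'lng' is passed first
lat = data.iloc[0]['lat']
lng = data.iloc[0]['lng']
cluster_map = folium.Map(location=[lng, lat], zoom_start=12)  # avoid shadowing the built-in map()

# draw one circle per customer, colored by its cluster
for _, row in data.iterrows():
    color = colors[int(row["cluster"])]
    folium.CircleMarker(location=[row["lng"], row["lat"]],
                        radius=12, weight=2, fill=True,
                        fill_color=color, color=color).add_to(cluster_map)

cluster_map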