# **ML - Lith-ion power - Unsupervised learning - KMeans Clustering**

## PROBLEM STATEMENT
Lith-ion power is the largest producer of electric vehicle (e-vehicle) batteries.

They rent batteries to e-vehicle drivers. A driver typically rents a battery for a day and then replaces it with a charged battery from the company.

Lith-ion power has a variable pricing model based on the driver's driving history. Battery life depends on factors such as overspeeding and distance driven per day.

**`Create a cluster model that groups drivers based on their driving data, so that drivers can be incentivized based on their cluster.`**

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import matplotlib.pyplot as plt
import plotly.graph_objects as go

from sklearn.cluster import KMeans
from sklearn import metrics

import warnings
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv('driver-data.csv')
df.head()

,id,mean_dist_day,mean_over_speed_perc
0,3423311935,71.24,28
1,3423313212,52.53,25
2,3423313724,64.54,27
3,3423311373,55.69,22
4,3423310999,54.58,25


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4000 entries, 0 to 3999
Data columns (total 3 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   id                    4000 non-null   int64  
 1   mean_dist_day         4000 non-null   float64
 2   mean_over_speed_perc  4000 non-null   int64  
dtypes: float64(1), int64(2)
memory usage: 93.9 KB


In [None]:
df_fig = px.scatter(df, 
                 x='mean_dist_day', 
                 y='mean_over_speed_perc', 
                 title='Scatter Plot for mean_dist_day vs mean_over_speed_perc',
                 height=800,            
                 template="plotly_dark"
                )
df_fig.show()

In [None]:
# The driver id is an identifier, not a driving feature, so drop it before clustering
df = df.drop(columns='id')

In [None]:
df1 = df.copy()

In [None]:
# Fit KMeans with 4 clusters; random_state fixes the centroid initialization for reproducibility
kmeans = KMeans(n_clusters=4, random_state=42)
kmeans.fit(df1)
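
KMeans assigns points by Euclidean distance, so a feature with a much wider spread can dominate the clustering. The notebook fits on the raw columns; whether that matters here depends on the actual ranges of `mean_dist_day` and `mean_over_speed_perc`. The block below is only a sketch of one common alternative, standardizing the columns first. The data is synthetic and made up for illustration (it is not `driver-data.csv`), and `demo`, `scaled_kmeans`, and `demo_labels` are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the driver data (values are invented for illustration)
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    'mean_dist_day': np.concatenate([rng.normal(50, 8, 200), rng.normal(180, 15, 200)]),
    'mean_over_speed_perc': np.concatenate([rng.normal(10, 4, 200), rng.normal(40, 10, 200)]),
})

# Standardize each column to zero mean and unit variance, then cluster
scaled_kmeans = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=2, n_init=10, random_state=42),
)
demo_labels = scaled_kmeans.fit_predict(demo)

# Number of points assigned to each cluster
print(np.bincount(demo_labels))
```

Wrapping the scaler and the model in one pipeline means any later `predict` call applies the same scaling that was learned during fitting.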

In [None]:
unique, counts= np.unique(kmeans.labels_, return_counts=True)

In [None]:
dict_data = dict(zip(unique, counts))
dict_data

## Metrics

In [None]:
cluster_centers = kmeans.cluster_centers_
cluster_centers

In [None]:
kmeans.labels_

In [None]:
# Get the unique cluster labels assigned by KMeans
unique_labels = np.unique(kmeans.labels_)

print("Unique Cluster Labels:", unique_labels)

In [None]:
# Inertia
inertia = kmeans.inertia_
print("Inertia:", inertia)

# Silhouette Score
silhouette_score = metrics.silhouette_score(df, kmeans.labels_)
print("\nSilhouette Score:", silhouette_score)

# Davies-Bouldin Index
davies_bouldin_index = metrics.davies_bouldin_score(df, kmeans.labels_)
print("\nDavies-Bouldin Index:", davies_bouldin_index)

# Calinski-Harabasz Index
calinski_harabasz_index = metrics.calinski_harabasz_score(df, kmeans.labels_)
print("\nCalinski-Harabasz Index:", calinski_harabasz_index)
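
These four numbers are reported for `n_clusters=4` only, and they are easier to read when compared across several values of k. Inertia always decreases as k grows, so one looks for an "elbow"; a higher silhouette score and Calinski-Harabasz index are better, and a lower Davies-Bouldin index is better. The following is a minimal sketch of such a sweep. It runs on synthetic blobs from `make_blobs`, not on the driver data, and `X`, `scores`, and `best_k` are hypothetical names.

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data with 4 well-separated groups (not the driver data)
X, _ = make_blobs(
    n_samples=400,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=1.0,
    random_state=42,
)

# Fit one model per candidate k and record all four metrics
scores = {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    scores[k] = {
        'inertia': model.inertia_,
        'silhouette': metrics.silhouette_score(X, model.labels_),
        'davies_bouldin': metrics.davies_bouldin_score(X, model.labels_),
        'calinski_harabasz': metrics.calinski_harabasz_score(X, model.labels_),
    }

# Pick the k with the highest silhouette score
best_k = max(scores, key=lambda k: scores[k]['silhouette'])
print("Best k by silhouette:", best_k)
```

Running the same loop on `df` would show whether 4 clusters is supported by the driver data or whether another k scores better.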


In [None]:
df1['cluster'] = kmeans.labels_

In [None]:
df1.sample(10)
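
The cluster labels are arbitrary integers, so incentives can only be attached to them once each cluster is described in terms of the original features. A sketch of one way to do that, using per-cluster sizes and feature means, is shown below. The small frame `demo` is hand-made for illustration and its values are not taken from `driver-data.csv`; `profile` and the output column names are hypothetical.

```python
import pandas as pd

# Tiny hand-made stand-in for df1 (values are illustrative only)
demo = pd.DataFrame({
    'mean_dist_day': [50.0, 55.0, 180.0, 170.0, 52.0, 175.0],
    'mean_over_speed_perc': [5, 8, 12, 10, 40, 60],
    'cluster': [0, 0, 1, 1, 2, 3],
})

# Per-cluster driver count and average of each feature
profile = demo.groupby('cluster').agg(
    n_drivers=('mean_dist_day', 'size'),
    avg_dist=('mean_dist_day', 'mean'),
    avg_over_speed=('mean_over_speed_perc', 'mean'),
)
print(profile)
```

The same `groupby('cluster')` call on `df1` would give one row per cluster that can be read as, for example, "short distance, low overspeeding".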

In [None]:
fig = px.scatter(df1, 
                 x='mean_dist_day', 
                 y='mean_over_speed_perc', 
                 color='cluster', 
                 title='KMeans Clustering',
                 height=800,            
                 template="plotly_dark"
                )

# Create a scatter trace for the cluster centers
cluster_trace = go.Scatter(x=cluster_centers[:, 0],
                           y=cluster_centers[:, 1],
                           mode='markers',
                           marker=dict(color='red', size=10, symbol='cross'),
                           name='Cluster Centers'
                          )

# Add the cluster centers trace to the Plotly figure
fig.add_trace(cluster_trace)

fig.show()

In [None]:
fig_3d = px.scatter_3d(df1, 
                    x='mean_dist_day', 
                    y='mean_over_speed_perc', 
                    z='cluster', 
                    color='cluster',
                    symbol='cluster',
                    title='3D Scatter Plot with Clusters',
                    height=900,
                    width = 1200
                   )

fig_3d.show()

## Prediction

In [None]:
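# Sample 10 already-clustered rows; keep a copy with the assigned cluster for comparison
test = df1.sample(10).reset_index(drop=True)
test_df = test.copy()
# Drop the label so the model only sees the two feature columns
test1 = test.drop(columns='cluster')
test1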

In [None]:
pred = kmeans.predict(test1)
pred_df = pd.DataFrame({'prediction':pred})
pred_df 

In [None]:
model_result = pd.merge(test_df,pred_df,right_index=True,left_index=True)
model_result 
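
Because the rows above are sampled from the training data, `prediction` is expected to match `cluster`, which confirms that `predict` is consistent with the fitted labels but does not show behavior on unseen drivers. The sketch below shows the same call on rows the model has not seen. All values are invented for illustration, and `train`, `model`, `new_drivers`, and `new_labels` are hypothetical names.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Small made-up training set with two obvious groups
train = pd.DataFrame({
    'mean_dist_day': [50.0, 52.0, 48.0, 180.0, 175.0, 185.0],
    'mean_over_speed_perc': [5, 7, 6, 30, 35, 28],
})
model = KMeans(n_clusters=2, n_init=10, random_state=42).fit(train)

# Hypothetical new drivers that were not part of the training data;
# the columns must have the same names and order as at fit time
new_drivers = pd.DataFrame({
    'mean_dist_day': [55.0, 170.0],
    'mean_over_speed_perc': [8, 32],
})

# Each new driver is assigned to the nearest learned centroid
new_labels = model.predict(new_drivers)
print(new_drivers.assign(prediction=new_labels))
```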