

import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

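# Build a synthetic stand-in for a heart-disease dataset (no real data is loaded).
# The values are uniform random numbers in [0, 1), so the column names below are
# labels only and the data has no built-in cluster structure.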
np.random.seed(42)
data = np.random.rand(200, 5)  # 200 samples, 5 features
columns = ['Age', 'Cholesterol', 'Blood Pressure', 'Max Heart Rate', 'Oldpeak']
df = pd.DataFrame(data, columns=columns)

# Feature scaling
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)

# Applying EM Algorithm (Gaussian Mixture Model)
EM_model = GaussianMixture(n_components=3, random_state=42)
EM_labels = EM_model.fit_predict(df_scaled)

# Applying K-Means Algorithm
KMeans_model = KMeans(n_clusters=3, random_state=42)
KMeans_labels = KMeans_model.fit_predict(df_scaled)

# Comparing the results using Silhouette Score
silhouette_em = silhouette_score(df_scaled, EM_labels)
silhouette_kmeans = silhouette_score(df_scaled, KMeans_labels)

# Displaying the results
print(f'Silhouette Score for EM Algorithm (GMM): {silhouette_em:.2f}')
print(f'Silhouette Score for K-Means Algorithm: {silhouette_kmeans:.2f}')

# Display cluster distribution
print("\nCluster Distribution (EM Algorithm):")
print(pd.Series(EM_labels).value_counts())

print("\nCluster Distribution (K-Means Algorithm):")
print(pd.Series(KMeans_labels).value_counts())

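# Report which model has the higher silhouette score (range -1 to 1, higher is better).
# A tie falls through to the K-Means branch.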
if silhouette_em > silhouette_kmeans:
    print("\nThe EM Algorithm (GMM) provides better clustering quality.")
else:
    print("\nThe K-Means Algorithm provides better clustering quality.")



Silhouette Score for EM Algorithm (GMM): 0.14
Silhouette Score for K-Means Algorithm: 0.15

Cluster Distribution (EM Algorithm):
1    71
2    66
0    63
Name: count, dtype: int64

Cluster Distribution (K-Means Algorithm):
1    79
2    61
0    60
Name: count, dtype: int64

The K-Means Algorithm provides better clustering quality.
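
The two silhouette scores above differ by only 0.01, and the input is uniform random noise, so the verdict deserves some context. The sketch below is one way to get it, not part of the original cell: it rebuilds the same synthetic data, measures how much the two labelings agree using the adjusted Rand index, and repeats the comparison on data that does contain three clusters. The helper name `compare`, the `make_blobs` settings, and `n_init=10` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def compare(X, k=3, seed=42):
    """Fit a GMM and K-Means on scaled X; return silhouettes and label agreement."""
    X_scaled = StandardScaler().fit_transform(X)
    gmm_labels = GaussianMixture(n_components=k, random_state=seed).fit_predict(X_scaled)
    # n_init=10 is an assumption; the original cell relies on the library default.
    km_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_scaled)
    return {
        "silhouette_gmm": silhouette_score(X_scaled, gmm_labels),
        "silhouette_kmeans": silhouette_score(X_scaled, km_labels),
        # Adjusted Rand index: 1.0 means identical partitions, about 0 means chance agreement.
        "ari": adjusted_rand_score(gmm_labels, km_labels),
    }


# Same uniform data as the cell above (seed 42, 200 samples, 5 features).
uniform = np.random.RandomState(42).rand(200, 5)

# Hypothetical reference data with three compact, well-separated clusters.
blobs, _ = make_blobs(n_samples=200, n_features=5, centers=3,
                      cluster_std=0.5, random_state=42)

uniform_result = compare(uniform)
blobs_result = compare(blobs)

for name, result in [("uniform noise", uniform_result), ("three blobs", blobs_result)]:
    print(f"{name}: GMM={result['silhouette_gmm']:.2f}, "
          f"K-Means={result['silhouette_kmeans']:.2f}, ARI={result['ari']:.2f}")
```

If the silhouette on the blob data is far higher than on the uniform data, the small gap between GMM and K-Means on the uniform data is better read as "neither model found structure" than as one algorithm beating the other.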

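
The cell fixes the number of clusters at 3 without justifying it. For the Gaussian mixture, one common way to check that choice is to compare the Bayesian information criterion (BIC) across several component counts. The following is a minimal sketch under assumed settings: the candidate range 1 to 6 is arbitrary, and the variable names `bic_by_k` and `best_k` are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Rebuild the scaled synthetic data used in the cell above.
X = StandardScaler().fit_transform(np.random.RandomState(42).rand(200, 5))

# Fit one mixture per candidate component count and record its BIC (lower is better).
candidates = range(1, 7)  # assumed search range
bic_by_k = {
    k: GaussianMixture(n_components=k, random_state=42).fit(X).bic(X)
    for k in candidates
}

best_k = min(bic_by_k, key=bic_by_k.get)

for k, bic in bic_by_k.items():
    marker = " <- lowest BIC" if k == best_k else ""
    print(f"n_components={k}: BIC={bic:.1f}{marker}")
```

BIC penalizes extra parameters, so a result that favors a single component would suggest the data does not support any clustering at all. K-Means has no likelihood, so this check applies to the mixture model only.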