
# CryptoClustering Analysis

This notebook clusters cryptocurrencies by their price change percentages over several timeframes, using **K-means clustering** and **Principal Component Analysis (PCA)**.

### Objectives
- Standardize the cryptocurrency data so each feature has zero mean and unit variance.
- Use the elbow method to choose the number of clusters.
- Perform K-means clustering on the scaled data and on the PCA-reduced data.
- Compare the two clustering results and inspect the feature weights of each principal component.

### Dataset Description
Each row is one coin, indexed by `coin_id`. The columns include the following price change percentages:
- 24 hours (`price_change_percentage_24h`)
- 7 days (`price_change_percentage_7d`)
- 30 days (`price_change_percentage_30d`)
- 60 days (`price_change_percentage_60d`)
- 200 days (`price_change_percentage_200d`)
- 1 year (`price_change_percentage_1y`)


In [None]:

# Import necessary libraries
import pandas as pd
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Load the dataset
file_path = "crypto_market_data.csv"
crypto_data = pd.read_csv(file_path, index_col="coin_id")

# Display summary statistics
print("Summary Statistics:")
print(crypto_data.describe())

# Standardize the data (zero mean, unit variance per column)
scaler = StandardScaler()
scaled_data = scaler.fit_transform(crypto_data)

# Create a DataFrame with the scaled data
scaled_df = pd.DataFrame(scaled_data, columns=crypto_data.columns, index=crypto_data.index)

# Display the first five rows of the scaled data
print("First five rows of the scaled data:")
print(scaled_df.head())
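
For reference, the scaling step above is a z-score: each column has its mean subtracted and is then divided by its population standard deviation (`ddof=0`), which is the convention `StandardScaler` uses. A minimal sketch on a made-up two-column frame follows; the `toy` name, the coin labels, and the values are illustrative and not taken from the dataset.

```python
import pandas as pd

# Made-up values for illustration only; not rows from crypto_market_data.csv.
toy = pd.DataFrame(
    {
        "price_change_percentage_24h": [1.0, 2.0, 3.0, 6.0],
        "price_change_percentage_7d": [10.0, -10.0, 30.0, -30.0],
    },
    index=["coin_a", "coin_b", "coin_c", "coin_d"],
)

# z-score per column: subtract the mean, divide by the population std (ddof=0).
toy_scaled = (toy - toy.mean()) / toy.std(ddof=0)

print(toy_scaled.round(3))
print(toy_scaled.mean().round(6))       # approximately 0 for every column
print(toy_scaled.std(ddof=0).round(6))  # approximately 1 for every column
```

Because every column ends up on the same scale, no single timeframe dominates the distance calculations in K-means.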


In [None]:

from sklearn.cluster import KMeans

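# Compute inertia for k = 1 to 11 for the elbow method
k_values = range(1, 12)
inertia = []

for k in k_values:
    # n_init is set explicitly because its default differs across scikit-learn versions
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=1)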
    kmeans.fit(scaled_df)
    inertia.append(kmeans.inertia_)

# Plot the elbow curve
plt.figure(figsize=(10, 6))
plt.plot(k_values, inertia, marker='o')
plt.xlabel("Number of Clusters (k)")
plt.ylabel("Inertia")
plt.title("Elbow Method to Determine Optimal k")
plt.show()

# Based on the elbow curve, choose the best value for k
best_k = int(input("Enter the best value for k based on the elbow curve: "))
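
Reading the elbow off the plot by eye can be cross-checked with a simple heuristic: pick the k where the inertia curve bends most sharply, measured by the largest second difference. The sketch below is one such heuristic, not a definitive rule; the `suggest_elbow` helper and the inertia values are hypothetical.

```python
def suggest_elbow(k_values, inertia):
    """Return the k with the largest second difference of inertia.

    Hypothetical helper: a rough heuristic, not part of scikit-learn.
    Needs at least three (k, inertia) pairs with evenly spaced k.
    """
    ks = list(k_values)
    if len(ks) != len(inertia) or len(ks) < 3:
        raise ValueError("need at least three matching k/inertia values")
    # Second difference at interior points: how sharply the curve bends at k.
    bends = {
        ks[i]: inertia[i - 1] - 2 * inertia[i] + inertia[i + 1]
        for i in range(1, len(ks) - 1)
    }
    return max(bends, key=bends.get)


# Made-up inertia values for k = 1..7 (illustrative, not results from the data).
example_inertia = [100.0, 60.0, 30.0, 12.0, 10.0, 9.0, 8.5]
suggested_k = suggest_elbow(range(1, 8), example_inertia)
print(suggested_k)  # 4
```

In the notebook, `suggest_elbow(k_values, inertia)` could serve as a starting point, with the plot remaining the final check.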


In [None]:

# Perform K-means clustering on the scaled data with the best k
kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=1)
crypto_data["Cluster"] = kmeans.fit_predict(scaled_df)

# Visualize the clusters on two of the original (unscaled) features
crypto_data.plot.scatter(x="price_change_percentage_24h", 
                         y="price_change_percentage_7d", 
                         c="Cluster", 
                         colormap="viridis", 
                         title="Cryptocurrency Clusters (Original Data)")
plt.show()


In [None]:

from sklearn.decomposition import PCA

# Perform PCA to reduce dimensions to 3 principal components
pca = PCA(n_components=3)
pca_data = pca.fit_transform(scaled_df)

# Create a DataFrame with PCA results
pca_df = pd.DataFrame(pca_data, columns=["PC1", "PC2", "PC3"], index=scaled_df.index)

# Explained variance
explained_variance = pca.explained_variance_ratio_.sum()
print(f"Total explained variance by 3 components: {explained_variance:.2f}")

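# Perform K-means clustering on PCA-reduced data
# Select the component columns explicitly so that re-running this cell
# does not feed an existing "Cluster" column back in as a feature
kmeans_pca = KMeans(n_clusters=best_k, n_init=10, random_state=1)
pca_df["Cluster"] = kmeans_pca.fit_predict(pca_df[["PC1", "PC2", "PC3"]])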

# Visualize clusters using PCA-reduced data
pca_df.plot.scatter(x="PC1", y="PC2", c="Cluster", colormap="viridis", title="Clusters (PCA Data)")
plt.show()
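
To compare the two clusterings, cluster IDs cannot be matched directly, because K-means numbers its clusters arbitrarily in each run. One option is a contingency table of the two label columns: if every row and every column has a single non-zero cell, the partitions are the same up to renaming. The labels below are made up for illustration; in the notebook the inputs would be `crypto_data["Cluster"]` and `pca_df["Cluster"]`.

```python
import pandas as pd

# Made-up labels for six coins; cluster IDs are permuted between the two runs.
coins = ["coin_a", "coin_b", "coin_c", "coin_d", "coin_e", "coin_f"]
labels_original = pd.Series([0, 0, 1, 1, 2, 2], index=coins, name="original")
labels_pca = pd.Series([2, 2, 0, 0, 1, 1], index=coins, name="pca")

# Rows: clusters from the scaled data; columns: clusters from the PCA data.
table = pd.crosstab(labels_original, labels_pca)
print(table)

# Same partition up to relabeling when each row and each column
# has exactly one non-zero cell.
same_partition = bool(
    (table > 0).sum(axis=1).eq(1).all() and (table > 0).sum(axis=0).eq(1).all()
)
print(same_partition)  # True
```

`sklearn.metrics.adjusted_rand_score` gives a single-number summary of the same agreement, with 1.0 meaning identical partitions.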


In [None]:

# Determine the feature weights for each principal component
weights = pd.DataFrame(pca.components_, columns=scaled_df.columns, index=["PC1", "PC2", "PC3"])
print("Feature weights for each principal component:")
print(weights)

# Sort each component's weights: strongest positive first, strongest negative last
print("Strongest influences on each component:")
for pc in weights.index:
    print(f"{pc}:")
    print(weights.loc[pc].sort_values(ascending=False))
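
Sorting shows the full ranking; a more compact per-component summary can be built with `idxmax` and `idxmin`. The loadings and feature names below are invented for illustration so the sketch runs on its own; in the notebook the same expressions would apply to `weights`.

```python
import pandas as pd

# Illustrative loadings only; these are NOT values computed from the dataset.
toy_weights = pd.DataFrame(
    [[0.10, -0.70, 0.40],
     [0.65, 0.20, -0.30]],
    index=["PC1", "PC2"],
    columns=["feature_a", "feature_b", "feature_c"],
)

# One row per component: which feature pulls hardest in each direction,
# and which has the largest weight regardless of sign.
summary = pd.DataFrame({
    "strongest_positive": toy_weights.idxmax(axis=1),
    "strongest_negative": toy_weights.idxmin(axis=1),
    "largest_magnitude": toy_weights.abs().idxmax(axis=1),
})
print(summary)
```

Looking at magnitude as well as sign helps, since a large negative weight influences a component as much as a large positive one.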
