Use the knowledge of Python and unsupervised learning to predict if cryptocurrencies are affected by 24-hour or 7-day price changes.

hatkiet/Module-19-CryptoClustering

CryptoClustering

In this challenge, I used my knowledge of Python and unsupervised learning (from in-class activities) to predict whether cryptocurrencies are affected by 24-hour or 7-day price changes.

Data Source: "crypto_market_data.csv"

Original Data: Original data

  • Use the StandardScaler class from scikit-learn to standardize the data from the CSV file, so that each column has zero mean and unit variance.

  • Create a DataFrame with the scaled data and set the "coin_id" index from the original DataFrame as the index for the new DataFrame.
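The two steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the small DataFrame stands in for "crypto_market_data.csv", its values are made up, and the two price-change column names are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for crypto_market_data.csv; the values are made up
# and the two price-change column names are assumptions.
market_df = pd.DataFrame(
    {
        "coin_id": ["coin_a", "coin_b", "coin_c", "coin_d"],
        "price_change_percentage_24h": [1.0, -2.0, 0.5, 3.5],
        "price_change_percentage_7d": [7.0, -4.0, 1.0, 10.0],
    }
).set_index("coin_id")

# Standardize each column to zero mean and unit variance.
scaled_values = StandardScaler().fit_transform(market_df)

# Rebuild a DataFrame, reusing the original column names and coin_id index.
scaled_df = pd.DataFrame(
    scaled_values, columns=market_df.columns, index=market_df.index
)
print(scaled_df)
```

With the real file, the stand-in DataFrame would presumably be replaced by something like `pd.read_csv("crypto_market_data.csv", index_col="coin_id")`.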

Original scaled values

The Elbow Method was used to find the best value of k for both the original scaled dataset and the PCA dataset.

Elbow plots
  • Question: What is the best value for k?

  • Answer: According to the elbow curves, k = 4 is the best option for both datasets, since the curve bends most sharply at that point.
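As a rough sketch of how an elbow curve is computed, the snippet below fits K-Means for k = 1 to 10 and records the inertia (within-cluster sum of squares) for each k. The data is synthetic, generated with four well-separated groups purely for illustration. The range of k values, `n_init=10`, and `random_state=0` are assumptions, not settings taken from the project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with four well-separated groups (illustration only).
centers = np.array([[0, 0], [10, 10], [-10, 10], [10, -10]])
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=1.0, random_state=0)

# Fit K-Means for each candidate k and record the inertia.
k_values = list(range(1, 11))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(X)
    inertias.append(model.inertia_)

# The "elbow" is the k after which inertia stops dropping sharply.
for k, inertia in zip(k_values, inertias):
    print(f"k={k:2d}  inertia={inertia:10.1f}")
```

Plotting `inertias` against `k_values` gives the elbow curve. On this synthetic data the drop in inertia flattens out after k = 4, which matches how the data was constructed.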

Optimize Clusters with PCA

PCAs values
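A minimal sketch of the PCA step, assuming the scaled data is reduced to three principal components (the number of components is not stated here, so `n_components=3` is an assumption). The feature table is random, made-up data with hypothetical column names.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical feature table: 12 made-up coins with 5 random features.
rng = np.random.default_rng(0)
coin_ids = [f"coin_{i}" for i in range(12)]
features = pd.DataFrame(
    rng.normal(size=(12, 5)),
    columns=[f"feature_{j}" for j in range(5)],
    index=pd.Index(coin_ids, name="coin_id"),
)
scaled = StandardScaler().fit_transform(features)

# Reduce to three principal components (assumed number of components).
pca = PCA(n_components=3)
pca_values = pca.fit_transform(scaled)

# Keep coin_id as the index so rows can be matched back to coins.
pca_df = pd.DataFrame(
    pca_values, columns=["PC1", "PC2", "PC3"], index=features.index
)

# Share of the total variance captured by each component.
print(pca.explained_variance_ratio_)
print(pca_df.head())
```

The sum of `explained_variance_ratio_` indicates how much of the original information the reduced dataset retains.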

Clustered Cryptocurrencies with K-Means Using the Original Scaled and PCA Datasets

Clusters plots
  • Question: After visually analyzing the cluster analysis results, what is the impact of using fewer features to cluster the data using K-Means?

  • Answer: The elbow curves show that using fewer features gives results comparable to the original model: both point to a best k of 4. The scatter plots, however, differ between the two datasets. Although using many features may seem advantageous, it can sometimes lead to overfitting. The PCA-based plot (right), which uses fewer features, shows the clusters more distinctly. For example, clusters 0 and 2 sit so close together in the original-data plot that they are hard to tell apart, but they are clearly separated in the PCA-based plot. Similarly, the distinction between clusters 1 and 2 is easier to see in the PCA-based plot.
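The visual comparison can be complemented by checking whether K-Means assigns points to the same groups in both feature spaces. The sketch below does this with a cross-tabulation, which is a swapped-in check and not part of the original analysis. It uses synthetic data, and `n_components=3`, `n_init=10`, and `random_state=1` are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic 5-feature data with four well-separated groups (illustration only).
centers = np.array([
    [6, 6, 6, 0, 0],
    [-6, -6, 6, 6, 0],
    [6, -6, -6, 0, 6],
    [-6, 6, -6, 6, 6],
])
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=1.0, random_state=1)

# Build the two feature spaces: all scaled features vs. three PCA components.
scaled = StandardScaler().fit_transform(X)
reduced = PCA(n_components=3).fit_transform(scaled)

# Cluster each feature space with k = 4.
labels_scaled = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(scaled)
labels_pca = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(reduced)

# Cluster numbers are arbitrary, so compare assignments with a cross-tabulation
# instead of checking the labels for equality.
comparison = pd.crosstab(
    pd.Series(labels_scaled, name="scaled_cluster"),
    pd.Series(labels_pca, name="pca_cluster"),
)
print(comparison)
```

If each row of the table has a single non-zero cell, both models group the points identically even when the cluster numbers differ. Non-zero cells spread across a row indicate points that changed cluster after the reduction.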
