In this challenge, I used Python and unsupervised learning (techniques practiced in the class activities) to predict whether cryptocurrencies are affected by 24-hour or 7-day price changes.
Data Source: "crypto_market_data.csv"
- Used the StandardScaler() class from scikit-learn to standardize the data from the CSV file, rescaling each feature to zero mean and unit variance.
- Created a DataFrame from the scaled data, using the "coin_id" index of the original DataFrame as its index.
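A minimal sketch of these two steps, assuming the CSV has a `coin_id` column plus numeric price-change columns. The small frame below is made-up placeholder data standing in for `crypto_market_data.csv`, and its column names are hypothetical; with the real file you would load it with `pd.read_csv("crypto_market_data.csv", index_col="coin_id")` instead.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for crypto_market_data.csv; the real file could be loaded with
# pd.read_csv("crypto_market_data.csv", index_col="coin_id")
market_df = pd.DataFrame(
    {
        "price_change_percentage_24h": [1.1, -0.5, 2.3, 0.2],
        "price_change_percentage_7d": [7.4, 3.1, -1.2, 0.9],
    },
    index=pd.Index(["coin_a", "coin_b", "coin_c", "coin_d"], name="coin_id"),
)

# Standardize each column to zero mean and unit variance (returns a NumPy array)
scaled_values = StandardScaler().fit_transform(market_df)

# Wrap the array in a DataFrame, reusing the original column names
# and the original "coin_id" index so each row still identifies a coin
scaled_df = pd.DataFrame(scaled_values, columns=market_df.columns, index=market_df.index)
print(scaled_df)
```

Reusing the original index matters because `fit_transform` drops the index, and later cluster labels need to be matched back to specific coins.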
Used the Elbow Method to find the best value of k for both the original scaled dataset and the PCA dataset.
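- Question: What is the best value for k?
- Answer: According to the elbow curves, k = 4 is the best value for both the original scaled data and the PCA data, since that is where each curve shows a strong inflection point (the "elbow") after which adding clusters reduces inertia much more slowly.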
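The elbow search can be sketched as fitting K-Means for a range of k values and recording each model's inertia (the within-cluster sum of squared distances, exposed by scikit-learn as `inertia_`). The random blob data and the k range of 1 to 10 are assumptions for illustration; in the challenge the same loop would run once on the scaled DataFrame and once on the PCA DataFrame, and the elbow is then read off a line plot of k against inertia.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical data with 4 groups, standing in for the scaled crypto features
X, _ = make_blobs(n_samples=200, centers=4, n_features=5, random_state=1)

def elbow_table(data, k_values):
    """Fit K-Means for each k and return a DataFrame of k versus inertia."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=1)
        model.fit(data)
        inertias.append(model.inertia_)  # within-cluster sum of squared distances
    return pd.DataFrame({"k": list(k_values), "inertia": inertias})

elbow_df = elbow_table(X, range(1, 11))
print(elbow_df)
# To plot the elbow curve (requires matplotlib):
# elbow_df.plot.line(x="k", y="inertia")
```

The "best" k is a judgment call read from the curve: the point where inertia stops dropping sharply, not the point of lowest inertia (which always favors more clusters).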
Optimize Clusters with PCA
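A sketch of the PCA step, assuming the scaled features are reduced to three principal components (the component count is an assumption; it is not stated here). `explained_variance_ratio_` reports the share of total variance each component keeps, which is one way to judge how much information the smaller feature set retains. The random data and the `PC1` to `PC3` column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the scaled crypto DataFrame: 50 coins, 7 features
rng = np.random.default_rng(0)
raw = pd.DataFrame(
    rng.normal(size=(50, 7)),
    index=pd.Index([f"coin_{i}" for i in range(50)], name="coin_id"),
)
scaled_df = pd.DataFrame(StandardScaler().fit_transform(raw), index=raw.index)

# Project the scaled data onto its first three principal components
pca = PCA(n_components=3)
pca_values = pca.fit_transform(scaled_df)

# Keep the coin_id index so each row still identifies a cryptocurrency
pca_df = pd.DataFrame(pca_values, columns=["PC1", "PC2", "PC3"], index=scaled_df.index)

print(pca.explained_variance_ratio_)        # share of variance per component
print(pca.explained_variance_ratio_.sum())  # total variance retained by 3 components
```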
Clustered Cryptocurrencies with K-Means Using the Original Scaled and PCA Datasets
- Question: After visually analyzing the cluster analysis results, what is the impact of using fewer features to cluster the data using K-Means?
- Answer: The elbow curves show that using fewer features gives results comparable to the original model: both have their best k-value at 4. However, the scatter plots reveal differences between the PCA-based clustering and the original one. Although using more features may seem advantageous, extra features can add noise and redundancy that blur the separation between clusters. Using fewer features, as in the right-hand (PCA) plot, gives a more distinguishable picture of the clusters. For example, clusters 0 and 2 sit so close together in the original-data plot that they are hard to tell apart, but they are clearly separated in the PCA-based plot. Similarly, the boundary between clusters 1 and 2 is more visible in the PCA-based plot.
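To compare the two models beyond the scatter plots, a sketch like the following fits K-Means with k = 4 on both the full scaled features and their PCA projection, then measures how closely the two label sets agree with scikit-learn's `adjusted_rand_score` (1.0 means identical groupings up to relabeling). The blob data, the three-component PCA, and the use of the adjusted Rand index are assumptions added for illustration, not part of the original analysis. Because K-Means numbers its clusters arbitrarily, cluster 0 in one model need not correspond to cluster 0 in the other.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Hypothetical data standing in for the crypto features: 4 groups in 7 dimensions
X, _ = make_blobs(n_samples=120, centers=4, n_features=7, random_state=42)
scaled = StandardScaler().fit_transform(X)

# Reduce the scaled features to 3 principal components (assumed component count)
pca_data = PCA(n_components=3).fit_transform(scaled)

# Fit K-Means with k = 4 on each version of the data
labels_full = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scaled)
labels_pca = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(pca_data)

# Adjusted Rand index: 1.0 means the same grouping, ignoring label numbering
agreement = adjusted_rand_score(labels_full, labels_pca)
print(f"Agreement between full-feature and PCA clusters: {agreement:.3f}")

# Cluster sizes for each model, for a quick side-by-side look
sizes = pd.DataFrame({
    "full_features": pd.Series(labels_full).value_counts().sort_index(),
    "pca": pd.Series(labels_pca).value_counts().sort_index(),
})
print(sizes)
```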
