Cryptocurrency Clusters

Here is a hypothetical problem set that allows an analyst to dig into unsupervised learning while showing some skills in pre-processing data in pandas.

Background

You are on the Advisory Services Team of a financial consultancy. One of your clients, a prominent investment bank, is interested in offering a new cryptocurrency investment portfolio for its customers. The company, however, is lost in the vast universe of cryptocurrencies. They’ve asked you to create a report that lists the cryptocurrencies on the trading market and determines whether they can be grouped to create a classification system for this new investment.
You have been handed raw data, so you will first need to process it to fit the machine learning models. Since there is no known classification system, you will need to use unsupervised learning.

Data Preparation

The dataset for this project was obtained from CryptoCompare.
Once I loaded the data, I took the following pre-processing steps to make the dataset suitable for an unsupervised learning model.
- Removed all rows that have at least one null value.
- Filtered for cryptocurrencies that have been mined. That is, the total coins mined should be greater than zero.
- Dropped the CoinName column from the original dataframe, since it is a text label rather than a feature.
- Converted the remaining features with text values, Algorithm and ProofType, into numerical data by creating dummy variables.
- Standardized the dataset using scaling so that columns that contained larger values did not unduly influence the outcome.
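
The steps above can be sketched roughly as follows. This is a minimal illustration on a made-up five-row frame, not the notebook's actual code: the column name `TotalCoinsMined`, the helper `preprocess`, and the toy values are assumptions introduced here, while `CoinName`, `Algorithm`, and `ProofType` come from the description above.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the cleaning steps listed above."""
    df = df.dropna()                              # drop rows with any null value
    df = df[df["TotalCoinsMined"] > 0]            # keep only coins that have been mined (assumed column name)
    df = df.drop(columns=["CoinName"])            # text label, not a feature
    # One dummy (0/1) column per distinct Algorithm and ProofType value
    df = pd.get_dummies(df, columns=["Algorithm", "ProofType"], dtype=float)
    # Scale every column to zero mean and unit variance
    scaled = StandardScaler().fit_transform(df)
    return pd.DataFrame(scaled, columns=df.columns, index=df.index)


# Toy stand-in for the raw data (values are invented for illustration)
raw = pd.DataFrame({
    "CoinName": ["A", "B", "C", "D", "E"],
    "Algorithm": ["SHA-256", "Scrypt", "SHA-256", None, "X11"],
    "ProofType": ["PoW", "PoW/PoS", "PoS", "PoW", "PoW"],
    "TotalCoinsMined": [100.0, 0.0, 250.0, 50.0, 75.0],
})

clean = preprocess(raw)
print(clean.shape)
```

Scaling comes last so that the dummy columns are standardized along with the numeric ones.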

Dimensionality Reduction

Creating dummy variables dramatically increased the number of features in the dataset. Performing dimensionality reduction with PCA can bring the number of features back down, reducing the complexity of the model. Rather than specify the number of principal components, I specified the share of variance I wanted the components to explain, 0.90, when setting the parameters for the PCA. Performing PCA reduced my feature set from 109 to 84 features.
Next, I ran my PCA dataset through a t-SNE model and visually inspected the results, looking for clusters. As can be observed in my Jupyter Notebook file, I did in fact find clusters of cryptocurrencies using the t-SNE model, suggesting that crypto assets could be classified.
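
As a rough sketch of the variance-based setting described above: scikit-learn's `PCA` accepts a float between 0 and 1 for `n_components` and keeps the smallest number of components whose cumulative explained variance exceeds that fraction. The data below is random stand-in data (an assumption for illustration), so the component count it prints will not match the 109 to 84 reduction reported here.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in for the scaled feature matrix: 200 rows, 20 correlated features
latent = rng.normal(size=(200, 5))
X_scaled = latent @ rng.normal(size=(5, 20)) + 0.1 * rng.normal(size=(200, 20))

# A float n_components asks PCA to retain enough components to explain
# at least that fraction of the total variance.
pca = PCA(n_components=0.90, svd_solver="full")
X_pca = pca.fit_transform(X_scaled)

print("components kept:", X_pca.shape[1])
print("variance explained:", pca.explained_variance_ratio_.sum())
```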
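
A minimal sketch of the t-SNE step, assuming scikit-learn's `TSNE` and using synthetic blobs in place of the PCA output (the blob data, the perplexity value, and the variable names are illustrative assumptions, not the notebook's settings):

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Stand-in for the PCA-transformed data: 150 points in 10 dimensions
X_pca, _ = make_blobs(n_samples=150, n_features=10, centers=3, random_state=0)

# Project to two dimensions for visual inspection
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(X_pca)

print(embedding.shape)
```

The two columns of `embedding` can then be drawn as a scatter plot to look for visually separated groups. t-SNE is used here only for visualization; distances between far-apart groups in the plot are not reliable measures of similarity.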

Cluster Analysis with k-Means

Using further clustering analysis with the k-Means technique, I determined that the optimal number of clusters for classifying the crypto data is 4. Beyond that point, the elbow curve I generated shows diminishing returns from adding more clusters.
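
An elbow curve of this kind can be produced along the following lines. This is a sketch on synthetic data with four well-separated groups (an assumption chosen so that the elbow is easy to see); the range of k values and the `n_init` setting are likewise illustrative rather than taken from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Stand-in data: four well-separated groups in two dimensions
centers = [[0, 0], [10, 10], [-10, 10], [10, -10]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=1.0, random_state=0)

# Fit k-means for k = 1..10 and record the inertia
# (sum of squared distances from each point to its assigned centroid)
ks = list(range(1, 11))
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(X)
    inertias.append(model.inertia_)

for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 1))
```

Plotting `inertias` against `ks` gives the elbow curve: the value of k where the curve flattens is the point past which extra clusters reduce inertia only slightly.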

gpawlows/Unsupervised_Learning_Challenge
