# Cryptocurrency Word Mover's Distance Semantic Analysis
### Authors


|    Student Name                 |    Student Number  |
|---------------------------------|--------------------|
| Raj Sandhu                      | 101111960          |
| Akaash Kapoor                   | 101112895          |
| Ali Alvi                        | 101114940          |
| Hassan Jallad                   | 101109334          |
| Areeb Ul Haq                    | 101115337          |
| Ahmad Abuoudeh                  | 101072636          |

## Libraries to Import

In [12]:
import pandas as pd
import gensim.downloader as api
import matplotlib.pyplot as plt
import seaborn as sns

## Read In Processed Coin Dataset

In [3]:
coin_df = pd.read_csv("coin-info.csv") #Read in the processed dataframe generated in phase 2.
coin_df.head() #Print first 5 rows of dataframe to assess validity.

|   | Name            | Volatility | Description                                       |
|---|-----------------|------------|---------------------------------------------------|
| 0 | iota            | 0.388529   | IOTA (IOTA or MIOTA) is a cryptocurrency token... |
| 1 | anchor-protocol | 1.155277   | Anchor Protocol is a yield stable and attracti... |
| 2 | compound        | 155.017778 | COMP is an ERC-20 token built on the Ethereum ... |
| 3 | bitcoin-sv      | 64.927187  | Bitcoin SV is a cryptocurrency that was create... |
| 4 | drep            | 0.48517    | DREPis committed to building a performance-ori... |


## Load In Pretrained Word Embedding Model

In [4]:
model = api.load("word2vec-google-news-300") #Load in the pretrained word embedding model which is used to perform word mover's distance between pairs of documents.



## Generate Similarity Matrix for the Word Mover's Distance Metric

In [5]:
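descriptions = coin_df.iloc[:, -1] #Last column holds the coin descriptions.
names = coin_df.iloc[:, 0] #First column holds the coin names.
coin_similarity_matrix = pd.DataFrame(
    [[model.wmdistance(p1, p2) for p2 in descriptions] for p1 in descriptions],
    columns=names, index=names)
#Computes the word mover's distance for every ordered pair of descriptions and stores the results in a square matrix labelled by coin name.
#Despite the name, the entries are distances: the diagonal is 0 and smaller values mean more similar descriptions.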
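
gensim documents `wmdistance` as taking each document as a list of tokens, whereas the cell above passes the raw description strings, which Python iterates character by character. A minimal sketch of a tokenized variant follows; it also computes each unordered pair only once, since the distance is symmetric. The helper names `tokenize` and `pairwise_distances` are hypothetical, the regex tokenizer is an assumption (the phase 2 preprocessing may already handle this differently), coin names are assumed unique, and `distance` stands in for `model.wmdistance`.

```python
import re
from itertools import combinations


def tokenize(text):
    """Split a description into word tokens (hypothetical, deliberately simple)."""
    return re.findall(r"\w+", text)


def pairwise_distances(names, descriptions, distance):
    """Return {name: {name: distance}}, calling `distance` once per unordered pair."""
    tokens = [tokenize(d) for d in descriptions]
    # Assumption: a description compared with itself has distance 0.
    matrix = {a: {b: 0.0 for b in names} for a in names}
    for i, j in combinations(range(len(names)), 2):
        d = distance(tokens[i], tokens[j])
        matrix[names[i]][names[j]] = d
        matrix[names[j]][names[i]] = d  # Mirror the value across the diagonal.
    return matrix


# Possible use in this notebook (assumes `model` and `coin_df` from the cells above):
# nested = pairwise_distances(list(coin_df["Name"]), list(coin_df["Description"]), model.wmdistance)
# coin_similarity_matrix = pd.DataFrame(nested)
```

Tokenizing the descriptions would change the distances shown in the outputs of this notebook, so the matrix would need to be regenerated if this variant were adopted.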

## Display Similarity Matrix and Check Validity

In [6]:
coin_similarity_matrix #Display computed similarity matrix.

Name,iota,anchor-protocol,compound,bitcoin-sv,drep,moonbeam,usd-coin,chainlink,basic-attention-token,bittorrent,...,gala,bitcoin-gold,render-token,unfoldu-group-coin-(new),maker,nexus-mutual,juno,okb,avalanche,compound-usd-coin
iota,0.000000,0.910464,0.531949,0.535122,0.707168,0.616389,0.454769,0.433123,0.399342,0.498698,...,0.611848,0.592953,0.724121,0.514229,0.538413,0.501112,0.612692,0.591816,0.521628,0.654713
anchor-protocol,0.910464,0.000000,0.870232,0.783680,0.904093,0.815172,0.890481,0.867455,0.876655,0.925402,...,1.053807,0.753797,1.005521,0.958371,0.864873,1.023219,0.854293,0.912554,0.793762,0.924556
compound,0.531949,0.870232,0.000000,0.464339,0.509323,0.500769,0.444233,0.346638,0.322736,0.488655,...,0.651096,0.558753,0.643553,0.453844,0.372586,0.551021,0.439970,0.494823,0.553991,0.416949
bitcoin-sv,0.535122,0.783680,0.464339,0.000000,0.472227,0.618005,0.461729,0.421689,0.439157,0.561344,...,0.651891,0.412771,0.736894,0.502224,0.469675,0.550660,0.484696,0.572272,0.534966,0.525691
drep,0.707168,0.904093,0.509323,0.472227,0.000000,0.645859,0.618103,0.525892,0.547268,0.631994,...,0.747581,0.634858,0.732245,0.609771,0.568612,0.595998,0.466583,0.602804,0.558665,0.566487
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
nexus-mutual,0.501112,1.023219,0.551021,0.550660,0.595998,0.462664,0.451734,0.402379,0.429480,0.599143,...,0.555100,0.676090,0.611816,0.416645,0.475757,0.000000,0.461955,0.609187,0.471587,0.671330
juno,0.612692,0.854293,0.439970,0.484696,0.466583,0.522588,0.560303,0.368216,0.448125,0.643994,...,0.693118,0.646568,0.575115,0.467304,0.414114,0.461955,0.000000,0.640282,0.487108,0.535113
okb,0.591816,0.912554,0.494823,0.572272,0.602804,0.596414,0.565223,0.522803,0.495656,0.494578,...,0.786704,0.592837,0.893617,0.585406,0.617663,0.609187,0.640282,0.000000,0.537707,0.620286
avalanche,0.521628,0.793762,0.553991,0.534966,0.558665,0.613194,0.577205,0.542240,0.535158,0.520215,...,0.693845,0.654128,0.831090,0.526377,0.608833,0.471587,0.487108,0.537707,0.000000,0.675497


In [31]:
#Sanity check: verify that the word mover's distance stored for the first two coins matches a direct computation.
coin_desc_1 = coin_df["Description"][0]
coin_desc_2 = coin_df["Description"][1]
coin_name_1 = coin_df["Name"][0]
coin_name_2 = coin_df["Name"][1]

coin_similarity_1 = coin_similarity_matrix.iloc[0, 1] #Value stored at row 0, column 1 of the matrix.
assert model.wmdistance(coin_desc_1, coin_desc_2) == coin_similarity_1, f"Coins {coin_name_1} and {coin_name_2} fail unit test. Computed word mover's distances do not match."
print(f"Coins {coin_name_1} and {coin_name_2} pass the unit test. They have a word mover's distance of: {coin_similarity_1}")

Coins iota and anchor-protocol pass the unit test. They have a word mover's distance of: 0.9104638096135769


The cell above is a sanity check on the similarity matrix. It recomputes the word mover's distance between the first two coins directly with `model.wmdistance` and compares the result with the value stored at row 0, column 1 of the matrix. If the two values are equal, the assert statement passes and a success message is printed; otherwise an `AssertionError` is raised with the provided message. Because the check calls the same function on the same inputs, it confirms that the matrix was assembled and labelled correctly for this pair; it does not independently validate the distance values themselves.
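
The single-pair check could be complemented by structural checks on the whole matrix: a pairwise distance matrix should be square, carry the same labels on both axes, have a zero diagonal, and be symmetric. The sketch below is one way to test this; `check_distance_matrix` is a hypothetical helper, the tolerance is an assumption, and the toy values are the three pairwise distances for iota, anchor-protocol, and compound taken from the displayed matrix.

```python
import numpy as np
import pandas as pd


def check_distance_matrix(matrix, tol=1e-9):
    """Return a dict of structural checks for a labelled pairwise distance matrix."""
    values = matrix.to_numpy(dtype=float)
    square = values.ndim == 2 and values.shape[0] == values.shape[1]
    return {
        "square": square,
        "labels_match": list(matrix.index) == list(matrix.columns),
        # The remaining checks are only meaningful for a square matrix.
        "zero_diagonal": square and bool(np.allclose(np.diag(values), 0.0, atol=tol)),
        "symmetric": square and bool(np.allclose(values, values.T, atol=tol)),
    }


# Toy matrix built from values shown in the displayed output above.
names = ["iota", "anchor-protocol", "compound"]
toy = pd.DataFrame(
    [[0.0, 0.910464, 0.531949],
     [0.910464, 0.0, 0.870232],
     [0.531949, 0.870232, 0.0]],
    index=names, columns=names)
report = check_distance_matrix(toy)
```

In the notebook, the same helper could be called as `check_distance_matrix(coin_similarity_matrix)`, with an assert on `all(report.values())`.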

## Download Similarity Matrix as a CSV File

In [8]:
coin_similarity_matrix.to_csv("coin-similarity-matrix-description.csv") #Export the similarity matrix to a CSV file.

In [9]:
! cat coin-similarity-matrix-description.csv
#This Linux shell command prints the contents of the generated CSV file.

Name,iota,anchor-protocol,compound,bitcoin-sv,drep,moonbeam,usd-coin,chainlink,basic-attention-token,bittorrent,enjin-coin,bitlux-otc,sumcoin,binance-coin,shiba-inu,huobi-btc,bancor-network-token,kava,unus-sed-leo,the-graph,elrond,helium,compound-dai,fei-usd,compound-ether,gnosis,theta-fuel,lido-staked-ether,hedera-hashgraph,keep-network,monero,wojak-finance,golem,bitcoin-defi,gatetoken,ecash,thorecoin,ravencoin,voyager-token,celo,zilliqa,cryncoin,solana,holo,yaapoo,open-governance-token,the-transfer-token,convex-finance,bitgert,magic-internet-money,bitkub-coin,waves,uniswap,siacoin,pocket-network,youcash,bitdao,bitcoin-nft,frax,the-sandbox,osmosis,loopring,ontology,aave,zcash,amp,woo-network,ethereum-classic,internet-computer,dogecoin,klaytn,mina-protocol,kusama,neo,apenft,huobi-token,cosmos,nem,ftx-token,bora,decentraland,bittorrent-[old],swissborg,mixin,cronos,stellar,axie-infinity,algorand,renbtc,marinade-staked-sol,arweave,eos,pancakeswap,ecomi,curve-dao-token,defichain,egw-academ

Once the export cell has run, the CSV file appears in the Files section on the left side of the screen. Right-click it, download it, and save it in the models folder of the repo.
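
When the saved file is loaded again later, the first CSV column holds the coin names, so it needs to be restored as the row index. The sketch below shows a round trip on a toy matrix using an in-memory buffer instead of a file; `survives_csv_round_trip` is a hypothetical helper, and the toy value is the iota and anchor-protocol distance shown earlier.

```python
import io

import numpy as np
import pandas as pd


def survives_csv_round_trip(matrix):
    """Write `matrix` to CSV text, read it back, and compare labels and values."""
    buffer = io.StringIO()
    matrix.to_csv(buffer)
    buffer.seek(0)
    # index_col=0 restores the coin names as the row index instead of a data column.
    reloaded = pd.read_csv(buffer, index_col=0)
    same_labels = (list(reloaded.index) == list(matrix.index)
                   and list(reloaded.columns) == list(matrix.columns))
    same_values = reloaded.shape == matrix.shape and bool(
        np.allclose(reloaded.to_numpy(dtype=float), matrix.to_numpy(dtype=float)))
    return same_labels and same_values


names = pd.Index(["iota", "anchor-protocol"], name="Name")
toy = pd.DataFrame([[0.0, 0.910464], [0.910464, 0.0]], index=names, columns=names)
ok = survives_csv_round_trip(toy)
```

For the file written above, the equivalent read would be `pd.read_csv("coin-similarity-matrix-description.csv", index_col=0)`.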