 # <h1 style='background:#346888; color:white; line-height:1.25;'><center>Cryptocurrency price correlation</center></h1> 
 
  
**<span style="color:#346888;">In this notebook I try to analyze the trends in the daily average closing value and the volume transacted of Bitcoin, Ethereum and Litecoin between Jan-01-2020 and Apr-14-2021. I have also tried to understand the correlation between the three currencies. </span>**

**<span style="color:#346888;">If you have any suggestions on how to improve this notebook, please let me know. </span>**

**<span style="color:#346888;">Happy Learning!</span>**

In [None]:
#Import required libraries
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

 # <span style="color:#346888;">TABLE OF CONTENTS</span>


* **[Load Data](#load-data)**


* **[Understanding Data](#un-data)**


* **[Data cleaning and preparation](#clean-data)**
    * **[Drop columns](#drop-cols)**
    * **[Missing data](#missing-rows)**
    * **[Data aggregation](#data-agg)**
    
    
* **[Understanding the trends](#trend)**


* **[Analysing the combined trends](#trend-com)**
    * **[Average closing value](#close-val)**
    * **[Closing volume](#vol)**
    
    
* **[Correlation](#corr)**

## <span style="color:#346888;">Load Data</span> <a id="load-data"></a>

In [None]:
#load bitcoin data
btc_master = pd.read_csv('../input/cryptocurrency-timeseries-2020/gemini_BTCUSD_2020_1min.csv')

#load ethereum data
eth_master = pd.read_csv('../input/cryptocurrency-timeseries-2020/gemini_ETHUSD_2020_1min.csv')

#load litecoin data
ltc_master = pd.read_csv('../input/cryptocurrency-timeseries-2020/gemini_LTCUSD_2020_1min.csv')

## <span style="color:#346888;">Understanding Data</span> <a id="un-data"></a>

In [None]:
#Check the data dimensions for bitcoin and preview the first rows
print(btc_master.shape)
btc_master.head()

In [None]:
btc_master.info()

In [None]:
#Check the data dimensions for ethereum
eth_master.shape

In [None]:
eth_master.info()

In [None]:
#Check the data dimensions for litecoin
ltc_master.shape

In [None]:
ltc_master.info()

The litecoin dataset has 54 fewer records than the bitcoin dataset and 48 fewer than the ethereum dataset.

##### The dataframe info shows no null values in any of the datasets
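`info()` reports the non-null count of each column, so the absence of nulls is read off by comparing those counts with the number of rows. Counting the nulls directly is a more explicit check. A minimal sketch, using a small hypothetical DataFrame in place of the real minute-level files:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one of the minute-level datasets
df = pd.DataFrame({
    "Date": ["01/01/2020 00:00", "01/01/2020 00:01", "01/01/2020 00:02"],
    "Close": [100.0, 101.5, np.nan],
    "Volume": [0.5, 0.0, 0.2],
})

# Number of missing values in each column
null_counts = df.isnull().sum()
print(null_counts)

# True only if no column contains a missing value
has_no_nulls = not df.isnull().values.any()
print("No null values:", has_no_nulls)
```

On the real data, `btc_master.isnull().sum()` would show 0 for every column if the datasets indeed contain no nulls.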

## <span style="color:#346888;">Data cleaning and preparation</span> <a class="anchor" id="clean-data"></a>

In [None]:
#Convert the object type Date column to datetime type
btc_master['Date'] = pd.to_datetime(btc_master['Date'], format='%m/%d/%Y %H:%M')
eth_master['Date'] = pd.to_datetime(eth_master['Date'], format='%m/%d/%Y %H:%M')
ltc_master['Date'] = pd.to_datetime(ltc_master['Date'], format='%m/%d/%Y %H:%M')

### <span style="color:#346888;">Drop unnecessary columns</span> <a id="drop-cols"></a>

As we are trying to understand the correlation between the currencies over time, the columns of interest are `Date`, `Close` and `Volume`. All the other columns can be dropped from the datasets.

In [None]:
#drop from bitcoin
btc_master.drop(['Unix Timestamp','Symbol','Open', 'High','Low'], axis = 1, inplace = True)
#drop from ethereum 
eth_master.drop(['Unix Timestamp','Symbol','Open', 'High','Low'], axis = 1, inplace = True)
#drop from litecoin
ltc_master.drop(['Unix Timestamp','Symbol','Open', 'High','Low'], axis = 1, inplace = True)

#Confirm that only Date, Close and Volume remain
ltc_master.columns
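Listing the columns to keep, rather than the ones to drop, is an equivalent approach that does not depend on knowing every extra column in a file. A minimal sketch on a hypothetical two-row frame that mimics the column names dropped above:

```python
import pandas as pd

# Hypothetical frame with the same column names as the files loaded above
df = pd.DataFrame({
    "Unix Timestamp": [1577836800, 1577836860],
    "Date": ["01/01/2020 00:00", "01/01/2020 00:01"],
    "Symbol": ["BTCUSD", "BTCUSD"],
    "Open": [100.0, 101.0],
    "High": [102.0, 103.0],
    "Low": [99.0, 100.0],
    "Close": [101.0, 102.0],
    "Volume": [0.5, 0.2],
})

# Keep only the columns of interest; this raises a KeyError if one is missing.
# .copy() gives an independent frame, so later column assignments are safe.
keep = ["Date", "Close", "Volume"]
df = df[keep].copy()
print(df.columns.tolist())
```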

### <span style="color:#346888;">Find missing data</span> <a id="missing-rows"></a>

Since the three datasets do not all have the same number of records, let us analyse which timestamps are missing from each.

In [None]:
#get all the records in bitcoin dataset but not in ethereum dataset
btc_master[~(btc_master['Date'].isin(eth_master["Date"]))]

In [None]:
#get all the records in ethereum dataset but not in bitcoin dataset
eth_master[~(eth_master['Date'].isin(btc_master["Date"]))]

In [None]:
#get all the records in litecoin dataset but not in bitcoin dataset
ltc_master[~(ltc_master['Date'].isin(btc_master["Date"]))].count()

In [None]:
#get all the records in litecoin dataset but not in ethereum dataset
ltc_master[~(ltc_master['Date'].isin(eth_master["Date"]))].count()
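The `isin` checks above have to be run once per direction and per pair of datasets. An outer merge with `indicator=True` lists the timestamps that appear in only one of two datasets in a single pass. A minimal sketch on hypothetical timestamps:

```python
import pandas as pd

# Hypothetical minute timestamps for two currencies; each misses one minute
btc = pd.DataFrame({"Date": pd.to_datetime(["2020-01-01 00:00", "2020-01-01 00:01", "2020-01-01 00:02"])})
eth = pd.DataFrame({"Date": pd.to_datetime(["2020-01-01 00:00", "2020-01-01 00:02", "2020-01-01 00:03"])})

# indicator=True adds a '_merge' column labelling each timestamp as
# 'both', 'left_only' (BTC only) or 'right_only' (ETH only)
merged = btc.merge(eth, on="Date", how="outer", indicator=True)
only_in_one = merged[merged["_merge"] != "both"]
print(only_in_one)
```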

Looking at the missing records, there is no obvious pattern as to why the data is missing. The data is recorded by the minute, but to understand the correlation we plan to aggregate the currency data by day. As no affected day is missing more than a couple of minutes out of the 1,440 minutes in a day, the missing proportion is negligible for this use case: at most it slightly undercounts that day's total volume. Hence, the missing values need not be treated.
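The claim that only a couple of minutes are missing on any day can be checked directly: a complete day has 24 × 60 = 1,440 one-minute records, so counting the records per day gives the number of missing minutes. A minimal sketch on hypothetical data (note that a day with no records at all would not appear in the counts):

```python
import pandas as pd

MINUTES_PER_DAY = 24 * 60  # 1440 one-minute records in a complete day

# Hypothetical minute-level data: day 1 is complete, day 2 misses its first two minutes
day1 = pd.date_range("2020-01-01 00:00", periods=MINUTES_PER_DAY, freq="min")
day2 = pd.date_range("2020-01-02 00:00", periods=MINUTES_PER_DAY, freq="min")[2:]
df = pd.DataFrame({"Date": day1.append(day2)})

# Records per calendar day, then missing minutes and their share of the day
per_day = df.groupby(df["Date"].dt.date).size()
missing = MINUTES_PER_DAY - per_day
missing_share = missing / MINUTES_PER_DAY
print(missing[missing > 0])
print(missing_share.max())
```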

### <span style="color:#346888;">Data aggregation</span> <a id="data-agg"></a>

##### <span style="color:#346888;">Bitcoin</span>

In [None]:
#Lose the time information, keeping only the calendar date
btc_master['Date'] = btc_master['Date'].dt.date

#Aggregate by date: mean closing value and total volume per day
btc_grouped = pd.pivot_table(btc_master, values=['Close', 'Volume'], index=['Date'],
                    aggfunc={'Close': 'mean',
                             'Volume': 'sum'})

btc_grouped.head()

In [None]:
btc_grouped.shape

##### <span style="color:#346888;">Ethereum</span>

In [None]:
#Lose the time information, keeping only the calendar date
eth_master['Date'] = eth_master['Date'].dt.date

#Aggregate by date: mean closing value and total volume per day
eth_grouped = pd.pivot_table(eth_master, values=['Close', 'Volume'], index=['Date'],
                    aggfunc={'Close': 'mean',
                             'Volume': 'sum'})
eth_grouped.head()

In [None]:
eth_grouped.shape

##### <span style="color:#346888;">Litecoin</span>

In [None]:
#Lose the time information, keeping only the calendar date
ltc_master['Date'] = ltc_master['Date'].dt.date

#Aggregate by date: mean closing value and total volume per day
ltc_grouped = pd.pivot_table(ltc_master, values=['Close', 'Volume'], index=['Date'],
                    aggfunc={'Close': 'mean',
                             'Volume': 'sum'})
ltc_grouped.head()

In [None]:
ltc_grouped.shape

<u>After aggregation, we have data for all three currencies for 476 days.</u>
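The three aggregation cells above repeat the same steps. A small helper, here called `aggregate_daily` (a hypothetical name, not part of the original notebook), expresses the same daily mean of `Close` and daily sum of `Volume` once, using `groupby` with named aggregation. It expects `Date` to still hold full timestamps, so it would replace the `.dt.date` and `pivot_table` steps rather than follow them. A minimal sketch on hypothetical data:

```python
import pandas as pd

def aggregate_daily(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: daily mean of Close and daily sum of Volume."""
    day = df["Date"].dt.date  # drop the time of day
    return df.groupby(day).agg(Close=("Close", "mean"), Volume=("Volume", "sum"))

# Hypothetical minute-level data spanning two days
minute_data = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01 00:00", "2020-01-01 00:01",
                            "2020-01-02 00:00", "2020-01-02 00:01"]),
    "Close": [100.0, 102.0, 110.0, 114.0],
    "Volume": [1.0, 2.0, 0.5, 0.5],
})

daily = aggregate_daily(minute_data)
print(daily)
```

With such a helper, each currency would need a single call, e.g. `btc_grouped = aggregate_daily(btc_master)` on a frame whose `Date` column is still datetime.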

## <span style="color:#346888;">Understanding the trends</span> <a id="trend"></a>

Let's ask some questions to understand the trends.

#### What is the highest daily average closing value between Jan-01-2020 and Apr-14-2021?

In [None]:
#Highest daily average closing value of each currency
btc = btc_grouped['Close'].max()
eth = eth_grouped['Close'].max()
ltc = ltc_grouped['Close'].max()

#idxmax returns the date on which the maximum was recorded
print("Highest average closing value of BTC", btc, "was recorded on", btc_grouped['Close'].idxmax())
print("Highest average closing value of ETH", eth, "was recorded on", eth_grouped['Close'].idxmax())
print("Highest average closing value of LTC", ltc, "was recorded on", ltc_grouped['Close'].idxmax())

sns.barplot(x = ["Bitcoin","Ethereum","Litecoin"], y = [btc,eth,ltc])
plt.title("Comparison of the highest daily average closing value between Jan-01-2020 and Apr-14-2021")
plt.show()

The value of bitcoin is far higher than that of the other two currencies.

#### What is the highest volume transacted in one day between Jan-01-2020 and Apr-14-2021?

In [None]:
#Highest volume transacted in a single day for each currency
btc = btc_grouped['Volume'].max()
eth = eth_grouped['Volume'].max()
ltc = ltc_grouped['Volume'].max()

#idxmax returns the date on which the maximum was recorded
print("Highest daily volume of BTC", btc, "was recorded on", btc_grouped['Volume'].idxmax())
print("Highest daily volume of ETH", eth, "was recorded on", eth_grouped['Volume'].idxmax())
print("Highest daily volume of LTC", ltc, "was recorded on", ltc_grouped['Volume'].idxmax())

sns.barplot(x = ["Bitcoin","Ethereum","Litecoin"], y = [btc,eth,ltc])
plt.title("Comparison of the highest volume in a day between Jan-01-2020 and Apr-14-2021")
plt.show()

Far more units of ethereum and litecoin have been transacted than of bitcoin.

#### What was the average closing value on the day the highest volume was transacted between Jan-01-2020 and Apr-14-2021?

In [None]:
#Average closing value on the day with the highest volume
btc = btc_grouped.loc[btc_grouped['Volume'].idxmax(), 'Close']
eth = eth_grouped.loc[eth_grouped['Volume'].idxmax(), 'Close']
ltc = ltc_grouped.loc[ltc_grouped['Volume'].idxmax(), 'Close']

print("Highest volume of BTC was", btc_grouped['Volume'].max(), "and the mean closing value on that day was", btc)
print("Highest volume of ETH was", eth_grouped['Volume'].max(), "and the mean closing value on that day was", eth)
print("Highest volume of LTC was", ltc_grouped['Volume'].max(), "and the mean closing value on that day was", ltc)


#### Was the average closing value at its lowest when the highest volume was transacted?

In [None]:
#Lowest daily average closing value of Bitcoin
print("Lowest average closing value of BTC ", btc_grouped['Close'].min())

#Lowest daily average closing value of Ethereum
print("Lowest average closing value of ETH ", eth_grouped['Close'].min())

#Lowest daily average closing value of Litecoin
print("Lowest average closing value of LTC ", ltc_grouped['Close'].min())

Interestingly, the highest volume was not transacted on the day the average closing value was at its lowest.
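Comparing the printed values by eye answers this question indirectly. Comparing the date returned by `idxmax` on `Volume` with the date returned by `idxmin` on `Close` answers it directly. A minimal sketch on a hypothetical daily frame shaped like `btc_grouped`:

```python
import pandas as pd

# Hypothetical daily aggregates (index = date, columns like btc_grouped)
daily = pd.DataFrame(
    {"Close": [105.0, 98.0, 101.0, 110.0], "Volume": [3.0, 5.0, 9.0, 4.0]},
    index=pd.to_datetime(["2020-06-01", "2020-06-02", "2020-06-03", "2020-06-04"]),
)

# Date with the highest volume vs. date with the lowest average closing value
peak_volume_day = daily["Volume"].idxmax()
lowest_close_day = daily["Close"].idxmin()
print(peak_volume_day, lowest_close_day, peak_volume_day == lowest_close_day)
```

On the real data, the same two calls can be run on `btc_grouped`, `eth_grouped` and `ltc_grouped`.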

#### How did the average closing value vary over time for each currency?

In [None]:
plt.figure(figsize=[15,5])
sns.lineplot(x = btc_grouped.index , y = 'Close', data = btc_grouped)
plt.show()

Bitcoin's average closing price stays under 20K for most of 2020. However, towards the end of 2020 it picked up pace and continued to grow.

In [None]:
plt.figure(figsize=[15,5])
sns.lineplot(x = eth_grouped.index , y = 'Close', data = eth_grouped)
plt.show()

Ethereum's average closing price stays under 1K throughout 2020. However, from 2021 it picked up pace and continued to grow.

In [None]:
#Sample the month-end values; converting the date index to a DatetimeIndex
#first lets asfreq build the monthly range and lets us label the ticks by month
ltc_monthly = ltc_grouped.set_index(pd.to_datetime(ltc_grouped.index)).asfreq('M')
plt.figure(figsize=[15,5])
sns.lineplot(x = ltc_monthly.index , y = 'Close', data = ltc_monthly)
plt.xticks(ticks = ltc_monthly.index, labels = ltc_monthly.index.strftime('%b-%y'))
plt.show()

Litecoin's month-end average closing price stays well under $150 throughout 2020. However, from October 2020 its value continued to grow.

In [None]:
plt.figure(figsize=(15,6))
btc_grouped['Close'].plot(c='blue')
eth_grouped['Close'].plot(c='cyan')
ltc_grouped['Close'].plot(c='orange')
plt.title('Comparison of cryptocurrency values, Jan-2020 to Apr-2021')
plt.legend(('Bitcoin','Ethereum', 'Litecoin'))
plt.show()

Because bitcoin's value is so much higher than that of the other two currencies, we cannot really observe any trend for ethereum and litecoin on this linear Y-scale. Let us change to a log scale and see how the graph looks.
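A log scale helps here because equal vertical distances represent equal *relative* changes. A doubling from 100 to 200 and a doubling from 10,000 to 20,000 differ by a factor of 100 in absolute terms, yet both span log10(2) ≈ 0.301 on a log axis. A quick numeric check:

```python
import numpy as np

# On a log axis a doubling always spans log10(2), whatever the starting level,
# because log10(2 * x) - log10(x) = log10(2) for any x > 0
pairs = [(100, 200), (10_000, 20_000)]
log_steps = {}
for low, high in pairs:
    log_steps[(low, high)] = np.log10(high) - np.log10(low)
    print(f"{low} -> {high}: linear step {high - low}, log10 step {log_steps[(low, high)]:.4f}")
```

This is why the three currencies, whose values differ by orders of magnitude, can be compared on one log-scaled chart.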

## <span style="color:#346888;">Analysing the combined trends</span> <a id="trend-com"></a>

### <span style="color:#346888;">Average closing value</span><a id="close-val"></a>

In [None]:
#Plot all the three currencies together to understand the trend of average closing value by day
plt.figure(figsize=(15,6))
btc_grouped['Close'].plot(c='blue')
eth_grouped['Close'].plot(c='cyan')
ltc_grouped['Close'].plot(c='orange')
plt.title('Comparison of cryptocurrency values (log scale), Jan-2020 to Apr-2021')
plt.legend(('Bitcoin','Ethereum', 'Litecoin'))
plt.yscale('log')
plt.show()

The log Y-scale gives us a much better graph. All three coins seem to follow a very similar trend over the period. In spite of large differences in the average closing value, the increases and decreases in value look alike. This graph suggests a strong correlation between the average closing values of the three currencies, which we can confirm by further analysing the datasets.

### <span style="color:#346888;">Closing volume</span><a id="vol"></a>

In [None]:
#Plot all the three currencies together to understand the trend of volume transacted by day
plt.figure(figsize=(15,6))
btc_grouped['Volume'].plot(c='blue')
eth_grouped['Volume'].plot(c='cyan')
ltc_grouped['Volume'].plot(c='orange')
plt.title('Comparison of cryptocurrency volume by day, Jan-2020 to Apr-2021')
plt.legend(('Bitcoin','Ethereum', 'Litecoin'))
#Uncomment the next line to view the volumes on a log scale
#plt.yscale('log')
plt.show()

We can clearly see that ethereum and litecoin have been transacted in much larger volumes than bitcoin. One hypothesis is that the price of the coin explains this difference: the cheaper the coin, the more units it takes to transact the same dollar amount.
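One way to probe this hypothesis is to convert each coin's volume into an approximate dollar amount by multiplying it by the price. The sketch below uses hypothetical numbers; on the real data the same product could be formed from the `Close` and `Volume` columns of `btc_grouped`, `eth_grouped` and `ltc_grouped`, assuming `Volume` is counted in coins (the notebook does not state its unit):

```python
import pandas as pd

# Hypothetical daily aggregates for two coins with very different prices
btc_daily = pd.DataFrame({"Close": [10_000.0, 11_000.0], "Volume": [2.0, 3.0]})
ltc_daily = pd.DataFrame({"Close": [50.0, 55.0], "Volume": [400.0, 600.0]})

# Approximate USD value transacted: daily average close x coins transacted.
# Only an approximation, since Close is a daily mean rather than the price
# of each individual trade.
for name, daily in [("BTC", btc_daily), ("LTC", ltc_daily)]:
    daily["Volume_usd"] = daily["Close"] * daily["Volume"]
    print(name, daily["Volume_usd"].tolist())
```

In this toy example litecoin moves 200 times more coins than bitcoin, yet the dollar amounts are identical, which is the pattern the hypothesis predicts.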

## <span style="color:#346888;">Correlation</span> <a id="corr"></a>

Let us merge the three datasets to analyse the correlation between the average closing values and the volumes of the three currencies.

In [None]:
#Merge bitcoin and ethereum data
btc_eth = pd.merge(btc_grouped, eth_grouped, suffixes=('_btc', '_eth'), left_index=True, right_index=True)
btc_eth.head()

In [None]:
#Merge litecoin with the other two (pd.merge performs an inner join by default)
btc_eth_ltc = pd.merge(btc_eth, ltc_grouped, left_index=True, right_index=True)
btc_eth_ltc.rename(columns={"Close": "Close_ltc", "Volume": "Volume_ltc"}, inplace=True)
btc_eth_ltc.head()

In [None]:
#Correlation between the currencies (Pearson by default)
btc_eth_ltc.corr()
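`corr()` computes the Pearson correlation coefficient by default, which measures how closely two series move together linearly, on a scale from -1 to 1. A minimal sketch on hypothetical values that computes it by hand and checks it against pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing values for two currencies
df = pd.DataFrame({"Close_a": [1.0, 2.0, 3.0, 4.0], "Close_b": [2.0, 4.1, 5.9, 8.0]})

# Pearson r = cov(x, y) / (std(x) * std(y)), i.e. the dot product of the
# mean-centred series divided by the product of their norms
x = df["Close_a"] - df["Close_a"].mean()
y = df["Close_b"] - df["Close_b"].mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

# DataFrame.corr() uses method='pearson' unless told otherwise
r_pandas = df.corr().loc["Close_a", "Close_b"]
print(r_manual, r_pandas)
```

Because Pearson only captures linear co-movement, `corr(method='spearman')` is an option when a rank-based measure is preferred.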

In [None]:
plt.figure(figsize = (15,8))
sns.heatmap(btc_eth_ltc.corr(), annot = True)
plt.show()

<span style="color:#346888;">We can see that the average closing values of all three currencies are strongly correlated. This validates the trends we observed in the line graph: the values of the three cryptocurrencies analysed here tend to increase and decrease together. Even though the volumes transacted are not as strongly correlated as the values, they still show a clear correlation with one another.</span>

<span style="color:#346888;">It would be interesting to understand the factors influencing these trends!</span>
