## W207 Exploratory Visualization - 16th June 2021 Summer

### Team Members - Jeffrey Adams, Pow Chang, Sweta Bhattacharya, Matt White

### Introduction

Our research focuses on the cryptocurrency market, which is still in its infancy. Although Bitcoin, the leading cryptocurrency, has been in circulation since 2009, the past year has seen a paradigm shift: companies, governments, investors and hobbyists have started to take a serious look at cryptocurrencies as a vehicle for investment, with companies such as MicroStrategy and Tesla purchasing billions of dollars' worth of bitcoin. However, the crypto market is unregulated and very sensitive to events that strongly influence the value of coins. Our research and exploratory analysis center on how the crypto market moves: whether it moves in conjunction with traditional markets and the factors that affect them, or whether it is influenced by other factors that have little to no effect on traditional markets such as stocks, ETFs and commodities. Our three hypotheses guide our exploration of the datasets as we look for patterns, correlations and outliers.

### Observations in the data (without visualizations):

We analyzed several datasets, including Yahoo Finance, Google searches, Twitter trends, Elon Musk's tweets, and a collection of relevant world events and news articles. The Yahoo Finance data is retrieved through an API with configurable parameters, so we could restrict retrieval to specific tickers and time periods. The Yahoo Finance data contains NaN entries on holidays and weekends when the markets are closed; this is true of all of the publicly traded stocks. Bitcoin, Ether and other cryptocurrencies do not have these gaps because their markets never close.
We also had to transform the columns of the dataframe returned by Yahoo Finance, because we were unable to get Altair to work with its nested (multi-level) columns.  
Our Google searches and Twitter trends dataset was built from their respective APIs, seeded with the keywords “bitcoin”, “btc”, “ether” and “cryptocurrency”, as a measure of how popular cryptocurrencies were on a day-by-day basis.
Our events database was constructed from data we retrieved independently, including Elon Musk's tweets, government announcements that affected the crypto market, and other world events that may influence crypto market performance.
Patterns are hard to see in this data without visualization because it is time series data over a long period, and we believe it is highly correlated with the Yahoo Finance data.
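The column transformation described above can be sketched on a small hypothetical frame that mimics the layout `yf.download` returns for several tickers: a two-level column index of (price field, ticker). Moving the ticker level into the rows with `DataFrame.stack` yields a flat, long format that Altair can encode directly; the notebook's own version uses `data.stack()` followed by renaming `level_1` to `Symbol`. The values and the `Field`/`Symbol` level names below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame shaped like multi-ticker yfinance output:
# columns are (price field, ticker) pairs, rows are dates.
dates = pd.date_range("2021-06-07", periods=3, freq="D", name="Date")
columns = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["BTC-USD", "AAPL"]], names=["Field", "Symbol"]
)
wide = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns)

# Move the ticker level from the columns into the rows: one row per (Date, Symbol),
# with flat Close/Volume columns that Altair can encode.
long = wide.stack(level="Symbol").reset_index()
long.columns.name = None  # drop the leftover "Field" label on the column index
print(long)
```

The result has one row per date and ticker, which is the layout every chart in this notebook filters with `df.Symbol`.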

### Hypothesis Description:
From our initial analysis of the datasets we were able to come up with the following hypotheses:

- Hypothesis 1: Cryptocurrency market trends mirror traditional markets.
- Hypothesis 2: Alt coins are highly correlated with and follow Bitcoin price.
- Hypothesis 3: Cryptocurrency market is highly influenced by non-market factors that don't affect traditional markets.



### Hypothesis 1: Cryptocurrency market trends mirror traditional markets

In [1]:
import warnings

import pandas as pd
import altair as alt
import numpy as np
import matplotlib.pyplot as plt

print("pd", pd.__version__)
print("alt", alt.__version__)

# Serve chart data from a local server instead of embedding it in the notebook,
# which keeps the notebook file small (requires the altair_data_server package).
alt.data_transformers.enable('data_server')
# Other options we tried:
# alt.data_transformers.disable_max_rows()
# alt.data_transformers.enable('json')
# alt.renderers.enable('default')

# Optional chart export (requires the altair_saver package):
# from altair_saver import save
# alt.renderers.enable('altair_saver', fmts=['vega-lite', 'png'])

warnings.filterwarnings("ignore")


pd 1.1.5
alt 4.1.0


## This cell pulls data from the Yahoo Finance API; it only needs to run once to update the dataset

In [2]:
# import yfinance as yf
# import matplotlib.pyplot as plt
# Tickers: BTC and ETH in USD, individual stocks, the Nasdaq Composite (^IXIC),
# Dow Jones (^DJI) and S&P 500 (^GSPC) indices, and gold (GC=F) and crude oil (CL=F) futures
tickers = ['BTC-USD','ETH-USD', 'AAPL', 'TSLA', 'MSFT', 'NVDA', 'SQ', 'PYPL', 'MSTR', 'JPM', '^IXIC', '^DJI', '^GSPC', 'GC=F', 'CL=F']
# # Download daily prices for all tickers between the start and end dates
# data = yf.download(tickers,'2015-01-01','2021-06-15')
# # Plot the adjusted close of gold futures as a quick sanity check
# data["Adj Close"]['GC=F'].plot()
# plt.show()

In [3]:
# data

In [4]:
# df = data.stack().reset_index().rename(index=str, columns={"level_1": "Symbol"}).sort_values(['Symbol','Date'])


In [5]:
# df.columns

In [6]:
# df.set_index('Date', inplace = True)

In [7]:
# df = df.rename(columns = {'Adj Close':'Adj_Close'})

In [8]:
# df

In [9]:
# df[df.Symbol == 'AAPL']['Close'].plot()

In [10]:
# df.reset_index(inplace = True)

In [11]:
# df

## Create moving-average and gain/loss features

In [12]:
# # Loop through the tickers to calculate the `days`-day moving average of Close for each one
# def calculate_moving_average(days):
#     data = []
#     for ticker in tickers:
#         ma = df.groupby('Symbol').get_group(ticker)["Close"].rolling(days).mean()
#         data.append(ma)
#     return pd.concat(data)
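The per-ticker loop above can also be written as a single grouped operation. This is a minimal sketch, assuming the same long-format layout (one row per `Symbol` and date, sorted by date within each ticker); `moving_average` and the toy frame are hypothetical names, not part of the notebook. `groupby(...).transform` keeps the original row index, so the result can be assigned straight back as a column, and the rolling window never spans two tickers.

```python
import pandas as pd


def moving_average(df, days, price_col="Close", group_col="Symbol"):
    """Rolling mean of `price_col` over `days` rows, computed separately per ticker."""
    return df.groupby(group_col)[price_col].transform(lambda s: s.rolling(days).mean())


# Hypothetical long-format data, sorted by Symbol and then by date.
toy = pd.DataFrame({
    "Symbol": ["A", "A", "A", "B", "B", "B"],
    "Close": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})
toy["MA2"] = moving_average(toy, 2)
print(toy)
```

With the notebook's data, `df["MA50"] = moving_average(df, 50)` should play the same role as `calculate_moving_average(50)`.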


In [13]:
# df["MA50"] = calculate_moving_average(50)

In [14]:
# df["MA100"] = calculate_moving_average(100)

In [15]:
# df["MA200"] = calculate_moving_average(200)

In [16]:
# Label each row as a gain or loss (requires the "Returns" column computed further below)
#df["LossGain"] = (df["Returns"] >= 0).astype(int)
#df['LossGain'] = np.where(df['Returns'] >= 0, 'Gain', 'Loss')

## Create additional feature variables 

In [17]:
# def wwma(values, n):
#     """Wilder's moving average: an exponentially weighted mean with alpha = 1/n.
#     Source: Investopedia - exponential weighted (EW) functions"""
#     return values.ewm(alpha=1/n, adjust=False).mean()

# def atr(df, n=14):
#     """Calculate the Average True Range (ATR).
#     ATR is the average of true ranges over the specified period and measures
#     volatility, taking into account any gaps in the price movement.
#     Typically, the ATR calculation is based on 14 periods, which can be intraday,
#     daily, weekly, or monthly.
#     """
#     data = df.copy()
#     high = data["High"]
#     low = data["Low"]
#     close = data["Close"]
#     # True range: the largest of today's range and the gaps from yesterday's close
#     data['tr0'] = abs(high - low)
#     data['tr1'] = abs(high - close.shift())
#     data['tr2'] = abs(low - close.shift())
#     tr = data[['tr0', 'tr1', 'tr2']].max(axis=1)
#     atr = wwma(tr, n)
#     return atr
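The true range above is `TR_t = max(High_t - Low_t, |High_t - Close_(t-1)|, |Low_t - Close_(t-1)|)`, and `wwma` smooths it with `ewm(alpha=1/n, adjust=False)`. The small check below, on hypothetical true-range values, illustrates that this pandas call matches Wilder's recursive form `ATR_t = ATR_(t-1) + (TR_t - ATR_(t-1)) / n`, seeded with the first value. `wilder_recursive` is a helper written only for this comparison.

```python
import pandas as pd


def wilder_recursive(values, n):
    """Wilder's smoothing as an explicit loop, seeded with the first value."""
    out = []
    for x in values:
        out.append(x if not out else out[-1] + (x - out[-1]) / n)
    return out


tr = pd.Series([2.0, 4.0, 3.0, 5.0])  # hypothetical true-range values
n = 2
ewm_version = tr.ewm(alpha=1 / n, adjust=False).mean().tolist()
loop_version = wilder_recursive(tr.tolist(), n)
print(ewm_version)   # [2.0, 3.0, 3.0, 4.0]
print(loop_version)  # [2.0, 3.0, 3.0, 4.0]
```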

In [18]:
# # Loop through the tickers to calculate the ATR value
# data = []
# for ticker in tickers:
#     ATR = atr(df.groupby('Symbol').get_group(ticker), n=14)
#     data.append(ATR)
# df["ATR"] = pd.concat(data)

In [19]:
# # Loop through the tickers to calculate the 10-day returns, in percent
# data = []
# for ticker in tickers:
#     dayX10_returns = df.groupby('Symbol').get_group(ticker)["Close"].pct_change(10)*100
#     data.append(dayX10_returns)
# df["Returns_Percent"] = pd.concat(data)

In [20]:
# # Loop through the tickers to calculate the 10-day returns, as a fraction
# data = []
# for ticker in tickers:
#     dayX10_returns = df.groupby('Symbol').get_group(ticker)["Close"].pct_change(10)
#     data.append(dayX10_returns)
# df["Returns"] = pd.concat(data)
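Both returns cells compute `pct_change(10)`, i.e. `Close_t / Close_(t-10) - 1` over the previous 10 rows of the same ticker, so `Returns` is a 10-row return rather than a one-day return. Looping over tickers (or grouping, as in this sketch) matters because a plain `pct_change(10)` on the stacked frame would compare the first rows of one ticker with the last rows of the previous one. The toy prices are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: 12 rows per ticker.
toy = pd.DataFrame({
    "Symbol": ["A"] * 12 + ["B"] * 12,
    "Close": [float(i) for i in range(1, 13)] + [100.0] * 12,
})

# Grouped: each 10-row return uses only that ticker's own history.
toy["Returns"] = toy.groupby("Symbol")["Close"].pct_change(periods=10)

# Ungrouped, for contrast: B's first rows are compared with A's prices.
leaky = toy["Close"].pct_change(periods=10)
print(toy["Returns"].iloc[12], leaky.iloc[12])
```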

In [21]:
# # Loop through the tickers to calculate the daily range (High - Low)
# data = []
# for ticker in tickers:
#     daily_range = df.groupby('Symbol').get_group(ticker)["High"]-df.groupby('Symbol').get_group(ticker)["Low"]
#     data.append(daily_range)
# df["Daily_Range"] = pd.concat(data)


In [22]:
# df.groupby('Symbol').get_group('CL=F')['ATR'].plot()

In [23]:
# Run once to store the dataset, then comment out the yfinance cells to avoid repeatedly pulling data from the API
# df.to_csv('PrimeDataSet2021-06-18.csv', index = False)

## **Important note: all analysis should start from here**

## Hypothesis 1: Cryptocurrency Market Trends Mirror Traditional Markets. 

Hypothesis 1, the primary hypothesis of our exploratory analysis, is that the crypto market mirrors the behavior of traditional markets, specifically the NASDAQ and the NYSE. In the following section we look for evidence for and against this hypothesis to judge which is more probable.


In [24]:
df = pd.read_csv('PrimeDataSet2021-06-18.csv')

In [25]:
small_list = ['BTC-USD', 'ETH-USD', 'MSTR', 'NVDA', 'PYPL', 'SQ', 'TSLA']

# Compare BTC's close price with a small set of crypto and crypto-related assets
pchart = alt.Chart(df[df.Symbol.isin(small_list)]).mark_line().encode(
    x = alt.X('Date:T'),
    y = alt.Y('Close:Q'),
    color = 'Symbol:N',
    tooltip = ['Symbol', 'Close']
).properties(width = 700, title='First View: BTC Price Trend Comparison with Other Assets')

pchart

In [None]:
pchart = alt.Chart(df).mark_line().encode(
    x = alt.X('Date:T'),
    y = alt.Y('Close:Q'),
    color = 'Symbol:N',
    tooltip = ['Symbol', 'Close']
).properties(width = 700, title='BTC Price Trend Comparison with Other Assets')

 
pchart    

To ascertain whether Bitcoin follows the traditional market trend, we first examine its overall trend line. The charts show that Bitcoin's price chart looks very similar to those of traditional stocks and commodities: it has clear long-term and short-term fluctuation patterns. In the medium term, the price is strongly supported by the 50-day moving average. There is a roughly 50% correction starting in April 2021, during which the price drops below the 200-day moving average for a brief period before bouncing back above it.
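One way to put a number on a correction like this is the drawdown: each close's fractional decline from the highest close seen so far. This is a minimal sketch on hypothetical prices; `drawdown` is a helper introduced here. Applied to `df[df.Symbol == 'BTC-USD']['Close']` (assuming rows are in date order), its minimum would measure the largest correction in the period.

```python
import pandas as pd


def drawdown(close):
    """Fractional decline of each close from the running peak (0 at a new high)."""
    return close / close.cummax() - 1


# Hypothetical closes: a run-up to 60, a slide to 30, then a partial rebound.
prices = pd.Series([40.0, 50.0, 60.0, 45.0, 30.0, 35.0])
dd = drawdown(prices)
print(dd.min())  # -0.5
```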

In [None]:
plot_range = 1000

In [None]:
y_range_max = round(df[df.Symbol == 'BTC-USD']['Close'].tail(plot_range).max(),0) + 1000
y_range_min = round(df[df.Symbol == 'BTC-USD']['Close'].tail(plot_range).min(),0) - 1000


In [None]:
Mchart = alt.Chart(df[df.Symbol == 'BTC-USD'].tail(plot_range)).mark_line().encode(
    x = alt.X('Date:T'),
)

In [None]:
layer = alt.layer(
    Mchart.mark_line(color='lightblue').encode(alt.Y('Close:Q', scale = alt.Scale(domain=[y_range_min,y_range_max],clamp=True), title = 'Price (USD)')),
    Mchart.mark_line(color='green').encode(alt.Y('MA50:Q', scale = alt.Scale(domain=[y_range_min,y_range_max], clamp=True))),
    Mchart.mark_line(color='#fdbb84').encode(alt.Y('MA100:Q', scale = alt.Scale(domain=[y_range_min,y_range_max], clamp=True))),
    Mchart.mark_line(color='#e34a33').encode(alt.Y('MA200:Q', scale = alt.Scale(domain=[y_range_min,y_range_max], clamp=True))),
).properties(title='BTC Close Price Vs Moving Average (50-Day, 100-Day, 200-Day)', width = 700)

layer

In this chart, we plot Bitcoin's price against its own ATR (Average True Range) to gauge how volatile the Bitcoin market is overall. Bitcoin is clearly not only extremely volatile but also becoming more volatile in recent months. Orange bars represent losses and blue bars represent gains. Overall, Bitcoin exhibits the characteristics of a traditional traded asset: it is volatile and shows a clear trading pattern of risk purchasing and hedging.

In [None]:
# Brush over dates only, so the selection filters the ATR chart by date range
interval = alt.selection_interval(encodings=['x'])

pchart = alt.Chart(df[df.Symbol == 'BTC-USD'].tail(plot_range)).mark_line().encode(
    x = alt.X('Date:T'),
    y = alt.Y('Close:Q', scale = alt.Scale(domain=[y_range_min,y_range_max])),
    #tooltip = alt.Tooltip('Close', title = 'Close')
).properties(width = 700, title='Close Price Trend').add_selection(interval)



In [None]:
#     color = alt.condition(
#         alt.datum.Returns > 0,
#         alt.value('steelblue'),  
#         alt.value('orange')   # low return seems correlates more with high volatility, we should study this.

In [None]:
cchart = alt.Chart(df[df.Symbol == 'BTC-USD']).mark_bar().encode(
    x = alt.X('Date:T'),
    y = alt.Y('ATR:Q'),
    tooltip = ['Date:T', 'Returns'],
    color = alt.Color('LossGain:N', legend=alt.Legend(title = 'Returns'))
).properties(title = 'BTC ATR Trend Colored by Returns', width = 700).transform_filter(interval)


In [None]:
pchart & cchart

In [None]:
# Streamgraph of close prices; click a legend entry to highlight that ticker
selection = alt.selection_multi(fields=['Symbol'], bind='legend')

alt.Chart(df).mark_area().encode(
    alt.X('Date:T', axis=alt.Axis(domain=False, format='%Y', tickSize=0)),
    alt.Y('Close:Q', stack='center', axis=None),
    alt.Color('Symbol:N', scale=alt.Scale(scheme='category20b')),
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).add_selection(
    selection
)

In the following charts, we compare Bitcoin with other assets, such as stocks, commodities and ETH, to investigate whether it mirrors the overall market movement.


In [None]:
df.Symbol.unique()

In [None]:
exposure_list = ['BTC-USD', 'CL=F', 'ETH-USD', 'GC=F', 'JPM',
       'MSTR', 'NVDA', 'PYPL', 'SQ', 'TSLA']

In [None]:
non_exposure_list = ['AAPL', 'BTC-USD', 'ETH-USD', 'MSFT', '^DJI', '^GSPC', '^IXIC']

In [None]:
exchart = alt.Chart(df[df.Symbol.isin(exposure_list)]).mark_line().encode(
    x = alt.X('Date:T'),
    y = alt.Y('Close:Q', scale = alt.Scale(type='log'), title = 'Close Price (USD)'),
    color = 'Symbol:N',
    tooltip = ['Symbol', 'Close']
).properties(width = 700, title='BTC-USD Price VS Exposure Assets')

  

In [None]:
zone = pd.DataFrame([
    {
        "start": "2020-02-15",
        "end": "2020-03-16",
        "event": "Market-Dip"
    },
    
])

In [None]:
ruler = alt.Chart(zone).mark_rule(
    color="lightcoral",
    strokeWidth=40,
    opacity = 0.8
).encode(
    x= alt.X('start:T', title = 'Date'),
    #x2 = alt.X2('end:T'),
).transform_filter(alt.datum.event == "Market-Dip")

### What's informative about this view: 

In this view you can see a few major index funds, stocks and alt coins compared with the price of BTC. There may be some correlation between BTC and traditional markets: for example, when the DJI dips in February 2020, BTC does as well. These rough correlations appear in a few spots but are not abundantly clear. BTC also seems to correlate with the market over the last few months, even though Bitcoin is often described as an uncorrelated asset. We also compared Bitcoin with two groups of assets: one with high exposure to the Bitcoin market and a second with no known exposure. The charts do not support any conclusive observation unless we zoom into a specific period.


In [None]:
ruler + exchart

In [None]:
alt.Chart(df[df.Symbol.isin(non_exposure_list)]).mark_line().encode(
    x = alt.X('Date:T'),
    y = alt.Y('Close:Q', scale = alt.Scale(type='log'), title = 'Close Price (USD)'),
    color = 'Symbol:N',
    tooltip = ['Symbol', 'Close']
).properties(width = 700, title='BTC-USD Price VS Non-Exposure Assets')

  

### What could be improved about this view:  

It is difficult to see any exact correlations between BTC and the other tickers, and most of the tickers sit so low on the Y axis that their price fluctuations can't be seen. A returns plot could be an improvement because it puts all of the tickers on a comparable scale.
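Besides a returns plot, another common way to put tickers on a comparable scale is to rebase each price series to its first close, so every line starts at 1.0 and shows relative growth. A minimal sketch with hypothetical prices, assuming the long format with rows in date order within each `Symbol`; the `Rebased` column name is introduced here for illustration.

```python
import pandas as pd

# Hypothetical closes for two tickers at very different price levels.
toy = pd.DataFrame({
    "Symbol": ["BTC-USD", "BTC-USD", "AAPL", "AAPL"],
    "Close": [30000.0, 36000.0, 120.0, 132.0],
})

# Divide every close by that ticker's first close, so each series starts at 1.0.
first_close = toy.groupby("Symbol")["Close"].transform("first")
toy["Rebased"] = toy["Close"] / first_close
print(toy)
```

Plotting `Rebased` instead of `Close` keeps AAPL's 10% move visible next to BTC's 20% move, even though their prices differ by a factor of a few hundred.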

In [None]:
alt.Chart(df[df.Symbol.isin(['BTC-USD','AAPL'])]).mark_bar(opacity=0.6).encode(
    x = alt.X('Date:T'),
    y = alt.Y('Returns:Q', stack=None),  # overlay the bars instead of stacking them
    color = 'Symbol:N',
).properties(width = 700, title='BTC and AAPL Returns by Date Overlaid')


From this chart, we can see that the two series largely move together when they are overlaid. This suggests Bitcoin may have some degree of correlation with some stocks or assets, but the chart alone cannot tell us how statistically significant that correlation is.
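To attach a number to a correlation like this, one option is the Pearson correlation of the two tickers' returns on the dates both have data, with the usual t statistic `t = r * sqrt((n - 2) / (1 - r**2))` for the null hypothesis of zero correlation. This is a sketch on hypothetical toy data; `returns_correlation` is an illustrative helper. One caveat: the notebook's `Returns` are overlapping 10-day returns, which are autocorrelated, so this t statistic would overstate significance for that column; one-day returns fit the test's independence assumption better.

```python
import math

import pandas as pd


def returns_correlation(df, a, b, col="Returns"):
    """Pearson r between two tickers' `col` on shared dates, its t statistic, and n."""
    wide = df.pivot(index="Date", columns="Symbol", values=col)[[a, b]].dropna()
    n = len(wide)
    r = wide[a].corr(wide[b])
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return r, t, n


# Hypothetical daily returns; Y has no row for d3, so only four dates are shared.
toy = pd.DataFrame({
    "Date": ["d1", "d2", "d3", "d4", "d5", "d1", "d2", "d4", "d5"],
    "Symbol": ["X"] * 5 + ["Y"] * 4,
    "Returns": [0.01, -0.02, 0.03, 0.00, 0.02, 0.02, -0.01, 0.01, -0.01],
})
r, t, n = returns_correlation(toy, "X", "Y")
print(n, round(r, 3), round(t, 3))
```

For large `n`, an absolute t value above about 1.96 corresponds roughly to the 5% significance level.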

In [None]:
alt.Chart(df[df.Symbol.isin(['BTC-USD','MSTR'])]).mark_bar(opacity=0.6).encode(
    x = alt.X('Date:T'),
    y = alt.Y('Returns:Q', stack=None),  # overlay the bars instead of stacking them
    color = 'Symbol:N',
).properties(width = 700, title='BTC and MSTR Returns by Date Overlaid')


In [None]:
alt.Chart(df[df.Symbol.isin(['BTC-USD','^DJI'])]).mark_bar(opacity=0.6).encode(
    x = alt.X('Date:T'),
    y = alt.Y('Returns:Q', stack=None),  # overlay the bars instead of stacking them
    color = 'Symbol:N',
).properties(width = 700, title='BTC and Dow Jones Index Returns')


In [None]:
alt.Chart(df[df.Symbol.isin(['BTC-USD','^GSPC'])]).mark_bar(opacity=0.6).encode(
    x = alt.X('Date:T'),
    y = alt.Y('Returns:Q', stack=None),  # overlay the bars instead of stacking them
    color = 'Symbol:N',
).properties(width = 700, title='BTC and S&P 500 Index Returns')



### What's informative about this view:
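This view shows how much the daily range (High minus Low) of Bitcoin fluctuates compared with other assets. Bitcoin's daily range is extremely high; no other asset shows such a large variation in USD value within a day. Note that this range is measured in absolute USD and is not normalized by price, so a higher-priced asset such as BTC naturally shows a larger range.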
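Because `Daily_Range` is `High - Low` in USD, dividing it by the close gives a percentage range that can be compared across assets at very different price levels. This is a sketch with hypothetical prices; `Daily_Range_Pct` is a new column name introduced here for illustration.

```python
import pandas as pd

# Hypothetical one-day prices for two assets at very different price levels.
toy = pd.DataFrame({
    "Symbol": ["BTC-USD", "AAPL"],
    "High": [41000.0, 130.0],
    "Low": [37000.0, 126.0],
    "Close": [40000.0, 128.0],
})
# Absolute range in USD, as in the notebook's Daily_Range column.
toy["Daily_Range"] = toy["High"] - toy["Low"]
# Range as a percentage of the close, comparable across price levels.
toy["Daily_Range_Pct"] = toy["Daily_Range"] / toy["Close"] * 100
print(toy[["Symbol", "Daily_Range", "Daily_Range_Pct"]])
```

In this toy case BTC's range is 1,000 times AAPL's in USD, but only about three times larger in percentage terms (10% vs 3.125%).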

In [None]:
alt.Chart(df).mark_line().encode(
    x = alt.X('Date:T'),
    y = alt.Y('Daily_Range:Q', title = 'Price (USD)'),
    color = 'Symbol:N',
    tooltip = ['Close', 'Volume']
).properties(width = 700, title='BTC Daily Range VS Other Assets Trend')


### What could be improved about this view:
To improve further, we use a bar chart to show the breakdown for each asset compared with Bitcoin. Now we can clearly see that Bitcoin can fluctuate by as much as USD 12,000 within a single day. Bitcoin has high volatility, but also higher returns compared with the other assets.

In [None]:
print(list(df.Symbol.unique()))


In [None]:
df.groupby('Symbol').get_group('CL=F').isna().sum()

In [None]:
alt.Chart(df).mark_bar().encode(
    x = alt.X('Symbol:N'),
    y = alt.Y('Daily_Range:Q'),
    color = 'Symbol:N',
    tooltip = ['Symbol']
).properties(width = 700, title='BTC Daily Range Fluctuation Comparison with Other Assets')
  

In [None]:
alt.Chart(df).mark_bar().encode(
    x = alt.X('Symbol:N',sort=alt.EncodingSortField(field='Returns', op='sum', order='descending')),
    y = alt.Y('sum(Returns):Q', ),
    color = 'Symbol:N',
    tooltip = ['Symbol']
).properties(width = 700, title='BTC Returns Comparison with Other Assets')
  

In [None]:
alt.Chart(df).mark_bar().encode(
    x = alt.X('Symbol:N', sort=alt.EncodingSortField(field='ATR', op='mean', order='descending')),
    y = alt.Y('mean(ATR):Q'),
    color = 'Symbol:N',
    tooltip = ['Symbol']
).properties(width = 700, title='BTC Mean ATR Comparison with Other Assets')
  

### What could be improved about this view:

We use a scatter plot to look for outliers; it clearly shows that Bitcoin has quite a number of them. The spikes could be due to critical news or events, which warrants further investigation as we study this hypothesis further.

In [None]:
alt.Chart(df).mark_circle().encode(
    x = alt.X('Volume:Q'),
    y = alt.Y('sum(Returns):Q'),
    color = 'Symbol:N',
    #color = 'day(Date:T)'
).properties(width = 700, title='BTC Returns VS Volume Comparison with Other Assets')
  

In [None]:
alt.Chart(df).mark_circle().encode(
    x = alt.X('Returns:Q'),
    y = alt.Y('ATR:Q'),
    color = 'Symbol:N'
).properties(width = 700, title='Returns VS ATR')

### What could be improved about this view:

We use a boxplot to present another perspective on the outliers.
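The boxplots in this section use `extent=0.5`, so the whiskers end 0.5 times the interquartile range (IQR) beyond the quartiles instead of the conventional 1.5, which flags more points as outliers. The same rule can be applied in pandas to count outliers per asset. This is a sketch with hypothetical returns; `iqr_outliers` is an illustrative helper, and Vega-Lite's quartile computation may differ slightly from pandas' default linear interpolation.

```python
import pandas as pd


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Hypothetical returns with one large positive spike.
returns = pd.Series([-0.03, -0.01, 0.0, 0.01, 0.02, 0.15])
print(iqr_outliers(returns).sum())         # 1
print(iqr_outliers(returns, k=0.5).sum())  # 2
```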

In [None]:
alt.Chart(df[df.Symbol.isin(['BTC-USD','AAPL'])]).mark_boxplot(size=50, extent = 0.5).encode(
    x = alt.X('Symbol:N'),
#     y = alt.Y('month(Date):O'),
    y = alt.Y('Returns_Percent:Q', scale=alt.Scale(zero=False)),
    color = alt.Color('Symbol:N')
).properties(title = 'BTC VS AAPL Returns (%) Boxplot', width = 700).configure_axisX(labelFontSize = 12, labelAngle = 45).configure_axisY(labelFontSize = 12)



In [None]:
alt.Chart(df[df.Symbol.isin(exposure_list)]).mark_boxplot(size=50, extent = 0.5).encode(
    x = alt.X('Symbol:N'),
#     y = alt.Y('month(Date):O'),
    y = alt.Y('Returns:Q', scale=alt.Scale(zero=False)),
    color = alt.Color('Symbol:N')
).properties(title = 'BTC Returns VS Other Assets Boxplot', width = 700).configure_axisX(labelFontSize = 12, labelAngle = 45).configure_axisY(labelFontSize = 12)



In [None]:
selector = df.Symbol == 'BTC-USD'
plot_range = 500
y_range_max = round(df[selector]['Close'].tail(plot_range).max(),0) + 1000
y_range_min = round(df[selector]['Close'].tail(plot_range).min(),0) - 1000

In [None]:
pvchart = alt.Chart(df[df.Symbol == 'BTC-USD'].tail(plot_range)).mark_circle().encode(
    x = alt.X('Date:T'),
    y = alt.Y('Close:Q', scale = alt.Scale(domain=[y_range_min,y_range_max], type='log')),
    # Blue for positive returns, orange otherwise
    color = alt.condition(
        alt.datum.Returns > 0,
        alt.value('steelblue'),
        alt.value('orange')),
    tooltip = ['Date:T', 'Close', 'Volume']
).properties(width = 700, title='BTC Close Price Trend')


In [None]:
#df[df.Symbol.isin(exposure_list)]
#df[df.Symbol == 'BTC-USD'].tail(plot_range)
vchart = alt.Chart(df[df.Symbol == 'BTC-USD'].tail(plot_range)).mark_bar().encode(
    x = alt.X('Date:T'),
    y = alt.Y('Volume:Q'),
    color = alt.Color('LossGain:N', legend = alt.Legend(title = 'Returns'))
).properties(title="BTC Volume Trend", width =700)



### What's informative about this view:

This chart further improves on the trend line chart: we color each point by loss or gain and cross-reference it with the trading volume.

In [None]:
pvchart & vchart

## Hypothesis 2: Alt coins are highly correlated with and follow Bitcoin price

#### The view shows that alt coins are on par with Bitcoin in terms of percentage price performance, which suggests there is a great opportunity to invest in alt coins.

In [None]:

alt.Chart(df[df.Symbol.isin(['BTC-USD','ETH-USD'])]).mark_line().encode(
    x = alt.X('Date:T'),
    y = alt.Y('Close:Q', scale = alt.Scale(type='log'), title = 'Close Price (USD)'),
    color = 'Symbol:N',
    tooltip = ['Symbol', 'Close']
).properties(width = 700, title='BTC-USD Price VS ETH-USD')

  

The view shows that BTC and ETH are closely aligned in their returns: the alt coin appears to follow BTC's momentum, as the two fluctuate in a highly similar pattern when we overlay their trend lines.
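To check how consistently ETH follows BTC over time, rather than eyeballing the overlay, a rolling correlation of the two return series aligned on date is one option. A minimal sketch, assuming the notebook's long format; `rolling_corr` is an illustrative helper, and the toy data is hypothetical and deliberately makes ETH's returns exactly 1.5 times BTC's, so every full window has a correlation of 1.

```python
import pandas as pd


def rolling_corr(df, a, b, window, col="Returns"):
    """Rolling Pearson correlation of two tickers' `col`, aligned on Date."""
    wide = df.pivot(index="Date", columns="Symbol", values=col)
    return wide[a].rolling(window).corr(wide[b])


# Hypothetical returns: ETH moves exactly 1.5x BTC on every date.
btc = [0.01, -0.02, 0.03, 0.01, -0.01]
toy = pd.DataFrame({
    "Date": list(range(5)) * 2,
    "Symbol": ["BTC-USD"] * 5 + ["ETH-USD"] * 5,
    "Returns": btc + [1.5 * x for x in btc],
})
rc = rolling_corr(toy, "BTC-USD", "ETH-USD", window=3)
print(rc)
```

On the real data, a window of, say, 30 or 90 rows (an arbitrary choice) would show whether the BTC/ETH relationship weakens in some periods.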

In [None]:
alt.Chart(df[df.Symbol.isin(['BTC-USD','ETH-USD'])]).mark_bar(opacity=0.6).encode(
    x = alt.X('Date:T'),
    y = alt.Y('Returns:Q', stack=None),  # overlay the bars instead of stacking them
    color = 'Symbol:N',
).properties(width = 700, title='BTC and ETH Returns by Date Overlaid')


## Hypothesis 3: Cryptocurrency market is highly influenced by non-market factors that don't affect traditional markets (e.g., tweets, corporate investment, COVID-19)

The cryptocurrency market is highly influenced by non-market factors that don't affect traditional markets (e.g., tweets, corporate investment, individual opinions, Google searches).
What we are trying to discover here is whether the crypto market, being unregulated, is susceptible to factors that don’t affect conventional markets like the NASDAQ and NYSE. Tweets and comments made by Elon Musk have affected the overall crypto market, as well as the price of Bitcoin and the price of and demand for Dogecoin. Here we want to see which factors have the biggest impact on movements in the crypto market, whether positive or negative.
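The events table used in the following cells (`bitcoin_news`, with `start`, `end` and `news` columns) can also be used numerically, by comparing the mean return inside each event window with the mean on all other days. This is a descriptive sketch only, assuming that same column layout; `event_window_means`, the prices and the event below are hypothetical, and a difference in means alone does not show that an event caused the move.

```python
import pandas as pd


def event_window_means(prices, events, col="Returns"):
    """Mean of `col` inside each [start, end] event window, plus outside all windows."""
    dates = pd.to_datetime(prices["Date"])
    inside_any = pd.Series(False, index=prices.index)
    rows = []
    for ev in events.itertuples(index=False):
        mask = dates.between(pd.Timestamp(ev.start), pd.Timestamp(ev.end))
        inside_any |= mask
        rows.append((ev.news, prices.loc[mask, col].mean()))
    rows.append(("outside events", prices.loc[~inside_any, col].mean()))
    return pd.DataFrame(rows, columns=["window", "mean_" + col])


# Hypothetical BTC returns and one hypothetical event window.
prices = pd.DataFrame({
    "Date": pd.date_range("2021-01-01", periods=6, freq="D").strftime("%Y-%m-%d"),
    "Returns": [0.10, 0.30, -0.10, 0.00, 0.02, 0.04],
})
events = pd.DataFrame([{"start": "2021-01-02", "end": "2021-01-03", "news": "Event A"}])
print(event_window_means(prices, events))
```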


In [None]:
plot_range = 500

In [None]:
schart = alt.Chart(df[df.Symbol == 'BTC-USD'].tail(plot_range)).mark_bar().encode(
    x = alt.X('yearmonthdate(Date)'),
    y = alt.Y('Returns:Q'),
    opacity=alt.value(1.0),
    tooltip = ['Date:T'],
    color = alt.Color('LossGain:N', legend=alt.Legend(title='Returns'))
).properties(width = 700, title='BTC Daily Returns Trend')


In [None]:
bitcoin_news = pd.DataFrame([
    {
        "start": "2021-05-15",
        "end": "2021-06-01",
        "news": "FBI Ransomware Crypto"
    },
    {
        "start": "2020-12-31",
        "end": "2021-01-31",
        "news": "Elon Musk's Tweet"
    },
    {
        "start": "2020-03-02",
        "end": "2020-03-30",
        "news": "Covid-19"
    }
])

In [None]:
# Wide, translucent rules act as shaded markers at each event's start date
event = alt.Chart(bitcoin_news).mark_rule(
    color="lightgray",
    strokeWidth=20,
    opacity = 0.4,
).encode(
    x='start:T'
).transform_filter(alt.datum.news == "Elon Musk's Tweet")

In [None]:
event2 = alt.Chart(bitcoin_news).mark_rule(
    color="lightcoral",
    strokeWidth=40,
    opacity = 0.4,
).encode(
    x= alt.X('start:T', title = 'Date'),
).transform_filter(alt.datum.news == "Covid-19")

In [None]:
event3 = alt.Chart(bitcoin_news).mark_rule(
    color="lightcoral",
    strokeWidth=30,
    opacity = 0.4
).encode(
    x= alt.X('start:T', title = 'Date'),
).transform_filter(alt.datum.news == "FBI Ransomware Crypto")

In [None]:

text = alt.Chart(bitcoin_news).mark_text(
    align='right',
    baseline='middle',
    dx=7,
    dy=-135,
    size=11
).encode(
    x='start:T',
    x2='end:T',
    text='news',
    color=alt.value('#000000'),
    opacity=alt.value(1.0)
)

In [None]:
schart = alt.Chart(df[df.Symbol == 'BTC-USD'].tail(plot_range)).mark_bar().encode(
    x = alt.X('Date:T', axis = alt.Axis(format = ("%b %Y"))),
    y = alt.Y('Returns:Q'),
    opacity=alt.value(1.0),
    tooltip = ['Date:T'],
    color = alt.Color('LossGain:N', legend=alt.Legend(title='Returns'))
).properties(width = 700, title='BTC 10-Day Returns Trend')


In [None]:
event + event2 + event3 + text + schart

### What could be improved about this view:

We use heatmaps to pinpoint specific events that might cause spikes in Bitcoin price movement.
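In these heatmaps each (month, day-of-month) cell covers the same calendar day in several years, so an unaggregated `color` encoding draws several overlapping rectangles per cell and the ones drawn last hide the others. Averaging explicitly, for example with `mean(Returns)` in the Altair encoding or with a pandas pivot table as sketched here, makes each cell well defined. The toy dates and returns are hypothetical.

```python
import pandas as pd

# Hypothetical returns; 2019-03-05 and 2020-03-05 share the same (month, day) cell.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2019-03-05", "2020-03-05", "2020-03-06", "2020-04-05"]),
    "Returns": [0.10, -0.30, 0.05, 0.20],
})
toy["Month"] = toy["Date"].dt.month
toy["Day"] = toy["Date"].dt.day

# One row per month, one column per day of month, averaged across years.
heat = toy.pivot_table(index="Month", columns="Day", values="Returns", aggfunc="mean")
print(heat)
```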


In [None]:
alt.Chart(df[df.Symbol == 'BTC-USD']).mark_rect().encode(
    x = alt.X('date(Date):O'),
    y = alt.Y('month(Date):O'),
    # Each cell covers several years, so average them explicitly
    color = alt.Color('mean(ATR):Q')
).properties(title = 'Bitcoin ATR Heatmap by Day and Month')


In [None]:
alt.Chart(df[df.Symbol == 'BTC-USD']).mark_rect().encode(
    x = alt.X('date(Date):O'),
    y = alt.Y('month(Date):O'),
    # Each cell covers several years, so average them explicitly
    color = alt.Color('mean(Returns):Q')
).properties(title = 'BTC Returns Heatmap by Day and Month')


In [None]:
alt.Chart(df[df.Symbol == 'BTC-USD']).mark_rect().encode(
    y = alt.Y('day(Date):O'),
    x = alt.X('month(Date):O'),
    # Each cell covers many dates, so average them explicitly
    color = alt.Color('mean(Returns):Q')
).properties(title = 'BTC Returns Heatmap by Day of the Week and Month')


In [None]:
alt.Chart(df[df.Symbol.isin(['BTC-USD','^DJI'])]).mark_circle().encode(
    x = alt.X('ATR:Q'),
    y = alt.Y('Returns:Q'),
    color = 'Symbol:N'  
).properties(width = 700, title='BTC VS Dow Jones (Returns VS Volatility)')


### What could be improved about this view:

The detailed candlestick charts below compare Bitcoin with gold, since both assets are highly driven by events such as government policy. As the charts show, the two are moving in opposite directions.
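"Moving in opposite directions" can be checked with a simple statistic: the share of shared dates on which the two assets' returns have the same sign. Values well below 0.5 point to opposite moves, values well above 0.5 to moves in the same direction. A sketch on hypothetical data; `sign_agreement` is an illustrative helper, and a return of exactly zero is counted as non-positive.

```python
import pandas as pd


def sign_agreement(df, a, b, col="Returns"):
    """Share of shared dates where both tickers' `col` is positive or both non-positive."""
    wide = df.pivot(index="Date", columns="Symbol", values=col)[[a, b]].dropna()
    same = (wide[a] > 0) == (wide[b] > 0)
    return same.mean(), len(wide)


# Hypothetical returns for BTC and gold futures on four shared dates.
toy = pd.DataFrame({
    "Date": ["d1", "d2", "d3", "d4"] * 2,
    "Symbol": ["BTC-USD"] * 4 + ["GC=F"] * 4,
    "Returns": [0.02, -0.01, 0.03, -0.02, -0.01, 0.01, 0.02, 0.01],
})
share, n = sign_agreement(toy, "BTC-USD", "GC=F")
print(share, n)  # 0.25 4
```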

In [None]:
# Candlestick plot for short term price movement analysis

plot_range = 100

# Green candle when the price closed at or above the open, red otherwise
open_close_color = alt.condition("datum.Open <= datum.Close",
                                 alt.value("#06982d"),
                                 alt.value("#ae1325"))

base = alt.Chart(df[df.Symbol == 'BTC-USD'].tail(plot_range)).encode(
    x = alt.X('Date:T',
          axis=alt.Axis(
              format='%m/%d',
              labelAngle=-45,
              title='Date'
          )
    )
).properties(title = 'BTC 3-month Candlestick Chart', width = 700)

# Wick from the daily low to the daily high
rule = base.mark_rule().encode(
    y = alt.Y(
        'Low:Q',
        title='Price',
        scale=alt.Scale(zero=False),
    ),
    y2 = alt.Y2('High:Q'),
    color=open_close_color
)

# Moving-average overlays; the candle color is applied only to rule and bar,
# so it does not override these mark colors
line = base.mark_line(color='lightgreen').encode(y = alt.Y('MA50:Q'))
line2 = base.mark_line(color='red').encode(y = alt.Y('MA100:Q'))

# Candle body from open to close
bar = base.mark_bar().encode(
    y = alt.Y('Open:Q'),
    y2 = alt.Y2('Close:Q'),
    color=open_close_color
)

BTC = (rule + bar + line + line2)

In [None]:
# Candlestick plot for short term price movement analysis

plot_range = 100

# Green candle when the price closed at or above the open, red otherwise
open_close_color = alt.condition("datum.Open <= datum.Close",
                                 alt.value("#06982d"),
                                 alt.value("#ae1325"))

base = alt.Chart(df[df.Symbol == 'ETH-USD'].tail(plot_range)).encode(
    x = alt.X('Date:T',
          axis=alt.Axis(
              format='%m/%d',
              labelAngle=-45,
              title='Date'
          )
    )
).properties(title = 'ETH 3-month Candlestick Chart', width = 700)

# Wick from the daily low to the daily high
rule = base.mark_rule().encode(
    y = alt.Y(
        'Low:Q',
        title='Price',
        scale=alt.Scale(zero=False),
    ),
    y2 = alt.Y2('High:Q'),
    color=open_close_color
)

# Moving-average overlays; the candle color is applied only to rule and bar,
# so it does not override these mark colors
line = base.mark_line(color='lightgreen').encode(y = alt.Y('MA50:Q'))
line2 = base.mark_line(color='red').encode(y = alt.Y('MA100:Q'))

# Candle body from open to close
bar = base.mark_bar().encode(
    y = alt.Y('Open:Q'),
    y2 = alt.Y2('Close:Q'),
    color=open_close_color
)

ETH = (rule + bar + line + line2)

In [None]:
# Candlestick plot for short term price movement analysis

plot_range = 100

# Green candle when the price closed at or above the open, red otherwise
open_close_color = alt.condition("datum.Open <= datum.Close",
                                 alt.value("#06982d"),
                                 alt.value("#ae1325"))

base = alt.Chart(df[df.Symbol == 'GC=F'].tail(plot_range)).encode(
    x = alt.X('Date:T',
          axis=alt.Axis(
              format='%m/%d',
              labelAngle=-45,
              title='Date'
          )
    )
).properties(title = 'GOLD 3-month Candlestick Chart', width = 700)

# Wick from the daily low to the daily high
rule = base.mark_rule().encode(
    y = alt.Y(
        'Low:Q',
        title='Price',
        scale=alt.Scale(zero=False),
    ),
    y2 = alt.Y2('High:Q'),
    color=open_close_color
)

# Moving-average overlays; the candle color is applied only to rule and bar,
# so it does not override these mark colors
line = base.mark_line(color='lightgreen').encode(y = alt.Y('MA50:Q'))
line2 = base.mark_line(color='red').encode(y = alt.Y('MA100:Q'))

# Candle body from open to close
bar = base.mark_bar().encode(
    y = alt.Y('Open:Q'),
    y2 = alt.Y2('Close:Q'),
    color=open_close_color
)

GOLD = (rule + bar + line + line2)

# Stack the BTC and gold candlesticks vertically for comparison
BTC & GOLD


### What could be improved about this view:

A correlation matrix is created to show a heatmap of the correlations among all of the assets.
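The correlation matrix is only meaningful if each ticker's closes are lined up by date. If tickers have different sets of dates (for example, if stock rows are missing on weekends while BTC trades every day), placing them side by side by row position silently pairs different days. This sketch on hypothetical prices contrasts positional alignment with `DataFrame.pivot`, which aligns on `Date` and leaves NaN where a ticker has no row; `DataFrame.corr` then uses only the dates each pair of columns shares.

```python
import pandas as pd

# Hypothetical closes: BTC has a row every day, AAPL skips the weekend of 06-12/06-13.
toy = pd.DataFrame({
    "Date": ["2021-06-11", "2021-06-12", "2021-06-13", "2021-06-14",
             "2021-06-11", "2021-06-14"],
    "Symbol": ["BTC-USD"] * 4 + ["AAPL"] * 2,
    "Close": [100.0, 101.0, 102.0, 103.0, 10.0, 11.0],
})

# Positional alignment: row i of one ticker is paired with row i of the other.
positional = pd.concat(
    [toy[toy.Symbol == s]["Close"].reset_index(drop=True).rename(s) for s in ["BTC-USD", "AAPL"]],
    axis=1,
    join="inner",
)

# Date alignment: one row per Date, one column per Symbol.
by_date = toy.pivot(index="Date", columns="Symbol", values="Close")

print(positional)  # row 1 pairs BTC on 06-12 with AAPL on 06-14
print(by_date)
```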

In [None]:
# Correlation between the numeric feature columns (not between tickers)
corrMatrix = df.select_dtypes('number').corr()

In [None]:
# One column of Close prices per ticker, aligned on Date (NaN where a ticker has no row)
mdata = df.pivot(index='Date', columns='Symbol', values='Close')
# Clear the column-index name so the stacked correlation below gets level_0 / level_1 labels
mdata.columns.name = None

In [None]:
mdata

In [None]:
#result = pd.concat([mdata, mdata1], axis=1, join='inner')

In [None]:
cor_data = mdata.corr().stack().reset_index().rename(columns={0: 'correlation', 'level_0': 'variable', 'level_1': 'variable2'})

cor_data['correlation_label'] = cor_data['correlation'].map('{:.2f}'.format)

In [None]:
base = alt.Chart(cor_data).encode(
    x= alt.X('variable2:O', title = 'Tickers'),
    y=alt.Y('variable:O', title = 'Tickers')    
)

# Text layer with correlation labels
# White text on strongly correlated cells and black elsewhere, for readability
text = base.mark_text().encode(
    text='correlation_label',
    color=alt.condition(
        alt.datum.correlation > 0.5, 
        alt.value('white'),
        alt.value('black')
    )
)

# The correlation heatmap itself
cor_plot = base.mark_rect().encode(
    color='correlation:Q'
).properties(height = 600, width = 600, title = "Correlation Heatmap of Close Prices")

cor_plot + text # overlay the text and rect layer

### Conclusion:

As the charts above comparing Apple and the S&P 500 with BTC show, their returns do mirror each other: when Apple has negative returns, BTC often does as well. It would be helpful to view this plot for other tickers too. What still needs to be determined is the magnitude of these returns over a long period of time.

#### Challenges:

The data covers multiple assets over more than six years (January 2015 to June 2021), which produces a huge number of data entries to plot. Memory is a serious constraint, so we slice the data for each chart and tried several methods to keep the notebook's size down.
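One way to ease this pressure is to downsample before handing data to Altair, whose default data transformer refuses to embed more than 5,000 rows (raising a `MaxRowsError`). This is a sketch, assuming a long-format frame with a parsed `Date` column; it keeps each ticker's last close of every week, roughly a sevenfold reduction for daily crypto data. The toy prices are hypothetical.

```python
import pandas as pd

# Hypothetical daily closes for two tickers over four full weeks (Mon 2021-01-04 to Sun 2021-01-31).
dates = pd.date_range("2021-01-04", periods=28, freq="D")
toy = pd.DataFrame({
    "Date": list(dates) * 2,
    "Symbol": ["BTC-USD"] * 28 + ["AAPL"] * 28,
    "Close": [float(i) for i in range(56)],
})

# Keep the last close of each week (weeks ending Sunday) for each ticker.
weekly = (
    toy.set_index("Date")
    .groupby("Symbol")["Close"]
    .resample("W")
    .last()
    .reset_index()
)
print(len(toy), len(weekly))  # 56 8
```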


## End of Presentation