# Predicting Crypto Prices

![Doge](https://d.newsweek.com/en/full/1784128/dogecoin.jpg?w=790&f=5c22adba14c4c3d31d31d006f7a4f669)

This is a starter notebook for the G-Research Crypto Forecasting competition. It was created during a live coding session on Twitch. Check it out here: https://www.twitch.tv/medallionstallion_

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from itertools import cycle
import datetime as dt
# Use the full option name; the short form can match more than one option in newer pandas.
pd.set_option('display.max_columns', 50)
plt.style.use('bmh')
color_pal = plt.rcParams['axes.prop_cycle'].by_key()['color']
color_cycle = cycle(plt.rcParams['axes.prop_cycle'].by_key()['color'])

## Read in the training data
Some things of note:
- The training data is ~2.7 GB.
- We are given supplemental data about the assets we are trying to predict.

In [None]:
!ls -GFlash --color ../input/g-research-crypto-forecasting

In [None]:
train = pd.read_csv('../input/g-research-crypto-forecasting/train.csv')
asset = pd.read_csv('../input/g-research-crypto-forecasting/asset_details.csv')
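
Because the training file is large, declaring compact dtypes at read time can reduce memory use. The sketch below shows the idea on a tiny in-memory CSV with made-up values. The column names are assumed to match `train.csv`, and the dtype choices are illustrative, not what this notebook used. Note that `float32` trades some price precision for memory.

```python
import io

import pandas as pd

# Tiny stand-in for train.csv with made-up values.
# The column names are an assumption about the real file.
csv_text = """timestamp,Asset_ID,Count,Open,High,Low,Close,Volume,VWAP,Target
1514764860,1,229.0,13835.19,14013.80,13666.11,13850.18,31.55,13827.06,-0.0147
1514764860,6,173.0,738.30,746.00,732.51,738.51,335.99,738.84,-0.0048
"""

float_cols = ["Count", "Open", "High", "Low", "Close", "Volume", "VWAP", "Target"]

# Assumed dtype choices: a small integer for the asset id, 32-bit floats elsewhere.
compact_dtypes = {
    "timestamp": "int64",
    "Asset_ID": "int8",
    **{col: "float32" for col in float_cols},
}

small = pd.read_csv(io.StringIO(csv_text), dtype=compact_dtypes)
default = pd.read_csv(io.StringIO(csv_text))

# Compare the bytes used by the column data under each set of dtypes.
small_bytes = small.memory_usage(index=False).sum()
default_bytes = default.memory_usage(index=False).sum()
print(small.dtypes)
print(small_bytes, "bytes with compact dtypes vs", default_bytes, "bytes by default")
```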

# What are we trying to predict?
- We are trying to predict the prices of 14 "assets".
- Each asset is given a different weight in the evaluation metric.
- We are given historic prices by the minute: the open, close, high, and low price.
- Bitcoin predictions carry the most weight; Maker coins carry the least.

In [None]:
ax = asset.set_index('Asset_Name')['Weight'] \
    .sort_values(ascending=False) \
    .plot(kind='bar',
          figsize=(12, 5),
          color=color_pal[0]
         )
ax.set_title('Asset Weight in Evaluation Metric', fontsize=18)
ax.set_xlabel('Asset Name')
ax.set_ylabel('Metric Weight')
ax.bar_label(ax.containers[0], fmt='%0.2f', color='white', padding=-12, fontsize=10)
plt.xticks(rotation=45, ha='right')
plt.show()
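
To see how per-asset weights could enter a score, here is a minimal sketch of a weighted Pearson correlation. It assumes the evaluation metric takes this form, which this notebook does not state. `weighted_correlation` is a hypothetical helper name, and the inputs are made-up values.

```python
import numpy as np


def weighted_correlation(y_true, y_pred, weights):
    """Weighted Pearson correlation between y_true and y_pred (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.asarray(weights, dtype=float)

    total = weights.sum()
    # Weighted means of each series.
    mean_true = (weights * y_true).sum() / total
    mean_pred = (weights * y_pred).sum() / total

    # Weighted covariance and variances around those means.
    cov = (weights * (y_true - mean_true) * (y_pred - mean_pred)).sum() / total
    var_true = (weights * (y_true - mean_true) ** 2).sum() / total
    var_pred = (weights * (y_pred - mean_pred) ** 2).sum() / total
    return cov / np.sqrt(var_true * var_pred)


# Made-up targets and predictions: two rows with a high weight, two with a low weight.
y_true = np.array([0.01, -0.02, 0.03, -0.01])
y_pred = np.array([0.02, -0.01, 0.02, -0.03])
weights = np.array([6.0, 6.0, 1.0, 1.0])

print(weighted_correlation(y_true, y_pred, weights))
```

With equal weights this reduces to the ordinary Pearson correlation, so errors on heavily weighted assets move the score more than errors on lightly weighted ones.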

# Training Data And Assets

For most assets we are provided roughly 1.9M rows of historical data. Some of the coins have less historical data.

In [None]:
asset_map = asset.set_index('Asset_ID')['Asset_Name'].to_dict()
asset_count = train['Asset_ID'].value_counts() \
    .sort_values()
asset_count.index = asset_count.index.map(asset_map)
ax = asset_count.plot(kind='barh',
                 title='Count of Rows per Asset ID in Training Dataset',
                 figsize=(12, 6),
                color=color_pal[1])
ax.set_xlabel('Rows in Training Set')
plt.show()

# Coin Prices over Time
- Let's take a look at some time trends.
- We convert the timestamp column into datetime.
- We plot historic prices per coin.

In [None]:
train['Asset_Name'] = train['Asset_ID'].map(asset_map)
train['datetime'] = pd.to_datetime(train['timestamp'], unit='s')
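
As a quick check of the conversion, the sketch below assumes `timestamp` holds Unix seconds, which is what `unit='s'` implies. The two sample values are made up for illustration.

```python
import pandas as pd

# Two made-up Unix timestamps, one minute apart.
ts = pd.Series([1514764860, 1514764920])

# Interpret the integers as seconds since 1970-01-01 00:00:00.
converted = pd.to_datetime(ts, unit="s")
print(converted)

# Consecutive rows for one asset should be 60 seconds apart when no minute is missing.
step = converted.diff().iloc[1]
print(step)
```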

In [None]:
fig, axs = plt.subplots(5, 3,
                        figsize=(15, 15),
                       sharex=True)
axs = axs.flatten()
i = 0
for asn, d in train.sample(100_000, random_state=529).groupby('Asset_Name'):
    d.set_index('datetime')['Open'] \
        .plot(color=next(color_cycle),
              ax=axs[i],
              title=asn)
    i += 1
fig.suptitle('Training Data Historic Prices', fontsize=25, y=0.95)
plt.show()

In [None]:
fig, axs = plt.subplots(5, 3,
                        figsize=(15, 15),
                       sharex=True)
axs = axs.flatten()
i = 0
for asn, d in train.sample(100_000, random_state=529).groupby('Asset_Name'):
    d.set_index('datetime')['Volume'] \
        .plot(color=next(color_cycle),
              ax=axs[i],
              title=asn)
    i += 1
fig.suptitle('Training Data - Historic Trading Volume', fontsize=25, y=0.95)
plt.show()

In [None]:
fig, axs = plt.subplots(5, 3,
                        figsize=(15, 15),
                       sharex=True)
axs = axs.flatten()
i = 0
for asn, d in train.sample(100_000, random_state=529).groupby('Asset_Name'):
    d.set_index('datetime')['VWAP'] \
        .plot(color=next(color_cycle),
              ax=axs[i],
              title=asn)
    i += 1
fig.suptitle('Training Data - Historic VWAP (Volume Weighted Average Price per Minute)',
             fontsize=25, y=0.95)
plt.show()

## What is VWAP?

Plotting VWAP against the opening prices shows they are very similar. VWAP is "the volume weighted average price for the minute."
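
As a rough illustration of that definition, the sketch below computes a VWAP from a handful of made-up trades within one minute. The `trades` frame and its column names are hypothetical; the competition data only provides the already aggregated `VWAP` value.

```python
import pandas as pd

# Made-up trades within a single minute (hypothetical columns).
trades = pd.DataFrame({
    "price": [100.0, 101.0, 103.0],
    "quantity": [5.0, 3.0, 2.0],
})

# VWAP: each price is weighted by the quantity traded at that price.
vwap = (trades["price"] * trades["quantity"]).sum() / trades["quantity"].sum()

# An unweighted mean ignores how much was traded at each price.
simple_mean = trades["price"].mean()

print(vwap, simple_mean)
```

Because most of the quantity here traded at the lowest price, the VWAP sits below the simple mean of the three prices.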

In [None]:
fig, axs = plt.subplots(5, 3,
                        figsize=(15, 15))
axs = axs.flatten()
i = 0
for asn, d in train.sample(100_000, random_state=529).groupby('Asset_Name'):
    d.set_index('datetime')['Volume'].apply(np.log) \
        .plot(color=next(color_cycle),
              ax=axs[i],
              kind='kde',
              title=asn)
    i += 1
fig.suptitle('Training Data - Historic Trading Log(Volume)',
             fontsize=25, y=0.95)
plt.show()

# Time Series Trends
- How do the day of week, time of year, etc. impact the trading value?
- Does our target have time series trends?

In [None]:
def time_series_features(df, dt_col='datetime', label=None):
    """
    Creates time series features from the datetime column `dt_col`.
    """
    df = df.copy()
    df['hour'] = df[dt_col].dt.hour
    df['dayofweek'] = df[dt_col].dt.dayofweek
    df['quarter'] = df[dt_col].dt.quarter
    df['month'] = df[dt_col].dt.month
    df['year'] = df[dt_col].dt.year
    df['dayofyear'] = df[dt_col].dt.dayofyear
    df['dayofmonth'] = df[dt_col].dt.day
    df['weekofyear'] = df[dt_col].dt.isocalendar().week
    return df

train = time_series_features(train)
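
One way to use features like these is to average a value by calendar bucket. The sketch below builds a small synthetic frame (not the competition data) and groups it by day of week. The `value` column is a placeholder signal invented for the example.

```python
import numpy as np
import pandas as pd

# Synthetic hourly timestamps covering exactly two weeks, starting on a Monday.
demo = pd.DataFrame({
    "datetime": pd.date_range("2021-01-04", periods=24 * 14, freq=pd.Timedelta(hours=1)),
})
demo["dayofweek"] = demo["datetime"].dt.dayofweek  # Monday=0 ... Sunday=6
demo["hour"] = demo["datetime"].dt.hour

# Placeholder signal that is higher on weekends than on weekdays.
demo["value"] = np.where(demo["dayofweek"] >= 5, 2.0, 1.0)

# Average the signal within each day-of-week bucket.
by_day = demo.groupby("dayofweek")["value"].mean()
print(by_day)
```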

## 2021 VWAP by Month

In [None]:
fig, axs = plt.subplots(5, 3,
                        figsize=(15, 15), sharex=True)
axs = axs.flatten()
i = 0
for asn, d in train.query('year == 2021').sample(100_000, random_state=529).groupby('Asset_Name'):
    sns.boxplot(data=d, x='month', y='VWAP', ax=axs[i])
    axs[i].set_title(asn)
    i += 1
plt.tight_layout()
plt.show()

# Target for each Asset

The target is derived from log returns (`Ra`) over 15 minutes. We can visualize how the target varies over time for each asset.
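
As a simplified sketch of the return calculation, the snippet below computes a 15-minute forward log return from a made-up price series. It assumes one row per minute with no gaps, and it leaves out any further adjustment the competition applies, so it illustrates the log-return building block rather than reproducing the `Target` column.

```python
import numpy as np
import pandas as pd

# Made-up minute-level prices that grow by 0.1% each minute.
minutes = pd.date_range("2021-01-01", periods=40, freq=pd.Timedelta(minutes=1))
close = pd.Series(100.0 * 1.001 ** np.arange(40), index=minutes)

horizon = 15  # minutes ahead

# Forward log return: log(price `horizon` minutes ahead / current price).
# shift(-horizon) aligns each row with the price 15 rows later.
log_return = np.log(close.shift(-horizon) / close)

print(log_return.head())
# The last `horizon` rows have no future price, so they are NaN.
print(log_return.isna().sum())
```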

In [None]:
fig, axs = plt.subplots(5, 3,
                        figsize=(15, 15),
                       sharex=True)
axs = axs.flatten()
i = 0
for asn, d in train.sample(100_000, random_state=529).groupby('Asset_Name'):
    d.set_index('datetime')['Target'] \
        .plot(color=next(color_cycle),
              ax=axs[i],
              title=asn)
    i += 1
fig.suptitle('Training Data - Target',
             fontsize=25, y=0.95)
plt.show()

# Bitcoin Features

In [None]:
train_subset = train.query('Asset_Name == "Bitcoin"') \
    .sample(1_000, random_state=529)
ax = sns.pairplot(train_subset,
             hue='year',
             vars=['Volume',
                   'Count',
                   'VWAP',
                   'Target',
                  ])

# Ethereum Features

In [None]:
train_subset = train.query('Asset_Name == "Ethereum"') \
    .sample(1_000, random_state=529)
ax = sns.pairplot(train_subset,
             hue='year',
                  palette='Spectral',
             vars=['Volume',
                   'Count',
                   'VWAP',
                   'Target',
                  ])

# TODO:
Start a baseline model.