**NOTE:** This notebook is still largely a work in progress. I have made it public so that people can get an early impression and perhaps leave me some feedback, but a large part of it still needs to be added.

In [16]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import os
print(os.listdir("../input"))

In [17]:
df = pd.read_csv('../input/coinbaseUSD_1-min_data_2014-12-01_to_2018-03-27.csv')
df.tail()

Before we start extracting features out of the data, let's do two simple things:
* Most data exports represent time as a numeric timestamp. This is perfect for machines, but not really comprehensible for us humans. So, we will add a human-readable `Date` column.
* Relying on the current, minute-by-minute price might create a solution that is too sensitive to short-term changes. Let's instead create a moving average over the past 60 minutes, which will smooth things out a little.
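
To make the two steps above concrete, here is a minimal sketch on toy data. The five prices and the 3-row window are made up for illustration (the notebook itself uses a 60-row window); only `pd.to_datetime` and `Series.rolling(...).mean()` are taken from the actual workflow.

```python
import pandas as pd

# Toy data (hypothetical): five one-minute ticks starting at 2018-03-27 00:00:00 UTC
toy = pd.DataFrame({
    "Timestamp": [1522108800 + 60 * i for i in range(5)],
    "Weighted_Price": [100.0, 102.0, 104.0, 103.0, 101.0],
})

# Unix seconds -> human-readable datetime
toy["Date"] = pd.to_datetime(toy["Timestamp"], unit="s")

# 3-row moving average; the first window - 1 rows have no full window and stay NaN
toy["MA3"] = toy["Weighted_Price"].rolling(3).mean()

print(toy)
```

Note that a rolling window of size `n` leaves the first `n - 1` rows as `NaN`, so the 60-minute average in this notebook is undefined for the first 59 rows.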

In [18]:
df['Date'] = pd.to_datetime(df['Timestamp'], unit='s')
df['Weighted_Price_MA60'] = df['Weighted_Price'].rolling(60).mean()
df.tail()

## Engineering the features

Let's derive two types of features from our input data.
The first one is the ratio between the current price and the maximum price within a given time window: 1, 7, 14, 30, 60, 90, and 120 days. These windows are arbitrary, chosen for their popularity in technical analysis. After some testing, it might turn out that other time windows work better.

The idea is to achieve a simple way of representing price patterns, as illustrated here:
![](https://preslav.me/assets/img/2018/june/price_patterns.jpg)
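
As a rough illustration of the ratio feature, the sketch below applies it to a handful of made-up prices with a hypothetical 3-row window. A value of `1.0` means the price is at its highest point within the window; lower values show how far it has fallen below that high.

```python
import pandas as pd

# Toy prices (hypothetical values, one per row)
prices = pd.Series([10.0, 12.0, 11.0, 9.0, 12.0, 15.0])

window = 3  # hypothetical window size, in rows

# Highest price seen in the current row and the window - 1 rows before it
rolling_max = prices.rolling(window).max()

# Ratio of the current price to that rolling maximum (always <= 1)
ratio = prices / rolling_max

print(pd.DataFrame({"price": prices, "rolling_max": rolling_max, "ratio": ratio}))
```

Because the window includes the current row, the ratio can never exceed 1, which gives every window the same bounded scale regardless of the absolute price level.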

In [19]:
CN_WEIGHTED_PRICE = 'Weighted_Price_MA60'

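for days in [1, 7, 14, 30, 60, 90, 120]:
    # The data has one row per minute, so convert the window from days to rows
    minutes = days * 24 * 60
    # Ratio of the smoothed price to its maximum over the window
    # (despite the "MA" in the column name, this is not a moving average)
    df[f'WP_MA{days}d'] = df[CN_WEIGHTED_PRICE] / df[CN_WEIGHTED_PRICE].rolling(minutes).max()

df.tail()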

## Categorize the data

There are many ways to categorize the data, but the simplest one is to grab a future price (say, one day later) at each data point and compare it with the price at that moment. Thankfully, Pandas makes this a breeze with its forward and reverse column-shifting capabilities. All we have to do is compare the price at any given moment with the value from a negatively shifted copy of the price column. If the ratio of the future price to the current price is greater than 1 (that is, the price went up), we assign the label `1`; otherwise, `-1`.
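
The shifting idea can be sketched on a tiny series as follows. The prices and the 2-row `horizon` are hypothetical; the point is only to show how a negative shift lines up each row with its future value.

```python
import numpy as np
import pandas as pd

# Toy prices (hypothetical values, one per row)
prices = pd.Series([100.0, 105.0, 103.0, 103.0, 110.0])

horizon = 2  # hypothetical look-ahead, in rows

# shift(-horizon) pulls future values back, so row i holds the price at row i + horizon
future = prices.shift(-horizon)

# Ratio > 1 means the price is higher `horizon` rows later
ratio = future / prices

# Start with NaN so rows without a known future price stay unlabeled
labels = pd.Series(np.nan, index=prices.index)
labels[ratio > 1] = 1
labels[ratio <= 1] = -1

print(pd.DataFrame({"price": prices, "future": future, "label": labels}))
```

The last `horizon` rows have no future price to compare against, so they remain unlabeled (`NaN`).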

In [21]:
# Rows are one minute apart, so one day ahead is 24 * 60 rows
future_price = df[CN_WEIGHTED_PRICE].shift(-24 * 60)
future_price_pct_gain = (future_price / df[CN_WEIGHTED_PRICE]) - 1

CN_LABEL = 'label'

# Price went up -> 1; flat or down -> -1.
# Rows with no known future price (the tail of the data) keep a NaN label.
df.loc[future_price_pct_gain > 0, CN_LABEL] = 1
df.loc[future_price_pct_gain <= 0, CN_LABEL] = -1