In this notebook we lay the framework for a deployable model/script that outputs a live prediction from a single call. We will define all of the required functions and data processing steps.

In [2]:
# import all of our required libraries for necessary data processing and data requests

import numpy as np
import pandas as pd
from binance.client import Client

In [5]:
# define our function to retrieve klines data from binance API

def get_data():
    
    '''
    This function executes an API call to Binance to retrieve daily klines (candlestick) data for BTCUSDT.
    The results are returned as a dataframe, ready for further feature engineering.
    '''
    
    client = Client()
    # establishing a client without API keys, since klines are public market data
    
    candles = client.get_klines(symbol='BTCUSDT', interval=Client.KLINE_INTERVAL_1DAY, limit=90)
    # we only need to request the most recent 90 days to calculate our prediction data
    
    data = pd.DataFrame(candles, columns=['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Close time', 'Quote asset volume', 'Number of trades', 'Taker buy base volume', 'Taker buy quote volume', 'Ignore'])
    # these column labels are as labelled on the Binance API documentation
    
    data.drop(['Close time', 'Ignore'], axis=1, inplace=True)
    # dropping unneeded columns
    
    data['Date'] = pd.to_datetime(data['Date'], unit='ms')
    # converting millisecond timestamps to proper date format for better visual reference
    
    data.set_index('Date', inplace=True)
    # setting index to date
    
    data = data.astype('float64')
    # converting from object type to float type
    
    return data

In [6]:
data = get_data()

In [7]:
data.head()

Date,Open,High,Low,Close,Volume,Quote asset volume,Number of trades,Taker buy base volume,Taker buy quote volume
2021-09-03,49246.63,51000.0,48316.84,49999.14,59025.644157,2949574000.0,1645163.0,29625.443383,1481238000.0
2021-09-04,49998.0,50535.69,49370.0,49915.64,34664.65959,1733527000.0,1225830.0,16739.70385,837225500.0
2021-09-05,49917.54,51900.0,49450.0,51756.88,40544.835873,2049212000.0,1417660.0,21557.235864,1090240000.0
2021-09-06,51756.88,52780.0,50969.33,52663.9,49249.667081,2549291000.0,1678015.0,24620.352525,1274643000.0
2021-09-07,52666.2,52920.0,42843.05,46863.73,123048.802719,6004106000.0,3321711.0,57935.601976,2832269000.0
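
The post-processing inside `get_data()` can be checked offline, without calling the API. The sketch below is only an illustration: `klines_to_frame` and `sample_candles` are hypothetical names, and the sample rows reuse the rounded values displayed above, with placeholder values for the close time and the ignored field. It assumes each kline arrives as a 12-item list with prices and volumes as strings.

```python
import pandas as pd

COLUMNS = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Close time',
           'Quote asset volume', 'Number of trades', 'Taker buy base volume',
           'Taker buy quote volume', 'Ignore']

# Illustrative rows only: figures are the rounded values displayed above,
# and the 'Close time' and 'Ignore' entries are placeholders.
sample_candles = [
    [1630627200000, '49246.63', '51000.00', '48316.84', '49999.14', '59025.644157',
     1630713599999, '2949574000.0', 1645163, '29625.443383', '1481238000.0', '0'],
    [1630713600000, '49998.00', '50535.69', '49370.00', '49915.64', '34664.65959',
     1630799999999, '1733527000.0', 1225830, '16739.70385', '837225500.0', '0'],
]

def klines_to_frame(candles):
    """Hypothetical helper mirroring the clean-up steps in get_data()."""
    frame = pd.DataFrame(candles, columns=COLUMNS)
    frame = frame.drop(columns=['Close time', 'Ignore'])       # drop unneeded columns
    frame['Date'] = pd.to_datetime(frame['Date'], unit='ms')   # ms timestamp -> datetime
    frame = frame.set_index('Date')
    return frame.astype('float64')                             # strings/ints -> floats

sample_frame = klines_to_frame(sample_candles)
print(sample_frame.dtypes)
print(sample_frame.index)
```

Keeping the clean-up separate from the request like this makes it possible to test the column handling and type conversion without network access.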


This works as expected. Now we can bring in the feature-creation functions written in the previous notebook.

In [9]:
# we will define a function to run prior to calculating our averages

def feat_eng(X_df):
    '''
    Intakes "X" portion of data and outputs selected engineered features
    '''
    
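    X_df['High/Low'] = X_df['High'] - X_df['Low']
    # daily price range (a difference, despite the column name)
    X_df['volX'] = X_df['Quote asset volume'] / X_df['Volume']
    # quote volume per unit of base volume, i.e. the average traded price for the day
    X_df['quote-buy'] = X_df['Taker buy quote volume'] / X_df['Taker buy base volume']
    # the same ratio restricted to taker buys, i.e. the average taker buy price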
    
    return X_df

# let's define a function to create our moving averages and incorporate them into our dataframe

def get_sma(X_df):
    '''
    This function intakes the "X" portion of the data and returns the data with moving average columns applied
    '''
    
    SMAs = [7,30,90]                                                     # 7, 30, and 90 day simple moving averages
    for val in SMAs:
        X_df[str(val)+'sma'] = X_df['Close'].rolling(f'{val}D').mean()   # using the pandas rolling function to calculate mean values over each desired SMA value
        
    return X_df
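
One detail of `get_sma` worth checking is how a time-based window such as `rolling('7D')` behaves compared with a fixed row count such as `rolling(7)`. The sketch below uses synthetic closing prices (the values 1 to 10, purely for illustration) on a daily index. With an offset window, pandas defaults `min_periods` to 1, so early rows are averaged over fewer observations instead of being `NaN`.

```python
import numpy as np
import pandas as pd

# Synthetic daily closes 1.0 .. 10.0, purely for illustration
dates = pd.date_range('2021-01-01', periods=10, freq='D')
close = pd.Series(np.arange(1.0, 11.0), index=dates, name='Close')

time_based = close.rolling('7D').mean()   # offset window: min_periods defaults to 1
count_based = close.rolling(7).mean()     # 7-row window: NaN until 7 rows are available

comparison = pd.DataFrame({'7D window': time_based, '7-row window': count_based})
print(comparison)
```

On gap-free daily data the two agree once a full window is available. Because `get_data()` requests 90 rows, the `90sma` on the final row covers all 90 of them, while the earlier rows of the longer averages are computed from partial windows.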

In [12]:
data[-1:]

Date,Open,High,Low,Close,Volume,Quote asset volume,Number of trades,Taker buy base volume,Taker buy quote volume
2021-12-01,56950.56,57686.12,56630.0,57161.19,6740.77919,385451200.0,221622.0,3371.37848,192790800.0


In [13]:
# Now we want to take the most recent data point possible to make our prediction from

def X_input(X_df):
    x_input = X_df.iloc[-1:]   # take the most recent row after calculations for passing into model; this is the current day's candle, so its values keep updating until the day closes
    
    return x_input

This covers all of our feature engineering. After running these four functions, we will have the data we need to input into our final model for a prediction.

Now we just need to finalize our model, and we will be able to write a script that, when run, outputs both the current data and the predicted outcome.

In [14]:
# now to create a function that ties all of these together and gives us our desired input for the model

def to_predict():
    
    data = get_data()
    data_features = feat_eng(data)
    data_all = get_sma(data_features)
    x_input = X_input(data_all)
    
    return x_input

In [15]:
X = to_predict()
X

Date,Open,High,Low,Close,Volume,Quote asset volume,Number of trades,Taker buy base volume,Taker buy quote volume,High/Low,volX,quote-buy,7sma,30sma,90sma
2021-12-01,56950.56,57686.12,56630.0,56938.16,7630.46041,436160800.0,251360.0,3712.99828,212262500.0,1056.12,57160.488512,57167.403151,56621.11,60530.089667,55014.341556


Perfect. This combination of functions executes and gives us the desired input to feed into the model. Now we just need a workable model.
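
Once a model exists, the final script could be wired up roughly as follows. This is a minimal sketch under stated assumptions, not the final implementation: `LastCloseModel` is a placeholder standing in for the real model and assumes a scikit-learn style `predict(X)` method, `live_prediction` is a hypothetical helper name, and `fake_to_predict` is an offline stand-in for `to_predict()` that carries only two of its columns.

```python
import pandas as pd

class LastCloseModel:
    """Placeholder for the real model: simply returns the latest Close unchanged."""
    def predict(self, X):
        return X['Close'].to_numpy()

def live_prediction(fetch_input, model):
    """Fetch one row of features, run the model on it, and return both."""
    x_input = fetch_input()
    prediction = model.predict(x_input)
    return x_input, prediction

def fake_to_predict():
    # Offline stand-in for to_predict(): a one-row frame with a subset of its columns
    return pd.DataFrame(
        {'Close': [56938.16], '7sma': [56621.11]},
        index=pd.DatetimeIndex(['2021-12-01'], name='Date'),
    )

x_input, prediction = live_prediction(fake_to_predict, LastCloseModel())
print(x_input)
print('prediction:', prediction[0])
```

Passing the fetch function and the model in as arguments keeps the script testable offline; in the real script, `to_predict` and the finalized model would take the place of the two stand-ins.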