<a href="https://colab.research.google.com/github/j03m/lstm-price-predictor/blob/main/Coin_Predictions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Current Todo:

* ~Add volatility indicators?~ DONE

The test below shows that with RVI attached, the model performs better, but only against what appear to be higher-volume coins.

Some ideas here: 


* Train against some sh1t coins and see if that helps without breaking accuracy for previously trained segments

* Maybe scaling volume directly isn't the right move. Maybe all of our volume should be turned into a ratio against Bitcoin's volume, and we scale that. That could help the model understand that a coin's scaled volume is still much lower than Bitcoin's. Bitcoin's volume would be 1. I don't know how that will work with the stock series volume. We may need to retrain.

* We can also add VWAP, or mVWAP and SDVWAP
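
A minimal sketch of the volume-ratio idea from the list above, assuming the coin's candles and Bitcoin's candles share the same `Date` values. The helper name `volume_ratio_to_btc` and the toy numbers are illustrative and not part of this notebook.

```python
import pandas as pd

def volume_ratio_to_btc(df_coin, df_btc):
    """Return df_coin with a VolumeRatio column: coin volume / BTC volume per date."""
    # Align on Date so each bar is compared with the BTC bar of the same day.
    merged = df_coin.merge(
        df_btc[["Date", "Volume"]], on="Date", how="inner", suffixes=("", "_btc")
    )
    merged["VolumeRatio"] = merged["Volume"] / merged["Volume_btc"]
    return merged.drop(columns=["Volume_btc"])

# Toy data (made up): BTC volume grows while the alt coin's volume stays flat.
dates = pd.date_range("2023-01-01", periods=3, freq="D")
df_btc = pd.DataFrame({"Date": dates, "Close": [16600.0, 16700.0, 16650.0],
                       "Volume": [1000.0, 2000.0, 4000.0]})
df_alt = pd.DataFrame({"Date": dates, "Close": [1.0, 1.1, 1.2],
                       "Volume": [10.0, 10.0, 10.0]})

ratio_alt = volume_ratio_to_btc(df_alt, df_btc)
ratio_btc = volume_ratio_to_btc(df_btc, df_btc)  # Bitcoin against itself is always 1
print(ratio_alt[["Date", "VolumeRatio"]])
```

The ratio column could then be fed to the scaler in place of raw `Volume`. Whether that helps on the stock series is untested here.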


## Next Todo:

* Add TA fields, use random forest to check which fields are the best, verify with mean squared error (this could take a while)

* How to verify trades on highly volatile but illiquid predictions?

* SVM direction indicator?
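
One possible shape for the random-forest field check, sketched on synthetic data. The column names (`signal`, `noise_a`, `noise_b`) are placeholders for real TA fields, and the split ratio and forest size are assumptions rather than tuned choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-ins for TA columns: only "signal" drives the target.
signal = rng.normal(size=n)
noise_a = rng.normal(size=n)
noise_b = rng.normal(size=n)
X = np.column_stack([signal, noise_a, noise_b])
y = 3.0 * signal + rng.normal(scale=0.1, size=n)
names = ["signal", "noise_a", "noise_b"]

# Chronological split (no shuffling), as one would do for a time series.
split = int(0.8 * n)
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X[:split], y[:split])

# Rank fields by impurity-based importance, then verify on held-out rows.
ranking = sorted(zip(names, forest.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
mse = mean_squared_error(y[split:], forest.predict(X[split:]))
print(ranking)
print("held-out mse:", mse)
```

With real data, the ranking would be computed per ticker, and the top fields would be checked by retraining the LSTM and comparing mean squared error.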

# IMPORT DATASETS AND LIBRARIES


In [27]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

import sys

sys.path.insert(0,'/content/drive/My Drive/ml-trde-notebooks')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [28]:
!pip install pandas_ta

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


# Library

In [29]:
import pandas as pd
import pandas_ta as ta
import plotly.express as px
from copy import copy
from scipy import stats
import matplotlib.pyplot as plt
import numpy as np
import plotly.figure_factory as ff
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import r2_score, confusion_matrix, classification_report, accuracy_score, f1_score
from tensorflow import keras
from sklearn.preprocessing import MinMaxScaler
import requests
from requests.exceptions import HTTPError
import json as js
from datetime import datetime, timedelta
import time
from os.path import exists
from decimal import *
from sklearn.model_selection import RandomizedSearchCV
from keras.wrappers.scikit_learn import KerasClassifier
from keras import backend as K
from sklearn.model_selection import KFold, ParameterGrid
from keras.layers import Input, LSTM, Attention, Dense
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping
from sklearn.metrics import mean_squared_error, mean_absolute_error


pd.options.display.float_format = '{:f}'.format
np.set_printoptions(formatter={'float': '{:f}'.format})

# Shared scaler and notebook-wide settings
sc = MinMaxScaler()
num_features = 2 #3
candle_features = 5 #6
coin_base = False
ku_coin = True
load_models = True
extra_training = False
COINBASE_REST_API = 'https://api.pro.coinbase.com'
COINBASE_PRODUCTS = COINBASE_REST_API+'/products'
KUCOIN_REST_API = "https://api.kucoin.com"
KUCOIN_PRODUCTS = KUCOIN_REST_API+ "/api/v1/market/allTickers"
KUCOIN_CANDLES = KUCOIN_REST_API+ "/api/v1/market/candles"

data_path = '/content/drive/My Drive/ml-trde-notebooks/data'
model_path = "/content/drive/My Drive/ml-trde-notebooks/models"

def interactive_plot(df, title):
  fig = px.line(title = title)
  for i in df.columns[1:]:
    fig.add_scatter(x = df['Date'], y = df[i], name = i)
  fig.show()

def get_single_stock(price_df, vol_df, name):
    return pd.DataFrame({'Date': price_df['Date'], 'Close': price_df[name], 'Volume': vol_df[name]})

def scale_data(data):
  # Scale the data
  scaled_data = sc.fit_transform(data)
  return scaled_data

def sort_date(pric_df):
  pric_df = pric_df.sort_values(by = ['Date'])
  return pric_df

def append_price_dif(df):
  df['Target'] = df['Close'].shift(-1)
  df['Diff'] = df['Target'] - df['Close']
  df = df[:-1]
  return df

def append_price_dif_(df):
  df['Target'] = df['Close'].shift(-1)
  df['Diff'] = df['Target'] - df['Close']
  return df

def append_15d_slope(df):
  df['15Close'] = df['Close'].shift(15)
  df['15Date'] = df['Date'].shift(15)
  df['Trend'] = (df['Close'] - df['15Close']) / 15
  df = df[15:]
  return df

def show_plot(data, title):
  plt.figure(figsize = (13, 5))
  plt.plot(data, linewidth = 3)
  plt.title(title)
  plt.grid()

def build_model(features, outcomes):
  # Create the model
  inputs = keras.layers.Input(shape=(features,outcomes))
  x = keras.layers.LSTM(150, return_sequences= True)(inputs)
  x = keras.layers.Dropout(0.3)(x)
  x = keras.layers.LSTM(150, return_sequences=True)(x)
  x = keras.layers.Dropout(0.3)(x)
  x = keras.layers.LSTM(150)(x)
  outputs = keras.layers.Dense(1, activation='linear')(x)

  model = keras.Model(inputs=inputs, outputs=outputs)
  model.compile(optimizer='adam', loss="mse")
  return model

def build_attention_model(features, outcomes):
  # Create the model
  inputs = keras.layers.Input(shape=(features,outcomes))
  x = keras.layers.LSTM(150, return_sequences= True)(inputs)
  x = keras.layers.Dropout(0.3)(x)
  x = keras.layers.LSTM(150, return_sequences=True)(x)
  x = keras.layers.Dropout(0.3)(x)
  x = keras.layers.LSTM(150)(x)
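  # NOTE: the attention output is never connected to `outputs` below, so this
  # model currently has the same architecture as build_model.
  attention_layer = Attention()([x, x])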
  outputs = keras.layers.Dense(1, activation='linear')(x)
  model = Model(inputs=inputs, outputs=outputs)
  model.compile(optimizer='adam', loss="mse")
  return model

def connect(url, params):
  response = requests.get(url,params)
  response.raise_for_status()
  return response

def coinbase_json_to_df(delta, product, granularity='86400'):
  start_date = (datetime.today() - timedelta(seconds=delta*int(granularity))).isoformat()
  end_date = datetime.now().isoformat()
  # Please refer to the coinbase documentation on the expected parameters
  params = {'start':start_date, 'end':end_date, 'granularity':granularity}
  response = connect(COINBASE_PRODUCTS+'/' + product + '/candles', params)
  response_text = response.text
  df_history = pd.read_json(response_text)
  # Add column names in line with the Coinbase Pro documentation
  df_history.columns = ['time','low','high','open','close','volume']
  df_history['time'] = [datetime.fromtimestamp(x) for x in df_history['time']]
  return df_history

def ku_coin_json_to_df(delta, product, granularity='86400'):
  granularity = int(granularity)
  start_date = (datetime.today() - timedelta(seconds=delta*granularity))
  end_date = datetime.now()

  # Please refer to the kucoin documentation on the expected parameters
  params = {'startAt':int(start_date.timestamp()), 'endAt':int(end_date.timestamp()), 'type':gran_to_string(granularity), 'symbol':product}
  response = connect(KUCOIN_CANDLES, params)
  response_text = response.text
  response_data = js.loads(response_text)
  if response_data["code"] != "200000":
    raise Exception("Illegal response: " + response_text)

  df_history = pd.DataFrame(response_data["data"])

  # kucoin is weird in that they don't have candles for everything. IF we don't have the requested
  # number of bars here, it throws off the whole algo. I don't want to try and project so we
  # just won't trade those instruments
  got_bars = len(df_history)
  if ( got_bars < delta-1):
    raise Exception("Requested:" + str(delta) + " bars " + " but only got:" + str(got_bars))

  df_history.columns = ['time','open','close','high','low','volume', 'amount']
  df_history['time'] = [datetime.fromtimestamp(int(x)) for x in df_history['time']]
  df_history['open'] = [float(x) for x in df_history['open']]
  df_history['close'] = [float(x) for x in df_history['close']]
  df_history['high'] = [float(x) for x in df_history['high']]
  df_history['low'] = [float(x) for x in df_history['low']]
  df_history['volume'] = [float(x) for x in df_history['volume']]
  df_history['amount'] = [float(x) for x in df_history['amount']]
  return df_history

def gran_to_string(granularity):
  # TODO: only daily (86400) and 15-minute (900) granularities are mapped so far
  if granularity == 86400:
    return "1day"
  if granularity == 900:
    return "15min"
  raise Exception("Joe didn't implement a proper granularity to string. Lazy, lazy.")

#def get_coin_data_frames(time, product, granularity='86400', feature_set = ["Close", "Volume", "Trend"]):
def get_coin_data_frames(time, product, granularity='86400', feature_set = ["Close", "Volume"]):
  if coin_base:
    df_raw = coinbase_json_to_df(time, product, granularity)
  else:
    df_raw = ku_coin_json_to_df(time, product, granularity)

  df_btc_history = df_raw
  if len(df_btc_history.index) == 0:
    print("No data for ", product)

  df_btc_history = df_btc_history.rename(columns={"time":"Date", "open":"Open", "high":"High", "low":"Low", "close":"Close", "volume":"Volume"})
  df_btc_history = sort_date(df_btc_history)
  df_btc_history = append_price_dif_(df_btc_history)
  #df_btc_history = append_15d_slope(df_btc_history)
  df_btc_features = df_btc_history[feature_set]
  df_history_scaled = sc.fit_transform(df_btc_features)
  return [df_btc_history, df_btc_features, df_history_scaled, df_raw]

def build_profit_estimate(predicted, df_btc_history):
  df_predicted_chart = pd.DataFrame();
  df_predicted_chart["Date"] = df_btc_history["Date"]
  df_predicted_chart["Predicted"] = predicted
  df_predicted_chart["Predicted-Target"] = df_predicted_chart["Predicted"].shift(-1)
  df_predicted_chart["Predicted-Diff"] = df_predicted_chart["Predicted-Target"] - df_predicted_chart["Predicted"]
  df_predicted_chart["Should-Trade"] = np.where(df_predicted_chart["Predicted-Diff"] > 0, True, False)
  df_predicted_chart["RealDiff"] = df_btc_history["Diff"]
  df_predicted_chart["Percent"] = df_predicted_chart["RealDiff"] / df_btc_history["Close"]
  df_predicted_chart["Profit"] = np.where(df_predicted_chart["Should-Trade"] > 0, df_predicted_chart["Percent"] * budget, 0)
  profit = df_predicted_chart["Profit"].sum()
  return [df_predicted_chart, profit]

def debug_prediction_frame(predicted, df_history, df_history_scaled):
  df_predicted_chart = pd.DataFrame();
  df_predicted_chart["Date"] = df_history["Date"]
  df_predicted_chart["Predicted"] = predicted
  df_predicted_chart["Original"] = df_history_scaled[:,0]
  #Trend
  #df_predicted_chart["Original-Target"] = df_history_scaled[:,2]
  df_predicted_chart["Original-Target"] = df_history_scaled[:,1]
  df_predicted_chart["Target-Date"] = df_predicted_chart["Date"].shift(-1)
  df_predicted_chart["Predicted-Diff"] = df_predicted_chart["Predicted"] - df_predicted_chart["Original"]
  df_predicted_chart["Actual-Diff"] = df_predicted_chart["Original-Target"] - df_predicted_chart["Original"]
  df_predicted_chart["Should-Trade"] = np.where(df_predicted_chart["Predicted-Diff"] > 0, True, False)
  df_predicted_chart["Close"] = df_history["Close"]
  df_predicted_chart["Target"] = df_history["Target"]
  df_predicted_chart["RealDiff"] = df_history["Diff"]
  df_predicted_chart["Percent"] = df_predicted_chart["RealDiff"] / df_predicted_chart["Close"]
  df_predicted_chart["Profit"] = np.where(df_predicted_chart["Should-Trade"] > 0, df_predicted_chart["Percent"] * budget, 0)
  return df_predicted_chart

def get_all_products():
  if coin_base:
    return get_all_coinbase_products()

  if ku_coin:
    return get_all_kucoin_products()

def get_all_kucoin_products():
  response = connect(KUCOIN_PRODUCTS, {})
  products = js.loads(response.text)
  df_products = pd.DataFrame(products["data"]["ticker"])
  df_products = df_products.rename(columns={"symbol":"id"})
  return df_products

def get_all_coinbase_products():
  response = connect(COINBASE_PRODUCTS, {})
  response_text = response.text
  df_products = pd.read_json(response_text)
  return df_products

def predict_trade(model, product, bars, npa_scaled=None):

  # Download and scale the candles only when the caller did not supply them
  if npa_scaled is None or len(npa_scaled) == 0:
    print("downloading...")
    [df_full, df_features, npa_scaled, df_raw] = get_coin_data_frames(bars, product)

  predicted = model.predict(npa_scaled).flatten()

  #convert to data frames that have the correct shape for being unscaled
  #df_scaled = pd.DataFrame(npa_scaled, columns = ["Close", "Volume", "Trend"])
  df_scaled = pd.DataFrame(npa_scaled, columns = ["Close", "Volume"])

  # MinMaxScaler scales each column independently, and we only care about
  # price here, so we dummy out volume and run the scaler's inverse on it.
  # This kinda sucks: if we add features we'll need to add them here for unscaling
  df_temp = pd.DataFrame(predicted, columns = ["Close"])
  df_temp["Volume"] = 0
  #df_temp["Trend"] = 0

  # unscale them both
  #df_temp = pd.DataFrame(sc.inverse_transform(df_temp), columns = ["Close", "Volume", "Trend"])
  #df_trade = pd.DataFrame(sc.inverse_transform(df_scaled), columns = ["Close", "Volume", "Trend"])
  df_temp = pd.DataFrame(sc.inverse_transform(df_temp), columns = ["Close", "Volume"])
  df_trade = pd.DataFrame(sc.inverse_transform(df_scaled), columns = ["Close", "Volume"])


  # add predicted
  df_trade["Predicted"] = df_temp["Close"]
  df_trade = df_trade.tail(1)

  # add the product, derive a move and percent
  df_trade["Product"] = product
  df_trade["Move"] = df_trade["Predicted"] - df_trade["Close"]
  df_trade["Percent"] = (df_trade["Move"] / df_trade["Close"]) * 100
  df_trade["RawPercent"] = df_trade["Move"] / df_trade["Close"]
  df_trade["250Fees"] = (250 * 0.004) * 2
  df_trade["5kFees"] = (5000 * 0.004) * 2
  df_trade["10kFees"] = (10000 * 0.0025) * 2
  df_trade["250Profit"] = (250 * df_trade["RawPercent"]) - df_trade["250Fees"]
  df_trade["5kProfit"] = (5000 * df_trade["RawPercent"]) - df_trade["5kFees"]
  df_trade["10k0Profit"] = (10000 * df_trade["RawPercent"]) - df_trade["10kFees"]
  return df_trade

def get_yf_training_set_for(df, columns=["Open", "High", "Low", "Close", "Volume", "Target"]):
  target_df = append_price_dif(df)
  features = target_df[columns]
  scaled_features = scale_data(features)
  return extract_training(scaled_features, len(target_df),len(features.columns)-1)

def extract_training(scaled_features, length, num_features):
  X = []
  y = []

  for i in range(0, length):
    X.append(scaled_features [i][0:num_features])
    y.append(scaled_features [i][num_features])
  X = np.asarray(X)
  y = np.asarray(y)
  return [scaled_features, X, y]

def get_training_set_for(ticker):
  target_df = get_single_stock(all_stocks_price_df, all_stocks_vol_df, ticker)
  target_df = append_price_dif(target_df)
  #target_df = append_15d_slope(target_df)
  #features = target_df[["Close", "Volume", "Trend", "Target"]]
  features = target_df[["Close", "Volume", "Target"]]
  scaled_features = scale_data(features)
  return extract_training(scaled_features, len(target_df), len(features.columns)-1)

def train_model(model, X, y):

  # No separate test split for now: we have plenty of other time series
  # that can serve as held-out data.
  # Split the data (disabled)
  #split = int(0.7 * len(X))
  #X_train = X[:split]
  #y_train = y[:split]
  #X_test = X[split:]
  #y_test = y[split:]

  # Reshape the 2D (samples, features) array to 3D (samples, features, 1) to feed the model
  X_train = np.reshape(X, (X.shape[0], X.shape[1], 1))

  # Create an early stopping callback
  early_stopping = EarlyStopping(monitor='val_loss', patience=5)

  history = model.fit(
      X_train, y,
      epochs = 20,
      batch_size = 32,
      validation_split = 0.2,
      callbacks=[early_stopping]
  )
  return [model, history]

def get_group_bars(df):
  df = pd.DataFrame(sc.fit_transform(df[["Close", "Volume"]]), columns=["Close","Volume"])
  # Split into input sequences and target values
  n_steps = 4*4  # 4 hours of data at 15 minute intervals
  X = []
  Y = []
  for i in range(0, len(df), n_steps):
    df_group = df.iloc[i:i+n_steps]
    if len(df_group) != n_steps:
      continue
    X.append(np.array(df_group.values))
    Y.append(df_group.values[-1,0])

  # Convert the lists to NumPy arrays
  X = np.array(X)
  Y = np.array(Y)
  return [X, Y]

def getTrainingVanilla15mSet(ticker):
  file_path = data_path + "/" + ticker + "-15.csv"
  df = sort_date(pd.read_csv(file_path).rename(columns={"Datetime":"Date"}))
  df['Date'] = pd.to_datetime(df['Date'])
  return get_group_bars(df)


def build_15m_model(getTrainingSet=getTrainingVanilla15mSet):
  # the 15 min bar model
  # Build the model
  group_size = 4*4
  features = 2
  model15 = build_model(group_size, features)

  # Compile the model
  model15.compile(loss='mean_squared_error', optimizer='adam')

  # Train it
  tickers = ["SPY", "IBM", "TSLA", "CAT", "XOM", "B", "F", "AAPL", "AMZN"]

  early_stopping = EarlyStopping(monitor='val_loss', patience=5)
  for ticker in tickers:
    [X,Y] = getTrainingSet(ticker)
    model15.fit(X, Y,
      epochs = 20,
      batch_size = 32,
      validation_split = 0.2,callbacks=[early_stopping])
  return model15

def fetch_and_predict_short_term(model, product):
  if coin_base:
    df_raw = coinbase_json_to_df(16, product, 900)
  else:
    df_raw = ku_coin_json_to_df(16, product, 900)
  df_raw = df_raw.rename(columns={"close":"Close", "volume": "Volume"})
  [X, Y] = get_group_bars(df_raw[["Close", "Volume"]])
  predicted = model.predict(X)
  df_pred = pd.DataFrame(predicted, columns = ["Close"])
  df_pred["Volume"] = 0
  return [predicted.flatten()[0], sc.inverse_transform(df_pred).flatten()[0]]

def attachVWAPS(df, length):
  vwaps = df
  vwaps.set_index(pd.DatetimeIndex(vwaps["Date"]), inplace=True)
  vwaps["VWAP"] = df.ta.vwap(length=length)
  vwaps = vwaps.dropna(subset=["VWAP"])
  vwaps['VWAPD'] = vwaps['Close'] - vwaps['VWAP']
  return vwaps

def attachRVI(df):
  vol_df = df
  vol_df["RVI"] = df.ta.rvi()
  return vol_df.dropna(subset=["RVI"])

#pull training data
all_stocks_price_df = sort_date(pd.read_csv(data_path+'/stock.csv'))
all_stocks_vol_df = sort_date(pd.read_csv(data_path+"/stock_volume.csv"))
spy_df = sort_date(pd.read_csv(data_path+'/SPY.csv'))
cat_df = sort_date(pd.read_csv(data_path+'/CAT.csv'))
f_df = sort_date(pd.read_csv(data_path+'/F.csv'))
xom_df = sort_date(pd.read_csv(data_path+'/XOM.csv'))
ibm_df = sort_date(pd.read_csv(data_path+'/IBM.csv'))
spy_15_df = sort_date(pd.read_csv(data_path+'/SPY-15.csv').rename(columns={"Datetime":"Date"}))
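
The unscaling step in `predict_trade` relies on `MinMaxScaler` treating each column separately. The sketch below checks that on toy numbers and shows a single-column inverse that avoids the dummy `Volume` column. The helper `inverse_close` is hypothetical, and it assumes the default `feature_range=(0, 1)` and a scaler last fit on the same product's data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrix, columns: Close, Volume (made-up values)
features = np.array([[10.0, 1000.0],
                     [20.0, 3000.0],
                     [15.0, 2000.0]])

scaler = MinMaxScaler()
scaled = scaler.fit_transform(features)

# Each column is scaled with its own minimum and range.
assert np.allclose(scaled[:, 0], [0.0, 1.0, 0.5])
assert np.allclose(scaled[:, 1], [0.0, 1.0, 0.5])

def inverse_close(scaler, scaled_close, col=0):
    """Undo min-max scaling for one column without building dummy columns."""
    return np.asarray(scaled_close) * scaler.data_range_[col] + scaler.data_min_[col]

predicted_scaled = np.array([0.25, 0.75])
predicted_close = inverse_close(scaler, predicted_scaled)

# Same answer as padding with a zero Volume column and calling inverse_transform.
padded = np.column_stack([predicted_scaled, np.zeros_like(predicted_scaled)])
via_padding = scaler.inverse_transform(padded)[:, 0]
print(predicted_close, via_padding)
```

Because the notebook shares one global `sc`, any later `fit_transform` call replaces these statistics, so the inverse has to run before the scaler is refit on another series.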

# Experimental


In [30]:
if 0:
  dfs = [spy_df, cat_df, f_df, ibm_df, xom_df]
  model_vwap = build_model(5, 1)
  for df in dfs:
    vol_df = attachRVI(df)
    vol_df = attachVWAPS(vol_df, 30)
    [scaled_features, X, y] = get_yf_training_set_for(vol_df, columns=["Close", "Volume", "RVI", "VWAP", "VWAPD", "Target"])  
    [model_vwap, history] = train_model(model_vwap, X, y)
  model_vwap.save(model_path + "/model_vwap.h15")  

Epoch 1/20 ... Epoch 8/20
Epoch 1/20 ... Epoch 10/20
Epoch 1/20 ... Epoch 20/20
Epoch 1/20 ... Epoch 7/20
Epoch 1/20 ... Epoch 7/20




In [None]:
# TODO: Load these models and then test them to see whats what!
model_exp = keras.models.load_model(model_path + "/model_exp.h15")
model_vwap = keras.models.load_model(model_path + "/model_vwap.h15")  

In [65]:
[btc_history, df_btc_features, df_history_scaled, df_raw] = get_coin_data_frames(180, "ETH-USDT")
rvi_history = attachRVI(btc_history)
X = scale_data(rvi_history[["Close", "Volume", "RVI"]])
predicted_orig = model_orig.predict(X[:,[0,1]]).flatten()
predicted_rvi = model_exp.predict(X).flatten()

scaled_close = scale_data(rvi_history[["Close"]])

print("predicted_orig mse: ", mean_squared_error(scaled_close, predicted_orig))
print("predicted_exp  mse: ", mean_squared_error(scaled_close, predicted_rvi))
 
df_chart = pd.DataFrame();
df_chart["Date"] = rvi_history["Date"]
df_chart["Close"] = X[:,[0]]
df_chart["Orig"] = predicted_orig
df_chart["RVI"] = predicted_rvi

interactive_plot(df_chart[["Date","Orig", "Close", "RVI"]], "wow")    

predicted_orig mse:  0.00034375439242626156
predicted_exp  mse:  0.00026147204178347203


# Get or Train a Model

In [None]:
if load_models:
  #load models
  print("loading models from disk")
  model_orig = keras.models.load_model(model_path + "/model_orig.h15")
  model_ohlc = keras.models.load_model(model_path + "/model_ohlc.h15")
  model_att1 = keras.models.load_model(model_path + "/model_att1.h15")
  model_att2 = keras.models.load_model(model_path + "/model_att2.h15")
  model_15m  = keras.models.load_model(model_path + "/model_15m.h15")
else:
  #if file_exists:
  #  print("hello")
  #  model = keras.models.load_model(model_path)
  #else:

  [scaled_features, X, y] = get_training_set_for("sp500")  
  [scaled_features1, X1, y1] = get_yf_training_set_for(spy_df)  

  model_orig = build_model(num_features, 1)
  model_ohlc = build_model(candle_features, 1)
  model_att1 = build_attention_model(num_features, 1)
  model_att2 = build_attention_model(candle_features, 1)

  [model_att1, history] = train_model(model_att1, X, y)
  [model_orig, history] = train_model(model_orig, X, y)
  [model_ohlc, history] = train_model(model_ohlc, X1, y1)
  [model_att2, history] = train_model(model_att2, X1, y1)
  model_15m = build_15m_model()

  model_orig.save(model_path + "/model_orig.h15")
  model_ohlc.save(model_path + "/model_ohlc.h15")
  model_att1.save(model_path + "/model_att1.h15")
  model_att2.save(model_path + "/model_att2.h15")
  model_15m.save(model_path + "/model_15m.h15")


loading models from disk


In [None]:
if extra_training:
  [scaled_features, X, y] = get_training_set_for("IBM")  
  [model_orig, history] = train_model(model_orig, X, y)
  [scaled_features, X, y] = get_training_set_for("T")  
  [model_orig, history] = train_model(model_orig, X, y)
  [scaled_features, X, y] = get_training_set_for("BA")  
  [model_orig, history] = train_model(model_orig, X, y)
  [scaled_features, X, y] = get_training_set_for("TSLA")  
  [model_orig, history] = train_model(model_orig, X, y)


In [None]:
# additional training?
if extra_training:

  [scaled_features1, X1, y1] = get_yf_training_set_for(cat_df)  
  [model_ohlc, history] = train_model(model_ohlc, X1, y1)
  [scaled_features1, X1, y1] = get_yf_training_set_for(f_df)  
  [model_ohlc, history] = train_model(model_ohlc, X1, y1)
  [scaled_features1, X1, y1] = get_yf_training_set_for(ibm_df)  
  [model_ohlc, history] = train_model(model_ohlc, X1, y1)
  [scaled_features1, X1, y1] = get_yf_training_set_for(xom_df)  
  [model_ohlc, history] = train_model(model_ohlc, X1, y1)

# Visualize and Backtest

## backtest the main models

In [None]:

# Run Random Search flat, attention flat, random ohlc and flat ohlc against these two models
# Loss isn't cutting it, it's always 0?
# After we find the winner above, run it trained on spy vs trained on all

[btc_history, df_btc_features, df_history_scaled, df_raw] = get_coin_data_frames(180, "GALAX3L-USDT")
#[btc_history1, df_btc_features1, df_history_scaled1, df_raw1] = get_coin_data_frames(180, "FCON-USDT", 86400, ["Open", "High", "Low", "Close", "Volume", "Trend"])
[btc_history1, df_btc_features1, df_history_scaled1, df_raw1] = get_coin_data_frames(180, "GALAX3L-USDT", 86400, ["Open", "High", "Low", "Close", "Volume"])

budget = 3000

scaled_features = scale_data(btc_history[["Close"]])
scaled_features1 = scale_data(btc_history1[["Close"]])

predicted_orig = model_orig.predict(df_history_scaled).flatten()
predicted_ohlc = model_ohlc.predict(df_history_scaled1).flatten()
predicted_att1 = model_att1.predict(df_history_scaled).flatten()
predicted_att2 = model_att2.predict(df_history_scaled1).flatten()

print("predicted_orig mse: ", mean_squared_error(scaled_features, predicted_orig))
print("predicted_ohlc mse: ", mean_squared_error(scaled_features1, predicted_ohlc))
print("predicted_att1 mse: ", mean_squared_error(scaled_features, predicted_att1))
print("predicted_att2 mse: ", mean_squared_error(scaled_features1, predicted_att2))

print("predicted_orig mae: ", mean_absolute_error(scaled_features, predicted_orig))
print("predicted_ohlc mae: ", mean_absolute_error(scaled_features1, predicted_ohlc))
print("predicted_att1 mae: ", mean_absolute_error(scaled_features, predicted_att1))
print("predicted_att2 mae: ", mean_absolute_error(scaled_features1, predicted_att2))


[df_profit, profit1] = build_profit_estimate(predicted_orig, btc_history)
[df_profit, profit2] = build_profit_estimate(predicted_ohlc, btc_history1)
[df_profit, profit3] = build_profit_estimate(predicted_att1, btc_history)
[df_profit, profit4] = build_profit_estimate(predicted_att2, btc_history1)


df_chart = debug_prediction_frame(predicted_orig, btc_history, df_history_scaled)
df_chart["Predicted-ohlc"] = predicted_ohlc
df_chart["Predicted-att1"] = predicted_att1
df_chart["Predicted-att2"] = predicted_att2

#interactive_plot(df_chart[["Date","Original", "Predicted-ohlc", "Predicted", "Predicted-atten", "Predicted-opt"]], "Wtf")
interactive_plot(df_chart[["Date","Original", "Predicted-ohlc", "Predicted", "Predicted-att1", "Predicted-att2"]], "Wtf")
print("Profits:", profit1, profit2, profit3, profit4)




predicted_orig mse:  0.0002889011502731766
predicted_ohlc mse:  0.0010731565560603994
predicted_att1 mse:  0.0010661308382314478
predicted_att2 mse:  0.0015278410415681038
predicted_orig mae:  0.010030000727504691
predicted_ohlc mae:  0.017205392952562203
predicted_att1 mae:  0.030140364509538012
predicted_att2 mae:  0.020313248457411472


Profits: 16571.70580518135 5677.673666706648 13696.66324585133 -74.2490413226983


## backtest the 15m model

In [None]:
df_raw = coinbase_json_to_df(180, "BTC-USD", 900)
df_raw = df_raw.rename(columns={"close":"Close", "time":"Date", "volume":"Volume"})
df_raw = df_raw[["Date", "Close", "Volume"]]
[X,Y] = get_group_bars(df_raw)

predicted = model_15m.predict(X).flatten()
df = pd.DataFrame()
df["Date"] = df_raw["Date"]
df["Close"] = sc.fit_transform(df_raw[["Close"]])
n_steps = 16
expanded_prd = []
last = len(predicted) -1
for i in range(0, len(df)):
  pos = int(i/n_steps)
  if pos > last:
    expanded_prd.append(predicted[-1])
  else:
    expanded_prd.append(predicted[pos])
df["Predicted"] = expanded_prd
interactive_plot(df[["Date", "Close", "Predicted"]], "Wtf")
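
The loop above stretches one prediction per 16-bar window back out to one value per bar. An equivalent vectorised sketch, with a hypothetical helper name `expand_predictions`, might look like this:

```python
import numpy as np

def expand_predictions(predicted, n_rows, n_steps=16):
    """Repeat each per-window prediction n_steps times; pad the tail with the last one."""
    predicted = np.asarray(predicted)
    expanded = np.repeat(predicted, n_steps)[:n_rows]
    pad = n_rows - len(expanded)
    if pad > 0:
        # Rows past the last complete window reuse the final prediction.
        expanded = np.concatenate([expanded, np.full(pad, predicted[-1])])
    return expanded

# Two windows of 2 bars each, plus one leftover bar.
expanded = expand_predictions([0.1, 0.2], n_rows=5, n_steps=2)
print(expanded)
```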



# What has a buy indicator for tomorrow?

In [None]:
[predicted_scaled, predicted] = fetch_and_predict_short_term(model_15m, "BTC-USDT")
[predicted_scaled, predicted] 

                  time         open        Close         high          low  \
0  2023-01-03 10:15:00 16734.900000 16735.000000 16735.000000 16734.900000   
1  2023-01-03 10:00:00 16738.900000 16735.000000 16747.900000 16734.300000   
2  2023-01-03 09:45:00 16745.400000 16738.900000 16756.100000 16736.300000   
3  2023-01-03 09:30:00 16726.700000 16745.400000 16745.500000 16726.600000   
4  2023-01-03 09:15:00 16725.200000 16726.600000 16732.100000 16725.200000   
5  2023-01-03 09:00:00 16715.700000 16725.200000 16726.300000 16715.600000   
6  2023-01-03 08:45:00 16700.000000 16715.600000 16715.700000 16699.800000   
7  2023-01-03 08:30:00 16712.800000 16700.100000 16715.300000 16700.000000   
8  2023-01-03 08:15:00 16714.000000 16712.800000 16714.500000 16708.300000   
9  2023-01-03 08:00:00 16730.700000 16714.000000 16735.500000 16713.900000   
10 2023-01-03 07:45:00 16719.100000 16730.800000 16730.800000 16719.100000   
11 2023-01-03 07:30:00 16714.000000 16719.100000 16720.400000 16

[0.46584895, 16721.202957549693]

In [None]:
# Fetch the top 10 and see if they predict up
df_products = get_all_products()
df_products = df_products[df_products.id.str.endswith('USDT')]

if coin_base:
  df_products = df_products[df_products.trading_disabled == False]
  df_products = df_products[df_products.cancel_only == False]

df_trades = pd.DataFrame()
df_estc = pd.DataFrame(columns=["Product", "Est Close", "Est Close Raw"])  # expected short term closes
bars = 91
counter = 0
for index, row in df_products.iterrows():
  try:
    print("fetching: ", row.id)
    [df_full, df_features, npa_scaled, df_raw] = get_coin_data_frames(bars, row.id)
    
    df_trade = predict_trade(model_orig, row.id, bars, npa_scaled)
    df_trade_ohlc = predict_trade(model_ohlc, row.id, bars, npa_scaled)
    df_trade_att1 = predict_trade(model_att1, row.id, bars, npa_scaled)
    df_trade_att2 = predict_trade(model_att2, row.id, bars, npa_scaled)
    
    [predicted_scaled, predicted] = fetch_and_predict_short_term(model_15m, row.id)
    df2 = pd.DataFrame({'Product': [row.id], 'Est Close': [predicted], 'Est Close Raw': predicted_scaled})
    df_estc = pd.concat([df_estc, df2])

    # we need to unscale the predicted values so that we have an entry and exit point
    # entry should be roughly close and exit should be roughly predicted

    # Stick this on the end of the main dataframe
    df_trade["prd-ohlc"] = df_trade_ohlc["Predicted"]
    df_trade["pct-ohlc"] = df_trade_ohlc["Percent"]
    df_trade["prd-att1"] = df_trade_att1["Predicted"]
    df_trade["pct-att1"] = df_trade_att1["Percent"]
    df_trade["prd-att2"] = df_trade_att2["Predicted"]
    df_trade["pct-att2"] = df_trade_att2["Percent"]
    df_trades = pd.concat([df_trades, df_trade])
    
    #counter+=1
    #if counter > 5:
    #  break
  except Exception as inst:
    #raise inst
    print("Error: ", inst)
  time.sleep(1)
df_trades.reset_index()
df_buys = df_trades[df_trades['Move'] > 0] 
df_shorts = df_trades[df_trades['Move'] < 0] 




fetching:  NKN-USDT
fetching:  GEM-USDT
Error:  Requested:91 bars  but only got:85
fetching:  CUSD-USDT
Error:  Unexpected result of `predict_function` (Empty batch_outputs). Please use `Model.compile(..., run_eagerly=True)`, or `tf.config.run_functions_eagerly(True)` for more information of where went wrong, or file a issue/bug to `tf.keras`.
fetching:  LTC3L-USDT
Error:  429 Client Error: Too Many Requests for url: https://api.kucoin.com/api/v1/market/candles?startAt=1672726745&endAt=1672741145&type=15min&symbol=LTC3L-USDT
fetching:  OAS-USDT
Error:  Requested:91 bars  but only got:23
fetching:  KNC-USDT
fetching:  LYM-USDT
Error:  429 Client Error: Too Many Requests for url: https://api.kucoin.com/api/v1/market/candles?startAt=1672726750&endAt=1672741150&type=15min&symbol=LYM-USDT
fetching:  HAI-USDT
fetching:  MITX-USDT
fetching:  PDEX-USDT
fetching:  FLAME-USDT
fetching:  EPX-USDT
Error:  Unexpected result of `predict_function` (Empty batch_outputs). Please use `Model.compile(...,
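
Several of the failures in the log above are 429 rate-limit responses. One way to soften them is to retry with an increasing delay. This is a generic sketch: the helper `call_with_backoff` is hypothetical, and for brevity it retries on any exception, whereas the real loop would likely want to retry only on `HTTPError` with status 429 and not on the "not enough bars" error.

```python
import time

def call_with_backoff(fetch, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on an exception wait base_delay * 2**attempt, then try again."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)

# Toy usage: a fake fetch that fails twice before succeeding.
calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

delays = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky_fetch, sleep=delays.append)
print(result, delays)
```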

In [None]:
df_buys.to_csv(data_path+"/buys-01-03-2023.csv")
df_estc.to_csv(data_path+"/15m-pred-01-03-2023.csv")
df_shorts.to_csv(data_path+"/shorts-01-03-2023.csv")
df_buys

Unnamed: 0,Close,Volume,Predicted,Product,Move,Percent,RawPercent,250Fees,5kFees,10kFees,250Profit,5kProfit,10k0Profit,prd-ohlc,pct-ohlc,prd-att1,pct-att1,prd-att2,pct-att2
90,1.535200,7911.916400,1.539550,PDEX-USDT,0.004350,0.283345,0.002833,2.000000,40.000000,50.000000,-1.291637,-25.832742,-21.665484,1.284219,-16.348438,1.515035,-1.313495,1.238720,-19.312118
90,0.072099,607720.638800,0.072342,YLD-USDT,0.000243,0.337092,0.003371,2.000000,40.000000,50.000000,-1.157270,-23.145402,-16.290803,0.073151,1.458628,0.071936,-0.226221,0.073148,1.454786
90,4.470000,18759.028700,4.480994,UNIC-USDT,0.010994,0.245960,0.002460,2.000000,40.000000,50.000000,-1.385100,-27.702000,-25.403999,4.679166,4.679329,4.390054,-1.788496,4.687158,4.858125
90,0.003285,9278081.352000,0.003305,BULL-USDT,0.000020,0.609191,0.006092,2.000000,40.000000,50.000000,-0.477022,-9.540435,10.919130,0.003518,7.099806,0.003195,-2.748030,0.003522,7.215560
90,1.324000,59787.711400,1.325339,RUNE-USDT,0.001339,0.101109,0.001011,2.000000,40.000000,50.000000,-1.747227,-34.944541,-39.889083,1.131426,-14.544899,1.302050,-1.657843,1.095923,-17.226334
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
90,0.079300,338239.824200,0.079436,ORC-USDT,0.000136,0.171048,0.001710,2.000000,40.000000,50.000000,-1.572380,-31.447600,-32.895200,0.081353,2.589440,0.077962,-1.687891,0.081248,2.456926
90,0.018581,3041017.327400,0.018640,REAP-USDT,0.000059,0.317462,0.003175,2.000000,40.000000,50.000000,-1.206345,-24.126906,-18.253811,0.016906,-9.016640,0.018394,-1.005444,0.016566,-10.842671
90,0.018600,14872135.876100,0.020312,STORE-USDT,0.001712,9.203250,0.092033,2.000000,40.000000,50.000000,21.008125,420.162501,870.325002,0.020837,12.028778,0.020995,12.877332,0.020709,11.336752
90,0.003220,43790945.434600,0.003312,DMTR-USDT,0.000092,2.859032,0.028590,2.000000,40.000000,50.000000,5.147580,102.951600,235.903200,0.003461,7.499126,0.003164,-1.747185,0.003434,6.655136
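
The fee and profit columns in the table follow simple arithmetic: fees are charged on entry and exit, so a move only pays once it exceeds twice the fee rate. A small sketch using the fee rates from `predict_trade` (0.004 per side for the 250 and 5k sizes, 0.0025 for 10k) and the rounded `RawPercent` values from two rows above. The function names are illustrative.

```python
def round_trip_fees(size, fee_rate):
    """Fees for entering and exiting a position of `size` at `fee_rate` per side."""
    return size * fee_rate * 2

def net_profit(size, raw_percent, fee_rate):
    """Gross move on the position minus round-trip fees."""
    return size * raw_percent - round_trip_fees(size, fee_rate)

def break_even_move(fee_rate):
    """Smallest fractional move that covers fees on both sides."""
    return 2 * fee_rate

pdex = net_profit(250, 0.002833, 0.004)      # PDEX-USDT row, 250 size
store = net_profit(10000, 0.092033, 0.0025)  # STORE-USDT row, 10k size
print(pdex, store, break_even_move(0.004), break_even_move(0.0025))
```

At these rates a predicted move below 0.8% (or 0.5% at the 10k tier) loses money after fees, which is why most rows in `df_buys` show negative profit despite a positive `Move`.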


In [None]:
df_shorts

Unnamed: 0,Close,Volume,Predicted,Product,Move,Percent,RawPercent,250Fees,5kFees,10kFees,250Profit,5kProfit,10k0Profit,prd-ohlc,pct-ohlc,prd-att1,pct-att1,prd-att2,pct-att2
90,0.083131,11026.963200,0.082680,NKN-USDT,-0.000451,-0.542392,-0.005424,2.000000,40.000000,50.000000,-3.355979,-67.119575,-104.239150,0.080645,-2.990817,0.080473,-3.197756,0.079704,-4.122477
90,0.472000,1656.983700,0.470398,KNC-USDT,-0.001602,-0.339420,-0.003394,2.000000,40.000000,50.000000,-2.848550,-56.970996,-83.941991,0.507707,7.564961,0.445942,-5.520851,0.507300,7.478771
90,0.015120,1960665.267600,0.015076,HAI-USDT,-0.000044,-0.293442,-0.002934,2.000000,40.000000,50.000000,-2.733606,-54.672122,-79.344245,0.014470,-4.301214,0.014852,-1.773510,0.014311,-5.347611
90,0.002821,4838110.307300,0.002819,MITX-USDT,-0.000002,-0.078322,-0.000783,2.000000,40.000000,50.000000,-2.195806,-43.916122,-57.832243,0.003036,7.633789,0.002707,-4.037185,0.003043,7.866616
90,0.020890,3610779.402400,0.020867,FLAME-USDT,-0.000023,-0.112329,-0.001123,2.000000,40.000000,50.000000,-2.280823,-45.616452,-61.232904,0.022696,8.647499,0.019703,-5.683916,0.022683,8.581878
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
90,0.005810,662481.042200,0.005798,CERE-USDT,-0.000012,-0.207336,-0.002073,2.000000,40.000000,50.000000,-2.518341,-50.366822,-70.733643,0.004713,-18.884407,0.005736,-1.282040,0.004518,-22.240253
90,0.010970,27914250.047200,0.010345,UPO-USDT,-0.000625,-5.697518,-0.056975,2.000000,40.000000,50.000000,-16.243796,-324.875918,-619.751835,0.005976,-45.522876,0.010736,-2.133867,0.005068,-53.802631
90,0.001478,14915678.507300,0.001471,2CRZ-USDT,-0.000007,-0.446469,-0.004465,2.000000,40.000000,50.000000,-3.116171,-62.323428,-94.646856,0.001643,11.165352,0.001362,-7.817300,0.001642,11.107616
90,0.414500,176585.335200,0.413150,RNDR-USDT,-0.001350,-0.325715,-0.003257,2.000000,40.000000,50.000000,-2.814287,-56.285738,-82.571476,0.439127,5.941256,0.394879,-4.733753,0.438487,5.786987


In [None]:
df_estc

Unnamed: 0,Product,Est Close,Est Close Raw
0,NKN-USDT,0.082208,0.173335
0,KNC-USDT,0.472263,0.754244
0,HAI-USDT,0.015222,0.469092
0,MITX-USDT,0.002808,0.413632
0,PDEX-USDT,1.529278,0.405384
...,...,...,...
0,2CRZ-USDT,0.001479,0.472564
0,RNDR-USDT,0.415058,0.475127
0,DMTR-USDT,0.003226,0.391914
0,TRIBE-USDT,0.205871,0.401530


In [None]:
[df_full, df_features, npa_scaled, df_raw] = get_coin_data_frames(90, "AMPL-USDT")
df_trade = predict_trade(model_orig, "AMPL-USDT", 90, npa_scaled)
df_trade

In [None]:
[predicted_scaled, predicted] = fetch_and_predict_short_term(model_15m, "AMPL-USDT")
predicted