## Robust Trading W/ DTW, SVM, and KNN 
###### https://fenix.tecnico.ulisboa.pt/downloadFile/281870113705822/89549-tiago-barroso-resumo.pdf

### Goal of this system: 
- Differentiation of market types
- Price forecasting of a stock 
- Adaptation of the investing strategy according to each market and price forecast. 
- ** NOTE: Results from the linked study use data from a two-year range. (Test whether this is reliable over a longer time frame.)

### Classification of Market Condition
- SVM: This classification algorithm classifies the market as Bull, Bear, or Sideways


### Determining Which Sequences will produce certain market conditions: 
(Time series must be mutually exclusive)
- Best Bullish: Variation of at least 10% during a certain time frame, followed by an increase of 5% in the subsequent prediction period.
- Bullish: Variation of at least 5% during a certain time frame, followed by an increase of 3% in the subsequent prediction period.
- Sideways: Variation between -5% and 5% during a certain time frame, followed by a variation between -3% and 3% in the subsequent prediction period.
- Bearish: Variation of less than -5% during a certain time frame, followed by a decrease of any percentage in the subsequent prediction period.
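The four rules above can be sketched as a small labeling function. This is only an illustration: the name `label_sequence`, the string labels, the handling of boundary values, and the `"unlabeled"` fallback for sequences that match no rule are assumptions, not something taken from the study.

```python
def label_sequence(cur_variation: float, sub_variation: float) -> str:
    """Label one sequence from its own variation and the next period's variation.

    Both arguments are fractional returns (0.10 means +10%).
    Rules are checked from strictest to loosest so each sequence gets one label.
    """
    if cur_variation >= 0.10 and sub_variation >= 0.05:
        return "best_bullish"
    if cur_variation >= 0.05 and sub_variation >= 0.03:
        return "bullish"
    if -0.05 < cur_variation < 0.05 and -0.03 < sub_variation < 0.03:
        return "sideways"
    if cur_variation < -0.05 and sub_variation < 0:
        return "bearish"
    return "unlabeled"  # assumption: sequences matching no rule are left out


print(label_sequence(0.12, 0.06))    # best_bullish
print(label_sequence(0.06, 0.04))    # bullish
print(label_sequence(0.01, -0.01))   # sideways
print(label_sequence(-0.08, -0.02))  # bearish
```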

### Training and Testing Sequences: 
- The system goes through the training set and labels the sequences that meet the criteria above.
- Choose a classifier: KNN, SVM, ...
- The labeled training sequences are used to train the classifier, which then classifies the test set's time series.
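A minimal sketch of how the training set might be cut into sequences before labeling. It assumes non-overlapping windows of a fixed length and measures each variation from the first to the last price of a window; `make_windows` is a hypothetical helper, and the notebook's own `train` function indexes the subsequent period slightly differently.

```python
def make_windows(prices, seq_len):
    """Return (window, cur_variation, sub_variation) for non-overlapping windows.

    cur_variation is the return over the window itself; sub_variation is the
    return over the window that immediately follows it.
    """
    out = []
    # Stop early enough that a full "subsequent" window always exists.
    for start in range(0, len(prices) - 2 * seq_len + 1, seq_len):
        window = prices[start:start + seq_len]
        following = prices[start + seq_len:start + 2 * seq_len]
        cur_variation = window[-1] / window[0] - 1
        sub_variation = following[-1] / following[0] - 1
        out.append((window, cur_variation, sub_variation))
    return out


prices = [100, 110, 120, 130, 140, 150]  # toy data, not real quotes
for window, cur, sub in make_windows(prices, seq_len=2):
    print(window, round(cur, 4), round(sub, 4))
```

Each `(cur_variation, sub_variation)` pair can then be passed through the labeling criteria to produce the class labels the classifier is trained on.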

### Use of KNN w/ DTW
- Essentially uses DTW as the distance measure to find the K nearest neighbors
- ** Use K as a dynamic parameter for this method/strategy
- Weight of neighbors: uniform weight or distance-based weight (weight = inverse distance) ** could be a parameter
- The study this is based on shows that inverse-distance weights don't add value
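To make the voting step concrete, here is a small pure-Python sketch of KNN with DTW as the distance. The names `dtw_distance` and `knn_dtw_predict`, the `inverse_distance` flag, and the toy sequences are assumptions for illustration; the notebook's own implementation uses a numba-compiled `DTW` and a different neighbor search.

```python
from collections import defaultdict


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute difference as point cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def knn_dtw_predict(train_seqs, train_labels, query, k=3, inverse_distance=False):
    """Vote among the k training sequences closest to `query` under DTW."""
    nearest = sorted(
        (dtw_distance(seq, query), label)
        for seq, label in zip(train_seqs, train_labels)
    )[:k]
    votes = defaultdict(float)
    for dist, label in nearest:
        # Uniform weighting gives each neighbor one vote; inverse-distance
        # weighting gives closer neighbors more say (epsilon avoids 1/0).
        votes[label] += 1.0 / (dist + 1e-9) if inverse_distance else 1.0
    return max(votes, key=votes.get)


# Toy data: 2 = bull, 1 = sideways, 0 = bear
train_seqs = [[1, 2, 3, 4], [1, 2, 3, 5], [4, 3, 2, 1], [5, 3, 2, 1], [2, 2, 2, 2]]
train_labels = [2, 2, 0, 0, 1]
print(knn_dtw_predict(train_seqs, train_labels, [1, 2, 4, 5], k=3))  # 2
```

Switching `inverse_distance` on and off for the same `k` is one way to check on your own data whether the weighting changes any predictions.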

### Investment and Evaluation: 
2 Different Methods: 
1. Invest with long positions only in bull markets; hold cash in sideways and bear markets.
2. Invest with puts in time series identified as bear sequences, and ignore sideways and bull sequences.
- We don't ever invest in sideways markets, they move horizontally by definition. 
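The two methods can be sketched as a mapping from the predicted class to a position. The class encoding (2 = bull, 1 = sideways, 0 = bear) follows the notebook's `train` function; the function name `position_for` and the choice to treat "ignore" as holding cash are assumptions.

```python
BULL, SIDEWAYS, BEAR = 2, 1, 0  # same encoding as the notebook's train()


def position_for(predicted_class: int, strategy: int) -> str:
    """Return the position to hold for a predicted market class."""
    if strategy == 1:
        # Method 1: long in bull sequences, cash otherwise.
        return "long" if predicted_class == BULL else "cash"
    if strategy == 2:
        # Method 2: puts in bear sequences; everything else is ignored
        # (modeled here as cash, which is an assumption).
        return "put" if predicted_class == BEAR else "cash"
    raise ValueError(f"unknown strategy: {strategy}")


for cls in (BULL, SIDEWAYS, BEAR):
    print(cls, position_for(cls, 1), position_for(cls, 2))
```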
#### Size of Sequences: 
- Range from 10 to 150. Longer sequences seem to produce better returns according to the study.

___

### My System Design: 
- Data: From 2000 to 2023
- Train/Test splits: Train = 2000 - 2016, Test = 2017 - 2023 (6 year testing period)
- Parameters: Sequence Range, Inverse Distance (T or F), K neighbors (int)
- Index Funds: SPY and QQQ
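One way to write this design down is as a small configuration plus a parameter grid. The specific sequence lengths and K values below are hypothetical placeholders within the 10 to 150 range from the notes, and the names `Params`, `is_train_year`, and `grid` are assumptions.

```python
from dataclasses import dataclass
from itertools import product

TICKERS = ("SPY", "QQQ")
TRAIN_YEARS = (2000, 2016)  # inclusive
TEST_YEARS = (2017, 2023)   # inclusive


@dataclass(frozen=True)
class Params:
    seq_len: int            # length of each sequence
    inverse_distance: bool  # True = inverse-distance weights, False = uniform
    k_neighbors: int        # K for KNN


def is_train_year(year: int) -> bool:
    """True when a row from this calendar year belongs to the training set."""
    return TRAIN_YEARS[0] <= year <= TRAIN_YEARS[1]


SEQ_LENS = (10, 50, 100, 150)  # hypothetical values to sweep
K_VALUES = (1, 3, 5)           # hypothetical values to sweep

grid = [
    Params(seq_len, inv, k)
    for seq_len, inv, k in product(SEQ_LENS, (False, True), K_VALUES)
]
print(len(grid))  # 24 combinations per ticker
```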


In [120]:
import pandas as pd 
import yfinance as yf 
import numpy as np
import numba
import numpy.typing as npt
from scipy import stats

In [2]:
@numba.njit #wrap in just in time compiler
def DTW(
        ts_a: npt.NDArray[np.float64], #time series A (known result)
        ts_b: npt.NDArray[np.float64], #time series B(what we are trying to learn)
) -> float:
    a_len = len(ts_a)
    b_len = len(ts_b)

    #Init pointwise distance matrix using np arrays
    pointwise_distance = np.zeros((a_len, b_len), dtype = np.float64)
    for x in range(0, a_len):
        for y in range(0, b_len):
            pointwise_distance[x, y] = abs(ts_a[x] - ts_b[y])

    cumu_dist = np.full((a_len + 1, b_len + 1), np.inf)
    cumu_dist[0, 0] = 0 

    for ax in range(1, a_len + 1):
        for by in range(1, b_len + 1):
            cost = pointwise_distance[ax - 1, by - 1]
            cumu_dist[ax, by] = cost + min(
                cumu_dist[ax - 1, by],     # Insertion
                cumu_dist[ax, by - 1],     # Deletion
                cumu_dist[ax - 1, by - 1]  # Match
            )
    dtw_distance = cumu_dist[a_len, b_len]
    return dtw_distance

Pre-Training Method

In [298]:
def train(data, train_split, seq_amount): #bull represented by 2, sideways is 1, bear is 0
    n_rows = data.shape[0]
    idx_price = 0
    idx_long = -2
    train_rows = int(n_rows * train_split)
    train_rows_remainder = train_rows % seq_amount
    class_rors = np.full(train_rows, 0.00, np.float64) #storing the RORs for comparison
    cur_variation = 0 
    counter = 0
    class_array = []
    for i_day in range(1, train_rows): 
        if cur_variation == 0: 
            cur_variation = data[i_day - 1, idx_price]
            counter += 1
        elif counter == seq_amount - 1:
            cur_variation = data[i_day - 1, idx_price] / cur_variation - 1 #total movement during the period (end price / start price - 1)
            sub_variation = data[i_day + seq_amount, idx_price] / data[i_day, idx_price] - 1 #total movement in the next period
            if cur_variation >= .1 and sub_variation >= .05: #Bullish Classification
                class_rors[i_day - counter : i_day] = data[i_day - counter: i_day, idx_long]
                class_array.append(2)
            elif (cur_variation < .05 and cur_variation > -.05) and (sub_variation < 0.03 and sub_variation > -0.03): #Sideways Classification (+/-5% followed by +/-3%)
                class_rors[i_day - counter : i_day] = data[i_day - counter: i_day, idx_long]
                class_array.append(1)
            else:
                class_rors[i_day - counter : i_day] = data[i_day - counter: i_day, idx_long]
                class_array.append(0)
            counter = 0 
            cur_variation = 0 #reset so the next sequence records its own starting price
        else: 
            counter += 1
    #slice by length rather than [:-remainder], which would return an empty array when the remainder is 0
    class_rors = class_rors[:train_rows - train_rows_remainder].reshape(-1, seq_amount)
    class_array = np.array(class_array)
    return (class_rors, class_array)
        

In [68]:
data = pd.read_excel('/root/trading/data/data_spy.xlsx', sheet_name = 'DATA')
data_np = data.iloc[1:, 1:].to_numpy()

In [295]:
#Pre-classify data in excel
def engine(data: npt.NDArray[np.float64],
            seq_amount: int,
            neighbors: int)-> npt.NDArray[np.int8]:
    """Strategy 1: Invest only with longs in time series of stocks
        identified as bull sequences and ignore sideways and bear
        sequences;

        Conditions for training: 
        (1) Best Bullish: Any stock series which has a variation of at
            least 10% during a certain time frame, followed by an increase
            of 5% in a subsequent prediction time period
        (3) Sideways: Any stock series which has a variation of between
            -5% and +5% during a certain time frame, followed by a
            variation between -3% and +3% in a subsequent prediction
            time period.
        (4) Bearish: Any stock series which has a variation inferior to
            -5% during a certain time frame, followed by a decrease of
            any percentage in a subsequent prediction time period.

    Args:
        data (npt.NDArray[np.float64]): data excluding the dates (type float64)
        seq_amount (int): Length (in rows) of each time series sequence taken from the data
        neighbors (int): Number of neighbors for KNN 

    Returns:
        npt.NDArray[np.int8]: Returns an array of weights. 1 representing long days, 0 representing cash days. 
    """
    idx_price = 0
    n_rows = data.shape[0]
    weight = np.full(n_rows, -100, dtype = np.int8)
    training = train(data, seq_amount= seq_amount, train_split=0.4) #train_split is the fraction of rows used for training (0.4 here, not an 80:20 split)
    testing_rows = training[0].shape[0] * training[0].shape[1]
    min_value_array = np.full((neighbors,2), np.inf, dtype= np.float64)
    counter_ror = 0
    classification = 0
    for i_day in range(testing_rows, n_rows):
        if counter_ror >= seq_amount:
            for x in range(len(training[0])):
                dtw_distance = DTW(training[0][x], 
                                   data[i_day - seq_amount: i_day, idx_price]) #compute the distances between training data and test set
                if dtw_distance < np.max(min_value_array[:,0]):
                    max_index = np.argmax(min_value_array[:,0])
                    min_value_array[max_index,0] = dtw_distance
                    min_value_array[max_index, 1] = training[1][x] #classification
            classification = int(stats.mode(min_value_array[:,1])[0])
            counter_ror = 0 #reset counter
            min_value_array = np.full((neighbors,2), np.inf, dtype= np.float64)
        if classification == 2:
            weight[i_day] = 1
        else:
            weight[i_day] = 0
        counter_ror += 1
    
    return weight[testing_rows:]





    

In [296]:
weight = engine(data_np, 150, 3)


In [297]:
weight_df = pd.DataFrame(weight)
weight_df.to_clipboard()