#   Q18 Machine Learning Rolling Basis

In this example we predict whether the price will rise or fall by using supervised learning (Bayesian Ridge Regression). This template is a starting point for developing a system which can take part in the **Q18 NASDAQ-100 Stock Long-Short contest**.

It consists of two parts.

* In the **first part** we perform a global training using all the time series data at once. We disregard the sequential aspect of the data, so data from the future is also used to fit the model that predicts the past.

* In the **second part** we use the built-in backtester and perform training and prediction on a rolling basis in order to avoid forward looking. Please note that we are using a **specialized** version of the Quantiacs backtester which dramatically speeds up the backtesting process by retraining your model on a regular basis.
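
The difference between the two parts can be made concrete with a small sketch. The helper below is hypothetical (it is not part of the Quantiacs library) and the window sizes are illustrative assumptions. It only shows that, on a rolling basis, every training window ends before the prediction window starts.

```python
def rolling_splits(n_days, train_len, retrain_every):
    """Yield (train_indices, predict_indices) pairs for walk-forward evaluation.

    Hypothetical helper for illustration only; not part of the Quantiacs API.
    """
    start = train_len
    while start < n_days:
        stop = min(start + retrain_every, n_days)
        train_idx = list(range(start - train_len, start))  # strictly past data
        predict_idx = list(range(start, stop))             # the next slice
        yield train_idx, predict_idx
        start = stop


# Global training (part one): every day is used for both fitting and predicting.
n_days = 10
global_train = list(range(n_days))
global_predict = list(range(n_days))

# Rolling training (part two): fit on the past, predict the following slice.
splits = list(rolling_splits(n_days, train_len=4, retrain_every=3))
for train_idx, predict_idx in splits:
    print("train:", train_idx, "-> predict:", predict_idx)
```

In the global case the training and prediction indices coincide, which is exactly what makes the first part forward looking.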

**Features for learning**: we will use several technical indicators trying to capture different features. You can have a look at [**Technical Indicators**](https://quantiacs.com/documentation/en/user_guide/technical_indicators.html).

Please note that:

* Your trading algorithm can open short and long positions.

* At each point in time your algorithm can trade all or a subset of the stocks which at that point of time are or were part of the NASDAQ-100 stock index. Note that the composition of this set changes in time, and Quantiacs provides you with an appropriate filter function for selecting them.

* The Sharpe ratio of your system since January 1st, 2006, has to be larger than 1.

* Your system cannot be a copy of the current examples. We run a correlation filter on the submissions and detect duplicates.

* For simplicity we will use a single asset. It pays off to use more assets, ideally uncorrelated, and diversify your positions for a more solid Sharpe ratio.
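
For orientation on the Sharpe ratio requirement in the list above, here is a minimal sketch of an annualized Sharpe ratio computed from daily returns. It assumes 252 trading days per year and a zero risk-free rate; the function name is hypothetical, and the value reported by the Quantiacs statistics functions may be computed with a different convention.

```python
import math

TRADING_DAYS_PER_YEAR = 252  # assumption: common convention for daily equity data


def annualized_sharpe(daily_returns):
    """Annualized Sharpe ratio of a list of daily returns (risk-free rate assumed 0)."""
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    variance = sum((r - mean) ** 2 for r in daily_returns) / n  # population variance
    std = math.sqrt(variance)
    if std == 0:
        return float("nan")  # undefined for a constant return series
    return mean / std * math.sqrt(TRADING_DAYS_PER_YEAR)


# Toy example with made-up returns (not real market data).
example_returns = [0.01, -0.005, 0.007, 0.002, -0.003, 0.004]
sharpe_example = annualized_sharpe(example_returns)
print(round(sharpe_example, 2))
```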

More details on the rules can be found [here](https://quantiacs.com/contest).

**Need help?** Check the [**Documentation**](https://quantiacs.com/documentation/en/) and find solutions/report problems in the [**Forum**](https://quantiacs.com/community/categories) section.

**More help with Jupyter?** Check the official [**Jupyter**](https://jupyter.org/) page.

Once you are done, click on **Submit to the contest** and take part in our competitions.

API reference:

* **data**: check how to work with [data](https://quantiacs.com/documentation/en/reference/data_load_functions.html);

* **backtesting**: read how to run the [simulation](https://quantiacs.com/documentation/en/reference/evaluation.html) and check the results.

Need to use the optimizer function to automate tedious tasks?

* **optimization**: read more on our [article](https://quantiacs.com/community/topic/29/optimizing-and-monitoring-a-trading-system-with-quantiacs).

#   Q18 Machine Learning Rolling Basis MOD

## Author: Manuel Quintana

This is a modification of the **s** 

In [1]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) { return false; }
// disable widget scrolling


In [8]:
import logging

import xarray as xr  # xarray for data manipulation

import qnt.data as qndata     # functions for loading data
import qnt.backtester as qnbt # built-in backtester
import qnt.ta as qnta         # technical analysis library
import qnt.stats as qnstats   # statistical functions

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

np.seterr(divide = "ignore")

from qnt.ta.macd import macd
from qnt.ta.rsi  import rsi
from qnt.ta.stochastic import stochastic_k, stochastic, slow_stochastic

from sklearn import linear_model
from sklearn.metrics import r2_score
from sklearn.metrics import explained_variance_score
from sklearn.metrics import mean_absolute_error
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingRegressor

In [3]:
# loading S&P 500 stock data

# stock_data = qndata.stocks.load_ndx_data(tail = 365 * 5), assets = ["NAS:AAPL", "NAS:AMZN"]
# stock_data = qndata.stocks.load_spx_data(tail = 365 * 5)
# assets=[
#         "SPX:AAPL", "SPX:MSFT", "SPX:AMZN", "SPX:GOOGL", "SPX:GOOG",
#         "SPX:JNJ", "SPX:V", "SPX:PG", "SPX:UNH", "SPX:XOM",
#         "SPX:BAC", "SPX:DIS", "SPX:MA", "SPX:HD",  "SPX:CVX",
#         "SPX:KO",  "SPX:PEP", "SPX:MRK", "SPX:TSLA", "SPX:WMT"
#     ]

stock_data = qndata.stocks.load_spx_data(min_date="2005-06-01")
assets = ["SPY:AAPL", "SPY:MSFT", "SPY:GOOGL", "SPY:AMZN", "SPY:FB", "SPY:BRK.B", "SPY:JNJ", "SPY:V", "SPY:PG", "SPY:JPM", "SPY:UNH", "SPY:HD", "SPY:MA", "SPY:PFE", "SPY:ABBV", "SPY:MRK", "SPY:PEP", "SPY:KO", "SPY:DIS", "SPY:XOM"]

ma_50 = stock_data.sel(field="close").rolling(time=50).mean().expand_dims('field').assign_coords(field=['ma_50'])
ma_200 = stock_data.sel(field="close").rolling(time=200).mean().expand_dims('field').assign_coords(field=['ma_200'])
volatility = stock_data.sel(field="close").rolling(time=50).std().expand_dims('field').assign_coords(field=['volatility'])

stock_data = xr.concat([stock_data, ma_50, ma_200, volatility], dim='field')

Data loaded 41s


In [4]:
def get_features(data):
    """Builds the features used for learning:
       * a trend indicator;
       * the moving average convergence divergence;
       * a volatility measure;
       * the stochastic oscillator;
       * the relative strength index;
       * the logarithm of the closing price.
       These features can be modified and new ones can be added easily.
    """

    # trend:
    trend = qnta.roc(qnta.lwma(data.sel(field="close"), 60), 1)

    # moving average convergence divergence (MACD); only the signal line is used below:
    macd2_line, macd2_signal, macd2_hist = qnta.macd(data, 12, 26, 9)

    # volatility:
    volatility = qnta.tr(data.sel(field="high"), data.sel(field="low"), data.sel(field="close"))
    volatility = volatility / data.sel(field="close")
    volatility = qnta.lwma(volatility, 14)

    # the stochastic oscillator:
    k, d = qnta.stochastic(data.sel(field="high"), data.sel(field="low"), data.sel(field="close"), 14)

    # the relative strength index:
    rsi = qnta.rsi(data.sel(field="close"))

    # the logarithm of the closing price:
    price = data.sel(field="close").ffill("time").bfill("time").fillna(0) # fill NaN
    price = np.log(price)

    # combine the six features:
    result = xr.concat(
        [trend, macd2_signal.sel(field="close"), volatility,  d, rsi, price],
        pd.Index(
            ["trend",  "macd", "volatility", "stochastic_d", "rsi", "price"],
            name = "field"
        )
    )

    return result.transpose("time", "field", "asset")

In [5]:
# displaying the features:
my_features = get_features(stock_data)
display(my_features.sel(field="trend").to_pandas())

time,NAS:AAL,NAS:AAPL,NAS:ABNB,NAS:ACGL,NAS:ADBE,NAS:ADI,NAS:ADP,NAS:ADSK,NAS:AEP,NAS:AKAM,...,NYS:WMB,NYS:WMT,NYS:WRB,NYS:WST,NYS:WY,NYS:XOM,NYS:XYL,NYS:YUM,NYS:ZBH,NYS:ZTS
2005-06-01,,,,,,,,,,,...,,,,,,,,,,
2005-06-02,,,,,,,,,,,...,,,,,,,,,,
2005-06-03,,,,,,,,,,,...,,,,,,,,,,
2005-06-06,,,,,,,,,,,...,,,,,,,,,,
2005-06-07,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2025-01-02,0.504040,0.102707,-0.091084,-0.295392,-0.357759,-0.140979,-0.073524,-0.041844,-0.155881,-0.084760,...,0.094916,0.102841,-0.110515,0.127937,-0.332577,-0.268138,-0.265831,-0.036662,-0.089711,-0.289556
2025-01-03,0.476186,0.090928,0.011079,-0.286286,-0.423893,-0.076676,-0.052508,-0.038536,-0.148813,-0.144848,...,0.131496,0.124510,-0.116372,0.194103,-0.284764,-0.244298,-0.225658,-0.038645,-0.090591,-0.269385
2025-01-06,0.570837,0.109125,-0.002609,-0.299095,-0.414204,-0.033469,-0.103898,-0.058897,-0.205249,-0.146438,...,0.076827,0.141274,-0.155560,0.156683,-0.261736,-0.241530,-0.227675,-0.117370,-0.119692,-0.213400
2025-01-07,0.583490,0.067688,-0.097462,-0.235601,-0.466720,-0.061832,-0.089889,-0.080209,-0.196211,-0.170126,...,0.055177,0.111899,-0.139843,0.146731,-0.307577,-0.205887,-0.247929,-0.154948,-0.132968,-0.251964
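
To make the `trend` column above less opaque, the sketch below recomputes the idea behind it in plain Python: a linearly weighted moving average followed by a one-day rate of change in percent. The helper names are hypothetical, and both the weighting convention (the newest price gets the largest weight) and the percent scaling are assumptions; the actual `qnta.lwma` and `qnta.roc` implementations may differ in details such as NaN handling.

```python
def lwma(prices, window):
    """Linearly weighted moving average; None until `window` prices are available."""
    weights = list(range(1, window + 1))  # oldest -> 1, newest -> window
    total = sum(weights)
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)  # warm-up period, like the NaN rows above
            continue
        chunk = prices[i + 1 - window : i + 1]
        out.append(sum(w * p for w, p in zip(weights, chunk)) / total)
    return out


def roc(series, periods):
    """Rate of change in percent over `periods` steps; None where undefined."""
    out = []
    for i, value in enumerate(series):
        if i < periods or value is None or series[i - periods] is None:
            out.append(None)
        else:
            prev = series[i - periods]
            out.append((value / prev - 1.0) * 100.0)
    return out


prices = [10.0, 11.0, 12.0, 13.0, 14.0]  # made-up closing prices
smoothed = lwma(prices, window=3)
trend = roc(smoothed, periods=1)
print(smoothed)
print(trend)
```

The leading `None` entries play the same role as the empty rows at the start of the table: the indicator needs a full window of history before it produces a value.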


In [6]:
def get_target_classes(data):
    """ Target classes for predicting if price goes up or down."""

    price_current = data.sel(field="close")
    price_future  = qnta.shift(price_current, -1)

    class_positive = 1 # price goes up
    class_negative = 0 # price goes down

    target_price_up = xr.where(price_future > price_current, class_positive, class_negative)

    return target_price_up

In [7]:
# displaying the target classes:
my_targetclass = get_target_classes(stock_data)
display(my_targetclass.to_pandas())

time,NAS:AAL,NAS:AAPL,NAS:ABNB,NAS:ACGL,NAS:ADBE,NAS:ADI,NAS:ADP,NAS:ADSK,NAS:AEP,NAS:AKAM,...,NYS:WMB,NYS:WMT,NYS:WRB,NYS:WST,NYS:WY,NYS:XOM,NYS:XYL,NYS:YUM,NYS:ZBH,NYS:ZTS
2005-06-01,0,0,0,0,0,1,1,0,0,0,...,0,1,0,0,0,1,0,1,1,0
2005-06-02,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,1,0
2005-06-03,0,0,0,0,0,0,1,1,1,1,...,1,1,1,1,1,1,0,0,0,0
2005-06-06,0,0,0,0,0,0,0,1,0,1,...,0,1,1,1,1,0,0,1,1,0
2005-06-07,0,1,0,1,0,1,0,0,1,0,...,0,0,0,1,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2025-01-02,0,0,1,1,0,1,1,1,1,0,...,1,1,0,1,1,1,1,0,1,1
2025-01-03,1,1,0,0,1,1,0,0,0,0,...,0,1,0,0,1,0,0,0,0,1
2025-01-06,1,0,0,1,0,0,1,0,1,0,...,0,0,1,0,0,1,0,0,0,0
2025-01-07,0,1,0,1,0,1,1,1,1,1,...,1,1,1,1,0,0,1,0,0,1
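
The labelling rule can be reproduced on a toy series. This is a plain-Python sketch with made-up prices, not the `qnt` implementation. It also makes one detail visible: the last day has no next-day price, and a comparison against a missing value is false, so that day ends up in class 0.

```python
def target_classes(close):
    """Label each day 1 if the next close is higher, else 0."""
    nan = float("nan")
    future = close[1:] + [nan]  # next-day close; the last day has none
    # Any comparison with NaN is False, so the last day is labelled 0.
    return [1 if f > c else 0 for c, f in zip(close, future)]


close = [100.0, 101.5, 101.0, 101.0, 103.0]  # made-up closing prices
labels = target_classes(close)
print(labels)  # [1, 0, 0, 1, 0]
```

Because that final 0 is not a real observation, one option is to drop the last row before fitting; whether this matters in practice depends on how much data is available.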


In [9]:
# def get_model():
#     """This is a constructor for the ML model (Bayesian Ridge) which can be easily
#        modified for using different models.
#     """

#     model = linear_model.BayesianRidge()
#     return model

# def get_model():
#     """
#     Constructor for the ML model:
#     Changed from BayesianRidge to RandomForestClassifier to capture
#     non-linear relationships and interactions among features.
#     """
#     model = RandomForestClassifier(
#         n_estimators=100, 
#         max_depth=5,
#         random_state=42
#     )
#     return model

def get_model():
    """This is a constructor for the ML model (Gradient Boosting Regressor) which can be easily
       modified for using different models.
    """

    # Gradient Boosting Regressor with tuned hyperparameters
    model = GradientBoostingRegressor(
        n_estimators=200,  # Number of boosting stages to be run
        learning_rate=0.05,  # Step size shrinkage
        max_depth=5,  # Maximum depth of the individual regression estimators
        min_samples_split=10,  # Minimum samples required to split an internal node
        min_samples_leaf=4,  # Minimum samples required to be at a leaf node
        random_state=42  # Ensures reproducibility
    )
    return model


In [10]:
# Create and train the models working on an asset-by-asset basis.

asset_name_all = stock_data.coords["asset"].values

models = dict()

for asset_name in asset_name_all:

    # drop missing values:
    target_cur   = my_targetclass.sel(asset=asset_name).dropna("time", how="any")
    features_cur = my_features.sel(asset=asset_name).dropna("time", how="any")

    # align features and targets on their common dates:
    target_for_learn_df, feature_for_learn_df = xr.align(target_cur, features_cur, join="inner")

    if len(feature_for_learn_df.time) < 10:
        # not enough points for training
        continue

    model = get_model()

    try:
        model.fit(feature_for_learn_df.values, target_for_learn_df.values)
        models[asset_name] = model

    except Exception:
        logging.exception("model training failed")

print(models)



{'NAS:AAL': GradientBoostingRegressor(learning_rate=0.05, max_depth=5, min_samples_leaf=4,
                          min_samples_split=10, n_estimators=200,
                          random_state=42), 'NAS:AAPL': GradientBoostingRegressor(learning_rate=0.05, max_depth=5, min_samples_leaf=4,
                          min_samples_split=10, n_estimators=200,
                          random_state=42), 'NAS:ABNB': GradientBoostingRegressor(learning_rate=0.05, max_depth=5, min_samples_leaf=4,
                          min_samples_split=10, n_estimators=200,
                          random_state=42), 'NAS:ACGL': GradientBoostingRegressor(learning_rate=0.05, max_depth=5, min_samples_leaf=4,
                          min_samples_split=10, n_estimators=200,
                          random_state=42), 'NAS:ADBE': GradientBoostingRegressor(learning_rate=0.05, max_depth=5, min_samples_leaf=4,
                          min_samples_split=10, n_estimators=200,
                          random_state=

In [20]:
# Showing which features are more important in predicting.
# GradientBoostingRegressor has no `coef_` attribute (that belongs to linear
# models such as BayesianRidge); tree ensembles expose `feature_importances_`.

importance = models["NAS:AAPL"].feature_importances_
feature_names = my_features.coords["field"].values

for name, score in zip(feature_names, importance):
    print('Feature: %s, Score: %.5f' % (name, score))

plt.bar(feature_names, importance)
plt.show()

In [12]:
# Performs prediction and generates output weights:

asset_name_all = stock_data.coords["asset"].values
weights = xr.zeros_like(stock_data.sel(field="close"))

for asset_name in asset_name_all:
    if asset_name in models:
        model = models[asset_name]
        features_cur = my_features.sel(asset=asset_name).dropna("time", how="any")
        if len(features_cur.time) < 1:
            continue
        try:
            weights.loc[dict(asset=asset_name, time=features_cur.time.values)] = model.predict(features_cur.values)
        except Exception:
            # KeyboardInterrupt is not an Exception subclass, so it still propagates
            logging.exception("model prediction failed")

print(weights)


<xarray.DataArray 'stocks_s&p500' (time: 4935, asset: 516)> Size: 20MB
array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.56675113, 0.43866667, 0.19022555, ..., 0.500212  , 0.35741943,
        0.4166433 ],
       [0.20283849, 0.57740881, 0.32514656, ..., 0.50779734, 0.4613716 ,
        0.59984767],
       [0.3947806 , 0.43497398, 0.29644342, ..., 0.38037666, 0.4638706 ,
        0.40114506]])
Coordinates:
  * time     (time) datetime64[ns] 39kB 2005-06-01 2005-06-02 ... 2025-01-08
    field    <U5 20B 'close'
  * asset    (asset) <U9 19kB 'NAS:AAL' 'NAS:AAPL' ... 'NYS:ZBH' 'NYS:ZTS'
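
The printed weights are regression outputs fitted to 0/1 labels, so they are mostly positive and the resulting portfolio is essentially long-only. Since the contest allows short positions, one possible post-processing step is sketched below. The helper name, the 0.5 threshold and the toy predictions are assumptions made for illustration; this is not what the notebook or the Quantiacs library does.

```python
def to_long_short(predictions, threshold=0.5):
    """Center predictions on `threshold` and scale so absolute weights sum to 1."""
    centered = {asset: p - threshold for asset, p in predictions.items()}
    gross = sum(abs(w) for w in centered.values())
    if gross == 0:
        return {asset: 0.0 for asset in centered}  # no signal, stay flat
    return {asset: w / gross for asset, w in centered.items()}


# Made-up predictions for three placeholder assets on a single day.
day_predictions = {"A": 0.75, "B": 0.5, "C": 0.25}
day_weights = to_long_short(day_predictions)
print(day_weights)  # {'A': 0.5, 'B': 0.0, 'C': -0.5}
```

Predictions above the threshold become long positions, predictions below it become short positions, and the total exposure stays at 1.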




In [13]:
def get_sharpe(stock_data, weights):
    """Calculates the Sharpe ratio"""
    rr = qnstats.calc_relative_return(stock_data, weights)
    sharpe = qnstats.calc_sharpe_ratio_annualized(rr).values[-1]
    return sharpe

sharpe = get_sharpe(stock_data, weights)
sharpe

2.0984750534283703

The Sharpe ratio obtained with the method above is inflated by **forward looking**: predictions for, say, 2017 already know about the relation between features and targets in 2020. Let us visualize the results:

In [14]:
import qnt.graph as qngraph

statistics = qnstats.calc_stat(stock_data, weights)

display(statistics.to_pandas().tail())

performance = statistics.to_pandas()["equity"]
qngraph.make_plot_filled(performance.index, performance, name="PnL (Equity)", type="log")

display(statistics[-1:].sel(field = ["sharpe_ratio"]).transpose().to_pandas())

# check for correlations with existing strategies:
qnstats.print_correlation(weights,stock_data)



time,equity,relative_return,volatility,underwater,max_drawdown,sharpe_ratio,mean_return,bias,instruments,avg_turnover,avg_holding_time
2025-01-02,1113.03832,-0.001282,0.205876,-0.033817,-0.349578,2.094339,0.431175,1.0,516.0,0.193339,10.171819
2025-01-03,1124.940065,0.010693,0.205866,-0.023486,-0.349578,2.097719,0.431849,0.999915,516.0,0.193355,10.171152
2025-01-06,1127.031315,0.001859,0.205845,-0.02167,-0.349578,2.098085,0.43188,0.999913,516.0,0.193369,10.170055
2025-01-07,1126.316654,-0.000634,0.205825,-0.022291,-0.349578,2.09756,0.43173,0.999894,516.0,0.193402,10.168772
2025-01-08,1130.168691,0.00342,0.205804,-0.018947,-0.349578,2.098475,0.431875,1.0,516.0,0.193421,10.165363


field,2025-01-08
sharpe_ratio,2.098475





Ok. This strategy does not correlate with other strategies.
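
The statistics above are in-sample: the models are evaluated on the same data they were fitted on. Why such a score says little can be shown with a deliberately extreme toy case. The sketch below uses a 1-nearest-neighbour "model" on purely random data (all names and data are made up for illustration): it is perfect on the data it was fitted on and close to a coin flip on unseen data.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def make_data(n):
    """Random features and random up/down labels: there is nothing to learn."""
    return [(random.random(), random.randint(0, 1)) for _ in range(n)]


def predict_1nn(train, x):
    """Return the label of the training point whose feature is closest to x."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def accuracy(train, test):
    """Fraction of test points whose label is predicted correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)


train = make_data(300)
test = make_data(300)

in_sample = accuracy(train, train)     # evaluated on data seen during fitting
out_of_sample = accuracy(train, test)  # evaluated on unseen data
print(in_sample, out_of_sample)
```

A flexible model such as a boosted tree ensemble sits somewhere between these extremes, which is why the in-sample numbers should not be taken at face value.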


In [19]:
"""R2 (coefficient of determination) regression score function."""
r2_score(my_targetclass, weights, multioutput="variance_weighted")

-0.14548074679908732

In [22]:
"""The explained variance score explains the dispersion of errors of a given dataset"""
explained_variance_score(my_targetclass, weights, multioutput="uniform_average")


0.2800009332044715

In [16]:
"""Mean absolute error regression loss: the average absolute difference between targets and predictions."""
mean_absolute_error(my_targetclass, weights)

0.38574215811348783
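
The three scores measure different things, which is why R2 can be negative while the explained variance is positive: R2 penalizes a constant offset between predictions and targets, whereas the explained variance ignores it. The following minimal sketch re-implements the single-output formulas in plain Python on made-up numbers. The helper names and toy values are illustrative and not part of this notebook; the scikit-learn functions used above additionally handle multiple outputs through `multioutput`.

```python
from statistics import fmean

def r2(y_true, y_pred):
    """1 - SS_res / SS_tot: penalizes any error, including a constant offset."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def explained_variance(y_true, y_pred):
    """1 - Var(error) / Var(y_true): a constant offset of the error is ignored."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_err = fmean(errors)
    mean_true = fmean(y_true)
    var_err = fmean((e - mean_err) ** 2 for e in errors)
    var_true = fmean((t - mean_true) ** 2 for t in y_true)
    return 1.0 - var_err / var_true

def mean_abs_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))

# toy targets and predictions that are shifted by a constant bias of 1.5
y_true = [1.0, -1.0, 1.0, -1.0]
y_pred = [t + 1.5 for t in y_true]

print("R2:                ", r2(y_true, y_pred))
print("explained variance:", explained_variance(y_true, y_pred))
print("mean absolute error:", mean_abs_error(y_true, y_pred))
```

In this toy case the predictions follow the targets perfectly apart from the offset, so the explained variance is at its maximum while R2 is below zero.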

Let us now use the Quantiacs **backtester** to avoid **forward looking**.

The backtester performs some transformations: it trains the model on one slice of data (using only data from the past) and predicts the weights for the following slice on a rolling basis:

In [17]:
def train_model(data):
    """Create and train the model working on an asset-by-asset basis."""

    asset_name_all = data.coords["asset"].values
    features_all   = get_features(data)
    target_all     = get_target_classes(data)

    models = dict()

    for asset_name in asset_name_all:

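        # drop missing values (pass "how" as a keyword, the positional form is deprecated in xarray):
        target_cur   = target_all.sel(asset=asset_name).dropna("time", how="any")
        features_cur = features_all.sel(asset=asset_name).dropna("time", how="any")

        # keep only the dates that are present in both the target and the features:
        target_for_learn_df, feature_for_learn_df = xr.align(target_cur, features_cur, join="inner")

        # skip assets with too little history:
        if len(features_cur.time) < 10:
            continue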

        model = get_model()

        try:
            model.fit(feature_for_learn_df.values, target_for_learn_df)
            models[asset_name] = model

        except Exception:
            logging.exception("model training failed")

    return models

In [23]:
def predict_weights(models, data):
    """The model predicts if the price is going up or down.
       The prediction is performed for several days in order to speed up the evaluation."""

    asset_name_all = data.coords["asset"].values
    weights = xr.zeros_like(data.sel(field="close"))

    # compute the features once for all assets instead of once per asset:
    features_all = get_features(data)

    for asset_name in asset_name_all:
        if asset_name in models:
            model = models[asset_name]
            features_cur = features_all.sel(asset=asset_name).dropna("time", how="any")

            if len(features_cur.time) < 1:
                continue

            try:
                weights.loc[dict(asset=asset_name, time=features_cur.time.values)] = model.predict(features_cur.values)

            except KeyboardInterrupt as e:
                raise e

            except:
                logging.exception("model prediction failed")

    return weights

In [24]:
# Calculate weights using the backtester:
weights = qnbt.backtest_ml(
    train                         = train_model,
    predict                       = predict_weights,
    train_period                  =  2 *365,  # the data length for training in calendar days
    retrain_interval              = 10 *365,  # how often we have to retrain models (calendar days)
    retrain_interval_after_submit = 1,        # how often retrain models after submission during evaluation (calendar days)
    predict_each_day              = False,    # Is it necessary to call prediction for every day during backtesting?
                                              # Set it to True if you suspect that get_features is looking forward.
    competition_type              = "stocks_nasdaq100",  # competition type
    lookback_period               = 365,                 # how many calendar days are needed by the predict function to generate the output
    start_date                    = "2005-01-01",        # backtest start date
    analyze                       = True,
    build_plots                   = True  # do you need the chart?
)

Run the last iteration...
fetched chunk 1/1 0s
Data loaded 0s



Passing 'how' as positional argument(s) to dropna was deprecated in version v2023.10.0 and will raise an error two releases later. Please pass them as keyword arguments.


fetched chunk 1/1 0s
Data loaded 0s
Output cleaning...
fix uniq
ffill if the current price is None...
Check liquidity...
Fix liquidity...
Ok.
Check missed dates...
Ok.
Normalization...
Output cleaning is complete.



NOTICE: The environment variable OUTPUT_PATH was not specified. The default value is 'fractions.nc.gz'


Write output: fractions.nc.gz


NOTICE: The environment variable OUT_STATE_PATH was not specified. The default value is 'state.out.pickle.gz'


State saved.
---
Run First Iteration...
fetched chunk 1/1 0s
Data loaded 0s



---
Run all iterations...
Load data...
fetched chunk 1/8 0s
fetched chunk 2/8 0s
fetched chunk 3/8 0s



fetched chunk 4/8 0s
fetched chunk 5/8 0s
fetched chunk 6/8 0s
fetched chunk 7/8 0s
fetched chunk 8/8 0s
Data loaded 1s
fetched chunk 1/7 0s
fetched chunk 2/7 0s
fetched chunk 3/7 0s
fetched chunk 4/7 0s
fetched chunk 5/7 0s
fetched chunk 6/7 0s
fetched chunk 7/7 0s
Data loaded 0s
Backtest...



fetched chunk 1/7 0s
fetched chunk 2/7 0s
fetched chunk 3/7 0s
fetched chunk 4/7 0s
fetched chunk 5/7 0s
fetched chunk 6/7 0s
fetched chunk 7/7 0s
Data loaded 0s
Output cleaning...
fix uniq
ffill if the current price is None...
Check liquidity...
Fix liquidity...
Ok.
Check missed dates...
Ok.
Normalization...
Output cleaning is complete.


NOTICE: The environment variable OUTPUT_PATH was not specified. The default value is 'fractions.nc.gz'


Write output: fractions.nc.gz


NOTICE: The environment variable OUT_STATE_PATH was not specified. The default value is 'state.out.pickle.gz'


State saved.
---
Analyze results...
Check...
Check liquidity...
Ok.
Check missed dates...
Ok.
Check the sharpe ratio...
Period: 2006-01-01 - 2025-01-08
Sharpe Ratio = 0.24847085437367886


ERROR! The Sharpe Ratio is too low. 0.24847085437367886 < 0.7
Improve the strategy and make sure that the in-sample Sharpe Ratio more than 0.7.


---
Align...
Calc global stats...
---
Calc stats per asset...
Build plots...
---

100% (5038 of 5038) |####################| Elapsed Time: 0:04:04 Time:  0:04:04


The Sharpe ratio is much lower than before (about 0.25 versus 2.10 for the global training) because the model is now trained on a rolling basis using only past data, whereas the global training also used future data.
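
The rolling scheme can be illustrated with a small, self-contained sketch. It is not the implementation of `qnbt.backtest_ml`: the function `rolling_windows` and its arguments are hypothetical names chosen to mirror `train_period` and `retrain_interval`, and the dates are arbitrary. It only shows that every training window ends before the corresponding prediction window starts.

```python
from datetime import date, timedelta

def rolling_windows(start, end, train_period, retrain_interval):
    """Yield (train_start, train_end, predict_start, predict_end) tuples.

    All lengths are in calendar days. Each model is trained on the
    `train_period` days that end the day before its prediction window begins.
    """
    predict_start = start
    while predict_start <= end:
        predict_end = min(predict_start + timedelta(days=retrain_interval - 1), end)
        train_end = predict_start - timedelta(days=1)
        train_start = train_end - timedelta(days=train_period - 1)
        yield train_start, train_end, predict_start, predict_end
        predict_start = predict_end + timedelta(days=1)

windows = list(rolling_windows(
    start=date(2006, 1, 1),
    end=date(2008, 12, 31),
    train_period=2 * 365,   # two years of history per model
    retrain_interval=365,   # retrain once a year (illustrative value)
))

for train_start, train_end, predict_start, predict_end in windows:
    # the model never sees data from its own prediction window
    assert train_end < predict_start
    print(f"train {train_start}..{train_end} -> predict {predict_start}..{predict_end}")
```

A shorter retraining interval means more models and a longer run time, which is the trade-off behind the `retrain_interval` argument.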

# May I import libraries?

Yes, please refer to the file **init.ipynb** in your home directory. You can for example use:

! conda install -y scikit-learn

# How to load data?

Daily stock data for the **Q18 NASDAQ-100** contest can be loaded using:
```python
data = qndata.stocks.load_ndx_data(tail = 17*365, dims = ("time", "field", "asset"))
```

Cryptocurrency daily data used for the Q16/Q17 contests can be loaded using:
```python
data = qndata.cryptodaily.load_data(tail = 17*365, dims = ("time", "field", "asset"))
```

Futures data for the Q15 contest can be loaded using:
```python
data = qndata.futures.load_data(tail = 17*365, dims = ("time", "field", "asset"))
```

BTC Futures data for the Q15 contest can be loaded using:
```python
data = qndata.cryptofutures.load_data(tail = 17*365, dims = ("time", "field", "asset"))
```

# How to view a list of all tickers?

```python
data.asset.to_pandas().to_list()
```

# How to see which fields are available?

```python
data.field.to_pandas().to_list()
```

# How to load specific tickers?

```python
data = qndata.stocks.load_ndx_data(tail=17 * 365, assets=["NAS:AAPL", "NAS:AMZN"])
```

# How to select specific tickers after loading all data?

```python
def get_data_filter(data, assets):
    # keep only the requested assets:
    filtered = data.sel(asset=assets)
    return filtered

get_data_filter(data, ["NAS:AAPL", "NAS:AMZN"])
```

# How to get the prices for the previous day?

```python
qnta.shift(data.sel(field="open"), periods=1)
```

or:

```python
data.sel(field="open").shift(time=1)
```
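
Both calls move every value one step later in time, so the row for a given day holds the price of the previous day and the first row has no value. A plain-Python sketch of the same idea, where the helper `shift` and the toy prices are illustrative and `None` stands in for the missing value:

```python
def shift(values, periods=1):
    """Move every value `periods` steps later (periods >= 0); the first entries become None."""
    n = len(values)
    periods = min(periods, n)
    return [None] * periods + list(values[:n - periods])

toy_open = [100.0, 101.0, 103.0, 102.0]
prev_open = shift(toy_open, periods=1)

for today, yesterday in zip(toy_open, prev_open):
    # on the first day there is no previous price to compare with
    change = None if yesterday is None else today - yesterday
    print(today, yesterday, change)
```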

# How to get the Sharpe ratio?

```python
import qnt.stats as qnstats

def get_sharpe(market_data, weights):
    rr = qnstats.calc_relative_return(market_data, weights)
    sharpe = qnstats.calc_sharpe_ratio_annualized(rr).values[-1]
    return sharpe

# to restrict the period, pass e.g. weights.sel(time=slice("2006-01-01", None)) instead of weights
sharpe = get_sharpe(data, weights)
```
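
The ratio can also be reproduced by hand. In the statistics table printed earlier, `sharpe_ratio` equals `mean_return` divided by `volatility` (0.431875 / 0.205804 ≈ 2.0985). The sketch below computes both quantities from daily relative returns in plain Python, assuming 252 trading days per year, geometric annualization of the mean and a zero risk-free rate. These conventions are assumptions, and `qnstats.calc_sharpe_ratio_annualized` may differ in detail. The function name and the toy returns are illustrative.

```python
import math

TRADING_DAYS = 252  # assumed number of trading days per year

def annualized_sharpe(daily_returns):
    """Annualized mean return divided by annualized volatility (risk-free rate = 0)."""
    n = len(daily_returns)
    # geometric annualization of the average daily return
    mean_log = sum(math.log1p(r) for r in daily_returns) / n
    mean_return = math.exp(mean_log * TRADING_DAYS) - 1.0
    # standard deviation of the daily returns, scaled to one year
    mean_r = sum(daily_returns) / n
    variance = sum((r - mean_r) ** 2 for r in daily_returns) / n
    volatility = math.sqrt(variance) * math.sqrt(TRADING_DAYS)
    return mean_return / volatility

# made-up daily relative returns, for illustration only
toy_returns = [0.01, -0.005, 0.007, 0.002, -0.003, 0.004]
print(annualized_sharpe(toy_returns))

# the same ratio read off the last row of the statistics table
table_ratio = 0.431875 / 0.205804
print(round(table_ratio, 4))
```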

# How do I get a list of the top 3 assets ranked by Sharpe ratio?

```python
import qnt.stats as qnstats

data = qndata.stocks.load_ndx_data(tail = 17*365, dims = ("time", "field", "asset"))

def get_best_instruments(data, weights, top_size):
    # compute statistics:
    stats_per_asset = qnstats.calc_stat(data, weights, per_asset=True)
    # calculate ranks of assets by "sharpe_ratio":
    ranks = (-stats_per_asset.sel(field="sharpe_ratio")).rank("asset")
    # select top assets by rank "top_period" days ago:
    top_period = 1
    rank = ranks.isel(time=-top_period)
    top = rank.where(rank <= top_size).dropna("asset").asset

    # select top stats:
    top_stats = stats_per_asset.sel(asset=top.values)

    # print results:
    print("SR tail of the top assets:")
    display(top_stats.sel(field="sharpe_ratio").to_pandas().tail())

    print("avg SR = ", top_stats[-top_period:].sel(field="sharpe_ratio").mean("asset")[-1].item())
    display(top_stats)
    return top_stats.coords["asset"].values

get_best_instruments(data, weights, 3)
```
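
The ranking step negates the Sharpe ratios so that rank 1 goes to the highest value. The same selection can be sketched in plain Python with made-up Sharpe ratios (the numbers are illustrative only and are not results of this notebook):

```python
def top_assets(sharpe_by_asset, top_size):
    """Return the `top_size` assets with the highest Sharpe ratio, best first."""
    ranked = sorted(sharpe_by_asset, key=sharpe_by_asset.get, reverse=True)
    return ranked[:top_size]

# made-up Sharpe ratios per asset
toy_sharpe = {"NAS:AAPL": 1.2, "NAS:AMZN": 0.8, "NAS:AAL": -0.3, "NAS:ABNB": 1.5}
best = top_assets(toy_sharpe, 3)
print(best)
```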

# How can I check the results for only the top 3 assets ranked by Sharpe ratio?

Select the top assets and then load their data:

```python
best_assets = get_best_instruments(data, weights, 3)

data = qndata.stocks.load_ndx_data(tail = 17*365, assets=best_assets)
```

# How can prices be processed?

Simply import standard libraries, for example **numpy**:

```python
import numpy as np

high = np.log(data.sel(field="high"))
```

# How can you reduce slippage impact when trading?

Just apply some technique to reduce turnover:

```python
def get_lower_slippage(weights, rolling_time=6):
    return weights.rolling({"time": rolling_time}).max()

improved_weights = get_lower_slippage(weights, rolling_time=6)
```
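
To see why this lowers turnover, the effect can be reproduced on a toy weight series in plain Python. The helpers `rolling_max` and `turnover` are illustrative names, and the turnover here is simply the sum of absolute day-to-day weight changes for one asset, which is a simplification of the `avg_turnover` statistic. Note that holding the rolling maximum also changes the positions themselves, so the Sharpe ratio should be checked again afterwards.

```python
def rolling_max(values, window):
    """Maximum over the trailing `window` values (None until the window is full)."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(max(values[i + 1 - window : i + 1]))
    return out

def turnover(values):
    """Sum of absolute day-to-day weight changes, skipping undefined entries."""
    defined = [v for v in values if v is not None]
    return sum(abs(b - a) for a, b in zip(defined, defined[1:]))

# a toy weight series that flips between 0 and 1 almost every day
raw = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
smoothed = rolling_max(raw, window=3)

print("turnover before:", turnover(raw))
print("turnover after: ", turnover(smoothed))
```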

# How to use technical analysis indicators?

For available indicators see the source code of the library: /qnt/ta

## ATR

```python
def get_atr(data, days=14):
    high = data.sel(field="high") * 1.0
    low  = data.sel(field="low") * 1.0
    close = data.sel(field="close") * 1.0

    return qnta.atr(high, low, close, days)

atr = get_atr(data, days=14)
```
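
For reference, the average true range is built on the true range: the largest of the day's high-low span and the two distances from the previous close. The plain-Python sketch below averages it with a simple moving average. This is an assumption made for brevity, and `qnta.atr` may use a different averaging and treat the first day differently. The function names and the toy prices are illustrative.

```python
def true_range(high, low, close):
    """Per-day true range; the first day has no previous close, so high - low is used."""
    ranges = [high[0] - low[0]]
    for i in range(1, len(close)):
        prev_close = close[i - 1]
        ranges.append(max(high[i] - low[i],
                          abs(high[i] - prev_close),
                          abs(low[i] - prev_close)))
    return ranges

def simple_atr(high, low, close, days):
    """Simple moving average of the true range (None until `days` values exist)."""
    tr = true_range(high, low, close)
    return [sum(tr[i + 1 - days : i + 1]) / days if i + 1 >= days else None
            for i in range(len(tr))]

toy_high  = [10.0, 11.0, 12.0, 11.5]
toy_low   = [9.0, 10.0, 10.5, 10.0]
toy_close = [9.5, 10.5, 11.0, 10.5]

toy_atr = simple_atr(toy_high, toy_low, toy_close, days=2)
print(toy_atr)
```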

## EMA

```python
prices = data.sel(field="high")
prices_ema = qnta.ema(prices, 15)
```
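
An exponential moving average weights recent prices more heavily than older ones. The sketch below uses the common smoothing factor 2 / (period + 1) and seeds the average with the first price. Both choices are assumptions, and `qnta.ema` may warm up and handle missing values differently. The function name and the toy prices are illustrative.

```python
def ema(values, period):
    """Exponential moving average with smoothing factor 2 / (period + 1).

    Seeded with the first value; libraries often seed differently.
    """
    alpha = 2.0 / (period + 1)
    result = [values[0]]
    for value in values[1:]:
        # new average = alpha * new price + (1 - alpha) * previous average
        result.append(alpha * value + (1.0 - alpha) * result[-1])
    return result

toy_prices = [10.0, 11.0, 12.0, 11.0, 13.0]
toy_ema = ema(toy_prices, period=3)
print(toy_ema)
```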

## TRIX

```python
prices = data.sel(field="high")
prices_trix = qnta.trix(prices, 15)
```

## ADL and EMA

```python
adl = qnta.ad_line(data.sel(field="close")) * 1.0
adl_ema = qnta.ema(adl, 18)
```

# How can you check the quality of your strategy?

```python
import qnt.output as qnout
qnout.check(weights, data, "stocks_nasdaq100")
```

or

```python
stat = qnstats.calc_stat(data, weights)
display(stat.to_pandas().tail())
```

or

```python
import qnt.graph as qngraph
statistics = qnstats.calc_stat(data, weights)
display(statistics.to_pandas().tail())

performance = statistics.to_pandas()["equity"]
qngraph.make_plot_filled(performance.index, performance, name="PnL (Equity)", type="log")

display(statistics[-1:].sel(field = ["sharpe_ratio"]).transpose().to_pandas())
qnstats.print_correlation(weights, data)

```

# An example using pandas

One can work with pandas DataFrames at intermediate steps and at the end convert them to xarray data structures:

```python
def get_price_pct_change(prices):
    # convert to a pandas DataFrame (rows: time, columns: assets):
    prices_pandas = prices.to_pandas()
    # day-over-day relative change, computed for all asset columns at once:
    return prices_pandas.pct_change()

prices = data.sel(field="close") * 1.0
prices_pct_change = get_price_pct_change(prices).unstack().to_xarray()
```

# How to submit a strategy to the competition?

Check that weights are fine:

```python
import qnt.output as qnout
qnout.check(weights, data, "stocks_nasdaq100")
```

If everything is ok, write the weights to file:

```python
qnout.write(weights)
```

In your **personal account**:

* **choose** a strategy;
* click on the **Submit** button;
* select the type of competition.

At the beginning you will find the strategy under the **Checking** area:

* **Sent strategies** > **Checking**.

If technical checks are successful, the strategy will go under the **Candidates** area:

* **Sent strategies** > **Candidates**.

Otherwise it will be **Filtered**:

* **Sent strategies** > **Filtered**

and you should inspect error and warning messages.

Note that a strategy in the **Candidates** area must have a Sharpe ratio larger than 1 to be eligible for a prize. Please check the warning messages in your **Candidates** area!