# The environment

The next three code cells contain commands that configure your Google Colab environment.

When you transfer the strategy to Quantiacs, remove these cells.

They are not relevant for the Quantiacs platform.

First, install the toolbox from GitHub using pip:

In [None]:
###DEBUG###

! pip install git+https://github.com/quantiacs/toolbox.git 2>/dev/null

# decrease height
from IPython.display import Javascript
display(Javascript('google.colab.output.setIframeHeight(0, true, {maxHeight: 100})'))

Collecting git+https://github.com/quantiacs/toolbox.git
  Cloning https://github.com/quantiacs/toolbox.git to /tmp/pip-req-build-45370t3q
  Resolved https://github.com/quantiacs/toolbox.git to commit 272e66e017d3eb6d40517ffa39cd6a92dc5072d8
  Preparing metadata (setup.py) ... done
Collecting scipy>=1.14.0 (from qnt==0.0.407)
  Downloading scipy-1.15.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.0/62.0 kB 1.3 MB/s eta 0:00:00
Collecting xarray==2024.6.0 (from qnt==0.0.407)
  Downloading xarray-2024.6.0-py3-none-any.whl.metadata (11 kB)
Collecting progressbar2<4,>=3.55 (from qnt==0.0.407)
  Downloading progressbar2-3.55.0-py2.py3-none-any.whl.metadata (11 kB)
Collecting cftime==1.6.4 (from qnt==0.0.407)
  Downloading cftime-1.6.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (8.7 kB)
Collecting plotly==5.22.0 (from qnt==0.0.407)
  Down


Then install TA-Lib (indicators library) if you need it.

Instead of TA-Lib you can use qnt.ta or another library. In this case, skip the next cell.

Note that the installation can take several minutes.

In [None]:
"""
###DEBUG###
!wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz
!tar -xzvf ta-lib-0.4.0-src.tar.gz
%cd ta-lib
!./configure --prefix=/usr
! make
!make install
!pip install Ta-Lib

# test import
import talib

# decrease height
from IPython.display import Javascript
display(Javascript('google.colab.output.setIframeHeight(0, true, {maxHeight: 100})'))
"""


Finally, specify the API key and disable interactive charts.

You can find the API key in [your profile](https://quantiacs.com/personalpage/homepage).

We disable interactive charts in the library because the combination of interact and plotly does not work correctly in Google Colab.

In [None]:
###DEBUG###
import os

os.environ['API_KEY'] = "aa39740a-02b3-4dbb-a6b5-b871631d4ccc"
os.environ['NONINTERACT'] = 'True'
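
The two settings above are plain environment variables. As a minimal sketch, they could be wrapped in a small helper that rejects an empty key before anything is stored. The name `configure_quantiacs` and its `env` parameter are hypothetical and not part of the toolbox; only the variable names `API_KEY` and `NONINTERACT` come from the cell above.

```python
import os


def configure_quantiacs(api_key, env=None):
    """Hypothetical helper: store the two settings used in the cell above."""
    env = os.environ if env is None else env
    if not api_key or not api_key.strip():
        raise ValueError("API key is empty; copy it from your Quantiacs profile")
    env["API_KEY"] = api_key.strip()
    env["NONINTERACT"] = "True"  # same flag as in the cell above
    return env


# Usage with a plain dict, so the real environment is left untouched:
settings = configure_quantiacs("  my-api-key  ", env={})
print(settings)
```

Passing a dictionary as `env` is only a convenience for trying the helper out; in the notebook you would call it without `env` so that `os.environ` is updated.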

# The strategy

The next cell contains the strategy code itself.

In [None]:
%%javascript
window.IPython && (IPython.OutputArea.prototype._should_scroll = function(lines) { return false; })
// run this cell for disabling widget scrolling


In [None]:
import logging

import xarray as xr  # xarray for data manipulation

import qnt.data as qndata     # functions for loading data
import qnt.backtester as qnbt # built-in backtester
import qnt.ta as qnta         # technical analysis library
import qnt.stats as qnstats   # statistical functions

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

np.seterr(divide = "ignore")

from qnt.ta.macd import macd
from qnt.ta.rsi  import rsi
from qnt.ta.stochastic import stochastic_k, stochastic, slow_stochastic

from sklearn import linear_model
from sklearn.metrics import r2_score
from sklearn.metrics import explained_variance_score
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeClassifier

NOTICE: The environment variable DATA_BASE_URL was not specified. The default value is 'https://data-api.quantiacs.io/'
NOTICE: The environment variable CACHE_RETENTION was not specified. The default value is '7'
NOTICE: The environment variable CACHE_DIR was not specified. The default value is 'data-cache'


In [None]:
#stock_data = qndata.stocks.load_spx_data(tail = 365 * 5, assets = ['NAS:AAL','NAS:AAPL','NAS:EA','NAS:CEPH','NAS:AKAM','NAS:DXCM','NAS:CA','NAS:ALTR~1','NAS:TLAB','NAS:FANG','NAS:GEN','NAS:BMC','NAS:SNPS','NAS:SBAC','NAS:TXN','NAS:PTC','NAS:BKR','NAS:EXC','NAS:ALGN','NAS:LKQ','NAS:ENPH','NAS:CCEP','NAS:ALTR','NAS:FOSL','NAS:HST'])
stock_data = qndata.stocks.load_spx_data(tail = 365 * 5, assets = ['NAS:AAL','NAS:AAPL'])

| |#                                              | 15975 Elapsed Time: 0:00:00
| |#                                              | 15975 Elapsed Time: 0:00:00
| | #                                             | 45926 Elapsed Time: 0:00:00


fetched chunk 1/1 2s
Data loaded 3s


In [None]:

def get_features(data):
    """Builds the features used for learning:
       * a trend indicator;
       * the moving average convergence divergence;
       * a volatility measure;
       * the stochastic oscillator;
       * the relative strength index;
       * the logarithm of the closing price.
       These features can be modified and new ones can be added easily.
    """

    # trend:
    trend = qnta.roc(qnta.lwma(data.sel(field="close"), 60), 1)

    # moving average convergence  divergence (MACD):
    macd = qnta.macd(data.sel(field="close"))
    macd2_line, macd2_signal, macd2_hist = qnta.macd(data, 12, 26, 9)

    # volatility:
    volatility = qnta.tr(data.sel(field="high"), data.sel(field="low"), data.sel(field="close"))
    volatility = volatility / data.sel(field="close")
    volatility = qnta.lwma(volatility, 14)

    # the stochastic oscillator:
    k, d = qnta.stochastic(data.sel(field="high"), data.sel(field="low"), data.sel(field="close"), 14)

    # the relative strength index:
    rsi = qnta.rsi(data.sel(field="close"))

    # the logarithm of the closing price:
    price = data.sel(field="close").ffill("time").bfill("time").fillna(0) # fill NaN
    price = np.log(price)

    # combine the six features:
    result = xr.concat(
        [trend, macd2_signal.sel(field="close"), volatility,  d, rsi, price],
        pd.Index(
            ["trend",  "macd", "volatility", "stochastic_d", "rsi", "price"],
            name = "field"
        )
    )

    return result.transpose("time", "field", "asset")
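
To make the trend feature more concrete, here is a minimal pure-Python sketch of a linearly weighted moving average followed by a one-step rate of change. It assumes the conventional textbook definitions (weights 1..n with the newest value weighted most, and rate of change expressed in percent); the functions `lwma` and `roc` below are illustrative stand-ins, and the exact scaling and NaN handling of `qnta.lwma` and `qnta.roc` may differ.

```python
def lwma(values, period):
    """Linearly weighted moving average; the newest value gets weight `period`."""
    weights = list(range(1, period + 1))
    total = sum(weights)
    out = []
    for i in range(len(values)):
        if i + 1 < period:
            out.append(float("nan"))  # not enough history yet
            continue
        window = values[i + 1 - period : i + 1]
        out.append(sum(w * v for w, v in zip(weights, window)) / total)
    return out


def roc(values, lag=1):
    """Rate of change in percent relative to the value `lag` steps earlier."""
    out = [float("nan")] * len(values)
    for i in range(lag, len(values)):
        out[i] = (values[i] / values[i - lag] - 1.0) * 100.0
    return out


close = [10.0, 11.0, 12.0, 13.0, 14.0]
smoothed = lwma(close, 3)   # the notebook uses a 60-day window
trend = roc(smoothed, 1)
print(smoothed)
print(trend)
```

The first `period` entries of the result are NaN because the window is not yet full. With a 60-day window this is consistent with the NaN rows at the start of the `trend` table displayed further down.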

In [None]:
my_features = get_features(stock_data)
display(my_features.sel(field="trend").to_pandas())

asset,NAS:AAL,NAS:AAPL
time,Unnamed: 1_level_1,Unnamed: 2_level_1
2020-01-22,,
2020-01-23,,
2020-01-24,,
2020-01-27,,
2020-01-28,,
...,...,...
2025-01-10,0.678094,-0.011361
2025-01-13,0.493014,-0.045787
2025-01-14,0.595151,-0.061104
2025-01-15,0.545674,0.000734


In [None]:
def get_target_classes(data):
    """Target classes for predicting whether the price goes up or down."""

    price_current = data.sel(field="close")
    price_future = qnta.shift(price_current, -1)

    class_positive = 1
    class_negative = 0

    target_price_up = xr.where(
        price_future > price_current, class_positive, class_negative
    )

    return target_price_up
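
The labelling rule can be checked by hand on a short price list. The sketch below reproduces it in plain Python, assuming that the shifted series has no value (NaN) on the final day; the function name `next_day_up_labels` is hypothetical.

```python
def next_day_up_labels(close):
    """Label each day 1 if the next day's close is higher, otherwise 0."""
    nan = float("nan")
    future = close[1:] + [nan]  # next day's close; unknown for the last day
    # any comparison with NaN is False, so the last day is labelled 0
    return [1 if f > c else 0 for c, f in zip(close, future)]


close = [10.0, 11.0, 10.5, 10.5, 12.0]
labels = next_day_up_labels(close)
print(labels)  # [1, 0, 0, 1, 0]
```

Note that the final label is 0 only because the next price is unknown, not because the price fell. Under the same assumption, `get_target_classes` would also return 0 rather than NaN for the last day, so a later `dropna` would not remove that row.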

In [None]:
# displaying the target classes:
my_targetclass = get_target_classes(stock_data)
display(my_targetclass.to_pandas())

asset,NAS:AAL,NAS:AAPL
time,Unnamed: 1_level_1,Unnamed: 2_level_1
2020-01-22,1,1
2020-01-23,0,0
2020-01-24,0,0
2020-01-27,1,1
2020-01-28,0,1
...,...,...
2025-01-10,0,0
2025-01-13,1,0
2025-01-14,0,1
2025-01-15,1,0


In [None]:
def get_model():
    """Decision tree model."""
    return DecisionTreeClassifier(max_depth=5)  # limit the depth for faster training

In [None]:
# Create and train the models working on an asset-by-asset basis.

asset_name_all = stock_data.coords["asset"].values

models = dict()

for asset_name in asset_name_all:
    target_cur = my_targetclass.sel(asset=asset_name).dropna(dim="time", how="any")
    features_cur = my_features.sel(asset=asset_name).dropna(dim="time", how="any")

    target_for_learn_df, feature_for_learn_df = xr.align(target_cur, features_cur, join="inner")

    if len(features_cur.time) < 10:
        continue  # Not enough data for training

    model = get_model()

    try:
        model.fit(feature_for_learn_df.values, target_for_learn_df)
        models[asset_name] = model
    except ValueError as e:
        logging.exception(f"ValueError occurred while training model for {asset_name}: {e}")
    except Exception as e:
        logging.exception(f"Error occurred while training model for {asset_name}: {e}")

print(models)

{'NAS:AAL': DecisionTreeClassifier(max_depth=5), 'NAS:AAPL': DecisionTreeClassifier(max_depth=5)}


In [None]:
# Performs prediction and generates output weights:

asset_name_all = stock_data.coords["asset"].values
weights = xr.zeros_like(stock_data.sel(field="close"))

for asset_name in asset_name_all:
    if asset_name in models:
        model = models[asset_name]
        features_all = my_features
        features_cur = features_all.sel(asset=asset_name).dropna(dim="time", how="any")
        if len(features_cur.time) < 1:
            continue
        try:
            weights.loc[dict(asset=asset_name, time=features_cur.time.values)] = model.predict(features_cur.values)
        except KeyboardInterrupt as e:
            raise e
        except:
            logging.exception("model prediction failed")

print(weights)

<xarray.DataArray 'stocks_s&p500' (time: 1255, asset: 2)> Size: 20kB
array([[0., 0.],
       [0., 0.],
       [0., 0.],
       ...,
       [0., 1.],
       [0., 1.],
       [0., 1.]])
Coordinates:
  * time     (time) datetime64[ns] 10kB 2020-01-22 2020-01-23 ... 2025-01-16
    field    <U5 20B 'close'
  * asset    (asset) <U8 64B 'NAS:AAL' 'NAS:AAPL'


In [None]:
def get_sharpe(stock_data, weights):
    """Calculates the Sharpe ratio"""
    rr = qnstats.calc_relative_return(stock_data, weights)
    sharpe = qnstats.calc_sharpe_ratio_annualized(rr).values[-1]
    return sharpe

sharpe = get_sharpe(stock_data, weights)
sharpe

1.3704849326292572
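
For orientation, the sketch below computes a textbook annualized Sharpe ratio from a list of daily returns, assuming 252 trading days per year and a zero risk-free rate. It is an illustration only: `annualized_sharpe` is a hypothetical helper, and `qnstats.calc_sharpe_ratio_annualized` may use a different convention (for example, for how the mean return is annualized), so the numbers need not match.

```python
import math
import statistics


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Annualized mean return divided by annualized volatility."""
    mean = statistics.mean(daily_returns)
    std = statistics.pstdev(daily_returns)
    if std == 0:
        return float("nan")  # undefined when the returns do not vary
    return (mean * periods_per_year) / (std * math.sqrt(periods_per_year))


returns = [0.01, -0.005, 0.007, 0.002, -0.003]
sharpe_example = annualized_sharpe(returns)
print(sharpe_example)
```

Keep in mind that the value printed by the notebook above is computed on the same five years of data the models were trained on, so it is an in-sample figure.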

In [None]:
def train_model(data):
    """Train one model per asset."""

    asset_name_all = data.coords["asset"].values
    features_all = get_features(data)
    target_all = get_target_classes(data)

    models = dict()

    for asset_name in asset_name_all:
        target_cur = target_all.sel(asset=asset_name).dropna(dim="time", how="any")
        features_cur = features_all.sel(asset=asset_name).dropna(dim="time", how="any")

        target_for_learn_df, feature_for_learn_df = xr.align(
            target_cur, features_cur, join="inner"
        )

        if len(features_cur.time) < 10:
            continue

        model = get_model()

        try:
            model.fit(feature_for_learn_df.values, target_for_learn_df)
            models[asset_name] = model

        except Exception:
            logging.exception("model training failed")

    return models

In [None]:
def predict_weights(models, data):
    """Predict the weights using the trained models."""

    asset_name_all = data.coords["asset"].values
    weights = xr.zeros_like(data.sel(field="close"))
    # compute the features once, not once per asset
    features_all = get_features(data)

    for asset_name in asset_name_all:
        if asset_name in models:
            model = models[asset_name]
            features_cur = features_all.sel(asset=asset_name).dropna(
                dim="time", how="any"
            )

            if len(features_cur.time) < 1:
                continue

            try:
                weights.loc[dict(asset=asset_name, time=features_cur.time.values)] = (
                    model.predict(features_cur.values)
                )

            except KeyboardInterrupt as e:
                raise e

            except:
                logging.exception("model prediction failed")

    return weights

In [None]:
# Calculate weights using the backtester:
weights = qnbt.backtest_ml(
    train                         = train_model,
    predict                       = predict_weights,
    train_period                  =  2 *365,  # the data length for training in calendar days
    retrain_interval              = 10 *365,  # how often we have to retrain models (calendar days)
    retrain_interval_after_submit = 1,        # how often retrain models after submission during evaluation (calendar days)
    predict_each_day              = False,    # Is it necessary to call prediction for every day during backtesting?
                                              # Set it to True if you suspect that get_features is looking forward.
    competition_type              = "stocks_s&p500",  # competition type
    lookback_period               = 365,                 # how many calendar days are needed by the predict function to generate the output
    start_date                    = "2013-01-01",        # backtest start date
    analyze                       = True,
    build_plots                   = True  # do you need the chart?
)

Run the last iteration...


| |#                                              | 15975 Elapsed Time: 0:00:00
| |          #                                  | 3756731 Elapsed Time: 0:00:01


fetched chunk 1/2 10s


| |      #                                       | 985508 Elapsed Time: 0:00:00


fetched chunk 2/2 13s
Data loaded 13s


| |    #                                         | 361106 Elapsed Time: 0:00:00


fetched chunk 1/1 8s
Data loaded 8s
Output cleaning...
fix uniq
ffill if the current price is None...
Check liquidity...
Fix liquidity...
Ok.
Check missed dates...
Ok.
Normalization...
Output cleaning is complete.


NOTICE: The environment variable OUTPUT_PATH was not specified. The default value is 'fractions.nc.gz'


Write output: fractions.nc.gz


NOTICE: The environment variable OUT_STATE_PATH was not specified. The default value is 'state.out.pickle.gz'


State saved.
---
Run First Iteration...


| |#                                              | 15975 Elapsed Time: 0:00:00
| |          #                                  | 3459957 Elapsed Time: 0:00:01


fetched chunk 1/2 8s


| |      #                                       | 941340 Elapsed Time: 0:00:00


fetched chunk 2/2 11s
Data loaded 11s
---
Run all iterations...
Load data...


| |#                                              | 15975 Elapsed Time: 0:00:00
| |          #                                  | 3407724 Elapsed Time: 0:00:01


fetched chunk 1/10 3s


| |             #                               | 3629302 Elapsed Time: 0:00:01


fetched chunk 2/10 6s


| |          #                                  | 3352741 Elapsed Time: 0:00:01


fetched chunk 3/10 8s


| |          #                                  | 3439543 Elapsed Time: 0:00:01


fetched chunk 4/10 11s


| |          #                                  | 3494872 Elapsed Time: 0:00:01


fetched chunk 5/10 14s


| |         #                                   | 3443936 Elapsed Time: 0:00:00


fetched chunk 6/10 17s


| |          #                                  | 3502242 Elapsed Time: 0:00:01


fetched chunk 7/10 19s


| |          #                                  | 3535956 Elapsed Time: 0:00:01


fetched chunk 8/10 22s


| |          #                                  | 3768748 Elapsed Time: 0:00:01


fetched chunk 9/10 25s


| |       #                                     | 1953856 Elapsed Time: 0:00:00


fetched chunk 10/10 27s
Data loaded 28s


| |#                                              | 15975 Elapsed Time: 0:00:00
| |         #                                   | 3665305 Elapsed Time: 0:00:00


fetched chunk 1/8 3s


| |            #                                | 3555376 Elapsed Time: 0:00:01


fetched chunk 2/8 7s


| |          #                                  | 3634486 Elapsed Time: 0:00:01


fetched chunk 3/8 10s


| |           #                                 | 3568344 Elapsed Time: 0:00:01


fetched chunk 4/8 13s


| |           #                                 | 3567527 Elapsed Time: 0:00:01


fetched chunk 5/8 16s


| |           #                                 | 3574821 Elapsed Time: 0:00:01


fetched chunk 6/8 19s


| |            #                                | 3799726 Elapsed Time: 0:00:01


fetched chunk 7/8 23s


| |        #                                    | 2158354 Elapsed Time: 0:00:00


fetched chunk 8/8 25s
Data loaded 25s
Backtest...


| |#                                              | 15975 Elapsed Time: 0:00:00
| |              #                              | 3654448 Elapsed Time: 0:00:01


fetched chunk 1/8 3s


| |           #                                 | 3612514 Elapsed Time: 0:00:01


fetched chunk 2/8 6s


| |         #                                   | 3565471 Elapsed Time: 0:00:00


fetched chunk 3/8 9s


| |         #                                   | 3563299 Elapsed Time: 0:00:00


fetched chunk 4/8 12s


| |            #                                | 3554077 Elapsed Time: 0:00:01


fetched chunk 5/8 16s


| |            #                                | 3540787 Elapsed Time: 0:00:01


fetched chunk 6/8 19s


| |           #                                 | 3760424 Elapsed Time: 0:00:01


fetched chunk 7/8 22s


| |        #                                    | 2612043 Elapsed Time: 0:00:00


fetched chunk 8/8 24s
Data loaded 26s
Output cleaning...
fix uniq
ffill if the current price is None...
Check liquidity...
Fix liquidity...
Ok.
Check missed dates...
Ok.
Normalization...
Output cleaning is complete.


NOTICE: The environment variable OUTPUT_PATH was not specified. The default value is 'fractions.nc.gz'


Write output: fractions.nc.gz


NOTICE: The environment variable OUT_STATE_PATH was not specified. The default value is 'state.out.pickle.gz'


State saved.
---
Analyze results...
Check...
Check liquidity...
Ok.
Check missed dates...
Ok.
Check the sharpe ratio...


The first point(2012-11-05) should be earlier than 2006-01-01
Load data more historical data.
The output series should start from 2006-01-01 or earlier instead of 2013-01-02


Period: 2012-11-05 - 2025-01-16
Sharpe Ratio = 0.4605833153901675


ERROR! The Sharpe Ratio is too low. 0.4605833153901675 < 0.7
Improve the strategy and make sure that the in-sample Sharpe Ratio more than 0.7.


---
Align...
Calc global stats...
---
Calc stats per asset...
Build plots...
---
Output:


asset,NAS:AAL,NAS:AAPL,NAS:ABNB,NAS:ACGL,NAS:ADBE,NAS:ADI,NAS:ADP,NAS:ADSK,NAS:AEP,NAS:AKAM
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
2025-01-02,0.0,0.0,0.004292,0.004292,0.004292,0.004292,0.0,0.004292,0.0,0.0
2025-01-03,0.0,0.0,0.004464,0.004464,0.004464,0.004464,0.0,0.004464,0.0,0.0
2025-01-06,0.0,0.0,0.004444,0.004444,0.004444,0.004444,0.0,0.004444,0.0,0.0
2025-01-07,0.0,0.0,0.004444,0.004444,0.004444,0.004444,0.0,0.004444,0.0,0.0
2025-01-08,0.0,0.0,0.004464,0.004464,0.004464,0.004464,0.0,0.004464,0.004464,0.0
2025-01-10,0.0,0.0,0.004386,0.004386,0.004386,0.004386,0.0,0.004386,0.004386,0.0
2025-01-13,0.0,0.0,0.004444,0.004444,0.004444,0.004444,0.0,0.004444,0.004444,0.0
2025-01-14,0.0,0.0,0.004444,0.004444,0.004444,0.004444,0.0,0.004444,0.004444,0.0
2025-01-15,0.0,0.0,0.004608,0.004608,0.004608,0.004608,0.0,0.004608,0.004608,0.0
2025-01-16,0.0,0.0,0.004566,0.0,0.004566,0.004566,0.0,0.004566,0.004566,0.0


Stats:


field,equity,relative_return,volatility,underwater,max_drawdown,sharpe_ratio,mean_return,bias,instruments,avg_turnover,avg_holding_time
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2025-01-02,2.579669,-0.004671,0.180338,-0.080959,-0.412032,0.456136,0.082259,1.0,506.0,0.304106,6.559345
2025-01-03,2.602657,0.008911,0.180325,-0.07277,-0.412032,0.460453,0.083031,1.0,506.0,0.304087,6.559707
2025-01-06,2.598502,-0.001597,0.180296,-0.07425,-0.412032,0.459569,0.082858,1.0,506.0,0.30407,6.55938
2025-01-07,2.588845,-0.003716,0.18027,-0.07769,-0.412032,0.457613,0.082494,1.0,506.0,0.304062,6.559501
2025-01-08,2.589925,0.000417,0.18024,-0.077306,-0.412032,0.45774,0.082503,1.0,506.0,0.304033,6.5595
2025-01-10,2.547963,-0.016202,0.180274,-0.092255,-0.412032,0.449335,0.081003,1.0,506.0,0.304006,6.559254
2025-01-13,2.571209,0.009124,0.180262,-0.083973,-0.412032,0.453746,0.081793,1.0,506.0,0.304022,6.559009
2025-01-14,2.58773,0.006425,0.180241,-0.078088,-0.412032,0.456844,0.082342,1.0,506.0,0.304007,6.558202
2025-01-15,2.607313,0.007568,0.180223,-0.071111,-0.412032,0.4605,0.082993,1.0,506.0,0.303991,6.558144
2025-01-16,2.625037,0.006798,0.180203,-0.064796,-0.412032,0.463781,0.083575,1.0,506.0,0.303988,6.578642


---


100% (3030 of 3030) |####################| Elapsed Time: 0:11:43 Time:  0:11:43
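
The `train_period` and `retrain_interval` arguments passed to `qnbt.backtest_ml` describe a walk-forward scheme: train on a trailing window, predict until the next retraining date, then repeat. The sketch below is a conceptual illustration of such a schedule using only calendar arithmetic. It is not the logic of `qnbt.backtest_ml`, and the function `walk_forward_schedule` and its return format are assumptions made for this example.

```python
from datetime import date, timedelta


def walk_forward_schedule(start, end, train_period, retrain_interval):
    """Return (train_start, train_end, predict_until) tuples.

    All lengths are in calendar days, as in the backtest_ml arguments.
    """
    schedule = []
    retrain_day = start
    while retrain_day <= end:
        train_start = retrain_day - timedelta(days=train_period)
        train_end = retrain_day - timedelta(days=1)
        next_retrain = retrain_day + timedelta(days=retrain_interval)
        predict_until = min(next_retrain - timedelta(days=1), end)
        schedule.append((train_start, train_end, predict_until))
        retrain_day = next_retrain
    return schedule


plan = walk_forward_schedule(
    start=date(2013, 1, 1),
    end=date(2025, 1, 16),
    train_period=2 * 365,
    retrain_interval=10 * 365,
)
for train_start, train_end, predict_until in plan:
    print(train_start, train_end, predict_until)
```

Under these assumptions, a retraining interval of ten years over a backtest of roughly twelve years yields only two training runs. A shorter `retrain_interval` would retrain more often at the cost of a longer backtest.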
