# Machine learning - RidgeRegression (Long/Short)

This example uses Ridge regression (scikit-learn's `Ridge`) to predict whether the price will go up or down. Based on these predictions, we define the thresholds at which we go long, go short, or do nothing. This example was built for the **Q18 NASDAQ-100 Stock Long-Short contest**.

**Strategy idea:** We go long or short on NASDAQ-100 stocks depending on what **RidgeRegression** predicts about whether the price will move up or down.

**Features for learning:** trend indicator, stochastic oscillator, volatility.

To have a look at all the technical indicators we offer, go to [**Technical Indicators**](https://quantiacs.com/documentation/en/user_guide/technical_indicators.html)

---


We will use a **specialized** version of the Quantiacs backtester for this purpose, which dramatically speeds up the backtesting process when the models should be retrained on a regular basis.
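
The idea of retraining on a regular basis can be sketched as a walk-forward schedule: at each retraining date the model is fitted on a trailing window and then used for predictions until the next retraining date. The function below is a hypothetical illustration in plain Python (the names `retrain_schedule`, `train_period` and `retrain_interval` mirror the backtester arguments used later, but this is not the `qnt.backtester` implementation):

```python
from datetime import date, timedelta

def retrain_schedule(start, end, train_period, retrain_interval):
    """Return (train_start, train_end, predict_end) windows for a walk-forward test.

    Hypothetical helper for illustration only; all periods are in calendar days.
    """
    windows = []
    current = start
    while current < end:
        # the model is trained on the `train_period` days before `current` ...
        train_start = current - timedelta(days=train_period)
        # ... and used until the next retraining date (or the end of the test)
        predict_end = min(current + timedelta(days=retrain_interval), end)
        windows.append((train_start, current, predict_end))
        current = predict_end
    return windows

windows = retrain_schedule(
    start=date(2006, 1, 1),
    end=date(2009, 1, 1),
    train_period=10 * 365,
    retrain_interval=365,
)
for train_start, train_end, predict_end in windows:
    print(f"train on [{train_start}, {train_end}), predict until {predict_end}")
```

With a one-year retraining interval the model is fitted only a handful of times over the whole backtest instead of once per trading day, which is where the speed-up comes from.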

**Need help?** Check the [**Documentation**](https://quantiacs.com/documentation/en/) and find solutions/report problems in the [**Forum**](https://quantiacs.com/community/categories) section.

**More help with Jupyter?** Check the official [**Jupyter**](https://jupyter.org/) page.

Once you are done, click on **Submit to the contest** and take part in our competitions.

Learn more about **RidgeRegression** and other ML models: [**scikit-learn**](https://scikit-learn.org/stable/)

API reference:

* **data**: check how to work with [data](https://quantiacs.com/documentation/en/reference/data_load_functions.html);

* **backtesting**: read how to run the [simulation](https://quantiacs.com/documentation/en/reference/evaluation.html) and check the results.

In [1]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) { return false; }
// disable widget scrolling


In [2]:
import logging

import pandas as pd
import xarray as xr
import numpy as np

import qnt.backtester as qnbt
import qnt.ta as qnta

In [3]:
def create_model():
    """This is a constructor for the ML model which can be easily modified using a
       different model. 
    """
    from sklearn.linear_model import Ridge
    model = Ridge(random_state=18)
    
    return model
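
To make clear what the model does, here is a minimal NumPy sketch of ridge regression solved with the normal equations. It assumes an unpenalized intercept and `alpha=1.0`, which are the scikit-learn defaults for `Ridge`; the helper names `ridge_fit` and `ridge_predict` and the synthetic data are hypothetical, and scikit-learn may use a different solver internally.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Fit ridge regression with an unpenalized intercept via the normal equations."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # center the data so that the intercept is not penalized
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    # solve (Xc^T Xc + alpha * I) w = Xc^T yc
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def ridge_predict(X, coef, intercept):
    """Return the real-valued regression output for each row of X."""
    return np.asarray(X, dtype=float) @ coef + intercept

# synthetic data: three features and a 0/1 target, as in this notebook
rng = np.random.default_rng(18)
X = rng.normal(size=(200, 3))
y = (X @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.1, size=200) > 0).astype(float)

coef, intercept = ridge_fit(X, y, alpha=1.0)
scores = ridge_predict(X, coef, intercept)
print(scores.min(), scores.max())
```

Because the 0/1 classes are fitted with a regressor, the output is a real-valued score rather than a probability, which is why the prediction step compares it against thresholds.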

In [4]:
def get_features(data):
    """Builds the features used for learning:
       * a trend indicator;
       * the stochastic oscillator;
       * volatility;
       
       These features can be modified and new ones can be added easily.
    """
    # trend: 1-day rate of change of the 70-day linearly weighted moving average
    trend = qnta.roc(qnta.lwma(data.sel(field='close'), 70), 1)

    # stochastic oscillator:
    k, d = qnta.stochastic(data.sel(field='high'), data.sel(field='low'), data.sel(field='close'), 14)

    # volatility (true range):
    volatility = qnta.tr(data.sel(field='high'), data.sel(field='low'), data.sel(field='close'))

    # combine the three selected features:
    result = xr.concat(
        [trend, d, volatility],
        pd.Index(
            ['trend', 'stochastic_d', 'volatility'],
            name='field'
        )
    )
    
    return result.transpose('time', 'field', 'asset')
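
The trend feature combines two building blocks: a linearly weighted moving average and a rate of change. The NumPy sketch below shows one common definition of each for a single price series. The helper names are hypothetical, and the exact weighting and the percent scaling are assumptions that may differ from `qnta.lwma` and `qnta.roc`.

```python
import numpy as np

def lwma(values, period):
    """Linearly weighted moving average: newest value has weight `period`, oldest weight 1."""
    values = np.asarray(values, dtype=float)
    weights = np.arange(1, period + 1, dtype=float)
    out = np.full(values.shape, np.nan)  # the first period-1 entries stay NaN
    for i in range(period - 1, len(values)):
        window = values[i - period + 1 : i + 1]
        out[i] = window @ weights / weights.sum()
    return out

def roc(values, period):
    """Rate of change in percent relative to the value `period` steps earlier."""
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    out[period:] = (values[period:] / values[:-period] - 1.0) * 100.0
    return out

close = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
smoothed = lwma(close, 3)
trend = roc(smoothed, 1)
print(smoothed)
print(trend)
```

The leading NaN values show why the feature needs a warm-up period: with a 70-day average, the first usable trend value appears only after 70 days of prices.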

In [5]:
def get_target_classes(data):
    """Builds target classes for predicting whether the price goes up or down.
       These classes are later used to decide whether to go long or short.
    """

    price_current = data.sel(field='close')
    price_future = qnta.shift(price_current, -1)

    class_positive = 1  # price goes up
    class_negative = 0  # price goes down or stays the same

    # a comparison with NaN is False, so days without a future price get class_negative:
    target_is_price_up = xr.where(price_future > price_current, class_positive, class_negative)
    
    return target_is_price_up
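
The labeling rule can be checked on a small series. The sketch below is a NumPy illustration for one asset (the helper `next_day_up_labels` is hypothetical): it builds the same up/down label and additionally returns a mask that marks the days for which a future price is actually known.

```python
import numpy as np

def next_day_up_labels(close):
    """Return (labels, has_future): 1 if the next close is higher, else 0."""
    close = np.asarray(close, dtype=float)
    future = np.full(close.shape, np.nan)
    future[:-1] = close[1:]               # price one step ahead; unknown for the last day
    labels = np.where(future > close, 1, 0)
    has_future = ~np.isnan(future)        # False only where no future price exists
    return labels, has_future

close = np.array([100.0, 101.0, 100.5, 100.5, 102.0])
labels, has_future = next_day_up_labels(close)
print(labels.tolist())      # [1, 0, 0, 1, 0]
print(has_future.tolist())  # [True, True, True, True, False]
```

Note that the last day receives class 0 although its outcome is unknown, and an unchanged price is also counted as class 0. Dropping the rows where `has_future` is False before training is one possible refinement, not something the original notebook does.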

In [6]:
def train_model(data):
    """Create and train the models working on an asset-by-asset basis."""
    
    asset_name_all = ['NAS:AAPL', 'NAS:AMZN', 'NAS:MSFT']
    
    features_all = get_features(data)
    target_all = get_target_classes(data)

    models = dict()
    
    for asset_name in asset_name_all:
        model = create_model()
        # select the target and features for this asset and drop missing values:
        try:
            target_cur = target_all.sel(asset=asset_name).dropna('time', how='any')
            features_cur = features_all.sel(asset=asset_name).dropna('time', how='any')
        except KeyError:
            # the asset is not present in the loaded data
            logging.warning('asset %s not found in data', asset_name)
            continue

        # align features and targets:
        target_for_learn_df, feature_for_learn_df = xr.align(target_cur, features_cur, join='inner')

        if len(feature_for_learn_df.time) < 10:
            # not enough points for training
            continue

        try:
            model.fit(feature_for_learn_df.values, target_for_learn_df.values)
            models[asset_name] = model
        except Exception:
            # KeyboardInterrupt is not an Exception subclass, so Ctrl+C still stops the run
            logging.exception('model training failed')

    return models

In [7]:
def predict(models, data):
    """The model predicts if the price is going up or down and we then use this information to determine 
        if we want to go long, short or do nothing.
       Prediction is performed for several days in order to speed up the evaluation.
    """
    
    asset_name_all = ['NAS:AAPL', 'NAS:AMZN' , 'NAS:MSFT']
    weights = xr.zeros_like(data.sel(field='close'))
    
    # compute the features once, not once per asset:
    features_all = get_features(data)

    for asset_name in asset_name_all:
        features_cur = features_all.sel(asset=asset_name).dropna('time', how='any')
        if len(features_cur.time) < 1:
            continue
        try:
            # Ridge is a regressor: the output per day is a real number, typically near [0, 1] but not bounded
            prediction = models[asset_name].predict(features_cur.values)
            
            for i in range(len(prediction)):
                p = prediction[i] 
                
                if p > 0.5: #model predicts price is going up
                    prediction[i] = 1 #long
                    
                elif p < 0.4: #model is fairly certain price is going down
                    prediction[i] = -1 #short
                    
                else: #model is not so sure about price going down
                    prediction[i] = 0 #do nothing
                
            weights.loc[dict(asset=asset_name,time=features_cur.time.values)] = prediction
            
        except Exception:
            # KeyboardInterrupt is not an Exception subclass, so Ctrl+C still stops the run
            logging.exception('model prediction failed')
                
    return weights
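
The per-day loop that turns scores into positions can also be written as a single vectorized call. This is a minimal sketch using the same thresholds as above (long above 0.5, short below 0.4, otherwise flat); the function name and its parameters are hypothetical.

```python
import numpy as np

def scores_to_positions(scores, long_above=0.5, short_below=0.4):
    """Map regression scores to +1 (long), -1 (short) or 0 (no position)."""
    scores = np.asarray(scores, dtype=float)
    return np.select(
        [scores > long_above, scores < short_below],  # conditions, checked in order
        [1.0, -1.0],                                  # long, short
        default=0.0,                                  # scores between the thresholds
    )

print(scores_to_positions([0.62, 0.45, 0.31, 0.5, 0.4]).tolist())
# [1.0, 0.0, -1.0, 0.0, 0.0]
```

Keeping the thresholds as parameters makes it easy to test how wide the "do nothing" band should be.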

In [8]:
weights = qnbt.backtest_ml(
    train=train_model,
    predict=predict,
    train_period=10*365,   # the data length for training in calendar days
    retrain_interval=365,  # how often we have to retrain models (calendar days)
    retrain_interval_after_submit=1, # how often retrain models after submission during evaluation (calendar days)
    predict_each_day=False,  # Is it necessary to call prediction for every day during backtesting?
                             # Set it to true if you suspect that get_features is looking forward.
    competition_type='stocks_nasdaq100',  # competition type
    lookback_period=365,      # how many calendar days are needed by the predict function to generate the output
    start_date='2006-01-01',  # backtest start date
    build_plots=True          # do you need the chart?
)

Run the last iteration...
fetched chunk 1/3 0s
fetched chunk 2/3 0s
fetched chunk 3/3 0s
Data loaded 0s
fetched chunk 1/1 0s
Data loaded 0s
Output cleaning...
fix uniq
ffill if the current price is None...
Check liquidity...
Ok.
Check missed dates...
Ok.
Normalization...
Output cleaning is complete.
Write output: /root/fractions.nc.gz
State saved.
---
Run First Iteration...
fetched chunk 1/2 0s
fetched chunk 2/2 0s
Data loaded 0s
---
Run all iterations...
Load data...
fetched chunk 1/9 0s
fetched chunk 2/9 0s
fetched chunk 3/9 0s
fetched chunk 4/9 0s
fetched chunk 5/9 0s
fetched chunk 6/9 0s
fetched chunk 7/9 0s
fetched chunk 8/9 1s
fetched chunk 9/9 1s
Data loaded 1s
fetched chunk 1/5 0s
fetched chunk 2/5 0s
fetched chunk 3/5 0s
fetched chunk 4/5 0s
fetched chunk 5/5 0s
Data loaded 0s
Backtest...


 99% (4227 of 4228) |################### | Elapsed Time: 0:00:08 ETA:   0:00:00

fetched chunk 1/5 0s
fetched chunk 2/5 0s
fetched chunk 3/5 0s
fetched chunk 4/5 0s
fetched chunk 5/5 0s
Data loaded 0s
Output cleaning...
fix uniq
ffill if the current price is None...
Check liquidity...
Ok.
Check missed dates...
Ok.
Normalization...
Output cleaning is complete.
Write output: /root/fractions.nc.gz
State saved.
---
Analyze results...
Check...
Check liquidity...
Ok.
Check missed dates...
Ok.
Check the sharpe ratio...
Period: 2006-01-01 - 2022-06-16
Sharpe Ratio = 0.3336721279843639


ERROR! The Sharpe Ratio is too low. 0.3336721279843639 < 1
Improve the strategy and make sure that the in-sample Sharpe Ratio more than 1.


Check correlation.

Ok. This strategy does not correlate with other strategies.
---
Align...
Calc global stats...
---
Calc stats per asset...
Build plots...
---
Select the asset (or leave blank to display the overall stats):



100% (4228 of 4228) |####################| Elapsed Time: 0:00:56 Time:  0:00:56


# What libraries are available?

Our library makes extensive use of [xarray](http://xarray.pydata.org/en/stable/), [pandas](https://pandas.pydata.org/) and [numpy](https://numpy.org/).

Function definitions can be found in the qnt folder in your private root directory.

```python
# Import basic libraries.
import xarray as xr
import pandas as pd
import numpy as np

# Import quantnet libraries.
import qnt.data    as qndata  # load and manipulate data
import qnt.output  as qnout   # manage output
import qnt.backtester as qnbt # backtester
import qnt.stats   as qnstats # statistical functions for analysis
import qnt.graph   as qngraph # graphical tools
import qnt.ta      as qnta    # indicators library
```

# May I import libraries?

Yes, please refer to the file **init.ipynb** in your home directory. You can for example use:

! conda install -y scikit-learn

# How to load data?

Futures:
```python
data= qndata.futures.load_data(tail = 15*365, dims = ("time", "field", "asset"))
```

BTC Futures:
```python
data= qndata.cryptofutures.load_data(tail = 15*365, dims = ("time", "field", "asset"))
```

Cryptocurrencies:
```python
data= qndata.crypto.load_data(tail = 15*365, dims = ("time", "field", "asset"))
```

# How to view a list of all tickers?

```python
data.asset.to_pandas().to_list()
```

# How to see which fields are available?

```python
data.field.to_pandas().to_list()
```

# How to load specific tickers?

```python
data = qndata.futures.load_data(tail=15 * 365, assets=['F_O', 'F_DX', 'F_GC'])
```

# How to select specific tickers after loading all data?

```python
def get_data_filter(data, assets):
    filtered = data.sel(asset=assets)
    return filtered

get_data_filter(data, ["F_O", "F_DX", "F_GC"])
```

# How to get the prices for the previous day?

```python
qnta.shift(data.sel(field="open"), periods=1)
```

or:

```python
data.sel(field="open").shift(time=1)
```

# How do I get a list of the top 10 assets ranked by Sharpe ratio?

```python
import qnt.stats as qnstats

data= qndata.futures.load_data(tail=16 * 365)

def get_best_instruments(data, weights, top_size):
    # compute statistics:
    stats_per_asset= qnstats.calc_stat(data, weights, per_asset=True)
    # calculate ranks of assets by "sharpe_ratio":
    ranks= (-stats_per_asset.sel(field="sharpe_ratio")).rank("asset")
    # select top assets by rank "top_period" days ago:
    top_period= 300
    rank= ranks.isel(time=-top_period)
    top= rank.where(rank <= top_size).dropna("asset").asset

    # select top stats:
    top_stats= stats_per_asset.sel(asset=top.values)

    # print results:
    print("SR tail of the top assets:")
    display(top_stats.sel(field="sharpe_ratio").to_pandas().tail())

    print("avg SR = ", top_stats[-top_period:].sel(field="sharpe_ratio").mean("asset")[-1].item())
    display(top_stats)
    return top_stats.coords["asset"].values

get_best_instruments(data, weights, 10)
```

# How can I check the results for only the top 10 assets ranked by Sharpe ratio?

Select the top assets and then load their data:

```python
best_assets= get_best_instruments(data, weights, 10)

data= qndata.futures.load_data(tail=15 * 365, assets=best_assets)
...
```

# How can prices be processed?

Simply import standard libraries, for example **numpy**:

```python
import numpy as np

high= np.log(data.sel(field="high"))
```

# How can you reduce slippage impact when trading?

Apply a technique that reduces turnover, for example a rolling maximum of the weights:

```python
def get_lower_slippage(weights, rolling_time=6):
    return weights.rolling({"time": rolling_time}).max()

improved_weights = get_lower_slippage(weights, rolling_time=6)
```
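
To see why smoothing helps, one can measure turnover as the average absolute day-to-day change in a weight. The NumPy sketch below does this for a single asset with hypothetical helper names. Unlike the xarray call above, this `rolling_max` uses shorter windows at the start of the series instead of returning NaN, which is an assumption made for brevity.

```python
import numpy as np

def rolling_max(values, window):
    """Trailing rolling maximum; the first entries use as many values as are available."""
    values = np.asarray(values, dtype=float)
    return np.array([values[max(0, i - window + 1) : i + 1].max() for i in range(len(values))])

def turnover(values):
    """Average absolute day-to-day change of the weight."""
    return np.abs(np.diff(values)).mean()

raw = np.array([0, 1, 0, 1, 0, 0, 1, 0, 0, 0], dtype=float)
smoothed = rolling_max(raw, window=3)

print(smoothed.tolist())
print(turnover(raw), turnover(smoothed))
```

The smoothed series changes position far less often, so fewer trades are exposed to slippage. Keep in mind that a rolling maximum biases the weights upward: within each window a long position always wins over a flat or short one.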

# How to use technical analysis indicators?

For available indicators see the source code of the library: /qnt/ta

## ATR

```python
def get_atr(data, days=14):
    high = data.sel(field="high") * 1.0 
    low  = data.sel(field="low") * 1.0 
    close= data.sel(field="close") * 1.0

    return qnta.atr(high, low, close, days)

atr= get_atr(data, days=14)
```
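
For reference, the true range of a day is the largest of three distances: high minus low, high minus the previous close, and low minus the previous close (the last two in absolute value). The sketch below averages it with a simple moving average; the helper names are hypothetical and the smoothing used by `qnta.atr` may differ (Wilder's smoothing is another common choice).

```python
import numpy as np

def true_range(high, low, close):
    """True range per day; NaN on the first day because there is no previous close."""
    high, low, close = (np.asarray(a, dtype=float) for a in (high, low, close))
    prev_close = np.concatenate(([np.nan], close[:-1]))
    candidates = np.vstack([
        high - low,
        np.abs(high - prev_close),
        np.abs(low - prev_close),
    ])
    return candidates.max(axis=0)

def simple_atr(high, low, close, days):
    """Average true range as a simple moving average over `days` true-range values."""
    tr = true_range(high, low, close)
    out = np.full(tr.shape, np.nan)
    for i in range(days, len(tr)):        # tr[0] is NaN, so the first full window ends at index `days`
        out[i] = tr[i - days + 1 : i + 1].mean()
    return out

high = [10.0, 11.0, 12.0, 11.0]
low = [9.0, 10.0, 10.0, 10.0]
close = [9.5, 10.5, 11.0, 10.5]
print(true_range(high, low, close))
print(simple_atr(high, low, close, days=2))
```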

## EMA

```python
prices= data.sel(field="high")
prices_ema= qnta.ema(prices, 15)
```

## TRIX

```python
prices= data.sel(field="high")
prices_trix= qnta.trix(prices, 15)
```

## ADL and EMA

```python
adl= qnta.ad_line(data.sel(field="close")) * 1.0 
adl_ema= qnta.ema(adl, 18)
```

# How can you check the quality of your strategy?

```python
import qnt.output as qnout
qnout.check(weights, data)
```

or

```python
stat= qnstats.calc_stat(data, weights)
display(stat.to_pandas().tail())
```

or

```python
import qnt.graph   as qngraph
statistics= qnstats.calc_stat(data, weights)
display(statistics.to_pandas().tail())

performance= statistics.to_pandas()["equity"]
qngraph.make_plot_filled(performance.index, performance, name="PnL (Equity)", type="log")

display(statistics[-1:].sel(field = ["sharpe_ratio"]).transpose().to_pandas())
qnstats.print_correlation(weights, data)

```
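
As a rough cross-check of the reported statistic, a Sharpe ratio can be computed by hand from an equity curve. The sketch below annualizes the mean and standard deviation of simple daily returns with 252 trading days per year. Both the function name and this particular definition are assumptions; the value reported by `qnstats.calc_stat` may be computed differently.

```python
import numpy as np

def annualized_sharpe(equity, periods_per_year=252):
    """Annualized mean return divided by annualized volatility of simple returns."""
    equity = np.asarray(equity, dtype=float)
    returns = equity[1:] / equity[:-1] - 1.0
    mean_annual = returns.mean() * periods_per_year
    vol_annual = returns.std() * np.sqrt(periods_per_year)
    return mean_annual / vol_annual

# equity curve with daily returns of +1% and +3%
equity = np.array([1.0, 1.01, 1.0403])
print(annualized_sharpe(equity))
```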

# An example using pandas

One can work with pandas DataFrames at intermediate steps and at the end convert them to xarray data structures:

```python
def get_price_pct_change(prices):
    prices_pandas = prices.to_pandas()
    # take the asset list from the argument instead of the global `data`:
    assets = prices.coords["asset"].values
    for asset in assets:
        prices_pandas[asset] = prices_pandas[asset].pct_change()
    return prices_pandas


prices= data.sel(field="close") * 1.0
prices_pct_change= get_price_pct_change(prices).unstack().to_xarray()

```

# How to submit a strategy to the competition?

Check that weights are fine:

```python
import qnt.output as qnout
qnout.check(weights, data)
```

If everything is ok, write the weights to file:

```python
qnout.write(weights)
```

In your **personal account**:

* **choose** a strategy;
* click on the **Submit** button;
* select the type of competition.

At first you will find the strategy in the **Checking** area (**Competition** > **Checking**). If the Sharpe ratio is larger than 1 and the technical checks are successful, the strategy moves to the **Running** area (**Competition** > **Running**). Otherwise it is **Filtered** (**Competition** > **Filtered**) and you should inspect the error and warning messages.