![](https://miro.medium.com/max/3840/1*LHuN1tJt-abIpuX14v5M8w.png)

-----------------------------
written by katsu1110

-----------------------------

This is yet another starter notebook for [Numerai Signals](https://signals.numer.ai/).

What we do here includes:

- fetch US stock price data via YFinance API
- merge the data with the Numerai Signals' historical targets
- perform feature engineering (with stationary features in mind)
- modeling with XGBoost
- submit (if you want)

In the Kaggle dataset [YFinance Stock Price Data for Numerai Signals](https://www.kaggle.com/code1110/yfinance-stock-price-data-for-numerai-signals), I fetch the stock price data daily via the yfinance API. So if you would rather not call the API yourself, just use this dataset (it should be up to date).

This content is largely inspired by the following starter.

>End to end notebook for Numerai Signals using completely free data from Yahoo Finance, by Jason Rosenfeld (jrAI).

https://colab.research.google.com/drive/1ECh69C0LDCUnuyvEmNFZ51l_276nkQqo#scrollTo=tTBUzPep2dm3

Alright, let's get it started!

# Libraries
Let's import what we need...

In [None]:
!pip install numerapi==2.3.8
import numerapi

In [None]:
!pip install xgboost==1.3.0.post0


In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import os
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import gc
import pathlib
from tqdm.auto import tqdm
import joblib
import json
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.preprocessing import StandardScaler, MinMaxScaler, LabelEncoder
from multiprocessing import Pool, cpu_count
import time
import requests as re
from datetime import datetime
from dateutil.relativedelta import relativedelta, FR

# visualize
import matplotlib.pyplot as plt
import matplotlib.style as style
from matplotlib_venn import venn2, venn3
import seaborn as sns
from matplotlib import pyplot
from matplotlib.ticker import ScalarFormatter
sns.set_context("talk")
style.use('seaborn-colorblind')

import warnings
warnings.simplefilter('ignore')


# Config
A simple config and logging setup.

In [None]:
today = datetime.now().strftime('%Y-%m-%d')
today

In [None]:
# config class
class CFG:
    """
    Set FETCH_VIA_API = True if you want to fetch the data via API.
    Otherwise we use the daily-updated one in the kaggle dataset (faster).
    """
    INPUT_DIR = '../input/yfinance-stock-price-data-for-numerai-signals'
    OUTPUT_DIR = './'
    FETCH_VIA_API = False
    SEED = 46

In [None]:
# Logging is always nice for your experiment:)
def init_logger(log_file='train.log'):
    from logging import getLogger, INFO, FileHandler,  Formatter,  StreamHandler
    logger = getLogger(__name__)
    logger.setLevel(INFO)
    handler1 = StreamHandler()
    handler1.setFormatter(Formatter("%(message)s"))
    handler2 = FileHandler(filename=log_file)
    handler2.setFormatter(Formatter("%(message)s"))
    logger.addHandler(handler1)
    logger.addHandler(handler2)
    return logger

logger = init_logger(log_file=f'{CFG.OUTPUT_DIR}/{today}.log')
logger.info('Start Logging...')

# Setup Numerai API
First of all, let's set up the numerai signals API. 

We can do many things with this API: 

- get a ticker map (between yfinance data and numerai historical targets)
- get the historical targets
- get your model slot name and model_id (if private key and secret key are provided)
- submit

(well, maybe more)

## Get Tickers for Numerai Signals
Let's first get the ticker map.

In [None]:
napi = numerapi.SignalsAPI()
logger.info('numerai api setup!')

In [None]:
# read in list of active Signals tickers which can change slightly era to era
eligible_tickers = pd.Series(napi.ticker_universe(), name='ticker') 
logger.info(f"Number of eligible tickers: {len(eligible_tickers)}")

In [None]:
# read in yahoo to numerai ticker map, still a work in progress, h/t wsouza and 
# this tickermap is a work in progress and not guaranteed to be 100% correct
ticker_map = pd.read_csv('https://numerai-signals-public-data.s3-us-west-2.amazonaws.com/signals_ticker_map_w_bbg.csv')
ticker_map = ticker_map[ticker_map.bloomberg_ticker.isin(eligible_tickers)]

numerai_tickers = ticker_map['ticker']
yfinance_tickers = ticker_map['yahoo']
logger.info(f"Number of eligible tickers in map: {len(ticker_map)}")

In [None]:
print(ticker_map.shape)
ticker_map.head()

This ticker map is necessary for a successful submission if you use yfinance data.
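
To make the role of the map concrete, here is a minimal sketch of translating Yahoo-style tickers into Numerai tickers. The rows are made-up toy values, not real entries of the map; only the column names `ticker` and `yahoo` follow the map loaded above.

```python
import pandas as pd

# toy stand-in for the ticker map (values are hypothetical)
toy_map = pd.DataFrame({
    'ticker': ['AAA US', 'BBB LN', 'CCC JP'],
    'yahoo': ['AAA', 'BBB.L', 'CCC.T'],
})

# toy price table keyed by Yahoo tickers; 'ZZZ' is not in the map
toy_prices = pd.DataFrame({
    'ticker': ['AAA', 'BBB.L', 'ZZZ'],
    'close': [10.0, 20.0, 30.0],
})

# translate Yahoo tickers to Numerai tickers; unmapped tickers become NaN
yahoo_to_numerai = dict(zip(toy_map['yahoo'], toy_map['ticker']))
toy_prices['ticker'] = toy_prices['ticker'].map(yahoo_to_numerai)

# rows without a Numerai ticker cannot be matched to targets, so drop them
toy_prices = toy_prices.dropna(subset=['ticker']).reset_index(drop=True)
```

Any ticker that Yahoo knows but the map does not simply drops out, which is one reason the usable universe is smaller than the raw download.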

# Load Stock Price Data
Now is the time to get the stock price data, fetched via the [yfinance API](https://pypi.org/project/yfinance/).

The good thing with this API is that it is free of charge.

The bad thing with this API is that the data is often not complete.

For better-quality stock price data, you might consider purchasing it from [Quandl](https://www.quandl.com/data/EOD-End-of-Day-US-Stock-Prices/documentation?anchor=overview).

This is another starter using Quandl data:
https://forum.numer.ai/t/signals-plugging-in-the-data-from-quandl/2431

This is of course wonderful, but if you are a beginner, why not just start with a free one?

In [None]:
# If you want to fetch the data on your own, you can use this function...

def fetch_yfinance(ticker_map, start='2002-12-01'):
    """
    # fetch yfinance data
    :INPUT:
    - ticker_map : Numerai eligible ticker map (pd.DataFrame)
    - start : date (str)
    
    :OUTPUT:
    - full_data : pd.DataFrame ('date', 'ticker', 'close', 'raw_close', 'high', 'low', 'open', 'volume')
    """
    
    # ticker map
    numerai_tickers = ticker_map['ticker']
    yfinance_tickers = ticker_map['yahoo']

    # fetch
    raw_data = yfinance.download(
        yfinance_tickers.str.cat(sep=' '), 
        start=start, 
        threads=True
    ) 
    
    # format
    cols = ['Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume']
    full_data = raw_data[cols].stack().reset_index()
    full_data.columns = ['date', 'ticker', 'close', 'raw_close', 'high', 'low', 'open', 'volume']
    
    # map yfinance tickers to numerai tickers
    full_data['ticker'] = full_data.ticker.map(
        dict(zip(yfinance_tickers, numerai_tickers))
    )
    return full_data

In [None]:
%%time

if CFG.FETCH_VIA_API: # fetch data via api
    logger.info('Fetch data via API...may take some time...')
    !pip install yfinance==0.1.62
    !pip install simplejson
    import yfinance
    import simplejson
    
    df = fetch_yfinance(ticker_map, start='2002-12-01')
else: # loading from the kaggle dataset (https://www.kaggle.com/code1110/yfinance-stock-price-data-for-numerai-signals)
    logger.info('Load data from the kaggle dataset...')
    df = pd.read_csv(pathlib.Path(f'{CFG.INPUT_DIR}/full_data.csv'))

print(df.shape)
df.head(3)

In [None]:
df.tail(3)

## Load Targets for Numerai Signals
For supervised machine learning, we need a target label. Numerai Signals provides one, so we can just fetch it.

In [None]:
%%time

def read_numerai_signals_targets():
    # read in Signals targets
    numerai_targets = 'https://numerai-signals-public-data.s3-us-west-2.amazonaws.com/signals_train_val.csv'
    targets = pd.read_csv(numerai_targets)
    
    # to datetime int
    targets['friday_date'] = pd.to_datetime(targets['friday_date'].astype(str), format='%Y-%m-%d').dt.strftime('%Y%m%d').astype(int)
    
#     # train, valid split
#     train_targets = targets.query('data_type == "train"')
#     valid_targets = targets.query('data_type == "validation"')
    
    return targets

targets = read_numerai_signals_targets()

In [None]:
print(targets.shape, targets['friday_date'].min(), targets['friday_date'].max())
targets.head()

In [None]:
targets.tail()

In [None]:
# there are train and validation...
fig, ax = plt.subplots(1, 2, figsize=(16, 4))
ax = ax.flatten()

for i, data_type in enumerate(['train', 'validation']):
    # slice
    targets_ = targets.query(f'data_type == "{data_type}"')
    logger.info('*' * 50)
    logger.info('{} target: {:,} tickers (friday_date: {} - {})'.format(
        data_type, 
        targets_['ticker'].nunique(),
        targets_['friday_date'].min(),
        targets_['friday_date'].max(),
    ))
    
    # plot target
    ax[i].hist(targets_['target'])
    ax[i].set_title(f'{data_type}')

The target looks exactly like the one from the Numerai Tournament, where both features and targets are given to the participants.

Also note that the train-validation split is based on time (i.e., Time-Series Split):

- train friday_date: 20030131 - 20121228
- validation friday_date: 20130104 - 20200228
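
As a rough illustration of what a time-based split looks like, here is a minimal sketch on made-up rows. The `cutoff` value is a hypothetical boundary chosen to sit between the two periods listed above; the real data already carries the split in its `data_type` column.

```python
import pandas as pd

# toy targets (values are made up)
toy = pd.DataFrame({
    'friday_date': [20121221, 20121228, 20130104, 20130111],
    'target': [0.25, 0.5, 0.75, 0.5],
})

cutoff = 20130101  # hypothetical boundary between the two periods
train = toy[toy['friday_date'] < cutoff]
valid = toy[toy['friday_date'] >= cutoff]

# a time-based split never lets a validation date precede a training date
assert train['friday_date'].max() < valid['friday_date'].min()
```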

## Check Ticker Overlaps
Let's see if we have enough overlap of tickers between our yfinance stock data and the numerai targets. We need at least 5 tickers for submission.
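
If you prefer numbers to a Venn diagram, the same overlap can be sketched with plain Python sets. The ticker names here are made up for illustration.

```python
# toy ticker sets (hypothetical values)
price_tickers = {'AAA US', 'BBB LN', 'CCC JP', 'DDD US'}
target_tickers = {'BBB LN', 'CCC JP', 'EEE GY'}

common = price_tickers & target_tickers        # usable for supervised learning
price_only = price_tickers - target_tickers    # prices but no target
target_only = target_tickers - price_tickers   # target but no prices

print(len(common), len(price_only), len(target_only))  # 2 2 1
```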

In [None]:
# ticker overlap
venn3(
    [
        set(df['ticker'].unique().tolist())
        , set(targets.query('data_type == "train"')['ticker'].unique().tolist())
        , set(targets.query('data_type == "validation"')['ticker'].unique().tolist())
    ],
    set_labels=('yf price', 'train target', 'valid target')
)

Ah, yeah, not bad, I guess? 

Here I keep only the stock price data for tickers that also appear in the targets, so that we can build a supervised machine learning model.

In [None]:
# keep only tickers that have targets
df = df.loc[df['ticker'].isin(targets['ticker'])].reset_index(drop=True)

print('{:,} tickers: {:,} records'.format(df['ticker'].nunique(), len(df)))

As I mentioned earlier, the yfinance stock data is not complete. Let's see if we have enough records per ticker.

In [None]:
record_per_ticker = df.groupby('ticker')['date'].nunique().reset_index().sort_values(by='date')
record_per_ticker

In [None]:
record_per_ticker['date'].hist()
print(record_per_ticker['date'].describe())

There are unfortunately some tickers where the number of records is small. 

Here I only use tickers with at least 1,000 records.
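
The filtering idea, sketched on a tiny made-up table. The threshold `min_records = 2` is only for this toy example; the notebook itself uses 1,000.

```python
import pandas as pd

# toy price records (values are made up)
toy = pd.DataFrame({
    'ticker': ['AAA US'] * 3 + ['BBB LN'] * 1,
    'date': ['2021-01-04', '2021-01-05', '2021-01-06', '2021-01-04'],
})

min_records = 2  # toy threshold; the notebook uses 1,000

# count distinct dates per ticker and keep the tickers that clear the threshold
n_dates = toy.groupby('ticker')['date'].nunique()
keep = n_dates[n_dates >= min_records].index
filtered = toy[toy['ticker'].isin(keep)].reset_index(drop=True)
```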

In [None]:
tickers_with_records = record_per_ticker.query('date >= 1000')['ticker'].values
df = df.loc[df['ticker'].isin(tickers_with_records)].reset_index(drop=True)

print('Here, we use {:,} tickers: {:,} records'.format(df['ticker'].nunique(), len(df)))

# Feature Engineering
Yeah, finally the machine learning part!

Here we generate sets of stock price features. There are some caveats to be aware of:

- **No Leak**: we cannot use a feature that relies on future information (this is a forecasting task!)
- **Stationary features**: our features have to work at any point in time (their scale must stay stable over time)
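
Both caveats can be sanity-checked on a toy price series. This is a minimal sketch, not a statistical test of stationarity: it only shows that a price-to-moving-average ratio does not depend on the price level, and that a trailing window never looks ahead. The helper `ma_gap` and the prices are made up for illustration.

```python
import numpy as np
import pandas as pd

close = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 110.0])

def ma_gap(s: pd.Series, window: int = 3) -> pd.Series:
    # price divided by its trailing moving average (current and past rows only)
    return s / s.rolling(window).mean()

# scale check: multiplying the price level by 10 leaves the feature unchanged
gap = ma_gap(close)
gap_scaled = ma_gap(close * 10)

# leak check: altering the last (future) price does not change earlier values
tampered = close.copy()
tampered.iloc[-1] = 999.0
gap_tampered = ma_gap(tampered)
```

A raw price or a raw moving average would fail the first check, which is why the features below are returns, volatilities and ratios rather than price levels.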

The implementation of the feature engineering is derived from [J-Quants Tournament](https://japanexchangegroup.github.io/J-Quants-Tutorial/#anchor-2.7). Although this content is in Japanese, I believe this is one of the best resources for feature engineering in the finance domain. 

Also I add the RSI and MACD (PPO) features as a bonus:D

We generate features for each ticker in turn. To speed this up, we use parallel processing.

In [None]:
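# first, convert the date column of the yfinance stock data to an integer 'friday_date' (yyyymmdd), named to match the numerai targets
# (pd.to_datetime handles both the string dates in the csv and the timestamps returned by the API)
df['friday_date'] = pd.to_datetime(df['date']).dt.strftime('%Y%m%d').astype(int)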
df.tail(3)

Ready for feature engineering?

In [None]:
# technical indicators
def RSI(close: pd.Series, period: int = 14) -> pd.Series:
    # https://gist.github.com/jmoz/1f93b264650376131ed65875782df386
    """See source https://github.com/peerchemist/finta
    and fix https://www.tradingview.com/wiki/Talk:Relative_Strength_Index_(RSI)
    Relative Strength Index (RSI) is a momentum oscillator that measures the speed and change of price movements.
    RSI oscillates between zero and 100. Traditionally, and according to Wilder, RSI is considered overbought when above 70 and oversold when below 30.
    Signals can also be generated by looking for divergences, failure swings and centerline crossovers.
    RSI can also be used to identify the general trend."""

    delta = close.diff()

    up, down = delta.copy(), delta.copy()
    up[up < 0] = 0
    down[down > 0] = 0

    _gain = up.ewm(com=(period - 1), min_periods=period).mean()
    _loss = down.abs().ewm(com=(period - 1), min_periods=period).mean()

    RS = _gain / _loss
    return pd.Series(100 - (100 / (1 + RS)))

def EMA1(x, n):
    """
    https://qiita.com/MuAuan/items/b08616a841be25d29817
    """
    a= 2/(n+1)
    return pd.Series(x).ewm(alpha=a).mean()

def MACD(close: pd.Series, span1=12, span2=26, span3=9):
    """
    Compute MACD
    # https://www.learnpythonwithrune.org/pandas-calculate-the-moving-average-convergence-divergence-macd-for-a-stock/
    """
    exp1 = EMA1(close, span1)
    exp2 = EMA1(close, span2)
    macd = exp1 - exp2
    signal = EMA1(macd, span3)

    return macd, signal

def feature_engineering(ticker='ZEAL DC', df=df):
    """
    feature engineering
    
    :INPUTS:
    - ticker : numerai ticker name (str)
    - df : yfinance dataframe (pd.DataFrame)
    
    :OUTPUTS:
    - feature_df : feature engineered dataframe (pd.DataFrame)
    """
    # init
    keys = ['friday_date', 'ticker']
    feature_df = df.query(f'ticker == "{ticker}"').copy() # copy so that new columns are not written to a view of df
    
    # price features
    new_feats = []
    for i, f in enumerate(['close', ]):
        for x in [20, 40, 60, ]:
            # return
            feature_df[f"{f}_return_{x}days"] = feature_df[
                f
            ].pct_change(x)

            # volatility
            feature_df[f"{f}_volatility_{x}days"] = (
                np.log1p(feature_df[f])
                .pct_change()
                .rolling(x)
                .std()
            )
        
            # gap from the moving average (kairi)
            feature_df[f"{f}_MA_gap_{x}days"] = feature_df[f] / (
                feature_df[f].rolling(x).mean()
            )
            
            # features to use
            new_feats += [
                f"{f}_return_{x}days", 
                f"{f}_volatility_{x}days",
                f"{f}_MA_gap_{x}days",
                         ]

    # RSI
    feature_df['RSI'] = RSI(feature_df['close'], 14)

    # MACD
    macd, macd_signal = MACD(feature_df['close'], 12, 26, 9) 
    feature_df['MACD'] = 100 * macd / macd_signal

    new_feats += ['RSI', 'MACD', ]

    # only new feats
    feature_df = feature_df[new_feats + keys]

    # fill nan
    feature_df = feature_df.ffill() # forward fill only uses past values, so it is safe for a forecasting task
    feature_df = feature_df.bfill() # fills the warm-up rows at the start of each series (note: this borrows later values)

    return feature_df

def add_features(df):
    # FE with multiprocessing
    tickers = df['ticker'].unique().tolist()
    print('FE for {:,} stocks...using {:,} CPUs...'.format(len(tickers), cpu_count()))
    start_time = time.time()
    with Pool(cpu_count()) as p:
        feature_dfs = list(tqdm(p.imap(feature_engineering, tickers), total=len(tickers)))
    return pd.concat(feature_dfs)

In [None]:
%%time

feature_df = add_features(df)
del df
gc.collect()

In [None]:
print(feature_df.shape)
feature_df.head()

In [None]:
feature_df.tail()

# Merge Targets and Features
Feature engineering is done. Let's merge it with the numerai historical targets.
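
Before running the real merge, here is a minimal sketch of what a left merge does to rows without a target. The rows are made up; the column names follow the notebook. Since the features are daily and the targets exist only on Fridays, most feature rows end up with a missing target.

```python
import pandas as pd

# toy features: one Thursday row and two Friday rows
toy_features = pd.DataFrame({
    'friday_date': [20200220, 20200221, 20200221],
    'ticker': ['AAA US', 'AAA US', 'BBB LN'],
    'close_return_20days': [0.01, 0.02, -0.03],
})

# toy targets: Friday rows only
toy_targets = pd.DataFrame({
    'friday_date': [20200221, 20200221],
    'ticker': ['AAA US', 'BBB LN'],
    'data_type': ['validation', 'validation'],
    'target': [0.75, 0.25],
})

merged = toy_features.merge(toy_targets, how='left', on=['friday_date', 'ticker'])

# rows without a matching target keep NaN in 'target' and 'data_type'
labeled = merged.dropna(subset=['target'])
```

The unlabeled rows are not wasted: the most recent ones are what the model predicts on for the live submission.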

In [None]:
# do we have enough overlap with respect to 'friday_date'?
venn2([
    set(feature_df['friday_date'].astype(str).unique().tolist())
    , set(targets['friday_date'].astype(str).unique().tolist())
], set_labels=('features_days', 'targets_days'))

In [None]:
# do we have enough overlap with respect to 'ticker'?
venn2([
    set(feature_df['ticker'].astype(str).unique().tolist())
    , set(targets['ticker'].astype(str).unique().tolist())
], set_labels=('features_ticker', 'targets_ticker'))

In [None]:
# merge
feature_df['friday_date'] = feature_df['friday_date'].astype(int)
targets['friday_date'] = targets['friday_date'].astype(int)

feature_df = feature_df.merge(
    targets,
    how='left',
    on=['friday_date', 'ticker']
)

print(feature_df.shape)
feature_df.tail()

In [None]:
# save (just to make sure that we are on the safe side if yfinance is dead some day...)
feature_df.to_pickle(f'{CFG.OUTPUT_DIR}/feature_df.pkl')
feature_df.info()

We now have a merged features + target table! It seems like we are ready for modeling.

# Modeling
Yay, finally!

Here let's use XGBoost. 

The hyperparameters are taken from Integration-Test, an example model that is nonetheless a strong baseline for the Numerai Tournament.

In [None]:
target = 'target'
drops = ['data_type', target, 'friday_date', 'ticker']
features = [f for f in feature_df.columns.values.tolist() if f not in drops]

logger.info('{:,} features: {}'.format(len(features), features))

In [None]:
# train-valid split
train_set = {
    'X': feature_df.query('data_type == "train"')[features], 
    'y': feature_df.query('data_type == "train"')[target].astype(np.float64)
}
val_set = {
    'X': feature_df.query('data_type == "validation"')[features], 
    'y': feature_df.query('data_type == "validation"')[target].astype(np.float64)
}

assert train_set['y'].isna().sum() == 0
assert val_set['y'].isna().sum() == 0

In [None]:
# same parameters of the Integration-Test
import joblib
from sklearn import utils
import xgboost as xgb
import operator

params = {
    'objective': 'reg:squarederror',
    'eval_metric': 'rmse',
    'colsample_bytree': 0.1,                 
    'learning_rate': 0.01,
    'max_depth': 5,
    'seed': 46,
    'n_estimators': 2000,
#     'tree_method': 'gpu_hist' # if you want to use GPU ...
}

# define 
model = xgb.XGBRegressor(**params)

# fit
model.fit(
    train_set['X'], train_set['y'], 
    eval_set=[(val_set['X'], val_set['y'])],
    verbose=100, 
    early_stopping_rounds=100,
)

# save model
joblib.dump(model, f'{CFG.OUTPUT_DIR}/xgb_model_val.pkl')
logger.info('xgb model with early stopping saved!')

# feature importance
importance = model.get_booster().get_score(importance_type='gain')
importance = sorted(importance.items(), key=operator.itemgetter(1))
feature_importance_df = pd.DataFrame(importance, columns=['features', 'importance'])

In [None]:
# feature importance
fig, ax = plt.subplots(1, 1, figsize=(12, 10))
sns.barplot(
    x='importance', 
    y='features', 
    data=feature_importance_df.sort_values(by='importance', ascending=False),
    ax=ax
)

Looks like the 'price gap from the moving average' features are good signals!

# Validation Score
The following snippets are derived from

https://colab.research.google.com/drive/1ECh69C0LDCUnuyvEmNFZ51l_276nkQqo#scrollTo=tTBUzPep2dm3

Let's see how good our model predictions on the validation data are.

Good? It's good, isn't it?
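
As a rough illustration of what a per-era score means, here is a minimal sketch on made-up data: within each era (one `friday_date`), predictions are converted to percentile ranks and then correlated with the target. The helper `era_corr` and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# two toy eras: predictions ordered correctly in the first, reversed in the second
toy = pd.DataFrame({
    'friday_date': ['20200221'] * 4 + ['20200228'] * 4,
    'target':     [0.0, 0.25, 0.75, 1.0, 0.0, 0.25, 0.75, 1.0],
    'prediction': [0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1],
})

def era_corr(group: pd.DataFrame) -> float:
    # rank predictions within the era, then correlate the ranks with the target
    ranked = group['prediction'].rank(pct=True, method='first')
    return float(np.corrcoef(group['target'], ranked)[0, 1])

era_scores = pd.Series({
    era: era_corr(group) for era, group in toy.groupby('friday_date')
})
```

The first era scores close to +1 and the second close to -1, so the mean is about zero: a model can look great in one era and terrible in the next, which is why the mean, the standard deviation and the hit rate are all reported.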

In [None]:
# https://colab.research.google.com/drive/1ECh69C0LDCUnuyvEmNFZ51l_276nkQqo#scrollTo=tTBUzPep2dm3

def score(df, target_name=target, pred_name='prediction'):
    '''Takes df and calculates the correlation between the target and the percentile-ranked prediction (Spearman-like)'''
    # method="first" breaks ties based on order in array
    return np.corrcoef(
        df[target_name],
        df[pred_name].rank(pct=True, method="first")
    )[0,1]

def run_analytics(era_scores):
    print(f"Mean Correlation: {era_scores.mean():.4f}")
    print(f"Median Correlation: {era_scores.median():.4f}")
    print(f"Standard Deviation: {era_scores.std():.4f}")
    print('\n')
    print(f"Mean Pseudo-Sharpe: {era_scores.mean()/era_scores.std():.4f}")
    print(f"Median Pseudo-Sharpe: {era_scores.median()/era_scores.std():.4f}")
    print('\n')
    print(f'Hit Rate (% positive eras): {(era_scores > 0).mean():.2%}') # also works when there is no positive era

    era_scores.rolling(10).mean().plot(kind='line', title='Rolling Per Era Correlation Mean', figsize=(15,4))
    plt.axhline(y=0.0, color="r", linestyle="--"); plt.show()

    era_scores.cumsum().plot(title='Cumulative Sum of Era Scores', figsize=(15,4))
    plt.axhline(y=0.0, color="r", linestyle="--"); plt.show()

In [None]:
# prediction for the validation set
valid_sub = feature_df.query('data_type == "validation"')[drops].copy()
valid_sub['prediction'] = model.predict(val_set['X'])

# compute score
val_era_scores = valid_sub.copy()
val_era_scores['friday_date'] = val_era_scores['friday_date'].astype(str)
val_era_scores = val_era_scores.loc[val_era_scores['prediction'].notna()].groupby(['friday_date']).apply(score)
run_analytics(val_era_scores)

Well, I guess it is fairly good as a starter, isn't it?

# Submission
Let's use this trained model to make a submission for the Numerai Signals.

Note that, again, yfinance data is not complete. Sometimes there is no recent data available for many tickers;(

We need at least 5 tickers for a successful submission. Let's first check that at least 5 tickers have data on the most recent Friday.
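
For reference, the "most recent Friday" can also be computed with the standard library alone. This is a minimal sketch that assumes a Friday counts as its own most recent Friday; `most_recent_friday` is a hypothetical helper, and the dates are fixed so the result is reproducible.

```python
from datetime import date, timedelta

def most_recent_friday(day: date) -> date:
    # date.weekday(): Monday is 0 ... Friday is 4
    days_since_friday = (day.weekday() - 4) % 7
    return day - timedelta(days=days_since_friday)

wednesday = date(2021, 3, 10)
friday = date(2021, 3, 12)

print(most_recent_friday(wednesday))  # 2021-03-05
print(most_recent_friday(friday))     # 2021-03-12 (a Friday maps to itself)

# same integer format as the 'friday_date' column
as_int = int(most_recent_friday(wednesday).strftime('%Y%m%d'))
```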

In [None]:
# recent friday date?
recent_friday = datetime.now() + relativedelta(weekday=FR(-1))
recent_friday = int(recent_friday.strftime('%Y%m%d'))
print(f'Most recent Friday: {recent_friday}')

In [None]:
# in case no recent friday is available...prep the second last
recent_friday2 = datetime.now() + relativedelta(weekday=FR(-2))
recent_friday2 = int(recent_friday2.strftime('%Y%m%d'))
print(f'Second most recent Friday: {recent_friday2}')

In [None]:
# do we have at least 5 tickers with a record on the most recent Friday?
# (match on the date itself: a ticker's latest date may already be past Friday)
ticker_date_df = feature_df.loc[feature_df['friday_date'] == recent_friday, ['ticker', 'friday_date']].drop_duplicates()
if len(ticker_date_df) < 5: # otherwise fall back to the second most recent Friday
    recent_friday = recent_friday2
    ticker_date_df = feature_df.loc[feature_df['friday_date'] == recent_friday, ['ticker', 'friday_date']].drop_duplicates()
    
print(len(ticker_date_df))
ticker_date_df

Good! That's enough. So we perform the inference only on those tickers and submit!

In [None]:
# live sub
feature_df.loc[feature_df['friday_date'] == recent_friday, 'data_type'] = 'live'
test_sub = feature_df.query('data_type == "live"')[drops].copy()
test_sub['prediction'] = model.predict(feature_df.query('data_type == "live"')[features])

logger.info(test_sub.shape)
test_sub.head()

In [None]:
# histogram of prediction
test_sub['prediction'].hist(bins=100)

Let's submit! A nice thing about Numerai Signals is that if you also submit predictions on the validation data, the website shows more information about your model's performance, such as APY.
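
Before uploading, a few cheap sanity checks can save a failed submission. The sketch below is a hypothetical helper, not part of numerapi: it checks the four columns used in this notebook, rejects NaN signals, and applies the 5-ticker minimum to the live rows (that the minimum applies to live rows specifically is my assumption).

```python
import pandas as pd

def check_submission(sub: pd.DataFrame, min_tickers: int = 5) -> None:
    # hypothetical helper: basic sanity checks before uploading
    expected_cols = ['ticker', 'friday_date', 'data_type', 'signal']
    missing = [c for c in expected_cols if c not in sub.columns]
    if missing:
        raise ValueError(f'missing columns: {missing}')
    if sub['signal'].isna().any():
        raise ValueError('signal contains NaN')
    n_live = sub.loc[sub['data_type'] == 'live', 'ticker'].nunique()
    if n_live < min_tickers:
        raise ValueError(f'only {n_live} live tickers, need {min_tickers}')

# toy submission (values are made up)
toy_sub = pd.DataFrame({
    'ticker': [f'T{i} US' for i in range(5)],
    'friday_date': [20210305] * 5,
    'data_type': ['live'] * 5,
    'signal': [0.48, 0.49, 0.50, 0.51, 0.52],
})
check_submission(toy_sub)  # passes silently
```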

In [None]:
# To submit, you need a Numerai account with an API ID and secret key, plus at least one Numerai Signals model slot.
def submit_signal(sub: pd.DataFrame, public_id: str, secret_key: str, slot_name: str):
    """
    submit numerai signals prediction
    """
    # setup private API
    napi = numerapi.SignalsAPI(public_id, secret_key)
    
    # write predictions to csv
    model_id = napi.get_models()[f'{slot_name}']
    filename = f"sub_{model_id}.csv"
    sub.to_csv(filename, index=False)
    
    # submit
    submission = napi.upload_predictions(filename, model_id=model_id)
    print(f'Submitted : {slot_name}!')
    
# concat valid and test 
sub = pd.concat([valid_sub, test_sub], ignore_index=True)

# rename to 'signal'
sub.rename(columns={'prediction': 'signal'}, inplace=True)

# select necessary columns
sub = sub[['ticker', 'friday_date', 'data_type', 'signal']]

public_id = '<Your Numerai API ID>'
secret_key = '<Your Numerai Secret Key>'
slot_name = '<Your Numerai Signals Submission Slot Name>'
# submit_signal(sub, public_id, secret_key, slot_name) # uncomment if you submit

# save 
sub.to_csv(f'{CFG.OUTPUT_DIR}/example_submission_{today}.csv', index=False)

In [None]:
sub.head()

In [None]:
sub.tail()

ALL DONE!

Of course, this is just another starter, and there is plenty of room for improvement.

- Feature engineering (more stock price and volume features? Be careful about leaks and non-stationarity)
- Modeling (another model? another validation strategy? training period?)
- Target (Should we simply use the historical targets? Can we make a new one?)
- Dataset (is yfinance sufficient for stable weekly performance?)

...potentially more...

Have fun with Numerai Signals!