
The feature names should match those that were passed during fit #116

Closed
ll550 opened this issue Apr 16, 2024 · 6 comments

Comments

@ll550

ll550 commented Apr 16, 2024

The code behind ctx.preds swaps the order of the feature names in the tuple, e.g. (macd, macds, macdh) -> (macd, macdh, macds). This leads to the problem described in the title.

My solution is simple: in the train_slr function, reorder the feature columns the same way ctx.preds does.

The problem is caused by sklearn validating the feature names between fit and predict. I tried to switch off this validation but failed. Hope this is useful for someone else.

@edtechre
Owner

Thank you for reporting, @ll550! Does this happen when using a custom predict_fn?

@ll550
Author

ll550 commented Apr 16, 2024

@edtechre No, I just made a small extension to the example.

@edtechre
Owner

edtechre commented Apr 17, 2024 via email

@ll550
Author

ll550 commented Apr 17, 2024

@edtechre
OK, I used the linear regression example from your repo, but with multiple input features, as follows:

Please note the order of [macd, macds, macdh]:

model_slr = pb.model(name='slr', fn=train_slr, indicators=[macd,macds,macdh])

def train_slr(symbol, train_data, test_data):
    # Previous day close prices.
    train_prev_close = train_data['close'].shift(1)
    # Calculate daily returns.
    train_daily_returns = (train_data['close'] - train_prev_close) / train_prev_close
    # Predict next day's return.
    train_data['pred'] = train_daily_returns.shift(-1)
    train_data = train_data.dropna()
    # Train the LinearRegression model to predict the next day's return
    # given the MACD features.
#####################################
    X_train = train_data[['macd','macdh','macds']]
##########################################
    y_train = train_data[['pred']]
    model = LinearRegression()
    model.fit(X_train, y_train)


    # Test
    test_prev_close = test_data['close'].shift(1)
    test_daily_returns = (test_data['close'] - test_prev_close) / test_prev_close
    test_data['pred'] = test_daily_returns.shift(-1)
    test_data = test_data.dropna()
######################################
    X_test = test_data[['macd','macdh','macds']]
######################################
    y_test = test_data[['pred']]
    # Make predictions from test data.
    y_pred = model.predict(X_test)

    # Print goodness of fit.
    r2 = r2_score(y_test, np.squeeze(y_pred))
    print(symbol, f'R^2={r2}')

    # Return the trained model.
    return model

If I use the same order of the MACD indicators as in pb.model, ctx.preds in the following function raises the error, even though training the linear regression works fine.

def hold_long(ctx):
    if not ctx.long_pos():
        # Buy if the next bar is predicted to have a positive return:
        if ctx.preds('slr')[-1]> 0:
            ctx.buy_shares = 100
    else:
        # Sell if the next bar is predicted to have a negative return:
        #print(ctx.preds('slr'))
        if ctx.preds('slr')[-1] < 0:
            ctx.sell_shares = 100

Hope this helps.

@edtechre
Owner

edtechre commented Apr 20, 2024

I wasn't able to reproduce the error using the code below. Also note that there is no way for the framework to fix this, since the column order depends on the order in which the columns are accessed in the train_slr function, which is user-provided code.

import numpy as np
import pandas as pd
import pybroker
from pybroker import Strategy, StrategyConfig, YFinance

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import talib as ta

pybroker.enable_data_source_cache('walkforward_strategy')

macd = pybroker.indicator('macd', lambda d: ta.MACD(d.close, fastperiod=20, slowperiod=20, signalperiod=20)[0])
macds = pybroker.indicator('macdh', lambda d: ta.MACD(d.close, fastperiod=20, slowperiod=20, signalperiod=20)[1])
macdh = pybroker.indicator('macds', lambda d: ta.MACD(d.close, fastperiod=20, slowperiod=20, signalperiod=20)[2])

def train_slr(symbol, train_data, test_data):
    # Previous day close prices.
    train_prev_close = train_data['close'].shift(1)
    # Calculate daily returns.
    train_daily_returns = (train_data['close'] - train_prev_close) / train_prev_close
    # Predict next day's return.
    train_data['pred'] = train_daily_returns.shift(-1)
    train_data = train_data.dropna()
    # Train the LinearRegression model to predict the next day's return
    # given the MACD features.
#####################################
    X_train = train_data[['macd','macdh','macds']]
##########################################
    y_train = train_data[['pred']]
    model = LinearRegression()
    model.fit(X_train, y_train)


    # Test
    test_prev_close = test_data['close'].shift(1)
    test_daily_returns = (test_data['close'] - test_prev_close) / test_prev_close
    test_data['pred'] = test_daily_returns.shift(-1)
    test_data = test_data.dropna()
######################################
    X_test = test_data[['macd','macdh','macds']]
######################################
    y_test = test_data[['pred']]
    # Make predictions from test data.
    y_pred = model.predict(X_test)

    # Print goodness of fit.
    r2 = r2_score(y_test, np.squeeze(y_pred))
    print(symbol, f'R^2={r2}')

    # Return the trained model.
    return model

def hold_long(ctx):
    if not ctx.long_pos():
        # Buy if the next bar is predicted to have a positive return:
        if ctx.preds('slr')[-1]> 0:
            ctx.buy_shares = 100
    else:
        # Sell if the next bar is predicted to have a negative return:
        if ctx.preds('slr')[-1] < 0:
            ctx.sell_shares = 100

model_slr = pybroker.model('slr', train_slr, indicators=[macds, macd, macdh])

config = StrategyConfig(bootstrap_sample_size=100)
strategy = Strategy(YFinance(), '3/1/2017', '3/1/2022', config)
strategy.clear_executions()
strategy.add_execution(hold_long, ['NVDA', 'AMD'], models=model_slr)

strategy.backtest(train_size=0.5)

@ll550
Author

ll550 commented Apr 20, 2024

For anyone interested in this issue, I have pasted the failing version and the working version below for comparison. Thanks to @edtechre for the code; I believe he is correct. If you use more features, the ordering will differ from this case. You need to find the column order that ctx.preds passes to the model (it appears to be alphabetical) and use the same order in the train_slr function.
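Here is a minimal sketch of that workaround. It assumes ctx.preds passes the indicator columns in alphabetical order, as observed in this thread but not confirmed from pybroker's source. Sorting the names once and reusing that list for both X_train and X_test keeps the two aligned, however many features are added. FEATURES and select_features are hypothetical names.

```python
import pandas as pd

# Assumption: pybroker feeds the prediction input with indicator columns in
# alphabetical order, as observed in this issue.
FEATURES = sorted(["macd", "macds", "macdh"])


def select_features(df: pd.DataFrame, features=FEATURES) -> pd.DataFrame:
    """Return only the feature columns, in the (sorted) order used for fit."""
    return df[list(features)]


# Inside train_slr, the same helper would be used for both splits:
#     X_train = select_features(train_data)
#     X_test = select_features(test_data)
frame = pd.DataFrame({"macds": [1.0], "close": [10.0], "macd": [2.0], "macdh": [3.0]})
print(list(select_features(frame).columns))  # ['macd', 'macdh', 'macds']
```

The sorted list matches the column order used in the working version below.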

This is my erroneous version:

import numpy as np
import pandas as pd
import pybroker
from pybroker import Strategy, StrategyConfig, YFinance

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import talib as ta

pybroker.enable_data_source_cache('walkforward_strategy')

macd = pybroker.indicator('macd', lambda d: ta.MACD(d.close, fastperiod=20, slowperiod=20, signalperiod=20)[0])
macds = pybroker.indicator('macds', lambda d: ta.MACD(d.close, fastperiod=20, slowperiod=20, signalperiod=20)[1])
macdh = pybroker.indicator('macdh', lambda d: ta.MACD(d.close, fastperiod=20, slowperiod=20, signalperiod=20)[2])

def train_slr(symbol, train_data, test_data):
    # Previous day close prices.
    train_prev_close = train_data['close'].shift(1)
    # Calculate daily returns.
    train_daily_returns = (train_data['close'] - train_prev_close) / train_prev_close
    # Predict next day's return.
    train_data['pred'] = train_daily_returns.shift(-1)
    train_data = train_data.dropna()
    # Train the LinearRegression model to predict the next day's return
    # given the MACD features.
#####################################
    X_train = train_data[['macd','macds','macdh']]
##########################################
    y_train = train_data[['pred']]
    model = LinearRegression()
    model.fit(X_train, y_train)


    # Test
    test_prev_close = test_data['close'].shift(1)
    test_daily_returns = (test_data['close'] - test_prev_close) / test_prev_close
    test_data['pred'] = test_daily_returns.shift(-1)
    test_data = test_data.dropna()
######################################
    X_test = test_data[['macd','macds','macdh']]
######################################
    y_test = test_data[['pred']]
    # Make predictions from test data.
    y_pred = model.predict(X_test)

    # Print goodness of fit.
    r2 = r2_score(y_test, np.squeeze(y_pred))
    print(symbol, f'R^2={r2}')

    # Return the trained model.
    return model

def hold_long(ctx):
    if not ctx.long_pos():
        # Buy if the next bar is predicted to have a positive return:
        if ctx.preds('slr')[-1]> 0:
            ctx.buy_shares = 100
    else:
        # Sell if the next bar is predicted to have a negative return:
        if ctx.preds('slr')[-1] < 0:
            ctx.sell_shares = 100

model_slr = pybroker.model('slr', train_slr, indicators=[macd, macds, macdh])

config = StrategyConfig(bootstrap_sample_size=100)
strategy = Strategy(YFinance(), '3/1/2017', '3/1/2022', config)
strategy.clear_executions()
strategy.add_execution(hold_long, ['NVDA', 'AMD'], models=model_slr)

strategy.backtest(train_size=0.5)

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[219], line 69
66 strategy.clear_executions()
67 strategy.add_execution(hold_long, ['NVDA', 'AMD'], models=model_slr)
---> 69 strategy.backtest(train_size=0.5)

File D:\ProgramData\miniconda3\envs\stock2\Lib\site-packages\pybroker\strategy.py:1083, in Strategy.backtest(self, start_date, end_date, timeframe, between_time, days, lookahead, train_size, shuffle, calc_bootstrap, disable_parallel, warmup, portfolio)
1014 def backtest(
1015 self,
1016 start_date: Optional[Union[str, datetime]] = None,
(...)
1027 portfolio: Optional[Portfolio] = None,
1028 ) -> TestResult:
1029 """Backtests the trading strategy by running executions that were added
1030 with :meth:.add_execution.
1031
(...)
1081 history, and evaluation metrics.
1082 """
-> 1083 return self.walkforward(
1084 windows=1,
1085 lookahead=lookahead,
1086 start_date=start_date,
1087 end_date=end_date,
1088 timeframe=timeframe,
1089 between_time=between_time,
1090 days=days,
1091 train_size=train_size,
1092 shuffle=shuffle,
1093 calc_bootstrap=calc_bootstrap,
1094 disable_parallel=disable_parallel,
1095 warmup=warmup,
1096 portfolio=portfolio,
1097 )

File D:\ProgramData\miniconda3\envs\stock2\Lib\site-packages\pybroker\strategy.py:1239, in Strategy.walkforward(self, windows, lookahead, start_date, end_date, timeframe, between_time, days, train_size, shuffle, calc_bootstrap, disable_parallel, warmup, portfolio)
1229 if portfolio is None:
1230 portfolio = Portfolio(
1231 self._config.initial_cash,
1232 self._config.fee_mode,
(...)
1237 self._config.max_short_positions,
1238 )
-> 1239 signals = self._run_walkforward(
1240 portfolio=portfolio,
1241 df=df,
1242 indicator_data=indicator_data,
1243 tf_seconds=tf_seconds,
1244 between_time=between_time,
1245 days=day_ids,
1246 windows=windows,
1247 lookahead=lookahead,
1248 train_size=train_size,
1249 shuffle=shuffle,
1250 train_only=train_only,
1251 warmup=warmup,
1252 )
1253 if train_only:
1254 self._logger.walkforward_completed()

File D:\ProgramData\miniconda3\envs\stock2\Lib\site-packages\pybroker\strategy.py:1349, in Strategy._run_walkforward(self, portfolio, df, indicator_data, tf_seconds, between_time, days, windows, lookahead, train_size, shuffle, train_only, warmup)
1347 if test_data.empty:
1348 return signals
-> 1349 split_signals = self.backtest_executions(
1350 config=self._config,
1351 executions=self._executions,
1352 before_exec_fn=self._before_exec_fn,
1353 after_exec_fn=self._after_exec_fn,
1354 sessions=sessions,
1355 models=models,
1356 indicator_data=indicator_data,
1357 test_data=test_data,
1358 portfolio=portfolio,
1359 pos_size_handler=self._pos_size_handler,
1360 exit_dates=exit_dates,
1361 train_only=train_only,
1362 slippage_model=self._slippage_model,
1363 enable_fractional_shares=self._fractional_shares_enabled(),
1364 round_fill_price=self._config.round_fill_price,
1365 warmup=warmup,
1366 )
1367 for sym, signals_df in split_signals.items():
1368 if sym in signals:

File D:\ProgramData\miniconda3\envs\stock2\Lib\site-packages\pybroker\strategy.py:312, in BacktestMixin.backtest_executions(self, config, executions, before_exec_fn, after_exec_fn, sessions, models, indicator_data, test_data, portfolio, pos_size_handler, exit_dates, train_only, slippage_model, enable_fractional_shares, round_fill_price, warmup)
310 for sym, ctx in active_ctxs.items():
311 if sym in exec_fns:
--> 312 exec_fns[sym](ctx)
313 if after_exec_fn is not None and active_ctxs:
314 after_exec_fn(active_ctxs)

Cell In[219], line 55, in hold_long(ctx)
52 def hold_long(ctx):
53 if not ctx.long_pos():
54 # Buy if the next bar is predicted to have a positive return:
---> 55 if ctx.preds('slr')[-1]> 0:
56 ctx.buy_shares = 100
57 else:
58 # Sell if the next bar is predicted to have a negative return:

File D:\ProgramData\miniconda3\envs\stock2\Lib\site-packages\pybroker\context.py:937, in ExecContext.preds(self, model_name, symbol)
925 r"""Returns model predictions.
926
927 Args:
(...)
934 up to the current bar. Sorted in ascending chronological order.
935 """
936 symbol = self._get_symbol(symbol)
--> 937 return super().preds(model_name, symbol)

File D:\ProgramData\miniconda3\envs\stock2\Lib\site-packages\pybroker\context.py:325, in BaseContext.preds(self, model_name, symbol)
314 r"""Returns model predictions.
315
316 Args:
(...)
322 up to the current bar. Sorted in ascending chronological order.
323 """
324 end_index = self._sym_end_index[symbol]
--> 325 return self._pred_scope.fetch(symbol, model_name, end_index)

File D:\ProgramData\miniconda3\envs\stock2\Lib\site-packages\pybroker\scope.py:495, in PredictionScope.fetch(self, symbol, name, end_index)
493 predict_fn = getattr(trained_model.instance, "predict", None)
494 if predict_fn is not None and callable(predict_fn):
--> 495 pred = trained_model.instance.predict(input_)
496 else:
497 raise ValueError(
498 f"Model instance trained for {model_sym.model_name!r} "
499 "does not define a predict function. Please pass a "
500 "predict_fn to pybroker.model()."
501 )

File D:\ProgramData\miniconda3\envs\stock2\Lib\site-packages\sklearn\linear_model\_base.py:286, in LinearModel.predict(self, X)
272 def predict(self, X):
273 """
274 Predict using the linear model.
275
(...)
284 Returns predicted values.
285 """
--> 286 return self._decision_function(X)

File D:\ProgramData\miniconda3\envs\stock2\Lib\site-packages\sklearn\linear_model\_base.py:269, in LinearModel._decision_function(self, X)
266 def _decision_function(self, X):
267 check_is_fitted(self)
--> 269 X = self._validate_data(X, accept_sparse=["csr", "csc", "coo"], reset=False)
270 return safe_sparse_dot(X, self.coef_.T, dense_output=True) + self.intercept_

File D:\ProgramData\miniconda3\envs\stock2\Lib\site-packages\sklearn\base.py:608, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, cast_to_ndarray, **check_params)
537 def _validate_data(
538 self,
539 X="no_validation",
(...)
544 **check_params,
545 ):
546 """Validate input data and set or check the n_features_in_ attribute.
547
548 Parameters
(...)
606 validated.
607 """
--> 608 self._check_feature_names(X, reset=reset)
610 if y is None and self._get_tags()["requires_y"]:
611 raise ValueError(
612 f"This {self.__class__.__name__} estimator "
613 "requires y to be passed, but the target y is None."
614 )

File D:\ProgramData\miniconda3\envs\stock2\Lib\site-packages\sklearn\base.py:535, in BaseEstimator._check_feature_names(self, X, reset)
530 if not missing_names and not unexpected_names:
531 message += (
532 "Feature names must be in the same order as they were in fit.\n"
533 )
--> 535 raise ValueError(message)

ValueError: The feature names should match those that were passed during fit.
Feature names must be in the same order as they were in fit.

This is my working version:

import numpy as np
import pandas as pd
import pybroker
from pybroker import Strategy, StrategyConfig, YFinance

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import talib as ta

pybroker.enable_data_source_cache('walkforward_strategy')

macd = pybroker.indicator('macd', lambda d: ta.MACD(d.close, fastperiod=20, slowperiod=20, signalperiod=20)[0])
macds = pybroker.indicator('macds', lambda d: ta.MACD(d.close, fastperiod=20, slowperiod=20, signalperiod=20)[1])
macdh = pybroker.indicator('macdh', lambda d: ta.MACD(d.close, fastperiod=20, slowperiod=20, signalperiod=20)[2])

def train_slr(symbol, train_data, test_data):
    # Previous day close prices.
    train_prev_close = train_data['close'].shift(1)
    # Calculate daily returns.
    train_daily_returns = (train_data['close'] - train_prev_close) / train_prev_close
    # Predict next day's return.
    train_data['pred'] = train_daily_returns.shift(-1)
    train_data = train_data.dropna()
    # Train the LinearRegression model to predict the next day's return
    # given the MACD features.
#####################################
    X_train = train_data[['macd','macdh','macds']]
##########################################
    y_train = train_data[['pred']]
    model = LinearRegression()
    model.fit(X_train, y_train)


    # Test
    test_prev_close = test_data['close'].shift(1)
    test_daily_returns = (test_data['close'] - test_prev_close) / test_prev_close
    test_data['pred'] = test_daily_returns.shift(-1)
    test_data = test_data.dropna()
######################################
    X_test = test_data[['macd','macdh','macds']]
######################################
    y_test = test_data[['pred']]
    # Make predictions from test data.
    y_pred = model.predict(X_test)

    # Print goodness of fit.
    r2 = r2_score(y_test, np.squeeze(y_pred))
    print(symbol, f'R^2={r2}')

    # Return the trained model.
    return model

def hold_long(ctx):
    if not ctx.long_pos():
        # Buy if the next bar is predicted to have a positive return:
        if ctx.preds('slr')[-1]> 0:
            ctx.buy_shares = 100
    else:
        # Sell if the next bar is predicted to have a negative return:
        if ctx.preds('slr')[-1] < 0:
            ctx.sell_shares = 100

model_slr = pybroker.model('slr', train_slr, indicators=[macd, macds, macdh])

config = StrategyConfig(bootstrap_sample_size=100)
strategy = Strategy(YFinance(), '3/1/2017', '3/1/2022', config)
strategy.clear_executions()
strategy.add_execution(hold_long, ['NVDA', 'AMD'], models=model_slr)

strategy.backtest(train_size=0.5)
