<a target="_blank" href="https://colab.research.google.com/github/AI4Finance-Foundation/FinRL-Tutorials/blob/master/1-Introduction/Stock_NeurIPS2018_call_func_rolling_window_SB3.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Deep Reinforcement Learning for Stock Trading from Scratch: Multiple Stock Trading

* **PyTorch Version**



# Content

<a id='0'></a>
Task Description

We train a DRL agent for stock trading. The task is modeled as a Markov Decision Process (MDP), and the objective is to maximize the expected cumulative return.

We specify the state-action-reward as follows:

* **State s**: The state space represents an agent's perception of the market environment. Just like a human trader analyzing various information, here our agent passively observes many features and learns by interacting with the market environment (usually by replaying historical data).

* **Action a**: The action space includes the allowed actions that an agent can take at each state. For example, a ∈ {−1, 0, 1}, where −1, 0, and 1 represent selling, holding, and buying. When an action trades multiple shares, a ∈ {−k, ..., −1, 0, 1, ..., k}; e.g., "Buy 10 shares of AAPL" and "Sell 10 shares of AAPL" correspond to 10 and −10, respectively.

* **Reward function r(s, a, s′)**: The reward is the incentive for an agent to learn a better policy. For example, it can be the change in portfolio value when taking action a at state s and arriving at the new state s′, i.e., r(s, a, s′) = v′ − v, where v′ and v are the portfolio values at states s′ and s, respectively.
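
As a concrete illustration of the reward defined above, the following minimal sketch computes the portfolio value before and after one step and takes the difference. The helper `portfolio_value` and all numbers are hypothetical, and transaction costs are ignored for simplicity.

```python
def portfolio_value(cash, shares, prices):
    """Cash plus the market value of all held shares."""
    return cash + sum(n * p for n, p in zip(shares, prices))

# State s: cash balance, holdings, and prices for two stocks (made-up numbers).
cash, shares, prices = 1000.0, [10, 5], [20.0, 40.0]
v = portfolio_value(cash, shares, prices)

# Action a = [+5, -5]: buy 5 shares of stock 0, sell 5 shares of stock 1.
action = [5, -5]
cash -= sum(a * p for a, p in zip(action, prices))
shares = [n + a for n, a in zip(shares, action)]

# State s': the market moves to new prices.
new_prices = [22.0, 38.0]
v_next = portfolio_value(cash, shares, new_prices)

reward = v_next - v  # r(s, a, s') = v' - v
print(reward)
```

Note that the trade itself does not change the portfolio value in this sketch; the reward comes entirely from the price move applied to the new holdings.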


**Market environment**: the 30 constituent stocks of the Dow Jones Industrial Average (DJIA) index, as of the starting date of the testing period.


The data for this case study is obtained from the Yahoo Finance API. It contains open, high, low, and close prices and trading volume.


<a id='1'></a>
# Part 1. Install Python Packages

In [None]:
# !pip list -v
!pip install pandas-market-calendars

<a id='1.1'></a>
## 1.1. Install packages


In [None]:
## install required packages
!pip install swig
!pip install wrds
!pip install pyportfolioopt
## install finrl library
!pip install -q condacolab
import condacolab
condacolab.install()
!apt-get update -y -qq && apt-get install -y -qq cmake libopenmpi-dev python3-dev zlib1g-dev libgl1-mesa-glx swig
!pip install git+https://github.com/AI4Finance-Foundation/FinRL.git

<a id='1.3'></a>
## 1.2. Import Packages

In [None]:
from finrl import config
from finrl import config_tickers
from finrl.agents.stablebaselines3.models import DRLAgent
from finrl.config import DATA_SAVE_DIR
from finrl.config import INDICATORS
from finrl.config import RESULTS_DIR
from finrl.config import TENSORBOARD_LOG_DIR
from finrl.config import TEST_END_DATE
from finrl.config import TEST_START_DATE
from finrl.config import TRAINED_MODEL_DIR
from finrl.config_tickers import DOW_30_TICKER
from finrl.main import check_and_make_directories
from finrl.meta.data_processor import DataProcessor
from finrl.meta.data_processors.func import calc_train_trade_data
from finrl.meta.data_processors.func import calc_train_trade_starts_ends_if_rolling
from finrl.meta.data_processors.func import date2str
from finrl.meta.data_processors.func import str2date
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.meta.preprocessor.preprocessors import data_split
from finrl.meta.preprocessor.preprocessors import FeatureEngineer
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.plot import backtest_plot
from finrl.plot import backtest_stats
from finrl.plot import get_baseline
from finrl.plot import get_daily_return
from finrl.plot import plot_return
from finrl.applications.stock_trading.stock_trading_rolling_window import stock_trading_rolling_window
import sys
sys.path.append("../FinRL")

import itertools

In [25]:
!pip install finrl
!pip install transformers
!pip install yfinance
!pip install torch
!pip install stable-baselines3

Collecting elegantrl@ git+https://github.com/AI4Finance-Foundation/ElegantRL.git (from finrl)
  Cloning https://github.com/AI4Finance-Foundation/ElegantRL.git to /tmp/pip-install-rs6qmk3t/elegantrl_b746ee9013c94df5a89221dd2d01c8ee
  Running command git clone --filter=blob:none --quiet https://github.com/AI4Finance-Foundation/ElegantRL.git /tmp/pip-install-rs6qmk3t/elegantrl_b746ee9013c94df5a89221dd2d01c8ee
  Resolved https://github.com/AI4Finance-Foundation/ElegantRL.git to commit 5e828af1503098f4da046c0f12432dbd4ef8bd97
  Preparing metadata (setup.py) ... done
Collecting transformers
  Downloading transformers-4.51.3-py3-none-any.whl.metadata (38 kB)
Collecting huggingface-hub<1.0,>=0.30.0 (from transformers)
  Downloading huggingface_hub-0.31.1-py3-none-any.whl.metadata (13 kB)
Collecting regex!=2019.12.17 (from transformers)
  Downloading regex-2024.11.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (40 kB)
Collecting tokenizers<0.22,>=0.21 (from tr

<a id='1.4'></a>
# Part 2. Set Parameters and Run


In [24]:
train_start_date = "2009-01-01"
train_end_date = "2022-07-01"
trade_start_date = "2022-07-01"
trade_end_date = "2022-11-01"
rolling_window_length = 22  # num of trading days in a rolling window
if_store_actions = True
if_store_result = True
if_using_a2c = True
if_using_ddpg = True
if_using_ppo = True
if_using_sac = True
if_using_td3 = True
stock_trading_rolling_window(
    train_start_date,
    train_end_date,
    trade_start_date,
    trade_end_date,
    rolling_window_length,
    if_store_actions=if_store_actions,
    if_using_a2c=if_using_a2c,
    if_store_result=if_store_result,
    if_using_ddpg=if_using_ddpg,
    if_using_ppo=if_using_ppo,
    if_using_sac=if_using_sac,
    if_using_td3=if_using_td3,
)





[*********************100%***********************]  1 of 1 completed

Shape of DataFrame:  (101891, 8)
Successfully added technical indicators


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3481, 8)
Successfully added vix
Successfully added turbulence index
Stock Dimension: 29, State Space: 291
num_subsets_if_rolling:  4
train_starts:  ['2009-01-02', '2009-02-04', '2009-03-09', '2009-04-08']
train_ends__:  ['2022-07-01', '2022-08-03', '2022-09-02', '2022-10-05']
trade_starts:  ['2022-07-01', '2022-08-03', '2022-09-02', '2022-10-05']
trade_ends__:  ['2022-08-03', '2022-09-02', '2022-10-05', '2022-10-28']
i:  0
{'n_steps': 5, 'ent_coef': 0.01, 'learning_rate': 0.0007}
Using cpu device
Logging to results/a2c
---------------------------------------
| time/                 |             |
|    fps                | 52          |
|    iterations         | 100         |
|    time_elapsed       | 9           |
|    total_timesteps    | 500         |
| train/                |             |
|    entropy_loss       | -41.2       |
|    explained_variance | -1.19e-07   |
|    learning_rate      | 0.0007      |
|    n_updates          | 99          |
|    policy_l

KeyboardInterrupt: 
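
The logged `trade_starts` and `trade_ends__` lists show how the trade period is cut into consecutive windows: each window ends on the date where the next one starts, and the last window is shorter than the others. A minimal sketch of that splitting logic follows. It is an illustration only, not FinRL's implementation of `calc_train_trade_starts_ends_if_rolling`; the function `rolling_windows` and the day labels are hypothetical.

```python
def rolling_windows(trading_days, window_length):
    """Split an ordered list of trading days into consecutive (start, end) windows.

    Adjacent windows share a boundary day, and the last window may be shorter.
    """
    windows = []
    last = len(trading_days) - 1
    start = 0
    while start < last:
        end = min(start + window_length, last)
        windows.append((trading_days[start], trading_days[end]))
        start = end
    return windows

# Ten hypothetical trading days, split into windows of 4 days.
days = [f"day{i:02d}" for i in range(10)]
windows = rolling_windows(days, 4)
for start, end in windows:
    print(start, "->", end)
```

In the run above, `rolling_window_length = 22` plays the role of `window_length`, and the logged train windows shift forward by the same step.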

In [31]:
# !pip install finrl
# !pip install transformers
# !pip install yfinance
# !pip install torch
# !pip install stable-baselines3

# !pip install torch torchvision --upgrade --index-url https://download.pytorch.org/whl/cu121
!pip install transformers --upgrade





In [33]:
# Required imports
import pandas as pd
import numpy as np
from finrl import config
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.meta.preprocessor.preprocessors import FeatureEngineer
from finrl.agents.stablebaselines3.models import DRLAgent
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.main import check_and_make_directories
from stable_baselines3 import PPO
from transformers import pipeline  # needed by get_stock_sentiment below
import yfinance as yf

# 1. Set up directories and configurations
check_and_make_directories([config.DATA_SAVE_DIR, config.TRAINED_MODEL_DIR, config.TENSORBOARD_LOG_DIR, config.RESULTS_DIR])

# 2. Define stock universe and time period
TRAIN_START_DATE = '2010-01-01'
TRAIN_END_DATE = '2021-12-31'
TEST_START_DATE = '2022-01-01'
TEST_END_DATE = '2023-12-31'
TICKER_LIST = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META']  # Example tickers

# 3. Download stock data
def download_stock_data(start_date, end_date, ticker_list):
    df = YahooDownloader(start_date=start_date,
                        end_date=end_date,
                        ticker_list=ticker_list).fetch_data()
    return df

# 4. Feature engineering
def feature_engineering(df):
    fe = FeatureEngineer(
        use_technical_indicator=True,
        tech_indicator_list=[
            'macd', 'rsi', 'cci', 'dx'
        ]
    )
    processed = fe.preprocess_data(df)
    return processed

# 5. LLM Integration for News Analysis
def get_stock_sentiment(ticker, news_text):
    sentiment_analyzer = pipeline("text-classification",
                                model="ProsusAI/finbert")

    result = sentiment_analyzer(news_text)[0]
    # Convert sentiment to score (1-5 scale)
    sentiment_map = {
        'positive': 5,
        'neutral': 3,
        'negative': 1
    }
    return sentiment_map[result['label']]

# 6. Environment Setup
def setup_trading_env(df, stock_dim, hmax, initial_amount, transaction_cost_pct):
    env_kwargs = {
        "df": df,
        "stock_dim": stock_dim,
        "hmax": hmax,
        "initial_amount": initial_amount,
        "transaction_cost_pct": transaction_cost_pct,
        "reward_scaling": 1e-4,
        "state_space": len(df.columns),
        "action_space": stock_dim,
        "tech_indicator_list": ['macd', 'rsi', 'cci', 'dx']
    }

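    # Bug: "df" is already a key in env_kwargs, so passing df=df as well
    # raises the "multiple values for keyword argument 'df'" TypeError.
    e_train = StockTradingEnv(df=df, **env_kwargs)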
    return e_train

# 7. Model Training
def train_model(env_train, model_name="ppo"):
    agent = DRLAgent(env=env_train)

    PPO_PARAMS = {
        "n_steps": 2048,
        "ent_coef": 0.01,
        "learning_rate": 0.00025,
        "batch_size": 128
    }

    model_ppo = agent.get_model(model_name, model_kwargs=PPO_PARAMS)
    trained_ppo = agent.train_model(model=model_ppo,
                                  tb_log_name=model_name,
                                  total_timesteps=50000)
    return trained_ppo

# 8. Main execution
def main():
    # Download data
    df = download_stock_data(TRAIN_START_DATE, TRAIN_END_DATE, TICKER_LIST)

    # Process features
    processed_df = feature_engineering(df)

    # Setup environment
    env_train = setup_trading_env(
        df=processed_df,
        stock_dim=len(TICKER_LIST),
        hmax=100,
        initial_amount=1000000,
        transaction_cost_pct=0.001
    )

    # Train model
    trained_model = train_model(env_train)

    # Example of getting recommendations
    def get_recommendations(model, test_df):
        predictions = []
        for ticker in TICKER_LIST:
            # Get recent news (you would need to implement news fetching)
            news_text = f"Recent news about {ticker}"  # Placeholder
            sentiment_score = get_stock_sentiment(ticker, news_text)

            # Combine model prediction with sentiment
            predictions.append({
                'ticker': ticker,
                'sentiment_score': sentiment_score,
                'recommendation': 'Buy' if sentiment_score >= 4 else 'Hold' if sentiment_score >= 3 else 'Sell'
            })

        return pd.DataFrame(predictions)

    # Get test data
    test_df = download_stock_data(TEST_START_DATE, TEST_END_DATE, TICKER_LIST)
    test_df = feature_engineering(test_df)

    # Generate recommendations
    recommendations = get_recommendations(trained_model, test_df)
    print("\nStock Recommendations:")
    print(recommendations)

if __name__ == "__main__":
    main()


[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (14501, 8)
Successfully added technical indicators


TypeError: finrl.meta.env_stock_trading.env_stocktrading.StockTradingEnv() got multiple values for keyword argument 'df'
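
The `TypeError` above is a plain Python error rather than anything specific to FinRL: `setup_trading_env` puts `"df"` into `env_kwargs` and then also passes `df=df` explicitly. The sketch below reproduces the error and one possible fix with a hypothetical stand-in function `make_env`; it does not use the real `StockTradingEnv`.

```python
def make_env(df, stock_dim, **other):
    """Hypothetical stand-in for an environment constructor."""
    return {"df": df, "stock_dim": stock_dim, **other}

env_kwargs = {"df": "prices", "stock_dim": 5}

# Passing df both explicitly and inside the unpacked dict fails.
try:
    make_env(df="prices", **env_kwargs)
except TypeError as exc:
    duplicate_error = str(exc)
    print("TypeError:", duplicate_error)

# One fix: make sure each argument is supplied exactly once.
kwargs_without_df = {k: v for k, v in env_kwargs.items() if k != "df"}
env = make_env(df="prices", **kwargs_without_df)
print(env)
```

Either dropping `"df"` from the dictionary or dropping the explicit `df=df` argument resolves this particular error; other arguments of the environment may still need attention.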

In [34]:
!git clone https://github.com/benstaf/FinRL_DeepSeek.git


Cloning into 'FinRL_DeepSeek'...
remote: Enumerating objects: 145, done.
remote: Counting objects: 100% (145/145), done.
remote: Compressing objects: 100% (144/144), done.
remote: Total 145 (delta 94), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (145/145), 1.21 MiB | 8.76 MiB/s, done.
Resolving deltas: 100% (94/94), done.


In [35]:
%cd FinRL_DeepSeek


/content/FinRL_DeepSeek


In [37]:
!bash installation_script.sh

--2025-05-07 19:32:38--  https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.32.241, 104.16.191.158, 2606:4700::6810:20f1, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.32.241|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 155472915 (148M) [application/octet-stream]
Saving to: ‘/root/miniconda3/miniconda.sh’


2025-05-07 19:32:39 (131 MB/s) - ‘/root/miniconda3/miniconda.sh’ saved [155472915/155472915]

PREFIX=/root/miniconda3
Unpacking payload ...

Installing base environment...

Preparing transaction: ...working... done
Executing transaction: ...working... done
installation finished.
    You currently have a PYTHONPATH environment variable set. This may cause
    unexpected behavior when running the Python interpreter in Miniconda3.
    For best results, please verify that your PYTHONPATH only points to
    directories of packages that are compatible with the Python 

In [39]:
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader

# Fetch historical data for AAPL
data_df = YahooDownloader(
    start_date='2010-01-01',  # Adjust the start date as needed
    end_date='2025-05-07',    # Use today's date
    ticker_list=['AAPL']
).fetch_data()

print(data_df.head())


[*********************100%***********************]  1 of 1 completed

Shape of DataFrame:  (3859, 8)
Price        date     close      high       low      open     volume   tic  \
0      2010-01-04  6.440330  6.455076  6.391278  6.422876  493729600  AAPL   
1      2010-01-05  6.451466  6.487879  6.417460  6.458087  601904800  AAPL   
2      2010-01-06  6.348847  6.477046  6.342226  6.451466  552160000  AAPL   
3      2010-01-07  6.337110  6.379844  6.291067  6.372320  477131200  AAPL   
4      2010-01-08  6.379241  6.379843  6.291368  6.328684  447610800  AAPL   

Price  day  
0        0  
1        1  
2        2  
3        3  
4        4  





In [41]:
from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split
# Add technical indicators
fe = FeatureEngineer(
    use_technical_indicator=True,
    tech_indicator_list=['macd', 'rsi_30', 'cci_30', 'adx'],  # Add more indicators if needed
    use_turbulence=False
)

processed_data = fe.preprocess_data(data_df)
print(processed_data.head())


Successfully added technical indicators
         date     close      high       low      open     volume   tic  day  \
0  2010-01-04  6.440330  6.455076  6.391278  6.422876  493729600  AAPL    0   
1  2010-01-05  6.451466  6.487879  6.417460  6.458087  601904800  AAPL    1   
2  2010-01-06  6.348847  6.477046  6.342226  6.451466  552160000  AAPL    2   
3  2010-01-07  6.337110  6.379844  6.291067  6.372320  477131200  AAPL    3   
4  2010-01-08  6.379241  6.379843  6.291368  6.328684  447610800  AAPL    4   

       macd      rsi_30      cci_30         adx  
0  0.000000  100.000000   66.666667  100.000000  
1  0.000250  100.000000   66.666667  100.000000  
2 -0.002864    9.494153 -100.000000   68.004143  
3 -0.004634    8.575822 -112.342213   66.676644  
4 -0.003821   32.735916  -69.702014   66.048773  


In [49]:
from env_stocktrading import StockTradingEnv
from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split

# Split data into training and trading sets
train_data = data_split(processed_data, start='2010-01-01', end='2020-12-31')
trade_data = data_split(processed_data, start='2021-01-01', end='2025-05-07')

# Define the environment
env_kwargs = {
    "hmax": 100,  # Max shares to trade
    "initial_amount": 100000,  # Starting capital
    # Note: this key is rejected by the imported StockTradingEnv (see the TypeError in the output)
    "transaction_cost_pct": 0.001,  # Transaction cost
    "state_space": len(processed_data.columns),  # State space size
    "stock_dim": 1,  # Number of stocks (AAPL in this case)
    "tech_indicator_list": ['macd', 'rsi_30', 'cci_30', 'adx'],  # Indicators
    "action_space": 1,  # Action space size
    "reward_scaling": 1e-4
}

# Create the environment
e_train_gym = StockTradingEnv(df=train_data, **env_kwargs)
env_train, _ = e_train_gym.get_sb_env()


TypeError: StockTradingEnv.__init__() got an unexpected keyword argument 'transaction_cost_pct'
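
The error says that the imported `StockTradingEnv` does not accept a `transaction_cost_pct` argument. One way to see which keyword arguments a constructor accepts, without guessing, is to inspect its signature and compare it with the keys in `env_kwargs`. The sketch below uses a hypothetical stand-in class `StubEnv`; its parameter names are assumptions for illustration and are not taken from the real environment.

```python
import inspect

class StubEnv:
    """Hypothetical stand-in for an environment class; not the real signature."""
    def __init__(self, df, stock_dim, hmax, initial_amount,
                 buy_cost_pct, sell_cost_pct, reward_scaling=1e-4):
        self.df = df

env_kwargs = {
    "stock_dim": 1,
    "hmax": 100,
    "initial_amount": 100000,
    "transaction_cost_pct": 0.001,
}

params = inspect.signature(StubEnv.__init__).parameters
accepted = {name for name in params if name != "self"}
# Required parameters are those without a default value.
required = {name for name, p in params.items()
            if name != "self" and p.default is inspect.Parameter.empty}

unexpected = sorted(set(env_kwargs) - accepted)
missing = sorted(required - set(env_kwargs) - {"df"})  # df is passed separately
print("unexpected:", unexpected)
print("missing:", missing)
```

Running the same comparison against the actual `StockTradingEnv.__init__` imported in the cell above would list the keys to rename or add before constructing the environment.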

In [50]:
# Initialize the DRL agent
agent = DRLAgent(env=env_train)

# Define PPO parameters
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.005,
    "learning_rate": 0.0001,
    "batch_size": 128,
}

# Train the PPO model
ppo_model = agent.get_model("ppo", model_kwargs=PPO_PARAMS)
trained_ppo = agent.train_model(model=ppo_model, tb_log_name='ppo', total_timesteps=50000)


NameError: name 'env_train' is not defined

In [51]:
# Create the trading environment
e_trade_gym = StockTradingEnv(df=trade_data, **env_kwargs)
env_trade, obs_trade = e_trade_gym.get_sb_env()

# Use the trained model to make predictions
df_account_value, df_actions = DRLAgent.DRL_prediction(
    model=trained_ppo,
    environment=e_trade_gym
)

# Display today's trading decision
print("Today's Trading Actions:")
print(df_actions.tail(1))  # Last row contains today's actions


TypeError: StockTradingEnv.__init__() got an unexpected keyword argument 'transaction_cost_pct'
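
Separately from the keyword error, the cells above set `state_space` to `len(processed_data.columns)`. In the FinRL tutorials the state dimension is usually derived from the number of stocks and indicators instead: one cash balance, one price and one holding per stock, and one value per indicator per stock. The sketch below shows that arithmetic; the helper `state_space_size` is hypothetical, and the indicator count of 8 for the earlier 29-stock run is an assumption that happens to reproduce the logged `State Space: 291`.

```python
def state_space_size(stock_dim, n_indicators):
    """1 cash balance + prices + holdings + indicator values for every stock."""
    return 1 + 2 * stock_dim + n_indicators * stock_dim

# Single-stock AAPL setup with ['macd', 'rsi_30', 'cci_30', 'adx'].
single = state_space_size(stock_dim=1, n_indicators=4)

# Earlier rolling-window run: 29 stocks, assuming 8 indicators per stock.
dow = state_space_size(stock_dim=29, n_indicators=8)

print(single, dow)
```

Under this layout the single-stock environment would have a state dimension of 7, which differs from the column count of `processed_data`.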