<a href="https://colab.research.google.com/github/fjme95/aplicaciones-financieras/blob/main/Modulo%203/Semana%203/Manejo_de_carteras_Deep_Reinforcement_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Description

In this notebook we use deep reinforcement learning (DRL) to build investment portfolios.

Reinforcement learning has three main components: the agent, the environment, and the reward.

The agent tries to capture the most important aspects of the problem by interacting with the environment over time; we call each such "snapshot" of the environment a state. The interaction itself is what is known as an action: the agent takes an action and evaluates how it affects the environment and what reward it obtains.
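
The loop below is a minimal sketch of that interaction, not the environment used in this notebook. `ToyEnv`, `run_episode`, and `threshold_policy` are hypothetical names introduced only for illustration; the `reset`/`step` shape loosely mirrors the gym interface used later.

```python
import random


class ToyEnv:
    """Hypothetical environment: the state is a single noisy signal in [-1, 1]."""

    def __init__(self, n_steps=5, seed=0):
        self.n_steps = n_steps
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.state = self.rng.uniform(-1.0, 1.0)
        return self.state

    def step(self, action):
        # Reward: the current signal if the agent chose action 1, otherwise 0.
        reward = self.state if action == 1 else 0.0
        self.t += 1
        self.state = self.rng.uniform(-1.0, 1.0)  # next snapshot of the environment
        done = self.t >= self.n_steps
        return self.state, reward, done, {}


def run_episode(env, policy):
    """Let the policy act until the episode ends; return (total reward, steps)."""
    state = env.reset()
    total_reward, done, steps = 0.0, False, 0
    while not done:
        action = policy(state)                       # the agent picks an action
        state, reward, done, _info = env.step(action)  # the environment responds
        total_reward += reward
        steps += 1
    return total_reward, steps


def threshold_policy(state):
    # Act (1) only when the observed signal is positive.
    return 1 if state > 0 else 0


total_reward, n_steps = run_episode(ToyEnv(n_steps=5, seed=0), threshold_policy)
print(n_steps, total_reward)
```

Because this toy policy only collects positive signals, its total reward can never be negative; a learning agent would have to discover such a rule from the rewards alone.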

The action the agent takes is determined by the model's policy. Under a stochastic policy, the action is sampled from the conditional distribution of actions given the state. Under a deterministic policy, the action is a function of the state.
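
To make the distinction concrete, here is a small sketch with two discrete actions. The preference function inside `action_probabilities` is an assumption made up for this example; it is not how A2C, DDPG, or PPO parameterize their policies.

```python
import math
import random


def action_probabilities(state):
    """Hypothetical pi(a | s): a softmax over two toy preferences."""
    preferences = [-state, state]          # action 0 prefers low states, action 1 high
    m = max(preferences)                   # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_policy(state, rng):
    # Sample the action from the conditional distribution pi(a | s).
    probs = action_probabilities(state)
    return rng.choices([0, 1], weights=probs, k=1)[0]


def deterministic_policy(state):
    # The action is a plain function of the state: the most probable action.
    probs = action_probabilities(state)
    return max(range(len(probs)), key=probs.__getitem__)


rng = random.Random(0)
samples = [stochastic_policy(2.0, rng) for _ in range(1000)]
print(sum(samples) / len(samples))   # fraction of times action 1 was sampled
print(deterministic_policy(2.0))     # always the same action for the same state
```

The stochastic version can return different actions for the same state, which helps exploration during training; the deterministic version always returns the same one.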

For the portfolio weight allocation problem:

- The agent is a neural network (multilayer perceptron) that outputs the portfolio weights.
- The state is the state of the market in the period in question, given by the covariance matrix and indicators based on the price and volume of the assets/transactions.
- The reward is the return of the portfolio the agent built.
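
As a rough sketch of how one step of this setup can be computed: the environment defined later in this notebook passes the raw actions through a softmax to obtain weights and then combines them with the per-asset price changes. The prices and raw actions below are made-up numbers, and the code uses plain Python lists instead of the NumPy arrays used in the environment.

```python
import math


def softmax(scores):
    """Turn raw action scores into positive weights that sum to 1."""
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def portfolio_return(prev_close, next_close, weights):
    """Weighted sum of the individual asset returns between two periods."""
    return sum(
        w * (p1 / p0 - 1.0)
        for w, p0, p1 in zip(weights, prev_close, next_close)
    )


raw_action = [0.2, 0.9, 0.5]               # hypothetical network output, one score per asset
weights = softmax(raw_action)

prev_close = [100.0, 50.0, 20.0]           # illustrative prices only
next_close = [101.0, 49.0, 20.0]

step_return = portfolio_return(prev_close, next_close, weights)
new_value = 1_000_000 * (1.0 + step_return)
print(weights, step_return, new_value)
```

Because softmax weights are strictly positive and sum to 1, this formulation describes a fully invested, long-only portfolio.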

We will use [```gym```](https://gym.openai.com/) from OpenAI to create the environment and [```stable_baselines```](https://stable-baselines.readthedocs.io/en/master/index.html) to train the agent. In particular, we will train 3 agents with the A2C, DDPG, and PPO algorithms.

# Dependencies

In [1]:
%%capture
!pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git@sr_tax_lots
# !pip install yfinance
!pip install --upgrade pandas
!pip install --upgrade pandas-datareader
!pip install numpy
!pip install matplotlib
# !pip install stockstats
!pip install gym
!pip install stable-baselines3[extra]
!pip install tensorflow
!pip install git+https://github.com/quantopian/pyfolio
!pip install ta
!pip install PyPortfolioOpt

In [2]:
#@title Config.py
%%writefile config.py
import pathlib

# import finrl

import pandas as pd
import datetime
import os

# pd.options.display.max_rows = 10
# pd.options.display.max_columns = 10


# PACKAGE_ROOT = pathlib.Path(finrl.__file__).resolve().parent
# PACKAGE_ROOT = pathlib.Path().resolve().parent

# TRAINED_MODEL_DIR = PACKAGE_ROOT / "trained_models"
# DATASET_DIR = PACKAGE_ROOT / "data"

# data
TRAINING_DATA_FILE = "data/ETF_SPY_2009_2020.csv"
TURBULENCE_DATA = "data/dow30_turbulence_index.csv"
TESTING_DATA_FILE = "test.csv"

# now = datetime.datetime.now()
# TRAINED_MODEL_DIR = f"trained_models/{now}"
DATA_SAVE_DIR = "datasets"
TRAINED_MODEL_DIR = "trained_models"
TENSORBOARD_LOG_DIR = "tensorboard_log"
RESULTS_DIR = "results"
# os.makedirs(TRAINED_MODEL_DIR)


## time_fmt = '%Y-%m-%d'
START_DATE = "2009-01-01"
END_DATE = "2020-12-01"

START_TRADE_DATE = "2019-01-01"

## dataset default columns
DEFAULT_DATA_COLUMNS = ["date", "tic", "close"]

## stockstats technical indicator column names
## check https://pypi.org/project/stockstats/ for different names
TECHNICAL_INDICATORS_LIST = ["macd", "rsi_30", "cci_30", "dx_30"]


## Model Parameters
A2C_PARAMS = {"n_steps": 5, "ent_coef": 0.01, "learning_rate": 0.0007}
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.01,
    "learning_rate": 0.00025,
    "batch_size": 64,
}
DDPG_PARAMS = {"batch_size": 128, "buffer_size": 50000, "learning_rate": 0.001}
TD3_PARAMS = {"batch_size": 100, "buffer_size": 1000000, "learning_rate": 0.001}
SAC_PARAMS = {
    "batch_size": 64,
    "buffer_size": 100000,
    "learning_rate": 0.0001,
    "learning_starts": 100,
    "ent_coef": "auto_0.1",
}

########################################################
############## Stock Ticker Setup starts ##############
SINGLE_TICKER = ["AAPL"]

# self defined
MULTIPLE_STOCK_TICKER = ["AAPL", "MSFT", "FB"]

# check https://wrds-www.wharton.upenn.edu/ for U.S. index constituents
# Dow 30 constituents at 2019/01
DOW_30_TICKER = [
    "AAPL",
    "MSFT",
    "JPM",
    "V",
    "RTX",
    "PG",
    "GS",
    "NKE",
    "DIS",
    "AXP",
    "HD",
    "INTC",
    "WMT",
    "IBM",
    "MRK",
    "UNH",
    "KO",
    "CAT",
    "TRV",
    "JNJ",
    "CVX",
    "MCD",
    "VZ",
    "CSCO",
    "XOM",
    "BA",
    "MMM",
    "PFE",
    "WBA",
    "DD",
]

# Nasdaq 100 constituents at 2019/01
NAS_100_TICKER = [
    "AMGN",
    "AAPL",
    "AMAT",
    "INTC",
    "PCAR",
    "PAYX",
    "MSFT",
    "ADBE",
    "CSCO",
    "XLNX",
    "QCOM",
    "COST",
    "SBUX",
    "FISV",
    "CTXS",
    "INTU",
    "AMZN",
    "EBAY",
    "BIIB",
    "CHKP",
    "GILD",
    "NLOK",
    "CMCSA",
    "FAST",
    "ADSK",
    "CTSH",
    "NVDA",
    "GOOGL",
    "ISRG",
    "VRTX",
    "HSIC",
    "BIDU",
    "ATVI",
    "ADP",
    "ROST",
    "ORLY",
    "CERN",
    "BKNG",
    "MYL",
    "MU",
    "DLTR",
    "ALXN",
    "SIRI",
    "MNST",
    "AVGO",
    "TXN",
    "MDLZ",
    "FB",
    "ADI",
    "WDC",
    "REGN",
    "LBTYK",
    "VRSK",
    "NFLX",
    "TSLA",
    "CHTR",
    "MAR",
    "ILMN",
    "LRCX",
    "EA",
    "AAL",
    "WBA",
    "KHC",
    "BMRN",
    "JD",
    "SWKS",
    "INCY",
    "PYPL",
    "CDW",
    "FOXA",
    "MXIM",
    "TMUS",
    "EXPE",
    "TCOM",
    "ULTA",
    "CSX",
    "NTES",
    "MCHP",
    "CTAS",
    "KLAC",
    "HAS",
    "JBHT",
    "IDXX",
    "WYNN",
    "MELI",
    "ALGN",
    "CDNS",
    "WDAY",
    "SNPS",
    "ASML",
    "TTWO",
    "PEP",
    "NXPI",
    "XEL",
    "AMD",
    "NTAP",
    "VRSN",
    "LULU",
    "WLTW",
    "UAL",
]

# SP 500 constituents at 2019
SP_500_TICKER = [
    "A",
    "AAL",
    "AAP",
    "AAPL",
    "ABBV",
    "ABC",
    "ABMD",
    "ABT",
    "ACN",
    "ADBE",
    "ADI",
    "ADM",
    "ADP",
    "ADS",
    "ADSK",
    "AEE",
    "AEP",
    "AES",
    "AFL",
    "AGN",
    "AIG",
    "AIV",
    "AIZ",
    "AJG",
    "AKAM",
    "ALB",
    "ALGN",
    "ALK",
    "ALL",
    "ALLE",
    "ALXN",
    "AMAT",
    "AMCR",
    "AMD",
    "AME",
    "AMG",
    "AMGN",
    "AMP",
    "AMT",
    "AMZN",
    "ANET",
    "ANSS",
    "ANTM",
    "AON",
    "AOS",
    "APA",
    "APD",
    "APH",
    "APTV",
    "ARE",
    "ARNC",
    "ATO",
    "ATVI",
    "AVB",
    "AVGO",
    "AVY",
    "AWK",
    "AXP",
    "AZO",
    "BA",
    "BAC",
    "BAX",
    "BBT",
    "BBY",
    "BDX",
    "BEN",
    "BF.B",
    "BHGE",
    "BIIB",
    "BK",
    "BKNG",
    "BLK",
    "BLL",
    "BMY",
    "BR",
    "BRK.B",
    "BSX",
    "BWA",
    "BXP",
    "C",
    "CAG",
    "CAH",
    "CAT",
    "CB",
    "CBOE",
    "CBRE",
    "CBS",
    "CCI",
    "CCL",
    "CDNS",
    "CE",
    "CELG",
    "CERN",
    "CF",
    "CFG",
    "CHD",
    "CHRW",
    "CHTR",
    "CI",
    "CINF",
    "CL",
    "CLX",
    "CMA",
    "CMCSA",
    "CME",
    "CMG",
    "CMI",
    "CMS",
    "CNC",
    "CNP",
    "COF",
    "COG",
    "COO",
    "COP",
    "COST",
    "COTY",
    "CPB",
    "CPRI",
    "CPRT",
    "CRM",
    "CSCO",
    "CSX",
    "CTAS",
    "CTL",
    "CTSH",
    "CTVA",
    "CTXS",
    "CVS",
    "CVX",
    "CXO",
    "D",
    "DAL",
    "DD",
    "DE",
    "DFS",
    "DG",
    "DGX",
    "DHI",
    "DHR",
    "DIS",
    "DISCK",
    "DISH",
    "DLR",
    "DLTR",
    "DOV",
    "DOW",
    "DRE",
    "DRI",
    "DTE",
    "DUK",
    "DVA",
    "DVN",
    "DXC",
    "EA",
    "EBAY",
    "ECL",
    "ED",
    "EFX",
    "EIX",
    "EL",
    "EMN",
    "EMR",
    "EOG",
    "EQIX",
    "EQR",
    "ES",
    "ESS",
    "ETFC",
    "ETN",
    "ETR",
    "EVRG",
    "EW",
    "EXC",
    "EXPD",
    "EXPE",
    "EXR",
    "F",
    "FANG",
    "FAST",
    "FB",
    "FBHS",
    "FCX",
    "FDX",
    "FE",
    "FFIV",
    "FIS",
    "FISV",
    "FITB",
    "FLIR",
    "FLS",
    "FLT",
    "FMC",
    "FOXA",
    "FRC",
    "FRT",
    "FTI",
    "FTNT",
    "FTV",
    "GD",
    "GE",
    "GILD",
    "GIS",
    "GL",
    "GLW",
    "GM",
    "GOOG",
    "GPC",
    "GPN",
    "GPS",
    "GRMN",
    "GS",
    "GWW",
    "HAL",
    "HAS",
    "HBAN",
    "HBI",
    "HCA",
    "HCP",
    "HD",
    "HES",
    "HFC",
    "HIG",
    "HII",
    "HLT",
    "HOG",
    "HOLX",
    "HON",
    "HP",
    "HPE",
    "HPQ",
    "HRB",
    "HRL",
    "HSIC",
    "HST",
    "HSY",
    "HUM",
    "IBM",
    "ICE",
    "IDXX",
    "IEX",
    "IFF",
    "ILMN",
    "INCY",
    "INFO",
    "INTC",
    "INTU",
    "IP",
    "IPG",
    "IPGP",
    "IQV",
    "IR",
    "IRM",
    "ISRG",
    "IT",
    "ITW",
    "IVZ",
    "JBHT",
    "JCI",
    "JEC",
    "JEF",
    "JKHY",
    "JNJ",
    "JNPR",
    "JPM",
    "JWN",
    "K",
    "KEY",
    "KEYS",
    "KHC",
    "KIM",
    "KLAC",
    "KMB",
    "KMI",
    "KMX",
    "KO",
    "KR",
    "KSS",
    "KSU",
    "L",
    "LB",
    "LDOS",
    "LEG",
    "LEN",
    "LH",
    "LHX",
    "LIN",
    "LKQ",
    "LLY",
    "LMT",
    "LNC",
    "LNT",
    "LOW",
    "LRCX",
    "LUV",
    "LW",
    "LYB",
    "M",
    "MA",
    "MAA",
    "MAC",
    "MAR",
    "MAS",
    "MCD",
    "MCHP",
    "MCK",
    "MCO",
    "MDLZ",
    "MDT",
    "MET",
    "MGM",
    "MHK",
    "MKC",
    "MKTX",
    "MLM",
    "MMC",
    "MMM",
    "MNST",
    "MO",
    "MOS",
    "MPC",
    "MRK",
    "MRO",
    "MS",
    "MSCI",
    "MSFT",
    "MSI",
    "MTB",
    "MTD",
    "MU",
    "MXIM",
    "MYL",
    "NBL",
    "NCLH",
    "NDAQ",
    "NEE",
    "NEM",
    "NFLX",
    "NI",
    "NKE",
    "NKTR",
    "NLSN",
    "NOC",
    "NOV",
    "NRG",
    "NSC",
    "NTAP",
    "NTRS",
    "NUE",
    "NVDA",
    "NWL",
    "NWS",
    "O",
    "OI",
    "OKE",
    "OMC",
    "ORCL",
    "ORLY",
    "OXY",
    "PAYX",
    "PBCT",
    "PCAR",
    "PEG",
    "PEP",
    "PFE",
    "PFG",
    "PG",
    "PGR",
    "PH",
    "PHM",
    "PKG",
    "PKI",
    "PLD",
    "PM",
    "PNC",
    "PNR",
    "PNW",
    "PPG",
    "PPL",
    "PRGO",
    "PRU",
    "PSA",
    "PSX",
    "PVH",
    "PWR",
    "PXD",
    "PYPL",
    "QCOM",
    "QRVO",
    "RCL",
    "RE",
    "REG",
    "REGN",
    "RF",
    "RHI",
    "RJF",
    "RL",
    "RMD",
    "ROK",
    "ROL",
    "ROP",
    "ROST",
    "RSG",
    "RTN",
    "SBAC",
    "SBUX",
    "SCHW",
    "SEE",
    "SHW",
    "SIVB",
    "SJM",
    "SLB",
    "SLG",
    "SNA",
    "SNPS",
    "SO",
    "SPG",
    "SPGI",
    "SRE",
    "STI",
    "STT",
    "STX",
    "STZ",
    "SWK",
    "SWKS",
    "SYF",
    "SYK",
    "SYMC",
    "SYY",
    "T",
    "TAP",
    "TDG",
    "TEL",
    "TFX",
    "TGT",
    "TIF",
    "TJX",
    "TMO",
    "TMUS",
    "TPR",
    "TRIP",
    "TROW",
    "TRV",
    "TSCO",
    "TSN",
    "TSS",
    "TTWO",
    "TWTR",
    "TXN",
    "TXT",
    "UA",
    "UAL",
    "UDR",
    "UHS",
    "ULTA",
    "UNH",
    "UNM",
    "UNP",
    "UPS",
    "URI",
    "USB",
    "UTX",
    "V",
    "VAR",
    "VFC",
    "VIAB",
    "VLO",
    "VMC",
    "VNO",
    "VRSK",
    "VRSN",
    "VRTX",
    "VTR",
    "VZ",
    "WAB",
    "WAT",
    "WBA",
    "WCG",
    "WDC",
    "WEC",
    "WELL",
    "WFC",
    "WHR",
    "WLTW",
    "WM",
    "WMB",
    "WMT",
    "WRK",
    "WU",
    "WY",
    "WYNN",
    "XEC",
    "XEL",
    "XLNX",
    "XOM",
    "XRAY",
    "XRX",
    "XYL",
    "YUM",
    "ZBH",
    "ZION",
    "ZTS",
]

# Hang Seng Index constituents at 2019/01
HSI_50_TICKER = [
    "0011.HK",
    "0005.HK",
    "0012.HK",
    "0006.HK",
    "0003.HK",
    "0016.HK",
    "0019.HK",
    "0002.HK",
    "0001.HK",
    "0267.HK",
    "0101.HK",
    "0941.HK",
    "0762.HK",
    "0066.HK",
    "0883.HK",
    "2388.HK",
    "0017.HK",
    "0083.HK",
    "0939.HK",
    "0388.HK",
    "0386.HK",
    "3988.HK",
    "2628.HK",
    "1398.HK",
    "2318.HK",
    "3328.HK",
    "0688.HK",
    "0857.HK",
    "1088.HK",
    "0700.HK",
    "0836.HK",
    "1109.HK",
    "1044.HK",
    "1299.HK",
    "0151.HK",
    "1928.HK",
    "0027.HK",
    "2319.HK",
    "0823.HK",
    "1113.HK",
    "1038.HK",
    "2018.HK",
    "0175.HK",
    "0288.HK",
    "1997.HK",
    "2007.HK",
    "2382.HK",
    "1093.HK",
    "1177.HK",
    "2313.HK",
]

# www.csindex.com.cn, for SSE and CSI adjustments
# SSE 50 Index constituents at 2019
SSE_50_TICKER = [
    "600000.SS",
    "600036.SS",
    "600104.SS",
    "600030.SS",
    "601628.SS",
    "601166.SS",
    "601318.SS",
    "601328.SS",
    "601088.SS",
    "601857.SS",
    "601601.SS",
    "601668.SS",
    "601288.SS",
    "601818.SS",
    "601989.SS",
    "601398.SS",
    "600048.SS",
    "600028.SS",
    "600050.SS",
    "600519.SS",
    "600016.SS",
    "600887.SS",
    "601688.SS",
    "601186.SS",
    "601988.SS",
    "601211.SS",
    "601336.SS",
    "600309.SS",
    "603993.SS",
    "600690.SS",
    "600276.SS",
    "600703.SS",
    "600585.SS",
    "603259.SS",
    "601888.SS",
    "601138.SS",
    "600196.SS",
    "601766.SS",
    "600340.SS",
    "601390.SS",
    "601939.SS",
    "601111.SS",
    "600029.SS",
    "600019.SS",
    "601229.SS",
    "601800.SS",
    "600547.SS",
    "601006.SS",
    "601360.SS",
    "600606.SS",
    "601319.SS",
    "600837.SS",
    "600031.SS",
    "601066.SS",
    "600009.SS",
    "601236.SS",
    "601012.SS",
    "600745.SS",
    "600588.SS",
    "601658.SS",
    "601816.SS",
    "603160.SS",
]

# CSI 300 Index constituents at 2019
CSI_300_TICKER = [
    "600000.SS",
    "600004.SS",
    "600009.SS",
    "600010.SS",
    "600011.SS",
    "600015.SS",
    "600016.SS",
    "600018.SS",
    "600019.SS",
    "600025.SS",
    "600027.SS",
    "600028.SS",
    "600029.SS",
    "600030.SS",
    "600031.SS",
    "600036.SS",
    "600038.SS",
    "600048.SS",
    "600050.SS",
    "600061.SS",
    "600066.SS",
    "600068.SS",
    "600085.SS",
    "600089.SS",
    "600104.SS",
    "600109.SS",
    "600111.SS",
    "600115.SS",
    "600118.SS",
    "600170.SS",
    "600176.SS",
    "600177.SS",
    "600183.SS",
    "600188.SS",
    "600196.SS",
    "600208.SS",
    "600219.SS",
    "600221.SS",
    "600233.SS",
    "600271.SS",
    "600276.SS",
    "600297.SS",
    "600299.SS",
    "600309.SS",
    "600332.SS",
    "600340.SS",
    "600346.SS",
    "600352.SS",
    "600362.SS",
    "600369.SS",
    "600372.SS",
    "600383.SS",
    "600390.SS",
    "600398.SS",
    "600406.SS",
    "600436.SS",
    "600438.SS",
    "600482.SS",
    "600487.SS",
    "600489.SS",
    "600498.SS",
    "600516.SS",
    "600519.SS",
    "600522.SS",
    "600547.SS",
    "600570.SS",
    "600583.SS",
    "600585.SS",
    "600588.SS",
    "600606.SS",
    "600637.SS",
    "600655.SS",
    "600660.SS",
    "600674.SS",
    "600690.SS",
    "600703.SS",
    "600705.SS",
    "600741.SS",
    "600745.SS",
    "600760.SS",
    "600795.SS",
    "600809.SS",
    "600837.SS",
    "600848.SS",
    "600867.SS",
    "600886.SS",
    "600887.SS",
    "600893.SS",
    "600900.SS",
    "600919.SS",
    "600926.SS",
    "600928.SS",
    "600958.SS",
    "600968.SS",
    "600977.SS",
    "600989.SS",
    "600998.SS",
    "600999.SS",
    "601006.SS",
    "601009.SS",
    "601012.SS",
    "601018.SS",
    "601021.SS",
    "601066.SS",
    "601077.SS",
    "601088.SS",
    "601100.SS",
    "601108.SS",
    "601111.SS",
    "601117.SS",
    "601138.SS",
    "601155.SS",
    "601162.SS",
    "601166.SS",
    "601169.SS",
    "601186.SS",
    "601198.SS",
    "601211.SS",
    "601212.SS",
    "601216.SS",
    "601225.SS",
    "601229.SS",
    "601231.SS",
    "601236.SS",
    "601238.SS",
    "601288.SS",
    "601298.SS",
    "601318.SS",
    "601319.SS",
    "601328.SS",
    "601336.SS",
    "601360.SS",
    "601377.SS",
    "601390.SS",
    "601398.SS",
    "601555.SS",
    "601577.SS",
    "601600.SS",
    "601601.SS",
    "601607.SS",
    "601618.SS",
    "601628.SS",
    "601633.SS",
    "601658.SS",
    "601668.SS",
    "601669.SS",
    "601688.SS",
    "601698.SS",
    "601727.SS",
    "601766.SS",
    "601788.SS",
    "601800.SS",
    "601808.SS",
    "601816.SS",
    "601818.SS",
    "601828.SS",
    "601838.SS",
    "601857.SS",
    "601877.SS",
    "601878.SS",
    "601881.SS",
    "601888.SS",
    "601898.SS",
    "601899.SS",
    "601901.SS",
    "601916.SS",
    "601919.SS",
    "601933.SS",
    "601939.SS",
    "601985.SS",
    "601988.SS",
    "601989.SS",
    "601992.SS",
    "601997.SS",
    "601998.SS",
    "603019.SS",
    "603156.SS",
    "603160.SS",
    "603259.SS",
    "603260.SS",
    "603288.SS",
    "603369.SS",
    "603501.SS",
    "603658.SS",
    "603799.SS",
    "603833.SS",
    "603899.SS",
    "603986.SS",
    "603993.SS",
    "000001.SZ",
    "000002.SZ",
    "000063.SZ",
    "000066.SZ",
    "000069.SZ",
    "000100.SZ",
    "000157.SZ",
    "000166.SZ",
    "000333.SZ",
    "000338.SZ",
    "000425.SZ",
    "000538.SZ",
    "000568.SZ",
    "000596.SZ",
    "000625.SZ",
    "000627.SZ",
    "000651.SZ",
    "000656.SZ",
    "000661.SZ",
    "000671.SZ",
    "000703.SZ",
    "000708.SZ",
    "000709.SZ",
    "000723.SZ",
    "000725.SZ",
    "000728.SZ",
    "000768.SZ",
    "000776.SZ",
    "000783.SZ",
    "000786.SZ",
    "000858.SZ",
    "000860.SZ",
    "000876.SZ",
    "000895.SZ",
    "000938.SZ",
    "000961.SZ",
    "000963.SZ",
    "000977.SZ",
    "001979.SZ",
    "002001.SZ",
    "002007.SZ",
    "002008.SZ",
    "002024.SZ",
    "002027.SZ",
    "002032.SZ",
    "002044.SZ",
    "002050.SZ",
    "002120.SZ",
    "002129.SZ",
    "002142.SZ",
    "002146.SZ",
    "002153.SZ",
    "002157.SZ",
    "002179.SZ",
    "002202.SZ",
    "002230.SZ",
    "002236.SZ",
    "002241.SZ",
    "002252.SZ",
    "002271.SZ",
    "002304.SZ",
    "002311.SZ",
    "002352.SZ",
    "002371.SZ",
    "002410.SZ",
    "002415.SZ",
    "002422.SZ",
    "002456.SZ",
    "002460.SZ",
    "002463.SZ",
    "002466.SZ",
    "002468.SZ",
    "002475.SZ",
    "002493.SZ",
    "002508.SZ",
    "002555.SZ",
    "002558.SZ",
    "002594.SZ",
    "002601.SZ",
    "002602.SZ",
    "002607.SZ",
    "002624.SZ",
    "002673.SZ",
    "002714.SZ",
    "002736.SZ",
    "002739.SZ",
    "002773.SZ",
    "002841.SZ",
    "002916.SZ",
    "002938.SZ",
    "002939.SZ",
    "002945.SZ",
    "002958.SZ",
    "003816.SZ",
    "300003.SZ",
    "300014.SZ",
    "300015.SZ",
    "300033.SZ",
    "300059.SZ",
    "300122.SZ",
    "300124.SZ",
    "300136.SZ",
    "300142.SZ",
    "300144.SZ",
    "300347.SZ",
    "300408.SZ",
    "300413.SZ",
    "300433.SZ",
    "300498.SZ",
    "300601.SZ",
    "300628.SZ",
]

############## Stock Ticker Setup ends ##############
########################################################

Writing config.py


In [3]:
#@title models.py
%%writefile models.py
# common library
import pandas as pd
import numpy as np
import time
import gym

# RL models from stable-baselines3
from stable_baselines3 import A2C, DDPG, TD3, SAC, PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.noise import (
    NormalActionNoise,
    OrnsteinUhlenbeckActionNoise,
)

from finrl.config import config



MODELS = {"a2c": A2C, "ddpg": DDPG, "td3": TD3, "sac": SAC, "ppo": PPO}

MODEL_KWARGS = {x: config.__dict__[f"{x.upper()}_PARAMS"] for x in MODELS.keys()}

NOISE = {
    "normal": NormalActionNoise,
    "ornstein_uhlenbeck": OrnsteinUhlenbeckActionNoise,
}


class DRLAgent:
    """Provides implementations for DRL algorithms

    Attributes
    ----------
        env: gym environment class
            user-defined class

    Methods
    -------
    train_PPO()
        the implementation for PPO algorithm
    train_A2C()
        the implementation for A2C algorithm
    train_DDPG()
        the implementation for DDPG algorithm
    train_TD3()
        the implementation for TD3 algorithm
    train_SAC()
        the implementation for SAC algorithm
    DRL_prediction()
        make a prediction in a test dataset and get results
    """

    @staticmethod
    def DRL_prediction(model, test_data, test_env, test_obs):
        """Run the model over the test data; return (account memory, actions memory)."""
        n_steps = len(test_data.index.unique())
        account_memory = []
        actions_memory = []
        for i in range(n_steps):
            action, _states = model.predict(test_obs)
            test_obs, rewards, dones, info = test_env.step(action)
            # Collect the memories on the last non-terminal step: a vectorized
            # env resets itself when the episode ends, which clears them.
            if i == n_steps - 2:
                account_memory = test_env.env_method(method_name="save_asset_memory")
                actions_memory = test_env.env_method(method_name="save_action_memory")
        return account_memory[0], actions_memory[0]

    def __init__(self, env):
        self.env = env

    def get_model(
        self,
        model_name,
        policy="MlpPolicy",
        policy_kwargs=None,
        model_kwargs=None,
        verbose=1,
    ):
        if model_name not in MODELS:
            raise NotImplementedError(f"Unsupported model name: {model_name!r}")

        if model_kwargs is None:
            model_kwargs = MODEL_KWARGS[model_name]
        # Work on a copy so neither the shared defaults nor the caller's dict is mutated
        model_kwargs = dict(model_kwargs)

        if "action_noise" in model_kwargs:
            n_actions = self.env.action_space.shape[-1]
            model_kwargs["action_noise"] = NOISE[model_kwargs["action_noise"]](
                mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions)
            )
        print(model_kwargs)
        model = MODELS[model_name](
            policy=policy,
            env=self.env,
            tensorboard_log=f"{config.TENSORBOARD_LOG_DIR}/{model_name}",
            verbose=verbose,
            policy_kwargs=policy_kwargs,
            **model_kwargs,
        )
        return model

    def train_model(self, model, tb_log_name, total_timesteps=5000):
        model = model.learn(total_timesteps=total_timesteps, tb_log_name=tb_log_name)
        return model

Writing models.py


In [4]:
#@title backtest.py
%%writefile backtest.py
import pandas as pd
import numpy as np

from pyfolio import timeseries
import pyfolio
import matplotlib.pyplot as plt

from finrl.marketdata.yahoodownloader import YahooDownloader
from finrl.config import config


def BackTestStats(account_value):
    df = account_value.copy()
    df = get_daily_return(df)
    DRL_strat = backtest_strat(df)
    perf_func = timeseries.perf_stats
    perf_stats_all = perf_func(
        returns=DRL_strat,
        factor_returns=DRL_strat,
        positions=None,
        transactions=None,
        turnover_denom="AGB",
    )
    print(perf_stats_all)
    return perf_stats_all


def BaselineStats(
    baseline_ticker="^DJI",
    baseline_start=config.START_TRADE_DATE,
    baseline_end=config.END_DATE,
):

    dji, dow_strat = baseline_strat(
        ticker=baseline_ticker, start=baseline_start, end=baseline_end
    )
    perf_func = timeseries.perf_stats
    perf_stats_all = perf_func(
        returns=dow_strat,
        factor_returns=dow_strat,
        positions=None,
        transactions=None,
        turnover_denom="AGB",
    )
    print(perf_stats_all)
    return perf_stats_all


def BackTestPlot(
    account_value,
    baseline_start=config.START_TRADE_DATE,
    baseline_end=config.END_DATE,
    baseline_ticker="^DJI",
):

    df = account_value.copy()
    df = get_daily_return(df)

    dji, dow_strat = baseline_strat(
        ticker=baseline_ticker, start=baseline_start, end=baseline_end
    )
    df["date"] = dji["date"]
    df = df.dropna()

    DRL_strat = backtest_strat(df)

    with pyfolio.plotting.plotting_context(font_scale=1.1):
        pyfolio.create_full_tear_sheet(
            returns=DRL_strat, benchmark_rets=dow_strat, set_context=False
        )


def backtest_strat(df):
    strategy_ret = df.copy()
    strategy_ret["date"] = pd.to_datetime(strategy_ret["date"])
    strategy_ret.set_index("date", drop=False, inplace=True)
    strategy_ret.index = strategy_ret.index.tz_localize("UTC")
    del strategy_ret["date"]
    ts = pd.Series(strategy_ret["daily_return"].values, index=strategy_ret.index)
    return ts


def baseline_strat(ticker, start, end):
    dji = YahooDownloader(
        start_date=start, end_date=end, ticker_list=[ticker]
    ).fetch_data()
    dji["daily_return"] = dji["close"].pct_change(1)
    dow_strat = backtest_strat(dji)
    return dji, dow_strat


def get_daily_return(df):
    df["daily_return"] = df.account_value.pct_change(1)
    # df=df.dropna()
    sharpe = (252 ** 0.5) * df["daily_return"].mean() / df["daily_return"].std()

    annual_return = ((df["daily_return"].mean() + 1) ** 252 - 1) * 100
    print("annual return: ", annual_return)
    print("sharpe ratio: ", sharpe)
    return df

Writing backtest.py
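
The two summary numbers printed by `get_daily_return` can be reproduced without pandas. The sketch below assumes a zero risk-free rate and 252 trading days per year, as the function above does; the helper names and the sample account values are made up for illustration.

```python
import statistics


def daily_returns_from_values(values):
    """Simple returns between consecutive account values (like pct_change, minus the NaN)."""
    return [v1 / v0 - 1.0 for v0, v1 in zip(values, values[1:])]


def annualized_sharpe(daily_returns, periods_per_year=252):
    """sqrt(periods) * mean / sample standard deviation, with a zero risk-free rate."""
    mean = statistics.mean(daily_returns)
    std = statistics.stdev(daily_returns)   # sample std (ddof=1), the pandas default
    if std == 0:
        return float("nan")                 # undefined when returns never vary
    return (periods_per_year ** 0.5) * mean / std


def annualized_return_pct(daily_returns, periods_per_year=252):
    """Compound the mean daily return over a year, expressed in percent."""
    mean = statistics.mean(daily_returns)
    return ((1.0 + mean) ** periods_per_year - 1.0) * 100.0


account_values = [1_000_000.0, 1_004_000.0, 1_001_000.0, 1_009_000.0]  # illustrative only
returns = daily_returns_from_values(account_values)
print("annual return: ", annualized_return_pct(returns))
print("sharpe ratio: ", annualized_sharpe(returns))
```

Note that compounding the mean daily return is an approximation; it differs from the realized compound growth of the account whenever daily returns vary.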


In [5]:
#@title env_portfolio.py
%%writefile env_portfolio.py
import numpy as np
import pandas as pd
from gym.utils import seeding
import gym
from gym import spaces
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from stable_baselines3.common.vec_env import DummyVecEnv


class StockPortfolioEnv(gym.Env):
    """A stock portfolio allocation environment for OpenAI gym

    Attributes
    ----------
        df: DataFrame
            input data
        stock_dim : int
            number of unique stocks
        hmax : int
            maximum number of shares to trade
        initial_amount : int
            start money
        transaction_cost_pct: float
            transaction cost percentage per trade
        reward_scaling: float
            scaling factor for reward, good for training
        state_space: int
            the dimension of input features
        action_space: int
            equals stock dimension
        tech_indicator_list: list
            a list of technical indicator names
        turbulence_threshold: int
            a threshold to control risk aversion
        day: int
            an increment number to control date

    Methods
    -------
    step()
        at each step the agent returns actions (portfolio weights), then
        we calculate the reward and return the next observation.
    reset()
        reset the environment
    render()
        return the current state
    softmax_normalization()
        normalize the raw actions into weights that sum to 1
    save_asset_memory()
        return account value at each time step
    save_action_memory()
        return actions/positions at each time step

    """
    metadata = {'render.modes': ['human']}

    def __init__(self, 
                df,
                stock_dim,
                hmax,
                initial_amount,
                transaction_cost_pct,
                reward_scaling,
                state_space,
                action_space,
                tech_indicator_list,
                initial_weights,
                turbulence_threshold=None,
                lookback=252,
                day = 0):
        #super(StockEnv, self).__init__()
        #money = 10 , scope = 1
        self.day = day
        self.lookback=lookback
        self.df = df
        self.stock_dim = stock_dim
        self.hmax = hmax
        self.initial_amount = initial_amount
        self.transaction_cost_pct =transaction_cost_pct
        self.reward_scaling = reward_scaling
        self.state_space = state_space
        self.action_space = action_space
        self.tech_indicator_list = tech_indicator_list
        self.initial_weights = initial_weights

        # action_space normalization and shape is self.stock_dim
        self.action_space = spaces.Box(low = 0, high = 1,shape = (self.action_space,)) 
 
        # covariance matrix + technical indicators
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape = (self.state_space+len(self.tech_indicator_list),self.state_space))

        # load data from a pandas dataframe
        self.data = self.df.loc[self.day,:]
        self.covs = self.data['cov_list'].values[0]
        self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
        self.terminal = False     
        self.turbulence_threshold = turbulence_threshold        
        # initialize state: initial portfolio return + individual stock return + individual weights
        self.portfolio_value = self.initial_amount
    
        # memorize portfolio value each step
        self.asset_memory = [self.initial_amount]
        # memorize portfolio return each step
        self.portfolio_return_memory = [0]
        self.actions_memory=[self.initial_weights]
        self.date_memory=[self.data.date.unique()[0]]
             
    def step(self, actions):
        # print(self.day)
        self.terminal = self.day >= len(self.df.index.unique())-1
        # print(actions)

        if self.terminal:
            df = pd.DataFrame(self.portfolio_return_memory)
            df.columns = ['daily_return']
            plt.plot(df.daily_return.cumsum(),'r')
            plt.savefig('results/cumulative_reward.png')
            plt.close()
            
            plt.plot(self.portfolio_return_memory,'r')
            plt.savefig('results/rewards.png')
            plt.close()

            print("=================================")
            print("begin_total_asset:{}".format(self.asset_memory[0]))           
            print("end_total_asset:{}".format(self.portfolio_value))

            df_daily_return = pd.DataFrame(self.portfolio_return_memory)
            df_daily_return.columns = ['daily_return']
            if df_daily_return['daily_return'].std() !=0:
              sharpe = (252**0.5)*df_daily_return['daily_return'].mean()/ \
                       df_daily_return['daily_return'].std()
              print("Sharpe: ",sharpe)
            print("=================================")
            
            return self.state, self.reward, self.terminal,{}

        else:
            #print("Model actions: ",actions)
            # actions are the portfolio weight
            # normalize to sum of 1
            #if (np.array(actions) - np.array(actions).min()).sum() != 0:
            #  norm_actions = (np.array(actions) - np.array(actions).min()) / (np.array(actions) - np.array(actions).min()).sum()
            #else:
            
      
            #  norm_actions = actions
            weights = self.softmax_normalization(actions) 
            #print("Normalized actions: ", weights)
            self.actions_memory.append(weights)
            last_day_memory = self.data
            
            
            """
            # Get data frame of close prices 
            # Reset the Index to tic and date
            df_prices = self.data.copy()
            df_prices = df_prices.reset_index().set_index(['tic', 'date']).sort_index()
            tic_list = list(set([i for i,j in df_prices.index]))

            # Get all the Close Prices
            df_close = pd.DataFrame()
            for ticker in tic_list:
                series = df_prices.xs(ticker).close
                df_close[ticker] = series
            
            mu = expected_returns.mean_historical_return(df_close)
            Sigma = risk_models.sample_cov(df_close)
            ef = EfficientFrontier(mu,Sigma)

            raw_weights = ef.max_sharpe()
            weights = [j for i,j in raw_weights.items()]
            self.actions_memory.append(weights)
            last_day_memory = self.data
            
            """

            #load next state
            self.day += 1
            self.data = self.df.loc[self.day,:]
            self.covs = self.data['cov_list'].values[0]
            self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
            #print(self.state)
            # calculate portfolio return
            # individual stocks' return * weight
            portfolio_return = sum(((self.data.close.values / last_day_memory.close.values)-1)*weights)
            # update portfolio value
            new_portfolio_value = self.portfolio_value*(1+portfolio_return)
            self.portfolio_value = new_portfolio_value

            # save into memory
            self.portfolio_return_memory.append(portfolio_return)
            self.date_memory.append(self.data.date.unique()[0])            
            self.asset_memory.append(new_portfolio_value)

            # the reward is the new portfolio value or end portfolio value
            self.reward = new_portfolio_value 
            #print("Step reward: ", self.reward)
            #self.reward = self.reward*self.reward_scaling

        return self.state, self.reward, self.terminal, {}

    def reset(self):
        self.asset_memory = [self.initial_amount]
        self.day = 0
        self.data = self.df.loc[self.day,:]
        # load states
        self.covs = self.data['cov_list'].values[0]
        self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
        self.portfolio_value = self.initial_amount
        #self.cost = 0
        #self.trades = 0
        self.terminal = False 
        self.portfolio_return_memory = [0]
              
        self.actions_memory=[self.initial_weights] 
        self.date_memory=[self.data.date.unique()[0]] 
        return self.state
    
    def render(self, mode='human'):
        return self.state
        
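    def softmax_normalization(self, actions):
        # Subtract the maximum before exponentiating: the result is unchanged,
        # but np.exp can no longer overflow for large actions
        shifted = np.asarray(actions) - np.max(actions)
        numerator = np.exp(shifted)
        denominator = np.sum(numerator)
        softmax_output = numerator/denominator
        return softmax_output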

    
    def save_asset_memory(self):
        date_list = self.date_memory
        portfolio_return = self.portfolio_return_memory
        #print(len(date_list))
        #print(len(asset_list))
        df_account_value = pd.DataFrame({'date':date_list,'daily_return':portfolio_return})
        return df_account_value

    def save_action_memory(self):
        # date and close price length must match actions length
        date_list = self.date_memory
        df_date = pd.DataFrame(date_list)
        df_date.columns = ['date']
        
        action_list = self.actions_memory
        df_actions = pd.DataFrame(action_list)
        df_actions.columns = self.data.tic.values
        df_actions.index = df_date.date
        #df_actions = pd.DataFrame({'date':date_list,'actions':action_list})
        return df_actions
    
    def initial_weights(self, data_frame):
        # Get data frame of close prices 
        # Reset the Index to tic and date
        df_prices = data_frame.copy()
        df_prices = df_prices.reset_index().set_index(['tic', 'date']).sort_index()
        tic_list = list(set([i for i,j in df_prices.index]))
        
        # Get all the Close Prices
        df_close = pd.DataFrame()
        for ticker in tic_list:
            series = df_prices.xs(ticker).close
            df_close[ticker] = series
            
        mu = expected_returns.mean_historical_return(df_close)
        Sigma = risk_models.sample_cov(df_close)
        ef = EfficientFrontier(mu,Sigma, weight_bounds=(0.01, 1))
        
        raw_weights = ef.max_sharpe()
        initial_weights = [j for i,j in raw_weights.items()]
        
        return initial_weights

    def _seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def get_sb_env(self):
        e = DummyVecEnv([lambda: self])
        obs = e.reset()
        return e, obs


Writing env_portfolio.py
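
The core of `step()` is small: the raw actions are turned into portfolio weights with a softmax, and the portfolio value is updated with the weighted return of the assets. The following is a minimal standalone sketch of that arithmetic. The prices and actions are hypothetical, and the softmax is written in its max-shifted form, which is mathematically equivalent to the plain version.

```python
import numpy as np

def softmax_normalization(actions):
    """Map raw actions to positive weights that sum to 1."""
    actions = np.asarray(actions, dtype=float)
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp.
    exps = np.exp(actions - actions.max())
    return exps / exps.sum()

# Hypothetical raw actions for three assets and their closing prices on two consecutive days.
raw_actions = np.array([1.0, 0.0, -1.0])
close_yesterday = np.array([100.0, 50.0, 20.0])
close_today = np.array([101.0, 50.0, 19.0])

weights = softmax_normalization(raw_actions)

# Portfolio return: weighted sum of each asset's simple return.
asset_returns = close_today / close_yesterday - 1
portfolio_return = float(np.sum(asset_returns * weights))

# Portfolio value update, mirroring new_portfolio_value in step().
portfolio_value = 1_000_000 * (1 + portfolio_return)
```

Because the softmax output is strictly positive, this environment can only represent long-only portfolios that are fully invested.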


In [6]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import datetime

import pandas as pd
import numpy as np

import matplotlib
import matplotlib.pylab as plt

from pandas_datareader import data as pdr

from pyfolio import timeseries

import ta
from ta import add_all_ta_features
from ta.utils import dropna

from finrl.preprocessing.data import data_split

import config

from backtest import BackTestStats, BaselineStats, BackTestPlot, backtest_strat, baseline_strat

import models
from models import DRLAgent

import env_portfolio
from env_portfolio import StockPortfolioEnv

matplotlib.use('Agg')
%matplotlib inline

  'Module "zipline.assets" not found; multipliers will not be applied'


# Environment


In [7]:
import os

# Create the working directories defined in config if they do not exist yet
for directory in (config.DATA_SAVE_DIR, config.TRAINED_MODEL_DIR,
                  config.TENSORBOARD_LOG_DIR, config.RESULTS_DIR):
    os.makedirs("./" + directory, exist_ok=True)

# Data


## Download

In [8]:
ticker_list = config.DOW_30_TICKER

In [9]:
# Download daily OHLCV data for all tickers and drop dates with missing values
df = pdr.get_data_yahoo(ticker_list, start='2008-01-01', end="2021-01-01").dropna()

In [10]:
df

Attributes,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Close,Close,Close,Close,Close,Close,Close,Close,Close,Close,...,Open,Open,Open,Open,Open,Open,Open,Open,Open,Open,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume
Symbols,AAPL,MSFT,JPM,V,RTX,PG,GS,NKE,DIS,AXP,HD,INTC,WMT,IBM,MRK,UNH,KO,CAT,TRV,JNJ,CVX,MCD,VZ,CSCO,XOM,BA,MMM,PFE,WBA,DD,AAPL,MSFT,JPM,V,RTX,PG,GS,NKE,DIS,AXP,...,CVX,MCD,VZ,CSCO,XOM,BA,MMM,PFE,WBA,DD,AAPL,MSFT,JPM,V,RTX,PG,GS,NKE,DIS,AXP,HD,INTC,WMT,IBM,MRK,UNH,KO,CAT,TRV,JNJ,CVX,MCD,VZ,CSCO,XOM,BA,MMM,PFE,WBA,DD
Date
2008-03-19,3.965466,21.245945,29.877951,12.828161,30.894299,44.601810,136.437653,12.970126,26.503471,33.403065,18.987404,13.838433,37.073303,72.017250,24.798239,29.601982,19.519815,49.762341,32.930885,42.876759,47.255245,35.911816,16.676668,17.817327,50.754337,54.094536,53.751907,11.213460,25.902073,33.847958,4.631071,28.620001,42.470001,14.125000,43.272499,67.809998,166.490005,15.457500,31.240000,42.000000,...,86.139999,55.009998,33.047550,25.469999,87.809998,76.980003,81.239998,19.781784,36.990002,54.084251,1.010537e+09,61442100.0,70593300.0,708486000.0,9691947.0,15093900.0,24176100.0,20753600.0,11737100.0,14098300.0,23266600.0,69668600.0,25829900.0,9742862.0,16114782.0,14692900.0,23258600.0,7377400.0,3956100.0,15845800.0,14797100.0,8740500.0,20761899.0,63988600.0,35073600.0,9195600.0,4450700.0,48320946.0,8394800.0,6272388.0
2008-03-20,4.075558,21.661661,32.340225,14.610479,31.006624,45.627899,147.205826,14.111277,27.063408,36.568398,19.850466,14.271501,38.862007,72.873276,25.011662,29.410463,19.871410,49.823090,33.910114,43.207195,48.016968,36.420456,17.146633,18.035753,51.096992,55.088783,52.744095,11.202583,26.186872,33.913036,4.759643,29.180000,45.970001,16.087500,43.429829,69.370003,179.630005,16.817499,31.900000,45.980000,...,81.279999,53.950001,33.523659,24.600000,84.180000,73.550003,78.500000,19.639469,36.419998,51.878181,9.087876e+08,60170200.0,72777200.0,198985200.0,10182471.0,16523000.0,20801800.0,43300400.0,13802300.0,17558800.0,22243000.0,67373400.0,44533300.0,11943123.0,17209522.0,12731500.0,31028600.0,9254000.0,6908900.0,16276300.0,18373200.0,13075600.0,19425705.0,70930100.0,44962400.0,8086000.0,6937300.0,57742441.0,10100800.0,6056336.0
2008-03-24,4.266997,21.654238,32.748253,13.561522,31.604202,46.022564,146.591263,14.490966,27.182184,37.705692,20.699379,14.520835,39.154045,73.322853,25.525051,29.735218,19.916986,51.314686,33.839146,42.883377,48.478622,37.323936,17.550144,18.669235,51.668072,56.186153,53.190491,11.213460,27.489807,35.010006,4.983214,29.170000,46.549999,14.932500,44.266834,69.970001,178.880005,17.270000,32.040001,47.410000,...,83.379997,54.580002,34.167805,24.920000,85.169998,75.250000,78.150002,19.753321,38.250000,52.461723,1.066920e+09,48294700.0,66011200.0,149566400.0,6862732.0,10100000.0,15643700.0,29827600.0,10536000.0,13441300.0,25245600.0,53798700.0,22246400.0,8784308.0,18329520.0,7288100.0,14037600.0,5374000.0,5345300.0,9887900.0,9971900.0,7874100.0,13063730.0,55003000.0,22339200.0,4853800.0,4342200.0,37655731.0,9976500.0,5143087.0
2008-03-25,4.311340,21.631962,32.403553,14.360732,31.523342,45.739735,147.205826,14.342029,27.216116,37.785225,20.345669,14.612696,38.730591,72.651566,25.767328,29.377161,19.991863,51.726383,33.888805,42.658676,48.784451,37.611721,17.516914,18.749329,51.217213,55.898907,53.420498,11.305962,27.062611,35.195934,5.035000,29.139999,46.060001,15.812500,44.153557,69.540001,179.630005,17.092501,32.080002,47.509998,...,84.269997,55.669998,34.597237,25.730000,86.260002,76.879997,78.830002,19.620493,38.660000,53.799599,1.052391e+09,49149000.0,54485500.0,87092000.0,7347536.0,10581200.0,11666800.0,20459600.0,8547300.0,9942200.0,18463700.0,48236100.0,20327500.0,8832006.0,14061121.0,9606500.0,15298200.0,5211600.0,3225800.0,9511900.0,12123000.0,7557800.0,15571050.0,46113300.0,27080200.0,5969200.0,3888700.0,46445142.0,5740700.0,5571396.0
2008-03-26,4.436111,21.201405,31.031700,14.521932,31.276218,45.805496,143.804916,13.901510,26.944635,36.075314,19.921211,14.343671,38.621075,71.998779,25.784622,28.394588,19.907221,52.212341,33.739807,42.757812,49.026821,37.210152,17.156134,18.043043,51.854462,56.193527,53.204029,11.256991,27.027027,35.149464,5.180714,28.559999,44.110001,15.990000,43.807426,69.639999,175.479996,16.567499,31.760000,45.360001,...,84.620003,55.830002,34.335842,25.299999,85.230003,75.660004,78.730003,19.658443,37.750000,53.586105,1.182084e+09,45868100.0,41122700.0,43111600.0,6729892.0,8431200.0,11610600.0,20653600.0,9219700.0,11113000.0,14793500.0,51872200.0,19226400.0,10059487.0,21383497.0,15311800.0,16547800.0,7498900.0,2990400.0,11354700.0,11674800.0,5240900.0,19511292.0,79022700.0,23024000.0,4350600.0,3700200.0,33432775.0,4524300.0,3982379.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-12-24,130.994522,220.442551,120.745842,207.088928,68.226273,133.619583,251.740738,140.543198,173.729996,115.707329,265.376099,45.533478,141.253708,112.145279,73.789803,336.064819,51.806728,175.036377,135.425613,147.773376,80.237846,206.658356,55.618660,43.017670,38.760460,217.149994,167.531418,35.595997,37.781544,68.455521,131.970001,222.750000,124.519997,208.699997,70.269997,137.720001,256.160004,141.600006,173.729996,117.349998,...,85.930000,212.119995,59.000000,44.450001,41.650002,219.619995,174.419998,37.400002,39.950001,69.400002,5.493010e+07,10550600.0,4164900.0,3367900.0,1758600.0,2588200.0,968100.0,1821900.0,2721000.0,707000.0,1093900.0,11865600.0,3018200.0,1842111.0,2957456.0,1360600.0,3265500.0,585700.0,416800.0,2114900.0,3335600.0,1047700.0,7751100.0,5720500.0,8039000.0,6398500.0,656200.0,14790100.0,2678000.0,2409700.0
2020-12-28,135.679642,222.629654,121.540993,210.988602,68.575813,134.550995,255.111557,141.366974,178.860001,116.703194,263.740295,45.533478,142.946777,112.262199,74.075241,341.153290,52.504723,173.876373,136.110123,148.471191,79.833496,209.229492,55.750992,43.384678,38.890903,216.089996,167.713821,35.166214,37.819691,67.569679,136.690002,224.960007,125.339996,212.630005,70.629997,138.679993,259.589996,142.429993,178.860001,118.360001,...,85.610001,212.990005,59.160000,44.930000,41.689999,218.190002,175.309998,37.360001,39.680000,69.930000,1.244862e+08,17933500.0,8072600.0,5816200.0,2938000.0,3714700.0,2793400.0,4081500.0,13145400.0,1878700.0,2633800.0,21269200.0,6448300.0,3781499.0,4804242.0,2308200.0,9020500.0,1508800.0,1100300.0,3855500.0,8051900.0,2550100.0,15355600.0,13458400.0,23877500.0,9090600.0,1403000.0,26993700.0,4714500.0,4712300.0
2020-12-29,133.873077,221.828033,121.220993,212.715149,68.294243,134.298737,253.558807,140.513443,177.300003,116.486267,260.742920,47.777740,142.041199,111.344818,74.968384,342.533844,52.475643,172.199692,135.288712,149.391922,79.560806,207.948822,55.590298,43.114250,38.452988,216.250000,166.888260,35.385876,37.581287,68.219284,134.869995,224.149994,125.010002,214.369995,70.339996,138.419998,258.010010,141.570007,177.300003,118.139999,...,85.260002,214.639999,59.029999,44.970001,42.040001,218.300003,175.550003,36.900002,39.810001,68.750000,1.210473e+08,17403200.0,8389200.0,6093400.0,3670100.0,5139300.0,1430900.0,3232400.0,6875400.0,1860400.0,2572100.0,84531400.0,5979400.0,3647402.0,5708037.0,2275700.0,8320600.0,1490300.0,859000.0,5212000.0,7670800.0,1665700.0,15686100.0,11829000.0,20287700.0,14593800.0,1218900.0,23152100.0,4004400.0,5159500.0
2020-12-30,132.731583,219.383636,121.560387,216.674362,69.284584,133.668106,254.973984,140.523346,181.169998,117.758224,259.831909,47.158638,141.923065,111.830490,74.204147,340.206573,52.776165,175.504288,136.002548,151.243088,80.237846,206.824570,54.956978,42.959721,38.760460,216.669998,167.137833,35.089802,37.514538,69.193718,133.720001,221.679993,125.360001,218.360001,71.360001,137.770004,259.450012,141.580002,181.169998,119.430000,...,84.610001,212.960007,58.830002,44.740002,41.330002,216.360001,173.880005,37.029999,39.520000,69.339996,9.645210e+07,20272300.0,7398000.0,8875100.0,5015500.0,3261400.0,1566500.0,3052100.0,11680400.0,1954200.0,2511400.0,37385400.0,6250400.0,3535794.0,5933357.0,1866000.0,8142700.0,2720600.0,1253800.0,5412800.0,7901800.0,1855000.0,18259800.0,11043100.0,23807300.0,10812600.0,1419100.0,24889800.0,4194300.0,4683000.0


In [11]:
data = df.copy()

In [12]:
data = data.stack().reset_index()
data.columns.names = [None]
data = data.drop(['Close'], axis=1)
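
The reshaping above turns the wide frame, whose columns are (attribute, symbol) pairs, into a long frame with one row per date and ticker. A small sketch of the same `stack` and `reset_index` steps on a hypothetical two-day, two-ticker frame (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical frame with the same (attribute, symbol) column layout as the download.
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Volume"], ["AAPL", "MSFT"]], names=["Attributes", "Symbols"]
)
wide = pd.DataFrame(
    [[3.97, 21.25, 1.0e9, 6.1e7], [4.08, 21.66, 9.1e8, 6.0e7]],
    index=pd.Index(["2008-03-19", "2008-03-20"], name="Date"),
    columns=columns,
)

# stack() moves the innermost column level (Symbols) into the row index,
# and reset_index() turns Date and Symbols into ordinary columns.
long = wide.stack(level="Symbols").reset_index()
long.columns.names = [None]
```

Each original row becomes one row per ticker, so the long frame has (number of dates) x (number of tickers) rows.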

In [13]:
data.head()

Unnamed: 0,Date,Symbols,Adj Close,High,Low,Open,Volume
0,2008-03-19,AAPL,3.965466,4.796071,4.631071,4.754286,1010537000.0
1,2008-03-19,MSFT,21.245945,29.59,28.620001,29.379999,61442100.0
2,2008-03-19,JPM,29.877951,44.889999,42.439999,43.259998,70593300.0
3,2008-03-19,V,12.828161,17.25,13.75,14.875,708486000.0
4,2008-03-19,RTX,30.894299,44.361233,43.272499,43.813721,9691947.0


In [14]:
data.columns = ['date','tic','close','high','low','open','volume']
data

Unnamed: 0,date,tic,close,high,low,open,volume
0,2008-03-19,AAPL,3.965466,4.796071,4.631071,4.754286,1.010537e+09
1,2008-03-19,MSFT,21.245945,29.590000,28.620001,29.379999,6.144210e+07
2,2008-03-19,JPM,29.877951,44.889999,42.439999,43.259998,7.059330e+07
3,2008-03-19,V,12.828161,17.250000,13.750000,14.875000,7.084860e+08
4,2008-03-19,RTX,30.894299,44.361233,43.272499,43.813721,9.691947e+06
...,...,...,...,...,...,...,...
96625,2020-12-31,BA,214.059998,216.899994,212.699997,216.240005,1.048760e+07
96626,2020-12-31,MMM,167.790604,174.869995,173.179993,174.119995,1.841300e+06
96627,2020-12-31,PFE,35.156662,36.919998,36.290001,36.660000,3.079650e+07
96628,2020-12-31,WBA,38.029480,40.000000,39.029999,39.330002,7.696000e+06


## Save to CSV

In [15]:
data.to_csv('datasets/data.csv', index=False)

# Feature engineering and data preprocessing

## Technical indicators

In [16]:
data

Unnamed: 0,date,tic,close,high,low,open,volume
0,2008-03-19,AAPL,3.965466,4.796071,4.631071,4.754286,1.010537e+09
1,2008-03-19,MSFT,21.245945,29.590000,28.620001,29.379999,6.144210e+07
2,2008-03-19,JPM,29.877951,44.889999,42.439999,43.259998,7.059330e+07
3,2008-03-19,V,12.828161,17.250000,13.750000,14.875000,7.084860e+08
4,2008-03-19,RTX,30.894299,44.361233,43.272499,43.813721,9.691947e+06
...,...,...,...,...,...,...,...
96625,2020-12-31,BA,214.059998,216.899994,212.699997,216.240005,1.048760e+07
96626,2020-12-31,MMM,167.790604,174.869995,173.179993,174.119995,1.841300e+06
96627,2020-12-31,PFE,35.156662,36.919998,36.290001,36.660000,3.079650e+07
96628,2020-12-31,WBA,38.029480,40.000000,39.029999,39.330002,7.696000e+06


In [17]:
data = pd.read_csv('datasets/data.csv')
data_w_feat = data.copy()
data_w_feat = add_all_ta_features(data_w_feat, open = 'open', high = 'high', low = 'low', close = 'close', volume = 'volume')

  self._nvi.iloc[i] = self._nvi.iloc[i - 1] * (1.0 + price_change.iloc[i])
  dip[idx] = 100 * (self._dip[idx] / value)
  din[idx] = 100 * (self._din[idx] / value)


In [18]:
feature_list= ['volatility_atr','volatility_bbw','volume_obv','volume_cmf',
               'trend_macd', 'trend_adx', 'trend_sma_fast', 
               'trend_ema_fast', 'trend_cci', 'momentum_rsi']

short_names = ['atr', 'bbw','obv','cmf','macd', 'adx', 'sma', 'ema', 'cci', 'rsi']

data_w_feat = data_w_feat[list(data.columns) + feature_list].dropna()
data_w_feat.columns = list(data.columns) + short_names
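
Note that `add_all_ta_features` receives the long frame, where rows of different tickers are interleaved by date, so window-based indicators appear to be computed across tickers rather than per ticker. The sketch below illustrates the difference on hypothetical data, using a plain rolling mean as a stand-in for the `ta` indicators; it is not the notebook's method.

```python
import pandas as pd

# Hypothetical long-format prices: two tickers interleaved by date.
prices = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2", "d3", "d3"],
    "tic": ["A", "B", "A", "B", "A", "B"],
    "close": [10.0, 100.0, 12.0, 110.0, 14.0, 120.0],
})

# Rolling mean over the stacked column: each window mixes tickers A and B.
prices["sma_mixed"] = prices["close"].rolling(window=2).mean()

# Rolling mean computed separately for each ticker.
prices["sma_per_tic"] = prices.groupby("tic")["close"].transform(
    lambda s: s.rolling(window=2).mean()
)
```

With the grouped version, the moving average of ticker `A` stays near its own price level instead of being pulled toward the prices of `B`.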

In [19]:
data_w_feat

Unnamed: 0,date,tic,close,high,low,open,volume,atr,bbw,obv,cmf,macd,adx,sma,ema,cci,rsi
25,2008-03-19,BA,54.094536,77.000000,73.449997,76.980003,9195600.0,30.117253,280.611233,5.189066e+08,-20.366804,5.201412,0.000000,35.166663,37.380311,48.130516,55.247735
26,2008-03-19,MMM,53.751907,81.680000,78.540001,81.239998,4450700.0,29.864074,183.869161,5.144559e+08,-21.201594,6.122118,0.000000,37.579468,39.899019,88.742553,55.158750
27,2008-03-19,PFE,11.213460,19.971537,19.497154,19.781784,48320946.0,30.303142,185.771772,4.661350e+08,-22.915306,3.380322,9.260898,36.047091,35.485856,-103.397634,45.385077
28,2008-03-19,WBA,25.902073,37.330002,36.299999,36.990002,8394800.0,29.884482,186.113708,4.745298e+08,-23.091596,2.365409,8.770102,36.578946,34.011428,-46.845081,48.761258
29,2008-03-19,DD,33.847958,54.454300,51.807018,54.084251,6272388.0,29.751256,185.980637,4.808022e+08,-23.473638,2.177153,8.597596,35.252748,33.986278,0.095718,50.542380
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96625,2020-12-31,BA,214.059998,216.899994,212.699997,216.240005,10487600.0,100.704312,232.115948,-1.107522e+12,-3.956472,-6.695401,4.907576,131.828187,124.693201,71.164410,53.157662
96626,2020-12-31,MMM,167.790604,174.869995,173.179993,174.119995,1841300.0,94.721881,227.697474,-1.107524e+12,-4.043790,-2.761476,4.837083,139.534215,131.323571,35.279538,51.338474
96627,2020-12-31,PFE,35.156662,36.919998,36.290001,36.660000,30796500.0,98.399754,246.147603,-1.107554e+12,-4.190862,-10.228356,4.943856,113.645794,116.528662,-98.460047,46.432970
96628,2020-12-31,WBA,38.029480,40.000000,39.029999,39.330002,7696000.0,89.044112,265.554799,-1.107547e+12,-4.291463,-15.732743,5.024889,112.384588,104.451864,-86.218507,46.552097


## Covariance matrix

In [20]:
def add_cov_matrix(df):
    """
    Function to add covariance matrices as part of the defined states
    """
    # Sort the data and index by date and tic
    df=df.sort_values(['date','tic'],ignore_index=True) 
    df.index = df.date.factorize()[0]
    
    cov_list = [] # create empty list for storing covariance matrices at each time step
    
    # the lookback window for constructing the covariance matrix is one year (252 trading days)
    lookback=252
    for i in range(lookback,len(df.index.unique())):
        data_lookback = df.loc[i-lookback:i,:]
        price_lookback=data_lookback.pivot_table(index = 'date',columns = 'tic', values = 'close')
        return_lookback = price_lookback.pct_change().dropna()
        covs = return_lookback.cov().values 
        covs = covs#/covs.max()
        cov_list.append(covs)
        
    df_cov = pd.DataFrame({'date':df.date.unique()[lookback:],'cov_list':cov_list})
    df = df.merge(df_cov, on='date')
    df = df.sort_values(['date','tic']).reset_index(drop=True)
    
    return df

In [21]:
data_w_feat_cov = data_w_feat.copy()
data_w_feat_cov = add_cov_matrix(data_w_feat_cov)
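
To make the rolling computation in `add_cov_matrix` easier to follow, here is a reduced sketch of the same idea on hypothetical random prices, with a short lookback of 5 days instead of 252. It works on an already pivoted frame (one column per ticker), so the `pivot_table` step is omitted.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_days, tickers, lookback = 8, ["A", "B", "C"], 5

# Hypothetical close prices: one row per trading day, one column per ticker.
close = pd.DataFrame(
    100 * np.cumprod(1 + rng.normal(0, 0.01, size=(n_days, len(tickers))), axis=0),
    columns=tickers,
)

cov_list = []
for i in range(lookback, n_days):
    # .loc slicing on an integer index is inclusive at both ends:
    # lookback + 1 prices, hence lookback returns per window.
    window = close.loc[i - lookback:i]
    returns = window.pct_change().dropna()
    cov_list.append(returns.cov().values)
```

The first `lookback` days have no covariance matrix, which is why the merged data in the notebook starts one year after the first downloaded date.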

In [22]:
data_w_feat_cov

Unnamed: 0,date,tic,close,high,low,open,volume,atr,bbw,obv,cmf,macd,adx,sma,ema,cci,rsi,cov_list
0,2009-03-19,AAPL,3.107663,3.685714,3.580357,3.637500,500180800.0,18.120416,233.294550,-2.422088e+11,-11.832022,-3.218524,6.022222,23.265005,18.759133,-115.647520,44.563560,"[[0.0013245336239407484, 0.001164215634539437,..."
1,2009-03-19,AXP,10.624823,14.570000,12.950000,14.310000,33340700.0,23.609795,311.271757,-2.420588e+11,-9.897507,-0.461978,6.134882,20.013142,22.364379,-62.314367,47.208735,"[[0.0013245336239407484, 0.001164215634539437,..."
2,2009-03-19,BA,25.146063,34.119999,32.970001,33.860001,10427300.0,21.592491,267.663188,-2.421094e+11,-16.619005,1.190334,6.037623,25.184041,26.936902,-15.047805,49.853837,"[[0.0013245336239407484, 0.001164215634539437,..."
3,2009-03-19,CAT,19.784441,28.850000,27.770000,27.969999,17542700.0,21.418635,339.423253,-2.420928e+11,-11.187232,-0.429382,5.377175,25.649076,22.630488,-5.450580,49.181831,"[[0.0013245336239407484, 0.001164215634539437,..."
4,2009-03-19,CSCO,11.817540,16.680000,16.059999,16.680000,56670100.0,19.040104,273.055140,-2.421376e+11,-16.155069,0.093050,5.979474,27.536416,24.597719,-69.538141,46.209972,"[[0.0013245336239407484, 0.001164215634539437,..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
89065,2020-12-31,V,217.041473,219.820007,216.199997,218.399994,5922200.0,84.857357,252.357853,-1.107450e+12,-3.242753,5.148149,4.967134,112.864019,136.559560,85.396150,53.778921,"[[0.0008654078323762384, 0.0006141779665130057..."
89066,2020-12-31,VZ,55.533581,58.799999,58.020000,58.060001,12906300.0,112.114058,209.188972,-1.107494e+12,-3.318892,-3.408291,4.534457,132.462845,135.353189,-100.608867,47.047963,"[[0.0008654078323762384, 0.0006141779665130057..."
89067,2020-12-31,WBA,38.029480,40.000000,39.029999,39.330002,7696000.0,89.044112,265.554799,-1.107547e+12,-4.291463,-15.732743,5.024889,112.384588,104.451864,-86.218507,46.552097,"[[0.0008654078323762384, 0.0006141779665130057..."
89068,2020-12-31,WMT,141.893524,144.270004,142.850006,144.199997,5938000.0,104.238841,222.167131,-1.107466e+12,-2.431121,5.045837,4.724694,159.587286,145.788277,9.769952,50.053460,"[[0.0008654078323762384, 0.0006141779665130057..."


# Data split

In [23]:
train_pct = 0.8 # percentage of train data
date_list = list(data_w_feat_cov.date.unique()) # List of dates in the data

train_data_len = int(train_pct * len(date_list)) # length of the train data

train_start_date = date_list[0]
train_end_date = date_list[train_data_len]

test_start_date = date_list[train_data_len+1]
test_end_date = date_list[-1]

In [24]:
print('Training Data: ', 'from ', train_start_date, ' to ', train_end_date)
print('Testing Data: ', 'from ', test_start_date, ' to ', test_end_date)

Training Data:  from  2009-03-19  to  2018-08-23
Testing Data:  from  2018-08-24  to  2020-12-31


In [25]:
# Split the whole dataset
# Note: train_df ends on 2018-08-22, which suggests that data_split excludes the end date
train_df = data_split(data_w_feat_cov, train_start_date, train_end_date)
test_df = data_split(data_w_feat_cov, test_start_date, test_end_date)
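
A date-based split of this kind can be sketched as follows. `split_by_date` is a hypothetical helper written for illustration, under the assumption that the end date is exclusive; it is not the FinRL implementation. With an exclusive end, using the same boundary date as the end of the training set and the start of the test set keeps every day in exactly one subset.

```python
import pandas as pd

def split_by_date(df, start, end):
    """Hypothetical helper: keep rows with start <= date < end (end excluded)."""
    out = df[(df["date"] >= start) & (df["date"] < end)]
    out = out.sort_values(["date", "tic"], ignore_index=True)
    # Re-index by day number so that out.loc[day] returns all tickers of that day.
    out.index = out["date"].factorize()[0]
    return out

dates = ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]
frame = pd.DataFrame(
    [(d, t, 1.0) for d in dates for t in ("A", "B")], columns=["date", "tic", "close"]
)

date_list = list(frame["date"].unique())
cut = int(0.8 * len(date_list))

train = split_by_date(frame, date_list[0], date_list[cut])
# Starting the test set at the same boundary date avoids dropping that day.
test = split_by_date(frame, date_list[cut], "2020-01-08")
```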

# Deep Reinforcement Learning

In [26]:
stock_dimension = len(train_df.tic.unique())
state_space = stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")

Stock Dimension: 30, State Space: 30
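
Although `state_space` equals the number of stocks, the observation built in `reset()` and `step()` is a matrix: the covariance matrix with one extra row per technical indicator. The sketch below reproduces that construction with placeholder zeros to show the resulting shape, assuming 30 stocks and the 10 indicators in `short_names`.

```python
import numpy as np

stock_dim = 30
n_indicators = 10  # length of short_names

# Placeholder inputs with the shapes used in the environment.
covs = np.zeros((stock_dim, stock_dim))
indicators = [[0.0] * stock_dim for _ in range(n_indicators)]

# Same construction as in reset() and step(): the indicator rows are
# appended below the covariance matrix.
state = np.append(covs, indicators, axis=0)
print(state.shape)
```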


In [27]:
weights_initial = np.repeat(1/stock_dimension, stock_dimension)
weights_initial

array([0.03333333, 0.03333333, 0.03333333, 0.03333333, 0.03333333,
       0.03333333, 0.03333333, 0.03333333, 0.03333333, 0.03333333,
       0.03333333, 0.03333333, 0.03333333, 0.03333333, 0.03333333,
       0.03333333, 0.03333333, 0.03333333, 0.03333333, 0.03333333,
       0.03333333, 0.03333333, 0.03333333, 0.03333333, 0.03333333,
       0.03333333, 0.03333333, 0.03333333, 0.03333333, 0.03333333])

In [28]:
train_df

Unnamed: 0,date,tic,close,high,low,open,volume,atr,bbw,obv,cmf,macd,adx,sma,ema,cci,rsi,cov_list
0,2009-03-19,AAPL,3.107663,3.685714,3.580357,3.637500,500180800.0,18.120416,233.294550,-2.422088e+11,-11.832022,-3.218524,6.022222,23.265005,18.759133,-115.647520,44.563560,"[[0.0013245336239407484, 0.001164215634539437,..."
0,2009-03-19,AXP,10.624823,14.570000,12.950000,14.310000,33340700.0,23.609795,311.271757,-2.420588e+11,-9.897507,-0.461978,6.134882,20.013142,22.364379,-62.314367,47.208735,"[[0.0013245336239407484, 0.001164215634539437,..."
0,2009-03-19,BA,25.146063,34.119999,32.970001,33.860001,10427300.0,21.592491,267.663188,-2.421094e+11,-16.619005,1.190334,6.037623,25.184041,26.936902,-15.047805,49.853837,"[[0.0013245336239407484, 0.001164215634539437,..."
0,2009-03-19,CAT,19.784441,28.850000,27.770000,27.969999,17542700.0,21.418635,339.423253,-2.420928e+11,-11.187232,-0.429382,5.377175,25.649076,22.630488,-5.450580,49.181831,"[[0.0013245336239407484, 0.001164215634539437,..."
0,2009-03-19,CSCO,11.817540,16.680000,16.059999,16.680000,56670100.0,19.040104,273.055140,-2.421376e+11,-16.155069,0.093050,5.979474,27.536416,24.597719,-69.538141,46.209972,"[[0.0013245336239407484, 0.001164215634539437,..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2374,2018-08-22,V,138.058319,141.529999,139.850006,139.850006,4191700.0,67.827214,267.034976,-1.084388e+12,-13.849217,-1.875491,4.561600,105.118575,107.267234,26.369829,51.636534,"[[0.00018308557715923055, 5.7977587046178125e-..."
2374,2018-08-22,VZ,46.524822,55.009998,53.939999,54.840000,13911400.0,72.996064,193.594317,-1.084430e+12,-16.438734,-2.403105,4.249422,105.094205,107.255999,-102.730007,46.578833,"[[0.00018308557715923055, 5.7977587046178125e-..."
2374,2018-08-22,WBA,61.324196,70.470001,69.620003,70.250000,3746600.0,89.721600,266.712490,-1.084443e+12,-19.153698,-0.356481,5.628290,115.458863,112.700035,-60.880085,47.631342,"[[0.00018308557715923055, 5.7977587046178125e-..."
2374,2018-08-22,WMT,90.294754,96.849998,95.230003,96.199997,7765700.0,74.485340,263.386074,-1.084412e+12,-12.287473,-2.329581,4.484446,110.670001,106.276179,-26.156598,49.002722,"[[0.00018308557715923055, 5.7977587046178125e-..."


In [29]:
env_kwargs = {
    "hmax": 500, 
    "initial_amount": 1000000, 
    "transaction_cost_pct": 0.001, 
    "state_space": state_space, 
    "stock_dim": stock_dimension, 
    "tech_indicator_list": short_names, 
    "action_space": stock_dimension, 
    "reward_scaling": 0,
    'initial_weights': [1/stock_dimension]*stock_dimension
}

In [30]:
e_train_gym = StockPortfolioEnv(df = train_df, **env_kwargs)

In [31]:
env_train, _ = e_train_gym.get_sb_env()
print(type(env_train))

<class 'stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv'>


## A2C

In [32]:
# initialize
agent = DRLAgent(env = env_train)

A2C_PARAMS = {"n_steps": 5, "ent_coef": 0.005, "learning_rate": 0.0002}
model_a2c = agent.get_model(model_name="a2c",model_kwargs = A2C_PARAMS)

{'n_steps': 5, 'ent_coef': 0.005, 'learning_rate': 0.0002}
Using cuda device


In [33]:
trained_a2c = agent.train_model(model=model_a2c, 
                                tb_log_name='a2c',
                                total_timesteps=50000)

Logging to tensorboard_log/a2c/a2c_1
------------------------------------
| time/                 |          |
|    fps                | 87       |
|    iterations         | 100      |
|    time_elapsed       | 5        |
|    total_timesteps    | 500      |
| train/                |          |
|    entropy_loss       | -42.5    |
|    explained_variance | 5.96e-08 |
|    learning_rate      | 0.0002   |
|    n_updates          | 99       |
|    policy_loss        | 2.55e+08 |
|    std                | 0.998    |
|    value_loss         | 3.84e+13 |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 135      |
|    iterations         | 200      |
|    time_elapsed       | 7        |
|    total_timesteps    | 1000     |
| train/                |          |
|    entropy_loss       | -42.5    |
|    explained_variance | 1.19e-07 |
|    learning_rate      | 0.0002   |
|    n_updates          | 199      |
|

## PPO

In [34]:
agent = DRLAgent(env = env_train)
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.005,
    "learning_rate": 0.0001,
    "batch_size": 128,
}
model_ppo = agent.get_model("ppo",model_kwargs = PPO_PARAMS)

{'n_steps': 2048, 'ent_coef': 0.005, 'learning_rate': 0.0001, 'batch_size': 128}
Using cuda device


In [35]:
trained_ppo = agent.train_model(model=model_ppo, 
                             tb_log_name='ppo',
                             total_timesteps=50000)

Logging to tensorboard_log/ppo/ppo_1
-----------------------------
| time/              |      |
|    fps             | 393  |
|    iterations      | 1    |
|    time_elapsed    | 5    |
|    total_timesteps | 2048 |
-----------------------------
begin_total_asset:1000000
end_total_asset:5523233.176367797
Sharpe:  1.2937177998082536
------------------------------------------
| time/                   |              |
|    fps                  | 358          |
|    iterations           | 2            |
|    time_elapsed         | 11           |
|    total_timesteps      | 4096         |
| train/                  |              |
|    approx_kl            | 7.799827e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -42.6        |
|    explained_variance   | 0            |
|    learning_rate        | 0.0001       |
|    loss                 | 1.1e+15      |
|    n_updates            | 10           |
|    policy_gradient
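
The `Sharpe` values in these logs are printed by the environment at the end of each episode. The exact formula is not shown in this section; a common convention, assumed here, annualizes the ratio of mean to standard deviation of the daily returns with 252 trading days and a zero risk-free rate.

```python
import numpy as np

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate (assumed convention)."""
    r = np.asarray(daily_returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

# Hypothetical daily returns.
daily_returns = [0.01, -0.005, 0.007, 0.002, -0.001]
sharpe = annualized_sharpe(daily_returns)
```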

## DDPG

In [36]:
agent = DRLAgent(env = env_train)
DDPG_PARAMS = {"batch_size": 128, "buffer_size": 50000, "learning_rate": 0.001}


model_ddpg = agent.get_model("ddpg",model_kwargs = DDPG_PARAMS)

{'batch_size': 128, 'buffer_size': 50000, 'learning_rate': 0.001}
Using cuda device


In [None]:
trained_ddpg = agent.train_model(model=model_ddpg, 
                             tb_log_name='ddpg',
                             total_timesteps=50000)

Logging to tensorboard_log/ddpg/ddpg_1
begin_total_asset:1000000
end_total_asset:5779057.098162497
Sharpe:  1.3000140722570013
begin_total_asset:1000000
end_total_asset:5831124.617086993
Sharpe:  1.2927918248168193
begin_total_asset:1000000
end_total_asset:5831124.617086993
Sharpe:  1.2927918248168193
begin_total_asset:1000000
end_total_asset:5831124.617086993
Sharpe:  1.2927918248168193
----------------------------------
| time/              |           |
|    episodes        | 4         |
|    fps             | 123       |
|    time_elapsed    | 76        |
|    total_timesteps | 9500      |
| train/             |           |
|    actor_loss      | -4.87e+10 |
|    critic_loss     | 1.84e+18  |
|    learning_rate   | 0.001     |
|    n_updates       | 7125      |
----------------------------------
begin_total_asset:1000000
end_total_asset:5831124.617086993
Sharpe:  1.2927918248168193
begin_total_asset:1000000
end_total_asset:5831124.617086993
Sharpe:  1.2927918248168193
begin_total_a

## Predictions

In [None]:
# A2C Train Model
e_trade_gym = StockPortfolioEnv(df = train_df, **env_kwargs)
env_trade, obs_trade = e_trade_gym.get_sb_env()

a2c_train_daily_return, a2c_train_weights = DRLAgent.DRL_prediction(model=trained_a2c,
                        test_data = train_df,
                        test_env = env_trade,
                        test_obs = obs_trade)

In [None]:
# PPO Train Model
e_trade_gym = StockPortfolioEnv(df = train_df, **env_kwargs)
env_trade, obs_trade = e_trade_gym.get_sb_env()

ppo_train_daily_return, ppo_train_weights = DRLAgent.DRL_prediction(model=trained_ppo,
                        test_data = train_df,
                        test_env = env_trade,
                        test_obs = obs_trade)

In [None]:
# DDPG Train Model
e_trade_gym = StockPortfolioEnv(df = train_df, **env_kwargs)
env_trade, obs_trade = e_trade_gym.get_sb_env()

ddpg_train_daily_return, ddpg_train_weights = DRLAgent.DRL_prediction(model=trained_ddpg,
                        test_data = train_df,
                        test_env = env_trade,
                        test_obs = obs_trade)

In [None]:
# A2C Test Model
e_trade_gym = StockPortfolioEnv(df = test_df, **env_kwargs)
env_trade, obs_trade = e_trade_gym.get_sb_env()

a2c_test_daily_return, a2c_test_weights = DRLAgent.DRL_prediction(model=trained_a2c,
                        test_data = test_df,
                        test_env = env_trade,
                        test_obs = obs_trade)

In [None]:
a2c_test_daily_return.head()

In [None]:
a2c_test_weights.to_csv('a2c_test_weights.csv')


In [None]:
a2c_test_weights.head()


In [None]:
# PPO Test Model
e_trade_gym = StockPortfolioEnv(df = test_df, **env_kwargs)
env_trade, obs_trade = e_trade_gym.get_sb_env()

ppo_test_daily_return, ppo_test_weights = DRLAgent.DRL_prediction(model=trained_ppo,
                        test_data = test_df,
                        test_env = env_trade,
                        test_obs = obs_trade)

In [None]:
ppo_test_weights.to_csv('ppo_test_weights.csv')

In [None]:
# DDPG Test Model
e_trade_gym = StockPortfolioEnv(df = test_df, **env_kwargs)
env_trade, obs_trade = e_trade_gym.get_sb_env()

ddpg_test_daily_return, ddpg_test_weights = DRLAgent.DRL_prediction(model=trained_ddpg,
                        test_data = test_df,
                        test_env = env_trade,
                        test_obs = obs_trade)

In [None]:
ddpg_test_weights.to_csv('ddpg_test_weights.csv')

In [None]:
a2c_test_portfolio = a2c_test_daily_return.copy()
a2c_test_returns = a2c_test_daily_return.copy()

ppo_test_portfolio = ppo_test_daily_return.copy()
ppo_test_returns = ppo_test_daily_return.copy()

ddpg_test_portfolio = ddpg_test_daily_return.copy()
ddpg_test_returns = ddpg_test_daily_return.copy()

## Backtesting and portfolio evaluation

In [None]:
a2c_train_cum_returns = (1 + a2c_train_daily_return.reset_index(drop=True).set_index(['date'])).cumprod()
a2c_train_cum_returns = a2c_train_cum_returns['daily_return']
a2c_train_cum_returns.name = 'Portfolio 3: a2c Model'

ppo_train_cum_returns = (1 + ppo_train_daily_return.reset_index(drop=True).set_index(['date'])).cumprod()
ppo_train_cum_returns = ppo_train_cum_returns['daily_return']
ppo_train_cum_returns.name = 'Portfolio 4: ppo Model'

ddpg_train_cum_returns = (1 + ddpg_train_daily_return.reset_index(drop=True).set_index(['date'])).cumprod()
ddpg_train_cum_returns = ddpg_train_cum_returns['daily_return']
ddpg_train_cum_returns.name = 'Portfolio 5: ddpg Model'

date_list = list(ddpg_train_cum_returns.index)

In [None]:
# Plot the cumulative returns of the portfolios
fig, ax = plt.subplots(figsize=(15,8))

a2c_train_cum_returns.plot(ax=ax, color='blue', alpha=0.4)
ppo_train_cum_returns.plot(ax=ax, color='green', alpha=0.4)
ddpg_train_cum_returns.plot(ax=ax, color='purple', alpha=0.4)

plt.legend(loc="best");
plt.grid(True);
ax.set_ylabel("cumulative return");
ax.set_title("Backtest based on the data from 2008-12-31 to 2018-10-18", fontsize=14);
fig.savefig('results/back_test_on_train_data.png');

In [None]:
# Index by date, as in the training backtest, so the plot's x-axis shows dates
a2c_test_cum_returns = (1 + a2c_test_returns.set_index('date')['daily_return']).cumprod()
a2c_test_cum_returns.name = 'Portfolio 3: a2c Model'

ppo_test_cum_returns = (1 + ppo_test_returns.set_index('date')['daily_return']).cumprod()
ppo_test_cum_returns.name = 'Portfolio 4: ppo Model'

ddpg_test_cum_returns = (1 + ddpg_test_returns.set_index('date')['daily_return']).cumprod()
ddpg_test_cum_returns.name = 'Portfolio 5: ddpg Model'
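
The cumulative-return series above is a running product of `1 + r_t` over the daily returns. A minimal plain-Python sketch of that calculation follows; the function name `cumulative_growth` and the sample returns are hypothetical and are not taken from the notebook's results.

```python
from itertools import accumulate
from operator import mul


def cumulative_growth(daily_returns):
    """Growth of one unit of capital: the running product of (1 + r_t)."""
    return list(accumulate((1.0 + r for r in daily_returns), mul))


# Hypothetical daily returns, used only for illustration.
example_returns = [0.10, -0.05, 0.02]
growth = cumulative_growth(example_returns)
print(growth)
```

The last value is the growth of one unit of capital over the whole period, roughly 1.0659 here, which corresponds to a total return of about 6.59%.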

In [None]:
# Plot the cumulative returns of the portfolios
fig, ax = plt.subplots(figsize=(20,13))
a2c_test_cum_returns.plot(ax=ax, color='blue', alpha=.4)
ppo_test_cum_returns.plot(ax=ax, color='green', alpha=.4)
ddpg_test_cum_returns.plot(ax=ax, color='purple', alpha=.4)
plt.legend(loc="best");
plt.grid(True);
ax.set_ylabel("cumulative return");
ax.set_title("Backtest based on the data from 2018-10-19 to 2020-12-30", fontsize=14);
fig.savefig('results/back_test_on_test_data.png');

In [None]:
# Define a function for getting the portfolio statistics

def portfolio_stats(portfolio_returns):
    # Pass the returns into a DataFrame with 'date' and 'daily_return' columns
    port_rets_df = pd.DataFrame(portfolio_returns)
    port_rets_df = port_rets_df.reset_index()
    port_rets_df.columns = ['date', 'daily_return']

    # Use the FinRL library to get the portfolio returns
    # (this makes use of the pyfolio library)
    DRL_strat = backtest_strat(port_rets_df)

    # The strategy is passed as its own benchmark (factor_returns), so
    # benchmark-relative statistics such as alpha and beta are not informative
    perf_func = timeseries.perf_stats
    perf_stats_all = perf_func(returns=DRL_strat,
                               factor_returns=DRL_strat,
                               positions=None,
                               transactions=None,
                               turnover_denom="AGB")
    perf_stats_all = pd.DataFrame(perf_stats_all)
    perf_stats_all.columns = ['Statistic']
    return perf_stats_all
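
For intuition about the headline numbers in such a statistics table, the sketch below computes annualized return, annualized volatility, Sharpe ratio, and maximum drawdown from a list of daily returns using only the standard library. It assumes 252 trading days per year and a zero risk-free rate; the function name `basic_stats` and the sample returns are hypothetical, and pyfolio's own implementation may differ in details.

```python
import math
import statistics

TRADING_DAYS = 252  # assumption: daily data with 252 trading days per year


def basic_stats(daily_returns, risk_free_daily=0.0):
    """Return a dict of simple performance statistics for daily returns."""
    growth = 1.0        # value of one unit of capital so far
    peak = 1.0          # highest value reached so far
    max_drawdown = 0.0  # most negative peak-to-trough decline
    for r in daily_returns:
        growth *= 1.0 + r
        peak = max(peak, growth)
        max_drawdown = min(max_drawdown, growth / peak - 1.0)

    n = len(daily_returns)
    # Compound annual growth rate implied by the total growth over n days
    annual_return = growth ** (TRADING_DAYS / n) - 1.0
    # Sample standard deviation of daily returns, scaled to one year
    annual_volatility = statistics.stdev(daily_returns) * math.sqrt(TRADING_DAYS)
    # Sharpe ratio: mean excess return over its standard deviation, annualized
    excess = [r - risk_free_daily for r in daily_returns]
    sharpe = statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(TRADING_DAYS)

    return {
        "annual_return": annual_return,
        "annual_volatility": annual_volatility,
        "sharpe": sharpe,
        "max_drawdown": max_drawdown,
    }


# Hypothetical daily returns, used only for illustration.
example_returns = [0.01, -0.02, 0.015, 0.005, 0.01]
stats = basic_stats(example_returns)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

With so few observations the annualized figures are not meaningful in themselves; the point is only to show how each statistic is derived from the daily series.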

In [None]:
# Get the Portfolio Statistics for all the portfolios
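# Index by date so that portfolio_stats receives real dates rather than row numbers
portfolios_returns_dict = {'a2c Model': a2c_test_returns.set_index('date')['daily_return'],
                          'ppo Model': ppo_test_returns.set_index('date')['daily_return'],
                          'ddpg Model': ddpg_test_returns.set_index('date')['daily_return']}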

portfolios_stats = pd.DataFrame()
for i,j in portfolios_returns_dict.items():
    port_stats = portfolio_stats(j)
    portfolios_stats[i] = port_stats['Statistic']

In [None]:
portfolios_stats


## Exercises

- Compare the results with a Markowitz portfolio and with the equal-weight portfolio.
- Try different assets and time frames.
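
As a possible starting point for the first exercise, the sketch below builds two baselines with NumPy: an equal-weight portfolio and the global minimum-variance portfolio, which is one special case of Markowitz optimization (it ignores expected returns and allows short positions). The function names and the synthetic return matrix are assumptions for illustration; in the notebook, the return matrix would come from the daily returns of the chosen assets. Applying constant weights to every daily return implicitly assumes daily rebalancing.

```python
import numpy as np


def equal_weights(n_assets):
    """Weights of 1/n for each asset."""
    return np.full(n_assets, 1.0 / n_assets)


def min_variance_weights(cov):
    """Global minimum-variance weights: w = C^{-1} 1 / (1' C^{-1} 1).

    Weights sum to one but may be negative (short positions allowed).
    """
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)  # solves C x = 1 without forming the inverse
    return raw / raw.sum()


# Synthetic daily returns (250 days, 4 assets), used only for illustration.
rng = np.random.default_rng(0)
asset_returns = rng.normal(loc=0.0005, scale=0.01, size=(250, 4))

cov = np.cov(asset_returns, rowvar=False)  # sample covariance of the assets
w_eq = equal_weights(asset_returns.shape[1])
w_mv = min_variance_weights(cov)

# Daily portfolio returns under constant weights
eq_daily = asset_returns @ w_eq
mv_daily = asset_returns @ w_mv

print("equal weights:       ", w_eq)
print("min-variance weights:", w_mv)
print("in-sample variance (equal, min-var):",
      eq_daily.var(ddof=1), mv_daily.var(ddof=1))
```

The resulting daily return series can be compared with the agents' returns on the same dates, using the same statistics as for the agents. A fair comparison would estimate the covariance on the training period only and apply the weights to the test period.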

Sources:

- https://arxiv.org/pdf/2010.04404.pdf
- https://gym.openai.com/docs/
- **https://github.com/Musonda2day/Asset-Portfolio-Management-usingDeep-Reinforcement-Learning-**
- https://github.com/selimamrouni/Deep-Portfolio-Management-Reinforcement-Learning
- https://github.com/rathiromil13/DS-5500-Project-Portfolio-Optimization-Using-Deep-Reinforcement-Learning