<a target="_blank" href="https://colab.research.google.com/github/AI4Finance-Foundation/FinRL-Tutorials/blob/master/1-Introduction/FinRL_PortfolioAllocation_NeurIPS_2020.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## Install the required packages, including the FinRL library


In [9]:
# Install FinRL and its dependencies
!pip install wrds
!pip install swig

!pip install 'shimmy>=2.0'
!pip install pandas_market_calendars
# System packages: Debian/Ubuntu only (e.g. Colab); apt-get is unavailable on macOS
!apt-get update -y -qq && apt-get install -y -qq cmake libopenmpi-dev python3-dev zlib1g-dev libgl1-mesa-glx swig
!pip install git+https://github.com/AI4Finance-Foundation/FinRL.git

zsh:1: command not found: apt-get
Collecting git+https://github.com/AI4Finance-Foundation/FinRL.git
  Cloning https://github.com/AI4Finance-Foundation/FinRL.git to /private/var/folders/ks/bjl76g8d4zxgw0m5p8z2pd9r0000gn/T/pip-req-build-umaja09w
  Running command git clone --filter=blob:none --quiet https://github.com/AI4Finance-Foundation/FinRL.git /private/var/folders/ks/bjl76g8d4zxgw0m5p8z2pd9r0000gn/T/pip-req-build-umaja09w
  Resolved https://github.com/AI4Finance-Foundation/FinRL.git to commit 69776b349ee4e63efe3826f318aef8e5c5f59648
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting elegantrl@ git+https://github.com/AI4Finance-Foundation/ElegantRL.git (from finrl==0.3.8)
  Cloning https://github.com/AI4Finance-Foundation/ElegantRL.git to /private/var/folders/ks/bjl76g8d4zxgw0m5p8z2pd9r0000gn/T/pip-install-6hmpbmk9/elegantrl_c58448199acb4c7783f3bc144bba8

## Import Packages

In [1]:
# ===========================
# Suppress Warnings & Backend Setup
# ===========================
import warnings
warnings.filterwarnings("ignore")

import matplotlib
matplotlib.use('Agg')  # Use non-interactive backend for environments without GUI

# ===========================
# Standard Libraries
# ===========================
import os
import sys
import pandas as pd
import numpy as np

# Jupyter only: this magic replaces the Agg backend set above; remove it when running as a plain script
%matplotlib inline

# ===========================
# FinRL Imports
# ===========================
from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split


# ===========================
# Custom Library Path
# ===========================
sys.path.append("../FinRL-Library")

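## `process_csv_to_features(csv_path, lookback=252)`

Loads a price CSV, adds technical indicators, and attaches a rolling return window and covariance matrix to every row.

### **Features**
- Keeps only tickers whose `day` values cover a full 5-day (0-4) or 7-day (0-6) week; other tickers are dropped.
- Applies FinRL technical indicators to each of the two groups separately.
- Computes rolling returns and covariance matrices over a `lookback` window (default: 252 dates).
- Merges these back onto the price data, which drops the first `lookback` dates because they have no full window.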
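
The 5-day/7-day split can be sketched on a toy frame. The tickers and rows below are made up for illustration, and the sketch assumes `day` holds the weekday number (0 = Monday), which matches the `date`/`day` pairs in the outputs further down.

```python
import pandas as pd

# Hypothetical toy data: "spy" trades Mon-Fri, "btcusd" trades every day.
toy = pd.DataFrame({
    "tic": ["spy"] * 5 + ["btcusd"] * 7,
    "day": list(range(5)) + list(range(7)),
})

# Sorted unique weekday numbers observed for each ticker.
days_per_tic = toy.groupby("tic")["day"].apply(lambda s: sorted(s.unique()))

# A ticker belongs to a group only if its weekday set matches exactly.
tics_5day = [t for t, d in days_per_tic.items() if d == list(range(5))]
tics_7day = [t for t, d in days_per_tic.items() if d == list(range(7))]

print(tics_5day)  # ['spy']
print(tics_7day)  # ['btcusd']
```

A ticker whose weekday set matches neither pattern (for example, one that never traded on a Friday) lands in neither list and is filtered out.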

### **Returns**
- Processed DataFrame, sorted by `date` and `tic`, with the indicator columns plus `cov_list` and `return_list`.
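
As a rough illustration of the rolling-window step, the sketch below repeats the same indexing pattern on a tiny, made-up price table (two hypothetical tickers, six dates, `lookback = 3`). It is a simplified stand-in for the function, not a replacement for it.

```python
import pandas as pd

# Hypothetical toy prices in long format: 6 dates x 2 tickers.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
toy = pd.DataFrame({
    "date": dates.repeat(2),
    "tic": ["aaa", "bbb"] * 6,
    "close": [10.0, 20.0, 11.0, 19.0, 12.0, 21.0,
              11.0, 22.0, 13.0, 21.0, 14.0, 23.0],
})

lookback = 3
toy.index = toy["date"].factorize()[0]  # one integer label per date

windows = {}
for i in range(lookback, toy.index.nunique()):
    # .loc slicing is label-based and inclusive: lookback + 1 dates of prices,
    # which leave `lookback` rows of returns after pct_change().dropna().
    prices = toy.loc[i - lookback:i].pivot_table(
        index="date", columns="tic", values="close"
    )
    returns = prices.pct_change().dropna()
    windows[dates[i]] = returns.cov()

first = windows[dates[lookback]]
print(first.shape)  # (2, 2)
```

Only dates from position `lookback` onward receive a window, which is why the processed frames start later than the raw data.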


In [2]:
def process_csv_to_features(csv_path, lookback=252):
    # Step 1: Load data
    df = pd.read_csv(csv_path)

    # Step 2: Identify 5-day and 7-day tickers
    day_values_per_tic = df.groupby('tic')['day'].apply(lambda x: sorted(x.unique())).reset_index()
    day_values_per_tic.columns = ['tic', 'unique_days']
    tics_5day = day_values_per_tic[day_values_per_tic['unique_days'].apply(lambda x: x == list(range(5)))]['tic']
    tics_7day = day_values_per_tic[day_values_per_tic['unique_days'].apply(lambda x: x == list(range(7)))]['tic']

    # Step 3: Filter tickers
    df_5day_full = df[df['tic'].isin(tics_5day)]
    df_7day_full = df[df['tic'].isin(tics_7day)]

    # Step 4: Apply technical indicators
    fe = FeatureEngineer(use_technical_indicator=True, use_turbulence=False, user_defined_feature=False)
    df_5day_full = fe.preprocess_data(df_5day_full)
    if not df_7day_full.empty:
        df_7day_full = fe.preprocess_data(df_7day_full)
    else:
        print("[Info] df_7day_full is empty. Skipping technical indicators.")

    # Step 5: Combine and clean
    df = pd.concat([df_5day_full, df_7day_full], ignore_index=False)
    df.index = range(len(df))
    df['date'] = pd.to_datetime(df['date'])
    # Drop dates on which only a single row (ticker) is present
    df = df[df.groupby('date')['date'].transform('count') > 1]
    df = df.sort_values('date').reset_index(drop=True)

    # Step 6: Prepare for covariance matrix computation
    df = df.sort_values(['date', 'tic'], ignore_index=True)
    df.index = df.date.factorize()[0]  # Re-index based on unique date

    cov_list = []
    return_list = []
    unique_indices = df.index.unique()

    for i in range(lookback, len(unique_indices)):
        # Label-based .loc slicing is inclusive, so the window spans lookback + 1 dates
        data_lookback = df.loc[i - lookback:i, :]
        price_lookback = data_lookback.pivot_table(index='date', columns='tic', values='close')
        return_lookback = price_lookback.pct_change().dropna()
        return_list.append(return_lookback)
        cov_list.append(return_lookback.cov().values)

    # Step 7: Merge covariance matrix and return series back
    df_cov = pd.DataFrame({
        'date': df.date.unique()[lookback:], 
        'cov_list': cov_list, 
        'return_list': return_list
    })
    df = df.merge(df_cov, on='date')
    df = df.sort_values(['date', 'tic']).reset_index(drop=True)

    return df


## Data Processing

Apply `process_csv_to_features` to prepare datasets with technical indicators, returns, and covariance matrices.

### **Datasets Processed**
- `processed_0` : `2007-2025_no_crypto.csv`
- `processed_1` : `2015-2025_crypto.csv`
- `processed_2` : `2015-2025_no_crypto.csv`

In [3]:
processed_0 = process_csv_to_features('2007-2025_no_crypto.csv')
processed_1 = process_csv_to_features('2015-2025_crypto.csv')
processed_2 = process_csv_to_features('2015-2025_no_crypto.csv')

Successfully added technical indicators
[Info] df_7day_full is empty. Skipping technical indicators.
Successfully added technical indicators
Successfully added technical indicators
Successfully added technical indicators
[Info] df_7day_full is empty. Skipping technical indicators.


In [4]:
processed_0.head()

Unnamed: 0,date,close,high,low,open,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list,return_list
0,2008-06-02,100.96001,101.23,100.56,100.75999,601763.0,agg,0,-0.288574,102.713559,100.68944,44.710578,-177.717942,27.183401,101.798766,102.091751,"[[1.1652520209219119e-05, 6.538966902923824e-0...",tic agg bil gld ...
1,2008-06-02,91.66,91.66,91.56022,91.62,41980.0,bil,0,-0.010875,91.685852,91.564578,48.327996,-51.929205,7.787558,91.656144,91.723087,"[[1.1652520209219119e-05, 6.538966902923824e-0...",tic agg bil gld ...
2,2008-06-02,87.96001,88.55,87.44,87.47,5279523.0,gld,0,-0.089521,92.18139,84.012634,48.189597,6.244685,9.243352,87.829675,90.330049,"[[1.1652520209219119e-05, 6.538966902923824e-0...",tic agg bil gld ...
3,2008-06-02,138.89999,139.86,138.0,139.83,181069872.0,spy,0,0.299452,143.288854,137.326146,50.677289,-58.512146,4.429488,139.929333,136.816167,"[[1.1652520209219119e-05, 6.538966902923824e-0...",tic agg bil gld ...
4,2008-06-02,67.91,68.50999,67.161,68.50999,238323.0,vb,0,0.759427,68.960453,65.514955,56.743234,86.050016,12.869939,66.576536,64.415101,"[[1.1652520209219119e-05, 6.538966902923824e-0...",tic agg bil gld ...


In [5]:
processed_1.head()

Unnamed: 0,date,close,high,low,open,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list,return_list
0,2016-02-02,109.32,109.38,109.1301,109.23,4835115.0,agg,1,0.207871,109.414436,108.155564,56.40162,128.048919,18.311124,108.54,108.509083,"[[5.736252007431309e-06, 6.710303377777988e-09...",tic agg bil btcusd ...
1,2016-02-02,91.34,91.36,91.34,91.34,1627284.0,bil,1,-0.003626,91.386806,91.327194,46.986934,-124.094203,12.39946,91.360333,91.364333,"[[5.736252007431309e-06, 6.710303377777988e-09...",tic agg bil btcusd ...
2,2016-02-02,372.93,374.41,371.17,371.33,6817.0,btcusd,1,-12.397434,422.125588,352.168412,45.804685,-84.272928,40.704445,405.695,419.803333,"[[5.736252007431309e-06, 6.710303377777988e-09...",tic agg bil btcusd ...
3,2016-02-02,108.09,108.18,107.35,107.92,6656018.0,gld,1,1.135824,108.6201,102.47782,56.790972,143.671556,35.269843,104.479307,103.65032,"[[5.736252007431309e-06, 6.710303377777988e-09...",tic agg bil btcusd ...
4,2016-02-02,190.16,191.97,189.54,191.96,182564890.0,spy,1,-2.744645,199.106331,183.296129,43.397377,-51.889822,20.952982,195.478153,201.393583,"[[5.736252007431309e-06, 6.710303377777988e-09...",tic agg bil btcusd ...


In [6]:
processed_2.head()

Unnamed: 0,date,close,high,low,open,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list,return_list
0,2016-02-02,109.32,109.38,109.1301,109.23,4835115.0,agg,1,0.207871,109.414436,108.155564,56.40162,128.048919,18.311124,108.54,108.509083,"[[5.736252007431309e-06, 6.710303377777988e-09...",tic agg bil gld ...
1,2016-02-02,91.34,91.36,91.34,91.34,1627284.0,bil,1,-0.003626,91.386806,91.327194,46.986934,-124.094203,12.39946,91.360333,91.364333,"[[5.736252007431309e-06, 6.710303377777988e-09...",tic agg bil gld ...
2,2016-02-02,108.09,108.18,107.35,107.92,6656018.0,gld,1,1.135824,108.6201,102.47782,56.790972,143.671556,35.269843,104.479307,103.65032,"[[5.736252007431309e-06, 6.710303377777988e-09...",tic agg bil gld ...
3,2016-02-02,190.16,191.97,189.54,191.96,182564890.0,spy,1,-2.744645,199.106331,183.296129,43.397377,-51.889822,20.952982,195.478153,201.393583,"[[5.736252007431309e-06, 6.710303377777988e-09...",tic agg bil gld ...
4,2016-02-02,99.93,101.3,99.56,101.23,806243.0,vb,1,-2.286971,107.259181,95.234819,39.656457,-58.511357,32.436105,104.496667,109.190167,"[[5.736252007431309e-06, 6.710303377777988e-09...",tic agg bil gld ...


## `run_min_variance_portfolio(...)`

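Implements a Minimum Variance Portfolio (MVP) strategy using `PyPortfolioOpt`. For each trading date it solves for the minimum-volatility weights from that date's return window, then values the resulting holdings at the next date's closing prices.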

### **Key Features**
- Computes daily minimum variance allocations.
- Tracks portfolio value and positions over time.
- Saves daily returns and portfolio weights to CSV.
- Writes both files to `<dataset>/min/`, relative to the working directory.
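
The value-tracking step can be illustrated with a single day of made-up numbers. Like the function below, this sketch assumes fractional shares, no transaction costs, and a full re-allocation of capital each day; the tickers, weights, and prices are hypothetical.

```python
# Hypothetical one-day example with three assets.
capital = 1_000_000.0
weights = {"aaa": 0.5, "bbb": 0.25, "ccc": 0.25}  # sums to 1
close_today = {"aaa": 100.0, "bbb": 50.0, "ccc": 20.0}
close_next = {"aaa": 102.0, "bbb": 49.0, "ccc": 20.0}

# Cash allocated to each asset, converted to (fractional) share counts.
shares = {t: weights[t] * capital / close_today[t] for t in weights}

# Value the holdings at the next day's close.
next_value = sum(shares[t] * close_next[t] for t in weights)
daily_return = next_value / capital - 1

print(next_value)  # 1005000.0
```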

### **Parameters**
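- `df`: Processed DataFrame from `process_csv_to_features` (needs `date`, `tic`, `close`, and `return_list`).
- `start_date`, `end_date`: Trading period, passed to FinRL's `data_split`.
- `initial_capital`: Starting portfolio value (default: 1,000,000).
- `weight_bound`: `(min, max)` bound applied to every asset weight (default: `(0, 0.15)`).
- `output_return_csv`, `output_position_csv`: File names for the two output CSVs.
- `original_csv_path`: Source CSV path; its base name (without extension) becomes the output folder name.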
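
To make the folder naming concrete, here is a small sketch of how an input path maps to output paths. `output_paths` is a hypothetical helper written for this illustration; it mirrors the path logic in Step 1 of the function and is not part of FinRL. The `data/` prefix in the usage line is also made up.

```python
import os

def output_paths(original_csv_path, model_folder="min",
                 return_csv="df_daily_return_min.csv",
                 position_csv="df_positions_min.csv"):
    """Hypothetical helper: derive the two output paths from a dataset path."""
    # Strip directories and the extension: 'data/x.csv' -> 'x'
    base_name = os.path.splitext(os.path.basename(original_csv_path))[0]
    target_folder = os.path.join(base_name, model_folder)
    return (os.path.join(target_folder, return_csv),
            os.path.join(target_folder, position_csv))

returns_path, positions_path = output_paths("data/2015-2025_crypto.csv")
print(returns_path)
print(positions_path)
```

Any leading directories in `original_csv_path` are discarded, so the output folder is always created relative to the current working directory.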

### **Outputs**
- `df_daily_return`: DataFrame with `date` and `daily_return` columns; the first day's return is set to 0.
- `df_positions`: DataFrame with `date` plus one weight column per ticker.

### **Returns**
- Tuple: `(df_daily_return, df_positions)`
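
For intuition about the objective, the sketch below computes minimum-variance weights in closed form, w = Σ⁻¹1 / (1ᵀΣ⁻¹1), using only NumPy. This is a deliberately simpler technique than the one the function uses: it ignores weight bounds, so weights may be negative or exceed any cap, whereas the function relies on PyPortfolioOpt's bounded optimizer. The covariance matrix is made up.

```python
import numpy as np

# Hypothetical 3-asset covariance matrix of daily returns.
sigma = np.array([
    [4.0e-4, 1.0e-4, 0.0],
    [1.0e-4, 9.0e-4, 2.0e-4],
    [0.0,    2.0e-4, 1.6e-3],
])

ones = np.ones(len(sigma))
# Solve sigma @ x = 1 instead of inverting sigma explicitly.
x = np.linalg.solve(sigma, ones)
weights = x / x.sum()  # normalise so the weights sum to 1

variance = weights @ sigma @ weights
equal = ones / len(ones)

print(weights.round(4))
# The minimum-variance portfolio is never riskier than equal weighting.
print(variance <= equal @ sigma @ equal)  # True
```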


In [7]:
import os
import pandas as pd
import numpy as np
from pypfopt.efficient_frontier import EfficientFrontier

def run_min_variance_portfolio(df, 
                                start_date, 
                                end_date, 
                                initial_capital=1_000_000, 
                                weight_bound=(0, 0.15),
                                output_return_csv='df_daily_return_min.csv',
                                output_position_csv='df_positions_min.csv',
                                original_csv_path='data.csv'):
    """
    Calculate Minimum Variance Portfolio.
    Outputs:
      - Daily returns CSV
      - Daily portfolio positions CSV
    Folder structure: <csv_name>/min/ (relative to the working directory)
    """

    # === Step 1: Setup Folder Structure ===
    base_name = os.path.splitext(os.path.basename(original_csv_path))[0]   # e.g., 'data'
    model_folder = "min"
    target_folder = os.path.join(base_name, model_folder)

    if not os.path.exists(target_folder):
        os.makedirs(target_folder)
        print(f"[INFO] Created folder: {target_folder}")

    # Define full paths for outputs
    output_return_csv_path = os.path.join(target_folder, output_return_csv)
    output_position_csv_path = os.path.join(target_folder, output_position_csv)

    # === Step 2: Filter trade dates ===
    trade = data_split(df, start_date, end_date)
    unique_trade_date = trade.date.unique()

    trade_dates = [d for d in unique_trade_date if pd.Timestamp(start_date) <= d <= pd.Timestamp(end_date)]

    if len(trade_dates) < 2:
        raise ValueError("Not enough trade dates between start_date and end_date.")

    # === Step 3: Initialize Portfolio and Position Tracking ===
    portfolio = pd.DataFrame(index=range(1), columns=trade_dates)
    portfolio.loc[0, trade_dates[0]] = initial_capital

    positions_tracking = []   # To store daily weights

    # === Step 4: Loop Through Dates for Min Variance Allocation ===
    for i in range(len(trade_dates) - 1):
        df_temp = df[df.date == trade_dates[i]].reset_index(drop=True)
        df_temp_next = df[df.date == trade_dates[i + 1]].reset_index(drop=True)

        Sigma = df_temp.return_list[0].cov()

        ef_min_var = EfficientFrontier(None, Sigma, weight_bounds=weight_bound)
        ef_min_var.min_volatility()
        cleaned_weights = ef_min_var.clean_weights()

        # --- Track Positions ---
        position_record = {"date": trade_dates[i]}
        position_record.update(cleaned_weights)
        positions_tracking.append(position_record)

        # --- Portfolio Value Update ---
        cap = portfolio.iloc[0, i]
        current_cash = [w * cap for w in cleaned_weights.values()]
        current_shares = np.array(current_cash) / np.array(df_temp.close)
        next_price = np.array(df_temp_next.close)
        portfolio.iloc[0, i + 1] = np.dot(current_shares, next_price)

    # === Step 5: Calculate Daily Returns ===
    portfolio = portfolio.T
    portfolio.columns = ['account_value']

    df_daily_return = portfolio.copy()
    df_daily_return["daily_return"] = df_daily_return["account_value"].pct_change()
    df_daily_return = df_daily_return.infer_objects(copy=False)
    df_daily_return = df_daily_return.reset_index().rename(columns={"index": "date"})
    df_daily_return.loc[0, "daily_return"] = 0.0
    df_daily_return = df_daily_return[["date", "daily_return"]]

    # === Step 6: Save Outputs ===
    df_daily_return.to_csv(output_return_csv_path, index=False)
    print(f"[INFO] Daily returns saved to {output_return_csv_path}")

    df_positions = pd.DataFrame(positions_tracking)
    df_positions.to_csv(output_position_csv_path, index=False)
    print(f"[INFO] Portfolio positions saved to {output_position_csv_path}")

    return df_daily_return, df_positions


## Minimum Variance Portfolio Execution

Runs the `run_min_variance_portfolio` strategy across three datasets to calculate daily returns and portfolio positions.

### **Workflow**
- Applies MVP strategy with:
  - **Initial Capital**: \$1,000,000  
  - **Weight Bounds**: (0.01, 0.25), overriding the function default of (0, 0.15)  
  - **Trade Period**: 2023-04-05 to 2025-04-10
- Outputs saved in a `min/` subfolder of each dataset's output folder.
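
Weight bounds interact with the number of assets: weights in `[lower, upper]` can only sum to 1 if `n * lower <= 1 <= n * upper`. The check below is a hypothetical helper for sanity-testing a bound before running the optimizer; the asset counts in the usage lines are examples, not the sizes of the datasets here.

```python
def bounds_are_feasible(n_assets, lower, upper):
    """True if n_assets weights, each in [lower, upper], can sum to exactly 1."""
    return n_assets * lower <= 1.0 <= n_assets * upper

print(bounds_are_feasible(5, 0.0, 0.15))   # False: 5 * 0.15 is below 1
print(bounds_are_feasible(5, 0.01, 0.25))  # True
print(bounds_are_feasible(7, 0.0, 0.15))   # True: 7 * 0.15 is above 1
```

With a cap of 0.15 at least seven assets are needed, while a cap of 0.25 works from four assets upward.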

### **Datasets Processed**
1. `2007-2025_no_crypto.csv`
2. `2015-2025_crypto.csv`
3. `2015-2025_no_crypto.csv`

### **Outputs**
- `df_daily_return_min.csv` : Daily returns  
- `df_positions_min.csv` : Daily portfolio weights
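
One way to use the daily-return output is to compound it into a cumulative return. The sketch below does this on a small, made-up frame with the same two columns as `df_daily_return_min.csv`; the values are illustrative, not results from this notebook.

```python
import pandas as pd

# Hypothetical stand-in for a saved df_daily_return_min.csv.
df_ret = pd.DataFrame({
    "date": pd.to_datetime(["2023-04-05", "2023-04-06", "2023-04-10"]),
    "daily_return": [0.0, 0.01, -0.02],
})

# Compound the daily returns into a cumulative return series.
df_ret["cumulative_return"] = (1 + df_ret["daily_return"]).cumprod() - 1
final = df_ret["cumulative_return"].iloc[-1]

print(round(final, 4))  # -0.0102
```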


In [8]:
TRADE_START_DATE = '2023-04-05'
TRADE_END_DATE = '2025-04-10'


df_daily_return_min_0, df_positions_min_0 = run_min_variance_portfolio(
    df=processed_0,
    start_date=TRADE_START_DATE,
    end_date=TRADE_END_DATE,
    initial_capital=1_000_000,
    weight_bound=(0.01, 0.25),
    output_return_csv='df_daily_return_min.csv',
    output_position_csv='df_positions_min.csv',
    original_csv_path='2007-2025_no_crypto.csv'
)

df_daily_return_min_1, df_positions_min_1 = run_min_variance_portfolio(
    df=processed_1,
    start_date=TRADE_START_DATE,
    end_date=TRADE_END_DATE,
    initial_capital=1_000_000,
    weight_bound=(0.01, 0.25),
    output_return_csv='df_daily_return_min.csv',
    output_position_csv='df_positions_min.csv',
    original_csv_path='2015-2025_crypto.csv'
)

df_daily_return_min_2, df_positions_min_2 = run_min_variance_portfolio(
    df=processed_2,
    start_date=TRADE_START_DATE,
    end_date=TRADE_END_DATE,
    initial_capital=1_000_000,
    weight_bound=(0.01, 0.25),
    output_return_csv='df_daily_return_min.csv',
    output_position_csv='df_positions_min.csv',
    original_csv_path='2015-2025_no_crypto.csv'
)

[INFO] Daily returns saved to 2007-2025_no_crypto/min/df_daily_return_min.csv
[INFO] Portfolio positions saved to 2007-2025_no_crypto/min/df_positions_min.csv
[INFO] Daily returns saved to 2015-2025_crypto/min/df_daily_return_min.csv
[INFO] Portfolio positions saved to 2015-2025_crypto/min/df_positions_min.csv
[INFO] Daily returns saved to 2015-2025_no_crypto/min/df_daily_return_min.csv
[INFO] Portfolio positions saved to 2015-2025_no_crypto/min/df_positions_min.csv
