# Portfolio Optimization

In this tutorial, we consider portfolio optimization.
A portfolio is a combination of financial instruments such as stocks and bonds.
Unlike bank deposits, these financial instruments are risky assets with uncertain future payouts and returns.
When investing in such risky assets, it is advisable to construct an appropriate portfolio and diversify investments across assets that are not highly correlated.
In this way, if the value of one asset falls significantly, it can be covered by another asset that is not significantly correlated with its price.

In general, diversified investments involve a trade-off between risk and return.
Portfolios must be constructed to properly reflect this trade-off, limiting return variability and risk of loss while maximizing return.

Here, we will use Fixstars Amplify to optimize the portfolio.
The optimization uses estimates based on the historical data method.
The historical data method is an estimation method that uses historical price data to determine assets' expected returns and risks.
In this sample program, we will explain the following:

- [Historical data](#1)
- [Formulation](#2)
- [Execution and evaluation](#3)
- [Operation simulation](#4)

This sample program is intended to demonstrate an optimization application using Fixstars Amplify.
The user is responsible for performing actual operations based on the information provided in this sample program.


<a id="1"></a>

## Historical data

First, we create time series data of stock prices. We prepared dummy simulation data (`dummy_stock_price.csv`) based on geometric Brownian motion. In the simulation, the parameters of the stochastic differential equation are chosen so that the distributions of the profit rate, the volatility (risk), and the correlation coefficients among stocks match, as closely as possible, the price movements of the stocks in the NASDAQ index from 2023 to the first half of 2024. Please refer to the end of this tutorial to learn how to obtain real data.
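The generator used to produce `dummy_stock_price.csv` is not shown in this tutorial. As a rough sketch of the idea only, the following simulates independent geometric Brownian motion paths with NumPy. The function name `simulate_gbm_prices`, the drift and volatility values, and the 250-business-day year are assumptions made for illustration; unlike the actual dummy data, this sketch does not model correlations between stocks.

```python
import numpy as np


def simulate_gbm_prices(initial_prices, drift, volatility, num_days, seed=0):
    """Simulate daily prices with independent geometric Brownian motion (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    initial_prices = np.asarray(initial_prices, dtype=float)
    drift = np.asarray(drift, dtype=float)  # annualized drift per stock
    volatility = np.asarray(volatility, dtype=float)  # annualized volatility per stock
    dt = 1.0 / 250.0  # one business day, assuming 250 business days per year

    # Daily log-returns: (mu - sigma^2 / 2) dt + sigma sqrt(dt) Z, with Z ~ N(0, 1)
    shocks = rng.standard_normal((num_days - 1, initial_prices.size))
    log_steps = (drift - 0.5 * volatility**2) * dt + volatility * np.sqrt(dt) * shocks

    # Cumulative log-returns, with a leading row of zeros for the first day
    log_paths = np.vstack([np.zeros(initial_prices.size), np.cumsum(log_steps, axis=0)])
    return initial_prices * np.exp(log_paths)


# Two hypothetical stocks simulated over 250 business days
prices = simulate_gbm_prices([100.0, 50.0], [0.10, 0.05], [0.20, 0.30], num_days=250)
print(prices.shape)
```

Each row is one business day and each column one stock, which is the same layout as the price table used in the rest of this tutorial.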

The following `load_stock_prices()` function can be used to load simulation data.
The stock price data is represented as a `pandas.DataFrame`, with the stock name (or ticker name or code) as a column, the date as a row, and the closing price of each stock on each day as a value. For example, the following table shows the data.

| Date       | Stock 1 | Stock 2 | Stock 3 | ... |
| ---------- | ------- | ------- | ------- | --- |
| 2024/03/01 | 123     | 310     | 2102    | ... |
| 2024/03/04 | 126     | 310     | 2110    | ... |
| 2024/03/05 | 131     | 313     | 2123    | ... |
| 2024/03/06 | 127     | 302     | 2140    | ... |
| ...        | ...     | ...     | ...     | ... |
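As a small illustration of this layout, the first rows of the table above can be written as a `pandas.DataFrame` by hand. The variable name `example_prices` is only for this sketch; the values are the ones shown in the table.

```python
import pandas as pd

# Values copied from the example table above (first three stocks only)
example_prices = pd.DataFrame(
    {
        "Stock 1": [123, 126, 131, 127],
        "Stock 2": [310, 310, 313, 302],
        "Stock 3": [2102, 2110, 2123, 2140],
    },
    index=pd.to_datetime(["2024-03-01", "2024-03-04", "2024-03-05", "2024-03-06"]),
)
example_prices.index.name = "Date"

# Closing price of one stock on one day (row = date, column = stock)
price = example_prices.loc[pd.Timestamp("2024-03-05"), "Stock 2"]
print(price)
```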


In [None]:
import datetime
import pandas as pd

In [None]:
def load_stock_prices() -> pd.DataFrame:
    return pd.read_csv(
        "../../../storage/portfolio/dummy_stock_price.csv",
        index_col="Date",
        parse_dates=True,
    )

<a id="2"></a>

## Formulation

We can now load time series data on stock prices. Next, we formulate the optimization problem that yields the optimal portfolio.

First, we select the candidate stocks to invest in.
Suppose that we choose $n$ financial instruments, numbered $0, 1, \ldots, n-1$.
Let $w_i$ (%) be the percentage of the total investment amount allocated to financial instrument $i$. The $w_i$ must satisfy the following constraint:

$$
\sum_{i=0}^{n-1} w_i = 100
$$

For the convenience of optimization using Amplify, $w_i$ is assumed to be an integer value.
Also, since investing a large percentage in a single financial instrument is risky, we limit the percentage invested in a single financial instrument to 20% or less of the total amount invested.

$$
0 \leq w_i \leq 20
$$

There are many ways to build a portfolio from these $n$ candidate financial instruments.
Among them, we look for a “good portfolio”, that is, one with a high expected rate of return and low risk.

First, for the expected rate of return, we simply adopt the average of the rates of return observed over every window of $d$ business days in the stock price data.

When a stock is held for a certain period, with value $p_{s}$ at the beginning and $p_{e}$ at the end, the rate of return $r$ is defined as:

$$
r = (p_{e} - p_{s}) / p_{s}
$$
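As a quick check of this definition, the following sketch computes the rate of return for Stock 1 in the example table above, bought at 123 and sold at 127. The helper name `rate_of_return` is introduced here only for illustration.

```python
def rate_of_return(p_start: float, p_end: float) -> float:
    """Rate of return r = (p_e - p_s) / p_s for one holding period."""
    return (p_end - p_start) / p_start


r = rate_of_return(123, 127)
print(f"{r:.4f}")  # prints 0.0325
```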

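The expected rate of return of the entire portfolio, $r_p$, is obtained from the rate of return $r_i$ of each financial instrument $i$ as follows. (In this and the following formulas, $w_i$ is treated as a fraction, that is, the percentage divided by 100, as in the implementation below.)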

$$
r_p = \sum_{i=0}^{n-1} r_i w_i
$$

If other indicators can provide a better numerical assessment of expected return (or “goodness” of a stock), you can use them.

Next, we consider risk.
Risk in diversified investments is usually measured by the variance of the portfolio's rate of return, $\sigma^2_p$. Using the covariance $\sigma_{i,j}$ between the rates of return of financial instruments $i$ and $j$, it is written as:

$$
\sigma_p^2 = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} w_i w_j \sigma_{i,j}.
$$
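The two formulas above can be checked numerically with NumPy. The following is a minimal sketch with three hypothetical stocks; the weights, mean returns, and covariance values are made up for illustration, and the 20% cap is ignored to keep the example small.

```python
import numpy as np

w_percent = np.array([50, 30, 20])  # investment ratios in %, summing to 100
w = w_percent / 100  # converted to fractions

r = np.array([0.02, 0.01, 0.03])  # hypothetical mean rate of return per stock
cov = np.array(
    [
        [0.010, 0.002, 0.000],
        [0.002, 0.020, 0.001],
        [0.000, 0.001, 0.030],
    ]
)  # hypothetical covariance matrix sigma_{i,j}

# r_p = sum_i r_i w_i
r_p = float(w @ r)

# sigma_p^2 = sum_i sum_j w_i w_j sigma_{i,j}, as a matrix product
var_p = float(w @ cov @ w)

# The same variance written as the explicit double sum
var_loops = sum(w[i] * w[j] * cov[i, j] for i in range(3) for j in range(3))

print(r_p, var_p)
```

The matrix form `w @ cov @ w` is the one used in the implementation later in this tutorial.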

We have now defined the rate of return and risk.
To make the rate of return $r_p$ larger and the risk $\sigma_p^2$ smaller, we use the following mean-variance model $f(w_i)$ as the objective function and minimize it.

$$
f(w_i) = - r_p + \frac{\gamma}{2} \sigma_p^2.
$$

Here, $\gamma$ is a parameter related to the balance between rate of return and risk.
The portfolio optimization implemented below uses $\gamma=20$ as the default setting. Larger $\gamma$ means more emphasis is placed on risk reduction.
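To see how $\gamma$ shifts the balance, the following sketch evaluates the objective for two hand-picked allocations of two hypothetical stocks. The function name `mean_variance_objective` and all numbers are assumptions for illustration only.

```python
import numpy as np


def mean_variance_objective(w, mean_returns, cov, gamma):
    """f(w) = -r_p + (gamma / 2) * sigma_p^2, with w given as fractions."""
    w = np.asarray(w, dtype=float)
    return float(-(w @ mean_returns) + 0.5 * gamma * (w @ cov @ w))


mean_returns = np.array([0.03, 0.01])  # stock 0 has the higher return
cov = np.array([[0.04, 0.0], [0.0, 0.001]])  # stock 0 is also much riskier

concentrated = [1.0, 0.0]  # everything in the high-return stock
diversified = [0.5, 0.5]  # split equally

for gamma in (0.0, 20.0):
    f_conc = mean_variance_objective(concentrated, mean_returns, cov, gamma)
    f_div = mean_variance_objective(diversified, mean_returns, cov, gamma)
    better = "concentrated" if f_conc < f_div else "diversified"
    print(f"gamma={gamma}: concentrated={f_conc:.4f}, diversified={f_div:.4f}, better={better}")
```

With $\gamma = 0$ only the return matters and the concentrated allocation has the smaller objective; with $\gamma = 20$ the variance term dominates and the diversified allocation is preferred.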


## Implementation

We will formulate the above model using the Amplify SDK.
First, we define an auxiliary function, `calculate_return_rates()`, which calculates the rate of return.
Given the time series `prices` (rows are dates, columns are stocks), this function returns a two-dimensional array whose rows correspond to starting dates and whose columns correspond to stocks. Each element is the rate of return obtained when the stock is held for `num_days_operation` business days from that starting date.


In [None]:
import amplify
import numpy as np

In [None]:
def calculate_return_rates(prices: np.ndarray, num_days_operation: int) -> np.ndarray:
    return (prices[num_days_operation:] - prices[:-num_days_operation]) / prices[
        :-num_days_operation
    ]
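The slicing in `calculate_return_rates()` can be checked on a tiny array. In the sketch below the function is repeated so that the block runs on its own, and the price values are made up.

```python
import numpy as np


def calculate_return_rates(prices: np.ndarray, num_days_operation: int) -> np.ndarray:
    # Same computation as above: (price d days later - price) / price
    return (prices[num_days_operation:] - prices[:-num_days_operation]) / prices[
        :-num_days_operation
    ]


# Three days (rows) of prices for two stocks (columns)
toy_prices = np.array(
    [
        [100.0, 200.0],
        [110.0, 190.0],
        [121.0, 209.0],
    ]
)

one_day = calculate_return_rates(toy_prices, 1)  # two starting dates -> shape (2, 2)
two_days = calculate_return_rates(toy_prices, 2)  # one starting date -> shape (1, 2)
print(one_day)
print(two_days)
```

With $T$ rows of prices, the result has $T - d$ rows, one for each starting date that still has a price $d$ business days later.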

Next, we define the formulation and optimization functions.

The following `optimize_portfolio()` function takes historical stock price time series data, the number of days in operation, and the other parameters necessary for optimization. It returns an optimized portfolio, along with its rate of return and variance.

The function's `gamma` parameter indicates the balance between the rate of return and risk reduction in portfolio construction; the higher the value, the greater the degree to which risk reduction is attempted.
Also, `max_w` represents the maximum percentage of the total investment that can be allocated to a single stock.

(Note: because integer variables are encoded into binary variables for the QUBO formulation, a larger `max_w` increases the number of variables sent to the machine and makes the optimization problem more complex.)

`constraint_weight` and `timeout` are parameters related to the accuracy of the solution. They represent the weight applied to the constraint when it is converted to a penalty function and the execution time of Amplify AE, respectively. For more information, see the documents “[Constraints and Penalty Functions](https://amplify.fixstars.com/en/docs/amplify/v1/penalty.html#penalty-weight)” and “[Solver Client](https://amplify.fixstars.com/en/docs/amplify/v1/clients.html#id2)”.

The larger (more complex) the problem, the longer the run time should be, but since this is a tutorial, we set it to 5 seconds.
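To give a feel for the note on `max_w` above, the following sketch counts binary variables under the assumption that each integer variable in $[0, \text{max\_w}]$ is expanded with a plain binary encoding. This is an assumption for illustration only: the encoding that the Amplify SDK actually selects may differ, and `binary_variable_count` is a hypothetical helper, not part of the SDK.

```python
def binary_variable_count(num_stocks: int, max_w: int) -> int:
    """Binary variables needed if each integer in [0, max_w] uses a plain binary encoding.

    Assumption for illustration; the encoding chosen by a real solver stack may differ.
    """
    if max_w < 1:
        raise ValueError("max_w must be at least 1")
    bits_per_variable = max_w.bit_length()  # bits needed to represent max_w
    return num_stocks * bits_per_variable


for max_w in (1, 20, 100):
    print(max_w, binary_variable_count(100, max_w))
```

Under this assumption, 100 stocks with `max_w=20` need 5 bits each, and raising `max_w` to 100 needs 7 bits each.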


In [None]:
def optimize_portfolio(
    historical_data: pd.DataFrame,
    num_days_operation: int,
    gamma: float = 20,
    max_w: int = 20,
    constraint_weight: float = 1.0,
    timeout: datetime.timedelta = datetime.timedelta(seconds=5),
):
    # List of names of stocks
    stock_names = list(historical_data.columns)

    # Create a variable `w_i` representing the investment ratio (%) (an integer variable that takes values between 0 and `max_w`)
    gen = amplify.VariableGenerator()
    w = gen.array("Integer", len(stock_names), bounds=(0, max_w))

    # Create constraint (the sum of w is 100)
    constraint = amplify.equal_to(w.sum(), 100)

    # Create objective function (maximize average rate of return)
    w_ratio = w / 100  # w (in %) converted to a real number

    # Calculate the rate of return for each stock after num_days_operation business days of operation
    return_rates = calculate_return_rates(
        historical_data.to_numpy(), num_days_operation
    )

    # Formulate portfolio rate of return
    # Add up (average rate of return) * (investment percentage) for each stock
    portfolio_return_rate = (w_ratio * np.mean(return_rates, axis=0)).sum()

    # Calculate the covariance (two-dimensional array) of the portfolio
    # The i rows and j columns of the array represent the covariance of the rates of return for stocks i and j
    covariance_matrix = np.cov(return_rates, rowvar=False)

    # Formulate portfolio risk
    # Quadratic polynomial about w representing the variance of the overall rate of return
    portfolio_variance = w_ratio @ covariance_matrix @ w_ratio  # type: ignore

    # Formulate objective function (gamma is a parameter representing the degree of risk aversion)
    objective = -portfolio_return_rate + 0.5 * gamma * portfolio_variance

    # Create optimization model
    model = amplify.Model(objective, constraint_weight * constraint)

    # Create a solver client and configure the solver
    client = amplify.FixstarsClient()
    client.parameters.timeout = timeout
    # If you use Amplify in a local environment, enter the Amplify AE API token.
    # client.token = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

    # Perform optimization
    result = amplify.solve(model, client)

    # Analyze execution results
    if len(result) == 0:
        raise RuntimeError("No feasible solution found")

    # Get how much (%) to invest in each of all stocks
    w_values = w.evaluate(result.best.values)

    # Create portfolio by selecting only stocks with investment ratios greater than 0
    portfolio = {
        stock_name: int(w_value)
        for stock_name, w_value in zip(stock_names, w_values)
        if w_value > 0
    }

    # Calculate the rate of return and risk of the obtained portfolio
    return_rate = portfolio_return_rate.evaluate(result.best.values)
    variance = portfolio_variance.evaluate(result.best.values)
    return portfolio, return_rate, variance
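For very small problems, the same mean-variance model can be solved by exhaustive search, which is a convenient way to sanity-check results without a solver. The following is a minimal sketch that does not use the Amplify SDK; `brute_force_portfolio` and the input values are hypothetical, and the approach is only practical for a handful of stocks.

```python
import itertools

import numpy as np


def brute_force_portfolio(mean_returns, cov, gamma=20.0, max_w=20, total=100):
    """Enumerate integer weights in [0, max_w] that sum to `total` and minimize f(w)."""
    mean_returns = np.asarray(mean_returns, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = len(mean_returns)

    best_w, best_value = None, float("inf")
    # Enumerate the first n - 1 weights; the last one follows from the sum constraint
    for head in itertools.product(range(max_w + 1), repeat=n - 1):
        last = total - sum(head)
        if not 0 <= last <= max_w:
            continue
        weights = head + (last,)
        w = np.array(weights) / total  # convert to fractions
        value = float(-(w @ mean_returns) + 0.5 * gamma * (w @ cov @ w))
        if value < best_value:
            best_w, best_value = weights, value

    if best_w is None:
        raise ValueError("No feasible allocation for the given max_w and total")
    return best_w, best_value


mean_returns = [0.02, 0.01, 0.015]  # hypothetical mean rates of return
cov = np.diag([0.01, 0.01, 0.01])  # hypothetical, uncorrelated stocks

# With three stocks the cap must be at least 34% for the weights to sum to 100
print(brute_force_portfolio(mean_returns, cov, gamma=0.0, max_w=40))
print(brute_force_portfolio(mean_returns, cov, gamma=1000.0, max_w=40))
```

With `gamma=0.0` the search fills the highest-return stocks up to the cap, while a very large `gamma` pushes the weights toward an even split.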

<a id="3"></a>

## Execution and evaluation

Now that the implementation is complete, we construct an optimal portfolio from the dummy data. First, we create a `pd.DataFrame` representing the stock price data.

The following `stock_prices` contains fictitious stock price data. The data period is for two years (500 business days) from 2023 to 2024, and the stocks are for 100 fictitious companies with color names in their company names.


In [None]:
stock_prices = load_stock_prices()
stock_prices

For example, look at the price movements of the `salmon`, `darkslategray`, and `hotpink` stocks over the two years.


In [None]:
import matplotlib.pyplot as plt

plt.plot(stock_prices["salmon"], color="salmon")
plt.plot(stock_prices["darkslategray"], color="darkslategray")
plt.plot(stock_prices["hotpink"], color="hotpink")
plt.show()

Next, we determine the date to start the operation.
We assume that the operation starts on January 1, 2024, for validation after the investment period. (In actual portfolio optimization, the operation would start when the optimized portfolio is obtained. Hence, the validation will be done in the future.)

In optimization, only data from before the start date can be used. In this case, we use the one year of data preceding the start date for optimization.
From the `stock_prices` created earlier, we will use only the stock price data for 2023 as historical data.


In [None]:
stock_prices_history = stock_prices.loc["2023":"2023"]
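The expression `.loc["2023":"2023"]` relies on pandas partial string indexing on a `DatetimeIndex`: a year string used as a slice bound covers the whole year. The following sketch shows this on a small made-up frame (`toy_prices` and its values are hypothetical).

```python
import pandas as pd

dates = pd.to_datetime(
    ["2022-12-30", "2023-01-03", "2023-06-15", "2023-12-29", "2024-01-02"]
)
toy_prices = pd.DataFrame({"Stock 1": [100, 101, 105, 110, 111]}, index=dates)

# Both slice bounds are inclusive and are expanded to the full year
history_2023 = toy_prices.loc["2023":"2023"]
future_2024 = toy_prices.loc["2024":"2024"]

print(history_2023)
print(future_2024)
```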

### Optimization

Using the `optimize_portfolio()` function implemented above, we obtain an optimized portfolio based on time series data of stock prices.
The rate of return is calculated based on an investment period of 20 business days.


In [None]:
portfolio, return_rate, variance = optimize_portfolio(
    stock_prices_history, num_days_operation=20
)

We visualize the portfolio using a pie chart.


In [None]:
import matplotlib

# Color map
colors = tuple(matplotlib.colormaps.get_cmap("Set3")(range(12)))

# Draw a pie chart
patches, texts, autotexts = plt.pie(  # type: ignore
    list(portfolio.values()),
    labels=list(portfolio.keys()),
    radius=1.5,
    autopct="%.f%%",
    colors=colors,
    labeldistance=0.8,
    wedgeprops={"linewidth": 1.0, "edgecolor": "white"},
    pctdistance=0.5,
)
for text in texts:
    text.set_horizontalalignment("center")
plt.show()

### Evaluation

Using the above portfolio, we plot a histogram of the rate of return over every 20-business-day window in 2024 to see how much return is obtained and how widely it varies.

For simplicity, we assume that the portfolio can be purchased and sold at the same price as the closing price on the operation's start and end dates.
As a comparison, we will also evaluate portfolios created with the following policies.

- Invest all funds in stocks with the highest average return
- Allocate funds equally to all stocks


In [None]:
df_future = stock_prices.loc["2024":"2024"]
num_days_operation = 20

historical_return_rates = calculate_return_rates(
    stock_prices_history.to_numpy(), num_days_operation
)
max_profit_stock: str = stock_prices_history.columns[
    historical_return_rates.mean(axis=0).argmax()
]
max_profit_portfolio = {max_profit_stock: 100}
uniform_ratio_portfolio = {stock_name: 1 for stock_name in df_future.columns}


def calculate_portfolio_return_rates(portfolio: dict[str, int]):
    return_rates = calculate_return_rates(df_future.to_numpy(), num_days_operation)
    ratio_array = (
        np.array([portfolio.get(stock_name, 0) for stock_name in df_future.columns])
        / 100
    )  # Array of investment percentages for each stock
    return (ratio_array * return_rates).sum(
        axis=1
    )  # Calculate the rate of return for the entire portfolio for each investment start date


optimized_return_rates = calculate_portfolio_return_rates(portfolio)

max_profit_return_rates = calculate_portfolio_return_rates(max_profit_portfolio)

uniform_return_rates = calculate_portfolio_return_rates(uniform_ratio_portfolio)

print(
    f"optimized:  max return rate = {np.max(optimized_return_rates) * 100:.2f}%, "
    f"mean return rate = {np.mean(optimized_return_rates) * 100:.2f}%, "
    f"min return rate = {np.min(optimized_return_rates) * 100:.2f}%, "
    f"variance = {np.var(optimized_return_rates):.5f}"
)
print(
    f"max profit: max return rate = {np.max(max_profit_return_rates) * 100:.2f}%, "
    f"mean return rate = {np.mean(max_profit_return_rates) * 100:.2f}%, "
    f"min return rate = {np.min(max_profit_return_rates) * 100:.2f}%, "
    f"variance = {np.var(max_profit_return_rates):.5f}"
)
print(
    f"uniform:    max return rate = {np.max(uniform_return_rates) * 100:.2f}%, "
    f"mean return rate = {np.mean(uniform_return_rates) * 100:.2f}%, "
    f"min return rate = {np.min(uniform_return_rates) * 100:.2f}%, "
    f"variance = {np.var(uniform_return_rates):.5f}"
)

bins = np.linspace(-40, 40, 50)
plt.hist(
    optimized_return_rates * 100,
    label="optimized",
    bins=bins,  # type: ignore
    color="royalblue",
    alpha=0.8,
    zorder=3,
)
plt.hist(
    max_profit_return_rates * 100,
    label="max profit",
    bins=bins,  # type: ignore
    color="coral",
    alpha=0.8,
    zorder=1,
)
plt.hist(
    uniform_return_rates * 100,
    label="uniform ratio",
    bins=bins,  # type: ignore
    color="gold",
    alpha=0.8,
    zorder=2,
)
plt.legend()
plt.xlabel("return rate (%)")
plt.show()

The further to the right a histogram lies, the higher the rate of return. The wider a histogram is, the more the rate of return varies and the greater the risk, especially when it extends into negative rates of return.

The optimized portfolio has a lower average rate of return than investing in a single stock, but the variance in the rate of return is smaller. In addition, we can expect that stocks with higher overall rates of return are selected compared to the case in which all stocks are invested equally.

By changing the parameter $\gamma$, which represents the degree of risk aversion, we can adjust the trade-off between the rate of return and risk to suit our investment objectives. Try changing the argument of the `optimize_portfolio()` function.
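The per-start-date portfolio return computed in `calculate_portfolio_return_rates()` above relies on NumPy broadcasting: the weight vector is multiplied into every row of the return-rate array and each row is then summed. A minimal sketch with made-up numbers:

```python
import numpy as np

# Rows: investment start dates, columns: stocks (hypothetical return rates)
return_rates = np.array(
    [
        [0.10, -0.05, 0.02],
        [0.00, 0.04, -0.02],
    ]
)
ratio_array = np.array([0.5, 0.3, 0.2])  # investment fractions per stock

# Broadcasting multiplies each row element-wise by the weights; summing over
# axis=1 gives one portfolio return per start date
portfolio_returns = (ratio_array * return_rates).sum(axis=1)

# Equivalent matrix-vector product
portfolio_returns_matmul = return_rates @ ratio_array

print(portfolio_returns)
```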


<a id="4"></a>

## Application: Operation simulation

We now use the implemented portfolio optimization to simulate operations more realistically.

In the simulation, the operation begins on January 1, 2024, and consists of 10 rounds, each 20 business days long. In each round, the portfolio is optimized using data from the 100 business days before the round's start date, the stocks are purchased, and they are sold 20 business days later.
All the funds from each sale are used for the purchases in the next round. To make the simulation more realistic, 20% of the profit from each sale is deducted as tax. In addition, the purchase price is set randomly between 0% and 1% above the closing price, and the sale price randomly between 0% and 1% below it.

The following diagram illustrates the flow of such an operation.

![operation_flow](../figures/portfolio_flow.drawio.svg)
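The tax rule described above applies only to profits. The following sketch isolates that step; `apply_profit_tax` is a hypothetical helper written for this illustration, using the 20% rate stated above.

```python
def apply_profit_tax(gross_return: float, tax_rate: float = 0.2) -> float:
    """Return factor after tax: only the part above 1 (the profit) is taxed."""
    if gross_return > 1:
        return 1 + (1 - tax_rate) * (gross_return - 1)
    return gross_return


# A 10% gain becomes an 8% gain after tax; a loss is left unchanged
print(apply_profit_tax(1.10))
print(apply_profit_tax(0.95))

# Worst case of the price adjustments: buy 1% above and sell 1% below the closing price
worst_case_factor = 0.99 / 1.01
print(f"{worst_case_factor:.4f}")
```

Even with unchanged closing prices, the price adjustments alone can therefore cost up to about 2% per round.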


We create a function that simulates one round. Note that the portfolio must be made using data before the operation's start date.


In [None]:
TAX_RATE = 0.2
rng = np.random.default_rng()


def get_portfolio(
    prices: pd.DataFrame,
    start_date: datetime.date,
    num_days_backward: int,
    num_days_operation: int,
) -> dict[str, int]:
    """Obtain an optimized portfolio using historical data

    Args:
        prices (pd.DataFrame): Time series data of stock prices
        start_date (datetime.date): Operation start date
        num_days_backward (int): Number of days of historical data used for optimization
        num_days_operation (int): Number of days in operation

    Returns:
        dict[str, int]: A dictionary whose key is the name of the stock and whose value is the investment ratio (%)
    """

    # Day before the operation start date (slicing up to it selects all earlier business days)
    previous_date = start_date - datetime.timedelta(days=1)

    # Get stock price data for the last `num_days_backward` business days before the operation start date
    stock_price_history = prices.loc[: str(previous_date)].iloc[-num_days_backward:]

    # Create portfolios using data prior to the start of operations
    portfolio, _, _ = optimize_portfolio(stock_price_history, num_days_operation)

    return portfolio


def simulate_stock_trading(
    prices: pd.DataFrame,
    funds: float,
    start_date: datetime.date,
    num_days_operation: int,
    portfolio: dict[str, int],
    tax_rate: float = TAX_RATE,
) -> float:
    """Simulate stock trades based on a given number of days of operation and portfolio

    Args:
        prices (pd.DataFrame): Time series data of stock prices
        funds (float): Operating funds
        start_date (datetime.date): Operation start date
        num_days_operation (int): Number of days in operation
        portfolio (dict[str, int]): Portfolio
        tax_rate (float, optional): Capital gains tax rate. Defaults to TAX_RATE.

    Returns:
        float: Funds from operations results
    """

    # Convert to an array of investment ratios per stock
    weights = np.array(
        [portfolio.get(stock_name, 0) / 100 for stock_name in prices.columns]
    )

    # Determine the purchase price per share for each stock based on the previous business day's stock price
    previous_date = start_date - datetime.timedelta(days=1)
    start_prices = prices.loc[: str(previous_date)].iloc[-1].to_numpy()
    # Consider a purchase price up to about 1% above the closing price
    start_prices = start_prices * rng.uniform(1.0, 1.01, size=len(prices.columns))

    # Determine the sale price per share for each stock based on the closing price on the last day of operation
    end_prices = prices.loc[str(start_date) :].iloc[num_days_operation - 1].to_numpy()
    # Consider a sale price up to about 1% below the closing price
    end_prices = end_prices * rng.uniform(0.99, 1.0, size=len(prices.columns))

    # Calculate the gross return of the portfolio (sale value / purchase value)
    return_rate: float = (weights * (end_prices / start_prices)).sum()

    # Deduct tax from the profit, if any
    if return_rate > 1:
        return_rate = 1 + (1 - tax_rate) * (return_rate - 1)

    # Return the funds after the sale (= funds before purchase x gross return after tax)
    return funds * return_rate

We create a `simulate_stick_operation()` function that repeatedly simulates multiple trading rounds using the `simulate_stock_trading()` function.
The function will return a historical record of each round (purchase date, purchase amount, sale date, sale amount).


In [None]:
def simulate_stick_operation(
    prices: pd.DataFrame,
    num_rounds: int,
    simulation_start_date: datetime.date,
    num_days_sampling: int,
    num_days_operation: int,
) -> list[tuple[datetime.date, float, datetime.date, float]]:
    """Simulate stock trading based on a given number of operating days and cycles

    Args:
        prices (pd.DataFrame): Time series data of stock prices
        num_rounds (int): Number of rounds
        simulation_start_date (datetime.date): Simulation start date
        num_days_sampling (int): Number of days of historical data to be used for optimization
        num_days_operation (int): Number of days in operation

    Returns:
        list[tuple[datetime.date, float, datetime.date, float]]: History of each round as (purchase date, purchase amount, sale date, sale amount)
    """

    # start-up capital
    current_funds = 1.0

    # List for storing (date of purchase, amount of purchase, date of sale, amount of sale)
    operation_history: list[tuple[datetime.date, float, datetime.date, float]] = []

    # Time series data on share prices since the simulation start date
    prices_start = prices.loc[str(simulation_start_date) :]

    for i in range(num_rounds):
        # Start and end dates of the round
        start_date = prices_start.iloc[num_days_operation * i].name.date()  # type: ignore
        end_date = prices_start.iloc[
            num_days_operation * i + num_days_operation - 1
        ].name.date()  # type: ignore

        print(f"Round: {i+1}/{num_rounds}, {start_date} - {end_date}")

        # Optimize the portfolio
        portfolio = get_portfolio(
            prices, start_date, num_days_sampling, num_days_operation
        )

        # Simulation of stock trading
        next_funds = simulate_stock_trading(
            prices, current_funds, start_date, num_days_operation, portfolio
        )

        # Add to operational history
        operation_history.append((start_date, current_funds, end_date, next_funds))

        print(
            f"Profit: {next_funds / current_funds:.3f}, Funds: {current_funds:.3f} -> {next_funds:.3f}"
        )

        current_funds = next_funds

    return operation_history
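The round boundaries in the loop above are plain row offsets into the price data from the simulation start date. The following sketch lists them; `round_index_windows` is a hypothetical helper that mirrors the index arithmetic in the loop.

```python
def round_index_windows(num_rounds: int, num_days_operation: int) -> list[tuple[int, int]]:
    """(start_index, end_index) of each round, counted in business days from the start date."""
    return [
        (num_days_operation * i, num_days_operation * i + num_days_operation - 1)
        for i in range(num_rounds)
    ]


windows = round_index_windows(3, 20)
print(windows)  # prints [(0, 19), (20, 39), (40, 59)]

# Number of rows needed from the simulation start date for 10 rounds of 20 days
rows_needed = round_index_windows(10, 20)[-1][1] + 1
print(rows_needed)  # prints 200
```

The price data therefore needs at least `num_rounds * num_days_operation` rows from the simulation start date; otherwise `iloc` raises an `IndexError`.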

In the following cell, we simulate ten rounds of 20 business days each, with 2024/01/01 as the simulation start date.
Each portfolio is optimized using the 100 business days of historical data preceding the start date of its round.
Please note that it takes about 2 minutes to run the cell.


In [None]:
operation_history = simulate_stick_operation(
    stock_prices, 10, datetime.date(2024, 1, 1), 100, 20
)
# Operational results of the final round
operation_history[-1][3]

In the following cells, we plot the obtained operational history.


In [None]:
import itertools
import matplotlib.pyplot as plt
from matplotlib import dates as mdates

ax = plt.figure().add_subplot()


def plot(operation_history, color, label):
    for start_date, start_funds, end_date, end_funds in operation_history:
        (line,) = ax.plot(
            [start_date, end_date], [start_funds, end_funds], color=color, marker="o"
        )
    line.set_label(label)  # type: ignore

    for history1, history2 in itertools.pairwise(operation_history):
        _, _, end_date1, end_funds1 = history1
        start_date2, start_funds2, _, _ = history2
        ax.plot(
            [end_date1, start_date2],
            [end_funds1, start_funds2],
            color=color,
            linestyle=":",
        )


plot(operation_history, "C0", "optimized")

ax.legend(loc="lower right")
ax.set_xlabel("Date", fontsize=10)
ax.set_ylabel("Total asset", fontsize=10)
ax.tick_params(labelsize=10)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m/%d"))

plt.show()

## Appendix

Historical data of actual stock prices can be obtained as follows. The code uses `pandas_datareader` to download the data from [Stooq](https://stooq.com/).
Please use it with care, as it makes one API call per stock.

```python
import datetime

import pandas as pd
from pandas_datareader import data as web


def load_historical_data(
    tickers: list[str], start_date: datetime.date, end_date: datetime.date
) -> pd.DataFrame:
    """Download historical closing prices between start_date and end_date from Stooq"""
    history_df = pd.DataFrame()
    for idx, ticker in enumerate(tickers):
        ticker_df: pd.DataFrame = web.DataReader(ticker, "stooq", start_date, end_date)
        if len(ticker_df) == 0:
            print(f"failed to get {ticker} data")
            continue
        history_df = history_df.join(ticker_df["Close"].rename(ticker), how="outer")
        print("#", end="\n" if (idx + 1) % 20 == 0 else "")
    history_df.dropna(how="any", inplace=True) # Only keep days when all stocks were traded
    history_df.sort_index(inplace=True)
    return history_df

# Acquire stocks comprising the NASDAQ 100
tickers = ["ADBE", "ADP", "ABNB", "GOOGL", "GOOG", "AMZN", "AMD", "AEP", "AMGN", "ADI",
                     "ANSS", "AAPL", "AMAT", "ASML", "AZN", "TEAM", "ADSK", "BKR", "BIIB", "BKNG",
                     "AVGO", "CDNS", "CDW", "CHTR", "CTAS", "CSCO", "CCEP", "CTSH", "CMCSA", "CEG",
                     "CPRT", "CSGP", "COST", "CRWD", "CSX", "DDOG", "DXCM", "FANG", "DLTR", "DASH",
                     "EA", "EXC", "FAST", "FTNT", "GEHC", "GILD", "GFS", "HON", "IDXX", "ILMN",
                     "INTC", "INTU", "ISRG", "KDP", "KLAC", "KHC", "LRCX", "LIN", "LULU", "MAR",
                     "MRVL", "MELI", "META", "MCHP", "MU", "MSFT", "MRNA", "MDLZ", "MDB", "MNST",
                     "NFLX", "NVDA", "NXPI", "ORLY", "ODFL", "ON", "PCAR", "PANW", "PAYX", "PYPL",
                     "PDD", "PEP", "QCOM", "REGN", "ROP", "ROST", "SIRI", "SBUX", "SNPS", "TTWO",
                     "TMUS", "TSLA", "TXN", "TTD", "VRSK", "VRTX", "WBA", "WBD", "WDAY", "XEL", "ZS"]

df = load_historical_data(tickers, datetime.date(2023, 1, 1), datetime.date.today())
```
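To reuse downloaded data with the CSV-based loading shown at the beginning of this tutorial, the frame can be saved with the date index in a `Date` column. The following is a minimal sketch that uses an in-memory buffer and made-up prices (not real quotes) instead of a file and an actual download.

```python
import io

import pandas as pd

# Made-up closing prices standing in for a downloaded frame (not real quotes)
df_example = pd.DataFrame(
    {"AAPL": [190.0, 191.5], "MSFT": [370.0, 372.25]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03"]),
)

# Write the index under the column name "Date"; a file path could be used instead of the buffer
buffer = io.StringIO()
df_example.to_csv(buffer, index_label="Date")

# Read it back with the same options as load_stock_prices()
buffer.seek(0)
reloaded = pd.read_csv(buffer, index_col="Date", parse_dates=True)
print(reloaded)
```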
