In [None]:
/**
 * @file portfolio-optimization-nsga2-cpp.ipynb
 *
 * A simple practical application of the Non-dominated Sorting Genetic
 * Algorithm II (NSGA2) to portfolio optimization. The user can choose any
 * number of stocks, and a helper function then generates the corresponding
 * CSV file automatically.
 *
 * The algorithm tries to optimize the trade-off between the returns and
 * volatility of the requested stocks.
 *
 * Data from Pandas Datareader library (https://pandas-datareader.readthedocs.io/en/latest/).
 */

In [None]:
#define ARMA_DONT_USE_WRAPPER

In [None]:
#include <mlpack/xeus-cling.hpp>

#include <ensmallen.hpp>
#include "../utils/portfolio.hpp"

In [None]:
// Header files to create and show the plot.
#define WITHOUT_NUMPY 1
#include "matplotlibcpp.h"
#include "xwidgets/ximage.hpp"

namespace plt = matplotlibcpp;

In [None]:
using namespace ens;

In [None]:
using namespace ens::test;

### 1. Set the Model Parameters

In this section, we select the parameters for the optimizer: the stock symbols, the start date, the end date and the finance API source.

In [None]:
//! Declare user specified data.
std::string stocks, startDate, endDate, dataSource;

In [None]:
std::cout << "Enter the stock symbols as comma-separated values (no spaces)" << std::endl;
std::cin >> stocks;
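The symbols are read as a single comma-separated string. If the individual symbols are needed later, for example to label results, the string can be split as in the minimal sketch below. `SplitSymbols` is a hypothetical helper introduced here for illustration; it is not part of the notebook's utilities.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the notebook's utilities): split a
// comma-separated list such as "AAPL,MSFT,GOOG" into individual symbols.
std::vector<std::string> SplitSymbols(const std::string& stocks)
{
  std::vector<std::string> symbols;
  std::stringstream stream(stocks);
  std::string symbol;

  while (std::getline(stream, symbol, ','))
  {
    // Skip empty entries produced by stray commas such as "AAPL,,MSFT".
    if (!symbol.empty())
      symbols.push_back(symbol);
  }

  return symbols;
}
```

Calling `SplitSymbols(stocks)` after the prompt above would give one entry per requested stock.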

We set the data source to the Yahoo Finance API by default. We encourage users to try a custom data source; please refer to the pandas-datareader documentation for a list of available API sources.

In [None]:
dataSource = "yahoo";

//! Uncomment to set custom data-source
//std::cin >> dataSource;

In [None]:
std::cout << "Starting Date (YYYY/MM/DD or DD/MM/YYYY)" << std::endl;
std::cin >> startDate;

In [None]:
std::cout << "End Date (YYYY/MM/DD or DD/MM/YYYY)" << std::endl;
std::cin >> endDate;

### 2. Loading the Dataset

In this section, we create a helper class that generates the CSV file from the parameters provided in the previous section. The class also defines the two objective functions: return and volatility. Ideally, we want to maximize the return and minimize the volatility. Since our implementation of the algorithm minimizes every objective, we negate the return objective, which turns its maximization into a minimization problem.

In [None]:
class PortfolioFunction
{
  public:
    PortfolioFunction(const std::string& stocks,
                      const std::string& dataSource,
                      const std::string& startDate,
                      const std::string& endDate)
    {
      //! Generate the requested csv file.
      Portfolio(stocks, dataSource, startDate, endDate, "portfolio.csv");
      returns.load("portfolio.csv", arma::csv_ascii);
      returns.shed_col(0);

      assets = returns.n_cols;
    }

    //! Get the starting point.
    arma::mat GetInitialPoint()
    {
      return arma::Col<double>(assets, 1, arma::fill::zeros);
    }

    struct ObjectiveA
    {
        ObjectiveA(const arma::mat& returns) : returns(returns) {}

        double Evaluate(const arma::mat& coords)
        {
          const double portfolioReturns = arma::accu(arma::mean(returns) %
              coords.t()) * 252;

          return -portfolioReturns;
        }

        arma::mat returns;
    };

    struct ObjectiveB
    {
        ObjectiveB(const arma::mat& returns) : returns(returns) {}

        double Evaluate(const arma::mat& coords)
        {
          const double portfolioVolatility = arma::as_scalar(arma::sqrt(
                coords.t() * arma::cov(returns) * 252 * coords));
          return portfolioVolatility;
        }

        arma::mat returns;
    };

    //! Get objective functions.
    std::tuple<ObjectiveA, ObjectiveB> GetObjectives()
    {
      return std::make_tuple(ObjectiveA(returns), ObjectiveB(returns));
    }

    arma::mat returns;
    size_t assets;
};


//! The constructor will generate the csv file.
PortfolioFunction pf(stocks, dataSource, startDate, endDate);

const double lowerBound = 0;
const double upperBound = 1;

ens::NSGA2 opt(20, // population size: The number of candidates in the population.
               300, // max generations: The maximum number of generations allowed.
               0.5, // crossover probability: The probability that the elites reproduce.
               0.5, // mutation probability: The probability of mutation among the elite.
               1e-3, // mutation strength: The strength of the mutation.
               1e-6, // epsilon: The minimum difference required to distinguish between two solutions.
               lowerBound, // lowerBound: Lower bound of the coordinates of the initial population
               upperBound // upperBound: Upper bound of the coordinates of the initial population
               );

arma::mat coords = pf.GetInitialPoint();
auto objectives = pf.GetObjectives();
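To make the two objectives concrete, here is a minimal sketch that evaluates them with plain `std::vector` instead of Armadillo. It mirrors the formulas used in `ObjectiveA` and `ObjectiveB`: the negated return is $-252 \sum_i \bar{r}_i w_i$ and the volatility is $\sqrt{252\, w^\top \Sigma w}$, where $\bar{r}$ is the mean daily return, $\Sigma$ the covariance of daily returns and $w$ the weights. The function names and the inputs are assumptions made for illustration, and the factor 252 is presumably the number of trading days in a year.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Negated annualized return: -(252 * sum_i meanDaily[i] * weights[i]).
double NegatedAnnualReturn(const std::vector<double>& meanDaily,
                           const std::vector<double>& weights)
{
  assert(meanDaily.size() == weights.size());

  double dailyReturn = 0.0;
  for (std::size_t i = 0; i < weights.size(); ++i)
    dailyReturn += meanDaily[i] * weights[i];

  return -252.0 * dailyReturn;
}

// Annualized volatility: sqrt(252 * w^T * cov * w), where `cov` is the
// covariance matrix of daily returns stored row by row.
double AnnualVolatility(const std::vector<std::vector<double>>& cov,
                        const std::vector<double>& weights)
{
  assert(cov.size() == weights.size());

  double variance = 0.0;
  for (std::size_t i = 0; i < weights.size(); ++i)
  {
    assert(cov[i].size() == weights.size());
    for (std::size_t j = 0; j < weights.size(); ++j)
      variance += weights[i] * cov[i][j] * weights[j];
  }

  return std::sqrt(252.0 * variance);
}
```

With made-up mean daily returns of 0.001 and 0.002, equal weights of 0.5 and uncorrelated daily variances of 0.0001 and 0.0004, the first function returns -0.378 and the second about 0.177.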

### 3. Optimization 

NSGA2 is a genetic algorithm that ranks each member of the population by its performance on every objective. A member "dominates" another if it is at least as good on every objective and strictly better on at least one; the members that come out best in this ranking form the "elite" population. The elite reproduce among themselves to produce even better offspring. Repeating this process over many generations yields a set of trade-off solutions known as the "Pareto Front".
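The dominance relation can be stated precisely in a few lines. The sketch below assumes that every objective is minimized, as in this notebook, and uses plain `std::vector` for the objective values. `Dominates` and `FirstFront` are illustrative names, not ensmallen functions, and the quadratic scan is a simplification of the sorting a full NSGA2 implementation performs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if candidate `a` dominates candidate `b`, assuming every
// objective is minimized: `a` is no worse in all objectives and strictly
// better in at least one.
bool Dominates(const std::vector<double>& a, const std::vector<double>& b)
{
  assert(a.size() == b.size());

  bool strictlyBetter = false;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    if (a[i] > b[i])
      return false;  // Worse in objective i, so `a` cannot dominate.
    if (a[i] < b[i])
      strictlyBetter = true;
  }

  return strictlyBetter;
}

// Indices of the candidates that no other candidate dominates (the first
// non-dominated front), found with a simple O(n^2) scan.
std::vector<std::size_t> FirstFront(
    const std::vector<std::vector<double>>& objectives)
{
  std::vector<std::size_t> front;
  for (std::size_t i = 0; i < objectives.size(); ++i)
  {
    bool dominated = false;
    for (std::size_t j = 0; j < objectives.size() && !dominated; ++j)
      dominated = (i != j) && Dominates(objectives[j], objectives[i]);

    if (!dominated)
      front.push_back(i);
  }

  return front;
}
```

For instance, the objective pair (-0.30, 0.15), a 30% return at 15% volatility, dominates (-0.20, 0.20), whereas (-0.30, 0.25) and (-0.20, 0.20) do not dominate each other because each is better on one objective.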

Begin Optimization!

In [None]:
opt.Optimize(objectives, coords);

Let's collect the results and inspect the first member of the Pareto front.

In [None]:
arma::cube paretoFront = opt.ParetoFront();

std::cout << paretoFront.slice(0) << std::endl;

Convert the front to the data structure needed for plotting.

In [None]:
size_t populationSize = paretoFront.n_slices;

//! Store the X, Y coordinates of the Pareto Front
std::vector<double> frontX(populationSize, 0.);
std::vector<double> frontY(populationSize, 0.);

for (size_t idx = 0; idx < populationSize; ++idx)
{
    frontX[idx] = paretoFront.slice(idx)(0);
    frontY[idx] = paretoFront.slice(idx)(1);
}
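A line plot connects points in the order they are stored. We have not verified the order in which the members of the front are returned, so it may help to sort the points by the first objective before plotting. The sketch below shows one way to do this; `SortFront` is a hypothetical helper, not part of ensmallen or matplotlibcpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: reorder the front so that the first objective is
// ascending, keeping each (x, y) pair together.
void SortFront(std::vector<double>& frontX, std::vector<double>& frontY)
{
  assert(frontX.size() == frontY.size());

  std::vector<std::pair<double, double>> points(frontX.size());
  for (std::size_t i = 0; i < points.size(); ++i)
    points[i] = std::make_pair(frontX[i], frontY[i]);

  // std::pair compares by its first element, then by its second.
  std::sort(points.begin(), points.end());

  for (std::size_t i = 0; i < points.size(); ++i)
  {
    frontX[i] = points[i].first;
    frontY[i] = points[i].second;
  }
}
```

Calling `SortFront(frontX, frontY)` before plotting leaves the set of points unchanged and only affects the order in which a line is drawn through them.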

### 4. Plotting

Recall that we negated the returns objective earlier to turn it into a minimization problem, so the x-axis of the plot shows negated returns.
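If you would rather read actual returns off the plot, the sign flip can be undone before plotting. This is a minimal sketch; `ToReturns` is a hypothetical helper introduced here for illustration.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical helper: undo the sign flip applied to the returns objective.
std::vector<double> ToReturns(const std::vector<double>& negatedReturns)
{
  std::vector<double> returns(negatedReturns.size());
  std::transform(negatedReturns.begin(), negatedReturns.end(),
                 returns.begin(), std::negate<double>());
  return returns;
}
```

Passing `ToReturns(frontX)` instead of `frontX` to the plotting call would put positive returns on the x-axis; the axis label should then be changed to match.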

In [None]:
plt::figure_size(800, 800);
plt::plot(frontX, frontY);
plt::xlabel("Returns Objective");
plt::ylabel("Volatility Objective");

plt::title("The Pareto Front");
plt::legend();

plt::save("./plot.png");
auto im = xw::image_from_file("plot.png").finalize();
im

### 5. Final Thoughts

In this notebook, we've seen how a multi-objective optimization algorithm can help with investing in stocks. We specified custom stocks and watched the algorithm optimize the returns vs. volatility trade-off. Feel free to experiment with different stocks and see how the outcomes change.