# Modular, Bias-free Robo-Advisor/Portfolio Optimization Package

## Importing the libraries

```python
from customlibs import database as db
from customlibs import core as co
```

## Built-in ticker scraping function

The db.get_etf_list() function scrapes a list of tickers from a given Wikipedia page.

A few Wikipedia page URLs (db.america_etfs, db.japan_etfs, db.hongkong_etfs, and db.europe_etfs) have been hardcoded into the db library for convenience.

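```python
# Collect ETF tickers from several hardcoded Wikipedia pages
etf_list = []
etf_list += db.get_etf_list(db.america_etfs)
etf_list += db.get_etf_list(db.japan_etfs)
etf_list += db.get_etf_list(db.hongkong_etfs)
etf_list += db.get_etf_list(db.europe_etfs)
# Remove duplicates, since the same ETF may appear on more than one page
etf_list = list(set(etf_list))
```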

## Creating the Database object

The Database object found in the db library is used to download, manage, and save historical price data.

```python
database = db.Database()
```

## Adding tickers to the Database

Pass a list of tickers to the Database.add_tickers() method to add them to the Database. The tickers are stored as part of the Database object, and any missing historical price data for them is downloaded and appended.

```python
database.add_tickers(etf_list)
```

```text
- EFA: Data doesn't exist for startDate = 946742399, endDate = 946742399
- XBI: Data doesn't exist for startDate = 946742399, endDate = 946742399
- QEH: No data found, symbol may be delisted
- FNDE: Data doesn't exist for startDate = 946742399, endDate = 946742399
- IPK: Data doesn't exist for startDate = 946742399, endDate = 946742399
- FTGS: No data found, symbol may be delisted
- AGG: Data doesn't exist for startDate = 946742399, endDate = 946742399
- AAXJ: Data doesn't exist for startDate = 946742399, endDate = 946742399
- VUG: Data doesn't exist for startDate = 946742399, endDate = 946742399
- EPRO: Data doesn't exist for startDate = 946742399, endDate = 946742399
- IXJ: Data doesn't exist for startDate = 946742399, endDate = 946742399
- MUNI: Data doesn't exist for startDate = 946742399, endDate = 946742399
- ECON: Data doesn't exist for startDate = 946742399, endDate = 946742399
- UPRO: Data doesn't exist for startDate = 946742399, endDate = 946742399
- OEF: Data doesn't exist f

- IYW: Data doesn't exist for startDate = 946742399, endDate = 946742399
- RWX: Data doesn't exist for startDate = 946742399, endDate = 946742399
- FTSD: Data doesn't exist for startDate = 946742399, endDate = 946742399
- IESM: Data doesn't exist for startDate = 946742399, endDate = 946742399
- FNDX: Data doesn't exist for startDate = 946742399, endDate = 946742399
- RJA: Data doesn't exist for startDate = 946742399, endDate = 946742399
- IYM: Data doesn't exist for startDate = 946742399, endDate = 946742399
- ICF: Data doesn't exist for startDate = 946742399, endDate = 946742399
- FTSL: Data doesn't exist for startDate = 946742399, endDate = 946742399
- MGC: Data doesn't exist for startDate = 946742399, endDate = 946742399
- SHV: Data doesn't exist for startDate = 946742399, endDate = 946742399
- SYG: Data doesn't exist for startDate = 946742399, endDate = 946742399
- GYEN: Data doesn't exist for startDate = 946742399, endDate = 946742399
- RWR: Data doesn't exist for startDate = 9467

- GGBP: No data found for this date range, symbol may be delisted
- VSS: Data doesn't exist for startDate = 946742399, endDate = 946742399
- ITOT: Data doesn't exist for startDate = 946742399, endDate = 946742399
- SKF: Data doesn't exist for startDate = 946742399, endDate = 946742399
- EWM: No data found for this date range, symbol may be delisted
- GLD: Data doesn't exist for startDate = 946742399, endDate = 946742399
- DEM: Data doesn't exist for startDate = 946742399, endDate = 946742399
- VOOG: Data doesn't exist for startDate = 946742399, endDate = 946742399
- BLV: Data doesn't exist for startDate = 946742399, endDate = 946742399
- SYLD: Data doesn't exist for startDate = 946742399, endDate = 946742399
- SDY: Data doesn't exist for startDate = 946742399, endDate = 946742399
- RWO: Data doesn't exist for startDate = 946742399, endDate = 946742399
- IJH: Data doesn't exist for startDate = 946742399, endDate = 946742399
- EWH: No data found for this date range, symbol may be deliste

- RPG: Data doesn't exist for startDate = 946742399, endDate = 946742399
- ACWI: Data doesn't exist for startDate = 946742399, endDate = 946742399
- VTV: Data doesn't exist for startDate = 946742399, endDate = 946742399
- SCHV: Data doesn't exist for startDate = 946742399, endDate = 946742399
- PID: Data doesn't exist for startDate = 946742399, endDate = 946742399
- EWU: No data found for this date range, symbol may be delisted
- IPW: No data found for this date range, symbol may be delisted
- SCHZ: Data doesn't exist for startDate = 946742399, endDate = 946742399
- SCHO: Data doesn't exist for startDate = 946742399, endDate = 946742399
- VEA: Data doesn't exist for startDate = 946742399, endDate = 946742399
- CYB: Data doesn't exist for startDate = 946742399, endDate = 946742399
- IEIS: Data doesn't exist for startDate = 946742399, endDate = 946742399
- BRAZ: Data doesn't exist for startDate = 946742399, endDate = 946742399
- SCHM: Data doesn't exist for startDate = 946742399, endDate
```

## Adding dates to the Database

Pass a single date as a string to the Database.add_date() function to expand the date range to include the new date. The Database will keep track of the date range, and missing historical asset price data is downloaded and appended.

Database objects must have at least one ticker added before dates can be added.

Database objects are initialised with both the start and end date set to 2000-01-01 23:59:59.

```python
database.add_date("2020-01-01")
```

```text
- QEH: No data found, symbol may be delisted
- FTGS: No data found, symbol may be delisted
- BRAF: No data found for this date range, symbol may be delisted
- IPD: No data found for this date range, symbol may be delisted
- CRDT: No data found for this date range, symbol may be delisted
- SCPB: No data found for this date range, symbol may be delisted
- ONEF: No data found for this date range, symbol may be delisted
- IRY: No data found for this date range, symbol may be delisted
- IPU: No data found for this date range, symbol may be delisted
- YPRO: No data found, symbol may be delisted
-  SHE: No data found, symbol may be delisted
- RPX: No data found, symbol may be delisted
- BGU: No data found for this date range, symbol may be delisted
- GMMB: No data found for this date range, symbol may be delisted
- GGBP: No data found for this date range, symbol may be delisted
- IPF: Data doesn't exist for startDate = 946828799, endDate = 1577894399
- RRF: No data found, symbol may be delist
```

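| Date | SHE | AADR | AAXJ | ACCU | ACWI | ACWX | AGG | ALD | AMLP | AND | ... | XLE | XLF | XLI | XLK | XLP | XLU | XLV | XLY | XOP | YPRO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000-01-03 |  |  |  |  |  |  |  |  |  |  | ... | 16.67 | 9.44 | 19.72 | 43.14 | 14.12 | 13.15 | 22.41 | 23.36 |  |  |
| 2000-01-04 |  |  |  |  |  |  |  |  |  |  | ... | 16.36 | 9.03 | 19.18 | 40.96 | 13.72 | 12.75 | 21.90 | 22.65 |  |  |
| 2000-01-05 |  |  |  |  |  |  |  |  |  |  | ... | 16.79 | 8.96 | 19.09 | 40.35 | 13.97 | 13.08 | 21.71 | 22.38 |  |  |
| 2000-01-06 |  |  |  |  |  |  |  |  |  |  | ... | 17.44 | 9.35 | 19.35 | 39.01 | 14.23 | 13.05 | 21.78 | 22.63 |  |  |
| 2000-01-07 |  |  |  |  |  |  |  |  |  |  | ... | 17.62 | 9.51 | 20.09 | 39.69 | 15.13 | 13.17 | 22.04 | 23.70 |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2019-12-24 |  | 54.00 | 72.53 |  | 78.41 | 48.35 | 110.66 |  | 40.28 |  | ... | 58.10 | 30.23 | 80.79 | 90.52 | 62.07 | 62.92 | 101.39 | 123.65 | 93.19 |  |
| 2019-12-26 |  | 54.30 | 72.98 |  | 78.83 | 48.56 | 110.76 |  | 40.79 |  | ... | 58.08 | 30.40 | 80.98 | 91.19 | 62.13 | 63.03 | 101.32 | 125.16 | 93.54 |  |
| 2019-12-27 |  | 54.51 | 73.42 |  | 78.88 | 48.65 | 110.90 |  | 40.14 |  | ... | 57.84 | 30.32 | 80.91 | 91.18 | 62.40 | 63.21 | 101.35 | 125.15 | 91.97 |  |
| 2019-12-30 |  | 54.10 | 72.98 |  | 78.41 | 48.35 | 110.90 |  | 39.53 |  | ... | 57.65 | 30.23 | 80.53 | 90.66 | 62.08 | 63.20 | 100.74 | 124.35 | 91.85 |  |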


## Saving Database objects to a local file

Database objects can be saved locally in one of two formats: .csv or pickle. Each saving method takes the name of the local file to save to.

```python
database.save_to_csv("db.csv")
database.save_to_pickle("db.pickle")
```

## Reading Database objects from a local file

Database objects can be read from local files using the functions corresponding to how they were saved.

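```python
# Use the reader that matches the format the Database was saved in.
# Both are shown here; the second call replaces the result of the first.
database = db.read_from_csv("db.csv")
database = db.read_from_pickle("db.pickle")
```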

## Accessing Database historical asset prices

Historical asset prices stored in a Database object can be accessed through its .data attribute.

```python
database.data
```

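| Date | SHE | AADR | AAXJ | ACCU | ACWI | ACWX | AGG | ALD | AMLP | AND | ... | XLE | XLF | XLI | XLK | XLP | XLU | XLV | XLY | XOP | YPRO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000-01-03 |  |  |  |  |  |  |  |  |  |  | ... | 16.67 | 9.44 | 19.72 | 43.14 | 14.12 | 13.15 | 22.41 | 23.36 |  |  |
| 2000-01-04 |  |  |  |  |  |  |  |  |  |  | ... | 16.36 | 9.03 | 19.18 | 40.96 | 13.72 | 12.75 | 21.90 | 22.65 |  |  |
| 2000-01-05 |  |  |  |  |  |  |  |  |  |  | ... | 16.79 | 8.96 | 19.09 | 40.35 | 13.97 | 13.08 | 21.71 | 22.38 |  |  |
| 2000-01-06 |  |  |  |  |  |  |  |  |  |  | ... | 17.44 | 9.35 | 19.35 | 39.01 | 14.23 | 13.05 | 21.78 | 22.63 |  |  |
| 2000-01-07 |  |  |  |  |  |  |  |  |  |  | ... | 17.62 | 9.51 | 20.09 | 39.69 | 15.13 | 13.17 | 22.04 | 23.70 |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2019-12-24 |  | 54.00 | 72.53 |  | 78.41 | 48.35 | 110.66 |  | 40.28 |  | ... | 58.10 | 30.23 | 80.79 | 90.52 | 62.07 | 62.92 | 101.39 | 123.65 | 93.19 |  |
| 2019-12-26 |  | 54.30 | 72.98 |  | 78.83 | 48.56 | 110.76 |  | 40.79 |  | ... | 58.08 | 30.40 | 80.98 | 91.19 | 62.13 | 63.03 | 101.32 | 125.16 | 93.54 |  |
| 2019-12-27 |  | 54.51 | 73.42 |  | 78.88 | 48.65 | 110.90 |  | 40.14 |  | ... | 57.84 | 30.32 | 80.91 | 91.18 | 62.40 | 63.21 | 101.35 | 125.15 | 91.97 |  |
| 2019-12-30 |  | 54.10 | 72.98 |  | 78.41 | 48.35 | 110.90 |  | 39.53 |  | ... | 57.65 | 30.23 | 80.53 | 90.66 | 62.08 | 63.20 | 100.74 | 124.35 | 91.85 |  |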


## Filtering data based on data availability requirement

For portfolio construction, it may sometimes be desirable to only select from assets for which a sufficiently large amount of historical data is available.

This is done using the co.filter() function.

The function takes an unfiltered dataset and a constant, min_frac, as its inputs, and returns the filtered dataset. If the unfiltered dataset has x rows, assets with fewer than x * min_frac historical price data points are removed.

Alternatively, the dataset can be screened manually against other requirements using pandas functions.

```python
data = co.filter(database.data, 0.5)
```
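The availability rule above can be written in a few lines of pandas. This is a minimal sketch, not the package's code: filter_by_availability is a hypothetical name, and we assume co.filter() simply counts the non-missing prices in each column.

```python
import numpy as np
import pandas as pd


def filter_by_availability(prices: pd.DataFrame, min_frac: float) -> pd.DataFrame:
    """Hypothetical stand-in for co.filter().

    Keeps only the assets (columns) with at least len(prices) * min_frac
    non-missing price observations.
    """
    threshold = len(prices) * min_frac
    enough_history = prices.notna().sum() >= threshold
    return prices.loc[:, enough_history]


if __name__ == "__main__":
    prices = pd.DataFrame(
        {
            "FULL": [1.0, 1.1, 1.2, 1.3],             # 4 of 4 rows present
            "HALF": [np.nan, np.nan, 2.0, 2.1],       # 2 of 4 rows present
            "SPARSE": [np.nan, np.nan, np.nan, 5.0],  # 1 of 4 rows present
        }
    )
    print(list(filter_by_availability(prices, 0.5).columns))  # ['FULL', 'HALF']
```

Manual screening on any other criterion follows the same pattern: build a boolean Series indexed by ticker and pass it to .loc.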

## Calculating forward-looking returns

The returns for each asset are calculated using the co.get_return_df() function.

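The function takes a filtered dataset and a constant, periods, as its inputs, and returns a dataframe holding, for each asset and each point in time, the return over the following periods data points. With daily prices, periods = 252 corresponds to roughly one trading year.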

```python
return_df = co.get_return_df(data, 252)
```
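A forward-looking return table of this kind can be sketched with DataFrame.shift. This sketch makes two assumptions that the original text does not confirm: that co.get_return_df() uses simple (not log) returns, and that it drops the final rows, which have no future price. The name forward_returns is hypothetical.

```python
import pandas as pd


def forward_returns(prices: pd.DataFrame, periods: int) -> pd.DataFrame:
    """Hypothetical stand-in for co.get_return_df().

    Row t holds the simple return from buying at row t and selling `periods`
    rows later: prices[t + periods] / prices[t] - 1. The final `periods` rows
    have no future price, so they are dropped.
    """
    if periods < 1:
        raise ValueError("periods must be a positive integer")
    returns = prices.shift(-periods) / prices - 1
    return returns.iloc[:-periods]


if __name__ == "__main__":
    prices = pd.DataFrame({"X": [100.0, 110.0, 121.0, 133.1]})
    print(forward_returns(prices, 2))  # both remaining rows are ~0.21
```

Because consecutive windows overlap, neighbouring rows of the result are strongly correlated; this is the trade-off between period length and dataset size discussed in the design section.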

## Filtering anomalous behaviour

The download functions above do not guarantee that the downloaded data is free of artifacts, which can occur when the data source itself contains errors. It is therefore important to identify assets with anomalous behaviour and remove them before proceeding. This can be done using the co.anomaly_filter() function.

The function takes return_df and a constant, max_dev, as its inputs. The higher the value of max_dev, the greater the tolerance for deviation in asset behaviour. The function returns a copy of return_df with the anomalous assets removed.

```python
return_df = co.anomaly_filter(return_df, 3)
```
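The text does not say how co.anomaly_filter() decides that an asset is anomalous. As one plausible illustration only, the sketch below swaps in a robust z-score (median absolute deviation) rule applied to each asset's return volatility; drop_anomalous_assets and the choice of statistic are our assumptions, not the package's method.

```python
import pandas as pd


def drop_anomalous_assets(returns: pd.DataFrame, max_dev: float) -> pd.DataFrame:
    """Hypothetical anomaly filter; not necessarily what co.anomaly_filter() does.

    Each asset is summarised by the standard deviation of its returns, since
    artifacts such as a bad price spike tend to inflate it. An asset is removed
    when its robust z-score (distance from the cross-sectional median, measured
    in scaled median absolute deviations) exceeds max_dev.
    """
    vol = returns.std()
    median = vol.median()
    # 1.4826 * MAD estimates the standard deviation for normally distributed data.
    spread = 1.4826 * (vol - median).abs().median()
    if spread == 0:
        return returns.copy()  # no dispersion to measure deviations against
    robust_z = (vol - median) / spread
    return returns.loc[:, robust_z.abs() <= max_dev].copy()
```

The median and MAD are used instead of the mean and standard deviation because a single corrupted asset can inflate the latter enough to hide itself.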

## Calculating summary data

Basic functions are provided to calculate the expected return of each asset (mean_series) and the covariance matrix of all assets (cov_df).

```python
mean_series = co.get_mean_series(return_df)
cov_df = co.get_cov_df(return_df)
```
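These two summaries are what modern portfolio theory needs. A minimal sketch, assuming co.get_mean_series() and co.get_cov_df() correspond to pandas' DataFrame.mean() and DataFrame.cov() (not confirmed by the source): it also shows how a weight vector w turns them into a portfolio mean w · mu and variance w^T Sigma w. The function names are hypothetical.

```python
import numpy as np
import pandas as pd


def summarise(returns: pd.DataFrame):
    """Expected return of each asset and the asset-by-asset covariance matrix."""
    mean_series = returns.mean()  # sample mean of each column, NaNs skipped
    cov_df = returns.cov()        # pairwise sample covariance (ddof=1)
    return mean_series, cov_df


def portfolio_mean_var(weights, mean_series: pd.Series, cov_df: pd.DataFrame):
    """Portfolio expected return w . mu and variance w^T Sigma w."""
    w = np.asarray(weights, dtype=float)
    mu = mean_series.to_numpy()
    sigma = cov_df.to_numpy()
    return float(w @ mu), float(w @ sigma @ w)


if __name__ == "__main__":
    # Two assets whose returns move in exactly opposite directions.
    returns = pd.DataFrame({"A": [0.1, 0.2, 0.3], "B": [0.3, 0.2, 0.1]})
    mean_series, cov_df = summarise(returns)
    # Mean ~0.2 and variance ~0: the two assets offset each other.
    print(portfolio_mean_var([0.5, 0.5], mean_series, cov_df))
```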

## Shortlisting assets

It may be unwieldy or otherwise inefficient to generate portfolios from a very large set of assets. As such, a utility is provided to narrow down the number of assets under consideration.

The co.get_shortlist() function takes cov_df and a constant, n_clusters, as its inputs. The assets are clustered into n_clusters clusters using a K-means clustering algorithm, based on their characteristics as reflected in the covariance matrix. The function returns the list of shortlisted asset tickers, one representing each cluster.

There are many other ways to construct an asset shortlist; however, most involve some degree of subjective decision-making, which introduces bias. Nonetheless, these can also be implemented with relative ease by hardcoding the list of tickers.

```python
shortlist = co.get_shortlist(cov_df, 100)
```
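The Design Decisions section below states that each asset is clustered by its covariance with all other assets, and that each cluster is represented by the asset closest to the cluster's mean. The sketch below implements that with plain NumPy k-means (Lloyd's algorithm). The name kmeans_shortlist and the deterministic farthest-point initialisation are assumptions of this sketch. Library implementations such as scikit-learn's KMeans default to k-means++ initialisation, so results can differ.

```python
import numpy as np
import pandas as pd


def kmeans_shortlist(cov_df: pd.DataFrame, n_clusters: int, n_iter: int = 100) -> list:
    """Hypothetical stand-in for co.get_shortlist().

    Each asset is described by its row of the covariance matrix. Assets are
    grouped with Lloyd's k-means algorithm, and the asset closest to each
    cluster centroid is returned as that cluster's representative.
    """
    X = cov_df.to_numpy(dtype=float)
    n_assets = X.shape[0]
    if not 1 <= n_clusters <= n_assets:
        raise ValueError("n_clusters must be between 1 and the number of assets")

    # Deterministic farthest-point initialisation: start from the first asset,
    # then repeatedly add the asset farthest from every centroid chosen so far.
    centroids = [X[0]]
    for _ in range(1, n_clusters):
        nearest = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(nearest)])
    centroids = np.array(centroids)

    def sq_distances(points, centres):
        # (n_points, n_centres) matrix of squared Euclidean distances
        return ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)

    for _ in range(n_iter):
        labels = sq_distances(X, centroids).argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if it has none).
        updated = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        converged = np.allclose(updated, centroids)
        centroids = updated
        if converged:
            break

    # Pick, for each non-empty cluster, the member closest to its centroid.
    d2 = sq_distances(X, centroids)
    labels = d2.argmin(axis=1)
    shortlist = []
    for k in range(n_clusters):
        members = np.flatnonzero(labels == k)
        if members.size:
            shortlist.append(cov_df.index[members[np.argmin(d2[members, k])]])
    return shortlist
```

Usage mirrors the cell above: shortlist = kmeans_shortlist(cov_df, 100). Returning an actual asset rather than the centroid keeps every shortlisted item investable.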

## Generating optimized portfolios

The co.plot_optimize() function takes mean_series, cov_df, and a constant, n_points, as inputs, and returns n_points optimized portfolios. The optimized portfolios are evenly spread across different levels of expected return.

The co.eliminate() function removes dominated portfolios, i.e. portfolios for which at least one other portfolio has higher or equal returns and lower or equal risk.

```python
portfolio_df = co.plot_optimize(
    mean_series[shortlist], cov_df.loc[shortlist, shortlist], 50
)
portfolio_df = co.eliminate(portfolio_df)
```
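The dominance test can be sketched as a Pareto filter over the mean and var columns. One assumption is added to the definition above: a portfolio only dominates another if it is strictly better on at least one of the two measures, so two identical portfolios do not eliminate each other. The name drop_dominated is hypothetical.

```python
import numpy as np
import pandas as pd


def drop_dominated(portfolio_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for co.eliminate().

    Portfolio i is dominated if some portfolio j has mean >= mean_i and
    var <= var_i, and is strictly better on at least one of the two.
    """
    means = portfolio_df["mean"].to_numpy()
    variances = portfolio_df["var"].to_numpy()
    keep = np.ones(len(portfolio_df), dtype=bool)
    for i in range(len(portfolio_df)):
        no_worse = (means >= means[i]) & (variances <= variances[i])
        strictly_better = (means > means[i]) | (variances < variances[i])
        keep[i] = not np.any(no_worse & strictly_better)
    return portfolio_df.loc[keep]
```

The pairwise comparison is O(n^2), which is negligible for the 50 portfolios generated here.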

## Other portfolio calculations

Functions are provided to calculate other key metrics for the list of generated portfolios.

co.get_at_risk_series() returns the value-at-risk of each portfolio at quantile x: the return that is only undershot in the worst x fraction of outcomes (x = 0.01 below, i.e. the worst 1%).

co.get_loss_prob_series() returns, for each portfolio, the probability of a return lower than x (default 0).

co.get_coefs_dict() and co.get_coefs_string() convert the portfolio objects to a more readable form.

```python
portfolio_df["at_risk"] = co.get_at_risk_series(portfolio_df, 0.01)
portfolio_df["loss_prob"] = co.get_loss_prob_series(portfolio_df, 0)
portfolio_df["coefs_dict"] = co.get_coefs_dict(
    portfolio_df, mean_series[shortlist].index, 10 ** -3
)
portfolio_df["coefs_string"] = co.get_coefs_string(portfolio_df)
```
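The source does not show how these two metrics are computed. However, the numbers in the table below are consistent with modelling each portfolio's return as normally distributed with the reported mean and var. For row 0, 0.366134 + Phi^-1(0.01) * sqrt(0.112419) is roughly -0.4139, matching at_risk, and Phi(-0.366134 / sqrt(0.112419)) is roughly 0.1374, matching loss_prob. The sketch below reproduces that model with the standard library. The function names are hypothetical.

```python
from statistics import NormalDist


def value_at_risk(mean: float, var: float, quantile: float) -> float:
    """Return at the given lower quantile, assuming normally distributed returns."""
    return NormalDist(mu=mean, sigma=var ** 0.5).inv_cdf(quantile)


def loss_probability(mean: float, var: float, threshold: float = 0.0) -> float:
    """Probability of a return below `threshold` under the same normal model."""
    return NormalDist(mu=mean, sigma=var ** 0.5).cdf(threshold)


if __name__ == "__main__":
    # mean and var of portfolio 0 in the table below
    print(round(value_at_risk(0.366134, 0.112419, 0.01), 4))  # -0.4139
    print(round(loss_probability(0.366134, 0.112419), 4))     # 0.1374
```

Applied to a whole table, this becomes, for example, portfolio_df.apply(lambda r: value_at_risk(r["mean"], r["var"], 0.01), axis=1).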

```python
portfolio_df
```

|  | coefs | var | mean | at_risk | loss_prob | coefs_dict | coefs_string |
|---|---|---|---|---|---|---|---|
| 0 | [0.0, 0.0, 1.4040129971970912e-15, 0.0, 0.0, 0... | 0.112419 | 0.366134 | -0.413867 | 0.137418 | {'UPRO': 0.9999999945050864} | UPRO0.9999999945050864 |
| 1 | [2.4929667044350543e-16, 5.898180178738709e-16... | 0.070281 | 0.35056 | -0.266167 | 0.093027 | {'UPRO': 0.31129334762116595, 'QLD': 0.6887066... | UPRO0.31129334762116595 QLD0.6887066523788344 |
| 2 | [1.0913796901555277e-16, 1.0478089401777796e-1... | 0.055542 | 0.334987 | -0.213271 | 0.0776 | {'QLD': 0.963821787680155, 'EDV': 0.0361782123... | QLD0.963821787680155 EDV0.0361782123198447 |
| 3 | [2.3578639411195112e-17, 0.0, 3.25881615654170... | 0.046668 | 0.319413 | -0.183143 | 0.069627 | {'QLD': 0.8978035125018592, 'EDV': 0.102196487... | QLD0.8978035125018592 EDV0.10219648749814213 |
| 4 | [0.0, 9.520457073564802e-17, 1.703904856966290... | 0.038949 | 0.30384 | -0.155275 | 0.061833 | {'QLD': 0.8317852375001451, 'EDV': 0.168214762... | QLD0.8317852375001451 EDV0.16821476249985579 |
| 5 | [0.0, 9.741337104283444e-19, 4.698332757853279... | 0.032377 | 0.288266 | -0.130326 | 0.054572 | {'UPRO': 0.016012850273290673, 'QLD': 0.748219... | UPRO0.016012850273290673 QLD0.7482191486450445... |
| 6 | [2.5891340802016966e-17, 0.06924811709839378, ... | 0.02671 | 0.272692 | -0.107506 | 0.047604 | {'FDN': 0.0692481170983938, 'UPRO': 0.01924083... | FDN0.0692481170983938 UPRO0.019240835744114105... |
| 7 | [1.2779006080724355e-17, 0.14156375293853124, ... | 0.021661 | 0.257119 | -0.085262 | 0.040317 | {'FDN': 0.1415637529385313, 'UPRO': 0.01331993... | FDN0.1415637529385313 UPRO0.013319937517352814... |
| 8 | [9.591504633176178e-18, 0.20071553060253264, 0... | 0.017208 | 0.241545 | -0.063623 | 0.032786 | {'FDN': 0.20071553060253272, 'UPRO': 0.0115794... | FDN0.20071553060253272 UPRO0.01157944712422396... |
| 9 | [8.588439746708141e-17, 0.24164978486891836, 0... | 0.013324 | 0.225972 | -0.042555 | 0.025134 | {'FDN': 0.24164978486891847, 'QLD': 0.39449550... | FDN0.24164978486891847 QLD0.39449550883596485 ... |


## Alternative targeting

In some cases, portfolios aim not for a particular expected level of return, but for some other target metric, such as value-at-risk or loss probability.

This can be accomplished as well. However, the steps for targeting alternative metrics are a little less straightforward.

An initial range of portfolios must first be generated, as we have done above. Then, the target metric must be calculated for each of the generated portfolios, as we have done for value-at-risk and loss probability.

Next, a list of target values for the new metric must be hardcoded. The co.metric_to_mean() function then converts these target values into target levels of expected return, and co.target_optimize() generates a portfolio for each of these levels.

```python
at_risk_target_list = [
    -0.01,
    -0.02,
    -0.03,
    -0.04,
    -0.05,
    -0.06,
    -0.07,
    -0.08,
    -0.09,
    -0.10,
]
mean_target_list = co.metric_to_mean(
    portfolio_df["mean"], portfolio_df["at_risk"], 3, at_risk_target_list
)
portfolio_df = co.target_optimize(
    mean_series[shortlist], cov_df.loc[shortlist, shortlist], mean_target_list
)
```
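The Design Decisions section below says this conversion fits a polynomial with mean as the independent variable and the chosen metric as the dependent variable. It then reads off the mean that corresponds to each target. The sketch below follows that description, assuming the argument 3 in the cell above is the polynomial degree (not confirmed). It inverts the fitted curve by a dense grid search rather than polynomial root-finding, which is numerically fragile when the leading coefficient is near zero. The name metric_to_mean_sketch is hypothetical.

```python
import numpy as np


def metric_to_mean_sketch(means, metric, degree, targets, grid_size=10_001):
    """Hypothetical stand-in for co.metric_to_mean().

    Fits metric = p(mean) with a least-squares polynomial of the given degree,
    then, for each target metric value, returns the mean (searched over the
    range of the input means) at which the fitted curve is closest to the target.
    """
    means = np.asarray(means, dtype=float)
    metric = np.asarray(metric, dtype=float)
    coeffs = np.polyfit(means, metric, degree)
    grid = np.linspace(means.min(), means.max(), grid_size)
    fitted = np.polyval(coeffs, grid)
    return [float(grid[np.argmin(np.abs(fitted - t))]) for t in targets]
```

In this notebook it would be called as metric_to_mean_sketch(portfolio_df["mean"], portfolio_df["at_risk"], 3, at_risk_target_list). The sketch never extrapolates: a target outside the range covered by the generated portfolios maps to the nearest end of that range. If the fitted curve is not monotonic, the first closest grid point is returned.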

```python
portfolio_df["at_risk"] = co.get_at_risk_series(portfolio_df, 0.01)
portfolio_df["loss_prob"] = co.get_loss_prob_series(portfolio_df, 0)
portfolio_df["coefs_dict"] = co.get_coefs_dict(
    portfolio_df, mean_series[shortlist].index, 10 ** -3
)
portfolio_df["coefs_string"] = co.get_coefs_string(portfolio_df)
```

```python
portfolio_df
```

|  | coefs | var | mean | at_risk | loss_prob | coefs_dict | coefs_string |
|---|---|---|---|---|---|---|---|
| 0 | [0.0, 0.2807896554913968, 3.1156951919821916e-... | 0.009846 | 0.209557 | -0.021282 | 0.017349 | {'FDN': 0.2807896554913969, 'QLD': 0.304958784... | FDN0.2807896554913969 QLD0.3049587842132876 ID... |
| 1 | [3.774573630543031e-17, 0.25665205380576245, 7... | 0.011823 | 0.21925 | -0.033701 | 0.021879 | {'FDN': 0.25665205380576256, 'QLD': 0.35822861... | FDN0.25665205380576256 QLD0.35822861487390245 ... |
| 2 | [1.6376619705509373e-17, 0.2362138870426717, 0... | 0.013776 | 0.227907 | -0.045134 | 0.026081 | {'FDN': 0.23621388704267188, 'QLD': 0.40542919... | FDN0.23621388704267188 QLD0.4054291978552987 I... |
| 3 | [1.368064088934103e-17, 0.2167053386907145, 1.... | 0.015705 | 0.235782 | -0.055753 | 0.029955 | {'FDN': 0.21670533869071457, 'UPRO': 0.0032930... | FDN0.21670533869071457 UPRO0.00329300475634142... |
| 4 | [2.6970799589997804e-17, 0.20080934496399924, ... | 0.017611 | 0.243039 | -0.065679 | 0.033519 | {'FDN': 0.20080934496399933, 'UPRO': 0.0068781... | FDN0.20080934496399933 UPRO0.00687816903555018... |
| 5 | [0.0, 0.18073406979787648, 0.0, 0.0, 0.0, 8.48... | 0.019493 | 0.249792 | -0.075008 | 0.036798 | {'FDN': 0.18073406979787665, 'UPRO': 0.0131924... | FDN0.18073406979787665 UPRO0.01319248187870787... |
| 6 | [9.319366711508742e-17, 0.14734163602374525, 9... | 0.021359 | 0.256126 | -0.083864 | 0.039842 | {'FDN': 0.1473416360237453, 'UPRO': 0.01371432... | FDN0.1473416360237453 UPRO0.01371432357159465 ... |
| 7 | [0.0, 0.12331075549121606, 3.5809257281717055e... | 0.023209 | 0.262102 | -0.092302 | 0.042674 | {'FDN': 0.12327273858846184, 'UPRO': 0.0234104... | FDN0.12327273858846184 UPRO0.02341042061618548... |
| 8 | [2.7616704563926586e-17, 0.09787056266181571, ... | 0.02505 | 0.26777 | -0.100424 | 0.045338 | {'FDN': 0.09787056266181574, 'UPRO': 0.0266234... | FDN0.09787056266181574 UPRO0.02662341678212844... |
| 9 | [0.0, 0.06690179172284137, 1.0112992017602561e... | 0.026874 | 0.273168 | -0.108195 | 0.047822 | {'FDN': 0.06690179172284141, 'UPRO': 0.0192410... | FDN0.06690179172284141 UPRO0.01924107776922459... |


```python
portfolio_df.to_csv("result.csv")
```

## Installation

Install by downloading and moving the customlibs folder to your project directory.

# Design Decisions

Robo-advisors available to retail investors combine proprietary software and financial expertise to deliver investment strategy and construct portfolios. They are necessarily opaque and prone to bias. There is value in a transparent, bias-free robo-advisor. We have thus designed and developed Python packages with the core functionality for such a project. Our product is modular and allows more complicated and tailored strategies to be rapidly designed and tested. This report details and explains the architecture of, and design decisions behind, our product.

The design decisions we have made concern (1) the input the algorithm accepts, (2) the output of the algorithm, and (3) the processes of the algorithm.

The robo-advisor core functionality package provides for robo-advisors which take as inputs the historical adjusted closing price data of various exchange-traded assets. The guiding principles behind this decision were simplicity and accessibility. Conventional financial advisors require detailed, highly technical, and current knowledge of the mechanics of a large range of asset classes. Different asset classes have different properties and behaviors, and accommodating such a range of assets would dramatically increase the size of the package. Exchange-traded funds provide exposure to a large number of asset classes and greatly reduce the complexity associated with monitoring the value and performance of these assets. Adjusted closing prices are a sufficiently accurate proxy for the value of an exchange-traded fund. By monitoring the adjusted closing prices of exchange-traded funds, instead of performing different, more complicated (and perhaps less accurate) calculations to ascertain the value of assets of various types, the size and complexity of our product are greatly reduced.

The package provides for robo-advisors which generate a list of optimized portfolios to match a list of investment targets. The investment objectives of retail investors span a wide range, from financial milestones to retirement and succession planning. These objectives are often complicated by conditions such as irregular cash inflows and large one-off expenses. We have opted not to include models for such a wide range of investment objectives in our package, as doing so would greatly bloat the package, and we do not consider such functionality to be essential. Instead, users can choose from several key metrics to use as targets, or implement custom metrics of their own. Metrics included in the package are (1) target level of expected returns, (2) target level of risk as measured by variance, and (3) target level of risk as measured by value-at-risk. The output of robo-advisors built around the package will be a list of optimized portfolios, each targeting a particular value of the chosen metric: for example, three optimized portfolios with 10%, 20%, and 30% value-at-risk. The ability to create custom metrics and to target values at any interval allows for a great degree of flexibility without dramatically inflating the size of our package.

The process catered for by our package consists of: (1) extracting summary data from the input, (2) selecting a representative shortlist of assets, and (3) generating the list of target portfolios.

Summary data describes the key characteristics of each asset in the input dataset. Our package applies modern portfolio theory to optimize portfolios, which requires the expected return of each asset and the covariance matrix relating all assets to each other. This data is obtained in two steps. First, the returns of each asset are calculated over a given period, for investments made at each point in time. Second, the mean of these returns and the covariance matrix relating the returns of all assets to each other are calculated. One parameter in this process is the length of the period over which returns are calculated. A shorter period yields a larger dataset of returns, and more granular data may produce more accurate covariance values. A longer period, such as one year, increases interpretability. This parameter can be set by the user depending on preference, and will likely have little effect on the resulting portfolios.

A shortlist of assets is composed, each representing one cluster of similarly-behaved assets from the initial pool. The greater the number of assets in consideration, the more computationally intensive it becomes to generate optimal portfolios. The more assets in a portfolio, the more costly it is to manage and re-balance. As such, it does not make sense to optimize portfolios comprising the entire universe of exchange-traded funds. It becomes necessary to narrow down the list to a smaller number of exchange-traded funds which each represent some segment of the asset universe. This is done by applying a clustering algorithm to classify each asset based on its covariance with all other assets. Assets with similar relationships are grouped. The shortlist is assembled by selecting the asset which most closely matches the mean values for the cluster it belongs to. This process was chosen as it is entirely bias-free, and perfectly repeatable, given a target number of clusters. Robo-advisors in practice often shortlist several asset classes/types and select from within each category the exchange-traded fund with the highest volume, greatest liquidity, or some other factor. The identification of asset classes/types and the selection of criteria introduce bias and reduce transparency. Nonetheless, it is relatively hassle-free to implement a manual shortlist instead.

The final step in the process is to generate the portfolios which match a range of values for a target metric. There are two parts to this. First, the range of optimal portfolios is generated. Optimal portfolios are generated with target levels of expected returns spanning the whole range of possible levels of return (from that of the lowest-returning asset to that of the highest-returning asset). Each portfolio is optimized using SciPy's optimize module: the asset weights are adjusted to minimize the variance of the portfolio return, subject to the weights summing to one and the expected portfolio return equaling the target value. Mean, variance, and other metrics are calculated for each of these portfolios. Second, a polynomial curve is fitted to approximate the relationship between the expected level of return and the chosen metric (for example, value-at-risk), with mean as the independent variable and the chosen metric as the dependent variable. The mean values corresponding to each of a set of target values for the chosen metric are then computed, and optimized portfolios are generated with expected returns equal to these mean values. The resulting portfolios are the output of the robo-advisor.
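The optimization step described above can be sketched with scipy.optimize.minimize and the SLSQP method, which supports equality constraints and bounds. The long-only bounds 0 <= w_i <= 1 are an assumption of this sketch. The non-negative weights in the tables above suggest them, but the text does not state them. The names min_variance_weights and frontier_targets are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def min_variance_weights(mu, sigma, target):
    """Long-only minimum-variance weights for a target expected return.

    Minimises w^T Sigma w subject to sum(w) = 1 and w . mu = target, with
    0 <= w_i <= 1 (the long-only bounds are an assumption of this sketch).
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n = mu.size
    constraints = [
        {"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
        {"type": "eq", "fun": lambda w: w @ mu - target},
    ]
    result = minimize(
        lambda w: w @ sigma @ w,        # portfolio variance
        x0=np.full(n, 1.0 / n),         # start from equal weights
        jac=lambda w: 2.0 * sigma @ w,  # gradient, valid for symmetric Sigma
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints=constraints,
    )
    if not result.success:
        raise RuntimeError(f"optimization failed: {result.message}")
    return result.x


def frontier_targets(mu, n_points):
    """Evenly spaced target returns from the lowest to the highest asset mean."""
    return np.linspace(np.min(mu), np.max(mu), n_points)


if __name__ == "__main__":
    mu = np.array([0.1, 0.2, 0.3])
    sigma = 0.01 * np.eye(3)  # uncorrelated assets with equal variance
    print(np.round(min_variance_weights(mu, sigma, 0.25), 3))  # roughly 1/12, 1/3, 7/12
```

Looping min_variance_weights over frontier_targets(mu, n_points) gives the first part of the process. Feeding it the means produced by the polynomial inversion gives the second.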

The package constructed leaves room for further tuning and development, and the implementation of more complicated strategies. We elaborate on a few possibilities below.

Further functionality may move away from the naive assumption that future performance and behavior will resemble the past. One way to do this is to weight more heavily, in the computation of summary statistics, those periods which better match a subjective expectation of future asset behavior. Weighted means of each asset's return series and a weighted covariance matrix can be generated with relative ease. Weights can be used to (1) emphasize recent data, (2) emphasize data that matches a subjective expectation of future asset performance and behavior, or (3) emphasize either more bullish or more bearish periods. These options replace the assumption that future performance and behavior resemble past performance and behavior with more nuanced approaches. Greater control can be gained by manually tuning the summary statistics. One noteworthy use case would be to adjust the expected returns of various assets by different amounts based on the relevant tax policy for each asset class or geography.
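Option (1) above can be sketched with NumPy's weighted average and the aweights argument of numpy.cov. The exponential half-life scheme and the name weighted_summary are assumptions made for illustration. Any other non-negative weight vector, such as one for options (2) or (3), can replace the computed weights.

```python
import numpy as np


def weighted_summary(returns, half_life: float):
    """Exponentially weighted mean vector and covariance matrix.

    `returns` has one row per observation (oldest first) and one column per
    asset. An observation `half_life` rows older than the newest row gets half
    the newest row's weight; half_life=np.inf gives equal weights.
    """
    returns = np.asarray(returns, dtype=float)
    age = np.arange(returns.shape[0])[::-1]  # newest row has age 0
    weights = 0.5 ** (age / half_life)
    mean = np.average(returns, axis=0, weights=weights)
    cov = np.cov(returns, rowvar=False, aweights=weights)
    return mean, cov
```

With equal weights this reduces to the ordinary sample mean and covariance, so it can replace the unweighted summary step without changing anything downstream.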

Curated shortlists as opposed to automated shortlisting can also be implemented with relative ease. This leaves room for robo-advisors which prefer to stick with a more human-in-the-loop design. Manual selection of asset components may also be required for regulatory or other risk-management reasons.