In this notebook, we will practice some calculations of portfolio returns. This will give us some practice using Numpy arrays and plotting with Matplotlib.

In [3]:
# always import all necessary libraries at the top of the file
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd

# Data

We'll begin by loading data from the file `dataPort.csv`. It contains a table of daily prices, one row per day. After the `date` column, the columns correspond, from left to right, to the S&P 500 index, the crude oil index, and the U.S. 10-year Treasury index.

In [6]:
data_port = pd.read_csv('./dataPort.csv', parse_dates=['date'])

In [7]:
data_port.dtypes

date             datetime64[ns]
sp500                   float64
crude_oil               float64
treasury_10yr           float64
dtype: object

In [8]:
data_port.head()

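|   | date | sp500 | crude_oil | treasury_10yr |
|---|------|------:|----------:|--------------:|
| 0 | 2013-08-29 | 1638.17 | 87.26 | 2.7617 |
| 1 | 2013-08-30 | 1632.97 | 87.5 | 2.7839 |
| 2 | 2013-09-03 | 1639.77 | 87.76 | 2.8576 |
| 3 | 2013-09-04 | 1653.08 | 87.78 | 2.8966 |
| 4 | 2013-09-05 | 1655.08 | 87.94 | 2.9937 |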


In [9]:
data_port.tail()

|   | date | sp500 | crude_oil | treasury_10yr |
|---|------|------:|----------:|--------------:|
| 495 | 2015-09-03 | 1951.13 | 46.75 | 2.1596 |
| 496 | 2015-09-04 | 1921.22 | 46.05 | 2.1244 |
| 497 | 2015-09-08 | 1969.41 | 45.94 | 2.1828 |
| 498 | 2015-09-09 | 1942.04 | 44.15 | 2.2006 |
| 499 | 2015-09-10 | 1952.29 | 45.92 | 2.222 |


In [11]:
# Don't worry about the Pandas details here
prices = data_port.iloc[:,1:].values

In [12]:
prices

array([[1638.17  ,   87.26  ,    2.7617],
       [1632.97  ,   87.5   ,    2.7839],
       [1639.77  ,   87.76  ,    2.8576],
       ...,
       [1969.41  ,   45.94  ,    2.1828],
       [1942.04  ,   44.15  ,    2.2006],
       [1952.29  ,   45.92  ,    2.222 ]])

In [13]:
type(prices)

numpy.ndarray

In [14]:
dates = data_port.iloc[:,0].values
dates[0:10]

array(['2013-08-29T00:00:00.000000000', '2013-08-30T00:00:00.000000000',
       '2013-09-03T00:00:00.000000000', '2013-09-04T00:00:00.000000000',
       '2013-09-05T00:00:00.000000000', '2013-09-06T00:00:00.000000000',
       '2013-09-09T00:00:00.000000000', '2013-09-10T00:00:00.000000000',
       '2013-09-11T00:00:00.000000000', '2013-09-12T00:00:00.000000000'],
      dtype='datetime64[ns]')

# Data Manipulation
Create three column vectors, each containing the price history of one of the asset classes. Name the vectors `sp500`, `oil`, and `bonds`.

In [44]:
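# Each slice is a 1-D array holding one column of `prices`
# (use prices[:, [0]] instead if you need a true (Nt, 1) column vector)
sp500 = prices[:, 0]   # S&P 500
oil = prices[:, 1]     # crude oil
bonds = prices[:, 2]   # 10-year Treasury
print(sp500)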


[1638.17 1632.97 1639.77 1653.08 1655.08 1655.17 1671.71 1683.99 1689.13
 1683.42 1687.99 1697.6  1704.76 1725.52 1722.34 1709.91 1701.84 1697.42
 1692.77 1698.67 1691.75 1681.55 1695.   1693.87 1678.66 1690.5  1676.12
 1655.45 1656.4  1692.56 1703.2  1698.06 1721.54 1733.15 1744.5  1744.66
 1754.67 1746.38 1752.07 1759.77 1762.11 1771.95 1763.31 1756.54 1761.64
 1767.93 1762.97 1770.49 1770.61 1767.69 1782.   1790.62 1798.18 1791.53
 1787.87 1781.37 1795.85 1804.76 1802.48 1802.75 1807.23 1805.81 1800.9
 1795.15 1792.81 1785.03 1805.09 1808.37 1802.62 1782.22 1775.5  1775.32
 1786.54 1810.65 1809.6  1818.32 1827.99 1833.32 1842.02 1841.4  1841.07
 1848.36 1831.98 1831.37 1826.77 1837.88 1837.49 1838.13 1842.37 1819.2
 1838.88 1848.38 1845.89 1838.7  1843.8  1844.86 1828.46 1790.29 1781.56
 1792.5  1774.2  1794.19 1782.59 1741.89 1755.2  1751.64 1773.43 1797.02
 1799.84 1819.75 1819.26 1829.83 1838.63 1840.76 1828.75 1839.78 1836.25
 1847.61 1845.12 1845.16 1854.29 1859.45 1845.73 1873

 Create a row vector, `pNow`, with the latest price of each security.

 Calculate the number of days `Nt` and the number of assets `Nk`.
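A minimal sketch of `pNow`, `Nt`, and `Nk`. So that it runs on its own, it uses the five rows printed by `data_port.head()` above as a stand-in for `prices`; with the full `prices` array, `Nt` should come out as 500 (row indices 0 to 499) and `Nk` as 3.

```python
import numpy as np

# Stand-in for `prices`: the five rows printed by data_port.head() above
# (columns: S&P 500, crude oil, 10-year Treasury)
prices = np.array([
    [1638.17, 87.26, 2.7617],
    [1632.97, 87.50, 2.7839],
    [1639.77, 87.76, 2.8576],
    [1653.08, 87.78, 2.8966],
    [1655.08, 87.94, 2.9937],
])

# Slicing with -1: keeps the 2-D shape, giving a (1, Nk) row vector
pNow = prices[-1:, :]

# Rows are days, columns are assets
Nt, Nk = prices.shape
print(pNow, Nt, Nk)
```

Using `prices[-1, :]` instead would return a 1-D array of length `Nk`, which is often just as convenient.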

Calculate the matrix of return histories for the three assets, as both log-returns (`rets_log`) and level-returns (`rets`). The log and level returns are defined as:
 
 - log-return, $\tilde r_t = \log \left( \frac{P_{t+1}}{P_t} \right)$
 - level-return, $r_t = \frac{P_{t+1}}{P_t} - 1$
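Both definitions can be vectorized by dividing the price matrix without its first row by the price matrix without its last row. A sketch, again using the five `head()` rows as a stand-in for `prices`:

```python
import numpy as np

# Stand-in for `prices`: the five rows printed by data_port.head() above
prices = np.array([
    [1638.17, 87.26, 2.7617],
    [1632.97, 87.50, 2.7839],
    [1639.77, 87.76, 2.8576],
    [1653.08, 87.78, 2.8966],
    [1655.08, 87.94, 2.9937],
])

# P_{t+1} / P_t for every day and asset at once; result has Nt-1 rows
gross = prices[1:, :] / prices[:-1, :]

rets = gross - 1           # level returns r_t
rets_log = np.log(gross)   # log returns, same as np.diff(np.log(prices), axis=0)
print(rets.shape)
```

Note that the return matrices have one row fewer than `prices`, because the first day has no previous price.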

Calculate matrices of cumulative returns for the assets, named `rets_cum_log` and `rets_cum`. Recall that the cumulative return from day $t$ to day $t+h$ is simply

 - log-return, $\tilde r_{t,t+h} = \log \left( \frac{P_{t+h}}{P_t} \right)$
 - level-return, $r_{t,t+h} = \frac{P_{t+h}}{P_t} - 1$.
 
Note that you may find the Numpy function `numpy.matlib.repmat` helpful here (it must be imported explicitly with `import numpy.matlib`; `numpy.tile` does the same job). It builds an array in which every row equals $P_t$, which you can then use as the denominator in element-wise array arithmetic, with no need for loops. You could also do this with matrix multiplication involving a vector of 1's, or let NumPy broadcasting stretch the single row for you.
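One way to build the cumulative returns, assuming the starting day $t$ is fixed at the first row so every entry is measured from $P_0$. The sketch builds the repeated denominator with `numpy.tile` (for a 1-D row this yields the same array as `numpy.matlib.repmat`), then checks that plain broadcasting gives the same answer. The five `head()` rows stand in for `prices`.

```python
import numpy as np

# Stand-in for `prices`: the five rows printed by data_port.head() above
prices = np.array([
    [1638.17, 87.26, 2.7617],
    [1632.97, 87.50, 2.7839],
    [1639.77, 87.76, 2.8576],
    [1653.08, 87.78, 2.8966],
    [1655.08, 87.94, 2.9937],
])
Nt = prices.shape[0]

# Denominator: (Nt-1) x Nk array whose every row is the first-day price P_0
P0 = np.tile(prices[0, :], (Nt - 1, 1))

rets_cum = prices[1:, :] / P0 - 1           # level: P_{t+h}/P_0 - 1
rets_cum_log = np.log(prices[1:, :] / P0)   # log: log(P_{t+h}/P_0)

# Broadcasting stretches the 1-D first row across all rows automatically
assert np.allclose(rets_cum, prices[1:, :] / prices[0, :] - 1)
```

On the full data this should give a `499 x 3` array, the shape referred to in the Extra section.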

# Calculations

Simply for practice, let's make the following calculations. On how many days do the S&P 500 log-return and level-return differ by more than 5 basis points (0.05%)? That is, count the days on which
$$
| \tilde r_t - r_t | > .0005
$$
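A sketch of the count on the five-row stand-in for `prices`. Since $\log(1+r) - r \approx -r^2/2$ for small $r$, the two definitions differ by more than 5 basis points only on days with moves of roughly 3% or more; none of the four stand-in return days qualifies, so this prints 0.

```python
import numpy as np

# Stand-in for `prices`: the five rows printed by data_port.head() above
prices = np.array([
    [1638.17, 87.26, 2.7617],
    [1632.97, 87.50, 2.7839],
    [1639.77, 87.76, 2.8576],
    [1653.08, 87.78, 2.8966],
    [1655.08, 87.94, 2.9937],
])

gross = prices[1:, :] / prices[:-1, :]
rets = gross - 1
rets_log = np.log(gross)

# S&P 500 is column 0; flag days where the two returns differ by > 5 bp
big_gap = np.abs(rets_log[:, 0] - rets[:, 0]) > 0.0005
n_days = int(big_gap.sum())
print(n_days)
```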

For which asset is the difference between the total cumulative level-return and the total cumulative log-return biggest? That is, compute the absolute value of the difference between the log-return and the level-return at each time period, then sum these differences over all time periods. Do this for each asset and see which asset has the largest total.
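The per-asset totals come from a column-wise sum. Because each daily gap grows roughly with the square of that day's return, the asset with the largest daily moves tends to dominate. On the five stand-in rows from `head()` that is the Treasury series, which moves more than 3% on one day; the asset names in the list below are labels chosen for this sketch.

```python
import numpy as np

# Stand-in for `prices`: the five rows printed by data_port.head() above
prices = np.array([
    [1638.17, 87.26, 2.7617],
    [1632.97, 87.50, 2.7839],
    [1639.77, 87.76, 2.8576],
    [1653.08, 87.78, 2.8966],
    [1655.08, 87.94, 2.9937],
])

gross = prices[1:, :] / prices[:-1, :]
rets = gross - 1
rets_log = np.log(gross)

# Sum of |log return - level return| over all days, one total per asset
gap_total = np.abs(rets_log - rets).sum(axis=0)

names = ["sp500", "oil", "bonds"]
biggest = names[int(np.argmax(gap_total))]
print(gap_total, biggest)
```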

Suppose an investor puts weights of 50%, 30%, and 20% in the S&P 500, oil, and bonds, respectively. Calculate the history of portfolio returns (level-returns) and call this vector `rets_port`.
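Level returns aggregate linearly across assets, so each day's portfolio return is a weighted sum of the asset returns, i.e. a matrix-vector product. This sketch assumes the 50/30/20 weights are restored every day (a constant-mix portfolio); under buy-and-hold the weights would drift with prices and need a different calculation. The five `head()` rows again stand in for `prices`.

```python
import numpy as np

# Stand-in for `prices`: the five rows printed by data_port.head() above
prices = np.array([
    [1638.17, 87.26, 2.7617],
    [1632.97, 87.50, 2.7839],
    [1639.77, 87.76, 2.8576],
    [1653.08, 87.78, 2.8966],
    [1655.08, 87.94, 2.9937],
])
rets = prices[1:, :] / prices[:-1, :] - 1

# Weights on S&P 500, oil, bonds; they sum to 1
w = np.array([0.5, 0.3, 0.2])

# (Nt-1) x Nk  @  Nk  ->  one portfolio return per day
rets_port = rets @ w
print(rets_port)
```

Log returns do not combine across assets this way, which is why the portfolio is built from level returns.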

What percentage of days does the portfolio have a positive return?
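The mean of a boolean mask is the fraction of `True` entries, so the percentage is a one-liner. On the four stand-in return days (from the `head()` rows) every portfolio return happens to be positive, so this prints 100.0%; the full data will give its own figure.

```python
import numpy as np

# Stand-in for `prices`: the five rows printed by data_port.head() above
prices = np.array([
    [1638.17, 87.26, 2.7617],
    [1632.97, 87.50, 2.7839],
    [1639.77, 87.76, 2.8576],
    [1653.08, 87.78, 2.8966],
    [1655.08, 87.94, 2.9937],
])
rets = prices[1:, :] / prices[:-1, :] - 1
rets_port = rets @ np.array([0.5, 0.3, 0.2])

# Mean of a boolean array = fraction of days with a positive return
pct_positive = 100 * np.mean(rets_port > 0)
print(f"{pct_positive:.1f}% of days positive")
```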

Calculate the history of cumulative (level-)returns of the portfolio and call it `rets_cum_port`. Compute it by starting from the portfolio return series and applying `numpy.cumprod`.
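Compounding gross returns gives the cumulative level return, $1 + r_{0,t} = \prod_{s < t} (1 + r_s)$. A sketch with `numpy.cumprod` on the five-row stand-in for `prices`, including a check that the same recipe applied to a single asset telescopes to $P_t / P_0 - 1$:

```python
import numpy as np

# Stand-in for `prices`: the five rows printed by data_port.head() above
prices = np.array([
    [1638.17, 87.26, 2.7617],
    [1632.97, 87.50, 2.7839],
    [1639.77, 87.76, 2.8576],
    [1653.08, 87.78, 2.8966],
    [1655.08, 87.94, 2.9937],
])
rets = prices[1:, :] / prices[:-1, :] - 1
rets_port = rets @ np.array([0.5, 0.3, 0.2])

# Running product of gross returns, minus 1, gives cumulative level returns
rets_cum_port = np.cumprod(1 + rets_port) - 1

# Sanity check on one asset: compounding daily returns recovers P_t/P_0 - 1
sp_cum = np.cumprod(1 + rets[:, 0]) - 1
assert np.allclose(sp_cum, prices[1:, 0] / prices[0, 0] - 1)
```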

Calculate the cumulative log-returns of the three separate assets using `numpy.cumsum`.
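Log returns add over time, so a running sum down each column gives the cumulative log returns directly. The check in this sketch (stand-in data from `head()`) confirms the result equals $\log(P_t / P_0)$.

```python
import numpy as np

# Stand-in for `prices`: the five rows printed by data_port.head() above
prices = np.array([
    [1638.17, 87.26, 2.7617],
    [1632.97, 87.50, 2.7839],
    [1639.77, 87.76, 2.8576],
    [1653.08, 87.78, 2.8966],
    [1655.08, 87.94, 2.9937],
])
rets_log = np.diff(np.log(prices), axis=0)

# axis=0: accumulate down the days, separately for each asset
rets_cum_log = np.cumsum(rets_log, axis=0)

assert np.allclose(rets_cum_log, np.log(prices[1:, :] / prices[0, :]))
```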

# Plots

Create a plot of the price history of the S&P 500.
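A minimal pyplot sketch. In the notebook `sp500` would be the full price column from the Data Manipulation step; here it is the first column of the five `head()` rows, used as a stand-in. The axis labels are illustrative choices.

```python
import numpy as np
from matplotlib import pyplot as plt

# Stand-in for `prices`: the five rows printed by data_port.head() above
prices = np.array([
    [1638.17, 87.26, 2.7617],
    [1632.97, 87.50, 2.7839],
    [1639.77, 87.76, 2.8576],
    [1653.08, 87.78, 2.8966],
    [1655.08, 87.94, 2.9937],
])
sp500 = prices[:, 0]

fig = plt.figure()
plt.plot(sp500)                 # x defaults to the row index 0, 1, 2, ...
plt.xlabel("Trading day")
plt.ylabel("Index level")
plt.title("S&P 500 price history")
```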

In a separate figure, create 4 sub-plots of cumulative returns. That is, plot the history of cumulative returns for each of the 3 assets as well as the portfolio. Use pyplot's `subplot` function.
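A sketch of the 2 x 2 layout with `plt.subplot`, looping over the four series. The cumulative returns are rebuilt from the five `head()` rows (a stand-in for `prices`) so the snippet runs on its own; the panel titles and figure size are illustrative.

```python
import numpy as np
from matplotlib import pyplot as plt

# Stand-in for `prices`: the five rows printed by data_port.head() above
prices = np.array([
    [1638.17, 87.26, 2.7617],
    [1632.97, 87.50, 2.7839],
    [1639.77, 87.76, 2.8576],
    [1653.08, 87.78, 2.8966],
    [1655.08, 87.94, 2.9937],
])
rets = prices[1:, :] / prices[:-1, :] - 1
rets_cum = prices[1:, :] / prices[0, :] - 1
rets_cum_port = np.cumprod(1 + rets @ np.array([0.5, 0.3, 0.2])) - 1

series = [rets_cum[:, 0], rets_cum[:, 1], rets_cum[:, 2], rets_cum_port]
titles = ["S&P 500", "Crude oil", "10-year Treasury", "Portfolio"]

fig = plt.figure(figsize=(10, 6))
for i, (y, title) in enumerate(zip(series, titles), start=1):
    plt.subplot(2, 2, i)   # 2 rows x 2 columns, panel i (1-based)
    plt.plot(y)
    plt.title(title)
plt.tight_layout()
```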

To verify that the crude oil panel looks correct, you can compare it with the historical crude oil prices here: http://www.macrotrends.net/1369/crude-oil-price-history-chart

Make the same plot as above again, but this time make sure that the y-axis is the same on each subplot. You can do this manually using pyplot's `ylim` (note the corresponding `xlim` function) or, better, using pyplot's shared-axis feature shown here: https://matplotlib.org/gallery/subplots_axes_and_figures/shared_axis_demo.html#sphx-glr-gallery-subplots-axes-and-figures-shared-axis-demo-py
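With the shared-axis approach, `plt.subplots(..., sharey=True)` links the y-axis of all four panels, so autoscaling picks one common range and no manual `ylim` call is needed. A sketch on the same five-row stand-in for `prices`:

```python
import numpy as np
from matplotlib import pyplot as plt

# Stand-in for `prices`: the five rows printed by data_port.head() above
prices = np.array([
    [1638.17, 87.26, 2.7617],
    [1632.97, 87.50, 2.7839],
    [1639.77, 87.76, 2.8576],
    [1653.08, 87.78, 2.8966],
    [1655.08, 87.94, 2.9937],
])
rets = prices[1:, :] / prices[:-1, :] - 1
rets_cum = prices[1:, :] / prices[0, :] - 1
rets_cum_port = np.cumprod(1 + rets @ np.array([0.5, 0.3, 0.2])) - 1

series = [rets_cum[:, 0], rets_cum[:, 1], rets_cum[:, 2], rets_cum_port]
titles = ["S&P 500", "Crude oil", "10-year Treasury", "Portfolio"]

# sharey=True: every panel uses the same y-axis limits
fig, axes = plt.subplots(2, 2, sharey=True, figsize=(10, 6))
for ax, y, title in zip(axes.flat, series, titles):
    ax.plot(y)
    ax.set_title(title)
fig.tight_layout()
```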

Finally, create another figure that plots both the S&P 500 cumulative return history and the portfolio cumulative return history. Give each line a label and include a legend. To do this, look up the pyplot `legend` function: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html
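Passing `label=` to each `plt.plot` call lets `plt.legend()` build the legend without repeating the names. A sketch on the five-row stand-in for `prices`; the label strings are illustrative.

```python
import numpy as np
from matplotlib import pyplot as plt

# Stand-in for `prices`: the five rows printed by data_port.head() above
prices = np.array([
    [1638.17, 87.26, 2.7617],
    [1632.97, 87.50, 2.7839],
    [1639.77, 87.76, 2.8576],
    [1653.08, 87.78, 2.8966],
    [1655.08, 87.94, 2.9937],
])
rets = prices[1:, :] / prices[:-1, :] - 1
rets_cum = prices[1:, :] / prices[0, :] - 1
rets_cum_port = np.cumprod(1 + rets @ np.array([0.5, 0.3, 0.2])) - 1

fig = plt.figure()
plt.plot(rets_cum[:, 0], label="S&P 500")
plt.plot(rets_cum_port, label="Portfolio (50/30/20)")
plt.xlabel("Trading day")
plt.ylabel("Cumulative level return")
plt.legend()   # collects the label= strings from the lines above
```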

# Extra

- Try plotting the dates against the cumulative returns. Use `plt.plot(x,y)` where `x` is the dates and `y` is the returns data (a `499 x 3` array). For the dates data, use `data_port['date']` or `data_port.date`, dropping the first date so that `x` also has 499 entries.

- Note that it will be helpful to rotate the ticks on the x-axis. You can do this using the function shown here: https://stackoverflow.com/a/37708190/1411791

- Add a legend. Specify the legend with `plt.legend(['label1', 'label2', 'label3'])`
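A sketch combining the three tips above. It assumes the dates are a NumPy `datetime64` array, like the one returned by `data_port.iloc[:,0].values` earlier; here they are the five dates shown by `head()`, paired with the matching stand-in prices. The first date is dropped so that `x` lines up with the return rows, and `plt.xticks(rotation=45)` is one way to tilt the tick labels.

```python
import numpy as np
from matplotlib import pyplot as plt

# Stand-ins: the dates and prices from the five rows printed by data_port.head()
dates = np.array(["2013-08-29", "2013-08-30", "2013-09-03",
                  "2013-09-04", "2013-09-05"], dtype="datetime64[ns]")
prices = np.array([
    [1638.17, 87.26, 2.7617],
    [1632.97, 87.50, 2.7839],
    [1639.77, 87.76, 2.8576],
    [1653.08, 87.78, 2.8966],
    [1655.08, 87.94, 2.9937],
])
rets_cum = prices[1:, :] / prices[0, :] - 1

fig = plt.figure()
plt.plot(dates[1:], rets_cum)    # one line per column of rets_cum
plt.xticks(rotation=45)          # tilt date labels so they don't overlap
plt.legend(["S&P 500", "Crude oil", "10-year Treasury"])
plt.ylabel("Cumulative level return")
plt.tight_layout()
```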

Try using `xlabel`, `ylabel`, and `title`. See the `matplotlib.pyplot` documentation for more.

Try calculating basic statistics of the level returns: the mean, standard deviation, skewness, and correlations. The mean, standard deviation, and correlations can be calculated in NumPy; the skewness can be calculated using `scipy.stats.skew`.
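A NumPy-only sketch of the four statistics on the five-row stand-in for `prices`. Two choices are worth flagging: `np.std` defaults to `ddof=0` (population), and the sketch uses `ddof=1` for the sample estimate; and the skewness is computed from central moments directly, which should match `scipy.stats.skew(rets, axis=0)` with its default `bias=True`.

```python
import numpy as np

# Stand-in for `prices`: the five rows printed by data_port.head() above
prices = np.array([
    [1638.17, 87.26, 2.7617],
    [1632.97, 87.50, 2.7839],
    [1639.77, 87.76, 2.8576],
    [1653.08, 87.78, 2.8966],
    [1655.08, 87.94, 2.9937],
])
rets = prices[1:, :] / prices[:-1, :] - 1

mean = rets.mean(axis=0)                 # one mean per asset
std = rets.std(axis=0, ddof=1)           # sample standard deviation
corr = np.corrcoef(rets, rowvar=False)   # Nk x Nk; columns are the variables

# Skewness = third central moment / (second central moment)^1.5
dev = rets - mean
skew = (dev ** 3).mean(axis=0) / (dev ** 2).mean(axis=0) ** 1.5
print(mean, std, skew, corr, sep="\n")
```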

The return matrices have one less row than the price matrix. Remedy this by adding a first row of NaN values to the return matrices. Use numpy's `vstack` function.
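A sketch of the padding on the five-row stand-in for `prices`: a single row of NaN stands in for the undefined first-day return, so row $t$ of the returns lines up with row $t$ of `prices` and the dates. `rets_padded` is a hypothetical name; in the notebook you could simply reassign `rets`.

```python
import numpy as np

# Stand-in for `prices`: the five rows printed by data_port.head() above
prices = np.array([
    [1638.17, 87.26, 2.7617],
    [1632.97, 87.50, 2.7839],
    [1639.77, 87.76, 2.8576],
    [1653.08, 87.78, 2.8966],
    [1655.08, 87.94, 2.9937],
])
rets = prices[1:, :] / prices[:-1, :] - 1

# One NaN row on top restores the original number of rows
nan_row = np.full((1, rets.shape[1]), np.nan)
rets_padded = np.vstack([nan_row, rets])
```

Keep in mind that NaN propagates through `np.mean`, `np.cumprod`, and similar functions; `np.nanmean` and friends skip it.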

Import the `seaborn` package and run `seaborn.set()` (an alias of `seaborn.set_theme()` in newer seaborn releases) to change the default matplotlib style. Regenerate some of the plots above to see how the style changes.