diff --git a/exams/Midterm.ipynb b/exams/Midterm.ipynb new file mode 100644 index 0000000..3de5ed9 --- /dev/null +++ b/exams/Midterm.ipynb @@ -0,0 +1,638 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Midterm: Coding Portion\n", + "\n", + "There are two parts below. Each question (marked Q1, Q2, etc.) is worth 2 points." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "import scipy.optimize\n", + "from matplotlib import pyplot as plt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part A: Revisiting Mean-Variance Efficient Portfolios\n", + "\n", + "Suppose that $\\mu$ is a column vector of the mean returns of each asset and that $\\Sigma$ is the associated variance-covariance matrix. Now, suppose that we create a column vector of weights $w$ that tells us the fraction of our wealth that we invest in each asset. Then, the formulas in matrix notation for the mean and variance of the resulting portfolio are\n", + "$$\n", + "\\mu_p = \\mu' w\n", + "$$\n", + "and\n", + "$$\n", + "\\sigma_p^2 = w' \\Sigma w\n", + "$$\n", + "respectively.\n", + "\n", + "I have provided code below to download data from the file `dataAssets.csv`. The code loads a matrix of asset prices. In the matrix of prices, each row corresponds to a day. The columns correspond, in order, to the S&P 500 index, USD index, crude oil index, HYG index, and U.S. 10-yr Treasury index. The code also converts the prices to returns (in percent) for you."
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "url = 'https://raw.githubusercontent.com/jmbejara/comp-econ-sp18/master/HW/hw-01/dataAssets.csv'\n", + "data_assets = pd.read_csv(url, parse_dates=['date'])\n", + "prices = data_assets.iloc[:,1:].values\n", + "dates = data_assets.iloc[:,0].values\n", + "rets = (prices[1:,:]/prices[:-1,:] - 1) * 100" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 0.38854818, -0.38136479, -0.37572583,  0.21891419, -0.36173324],\n", + "       [-0.35699979,  0.18400287,  0.51428571,  0.0109218 , -0.41711726],\n", + "       [-0.11528775,  0.25515864,  0.27288232, -0.08736486,  1.63667391],\n", + "       ...,\n", + "       [ 2.50830202, -0.25356182, -0.23887079,  0.54676594,  2.74901149],\n", + "       [-1.38975632,  0.0281294 , -3.89638659, -0.12727062,  0.81546637],\n", + "       [ 0.52779551, -0.57909428,  4.00906002,  0.24328082,  0.97246206]])" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "rets" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In HW1, we solved a constrained minimization problem of the following form:\n", + "\n", + "\\begin{align*}\n", + "\\min_{w} \\quad & w' \\Sigma w \\\\\n", + "\\text{s.t.} \\quad & w' \\mu = \\mu_p \\\\\n", + "& w' \\mathbb 1 = 1 \n", + "\\end{align*}\n", + "where $w$ is a $k \\times 1$ vector of portfolio weights corresponding to the weights put on $k$ assets, $\\mu$ is a $k \\times 1$ vector of expected (mean) returns of each asset, $\\Sigma$ is the $k \\times k$ variance-covariance matrix of the returns, $\\mathbb 1$ is a $k \\times 1$ conforming vector of ones, and $\\mu_p$ is an arbitrary scalar number. In HW 1, we used `scipy.optimize.fmin_slsqp` to numerically solve this problem. I have recreated the solution below.
(Note that we have $k=5$ assets.)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# MY CODE PROVIDED FOR YOU\n", + "mu = np.mean(rets, axis=0)\n", + "Sigma = np.cov(rets, rowvar=False)\n", + "\n", + "N = 100\n", + "mean_grid = np.linspace(0.0, 0.02, N)\n", + "var_grid = np.zeros(N)\n", + "\n", + "for i in range(N):\n", + "    mu_target = mean_grid[i]\n", + "    var_func = lambda w: w.transpose() @ Sigma @ w\n", + "    con_mean = lambda w: w @ mu - mu_target\n", + "    con_port = lambda w: w @ np.ones((5,1)) - 1\n", + "    eqcons = [con_mean, con_port]\n", + "    w0 = np.array([.2, .2, .2, .2, .2])\n", + "    wstar = scipy.optimize.fmin_slsqp(var_func, w0, eqcons=eqcons, acc=1e-09, \n", + "                                      iprint=0)\n", + "    var_grid[i] = var_func(wstar)\n", + "\n", + "plt.plot(var_grid, mean_grid, label='Unconstrained')\n", + "plt.xlabel('Portfolio Return Variance')\n", + "plt.ylabel('Mean Portfolio Return')\n", + "plt.title('Mean-Variance Efficient Frontier')\n", + "plt.legend();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This problem, as it turns out, can be solved analytically. \n", + "The solution is given by\n", + "$$\n", + "w^* = \\Sigma^{-1} \n", + "\\begin{bmatrix}\n", + "\\mu & \\mathbb 1\n", + "\\end{bmatrix}\n", + "A^{-1}\n", + "\\begin{bmatrix}\n", + "\\mu_p \\\\\n", + "1\n", + "\\end{bmatrix},\n", + "$$\n", + "where $A$ is a matrix defined by\n", + "$$\n", + "A =\n", + "\\begin{bmatrix}\n", + "\\mu & \\mathbb 1\n", + "\\end{bmatrix}^\\intercal\n", + "\\Sigma^{-1}\n", + "\\begin{bmatrix}\n", + "\\mu & \\mathbb 1\n", + "\\end{bmatrix}.\n", + "$$\n", + "Note that\n", + "$\\begin{bmatrix}\n", + "\\mu & \\mathbb 1\n", + "\\end{bmatrix}$\n", + "is a $k \\times 2$ matrix. The two columns are stacked together.\n", + "\n", + "### Q1. Analytical solution of MVE Frontier\n", + "\n", + "Reconstruct the graph above of the Mean-Variance Efficient Frontier.
However, this time, instead of using the numerical solver to compute the efficient frontier, use the analytical solution I provided. For this exercise, only keep track of the resulting minimal variance. In the above numerical solution, the result is stored in `var_grid`. Your solution should be in the form of a numpy array called `var_grid_exact`. \n", + "\n", + "**Hint:** Compute the inverse of a matrix as follows:\n", + "```python\n", + "import numpy.linalg as la\n", + "a = np.array([[1,2],[3,4]])\n", + "la.inv(a)\n", + "```\n", + "gives\n", + "```\n", + "array([[-2. ,  1. ],\n", + "       [ 1.5, -0.5]])\n", + "```\n", + "\n", + "Also, to make stacking rows and columns easier, consider the following Numpy functions. It doesn't matter which you choose to use---you don't even need to use these suggested functions if you don't want to. I provide these for convenience.\n", + "\n", + " - `np.hstack` https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.hstack.html\n", + " - `np.vstack` https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.vstack.html\n", + " - `np.c_` https://docs.scipy.org/doc/numpy/reference/generated/numpy.c_.html\n", + " - `np.r_` https://docs.scipy.org/doc/numpy/reference/generated/numpy.r_.html#numpy.r_\n", + " \n", + "Also, the following starter code may be helpful:\n", + "```python\n", + "mu = np.mean(rets, axis=0)\n", + "Sigma = np.cov(rets, rowvar=False)\n", + "\n", + "N = 100\n", + "mean_grid = np.linspace(0.0, 0.02, N)\n", + "var_grid_exact = np.zeros(N)\n", + "\n", + "YOUR_CODE_HERE\n", + "\n", + "for i in range(N):\n", + "    mu_target = mean_grid[i]\n", + "    wstar = YOUR_CODE_HERE\n", + "    var_grid_exact[i] = var_func(wstar)\n", + "    \n", + "\n", + "\n", + "plt.plot(var_grid_exact, mean_grid, label=\"Analytical Solution\")\n", + "plt.xlabel('Portfolio Return Variance')\n", + "plt.ylabel('Mean Portfolio Return')\n", + "plt.title('Mean-Variance Efficient Frontier')\n", + "plt.legend();\n", + "```" + ] + }, + { + "cell_type": 
"code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Q2. Compare Numerical and Analytical Solution\n", + "\n", + "Make sure your result matches the result you got from the solver by using `np.allclose`. Use `np.allclose` to compare `var_grid` and `var_grid_exact`. The result should be `True`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Q3. Bounding the Portfolio Weights\n", + "\n", + "Suppose we take the problem from before and add a new set of constraints. Let's add bounds on the portfolio weights such that no weight may be less than zero. We can interpret this as a constraint that disallows \"short-selling.\" In particular, solve the problem\n", + "\n", + "\\begin{align*}\n", + "\\min_{w} \\quad & w' \\Sigma w \\\\\n", + "\\text{s.t.} \\quad & w' \\mu = \\mu_p \\\\\n", + "& w' \\mathbb 1 = 1 \\\\\n", + "& w_i \\geq 0 \\quad \\text{ for all } i=1,..,k.\n", + "\\end{align*}\n", + "\n", + "Use `scipy.optimize.fmin_slsqp` again. 
Check the documentation for information on how to include \"bound constraints.\" https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_slsqp.html\n", + "\n", + "\n", + "**Hint**: Edit the code below to incorporate the constraints:\n", + "\n", + "```python\n", + "mu = np.mean(rets, axis=0)\n", + "Sigma = np.cov(rets, rowvar=False)\n", + "\n", + "N = 100\n", + "mean_grid = np.linspace(0.0, .02, N)\n", + "var_grid = np.zeros(N)\n", + "var_grid_no_short = np.zeros(N)\n", + "\n", + "for i in range(N):\n", + "    mu_target = mean_grid[i]\n", + "    var_func = lambda w: w.transpose() @ Sigma @ w\n", + "    con_mean = lambda w: w @ mu - mu_target\n", + "    con_port = lambda w: w @ np.ones((5,1)) - 1\n", + "    eqcons = [con_mean, con_port]\n", + "    w0 = np.array([.2, .2, .2, .2, .2])\n", + "    wstar = scipy.optimize.fmin_slsqp(var_func, w0, eqcons=eqcons, acc=1e-09, \n", + "                                      iprint=0)\n", + "    var_grid[i] = var_func(wstar)\n", + "    wstar = scipy.optimize.fmin_slsqp(var_func, w0, \n", + "                                      eqcons=eqcons,\n", + "                                      bounds=YOUR_CODE_HERE,\n", + "                                      acc=1e-09, \n", + "                                      iprint=0)\n", + "    var_grid_no_short[i] = var_func(wstar)\n", + "\n", + "plt.plot(var_grid, mean_grid, label='Unconstrained')\n", + "plt.plot(var_grid_no_short, mean_grid, label='No-Short Selling')\n", + "plt.xlabel('Portfolio Return Variance')\n", + "plt.ylabel('Mean Portfolio Return')\n", + "plt.title('Mean-Variance Efficient Frontier')\n", + "plt.legend()\n", + "\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part B. Revisiting Euro 2012\n", + "\n", + "This part uses data from the UEFA European Championship in 2012 (Euro 2012). Use the provided code below to load the data and to add two new columns: `passes_quartile` and `shoot_acc_quartile`."
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Team object\n", + "Goals int64\n", + "Shots on target int64\n", + "Shots off target int64\n", + "Shooting Accuracy float64\n", + "% Goals-to-shots object\n", + "Total shots (inc. Blocked) int64\n", + "Hit Woodwork int64\n", + "Penalty goals int64\n", + "Penalties not scored int64\n", + "Headed goals int64\n", + "Passes int64\n", + "Passes completed int64\n", + "Passing Accuracy object\n", + "Touches int64\n", + "Crosses int64\n", + "Dribbles int64\n", + "Corners Taken int64\n", + "Tackles int64\n", + "Clearances int64\n", + "Interceptions int64\n", + "Clearances off line float64\n", + "Clean Sheets int64\n", + "Blocks int64\n", + "Goals conceded int64\n", + "Saves made int64\n", + "Saves-to-shots ratio object\n", + "Fouls Won int64\n", + "Fouls Conceded int64\n", + "Offsides int64\n", + "Yellow Cards int64\n", + "Red Cards int64\n", + "Subs on int64\n", + "Subs off int64\n", + "Players Used int64\n", + "passes_quartile category\n", + "shoot_acc_quartile category\n", + "dtype: object" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "euro12 = pd.read_csv('https://raw.githubusercontent.com/jokecamp/FootballData/master/UEFA_European_Championship/Euro%202012/Euro%202012%20stats%20TEAM.csv', sep=',')\n", + "euro12['Shooting Accuracy'] = euro12['Shooting Accuracy'].str[:-1].astype(float)\n", + "quartiles = [0, .25, .5, .75, 1.]\n", + "# labels = ['Q1', 'Q2', 'Q3', 'Q4']\n", + "labels = None\n", + "euro12['passes_quartile'] = pd.qcut(euro12['Passes'], q=quartiles, labels=labels)\n", + "euro12['shoot_acc_quartile'] = pd.qcut(euro12['Shooting Accuracy'], q=quartiles, labels=labels)\n", + "euro12.dtypes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Q4. Which team had the most \"Passes\"?\n", + "\n", + "Just tell me the team.
No need to get fancy here." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Q5. Which team(s) had the most \"Headed goals\"?\n", + "\n", + "Just tell me the team. No need to get fancy here." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Q6. Group the data ...\n", + "\n", + "Group the data by `passes_quartile`. What is the average number of `Goals` made in each quartile? **Do this using `pivot_table`.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Q7. Group the data ...\n", + "\n", + "Group the data by `passes_quartile`. What is the average number of `Goals` made in each quartile? **Do this using `groupby`.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Q8. Group the data by `passes_quartile` and `shoot_acc_quartile` ...\n", + "\n", + "Group the data by `passes_quartile` and `shoot_acc_quartile` and, within each group, calculate the average number of `Goals` scored. Your solution should look like the dataframe `dfsol` that I have provided in the code cell below. **Complete this exercise using `pivot_table`.**" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n",
+ "    <tr>\n", + "      <th>shoot_acc_quartile</th>\n", + "      <th>(21.198999999999998, 33.4]</th>\n", + "      <th>(33.4, 40.65]</th>\n", + "      <th>(40.65, 48.35]</th>\n", + "      <th>(48.35, 55.9]</th>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>passes_quartile</th>\n", + "      <th></th>\n", + "      <th></th>\n", + "      <th></th>\n", + "      <th></th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr>\n", + "      <th>(850.999, 1190.75]</th>\n", + "      <td>5.0</td>\n", + "      <td>1.5</td>\n", + "      <td>NaN</td>\n", + "      <td>4.0</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>(1190.75, 1522.0]</th>\n", + "      <td>2.0</td>\n", + "      <td>NaN</td>\n", + "      <td>5.0</td>\n", + "      <td>4.5</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>(1522.0, 1934.75]</th>\n", + "      <td>3.5</td>\n", + "      <td>6.0</td>\n", + "      <td>4.0</td>\n", + "      <td>NaN</td>\n", + "    </tr>\n",
+ "    <tr>\n", + "      <th>(1934.75, 4317.0]</th>\n", + "      <td>NaN</td>\n", + "      <td>3.0</td>\n", + "      <td>8.0</td>\n", + "      <td>12.0</td>\n", + "    </tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + "shoot_acc_quartile (21.198999999999998, 33.4] (33.4, 40.65] (40.65, 48.35] \\\n", + "passes_quartile \n", + "(850.999, 1190.75] 5.0 1.5 NaN \n", + "(1190.75, 1522.0] 2.0 NaN 5.0 \n", + "(1522.0, 1934.75] 3.5 6.0 4.0 \n", + "(1934.75, 4317.0] NaN 3.0 8.0 \n", + "\n", + "shoot_acc_quartile (48.35, 55.9] \n", + "passes_quartile \n", + "(850.999, 1190.75] 4.0 \n", + "(1190.75, 1522.0] 4.5 \n", + "(1522.0, 1934.75] NaN \n", + "(1934.75, 4317.0] 12.0 " + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Your result should look like `dfsol` below.\n", + "from io import StringIO\n", + "sol = '''\n", + "passes_quartile,\"(21.198999999999998, 33.4]\",\"(33.4, 40.65]\",\"(40.65, 48.35]\",\"(48.35, 55.9]\"\n", + "\"(850.999, 1190.75]\",5.0,1.5,,4.0\n", + "\"(1190.75, 1522.0]\",2.0,,5.0,4.5\n", + "\"(1522.0, 1934.75]\",3.5,6.0,4.0,\n", + "\"(1934.75, 4317.0]\",,3.0,8.0,12.0\n", + "'''\n", + "dfsol = pd.read_csv(StringIO(sol), index_col=[0])\n", + "dfsol.columns.name = 'shoot_acc_quartile'\n", + "dfsol" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Q9. Group the data by `passes_quartile` and `shoot_acc_quartile` ...\n", + "\n", + "Group the data by `passes_quartile` and `shoot_acc_quartile` and, within each group, calculate the average number of `Goals` scored. Your solution should look the same as in the previous question. **Complete this exercise using `groupby`.**\n", + "\n", + "HINT: You may need to use `unstack()`."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + }, + "varInspector": { + "cols": { + "lenName": 16, + "lenType": 16, + "lenVar": 40 + }, + "kernels_config": { + "python": { + "delete_cmd_postfix": "", + "delete_cmd_prefix": "del ", + "library": "var_list.py", + "varRefreshCmd": "print(var_dic_list())" + }, + "r": { + "delete_cmd_postfix": ") ", + "delete_cmd_prefix": "rm(", + "library": "var_list.r", + "varRefreshCmd": "cat(var_dic_list()) " + } + }, + "types_to_exclude": [ + "module", + "function", + "builtin_function_or_method", + "instance", + "_Feature" + ], + "window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}
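
Note on the closed-form weights in Part A: the formula $w^* = \Sigma^{-1} [\mu \; \mathbb 1] A^{-1} [\mu_p \; 1]'$ can be sanity-checked on a small made-up problem before it is applied to the exam data. The sketch below is only an illustration under stated assumptions: the three-asset `mu` and `Sigma` are hypothetical numbers (not taken from `dataAssets.csv`), the helper name `mve_weights` is invented here, and it uses `np.linalg.solve` in place of the explicit `la.inv` calls suggested in the hint, which should give the same result for a well-conditioned `Sigma`.

```python
import numpy as np

# Hypothetical inputs for illustration only (NOT the exam data).
mu = np.array([0.010, 0.015, 0.020])           # assumed mean returns
Sigma = np.array([[0.040, 0.006, 0.000],
                  [0.006, 0.090, 0.012],
                  [0.000, 0.012, 0.160]])      # assumed covariance matrix


def mve_weights(mu, Sigma, mu_p):
    """Closed-form minimum-variance weights for a target mean mu_p.

    Implements w* = Sigma^{-1} M A^{-1} [mu_p, 1]' with M = [mu, 1]
    and A = M' Sigma^{-1} M. (Helper name is hypothetical.)
    """
    k = mu.shape[0]
    M = np.column_stack([mu, np.ones(k)])      # k x 2 matrix [mu, 1]
    Sigma_inv_M = np.linalg.solve(Sigma, M)    # Sigma^{-1} M without forming the inverse
    A = M.T @ Sigma_inv_M                      # 2 x 2 matrix A
    b = np.array([mu_p, 1.0])                  # right-hand side [mu_p, 1]'
    return Sigma_inv_M @ np.linalg.solve(A, b)


w_star = mve_weights(mu, Sigma, mu_p=0.015)
port_mean = w_star @ mu                        # should equal the target mean
port_var = w_star @ Sigma @ w_star             # minimal variance at that mean

print("weights:", w_star)
print("mean:", port_mean, "variance:", port_var)
```

Both constraints can be checked directly: `w_star.sum()` should be 1 and `w_star @ mu` should equal the target mean, up to floating-point error. Looping the same call over a grid of target means would trace out a frontier for these made-up inputs.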