# Copyright and License

In [None]:
# This notebook was authored by Kevin P. Murphy (murphyk@gmail.com) and Mahmoud Soliman (mjs@aucegypt.edu)

[![GitHub](https://img.shields.io/github/license/probml/pyprobml)](https://github.com/probml/pml-book/blob/main/LICENSE/)

<a href="https://colab.research.google.com/github/probml/pyprobml/blob/master/notebooks/figures//chapter11_figures.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Cloning the pyprobml repo

In [None]:
!git clone https://github.com/probml/pyprobml 
%cd pyprobml/scripts

## Figure 11.1:

  Polynomials of degrees 1 and 2 fit to 21 data points. 

In [None]:
%run ./linreg_poly_vs_degree.py
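
For readers who want to see the fit itself rather than run the script, here is a minimal sketch of least-squares polynomial regression with NumPy. The synthetic quadratic data, the noise level, and the helper name `fit_poly` are assumptions made for illustration; they are not taken from `linreg_poly_vs_degree.py`.

```python
import numpy as np

# Synthetic stand-in data: 21 points from a noisy quadratic (an assumption,
# not the dataset used by linreg_poly_vs_degree.py).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 20.0, 21)
y = -1.5 * x + (1.0 / 9.0) * x**2 + rng.normal(scale=2.0, size=x.shape)


def fit_poly(x, y, degree):
    """Least-squares polynomial fit; returns the weights and the training MSE."""
    # Columns of Phi are x**0, x**1, ..., x**degree.
    Phi = np.vander(x, degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    mse = np.mean((Phi @ w - y) ** 2)
    return w, mse


w1, mse1 = fit_poly(x, y, 1)
w2, mse2 = fit_poly(x, y, 2)
print(f"degree 1: train MSE = {mse1:.3f}")
print(f"degree 2: train MSE = {mse2:.3f}")
```

Because the degree 1 model is nested inside the degree 2 model, the training error of the quadratic fit can never exceed that of the linear fit.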

## Figure 11.2:

  (a) Contours of the RSS error surface for the example in \cref{fig:linregPolyDegree1}. The blue cross represents the MLE. (b) Corresponding surface plot. 

In [None]:
%run ./linreg_contours_sse_plot.py
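
The surface in this figure can be sketched by evaluating the residual sum of squares on a grid of $(w_0, w_1)$ values. The data below are hypothetical, and the grid ranges and the helper name `rss` are illustrative choices rather than those of `linreg_contours_sse_plot.py`.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 20.0, 21)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.shape)  # hypothetical data

# MLE (ordinary least squares) for the model y ~ w0 + w1 * x.
X = np.column_stack([np.ones_like(x), x])
w_mle, *_ = np.linalg.lstsq(X, y, rcond=None)


def rss(w0, w1):
    """Residual sum of squares; w0 and w1 are scalars or same-shaped arrays."""
    w0 = np.asarray(w0, dtype=float)[..., None]
    w1 = np.asarray(w1, dtype=float)[..., None]
    resid = y - (w0 + w1 * x)
    return np.sum(resid**2, axis=-1)


# Evaluate the RSS on a 61 x 61 grid centered on the MLE.
w0_grid, w1_grid = np.meshgrid(
    np.linspace(w_mle[0] - 3.0, w_mle[0] + 3.0, 61),
    np.linspace(w_mle[1] - 0.5, w_mle[1] + 0.5, 61),
)
surface = rss(w0_grid, w1_grid)
```

Passing `w0_grid`, `w1_grid`, and `surface` to matplotlib's `plt.contour` would draw contours analogous to panel (a). The RSS is a convex quadratic in the weights, so its smallest grid value sits at the MLE.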

## Figure 11.4:

  Regression coefficients over time for the 1d model in \cref{fig:linregPoly2}(a). 

In [None]:
!octave -W linregOnlineDemo.m >> _

## Figure 11.5:

  Residual plots for polynomial regression of degrees 1 and 2 for the functions in \cref{fig:linregPoly2}(a-b). 

In [None]:
%run ./linreg_poly_vs_degree.py

## Figure 11.6:

  Fit vs actual plots for polynomial regression of degrees 1 and 2 for the functions in \cref{fig:linregPoly2}(a-b). 

In [None]:
%run ./linreg_poly_vs_degree.py

## Figure 11.7:

  (a-c) Ridge regression applied to a degree 14 polynomial fit to 21 datapoints. (d) MSE vs strength of regularizer. The degree of regularization increases from left to right, so model complexity decreases from left to right. 

In [None]:
%run ./linreg_poly_ridge.py
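
The effect of the regularizer can be sketched with the closed-form ridge estimate $\hat{\mathbf{w}} = (\boldsymbol{\Phi}^\top \boldsymbol{\Phi} + \lambda \mathbf{I})^{-1} \boldsymbol{\Phi}^\top \mathbf{y}$. The data, the rescaling of the inputs to $[-1, 1]$, the grid of $\lambda$ values, and the decision to penalize the bias term are all simplifying assumptions, not details of `linreg_poly_ridge.py`.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 21)  # inputs rescaled to [-1, 1] for conditioning
y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=x.shape)  # hypothetical data
Phi = np.vander(x, 15, increasing=True)  # degree-14 polynomial features


def ridge_fit(Phi, y, lam):
    """Closed-form ridge estimate; for simplicity the bias is penalized too."""
    D = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)


lambdas = [1e-4, 1e-2, 1.0, 1e2]
norms, train_mse = [], []
for lam in lambdas:
    w = ridge_fit(Phi, y, lam)
    norms.append(np.linalg.norm(w))
    train_mse.append(np.mean((Phi @ w - y) ** 2))

for lam, nrm, mse in zip(lambdas, norms, train_mse):
    print(f"lambda={lam:g}  ||w||={nrm:.3f}  train MSE={mse:.4f}")
```

As $\lambda$ grows, the weight norm shrinks and the training error rises, which is the sense in which model complexity decreases from left to right in the figure.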

## Figure 11.8:

  Geometry of ridge regression. The likelihood is shown as an ellipse, and the prior is shown as a circle centered on the origin. Adapted from Figure 3.15 of \citep{BishopBook}. 

In [None]:
!octave -W geomRidge.m >> _

## Figure 11.9:

  (a) Illustration of robust linear regression. 

In [None]:
!octave -W linregRobustDemoCombined.m >> _

In [None]:
!octave -W huberLossPlot.m >> _
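
The second cell refers to the Huber loss, which is commonly used for robust regression. As a hedged illustration, the sketch below implements the standard definition (quadratic for small residuals, linear for large ones); the function name `huber_loss` and the default threshold `delta=1.5` are assumptions and are not read from `huberLossPlot.m`.

```python
import numpy as np


def huber_loss(r, delta=1.5):
    """Huber loss: r**2 / 2 if |r| <= delta, else delta * |r| - delta**2 / 2."""
    r = np.asarray(r, dtype=float)
    quadratic = 0.5 * r**2
    linear = delta * np.abs(r) - 0.5 * delta**2
    return np.where(np.abs(r) <= delta, quadratic, linear)


residuals = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(huber_loss(residuals))  # large residuals grow linearly, not quadratically
print(0.5 * residuals**2)     # squared-error loss for comparison
```

The two branches agree at $|r| = \delta$, so the loss is continuous, while outliers contribute far less than under squared error.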

## Figure 11.12:

  (a) Profiles of ridge coefficients for the prostate cancer example vs bound $B$ on $\ell_2$ norm of $\mathbf{w}$, so small $B$ (large $\lambda$) is on the left. The vertical line is the value chosen by 5-fold CV using the 1 standard error rule. Adapted from Figure 3.8 of \citep{HastieBook}. 

In [None]:
!octave -W ridgePathProstate.m >> _

In [None]:
!octave -W lassoPathProstate.m >> _

## Figure 11.14:

  (a) Boxplot displaying (absolute value of) prediction errors on the prostate cancer test set for different regression methods. 

In [None]:
%run ./prostateComparison.py

In [None]:
!octave -W sparseSensingDemo.m >> _

## Figure 11.15:

  Illustration of group lasso where the original signal is piecewise Gaussian. (a) Original signal. (b) Vanilla lasso estimate. (c) Group lasso estimate using an $\ell_2$ norm on the blocks. (d) Group lasso estimate using an $\ell_\infty$ norm on the blocks. Adapted from Figures 3-4 of \citep{Wright09}. 

In [None]:
!octave -W groupLassoDemo.m >> _
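
To give a feel for how an $\ell_2$ penalty on blocks produces group sparsity, the sketch below implements block soft thresholding, the proximal operator of the group $\ell_2$ norm. It is only a building block used by proximal solvers, not the algorithm in `groupLassoDemo.m`; the function name, the example vector, and the grouping are hypothetical.

```python
import numpy as np


def group_soft_threshold(w, groups, lam):
    """Shrink each block of w toward zero; blocks with norm <= lam become 0."""
    out = np.zeros_like(w, dtype=float)
    for idx in groups:
        norm = np.linalg.norm(w[idx])
        if norm > lam:
            out[idx] = (1.0 - lam / norm) * w[idx]
    return out


w = np.array([3.0, 4.0, 0.1, -0.2])
groups = [np.array([0, 1]), np.array([2, 3])]  # two blocks of size 2
shrunk = group_soft_threshold(w, groups, lam=1.0)
print(shrunk)
```

The first block has norm 5 and is scaled by 0.8, while the second block has norm below the threshold and is zeroed as a whole. This all-or-nothing behavior per block is what distinguishes group lasso from vanilla lasso.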

## Figure 11.17:

  Sequential Bayesian inference of the parameters of a linear regression model $p(y|\mathbf{x}) = \mathcal{N}(y | w_0 + w_1 x_1, \sigma^2)$. Left column: likelihood function for current data point. Middle column: posterior given first $N$ data points, $p(w_0,w_1|\mathbf{x}_{1:N},y_{1:N},\sigma^2)$. Right column: samples from the current posterior predictive distribution. Row 1: prior distribution ($N=0$). Row 2: after 1 data point. Row 3: after 2 data points. Row 4: after 100 data points. The white cross in columns 1 and 2 represents the true parameter value; we see that the mode of the posterior rapidly converges to this point. The blue circles in column 3 are the observed data points. Adapted from Figure 3.7 of \citep{BishopBook}. 

In [None]:
%run ./linreg_2d_bayes_demo.py
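
The sequential update described in the caption can be sketched with the conjugate Gaussian posterior, accumulating one data point at a time in natural-parameter form. The true weights, the noise level, and the prior variance below are assumed values chosen for illustration and may differ from those in `linreg_2d_bayes_demo.py`.

```python
import numpy as np

rng = np.random.default_rng(3)
w_true = np.array([-0.3, 0.5])  # hypothetical true (w0, w1)
sigma = 0.2                     # noise standard deviation, assumed known
prior_var = 2.0                 # prior is N(0, prior_var * I), an assumption

# Natural parameters of the Gaussian posterior: the precision matrix and the
# precision-weighted mean. Both start at their prior values.
precision = np.eye(2) / prior_var
eta = np.zeros(2)

for n in range(100):
    x = rng.uniform(-1.0, 1.0)
    y = w_true[0] + w_true[1] * x + rng.normal(scale=sigma)
    phi = np.array([1.0, x])  # features [1, x]
    # Each observation adds its likelihood term to the natural parameters.
    precision += np.outer(phi, phi) / sigma**2
    eta += phi * y / sigma**2

post_cov = np.linalg.inv(precision)
post_mean = post_cov @ eta
print("posterior mean:", post_mean)
print("posterior std: ", np.sqrt(np.diag(post_cov)))
```

After 100 points the posterior mean lies close to the true weights and the posterior covariance is much tighter than the prior, mirroring the last row of the figure.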

## Figure 11.18:

  Posterior samples of $p(w_0,w_1|\mathcal{D})$ for 1d linear regression model $p(y|x,\boldsymbol{\theta})=\mathcal{N}(y|w_0 + w_1 x, \sigma^2)$ with a Gaussian prior. (a) Original data. (b) Centered data. 

In [None]:
%run ./linreg_2d_bayes_centering_pymc3.py
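
The script above draws posterior samples with PyMC3. As a lighter-weight illustration of why centering helps, the sketch below swaps in the analytic Gaussian posterior (known noise variance, Gaussian prior) and compares the posterior correlation between $w_0$ and $w_1$ with and without centering. The inputs, the prior variance, and the helper name `posterior_corr` are assumptions.

```python
import numpy as np


def posterior_corr(x, sigma2=1.0, prior_var=1e2):
    """Posterior correlation of (w0, w1) under a N(0, prior_var * I) prior."""
    X = np.column_stack([np.ones_like(x), x])
    cov = np.linalg.inv(np.eye(2) / prior_var + X.T @ X / sigma2)
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])


x = np.linspace(10.0, 20.0, 21)  # hypothetical inputs far from the origin
corr_raw = posterior_corr(x)
corr_centered = posterior_corr(x - x.mean())
print(f"correlation, original data: {corr_raw:.3f}")
print(f"correlation, centered data: {corr_centered:.3f}")
```

With a known noise variance the posterior covariance does not depend on the targets, so only the inputs are needed here. Centering makes the inputs sum to zero, which removes the off-diagonal term and decorrelates the intercept and slope.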

## Figure 11.19:

  (a) Plugin approximation to predictive density (we plug in the MLE of the parameters) when fitting a second degree polynomial to some 1d data. (b) Posterior predictive density, obtained by integrating out the parameters. Black curve is posterior mean, error bars are 2 standard deviations of the posterior predictive density. (c) 10 samples from the plugin approximation to posterior predictive distribution. (d) 10 samples from the true posterior predictive distribution. 

In [None]:
%run ./linreg_post_pred_plot.py
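
The difference between panels (a) and (b) comes down to the predictive variance: the plugin approximation uses the noise variance alone, while the posterior predictive adds the parameter uncertainty $\boldsymbol{\phi}(x)^\top \boldsymbol{\Sigma}_N \boldsymbol{\phi}(x)$. The sketch below assumes a known noise variance, a broad Gaussian prior, and synthetic quadratic data; none of these values come from `linreg_post_pred_plot.py`.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2 = 1.0       # noise variance, assumed known
prior_var = 100.0  # broad Gaussian prior on the weights (an assumption)

x_train = np.linspace(-1.0, 1.0, 21)
noise = rng.normal(scale=np.sqrt(sigma2), size=x_train.shape)
y_train = 1.0 + 2.0 * x_train - 3.0 * x_train**2 + noise  # hypothetical data


def features(x):
    """Second-degree polynomial features [1, x, x**2]."""
    return np.vander(np.atleast_1d(x), 3, increasing=True)


Phi = features(x_train)
post_cov = np.linalg.inv(np.eye(3) / prior_var + Phi.T @ Phi / sigma2)
post_mean = post_cov @ Phi.T @ y_train / sigma2

x_test = np.linspace(-3.0, 3.0, 7)
Phi_test = features(x_test)
pred_mean = Phi_test @ post_mean
plugin_var = np.full(x_test.shape, sigma2)  # ignores parameter uncertainty
bayes_var = sigma2 + np.einsum("nd,de,ne->n", Phi_test, post_cov, Phi_test)
print(np.column_stack([x_test, plugin_var, bayes_var]))
```

The plugin variance is constant in $x$, whereas the posterior predictive variance grows as the test input moves away from the training data.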

## Figure 11.26:

  (a) A dynamic generalization of linear regression. (b) Illustration of the recursive least squares algorithm applied to the model $p(y|\mathbf{x},\boldsymbol{\theta}) = \mathcal{N}(y|w_0 + w_1 x, \sigma^2)$. We plot the marginal posterior of $w_0$ and $w_1$ vs number of data points. (Error bars represent $\mathbb{E}\left[w_j|y_{1:t}\right] \pm \sqrt{\mathbb{V}\left[w_j|y_{1:t}\right]}$.) After seeing all the data, we converge to the offline ML (least squares) solution, represented by the horizontal lines. 

In [None]:
!octave -W linregOnlineDemoKalman.m >> _
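
A minimal Python sketch of recursive least squares, written as a Kalman filter with a static state, is given below. The data, the noise variance, and the broad prior covariance are assumptions made for illustration and are not taken from `linregOnlineDemoKalman.m`.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2 = 1.0  # observation noise variance, assumed known
x = np.linspace(0.0, 20.0, 21)
y = 1.0 + 0.5 * x + rng.normal(scale=np.sqrt(sigma2), size=x.shape)
Phi = np.column_stack([np.ones_like(x), x])  # features [1, x]

prior_var = 1e4  # broad prior N(0, prior_var * I), an assumption
mu = np.zeros(2)
Sigma = prior_var * np.eye(2)
history = []

for phi_t, y_t in zip(Phi, y):
    s = sigma2 + phi_t @ Sigma @ phi_t            # predictive variance of y_t
    k = Sigma @ phi_t / s                         # Kalman gain
    mu = mu + k * (y_t - phi_t @ mu)              # correct the posterior mean
    Sigma = Sigma - np.outer(k, phi_t @ Sigma)    # shrink the covariance
    history.append((mu.copy(), np.sqrt(np.diag(Sigma))))

w_ols, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("online estimate:", mu)
print("offline OLS:    ", w_ols)
```

Each entry of `history` holds the posterior mean and standard deviation after one more data point, which are the quantities plotted as error bars in panel (b). With a broad prior the final online estimate nearly coincides with the offline least squares solution.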