In [None]:
# Copyright 2021 Google LLC
# Use of this source code is governed by an MIT-style
# license that can be found in the LICENSE file or at
# https://opensource.org/licenses/MIT.

# Author(s): Kevin P. Murphy (murphyk@gmail.com) and Mahmoud Soliman (mjs@aucegypt.edu)

<a href="https://opensource.org/licenses/MIT" target="_parent"><img src="https://img.shields.io/github/license/probml/pyprobml"/></a>

<a href="https://colab.research.google.com/github/probml/pyprobml/blob/master/notebooks/figures//chapter11_figures.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Cloning the pyprobml repo

In [None]:
!git clone https://github.com/probml/pyprobml 
%cd pyprobml/scripts

# Installing required software (this may take a few minutes)

In [None]:
!apt-get install octave  -qq > /dev/null
!apt-get install liboctave-dev -qq > /dev/null

In [None]:
%%capture
%load_ext autoreload
%autoreload 2

DISCLAIMER = 'WARNING : Editing in VM - changes lost after reboot!!'
from google.colab import files

def interactive_script(script, i=True):
  """Run `script`. If `i` is True, first tag it with the disclaimer (once) and open it in the Colab editor."""
  if i:
    with open(script) as f:
      s = f.read()
    # Prepend the disclaimer banner only if it is not already the first line.
    if s.split('\n', 1)[0] != "## " + DISCLAIMER:
      with open(script, 'w') as f:
        f.write(f'## {DISCLAIMER}\n' + '#' * (len(DISCLAIMER) + 3) + '\n\n' + s)
    files.view(script)
  %run $script

## Figure 11.1:

  Polynomials of degree 1 and 2 fit to 21 datapoints.  
Figure(s) generated by [linreg_poly_vs_degree.py](https://github.com/probml/pyprobml/blob/master/scripts/linreg_poly_vs_degree.py) 

In [None]:
interactive_script("linreg_poly_vs_degree.py")

## Figure 11.2:

  (a) Contours of the RSS error surface for the example in Figure 11.1(a). The blue cross represents the MLE. (b) Corresponding surface plot.  
Figure(s) generated by [linreg_contours_sse_plot.py](https://github.com/probml/pyprobml/blob/master/scripts/linreg_contours_sse_plot.py) 

In [None]:
interactive_script("linreg_contours_sse_plot.py")
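
As a rough illustration of what panel (a) shows, the sketch below evaluates the RSS of a degree 1 model $y \approx w_0 + w_1 x$ on a grid of $(w_0, w_1)$ values and checks that the smallest grid value sits next to the least squares estimate (the MLE). The synthetic data, noise level, and grid ranges are hypothetical choices, not the data used by `linreg_contours_sse_plot.py`.

```python
import numpy as np

# Hypothetical synthetic data: 21 points from a noisy line (not the script's data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 20.0, 21)
y = -1.5 + 0.5 * x + rng.normal(scale=1.0, size=x.shape)

# Design matrix for the degree 1 model y ~ w0 + w1 * x, and its least squares fit.
X = np.column_stack([np.ones_like(x), x])
w_mle, *_ = np.linalg.lstsq(X, y, rcond=None)

# Evaluate RSS(w0, w1) = sum_n (y_n - w0 - w1 * x_n)^2 on a grid centred on the MLE.
w0_grid = np.linspace(w_mle[0] - 3.0, w_mle[0] + 3.0, 301)
w1_grid = np.linspace(w_mle[1] - 0.3, w_mle[1] + 0.3, 301)
W0, W1 = np.meshgrid(w0_grid, w1_grid, indexing="ij")
pred = W0[..., None] + W1[..., None] * x        # shape (301, 301, 21)
rss = np.sum((y - pred) ** 2, axis=-1)          # shape (301, 301)

# The grid point with the smallest RSS should be next to the MLE.
i, j = np.unravel_index(np.argmin(rss), rss.shape)
w_grid_best = np.array([w0_grid[i], w1_grid[j]])
print("MLE:", w_mle, "grid minimum:", w_grid_best)
```

Passing `W0`, `W1`, and `rss` to `matplotlib.pyplot.contour` would give a contour plot in the style of panel (a).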

## Figure 11.3:

  Graphical interpretation of least squares for $m=3$ equations and $n=2$ unknowns when solving the system $\mathbf{A} \mathbf{x} = \mathbf{b}$. $\mathbf{a}_1$ and $\mathbf{a}_2$ are the columns of $\mathbf{A}$, which define a 2d linear subspace embedded in $\mathbb{R}^3$. The target vector $\mathbf{b}$ is a vector in $\mathbb{R}^3$; its orthogonal projection onto the linear subspace is denoted $\hat{\mathbf{b}}$. The line from $\mathbf{b}$ to $\hat{\mathbf{b}}$ is the vector of residual errors, whose norm we want to minimize. 
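
The geometry in this figure can be checked numerically. The sketch below uses a small hypothetical $3 \times 2$ matrix $\mathbf{A}$, computes the least squares solution $\hat{\mathbf{x}}$ and its projection $\hat{\mathbf{b}} = \mathbf{A}\hat{\mathbf{x}}$, and verifies that the residual $\mathbf{b} - \hat{\mathbf{b}}$ is orthogonal to both columns of $\mathbf{A}$, i.e. $\mathbf{A}^\top(\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}) = \mathbf{0}$ (the normal equations). The specific numbers are arbitrary.

```python
import numpy as np

# Hypothetical 3x2 system A x = b (m = 3 equations, n = 2 unknowns).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0, 2.0])

# Least squares solution and the projection b_hat = A x_hat onto span(a_1, a_2).
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
b_hat = A @ x_hat
residual = b - b_hat

# The residual is orthogonal to every column of A (the normal equations).
print("A^T r =", A.T @ residual)

# The same projection via the projection matrix P = A (A^T A)^{-1} A^T.
P = A @ np.linalg.solve(A.T @ A, A.T)
print("P b equals b_hat:", np.allclose(P @ b, b_hat))
```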

## Figure 11.4:

  Regression coefficients over time for the 1d model in Figure 11.1(a).  
Figure(s) generated by [linregOnlineDemo.m](https://github.com/probml/pmtk3/blob/master/demos/linregOnlineDemo.m) 

In [None]:
!octave -W linregOnlineDemo.m >> _

## Figure 11.5:

  Residual plot for polynomial regression of degree 1 and 2 for the functions in Figure 11.1(a-b).  
Figure(s) generated by [linreg_poly_vs_degree.py](https://github.com/probml/pyprobml/blob/master/scripts/linreg_poly_vs_degree.py) 

In [None]:
interactive_script("linreg_poly_vs_degree.py")

## Figure 11.6:

  Fit vs actual plots for polynomial regression of degree 1 and 2 for the functions in Figure 11.1(a-b).  
Figure(s) generated by [linreg_poly_vs_degree.py](https://github.com/probml/pyprobml/blob/master/scripts/linreg_poly_vs_degree.py) 

In [None]:
interactive_script("linreg_poly_vs_degree.py")

## Figure 11.7:

  (a-c) Ridge regression applied to a degree 14 polynomial fit to 21 datapoints. (d) MSE vs strength of regularizer. The degree of regularization increases from left to right, so model complexity decreases from left to right.  
Figure(s) generated by [linreg_poly_ridge.py](https://github.com/probml/pyprobml/blob/master/scripts/linreg_poly_ridge.py) 

In [None]:
interactive_script("linreg_poly_ridge.py")
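
The ridge estimate has the closed form $\hat{\mathbf{w}} = (\mathbf{X}^\top\mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y}$. The sketch below applies it to degree 14 polynomial features of 21 points and shows that increasing $\lambda$ shrinks $\|\hat{\mathbf{w}}\|_2$ while the training MSE rises, which is the loss of model complexity the caption describes. The target function, noise level, rescaling of $x$ to $[-1, 1]$, and the choice to penalize every weight (including the intercept) are assumptions; `linreg_poly_ridge.py` may differ on each. Test error, as in panel (d), would additionally need a held-out set.

```python
import numpy as np

rng = np.random.default_rng(2)
N, degree = 21, 14
x = np.linspace(-1.0, 1.0, N)                           # rescaled inputs (assumption)
y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=N)   # hypothetical target
X = np.vander(x, degree + 1, increasing=True)           # 15 polynomial features

def ridge(X, y, lam):
    """Closed-form ridge estimate (X^T X + lam I)^{-1} X^T y; all weights are penalized."""
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

lambdas = np.logspace(-4, 2, 7)
weights = [ridge(X, y, lam) for lam in lambdas]
norms = np.array([np.linalg.norm(w) for w in weights])
train_mse = np.array([np.mean((y - X @ w) ** 2) for w in weights])
for lam, n, m in zip(lambdas, norms, train_mse):
    print(f"lambda={lam:.0e}  ||w||={n:.3g}  train MSE={m:.3g}")
```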

## Figure 11.8:

  Geometry of ridge regression. The likelihood is shown as an ellipse, and the prior is shown as a circle centered on the origin. Adapted from Figure 3.15 of <a href='#BishopBook'>[Bis06]</a>.  
Figure(s) generated by [geom_ridge.py](https://github.com/probml/pyprobml/blob/master/scripts/geom_ridge.py) 

In [None]:
interactive_script("geom_ridge.py")
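
One algebraic counterpart of this picture uses the SVD $\mathbf{X} = \mathbf{U}\,\mathrm{diag}(d_j)\,\mathbf{V}^\top$: ridge regression multiplies the component of the least squares solution along each right singular vector $\mathbf{v}_j$ by $d_j^2/(d_j^2 + \lambda)$. Directions with small $d_j$, along which the likelihood ellipse is long and the data say little, are shrunk the most. The sketch below checks this identity on hypothetical data with two strongly correlated features; it is not taken from `geom_ridge.py`.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical 2-feature problem with strongly correlated columns, so the
# likelihood contours are elongated ellipses.
N = 50
z = rng.normal(size=N)
X = np.column_stack([z, z + 0.1 * rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=N)
lam = 5.0

# Closed-form ridge estimate.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

# Same estimate via the SVD X = U diag(d) V^T: the least squares component
# along each direction v_j is multiplied by d_j^2 / (d_j^2 + lam).
U, d, Vt = np.linalg.svd(X, full_matrices=False)
w_ols = Vt.T @ ((U.T @ y) / d)
shrink = d**2 / (d**2 + lam)
w_ridge_svd = Vt.T @ (shrink * (Vt @ w_ols))
print("shrinkage factor per direction (largest d_j first):", shrink)
```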

## Figure 11.9:

  (a) Illustration of robust linear regression.  
Figure(s) generated by [linregRobustDemoCombined.m](https://github.com/probml/pmtk3/blob/master/demos/linregRobustDemoCombined.m) [huberLossPlot.m](https://github.com/probml/pmtk3/blob/master/demos/huberLossPlot.m) 

In [None]:
!octave -W linregRobustDemoCombined.m >> _

In [None]:
!octave -W huberLossPlot.m >> _
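
The name of the second script suggests it plots loss functions used for robust regression. One standard choice is the Huber loss with threshold $\delta$, which equals $\frac{1}{2}r^2$ for $|r| \le \delta$ and $\delta(|r| - \frac{1}{2}\delta)$ otherwise: quadratic near zero like $\ell_2$, but growing only linearly like $\ell_1$ for large residuals, so outliers have bounded influence on the gradient. The sketch below is a minimal NumPy version; $\delta = 1$ is an arbitrary choice, and the pmtk3 script may use a different value.

```python
import numpy as np

def huber_loss(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear with slope delta beyond it."""
    r = np.asarray(r, dtype=float)
    abs_r = np.abs(r)
    return np.where(abs_r <= delta, 0.5 * r**2, delta * (abs_r - 0.5 * delta))

r = np.linspace(-3.0, 3.0, 7)
# Compare with the l2 loss 0.5 * r**2 and the l1 loss np.abs(r).
print(np.column_stack([r, huber_loss(r, delta=1.0), 0.5 * r**2, np.abs(r)]))
```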

## Figure 11.10:

  Illustration of $\ell_1$ (left) vs $\ell_2$ (right) regularization of a least squares problem. Adapted from Figure 3.12 of <a href='#Hastie01'>[HTF01]</a>. 

## Figure 11.11:

  Left: soft thresholding. Right: hard thresholding. In both cases, the horizontal axis is the residual error incurred by making predictions using all the coefficients except for $w_k$, and the vertical axis is the estimated coefficient $\hat{w}_k$ that minimizes this penalized residual. The flat region in the middle is the interval $[-\lambda, +\lambda]$. 
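
The two operators in this figure can each be written in one line. The sketch below applies them to a few hypothetical values of the horizontal-axis quantity with $\lambda = 1$: both map the interval $[-\lambda, +\lambda]$ to zero, but soft thresholding also shrinks values outside it toward zero by $\lambda$, while hard thresholding leaves them unchanged. The function names are illustrative.

```python
import numpy as np

def soft_threshold(c, lam):
    """Soft thresholding: zero inside [-lam, lam], otherwise shrink toward 0 by lam."""
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

def hard_threshold(c, lam):
    """Hard thresholding: zero inside [-lam, lam], otherwise keep c unchanged."""
    return np.where(np.abs(c) > lam, c, 0.0)

c = np.array([-3.0, -1.0, 0.5, 2.0])
print("soft:", soft_threshold(c, 1.0))
print("hard:", hard_threshold(c, 1.0))
```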

## Figure 11.12:

  (a) Profiles of ridge coefficients for the prostate cancer example vs bound $B$ on the $\ell_2$ norm of $\mathbf{w}$, so small $B$ (large $\lambda$) is on the left. The vertical line is the value chosen by 5-fold CV using the one-standard-error rule. Adapted from Figure 3.8 of <a href='#HastieBook'>[HTF09]</a>.  
Figure(s) generated by [ridgePathProstate.m](https://github.com/probml/pmtk3/blob/master/demos/ridgePathProstate.m) [lassoPathProstate.m](https://github.com/probml/pmtk3/blob/master/demos/lassoPathProstate.m) 

In [None]:
!octave -W ridgePathProstate.m >> _

In [None]:
!octave -W lassoPathProstate.m >> _

## Figure 11.13:

  Values of the coefficients for the linear regression model fit to the prostate cancer dataset as we vary the strength of the $\ell_1$ regularizer. These numbers are plotted in Figure 11.12(b). 
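
Tables like this one come from solving the lasso problem $\frac{1}{2}\|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2 + \lambda\|\mathbf{w}\|_1$ for a sequence of $\lambda$ values. One standard solver is cyclic coordinate descent, where each coordinate update is a soft-thresholding step. The sketch below is a minimal version on hypothetical random data rather than the prostate dataset; the pmtk3 demo may use a different solver (for example LARS) and a different scaling convention for $\lambda$. For $\lambda \ge \max_j |\mathbf{x}_j^\top \mathbf{y}|$ every coefficient is zero, and as $\lambda$ decreases coefficients typically enter the model one by one.

```python
import numpy as np

def soft_threshold(c, lam):
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

def lasso_cd(X, y, lam, n_iters=500):
    """Cyclic coordinate descent for 0.5 * ||y - X w||^2 + lam * ||w||_1."""
    N, D = X.shape
    w = np.zeros(D)
    a = np.sum(X**2, axis=0)            # a_j = ||x_j||^2
    r = y - X @ w                       # current residual
    for _ in range(n_iters):
        for j in range(D):
            r += X[:, j] * w[j]         # residual with feature j removed
            c_j = X[:, j] @ r           # correlation of x_j with that residual
            w[j] = soft_threshold(c_j, lam) / a[j]
            r -= X[:, j] * w[j]         # add the updated feature j back
    return w

rng = np.random.default_rng(4)
N, D = 50, 6
X = rng.normal(size=(N, D))
w_true = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0])   # hypothetical sparse weights
y = X @ w_true + rng.normal(scale=0.5, size=N)

lam_max = np.max(np.abs(X.T @ y))   # for lam >= lam_max the solution is w = 0
# The last value is just above lam_max, where all weights are zero.
for lam in [0.0, 0.1 * lam_max, 0.5 * lam_max, 1.01 * lam_max]:
    w = lasso_cd(X, y, lam)
    print(f"lam={lam:7.2f}  nonzeros={np.count_nonzero(w)}  w={np.round(w, 2)}")
```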

## Figure 11.14:

  (a) Boxplot displaying (absolute value of) prediction errors on the prostate cancer test set for different regression methods.  
Figure(s) generated by [prostateComparison.py](https://github.com/probml/pyprobml/blob/master/scripts/prostateComparison.py) [sparseSensingDemo.m](https://github.com/probml/pmtk3/blob/master/demos/sparseSensingDemo.m) 

In [None]:
interactive_script("prostateComparison.py")

In [None]:
!octave -W sparseSensingDemo.m >> _

## Figure 11.15:

  Illustration of group lasso where the original signal is piecewise Gaussian. (a) Original signal. (b) Vanilla lasso estimate. (c) Group lasso estimate using an $\ell_2$ norm on the blocks. (d) Group lasso estimate using an $\ell_\infty$ norm on the blocks. Adapted from Figures 3-4 of <a href='#Wright09'>[WNF09]</a>.  
Figure(s) generated by [groupLassoDemo.m](https://github.com/probml/pmtk3/blob/master/demos/groupLassoDemo.m) 

In [None]:
!octave -W groupLassoDemo.m >> _
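
A key step in many group lasso solvers is the proximal operator of the penalty $\lambda\sum_g \|\mathbf{w}_g\|_2$, which acts blockwise: each block is either set to zero as a whole (if $\|\mathbf{v}_g\|_2 \le \lambda$) or scaled toward zero by the factor $1 - \lambda/\|\mathbf{v}_g\|_2$. This is why group lasso zeroes entire blocks, as in panel (c). The sketch below is a minimal NumPy version for non-overlapping groups; the group layout and values are hypothetical, the $\ell_\infty$ variant in panel (d) needs a different proximal operator that is not shown, and the caption does not say whether `groupLassoDemo.m` uses this operator.

```python
import numpy as np

def group_soft_threshold(v, groups, lam):
    """Proximal operator of lam * sum_g ||v_g||_2 (block soft thresholding).

    groups: list of index arrays, one per non-overlapping block.
    """
    out = np.zeros_like(v, dtype=float)
    for g in groups:
        norm_g = np.linalg.norm(v[g])
        if norm_g > lam:
            out[g] = (1.0 - lam / norm_g) * v[g]   # shrink the whole block
        # otherwise the whole block stays at zero
    return out

v = np.array([0.3, -0.2, 0.1, 3.0, 4.0])
groups = [np.array([0, 1, 2]), np.array([3, 4])]
shrunk = group_soft_threshold(v, groups, lam=1.0)
print(shrunk)
```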

## Figure 11.16:

  Same as Figure 11.15, except the original signal is piecewise constant. 

## Figure 11.17:

  Sequential Bayesian inference of the parameters of a linear regression model $p(y|\mathbf{x}) = \mathcal{N}(y | w_0 + w_1 x_1, \sigma^2)$. Left column: likelihood function for the current data point. Middle column: posterior given the first $N$ data points, $p(w_0,w_1|\mathbf{x}_{1:N},y_{1:N},\sigma^2)$. Right column: samples from the current posterior predictive distribution. Row 1: prior distribution ($N=0$). Row 2: after 1 data point. Row 3: after 2 data points. Row 4: after 100 data points. The white cross in columns 1 and 2 represents the true parameter value; we see that the mode of the posterior rapidly converges to this point. The blue circles in column 3 are the observed data points. Adapted from Figure 3.7 of <a href='#BishopBook'>[Bis06]</a>.  
Figure(s) generated by [linreg_2d_bayes_demo.py](https://github.com/probml/pyprobml/blob/master/scripts/linreg_2d_bayes_demo.py) 

In [None]:
interactive_script("linreg_2d_bayes_demo.py")
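
With the noise variance known and a Gaussian prior, each row of this figure follows from a closed-form update: multiplying the current Gaussian posterior by the likelihood of one new point adds $\boldsymbol{\phi}_n\boldsymbol{\phi}_n^\top/\sigma^2$ to the precision matrix and $\boldsymbol{\phi}_n y_n/\sigma^2$ to the information vector, where $\boldsymbol{\phi}_n = (1, x_n)$. The sketch below runs these updates point by point and confirms that the result matches the batch posterior. The true weights, noise level, prior variance, and input range are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2 = 0.2**2                        # known noise variance (assumption)
w_true = np.array([-0.3, 0.5])         # hypothetical true (w0, w1)
N = 100
x1 = rng.uniform(-1.0, 1.0, size=N)
Phi = np.column_stack([np.ones(N), x1])          # rows are (1, x_n)
y = Phi @ w_true + rng.normal(scale=np.sqrt(sigma2), size=N)

# Gaussian prior p(w) = N(0, tau2 * I), stored in information (precision) form.
tau2 = 0.5
prec = np.eye(2) / tau2                # posterior precision Lambda
h = np.zeros(2)                        # information vector Lambda @ mean

means = []
for phi_n, y_n in zip(Phi, y):
    # The likelihood of one point adds a rank-1 term to the precision.
    prec = prec + np.outer(phi_n, phi_n) / sigma2
    h = h + phi_n * y_n / sigma2
    means.append(np.linalg.solve(prec, h))   # posterior mean after n points

# Batch posterior using all N points at once.
prec_batch = np.eye(2) / tau2 + Phi.T @ Phi / sigma2
mean_batch = np.linalg.solve(prec_batch, Phi.T @ y / sigma2)
print("after 1, 2, 100 points:", means[0], means[1], means[-1])
print("batch posterior mean:  ", mean_batch)
```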

## Figure 11.18:

  Posterior samples of $p(w_0,w_1| \mathcal  D  )$ for 1d linear regression model $p(y|x,\boldsymbol  \theta  )=\mathcal  N (y|w_0 + w_1 x, \sigma ^2)$ with a Gaussian prior. (a) Original data. (b) Centered data.  
Figure(s) generated by [linreg_2d_bayes_centering_pymc3.py](https://github.com/probml/pyprobml/blob/master/scripts/linreg_2d_bayes_centering_pymc3.py) 

In [None]:
interactive_script("linreg_2d_bayes_centering_pymc3.py")
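
The effect of centering can be seen without MCMC. Assuming a known noise variance $\sigma^2$ and a Gaussian prior $\mathcal{N}(\mathbf{0}, \tau^2\mathbf{I})$, the posterior covariance is $(\mathbf{X}^\top\mathbf{X}/\sigma^2 + \mathbf{I}/\tau^2)^{-1}$, whose off-diagonal entry is driven by $\sum_n x_n$. When the inputs are far from zero, $w_0$ and $w_1$ are strongly negatively correlated; after centering, $\sum_n x_n = 0$ and the correlation vanishes. The sketch below computes this exact correlation in both cases; it is a simplified stand-in for the PyMC3 model in `linreg_2d_bayes_centering_pymc3.py`, and the input range and variances are hypothetical.

```python
import numpy as np

def posterior_corr(x, sigma2=1.0, tau2=10.0):
    """Correlation between w0 and w1 under the Gaussian posterior of
    y ~ N(w0 + w1 x, sigma2) with prior w ~ N(0, tau2 I)."""
    X = np.column_stack([np.ones_like(x), x])
    cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

rng = np.random.default_rng(6)
x = rng.uniform(5.0, 15.0, size=30)   # hypothetical inputs far from zero
print("original:", posterior_corr(x))
print("centered:", posterior_corr(x - x.mean()))
```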

## Figure 11.19:

  (a) Plugin approximation to the predictive density (we plug in the MLE of the parameters) when fitting a second-degree polynomial to some 1d data. (b) Posterior predictive density, obtained by integrating out the parameters. The black curve is the posterior mean; error bars are 2 standard deviations of the posterior predictive density. (c) 10 samples from the plugin approximation to the posterior predictive distribution. (d) 10 samples from the true posterior predictive distribution.  
Figure(s) generated by [linreg_post_pred_plot.py](https://github.com/probml/pyprobml/blob/master/scripts/linreg_post_pred_plot.py) 

In [None]:
interactive_script("linreg_post_pred_plot.py")
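
The difference between panels (a) and (b) comes from the parameter uncertainty term. Assuming a known noise variance $\sigma^2$ and a Gaussian prior on the weights, the posterior predictive variance at a test input is $\sigma^2 + \boldsymbol{\phi}(x)^\top\boldsymbol{\Sigma}_N\boldsymbol{\phi}(x)$, where $\boldsymbol{\Sigma}_N$ is the posterior covariance, whereas a plug-in approximation keeps only $\sigma^2$. The sketch below computes both for a degree 2 polynomial on hypothetical data; the extra term is never negative and grows as $x$ moves away from the training inputs. `linreg_post_pred_plot.py` may estimate $\sigma^2$ rather than fixing it.

```python
import numpy as np

rng = np.random.default_rng(7)
sigma2 = 1.0      # noise variance, assumed known
tau2 = 100.0      # prior w ~ N(0, tau2 I) (assumption)

def features(x):
    """Degree 2 polynomial features (1, x, x^2)."""
    return np.vander(np.atleast_1d(x), 3, increasing=True)

x_train = rng.uniform(-2.0, 2.0, size=20)      # hypothetical 1d inputs
y_train = 1.0 - 0.5 * x_train + 0.8 * x_train**2 + rng.normal(size=20)

Phi = features(x_train)
post_cov = np.linalg.inv(Phi.T @ Phi / sigma2 + np.eye(3) / tau2)
post_mean = post_cov @ Phi.T @ y_train / sigma2

x_test = np.linspace(-4.0, 4.0, 9)
Phi_test = features(x_test)
pred_mean = Phi_test @ post_mean
plugin_var = np.full_like(x_test, sigma2)      # ignores parameter uncertainty
# phi(x)^T Sigma_N phi(x) for every test point, plus the noise variance.
bayes_var = sigma2 + np.einsum("ij,jk,ik->i", Phi_test, post_cov, Phi_test)
print(np.column_stack([x_test, pred_mean, np.sqrt(plugin_var), np.sqrt(bayes_var)]))
```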

## Figure 11.20:

  (a) Representing lasso using a Gaussian scale mixture prior. (b) Graphical model for group lasso with 2 groups: the first has size $G_1=2$ and the second has size $G_2=3$. 
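
Panel (a) relies on the fact that the Laplace prior behind the lasso is a Gaussian scale mixture: if $\tau^2 \sim \mathrm{Expon}(\lambda^2/2)$ (rate parameterization) and $w \mid \tau^2 \sim \mathcal{N}(0, \tau^2)$, then marginally $w \sim \mathrm{Laplace}(0, 1/\lambda)$, with density $\frac{\lambda}{2}e^{-\lambda|w|}$. The Monte Carlo sketch below compares samples from this hierarchy with direct Laplace samples; $\lambda = 2$ and the sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(8)
lam = 2.0
n = 200_000

# Hierarchical sampling: tau^2 ~ Exponential(rate = lam^2 / 2), then w | tau^2 ~ N(0, tau^2).
tau2 = rng.exponential(scale=2.0 / lam**2, size=n)   # NumPy uses scale = 1 / rate
w_gsm = rng.normal(loc=0.0, scale=np.sqrt(tau2))

# Direct samples from Laplace(0, b = 1 / lam) for comparison.
w_laplace = rng.laplace(loc=0.0, scale=1.0 / lam, size=n)

print("E|w|:", np.mean(np.abs(w_gsm)), np.mean(np.abs(w_laplace)), "exact:", 1.0 / lam)
print("Var :", np.var(w_gsm), np.var(w_laplace), "exact:", 2.0 / lam**2)
```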

## Figure 11.21:

  A hierarchical Bayesian linear regression model for the radon problem. 

## Figure 11.22:

  Posterior marginals for $\alpha _c$ and $\beta _c$ for each county in the radon model. 

## Figure 11.23:

  Predictions from the radon model for 3 different counties in Minnesota. Black dots are observed datapoints. Green represents results of hierarchical (shared) prior, blue represents results of non-hierarchical prior. Thick lines are the result of using the posterior mean, thin lines are the result of using posterior samples. 

## Figure 11.24:

  (a) Bivariate posterior $p(\beta_c,\sigma_\beta | \mathcal{D})$ for the hierarchical radon model for county $c=75$ using the centered parameterization. (b) Similar to (a) except we plot $p(\tilde{\beta}_c,\sigma_\beta | \mathcal{D})$ for the non-centered parameterization. From https://twiecki.io/blog/2017/02/08/bayesian-hierchical-non-centered/. Used with kind permission of Thomas Wiecki. 
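
The two parameterizations describe the same model. In the centered form $\beta_c \sim \mathcal{N}(\mu, \sigma_\beta^2)$ is sampled directly, while in the non-centered form $\tilde{\beta}_c \sim \mathcal{N}(0, 1)$ and $\beta_c = \mu + \sigma_\beta\tilde{\beta}_c$. The sketch below draws from both under a hypothetical $\mathrm{HalfNormal}(1)$ prior on $\sigma_\beta$ with $\mu = 0$. The funnel in panel (a) is already visible in this prior: the spread of $\beta_c$ collapses as $\sigma_\beta \to 0$, whereas $\tilde{\beta}_c$ keeps unit spread for every $\sigma_\beta$, which is what makes the non-centered geometry in panel (b) easier for samplers. The figure itself shows posteriors from the blog post's model, which this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100_000
mu = 0.0
sigma_beta = np.abs(rng.normal(scale=1.0, size=n))   # HalfNormal(1) prior (assumption)

# Centered: sample beta_c directly given sigma_beta.
beta_centered = rng.normal(loc=mu, scale=sigma_beta)

# Non-centered: sample a standard normal offset, then shift and scale it.
beta_tilde = rng.normal(size=n)
beta_noncentered = mu + sigma_beta * beta_tilde

# Funnel: the spread of beta_c depends on sigma_beta; the spread of beta_tilde does not.
small, large = sigma_beta < 0.1, sigma_beta > 1.0
print("std beta_c     (small, large sigma):", beta_centered[small].std(), beta_centered[large].std())
print("std beta_tilde (small, large sigma):", beta_tilde[small].std(), beta_tilde[large].std())
```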

## Figure 11.25:

  Illustration of why ARD results in sparsity. The vector of inputs $\mathbf{x}$ does not point towards the vector of outputs $\mathbf{y}$, so the feature should be removed. (a) For finite $\alpha$, the probability density is spread in directions away from $\mathbf{y}$. (b) When $\alpha=\infty$, the probability density at $\mathbf{y}$ is maximized. Adapted from Figure 8 of <a href='#Tipping01'>[Tip01]</a>. 

## Figure 11.26:

  (a) A dynamic generalization of linear regression. (b) Illustration of the recursive least squares algorithm applied to the model $p(y|\mathbf{x},\boldsymbol{\theta}) = \mathcal{N}(y|w_0 + w_1 x, \sigma^2)$. We plot the marginal posterior of $w_0$ and $w_1$ vs the number of data points. (Error bars represent $\mathbb{E}\left[w_j|y_{1:t}\right] \pm \sqrt{\mathbb{V}\left[w_j|y_{1:t}\right]}$.) After seeing all the data, we converge to the offline ML (least squares) solution, represented by the horizontal lines.  
Figure(s) generated by [linregOnlineDemoKalman.m](https://github.com/probml/pmtk3/blob/master/demos/linregOnlineDemoKalman.m) 

In [None]:
!octave -W linregOnlineDemoKalman.m >> _
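
Recursive least squares can be written as a Kalman filter whose hidden state is the weight vector, with no dynamics and no process noise. Each step uses the prediction error on the new point and a Kalman gain to update the posterior mean, and applies a rank-1 downdate to the posterior covariance, so no matrix inverse is needed. The sketch below is a minimal NumPy version under stated assumptions (known noise variance $\sigma^2 = 1$, a vague $\mathcal{N}(\mathbf{0}, 10^6\mathbf{I})$ prior, and hypothetical data); it is not a port of `linregOnlineDemoKalman.m`. With such a vague prior the final estimate is essentially the offline least squares solution, and the returned marginal variances give error bars of the form $\mathbb{E}[w_j|y_{1:t}] \pm \sqrt{\mathbb{V}[w_j|y_{1:t}]}$.

```python
import numpy as np

def rls(Phi, y, sigma2=1.0, prior_var=1e6):
    """Recursive least squares in Kalman-filter form for a static weight vector.

    Returns the history of posterior means and marginal variances.
    """
    D = Phi.shape[1]
    m = np.zeros(D)
    P = prior_var * np.eye(D)            # large prior variance, i.e. a vague prior
    means, variances = [], []
    for phi_t, y_t in zip(Phi, y):
        s = phi_t @ P @ phi_t + sigma2   # predictive variance of y_t
        k = P @ phi_t / s                # Kalman gain
        m = m + k * (y_t - phi_t @ m)    # correct the mean using the prediction error
        P = P - np.outer(k, phi_t @ P)   # rank-1 covariance downdate
        means.append(m.copy())
        variances.append(np.diag(P).copy())
    return np.array(means), np.array(variances)

rng = np.random.default_rng(10)
T = 30
x = np.linspace(0.0, 20.0, T)
Phi = np.column_stack([np.ones(T), x])
y = -1.5 + 0.5 * x + rng.normal(size=T)   # hypothetical data

means, variances = rls(Phi, y)
w_ls, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("final RLS estimate:   ", means[-1])
print("offline least squares:", w_ls)
```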

## References:
<a name='BishopBook'>[Bis06]</a> C. Bishop. "Pattern Recognition and Machine Learning". (2006). 

<a name='Hastie01'>[HTF01]</a> T. Hastie, R. Tibshirani and J. Friedman. "The Elements of Statistical Learning". (2001). 

<a name='HastieBook'>[HTF09]</a> T. Hastie, R. Tibshirani and J. Friedman. "The Elements of Statistical Learning". (2009). 

<a name='Tipping01'>[Tip01]</a> M. Tipping. "Sparse Bayesian Learning and the Relevance Vector Machine". In: JMLR (2001). 

<a name='Wright09'>[WNF09]</a> S. Wright, R. Nowak and M. Figueiredo. "Sparse Reconstruction by Separable Approximation". In: IEEE Trans. on Signal Processing (2009). 

