In [None]:
# Copyright 2021 Google LLC
# Use of this source code is governed by an MIT-style
# license that can be found in the LICENSE file or at
# https://opensource.org/licenses/MIT.
# Notebook authors: Kevin P. Murphy (murphyk@gmail.com)
# and Mahmoud Soliman (mjs@aucegypt.edu)

# This notebook reproduces figures for chapter 10 from the book
# "Probabilistic Machine Learning: An Introduction"
# by Kevin Murphy (MIT Press, 2021).
# Book pdf is available from http://probml.ai

<a href="https://opensource.org/licenses/MIT" target="_parent"><img src="https://img.shields.io/github/license/probml/pyprobml"/></a>

<a href="https://colab.research.google.com/github/probml/pml-book/blob/main/pml1/figure_notebooks/chapter10_logistic_regression_figures.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Figure 10.1:<a name='10.1'></a> <a name='iris-logreg-2d'></a> 


(a) Visualization of a 2d plane in a 3d space with surface normal $\boldsymbol{w}$ going through point $\boldsymbol{x}_0=(x_0,y_0,z_0)$. See text for details. (b) Visualization of optimal linear decision boundary induced by logistic regression on a 2-class, 2-feature version of the iris dataset.  
Figure(s) generated by [iris_logreg.py](https://github.com/probml/pyprobml/blob/master/scripts/iris_logreg.py) 
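
The plane in panel (a) can be written as $\boldsymbol{w}^{\mathsf T}(\boldsymbol{x}-\boldsymbol{x}_0)=0$, or equivalently $\boldsymbol{w}^{\mathsf T}\boldsymbol{x}+b=0$ with $b=-\boldsymbol{w}^{\mathsf T}\boldsymbol{x}_0$. The following is a minimal sketch of that relationship, not taken from `iris_logreg.py`; the values of `w`, `x0`, and `v` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical values, chosen only for illustration.
w = np.array([1.0, 2.0, 3.0])    # surface normal
x0 = np.array([1.0, 0.0, -1.0])  # a point on the plane
b = -w @ x0                      # offset, so the plane is w^T x + b = 0


def signed_distance(x):
    """Signed distance from x to the plane; the sign says which side x is on."""
    return (w @ x + b) / np.linalg.norm(w)


# Any direction orthogonal to w stays inside the plane.
v = np.array([2.0, -1.0, 0.0])   # w @ v == 0

print(signed_distance(x0))       # x0 lies on the plane
print(signed_distance(x0 + v))   # moving along v stays on the plane
print(signed_distance(x0 + w))   # one step along the normal leaves it
```

In logistic regression the same quantity $\boldsymbol{w}^{\mathsf T}\boldsymbol{x}+b$ is the logit, so its sign determines the predicted class.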

In [None]:
#@title Click me to run setup { display-mode: "form" }
try:
  if PYPROBML_SETUP_ALREADY_RUN:
    print('skipping setup')
except:
  PYPROBML_SETUP_ALREADY_RUN = True
  print('running setup...')
  !git clone --depth 1 https://github.com/probml/pyprobml  /pyprobml &> /dev/null 
  %cd -q /pyprobml/scripts
  %reload_ext autoreload 
  %autoreload 2
  !pip install superimport deimport -qqq
  import superimport
  from deimport.deimport import deimport
  print('finished!')

In [None]:
deimport(superimport)
%run iris_logreg.py

## Figure 10.2:<a name='10.2'></a> <a name='sigmoidPlot2d'></a> 


Plots of $\sigma(w_1 x_1 + w_2 x_2)$. Here $\boldsymbol{w} = (w_1,w_2)$ defines the normal to the decision boundary. Points to the right of this have $\sigma(\boldsymbol{w}^{\mathsf T} \boldsymbol{x}) > 0.5$, and points to the left have $\sigma(\boldsymbol{w}^{\mathsf T} \boldsymbol{x}) < 0.5$. Adapted from Figure 39.3 of <a href='#MacKay03'>[Mac03]</a>.  
Figure(s) generated by [sigmoid_2d_plot.py](https://github.com/probml/pyprobml/blob/master/scripts/sigmoid_2d_plot.py) 
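
As a small numerical check of the caption (a sketch that is independent of `sigmoid_2d_plot.py`), the snippet below evaluates $\sigma(\boldsymbol{w}^{\mathsf T}\boldsymbol{x})$ at three points. The weight vector `w` and the points in `X` are hypothetical.

```python
import numpy as np


def sigmoid(a):
    """Logistic function 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))


# Hypothetical weights: the normal points along +x1,
# so the decision boundary is the line x1 = 0.
w = np.array([3.0, 0.0])

# One point to the right of the boundary, one on it, one to the left.
X = np.array([[1.0, 5.0], [0.0, -2.0], [-1.0, 0.5]])

p = sigmoid(X @ w)
print(p)  # above 0.5, exactly 0.5, below 0.5

# Scaling w leaves the boundary in place but makes the sigmoid steeper.
p_sharp = sigmoid(X @ (5 * w))
print(p_sharp)
```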

In [None]:
#@title Click me to run setup { display-mode: "form" }
try:
  if PYPROBML_SETUP_ALREADY_RUN:
    print('skipping setup')
except:
  PYPROBML_SETUP_ALREADY_RUN = True
  print('running setup...')
  !git clone --depth 1 https://github.com/probml/pyprobml  /pyprobml &> /dev/null 
  %cd -q /pyprobml/scripts
  %reload_ext autoreload 
  %autoreload 2
  !pip install superimport deimport -qqq
  import superimport
  from deimport.deimport import deimport
  print('finished!')

In [None]:
deimport(superimport)
%run sigmoid_2d_plot.py

## Figure 10.3:<a name='10.3'></a> <a name='kernelTrickQuadratic'></a> 


Illustration of how we can transform a quadratic decision boundary into a linear one by transforming the features from $\boldsymbol{x}=(x_1,x_2)$ to $\boldsymbol{\phi}(\boldsymbol{x})=(x_1^2,x_2^2)$. Used with kind permission of Jean-Philippe Vert.
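
A minimal sketch of this idea, assuming a circular boundary $x_1^2 + x_2^2 = r^2$ in the original space: after squaring each feature, the same boundary is the straight line $\phi_1 + \phi_2 = r^2$. The data, the squared radius `r2`, and the weights below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 2))

# Hypothetical ground truth: class 1 lies outside a circle of squared radius r2.
r2 = 1.5
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > r2).astype(int)

# Feature map phi(x) = (x1^2, x2^2).
Phi = X ** 2

# In phi-space a linear rule w^T phi + b > 0 reproduces the circular boundary.
w = np.array([1.0, 1.0])
b = -r2
y_hat = (Phi @ w + b > 0).astype(int)

print("labels reproduced by the linear rule:", np.array_equal(y, y_hat))
```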

In [None]:
#@title Click me to run setup { display-mode: "form" }
try:
  if PYPROBML_SETUP_ALREADY_RUN:
    print('skipping setup')
except:
  PYPROBML_SETUP_ALREADY_RUN = True
  print('running setup...')
  !git clone --depth 1 https://github.com/probml/pyprobml  /pyprobml &> /dev/null 
  %cd -q /pyprobml/scripts
  %reload_ext autoreload 
  %autoreload 2
  !pip install superimport deimport -qqq
  import superimport
  from deimport.deimport import deimport
  print('finished!')

<img src="https://raw.githubusercontent.com/probml/pml-book/main/pml1/figures/images/Figure_10.3.png" width="256"/>

## Figure 10.4:<a name='10.4'></a> <a name='logregPoly'></a> 


 Polynomial feature expansion applied to a two-class, two-dimensional logistic regression problem. (a) Degree $K=1$. (b) Degree $K=2$. (c) Degree $K=4$. (d) Train and test error vs degree.  
Figure(s) generated by [logreg_poly_demo.py](https://github.com/probml/pyprobml/blob/master/scripts/logreg_poly_demo.py) 
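
To make the feature expansion concrete, here is a sketch of a degree-$K$ polynomial expansion for 2d inputs. The helper `poly_features` is hypothetical and written for illustration; it is not the routine used in `logreg_poly_demo.py`.

```python
import itertools

import numpy as np


def poly_features(X, degree):
    """All monomials of the columns of X with total degree 1..degree (no bias column)."""
    n_features = X.shape[1]
    cols = []
    for k in range(1, degree + 1):
        # Each index tuple picks k columns (with repetition) to multiply together.
        for idx in itertools.combinations_with_replacement(range(n_features), k):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)


X = np.array([[2.0, 3.0]])
print(poly_features(X, 2))  # columns: x1, x2, x1^2, x1*x2, x2^2

# The number of features grows quickly with the degree.
for K in (1, 2, 4):
    print(K, poly_features(X, K).shape[1])
```

The rapid growth in the number of features is one reason higher degrees can overfit, as panel (d) shows.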

In [None]:
#@title Click me to run setup { display-mode: "form" }
try:
  if PYPROBML_SETUP_ALREADY_RUN:
    print('skipping setup')
except:
  PYPROBML_SETUP_ALREADY_RUN = True
  print('running setup...')
  !git clone --depth 1 https://github.com/probml/pyprobml  /pyprobml &> /dev/null 
  %cd -q /pyprobml/scripts
  %reload_ext autoreload 
  %autoreload 2
  !pip install superimport deimport -qqq
  import superimport
  from deimport.deimport import deimport
  print('finished!')

In [None]:
deimport(superimport)
%run logreg_poly_demo.py

## Figure 10.5:<a name='10.5'></a> <a name='irisLossSurface'></a> 


NLL loss surface for binary logistic regression applied to the Iris dataset with 1 feature and 1 bias term. The goal is to minimize the function.  
Figure(s) generated by [iris_logreg_loss_surface.py](https://github.com/probml/pyprobml/blob/master/scripts/iris_logreg_loss_surface.py) 
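
The surface plotted here is the negative log-likelihood as a function of the weight $w$ and bias $b$. The sketch below evaluates it on a grid, using a small hypothetical 1d dataset in place of the Iris feature; the grid ranges are also arbitrary.

```python
import numpy as np

# Hypothetical 1d data standing in for a single Iris feature.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 1, 0, 1, 1])


def nll(w, b):
    """Average negative log-likelihood of the labels under sigmoid(w * x + b)."""
    a = w * x + b
    # -[y log s + (1 - y) log(1 - s)] simplifies to log(1 + exp(a)) - y * a.
    return np.mean(np.logaddexp(0.0, a) - y * a)


ws = np.linspace(-3.0, 5.0, 81)
bs = np.linspace(-4.0, 4.0, 81)
surface = np.array([[nll(w, b) for w in ws] for b in bs])

# Location of the smallest grid value (a rough stand-in for the MLE).
i, j = np.unravel_index(np.argmin(surface), surface.shape)
print("approximate minimizer: w =", ws[j], "b =", bs[i])
```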

In [None]:
#@title Click me to run setup { display-mode: "form" }
try:
  if PYPROBML_SETUP_ALREADY_RUN:
    print('skipping setup')
except:
  PYPROBML_SETUP_ALREADY_RUN = True
  print('running setup...')
  !git clone --depth 1 https://github.com/probml/pyprobml  /pyprobml &> /dev/null 
  %cd -q /pyprobml/scripts
  %reload_ext autoreload 
  %autoreload 2
  !pip install superimport deimport -qqq
  import superimport
  from deimport.deimport import deimport
  print('finished!')

In [None]:
deimport(superimport)
%run iris_logreg_loss_surface.py

## Figure 10.6:<a name='10.6'></a> <a name='logregPolyRidge'></a> 


Weight decay with variance $C$ applied to a two-class, two-dimensional logistic regression problem with a degree 4 polynomial. (a) $C=1$. (b) $C=316$. (c) $C=100{,}000$. (d) Train and test error vs $C$.  
Figure(s) generated by [logreg_poly_demo.py](https://github.com/probml/pyprobml/blob/master/scripts/logreg_poly_demo.py) 
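
A rough sketch of how $C$ controls the amount of shrinkage, assuming the objective is the summed NLL plus $\lVert\boldsymbol{w}\rVert^2/(2C)$, so that small $C$ means strong regularization. The function `fit_l2_logreg`, the synthetic data, and the step-size settings are hypothetical and are not taken from `logreg_poly_demo.py`.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def fit_l2_logreg(X, y, C, lr=0.1, n_steps=2000):
    """Gradient descent on sum-NLL + ||w||^2 / (2C); no bias term, for brevity."""
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        grad = X.T @ (sigmoid(X @ w) - y) + w / C
        w -= lr * grad / len(y)
    return w


# Hypothetical data: two overlapping Gaussian blobs.
rng = np.random.default_rng(0)
y = np.repeat([0.0, 1.0], 50)
X = rng.normal(size=(100, 2)) + np.where(y[:, None] == 1.0, 1.0, -1.0)

Cs = (0.01, 1.0, 1e5)
norms = [np.linalg.norm(fit_l2_logreg(X, y, C)) for C in Cs]
for C, nrm in zip(Cs, norms):
    print(f"C = {C:g}: ||w|| = {nrm:.3f}")
```

Larger $C$ lets the weights grow, which is what allows the increasingly wiggly boundaries in panels (a) to (c).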

In [None]:
#@title Click me to run setup { display-mode: "form" }
try:
  if PYPROBML_SETUP_ALREADY_RUN:
    print('skipping setup')
except:
  PYPROBML_SETUP_ALREADY_RUN = True
  print('running setup...')
  !git clone --depth 1 https://github.com/probml/pyprobml  /pyprobml &> /dev/null 
  %cd -q /pyprobml/scripts
  %reload_ext autoreload 
  %autoreload 2
  !pip install superimport deimport -qqq
  import superimport
  from deimport.deimport import deimport
  print('finished!')

In [None]:
deimport(superimport)
%run logreg_poly_demo.py

## Figure 10.7:<a name='10.7'></a> <a name='logregMultinom3class'></a> 


 Example of 3-class logistic regression with 2d inputs. (a) Original features. (b) Quadratic features.  
Figure(s) generated by [logreg_multiclass_demo.py](https://github.com/probml/pyprobml/blob/master/scripts/logreg_multiclass_demo.py) 
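
For reference, the prediction step of a 3-class logistic regression model can be sketched as a softmax over per-class linear scores. The weight matrix `W`, bias `b`, and inputs `X` below are hypothetical, not fitted values from `logreg_multiclass_demo.py`.

```python
import numpy as np


def softmax(A):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    A = A - A.max(axis=1, keepdims=True)
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)


# Hypothetical parameters: one weight row per class, 2d inputs.
W = np.array([[2.0, 0.0], [-1.0, 1.5], [-1.0, -1.5]])
b = np.zeros(3)

X = np.array([[1.0, 0.0], [-1.0, 1.0], [-1.0, -1.0]])
P = softmax(X @ W.T + b)  # P[n, c] = p(y = c | x_n)

print(P.round(3))
print("predicted classes:", P.argmax(axis=1))
```

With the original features the class regions are separated by straight lines; replacing `X` by quadratic features of `X` gives the curved regions of panel (b).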

In [None]:
#@title Click me to run setup { display-mode: "form" }
try:
  if PYPROBML_SETUP_ALREADY_RUN:
    print('skipping setup')
except:
  PYPROBML_SETUP_ALREADY_RUN = True
  print('running setup...')
  !git clone --depth 1 https://github.com/probml/pyprobml  /pyprobml &> /dev/null 
  %cd -q /pyprobml/scripts
  %reload_ext autoreload 
  %autoreload 2
  !pip install superimport deimport -qqq
  import superimport
  from deimport.deimport import deimport
  print('finished!')

In [None]:
deimport(superimport)
%run logreg_multiclass_demo.py

## Figure 10.8:<a name='10.8'></a> <a name='labelTree'></a> 


A simple example of a label hierarchy. Nodes within the same ellipse have a mutual exclusion relationship between them.

In [None]:
#@title Click me to run setup { display-mode: "form" }
try:
  if PYPROBML_SETUP_ALREADY_RUN:
    print('skipping setup')
except:
  PYPROBML_SETUP_ALREADY_RUN = True
  print('running setup...')
  !git clone --depth 1 https://github.com/probml/pyprobml  /pyprobml &> /dev/null 
  %cd -q /pyprobml/scripts
  %reload_ext autoreload 
  %autoreload 2
  !pip install superimport deimport -qqq
  import superimport
  from deimport.deimport import deimport
  print('finished!')

<img src="https://raw.githubusercontent.com/probml/pml-book/main/pml1/figures/images/Figure_10.8.png" width="256"/>

## Figure 10.9:<a name='10.9'></a> <a name='hierSoftmax'></a> 


 A flat and hierarchical softmax model $p(w|C)$, where $C$ are the input features (context) and $w$ is the output label (word). Adapted from  https://www.quora.com/What-is-hierarchical-softmax 
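
One common variant of hierarchical softmax places the words at the leaves of a binary tree and makes a sigmoid decision at each internal node, so that $p(w|C)$ is a product of binary probabilities along the path to $w$. The sketch below assumes that variant with a hypothetical 4-word tree; the context vector `c`, node weights `V`, and `paths` are all made up for illustration.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


rng = np.random.default_rng(0)
c = rng.normal(size=3)       # context features C (hypothetical)
V = rng.normal(size=(3, 3))  # one weight vector per internal node: root, left, right

# Path to each word: (internal node index, direction), +1 = go left, -1 = go right.
paths = {
    "w0": [(0, +1), (1, +1)],
    "w1": [(0, +1), (1, -1)],
    "w2": [(0, -1), (2, +1)],
    "w3": [(0, -1), (2, -1)],
}


def word_prob(word):
    """Product of the binary decision probabilities along the path to `word`."""
    return float(np.prod([sigmoid(d * (V[n] @ c)) for n, d in paths[word]]))


probs = {w: word_prob(w) for w in paths}
print(probs)
print("total probability:", sum(probs.values()))
```

Because $\sigma(a) + \sigma(-a) = 1$ at every node, the leaf probabilities sum to one without computing a normalizer over all words.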

In [None]:
#@title Click me to run setup { display-mode: "form" }
try:
  if PYPROBML_SETUP_ALREADY_RUN:
    print('skipping setup')
except:
  PYPROBML_SETUP_ALREADY_RUN = True
  print('running setup...')
  !git clone --depth 1 https://github.com/probml/pyprobml  /pyprobml &> /dev/null 
  %cd -q /pyprobml/scripts
  %reload_ext autoreload 
  %autoreload 2
  !pip install superimport deimport -qqq
  import superimport
  from deimport.deimport import deimport
  print('finished!')

<img src="https://raw.githubusercontent.com/probml/pml-book/main/pml1/figures/images/Figure_10.9_A.png" width="256"/>

<img src="https://raw.githubusercontent.com/probml/pml-book/main/pml1/figures/images/Figure_10.9_B.png" width="256"/>

## Figure 10.10:<a name='10.10'></a> <a name='logregRobust'></a> 


(a) Logistic regression on some data with outliers (denoted by x). Training points have been (vertically) jittered to avoid overlapping too much. The vertical line is the decision boundary, shown together with its posterior credible interval. (b) Same as (a) but using a robust model with a mixture likelihood. Adapted from Figure 4.13 of <a href='#Martin2018'>[Mar18]</a>.  
Figure(s) generated by [logreg_iris_bayes_robust_1d_pymc3.py](https://github.com/probml/pyprobml/blob/master/scripts/logreg_iris_bayes_robust_1d_pymc3.py) 
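
To see why a mixture likelihood is robust, the sketch below assumes the mixture combines the usual logistic prediction with a uniform guess of 0.5, weighted by a mixing proportion `pi`. This is a plain NumPy illustration, not the PyMC3 model in the script, and the value of `pi` is hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def robust_prob(a, pi):
    """p(y=1|x) under the assumed mixture: pi * 0.5 + (1 - pi) * sigmoid(a)."""
    return pi * 0.5 + (1.0 - pi) * sigmoid(a)


a = np.array([-10.0, 0.0, 10.0])  # logits w * x + b at three inputs
pi = 0.1                          # hypothetical mixing proportion

# Loss incurred at each input if the true label is y = 1.
nll_standard = -np.log(sigmoid(a))
nll_robust = -np.log(robust_prob(a, pi))

print(nll_standard)  # grows without bound as the logit becomes more wrong
print(nll_robust)    # never exceeds -log(pi / 2)
```

An outlier deep on the wrong side of the boundary therefore contributes a bounded penalty, so it pulls the decision boundary much less.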

In [None]:
#@title Click me to run setup { display-mode: "form" }
try:
  if PYPROBML_SETUP_ALREADY_RUN:
    print('skipping setup')
except:
  PYPROBML_SETUP_ALREADY_RUN = True
  print('running setup...')
  !git clone --depth 1 https://github.com/probml/pyprobml  /pyprobml &> /dev/null 
  %cd -q /pyprobml/scripts
  %reload_ext autoreload 
  %autoreload 2
  !pip install superimport deimport -qqq
  import superimport
  from deimport.deimport import deimport
  print('finished!')

In [None]:
deimport(superimport)
%run logreg_iris_bayes_robust_1d_pymc3.py

## Figure 10.11:<a name='10.11'></a> <a name='bitemperedLoss'></a> 


(a) Illustration of logistic and tempered logistic loss with $t_1=0.8$. (b) Illustration of sigmoid and tempered sigmoid transfer function with $t_2=2.0$. From  https://ai.googleblog.com/2019/08/bi-tempered-logistic-loss-for-training.html . Used with kind permission of Ehsan Amid.
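
The two building blocks behind these curves are the tempered logarithm $\log_t(x) = (x^{1-t}-1)/(1-t)$ and the tempered exponential $\exp_t(x) = [1+(1-t)x]_+^{1/(1-t)}$, which reduce to the ordinary $\log$ and $\exp$ as $t \to 1$. The sketch below covers only these two functions, not the full bi-tempered loss; the helper names `log_t` and `exp_t` are chosen here for illustration.

```python
import numpy as np


def log_t(x, t):
    """Tempered logarithm; equals log(x) when t == 1."""
    if t == 1.0:
        return np.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)


def exp_t(x, t):
    """Tempered exponential; equals exp(x) when t == 1."""
    if t == 1.0:
        return np.exp(x)
    return np.maximum(0.0, 1.0 + (1.0 - t) * x) ** (1.0 / (1.0 - t))


# With t1 = 0.8 < 1 the log is bounded below by -1 / (1 - t1) = -5,
# so the loss on a badly misclassified point is bounded.
print(log_t(1e-12, 0.8), np.log(1e-12))

# With t2 = 2.0 > 1 the exponential has a heavier tail than exp.
print(exp_t(-5.0, 2.0), np.exp(-5.0))

# exp_t inverts log_t for the same temperature.
print(exp_t(log_t(0.3, 0.8), 0.8))
```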

In [None]:
#@title Click me to run setup { display-mode: "form" }
try:
  if PYPROBML_SETUP_ALREADY_RUN:
    print('skipping setup')
except:
  PYPROBML_SETUP_ALREADY_RUN = True
  print('running setup...')
  !git clone --depth 1 https://github.com/probml/pyprobml  /pyprobml &> /dev/null 
  %cd -q /pyprobml/scripts
  %reload_ext autoreload 
  %autoreload 2
  !pip install superimport deimport -qqq
  import superimport
  from deimport.deimport import deimport
  print('finished!')

<img src="https://raw.githubusercontent.com/probml/pml-book/main/pml1/figures/images/Figure_10.11_A.png" width="256"/>

<img src="https://raw.githubusercontent.com/probml/pml-book/main/pml1/figures/images/Figure_10.11_B.png" width="256"/>

## Figure 10.12:<a name='10.12'></a> <a name='bitempered'></a> 


Illustration of standard and bi-tempered logistic regression on data with label noise. From  https://ai.googleblog.com/2019/08/bi-tempered-logistic-loss-for-training.html . Used with kind permission of Ehsan Amid.

In [None]:
#@title Click me to run setup { display-mode: "form" }
try:
  if PYPROBML_SETUP_ALREADY_RUN:
    print('skipping setup')
except:
  PYPROBML_SETUP_ALREADY_RUN = True
  print('running setup...')
  !git clone --depth 1 https://github.com/probml/pyprobml  /pyprobml &> /dev/null 
  %cd -q /pyprobml/scripts
  %reload_ext autoreload 
  %autoreload 2
  !pip install superimport deimport -qqq
  import superimport
  from deimport.deimport import deimport
  print('finished!')

<img src="https://raw.githubusercontent.com/probml/pml-book/main/pml1/figures/images/Figure_10.12.png" width="256"/>

## Figure 10.13:<a name='10.13'></a> <a name='logregLaplaceGirolamiPost'></a> 


(a) Illustration of the data. (b) Log-likelihood for a logistic regression model. The line is drawn from the origin in the direction of the MLE (which is at infinity). The numbers correspond to 4 points in parameter space, corresponding to the lines in (a). (c) Unnormalized log posterior (assuming vague spherical prior). (d) Laplace approximation to posterior. Adapted from a figure by Mark Girolami.  
Figure(s) generated by [logreg_laplace_demo.py](https://github.com/probml/pyprobml/blob/master/scripts/logreg_laplace_demo.py) 
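
The Laplace approximation in panel (d) is a Gaussian centered at the MAP estimate whose covariance is the inverse Hessian of the negative log posterior at that point. The sketch below shows one way to compute it with Newton's method, assuming a spherical Gaussian prior; the synthetic data, `w_true`, and `prior_var` are hypothetical and differ from what `logreg_laplace_demo.py` uses.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


# Hypothetical data with noisy labels, so the classes are not separable.
rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 2)) + 1.0
w_true = np.array([1.0, -1.0])
y = (rng.uniform(size=n) < sigmoid(X @ w_true)).astype(float)
prior_var = 100.0  # vague spherical prior N(0, prior_var * I)


def grad_and_hessian(w):
    """Gradient and Hessian of the negative log posterior at w."""
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) + w / prior_var
    hess = (X.T * (p * (1.0 - p))) @ X + np.eye(2) / prior_var
    return grad, hess


# Newton's method (no line search) to find the MAP estimate.
w_map = np.zeros(2)
for _ in range(25):
    grad, hess = grad_and_hessian(w_map)
    w_map = w_map - np.linalg.solve(hess, grad)

# Laplace approximation: p(w | D) is approximated by N(w_map, Sigma).
grad, hess = grad_and_hessian(w_map)
Sigma = np.linalg.inv(hess)
print("w_map =", w_map)
print("Sigma =\n", Sigma)
```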

In [None]:
#@title Click me to run setup { display-mode: "form" }
try:
  if PYPROBML_SETUP_ALREADY_RUN:
    print('skipping setup')
except:
  PYPROBML_SETUP_ALREADY_RUN = True
  print('running setup...')
  !git clone --depth 1 https://github.com/probml/pyprobml  /pyprobml &> /dev/null 
  %cd -q /pyprobml/scripts
  %reload_ext autoreload 
  %autoreload 2
  !pip install superimport deimport -qqq
  import superimport
  from deimport.deimport import deimport
  print('finished!')

In [None]:
deimport(superimport)
%run logreg_laplace_demo.py

## Figure 10.14:<a name='10.14'></a> <a name='logregLaplaceDemoPred'></a> 


Posterior predictive distribution for a logistic regression model in 2d. (a): contours of $p(y=1 \mid \boldsymbol{x}, \boldsymbol{w}_{\mathrm{map}})$. (b): samples from the posterior predictive distribution. (c): Averaging over these samples. (d): moderated output (probit approximation). Adapted from a figure by Mark Girolami.  
Figure(s) generated by [logreg_laplace_demo.py](https://github.com/probml/pyprobml/blob/master/scripts/logreg_laplace_demo.py) 
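
Panels (a), (c), and (d) correspond to three ways of computing a prediction at a single input. The sketch below compares them, assuming a Gaussian posterior $\mathcal{N}(\boldsymbol{w}_{\mathrm{map}}, \boldsymbol{\Sigma})$ and the standard probit approximation $\sigma\big(m/\sqrt{1+\pi v/8}\big)$ with $m=\boldsymbol{w}_{\mathrm{map}}^{\mathsf T}\boldsymbol{x}$ and $v=\boldsymbol{x}^{\mathsf T}\boldsymbol{\Sigma}\boldsymbol{x}$. The posterior parameters and test point are hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


# Hypothetical Gaussian (Laplace) posterior and test input.
w_map = np.array([2.0, -1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
x = np.array([1.0, 0.5])

m = w_map @ x      # mean of the logit
v = x @ Sigma @ x  # variance of the logit

# (a) Plug-in prediction using the MAP estimate only.
p_plugin = sigmoid(m)

# (c) Monte Carlo average over posterior samples of w.
rng = np.random.default_rng(0)
W = rng.multivariate_normal(w_map, Sigma, size=100_000)
p_mc = sigmoid(W @ x).mean()

# (d) Moderated output via the probit approximation.
p_probit = sigmoid(m / np.sqrt(1.0 + np.pi * v / 8.0))

print(p_plugin, p_mc, p_probit)
```

Both the Monte Carlo and probit predictions are pulled toward 0.5 relative to the plug-in value, reflecting uncertainty in the weights.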

In [None]:
#@title Click me to run setup { display-mode: "form" }
try:
  if PYPROBML_SETUP_ALREADY_RUN:
    print('skipping setup')
except:
  PYPROBML_SETUP_ALREADY_RUN = True
  print('running setup...')
  !git clone --depth 1 https://github.com/probml/pyprobml  /pyprobml &> /dev/null 
  %cd -q /pyprobml/scripts
  %reload_ext autoreload 
  %autoreload 2
  !pip install superimport deimport -qqq
  import superimport
  from deimport.deimport import deimport
  print('finished!')

In [None]:
deimport(superimport)
%run logreg_laplace_demo.py

## Figure 10.15:<a name='10.15'></a> <a name='ridgeLassoOLS'></a> 


(a) Data for logistic regression question. (b) Plot of $w_k$ vs amount of correlation $c_k$ for three different estimators.

In [None]:
#@title Click me to run setup { display-mode: "form" }
try:
  if PYPROBML_SETUP_ALREADY_RUN:
    print('skipping setup')
except:
  PYPROBML_SETUP_ALREADY_RUN = True
  print('running setup...')
  !git clone --depth 1 https://github.com/probml/pyprobml  /pyprobml &> /dev/null 
  %cd -q /pyprobml/scripts
  %reload_ext autoreload 
  %autoreload 2
  !pip install superimport deimport -qqq
  import superimport
  from deimport.deimport import deimport
  print('finished!')

<img src="https://raw.githubusercontent.com/probml/pml-book/main/pml1/figures/images/Figure_10.15_A.png" width="256"/>

<img src="https://raw.githubusercontent.com/probml/pml-book/main/pml1/figures/images/Figure_10.15_B.png" width="256"/>

## References:
 <a name='MacKay03'>[Mac03]</a> D. MacKay "Information Theory, Inference, and Learning Algorithms". (2003). 

<a name='Martin2018'>[Mar18]</a> O. Martin "Bayesian analysis with Python". (2018). 

