In [None]:
# Copyright 2021 Google LLC
# Use of this source code is governed by an MIT-style
# license that can be found in the LICENSE file or at
# https://opensource.org/licenses/MIT.

# Author(s): Kevin P. Murphy (murphyk@gmail.com) and Mahmoud Soliman (mjs@aucegypt.edu)

<a href="https://opensource.org/licenses/MIT" target="_parent"><img src="https://img.shields.io/github/license/probml/pyprobml"/></a>

<a href="https://colab.research.google.com/github/probml/pyprobml/blob/master/notebooks/figures//chapter6_figures.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Cloning the pyprobml repo

In [None]:
!git clone https://github.com/probml/pyprobml 
%cd pyprobml/scripts

# Installing required software (this may take a few minutes)

In [None]:
!apt-get install octave  -qq > /dev/null
!apt-get install liboctave-dev -qq > /dev/null

In [None]:
%%capture
%load_ext autoreload 
%autoreload 2
DISCLAIMER = 'WARNING : Editing in VM - changes lost after reboot!!'
from google.colab import files

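def interactive_script(script, i=True):
  # When i is True, prepend the disclaimer banner (only once) and open the script in the Colab viewer.
  if i:
    with open(script) as f:
      s = f.read()
    if s.split('\n', 1)[0] != '## ' + DISCLAIMER:
      with open(script, 'w') as f:
        f.write(f'## {DISCLAIMER}\n' + '#' * (len(DISCLAIMER) + 3) + '\n\n' + s)
    files.view(script)
  # Run the script in both the interactive and non-interactive case.
  %run $script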

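def show_image(img_path):
  from google.colab.patches import cv2_imshow
  import cv2
  img = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)
  if img is None:  # cv2.imread returns None (it does not raise) when the file cannot be read
    raise FileNotFoundError(img_path)
  img = cv2.resize(img, (600, 600))  # display every figure at a fixed 600x600 size
  cv2_imshow(img)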

## Figure 6.1:<a name='6.1'></a> <a name='\pycodebernoulli\_entropy\_fig'></a> 


  Entropy of a Bernoulli random variable as a function of $\theta$. The maximum entropy is $\log_2 2 = 1$.  
Figure(s) generated by [bernoulli_entropy_fig.py](https://github.com/probml/pyprobml/blob/master/scripts/bernoulli_entropy_fig.py) 

In [None]:
interactive_script("bernoulli_entropy_fig.py")
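
The curve in this figure can also be reproduced numerically. The following is a minimal sketch, not the code in `bernoulli_entropy_fig.py`; the helper name `bernoulli_entropy` is an assumption introduced here, and it uses the convention $0 \log 0 = 0$.

```python
import numpy as np

def bernoulli_entropy(theta):
    """Entropy in bits of a Bernoulli(theta) variable, taking 0*log(0) as 0."""
    theta = np.asarray(theta, dtype=float)
    probs = np.stack([theta, 1.0 - theta])
    # Replace zeros by 1 inside the log so those terms contribute exactly 0.
    safe = np.where(probs > 0, probs, 1.0)
    return -(probs * np.log2(safe)).sum(axis=0)

thetas = np.linspace(0.0, 1.0, 101)
H = bernoulli_entropy(thetas)
theta_star = thetas[np.argmax(H)]
print(f"max entropy {H.max():.3f} bits at theta = {theta_star:.2f}")
```

On this grid the maximum is attained at $\theta = 0.5$, where the entropy is 1 bit, and the entropy falls to 0 at both endpoints.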

## Figure 6.2:<a name='6.2'></a> <a name='seqlogo'></a> 


  (a) Some aligned DNA sequences. Each row is a sequence, and each column is a location within the sequence. (b) The corresponding **position weight matrix**, represented as a sequence logo. Each column represents a probability distribution over the alphabet $\{A,C,G,T\}$ for the corresponding location in the sequence. The size of each letter is proportional to its probability. The height of column $t$ is given by $2-H_t$, where $0 \leq H_t \leq 2$ is the entropy (in bits) of the distribution $\mathbf{p}_t$. Thus deterministic distributions (with an entropy of 0, corresponding to highly conserved locations) have height 2, and uniform distributions (with an entropy of 2) have height 0.  
Figure(s) generated by [seqlogoDemo.m](https://github.com/probml/pmtk3/blob/master/demos/seqlogoDemo.m) 

In [None]:
!octave -W seqlogoDemo.m >> _
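
The column-height rule from the caption, $2 - H_t$, can be checked on a small example. This is a sketch under stated assumptions: the four sequences are a hypothetical toy alignment, not the data used by `seqlogoDemo.m`, and the helper names `position_weight_matrix` and `logo_heights` are introduced here for illustration.

```python
import numpy as np

ALPHABET = "ACGT"

def position_weight_matrix(seqs):
    """Return a (4, T) array whose column t is the empirical distribution over A, C, G, T."""
    T = len(seqs[0])
    counts = np.zeros((len(ALPHABET), T))
    for s in seqs:
        for t, ch in enumerate(s):
            counts[ALPHABET.index(ch), t] += 1
    return counts / counts.sum(axis=0, keepdims=True)

def logo_heights(pwm):
    """Column heights 2 - H_t, with the entropy H_t measured in bits."""
    safe = np.where(pwm > 0, pwm, 1.0)  # so that 0 * log2(0) contributes 0
    H = -(pwm * np.log2(safe)).sum(axis=0)
    return 2.0 - H

# Hypothetical toy alignment (rows are sequences, columns are locations).
seqs = ["ACGT", "AGGT", "ATCT", "AACT"]
pwm = position_weight_matrix(seqs)
heights = logo_heights(pwm)
print(heights)
```

Here locations 0 and 3 are fully conserved (height 2), location 1 is uniform over the four letters (height 0), and location 2 is split evenly between two letters (entropy 1 bit, height 1).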

## Figure 6.3:<a name='6.3'></a> <a name='KLreverse'></a> 


  Illustrating forwards vs reverse KL on a bimodal distribution. The blue curves are the contours of the true distribution $p$. The red curves are the contours of the unimodal approximation $q$. (a) Minimizing forwards KL, $D_{\mathrm{KL}}(p \,\|\, q)$, with respect to $q$ causes $q$ to "cover" $p$. (b-c) Minimizing reverse KL, $D_{\mathrm{KL}}(q \,\|\, p)$, with respect to $q$ causes $q$ to "lock onto" one of the two modes of $p$. Adapted from Figure 10.3 of <a href='#BishopBook'>[Bis06]</a>.  
Figure(s) generated by [KLfwdReverseMixGauss.m](https://github.com/probml/pmtk3/blob/master/demos/KLfwdReverseMixGauss.m) 

In [None]:
!octave -W KLfwdReverseMixGauss.m >> _
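
The covering and mode-locking behaviour described in the caption can be seen in one dimension as well. The sketch below is not a port of `KLfwdReverseMixGauss.m`: it assumes a target $p$ that is an equal-weight mixture of $\mathcal{N}(-4, 1)$ and $\mathcal{N}(4, 1)$, approximates both KL divergences by Riemann sums on a grid, and finds the best Gaussian $q$ by brute-force search over a grid of means and standard deviations. All names and grid settings are assumptions chosen for illustration.

```python
import numpy as np

def log_normal_pdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2

x = np.linspace(-30.0, 30.0, 6001)
dx = x[1] - x[0]

# Assumed bimodal target: 0.5 * N(-4, 1) + 0.5 * N(4, 1), kept in log space.
log_p = np.logaddexp(log_normal_pdf(x, -4.0, 1.0),
                     log_normal_pdf(x, 4.0, 1.0)) + np.log(0.5)

def kl(log_a, log_b):
    """Riemann-sum approximation of KL(a || b) on the grid x."""
    return np.sum(np.exp(log_a) * (log_a - log_b)) * dx

mus = np.linspace(-6.0, 6.0, 49)    # candidate means, step 0.25
sigmas = np.linspace(0.5, 6.0, 45)  # candidate standard deviations, step 0.125

def best_gaussian(direction):
    """Grid search for the Gaussian q minimizing forward or reverse KL."""
    best = (np.inf, None, None)
    for mu in mus:
        for sigma in sigmas:
            log_q = log_normal_pdf(x, mu, sigma)
            value = kl(log_p, log_q) if direction == "forward" else kl(log_q, log_p)
            if value < best[0]:
                best = (value, mu, sigma)
    return best

kl_f, mu_f, sigma_f = best_gaussian("forward")
kl_r, mu_r, sigma_r = best_gaussian("reverse")
print(f"forward KL: mu = {mu_f:.2f}, sigma = {sigma_f:.3f}")
print(f"reverse KL: mu = {mu_r:.2f}, sigma = {sigma_r:.3f}")
```

Under these assumptions the forward-KL solution sits between the modes with a large standard deviation (it matches the mean 0 and variance $1 + 4^2 = 17$ of $p$, up to grid resolution), whereas the reverse-KL solution sits on one of the two modes with a standard deviation close to 1. Which mode it picks is arbitrary, since the two are symmetric.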

## Figure 6.4:<a name='6.4'></a> <a name='entropy'></a> 


  The marginal entropy, joint entropy, conditional entropy and mutual information represented as information diagrams. Used with kind permission of Katie Everett. 

In [None]:
show_image("/content/pyprobml/notebooks/figures/images/ceb4.png")
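
The relationships that the information diagram depicts can be verified numerically for any discrete joint distribution. The following sketch uses a hypothetical $2 \times 2$ joint table (an assumption, not data from the book) and checks that the mutual information computed from entropies, $H(X) + H(Y) - H(X,Y)$, agrees with the direct definition.

```python
import numpy as np

def entropy_bits(p):
    """Entropy in bits of a discrete distribution, ignoring zero-probability entries."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint distribution p(x, y); rows index x, columns index y.
joint = np.array([[0.25, 0.25],
                  [0.50, 0.00]])
px = joint.sum(axis=1)  # marginal of X
py = joint.sum(axis=0)  # marginal of Y

H_xy = entropy_bits(joint)                  # joint entropy H(X, Y)
H_x, H_y = entropy_bits(px), entropy_bits(py)
H_y_given_x = H_xy - H_x                    # chain rule: H(Y|X) = H(X,Y) - H(X)
H_x_given_y = H_xy - H_y
I_xy = H_x + H_y - H_xy                     # mutual information from entropies

# Direct definition: sum over p(x,y) * log2( p(x,y) / (p(x) p(y)) ).
mask = joint > 0
I_direct = np.sum(joint[mask] * np.log2(joint[mask] / np.outer(px, py)[mask]))

print(H_x, H_y, H_xy, H_y_given_x, H_x_given_y, I_xy, I_direct)
```

For this table $H(X,Y) = 1.5$ bits and $H(X) = 1$ bit, so $H(Y|X) = 0.5$ bits; the two mutual information values coincide.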

## Figure 6.5:<a name='6.5'></a> <a name='MIC'></a> 


  Left: Correlation coefficient vs maximal information coefficient (MIC) for all pairwise relationships in the WHO data. Right: scatter plots of certain pairs of variables. The red lines are non-parametric smoothing regressions fit separately to each trend. From Figure 4 of <a href='#Reshef11'>[Res+11]</a>. Used with kind permission of David Reshef. 

In [None]:
show_image("/content/pyprobml/notebooks/figures/images/{MICfig4}.png")
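
To see why an information-based score can detect relationships that the correlation coefficient misses, consider a noiseless quadratic dependence. The sketch below does not compute MIC itself, which involves a search over grid resolutions and a normalization; it uses a plain fixed-grid histogram estimate of mutual information as a simpler stand-in. The data, bin count, and helper name `binned_mutual_information` are all assumptions made for illustration.

```python
import numpy as np

def binned_mutual_information(x, y, bins=10):
    """Histogram (plug-in) estimate of I(X; Y) in bits on a fixed bins x bins grid."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    joint = counts / counts.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return np.sum(joint[mask] * np.log2(joint[mask] / (px @ py)[mask]))

# Synthetic data: y depends deterministically, but not linearly, on x.
x = np.linspace(-1.0, 1.0, 2001)
y = x**2

r = np.corrcoef(x, y)[0, 1]
mi = binned_mutual_information(x, y)
print(f"correlation = {r:.3f}, binned MI = {mi:.3f} bits")
```

Because the data are symmetric about $x = 0$, the correlation coefficient is essentially zero, while the binned mutual information is clearly positive.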

## References:
<a name='BishopBook'>[Bis06]</a> C. Bishop. "Pattern Recognition and Machine Learning". (2006). 

<a name='Reshef11'>[Res+11]</a> D. Reshef, Y. Reshef, H. Finucane, S. Grossman, G. McVean, P. Turnbaugh, E. Lander, M. Mitzenmacher and P. Sabeti. "Detecting Novel Associations in Large Data Sets". In: Science (2011). 

