Commit

some updates on binary classification
karlnapf committed Sep 23, 2013
1 parent d8fe94e commit 8051c74
Showing 1 changed file with 49 additions and 1 deletion.
@@ -38,7 +38,7 @@
    "where $\\mathbf{f}|\\boldsymbol{\\theta}\\sim\\mathcal{N}(\\mathbf{m}_\\boldsymbol{\\theta}, \\mathbf{C}_\\boldsymbol{\\theta})$ is the joint Gaussian distribution for the GP variables, with mean $\\mathbf{m}_\\boldsymbol{\\theta}$ and covariance $\\mathbf{C}_\\boldsymbol{\\theta}$. The $(i,j)$-th entry of $\\mathbf{C}_\\boldsymbol{\\theta}$ is given by the covariance or kernel function evaluated at the $i$-th and $j$-th covariates, $k(\\mathbf{x}_i, \\mathbf{x}_j)$.\n",
"\n",
"\n",
    "Mean and covariance both depend on hyper-parameters coming from a prior distribution $\\boldsymbol{\\theta}\\sim p(\\boldsymbol{\\theta})$. The data itself, $\\mathbf{y}\\in \\mathcal{Y}^n$ (no assumptions on $\\mathcal{Y}$ for now), is modelled by a likelihood function $p(\\mathbf{y}|\\mathbf{f})$, which gives the probability of the data $\\mathbf{y}$ given a state of the latent Gaussian variables $\\mathbf{f}$, i.e. $p(\\mathbf{y}|\\mathbf{f}):\\mathcal{Y}^n\\rightarrow [0,1]$.\n",
"\n",
"\n",
    "TODO: Write about integrals needed for inference (for fixed hyper-parameters). General form, no specific likelihood yet\n",
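    "\n",
    "\n",
    "As a sketch of the general form (for fixed hyper-parameters $\\boldsymbol{\\theta}$, using only the quantities defined above): inference requires integrating out the latent GP variables, e.g. in the marginal likelihood\n",
    "\n",
    "$p(\\mathbf{y}|\\boldsymbol{\\theta})=\\int p(\\mathbf{y}|\\mathbf{f})p(\\mathbf{f}|\\boldsymbol{\\theta})d\\mathbf{f},$\n",
    "\n",
    "which is analytically intractable for non-Gaussian likelihoods and then has to be approximated.\n",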
@@ -69,6 +69,54 @@
"Non-Linear, Binary Bayesian Classification"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In binary classification, the observed data comes from a space of discrete, binary labels, i.e. $\\mathbf{y}\\in\\mathcal{Y}^n=\\{-1,+1\\}^n$. To model these observations with a GP, we need a likelihood function $p(\\mathbf{y}|\\mathbf{f})$ that maps a set of such discrete observations to a probability, given a fixed response $\\mathbf{f}$ of the Gaussian Process.\n",
"\n",
    "In regression, this was straightforward: we could simply use the response variable $\\mathbf{f}$ itself, plus some Gaussian noise, which gave rise to a probability distribution. However, now that the $\\mathbf{y}$ are discrete, we cannot do the same thing. We rather need a function that squashes the Gaussian response variable to a probability, given some data. This is a common problem in Machine Learning and Statistics and is usually solved with some sort of *Sigmoid* function of the form $\\sigma:\\mathbb{R}\\rightarrow[0,1]$. One popular choice for such a function is the *Logit* likelihood, given by\n",
"\n",
    "$p(\\mathbf{y}|\\mathbf{f})=\\prod_{i=1}^n p(y_i|f_i)=\\prod_{i=1}^n \\frac{1}{1+\\exp(-y_i f_i)}.$\n",
"\n",
    "This likelihood is implemented in Shogun under <a href=\\\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CLogitLikelihood.html\\\">CLogitLikelihood</a>. We can easily use the class to illustrate the sigmoid function for a 1D example and a fixed data point with label $+1$."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from modshogun import LogitLikelihood\n",
"from modshogun import BinaryLabels\n",
    "\n",
    "# the cell assumes pylab inline mode; otherwise, import explicitly:\n",
    "from numpy import linspace, array, zeros, exp\n",
    "from matplotlib.pyplot import plot, xlabel, ylabel, title\n",
"\n",
"lik=LogitLikelihood()\n",
"\n",
    "# A range of values for the latent Gaussian response variable, 1-dimensional here\n",
"F=linspace(-10.0,10.0)\n",
"\n",
"# Single observation label with +1\n",
"lab=BinaryLabels(array([1.0]))\n",
"\n",
"# compute log-likelihood for all values in F\n",
"log_liks=zeros(len(F))\n",
"for i in range(len(F)):\n",
" # Shogun expects a 1D array for f, not a single number\n",
" f=array(F[i]).reshape(1,)\n",
" log_liks[i]=lik.get_log_probability_f(lab, f)\n",
" \n",
"# in fact, loops are slow and Shogun offers a method to compute the likelihood for many f. Much faster!\n",
"log_liks=lik.get_log_probability_fmatrix(lab, F.reshape(1,len(F)))\n",
"\n",
"# plot the sigmoid function itself, note that Shogun computes it in log-domain, so we have to exponentiate\n",
"plot(F, exp(log_liks))\n",
"ylabel(\"$p(y_i|f_i)$\")\n",
"xlabel(\"$f_i$\")\n",
"_=title(\"Logit Likelihood\")"
],
"language": "python",
"metadata": {},
"outputs": []
},
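{
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "As a quick sanity check (a sketch in plain NumPy, independent of Shogun's implementation), note that the Logit likelihood assigns complementary probabilities to the two possible labels, $p(y_i=+1|f_i)+p(y_i=-1|f_i)=1$ for every $f_i$:"
    ]
},
{
    "cell_type": "code",
    "collapsed": false,
    "input": [
     "# plain NumPy sketch of the Logit likelihood, for illustration only\n",
     "from numpy import linspace, exp, allclose\n",
     "\n",
     "def logit_likelihood(y, f):\n",
     "    return 1.0/(1.0+exp(-y*f))\n",
     "\n",
     "f=linspace(-10.0,10.0)\n",
     "\n",
     "# probabilities of the two labels sum to one for every f\n",
     "allclose(logit_likelihood(1.0, f)+logit_likelihood(-1.0, f), 1.0)"
    ],
    "language": "python",
    "metadata": {},
    "outputs": []
},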
{
"cell_type": "markdown",
"metadata": {},