<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Exchangeable random experiments]]></title>
  <link href="http://snippyhollow.github.com/atom.xml" rel="self"/>
  <link href="http://snippyhollow.github.com/"/>
  <updated>2014-11-03T16:51:36+01:00</updated>
  <id>http://snippyhollow.github.com/</id>
  <author>
    <name><![CDATA[syhw]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[From Logistic Regression to Backprop (and Beyond)]]></title>
    <link href="http://snippyhollow.github.com/blog/2014/11/03/from-logistic-regression-to-backprop-and-beyond/"/>
    <updated>2014-11-03T16:17:00+01:00</updated>
    <id>http://snippyhollow.github.com/blog/2014/11/03/from-logistic-regression-to-backprop-and-beyond</id>
    <content type="html"><![CDATA[<p>I’ve started a <a href="https://github.com/SnippyHolloW/DL4H">repo</a> where I’d like to put some very basic but also very didactic (i.e. python+numpy and/or C) code about all things machine learning.</p>

<p>The first thing I put there is a practical session for my labmates at <a href="http://www.lscp.net/">LSCP</a>, <a href="http://nbviewer.ipython.org/github/SnippyHolloW/DL4H/blob/master/from_logistic_regression_to_deep_nets.ipynb">“From Logistic Regression to Deep Nets, a Crash Course”</a>. It’s very imperfect, but it’s “out there™”.</p>

<p>E.g. it has a very basic intro to stochastic gradient descent:
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAgIAAAFHCAYAAADTOCSOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XlclNX+B/DPsLmwKirKUiCLrLK4paaBhOZG5nZdUtPyUl7XvGW+KjNfuWXdm+bNlp9bVq65lWimhTvirrkiSgGiYSoIiMB4fn+cGEXAAZmZZ5bP+/WaF7M8M893nu71+cw55zlHJYQQICIiIotkpXQBREREpBwGASIiIgvGIEBERGTBGASIiIgsGIMAERGRBWMQICIismCKBoFRo0bBzc0NYWFhVW4zfvx4+Pv7Izw8HMeOHTNgdUREROZP0SAwcuRIbNu2rcrXExMTcfHiRaSmpuLLL7/Ea6+9ZsDqiIiIzJ+iQaBTp05o0KBBla9v3rwZI0aMAAC0a9cOt27dwrVr1wxVHhERkdkz6jECWVlZ8PLy0jz29PREZmamghURERGZF6MOAgDw8AzIKpVKoUqIiIjMj43SBTyKh4cHMjIyNI8zMzPh4eFRYTuGAyIisjS6WirIqFsE4uPj8fXXXwMAkpOT4eLiAjc3t0q3FULwpsfbe++9p3gN5n7jMeZxNpcbj7H+b7qkaIvA4MGDsWvXLly/fh1eXl54//33UVJSAgBISEhAjx49kJiYCD8/P9jb22Pp0qVKlktERGR2FA0CK1eu1LrNwoULDVAJERGRZTLqrgEyHtHR0UqXYPZ4jA2Dx1n/eIxNi0rourNBASqVSud9JkRERMZKl+c9tggQERFZMAYBIiIiC8YgQEREZMEYBIiIiCwYgwAREZEFYxAgIiKyYAwCREREFoxBgIiIyIIxCBAREVkwBgEiIiILxiBARERkwRgEiIiILBiDABERkQVjECAiIrJgDAJEREQWjEGAiIjIgjEIEBERWTCzCQJCKF0BERGR6TGbIJCTo3QFREREpsdsgsDx40pXQEREZHrMJggcPKh0BURERKbHbIJARobSFRAREZkeswkCN28qXQEREZHpMZsgcOOG0hUQERGZHrMJAmwRICIiqjmzCQJsESAiIqo5BgEiIiILZjZB4O5d4M4dpasgIiIyLWYTBHx8gLQ0pasgIiIyLWYTBPz9gdRUpasgIiIyLWYTBJo3By5fVroKIiIi02I2QaBhQyA3V+kqiIiITIvZBAEnJwYBIiKimjKrIJCXp3QVREREpoVBgIiIyIKZTRBwdmbXABERUU0pGgS2bduGwMBA+Pv7Y+7cuRVeT0pKgrOzMyIjIxEZGYkPPvigys9iiwAREVHN2Si1Y7VajbFjx2LHjh3w8PBAmzZtEB8fj6CgoHLbPfPMM9i8ebPWz3NyAv78U1/VEhERmSfFWgRSUlLg5+cHb29v2NraYtCgQdi0aVOF7YQQ1fq8pk2B9HRg40YdF0pERGTGFAsCWVlZ8PLy0jz29PREVlZWuW1UKhX279+P8PBw9OjRA2fOnKny8xo2BMaMATIz9VYyERGR2VGsa0ClUmndJioqChkZGahfvz62bt2KPn364MKFC5VuO336dPz2G3D6NBAaGo3o6GgdV0xERKSMpKQkJCUl6eWzVaK6be86lpycjOnTp2Pbtm0AgNmzZ8PKygpTpkyp8j0+Pj44cuQIGjZsWO55lUoFIQQWLJDrDXz6qV5LJyIiUlTZeU8XFOsaaN26NVJTU5Geno7i4mKsXr0a8fHx5ba5du2a5oumpKRACFEhBDzI1RX46y+9lk1ERGRWFOsasLGxwcKFC9GtWzeo1Wq8/PLLCAoKwhdffAEASEhIwLp167Bo0SLY2Nigfv36WLVq1SM/09UVuH7dENUTERGZB8W6BnSprInk8GEgIQE4ckTpioiIiPTHLLoG9IFdA0RERDVjdkGAXQNERETVZ1ZBwNERKC4GioqUroSIiMg0mFUQUKnYPUBERFQTZhUEAAYBIiKimjC7
INCiBZCcrHQVREREpsHsgsCoUcCKFUpXQUREZBrMah4BACgsBNzcgCtX5OBBIiIic8N5BB6hfn3ZPfCIhQqJiIjob2YXBAAgIACoYpFCIiIieoBZBgF/f7kKIRERET2aWQaBiAjg0CGlqyAiIjJ+ZjdYEJDTDPv6AjduANbWChZGRESkBxwsqEWjRoC9PZCdrXQlRERExs0sgwAA+PgAly8rXQUREZFxM9sg4O0NpKcrXQUREZFxM9sg4O8PnD+vdBVERETGzWyDQKtW8sqBmzcB0x8OSUREpB9medUAAFy9CjRrJu//8gsQE6NAYURERHrAqwaqoWnT+/cLC5Wrg4iIyJiZbRAAAA8P+ZdBgIiIqHJmHQT++AN48UUGASIioqqYdRCwspITC925o3QlRERExsmsgwAA1KvHFgEiIqKqmH0QqF+fLQJERERVMfsgwBYBIiKiqpl9EGCLABERUdUsIgiwRYCIiKhyZh8E6tVjiwAREVFVzD4I1K8P5OUpXQUREZFxMvsgEBoq1xpYv17pSoiIiIyP2S469KDoaGDXLiArC3B3N1xdRERE+qDLRYdsqrNRVlYW0tPToVarIYSASqVC586ddVKAISxaBAQHA+fPMwgQERE9SGuLwJQpU7B69WoEBwfD2tpa8/wPP/yg9+KqqzrJaPRoYOdOoEMH4Kuv5CBCIiIiU6TLFgGtQSAgIACnTp1CnTp1dLJDfajOATl6FNi+HVi1CvjgA6BXLwMVR0REpGMG7Rrw9fVFcXGxUQeB6oiKkrdLl4DMTKWrISIiMg5ag0C9evUQERGB2NhYTRhQqVRYsGCB3ovTB3d3OWiQiIiIqhEE4uPjER8fD5VKBQCawYK6sG3bNkycOBFqtRqvvPIKpkyZUmGb8ePHY+vWrahfvz6WLVuGyMjIWu3TwwNITq7VRxAREZkNrUHgpZdewt27d3HhwgUAQGBgIGxtbWu9Y7VajbFjx2LHjh3w8PBAmzZtEB8fj6CgIM02iYmJuHjxIlJTU3Hw4EG89tprSK7lWdzTE/jjj9pWT0REZB60TiiUlJSEgIAA/Otf/8K//vUv+Pv7Y9euXbXecUpKCvz8/ODt7Q1bW1sMGjQImzZtKrfN5s2bMWLECABAu3btcOvWLVy7dq1W+w0OBk6fBq5ckV0Epj+LAhER0ePT2iLw+uuvY/v27WjRogUA4MKFCxg0aBCOHj1aqx1nZWXBy8tL89jT0xMHDx7Uuk1mZibc3Nwee79PPAHcuAH4+gJFRfK5jRuB559/7I8kIiIyWVqDQGlpqSYEAPJywtLS0lrvuLrjDB6+PKK24xNUKmDOHODmTaB5c+Cll7g6IRERWS6tQaBVq1Z45ZVX8OKLL0IIgW+//RatW7eu9Y49PDyQkZGheZyRkQFPT89HbpOZmQkPD49KP2/69Oma+9HR0YiOjq5y3xMn3r+/cSNQXFyz2omIiAwpKSkJSUlJevlsrRMKFRUV4X//+x/27dsHAOjUqRPGjBlT63kFyloadu7cCXd3d7Rt2xYrV66sMFhw4cKFSExMRHJyMiZOnFjpYMHaTKzw2mtAWBgwZsxjfxUiIiKDMuiEQnXr1sXkyZMxefJknexQs2MbGyxcuBDdunWDWq3Gyy+/jKCgIHzxxRcAgISEBPTo0QOJiYnw8/ODvb09li5dqtMaAMDenl0DRERkuapsERgwYADWrl2L0NDQCv3yKpUKJ0+eNEiB1VGbZDRtGmBtDbz3no6LIiIi0hODtAjMnz8fALBlyxadD9gzJvb2wF9/KV0FERGRMqqcR8D97/V6P/vsM3h7e5e7ffbZZwYrUN/s7YGCAqWrICIiUobWCYW2b99e4bnExES9FKMEjhEgIiJLVmXXwKJFi/DZZ58hLS0NYWFhmudv376Njh07GqQ4Q2CLABERWbIqg8CQIUPQvXt3vPXWW5g7d65mnICjoyNcXV0NVqC+2dsDt28rXQUREZEytM4jcODAAYSEhMDJ
yQkAkJeXh7Nnz6Jdu3YGKbA6ajN68soVIDwc2L8f8PfXcWFERER6oMurBrSOEXjttdfg4OCgeWxvb49XX31VJzs3Bu7uQOfOwLFjSldCRERkeFqDAABYWd3fzNraGmq1Wm8FKSE4GDhzRukqiIiIDE9rEPDx8cGCBQtQUlKC4uJizJ8/H82bNzdEbQbz7LPAokVAWprSlRARERmW1iDw+eefY9++ffDw8ICnpyeSk5Px5ZdfGqI2g3nmGeCdd4AXXlC6EiIiIsPSOljQFOhi0IRaDdjayr9mNHEiERGZIYMuOvTnn3/iq6++Qnp6OkpLSzUFLFmyRCcFGAtra8DODrh7F6hbV+lqiIiIDENrEHj++efRuXNnxMXFaQYNmtNaAw+qX1/OMsggQERElkJrELhz5w7mzp1riFoUV6+eDAINGypdCRERkWFoHSzYq1cvbNmyxRC1KK6sRYCIiMhSaB0s6ODggMLCQtjZ2cHW1la+SaVCXl6eQQqsDl0NmggPB77+Wv4lIiIyVgYdLJifn6+THZkCtggQEZGl0RoEdu/eXenznTt31nkxSmMQICIiS6M1CHz44YeaqwSKioqQkpKCVq1a4ZdfftF7cYbGIEBERJZGaxD48ccfyz3OyMjAhAkT9FaQksquGiAiIrIU1Vp06EGenp44e/asPmpRXP36wJ07SldBRERkOFpbBMaNG6e5f+/ePRw/fhytWrXSa1FK8fMDFi8GrKyAgQM5sRAREZk/rZcPLl++XHPfxsYG3t7e6Nixo94LqwldXUZRXAx8+SXw44/Arl1AUZF87u+rJomIiIyCLi8frDIIxMbGYufOnXjzzTfx4Ycf6mRn+qLLA1Lm/HkgMBA4cwYICtLpRxMREdWKQYJAcHAw/u///g+jRo3Cd999V+H1qKgonRSgC/oIAgAQHw84OwOzZwOenjr/eCIiosdikCCwdu1aLF68GPv27UPr1q0rvP7rr7/qpABd0FcQ2LQJePNNYOpU4KWXdP7xREREj8UgQaDMjBkzMG3aNJ3sTF/0FQQAYMoUwMVFhgEiIiJjoMvzntbLB409BOhbs2bA1atKV0FERKQfNZ5HwNI0bQpkZytdBRERkX4wCGjh7g5kZipdBRERkX5oDQIXL15EUVERADlAcMGCBbh165beCzMWLVsCp04BpaVKV0JERKR7WoNAv379YGNjg4sXLyIhIQEZGRkYMmSIIWozCi4uwBNPAH37AitXKl0NERGRbmkNAlZWVrCxscH69esxbtw4zJs3D9kW1mm+ZQvQujUwcSKgVitdDRERke5oDQJ2dnb47rvv8PXXX6NXr14QQqCkpMQQtRkNb29g2jTAyQk4d07paoiIiHRHaxBYsmQJkpOT8fbbb8PHxwfp6ekYNmyYIWozOm3bAlu3Kl0FERGR7midUOhBN27cQGZmJlq2bKnPmmpMnxMKPejwYaBnT+DaNb3vioiIqEoGnVDomWeeQV5eHm7cuIFWrVrhlVdewaRJk3Syc1MTFQXcvAncvat0JURERLqhNQjk5ubCyckJ69evx/Dhw5GSkoIdO3YYojajY2XFCYaIiMi8aA0CarUa2dnZWLNmDXr27AlANknUxo0bNxAXF4eAgAB07dq1ynkJvL290bJlS0RGRqJt27a12qeuuLsDV64oXQUREZFuVGutgW7dusHX1xdt27ZFWloa/P39a7XTOXPmIC4uDhcuXEBsbCzmzJlT6XYqlQpJSUk4duwYUlJSarVPXXnySeDLL4E7d5SuhIiIqPZqNFhQVwIDA7Fr1y64ubnh6tWriI6OxrlKrsvz8fHB4cOH4erq+sjPM9RgQQA4fRoICwNWrwYGDDDILomIiMox6DLEGRkZGD9+PPbu3QsA6Ny5M+bPnw9PT8/H3mmDBg1w8+ZNAIAQAg0bNtQ8flDz5s3h7OwMa2trJCQkYPTo0ZV/CQMGAQCYO1deRhgdLdchcHMDunSRt1r2mhAREWmly/OejbYN
Ro4ciaFDh2LNmjUAgG+//RYjR47Ezz///Mj3xcXF4Wol6/fOnDmz3GOVSlXlmIN9+/ahWbNmyMnJQVxcHAIDA9GpU6dKt50+fbrmfnR0NKKjox9ZX20MHgzcugXs2QP06gWcPAk8+yxw4oRcm4CIiEiXkpKSkJSUpJfP1toiEB4ejhMnTmh9riYCAwORlJSEpk2bIjs7GzExMZV2DTzo/fffh4ODAyZPnlzhNUO3CFSmTx9g2DCgXz9FyyAiIgtg0HkEXF1dsWLFCqjVapSWluKbb75Bo0aNarXT+Ph4LF++HACwfPly9OnTp8I2hYWFuH37NgCgoKAA27dvR1hYWK32q0/+/sD+/UpXQUREVDNaWwTS09Mxbtw4JCcnAwA6dOiATz/9FE888cRj7/TGjRsYOHAg/vjjD3h7e2PNmjVwcXHBlStXMHr0aGzZsgWXLl1C3759AQClpaUYOnQopk6dWvmXMIIWgc2bgeefB6ZOBa5fB8aMASIiFC2JiIjMlEEHC5oCYwgCgAwDixYB7doBH34oWwmaNZMDCzmIkIiIdMUgQWDcuHGPLGDBggU6KUAXjCUIPOjkSblkcffuQEoKUIsGFCIionIMctVAq1atKh3NL4So9cyClqDs6oGoKGD2bNlSQEREZGzYNaBnKSlAjx7Ajh0cM0BERLph0KsGqHbatgVeeQVYsUI+3rcPSEtTtiYiIqIyWicUotobPBh46ikgKAgYPVrOObBhg9JVERERsUXAIMLDgS1bgKVL5ePGjZWth4iIqIzWIHD+/HnExsYiJCQEAHDy5El88MEHei/M3HTpIrsFNm8GLl4ECguVroiIiKgaQWD06NGYNWsW7OzsAABhYWFYuXKl3gszV08+Cfz6K9Cxo9KVEBERVSMIFBYWol27dprHKpUKtra2ei3KnIWFyTkGsrOBs2fl4kVGesEDERFZAK1BoHHjxrh48aLm8bp169CsWTO9FmXOVCoZBsaOBUJD5RLGgYHAjBlKV0ZERJZI6zwCaWlp+Oc//4n9+/ejQYMG8PHxwbfffgtvb28DlaidMc8jUBW1Gjh8WM4t8OOPwIsvAnl5QFljy6+/Ai+8AJw7BzRtqmytRERkXBRZa6CgoAD37t2Do6OjTnasS6YYBB7WujUwaRIwdKh8/N//Aq+/DsTHA717A4MGAQ4OytZIRETGwaBBoKioCN9//z3S09OhVqs1UwxPmzZNJwXogjkEgaQkuWLhiROyVWDSJMDRUd7/4QfZOlDF4otERGRhDLLWQJnnn38eLi4uaNWqFerWrauTnVJFnTsDrq6AvT0wa5YcSPjyy8CAAXK9goQEoFMn4Omnla6UiIjMidYWgdDQUPz222+GquexmEOLQJn0dDkDoZ0dcPw44OMDlJTIeQj27pUtBR9/zGWNiYgsmUHXGujQoQNOnjypk52Rdt7ewJ9/ypuPj3zO1hbYswe4fBn46it5ySEREZEuVNkiEBYWBgBQq9VITU2Fj48P6tSpI9+kUhlVODCnFgFtfHyAnTuB5s2VroSIiJRikDECP/zwQ5U7U7FdWjEuLmwRICIi3akyCJTNEzBs2DCsKFtD92+VPUeGwSBARES6pHWMwMMDBUtLS3HkyBG9FUSPxiBARES6VGUQmDVrFhwdHXHq1Ck4Ojpqbk2aNEF8fLwha6QHODoCs2cD8+cD334LlJYqXdF9paVyQCMREZkOrZcPvvXWW5gzZ46h6nksljRYMDJSXlb46qvAhg3y1r690lVJmzYBffoA9+7x8kYiIn0y6OWDxh4CLM2yZXL2wUWLgNhYoEMHOdGQMSibfdrHB/j8c2VrISKi6tEaBMi4hIcDLVvK+0VF8u+BA8YxbkCtlq0TU6YAP/2kdDVERFQdVQaBy+zsNXozZgCbN8tljVNTla5GjhFwcgI6dgTOnzeu8QtERFS5KoNA//79AQBdunQxWDFUMyEhcmXCgADg/fflL/H33gNOnVKmHrUasLYG/P3l
VMm2tnIMAxERGa8q5xFQq9WYOXMmLly4gP/85z/lBiWoVCq8/vrrBimQtJs9G1i5Up6Eb94EoqPlCbhz5/vbDBgADB8ug4O+qNWAjQ1Qrx5w4wYwdy6QkiJXTiQiIuNUZRBYtWoVNm7cCLVajdu3b2uWHy77S8ajeXPg7bfvP27cWA4qfDAIrFsHZGQAvXrpb0R/aakMIwBQt64cyzBnDpCYCISGAg0ayFaCyhaxLC2VIYKIiAxL6+WDiYmJ6NGjh6HqeSyWdPlgdVy6BLRpIwcWqlRAfr78ZQ7Ik3L37vrZ75o1wNq18gbI/b7xBnD6tBwzUFAA3L0LPPecXGr57+UssGSJXHK5Th1Zt6cn0LMn8OKL+qmTiMjUGWStgTIdOnTApEmTsHv3bgBAdHQ0pk2bBmdnZ50UQLrXvDlw6JDsp795E+jfH9i2TZ6M163TXxAo6xoo4+AgL3N8UHGxDAHPPgvMmycfz5kDfPEFMHiwXGr5r7+A118HWrWSSzLXRGmpHCMRGVn770NEZAm0BoFRo0YhLCwMa9euhRACK1aswMiRI7F+/XpD1EePqXnz+ysUlk3wU68esHq1/vb5YNdAVezsgOnTZbfBmjWyrrQ0OR+Co+P9kJKRAUycCPzwg3xPdW3ZIic1+u47OS6C3Q1ERI+mdR6BtLQ0vP/++2jevDl8fX0xffp0pKWlGaI20pGyMQE+PvqdArjsqoHq6NsXWLVKDnIsKpJjCB7073/LsQRt2wLa1re6fh1YuhS4c0deRmljI7sauC4WEZF2WoNAvXr1sGfPHs3jvXv3on79+notivTD3V2O5r9zRz+fX5Mg8KA6dSo+Z2sLfP898O67wKRJwNNPy8sjy7rEiopkqwEAfP01MGoUUL++HJOwaBGwc6d8X9k2RERUOa0Np59//jmGDx+O3NxcAECDBg2wfPlyvRdGumdtDTRrBly5Avj66v7zdT3y38YG6NdP9vdnZgIJCcC+fXIQ4rlz8sQfEABcuCBbF/r3B7KygCZNZGvC0KEyIIweLa+WYH4lIqpI61UDZcqCgDEOEuRVA9XXti2wYAHw1FO6/+zPPpMD9R4eIKgrqanAJ58AMTFA165yzENSkhwUOWQIYG9ffvs7d4CvvpIh4cwZwNVVbvP007JWIiJTZdCrBsoYYwCgmmvcWPap68Pjdg1Ul78/8L//lX8uLq7q7evVA8aPB8aNA/78E7h9W7YmxMXJINGxozweajWQmws0bKi/2omIjBUXHbIwjRsDOTn6+eyHLx80FioV4OYG+PkBERHAp58C//2vDBbPPy+vVvDwULpKIiJlKBIE1q5di5CQEFhbW+Po0aNVbrdt2zYEBgbC398fc+fONWCF5qtRIxkESkrKD7zTRQtTdS4fNAaDBgG7dgEnT8pJiyZNAoKDla6KiEgZWn+/ff/99xWmFHZ2dkZYWBiaNGnyWDsNCwvDhg0bkJCQUOU2arUaY8eOxY4dO+Dh4YE2bdogPj4eQTWdYYbK8fKS1+e//ba8nDAsDFi/Xv5Cnjixdp+t764BXXviifu3n39WuhoiImVoDQJLlizBgQMHEBMTAwBISkpCVFQULl++jGnTpmH48OE13mlgYKDWbVJSUuDn5wdvb28AwKBBg7Bp0yYGgVoaO1b2jzs4ANnZcgBekyby13FtGWvXgDZ2dnKGQyIiS6T1n+2SkhKcPXsWbm5uAIBr165h2LBhOHjwIDp37vxYQaA6srKy4OXlpXns6emJgwcP6mVflsTa+v60vV5e8ioCDw/gnXfkpXkuLnJ548dhKl0DD7O1lV0lRESWSGsQyMjI0IQAAGjSpAkyMjLg6uoKu0fM/RoXF4erV69WeH7WrFnoXY21cLnCoeEEBwNHjsjL6po3l/P9u7kBVjUcQWJqXQNl2CJARJZMaxCIiYlBz549MXDgQAgh8P333yM6OhoFBQVwcXGp8n0/17LT1cPDAxkPTAuXkZEBT0/PKrefPn265n50
dDSio6NrtX9L0qQJkJcn1yTo1UsGg2HD5HwDNaFWV77EsLGzs2OLABEZt6SkJCQlJenls7VOKHTv3j2sX78e+/btAwB07NgR/fr108kv9piYGHz00Udo1apVhddKS0vRokUL7Ny5E+7u7mjbti1WrlxZ6RgBTiikW9nZQGCgvLa+Jt56C3B2BqZO1U9d+pKVJZc/vnJF6UqIiKpHl+c9rY2/VlZWePrpp9GlSxd06dIFnTt3rnUI2LBhA7y8vJCcnIyePXui+99Lzl25cgU9e/YEANjY2GDhwoXo1q0bgoOD8Y9//IMDBQ2kaVN5SWFN1yQw5a4BtggQkaXS2iKwZs0avPHGG3jmmWcAALt378a8efMwYMAAgxRYHWwR0D0vLzlW4Mknq/+e118HPD3lX1Ny65b8njVtASEiUopBpxj+4IMPcOjQIc2cATk5OYiNjTWqIEC65+YGXLtWsyBgqlcNsEWAiCyZ1q4BIQQaN26seezq6spf3xagcWM56VBN+s1NtWvA1pZXDRCR5dLaIvDcc8+hW7duGDJkCIQQWL16taZPn8xX166yiT82Vi71u2YNUKfOo99jqkHAxkbWfu9ezS+ZJCIydVrHCAghsH79euzduxcqlQqdOnXCCy+8YKj6qoVjBPTjzz/l1MOffQbs2CFH1j/K6NFygqLRow1Tny7Z2cnVCbWFHSIiY6DL857WIGAKGAT06+WX5VUEy5bJZvT0dCAxUQ4ozM+Xy/0+/zwwahTQqZP8a2ocHICrV+VfIiJjZ5DBgg4ODlVeJqhSqZCXl6eTAsj4TZki1yhwcZGTDV27JhcrunsXcHUFfv1VhgJT7RoAOE6AiCxXlUEgPz/fkHWQEQsIALZvl03nO3fK+x99BNSvL1+fMEE+Z6qLDgG8coCILBeHRlG1OToCffrIMQNlIQAAunUDfvrJdC8fBNgiQESWi0GAau2ZZ4AzZ+SVBaYaBNgiQESWikGAas3eHli9Wt6vV0/ZWh4XWwSIyFKZaI8uGZuYGHk1gZeX0pU8nrt35SWSgYFKV0JEZFgMAqQzNZkE9JSvAAAYEElEQVSO2Ni8+668OTvLwZHt2ildERGRYXAeAaK/ffwxcPQokJwMhIYC/fvLNReaNgVatADGjQNUKjnjYosWSldLRJaMEwo9hEGAdKm4GPjkE+DECTlnwpUrQGqqbCVo0gRo1Aj48kulqyQiS8Yg8BAGAdK3oiI5R8K5c0D79sDAgXJtAk9PYOpUIClJXlK5cyfg7g68+qpsPSAi0gcGgYcwCJAh/fQTkJkpT/TLlslLJ93d5fiC5s2BCxeAQ4eAAQOAlSuVrpaIzBGDwEMYBEgpJSXA5csyAJTNqigEkJYmr6TIyJDP3b3LBY2ISHd0ed7jPAJEtWBrK68yeHBqZZVKBoObN4HvvgM6dgTq1gUWLJDdCURExoRBgEgPrKzkOIHly4F+/YCtW4HFi4Fp0+TgQyIiY8GuASIDOXoUePtt4PhxuVzzu+/K9Rvc3eUUx0RE1cWuASITFBUlWwY2bAAaNwYGDZKBwNsb2LRJji0oLAS6d2cXAhEZDlsEiBS2fj0wdKgceGhtLecxyMqSLQVERJVhiwCRGenbF/jrLzlXwZUrcr2D33+///qUKcA77wC3bytXIxGZL7YIEBmZAQOAXr2AESOA3r2BH3+8f1VCcDDg7y9fi4iQYwvq1QOcnJStmYgMiy0CRGasdWvgpZfkpEW7dgEFBbLbIDUVOHlSnvynTAGeegoICwOaNQN8fID//U9ObkREVBNsESAyQgMHAmvXAq1aAYcP339epZKPW7W6/5xaDWzZIi9V/PNPYM8ew9dLRIbFmQUfwiBA5qaoCDh4EPDykpMTlbl3T85RUJnTp+WKiWfPGqZGIlIOg8BDGASI5EqJoaFATo7SlRCRvjEIPIRBgEiOI6hXT15+WFWrARGZBw4WJKIKbG3lTIW3bildCRGZEgYBIjPi6gqcOqV0FURkShgE
iMzIhAlAQgKwbx+QnQ3k5ytdEREZO44RIDIjQsgg8NVX959bvVpejgjIdQ5u3JBXIhQVAQ4OgJsbcOGCnNXQ21t2LTRqJCcpCgyU2wgh33fnDuDpqchXI6IHcLDgQxgEiMoTAjhyRM4x0KsX8NxzcmKiLVvkbIUbNwIxMfLkfv26DAaenrIVwclJTnl844acxKikRF62WL8+UKeOvN+smZzToHlz4L//lbMdPq49e4A//pDho21boEED3R2Hmrp2TR67xo3lug9ExopB4CEMAkRVO3tWnmxLSoAePeQshELIE7k2ubnySgRbW7n9vXsyIFy7Ju8vWiRnQExIkKHg6lU586G9vWxJaNJE+/THbdoADRvKsQ3x8cDnn+vmez+OJk3ut37Y2QF168rwU7euvEVEAO3by5UkPT1lC4pKBfz6K5CSIkNMw4byFhjIhaNIfxgEHsIgQKSMnBzgvfeAS5eAixflYEVra9n6UFAA5OXJ7ggXF9mKMGmSvJ+WJk+g33wDPPmkfG9OjjzJ9uwpWwfq1ZOPvbyAoCD5WJ8KC+UJ/M4dGXLu3pW3oiJ5u3MH2LwZyMgAdu+WoSc0VM7y+PHHwMsvy+9044b8LunpcvGo6gQuoppiEHgIgwCRcZo5U3ZD/PWXPCkOGCBbABo1AhIT5XTJly7JE69KBRw4AJw/L8cp5OQAx4/LE2pmpvz17ewsA0Fenvz17ucng4WXl+yecHEBQkIer9bUVNmFkpZWve3z84HvvpMh5vRp4Icf7s/fIIRsMZgzRwYgGxs5v0N6umw5Uatl8AgLA0pLZQuKq6vsfsnJkV0lderIm6Mj0LTp430nUzN1qjwGDg7yv7OdnTx2fn7ymDRuLP834OoqW2iaNVO6YuUwCDyEQYDIuAkhT4R2dvd/IZeUADNmADdvAgsXPvr9OTkyDOTmypaGOnXkifjYMXlSTUqSn//774CvLxAQALRrB8TGyl/yarU8eTz5ZNX7SEoCpk2Tv/Z1YcECYPt22ZpQWiprcHOT37eoSLacZGXdb0HJy5OB5upVWWdxsWyRyMgAfvutduMwTIVKJa98AeR/t5ISeexWrpR/IyPla9ev31+Mq0kT2SVjZSWPpZWV/O/v6irvW1nJzy277+AATJ5c9RiQ0lLgxRfl/su6hMpukZEyfDg7y6Dn6Xk/xBq65cfkg8DatWsxffp0nDt3DocOHUJUVFSl23l7e8PJyQnW1tawtbVFSkpKpdsxCBARIAPDb7/JVoXNm4GjR+WJwtpaDoR0dQU6dABatJAnj8xMOagyPV3+sh80CFiyRJnab92SwaBJE/nLt0z37sCYMXLQZ2GhPCGZ40DG4mLZMlJcXPGkum6d/O7Dh99/7t49GZ4yM+Vr9+7JW2kpcOKEDFv37skQWvbavXvyv+/KlbLbqTL79wOjRsnWnAe7hm7fBg4dkkEuN1f+78XPT47BuXNHhgMnJ/nX0VG2bJQFiGbNZOtY/frl93X9uvwOTk6yW6omM4KafBA4d+4crKyskJCQgI8//rjKIODj44MjR46gYcOGj/w8BgEi0ubePfkP+bFj8h/v/Hz5j3RsrPxHODJSnohsbJSutLxx42Qrx4EDwJo1ssZGje7/7dJFLkndtCkQHq50tY/vr79kq8eNG/rdz9tvA/PnywGwavX9lhoXF/lcTg4wciTw7ruP/pzsbBkGHByA4GB5Qs/LkyEhL+9+gCgqApYulaGhfXvZGlFQIENGnToy9F2/Drz6KjBvXvW/hy7Pe4r8Tz4wMLDa2/IET0S6YGUluwvatVO6kpoJCpJ95zY2stVApZInzcJC+Wv4l1/kyeX0aeCJJ+SVCrGx8pfxuXPyb/368pf2m2/KFhFjdPu2/CWtbzNmAOPHy+NpYyNbVzIz73c1WFsDrVtr/5xmzcqPUWjcuHxLzoNiY+X4mEOHgK+/lv8N6tS5fzVOYiLwySeVvzcvT16lY2MDeHjI7jVdU3SMQExMzCNbBJo3
bw5nZ2dYW1sjISEBo0ePrnQ7tggQkblSq+WJytFRtlxUJSNDNlsfPiy7GKys5MBEGxs5oVT//rILZNiw+83kp04BZ87Ik8tbb8kuk5IS+cvc0C0jJ08CQ4bIrh1Lk5Ym5/U4evT+IFFbW/laRIRsZVCrZSvQCy/IUPef/5hAi0BcXByuXr1a4flZs2ahd+/e1fqMffv2oVmzZsjJyUFcXBwCAwPRqVOnSredPn265n50dDSio6Mfp2wiIqNibf3oQY5lvLzkrWXLiq9duiS7DmbMkM3ZVlbyF/CyZcCuXcCff8qTcKNGsk/dyko+7+Gh869TJUO1CBgjb2/53Vu0uH/ZqlotA4GfH3D5MrBrVxK++CIJhw7pfnVRo24ReND7778PBwcHTJ48ucJrbBEgIqq5kpL7vzwfNH68DALBwbL/3NVVbteggXyuit9jtbJtG/Cf/8grLUgGgeJi2TJT2X8jkx8j8KCqvkhhYSHUajUcHR1RUFCA7du347333jNwdURE5quyEwwAfPihnC3x5k25BkXZZZtpacD06XIcQtkleZX9LS6WAxwHDy5/+Z21tfyV6+R0f96EMpbcIlAZa2v9T6JVRpEWgQ0bNmD8+PG4fv06nJ2dERkZia1bt+LKlSsYPXo0tmzZgkuXLqFv374AgNLSUgwdOhRTp06t9PPYIkBEZBg5OXK+hrLL8ir7W1oKrFolQ8SDl+CVlsrHeXnyxB8fL0fdp6TIMQKvvgp8+qnS39A0mPzlg7rGIEBEZFrS04Gff5bdE87OQL9+ctCirvu/zRWDwEMYBIiIyJLo8rzH7EVERGTBGASIiIgsGIMAERGRBWMQICIismAMAkRERBaMQYCIiMiCMQgQERFZMAYBIiIiC8YgQEREZMEYBIiIiCwYgwAREZEFYxAgIiKyYAwCREREFoxBgIiIyIIxCBAREVkwBgEiIiILxiBARERkwRgEiIiILBiDABERkQVjECAiIrJgDAJEREQWjEGAiIjIgjEIEBERWTAGASIiIgvGIEBERGTBGASIiIgsGIMAERGRBWMQICIismAMAkRERBaMQYCIiMiCMQgQERFZMAYBIiIiC8YgQEREZMEYBIiIiCwYgwAREZEFYxAgIiKyYIoEgTfeeANBQUEIDw9H3759kZubW+l227ZtQ2BgIPz9/TF37lwDV0lERGT+FAkCXbt2xenTp3HixAkEBARg9uzZFbZRq9UYO3Ystm3bhjNnzmDlypU4e/asAtUSACQlJSldgtnjMTYMHmf94zE2LYoEgbi4OFhZyV23a9cOmZmZFbZJSUmBn58fvL29YWtri0GDBmHTpk2GLpX+xv9j6x+PsWHwOOsfj7FpUXyMwJIlS9CjR48Kz2dlZcHLy0vz2NPTE1lZWYYsjYiIyOzZ6OuD4+LicPXq1QrPz5o1C7179wYAzJw5E3Z2dhgyZEiF7VQqlb5KIyIiojJCIUuXLhUdOnQQd+7cqfT1AwcOiG7dumkez5o1S8yZM6fSbQHwxhtvvPHGm0XddEVvLQKPsm3bNsybNw+7du1C3bp1K92mdevWSE1NRXp6Otzd3bF69WqsXLmy0m1lFiAiIqKaUmSMwLhx45Cfn4+4uDhERkZizJgxAIArV66gZ8+eAAAbGxssXLgQ3bp1Q3BwMP7xj38gKChIiXKJiIjMlkrw5zQREZHFUvyqgdrghEO6kZGRgZiYGISEhCA0NBQLFiwAANy4cQNxcXEICAhA165dcevWLc17Zs+eDX9/fwQGBmL79u1KlW5y1Go1IiMjNQNmeYx179atW+jfvz+CgoIQHByMgwcP8jjr2OzZsxESEoKwsDAMGTIEd+/e5TGupVGjRsHNzQ1hYWGa5x7nmB45cgRhYWHw9/fHhAkTqrdznY02MLDS0lLh6+srLl++LIqLi0V4eLg4c+aM0mWZpOzsbHHs2DEhhBC3b98WAQEB4syZM+KNN94Qc+fOFUIIMWfOHDFlyhQhhBCnT58W
4eHhori4WFy+fFn4+voKtVqtWP2m5OOPPxZDhgwRvXv3FkIIHmM9GD58uFi8eLEQQoiSkhJx69YtHmcdunz5svDx8RFFRUVCCCEGDhwoli1bxmNcS7t37xZHjx4VoaGhmudqckzv3bsnhBCiTZs24uDBg0IIIbp37y62bt2qdd8m2yLACYd0p2nTpoiIiAAAODg4ICgoCFlZWdi8eTNGjBgBABgxYgQ2btwIANi0aRMGDx4MW1tbeHt7w8/PDykpKYrVbyoyMzORmJiIV155RTPAlcdYt3Jzc7Fnzx6MGjUKgBxr5OzszOOsQ05OTrC1tUVhYSFKS0tRWFgId3d3HuNa6tSpExo0aFDuuZoc04MHDyI7Oxu3b99G27ZtAQDDhw/XvOdRTDYIcMIh/UhPT8exY8fQrl07XLt2DW5ubgAANzc3XLt2DYAc1Onp6al5D4999UyaNAnz5s3TzKoJgMdYxy5fvozGjRtj5MiRiIqKwujRo1FQUMDjrEMNGzbE5MmT8cQTT8Dd3R0uLi6Ii4vjMdaDmh7Th5/38PCo1rE22SDACYd0Lz8/H/369cP8+fPh6OhY7jWVSvXIY87/Ho/2448/okmTJoiMjKzyclce49orLS3F0aNHMWbMGBw9ehT29vaYM2dOuW14nGsnLS0Nn3zyCdLT03HlyhXk5+fjm2++KbcNj7HuaTumtWGyQcDDwwMZGRmaxxkZGeWSENVMSUkJ+vXrh2HDhqFPnz4AZAItmx0yOzsbTZo0AVDx2GdmZsLDw8PwRZuQ/fv3Y/PmzfDx8cHgwYPxyy+/YNiwYTzGOubp6QlPT0+0adMGANC/f38cPXoUTZs25XHWkcOHD6NDhw5wdXWFjY0N+vbtiwMHDvAY60FN/n3w9PSEh4dHubV7qnusTTYIPDjhUHFxMVavXo34+HilyzJJQgi8/PLLCA4OxsSJEzXPx8fHY/ny5QCA5cuXawJCfHw8Vq1aheLiYly+fBmpqamaPimq3KxZs5CRkYHLly9j1apV6NKlC1asWMFjrGNNmzaFl5cXLly4AADYsWMHQkJC0Lt3bx5nHQkMDERycjLu3LkDIQR27NiB4OBgHmM9qOm/D02bNoWTkxMOHjwIIQRWrFihec8j6XDQo8ElJiaKgIAA4evrK2bNmqV0OSZrz549QqVSifDwcBERESEiIiLE1q1bxV9//SViY2OFv7+/iIuLEzdv3tS8Z+bMmcLX11e0aNFCbNu2TcHqTU9SUpLmqgEeY907fvy4aN26tWjZsqV44YUXxK1bt3icdWzu3LkiODhYhIaGiuHDh4vi4mIe41oaNGiQaNasmbC1tRWenp5iyZIlj3VMDx8+LEJDQ4Wvr68YN25ctfbNCYWIiIgsmMl2DRAREVHtMQgQERFZMAYBIiIiC8YgQEREZMEYBIiIiCwYgwAREZEFYxAgMjFTp05FUlISNm7cWGH6XENZtmwZxo0bp8i+iUi3GASITExKSgqeeuop7Nq1C507d1akBs4VT2Q+GASITMSbb76J8PBwHDp0CO3bt8fixYvx2muv4YMPPgAALFiwACEhIQgPD8fgwYMByNDQoUMHREVFoWPHjpqpd5ctW4Y+ffqga9eu8PHxwcKFC/HRRx8hKioK7du3x82bNwEA0dHRmDhxIiIjIxEWFoZDhw5VqCsnJwf9+/dH27Zt0bZtW+zfvx8AsGvXLkRGRiIyMhJRUVHIz88v976CggL07NkTERERCAsLw5o1awAAR44cQXR0NFq3bo3nnntOM9d6WloaunfvjtatW6Nz5844f/48AOCll17ChAkT0LFjR/j6+uL777/X9aEnMm86niWRiPTo0KFDYvz48aKkpER07Nix3Gvu7u6iuLhYCCFEbm6uEEKIvLw8UVpaKoQQ4ueffxb9+vUTQgixdOlS4efnJ/Lz80VOTo5wcnISX3zxhRBCiEmTJolPPvlECCFEdHS0+Oc//ymEEGL37t0iNDRU8/6xY8cK
IYQYPHiw2Lt3rxBCiN9//10EBQUJIYTo3bu32L9/vxBCiIKCAk0dZdatWydGjx6teZybmyuKi4tF+/btxfXr14UQQqxatUqMGjVKCCFEly5dRGpqqhBCiOTkZNGlSxchhBAjRowQAwcOFEIIcebMGeHn51fzA0tkwWyUDiJEVH1HjhxBy5YtcfbsWQQFBZV7rWXLlhgyZAj69OmjWWjk1q1bGD58OC5evAiVSoXS0lLN9jExMbC3t4e9vT1cXFzQu3dvAEBYWBhOnjyp2a6sdaFTp07Iy8tDbm5uuf3u2LEDZ8+e1Ty+ffs2CgoK0LFjR0yaNAlDhw5F3759K6yC1rJlS/z73//GW2+9hV69euHpp5/Gb7/9htOnT+PZZ58FAKjVari7u6OgoAD79+/HgAEDNO8vLi4GILspyr5vUFCQZs12IqoeBgEiE3DixAm89NJLyMzMRKNGjVBYWAghBKKiorB//37UrVsXW7Zswe7du/HDDz9g5syZOHXqFN59913ExsZiw4YN+P333xEdHa35zDp16mjuW1lZaR5bWVmVCwwPs7Iq36MohMDBgwdhZ2dX7vkpU6agV69e2LJlCzp27IiffvoJLVq00Lzu7++PY8eOYcuWLXjnnXcQGxuLF154ASEhIZruhTJ5eXlo0KABjh07VmlND+5bcPkUohrhGAEiExAeHo5jx44hICAAZ8+eRZcuXbB9+3YcPXoUdevWhRACf/zxB6KjozFnzhzk5uYiPz8feXl5cHd3BwAsXbq0Wvt68EQqhMDq1asBAHv37oWLiwscHR3Lbd+1a1csWLBA8/j48eMAZJ9+SEgI3nzzTbRp00bTp18mOzsbdevWxdChQ/Hvf/8bx44dQ4sWLZCTk4Pk5GQAQElJCc6cOQMnJyf4+Phg3bp1mroebLUgosfHIEBkInJyctCwYUMAwLlz5xAYGKh5Ta1WY9iwYWjZsiWioqIwYcIEODs7480338TUqVMRFRUFtVqtGe2vUqnKjfx/+P6D29WtWxdRUVEYM2YMFi9eXGGbBQsW4PDhwwgPD0dISAi+/PJLAMD8+fMRFhaG8PBw2NnZoXv37uW+z6lTp9CuXTtERkZixowZeOedd2Bra4t169ZhypQpiIiIQGRkJA4cOAAA+Pbbb7F48WJEREQgNDQUmzdvrrJ+Iqo+LkNMRFWKiYnBxx9/jKioKKVLISI9YYsAERGRBWOLABERkQVjiwAREZEFYxAgIiKyYAwCREREFoxBgIiIyIIxCBAREVkwBgEiIiIL9v/WoXvvmfa5WwAAAABJRU5ErkJggg==" /></p>

<p>Now go play with it and break stuff. :)</p>
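<p>As a minimal companion sketch (this is not code from the notebook or the repo), here is logistic regression trained with plain stochastic gradient descent in pure Python. The function names, the learning rate, the number of epochs and the toy dataset are all illustrative assumptions.</p>

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_logistic_regression(data, lr=0.1, epochs=50, seed=0):
    """Fit weights w and bias b with plain SGD on the log-loss, one example at a time.

    data: list of (features, label) pairs, with label in {0, 1}.
    """
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)  # the "stochastic" part: a new random order every epoch
        for i in order:
            x, y = data[i]
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            g = p - y  # derivative of the log-loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, x)]
            b -= lr * g
    return w, b


def predict(w, b, x):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Toy, linearly separable data: label 1 iff x0 + x1 > 1.
rng = random.Random(42)
toy = []
for _ in range(200):
    x = [rng.uniform(-1, 2), rng.uniform(-1, 2)]
    toy.append((x, 1 if x[0] + x[1] > 1 else 0))

w, b = sgd_logistic_regression(toy)
accuracy = sum(predict(w, b, x) == y for x, y in toy) / len(toy)
print(accuracy)
```

<p>What makes this “stochastic” is that each update uses the gradient of a single example, and the examples are visited in a fresh random order every epoch.</p>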
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Classic speech recognition features in one picture]]></title>
    <link href="http://snippyhollow.github.com/blog/2014/09/25/classical-speech-recognition-features-in-one-picture/"/>
    <updated>2014-09-25T16:18:00+02:00</updated>
    <id>http://snippyhollow.github.com/blog/2014/09/25/classical-speech-recognition-features-in-one-picture</id>
    <content type="html"><![CDATA[<p>Quite some time ago (when I was coding some of this myself for educational purposes), I made a diagram of the successive transformations that a classic speech recognition pipeline applies to “raw” audio (wave files). I think some of you may find it interesting. Pay close attention to the X-axis labels and you shouldn’t get lost!</p>

<p><img src="http://i.imgur.com/GcTIM7m.png" /></p>

<p>Useful links for neophytes:</p>

<ul>
  <li><a href="http://en.wikipedia.org/wiki/Short-time_Fourier_transform">STFT</a></li>
  <li><a href="http://en.wikipedia.org/wiki/Window_function#Generalized_Hamming_windows">Hamming windows</a></li>
  <li><a href="http://en.wikipedia.org/wiki/Spectral_density">Spectral density</a></li>
  <li><a href="http://en.wikipedia.org/wiki/Filter_bank">Filterbank</a></li>
  <li><a href="http://en.wikipedia.org/wiki/Mel_scale">Mel scale</a></li>
  <li><a href="http://en.wikipedia.org/wiki/Discrete_cosine_transform">DCT</a></li>
</ul>
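<p>The chain in the diagram (framing and windowing, power spectrum, mel filterbank, log, DCT) can be sketched in pure Python as follows. This is a minimal illustration under assumed settings: 8 kHz audio, 25 ms Hamming windows every 10 ms, a 256-point DFT, 20 mel filters and 13 coefficients. These are not necessarily the settings used for the picture, and real front-ends usually also add pre-emphasis, an FFT, liftering and delta features.</p>

```python
import math


def hamming(n):
    # Generalized Hamming window with alpha = 0.54.
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def power_spectrum(frame, n_fft):
    # Naive DFT (O(n^2), fine for a sketch); frames shorter than n_fft are
    # implicitly zero-padded. Returns |X_k|^2 / n_fft for k = 0 .. n_fft // 2.
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * t / n_fft) for t, s in enumerate(frame))
        im = sum(s * math.sin(2 * math.pi * k * t / n_fft) for t, s in enumerate(frame))
        spec.append((re * re + im * im) / n_fft)
    return spec


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale (0 Hz .. Nyquist).
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * f / sample_rate) for f in edges]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):  # rising edge (empty if left == center)
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def dct_ii(x, n_out):
    # Unnormalised type-II DCT, keeping the first n_out coefficients.
    n = len(x)
    return [sum(xi * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, xi in enumerate(x))
            for k in range(n_out)]


def mfcc(signal, sample_rate=8000, frame_len=200, hop=80, n_fft=256, n_filters=20, n_ceps=13):
    # 200 and 80 samples = 25 ms windows every 10 ms at 8 kHz.
    window = hamming(frame_len)
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spec = power_spectrum(frame, n_fft)                                     # STFT frame
        energies = [sum(f * p for f, p in zip(filt, spec)) for filt in bank]   # mel filterbank
        log_energies = [math.log(e + 1e-10) for e in energies]                 # log compression
        feats.append(dct_ii(log_energies, n_ceps))                             # DCT -> cepstrum
    return feats


# 0.1 s of a 440 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
feats = mfcc(tone)
print(len(feats), len(feats[0]))  # 8 13
```

<p>The naive DFT keeps the sketch dependency-free. In practice you would use an FFT.</p>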
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[So you wanna try Deep Learning?]]></title>
    <link href="http://snippyhollow.github.com/blog/2014/08/09/so-you-wanna-try-deep-learning/"/>
    <updated>2014-08-09T13:37:00+02:00</updated>
    <id>http://snippyhollow.github.com/blog/2014/08/09/so-you-wanna-try-deep-learning</id>
    <content type="html"><![CDATA[<p>I’m keeping this post quick and dirty, but at least it’s out there. The gist of this post is that <a href="https://gist.github.com/SnippyHolloW/8a0f820261926e2f41cc">I put out a one-file gist that does all the basics</a>, so that you can play around with it yourself. First of all, I would say that deep learning is simply kernel machines whose kernel we learn. That’s a gross simplification, but it’s not totally false. Second, there is nothing magical about deep learning. We can simply train large models (millions of weights, billions if you want to make a Wired headline) efficiently (GPUs, clusters) on large datasets (millions of images, thousands of hours of speech, more if you’re GOOG/FB/AAPL/MSFT/NSA). I think a good part of the success of deep learning comes from the fact that practitioners are not afraid to bypass beautiful mathematical principles to make their model work on whatever dataset and whatever task. But I digress…</p>

<h2 id="what-is-a-deep-neural-network">What is a deep neural network?</h2>

<p>A series of matrix multiplications and non-linearities. You take your input $x$ in your feature space, multiply it by a matrix $W$ (and add biases $b$), then apply a non-linearity. The Rectified Linear Unit, $\max(0, z)$, is fashionable these days, but $\mathrm{sigmoid}$ and $\tanh$ are OK too. You keep doing that with further layers until you reach a classifier. For instance, say you have a 3-layer ReLU-based neural network with a softmax classifier on top. That gives:</p>

<script type="math/tex; mode=display">y = \mathrm{softmax}\left(W_3 \cdot \max(0,\, W_2 \cdot \max(0,\, W_1 \cdot \max(0,\, W_0 \cdot x + b_0) + b_1) + b_2) + b_3\right)</script>
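<p>Here is that kind of forward pass written out in pure Python, as a sketch. It assumes the softmax classifier on top has its own weights and biases. The layer sizes, the Gaussian initialization and the helper names (<code>affine</code>, <code>forward</code>, <code>random_layer</code>) are hypothetical choices made for illustration.</p>

```python
import math
import random


def affine(W, v, b):
    # W is a list of rows; returns W . v + b.
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtracting the max does not change the result but avoids overflow
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def forward(x, layers):
    """layers: list of (W, b) pairs. Every layer but the last is followed by a ReLU;
    the last one is the softmax classifier."""
    h = x
    for W, b in layers[:-1]:
        h = relu(affine(W, h, b))
    W, b = layers[-1]
    return softmax(affine(W, h, b))


def random_layer(n_in, n_out, rng):
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


rng = random.Random(0)
sizes = [4, 16, 16, 16, 3]  # input, three ReLU layers, 3-class softmax
layers = [random_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
y = forward([1.0, -0.5, 0.3, 2.0], layers)
print(len(y), sum(y))
```

<p>Each hidden layer is just <code>relu(affine(W, h, b))</code>. Swapping the ReLU for tanh or a sigmoid only changes that element-wise non-linearity.</p>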

<p>There are all sorts of mammals, with very specific traits, but I think I just described a rat (or is it one of the <a href="https://en.wikipedia.org/wiki/Euarchontoglires">Euarchontoglires</a>?).</p>

<h2 id="links-and-papers">Links and Papers</h2>

<p>Here I’m just dumping a collection of links that I think everybody with an interest in deep learning should at least skim:</p>

<ul>
  <li>First, you should of course start with <a href="http://deeplearning.net/tutorial/">the deeplearning.net tutorials</a>. Even though they are pretty old, they still lay very good foundations.</li>
  <li>If you want to get an intuition for <a href="http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/">how NNs fold space with non-linearities</a>, there is also <a href="http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html">an online demo to play around with this concept</a>.</li>
  <li>These online demos were nice, right? They’re made by a guy who also wrote a pretty interesting <a href="http://karpathy.github.io/2014/07/03/feature-learning-escapades/">personal history that concurs with my point of view on feature learning</a>.</li>
  <li>I’m going to advise you against it in a bit, but if you want to do RBM pre-training, <a href="https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf">this paper is a must-read</a>.</li>
  <li>If you want to do anything that has to deal with images, start <a href="http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf">here</a> and <a href="http://arxiv.org/pdf/1311.2901.pdf">there</a>.</li>
  <li>If you want to do anything that has to deal with speech (I assume you know about speech coding, <a href="http://i.imgur.com/fA0QIQr.png">otherwise I did a crash course</a>), start <a href="http://www.cs.utoronto.ca/~gdahl/papers/dbnPhoneRec.pdf">here</a> and <a href="http://www.csri.utoronto.ca/~hinton/absps/googlerectified.pdf">there</a>.</li>
  <li>If you want to do NLP with deep learning, there are lots of hot papers right now, but you could start with <a href="http://leon.bottou.org/publications/pdf/jmlr-2011.pdf">NLP (almost) from scratch</a>.</li>
  <li>In any case, you should learn <a href="http://research.microsoft.com/pubs/192769/tricks-2012.pdf">practical stuff about SGD (must-read)</a>, <a href="http://machinelearning.wustl.edu/mlpapers/paper_files/icml2013_sutskever13.pdf">learn about momentum</a>, and you can geek out about extensions (I’m fond of <a href="http://arxiv.org/pdf/1212.5701.pdf">Adadelta</a>). You should learn about <a href="http://arxiv.org/pdf/1207.0580.pdf">Dropout</a>, and maybe geek out about the variants (fast dropout, dropconnect…).</li>
  <li>If you like videos, <a href="https://www.youtube.com/watch?v=6WeyTUnbwQQ">Optimization I</a> and <a href="http://www.youtube.com/embed/cXzGpiUcvRI?vq=hd1080&amp;autoplay=1">Leon (1)</a> <a href="http://www.youtube.com/embed/4-hTxJAwr8U?vq=hd1080&amp;autoplay=1">Bottou’s (2)</a> <a href="http://www.youtube.com/embed/adXwym8Lakg?vq=hd1080&amp;autoplay=1">MLSS class (3)</a> are good introductions.</li>
  <li>Finally, if you want more, you can have a look at my <a href="https://pinboard.in/search/u:syhw?query=deeplearning">non-exhaustive collection of links on deep learning</a>.</li>
</ul>

<h2 id="stuff-youll-learn">Stuff you’ll learn</h2>

<p>Here I’m getting totally subjective, because I’m telling you stuff that I learned the hard way.</p>

<h4 id="generic">Generic</h4>

<ul>
  <li>Always answer “Do you want more data?” with “Yes, please.”</li>
  <li>If something feels wrong, check your gradients with finite differences.</li>
  <li>For all gradient descent related stuff, first <a href="http://research.microsoft.com/pubs/192769/tricks-2012.pdf">RTFM</a>.</li>
  <li>When do we stop the training? Almost everybody does it but nobody speaks about it: <a href="https://en.wikipedia.org/wiki/Early_stopping">early stopping on a validation set</a>.</li>
  <li>If you use $tanh$ or $sigmoid$ activation units, <a href="http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2010_GlorotB10.pdf">initialize them well</a>. Draw weights uniformly in <script type="math/tex">[-\sqrt{\frac{6}{\mathrm{fan}_{in} + \mathrm{fan}_{out}}}, \sqrt{\frac{6}{\mathrm{fan}_{in} + \mathrm{fan}_{out}}}]</script> for $tanh$, or in an interval $4$ times as wide for $sigmoid$.</li>
</ul>
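<p>Two of the tips above, checking gradients with finite differences and Glorot-style uniform initialization, fit in a few lines of pure Python. This is a hedged sketch: the function names, the <code>eps</code> value and the toy tanh unit are illustrative assumptions, not code from the gist.</p>

```python
import math
import random


def glorot_uniform(fan_in, fan_out, activation="tanh", rng=random):
    # Rows of weights drawn uniformly in [-r, r], r = sqrt(6 / (fan_in + fan_out));
    # the range is multiplied by 4 for sigmoid units.
    r = math.sqrt(6.0 / (fan_in + fan_out))
    if activation == "sigmoid":
        r *= 4.0
    return [[rng.uniform(-r, r) for _ in range(fan_in)] for _ in range(fan_out)]


def numerical_grad(f, params, eps=1e-5):
    # Central finite differences, one coordinate at a time:
    # df/dp_i ~ (f(p + eps e_i) - f(p - eps e_i)) / (2 eps)
    grad = []
    for i in range(len(params)):
        old = params[i]
        params[i] = old + eps
        f_plus = f(params)
        params[i] = old - eps
        f_minus = f(params)
        params[i] = old  # restore the original value
        grad.append((f_plus - f_minus) / (2.0 * eps))
    return grad


def relative_error(a, b):
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    den = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b)) + 1e-12
    return num / den


# Check a hand-derived gradient for a single tanh unit with squared loss.
rng = random.Random(0)
x, target = [0.5, -1.0, 2.0], 0.3
w = glorot_uniform(3, 1, rng=rng)[0]


def loss(w):
    return 0.5 * (math.tanh(sum(wi * xi for wi, xi in zip(w, x))) - target) ** 2


def analytic_grad(w):
    a = math.tanh(sum(wi * xi for wi, xi in zip(w, x)))
    return [(a - target) * (1.0 - a * a) * xi for xi in x]


err = relative_error(analytic_grad(w), numerical_grad(loss, list(w)))
print(err)
```

<p>As a rule of thumb, a tiny relative error (around 1e-7 or below) suggests the analytic gradient is right. Errors in the 1e-2 range usually point to a bug.</p>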

<h4 id="unsupervised-pre-training">Unsupervised Pre-Training</h4>

<ul>
  <li>“What is unsupervised pre-training?” Using un-annotated data to initialize the network’s weights.</li>
  <li>What is unsupervised pre-training doing? <a href="http://jmlr.org/papers/volume11/erhan10a/erhan10a.pdf">“unsupervised pre-training guides the learning towards basins of attraction of minima that are better in terms of the underlying data distribution; the evidence from these results supports a regularization explanation for the effect of pre-training.”</a></li>
  <li>This is not needed if you have enough data.</li>
</ul>
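<p>To make unsupervised pre-training concrete, here is one greedy layer-wise step sketched with a tied-weight sigmoid autoencoder trained by SGD. This swaps in an autoencoder rather than an RBM, chosen because it needs nothing beyond backprop. The helper names, hyperparameters and toy binary patterns are assumptions for illustration.</p>

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(W, b, x):
    return [sigmoid(sum(wij * xj for wij, xj in zip(row, x)) + bi) for row, bi in zip(W, b)]


def decode(W, c, h):
    # Tied weights: the decoder reuses W transposed.
    return [sigmoid(sum(W[i][j] * h[i] for i in range(len(h))) + cj) for j, cj in enumerate(c)]


def reconstruction_error(W, b, c, data):
    total = 0.0
    for x in data:
        xr = decode(W, c, encode(W, b, x))
        total += 0.5 * sum((a - t) ** 2 for a, t in zip(xr, x))
    return total / len(data)


def pretrain_layer(data, n_hidden, lr=0.5, epochs=200, seed=0):
    """Train a tied-weight sigmoid autoencoder on unlabelled data with SGD.

    Returns (W, b, c); W and b can initialise a hidden layer of a supervised net.
    """
    rng = random.Random(seed)
    n_vis = len(data[0])
    r = 4.0 * math.sqrt(6.0 / (n_vis + n_hidden))  # Glorot-style range for sigmoid units
    W = [[rng.uniform(-r, r) for _ in range(n_vis)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden  # encoder biases
    c = [0.0] * n_vis     # decoder biases
    for _ in range(epochs):
        for x in data:
            h = encode(W, b, x)
            xr = decode(W, c, h)
            # Backprop of 0.5 * ||xr - x||^2: output deltas, then hidden deltas.
            d_out = [(xr[j] - x[j]) * xr[j] * (1 - xr[j]) for j in range(n_vis)]
            d_hid = [sum(W[i][j] * d_out[j] for j in range(n_vis)) * h[i] * (1 - h[i])
                     for i in range(n_hidden)]
            for i in range(n_hidden):
                for j in range(n_vis):
                    # W[i][j] is used by both encoder and decoder, so both terms appear.
                    W[i][j] -= lr * (d_hid[i] * x[j] + h[i] * d_out[j])
                b[i] -= lr * d_hid[i]
            for j in range(n_vis):
                c[j] -= lr * d_out[j]
    return W, b, c


patterns = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1], [1, 0, 1, 0, 1, 0]]
before = reconstruction_error(*pretrain_layer(patterns, 3, epochs=0), patterns)
after = reconstruction_error(*pretrain_layer(patterns, 3, epochs=200), patterns)
print(before, after)
```

<p>The learned <code>W</code> and <code>b</code> would then initialize the first hidden layer before supervised fine-tuning. For deeper nets, the same step is repeated on the hidden codes, one layer at a time.</p>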

<h4 id="dropout">Dropout</h4>

<ul>
  <li>“How do we approach a problem with the deep learning mindset?” You design an under-constrained, over-capacity, over-fitting hog (deep and wide, just barely tractable on your hardware), and you keep it in check with Dropout.</li>
  <li>“What is Dropout?” Randomly dropping hidden units during training (usually each unit independently with probability 0.5, i.e. a Bernoulli mask), so that the network learns to be “robust” and doesn’t learn stupid co-activations of units (a way to tell the network not to just learn to compress the training set).</li>
  <li>“What is Dropout doing exactly?” <a href="http://papers.nips.cc/paper/4882-dropout-training-as-adaptive-regularization.pdf">“the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix”</a>, <a href="http://papers.nips.cc/paper/4878-understanding-dropout.pdf">“Dropout performs gradient descent on-line with respect to both the training examples and the ensemble of all possible subnetworks. (…) The regularization term is the usual weight decay or Gaussian prior term based on the square of the weights to prevent overfitting. Dropout provides immediately the magnitude of the regularization term which is scaled by the inputs and by the variance of the dropout variables.”</a></li>
  <li>“Sorry, what?” You know about <a href="https://en.wikipedia.org/wiki/Tikhonov_regularization">L2 regularization</a>, right? So you know about <a href="http://ej.iop.org/images/1741-2552/9/5/056002/Full/jne427232f9_online.jpg">this picture</a>, where regularization means inflating the L2 (or L1) ball until it intersects your feasible set. Now replace the ball with an ellipse whose shape is given by the inverse of the (diagonal) Fisher information matrix of the data. You now have a picture of “kinda” what Dropout is doing.</li>
</ul>
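<p>Here is a minimal sketch of Dropout on one layer’s activations, in pure Python. It uses the “inverted” variant, which rescales the kept units at training time. The original Dropout paper instead scales the weights at test time; the two agree in expectation. The function name and signature are hypothetical.</p>

```python
import random


def dropout(h, p_drop=0.5, train=True, rng=random):
    """Inverted dropout on a list of activations.

    During training each unit is zeroed independently with probability p_drop
    (a Bernoulli mask) and survivors are scaled by 1 / (1 - p_drop), so the
    expected activation is unchanged; at test time activations pass through.
    """
    if not train or p_drop == 0.0:
        return list(h)
    keep = 1.0 - p_drop
    # rng.random() is uniform in [0, 1), so a unit survives with probability keep.
    return [a / keep if rng.random() >= p_drop else 0.0 for a in h]


rng = random.Random(0)
h = [1.0] * 10000
out = dropout(h, p_drop=0.5, rng=rng)
mean_out = sum(out) / len(out)
dropped = sum(a == 0.0 for a in out) / len(out)
print(mean_out, dropped)  # close to their expected values, 1.0 and 0.5
print(dropout(h[:3], train=False))  # [1.0, 1.0, 1.0]
```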

<h2 id="practice">Practice</h2>

<p>I’d advise starting with either <a href="http://torch.ch/">Torch</a> (Lua) or <a href="http://deeplearning.net/software/theano/">Theano</a> (Python), two nice libraries that do automatic differentiation.</p>

<p>I put together a <a href="https://gist.github.com/SnippyHolloW/8a0f820261926e2f41cc">single-file simple deep neural network working on small datasets (Python)</a>. It is meant more for pedagogy than for production, but it runs relatively fast on GPUs thanks to Theano. To run it, install Theano (I use the <a href="http://deeplearning.net/software/theano/install.html#bleeding-edge-install-instructions">bleeding edge</a>). To play around with it, look for <code>TODO</code> in the code and change the values there. <a href="https://gist.github.com/SnippyHolloW/8a0f820261926e2f41cc#file-dnn-py-L567-L573">There are several datasets</a> that you can use. You should also play around with the parameters of <a href="https://gist.github.com/SnippyHolloW/8a0f820261926e2f41cc#file-dnn-py-L575-L577">this function</a>, and maybe compare against the SVMs from scikit-learn. Finally, if you use Dropout, you will only see an improvement on large-enough networks (&gt; 1000 units / layer, &gt; 3-4 layers). Here is the result of running this file (<code>python dnn.py</code>) with a small ($784\times200\times200\times10$) ReLU-based L2-regularized network on MNIST:</p>

<p><img src="http://i.imgur.com/M3COTRE.png" /></p>

<p>If your GPU can handle it, you should try Dropout on MNIST with 4 (or more) layers of 2000 units. ;-)</p>
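<p>If you want to see what such a network computes without Theano, here is a minimal numpy sketch of the forward pass and the L2-regularized cross-entropy for a $784\times200\times200\times10$ ReLU network. It is not the code from the gist. The initialization, the <code>l2</code> coefficient and the random stand-in data are assumptions, and there is no training loop: computing the gradients is exactly what Theano’s automatic differentiation takes care of.</p>

```python
import numpy as np

def init_params(sizes, seed=0):
    """He-style Gaussian initialization for a stack of fully connected layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """ReLU hidden layers followed by a softmax output layer."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    logits = h @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def l2_regularized_nll(params, X, y, l2=1e-4):
    """Mean negative log-likelihood plus an L2 penalty on the weights (not biases)."""
    probs = forward(params, X)
    nll = -np.mean(np.log(probs[np.arange(len(y)), y]))
    return nll + l2 * sum(np.sum(W ** 2) for W, _ in params)

params = init_params([784, 200, 200, 10])
rng = np.random.default_rng(1)
X = rng.random((32, 784))          # stand-in for a mini-batch of MNIST images
y = rng.integers(0, 10, size=32)   # stand-in labels
print(forward(params, X).shape, l2_regularized_nll(params, X, y))
```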

<h2 id="conclusion">Conclusion</h2>

<p>I didn’t talk about convolutional neural networks, recurrent neural networks, or other beasts; those should be the next step for the passionate reader. This was just a primer on raw facts for basic deep learning. Depending on what people want, I can either explain the file provided here function by function, or talk about different loss functions (learning embeddings, e.g. as <a href="https://code.google.com/p/word2vec/">word2vec</a> does), recurrent neural nets, etc.</p>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[A random thought about ReLUs]]></title>
    <link href="http://snippyhollow.github.com/blog/2014/07/18/a-random-thought-about-relus/"/>
    <updated>2014-07-18T16:59:00+02:00</updated>
    <id>http://snippyhollow.github.com/blog/2014/07/18/a-random-thought-about-relus</id>
    <content type="html"><![CDATA[<p>We know ReLU rock, they’re fast to compute, they’re fast to converge (train), they combine well with dropout… When I transitioned to ReLU for good, I found out that in phones recognition the “hard sigmoids” (piecewise approximation with 5 pieces) are doing almost as well as ReLUs (e.g. $\approx$24% phone error rate on TIMIT for a given architecture vs 23.5% IIRC), and much better than “smooth” sigmoids (26%). I’ve been wondering for a few months about how much of the good performance of ReLUs comes from the fact that they have a hard 0, that propagates to the upper layer and makes the higher level activations sparser and sparser. Is it well known? Is this random thought in the bad part of the random walk on the posterior? ;-)</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Spikey Spheres]]></title>
    <link href="http://snippyhollow.github.com/blog/2014/07/17/spikey-spheres/"/>
    <updated>2014-07-17T14:42:00+02:00</updated>
    <id>http://snippyhollow.github.com/blog/2014/07/17/spikey-spheres</id>
    <content type="html"><![CDATA[<p>Just a quick blog post to say that if you didn’t read <a href="http://www.penzba.co.uk/cgi-bin/PvsNP.py?SpikeySpheres">“Spikey Spheres”</a> before, you should. And <a href="http://nbviewer.ipython.org/urls/gist.github.com/SnippyHolloW/9025964/raw/b2d266e7e19d64e0343fd899dfbc3e8ddc889269/SpikeySpheres?create=1">there is my IPython notebook that goes with it</a>. There is also <a href="http://djalil.chafai.net/blog/2013/07/14/a-cube-a-starfish-a-thin-shell-and-the-central-limit-theorem/">this connection with the central limit theorem</a> (and the $l^{\infty}$ ball).</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Visualizing phn2vec with biclustering]]></title>
    <link href="http://snippyhollow.github.com/blog/2014/06/20/visualizing-phn2vec-with-biclustering/"/>
    <updated>2014-06-20T12:17:00+02:00</updated>
    <id>http://snippyhollow.github.com/blog/2014/06/20/visualizing-phn2vec-with-biclustering</id>
    <content type="html"><![CDATA[<p>Following <a href="http://snippyhollow.github.io/blog/2014/05/27/phn2vec-embeddings/">my previous blog post on phn2vec</a>, I used <a href="http://scikit-learn.org/stable/modules/biclustering.html">scikit-learn’s biclustering</a> to make the similarity matrices more readable. So here are some quick results for TIMIT:</p>

<h2 id="phonetic-annotation">Phonetic annotation</h2>

<h3 id="biclusters">2 biclusters</h3>

<p>We clearly see consonants vs. vowels.</p>

<p><img src="https://dl.dropboxusercontent.com/u/14035465/pictures/figures_10dim_5window/timit_phones_2_biclusters.png" /></p>

<h3 id="biclusters-1">4 biclusters</h3>

<p>Within the consonants, we clearly see a separation by place of articulation (plus nasals). Silences get their own cluster.</p>

<p><img src="https://dl.dropboxusercontent.com/u/14035465/pictures/figures_10dim_5window/timit_phones_4_biclusters.png" /></p>

<h3 id="biclusters-2">6 biclusters</h3>

<p>Fricatives and nasals get their own clusters.</p>

<p><img src="https://dl.dropboxusercontent.com/u/14035465/pictures/figures_10dim_5window/timit_phones_6_biclusters.png" /></p>

<h2 id="phonemic-transcription">Phonemic transcription</h2>

<h3 id="biclusters-3">2 biclusters</h3>

<p><img src="https://dl.dropboxusercontent.com/u/14035465/pictures/figures_10dim_5window/timit_phonemes_2_biclusters.png" /></p>

<h3 id="biclusters-4">4 biclusters</h3>

<p><img src="https://dl.dropboxusercontent.com/u/14035465/pictures/figures_10dim_5window/timit_phonemes_4_biclusters.png" /></p>

<h3 id="biclusters-5">6 biclusters</h3>

<p><img src="https://dl.dropboxusercontent.com/u/14035465/pictures/figures_10dim_5window/timit_phonemes_6_biclusters.png" /></p>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[phn2vec embeddings]]></title>
    <link href="http://snippyhollow.github.com/blog/2014/05/27/phn2vec-embeddings/"/>
    <updated>2014-05-27T21:00:00+02:00</updated>
    <id>http://snippyhollow.github.com/blog/2014/05/27/phn2vec-embeddings</id>
    <content type="html"><![CDATA[<p>Several months ago, I started thinking in terms of embeddings for everything,
let’s forget about discrete/categorical values and replace everything with
vector spaces that behave as we ask of them!<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></p>

<h3 id="xyz2vec">xyz2vec</h3>

<p>A few months ago, I toyed with <a href="http://radimrehurek.com/gensim/models/word2vec.html">“word2vec”</a> (<a href="http://arxiv.org/pdf/1301.3781.pdf">Mikolov et al. 2013</a>) in gensim on a lot of things, among them phonetic annotations of speech corpora. Basically, a word2vec model is a neural network with one hidden layer, trained with backpropagation on a loss based on either a) predicting the central word given its neighbors (continuous bag-of-words), or b) predicting the neighbors given the central word (skip-gram). This can be applied to corpora of continuous text, but really to anything that has a neighboring structure. So I ran word2vec on phonetic and phonemic datasets (TIMIT and Buckeye), with a window of 5 phone(me)s (+/- 2 around the central one), using both skip-grams (SG) and continuous bag-of-words (CBOW). For all the following results, I used an embedding dimension of 10, which is “contractive” compared to the number of phone(me)s (39). I tried 100 dimensions and got very similar results, so the dimension does not seem to matter much here. All the code to reproduce these results is <a href="https://github.com/SnippyHolloW/speech_embeddings">here</a>.</p>
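<p>The difference between the two training objectives is easiest to see in the training examples they extract from a sequence. Here is a plain-Python sketch with a window of +/- 2 phones, as in the post. The helper names and the toy phone sequence are illustrative, and gensim builds these examples internally rather than through functions like these.</p>

```python
def skipgram_pairs(seq, half_window=2):
    """(center, context) pairs: skip-gram predicts each neighbor from the center."""
    pairs = []
    for i, center in enumerate(seq):
        lo = max(0, i - half_window)
        hi = min(len(seq), i + half_window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, seq[j]))
    return pairs

def cbow_examples(seq, half_window=2):
    """(context, center) examples: CBOW predicts the center from its neighbors."""
    examples = []
    for i, center in enumerate(seq):
        lo = max(0, i - half_window)
        hi = min(len(seq), i + half_window + 1)
        context = [seq[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples

phones = ["sil", "dh", "ax", "k", "ae", "t", "sil"]  # toy phone sequence
print(skipgram_pairs(phones)[:4])
print(cbow_examples(phones)[3])
```

<p>In gensim 4.x, the corresponding settings would roughly be <code>vector_size=10, window=2, sg=1</code> (or <code>sg=0</code> for CBOW). Note that, by default, gensim also randomly shrinks the window at each position, so the pairs above are only an idealization of what it trains on.</p>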

<h2 id="phone2vec">phone2vec</h2>

<p>Using <a href="https://catalog.ldc.upenn.edu/LDC93S1">TIMIT</a> phonetic annotations here are the similarity matrices of phones (SG left and CBOW right):</p>

<p><img src="https://dl.dropboxusercontent.com/u/14035465/pictures/figures_10dim_5window/timit_phones_similarity_sg_left_cbow_right.png" /></p>

<p>If we do a 2-dimensional <a href="http://scikit-learn.org/stable/auto_examples/manifold/plot_lle_digits.html">isomap projection</a> of the skip-gram embeddings, we can see 3 clusters: vowels, (mainly) plosives (stop consonants), and other consonants (some fricatives, nasals…).<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup></p>

<p><img src="https://dl.dropboxusercontent.com/u/14035465/pictures/figures_10dim_5window/timit_phones_isomap_sg.png" /></p>

<p>An isomap of the CBOW gives roughly the same clusters:</p>

<p><img src="https://dl.dropboxusercontent.com/u/14035465/pictures/figures_10dim_5window/timit_phones_isomap_cbow.png" /></p>

<p>Unlike the TIMIT corpus (read sentences designed for phonetic variability and effects), the <a href="http://buckeyecorpus.osu.edu/">Buckeye</a> corpus is made of conversational speech. We find the same (if slightly weaker) clusters, e.g. in an isomap of skip-grams:</p>

<p><img src="https://dl.dropboxusercontent.com/u/14035465/pictures/figures_10dim_5window/buckeye_phones_isomap_sg.png" /></p>

<h2 id="read-speech-vs-conversational-speech">read speech vs. conversational speech</h2>

<p>Keeping in mind that the Buckeye and TIMIT corpora have slightly different phonetic annotations (and different annotation quality too), we can try to compare read speech vs. conversational speech. Here we plot the difference between the two similarity matrices (SG left, CBOW right). The biggest difference is between silences and stop consonants:</p>

<p><img src="https://dl.dropboxusercontent.com/u/14035465/pictures/figures_10dim_5window/timit_phones_vs_buckeye_phones_similarity_sg_left_cbow_right.png" /></p>

<h2 id="phoneme2vec">phoneme2vec</h2>

<p>What about phonemic annotation (phonemes from the word-level transcription)? Here are the similarity matrices (SG left, CBOW right):</p>

<p><img src="https://dl.dropboxusercontent.com/u/14035465/pictures/figures_10dim_5window/timit_words_similarity_sg_left_cbow_right.png" /></p>

<p>We still see the consonant vs. vowel clusters in the isomap of the skip-gram embeddings:</p>

<p><img src="https://dl.dropboxusercontent.com/u/14035465/pictures/figures_10dim_5window/timit_words_isomap_sg.png" /></p>

<p>and of the CBOW embeddings:</p>

<p><img src="https://dl.dropboxusercontent.com/u/14035465/pictures/figures_10dim_5window/timit_words_isomap_cbow.png" /></p>

<p>but we get much weaker distinctions between front and back consonants, as well as between nasals and stops. That’s expected, because phonotactics are not accounted for in the phonemic transcription that I did.</p>

<h2 id="phonetic-vs-phonemic">phonetic vs. phonemic</h2>

<p>We already know that speech (phonetics) and this “higher level” (phonemic) representation differ; how do they differ in this embedding?</p>

<p><img src="https://dl.dropboxusercontent.com/u/14035465/pictures/figures_10dim_5window/timit_phones_vs_words%28phonemes%29_similarity_sg_left_cbow_right.png" /></p>

<p>That’s all folks! So far, I have only worked on English datasets. It would be fun to see what comes up for other languages.</p>

<div class="footnotes">
  <ol>
    <li id="fn:1">
      <p>There are several interesting papers about turning text into embeddings that have useful properties. Picking a few: from classical natural language processing tasks done only with vector spaces (and neural networks) (<a href="http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/35671.pdf">Collobert et al. 2011</a>), to semantic spaces for multi-relational data (<a href="http://papers.nips.cc/paper/5071-translating-embeddings-for-modeling-multi-relational-data.pdf">Bordes et al. 2013</a>). The recent “word2vec” (<a href="http://arxiv.org/pdf/1301.3781.pdf">Mikolov et al. 2013</a>) from Google sparked interest in the NLP community, and it quickly got implemented in <a href="http://radimrehurek.com/gensim/">gensim</a> (if you want to geek out on the implementation, I recommend the <a href="http://radimrehurek.com/2013/09/word2vec-in-python-part-two-optimizing/">excellent blog post about its optimisation</a>).<a href="#fnref:1" rel="reference">&#8617;</a></p>
    </li>
    <li id="fn:2">
      <p><a href="https://en.wikipedia.org/wiki/Consonant#Features">Wikipedia provides some phonetics 101 of consonants</a>.<a href="#fnref:2" rel="reference">&#8617;</a></p>
    </li>
  </ol>
</div>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Collapsed Gibbs Sampling for Dirichlet Process Gaussian Mixture Models]]></title>
    <link href="http://snippyhollow.github.com/blog/2013/03/10/collapsed-gibbs-sampling-for-dirichlet-process-gaussian-mixture-models/"/>
    <updated>2013-03-10T15:44:00+01:00</updated>
    <id>http://snippyhollow.github.com/blog/2013/03/10/collapsed-gibbs-sampling-for-dirichlet-process-gaussian-mixture-models</id>
    <content type="html"><![CDATA[<p>I really enjoyed the pedagogy of <a href="http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/">Edwin Chen’s introduction to infinite mixture models</a>, but I was a little disappointed that it does not go as far as presenting the details of the Dirichlet process Gaussian mixture model (DPGMM), as he uses <a href="http://scikit-learn.org/stable/modules/mixture.html#dpgmm-classifier-infinite-gaussian-mixtures">sklearn’s variational Bayes DPGMM implementation</a>. </p>

<p>For this reason, I will try to give here enough information to implement a DPGMM with collapsed Gibbs sampling. This is not an <a href="http://www.is.tuebingen.mpg.de/fileadmin/user_upload/files/publications/GoeRas10_[0].pdf">in-depth evaluation of which conjugate priors to use</a>, nor an analysis of the <a href="http://www.stat.duke.edu/courses/Spring06/sta376/Support/Mixtures/Escobar.West.1995.pdf">parameters</a> and <a href="ftp://dce.hut.edu.vn/vinhlt/Papers/GMM/Infinite%20GMM.pdf">hyper-parameters</a> (which should have their own priors! ;)).</p>

<h3 id="prerequisites">Prerequisites</h3>
<p>On Dirichlet processes, Chinese Restaurant processes, Indian Buffet processes, there is <a href="http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/">the excellent blog post by Edwin Chen</a>. Another excellent introduction to Dirichlet processes is provided by <a href="http://www.ee.washington.edu/research/guptalab/publications/UWEETR-2010-0006.pdf">Frigyik, Kapila and Gupta</a>.</p>

<p>If you lack some knowledge about clustering or density estimation (unsupervised learning), you can read Chapters 20 (p. 284) to (at least) 22 of <a href="http://www.amazon.com/gp/product/0521642981/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0521642981&amp;linkCode=as2&amp;tag=syhwsblog-20">MacKay’s ITILA</a>, that you can find as a <a href="http://www.inference.phy.cam.ac.uk/itila/book.html">free ebook</a>; or chapter 9 of <a href="http://www.amazon.com/gp/product/0387310738/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0387310738&amp;linkCode=as2&amp;tag=syhwsblog-20">Bishop’s PRML</a>. As a refresher, the <a href="https://en.wikipedia.org/wiki/Mixture_model">Wikipedia article on mixture models</a>, and <a href="http://scikit-learn.org/stable/modules/mixture.html">the sklearn documentation on GMM</a> are more efficient.</p>

<h3 id="dpgmm-the-model">DPGMM: the model</h3>

<p>Let’s say we have $N$ observations and $K$ clusters: $i \in [1\dots N]$ is the index of the observations, while $k \in [1\dots K]$ is the index of the clusters. With $z_i$ the cluster assignment of observation $x_i$, and $\theta_k$ the parameters of mixture component $k$:</p>

<script type="math/tex; mode=display">
P(x_{1:N}) = \prod_{i=1}^N \sum_{k=1}^K P(x_i|\theta_k) P(z_i=k)
</script>

<p>So, the generative story of a DPGMM is as follows:</p>

<p>$\pi \sim Stick(\alpha)$ (mixing rates)<br />
$z_i \sim \pi$ (cluster assignments)<br />
$\theta_k \sim H(\lambda)$ (parameters)<br />
$x_i \sim F(\theta_{z_i})$ (values)   </p>
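<p>The generative story can be simulated directly. The sketch below truncates the stick-breaking construction at <code>k_max</code> sticks. To keep it short, it also uses a simplified base measure: $H$ only draws cluster means from $\mathcal{N}(0, 10^2 I)$, and every cluster has identity covariance (the normal-inverse-Wishart $H$ comes later in the post). Names and parameter values are illustrative.</p>

```python
import numpy as np

def stick_breaking(alpha, k_max, rng):
    """Truncated stick-breaking weights pi ~ Stick(alpha)."""
    betas = rng.beta(1.0, alpha, size=k_max)
    # remaining[k] = prod over earlier sticks l of (1 - beta_l)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    pi = betas * remaining
    pi[-1] = remaining[-1]  # truncation: the last piece takes whatever is left
    return pi / pi.sum()

def sample_dpgmm(n, alpha=1.0, dim=2, k_max=50, seed=0):
    """Draw n points from a truncated DP mixture of unit-covariance Gaussians."""
    rng = np.random.default_rng(seed)
    pi = stick_breaking(alpha, k_max, rng)              # mixing rates
    z = rng.choice(k_max, size=n, p=pi)                 # cluster assignments
    thetas = rng.normal(scale=10.0, size=(k_max, dim))  # cluster means from N(0, 100 I)
    x = thetas[z] + rng.normal(size=(n, dim))           # values, F = N(theta, I)
    return x, z, pi

x, z, pi = sample_dpgmm(500)
print(x.shape, len(np.unique(z)))
```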

<h3 id="fitting-the-data">Fitting the data</h3>
<p>Notation: </p>

<script type="math/tex; mode=display">% &lt;![CDATA[

\begin{align}
& z_{-i} = \{z_j | j \neq i\}\\
& x = x_{1:N}\\
& k^*\mathrm{\ is\ a\ new\ cluster}\\
& x_{-i,k} = \{x_j | z_j = k, j \neq i\}\\
& N_{k,-i} = card(x_{-i,k})
\end{align}
 %]]&gt;</script>

<p>Let’s decompose the probability that observation $i$ belongs to cluster $k$ into two factors, a prior term on the assignments and a likelihood term:</p>

<script type="math/tex; mode=display">
P(z_i = k | z_{-i}, x, \alpha, \lambda) \propto P(z_i = k | z_{-i}, \alpha)P(x_i | x_{-i}, z_i=k, z_{-i}, \lambda)
</script>

<p>Then:</p>

<script type="math/tex; mode=display">
\boxed{P(z_i = k | z_{-i}, \alpha) = \left\{ 
\begin{array}{l}
\frac{N_{k,-i}}{\alpha + N - 1} \mathrm{\ if\ }k\ \mathrm{has\ been\ seen\ before}\\
\frac{\alpha}{\alpha + N - 1} \mathrm{\ if\ }k\ \mathrm{is\ a\ new\ cluster}\\
\end{array}
    \right.}
</script>
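<p>The first boxed term only needs the cluster sizes. Here is a minimal sketch (the helper name is hypothetical), where <code>counts</code> holds the $N_{k,-i}$ of the clusters that are still non-empty once $x_i$ has been removed, so that they sum to $N - 1$:</p>

```python
import numpy as np

def crp_prior(counts, alpha):
    """P(z_i = k | z_{-i}, alpha) for each existing cluster k, then for a new one.

    counts[k] is N_{k,-i}, the size of cluster k once x_i has been removed.
    """
    counts = np.asarray(counts, dtype=float)
    n_minus_1 = counts.sum()  # the N - 1 points other than x_i
    return np.append(counts, alpha) / (alpha + n_minus_1)

print(crp_prior([3, 1], alpha=1.0))
```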

<script type="math/tex; mode=display">
P(x_i | x_{-i}, z_i=k, z_{-i}, \lambda) = P(x_i | x_{-i,k}, \lambda) = \frac{P(x_i, x_{-i, k}, \lambda)}{P(x_{-i,k}|\lambda)}
</script>

<script type="math/tex; mode=display">
\boxed{P(x_i, x_{-i, k}, \lambda) = \int P(x_i | \theta_k)\left[\prod_{j \neq i, z_j = k}P(x_j | \theta_k)\right] H(\theta_k | \lambda) d\theta_k}
</script>

<p>is the marginal likelihood of all the data assigned to cluster $k$, including $i$.</p>

<p>If $z_i = k^*$ (new cluster) then:</p>

<script type="math/tex; mode=display">
P(x_i | x_{-i}, z_i=k^*, z_{-i}, \lambda) = \boxed{P(x_i | \lambda) = \int P(x_i | \theta)H(\theta | \lambda) d\theta}
</script>

<h3 id="conjugate-priors">Conjugate priors</h3>

<p>Now we should choose $H$ to be conjugate to $F$, so that the posterior on the parameters is easy to compute. As we want $F$ to be multivariate normal, we can look at <a href="http://en.wikipedia.org/wiki/Conjugate_prior">Wikipedia’s page of conjugate priors</a> under multivariate normal with unknown $\mu$ and $\Sigma$ to see that $H$ should be normal-inverse-Wishart, with prior parameters:</p>

<ul>
  <li>$\mu_0$ initial mean guess [In my code below, I set it to the mean of the whole dataset.]</li>
  <li>$\kappa_0$ mean fraction (smoothing parameter) [A common value is 1. I set it to 0.]</li>
  <li>$\nu_0$ degrees of freedom [I set it to the number of dimensions.]</li>
  <li>$\Psi_0$ pairwise deviation product (matrix) [I set it to $10 \times I_d$ ($I_d$ is the $d\times d$ identity matrix). The identity matrix makes this prior circular (isotropic); the factor $10$ should depend on the dataset, for instance on the mean distance between points.]</li>
</ul>

<p>This gives us the posterior parameters for <em>one</em> of the clusters (with $n$ points):</p>

<script type="math/tex; mode=display">% &lt;![CDATA[

\begin{align}
& \mu_n = \frac{\kappa_0 \mu_0 + n\tilde{x}}{\kappa_0 +n} = \mu\\
& \kappa_n = \kappa_0 + n\\
& \nu_n = \nu_0 + n \\
& \Psi_n = \Psi_0 + C + \frac{\kappa_0 n}{\kappa_0 + n}(\tilde{x} - \mu_0)(\tilde{x} - \mu_0)^T\\
& \Sigma = \frac{\kappa_n + 1}{\kappa_n (\nu_n - d + 1)}\Psi_n
\end{align}
 %]]&gt;</script>

<p>with $\tilde{x}$ the sample mean and $C=\sum_{i=1}^n (x_i-\tilde{x})(x_i-\tilde{x})^T$.</p>

<p>Setting $\kappa_{0} = 0$ removes the effect of the prior on the posterior mean.
The estimates reduce to the MLE (the sample mean and $C/n$) if:</p>

<script type="math/tex; mode=display">
\kappa_{0} = 0, \nu_{0} = d, \|\Psi_{0}\| = 0
</script>
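<p>These updates translate directly into numpy. The sketch below (function name and return layout are illustrative) also checks the MLE limit numerically: with $\kappa_0 = 0$, $\nu_0 = d$ and $\Psi_0 = 0$, $\mu_n$ is the sample mean and $\Sigma = \frac{n+1}{n(n+1)}C = C/n$, the biased sample covariance.</p>

```python
import numpy as np

def niw_posterior(X, mu0, kappa0, nu0, psi0):
    """Posterior NIW parameters and the matrix Sigma for the points X (n x d)."""
    n, d = X.shape
    xbar = X.mean(axis=0)
    C = (X - xbar).T @ (X - xbar)  # scatter matrix around the sample mean
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    dev = np.outer(xbar - mu0, xbar - mu0)
    psi_n = psi0 + C + (kappa0 * n / kappa_n) * dev
    sigma = (kappa_n + 1) / (kappa_n * (nu_n - d + 1)) * psi_n
    return mu_n, kappa_n, nu_n, psi_n, sigma

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
d = X.shape[1]
mu_n, _, _, _, sigma = niw_posterior(X, mu0=np.full(d, 5.0), kappa0=0.0,
                                     nu0=float(d), psi0=np.zeros((d, d)))
print(np.allclose(mu_n, X.mean(axis=0)),
      np.allclose(sigma, np.cov(X, rowvar=False, bias=True)))
```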

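<p>So now we can compute the posterior predictive for cluster $k$ evaluated at $x_i$. Exactly, it is a multivariate Student-t with $\nu_n - d + 1$ degrees of freedom, location $\mu_n$ and scale matrix $\Sigma$ (as defined above, computed from $x_{-i,k}$); here it is approximated by a Gaussian with the same parameters:</p>

<script type="math/tex; mode=display">
P(x_i | x_{-i}, z_i=k, z_{-i}, \lambda) \approx \mathcal{N}(x_i; \mu_{k,-i}, \Sigma_{k,-i})
</script>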

<h3 id="collapsed-gibbs-sampling">Collapsed Gibbs sampling</h3>

<p>Here is the pseudo-code of collapsed Gibbs sampling adapted from algorithm 3 of <a href="http://www.stat.purdue.edu/~rdutta/24.PDF">Neal’s seminal paper</a>:</p>

<pre><code>while (not converged on mus and sigmas):
    for each i = 1 : N in random order do:
        remove x[i]'s sufficient statistics from old cluster z[i]
        if any cluster is empty, remove it and decrease K
        for each k = 1 : K do:
            compute P_k(x[i]) = P(x[i] | x[-i,k], lambda)
            N[k,-i] = number of points in cluster k, excluding x[i]
            compute P(z[i]=k | z[-i], Data) ∝ N[k,-i] / (alpha + N - 1) * P_k(x[i])
        compute P*(x[i]) = P(x[i] | lambda)
        compute P(z[i]=* | z[-i], Data) ∝ alpha / (alpha + N - 1) * P*(x[i])
        normalize P(z[i] | ...)
        sample z[i] from P(z[i] | ...)
        add x[i]'s sufficient statistics to new cluster z[i]
        (possibly increase K)
</code></pre>
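<p>The pseudocode above can be turned into a small runnable numpy sketch. It is not the code from the gist. It uses the exact multivariate Student-t posterior predictive of the normal-inverse-Wishart model, and it recomputes each cluster’s statistics from its members instead of updating sufficient statistics incrementally. It also replaces the convergence test by a fixed number of sweeps. Finally, it uses toy hyperparameters ($\kappa_0 = 0.01$, $\nu_0 = d + 2$, $\Psi_0 = I_d$) rather than the ones listed above, so that the new-cluster predictive $P(x_i|\lambda)$ is a proper density.</p>

```python
import math
import numpy as np

def niw_posterior(X, mu0, kappa0, nu0, psi0):
    """Normal-inverse-Wishart posterior parameters given the rows of X."""
    n = X.shape[0]
    if n == 0:
        return mu0, kappa0, nu0, psi0
    xbar = X.mean(axis=0)
    C = (X - xbar).T @ (X - xbar)
    kappa_n, nu_n = kappa0 + n, nu0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    psi_n = psi0 + C + (kappa0 * n / kappa_n) * np.outer(xbar - mu0, xbar - mu0)
    return mu_n, kappa_n, nu_n, psi_n

def log_predictive(x, X_k, prior):
    """Log of the NIW posterior predictive (multivariate Student-t) at x."""
    mu_n, kappa_n, nu_n, psi_n = niw_posterior(X_k, *prior)
    d = x.shape[0]
    df = nu_n - d + 1
    scale = (kappa_n + 1) / (kappa_n * df) * psi_n
    diff = x - mu_n
    maha = diff @ np.linalg.solve(scale, diff)
    logdet = np.linalg.slogdet(scale)[1]
    return (math.lgamma((df + d) / 2) - math.lgamma(df / 2)
            - 0.5 * d * math.log(df * math.pi) - 0.5 * logdet
            - 0.5 * (df + d) * math.log1p(maha / df))

def collapsed_gibbs_dpgmm(X, alpha=1.0, n_sweeps=10, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Toy hyperparameters (not the post's): (mu0, kappa0, nu0, psi0)
    prior = (X.mean(axis=0), 0.01, d + 2.0, np.eye(d))
    z = np.arange(n)  # start with one cluster per point
    for _ in range(n_sweeps):
        for i in rng.permutation(n):
            z[i] = -1  # remove x[i]; empty clusters simply disappear
            labels = np.unique(z[z >= 0])
            # The common denominator alpha + N - 1 cancels after normalization.
            log_p = [math.log(np.sum(z == k)) + log_predictive(X[i], X[z == k], prior)
                     for k in labels]
            log_p.append(math.log(alpha) + log_predictive(X[i], X[:0], prior))
            p = np.exp(np.array(log_p) - max(log_p))
            choice = rng.choice(len(p), p=p / p.sum())
            z[i] = labels[choice] if choice != len(labels) else z.max() + 1
        z = np.unique(z, return_inverse=True)[1]  # relabel as 0..K-1
    return z

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-5.0, 1.0, size=(40, 2)), rng.normal(5.0, 1.0, size=(40, 2))])
labels = collapsed_gibbs_dpgmm(X)
print(len(np.unique(labels)))
```

<p>On two well-separated blobs like these, every resulting cluster should contain points from a single blob. How many clusters remain after a few sweeps depends on $\alpha$ and on the hyperparameters.</p>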

<h3 id="results">Results</h3>

<p>Here is the result of our implementation of collapsed Gibbs sampling DPGMM compared to scikit-learn’s implementation of <a href="http://scikit-learn.org/stable/modules/mixture.html#dpgmm-classifier-infinite-gaussian-mixtures">variational Bayes DPGMM</a>:</p>

<p><img src="https://dl.dropbox.com/u/14035465/pictures/DPGMM.png" /></p>

<h3 id="code">Code</h3>

<p>Here is my quick-and-dirty code implementing this version of Gibbs sampling for DPGMM. You may want to comment out the scikit-learn parts (used for the comparison above) if you do not have it installed.</p>

<div><script src="https://gist.github.com/5128969.js"></script>
<noscript><pre><code /></pre></noscript></div>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[From hacks to Bayesian probability]]></title>
    <link href="http://snippyhollow.github.com/blog/2013/03/08/from-hacks-to-bayesian-probability/"/>
    <updated>2013-03-08T18:58:00+01:00</updated>
    <id>http://snippyhollow.github.com/blog/2013/03/08/from-hacks-to-bayesian-probability</id>
    <content type="html"><![CDATA[<p>In which we look at two pragmatic hacks that lead to the Bayesian approach of probabilities, when pushed further and added as constraints.</p>

<h2 id="coinflips">Coinflips</h2>

<p>Let’s say we have a coin, and we want to decide if it’s fair. We throw it $N$ times and get $m$ heads, coding heads=1 and tails=0. With $\mu$ the ratio (i.e. probability) of heads:</p>

<script type="math/tex; mode=display"> 
P(m | N, \mu) = Binomial(m|N,\mu) = \binom{N}{m} \mu^m (1-\mu)^{N-m}
</script>

<h3 id="maximum-likelihood">Maximum likelihood</h3>
<p>How do we set $\mu$? We could maximize the probability of the data that we saw under our model, that is, maximize the <em>likelihood</em>. Let’s say that $D = \{x_1, \dots, x_N\}$; then we have:</p>

<script type="math/tex; mode=display">
P(D|\mu) = \prod_{n=1}^N P(x_n|\mu) = \prod_{n=1}^N \mu^{x_n}(1-\mu)^{1-x_n}
</script>

<p>The maximum of this function of $\mu$ is reached for $\mu= \frac{m}{N}$. The problem arises if we have little data (in fact, when we have data that does not cover the whole space of possible data). If $D=(1,1,1)$, the maximum likelihood estimate of $\mu$ will be $1.0$. It means that we predict that <em>all</em> the tosses will land on heads, after only three observations!</p>
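<p>Here is a quick numerical sanity check of both statements, using a brute-force grid search over $\mu$. It is purely illustrative (in practice one would use the closed form $m/N$), and the helper names are hypothetical.</p>

```python
import numpy as np

def likelihood(mu, data):
    """P(D | mu) for i.i.d. coin flips coded heads=1, tails=0."""
    data = np.asarray(data)
    m, n = data.sum(), data.size
    return mu ** m * (1.0 - mu) ** (n - m)

def grid_mle(data, n_grid=101):
    """Maximize the likelihood over a grid of mu values in [0, 1]."""
    grid = np.linspace(0.0, 1.0, n_grid)
    return grid[np.argmax(likelihood(grid, data))]

print(grid_mle([1, 0, 1, 1]))  # closed form: m / N = 0.75
print(grid_mle([1, 1, 1]))     # closed form: 1.0, i.e. "always heads"
```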

<h3 id="smoothing">Smoothing</h3>

<p>A classical hack is to smooth the maximum likelihood estimate by adding “fake data”: we could consider that we already saw the coin land on heads once and on tails once before getting our data. This way, before (“prior to”) the experiment, we would have $\mu=1/2=0.5$. After (<em>posterior</em> to) our experiment, taking the data into account, we would have $\mu = (3+1)/(3+2) = 0.8$. How do we set these prior coin flips (smoothing parameters)?</p>

<h3 id="maximum-a-posteriori">Maximum A Posteriori</h3>

<p>The <em>right way</em> to encode this prior knowledge is to put a probability distribution on the parameter $\mu$. As $\mu$ is a ratio, we should have a continuous distribution on $[0, 1]$ that can represent a whole range of prior belief on what the coin’s ratio of heads is. For these reasons, a sensible choice is the Beta distribution:</p>

<script type="math/tex; mode=display">
Beta(\mu|a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \mu^{a-1}(1-\mu)^{b-1}
</script>

<p>On Wikipedia, we can check what the $Beta(x|\alpha, \beta)$ distribution looks like:</p>

<p><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/f/f3/Beta_distribution_pdf.svg/639px-Beta_distribution_pdf.svg.png" alt="Plots of the Beta distribution" /></p>

<p>Now we can compute the <em>posterior</em> distribution of $\mu$ given the data $D$ and the Beta prior ($\propto$ means “proportional to”):</p>

<script type="math/tex; mode=display">% &lt;![CDATA[

\begin{align}
P(\mu | a_0, b_0, D) & \propto P(D|\mu)P(\mu|a_0, b_0)\\
                    & \propto \left( \prod_{n=1}^N \mu^{x_n} (1-\mu)^{1-x_n} \right) Beta(\mu | a_0, b_0)
\end{align}
 %]]&gt;</script>

<p>Fortunately, the Beta distribution is the conjugate prior for the Bernoulli and binomial distributions, and thus a bit of calculus reduces it to:</p>

<script type="math/tex; mode=display">
P(\mu | a_0, b_0, D) \propto Beta(\mu|a_N, b_N)\\
a_N = a_0 + m\\
b_N = b_0 + (N-m)
</script>

<p>We can check that, when $N \rightarrow \infty$, the posterior expectation of $\mu$ tends to the maximum likelihood estimate $\mu_{ML} = \frac{m}{N}$, as:</p>

<script type="math/tex; mode=display">
\mathbb{E} [\mu | a_0, b_0, D] = \frac{a_N}{a_N + b_N} = \frac{a_0 + m}{a_0 + b_0 + N}
</script>
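<p>Both the smoothing hack and this limit can be checked in a few lines of plain Python (the helper names are illustrative). With a uniform $Beta(1, 1)$ prior, the posterior mean $\frac{a_0 + m}{a_0 + b_0 + N}$ is exactly the “one fake head, one fake tail” estimate from the smoothing section:</p>

```python
def beta_posterior(a0, b0, flips):
    """Posterior Beta(a_N, b_N) after observing coin flips coded heads=1, tails=0."""
    m = sum(flips)
    return a0 + m, b0 + (len(flips) - m)

def posterior_mean(a, b):
    """E[mu] under Beta(a, b)."""
    return a / (a + b)

# Beta(1, 1) is uniform on [0, 1]; after D = (1, 1, 1) the posterior mean is
# (3 + 1) / (3 + 2), the same number as the smoothing hack.
a_n, b_n = beta_posterior(1, 1, [1, 1, 1])
print(a_n, b_n, posterior_mean(a_n, b_n))

# With many flips, the prior washes out and the posterior mean approaches m / N.
flips = [1] * 7000 + [0] * 3000
print(posterior_mean(*beta_posterior(1, 1, flips)))
```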

<h3 id="first-conclusion">First conclusion</h3>

<p>This approach of putting a prior on the parameters of the distributions that are essential to our model (the predicting distribution) is central to the Bayesian approach of building models. It makes the model robust even when we have few data. It makes it easier to reason about our prior assumptions than simply “adding unseen data”, and the prior yields to the data as more data comes in.</p>

<p>If you’re interested in Bayesian modeling, there are plenty of very good textbooks. My preferred gradual introduction is <a href="http://www.amazon.com/gp/product/0521642981/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0521642981&amp;linkCode=as2&amp;tag=syhwsblog-20">MacKay’s ITILA</a>, which you can find as a <a href="http://www.inference.phy.cam.ac.uk/itila/book.html">free ebook</a>.</p>

<h2 id="causality">Causality</h2>

<p>Now here is another hack for logical reasoning that leads to Bayesian probabilities. Let’s say that you want to express that an event $A$ entails an event $B$; in logic you would write $A \Rightarrow B$. We will abuse the notation $A=[A=true]$ and $\neg A=[A=false]$. With <a href="http://en.wikipedia.org/wiki/Modus_ponens"><em>modus ponens</em></a>, you can deduce $B$ whenever $A$ is true.</p>

<script type="math/tex; mode=display">
\frac{[A\Rightarrow B] \wedge A}{B}
</script>

<h3 id="plausible-reasoning">Plausible reasoning</h3>

<p>Now, we want to extend propositional logic to <em>plausible reasoning</em>, in which we can have degrees of probability that rules are true, or degrees of belief in these rules and facts. A pragmatic way to do that is to introduce the variable $C$ which represents $A \Rightarrow B$: if $P(C)=p$, there is a probability $p$ that $A \Rightarrow B$. Then, the previous <em>modus ponens</em> translates to:</p>

<script type="math/tex; mode=display">
P(B|A,C) = \frac{P(A|B,C)P(B|C)}{P(A|C)}\ (Bayes'\ theorem)\\
P(B|A,C)=\frac{P(A,B|C)}{P(A|C)}\ (Product\ rule)
</script>

<p>And actually, as $P(A,B|C)=P(A|C)$, we have $P(B|A,C)=1$, which corresponds to the strong syllogism of <em>modus ponens</em>. </p>

<p>So now, if we are only 80% sure of $C$, we can write $P(C) = 0.8$ and look for $P(B|A)$ (we are 100% sure of $A$):</p>

<script type="math/tex; mode=display">% &lt;![CDATA[

\begin{align}
P(B|A) = \frac{\sum_{C\in\{false,true\}} P(B|A,C)P(A)P(C)}{P(A)} & = P(\neg C)P(B|A,\neg C) + P(C)P(B|A,C)\\
& = 0.2\,x + 0.8 \times 1.0 \geq 0.8, \quad x = P(B|A,\neg C) \in [0,1]
\end{align}
 %]]&gt;</script>
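<p>The same marginalization in a few lines of Python, assuming (as the derivation does) that $A$ and $C$ are independent and that $P(B|A,C) = 1$. The argument <code>x</code> is the unknown $P(B|A,\neg C)$, and the function name is illustrative:</p>

```python
def p_b_given_a(p_c, x):
    """P(B | A) = P(not C) P(B | A, not C) + P(C) P(B | A, C), with P(B | A, C) = 1."""
    return (1.0 - p_c) * x + p_c * 1.0

# Whatever x in [0, 1] is, P(B | A) stays between 0.8 and 1.0 when P(C) = 0.8.
for x in (0.0, 0.5, 1.0):
    print(x, p_b_given_a(0.8, x))
```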

<p>This means that $B$ is true with probability at least 80% by following the strong syllogism of modus ponens, and that it can also be true even when $C=false$.</p>

<p>Finally, contrary to propositional logic, we <em>also</em> get the weak syllogism (I’ll let you think it through):</p>

<script type="math/tex; mode=display">
\frac{[A\Rightarrow B] \wedge B}{A\ becomes\ more\ plausible}
</script>
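<p>For those who want a numerical hint: take any joint distribution on $A$ and $B$ where $A \Rightarrow B$ holds with certainty, i.e. $P(A, \neg B) = 0$. Then $P(A|B) = P(A)/P(B) \geq P(A)$. The probabilities below are arbitrary, and the function name is illustrative:</p>

```python
import math

def weak_syllogism(p_a_and_b, p_not_a_and_b, p_not_a_not_b):
    """Return (P(A), P(A | B)) for a joint distribution where P(A, not B) = 0."""
    if not math.isclose(p_a_and_b + p_not_a_and_b + p_not_a_not_b, 1.0):
        raise ValueError("probabilities must sum to 1")
    p_a = p_a_and_b                   # A always comes with B
    p_b = p_a_and_b + p_not_a_and_b
    return p_a, p_a_and_b / p_b       # observing B raises the plausibility of A

print(weak_syllogism(0.2, 0.3, 0.5))
```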

<p>A similar derivation and observation can be done for <a href="http://en.wikipedia.org/wiki/Modus_tollens"><em>modus tollens</em></a>.</p>

<h3 id="cox-jaynes-theorem">Cox-Jaynes theorem</h3>

<p>A reasoning mechanism needs to be consistent (one cannot prove $A$ and $\neg A$ at the same time). For plausible reasoning, consistency means: a) all the possible ways to reach a conclusion lead to the same result, b) information cannot be ignored, c) two equal states of knowledge have the same plausibilities. Adding consistency to plausible reasoning leads to <a href="http://en.wikipedia.org/wiki/Cox's_theorem">Cox’s theorem</a>, which derives the laws of probability (the product rule and the sum rule). So, the degrees of belief of any consistent induction mechanism satisfy Kolmogorov’s axioms.</p>

<h3 id="second-and-last-conclusion">Second and last conclusion</h3>

<p>With plausible reasoning, we get all the benefits of propositional logic, but we can also reason with/about facts and rules that are not 100% true. This is another example of how a pragmatic (sensible) hack to extend logic to “degrees of belief” (probabilities) leads to Bayesian probabilities.</p>

<p>If you are interested in learning about plausible reasoning, you can <a href="http://emotion.inrialpes.fr/people/synnaeve/phdthesis/phdthesis.html#x1-590003.2">look at my thesis</a>, or, better yet, read about it directly from one of the masters in <a href="http://www.amazon.com/gp/product/0521592712/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0521592712&amp;linkCode=as2&amp;tag=syhwsblog-20">Jaynes’s Probability Theory: The Logic of Science</a>, for which the pre-print is <a href="http://www-biba.inrialpes.fr/Jaynes/prob.html">there</a>.</p>

]]></content>
  </entry>
  
</feed>