intro.html


<!DOCTYPE html>


<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>What is structured learning? &mdash; pystruct 0.2.4 documentation</title>
    
    <link rel="stylesheet" href="_static/basic.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    <link rel="stylesheet" href="_static/gallery.css" type="text/css" />
    <link rel="stylesheet" href="_static/pystruct.css" type="text/css" />
    <link rel="stylesheet" href="_static/bootstrap.min.css" type="text/css" />
    <link rel="stylesheet" href="_static/bootswatch-3.3.4/cerulean/bootstrap.min.css" type="text/css" />
    <link rel="stylesheet" href="_static/bootstrap-sphinx.css" type="text/css" />
    
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    './',
        VERSION:     '0.2.4',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <script type="text/javascript" src="_static/js/jquery-1.11.0.min.js"></script>
    <script type="text/javascript" src="_static/js/jquery-fix.js"></script>
    <script type="text/javascript" src="_static/bootstrap-3.3.4/js/bootstrap.min.js"></script>
    <script type="text/javascript" src="_static/bootstrap-sphinx.js"></script>
    <link rel="top" title="pystruct 0.2.4 documentation" href="index.html" />
    <link rel="next" title="Installation" href="installation.html" />
    <link rel="prev" title="pystruct.plot_learning.plot_learning" href="generated/pystruct.plot_learning.plot_learning.html" />
<meta charset='utf-8'>
<meta http-equiv='X-UA-Compatible' content='IE=edge,chrome=1'>
<meta name='viewport' content='width=device-width, initial-scale=1.0, maximum-scale=1'>
<meta name="apple-mobile-web-app-capable" content="yes">

  </head>
  <body role="document">

  <div id="navbar" class="navbar navbar-default navbar-fixed-top">
    <div class="container">
      <div class="navbar-header">
        <!-- .btn-navbar is used as the toggle for collapsed navbar content -->
        <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".nav-collapse">
          <span class="icon-bar"></span>
          <span class="icon-bar"></span>
          <span class="icon-bar"></span>
        </button>
        <a class="navbar-brand" href="index.html">
          PyStruct</a>
        <span class="navbar-text navbar-version pull-left"><b>0.2.4</b></span>
      </div>

        <div class="collapse navbar-collapse nav-collapse">
          <ul class="nav navbar-nav">
            
                <li><a href="index.html">Start</a></li>
                <li><a href="installation.html">Installation</a></li>
                <li><a href="#">Introduction</a></li>
                <li><a href="user_guide.html">User Guide</a></li>
                <li><a href="auto_examples/index.html">Examples</a></li>
                <li><a href="references.html">API</a></li>
            
            
              <li class="dropdown globaltoc-container">
  <a role="button"
     id="dLabelGlobalToc"
     data-toggle="dropdown"
     data-target="#"
     href="index.html">Site <b class="caret"></b></a>
  <ul class="dropdown-menu globaltoc"
      role="menu"
      aria-labelledby="dLabelGlobalToc"></ul>
</li>
              
            
          </ul>

          
<form class="navbar-form navbar-right" action="search.html" method="get">
 <div class="form-group">
  <input type="text" name="q" class="form-control" placeholder="Search" />
 </div>
  <input type="hidden" name="check_keywords" value="yes" />
  <input type="hidden" name="area" value="default" />
</form>
          
        </div>
    </div>
  </div>

<div class="container content-container">
  
  <div class="section" id="what-is-structured-learning">
<span id="intro"></span><h1>What is structured learning?<a class="headerlink" href="#what-is-structured-learning" title="Permalink to this headline">¶</a></h1>
<p>Structured prediction is a generalization of the standard paradigms of
supervised learning, classification and regression. All of these can be thought
of finding a function that minimizes some loss over a training set. The
differences are in the kind of functions that are used and the losses.</p>
<p>In classification, the target domain are discrete class labels, and the loss
is usually the 0-1 loss, i.e. counting the misclassifications. In regression,
the target domain is the real numbers, and the loss is usually mean squared
error.
In structured prediction, both the target domain and the loss are
more or less arbitrary. This means the goal is not to predict a label or a
number, but a possibly much more complicated object like a sequence or a
graph.</p>
<p>In structured prediction, we often deal with finite, but large output spaces Y.
This situation could be dealt with using classification with a very large
number of classes. The idea behind structured prediction is that we can do
better than this, by making use of the structure of the output space.</p>
<div class="section" id="a-very-simplified-example">
<h2>A (very simplified) example<a class="headerlink" href="#a-very-simplified-example" title="Permalink to this headline">¶</a></h2>
<p>Let&#8217;s say we want to generate text from spoken sentences. Viewed as a pure
classification problem, we could see each possible sentence as a class. This
has several drawbacks: we have many classes, and to do correct predictions, we
have to have all possible sentences in the training set. That doesn&#8217;t work
well. Also, we might not care about getting the sentence completely right.</p>
<p>If we misinterpret a single word, this might be not as bad as
misinterpreting every word. So a 0-1 loss on sentences seems inappropriate.
We could also try to view every word as a separate class and try to predict
each word individually. This seems somehow better, since we could learn to get
most of the word in a sentence right. On the other hand, we lose all context.
So for example the expression &#8220;car door&#8221; is way more likely than &#8220;car boar&#8221;,
while predicted individually these could be easily confused.
For a similar example, see <a class="reference internal" href="auto_examples/plot_letters.html#sphx-glr-auto-examples-plot-letters-py"><span>OCR Letter sequence recognition</span></a>.</p>
<p>Structured prediction tries to overcome these problems by considering the
output (here the sentence) as a whole and using a loss function that is
appropriate for this domain.</p>
</div>
<div class="section" id="a-formalism">
<h2>A formalism<a class="headerlink" href="#a-formalism" title="Permalink to this headline">¶</a></h2>
<p>I hope I have convinced you that structured prediction is a useful thing. So
how are we going to formalize this? Having functions that produce arbitrary
objects seem a bit hard to handle. There is one very basic formula at the heart
of structured prediction:</p>
<div class="math">
<p><img src="_images/math/f7235989bf1dfa9e7727b21a7e467cdb2dd47c79.png" alt="y^* = \arg \max_{y \in Y} f(x, y)"/></p>
</div><p>Here x is the input, Y is the set of all possible outputs and f is a
compatibility function that says how well y fits the input x. The prediction
for x is y*, the element of Y that maximizes the compatibility.</p>
<p>This very simple formula allows us to predict arbitrarily complex outputs, as
long as we can say how compatible a given output is with the input.</p>
<p>This approach opens up two questions:</p>
<p>How do we specify f? How do we compute y*?</p>
<p>As I said above, the output set Y is usually a finite but very large set (all
graphs, all sentences in the English language, all images of a given
resolution). Finding the argmax in the above equation by exhaustive search is
therefore out of the question. We need to restrict ourselves to f such that
we can do the maximization over y efficiently. The most popular tool for
building such f is using energy functions or conditional random fields (CRFs).</p>
<p>There are basically three challenges in doing structured learning and prediction:</p>
<ul class="simple">
<li>Choosing a parametric form of f.</li>
<li>Solving <img class="math" src="_images/math/e0d7d9cd8e2168df793d78d09c74eeed72127abb.png" alt="\arg\max_y f(x, y)"/>.</li>
<li>Learning parameters for f to minimize a loss.</li>
</ul>
<p>PyStruct takes <img class="math" src="_images/math/0001d02b63ede2fe3219e05a7cd09c82ae6298b6.png" alt="f"/> to be a linear function of some parameters <code class="docutils literal"><span class="pre">w</span></code> and a joint feature function of <code class="docutils literal"><span class="pre">x</span></code> and <code class="docutils literal"><span class="pre">y</span></code>:</p>
<div class="math">
<p><img src="_images/math/7b079201bf95e8d4675a0a9c95dce44853d60b63.png" alt="f(x, y) = w^T \text{joint\_feature}(x, y)"/></p>
</div><p>So that the prediction is given by</p>
<div class="math">
<p><img src="_images/math/b1e5c1cf6b4f9480fe317d65e83b2a9207859e25.png" alt="y^* = \arg \max_{y \in Y} w^T \text{joint\_feature}(x, y)"/></p>
</div><p>Here <img class="math" src="_images/math/8659700e6646cd91bc02c32affaa5ec046ee9935.png" alt="w"/> are parameters that are learned from data, and <code class="docutils literal"><span class="pre">joint_feature</span></code> is
defined by the user-specified structure of the model.
The definition of <code class="docutils literal"><span class="pre">joint_feature</span></code> is given by the <a class="reference internal" href="references.html#models"><span>Models</span></a>.
PyStruct assumes that <code class="docutils literal"><span class="pre">y</span></code> is a discrete vector, and most models in PyStruct
assume a pairwise decomposition of the energy f over entries of <code class="docutils literal"><span class="pre">y</span></code>, that is</p>
<div class="math">
<p><img src="_images/math/bc87b4911f306f548498800b60e79e793afd9e97.png" alt="f(x, y) = w^T\ \text{joint\_feature}(x, y) = \sum_{i \in V} w_i^T\ \text{joint\_feature}_i(x, y_i) + \sum_{(i, j) \in E} w_{i, j}^T\ \text{joint\_feature}_{i, j}(x, y_i, y_j)"/></p>
</div><p>Here V are a set of nodes corresponding to the entries of <code class="docutils literal"><span class="pre">y</span></code>, and E are a set of edges between the nodes.
The particular form of <img class="math" src="_images/math/182c2a332b2e6cb9a1abbce1c5916ac2c3515748.png" alt="\text{joint\_feature}_i"/> and <img class="math" src="_images/math/8afd74b8878391de16a0babf27ae89f940fe1b74.png" alt="\text{joint\_feature}_{i, j}"/> depends on the model used. See the <a class="reference internal" href="user_guide.html#user-guide"><span>User Guide</span></a> for details on the models.</p>
<p>The second problem, computation of the argmax, is done via third party inference solvers.
The interfaces to these are explained at <a class="reference internal" href="references.html#inference"><span>Inference</span></a>.
The last part, the learning of <img class="math" src="_images/math/8659700e6646cd91bc02c32affaa5ec046ee9935.png" alt="w"/> is actually the core part of PyStruct.
There are several different algorithms implemented, which you can find under <a class="reference internal" href="references.html#learning"><span>Learning</span></a>.</p>
<p>There have been many publications and book on this topics. For a nice introduction (in the context of computer vision), I recommend
Sebastian Nowozin, Christoph H. Lampert:</p>
<p><a class="reference external" href="http://pub.ist.ac.at/%7Echl/papers/nowozin-fnt2011.pdf">Structured Learning and Prediction in Computer Vision</a></p>
<p>Two of the founding publications on the topic of learning structured models are:</p>
<ul class="simple">
<li>Ben Taskar, Carlos Guestrin, Daphne Koller <a class="reference external" href="http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2003_AA04.pdf">Max-Margin Markov Networks</a></li>
<li>Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun <a class="reference external" href="http://www.jmlr.org/papers/volume6/tsochantaridis05a/tsochantaridis05a.pdf">Large Margin Methods for Structured and Interdependent Output Variables</a></li>
</ul>
</div>
</div>


</div>
<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-43292385-1', 'pystruct.github.io');
  ga('send', 'pageview');

</script>
<footer class="footer">
  <div class="container">
    <p class="pull-right">
      <a href="#">Back to top</a>
      
    </p>
    <p>
        &copy; Copyright 2013, Andreas Mueller.<br/>
      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.3.3.<br/>
    </p>
  </div>
</footer>
  </body>
</html>