<center>
<h2>Practical advice</h2>
<h3>Making sure everything actually works!</h3>
</center>

### How to apply Imitation Learning? ###

For any task, define:
- Transition
- Loss function
- Expert policy
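
As a minimal sketch of these three components, assume a toy sequence-tagging task. The tag set, the `State` class and all function names below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List

TAGS = ["D", "N", "V"]  # hypothetical action set: one tag per word


@dataclass
class State:
    words: List[str]
    tags: List[str] = field(default_factory=list)

    def is_final(self) -> bool:
        return len(self.tags) == len(self.words)


def transition(state: State, action: str) -> State:
    """Transition: apply one action (tag the next word) and return the new state."""
    return State(state.words, state.tags + [action])


def loss(state: State, gold: List[str]) -> int:
    """Loss function: Hamming loss between predicted and gold tags."""
    return sum(p != g for p, g in zip(state.tags, gold))


def expert_policy(state: State, gold: List[str]) -> str:
    """Expert policy: for Hamming loss, the gold tag of the next word is optimal."""
    return gold[len(state.tags)]
```

Real tasks usually need a richer state and a less trivial expert, but the division into these three pieces stays the same.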

### Is the expert policy working as intended? ###

Perform full rollins with $\pi^{\star}$.<ul><li>Resulting sequences should have $L(S_{final}, \mathbf{y}) = 0$.</li><li class="fragment" data-fragment-index="3">Or close to 0, if happy with suboptimal $\pi^{\star}$.</li></ul>

<center>
<img src="images/stateTransitExpert.png">
</center>
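
One way to run this check, sketched on a hypothetical toy tagging task: roll in with the expert on every training instance and list the instances whose final loss is not zero. The task functions and the `expert_rollin_losses` helper are assumptions made for this example.

```python
def expert_rollin_losses(instances, expert, transition, is_final, loss):
    """Roll in with the expert on each (initial_state, gold) pair; return final losses."""
    losses = []
    for state, gold in instances:
        while not is_final(state):
            state = transition(state, expert(state, gold))
        losses.append(loss(state, gold))
    return losses


# Hypothetical toy tagging task: a state is (words, tags_so_far).
def is_final(state):
    words, tags = state
    return len(tags) == len(words)


def transition(state, action):
    words, tags = state
    return (words, tags + (action,))


def expert(state, gold):
    _, tags = state
    return gold[len(tags)]


def hamming_loss(state, gold):
    _, tags = state
    return sum(p != g for p, g in zip(tags, gold))


instances = [
    ((("the", "dog", "barks"), ()), ("D", "N", "V")),
    ((("birds", "sing"), ()), ("N", "V")),
]
losses = expert_rollin_losses(instances, expert, transition, is_final, hamming_loss)
# Indices of instances where the expert does not reach zero loss.
suspicious = [i for i, value in enumerate(losses) if value > 0]
```

With a deliberately suboptimal expert, the `value > 0` test would be replaced by a small tolerance.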

If the expert policy is too expensive to compute during rollouts:
- Consider suboptimal alternatives!
- A cheaper expert with more IL iterations could be more helpful.

### Are the roll-outs working as intended? ###

Ensure costs obtained through rollouts are sensible.
- Actions returned by optimal $\pi^{\star}$ should have low cost.
- Equally optimal actions should receive similar costs.

<center>
<img src="images/determReachStatesScored.png" width="60%">
</center>
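
A sanity check along these lines could look as follows. This is only a sketch on a hypothetical tagging task with Hamming loss, where rolling out with the expert amounts to copying the remaining gold tags; `rollout_costs` and the tag set are illustrative names.

```python
ACTIONS = ("D", "N", "V")  # hypothetical tag set


def rollout_costs(tags, gold, actions=ACTIONS):
    """Cost each action: take it, roll out with the expert, score the final sequence."""
    final_losses = {}
    for action in actions:
        seq = list(tags) + [action]
        seq += gold[len(seq):]  # expert rollout: copy the remaining gold tags
        final_losses[action] = sum(p != g for p, g in zip(seq, gold))
    best = min(final_losses.values())
    # Report costs relative to the best action, so an optimal action costs 0.
    return {a: value - best for a, value in final_losses.items()}


gold = ["D", "N", "V"]
costs = rollout_costs(["D"], gold)
expert_action = gold[1]
```

If the expert's action does not come out with the lowest cost, either the expert or the rollout code deserves a closer look.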

### Is Imitation Learning working? ###

Examine time-steps before and after application of IL.

<center>
<img src="images/toBeAnimated/NLG_LOLS.png" width="80%">
</center>

If loss rises:
- Adjust transition system.
- Make sure features (latent representations) describe the state and actions adequately.

### Avoid cycles ###

Need to prevent cycles between state transitions!
<br>
<br>
<span style="font-variant: small-caps;">... -> Swap($e_i$, $e_j$) -> Swap($e_j$, $e_i$) -> ...</span>

Cycles can be prevented in either:
- the transition system, or
- the loss function.
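
For the transition-system option, one possible sketch is to remember the states visited so far and disallow any action that leads back to one of them. The swap-based state and the function names here are assumptions for illustration only.

```python
from itertools import combinations


def apply_swap(seq, action):
    """Return a new tuple with the elements at positions i and j swapped."""
    i, j = action
    out = list(seq)
    out[i], out[j] = out[j], out[i]
    return tuple(out)


def valid_actions(seq, visited):
    """All swaps whose resulting state has not been visited before."""
    return [
        (i, j)
        for i, j in combinations(range(len(seq)), 2)
        if apply_swap(seq, (i, j)) not in visited
    ]


start = ("a", "b", "c")
visited = {start}
state = apply_swap(start, (0, 1))  # Swap(e_0, e_1)
visited.add(state)
allowed = valid_actions(state, visited)  # Swap(e_0, e_1) again would undo the step
```

Tracking visited states also blocks longer cycles, not just an immediate undo. The alternative is to leave the transition system unrestricted and make the loss penalise repeated states.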

### Are the actions learnable? ###

Make sure that the features (latent representations) can describe them adequately.

Include features that consider the previous actions.
- To avoid repeating or undoing previous actions.
- To learn action chains.
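
A minimal sketch of such features, assuming sparse indicator features stored in a dictionary. The feature names and the `extract_features` signature are hypothetical.

```python
def extract_features(word, history, n_prev=2):
    """Indicator features for the current word plus the last n_prev actions."""
    feats = {"word=" + word: 1.0}
    # Pad so that the first time-steps still get well-defined history features.
    padded = ["<s>"] * n_prev + list(history)
    for k in range(1, n_prev + 1):
        feats[f"prev{k}={padded[-k]}"] = 1.0
    # A conjoined feature lets a linear model pick up short action chains.
    feats["prev_chain=" + "_".join(padded[-n_prev:])] = 1.0
    return feats


features = extract_features("dog", ["D"])
```

Single previous-action features help the classifier avoid undoing its last step, while the conjoined feature is what exposes chains of actions.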

### Identify problems in your task ###

Suboptimal expert policy?
- Use rollouts that mix expert and classifier.
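
As a sketch of such a mixed rollout policy: at each step, follow the expert with probability `beta` and the learned classifier otherwise. Mixing per step is an assumption made here; choosing one of the two policies once per rollout is another common variant. All names are illustrative.

```python
import random


def mixed_rollout_policy(expert, classifier, beta, rng):
    """Build a policy that follows the expert with probability beta at each step."""
    def policy(state, gold):
        if rng.random() < beta:
            return expert(state, gold)
        return classifier(state)
    return policy


# Hypothetical stand-ins for a real expert and a trained classifier.
def toy_expert(state, gold):
    return gold[len(state)]


def toy_classifier(state):
    return "N"


rng = random.Random(0)
always_expert = mixed_rollout_policy(toy_expert, toy_classifier, beta=1.0, rng=rng)
never_expert = mixed_rollout_policy(toy_expert, toy_classifier, beta=0.0, rng=rng)
```

Lowering `beta` over the iterations shifts the cost estimates from the expert's view of the future towards the classifier's own behaviour.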

Errors in rollin introducing noise to the feature vectors?
- Try sequence correction.

Variance or irrelevant errors in rollouts affect costing?
- Try focused costing.

Too many actions to rollout?
- Try targeted exploration.
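
One way to read targeted exploration, sketched under assumptions: roll out only the expert's action plus those actions the current classifier scores within a threshold of its best-scoring action. The function name, the score dictionary and the threshold value are hypothetical.

```python
def targeted_actions(scores, expert_action, threshold):
    """Select the actions worth rolling out.

    scores: classifier score per action (higher is better).
    Keeps actions scored within `threshold` of the best one,
    and always keeps the expert's action.
    """
    best = max(scores.values())
    chosen = {a for a, s in scores.items() if best - s <= threshold}
    chosen.add(expert_action)
    return sorted(chosen)


scores = {"a": 2.0, "b": 1.8, "c": 0.1, "d": -1.0}
to_explore = targeted_actions(scores, expert_action="c", threshold=0.5)
```

Actions left out are never rolled out, so their cost has to be filled in some other way, for example with a fixed default.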

Noisy training instances?
- Try $\alpha$-bound noise reduction.
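
A rough sketch of the idea, assuming the usual formulation in which a training instance is dropped from further training once the classifier has misclassified it $\alpha$ times. The class and method names are hypothetical.

```python
from collections import Counter


class AlphaBound:
    """Skip training instances that were already misclassified alpha times."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.mistakes = Counter()

    def should_train(self, instance_id):
        return self.mistakes[instance_id] < self.alpha

    def record(self, instance_id, predicted, target):
        """Count one mistake if the prediction differs from the target action."""
        if predicted != target:
            self.mistakes[instance_id] += 1


bound = AlphaBound(alpha=2)
bound.record("x1", predicted="N", target="V")
still_used = bound.should_train("x1")  # one mistake so far, below alpha
bound.record("x1", predicted="D", target="V")
now_excluded = not bound.should_train("x1")  # second mistake reaches alpha
```

Instances that keep being misclassified are treated as likely noise and stop influencing the classifier.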

### Work using imitation learning in EACL 2017 ###

<b>Tackling Error Propagation through Reinforcement Learning:
A Case of Greedy Dependency Parsing</b>
<br>
Minh Lê and Antske Fokkens
<br>

### Summary ###

Discussed expert policy definitions:
<ul><li>Static vs. dynamic vs. suboptimal policies.</li></ul>
<br><br>
<span class="fragment" data-fragment-index="2">Examined variations to exploration:</span>
<ul><li class="fragment" data-fragment-index="2">Early termination, and targeted and partial exploration.</li></ul>

Showed how to reduce noise in the transition sequence.
<ul><li>Sequence correction and focused costing.</li></ul><br>
<span class="fragment" data-fragment-index="1">And how to reduce noise for the classifier.</span>
<ul><li class="fragment" data-fragment-index="1">$\alpha$-bound and focused costing.</li></ul>
<br><br>
<span class="fragment" data-fragment-index="2">Applied Imitation Learning to various NLP tasks.</span>
<ul><li class="fragment" data-fragment-index="2">Transitions, loss functions, expert policies.</li>
<li class="fragment" data-fragment-index="3">And improved on results!</li></ul>

<center>
<h2>Thank you!</h2>
<p style="text-align:center"><small><a href="http://sheffieldnlp.github.io/ImitationLearningTutorialEACL2017/">sheffieldnlp.github.io/ImitationLearningTutorialEACL2017/</a></small></p>
</center>

### Available code ### 

<b>Imitation Learning</b><br>
Vowpal Wabbit Credit assignment compiler ([in Python](http://hunch.net/~vw))<br>
V-DAgger ([in Scala](https://github.com/hopshackle/dagger-AMR)) ([in Python](https://github.com/sheffieldnlp/APEimitaion))<br> 
LOLS ([in Java](https://github.com/glampouras/JLOLS_NLG))
<br><br>
<b>Applications</b><br>
Dependency parsing ([in Python](https://bitbucket.org/yoavgo/tacl2013dynamicoracles))<br>
Natural language generation ([in Java](https://github.com/glampouras/JLOLS_NLG))<br>
Semantic parsing for AMR ([in Scala](https://github.com/hopshackle/dagger-AMR))
<br><br>
<b>Cost-sensitive multiclass classification</b><br>
Vowpal Wabbit cost-sensitive classifier ([in Python](http://hunch.net/~vw))<br>
Adaptive Regularization of Weights ([in Python](https://github.com/andreasvlachos/arow_csc)) ([in Java](https://github.com/glampouras/JLOLS_NLG))

### Annotated bibliography ###

#### PhD theses

[Hal Daumé III, 2006](http://www.umiacs.umd.edu/~hal/docs/daume06thesis.pdf): Practical Structured Learning Techniques for Natural Language Processing

[Stéphane Ross, 2013](http://www.cs.cmu.edu/~sross1/publications/ross_phdthesis.pdf): Interactive Learning for Sequential Decisions and Predictions

[Pieter Abbeel, 2008](http://www.cs.stanford.edu/~pabbeel/thesis/thesis.pdf): Apprenticeship Learning and Reinforcement Learning with Application to Robotic Control

#### Papers

[Abbeel and Ng, 2004](http://ai.stanford.edu/~ang/papers/icml04-apprentice.pdf): Apprenticeship Learning via Inverse Reinforcement Learning *(inverse reinforcement learning)*

[Vieira and Eisner, 2016](http://timvieira.github.io/doc/2016-tacl-pruning.pdf): Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing *(LOLS with random expert=RL, changeprop)*

[Goldberg and Nivre, 2012](http://www.aclweb.org/anthology/C12-1059): A Dynamic Oracle for Arc-Eager Dependency Parsing *(DAgger for dependency parsing)*

[Ballesteros et al., 2016](https://arxiv.org/pdf/1603.03793.pdf): Training with Exploration Improves a Greedy Stack LSTM Parser *(DAgger for LSTM-based dependency parsing)*

[Clark and Manning, 2015](http://cs.stanford.edu/people/kevclark/resources/clark-manning-acl15-entity.pdf): Entity-Centric Coreference Resolution with Model Stacking

#### Papers

[Lampouras and Vlachos, 2016](https://aclweb.org/anthology/C/C16/C16-1105.pdf): Imitation learning for language generation from unaligned data *(LoLS for natural language generation, sequence correction)*

[Goodman et al., 2016](http://aclweb.org/anthology/P16-1001): Noise reduction and targeted exploration in imitation learning for Abstract Meaning Representation parsing *(V-DAgger for semantic parsing, targeted exploration)*

[Vlachos and Craven, 2011](http://www.aclweb.org/anthology/W/W11/W11-0307.pdf): Search-based Structured Prediction applied to Biomedical Event Extraction *(SEARN for biomedical event extraction, focused costing)*

[Berant and Liang, 2015](https://www.transacl.org/ojs/index.php/tacl/article/view/646/160): Imitation Learning of Agenda-based Semantic Parsers

[Ranzato et al., 2016](https://arxiv.org/pdf/1511.06732.pdf): Sequence Level Training with Recurrent Neural Networks *(Imitation learning for RNNs, learns a cost estimator instead of using roll-outs)*

#### Papers

[Daumé III et al., 2009](http://hunch.net/~jl/projects/reductions/searn/searn.pdf): Search-based structured prediction *(SEARN algorithm)*

[Daumé III, 2009](http://www.umiacs.umd.edu/~hal/docs/daume09unsearn.pdf): Unsupervised Search-based Structured Prediction *(Unsupervised structured prediction with SEARN)*

[Chang et al., 2015](https://arxiv.org/pdf/1502.02206.pdf): Learning to search better than your teacher *(LoLS algorithm, connection with RL)*

[Ho and Ermon, 2016](https://arxiv.org/abs/1606.03476): Generative Adversarial Imitation Learning *(Connection with adversarial training)*

[He et al., 2012](https://papers.nips.cc/paper/4545-imitation-learning-by-coaching.pdf): Imitation Learning by Coaching *(coaching for DAgger)*