<center>
<h2>Applying Imitation Learning on Semantic Parsing</h2>
</center>

### Semantic parsing ###
###### ([Goodman et al. 2016](http://aclweb.org/anthology/P16-1001)) ###### 

Semantic parsers map natural language to meaning representations
- Need to abstract over syntactic phenomena, resolve anaphora, and eliminate ambiguity in word senses.

- Essentially the inverse of natural language generation.

### Abstract meaning representation ###
###### ([Banarescu et al. 2013](http://www.aclweb.org/anthology/W13-2322)) ###### 

<div style="width:60%; float: left;">
<p style="float: left;">
<ul style="float: left;">
A meaning representation formalism that utilizes a graph to represent relationships between concepts.
<li>Structure similar to dependency parses.</li>
<li>But abstracts away from function words and inflectional details.</li>
<li>Due to its structure, transition-based approaches are common.</li>
</ul>
</p>
</div>

<img src="images/amrExample.png" style="width:40%; float: right;">

### How can Imitation Learning help with that? ### 

As in dependency parsing, greedy decoding suffers from error propagation.

And Imitation Learning addresses error propagation!

### Transition system? ###

We consider a dependency graph (tree) as input.
- Dependency graphs are derived from the sentences.
- Much more training data is available for dependency parsing than for AMR parsing.

Transition actions transform the dependency graph into an AMR graph.
- In intermediate stages, some nodes are labeled with words from the sentence, and others with AMR concepts.

### Actions ###

<img src="images/amrParseActions.png">

### Transition-based AMR parsing in action! ###

<img src="images/toBeAnimated/parse2amr_1.png">

### Transition-based AMR parsing in action! ###

<img src="images/toBeAnimated/parse2amr_2.png">

### Transition-based AMR parsing in action! ###

<img src="images/toBeAnimated/parse2amr_3.png">

### Transition-based AMR parsing in action! ###

<img src="images/toBeAnimated/parse2amr_4.png">

### Transition-based AMR parsing in action! ###

<img src="images/toBeAnimated/parse2amr_5.png">

### Loss function? ###

Smatch ([Cai and Knight, 2013](http://amr.isi.edu/smatch-13.pdf))
- F<sub>1</sub>-Score between predicted and gold-target AMR graphs.

- Computationally expensive to evaluate for every roll-out.

- Naive Smatch does not search over all possible node mappings between the predicted and target graphs; it relies on heuristics instead.
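
As a rough illustration of the score itself, the sketch below computes F<sub>1</sub> over AMR triples once a node mapping has already been fixed. The triple encoding and the name `triple_f1` are assumptions made for this example; finding the node mapping, which is the expensive part of Smatch, is deliberately left out.

```python
def triple_f1(predicted, gold):
    """F1 between two collections of (source, relation, target) triples.

    Assumes node variables are already aligned between the two graphs,
    so matching reduces to a set intersection.
    """
    predicted, gold = set(predicted), set(gold)
    matched = len(predicted & gold)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy graphs (hypothetical): the prediction gets one concept wrong.
gold = {("w", "instance", "want-01"), ("b", "instance", "boy"), ("w", "ARG0", "b")}
pred = {("w", "instance", "want-01"), ("b", "instance", "girl"), ("w", "ARG0", "b")}
score = triple_f1(pred, gold)
```

Here two of the three triples match on each side, so precision and recall are both 2/3.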

### Expert policy? ###

Best reachable state is explored via roll-outs, with naive Smatch used as a loss function.

- Also, to encourage short trajectories, a length penalty is applied.
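
One way such an expert could rank candidate actions is sketched below. The linear form of the penalty, the default weight `penalty=0.001`, and the action names are assumptions for illustration only, not values taken from the cited work.

```python
def rollout_cost(smatch_f1, num_actions, penalty=0.001):
    """Cost of a finished roll-out: Smatch error plus a length penalty.

    The linear form and the default `penalty` weight are assumptions.
    """
    return (1.0 - smatch_f1) + penalty * num_actions


def expert_action(rollouts, penalty=0.001):
    """Pick the action whose roll-out has the lowest penalised cost.

    rollouts: dict mapping action -> (smatch_f1, num_actions).
    """
    return min(rollouts, key=lambda a: rollout_cost(*rollouts[a], penalty=penalty))


# Hypothetical roll-out outcomes for three candidate actions.
rollouts = {"A": (0.80, 120), "B": (0.80, 90), "C": (0.70, 60)}
chosen = expert_action(rollouts)
```

Actions `A` and `B` reach the same Smatch score, so the penalty breaks the tie in favour of the shorter trajectory.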

### V-DAgger ###

Variant of DAgger proposed by [Vlachos and Clark (2014)](http://www.aclweb.org/anthology/Q14-1042)
- Employs roll-outs, with the same policy used for both roll-ins and roll-outs.

### Imitation Learning challenges ###

Very large number of possible actions at each time-step.
- On the order of 10<sup>3</sup> to 10<sup>4</sup>.
- Exploring all alternative actions at each time-step can be very time-consuming.

Very long action sequences.
- In the range of 50-200 actions.
- Especially challenging when combined with the large number of possible actions.
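
To get a feel for the scale, the sketch below counts the transitions needed if every alternative action were rolled out at every time-step. It assumes, for simplicity, that each roll-out runs for exactly the remaining length of the sequence; the function name is hypothetical.

```python
def exhaustive_rollout_steps(num_actions, sequence_length):
    """Transitions needed to roll out every action at every time-step.

    Assumes a roll-out started at step t runs for the remaining
    (sequence_length - t) steps.
    """
    return sum(num_actions * (sequence_length - t) for t in range(sequence_length))


# Low end of the action range above, mid-range sequence length.
steps = exhaustive_rollout_steps(num_actions=1000, sequence_length=100)
```

Under these assumptions a single sentence already needs about five million transitions per epoch.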

### Targeted exploration ###

There is no reason to explore alternative actions when:
- Expert and learned policy agree on the correct action, <b>and</b>

- no alternative action is scored highly.

The algorithm limits exploration to the expert action and to the learned policy's actions whose score is within a threshold $\tau$ of the best-scoring one.
- In the first epoch, when there is no learned policy yet, we randomly explore a number of actions.
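
The selection rule above can be sketched as follows. The function name, the `num_random` parameter, and the choice to always keep the expert action in the first epoch are assumptions made for this example.

```python
import random


def actions_to_explore(scores, expert_action, tau, first_epoch=False,
                       num_random=2, rng=None):
    """Return the set of actions worth rolling out at one time-step.

    scores: dict mapping each valid action to the learned policy's score
    (the values are ignored in the first epoch).
    """
    if first_epoch:
        # No learned policy yet: sample a few alternatives at random.
        rng = rng or random.Random()
        others = [a for a in scores if a != expert_action]
        sampled = rng.sample(others, min(num_random, len(others)))
        return {expert_action, *sampled}
    best = max(scores.values())
    close = {a for a, s in scores.items() if best - s <= tau}
    return close | {expert_action}


# Hypothetical scores for three valid actions.
scores = {"a": 5.0, "b": 4.5, "c": 1.0}
agree = actions_to_explore(scores, expert_action="a", tau=1.0)
disagree = actions_to_explore(scores, expert_action="c", tau=1.0)
```

When the result contains a single action, the expert and the learned policy agree and no alternative scores highly, so no roll-outs are needed at that time-step.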

### Other cases of partial exploration ###

SCB-LOLS and AggreVaTe both use partial exploration.
- They select at random the time-step at which to apply it.
- They select at random which actions to explore.

Targeted exploration focuses on the actions for which the learned policy is least certain, or disagrees with the expert.

### Focused costing ###

Introduced by [Vlachos and Craven (2011)](http://www.aclweb.org/anthology/W/W11/W11-0307.pdf)
- Instead of using the learned policy for $\beta$% of the roll-out steps,

- use it for the first $\beta$ steps and revert to the expert policy for the rest.

This keeps roughly the same computational cost, while focusing the effect of the explored action on the actions that immediately follow it.

- Reduces noise: mistakes the learned policy may make on distant actions are considered irrelevant.

- We can increase $\beta$ with each epoch, to move away from the expert.
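
A minimal sketch of a focused-costing roll-out, assuming the policies, transition function, and termination test are passed in as plain callables. All names and the toy environment are hypothetical.

```python
def focused_rollout(state, beta, learned_policy, expert_policy, step, is_final):
    """Roll out from `state`, switching policies after `beta` steps.

    The learned policy acts for the first `beta` steps; the expert
    policy completes the rest of the trajectory.
    """
    taken = 0
    while not is_final(state):
        policy = learned_policy if taken < beta else expert_policy
        state = step(state, policy(state))
        taken += 1
    return state


# Toy environment (hypothetical): a state is the list of actions so far,
# and an episode ends after five actions.
trace = focused_rollout(
    state=[],
    beta=2,
    learned_policy=lambda s: "L",
    expert_policy=lambda s: "E",
    step=lambda s, a: s + [a],
    is_final=lambda s: len(s) == 5,
)
```

With `beta=2` the trace starts with two learned-policy actions and ends with three expert actions; raising `beta` in later epochs lengthens the learned prefix.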

### Issues of step-level stochasticity ###

V-DAgger and SEARN employ step-level stochasticity during their roll-outs.
- i.e. each step during roll-out can be performed by either the learned or expert policy.

- In other words, the same training example may have very different roll-outs when reexamined.

- This results in high variance in the reward signal, and hinders effective learning.
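
The effect is easy to reproduce in a toy setting. The sketch below only records which policy would act at each roll-out step; the mixing probability `p_learned` and the roll-out length are illustrative assumptions.

```python
import random


def stochastic_rollout(length, p_learned, rng):
    """Pick the acting policy independently at every roll-out step."""
    return tuple("learned" if rng.random() < p_learned else "expert"
                 for _ in range(length))


# Revisit the same (hypothetical) training example 20 times.
rng = random.Random(0)
visits = {stochastic_rollout(50, 0.5, rng) for _ in range(20)}
distinct_rollouts = len(visits)
```

With 50 steps and an even mix, repeated visits almost never share the same sequence of acting policies, so the cost assigned to one explored action can change from visit to visit.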

### Noise reduction ###

$a$-bound by [Khardon and Wachman (2007)](http://www.jmlr.org/papers/volume8/khardon07a/khardon07a.pdf)
- Exclude a training example from subsequent training once it has been misclassified $a$ times.
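
The bookkeeping this requires is small. The sketch below uses a hypothetical `ABoundFilter` class and assumes every training example has a hashable identifier.

```python
from collections import Counter


class ABoundFilter:
    """Tracks misclassifications and drops examples that reach the bound."""

    def __init__(self, a):
        self.a = a
        self.mistakes = Counter()

    def is_active(self, example_id):
        # An example stays in training until it has been wrong `a` times.
        return self.mistakes[example_id] < self.a

    def record(self, example_id, misclassified):
        if misclassified:
            self.mistakes[example_id] += 1


bound = ABoundFilter(a=2)
bound.record("ex1", misclassified=True)
still_active = bound.is_active("ex1")
bound.record("ex1", misclassified=True)
now_active = bound.is_active("ex1")
```

A training loop would check `is_active` before updating the classifier on an example and call `record` after each prediction.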

Alternatively, we could use LOLS
- Rollouts are performed consistently with the same policy.

- Can increase training time when moving from an exclusively expert policy to an exclusively learned one, because the action sequences are long.

### DAgger with a-bound ###

<center>
<img src="images/aboundResults.png">
</center>

### Targeted exploration and focused costing results ###

<center>
<img src="images/amr_ILmods.png">
</center>

### Comparison with previous work ###

<center>
<img src="images/amrResults_previousWork.png">
</center>

### Comparison between different IL algorithms ###

<center>
<img src="images/amrResults_otherIL.png">
</center>

### Summary so far ### 

We discussed more modifications to the DAgger framework.
- Targeted exploration considers only actions for which the learned policy is uncertain or disagrees with the expert policy.
- Using an $a$-bound, we filter out training examples that confuse the classifier.
- Focused costing applies the learned policy only to the actions immediately affected by the currently explored one.

We showed that imitation learning improves AMR parsing results.