In [4]:
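%matplotlib
# Slide-show setup for the RISE (livereveal) notebook extension.
# Note: IPython.html is the pre-Jupyter location of the notebook server code;
# in later releases ConfigManager lives in notebook.services.config.
from IPython.html.services.config import ConfigManager
from IPython.paths import locate_profile
from IPython.display import Image
# Store the slide settings in the active profile's notebook configuration.
cm = ConfigManager(profile_dir=locate_profile(get_ipython().profile))
cm.update('livereveal', {
              'theme': 'solarized',
              'transition': 'slide',
              'start_slideshow_at': 'selected',
              'progress': 'true',
})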

Using matplotlib backend: Qt5Agg


{'progress': 'true',
 'start_slideshow_at': 'selected',
 'theme': 'solarized',
 'transition': 'slide'}

<center>
<h2>Noise reduction and targeted exploration</h2>
<h3>applied on semantic parsing</h3>
</center>

### Semantic parsing ###

Semantic parsers map natural language to meaning representations
- Need to abstract over syntactic phenomena, resolve anaphora, and disambiguate word senses
- Essentially the inverse of natural language generation

Also known as, or closely related to:
- natural language understanding
- natural language database interfaces
- semantic role labeling
- question answering on databases

### Meaning representations ###

Meaning representation examples...take some pics from Andreas

### Abstract meaning representation ###

A meaning representation formalism that utilizes a graph to represent relationships between concepts.
- Structure similar to dependency parses.
- But abstracts away from function words and inflectional details.
- Due to its structure, transition-based approaches are common.

<img src="images/amrExample.png">

### How can Imitation Learning help with that? ### 

As in dependency parsing, greedy decoding suffers from error propagation.

Imitation Learning addresses error propagation by considering the interaction between the transition under consideration and the transitions to be predicted later in the sentence.
- Explores the search space, but avoids enumerating all possible outputs.
- Also learns how to recover from errors.

<center>
<h2>Transition-based semantic parsing</h2>
</center>

### Transition from what to what? ###

We consider a dependency graph (tree) as input.
- Dependency graphs are derived from the sentences.
- There is far more training data available for dependency parsing than for AMR parsing.

Transition actions transform the dependency graph into an AMR graph.
- In intermediate stages, some nodes are labeled with words from the sentence, and others with AMR concepts.

### States ###

Graph
- $x = \{0, 1, \ldots, N\}$ are the nodes, each representing either a word in the sentence or a concept of the AMR graph.
- $a \subseteq x \times x \times L$ are labeled directed arcs between the nodes, with labels drawn from a predefined set $L$.

### States ###

- Stack $\sigma$: initialized with all the nodes in the dependency tree, root at the bottom.
- Buffer $\beta$: initialized with all the children of the top node in $\sigma$.
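
The state described on the last two slides can be sketched as a small data structure. This is only an illustration: the `ParserState` class, the helper names, and the example sentence are hypothetical, and the order in which non-root nodes are stacked (post-order here, so that dependents are processed before their heads) is an assumption that the slides do not specify.

```python
from dataclasses import dataclass, field


@dataclass
class ParserState:
    """Hypothetical container for the parser state."""
    nodes: dict                                  # node id -> word or AMR concept
    arcs: set                                    # (head, dependent, label) triples
    stack: list = field(default_factory=list)    # sigma, top is the last element
    buffer: list = field(default_factory=list)   # beta


def children(arcs, head):
    """Dependents of `head`, in ascending id order."""
    return sorted(dep for (h, dep, _) in arcs if h == head)


def post_order(arcs, root):
    """Visit children before their parent, so the root comes last."""
    order = []
    for child in children(arcs, root):
        order.extend(post_order(arcs, child))
    order.append(root)
    return order


def initial_state(words, arcs, root):
    nodes = dict(enumerate(words))
    # Assumption: reversed post-order puts the root at the bottom (index 0)
    # and the first node to be processed on top (last index).
    stack = list(reversed(post_order(arcs, root)))
    # The buffer holds the children of the node on top of the stack.
    buffer = children(arcs, stack[-1])
    return ParserState(nodes, set(arcs), stack, buffer)


# Illustrative dependency tree for "The boy wants to sleep".
words = ["The", "boy", "wants", "to", "sleep"]
arcs = {(2, 1, "nsubj"), (1, 0, "det"), (2, 4, "xcomp"), (4, 3, "mark")}
state = initial_state(words, arcs, root=2)
print(state.stack, state.buffer)
```

Here the top of the stack is a leaf, so the buffer starts empty; it is refilled from the children of whichever node is on top as parsing proceeds.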

### Actions ###

<img src="images/amrParseActions.png">

### Transition-based AMR parsing in action! ###

<img src="images/amrExample.png">

<center>
<h2>Imitation learning for semantic parsing</h2>
</center>

### V-DAgger ###

Variant of DAgger proposed by Vlachos and Clark (2014)
- Employs roll-outs, with the same policy used for both roll-ins and roll-outs.

### Challenges ###

Very large number of possible actions at each time-step.
- On the order of 10<sup>3</sup> to 10<sup>4</sup>.
- Exploring all alternative actions at each time-step can be very time-consuming.

Very long action sequences.
- In the range of 50-200 actions.
- Especially challenging when combined with the large number of possible actions.

### Targeted exploration ###

There is no reason to explore alternative actions when:
- Expert and learned policy agree on the correct action, <b>and</b>
- no alternative action is scored highly.

The algorithm limits exploration to the expert action and to those learned-policy actions whose scores are within a threshold $\tau$ of the best-scoring one.
- In the first epoch, when there is no learned policy yet, we explore a number of randomly chosen actions.
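
The selection rule can be sketched in a few lines. This is a minimal illustration and not the implementation behind the results: the function name, the `n_random` parameter, and the choice to always include the expert action in the first epoch are assumptions.

```python
import random


def actions_to_explore(scores, expert_action, tau, first_epoch=False,
                       n_random=3, rng=random):
    """Return the set of actions to roll out at one time-step.

    scores: dict mapping each valid action to the learned policy's score
            (the values are ignored in the first epoch).
    """
    if first_epoch:
        # No learned policy yet: the expert action plus a random sample.
        others = [a for a in scores if a != expert_action]
        sampled = rng.sample(others, min(n_random, len(others)))
        return {expert_action, *sampled}
    best = max(scores.values())
    # Learned-policy actions scored within tau of the best one.
    close = {a for a, s in scores.items() if best - s <= tau}
    return close | {expert_action}


scores = {"shift": 2.0, "reduce": 1.9, "swap": 0.5}
# Expert agrees with the policy, but "reduce" is a close competitor.
print(actions_to_explore(scores, "shift", tau=0.2))
# Expert agrees and nothing else scores highly: a single action, so no
# alternatives need to be rolled out.
print(actions_to_explore(scores, "shift", tau=0.05))
```

When the returned set contains a single action, the time-step can be skipped entirely, which is where the savings come from.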

### Other cases of partial exploration ###

SCB-LOLS and AggreVaTe both use partial exploration.
- They choose at random the time-step at which to explore.
- They choose at random which actions to explore.

Targeted exploration focuses on the actions for which the learned policy is least certain, or disagrees with the expert.

### Issues of step-level stochasticity ###

V-DAgger and SEARN employ step-level stochasticity during their roll-outs.
- That is, each step of a roll-out can be performed by either the learned or the expert policy.
- As a result, the same training example may have very different roll-outs when reexamined.
- This results in high variance in the reward signal, and hinders effective learning.
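
A minimal sketch of step-level stochasticity, assuming the learned policy is chosen independently at each step with some probability `p_learned` (the function name and parameter are hypothetical):

```python
import random


def stochastic_rollout_schedule(n_steps, p_learned, rng):
    """Decide, step by step, which policy performs each roll-out step."""
    return ["learned" if rng.random() < p_learned else "expert"
            for _ in range(n_steps)]


# Two roll-outs of the same training example draw independent schedules,
# so the cost attributed to the explored action can differ between them.
first = stochastic_rollout_schedule(10, 0.5, random.Random(1))
second = stochastic_rollout_schedule(10, 0.5, random.Random(2))
print(first)
print(second)
```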

### Noise reduction ###

$a$-bound by Khardon and Wachman (2007)
- Exclude a training example from subsequent training once it has been misclassified $a$ times.
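
The bookkeeping this requires is small. The sketch below assumes each training example has a stable identifier; the class and method names are hypothetical.

```python
from collections import Counter


class ABoundFilter:
    """Track misclassifications and drop examples that reach the bound."""

    def __init__(self, a):
        self.a = a
        self.mistakes = Counter()

    def record(self, example_id, misclassified):
        """Call after the classifier has predicted on a training example."""
        if misclassified:
            self.mistakes[example_id] += 1

    def keep(self, example_id):
        """True while the example has been misclassified fewer than a times."""
        return self.mistakes[example_id] < self.a


bound = ABoundFilter(a=2)
bound.record("ex1", misclassified=True)
print(bound.keep("ex1"))   # True: one mistake, below the bound
bound.record("ex1", misclassified=True)
print(bound.keep("ex1"))   # False: excluded from subsequent training
```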

Alternatively, we could use LOLS
- Rollouts are performed consistently with the same policy.
- Can increase training time as the roll-out policy moves from expert-only to learned-only, because the action sequences are long.

### Focused costing ###

Introduced by Vlachos and Craven (2011)
- Instead of using the learned policy for $\beta$% of the rollout steps, use it for the first $\beta$ steps and revert to the expert policy for the rest.

This keeps roughly the same computational cost, while focusing the effect of the explored action on the actions that immediately follow it.
- Reduces noise: mistakes the learned policy may make on distant actions are considered irrelevant.
- We can increase $\beta$ with each epoch, to move away from the expert.
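
As a sketch, focused costing replaces the per-step coin flip with a fixed switch point. The function names and the linear growth of $\beta$ below are assumptions made for illustration; the slides only say that $\beta$ can increase with each epoch.

```python
def focused_rollout_schedule(n_steps, beta):
    """Learned policy for the first `beta` roll-out steps, expert afterwards."""
    n_learned = min(beta, n_steps)
    return ["learned"] * n_learned + ["expert"] * (n_steps - n_learned)


def beta_for_epoch(epoch, step=5):
    """Hypothetical linear schedule: beta grows by `step` every epoch."""
    return step * epoch


for epoch in range(3):
    print(epoch, focused_rollout_schedule(12, beta_for_epoch(epoch)))
```

Because the schedule is deterministic, re-examining the same training example with the same policy produces the same roll-out.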

### Expert policy ###

Smatch (Cai and Knight, 2013)
- F<sub>1</sub>-Score between predicted and gold-target AMR graphs.
- Too computationally expensive to run for every rollout.

Naive Smatch used during training
- Skips combinatorial mapping of nodes between predicted and target graphs.
- Also, to encourage short trajectories, a length penalty is applied.
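
To illustrate why skipping the node mapping is cheap, the sketch below compares two graphs by their label triples alone and adds a length term. It is not Smatch and not necessarily the naive variant used in training: the triple representation, the F<sub>1</sub>-based cost, and the `length_weight` value are all assumptions.

```python
from collections import Counter


def f1(predicted, gold):
    """F1 between two multisets of hashable items."""
    pred, ref = Counter(predicted), Counter(gold)
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def label_triples(nodes, arcs):
    """Describe a graph by labels only, so no node alignment is needed."""
    return [(nodes[h], rel, nodes[d]) for (h, d, rel) in arcs]


def naive_cost(pred_graph, gold_graph, n_actions, length_weight=0.001):
    """Lower is better; longer trajectories pay a small hypothetical penalty."""
    score = f1(label_triples(*pred_graph), label_triples(*gold_graph))
    return (1.0 - score) + length_weight * n_actions


gold = ({0: "want-01", 1: "boy", 2: "sleep-01"},
        [(0, 1, "ARG0"), (0, 2, "ARG1"), (2, 1, "ARG0")])
# Same labels under different node ids, with one arc missing.
pred = ({7: "want-01", 8: "boy", 9: "sleep-01"},
        [(7, 8, "ARG0"), (7, 9, "ARG1")])
print(naive_cost(pred, gold, n_actions=60))
```

The comparison is linear in the number of triples, whereas Smatch has to search over mappings between the nodes of the two graphs.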

<center>
<h2>Semantic parser experiments</h2>
</center>

### DAgger with a-bound ###

<center>
<img src="images/aboundResults.png">
</center>

### Targeted exploration and focused costing results ###

<center>
<img src="images/amr_ILmods.png">
</center>

### Comparison with previous work ###

<center>
<img src="images/amrResults_previousWork.png">
</center>

### Comparison with previous work ###

<center>
<img src="images/amrResults_otherIL.png">
</center>

### Summary so far ### 

We discussed more modifications to the DAgger framework.
- Targeted exploration considers only actions for which the learned policy is uncertain or disagrees with the expert policy.
- Using an $a$-bound, we filter out training examples that confuse the classifier.
- Focused costing applies the learned policy only to the actions that are immediately affected by the currently explored one.

We showed that imitation learning improves on the results.