<center>
<h2>Applying Imitation Learning on Natural Language Generation</h2>
</center>

### Natural Language Generation <i> (concept-to-text)</i> ###### 
###### ([Lampouras and Vlachos 2016](https://aclweb.org/anthology/C/C16/C16-1105.pdf)) ###### 

The natural language processing task of generating text from a non-linguistic form.
- e.g. a meaning representation, database records.

<div>
    <div style="display:inline-block;">
        <p style="border:2px; border-radius: 15px; background-color:white; border-style:solid; border-color:black; padding: 0.5em; font-size: 80%; display: inline-block">
        \begin{align}
        & \text{Predicate: INFORM}\\
        & \text{______________________}\\
        & \text{type = "hotel"}\\
        & \text{count = "182"}\\
        & \text{dogs_allowed = dont_care}
        \end{align}
        </p>
    </div>
        <font size="300">&#10143;</font>
    <div style="display:inline-block; vertical-align: middle;" >
        <p style="display:block; width:400px; word-wrap:break-word;">
        There are _182_ _hotels_ if you _do not care_ whether dogs are allowed.
        </p>
    </div>
</div>

### NLG Examples ### 

<br>
<br>
<div>
    <div style="display:inline-block;">
        <p style="border:2px; border-radius: 15px; background-color:white; border-style:solid; border-color:black; padding: 0.5em; font-size: 80%; display: inline-block">
        \begin{align}
        & \text{Predicate: ?REQUEST}\\
        & \text{______________________}\\
        & \text{pricerange}\\
        \end{align}
        </p>
    </div>
    <font size="300">&#10143;</font>
    <div style="display:inline-block;">
        <p style="display:block; width:600px; word-wrap:break-word;">
        So what __price range__ are you looking for?
        </p>
    </div>
</div>
<br>
<div>
    <div style="display:inline-block;">
        <p style="border:2px; border-radius: 15px; background-color:white; border-style:solid; border-color:black; padding: 0.5em; font-size: 80%; display: inline-block">
        \begin{align}
        & \text{Predicate: ?SELECT}\\
        & \text{______________________}\\
        & \text{kids_allowed = \{yes, no\}}\\
        \end{align}
        </p>
    </div>
    <font size="300">&#10143;</font>
    <div style="display:inline-block;">
        <p style="display:block; width:600px; word-wrap:break-word;">
        Are you looking for a restaurant that __allows kids, or does not allow kids__?
        </p>
    </div>
</div>

### What would we like to improve? ###

Statistical methods for NLG (mostly) rely on human-annotated data for training.
- Especially on alignments between the meaning representation and reference texts.
- Time-consuming and costly to construct.

<br>
<div>
    <div style="display:inline-block;">
        <p style="border:2px; border-radius: 15px; background-color:white; border-style:solid; border-color:black; padding: 0.5em; font-size: 70%; display: inline-block">
        \begin{align}
        & \text{Predicate: INFORM}\\
        & \text{______________________}\\
        & \text{type = "Sanjalisco"}\\
        & \text{good_for_meal = breakfast}\\
        & \text{near = mission}
        \end{align}
        </p>
    </div>
    <font size="300">&#10143;</font>
    <div style="display:inline-block; vertical-align: middle;">
        <p style="display:block; width:400px; word-wrap:break-word;">
        __Sanjalisco__ is good for __breakfast__ and is near the __mission__ district.
        </p>
    </div>
</div>

### How can Imitation Learning help with that? ### 

We will see how Imitation Learning can be used to learn from unaligned data.

- Why unaligned data? To limit the cost of dataset construction!

- Why Imitation Learning? It can learn from non-decomposable loss functions and from suboptimal training data!

### Transition system? ###

NLG is a complex task due to its large output space.
- The set of possible words is limited to those observed in the references of the training data.

We formulate NLG as a sequence A of two types of actions:
- content prediction actions a<sub>c</sub>, and
- word prediction actions a<sub>w</sub>.

### NLG formulation ### 


<p style="border:3px; border-radius: 25px; background-color:lightgrey; border-style:solid; border-color:black; padding: 0.5em; font-size: 70%; display: inline-block">
\begin{align}
& \textbf{Input:} \; \text{meaning representation} \; MR \; \text{with set of attributes} \; C, \; \text{attribute dictionaries} \; D_c,\;\forall c \in C\\
& \textbf{Output:} \; \text{action sequence} \; A \\
& \; \\
& 1 \;\;  \quad \mathbf{do} \\
& 2 \;\;  \quad \quad \text{predict attribute} \; c \in C \cup \{END_{attr}\} \\
& 3 \;\;  \quad \quad \text{append} \; a_{c} \; \text{to} \; A_{c} \\
& 4 \;\;  \quad \quad \text{remove} \; c \; \text{from} \; C \\
& 5 \;\;  \quad \mathbf{while} \; a_{c} \neq END_{attr} \\
& 6 \;\;  \quad \mathbf{for} \; a_{c} \in A_{c} \; \mathbf{do} \\
& 7 \;\;  \quad \quad \mathbf{do} \\
& 8 \;\;  \quad \quad \quad \text{predict word} \; w \in D_{c} \cup \{END_{word}\} \\
& 9 \;\;  \quad \quad \quad \text{append} \;a_{w} \; \text{to} \; A_{w} \\
& 10 \; \quad \quad \mathbf{while} \; a_{w} \neq END_{word} \\
& 11 \; \quad A = (A_{c}, A_{w}) \\
\end{align}
</p>
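
The two loops above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names, the toy predictors, the `max_words` cap, and the choice to skip word generation for `END_attr` are all assumptions made so the example runs on its own.

```python
from typing import Callable, Dict, List, Set, Tuple

END_ATTR = "END_attr"
END_WORD = "END_word"


def generate_actions(
    attributes: Set[str],
    dictionaries: Dict[str, List[str]],
    predict_attr: Callable[[Set[str]], str],
    predict_word: Callable[[str, List[str], List[str]], str],
    max_words: int = 20,
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (A_c, A_w): content actions, then (attribute, word) actions."""
    remaining = set(attributes)
    content_actions: List[str] = []
    # Lines 1-5: predict attributes until END_attr (bounded as a safety guard).
    for _ in range(len(attributes) + 1):
        c = predict_attr(remaining)
        content_actions.append(c)
        if c == END_ATTR:
            break
        remaining.discard(c)

    word_actions: List[Tuple[str, str]] = []
    # Lines 6-10: for each predicted attribute, predict words until END_word.
    for c in content_actions:
        if c == END_ATTR:  # assumption: no words are generated for END_attr
            continue
        emitted: List[str] = []
        for _ in range(max_words):
            w = predict_word(c, dictionaries[c], emitted)
            word_actions.append((c, w))
            if w == END_WORD:
                break
            emitted.append(w)
    return content_actions, word_actions


# Toy stand-ins for the learned predictors (purely illustrative).
def toy_predict_attr(remaining: Set[str]) -> str:
    return min(remaining) if remaining else END_ATTR


def toy_predict_word(attr: str, dictionary: List[str], emitted: List[str]) -> str:
    return dictionary[len(emitted)] if len(emitted) < len(dictionary) else END_WORD


content, words = generate_actions(
    {"count", "type"},
    {"count": ["there", "are", "182"], "type": ["hotels"]},
    toy_predict_attr,
    toy_predict_word,
)
sentence = " ".join(w for _, w in words if w != END_WORD)
print(content)
print(sentence)
```

In a real system the two toy predictors would be replaced by trained classifiers over features of the MR and the actions taken so far.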

### NLG transition in action! ### 

<img src="images/toBeAnimated/NLG_actionSeq1.png">

### NLG transition in action! ### 

<img src="images/toBeAnimated/NLG_actionSeq1.png">
<img src="images/toBeAnimated/NLG_actionSeq2.png">

### Loss function? ###

We can use various loss functions (e.g. BLEU, ROUGE).

- Content actions are ignored by the loss function, but are indirectly evaluated by their impact on future word predictions.

- The loss function also penalizes undesirable behaviour, e.g. repeating the same word, predicting attributes not in the MR.
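
To make the idea of a sentence-level loss concrete, here is a small sketch. It is neither BLEU nor ROUGE: it is a simplified stand-in (one minus the average clipped unigram and bigram precision) with an assumed penalty for immediate word repetition. The function names and the `repeat_penalty` value are hypothetical, and the penalty for attributes missing from the MR is not modelled.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hyp: List[str], refs: List[List[str]], n: int) -> float:
    """Fraction of hypothesis n-grams found in any reference, with clipping."""
    hyp_counts = ngrams(hyp, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    max_ref: Counter = Counter()
    for ref in refs:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return matched / total


def sentence_loss(hyp: List[str], refs: List[List[str]],
                  max_n: int = 2, repeat_penalty: float = 0.1) -> float:
    """Loss in [0, 1] computed on the complete output only."""
    precision = sum(clipped_precision(hyp, refs, n) for n in range(1, max_n + 1)) / max_n
    repeats = sum(1 for a, b in zip(hyp, hyp[1:]) if a == b)
    return min(1.0, (1.0 - precision) + repeat_penalty * repeats)


refs = ["there are 182 hotels".split()]
perfect = sentence_loss("there are 182 hotels".split(), refs)
repeated = sentence_loss("there are are 182 hotels".split(), refs)
print(perfect, repeated)
```

Note that the loss is only defined for a whole sentence; no individual action receives a score of its own, which is what makes it non-decomposable.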

### Expert policy? ### 

The expert policy π* is based on:
- the NL references of the MR,
- and the alignments..?

### Alignments ### 
Training the content and word predictors (independently or jointly) would be possible if we could extract training data from manually aligned references.

- However, we do not assume access to such information!

If no alignments are available, they could be automatically calculated ([Liang et al. 2009](http://www.aclweb.org/anthology/P09-1011)).

- But Liang et al.'s model was trained on the datasets considered, and does not generalize well.
- We will assume no access to that either.

### Using naive alignments ### 

References: <br>
| X-name-1 is a | restaurant at the | side of the river. | <br>
| X-name-1 is a | restaurant at the | riverside. | <br>
| X-name-1 is a | restaurant by the | river that serves | Chinese. | <br>
| X-name-1 is a | riverside | restaurant that serves | Chinese. | <br>
| For a Chinese | restaurant, | go to X-name-1 near the | riverside. | <br>

<img src="images/toBeAnimated/NLG_naiveExample.png">

### Suboptimal expert policy ### 

Since our gold standard is naively constructed, the resulting expert policy is suboptimal.

Computational constraints are another potential cause of suboptimal experts.
- For long action sequences, we may need to limit our estimates to a subsequence.

An Imitation Learning approach that relies heavily on the expert policy can be at a disadvantage.

### Locally Optimal Learning to Search ### 
 
LOLS can learn from a suboptimal expert policy π*.
- Because it can perform roll-outs with the learned policy π<sub>i</sub> instead of π*.

LOLS can learn from non-decomposable loss functions (e.g. BLEU, ROUGE).
- Because it only needs to evaluate complete output predictions, not individual actions.

- For NLG, this means we do not require explicit supervision on how each action is aligned, or which predictor should generate each word; we just need a way to evaluate how good the complete final sentence is.
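
The roll-out idea can be sketched as follows. This is a toy illustration under stated assumptions, not the LOLS implementation: `roll_out`, `action_costs`, the toy policies, and the mismatch-count loss are all hypothetical, and `expert_prob` stands for whatever probability is used to pick π* over π<sub>i</sub> for the roll-out (a value that decays over epochs is one possible choice).

```python
import random
from typing import Callable, Dict, List

Policy = Callable[[List[str]], str]
END = "END"


def roll_out(prefix: List[str], policy: Policy, horizon: int) -> List[str]:
    """Complete a partial action sequence by following a single policy."""
    seq = list(prefix)
    while len(seq) < horizon and (not seq or seq[-1] != END):
        seq.append(policy(seq))
    return seq


def action_costs(prefix: List[str], actions: List[str], expert: Policy,
                 learned: Policy, loss: Callable[[List[str]], float],
                 expert_prob: float, horizon: int,
                 rng: random.Random) -> Dict[str, float]:
    """Cost of each candidate action, judged only by the completed output."""
    # Pick the roll-out policy: the expert with probability expert_prob.
    policy = expert if rng.random() < expert_prob else learned
    losses = {a: loss(roll_out(prefix + [a], policy, horizon)) for a in actions}
    best = min(losses.values())
    return {a: value - best for a, value in losses.items()}


# Toy setup (illustrative only).
REFERENCE = ["there", "are", "182", "hotels", END]


def toy_expert(seq: List[str]) -> str:
    return REFERENCE[len(seq)] if len(seq) < len(REFERENCE) else END


def toy_learned(seq: List[str]) -> str:
    return "hotels"  # a deliberately poor learned policy


def toy_loss(seq: List[str]) -> float:
    mismatches = sum(a != b for a, b in zip(seq, REFERENCE))
    return mismatches + abs(len(seq) - len(REFERENCE))


rng = random.Random(0)
expert_costs = action_costs(["there"], ["are", "hotels"], toy_expert,
                            toy_learned, toy_loss, 1.0, len(REFERENCE), rng)
learned_costs = action_costs(["there"], ["are", "hotels"], toy_expert,
                             toy_learned, toy_loss, 0.0, len(REFERENCE), rng)
print(expert_costs, learned_costs)
```

The resulting per-action costs are what a cost-sensitive classifier would be trained on; no alignment between actions and reference words is ever consulted.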

### LOLS in action! ### 

<img src="images/toBeAnimated/NLG_LOLS.png">

### Sequence correction ### 

Imitation Learning can generate very noisy training instances on tasks that are heavily dependent on previous context.

<img src="images/seqCorrection1.png">

To address this, we apply sequence correction before moving to the next timestep:
- We correct all the already examined actions using π*.

- And re-predict the rest of the sequence using π<sub>i</sub>.

<img src="images/seqCorrection2.png">

If suboptimal actions are encountered further in the new sequence, sequence correction may again be performed.

Before applying sequence correction, we may allow the examination of at most E actions after the first suboptimal one.
- This will allow the predictors to learn how to recover from the mistake.
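
A minimal sketch of one sequence correction step, assuming that an action counts as suboptimal when it differs from the expert's choice for the same prefix. The function name, the `max_extra` parameter (playing the role of E), the `horizon` cap, and the toy policies are all hypothetical.

```python
from typing import Callable, List, Tuple

Policy = Callable[[List[str]], str]
END = "END"


def sequence_correct(predicted: List[str], expert: Policy, learned: Policy,
                     max_extra: int = 0, horizon: int = 20) -> Tuple[List[str], int]:
    """Return the corrected sequence and the timestep to resume from."""
    # Find the first action that disagrees with the expert.
    first_bad = next((t for t, a in enumerate(predicted)
                      if a != expert(predicted[:t])), None)
    if first_bad is None:
        return list(predicted), len(predicted)
    # Allow up to max_extra further actions to be examined before correcting.
    cut = min(first_bad + max_extra, len(predicted) - 1)
    # Correct all already examined actions using the expert.
    corrected: List[str] = []
    for _ in range(cut + 1):
        corrected.append(expert(corrected))
        if corrected[-1] == END:
            break
    # Re-predict the rest of the sequence using the learned policy.
    while corrected[-1] != END and len(corrected) < horizon:
        corrected.append(learned(corrected))
    return corrected, cut + 1


# Toy setup (illustrative only).
REFERENCE = ["there", "are", "182", "hotels", END]
NEXT_WORD = {"there": "are", "are": "182", "182": "hotels", "hotels": END}


def toy_expert(seq: List[str]) -> str:
    return REFERENCE[len(seq)] if len(seq) < len(REFERENCE) else END


def toy_learned(seq: List[str]) -> str:
    return NEXT_WORD.get(seq[-1], END)


noisy = ["there", "is", "hotels", END]
fixed, resume_at = sequence_correct(noisy, toy_expert, toy_learned)
fixed_e1, resume_at_e1 = sequence_correct(noisy, toy_expert, toy_learned, max_extra=1)
print(fixed, resume_at)
```

With `max_extra=0` the correction happens as soon as the first suboptimal action is seen; a larger value lets the following actions be examined in their noisy context first.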

### Sequence Correction results ### 
<br>
<center>
<img src="images/SF_HOTEL_analysis_loss.jpg">
</center>

### Results per LOLS epoch ### 
<br>
<center>
<img src="images/NLG_policyResults.jpg">
</center>

### Automatic evaluation for NLG ### 

<center>
<table style="float:center;text-align:center;color:#333; border-collapse:collapse; border-spacing: 0;">
<thead>
<tr>
<th style="border:1px solid transparent;"></th>
<th colspan="3" style="text-align:center;border:1px solid transparent;">SF Restaurant</th>
<th colspan="3" style="text-align:center;border:1px solid transparent;">SF Hotel</th>
</tr>
<tr>
<th style="border:1px solid transparent;"></th>
<th style="text-align:center;border:1px solid transparent;"><b>BLEU</b></th>
<th style="text-align:center;border:1px solid transparent;"><b>ROUGE</b></th>
<th style="text-align:center;border:1px solid transparent;"><b>ERR(%)</b></th>
<th style="text-align:center;border:1px solid transparent;"><b>BLEU</b></th>
<th style="text-align:center;border:1px solid transparent;"><b>ROUGE</b></th>
<th style="text-align:center;border:1px solid transparent;"><b>ERR(%)</b></th>
</tr>
</thead>
<tbody>
<tr>
<td style="border:1px solid transparent;"><b>LSTM</b></td>
<td style="text-align:center;border:1px solid transparent;">52.97</td>
<td style="text-align:center;border:1px solid transparent;">43.52</td>
<td style="text-align:center;border:1px solid transparent;">6.29</td>
<td style="text-align:center;border:1px solid transparent;">66.37</td>
<td style="text-align:center;border:1px solid transparent;">56.19</td>
<td style="text-align:center;border:1px solid transparent;">3.99</td>
</tr>
<tr>
<td style="border:1px solid transparent;"><b>LOLS</b></td>
<td style="text-align:center;border:1px solid transparent;">49.44</td>
<td style="text-align:center;border:1px solid transparent;">38.52</td>
<td style="text-align:center;border:1px solid transparent;">0.58</td>
<td style="text-align:center;border:1px solid transparent;">68.65</td>
<td style="text-align:center;border:1px solid transparent;">68.37</td>
<td style="text-align:center;border:1px solid transparent;">0.52</td>
</tr>
</tbody>
</table>
</center>

### Human evaluation for NLG ### 

<center>
<table style="float:center; color:#333; border-collapse:collapse; border-spacing: 0;">
<thead>
<tr>
<th></th>
<th colspan="2" style="text-align:center;">SF Restaurant</th>
<th colspan="2" style="text-align:center;">SF Hotel</th>
</tr>
<tr>
<th></th>
<th style="text-align:center;"><b>Fluency</b></th>
<th style="text-align:center;"><b>Informativeness</b></th>
<th style="text-align:center;"><b>Fluency</b></th>
<th style="text-align:center;"><b>Informativeness</b></th>
</tr>
</thead>
<tbody>
<tr>
<td><b>LSTM</b></td>
<td style="text-align:center;">4.49</td>
<td style="text-align:center;">5.29</td>
<td style="text-align:center;">4.41</td>
<td style="text-align:center;">5.36</td>
</tr>
<tr>
<td><b>LOLS</b></td>
<td style="text-align:center;">4.23</td>
<td style="text-align:center;">5.36</td>
<td style="text-align:center;">4.68</td>
<td style="text-align:center;">5.19</td>
</tr>
</tbody>
</table>
</center>

We performed Analysis of Variance (ANOVA) and post-hoc Tukey tests (α = 0.05); there is no statistically significant difference between the two systems.

### Summary so far ### 

We discussed modifications to the LOLS framework.
- Exponential decay schedule when determining the roll-out policy.
- Using sequence correction when encountering suboptimal actions.

We showed that LOLS improves on the results of AROW predictors.
- LOLS also improved on the predictors when using random initialization of alignments.