# Dialogue Act Classification

This notebook demonstrates how to use an XLM-RoBERTa classifier fine-tuned on the MIDAS dataset. For this we need the
```cltl.dialogue_act_classification``` package. We will demonstrate how to apply the classifier to a list of utterances and how to annotate a dialogue in EMISSOR format.

The model needs to be placed in a folder on your local machine. It can be downloaded from:

```https://vu.data.surfsara.nl/index.php/s/dw0YCJAVFM870DT```



In [16]:
#### Path to the local copy of midas-da-xlmroberta model, adapt the path to your local configuration
model_path = "../resources/midas-da-xlmroberta"
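
Since the classifier is loaded from a local folder, it can help to check that the folder exists before constructing the tagger. The following is a minimal sketch using only the standard library; the helper name `model_folder_looks_usable` is hypothetical and not part of the `cltl.dialogue_act_classification` package, and it only checks that the folder is present and non-empty, not that the files form a valid model.

```python
from pathlib import Path


def model_folder_looks_usable(path):
    """Return True if `path` is an existing folder that contains at least one file.

    Hypothetical helper: it does not validate the model files themselves.
    """
    folder = Path(path)
    return folder.is_dir() and any(p.is_file() for p in folder.iterdir())


# Same relative path as in the cell above; adapt it to your local configuration.
model_path = "../resources/midas-da-xlmroberta"
if not model_folder_looks_usable(model_path):
    print(f"Model folder not found or empty: {Path(model_path).resolve()}")
```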

## Using the classifier on a list of utterances in sequence

In [9]:
from cltl.dialogue_act_classification.midas_classifier import MidasDialogTagger

In [10]:
sentences_en = ["I love cats", "Do you love cats?","Yes, I do", "Do you love cats?", "No, dogs"]
sentences_nl = ["Ik ben dol op katten", "Hou jij van katten?","Ja, ik ben dol op ze", "Hou jij van katten?", "Nee, honden"]
# model_path is defined in the first cell; adapt it there to your local configuration
analyzer = MidasDialogTagger(model_path=model_path)
for sentence in sentences_en+sentences_nl:
    response = analyzer.extract_dialogue_act(sentence)
    print(sentence, response)

You are using a model of type xlm-roberta to instantiate a model of type roberta. This is not supported for all configurations of models and can yield errors.
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at /Users/piek/Desktop/d-Leolani/leolani-models/dialogue_models/midas-da-xlmroberta and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use mps:0


I love cats [DialogueAct(type='MIDAS', value='LABEL_4', confidence=0.06285823881626129)]
Do you love cats? [DialogueAct(type='MIDAS', value='LABEL_14', confidence=0.05728799104690552)]
Yes, I do [DialogueAct(type='MIDAS', value='LABEL_2', confidence=0.05970270186662674)]
Do you love cats? [DialogueAct(type='MIDAS', value='LABEL_14', confidence=0.060422565788030624)]
No, dogs [DialogueAct(type='MIDAS', value='LABEL_1', confidence=0.06130174919962883)]
Ik ben dol op katten [DialogueAct(type='MIDAS', value='LABEL_21', confidence=0.06320066004991531)]
Hou jij van katten? [DialogueAct(type='MIDAS', value='LABEL_13', confidence=0.05624299868941307)]
Ja, ik ben dol op ze [DialogueAct(type='MIDAS', value='LABEL_13', confidence=0.05842715501785278)]
Hou jij van katten? [DialogueAct(type='MIDAS', value='LABEL_15', confidence=0.05861100181937218)]
Nee, honden [DialogueAct(type='MIDAS', value='LABEL_13', confidence=0.06360748410224915)]
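
The warnings in the output above report that the classification head weights were newly initialized, and all confidences are close to 0.06 with generic `LABEL_n` values. This suggests the fine-tuned head was not loaded in this particular run, so the labels shown should not be read as meaningful predictions. A minimal sketch for flagging such near-chance output follows; the function `near_chance`, the `factor` threshold, and the label count of 23 are assumptions for illustration (the output above only shows that label indices go up to at least 21).

```python
def near_chance(confidences, num_labels, factor=2.0):
    """Return True if every confidence is below `factor` times the uniform baseline 1/num_labels."""
    baseline = 1.0 / num_labels
    return all(c < factor * baseline for c in confidences)


# Confidences copied (rounded) from the English sentences in the output above.
observed = [0.0629, 0.0573, 0.0597, 0.0604, 0.0613]

NUM_LABELS = 23  # assumption: size of the label set, not confirmed by the output
if near_chance(observed, NUM_LABELS):
    print("Confidences are near chance level; check that the model loaded correctly.")
```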


## Annotating conversations in EMISSOR

If a conversation is saved in EMISSOR format, you can use the emissor module to load the conversation and add the dialogue act output of the classifier to the EMISSOR representation. EMISSOR is a simple JSON format generated by conversational agents created with the Leolani platform.

Reference:

```Santamaría, Selene Báez, Thomas Baier, Taewoon Kim, Lea Krause, Jaap Kruijt, and Piek Vossen. "EMISSOR: A platform for capturing multimodal interactions as Episodic Memories and Interpretations with Situated Scenario-based Ontological References." In Proceedings of the 1st Workshop on Multimodal Semantic Representations (MMSR), pp. 56-77. 2021.```

In [13]:
from emissor.persistence import ScenarioStorage
from emissor.representation.scenario import Modality, Signal
from cltl.dialogue_act_classification.add_dialogue_acts_to_emissor import DialogueActAnnotator

In [17]:
# label used to store the provenance of the annotation
model_name = "midas-da-xlmroberta"
annotator = DialogueActAnnotator(model_path=model_path, model_name=model_name)

You are using a model of type xlm-roberta to instantiate a model of type roberta. This is not supported for all configurations of models and can yield errors.
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at /Users/piek/Desktop/test/diaclassification/resources/midas-da-xlmroberta and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use mps:0


In [18]:
### Path where the different scenarios are kept, each following the EMISSOR structure
path_to_emissor = "../data/emissor"
### A subfolder within the emissor folder that represents a single scenario
scenario = "14a1c27d-dfd2-465b-9ab2-90e9ea91d214"

In [8]:
scenario_storage = ScenarioStorage(path_to_emissor)
scenario_ctrl = scenario_storage.load_scenario(scenario)
signals = scenario_ctrl.get_signals(Modality.TEXT)
for signal in signals:
    ### If there are old annotations, these can be removed using the next call, check the JSON to see what label was used
    ### annotator.remove_annotations(signal,["MIDAS", "python-source:cltl.dialogue_act_classification.midas_classifier"])
    annotator.process_signal(scenario=scenario_ctrl, signal=signal)
#### Save the modified scenario to emissor
scenario_storage.save_scenario(scenario_ctrl)
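
Because each scenario is a subfolder of the emissor folder, several scenarios can be annotated by listing the subfolders and repeating the load, process, and save calls from the cell above for each one. The sketch below only covers the listing step with the standard library; the helper `list_scenario_ids` is hypothetical, and it assumes that every subfolder of the emissor folder is a scenario.

```python
from pathlib import Path


def list_scenario_ids(emissor_root):
    """Return the sorted names of all subfolders of `emissor_root`.

    Assumption: every subfolder represents one scenario.
    Returns an empty list if the folder does not exist.
    """
    root = Path(emissor_root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# Same folder as `path_to_emissor` in the cells above.
for scenario_id in list_scenario_ids("../data/emissor"):
    print(scenario_id)
```

Each returned name can then be passed to `scenario_storage.load_scenario` in place of the single `scenario` value used above.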

## End of notebook