# Annotating EMISSOR data

EMISSOR is used to capture interactions. Each interaction is saved as a separate scenario: a subfolder containing JSON files. A scenario can be annotated with interpretations after the interaction has taken place.

The concepts and structure of EMISSOR are described in Santamaría et al. (2021).

**Reference**:

```Santamaría, Selene Báez, Thomas Baier, Taewoon Kim, Lea Krause, Jaap Kruijt, and Piek Vossen. "EMISSOR: A platform for capturing multimodal interactions as Episodic Memories and Interpretations with Situated Scenario-based Ontological References." In Proceedings of the 1st Workshop on Multimodal Semantic Representations (MMSR), pp. 56-77. 2021.```

We developed several annotation modules to assign:

1. The dialogue act of an utterance
2. Emotion expressed
3. Likelihood of an utterance given the preceding context according to a language model
4. Linguistic structures and mentions

In the notebook below, we briefly describe each annotation and provide pointers. After annotating, you can analyse the interaction based on the annotations.

## Preparations

Carry out the following preparations in a terminal before launching this notebook. If you have not done so yet, copy the instructions below and run them in a terminal:

1. Create a Python ```venv``` or a Conda environment, using one of the two commands below. Here we use ```annotate``` as the name for the environment:

```>python -m venv annotate```
```>conda create -n annotate```

Activate the environment. See the ```venv``` or Conda documentation for how to activate it on your system.
After activation, the terminal prompt should be prefixed with ```(annotate)```.

2. Install the necessary packages:

With ```annotate``` activated, install the requirements:

```(annotate)>pip install -r requirements.txt```

3. Download the spaCy language module:

The linguistic annotator uses spaCy, for which we need to download a language module in our ```annotate``` environment:

```(annotate)>python -m spacy download en_core_web_sm```

4. Add Jupyter to your environment:

To make sure that Jupyter knows the ```annotate``` environment, register it as a kernel:

```(annotate)>python -m ipykernel install --user --name=annotate```

5. Launch Jupyter in the ```annotate``` environment and open this notebook:

```(annotate)>jupyter lab```


## Importing the annotators

We first need to import the different ```annotators```, which are defined in the Python scripts next to this notebook.

In [1]:
import annotate_emissor_conversation_with_dialogue_acts as dialogacts
import annotate_emissor_conversation_with_emotions as emotions
import annotate_emissor_conversation_with_llm_likelihood as likelihood
import annotate_emissor_conversation_with_text_mentions as mentions

The functions look for subfolders in an emissor folder. These subfolders are treated as scenarios. If no specific scenario is provided, the annotation functions will process all the scenarios. In this notebook, we use the ```emissor``` folder in the ```data``` directory with a single scenario.

In [2]:
emissor = "data/emissor"
scenario = "14a1c27d-dfd2-465b-9ab2-90e9ea91d214"
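
To check which scenarios the annotators would pick up, you can list the subfolders yourself. The following is a minimal sketch, not the annotators' own code: the helper name ```list_scenarios``` is hypothetical, and it simply assumes that every subfolder of the emissor folder counts as a scenario, as described above.

```python
from pathlib import Path


def list_scenarios(emissor_path, scenario=None):
    """Return the names of the scenario subfolders in an emissor folder.

    Hypothetical helper: if `scenario` is given, only that subfolder is
    returned (when it exists); otherwise all subfolders are returned.
    """
    root = Path(emissor_path)
    if not root.is_dir():
        # Nothing to process when the emissor folder does not exist.
        return []
    if scenario is not None:
        return [scenario] if (root / scenario).is_dir() else []
    # Plain files in the emissor folder are ignored; only folders count.
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

Calling ```list_scenarios(emissor)``` returns all scenario names, whereas ```list_scenarios(emissor, scenario)``` returns just the selected one if the folder is present.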

## Annotating utterances with Dialogue Acts

Dialogue acts represent the overall intention of an utterance in a conversation, such as asking a question, giving an answer, making a statement, etc. We used MIDAS, a dataset and annotation scheme with 23 different dialogue acts developed by Yu and Yu (2021). We finetuned a cross-lingual encoder language model, ```XLM-RoBERTa```, on this data. The finetuned model is available on Huggingface under ```CLTL/midas-da-xlmroberta```.

Reference: ```Yu, Dian, and Zhou Yu. "MIDAS: A Dialog Act Annotation Scheme for Open Domain HumanMachine Spoken Conversations." In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pp. 1103-1120. Association for Computational Linguistics, 2021.```

The annotator for the dialog acts has the following parameters that can be set:

1. ```emissor``` = path to the emissor folder.
2. ```scenario``` (optional) = the subfolder in the emissor folder that needs to be processed. If omitted, all subfolders are processed.
3. ```model_name``` (optional) = the name of the model that is used to specify the provenance of the annotation. If omitted, ```MIDAS``` is used as the name.
4. ```model_path``` (optional) = the name of the model on Huggingface or the path to a local model on your computer. If omitted, our model on Huggingface is downloaded and used, which may take a bit of time the first time.


In [3]:
model = "CLTL/midas-da-xlmroberta"
dialogacts.main(emissor=emissor, scenario=scenario, model_path=model)

Processing emissor data/emissor
Using model CLTL/midas-da-xlmroberta
Loading MIDAS model... CLTL/midas-da-xlmroberta


config.json: 0.00B [00:00, ?B/s]

model.safetensors:   0%|          | 0.00/1.11G [00:00<?, ?B/s]

Device set to use mps:0


Processing scenarios:  ['14a1c27d-dfd2-465b-9ab2-90e9ea91d214']
Processing scenario 14a1c27d-dfd2-465b-9ab2-90e9ea91d214


The ```text.json``` file in the scenario folder is augmented with the dialogue act annotations. In EMISSOR, annotations are part of ```mentions```. A mention has a ```segment``` that defines which part of which signal is annotated, and one or more annotations. In the next example, we show the output of the dialogue act annotator as an annotation of the full text as a segment: "I know. I have heard about you before".

```
"text": "I know. I have heard about you before"
      {
        "@context": {
          "Mention": "https://emissor.org#Mention",
          "id": "@id",
          "segment": "https://emissor.org#segment",
          "annotations": "https://emissor.org#annotations"
        },
        "@type": "Mention",
        "id": "fd60a4bd-86bf-47e5-bd13-16358835c45d",
        "segment": [
          {
            "@context": {
              "Index": "https://emissor.org#Index",
              "id": "@id",
              "start": "https://emissor.org#start",
              "stop": "https://emissor.org#stop",
              "container_id": {
                "@id": "https://emissor.org#container_id",
                "@type": "@id"
              }
            },
            "@type": "Index",
            "container_id": "7d72105d-52d8-4ca1-b1e8-03d9f3d10d13",
            "start": 0,
            "stop": 37,
            "_py_type": "emissor.representation.container-Index"
          }
        ],
        "annotations": [
          {
            "@context": {
              "Annotation": "https://emissor.org#Annotation",
              "id": "@id",
              "type": "https://emissor.org#type",
              "value": "https://emissor.org#value",
              "source": "https://emissor.org#source",
              "timestamp": "https://emissor.org#timestamp"
            },
            "@type": "Annotation",
            "type": "python-type:cltl.dialogue_act_classification.api.DialogueAct",
            "value": {
              "type": "MIDAS",
              "value": "statement",
              "confidence": 0.9055033326148987,
              "_py_type": "cltl.dialogue_act_classification.api-DialogueAct"
            },
            "source": "MIDAS",
            "timestamp": 1760646835711,
            "_py_type": "emissor.representation.scenario-Annotation"
          }
        ]
      }
```
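
For analysis it is often convenient to flatten a mention into simple rows. The sketch below assumes only the mention structure shown in the example above; the function name ```summarize_mention``` and the row keys are hypothetical, and the sample dictionary is a trimmed copy of that example with the ```@context``` blocks left out.

```python
def summarize_mention(mention):
    """Flatten one EMISSOR mention into a list of small dictionaries.

    Assumes the layout shown above: a list of segments with start/stop
    offsets and a list of annotations, each with a `value` field.
    """
    segment = mention["segment"][0]
    rows = []
    for annotation in mention["annotations"]:
        value = annotation["value"]
        if isinstance(value, dict):
            # Structured value, e.g. a dialogue act with a label and confidence.
            scheme = value.get("type", annotation["type"])
            label = value.get("value")
            confidence = value.get("confidence")
        else:
            # Plain value, e.g. a single number.
            scheme, label, confidence = annotation["type"], value, None
        rows.append({
            "start": segment["start"],
            "stop": segment["stop"],
            "scheme": scheme,
            "label": label,
            "confidence": confidence,
            "source": annotation["source"],
        })
    return rows


# Trimmed copy of the dialogue act mention shown above.
example_mention = {
    "@type": "Mention",
    "id": "fd60a4bd-86bf-47e5-bd13-16358835c45d",
    "segment": [{"@type": "Index", "start": 0, "stop": 37}],
    "annotations": [{
        "@type": "Annotation",
        "type": "python-type:cltl.dialogue_act_classification.api.DialogueAct",
        "value": {"type": "MIDAS", "value": "statement",
                  "confidence": 0.9055033326148987},
        "source": "MIDAS",
        "timestamp": 1760646835711,
    }],
}

for row in summarize_mention(example_mention):
    print(row)
```

Rows in this form can be collected over all mentions of a scenario and counted or filtered by confidence.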

## Annotating utterances with Emotions

Emotions can be defined in many ways, for example for facial expressions or for uttered text. Here we use a finetuned multilingual BERT model trained on the GoEmotions dataset (```GO```) created by Google, which captures a broad range of emotions. This model is downloaded from Huggingface.

**Reference**:

```Demszky, D., Movshovitz-Attias, D., Ko, J., Cowen, A., Nemade, G., & Ravi, S. (2020). GoEmotions: A dataset of fine-grained emotions. arXiv preprint arXiv:2005.00547.```

The annotator for the emotions has the following parameters that can be set:

1. ```emissor``` = path to the emissor folder.
2. ```scenario``` (optional) = the subfolder in the emissor folder that needs to be processed. If omitted, all subfolders are processed.
3. ```model_name``` (optional) = the name of the model that is used to specify the provenance of the annotation. If omitted, ```GO``` is used as the name.
4. ```model_path``` (optional) = the name of the model on Huggingface or the path to a local model on your computer. If omitted, the model on Huggingface is downloaded and used, which may take a bit of time the first time.


In [19]:
model = "AnasAlokla/multilingual_go_emotions"
emotions.main(emissor=emissor, scenario=scenario, model_path=model)

Device set to use mps:0


Processing scenarios:  ['14a1c27d-dfd2-465b-9ab2-90e9ea91d214']
Processing scenario 14a1c27d-dfd2-465b-9ab2-90e9ea91d214


The next cell shows the JSON structure for the emotion annotation. Note that the 27 GO emotions are also mapped to seven Ekman categories (the six basic emotions joy, sadness, anger, surprise, disgust and fear, plus neutral) as well as to sentiment: negative, positive and neutral.

```
"text": "I know. I have heard about you before"
  {
    "@context": {...},
    "@type": "Mention",
    "id": "6aab3fb9-9854-459f-a864-0b8a95c74bb0",
    "segment": [
      {
        "@context": {
          "Index": "https://emissor.org#Index",
          "id": "@id",
          "start": "https://emissor.org#start",
          "stop": "https://emissor.org#stop",
          "container_id": {
            "@id": "https://emissor.org#container_id",
            "@type": "@id"
          }
        },
        "@type": "Index",
        "container_id": "7d72105d-52d8-4ca1-b1e8-03d9f3d10d13",
        "start": 0,
        "stop": 37,
        "_py_type": "emissor.representation.container-Index"
      }
    ],
    "annotations": [
      {
        "@context": {...},
        "@type": "Annotation",
        "type": "python-type:cltl.emotion_extraction.api.Emotion",
        "value": {
          "type": "GO",
          "value": "approval",
          "confidence": 0.28517377376556396,
          "_py_type": "cltl.emotion_extraction.api-Emotion"
        },
        "source": "GO",
        "timestamp": 1760555549981,
        "_py_type": "emissor.representation.scenario-Annotation"
      },
      {
        "@context": {...},
        "@type": "Annotation",
        "type": "python-type:cltl.emotion_extraction.api.Emotion",
        "value": {
          "type": "EKMAN",
          "value": "joy",
          "confidence": 0.4837806256255135,
          "_py_type": "cltl.emotion_extraction.api-Emotion"
        },
        "source": "GO",
        "timestamp": 1760555549981,
        "_py_type": "emissor.representation.scenario-Annotation"
      }
```
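
The mapping from fine-grained GO labels to Ekman categories can be sketched as follows. This is an illustration under assumptions, not the annotator's implementation: the grouping follows the Ekman mapping distributed with the GoEmotions dataset as far as we recall it, summing the scores per category is one possible way to aggregate, and the input scores are made-up values.

```python
from collections import defaultdict

# Grouping of the 27 GO labels into Ekman categories (assumed to match
# the mapping that accompanies the GoEmotions dataset).
EKMAN_GROUPS = {
    "anger": ["anger", "annoyance", "disapproval"],
    "disgust": ["disgust"],
    "fear": ["fear", "nervousness"],
    "joy": ["joy", "amusement", "approval", "excitement", "gratitude",
            "love", "optimism", "relief", "pride", "admiration",
            "desire", "caring"],
    "sadness": ["sadness", "disappointment", "embarrassment", "grief",
                "remorse"],
    "surprise": ["surprise", "realization", "confusion", "curiosity"],
}

# Invert the grouping so every GO label points to its Ekman category.
GO_TO_EKMAN = {go: ekman for ekman, gos in EKMAN_GROUPS.items() for go in gos}


def to_ekman(go_scores):
    """Sum GO label scores per Ekman category (aggregation is an assumption)."""
    totals = defaultdict(float)
    for label, score in go_scores.items():
        # Labels outside the grouping, such as "neutral", stay neutral.
        totals[GO_TO_EKMAN.get(label, "neutral")] += score
    return dict(totals)


# Hypothetical classifier scores for a single utterance.
scores = {"approval": 0.29, "admiration": 0.12, "curiosity": 0.07, "neutral": 0.25}
print(to_ekman(scores))
```

In this grouping ```approval``` falls under ```joy```, which is consistent with the example annotation above.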

## Annotating utterances with Likelihood

Masked language models such as BERT are pretrained to predict masked words. We can use their top predictions, and the probability of each, to derive a likelihood score for any utterance given the preceding context. This likelihood score is added to each utterance in EMISSOR. If a word in the utterance is among the top predictions, we take its probability score. If it does not occur among them, its score is zero.

The annotator for the Likelihood has the following parameters that can be set:

1. ```emissor``` = path to the emissor folder.
2. ```scenario``` (optional) = the subfolder in the emissor folder that needs to be processed. If omitted, all subfolders are processed.
3. ```model_name``` (optional) = the name of the model that is used to specify the provenance of the annotation. If omitted, GO is used as the name.
4. ```model_path``` (optional) = the name of the model on Huggingface or the path to a local model on your computer. If omitted, ```google-bert/bert-base-multilingual-cased``` on Huggingface is downloaded and used, which may take a bit of time the first time.
5. ```max_context``` (optional) = the maximum number of tokens from the preceding context that is considered. If omitted, it is set to 300.
6. ```len_top_tokens``` (optional) = the number of top results that will be used to find the probability of the word in the utterance. If omitted, it is set to 20.
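
The scoring idea and the role of ```max_context``` and ```len_top_tokens``` can be illustrated without loading a model. The sketch below is not the annotator's code: the function names are hypothetical, the ranked predictions are made-up values standing in for model output, keeping the most recent context tokens is an assumption, and averaging the per-word scores is just one possible way to combine them.

```python
def truncate_context(context_tokens, max_context=300):
    """Keep at most `max_context` tokens, taking the most recent ones."""
    if max_context <= 0:
        return []
    return context_tokens[-max_context:]


def utterance_likelihood(tokens, predictions, len_top_tokens=20):
    """Average the probability of each word among its top predictions.

    `predictions[i]` is a list of (token, probability) pairs for position i,
    sorted from most to least probable. A word that is not among the first
    `len_top_tokens` predictions contributes zero.
    """
    if not tokens:
        return 0.0
    total = 0.0
    for token, ranked in zip(tokens, predictions):
        top = dict(ranked[:len_top_tokens])
        total += top.get(token, 0.0)
    return total / len(tokens)


# Made-up predictions for the two tokens of "I know".
tokens = ["I", "know"]
predictions = [
    [("I", 0.5), ("You", 0.2)],
    [("see", 0.4), ("know", 0.1)],
]
print(utterance_likelihood(tokens, predictions, len_top_tokens=2))
print(utterance_likelihood(tokens, predictions, len_top_tokens=1))
```

With a smaller ```len_top_tokens```, fewer words are found among the predictions, so the score can only stay equal or drop.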


In [11]:
model = "google-bert/bert-base-multilingual-cased"
context = 300
top_results = 20

likelihood.main(emissor=emissor, scenario=scenario, model_path=model, model_name=model, max_context=context, len_top_tokens=top_results)

Extracting the likelihood score using google-bert/bert-base-multilingual-cased


Some weights of the model checkpoint at google-bert/bert-base-multilingual-cased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use mps:0


model_path google-bert/bert-base-multilingual-cased
model_name google-bert/bert-base-multilingual-cased
context_threshold 300
top_results 20
Processing scenarios:  ['14a1c27d-dfd2-465b-9ab2-90e9ea91d214']
Processing scenario 14a1c27d-dfd2-465b-9ab2-90e9ea91d214


The next cell shows how the Likelihood score is added as an annotation:

```
"text": "I know. I have heard about you before"
  {
    "@context": {...},
    "@type": "Mention",
    "id": "f85a06fb-c7a1-411f-9577-10d21db9f04b",
    "segment": [
      {
        "@context": {
          "Index": "https://emissor.org#Index",
          "id": "@id",
          "start": "https://emissor.org#start",
          "stop": "https://emissor.org#stop",
          "container_id": {
            "@id": "https://emissor.org#container_id",
            "@type": "@id"
          }
        },
        "@type": "Index",
        "container_id": "f410c57a-b295-4570-ad0f-767199129825",
        "start": 0,
        "stop": 11,
        "_py_type": "emissor.representation.container-Index"
      }
    ],
    "annotations": [
      {
        "@context": {
          "Annotation": "https://emissor.org#Annotation",
          "id": "@id",
          "type": "https://emissor.org#type",
          "value": "https://emissor.org#value",
          "source": "https://emissor.org#source",
          "timestamp": "https://emissor.org#timestamp"
        },
        "@type": "Annotation",
        "type": "Likelihood",
        "value": 0.28286586205164593,
        "source": "mBERT",
        "timestamp": 1760532618914,
        "_py_type": "emissor.representation.scenario-Annotation"
      }
    ]
  },
```

## Annotating utterances with Linguistic structures

The final annotation focuses on linguistic properties of the utterances. We use the [spaCy](https://spacy.io) software to get information about the part-of-speech of the words used, the syntactic dependencies and the entities that are referred to. A special case is formed by words that match categories of object recognition. Text references to these objects, as defined in the [COCO](https://cocodataset.org/#home) dataset for object recognition, can be interesting for studying the physical grounding of the text.

Since this annotator uses spaCy, it is necessary to download the spaCy language module. For English, this is done through the following command in a terminal in the same environment as this notebook:

```python -m spacy download en_core_web_sm```

It is also possible to use language modules for other languages provided by spaCy.

The linguistic annotator can be called with the following parameters:

1. ```emissor``` = path to the emissor folder.
2. ```scenario``` = the subfolder in the emissor folder that needs to be processed. If omitted, all subfolders are processed.
3. ```model``` = the spaCy language model that is used to process the utterance.

In [12]:
spacy_language_model = "en_core_web_sm"
mentions.main(emissor=emissor, scenario=scenario, model=spacy_language_model)

Processing scenarios:  ['14a1c27d-dfd2-465b-9ab2-90e9ea91d214']
Processing scenario 14a1c27d-dfd2-465b-9ab2-90e9ea91d214


The next two cells show two annotations of mentions: a speaker reference and a part-of-speech tag for specific words. 

```
"text": "I know. I have heard about you before"
{
"@context": {...},
"@type": "Mention",
"id": "74cca75b-4a94-49fb-b6f2-e4867357aee0",
"segment": [
  {
    "@context": {
      "Index": "https://emissor.org#Index",
      "id": "@id",
      "start": "https://emissor.org#start",
      "stop": "https://emissor.org#stop",
      "container_id": {
        "@id": "https://emissor.org#container_id",
        "@type": "@id"
      }
    },
    "@type": "Index",
    "container_id": "7d72105d-52d8-4ca1-b1e8-03d9f3d10d13",
    "start": 8,
    "stop": 9,
    "_py_type": "emissor.representation.container-Index"
  }
],
"annotations": [
  {
    "@context": {
      "Annotation": "https://emissor.org#Annotation",
      "id": "@id",
      "type": "https://emissor.org#type",
      "value": "https://emissor.org#value",
      "source": "https://emissor.org#source",
      "timestamp": "https://emissor.org#timestamp"
    },
    "@type": "Annotation",
    "type": "Entity",
    "value": {
      "text": "I",
      "type": "SPEAKER",
      "segment": [
        0,
        1
      ],
      "_py_type": "cltl.nlp.api-Entity"
    },
    "source": "NLP",
    "timestamp": 1760553605282,
    "_py_type": "emissor.representation.scenario-Annotation"
  }
]
```

```
"text": "I know. I have heard about you before"
  {
    "@context": {...},
    "@type": "Mention",
    "id": "ed89a8be-eda3-4406-8b11-7c8676cfbe0d",
    "segment": [
      {
        "@context": {
          "Index": "https://emissor.org#Index",
          "id": "@id",
          "start": "https://emissor.org#start",
          "stop": "https://emissor.org#stop",
          "container_id": {
            "@id": "https://emissor.org#container_id",
            "@type": "@id"
          }
        },
        "@type": "Index",
        "container_id": "f410c57a-b295-4570-ad0f-767199129825",
        "start": 2,
        "stop": 6,
        "_py_type": "emissor.representation.container-Index"
      }
    ],
    "annotations": [
      {
        "@context": {
          "Annotation": "https://emissor.org#Annotation",
          "id": "@id",
          "type": "https://emissor.org#type",
          "value": "https://emissor.org#value",
          "source": "https://emissor.org#source",
          "timestamp": "https://emissor.org#timestamp"
        },
        "@type": "Annotation",
        "type": "Token",
        "value": {
          "text": "know",
          "pos": "VERB",
          "segment": [
            2,
            6
          ],
          "_py_type": "cltl.nlp.api-Token"
        },
        "source": "NLP",
        "timestamp": 1760533911733,
        "_py_type": "emissor.representation.scenario-Annotation"
      }
    ]
  }
```
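
The ```start``` and ```stop``` values of a segment can be used to recover the annotated text. The sketch below assumes that they are character offsets into the utterance with an exclusive ```stop```, which is what the values in the examples above suggest; the helper name ```segment_text``` is hypothetical.

```python
def segment_text(text, start, stop):
    """Return the part of `text` covered by a segment (stop is exclusive)."""
    return text[start:stop]


text = "I know. I have heard about you before"

# Offsets taken from the example mentions above.
print(repr(segment_text(text, 0, 37)))  # full utterance
print(repr(segment_text(text, 2, 6)))   # the Token annotation
print(repr(segment_text(text, 8, 9)))   # the SPEAKER mention
```

Under this assumption, the Token segment covers the word "know" and the speaker mention covers the second "I" in the utterance.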

## End of notebook