# Classifying Reports: emotion presence

In this notebook, we will look at how to classify dream reports for the presence of emotions using DReAMy. We will test two approaches: emotion presence, where each emotion is assigned a probability of being present, and generation, where the model generates text describing which emotions are present and to which character each one is attributed.

In [1]:
! pip install dreamy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting dreamy
  Downloading dreamy-0.0.5-py3-none-any.whl (12 kB)
Collecting transformers[tokenizers,torch]
  Downloading transformers-4.26.1-py3-none-any.whl (6.3 MB)
Collecting datasets
  Downloading datasets-2.9.0-py3-none-any.whl (462 kB)
Collecting multiprocess
  Downloading multiprocess-0.70.14-py38-none-any.whl (132 kB)
Collecting xxhash
  Downloading xxhash-3.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (213 kB)

In [2]:
import dreamy

Let's start by getting some dreams. You can download a collection of dream reports scraped from the DreamBank database, freely available on the Hugging Face Hub through DReAMy!

In [3]:
dream_bank = dreamy.get_HF_DreamBank(as_dataframe=True)

Downloading and preparing dataset None/None to /root/.cache/huggingface/datasets/DReAMy-Library___parquet/DReAMy-Library--DreamBank-dreams-34a268b12519d660/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec...
Dataset parquet downloaded and prepared to /root/.cache/huggingface/datasets/DReAMy-Library___parquet/DReAMy-Library--DreamBank-dreams-34a268b12519d660/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec. Subsequent calls will reuse this data.

As you can see, the dataset, downloaded directly as a pandas DataFrame, has three columns:
- `dreams`: the dream reports
- `series`: the DreamBank collection each report belongs to
- `description`: a brief description of each series

In [4]:
dream_bank.sample(5)

| | dreams | series | description |
|---|---|---|---|
| 2003 | Dieser Traum spielt bei Frau Böhringer in Bend... | vonuslar.de | Detlev von Uslar, auf Deutsch |
| 11883 | I was working as a waitress at a huge party. T... | nancy | Nancy: Caring & headstrong |
| 27429 | There is a second story deck around the buildi... | b | Barb Sanders |
| 18266 | ich war mit meinem Neffen im Zoo. Plötzlich si... | german-f.de | German dreams (F) |
| 5729 | Ich musste immer ins Hochgebirge fahren, um Bl... | vonuslar.de | Detlev von Uslar, auf Deutsch |


Let's now sample a small set of dreams. If you have a more powerful machine (or are working on Colab), you can increase the number of reports. Note that the whole dataset contains ~29k reports.

In [5]:
n_samples = 10
dream_sample = dream_bank.sample(n_samples).reset_index(drop=True)

dream_as_list = dream_sample["dreams"].tolist()
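
The sample may contain German reports (e.g. the `vonuslar.de` and `german-f.de` series above), while the `base-en` models used below are English models. The sketch below keeps only English reports. It is a plain-Python illustration on a few toy records copied from the table above, and it rests on an assumption not documented by DReAMy: that German series are exactly those whose name ends in `.de`. Check `dream_bank["series"].unique()` on your copy before relying on it. `keep_english_reports` is a hypothetical helper.

```python
def keep_english_reports(records, german_suffix=".de"):
    """Drop reports whose series name ends with `german_suffix`.

    Assumption (not documented by DReAMy): German DreamBank series are
    marked by a '.de' suffix, e.g. 'vonuslar.de' or 'german-f.de'.
    """
    return [r for r in records if not r["series"].endswith(german_suffix)]


# Toy records mirroring the `dreams` / `series` columns shown above.
records = [
    {"dreams": "Dieser Traum spielt bei Frau Böhringer in Bend...", "series": "vonuslar.de"},
    {"dreams": "I was working as a waitress at a huge party. T...", "series": "nancy"},
    {"dreams": "There is a second story deck around the buildi...", "series": "b"},
    {"dreams": "ich war mit meinem Neffen im Zoo. Plötzlich si...", "series": "german-f.de"},
]

english = keep_english_reports(records)
print([r["series"] for r in english])  # ['nancy', 'b']
```

On the DataFrame itself, the same filter would be `dream_bank[~dream_bank["series"].str.endswith(".de")]`, applied before calling `.sample(n_samples)`.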

We now set some parameters to decide which model to start with.

### Emotion Presence
Here, we query a model only for the probability of each emotion being present.

In [6]:
classification_type = "presence"
model_type          = "base-en"
return_all_scores   = True   # return a score for every emotion label, not just the top one
device              = "cpu"
max_length          = 512
truncation          = True   # truncate reports longer than max_length tokens

We then use the `emotion_model_maps` mapping to get the correct model name for the selected specifications, looked up with a `"<classification_type>-<model_type>"` key. Since DReAMy mainly makes use of 🤗's `pipeline`, we also need the correct task name.

In [7]:
model_name, task = dreamy.emotion_classification.emotion_model_maps[
    "{}-{}".format(classification_type, model_type)
]

print(model_name, task)

DReAMy-lib/bert-base-cased-DreamBank-emotion-presence text-classification
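
The lookup above builds a `"<classification_type>-<model_type>"` key and indexes the mapping with it. The sketch below reproduces that with a hypothetical local stand-in for `emotion_model_maps` that holds only the two entries printed in this notebook (the real mapping may contain more; assuming it is a plain dict, `emotion_model_maps.keys()` would list them), and adds a clearer error for unsupported combinations. `lookup_model` is an illustrative helper, not part of DReAMy.

```python
# Hypothetical stand-in for dreamy.emotion_classification.emotion_model_maps,
# holding only the two (model_name, task) entries printed in this notebook.
EMOTION_MODEL_MAPS = {
    "presence-base-en": (
        "DReAMy-lib/bert-base-cased-DreamBank-emotion-presence",
        "text-classification",
    ),
    "generation-base-en": (
        "DReAMy-lib/t5-base-DreamBank-Generation-Emot-Char",
        "summarization",
    ),
}


def lookup_model(classification_type, model_type, model_maps=EMOTION_MODEL_MAPS):
    """Return (model_name, task) for a '<classification_type>-<model_type>' key."""
    key = f"{classification_type}-{model_type}"
    try:
        return model_maps[key]
    except KeyError:
        available = ", ".join(sorted(model_maps))
        raise KeyError(f"Unknown model key {key!r}; available keys: {available}") from None


model_name, task = lookup_model("presence", "base-en")
print(model_name, task)
# DReAMy-lib/bert-base-cased-DreamBank-emotion-presence text-classification
```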


In [9]:
predictions = dreamy.predict_emotions(
    dream_as_list, 
    model_name, 
    task,
    return_all_scores=return_all_scores, 
    max_length=max_length, 
    truncation=truncation, 
    device=device,
)

And here are the predictions. As you can see, they are a list of lists, one per report. Each inner list contains one dictionary per emotion, with two items: `label`, a specific emotion, and `score`, the probability the model has assigned to that emotion being present in the report.

In [10]:
predictions

[[{'label': 'AN', 'score': 0.0198601596057415},
  {'label': 'AP', 'score': 0.4088245630264282},
  {'label': 'SD', 'score': 0.014561345800757408},
  {'label': 'CO', 'score': 0.6341181993484497},
  {'label': 'HA', 'score': 0.0166273545473814}],
 [{'label': 'AN', 'score': 0.054999228566884995},
  {'label': 'AP', 'score': 0.06907249242067337},
  {'label': 'SD', 'score': 0.9694998264312744},
  {'label': 'CO', 'score': 0.03800805285573006},
  {'label': 'HA', 'score': 0.08832892775535583}],
 [{'label': 'AN', 'score': 0.11532843112945557},
  {'label': 'AP', 'score': 0.2361140251159668},
  {'label': 'SD', 'score': 0.03984451666474342},
  {'label': 'CO', 'score': 0.21033813059329987},
  {'label': 'HA', 'score': 0.1727239042520523}],
 [{'label': 'AN', 'score': 0.3926936089992523},
  {'label': 'AP', 'score': 0.9885526895523071},
  {'label': 'SD', 'score': 0.03490443527698517},
  {'label': 'CO', 'score': 0.023697832599282265},
  {'label': 'HA', 'score': 0.045115191489458084}],
 [{'label': 'AN', 'sc
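
The scores of a report are not mutually exclusive: for the first report they sum to about 1.09, so each one reads as the probability of that emotion being present on its own. To explore them, it can help to reshape each report into a single `{label: score}` dictionary. The sketch below does this on the first two reports copied from the output above; `to_wide_rows` is a hypothetical helper, and `pd.DataFrame(rows)` would turn the result into a table with one column per emotion. Picking the highest-scoring label gives a quick overview, keeping in mind that more than one emotion can be present.

```python
# First two reports copied from the `predictions` output above (presence model).
predictions = [
    [{"label": "AN", "score": 0.0198601596057415},
     {"label": "AP", "score": 0.4088245630264282},
     {"label": "SD", "score": 0.014561345800757408},
     {"label": "CO", "score": 0.6341181993484497},
     {"label": "HA", "score": 0.0166273545473814}],
    [{"label": "AN", "score": 0.054999228566884995},
     {"label": "AP", "score": 0.06907249242067337},
     {"label": "SD", "score": 0.9694998264312744},
     {"label": "CO", "score": 0.03800805285573006},
     {"label": "HA", "score": 0.08832892775535583}],
]


def to_wide_rows(predictions):
    """Turn each report's list of {label, score} dicts into one {label: score} dict."""
    return [{d["label"]: d["score"] for d in report} for report in predictions]


rows = to_wide_rows(predictions)
# Highest-scoring label per report (other emotions may still be present).
top = [max(row, key=row.get) for row in rows]
print(top)  # ['CO', 'SD']
```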

You can interpret the emotion labels via the decodings that ship with DReAMy:

In [11]:
dreamy.Coding_emotions

{'AN': 'anger',
 'AP': 'apprehension',
 'SD': 'sadness',
 'CO': 'confusion',
 'HA': 'happiness'}
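
To turn scores into a yes/no decision per emotion, one option is to apply a threshold and decode the labels that pass it. The sketch below assumes a threshold of 0.5, a common default rather than a value prescribed by DReAMy, and copies the mapping printed by `dreamy.Coding_emotions` so that it runs standalone; `present_emotions` is a hypothetical helper.

```python
# Label decodings as printed by dreamy.Coding_emotions above.
CODING_EMOTIONS = {"AN": "anger", "AP": "apprehension", "SD": "sadness",
                   "CO": "confusion", "HA": "happiness"}


def present_emotions(report_scores, threshold=0.5, coding=CODING_EMOTIONS):
    """Decoded names of the emotions whose score is at least `threshold`."""
    return [coding[d["label"]] for d in report_scores if d["score"] >= threshold]


# The fourth report from the presence predictions above.
report = [{"label": "AN", "score": 0.3926936089992523},
          {"label": "AP", "score": 0.9885526895523071},
          {"label": "SD", "score": 0.03490443527698517},
          {"label": "CO", "score": 0.023697832599282265},
          {"label": "HA", "score": 0.045115191489458084}]

print(present_emotions(report))                 # ['apprehension']
print(present_emotions(report, threshold=0.3))  # ['anger', 'apprehension']
```

In the notebook, `[present_emotions(r, coding=dreamy.Coding_emotions) for r in predictions]` would apply it to every sampled report; lowering the threshold trades precision for recall.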

### Emotion Generation

Let's now try to *generate* the emotion codings. We will use the same data and general specifications; the key change is the classification type, like so:

In [12]:
classification_type = "generation"

# The remaining arguments are the same (return_all_scores is not used here)
model_type          = "base-en"
device              = "cpu"
max_length          = 512
truncation          = True

model_name, task = dreamy.emotion_classification.emotion_model_maps[
    "{}-{}".format(classification_type, model_type)
]

print(model_name, task)

DReAMy-lib/t5-base-DreamBank-Generation-Emot-Char summarization


As you can see, we now have a different model name and a different task for the pipeline (`summarization`). Moreover, we need to call a slightly different function, `.generate_emotions`, which also takes slightly different inputs (there is no `return_all_scores`).

Don't worry about the `Your max_length is set to ...` warnings: they are just 🤗 being polite and suggesting a smaller `max_length`, since summarization outputs are usually shorter than their inputs.

In [13]:
predictions = dreamy.generate_emotions(
    dream_as_list, 
    model_name, 
    task,
    max_length=max_length, 
    truncation=truncation, 
    device=device,
)

Your max_length is set to 512, but you input_length is only 416. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=208)
Your max_length is set to 512, but you input_length is only 74. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=37)
Your max_length is set to 512, but you input_length is only 266. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=133)
Your max_length is set to 512, but you input_length is only 414. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=207)
Your max_length is set to 512, but you input_length is only 70. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=35)
Your max_length is set to 512, but you input_length is only 461. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=230)
Your max_length is set to 512, but you input_length is only 230. You might

As you can see, in this case each prediction only has the `summary_text` item in its dictionary. This is just how 🤗 formats the output of `summarization`, the general task under which the fine-tuned T5 model operates. You can now explore and study your predictions!

Please note that the model has been trained solely on English (both T5 and the DreamBank-tuned T5), so any predictions on German reports are bound to be strange or incorrect.

In [14]:
predictions

[{'summary_text': 'The dreamer and the individual female stranger adult experienced apprehension . the dreamer experienced confusion and confusion and sadness. the individual male occupational adult experienced sadness'},
 {'summary_text': 'The dreamer and the individual female father adult experienced sadness . the dreamer experienced happiness and apprehension. The individual female known adult experienced happiness.'},
 {'summary_text': 'The individual male known adult experienced anger. The dreamer experienced confusion and apprehension . the individual male occupational adult experienced confusion.'},
 {'summary_text': 'the dreamer experienced apprehension. the individual female uncertian adult experienced sadness. the group indefinite known adult experienced anger.'},
 {'summary_text': "the dreamer and the group joint uncertian adult experienced happiness . the chef of a food stall in london 'cooks' a record for the group ."},
 {'summary_text': 'The dreamer and the individual fem
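
The generated descriptions mostly follow the pattern "<characters> experienced <emotions>", with several characters or emotions joined by "and". The sketch below is a heuristic parser built on that observed pattern (not a format guaranteed by the model) that turns one `summary_text` into (character, emotion) pairs. It keeps only the five emotion names from `dreamy.Coding_emotions` and skips free text such as the sentence about the chef in London above. `parse_generated_emotions` is a hypothetical helper.

```python
import re

# The five emotion names decoded by dreamy.Coding_emotions.
EMOTIONS = {"anger", "apprehension", "sadness", "confusion", "happiness"}


def parse_generated_emotions(summary_text):
    """Heuristically turn a generated summary into (character, emotion) pairs.

    Assumes sentences of the form '<characters> experienced <emotions>', with
    several characters or emotions joined by ' and '. Sentences that do not
    follow this pattern are skipped.
    """
    pairs = []
    # Split on periods, tolerating the stray spaces the model emits (" . ").
    for sentence in re.split(r"\s*\.\s*", summary_text.lower()):
        if " experienced " not in sentence:
            continue  # free text or an empty trailing piece: nothing to extract
        who, _, what = sentence.partition(" experienced ")
        characters = [c.strip() for c in who.split(" and ") if c.strip()]
        # dict.fromkeys drops repeats (e.g. "confusion and confusion") but keeps order.
        emotions = dict.fromkeys(e.strip() for e in what.split(" and "))
        for character in characters:
            for emotion in emotions:
                if emotion in EMOTIONS:
                    pairs.append((character, emotion))
    return pairs


# Two generated outputs copied from the predictions above.
first = ("The dreamer and the individual female stranger adult experienced apprehension . "
         "the dreamer experienced confusion and confusion and sadness. "
         "the individual male occupational adult experienced sadness")
fifth = ("the dreamer and the group joint uncertian adult experienced happiness . "
         "the chef of a food stall in london 'cooks' a record for the group .")

print(parse_generated_emotions(fifth))
# [('the dreamer', 'happiness'), ('the group joint uncertian adult', 'happiness')]
```

`[parse_generated_emotions(p["summary_text"]) for p in predictions]` would apply it to the whole sample; expect occasional misses, since the generated text is not always well formed.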