# Working with Narrative Data

This notebook demonstrates how to work with narrative text data using small, local language models.



### Install required libraries 

On Apple silicon, we use `mlx-lm` to load a small, quantized version of the Llama 3 8B Instruct model, so that it can run on a single laptop.

In [1]:
! pip install mlx-lm torch

Collecting mlx-lm
  Downloading mlx_lm-0.14.3-py3-none-any.whl.metadata (5.6 kB)
Collecting torch
  Downloading torch-2.3.1-cp312-none-macosx_11_0_arm64.whl.metadata (26 kB)
Collecting mlx>=0.14.1 (from mlx-lm)
  Downloading mlx-0.15.1-cp312-cp312-macosx_13_0_arm64.whl.metadata (5.0 kB)
Collecting numpy (from mlx-lm)
  Downloading numpy-2.0.0-cp312-cp312-macosx_11_0_arm64.whl.metadata (114 kB)
Collecting transformers>=4.39.3 (from mlx-lm)
  Downloading transformers-4.41.2-py3-none-any.whl.metadata (43 kB)
Collecting protobuf (from mlx-lm)
  Downloading protobuf-5.27.1-cp38-abi3-macosx_10_9_universal2.whl.metadata (592 bytes)
Collecting filelock (from torch)
  Downloading filelock-3.15.1-py3-none-any.whl.metadata (2.8 kB
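
Because MLX targets Apple silicon, it can be worth checking the platform before installing. The following is a minimal sketch using only the standard library; the helper name `is_apple_silicon` is hypothetical and is not part of `mlx-lm`.

```python
import platform


def is_apple_silicon(system=None, machine=None):
    """Return True when the platform is macOS on an arm64 CPU.

    The arguments default to the values reported for the current machine;
    they can be overridden, which makes the check easy to try out.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


if not is_apple_silicon():
    print("mlx-lm is designed for Apple silicon; this machine may need a different backend.")
```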

### Install Llama 3 - 8b 
Next we download and load the 4-bit quantized version of the Llama 3 8B Instruct model from the `mlx-community` organisation on Hugging Face.



In [2]:
from mlx_lm import load, generate
model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")

Fetching 6 files:   0%|          | 0/6 [00:00<?, ?it/s]

special_tokens_map.json:   0%|          | 0.00/296 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/52.3k [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/50.9k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/2.25k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.08M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/5.27G [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


### Running the model with an example prompt

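To check that the model runs, we send it an example prompt. First we define the system prompt, which tells the model what persona to adopt; then we ask it to introduce itself. Generation stops after `maxTokens` tokens, so a response may be cut off mid-sentence, which is why we append `...` when printing.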

In [18]:
SYSTEM_MSG = "You are a helpful chatbot assistant."

def generateFromPrompt(promptStr, maxTokens=100):
    """Wrap promptStr in the chat template and return the model's reply."""
    messages = [
        {"role": "system", "content": SYSTEM_MSG},
        {"role": "user", "content": promptStr},
    ]
    # Render the messages as a single prompt string, ending with the
    # assistant header so that the model starts its reply.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=maxTokens)

response = generateFromPrompt("Please introduce yourself")

print(response+"...")

Hello there! I'm ChatGenesis, your friendly and helpful chatbot. I'm here to assist you with any questions or topics you'd like to discuss. I'm a large language model, trained on a vast amount of text data, which enables me to understand and respond to your queries in a human-like way.

I'm constantly learning and improving, so please bear with me if I make any mistakes. My goal is to provide you with accurate and helpful information, and to make our conversation as...
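
To make the role of the chat template more concrete, the sketch below builds a Llama 3 Instruct style prompt string by hand. It approximates the published Llama 3 prompt format; the helper name `format_llama3_prompt` is hypothetical, and the template shipped with the tokenizer remains the authoritative version.

```python
def format_llama3_prompt(messages, add_generation_prompt=True):
    """Approximate the Llama 3 Instruct chat format for a list of messages.

    Each message is a dict with "role" and "content" keys, as in the
    notebook's own `messages` list.
    """
    parts = ["<|begin_of_text|>"]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header signals that the model should reply next.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


example = format_llama3_prompt([
    {"role": "system", "content": "You are a helpful chatbot assistant."},
    {"role": "user", "content": "Please introduce yourself"},
])
print(example)
```

Printing the string shows how the system and user messages are delimited by special tokens before the model sees them.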


Now we illustrate performance on a slightly more medically themed topic: 

In [19]:
response = generateFromPrompt("Please tell me some symptoms of allergies",maxTokens=200)

print(response+"...")

I'd be happy to help! Allergies can cause a wide range of symptoms, but here are some common ones:

1. **Congestion and stuffiness**: Nasal passages may become inflamed, leading to a runny nose, stuffy nose, and difficulty breathing.
2. **Itchy, watery eyes**: Allergens can irritate the eyes, causing itchiness, redness, and excessive tearing.
3. **Sneezing and coughing**: Irritation in the nasal passages and throat can lead to sneezing and coughing.
4. **Skin rashes**: Allergic reactions can cause skin rashes, hives, or eczema-like symptoms.
5. **Hives**: Red, itchy, and swollen patches on the skin, often accompanied by itching.
6. **Anaphylaxis**: A severe, life-threatening allergic reaction that can cause difficulty breathing, rapid heartbeat, and a drop in blood pressure. This requires immediate medical...


### Using the model to extract themes from narrative texts

Here we demonstrate how the model can be used to extract themes from a narrative interview text. 

This brief interview extract comes from the article Price, T., McColl, E., & Visram, S. (2022). Barriers and facilitators of childhood flu vaccination: The views of parents in North East England. Zeitschrift Fur Gesundheitswissenschaften = Journal of Public Health, 30(11), 2619–2626. https://doi.org/10.1007/s10389-022-01695-2

```
So can you tell me a kind of your first impressions, your first thoughts around flu? That can be anything from, like, is it a serious illness, what kind of symptoms do you expect to see, kind of anything first thoughts around flu.

Participant 12: Yeah, so I know a little bit from my yeah, kind of studies and from work, etc. I think it is quite a serious issue. I think, you know, moving forward as well with the latest pandemic. It could be worse, given the strains on the healthcare system and things like that.But yeah, I think I'm certainly aware of the the seriousness of it for certain at risk groups in particular. So anyone who's already vulnerable or has a long term condition and who are elderly, orvery young, obviously are at higher risk and yeah, you know, the whole aim of the kind of screening program has been to protect those at risk groups up, you know, first and foremost. Also in terms of developmening, herd immunity across the whole community as well. So I've always been for the campaign obviously I understand the evidence base behind it and I always had a flu jab myself. You know, we get provided with that as healthcare workers and as public health workers through our employer. But equally, I think, you know, we see the benefits to, to our children as well in terms of protecting them. But also protecting all the vulnerable people that they come in contact with as well, obviously, when they're at school, when they see elderly grandparents and things like that as well. So, um, as far as signs and symptoms. I think I possibly only had flu once in my life and it may have just been a really bad bug. But I was literally bed bound for it when I was at university for a few, good few days. And so it may have been may have been flu. I think from a symptoms point of view it was just the usual kind of fever, nausea, muscle aches, just high temperature and things like that, really, yeah. Yeah, so they were probably my, my kind of bad symptoms, I would say.

```



In [20]:
interviewExtract = """ 
So can you tell me a kind of your first impressions, your first thoughts around flu? That can be anything from, like, is it a serious illness, what kind of symptoms do you expect to see, kind of anything first thoughts around flu.

Participant 12: Yeah, so I know a little bit from my yeah, kind of studies and from work, etc. I think it is quite a serious issue. I think, you know, moving forward as well with the latest pandemic. It could be worse, given the strains on the healthcare system and things like that.But yeah, I think I'm certainly aware of the the seriousness of it for certain at risk groups in particular. So anyone who's already vulnerable or has a long term condition and who are elderly, orvery young, obviously are at higher risk and yeah, you know, the whole aim of the kind of screening program has been to protect those at risk groups up, you know, first and foremost. Also in terms of developmening, herd immunity across the whole community as well. So I've always been for the campaign obviously I understand the evidence base behind it and I always had a flu jab myself. You know, we get provided with that as healthcare workers and as public health workers through our employer. But equally, I think, you know, we see the benefits to, to our children as well in terms of protecting them. But also protecting all the vulnerable people that they come in contact with as well, obviously, when they're at school, when they see elderly grandparents and things like that as well. So, um, as far as signs and symptoms. I think I possibly only had flu once in my life and it may have just been a really bad bug. But I was literally bed bound for it when I was at university for a few, good few days. And so it may have been may have been flu. I think from a symptoms point of view it was just the usual kind of fever, nausea, muscle aches, just high temperature and things like that, really, yeah. Yeah, so they were probably my, my kind of bad symptoms, I would say.
"""

response = generateFromPrompt("Please identify some themes in the following interview transcript: ' "+interviewExtract+"'",maxTokens=200)

print(response+"...")

Based on this interview transcript, some themes that can be identified are:

1. **Perception of flu as a serious issue**: The participant views flu as a serious issue, especially for certain at-risk groups such as the elderly, young, and those with long-term conditions.
2. **Importance of vaccination**: The participant has always been in favor of the flu vaccination campaign and has received the jab themselves as a healthcare worker. They also mention the benefits of vaccination in protecting vulnerable people, including children and those they come into contact with.
3. **Personal experience with flu-like symptoms**: The participant has had a limited experience with flu-like symptoms, which they describe as fever, nausea, muscle aches, and high temperature.
4. **Awareness of healthcare system strains**: The participant is aware of the strains on the healthcare system, which they believe could make the flu situation worse.
5. **Importance of screening programs**: The participant mentio
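
For downstream analysis it can help to turn the free-text answer into structured data. The sketch below assumes the response follows the numbered, bold-titled layout shown above (`1. **Title**: summary`), which the model is not guaranteed to produce; the names `THEME_PATTERN` and `parse_themes` are hypothetical.

```python
import re

# Matches lines such as "1. **Some theme**: A short summary."
THEME_PATTERN = re.compile(r"^\s*\d+\.\s+\*\*(.+?)\*\*:?\s*(.*)$")


def parse_themes(response_text):
    """Return a list of {"theme", "summary"} dicts found in the response."""
    themes = []
    for line in response_text.splitlines():
        match = THEME_PATTERN.match(line)
        if match:
            themes.append({
                "theme": match.group(1).strip(),
                "summary": match.group(2).strip(),
            })
    return themes
```

Calling `parse_themes(response)` on the output above would yield one entry per numbered theme. A response cut off by the token limit leaves the last summary incomplete, so the final entry deserves a manual check.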

### Exploring structural aspects of the interview text

We can also use the language model to assess structural aspects of the interviewee's speech, such as how confident the speaker sounds.

In [21]:
response = generateFromPrompt("Please describe how confident the interviewee is in the following interview, with motivation: ' "+interviewExtract+"'",maxTokens=200)

print(response+"...")

Based on the interviewee's responses, I would rate their confidence as moderate to high. Here's why:

* They provide clear and concise answers to the questions, demonstrating a good understanding of the topic.
* They use technical terms such as "strains on the healthcare system" and "herd immunity", which suggests they have a good grasp of the subject matter.
* They share their personal experience of having had flu-like symptoms, which adds a level of authenticity to their responses.
* They also mention that they have received the flu jab as a healthcare worker, which shows that they are familiar with the measures taken to prevent the spread of flu.

However, there are a few moments where their confidence wavers:

* They start their response by saying "Yeah, so I know a little bit from my yeah, kind of studies and from work, etc.", which suggests that they may not be entirely confident in their knowledge.
* They use phrases such as "I think" and "I believe"...
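
Since the model cites specific phrases as evidence, a plain string count is one way to check which of them actually occur in the transcript. This is a minimal sketch; the phrase list `HEDGES` is an illustrative assumption rather than a validated lexicon of hedging language, and `count_hedges` is a hypothetical helper.

```python
import re

# Illustrative hedging phrases; adjust the list for your own material.
HEDGES = ["I think", "I believe", "you know", "kind of", "possibly", "may have"]


def count_hedges(text, phrases=HEDGES):
    """Count case-insensitive, whole-word occurrences of each phrase."""
    counts = {}
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts
```

Running `count_hedges(interviewExtract)` gives a count per phrase; a phrase with a count of zero that the model nonetheless quotes is a sign that its justification should be read critically.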


### Exploring emotional aspects of the interview text 

We can explore any emotional dimensions that are present in the narrative text. 

In [22]:
response = generateFromPrompt("Please describe the sentiment and any emotions of the interviewee in the following interview, with motivation: ' "+interviewExtract+"'",maxTokens=200)

print(response+"...")

The sentiment expressed by Participant 12 in this interview is one of caution and concern. They seem to be aware of the seriousness of the flu, particularly for certain at-risk groups, and are motivated to take steps to protect themselves and others. They mention the importance of developing herd immunity and protecting vulnerable individuals, such as the elderly and young children.

The emotions expressed by Participant 12 are likely to be a mix of concern, responsibility, and empathy. They are concerned about the potential impact of the flu on vulnerable individuals and are motivated to take action to protect them. They also seem to be responsible and aware of their role in helping to prevent the spread of the flu, as a healthcare worker and public health worker. Additionally, they may be empathetic towards those who are affected by the flu, particularly the elderly and young children who are more vulnerable to its effects.

The tone of the interview is calm and matter-of-fact, with 

### Extracting information according to a predefined schema

The study from which this transcript is taken used the COM-B model of behaviour change to frame its investigation of parents' views on childhood flu vaccination. We can similarly use the language model to extract examples of the COM-B components (capability, opportunity, and motivation) from the text.

In [24]:
response = generateFromPrompt("Please extract examples of COM-B (capability, opportunity, and motivation) elements in the following interview: ' "+interviewExtract+"'",maxTokens=400)

print(response+"...")

Here are the examples of COM-B (capability, opportunity, and motivation) elements extracted from the interview:

**Capability:**

* "I think I know a little bit from my studies and from work, etc." (indicates that the participant has the capability to understand the seriousness of flu)
* "I always had a flu jab myself" (indicates that the participant has the capability to take care of their own health)
* "We get provided with that as healthcare workers and as public health workers through our employer" (indicates that the participant has the capability to access flu vaccination as a healthcare worker)

**Opportunity:**

* "I think I possibly only had flu once in my life and it may have just been a really bad bug" (indicates that the participant has had limited opportunities to experience flu firsthand)
* "I was literally bed bound for it when I was at university for a few, good few days" (indicates that the participant has had limited opportunities to experience severe symptoms of flu)

## Exercise: Do it yourself! 

1. Choose an extract from an openly available source of interview text, narratives, or transcripts. Some examples that you might find useful include:
    - Medical Transcription examples: https://www.kaggle.com/datasets/tboyle10/medicaltranscriptions/data

2. Use the LM to extract information from the text.
    - Themes
    - Structural aspects
    - Emotional aspects
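
As a starting point for the exercise, the three analyses can be run in one loop. This is a sketch under stated assumptions: the instructions in `TASKS` are adapted from the prompts used earlier in this notebook, `analyse_text` is a hypothetical helper, and `generate_fn` is any function that takes a prompt string and a token limit, such as `generateFromPrompt`.

```python
# Instruction wording adapted from the earlier cells; edit freely.
TASKS = {
    "themes": "Please identify some themes in the following text:",
    "structure": "Please describe how confident the speaker is in the following text, with motivation:",
    "emotion": "Please describe the sentiment and any emotions of the speaker in the following text, with motivation:",
}


def analyse_text(text, generate_fn, tasks=TASKS, max_tokens=200):
    """Run each instruction against the text and collect the responses."""
    results = {}
    for name, instruction in tasks.items():
        prompt = f"{instruction} '{text.strip()}'"
        results[name] = generate_fn(prompt, max_tokens)
    return results
```

For example, `analyse_text(myExtract, generateFromPrompt)` would return a dictionary with one response per task, where `myExtract` stands for the text you chose in step 1.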