# Working with Narrative Data

This notebook shows how to work with narrative text data using small language models that run locally.

## Running this notebook on a newer MacBook with an Apple Silicon chip

You will need an environment with Python and Jupyter installed. To create one with Anaconda for Python 3.12, run:

```
conda create --name llm-narrative python=3.12
conda activate llm-narrative
conda install jupyter
jupyter notebook
```

## Running this notebook on older MacBooks or any other machine

Please run this notebook on [Google Colab](https://colab.research.google.com/). After opening the notebook there, change the runtime settings to use a GPU; see [these instructions](https://www.geeksforgeeks.org/how-to-use-gpu-in-google-colab/) for how to do that.



### Install required libraries

For the newer MacBooks with Apple chips we will use `mlx-lm` to load a small, quantized version of the Llama 3 8B Instruct model, so that it can run on a single laptop (https://ollama.com/library/llama3). For older MacBooks and other machines we will use a quantized version of the model provided by the Hugging Face community (https://huggingface.co/astronomer/Llama-3-8B-Instruct-GPTQ-4-Bit).

Depending on the machine, different packages are required and will be installed below.

In [1]:
import platform

# for the newer MacBooks with the Apple Chip
# changed for testing but change back later
if platform.processor() == 'arm':
    ! pip install mlx-lm torch transformers
# for all other machines
else:
    ! pip install torch transformers optimum accelerate auto-gptq bitsandbytes
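The check above relies on `platform.processor()`, whose return value varies across operating systems. As a minimal sketch, and not part of the original notebook, a more explicit check could combine `platform.system()` and `platform.machine()`. The helper name `is_apple_silicon` is hypothetical, and the values `"Darwin"` and `"arm64"` are what a native Python on macOS with an Apple chip is assumed to report.

```python
import platform


def is_apple_silicon(system=None, machine=None):
    """Return True when running natively on macOS with an Apple chip.

    The optional arguments exist only so the logic can be checked
    without access to the actual hardware.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    # Assumption: macOS reports "Darwin" and Apple chips report "arm64"
    return system == "Darwin" and machine == "arm64"


print("Apple Silicon detected:", is_apple_silicon())
```

Note that a Python interpreter running under Rosetta would be expected to report an Intel architecture, in which case this check returns `False`.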



### Load Llama 3 8B
Next we download and load the quantized version of the Llama 3 8B Instruct language model.



In [2]:
if platform.processor() == 'arm':
    from mlx_lm import load, generate
    model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")
else:
    from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
    import torch

    MODEL_ID = "astronomer/Llama-3-8B-Instruct-GPTQ-4-Bit"
    # Reuse MODEL_ID so the tokenizer and model always come from the same repository
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    config = AutoConfig.from_pretrained(MODEL_ID)
    config.quantization_config["disable_exllama"] = False
    config.quantization_config["exllama_config"] = {"version":2}

    model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID, 
            device_map='auto', 
            torch_dtype=torch.bfloat16, 
            trust_remote_code=True, 
            # low_cpu_mem_usage=True,
            # load_in_4bit=True,
            config=config,
        )

Fetching 6 files:   0%|          | 0/6 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
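To see why a 4-bit quantized model fits on a laptop, a rough back-of-the-envelope estimate of the memory taken by the weights can help. The sketch below is illustrative only: the function name `weight_memory_gib` is hypothetical, the parameter count is rounded to 8 billion, and the estimate ignores quantization metadata, activations and the key-value cache.

```python
def weight_memory_gib(n_params, bits_per_weight):
    """Approximate memory needed for the weights alone, in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


N_PARAMS = 8e9  # assumption: roughly 8 billion parameters

for bits in (16, 4):
    print(f"{bits}-bit weights: about {weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

Under these assumptions the 16-bit weights need about 14.9 GiB, while the 4-bit weights need about 3.7 GiB.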


### Running the model with an example prompt

We show that the model can run with an example prompt. First we define the system prompt, which tells the model what persona to adopt. Then we instruct it to introduce itself. Because the two setups load the model through different libraries, we use slightly different functions to generate output.

In [35]:
from transformers import pipeline
from IPython.display import display

SYSTEM_MSG = "You are a helpful chatbot assistant."

def generateFromPrompt(promptStr, maxTokens=100):
    # Both branches send the same system and user messages
    messages = [
        {"role": "system", "content": SYSTEM_MSG},
        {"role": "user", "content": promptStr},
    ]
    if platform.processor() == 'arm':
        # mlx-lm takes a plain string, so render the chat template to text first
        input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
        prompt = tokenizer.decode(input_ids)
        response = generate(model, tokenizer, prompt=prompt, max_tokens=maxTokens)
    else:
        pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=maxTokens)
        result = pipe(messages)
        # The pipeline returns the whole conversation; the last message is the model's reply
        response = result[0]['generated_text'][-1]['content']
    return response


response = generateFromPrompt("Please introduce yourself")

print(response+"...")

"Hello! My name is Ada, and I'm a helpful chatbot assistant. I'm here to assist you with any questions or tasks you may have. I'm a large language model, trained on a vast amount of text data, which enables me to understand and respond to natural language inputs.\n\nI'm designed to be friendly, approachable, and knowledgeable. I can help you with a wide range of topics, from general knowledge and entertainment to more specific areas like science, technology, and health...."

Next we try a prompt on a medical topic:

In [37]:
response = generateFromPrompt("Please tell me some symptoms of allergies",maxTokens=200)

print(response+"...")

I'd be happy to help! Allergies can cause a wide range of symptoms, but here are some common ones:

1. **Congestion and stuffiness**: Nasal passages may become inflamed, leading to a runny nose, stuffy nose, and difficulty breathing.
2. **Itchy, watery eyes**: Allergens can irritate the eyes, causing itchiness, redness, and excessive tearing.
3. **Sneezing and coughing**: Irritation in the nasal passages and throat can lead to sneezing and coughing.
4. **Skin rashes and hives**: Allergic reactions can cause skin irritation, leading to redness, itching, and hives (wheals and red bumps).
5. **Stomach cramps and diarrhea**: In some cases, food allergies can cause stomach cramps, diarrhea, and vomiting.
6. **Respiratory issues**: Allergies can exacerbate conditions like asthma, leading to...
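In `generateFromPrompt`, the call to `tokenizer.apply_chat_template` turns the list of role-tagged messages into the single string the model was trained on. The sketch below approximates that rendering by hand for the Llama 3 Instruct format, purely to make the structure visible. The function name `render_llama3_prompt` is hypothetical, and the tokenizer's own template remains the authoritative version.

```python
def render_llama3_prompt(messages):
    """Approximate the Llama 3 Instruct chat format for a list of messages."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header followed by the content and an end-of-turn token
        parts.append(f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n")
        parts.append(f"{message['content'].strip()}<|eot_id|>")
    # An open assistant header asks the model to write the next turn
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


example_messages = [
    {"role": "system", "content": "You are a helpful chatbot assistant."},
    {"role": "user", "content": "Please introduce yourself"},
]
example_prompt = render_llama3_prompt(example_messages)
print(example_prompt)
```

Printing the rendered string makes it easier to see where the system prompt ends and the user instruction begins.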


### Using the model to extract themes from narrative texts

Here we demonstrate how the model can be used to extract themes from a narrative interview text.

The brief interview extract below comes from Price, T., McColl, E., & Visram, S. (2022). Barriers and facilitators of childhood flu vaccination: The views of parents in North East England. Zeitschrift für Gesundheitswissenschaften = Journal of Public Health, 30(11), 2619–2626. https://doi.org/10.1007/s10389-022-01695-2

```
So can you tell me a kind of your first impressions, your first thoughts around flu? That can be anything from, like, is it a serious illness, what kind of symptoms do you expect to see, kind of anything first thoughts around flu.

Participant 12: Yeah, so I know a little bit from my yeah, kind of studies and from work, etc. I think it is quite a serious issue. I think, you know, moving forward as well with the latest pandemic. It could be worse, given the strains on the healthcare system and things like that.But yeah, I think I'm certainly aware of the the seriousness of it for certain at risk groups in particular. So anyone who's already vulnerable or has a long term condition and who are elderly, orvery young, obviously are at higher risk and yeah, you know, the whole aim of the kind of screening program has been to protect those at risk groups up, you know, first and foremost. Also in terms of developmening, herd immunity across the whole community as well. So I've always been for the campaign obviously I understand the evidence base behind it and I always had a flu jab myself. You know, we get provided with that as healthcare workers and as public health workers through our employer. But equally, I think, you know, we see the benefits to, to our children as well in terms of protecting them. But also protecting all the vulnerable people that they come in contact with as well, obviously, when they're at school, when they see elderly grandparents and things like that as well. So, um, as far as signs and symptoms. I think I possibly only had flu once in my life and it may have just been a really bad bug. But I was literally bed bound for it when I was at university for a few, good few days. And so it may have been may have been flu. I think from a symptoms point of view it was just the usual kind of fever, nausea, muscle aches, just high temperature and things like that, really, yeah. Yeah, so they were probably my, my kind of bad symptoms, I would say.

```



In [30]:
interviewExtract = """
So can you tell me a kind of your first impressions, your first thoughts around flu? That can be anything from, like, is it a serious illness, what kind of symptoms do you expect to see, kind of anything first thoughts around flu.

Participant 12: Yeah, so I know a little bit from my yeah, kind of studies and from work, etc. I think it is quite a serious issue. I think, you know, moving forward as well with the latest pandemic. It could be worse, given the strains on the healthcare system and things like that.But yeah, I think I'm certainly aware of the the seriousness of it for certain at risk groups in particular. So anyone who's already vulnerable or has a long term condition and who are elderly, orvery young, obviously are at higher risk and yeah, you know, the whole aim of the kind of screening program has been to protect those at risk groups up, you know, first and foremost. Also in terms of developmening, herd immunity across the whole community as well. So I've always been for the campaign obviously I understand the evidence base behind it and I always had a flu jab myself. You know, we get provided with that as healthcare workers and as public health workers through our employer. But equally, I think, you know, we see the benefits to, to our children as well in terms of protecting them. But also protecting all the vulnerable people that they come in contact with as well, obviously, when they're at school, when they see elderly grandparents and things like that as well. So, um, as far as signs and symptoms. I think I possibly only had flu once in my life and it may have just been a really bad bug. But I was literally bed bound for it when I was at university for a few, good few days. And so it may have been may have been flu. I think from a symptoms point of view it was just the usual kind of fever, nausea, muscle aches, just high temperature and things like that, really, yeah. Yeah, so they were probably my, my kind of bad symptoms, I would say.
"""

response = generateFromPrompt("Please identify some themes in the following interview transcript: ' "+interviewExtract+"'",maxTokens=200)

print(response+"...")

Based on this interview transcript, some themes that can be identified are:

1. **Perception of flu as a serious issue**: The participant views flu as a serious issue, especially for certain at-risk groups such as the elderly, young, and those with long-term conditions.
2. **Importance of vaccination**: The participant has always been in favor of the flu vaccination campaign and has received the jab themselves as a healthcare worker. They also mention the benefits of vaccination in protecting vulnerable people, including children and those they come into contact with.
3. **Personal experience with flu-like symptoms**: The participant has had a limited experience with flu-like symptoms, which they describe as fever, nausea, muscle aches, and high temperature.
4. **Awareness of healthcare system strains**: The participant is aware of the strains on the healthcare system, which they believe could make the flu situation worse.
5. **Importance of protecting vulnerable groups**: The particip
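The model returns its themes as free text, which is awkward to compare across many transcripts. As a minimal sketch, assuming the reply keeps the numbered `**Title**: description` layout seen above, a regular expression can turn it into a list of pairs. The function name `parse_themes` and the shortened sample reply are illustrative and not part of the original notebook.

```python
import re

# Matches lines such as "1. **Theme title**: description"
THEME_PATTERN = re.compile(r"^\s*\d+\.\s+\*\*(.+?)\*\*:\s*(.*)$", re.MULTILINE)


def parse_themes(reply):
    """Return (title, description) pairs found in a numbered markdown list."""
    return [(title.strip(), text.strip()) for title, text in THEME_PATTERN.findall(reply)]


# Shortened, hand-written stand-in for a model reply
sample_reply = (
    "Some themes that can be identified are:\n\n"
    "1. **Perception of flu as a serious issue**: The participant views flu as serious.\n"
    "2. **Importance of vaccination**: The participant supports the campaign.\n"
)

for title, text in parse_themes(sample_reply):
    print(f"{title} -> {text}")
```

If the model changes its layout between runs, the pattern will simply return fewer matches, so it is worth checking the raw reply as well.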

### Exploring structural aspects of the interview text

We can also ask the language model about structural aspects of the interviewee's speech, such as how confident the speaker sounds.

In [31]:
response = generateFromPrompt("Please describe how confident the interviewee is in the following interview, with motivation: ' "+interviewExtract+"'",maxTokens=200)

print(response+"...")

Based on the interviewee's responses, I would rate their confidence as moderate to high. Here's why:

* They provide clear and concise answers to the questions, demonstrating a good understanding of the topic.
* They use technical terms such as "strains on the healthcare system" and "herd immunity", which suggests they have a good grasp of the subject matter.
* They also provide personal experiences and anecdotes, such as having had a flu jab and having had a bad case of flu in the past, which adds credibility to their responses.
* They are able to articulate their thoughts and opinions clearly, using phrases such as "I think", "I'm certainly aware", and "I always had a flu jab", which suggests they are confident in their views.
* However, there are a few moments where they seem to hesitate or pause, such as when they say "I think I possibly only had flu once in my life" or "I think from a symptoms point of view it was just...
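The model's assessment mentions hesitant phrases such as "I think" and "possibly". A simple count of such hedging phrases offers a crude, transparent cross-check of that judgement. This is only a sketch: the phrase list `HEDGES` and the function name `count_hedges` are assumptions chosen for illustration, not a validated linguistic measure.

```python
import re
from collections import Counter

# Assumption: a small, hand-picked list of hedging phrases
HEDGES = ("i think", "possibly", "may have", "kind of", "you know")


def count_hedges(text, hedges=HEDGES):
    """Count whole-phrase, case-insensitive occurrences of each hedge."""
    lowered = text.lower()
    counts = Counter()
    for phrase in hedges:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] = len(re.findall(pattern, lowered))
    return counts


sample_sentence = (
    "I think I possibly only had flu once in my life "
    "and it may have just been a really bad bug."
)
hedge_counts = count_hedges(sample_sentence)
print(hedge_counts)
```

Applied to the full `interviewExtract`, the counts could be set alongside the model's description of the speaker's confidence.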


### Exploring emotional aspects of the interview text

We can also ask the model to describe the sentiment and emotions expressed in the narrative text.

In [32]:
response = generateFromPrompt("Please describe the sentiment and any emotions of the interviewee in the following interview, with motivation: ' "+interviewExtract+"'",maxTokens=200)

print(response+"...")

The sentiment expressed by Participant 12 in this interview is one of awareness and concern about the seriousness of flu, particularly for vulnerable groups such as the elderly, young, and those with long-term conditions. They also express a sense of responsibility and motivation to take action to protect themselves and others, as evident in their decision to get the flu jab and their support for the screening program.

The emotions expressed by Participant 12 are likely to be a mix of concern, responsibility, and motivation. They seem to be aware of the potential risks and consequences of flu, particularly for vulnerable groups, and are motivated to take action to protect themselves and others. They also express a sense of responsibility as a healthcare worker and public health worker to take steps to prevent the spread of flu.

The tone of the interview is informative, with Participant 12 providing their thoughts and experiences about flu. They seem to be speaking from a place of kno

### Extracting information according to a predefined schema

The original study from which this transcript is sourced used the COM-B model of behaviour change to frame its research into parents' perspectives on childhood flu vaccination. We can similarly use the language model to extract examples of the COM-B elements.

In [33]:
response = generateFromPrompt("Please extract examples of COM-B (capability, opportunity, and motivation) elements in the following interview: ' "+interviewExtract+"'",maxTokens=400)

print(response+"...")

Here are the examples of COM-B (capability, opportunity, and motivation) elements extracted from the interview:

**Capability:**

* "I think I possibly only had flu once in my life and it may have just been a really bad bug." (indicates that the participant has some knowledge about flu, but not extensive)
* "I think from a symptoms point of view it was just the usual kind of fever, nausea, muscle aches, just high temperature and things like that, really, yeah." (shows that the participant has some understanding of the symptoms of flu)

**Opportunity:**

* "I know a little bit from my yeah, kind of studies and from work, etc." (indicates that the participant has had opportunities to learn about flu through their studies and work)
* "We get provided with that as healthcare workers and as public health workers through our employer." (shows that the participant has opportunities to receive flu jabs as part of their job)

**Motivation:**

* "I think I'm certainly aware of the the seriousnes
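When the schema is known in advance, it can help to ask the model for machine-readable output and to parse it defensively. The sketch below builds a prompt that requests a JSON object with one key per COM-B element and extracts that object from the reply. The names `build_schema_prompt` and `parse_schema_reply` are hypothetical, and the sample reply is hand-written rather than real model output.

```python
import json

COM_B_KEYS = ("capability", "opportunity", "motivation")


def build_schema_prompt(transcript):
    """Ask for a JSON object with one list of quotes per COM-B element."""
    keys = ", ".join(f'"{key}"' for key in COM_B_KEYS)
    return (
        "Extract examples of COM-B elements from the interview below. "
        f"Reply with a single JSON object whose keys are {keys} "
        "and whose values are lists of verbatim quotes.\n\n"
        f"Interview:\n{transcript}"
    )


def parse_schema_reply(reply):
    """Pull the JSON object out of a reply; fall back to empty lists."""
    empty = {key: [] for key in COM_B_KEYS}
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return empty
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return empty
    # Keep only the expected keys, and only values that are lists
    return {
        key: data[key] if isinstance(data.get(key), list) else []
        for key in COM_B_KEYS
    }


# Hand-written stand-in for a model reply, with surrounding chatter
sample_reply = 'Sure! {"capability": ["I know a little bit"], "motivation": ["I always had a flu jab"]} Hope this helps.'
print(parse_schema_reply(sample_reply))
```

The prompt from `build_schema_prompt(interviewExtract)` could be passed to `generateFromPrompt`. Small models do not always return valid JSON, which is why the parser falls back to empty lists instead of raising an error.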

## Exercise: Do it yourself!

1. Choose an extract from an openly available source of interviews, narratives or transcripts. One example that you might find useful:
    - Medical Transcription examples: https://www.kaggle.com/datasets/tboyle10/medicaltranscriptions/data



2. Use the language model to extract information from the text:
    - Themes
    - Structural aspects
    - Emotional aspects
    - Formal, structured information such as disease or diagnosis
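One way to organise the exercise is to keep the instructions for each task in a dictionary and build all prompts for a chosen extract in one go. This is a sketch under stated assumptions: the wording of the instructions, the name `TASK_PROMPTS` and the helper `build_task_prompts` are illustrative, and the sample text is invented.

```python
# Assumption: example instructions, one per task listed in the exercise
TASK_PROMPTS = {
    "themes": "Please identify some themes in the following text:",
    "structure": "Please describe how confident the speaker is in the following text, with justification:",
    "emotions": "Please describe the sentiment and any emotions in the following text, with justification:",
    "diagnosis": "Please list any diseases or diagnoses mentioned in the following text:",
}


def build_task_prompts(text, tasks=TASK_PROMPTS):
    """Return one complete prompt per task for the given text."""
    cleaned = text.strip()
    return {name: f"{instruction} '{cleaned}'" for name, instruction in tasks.items()}


sample_text = "  Patient reports a persistent cough and mild fever.  "
prompts = build_task_prompts(sample_text)
for name, prompt in prompts.items():
    print(f"[{name}] {prompt}")
```

Each prompt can then be passed to `generateFromPrompt`, for example with a larger `maxTokens` value for longer answers.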