In [None]:
import re
import json
from pydantic import BaseModel  # to define and validate structured outputs
from tqdm.auto import tqdm
import pandas as pd

tqdm.pandas()  # enable tqdm progress bars for pandas (adds `progress_apply`)

## Using data

I have a dataset of psychologists' responses to patients' comments, taken from [Kaggle](https://www.kaggle.com/datasets/thedevastator/nlp-mental-health-conversations?resource=download).

I want to use generative AI to classify each psychologist's response as appropriate or not.

In [14]:
mental_health = pd.read_csv("data/mental_health.csv")
mental_health.rename(columns={"Context": "Comment"}, inplace=True)
mental_health.dropna(inplace=True)

# take a sample of 20 responses
mental_health = mental_health.sample(20, random_state=1337)
mental_health

Unnamed: 0,Comment,Response
3296,"My boyfriend is in Ireland for 11 days, and I ...",It sounds like you and your boyfriend are very...
2011,I have so many issues to address. I have a his...,I think this is a very common question that pe...
1683,"After 40 years of being straight, how could I ...",Sexuality is normally formed during adolescenc...
1560,I feel like I took our relationship for grante...,A key factor in a relationship is trust.I'd st...
1267,"I crave attention, companionship, and sex. She...","Hi Hampton,Although I'd bet your wife also wan..."
2895,He said he would try and he never did. It's be...,If your husband is changing his mind about whe...
2697,"I always feel the need to impress people, whet...",It is normal to seek other’s attention and not...
1042,We're in an eight year relationship. My boyfri...,"First, let me extend my compassion to both of ..."
3312,I've gone to a couple therapy sessions so far ...,"Yes, it is completely normal to feel anxious a..."
2788,He is an adolescent. He has peed his pant mult...,"Sounds as though your son is ""pissed off"" abou..."


# Open source with Hugging Face

Instead of paying for API calls, we can run a model on our own computing resources. For this we use the `transformers` package!

In this case we will use a very lightweight model so that we don't overwhelm the computer.

In [23]:
import torch

In [None]:
# check whether GPUs are available and how many there are
gpu_avail = torch.cuda.is_available()
mps_avail = hasattr(torch.backends, "mps") and torch.backends.mps.is_available()

if gpu_avail:
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}:")
        print(f"  Name: {torch.cuda.get_device_name(i)}")
        print(
            f"  Memory: {torch.cuda.get_device_properties(i).total_memory / 1024**3:.2f} GB"
        )
elif mps_avail:
    print("MPS (Metal Performance Shaders) is available for macOS GPU acceleration")
else:
    print(
        "No GPU or MPS detected. Please check that you requested GPU resources or that your computer has a GPU"
    )

MPS (Metal Performance Shaders) is available for macOS GPU acceleration


If you are using the SuperPod HPC, you should see almost 80 GB of VRAM available!

If neither a GPU nor MPS is detected, the model will be loaded on the CPU and inference will be very slow.
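
The check above can also be turned into a single device string. This is a minimal sketch: `pick_device` is a hypothetical helper (not part of `torch` or `transformers`), and the guarded import is only there so the snippet still runs where `torch` is not installed.

```python
# Sketch: pick the best available device, falling back to CPU.
# `pick_device` is a hypothetical helper name, not a torch or transformers API.
try:
    import torch
except ImportError:  # lets the sketch run even where torch is not installed
    torch = None


def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what torch can see."""
    if torch is None:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The resulting string could then be passed as the `device` argument when building a `pipeline`, instead of relying on automatic device selection.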

For this notebook we will use a lightweight model that needs no more than 4 GB of VRAM.

In [25]:
from transformers import pipeline

We now load the model using [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines), which simplifies much of the model configuration.

We will use a small version of [Mistral](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503) from this [Hugging Face repo](https://huggingface.co/alamios/Mistral-Small-3.1-DRAFT-0.5B). Because the model is so small, the quality of its output will not be great.

However, I've been able to run big models like [DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) at full precision (or quantized) on 2 SuperPod GPUs.
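
As a rough sanity check on these memory figures, the weights alone need about (number of parameters) x (bytes per parameter). The sketch below assumes parameter counts read off the model names (0.5B and 70B) and ignores activations and the KV cache, so real usage will be higher. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bytes_per_param / 1024**3


# Assumed parameter counts, taken from the model names (0.5B and 70B).
for name, n_params in [("0.5B model", 0.5e9), ("70B model", 70e9)]:
    for label, nbytes in [("float32", 4), ("float16/bfloat16", 2), ("int8", 1)]:
        print(f"{name} in {label}: {weight_memory_gib(n_params, nbytes):.1f} GiB")
```

Under these assumptions the 0.5B model stays under 2 GiB even in 32-bit floats, while the 70B model needs roughly 130 GiB in 16-bit floats, which is why it has to be split across two ~80 GB GPUs or quantized.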

In [None]:
pipe = pipeline(
    "text-generation",
    model="alamios/Mistral-Small-3.1-DRAFT-0.5B",
    trust_remote_code=True,
    do_sample=True,
    max_new_tokens=1024 * 3,
)

Device set to use mps:0


Once the model is loaded, we can ask it to write an economics poem.

> Note: some models are not trained to use the 'system' message, so we just combine the whole prompt into a single 'user' message.
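
If a prompt already exists in system + user form, one way to apply the note above is to fold the system text into the first user message. This is only a sketch; `merge_system_into_user` is a hypothetical helper, not something provided by `transformers`.

```python
def merge_system_into_user(messages):
    """Fold any 'system' messages into the first 'user' message.

    If there is no user message, the system text is dropped.
    """
    system_text = "\n".join(m["content"] for m in messages if m["role"] == "system")
    merged = []
    folded = False
    for m in messages:
        if m["role"] == "system":
            continue  # system messages are not kept as separate turns
        if m["role"] == "user" and system_text and not folded:
            # prepend the system text to the first user turn only
            m = {"role": "user", "content": f"{system_text}\n\n{m['content']}"}
            folded = True
        merged.append(m)
    return merged


chat = [
    {"role": "system", "content": "You are a concise poet."},
    {"role": "user", "content": "Write a four-line poem about economics."},
]
print(merge_system_into_user(chat))
```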

In [None]:
msg = [
    {
        "role": "user",
        "content": "Write me a poam about the topic of the following text. It should be a short poem, no more than 4 lines. The text is: economics graduate student",
    }
]
out = pipe(msg)
out

[{'generated_text': [{'role': 'user',
    'content': 'Write me a poam about the topic of the following text. It should be a short poem, no more than 4 lines. The text is: economics graduate student'},
   {'role': 'assistant',
    'content': 'In the quiet of the night,\nI read the equations, the calculations,\nThe equations that guide the flow,\nThe calculations that make the world.\n\nThe equations that describe the sky,\nThe calculations that make the sun,\nThe equations that tell the story,\nThe calculations that make the world.\n\nThe equations that describe the sky,\nThe calculations that make the sun,\nThe equations that tell the story,\nThe calculations that make the world.'}]}]

In [None]:
print(out[0]["generated_text"][1]["content"])

In the quiet of the night,
I read the equations, the calculations,
The equations that guide the flow,
The calculations that make the world.

The equations that describe the sky,
The calculations that make the sun,
The equations that tell the story,
The calculations that make the world.

The equations that describe the sky,
The calculations that make the sun,
The equations that tell the story,
The calculations that make the world.


Now let's see how it does with the mental health dataset.

> I will also simplify the message a bit so that we can get something useful from this small model.

In [None]:
def create_msg(row):
    msg = [
        {
            "role": "user",
            "content": f"""Read the folling comversation between two people. The first person a patient and the second one is a mental health professional. Please rate the professional's response on a score 1 if you think it was appropiate and 0 if it was not.
            Patient: {row["Comment"]} 
            Mental Health Professional: {row["Response"]}""",
        }
    ]
    return msg

In [None]:
mental_health["msg_open_source"] = mental_health.progress_apply(create_msg, axis=1)

In [None]:
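def open_source_genai(msg):
    """Run the chat messages through the pipeline and return the model's reply text."""
    out = pipe(msg)
    # the pipeline returns the whole conversation; the last message is the model's reply
    return out[0]["generated_text"][-1]["content"]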

In [None]:
answers = mental_health["msg_open_source"].progress_apply(open_source_genai)

In [35]:
print(answers.iloc[1])

Based on the conversation, here is a rating:

1 (appropiate)

The professional's response is clear and concise, addressing the patient's concerns about their mental health issues and the importance of seeking help. They acknowledge the patient's history of sexual abuse, breast cancer survivor, and lifetime insomniac, and mention that counseling is a common and necessary step in addressing these issues. The professional also encourages the patient to seek help from a licensed counselor who has experience in trauma, grief/loss, and provides specific strategies to manage symptoms. The response is well-structured and provides a clear and empathetic approach to the patient's situation.


In [36]:
print(answers.iloc[8])

Based on the conversation, here are my responses:

1. **Patient:** "I've gone to a couple therapy sessions so far and still everytime I walk in I get nervous and shaky. Is this normal? Should I still be feeling like this?"
   - **Mental Health Professional:** "Yes, it is completely normal to feel anxious about therapy. Therapy often explores topics and feelings that are uncomfortable. The ultimate goal of therapy is to feel better but the process itself can be uncomfortable. It's important to remember that everyone's experience with therapy is unique, and it's normal to feel a range of emotions during the initial sessions. If you're still feeling anxious, it's okay to acknowledge that and to seek support from friends, family, or a professional if you'd like. Remember, therapy is a collaborative effort, and you don't have to go through it alone."

2. **Patient:** "Should I still be feeling like this?"
   - **Mental Health Professional:** "It's completely normal to feel anxious about the

The answers are not very structured. How can we improve this?

With these kinds of models, we can describe the response format we want in the prompt and then validate the output against a `pydantic` model.

In [37]:
class Response(BaseModel):
    appropiate: bool
    thinking: str

In [None]:
def create_msg(row):
    patient_psyc_chat = f"""
    Patient: {row["Comment"]} 
    Mental Health Professional: {row["Response"]}
    """

    json_out = """
    Please output the following JSON object first, with no extra text before it and avoid using " in any strings:
    {
        "appropiate": bool,
        "thinking": str
    }
    """
    msg = [
        {
            "role": "user",
            "content": """Read the folling comversation between two people. The first person a patient and the second one is a mental health professional. Please rate the professional's response if it is appropiate or not. Please also provide your thinking in one sentece."""
            + patient_psyc_chat
            + json_out,
        }
    ]
    return msg

In [None]:
mental_health["msg_open_source"] = mental_health.progress_apply(create_msg, axis=1)
answers = mental_health["msg_open_source"].progress_apply(open_source_genai)

In [41]:
answers

3296    {\n        "appropiate": false,\n        "thin...
2011    {\n        "appropiate": false,\n        "thin...
1683    {\n        "appropiate": false,\n        "thin...
1560    {\n        "appropiate": false,\n        "thin...
1267    {\n        "appropiate": false,\n        "thin...
2895    Based on the conversation, here is the JSON ob...
2697    {\n        "appropiate": false,\n        "thin...
1042    {\n        "appropiate": false,\n        "thin...
3312    {\n        "appropiate": true,\n        "think...
2788    To determine if the professional's response is...
606     To provide an appropriate response, I'll follo...
1443    Based on the conversation, here is the output ...
2576    {\n        "appropiate": false,\n        "thin...
1078    {\n        "appropiate": false,\n        "thin...
1948    {\n        "appropiate": true,\n        "think...
73      {\n        "appropiate": true,\n        "think...
2687    {\n        "appropiate": false,\n        "thin...
1606    To det

In [42]:
print(answers.iloc[0])

{
        "appropiate": false,
        "thinking": "It sounds like you and your boyfriend are very close.  Do you typically spend most of your time together?  If so, it may be important to reflect on how you feel when you are apart.  If any separation is difficult, you may need to examine why.  Think about what it is that you miss and what you are anxious, upset or worried about.  If you examine the causes of your distress, you likely will experience some relief.  It is important to learn how to be happy when you are alone, it will only improve the way you feel when you are with your boyfriend."
    }


In [43]:
print(answers.iloc[5])

Based on the conversation, here is the JSON object:

```json
{
    "appropiate": false,
    "thinking": "I'm not sure if I should respect his decision to leave, especially if he's constantly changing his mind and leaving me crying and worried every night. I'm worried about the impact this will have on our relationship and my mental health. I've been trying to see a therapist who specializes in couples, but I'm not sure if I should trust his judgment. He's constantly changing his mind about whether or not to stay together, and I'm worried about the impact this will have on our relationship and my mental health."
}
```

### Reasoning:
1. **Patient's Statement**: The patient mentions that their husband has been changing his mind about whether or not to stay in the relationship, which is a significant issue. They also mention that they are worried about the impact this will have on their relationship and mental health.
2. **Mental Health Professional's Response**: The professional acknowle

To validate the responses, we need to extract the JSON from the output and then check that it has the expected structure.

There can therefore be 2 problems:
1. The model does not provide a JSON structure
2. The model provides JSON, but not in the format we want
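
To see which of the two problems occurred for a given reply, the cases can be told apart explicitly. The sketch below uses only the standard library instead of `pydantic`. `find_first_json_object` and `diagnose` are hypothetical helpers, and the `schema` dictionary is an assumption that mirrors the fields of the `Response` class.

```python
import json


def find_first_json_object(text):
    """Return the first parseable JSON object (dict) in `text`, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # this brace did not start valid JSON, try the next one
        if isinstance(obj, dict):
            return obj
    return None


def diagnose(text, required):
    """Classify a reply as 'ok', 'no_json', or 'wrong_format'."""
    obj = find_first_json_object(text)
    if obj is None:
        return "no_json"  # problem 1: no JSON structure at all
    for key, expected_type in required.items():
        if not isinstance(obj.get(key), expected_type):
            return "wrong_format"  # problem 2: missing key or wrong type
    return "ok"


schema = {"appropiate": bool, "thinking": str}  # same spelling as the prompt
print(diagnose('Here it is: {"appropiate": true, "thinking": "Empathetic."}', schema))
print(diagnose("To determine if the response is appropriate, I will...", schema))
print(diagnose('{"appropiate": "maybe"}', schema))
```

Counting how often each case occurs can help decide whether to adjust the prompt (many `no_json` replies) or the schema instructions (many `wrong_format` replies).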

In [None]:
def validate_output(answ, pydantic_model):
    """
    Created by Copilot
    Validate the output of the model against a Pydantic model.
    """
    # Try to extract JSON from the response
    json_match = re.search(r"```json\s*(.*?)\s*```", answ, re.DOTALL)
    if json_match:
        json_str = json_match.group(1)
    else:
        # If no JSON code block, try to find a JSON-like structure
        json_match = re.search(r"\{.*\}", answ, re.DOTALL)
        if json_match:
            json_str = json_match.group(0)
        else:
            return None

    # Try to parse the JSON
    try:
        json_obj = json.loads(json_str)
        # Validate against the Pydantic model
        validated = pydantic_model(**json_obj)
        return validated
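    except (ValueError, TypeError) as e:
        # ValueError covers json.JSONDecodeError and pydantic's ValidationError;
        # TypeError covers JSON that is valid but not an object (e.g. a list)
        print(f"Error parsing or validating JSON: {e}")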
        return None


In [45]:
val_answers = answers.apply(lambda x: validate_output(x, Response))
val_answers

3296    appropiate=False thinking='It sounds like you ...
2011    appropiate=False thinking='I think this is a v...
1683    appropiate=False thinking='I would explore whe...
1560    appropiate=False thinking="It seems like you'r...
1267    appropiate=False thinking="I don't know what t...
2895    appropiate=False thinking="I'm not sure if I s...
2697    appropiate=False thinking='Es común que busque...
1042    appropiate=False thinking="I don't know how to...
3312    appropiate=True thinking='It is completely nor...
2788    appropiate=False thinking="The professional's ...
606     appropiate=False thinking="I am feeling overwh...
1443    appropiate=False thinking='I am always arguing...
2576    appropiate=False thinking="I don't have the in...
1078    appropiate=False thinking="I've pretty much be...
1948    appropiate=True thinking="Yes, it's totally no...
73      appropiate=True thinking='Working with a great...
2687    appropiate=False thinking="It's normal for som...
1606    approp

In [46]:
val_answers.iloc[0].appropiate

False

In [47]:
val_answers.iloc[0].thinking

'It sounds like you and your boyfriend are very close.  Do you typically spend most of your time together?  If so, it may be important to reflect on how you feel when you are apart.  If any separation is difficult, you may need to examine why.  Think about what it is that you miss and what you are anxious, upset or worried about.  If you examine the causes of your distress, you likely will experience some relief.  It is important to learn how to be happy when you are alone, it will only improve the way you feel when you are with your boyfriend.'

Now you have structured output.

> If an answer fails validation and returns `None`, you can simply ask the model again until you get the correct format.
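
A retry loop along these lines could look like the following sketch. `ask_until_valid` is a hypothetical helper, and the cap of 3 attempts is an arbitrary assumption to avoid looping forever. The fake generator and validator are stand-ins so the snippet runs without a model.

```python
def ask_until_valid(generate, validate, msg, max_attempts=3):
    """Call `generate(msg)` until `validate` accepts the reply or attempts run out.

    Returns (validated, attempts_used); validated is None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        validated = validate(generate(msg))
        if validated is not None:
            return validated, attempt
    return None, max_attempts


# Stand-ins so the sketch runs without a model: the fake generator
# returns free text first and a parseable reply on the second call.
replies = iter(["Based on the conversation...", '{"appropiate": true}'])
fake_generate = lambda msg: next(replies)
fake_validate = lambda text: text if text.startswith("{") else None

print(ask_until_valid(fake_generate, fake_validate, msg=[]))
```

In this notebook, `generate` would be `open_source_genai` and `validate` would be something like `lambda text: validate_output(text, Response)`. Retrying only helps when sampling is enabled, as it is here with `do_sample=True`.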

These models are intrinsically stochastic (sometimes even when the temperature is set to 0 or `do_sample` is `False`). Although reproducible outputs are desirable, some of these classification tasks benefit from a bit of creativity in the model. I suggest that you save all the responses and ask the model multiple times.

In my research I tend to ask the model for the classification 5 times, along with a confidence level for each answer. Then I keep only the answers that the model agrees on with high confidence.
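
One possible way to implement this kind of filtering is sketched below. `aggregate_runs` is a hypothetical helper, and both thresholds (80% agreement, 0.7 mean confidence) are illustrative assumptions, not the values used in the research mentioned above.

```python
from collections import Counter


def aggregate_runs(runs, min_agreement=0.8, min_confidence=0.7):
    """Return the majority label if the runs agree confidently, else None.

    `runs` is a list of (label, confidence) pairs, one per model call.
    """
    if not runs:
        return None
    majority, votes = Counter(label for label, _ in runs).most_common(1)[0]
    agreement = votes / len(runs)
    # mean confidence over the runs that voted for the majority label
    mean_conf = sum(conf for label, conf in runs if label == majority) / votes
    if agreement >= min_agreement and mean_conf >= min_confidence:
        return majority
    return None


confident = [(True, 0.9), (True, 0.8), (True, 0.95), (True, 0.85), (False, 0.6)]
split = [(True, 0.9), (False, 0.8), (True, 0.7), (False, 0.9), (True, 0.6)]
print(aggregate_runs(confident))  # 4 of 5 runs agree, so the label is kept
print(aggregate_runs(split))      # only 3 of 5 agree, so no label is kept
```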

Another option is to use different models and create an ensemble of responses.
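
A very simple ensemble is a majority vote across models. In the sketch below, `ensemble_vote` is a hypothetical helper and the model names are placeholders. A `None` entry stands for a model whose output failed validation.

```python
def ensemble_vote(labels_by_model):
    """Majority vote over models; None entries (failed validation) are ignored."""
    votes = [label for label in labels_by_model.values() if label is not None]
    yes = sum(votes)
    no = len(votes) - yes
    if yes == no:  # a tie, or no valid votes at all
        return None
    return yes > no


# Placeholder model names and labels, for illustration only.
print(ensemble_vote({"model_a": True, "model_b": True, "model_c": False}))
print(ensemble_vote({"model_a": True, "model_b": False, "model_c": None}))
```

Returning `None` on a tie keeps ambiguous cases out of the final labels, so they can be reviewed by hand or sent back to the models.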
