In [63]:
from tqdm.auto import tqdm
import os
import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'; disables the SettingWithCopyWarning ("copying a slice of a DataFrame")
tqdm.pandas() # activate progress_apply

import numpy as np
from copy import copy

# Data

Let's keep using the dataset of psychologists' responses to patients' comments.

In [64]:
mental_health = pd.read_csv("data/mental_health.csv")
mental_health.rename(columns={"Context": "patient_comment", "Response": "psyc_response"}, inplace=True)
mental_health.dropna(inplace=True)

# take a sample of 20 responses
mental_health = mental_health.sample(20, random_state=1337)
mental_health

Unnamed: 0,patient_comment,psyc_response
3296,"My boyfriend is in Ireland for 11 days, and I ...",It sounds like you and your boyfriend are very...
2011,I have so many issues to address. I have a his...,I think this is a very common question that pe...
1683,"After 40 years of being straight, how could I ...",Sexuality is normally formed during adolescenc...
1560,I feel like I took our relationship for grante...,A key factor in a relationship is trust.I'd st...
1267,"I crave attention, companionship, and sex. She...","Hi Hampton,Although I'd bet your wife also wan..."
2895,He said he would try and he never did. It's be...,If your husband is changing his mind about whe...
2697,"I always feel the need to impress people, whet...",It is normal to seek other’s attention and not...
1042,We're in an eight year relationship. My boyfri...,"First, let me extend my compassion to both of ..."
3312,I've gone to a couple therapy sessions so far ...,"Yes, it is completely normal to feel anxious a..."
2788,He is an adolescent. He has peed his pant mult...,"Sounds as though your son is ""pissed off"" abou..."


# Sentiment Analysis

A common classification task in NLP is to decide whether a piece of text has a positive or negative tone. Think of Amazon reviews and how a computer can detect whether a review is favorable to a product or not.

In our particular case, we want to know how the psychologists react to their patients' comments.

We can now test two things:
1. Are patients' comments always negative?
2. Do psychologists always respond in a positive way?

We will use some Hugging Face models for this task. A list of text-classification models can be found [here](https://huggingface.co/models?pipeline_tag=text-classification&sort=trending); search it for "sentiment" or "emotion".

## [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english)

We will first use the default model of Hugging Face's sentiment analysis pipeline, `DistilBERT`.

In [65]:
from transformers import pipeline

In [66]:
# Load the model
sentiment_pipeline = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.
Device set to use 0


In [67]:
patients_sentiment = mental_health['patient_comment'].progress_apply(lambda x: sentiment_pipeline(x))

The model returns a classification label and the probability of that classification

In [68]:
mental_health['patient_sentiment'] = [x[0]['label'] for x in patients_sentiment]
mental_health['patient_sentiment_prob'] = [x[0]['score'] for x in patients_sentiment]

In [69]:
mental_health[['patient_comment','patient_sentiment','patient_sentiment_prob']].head()

Unnamed: 0,patient_comment,patient_sentiment,patient_sentiment_prob
3296,"My boyfriend is in Ireland for 11 days, and I ...",NEGATIVE,0.999582
2011,I have so many issues to address. I have a his...,POSITIVE,0.92529
1683,"After 40 years of being straight, how could I ...",NEGATIVE,0.986875
1560,I feel like I took our relationship for grante...,NEGATIVE,0.998597
1267,"I crave attention, companionship, and sex. She...",POSITIVE,0.921631


In [70]:
mental_health['patient_sentiment'].value_counts()

patient_sentiment
NEGATIVE    16
POSITIVE     4
Name: count, dtype: int64

In [71]:
# Group by sentiment and describe the probabilities
mental_health.groupby('patient_sentiment')['patient_sentiment_prob'].describe()

patient_sentiment,count,mean,std,min,25%,50%,75%,max
NEGATIVE,16.0,0.971507,0.091869,0.628363,0.993244,0.998328,0.998713,0.999768
POSITIVE,4.0,0.928099,0.047171,0.875369,0.910065,0.92346,0.941494,0.990108


Most of the patient comments (16 of 20) are classified as negative. One negative comment has a much lower probability of about 63%, so it could plausibly be either positive or negative.
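Since the score is the probability of the predicted label and this model has only two labels, a borderline prediction is easier to read as a probability of being positive. The sketch below is one way to do that; the helper names and the 0.25 margin are illustrative assumptions, not part of the pipeline.

```python
def positive_probability(prediction):
    """Convert one pipeline result into P(POSITIVE).

    `prediction` is a dict like {'label': 'NEGATIVE', 'score': 0.63}, where
    `score` is the probability of the predicted label (two-label model assumed).
    """
    score = prediction["score"]
    return score if prediction["label"] == "POSITIVE" else 1.0 - score


def is_uncertain(prediction, margin=0.25):
    """Flag predictions whose P(POSITIVE) lies within `margin` of 0.5.

    The 0.25 margin is an arbitrary illustrative threshold, not a tuned value.
    """
    return abs(positive_probability(prediction) - 0.5) < margin


# Scores rounded from the max and min of the NEGATIVE group in the summary above
confident = {"label": "NEGATIVE", "score": 0.9998}
borderline = {"label": "NEGATIVE", "score": 0.6284}

print(round(positive_probability(borderline), 4))
print(is_uncertain(confident), is_uncertain(borderline))
```

On the DataFrame, the same idea could be applied to each element of `patients_sentiment` with `positive_probability(x[0])`.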

Let's check against the texts. Run each cell below to see a randomly chosen positive or negative comment.

In [79]:
print("Positive comment:")
pos_c = mental_health.loc[mental_health['patient_sentiment'] == 'POSITIVE', ['patient_comment']].sample(1).values[0][0]
print(pos_c)

Positive comment:
I have so many issues to address. I have a history of sexual abuse, I’m a breast cancer survivor and I am a lifetime insomniac.    I have a long history of depression and I’m beginning to have anxiety. I have low self esteem but I’ve been happily married for almost 35 years.
   I’ve never had counseling about any of this. Do I have too many issues to address in counseling?


In [80]:

print("Negative comment:")
neg_c = mental_health.loc[mental_health['patient_sentiment'] == 'NEGATIVE', ['patient_comment']].sample(1).values[0][0]
print(neg_c)

Negative comment:
My boyfriend is in Ireland for 11 days, and I am an emotional wreck.


The positive comments do not read as very positive, so the model might not be capturing what we want. Maybe other models perform better?

Now let's look at the psychologists' responses.

In [73]:
# cap the length of each response at a maximum number of words (300 is used below)
def truncate_text(text, max_words=500):
    """Return `text` cut to at most `max_words` whitespace-separated words."""
    words = text.split()
    if len(words) > max_words:
        return ' '.join(words[:max_words])
    return text

mental_health['psyc_response_short'] = mental_health['psyc_response'].progress_apply(lambda x: truncate_text(x, max_words=300))

> Running the model on the full-length responses results in a tensor error, probably because the longest responses exceed the model's maximum input length.
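Cutting by words is only a rough proxy, because the model counts tokens rather than words. An alternative is to let the tokenizer do the truncation. The sketch below assumes the installed `transformers` version forwards `truncation` and `max_length` from the pipeline call to the tokenizer, and that 512 tokens is the limit for this model. `classify_truncated` and `fake_pipeline` are hypothetical names; the fake pipeline only stands in for `sentiment_pipeline` so the example runs without downloading a model.

```python
def classify_truncated(pipe, text, max_length=512):
    """Classify `text`, asking the tokenizer to cut it to `max_length` tokens."""
    # Assumption: the pipeline forwards these keyword arguments to its tokenizer
    return pipe(text, truncation=True, max_length=max_length)[0]


def fake_pipeline(text, **tokenizer_kwargs):
    """Stand-in for `sentiment_pipeline`; returns a fixed, made-up prediction."""
    assert tokenizer_kwargs.get("truncation") is True
    return [{"label": "POSITIVE", "score": 0.9}]


# A deliberately long input: 1000 repeated words
result = classify_truncated(fake_pipeline, "word " * 1000)
print(result["label"], result["score"])
```

With the real model, this would be called as `classify_truncated(sentiment_pipeline, text)`.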

In [74]:
psyc_sentiment = mental_health['psyc_response_short'].progress_apply(lambda x: sentiment_pipeline(x))
mental_health['psyc_sentiment'] = [x[0]['label'] for x in psyc_sentiment]
mental_health['psyc_sentiment_prob'] = [x[0]['score'] for x in psyc_sentiment]

In [75]:
mental_health['psyc_sentiment'].value_counts()

psyc_sentiment
POSITIVE    11
NEGATIVE     9
Name: count, dtype: int64

The psychologists' responses are split roughly in half: 11 positive and 9 negative.

In [77]:
# Group by sentiment and describe the probabilities
mental_health.groupby('psyc_sentiment')['psyc_sentiment_prob'].describe()

psyc_sentiment,count,mean,std,min,25%,50%,75%,max
NEGATIVE,9.0,0.890531,0.165725,0.51535,0.853117,0.976166,0.989951,0.999289
POSITIVE,11.0,0.928551,0.073609,0.78026,0.885392,0.968046,0.980113,0.997832


The confidence varies: the minimum probability is around 52% for negative responses and around 78% for positive ones.

In [81]:
print("Positive response:")
pos_c = mental_health.loc[mental_health['psyc_sentiment'] == 'POSITIVE', ['psyc_response_short']].sample(1).values[0][0]
print(pos_c)

Positive response:
You are very wise for a young person. You have already figured out that other people's behaviours...how they treat you...it's not about you...it's about them. I love that you don't blame yourself for your mom's behaviours. It's not your fault. She's getting upset because she doesn't know how to manage her emotions, and these emotions have to do with her past and her present stress. You're just the trigger. Yes, this is normal, but it's not necessary. She can find another way to manage her "stuff".Unfortunately, you can't help your mom a whole lot or even help her recognize this. But for yourself... remembering that her behaviours are her issue is the biggest piece of "dealing with it". You can always try some new strategies when you talk to mom...you can say "I'll listen you better if you don't bring up past stuff", or "I'm worried about you mom. You seem stressed", or even "I don't like the way you talk to me". Good luck!


In [82]:
print("Negative response:")
neg_c = mental_health.loc[mental_health['psyc_sentiment'] == 'NEGATIVE', ['psyc_response_short']].sample(1).values[0][0]
print(neg_c)

Negative response:
Sounds as though your son is "pissed off" about something.Punishment will most likely result in more of the same, not less of the peeing you would like to stop from happening."Laziness" is more of a social judgement than it is a characteristic of its own merit.Is this your description of your son or his description of himself?First step always before addressing any of the family dynamics, emotions, and psychology of the people involved, is a medical rule out as to why your son pees at times he plays video games.If he has medical clearance that there is no physiological  problem, then talk with your son on his opinions as to why he pees, if he is aware of the urge to pee and ignores it, or that his attention gets so absorbed he doesn't notice the urge to pee.See what modifications you can create by cooperating with your son.Maybe it is as simple as each two hours, he sets a timer and when it goes off, he takes a bathroom break.


On the psychologists' side, a negative tone seems plausible, because the responses address comments that are mostly negative.
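One way to check whether negative responses really go with negative comments is to count how often each combination of labels occurs. The sketch below does this with the standard library; `sentiment_pairs` is a hypothetical helper and the labels are made up for illustration, not taken from the results above.

```python
from collections import Counter


def sentiment_pairs(patient_labels, psyc_labels):
    """Count (patient_sentiment, psyc_sentiment) pairs, row by row."""
    return Counter(zip(patient_labels, psyc_labels))


# Made-up labels for illustration only; not the notebook's results
patient = ["NEGATIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]
psyc = ["NEGATIVE", "POSITIVE", "POSITIVE", "NEGATIVE"]

pairs = sentiment_pairs(patient, psyc)
for (patient_label, psyc_label), n in sorted(pairs.items()):
    print(f"patient={patient_label:<8} psyc={psyc_label:<8} count={n}")
```

On the DataFrame itself, `pd.crosstab(mental_health['patient_sentiment'], mental_health['psyc_sentiment'])` should give the same counts as a table.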

## [DistilBERT Emotions](https://huggingface.co/bhadresh-savani/distilbert-base-uncased-emotion?text=I+feel+a+bit+let+down)

This model returns emotions such as joy, sadness, and anger, which are more fine-grained than just positive or negative sentiment.

In [78]:
# Load the model
emotions_pipeline = pipeline(model = "bhadresh-savani/distilbert-base-uncased-emotion")

All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.
Device set to use 0


In [83]:
patients_emotions = mental_health['patient_comment'].progress_apply(lambda x: emotions_pipeline(x))

In [86]:
patients_emotions.iloc[0]

[{'label': 'sadness', 'score': 0.9906992316246033}]

This is the same sort of output as before: just the predicted label and the probability of that label.

In [95]:
mental_health['patient_emotion'] = [x[0]['label'] for x in patients_emotions]
mental_health['patient_emotion_prob'] = [x[0]['score'] for x in patients_emotions]

In [96]:
mental_health[['patient_comment','patient_emotion','patient_emotion_prob']].head()

Unnamed: 0,patient_comment,patient_emotion,patient_emotion_prob
3296,"My boyfriend is in Ireland for 11 days, and I ...",sadness,0.990699
2011,I have so many issues to address. I have a his...,joy,0.613567
1683,"After 40 years of being straight, how could I ...",joy,0.399403
1560,I feel like I took our relationship for grante...,joy,0.9868
1267,"I crave attention, companionship, and sex. She...",sadness,0.757707


In [97]:
mental_health['patient_emotion'].value_counts()

patient_emotion
sadness    6
anger      6
joy        5
fear       3
Name: count, dtype: int64

The patient emotions seem consistent with what we would expect, with sadness and anger dominating. However, why do we have some joy?

In [110]:
print("Joy comments?:")
pos_c = mental_health.loc[mental_health['patient_emotion'] == 'joy', ['patient_comment','patient_emotion_prob']].sample(1)
print("prob:",pos_c['patient_emotion_prob'].values[0])
print(pos_c['patient_comment'].values[0])

Joy comments?:
prob: 0.553688645362854
I always feel the need to impress people, whether it's my family, the people at school, or just random people. I know that no matter what I do or how I change, there will always be some people who hate me.  Why do I feel this way?


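Several of the joy predictions have a low probability (for example 0.40, 0.55, and 0.61), so the model may be misidentifying some other emotion.
> To get the probabilities of all labels, you might need to check each model's documentation to see whether it reports them or whether they have to be extracted some other way.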
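For Hugging Face text-classification pipelines, recent `transformers` versions accept `top_k=None` in the call, which should return the score of every label instead of only the best one. This is an assumption about the installed version, so check the documentation. The sketch below shows how such an output could be summarised. `top_two` is a hypothetical helper and the scores are made up for illustration.

```python
def top_two(all_scores):
    """Return the two most likely labels and the gap between their scores.

    `all_scores` is the per-label output for one text: a list of
    {'label': ..., 'score': ...} dicts, optionally wrapped in an outer list.
    """
    if all_scores and isinstance(all_scores[0], list):
        all_scores = all_scores[0]  # some versions nest the result one level deeper
    ranked = sorted(all_scores, key=lambda d: d["score"], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best["label"], runner_up["label"], best["score"] - runner_up["score"]


# Made-up scores for illustration; a real call might look like
#   emotions_pipeline(text, top_k=None)
example = [
    {"label": "joy", "score": 0.55},
    {"label": "sadness", "score": 0.30},
    {"label": "anger", "score": 0.10},
    {"label": "fear", "score": 0.05},
]
print(top_two(example))
```

A small gap between the first and second label suggests the prediction should not be trusted much on its own.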