# Sentiment analysis on LLY event

Sentiment analysis of Mastodon toots around the Eli Lilly free-insulin event of 11 November 2022. A BERT model trained on the IMDB training set serves as the sentiment analysis model and outputs a continuous sentiment score for each toot.

Written by Luc Bijl.

Loading the Mastodon dataset.

In [2]:
import pickle

with open('../datasets/mastodon-lly.pkl','rb') as file:
    mastodon_dataframes = pickle.load(file)

## Sentiment analysis of the topic: Lilly

Creating the dataset with toots related to the topics 'Lilly', 'Eli Lilly', 'Eli Lilly and company' and 'LLY'.

In [43]:
import pandas as pd

lilly_topics = ['Lilly', 'Eli Lilly', 'Eli Lilly and company', 'LLY']

# Stack the per-topic dataframes in one call instead of growing an empty frame.
df_lilly = pd.concat([mastodon_dataframes[topic] for topic in lilly_topics])

# A toot can match several topics; keep only its first occurrence.
df_lilly = df_lilly.drop_duplicates(subset='ID', keep='first')
df_lilly = df_lilly.set_index('Date')

df_lilly.head(5)

Date,ID,Content
2022-12-08 14:42:58,109478581359363509,"<p>ICYMI yesterday, our new <a href=""https://n..."
2022-12-05 17:58:35,109462365059484871,"<p><span class=""h-card""><a href=""https://roman..."
2022-12-01 20:33:41,109440324430339788,<p>Sophia has nursery toys Charlie and Lilly f...
2022-11-30 18:53:33,109434268151114981,<p>Eli Lilly CEO says insulin tweet flap “prob...
2022-11-28 03:01:05,109419198231109803,<p>Tickets acquired to see The Mountain Goats ...


In [127]:
len(df_lilly)

104
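
The merge-and-deduplicate step can be illustrated on toy data. The sketch below uses two hypothetical topic frames (`toy_frames`, with made-up IDs and content, not the real Mastodon data) that share one toot, and shows that `drop_duplicates(subset='ID')` keeps a single copy of it.

```python
import pandas as pd

# Hypothetical stand-in for mastodon_dataframes: two topics sharing toot ID 2.
toy_frames = {
    'Lilly': pd.DataFrame({
        'Date': ['2022-11-11 10:00:00', '2022-11-11 11:00:00'],
        'ID': [1, 2],
        'Content': ['<p>first</p>', '<p>second</p>'],
    }),
    'LLY': pd.DataFrame({
        'Date': ['2022-11-11 11:00:00', '2022-11-11 12:00:00'],
        'ID': [2, 3],
        'Content': ['<p>second</p>', '<p>third</p>'],
    }),
}

# Stack all topic frames, then keep the first occurrence of every toot ID.
merged = pd.concat(list(toy_frames.values()), ignore_index=True)
deduplicated = merged.drop_duplicates(subset='ID', keep='first')

print(len(merged), len(deduplicated))
```

Deduplicating on `ID` rather than on `Content` avoids dropping distinct toots that happen to contain identical text.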

Inspecting the raw HTML content of the first toot.

In [44]:
df_lilly['Content'].iloc[0]

'<p>ICYMI yesterday, our new <a href="https://newsie.social/tags/cardiovascular" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>cardiovascular</span></a> reporter Elaine Chen reporting on the intersection of <a href="https://newsie.social/tags/obesity" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>obesity</span></a> and <a href="https://newsie.social/tags/diabetes" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>diabetes</span></a>: </p><p>the effects of constricting a drug for one condition that\'s prescribed off-label for another</p><p><a href="https://www.statnews.com/2022/12/07/eli-lilly-tightens-access-tirzepatide-mounjaro-diabetes-obesity/" rel="nofollow noopener noreferrer" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">statnews.com/2022/12/07/eli-li</span><span class="invisible">lly-tightens-access-tirzepatide-mounjaro-diabetes-obesity/</sp

Creating a function that cleans the content.

In [45]:
from bs4 import BeautifulSoup
import re

def clean_text(text):

    text = BeautifulSoup(text, "html.parser").get_text()
    text = re.sub(r'http\S+', '', text)
    text = re.sub(r'[#@]', '', text)
    return text

df_lilly['Text'] = df_lilly['Content'].apply(clean_text)

df_lilly['Text'].iloc[0]

"ICYMI yesterday, our new cardiovascular reporter Elaine Chen reporting on the intersection of obesity and diabetes: the effects of constricting a drug for one condition that's prescribed off-label for another"

Tokenizing the cleaned text into the input IDs and attention masks that BERT can interpret.

In [46]:
import torch
from transformers import DistilBertTokenizerFast
from torch.utils.data import DataLoader
from transformers import DistilBertForSequenceClassification

model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizerFast.from_pretrained(model_name)

lilly_encodings = tokenizer(df_lilly['Text'].tolist(), truncation=True, padding=True, return_tensors='pt')

lilly = torch.utils.data.TensorDataset(lilly_encodings['input_ids'], lilly_encodings['attention_mask'])
lilly_dataloader = DataLoader(lilly, batch_size=16, shuffle=False)
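
What `padding=True` and the attention mask mean can be shown without loading the tokenizer. The following is a toy sketch in plain Python with hypothetical token IDs and an assumed padding ID of 0 (the real tokenizer produces WordPiece IDs and handles this internally): shorter sequences are padded up to the longest length in the batch, and the mask marks real tokens with 1 and padding with 0.

```python
# Hypothetical token-ID sequences of unequal length (not real tokenizer output).
sequences = [[101, 7592, 102], [101, 7592, 2088, 999, 102]]
PAD_ID = 0  # assumed padding ID for this illustration

max_len = max(len(seq) for seq in sequences)

# Pad every sequence on the right to the longest length in the batch.
input_ids = [seq + [PAD_ID] * (max_len - len(seq)) for seq in sequences]

# 1 marks a real token, 0 marks padding that the model should ignore.
attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]

print(input_ids)
print(attention_mask)
```

`truncation=True` works in the opposite direction: toots longer than the model's maximum input length are cut off, so very long toots are scored on their beginning only.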

Loading the BERT model that was trained on the normalized IMDB dataset from Stanford.

In [None]:
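# The file holds the whole pickled model object, so `transformers` must be installed.
# Recent PyTorch releases default to weights_only=True, which rejects such files;
# there, pass weights_only=False (only for files from a trusted source).
bert_model = torch.load('bert-imdb.pth')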

Using BERT to perform sentiment analysis on the dataset.

In [48]:
bert_model.eval()
list_predicted_scores = []

with torch.no_grad():
    for batch in lilly_dataloader:
        input_ids, attention_mask = batch
        output = bert_model(input_ids=input_ids, attention_mask=attention_mask)
        predicted_scores = output.logits.view(-1)

        list_predicted_scores.extend(predicted_scores.tolist())

df_lilly['BERT sentiment score'] = list_predicted_scores

df_lilly.head(5)

Date,ID,Content,Text,BERT sentiment score
2022-12-08 14:42:58,109478581359363509,"<p>ICYMI yesterday, our new <a href=""https://n...","ICYMI yesterday, our new cardiovascular report...",0.339886
2022-12-05 17:58:35,109462365059484871,"<p><span class=""h-card""><a href=""https://roman...",skimgoth and water lillies?,0.025848
2022-12-01 20:33:41,109440324430339788,<p>Sophia has nursery toys Charlie and Lilly f...,Sophia has nursery toys Charlie and Lilly for ...,0.354358
2022-11-30 18:53:33,109434268151114981,<p>Eli Lilly CEO says insulin tweet flap “prob...,Eli Lilly CEO says insulin tweet flap “probabl...,-0.059616
2022-11-28 03:01:05,109419198231109803,<p>Tickets acquired to see The Mountain Goats ...,Tickets acquired to see The Mountain Goats and...,0.588035
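
The scoring loop relies on two details: the model presumably emits one logit per toot, so `view(-1)` flattens a `(batch_size, 1)` tensor into a vector, and `shuffle=False` keeps the scores in the same order as the dataframe rows. A small sketch with a hypothetical stand-in model (`torch.nn.Linear`, not the trained BERT model) and made-up features makes both points checkable:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 5 "toots" with 3 features each.
features = torch.arange(15, dtype=torch.float32).reshape(5, 3)
loader = DataLoader(TensorDataset(features), batch_size=2, shuffle=False)

# Stand-in regression head with one output per example (not the real model).
head = torch.nn.Linear(3, 1)
head.eval()

scores = []
with torch.no_grad():
    for (batch,) in loader:
        logits = head(batch)                      # shape: (batch_size, 1)
        scores.extend(logits.view(-1).tolist())   # flattened to (batch_size,)

# Scoring all rows at once gives the same values in the same order.
with torch.no_grad():
    expected = head(features).view(-1).tolist()

print(len(scores))
```

With `shuffle=True` the list would still have the right length, but the scores would be assigned to the wrong rows without any error being raised.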


Saving the dataset.

In [156]:
import pickle

with open('../datasets/bert-scored/lilly.pkl', 'wb') as file:
    pickle.dump(df_lilly, file)
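
`open(..., 'wb')` fails if the target directory does not exist. A minimal sketch, assuming the output folder may be missing: it writes a toy dataframe to a temporary directory (both hypothetical, not the real paths or data) and reads it back to check the round trip.

```python
import os
import pickle
import tempfile

import pandas as pd

toy_scored = pd.DataFrame({'ID': [1, 2], 'BERT sentiment score': [0.25, -0.5]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_dir = os.path.join(tmp_dir, 'bert-scored')
    os.makedirs(out_dir, exist_ok=True)  # create the folder if it is missing
    out_path = os.path.join(out_dir, 'toy.pkl')

    with open(out_path, 'wb') as file:
        pickle.dump(toy_scored, file)

    # Read the file back to confirm that the data survives the round trip.
    with open(out_path, 'rb') as file:
        restored = pickle.load(file)

print(restored.equals(toy_scored))
```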

## Sentiment analysis of the topic: Insulin

Creating the dataset with toots related to the topic 'Insulin'.

In [53]:
import pandas as pd

df_insulin = mastodon_dataframes['Insulin']
df_insulin = df_insulin.set_index('Date')

df_insulin.head()

Date,ID,Content
2022-12-09 20:09:41,109485528680726575,<p>Henceforth my insulin pump shall be known a...
2022-12-09 14:52:21,109484280827598555,
2022-12-09 10:31:24,109483255136181443,<p>fri/20221209</p>
2022-12-09 06:23:15,109482278888205908,<p>Being an insulin-injecting diabetic means o...
2022-12-08 23:16:32,109480601078091132,<p>I am now part cyborg and have sent my first...


In [128]:
len(df_insulin)

150

Inspecting the raw HTML content of the first toot.

In [62]:
df_insulin['Content'].iloc[0]

'<p>Henceforth my insulin pump shall be known as H.E.R.B.I.E. <a href="https://social.parentheticalrecluse.com/tags/t1d" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>t1d</span></a></p>'

Cleaning the dataset.

In [61]:
from bs4 import BeautifulSoup
import re

def clean_text(text):

    text = BeautifulSoup(text, "html.parser").get_text()
    text = re.sub(r'http\S+', '', text)
    text = re.sub(r'[#@]', '', text)
    return text

df_insulin['Text'] = df_insulin['Content'].apply(clean_text)

df_insulin['Text'].iloc[0]

'Henceforth my insulin pump shall be known as H.E.R.B.I.E. t1d'
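
`get_text()` concatenates the text of adjacent `<p>` elements without a separator, so sentences from consecutive paragraphs can run together in the cleaned text. A minimal sketch of one possible workaround, assuming it is acceptable to insert a space after each `</p>` and `<br>` before parsing; the function name `clean_text_spaced` and the sample HTML are hypothetical and not part of the original pipeline:

```python
import re
from bs4 import BeautifulSoup

def clean_text_spaced(html):
    # Add a space after paragraph ends and line breaks so that text from
    # adjacent blocks is not glued together by get_text().
    html = re.sub(r'(</p>|<br\s*/?>)', r'\1 ', html, flags=re.IGNORECASE)
    text = BeautifulSoup(html, 'html.parser').get_text()
    text = re.sub(r'http\S+', '', text)        # drop URLs
    text = re.sub(r'[#@]', '', text)           # drop hashtag and mention markers
    return re.sub(r'\s+', ' ', text).strip()   # collapse repeated whitespace

sample = ('<p>First paragraph.</p>'
          '<p>Second one with a <a href="https://example.org">#<span>tag</span></a></p>')
print(clean_text_spaced(sample))
```

Inserting the space at block boundaries only, rather than passing a separator to `get_text()`, keeps the pieces of a link that Mastodon splits across several `<span>` elements together, so the URL pattern can still remove the link as a whole.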

Tokenizing the cleaned text into the input IDs and attention masks that BERT can interpret.

In [63]:
import torch
from transformers import DistilBertTokenizerFast
from torch.utils.data import DataLoader
from transformers import DistilBertForSequenceClassification

model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizerFast.from_pretrained(model_name)

insulin_encodings = tokenizer(df_insulin['Text'].tolist(), truncation=True, padding=True, return_tensors='pt')

insulin = torch.utils.data.TensorDataset(insulin_encodings['input_ids'], insulin_encodings['attention_mask'])
insulin_dataloader = DataLoader(insulin, batch_size=16, shuffle=False)

Loading the BERT model that was trained on the normalized IMDB dataset from Stanford.

In [None]:
bert_model = torch.load('bert-imdb.pth')

Using BERT to perform sentiment analysis on the dataset.

In [64]:
bert_model.eval()
list_predicted_scores = []

with torch.no_grad():
    for batch in insulin_dataloader:
        input_ids, attention_mask = batch
        output = bert_model(input_ids=input_ids, attention_mask=attention_mask)
        predicted_scores = output.logits.view(-1)

        list_predicted_scores.extend(predicted_scores.tolist())

df_insulin['BERT sentiment score'] = list_predicted_scores

df_insulin.head(5)

Date,ID,Content,Text,BERT sentiment score
2022-12-09 20:09:41,109485528680726575,<p>Henceforth my insulin pump shall be known a...,Henceforth my insulin pump shall be known as H...,0.184103
2022-12-09 14:52:21,109484280827598555,,,0.014581
2022-12-09 10:31:24,109483255136181443,<p>fri/20221209</p>,fri/20221209,0.16071
2022-12-09 06:23:15,109482278888205908,<p>Being an insulin-injecting diabetic means o...,Being an insulin-injecting diabetic means occa...,-0.22355
2022-12-08 23:16:32,109480601078091132,<p>I am now part cyborg and have sent my first...,I am now part cyborg and have sent my first in...,-0.073293
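
The second row above has empty content (possibly a toot without text), yet it still receives a score. A sketch of filtering such rows before scoring, on hypothetical toy data; whether to drop them is a judgment call that the notebook itself does not make:

```python
import pandas as pd

# Hypothetical cleaned toots: one with text, one empty, one whitespace-only.
toy = pd.DataFrame({
    'ID': [1, 2, 3],
    'Text': ['Henceforth my insulin pump shall be known as H.E.R.B.I.E.', '', '   '],
})

# Keep only rows whose cleaned text has at least one non-whitespace character.
has_text = toy['Text'].str.strip().ne('')
toy_nonempty = toy[has_text]

print(len(toy), len(toy_nonempty))
```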


Saving the dataset.

In [155]:
import pickle

with open('../datasets/bert-scored/insulin.pkl', 'wb') as file:
    pickle.dump(df_insulin, file)

## Sentiment analysis of the topic: Diabetes

Creating the dataset with toots related to the topics 'Diabetes' and 'Diabetic'.

In [3]:
import pandas as pd

diabetes_topics = ['Diabetes', 'Diabetic']

# Stack the per-topic dataframes in one call instead of growing an empty frame.
df_diabetes = pd.concat([mastodon_dataframes[topic] for topic in diabetes_topics])

# A toot can match several topics; keep only its first occurrence.
df_diabetes = df_diabetes.drop_duplicates(subset='ID', keep='first')
df_diabetes = df_diabetes.set_index('Date')

df_diabetes.head(5)

Date,ID,Content
2022-12-09 21:27:06,109485832811563069,"<p><span class=""h-card""><a href=""https://masto..."
2022-12-09 20:34:47,109485626981509382,<p>Well. Libre 3 continues to be out of stock....
2022-12-09 20:12:11,109485538252881272,"<p>As a T2 diabetic, I'm not supposed to eat b..."
2022-12-09 14:39:36,109484230932150244,"<p><span class=""h-card""><a href=""https://hachy..."
2022-12-09 06:23:15,109482278888205908,<p>Being an insulin-injecting diabetic means o...


In [4]:
len(df_diabetes)

165

Inspecting the raw HTML content of the first toot.

In [5]:
df_diabetes['Content'].iloc[0]

'<p><span class="h-card"><a href="https://mastodon.nz/@CaseyL" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>CaseyL</span></a></span> <br>I\'m T2 with insulin, currently in the care of a specialised DHB diabetic team and extremely grateful to them. They\'re stabilising my blood sugars. </p><p>I avoid sugar wherever possible, but my personal weakness is sticky buns. Just once in a blue moon... when nobody is looking ... </p><p>I do eat a lot of bread, pasta, &amp; potatoes and my dietician hasn\'t asked me to cut down on those. Not yet anyway. </p><p><a href="https://mastodon.nz/tags/diabetes" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>diabetes</span></a> <a href="https://mastodon.nz/tags/diet" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>diet</span></a></p>'

Creating a function that cleans the content.

In [6]:
from bs4 import BeautifulSoup
import re

def clean_text(text):

    text = BeautifulSoup(text, "html.parser").get_text()
    text = re.sub(r'http\S+', '', text)
    text = re.sub(r'[#@]', '', text)
    return text

df_diabetes['Text'] = df_diabetes['Content'].apply(clean_text)

df_diabetes['Text'].iloc[0]

"CaseyL I'm T2 with insulin, currently in the care of a specialised DHB diabetic team and extremely grateful to them. They're stabilising my blood sugars. I avoid sugar wherever possible, but my personal weakness is sticky buns. Just once in a blue moon... when nobody is looking ... I do eat a lot of bread, pasta, & potatoes and my dietician hasn't asked me to cut down on those. Not yet anyway. diabetes diet"

Tokenizing the cleaned text into the input IDs and attention masks that BERT can interpret.

In [7]:
import torch
from transformers import DistilBertTokenizerFast
from torch.utils.data import DataLoader
from transformers import DistilBertForSequenceClassification

model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizerFast.from_pretrained(model_name)

diabetes_encodings = tokenizer(df_diabetes['Text'].tolist(), truncation=True, padding=True, return_tensors='pt')

diabetes = torch.utils.data.TensorDataset(diabetes_encodings['input_ids'], diabetes_encodings['attention_mask'])
diabetes_dataloader = DataLoader(diabetes, batch_size=16, shuffle=False)

Loading the BERT model that was trained on the normalized IMDB dataset from Stanford.

In [8]:
bert_model = torch.load('bert-imdb.pth')

Using BERT to perform sentiment analysis on the dataset.

In [9]:
bert_model.eval()
list_predicted_scores = []

with torch.no_grad():
    for batch in diabetes_dataloader:
        input_ids, attention_mask = batch
        output = bert_model(input_ids=input_ids, attention_mask=attention_mask)
        predicted_scores = output.logits.view(-1)

        list_predicted_scores.extend(predicted_scores.tolist())

df_diabetes['BERT sentiment score'] = list_predicted_scores

df_diabetes.head(5)

Date,ID,Content,Text,BERT sentiment score
2022-12-09 21:27:06,109485832811563069,"<p><span class=""h-card""><a href=""https://masto...","CaseyL I'm T2 with insulin, currently in the c...",-0.076328
2022-12-09 20:34:47,109485626981509382,<p>Well. Libre 3 continues to be out of stock....,Well. Libre 3 continues to be out of stock. So...,-0.443311
2022-12-09 20:12:11,109485538252881272,"<p>As a T2 diabetic, I'm not supposed to eat b...","As a T2 diabetic, I'm not supposed to eat brea...",0.332407
2022-12-09 14:39:36,109484230932150244,"<p><span class=""h-card""><a href=""https://hachy...",shanselman you might dig this. a local artist ...,0.208667
2022-12-09 06:23:15,109482278888205908,<p>Being an insulin-injecting diabetic means o...,Being an insulin-injecting diabetic means occa...,-0.22355
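
The last row above, ID 109482278888205908, also appears in the 'Insulin' dataset with the same score, so the topic datasets overlap. A sketch of measuring that overlap by toot ID, with hypothetical toy frames standing in for `df_insulin` and `df_diabetes`:

```python
import pandas as pd

# Hypothetical toot IDs per topic; ID 12 matches both topics.
toy_insulin = pd.DataFrame({'ID': [10, 11, 12]})
toy_diabetes = pd.DataFrame({'ID': [12, 13]})

# IDs present in both topic datasets.
shared_ids = set(toy_insulin['ID']) & set(toy_diabetes['ID'])

# Rows of the diabetes frame that also occur in the insulin frame.
overlap = toy_diabetes[toy_diabetes['ID'].isin(toy_insulin['ID'])]

print(sorted(shared_ids), len(overlap))
```

Knowing the overlap matters when per-topic results are compared or combined, because shared toots would otherwise be counted once per topic.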


Saving the dataset.

In [10]:
import pickle

with open('../datasets/bert-scored/diabetes.pkl', 'wb') as file:
    pickle.dump(df_diabetes, file)

## Sentiment analysis of the topic: Pharmaceutical

Creating the dataset with toots related to the topic 'pharmaceutical'.

In [12]:
import pandas as pd

df_pharmaceutic = mastodon_dataframes['pharmaceutical']
df_pharmaceutic = df_pharmaceutic.set_index('Date')

df_pharmaceutic.head()

Date,ID,Content
2022-12-06 19:03:34,109468281661273330,"<p><span class=""h-card""><a href=""https://journ..."
2022-12-06 18:45:15,109468210180810176,"<p><span class=""h-card""><a href=""https://mstdn..."
2022-12-06 14:32:45,109467216700022257,"<p><span class=""h-card""><a href=""https://masto..."
2022-12-06 11:12:06,109466427550668917,"<p>Right. I see when one types ""lft"" into the..."
2022-12-05 15:14:17,109461717496931207,"<p><span class=""h-card"" translate=""no""><a href..."


In [13]:
len(df_pharmaceutic)

42

Inspecting the raw HTML content of the first toot.

In [14]:
df_pharmaceutic['Content'].iloc[0]

'<p><span class="h-card"><a href="https://journa.host/@froomkin" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>froomkin</span></a></span> I have heard that Wikipedia rules constrain edits on pharmaceuticals to people who work for the big pharma companies and exclude various (e.g Stanford medicine) researchers.  That\'s hearsay, but I kinda trust the source.</p>'

Creating a function that cleans the content.

In [15]:
from bs4 import BeautifulSoup
import re

def clean_text(text):

    text = BeautifulSoup(text, "html.parser").get_text()
    text = re.sub(r'http\S+', '', text)
    text = re.sub(r'[#@]', '', text)
    return text

df_pharmaceutic['Text'] = df_pharmaceutic['Content'].apply(clean_text)

df_pharmaceutic['Text'].iloc[0]

"froomkin I have heard that Wikipedia rules constrain edits on pharmaceuticals to people who work for the big pharma companies and exclude various (e.g Stanford medicine) researchers.  That's hearsay, but I kinda trust the source."

Tokenizing the cleaned text into the input IDs and attention masks that BERT can interpret.

In [16]:
import torch
from transformers import DistilBertTokenizerFast
from torch.utils.data import DataLoader
from transformers import DistilBertForSequenceClassification

model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizerFast.from_pretrained(model_name)

pharmaceutic_encodings = tokenizer(df_pharmaceutic['Text'].tolist(), truncation=True, padding=True, return_tensors='pt')

pharmaceutic = torch.utils.data.TensorDataset(pharmaceutic_encodings['input_ids'], pharmaceutic_encodings['attention_mask'])
pharmaceutic_dataloader = DataLoader(pharmaceutic, batch_size=16, shuffle=False)

Loading the BERT model that was trained on the normalized IMDB dataset from Stanford.

In [17]:
bert_model = torch.load('bert-imdb.pth')

Using BERT to perform sentiment analysis on the dataset.

In [18]:
bert_model.eval()
list_predicted_scores = []

with torch.no_grad():
    for batch in pharmaceutic_dataloader:
        input_ids, attention_mask = batch
        output = bert_model(input_ids=input_ids, attention_mask=attention_mask)
        predicted_scores = output.logits.view(-1)

        list_predicted_scores.extend(predicted_scores.tolist())

df_pharmaceutic['BERT sentiment score'] = list_predicted_scores

df_pharmaceutic.head(5)

Date,ID,Content,Text,BERT sentiment score
2022-12-06 19:03:34,109468281661273330,"<p><span class=""h-card""><a href=""https://journ...",froomkin I have heard that Wikipedia rules con...,-0.360652
2022-12-06 18:45:15,109468210180810176,"<p><span class=""h-card""><a href=""https://mstdn...",Trapd The feds are reforming all of this right...,-0.181085
2022-12-06 14:32:45,109467216700022257,"<p><span class=""h-card""><a href=""https://masto...",BethanyBlack protip: citric acid.the super fan...,0.095077
2022-12-06 11:12:06,109466427550668917,"<p>Right. I see when one types ""lft"" into the...","Right. I see when one types ""lft"" into the se...",0.088579
2022-12-05 15:14:17,109461717496931207,"<p><span class=""h-card"" translate=""no""><a href...",mattgemmell one of the most baffling things a...,-0.30934


Saving the dataset.

In [19]:
import pickle

with open('../datasets/bert-scored/pharmaceutic.pkl', 'wb') as file:
    pickle.dump(df_pharmaceutic, file)

## Sentiment analysis of the topic: Medicine

Creating the dataset with toots related to the topics 'Medicine' and 'Medical'.

In [20]:
import pandas as pd

medical_topics = ['Medicine', 'Medical']

# Stack the per-topic dataframes in one call instead of growing an empty frame.
df_medical = pd.concat([mastodon_dataframes[topic] for topic in medical_topics])

# A toot can match several topics; keep only its first occurrence.
df_medical = df_medical.drop_duplicates(subset='ID', keep='first')
df_medical = df_medical.set_index('Date')

df_medical.head(5)

Date,ID,Content
2022-12-10 19:59:30,109491150555807416,<p>This little two minute ambient piece has me...
2022-12-10 19:44:30,109491093337726131,<p>How holiday spots are stocking up for our l...
2022-12-10 18:59:22,109490914103968349,"<p><span class=""h-card"" translate=""no""><a href..."
2022-12-10 18:15:32,109490742076711264,<p>So I spent yesterday/last night in the ER.<...
2022-12-10 17:44:08,109490618610928554,"<p>Hmm, there are at least 3 people on our sma..."


In [21]:
len(df_medical)

1189

Inspecting the raw HTML content of the first toot.

In [22]:
df_medical['Content'].iloc[0]

'<p>This little two minute ambient piece has me hooked and wishing for a longer version! Can anyone suggest some more tracks in a similar style? I’m in love with the chord changes that take the music places. <a href="https://fosstodon.org/tags/music" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>music</span></a> <a href="https://fosstodon.org/tags/ambient" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ambient</span></a> <a href="https://music.apple.com/us/album/relajacion-medicinal/1593745101?i=1593745103" rel="nofollow noopener noreferrer" target="_blank"><span class="invisible">https://</span><span class="ellipsis">music.apple.com/us/album/relaj</span><span class="invisible">acion-medicinal/1593745101?i=1593745103</span></a></p>'

Creating a function that cleans the content.

In [23]:
from bs4 import BeautifulSoup
import re

def clean_text(text):

    text = BeautifulSoup(text, "html.parser").get_text()
    text = re.sub(r'http\S+', '', text)
    text = re.sub(r'[#@]', '', text)
    return text

df_medical['Text'] = df_medical['Content'].apply(clean_text)

df_medical['Text'].iloc[0]

'This little two minute ambient piece has me hooked and wishing for a longer version! Can anyone suggest some more tracks in a similar style? I’m in love with the chord changes that take the music places. music ambient '

Tokenizing the cleaned text into the input IDs and attention masks that BERT can interpret.

In [24]:
import torch
from transformers import DistilBertTokenizerFast
from torch.utils.data import DataLoader
from transformers import DistilBertForSequenceClassification

model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizerFast.from_pretrained(model_name)

medical_encodings = tokenizer(df_medical['Text'].tolist(), truncation=True, padding=True, return_tensors='pt')

medical = torch.utils.data.TensorDataset(medical_encodings['input_ids'], medical_encodings['attention_mask'])
medical_dataloader = DataLoader(medical, batch_size=16, shuffle=False)

Loading the BERT model that was trained on the normalized IMDB dataset from Stanford.

In [25]:
bert_model = torch.load('bert-imdb.pth')

Using BERT to perform sentiment analysis on the dataset.

In [26]:
bert_model.eval()
list_predicted_scores = []

with torch.no_grad():
    for batch in medical_dataloader:
        input_ids, attention_mask = batch
        output = bert_model(input_ids=input_ids, attention_mask=attention_mask)
        predicted_scores = output.logits.view(-1)

        list_predicted_scores.extend(predicted_scores.tolist())

df_medical['BERT sentiment score'] = list_predicted_scores

df_medical.head(5)

Date,ID,Content,Text,BERT sentiment score
2022-12-10 19:59:30,109491150555807416,<p>This little two minute ambient piece has me...,This little two minute ambient piece has me ho...,0.415996
2022-12-10 19:44:30,109491093337726131,<p>How holiday spots are stocking up for our l...,How holiday spots are stocking up for our late...,0.209835
2022-12-10 18:59:22,109490914103968349,"<p><span class=""h-card"" translate=""no""><a href...",Erin absolutely. when I worked in the grooming...,-0.479018
2022-12-10 18:15:32,109490742076711264,<p>So I spent yesterday/last night in the ER.<...,So I spent yesterday/last night in the ER.I wa...,-0.620889
2022-12-10 17:44:08,109490618610928554,"<p>Hmm, there are at least 3 people on our sma...","Hmm, there are at least 3 people on our small ...",-0.508928


Saving the dataset.

In [27]:
import pickle

with open('../datasets/bert-scored/medical.pkl', 'wb') as file:
    pickle.dump(df_medical, file)