# **Sentiment Analysis - Stress Detection**

**Objective**

* Build a machine learning algorithm capable of detecting stress from sentences.

### **1. Introduction**

<p align="justify">
In today's fast-paced and increasingly interconnected world, stress is a reality many of us deal with daily. Identifying stress is essential for looking after our mental health and well-being. In this project, we develop a machine learning model that predicts, from a sentence, whether the sentiment it expresses is stress or not.
</p>

In [None]:
# Loading libraries

import re
import nltk
import string
import numpy as np
import pandas as pd

from nltk.corpus import stopwords
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

In [None]:
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [None]:
# Loading the dataset

dados = pd.read_csv('/content/drive/MyDrive/Ciência de Dados com Python/Projetos/Dados/Stress.csv', sep = ',')

In [None]:
# Viewing the data

dados.head()

| | subreddit | post_id | sentence_range | text | id | label | confidence | social_timestamp | social_karma | syntax_ari | ... | lex_dal_min_pleasantness | lex_dal_min_activation | lex_dal_min_imagery | lex_dal_avg_activation | lex_dal_avg_imagery | lex_dal_avg_pleasantness | social_upvote_ratio | social_num_comments | syntax_fk_grade | sentiment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ptsd | 8601tu | (15, 20) | He said he had not felt that way before, sugge... | 33181 | 1 | 0.8 | 1521614353 | 5 | 1.806818 | ... | 1.0 | 1.125 | 1.0 | 1.77 | 1.52211 | 1.89556 | 0.86 | 1 | 3.253573 | -0.002742 |
| 1 | assistance | 8lbrx9 | (0, 5) | Hey there r/assistance, Not sure if this is th... | 2606 | 0 | 1.0 | 1527009817 | 4 | 9.429737 | ... | 1.125 | 1.0 | 1.0 | 1.69586 | 1.62045 | 1.88919 | 0.65 | 2 | 8.828316 | 0.292857 |
| 2 | ptsd | 9ch1zh | (15, 20) | My mom then hit me with the newspaper and it s... | 38816 | 1 | 0.8 | 1535935605 | 2 | 7.769821 | ... | 1.0 | 1.1429 | 1.0 | 1.83088 | 1.58108 | 1.85828 | 0.67 | 0 | 7.841667 | 0.011894 |
| 3 | relationships | 7rorpp | [5, 10] | until i met my new boyfriend, he is amazing, h... | 239 | 1 | 0.6 | 1516429555 | 0 | 2.667798 | ... | 1.0 | 1.125 | 1.0 | 1.75356 | 1.52114 | 1.98848 | 0.5 | 5 | 4.104027 | 0.141671 |
| 4 | survivorsofabuse | 9p2gbc | [0, 5] | October is Domestic Violence Awareness Month a... | 1421 | 1 | 0.8 | 1539809005 | 24 | 7.554238 | ... | 1.0 | 1.125 | 1.0 | 1.77644 | 1.64872 | 1.81456 | 1.0 | 1 | 7.910952 | -0.204167 |


### **2. Data Preprocessing**

In [None]:
# Instantiating objects

stemmer = nltk.SnowballStemmer("english")
stopword = set(stopwords.words('english'))

In [None]:
# Creating a function to clean the texts

def clean(text):
    # Lowercase, then strip bracketed spans, URLs, HTML tags, punctuation,
    # newlines, and any token that contains a digit
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)
    # Remove stop words, then stem the remaining words
    text = [word for word in text.split(' ') if word not in stopword]
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]
    text = " ".join(text)
    return text
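
To make the regex steps of `clean` easier to follow, here is a minimal sketch that applies the same substitutions to one sample string. It is an illustration only: `demo_clean`, `DEMO_STOPWORDS` and the sample sentence are hypothetical names and data not present in the notebook, the stop-word list is a tiny stand-in for NLTK's English list, and stemming is left out so the block runs with the standard library alone.

```python
import re
import string

# Hypothetical, tiny stop-word list used only for this illustration;
# the notebook uses NLTK's full English list instead.
DEMO_STOPWORDS = {"i", "the", "a", "is", "at", "to"}


def demo_clean(text, stopwords=DEMO_STOPWORDS):
    """Apply the notebook's regex steps and stop-word filter (no stemming)."""
    text = str(text).lower()
    text = re.sub(r"\[.*?\]", "", text)                # drop [bracketed] spans
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # drop URLs
    text = re.sub(r"<.*?>+", "", text)                 # drop HTML-like tags
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)
    text = re.sub(r"\n", "", text)                     # drop newlines
    text = re.sub(r"\w*\d\w*", "", text)               # drop tokens containing digits
    return " ".join(w for w in text.split(" ") if w not in stopwords)


sample = "I read the post at https://example.com [edit] and I'm STRESSED, 24h a day!"
cleaned = demo_clean(sample)
print(cleaned.split())
```

Because the function splits on a single space, removed spans can leave empty tokens and therefore repeated spaces in the result; `CountVectorizer` ignores that extra whitespace when it tokenizes.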

In [None]:
# Applying the function to the text column of the dataset

dados["text"] = dados["text"].apply(clean)

In [None]:
# Mapping the numeric labels to readable names

dados["label"] = dados["label"].map({0: "No Stress", 1: "Stress"})

In [None]:
# Creating a new dataset with only the text and label columns

new_data = dados[["text", "label"]]

In [None]:
# Viewing the data

new_data.head()

| | text | label |
|---|---|---|
| 0 | said felt way sugget go rest trigger ahead you... | Stress |
| 1 | hey rassist sure right place post goe im curr... | No Stress |
| 2 | mom hit newspap shock would know dont like pla... | Stress |
| 3 | met new boyfriend amaz kind sweet good student... | Stress |
| 4 | octob domest violenc awar month domest violenc... | Stress |


In [None]:
# Separating the data into predictors and classes

previsores = np.array(dados["text"])
classes = np.array(dados["label"])

In [None]:
# Transforming the text into a frequency vector

## Instantiating the object
Count_Vectorizer = CountVectorizer()

## Applying it to the previsores variable
previsores = Count_Vectorizer.fit_transform(previsores)
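
As a rough picture of what the frequency vector contains, the sketch below builds a small document-term matrix by hand with the standard library. It is a simplified imitation, not scikit-learn's implementation: the two `docs` are made-up examples, and the real `CountVectorizer` returns a sparse matrix rather than nested lists. The regular expression is the vectorizer's default token pattern, which keeps only tokens of two or more word characters.

```python
import re
from collections import Counter

# Made-up, already-cleaned documents used only for illustration
docs = ["feel stress stress work", "good day work"]

# Same default token rule as CountVectorizer: runs of 2+ word characters
tokenized = [re.findall(r"(?u)\b\w\w+\b", d.lower()) for d in docs]

# Vocabulary: every distinct token, sorted, mapped to a column index
vocab = {tok: i for i, tok in enumerate(sorted({t for doc in tokenized for t in doc}))}

# One row per document, one column per vocabulary entry
matrix = []
for doc in tokenized:
    counts = Counter(doc)
    matrix.append([counts.get(tok, 0) for tok in vocab])

print(vocab)   # {'day': 0, 'feel': 1, 'good': 2, 'stress': 3, 'work': 4}
print(matrix)  # [[0, 1, 0, 2, 1], [1, 0, 1, 0, 1]]
```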

In [None]:
# Splitting the data into training and test sets

X_treino, X_teste, y_treino, y_teste = train_test_split(previsores,
                                                        classes,
                                                        test_size = 0.3,
                                                        random_state = 0)

### **3. Building the Model**

In [None]:
# Instantiating the model

modelo = BernoulliNB()
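
As background on the classifier chosen here: scikit-learn's `BernoulliNB` binarizes its input by default (`binarize=0.0`), so the word counts produced by `CountVectorizer` are treated as present/absent indicators. Under the usual Bernoulli naive Bayes formulation, with $x_i \in \{0, 1\}$ marking whether vocabulary word $i$ occurs in the sentence and $p_{iy}$ (notation introduced here for illustration) the smoothed probability that word $i$ appears in a document of class $y$, the predicted class is:

$$
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} p_{iy}^{\,x_i} \, (1 - p_{iy})^{\,1 - x_i}
$$

Unlike a multinomial model, words that are absent from the sentence also contribute, through the $(1 - p_{iy})$ factor.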

In [None]:
# Training the model

modelo.fit(X_treino, y_treino)

In [None]:
# Applying the model to the test data

y_predito = modelo.predict(X_teste)

### **4. Evaluating and Testing the Model**

In [None]:
# Model metrics

print(classification_report(y_teste, y_predito))

              precision    recall  f1-score   support

   No Stress       0.76      0.66      0.71       398
      Stress       0.73      0.82      0.77       454

    accuracy                           0.75       852
   macro avg       0.75      0.74      0.74       852
weighted avg       0.75      0.75      0.74       852
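
The figures in the report are related to each other, and a short check can make that concrete. The sketch below recomputes the F1 scores, the accuracy and the macro-averaged F1 from the precision, recall and support values printed above. Since those inputs are already rounded to two decimals, the results are approximations that should agree with the report only after rounding.

```python
# Per-class precision, recall and support, copied from the report above
report = {
    "No Stress": {"precision": 0.76, "recall": 0.66, "support": 398},
    "Stress": {"precision": 0.73, "recall": 0.82, "support": 454},
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total = sum(row["support"] for row in report.values())
f1_scores = {name: f1(r["precision"], r["recall"]) for name, r in report.items()}

# Accuracy = correctly classified samples / all samples,
# where correct samples per class = recall * support
accuracy = sum(r["recall"] * r["support"] for r in report.values()) / total
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

for name, score in f1_scores.items():
    print(f"{name}: f1 = {score:.2f}")
print(f"accuracy = {accuracy:.2f}, macro f1 = {macro_f1:.2f}")
```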



In [None]:
# Test 1: I've been doing really well these days.

user = input("Enter a Text: ")
data = Count_Vectorizer.transform([user]).toarray()
output = modelo.predict(data)
print(output)

Enter a Text: I've been doing really well these days.
['No Stress']


In [None]:
# Test 2: Sometime I feel like I need some help.

user = input("Enter a Text: ")
data = Count_Vectorizer.transform([user]).toarray()
output = modelo.predict(data)
print(output)

Enter a Text: Sometime I feel like I need some help.
['Stress']
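
One point worth considering: the training text went through `clean` before vectorization, whereas the two tests above vectorize the raw input. Stemmed vocabulary entries may then fail to match unstemmed input words. The sketch below illustrates the effect on a toy corpus. Everything in it is an assumption made for illustration: `toy_clean` is a crude, hypothetical stand-in for the notebook's `clean`, and the two training sentences are invented. It does not predict how the notebook's own outputs would change.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB


def toy_clean(text):
    """Hypothetical stand-in for clean(): lowercase, drop . and , then strip 'ing'."""
    words = text.lower().replace(".", "").replace(",", "").split()
    return " ".join(w[:-3] if w.endswith("ing") and len(w) > 5 else w for w in words)


# Invented training data, cleaned before vectorization as in the notebook
train_texts = ["worrying about money again", "enjoying a relaxing weekend"]
train_labels = ["Stress", "No Stress"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform([toy_clean(t) for t in train_texts])
model = BernoulliNB().fit(X, train_labels)


def predict_label(text):
    # Clean the input the same way the training text was cleaned, then vectorize
    return model.predict(vectorizer.transform([toy_clean(text)]))[0]


raw = "Worrying, worrying, worrying."
raw_hits = vectorizer.transform([raw]).sum()                 # vocabulary matches without cleaning
clean_hits = vectorizer.transform([toy_clean(raw)]).sum()    # vocabulary matches with cleaning
print(raw_hits, clean_hits, predict_label(raw))
```

In this toy setting the raw sentence matches no vocabulary entry, while the cleaned one matches the stemmed form three times.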
