# Sentiment Analysis Using ML Models
- The task is to classify whether the person who wrote a message is depressed or not.
- I am using Logistic Regression and Naive Bayes classification here.

## What is depression? 
- A mental health disorder characterised by persistently depressed mood or loss of interest in activities, causing significant impairment in daily life.
- Possible causes include a combination of biological, psychological and social sources of distress. Increasingly, research suggests that these factors may cause changes in brain function, including altered activity of certain neural circuits in the brain.
- The persistent feeling of sadness or loss of interest that characterises major depression can lead to a range of behavioural and physical symptoms. These may include changes in sleep, appetite, energy level, concentration, daily behaviour or self-esteem. Depression can also be associated with thoughts of suicide.
- The mainstay of treatment is usually medication, talk therapy or a combination of the two. Increasingly, research suggests that these treatments may normalise brain changes associated with depression.


                                               

## Let's End the Depression !


### ❤️ Here are some ways to cope. ❤️
* Try to talk.
* Do something new.
* Keep yourself busy.

#  Call your buddy. 

* Get into a routine.
* Run/Exercise (personal experience: it works).
* Try to think less.




![It has to end](https://images.pexels.com/photos/3958470/pexels-photo-3958470.jpeg?auto=compress&cs=tinysrgb&dpr=2&h=650&w=940)
source : https://images.pexels.com/photos/3958470/pexels-photo-3958470.jpeg?auto=compress&cs=tinysrgb&dpr=2&h=650&w=940

credits: https://www.pexels.com/@polina-zimmerman

# Let's talk about the data now
- The data has three columns: an index, `message to examine` (the text) and `label (depression result)` (the target).
- Over 10k rows.
- The message is in text format.
- The label is either 0 or 1.

## Steps to create a sentiment classifier using LR and NB
1. See the data
2. Preprocessing:
    1. Remove punctuation
    2. Remove stopwords
    3. Lemmatization (normalizing each word to its base form)
    4. Remove the non-textual data (digits) from the dataset
3. Create vectors for the words (using TfidfVectorizer)
4. Initialize the model
5. Fit the model
6. Do the predictions
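
Steps 3 to 6 above can be sketched end to end on a tiny example. This is a minimal sketch, not the notebook's actual code: the toy messages and labels are made up, the names `lr_pipeline` and `nb_pipeline` are hypothetical, and wrapping the vectorizer and model in a scikit-learn pipeline is an assumption made here for brevity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy messages and labels, made up for illustration only.
toy_texts = [
    "i feel hopeless and tired all the time",
    "nothing matters anymore i feel so alone",
    "i cannot sleep and i feel sad every day",
    "had a great run in the park this morning",
    "excited to see my friends this weekend",
    "loved the new movie it was so much fun",
]
toy_labels = [1, 1, 1, 0, 0, 0]

# Steps 3 to 5: vectorize the words, initialize each model, fit it.
lr_pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
nb_pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())
lr_pipeline.fit(toy_texts, toy_labels)
nb_pipeline.fit(toy_texts, toy_labels)

# Step 6: predict on unseen text.
new_texts = ["i feel sad and alone"]
lr_prediction = lr_pipeline.predict(new_texts)
nb_prediction = nb_pipeline.predict(new_texts)
print(lr_prediction, nb_prediction)
```

A pipeline keeps the vectorizer and the classifier together, so the same transformation is applied at fit time and at prediction time.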

        

In [None]:
import pandas as pd
import numpy as np


data = pd.read_csv('/kaggle/input/sentimental-analysis-for-tweets/sentiment_tweets3.csv')

In [None]:
# checking for null values
data.isna().sum()

In [None]:
print(f'{data.shape} is the shape of the data')
print(f'Description: \n{data.describe()}')

In [None]:
# data.info() prints its summary directly and returns None, so call it without print
data.info()

In [None]:
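# Note: pandas_profiling has been renamed to ydata_profiling; on newer
# environments use `from ydata_profiling import ProfileReport` instead
from pandas_profiling import ProfileReport
profile = ProfileReport(data, title='Pandas Profiling Report', explorative=True)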


In [None]:
profile.to_notebook_iframe()


#### MultinomialNB and Logistic Regression classifiers

In [None]:
print(data[data['label (depression result)'] == 1].shape[0]/data.shape[0] *100, "% of the data is of label 1 ")
print(data[data['label (depression result)'] == 0].shape[0]/data.shape[0] *100, "% of the data is of label 0 ")

In [None]:
# The data is imbalanced, but here I am trying to build a model without balancing it.
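
For readers who do want to account for the imbalance, one option is class weighting rather than resampling. The sketch below is an illustration under assumptions, not part of this notebook's model: the labels are made up, and `weighted_lr` is a hypothetical name.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels, made up to mimic an imbalanced target column.
toy_labels = pd.Series([0] * 8 + [1] * 2, name="label")

# Share of each class as a fraction of all rows.
class_share = toy_labels.value_counts(normalize=True)
print(class_share)

# "balanced" weights are n_samples / (n_classes * class_count).
balanced_weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=toy_labels.to_numpy()
)
print(balanced_weights)

# Passing class_weight="balanced" applies the same weighting during fitting.
weighted_lr = LogisticRegression(solver="liblinear", class_weight="balanced")
```

With this weighting, mistakes on the rarer class cost more during training, which usually trades some accuracy for better recall on that class.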


In [None]:
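# Importing the required libraries

import string

import nltk
import spacy
from nltk.corpus import stopwords

nltk.download('stopwords')

# spaCy 3 removed the 'en' shortcut, so load the small English model by its full name
# (install it with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')

# Build the stopword set once instead of reloading the list for every word
stop_words = set(stopwords.words('english'))
punctuation_table = str.maketrans("", "", string.punctuation)

def text_preprocess(text):
    """Lemmatize the text, strip punctuation and drop English stopwords."""
    # Lemmatize every token with spaCy
    text = " ".join(token.lemma_ for token in nlp(text))
    # Remove punctuation characters
    text = text.translate(punctuation_table)
    # Drop stopwords (case-insensitive)
    words = [word for word in text.split() if word.lower() not in stop_words]
    return " ".join(words)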

In [None]:
# changing the column names

data.columns = data.columns.str.replace(" ", "_")

In [None]:
# checking the preprocessor
text_preprocess('helloo my name is rushi')

In [None]:
data['processed'] = data['message_to_examine'].apply(text_preprocess)

In [None]:
data.shape

In [None]:
data['processed'][:10]

In [None]:
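# regex=True is required on pandas >= 2.0, where str.replace matches literally by default
data['processed1'] = data['processed'].str.replace(r"[0-9]", "", regex=True)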

In [None]:
data['processed1'][:10]
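
The digit-stripping step can be checked on a small example. This is a sketch on a made-up Series (`toy` and `cleaned` are hypothetical names); it assumes `regex=True` is passed explicitly, since from pandas 2.0 onward `str.replace` treats the pattern as a literal string by default.

```python
import pandas as pd

# Hypothetical messages, made up for illustration only.
toy = pd.Series(["call me at 5pm", "room 101", "no digits here"])

# Remove every digit character; regex=True makes "[0-9]" a character class.
cleaned = toy.str.replace(r"[0-9]", "", regex=True)
print(cleaned.tolist())
```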

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer 
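# The first positional parameter of TfidfVectorizer is `input`, not `stop_words`,
# so the stopword list has to be passed by keyword
vectorizer = TfidfVectorizer(stop_words="english")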

In [None]:
processed  = vectorizer.fit_transform(data['processed1'] ) 


In [None]:
from sklearn.model_selection import train_test_split

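# stratify keeps the label ratio the same in both splits; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    processed, data['label_(depression_result)'],
    test_size=0.2, stratify=data['label_(depression_result)'], random_state=42)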
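
An alternative ordering is to split the raw text first and fit the vectorizer on the training portion only, so that vocabulary and IDF statistics from the test rows do not influence training. The sketch below assumes made-up toy data and hypothetical names (`toy_vectorizer`, `train_matrix`, `test_matrix`); it is not how the notebook itself proceeds.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical toy messages and labels, made up for illustration only.
toy_texts = [
    "i feel hopeless today",
    "so tired of everything",
    "i feel alone again",
    "nothing helps anymore",
    "great day at the beach",
    "love this sunny weather",
    "fun dinner with friends",
    "excited for the weekend",
]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Split the raw text before any fitting happens.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    toy_texts, toy_labels, test_size=0.25, stratify=toy_labels, random_state=0
)

# Fit on training text only, then reuse the fitted vocabulary for the test text.
toy_vectorizer = TfidfVectorizer()
train_matrix = toy_vectorizer.fit_transform(train_texts)
test_matrix = toy_vectorizer.transform(test_texts)
print(train_matrix.shape, test_matrix.shape)
```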

In [None]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

In [None]:
# importing models

from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

In [None]:
# creating the instance of the models
lr = LogisticRegression(solver='liblinear', penalty='l1')
mnb = MultinomialNB()

# Training

In [None]:
# fitting the model
print(lr.fit(X_train, y_train))
print(mnb.fit(X_train, y_train))

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score

def conf_matrix_acc(y_true, y_pred):
    print(f'Confusion matrix:\n{confusion_matrix(y_true, y_pred)}\n')
    print(f'Accuracy score is : {accuracy_score(y_true, y_pred)}')

In [None]:
y_pred_lr = lr.predict(X_test)
y_pred_mnb = mnb.predict(X_test)


# Evaluating 

In [None]:
conf_matrix_acc(y_test, y_pred_lr)
conf_matrix_acc(y_test, y_pred_mnb)

- Logistic Regression is performing better than NB.
- A neural network may perform better than these two, since they are basic algorithms; even so, they perform well here.
- LSTM .. coming soon
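
Because the labels are imbalanced, accuracy alone can hide how well the minority class is detected. The sketch below shows per-class metrics on made-up labels (`toy_true` and `toy_pred` are hypothetical and are not this notebook's results).

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    f1_score,
    recall_score,
)

# Hypothetical true and predicted labels, made up for illustration only.
toy_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
toy_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

# Nine of ten predictions are right, yet only one of the two positives is found.
toy_accuracy = accuracy_score(toy_true, toy_pred)
positive_recall = recall_score(toy_true, toy_pred)
positive_f1 = f1_score(toy_true, toy_pred)

# Precision, recall and F1 for each class in one table.
print(classification_report(toy_true, toy_pred))
```

The same calls could be applied to `y_test` with `y_pred_lr` or `y_pred_mnb` to compare the two models per class.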

# Do Upvote 

![gif](https://media1.tenor.com/images/ea245f1e9eca7421d75e635fc7ae4120/tenor.gif?itemid=12451028)