#  **Explaining the Dataset**
We use a dataset containing numerous tweets from Twitter users. Our aim is to predict depression among these users from their tweets. First, we import the libraries commonly used to load data and perform basic data operations: pandas for EDA, matplotlib and seaborn for visualization, and os for operating system operations.

In [3]:
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [4]:
data=pd.read_csv("/kaggle/input/mental-health-social-media/Mental-Health-Twitter.csv",index_col=0)

In [5]:
data.head()

# Giving an overview of the data and getting a simple understanding of it
The data contains the columns post_created (a date-time value), post_text (the text of the actual tweet), user_id, post_id, and the followers and friends counts of the respective user. It also holds the count of the user's favorites on Twitter, their statuses and retweets, and the output column label. Because every tweet comes with a label, this is a supervised learning problem.

In [6]:
sns.countplot(data=data,x='label')

The plot above shows that the dataset is balanced: depressed and non-depressed users appear in roughly equal numbers in this dataset.
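
The visual impression can be backed by numbers. The following is a minimal sketch that uses a small made-up label series in place of `data['label']` (the toy values are an assumption, not taken from the dataset) and computes the share of each class.

```python
import pandas as pd

# Toy labels standing in for data['label']; the real values come from the CSV.
labels = pd.Series([0, 1, 1, 0, 1, 0, 0, 1], name="label")

# Share of each class; values near 0.5 indicate a balanced binary dataset.
class_share = labels.value_counts(normalize=True).sort_index()
print(class_share)
```

In the notebook itself, the same check would presumably be `data['label'].value_counts(normalize=True)`.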

In [7]:
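import re
import nltk
nltk.download('wordnet')  # lexical database used by WordNetLemmatizer
nltk.download('omw-1.4')
nltk.download('punkt')  # tokenizer models for word_tokenize (newer NLTK releases may ask for 'punkt_tab')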

We take a natural language processing approach here, so we import the re module for regular expressions and nltk, an NLP library for working with text.

In [8]:
nltk.download("stopwords")
from nltk.corpus import stopwords

In [9]:
lemm=nltk.WordNetLemmatizer()

In [10]:
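text_list=[]
for text in data.post_text:
    des=re.sub('[^A-Za-z]',' ',text)  # replace every non-letter character with a space
    des=des.lower()
    des=nltk.word_tokenize(des)  # tokenize the cleaned text, not the raw tweet
    des=[lemm.lemmatize(word) for word in des]  # lemmatize tokens, not single characters
    des=' '.join(des)  # re-join with spaces so the vectorizer can split the words again
    text_list.append(des)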

# Operations performed on the post_text column
First, we replaced every non-letter character with a space using a regular expression. Next, we converted the text to lowercase so that the same word is always written the same way. Then we applied word_tokenize to split the text into tokens, which also drops the extra white space. After that, we used the lemmatizer to reduce each token to its base form. Finally, we joined the tokens back into one string per tweet and stored it in text_list.
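
The steps above can be wrapped in a single function. This is a minimal sketch, not the notebook's exact code: `clean_text` and its `lemmatize` parameter are hypothetical names, and whitespace splitting stands in for `nltk.word_tokenize` so that the example runs without any NLTK downloads.

```python
import re

def clean_text(text, lemmatize=None):
    """Clean one tweet: keep letters only, lowercase, tokenize, lemmatize, re-join."""
    letters_only = re.sub(r"[^A-Za-z]", " ", text)  # non-letters become spaces
    tokens = letters_only.lower().split()  # whitespace tokenization (a simplification)
    if lemmatize is not None:
        tokens = [lemmatize(token) for token in tokens]  # optional per-token lemmatization
    return " ".join(tokens)

cleaned = clean_text("Feeling SO tired today... 2 hours of sleep!!")
print(cleaned)  # feeling so tired today hours of sleep
```

Passing `lemmatize=nltk.WordNetLemmatizer().lemmatize` would bring the sketch closer to the loop used in this notebook.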

In [11]:
from sklearn.feature_extraction.text import CountVectorizer
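# Keep the 800 most frequent words, ignoring common English stop words
count_vectorizer=CountVectorizer(max_features=800,stop_words="english")
# fit_transform returns a sparse matrix; toarray() converts it to a dense NumPy array
sparce_matrix=count_vectorizer.fit_transform(text_list).toarray()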

In [12]:
X=sparce_matrix
y=data['label'].values

In [13]:
from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=42)
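
One optional refinement, shown here as a sketch on made-up arrays (`X_demo` and `y_demo` are hypothetical), is to pass `stratify` so that both splits keep the same class proportions as the full data. With a balanced dataset a plain random split is usually close already, so this is a safeguard rather than a requirement.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 20 samples with 2 features each, 10 per class.
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.array([0] * 10 + [1] * 10)

# stratify=y_demo preserves the 50/50 class ratio in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)
print(len(y_tr), len(y_te), y_tr.mean(), y_te.mean())
```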

# Applying Logistic Regression

In [15]:
from sklearn.linear_model import LogisticRegression
lr=LogisticRegression()
lr.fit(X_train,y_train)

In [21]:
print('The train score for Logistic Regression is: ',lr.score(X_train,y_train))
print('The test score for Logistic Regression is: ', lr.score(X_test,y_test))
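
The score printed above is accuracy. A confusion matrix and per-class precision and recall give a fuller picture of where a classifier goes wrong. The sketch below uses made-up true and predicted labels (an assumption for illustration); in the notebook the predictions would presumably come from `lr.predict(X_test)`.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Toy labels; rows of the confusion matrix are true classes, columns are predictions.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
print(classification_report(y_true, y_pred))
```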

# Applying Support Vector Machine

In [49]:
from sklearn.svm import SVC
sv=SVC(C=0.73,kernel='rbf')  # degree is only used by the 'poly' kernel, so it is dropped here

In [None]:
sv.fit(X_train,y_train)

In [None]:
sv.score(X_train,y_train)


In [25]:
sv.score(X_test,y_test)

# Applying Deep Learning Techniques

In [26]:
from tensorflow import keras
import tensorflow as tf

In [34]:
model = keras.Sequential(
    [
        tf.keras.layers.Dense(512, activation="relu", name="layer1"),
        tf.keras.layers.Dense(512, activation="relu", name="layer2"),
        tf.keras.layers.Dense(2, name="layer3"),
    ]
)

In [35]:
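# The last layer outputs one raw score (logit) per class, so use a cross-entropy loss on the integer labels and track accuracy
model.compile(optimizer="Adam", loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=["accuracy"])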

In [43]:
history=model.fit(X_train, y_train, epochs=50, verbose=1)

In [44]:
loss, acc = model.evaluate(X_test, y_test, verbose=1)
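
Because the final layer has two units, the network returns one score per class for each tweet, and the predicted class is the index of the larger score. The following NumPy sketch illustrates this with made-up scores (the `scores` array is hypothetical); in the notebook the scores would presumably come from `model.predict(X_test)`.

```python
import numpy as np

# Toy class scores for four tweets: column 0 = class 0, column 1 = class 1.
scores = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.9]])
y_true = np.array([0, 1, 1, 1])

y_pred = scores.argmax(axis=1)  # pick the class with the highest score
accuracy = (y_pred == y_true).mean()  # fraction of correct predictions
print(y_pred, accuracy)
```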

In [48]:
plt.plot(history.history['loss'],label='training loss')
plt.xlabel('epoch')
plt.legend()
plt.show()
