## Notebook: text_analysis_logistic_regression.ipynb

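This notebook trains logistic regression classifiers with L1 and L2 penalties to predict the `identification` label (`bot` or `human`) and compares their performance on a held-out test set.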

**Uses the updated dataset with text-analysis features.**

In [18]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

### Data Preparation

In [19]:
data = pd.read_csv('../datasets/MIB/mib_processed_text_standardized.csv')
Y_label = 'identification'
# feature columns: every column except the target
X_labels = [col for col in data.columns if col != Y_label]

# use all columns except identification as inputs
X = data[X_labels]
y = data[Y_label]
# hold out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
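
The split above is purely random. The reports in this notebook suggest the two classes are not equally frequent, so one option is a stratified split, which keeps the class ratio the same in the training and test partitions. The following is a minimal sketch on a small synthetic frame, not the notebook's actual procedure; the `toy` frame and the `feature_a` / `feature_b` columns are hypothetical stand-ins for the real CSV.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset; only 'identification' mirrors the notebook.
toy = pd.DataFrame({
    "feature_a": range(20),
    "feature_b": [v * 0.5 for v in range(20)],
    "identification": ["bot"] * 14 + ["human"] * 6,
})

X_toy = toy.drop(columns=["identification"])
y_toy = toy["identification"]

# stratify=y_toy asks train_test_split to preserve the bot/human ratio in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.30, random_state=42, stratify=y_toy
)

# Class proportions in each partition should match the 70/30 ratio of the full frame.
print(y_tr.value_counts(normalize=True))
print(y_te.value_counts(normalize=True))
```

Whether stratification changes the scores reported here would have to be checked by rerunning the notebook with `stratify=y`.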

### Logistic Regression Classifier

**Penalty = L1**

In [20]:
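# L1-penalized logistic regression; liblinear is one of the solvers that supports penalty='l1'
logistic_regression_l1 = LogisticRegression(solver='liblinear', penalty='l1', random_state=0)
l1_y_pred = logistic_regression_l1.fit(X_train, y_train).predict(X_test)
# accuracy_score expects (y_true, y_pred); accuracy is symmetric, so the value is the same either way
accuracy_score(y_test, l1_y_pred)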

0.9613368283093053

In [21]:
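# classification_report expects (y_true, y_pred). With predictions passed first, as here,
# the precision and recall columns are swapped and support counts predicted labels.
print(classification_report(l1_y_pred, y_test))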

              precision    recall  f1-score   support

         bot       0.98      0.96      0.97      2094
       human       0.92      0.96      0.94       958

    accuracy                           0.96      3052
   macro avg       0.95      0.96      0.96      3052
weighted avg       0.96      0.96      0.96      3052
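
scikit-learn's metric functions take the ground-truth labels first and the predictions second. Accuracy and F1 are unaffected by the order, but precision and recall trade places when the arguments are reversed. The sketch below illustrates this on a handful of invented labels (not taken from the MIB dataset).

```python
from sklearn.metrics import precision_score, recall_score

# Toy labels, invented for illustration only.
y_true = ["bot", "bot", "bot", "bot", "human", "human"]
y_pred = ["bot", "bot", "bot", "bot", "bot", "human"]

# Documented order: ground truth first, predictions second.
precision = precision_score(y_true, y_pred, pos_label="bot")  # 4 correct of 5 predicted bots
recall = recall_score(y_true, y_pred, pos_label="bot")        # 4 found of 4 actual bots

# Reversed order: each function now computes the other metric.
swapped_precision = precision_score(y_pred, y_true, pos_label="bot")
swapped_recall = recall_score(y_pred, y_true, pos_label="bot")

print(f"precision={precision:.2f} recall={recall:.2f}")
print(f"swapped: precision={swapped_precision:.2f} recall={swapped_recall:.2f}")
```

The same applies to `classification_report`: its `support` column counts occurrences in whichever array is passed first.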



### Logistic Regression Classifier

**Penalty = L2**

In [22]:
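# L2-penalized logistic regression with the same solver and seed as the L1 model
logistic_regression_l2 = LogisticRegression(solver='liblinear', penalty='l2', random_state=0)
l2_y_pred = logistic_regression_l2.fit(X_train, y_train).predict(X_test)
# accuracy_score expects (y_true, y_pred); accuracy is symmetric, so the value is the same either way
accuracy_score(y_test, l2_y_pred)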

0.959043250327654

In [23]:
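# classification_report expects (y_true, y_pred). With predictions passed first, as here,
# the precision and recall columns are swapped and support counts predicted labels.
print(classification_report(l2_y_pred, y_test))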

              precision    recall  f1-score   support

         bot       0.98      0.96      0.97      2097
       human       0.91      0.96      0.94       955

    accuracy                           0.96      3052
   macro avg       0.95      0.96      0.95      3052
weighted avg       0.96      0.96      0.96      3052
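
The two penalties give nearly identical scores here, but they usually differ in the coefficients they produce: an L1 penalty tends to set some coefficients to exactly zero, while an L2 penalty shrinks them without zeroing them. The sketch below shows one way to check this. It runs on synthetic data from `make_classification` rather than the MIB dataset, and `C=0.05` is an assumed, deliberately strong regularization setting chosen for the demonstration (the notebook uses the default `C`).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: 5 informative features plus 15 pure-noise features.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=20, n_informative=5, n_redundant=0, random_state=0
)

# Same solver as the notebook; C=0.05 is an assumption made for this demo only.
l1_model = LogisticRegression(solver="liblinear", penalty="l1", C=0.05, random_state=0)
l2_model = LogisticRegression(solver="liblinear", penalty="l2", C=0.05, random_state=0)
l1_model.fit(X_demo, y_demo)
l2_model.fit(X_demo, y_demo)

# Count coefficients that are exactly zero under each penalty.
n_zero_l1 = int(np.sum(l1_model.coef_ == 0))
n_zero_l2 = int(np.sum(l2_model.coef_ == 0))
print(f"zero coefficients with L1: {n_zero_l1} of {l1_model.coef_.size}")
print(f"zero coefficients with L2: {n_zero_l2} of {l2_model.coef_.size}")
```

Applied to the fitted `logistic_regression_l1` and `logistic_regression_l2` models, the same count would show how many of the input columns the L1 model effectively ignores.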

