# Baselines

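This notebook contains two baseline models: one that always predicts the most common class, and one that predicts classes at random in proportion to their frequency in the training set.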

## Imports

In [None]:
import os
import sys

from dotenv import find_dotenv

# Make the project root (the folder that contains the .env file) importable,
# so that the py_scripts package can be found.
sys.path.append(os.path.dirname(find_dotenv()))

In [None]:
# Import the CSV reader from py_scripts/file_handler.py
from py_scripts.file_handler import read_csv_file

# Read the data (inputs X and labels Y)
X, Y = read_csv_file("clean.csv")

In [None]:
# Split the data into train and test sets (80/20).
# random_state fixes the shuffle so that the results are reproducible.
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
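Because the split is applied to whole sequences before any flattening, every element of a sequence lands in the same partition and X and Y stay aligned. A toy check, assuming each element of X is a list of tokens and each element of Y the matching list of tags (the corpus below is made up for illustration):

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy corpus: 10 sentences of 1 to 3 tokens, each with one tag per token
X = [[f"tok{i}_{j}" for j in range(i % 3 + 1)] for i in range(10)]
Y = [["O"] * len(sentence) for sentence in X]

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

print(len(X_train), len(X_test))  # 8 2
# Each test sentence still has exactly one tag per token
print(all(len(x) == len(y) for x, y in zip(X_test, Y_test)))  # True
```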

## Implementing the baseline (Most Frequent Class)

This baseline is a simple model that always predicts the most frequent class in the training set, regardless of the input.
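For intuition, the most-frequent strategy amounts to counting the training labels and returning the winner for every input. A minimal pure-Python sketch, assuming the labels are plain strings; the function names and the `B-PER`/`I-PER` tags are hypothetical and not part of this notebook:

```python
from collections import Counter


def fit_most_frequent(labels):
    """Return the most common label in a flat list of training labels."""
    # most_common(1) returns a list holding a single (label, count) pair
    return Counter(labels).most_common(1)[0][0]


def predict_most_frequent(majority_label, inputs):
    """Predict the same label for every input, ignoring the input itself."""
    return [majority_label for _ in inputs]


# Hypothetical token-level training labels
train_labels = ["O", "O", "B-PER", "O", "I-PER", "O"]
majority = fit_most_frequent(train_labels)
print(majority)  # O
print(predict_most_frequent(majority, ["Alice", "visited", "Paris"]))  # ['O', 'O', 'O']
```

One difference on ties: `Counter` keeps the label it saw first, whereas scikit-learn's `DummyClassifier` picks the first class in sorted order, so the two can disagree when two classes are equally frequent.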

In [None]:
def flatten(nested):
    """Concatenate a list of lists (e.g. one list per sentence) into a single flat list."""
    return [item for sublist in nested for item in sublist]
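`flatten` turns the per-sequence lists into one long list with one sample per element, which is the shape `DummyClassifier` is fitted on. A short usage sketch on hypothetical toy data, assuming each sequence is a sentence of tokens with one tag per token:

```python
def flatten(nested):
    """Concatenate a list of lists into a single flat list."""
    return [item for sublist in nested for item in sublist]


# Hypothetical sentences and their per-token tags
sentences = [["John", "lives", "here"], ["Hi"]]
tags = [["B-PER", "O", "O"], ["O"]]

print(flatten(sentences))  # ['John', 'lives', 'here', 'Hi']
print(flatten(tags))       # ['B-PER', 'O', 'O', 'O']
```

`list(itertools.chain.from_iterable(nested))` from the standard library gives the same result.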

In [None]:
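# Baseline 1: always predict the most frequent class.
# DummyClassifier ignores the feature values; X is only used to count the samples.
from sklearn.dummy import DummyClassifier

dummy_clf = DummyClassifier(strategy="most_frequent")
dummy_clf.fit(flatten(X_train), flatten(Y_train))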

In [None]:
# Flatten the test inputs into a single list
X_test_flatten = flatten(X_test)

# Predict a label for each flattened test input
Y_pred = dummy_clf.predict(X_test_flatten)

## Evaluating the model

In [None]:
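# Evaluate the model on the test set: per-class precision, recall and F1 score.
# zero_division=0 reports precision 0 (rather than 1) for classes that are never predicted.
from sklearn.metrics import classification_report

print(classification_report(flatten(Y_test), Y_pred, zero_division=0))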

The most frequent class is O, the label for tokens that lie outside any entity.

However, because this baseline always predicts O, it never predicts any of the entity classes we are interested in, so its recall on every entity class is zero.
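To see concretely why the entity scores collapse, here is a toy run on made-up token labels (the `B-LOC`/`B-PER` tags are hypothetical): every entity class gets zero recall, while O gets perfect recall but imperfect precision.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical token-level labels: mostly "O", plus a few entity tags
y_train = ["O", "O", "O", "O", "B-LOC", "O", "B-PER", "O"]
y_test = ["O", "B-PER", "O", "B-LOC", "O"]

clf = DummyClassifier(strategy="most_frequent")
# The feature values are ignored, so a constant placeholder feature is enough
clf.fit([[0]] * len(y_train), y_train)
y_pred = clf.predict([[0]] * len(y_test))

labels = ["B-LOC", "B-PER", "O"]
precision, recall, f1, support = precision_recall_fscore_support(
    y_test, y_pred, labels=labels, zero_division=0
)
for label, r, f in zip(labels, recall, f1):
    print(f"{label}: recall={r:.2f}, f1={f:.2f}")
# B-LOC: recall=0.00, f1=0.00
# B-PER: recall=0.00, f1=0.00
# O: recall=1.00, f1=0.75
```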

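## Implementing another baseline (Random)

This baseline predicts each label at random, sampling from the class distribution of the training set (`strategy="stratified"`).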
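The stratified strategy draws each prediction independently, with each class chosen in proportion to its frequency among the training labels. A NumPy sketch of the same idea, assuming string labels; it mirrors the behaviour but is not scikit-learn's implementation, and the function name and toy labels are hypothetical:

```python
import numpy as np


def stratified_predict(train_labels, n_samples, seed=0):
    """Sample n_samples labels with probabilities equal to the training class frequencies."""
    classes, counts = np.unique(train_labels, return_counts=True)
    probs = counts / counts.sum()
    rng = np.random.default_rng(seed)
    return rng.choice(classes, size=n_samples, p=probs)


train_labels = ["O"] * 8 + ["B-PER"] * 2  # hypothetical: 80% O, 20% B-PER
preds = stratified_predict(train_labels, n_samples=10_000)
print((preds == "O").mean())  # close to 0.8
```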

In [None]:
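# Baseline 2: random predictions that follow the training class distribution.
# random_state makes the random predictions reproducible.
dummy_random = DummyClassifier(strategy="stratified", random_state=42)
dummy_random.fit(flatten(X_train), flatten(Y_train))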

In [None]:
# Flatten the test inputs into a single list
X_test_flatten = flatten(X_test)

# Predict a random label for each flattened test input
Y_pred = dummy_random.predict(X_test_flatten)

In [None]:
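# Evaluate the model on the test set: per-class precision, recall and F1 score.
# zero_division=0 reports precision 0 (rather than 1) for classes that are never predicted.
from sklearn.metrics import classification_report

print(classification_report(flatten(Y_test), Y_pred, zero_division=0))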
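Because the stratified predictions are drawn independently of the true labels, the scores of this baseline can be estimated in advance. If class c has frequency p_c among the training labels and q_c among the test labels, its recall is approximately P(pred = c | true = c) = p_c and its precision approximately P(true = c | pred = c) = q_c, so its F1 is roughly 2 p_c q_c / (p_c + q_c), which is about p_c when the two distributions match. A small simulation with a hypothetical three-tag label set and made-up frequencies agrees:

```python
import numpy as np

rng = np.random.default_rng(0)
classes = np.array(["O", "B-PER", "B-LOC"])  # hypothetical tag set
p = np.array([0.90, 0.06, 0.04])             # hypothetical class frequencies

n = 200_000
# True labels and stratified predictions follow the same distribution,
# but are drawn independently of each other.
y_true = rng.choice(classes, size=n, p=p)
y_pred = rng.choice(classes, size=n, p=p)

results = {}
for c, prior in zip(classes, p):
    tp = np.sum((y_pred == c) & (y_true == c))
    precision = tp / np.sum(y_pred == c)
    recall = tp / np.sum(y_true == c)
    f1 = 2 * precision * recall / (precision + recall)
    results[str(c)] = (precision, recall, f1)
    print(f"{c}: precision={precision:.3f}, recall={recall:.3f}, f1={f1:.3f} (expected about {prior:.2f})")
```

In other words, a random baseline scores well only on the dominant O class, and its entity-level F1 stays close to how rare each entity class is.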
