{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Implicit Hate Speech Detection using BERT\n",
    "\n",
    "## 1. Project Setup\n",
    "This notebook walks through training a BERT-based classifier to detect implicit hate speech: loading and preprocessing the data, building the model, training it, and inspecting the training history.\n",
    "\n",
    "First, we import the necessary libraries and our custom helper functions from the `src` directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "import os\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Add the src directory to the Python path to import our modules\n",
    "sys.path.append(os.path.abspath(os.path.join('..', 'src')))\n",
    "\n",
    "from data_processing import load_and_preprocess_data\n",
    "from model import build_hate_speech_model\n",
    "from train import train_model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Load and Preprocess the Data\n",
    "We use our custom function to load the dataset from the `../data/` directory and split it into training and testing sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "DATA_FILEPATH = '../data/your_dataset_name.csv' # <-- IMPORTANT: Change this to your actual CSV file name\n",
    "\n",
    "X_train, X_test, y_train, y_test = load_and_preprocess_data(DATA_FILEPATH)"
   ]
  },
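  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check on the split, the sketch below prints the size of each set and the label counts per split. It assumes `y_train` and `y_test` are one-dimensional array-likes of class labels (the return format of `load_and_preprocess_data` is not shown in this notebook); adjust it if the labels are one-hot encoded."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Assumption: the labels are 1-D array-likes of class ids (not one-hot vectors).\n",
    "print(f'Training examples: {len(X_train)}, test examples: {len(X_test)}')\n",
    "\n",
    "# Count how often each label occurs in each split; a label missing from a split shows as 0.\n",
    "label_counts = pd.DataFrame({\n",
    "    'train': pd.Series(list(y_train)).value_counts(),\n",
    "    'test': pd.Series(list(y_test)).value_counts(),\n",
    "}).fillna(0).astype(int)\n",
    "print(label_counts)"
   ]
  },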
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Build the BERT Model\n",
    "Next, we build our classification model. The `build_hate_speech_model` function constructs a fine-tunable BERT model with additional dense layers for classification."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hate_speech_model = build_hate_speech_model()"
   ]
  },
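  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To check what was built, the sketch below prints a layer-by-layer summary. It assumes `build_hate_speech_model` returns a Keras model (suggested by the `history.history` object used later, but not confirmed in this notebook), so the call is guarded."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Assumption: the model is a tf.keras.Model, which provides summary().\n",
    "if hasattr(hate_speech_model, 'summary'):\n",
    "    hate_speech_model.summary()\n",
    "else:\n",
    "    # Fall back to showing the object type if it is not a Keras model.\n",
    "    print(type(hate_speech_model))"
   ]
  },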
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Train the Model\n",
    "Now we train the model using our prepared datasets. The `train_model` function handles tokenization and the training loop."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "history = train_model(hate_speech_model, X_train, y_train, X_test, y_test, epochs=3, batch_size=16)"
   ]
  },
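  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before plotting, it can help to see which metrics were recorded. The sketch below prints each metric's value after the final epoch, assuming `train_model` returns a Keras-style `History` object whose `history` attribute maps metric names to per-epoch lists (the plotting cell relies on the same assumption)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Assumption: history.history is a dict of {metric name: list of per-epoch values}.\n",
    "for metric_name, values in history.history.items():\n",
    "    print(f'{metric_name}: {values[-1]:.4f}')"
   ]
  },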
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Evaluate Performance\n",
    "Finally, we plot the training history to see how the recorded metrics (such as accuracy and AUC) changed from epoch to epoch and to spot signs of overfitting, for example validation loss rising while training loss keeps falling."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_training_history(history):\n",
    "    \"\"\"Plot every metric recorded in history.history against the epoch index.\"\"\"\n",
    "    pd.DataFrame(history.history).plot(figsize=(10, 6))\n",
    "    plt.grid(True)\n",
    "    plt.gca().set_ylim(0, 1)  # Clip the y-axis to [0, 1]; loss values above 1 are cut off\n",
    "    plt.title('Model Training History')\n",
    "    plt.xlabel('Epoch')\n",
    "    plt.ylabel('Metric Value')\n",
    "    plt.show()\n",
    "\n",
    "plot_training_history(history)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
