<a href="https://colab.research.google.com/github/olorunfemibabalola/Bias-Detection-NLP/blob/main/NLP576757_Code_s5819556.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**PROJECT:** Inclusive HR Policy Assistant (Chatbot) & Bias Auditor

**UNIT:** Language models and NLP (576757)

**AUTHOR:** Babalola Praise Olorunfemi

**STUDENT ID:** s5819556



==============================================================================


**ENVIRONMENT SETUP & INSTALLATION**

Installation of all the necessary libraries and modules

In [None]:
# 'bitsandbytes' provides the 4-bit quantization that lets big models fit on smaller GPUs.
# 'pymupdf4llm' extracts text/tables from PDFs into Markdown for our AI.
# 'transformers' loads and runs the AI models.
# 'accelerate' handles device placement (needed for device_map="auto").
# 'gradio' builds our web interface.
print("⏳ Installing SOTA libraries... (This takes ~1 minute)")
!pip install -q -U transformers accelerate bitsandbytes gradio pymupdf4llm

⏳ Installing SOTA libraries... (This takes ~1 minute)
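Because the install cell upgrades every package to its latest release, it can help to record which versions were actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the `REQUIRED` list are illustrative and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages named in the pip command above (list repeated here as an assumption).
REQUIRED = ["transformers", "accelerate", "bitsandbytes", "gradio", "pymupdf4llm"]


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Running this right after the install cell makes a later failure easier to trace back to a specific package version.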

In [None]:
# Importing the tools we'll need:
import torch # For deep learning and GPU stuff.
import gradio as gr # To make our user interface (UI).
import pymupdf4llm # To read PDFs for the AI.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig # From Hugging Face:
# AutoModelForCausalLM: Loads AI text-generation models.
# AutoTokenizer: Converts text to numbers for the model.
# BitsAndBytesConfig: Helps make models smaller to save memory.

Consider using the pymupdf_layout package for a greatly improved page layout analysis.


**MODEL LOADING (Qwen 2.5 - SOTA Ungated Model)**

This cell loads the large language model.

We're using "Qwen 2.5 7B Instruct" because it follows instructions well, is free to use, and is ungated (it can be downloaded from Hugging Face without requesting access).


In [None]:

MODEL_ID = "Qwen/Qwen2.5-7B-Instruct" # The model name.

print(f"🚀 Loading {MODEL_ID} with 4-bit quantization...") # Progress message.

# Configuration to load the model using less memory (4-bit quantization).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

try:
    # Loading the tokenizer and the model.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        quantization_config=bnb_config, # Apply our memory-saving settings.
        device_map="auto" # Automatically uses the GPU if available.
    )
    print("✅ Model loaded successfully on GPU!") # Success!
except Exception as e:
    print(f"❌ Error loading model: {e}") # Encounters an error!
    print("Tip: Ensure your Runtime is set to T4 GPU.") # Hint for common issues.

🚀 Loading Qwen/Qwen2.5-7B-Instruct with 4-bit quantization...



✅ Model loaded successfully on GPU!
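To see why 4-bit quantization matters on a T4, a rough back-of-the-envelope estimate of the weight memory is enough. The sketch below assumes a nominal 7 billion parameters (the "7B" in the model name) and ignores quantization metadata, layers that stay in 16-bit, activations, and the KV cache, so real usage will be higher. The helper `weight_memory_gb` is illustrative only.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NOMINAL_PARAMS = 7e9  # Assumption: nominal parameter count taken from the "7B" name.

fp16_gb = weight_memory_gb(NOMINAL_PARAMS, 16)  # 16-bit weights
nf4_gb = weight_memory_gb(NOMINAL_PARAMS, 4)    # 4-bit weights

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f" 4-bit weights: ~{nf4_gb:.1f} GB")
```

Under these assumptions the 16-bit weights alone come close to filling a 16 GB T4, while the 4-bit weights leave room for the prompt, the generated tokens, and the Gradio app.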


**SYSTEM PROMPTS**

These are the important rules that tell our model how to act.

In [None]:
# Rules for 'Auditor' mode: checking documents for bias.
AUDITOR_PROMPT = """
You are a Senior HR Compliance Officer. Your job is to audit corporate policies for social bias.
STRICT RULES:
1.  Analyze the text for THREE types of bias: Gender, Race/Ethnicity, and Ageism.
2.  Flag text as biased only when the bias is obvious and clearly noticeable.
3.  Do NOT summarize the document. List specific problematic sentences.
4.  For each finding, assign a SEVERITY SCORE (1-10) and provide a NEUTRAL REWRITE.
5.  If the text is safe, output: "✅ COMPLIANCE PASS: No bias detected."
"""

# Rules for 'Chatbot' mode: answering HR questions.
CHATBOT_PROMPT = """
You are a helpful HR Policy Assistant.
1. Answer user questions about HR policies concisely.
2. SILENT SENTINEL: Continuously monitor the user's input.
   - If the user asks something biased (e.g., "How to hire only young people?"), REFUSE to answer and explain why it violates the UK Equality Act 2010.
   - If the input is neutral, answer normally.
"""

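A system prompt reaches the model as the first entry in a list of role-tagged messages. The sketch below shows that pairing and renders it in a ChatML-style layout, which is the general format Qwen chat models use. It is only an approximation for illustration: in the notebook the real formatting is done by `tokenizer.apply_chat_template`, whose output may differ in detail. The names `render_chatml` and `EXAMPLE_SYSTEM_PROMPT` are hypothetical.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render role/content messages in a ChatML-style layout (approximation)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Leaves an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


EXAMPLE_SYSTEM_PROMPT = "You are a helpful HR Policy Assistant."  # Shortened stand-in.

messages = [
    {"role": "system", "content": EXAMPLE_SYSTEM_PROMPT},
    {"role": "user", "content": "How many days of annual leave do I get?"},
]

print(render_chatml(messages))
```

Switching between Auditor and Chatbot behaviour therefore only requires swapping the content of the first message.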
**LOGIC ENGINE (Processing & Inference)**

This is the main part that makes the AI think and respond.

In [None]:
def run_inference(messages, max_tokens=1024):

    """Sends questions to the model and gets its response."""
    # Formats our conversation messages so the AI understands them.
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    # Converts the text into numbers for the model and moves it to the GPU.
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

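    # Tells the model to generate an answer.
    # 'max_new_tokens' caps the reply length; a low 'temperature' makes sampling less random;
    # 'top_p' restricts sampling to the most probable tokens (nucleus sampling).
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=max_tokens,
        do_sample=True, # Sampling must be on for 'temperature' and 'top_p' to take effect.
        temperature=0.2, # Keeps responses focused.
        top_p=0.9
    )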

    # Removes the original prompt from the AI's output to get just the new response.
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    # Turns the numbers back into readable text and returns the single reply as a string.
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
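The prompt-stripping step relies on the fact that `generate` returns the prompt tokens followed by the new tokens, so slicing off the first `len(input_ids)` entries leaves only the reply. The sketch below reproduces that slicing with plain Python lists; the token IDs are made up and `strip_prompt_tokens` is an illustrative name.

```python
def strip_prompt_tokens(input_ids_batch, output_ids_batch):
    """Drop the leading prompt tokens from each generated sequence."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]


# Made-up token IDs: the output repeats the 4 prompt tokens, then adds 3 new ones.
prompt_batch = [[101, 7, 8, 9]]
output_batch = [[101, 7, 8, 9, 42, 43, 2]]

print(strip_prompt_tokens(prompt_batch, output_batch))  # [[42, 43, 2]]
```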

In [None]:
def logic(message, history):
    # This function runs every time a user types or uploads something.

    user_text = message["text"] # What the user typed.
    files = message["files"] # Any files they uploaded.

    # --- DOCUMENT AUDIT MODE ---
    # If files are uploaded, we're auditing them.
    if files:
        # 1. Get text from PDF.
        # 'files' is a list, so we take the first file path.
        pdf_path = files[0]
        try:
            # Reads the PDF and turns it into Markdown for the AI.
            doc_content = pymupdf4llm.to_markdown(pdf_path)
        except Exception as e:
            return f"❌ Error reading PDF: {str(e)}"

        # 2. Prepare messages for the Auditor AI.
        # Includes the auditor rules and the document content (up to 6000 characters).
        messages_for_inference = [
            {"role": "system", "content": AUDITOR_PROMPT},
            {"role": "user", "content": f"DOCUMENT TO AUDIT:\n{doc_content[:6000]}\n\nAUDIT REPORT:"}
        ]
        # Send to the AI and return its report.
        return run_inference(messages_for_inference)

    # --- CHATBOT MODE ---
    # If no files, it's just a normal chat.
    else:
        # Check if the user typed a word to end the conversation.
        trigger_words = ["quit", "exit", "end conversation", "stop"]
        if user_text.lower().strip() in trigger_words:
            return "Conversation ended. Feel free to type a new message to start a fresh interaction or use the 'Clear' button to reset the chat."

        # 1. Build the chat history for the AI.
        # Start with the chatbot's rules.
        messages_for_inference = [{"role": "system", "content": CHATBOT_PROMPT}]

        # Loop through past messages to add them to the AI's memory.
        for chat_turn in history:
            human_msg = None # User's message.
            ai_msg = None # AI's reply.

            # Make sure it's a list/tuple before trying to get messages.
            if isinstance(chat_turn, (list, tuple)):
                if len(chat_turn) > 0:
                    human_msg = chat_turn[0]
                if len(chat_turn) > 1:
                    ai_msg = chat_turn[1]
            else:
                continue # Skip weird entries.

            # Add valid past messages to the list.
            if human_msg:
                messages_for_inference.append({"role": "user", "content": human_msg})
            if ai_msg:
                messages_for_inference.append({"role": "assistant", "content": ai_msg})

        # Add the user's *current* message.
        messages_for_inference.append({"role": "user", "content": user_text})

        # Send the whole conversation to the AI.
        return run_inference(messages_for_inference)
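The history loop above expects each past turn to be a `[user, assistant]` pair. Depending on the installed Gradio version and the chat interface settings, history may instead arrive as a list of `{"role": ..., "content": ...}` dictionaries, in which case the loop would skip every entry. The following is a hedged sketch of a normalizer that accepts both shapes and keeps only plain-text content; `normalize_history` is a hypothetical helper, not part of the original notebook.

```python
def normalize_history(history):
    """Convert pair-style or dict-style chat history into role/content dicts."""
    messages = []
    for turn in history:
        if isinstance(turn, dict):
            # Dict-style entry: keep it only if it is a plain-text user/assistant turn.
            role, content = turn.get("role"), turn.get("content")
            if role in ("user", "assistant") and isinstance(content, str) and content:
                messages.append({"role": role, "content": content})
        elif isinstance(turn, (list, tuple)):
            # Pair-style entry: first item is the user, second is the assistant.
            for role, content in zip(("user", "assistant"), turn):
                if isinstance(content, str) and content:
                    messages.append({"role": role, "content": content})
    return messages


pairs = [["Hi", "Hello!"], ["What is parental leave?", None]]
print(normalize_history(pairs))
```

Non-text entries, such as uploaded files, are dropped in this sketch so that only strings reach the chat template.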


**UI LAUNCHER (Gradio)**

This part sets up our user interface (UI) for the app!

`multimodal=True` allows us to use text and file uploads in the same chat box.

In [None]:
demo = gr.ChatInterface(
    fn=logic, # This connects our main logic to the UI.
    multimodal=True, # Allow text and files.
    title="🛡️ Inclusive HR Assistant / Auditor", # Title for the app.
    description="""
    **Instructions:** How to use the app:
    1. **Chat Mode:** Ask HR questions. Bot flags bias.
    2. **Audit Mode:** Upload a PDF policy (using '+') for an audit report.
    """,

    # Some example inputs to quickly test the app.
    examples=[
        {"text": "Is it okay to fire younger people?", "files":[]}, # Chat example.
        {"text": "Audit this policy document.", "files":[]} # Audit example (needs a file upload).
    ]
)
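With `multimodal=True`, each submission reaches `logic` as a dictionary with a `"text"` string and a `"files"` list, and the presence of a file is what switches the app into audit mode. The routing decision can be sketched as a small pure function that is easy to test without launching the UI; `choose_mode` and `END_WORDS` are illustrative names that mirror the trigger words used in `logic`.

```python
END_WORDS = {"quit", "exit", "end conversation", "stop"}


def choose_mode(message):
    """Return 'audit', 'end', or 'chat' for a {'text': ..., 'files': [...]} message."""
    if message.get("files"):
        return "audit"  # Any uploaded file triggers a document audit.
    text = (message.get("text") or "").lower().strip()
    if text in END_WORDS:
        return "end"  # The user asked to end the conversation.
    return "chat"


print(choose_mode({"text": "Audit this policy document.", "files": []}))
print(choose_mode({"text": "", "files": ["/tmp/policy.pdf"]}))
```

The first call prints `chat` because no file is attached, which is why the second example in the interface only produces an audit once a PDF has been uploaded.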

In [None]:
# This makes sure the app only launches when we run this file directly.
if __name__ == "__main__":
    print("✅ System Ready! Click the public link below to test.") # The app is ready!
    demo.launch(debug=True, share=True) # Starts the Gradio app!
    # 'debug=True' helps with troubleshooting.
    # 'share=True' creates a temporary public link to share the app.

✅ System Ready! Click the public link below to test.
Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://c5082a50ca7afcfad6.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://c5082a50ca7afcfad6.gradio.live
