In [1]:
import os
import json
import time
import pandas as pd
from datetime import datetime
from ollama import Client # Import Ollama client
from concurrent.futures import ThreadPoolExecutor, as_completed # For concurrent processing

## Connecting to Ollama (an open-source alternative to OpenAI)

The next cell sets the connection details for a local Ollama server: the host URL, the request headers, and the model to use.

In [7]:
# Constants
OLLAMA_HOST = "http://localhost:11434" # Default Ollama host
HEADERS = {"Content-Type": "application/json"}
MODEL = "deepseek-r1:8b"
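
Before sending any prompts, it can help to confirm that the Ollama server is actually reachable. The following is a minimal sketch, assuming the server exposes its `/api/tags` endpoint (which lists locally pulled models) at the host above. `ollama_is_reachable` is a hypothetical helper name and is not part of the original notebook.

```python
import http.client
import json
import urllib.request

def ollama_is_reachable(host="http://localhost:11434", timeout=2.0):
    """Return True if an Ollama server answers at `host`, else False."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            json.load(resp)  # the endpoint is expected to return a JSON object
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and timeouts are OSError subclasses;
        # ValueError covers invalid JSON or a malformed URL.
        return False

print(ollama_is_reachable())
```

If this prints `False`, the server is probably not running or is listening on a different host or port.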

## Create system and user prompts to input to the model
Most models have been trained to receive instructions in a particular way. They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

In [3]:
# Define our system prompt

system_prompt = """You are an expert topic classifier for student feedback from a vocational education and training institution.
Your task is to analyze student verbatims and assign one or more topics from a predefined list.
You must adhere to the following rules:
1.  Match the verbatim to the most relevant topics from the provided topic list.
2.  If the verbatim is not relevant to any topic on the list, return 'No Match'.
3.  You can assign more than one topic if the verbatim covers multiple subjects.
4.  Your output must be a single JSON object.
5.  The JSON object must have two keys: 'topics' and 'verbatim_text'.
6.  The value for 'topics' should be a list of strings. Each string must be a topic from the provided list or the string 'No Match'.
7.  The value for 'verbatim_text' should be the original verbatim you are analyzing.

Below is the list of predefined topics; five classification examples accompany each verbatim in the user message.

**Topic List:**
- Enrolment Process
- Student Support Services
- Course Content and Relevance
- Trainer Quality and Engagement
- Facilities and Campus Environment
- Timetable and Scheduling
- Online Learning Platform
- Assessment and Feedback
- Career and Employment Services
- Technology and Equipment
- Communication and Information
- Student Welfare and Wellbeing
- Course Fees and Payments
- Recognition of Prior Learning (RPL)
- Work Placement
- Graduation and Completion"""
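
Rule 6 of the system prompt requires every returned topic to come from the predefined list, but a model can still return something else. One way to enforce the rule on the Python side is sketched below. `ALLOWED_TOPICS` and `clean_topics` are hypothetical names introduced for illustration; the set simply mirrors the topic list in the prompt.

```python
# Hypothetical helper: mirrors the topic list from the system prompt.
ALLOWED_TOPICS = {
    "Enrolment Process", "Student Support Services",
    "Course Content and Relevance", "Trainer Quality and Engagement",
    "Facilities and Campus Environment", "Timetable and Scheduling",
    "Online Learning Platform", "Assessment and Feedback",
    "Career and Employment Services", "Technology and Equipment",
    "Communication and Information", "Student Welfare and Wellbeing",
    "Course Fees and Payments", "Recognition of Prior Learning (RPL)",
    "Work Placement", "Graduation and Completion",
}

def clean_topics(topics):
    """Keep only topics from the predefined list; fall back to ['No Match']."""
    if isinstance(topics, str):
        topics = [topics]  # tolerate a bare string instead of a list
    if not isinstance(topics, list):
        return ["No Match"]
    kept = []
    for topic in topics:
        if isinstance(topic, str):
            name = topic.strip()
            if name in ALLOWED_TOPICS and name not in kept:
                kept.append(name)
    return kept or ["No Match"]

print(clean_topics(["Work Placement", "Parking"]))  # ['Work Placement']
print(clean_topics([]))                             # ['No Match']
```

Unknown labels are dropped rather than corrected, so a misspelled topic would end up as `No Match`; fuzzy matching would be a separate design choice.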


In [4]:
# A function that writes a user prompt containing few-shot examples and the verbatim to classify:

def user_prompt_for(verbatim):
    user_prompt = f"""
    You are looking at a verbatim from a student. Based on the list of topics provided, below are five examples of how to classify a verbatim.

    **Examples for Few-Shot Classification:**

    1. **Verbatim:** "I'm having trouble with the login for the online portal, and the Wi-Fi on campus is really slow."
       **Topics:** ["Online Learning Platform", "Technology and Equipment"]

    2. **Verbatim:** "John is great! He explains everything clearly and is always available to help after class."
       **Topics:** ["Trainer Quality and Engagement"]

    3. **Verbatim:** "I asked about my results from last semester, but nobody has gotten back to me. I've been waiting for weeks."
       **Topics:** ["Communication and Information", "Assessment and Feedback"]

    4. **Verbatim:** "The campus cafeteria has really limited options, and the library hours are not great for students who work."
       **Topics:** ["Facilities and Campus Environment"]

    5. **Verbatim:** "I've been working as a mechanic for 10 years, and I want to see if I can get credit for my experience towards this course."
       **Topics:** ["Recognition of Prior Learning (RPL)"]

    **New Verbatim to Classify:**
    {verbatim}
    """
    
    return user_prompt

**Create messages for the model**: Ollama's chat API accepts the same message structure as the OpenAI API.

```python
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```

In [6]:
# Create the message structure
def messages_for(verbatim):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(verbatim)}
    ]

## Bring it together

In [40]:
# Create an Ollama client instance
ollama_client = Client(host=OLLAMA_HOST)

# And now: call the Ollama chat API to perform the task.
def return_topics(verbatim_obj):
    verbatim = verbatim_obj['verbatim_text']
    custom_id = verbatim_obj['custom_id']
    
    try:        
        response = ollama_client.chat(
                model=MODEL,
                messages=messages_for(verbatim),
                options={
                    "temperature": 0.0, # Adjust as needed for classification
                    "top_p": 0.9,
                    "num_predict": 200, # Limit response length if needed
                },
                format="json" # Important: Ask Ollama for JSON output
            )

        # Get Ollama response content (slightly diff from OpenAI)
        response_content = response['message']['content']

        # Parse the JSON string from the model's response
        classification_data = json.loads(response_content)
        
        return {
                'custom_id': classification_data.get('custom_id', custom_id),
                'topics': classification_data.get('topics', ['Error: No topics']), # Use .get for safety
                'verbatim_text': classification_data.get('verbatim_text', verbatim)
            }

    except json.JSONDecodeError as e:
        # The model replied, but the content was not valid JSON
        print(f"JSON decoding error for ID {custom_id}: {e}. Raw response: {response_content}")
        return {
            'custom_id': custom_id,
            'topics': ['Error: JSON Decode Error'],
            'verbatim_text': verbatim
        }
    except Exception as e:
        # Any other failure, e.g. the Ollama server is unreachable
        print(f"Error processing verbatim ID {custom_id}: {e}")
        return {
            'custom_id': custom_id,
            'topics': ['Error: API Call Failed'],
            'verbatim_text': verbatim
        }
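
A local model server can fail transiently, for example while a model is still loading. The sketch below shows one possible way to retry a call with exponential backoff. `call_with_retries` is a hypothetical helper that is not part of the original notebook, and `flaky` only simulates a call that fails twice before succeeding.

```python
import time

def call_with_retries(fn, *args, attempts=3, base_delay=1.0, **kwargs):
    """Call fn(*args, **kwargs), retrying on any exception with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            if attempt == attempts:
                raise  # out of retries: let the caller handle the error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.2f}s")
            time.sleep(delay)

# Simulated call that fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("server busy")
    return "ok"

print(call_with_retries(flaky, base_delay=0.01))
```

Because `return_topics` catches its own exceptions, a wrapper like this would need to go around the `ollama_client.chat(...)` call inside it rather than around `return_topics` itself.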

## Create Batch 

In [32]:
# Sample list of verbatims
verbatims_to_classify = [
    "My classes are all over the place. I have to come to campus three times a week for just one or two hours each time.",
    "The course material is outdated; we're still learning about software from five years ago.",
    "I tried to get help from the student welfare office, but they were closed.",
    "My final assignment feedback was very vague, and I don't know what to improve on.",
    "I'm not sure if this is the right career path for me after finishing this course.",
    "The computer labs have really old computers, and some don't even work properly.",
    "The process to apply for this course was so confusing and the website kept crashing.",
    "My work placement was very unorganized and I felt like I didn't learn anything.",
    "The person who was meant to help me with my enrolment never got back to me.",
    "The cost of the textbooks is way too high, and I'm not sure if I can afford them.",
    "This feedback is not about any of the topics.",
    "I need help with my resume and job applications after I graduate.",
    "The fees for next semester seem to have increased without much warning.",
    "The campus security could be better, I don't feel entirely safe at night.",
    "I really enjoy the practical exercises in this course; they are very relevant to industry.",
]

In [33]:
# Prepare data for concurrent processing
data_for_processing = []
for i, verbatim in enumerate(verbatims_to_classify):
    data_for_processing.append({"custom_id": f"verbatim_{i+1}", "verbatim_text": verbatim})


In [35]:
# --- Concurrent Processing with ThreadPoolExecutor ---

# max_workers should be adjusted based on your system's resources (CPU cores, VRAM)
# and OLLAMA_NUM_PARALLEL setting. Start with a small number like 2-4.
MAX_CONCURRENT_REQUESTS = 4 

In [44]:
print(f"\nStarting batch processing with {MAX_CONCURRENT_REQUESTS} concurrent requests...")

parsed_results = []
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_REQUESTS) as executor:
    # Submit all verbatims to the executor
    job_queue = {executor.submit(return_topics, item): item for item in data_for_processing}
    
    # As results complete, retrieve them
    for job in as_completed(job_queue):
        result = job.result()
        if result: # Only add if result is not None (in case of total failure)
            parsed_results.append(result)
        # Optional: Add a progress indicator
        print(f"Processed {len(parsed_results)}/{len(data_for_processing)} verbatims...", end='\r')

print("\nBatch processing complete.")


Starting batch processing with 4 concurrent requests...
Processed 15/15 verbatims...
Batch processing complete.
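
Note that `as_completed` yields jobs in the order they finish, so `parsed_results` is not guaranteed to follow the order of the input list. A minimal sketch of restoring input order is shown below, assuming the `verbatim_<n>` format used for `custom_id` above. `fake_classify` is a hypothetical stand-in for `return_topics` that sleeps instead of calling a model.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_classify(item):
    """Stand-in for return_topics: sleeps so later items can finish first."""
    time.sleep(item["delay"])
    return {"custom_id": item["custom_id"], "topics": ["No Match"]}

items = [
    {"custom_id": "verbatim_1", "delay": 0.3},
    {"custom_id": "verbatim_2", "delay": 0.1},
    {"custom_id": "verbatim_3", "delay": 0.2},
]

results = []
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(fake_classify, item) for item in items]
    for future in as_completed(futures):
        results.append(future.result())  # arrives in completion order

# Restore input order using the numeric suffix of custom_id.
results.sort(key=lambda r: int(r["custom_id"].rsplit("_", 1)[1]))
print([r["custom_id"] for r in results])
```

Sorting on the numeric suffix rather than the raw string keeps `verbatim_10` after `verbatim_9`.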


In [54]:
for i, result in enumerate(parsed_results):
    print(f"Verbatim {i+1} : {result['verbatim_text']}")
    print(f"Predicted Topics : {result['topics']}\n")
    
    

Verbatim 1 : My final assignment feedback was very vague, and I don't know what to improve on.
Predicted Topics : ['Assessment and Feedback']

Verbatim 2 : My classes are all over the place. I have to come to campus three times a week for just one or two hours each time.
Predicted Topics : ['Timetable and Scheduling']

Verbatim 3 : The course material is outdated; we're still learning about software from five years ago.
Predicted Topics : ['Course Content and Relevance']

Verbatim 4 : I tried to get help from the student welfare office, but they were closed.
Predicted Topics : ['Student Welfare and Wellbeing']

Verbatim 5 : I'm not sure if this is the right career path for me after finishing this course.
Predicted Topics : ['Career and Employment Services']

Verbatim 6 : The computer labs have really old computers, and some don't even work properly.
Predicted Topics : ['Technology and Equipment']

Verbatim 7 : The process to apply for this course was so confusing and the website kept c

In [53]:
# Convert result to dataframe
df = pd.DataFrame(parsed_results)

# If blank, then show No Match
df['topics'] = df['topics'].apply(lambda x: 'No Match' if len(x)==0 else x)

# Clean topic display
df['topics'] = df['topics'].apply(lambda x: ', '.join(x) if isinstance(x, list) else x)

print("\n--- Final Results DataFrame ---")
display(df[['verbatim_text','topics']]) # Using display to render the DataFrame nicely in Jupyter/IPython


--- Final Results DataFrame ---


| | verbatim_text | topics |
|---|---|---|
| 0 | My final assignment feedback was very vague, a... | Assessment and Feedback |
| 1 | My classes are all over the place. I have to c... | Timetable and Scheduling |
| 2 | The course material is outdated; we're still l... | Course Content and Relevance |
| 3 | I tried to get help from the student welfare o... | Student Welfare and Wellbeing |
| 4 | I'm not sure if this is the right career path ... | Career and Employment Services |
| 5 | The computer labs have really old computers, a... | Technology and Equipment |
| 6 | The process to apply for this course was so co... | Enrolment Process, Online Learning Platform |
| 7 | My work placement was very unorganized and I f... | Work Placement |
| 8 | The cost of the textbooks is way too high, and... | Course Fees and Payments |
| 9 | The person who was meant to help me with my en... | Enrolment Process, Communication and Information |
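
Once the verbatims are classified, a natural next step is to count how often each topic appears. The sketch below uses `collections.Counter` on a small hand-written sample shaped like the entries of `parsed_results`; `sample_results` is illustrative and reuses three of the classifications shown above. It assumes `topics` is still a list, so on real data it would need to run on `parsed_results` rather than on the DataFrame after the topics have been joined into strings.

```python
from collections import Counter

# Illustrative sample shaped like entries of parsed_results.
sample_results = [
    {"topics": ["Assessment and Feedback"]},
    {"topics": ["Enrolment Process", "Online Learning Platform"]},
    {"topics": ["Enrolment Process", "Communication and Information"]},
]

# Flatten the per-verbatim topic lists and count each topic.
topic_counts = Counter(
    topic for result in sample_results for topic in result["topics"]
)

for topic, count in topic_counts.most_common():
    print(f"{count:>3}  {topic}")
```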
