<a href="https://colab.research.google.com/github/kennethmugo/Swahili-SMS-Spam-Detection/blob/main/research/gpt4_1_swahili_spam_detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [6]:
from openai import OpenAI
import os
import json
import pandas as pd
import numpy as np
from tqdm import tqdm
import kagglehub
from google.colab import userdata

In [36]:
## Download the dataset from Kaggle
path = kagglehub.dataset_download("henrydioniz/swahili-sms-detection-dataset")
full_path = os.path.join(path, "bongo_scam.csv")
df = pd.read_csv(full_path)
df.head()

Unnamed: 0,Category,Sms
0,trust,"Nipigie baada ya saa moja, tafadhali."
1,scam,Naomba unitumie iyo Hela kwenye namba hii ya A...
2,scam,"666,KARIBU FREEMASON UTIMIZE NDOTO KATIKA BIAS..."
3,trust,Watoto wanapenda sana zawadi ulizowaletea.
4,scam,IYO PESA ITUME KWENYE NAMBA HII 0657538690 JIN...


In [37]:
## Rename the labels: trust -> ham and scam -> spam.
mapper = {"trust": "ham", "scam": "spam"}
df["Category"] = df["Category"].map(mapper)
df.head()

Unnamed: 0,Category,Sms
0,ham,"Nipigie baada ya saa moja, tafadhali."
1,spam,Naomba unitumie iyo Hela kwenye namba hii ya A...
2,spam,"666,KARIBU FREEMASON UTIMIZE NDOTO KATIKA BIAS..."
3,ham,Watoto wanapenda sana zawadi ulizowaletea.
4,spam,IYO PESA ITUME KWENYE NAMBA HII 0657538690 JIN...


In [38]:
# Set the number of samples per class
n_per_class = 50

df = (
    df.groupby("Category", group_keys=False)
      .apply(lambda x: x.sample(n=n_per_class, random_state=42))
      .reset_index(drop=True)
)
print(f"Length of the dataframe now: {len(df)}")

Length of the dataframe now: 100
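
As a possible alternative to `groupby(...).apply(...)`, pandas also provides `DataFrameGroupBy.sample` (pandas 1.1 and later), which draws a fixed number of rows per group directly. The sketch below runs on a made-up toy frame (`toy` and its contents are illustrative, not the Kaggle data), and the rows it selects are not guaranteed to match the ones sampled above for the same seed.

```python
import pandas as pd

# Toy stand-in for the SMS dataframe (illustrative data only)
toy = pd.DataFrame({
    "Category": ["ham"] * 5 + ["spam"] * 5,
    "Sms": [f"message {i}" for i in range(10)],
})

n_per_class = 2

# Draw n_per_class rows from each Category group
balanced = (
    toy.groupby("Category")
       .sample(n=n_per_class, random_state=42)
       .reset_index(drop=True)
)
print(balanced["Category"].value_counts().to_dict())
```

Each class ends up with exactly `n_per_class` rows, which is the same balancing the cell above performs.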




We'll now use the `GPT-4.1` model provided by OpenAI to try out zero-shot classification of the messages.

In [18]:
# Initialize the OpenAI client with your API key
client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))

In [43]:
def classify_message(message: str, model="gpt-4.1-2025-04-14"):
    system_prompt = (
        "You are a spam classifier. Classify the given message as 'spam' or 'ham' (not spam). "
        "The message may contain Swahili, English, or both, and may include misspellings. "
        "Explain your reasoning briefly. Your response must be a JSON object with two keys: "
        "'classification' and 'explanation'."
    )

    user_prompt = f"Message: \"{message}\""

    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
            temperature=0.2,
        )

        reply = response.choices[0].message.content.strip()

        # Parse the reply as JSON
        result = json.loads(reply)
        return result

    except Exception as e:
        return {
            "classification": "error",
            "explanation": f"Failed to classify message: {str(e)}"
        }
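
`classify_message` assumes the reply is bare JSON; if the model wraps it in a Markdown code fence, `json.loads` raises and the message is recorded as `error`. A minimal sketch of a more tolerant parser follows. `parse_json_reply` is a hypothetical helper, not part of the notebook or of the OpenAI client, and it only handles the fence case.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep this block readable


def parse_json_reply(reply: str) -> dict:
    """Parse a model reply as JSON, tolerating an optional Markdown code fence."""
    text = reply.strip()
    # Drop a leading fence (optionally tagged "json") and a trailing fence
    text = re.sub(r"^" + FENCE + r"(?:json)?\s*", "", text)
    text = re.sub(r"\s*" + FENCE + r"$", "", text)
    return json.loads(text)


plain = '{"classification": "spam", "explanation": "asks for money"}'
fenced = FENCE + "json\n" + plain + "\n" + FENCE
print(parse_json_reply(plain)["classification"])
print(parse_json_reply(fenced)["classification"])
```

Swapping this in for the `json.loads(reply)` call would be one way to reduce spurious `error` labels; replies that are not valid JSON after the fence is removed would still raise.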

In [44]:
message = "HELLO. Ungana na wakenya wengi wanoSHINDA katika PICK A BOX.2024 END YEAR Bonus NI from 50,000. BONYEZA *201# BILA Credo upick BOX YAKO.STOP *456*9*5#"
json_response = classify_message(message)
classification = json_response.get('classification', 'unknown').lower()
explanation = json_response.get('explanation', 'No explanation provided')
print(f"Classification: {classification}")
print(f"Explanation: {explanation}")

Classification: spam
Explanation: The message promotes a lottery-like game ('PICK A BOX') with promises of large cash bonuses, uses capital letters for emphasis, and urges the recipient to dial a USSD code to participate. It also includes an opt-out code, which is common in promotional or unsolicited messages. These are typical characteristics of spam, especially in the context of mobile promotions in Kenya.


In [45]:
## Now let us make inference on the dataset
sms_messages = df['Sms'].tolist()
ground_truth_labels = df['Category'].tolist()

predicted_labels = []
model_explanations = []

for i in tqdm(range(len(sms_messages)), desc="Classifying messages..."):
    message = sms_messages[i]
    json_response = classify_message(message)
    classification = json_response.get('classification', 'unknown').lower()
    explanation = json_response.get('explanation', 'No explanation provided')
    predicted_labels.append(classification)
    model_explanations.append(explanation)

# Ensure predicted_labels and ground_truth_labels have the same length
min_len = min(len(predicted_labels), len(ground_truth_labels))
predicted_labels = predicted_labels[:min_len]
ground_truth_labels = ground_truth_labels[:min_len]
model_explanations = model_explanations[:min_len]

# Keep only 'ham'/'spam' predictions for the metrics (drops 'error', 'unknown', or any other label)
valid_indices = [i for i, label in enumerate(predicted_labels) if label in ['ham', 'spam']]
filtered_predicted_labels = [predicted_labels[i] for i in valid_indices]
filtered_ground_truth_labels = [ground_truth_labels[i] for i in valid_indices]
filtered_explanations = [model_explanations[i] for i in valid_indices]

Classifying messages...: 100%|██████████| 100/100 [02:38<00:00,  1.59s/it]


In [46]:
from sklearn.metrics import recall_score, accuracy_score, precision_score, f1_score

if len(filtered_predicted_labels) > 0:
    # Calculate metrics
    recall = recall_score(filtered_ground_truth_labels, filtered_predicted_labels, pos_label='spam')
    precision = precision_score(filtered_ground_truth_labels, filtered_predicted_labels, pos_label='spam')
    accuracy = accuracy_score(filtered_ground_truth_labels, filtered_predicted_labels)
    f1 = f1_score(filtered_ground_truth_labels, filtered_predicted_labels, pos_label='spam')

    print(f"\n--- Classification Metrics ---")
    print(f"Recall (Spam): {recall:.4f}")
    print(f"Precision (Spam): {precision:.4f}")
    print(f"Accuracy: {accuracy:.4f}")
    print(f"F1 Score (Spam): {f1:.4f}")
else:
    print("\nNo valid predictions were obtained to calculate metrics.")


--- Classification Metrics ---
Recall (Spam): 0.7800
Precision (Spam): 1.0000
Accuracy: 0.8900
F1 Score (Spam): 0.8764


In [47]:
# Add the predicted labels to the dataframe for inspection
df['Predicted_Category'] = predicted_labels
df['Explanation'] = model_explanations
print("\nDataFrame with predictions:")
print(df[['Sms', 'Category', 'Explanation', 'Predicted_Category']].head())
print(df['Predicted_Category'].value_counts())
print(df['Category'].value_counts())


DataFrame with predictions:
                                                 Sms Category  \
0  Bro, kuna movie mpya imeachiwa leo. Je, tutaza...      ham   
1                      Tafadhali nipe maelezo zaidi.      ham   
2          Nitaandika ripoti mara tu nitakapomaliza.      ham   
3  Niambie ukweli, unafikiri Ronaldo bado ana kiw...      ham   
4                        Nisaidie na namba ya fundi.      ham   

                                         Explanation Predicted_Category  
0  The message is a casual conversation between f...                ham  
1  The message translates to 'Please give me more...                ham  
2  The message is in Swahili and translates to 'I...                ham  
3  The message is a casual question in Swahili as...                ham  
4  The message translates to 'Help me with the te...                ham  
Predicted_Category
ham     61
spam    39
Name: count, dtype: int64
Category
ham     50
spam    50
Name: count, dtype: int64


`GPT-4.1` works reasonably well as a zero-shot classifier (89% accuracy, with 100% precision and 78% recall on spam), although `Qwen3-4B` beats it. That is impressive given that no training is needed out of the box. Let us see if there are common themes driving certain predictions.

In [48]:
predicted_spam_mask = df['Predicted_Category'] == 'spam'
actual_spam_mask = df['Category'] == 'spam'
spam_messages = df[predicted_spam_mask & actual_spam_mask]
spam_messages.head()

Unnamed: 0,Category,Sms,Predicted_Category,Explanation
50,spam,"666,KARIBU FREEMASON UTIMIZE NDOTO KATIKA BIAS...",spam,The message promotes joining a secretive organ...
52,spam,Mjukuu wangu ndagu niliyokukabizi hiyo uwe mak...,spam,"The message contains elements typical of spam,..."
53,spam,Nitumie tu kwenye hii Tigo 0733822240 jina SAL...,spam,The message asks the recipient to send somethi...
54,spam,Naomba unitumie iyo pesa kwenye namba hii ya A...,spam,The message requests money to be sent to a spe...
55,spam,"666,KARIBU FREEMASON UTIMIZE NDOTO KATIKA BIAS...",spam,The message promotes joining a secret society ...


In [53]:
for row in spam_messages.sample(5).iterrows():
    print("----------------------")
    print(f"SMS: {row[1]['Sms']}")
    print(f"Explanation: {row[1]['Explanation']}")

----------------------
SMS: mjukuu wangu utafuta ji wako mgumu ela hazikai mkononi pakazinaisha unasota sana mpenzi hamuelewani je utatunza siri nikikusaidia pesa bila mashaliti magumu nipigie nikwelekeze NO .. 0655251448
Explanation: The message offers unsolicited financial help and requests the recipient to call a phone number for further instructions. It uses persuasive language to entice the recipient with money and secrecy, which are common characteristics of spam and potential scams.
----------------------
SMS: Congratulations! Your CV has passed. You can get 6,000,000TZS in a day. for details: wa.me/2550654321098
Explanation: The message promises a large sum of money (6,000,000 TZS) in a short time, which is a common tactic in spam and scam messages. It uses generic congratulatory language and provides a suspicious link (wa.me/...), which is often used to lure recipients into scams. There is no specific information about the job or sender, further indicating it is likely spam.
-

The explanations seem to make sense. It seems that most of the spam messages either ask recipients to send money to a certain number or include phishing links that direct them elsewhere.

Since the recall was at 78%, let us see some of the spam messages that were misclassified as ham to see if there are any commonalities.

In [54]:
predicted_ham_mask = df['Predicted_Category'] == 'ham'
misclassified_spam_messages = df[predicted_ham_mask & actual_spam_mask]

for row in misclassified_spam_messages.iterrows():
    print("----------------------")
    print(f"SMS: {row[1]['Sms']}")
    print(f"Explanation: {row[1]['Explanation']}")

----------------------
SMS: Au nitumie kwenye M-Pesa Namba.0696530433 jina litakuja OLIVA MATIAS
Explanation: The message is asking if money should be sent to a specific M-Pesa number and provides a name for verification. There are no indications of unsolicited offers, phishing attempts, or suspicious links. The language and context suggest a personal or transactional conversation, not spam.
----------------------
SMS: Habari za mchana. Mimi  mwenye nyumba wako hii namba yangu ya tigo. Mbona kimya na siku zinazidi kwenda...?
Explanation: The message is written in Swahili and translates to 'Good afternoon. I am your landlord, this is my Tigo number. Why are you silent and days are passing by...?' The content is personal, referencing a landlord-tenant relationship and does not contain any promotional content, suspicious links, or requests for sensitive information. It appears to be a legitimate communication rather than spam.
----------------------
SMS: Nitumie tu kwenye hii Halotel 0615

Well, there is a theme here! When a message solicits money from recipients by providing a phone number and a verification name, this model seems to think the message is trustworthy.
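
One way to check how widely this theme holds is to flag messages that contain both a phone-like number and the word `jina` (name). The sketch below is only a rough heuristic: the 10-digit pattern starting with 0 and the helper name `mentions_phone_and_name` are assumptions made for illustration, and the two example messages are copied from the output above.

```python
import re

# Assumption: a phone number is a run of exactly 10 digits starting with 0
PHONE_RE = re.compile(r"(?<!\d)0\d{9}(?!\d)")


def mentions_phone_and_name(sms: str) -> bool:
    """True if the SMS has a phone-like number and the word 'jina' (name)."""
    return bool(PHONE_RE.search(sms)) and "jina" in sms.lower()


examples = [
    "Au nitumie kwenye M-Pesa Namba.0696530433 jina litakuja OLIVA MATIAS",
    "Habari za mchana. Mimi  mwenye nyumba wako hii namba yangu ya tigo. Mbona kimya na siku zinazidi kwenda...?",
]
flags = [mentions_phone_and_name(sms) for sms in examples]
for flag, sms in zip(flags, examples):
    print(flag, "|", sms[:40])
```

Applied to `misclassified_spam_messages['Sms']`, the share of flagged rows would give a rough measure of how many of the misses follow this pattern; the landlord message above, for instance, does not.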

Let us see if there were any ham messages classified as spam.

In [55]:
actual_ham_mask = df['Category'] == 'ham'
misclassified_ham_messages = df[predicted_spam_mask & actual_ham_mask]

for row in misclassified_ham_messages.iterrows():
    print("----------------------")
    print(f"SMS: {row[1]['Sms']}")
    print(f"Explanation: {row[1]['Explanation']}")

Well, it seems there are none, which matches the 100% precision on spam: the model never flagged a ham message as spam. Let us eyeball some of the correctly classified ham messages and see the explanations.

In [56]:
ham_messages = df[predicted_ham_mask & actual_ham_mask]
for row in ham_messages.sample(5).iterrows():
    print("----------------------")
    print(f"SMS: {row[1]['Sms']}")
    print(f"Explanation: {row[1]['Explanation']}")

----------------------
SMS: Nitakutafuta baadaye kwa mazungumzo zaidi.
Explanation: The message translates to 'I will look for you later for more conversation.' It is a normal, personal message with no signs of spam such as unsolicited offers, promotions, or suspicious links.
----------------------
SMS: Niambie ukweli, unafikiri Ronaldo bado ana kiwango cha juu?
Explanation: The message is a casual question in Swahili asking for an opinion about Ronaldo's current performance level. It does not contain any promotional content, suspicious links, or requests for personal information, which are typical characteristics of spam.
----------------------
SMS: Nimemaliza kuandika ripoti, nitakutumia baadaye.
Explanation: The message is in Swahili and translates to 'I have finished writing the report, I will send it to you later.' It is a normal, personal communication with no signs of spam such as promotional content, suspicious links, or requests for sensitive information.
---------------------