# Multilingual Sentiment Analysis Lexicons for African Languages: An LLM Approach

This notebook implements the methodology and findings from research conducted by Gift Markus Xipu on **"Multilingual Sentiment Analysis Lexicons for African Languages: An LLM Approach."** 

Our study evaluates the capabilities of large language models (OpenAI, Claude, Gemini, and BERT) to perform sentiment analysis directly on African languages, specifically Sepedi, Sesotho, and Setswana, without relying on translation-based techniques. We examine the effectiveness of various prompting strategies and assess the degree of fine-tuning required to optimise performance across these languages.

The code provided here demonstrates how to leverage these LLMs to create sentiment lexicons and analyse African language text, representing a step toward more culturally and linguistically inclusive NLP tools. The approach aims to address the critical gap in NLP resources for African languages by utilising the multilingual capabilities of modern language models.

<div style="background-color: #d1f3d1; padding: 10px; border-radius: 5px;">
    Importing the necessary dependencies
</div>

In [1]:
import os
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
from datetime import datetime
import time
import requests
from typing import Dict, List, Tuple, Union, Optional, Any

# LLM-specific imports
from openai import OpenAI
import anthropic
import google.generativeai as genai
import ollama

<div style="background-color: #d1f3d1; padding: 10px; border-radius: 5px;">
    <strong>Code Explanation:</strong> This defines the base <code>LLM</code> class that serves as a common interface for all language models. It stores common attributes (name, API key, model identifier, temperature, and token limits) and declares methods that child classes must implement; calling them on the base class raises <code>NotImplementedError</code>. The <code>setup_client()</code> method establishes the connection to a model API, while <code>generate()</code> handles prompt submission and response retrieval. Each language model implementation (Claude, OpenAI, Gemini, Ollama) extends this class with its own functionality.
</div>

In [2]:
class LLM:
    
    def __init__(self, name, api_key, model, temperature=0.0, max_tokens=1000):
        self.name = name
        self.api_key = api_key
        self.model = model
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.client = None
    
    def setup_client(self):
        raise NotImplementedError("Subclasses must implement setup_client()")
    
    def generate(self, prompt, system_prompt=None):
        raise NotImplementedError("Subclasses must implement generate()")
    
    def __str__(self):
        return f"{self.name}(model='{self.model}')"
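To illustrate how a subclass plugs into this interface, the sketch below defines a hypothetical `EchoLLM` that needs no network access or API key. It is not part of the original study. The class name, the `canned_response` parameter, and the `"echo-stub"` model name are assumptions made for illustration, and the base class is repeated in compact form only so the example runs on its own.

```python
class LLM:
    """Compact copy of the base class above, repeated so this example is self-contained."""

    def __init__(self, name, api_key, model, temperature=0.0, max_tokens=1000):
        self.name = name
        self.api_key = api_key
        self.model = model
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.client = None

    def setup_client(self):
        raise NotImplementedError("Subclasses must implement setup_client()")

    def generate(self, prompt, system_prompt=None):
        raise NotImplementedError("Subclasses must implement generate()")


class EchoLLM(LLM):
    """Hypothetical offline stand-in that always returns a fixed, well-formed answer."""

    def __init__(self, canned_response="lerato, positive, 1"):
        super().__init__("Echo", None, "echo-stub")
        self.canned_response = canned_response

    def setup_client(self):
        # There is no remote service, so the "client" is just a marker object.
        self.client = object()
        return self.client

    def generate(self, prompt, system_prompt=None):
        # Same lazy-setup pattern as the provider wrappers in this notebook.
        if not self.client:
            self.setup_client()
        return self.canned_response


echo = EchoLLM()
print(echo.generate("Analyze the word: lerato"))
```

A stub like this can stand in for a real provider when checking the parsing and display code without spending API credits.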

## Initializing the LLMs used in this project

For this project we plan to use Claude, Ollama (orca-mini), Gemini, and OpenAI.

<div style="background-color: #d1ecff; padding: 10px; border-radius: 5px;">
    <strong>Claude Initialization</strong>
</div>

<div style="background-color: #d1f3d1; padding: 10px; border-radius: 5px;">
    <strong>Code Explanation:</strong> This code implements the Claude-specific LLM subclass. The <code>ClaudeLLM</code> class inherits from the base <code>LLM</code> class and provides Claude-specific implementations for client setup and text generation. It initializes with default values chosen for Claude (the <code>claude-3-7-sonnet-20250219</code> model and a 4096-token limit). The <code>generate()</code> method formats the request according to Claude's API requirements, handling both standard prompts and optional system prompts. The code then initializes a Claude instance with the provided API key, sets the temperature to 0 to minimize randomness in the responses, and creates the client for Anthropic's API.
</div>

In [3]:
# Claude-specific implementation
class ClaudeLLM(LLM):
    def __init__(self, api_key, model="claude-3-7-sonnet-20250219", temperature=0.0, max_tokens=4096):
        super().__init__("Claude", api_key, model, temperature, max_tokens)
    
    def setup_client(self):
        self.client = anthropic.Anthropic(api_key=self.api_key)
        return self.client
    
    def generate(self, prompt, system_prompt=None):
        # Set up client if not already done
        if not self.client:
            self.setup_client()
        
        # Prepare the message parameters
        message_params = {
            "model": self.model,
            "max_tokens": self.max_tokens,
            "temperature": self.temperature,
            "messages": [
                {"role": "user", "content": prompt}
            ]
        }
        
        # Add system prompt if provided
        if system_prompt:
            message_params["system"] = system_prompt
        
        # Send request to Claude
        response = self.client.messages.create(**message_params)
        
        # Return the text response
        return response.content[0].text


# Now initialize a Claude instance (replace with your API key)
CLAUDE_API_KEY = ""

# Create the Claude instance
claude = ClaudeLLM(
    api_key=CLAUDE_API_KEY,
    temperature=0.0  # Temperature 0 for near-deterministic responses
)

# Setup the client
claude.setup_client()

# Verify setup
print(f"Claude setup complete: {claude}")

Claude setup complete: Claude(model='claude-3-7-sonnet-20250219')
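The cell above leaves `CLAUDE_API_KEY` as an empty string to be filled in by hand. One common alternative, sketched here as an assumption rather than as the notebook's own method, is to read keys from environment variables so they never appear in the notebook. The helper `load_api_key` and the variable names `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, and `GEMINI_API_KEY` are hypothetical choices for this example.

```python
import os


def load_api_key(env_var, default=""):
    """Return the key stored in `env_var`, or `default` when it is unset or blank."""
    value = os.environ.get(env_var, "").strip()
    return value if value else default


# Hypothetical variable names; use whatever names your environment defines.
CLAUDE_API_KEY = load_api_key("ANTHROPIC_API_KEY")
OPENAI_API_KEY = load_api_key("OPENAI_API_KEY")
GEMINI_API_KEY = load_api_key("GEMINI_API_KEY")

# Report which providers have no key instead of failing later on the first request.
missing = [
    name
    for name, key in [("Claude", CLAUDE_API_KEY),
                      ("OpenAI", OPENAI_API_KEY),
                      ("Gemini", GEMINI_API_KEY)]
    if not key
]
if missing:
    print(f"No API key found for: {', '.join(missing)}")
```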


<div style="background-color: #b5e64c; padding: 10px; border-radius: 5px;">
    <strong>OpenAI Initialization</strong>
</div>

<div style="background-color: #d1f3d1; padding: 10px; border-radius: 5px;">
    <strong>Code Explanation:</strong> This OpenAILLM class creates a wrapper around OpenAI's API for text generation. It initializes with an API key and optional parameters like model type (defaulting to gpt-4o), temperature (controlling randomness), and token limit. The class provides methods to set up the client connection, configure a custom API endpoint, and generate responses by sending prompts to OpenAI's service. After defining the class, the code creates an instance with a specific API key and zero temperature for deterministic outputs.
</div>

In [4]:
# OpenAI-specific implementation
class OpenAILLM(LLM):
    def __init__(self, api_key, model="gpt-4o", temperature=0.0, max_tokens=4096):
        # Call the parent constructor with the name "OpenAI"
        super().__init__("OpenAI", api_key, model, temperature, max_tokens)
        self.base_url = None  # Optional base URL for API requests
    
    def setup_client(self):
        """Set up the OpenAI client."""
        if self.base_url:
            self.client = OpenAI(api_key=self.api_key, base_url=self.base_url)
        else:
            self.client = OpenAI(api_key=self.api_key)
        return self.client
    
    def set_base_url(self, base_url):
        """
        Set a custom base URL for API requests (useful for proxies or Azure OpenAI).
        
        Args:
            base_url (str): The base URL to use for API requests
        """
        self.base_url = base_url
        # Reset client if it was already set up
        if self.client:
            self.setup_client()
        return self
    
    def generate(self, prompt, system_prompt=None):
        """
        Generate a response from OpenAI.
        
        Args:
            prompt (str): The user prompt
            system_prompt (str, optional): System instructions for the AI
            
        Returns:
            str: OpenAI's response
        """
        # Set up client if not already done
        if not self.client:
            self.setup_client()
        
        # Prepare the messages list
        messages = []
        
        # Add system prompt if provided
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        
        # Add user prompt
        messages.append({"role": "user", "content": prompt})
        
        # Send request to OpenAI
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            max_tokens=self.max_tokens,
            temperature=self.temperature
        )
        
        # Return the text response
        return response.choices[0].message.content


# Now initialize an OpenAI instance (replace with your API key)
OPENAI_API_KEY = ""

# Create the OpenAI instance
openai_llm = OpenAILLM(
    api_key=OPENAI_API_KEY,
    temperature=0.0  # Temperature 0 for near-deterministic responses
)

# Setup the client
openai_llm.setup_client()

# Verify setup
print(f"OpenAI setup complete: {openai_llm}")

OpenAI setup complete: OpenAI(model='gpt-4o')


<div style="background-color: #e393ed; padding: 10px; border-radius: 5px;">
    <strong>Gemini Initialization</strong>
</div>

<div style="background-color: #d1f3d1; padding: 10px; border-radius: 5px;">
    <strong>Code Explanation:</strong> This GeminiLLM class creates a wrapper for Google's Generative AI API. It inherits from the base LLM class and initializes with an API key and optional parameters for model selection (default is gemini-1.5-pro), temperature control, and token limits. The setup_client method configures the connection to Google's API, while the generate method handles text generation by creating a configured model instance, preparing prompts (combining system and user prompts since Gemini handles them differently than OpenAI), and returning the generated text response. The code finishes by instantiating the class with an API key and zero temperature setting.
</div>

In [5]:

class GeminiLLM(LLM):
    def __init__(self, api_key, model="gemini-1.5-pro", temperature=0.0, max_tokens=4096):
        super().__init__("Gemini", api_key, model, temperature, max_tokens)
    
    def setup_client(self):
        """Set up the Google Generative AI client."""
        genai.configure(api_key=self.api_key)
        self.client = genai
        return self.client
    
    def generate(self, prompt, system_prompt=None):
        # Set up client if not already done
        if not self.client:
            self.setup_client()
        
        # Create a generation config
        generation_config = {
            "temperature": self.temperature,
            "max_output_tokens": self.max_tokens,
            "top_p": 0.95,
            "top_k": 0
        }
        
        # Create the model
        model = self.client.GenerativeModel(model_name=self.model,
                                           generation_config=generation_config)
        
        # Prepare content for prompt
        if system_prompt:
            # For Gemini, we combine system and user prompts
            full_prompt = f"{system_prompt}\n\n{prompt}"
        else:
            full_prompt = prompt
        
        # Generate content
        response = model.generate_content(full_prompt)
        
        # Return the text response
        return response.text

        
# Now initialize a Gemini instance (replace with your API key)
GEMINI_API_KEY = ""

# Create the Gemini instance
gemini_llm = GeminiLLM(
    api_key=GEMINI_API_KEY,
    temperature=0.0  # Temperature 0 for near-deterministic responses
)

# Setup the client
gemini_llm.setup_client()

# Verify setup
print(f"Gemini setup complete: {gemini_llm}")

Gemini setup complete: Gemini(model='gemini-1.5-pro')


<div style="background-color: #f593a5; padding: 10px; border-radius: 5px;">
    <strong>Ollama Initialization</strong>
</div>

<div style="background-color: #d1f3d1; padding: 10px; border-radius: 5px;">
    <strong>Code Explanation:</strong> This OllamaLLM class creates a wrapper for running local language models through Ollama. Unlike cloud-based LLMs, it doesn't require an API key but instead talks to a local Ollama server (default: http://localhost:11434). The class initializes with model selection (default: orca-mini), temperature settings, and token limits. Its generate method first sends the prompt together with sampling options and falls back to an alternative call format if the installed Ollama client raises a TypeError. The initialization code is wrapped in a try/except block; note that no request is sent to the server at this point, so a stopped server only surfaces as an error on the first call to generate().
</div>

In [6]:
class OllamaLLM(LLM):
   
   def __init__(self, model="orca-mini", temperature=0.0, max_tokens=4096, host="http://localhost:11434"):
       # Ollama doesn't use an API key in the traditional sense
       super().__init__("Ollama", None, model, temperature, max_tokens)
       self.host = host
   
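   def setup_client(self):
       # Bind the client to the configured host instead of the module-level default
       self.client = ollama.Client(host=self.host)
       return self.client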
       
   def generate(self, prompt, system_prompt=None):
       # Set up client if not already done
       if not self.client:
           self.setup_client()
       
       # Check the client's API - newer versions may use different parameters
       # Option 1: For newer Ollama client versions
       try:
           options = {
               "temperature": self.temperature,
               "num_predict": self.max_tokens,
           }
           
           # `system` is a top-level argument of generate(), not a sampling option
           response = self.client.generate(
               model=self.model,
               prompt=prompt,
               system=system_prompt or "",
               options=options
           )
           
           return response['response']
       except TypeError:
           # Option 2: Try alternative API format for older versions
           try:
               response = self.client.generate(
                   model=self.model,
                   prompt=prompt,
                   system=system_prompt if system_prompt else "",
               )
               return response['response']
           except Exception as e:
               print(f"Failed to generate with Ollama: {e}")
               return f"Error generating response: {e}"

# Initialize Ollama (no API key needed)
try:
   ollama_llm = OllamaLLM(
       model="orca-mini",
       temperature=0.0,
       host="http://localhost:11434"  # Default Ollama host
   )
   
   # Setup the client
   ollama_llm.setup_client()
   
   # Verify setup
   print(f"Ollama setup complete: {ollama_llm}")
except Exception as e:
   print(f"Error initializing Ollama: {e}. Make sure Ollama is running locally.")

Ollama setup complete: Ollama(model='orca-mini')
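Because creating the wrapper does not contact the server, the "setup complete" message above does not confirm that Ollama is actually running. A minimal sketch of an explicit reachability check follows. The helper name `ollama_is_running` is hypothetical, and the check assumes the server exposes the `/api/tags` endpoint of the Ollama REST API at the configured host. Only the standard library is used.

```python
import urllib.error
import urllib.request


def ollama_is_running(host="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP request to the Ollama host succeeds, else False."""
    try:
        # /api/tags lists the locally available models in the Ollama REST API.
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or malformed host string.
        return False


print("Ollama reachable:", ollama_is_running())
```

Running a check like this before the experiments makes a stopped server visible immediately, rather than as an error string returned from the first `generate()` call.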


### Overall Initialization of LLMs

<div style="background-color: #d1f3d1; padding: 10px; border-radius: 5px;">
    <strong>Code Explanation:</strong> This function initializes language model (LLM) clients for the different providers. It takes three API keys as parameters and returns a dictionary of configured LLM objects.

1. First, it creates an empty dictionary called `llms` to store all the model instances.

2. For Claude, OpenAI, and Gemini:
   - It creates an instance of each LLM's wrapper class
   - Sets the provided API key and a temperature of 0.0 (making responses as deterministic as the provider allows)
   - Calls `setup_client()`, which creates the provider's client object
   - Adds each instance to the `llms` dictionary under the keys `"claude"`, `"openai"`, and `"gemini"`

3. For Ollama:
   - Unlike the cloud-based LLMs, Ollama runs locally and doesn't require an API key
   - It attempts to initialize the Ollama client with the "llama3" model, rather than the `orca-mini` model used in the standalone cell above
   - This is wrapped in a try/except block because Ollama might not be running locally
   - If initialization fails, it prints a warning message but continues execution

4. Finally, it returns the `llms` dictionary containing all successfully initialized LLM clients.

This pattern allows the code to work with multiple LLMs through a consistent interface, making it easy to swap between different providers or run comparisons.
</div>

In [7]:
def initialize_llms(claude_api_key, openai_api_key, gemini_api_key):
    llms = {}
    
    # Initialize Claude
    claude = ClaudeLLM(
        api_key=claude_api_key,
        temperature=0.0
    )
    claude.setup_client()
    llms["claude"] = claude
    
    # Initialize OpenAI
    openai_llm = OpenAILLM(
        api_key=openai_api_key,
        temperature=0.0
    )
    openai_llm.setup_client()
    llms["openai"] = openai_llm
    
    # Initialize Gemini
    gemini = GeminiLLM(
        api_key=gemini_api_key,
        temperature=0.0
    )
    gemini.setup_client()
    llms["gemini"] = gemini
    
    # Initialize Ollama (no API key needed)
    try:
        ollama_llm = OllamaLLM(
            model="llama3",
            temperature=0.0
        )
        ollama_llm.setup_client()
        llms["ollama"] = ollama_llm
    except Exception as e:
        print(f"Warning: Could not initialize Ollama ({e}). Make sure it's running locally.")
    
    return llms
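Since every wrapper exposes the same `generate()` method, one prompt can be sent to all models in the returned dictionary. The sketch below shows that pattern with stub objects in place of real clients. `StubLLM` and `ask_all` are hypothetical names introduced for this example, and the stub replies are hand-written rather than model outputs.

```python
class StubLLM:
    """Hypothetical stand-in exposing the same generate() method as the wrappers above."""

    def __init__(self, reply=None, fail=False):
        self.reply = reply
        self.fail = fail

    def generate(self, prompt, system_prompt=None):
        if self.fail:
            raise RuntimeError("simulated provider outage")
        return self.reply


def ask_all(llms, prompt, system_prompt=None):
    """Send one prompt to every model and record errors instead of raising them."""
    responses = {}
    for name, llm in llms.items():
        try:
            text = llm.generate(prompt, system_prompt=system_prompt)
            responses[name] = {"ok": True, "text": text}
        except Exception as exc:  # keep going if one provider fails
            responses[name] = {"ok": False, "text": str(exc)}
    return responses


# Stubs replace the real dictionary returned by initialize_llms(...).
llms = {"model_a": StubLLM("thabo, positive, 1"), "model_b": StubLLM(fail=True)}
for name, result in ask_all(llms, "Analyze the word: thabo").items():
    print(name, result)
```

Isolating failures per provider means a missing key or an unreachable local server does not discard the responses already collected from the other models.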

### Part 1 of MSA: Sentiment Bearings

In this part, I plan to develop sentiment analysis capabilities for Sepedi, Sesotho, and Setswana by creating sentiment-bearing lexicons. Each lexicon catalogs words with their sentiment values (positive +1, neutral 0, or negative -1) in the consistent format `<word>, <sentiment>, <score>`, matching the output format requested in the prompts. After establishing these language-specific lexicons, I'll feed them to large language models and explore several prompting techniques to evaluate how consistently sentiment is interpreted. This approach combines traditional lexicon-based methods with modern LLM capabilities, which may help address the resource gap for sentiment analysis in these South African languages. By systematically testing the prompting strategies, I aim to identify the most reliable one across the three languages and provide a foundation for more nuanced text analysis in these underrepresented languages.

<div style="background-color: #d1f3d1; padding: 10px; border-radius: 5px;">
<strong>Code Explanation:</strong>

This `MultilingualSentimentBearings` class provides a comprehensive framework for analyzing sentiment in three South African languages: Sepedi, Sesotho, and Setswana. Here's a breakdown of how it works:

1. **Class Structure and Initialization**:
   - The class defines a dictionary `LANGUAGES` mapping language codes to their full names
   - The constructor takes an LLM model as input, which will be used for sentiment analysis
   
2. **Prompt Generation**:
   - `get_bearing_prompt()` creates different styles of prompts for the LLM:
     - "default": Basic prompt with simple examples
     - "zero_shot": No examples, just instructions
     - "few_shot": Includes language-specific examples
     - "in_context": Adds cultural and project context
   - Each prompt style follows the same output format: `<word>, <sentiment>, <score>`
   - The prompts ensure consistency in responses while testing different prompting strategies

3. **System Prompt Generation**:
   - `get_bearing_system_prompt()` creates a system prompt that establishes the LLM's role as a sentiment analyzer for the specified language
   - Emphasizes precision, cultural context, and format adherence

4. **Word Analysis**:
   - `analyze_word()` is the core method that:
     - Validates the language
     - Gets appropriate prompts
     - Calls the LLM
     - Parses the response into structured data
     - Includes error handling for malformed responses
   
5. **Batch Processing**:
   - `analyze_words()` processes multiple words across multiple languages
   - Results are organized by language for easy analysis
   
6. **Lexicon Creation**:
   - `create_sentiment_lexicon()` calls `analyze_word()` for every word with the chosen prompt style and keeps only the words whose score could be parsed
   - The final output is a nested dictionary mapping languages to words to sentiment scores

This class enables systematic comparison of different prompting techniques while building sentiment lexicons for languages with limited NLP resources. The robust error handling and structured output make it suitable for both research and practical applications in multilingual sentiment analysis.
</div>

In [8]:
class MultilingualSentimentBearings:
    # Language codes and names
    LANGUAGES = {
        'sepedi': 'Sepedi',
        'sesotho': 'Sesotho',
        'setswana': 'Setswana'
    }
    
    def __init__(self, llm_model):
        self.llm = llm_model
        
    def get_bearing_prompt(self, word, language, prompt_style="default"):
        language_name = self.LANGUAGES.get(language.lower(), language)
        
        if prompt_style == "zero_shot":
            return f"""
            Analyze the sentiment bearing of the {language_name} word: "{word}"
            
            Please respond with ONLY a single line in exactly this format:
            <word>, <sentiment>, <score>
            
            Where:
            - <word> is the analyzed word in {language_name}
            - <sentiment> is either "positive", "neutral", or "negative" in English
            - <score> is either 1 (positive), 0 (neutral), or -1 (negative)
            """
            
        elif prompt_style == "few_shot":
            return f"""
            Analyze the sentiment bearing of the {language_name} word: "{word}"
            
            Here are some examples of sentiment analysis in {language_name}:
            
            For Sepedi:
            lerato (love), positive, 1
            pefelo (anger), negative, -1
            nako (time), neutral, 0
            
            For Sesotho:
            thabo (joy), positive, 1
            kgalefo (anger), negative, -1
            ntlo (house), neutral, 0
            
            For Setswana:
            boitumelo (happiness), positive, 1
            kutlobotlhoko (sadness), negative, -1
            setulo (chair), neutral, 0
            
            Please respond with ONLY a single line in exactly this format:
            <word>, <sentiment>, <score>
            
            Where:
            - <word> is the analyzed word in {language_name}
            - <sentiment> is either "positive", "neutral", or "negative" in English
            - <score> is either 1 (positive), 0 (neutral), or -1 (negative)
            """
            
        elif prompt_style == "in_context":
            return f"""
            You are analyzing text sentiment in African languages. 
            
            TASK CONTEXT:
            - You are helping to build sentiment lexicons for {language_name}, which lacks NLP resources
            - These lexicons will power sentiment analysis tools for local languages
            - Cultural context is critical to accurate sentiment determination
            
            Now analyze the sentiment bearing of the {language_name} word: "{word}"
            
            Please respond with ONLY a single line in exactly this format:
            <word>, <sentiment>, <score>
            
            Where:
            - <word> is the analyzed word in {language_name}
            - <sentiment> is either "positive", "neutral", or "negative" in English
            - <score> is either 1 (positive), 0 (neutral), or -1 (negative)
            """
            
        else:  # default
            return f"""
            Analyze the sentiment bearing of the {language_name} word: "{word}"
            
            Please respond with ONLY a single line in exactly this format:
            <word>, <sentiment>, <score>
            
            Where:
            - <word> is the analyzed word in {language_name}
            - <sentiment> is either "positive", "neutral", or "negative" in English
            - <score> is either 1 (positive), 0 (neutral), or -1 (negative)
            
            For example:
            lerato, positive, 1
            thata, neutral, 0
            bohloko, negative, -1
            """
    
    def get_bearing_system_prompt(self, language):
        language_name = self.LANGUAGES.get(language.lower(), language)
        
        return f"""
        You are a precise sentiment analyzer for the {language_name} language. Your task is to determine 
        if a {language_name} word has a positive, neutral, or negative sentiment bearing.
        
        Respond with ONLY a single line containing the word, sentiment (in English), and score (-1, 0, or 1) 
        in the exact format requested. Do not include any explanations or additional text.
        
        Be objective in your analysis and ensure you understand the cultural context and nuances of 
        the {language_name} language.
        """
    
    def analyze_word(self, word, language, prompt_style="default"):
        if language.lower() not in self.LANGUAGES:
            raise ValueError(f"Unsupported language: {language}. Supported languages are: {', '.join(self.LANGUAGES.keys())}")
        
        prompt = self.get_bearing_prompt(word, language, prompt_style)
        system_prompt = self.get_bearing_system_prompt(language)
        
        # Get response from the LLM
        response = self.llm.generate(prompt, system_prompt=system_prompt)
        
        # Parse the response (expecting format: "word, sentiment, score")
        try:
            # Clean the response and split by comma
            clean_response = response.strip()
            parts = clean_response.split(',')
            
            if len(parts) >= 3:
                analyzed_word = parts[0].strip()
                sentiment = parts[1].strip().lower()
                score_str = parts[2].strip()
                
                # Convert score to int
                try:
                    score = int(score_str)
                except ValueError:
                    # If score is not an integer, try to extract it from the string
                    if '-1' in score_str:
                        score = -1
                    elif '1' in score_str and not score_str.startswith('-'):
                        score = 1
                    else:
                        score = 0
                
                return {
                    'word': analyzed_word,
                    'language': language.lower(),
                    'sentiment': sentiment,
                    'score': score,
                    'prompt_style': prompt_style
                }
            else:
                # If parsing fails, return a default response
                print(f"Warning: Could not parse LLM response correctly. Raw response: {response}")
                return {
                    'word': word,
                    'language': language.lower(),
                    'sentiment': 'unknown',
                    'score': None,
                    'prompt_style': prompt_style,
                    'raw_response': response
                }
                
        except Exception as e:
            print(f"Error parsing LLM response: {e}")
            return {
                'word': word,
                'language': language.lower(),
                'sentiment': 'error',
                'score': None,
                'prompt_style': prompt_style,
                'error': str(e),
                'raw_response': response
            }
    
    def analyze_words(self, words_dict, prompt_style="default"):
        results = {}
        
        for language, words in words_dict.items():
            language_results = []
            
            for word in words:
                result = self.analyze_word(word, language, prompt_style)
                language_results.append(result)
                
            results[language] = language_results
            
        return results
    
    def create_sentiment_lexicon(self, words_dict, prompt_style="default"):
        lexicon = {}
        
        for language, words in words_dict.items():
            language_lexicon = {}
            
            for word in words:
                result = self.analyze_word(word, language, prompt_style)
                if result['score'] is not None:
                    language_lexicon[word] = result['score']
                
            lexicon[language] = language_lexicon
            
        return lexicon
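The parsing step inside `analyze_word()` can be tried in isolation. The function below is a simplified, stricter sketch of that step and not the class's exact logic: it reads only the first non-empty line and rejects any label or score outside the allowed sets, whereas `analyze_word()` attempts to recover a score from partially malformed text. The name `parse_bearing_line` is hypothetical.

```python
def parse_bearing_line(response):
    """Parse '<word>, <sentiment>, <score>' into a dict, or return None if malformed."""
    # Use the first non-empty line in case the model adds extra text.
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    if not lines:
        return None

    parts = [part.strip() for part in lines[0].split(",")]
    if len(parts) < 3:
        return None

    word, sentiment, score_text = parts[0], parts[1].lower(), parts[2]
    try:
        score = int(score_text)
    except ValueError:
        return None

    # Reject anything outside the label and score sets requested in the prompt.
    if sentiment not in {"positive", "neutral", "negative"} or score not in {-1, 0, 1}:
        return None
    return {"word": word, "sentiment": sentiment, "score": score}


print(parse_bearing_line("lerato, positive, 1"))
# {'word': 'lerato', 'sentiment': 'positive', 'score': 1}
print(parse_bearing_line("I think this word is positive."))
# None
```

Returning `None` for anything unexpected keeps malformed responses out of the lexicon instead of silently mapping them to a neutral score.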


<div style="background-color: #d1f3d1; padding: 10px; border-radius: 5px;">
<strong>Code Explanation:</strong>

This code sets up the test infrastructure for a multilingual sentiment analysis experiment across three South African languages (Sepedi, Sesotho, and Setswana). Here's what each component does:

1. **Test Data Definition**:
   - Creates a dictionary `test_words` containing sample words for each language
   - The words are strategically selected to include emotional concepts (joy, sadness), relationship concepts (love), and more neutral concepts (interest)
   - Some words (like "thabo" for joy) appear in multiple languages, enabling cross-language consistency testing

2. **Prompt Style Configuration**:
   - Defines four different prompting strategies to test with each LLM:
     - "default": Basic instruction with simple examples
     - "zero_shot": Instructions without examples
     - "few_shot": Instructions with language-specific examples
     - "in_context": Instructions with added cultural and project context
   - This enables systematic comparison of prompting techniques for sentiment analysis

3. **Results Display Function**:
   - `display_language_results()` formats and prints analysis results in a readable table
   - Uses string formatting to align columns for better readability
   - Extracts word, sentiment, score, and prompt style from each result object
   - Handles missing or unknown values gracefully with default values

4. **LLM Testing Function**:
   - `test_llm_sentiment()` provides a reusable way to test any LLM
   - Creates a sentiment analyzer using the provided LLM
   - Tests all four prompting styles sequentially
   - Processes results for all languages and words
   - Creates a sentiment lexicon using the "in_context" prompting style
   - Includes comprehensive error handling to continue testing even if one LLM fails
   - Returns the analyzer instance for later use in cross-LLM comparisons

This modular design eliminates code duplication while maintaining the ability to test each LLM independently. The standardized format makes results comparable across different models and prompting techniques, supporting rigorous analysis of sentiment detection capabilities in these under-resourced languages.
</div>

In [9]:
# Define test words and utilities

# Sample words in each language
test_words = {
    'sepedi': [
        'manyami',     # sadness
        'thabo',       # joy
        'kgahlego',    # interest
    ],
    'sesotho': [
        'lerato',      # love
        'thabo',       # joy
        'thahasello',  # interest
    ],
    'setswana': [
        'botlhoko',    # pain
        'boitumelo',   # happiness
        'kgatlhego',   # interest
    ]
}

# Prompt styles to test
prompt_styles = ["default", "zero_shot", "few_shot", "in_context"]

# Function to display sentiment analysis results
def display_language_results(language, results):
    print(f"\nSentiment Analysis Results for {language.title()}:")
    print(f"{'Word':<15} | {'Sentiment':<10} | {'Score':<5} | {'Prompt Style':<12}")
    print("-" * 55)
    
    for result in results:
        word = result.get('word', 'unknown')
        sentiment = result.get('sentiment', 'unknown')
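        score = result.get('score')
        # A failed parse stores score=None, which cannot be formatted with '<5'
        score = 'N/A' if score is None else score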
        prompt_style = result.get('prompt_style', 'default')
        
        print(f"{word:<15} | {sentiment:<10} | {score:<5} | {prompt_style:<12}")

# Function to run tests for a single LLM
def test_llm_sentiment(llm, llm_name):
    try:
        # Create a MultilingualSentimentBearings instance
        analyzer = MultilingualSentimentBearings(llm)
        print(f"{llm_name} sentiment analyzer initialized successfully")
        
        # Test each prompt style
        for style in prompt_styles:
            print(f"\n=== Testing {llm_name} with {style.upper()} prompting style ===")
            
            # Analyze words
            results = analyzer.analyze_words(test_words, prompt_style=style)
            
            # Display results for each language
            for language, language_results in results.items():
                display_language_results(language, language_results)
        
        # Create a sentiment lexicon
        print(f"\n=== Creating {llm_name} Sentiment Lexicon ===")
        lexicon = analyzer.create_sentiment_lexicon(test_words, prompt_style="in_context")
        
        # Display the lexicon
        print(f"\n{llm_name} Sentiment Lexicon:")
        for language, words in lexicon.items():
            print(f"\n{language.title()}:")
            for word, score in words.items():
                sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
                print(f"  {word:<15}: {sentiment:<10} ({score})")
        
        print(f"\n{llm_name} sentiment analysis testing completed successfully")
        return analyzer
    except Exception as e:
        print(f"Error in {llm_name} sentiment testing: {e}")
        return None
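
Because every result is a dictionary with the same keys, the per-language output can be flattened into one row per word, which makes runs from different models easier to compare side by side. The sketch below is illustrative only: `flatten_results` is a hypothetical helper that is not part of this notebook, and it assumes the result shape that `display_language_results` reads (`word`, `sentiment`, `score`, `prompt_style`). The sample rows are copied from the Claude DEFAULT output for Sepedi reported in the results that follow.

```python
from typing import Any, Dict, List


def flatten_results(llm_name: str,
                    results: Dict[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Turn {language: [result, ...]} into a flat list of rows tagged with the model name."""
    rows = []
    for language, language_results in results.items():
        for result in language_results:
            rows.append({
                "llm": llm_name,  # hypothetical column, added for cross-model comparison
                "language": language,
                "word": result.get("word", "unknown"),
                "sentiment": result.get("sentiment", "unknown"),
                "score": result.get("score"),
                "prompt_style": result.get("prompt_style", "default"),
            })
    return rows


# Sample input taken from the Claude DEFAULT results for Sepedi
sample = {
    "sepedi": [
        {"word": "manyami", "sentiment": "negative", "score": -1, "prompt_style": "default"},
        {"word": "thabo", "sentiment": "positive", "score": 1, "prompt_style": "default"},
        {"word": "kgahlego", "sentiment": "positive", "score": 1, "prompt_style": "default"},
    ]
}

rows = flatten_results("Claude", sample)
for row in rows:
    print(row)
```

If pandas is available, as it is in this notebook, the rows can be passed to `pd.DataFrame(rows)` and grouped by `llm` or `prompt_style`.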

<div style="background-color: #d1ecff; padding: 10px; border-radius: 5px;">
    <strong>Claude Results</strong>
</div>

In [14]:
claude_analyzer = test_llm_sentiment(claude, "Claude")

Claude sentiment analyzer initialized successfully

=== Testing Claude with DEFAULT prompting style ===

Sentiment Analysis Results for Sepedi:
Word            | Sentiment  | Score | Prompt Style
-------------------------------------------------------
manyami         | negative   | -1    | default     
thabo           | positive   | 1     | default     
kgahlego        | positive   | 1     | default     

Sentiment Analysis Results for Sesotho:
Word            | Sentiment  | Score | Prompt Style
-------------------------------------------------------
lerato          | positive   | 1     | default     
thabo           | positive   | 1     | default     
thahasello      | positive   | 1     | default     

Sentiment Analysis Results for Setswana:
Word            | Sentiment  | Score | Prompt Style
-------------------------------------------------------
botlhoko        | negative   | -1    | default     
boitumelo       | positive   | 1     | default     
kgatlhego       | positive   | 1 

<div style="background-color: #b5e64c; padding: 10px; border-radius: 5px;">
    <strong>OpenAI Results</strong>
</div>

In [None]:
openai_analyzer = test_llm_sentiment(openai_llm, "OpenAI")

<div style="background-color: #e393ed; padding: 10px; border-radius: 5px;">
    <strong>Gemini Results</strong>
</div>

In [25]:
# Cell 4: Gemini Test
gemini_analyzer = test_llm_sentiment(gemini_llm, "Gemini")

Gemini sentiment analyzer initialized successfully

=== Testing Gemini with DEFAULT prompting style ===
Error in Gemini sentiment testing: 400 API key not valid. Please pass a valid API key. [reason: "API_KEY_INVALID"
domain: "googleapis.com"
metadata {
  key: "service"
  value: "generativelanguage.googleapis.com"
}
, locale: "en-US"
message: "API key not valid. Please pass a valid API key."
]


<div style="background-color: #f593a5; padding: 10px; border-radius: 5px;">
    <strong>Ollama Results</strong>
</div>

In [20]:
# Cell 5: Ollama Test
ollama_analyzer = test_llm_sentiment(ollama_llm, "Ollama")

Ollama sentiment analyzer initialized successfully

=== Testing Ollama with DEFAULT prompting style ===
Error in Ollama sentiment testing: model requires more system memory (5.1 GiB) than is available (5.1 GiB) (status code: 500)


### Part 2 of MSA: Sentiment Classifications

In this phase, we move from individual words to complete sentences in Sepedi, Sesotho, and Setswana. We test how different LLMs (Claude, OpenAI, Gemini, and Ollama) classify the emotional tone of authentic sentences as positive, negative, or neutral. By comparing classification consistency across models and prompt styles, we aim to identify the most reliable approach for sentiment analysis in these under-resourced languages. This builds on our word-level sentiment bearings and extends them to sentence-level analysis that respects cultural and linguistic context.
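
One simple way to quantify classification consistency is the fraction of sentences on which every model returns the same label. The sketch below is a minimal illustration under stated assumptions, not the study's evaluation method: `agreement_rate` is a hypothetical helper, and `model_x` with its labels is invented purely to show a disagreement. The Claude and OpenAI labels are the Sepedi DEFAULT-style labels reported in the results later in this notebook.

```python
from typing import Dict, List


def agreement_rate(labels_by_model: Dict[str, List[str]]) -> float:
    """Fraction of items on which every model returned the same label."""
    label_lists = list(labels_by_model.values())
    if not label_lists or not label_lists[0]:
        return 0.0
    n_items = len(label_lists[0])
    if any(len(labels) != n_items for labels in label_lists):
        raise ValueError("every model must label the same number of items")
    # zip(*...) groups the labels for one sentence across all models
    unanimous = sum(1 for item_labels in zip(*label_lists) if len(set(item_labels)) == 1)
    return unanimous / n_items


# Labels reported for the three Sepedi sentences with the DEFAULT prompt style
observed = {
    "Claude": ["positive", "negative", "neutral"],
    "OpenAI": ["positive", "negative", "neutral"],
}

# Hypothetical third model, invented here only to illustrate a disagreement
with_hypothetical = dict(observed, model_x=["positive", "neutral", "neutral"])

print(agreement_rate(observed))
print(agreement_rate(with_hypothetical))
```

Unanimous agreement is a strict measure; with more models or more sentences, a pairwise statistic such as Cohen's kappa may be more informative.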

<div style="background-color: #d1f3d1; padding: 10px; border-radius: 5px;">
<strong>Code Explanation:</strong>

This code implements a multilingual sentiment classification framework for analyzing text sentiment in three South African languages: Sepedi, Sesotho, and Setswana. Here's a breakdown of how it works:

1. **Test Data Setup**:
   - Defines a dictionary of test sentences in each language, with English translations given as inline comments
   - Each language includes positive, negative, and neutral examples to test classification accuracy
   - Also defines prompt styles to test different LLM prompting approaches

2. **MultilingualSentimentClassifier Class**:
   - Core class for sentence-level sentiment analysis across languages
   - Maintains language mappings and handles the LLM interaction

3. **Prompt Generation**:
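   - Implements four different prompting strategies:
     - "default": Basic classification with a reminder to consider emotional tone, cultural context, and sentiment-bearing words
     - "zero_shot": Simple instructions without examples
     - "few_shot": Includes three labelled Sepedi examples (positive, negative, neutral) to guide the model; the same Sepedi examples are used for every target language
     - "in_context": Adds cultural and project context for better performance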

4. **Classification Logic**:
   - `classify_text()` handles individual text classification:
     - Validates language support
     - Constructs appropriate prompts
     - Calls the LLM
     - Parses the response to extract sentiment
     - Includes fallback parsing for unexpected response formats
   - `classify_texts()` processes multiple texts across languages

5. **Output and Display**:
   - `display_classification_results()` formats and displays results in a readable table
   - Truncates long texts for better display formatting

6. **Test Framework**:
   - `test_llm_classification()` provides a standardized way to:
     - Initialize a classifier with a specific LLM
     - Test all prompt styles
     - Process results across all languages
     - Handle errors gracefully

This code enables systematic evaluation of different LLMs' ability to perform sentiment analysis on complete sentences in low-resource languages, comparing various prompt engineering approaches to identify optimal strategies for multilingual sentiment classification.
</div>
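
The parsing step described in item 4 can be sketched in isolation. The version below is a hypothetical variant and not the notebook's implementation: `parse_sentiment` is an assumed name, it uses regular expressions rather than substring checks, and it returns `"unparsed"` when no label is found instead of defaulting to `"neutral"`, so that malformed replies are not counted as neutral predictions.

```python
import re
from typing import Optional

# Strict pattern: the requested "Sentiment: <label>" format (angle bracket optional)
_STRICT = re.compile(r"sentiment:\s*<?\s*(positive|negative|neutral)\b")
# Loose pattern: fallback that accepts the first label word anywhere in the reply
_LOOSE = re.compile(r"\b(positive|negative|neutral)\b")


def parse_sentiment(response: Optional[str]) -> str:
    """Extract a sentiment label from an LLM reply, or 'unparsed' if none is present."""
    if not response:
        return "unparsed"
    text = response.strip().lower()
    match = _STRICT.search(text) or _LOOSE.search(text)
    return match.group(1) if match else "unparsed"


for reply in ["Sentiment: Positive", "The text is clearly negative.", "I cannot tell.", None]:
    print(repr(reply), "->", parse_sentiment(reply))
```

Keeping unparseable replies separate makes it possible to report how often a model ignores the requested output format, which is itself a useful signal when comparing prompt styles.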

In [10]:
# Cell 1: Define test sentences and utilities

# Sample sentences in each language
test_sentences = {
    'sepedi': [
        "Ke thabile go bona gore o atlega mo dithutong tša gago.", # I'm happy to see that you're succeeding in your studies.
        "Ga ke rate maitshwaro a gago.", # I don't like your behavior.
        "Pula e a na lehono.", # It's raining today.
    ],
    'sesotho': [
        "Ke thabile ho bona batho ba bangata ba tshehetsana.", # I'm happy to see many people supporting each other.
        "Ha ke batle ho bua le motho ya ntseng a bua hampe ka batho.", # I don't want to talk to someone who speaks badly about people.
        "Buka ena e fuwe ke titjhere.", # This book was given by the teacher.
    ],
    'setswana': [
        "Ke itumetse thata go bona ditsala tsa me.", # I'm very happy to see my friends.
        "Ga ke rate go dira le batho ba ba sa tseyeng tiro ya bone ka tlhoafalo.", # I don't like working with people who don't take their work seriously.
        "Setlhare se se kwa pele ga ntlo.", # The tree is in front of the house.
    ]
}

# Prompt styles to test
prompt_styles = ["default", "zero_shot", "few_shot", "in_context"]

class MultilingualSentimentClassifier:
    # Language codes and names
    LANGUAGES = {
        'sepedi': 'Sepedi',
        'sesotho': 'Sesotho',
        'setswana': 'Setswana'
    }
    
    def __init__(self, llm_model):
        self.llm = llm_model
        
    def get_classification_prompt(self, text, language, prompt_style="default"):
        language_name = self.LANGUAGES.get(language.lower(), language)
        
        if prompt_style == "zero_shot":
            return f"""
            Classify the overall sentiment of the following {language_name} text as 'positive', 'negative', or 'neutral'.
            Text: "{text}"
            
            Provide the result in exactly this format:
            Sentiment: <positive/negative/neutral>
            """
            
        elif prompt_style == "few_shot":
            return f"""
            Classify the overall sentiment of the following {language_name} text.
            
            Here are some examples:
            
            Sepedi example 1: "Ke rata go raloka le bana ba ka." (I like playing with my children.)
            Sentiment: positive
            
            Sepedi example 2: "Ga ke na tshelete ya go reka dijo." (I don't have money to buy food.)
            Sentiment: negative
            
            Sepedi example 3: "Re tla kopana ka Mosupologo." (We will meet on Monday.)
            Sentiment: neutral
            
            Now classify this {language_name} text: "{text}"
            
            Provide the result in exactly this format:
            Sentiment: <positive/negative/neutral>
            """
            
        elif prompt_style == "in_context":
            return f"""
            You are analyzing text sentiment in African languages. 
            
            TASK CONTEXT:
            - You are helping to build sentiment analysis tools for {language_name}, which lacks NLP resources
            - Cultural context is critical to accurate sentiment determination
            - Even subtle emotional cues in the language should be considered
            
            Analyze the sentiment of this {language_name} text: "{text}"
            
            Provide the result in exactly this format:
            Sentiment: <positive/negative/neutral>
            """
            
        else:  # default
            return f"""
            Classify the overall sentiment of the following {language_name} text as 'positive', 'negative', or 'neutral'.
            Text: "{text}"
            
            Consider the emotional tone, cultural context, and any sentiment-bearing words.
            
            Provide the result in exactly this format:
            Sentiment: <positive/negative/neutral>
            """
    
    def get_classification_system_prompt(self, language):
        language_name = self.LANGUAGES.get(language.lower(), language)
        
        return f"""
        You are a precise sentiment classifier for the {language_name} language. Your task is to determine 
        if a {language_name} text has a positive, negative, or neutral sentiment.
        
        Respond with ONLY the sentiment classification in the exact format requested. 
        Do not include any explanations or additional text.
        
        Be objective in your analysis and ensure you understand the cultural context and nuances of 
        the {language_name} language.
        """
    
    def classify_text(self, text, language, prompt_style="default"):
        if language.lower() not in self.LANGUAGES:
            raise ValueError(f"Unsupported language: {language}. Supported languages are: {', '.join(self.LANGUAGES.keys())}")
        
        prompt = self.get_classification_prompt(text, language, prompt_style)
        system_prompt = self.get_classification_system_prompt(language)
        
        # Get response from the LLM
        response = self.llm.generate(prompt, system_prompt=system_prompt)
        
        # Parse the response (expecting format: "Sentiment: positive/negative/neutral")
        try:
            # Clean the response
            clean_response = response.strip().lower()
            
            # Extract sentiment
            if "sentiment:" in clean_response:
                sentiment_part = clean_response.split("sentiment:")[1].strip()
                if "positive" in sentiment_part:
                    sentiment = "positive"
                elif "negative" in sentiment_part:
                    sentiment = "negative"
                else:
                    sentiment = "neutral"
            else:
                # Fallback parsing if format is not as expected
                if "positive" in clean_response:
                    sentiment = "positive"
                elif "negative" in clean_response:
                    sentiment = "negative"
                else:
                    sentiment = "neutral"
            
            return {
                'text': text,
                'language': language.lower(),
                'sentiment': sentiment,
                'prompt_style': prompt_style
            }
                
        except Exception as e:
            print(f"Error parsing LLM response: {e}")
            return {
                'text': text,
                'language': language.lower(),
                'sentiment': 'error',
                'prompt_style': prompt_style,
                'error': str(e),
                'raw_response': response
            }
    
    def classify_texts(self, texts_dict, prompt_style="default"):
        results = {}
        
        for language, texts in texts_dict.items():
            language_results = []
            
            for text in texts:
                result = self.classify_text(text, language, prompt_style)
                language_results.append(result)
                
            results[language] = language_results
            
        return results

# Function to display sentiment classification results
def display_classification_results(language, results):
    print(f"\nSentiment Classification Results for {language.title()}:")
    print(f"{'Text':<50} | {'Sentiment':<10} | {'Prompt Style':<12}")
    print("-" * 75)
    
    for result in results:
        text = result.get('text', 'unknown')
        # Truncate long texts for better display
        if len(text) > 45:
            text = text[:42] + "..."
        sentiment = result.get('sentiment', 'unknown')
        prompt_style = result.get('prompt_style', 'default')
        
        print(f"{text:<50} | {sentiment:<10} | {prompt_style:<12}")

# Function to run tests for a single LLM
def test_llm_classification(llm, llm_name):
    try:
        # Create a MultilingualSentimentClassifier instance
        classifier = MultilingualSentimentClassifier(llm)
        print(f"{llm_name} sentiment classifier initialized successfully")
        
        # Test each prompt style
        for style in prompt_styles:
            print(f"\n=== Testing {llm_name} with {style.upper()} prompting style ===")
            
            # Classify texts
            results = classifier.classify_texts(test_sentences, prompt_style=style)
            
            # Display results for each language
            for language, language_results in results.items():
                display_classification_results(language, language_results)
        
        print(f"\n{llm_name} sentiment classification testing completed successfully")
        return classifier
    except Exception as e:
        print(f"Error in {llm_name} sentiment classification testing: {e}")
        return None
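
To turn the printed tables into a number, the predictions can be scored against expected labels. The sketch below is illustrative only: `accuracy_by_language` and `EXPECTED` are hypothetical names, and the expected labels assume that each language's sentences are listed in the order positive, negative, neutral, which is inferred from the English translations in the comments rather than from annotated data. The sample results are the Claude DEFAULT labels for Sepedi reported in the output that follows.

```python
from typing import Any, Dict, List


def accuracy_by_language(results: Dict[str, List[Dict[str, Any]]],
                         expected: Dict[str, List[str]]) -> Dict[str, float]:
    """Share of predictions matching the expected label, per language."""
    scores = {}
    for language, language_results in results.items():
        pairs = list(zip(language_results, expected.get(language, [])))
        if not pairs:
            continue  # no expected labels for this language
        correct = sum(1 for result, label in pairs if result.get("sentiment") == label)
        scores[language] = correct / len(pairs)
    return scores


# Assumption: sentence order in test_sentences is positive, negative, neutral
EXPECTED = {
    language: ["positive", "negative", "neutral"]
    for language in ("sepedi", "sesotho", "setswana")
}

# Same shape as the output of classify_texts; labels from the Claude DEFAULT run
sample_results = {
    "sepedi": [
        {"text": "Ke thabile go bona gore o atlega mo dithutong tša gago.", "sentiment": "positive"},
        {"text": "Ga ke rate maitshwaro a gago.", "sentiment": "negative"},
        {"text": "Pula e a na lehono.", "sentiment": "neutral"},
    ]
}

print(accuracy_by_language(sample_results, EXPECTED))
```

With only three sentences per language, such scores are a smoke test rather than a benchmark; a larger labelled set would be needed for meaningful comparisons.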

<div style="background-color: #d1ecff; padding: 10px; border-radius: 5px;">
    <strong>Claude Results</strong>
</div>

In [None]:
claude_classifier = test_llm_classification(claude, "Claude")

Claude sentiment classifier initialized successfully

=== Testing Claude with DEFAULT prompting style ===

Sentiment Classification Results for Sepedi:
Text                                               | Sentiment  | Prompt Style
---------------------------------------------------------------------------
Ke thabile go bona gore o atlega mo dithut...      | positive   | default     
Ga ke rate maitshwaro a gago.                      | negative   | default     
Pula e a na lehono.                                | neutral    | default     

Sentiment Classification Results for Sesotho:
Text                                               | Sentiment  | Prompt Style
---------------------------------------------------------------------------
Ke thabile ho bona batho ba bangata ba tsh...      | positive   | default     
Ha ke batle ho bua le motho ya ntseng a bu...      | negative   | default     
Buka ena e fuwe ke titjhere.                       | neutral    | default     

Sentiment Classi

<div style="background-color: #b5e64c; padding: 10px; border-radius: 5px;">
    <strong>OpenAI Results</strong>
</div>

In [12]:
openai_classifier = test_llm_classification(openai_llm, "OpenAI")

OpenAI sentiment classifier initialized successfully

=== Testing OpenAI with DEFAULT prompting style ===

Sentiment Classification Results for Sepedi:
Text                                               | Sentiment  | Prompt Style
---------------------------------------------------------------------------
Ke thabile go bona gore o atlega mo dithut...      | positive   | default     
Ga ke rate maitshwaro a gago.                      | negative   | default     
Pula e a na lehono.                                | neutral    | default     

Sentiment Classification Results for Sesotho:
Text                                               | Sentiment  | Prompt Style
---------------------------------------------------------------------------
Ke thabile ho bona batho ba bangata ba tsh...      | positive   | default     
Ha ke batle ho bua le motho ya ntseng a bu...      | negative   | default     
Buka ena e fuwe ke titjhere.                       | neutral    | default     

Sentiment Classi

<div style="background-color: #e393ed; padding: 10px; border-radius: 5px;">
    <strong>Gemini Results</strong>
</div>

In [None]:
gemini_classifier = test_llm_classification(gemini_llm, "Gemini")

<div style="background-color: #f593a5; padding: 10px; border-radius: 5px;">
    <strong>Ollama Results</strong>
</div>

In [None]:
ollama_classifier = test_llm_classification(ollama_llm, "Ollama")

Ollama sentiment classifier initialized successfully

=== Testing Ollama with DEFAULT prompting style ===
