# Wolof NLP - Applications

This notebook demonstrates the higher-level NLP applications built on top of the core tokenizer.

Available applications:
- **POS Tagging**: Part-of-speech tagging with morphology-based rules
- **NER**: Named entity recognition with Senegalese gazetteers
- **Sentiment Analysis**: Rule-based sentiment with negation handling
- **Interlinear Glossing**: Linguistic annotation following Leipzig conventions

In [43]:
import sys
sys.path.insert(0, '../src')

from wolof_nlp.applications import (
    tag,
    extract_entities,
    analyze_sentiment,
    gloss
)

print("Applications loaded successfully")

Applications loaded successfully


## 1. POS Tagging

The POS tagger uses a combination of:
- Lexicon lookup for known words
- Morphological patterns (e.g., `-kat` suffix → agent noun)
- Contextual rules (e.g., word after TAM marker → verb)
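
The three stages above can be sketched as a chain of fallbacks. This is a minimal illustration only: `TOY_LEXICON`, `toy_tag`, and the order in which the rules fire are assumptions made for this example, not the internals of `wolof_nlp`.

```python
# Toy illustration: the lexicon and rules below are assumptions,
# not the data or logic shipped with wolof_nlp.
TOY_LEXICON = {
    "xale": "NOUN",
    "bi": "DET",
    "gi": "DET",
    "dafa": "TAM",
    "na": "TAM",
    "dem": "VERB",
}


def toy_tag(sentence):
    """Tag each whitespace-separated token using three fallback stages."""
    tagged = []
    for token in sentence.split():
        lower = token.lower()
        prev_tag = tagged[-1][1] if tagged else None
        if lower in TOY_LEXICON:
            pos = TOY_LEXICON[lower]      # 1. lexicon lookup
        elif lower.endswith("kat") and len(lower) > 3:
            pos = "NOUN.AGENT"            # 2. morphological pattern (-kat)
        elif prev_tag == "TAM":
            pos = "VERB"                  # 3. contextual rule (word after TAM)
        else:
            pos = "UNK"                   # nothing matched
        tagged.append((token, pos))
    return tagged


print(toy_tag("Xale bi dafa lekk"))
# [('Xale', 'NOUN'), ('bi', 'DET'), ('dafa', 'TAM'), ('lekk', 'VERB')]
```

Here `lekk` is absent from the toy lexicon and is tagged `VERB` only because it follows the TAM marker `dafa`, which is the kind of contextual rule the list describes.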

In [44]:
sentences = [
    "Xale bi dafa lekk",
    "Goor gi dem na Touba",
    "Bindkat bi rafet na",
]

for sent in sentences:
    tagged = tag(sent)
    print(f"\n{sent}")
    print("  " + "  ".join(f"{word}/{pos}" for word, pos in tagged))


Xale bi dafa lekk
  Xale/NOUN  bi/DET  dafa/TAM  lekk/VERB

Goor gi dem na Touba
  Goor/UNK  gi/DET  dem/VERB  na/TAM  Tuba/NOUN

Bindkat bi rafet na
  Bindkat/NOUN.AGENT  bi/DET  rafet/ADJ  na/TAM


### POS Tag Inventory

| Tag | Description | Example |
|-----|-------------|----------|
| NOUN | Common noun | xale (child) |
| VERB | Verb | dem (go), lekk (eat) |
| TAM | Tense-Aspect-Mood marker | dafa, na, dina |
| DET | Determiner | bi, gi, mi |
| ADJ | Adjective | baax (good), rafet (beautiful) |
| FWORD | French word | très, mais |
| NOUN.AGENT | Agent noun (-kat) | bindkat (writer) |

The table is not exhaustive: the outputs in this notebook also show the tags `UNK`, `ADV`, `CONJ`, `VERB.NEG`, and `PROPN.LOC`.

## 2. Named Entity Recognition

NER identifies:
- **PER**: Person names (Senegalese first/last names, religious titles)
- **LOC**: Locations (Senegalese cities, regions, countries)
- **ORG**: Organizations (media, religious groups)
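
A gazetteer-plus-title approach to these three labels might look like the sketch below. The word lists, the `toy_extract_entities` name, and the rule that a title absorbs the following token are all assumptions made for illustration. The real `extract_entities` returns objects with `text`, `label`, and `confidence` attributes and may segment names differently.

```python
# Toy gazetteer and title list: illustrative assumptions, not wolof_nlp data.
TOY_GAZETTEER = {
    "fatou": "PER",
    "mamadou": "PER",
    "dakar": "LOC",
    "rts": "ORG",
}
TOY_TITLES = {"serigne", "soxna"}


def toy_extract_entities(text):
    """Return (entity_text, label) pairs found by title and gazetteer rules."""
    tokens = text.split()
    entities = []
    i = 0
    while i < len(tokens):
        lower = tokens[i].lower()
        if lower in TOY_TITLES and i + 1 < len(tokens):
            # Assumption: a title plus the next token forms one person name.
            entities.append((f"{tokens[i]} {tokens[i + 1]}", "PER"))
            i += 2
        elif lower in TOY_GAZETTEER:
            entities.append((tokens[i], TOY_GAZETTEER[lower]))
            i += 1
        else:
            i += 1
    return entities


print(toy_extract_entities("Fatou ak Mamadou dem nañu Dakar"))
# [('Fatou', 'PER'), ('Mamadou', 'PER'), ('Dakar', 'LOC')]
```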

In [45]:
texts = [
    "Dama bëgg Serigne Touba",
    "Fatou ak Mamadou dem nañu Dakar",
    "Soxna Astou mongi RTS",
]

for text in texts:
    entities = extract_entities(text)
    print(f"\n{text}")
    if entities:
        for ent in entities:
            print(f"  {ent.text:<20} {ent.label:<5} (confidence: {ent.confidence:.2f})")
    else:
        print("  No entities found")


Dama bëgg Serigne Touba
  Serigne Touba        PER   (confidence: 0.98)

Fatou ak Mamadou dem nañu Dakar
  Fatu                 PER   (confidence: 0.88)
  Mamadu               PER   (confidence: 0.88)
  Dakar                LOC   (confidence: 0.95)

Soxna Astou mongi RTS
  Soxna                PER   (confidence: 0.88)
  Astu                 PER   (confidence: 0.55)
  RTS                  ORG   (confidence: 0.92)


## 3. Sentiment Analysis

The sentiment analyzer handles:
- Wolof sentiment words (neex, baax, metti, bon...)
- Morphological negation (-ul suffix)
- Intensifiers (lool, torop)
- French sentiment words in code-switched text
- Discourse markers (e.g., *waaye*, "but", which can flip the overall sentiment)
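
As a rough sketch of how some of these features could interact, the function below combines a small lexicon, `-ul` negation, intensifiers, and *waaye*. Everything here is hypothetical: the word sets, the signed score, the 1.5 multiplier, and especially the choice to keep only the clause after *waaye* are simplifying assumptions. They are not the scoring used by `analyze_sentiment`.

```python
# Toy word lists: assumptions for illustration, not the wolof_nlp lexicon.
TOY_POSITIVE = {"neex", "baax", "rafet"}
TOY_NEGATIVE = {"metti", "bon"}
TOY_INTENSIFIERS = {"lool", "torop"}


def toy_sentiment(text):
    """Return a signed score: above 0 positive, below 0 negative, 0 neutral."""
    tokens = [t.strip(",.!?").lower() for t in text.split()]
    # Assumption: the clause after the last "waaye" carries the judgment.
    if "waaye" in tokens:
        last = len(tokens) - 1 - tokens[::-1].index("waaye")
        tokens = tokens[last + 1:]
    score = 0.0
    for token in tokens:
        # A known sentiment stem followed by -ul counts as negated.
        negated = token.endswith("ul") and (
            token[:-2] in (TOY_POSITIVE | TOY_NEGATIVE)
        )
        stem = token[:-2] if negated else token
        if stem in TOY_POSITIVE:
            polarity = 1.0
        elif stem in TOY_NEGATIVE:
            polarity = -1.0
        else:
            continue
        score += -polarity if negated else polarity
    # An intensifier anywhere in the clause scales the total.
    if any(t in TOY_INTENSIFIERS for t in tokens):
        score *= 1.5
    return score


for example in ["Dafa neex lool", "Neexul", "Baaxul waaye dafa rafet"]:
    print(example, toy_sentiment(example))
```

Stripping `-ul` only when the remaining stem is a known sentiment word keeps unrelated words that happen to end in `ul` from being treated as negations.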

In [46]:
texts = [
    "Dafa neex lool",
    "Neexul",
    "Baaxul waaye dafa rafet",
    "Dafa metti torop",
    "Alhamdulillah, aksinaa",
]

print(f"{'Text':<35} {'Sentiment':<12} {'Score':<8} {'Details'}")
print("-" * 80)

for text in texts:
    result = analyze_sentiment(text)
    details = []
    if result.positive_words:
        details.append(f"+{result.positive_words}")
    if result.negative_words:
        details.append(f"-{result.negative_words}")
    if result.negated_words:
        details.append(f"negated:{result.negated_words}")
    if result.intensified:
        details.append("intensified")
    
    print(f"{text:<35} {result.sentiment.name:<12} {result.score:<8.2f} {' '.join(details)}")

Text                                Sentiment    Score    Details
--------------------------------------------------------------------------------
Dafa neex lool                      POSITIVE     0.94     +['neex'] intensified
Neexul                              NEGATIVE     0.91     -['neexul'] negated:['neexul']
Baaxul waaye dafa rafet             POSITIVE     0.91     +['rafet'] negated:['baaxul']
Dafa metti torop                    NEGATIVE     0.94     -['metti'] intensified
Alhamdulillah, aksinaa              POSITIVE     0.91     +['alhamdulillah']


## 4. Interlinear Glossing

The glosser produces interlinear annotation following the Leipzig Glossing Rules:
- Word segmentation into morphemes
- Grammatical glosses (PFV, SBJF, NEG, etc.)
- English translations from the dictionary
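
The segmentation and glossing steps can be illustrated for a single word with a suffix table and a small dictionary. This is a sketch under stated assumptions: `TOY_SUFFIXES`, `TOY_DICTIONARY`, and `toy_gloss` are hypothetical names, and the real glosser relies on a fuller morphological analyzer.

```python
# Toy suffix table and dictionary: assumptions, not wolof_nlp resources.
TOY_SUFFIXES = {"kat": "NMLZ", "ul": "NEG"}
TOY_DICTIONARY = {"bind": "write", "dem": "go", "xale": "child"}


def toy_gloss(word):
    """Return (morphemes, gloss, translation) for a single word."""
    lower = word.lower()
    for suffix, label in TOY_SUFFIXES.items():
        stem = lower[: -len(suffix)]
        # Split only when the remaining stem is a known dictionary entry.
        if lower.endswith(suffix) and stem in TOY_DICTIONARY:
            morphemes = f"{stem}-{suffix}"
            gloss = f"{stem.upper()}-{label}"
            return morphemes, gloss, TOY_DICTIONARY[stem]
    # No suffix matched: the whole word is treated as one morpheme.
    return lower, lower.upper(), TOY_DICTIONARY.get(lower, "")


print(toy_gloss("Bindkat"))
# ('bind-kat', 'BIND-NMLZ', 'write')
```

Morphemes are joined with hyphens and grammatical labels are written in capitals, as in the Leipzig conventions. Words missing from the toy dictionary get an empty translation.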

In [47]:
sentences = [
    "Xale bi dem na",
    "Dafa naan ndox",
]

for sent in sentences:
    result = gloss(sent)
    print(f"\n{sent}")
    print("-" * 50)
    print(result.to_string())


Xale bi dem na
--------------------------------------------------
Xale   bi   dem  na     
xale   bi   dem  na     
XALE   DET  DEM  3SG.PFV
child  ?    go          

Dafa naan ndox
--------------------------------------------------
Dafa      naan   ndox 
Dafa      naan   ndox 
3SG.VRBF  NAAN   NDOX 
          drink  water


In [48]:
# Access individual glossed words
result = gloss("Bindkat bi wax na")

print(f"{'Wolof':<12} {'Morphemes':<15} {'Gloss':<15} {'Translation'}")
print("-" * 55)
for word in result.words:
    print(f"{word.wolof:<12} {word.morphemes:<15} {word.gloss:<15} {word.translation}")

Wolof        Morphemes       Gloss           Translation
-------------------------------------------------------
Bindkat      bind-kat        BIND-NMLZ       write
bi           bi              DET             ?
wax          wax             WAX             speak/say
na           na              3SG.PFV         


## 5. Combined Pipeline Example

The cell below runs POS tagging, NER, and sentiment analysis on a single comment.

In [49]:
text = "Dafa sonnu lool, motax demul Kaolack"

print(f"Text: {text}\n")

# POS Tagging
print("POS Tags:")
for word, pos in tag(text):
    print(f"  {word}: {pos}")

# NER
print("\nNamed Entities:")
for ent in extract_entities(text):
    print(f"  {ent.text}: {ent.label}")

# Sentiment
result = analyze_sentiment(text)
print(f"\nSentiment: {result.sentiment.name} (score: {result.score:.2f})")

Text: Dafa sonnu lool, motax demul Kaolack

POS Tags:
  Dafa: TAM
  sonnu: ADJ
  lool: ADV
  motax: CONJ
  demul: VERB.NEG
  Kaolack: PROPN.LOC

Named Entities:
  Kaolack: LOC

Sentiment: NEGATIVE (score: 0.94)


## Implementation Notes

All applications use **rule-based** approaches:

- **POS Tagger**: Lexicon + morphological patterns + context rules
- **NER**: Gazetteers (Senegalese names, places) + title patterns
- **Sentiment**: Lexicon + negation morphology + discourse markers
- **Glosser**: Morphological analyzer + dictionary lookup

The rules target Wolof-specific linguistic features, so none of the applications requires training data.