# WordNet Auto-Translation Introduction

This notebook introduces the WordNet Auto-Translation system, which is built on DSPy pipelines.

## Overview

The WordNet Auto-Translation project aims to expand WordNet coverage for less-resourced languages by:
- Using DSPy for optimized translation pipelines
- Leveraging precision words and examples from target languages
- Providing tools for automatic synset translation

In [None]:
# Import necessary libraries
import sys
sys.path.append('../src')

from wordnet_autotranslate import TranslationPipeline, SynsetHandler, LanguageUtils
import pandas as pd
import matplotlib.pyplot as plt

## 1. Setting up the Translation Pipeline

In [None]:
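# Initialize the translation pipeline for English -> Spanish.
# The code 'es' is used for both the pipeline and the language-name lookup so the
# two stay consistent; adjust it if TranslationPipeline expects another identifier.
source_lang, target_lang = 'en', 'es'
pipeline = TranslationPipeline(source_lang=source_lang, target_lang=target_lang)

print(f"Pipeline configured for: {LanguageUtils.get_language_name(source_lang)} -> {LanguageUtils.get_language_name(target_lang)}")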

## 2. Loading Examples

In [None]:
# Load Spanish examples
examples = pipeline.load_examples()

print(f"Loaded {len(examples['words'])} words and {len(examples['sentences'])} sentences")
print("\nSample words:")
for word in examples['words'][:5]:
    print(f"  - {word}")
    
print("\nSample sentences:")
for sentence in examples['sentences'][:3]:
    print(f"  - {sentence}")

## 3. Working with WordNet Synsets

In [None]:
# Initialize synset handler
synset_handler = SynsetHandler()

# Get synsets for a word
word = "dog"
synsets = synset_handler.get_synsets(word)

print(f"Found {len(synsets)} synsets for '{word}':")
for i, synset in enumerate(synsets[:3], start=1):  # Show first 3
    print(f"\n{i}. {synset['name']}")
    print(f"   Definition: {synset['definition']}")
    print(f"   Examples: {synset['examples']}")
    print(f"   Lemmas: {synset['lemmas']}")

## 4. Language Analysis

In [None]:
# Analyze available languages
from pathlib import Path

examples_path = Path('../examples')
available_languages = LanguageUtils.get_available_languages(examples_path)

print("Available languages:")
for lang in available_languages:
    validation = LanguageUtils.validate_examples_directory(examples_path, lang)
    status = "✓" if validation['has_content'] else "⚠"
    print(f"  {status} {lang} ({LanguageUtils.get_language_name(lang)})")
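To make the check above concrete, here is a minimal sketch of what such a directory scan could look like using only `pathlib`. The layout (`<examples>/<lang>/words.txt` and `sentences.txt`) and the helper names `list_language_dirs`, `has_content`, and `demo_scan` are assumptions made for illustration. They are not taken from `LanguageUtils`, whose actual behavior may differ.

```python
from pathlib import Path
import tempfile

# Hypothetical file names; the real project layout may differ.
EXPECTED_FILES = ("words.txt", "sentences.txt")


def list_language_dirs(examples_path: Path) -> list[str]:
    """Return the names of the subdirectories of examples_path, sorted."""
    if not examples_path.is_dir():
        return []
    return sorted(p.name for p in examples_path.iterdir() if p.is_dir())


def has_content(examples_path: Path, lang: str) -> bool:
    """True if any expected file for lang exists and has a non-blank line."""
    for filename in EXPECTED_FILES:
        file_path = examples_path / lang / filename
        if file_path.is_file():
            text = file_path.read_text(encoding="utf-8")
            if any(line.strip() for line in text.splitlines()):
                return True
    return False


def demo_scan() -> dict[str, bool]:
    """Build a throwaway examples tree and report which languages have content."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "es").mkdir()
        (root / "es" / "words.txt").write_text("perro\ngato\n", encoding="utf-8")
        (root / "de").mkdir()  # directory exists but holds no example files
        return {lang: has_content(root, lang) for lang in list_language_dirs(root)}


print(demo_scan())
```

A language directory with no example files is reported as lacking content, which corresponds to the "⚠" status in the cell above.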

## 5. Text Processing Example

In [None]:
# Process sample text
sample_text = "The quick brown fox jumps over the lazy dog."
cleaned_text = LanguageUtils.clean_text(sample_text)
words = LanguageUtils.extract_words(sample_text)
stopwords = LanguageUtils.load_stopwords('en')

# Filter out stopwords (compare in lowercase so capitalized words such as "The" are also removed)
content_words = [word for word in words if word.lower() not in stopwords]

print(f"Original: {sample_text}")
print(f"Cleaned: {cleaned_text}")
print(f"All words: {words}")
print(f"Content words: {content_words}")
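For readers curious about what these steps involve, the following is a rough, standard-library sketch of whitespace cleaning, word extraction, and stopword filtering. The functions `clean_text`, `extract_words`, and `filter_stopwords` below are stand-ins written for this example, and `SAMPLE_STOPWORDS` is a deliberately tiny made-up set. None of it reflects how `LanguageUtils` is actually implemented.

```python
import re

# A tiny illustrative stopword set; a real list would be much larger.
SAMPLE_STOPWORDS = {"the", "over", "a", "an", "of"}


def clean_text(text: str) -> str:
    """Collapse runs of whitespace into single spaces and strip the ends."""
    return re.sub(r"\s+", " ", text).strip()


def extract_words(text: str) -> list[str]:
    """Return lowercase alphabetic tokens, dropping punctuation and digits."""
    # [^\W\d_] matches any Unicode letter, so accented characters are kept.
    return re.findall(r"[^\W\d_]+", text.lower())


def filter_stopwords(words: list[str], stopwords: set[str]) -> list[str]:
    """Keep only the words that are not in the stopword set."""
    return [w for w in words if w not in stopwords]


sample = "The quick  brown fox jumps over the lazy dog."
print(clean_text(sample))
print(filter_stopwords(extract_words(sample), SAMPLE_STOPWORDS))
```

Lowercasing inside `extract_words` is one way to make the stopword comparison case-insensitive; lowercasing at comparison time would work equally well.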

## 6. Visualization

In [None]:
# Create a simple visualization of language coverage
# (matplotlib.pyplot was already imported as plt in the first cell)

# Sample data for visualization
languages = ['Spanish', 'French', 'German', 'Portuguese']
word_counts = [10, 10, 0, 0]  # Based on our examples
sentence_counts = [10, 10, 0, 0]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Words chart
ax1.bar(languages, word_counts, color='skyblue')
ax1.set_title('Precision Words by Language')
ax1.set_ylabel('Number of Words')
ax1.tick_params(axis='x', rotation=45)

# Sentences chart
ax2.bar(languages, sentence_counts, color='lightgreen')
ax2.set_title('Example Sentences by Language')
ax2.set_ylabel('Number of Sentences')
ax2.tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()
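The counts in the chart above are hard-coded. As a sketch of how they could be derived instead, the helper below assumes each language maps to a dictionary with `'words'` and `'sentences'` lists, the same shape as the `examples` object from section 2. The name `count_examples` and the sample data are hypothetical and exist only to show the expected structure.

```python
def count_examples(examples_by_language):
    """Return parallel lists: language labels, word counts, sentence counts."""
    labels = list(examples_by_language)
    word_counts = [
        len(examples_by_language[lang].get("words", [])) for lang in labels
    ]
    sentence_counts = [
        len(examples_by_language[lang].get("sentences", [])) for lang in labels
    ]
    return labels, word_counts, sentence_counts


# Made-up placeholder data, only to show the expected shape.
examples_by_language = {
    "Spanish": {"words": ["perro", "gato"], "sentences": ["El perro ladra."]},
    "German": {"words": [], "sentences": []},
}

labels, word_counts, sentence_counts = count_examples(examples_by_language)
print(labels, word_counts, sentence_counts)
```

The three returned lists can be passed straight to `ax1.bar(labels, word_counts)` and `ax2.bar(labels, sentence_counts)` in place of the fixed values.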

## Next Steps

1. **Add more language examples**: Contribute precision words and sentences for additional languages
2. **Implement DSPy pipeline**: Develop the actual translation logic using DSPy
3. **Train models**: Use the examples to fine-tune translation models
4. **Evaluate results**: Create benchmarks for translation quality

See the other notebooks for more advanced topics:
- `02_translation_pipeline.ipynb`: Detailed pipeline implementation
- `03_language_expansion.ipynb`: Adding new languages
- `04_evaluation.ipynb`: Quality assessment and metrics