# Spanish NLP: Spell Checking Notebook

This notebook demonstrates how to use the `SpanishSpellChecker` class from the `spanish_nlp` library.

It supports multiple spell checking methods:
*   `dictionary`: Uses `pyspellchecker` based on dictionary lookups and edit distance.
*   `contextual_lm`: Uses a transformer-based masked language model (like BETO) for context-aware corrections.

For more information, visit the [spanish_nlp](https://github.com/jorgeortizfuentes/spanish_nlp) repository on GitHub.

## Setup

Import the necessary class and configure logging to see informational messages.

In [None]:
import logging
from spanish_nlp import SpanishSpellChecker

# Configure logging to see messages from the library
logging.basicConfig(level=logging.INFO)
# You might want to set a higher level (e.g., logging.WARNING) for less verbose output
# logging.basicConfig(level=logging.WARNING)
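
One caveat when changing verbosity in a notebook: `logging.basicConfig` does nothing if the root logger already has handlers, so re-running the cell above with a different level has no effect. The sketch below shows two standard-library ways around this. The logger name `spanish_nlp` is an assumption (libraries commonly name their loggers after the package); it is not confirmed by the library's documentation.

```python
import logging

# force=True (Python 3.8+) replaces any handlers already installed,
# for example by an earlier run of this cell.
logging.basicConfig(level=logging.INFO, force=True)

# A second plain call is silently ignored because handlers now exist.
logging.basicConfig(level=logging.WARNING)
level_after_plain_call = logging.getLogger().level   # still INFO

# With force=True the new level takes effect.
logging.basicConfig(level=logging.WARNING, force=True)
level_after_forced_call = logging.getLogger().level  # now WARNING

# Alternative: quiet one library without touching the root logger.
# "spanish_nlp" is an assumed logger name.
logging.getLogger("spanish_nlp").setLevel(logging.ERROR)

print(logging.getLevelName(level_after_plain_call))
print(logging.getLevelName(level_after_forced_call))

# Restore the INFO level used in the rest of this notebook.
logging.basicConfig(level=logging.INFO, force=True)
```

If the assumed logger name is wrong, the `setLevel` call is harmless: it only configures a logger that nothing writes to.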

## Method 1: Dictionary-Based Spell Checker (`method='dictionary'`)

In [None]:
try:
    # Initialize with default settings (language='es', distance=2)
    dict_checker = SpanishSpellChecker(method='dictionary')
    print(f"Initialized: {dict_checker.get_implementation_details()}")
except Exception as e:
    print(f"Error initializing dictionary checker: {e}")

In [None]:
text_simple = "Ola kmo stás? Esto es un testo de prueva."

if 'dict_checker' in locals():
    print(f"Original Text: {text_simple}")
    
    # Find potential errors
    errors = dict_checker.find_errors(text_simple)
    print(f"Potential Errors: {errors}")
    
    # Check a specific word
    print(f"Is 'stás' correct? {dict_checker.is_correct('stás')}")
    print(f"Is 'hola' correct? {dict_checker.is_correct('hola')}")
    
    # Get suggestions for a word
    suggestions = dict_checker.suggest('prueva')
    print(f"Suggestions for 'prueva': {suggestions}")
    
    # Get the single best correction for a word
    correction = dict_checker.correct_word('testo')
    print(f"Correction for 'testo': {correction}")
    
    # Correct the entire text (use with caution)
    corrected_text = dict_checker.correct_text(text_simple)
    print(f"Corrected Text: {corrected_text}")
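
For intuition about "dictionary lookups and edit distance", here is a minimal standard-library sketch of the idea. It is not the code used by `pyspellchecker` or `SpanishSpellChecker`; the function names and the ten-word vocabulary are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Hypothetical toy vocabulary; a real dictionary holds many thousands
# of entries, usually with word frequencies.
VOCAB = {"hola", "cómo", "como", "estás", "esto", "es", "un",
         "texto", "de", "prueba"}


def candidates(word: str, max_distance: int = 2) -> list[str]:
    """Vocabulary words within max_distance edits, closest first."""
    word = word.lower()
    if word in VOCAB:
        return [word]
    scored = sorted((levenshtein(word, v), v) for v in VOCAB)
    return [v for d, v in scored if d <= max_distance]


for w in ["ola", "testo", "prueva"]:
    print(w, "->", candidates(w))
```

Note that `testo` is one edit away from both `esto` and `texto`. Edit distance alone cannot choose between them, which is why dictionary-based corrections of whole texts should be reviewed.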

### Using Custom Dictionary and Distance

In [None]:
try:
    # Example with distance 1 and adding custom words
    custom_words = ['nlp', 'pythonista']
    dict_checker_custom = SpanishSpellChecker(
        method='dictionary',
        distance=1,
        custom_dictionary=custom_words,
    )
    
    text_custom = "Me gusta el nlp y soy un buen pythonista."
    print(f"\nOriginal Text: {text_custom}")
    errors_custom = dict_checker_custom.find_errors(text_custom)
    print(f"Potential Errors (custom dict): {errors_custom}") # Should be empty
    
except Exception as e:
    print(f"Error initializing custom dictionary checker: {e}")
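
To see what the `distance` setting trades off, the sketch below generates every string within one or two edits of a word, in the style of Peter Norvig's spelling corrector, on which `pyspellchecker` is based. The alphabet and the toy vocabulary are assumptions for illustration. Unlike pure Levenshtein distance, this approach also counts a transposition as a single edit.

```python
SPANISH_LETTERS = "abcdefghijklmnñopqrstuvwxyzáéíóúü"


def edits1(word: str) -> set[str]:
    """All strings one edit (delete, transpose, replace, insert) away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + c + r[1:] for l, r in splits if r for c in SPANISH_LETTERS}
    inserts = {l + c + r for l, r in splits for c in SPANISH_LETTERS}
    return deletes | transposes | replaces | inserts


def edits2(word: str) -> set[str]:
    """All strings reachable by applying edits1 twice."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


# Hypothetical toy dictionary, including two custom words.
VOCAB = {"como", "cómo", "nlp", "pythonista"}

word = "kmo"
near = edits1(word)
far = edits2(word)

print(f"strings within 1 edit:  {len(near)}")
print(f"strings within 2 edits: {len(far)}")
print("matches at distance 1:", sorted(near & VOCAB))
print("matches at distance 2:", sorted(far & VOCAB))
```

The candidate set grows sharply from one edit to two, so a distance of 1 is cheaper but misses corrections such as `kmo` to `como`, which needs two edits.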

## Method 2: Contextual Language Model (`method='contextual_lm'`)

This method uses a transformer model (like BETO) to take context into account. It is generally better at catching real-word errors, that is, correctly spelled words used in the wrong place (e.g., homophones such as *ha* and *a*), but it is computationally more expensive.

In [None]:
try:
    # Initialize using BETO model, auto-detect device (GPU if available)
    # You can specify device='cpu' or device=0 (for first GPU)
    lm_checker = SpanishSpellChecker(
        method='contextual_lm',
        model_name="dccuchile/bert-base-spanish-wwm-uncased", # BETO
        top_k=5, # Number of candidates model considers internally
        suggestion_distance_threshold=2 # Filter suggestions by Levenshtein distance
    )
    print(f"Initialized: {lm_checker.get_implementation_details()}")

except Exception as e:
    print(f"Error initializing contextual checker: {e}")
    print("Make sure 'transformers', 'torch' (or 'tensorflow'), 'accelerate', and 'python-Levenshtein' are installed.")
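
The `top_k` and `suggestion_distance_threshold` parameters suggest two stages: the masked language model proposes replacements for a position, and proposals that are spelled too differently from the original word are discarded. The sketch below illustrates only the second stage. It is an assumption about how such a filter could work, not the library's implementation, and the candidate scores are invented for illustration rather than real model output.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, written recursively (fine for short words)."""
    if not a or not b:
        return len(a) + len(b)
    if a[0] == b[0]:
        return edit_distance(a[1:], b[1:])
    return 1 + min(
        edit_distance(a[1:], b),      # delete from a
        edit_distance(a, b[1:]),      # insert into a
        edit_distance(a[1:], b[1:]),  # substitute
    )


def filter_candidates(original, scored, top_k=5, max_distance=2):
    """Keep the top_k best-scoring candidates, then drop those more
    than max_distance edits away from the original word."""
    top = sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
    return [(tok, score) for tok, score in top
            if edit_distance(original.lower(), tok) <= max_distance]


# Invented scores for the masked position in "No [MASK] si ir al cine."
hypothetical_output = [
    ("sé", 0.62), ("importa", 0.11), ("se", 0.09),
    ("pregunté", 0.05), ("es", 0.03), ("ce", 0.001),
]

kept = filter_candidates("ce", hypothetical_output)
print([tok for tok, _ in kept])
```

The distance filter keeps the model from replacing a misspelling with a word that fits the sentence but bears no resemblance to what was typed.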

In [None]:
text_context_1 = "Fui ha ver la pelicula."
text_context_2 = "El tubo se rompio."
text_context_3 = "No ce si ir al cine."

if 'lm_checker' in locals():
    print(f"Original 1: {text_context_1}")
    # is_correct only checks the vocabulary; 'ha' is a real word, so it may pass
    print(f"Is 'ha' in vocab? {lm_checker.is_correct('ha')}") 
    # correct_text uses context
    corrected_1 = lm_checker.correct_text(text_context_1)
    print(f"Corrected 1: {corrected_1}")
    
    print(f"\nOriginal 2: {text_context_2}")
    corrected_2 = lm_checker.correct_text(text_context_2)
    print(f"Corrected 2: {corrected_2}") # Model might choose 'tubo' or 'tuvo'
    
    print(f"\nOriginal 3: {text_context_3}")
    corrected_3 = lm_checker.correct_text(text_context_3)
    print(f"Corrected 3: {corrected_3}")

### Contextual Suggestions (Less Reliable without Context)

The `suggest` and `correct_word` methods of the contextual checker mask the word in isolation, so the model has no surrounding context to draw on. They are therefore less reliable than `correct_text`.

In [None]:
if 'lm_checker' in locals():
    word_isolated = "vien"
    suggestions_isolated = lm_checker.suggest(word_isolated)
    correction_isolated = lm_checker.correct_word(word_isolated)
    
    print(f"\nSuggestions for '{word_isolated}' (isolated): {suggestions_isolated}")
    print(f"Correction for '{word_isolated}' (isolated): {correction_isolated}")
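
To make the difference between isolated and in-context masking concrete, the sketch below builds the masked inputs a BERT-style model would see. `[MASK]` is the mask token of BERT tokenizers such as BETO's. The helper `masked_variants` is hypothetical, and how `SpanishSpellChecker` actually constructs its inputs is an assumption here.

```python
import re

MASK = "[MASK]"  # mask token used by BERT-style tokenizers


def masked_variants(text: str):
    """Yield (word, masked_text) pairs, masking one word at a time."""
    for match in re.finditer(r"\w+", text):
        masked = text[:match.start()] + MASK + text[match.end():]
        yield match.group(), masked


# In context: the model sees the neighbouring words of each masked slot.
for word, masked in masked_variants("Fui ha ver la pelicula."):
    print(f"{word:>9}: {masked}")

# In isolation: the input is only the mask token, so nothing
# constrains the prediction.
print(list(masked_variants("vien")))
```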