# Spanish NLP: Spell Checking Notebook

This notebook demonstrates how to use the `SpanishSpellChecker` class from the `spanish_nlp` library.

It supports multiple spell checking methods:
*   `dictionary`: Uses `pyspellchecker` based on dictionary lookups and edit distance.
*   `contextual_lm`: Uses a transformer-based masked language model (like BETO) for context-aware corrections.

For more information, visit the [spanish_nlp](https://github.com/jorgeortizfuentes/spanish_nlp) repository on GitHub.

## Setup

Import the necessary class and configure logging to see informational messages.

In [8]:
import logging
from spanish_nlp import SpanishSpellChecker

# Configure logging to see messages from the library
logging.basicConfig(level=logging.INFO)
# You might want to set a higher level (e.g., logging.WARNING) for less verbose output
# logging.basicConfig(level=logging.WARNING)
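
If only the library's messages are too verbose, an alternative is to raise the level of the library's own logger and leave the root logger at `INFO`. The sketch below assumes the logger name `spanish_nlp.spellchecker`, which is taken from the log lines printed in this notebook rather than from documented API. `force=True` (Python 3.8+) is used because a second `logging.basicConfig` call is otherwise ignored once the root logger already has handlers, which is common when re-running notebook cells.

```python
import logging

# Reconfigure the root logger even if it was configured by an earlier cell.
logging.basicConfig(level=logging.INFO, force=True)

# Assumption: the library logs under "spanish_nlp.spellchecker" (as seen in the
# INFO lines of this notebook). Child loggers inherit this level.
logging.getLogger("spanish_nlp.spellchecker").setLevel(logging.WARNING)
```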

## Method 1: Dictionary-Based Spell Checker (`method='dictionary'`)

In [9]:
try:
    # Initialize with default settings (language='es', distance=2)
    dict_checker = SpanishSpellChecker(method="dictionary")
    print(f"Initialized: {dict_checker.get_implementation_details()}")
except Exception as e:
    print(f"Error initializing dictionary checker: {e}")

INFO:spanish_nlp.spellchecker:Initializing SpanishSpellChecker with method: 'dictionary'
INFO:spanish_nlp.spellchecker.dictionary_impl:DictionarySpellChecker initialized for language 'es' with distance 2.


Initialized: Using implementation: DictionarySpellChecker


In [10]:
text_simple = "hola komo stas?"

if "dict_checker" in locals():
    print(f"Original Text: {text_simple}")

    # Find potential errors
    errors = dict_checker.find_errors(text_simple)
    print(f"Potential Errors: {errors}")

    # Check a specific word
    print(f"Is 'stás' correct? {dict_checker.is_correct('stás')}")
    print(f"Is 'hola' correct? {dict_checker.is_correct('hola')}")

    # Get suggestions for a word
    suggestions = dict_checker.suggest("pruevs")
    print(f"Suggestions for 'pruevs': {suggestions}")

    # Get the single best correction for a word
    correction = dict_checker.correct_word("testo")
    print(f"Correction for 'testo': {correction}")

    # Correct the entire text (use with caution)
    corrected_text = dict_checker.correct_text(text_simple)
    print(f"Corrected Text: {corrected_text}")

Original Text: hola komo stas?
Potential Errors: ['stas', 'komo']
Is 'stás' correct? False
Is 'hola' correct? True
Suggestions for 'pruevs': ['pues', 'prueba']
Correction for 'testo': esto
Corrected Text: hola como estas?


### Using a Custom Distance

The `distance` parameter controls the maximum Levenshtein distance for suggestions. Levenshtein distance is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into another.

A lower `distance` value makes the checker stricter, only suggesting words that are very similar (few edits away). A higher value is more lenient and will suggest words that are less similar.
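
To make the definition above concrete, here is a minimal dynamic-programming sketch of Levenshtein distance. It is for illustration only and is not how `spanish_nlp` or `pyspellchecker` computes distances internally; the function name `levenshtein` is chosen here and is not part of either library.

```python
def levenshtein(source: str, target: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn `source` into `target`."""
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]  # turning i characters into "" takes i deletions
        for j, target_char in enumerate(target, start=1):
            substitution_cost = 0 if source_char == target_char else 1
            current.append(min(
                previous[j] + 1,                      # delete source_char
                current[j - 1] + 1,                   # insert target_char
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


for candidate in ("prueba", "pues"):
    print(candidate, levenshtein("pruevs", candidate))
# prueba 2
# pues 2
```

Both candidates are exactly two edits away from 'pruevs', so a checker limited to one edit cannot reach either of them.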

In the example below, we compare the default distance (2) with a stricter distance (1) for the misspelled word 'pruevs'. Notice how `distance=1` returns fewer (or no) suggestions.

In [11]:
try:
    # Initialize with a stricter distance (distance=1)
    dict_checker_strict = SpanishSpellChecker(method="dictionary", distance=1)

    word_to_check = "pruevs"
    suggestions_default = dict_checker.suggest(word_to_check)  # Using default checker (distance=2)
    suggestions_strict = dict_checker_strict.suggest(word_to_check)

    print(f"Suggestions for '{word_to_check}' (distance=2): {suggestions_default}")
    print(f"Suggestions for '{word_to_check}' (distance=1): {suggestions_strict}")

except Exception as e:
    print(f"Error initializing strict distance checker: {e}")

INFO:spanish_nlp.spellchecker:Initializing SpanishSpellChecker with method: 'dictionary'
INFO:spanish_nlp.spellchecker.dictionary_impl:DictionarySpellChecker initialized for language 'es' with distance 1.


Suggestions for 'pruevs' (distance=2): ['pues', 'prueba']
Suggestions for 'pruevs' (distance=1): []


### Using a Custom Dictionary

Adding custom words to the checker's dictionary prevents correctly spelled domain-specific terms or names from being flagged as errors.

In [12]:
try:
    # Example adding custom words (using default distance=2)
    custom_words = ["levenshtein"]
    dict_checker_custom = SpanishSpellChecker(method="dictionary", custom_dictionary=custom_words)

    text_custom = "holaa levenshtein komo stas"
    print(f"\nOriginal Text: {text_custom}")
    errors_custom = dict_checker_custom.find_errors(text_custom)
    print(f"Potential Errors (custom dict): {errors_custom}")

except Exception as e:
    print(f"Error initializing custom dictionary checker: {e}")

INFO:spanish_nlp.spellchecker:Initializing SpanishSpellChecker with method: 'dictionary'
INFO:spanish_nlp.spellchecker.dictionary_impl:Loading 1 words from custom list.
INFO:spanish_nlp.spellchecker.dictionary_impl:DictionarySpellChecker initialized for language 'es' with distance 2.



Original Text: holaa levenshtein komo stas
Potential Errors (custom dict): ['holaa', 'komo', 'stas']
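
The effect of a custom dictionary can be sketched without the library: a word is flagged only if it is missing from the known vocabulary, so extending that vocabulary removes the flag. The names `BASE_VOCABULARY` and `find_unknown_words` are hypothetical, and the three-word vocabulary is a stand-in for a real Spanish word list.

```python
import re

# Hypothetical stand-in vocabulary; a real checker loads a full word list.
BASE_VOCABULARY = {"hola", "como", "estas"}


def find_unknown_words(text, vocabulary):
    """Return lowercase tokens of `text` that are not in `vocabulary`,
    in order of first appearance and without duplicates."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    unknown = []
    for token in tokens:
        if token not in vocabulary and token not in unknown:
            unknown.append(token)
    return unknown


text = "holaa levenshtein komo stas"
print(find_unknown_words(text, BASE_VOCABULARY))
# ['holaa', 'levenshtein', 'komo', 'stas']
print(find_unknown_words(text, BASE_VOCABULARY | {"levenshtein"}))
# ['holaa', 'komo', 'stas']
```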


## Method 2: Contextual Language Model (`method='contextual_lm'`)

This method uses a transformer model (like BETO) to take the surrounding context into account. It is generally better at catching correctly spelled words that are used in the wrong place (e.g., homophones), but it is computationally more expensive than the dictionary method.

**Note:** This implementation is currently a skeleton and does not perform actual corrections.
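
Because the implementation is a skeleton, the following is only one plausible way a contextual checker might combine a masked language model with an edit-distance threshold: take the model's ranked predictions for the masked position and keep those that are close in spelling to the original word. Everything here is an assumption for illustration. The function names are hypothetical, and `ranked_tokens` is a hand-written placeholder list, not real model output.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def edit_distance(a: str, b: str) -> int:
    """Recursive Levenshtein distance (fine for short words)."""
    if not a:
        return len(b)
    if not b:
        return len(a)
    cost = 0 if a[0] == b[0] else 1
    return min(
        edit_distance(a[1:], b) + 1,         # deletion
        edit_distance(a, b[1:]) + 1,         # insertion
        edit_distance(a[1:], b[1:]) + cost,  # substitution or match
    )


def filter_predictions(original, ranked_tokens, max_distance=2):
    """Keep predicted tokens within `max_distance` edits of `original`,
    preserving the order in which they were ranked."""
    return [
        token
        for token in ranked_tokens
        if edit_distance(original.lower(), token.lower()) <= max_distance
    ]


# Placeholder ranking, written by hand to stand in for top-k model predictions.
ranked_tokens = ["como", "que", "cómo", "donde"]
print(filter_predictions("komo", ranked_tokens))
# ['como', 'cómo']
```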

In [None]:
try:
    # Initialize using BETO model, auto-detect device (GPU if available)
    # You can specify device='cpu' or device=0 (for first GPU)
    lm_checker = SpanishSpellChecker(
        method="contextual_lm",
        model_name="dccuchile/bert-base-spanish-wwm-uncased",  # BETO
        # Other parameters like top_k, suggestion_distance_threshold would go here
    )
    print(f"Initialized: {lm_checker.get_implementation_details()}")

    # --- Example usage (will show warnings as it's not implemented) ---
    text_context_1 = "Ola komo estas?"
    print(f"\nOriginal: {text_context_1}")
    corrected_1 = lm_checker.correct_text(text_context_1)
    print(f"Corrected (Skeleton): {corrected_1}")

    print(f"Is 'komo' correct? {lm_checker.is_correct('komo')}")
    print(f"Suggestions for 'komo': {lm_checker.suggest('komo')}")

except Exception as e:
    print(f"Error initializing or using contextual checker: {e}")
    # print(
    #     "Make sure 'transformers', 'torch' (or 'tensorflow'), 'accelerate', and 'python-Levenshtein' are installed for full functionality."
    # )