# NER Models Demonstration

This notebook demonstrates a modular NER system for Russian cultural texts, with interchangeable backends for the OpenAI API, spaCy, and DeepPavlov.

## Features
- **Unified Interface**: All models work through the same API (BaseNERModel)
- **Direct Instantiation**: Simple, direct model creation without factory complexity
- **Multiple Providers**: OpenAI, spaCy, DeepPavlov, and extensible for more
- **Russian Focus**: Optimized for Russian cultural text NER
- **Cross-platform**: Works in both Google Colab and local environments
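
To make the unified interface concrete, here is a minimal sketch of what a shared base class could look like. It is not the repository's code: `SketchPrediction`, `SketchNERModel`, and `KeywordModel` are hypothetical names, and only the method and attribute names (`initialize()`, `predict()`, `get_model_info()`, `text`, `start`, `end`, `entity_type`) are taken from how this notebook uses the real classes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SketchPrediction:
    """Hypothetical stand-in for NERPrediction: one entity span."""
    text: str
    start: int
    end: int
    entity_type: str


class SketchNERModel(ABC):
    """Hypothetical stand-in for BaseNERModel."""

    def __init__(self, model_name: str, **config: Any) -> None:
        self.model_name = model_name
        self.config = config

    @abstractmethod
    def initialize(self) -> None:
        """Load weights, open API clients, and so on."""

    @abstractmethod
    def predict(self, text: str) -> List[SketchPrediction]:
        """Return the entities found in a single text."""

    def get_model_info(self) -> Dict[str, Any]:
        return {
            "model_name": self.model_name,
            "model_type": type(self).__name__,
            "config": self.config,
        }


class KeywordModel(SketchNERModel):
    """Toy backend that tags every occurrence of configured keywords."""

    def initialize(self) -> None:
        self.keywords: Dict[str, str] = dict(self.config.get("keywords", {}))

    def predict(self, text: str) -> List[SketchPrediction]:
        found = []
        for keyword, entity_type in self.keywords.items():
            if not keyword:
                continue  # an empty keyword would match everywhere
            pos = text.find(keyword)
            while pos != -1:
                found.append(SketchPrediction(keyword, pos, pos + len(keyword), entity_type))
                pos = text.find(keyword, pos + len(keyword))
        return sorted(found, key=lambda p: p.start)


model = KeywordModel("keyword-demo", keywords={"Москва": "LOC"})
model.initialize()
demo_text = "Борис Кутенков (Москва)"
for pred in model.predict(demo_text):
    print(f"'{pred.text}' [{pred.start}:{pred.end}] ({pred.entity_type})")
```

Because every backend exposes the same three methods, calling code never needs to know which provider sits behind a given model object.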

## Supported Models
- **OpenAI**: GPT-4o, GPT-4, GPT-3.5-turbo (requires API key)
- **spaCy**: Russian models (ru_core_news_sm, ru_spacy_ru_updated)
- **DeepPavlov**: BERT-based Russian NER (ner_collection3_bert, ner_ontonotes_bert_mult)

---
**Instructions**: Run the setup cells below in order, then continue with the demonstration.

## Setup Instructions

**Run cells in order:**

1. **Cell 2**: Common setup (repository cloning and dependencies)
2. **Cell 3**: Environment configuration (Colab OR Local)
3. **Cell 4**: Continue with the demo

The setup will automatically detect your environment and configure accordingly.
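
For readers curious how such detection can work, the sketch below shows one common approach. It assumes that Colab kernels have the `google.colab` package loaded, which is a widely used heuristic rather than a guarantee; `in_colab` and `resolve_repo_dir` are hypothetical helper names, not functions from the repository.

```python
import sys
from pathlib import Path


def in_colab() -> bool:
    """Return True when the kernel appears to be running in Google Colab."""
    # Heuristic: Colab kernels have the google.colab package loaded.
    return "google.colab" in sys.modules


def resolve_repo_dir(cwd: Path, repo_name: str = "NER") -> Path:
    """Return the repository directory whether or not cwd is already inside it."""
    return cwd if cwd.name == repo_name else cwd / repo_name


repo_dir = resolve_repo_dir(Path.cwd())
print(f"Colab: {in_colab()}, repository directory: {repo_dir}")
```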

In [None]:
!git clone https://github.com/mary-lev/NER.git

In [None]:
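# Environment setup and directory configuration
import os
import sys
from pathlib import Path

current_dir = Path.cwd()
if current_dir.name == 'NER':
    print("Already in NER directory")
    ner_dir = current_dir
else:
    ner_dir = current_dir / 'NER'
    # Switch into the cloned repository so that `utils` can be imported later
    if ner_dir.is_dir():
        os.chdir(ner_dir)

# Make the repository root importable regardless of the working directory
if str(ner_dir) not in sys.path:
    sys.path.insert(0, str(ner_dir))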


Already in NER directory


In [2]:
import pandas as pd
import numpy as np
import logging
from typing import List, Dict, Any
import matplotlib.pyplot as plt

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

print("Core libraries imported successfully!")

Core libraries imported successfully!


In [ ]:
# Import the NER models system - simplified without factory
from utils.ner_models import (
    BaseNERModel,
    NERPrediction, 
    OpenAIModel,
    SpacyModel,
    DeepPavlovModel
)

print("NER models system loaded successfully!")

# Show available model classes
print("\nAvailable model classes:")
print("   • OpenAIModel - for GPT-4o, GPT-4, GPT-3.5-turbo")
print("   • SpacyModel - for ru_core_news_sm, ru_spacy_ru_updated")
print("   • DeepPavlovModel - for ner_collection3_bert, ner_ontonotes_bert_mult")

print("\nExample usage:")
print("   model = SpacyModel('ru_core_news_sm')")
print("   model.initialize()")
print("   predictions = model.predict(text)")

## Sample Data for Testing

In [4]:
# Russian cultural texts for NER testing
sample_texts = [
    "Встреча с писательницей Сюзанной Кулешовой. Презентация книги «Последний глоток божоле на двоих». Кулешова Сюзанна Марковна, член Союза писателей Санкт-Петербурга.",
    "Большой поэтический вечер в самом начале весны! Дмитрий Артис, Борис Кутенков (Москва), Дмитрий Шабанов, Рахман Кусимов, Серафима Сапрыкина, Ася Анистратенко.",
    "Вечер поэта Томаса Венцлова (США). Презентация книги 'Гранёный воздух' М.: ОГИ, 2002.",
    "В рамках выставки «Максим Винавер. Пора возвращаться домой…». Спектакль по пьесе Максима Винавера «11 сентября».",
    "Очередная встреча проекта «Открытая читка – юность». Куратор − Черток Анна"
]

print(f"Loaded {len(sample_texts)} sample Russian cultural texts")
print("\nSample text:")
print(f"   {sample_texts[0][:80]}...")

Loaded 5 sample Russian cultural texts

Sample text:
   Встреча с писательницей Сюзанной Кулешовой. Презентация книги «Последний глоток ...


## spaCy Models Demo
Local models that don't require API keys.

In [None]:
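# The numpy pin conflicts with deeppavlov 1.7.0, which requires numpy<1.24 (see the resolver message in the output)
!pip install -q numpy==1.25.0 spacy openai pandas matplotlib seaborn
!python -m spacy download ru_core_news_sm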

DEPRECATION: The HTML index page being used (https://pypi.org/project/numpy/) is not a proper HTML 5 document. This is in violation of PEP 503 which requires these pages to be well-formed HTML 5 documents. Please reach out to the owners of this index page, and ask them to update this index page to a valid HTML 5 document. pip 22.2 will enforce this behaviour change. Discussion can be found at https://github.com/pypa/pip/issues/10825
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
deeppavlov 1.7.0 requires numpy<1.24, but you have numpy 1.25.0 which is incompatible.
Looking in indexes: https://pypi.org/simple, https://admin:****@pypi.welltory.tech/pypi/
Collecting ru-core-news-sm==3.8.0


In [ ]:
def demo_spacy_model(model_name='ru_core_news_sm', text_index=0):
    """Demonstrate a spaCy model on sample text."""
    try:
        print(f"\nTesting spaCy model: {model_name}...")
        
        # Create and initialize model directly
        model = SpacyModel(model_name)
        model.initialize()
        
        # Get model info
        info = model.get_model_info()
        print(f"Model: {info['model_name']} ({info['model_type']})")
        
        # Test prediction
        test_text = sample_texts[text_index]
        print(f"Text: {test_text[:100]}...")
        
        predictions = model.predict(test_text)
        
        print(f"Found {len(predictions)} entities:")
        for pred in predictions:
            print(f"   • '{pred.text}' [{pred.start}:{pred.end}] ({pred.entity_type})")
        
        return model, predictions
        
    except Exception as e:
        print(f"Failed: {e}")
        return None, []

# Test spaCy models
print("Testing spaCy models (local, no API key needed)")

# Check if spaCy is available and handle numpy compatibility issues
try:
    import spacy
    print(f"spaCy version: {spacy.__version__}")
    
    # Test the spaCy model
    spacy_model, spacy_predictions = demo_spacy_model('ru_core_news_sm')
    
except ImportError:
    print("spaCy not installed. Install with:")
    print("   !pip install spacy")
    print("   !python -m spacy download ru_core_news_sm")
    spacy_model, spacy_predictions = None, []
    
except ValueError as e:
    if "numpy.dtype size changed" in str(e):
        print("NumPy/spaCy compatibility issue detected.")
        print("Try restarting the runtime and running these commands:")
        print("   !pip install --upgrade numpy")
        print("   !pip install --no-build-isolation --force-reinstall spacy")
        print("   !python -m spacy download ru_core_news_sm")
        print("Then restart this notebook.")
    else:
        print(f"spaCy error: {e}")
    spacy_model, spacy_predictions = None, []

## DeepPavlov Models Demo
BERT-based models for high-accuracy Russian NER.
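
Token-classification models of this kind typically emit one BIO tag per token (`B-PER`, `I-PER`, `O`, and so on), which then has to be merged into character spans. The sketch below illustrates that merging step under stated assumptions: the tokens and tags are hand-written rather than produced by DeepPavlov, `bio_to_spans` is a hypothetical helper, and the repository's `DeepPavlovModel` may perform this conversion differently.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, int, int, str]  # (entity_text, start, end, entity_type)


def bio_to_spans(text: str, tokens: List[str], tags: List[str]) -> List[Span]:
    """Merge BIO-tagged tokens into character-level entity spans."""
    spans: List[Span] = []
    open_span: Optional[list] = None  # [start, end, entity_type] of the entity being built
    cursor = 0

    def close() -> None:
        nonlocal open_span
        if open_span is not None:
            start, end, label = open_span
            spans.append((text[start:end], start, end, label))
            open_span = None

    for token, tag in zip(tokens, tags):
        start = text.find(token, cursor)
        if start == -1:
            # The tokenizer altered the token; drop it rather than guess an offset.
            close()
            continue
        end = start + len(token)
        cursor = end
        label = tag[2:] if tag[:2] in ("B-", "I-") else None
        if label is None:
            close()  # an "O" tag ends any open entity
        elif tag.startswith("I-") and open_span is not None and open_span[2] == label:
            open_span[1] = end  # extend the current entity
        else:
            close()  # "B-" tag, or an "I-" tag without a matching open entity
            open_span = [start, end, label]
    close()
    return spans


bio_text = "Вечер поэта Томаса Венцлова (США)."
bio_tokens = ["Вечер", "поэта", "Томаса", "Венцлова", "(", "США", ")", "."]
bio_tags = ["O", "O", "B-PER", "I-PER", "O", "B-LOC", "O", "O"]  # hand-written example tags
for span in bio_to_spans(bio_text, bio_tokens, bio_tags):
    print(span)
```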

In [None]:
!pip install -q deeppavlov

In [ ]:
print("\nTesting DeepPavlov models (BERT-based, high accuracy)")
print("Note: First run takes 5-10 minutes to download models")

def demo_deeppavlov_model(model_name='ner_collection3_bert'):
    """Demonstrate a DeepPavlov model."""
    try:
        print(f"\nTesting DeepPavlov model: {model_name}...")
        
        # Create and initialize model directly
        model = DeepPavlovModel(model_name)
        model.initialize()
        
        # Test prediction
        test_text = sample_texts[0]
        predictions = model.predict(test_text)
        
        print(f"Found {len(predictions)} entities:")
        for pred in predictions:
            print(f"   • '{pred.text}' [{pred.start}:{pred.end}] ({pred.entity_type})")
        
        return model, predictions
        
    except Exception as e:
        print(f"DeepPavlov model failed: {e}")
        return None, []
    
# Uncomment to actually test (takes time on first run)
# dp_model, dp_predictions = demo_deeppavlov_model('ner_collection3_bert')
    
print("DeepPavlov demo skipped (uncomment above to run)")
print("Expected performance: F1 ~0.78-0.81 on Russian cultural texts")

## OpenAI Models Demo
API-based models with high flexibility.
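
A language model returns text, not spans, so an API-based backend has to map the response back onto character offsets. The sketch below shows one way this could be done, assuming the model was asked to reply with a JSON list of `{"text": ..., "type": ...}` objects. That schema and the helper name `parse_llm_entities` are assumptions for illustration; the prompt and parsing used by `OpenAIModel` are not shown in this notebook and may differ.

```python
import json
from typing import Dict, List, Union

SpanDict = Dict[str, Union[str, int]]


def parse_llm_entities(text: str, raw_response: str) -> List[SpanDict]:
    """Turn a JSON entity list from an LLM into spans with character offsets."""
    try:
        items = json.loads(raw_response)
    except json.JSONDecodeError:
        return []  # malformed output yields no entities instead of an exception
    if not isinstance(items, list):
        return []

    spans: List[SpanDict] = []
    cursor = 0
    for item in items:
        if not isinstance(item, dict):
            continue
        surface = str(item.get("text", ""))
        label = str(item.get("type", ""))
        if not surface:
            continue
        # Prefer the next occurrence after the previous entity, then fall back to any occurrence.
        start = text.find(surface, cursor)
        if start == -1:
            start = text.find(surface)
        if start == -1:
            continue  # the model returned a string that is not in the text
        end = start + len(surface)
        cursor = max(cursor, end)
        spans.append({"text": surface, "start": start, "end": end, "entity_type": label})
    return spans


llm_text = "Очередная встреча проекта «Открытая читка – юность». Куратор − Черток Анна"
llm_reply = '[{"text": "Черток Анна", "type": "PERSON"}, {"text": "Иван Иванов", "type": "PERSON"}]'
print(parse_llm_entities(llm_text, llm_reply))
```

Dropping strings that cannot be located in the input is a simple guard against hallucinated entities, at the cost of also losing entities the model paraphrased or re-inflected.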

In [ ]:
# Check for OpenAI API key
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY', "your_openai_api_key")
openai_available = 'OPENAI_API_KEY' in os.environ

def demo_openai_model(model_name='gpt-4o'):
    """Demonstrate an OpenAI model."""
    try:
        print(f"\nTesting OpenAI model: {model_name}...")
        
        # Create and initialize model directly
        model = OpenAIModel(model_name=model_name)
        model.initialize()
        
        # Test prediction
        test_text = sample_texts[0]
        predictions = model.predict(test_text)
        
        print(f"Found {len(predictions)} entities:")
        for pred in predictions:
            print(f"   • '{pred.text}' [{pred.start}:{pred.end}] ({pred.entity_type})")
        
        return model, predictions
        
    except Exception as e:
        print(f"OpenAI model failed: {e}")
        return None, []

if openai_available:
    gpt_model, gpt_predictions = demo_openai_model('gpt-4o')
else:
    print("OpenAI API key not set. Set the OPENAI_API_KEY environment variable to test OpenAI models.")
    gpt_model, gpt_predictions = None, []

## Multiple Text Processing Demo
Process multiple texts by calling `predict()` on each one in turn.
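
The same per-text loop can also feed a corpus-level tally. The sketch below counts entity types across texts; `count_entity_types` and `toy_predict` are hypothetical names, and the toy predictor (capitalized words tagged `NAME`) merely stands in for a real model. With a real model, `predict` could be something like `lambda t: [(p.text, p.entity_type) for p in model.predict(t)]`.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

Entity = Tuple[str, str]  # (entity_text, entity_type)


def count_entity_types(texts: Iterable[str],
                       predict: Callable[[str], List[Entity]]) -> Counter:
    """Run predict on each text and count how often each entity type occurs."""
    counts: Counter = Counter()
    for text in texts:
        try:
            entities = predict(text)
        except Exception:
            continue  # one failing text should not abort the whole run
        counts.update(entity_type for _, entity_type in entities)
    return counts


def toy_predict(text: str) -> List[Entity]:
    """Stand-in predictor: every capitalized word becomes a NAME entity."""
    return [(word, "NAME") for word in text.split() if word[:1].isupper()]


type_counts = count_entity_types(["Анна Черток", "вечер поэзии"], toy_predict)
print(type_counts.most_common())
```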

In [ ]:
def process_multiple_texts_demo(model, texts):
    """Demonstrate processing multiple texts with a model."""
    try:
        print(f"\nProcessing {len(texts)} texts with {model.model_name}...")
        
        all_predictions = []
        total_entities = 0
        
        for i, text in enumerate(texts):
            try:
                predictions = model.predict(text)
                all_predictions.append(predictions)
                total_entities += len(predictions)
                
                print(f"Text {i+1}: {len(predictions)} entities")
                # Show first 2 entities from each text
                for pred in predictions[:2]:
                    print(f"   • '{pred.text}' ({pred.entity_type})")
                if len(predictions) > 2:
                    print(f"   ... and {len(predictions) - 2} more")
                print()
                    
            except Exception as e:
                print(f"Error processing text {i+1}: {e}")
                all_predictions.append([])
        
        # Guard against an empty input list
        avg_entities = total_entities / len(texts) if texts else 0.0
        print(f"Summary: {total_entities} total entities ({avg_entities:.1f} avg per text)")
        
        return all_predictions
        
    except Exception as e:
        print(f"Processing failed: {e}")
        return []

# Demo multiple text processing with available models
if 'spacy_model' in locals() and spacy_model:
    multiple_results = process_multiple_texts_demo(spacy_model, sample_texts)
else:
    print("No models available for multiple text processing demo")
    print("   Install spaCy or set OpenAI API key to test processing multiple texts")

## Model Comparison Demo
Compare different models on the same text.
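
Side-by-side printouts are easier to interpret with an agreement number next to them. As a rough sketch, two models' outputs can be compared as sets of `(start, end, entity_type)` tuples; `span_agreement` is a hypothetical helper, and exact-span matching is only one of several reasonable agreement criteria.

```python
from typing import Dict, Iterable, Tuple

SpanKey = Tuple[int, int, str]  # (start, end, entity_type)


def span_agreement(spans_a: Iterable[SpanKey], spans_b: Iterable[SpanKey]) -> Dict[str, object]:
    """Summarize exact-match overlap between two models' predictions."""
    set_a, set_b = set(spans_a), set(spans_b)
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": sorted(shared),
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
        # Two empty predictions agree perfectly by convention.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


agreement = span_agreement(
    [(0, 5, "PER"), (10, 15, "LOC")],
    [(0, 5, "PER"), (20, 25, "ORG")],
)
print(agreement)
```

With prediction objects like those used in this notebook, the inputs could be built as `[(p.start, p.end, p.entity_type) for p in predictions]`.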

In [ ]:
def compare_models(text, models):
    """Compare multiple models on the same text."""
    print("Comparing models on text:")
    print(f"   '{text[:80]}...'\n")
    
    results = {}
    
    for model_name, model in models.items():
        try:
            predictions = model.predict(text)
            results[model_name] = predictions
            
            print(f"{model_name}: {len(predictions)} entities")
            for pred in predictions:
                print(f"   • '{pred.text}' ({pred.entity_type})")
            print()
            
        except Exception as e:
            print(f"{model_name}: Failed - {e}\n")
            results[model_name] = []
    
    return results

# Define models to compare (add/remove based on availability)
comparison_models = {}

# Add spaCy if working
if 'spacy_model' in locals() and spacy_model:
    comparison_models['spacy'] = spacy_model

# Add OpenAI if available
if 'gpt_model' in locals() and gpt_model:
    comparison_models['gpt-4o'] = gpt_model

if comparison_models:
    comparison_results = compare_models(sample_texts[1], comparison_models)
else:
    print("No models available for comparison")
    print("   This would show side-by-side results from different models")

## Custom Model Configuration
Create models with custom settings.
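
When restricting entity types, it helps to check which label scheme a backend emits. spaCy's Russian pipelines, as far as their documentation indicates, use `PER`, `LOC`, and `ORG`, while other tools use names such as `PERSON` or `GPE`. The sketch below normalizes labels and then filters them; `LABEL_MAP` and `normalize_and_filter` are hypothetical, and the repository's wrappers may already handle this internally.

```python
from typing import Dict, Iterable, List, Set, Tuple

Entity = Tuple[str, str]  # (entity_text, entity_type)

# Hypothetical mapping; adjust it to the label scheme of your evaluation data.
LABEL_MAP: Dict[str, str] = {"PER": "PERSON"}


def normalize_and_filter(entities: Iterable[Entity],
                         allowed: Set[str],
                         label_map: Dict[str, str] = LABEL_MAP) -> List[Entity]:
    """Rename labels via label_map, then keep only entities whose label is allowed."""
    kept = []
    for surface, label in entities:
        label = label_map.get(label, label)  # unmapped labels pass through unchanged
        if label in allowed:
            kept.append((surface, label))
    return kept


raw_entities = [("Томас Венцлова", "PER"), ("США", "LOC"), ("ОГИ", "ORG")]
filtered = normalize_and_filter(raw_entities, allowed={"PERSON", "ORG"})
print(filtered)
```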

In [ ]:
# Example: Create OpenAI model with custom settings
if openai_available:
    print("Creating custom OpenAI model configuration...")
    
    try:
        # Custom model with specific parameters
        custom_model = OpenAIModel(
            model_name='gpt-4',
            temperature=0.3,  # Lower temperature for more consistent results
            max_tokens=500,
            output_format='json'  # Structured JSON output
        )
        custom_model.initialize()
        
        print(f"Custom model created: {custom_model}")
        print(f"Configuration: {custom_model.get_model_info()['config']}")
        
    except Exception as e:
        print(f"Custom model creation failed: {e}")

# Example: Create spaCy model with custom entity types
print("\nCreating custom spaCy model configuration...")

try:
    custom_spacy = SpacyModel(
        model_name='ru_core_news_sm',
        entity_types={'PERSON', 'ORG', 'GPE'}  # Only these entity types
    )
    custom_spacy.initialize()
    
    print("Custom spaCy model created")
    info = custom_spacy.get_model_info()
    print(f"Entity types: {info.get('config', {}).get('entity_types', 'All')}")
    
except Exception as e:
    print(f"Custom spaCy model creation failed: {e}")

print("\nDirect model instantiation allows easy customization of any model type!")

## Integration with Evaluation Framework
Show how NER results integrate with the evaluation system.
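
To illustrate what an evaluator does with tuple-formatted predictions, here is a minimal exact-match scoring sketch. It is not `NEREvaluator`'s implementation, whose API this notebook does not show; `exact_match_scores` is a hypothetical helper, and the `(start, end, entity_type)` tuple layout is an assumption that may differ from what `to_tuple()` returns.

```python
from typing import Dict, Iterable, Tuple

SpanKey = Tuple[int, int, str]  # assumed layout: (start, end, entity_type)


def exact_match_scores(gold: Iterable[SpanKey], predicted: Iterable[SpanKey]) -> Dict[str, float]:
    """Compute precision, recall, and F1 where only identical tuples count as correct."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold_spans = [(0, 5, "PER"), (10, 15, "LOC")]
predicted_spans = [(0, 5, "PER"), (20, 25, "ORG"), (30, 35, "PER")]
scores = exact_match_scores(gold_spans, predicted_spans)
print(scores)
```

Exact matching is strict: a prediction with the right type but a boundary off by one character counts as both a false positive and a false negative.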

In [ ]:
# Import evaluation utilities
from utils import NEREvaluator
from utils.common import safe_eval_list

def convert_to_evaluation_format(predictions):
    """Convert NER predictions to evaluation format."""
    return [pred.to_tuple() for pred in predictions]

# Example: Process text and convert to evaluation format
if 'spacy_model' in locals() and spacy_model:
    print("Integration with evaluation framework:")
    
    # Get predictions
    test_text = sample_texts[0]
    predictions = spacy_model.predict(test_text)
    
    # Convert to evaluation format (tuples)
    eval_format = convert_to_evaluation_format(predictions)
    
    print(f"Text: {test_text[:60]}...")
    print(f"Model predictions: {len(predictions)} entities")
    print(f"Evaluation format: {eval_format}")
    
    # This format can be used directly with NEREvaluator
    evaluator = NEREvaluator()
    
    # Example: Calculate metrics (would need ground truth for real evaluation)
    print("\nReady for evaluation pipeline integration!")
    print("For multiple texts, use simple iteration:")
    print("   for text in texts:")
    print("       predictions = model.predict(text)")
    print("       eval_format = convert_to_evaluation_format(predictions)")
else:
    print("Integration demo requires a working model")
    print("   This shows how to convert NER predictions to evaluation format")

## Summary and Next Steps

This notebook demonstrated the **simplified NER system** with:

### Features Shown
- **Unified Interface**: All models use the same `BaseNERModel` API
- **Direct Instantiation**: Simple, clear model creation without factory complexity
- **Multiple Providers**: spaCy, OpenAI, DeepPavlov support
- **Simple Processing**: Clean iteration for multiple texts
- **Custom Configuration**: Flexible model parameters
- **Evaluation Integration**: Compatible with existing evaluation framework

### Available Models
- **spaCy**: Local, no API key, good for development
- **OpenAI**: High accuracy, flexible, requires API key
- **DeepPavlov**: BERT-based, excellent for Russian, local

### Usage Patterns
```python
# Simple usage - direct instantiation
model = SpacyModel('ru_core_news_sm')
model.initialize()
predictions = model.predict(text)

# Multiple texts - simple iteration
for text in texts:
    predictions = model.predict(text)
    # Process each text individually

# Custom configuration
custom_model = OpenAIModel(model_name='gpt-4', temperature=0.3)
custom_model.initialize()
```

### Benefits of Simplified Design
- **Cleaner API**: Only `predict()` method needed per model
- **Easier debugging**: Direct stack traces, no batch complexity
- **Type-safe**: IDE autocomplete and type checking
- **Less complexity**: No batch processing overhead
- **Clear responsibility**: Models focus on single text prediction

### Next Steps
1. **Install models**: spaCy Russian models, DeepPavlov (optional)
2. **Set API keys**: OPENAI_API_KEY for OpenAI models (optional)
3. **Run evaluation**: See `Evaluation_Analysis.ipynb` for model comparison
4. **Extend system**: Add new model types using the `BaseNERModel` interface
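
As a rough illustration of step 4, the sketch below adds a rule-based backend that tags text inside «…» as a work title. It is duck-typed so that it runs on its own; in the repository, a new backend would presumably subclass `BaseNERModel` and return `NERPrediction` objects instead. `TitleSpan`, `QuotedTitleModel`, and the `WORK` label are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Pattern


@dataclass
class TitleSpan:
    """Hypothetical prediction record mirroring the fields used in this notebook."""
    text: str
    start: int
    end: int
    entity_type: str


class QuotedTitleModel:
    """Hypothetical rule-based backend: text inside guillemets becomes a WORK entity."""

    def __init__(self, model_name: str = "quoted_titles") -> None:
        self.model_name = model_name
        self._pattern: Optional[Pattern[str]] = None

    def initialize(self) -> None:
        # Match the shortest run of non-guillemet characters between « and ».
        self._pattern = re.compile(r"«([^«»]+)»")

    def predict(self, text: str) -> List[TitleSpan]:
        if self._pattern is None:
            raise RuntimeError("call initialize() before predict()")
        return [
            TitleSpan(m.group(1), m.start(1), m.end(1), "WORK")
            for m in self._pattern.finditer(text)
        ]


title_model = QuotedTitleModel()
title_model.initialize()
title_text = "Презентация книги «Последний глоток божоле на двоих»."
for span in title_model.predict(title_text):
    print(f"'{span.text}' [{span.start}:{span.end}] ({span.entity_type})")
```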

---

**The simplified NER system provides a clean, maintainable foundation for Russian cultural text analysis!**