# Newspaper Scraper v2 - Modular OOP Architecture

This notebook shows how to use the modular newspaper scraper, which supports four news portals from Mendoza:
- Los Andes
- Diario UNO
- El Sol
- MDZ

## Features
- ✅ Modular OOP architecture with abstract base class
- ✅ Portal-specific scrapers with custom XPath selectors
- ✅ Duplicate detection by URL
- ✅ Comprehensive logging with progress tracking
- ✅ Configurable delays (1s between requests, 2s between portals)
- ✅ Export to CSV and JSON
- ✅ Timestamp field (scraped_at)
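
The first, second, third, and last features above can be pictured with a minimal sketch. The module's real classes are not shown in this notebook, so `BaseScraper`, `parse_links`, `ExamplePortalScraper`, and `deduplicate_by_url` are hypothetical names. The `title` field is also an assumption. Link extraction is stubbed with a fixed list rather than XPath queries against a live page.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timezone


class BaseScraper(ABC):
    """Hypothetical abstract base class; each portal would subclass it."""

    name = "base"

    @abstractmethod
    def parse_links(self):
        """Return a list of (title, url) tuples for the portal's front page."""

    def scrape(self):
        # Attach the portal name and a scraped_at timestamp to every article.
        now = datetime.now(timezone.utc).isoformat()
        return [
            {"newspaper": self.name, "title": title, "url": url, "scraped_at": now}
            for title, url in self.parse_links()
        ]


class ExamplePortalScraper(BaseScraper):
    """Stand-in for a portal-specific scraper (assumed; real ones use XPath)."""

    name = "Example Portal"

    def parse_links(self):
        # Stubbed data; the second and third entries share a URL on purpose.
        return [
            ("Headline A", "https://example.com/a"),
            ("Headline B", "https://example.com/b"),
            ("Headline B (repeat)", "https://example.com/b"),
        ]


def deduplicate_by_url(articles):
    """Keep the first article seen for each URL, preserving order."""
    seen = set()
    unique = []
    for article in articles:
        if article["url"] not in seen:
            seen.add(article["url"])
            unique.append(article)
    return unique


articles = deduplicate_by_url(ExamplePortalScraper().scrape())
print(len(articles))
```

Because `parse_links` is abstract, `BaseScraper` itself cannot be instantiated. Under this design, a new portal would only need to supply its own selectors.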

## 1. Imports

In [0]:
# Uncomment the next line if lxml is not installed yet
# %pip install lxml

In [0]:
# Import the orchestrator and the four portal-specific scrapers
from newspapers_scraper_v2 import (
    NewspaperScraperOrchestrator,
    LosAndesScraper,
    DiarioUnoScraper,
    ElSolScraper,
    MDZScraper
)

import pandas as pd
import json

## 2. Scrape All Portals

The orchestrator scrapes all four portals sequentially, waiting 1s between requests and 2s between portals.

In [0]:
# Initialize the orchestrator
orchestrator = NewspaperScraperOrchestrator()

# Scrape all portals
results = orchestrator.scrape_all()
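
How `scrape_all()` is implemented is not shown here. As a rough sketch of sequential scraping with a pause between portals, the function below uses the hypothetical name `scrape_sequentially` and takes stub callables in place of real scrapers. The `sleep` parameter is an assumption, added so the pause can be swapped out and the example runs without waiting. A 1s delay between requests would follow the same pattern inside each scraper.

```python
import time


def scrape_sequentially(scrapers, portal_delay=2.0, sleep=time.sleep):
    """Run each (name, callable) pair in order, pausing between portals."""
    results = {}
    for index, (name, scrape) in enumerate(scrapers):
        if index > 0:
            # Pause before every portal except the first one.
            sleep(portal_delay)
        results[name] = scrape()
    return results


# Stub callables stand in for the real portal scrapers (illustrative only).
stub_scrapers = [
    ("Portal A", lambda: [{"url": "https://example.com/a"}]),
    ("Portal B", lambda: [{"url": "https://example.com/b"}]),
    ("Portal C", lambda: [{"url": "https://example.com/c"}]),
]

# Record the requested pauses instead of actually sleeping.
pauses = []
stub_results = scrape_sequentially(stub_scrapers, portal_delay=2.0, sleep=pauses.append)
print(pauses)  # [2.0, 2.0]
```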

## 3. View Results as DataFrame

In [0]:
# Convert to DataFrame
df = orchestrator.to_dataframe()

# Display basic info
print(f"Total articles scraped: {len(df)}")
print("\nArticles per newspaper:")
print(df['newspaper'].value_counts())

# Display the DataFrame
display(df)

## 4. Export Data

In [0]:
# Export to CSV
orchestrator.export_csv("news_data.csv")

# Export to JSON
orchestrator.export_json("news_data.json")

print("✅ Data exported successfully!")
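
For reference, a comparable export can be sketched with only the standard library. This is not the module's own `export_csv` / `export_json` code. The functions below are standalone illustrations, and the sample records and the `title` field are assumptions. Files are written to a temporary directory so nothing in the working directory is overwritten.

```python
import csv
import json
import tempfile
from pathlib import Path

# Sample records (made up for illustration).
sample_articles = [
    {
        "newspaper": "Example Portal",
        "title": "Título A",
        "url": "https://example.com/a",
        "scraped_at": "2024-01-01T00:00:00+00:00",
    },
    {
        "newspaper": "Example Portal",
        "title": "Título B",
        "url": "https://example.com/b",
        "scraped_at": "2024-01-01T00:00:01+00:00",
    },
]


def export_csv(articles, path):
    """Write one row per article; newline='' avoids blank lines on Windows."""
    fieldnames = ["newspaper", "title", "url", "scraped_at"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(articles)


def export_json(articles, path):
    """Write the list as JSON; ensure_ascii=False keeps accented characters."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(articles, f, ensure_ascii=False, indent=2)


out_dir = Path(tempfile.mkdtemp())
csv_path = out_dir / "news_data.csv"
json_path = out_dir / "news_data.json"
export_csv(sample_articles, csv_path)
export_json(sample_articles, json_path)
```

Writing with an explicit UTF-8 encoding matters for Spanish-language headlines, which often contain accented characters.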