In [None]:
# Mount Google Drive
# If already mounted, this will show "Drive is already mounted", which is fine.
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Install packages that are not pre-installed in Colab
# (torch, torchvision, numpy, Pillow, requests are already available)
!pip install -q git+https://github.com/openai/CLIP.git ftfy

# Working with Cultural Heritage APIs: Europeana

Welcome to this workshop on working with APIs and cultural heritage data. In this notebook, you will:

1. **Learn** what an API is and why APIs matter for digital humanities research
2. **Explore** Europeana's collections: countries and institutions
3. **Filter** the collection (e.g., Swedish institutions)
4. **Search** for artworks by keyword and analyze results (who painted the most rivers?)
5. **Build** OR queries to search for multiple concepts (water bodies)
6. **Create** a universal download function with customizable options
7. **Download** paintings from a specific artist and time period

---

## About Europeana

[Europeana](https://www.europeana.eu/) is Europe's digital platform for cultural heritage, providing access to:
- **50+ million** digitized items from European museums, galleries, libraries and archives
- **Content from** 3,000+ institutions across Europe
- **Collections including** artworks, books, music, videos, photographs, manuscripts
- **Open data** under various Creative Commons licenses

Europeana aggregates content from major institutions including:
- Rijksmuseum (Netherlands)
- British Library (UK)
- Louvre (France)
- Uppsala University (Sweden)
- And many more across Europe

**Note:** Europeana uses IIIF (International Image Interoperability Framework) for many items, providing standardized access to high-resolution images.

---

## Part 1: What is an API?

**API** stands for **Application Programming Interface**

An API:
- Takes your **request** ("give me all paintings by Rembrandt")
- Sends it to a **server** (the database)
- Returns a **response** (the data you asked for)

### Why APIs matter for Digital Humanities

- **Scale**: Download thousands of records automatically instead of clicking through web pages
- **Structure**: Data comes in machine-readable formats (JSON, XML) ready for analysis
- **Reproducibility**: Your code documents exactly how you obtained your data
- **Updates**: Re-run your code to get the latest data
- **Integration**: Combine data from multiple institutions

### Common Data Formats

| Format | Description | Example |
|--------|-------------|---------|
| **JSON** | JavaScript Object Notation - human-readable, widely used | `{"name": "Mona Lisa", "year": 1503}` |
| **XML** | eXtensible Markup Language - similar to HTML | `<artwork><name>Mona Lisa</name></artwork>` |
| **CSV** | Comma-Separated Values - spreadsheet-like | `name,year\nMona Lisa,1503` |
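
To make the table concrete, here is a minimal sketch that reads the same made-up record from each format using only Python's standard library. The record and the `<year>` element in the XML are invented for illustration. Note that only JSON keeps the year as a number; XML and CSV hand it back as text.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same (made-up) record in the three formats from the table above
json_text = '{"name": "Mona Lisa", "year": 1503}'
xml_text = "<artwork><name>Mona Lisa</name><year>1503</year></artwork>"
csv_text = "name,year\nMona Lisa,1503\n"

# JSON: parsed straight into a dict, numbers stay numbers
from_json = json.loads(json_text)

# XML: walk the child elements; every value is a string
root = ET.fromstring(xml_text)
from_xml = {child.tag: child.text for child in root}

# CSV: the first line supplies the column names; every value is a string
from_csv = dict(next(csv.DictReader(io.StringIO(csv_text))))

print(from_json)
print(from_xml)
print(from_csv)
```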

### How an API Works

```mermaid
sequenceDiagram
    actor You as 🧑‍💻 You (Python)
    participant API as 🌐 Europeana API
    participant DB as 🗄️ Database<br/>(50M+ items)

    You->>API: GET /search.json?query=Rembrandt&rows=10
    Note over You,API: HTTP Request with your API key

    API->>DB: Query matching records
    DB-->>API: Raw records

    API-->>You: JSON response
    Note over API,You: {"totalResults": 12543,<br/>"items": [...]}

    You->>You: Parse & analyse data
```

> **Key idea:** You never talk to the database directly. The API is a controlled gateway that handles authentication and rate limiting, and formats the data for you.
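
The request/response cycle in the diagram can be sketched offline. The snippet below builds the request URL by hand and then parses a hand-written stand-in for the JSON reply. `YOUR_API_KEY` is a placeholder, and the sample body is invented; a real response carries many more fields.

```python
import json
from urllib.parse import urlencode

# Build the request URL by hand to see what is actually sent to the API.
# "YOUR_API_KEY" is a placeholder, not a working key.
params = {"wskey": "YOUR_API_KEY", "query": "Rembrandt", "rows": 10}
url = "https://api.europeana.eu/record/v2/search.json?" + urlencode(params)
print(url)

# Hand-written stand-in for the JSON body the API would send back.
# The values are invented for illustration.
sample_body = (
    '{"success": true, "totalResults": 12543, '
    '"items": [{"title": ["Self-portrait"]}]}'
)
response = json.loads(sample_body)
print(response["totalResults"], "matches;", len(response["items"]), "returned")
```

In the rest of the notebook, `requests.get(url, params=...)` does the URL building and the HTTP call in one step.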

---

## Part 2: Setup

First, let's import the libraries we need and set up our project structure.

In [None]:
# Standard library imports
import os
import json
import time
from pathlib import Path
from urllib.parse import unquote, quote
from collections import Counter

# External libraries (you may need to install these)
import requests
from IPython.display import display, Image, HTML

# Set up paths
PROJECT_ROOT = Path("/content/drive/MyDrive/Distant_viewing")
DATA_DIR = PROJECT_ROOT / "data" / "europeana"
IMAGES_DIR = PROJECT_ROOT / "images" / "europeana"

# Create directories if they don't exist
DATA_DIR.mkdir(parents=True, exist_ok=True)
IMAGES_DIR.mkdir(parents=True, exist_ok=True)

print(f"Project root: {PROJECT_ROOT}")
print(f"Data directory: {DATA_DIR}")
print(f"Images directory: {IMAGES_DIR}")

### API Key Configuration

Europeana requires a free API key. To get one:
1. Visit: https://pro.europeana.eu/page/get-api
2. Register for a Europeana account
3. Request an API key from your account section
4. Save your API key to `notebooks/api-key-europeana.txt`

For testing, we'll use a demo key with limited access.

In [None]:
# Load API key - check multiple locations
NOTEBOOK_DIR = Path(".").resolve()
API_KEY_LOCATIONS = [
    NOTEBOOK_DIR / "api-key-europeana.txt",
    PROJECT_ROOT / "misc" / "api-key-europeana.txt",
    PROJECT_ROOT / "api-key-europeana.txt"
]

# Default demo key (limited requests)
API_KEY = "api2demo"

for key_file in API_KEY_LOCATIONS:
    if key_file.exists():
        with open(key_file, 'r') as f:
            custom_key = f.read().strip()
            if custom_key and custom_key != "api2demo":
                API_KEY = custom_key
                print(f"✓ API key loaded from {key_file}")
                break
else:
    print(f"ℹ Using demo API key (limited to 999 requests)")
    print(f"  For unlimited access, get your own key at: https://pro.europeana.eu/page/get-api")
    print(f"  Save it to: notebooks/api-key-europeana.txt")

# Base API endpoint
BASE_URL = "https://api.europeana.eu/record/v2"
print(f"\nAPI endpoint: {BASE_URL}")

---

## Part 3: Understanding the Europeana API

The Europeana API provides two main endpoints:

1. **Search API** - Query and filter the collection
2. **Record API** - Get detailed information about specific items

### Key API Parameters

| Parameter | Description | Example |
|-----------|-------------|---------|
| `query` | Search term | `Rembrandt`, `painting`, `*` (all) |
| `qf` | Query filter | `TYPE:IMAGE`, `COUNTRY:Netherlands` |
| `reusability` | License filter | `open`, `restricted`, `permission` |
| `rows` | Results per page (max 100) | `12` (default), `100` |
| `profile` | Detail level | `standard`, `rich`, `facets` |
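
Because `rows` is capped at 100, larger result sets have to be fetched page by page. The sketch below shows one way to structure such a loop. It assumes cursor-based paging that starts with `cursor=*` and continues for as long as the response contains a `nextCursor` field; check the Europeana documentation before relying on those names. `iter_all_items` and `fake_fetch` are hypothetical helpers, and `fake_fetch` stands in for the real API so the example runs offline.

```python
def iter_all_items(fetch_page, max_items=250):
    """Yield items page by page until max_items or the results run out.

    fetch_page(cursor) must return a dict with an "items" list and,
    while more pages remain, a "nextCursor" string.
    """
    cursor = "*"  # assumed starting value for cursor-based paging
    yielded = 0
    while cursor and yielded < max_items:
        page = fetch_page(cursor)
        for item in page.get("items", []):
            if yielded >= max_items:
                return
            yield item
            yielded += 1
        cursor = page.get("nextCursor")  # absent on the last page


# Offline stand-in for the API: 230 fake records served 100 at a time
FAKE_RECORDS = [{"id": f"/demo/{n}"} for n in range(230)]


def fake_fetch(cursor):
    start = 0 if cursor == "*" else int(cursor)
    end = start + 100
    page = {"items": FAKE_RECORDS[start:end]}
    if end < len(FAKE_RECORDS):
        page["nextCursor"] = str(end)
    return page


all_items = list(iter_all_items(fake_fetch, max_items=1000))
print(len(all_items))
```

With a real API, `fetch_page` would wrap a search call that passes the cursor along, ideally with a short pause between pages.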

### What is a Facet?

In API terminology, a **facet** is a category or attribute used to filter and aggregate search results. Think of it like a filter dimension in a search interface:

- `COUNTRY` facet → shows how many results exist per country
- `DATA_PROVIDER` facet → shows counts by institution/museum
- `TYPE` facet → shows counts by media type (image, video, text, etc.)
- `dc_creator` facet → shows counts by creator/artist
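
As a rough illustration, the facet part of a response is a list of named facets, each holding label/count pairs. The sketch below uses a hand-written sample with invented counts and a hypothetical helper, `facet_counts`, to turn one facet into a dictionary and print each value's share.

```python
# Hand-written stand-in for the "facets" part of a search response.
# The counts are invented; only the shape matters here.
sample_response = {
    "facets": [
        {
            "name": "COUNTRY",
            "fields": [
                {"label": "Netherlands", "count": 120},
                {"label": "Sweden", "count": 75},
                {"label": "France", "count": 40},
            ],
        }
    ]
}


def facet_counts(response, facet_name):
    """Return {label: count} for one facet, or {} if it is absent."""
    for facet in response.get("facets", []):
        if facet.get("name") == facet_name:
            return {f["label"]: f["count"] for f in facet.get("fields", [])}
    return {}


sample_countries = facet_counts(sample_response, "COUNTRY")
sample_total = sum(sample_countries.values())
for label, count in sample_countries.items():
    print(f"{label:<12} {count:>4}  {count / sample_total:.0%}")
```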

---

## Part 4: Exploring Countries and Institutions

Let's start by discovering what countries and institutions are available in Europeana.

In [None]:
def get_facet_values(facet_name, query="*", qf=None, max_values=50):
    """
    Query the Europeana API to get available values for a facet.
    
    Parameters:
        facet_name: The facet to query (e.g., "COUNTRY", "DATA_PROVIDER", "dc_creator")
        query: Search query to filter results (default: "*" for all)
        qf: Optional query filters (e.g., ["COUNTRY:Sweden"])
        max_values: Maximum number of facet values to return
    
    Returns:
        List of (value, count) tuples
    """
    url = f"{BASE_URL}/search.json"
    
    params = {
        "wskey": API_KEY,
        "query": query,
        "rows": 0,  # We only want facets, not actual results
        "profile": "facets",
        "facet": facet_name,
        f"f.{facet_name}.facet.limit": max_values
    }
    
    if qf:
        params["qf"] = qf
    
    try:
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        data = response.json()
        
        # Extract facet values
        facets = data.get('facets', [])
        for facet in facets:
            if facet.get('name') == facet_name:
                fields = facet.get('fields', [])
                return [(f['label'], f['count']) for f in fields]
        return []
    
    except requests.exceptions.RequestException as e:
        print(f"❌ Error fetching facets: {e}")
        return []

In [None]:
# Get all available countries
print("Available COUNTRIES in Europeana:")
print("=" * 60)
countries = get_facet_values("COUNTRY", max_values=50)

# Store for later use
country_counts = {country: count for country, count in countries}

for country, count in countries[:25]:
    print(f"  {country:<30} ({count:,} items)")
if len(countries) > 25:
    print(f"  ... and {len(countries) - 25} more countries")

print(f"\nTotal countries: {len(countries)}")

In [None]:
# Get top data providers (institutions)
print("Top DATA_PROVIDERS (Institutions) in Europeana:")
print("=" * 60)
providers = get_facet_values("DATA_PROVIDER", max_values=30)

for provider, count in providers[:20]:
    # Truncate long names for display
    display_name = provider[:55] + "..." if len(provider) > 55 else provider
    print(f"  {display_name:<58} ({count:,} items)")
if len(providers) > 20:
    print(f"  ... and {len(providers) - 20} more providers")

---

## Part 5: Use Case - Swedish Institutions

Let's explore a specific use case: **How many Swedish institutions are in Europeana, and what do they have?**

In [None]:
# Find all data providers in Sweden
COUNTRY_TO_EXPLORE = "Sweden"  # <-- Try changing this to other countries!

swedish_providers = get_facet_values(
    "DATA_PROVIDER", 
    qf=[f"COUNTRY:{COUNTRY_TO_EXPLORE}"],
    max_values=100
)

# Calculate totals
total_institutions = len(swedish_providers)
total_items = sum(count for _, count in swedish_providers)

print(f"Data providers in {COUNTRY_TO_EXPLORE}:")
print("=" * 70)
print(f"🏛️  Total institutions: {total_institutions}")
print(f"📦 Total items: {total_items:,}")
print("=" * 70)
print()

for i, (provider, count) in enumerate(swedish_providers, 1):
    display_name = provider[:55] + "..." if len(provider) > 55 else provider
    print(f"{i:3}. {display_name:<58} ({count:,} items)")

In [None]:
# Compare institution counts across Nordic countries
print("Comparing Nordic Countries:")
print("=" * 60)

nordic_countries = ["Sweden", "Norway", "Denmark", "Finland"]

for country in nordic_countries:
    providers = get_facet_values("DATA_PROVIDER", qf=[f"COUNTRY:{country}"], max_values=200)
    total_items = sum(count for _, count in providers)
    print(f"  {country:<15} {len(providers):>4} institutions, {total_items:>12,} items")

---

## Part 6: Searching for Keywords - Who Painted the Most Rivers?

Let's search for paintings of rivers and analyze who created the most.

In [None]:
def search_europeana(query="*", rows=12, reusability="open", qf=None, profile="rich", cursor=None):
    """
    Search the Europeana collection.
    
    Parameters:
        query: Search term (default: "*" for all)
        rows: Number of results to return (max 100)
        reusability: Filter by license ("open", "restricted", "permission", or None)
        qf: Additional query filters as list (e.g., ["TYPE:IMAGE", "COUNTRY:Netherlands"])
        profile: "standard", "rich", or "facets" for more metadata
        cursor: Cursor for pagination (use for getting more than 100 results)
    
    Returns:
        Dictionary with search results
    """
    url = f"{BASE_URL}/search.json"
    
    params = {
        "wskey": API_KEY,
        "query": query,
        "rows": min(rows, 100),
        "profile": profile
    }
    
    if reusability:
        params["reusability"] = reusability
    
    if qf:
        params["qf"] = qf
    
    if cursor:
        params["cursor"] = cursor
    
    try:
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"❌ Error searching Europeana: {e}")
        return None

In [None]:
# Search for paintings of rivers
SEARCH_KEYWORD = "river"  # <-- Try changing this!

print(f"Searching for '{SEARCH_KEYWORD}' in paintings...")
print("=" * 60)

# Search with facets to see creator distribution
results = search_europeana(
    query=SEARCH_KEYWORD,
    rows=100,
    qf=["TYPE:IMAGE", "what:painting"],  # Filter for paintings
    reusability="open"
)

if results and results.get('success'):
    print(f"✓ Found {results['totalResults']:,} paintings matching '{SEARCH_KEYWORD}'")
else:
    print("❌ Search failed")

In [None]:
# Analyze creators - who painted the most rivers?
# Extract creator information directly from the fetched search results
# (Europeana's API does not expose proxy_dc_creator as a facet field)

print(f"\nAnalyzing creators for '{SEARCH_KEYWORD}' paintings...")
print("=" * 60)

creator_counts = Counter()
items = results.get('items', []) if results else []

for item in items:
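    # dcCreator may be missing, a list, or a plain string.
    # Falling back to ['Unknown'] avoids an IndexError on empty lists.
    raw = item.get('dcCreator') or ['Unknown']
    creator = raw[0] if isinstance(raw, list) else raw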
    if creator.lower() not in ['unknown', 'anonymous', 'unbekannt']:
        creator_counts[creator] += 1

top_creators = creator_counts.most_common(15)

print(f"\n🎨 Top Artists who painted '{SEARCH_KEYWORD}' (from {len(items)} results):\n")
if not top_creators:
    print("  No creator information found in these results.")
else:
    for i, (creator, count) in enumerate(top_creators, 1):
        display_name = creator[:50] + "..." if len(creator) > 50 else creator
        bar = "█" * min(count * 3, 30)
        print(f"{i:2}. {display_name:<53} {count:>5} {bar}")

---

## Part 7: OR Queries - Searching for Multiple Water Bodies

What if we want to find paintings of various water bodies? We can use **OR** queries to combine multiple search terms.

### Europeana Query Syntax

| Operator | Example | Meaning |
|----------|---------|--------|
| `AND` (default) | `river landscape` | Must contain both |
| `OR` | `river OR sea OR ocean` | Contains any of these |
| `NOT` or `-` | `river NOT mountain` | Excludes term |
| `"..."` | `"still life"` | Exact phrase |
| `*` | `land*` | Wildcard |
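
The operators in the table can be combined programmatically. The sketch below uses two hypothetical helpers, `quote_if_phrase` and `combine_terms`, that are not part of any Europeana library. It wraps multi-word terms in quotes so that they match as a phrase rather than being split by the default `AND`, and it appends optional `NOT` exclusions.

```python
def quote_if_phrase(term):
    """Wrap multi-word terms in double quotes so they match as a phrase."""
    term = term.strip()
    if " " in term and not (term.startswith('"') and term.endswith('"')):
        return f'"{term}"'
    return term


def combine_terms(terms, exclude=None):
    """Join terms with OR and optionally exclude others with NOT."""
    query = "(" + " OR ".join(quote_if_phrase(t) for t in terms) + ")"
    for term in exclude or []:
        query += f" NOT {quote_if_phrase(term)}"
    return query


print(combine_terms(["river", "still life", "sea"]))
print(combine_terms(["lake", "coast"], exclude=["mountain"]))
```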

In [None]:
def build_or_query(terms):
    """
    Build an OR query from a list of terms.
    
    Parameters:
        terms: List of search terms
    
    Returns:
        String like "(term1 OR term2 OR term3)"
    """
    if not terms:
        return "*"
    if len(terms) == 1:
        return terms[0]
    return "(" + " OR ".join(terms) + ")"


# Example: search for various water bodies
water_bodies = ["river", "sea", "ocean", "lake", "seashore", "beach", "coast", "waterfall"]

or_query = build_or_query(water_bodies)
print(f"Combined OR query: {or_query}")

In [None]:
# ============================================================
# EXERCISE: Customize the water body search!
# ============================================================

WATER_BODIES = ["river", "sea", "ocean", "lake", "seashore"]  # <-- Add or remove terms!

# ============================================================

or_query = build_or_query(WATER_BODIES)
print(f"Searching for: {or_query}")
print("=" * 60)

water_results = search_europeana(
    query=or_query,
    rows=100,
    qf=["TYPE:IMAGE", "what:painting"],
    reusability="open"
)

if water_results and water_results.get('success'):
    print(f"✓ Found {water_results['totalResults']:,} paintings of water bodies")

    # Get top creators
    print("\n🎨 Top Artists who painted water bodies:\n")
    water_creators = get_facet_values(
        "proxy_dc_creator",
        query=or_query,
        qf=["TYPE:IMAGE", "what:painting"],
        max_values=15
    )
    
    # Drop placeholder names before ranking so the numbering has no gaps
    named_creators = [
        (creator, count) for creator, count in water_creators
        if creator.lower() not in ['unknown', 'anonymous']
    ]
    for i, (creator, count) in enumerate(named_creators[:10], 1):
        display_name = creator[:50] + "..." if len(creator) > 50 else creator
        print(f"{i:2}. {display_name:<53} {count:>5} paintings")
else:
    print("❌ Search failed")

In [None]:
# Compare how many results each water body term returns
print("Comparing individual water body terms:")
print("=" * 60)

for term in WATER_BODIES:
    result = search_europeana(
        query=term,
        rows=0,  # Just get count
        qf=["TYPE:IMAGE", "what:painting"],
        reusability="open"
    )
    if result and result.get('success'):
        count = result['totalResults']
        bar = "█" * min(count // 1000, 40)
        print(f"  {term:<15} {count:>8,} paintings {bar}")

---

## Part 8: Building a Universal Download Function

Now let's create a **flexible function** that can:
- Search Europeana with custom queries and filters
- Optionally download images
- Choose resolution (thumbnail or full)
- Display brief or full metadata

This function will be reusable throughout the workshop!
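
The resolution choice comes down to which metadata field supplies the image URL. As a minimal sketch, assuming `edmPreview` holds the thumbnail and `edmIsShownBy` the full image at the source institution, the hypothetical helpers below pick a URL and fall back to the preview when no full image is listed. The sample records and `example.org` URLs are invented.

```python
def first_value(value):
    """Fields may be a list or a plain string; return one string or None."""
    if isinstance(value, list):
        return value[0] if value else None
    return value or None


def pick_image_url(item, resolution="thumbnail"):
    """Choose a URL for the requested resolution, falling back to the preview."""
    preview = first_value(item.get("edmPreview"))
    if resolution == "full":
        return first_value(item.get("edmIsShownBy")) or preview
    return preview


# Invented records with placeholder URLs, for illustration only
with_both = {
    "edmPreview": ["https://example.org/thumb.jpg"],
    "edmIsShownBy": ["https://example.org/full.jpg"],
}
preview_only = {"edmPreview": ["https://example.org/thumb.jpg"]}

print(pick_image_url(with_both, "full"))
print(pick_image_url(preview_only, "full"))
print(pick_image_url({}, "thumbnail"))
```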

In [None]:
# Helper functions for extracting metadata

def get_item_title(item):
    """Extract title from an item."""
    if 'title' in item and item['title']:
        return item['title'][0] if isinstance(item['title'], list) else item['title']
    if 'dcTitleLangAware' in item:
        for lang in ['en', 'nl', 'de', 'fr', 'sv', 'def']:
            if lang in item['dcTitleLangAware']:
                return item['dcTitleLangAware'][lang][0]
    return "Untitled"


def get_item_creator(item):
    """Extract creator/artist from an item."""
    if 'dcCreator' in item and item['dcCreator']:
        return item['dcCreator'][0] if isinstance(item['dcCreator'], list) else item['dcCreator']
    return "Unknown"


def get_item_year(item):
    """Extract year from an item."""
    if 'year' in item and item['year']:
        return item['year'][0] if isinstance(item['year'], list) else item['year']
    return "n.d."


def get_item_preview(item):
    """Extract preview image URL."""
    if 'edmPreview' in item and item['edmPreview']:
        return item['edmPreview'][0] if isinstance(item['edmPreview'], list) else item['edmPreview']
    return None


def get_full_image_url(item):
    """Extract full-size image URL (from source institution)."""
    if 'edmIsShownBy' in item and item['edmIsShownBy']:
        return item['edmIsShownBy'][0] if isinstance(item['edmIsShownBy'], list) else item['edmIsShownBy']
    return None


def sanitize_filename(name):
    """Remove problematic characters from filenames."""
    if not name:
        return "unknown"
    safe = "".join(c for c in name if c.isalnum() or c in ' ._-')
    return safe.strip()[:80]

In [None]:
def europeana_search_and_download(
    query,
    filters=None,
    max_results=20,
    download_images=False,
    resolution="thumbnail",
    output_dir=None,
    display_mode="brief",
    reusability="open",
    delay=0.5
):
    """
    Universal function to search Europeana and optionally download images.
    
    Parameters:
    -----------
    query : str
        Search query (supports OR syntax)
    
    filters : list, optional
        Query filters like ["TYPE:IMAGE", "COUNTRY:Sweden"]
    
    max_results : int, default=20
        Maximum number of results to return
    
    download_images : bool, default=False
        Whether to download images
    
    resolution : str, default="thumbnail"
        Image resolution: "thumbnail" (fast, ~200-400px), "full" (from source)
    
    output_dir : Path or str, optional
        Directory to save images (created automatically if download_images=True)
    
    display_mode : str, default="brief"
        Metadata display: "brief" (title, creator, year), "full" (all fields), "none"
    
    reusability : str, default="open"
        License filter: "open", "restricted", "permission", or None
    
    delay : float, default=0.5
        Delay between downloads (be nice to the server)
    
    Returns:
    --------
    dict with keys:
        - 'items': list of item metadata
        - 'total_results': total matching items
        - 'downloaded_files': list of downloaded file paths (if download_images=True)
    """
    
    print(f"🔍 Searching Europeana for: {query}")
    if filters:
        print(f"   Filters: {', '.join(filters)}")
    print("=" * 60)
    
    # Search
    results = search_europeana(
        query=query,
        rows=min(max_results, 100),
        qf=filters,
        reusability=reusability,
        profile="rich"
    )
    
    if not results or not results.get('success'):
        print("❌ Search failed")
        return {'items': [], 'total_results': 0, 'downloaded_files': []}
    
    items = results.get('items', [])
    total = results.get('totalResults', 0)
    
    print(f"✓ Found {total:,} total results, showing {len(items)}")
    print()
    
    # Display metadata
    if display_mode != "none" and items:
        print("📋 Results:")
        print("-" * 60)
        
        for i, item in enumerate(items[:max_results], 1):
            title = get_item_title(item)
            creator = get_item_creator(item)
            year = get_item_year(item)
            
            if display_mode == "brief":
                print(f"{i:3}. {title[:50]}")
                print(f"     by {creator} ({year})")
                print()
            
            elif display_mode == "full":
                country = item.get('country', ['Unknown'])[0] if item.get('country') else 'Unknown'
                provider = item.get('dataProvider', ['Unknown'])[0] if item.get('dataProvider') else 'Unknown'
                rights = item.get('rights', ['Unknown'])[0] if item.get('rights') else 'Unknown'
                item_id = item.get('id', 'Unknown')
                
                print(f"{i:3}. {title}")
                print(f"     Creator:    {creator}")
                print(f"     Year:       {year}")
                print(f"     Country:    {country}")
                print(f"     Provider:   {provider}")
                print(f"     License:    {rights}")
                print(f"     ID:         {item_id}")
                print()
    
    # Download images
    downloaded_files = []
    
    if download_images and items:
        if output_dir is None:
            safe_query = sanitize_filename(query)[:30]
            output_dir = IMAGES_DIR / f"{safe_query}_images"
        
        output_dir = Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        
        print(f"\n📥 Downloading images to: {output_dir}")
        print(f"   Resolution: {resolution}")
        print("-" * 60)
        
        for i, item in enumerate(items[:max_results], 1):
            title = get_item_title(item)
            creator = get_item_creator(item)
            item_id = item.get('id', 'unknown').replace('/', '_')
            
            # Get image URL based on resolution
            if resolution == "full":
                image_url = get_full_image_url(item) or get_item_preview(item)
            else:  # thumbnail
                image_url = get_item_preview(item)
            
            if not image_url:
                print(f"  [{i}/{len(items)}] ⚠️  No image: {title[:40]}")
                continue
            
            # Create filename
            safe_creator = sanitize_filename(creator)[:30]
            safe_title = sanitize_filename(title)[:30]
            filename = f"{safe_creator}_{item_id}_{safe_title}.jpg"
            filepath = output_dir / filename
            
            # Skip if exists
            if filepath.exists() and filepath.stat().st_size > 0:
                print(f"  [{i}/{len(items)}] ⊙ Exists: {filename[:50]}")
                downloaded_files.append(filepath)
                continue
            
            # Download
            try:
                headers = {
                    'User-Agent': 'Mozilla/5.0 (Workshop Bot) Python/requests',
                    'Accept': 'image/*'
                }
                response = requests.get(image_url, headers=headers, stream=True, timeout=30)
                response.raise_for_status()
                
                with open(filepath, 'wb') as f:
                    for chunk in response.iter_content(chunk_size=8192):
                        if chunk:
                            f.write(chunk)
                
                if filepath.exists() and filepath.stat().st_size > 0:
                    size_kb = filepath.stat().st_size / 1024
                    print(f"  [{i}/{len(items)}] ✓ {filename[:50]} ({size_kb:.1f} KB)")
                    downloaded_files.append(filepath)
                else:
                    print(f"  [{i}/{len(items)}] ❌ Empty file: {filename[:50]}")
                    if filepath.exists():
                        filepath.unlink()
            
            except Exception as e:
                print(f"  [{i}/{len(items)}] ❌ Error: {str(e)[:50]}")
            
            if i < len(items):
                time.sleep(delay)
        
        print("-" * 60)
        print(f"✓ Downloaded {len(downloaded_files)}/{len(items)} images")
    
    return {
        'items': items,
        'total_results': total,
        'downloaded_files': downloaded_files
    }

In [None]:
# Test the universal function - brief display, no download
result = europeana_search_and_download(
    query="river",
    filters=["TYPE:IMAGE", "what:painting"],
    max_results=5,
    download_images=False,
    display_mode="brief"
)

---

## Part 9: Download Paintings from a Specific Artist and Period

Now let's use our function to download river paintings from a specific artist in a time period.

In [None]:
# ============================================================
# EXERCISE: Configure your download!
# ============================================================

# Search configuration
SEARCH_TERM = "river"              # <-- Change this!
ARTIST_NAME = None                  # <-- Set to artist name or None for all
YEAR_FROM = 1800                    # <-- Start year (or None)
YEAR_TO = 1900                      # <-- End year (or None)
COUNTRY = None                      # <-- Filter by country (or None)

# Download configuration
MAX_IMAGES = 10                     # <-- How many to download?
DOWNLOAD = False                    # <-- Set to True to actually download
RESOLUTION = "thumbnail"            # <-- "thumbnail" or "full"
DISPLAY = "full"                    # <-- "brief", "full", or "none"

# ============================================================

# Build filters
filters = ["TYPE:IMAGE", "what:painting"]

# Add optional filters
if ARTIST_NAME:
    # Note: creator filtering uses proxy_dc_creator field
    filters.append(f'proxy_dc_creator:"{ARTIST_NAME}"')

if YEAR_FROM and YEAR_TO:
    filters.append(f"YEAR:[{YEAR_FROM} TO {YEAR_TO}]")
elif YEAR_FROM:
    filters.append(f"YEAR:[{YEAR_FROM} TO *]")
elif YEAR_TO:
    filters.append(f"YEAR:[* TO {YEAR_TO}]")

if COUNTRY:
    filters.append(f"COUNTRY:{COUNTRY}")

# Create output directory name
folder_parts = [SEARCH_TERM]
if ARTIST_NAME:
    folder_parts.append(ARTIST_NAME.replace(' ', '_'))
if YEAR_FROM or YEAR_TO:
    folder_parts.append(f"{YEAR_FROM or ''}-{YEAR_TO or ''}")
output_folder = IMAGES_DIR / "_".join(folder_parts)

# Run the search
result = europeana_search_and_download(
    query=SEARCH_TERM,
    filters=filters,
    max_results=MAX_IMAGES,
    download_images=DOWNLOAD,
    resolution=RESOLUTION,
    output_dir=output_folder,
    display_mode=DISPLAY
)

In [None]:
# Preview some images from the search
if result['items']:
    print("\n🖼️  Preview of first 3 results:")
    print("=" * 60)
    
    for item in result['items'][:3]:
        title = get_item_title(item)
        creator = get_item_creator(item)
        year = get_item_year(item)
        preview = get_item_preview(item)
        
        print(f"\n{title}")
        print(f"by {creator}, {year}")
        
        if preview:
            display(Image(url=preview, width=400))
        print("-" * 40)

---

## Part 10: Saving Search Results

Save your search results as JSON for later analysis or use in the next notebook.
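
A saved file is only useful if it can be read back. The sketch below writes a tiny invented results file to a temporary folder, reloads it, and counts items per country. It assumes the file has top-level `metadata`, `count`, and `items` keys; the sample records are made up.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

# Write a tiny invented results file so the example runs on its own
sample = {
    "metadata": {"query": "river"},
    "count": 3,
    "items": [
        {"title": ["River scene"], "country": ["Sweden"]},
        {"title": ["Harbour"], "country": ["Netherlands"]},
        {"title": ["Rapids"], "country": ["Sweden"]},
    ],
}
sample_path = Path(tempfile.mkdtemp()) / "river_search_results.json"
sample_path.write_text(
    json.dumps(sample, ensure_ascii=False, indent=2), encoding="utf-8"
)

# Reload the file and summarise it
loaded = json.loads(sample_path.read_text(encoding="utf-8"))
by_country = Counter(
    item["country"][0] for item in loaded["items"] if item.get("country")
)
print(loaded["metadata"]["query"], loaded["count"])
print(by_country.most_common())
```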

In [None]:
def save_search_results(items, filename, metadata=None):
    """
    Save search results to a JSON file.
    
    Parameters:
        items: List of Europeana items
        filename: Output filename
        metadata: Optional dict with search parameters
    """
    if not items:
        print("❌ No results to save")
        return None
    
    output_path = DATA_DIR / filename
    
    data = {
        'metadata': metadata or {},
        'count': len(items),
        'items': items
    }
    
    with open(output_path, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    
    print(f"✓ Saved {len(items)} items to {output_path}")
    return output_path


# Save your search results
if result['items']:
    safe_query = sanitize_filename(SEARCH_TERM)
    save_search_results(
        result['items'],
        f"{safe_query}_search_results.json",
        metadata={
            'query': SEARCH_TERM,
            'filters': filters,
            'total_results': result['total_results']
        }
    )

---

## Part 11: IIIF - International Image Interoperability Framework

**IIIF** (pronounced *"triple-eye-eff"*) is a family of open standards that lets cultural heritage institutions share their digitised collections in a **consistent, interoperable** way.

Without IIIF, every museum, library, and archive builds its own image viewer with its own API, which makes it difficult to combine or compare images across institutions. IIIF solves this by providing a **common language**: any IIIF-compatible viewer (Mirador, Universal Viewer, OpenSeadragon, ...) can display content from any IIIF-compatible server, regardless of where the image lives.

> Think of it like a **power socket standard**: the image stays at the institution, but any compliant tool can plug in and use it.

---

### The Four Key APIs

| API | Purpose |
|-----|--------|
| **Presentation API** | Describes *what* to show and in what order: the manifest, canvases, and sequences |
| **Image API** | Delivers image pixels on demand: crop, resize, and rotate any region of an image |
| **Search API** | Full-text search within the content of a manifest |
| **Authentication API** | Controls access to restricted or rights-managed content |

---

### How IIIF Works

A viewer never downloads the whole image upfront. Instead, it fetches a **manifest** (a JSON-LD description of the object), reads where the image lives, then requests only the tiles it needs from the image server, at exactly the right size and region.

```mermaid
flowchart LR
    V["🖥️ IIIF Viewer\nMirador · Universal Viewer\nOpenSeadragon"]
    M["📋 Manifest\nJSON-LD document\ndescribes the object"]
    I["🗄️ Image Server\nIIIF Image API\nat the institution"]
    P["🖼️ Image tiles\ncropped · resized\nrotated · transcoded"]

    V -->|"1 · fetch manifest URL"| M
    M -->|"2 · reads canvas &\nimage service location"| V
    V -->|"3 · GET region/size/rotation/quality.format"| I
    I -->|"4 · returns exact pixels"| P
    P --> V
```
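
Step 2 of the diagram, reading the image service location out of a manifest, can be sketched as follows. The manifest here is trimmed and hand-written in the Presentation API 3.0 layout (canvases, annotation pages, painting annotations). Older 2.x manifests nest their canvases under `sequences` instead, so this helper would not work on them unchanged. All URLs are `example.org` placeholders, and `image_service_urls` is a hypothetical helper.

```python
# Trimmed, hand-written manifest in the Presentation API 3.0 layout.
# All URLs are placeholders on example.org.
sample_manifest = {
    "id": "https://example.org/iiif/book1/manifest",
    "type": "Manifest",
    "items": [  # canvases
        {
            "id": "https://example.org/iiif/book1/canvas/1",
            "type": "Canvas",
            "items": [  # annotation pages
                {
                    "type": "AnnotationPage",
                    "items": [  # painting annotations
                        {
                            "type": "Annotation",
                            "motivation": "painting",
                            "body": {
                                "id": "https://example.org/iiif/img1/full/max/0/default.jpg",
                                "type": "Image",
                                "service": [
                                    {
                                        "id": "https://example.org/iiif/img1",
                                        "type": "ImageService3",
                                    }
                                ],
                            },
                        }
                    ],
                }
            ],
        }
    ],
}


def image_service_urls(manifest):
    """Collect the base URL of every image service referenced by the manifest."""
    urls = []
    for canvas in manifest.get("items", []):
        for page in canvas.get("items", []):
            for annotation in page.get("items", []):
                body = annotation.get("body", {})
                for service in body.get("service", []):
                    # v3 services use "id"; embedded v2 services use "@id"
                    urls.append(service.get("id") or service.get("@id"))
    return urls


print(image_service_urls(sample_manifest))
```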

---

### The IIIF Image API - URL Structure

When an image resource includes a IIIF Image API service, you can request any region or size on the fly:

```
{service_url}/{region}/{size}/{rotation}/{quality}.{format}
```

| Parameter | Examples |
|-----------|---------|
| **region** | `full` · `square` · `x,y,w,h` · `pct:x,y,w,h` |
| **size** | `max` · `500,` (width) · `,300` (height) · `pct:50` |
| **rotation** | `0` · `90` · `180` · `!0` (mirror) |
| **quality** | `default` · `color` · `gray` · `bitonal` |
| **format** | `jpg` · `png` · `webp` · `tif` |
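
The URL template is plain string assembly, as this minimal sketch shows. `iiif_image_url` is a hypothetical helper, and the service URL is a placeholder; a real one would come from an item's manifest. Not every server supports every combination of parameters.

```python
def iiif_image_url(service_url, region="full", size="max", rotation="0",
                   quality="default", fmt="jpg"):
    """Assemble a IIIF Image API URL from its five path parameters."""
    base = service_url.rstrip("/")
    return f"{base}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Placeholder service URL, for illustration only
service = "https://example.org/iiif/img1"

whole_image = iiif_image_url(service)
half_width = iiif_image_url(service, size="500,")
grey_detail = iiif_image_url(service, region="pct:25,25,50,50", quality="gray")

print(whole_image)   # entire image at maximum size
print(half_width)    # scaled to 500 px wide
print(grey_detail)   # central region in greyscale
```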

In [None]:
def get_iiif_manifest_url(item):
    """
    Construct IIIF manifest URL from item ID.
    
    Europeana generates IIIF manifests on-the-fly for all items.
    Pattern: https://iiif.europeana.eu/presentation/{dataset}/{local_id}/manifest
    """
    item_id = item.get('id')
    if not item_id:
        return None
    
    parts = item_id.strip('/').split('/')
    if len(parts) >= 2:
        dataset = parts[0]
        local_id = parts[1]
        return f"https://iiif.europeana.eu/presentation/{dataset}/{local_id}/manifest"
    return None


# Show IIIF manifest URLs for search results
if result['items']:
    print("🔗 IIIF Manifest URLs for your search results:")
    print("=" * 60)
    
    for item in result['items'][:5]:
        title = get_item_title(item)[:40]
        manifest_url = get_iiif_manifest_url(item)
        if manifest_url:
            print(f"\n{title}")
            print(f"  {manifest_url}")

---

## Summary

In this notebook, you learned:

1. **What APIs are** and why they're useful for digital humanities research
2. **How to explore** Europeana's countries and institutions
3. **How to filter** by country (Swedish institutions use case)
4. **How to search** for keywords and analyze results (who painted the most rivers?)
5. **How to build OR queries** for multiple concepts (water bodies)
6. **How to create** a universal download function with customizable options
7. **How to download** images from specific artists and time periods
8. **About IIIF** and standardized image access

### Next Steps

In the next notebook (**02_clip_semantic_search.ipynb**), you will:
- Download the Uppsala University collection
- Use CLIP to search images by natural language descriptions
- Find paintings of water bodies using semantic similarity

### Useful Resources

- **Europeana Portal**: https://www.europeana.eu/
- **API Documentation**: https://pro.europeana.eu/page/apis
- **Get API Key**: https://pro.europeana.eu/page/get-api
- **IIIF Information**: https://iiif.io/