# Working with Cultural Heritage APIs: Europeana

Welcome to this workshop on working with APIs and cultural heritage data. In this notebook, you will:

1. **Learn** what an API is and why APIs matter for digital humanities research
2. **Explore** Europeana's collections: countries and institutions
3. **Search** for artworks by keyword and analyze results (who painted the most rivers?)
4. **Build** OR queries to search for multiple concepts (water bodies)
5. **Create** a universal download function with customizable options
6. **Download** paintings from a specific artist and time period
7. **Discover** IIIF for standardized image access

---

## About Europeana

[Europeana](https://www.europeana.eu/) is Europe's digital platform for cultural heritage, providing access to:
- **50+ million** digitized items from European museums, galleries, libraries and archives
- **Content from** 3,000+ institutions across Europe
- **Collections including** artworks, books, music, videos, photographs, manuscripts
- **Open data** under various Creative Commons licenses

Europeana aggregates content from major institutions including:
- Rijksmuseum (Netherlands)
- British Library (UK)
- Louvre (France)
- Uppsala University (Sweden)
- And many more across Europe

**Note:** Europeana uses IIIF (International Image Interoperability Framework) for many items, providing standardized access to high-resolution images.

---

## Part 1: What is an API?

**API** stands for **Application Programming Interface**

An API:
- Takes your **request** ("give me all paintings by Rembrandt")
- Sends it to a **server** (the database)
- Returns a **response** (the data you asked for)

### Why APIs matter for Digital Humanities

- **Scale**: Download thousands of records automatically instead of clicking through web pages
- **Structure**: Data comes in machine-readable formats (JSON, XML) ready for analysis
- **Reproducibility**: Your code documents exactly how you obtained your data
- **Updates**: Re-run your code to get the latest data
- **Integration**: Combine data from multiple institutions

### Common Data Formats

| Format | Description | Example |
|--------|-------------|---------|
| **JSON** | JavaScript Object Notation - human-readable, widely used | `{"name": "Mona Lisa", "year": 1503}` |
| **XML** | eXtensible Markup Language - similar to HTML | `<artwork><name>Mona Lisa</name></artwork>` |
| **CSV** | Comma-Separated Values - spreadsheet-like | `name,year\nMona Lisa,1503` |
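
To make the table concrete, here is a minimal sketch that reads the same record from each of the three formats using only the Python standard library. The record is illustrative, and the XML snippet is extended with a `<year>` element (an assumption, not part of the table above) so that all three carry the same fields.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same illustrative record in three formats.
json_text = '{"name": "Mona Lisa", "year": 1503}'
xml_text = "<artwork><name>Mona Lisa</name><year>1503</year></artwork>"
csv_text = "name,year\nMona Lisa,1503\n"

# JSON maps directly onto Python dicts and keeps the number as an int.
from_json = json.loads(json_text)

# XML is a tree: collect each child element's tag and text.
root = ET.fromstring(xml_text)
from_xml = {child.tag: child.text for child in root}

# CSV is row-based: DictReader uses the header row for the keys.
from_csv = next(csv.DictReader(io.StringIO(csv_text)))

print(from_json)
print(from_xml)
print(dict(from_csv))
```

Note that only JSON preserves the year as a number; XML and CSV return every value as text, so numeric fields need converting before analysis.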

### How an API Works

```mermaid
sequenceDiagram
    actor You as 🧑‍💻 You (Python)
    participant API as 🌐 Europeana API
    participant DB as 🗄️ Database<br/>(50M+ items)

    You->>API: GET /search.json?query=Rembrandt&rows=10
    Note over You,API: HTTP Request with your API key

    API->>DB: Query matching records
    DB-->>API: Raw records

    API-->>You: JSON response
    Note over API,You: {"totalResults": 12543,<br/>"items": [...]}

    You->>You: Parse & analyse data
```

> **Key idea:** You never talk to the database directly. The API is a controlled gateway that handles authentication and rate limiting, and formats the data for you.
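
The request in the diagram is an ordinary URL with parameters attached. The sketch below assembles one with the standard library so you can see what travels over the wire. `build_search_url` and the `YOUR_API_KEY` placeholder are illustrative names; the `wskey` parameter and the endpoint are the ones used later in this notebook.

```python
from urllib.parse import urlencode


def build_search_url(query, rows=10, api_key="YOUR_API_KEY"):
    """Assemble the GET URL for a simple search request (no network call)."""
    base = "https://api.europeana.eu/record/v2/search.json"
    params = {"wskey": api_key, "query": query, "rows": rows}
    return f"{base}?{urlencode(params)}"


url = build_search_url("Rembrandt")
print(url)
```

In the rest of the notebook, `requests.get(url, params=...)` performs this encoding step for you.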

---

## Part 2: Setup

First, let's import the libraries we need and set up our project structure.

In [None]:
# Standard library imports
import json
import time
from pathlib import Path
from collections import Counter

# External libraries
import requests
from IPython.display import display, Image

# Set up project paths
NOTEBOOK_DIR = Path(".").resolve()
PROJECT_ROOT = NOTEBOOK_DIR.parent

IMAGES_DIR = PROJECT_ROOT / "data" / "images"
DATA_DIR   = PROJECT_ROOT / "data"

IMAGES_DIR.mkdir(parents=True, exist_ok=True)
DATA_DIR.mkdir(parents=True, exist_ok=True)

print(f"Project root: {PROJECT_ROOT}")
print(f"Images dir:   {IMAGES_DIR}")
print(f"Data dir:     {DATA_DIR}")

Project root: /home/lauhp/000_PHD/000_003_Code/DH-Workshop
Images dir:   /home/lauhp/000_PHD/000_003_Code/DH-Workshop/data/images
Data dir:     /home/lauhp/000_PHD/000_003_Code/DH-Workshop/data


### API Key Configuration

Europeana requires a free API key. To get one:
1. Visit https://pro.europeana.eu/page/get-api
2. Register for a Europeana account
3. Request an API key from your account
4. Save your API key to `notebooks/api-key-europeana.txt`

For testing, we'll use a demo key with limited access.

In [None]:
# Load API key - check multiple locations
NOTEBOOK_DIR = Path(".").resolve()
API_KEY_LOCATIONS = [
    NOTEBOOK_DIR / "api-key-europeana.txt",
    PROJECT_ROOT / "misc" / "api-key-europeana.txt",
    PROJECT_ROOT / "api-key-europeana.txt"
]

# Default demo key (limited requests)
API_KEY = "api2demo"

for key_file in API_KEY_LOCATIONS:
    if key_file.exists():
        custom_key = key_file.read_text().strip()
        if custom_key and custom_key != "api2demo":
            API_KEY = custom_key
            print(f"✓ API key loaded from {key_file}")
            break
else:
    # for/else: this branch runs only if the loop finished without `break`
    print("ℹ Using demo API key (limited to 999 requests)")
    print("  For unlimited access, get your own key at: https://pro.europeana.eu/page/get-api")
    print("  Save it to: notebooks/api-key-europeana.txt")

# Base API endpoint
BASE_URL = "https://api.europeana.eu/record/v2"
print(f"\nAPI endpoint: {BASE_URL}")

ℹ Using demo API key (limited to 999 requests)
  For unlimited access, get your own key at: https://pro.europeana.eu/page/get-api
  Save it to: notebooks/api-key-europeana.txt

API endpoint: https://api.europeana.eu/record/v2


---

## Part 3: Understanding the Europeana API

The Europeana API provides two main endpoints:

1. **Search API** - Query and filter the collection
2. **Record API** - Get detailed information about specific items

### Key API Parameters

| Parameter | Description | Example |
|-----------|-------------|---------|
| `query` | Search term | `Rembrandt`, `painting`, `*` (all) |
| `qf` | Query filter | `TYPE:IMAGE`, `COUNTRY:Netherlands` |
| `theme` | Thematic collection | `art`, `fashion`, `photography`, `music` |
| `reusability` | License filter | `open`, `restricted`, `permission` |
| `rows` | Results per page (max 100) | `12` (default), `100` |
| `profile` | Detail level | `standard`, `rich`, `facets` |
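
As a rough illustration of how these parameters combine, the sketch below encodes a parameter dictionary into a query string. Several `qf` filters are sent by repeating the `qf` key, which is also what `requests` does when a parameter value is a list. The key value and the oversized `rows` request are placeholders chosen for the example.

```python
from urllib.parse import parse_qs, urlencode

requested_rows = 250  # deliberately above the per-page maximum

params = {
    "wskey": "YOUR_API_KEY",                      # placeholder key
    "query": "painting",
    "qf": ["TYPE:IMAGE", "COUNTRY:Netherlands"],  # two filters, both must match
    "reusability": "open",
    "rows": min(requested_rows, 100),             # clamp to the maximum of 100
    "profile": "rich",
}

# doseq=True turns the list into repeated keys: qf=...&qf=...
query_string = urlencode(params, doseq=True)
print(query_string)

# Decoding it again shows both filters survived the round trip.
decoded = parse_qs(query_string)
print(decoded["qf"])
```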

### Available Themes

Europeana provides curated thematic collections. Use these with the `theme` parameter:

| Theme | Description |
|-------|-------------|
| `art` | Paintings, sculptures, and visual art |
| `photography` | Historical and contemporary photographs |
| `fashion` | Clothing, accessories, and fashion design |
| `music` | Musical scores, recordings, and instruments |
| `newspapers` | Historical newspapers and periodicals |
| `nature` | Natural history specimens and illustrations |
| `sport` | Sports history and memorabilia |
| `ww1` | World War I collections |
| `archaeology` | Archaeological artifacts and sites |

### What is a Facet?

In API terminology, a **facet** is a category or attribute used to filter and aggregate search results. Think of it like a filter dimension in a search interface:

- `COUNTRY` facet → shows how many results exist per country
- `DATA_PROVIDER` facet → shows counts by institution/museum
- `TYPE` facet → shows counts by media type (image, video, text, etc.)
- `proxy_dc_creator` facet → shows counts by creator/artist
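
To show what a facet looks like as data, here is a small sketch that pulls counts out of a response-shaped dictionary. The `facets` / `name` / `fields` / `label` / `count` layout is the one the notebook's own `get_facet_values` function reads; the numbers and the `facet_counts` helper are made up for illustration.

```python
# Hand-written stand-in for a facet response (counts are invented).
sample_response = {
    "facets": [
        {"name": "TYPE", "fields": [
            {"label": "IMAGE", "count": 120},
            {"label": "TEXT", "count": 45},
        ]},
        {"name": "COUNTRY", "fields": [
            {"label": "Sweden", "count": 100},
            {"label": "Norway", "count": 65},
        ]},
    ]
}


def facet_counts(response, facet_name):
    """Return {label: count} for one facet, or {} if the facet is absent."""
    for facet in response.get("facets", []):
        if facet.get("name") == facet_name:
            return {f["label"]: f["count"] for f in facet.get("fields", [])}
    return {}


type_counts = facet_counts(sample_response, "TYPE")
total = sum(type_counts.values())

# Express each media type as a percentage of the faceted total.
shares = {label: round(100 * count / total, 1) for label, count in type_counts.items()}
print(type_counts)
print(shares)
```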

---

## Part 4: Exploring Countries and Institutions

Let's start by discovering what countries and institutions are available in Europeana.

In [None]:
def get_facet_values(facet_name, query="*", qf=None, max_values=50, theme=None):
    """
    Query the Europeana API to get available values for a facet.
    
    Parameters:
        facet_name: The facet to query (e.g., "COUNTRY", "DATA_PROVIDER", "proxy_dc_creator")
        query: Search query to filter results (default: "*" for all)
        qf: Optional query filters (e.g., ["COUNTRY:Sweden"])
        max_values: Maximum number of facet values to return
        theme: Optional thematic collection (e.g., "art", "photography")
    
    Returns:
        List of (value, count) tuples
    """
    url = f"{BASE_URL}/search.json"
    
    params = {
        "wskey": API_KEY,
        "query": query,
        "rows": 0,  # We only want facets, not actual results
        "profile": "facets",
        "facet": facet_name,
        f"f.{facet_name}.facet.limit": max_values
    }
    
    if qf:
        params["qf"] = qf
    
    if theme:
        params["theme"] = theme
    
    try:
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        data = response.json()
        
        # Extract facet values
        facets = data.get('facets', [])
        for facet in facets:
            if facet.get('name') == facet_name:
                fields = facet.get('fields', [])
                return [(f['label'], f['count']) for f in fields]
        return []
    
    except requests.exceptions.RequestException as e:
        print(f"❌ Error fetching facets: {e}")
        return []

### The COUNTRY Facet

In [None]:
# Get all available countries
print("Available COUNTRIES in Europeana:")
print("=" * 60)
countries = get_facet_values("COUNTRY", max_values=50)

# Store for later use
country_counts = {country: count for country, count in countries}

for country, count in countries:
    print(f"  {country:<30} ({count:,} items)")

print(f"\nTotal countries: {len(countries)}")

Available COUNTRIES in Europeana:
  Netherlands                    (8,845,226 items)
  Germany                        (7,757,624 items)
  Spain                          (6,416,456 items)
  Sweden                         (6,083,438 items)
  United Kingdom                 (5,669,073 items)
  France                         (4,754,602 items)
  Poland                         (3,774,673 items)
  Norway                         (3,682,589 items)
  Belgium                        (2,674,224 items)
  Austria                        (2,147,988 items)
  Italy                          (1,844,306 items)
  Czech Republic                 (1,657,322 items)
  Denmark                        (1,520,041 items)
  Finland                        (1,302,588 items)
  Hungary                        (988,951 items)
  Estonia                        (894,417 items)
  Lithuania                      (832,178 items)
  Greece                         (793,779 items)
  Romania                        (529,539 items)
  Slove

### The DATA_PROVIDER Facet (Institutions)

In [None]:
# Get data providers (institutions) for records matching "Norway".
# Note: "Norway" is passed as the full-text `query`, so this matches any record
# that mentions Norway, which is why non-Norwegian institutions also appear.
# To restrict by country instead, pass a filter such as qf=["COUNTRY:Norway"].
print("Top DATA_PROVIDERS (Institutions) in Europeana:")
print("=" * 60)
providers = get_facet_values("DATA_PROVIDER", "Norway", max_values=200)

for provider, count in providers:
    # Truncate long names for display
    display_name = provider[:55] + "..." if len(provider) > 55 else provider
    print(f"  {display_name:<58} ({count:,} items)")

Top DATA_PROVIDERS (Institutions) in Europeana:
  The National Archives of Norway                            (3,153,913 items)
  The Norwegian Museum of Cultural History                   (188,099 items)
  Domkirkeodden                                              (104,289 items)
  Glomdal Museum                                             (58,512 items)
  Oslo Museum                                                (51,719 items)
  Museum in Nord-Østerdalen                                  (51,023 items)
  The Norwegian Forest Museum                                (19,275 items)
  Norwegian Pharmacy Museum                                  (10,837 items)
  Naturalis Biodiversity Center                              (8,406 items)
  The Trustees of the Natural History Museum, London         (7,536 items)
  NRK                                                        (7,465 items)
  The Museum Centre in Hordaland                             (6,949 items)
  The Museums for Coastal Heritage and

### The TYPE Facet

In [None]:
# Get all media types
print("All media types in Europeana:")
print("=" * 60)
media_types = get_facet_values("TYPE", max_values=200)

for media_type, count in media_types:
    print(f"  {media_type:<58} ({count:,} items)")

All media types in Europeana:
  IMAGE                                                      (36,808,457 items)
  TEXT                                                       (26,524,360 items)
  SOUND                                                      (1,250,913 items)
  VIDEO                                                      (434,032 items)
  3D                                                         (10,444 items)


---

## Part 5: Searching for Paintings by Keyword

Now that we've explored the collection structure, let's search for specific content. We'll use the `theme="art"` parameter.

In [None]:
def search_europeana(query="*", rows=12, reusability="open", qf=None, profile="rich", cursor=None, theme=None):
    """
    Search the Europeana collection.

    Parameters:
        query: Search term (default: "*" for all)
        rows: Number of results to return (max 100)
        reusability: Filter by license ("open", "restricted", "permission", or None)
        qf: Additional query filters as list (e.g., ["TYPE:IMAGE", "COUNTRY:Netherlands"])
        profile: "standard", "rich", or "facets" for more metadata
        cursor: Cursor for pagination (use for getting more than 100 results)
        theme: Thematic collection filter (e.g., "art", "fashion", "music", "photography",
               "nature", "newspapers", "sport", "ww1", "archaeology", "migration",
               "maps-and-geography", "manuscripts") or None for all

    Returns:
        Dictionary with search results
    """
    url = f"{BASE_URL}/search.json"

    params = {
        "wskey": API_KEY,
        "query": query,
        "rows": min(rows, 100),
        "profile": profile
    }

    if reusability:
        params["reusability"] = reusability

    if qf:
        params["qf"] = qf

    if cursor:
        params["cursor"] = cursor

    if theme:
        params["theme"] = theme

    try:
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"❌ Error searching Europeana: {e}")
        return None
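
The `cursor` parameter is what lets you go beyond 100 results. The sketch below shows the general loop, assuming (as I understand the Europeana Search API; please check the official documentation) that the first request uses `cursor="*"` and each response carries a `nextCursor` value until the results run out. `collect_items`, `fake_fetch`, and `FAKE_PAGES` are hypothetical names, and the fake pages stand in for real API responses so the loop can run offline.

```python
def collect_items(fetch_page, max_items=250):
    """Follow cursors page by page until max_items or the last page is reached.

    fetch_page: callable taking a cursor string and returning a response dict
                (or None on failure), e.g. a wrapper around a search function.
    """
    items = []
    cursor = "*"  # assumed starting cursor
    while cursor and len(items) < max_items:
        page = fetch_page(cursor)
        if not page:
            break  # request failed
        batch = page.get("items", [])
        if not batch:
            break  # no more results
        items.extend(batch)
        cursor = page.get("nextCursor")  # assumed field name; None ends the loop
    return items[:max_items]


# Offline stand-in for the API: two pages linked by a cursor.
FAKE_PAGES = {
    "*": {"items": [{"id": "/1/a"}, {"id": "/1/b"}], "nextCursor": "c2"},
    "c2": {"items": [{"id": "/1/c"}], "nextCursor": None},
}


def fake_fetch(cursor):
    return FAKE_PAGES.get(cursor)


all_items = collect_items(fake_fetch)
print(len(all_items), "items collected")
```

With the notebook's own function, `fetch_page` could be something like `lambda c: search_europeana(query="river", rows=100, cursor=c)`, ideally with a short `time.sleep` between pages to stay polite to the server.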

In [None]:
# Helper functions for extracting metadata

def get_item_title(item):
    """Extract title from an item."""
    if 'title' in item and item['title']:
        return item['title'][0] if isinstance(item['title'], list) else item['title']
    if 'dcTitleLangAware' in item:
        for lang in ['en', 'nl', 'de', 'fr', 'sv', 'def']:
            if lang in item['dcTitleLangAware']:
                return item['dcTitleLangAware'][lang][0]
    return "Untitled"


def get_item_creator(item):
    """Extract creator/artist from an item."""
    if 'dcCreator' in item and item['dcCreator']:
        return item['dcCreator'][0] if isinstance(item['dcCreator'], list) else item['dcCreator']
    return "Unknown"


def get_item_year(item):
    """Extract year from an item."""
    if 'year' in item and item['year']:
        return item['year'][0] if isinstance(item['year'], list) else item['year']
    return "n.d."


def get_item_preview(item):
    """Extract preview image URL."""
    if 'edmPreview' in item and item['edmPreview']:
        return item['edmPreview'][0] if isinstance(item['edmPreview'], list) else item['edmPreview']
    return None


def get_full_image_url(item):
    """Extract full-size image URL (from source institution)."""
    if 'edmIsShownBy' in item and item['edmIsShownBy']:
        return item['edmIsShownBy'][0] if isinstance(item['edmIsShownBy'], list) else item['edmIsShownBy']
    return None


def sanitize_filename(name):
    """Remove problematic characters from filenames."""
    if not name:
        return "unknown"
    safe = "".join(c for c in name if c.isalnum() or c in ' ._-')
    return safe.strip()[:80]
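
Most of the helpers above repeat one pattern: a field may be missing, empty, a single value, or a list. As a possible simplification (a sketch, not a replacement for the functions above), a single generic helper can cover that pattern. `first_value` and the sample item are hypothetical and hand-written for illustration.

```python
def first_value(item, key, default=None):
    """Return the first value of a field that may be a list, a scalar, or absent."""
    value = item.get(key)
    if not value:  # missing key, None, empty string, or empty list
        return default
    return value[0] if isinstance(value, list) else value


# Hand-written item showing the three cases: list, scalar, empty list.
sample_item = {
    "title": ["View of Delft"],
    "dcCreator": "Johannes Vermeer",
    "year": [],
}

print(first_value(sample_item, "title", "Untitled"))
print(first_value(sample_item, "dcCreator", "Unknown"))
print(first_value(sample_item, "year", "n.d."))
```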

In [None]:
# ============================================================
# EXERCISE: Search for paintings by keyword using the Art theme
# ============================================================

SEARCH_KEYWORD = "spaceship"  # <-- Try changing this! (e.g., "sunset", "portrait", "horse")

# ============================================================

print(f"Searching for '{SEARCH_KEYWORD}' in the Art collection...")
print("=" * 60)

results = search_europeana(
    query=SEARCH_KEYWORD,
    rows=100,
    qf=["TYPE:IMAGE"],  # Only images
    theme="art",        # Use the Art thematic collection
    reusability="open"
)

if results and results.get('success'):
    items = results.get('items', [])
    print(f"✓ Found {results['totalResults']:,} art images matching '{SEARCH_KEYWORD}'")
    print(f"  (Showing first {len(items)} results)")

    # Preview first 3 results
    print("\n🖼️  Preview:\n")
    for item in items[:3]:
        title   = get_item_title(item)
        creator = get_item_creator(item)
        year    = get_item_year(item)
        preview = get_item_preview(item)
        print(f"{title}  -  {creator} ({year})")
        if preview:
            display(Image(url=preview, width=300))
else:
    print("❌ Search failed")

Searching for 'spaceship' in the Art collection...
✓ Found 1 art images matching 'spaceship'
  (Showing first 1 results)

🖼️  Preview:

fantasy art from "Fifteen Hundred Miles an Hour. [The story of a visit to the planet Mars.] Edited [or rather written] by C. Dixon, etc".  -  Unknown (1895)
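
To analyze results rather than just preview them (for example, to ask who appears most often as creator), `collections.Counter` is a natural fit. The sketch below runs on hand-written placeholder items shaped like search results; `creator_of` is a hypothetical helper that mirrors the logic of `get_item_creator`.

```python
from collections import Counter

# Placeholder items; real ones would come from results["items"].
sample_items = [
    {"dcCreator": ["Artist A"]},
    {"dcCreator": ["Artist B"]},
    {"dcCreator": ["Artist A"]},
    {},  # record without a creator field
]


def creator_of(item):
    """First listed creator, or "Unknown" when the field is missing or empty."""
    creators = item.get("dcCreator") or ["Unknown"]
    return creators[0] if isinstance(creators, list) else creators


creator_counts = Counter(creator_of(item) for item in sample_items)

# most_common() sorts creators from most to least frequent.
for creator, count in creator_counts.most_common():
    print(f"{creator}: {count}")
```

Keep in mind that creator names are free text supplied by each institution, so the same artist may appear under several spellings and the counts are only a first approximation.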


---

## Part 10: IIIF (International Image Interoperability Framework)

**IIIF** (pronounced *"triple-eye-eff"*) is a family of open standards that lets cultural heritage institutions share their digitised collections in a **consistent, interoperable** way.

Without IIIF, every museum, library, and archive builds its own image viewer with its own API, which makes it hard to combine or compare images across institutions. IIIF solves this by providing a **common language**: any IIIF-compatible viewer (Mirador, Universal Viewer, OpenSeadragon, ...) can display content from any IIIF-compatible server, regardless of where the image lives.

> Think of it like a **power socket standard**: the image stays at the institution, but any compliant tool can plug in and use it.

---

### The Four Key APIs

| API | Purpose |
|-----|--------|
| **Presentation API** | Describes *what* to show and in what order: the manifest, canvases, and sequences |
| **Image API** | Delivers image pixels on demand: crop, resize, and rotate any region of an image |
| **Search API** | Full-text search within the content of a manifest |
| **Authentication API** | Controls access to restricted or rights-managed content |

---

### How IIIF Works

A viewer never downloads the whole image upfront. Instead, it fetches a **manifest** (a JSON-LD description of the object), reads where the image lives, then requests only the tiles it needs from the image server, at exactly the right size and region.

```mermaid
flowchart LR
    V["🖥️ IIIF Viewer\nMirador · Universal Viewer\nOpenSeadragon"]
    M["📋 Manifest\nJSON-LD document\ndescribes the object"]
    I["🗄️ Image Server\nIIIF Image API\nat the institution"]
    P["🖼️ Image tiles\ncropped · resized\nrotated · transcoded"]

    V -->|"1 · fetch manifest URL"| M
    M -->|"2 · reads canvas &\nimage service location"| V
    V -->|"3 · GET region/size/rotation/quality.format"| I
    I -->|"4 · returns exact pixels"| P
    P --> V
```
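
Step 2 of the diagram, reading the image service location out of a manifest, can be sketched as follows. The manifest here is a hand-written, heavily trimmed example laid out in the style of the IIIF Presentation API version 2 (`sequences` → `canvases` → `images` → `resource` → `service`); that layout is an assumption for illustration, and real manifests (especially version 3, which is organised differently) should be checked before relying on it. The `example.org` URLs and the `image_service_urls` helper are placeholders.

```python
# Trimmed, hand-written manifest in Presentation API v2 style (assumed layout).
sample_manifest = {
    "@id": "https://example.org/iiif/object1/manifest",
    "sequences": [{
        "canvases": [
            {"label": "Front",
             "images": [{"resource": {"service": {"@id": "https://example.org/iiif/img-front"}}}]},
            {"label": "Back",
             "images": [{"resource": {"service": {"@id": "https://example.org/iiif/img-back"}}}]},
            {"label": "No image service",
             "images": [{"resource": {}}]},
        ]
    }],
}


def image_service_urls(manifest):
    """Collect (canvas label, image service URL) pairs, skipping images without a service."""
    found = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                service = image.get("resource", {}).get("service", {})
                if service.get("@id"):
                    found.append((canvas.get("label"), service["@id"]))
    return found


services = image_service_urls(sample_manifest)
for label, service_url in services:
    print(f"{label}: {service_url}")
```

Each service URL found this way is the base onto which Image API requests for specific regions and sizes are built.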

---

### The IIIF Image API: URL Structure

When an image resource includes a IIIF Image API service, you can request any region or size on the fly:

```
{service_url}/{region}/{size}/{rotation}/{quality}.{format}
```

| Parameter | Examples |
|-----------|---------|
| **region** | `full` · `square` · `x,y,w,h` · `pct:x,y,w,h` |
| **size** | `max` · `500,` (width) · `,300` (height) · `pct:50` |
| **rotation** | `0` · `90` · `180` · `!0` (mirror) |
| **quality** | `default` · `color` · `gray` · `bitonal` |
| **format** | `jpg` · `png` · `webp` · `tif` |
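
The URL template above is simple enough to assemble by hand. A minimal sketch follows, where `iiif_image_url` is a hypothetical helper and `https://example.org/iiif/image1` is a placeholder service URL, not a real image:

```python
def iiif_image_url(service_url, region="full", size="max", rotation="0",
                   quality="default", fmt="jpg"):
    """Fill in the {region}/{size}/{rotation}/{quality}.{format} template."""
    return f"{service_url.rstrip('/')}/{region}/{size}/{rotation}/{quality}.{fmt}"


service = "https://example.org/iiif/image1"  # placeholder service URL

full_image = iiif_image_url(service)
thumbnail = iiif_image_url(service, region="square", size="200,")
detail = iiif_image_url(service, region="pct:25,25,50,50", size="800,", quality="gray")

print(full_image)
print(thumbnail)
print(detail)
```

Whether a given size, quality, or format is actually available depends on the image server; its `info.json` document describes what it supports.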

In [None]:
def get_iiif_manifest_url(item):
    """
    Construct IIIF manifest URL from item ID.

    Europeana generates IIIF manifests on-the-fly for all items.
    Pattern: https://iiif.europeana.eu/presentation/{dataset}/{local_id}/manifest
    """
    item_id = item.get('id')
    if not item_id:
        return None
    parts = item_id.strip('/').split('/')
    if len(parts) >= 2:
        return f"https://iiif.europeana.eu/presentation/{parts[0]}/{parts[1]}/manifest"
    return None


# Show IIIF manifest URLs for the results from the search exercise above
items_to_show = results.get('items', []) if results else []

if items_to_show:
    print("🔗 IIIF Manifest URLs for your search results:")
    print("=" * 60)
    for item in items_to_show[:5]:
        title = get_item_title(item)[:40]
        manifest_url = get_iiif_manifest_url(item)
        if manifest_url:
            print(f"\n{title}")
            print(f"  {manifest_url}")
else:
    print("ℹ️ Run the search cell above first.")

---

## Summary

In this notebook, you learned:

1. **What APIs are** and why they're useful for digital humanities research
2. **How to explore** Europeana's countries and institutions using facets
3. **How to search** for keywords and analyze results (who painted the most rivers?)
4. **How to build OR queries** for multiple concepts (water bodies)
5. **How to create** a universal download function with customizable options
6. **How to download** images from specific artists and time periods
7. **About IIIF** and standardized image access

### Next Steps

In the next notebook (**02_clip_semantic_search.ipynb**), you will:
- Download the Uppsala University collection
- Use CLIP to search images by natural language descriptions
- Find paintings of water bodies using semantic similarity

### Useful Resources

- **Europeana Portal**: https://www.europeana.eu/
- **API Documentation**: https://pro.europeana.eu/page/apis
- **Get API Key**: https://pro.europeana.eu/page/get-api
- **IIIF Information**: https://iiif.io/