# POP API Tutorial: General Dataset Management

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sci-ndp/pop/blob/main/docs/general_dataset_api_tutorial.ipynb)
[![Open in Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/sci-ndp/pop/main?filepath=docs/general_dataset_api_tutorial.ipynb)

> 🚀 **Run Online Options:**
> - **Google Colab**: Dependencies installed automatically in the first cell
> - **Binder**: Pre-configured environment, ready to run immediately
> - **Local**: Requires `pip install requests jupyter`

This notebook demonstrates how to use the POP (Point of Presence) API to manage datasets programmatically. You will learn how to:

1. **Authenticate** with the API using Keycloak tokens
2. **Create organizations** to organize your datasets
3. **Create comprehensive datasets** with metadata, tags, groups, and resources
4. **Update and manage** existing datasets
5. **Handle errors** and best practices

## Prerequisites

- Python 3.7+
- `requests` library
- Access to a POP API instance
- Valid authentication credentials

## API Overview

The POP API provides both specialized endpoints (for S3, Kafka, URLs) and general dataset endpoints for flexible data management. This tutorial focuses on the general dataset endpoints which offer maximum flexibility for diverse data types and structures.

In [17]:
# Install required packages
!pip install requests -q




## 1. Setup and Configuration

First, let's import the necessary libraries and configure our API connection parameters.

In [1]:
import requests
import json
from typing import Dict, Any, Optional
import time

# Pretty printing for JSON responses
from pprint import pprint

### Configuration Variables

**Important:** Replace these values with your actual API endpoint and credentials.

In [2]:
# API Configuration
API_BASE_URL = "http://localhost:8001"  # Replace with your API URL
CKAN_SERVER = "local"  # Options: "local" or "pre_ckan"

# Authentication Token
# You can obtain this token from your Keycloak instance or use the /token endpoint
AUTH_TOKEN = "your-keycloak-token-here"  # Replace with your actual token

# Request headers with authentication
HEADERS = {
    "Authorization": f"Bearer {AUTH_TOKEN}",
    "Content-Type": "application/json"
}

print(f"API Base URL: {API_BASE_URL}")
print(f"CKAN Server: {CKAN_SERVER}")
print(f"Token configured: {'✓' if AUTH_TOKEN != 'your-keycloak-token-here' else '✗ Please set your token'}")

API Base URL: http://localhost:8001
CKAN Server: local
Token configured: ✗ Please set your token
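Hard-coding a token in a notebook makes it easy to commit or share it by accident. Below is a minimal sketch that builds the same settings and headers as the cell above from environment variables. It assumes variables named `POP_API_URL`, `POP_CKAN_SERVER`, and `POP_AUTH_TOKEN`. These names and the `load_pop_config` helper are hypothetical and not defined by the POP API.

```python
import os
from typing import Dict, Mapping, Optional

PLACEHOLDER_TOKEN = "your-keycloak-token-here"


def load_pop_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Read API settings from (hypothetically named) environment variables.

    Falls back to the tutorial defaults when a variable is not set.
    """
    env = os.environ if env is None else env
    token = env.get("POP_AUTH_TOKEN", PLACEHOLDER_TOKEN)
    return {
        # Strip a trailing slash so f"{base}{endpoint}" does not produce "//"
        "api_base_url": env.get("POP_API_URL", "http://localhost:8001").rstrip("/"),
        "ckan_server": env.get("POP_CKAN_SERVER", "local"),
        "token_configured": token != PLACEHOLDER_TOKEN and token != "",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    }


config = load_pop_config()
print(f"API Base URL: {config['api_base_url']}")
print(f"Token configured: {'✓' if config['token_configured'] else '✗'}")
```

You could then assign `API_BASE_URL`, `CKAN_SERVER`, and `HEADERS` from `config` before defining the helper functions, so the notebook itself never contains the token.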


### Helper Functions

Let's create some utility functions to make our API interactions cleaner and more robust.

In [3]:
def make_api_request(method: str, endpoint: str, data: Optional[Dict] = None, 
                    params: Optional[Dict] = None) -> Any:
    """
    Make an API request with proper error handling.
    
    Args:
        method: HTTP method (GET, POST, PUT, PATCH, DELETE)
        endpoint: API endpoint (e.g., '/dataset', '/organization')
        data: Request payload for POST/PUT/PATCH requests
        params: Query parameters
    
    Returns:
        Parsed JSON response (a dict, or a list for listing endpoints such as
        '/organization'), or a dict with an "error" key if the request failed
    """
    url = f"{API_BASE_URL}{endpoint}"
    
    try:
        response = requests.request(
            method=method,
            url=url,
            headers=HEADERS,
            json=data,
            params=params,
            timeout=30  # seconds; avoids hanging forever if the server never responds
        )
        
        print(f"🔗 {method} {url}")
        print(f"📊 Status: {response.status_code}")
        
        if response.status_code in [200, 201]:
            result = response.json()
            print("✅ Success!")
            return result
        else:
            print(f"❌ Error: {response.status_code}")
            try:
                error_detail = response.json()
                print(f"Error details: {json.dumps(error_detail, indent=2)}")
            except ValueError:  # response body is not valid JSON
                print(f"Error text: {response.text}")
            return {"error": True, "status_code": response.status_code, "detail": response.text}
            
    except requests.exceptions.RequestException as e:
        print(f"❌ Request failed: {e}")
        return {"error": True, "exception": str(e)}

def print_response(response: Dict[str, Any], title: str = "Response"):
    """
    Pretty print API responses.
    """
    print(f"\n📋 {title}:")
    print("─" * 50)
    pprint(response)
    print("─" * 50)
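`make_api_request` makes a single attempt. The best-practices list later in this notebook recommends retry logic for transient failures. Below is a minimal sketch that wraps any zero-argument callable returning results in the same shape as `make_api_request`. It retries with exponential backoff. The helper names are hypothetical, and the set of status codes treated as transient is an assumption you should adjust for your deployment.

```python
import time
from typing import Any, Callable

# Assumed transient HTTP status codes (rate limiting and gateway/server overload)
TRANSIENT_STATUS_CODES = {429, 502, 503, 504}


def is_transient_failure(result: Any) -> bool:
    """True if a make_api_request-style result looks worth retrying."""
    if not isinstance(result, dict) or not result.get("error"):
        return False  # Success (dict or list) or a non-error payload
    if "exception" in result:
        return True   # Network-level failure: connection refused, timeout, ...
    return result.get("status_code") in TRANSIENT_STATUS_CODES


def request_with_retries(send: Callable[[], Any], max_attempts: int = 3,
                         base_delay: float = 0.5,
                         sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call `send` until it succeeds, fails permanently, or attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    `sleep` is injectable so the backoff can be tested without waiting.
    """
    result = None
    for attempt in range(1, max_attempts + 1):
        result = send()
        if not is_transient_failure(result) or attempt == max_attempts:
            return result
        delay = base_delay * (2 ** (attempt - 1))
        print(f"🔁 Attempt {attempt} failed, retrying in {delay:.1f}s...")
        sleep(delay)
    return result
```

In the notebook, you could call `request_with_retries(lambda: make_api_request("GET", "/status"))`. Permanent errors such as a 400 validation failure are returned immediately rather than retried.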

## 2. Authentication Test

Let's verify that our authentication is working by checking the API status.

In [4]:
# Test API connectivity and authentication
status_response = make_api_request("GET", "/status")
print_response(status_response, "API Status")

if "error" not in status_response:
    print("🎉 API is accessible and responding correctly!")
else:
    print("⚠️  API connectivity issues. Please check your configuration.")

❌ Request failed: HTTPConnectionPool(host='localhost', port=8001): Max retries exceeded with url: /status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000022C57DEC1A0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

📋 API Status:
──────────────────────────────────────────────────
{'error': True,
 'exception': "HTTPConnectionPool(host='localhost', port=8001): Max retries "
              'exceeded with url: /status (Caused by '
              "NewConnectionError('<urllib3.connection.HTTPConnection object "
              'at 0x0000022C57DEC1A0>: Failed to establish a new connection: '
              '[WinError 10061] No connection could be made because the target '
              "machine actively refused it'))"}
──────────────────────────────────────────────────
⚠️  API connectivity issues. Please check your configuration.
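The failure above is a connection error, not an authentication error, because no HTTP status code was ever returned. Below is a small sketch that tells these cases apart. It relies on the error-dict shape produced by `make_api_request`. `describe_failure` is a hypothetical helper, and its mapping from status codes to likely causes follows general HTTP conventions. It is an assumption about how the POP API reports problems.

```python
from typing import Any


def describe_failure(response: Any) -> str:
    """Turn a make_api_request-style result into a short troubleshooting hint."""
    if not (isinstance(response, dict) and response.get("error")):
        return "OK: the request succeeded."
    if "exception" in response:
        # No HTTP response at all: server down, wrong host/port, firewall, ...
        return ("Unreachable: no HTTP response was received. Check that the API "
                "is running and that API_BASE_URL is correct.")
    code = response.get("status_code")
    if code == 401:
        return "Unauthorized (401): the token is missing, malformed, or expired."
    if code == 403:
        return "Forbidden (403): the token is valid but lacks permission for this action."
    if code == 404:
        return "Not found (404): check the endpoint path and resource name."
    if isinstance(code, int) and code >= 500:
        return f"Server error ({code}): retry later or check the server logs."
    return f"Request rejected ({code}): inspect the 'detail' field for validation messages."
```

For example, `print(describe_failure(status_response))` would report the run above as "Unreachable".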


## 3. Organization Management

Before creating datasets, we need to ensure we have an organization to contain them. Organizations in CKAN serve as containers and provide access control for datasets.

### List Existing Organizations

First, let's see what organizations already exist.

In [5]:
# List existing organizations
params = {"server": CKAN_SERVER}
organizations = make_api_request("GET", "/organization", params=params)
print_response(organizations, "Existing Organizations")

if isinstance(organizations, list):
    print(f"\n📈 Found {len(organizations)} organizations")
    if organizations:
        print("Organizations:")
        for i, org in enumerate(organizations[:5], 1):  # Show first 5
            print(f"  {i}. {org}")
        if len(organizations) > 5:
            print(f"  ... and {len(organizations) - 5} more")
else:
    print("⚠️  Unable to retrieve organizations list")

❌ Request failed: HTTPConnectionPool(host='localhost', port=8001): Max retries exceeded with url: /organization?server=local (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000022C57CF65D0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

📋 Existing Organizations:
──────────────────────────────────────────────────
{'error': True,
 'exception': "HTTPConnectionPool(host='localhost', port=8001): Max retries "
              'exceeded with url: /organization?server=local (Caused by '
              "NewConnectionError('<urllib3.connection.HTTPConnection object "
              'at 0x0000022C57CF65D0>: Failed to establish a new connection: '
              '[WinError 10061] No connection could be made because the target '
              "machine actively refused it'))"}
──────────────────────────────────────────────────
⚠️  Unable to retrieve organizations list

### Create a New Organization

Now let's create a new organization for our tutorial. We'll use a timestamped name to avoid conflicts.

In [6]:
# Organization data
import datetime
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")

organization_data = {
    "name": f"research_tutorial_{timestamp}",
    "title": f"Research Tutorial Organization - {timestamp}",
    "description": "An organization created for the POP API tutorial demonstration. This organization showcases best practices for data management and dataset organization."
}

print("🏢 Creating organization with data:")
print_response(organization_data, "Organization Data")

# Create the organization
params = {"server": CKAN_SERVER}
org_response = make_api_request("POST", "/organization", data=organization_data, params=params)
print_response(org_response, "Organization Creation Response")

if "error" not in org_response and "id" in org_response:
    organization_id = organization_data["name"]  # Use name as ID for dataset creation
    print(f"\n✅ Organization created successfully!")
    print(f"🆔 Organization ID: {organization_id}")
    print(f"📋 Organization Name: {organization_data['name']}")
else:
    print("❌ Failed to create organization")
    organization_id = None

🏢 Creating organization with data:

📋 Organization Data:
──────────────────────────────────────────────────
{'description': 'An organization created for the POP API tutorial '
                'demonstration. This organization showcases best practices for '
                'data management and dataset organization.',
 'name': 'research_tutorial_20250628_234442',
 'title': 'Research Tutorial Organization - 20250628_234442'}
──────────────────────────────────────────────────
❌ Request failed: HTTPConnectionPool(host='localhost', port=8001): Max retries exceeded with url: /organization?server=local (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000022C57CF65D0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

📋 Organization Creation Response:
──────────────────────────────────────────────────
{'error': True,
 'exception': "HTTPConnectionPool(host='lo
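The cell above uses a timestamp to keep the name unique. The name must also be acceptable to CKAN, which generally requires dataset and organization names of 2 to 100 characters using only lowercase ASCII letters, digits, `-`, and `_`. This sketch does not check whether the POP API applies further rules of its own. Below is a minimal sketch with two hypothetical helpers: one checks that rule, and the other turns free text into a compliant, timestamped name.

```python
import re
from datetime import datetime
from typing import Optional

# CKAN-style name rule: 2-100 chars of lowercase ASCII letters, digits, '-' or '_'
CKAN_NAME_PATTERN = re.compile(r"[a-z0-9_-]{2,100}")


def is_valid_ckan_name(name: str) -> bool:
    return CKAN_NAME_PATTERN.fullmatch(name) is not None


def make_ckan_name(text: str, timestamp: Optional[str] = None) -> str:
    """Normalize free text into a CKAN-style name suffixed with a timestamp."""
    # Replace every run of disallowed characters (spaces, punctuation, accents) with "_"
    slug = re.sub(r"[^a-z0-9_-]+", "_", text.lower()).strip("_")
    suffix = timestamp or datetime.now().strftime("%Y%m%d_%H%M%S")
    # Trim the base so that base + "_" + suffix stays within 100 characters
    base = slug[: 100 - len(suffix) - 1]
    return f"{base}_{suffix}"
```

For example, `make_ckan_name("Research Tutorial", timestamp)` gives `research_tutorial_<timestamp>`, the same pattern this notebook builds by hand. Unlike the notebook's manual string, though, the helper also cleans up any characters CKAN would reject.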

## 4. Dataset Creation

Now we'll create a comprehensive dataset using the general dataset endpoint. It exercises the main payload features: descriptive metadata, tags, groups, custom extras, and multiple resources.

### Prepare Dataset Payload

Let's prepare a comprehensive dataset that showcases the full capabilities of the API.

In [7]:
# Comprehensive dataset payload
dataset_payload = {
    # Required fields
    "name": f"climate_research_dataset_{timestamp}",
    "title": "Comprehensive Climate Research Dataset - Tutorial Example",
    "owner_org": organization_data['name'],  # Use our created organization
    
    # Descriptive metadata
    "notes": "This dataset contains comprehensive climate research data including temperature, precipitation, and atmospheric measurements. Created as part of the POP API tutorial to demonstrate best practices for dataset management and metadata organization.",
    
    # Categorization
    "tags": ["climate", "environment", "timeseries", "weather", "research", "tutorial"],
    "groups": ["science", "climate", "research"],
    
    # Administrative metadata
    "license_id": "cc-by-4.0",
    "version": "1.0",
    "private": False,
    
    # Custom metadata using extras
    "extras": {
        "project_code": "CLIMATE2024",
        "funding_agency": "National Science Foundation",
        "principal_investigator": "Dr. Jane Smith",
        "data_collection_period": "2023-01-01 to 2024-12-31",
        "geographical_coverage": "North America",
        "temporal_resolution": "Daily",
        "quality_level": "Level 2 - Processed",
        "contact_email": "climate-data@university.edu",
        "methodology": "Automated weather station network",
        "data_format_standard": "CF-1.8",
        "doi": "10.5194/essd-2024-example",
        "publication_status": "Published",
        "access_restrictions": "None - Open Access",
        "update_frequency": "Monthly",
        "backup_location": "Cloud Storage Tier 1"
    },
    
    # Associated resources
    "resources": [
        {
            "url": "https://data.climate-research.org/temperature/daily_temp_2024.csv",
            "format": "CSV",
            "name": "Daily Temperature Measurements",
            "description": "Daily temperature readings from 500+ weather stations across North America. Includes min, max, and average temperatures with quality flags.",
            "mimetype": "text/csv",
            "size": 15728640  # ~15MB
        },
        {
            "url": "https://data.climate-research.org/precipitation/monthly_precip_2024.json",
            "format": "JSON",
            "name": "Monthly Precipitation Data",
            "description": "Monthly precipitation totals with statistical summaries and anomaly calculations relative to 30-year climate normals.",
            "mimetype": "application/json",
            "size": 2097152  # ~2MB
        },
        {
            "url": "https://data.climate-research.org/atmospheric/pressure_humidity.xml",
            "format": "XML",
            "name": "Atmospheric Measurements",
            "description": "Comprehensive atmospheric data including barometric pressure, humidity, wind speed and direction, formatted according to meteorological XML standards.",
            "mimetype": "application/xml",
            "size": 8388608  # ~8MB
        },
        {
            "url": "https://data.climate-research.org/spatial/climate_zones.tif",
            "format": "GeoTIFF",
            "name": "Climate Zone Classification",
            "description": "High-resolution GeoTIFF raster showing climate zone classifications based on temperature and precipitation patterns. Includes metadata for coordinate reference system.",
            "mimetype": "image/tiff",
            "size": 52428800  # ~50MB
        }
    ]
}

print("📊 Prepared comprehensive dataset payload:")
print_response(dataset_payload, "Dataset Payload")

📊 Prepared comprehensive dataset payload:

📋 Dataset Payload:
──────────────────────────────────────────────────
{'extras': {'access_restrictions': 'None - Open Access',
            'backup_location': 'Cloud Storage Tier 1',
            'contact_email': 'climate-data@university.edu',
            'data_collection_period': '2023-01-01 to 2024-12-31',
            'data_format_standard': 'CF-1.8',
            'doi': '10.5194/essd-2024-example',
            'funding_agency': 'National Science Foundation',
            'geographical_coverage': 'North America',
            'methodology': 'Automated weather station network',
            'principal_investigator': 'Dr. Jane Smith',
            'project_code': 'CLIMATE2024',
            'publication_status': 'Published',
            'quality_level': 'Level 2 - Processed',
            'temporal_resolution': 'Daily',
            'update_frequency': 'Monthly'},
 'groups': ['science', 'climate', 'research'],
 'license_id': 'cc-by-4.0',
 'name': 'clima
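The `size` and `mimetype` values in the payload above are typed in by hand. When the files exist locally, you can measure them instead. Below is a minimal sketch using the standard library's `mimetypes.guess_type` and `os.path.getsize`. `build_resource_entry` is a hypothetical helper that produces the same resource keys as the tutorial payload. MIME guesses depend on the platform's MIME tables. The format defaults to the file extension, so an entry like the GeoTIFF one would need an explicit `fmt="GeoTIFF"`.

```python
import mimetypes
import os
from typing import Any, Dict, Optional


def build_resource_entry(path: str, url: str, name: str,
                         description: str = "",
                         fmt: Optional[str] = None) -> Dict[str, Any]:
    """Build a resource dict (same keys as the tutorial payload) from a local file.

    `url` is where the file is (or will be) published; the size is measured
    from the local copy and the MIME type is guessed from the file extension.
    """
    mimetype, _encoding = mimetypes.guess_type(path)
    extension = os.path.splitext(path)[1].lstrip(".")
    return {
        "url": url,
        "name": name,
        "description": description,
        "format": fmt or extension.upper(),
        "mimetype": mimetype or "application/octet-stream",  # generic fallback
        "size": os.path.getsize(path),  # exact size in bytes
    }
```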

### Create the Dataset

Now let's create the dataset using our comprehensive payload.

In [8]:
if organization_id:
    print("🚀 Creating comprehensive dataset...")
    
    # Create the dataset
    params = {"server": CKAN_SERVER}
    dataset_response = make_api_request("POST", "/dataset", data=dataset_payload, params=params)
    print_response(dataset_response, "Dataset Creation Response")
    
    if "error" not in dataset_response and "id" in dataset_response:
        dataset_id = dataset_response["id"]
        print(f"\n🎉 Dataset created successfully!")
        print(f"🆔 Dataset ID: {dataset_id}")
        print(f"📋 Dataset Name: {dataset_payload['name']}")
        print(f"🏷️  Tags: {', '.join(dataset_payload['tags'])}")
        print(f"👥 Groups: {', '.join(dataset_payload['groups'])}")
        print(f"📁 Resources: {len(dataset_payload['resources'])} files")
        print(f"🏢 Organization: {organization_id}")
    else:
        print("❌ Failed to create dataset")
        dataset_id = None
else:
    print("⚠️  Cannot create dataset without a valid organization")
    dataset_id = None

⚠️  Cannot create dataset without a valid organization


## 5. Dataset Updates

The API supports both full updates (PUT) and partial updates (PATCH). Let's demonstrate both approaches.

### Partial Update (PATCH)

PATCH updates allow you to modify specific fields without affecting others. This is ideal for incremental changes.

In [9]:
if dataset_id:
    print("🔄 Performing partial update (PATCH)...")
    
    # Partial update payload - only fields we want to change
    partial_update = {
        "version": "1.1",
        "notes": dataset_payload["notes"] + "\n\n**Update Log:**\n- Version 1.1: Added additional quality control measures and enhanced metadata.",
        "extras": {
            "last_updated": datetime.datetime.now().isoformat(),
            "update_type": "Quality enhancement",
            "quality_level": "Level 3 - Validated",
            "validation_status": "Completed",
            "revision_notes": "Enhanced quality control procedures applied"
        }
    }
    
    print("📝 Partial update payload:")
    print_response(partial_update, "Partial Update Data")
    
    # Apply the partial update
    params = {"server": CKAN_SERVER}
    patch_response = make_api_request("PATCH", f"/dataset/{dataset_id}", data=partial_update, params=params)
    print_response(patch_response, "Partial Update Response")
    
    if "error" not in patch_response:
        print("\n✅ Partial update completed successfully!")
        print("📈 Updated fields: version, notes, extras")
        print("🔒 Preserved fields: title, tags, groups, resources, etc.")
    else:
        print("❌ Partial update failed")
else:
    print("⚠️  No dataset available for update")

⚠️  No dataset available for update
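The PATCH payload above sends only new `extras` keys. This section does not say whether the server merges the nested `extras` dictionary key by key or replaces it wholesale. That choice decides whether keys such as `project_code` survive the update. The local sketch below makes no API calls and uses hypothetical helper names. It models both possibilities alongside PUT's full replacement, so you can compare each model with the dataset your instance reports after an update.

```python
import copy
from typing import Any, Dict


def simulate_patch(current: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow merge: each top-level key in `changes` replaces the one in `current`."""
    merged = copy.deepcopy(current)
    merged.update(copy.deepcopy(changes))
    return merged


def simulate_patch_merging_extras(current: Dict[str, Any],
                                  changes: Dict[str, Any]) -> Dict[str, Any]:
    """Like simulate_patch, but merges the nested 'extras' dict key by key."""
    merged = simulate_patch(current, changes)
    if "extras" in changes:
        merged["extras"] = copy.deepcopy({**current.get("extras", {}), **changes["extras"]})
    return merged


def simulate_put(_current: Dict[str, Any], replacement: Dict[str, Any]) -> Dict[str, Any]:
    """Full replacement: the current state is discarded, omitted fields are gone."""
    return copy.deepcopy(replacement)


before = {"title": "T", "version": "1.0", "extras": {"project_code": "CLIMATE2024"}}
patch = {"version": "1.1", "extras": {"validation_status": "Completed"}}

print(simulate_patch(before, patch)["extras"])                 # project_code is gone
print(simulate_patch_merging_extras(before, patch)["extras"])  # project_code is kept
print(simulate_put(before, {"title": "T2"}))                   # version and extras are gone
```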


### Full Update (PUT)

PUT updates replace the entire dataset. This is useful when you want to make comprehensive changes.

In [10]:
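import copy

if dataset_id:
    print("🔄 Performing full update (PUT)...")
    
    # Full update - deep-copy the original payload so that the in-place edits to
    # the nested extras dict and resources list below leave dataset_payload intact
    full_update = copy.deepcopy(dataset_payload)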
    full_update.update({
        "version": "2.0",
        "title": "Comprehensive Climate Research Dataset - Updated Edition",
        "tags": dataset_payload["tags"] + ["updated", "validated", "v2"],
        "license_id": "cc-by-sa-4.0",  # Changed license
    })
    
    # Update extras with new information
    full_update["extras"].update({
        "major_update": datetime.datetime.now().isoformat(),
        "update_type": "Major revision",
        "quality_level": "Level 4 - Research Grade",
        "peer_review_status": "Completed",
        "publication_doi": "10.5194/essd-2024-updated",
        "citation_count": "0",
        "download_statistics": "Available via API"
    })
    
    # Add a new resource
    full_update["resources"].append({
        "url": "https://data.climate-research.org/documentation/dataset_methodology_v2.pdf",
        "format": "PDF",
        "name": "Methodology Documentation v2.0",
        "description": "Comprehensive methodology documentation including data collection procedures, quality control measures, and validation protocols for version 2.0.",
        "mimetype": "application/pdf",
        "size": 1048576  # ~1MB
    })
    
    print("📝 Full update includes:")
    print(f"  • Updated title: {full_update['title']}")
    print(f"  • New version: {full_update['version']}")
    print(f"  • Updated license: {full_update['license_id']}")
    print(f"  • Additional tags: {len(full_update['tags'])} total")
    print(f"  • Enhanced extras: {len(full_update['extras'])} metadata fields")
    print(f"  • New resource: {len(full_update['resources'])} total resources")
    
    # Apply the full update
    params = {"server": CKAN_SERVER}
    put_response = make_api_request("PUT", f"/dataset/{dataset_id}", data=full_update, params=params)
    print_response(put_response, "Full Update Response")
    
    if "error" not in put_response:
        print("\n✅ Full update completed successfully!")
        print("🔄 Dataset completely replaced with new version")
        print("📊 All fields now match the submitted payload")
    else:
        print("❌ Full update failed")
else:
    print("⚠️  No dataset available for update")

⚠️  No dataset available for update
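The PUT cell builds `full_update` from `dataset_payload` and then edits the nested `extras` dictionary and `resources` list in place. `dict.copy()` is shallow, so with it those edits would also change `dataset_payload`. Re-running the cell would then append the PDF resource again. `copy.deepcopy` avoids this. Here is a small standalone demonstration with toy values:

```python
import copy

# Shallow copy: the outer dict is new, but nested dicts/lists are shared
payload_a = {"tags": ["climate"], "extras": {"quality_level": "Level 2"}}
shallow = payload_a.copy()
shallow["extras"]["quality_level"] = "Level 4"
shallow["tags"].append("v2")
print(payload_a)  # {'tags': ['climate', 'v2'], 'extras': {'quality_level': 'Level 4'}}

# Deep copy: nested containers are copied too, so the source stays intact
payload_b = {"tags": ["climate"], "extras": {"quality_level": "Level 2"}}
deep = copy.deepcopy(payload_b)
deep["extras"]["quality_level"] = "Level 4"
deep["tags"].append("v2")
print(payload_b)  # {'tags': ['climate'], 'extras': {'quality_level': 'Level 2'}}
```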


## 6. Error Handling and Best Practices

Let's demonstrate proper error handling and common pitfalls to avoid.

### Validation Errors

Here's what happens when you provide invalid data:

In [11]:
print("🧪 Testing validation errors...")

# Example 1: Missing required fields
invalid_payload_1 = {
    "title": "Dataset Without Name",
    # Missing required 'name' and 'owner_org' fields
}

print("\n1️⃣ Testing missing required fields:")
error_response_1 = make_api_request("POST", "/dataset", data=invalid_payload_1, params={"server": CKAN_SERVER})
print_response(error_response_1, "Validation Error - Missing Fields")

# Example 2: Invalid resource structure  
invalid_payload_2 = {
    "name": "test_invalid_resources",
    "title": "Test Invalid Resources",
    "owner_org": organization_id,
    "resources": [
        {
            "resource_url": "https://example.com/data.csv",  # Wrong field name
            "format": "CSV"
            # Missing required 'url' and 'name' fields
        }
    ]
}

print("\n2️⃣ Testing invalid resource structure:")
error_response_2 = make_api_request("POST", "/dataset", data=invalid_payload_2, params={"server": CKAN_SERVER})
print_response(error_response_2, "Validation Error - Invalid Resources")

# Example 3: Reserved keys in extras
invalid_payload_3 = {
    "name": "test_reserved_keys",
    "title": "Test Reserved Keys",
    "owner_org": organization_id,
    "extras": {
        "name": "This conflicts with the dataset name",  # Reserved key
        "id": "This conflicts with CKAN ID",  # Reserved key
        "custom_field": "This is fine"
    }
}

print("\n3️⃣ Testing reserved keys in extras:")
error_response_3 = make_api_request("POST", "/dataset", data=invalid_payload_3, params={"server": CKAN_SERVER})
print_response(error_response_3, "Validation Error - Reserved Keys")

print("\n📚 Key Takeaways from Error Examples:")
print("  • Always include required fields: name, title, owner_org")
print("  • Use 'url' not 'resource_url' in resource objects")
print("  • Avoid reserved keys in extras: name, title, id, resources, etc.")
print("  • Check API documentation for complete field specifications")

🧪 Testing validation errors...

1️⃣ Testing missing required fields:
❌ Request failed: HTTPConnectionPool(host='localhost', port=8001): Max retries exceeded with url: /dataset (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000022C57CEE2C0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

📋 Validation Error - Missing Fields:
──────────────────────────────────────────────────
{'error': True,
 'exception': "HTTPConnectionPool(host='localhost', port=8001): Max retries "
              'exceeded with url: /dataset (Caused by '
              "NewConnectionError('<urllib3.connection.HTTPConnection object "
              'at 0x0000022C57CEE2C0>: Failed to establish a new connection: '
              '[WinError 10061] No connection could be made because the target '
              "machine actively refused it'))"}
────────────────────
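The server remains the authority on validation, but a local pre-check can catch the three mistakes above before any request is sent. Below is a minimal sketch with a hypothetical `validate_dataset_payload` helper. The required fields and reserved keys come from the takeaways printed above. The reserved-key set is deliberately incomplete, because the tutorial's own list ends with "etc.".

```python
from typing import Any, Dict, List

REQUIRED_DATASET_FIELDS = ("name", "title", "owner_org")
REQUIRED_RESOURCE_FIELDS = ("url", "name")
# Only the reserved keys named in this tutorial; extend with your server's full list
RESERVED_EXTRAS_KEYS = {"name", "title", "id", "resources"}


def validate_dataset_payload(payload: Dict[str, Any]) -> List[str]:
    """Return the problems found locally; an empty list means none were detected."""
    problems = []
    for field in REQUIRED_DATASET_FIELDS:
        if not payload.get(field):  # also catches None, e.g. a failed organization step
            problems.append(f"missing required field '{field}'")
    for i, resource in enumerate(payload.get("resources", [])):
        for field in REQUIRED_RESOURCE_FIELDS:
            if not resource.get(field):
                problems.append(f"resources[{i}]: missing '{field}'")
        if "resource_url" in resource:
            problems.append(f"resources[{i}]: use 'url' instead of 'resource_url'")
    extras = payload.get("extras", {})
    for key in sorted(RESERVED_EXTRAS_KEYS & set(extras)):
        problems.append(f"extras: '{key}' is a reserved key")
    return problems
```

Calling `validate_dataset_payload(invalid_payload_1)` before `make_api_request` reports the missing fields without a round trip. If the organization cell failed, `owner_org` is reported as missing too, because `organization_id` is then `None`.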

### Best Practices Summary

Based on our tutorial, here are the key best practices for using the POP API:

In [12]:
print("🎯 POP API Best Practices Summary:")
print("="*60)

best_practices = {
    "Authentication": [
        "Always include valid Bearer token in Authorization header",
        "Store tokens securely and rotate them regularly",
        "Test authentication before making bulk operations"
    ],
    "Dataset Naming": [
        "Use descriptive, unique names with underscores",
        "Include timestamps for versioning: dataset_20240615",
        "Follow your organization's naming conventions"
    ],
    "Metadata Management": [
        "Provide comprehensive descriptions in 'notes' field",
        "Use tags for discoverability (5-10 relevant tags)",
        "Leverage extras for domain-specific metadata",
        "Include contact information and data provenance"
    ],
    "Resource Organization": [
        "Provide accurate format specifications",
        "Include file sizes when known",
        "Use descriptive resource names and descriptions",
        "Organize related files within single datasets"
    ],
    "Update Strategy": [
        "Use PATCH for incremental updates",
        "Use PUT for major revisions",
        "Version your datasets appropriately",
        "Document changes in update notes"
    ],
    "Error Handling": [
        "Always check response status codes",
        "Implement retry logic for transient failures",
        "Log errors for debugging and monitoring",
        "Validate payloads before submission"
    ]
}

for category, practices in best_practices.items():
    print(f"\n🔸 {category}:")
    for practice in practices:
        print(f"   • {practice}")

print("\n" + "="*60)

🎯 POP API Best Practices Summary:

🔸 Authentication:
   • Always include valid Bearer token in Authorization header
   • Store tokens securely and rotate them regularly
   • Test authentication before making bulk operations

🔸 Dataset Naming:
   • Use descriptive, unique names with underscores
   • Include timestamps for versioning: dataset_20240615
   • Follow your organization's naming conventions

🔸 Metadata Management:
   • Provide comprehensive descriptions in 'notes' field
   • Use tags for discoverability (5-10 relevant tags)
   • Leverage extras for domain-specific metadata
   • Include contact information and data provenance

🔸 Resource Organization:
   • Provide accurate format specifications
   • Include file sizes when known
   • Use descriptive resource names and descriptions
   • Organize related files within single datasets

🔸 Update Strategy:
   • Use PATCH for incremental updates
   • Use PUT for major revisions
   • Version your datasets appropriately
   • Document changes in update notes

## 7. Advanced Use Cases

Here are some advanced patterns for using the API in production environments.

### Bulk Dataset Creation

For creating several datasets in one pass, pausing between requests to avoid rate limiting:

In [13]:
def create_dataset_batch(datasets: list, organization_id: str, delay: float = 0.5):
    """
    Create multiple datasets with rate limiting and error handling.
    
    Args:
        datasets: List of dataset configurations
        organization_id: Organization to create datasets in
        delay: Delay between requests to avoid rate limiting
    """
    results = []
    
    for i, dataset_config in enumerate(datasets, 1):
        print(f"\n📦 Creating dataset {i}/{len(datasets)}: {dataset_config['name']}")
        
        # Set the organization on a copy so the caller's dict is left unchanged
        payload = {**dataset_config, "owner_org": organization_id}
        
        # Create dataset
        params = {"server": CKAN_SERVER}
        response = make_api_request("POST", "/dataset", data=payload, params=params)
        
        results.append({
            "dataset_name": dataset_config["name"],
            "success": "error" not in response,
            "response": response
        })
        
        # Rate limiting
        if i < len(datasets):
            time.sleep(delay)
    
    return results

# Example batch creation
if organization_id:
    print("🚀 Demonstrating batch dataset creation...")
    
    sample_datasets = [
        {
            "name": f"sample_dataset_1_{timestamp}",
            "title": "Sample Dataset 1 - Temperature Data",
            "notes": "Sample temperature measurements for tutorial",
            "tags": ["temperature", "sample", "tutorial"]
        },
        {
            "name": f"sample_dataset_2_{timestamp}",
            "title": "Sample Dataset 2 - Humidity Data", 
            "notes": "Sample humidity measurements for tutorial",
            "tags": ["humidity", "sample", "tutorial"]
        },
        {
            "name": f"sample_dataset_3_{timestamp}",
            "title": "Sample Dataset 3 - Wind Data",
            "notes": "Sample wind measurements for tutorial", 
            "tags": ["wind", "sample", "tutorial"]
        }
    ]
    
    batch_results = create_dataset_batch(sample_datasets, organization_id)
    
    # Summary
    successful = sum(1 for r in batch_results if r["success"])
    failed = len(batch_results) - successful
    
    print(f"\n📊 Batch Creation Summary:")
    print(f"   ✅ Successful: {successful}")
    print(f"   ❌ Failed: {failed}")
    print(f"   📈 Success Rate: {(successful/len(batch_results)*100):.1f}%")
else:
    print("⚠️  Organization required for batch creation demo")

⚠️  Organization required for batch creation demo
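The summary in the batch cell divides by `len(batch_results)`, which raises `ZeroDivisionError` for an empty batch. It also reports how many datasets failed, but not which ones. Below is a small sketch that works on the same result dictionaries (`dataset_name`, `success`, `response`) that `create_dataset_batch` returns. `summarize_batch` is a hypothetical helper.

```python
from typing import Any, Dict, List


def summarize_batch(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize create_dataset_batch-style results; safe for an empty batch."""
    failed_names = [r["dataset_name"] for r in results if not r["success"]]
    successful = len(results) - len(failed_names)
    rate = (successful / len(results) * 100) if results else 0.0
    return {
        "successful": successful,
        "failed": len(failed_names),
        "success_rate": round(rate, 1),  # percent
        "failed_names": failed_names,    # names to inspect or retry
    }
```

You can feed `failed_names` back into a second, smaller batch once you have fixed the underlying problem.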


### Dataset Template Function

Create a reusable template for consistent dataset creation:

In [14]:
def create_dataset_from_template(name: str, title: str, description: str, 
                                resources: list, organization_id: str,
                                project_code: Optional[str] = None, 
                                tags: Optional[list] = None) -> dict:
    """
    Create a dataset using a standardized template.
    
    This function enforces organizational standards and reduces errors.
    """
    # Standard template with organizational defaults
    template = {
        "name": name.lower().replace(" ", "_"),  # Normalize name
        "title": title,
        "owner_org": organization_id,
        "notes": description,
        "license_id": "cc-by-4.0",  # Organizational default
        "private": False,
        "version": "1.0",
        "tags": tags or [],
        "groups": ["research"],  # Default group
        "resources": resources,
        "extras": {
            "created_via": "POP API Tutorial",
            "creation_date": datetime.datetime.now().isoformat(),
            "data_steward": "API User",
            "review_status": "Pending"
        }
    }
    
    # Add project code if provided
    if project_code:
        template["extras"]["project_code"] = project_code
    
    return template

# Example usage
if organization_id:
    print("🏗️  Creating dataset from template...")
    
    template_dataset = create_dataset_from_template(
        name="Template Example Dataset",
        title="Dataset Created from Template",
        description="This dataset demonstrates the use of standardized templates for consistent data management.",
        resources=[
            {
                "url": "https://example.com/template_data.csv",
                "name": "Template Data",
                "format": "CSV",
                "description": "Sample data created from template"
            }
        ],
        organization_id=organization_id,
        project_code="TEMPLATE2024",
        tags=["template", "example", "standardized"]
    )
    
    print_response(template_dataset, "Template Dataset")
    
    # Create the dataset
    params = {"server": CKAN_SERVER}
    template_response = make_api_request("POST", "/dataset", data=template_dataset, params=params)
    
    if "error" not in template_response:
        print("\n✅ Template dataset created successfully!")
    else:
        print("❌ Template dataset creation failed")
else:
    print("⚠️  Organization required for template demo")

⚠️  Organization required for template demo


## 8. Cleanup and Summary

Let's summarize what we've accomplished. The notebook does not delete the organization and datasets it creates, so remove them once you no longer need them.
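Below is a minimal sketch of a cleanup plan. It lists datasets first, so the organization that owns them is empty when it is removed. Both the `DELETE /dataset/{name}` and `DELETE /organization/{name}` endpoints are assumptions; this tutorial does not document them, so check your POP API instance before using them. `plan_cleanup` is a hypothetical helper. It only builds the list of requests and sends nothing.

```python
from typing import List, Optional, Tuple


def plan_cleanup(dataset_names: List[Optional[str]],
                 organization_name: Optional[str]) -> List[Tuple[str, str]]:
    """Return (method, endpoint) pairs for removing tutorial artifacts.

    ASSUMPTION: the DELETE endpoints are hypothetical. Entries that are None
    (e.g. a dataset whose creation failed) are skipped.
    """
    plan = [("DELETE", f"/dataset/{name}") for name in dataset_names if name]
    if organization_name:
        plan.append(("DELETE", f"/organization/{organization_name}"))
    return plan
```

If the endpoints exist on your instance, you could loop over `plan_cleanup([dataset_id], organization_id)`. Pass each pair to `make_api_request(method, endpoint, params={"server": CKAN_SERVER})`.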

### Tutorial Summary

In this comprehensive tutorial, we've demonstrated:

In [15]:
print("🎓 Tutorial Summary")
print("="*50)

summary_points = [
    "✅ API Authentication with Keycloak tokens",
    "✅ Organization creation and management", 
    "✅ Comprehensive dataset creation with metadata",
    "✅ Resource management (multiple file types)",
    "✅ Tags and groups for categorization",
    "✅ Custom metadata using extras",
    "✅ Partial updates (PATCH) for incremental changes",
    "✅ Full updates (PUT) for major revisions",
    "✅ Error handling and validation",
    "✅ Best practices and advanced patterns",
    "✅ Batch operations and templates",
    "✅ Production-ready code examples"
]

for point in summary_points:
    print(f"  {point}")

print("\n🎯 Key Achievements:")
if organization_id:
    print(f"  📁 Created organization: {organization_id}")
if dataset_id:
    print(f"  📊 Created main dataset: {dataset_id}")
    print("  🔄 Performed updates (PATCH and PUT)")

print("\n📚 Next Steps:")
next_steps = [
    "Integrate this code into your data management workflows",
    "Customize the templates for your organization's needs", 
    "Implement monitoring and logging for production use",
    "Explore the specialized endpoints (S3, Kafka, URL) for specific use cases",
    "Set up automated data pipeline integration",
    "Configure proper authentication and access controls"
]

for i, step in enumerate(next_steps, 1):
    print(f"  {i}. {step}")

print("\n" + "="*50)
print("🚀 You're now ready to use the POP API in production!")

🎓 Tutorial Summary
  ✅ API Authentication with Keycloak tokens
  ✅ Organization creation and management
  ✅ Comprehensive dataset creation with metadata
  ✅ Resource management (multiple file types)
  ✅ Tags and groups for categorization
  ✅ Custom metadata using extras
  ✅ Partial updates (PATCH) for incremental changes
  ✅ Full updates (PUT) for major revisions
  ✅ Error handling and validation
  ✅ Best practices and advanced patterns
  ✅ Batch operations and templates
  ✅ Production-ready code examples

🎯 Key Achievements:

📚 Next Steps:
  1. Integrate this code into your data management workflows
  2. Customize the templates for your organization's needs
  3. Implement monitoring and logging for production use
  4. Explore the specialized endpoints (S3, Kafka, URL) for specific use cases
  5. Set up automated data pipeline integration
  6. Configure proper authentication and access controls

🚀 You're now ready to use the POP API in production!


### Optional: Cleanup Tutorial Resources

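To remove the resources created during this tutorial, set `CLEANUP_ENABLED = True` in the cell below and run it. As written, the cell only reports what it would delete: the `DELETE` calls are commented out because the API may not implement delete endpoints. **Warning: if you uncomment those calls and your API instance supports them, the tutorial's datasets and organization will be permanently deleted.**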

In [16]:
# OPTIONAL: Cleanup tutorial resources
# Set CLEANUP_ENABLED = True to preview the deletions; uncomment the DELETE calls to perform them

CLEANUP_ENABLED = False  # Set to True to enable cleanup

if CLEANUP_ENABLED:
    print("🧹 Starting cleanup of tutorial resources...")
    
    # The DELETE calls below are commented out because the current API may
    # not implement delete endpoints; this branch only reports what it would remove.
    
    if dataset_id:
        print(f"🗑️  Would delete dataset: {dataset_id}")
        # delete_response = make_api_request("DELETE", f"/dataset/{dataset_id}")
    
    if organization_id:
        print(f"🗑️  Would delete organization: {organization_id}")
        # delete_org_response = make_api_request("DELETE", f"/organization/{organization_id}")
    
    print("ℹ️  Cleanup preview complete. Uncomment the DELETE calls above to actually delete resources.")
else:
    print("ℹ️  Cleanup disabled. Tutorial resources preserved.")
    print("💡 To clean up, set CLEANUP_ENABLED = True and run this cell again.")

ℹ️  Cleanup disabled. Tutorial resources preserved.
💡 To clean up, set CLEANUP_ENABLED = True and run this cell again.
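If your instance does expose delete endpoints, a reusable helper with a dry-run default makes the cleanup safer to run. This is a minimal sketch under stated assumptions: it assumes `DELETE /dataset/{id}` and `DELETE /organization/{id}` exist, mirroring the commented-out calls above, which this tutorial does not confirm. The function name `cleanup_tutorial_resources` is hypothetical. Datasets are deleted before organizations because a CKAN-backed API may refuse to delete an organization that still owns datasets.

```python
from typing import Any, Iterable, List, Optional, Tuple


def cleanup_tutorial_resources(
    base_url: str,
    dataset_ids: Iterable[Optional[str]],
    organization_ids: Iterable[Optional[str]],
    headers: Optional[dict] = None,
    params: Optional[dict] = None,
    dry_run: bool = True,
    session: Optional[Any] = None,
) -> List[Tuple[str, str, Optional[int]]]:
    """Delete datasets, then organizations, via assumed DELETE endpoints.

    Returns (method, url, status_code) tuples; status_code is None in dry-run mode.
    """
    root = base_url.rstrip("/")
    # Skip ids that were never set (e.g. None when an earlier cell failed)
    urls = [f"{root}/dataset/{d}" for d in dataset_ids if d]
    urls += [f"{root}/organization/{o}" for o in organization_ids if o]

    if dry_run:
        # Report the planned calls without sending any request
        return [("DELETE", url, None) for url in urls]

    if session is None:
        import requests  # deferred so dry runs need no HTTP client

        session = requests.Session()

    results = []
    for url in urls:
        resp = session.delete(url, headers=headers, params=params, timeout=30)
        results.append(("DELETE", url, resp.status_code))
    return results
```

For example, `cleanup_tutorial_resources(API_BASE_URL, [dataset_id], [organization_id], params={"server": CKAN_SERVER})` returns the planned calls without sending anything. Pass `dry_run=False` together with the same authorization headers your requests already use only after confirming that your instance supports these endpoints.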


## 9. Additional Resources

### API Documentation
- **Swagger UI**: Visit `{API_BASE_URL}/docs` for interactive API documentation
- **OpenAPI Spec**: Available at `{API_BASE_URL}/openapi.json`
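The OpenAPI document is also machine-readable, so you can list the operations your instance actually exposes, for example to check whether delete endpoints exist before attempting cleanup. This is a minimal sketch that assumes the spec follows the standard OpenAPI `paths` layout. The helper names `list_operations` and `fetch_openapi_spec` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Keys of an OpenAPI path item that describe operations
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (METHOD, path, summary) for every operation in an OpenAPI document."""
    ops = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for method, operation in item.items():
            # Path items may also hold non-operation keys such as "parameters"
            if method.lower() in HTTP_METHODS:
                ops.append((method.upper(), path, operation.get("summary", "")))
    return ops


def fetch_openapi_spec(api_base_url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Download the OpenAPI document from {api_base_url}/openapi.json."""
    import requests  # deferred so list_operations works without requests installed

    resp = requests.get(f"{api_base_url.rstrip('/')}/openapi.json", timeout=timeout)
    resp.raise_for_status()
    return resp.json()
```

Usage: `for method, path, summary in list_operations(fetch_openapi_spec(API_BASE_URL)): print(method, path, summary)`.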

### CKAN Resources
- [CKAN API Documentation](https://docs.ckan.org/en/latest/api/)
- [CKAN User Guide (datasets, resources, organizations)](https://docs.ckan.org/en/latest/user-guide.html)

### Python Libraries
- [Requests Documentation](https://docs.python-requests.org/)
- [CKAN API Python Client](https://github.com/ckan/ckanapi)

### Support
For technical support or questions about the POP API:
- Check the API status endpoint: `/status`
- Inspect the HTTP status code and response body of failed requests; the body usually explains what went wrong
- Consult your organization's data management guidelines
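The first two support tips above can be combined into a small diagnostic helper. This is a sketch, not the POP API's documented behavior. It assumes that error bodies are FastAPI-style (`{"detail": ...}`, with a list of `loc`/`msg` entries for HTTP 422 validation errors) and that `/status` returns a 2xx code when the service is healthy. The helper names `describe_error` and `check_status` are hypothetical.

```python
from typing import Any, Optional


def describe_error(status_code: int, body: Any) -> str:
    """Summarize an error response, assuming FastAPI-style {"detail": ...} bodies."""
    detail = body.get("detail") if isinstance(body, dict) else None
    if isinstance(detail, list):
        # Validation errors: a list of {"loc": [...], "msg": ..., "type": ...} entries
        parts = []
        for err in detail:
            loc = ".".join(str(p) for p in err.get("loc", []))
            parts.append(f"{loc}: {err.get('msg', '')}")
        return f"HTTP {status_code}: " + "; ".join(parts)
    if detail is not None:
        return f"HTTP {status_code}: {detail}"
    # Unknown body shape: show it verbatim
    return f"HTTP {status_code}: {body!r}"


def check_status(api_base_url: str, session: Optional[Any] = None, timeout: float = 10.0) -> str:
    """Call {api_base_url}/status and return a one-line health summary."""
    if session is None:
        import requests  # deferred: only needed for real HTTP calls

        session = requests.Session()
    try:
        resp = session.get(f"{api_base_url.rstrip('/')}/status", timeout=timeout)
    except OSError as exc:  # requests' exceptions subclass OSError (IOError)
        return f"unreachable: {exc}"
    if resp.ok:
        return f"OK (HTTP {resp.status_code})"
    try:
        body = resp.json()
    except ValueError:  # response body is not JSON
        body = resp.text
    return describe_error(resp.status_code, body)
```

Usage: `print(check_status(API_BASE_URL))`.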

---

**End of Tutorial** 🎉

*This notebook provides a comprehensive foundation for working with the POP API. Adapt the code examples to your specific use cases and organizational requirements.*