# NDP EP Tutorial: Pelican Federation Integration

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sci-ndp/pop/blob/main/docs/pelican_api_tutorial.ipynb)
[![Open in Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/sci-ndp/pop/main?filepath=docs/pelican_api_tutorial.ipynb)

> 🚀 **Run Online Options:**
> - **Google Colab**: Dependencies installed automatically in the first cell
> - **Binder**: Pre-configured environment, ready to run immediately
> - **Local**: Requires `pip install requests jupyter`

This notebook demonstrates how to use the NDP EP API to interact with Pelican federations. You will learn how to:

1. **List available federations** (OSDF, PATh-CC, etc.)
2. **Browse namespaces** and directories in federations
3. **Get file information** without downloading
4. **Download files** from federations
5. **Import external files** as resources in your local catalog

## What is Pelican?

**Pelican** is a federated data platform that enables sharing and accessing scientific data across institutions. Key federations include:

- **OSDF** (Open Science Data Federation): Primary federation for scientific data sharing
- **PATh-CC**: PATh Facility data federation

## Prerequisites

- Python 3.7+
- `requests` library
- Access to an NDP EP API instance

## API Overview

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/pelican/federations` | GET | List available federations |
| `/pelican/browse` | GET | Browse namespace directories |
| `/pelican/info` | GET | Get file metadata |
| `/pelican/download` | GET | Download file content |
| `/pelican/import-metadata` | POST | Import file to local catalog |
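
Each endpoint in the table is reached by appending it to the API base URL and passing arguments as query parameters. The sketch below shows that mapping using only the standard library. `build_url` is a hypothetical helper, not part of the NDP EP API, and the `path` and `federation` parameter names are the ones used by the calls later in this notebook.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

API_BASE_URL = "http://localhost:8000"  # placeholder; replace with your API URL


def build_url(endpoint: str, params: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical helper: join the base URL, endpoint, and query string."""
    url = f"{API_BASE_URL.rstrip('/')}{endpoint}"
    if params:
        # urlencode percent-encodes reserved characters such as "/"
        url += "?" + urlencode(params)
    return url


print(build_url("/pelican/federations"))
print(build_url("/pelican/browse",
                {"path": "/ospool/uc-shared/public", "federation": "osdf"}))
```

The `requests` calls used in the rest of the notebook perform the same encoding when given a `params` dictionary, so this helper is mainly useful for inspecting or logging the URL that will be requested.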

In [None]:
# Install required packages
!pip install requests -q

## 1. Setup and Configuration

First, let's import the necessary libraries and configure our API connection parameters.

In [None]:
import requests
import json
from typing import Dict, Any, Optional
from pprint import pprint

### Configuration Variables

**Important:** Replace `API_BASE_URL` with your actual NDP EP API endpoint.

In [None]:
# API Configuration
API_BASE_URL = "http://localhost:8000"  # Replace with your API URL

# Default federation to use
DEFAULT_FEDERATION = "osdf"

print(f"API Base URL: {API_BASE_URL}")
print(f"Default Federation: {DEFAULT_FEDERATION}")

### Helper Functions

Let's create utility functions for cleaner API interactions.

In [None]:
def make_request(method: str, endpoint: str, params: Optional[Dict] = None, 
                 json_data: Optional[Dict] = None) -> Dict:
    """
    Make an API request and return the JSON response.
    
    Parameters
    ----------
    method : str
        HTTP method (GET, POST, etc.)
    endpoint : str
        API endpoint (e.g., '/pelican/federations')
    params : dict, optional
        Query parameters
    json_data : dict, optional
        JSON body for POST requests
        
    Returns
    -------
    dict
        API response
    """
    url = f"{API_BASE_URL}{endpoint}"
    
    try:
        response = requests.request(
            method=method,
            url=url,
            params=params,
            json=json_data,
            timeout=30
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"❌ Request failed: {e}")
        return {"success": False, "error": str(e)}


def download_file(path: str, federation: str = DEFAULT_FEDERATION, 
                  stream: bool = False) -> bytes:
    """
    Download a file from Pelican federation.
    
    Parameters
    ----------
    path : str
        File path in the federation
    federation : str
        Federation name
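    stream : bool
        Forwarded to the API as the `stream` query parameter. The response
        body is still read fully into memory before it is returned.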
        
    Returns
    -------
    bytes
        File contents
    """
    url = f"{API_BASE_URL}/pelican/download"
    params = {"path": path, "federation": federation, "stream": stream}
    
    response = requests.get(url, params=params, timeout=60)
    response.raise_for_status()
    return response.content
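
`download_file` returns the whole response body as a single `bytes` object, which may be impractical for large files. The sketch below writes a response to disk in chunks instead. It assumes the object passed in behaves like a `requests.Response` opened with `stream=True` (only `iter_content` is needed). `save_streamed_response` is a hypothetical name, and whether the API supports chunked delivery for a given file is not confirmed here.

```python
from typing import Any


def save_streamed_response(response: Any, dest: str,
                           chunk_size: int = 1024 * 1024) -> int:
    """Write a streamed response to `dest` chunk by chunk; return bytes written."""
    written = 0
    with open(dest, "wb") as fh:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                fh.write(chunk)
                written += len(chunk)
    return written


# Intended use with requests (not executed here):
# with requests.get(f"{API_BASE_URL}/pelican/download",
#                   params={"path": path, "federation": "osdf"},
#                   stream=True, timeout=60) as response:
#     response.raise_for_status()
#     save_streamed_response(response, "example.txt")
```

Passing `stream=True` to `requests.get` defers reading the body until `iter_content` is consumed, so memory use stays near `chunk_size` rather than the full file size.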

## 2. List Available Federations

First, let's see what Pelican federations are available through the API.

In [None]:
# List available federations
federations = make_request("GET", "/pelican/federations")

if federations.get("success"):
    print("✅ Available Federations:\n")
    for fed_id, fed_info in federations["federations"].items():
        print(f"  📡 {fed_info['name']} ({fed_id})")
        print(f"     URL: {fed_info['url']}")
        print(f"     {fed_info['description']}")
        print()
else:
    print("❌ Failed to list federations")
    pprint(federations)

## 3. Browse Namespaces

Pelican organizes data into namespaces. Let's browse the directory structure.

### 3.1 Browse Root Namespace

In [None]:
# Browse a namespace path
# Common OSDF paths: /ospool/uc-shared/public, /chtc/staging

namespace_path = "/ospool/uc-shared/public"

result = make_request(
    "GET", 
    "/pelican/browse",
    params={"path": namespace_path, "federation": "osdf", "detail": False}
)

if result.get("success"):
    print(f"✅ Contents of {namespace_path}:\n")
    
    files = result.get("files", [])
    dirs = result.get("directories", [])
    
    # Show directories first (at most 10)
    for d in dirs[:10]:
        print(f"  📁 {d}")
    
    # Show files (at most 10)
    for f in files[:10]:
        print(f"  📄 {f}")
    
    # Report how many entries were left out of the two truncated listings
    shown = min(len(dirs), 10) + min(len(files), 10)
    remaining = len(dirs) + len(files) - shown
    if remaining > 0:
        print(f"\n  ... and {remaining} more items")
else:
    print("❌ Failed to browse namespace")
    pprint(result)

### 3.2 Browse with Detailed Information

In [None]:
# Browse with detailed file information
result = make_request(
    "GET", 
    "/pelican/browse",
    params={"path": namespace_path, "federation": "osdf", "detail": True}
)

if result.get("success"):
    print(f"✅ Detailed contents of {namespace_path}:\n")
    
    items = result.get("items", [])
    for item in items[:10]:
        item_type = "📁" if item.get("is_directory") else "📄"
        size = item.get("size", 0)
        size_str = f"{size:,} bytes" if size else ""
        print(f"  {item_type} {item['name']} {size_str}")
else:
    print("❌ Failed to browse namespace")
    pprint(result)
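
Raw byte counts are hard to scan in a long listing. The sketch below is a small formatter that could stand in for the `size_str` expression in the cell above. `format_size` is a hypothetical helper and uses binary units (1 KiB = 1024 bytes).

```python
def format_size(num_bytes: int) -> str:
    """Render a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            # Whole bytes need no decimal place
            return f"{int(size)} B" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024


for n in (0, 512, 1536, 5 * 1024**3):
    print(format_size(n))
```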

## 4. Get File Information

Before downloading, you can get metadata about a specific file.

In [None]:
# Get information about a specific file
file_path = "/ospool/uc-shared/public/example.txt"  # Replace with actual file path

file_info = make_request(
    "GET",
    "/pelican/info",
    params={"path": file_path, "federation": "osdf"}
)

if file_info.get("success"):
    print("✅ File Information:\n")
    info = file_info.get("info", {})
    print(f"  Name: {info.get('name')}")
    print(f"  Size: {info.get('size', 0):,} bytes")
    print(f"  Type: {info.get('content_type', 'unknown')}")
    print(f"  Modified: {info.get('modified', 'unknown')}")
else:
    print("❌ Failed to get file info (file may not exist)")
    pprint(file_info)
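
One practical use of this metadata is deciding whether a file is small enough to fetch in a single request. The sketch below assumes the response shape read by the cell above (an `info` dictionary with a `size` field in bytes). `is_small_enough` and the 100 MB default are illustrative choices, not API limits.

```python
from typing import Any, Dict

MAX_BYTES = 100 * 1000 * 1000  # illustrative threshold (100 MB), not an API limit


def is_small_enough(file_info: Dict[str, Any], max_bytes: int = MAX_BYTES) -> bool:
    """Return True only when the metadata call succeeded and the size is within the limit."""
    if not file_info.get("success"):
        return False
    size = (file_info.get("info") or {}).get("size")
    # A missing size is treated as unknown, so the caller can fall back to streaming
    return isinstance(size, int) and size <= max_bytes


print(is_small_enough({"success": True, "info": {"size": 2048}}))
print(is_small_enough({"success": True, "info": {}}))
```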

## 5. Download Files

Download files from Pelican federations.

### 5.1 Download Small File

In [None]:
# Download a file
file_path = "/ospool/uc-shared/public/example.txt"  # Replace with actual file path

try:
    content = download_file(file_path, federation="osdf")
    print(f"✅ Downloaded {len(content):,} bytes")
    
    # Preview small files (under 1,000 bytes) as text
    if len(content) < 1000:
        print("\n--- File Preview ---")
        print(content.decode('utf-8', errors='ignore'))
except Exception as e:
    print(f"❌ Download failed: {e}")

### 5.2 Save Downloaded File

In [None]:
# Download and save to local file
import os

file_path = "/ospool/uc-shared/public/example.txt"  # Replace with actual file path
local_filename = os.path.basename(file_path)

try:
    content = download_file(file_path, federation="osdf")
    
    with open(local_filename, 'wb') as f:
        f.write(content)
    
    print(f"✅ Saved to {local_filename} ({len(content):,} bytes)")
except Exception as e:
    print(f"❌ Download failed: {e}")

## 6. Import Files to Local Catalog

You can register external Pelican files in your local NDP EP catalog. This allows:
- Files to appear in searches
- Unified management with local resources
- Tracking of external data sources

In [None]:
# Import a Pelican file as a resource in the local catalog

import_data = {
    "pelican_url": "pelican://osg-htc.org/ospool/uc-shared/public/example.txt",
    "package_id": "my-dataset-id",  # The dataset where this resource will be added
    "resource_name": "Example Data from OSDF",
    "resource_description": "Sample data file imported from Open Science Data Federation"
}

result = make_request("POST", "/pelican/import-metadata", json_data=import_data)

if result.get("success"):
    print("✅ Successfully imported file as resource!")
    print(f"   Resource ID: {result.get('resource_id')}")
else:
    print("❌ Import failed")
    pprint(result)
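
The `pelican_url` value combines a federation host with an object path. The sketch below builds and splits such URLs with the standard library. Both helper names are hypothetical, and the host `osg-htc.org` is simply the one that appears in the example above.

```python
from typing import Tuple
from urllib.parse import urlsplit


def make_pelican_url(host: str, path: str) -> str:
    """Join a federation host and an object path into a pelican:// URL."""
    return f"pelican://{host}/{path.lstrip('/')}"


def split_pelican_url(url: str) -> Tuple[str, str]:
    """Return (host, path) for a pelican:// URL, rejecting other schemes."""
    parts = urlsplit(url)
    if parts.scheme != "pelican" or not parts.netloc:
        raise ValueError(f"not a pelican URL: {url!r}")
    return parts.netloc, parts.path


url = make_pelican_url("osg-htc.org", "/ospool/uc-shared/public/example.txt")
print(url)
print(split_pelican_url(url))
```

Validating the scheme before posting to `/pelican/import-metadata` can catch a pasted `https://` link early, instead of relying on the API to reject it.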

## 7. Common Use Cases

### 7.1 Search and Download Multiple Files

In [None]:
def list_files_recursive(path: str, federation: str = "osdf", max_depth: int = 2, 
                         current_depth: int = 0) -> list:
    """
    Recursively list files in a namespace.
    
    Parameters
    ----------
    path : str
        Starting path
    federation : str
        Federation name
    max_depth : int
        Maximum recursion depth
    current_depth : int
        Current depth (internal)
        
    Returns
    -------
    list
        List of file paths
    """
    if current_depth >= max_depth:
        return []
    
    all_files = []
    
    result = make_request(
        "GET",
        "/pelican/browse",
        params={"path": path, "federation": federation, "detail": False}
    )
    
    if not result.get("success"):
        return []
    
    # Strip any trailing slash so joined paths never contain "//"
    base = path.rstrip("/")
    
    # Add files from current directory
    for f in result.get("files", []):
        all_files.append(f"{base}/{f}")
    
    # Recurse into subdirectories
    for d in result.get("directories", []):
        subpath = f"{base}/{d}"
        all_files.extend(
            list_files_recursive(subpath, federation, max_depth, current_depth + 1)
        )
    
    return all_files


# Example: List all files in a namespace (2 levels deep)
namespace = "/ospool/uc-shared/public"
files = list_files_recursive(namespace, max_depth=2)

print(f"Found {len(files)} files:")
for f in files[:20]:
    print(f"  📄 {f}")
if len(files) > 20:
    print(f"  ... and {len(files) - 20} more")
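
Once the file list is available, the files can be downloaded in a loop that keeps going when one transfer fails. The sketch below takes the download function as an argument (for example `download_file` from the helper cell), so the loop itself has no network dependency. `download_many` and its return shape are assumptions made for illustration.

```python
import os
from typing import Callable, Dict, Iterable


def download_many(paths: Iterable[str], fetch: Callable[[str], bytes],
                  dest_dir: str = "downloads") -> Dict[str, str]:
    """Fetch each path and save it under dest_dir; map path -> local file or error text."""
    os.makedirs(dest_dir, exist_ok=True)
    results = {}
    for path in paths:
        local_path = os.path.join(dest_dir, os.path.basename(path))
        try:
            content = fetch(path)
            with open(local_path, "wb") as fh:
                fh.write(content)
            results[path] = local_path
        except Exception as exc:  # record the failure and continue with the next file
            results[path] = f"ERROR: {exc}"
    return results


# Intended use in this notebook (not executed here):
# results = download_many(files[:5], download_file)
```

Files are saved under their base name only, so two remote files with the same name in different directories would overwrite each other. Mirroring the remote directory structure locally would avoid that.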

### 7.2 Filter Files by Extension

In [None]:
def find_files_by_extension(path: str, extension: str, federation: str = "osdf") -> list:
    """
    Find files with a specific extension.
    
    Parameters
    ----------
    path : str
        Namespace path to search
    extension : str
        File extension (e.g., '.csv', '.nc')
    federation : str
        Federation name
        
    Returns
    -------
    list
        Matching file paths
    """
    all_files = list_files_recursive(path, federation, max_depth=3)
    return [f for f in all_files if f.endswith(extension)]


# Example: Find all CSV files
csv_files = find_files_by_extension("/ospool/uc-shared/public", ".csv")
print(f"Found {len(csv_files)} CSV files")
for f in csv_files[:10]:
    print(f"  📄 {f}")
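
`find_files_by_extension` matches a single extension and is case-sensitive, so it would miss a file named `DATA.CSV`. The sketch below is a variant that accepts several extensions and ignores case. `filter_by_extensions` is a hypothetical helper that works on any list of paths, such as the output of `list_files_recursive`.

```python
from typing import Iterable, List


def filter_by_extensions(paths: Iterable[str], extensions: Iterable[str]) -> List[str]:
    """Keep paths whose suffix matches any extension, ignoring case."""
    suffixes = tuple(ext.lower() for ext in extensions)
    # str.endswith accepts a tuple and returns True if any suffix matches
    return [p for p in paths if p.lower().endswith(suffixes)]


sample = ["/data/a.csv", "/data/b.CSV", "/data/c.nc", "/data/readme.txt"]
print(filter_by_extensions(sample, [".csv", ".nc"]))
```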

## 8. Error Handling

Common errors and how to handle them:

In [None]:
# Example: Handle common errors

def safe_browse(path: str, federation: str = "osdf") -> dict:
    """
    Safely browse a path with error handling.
    """
    result = make_request(
        "GET",
        "/pelican/browse",
        params={"path": path, "federation": federation}
    )
    
    if not result.get("success"):
        error = result.get("error", "Unknown error")
        
        if "not found" in error.lower() or "404" in str(error):
            print(f"⚠️ Path not found: {path}")
            print("   Check that the path exists and you have access.")
        elif "timeout" in error.lower():
            print(f"⏱️ Request timed out for: {path}")
            print("   The federation may be slow. Try again later.")
        elif "connection" in error.lower():
            print(f"🔌 Connection error for: {path}")
            print("   Check your network and API endpoint.")
        else:
            print(f"❌ Error: {error}")
    
    return result


# Test with a non-existent path
safe_browse("/this/path/does/not/exist")
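
For transient failures such as timeouts, retrying after a short wait is often enough. The sketch below retries with exponential backoff, assuming the wrapped call returns a dictionary with a `success` key, as `make_request` does. `retry_request` is a hypothetical helper, and the injectable `sleep` argument exists only so the delays can be observed without waiting.

```python
import time
from typing import Any, Callable, Dict


def retry_request(call: Callable[[], Dict[str, Any]], attempts: int = 3,
                  base_delay: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Call `call` until it reports success or attempts run out, doubling the delay each time."""
    result: Dict[str, Any] = {"success": False, "error": "no attempts made"}
    for attempt in range(attempts):
        result = call()
        if result.get("success"):
            return result
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ... with the default base
    return result


# Intended use in this notebook (not executed here):
# result = retry_request(lambda: safe_browse("/ospool/uc-shared/public"))
```

This version retries every failure, including a missing path, where a retry cannot help. A fuller implementation would inspect the error and retry only timeouts and connection errors.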

## Summary

In this tutorial, you learned how to:

1. ✅ List available Pelican federations
2. ✅ Browse namespaces and directories
3. ✅ Get file information without downloading
4. ✅ Download files from federations
5. ✅ Import external files to your local catalog
6. ✅ Handle common errors

## Next Steps

- Explore the [General Dataset API Tutorial](./general_dataset_api_tutorial.ipynb) to manage your datasets
- Check the [S3 API Tutorial](./s3_api_tutorial.ipynb) for local storage management
- Read the [NDP EP Documentation](https://github.com/sci-ndp/pop) for more features

## Resources

- [Pelican Platform](https://pelicanplatform.org/)
- [Open Science Data Federation](https://osg-htc.org/services/osdf.html)
- [NDP EP API Documentation](https://github.com/sci-ndp/pop)