[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sci-ndp/ndp-ep-py/blob/main/docs/source/tutorials/pelican_federation.ipynb)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/sci-ndp/ndp-ep-py/main?filepath=docs%2Fsource%2Ftutorials%2Fpelican_federation.ipynb)

# Pelican Federation Tutorial

Welcome to the comprehensive tutorial on Pelican Federation operations with the NDP EP Python client!

## 🎯 What You'll Learn

This tutorial covers Pelican Federation data access workflows:

- **🌐 Federation Discovery**: List and explore available Pelican federations
- **📂 Namespace Browsing**: Navigate and browse federation namespaces
- **📋 File Information**: Retrieve metadata without downloading
- **📥 File Downloads**: Download files with regular and streaming modes
- **📦 Metadata Import**: Register Pelican files in the local catalog
- **🛡️ Error Handling**: Robust error handling and best practices

## 🔧 Use Cases

Perfect for:
- **Open Science Data Access**: Accessing open science data from OSDF and other federations
- **Data Discovery**: Exploring available datasets across federations
- **Data Integration**: Importing external data references into your local catalog
- **Large File Handling**: Streaming downloads for large scientific datasets
- **Metadata Cataloging**: Building comprehensive data catalogs with external references

## 🌐 About Pelican

Pelican is a federated data platform that enables secure, scalable data sharing across institutions. The Open Science Data Federation (OSDF) is one of the primary federations, providing access to scientific datasets from research institutions worldwide.

## ⚠️ Prerequisites

- Valid NDP EP API credentials (token)
- Network access to Pelican federations
- Basic understanding of federated data concepts

In [None]:
# Install required libraries
!pip install ndp-ep

# Import required modules
import time
import getpass
from typing import List, Dict, Any, Optional
from ndp_ep import APIClient

print("✅ Libraries installed and imported successfully!")
print("📚 Ready to start Pelican Federation tutorial")

## 1. 🔐 Authentication and Client Setup

First, let's configure the client with your credentials for Pelican operations.

In [None]:
# Interactive API configuration
print("🔧 Pelican Federation Configuration")
print("=" * 40)

# Get API base URL
api_url = input("Enter API base URL [http://localhost:8000]: ").strip()
if not api_url:
    api_url = "http://localhost:8000"

print(f"📡 API URL: {api_url}")

# Get API token securely (strip stray whitespace from copy/paste)
print("\n🔐 Authentication")
print("Please enter your API token (it will be hidden):")
api_token = getpass.getpass("API Token: ").strip()

if not api_token:
    raise ValueError("❌ API token is required for Pelican operations")

print("✅ Credentials configured securely")
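
For non-interactive runs (scheduled scripts, CI jobs) the prompts above can be replaced by environment variables. The following is a minimal sketch, not part of the `ndp-ep` package: the variable names `NDP_EP_API_URL` and `NDP_EP_API_TOKEN` and the helper `load_credentials` are hypothetical and chosen only for illustration.

```python
import os


def load_credentials(env=None):
    """Read the API URL and token from environment variables.

    NDP_EP_API_URL and NDP_EP_API_TOKEN are hypothetical names used for
    this sketch; the ndp-ep package does not define them.
    """
    env = os.environ if env is None else env
    # Fall back to the same default URL the interactive cell uses
    api_url = env.get("NDP_EP_API_URL", "").strip() or "http://localhost:8000"
    api_token = env.get("NDP_EP_API_TOKEN", "").strip()
    if not api_token:
        raise ValueError("API token is required for Pelican operations")
    return api_url, api_token


# Passing a plain dict keeps the example independent of the real environment
url, token = load_credentials({"NDP_EP_API_TOKEN": " abc "})
print(url)
```

The returned values can then be passed to `APIClient(base_url=..., token=...)` exactly as in the next cell.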

In [None]:
# Initialize and test the API client
print("🚀 Initializing Pelican-enabled API Client...")

try:
    client = APIClient(base_url=api_url, token=api_token)
    
    # Test basic connection
    try:
        system_status = client.get_system_status()
        print("✅ API client initialized successfully")
        print(f"🌐 Connected to: {api_url}")
        print("🔑 Authentication verified")
        
        # Test Pelican functionality by listing federations
        try:
            federations = client.list_federations()
            fed_count = federations.get('count', len(federations.get('federations', {})))
            print(f"🌐 Pelican functionality confirmed - {fed_count} federations available")
        except Exception as e:
            print(f"⚠️  Pelican functionality test: {e}")
            print("💡 Pelican features may require additional setup")
            
    except Exception as e:
        print(f"⚠️  API connection test failed: {e}")
        print("💡 Continuing in demo mode - some features may not work")
        print("✅ Client object created successfully")
    
except Exception as e:
    print(f"❌ Failed to initialize client: {e}")
    print("💡 Please check your credentials and API URL")
    raise

## 2. 📋 Helper Functions and Configuration

Let's create utility functions for our Pelican operations.

In [None]:
# Configuration for the tutorial
DEFAULT_FEDERATION = "osdf"
SAMPLE_PATHS = [
    "/ospool/uc-shared/public",
    "/osgconnect/public",
]

print("📊 Tutorial Configuration")
print("=" * 30)
print(f"Default federation: {DEFAULT_FEDERATION}")
print(f"Sample paths to explore: {len(SAMPLE_PATHS)} paths")

# Storage for tracking operations
operation_log = []

In [None]:
def log_operation(operation: str, resource_type: str, 
                  resource_name: str, success: bool, 
                  details: str = "") -> None:
    """
    Log a Pelican operation for tracking and debugging.
    
    Args:
        operation: Type of operation (list, browse, download, import)
        resource_type: Type of resource (federation, file, directory)
        resource_name: Name of the resource
        success: Whether the operation was successful
        details: Additional details or error messages
    """
    timestamp = time.strftime("%H:%M:%S")
    status = "✅" if success else "❌"
    
    log_entry = {
        "timestamp": timestamp,
        "operation": operation,
        "resource_type": resource_type,
        "resource_name": resource_name,
        "success": success,
        "details": details
    }
    
    operation_log.append(log_entry)
    print(f"{status} [{timestamp}] {operation.title()} {resource_type}: {resource_name}")
    
    if details and not success:
        print(f"   └─ Error: {details}")
    elif details and success:
        print(f"   └─ {details}")


def format_file_size(size_bytes: int) -> str:
    """
    Format file size in human-readable format.
    
    Args:
        size_bytes: Size in bytes
        
    Returns:
        Formatted size string
    """
    for unit in ['B', 'KB', 'MB', 'GB']:
        if size_bytes < 1024.0:
            return f"{size_bytes:.1f} {unit}"
        size_bytes /= 1024.0
    return f"{size_bytes:.1f} TB"


def safe_pelican_operation(func, *args, operation_name: str, **kwargs):
    """
    Safely execute a Pelican operation with error handling.
    
    Args:
        func: Function to execute
        operation_name: Description of the operation
        *args, **kwargs: Arguments for the function
        
    Returns:
        Result of the function or None if failed
    """
    try:
        result = func(*args, **kwargs)
        return result
    except ValueError as e:
        print(f"❌ {operation_name} failed: {e}")
        return None
    except Exception as e:
        print(f"⚠️  Unexpected error in {operation_name}: {e}")
        return None

print("🔧 Helper functions defined successfully")
print("📝 Ready for Pelican operations")
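
The helpers above can be tried without a server. In the sketch below, `format_file_size` and `safe_pelican_operation` are repeated in compact form only so the snippet runs on its own, and `fake_browse` is a hypothetical stand-in for a real client call such as `client.browse_pelican`.

```python
def format_file_size(size_bytes: float) -> str:
    """Format a byte count as B, KB, MB, GB or TB."""
    for unit in ['B', 'KB', 'MB', 'GB']:
        if size_bytes < 1024.0:
            return f"{size_bytes:.1f} {unit}"
        size_bytes /= 1024.0
    return f"{size_bytes:.1f} TB"


def safe_pelican_operation(func, *args, operation_name: str, **kwargs):
    """Run func and return its result, or None if it raises."""
    try:
        return func(*args, **kwargs)
    except ValueError as e:
        print(f"❌ {operation_name} failed: {e}")
        return None
    except Exception as e:
        print(f"⚠️  Unexpected error in {operation_name}: {e}")
        return None


def fake_browse(path):
    """Hypothetical stand-in for a client call; not part of ndp-ep."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    return {"files": ["a.txt"], "count": 1}


ok = safe_pelican_operation(fake_browse, "/ospool", operation_name="browse")
bad = safe_pelican_operation(fake_browse, "ospool", operation_name="browse")

print(ok)                      # the dict returned by fake_browse
print(bad)                     # None, because the ValueError was caught
print(format_file_size(1536))  # 1.5 KB
```

Note that `operation_name` is keyword-only, so it must always be passed by name; everything else is forwarded to the wrapped function.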

## 3. 🌐 Federation Discovery

Let's discover and explore available Pelican federations.

In [None]:
# List available federations
print("🌐 Available Pelican Federations")
print("=" * 35)

try:
    federations_data = client.list_federations()
    
    if federations_data.get('success', True):
        federations = federations_data.get('federations', {})
        count = federations_data.get('count', len(federations))
        
        print(f"📋 Found {count} available federations:\n")
        
        for i, (name, info) in enumerate(federations.items(), 1):
            print(f"   {i}. {name}")
            if isinstance(info, dict):
                if 'url' in info:
                    print(f"      🔗 URL: {info['url']}")
                if 'description' in info:
                    print(f"      📝 Description: {info['description']}")
            else:
                print(f"      🔗 URL: {info}")
            print()
        
        log_operation("list", "federations", "all", True, 
                     f"Found {count} federations")
    else:
        print("⚠️  Unexpected response format")
        print(f"Response: {federations_data}")
    
except Exception as e:
    print(f"❌ Failed to list federations: {e}")
    log_operation("list", "federations", "all", False, str(e))
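
The cell above tolerates two shapes for each federation entry: a dict with a `url` key, or a bare URL string. A small helper can normalise both into `(name, url)` pairs for later use. This is a sketch under that assumption about the response shape; `federation_urls` is a hypothetical helper and the `sample` response is made up for illustration, not captured from a real server.

```python
from typing import Any, Dict, List, Tuple


def federation_urls(response: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return sorted (name, url) pairs from a list_federations()-style dict."""
    pairs = []
    for name, info in response.get("federations", {}).items():
        if isinstance(info, dict):
            url = info.get("url", "")   # detailed entry
        else:
            url = str(info)             # entry is just the URL
        pairs.append((name, url))
    return sorted(pairs)


# Made-up response mixing both entry shapes
sample = {
    "federations": {
        "osdf": {"url": "https://osg-htc.org", "description": "example entry"},
        "demo": "https://example.org",
    },
    "count": 2,
}

for name, url in federation_urls(sample):
    print(f"{name}: {url}")
```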

## 4. 📂 Namespace Browsing

Let's explore files and directories in Pelican namespaces.

In [None]:
# Browse a namespace path
browse_path = SAMPLE_PATHS[0] if SAMPLE_PATHS else "/ospool"

print(f"📂 Browsing Namespace: {browse_path}")
print(f"🌐 Federation: {DEFAULT_FEDERATION}")
print("=" * 50)

try:
    browse_result = client.browse_pelican(
        path=browse_path,
        federation=DEFAULT_FEDERATION,
        detail=False
    )
    
    if browse_result.get('success', True):
        files = browse_result.get('files', [])
        count = browse_result.get('count', len(files))
        
        print(f"📋 Found {count} items:\n")
        
        # Display first 10 items
        display_limit = min(10, len(files))
        for i, item in enumerate(files[:display_limit], 1):
            if isinstance(item, dict):
                name = item.get('name', item.get('Name', 'Unknown'))
                item_type = item.get('type', 'file')
                icon = "📁" if item_type == 'directory' else "📄"
            else:
                name = str(item)
                icon = "📄"
            print(f"   {i}. {icon} {name}")
        
        if len(files) > display_limit:
            print(f"   ... and {len(files) - display_limit} more items")
        
        log_operation("browse", "directory", browse_path, True, 
                     f"Found {count} items")
    else:
        print("⚠️  Browse returned unexpected format")
        
except Exception as e:
    print(f"❌ Failed to browse path: {e}")
    log_operation("browse", "directory", browse_path, False, str(e))

In [None]:
# Browse with detailed information
print(f"\n🔍 Detailed Browse: {browse_path}")
print("=" * 50)

try:
    detailed_result = client.browse_pelican(
        path=browse_path,
        federation=DEFAULT_FEDERATION,
        detail=True  # Request detailed file information
    )
    
    if detailed_result.get('success', True):
        files = detailed_result.get('files', [])
        
        print(f"📋 Detailed listing ({len(files)} items):\n")
        
        # Display first 5 items with details
        display_limit = min(5, len(files))
        for i, item in enumerate(files[:display_limit], 1):
            if isinstance(item, dict):
                name = item.get('name', item.get('Name', 'Unknown'))
                size = item.get('size', item.get('Size', 0))
                item_type = item.get('type', 'file')
                modified = item.get('modified', item.get('LastModified', 'Unknown'))
                
                icon = "📁" if item_type == 'directory' else "📄"
                print(f"   {i}. {icon} {name}")
                if size:
                    print(f"      📦 Size: {format_file_size(int(size))}")
                if modified and modified != 'Unknown':
                    print(f"      📅 Modified: {modified}")
            else:
                print(f"   {i}. 📄 {item}")
            print()
        
        if len(files) > display_limit:
            print(f"   ... and {len(files) - display_limit} more items")
        
        log_operation("browse_detailed", "directory", browse_path, True, 
                     f"Retrieved details for {len(files)} items")
    
except Exception as e:
    print(f"❌ Failed to get detailed browse: {e}")
    log_operation("browse_detailed", "directory", browse_path, False, str(e))
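
Both browse cells handle the same entry variants: dicts keyed by `name` or `Name`, an optional `type` and `size`/`Size`, or plain strings. The sketch below collects that handling into one function that splits a listing into directories and files and totals the file sizes. `summarize_listing` is a hypothetical helper and the `listing` data is invented; the field names are taken from the cells above, not from a documented schema.

```python
def summarize_listing(items):
    """Split browse entries into directories and files and total the sizes."""
    dirs, files, total_bytes = [], [], 0
    for item in items:
        if isinstance(item, dict):
            name = item.get("name", item.get("Name", "Unknown"))
            if item.get("type", "file") == "directory":
                dirs.append(name)
                continue
            # Size may be missing, None, an int or a numeric string
            total_bytes += int(item.get("size", item.get("Size", 0)) or 0)
        else:
            name = str(item)  # entry is just a name
        files.append(name)
    return {"directories": dirs, "files": files, "total_bytes": total_bytes}


# Invented listing covering each entry variant
listing = [
    {"name": "data", "type": "directory"},
    {"Name": "a.csv", "Size": "2048"},
    "b.txt",
]

summary = summarize_listing(listing)
print(summary)
```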

## 5. 📋 File Information and Metadata

Let's retrieve file metadata without downloading the actual content.

In [None]:
# Get file information
# We'll try to find a file from the browse results
print("📋 Getting File Information")
print("=" * 30)

# Try to find a file to get info for
sample_file_path = None

try:
    # Use the browse results to find a file
    browse_result = client.browse_pelican(
        path=browse_path,
        federation=DEFAULT_FEDERATION,
        detail=True
    )
    
    files = browse_result.get('files', [])
    
    # Find first file (not directory)
    for item in files:
        if isinstance(item, dict):
            item_type = item.get('type', 'file')
            if item_type == 'file':
                name = item.get('name', item.get('Name', ''))
                if name:
                    sample_file_path = f"{browse_path}/{name}"
                    break
    
    if sample_file_path:
        print(f"🔍 Getting info for: {sample_file_path}")
        print(f"🌐 Federation: {DEFAULT_FEDERATION}\n")
        
        file_info = client.get_pelican_info(
            path=sample_file_path,
            federation=DEFAULT_FEDERATION
        )
        
        if file_info.get('success', True):
            print("✅ File information retrieved:")
            
            info_fields = ['name', 'size', 'type', 'modified', 'content_type']
            for field in info_fields:
                if field in file_info:
                    value = file_info[field]
                    if field == 'size':
                        value = f"{value} bytes ({format_file_size(int(value))})"
                    print(f"   📋 {field.replace('_', ' ').title()}: {value}")
            
            # Display any additional metadata
            extra_fields = [k for k in file_info.keys() 
                           if k not in info_fields + ['success']]
            if extra_fields:
                print("   🏷️  Additional metadata:")
                for field in extra_fields:
                    print(f"      {field}: {file_info[field]}")
            
            log_operation("info", "file", sample_file_path, True, 
                         "Metadata retrieved")
    else:
        print("ℹ️  No files found to inspect")
        print("💡 Try browsing a different path with files")
        
except Exception as e:
    print(f"❌ Failed to get file info: {e}")
    if sample_file_path:
        log_operation("info", "file", sample_file_path, False, str(e))
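
The cell above builds the file path with an f-string, which yields a double slash if the browse path ends in `/`. A sketch of the same "first file in the listing" search using `posixpath.join`, which handles the trailing slash; `first_file_path` is a hypothetical helper, and the entry fields are assumed to match those used above.

```python
import posixpath


def first_file_path(base_path, items):
    """Return the full path of the first file entry, or None if there is none."""
    for item in items:
        if not isinstance(item, dict):
            continue
        if item.get("type", "file") != "file":
            continue  # skip directories
        name = item.get("name", item.get("Name", ""))
        if name:
            # Note: if name is itself absolute, join() returns name unchanged
            return posixpath.join(base_path, name)
    return None


entries = [{"name": "sub", "type": "directory"}, {"name": "sample.csv"}]
print(first_file_path("/ospool/uc-shared/public/", entries))
```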

## 6. 📥 File Downloads

Let's download files from Pelican federations using different methods.

In [None]:
# Regular download (entire file at once)
print("📥 Regular File Download")
print("=" * 30)

if sample_file_path:
    print(f"🔍 Downloading: {sample_file_path}")
    print(f"🌐 Federation: {DEFAULT_FEDERATION}\n")
    
    try:
        download_start = time.time()
        
        # Download entire file
        content = client.download_pelican(
            path=sample_file_path,
            federation=DEFAULT_FEDERATION,
            stream=False  # Get entire file at once
        )
        
        download_duration = time.time() - download_start
        
        print("✅ Download completed")
        print(f"📦 Size: {format_file_size(len(content))}")
        print(f"⏱️  Duration: {download_duration:.2f} seconds")
        
        if download_duration > 0:
            speed = len(content) / download_duration
            print(f"🚀 Speed: {format_file_size(int(speed))}/s")
        
        # Preview content if it's text
        try:
            text_content = content.decode('utf-8')
            preview_length = min(200, len(text_content))
            print("\n📄 Content preview:")
            print(f"   {text_content[:preview_length]}")
            if len(text_content) > preview_length:
                print(f"   ... (truncated, {len(text_content)} total characters)")
        except UnicodeDecodeError:
            print("\n📄 Binary file (cannot preview as text)")
        
        log_operation("download", "file", sample_file_path, True, 
                     f"Size: {format_file_size(len(content))}")
        
    except Exception as e:
        print(f"❌ Download failed: {e}")
        log_operation("download", "file", sample_file_path, False, str(e))
else:
    print("ℹ️  No file available for download")
    print("💡 Browse a path with files first")

In [None]:
# Streaming download (for large files)
print("📥 Streaming File Download")
print("=" * 30)
print("💡 Streaming is ideal for large files to reduce memory usage\n")

if sample_file_path:
    print(f"🔍 Streaming: {sample_file_path}")
    print(f"🌐 Federation: {DEFAULT_FEDERATION}\n")
    
    try:
        stream_start = time.time()
        
        # Get streaming iterator
        content_iterator = client.download_pelican(
            path=sample_file_path,
            federation=DEFAULT_FEDERATION,
            stream=True  # Get iterator for streaming
        )
        
        # Process chunks as they arrive. Only running totals are kept, so
        # memory use stays flat regardless of file size.
        total_bytes = 0
        chunk_count = 0
        
        print("📊 Processing chunks:")
        for chunk in content_iterator:
            chunk_count += 1
            total_bytes += len(chunk)
            
            # Show progress every 10 chunks
            if chunk_count % 10 == 0:
                print(f"   ⏳ Chunk {chunk_count}: {format_file_size(total_bytes)} received")
        
        stream_duration = time.time() - stream_start
        
        print("\n✅ Streaming completed")
        print(f"📦 Total size: {format_file_size(total_bytes)}")
        print(f"📊 Chunks received: {chunk_count}")
        print(f"⏱️  Duration: {stream_duration:.2f} seconds")
        
        if stream_duration > 0:
            speed = total_bytes / stream_duration
            print(f"🚀 Speed: {format_file_size(int(speed))}/s")
        
        log_operation("stream_download", "file", sample_file_path, True, 
                     f"{chunk_count} chunks, {format_file_size(total_bytes)}")
        
    except Exception as e:
        print(f"❌ Streaming failed: {e}")
        log_operation("stream_download", "file", sample_file_path, False, str(e))
else:
    print("ℹ️  No file available for streaming")
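
In practice a streamed download is usually written to disk chunk by chunk rather than only counted. The following is a minimal sketch of that pattern. It assumes the streaming iterator yields `bytes` chunks, as the `len(chunk)` call above suggests; `save_stream` is a hypothetical helper, and an in-memory iterator stands in for `client.download_pelican(..., stream=True)`.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable, Tuple


def save_stream(chunks: Iterable[bytes], dest: Path) -> Tuple[int, str]:
    """Write chunks to dest one at a time; return (byte count, SHA-256 hex)."""
    digest = hashlib.sha256()
    total = 0
    with open(dest, "wb") as fh:
        for chunk in chunks:
            fh.write(chunk)        # only one chunk is held in memory
            digest.update(chunk)   # checksum is computed incrementally
            total += len(chunk)
    return total, digest.hexdigest()


# Stand-in for the iterator returned by a streaming download
fake_chunks = iter([b"hello ", b"world"])

with tempfile.TemporaryDirectory() as tmp:
    dest = Path(tmp) / "sample.bin"
    size, sha = save_stream(fake_chunks, dest)
    saved = dest.read_bytes()

print(size, sha[:12])
```

Computing the checksum during the write avoids a second pass over a large file when the digest is needed for later verification.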

## 7. 📦 Importing Pelican Metadata to Local Catalog

Let's register a Pelican file in the local catalog for easier discovery and management.

In [None]:
# Import Pelican metadata to local catalog
print("📦 Importing Pelican Metadata to Local Catalog")
print("=" * 50)

# This operation requires an existing dataset/package in the catalog
print("\n⚠️  Prerequisites for metadata import:")
print("   1. A valid Pelican URL (pelican://federation/path)")
print("   2. An existing package/dataset ID in the catalog")
print("   3. Write permissions on the target package\n")

# Example of how to use import_pelican_metadata
print("📝 Example Usage:")
print("""
   # Import a Pelican file into a catalog package
   result = client.import_pelican_metadata(
       pelican_url="pelican://osg-htc.org/ospool/uc-shared/public/sample.csv",
       package_id="my-dataset-id",
       resource_name="Sample Data from OSDF",
       resource_description="Climate data from Open Science Grid"
   )
   
   # The result contains the created resource information
   print(result['resource']['id'])
""")

# Optionally try an actual import if user provides package ID
print("\n" + "=" * 50)
print("🔧 Interactive Import (Optional)")

try_import = input("\nWould you like to try an actual import? (yes/no): ").strip().lower()

if try_import == 'yes':
    package_id = input("Enter target package ID: ").strip()
    
    if package_id and sample_file_path:
        # Construct Pelican URL
        pelican_url = f"pelican://osg-htc.org{sample_file_path}"
        
        print("\n📤 Importing:")
        print(f"   URL: {pelican_url}")
        print(f"   Package: {package_id}")
        
        try:
            result = client.import_pelican_metadata(
                pelican_url=pelican_url,
                package_id=package_id,
                resource_name=f"Pelican Import - {sample_file_path.split('/')[-1]}",
                resource_description="Imported from Pelican Federation tutorial"
            )
            
            print("\n✅ Metadata imported successfully!")
            if 'resource' in result:
                print(f"   📋 Resource ID: {result['resource'].get('id', 'N/A')}")
            
            log_operation("import", "metadata", pelican_url, True, 
                         f"To package: {package_id}")
            
        except Exception as e:
            print(f"\n❌ Import failed: {e}")
            log_operation("import", "metadata", pelican_url, False, str(e))
    else:
        print("ℹ️  Package ID or file path not provided")
else:
    print("⏭️  Skipping interactive import")
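
The import cell assembles the `pelican://` URL by string concatenation with the host `osg-htc.org` hard-coded. A sketch of a small builder that follows the `pelican://federation/path` form listed in the prerequisites and tolerates stray slashes; `build_pelican_url` is a hypothetical helper, not an `ndp-ep` function.

```python
def build_pelican_url(federation_host: str, path: str) -> str:
    """Join a federation host and a namespace path into a pelican:// URL."""
    host = federation_host.strip().rstrip("/")
    if host.startswith("pelican://"):
        host = host[len("pelican://"):]  # accept an already-prefixed host
    if not host:
        raise ValueError("federation host must not be empty")
    return f"pelican://{host}/{path.lstrip('/')}"


print(build_pelican_url("osg-htc.org", "/ospool/uc-shared/public/sample.csv"))
# pelican://osg-htc.org/ospool/uc-shared/public/sample.csv
```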

## 8. 📊 Operation Summary

Let's review all the operations we performed during this tutorial.

In [None]:
# Display operation summary
print("📊 Pelican Federation Tutorial Summary")
print("=" * 45)

if operation_log:
    successful_ops = sum(1 for op in operation_log if op['success'])
    failed_ops = sum(1 for op in operation_log if not op['success'])
    
    print("\n📈 Overall Statistics:")
    print(f"   ✅ Successful operations: {successful_ops}")
    print(f"   ❌ Failed operations: {failed_ops}")
    print(f"   📋 Total operations: {len(operation_log)}")
    
    print("\n📋 Operation Log:")
    print("-" * 60)
    
    for i, op in enumerate(operation_log, 1):
        status = "✅" if op['success'] else "❌"
        print(f"   {i}. [{op['timestamp']}] {status} {op['operation'].title()}")
        print(f"      {op['resource_type']}: {op['resource_name']}")
        if op['details']:
            print(f"      └─ {op['details']}")
else:
    print("\n📭 No operations logged")

print("\n" + "=" * 45)
print("🎉 Tutorial completed!")
print("\n💡 Next steps:")
print("   1. Explore other federations using list_federations()")
print("   2. Browse different namespace paths in OSDF")
print("   3. Import useful datasets into your catalog")
print("   4. Use streaming downloads for large files")
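
Beyond the overall totals, it can help to see which kind of operation failed. The sketch below groups log entries by their `operation` field; it assumes entries shaped like those created by `log_operation`, and both `summarize_by_operation` and the `sample_log` data are hypothetical.

```python
from collections import Counter


def summarize_by_operation(log):
    """Return {operation: (successes, failures)} for a list of log entries."""
    counts = {}
    for entry in log:
        outcome = counts.setdefault(entry["operation"], Counter())
        outcome["ok" if entry["success"] else "failed"] += 1
    # A Counter returns 0 for a missing key, so both values are always present
    return {op: (c["ok"], c["failed"]) for op, c in counts.items()}


# Invented entries using the same keys as operation_log
sample_log = [
    {"operation": "browse", "success": True},
    {"operation": "browse", "success": False},
    {"operation": "download", "success": True},
]

for op, (ok, failed) in summarize_by_operation(sample_log).items():
    print(f"{op}: {ok} ok, {failed} failed")
```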

## 🎓 Additional Resources

### Pelican Documentation
- [Pelican Platform](https://pelicanplatform.org/) - Official Pelican documentation
- [Open Science Data Federation](https://osg-htc.org/services/osdf.html) - OSDF information

### API Methods Reference

| Method | Description |
|--------|-------------|
| `list_federations()` | List available Pelican federations |
| `browse_pelican(path, federation, detail)` | Browse namespace contents |
| `get_pelican_info(path, federation)` | Get file metadata |
| `download_pelican(path, federation, stream)` | Download files |
| `import_pelican_metadata(...)` | Import to local catalog |

### Common Federation Paths

```python
# OSDF public paths
"/ospool/uc-shared/public"
"/osgconnect/public"
```