# Amplifierd API - Collection Management

This notebook demonstrates collection discovery and management operations.

## Overview

Collections in amplifier provide:
- Profile manifests (schema v2 required)
- Agent definitions
- Context files
- Module packages

**New in v2**:
- Simplified text-based registry (`collection-sources.txt`)
- Git commit-based caching (immutable, efficient)
- Schema v2 validation for profiles
- Direct source path resolution (no extraction)

**Collection Types (Simplified)**:
- `git+` - Git repositories (starts with `git+`)
- `fsspec` - Everything else (https://, /path, ./path, file:///, bundled:, etc.)

**Collection Source Formats**:

**Git Repositories** (type: git+):
```
git+https://github.com/org/repo@main
git+https://github.com/org/repo@v1.0.0
git+https://github.com/org/repo@commit-hash
```

**Fsspec Paths** (type: fsspec):
```
https://example.com/path/to/collection    # HTTP(S) URL
/absolute/path/to/collection               # Absolute path
./relative/path/to/collection              # Relative to daemon
file:///absolute/path/to/collection        # File URL
bundled:amplifierd.data.collections.name   # Bundled collection
```
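The git source format above can be split into a clone URL and a ref. The sketch below is a hypothetical helper, not the daemon's own parser: the names `GitSource` and `parse_git_source` are assumptions made for illustration, and it treats `@` as a ref separator only inside the last path segment.

```python
from typing import NamedTuple, Optional


class GitSource(NamedTuple):
    url: str            # clone URL with the "git+" prefix removed
    ref: Optional[str]  # branch, tag, or commit hash; None when no "@ref" is given


def parse_git_source(source: str) -> GitSource:
    """Split 'git+<url>@<ref>' into its clone URL and ref (hypothetical helper)."""
    if not source.startswith("git+"):
        raise ValueError(f"not a git+ source: {source!r}")
    remainder = source[len("git+"):]
    # Look for "@" only in the last path segment, so the "git@" in
    # ssh://git@host/org/repo is not mistaken for a ref separator.
    head, slash, tail = remainder.rpartition("/")
    if "@" in tail:
        name, _, ref = tail.partition("@")
        return GitSource(url=f"{head}{slash}{name}", ref=ref or None)
    return GitSource(url=remainder, ref=None)
```

Because only the last path segment is inspected, a ref that itself contains `/` (for example a branch named `feature/x`) is not handled by this sketch.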

## Directory Structure

Collections are now organized with commit-based caching:

```
$AMPLIFIERD_HOME/
├── local/share/
│   ├── collection-sources.txt    # Simple text registry
│   └── profiles/                 # Cached profile manifests
│       └── {collection-id}/
│           └── {profile-id}.md
├── state/
│   ├── git/                      # Git checkout cache
│   │   └── {commit-hash}/
│   │       ├── profiles/
│   │       ├── agents/
│   │       └── context/
│   └── profiles/                 # Compiled profiles
│       └── {collection-id}/
│           └── {profile-uid}/
```
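To make the layout above easier to work with from a script, the following sketch builds the four locations with `pathlib`. The function name `amplifierd_paths` and the dictionary keys are assumptions for illustration; only the relative paths come from the tree above.

```python
from pathlib import Path


def amplifierd_paths(home: Path) -> dict[str, Path]:
    """Return the locations shown in the directory tree, relative to `home`."""
    return {
        # Simple text registry of collection sources
        "sources_registry": home / "local" / "share" / "collection-sources.txt",
        # Cached profile manifests: {collection-id}/{profile-id}.md
        "profile_manifests": home / "local" / "share" / "profiles",
        # Git checkout cache: one directory per commit hash
        "git_cache": home / "state" / "git",
        # Compiled profiles: {collection-id}/{profile-uid}/
        "compiled_profiles": home / "state" / "profiles",
    }
```

In practice `home` would be the value of `$AMPLIFIERD_HOME`, for example `Path(os.environ["AMPLIFIERD_HOME"])` when that variable is set.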

This notebook covers both **read operations** (discovery) and **write operations** (sync).

In [5]:
import json
from typing import Any

import requests

BASE_URL = "http://127.0.0.1:8420"
API_BASE = f"{BASE_URL}/api/v1"


def print_response(response: requests.Response, title: str = "") -> Any:
    """Print the status and body of a response; return the parsed JSON, or None."""
    if title:
        print(f"\n{'=' * 60}")
        print(f"{title}")
        print(f"{'=' * 60}")
    print(f"Status: {response.status_code} {response.reason}")
    if response.content:
        try:
            data = response.json()
            print(json.dumps(data, indent=2))
            return data
        except ValueError:
            # requests raises a ValueError subclass when the body is not valid JSON
            print(response.text)
            return None
    return None


print("✓ Setup complete")

✓ Setup complete


## Discovery Operations

### List All Collections

Get all available collections with their basic info:

In [7]:
response = requests.get(f"{API_BASE}/collections/")
collections = print_response(response, "LIST COLLECTIONS")

if collections:
    print(f"\n✓ Found {len(collections)} collection(s)")
    for collection in collections:
        print(f"  - {collection['identifier']} ({collection['type']})")
        print(f"    Source: {collection['source']}")
        print(f"    Profiles: {collection.get('profilesCount', 0)}")
        if collection.get("packageBundled"):
            print("    [Bundled]")

ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8420): Max retries exceeded with url: /api/v1/collections/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7697c8a3d940>: Failed to establish a new connection: [Errno 111] Connection refused'))
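A `ConnectionError` with `Connection refused` means nothing is listening on the daemon's port. A small pre-flight check can make that failure easier to read. The sketch below uses only the standard library; the helper name `daemon_is_reachable` is an assumption, and the default host and port simply mirror `BASE_URL` from the setup cell.

```python
import socket


def daemon_is_reachable(host: str = "127.0.0.1", port: int = 8420, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

This only checks that the port accepts connections, not that the API itself is healthy, so it is best used as a guard before the first request in the notebook.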

### Get Collection Details

Retrieve detailed information about a specific collection:

In [None]:
# Get details for a collection (change identifier as needed)
if collections:
    collection_id = collections[0]["identifier"]

    response = requests.get(f"{API_BASE}/collections/{collection_id}")
    details = print_response(response, f"GET COLLECTION: {collection_id}")

    if details:
        print(f"\n✓ Collection: {details['identifier']}")
        print(f"  Type: {details['type']}")
        print(f"  Source: {details['source']}")
        print(f"  Profiles: {len(details.get('profiles', []))} (count: {details.get('profilesCount', 0)})")
        print(f"  Agents: {len(details.get('agents', []))}")

        modules = details.get("modules", {})
        print("  Modules:")
        print(f"    - Providers: {len(modules.get('providers', []))}")
        print(f"    - Tools: {len(modules.get('tools', []))}")
        print(f"    - Hooks: {len(modules.get('hooks', []))}")
        print(f"    - Orchestrators: {len(modules.get('orchestrators', []))}")

        if details.get("packageBundled"):
            print("  [Package Bundled]")
else:
    print("No collections available. Try syncing collections first.")

### Explore Collection Resources

List all profiles and modules in a collection:

In [None]:
if collections and details:
    print("Profiles in collection:")
    for profile_path in details.get("profiles", []):
        print(f"  - {profile_path}")

    print("\nProvider modules:")
    for provider in details.get("modules", {}).get("providers", []):
        print(f"  - {provider}")

    print("\nTool modules:")
    for tool in details.get("modules", {}).get("tools", []):
        print(f"  - {tool}")
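The same details payload can be reduced to a set of counts, which is handy when comparing collections. This is a minimal sketch that assumes the response keys used in the cells above (`profiles`, `agents`, and `modules` with its four sub-lists); `summarize_collection` is a name chosen here for illustration.

```python
from typing import Any

# Module kinds printed by the collection details cell
MODULE_KINDS = ("providers", "tools", "hooks", "orchestrators")


def summarize_collection(details: dict[str, Any]) -> dict[str, int]:
    """Count profiles, agents, and each module kind, treating missing keys as empty."""
    modules = details.get("modules") or {}
    summary = {
        "profiles": len(details.get("profiles") or []),
        "agents": len(details.get("agents") or []),
    }
    for kind in MODULE_KINDS:
        summary[kind] = len(modules.get(kind) or [])
    return summary
```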

## Collection Sync Operations

### Sync All Collections

Update all collections from their sources. Git sources fetch the latest commits, and local paths are re-scanned:

In [None]:
# Sync all collections from their sources
# This will:
# - For git sources: Fetch latest commits and update cache
# - For local sources: Re-scan directories
# - For bundled sources: No-op (always available)

response = requests.post(f"{API_BASE}/collections/sync", params={"update": True})
result = print_response(response, "SYNC COLLECTIONS")

if response.ok:
    print("\n✓ Collections synced")
    if result:
        print(f"  Synced: {result.get('synced', [])}")
        print(f"  Updated: {result.get('updated', [])}")
        print(f"  Skipped: {result.get('skipped', [])}")
        print(f"  Errors: {result.get('errors', [])}")
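An HTTP 200 from the sync endpoint does not by itself show that every collection synced cleanly, so it can help to inspect the body as well. The sketch below assumes the four keys read in the cell above (`synced`, `updated`, `skipped`, `errors`) each hold a list; that shape and the helper names are assumptions, not a documented schema.

```python
from typing import Any, Optional

SYNC_KEYS = ("synced", "updated", "skipped", "errors")


def sync_counts(result: Optional[dict[str, Any]]) -> dict[str, int]:
    """Count the entries under each sync result key, treating missing keys as empty."""
    result = result or {}
    return {key: len(result.get(key) or []) for key in SYNC_KEYS}


def sync_is_clean(result: Optional[dict[str, Any]]) -> bool:
    """True when a result body is present and its 'errors' entry is empty or absent."""
    return result is not None and not result.get("errors")
```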

## Understanding Collection Types

Collection types are automatically inferred from the source reference:

**git+**: Git repositories
- Source starts with `git+`
- Example: `git+https://github.com/org/repo@main`
- Cached by commit hash in `state/git/{commit}/`

**fsspec**: Everything else (filesystem spec)
- https:// URLs (curl-like behavior)
- Absolute paths: `/absolute/path/to/collection`
- Relative paths: `./relative/path` (relative to daemon)
- File URLs: `file:///absolute/path`
- Bundled: `bundled:amplifierd.data.collections.name`

**Type Detection**:
```python
if source.startswith("git+"):
    collection_type = "git+"
else:
    collection_type = "fsspec"  # Includes https://, /path, ./path, file:///, bundled:
```

| Source Pattern | Type | Example |
|----------------|------|---------|
| `git+...` | git+ | `git+https://github.com/org/repo@main` |
| `bundled:...` | fsspec | `bundled:amplifierd.data.collections.core` |
| `https://...` | fsspec | `https://example.com/path/to/collection` |
| `/absolute/path` | fsspec | `/path/to/collection` |
| `./relative/path` | fsspec | `./my-collection` |
| `file:///...` | fsspec | `file:///absolute/path` |

In [None]:
# Example collection responses with new type system

# Example 1: Git repository
example_git = {
    "identifier": "my-repo",
    "source": "git+https://github.com/org/repo@main",
    "type": "git+",  # git+ repos
    "packageBundled": False,
    "profilesCount": 5,
}

# Example 2: Bundled collection
example_bundled = {
    "identifier": "foundation",
    "source": "bundled:amplifierd.data.collections.foundation",
    "type": "fsspec",  # bundled: uses fsspec
    "packageBundled": True,
    "profilesCount": 4,
}

# Example 3: Local path
example_local = {
    "identifier": "local-coll",
    "source": "/home/user/collection",
    "type": "fsspec",  # local paths use fsspec
    "packageBundled": False,
    "profilesCount": 3,
}

# Example 4: HTTPS URL
example_https = {
    "identifier": "remote-coll",
    "source": "https://example.com/collections/my-collection",
    "type": "fsspec",  # HTTPS URLs use fsspec
    "packageBundled": False,
    "profilesCount": 2,
}

print("Example API responses with new type system:")
print("\n1. Git Repository (type: git+):")
print(json.dumps(example_git, indent=2))
print("\n2. Bundled Collection (type: fsspec):")
print(json.dumps(example_bundled, indent=2))
print("\n3. Local Path (type: fsspec):")
print(json.dumps(example_local, indent=2))
print("\n4. HTTPS URL (type: fsspec):")
print(json.dumps(example_https, indent=2))

## Managing Collection Sources

### Viewing Collection Sources

The collection sources are stored in a simple text file (`collection-sources.txt`):

```text
# Bundled collections (type: fsspec)
bundled:amplifierd.data.collections.foundation

# Git collections (type: git+)
git+https://github.com/org/repo@main

# Local collections (type: fsspec)
/path/to/local/collection
./relative/path/collection

# HTTPS collections (type: fsspec)
https://example.com/collections/my-collection
```

**All sources except git+ use fsspec** for unified handling of:
- HTTP(S) URLs
- Local paths (absolute and relative)
- File URLs
- Bundled collections
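A registry in this format is straightforward to read from a script. The following sketch assumes one source per line, with blank lines and whole-line `#` comments ignored, as in the example file above. The daemon's actual parsing rules may differ, and `parse_collection_sources` is a hypothetical name.

```python
def parse_collection_sources(text: str) -> list[str]:
    """Return the source entries from registry text, skipping blanks and '#' comments."""
    sources = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        sources.append(line)
    return sources
```

Inline trailing comments are deliberately not stripped here, because `#` can legitimately appear inside a URL fragment.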

In [None]:
# In the future, there will be API endpoints for:
# - POST /collections/sources (add a new source)
# - DELETE /collections/sources/{id} (remove a source)

# For now, edit the collection-sources.txt file directly:
# Location: $AMPLIFIERD_HOME/local/share/collection-sources.txt

print("ℹ Collection sources are managed via collection-sources.txt")
print("  Location: $AMPLIFIERD_HOME/local/share/collection-sources.txt")
print("  After editing, run sync to update: POST /collections/sync")

## Git Commit-Based Caching

Collections from git sources are cached by commit hash for efficiency:

In [None]:
# How git commit caching works:
#
# 1. Source: git+https://github.com/org/repo@main
# 2. Resolve ref 'main' to commit hash: abc123...
# 3. Check cache: $AMPLIFIERD_HOME/state/git/abc123.../
# 4. If cached: Use it (immutable - commits never change)
# 5. If not cached: Clone and checkout that commit
# 6. If the ref later points to a new commit, that commit gets its own cache entry

print("Git Commit Caching Benefits:")
print("  ✓ Immutable caching (commits never change)")
print("  ✓ No duplicate clones of same commit")
print("  ✓ Efficient updates (only fetch if new commits)")
print("  ✓ Rollback support (can reference any historical commit)")
print("")
print("Cache location: $AMPLIFIERD_HOME/state/git/{commit-hash}/")
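The lookup in steps 3 to 5 can be sketched as a cache-or-clone function. This is an illustration of the idea rather than the daemon's implementation: `ensure_checkout` and its `clone` callback are hypothetical, and the callback stands in for whatever actually clones and checks out the commit.

```python
from pathlib import Path
from typing import Callable


def ensure_checkout(
    cache_root: Path,
    commit_hash: str,
    clone: Callable[[Path], None],
) -> tuple[Path, bool]:
    """Return (checkout_dir, cache_hit) for a commit, cloning only on a miss.

    `cache_root` corresponds to $AMPLIFIERD_HOME/state/git, and `clone` is
    expected to populate the target directory it is given.
    """
    target = cache_root / commit_hash
    if target.is_dir():
        # Cache hit: a commit's content never changes, so reuse it as-is
        return target, True
    clone(target)
    return target, False
```

Keying the directory on the commit hash rather than the ref name is what lets two refs that point at the same commit share a single checkout.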

## Complete Collection Workflow

Demonstrate discovery and sync workflow:

In [None]:
def collection_workflow():
    """Complete workflow: sync and explore collections."""

    # 1. Sync all collections
    print("1. Syncing collections...")
    response = requests.post(f"{API_BASE}/collections/sync", params={"update": True})
    if not response.ok:
        print("✗ Failed to sync collections")
        return
    print("✓ Collections synced")

    # 2. List all collections
    print("\n2. Listing collections...")
    response = requests.get(f"{API_BASE}/collections/")
    if not response.ok:
        print("✗ Failed to list collections")
        return

    collections = response.json()
    print(f"✓ Found {len(collections)} collection(s)")

    # 3. Explore each collection
    for collection in collections:
        identifier = collection["identifier"]
        print(f"\n3. Exploring '{identifier}'...")

        response = requests.get(f"{API_BASE}/collections/{identifier}")
        if response.ok:
            details = response.json()
            modules = details.get("modules", {})
            total_modules = (
                len(modules.get("providers", []))
                + len(modules.get("tools", []))
                + len(modules.get("hooks", []))
                + len(modules.get("orchestrators", []))
            )
            print("✓ Collection has:")
            print(f"  - Type: {details['type']}")
            print(f"  - Source: {details['source']}")
            print(f"  - Profiles: {details.get('profilesCount', 0)}")
            print(f"  - Agents: {len(details.get('agents', []))}")
            print(f"  - Modules: {total_modules}")

    print("\n✓ Workflow complete")


collection_workflow()

## Summary

### Discovery Operations
- ✓ List all available collections with basic info
- ✓ Get collection details (profiles, agents, modules)
- ✓ Explore collection resources

### Sync Operations
- ✓ Sync all collections from sources
- ✓ Git commit-based caching for efficiency
- ✓ Handle git+ and fsspec collection types

## API Endpoints Reference

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/v1/collections/` | List all collections |
| GET | `/api/v1/collections/{id}` | Get collection details |
| POST | `/api/v1/collections/sync` | Sync all collections from sources |

## Important Notes

### Schema v2 Profiles

The daemon now requires **schema v2** profiles. Schema v1 profiles will be skipped with warnings.

**Schema v2 Requirements**:
- Must have `schema-version: 2` in YAML frontmatter
- No `extends` field (profiles must be fully resolved)
- Agents referenced as individual files
- Context referenced as directory refs
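The first two requirements can be pre-checked before syncing. The sketch below is a rough, hypothetical check and not the daemon's validator: it assumes `schema-version` and `extends` appear as top-level keys in a frontmatter block delimited by `---` lines, and it scans lines by hand instead of using a YAML parser.

```python
def frontmatter_problems(markdown_text: str) -> list[str]:
    """Return a list of schema v2 problems found in a profile's frontmatter."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing YAML frontmatter"]
    closing = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if closing is None:
        return ["unterminated YAML frontmatter"]

    fields: dict[str, str] = {}
    for line in lines[1:closing]:
        # Top-level keys only: skip blank, indented, and comment lines
        if not line.strip() or line[0] in " \t#" or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip("\"'")

    problems = []
    if fields.get("schema-version") != "2":
        problems.append("schema-version must be 2")
    if "extends" in fields:
        problems.append("extends is not allowed")
    return problems
```

An empty list means the two checks passed; it says nothing about the agent and context reference requirements, which need knowledge of the full schema.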

### Git Commit Caching

Collections from git sources are cached by commit hash:
- Immutable caching (commits never change)
- No duplicate clones of same commit
- Efficient updates (only fetch if new commits)
- Rollback support (can reference any historical commit)

### Collection Types (Simplified)

Collection type is **inferred from source**:

**git+**: Git repositories
- Source starts with `git+`
- Example: `git+https://github.com/org/repo@main`

**fsspec**: Everything else
- Bundled: `bundled:amplifierd.data.collections.name`
- HTTPS: `https://example.com/path/to/collection`
- Absolute paths: `/path/to/collection`
- Relative paths: `./path/to/collection`
- File URLs: `file:///path/to/collection`

**Type Detection Logic**:
```python
if source.startswith("git+"):
    collection_type = "git+"
else:
    collection_type = "fsspec"
```

### Configuration Files

Collection sources are managed via:
- `$AMPLIFIERD_HOME/local/share/collection-sources.txt` - Simple text registry

Cached data locations:
- `$AMPLIFIERD_HOME/state/git/{commit-hash}/` - Git checkout cache
- `$AMPLIFIERD_HOME/local/share/profiles/` - Profile manifest cache
- `$AMPLIFIERD_HOME/state/profiles/` - Compiled profiles

## Troubleshooting

### Collection Not Showing Profiles

If a collection shows `profilesCount: 0`:
1. Check that profile files have `schema-version: 2` in frontmatter
2. Ensure profiles are in `profiles/` directory
3. Run sync to re-scan: `POST /collections/sync?update=true`
4. Check daemon logs for profile validation warnings

### Git Collection Not Updating

If syncing doesn't update a git collection:
1. Check that the ref (branch/tag) has new commits
2. Remember that the cache is keyed by commit hash, so a ref that still resolves to the same commit is served from the cache
3. Force update by changing ref in collection-sources.txt
4. Check network connectivity to git remote
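To confirm step 1 independently of the daemon, `git ls-remote` reports which commit a ref currently points to on the remote; each output line is a commit hash, a tab, and the ref name. The sketch below is a standalone check with helper names chosen for illustration. Whether the daemon resolves refs this way is not stated here. Pass the clone URL without the `git+` prefix.

```python
import subprocess


def parse_ls_remote(output: str) -> dict[str, str]:
    """Map ref name -> commit hash from `git ls-remote` output."""
    refs = {}
    for line in output.splitlines():
        commit, _, ref_name = line.partition("\t")
        if commit and ref_name:
            refs[ref_name.strip()] = commit.strip()
    return refs


def remote_refs(url: str, ref: str) -> dict[str, str]:
    """Query the remote for `ref` (requires git on PATH and network access)."""
    completed = subprocess.run(
        ["git", "ls-remote", url, ref],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ls_remote(completed.stdout)
```

If the hash returned for the ref matches an existing directory under `$AMPLIFIERD_HOME/state/git/`, the sync has nothing new to fetch.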

### Fsspec Collection Not Found

If an fsspec collection (local/https/bundled) isn't discovered:
1. Verify path/URL exists and is accessible
2. For local paths: Check directory has `profiles/` subdirectory
3. For HTTPS: Verify URL is reachable and returns valid content
4. For bundled: Ensure package is installed
5. Run sync to re-scan: `POST /collections/sync?update=true`
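For local paths, steps 1 and 2 can be checked from a script before running sync. This is a minimal sketch; `local_collection_problems` is a hypothetical helper and covers only the directory checks listed above.

```python
from pathlib import Path


def local_collection_problems(path: Path) -> list[str]:
    """Return reasons a local collection path might not be discovered."""
    if not path.exists():
        return [f"{path} does not exist"]
    if not path.is_dir():
        return [f"{path} is not a directory"]
    if not (path / "profiles").is_dir():
        return [f"{path} has no profiles/ subdirectory"]
    return []
```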

## Next Steps

Continue to:
- **05-profile-management.ipynb** - Profile discovery and compilation
- **06-agent-management.ipynb** - Agent discovery and usage