# Customer.IO Data Pipelines API - Objects and Relationships

## Purpose

This notebook shows how to manage objects and track relationships with Customer.IO's Data Pipelines API.
It covers groups (companies and organizations), user-group relationships, object hierarchies, and more advanced relationship patterns, with validation and error handling throughout.

## Prerequisites

- Complete setup from `00_setup_and_configuration.ipynb`
- Complete authentication setup from `01_authentication_and_utilities.ipynb`
- Customer.IO API key configured in Databricks secrets
- Sample data available in Delta tables

## Key Concepts

- **Groups/Objects**: Organizations, companies, accounts, or any business entity
- **Relationships**: Connections between users and groups (membership, ownership, etc.)
- **Hierarchies**: Parent-child relationships between objects
- **Attributes**: Custom properties for groups and relationships
- **Type Safety**: Validated object creation and management
- **Batch Operations**: Efficient bulk object and relationship management

## Object Types Covered

1. **Companies/Organizations**: Business entities with users
2. **Teams/Departments**: Sub-units within organizations
3. **Projects/Workspaces**: Collaborative spaces
4. **Custom Objects**: Business-specific entities
5. **Relationship Types**: Member, Admin, Owner, Custom roles

## Setup and Imports

In [None]:
# Standard library imports
import sys
import os
from datetime import datetime, timezone, timedelta
from typing import Dict, List, Optional, Any, Union, Set
import json
import uuid
from enum import Enum
from dataclasses import dataclass

print("SUCCESS: Standard libraries imported")

In [None]:
# Add the repository root to the Python path so that `utils` resolves as a package
sys.path.append('/Workspace/Repos/customer_io_notebooks')
print("SUCCESS: Repository root added to Python path")

In [None]:
# Import Customer.IO API utilities
from utils.api_client import CustomerIOClient
from utils.validators import (
    GroupRequest,
    validate_request_size,
    create_context
)

print("SUCCESS: Customer.IO API utilities imported")

In [None]:
# Import transformation utilities
from utils.transformers import (
    CustomerTransformer,
    EventTransformer,
    BatchTransformer,
    ContextTransformer
)

print("SUCCESS: Transformation utilities imported")

In [None]:
# Import error handling utilities
from utils.error_handlers import (
    CustomerIOError,
    RateLimitError,
    ValidationError,
    NetworkError,
    retry_on_error,
    ErrorContext
)

print("SUCCESS: Error handling utilities imported")

In [None]:
# Import Databricks and Spark utilities
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import *
from delta.tables import DeltaTable

print("SUCCESS: Databricks and Spark utilities imported")

In [None]:
# Import validation and logging
import structlog
from pydantic import ValidationError as PydanticValidationError, BaseModel, Field, validator

# Import GroupManager for advanced object and relationship handling
from utils.group_manager import (
    GroupManager, 
    GroupTraits, 
    UserGroupRelationship, 
    ObjectHierarchy,
    ObjectType, 
    RelationshipRole, 
    RelationshipStatus,
    PermissionSet
)

# Initialize logger
logger = structlog.get_logger("objects_relationships")

print("SUCCESS: Validation, logging, and GroupManager imported")

## Configuration and Client Setup

In [None]:
# Load configuration from setup notebook (secure approach)
try:
    CUSTOMERIO_REGION = dbutils.widgets.get("customerio_region") or "us"
    DATABASE_NAME = dbutils.widgets.get("database_name") or "customerio_demo"
    CATALOG_NAME = dbutils.widgets.get("catalog_name") or "main"
    ENVIRONMENT = dbutils.widgets.get("environment") or "test"
    
    print("Configuration loaded from setup notebook:")
    print(f"  Region: {CUSTOMERIO_REGION}")
    print(f"  Database: {CATALOG_NAME}.{DATABASE_NAME}")
    print(f"  Environment: {ENVIRONMENT}")
    
except Exception as e:
    print(f"WARNING: Could not load configuration from setup notebook: {str(e)}")
    print("INFO: Using fallback configuration")
    CUSTOMERIO_REGION = "us"
    DATABASE_NAME = "customerio_demo"
    CATALOG_NAME = "main"
    ENVIRONMENT = "test"

In [None]:
# Get Customer.IO API key from secure storage
CUSTOMERIO_API_KEY = dbutils.secrets.get("customerio", "api_key")
print("SUCCESS: Customer.IO API key retrieved from secure storage")

In [None]:
# Configure Spark to use the specified database
spark.sql(f"USE {CATALOG_NAME}.{DATABASE_NAME}")
print("SUCCESS: Database configured")

In [None]:
# Initialize the Customer.IO client
try:
    client = CustomerIOClient(
        api_key=CUSTOMERIO_API_KEY,
        region=CUSTOMERIO_REGION,
        timeout=30,
        max_retries=3,
        retry_backoff_factor=2.0,
        enable_logging=True,
        spark_session=spark
    )
    print("SUCCESS: Customer.IO client initialized for object management")
    
except Exception as e:
    print(f"ERROR: Failed to initialize Customer.IO client: {str(e)}")
    raise

In [None]:
# Initialize the GroupManager with the Customer.IO client
group_manager = GroupManager(client)
print("SUCCESS: GroupManager initialized for object and relationship management")

## Test-Driven Development: Object and Relationship Validation
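The tests in this section exercise the happy path only. As a minimal sketch of a rejection-path check, the block below uses a hypothetical stand-in, `validate_group_payload`, which is not part of `utils.validators`; it exists only so the pattern runs without the project utilities.

```python
from typing import Any, Dict, List


def validate_group_payload(payload: Dict[str, Any]) -> List[str]:
    """Hypothetical stand-in validator: return a list of problems (empty if valid)."""
    problems = []
    for field in ("userId", "groupId"):
        value = payload.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field} must be a non-empty string")
    # traits is optional here, but when present it must be a dictionary
    if not isinstance(payload.get("traits", {}), dict):
        problems.append("traits must be a dictionary")
    return problems


valid = {"userId": "user_123", "groupId": "company_456", "traits": {"name": "Acme"}}
missing_group = {"userId": "user_123", "traits": {"name": "Acme"}}

# Assertions fail loudly, unlike tests that only print and return False
assert validate_group_payload(valid) == []
assert validate_group_payload(missing_group) == ["groupId must be a non-empty string"]
print("Rejection-path checks passed")
```

Returning a list of problems rather than raising on the first one makes it easy to assert on every failure a payload triggers.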

In [None]:
# Test function: Validate basic group structure
def test_basic_group_validation():
    """Test that basic groups have required fields and pass validation."""
    
    # Test valid group
    valid_group = {
        "userId": "user_123",
        "groupId": "company_456",
        "traits": {
            "name": "Acme Corporation",
            "industry": "Technology",
            "size": "100-500"
        },
        "timestamp": datetime.now(timezone.utc)
    }
    
    try:
        group_request = GroupRequest(**valid_group)
        assert group_request.userId == "user_123"
        assert group_request.groupId == "company_456"
        print("SUCCESS: Basic group validation test passed")
        return True
    except Exception as e:
        print(f"ERROR: Basic group validation test failed: {str(e)}")
        return False

# Run the test
test_basic_group_validation()

In [None]:
# Test function: Validate relationship data structures
def test_relationship_structure():
    """Test that relationship data structures are properly validated."""
    
    # Test relationship data
    relationship = {
        "user_id": "user_123",
        "group_id": "company_456",
        "role": "admin",
        "joined_at": datetime.now(timezone.utc).isoformat(),
        "permissions": ["read", "write", "delete"]
    }
    
    # Validate structure
    required_fields = ["user_id", "group_id", "role"]
    for field in required_fields:
        if field not in relationship:
            print(f"ERROR: Missing required field: {field}")
            return False
    
    # Validate data types
    if not isinstance(relationship["permissions"], list):
        print("ERROR: Permissions must be a list")
        return False
    
    print("SUCCESS: Relationship structure validation passed")
    return True

# Run the test
test_relationship_structure()

In [None]:
# Test object types and enumerations from GroupManager
print("Available object types:")
for obj_type in ObjectType:
    print(f"  - {obj_type.value}")

print("\nAvailable relationship roles:")
for role in RelationshipRole:
    print(f"  - {role.value}")

print("\nAvailable relationship statuses:")
for status in RelationshipStatus:
    print(f"  - {status.value}")

print("\nSUCCESS: Object types and enumerations are available from GroupManager")

In [None]:
# Test type-safe group traits from GroupManager
sample_company_traits = GroupTraits(
    name="TechCorp Solutions",
    type=ObjectType.COMPANY,
    industry="Software",
    size="50-100",
    website="techcorp.com",
    plan="enterprise",
    monthly_spend=4999.99,
    employee_count=75
)

print("GroupTraits model test:")
print(f"  Name: {sample_company_traits.name}")
print(f"  Type: {sample_company_traits.type}")
print(f"  Website: {sample_company_traits.website}")  # Should auto-add https://
print(f"  Employee Count: {sample_company_traits.employee_count}")
print("SUCCESS: GroupTraits model working with validation")

In [None]:
# Test relationship model from GroupManager
sample_relationship = UserGroupRelationship(
    user_id="user_admin_001",
    group_id="company_techcorp_001",
    role=RelationshipRole.ADMIN,
    status=RelationshipStatus.ACTIVE,
    permissions=["users.manage", "billing.view", "settings.edit"],
    metadata={
        "department": "Engineering",
        "title": "VP of Engineering"
    }
)

print("UserGroupRelationship model test:")
print(f"  User ID: {sample_relationship.user_id}")
print(f"  Group ID: {sample_relationship.group_id}")
print(f"  Role: {sample_relationship.role}")
print(f"  Status: {sample_relationship.status}")
print(f"  Permissions: {sample_relationship.permissions}")
print(f"  Joined at: {sample_relationship.joined_at}")
print("SUCCESS: UserGroupRelationship model working with validation")

In [None]:
# Test hierarchy model from GroupManager
sample_hierarchy = ObjectHierarchy(
    parent_id="company_techcorp_001",
    child_id="team_engineering_001",
    relationship_type="contains",
    level=1,
    path="/company_techcorp_001/team_engineering_001"
)

print("ObjectHierarchy model test:")
print(f"  Parent ID: {sample_hierarchy.parent_id}")
print(f"  Child ID: {sample_hierarchy.child_id}")
print(f"  Relationship Type: {sample_hierarchy.relationship_type}")
print(f"  Level: {sample_hierarchy.level}")
print(f"  Path: {sample_hierarchy.path}")
print(f"  Created at: {sample_hierarchy.created_at}")
print("SUCCESS: ObjectHierarchy model working with validation")

In [None]:
# Implementation: Create group using GroupManager
if ENVIRONMENT == "test":
    print("INFO: Running in test mode - group not actually sent")
    result = {"status": "test_success", "message": "Group validated successfully"}
else:
    result = group_manager.create_group(
        user_id="user_founder_001",
        group_id="company_techcorp_001",
        group_traits=sample_company_traits
    )

print("Group created using GroupManager:")
print(json.dumps({"traits": sample_company_traits.dict()}, indent=2, default=str))
print(f"Result: {result}")

In [None]:
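# Define relationship model locally
# Note: this definition shadows the UserGroupRelationship imported from utils.group_manager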
class UserGroupRelationship(BaseModel):
    """Type-safe user-group relationship model."""
    user_id: str = Field(..., description="User identifier")
    group_id: str = Field(..., description="Group identifier")
    role: RelationshipRole = Field(..., description="User role in group")
    status: RelationshipStatus = Field(default=RelationshipStatus.ACTIVE)
    joined_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
    permissions: List[str] = Field(default_factory=list, description="User permissions")
    metadata: Dict[str, Any] = Field(default_factory=dict, description="Additional metadata")
    
    @validator('user_id', 'group_id')
    def validate_ids(cls, v: str) -> str:
        """Validate ID format."""
        if not v or len(v.strip()) == 0:
            raise ValueError("ID cannot be empty")
        return v.strip()
    
    class Config:
        use_enum_values = True

print("SUCCESS: UserGroupRelationship model defined")

In [None]:
# Define hierarchy model locally
# Note: this definition shadows the ObjectHierarchy imported from utils.group_manager
class ObjectHierarchy(BaseModel):
    """Type-safe object hierarchy model."""
    parent_id: str = Field(..., description="Parent object ID")
    child_id: str = Field(..., description="Child object ID")
    relationship_type: str = Field(default="contains", description="Type of hierarchy")
    level: int = Field(ge=0, description="Hierarchy level")
    path: str = Field(..., description="Full hierarchy path")
    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
    
    @validator('child_id')
    def validate_no_circular_reference(cls, v: str, values: Dict) -> str:
        """Validate no circular references."""
        if 'parent_id' in values and v == values['parent_id']:
            raise ValueError("Circular reference: parent and child cannot be the same")
        return v

print("SUCCESS: ObjectHierarchy model defined")

## Group/Object Management Implementation
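The cells in this section build a group payload from `userId`, `groupId`, `traits`, and `timestamp`, and rely on `default=str` when printing it. The sketch below shows the same shape using only the standard library, with the timestamp serialized explicitly. `build_group_payload` is a hypothetical helper introduced for illustration, not part of the project utilities.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_group_payload(
    user_id: str,
    group_id: str,
    traits: Dict[str, Any],
    timestamp: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a JSON-ready group payload."""
    ts = timestamp or datetime.now(timezone.utc)
    return {
        "userId": user_id,
        "groupId": group_id,
        # Drop unset traits, mirroring dict(exclude_none=True) on the model
        "traits": {k: v for k, v in traits.items() if v is not None},
        # Serialize explicitly so json.dumps needs no default= fallback
        "timestamp": ts.isoformat(),
    }


payload = build_group_payload(
    "user_founder_001",
    "company_techcorp_001",
    {"name": "TechCorp Solutions", "plan": "enterprise", "website": None},
    timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(json.dumps(payload, indent=2))
```

Passing a fixed `timestamp` keeps the output reproducible, which helps when comparing payloads in tests.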

In [None]:
# Implementation: Create and identify a group
def create_group(
    user_id: str,
    group_id: str,
    group_traits: GroupTraits
) -> Dict[str, Any]:
    """Create a group/organization with proper validation."""
    
    group_data = {
        "userId": user_id,
        "groupId": group_id,
        "traits": group_traits.dict(exclude_none=True),
        "timestamp": datetime.now(timezone.utc)
    }
    
    # Validate the group request
    group_request = GroupRequest(**group_data)
    
    return group_request.dict()

# Test group creation
sample_company = GroupTraits(
    name="TechCorp Solutions",
    type=ObjectType.COMPANY,
    industry="Software",
    size="50-100",
    website="techcorp.com",
    plan="enterprise",
    monthly_spend=4999.99,
    employee_count=75
)

group_data = create_group(
    user_id="user_founder_001",
    group_id="company_techcorp_001",
    group_traits=sample_company
)

print("Group created:")
print(json.dumps(group_data, indent=2, default=str))

In [None]:
# Implementation: Send group to Customer.IO
def send_group(group_data: Dict[str, Any], test_mode: bool = True) -> Dict[str, Any]:
    """Send a group to Customer.IO with error handling."""
    
    try:
        if test_mode:
            print("INFO: Running in test mode - group not actually sent")
            return {"status": "test_success", "message": "Group validated successfully"}
        
        # Send the group
        response = client.group(**group_data)
        
        logger.info(
            "Group sent successfully",
            user_id=group_data.get("userId"),
            group_id=group_data.get("groupId")
        )
        
        return response
        
    except CustomerIOError as e:
        logger.error("Failed to send group", error=str(e))
        raise

# Test sending the group
result = send_group(group_data, test_mode=(ENVIRONMENT == "test"))
print(f"Group send result: {result}")

## Relationship Management
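`update_relationship_status` in this section hardcodes `previous_status` and notes that production code should fetch it from a database. As a rough sketch of that lookup, the block below keeps the current status in memory. `Status` and `StatusStore` are hypothetical names for illustration; the real status values live in `RelationshipStatus`.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Status(str, Enum):
    """Hypothetical statuses standing in for RelationshipStatus."""
    ACTIVE = "active"
    SUSPENDED = "suspended"


class StatusStore:
    """In-memory stand-in for the database lookup of the current status."""

    def __init__(self) -> None:
        self._current: Dict[Tuple[str, str], Status] = {}

    def transition(self, user_id: str, group_id: str, new_status: Status) -> Dict[str, str]:
        """Record the new status and return event properties with the real previous one."""
        key = (user_id, group_id)
        previous: Optional[Status] = self._current.get(key)
        self._current[key] = new_status
        return {
            "group_id": group_id,
            "previous_status": previous.value if previous is not None else "none",
            "new_status": new_status.value,
        }


store = StatusStore()
first = store.transition("user_member_002", "company_techcorp_001", Status.ACTIVE)
second = store.transition("user_member_002", "company_techcorp_001", Status.SUSPENDED)
print(second)
```

Keying on the `(user_id, group_id)` pair lets one user hold different statuses in different groups.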

In [None]:
# Implementation: Create user-group relationship
def create_relationship(
    relationship: UserGroupRelationship
) -> Dict[str, Any]:
    """Create a user-group relationship event."""
    
    event_data = {
        "userId": relationship.user_id,
        "event": "User Added to Group",
        "properties": {
            "group_id": relationship.group_id,
            "role": relationship.role,
            "status": relationship.status,
            "joined_at": relationship.joined_at.isoformat(),
            "permissions": relationship.permissions,
            **relationship.metadata
        },
        "timestamp": datetime.now(timezone.utc)
    }
    
    return event_data

# Test relationship creation
admin_relationship = UserGroupRelationship(
    user_id="user_admin_001",
    group_id="company_techcorp_001",
    role=RelationshipRole.ADMIN,
    status=RelationshipStatus.ACTIVE,
    permissions=["users.manage", "billing.view", "settings.edit"],
    metadata={
        "department": "Engineering",
        "title": "VP of Engineering"
    }
)

relationship_event = create_relationship(admin_relationship)
print("Relationship event created:")
print(json.dumps(relationship_event, indent=2, default=str))

In [None]:
# Implementation: Update relationship status
def update_relationship_status(
    user_id: str,
    group_id: str,
    new_status: RelationshipStatus,
    reason: Optional[str] = None
) -> Dict[str, Any]:
    """Update a user's relationship status with a group."""
    
    event_data = {
        "userId": user_id,
        "event": "Group Relationship Updated",
        "properties": {
            "group_id": group_id,
            # Send the plain enum value so the payload serializes predictably
            "new_status": new_status.value,
            "previous_status": "active",  # In production, fetch from database
            "updated_at": datetime.now(timezone.utc).isoformat()
        },
        "timestamp": datetime.now(timezone.utc)
    }
    
    if reason:
        event_data["properties"]["reason"] = reason
    
    return event_data

# Test status update
status_update = update_relationship_status(
    user_id="user_member_002",
    group_id="company_techcorp_001",
    new_status=RelationshipStatus.SUSPENDED,
    reason="Payment failure"
)

print("Relationship status update:")
print(json.dumps(status_update, indent=2, default=str))

## Hierarchical Relationships
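The `ObjectHierarchy` validator only rejects an object that is its own parent, and `level` and `path` are supplied by hand. The sketch below derives both from a parent map and rejects longer cycles as well. `resolve_path` and the `parent_of` mapping are hypothetical, introduced here only to illustrate the idea.

```python
from typing import Dict, Optional, Tuple


def resolve_path(node_id: str, parent_of: Dict[str, Optional[str]]) -> Tuple[int, str]:
    """Walk up the parent map and return (level, path) for node_id.

    Raises ValueError when the chain revisits a node, which indicates a cycle.
    """
    chain = [node_id]
    seen = {node_id}
    current = parent_of.get(node_id)
    while current is not None:
        if current in seen:
            raise ValueError(f"Circular reference detected at {current}")
        seen.add(current)
        chain.append(current)
        current = parent_of.get(current)
    chain.reverse()  # root first
    return len(chain) - 1, "/" + "/".join(chain)


parent_of = {
    "company_techcorp_001": None,
    "dept_sales_001": "company_techcorp_001",
    "team_sales_na_001": "dept_sales_001",
}
level, path = resolve_path("team_sales_na_001", parent_of)
print(level, path)
```

Deriving `level` and `path` from one source of truth avoids the two fields drifting apart when an object is moved.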

In [None]:
# Implementation: Create object hierarchy
def create_hierarchy(
    parent_group: Dict[str, Any],
    child_group: Dict[str, Any],
    hierarchy: ObjectHierarchy
) -> List[Dict[str, Any]]:
    """Create hierarchical relationship between objects."""
    
    events = []
    
    # Parent perspective event
    parent_event = {
        "userId": parent_group["userId"],
        "event": "Child Object Added",
        "properties": {
            "parent_id": hierarchy.parent_id,
            "child_id": hierarchy.child_id,
            "child_type": child_group["traits"].get("type", "unknown"),
            "relationship_type": hierarchy.relationship_type,
            "hierarchy_level": hierarchy.level,
            "path": hierarchy.path
        },
        "timestamp": datetime.now(timezone.utc)
    }
    events.append(parent_event)
    
    # Child perspective event
    child_event = {
        "userId": child_group["userId"],
        "event": "Added to Parent Object",
        "properties": {
            "parent_id": hierarchy.parent_id,
            "parent_type": parent_group["traits"].get("type", "unknown"),
            "child_id": hierarchy.child_id,
            "relationship_type": hierarchy.relationship_type,
            "hierarchy_level": hierarchy.level,
            "path": hierarchy.path
        },
        "timestamp": datetime.now(timezone.utc)
    }
    events.append(child_event)
    
    return events

# Test hierarchy creation
engineering_team = GroupTraits(
    name="Engineering Team",
    type=ObjectType.TEAM,
    size="10-20",
    created_at=datetime.now(timezone.utc)
)

team_group_data = create_group(
    user_id="user_team_lead_001",
    group_id="team_engineering_001",
    group_traits=engineering_team
)

team_hierarchy = ObjectHierarchy(
    parent_id="company_techcorp_001",
    child_id="team_engineering_001",
    relationship_type="contains",
    level=1,
    path="/company_techcorp_001/team_engineering_001"
)

hierarchy_events = create_hierarchy(
    parent_group=group_data,
    child_group=team_group_data,
    hierarchy=team_hierarchy
)

print(f"Created {len(hierarchy_events)} hierarchy events:")
for event in hierarchy_events:
    print(f"  - {event['event']} for {event['userId']}")

## Batch Operations for Groups and Relationships
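This section delegates batch sizing to `BatchTransformer.optimize_batch_sizes`, whose internals are not shown. One plausible approach is greedy packing by serialized size, sketched below. `split_by_size` is a hypothetical function, and it counts only the bytes of each request, ignoring any envelope the API adds around a batch.

```python
import json
from typing import Any, Dict, List


def split_by_size(requests: List[Dict[str, Any]], max_bytes: int) -> List[List[Dict[str, Any]]]:
    """Greedily pack requests into batches whose summed JSON size stays within max_bytes."""
    batches: List[List[Dict[str, Any]]] = []
    current: List[Dict[str, Any]] = []
    current_size = 0
    for request in requests:
        size = len(json.dumps(request, default=str).encode("utf-8"))
        if size > max_bytes:
            raise ValueError("Single request exceeds the batch size limit")
        # Start a new batch when adding this request would exceed the limit
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(request)
        current_size += size
    if current:
        batches.append(current)
    return batches


requests = [{"type": "group", "groupId": f"dept_{i:03d}"} for i in range(5)]
batches = split_by_size(requests, max_bytes=100)
print([len(batch) for batch in batches])
```

Greedy packing preserves request order, which matters when a group must be created before events that reference it.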

In [None]:
# Implementation: Batch create multiple groups
def batch_create_groups(
    groups_data: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Create multiple groups in batch."""
    
    batch_requests = []
    
    for group in groups_data:
        # Create group request
        group_request = GroupRequest(**group)
        
        batch_request = {
            "type": "group",
            **group_request.dict()
        }
        batch_requests.append(batch_request)
    
    return batch_requests

# Create sample departments
departments = [
    {
        "userId": "user_dept_head_001",
        "groupId": "dept_sales_001",
        "traits": {
            "name": "Sales Department",
            "type": "department",
            "size": "20-30",
            "budget": 500000
        },
        "timestamp": datetime.now(timezone.utc)
    },
    {
        "userId": "user_dept_head_002",
        "groupId": "dept_marketing_001",
        "traits": {
            "name": "Marketing Department",
            "type": "department",
            "size": "10-20",
            "budget": 300000
        },
        "timestamp": datetime.now(timezone.utc)
    },
    {
        "userId": "user_dept_head_003",
        "groupId": "dept_product_001",
        "traits": {
            "name": "Product Department",
            "type": "department",
            "size": "5-10",
            "budget": 200000
        },
        "timestamp": datetime.now(timezone.utc)
    }
]

batch_groups = batch_create_groups(departments)
print(f"Created batch of {len(batch_groups)} groups")
for req in batch_groups:
    print(f"  - {req['traits']['name']} ({req['groupId']})")

In [None]:
# Implementation: Batch create relationships
def batch_create_relationships(
    relationships: List[UserGroupRelationship]
) -> List[Dict[str, Any]]:
    """Create multiple user-group relationships in batch."""
    
    events = []
    
    for rel in relationships:
        event = create_relationship(rel)
        events.append(event)
    
    return events

# Create sample team members
team_relationships = [
    UserGroupRelationship(
        user_id="user_eng_001",
        group_id="team_engineering_001",
        role=RelationshipRole.MEMBER,
        permissions=["code.write", "pr.create"],
        metadata={"seniority": "senior", "specialization": "backend"}
    ),
    UserGroupRelationship(
        user_id="user_eng_002",
        group_id="team_engineering_001",
        role=RelationshipRole.MEMBER,
        permissions=["code.write", "pr.create"],
        metadata={"seniority": "mid", "specialization": "frontend"}
    ),
    UserGroupRelationship(
        user_id="user_eng_003",
        group_id="team_engineering_001",
        role=RelationshipRole.MEMBER,
        permissions=["code.read"],
        metadata={"seniority": "junior", "specialization": "fullstack"}
    )
]

relationship_events = batch_create_relationships(team_relationships)
print(f"Created {len(relationship_events)} relationship events")
for event in relationship_events:
    print(f"  - {event['userId']} -> {event['properties']['group_id']} as {event['properties']['role']}")

In [None]:
# Implementation: Send batch with error handling
@retry_on_error(max_retries=3, backoff_factor=2.0)
def send_batch_with_optimization(
    batch_requests: List[Dict[str, Any]],
    test_mode: bool = True
) -> List[Dict[str, Any]]:
    """Send batch requests with size optimization and error handling."""
    
    try:
        # Optimize batch sizes
        optimized_batches = BatchTransformer.optimize_batch_sizes(
            requests=batch_requests,
            max_size_bytes=500 * 1024  # 500KB limit
        )
        
        print(f"Optimized {len(batch_requests)} requests into {len(optimized_batches)} batch(es)")
        
        results = []
        
        for i, batch in enumerate(optimized_batches):
            try:
                if test_mode:
                    print(f"  Batch {i+1}: {len(batch)} requests (test mode)")
                    results.append({
                        "batch_id": i,
                        "status": "test_success",
                        "count": len(batch)
                    })
                else:
                    response = client.batch(batch)
                    results.append({
                        "batch_id": i,
                        "status": "success",
                        "count": len(batch),
                        "response": response
                    })
                    
            except Exception as e:
                results.append({
                    "batch_id": i,
                    "status": "failed",
                    "count": len(batch),
                    "error": str(e)
                })
                logger.error("Batch failed", batch_id=i, error=str(e))
        
        return results
        
    except Exception as e:
        logger.error("Batch processing failed", error=str(e))
        raise

# Combine all batch requests
all_requests = batch_groups + [
    {"type": "track", **event} for event in relationship_events
]

# Send the batch
batch_results = send_batch_with_optimization(
    batch_requests=all_requests,
    test_mode=(ENVIRONMENT == "test")
)

print("\nBatch submission results:")
for result in batch_results:
    print(f"  Batch {result['batch_id']}: {result['status']} ({result['count']} requests)")

## Complex Relationship Queries
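The role query in this section scans the full relationship list on every call. When many lookups run against the same data, a one-time index can help. The sketch below assumes a simplified, hypothetical `Rel` tuple in place of `UserGroupRelationship`, with plain strings for role and status.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Rel(NamedTuple):
    """Hypothetical, simplified stand-in for UserGroupRelationship."""
    user_id: str
    group_id: str
    role: str
    status: str = "active"


def index_active_by_role(relationships: List[Rel]) -> Dict[Tuple[str, str], List[str]]:
    """Map (group_id, role) to the active user IDs holding that role."""
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for rel in relationships:
        if rel.status == "active":
            index[(rel.group_id, rel.role)].append(rel.user_id)
    return dict(index)


rels = [
    Rel("user_eng_001", "team_engineering_001", "member"),
    Rel("user_eng_002", "team_engineering_001", "member", status="suspended"),
    Rel("user_admin_001", "company_techcorp_001", "admin"),
]
index = index_active_by_role(rels)
print(index.get(("team_engineering_001", "member"), []))
```

Building the index costs one pass over the data; each later lookup is a dictionary access.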

In [None]:
# Implementation: Find all users in a group with specific role
def get_users_by_role(
    group_id: str,
    role: RelationshipRole,
    relationships: List[UserGroupRelationship]
) -> List[str]:
    """Get all users with a specific role in a group."""
    
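    def _value(item: Any) -> Any:
        """Return the underlying value for enum members, or the item unchanged."""
        return item.value if isinstance(item, Enum) else item

    # Models configured with use_enum_values store plain values, so compare on values
    matching_users = [
        rel.user_id
        for rel in relationships
        if rel.group_id == group_id
        and _value(rel.role) == _value(role)
        and _value(rel.status) == _value(RelationshipStatus.ACTIVE)
    ]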
    
    return matching_users

# Test the query
all_relationships = team_relationships + [admin_relationship]
members = get_users_by_role(
    group_id="team_engineering_001",
    role=RelationshipRole.MEMBER,
    relationships=all_relationships
)

print(f"Found {len(members)} members in Engineering Team:")
for user_id in members:
    print(f"  - {user_id}")

In [None]:
# Implementation: Get group hierarchy tree
def build_hierarchy_tree(
    root_id: str,
    hierarchies: List[ObjectHierarchy]
) -> Dict[str, Any]:
    """Build a hierarchical tree structure from relationships."""
    
    def get_children(parent_id: str) -> List[Dict[str, Any]]:
        children = []
        for h in hierarchies:
            if h.parent_id == parent_id:
                child_node = {
                    "id": h.child_id,
                    "level": h.level,
                    "path": h.path,
                    "children": get_children(h.child_id)
                }
                children.append(child_node)
        return children
    
    root_node = {
        "id": root_id,
        "level": 0,
        "path": f"/{root_id}",
        "children": get_children(root_id)
    }
    
    return root_node

# Create more hierarchies for testing
additional_hierarchies = [
    team_hierarchy,
    ObjectHierarchy(
        parent_id="company_techcorp_001",
        child_id="dept_sales_001",
        relationship_type="contains",
        level=1,
        path="/company_techcorp_001/dept_sales_001"
    ),
    ObjectHierarchy(
        parent_id="dept_sales_001",
        child_id="team_sales_na_001",
        relationship_type="contains",
        level=2,
        path="/company_techcorp_001/dept_sales_001/team_sales_na_001"
    )
]

# Build the tree
org_tree = build_hierarchy_tree(
    root_id="company_techcorp_001",
    hierarchies=additional_hierarchies
)

print("Organization hierarchy tree:")
print(json.dumps(org_tree, indent=2))

## Data-Driven Group Generation from Spark
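The transformation in this section maps `employee_count` to a size label with an if/elif chain. The same mapping can be expressed as a table lookup, sketched below with `bisect`. `SIZE_BOUNDS`, `SIZE_LABELS`, and `size_category` are hypothetical names; the thresholds and labels are copied from that chain.

```python
from bisect import bisect_right

# Upper bounds are exclusive, matching the if/elif chain in this section
SIZE_BOUNDS = [50, 200, 500]
SIZE_LABELS = ["1-50", "50-200", "200-500", "500+"]


def size_category(employee_count: int) -> str:
    """Return the size label for an employee count."""
    # bisect_right counts how many bounds are <= employee_count
    return SIZE_LABELS[bisect_right(SIZE_BOUNDS, employee_count)]


for count in (49, 50, 250, 500):
    print(count, size_category(count))
```

Keeping the thresholds in one list makes boundary behaviour easy to test and change.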

In [None]:
# Load sample organization data
print("=== Data-Driven Group Generation ===")

# Create sample organizations table if it doesn't exist
spark.sql(f"""
CREATE TABLE IF NOT EXISTS {CATALOG_NAME}.{DATABASE_NAME}.organizations (
    org_id STRING,
    org_name STRING,
    industry STRING,
    employee_count INT,
    annual_revenue DOUBLE,
    created_at TIMESTAMP
) USING DELTA
""")

# Insert sample data
spark.sql(f"""
INSERT INTO {CATALOG_NAME}.{DATABASE_NAME}.organizations
SELECT * FROM VALUES
    ('org_001', 'Alpha Corp', 'Finance', 250, 5000000, current_timestamp()),
    ('org_002', 'Beta Industries', 'Manufacturing', 500, 10000000, current_timestamp()),
    ('org_003', 'Gamma Tech', 'Technology', 100, 2500000, current_timestamp())
WHERE NOT EXISTS (
    SELECT 1 FROM {CATALOG_NAME}.{DATABASE_NAME}.organizations WHERE org_id = 'org_001'
)
""")

# Load organizations
organizations_df = spark.table(f"{CATALOG_NAME}.{DATABASE_NAME}.organizations")
organizations_df.show()

In [None]:
# Transform Spark data to Customer.IO groups
def transform_orgs_to_groups(df):
    """Transform organization data to Customer.IO group format."""
    
    # Collect data (in production, process in batches)
    orgs = df.collect()
    
    groups = []
    for org in orgs:
        # Determine organization size category
        emp_count = org['employee_count']
        if emp_count < 50:
            size = "1-50"
        elif emp_count < 200:
            size = "50-200"
        elif emp_count < 500:
            size = "200-500"
        else:
            size = "500+"
        
        group_data = {
            "userId": f"admin_{org['org_id']}",  # Default admin user
            "groupId": org['org_id'],
            "traits": {
                "name": org['org_name'],
                "type": "company",
                "industry": org['industry'],
                "size": size,
                "employee_count": emp_count,
                "annual_revenue": org['annual_revenue'],
                "created_at": org['created_at'].isoformat()
            },
            "timestamp": datetime.now(timezone.utc)
        }
        
        groups.append(group_data)
    
    return groups

# Transform organizations
spark_groups = transform_orgs_to_groups(organizations_df)
print(f"Transformed {len(spark_groups)} organizations to groups")

# Show sample
if spark_groups:
    print("\nSample transformed group:")
    print(json.dumps(spark_groups[0], indent=2, default=str))

In [None]:
# Process transformed groups in batch
if spark_groups:
    # Create batch requests
    spark_batch_requests = batch_create_groups(spark_groups)
    
    # Send batch
    spark_batch_results = send_batch_with_optimization(
        batch_requests=spark_batch_requests,
        test_mode=(ENVIRONMENT == "test")
    )
    
    print("\nSpark groups batch results:")
    for result in spark_batch_results:
        print(f"  Batch {result['batch_id']}: {result['status']} ({result['count']} groups)")
else:
    print("No organizations to process")

## Permission Management
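Wildcard grants such as `users.*` and the catch-all `*` can also be matched with the standard library's `fnmatch.fnmatchcase`. This is a minimal sketch, assuming permission names never contain glob metacharacters such as `?` or `[`. `is_allowed` is a hypothetical helper, not the `PermissionSet` API.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_allowed(granted: Iterable[str], required: str) -> bool:
    """Return True when any granted pattern matches the required permission."""
    # fnmatchcase(name, pattern) is case-sensitive and does no path normalization
    return any(fnmatchcase(required, pattern) for pattern in granted)


print(is_allowed(["users.*"], "users.view"))
print(is_allowed(["users.*"], "reports.view"))
print(is_allowed(["*"], "billing.manage"))
```

A hand-written prefix check gives tighter control over what a pattern may contain; the glob version trades that for brevity.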

In [None]:
# Implementation: Permission system
# Note: this definition shadows the PermissionSet imported from utils.group_manager
class PermissionSet:
    """Manage permissions for different roles."""
    
    # Define default permissions by role
    DEFAULT_PERMISSIONS = {
        RelationshipRole.OWNER: [
            "*"  # All permissions
        ],
        RelationshipRole.ADMIN: [
            "users.manage", "users.view",
            "billing.manage", "billing.view",
            "settings.edit", "settings.view",
            "reports.view", "reports.export"
        ],
        RelationshipRole.MEMBER: [
            "users.view",
            "settings.view",
            "reports.view"
        ],
        RelationshipRole.VIEWER: [
            "reports.view"
        ],
        RelationshipRole.GUEST: []
    }
    
    @classmethod
    def get_permissions(cls, role: RelationshipRole) -> List[str]:
        """Get default permissions for a role."""
        return cls.DEFAULT_PERMISSIONS.get(role, [])
    
    @classmethod
    def has_permission(cls, user_permissions: List[str], required: str) -> bool:
        """Check if user has required permission."""
        if "*" in user_permissions:
            return True
        
        # Check exact match
        if required in user_permissions:
            return True
        
        # Check wildcard permissions (e.g., "users.*" matches "users.view")
        for perm in user_permissions:
            if perm.endswith(".*"):
                prefix = perm[:-2]
                if required.startswith(prefix + "."):
                    return True
        
        return False

# Test permission system
print("Permission system test:")
print(f"Admin permissions: {PermissionSet.get_permissions(RelationshipRole.ADMIN)}")
print(f"Member permissions: {PermissionSet.get_permissions(RelationshipRole.MEMBER)}")

# Test permission checking
admin_perms = PermissionSet.get_permissions(RelationshipRole.ADMIN)
print(f"\nAdmin can manage users: {PermissionSet.has_permission(admin_perms, 'users.manage')}")
print(f"Admin can delete users: {PermissionSet.has_permission(admin_perms, 'users.delete')}")
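
The test above only exercises exact matches. The wildcard branches can be checked with a standalone sketch that mirrors the matching rules of `PermissionSet.has_permission`. The function name `matches_permission` and the sample permission strings are assumptions for illustration and are not part of the notebook's utilities.

```python
from typing import List


def matches_permission(user_permissions: List[str], required: str) -> bool:
    """Standalone mirror of the PermissionSet.has_permission rules (illustrative)."""
    # Global wildcard grants everything
    if "*" in user_permissions:
        return True
    # Exact match
    if required in user_permissions:
        return True
    # Namespace wildcard: "users.*" covers "users.view" but not "userstats.view",
    # because the prefix that must match includes the trailing dot ("users.")
    return any(
        perm.endswith(".*") and required.startswith(perm[:-1])
        for perm in user_permissions
    )


checks = {
    "global wildcard": matches_permission(["*"], "billing.manage"),
    "namespace wildcard": matches_permission(["users.*"], "users.view"),
    "similar prefix": matches_permission(["users.*"], "userstats.view"),
    "no permissions": matches_permission([], "reports.view"),
}
for label, result in checks.items():
    print(f"{label}: {result}")
```

Including the dot in the compared prefix is what keeps a namespace wildcard from leaking into a namespace that merely shares its leading characters.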

In [None]:
# Implementation: Update user permissions
def update_user_permissions(
    user_id: str,
    group_id: str,
    add_permissions: Optional[List[str]] = None,
    remove_permissions: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Build a 'User Permissions Updated' event for a user in a group."""
    
    # In production, fetch current permissions from database
    current_permissions = ["users.view", "settings.view", "reports.view"]
    
    # Apply changes
    new_permissions = set(current_permissions)
    
    if add_permissions:
        new_permissions.update(add_permissions)
    
    if remove_permissions:
        new_permissions.difference_update(remove_permissions)
    
    # Capture one timestamp so the event and its properties agree
    now = datetime.now(timezone.utc)
    
    # Create update event (permissions sorted for deterministic output)
    event_data = {
        "userId": user_id,
        "event": "User Permissions Updated",
        "properties": {
            "group_id": group_id,
            "previous_permissions": current_permissions,
            "new_permissions": sorted(new_permissions),
            "added": add_permissions or [],
            "removed": remove_permissions or [],
            "updated_at": now.isoformat()
        },
        "timestamp": now
    }
    
    return event_data

# Test permission update
permission_update = update_user_permissions(
    user_id="user_eng_003",
    group_id="team_engineering_001",
    add_permissions=["code.write", "pr.create"],
    remove_permissions=[]
)

print("Permission update event:")
print(json.dumps(permission_update, indent=2, default=str))
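
`update_user_permissions` starts from a hard-coded permission list. One way to derive that starting point is to combine a role's defaults with per-user grants and revocations. The sketch below is a minimal illustration under stated assumptions: `ROLE_DEFAULTS`, `effective_permissions`, and the plain-string role names are hypothetical stand-ins, not part of the notebook's utilities or the Customer.IO API.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical role defaults keyed by plain strings (illustrative subset)
ROLE_DEFAULTS: Dict[str, List[str]] = {
    "admin": ["users.manage", "users.view", "reports.view"],
    "member": ["users.view", "reports.view"],
    "guest": [],
}


def effective_permissions(
    role: str,
    granted: Optional[Iterable[str]] = None,
    revoked: Optional[Iterable[str]] = None,
) -> List[str]:
    """Combine role defaults with per-user grants and revocations."""
    permissions = set(ROLE_DEFAULTS.get(role, []))
    permissions.update(granted or [])
    # Revocations are applied last, so they win over grants and defaults
    permissions.difference_update(revoked or [])
    # Sort for stable, comparable output
    return sorted(permissions)


member_perms = effective_permissions(
    "member", granted=["code.write"], revoked=["users.view"]
)
print(member_perms)
```

Applying revocations last matches the order used in `update_user_permissions`, where a permission listed in both the add and remove lists ends up removed.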

## Performance Monitoring and Metrics

In [None]:
# Implementation: Object and relationship metrics
def calculate_org_metrics(
    groups: List[Dict[str, Any]],
    relationships: List[UserGroupRelationship],
    hierarchies: List[ObjectHierarchy]
) -> Dict[str, Any]:
    """Calculate organizational metrics."""
    
    # Count objects by type
    type_counts = {}
    for group in groups:
        obj_type = group.get("traits", {}).get("type", "unknown")
        type_counts[obj_type] = type_counts.get(obj_type, 0) + 1
    
    # Count relationships by role (string keys keep the result JSON-serializable)
    role_counts = {}
    active_relationships = 0
    for rel in relationships:
        role_key = getattr(rel.role, "value", rel.role)
        role_counts[role_key] = role_counts.get(role_key, 0) + 1
        if rel.status == RelationshipStatus.ACTIVE:
            active_relationships += 1
    
    # Calculate hierarchy depth (0 when there are no hierarchies)
    max_depth = max((h.level for h in hierarchies), default=0)
    
    metrics = {
        "total_objects": len(groups),
        "objects_by_type": type_counts,
        "total_relationships": len(relationships),
        "active_relationships": active_relationships,
        "relationships_by_role": role_counts,
        "hierarchy_max_depth": max_depth,
        "total_hierarchies": len(hierarchies)
    }
    
    return metrics

# Calculate metrics
all_groups = [group_data, team_group_data] + departments + spark_groups
metrics = calculate_org_metrics(
    groups=all_groups,
    relationships=all_relationships,
    hierarchies=additional_hierarchies
)

print("=== Organization Metrics ===")
print(json.dumps(metrics, indent=2))
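
The role tally in `calculate_org_metrics` can also be written with `collections.Counter`. The sketch below is illustrative only: `Role` is a hypothetical stand-in for the notebook's `RelationshipRole`, and `count_by_role` is not an existing helper. It keys the counts by each role's string value, which keeps the result serializable with `json.dumps` regardless of how the enum is declared.

```python
import json
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class Role(str, Enum):
    """Hypothetical stand-in for the notebook's RelationshipRole."""
    ADMIN = "admin"
    MEMBER = "member"


def count_by_role(roles: Iterable[Role]) -> Dict[str, int]:
    """Tally roles under their string values so the result is JSON-safe."""
    return dict(Counter(role.value for role in roles))


role_counts = count_by_role([Role.ADMIN, Role.MEMBER, Role.MEMBER])
print(json.dumps(role_counts, indent=2, sort_keys=True))
```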

## Clean Up and Summary

In [None]:
# Final summary
print("=== Objects and Relationships Summary ===")

print("\n=== Objects Created ===")
print("SUCCESS: Company organization (TechCorp Solutions)")
print("SUCCESS: Engineering team with hierarchy")
print("SUCCESS: Multiple departments (Sales, Marketing, Product)")
print("SUCCESS: Organizations from Spark data")

print("\n=== Relationships Established ===")
print("SUCCESS: User-group relationships with roles")
print("SUCCESS: Hierarchical parent-child relationships")
print("SUCCESS: Permission-based access control")
print("SUCCESS: Status management (active, suspended, etc.)")

print("\n=== Key Capabilities Demonstrated ===")
print("SUCCESS: Type-safe object creation with validation")
print("SUCCESS: Complex relationship management")
print("SUCCESS: Hierarchical organization structures")
print("SUCCESS: Permission-based access control")
print("SUCCESS: Batch operations with optimization")
print("SUCCESS: Data transformation from Spark")
print("SUCCESS: Organizational metrics and analytics")

In [None]:
# Close the API client connection
client.close()
print("SUCCESS: API client connection closed")

print("\nCOMPLETED: Objects and relationships notebook finished successfully!")
print("Ready for device management operations in the next notebook.")