# Lakehouse Agent - Prerequisites Setup

This notebook helps you set up the initial configuration in AWS Systems Manager (SSM) Parameter Store.

**What this notebook does:**
- Validates your AWS credentials and region
- Reads AWS account credentials from a `.env` file in the current directory
- Creates S3 bucket for lakehouse data storage
- Creates initial SSM parameters with the `/app/lakehouse-agent/` prefix
- Validates the configuration

**Prerequisites:**
- AWS credentials configured (via the AWS CLI, environment variables, or a `.env` file in the working directory)
- Python 3.10 or later
- boto3 installed: `pip install boto3`

**IAM Permissions Required:**
- `ssm:PutParameter`
- `ssm:GetParameter`
- `sts:GetCallerIdentity`
- `s3:CreateBucket`
- `s3:ListBucket` (authorizes the `head_bucket` existence check; there is no separate `s3:HeadBucket` IAM action)
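
The permissions above can be expressed as a scoped IAM policy. The following is a minimal sketch, not the project's official policy: the function name `build_policy`, the statement IDs, and the placeholder account ID are illustrative assumptions, and the ARNs assume the standard `aws` partition. The `head_bucket` call is authorized by `s3:ListBucket`, so that action is used here.

```python
import json


def build_policy(account_id: str, region: str, bucket_name: str) -> dict:
    """Build an IAM policy document scoped to this notebook's resources (hypothetical helper)."""
    # All parameters created by the notebook live under this prefix.
    param_arn = f"arn:aws:ssm:{region}:{account_id}:parameter/app/lakehouse-agent/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "LakehouseSsmParameters",
                "Effect": "Allow",
                "Action": ["ssm:PutParameter", "ssm:GetParameter"],
                "Resource": param_arn,
            },
            {
                "Sid": "LakehouseBucket",
                "Effect": "Allow",
                # s3:ListBucket covers the HeadBucket existence check.
                "Action": ["s3:CreateBucket", "s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                "Sid": "CallerIdentity",
                "Effect": "Allow",
                "Action": "sts:GetCallerIdentity",
                "Resource": "*",
            },
        ],
    }


# Placeholder values for illustration only.
policy = build_policy("123456789012", "us-east-1", "123456789012-us-east-1-lk-agent")
print(json.dumps(policy, indent=2))
```

Scoping the SSM statement to the `/app/lakehouse-agent/` prefix keeps the notebook from touching unrelated parameters in the same account.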

In [None]:
import boto3
import json
from datetime import datetime

print("✅ Imports successful")

## Ensure you have a `.env` file in the current working directory with your AWS credentials:
AWS_ACCESS_KEY_ID="your aws key"
AWS_SECRET_ACCESS_KEY="your aws secret"
AWS_SESSION_TOKEN="session token"
AWS_DEFAULT_REGION="your preferred region"
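
To check a `.env` file before running the next cell, a small standard-library parser can help. This is only a sketch: the helper names `parse_env` and `missing_keys` are hypothetical, and the notebook's own `init_aws` helper may load the file differently. `AWS_SESSION_TOKEN` is treated as optional here because it is only needed for temporary credentials.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# Keys assumed to be required for long-lived credentials.
REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_keys(env: dict) -> list:
    """Return required keys that are absent or empty."""
    return [key for key in REQUIRED if not env.get(key)]


sample = 'AWS_ACCESS_KEY_ID="AKIAEXAMPLE"\nAWS_DEFAULT_REGION="us-east-1"\n'
print(missing_keys(parse_env(sample)))  # ['AWS_SECRET_ACCESS_KEY']
```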

In [None]:
# Load AWS credentials and initialize session
from utils.notebook_init import init_aws

# This will:
# 1. Load credentials from .env file (if it exists)
# 2. Create and validate AWS session
# 3. Return session, region, and account_id for use in this notebook
session, region, account_id = init_aws()

# Initialize AWS clients with the validated session
ssm_client = session.client('ssm', region_name=region)
sts_client = session.client('sts', region_name=region)

# Store for later use
AWS_REGION = region
AWS_ACCOUNT_ID = account_id

print('\n✅ Setup complete')
print(f'   Account ID: {account_id}')
print(f'   Region: {region}')

## Step 1: Define Initial Configuration

Set your initial configuration values. These will be stored in SSM Parameter Store with the `/app/lakehouse-agent/` prefix.

**Important Notes:**
- **AWS_REGION and AWS_ACCOUNT_ID** are auto-detected and NOT stored in SSM
- **S3_BUCKET_NAME**: Provide just the base name (e.g., `lk-agent`)
  - This notebook will create the S3 bucket with full name: `{account_id}-{region}-{base_name}`
  - The full bucket name will be saved to SSM for all subsequent notebooks
  - Example: `XXXXXXXXXXXX-us-east-1-lk-agent`
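
Because the account ID and region are prepended, the base name has less room than the 63-character S3 limit suggests. The sketch below builds the full name and checks only the basic naming rules (length, allowed characters, first and last character); the function name `full_bucket_name` is a hypothetical helper, and rules such as "no IP-address format" are not covered.

```python
import re

# Basic S3 rules: 3-63 characters; lowercase letters, digits, dots, hyphens;
# must begin and end with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def full_bucket_name(account_id: str, region: str, base_name: str) -> str:
    """Compose {account_id}-{region}-{base_name} and validate the result."""
    name = f"{account_id}-{region}-{base_name}"
    if not _BUCKET_RE.fullmatch(name):
        raise ValueError(f"Invalid S3 bucket name ({len(name)} chars): {name}")
    return name


# Placeholder account ID for illustration only.
print(full_bucket_name("123456789012", "us-east-1", "lk-agent"))
```

With a 12-digit account ID and `us-east-1`, the prefix and hyphens take 23 characters, which leaves at most 40 for the base name.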

In [None]:
# Initial configuration - UPDATE THESE VALUES
config = {
    # S3 Configuration
    # NOTE: Provide just the base name (e.g., 'lk-agent')
    # This notebook (Step 2.5) will create a bucket with the full name:
    # {account_id}-{region}-{base_name}
    # Example: XXXXXXXXXXXX-us-east-1-lk-agent
    'S3_BUCKET_NAME': 'lk-agent',  # CHANGE THIS - use a unique base name
    'S3_CLAIMS_PREFIX': 'lakehouse-data/claims/',
    'S3_USERS_PREFIX': 'lakehouse-data/users/',
    'S3_ATHENA_RESULTS_PREFIX': 'athena-results/',
    
    # Athena Configuration
    'DATABASE_NAME': 'lakehouse_db',
    'ATHENA_WORKGROUP': 'primary',
    
    # Security Configuration
    'SECURITY_MODE': 'lakeformation',
    'LOCAL_DEVELOPMENT': 'false',
    'LOG_LEVEL': 'INFO',
    
    # Test Users
    'TEST_USER_1': 'user001@example.com',
    'TEST_USER_2': 'user002@example.com',
    'TEST_USER_3': 'adjuster001@example.com',
    'TEST_PASSWORD': 'TempPass123!'
}

print("📋 Initial Configuration:")
for key, value in config.items():
    print(f"   {key}: {value}")
    if key == 'S3_BUCKET_NAME':
        print(f"      → Full bucket name will be: {account_id}-{region}-{value}")

## Step 2: Create SSM Parameters

This step creates one SSM parameter per `config` entry under the `/app/lakehouse-agent/` prefix. Parameters that already exist are skipped, not overwritten.

**Sensitive parameters** (names containing SECRET, PASSWORD, KEY, or TOKEN) are created as `SecureString`.

In [None]:
def is_sensitive(key):
    """Check if parameter should be SecureString"""
    sensitive_keywords = ['SECRET', 'PASSWORD', 'KEY', 'TOKEN']
    return any(keyword in key.upper() for keyword in sensitive_keywords)

def create_ssm_parameter(key, value, overwrite=False):
    """Create or update SSM parameter"""
    # Convert to SSM parameter name (lowercase with /app/lakehouse-agent/ prefix)
    # Convert underscores to hyphens for consistency
    param_name = f"/app/lakehouse-agent/{key.lower().replace('_', '-')}"
    param_type = 'SecureString' if is_sensitive(key) else 'String'
    
    try:
        ssm_client.put_parameter(
            Name=param_name,
            Value=str(value),
            Type=param_type,
            Description=f"Lakehouse Agent - {key}",
            Overwrite=overwrite
        )
        return True, param_type
    except ssm_client.exceptions.ParameterAlreadyExists:
        return False, param_type
    except Exception as e:
        print(f"❌ Error creating {param_name}: {e}")
        return None, param_type

# Create parameters
print("🔄 Creating SSM Parameters...\n")
created = 0
skipped = 0
failed = 0

for key, value in config.items():
    result, param_type = create_ssm_parameter(key, value, overwrite=False)
    param_name = f"/app/lakehouse-agent/{key.lower().replace('_', '-')}"
    
    if result is True:
        print(f"✅ Created {param_name} ({param_type})")
        created += 1
    elif result is False:
        print(f"⏭️  Skipped {param_name} (already exists)")
        skipped += 1
    else:
        failed += 1

print("\n📊 Summary:")
print(f"   Created: {created}")
print(f"   Skipped: {skipped}")
print(f"   Failed: {failed}")
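
To preview which parameter names and types the cell above produces without calling AWS, the same mapping can be run on its own. This sketch mirrors the naming and keyword logic of `create_ssm_parameter` and `is_sensitive`; the function name `plan_parameter` is a hypothetical helper, not part of the notebook.

```python
PREFIX = "/app/lakehouse-agent/"
SENSITIVE_KEYWORDS = ("SECRET", "PASSWORD", "KEY", "TOKEN")


def plan_parameter(key: str) -> tuple:
    """Return (parameter name, parameter type) for a config key."""
    # Lowercase the key and swap underscores for hyphens.
    name = PREFIX + key.lower().replace("_", "-")
    # Substring match: any key containing a keyword becomes SecureString.
    sensitive = any(word in key.upper() for word in SENSITIVE_KEYWORDS)
    return name, "SecureString" if sensitive else "String"


for key in ("S3_BUCKET_NAME", "ATHENA_WORKGROUP", "TEST_PASSWORD"):
    print(plan_parameter(key))
```

Because the check is a substring match, a key such as `PRIMARY_KEYSPACE` (a made-up example) would also be stored as `SecureString`, so choose key names with that in mind.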

## Step 2.5: Create S3 Bucket

Create the S3 bucket that will be used for all lakehouse data storage. The bucket will be created with the full name format: `{account_id}-{region}-{base_name}` and the full name will be saved to SSM.

In [None]:
# Create S3 bucket with full name
bucket_base_name = config['S3_BUCKET_NAME']
full_bucket_name = f"{account_id}-{region}-{bucket_base_name}"

print(f"📦 Creating S3 bucket: {full_bucket_name}\n")

# Initialize S3 client
s3_client = session.client('s3', region_name=region)

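from botocore.exceptions import ClientError

try:
    # Check if bucket already exists and is accessible
    s3_client.head_bucket(Bucket=full_bucket_name)
    print(f"✅ Bucket {full_bucket_name} already exists")
    bucket_existed = True
except ClientError as head_err:
    # head_bucket reports a missing bucket as a 404. Anything else
    # (e.g. 403 for a bucket owned by another account) is a real error.
    if head_err.response['Error']['Code'] not in ('404', 'NoSuchBucket'):
        print(f"❌ Error checking bucket: {head_err}")
        raise
    # Bucket doesn't exist, create it
    try:
        if region == 'us-east-1':
            # us-east-1 does not accept an explicit LocationConstraint
            s3_client.create_bucket(Bucket=full_bucket_name)
        else:
            s3_client.create_bucket(
                Bucket=full_bucket_name,
                CreateBucketConfiguration={'LocationConstraint': region}
            )
        print(f"✅ Created S3 bucket: {full_bucket_name}")
        bucket_existed = False
    except Exception as e:
        print(f"❌ Error creating bucket: {e}")
        raise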

# Update SSM parameter with the full bucket name
print("\n💾 Saving full bucket name to SSM...")
try:
    ssm_client.put_parameter(
        Name='/app/lakehouse-agent/s3-bucket-name',
        Value=full_bucket_name,
        Type='String',
        Description='S3 bucket name for lakehouse data storage (full name)',
        Overwrite=True
    )
    print(f"✅ Updated SSM parameter /app/lakehouse-agent/s3-bucket-name")
    print(f"   Value: {full_bucket_name}")
except Exception as e:
    print(f"❌ Error updating SSM parameter: {e}")
    raise

print(f"\n✅ S3 bucket setup complete!")
print(f"   Bucket: s3://{full_bucket_name}")
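
The region branch in the cell above exists because `create_bucket` must omit `CreateBucketConfiguration` in `us-east-1` and must include a `LocationConstraint` everywhere else. One way to keep that rule in a single place is to build the keyword arguments separately; the helper name `create_bucket_kwargs` is a hypothetical choice for this sketch.

```python
def create_bucket_kwargs(bucket_name: str, region: str) -> dict:
    """Build keyword arguments for s3_client.create_bucket for a given region."""
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        # Every region other than us-east-1 needs an explicit constraint.
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


# Usage with the notebook's client (not executed here):
# s3_client.create_bucket(**create_bucket_kwargs(full_bucket_name, region))
print(create_bucket_kwargs("example-bucket", "eu-west-1"))
```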

## Step 3: Validate Configuration

Let's verify that every parameter in `config` now exists in SSM. The S3 bucket itself was already checked and created in Step 2.5.

In [None]:
def validate_ssm_parameters():
    """Validate all required parameters exist in SSM"""
    print("🔍 Validating SSM Parameters...\n")
    
    missing = []
    found = []
    
    for key in config.keys():
        param_name = f"/app/lakehouse-agent/{key.lower().replace('_', '-')}"
        try:
            response = ssm_client.get_parameter(Name=param_name)
            param_type = response['Parameter']['Type']
            
            if param_type == 'SecureString':
                value = '****** (encrypted)'
            else:
                value = response['Parameter']['Value']
            
            print(f"✅ {param_name}: {value}")
            found.append(param_name)
        except ssm_client.exceptions.ParameterNotFound:
            print(f"❌ {param_name}: NOT FOUND")
            missing.append(param_name)
    
    print(f"\n📊 Validation Summary:")
    print(f"   Found: {len(found)}")
    print(f"   Missing: {len(missing)}")
    
    if missing:
        print(f"\n⚠️  Missing parameters: {', '.join(missing)}")
        return False
    else:
        print(f"\n✅ All parameters validated successfully!")
        return True

validate_ssm_parameters()
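
The validation above issues one `get_parameter` call per key. As an alternative, the whole prefix can be listed in a few paginated calls. This is a sketch under two assumptions: the helper name `find_missing` is hypothetical, and the caller has the additional `ssm:GetParametersByPath` permission, which is not in the list of required permissions for this notebook.

```python
def find_missing(ssm_client, expected_keys, prefix="/app/lakehouse-agent/"):
    """Return expected parameter names that are absent under the prefix."""
    expected = {prefix + key.lower().replace("_", "-") for key in expected_keys}

    # get_parameters_by_path is paginated, so walk every page.
    paginator = ssm_client.get_paginator("get_parameters_by_path")
    found = set()
    for page in paginator.paginate(Path=prefix.rstrip("/"), Recursive=True):
        found.update(param["Name"] for param in page["Parameters"])

    return sorted(expected - found)


# Usage with the notebook's client and config (not executed here):
# missing = find_missing(ssm_client, config.keys())
```

Values are not decrypted in this call, so `SecureString` contents stay encrypted while their names are still listed.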

## Next Steps

✅ **Prerequisites Complete!**

Your configuration is ready:
- ✅ SSM Parameter Store configured
- ✅ S3 bucket created: `{full_bucket_name}`
- ✅ All parameters validated

**Next:** Run `01-deploy-athena.ipynb` to create the Athena database and tables.

The Athena deployment will automatically use the S3 bucket created in this notebook.