# 📋 AWS IAM Policy Generator: Dataset Preparation
**Project**: Fine-tuning Mistral-7B to convert natural language → valid AWS IAM JSON policies  
**Author**: Vatsal Naik | Northeastern University  

## This notebook handles the complete dataset preparation pipeline:
1. **Source 1**: Pull ~1,400 AWS managed policies via `boto3` API
2. **Source 2**: 71 hand-crafted synthetic examples covering diverse AWS services and writing styles
3. **Validation**: JSON structure validation, IAM schema verification, deduplication
4. **Processing**: Stratified train/val/test split (80/10/10), Alpaca instruction formatting
5. **Final Dataset**: 1,488 validated examples → 1,189 train / 148 val / 151 test
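
As a rough illustration of step 4, the sketch below groups records by their `complexity` tag, splits each group 80/10/10, and renders Alpaca-style records. The function names `stratified_split` and `to_alpaca`, the fixed seed, and the choice of `complexity` as the stratification key are assumptions made for illustration; the notebook's actual split code may differ.

```python
import random
from collections import defaultdict

def stratified_split(records, key="complexity", ratios=(0.8, 0.1, 0.1), seed=42):
    """Split records into train/val/test while keeping each stratum's proportions."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)

    train, val, test = [], [], []
    for _, group in sorted(strata.items()):
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        train.extend(group[:n_train])
        val.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])  # whatever is left goes to test
    return train, val, test

def to_alpaca(record):
    """Render one record in the Alpaca instruction format (empty input field)."""
    return {
        "instruction": record["instruction"],
        "input": "",
        "output": record["output"],
    }

# Toy data (hypothetical): 10 records per complexity level
toy = [
    {"instruction": f"policy {level} {i}", "output": "{}", "complexity": level}
    for level in ("simple", "medium", "complex")
    for i in range(10)
]
train, val, test = stratified_split(toy)
print(len(train), len(val), len(test))  # 24 3 3
print(to_alpaca(train[0]))
```

Because each stratum is floored independently and the remainder is sent to the test set, the three splits will not always land on exact 80/10/10 boundaries.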

## 1. Environment Setup

In [2]:
import os
os.makedirs("iam-finetuning/dataset/raw", exist_ok=True)
os.makedirs("iam-finetuning/dataset/processed", exist_ok=True)
os.makedirs("iam-finetuning/dataset/synthetic_batches", exist_ok=True)
os.makedirs("iam-finetuning/dataset/scripts", exist_ok=True)
print("Folders created!")

Folders created!


## 2. AWS Configuration

Credentials are loaded from `~/.aws/credentials` (configured via `aws configure`). **Never hardcode credentials in notebooks.**
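
To make this advice concrete, here is a small sketch that reports where credentials would likely come from without printing any secret values. It relies on boto3 consulting environment variables before the shared credentials file. The helper name `credential_source` is hypothetical, and the check is deliberately simplified: it ignores other providers such as SSO profiles, instance roles, and the `AWS_SHARED_CREDENTIALS_FILE` override.

```python
import os
from pathlib import Path

def credential_source(env=None, credentials_path=None):
    """Return a label for where AWS credentials would likely be read from.

    Simplified sketch: checks only environment variables and the shared
    credentials file, in that order. Never returns or prints secret values.
    """
    env = os.environ if env is None else env
    if credentials_path is None:
        credentials_path = Path.home() / ".aws" / "credentials"

    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"
    if Path(credentials_path).is_file():
        return "shared-credentials-file"
    return "none"

print(f"Credentials would be read from: {credential_source()}")
```

A result of `"environment"` in a notebook that is expected to use `aws configure` is a hint that something has exported keys into the process environment.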

In [None]:
import boto3
import os

# Credentials are loaded from ~/.aws/credentials.
# Run 'aws configure' in a terminal to set them up securely.
# Do not write placeholder values into AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY here:
# environment variables take precedence over the credentials file in boto3.
os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'

iam = boto3.client('iam')
test = iam.list_policies(Scope='AWS', MaxItems=2)
print(f"AWS connection works! Found: {[p['PolicyName'] for p in test['Policies']]}")

## 3. Source 1: Pull AWS Managed Policies (~1,400 examples)

AWS provides managed policies with names like `AmazonS3ReadOnlyAccess`.
We:
1. Convert the name to a natural language instruction
2. Download the actual JSON policy document
3. Tag each with complexity (simple/medium/complex) based on statement count and service diversity

In [6]:
import boto3
import json
import time
import re

def name_to_description(policy_name):
    """Convert policy names like 'AmazonS3ReadOnlyAccess' into readable descriptions."""
    # Split CamelCase: AmazonS3ReadOnlyAccess -> Amazon S3 Read Only Access
    # (digits also end a word, so "S3Read" becomes "S3 Read")
    words = re.sub(r'([a-z0-9])([A-Z])', r'\1 \2', policy_name)
    words = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1 \2', words)
    # Clean up common patterns
    words = words.replace('_', ' ').replace('-', ' ')
    return words.strip()

def pull_all_managed_policies():
    iam = boto3.client('iam')
    all_policies = []
    marker = None

    print("Listing all AWS managed policies...")
    while True:
        if marker:
            response = iam.list_policies(Scope='AWS', MaxItems=100, Marker=marker)
        else:
            response = iam.list_policies(Scope='AWS', MaxItems=100)
        all_policies.extend(response['Policies'])
        if response['IsTruncated']:
            marker = response['Marker']
        else:
            break

    print(f"Found {len(all_policies)} AWS managed policies")

    results = []
    skipped = 0

    for i, policy in enumerate(all_policies):
        policy_name = policy['PolicyName']
        
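        # Use Description if available, otherwise convert the policy name.
        # Note: list_policies does not return Description, so in practice
        # this falls back to the name-derived text.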
        description = policy.get('Description', '').strip()
        if len(description) < 10:
            description = name_to_description(policy_name)
        
        # Skip if still too short or generic
        if len(description) < 5:
            skipped += 1
            continue

        try:
            version_id = policy['DefaultVersionId']
            version_response = iam.get_policy_version(
                PolicyArn=policy['Arn'],
                VersionId=version_id
            )
            policy_document = version_response['PolicyVersion']['Document']
            policy_json = json.dumps(policy_document, indent=2)

            statements = policy_document.get('Statement', [])
            if isinstance(statements, dict):
                statements = [statements]

            # Skip extremely large policies (more than 50 statements)
            if len(statements) > 50:
                skipped += 1
                continue

            services = set()
            for stmt in statements:
                actions = stmt.get('Action', stmt.get('NotAction', []))
                if isinstance(actions, str):
                    actions = [actions]
                for action in actions:
                    if ':' in action:
                        services.add(action.split(':')[0].lower())

            num_stmts = len(statements)
            num_svcs = len(services)
            if num_stmts <= 1 and num_svcs <= 1:
                complexity = 'simple'
            elif num_stmts <= 3 and num_svcs <= 3:
                complexity = 'medium'
            else:
                complexity = 'complex'

            # Build the instruction from the description (or its name-derived fallback)
            instruction = f"Provide an IAM policy for {description}"
            
            results.append({
                'id': f'aws_managed_{i:04d}',
                'policy_name': policy_name,
                'instruction': instruction,
                'output': policy_json,
                'source': 'aws_managed',
                'services': sorted(list(services)),
                'complexity': complexity
            })

            if (i + 1) % 200 == 0:
                print(f"  Processed {i+1}/{len(all_policies)} ({len(results)} usable)")
            time.sleep(0.05)

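        except Exception as e:
            # Report the failure instead of silently dropping the policy
            print(f"  Failed to fetch {policy_name}: {e}")
            skipped += 1
            continue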

    print(f"\nDone! {len(results)} usable policies (skipped {skipped})")
    
    with open('iam-finetuning/dataset/raw/aws_managed_policies.json', 'w') as f:
        json.dump(results, f, indent=2)
    
    print(f"Saved to iam-finetuning/dataset/raw/aws_managed_policies.json")
    
    # Stats
    simple = sum(1 for r in results if r['complexity'] == 'simple')
    medium = sum(1 for r in results if r['complexity'] == 'medium')
    hard = sum(1 for r in results if r['complexity'] == 'complex')
    print(f"Simple: {simple}, Medium: {medium}, Complex: {hard}")
    
    # Show samples
    if results:
        print(f"\nSample entries:")
        for r in results[:3]:
            print(f"  {r['policy_name']} -> \"{r['instruction'][:80]}...\"")
    
    return results

managed_policies = pull_all_managed_policies()

Listing all AWS managed policies...
Found 1445 AWS managed policies
  Processed 200/1445 (200 usable)
  Processed 400/1445 (400 usable)
  Processed 600/1445 (599 usable)
  Processed 800/1445 (799 usable)
  Processed 1000/1445 (999 usable)
  Processed 1200/1445 (1198 usable)
  Processed 1400/1445 (1392 usable)

Done! 1437 usable policies (skipped 8)
Saved to iam-finetuning/dataset/raw/aws_managed_policies.json
Simple: 397, Medium: 365, Complex: 675

Sample entries:
  AdministratorAccess -> "Provide an IAM policy for Administrator Access..."
  PowerUserAccess -> "Provide an IAM policy for Power User Access..."
  ReadOnlyAccess -> "Provide an IAM policy for Read Only Access..."
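
The tagging rule inside `pull_all_managed_policies` can be isolated into a pure function so it can be checked without calling AWS. The sketch below mirrors the thresholds used in the cell above; the function name `classify_complexity` and the sample documents are hypothetical.

```python
def classify_complexity(policy_document):
    """Tag a policy as simple/medium/complex from statement and service counts."""
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    services = set()
    for stmt in statements:
        actions = stmt.get("Action", stmt.get("NotAction", []))
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            if ":" in action:  # "s3:GetObject" -> "s3"
                services.add(action.split(":")[0].lower())

    if len(statements) <= 1 and len(services) <= 1:
        return "simple", sorted(services)
    if len(statements) <= 3 and len(services) <= 3:
        return "medium", sorted(services)
    return "complex", sorted(services)

read_only = {
    "Version": "2012-10-17",
    "Statement": {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"},
}
print(classify_complexity(read_only))  # ('simple', ['s3'])
```

One consequence of this rule is that a policy whose only action is the bare wildcard `*` has no service prefix to count, so it is tagged `simple` even though it grants very broad access.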


## 4. Source 2: Hand-Crafted Synthetic Examples (71 examples)

We created 71 diverse synthetic examples to cover scenarios not well-represented in managed policies:

- **Diverse writing styles**: casual (`"s3 read only, all buckets"`), professional, detailed requirements
- **Wide service coverage**: S3, EC2, Lambda, DynamoDB, CloudWatch, IAM, STS, KMS, SNS, SQS, and multi-service
- **Policy types**: Allow-only, Deny-only, mixed, conditional (MFA, IP, VPC, tags, time-based)
- **Complexity range**: Simple (1 statement) → Complex (4+ statements, conditions)
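
Each synthetic `output` is a JSON string, so a structural check can run before the examples are merged with the managed policies. The following is a minimal sketch of such a check plus simple deduplication. The function names and the specific rules are assumptions about what validation of identity-based policies might cover, not the notebook's actual validator. In particular, it treats `Version` as required, which is stricter than IAM itself, and it deduplicates on the pair of instruction and normalized policy.

```python
import json

def validate_policy(policy_json):
    """Return a list of problems found in an IAM policy JSON string (empty if none)."""
    try:
        doc = json.loads(policy_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["policy must be a JSON object"]

    problems = []
    if doc.get("Version") not in ("2012-10-17", "2008-10-17"):
        problems.append("missing or unknown Version")

    statements = doc.get("Statement")
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    if not isinstance(statements, list) or not statements:
        return problems + ["Statement must be a non-empty object or list"]

    for i, stmt in enumerate(statements):
        if not isinstance(stmt, dict):
            problems.append(f"statement {i}: must be an object")
            continue
        if stmt.get("Effect") not in ("Allow", "Deny"):
            problems.append(f"statement {i}: Effect must be Allow or Deny")
        if "Action" not in stmt and "NotAction" not in stmt:
            problems.append(f"statement {i}: needs Action or NotAction")
        if "Resource" not in stmt and "NotResource" not in stmt:
            problems.append(f"statement {i}: needs Resource or NotResource")
    return problems

def deduplicate(examples):
    """Drop examples that repeat an (instruction, normalized policy) pair."""
    seen, unique = set(), []
    for ex in examples:
        policy_key = json.dumps(json.loads(ex["output"]), sort_keys=True)
        key = (ex["instruction"].strip().lower(), policy_key)
        if key not in seen:
            seen.add(key)
            unique.append(ex)
    return unique

good = '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"}]}'
bad = '{"Version": "2012-10-17", "Statement": [{"Effect": "Permit", "Action": "s3:GetObject"}]}'
print(validate_policy(good))  # []
print(validate_policy(bad))
```

Normalizing with `sort_keys=True` before comparing means two policies that differ only in whitespace or key order count as duplicates.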

In [7]:
# All 71 pre-validated synthetic examples
synthetic_data = [
  # ===== S3 POLICIES =====
  {"id":"syn_s3_001","instruction":"Create a policy that gives read-only access to all S3 buckets in the account","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:GetObject\",\n        \"s3:GetObjectVersion\",\n        \"s3:ListBucket\",\n        \"s3:ListAllMyBuckets\",\n        \"s3:GetBucketLocation\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["s3"],"complexity":"simple"},
  {"id":"syn_s3_002","instruction":"Allow a user to upload files to the S3 bucket named data-uploads but not delete anything","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:PutObject\",\n        \"s3:ListBucket\",\n        \"s3:GetBucketLocation\"\n      ],\n      \"Resource\": [\n        \"arn:aws:s3:::data-uploads\",\n        \"arn:aws:s3:::data-uploads/*\"\n      ]\n    },\n    {\n      \"Effect\": \"Deny\",\n      \"Action\": [\n        \"s3:DeleteObject\",\n        \"s3:DeleteObjectVersion\"\n      ],\n      \"Resource\": \"arn:aws:s3:::data-uploads/*\"\n    }\n  ]\n}","source":"synthetic","services":["s3"],"complexity":"medium"},
  {"id":"syn_s3_003","instruction":"I need a policy for our analytics team. They should be able to read from the bucket analytics-prod but only objects under the prefix reports/2024/. They also need to list the bucket contents.","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"s3:ListBucket\",\n      \"Resource\": \"arn:aws:s3:::analytics-prod\",\n      \"Condition\": {\n        \"StringLike\": {\n          \"s3:prefix\": \"reports/2024/*\"\n        }\n      }\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"s3:GetObject\",\n      \"Resource\": \"arn:aws:s3:::analytics-prod/reports/2024/*\"\n    }\n  ]\n}","source":"synthetic","services":["s3"],"complexity":"medium"},
  {"id":"syn_s3_004","instruction":"Full access to S3 bucket my-app-bucket including ability to manage bucket policies and ACLs","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"s3:*\",\n      \"Resource\": [\n        \"arn:aws:s3:::my-app-bucket\",\n        \"arn:aws:s3:::my-app-bucket/*\"\n      ]\n    }\n  ]\n}","source":"synthetic","services":["s3"],"complexity":"simple"},
  {"id":"syn_s3_005","instruction":"Allow a Lambda function to read objects from the input-data bucket and write processed results to the output-data bucket","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:GetObject\",\n        \"s3:ListBucket\"\n      ],\n      \"Resource\": [\n        \"arn:aws:s3:::input-data\",\n        \"arn:aws:s3:::input-data/*\"\n      ]\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:PutObject\",\n        \"s3:PutObjectAcl\"\n      ],\n      \"Resource\": \"arn:aws:s3:::output-data/*\"\n    }\n  ]\n}","source":"synthetic","services":["s3"],"complexity":"medium"},
  {"id":"syn_s3_006","instruction":"Deny all S3 delete operations across all buckets in the account. This is for our compliance team's protection policy.","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Deny\",\n      \"Action\": [\n        \"s3:DeleteObject\",\n        \"s3:DeleteObjectVersion\",\n        \"s3:DeleteBucket\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["s3"],"complexity":"simple"},
  {"id":"syn_s3_007","instruction":"Create a policy that allows users to manage their own folder in the shared-workspace bucket. Each user should only access objects under the prefix matching their username using the aws:username variable.","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"s3:ListBucket\",\n      \"Resource\": \"arn:aws:s3:::shared-workspace\",\n      \"Condition\": {\n        \"StringLike\": {\n          \"s3:prefix\": [\"${aws:username}/*\"]\n        }\n      }\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:GetObject\",\n        \"s3:PutObject\",\n        \"s3:DeleteObject\"\n      ],\n      \"Resource\": \"arn:aws:s3:::shared-workspace/${aws:username}/*\"\n    }\n  ]\n}","source":"synthetic","services":["s3"],"complexity":"complex"},
  {"id":"syn_s3_008","instruction":"Allow access to S3 but only when the request comes from our corporate VPC endpoint vpce-1a2b3c4d","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:GetObject\",\n        \"s3:PutObject\",\n        \"s3:ListBucket\"\n      ],\n      \"Resource\": \"*\",\n      \"Condition\": {\n        \"StringEquals\": {\n          \"aws:sourceVpce\": \"vpce-1a2b3c4d\"\n        }\n      }\n    }\n  ]\n}","source":"synthetic","services":["s3"],"complexity":"medium"},
  {"id":"syn_s3_009","instruction":"Grant permission to list all buckets and get the location of any bucket, but don't allow reading actual objects","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:ListAllMyBuckets\",\n        \"s3:GetBucketLocation\",\n        \"s3:ListBucket\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["s3"],"complexity":"simple"},
  {"id":"syn_s3_010","instruction":"Allow S3 read access only if the request uses SSL/TLS. Deny any unencrypted requests.","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:GetObject\",\n        \"s3:ListBucket\"\n      ],\n      \"Resource\": \"*\",\n      \"Condition\": {\n        \"Bool\": {\n          \"aws:SecureTransport\": \"true\"\n        }\n      }\n    },\n    {\n      \"Effect\": \"Deny\",\n      \"Action\": \"s3:*\",\n      \"Resource\": \"*\",\n      \"Condition\": {\n        \"Bool\": {\n          \"aws:SecureTransport\": \"false\"\n        }\n      }\n    }\n  ]\n}","source":"synthetic","services":["s3"],"complexity":"complex"},
  {"id":"syn_s3_011","instruction":"Policy for a backup service that can read from production-db-backups bucket and replicate to disaster-recovery-bucket in a different region","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:GetObject\",\n        \"s3:GetObjectVersion\",\n        \"s3:ListBucket\"\n      ],\n      \"Resource\": [\n        \"arn:aws:s3:::production-db-backups\",\n        \"arn:aws:s3:::production-db-backups/*\"\n      ]\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:PutObject\",\n        \"s3:PutObjectAcl\",\n        \"s3:ReplicateObject\",\n        \"s3:ReplicateDelete\"\n      ],\n      \"Resource\": \"arn:aws:s3:::disaster-recovery-bucket/*\"\n    }\n  ]\n}","source":"synthetic","services":["s3"],"complexity":"medium"},
  {"id":"syn_s3_012","instruction":"Allow uploading objects to the logs bucket only if the objects are server-side encrypted with AWS KMS","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"s3:PutObject\",\n      \"Resource\": \"arn:aws:s3:::logs/*\",\n      \"Condition\": {\n        \"StringEquals\": {\n          \"s3:x-amz-server-side-encryption\": \"aws:kms\"\n        }\n      }\n    }\n  ]\n}","source":"synthetic","services":["s3"],"complexity":"medium"},
  {"id":"syn_s3_013","instruction":"Create an S3 policy that allows public read access to objects in the static-website bucket. This is for hosting a static website.","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"s3:GetObject\",\n      \"Resource\": \"arn:aws:s3:::static-website/*\"\n    }\n  ]\n}","source":"synthetic","services":["s3"],"complexity":"simple"},
  {"id":"syn_s3_014","instruction":"Allow a CI/CD pipeline to sync files to the deployment-artifacts bucket, including ability to delete old artifacts, but restrict to objects under the builds/ prefix only","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:PutObject\",\n        \"s3:GetObject\",\n        \"s3:DeleteObject\",\n        \"s3:ListBucket\"\n      ],\n      \"Resource\": [\n        \"arn:aws:s3:::deployment-artifacts\",\n        \"arn:aws:s3:::deployment-artifacts/builds/*\"\n      ]\n    }\n  ]\n}","source":"synthetic","services":["s3"],"complexity":"medium"},
  {"id":"syn_s3_015","instruction":"Deny access to the confidential-data S3 bucket for everyone except users who have MFA enabled on their session","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Deny\",\n      \"Action\": \"s3:*\",\n      \"Resource\": [\n        \"arn:aws:s3:::confidential-data\",\n        \"arn:aws:s3:::confidential-data/*\"\n      ],\n      \"Condition\": {\n        \"BoolIfExists\": {\n          \"aws:MultiFactorAuthPresent\": \"false\"\n        }\n      }\n    }\n  ]\n}","source":"synthetic","services":["s3"],"complexity":"complex"},

  # ===== EC2 POLICIES =====
  {"id":"syn_ec2_001","instruction":"Allow a user to start and stop EC2 instances but not terminate them","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"ec2:StartInstances\",\n        \"ec2:StopInstances\"\n      ],\n      \"Resource\": \"*\"\n    },\n    {\n      \"Effect\": \"Deny\",\n      \"Action\": \"ec2:TerminateInstances\",\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["ec2"],"complexity":"medium"},
  {"id":"syn_ec2_002","instruction":"Grant full EC2 access but only in the us-east-1 region","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"ec2:*\",\n      \"Resource\": \"*\",\n      \"Condition\": {\n        \"StringEquals\": {\n          \"aws:RequestedRegion\": \"us-east-1\"\n        }\n      }\n    }\n  ]\n}","source":"synthetic","services":["ec2"],"complexity":"medium"},
  {"id":"syn_ec2_003","instruction":"Allow describing EC2 instances, security groups, and VPCs. This is a read-only monitoring policy.","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"ec2:DescribeInstances\",\n        \"ec2:DescribeSecurityGroups\",\n        \"ec2:DescribeVpcs\",\n        \"ec2:DescribeSubnets\",\n        \"ec2:DescribeInstanceStatus\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["ec2"],"complexity":"simple"},
  {"id":"syn_ec2_004","instruction":"Allow launching EC2 instances only if they are tagged with Environment=development. Deny launching any instance without this tag.","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"ec2:RunInstances\",\n      \"Resource\": \"*\",\n      \"Condition\": {\n        \"StringEquals\": {\n          \"aws:RequestTag/Environment\": \"development\"\n        }\n      }\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"ec2:CreateTags\",\n      \"Resource\": \"*\",\n      \"Condition\": {\n        \"StringEquals\": {\n          \"ec2:CreateAction\": \"RunInstances\"\n        }\n      }\n    }\n  ]\n}","source":"synthetic","services":["ec2"],"complexity":"complex"},
  {"id":"syn_ec2_005","instruction":"Allow a developer to manage security group rules but not create or delete security groups themselves","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"ec2:AuthorizeSecurityGroupIngress\",\n        \"ec2:AuthorizeSecurityGroupEgress\",\n        \"ec2:RevokeSecurityGroupIngress\",\n        \"ec2:RevokeSecurityGroupEgress\",\n        \"ec2:DescribeSecurityGroups\"\n      ],\n      \"Resource\": \"*\"\n    },\n    {\n      \"Effect\": \"Deny\",\n      \"Action\": [\n        \"ec2:CreateSecurityGroup\",\n        \"ec2:DeleteSecurityGroup\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["ec2"],"complexity":"medium"},
  {"id":"syn_ec2_006","instruction":"Allow only t2.micro and t3.micro instance types to be launched. Deny launching any other instance type.","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"ec2:RunInstances\",\n      \"Resource\": \"arn:aws:ec2:*:*:instance/*\",\n      \"Condition\": {\n        \"StringEquals\": {\n          \"ec2:InstanceType\": [\n            \"t2.micro\",\n            \"t3.micro\"\n          ]\n        }\n      }\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"ec2:RunInstances\",\n      \"Resource\": [\n        \"arn:aws:ec2:*:*:subnet/*\",\n        \"arn:aws:ec2:*:*:network-interface/*\",\n        \"arn:aws:ec2:*:*:volume/*\",\n        \"arn:aws:ec2:*:*:security-group/*\",\n        \"arn:aws:ec2:*::image/*\"\n      ]\n    }\n  ]\n}","source":"synthetic","services":["ec2"],"complexity":"complex"},
  {"id":"syn_ec2_007","instruction":"Allow EC2 instances that are tagged with Team=backend to be started, stopped, and rebooted only by users from the backend team","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"ec2:StartInstances\",\n        \"ec2:StopInstances\",\n        \"ec2:RebootInstances\"\n      ],\n      \"Resource\": \"arn:aws:ec2:*:*:instance/*\",\n      \"Condition\": {\n        \"StringEquals\": {\n          \"ec2:ResourceTag/Team\": \"backend\"\n        }\n      }\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"ec2:DescribeInstances\",\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["ec2"],"complexity":"complex"},
  {"id":"syn_ec2_008","instruction":"Read-only access to EC2 - can describe instances, images, volumes, snapshots, and key pairs but cannot modify anything","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"ec2:DescribeInstances\",\n        \"ec2:DescribeImages\",\n        \"ec2:DescribeVolumes\",\n        \"ec2:DescribeSnapshots\",\n        \"ec2:DescribeKeyPairs\",\n        \"ec2:DescribeRegions\",\n        \"ec2:DescribeAvailabilityZones\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["ec2"],"complexity":"simple"},
  {"id":"syn_ec2_009","instruction":"Allow creating and managing EBS volumes and snapshots but not EC2 instances","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"ec2:CreateVolume\",\n        \"ec2:DeleteVolume\",\n        \"ec2:AttachVolume\",\n        \"ec2:DetachVolume\",\n        \"ec2:CreateSnapshot\",\n        \"ec2:DeleteSnapshot\",\n        \"ec2:DescribeVolumes\",\n        \"ec2:DescribeSnapshots\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["ec2"],"complexity":"simple"},
  {"id":"syn_ec2_010","instruction":"Allow starting, stopping, and launching EC2 instances only between 2024-01-01 09:00 UTC and 2024-12-31 18:00 UTC","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"ec2:StartInstances\",\n        \"ec2:StopInstances\",\n        \"ec2:RunInstances\"\n      ],\n      \"Resource\": \"*\",\n      \"Condition\": {\n        \"DateGreaterThan\": {\n          \"aws:CurrentTime\": \"2024-01-01T09:00:00Z\"\n        },\n        \"DateLessThan\": {\n          \"aws:CurrentTime\": \"2024-12-31T18:00:00Z\"\n        }\n      }\n    }\n  ]\n}","source":"synthetic","services":["ec2"],"complexity":"complex"},
  {"id":"syn_ec2_011","instruction":"Allow managing EC2 instances only within a specific VPC vpc-0abc123def456","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"ec2:RunInstances\",\n        \"ec2:StartInstances\",\n        \"ec2:StopInstances\",\n        \"ec2:TerminateInstances\"\n      ],\n      \"Resource\": \"*\",\n      \"Condition\": {\n        \"ArnLike\": {\n          \"ec2:Vpc\": \"arn:aws:ec2:*:*:vpc/vpc-0abc123def456\"\n        }\n      }\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"ec2:Describe*\",\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["ec2"],"complexity":"complex"},
  {"id":"syn_ec2_012","instruction":"Allow creating and managing EC2 key pairs for SSH access","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"ec2:CreateKeyPair\",\n        \"ec2:DeleteKeyPair\",\n        \"ec2:DescribeKeyPairs\",\n        \"ec2:ImportKeyPair\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["ec2"],"complexity":"simple"},

  # ===== LAMBDA / DYNAMODB / CLOUDWATCH =====
  {"id":"syn_lambda_001","instruction":"Allow a user to create, update, and invoke Lambda functions but not delete them","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"lambda:CreateFunction\",\n        \"lambda:UpdateFunctionCode\",\n        \"lambda:UpdateFunctionConfiguration\",\n        \"lambda:InvokeFunction\",\n        \"lambda:GetFunction\",\n        \"lambda:ListFunctions\"\n      ],\n      \"Resource\": \"*\"\n    },\n    {\n      \"Effect\": \"Deny\",\n      \"Action\": \"lambda:DeleteFunction\",\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["lambda"],"complexity":"medium"},
  {"id":"syn_lambda_002","instruction":"Allow invoking a specific Lambda function named process-orders in us-east-1 for account 123456789012","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"lambda:InvokeFunction\",\n      \"Resource\": \"arn:aws:lambda:us-east-1:123456789012:function:process-orders\"\n    }\n  ]\n}","source":"synthetic","services":["lambda"],"complexity":"simple"},
  {"id":"syn_lambda_003","instruction":"Lambda execution role that can read from DynamoDB table user-sessions and write logs to CloudWatch","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"dynamodb:GetItem\",\n        \"dynamodb:Query\",\n        \"dynamodb:Scan\"\n      ],\n      \"Resource\": \"arn:aws:dynamodb:*:*:table/user-sessions\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"logs:CreateLogGroup\",\n        \"logs:CreateLogStream\",\n        \"logs:PutLogEvents\"\n      ],\n      \"Resource\": \"arn:aws:logs:*:*:*\"\n    }\n  ]\n}","source":"synthetic","services":["dynamodb","logs"],"complexity":"medium"},
  {"id":"syn_lambda_004","instruction":"Allow managing Lambda function event source mappings and aliases","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"lambda:CreateEventSourceMapping\",\n        \"lambda:DeleteEventSourceMapping\",\n        \"lambda:GetEventSourceMapping\",\n        \"lambda:ListEventSourceMappings\",\n        \"lambda:UpdateEventSourceMapping\",\n        \"lambda:CreateAlias\",\n        \"lambda:DeleteAlias\",\n        \"lambda:GetAlias\",\n        \"lambda:ListAliases\",\n        \"lambda:UpdateAlias\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["lambda"],"complexity":"medium"},
  {"id":"syn_lambda_005","instruction":"Allow invoking any Lambda function that has the tag Environment=staging","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"lambda:InvokeFunction\",\n        \"lambda:GetFunction\"\n      ],\n      \"Resource\": \"*\",\n      \"Condition\": {\n        \"StringEquals\": {\n          \"aws:ResourceTag/Environment\": \"staging\"\n        }\n      }\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"lambda:ListFunctions\",\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["lambda"],"complexity":"complex"},
  {"id":"syn_ddb_001","instruction":"Full read and write access to DynamoDB table called products","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"dynamodb:GetItem\",\n        \"dynamodb:PutItem\",\n        \"dynamodb:UpdateItem\",\n        \"dynamodb:DeleteItem\",\n        \"dynamodb:Query\",\n        \"dynamodb:Scan\",\n        \"dynamodb:BatchGetItem\",\n        \"dynamodb:BatchWriteItem\"\n      ],\n      \"Resource\": \"arn:aws:dynamodb:*:*:table/products\"\n    }\n  ]\n}","source":"synthetic","services":["dynamodb"],"complexity":"simple"},
  {"id":"syn_ddb_002","instruction":"Allow querying a DynamoDB table orders and its global secondary indexes but deny any write operations","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"dynamodb:GetItem\",\n        \"dynamodb:Query\",\n        \"dynamodb:Scan\",\n        \"dynamodb:BatchGetItem\",\n        \"dynamodb:DescribeTable\"\n      ],\n      \"Resource\": [\n        \"arn:aws:dynamodb:*:*:table/orders\",\n        \"arn:aws:dynamodb:*:*:table/orders/index/*\"\n      ]\n    }\n  ]\n}","source":"synthetic","services":["dynamodb"],"complexity":"medium"},
  {"id":"syn_ddb_003","instruction":"Allow creating and deleting DynamoDB tables but only if the table name starts with dev-","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"dynamodb:CreateTable\",\n        \"dynamodb:DeleteTable\",\n        \"dynamodb:DescribeTable\",\n        \"dynamodb:ListTables\"\n      ],\n      \"Resource\": \"arn:aws:dynamodb:*:*:table/dev-*\"\n    }\n  ]\n}","source":"synthetic","services":["dynamodb"],"complexity":"medium"},
  {"id":"syn_ddb_004","instruction":"Allow DynamoDB Streams read access for the user-events table to enable event processing","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"dynamodb:GetRecords\",\n        \"dynamodb:GetShardIterator\",\n        \"dynamodb:DescribeStream\",\n        \"dynamodb:ListStreams\"\n      ],\n      \"Resource\": \"arn:aws:dynamodb:*:*:table/user-events/stream/*\"\n    }\n  ]\n}","source":"synthetic","services":["dynamodb"],"complexity":"medium"},
  {"id":"syn_ddb_005","instruction":"Allow a service to perform DynamoDB backup and restore operations on all tables","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"dynamodb:CreateBackup\",\n        \"dynamodb:DescribeBackup\",\n        \"dynamodb:ListBackups\",\n        \"dynamodb:DeleteBackup\",\n        \"dynamodb:RestoreTableFromBackup\",\n        \"dynamodb:RestoreTableToPointInTime\",\n        \"dynamodb:DescribeContinuousBackups\",\n        \"dynamodb:UpdateContinuousBackups\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["dynamodb"],"complexity":"medium"},
  {"id":"syn_cw_001","instruction":"Allow writing logs to CloudWatch Logs - create log groups, log streams, and put log events","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"logs:CreateLogGroup\",\n        \"logs:CreateLogStream\",\n        \"logs:PutLogEvents\",\n        \"logs:DescribeLogGroups\",\n        \"logs:DescribeLogStreams\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["logs"],"complexity":"simple"},
  {"id":"syn_cw_002","instruction":"Allow publishing custom CloudWatch metrics and creating alarms for monitoring","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"cloudwatch:PutMetricData\",\n        \"cloudwatch:PutMetricAlarm\",\n        \"cloudwatch:DeleteAlarms\",\n        \"cloudwatch:DescribeAlarms\",\n        \"cloudwatch:GetMetricData\",\n        \"cloudwatch:GetMetricStatistics\",\n        \"cloudwatch:ListMetrics\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["cloudwatch"],"complexity":"medium"},
  {"id":"syn_cw_003","instruction":"Read-only access to CloudWatch logs for a specific log group named /app/production","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"logs:GetLogEvents\",\n        \"logs:FilterLogEvents\",\n        \"logs:DescribeLogStreams\"\n      ],\n      \"Resource\": \"arn:aws:logs:*:*:log-group:/app/production:*\"\n    }\n  ]\n}","source":"synthetic","services":["logs"],"complexity":"simple"},
  {"id":"syn_cw_004","instruction":"Allow a monitoring service to read CloudWatch dashboards and get metric data but not create or modify anything","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"cloudwatch:GetDashboard\",\n        \"cloudwatch:ListDashboards\",\n        \"cloudwatch:GetMetricData\",\n        \"cloudwatch:GetMetricStatistics\",\n        \"cloudwatch:ListMetrics\",\n        \"cloudwatch:DescribeAlarms\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["cloudwatch"],"complexity":"simple"},
  {"id":"syn_cw_005","instruction":"Allow writing CloudWatch logs only to log groups that start with /application/ prefix","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"logs:CreateLogStream\",\n        \"logs:PutLogEvents\"\n      ],\n      \"Resource\": \"arn:aws:logs:*:*:log-group:/application/*:*\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"logs:DescribeLogGroups\",\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["logs"],"complexity":"medium"},

  # ===== IAM / STS / KMS / SNS / SQS =====
  {"id":"syn_iam_001","instruction":"Allow a user to change their own password and manage their own MFA device but nothing else in IAM","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"iam:ChangePassword\",\n        \"iam:GetUser\"\n      ],\n      \"Resource\": \"arn:aws:iam::*:user/${aws:username}\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"iam:CreateVirtualMFADevice\",\n        \"iam:EnableMFADevice\",\n        \"iam:ResyncMFADevice\",\n        \"iam:DeactivateMFADevice\",\n        \"iam:DeleteVirtualMFADevice\",\n        \"iam:ListMFADevices\"\n      ],\n      \"Resource\": [\n        \"arn:aws:iam::*:mfa/${aws:username}\",\n        \"arn:aws:iam::*:user/${aws:username}\"\n      ]\n    }\n  ]\n}","source":"synthetic","services":["iam"],"complexity":"complex"},
  {"id":"syn_iam_002","instruction":"Allow listing all IAM users and groups but not modifying anything. Read-only IAM access.","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"iam:ListUsers\",\n        \"iam:ListGroups\",\n        \"iam:ListRoles\",\n        \"iam:ListPolicies\",\n        \"iam:GetUser\",\n        \"iam:GetGroup\",\n        \"iam:GetRole\",\n        \"iam:GetPolicy\",\n        \"iam:ListGroupsForUser\",\n        \"iam:ListAttachedUserPolicies\",\n        \"iam:ListAttachedGroupPolicies\",\n        \"iam:ListAttachedRolePolicies\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["iam"],"complexity":"medium"},
  {"id":"syn_iam_003","instruction":"Allow a user to manage their own access keys - create, update, delete, and list their own keys only","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"iam:CreateAccessKey\",\n        \"iam:DeleteAccessKey\",\n        \"iam:UpdateAccessKey\",\n        \"iam:ListAccessKeys\",\n        \"iam:GetAccessKeyLastUsed\"\n      ],\n      \"Resource\": \"arn:aws:iam::*:user/${aws:username}\"\n    }\n  ]\n}","source":"synthetic","services":["iam"],"complexity":"medium"},
  {"id":"syn_sts_001","instruction":"Allow assuming a specific cross-account role in account 987654321098 named DataAnalystRole","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"sts:AssumeRole\",\n      \"Resource\": \"arn:aws:iam::987654321098:role/DataAnalystRole\"\n    }\n  ]\n}","source":"synthetic","services":["sts"],"complexity":"simple"},
  {"id":"syn_sts_002","instruction":"Allow assuming any role in the organization but only if MFA is present","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"sts:AssumeRole\",\n      \"Resource\": \"arn:aws:iam::*:role/*\",\n      \"Condition\": {\n        \"Bool\": {\n          \"aws:MultiFactorAuthPresent\": \"true\"\n        }\n      }\n    }\n  ]\n}","source":"synthetic","services":["sts"],"complexity":"medium"},
  {"id":"syn_kms_001","instruction":"Allow using a specific KMS key to encrypt and decrypt data but not manage the key itself","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"kms:Encrypt\",\n        \"kms:Decrypt\",\n        \"kms:ReEncryptFrom\",\n        \"kms:ReEncryptTo\",\n        \"kms:GenerateDataKey\",\n        \"kms:GenerateDataKeyWithoutPlaintext\",\n        \"kms:DescribeKey\"\n      ],\n      \"Resource\": \"arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012\"\n    }\n  ]\n}","source":"synthetic","services":["kms"],"complexity":"medium"},
  {"id":"syn_kms_002","instruction":"Allow full KMS key administration including creating, disabling, and scheduling deletion of keys","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"kms:CreateKey\",\n        \"kms:CreateAlias\",\n        \"kms:DeleteAlias\",\n        \"kms:DescribeKey\",\n        \"kms:DisableKey\",\n        \"kms:EnableKey\",\n        \"kms:ListKeys\",\n        \"kms:ListAliases\",\n        \"kms:ScheduleKeyDeletion\",\n        \"kms:CancelKeyDeletion\",\n        \"kms:PutKeyPolicy\",\n        \"kms:GetKeyPolicy\",\n        \"kms:TagResource\",\n        \"kms:UntagResource\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["kms"],"complexity":"medium"},
  {"id":"syn_sns_001","instruction":"Allow publishing messages to a specific SNS topic named order-notifications","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"sns:Publish\",\n      \"Resource\": \"arn:aws:sns:*:*:order-notifications\"\n    }\n  ]\n}","source":"synthetic","services":["sns"],"complexity":"simple"},
  {"id":"syn_sns_002","instruction":"Allow managing SNS topics and subscriptions including creating topics, subscribing, and publishing","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"sns:CreateTopic\",\n        \"sns:DeleteTopic\",\n        \"sns:ListTopics\",\n        \"sns:GetTopicAttributes\",\n        \"sns:SetTopicAttributes\",\n        \"sns:Subscribe\",\n        \"sns:Unsubscribe\",\n        \"sns:ListSubscriptions\",\n        \"sns:Publish\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["sns"],"complexity":"medium"},
  {"id":"syn_sqs_001","instruction":"Allow sending and receiving messages from SQS queue named task-queue","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"sqs:SendMessage\",\n        \"sqs:ReceiveMessage\",\n        \"sqs:DeleteMessage\",\n        \"sqs:GetQueueAttributes\",\n        \"sqs:GetQueueUrl\"\n      ],\n      \"Resource\": \"arn:aws:sqs:*:*:task-queue\"\n    }\n  ]\n}","source":"synthetic","services":["sqs"],"complexity":"simple"},
  {"id":"syn_sqs_002","instruction":"Allow creating and managing SQS queues with names starting with dev- only","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"sqs:CreateQueue\",\n        \"sqs:DeleteQueue\",\n        \"sqs:SetQueueAttributes\",\n        \"sqs:GetQueueAttributes\",\n        \"sqs:GetQueueUrl\",\n        \"sqs:SendMessage\",\n        \"sqs:ReceiveMessage\",\n        \"sqs:DeleteMessage\",\n        \"sqs:PurgeQueue\"\n      ],\n      \"Resource\": \"arn:aws:sqs:*:*:dev-*\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"sqs:ListQueues\",\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["sqs"],"complexity":"medium"},
  {"id":"syn_sqs_003","instruction":"Allow a dead letter queue processor to receive and delete messages from the DLQ named failed-orders-dlq and also send reprocessed messages back to the main orders-queue","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"sqs:ReceiveMessage\",\n        \"sqs:DeleteMessage\",\n        \"sqs:GetQueueAttributes\"\n      ],\n      \"Resource\": \"arn:aws:sqs:*:*:failed-orders-dlq\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"sqs:SendMessage\",\n      \"Resource\": \"arn:aws:sqs:*:*:orders-queue\"\n    }\n  ]\n}","source":"synthetic","services":["sqs"],"complexity":"medium"},

  # ===== MULTI-SERVICE COMPLEX =====
  {"id":"syn_multi_001","instruction":"Create an execution role policy for a Lambda function that needs to read from S3 bucket data-input, write to DynamoDB table results, publish to SNS topic job-complete, and write logs to CloudWatch","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:GetObject\",\n        \"s3:ListBucket\"\n      ],\n      \"Resource\": [\n        \"arn:aws:s3:::data-input\",\n        \"arn:aws:s3:::data-input/*\"\n      ]\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"dynamodb:PutItem\",\n        \"dynamodb:UpdateItem\"\n      ],\n      \"Resource\": \"arn:aws:dynamodb:*:*:table/results\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"sns:Publish\",\n      \"Resource\": \"arn:aws:sns:*:*:job-complete\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"logs:CreateLogGroup\",\n        \"logs:CreateLogStream\",\n        \"logs:PutLogEvents\"\n      ],\n      \"Resource\": \"arn:aws:logs:*:*:*\"\n    }\n  ]\n}","source":"synthetic","services":["s3","dynamodb","sns","logs"],"complexity":"complex"},
  {"id":"syn_multi_002","instruction":"DevOps engineer policy: allow managing EC2 instances, reading CloudWatch metrics and logs, accessing S3 buckets with prefix devops-, and describing VPCs and subnets","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"ec2:DescribeInstances\",\n        \"ec2:StartInstances\",\n        \"ec2:StopInstances\",\n        \"ec2:RebootInstances\",\n        \"ec2:DescribeVpcs\",\n        \"ec2:DescribeSubnets\",\n        \"ec2:DescribeSecurityGroups\"\n      ],\n      \"Resource\": \"*\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"cloudwatch:GetMetricData\",\n        \"cloudwatch:GetMetricStatistics\",\n        \"cloudwatch:ListMetrics\",\n        \"cloudwatch:DescribeAlarms\"\n      ],\n      \"Resource\": \"*\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"logs:GetLogEvents\",\n        \"logs:FilterLogEvents\",\n        \"logs:DescribeLogGroups\",\n        \"logs:DescribeLogStreams\"\n      ],\n      \"Resource\": \"*\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"s3:*\",\n      \"Resource\": [\n        \"arn:aws:s3:::devops-*\",\n        \"arn:aws:s3:::devops-*/*\"\n      ]\n    }\n  ]\n}","source":"synthetic","services":["ec2","cloudwatch","logs","s3"],"complexity":"complex"},
  {"id":"syn_multi_003","instruction":"ECS task execution role that allows pulling Docker images from ECR, fetching secrets from Secrets Manager, and writing logs to CloudWatch","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"ecr:GetAuthorizationToken\",\n        \"ecr:BatchCheckLayerAvailability\",\n        \"ecr:GetDownloadUrlForLayer\",\n        \"ecr:BatchGetImage\"\n      ],\n      \"Resource\": \"*\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"secretsmanager:GetSecretValue\",\n      \"Resource\": \"arn:aws:secretsmanager:*:*:secret:app/*\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"logs:CreateLogStream\",\n        \"logs:PutLogEvents\",\n        \"logs:CreateLogGroup\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["ecr","secretsmanager","logs"],"complexity":"complex"},
  {"id":"syn_multi_004","instruction":"Security audit role that has read-only access across IAM, CloudTrail, Config, GuardDuty, and Security Hub for compliance reviews","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"iam:List*\",\n        \"iam:Get*\",\n        \"iam:GenerateCredentialReport\",\n        \"iam:GetCredentialReport\"\n      ],\n      \"Resource\": \"*\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"cloudtrail:DescribeTrails\",\n        \"cloudtrail:GetTrailStatus\",\n        \"cloudtrail:LookupEvents\",\n        \"cloudtrail:GetEventSelectors\"\n      ],\n      \"Resource\": \"*\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"config:Describe*\",\n        \"config:Get*\",\n        \"config:List*\"\n      ],\n      \"Resource\": \"*\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"guardduty:GetDetector\",\n        \"guardduty:GetFindings\",\n        \"guardduty:ListDetectors\",\n        \"guardduty:ListFindings\"\n      ],\n      \"Resource\": \"*\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"securityhub:GetFindings\",\n        \"securityhub:DescribeHub\",\n        \"securityhub:GetEnabledStandards\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["iam","cloudtrail","config","guardduty","securityhub"],"complexity":"complex"},
  {"id":"syn_multi_005","instruction":"Allow a container to read parameters from SSM Parameter Store under the path /app/production/ and decrypt them using KMS","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"ssm:GetParameter\",\n        \"ssm:GetParameters\",\n        \"ssm:GetParametersByPath\"\n      ],\n      \"Resource\": \"arn:aws:ssm:*:*:parameter/app/production/*\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"kms:Decrypt\",\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["ssm","kms"],"complexity":"medium"},
  {"id":"syn_multi_006","instruction":"Restrict all AWS actions to only us-east-1 and us-west-2 regions. Deny everything in all other regions except for global services like IAM and CloudFront.","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Deny\",\n      \"NotAction\": [\n        \"iam:*\",\n        \"cloudfront:*\",\n        \"route53:*\",\n        \"support:*\",\n        \"sts:*\"\n      ],\n      \"Resource\": \"*\",\n      \"Condition\": {\n        \"StringNotEquals\": {\n          \"aws:RequestedRegion\": [\n            \"us-east-1\",\n            \"us-west-2\"\n          ]\n        }\n      }\n    }\n  ]\n}","source":"synthetic","services":["iam","cloudfront","route53"],"complexity":"complex"},

  # ===== MISC DIVERSE STYLES =====
  {"id":"syn_misc_001","instruction":"s3 read only, all buckets","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:GetObject\",\n        \"s3:ListBucket\",\n        \"s3:ListAllMyBuckets\",\n        \"s3:GetBucketLocation\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["s3"],"complexity":"simple"},
  {"id":"syn_misc_002","instruction":"We need a highly restrictive policy for our interns. They should only be able to view the AWS console dashboard, describe EC2 instances, and list S3 buckets. Nothing else. No modifications of any kind.","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"ec2:DescribeInstances\",\n        \"ec2:DescribeInstanceStatus\",\n        \"s3:ListAllMyBuckets\",\n        \"s3:GetBucketLocation\",\n        \"cloudwatch:GetDashboard\",\n        \"cloudwatch:ListDashboards\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["ec2","s3","cloudwatch"],"complexity":"medium"},
  {"id":"syn_misc_003","instruction":"give full access to route53 for managing DNS records","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"route53:ChangeResourceRecordSets\",\n        \"route53:GetHostedZone\",\n        \"route53:ListHostedZones\",\n        \"route53:ListResourceRecordSets\",\n        \"route53:GetChange\",\n        \"route53:CreateHostedZone\",\n        \"route53:DeleteHostedZone\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["route53"],"complexity":"simple"},
  {"id":"syn_misc_004","instruction":"I'm setting up a new microservice and I need a policy for its ECS task. The service needs to: 1) Read configuration from Parameter Store under /myservice/config/, 2) Read and write to DynamoDB table service-state, 3) Push metrics to CloudWatch, 4) Send messages to SQS queue service-events, and 5) Pull container images from our ECR repository","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"ssm:GetParameter\",\n        \"ssm:GetParameters\",\n        \"ssm:GetParametersByPath\"\n      ],\n      \"Resource\": \"arn:aws:ssm:*:*:parameter/myservice/config/*\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"dynamodb:GetItem\",\n        \"dynamodb:PutItem\",\n        \"dynamodb:UpdateItem\",\n        \"dynamodb:DeleteItem\",\n        \"dynamodb:Query\"\n      ],\n      \"Resource\": \"arn:aws:dynamodb:*:*:table/service-state\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"cloudwatch:PutMetricData\",\n      \"Resource\": \"*\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"sqs:SendMessage\",\n      \"Resource\": \"arn:aws:sqs:*:*:service-events\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"ecr:GetAuthorizationToken\",\n        \"ecr:BatchCheckLayerAvailability\",\n        \"ecr:GetDownloadUrlForLayer\",\n        \"ecr:BatchGetImage\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["ssm","dynamodb","cloudwatch","sqs","ecr"],"complexity":"complex"},
  {"id":"syn_misc_005","instruction":"Secrets Manager read only for secrets with prefix prod/","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"secretsmanager:GetSecretValue\",\n        \"secretsmanager:DescribeSecret\",\n        \"secretsmanager:ListSecrets\"\n      ],\n      \"Resource\": \"arn:aws:secretsmanager:*:*:secret:prod/*\"\n    }\n  ]\n}","source":"synthetic","services":["secretsmanager"],"complexity":"simple"},
  {"id":"syn_misc_006","instruction":"Allow managing CloudFormation stacks but only those with names starting with dev-. Also allow reading S3 for CloudFormation templates.","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"cloudformation:CreateStack\",\n        \"cloudformation:UpdateStack\",\n        \"cloudformation:DeleteStack\",\n        \"cloudformation:DescribeStacks\",\n        \"cloudformation:DescribeStackEvents\",\n        \"cloudformation:DescribeStackResources\",\n        \"cloudformation:GetTemplate\",\n        \"cloudformation:ListStacks\"\n      ],\n      \"Resource\": \"arn:aws:cloudformation:*:*:stack/dev-*/*\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:GetObject\",\n        \"s3:ListBucket\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["cloudformation","s3"],"complexity":"complex"},
  {"id":"syn_misc_007","instruction":"ec2 and rds full access us-east-1 only","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"ec2:*\",\n        \"rds:*\"\n      ],\n      \"Resource\": \"*\",\n      \"Condition\": {\n        \"StringEquals\": {\n          \"aws:RequestedRegion\": \"us-east-1\"\n        }\n      }\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"ec2:DescribeRegions\",\n        \"ec2:DescribeAvailabilityZones\",\n        \"rds:DescribeDBInstances\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["ec2","rds"],"complexity":"medium"},
  {"id":"syn_misc_008","instruction":"Our QA team needs to invoke specific Lambda functions for testing, read test results from S3 bucket qa-results, and view CloudWatch dashboards. They should NOT be able to modify any Lambda code or create new functions.","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"lambda:InvokeFunction\",\n        \"lambda:GetFunction\",\n        \"lambda:ListFunctions\"\n      ],\n      \"Resource\": \"*\"\n    },\n    {\n      \"Effect\": \"Deny\",\n      \"Action\": [\n        \"lambda:CreateFunction\",\n        \"lambda:UpdateFunctionCode\",\n        \"lambda:UpdateFunctionConfiguration\",\n        \"lambda:DeleteFunction\"\n      ],\n      \"Resource\": \"*\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:GetObject\",\n        \"s3:ListBucket\"\n      ],\n      \"Resource\": [\n        \"arn:aws:s3:::qa-results\",\n        \"arn:aws:s3:::qa-results/*\"\n      ]\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"cloudwatch:GetDashboard\",\n        \"cloudwatch:ListDashboards\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["lambda","s3","cloudwatch"],"complexity":"complex"},
  {"id":"syn_misc_009","instruction":"Prevent anyone from disabling CloudTrail logging or deleting CloudTrail trails. This is a guardrail policy.","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Deny\",\n      \"Action\": [\n        \"cloudtrail:StopLogging\",\n        \"cloudtrail:DeleteTrail\",\n        \"cloudtrail:UpdateTrail\"\n      ],\n      \"Resource\": \"*\"\n    }\n  ]\n}","source":"synthetic","services":["cloudtrail"],"complexity":"simple"},
  {"id":"syn_misc_010","instruction":"Allow iam:PassRole only for roles that start with app- prefix. This prevents users from passing admin roles to services.","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": \"iam:PassRole\",\n      \"Resource\": \"arn:aws:iam::*:role/app-*\"\n    }\n  ]\n}","source":"synthetic","services":["iam"],"complexity":"simple"},
  {"id":"syn_misc_011","instruction":"Deny all actions if the request does not use MFA. This is an account-wide security guardrail.","output":"{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Deny\",\n      \"NotAction\": [\n        \"iam:CreateVirtualMFADevice\",\n        \"iam:EnableMFADevice\",\n        \"iam:GetUser\",\n        \"iam:ListMFADevices\",\n        \"iam:ListVirtualMFADevices\",\n        \"iam:ResyncMFADevice\",\n        \"sts:GetSessionToken\"\n      ],\n      \"Resource\": \"*\",\n      \"Condition\": {\n        \"BoolIfExists\": {\n          \"aws:MultiFactorAuthPresent\": \"false\"\n        }\n      }\n    }\n  ]\n}","source":"synthetic","services":["iam","sts"],"complexity":"complex"}
]

# Save all synthetic batches
with open('iam-finetuning/dataset/synthetic_batches/all_synthetic.json', 'w') as f:
    json.dump(synthetic_data, f, indent=2)

print(f"Saved {len(synthetic_data)} synthetic examples")

Saved 71 synthetic examples


## 5. Validation & Cleaning

Every example undergoes strict validation:

- **JSON parsing**: Must be syntactically valid JSON
- **IAM schema check**: Must have `Version` and `Statement` (an array or a single object); each statement needs `Effect`, `Action` (or `NotAction`), and `Resource` (or `NotResource`)
- **Deduplication**: MD5 hash of normalized JSON to remove exact duplicates
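
As a minimal sketch of what "normalized JSON" means for the deduplication step: parsing a policy and re-serializing it with sorted keys yields the same string regardless of whitespace or key order, so the MD5 digests match. The helper name `policy_fingerprint` and the sample policies are hypothetical, used only for illustration.

```python
import hashlib
import json

def policy_fingerprint(policy_string):
    """Hash a canonical (sorted-key) serialization of a policy JSON string."""
    canonical = json.dumps(json.loads(policy_string), sort_keys=True)
    return hashlib.md5(canonical.encode()).hexdigest()

# Same policy written with different whitespace and key order
a = '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"}]}'
b = '{\n  "Statement": [{"Resource": "*", "Action": "s3:GetObject", "Effect": "Allow"}],\n  "Version": "2012-10-17"\n}'

# Same actions listed in a different order: sort_keys does not reorder lists
c = '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"], "Resource": "*"}]}'
d = '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Action": ["s3:PutObject", "s3:GetObject"], "Resource": "*"}]}'

print(policy_fingerprint(a) == policy_fingerprint(b))
print(policy_fingerprint(c) == policy_fingerprint(d))
```

Because `sort_keys=True` only orders object keys, two policies that list the same actions in a different order are treated as distinct. That matches the "exact duplicates" wording above; catching reordered lists would need an extra normalization pass.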

In [8]:
import hashlib
import json
import os
from collections import Counter

def validate_policy_json(policy_string):
    errors = []
    try:
        policy = json.loads(policy_string)
    except json.JSONDecodeError as e:
        return False, [f"Invalid JSON: {e}"]
    
    if not isinstance(policy, dict):
        return False, ["Policy is not a JSON object"]
    if "Version" not in policy:
        errors.append("Missing Version")
    if "Statement" not in policy:
        return False, errors + ["Missing Statement"]
    
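    # IAM accepts a single statement object as well as a list of statements
    statements = policy["Statement"]
    if isinstance(statements, dict):
        statements = [statements]
    if not isinstance(statements, list):
        return False, errors + ["Statement is not an object or array"]
    
    for i, stmt in enumerate(statements):
        if not isinstance(stmt, dict):
            errors.append(f"Statement[{i}]: Not a JSON object")
            continue
        if "Effect" not in stmt:
            errors.append(f"Statement[{i}]: Missing Effect")
        elif stmt["Effect"] not in ["Allow", "Deny"]:
            errors.append(f"Statement[{i}]: Invalid Effect")
        if "Action" not in stmt and "NotAction" not in stmt:
            errors.append(f"Statement[{i}]: Missing Action")
        if "Resource" not in stmt and "NotResource" not in stmt:
            errors.append(f"Statement[{i}]: Missing Resource")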
    
    return len(errors) == 0, errors

# Load all sources
all_data = []

# AWS managed policies
managed_path = 'iam-finetuning/dataset/raw/aws_managed_policies.json'
if os.path.exists(managed_path):
    with open(managed_path) as f:
        managed = json.load(f)
    all_data.extend(managed)
    print(f"AWS managed policies: {len(managed)}")

# Synthetic data
synth_path = 'iam-finetuning/dataset/synthetic_batches/all_synthetic.json'
if os.path.exists(synth_path):
    with open(synth_path) as f:
        synth = json.load(f)
    all_data.extend(synth)
    print(f"Synthetic examples: {len(synth)}")

print(f"Total raw: {len(all_data)}")

# Validate
valid = []
seen_hashes = set()
invalid_count = 0

for ex in all_data:
    if "instruction" not in ex or "output" not in ex:
        invalid_count += 1
        continue
    
    output = ex["output"]
    if isinstance(output, dict):
        output = json.dumps(output, indent=2)
    
    is_valid, errors = validate_policy_json(output)
    if not is_valid:
        invalid_count += 1
        continue
    
    # Dedup on a canonical serialization (sorted keys), so whitespace and
    # key order do not affect the hash; the parsed policy is reused below
    policy = json.loads(output)
    h = hashlib.md5(json.dumps(policy, sort_keys=True).encode()).hexdigest()
    if h in seen_hashes:
        continue
    seen_hashes.add(h)
    
    valid.append({
        "id": ex.get("id", f"auto_{len(valid):04d}"),
        "instruction": ex["instruction"].strip(),
        "output": json.dumps(policy, indent=2),
        "source": ex.get("source", "unknown"),
        "services": ex.get("services", []),
        "complexity": ex.get("complexity", "medium"),
    })

print(f"\nValid: {len(valid)}, Invalid: {invalid_count}, Dupes removed: {len(all_data) - len(valid) - invalid_count}")
print(f"Sources: {Counter(e['source'] for e in valid)}")

with open('iam-finetuning/dataset/processed/all_validated_pairs.json', 'w') as f:
    json.dump(valid, f, indent=2)

print("Saved to processed/all_validated_pairs.json")

AWS managed policies: 1437
Synthetic examples: 71
Total raw: 1508

Valid: 1488, Invalid: 0, Dupes removed: 20
Sources: Counter({'aws_managed': 1417, 'synthetic': 71})
Saved to processed/all_validated_pairs.json


## 6. Stratified Train/Val/Test Split

Data is split **80/10/10** with stratification by complexity to ensure balanced representation.
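
A stratified split of this kind is computed per complexity group. The sketch below shows the size arithmetic, assuming truncation via `int()` as the split cell in this section does: train and val are rounded down and test takes the remainder, which is why the test set can come out slightly larger than the validation set. `split_sizes` is a hypothetical helper, and the group sizes are made up for illustration (the real per-complexity counts are not shown in this notebook).

```python
def split_sizes(n, train_frac=0.8, val_frac=0.1):
    """Return (n_train, n_val, n_test) for a group of n items."""
    n_train = int(n * train_frac)   # rounded down
    n_val = int(n * val_frac)       # rounded down
    n_test = n - n_train - n_val    # remainder goes to test
    return n_train, n_val, n_test

# Hypothetical group sizes, for illustration only
group_sizes = {"simple": 25, "medium": 37, "complex": 9}

totals = [0, 0, 0]
for name, n in group_sizes.items():
    sizes = split_sizes(n)
    print(name, sizes)
    totals = [t + s for t, s in zip(totals, sizes)]

print("total", tuple(totals))
```

With small groups the truncation is noticeable: a group of 9 items contributes nothing to validation under this scheme.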

Each example is formatted in **Alpaca instruction format**:

```
### Instruction:
<natural language description>

### Response:
<IAM policy JSON>
```
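
A small sketch of this template as a pair of helpers. `to_alpaca` and `from_alpaca` are hypothetical names, not part of the notebook; the blank line between the two sections mirrors the `\n\n` used when the JSONL files are written. The sample instruction and policy are taken from the `syn_sns_001` synthetic example.

```python
import json

INSTRUCTION_HEADER = "### Instruction:\n"
RESPONSE_HEADER = "\n\n### Response:\n"

def to_alpaca(instruction, policy_json):
    """Join an instruction and a policy JSON string into one training text."""
    return f"{INSTRUCTION_HEADER}{instruction}{RESPONSE_HEADER}{policy_json}"

def from_alpaca(text):
    """Split a formatted example back into (instruction, policy_json)."""
    if not text.startswith(INSTRUCTION_HEADER) or RESPONSE_HEADER not in text:
        raise ValueError("text is not in the expected Alpaca layout")
    body = text[len(INSTRUCTION_HEADER):]
    instruction, _, response = body.partition(RESPONSE_HEADER)
    return instruction, response

policy = json.dumps(
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sns:Publish",
                "Resource": "arn:aws:sns:*:*:order-notifications",
            }
        ],
    },
    indent=2,
)
prompt = "Allow publishing messages to a specific SNS topic named order-notifications"

text = to_alpaca(prompt, policy)
instruction, response = from_alpaca(text)
print(text)
```

Being able to split the text back apart is handy at evaluation time, when only the part after `### Response:` should be parsed as a policy.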

In [9]:
import random
random.seed(42)

with open('iam-finetuning/dataset/processed/all_validated_pairs.json') as f:
    data = json.load(f)

# Group by complexity for stratified split
groups = {}
for item in data:
    c = item["complexity"]
    groups.setdefault(c, []).append(item)

train, val, test = [], [], []
for comp, items in groups.items():
    random.shuffle(items)
    n = len(items)
    # int() rounds down, so any remainder from train and val lands in test
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    train.extend(items[:n_train])
    val.extend(items[n_train:n_train + n_val])
    test.extend(items[n_train + n_val:])

random.shuffle(train)
random.shuffle(val)
random.shuffle(test)

print(f"Train: {len(train)}, Val: {len(val)}, Test: {len(test)}")

# Format and save as JSONL
for name, split in [("train", train), ("val", val), ("test", test)]:
    path = f'iam-finetuning/dataset/processed/{name}.jsonl'
    with open(path, 'w') as f:
        for ex in split:
            text = f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['output']}"
            line = {"text": text, "id": ex["id"], "complexity": ex["complexity"], 
                    "services": ex["services"], "source": ex["source"]}
            f.write(json.dumps(line) + "\n")
    print(f"Saved {path}")

# Quick sample
print("\n--- SAMPLE ---")
sample_text = f"### Instruction:\n{train[0]['instruction']}\n\n### Response:\n{train[0]['output'][:200]}..."
print(sample_text)

Train: 1189, Val: 148, Test: 151
Saved iam-finetuning/dataset/processed/train.jsonl
Saved iam-finetuning/dataset/processed/val.jsonl
Saved iam-finetuning/dataset/processed/test.jsonl

--- SAMPLE ---
### Instruction:
Provide an IAM policy for ROSA Cloud Network Config Operator Policy

### Response:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DescribeNetworkResources",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceSt...


## 7. Final Verification

In [10]:
# Final check
for name in ["train", "val", "test"]:
    path = f'iam-finetuning/dataset/processed/{name}.jsonl'
    # Use a context manager so the file handle is closed after counting
    with open(path) as f:
        count = sum(1 for _ in f)
    print(f"{name}.jsonl: {count} examples")

print("\nDataset is ready for fine-tuning!")

train.jsonl: 1189 examples
val.jsonl: 148 examples
test.jsonl: 151 examples

Dataset is ready for fine-tuning!
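
Beyond line counts, a slightly deeper check could parse each line and confirm that it carries the Alpaca markers and a response that is itself valid JSON. The sketch below assumes the record layout written in section 6 (a `text` field plus `complexity`). `summarize_jsonl` and `make_line` are hypothetical helpers, and the two records are built inline for illustration; in practice the argument would be an open file handle for one of the `.jsonl` files.

```python
import json
from collections import Counter

RESPONSE_MARKER = "\n\n### Response:\n"

def summarize_jsonl(lines):
    """Count records per complexity, checking the Alpaca layout of each line."""
    counts = Counter()
    for line_no, line in enumerate(lines, start=1):
        record = json.loads(line)
        text = record["text"]
        if not text.startswith("### Instruction:\n") or RESPONSE_MARKER not in text:
            raise ValueError(f"line {line_no}: missing Alpaca markers")
        # The response half should itself be parseable policy JSON
        json.loads(text.split(RESPONSE_MARKER, 1)[1])
        counts[record["complexity"]] += 1
    return counts

def make_line(instruction, policy, complexity):
    """Build one JSONL line in the same shape as the saved splits (illustrative)."""
    text = f"### Instruction:\n{instruction}{RESPONSE_MARKER}{json.dumps(policy, indent=2)}"
    return json.dumps({"text": text, "complexity": complexity})

policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "sqs:SendMessage", "Resource": "*"}],
}
lines = [
    make_line("example one", policy, "simple"),
    make_line("example two", policy, "medium"),
]

counts = summarize_jsonl(lines)
print(dict(counts))
```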


---

## Dataset Summary

| Metric | Value |
|--------|-------|
| **Total validated examples** | 1,488 |
| **Training set** | 1,189 (80%) |
| **Validation set** | 148 (10%) |
| **Test set** | 151 (10%) |
| **Sources** | AWS managed (1,417) + Synthetic (71) |
| **AWS services covered** | 30+ |
| **Format** | Alpaca instruction (JSONL) |

**Next step** → `02_Model_Training.ipynb`