# Athena D&R Okta Automation
This notebook outlines how to set up automated Athena queries for security detections. It focuses on ingesting logs into Amazon Security Lake and building detections on top of those logs.
## Logs to Ingest
- **Tier 1: AWS Environment**
  - AWS CloudTrail logs
  - Amazon GuardDuty findings
  - Amazon S3 access logs
  - AWS WAF logs
  - Amazon CloudWatch logs
- **Tier 1: SaaS Applications**
  - Okta logs
  - GitHub audit logs
  - Postman app logs
  - Postman Cloudflare logs
## Additional Tools
- AWS Glue and Brex Substation for log ingestion, transformation, and enrichment.
## Objective
Create scheduled Athena queries that act as detections and run every minute. These detections query the table defined by the `postman_s3_okta_audit_logs` DDL.

## Athena Queries for Detections
The cell below sets up the Athena and S3 clients and defines the helper functions used to start a query, check its status, and fetch its results. The detection queries themselves run against the `postman_s3_okta_audit_logs` table and are designed to surface suspicious activity.

In [None]:
# Importing required libraries
import boto3
from botocore.exceptions import ClientError
import json
import time
import os

# Initialize Athena client
athena_client = boto3.client('athena', region_name='us-east-1')

# Initialize S3 client
s3_client = boto3.client('s3', region_name='us-east-1')

# Athena settings
athena_database = 'your_database'  # Replace with your Athena database name
athena_output_bucket = 's3://your-athena-output-bucket/'  # Replace with your S3 bucket where Athena will store query results

# Function to run Athena query
def run_athena_query(query):
    response = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={
            'Database': athena_database
        },
        ResultConfiguration={
            'OutputLocation': athena_output_bucket
        }
    )
    query_execution_id = response['QueryExecutionId']
    return query_execution_id

# Function to check Athena query status
def check_athena_query_status(query_execution_id):
    response = athena_client.get_query_execution(
        QueryExecutionId=query_execution_id
    )
    status = response['QueryExecution']['Status']['State']
    return status

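# Function to get Athena query results
def get_athena_query_results(query_execution_id):
    results = []
    try:
        # get_query_results returns a limited number of rows per call, so paginate
        paginator = athena_client.get_paginator('get_query_results')
        pages = paginator.paginate(QueryExecutionId=query_execution_id)
        for page_number, page in enumerate(pages):
            rows = page['ResultSet']['Rows']
            if page_number == 0:
                rows = rows[1:]  # Skip header row (only present on the first page)
            results.extend(row['Data'] for row in rows)
    except ClientError as e:
        print(f'An error occurred: {e}')
    return results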
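
Each row returned by `get_query_results` is a list of cells shaped like `{'VarCharValue': '...'}`, which is awkward to alert on. The sketch below flattens one `ResultSet` into dictionaries keyed by column name. `rows_to_dicts` is a hypothetical helper, not part of boto3, and it assumes the first row is the header row, as it is on the first page of a `SELECT` result. The sample data is hand-written for illustration.

```python
def rows_to_dicts(result_set):
    """Flatten an Athena ResultSet into a list of {column: value} dicts."""
    rows = result_set.get('Rows', [])
    if not rows:
        return []
    # Assumption: the first row holds the column names.
    header = [cell.get('VarCharValue') for cell in rows[0]['Data']]
    records = []
    for row in rows[1:]:
        # Cells without a 'VarCharValue' key are mapped to None.
        values = [cell.get('VarCharValue') for cell in row['Data']]
        records.append(dict(zip(header, values)))
    return records


# Hand-written sample shaped like response['ResultSet'] (illustrative only)
sample = {
    'Rows': [
        {'Data': [{'VarCharValue': 'suspicious_ip'}, {'VarCharValue': 'count'}]},
        {'Data': [{'VarCharValue': '203.0.113.7'}, {'VarCharValue': '12'}]},
    ]
}
print(rows_to_dicts(sample))
```

Athena returns every value as a string, so numeric columns such as `count` would still need casting before comparison.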

## Athena Queries
The following Athena queries are designed to detect suspicious activities based on the Okta logs stored in the `postman_s3_okta_audit_logs` table. These queries will be executed by the AWS Lambda function.

In [None]:
# Define Athena queries for detections
queries = {
    'suspicious_ips': '''
    SELECT detail.client.ipaddress AS suspicious_ip, COUNT(*) AS count
    FROM your_database.postman_s3_okta_audit_logs
    WHERE detail.outcome.result = 'FAILURE'
    GROUP BY detail.client.ipaddress
    HAVING COUNT(*) > 5
    ''',
    'unusual_user_agents': '''
    SELECT detail.client.useragent.rawuseragent AS user_agent, COUNT(*) AS count
    FROM your_database.postman_s3_okta_audit_logs
    GROUP BY detail.client.useragent.rawuseragent
    HAVING COUNT(*) < 3
    ''',
    'high_frequency_failed_logins': '''
    SELECT detail.actor.id AS user_id, COUNT(*) AS failed_count
    FROM your_database.postman_s3_okta_audit_logs
    WHERE detail.outcome.result = 'FAILURE'
    GROUP BY detail.actor.id
    HAVING COUNT(*) > 10
    ''',
    'unusual_times_of_activity': '''
    SELECT date_parse(time, '%Y-%m-%dT%H:%i:%s.%fZ') AS parsed_time, COUNT(*) AS count
    FROM your_database.postman_s3_okta_audit_logs
    WHERE date_format(date_parse(time, '%Y-%m-%dT%H:%i:%s.%fZ'), '%H') NOT BETWEEN '08' AND '18'
    GROUP BY date_parse(time, '%Y-%m-%dT%H:%i:%s.%fZ')
    ''',
    'unusual_geographical_locations': '''
    SELECT detail.client.geographicalcontext.country AS country, COUNT(*) AS count
    FROM your_database.postman_s3_okta_audit_logs
    GROUP BY detail.client.geographicalcontext.country
    HAVING COUNT(*) < 5
    '''
}
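
None of the queries above restricts the time range, so a run every minute would rescan the whole table and raise the same findings again. One possible approach, sketched below, is to limit each detection to a lookback window. The helper `with_lookback` and the 15-minute default are hypothetical, and the sketch assumes the `time` column holds ISO 8601 strings, as the `date_parse` format above suggests.

```python
def with_lookback(table, select_clause, group_by, having, where=None, minutes=15):
    """Build a detection query limited to the last `minutes` minutes (hypothetical helper)."""
    if not isinstance(minutes, int) or minutes <= 0:
        raise ValueError('minutes must be a positive integer')
    # Assumption: `time` is an ISO 8601 string that from_iso8601_timestamp can parse.
    conditions = [
        f"from_iso8601_timestamp(time) > current_timestamp - interval '{minutes}' minute"
    ]
    if where:
        conditions.append(where)
    return (
        f"SELECT {select_clause}\n"
        f"FROM {table}\n"
        f"WHERE {' AND '.join(conditions)}\n"
        f"GROUP BY {group_by}\n"
        f"HAVING {having}"
    )


# Rebuild the 'suspicious_ips' detection with a 15-minute window
query = with_lookback(
    table='your_database.postman_s3_okta_audit_logs',
    select_clause='detail.client.ipaddress AS suspicious_ip, COUNT(*) AS count',
    where="detail.outcome.result = 'FAILURE'",
    group_by='detail.client.ipaddress',
    having='COUNT(*) > 5',
)
print(query)
```

The window should be at least as long as the schedule interval plus any ingestion delay, otherwise late-arriving events would be missed.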

## AWS Lambda Function
We'll create an AWS Lambda function to execute these Athena queries. An Amazon EventBridge (formerly CloudWatch Events) schedule triggers the function every minute to run all the detections. The results can be stored in another Athena table or sent to a monitoring system for alerting.

In [None]:
# Lambda function to execute Athena queries
def lambda_handler(event, context):
    failed_queries = []
    for query_name, query in queries.items():
        print(f'Running query: {query_name}')
        query_execution_id = run_athena_query(query)
        status = check_athena_query_status(query_execution_id)
        # Poll until the query finishes, but stop before the Lambda itself times out
        while status not in ['SUCCEEDED', 'FAILED', 'CANCELLED']:
            if context.get_remaining_time_in_millis() < 10000:
                break
            time.sleep(5)  # Wait for 5 seconds before checking the status again
            status = check_athena_query_status(query_execution_id)
        if status == 'SUCCEEDED':
            results = get_athena_query_results(query_execution_id)
            print(f'Results for {query_name}: {results}')
            # TODO: Store results in another Athena table or send to monitoring system
        else:
            print(f'Query {query_name} did not succeed (status: {status})')
            failed_queries.append(query_name)
    # Report failures instead of always claiming success
    return {
        'statusCode': 200 if not failed_queries else 500,
        'body': json.dumps({'failed_queries': failed_queries})
    }
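
The one-minute trigger can be created with an Amazon EventBridge rule. The sketch below wires a `rate(1 minute)` schedule to the Lambda function using the boto3 calls `put_rule`, `put_targets`, and Lambda `add_permission`. The rule name, statement ID, target ID, and function ARN are placeholders, and the clients are passed in as arguments so the function can be exercised without AWS access.

```python
def schedule_detections(events_client, lambda_client, function_arn,
                        rule_name='athena-okta-detections'):
    """Create (or update) a one-minute EventBridge schedule that invokes the Lambda."""
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression='rate(1 minute)',
        State='ENABLED',
    )
    # Allow EventBridge to invoke the function, scoped to this rule.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f'{rule_name}-invoke',  # placeholder statement ID
        Action='lambda:InvokeFunction',
        Principal='events.amazonaws.com',
        SourceArn=rule['RuleArn'],
    )
    # Point the rule at the Lambda function.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{'Id': 'detections-lambda', 'Arn': function_arn}],
    )
    return rule['RuleArn']


# Example call (requires boto3 and AWS credentials):
# import boto3
# schedule_detections(boto3.client('events'), boto3.client('lambda'),
#                     '<your-lambda-function-arn>')
```

Running this twice with the same rule name would make `add_permission` fail because the statement ID already exists, so a real deployment would need to handle that case or manage the schedule through infrastructure-as-code instead.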

## GitHub Actions Workflow
To keep the queries up to date, we can use GitHub Actions to update the Lambda function automatically whenever the queries change in the GitHub repository. The workflow does the following:
- Check out the latest code from the GitHub repository
- Install AWS CLI
- Update the Lambda function with the new queries

In [None]:
# GitHub Actions YAML configuration for updating Lambda function
github_actions_yaml = '''
name: Update Lambda Function

on: [push]

jobs:
  update-lambda:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
    - name: Set up AWS CLI
      run: pip install awscli
    - name: Update Lambda function
      run: aws lambda update-function-code --function-name your-lambda-function-name --zip-file fileb://your-code.zip
      env:
        AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
        AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        AWS_DEFAULT_REGION: us-east-1
'''
print(github_actions_yaml)
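
The workflow uploads `your-code.zip`, but no step builds that archive. Below is a minimal sketch of a packaging script that could run before the upload step, using only the standard library. The file name `lambda_function.py` and the function `build_lambda_zip` are assumptions, not names from the workflow above.

```python
import zipfile
from pathlib import Path


def build_lambda_zip(source_files, zip_path='your-code.zip'):
    """Write the given source files into a deployment archive at the zip root."""
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, 'w', compression=zipfile.ZIP_DEFLATED) as archive:
        for source in map(Path, source_files):
            # Store each file at the top level of the archive, dropping its directory.
            archive.write(source, arcname=source.name)
    return zip_path


if __name__ == '__main__':
    # Hypothetical handler file name; nothing is written if it is not present.
    handler_file = Path('lambda_function.py')
    if handler_file.exists():
        print(f'Wrote {build_lambda_zip([handler_file])}')
```

In the workflow this could run as an extra step, for example `python package.py`, placed before the `aws lambda update-function-code` step; the script name is likewise a placeholder.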