# Amazon Redshift Serverless with Boto3
Region: **us-east-1**

This notebook demonstrates how to work with **Amazon Redshift Serverless** through the **Redshift Data API** in **boto3**.

You create the **namespace**, **workgroup**, and **IAM role** in the AWS Console; the notebook itself covers running SQL statements programmatically through the Data API.

### Steps Covered
1. Set up the Redshift Data API client
2. Define helper functions for query execution
3. Create a table for the sample *tickit* dataset
4. Load data from S3 into Redshift
5. Run analytical queries and display the results
6. Optional: Unload data back to S3
7. Clean up resources

---
### Prerequisites
- Redshift Serverless namespace and workgroup created via AWS Console
- Namespace has an associated IAM role with S3 access for COPY/UNLOAD
- Tickit dataset available in S3 (example: `s3://awssampledbuswest2/tickit/spectrum/sales/`, the path used in Step 1)
- AWS credentials configured in local environment (`aws configure`)
- Permissions: `AmazonRedshiftDataFullAccess`, `AmazonS3FullAccess`


## Step 1: Setup Redshift Data API Client

In [None]:
import boto3
import time
import pandas as pd

# Replace with your configuration
region = 'us-east-1'
workgroup = 'my-redshift-workgroup'   # Replace with your Redshift Serverless workgroup
database = 'dev'                      # Replace with your Redshift database
schema = 'public'
iam_role_arn = 'arn:aws:iam::123456789012:role/MyRedshiftS3AccessRole'
s3_data_path = 's3://awssampledbuswest2/tickit/spectrum/sales/'   # Sample tickit data
s3_unload_path = 's3://my-redshift-output-bucket/unload/'

redshift_data = boto3.client('redshift-data', region_name=region)
print('✅ Redshift Data API client initialized.')
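
The configuration above contains placeholders (the account ID in `iam_role_arn`, the bucket names) that are easy to leave unedited. The following is a minimal sketch of a format check to run before submitting any statement. The function name `check_config` and the ARN pattern are illustrative assumptions, not part of boto3, and they only catch obvious formatting mistakes.

```python
import re
from urllib.parse import urlparse

# Loose pattern for an IAM role ARN (assumption: standard "aws" partition only).
ROLE_ARN_RE = re.compile(r"^arn:aws:iam::\d{12}:role/[\w+=,.@/-]+$")


def check_config(iam_role_arn, *s3_paths):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if not ROLE_ARN_RE.match(iam_role_arn):
        problems.append(f"IAM role ARN looks malformed: {iam_role_arn}")
    for path in s3_paths:
        parsed = urlparse(path)
        # A usable path needs the s3 scheme and a bucket name.
        if parsed.scheme != "s3" or not parsed.netloc:
            problems.append(f"Not an s3://bucket/prefix URI: {path}")
    return problems
```

In the notebook this could be called as `check_config(iam_role_arn, s3_data_path, s3_unload_path)`; it does not verify that the role or buckets actually exist.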

## Step 2: Helper Functions for Query Execution

In [None]:
def execute_sql(sql: str):
    response = redshift_data.execute_statement(
        WorkgroupName=workgroup,
        Database=database,
        Sql=sql
    )
    return response['Id']

def wait_for_query(statement_id: str):
    """Poll until the statement reaches a terminal status, then return its description."""
    while True:
        result = redshift_data.describe_statement(Id=statement_id)
        status = result['Status']
        if status == 'FINISHED':
            return result
        if status in ('FAILED', 'ABORTED'):
            # 'Error' may be absent, for example when a statement is aborted
            raise RuntimeError(f"Query {statement_id} {status}: {result.get('Error')}")
        # Still SUBMITTED, PICKED, or STARTED
        time.sleep(2)

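def fetch_results(statement_id: str):
    """Return the first page of a finished statement's results as a DataFrame."""
    result = redshift_data.get_statement_result(Id=statement_id)
    cols = [col['name'] for col in result['ColumnMetadata']]
    rows = []
    for record in result['Records']:
        # Each field is a one-key dict such as {'longValue': 5}; SQL NULL arrives as {'isNull': True}
        rows.append([None if field.get('isNull') else list(field.values())[0] for field in record])
    return pd.DataFrame(rows, columns=cols)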
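
`get_statement_result` returns large result sets in pages, with a `NextToken` on every page except the last. The sketch below shows one way to collect all pages. `fetch_all_records` is a hypothetical helper name, and the client is passed in as an argument so the function can be exercised without AWS access.

```python
def fetch_all_records(client, statement_id):
    """Collect every page of a statement's result set.

    `client` is any object exposing get_statement_result, such as
    boto3.client('redshift-data'). Returns (column_names, records), with
    records left in the raw Data API shape (lists of one-key dicts).
    """
    columns, records = [], []
    kwargs = {"Id": statement_id}
    while True:
        page = client.get_statement_result(**kwargs)
        if not columns:
            columns = [col["name"] for col in page["ColumnMetadata"]]
        records.extend(page["Records"])
        token = page.get("NextToken")
        if not token:
            # No token means this was the last page.
            return columns, records
        kwargs["NextToken"] = token
```

With the notebook's client, `fetch_all_records(redshift_data, stmt_id)` would return the column names and all records, which can then be decoded the same way `fetch_results` decodes a single page.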

## Step 3: Create Tickit Table

In [None]:
create_table = f"""
CREATE TABLE IF NOT EXISTS {schema}.sales(
    salesid INTEGER NOT NULL,
    listid INTEGER NOT NULL,
    sellerid INTEGER NOT NULL,
    buyerid INTEGER NOT NULL,
    eventid INTEGER NOT NULL,
    dateid SMALLINT NOT NULL,
    qtysold SMALLINT NOT NULL,
    pricepaid DECIMAL(8,2),
    commission DECIMAL(8,2),
    saletime TIMESTAMP
);
"""

stmt_id = execute_sql(create_table)
wait_for_query(stmt_id)
print('✅ sales table created successfully.')
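
The submit, wait, and print pattern above recurs in every later step. As a sketch, the two calls can be folded into one function that also gives up after a deadline, so a stuck statement does not block the notebook indefinitely. `run_statement`, `timeout_s`, `poll_s`, and the injectable `sleep` argument are illustrative assumptions; the status names are the ones the Data API reports from `describe_statement`.

```python
import time

TERMINAL_BAD = {"FAILED", "ABORTED"}


def run_statement(client, sql, workgroup, database,
                  timeout_s=300.0, poll_s=2.0, sleep=time.sleep):
    """Submit `sql`, then poll until it finishes, fails, or times out."""
    statement_id = client.execute_statement(
        WorkgroupName=workgroup, Database=database, Sql=sql
    )["Id"]
    deadline = time.monotonic() + timeout_s
    while True:
        desc = client.describe_statement(Id=statement_id)
        status = desc["Status"]
        if status == "FINISHED":
            return desc
        if status in TERMINAL_BAD:
            raise RuntimeError(
                f"Statement {statement_id} {status}: {desc.get('Error', 'no error text')}"
            )
        # Still SUBMITTED, PICKED, or STARTED: stop if the deadline has passed.
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Statement {statement_id} still {status} after {timeout_s}s"
            )
        sleep(poll_s)
```

For example, `run_statement(redshift_data, create_table, workgroup, database)` would replace the `execute_sql` and `wait_for_query` pair. A timeout here only stops the polling; the statement itself keeps running on the server unless it is cancelled separately.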

## Step 4: Load Tickit Data from S3

In [None]:
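# FORMAT AS PARQUET assumes the files under s3_data_path are Parquet;
# change the format clause if your copy of the dataset is delimited text.
# If the bucket is in a different Region than the workgroup (us-east-1 here),
# COPY also needs a REGION clause naming the bucket's Region.
copy_sql = f"""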
COPY {schema}.sales
FROM '{s3_data_path}'
IAM_ROLE '{iam_role_arn}'
FORMAT AS PARQUET;
"""

stmt_id = execute_sql(copy_sql)
wait_for_query(stmt_id)
print('✅ Data loaded successfully from S3.')
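
The COPY clauses depend on how the source files are stored and where the bucket lives. The sketch below assembles the statement from those choices. `build_copy_sql` and its `file_format`, `delimiter`, and `region` arguments are hypothetical names; the emitted clauses (`FORMAT AS PARQUET`, `DELIMITER`, `REGION`) are standard COPY options, but whether a given combination is accepted for your data should be checked against the COPY documentation.

```python
def build_copy_sql(table, s3_path, iam_role_arn, file_format="PARQUET",
                   delimiter=None, region=None):
    """Assemble a COPY statement from its clauses.

    file_format: "PARQUET", or "TEXT" for delimited text files.
    delimiter:   field delimiter for TEXT, written as it should appear in SQL.
    region:      the bucket's Region, if it differs from the workgroup's.
    Values are interpolated directly, so pass trusted configuration only.
    """
    clauses = [
        f"COPY {table}",
        f"FROM '{s3_path}'",
        f"IAM_ROLE '{iam_role_arn}'",
    ]
    fmt = file_format.upper()
    if fmt == "PARQUET":
        clauses.append("FORMAT AS PARQUET")
    elif fmt == "TEXT":
        if delimiter is not None:
            clauses.append(f"DELIMITER '{delimiter}'")
    else:
        raise ValueError(f"Unsupported file_format: {file_format}")
    if region:
        clauses.append(f"REGION '{region}'")
    return "\n".join(clauses) + ";"
```

As a usage example, `build_copy_sql(f"{schema}.sales", s3_data_path, iam_role_arn)` produces the same statement as the cell above, while `file_format="TEXT", delimiter="\\t", region="us-west-2"` would describe tab-delimited files in a us-west-2 bucket.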

## Step 5: Run Analytical Queries

In [None]:
# Query 1: Total sales count
sql = f"SELECT COUNT(*) AS total_sales FROM {schema}.sales;"
stmt_id = execute_sql(sql)
wait_for_query(stmt_id)
df1 = fetch_results(stmt_id)
print(df1)
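
A variant of Query 1 that filters on a caller-supplied value can use the Data API's named parameters instead of formatting the value into the SQL string. The sketch below wraps `execute_statement` with its `Parameters` argument. `execute_with_params` is a hypothetical helper name, and the client is passed in explicitly so the function can be tested with a stand-in.

```python
def execute_with_params(client, sql, params, workgroup, database):
    """Run `sql` containing :name placeholders and return the statement Id.

    `params` is a plain dict. The Data API expects every parameter value as a
    string, so values are converted with str().
    """
    kwargs = {"WorkgroupName": workgroup, "Database": database, "Sql": sql}
    if params:
        # The Parameters list must not be empty, so it is only sent when needed.
        kwargs["Parameters"] = [
            {"name": name, "value": str(value)} for name, value in params.items()
        ]
    return client.execute_statement(**kwargs)["Id"]
```

For instance, `execute_with_params(redshift_data, "SELECT COUNT(*) AS total_sales FROM sales WHERE eventid = :eventid", {"eventid": 1234}, workgroup, database)` counts the sales for one event; the event ID here is an arbitrary illustration.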

In [None]:
# Query 2: Top 5 events by total revenue
sql = f"""
SELECT eventid, SUM(pricepaid) AS total_revenue
FROM {schema}.sales
GROUP BY eventid
ORDER BY total_revenue DESC
LIMIT 5;
"""
stmt_id = execute_sql(sql)
wait_for_query(stmt_id)
df2 = fetch_results(stmt_id)
print(df2)
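
Each field in a Data API record is a one-key dict naming its wire type, and DECIMAL values such as `SUM(pricepaid)` may arrive as strings (`stringValue`) rather than numbers. The sketch below decodes raw records into Python values and casts selected columns to `Decimal`. `records_to_dicts` and `decimal_columns` are hypothetical names, and which columns need casting is left to the caller rather than inferred.

```python
from decimal import Decimal


def records_to_dicts(columns, records, decimal_columns=()):
    """Turn raw Data API records into dicts, casting chosen columns to Decimal."""
    rows = []
    for record in records:
        row = {}
        for name, field in zip(columns, record):
            if field.get("isNull"):
                value = None
            else:
                # Each field is a one-key dict such as {"longValue": 7}.
                value = next(iter(field.values()))
            if value is not None and name in decimal_columns:
                # Go through str() so float inputs do not carry binary noise.
                value = Decimal(str(value))
            row[name] = value
        rows.append(row)
    return rows
```

The `columns` and `records` arguments correspond to the column names and `Records` list of a `get_statement_result` response; for Query 2, `decimal_columns=("total_revenue",)` would make the revenue column safe to sum or sort numerically.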

## Step 6: Optional - Unload Query Results to S3

In [None]:
unload_sql = f"""
UNLOAD ('SELECT * FROM {schema}.sales WHERE qtysold > 10')
TO '{s3_unload_path}'
IAM_ROLE '{iam_role_arn}'
PARQUET
ALLOWOVERWRITE;
"""

stmt_id = execute_sql(unload_sql)
wait_for_query(stmt_id)
print('✅ Unload completed successfully.')
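
A finished UNLOAD statement does not show which files were written. The following sketch lists the objects under the unload prefix with the S3 `list_objects_v2` paginator. `list_unloaded_files` is a hypothetical helper name, and it assumes the caller has permission to list the bucket.

```python
from urllib.parse import urlparse


def list_unloaded_files(s3_client, s3_uri):
    """Return the object keys under an s3://bucket/prefix URI."""
    parsed = urlparse(s3_uri)
    bucket, prefix = parsed.netloc, parsed.path.lstrip("/")
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

With the notebook's configuration this would be called as `list_unloaded_files(boto3.client('s3', region_name=region), s3_unload_path)`; an empty list suggests the query matched no rows or the path is wrong.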

## Step 7: Cleanup Resources

In [None]:
drop_table = f"DROP TABLE IF EXISTS {schema}.sales;"
stmt_id = execute_sql(drop_table)
wait_for_query(stmt_id)
print('✅ Table dropped successfully.')
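
To confirm the cleanup, the Data API's `list_tables` operation can report whether the table is still present. The sketch below is one way to wrap it; `table_exists` is a hypothetical helper name, and the exact-match comparison is there because `SchemaPattern` and `TablePattern` are patterns rather than literal names.

```python
def table_exists(client, workgroup, database, schema, table):
    """Return True if `schema.table` appears in the Data API's table listing."""
    kwargs = {
        "WorkgroupName": workgroup,
        "Database": database,
        "SchemaPattern": schema,
        "TablePattern": table,
    }
    while True:
        page = client.list_tables(**kwargs)
        for entry in page.get("Tables", []):
            # Compare exactly, since the patterns may match more than one table.
            if entry.get("schema") == schema and entry.get("name") == table:
                return True
        token = page.get("NextToken")
        if not token:
            return False
        kwargs["NextToken"] = token
```

After the DROP above, `table_exists(redshift_data, workgroup, database, schema, 'sales')` would be expected to return `False`. Files written by the optional UNLOAD step remain in S3 and need to be removed separately.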