# AWS Athena with Boto3
Region: **us-east-1**

This notebook demonstrates how to run Athena queries using the `boto3` SDK for Python.

### Workflow:
1. Define the Athena client and key configurations
2. Create a database and table from an existing CSV file in S3
3. Run sample SQL queries (Count, Group By)
4. Fetch and display query results
5. Clean up temporary resources if needed

---
### Prerequisites
- An IAM user or role with the **AmazonAthenaFullAccess** and **AmazonS3FullAccess** managed policies attached
- AWS credentials configured (`aws configure`)
- `orders.csv` already uploaded to S3 under its own prefix (Athena reads every object under the table's `LOCATION`)
- Athena output S3 bucket available for query results

## Step 1: Setup Athena Client and Configurations

In [None]:
import boto3
import time

region = 'us-east-1'
database = 'salesdb'
table_name = 'orders'
s3_data = 's3://my-athena-lab-data222/orders/'   # Replace with your S3 location for orders.csv
s3_output = 's3://my-athena-lab-output222/'      # Replace with your Athena output location

athena_client = boto3.client('athena', region_name=region)
print('Athena client initialized successfully.')
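
Both S3 locations above are placeholders, so a quick string check before running any query can catch a mistyped URI early. The following is a minimal sketch, assuming a hypothetical helper `check_s3_location` (not part of `boto3`) that only inspects the string and never contacts S3:

```python
from urllib.parse import urlparse

def check_s3_location(uri):
    """Return (bucket, prefix) for a folder-style s3:// URI, or raise ValueError."""
    parsed = urlparse(uri)
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError(f'Not an S3 URI: {uri!r}')
    if not uri.endswith('/'):
        raise ValueError(f'Expected a folder-style URI ending in "/": {uri!r}')
    return parsed.netloc, parsed.path.lstrip('/')

# Placeholder bucket name copied from the configuration cell
print(check_s3_location('s3://my-athena-lab-data222/orders/'))
```

The check passes for any well-formed folder URI; it does not confirm that the bucket exists or that your credentials can read it.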

## Step 2: Helper Functions to Run Athena Queries

In [None]:
def run_athena_query(query, database, output_location):
    """Execute Athena query and return QueryExecutionId."""
    response = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={'Database': database},
        ResultConfiguration={'OutputLocation': output_location}
    )
    return response['QueryExecutionId']

def wait_for_query(execution_id):
    """Poll until the Athena query reaches a terminal state, then return it."""
    while True:
        response = athena_client.get_query_execution(QueryExecutionId=execution_id)
        state = response['QueryExecution']['Status']['State']
        # SUCCEEDED, FAILED, and CANCELLED are the terminal states
        if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
            return state
        time.sleep(2)  # Still QUEUED or RUNNING; wait before polling again
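
The polling helper above waits indefinitely and returns only the state. A possible variant, sketched here under a few assumptions, adds a timeout and also returns Athena's `StateChangeReason`, which usually explains a failure. The function name `wait_for_query_with_timeout` and its parameters are hypothetical; the client is passed in explicitly so the function does not depend on a global.

```python
import time

TERMINAL_STATES = ('SUCCEEDED', 'FAILED', 'CANCELLED')

def wait_for_query_with_timeout(client, execution_id, timeout_seconds=300, poll_seconds=2):
    """Poll until the query reaches a terminal state or the timeout elapses.

    Returns (state, reason); reason is Athena's StateChangeReason, or None.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.get_query_execution(QueryExecutionId=execution_id)
        status = response['QueryExecution']['Status']
        if status['State'] in TERMINAL_STATES:
            return status['State'], status.get('StateChangeReason')
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f'Query {execution_id} still {status["State"]} after {timeout_seconds}s'
            )
        time.sleep(poll_seconds)
```

In this notebook it could be called as `state, reason = wait_for_query_with_timeout(athena_client, execution_id)`, printing `reason` whenever `state` is not `'SUCCEEDED'`.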

## Step 3: Create Database

In [None]:
query = f"""CREATE DATABASE IF NOT EXISTS {database};"""
execution_id = run_athena_query(query, database, s3_output)
state = wait_for_query(execution_id)
print('Database creation status:', state)

## Step 4: Create Table for Orders CSV

In [None]:
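# 'skip.header.line.count' is '0' on the assumption that orders.csv has no header row;
# set it to '1' if the first line of your file contains column names.
create_table_query = f"""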
CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table_name} (
  order_id INT,
  order_date STRING,
  order_customer_id INT,
  order_status STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = ','
)
LOCATION '{s3_data}'
TBLPROPERTIES ('skip.header.line.count'='0');
"""

execution_id = run_athena_query(create_table_query, database, s3_output)
state = wait_for_query(execution_id)
print('Table creation status:', state)
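
The DDL above is built by interpolating `database` and `table_name` into an f-string. When those names come from user input or external configuration, validating them first helps avoid malformed or unintended statements. The following is a minimal sketch, assuming a hypothetical helper `validate_identifier` and a deliberately conservative pattern chosen for this example (it is stricter than what Athena itself accepts):

```python
import re

# Conservative pattern chosen for this sketch: a lowercase letter followed by
# lowercase letters, digits, or underscores.
_IDENTIFIER = re.compile(r'[a-z][a-z0-9_]*')

def validate_identifier(name):
    """Return name unchanged if it matches the pattern, else raise ValueError."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f'Unsafe or unsupported identifier: {name!r}')
    return name

database = validate_identifier('salesdb')
table_name = validate_identifier('orders')
print(f'{database}.{table_name}')
```

Calling the helper once in the configuration cell would cover every later query that reuses the two names.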

## Step 5: Run Sample Query - Total Orders

In [None]:
query = f"SELECT COUNT(*) AS total_orders FROM {table_name};"
execution_id = run_athena_query(query, database, s3_output)
state = wait_for_query(execution_id)

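if state == 'SUCCEEDED':
    results = athena_client.get_query_results(QueryExecutionId=execution_id)
    # The first row holds the column names; the rows after it hold the data
    for row in results['ResultSet']['Rows']:
        print([col.get('VarCharValue', '') for col in row['Data']])
else:
    # state is FAILED or CANCELLED here
    print('Query did not succeed:', state)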
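
Each row in the response is a list of cells under `Data`, which is awkward to work with directly. The following is a minimal sketch of one way to reshape a result page into dictionaries keyed by column name. The helper name `rows_to_dicts` is hypothetical, and the sample response is hand-written to mirror the response shape rather than taken from a real query.

```python
def rows_to_dicts(result_page):
    """Convert a get_query_results page into a list of {column: value} dicts.

    Assumes the first row of the page holds the column names, as it does on
    the first page of a SELECT query's results.
    """
    rows = result_page['ResultSet']['Rows']
    if not rows:
        return []
    header = [col.get('VarCharValue', '') for col in rows[0]['Data']]
    return [
        dict(zip(header, (col.get('VarCharValue') for col in row['Data'])))
        for row in rows[1:]
    ]

# Hand-written sample shaped like an Athena response (not real query output)
sample = {'ResultSet': {'Rows': [
    {'Data': [{'VarCharValue': 'total_orders'}]},
    {'Data': [{'VarCharValue': '42'}]},
]}}
print(rows_to_dicts(sample))
```

Every value comes back as a string (or `None` when the cell has no `VarCharValue`), so numeric columns such as `total_orders` still need an explicit `int(...)` conversion.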

## Step 6: Run Query - Orders by Status

In [None]:
query = f"SELECT order_status, COUNT(*) AS total FROM {table_name} GROUP BY order_status;"
execution_id = run_athena_query(query, database, s3_output)
state = wait_for_query(execution_id)

if state == 'SUCCEEDED':
    results = athena_client.get_query_results(QueryExecutionId=execution_id)
    print('order_status | total')
    print('-------------------')
    for row in results['ResultSet']['Rows'][1:]:  # Skip header
        cols = [col.get('VarCharValue', '') for col in row['Data']]
        print(f"{cols[0]} | {cols[1]}")
else:
    # state is FAILED or CANCELLED here
    print('Query did not succeed:', state)
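
A single `get_query_results` call returns at most 1,000 rows and includes a `NextToken` when more remain. That is not a concern for a handful of status values, but a larger result set needs a loop. The following is a minimal sketch, assuming a hypothetical helper `fetch_all_rows` that takes the client as a parameter:

```python
def fetch_all_rows(client, execution_id):
    """Collect data rows from every page of results, skipping the header row."""
    rows, token, first_page = [], None, True
    while True:
        kwargs = {'QueryExecutionId': execution_id}
        if token:
            kwargs['NextToken'] = token
        page = client.get_query_results(**kwargs)
        page_rows = page['ResultSet']['Rows']
        if first_page:
            # Assumption: only the first page starts with the column names
            page_rows = page_rows[1:]
            first_page = False
        for row in page_rows:
            rows.append([col.get('VarCharValue') for col in row['Data']])
        token = page.get('NextToken')
        if not token:
            return rows
```

It could be used here as `rows = fetch_all_rows(athena_client, execution_id)`. `boto3` also provides `athena_client.get_paginator('get_query_results')`, which handles the token bookkeeping for you.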

## Step 7: Optional Cleanup

In [None]:
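# Example cleanup (uncomment to run). Drop the table before the database,
# because a database that still contains tables cannot be dropped without CASCADE.
# Dropping an external table removes only its metadata; the CSV stays in S3.
# wait_for_query(run_athena_query(f'DROP TABLE IF EXISTS {table_name};', database, s3_output))
# wait_for_query(run_athena_query(f'DROP DATABASE IF EXISTS {database};', database, s3_output))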