# 3 – Create Athena Database

This notebook creates an Amazon Athena database that will be used to
register the processed extreme precipitation dataset stored in S3.


## Import Required Libraries and Initialize AWS Session


In [6]:
import boto3
import sagemaker
from sagemaker import get_execution_role
from pyathena import connect
import pandas as pd

# Initialize SageMaker session
sess = sagemaker.Session()
bucket = sess.default_bucket()
region = boto3.Session().region_name
role = get_execution_role()

print("Bucket:", bucket)
print("Region:", region)
print("Role:", role)


Bucket: sagemaker-us-east-1-083422367993
Region: us-east-1
Role: arn:aws:iam::083422367993:role/LabRole


## Define Athena Database Name and S3 Staging Location


In [7]:
database_name = "ghcn_extreme_precip_db"
athena_staging_dir = f"s3://{bucket}/athena/staging/"

print("Database name:", database_name)
print("Athena staging directory:", athena_staging_dir)


Database name: ghcn_extreme_precip_db
Athena staging directory: s3://sagemaker-us-east-1-083422367993/athena/staging/


## Establish Athena Connection


In [8]:
conn = connect(
    region_name=region,
    s3_staging_dir=athena_staging_dir
)

print("Athena connection established.")


Athena connection established.
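
Constructing the connection object does not by itself show that Athena can run a query and write results to the staging directory. A minimal sketch of a smoke test follows, assuming a DB-API style cursor such as the one returned by `conn.cursor()`. The helper name `athena_smoke_test` is hypothetical and not part of pyathena.

```python
def athena_smoke_test(cursor):
    """Run a trivial query and report whether the expected row came back.

    `cursor` is assumed to be a DB-API cursor (for example `conn.cursor()`
    from the cell above). This helper is illustrative, not a pyathena API.
    """
    cursor.execute("SELECT 1")
    row = cursor.fetchone()
    # A successful round trip yields a single row whose first value is 1.
    return row is not None and row[0] == 1
```

In this notebook it could be called as `athena_smoke_test(conn.cursor())` right after `connect(...)`.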


## Create Athena Database

In [9]:
create_db_query = f"""
CREATE DATABASE IF NOT EXISTS {database_name}
"""

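# CREATE DATABASE returns no result set, so run it through a cursor
# rather than loading it into a DataFrame with pd.read_sql.
conn.cursor().execute(create_db_query)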

print(f"Database '{database_name}' created or already exists.")


Database 'ghcn_extreme_precip_db' created or already exists.
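
Because the database name is interpolated directly into the SQL string, it can help to validate it first. The sketch below assumes a deliberately conservative naming rule (a lowercase letter followed by lowercase letters, digits, or underscores); Athena may accept more than this. The function name `build_create_database_query` is hypothetical.

```python
import re

# Conservative assumption, not Athena's full naming rule: start with a
# lowercase letter, then lowercase letters, digits, or underscores only.
_SAFE_NAME = re.compile(r"[a-z][a-z0-9_]*")


def build_create_database_query(database_name):
    """Return the CREATE DATABASE statement, rejecting names outside the rule."""
    if not _SAFE_NAME.fullmatch(database_name):
        raise ValueError(f"Unsafe database name: {database_name!r}")
    return f"CREATE DATABASE IF NOT EXISTS {database_name}"
```

With this helper, the query in the cell above could be built as `build_create_database_query(database_name)` instead of a raw f-string.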


## Verify Database Creation


In [10]:
pd.read_sql("SHOW DATABASES", conn)


            database_name
0                 default
1  ghcn_extreme_precip_db
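
Reading the listing by eye works in a notebook, but the same check can be made programmatically. This is a minimal sketch, assuming a DB-API style cursor (such as `conn.cursor()`) whose `SHOW DATABASES` result has the database name in the first column; `database_exists` is a hypothetical helper name.

```python
def database_exists(cursor, database_name):
    """Return True if `database_name` appears in the SHOW DATABASES result.

    Assumes each returned row is a sequence whose first element is the
    database name.
    """
    cursor.execute("SHOW DATABASES")
    names = {row[0] for row in cursor.fetchall()}
    return database_name in names
```

For example, `assert database_exists(conn.cursor(), database_name)` would stop the notebook early if the database were missing.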


## Confirm Database is Ready for Table Registration

The Athena database has been created successfully.
The next step is to register the processed CSV dataset
as an external table within this database.
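
As a rough preview of that step, the sketch below builds a `CREATE EXTERNAL TABLE` statement for a comma-delimited CSV with a header row and no quoted fields. The table name, column list, and S3 location are placeholders rather than the real schema of the processed dataset, and `build_external_table_ddl` is a hypothetical helper.

```python
def build_external_table_ddl(database_name, table_name, columns, s3_location):
    """Build an Athena DDL statement for a simple comma-delimited CSV.

    `columns` is a list of (name, athena_type) pairs. Assumes the files
    under `s3_location` have one header line and no quoted fields.
    """
    column_sql = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database_name}.{table_name} (\n"
        f"  {column_sql}\n"
        ")\n"
        "ROW FORMAT DELIMITED\n"
        "FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


# Placeholder values for illustration only; substitute the real table
# name, columns, and S3 prefix of the processed dataset.
example_ddl = build_external_table_ddl(
    "ghcn_extreme_precip_db",
    "example_table",
    [("station_id", "string"), ("prcp", "double")],
    "s3://example-bucket/example-prefix/",
)
print(example_ddl)
```

The resulting string could then be run through the same Athena connection used to create the database.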
