# 🧭 AWS Athena Console-Based Lab: Querying access.log.1

This lab guides you through creating a fully functional **Athena-based log analysis** setup using **only the AWS Console**: no scripts and no boto3, just SQL typed into the Athena query editor.

You’ll upload an Apache web server log (`access.log.1`) to Amazon S3, define a schema for it in Athena, and query it directly from the console.

## 🪣 Step 1: Create S3 Buckets and Folders

1. Open the AWS Console → **S3 Service**.
2. Click **Create bucket**.
3. Name it something globally unique, e.g.:
   ```
   athena-accesslogs-lab-98
   ```
4. Keep the region as `ap-south-1` (or choose your nearest region).
5. Leave other defaults and click **Create bucket**.

### Inside the bucket, create two folders:
```
data/
output/
```

### Upload your log file
Upload `access.log.1` into:
```
s3://athena-accesslogs-lab-98/data/
```

## 👤 Step 2: Create or Verify IAM Role

Ensure the IAM user or role you sign in with has these AWS managed policies attached:
- `AmazonS3FullAccess`
- `AWSGlueServiceRole`
- `AmazonAthenaFullAccess`

If any are missing, attach them to your existing user or role via **Add permissions** in IAM, or create a new role in **IAM → Roles → Create role**.

## 📊 Step 3: Configure Athena Settings

1. Open the **Athena Console** → [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/)
2. Click the ⚙️ **Settings** icon.
3. Under *Query result location*, set:
   ```
   s3://athena-accesslogs-lab-98/output/
   ```
4. Save settings.

## 🗄️ Step 4: Create Database in Athena

In the **Query Editor**, paste and run:

```sql
CREATE DATABASE IF NOT EXISTS accesslogsdb;
```

## 🧩 Step 5: Create External Table for Logs

Paste and run the following SQL in the Athena Query Editor. `LOCATION` must point to the `data/` folder of the bucket from Step 1, so substitute your own bucket name if it differs. The `user` and `time` column names are wrapped in backticks because Athena treats both as reserved keywords in DDL statements.

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS accesslogsdb.access_log (
  host STRING,
  identity STRING,
  `user` STRING,
  `time` STRING,
  request STRING,
  status INT,
  size STRING,
  referer STRING,
  agent STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "^(\\S+) (\\S+) (\\S+) \\[(.*?)\\] \"(.*?)\" (\\d{3}) (\\S+) \"(.*?)\" \"(.*?)\"$"
)
LOCATION 's3://athena-accesslogs-lab-98/data/'
TBLPROPERTIES ('skip.header.line.count'='0');
```
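
Before moving on, it can help to confirm that the regex actually matched your log lines. With `RegexSerDe`, a line that does not match `input.regex` comes back with every column set to `NULL`, so counting rows with a `NULL` host gives a rough measure of unparsed lines. The query below is a minimal sketch along those lines; the `total_rows` and `unparsed_rows` aliases are illustrative names, not part of the lab.

```sql
-- Sanity check: how many lines did the regex fail to parse?
-- A non-matching line yields NULL in every column, so host IS NULL
-- is used here as a proxy for "unparsed".
SELECT
  COUNT(*)                 AS total_rows,
  COUNT_IF(host IS NULL)   AS unparsed_rows
FROM accesslogsdb.access_log;
```

If `unparsed_rows` is close to `total_rows`, the log is probably not in the combined format this regex expects, and the pattern needs adjusting before the analysis queries will return anything useful.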

## 🔍 Step 6: Run Analysis Queries

### ✅ Total Records
```sql
SELECT COUNT(*) FROM accesslogsdb.access_log;
```

### 🌐 Top 10 IPs by Traffic
```sql
SELECT host, COUNT(*) AS hits
FROM accesslogsdb.access_log
GROUP BY host
ORDER BY hits DESC
LIMIT 10;
```

### 🚫 Identify 404 Errors
```sql
SELECT * FROM accesslogsdb.access_log WHERE status = 404;
```
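
To put the 404 count in context, a breakdown of all status codes is often useful. The following is a small sketch using the same table; it assumes nothing beyond the `status` column defined in Step 5.

```sql
-- Distribution of HTTP status codes, most frequent first.
SELECT status, COUNT(*) AS hits
FROM accesslogsdb.access_log
GROUP BY status
ORDER BY hits DESC;
```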

### 🔝 Most Requested URLs
```sql
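-- element_at returns NULL instead of failing when a malformed
-- request line has fewer than two space-separated parts.
SELECT element_at(split(request, ' '), 2) AS url, COUNT(*) AS hits
FROM accesslogsdb.access_log
GROUP BY element_at(split(request, ' '), 2)
ORDER BY hits DESC
LIMIT 10;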
```
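
Because `time` and `size` are declared as `STRING` in Step 5, they need converting before they can be aggregated. The sketch below groups traffic by hour and sums the bytes sent. It assumes the timestamp follows the usual Apache layout (for example `10/Oct/2000:13:55:36 -0700`) and that `size` is either a number or `-`. The `hour_bucket` and `bytes_sent` aliases are illustrative names.

```sql
-- Hourly hits and bytes sent.
-- TRY(...) turns an unparseable timestamp into NULL instead of failing
-- the query; TRY_CAST does the same for a size of "-".
-- "time" is double-quoted to avoid any clash with the TIME keyword.
SELECT
  date_trunc('hour',
             TRY(parse_datetime("time", 'dd/MMM/yyyy:HH:mm:ss Z'))) AS hour_bucket,
  COUNT(*)                        AS hits,
  SUM(TRY_CAST(size AS BIGINT))   AS bytes_sent
FROM accesslogsdb.access_log
GROUP BY 1
ORDER BY 1;
```

Rows whose timestamp cannot be parsed land in a single `NULL` bucket rather than aborting the query, which makes them easy to spot.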

## 🧾 Step 7: Verify Results

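Check each query's results directly in the console, below the query editor.
Athena also saves every result set as a CSV file (alongside a `.metadata` file) in the query result location you configured in Step 3: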
```
s3://athena-accesslogs-lab-98/output/
```

You can download them for further analysis if needed.

## 🧹 Step 8: Cleanup (Optional)

After completing the lab, drop the table and then the database. Athena accepts only one SQL statement per run, so execute each line separately:

```sql
DROP TABLE accesslogsdb.access_log;
DROP DATABASE accesslogsdb;
```

Then delete the objects under `data/` and `output/` in your S3 bucket (and the bucket itself if you no longer need it) to avoid storage costs.