
# 🧩 AWS Redshift S3 Integration Lab 

**Objective:** Learn how to load a **CSV file** from an **S3 bucket** into **Amazon Redshift** using the `COPY` command.  


---

## 🧭 Lab Overview

In this lab, you will:
1. Prepare and verify your S3 bucket and IAM role.
2. Create a table in Amazon Redshift matching your file structure.
3. Use the `COPY` command to load data from S3 to Redshift.
4. Validate and analyze the data with SQL queries.

> 🧠 **Note:** This lab assumes your Redshift cluster is already running, the IAM role is attached to it, and your S3 bucket (created with boto3 or the console) already contains `orders.csv`.



## ⚙️ Prerequisites

Before starting, ensure that you have:

1. **Redshift Cluster Running**
   - Database name: e.g., `dev`
   - User: `admin`
   - Status: `available`

2. **IAM Role with S3 Read Access**
   - Role attached to the Redshift cluster.
   - Policy: `AmazonS3ReadOnlyAccess`.

3. **S3 Bucket and File**
   - Existing S3 bucket: e.g., `my-demo-bucket-12345`
   - File path: `s3://my-demo-bucket-12345/orders.csv`
   - File format: **CSV, no header**
   - Column order: `order_id`, `order_date`, `customer_id`, `order_status`
   - Sample rows (the file itself starts directly with data):

```
1001,2023-01-03,25,SHIPPED
1002,2023-01-05,47,CANCELLED
1003,2023-01-07,31,PENDING
...
```
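
Because the file has no header, `COPY` maps fields to columns purely by position, so a quick local check before loading can catch a stray header line or a malformed row. The following is a minimal sketch, not part of the lab's required steps: `validate_rows` is a hypothetical helper, it assumes you have the file's text available locally, and the 20-byte limit mirrors the `VARCHAR(20)` column used in Step 1.

```python
import csv
import io
from datetime import date


def validate_rows(csv_text):
    """Return (line_number, problem) tuples for rows that would not fit the table."""
    problems = []
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) != 4:
            problems.append((line_no, f"expected 4 fields, got {len(row)}"))
            continue
        order_id, order_date, customer_id, order_status = row
        try:
            int(order_id)                    # INT column
            date.fromisoformat(order_date)   # DATE column, YYYY-MM-DD
            int(customer_id)                 # INT column
        except ValueError as exc:
            problems.append((line_no, str(exc)))
            continue
        # Redshift VARCHAR lengths are measured in bytes, not characters.
        if len(order_status.encode("utf-8")) > 20:
            problems.append((line_no, "order_status longer than 20 bytes"))
    return problems


sample = "1001,2023-01-03,25,SHIPPED\n1002,2023-01-05,47,CANCELLED\n"
print(validate_rows(sample))  # prints []
```

If the file accidentally contains a header line, the first row is reported because `order_id` cannot be parsed as an integer.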



## 🧱 Step 1: Create Table in Redshift

We first create a table matching the structure of the CSV file.

```sql
CREATE TABLE orders_redshift (
    order_id INT,
    order_date DATE,
    customer_id INT,
    order_status VARCHAR(20)
);
```

✅ **Explanation:**
- The CSV has **no header**, so columns must match file order exactly.
- `order_status` is the **last column**.



## 📥 Step 2: Load Data from S3 (COPY Command)

Run the following command from the **Redshift Query Editor v2** or your preferred SQL client.

```sql
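COPY orders_redshift
FROM 's3://my-demo-bucket-12345/orders.csv'
IAM_ROLE 'arn:aws:iam::111122223333:role/MyRedshiftS3Role'
FORMAT AS CSV
DELIMITER ',';
```

✅ **Explanation:**
- `IAM_ROLE` → ARN of the IAM role with S3 read access. Replace the sample ARN with your own.
- `FORMAT AS CSV` → Parses the file as CSV and handles double-quoted fields automatically, so `REMOVEQUOTES` is not needed (Redshift rejects `REMOVEQUOTES` when it is combined with `CSV`).
- `DELIMITER ','` → Defines the column separator. A comma is already the default for CSV, so this line is optional.
- **Do not** use `IGNOREHEADER`, since this file has no header.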
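
Bucket names and role ARNs differ per account, so it can help to generate the statement rather than edit it by hand. The sketch below only builds the SQL text; `build_copy_statement` and its `header_rows` parameter are hypothetical names introduced for illustration, and it assumes the table name comes from a trusted source because it is inserted as-is. It omits `REMOVEQUOTES`, which Redshift does not accept together with `CSV`.

```python
def build_copy_statement(table, bucket, key, iam_role_arn, header_rows=0):
    """Build a Redshift COPY statement for a CSV object in S3."""

    def quote(value):
        # SQL string literal: wrap in single quotes and double any embedded ones.
        return "'" + value.replace("'", "''") + "'"

    s3_uri = f"s3://{bucket}/{key}"
    lines = [
        f"COPY {table}",  # table name is not escaped: pass trusted names only
        f"FROM {quote(s3_uri)}",
        f"IAM_ROLE {quote(iam_role_arn)}",
        "FORMAT AS CSV",
    ]
    if header_rows:
        # Only needed when the file starts with header lines to skip.
        lines.append(f"IGNOREHEADER {int(header_rows)}")
    return "\n".join(lines) + ";"


statement = build_copy_statement(
    table="orders_redshift",
    bucket="my-demo-bucket-12345",
    key="orders.csv",
    iam_role_arn="arn:aws:iam::111122223333:role/MyRedshiftS3Role",
)
print(statement)
```

The printed statement can be pasted into Query Editor v2. For this lab's headerless file, leave `header_rows` at its default of `0`.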



## 🔍 Step 3: Verify Data Load

Use simple queries to validate and inspect the loaded data.

```sql
-- Preview sample records
SELECT * FROM orders_redshift LIMIT 10;

-- Total number of orders
SELECT COUNT(*) AS total_orders FROM orders_redshift;
```



## 📊 Step 4: Analyze Orders by Status

```sql
-- Count orders by status
SELECT order_status, COUNT(*) AS total_by_status
FROM orders_redshift
GROUP BY order_status
ORDER BY total_by_status DESC;
```

✅ **Example output** (illustrative; your counts depend on the contents of `orders.csv`):

| order_status | total_by_status |
|---------------|----------------|
| SHIPPED       | 1540           |
| PENDING       | 732            |
| CANCELLED     | 128            |
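
To confirm that nothing was dropped or misparsed during the load, the same tally can be computed locally from the source file and compared with the query result. This is a rough sketch under stated assumptions: `tally_status` is a hypothetical helper, the four sample rows are made up for illustration, and the status is assumed to be the fourth field as in the table definition.

```python
import csv
import io
from collections import Counter


def tally_status(csv_text):
    """Count rows per order_status (fourth field), highest count first."""
    reader = csv.reader(io.StringIO(csv_text))
    counts = Counter(row[3] for row in reader if row)  # skip blank lines
    return counts.most_common()


# Made-up sample rows in the lab's column order.
sample_rows = (
    "1001,2023-01-03,25,SHIPPED\n"
    "1002,2023-01-05,47,CANCELLED\n"
    "1003,2023-01-07,31,PENDING\n"
    "1004,2023-01-08,25,SHIPPED\n"
)
print(tally_status(sample_rows))
# prints [('SHIPPED', 2), ('CANCELLED', 1), ('PENDING', 1)]
```

The counts should match `total_by_status` for each status, and their sum should match `total_orders` from Step 3.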



## 🧹 Step 5: Cleanup (Optional)

If you want to remove the table after verification:

```sql
DROP TABLE orders_redshift;
```

---

## 🧠 Reflection

In this lab, you learned how to:
- Load **headerless CSV data** from S3 into Redshift.  
- Define explicit **column order and types**.  
- Analyze loaded data using basic SQL queries.  

The same `COPY` pattern works whether the bucket was created with boto3 or in the console, as long as the IAM role attached to the cluster can read it, which makes it a simple building block for ETL workflows.

---

## 🪞 References

- [Amazon Redshift COPY Command Examples](https://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html)
- [Redshift IAM Role for S3 Access](https://docs.aws.amazon.com/redshift/latest/mgmt/authorizing-redshift-service.html)
- [Redshift Query Editor v2 Guide](https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-v2.html)
