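Create an external schema with the CREATE EXTERNAL SCHEMA SQL command. The schema references a database in the AWS Glue Data Catalog; the tables defined in it point to the data in S3. The IAM role must allow Redshift to read both the catalog and the S3 bucket.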

In [0]:
CREATE EXTERNAL SCHEMA spectrum_schema
FROM DATA CATALOG
DATABASE 'spectrum_database'
IAM_ROLE 'arn:aws:iam::146962103229:role/RedshiftAccessRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
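
To confirm the schema was registered, one option is to query the SVV_EXTERNAL_SCHEMAS system view. This is a minimal check, assuming the schema name used above; it is not part of the original walkthrough.

```sql
-- List the external schema and the catalog database it maps to.
SELECT schemaname, databasename, esoptions
FROM svv_external_schemas
WHERE schemaname = 'spectrum_schema';
```

The esoptions column should show the IAM role ARN supplied in the CREATE EXTERNAL SCHEMA statement.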

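Define an external table that points to your CSV file in S3. LOCATION names a bucket or folder, and Spectrum reads every file under that prefix.
If your data is in Parquet format, change the STORED AS clause to PARQUET and omit the ROW FORMAT DELIMITED and FIELDS TERMINATED BY clauses.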

In [0]:
CREATE EXTERNAL TABLE spectrum_schema.sales_data (
    order_id INT,
    customer_id INT,
    order_date DATE,
    product_id VARCHAR(10),
    product_name VARCHAR(100),
    quantity INT,
    unit_price DECIMAL(10, 2),
    total_amount DECIMAL(10, 2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://niks-glue-data/';
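
For Parquet data, the table definition might look like the sketch below. The table name sales_data_parquet and the parquet/ prefix are hypothetical, chosen only for illustration, and the column types are assumed to match the types stored in the Parquet files.

```sql
-- Hypothetical Parquet variant: no ROW FORMAT or FIELDS TERMINATED BY clause is needed.
CREATE EXTERNAL TABLE spectrum_schema.sales_data_parquet (
    order_id INT,
    customer_id INT,
    order_date DATE,
    product_id VARCHAR(10),
    product_name VARCHAR(100),
    quantity INT,
    unit_price DECIMAL(10, 2),
    total_amount DECIMAL(10, 2)
)
STORED AS PARQUET
LOCATION 's3://niks-glue-data/parquet/';  -- hypothetical prefix
```

Because Parquet is columnar, Spectrum can read only the columns a query references, which usually reduces the bytes scanned compared with CSV.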

Query External Data Using Redshift Spectrum

In [0]:
SELECT * 
FROM spectrum_schema.sales_data
LIMIT 10;
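
To see how much data Spectrum scanned for the query above, the SVL_S3QUERY_SUMMARY system view can be inspected. This sketch assumes a provisioned cluster and that it runs in the same session immediately after the Spectrum query, so that pg_last_query_id() refers to it.

```sql
-- Rows and bytes scanned in S3 by the most recent query in this session.
SELECT query, segment, elapsed,
       s3_scanned_rows, s3_scanned_bytes,
       s3query_returned_rows, s3query_returned_bytes, files
FROM svl_s3query_summary
WHERE query = pg_last_query_id()
ORDER BY query, segment;
```

Spectrum usage is billed by bytes scanned, so s3_scanned_bytes is the figure to watch.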

Aggregating Sales by Product

In [0]:
SELECT 
    product_name, 
    SUM(total_amount) AS total_sales,
    COUNT(order_id) AS num_orders
FROM spectrum_schema.sales_data
GROUP BY product_name
ORDER BY total_sales DESC;
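
COUNT(order_id) counts rows with a non-null order_id. If a single order can span several rows (an assumption about the data, not something the CSV layout above confirms), a variant with COUNT(DISTINCT ...) counts each order once:

```sql
-- Variant: count each order once, assuming an order may appear on multiple rows.
SELECT
    product_name,
    SUM(total_amount) AS total_sales,
    COUNT(DISTINCT order_id) AS num_orders
FROM spectrum_schema.sales_data
GROUP BY product_name
ORDER BY total_sales DESC;
```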

Filtering Data (Sales in 2023)

In [0]:
SELECT 
    product_name, 
    SUM(total_amount) AS total_sales
FROM spectrum_schema.sales_data
WHERE EXTRACT(YEAR FROM order_date) = 2023
GROUP BY product_name
ORDER BY total_sales DESC;
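
The same filter can be written as a half-open date range. This is an equivalent sketch rather than a required change; leaving order_date unwrapped may help predicate pushdown, and partition pruning if the table is later partitioned by date.

```sql
-- Same result as EXTRACT(YEAR FROM order_date) = 2023, expressed as a date range.
SELECT
    product_name,
    SUM(total_amount) AS total_sales
FROM spectrum_schema.sales_data
WHERE order_date >= DATE '2023-01-01'
  AND order_date <  DATE '2024-01-01'
GROUP BY product_name
ORDER BY total_sales DESC;
```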