Application Load Balancer sending access logs to S3.
Server-side encryption for this integration only supports Amazon S3-managed keys (SSE-S3).
Reference from the docs:
The only server-side encryption option that's supported is Amazon S3-managed keys (SSE-S3). For more information, see Amazon S3-managed encryption keys (SSE-S3).
Create the temporary key pair:
mkdir -p keys
ssh-keygen -f keys/temp_key
Copy the sample .auto.tfvars file:
cp samples/sample.tfvars .auto.tfvars
Start the environment:
terraform init
terraform apply -auto-approve
ELB will confirm that the configuration worked by creating the file ELBAccessLogTestFile:
https://<bucket>.s3.<region>.amazonaws.com/<prefix>/AWSLogs/<account>/ELBAccessLogTestFile
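As a small sketch of how the URL pattern above is assembled, the snippet below fills in the placeholders from shell variables. All four values are hypothetical and need to be replaced with the ones from your own environment.

```shell
# Hypothetical placeholder values; replace with your own bucket, region, prefix, and account ID.
bucket="my-alb-logs-bucket"
region="us-east-1"
prefix="alb"
account="123456789012"

# Object key of the test file that ELB writes when access logging is enabled.
test_key="${prefix}/AWSLogs/${account}/ELBAccessLogTestFile"

# URL following the pattern shown above.
test_url="https://${bucket}.s3.${region}.amazonaws.com/${test_key}"
echo "$test_url"

# With credentials configured, the object can also be listed with the AWS CLI:
#   aws s3 ls "s3://${bucket}/${test_key}"
```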
Once traffic starts reaching the load balancer, access logs are generated in the S3 bucket. You can use Athena to query the results.
Once the access logs are in an S3 bucket, it is time to analyze them. This next section follows the documentation for Querying Application Load Balancer logs.
We'll use Glue + Athena to achieve that.
It is worth remembering that Athena can query several data sources.
To query the ELB access logs available on S3, a database is required, and the Terraform scripts will create one.
The table can be created manually in Athena or discovered by a Glue crawler.
This guide creates the table directly from Athena. There are two options:
- No partitions
- With partitions
As per the documentation for the partitioned table:
Because ALB logs have a known structure whose partition scheme you can specify in advance, you can reduce query runtime and automate partition management by using the Athena partition projection feature. Partition projection automatically adds new partitions as new data is added. This removes the need for you to manually add partitions by using ALTER TABLE ADD PARTITION.
Use the local file alb_logs.sql as a reference, but prefer getting a fresh copy from the documentation. Replace the S3 data source references in the Athena SQL command with the value that Terraform provides as an output:
s3://your-alb-logs-directory/AWSLogs/<ACCOUNT-ID>/elasticloadbalancing/<REGION>/
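The replacement can be scripted with sed. The sketch below works on a one-line stand-in for the statement rather than the real alb_logs.sql. The `logs_location` value is a hypothetical example of what the Terraform output might contain.

```shell
# Hypothetical value; in practice, take it from the Terraform output.
logs_location="s3://my-alb-logs-bucket/alb/AWSLogs/123456789012/elasticloadbalancing/us-east-1/"

# Placeholder exactly as it appears in the reference SQL.
placeholder='s3://your-alb-logs-directory/AWSLogs/<ACCOUNT-ID>/elasticloadbalancing/<REGION>/'

# Stand-in for one line of the CREATE TABLE statement (the real file holds the full statement).
template="LOCATION '${placeholder}'"

# Use '|' as the sed delimiter because both values contain '/'.
rendered=$(printf '%s\n' "$template" | sed "s|${placeholder}|${logs_location}|g")
echo "$rendered"

# Applied to the real file (the output file name is hypothetical):
#   sed "s|${placeholder}|${logs_location}|g" alb_logs.sql > alb_logs.rendered.sql
```

The `g` flag matters if the placeholder occurs more than once in the statement, which may be the case for the partitioned table.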
Terraform will also prepare an Athena Workgroup with a dedicated S3 output.
All you have to do now is select the elb-access-logs Workgroup.
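Queries can also be submitted from the command line instead of the console. The following is a minimal sketch, assuming the AWS CLI is installed and configured. The helper name `run_athena_query` and the `default` database name are hypothetical; use the database that Terraform created.

```shell
# Sketch: submit a query to Athena through the elb-access-logs workgroup.
# Assumes a configured AWS CLI; "default" is a hypothetical database name.
run_athena_query() {
  query_string="$1"
  database="${2:-default}"
  aws athena start-query-execution \
    --work-group elb-access-logs \
    --query-execution-context "Database=${database}" \
    --query-string "$query_string" \
    --query 'QueryExecutionId' \
    --output text
}

# Usage (prints the query execution ID):
#   run_athena_query "SELECT * FROM alb_logs ORDER BY time ASC LIMIT 100"
```

The returned ID can then be passed to `aws athena get-query-results --query-execution-id <id>` once the query has finished.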
I copied these queries from the documentation:
View the first 100 access log entries in chronological order
SELECT *
FROM alb_logs
ORDER by time ASC
LIMIT 100
List all client IP addresses that accessed the Application Load Balancer, and how many times they accessed the Application Load Balancer
SELECT client_ip, COUNT(*) AS count
FROM alb_logs
GROUP BY client_ip
ORDER BY COUNT(*) DESC;
The following query counts the number of requests received by the load balancer, grouped by HTTP request verb and client IP address:
SELECT COUNT(request_verb) AS count,
request_verb,
client_ip
FROM alb_logs
GROUP BY request_verb, client_ip
LIMIT 100;
Another query shows the URLs visited by Safari browser users:
SELECT request_url
FROM alb_logs
WHERE user_agent LIKE '%Safari%'
LIMIT 10;
The following query shows records that have ELB status code values greater than or equal to 500.
SELECT * FROM alb_logs
WHERE elb_status_code >= 500
The following example shows how to parse the logs by datetime:
SELECT client_ip, sum(received_bytes)
FROM alb_logs
WHERE parse_datetime(time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z')
BETWEEN parse_datetime('2018-05-30-12:00:00','yyyy-MM-dd-HH:mm:ss')
AND parse_datetime('2018-05-31-00:00:00','yyyy-MM-dd-HH:mm:ss')
GROUP BY client_ip;
The following query uses the table with partition projection to select all ALB logs from the specified day.
SELECT *
FROM alb_logs
WHERE day = '2022/02/12'
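The `day` value has to match the yyyy/MM/dd layout used in the query above. As a small sketch, the current date can be formatted that way in the shell. It assumes the partition dates follow UTC, and the added `LIMIT 100` is only there to keep the result small.

```shell
# Build the partition value for the current date in yyyy/MM/dd format.
# Assumption: the partition dates follow UTC, hence the -u flag.
day=$(date -u +%Y/%m/%d)

# Compose the query text; alb_logs is the partitioned table from this guide.
query="SELECT * FROM alb_logs WHERE day = '${day}' LIMIT 100"
echo "$query"
```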
Destroy the environment when you are done:
terraform destroy -auto-approve