Add routing rules for cloudfront logs, elb logs and s3access logs #7932
For cloudfront, s3 access logs and elb logs, we need to use a lambda function to send these logs from S3 bucket to Firehose. Maybe we can take advantage of the lambda function and add some information?!
Adding some routing keys in the lambda function seems a compelling option. Where can I learn more about this lambda function? Who's responsible for adding it, and do we have a reference implementation?
Adding a lambda function is the next step for the firehose integration. I have one written for testing, but it's not published in any documentation yet. The problem with adding info in the lambda function is that we would have to create the firehose using that lambda function, and I'm not sure users are OK with that. Users can also write their own lambda, and if they use a customized one we lose all that info. That's why in this PR I'm adding routing rules based purely on the log format.
i'm not really comfortable with just a count of the number of fields. as more and more routing rules are added, it becomes ambiguous. a couple of further ideas:
in all three cases, combining some simple checks on individual fields with the total number of fields probably adds a lot more weight to the confidence of the rule.
i don't think we should rely on a custom lambda; many users may want to roll their own, including their own enrichment.
From what I see, all three logs contain these fields: "client:port destination:port", so I was trying to add a regex to check that.
For s3 access logs, any field can be set to
@tommyers-elastic Thanks for the comment. I will add the regex back in, and yes, I agree we should not rely on the lambda.
insideQuotes = !insideQuotes;
}
}
if (tokenCount==33 && ctx.message =~ /^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s[a-zA-Z0-9-]+\s\d+\s(\d+\.\d+\.\d+\.\d+|[a-fA-F0-9:]+)/) {
😭
it's a shame we can't make use of the builtin grok pattern matching for routing eh
Yeah I was reading about this again and doesn't seem like this is possible. It's going to be hard to read/debug in the future. Hopefully, it doesn't get more complicated.
+1 - but the overhead of these complex expressions is slightly worrying to me
we should do some benchmarking with and without routing
@tommyers-elastic I agree. I will merge this PR for now, and I just created an issue about benchmarking to track that testing work. Thank you!
Package awsfirehose - 0.4.0 containing this change is available at https://epr.elastic.co/search?package=awsfirehose
What does this PR do?
CloudFront logs, ELB logs and S3 access logs all require a Lambda function to send them from an S3 bucket to Firehose. This PR defines basic routing rules for these logs so that they are sent to the right data streams.
How to route these three log formats?
Each routing rule combines a regular expression with a check on the number of fields.
CloudFront logs
Define a regular expression pattern to check whether the log starts with
2019-12-04 21:02:31 LAX1 392 89.160.20.112 ...
A CloudFront log contains 33 fields; see "Standard log file fields" in the Amazon CloudFront documentation for more details.
Sample log:
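The CloudFront rule can be sketched outside the ingest pipeline as follows. This is a minimal Python illustration, not the Painless condition shipped in the package: the regular expression is copied from the routing condition in this PR, while count_tokens and is_cloudfront_log are hypothetical helper names, and the sample line is synthetic (the five leading fields shown above padded with "-" placeholders).

```python
import re

# Prefix pattern copied from the routing condition in this PR:
# date, time, edge location, byte count, then an IPv4 or IPv6-looking address.
CLOUDFRONT_PREFIX = re.compile(
    r"^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s[a-zA-Z0-9-]+\s\d+\s"
    r"(\d+\.\d+\.\d+\.\d+|[a-fA-F0-9:]+)"
)


def count_tokens(message: str) -> int:
    """Count whitespace-separated tokens; whitespace inside double quotes does not split."""
    count = 0
    in_token = False
    inside_quotes = False
    for ch in message:
        if ch == '"':
            inside_quotes = not inside_quotes
        if ch.isspace() and not inside_quotes:
            in_token = False
        elif not in_token:
            # First character of a new token.
            in_token = True
            count += 1
    return count


def is_cloudfront_log(message: str) -> bool:
    """Hypothetical helper: require both the 33-field count and the prefix match."""
    return count_tokens(message) == 33 and CLOUDFRONT_PREFIX.match(message) is not None


# Synthetic line: 5 real-looking leading fields plus 28 "-" placeholders (33 tokens).
sample = "2019-12-04 21:02:31 LAX1 392 89.160.20.112 " + " ".join(["-"] * 28)
print(is_cloudfront_log(sample))
```

Requiring both conditions is what keeps the rule from firing on any other 33-token line, which is the ambiguity raised in the review.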
ELB logs
Classic Load Balancer: timestamp elb client:port backend:port ...
Application Load Balancer: type timestamp elb client:port target:port ...
Network Load Balancer: type version timestamp elb listener client:port destination:port...
Common part: "client:port destination:port"
Classic Load Balancer: 15 fields
Application Load Balancer: 29 fields
Network Load Balancer: 22 fields
For example, an Application Load Balancer log:
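A rough Python sketch of how the ELB field counts and the common "client:port destination:port" part could be combined. The field counts come from the list above; the ENDPOINT_PAIR pattern, the tokenizer and the classify_elb_log name are assumptions made for illustration (the pattern only handles IPv4 addresses), not the expressions used in the package. The sample lines are synthetic.

```python
import re
from typing import Optional

# Field counts per load balancer type, as listed in this PR description.
ELB_FIELD_COUNTS = {15: "classic", 29: "application", 22: "network"}

# Assumed pattern for two adjacent IPv4 "address:port" fields.
ENDPOINT_PAIR = re.compile(r"\d+\.\d+\.\d+\.\d+:\d+\s\d+\.\d+\.\d+\.\d+:\d+")

# A token is either a double-quoted run or a run of non-whitespace characters.
TOKEN = re.compile(r'"[^"]*"|\S+')


def classify_elb_log(message: str) -> Optional[str]:
    """Hypothetical helper: return the load balancer type, or None if no rule matches."""
    kind = ELB_FIELD_COUNTS.get(len(TOKEN.findall(message)))
    if kind is not None and ENDPOINT_PAIR.search(message):
        return kind
    return None


# Synthetic lines: realistic leading fields followed by "-" placeholders.
classic = (
    "2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 10.0.0.1:80 "
    + " ".join(["-"] * 11)
)
application = (
    "http 2018-07-02T22:23:00.186641Z app/my-loadbalancer/50dc6c495c0c9188 "
    "192.168.131.39:2817 10.0.0.1:80 " + " ".join(["-"] * 24)
)
print(classify_elb_log(classic), classify_elb_log(application))
```

The endpoint-pair check is deliberately independent of the field count, so a line with the right number of tokens but no address pair is left unrouted.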
S3 access logs
An S3 access log always has 25 fields in total. For example:
Field 24 is the host header, which represents the endpoint used to connect to Amazon S3, for example s3.us-west-2.amazonaws.com. The endpoint always contains the s3 and amazonaws.com keywords.
Checklist
changelog.yml file.
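For the S3 access log rule described in the section above, a minimal Python sketch of the same idea: count the fields, then check that field 24 looks like an S3 endpoint. The tokenizer and the is_s3_access_log name are assumptions for illustration, and the sample line is synthetic. The sketch assumes whitespace tokenization outside double quotes, so the bracketed timestamp counts as two tokens.

```python
import re

# A token is either a double-quoted run or a run of non-whitespace characters.
TOKEN = re.compile(r'"[^"]*"|\S+')


def is_s3_access_log(message: str) -> bool:
    """Hypothetical helper: 25 tokens, and token 24 (the host header) names an S3 endpoint."""
    tokens = TOKEN.findall(message)
    if len(tokens) != 25:
        return False
    host_header = tokens[23]  # field 24, zero-based index 23
    return "s3" in host_header and "amazonaws.com" in host_header


# Synthetic line built from 25 tokens; the values are placeholders, not real log data.
sample = " ".join([
    "owner-id", "example-bucket", "[06/Feb/2019:00:00:38", "+0000]", "192.0.2.3",
    "requester-id", "3E57427F3EXAMPLE", "REST.GET.VERSIONING", "-",
    '"GET /example-bucket?versioning HTTP/1.1"', "200", "-", "113", "-", "7", "-",
    '"-"', '"S3Console/0.4"', "-", "host-id-example", "SigV4",
    "ECDHE-RSA-AES128-GCM-SHA256", "AuthHeader", "s3.us-west-2.amazonaws.com",
    "TLSv1.2",
])
print(is_s3_access_log(sample))
```

Checking one specific field in addition to the count follows the reviewer's suggestion of combining simple per-field checks with the total number of fields.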