-
Notifications
You must be signed in to change notification settings - Fork 444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update: Add support for x_forwarded_for headers in apaches access logs #3251
Update: Add support for x_forwarded_for headers in apaches access logs #3251
Conversation
🌐 Coverage report
|
I believe my pr to be good for review 🤞 At least open for discussion. Btw seeing the whole mess the |
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/apache/data_stream/access/_dev/test/pipeline/test-access-vhost.log-expected.json
Show resolved
Hide resolved
@andrewkroh I believe it is worth taking a look again ^^ Thanks a lot for helping me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was reviewing this from technical perspective on the code/config in the pipeline. I didn't really consider the big picture with respect to ECS mappings, the field meanings, and whether this is a breaking change.
Taking a step back and looking at the approach here, this looks like a change in behavior w.r.t. to how source
is populated. IMO the source.address
should be the IP (or hostname) where traffic came from (the last hop). That is %{c}h
or %h
or (barring any changes from things like mod_remoteip) in mod_log_config format. This would be the same source address observed if we looked at a packet capture.
The ECS network.forwarded_ip
would be populated only when X-Forwarded-For
is used because this field is only used when the source address represents a proxy. I think the first address in the X-Forwarded-For list would always be used to fill network.forwarded_ip
.
I think the new apache.access.remote_ip_list
field should be the x-forwarded-for list plus the remote hostname (%h value). This would mean that all the hops from the client through the last proxy are accounted for in the list. And given the reference info you pointed me to, perhaps the name of this field is better as apache.access.remote_addresses
(data type keyword
) so that we can take the raw values without caring if they are IPs.
In order to have both %h
and %{X-Forwarded-For}i
available in the log I think we should prescribe formats in the docs that it will accept. We want to continue accepting the current format as is and we also want to allow X-Forwarded-For to be added by users. So perhaps we should allow another header to be tacked on to the end like:
%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" X-Forwarded-For=\"%{X-Forwarded-For}i\"
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Show resolved
Hide resolved
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, but I think there is one thing missing -- docs. Users need to know how to configure there HTTPD service to output one of the formats the integration expects. Can you take a look at the readme (should be in packages/apache/_dev/...) and see if you can add a blurb about the accepted formats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc additions look good. The package needs to be build to update the generated readme file (e.g. elastic-package build
). Then there is a conflict with the changelog and version that need resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
This will require a review from @elastic/obs-service-integrations.
@elastic/obs-service-integrations Hi guys ? have you had the time to look at this PR ? I'll fix the conflict asap |
This Pr is heavily inpired by the work done in [this PR](elastic/beats#4417) It is adressing [this ER](elastic/enhancements#14402). `Grok` pattern has been updated to match logs starting with a list of IP adresses and store all those ip in `apache.access.remote_ip`. This pattern is heavily insipred by the one in the nginx integration. I also decided to fill a new field `network.forwarded_ip` as it seems to be the perfect fit.
It want the expression ot be as lean as possible. I removed the ?: non capturing group - we want to capture it - It was ignored somehow I removed the %{NOTSPACE:source.address} match : - I am not sure why it is was there in the first place
As x-forwarded-for mostly send ip, grok pattern is updated, to match only ip. To improve searchability remote_ip type is set to ip
apache.access.remote_adresses contain all IP/HOST the request rebounded of X-Forwarded-For is optional, should be tacked a the end of the logs network.forwarded_ip exist only when X-Forwarded_for is found Merge 2 similar pattern in GROK processor
Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
3681532
to
499515f
Compare
cd0748b
to
2c91c78
Compare
packages/apache/data_stream/access/elasticsearch/ingest_pipeline/default.yml
Show resolved
Hide resolved
@@ -30,6 +32,8 @@ | |||
name: log.level | |||
- external: ecs | |||
name: message | |||
- external: ecs | |||
name: network.forwarded_ip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are setting the value of network.forwarded_ip in the ingest pipeline.
Do we need to use the ecs version of this here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we have to.
If I do not I get this issue
FAILURE DETAILS:
apache/access test-access-basic.log:
[0] field "network.forwarded_ip" is undefined
while running this command elastic-package build && elastic-package stack up -d -v --services=elasticsearch && eval "$(elastic-package stack shellinit)" && elastic-package test pipeline --generate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move it to fields.yml then as assignment is happening under ingest pipeline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in ddaed22
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided to undo ddaed22
packages/apache/data_stream/access/_dev/test/pipeline/test-access-basic.log
Show resolved
Hide resolved
ddaed22
to
3c9d133
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good !
What does this PR do?
Tldr;
Enable support for apache logs with the x_forwarded_for header.
Heavily inspired by the work done in this PR
Addressing this Enhancement Request.
Grok
pattern has been updated to match logs starting with a list of IP, storing it inapache.access.remote_ip
.This pattern is heavily inspired by the nginx integration.
A painless scrip extract the first public ip, and store the result in
source.address
.network.forwarded_ip
field is being filled bysource.ip
, as it seems to be the perfect fit.Checklist
changelog.yml
file.Author's Checklist
x_forward_for
?httpd.conf
?How to test this PR locally
Following the steps in the documentation
Related issues
None
Screenshots
None