Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update: Add support for x_forwarded_for headers in apaches access logs #3251

Merged

Conversation

redcinelli
Copy link
Contributor

@redcinelli redcinelli commented May 2, 2022

What does this PR do?

Tldr;
Enable support for apache logs with the x_forwarded_for header.

Heavily inspired by the work done in this PR
Addressing this Enhancement Request.

Grok pattern has been updated to match logs starting with a list of IP, storing it in apache.access.remote_ip.
This pattern is heavily inspired by the nginx integration.

A painless scrip extract the first public ip, and store the result in source.address.

network.forwarded_ip field is being filled by source.ip, as it seems to be the perfect fit.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Author's Checklist

  • Do we want to support x_forward_for ?
  • Lots of noise in the test files, let's double check it is really backward compatible.
  • Simplify the Grok pattern, verify the Grok pattern
  • I am allowed to modify the httpd.conf ?

How to test this PR locally

Following the steps in the documentation

elastic-package stack down && elastic-package build && elastic-package stack up -d -v --services=elasticsearch && eval "$(elastic-package stack shellinit)" && elastic-package test pipeline --generate

Related issues

None

Screenshots

None

@elasticmachine
Copy link

elasticmachine commented May 2, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-08-26T12:12:53.719+0000

  • Duration: 16 min 6 sec

Test stats 🧪

Test Results
Failed 0
Passed 44
Skipped 0
Total 44

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine
Copy link

elasticmachine commented May 2, 2022

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 100.0% (2/2) 💚
Files 50.0% (2/4) 👎 -47.11
Classes 50.0% (2/4) 👎 -47.11
Methods 70.0% (28/40) 👎 -19.623
Lines 70.234% (210/299) 👎 -21.514
Conditionals 100.0% (0/0) 💚

@redcinelli
Copy link
Contributor Author

I believe my pr to be good for review 🤞

At least open for discussion.

Btw seeing the whole mess the .json files are should I first do a pr to "update" the test then rebase this pr ?

@redcinelli redcinelli marked this pull request as ready for review May 16, 2022 14:37
@redcinelli redcinelli requested a review from a team as a code owner May 16, 2022 14:37
@redcinelli
Copy link
Contributor Author

@andrewkroh I believe it is worth taking a look again ^^

Thanks a lot for helping me.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was reviewing this from technical perspective on the code/config in the pipeline. I didn't really consider the big picture with respect to ECS mappings, the field meanings, and whether this is a breaking change.

Taking a step back and looking at the approach here, this looks like a change in behavior w.r.t. to how source is populated. IMO the source.address should be the IP (or hostname) where traffic came from (the last hop). That is %{c}h or %h or (barring any changes from things like mod_remoteip) in mod_log_config format. This would be the same source address observed if we looked at a packet capture.

The ECS network.forwarded_ip would be populated only when X-Forwarded-For is used because this field is only used when the source address represents a proxy. I think the first address in the X-Forwarded-For list would always be used to fill network.forwarded_ip.

I think the new apache.access.remote_ip_list field should be the x-forwarded-for list plus the remote hostname (%h value). This would mean that all the hops from the client through the last proxy are accounted for in the list. And given the reference info you pointed me to, perhaps the name of this field is better as apache.access.remote_addresses (data type keyword) so that we can take the raw values without caring if they are IPs.

In order to have both %h and %{X-Forwarded-For}i available in the log I think we should prescribe formats in the docs that it will accept. We want to continue accepting the current format as is and we also want to allow X-Forwarded-For to be added by users. So perhaps we should allow another header to be tacked on to the end like:

%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" X-Forwarded-For=\"%{X-Forwarded-For}i\"

@redcinelli
Copy link
Contributor Author

redcinelli commented May 18, 2022

Hi Andrew, so 2 new commits:

  • e3f15b9 to address your suggestions (null checks, weird try/catch ...)

  • 4abfdbb to try to address the bigger picture

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, but I think there is one thing missing -- docs. Users need to know how to configure there HTTPD service to output one of the formats the integration expects. Can you take a look at the readme (should be in packages/apache/_dev/...) and see if you can add a blurb about the accepted formats.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc additions look good. The package needs to be build to update the generated readme file (e.g. elastic-package build). Then there is a conflict with the changelog and version that need resolved.

packages/apache/_dev/build/docs/README.md Outdated Show resolved Hide resolved
Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

This will require a review from @elastic/obs-service-integrations.

@andrewkroh andrewkroh added the Team:Service-Integrations Label for the Service Integrations team label Jun 15, 2022
@redcinelli
Copy link
Contributor Author

redcinelli commented Jul 18, 2022

@elastic/obs-service-integrations Hi guys ? have you had the time to look at this PR ?

I'll fix the conflict asap

redcinelli and others added 13 commits July 18, 2022 11:29
This Pr is heavily inpired by the work done in [this PR](elastic/beats#4417)
It is adressing [this ER](elastic/enhancements#14402).

`Grok` pattern has been updated to match logs starting with a list of IP
adresses and store all those ip in `apache.access.remote_ip`.
This pattern is heavily insipred by the one in the nginx integration.

I also decided to fill a new field `network.forwarded_ip` as it seems to
be the perfect fit.
It want the expression ot be as lean as possible.
I removed the ?: non capturing group

- we want to capture it
- It was ignored somehow

I removed the %{NOTSPACE:source.address} match :

- I am not sure why it is was there in the first place
As x-forwarded-for mostly send ip, grok pattern is updated, to match
only ip.

To improve searchability remote_ip type is set to ip
apache.access.remote_adresses contain all IP/HOST the request rebounded
of

X-Forwarded-For is optional, should be tacked a the end of the logs

network.forwarded_ip exist only when X-Forwarded_for is found

Merge 2 similar pattern in GROK processor
Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
@redcinelli redcinelli force-pushed the apache/support_x_forwarded_for_headers branch from 3681532 to 499515f Compare July 18, 2022 09:34
@redcinelli redcinelli force-pushed the apache/support_x_forwarded_for_headers branch from cd0748b to 2c91c78 Compare July 29, 2022 12:34
@@ -30,6 +32,8 @@
name: log.level
- external: ecs
name: message
- external: ecs
name: network.forwarded_ip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are setting the value of network.forwarded_ip in the ingest pipeline.
Do we need to use the ecs version of this here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we have to.

If I do not I get this issue

FAILURE DETAILS:
apache/access test-access-basic.log:
[0] field "network.forwarded_ip" is undefined

while running this command elastic-package build && elastic-package stack up -d -v --services=elasticsearch && eval "$(elastic-package stack shellinit)" && elastic-package test pipeline --generate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move it to fields.yml then as assignment is happening under ingest pipeline

Copy link
Contributor Author

@redcinelli redcinelli Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in ddaed22

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided to undo ddaed22

@redcinelli redcinelli force-pushed the apache/support_x_forwarded_for_headers branch from ddaed22 to 3c9d133 Compare August 26, 2022 12:12
Copy link
Contributor

@ishleenk17 ishleenk17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good !

@redcinelli redcinelli merged commit 176203b into elastic:main Aug 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement Team:Service-Integrations Label for the Service Integrations team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants