support input transform pipeline on raw log lines #148

Open
bkcsfi opened this issue Dec 14, 2018 · 2 comments
bkcsfi commented Dec 14, 2018

I am running a web server in a Docker container, managed by ContainerPilot inside the container and deployed via docker stack.

ContainerPilot captures stdout from the web server and prefixes it with additional fields before writing to its own stdout, which Logagent eventually processes.

e.g. a sample output line as seen by Logagent and recorded as 'message' in ELK:

2018-12-14T16:22:40.497203741Z quote-server 20 10.0.6.16 - - [14/Dec/2018:16:22:40 +0000] "GET /-/metrics?module=http_2xx&target=10.0.6.24%3A8000 HTTP/1.1" 200 12485 "-" "Prometheus/2.5.0"

or, as seen by the docker logs command:

2018-12-14T20:54:35.108408562Z 2018-12-14T20:54:35.108156363Z quote-server 20 127.0.0.1 - - [14/Dec/2018:20:54:35 +0000] "GET /-/health HTTP/1.1" 200 16 "-" "curl/7.52.1"
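A prefix of this shape could be stripped with a simple regex before normal pattern matching runs. A minimal sketch (the function and field names are my own illustration, not part of Logagent or ContainerPilot):

```javascript
// Hypothetical helper: split a containerpilot-style prefix
// "<ISO timestamp ending in Z> <process name> <number> <original line>"
// into the original line plus captured metadata.
const PREFIX_RE = /^(\S+Z)\s+(\S+)\s+(\d+)\s+(.*)$/

function stripContainerpilotPrefix (line) {
  const m = PREFIX_RE.exec(line)
  if (!m) {
    return { line: line, meta: null } // unprefixed lines pass through untouched
  }
  return {
    line: m[4], // the web server's original output
    meta: { timestamp: m[1], process: m[2], loglevel: m[3] }
  }
}
```

Whether the numeric field is a log level or a PID depends on the ContainerPilot configuration; the name `loglevel` above is just an assumption based on the description in this issue.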

I can handle this situation by copying the existing httpd pattern and adding extra fields corresponding to the datestamp, process name, and log level.

However, I will have other processes deployed in a similar manner that won't be web servers. Rather than writing custom plugins just to handle the additional fields prefixed by ContainerPilot, I wonder if it would be possible to have some kind of globalTransform that runs BEFORE the existing pattern matching, on non-JSON input.

This could be something like the grep input filter, except with the ability to:

a. transform the data before passing it on to the callback for subsequent processing

b. optionally capture metadata at this stage of the pipeline (such as the log level, in my example)

In some sense, this could be a group of transforms that each try to match the raw input line; the first input transform that "matches" would be the only one allowed to alter the raw input line, and processing would then continue just as it does now with input filters and patterns.
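The "first match wins" idea above could be sketched roughly like this (a hypothetical design, not an existing Logagent feature; names are illustrative):

```javascript
// Hypothetical transform table: each entry tries to rewrite the raw
// line; only the first matching transform runs, then normal input
// filters and pattern matching would continue on the rewritten line.
const transforms = [
  {
    // containerpilot-style prefix: "<timestamp> <process> <number> <rest>"
    match: /^(\S+Z)\s+(\S+)\s+(\d+)\s+(.*)$/,
    apply: (m) => ({
      line: m[4],
      fields: { '@timestamp': m[1], process: m[2], loglevel: m[3] }
    })
  }
  // further transforms for other prefix formats would go here
]

function applyFirstMatch (rawLine) {
  for (const t of transforms) {
    const m = t.match.exec(rawLine)
    if (m) {
      return t.apply(m) // first matching transform wins
    }
  }
  return { line: rawLine, fields: {} } // no transform matched
}
```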

Maybe this could also handle the case where a container application outputs JSON, which needs to be converted back into regular text for pattern matching (e.g. a transform from JSON to 'text'), or even nested JSON extraction, etc.
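A JSON-to-text step like that might look as follows. This is only a sketch; the `log` field name is an assumption borrowed from Docker's json-file logging driver, which wraps each line as `{"log": "...", "stream": "...", "time": "..."}`:

```javascript
// Hypothetical JSON-to-text transform: if the raw line is a JSON
// object with an embedded plain-text line, extract it so the usual
// pattern matching can run on the text; otherwise pass through.
function jsonToText (rawLine) {
  try {
    const obj = JSON.parse(rawLine)
    if (typeof obj.log === 'string') {
      return obj.log
    }
  } catch (e) {
    // not JSON, fall through and return the line unchanged
  }
  return rawLine
}
```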

megastef (Contributor) commented Dec 17, 2018

I did not know about ContainerPilot.

+1 to creating an input filter for ContainerPilot. The input filter could set a log context object, and an output filter could add the fields from the log context back to the log message object. Just an idea for implementing what you want with the existing Logagent mechanisms.
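The input-filter/output-filter pairing suggested above could be sketched like this. The function shapes follow what I understand of Logagent's filter interfaces (input filters receiving the raw line with a callback, output filters receiving the parsed event), but the shared `logContext` store and all names here are my own assumptions for illustration:

```javascript
// Assumed shared store keyed by log source, linking the two filters.
const logContext = {}

// Input filter sketch: strip the containerpilot-style prefix and
// remember the captured metadata for this source.
function containerpilotInputFilter (sourceName, config, data, callback) {
  const m = /^(\S+Z)\s+(\S+)\s+(\d+)\s+(.*)$/.exec(data)
  if (m) {
    logContext[sourceName] = { process: m[2], loglevel: m[3] }
    return callback(null, m[4]) // pass the unprefixed line downstream
  }
  callback(null, data) // no prefix: pass through unchanged
}

// Output filter sketch: merge the remembered metadata back into the
// parsed log event before it is shipped.
function containerpilotOutputFilter (context, config, eventEmitter, data, callback) {
  const meta = logContext[context.sourceName]
  if (meta) {
    Object.assign(data, meta)
  }
  callback(null, data)
}
```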

If you have more questions, please feel free to reach out ...

megastef (Contributor) commented Apr 4, 2019

@bkcsfi We recently did something similar, parsing containerd log headers.
See the plugin code: https://github.com/sematext/logagent-js/blob/master/lib/plugins/input-filter/kubernetesContainerd.js
