Kubernetes logging, journalD, fluentD, and Splunk, oh my! #24677
cc @vishh
@dcowden Thanks for picking this up! The requirements for a logging solution are described here.
Once fluentd can be installed as a DaemonSet, we can avoid having to include the logging framework in kube-up.sh: launch the fluentd DaemonSet first, and then launch Splunk subsequently. As I said earlier, there is no need to add journald to the mix here.
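For illustration, a minimal sketch of what a fluentd DaemonSet could look like (the image name, labels, and mount paths are assumptions, not a tested manifest):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset   # illustrative image
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: docker-containers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: docker-containers
        hostPath:
          path: /var/lib/docker/containers
```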
Kubernetes metadata like labels and pod/container name/UID will be necessary for pushing the logs to a cluster-level solution.
Can this be achieved through option
My experience has been that users are not comfortable with relying completely on cluster-level logging solutions. It is still useful to keep some amount of recent container and system logs on the node, in case there are network issues in the cluster.
Personally, I wish we could skip docker and have containers log directly to files. We support attach, and overriding stdout/stderr is currently not possible. cc @kzk @edsiper, who were interested in helping with fluentd integration in the past
Ok thanks for the thoughts!
Good point. Though it feels 'clean' to simply send everything to the cloud, we also have found that in practice it is more reliable to log to files and then consume those. But isn't this a problem with the fluentd approach too? k8s uses a network connection to send logs to fluentd, right? What happens if the fluentd container dies?
Do you mean have the container log to files on the container filesystem, or on the node filesystem via a volume mount?
I do not understand what you mean.
What is the value of fluentd in this case then? If we agree that logging to the filesystem is best, then the most simple and straightforward strategy is to skip fluentd completely and simply have all of the containers mount a volume on the node filesystem and dump logs there. Then Splunk, or some other forwarder, can send the logs on to another host.
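For illustration, a minimal sketch of that pattern (all names and paths hypothetical): the app writes its log files into a volume backed by a directory on the node, and a node-level forwarder (a Splunk universal forwarder or similar) watches that directory.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: myapp
    image: myapp:latest            # illustrative image
    volumeMounts:
    - name: applogs
      mountPath: /var/log/myapp    # the app writes its log files here
  volumes:
  - name: applogs
    hostPath:
      path: /var/log/pods/myapp    # node directory a forwarder can watch
```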
The current setup is that containers write log files to the file system and Fluentd parses them. This approach has some benefits and also some performance penalties, as some people mentioned earlier. I would like to know what the general opinion is about how to handle the logs; currently we have two choices:
Comments are welcome.
Not to add confusion/more questions, but I would also like to understand what everyone thinks is the 'right' way for applications to behave. Should applications:
I really do not like option 3. I understand its benefits, but in our environment we are trying really hard to avoid reasons for people to need to get interactive sessions on containers. Plus, it leaves the door open for a wild west of file locations that everyone has to agree on. Options 1 and 2 make a clear, consistent pattern containers must follow, which I think is worth the trade-off that a given container might crash. I think either option 1 or 2 is 'reliable enough', given that the fluentd instance is on the same network as the container.

Regarding options 1 and 2, my experience has told me that you never completely eliminate the need to capture stderr/stdout. There's always something you missed, third-party software that isn't fluentd-aware, etc. So while I like the consistency of option 2, I feel like in practice option 1 is the most likely to actually work, i.e. option 2 really becomes option 1 AND option 2. What do people think about this?
In experimenting today, I have come up with a problem sending everything to stdout/stderr. I'm running a Tomcat application, so I have at least 3 distinct types of logs: (1) my application logs, which are currently formatted for Splunk, but can be done any way I wish. In my current production (non-docker) systems, these all go to different files. Then, when they are sent to Splunk, they are processed using different parsers based on the location. This way we can parse the logs correctly.

If we use the 'send all of the logs to stdout/stderr' strategy, this becomes much more difficult. I need one 'super parser' that can understand which type of log it is, which is a real nightmare. Further, even if I solve that, I have another problem: fluentd encapsulates my messages inside of a JSON payload. But what if my application wanted to send fields (like the Java class name and log priority)? I cannot easily do this if my log statement will be wrapped in the MESSAGE field of a JSON payload.

What this is telling me is that though sending 'all' of the logs from a container to stderr/stdout sounds great, in practice it will not work well. Which leads you back to either writing to fluentd or writing to files on the container filesystem. fluentd is more standardized, but will make applications fail really badly if fluentd stops working.
Since we are storing logs to the disk first before shipping off the node using fluentd, it is ok for fluentd to be temporarily offline. To be safe, we can keep fluentd running as it does today, where kubelet has the docker daemon sending logs to disk directly.
As of now containers log to stdout & stderr. Initially, we should redirect those fd's to files on the disk. Those files can be on volumes, or on directories managed by kubelet. They should not be on the rootfs of the container ideally.
What I meant was that it is not possible to redirect stdout/stderr of a container to a file because we support attach.
We are in agreement here. What you describe would be an ideal goal. Until we reach that goal, we can continue having the docker engine log to disk. To guarantee stability, I'd prefer applications running in Kubernetes to have the least amount of dependencies. So logging from the applications to local disk, either directly or via the docker daemon, is preferable over, say, logging to fluentd first before writing out to local disk.
@dcowden
@vishh But the 'log format' question is a tougher one, and to some extent it drives the implementation. Let's define a 'logging source' as a source of logs having a distinct format. Let's also assume that each logging source is communicating a group of fields. If we assume that a given pod (or even a single container) can have multiple logging sources generating logs, then we know that when they are read, the reader must have a way to intelligently deal with those logs. I suppose fluentd might be one such intelligent reader. At that point in the flow, two things must happen:
1 and 2 are not trivial at all. So I guess, combining all of this, the best strategy is:
I'm not clear at all on how exactly the steps above would work, because I don't know the k8s internals. Would it be best for k8s to consume files from fluentd? Or would it be best for k8s to natively get the files and then massage/parse them?
From a design perspective, I'd prefer k8s not interpreting log data. k8s can (and should) help manage logs and also provide means to add additional metadata around logs, like logging format, application type, etc. So, altering your proposal a bit:
Yes, I like that. I would further propose several other small additions:
We have several choices about how to deal with this. I'm not sure what type of log metadata you have in mind, but I think option (2b) above would be a mistake. The metadata for a single log file is hard enough -- trying to describe a single file that has more than one format in it sounds nightmarish. I think (2c) is most straightforward, but I'm afraid it would be viewed as too opinionated, so I feel like (2a) would probably be best. I think we need more discussion on what the metadata format is. Fluentd already has tons of functionality to parse logs, so I assume we'd be talking about something like this.
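A sketch of what that metadata might look like (keys and values purely illustrative, not an existing API):

```yaml
annotations:
  logging/format: "multiline-java"
  logging/time-format: "%Y-%m-%d %H:%M:%S,%L"
  logging/multiline-start-regex: "^\\d{4}-\\d{2}-\\d{2}"
```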
Or are you thinking about supporting all of the stuff necessary to process an arbitrary log format into fields?
@dcowden I'd extend the API discussed here to include an Annotation field.

```yaml
spec:
  volumes:
  - name: mylogs
    emptyDir: {}
  containers:
  - name: foo
    volumeMounts:
    - name: mylogs
      path: /var/log/
      policy:
        logsDir:
          subDir: foo
          glob: "*.log"
          rotate: Daily
    annotations:
      "fluentd-config": "actual fluentd configuration"
  - name: bar
    volumeMounts:
    - name: mylogs
      path: /var/log/
      policy:
        logsDir:
          subDir: bar
          glob: "*.log"
          rotate: Hourly
    annotations:
      "fluentd-config": "actual fluentd configuration"
```
Excellent! OK, I think we have a good picture of how to proceed then. There are details left, but this discussion definitely gives me a good idea of how I should proceed while being compatible with what is to come later. How can I help? I'm afraid I will not be able to help much: I'm a complete noob at Go, and I'm not very experienced with the codebase. But given that, if I can be of assistance, let me know and I will help if I can!
OK, I was thinking about this more, and I have a nagging 'detail' that I am worried could become a big deal. The problem: some apps generate messages that cannot be enriched without parsing them. Consider this JVM GC logging format:
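A representative line (values invented for illustration):

```
2016-04-26T09:13:07.123-0400: 64.690: [GC (Allocation Failure) [PSYoungGen: 262144K->19687K(305664K)] 327612K->85163K(1005056K), 0.0235824 secs] [Times: user=0.06 sys=0.01, real=0.02 secs]
```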
It is non-trivial to convert this into a sane, fielded message. Tools like fluentd and Splunk can do it, with some work. We've agreed that we do NOT want k8s to get into the parsing business. The current state is broken, but it's not Kubernetes' fault. Today, the docker journald and json drivers encapsulate this type of message into a JSON message.
Both fluentd and Splunk will have a hard time with this type of message. It is now an 'enveloped' format. Getting the top-layer JSON fields out of it is easy, but getting those AND then dealing with the structure of the internal message is not easy at all. Currently most of our applications are Splunk-optimized, so we have messages that look like this:
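For illustration, an invented but representative line:

```
2016-04-26 09:13:07,123 level=INFO class=com.example.OrderService msg="order created" orderId=1234
```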
And they end up enveloped like this:
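Roughly, with docker's json-file driver (timestamp invented), the wrapped line looks like:

```json
{"log":"2016-04-26 09:13:07,123 level=INFO class=com.example.OrderService msg=\"order created\" orderId=1234\n","stream":"stdout","time":"2016-04-26T13:13:07.123456789Z"}
```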
K8s isn't making things worse, and could easily add fields into these encapsulated messages since the outer message is JSON. But the situation still stinks. How can we collect the logs while also letting k8s and docker add fields along the way?
Of these options, I recommend option 4. It stinks to have k8s doing some log parsing, but that's better than accepting weird problems when the fluentd daemon goes down, or accepting that it's really hard to parse logs downstream. I think option 1 is the right answer in the long term; nearly everyone in the logging community is in favor of JSON log messages. Option 4 would allow people/systems to transition to JSON over time. I'm open to other options. Sorry for the long post.
@vishh definitely!
Parsing the log format of the docker daemon is a general problem. For the longer term, we should work towards having containers log to disk directly, thereby not clobbering log lines. FYI: @edsiper works on fluentd :)
@vishh I am not sure how Docker containers are launched, but I think Kubernetes could add its metadata when starting them, e.g. via the log driver options described in https://docs.docker.com/engine/admin/logging/overview/. If we always use the JSON file format, custom fields can be added as attributes.
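As a sketch, assume the container was started with `--log-opt labels=environment` and carries a label `environment=production` (names illustrative); each line the json-file driver writes would then carry those fields under `attrs`:

```json
{"log":"order created\n","stream":"stdout","time":"2016-04-26T13:13:07.123456789Z","attrs":{"environment":"production"}}
```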
Then Fluentd can tail the JSON files in the old-fashioned way.
@edsiper That sounds like a good short-term solution. In the long term, though, we'd like to not use the docker daemon for logs at all, as I mentioned earlier. That means there will be no docker adding metadata to each log line; instead, we will have to define a new k8s plugin that can add metadata to one or more log files. @jimmidyson how do we add k8s metadata to logs now? Is #24677 (comment) something that we should look into?
@vishh adding metadata to log lines sounds tricky without knowing the log format. How do you handle multiline logs correctly?
@dcowden Are you referring to existing docker support for additional metadata in logs, or the long-term solution I was alluding to?
@vishh I was referring to the long-term plan, when the docker daemon is no longer in the picture. In the long-term plan, containers will be logging to a location on the filesystem that is specified in the k8s metadata as a logging volume. k8s knows these files are logs, and knows that it should add metadata to them. They are in whatever format the application is using, which could include multi-line stack traces. How would k8s add metadata to each line in that case, without knowing whether it is a multi-line log format or not? And what format would it use to append data, given that it wouldn't be consistent with the existing log format?
k8s need not add metadata to each line. It just needs to make the metadata available for logging daemons to consume along with the log files. One option would be to create metadata files inside each log volume in a curated format and have logging daemons associate log lines with metadata appropriately.
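For example, a small metadata file dropped alongside the log files might look something like this (the file name, fields, and format are entirely hypothetical):

```yaml
# .kubernetes-metadata.yaml -- hypothetical file written into the log volume
podName: foo-7d9f
namespace: default
containerName: foo
labels:
  app: foo
logs:
- glob: "*.log"
  format: multiline-java
```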
Oh, I see, OK! How do you envision logging daemons getting the logs and metadata? Would the k8s API provide an endpoint that logging daemons consume, or would the logging daemons read the data directly off the filesystem? I'm sorry to be asking so many questions; I need to set my system up and I'd like to do it as similarly as possible to what will eventually become the 'right' way.
@vishh Kubernetes metadata is added to all log events by our fluentd kubernetes metadata filter. Currently this relies on extracting info from the log file name. Could you link to details on the long-term plan, bypassing docker for streaming logs as you describe above? Multiline logs are really tricky to handle in a generic way across platforms/languages/frameworks. This is one reason that structured logging is highly preferred; the basic JSON log that Docker provides makes some of this a bit easier.
@jimmidyson the structured JSON logs in docker make it harder too, when the logs were already structured before docker got hold of them.
@dcowden if k8s is sitting in the middle between the container and journald (e.g. if the runtime doesn't inject directly to journald), it can just attach additional fields before writing to journald. If k8s is not sitting in the log hotpath (i.e. the runtime can inject directly to journald), it can just ask the runtime to do it.
This item has generated a lot of discussion and interest -- thanks to all who have helped! I've created a document at the request of the sig-instrumentation group that more clearly describes the requirements, as well as the comments/thoughts of the discussion in this thread: https://docs.google.com/document/d/1K2hh7nQ9glYzGE-5J7oKBB7oK3S_MKqwCISXZK-sB2Q
I think it's worth mentioning that you can create a container to send all the docker logs to Splunk, which would include all Kubernetes container logs: http://blogs.splunk.com/2015/08/24/collecting-docker-logs-and-stats-with-splunk/
Thanks for the link @dmcnaught, but unfortunately that technique still leaves a lot of gaps. Since that blog post, Splunk has shipped an official Docker image that includes a pre-configured Docker monitor (hub.docker.com/splunk/splunk:6.5.0-monitor). This, combined with a daemonset and some file monitors for your apps' other output, does get you pretty far towards an end-to-end solution. Hal @ Splunk
Thanks @halr9000 - I was looking for a better article from Splunk that describes what we've done, but I didn't find one. We have done something very similar to this (https://hub.docker.com/r/splunk/universalforwarder/), and we also add our certs to encrypt data to our Splunk server. We run it as a k8s daemonset.
If it was in Oct @dmcnaught, then no doubt he'd submitted to an earlier unofficial image (it was under github.com/outcoldman) that unfortunately had to be pulled prior to the official one being published. Changes were quashed, which made us very sad.
For those who might be interested, we recently updated the k8s documentation regarding logging: https://kubernetes.io/docs/user-guide/logging/overview/ This new page describes the current situation with logging in Kubernetes and lists solutions from the most preferable in most cases to the least preferable in most cases. It also includes an example of a sidecar container streaming logs from files to its own stdout. It's worth mentioning that we also plan to document how to configure a node-level logging agent in 1.6.
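A minimal sketch of that sidecar pattern (image, paths, and names illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: myapp
    image: myapp:latest                 # assumed to write its logs to /var/log/app/app.log
    volumeMounts:
    - name: applogs
      mountPath: /var/log/app
  - name: log-streamer                  # sidecar re-emits the file on its own stdout
    image: busybox
    args: ["/bin/sh", "-c", "tail -n+1 -f /var/log/app/app.log"]
    volumeMounts:
    - name: applogs
      mountPath: /var/log/app
  volumes:
  - name: applogs
    emptyDir: {}
```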
Stackdriver has a new sink export to Pub/Sub; this is a very elegant solution.
Wow, thanks for the information. I was looking for a long time for this answer in Splunk administration, and I need some information about this.
FYI for people interested in Splunk + Kubernetes: we just published the first version of our application "Monitoring Kubernetes" (https://splunkbase.splunk.com/app/3743/) and collector (https://www.outcoldsolutions.com). Please take a look at our manual on how to get started: https://www.outcoldsolutions.com/docs/monitoring-kubernetes/
My K8s->Splunk logging is currently broken by issues in the (mostly inactive) https://github.com/brycied00d/fluent-plugin-splunk-http-eventcollector plugin, but I notice that Splunk has https://github.com/splunk/fluent-plugin-splunk-hec as well as https://github.com/splunk/splunk-connect-for-kubernetes, so I'm going to give one of those a try.
You should use Fluent Bit.
In order to run k8s in our environment, I need to get logging information into Splunk. I need to pick up node logs AND application logs. I know at this point there is no drop-in solution, so I know I'll need to hack something together.
What I'm seeking in this issue is not a 'here's the solution', though that would be great of course.
I'm seeking advice on which of several solutions I've come up with are most likely to work, given how Kubernetes is evolving.
Research and references
Issue #1071 seems to have resulted in the current ability to choose 'elasticsearch' as a provider in kube-up.sh
Issue #17183 seems to indicate that the future is not yet clear.
Issue #23782 contemplates some limitations of the current approach also
Issue #21285 provides a proxy to send data to AWS instead of ES
Based on many sources, it seems clear that systemd/journald is the future of logging in *nix. Latest versions of RHEL/CentOS/Fedora and Ubuntu have moved this way.
While I'm of course willing to hack, I'd really rather avoid needing to re-build the k8s images if possible.
Option 1 -- k8s -> fluentd -> Splunk
I could use fluentd, and then use a fluentd splunk plugin. If I use kube-up.sh with KUBERNETES_LOGGING_DESTINATION='elasticsearch', I get a fluentd/ES setup. There are plugins for fluentd that can forward content on to splunk.
But I do not know how I would configure k8s to NOT install elasticsearch as a part of startup. I also think this represents an extra layer of forwarding I'd rather avoid. With this solution, I think I would need to send all application output to STDOUT/STDERR, since this is how k8s currently gathers stuff. This solution does not use journald, which makes me think this option is doomed to change in the near future. Which leads me to option 2.
Option 2 -- k8s -> journald, docker -> journald
I could forward all logs (from the nodes and pods too) to journald using the docker journald log driver, and then capture data out of the journald logs and send it to Splunk from there. Honestly this seems like the 'right' solution. Why re-invent log capture? If not already true, any strong security setup is going to require centralized log capture, and that will need to be based on journald.
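For reference, one way to switch the Docker daemon to the journald driver is via /etc/docker/daemon.json (assuming a Docker version that reads that file); this is a sketch, not a tested setup:

```json
{
  "log-driver": "journald"
}
```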
There are two problems with this, though:
Option 3 -- docker -> k8s -> journald
Logging directly from docker to journald means there isn't a chance to add k8s metadata. What might be the coolest option is if k8s provided a docker logging driver that then proxies the data and sends it to journald. That way, as a system administrator, I just need to go to one place to get all my logs -- the journald service on each node. But I still get the k8s metadata too.
I have no reasonable ability to execute this option though; it would be a big change. But I'd be willing to help if this is a good way to do it.
Option 4 -- Splunk docker log driver
Splunk has an experimental log driver I could use. But it doesn't allow k8s to see the data, or to enrich it (i.e. it will certainly break k8s logs). This might work as a workaround, but it is not appealing.
http://blogs.splunk.com/2015/12/16/splunk-logging-driver-for-docker/
Closing Thoughts
If option 1 is the best, I could use some ideas about how I could get k8s to start up fluentd but NOT start elasticsearch.
If option 2 is best, I could use some help pointing me in the right direction about how to avoid breaking Kubernetes when I configure the docker daemons on the nodes to send logs to journald instead of the JSON logs.
If option 3 is best, I need some pointers on how I could contribute.
Thanks for any insights you can offer.