
Kubernetes logging, journalD, fluentD, and Splunk, oh my! #24677

Closed
dcowden opened this issue Apr 22, 2016 · 114 comments
Labels
area/extensibility area/kubelet-api area/logging lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@dcowden

dcowden commented Apr 22, 2016

In order to run k8s in our environment, I need to get logging information into Splunk. I need to pick up node logs AND application logs. I know at this point there is no drop-in solution, so I know I'll need to hack something together.

What I'm seeking in this issue is not a 'here's the solution' answer, though of course that would be great.

I'm seeking advice on which of several solutions I've come up with are most likely to work, given how Kubernetes is evolving.

Research and references

Issue #1071 seems to have resulted in the current ability to choose 'elasticsearch' as a provider in kube-up.sh

Issue #17183 seems to indicate that the future is not yet clear.

Issue #23782 contemplates some limitations of the current approach also

Issue #21285 provides a proxy to send data to AWS instead of ES

Based on many sources, it seems clear that systemd/journald is the future of logging on *nix. The latest versions of RHEL/CentOS/Fedora and Ubuntu have moved this way.

While I'm of course willing to hack, I'd really rather avoid needing to re-build the k8s images if possible.

Option 1 -- k8s->fluentd-->splunk

I could use fluentd and then a fluentd Splunk plugin. If I use kube-up.sh with KUBERNETES_LOGGING_DESTINATION='elasticsearch', I get a fluentd/ES setup. There are plugins for fluentd that can forward content on to Splunk.
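
For illustration, the forwarding piece would be a fluentd output stanza roughly like this (a sketch assuming the fluent-plugin-splunk-hec output plugin; the host and token are placeholders):

<match kubernetes.**>
  # host and token below are placeholders
  @type splunk_hec
  hec_host splunk.example.com
  hec_port 8088
  hec_token YOUR-HEC-TOKEN
</match>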

But I do not know how I would configure k8s to NOT install Elasticsearch as part of startup. I also think this represents an extra layer of forwarding I'd rather avoid. With this solution, I think I would need to send all application output to STDOUT/STDERR, since this is how k8s currently gathers things. This solution does not use journald, which makes me think this option is doomed to change in the near future. That leads me to option 2.

Option 2-- k8s->journald, docker->journald

I could forward all logs (from the nodes and the pods too) to journald using the Docker journald log driver, and then capture data out of the journald logs and send it to Splunk from there. Honestly, this seems like the 'right' solution. Why re-invent log capture? If not already true, any strong security setup is going to require centralized log capture, and that will need to be based on journald.

There are two problems with this, though:

  1. doing it this way will not allow kubernetes metadata to be included in the log stream, and
  2. I believe this will break kubectl logs, because it relies on the docker json logs.
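
For reference, this option comes down to a daemon-level Docker setting (e.g. in /etc/docker/daemon.json on newer Docker releases, or --log-driver=journald on the daemon command line), which is exactly why it applies to every container and bypasses the JSON files that kubectl logs reads:

{
    "log-driver": "journald"
}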

Option 3 -- docker -> k8s -> journald

Logging directly from Docker to journald means there isn't a chance to add k8s metadata. What might be the coolest option is if k8s provided a Docker logging driver that proxies the data and sends it to journald. That way, as a system administrator, I just need to go to one place to get all my logs -- the journald service on each node -- but I still get the k8s metadata too.

I have no reasonable ability to execute this option, though; it would be a big change. But I'd be willing to help if this is a good way to do it.

Option 4-- splunk docker log driver

Splunk has an experimental log driver I could use, but it doesn't allow k8s to see the data or to enrich it (i.e., it will certainly break kubectl logs). This might work as a workaround, but it is not appealing.

http://blogs.splunk.com/2015/12/16/splunk-logging-driver-for-docker/

Closing Thoughts

If option 1 is the best, I could use some ideas about how I could get k8s to start up fluentd but NOT start Elasticsearch.

If option 2 is best, I could use some help pointing me in the right direction on how to avoid breaking Kubernetes when I configure the Docker daemons on the nodes to send logs to journald instead of the JSON log files.

If option 3 is best, I need some pointers on how I could contribute.

Thanks for any insights you can offer.

@pwittrock pwittrock added sig/node Categorizes an issue or PR as relevant to SIG Node. area/logging labels Apr 22, 2016
@dchen1107
Member

cc/ @vishh

@vishh
Contributor

vishh commented Apr 22, 2016

@dcowden Thanks for picking this up!

The requirements for a logging solution are described here, and journald does not satisfy some of them.
The fact that *nix is moving to journald doesn't mean that k8s has to embrace it, right? k8s is providing cluster management, and I feel it is not necessary to treat all the containers that k8s runs as system daemons; the latter can use journald. k8s, for example, could even run on diskless machines by shipping logs directly from the container to a cluster-level logging provider (hosted Splunk or Elasticsearch, for example).

Option 1 -- k8s->fluentd-->splunk

Once fluentd is installed as a DaemonSet, we can avoid having to include the logging framework in kube-up.sh: launch the fluentd DaemonSet first and then launch Splunk afterwards. As I said earlier, there is no need to add journald to the mix here.
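
A rough sketch of what such a fluentd DaemonSet could look like (the image, mounts, and API version here are illustrative placeholders, not an official manifest):

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd-logging
spec:
  template:
    metadata:
      labels:
        app: fluentd-logging
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd:latest      # placeholder image
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: dockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: dockercontainers
        hostPath:
          path: /var/lib/docker/containers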

Option 2-- k8s->journald, docker->journald

Kubernetes metadata like labels and pod and container name/UID will be necessary for pushing the logs to a cluster-level solution.
journald is not designed for container workloads: k8s will want to restrict disk space and bandwidth per container/pod and rotate logs for each container, and journald does not provide such options.

Option 3 -- docker -> k8s -> journald

Can this be achieved through option 1 by having fluentd output to journald as well? In any case, the problems I mentioned with journald will have to be addressed.

Option 4-- splunk docker log driver

My experience has been that users are not comfortable relying completely on cluster-level logging solutions. It is still useful to keep some amount of recent container and system logs on the node, in case there are network issues in the cluster.

Closing Thoughts

Personally, I wish we could skip Docker and have containers log directly to files. We support attach, and overriding stdout/stderr is currently not possible.
Among the options you proposed, I prefer Option 1 with some changes. If we can have fluentd output to local disk by default, in addition to supporting Splunk and ELK, that will satisfy most of the requirements. Fluentd can add the Kubernetes metadata, and the kubelet can manage logs on disk for running and dead containers.

cc @kzk @edsiper who were interested in helping with fluentd integration in the past

@dcowden
Author

dcowden commented Apr 23, 2016

Ok thanks for the thoughts!

it is still useful to keep some amount of recent container and system logs on the node, in case there are network issues in the cluster.

Good point. Though it feels 'clean' to simply send everything to the cloud, we have also found that in practice it is more reliable to log to files and then consume those. But isn't this a problem with the fluentd approach too? k8s uses a network connection to send logs to fluentd, right? What happens if the fluentd container dies?

Personally, I wish we can skip docker and have containers log directly to files.

Do you mean have the container log to files on the container filesystem, or on the node filesystem via a volume mount?

We support attach and overriding stdout/stderr is currently not possible..

I do not understand what you mean.

If we can have fluentd output to local disk by default, in addition to supporting splunk and ELK that will satisfy most of the requirements.

What is the value of fluentd in this case, then? If we agree that logging to the filesystem is best, then the simplest and most straightforward strategy is to skip fluentd completely and have all of the containers mount a volume on the node filesystem and dump logs there. Then Splunk, or some other forwarder, can send the logs on to another host.

@edsiper
Contributor

edsiper commented Apr 23, 2016

The current setup is that containers write log files to the file system and Fluentd parses them. This approach has some benefits and also some performance penalties, as some people mentioned earlier. I would like to know the general opinion about how to handle the logs; currently we have two choices:

  • The container writes logs to Fluentd over the network (Docker provides a native Fluentd logging driver). Fluentd can then do two things: write the records to the file system and also flush them to some destination (Elasticsearch). This approach is pretty common, works properly, and improves performance (see the example after this list).
  • Write logs to the file system and have Fluentd consume them.
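
For the first approach, the Docker fluentd logging driver is enabled per container, for example (the fluentd address is a placeholder for a local forward input):

$ docker run --log-driver=fluentd --log-opt fluentd-address=localhost:24224 -d -P training/webapp python app.py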

Comments are welcome

@dcowden
Author

dcowden commented Apr 23, 2016

Not to add confusion/more questions, but I would also like to understand what everyone thinks is the 'right' way for applications to behave. Should applications:

  1. write to stderr/stdout
  2. write to a fluentd instance on a known address (but assumed to be on the same node for reliability)
  3. write to files in the container filesystem

I really do not like option 3. I understand its benefits, but in our environment we are trying really, really hard to avoid giving people reasons to need interactive sessions on containers. Plus, it leaves the door open for a wild west of file locations that everyone has to agree on.

Options 1 and 2 make a clear, consistent pattern containers must follow, which I think is worth the trade-off that a given container might crash. I think either option 1 or 2 is 'reliable enough', given that the fluentd instance is on the same network as the container.

Regarding options 1 and 2, my experience has taught me that you never completely eliminate the need to capture stderr/stdout. There's always something you missed, third-party software that isn't fluentd-aware, etc. So while I like the consistency of option 2, I feel like in practice option 1 is the most likely to actually work. I.e., option 2 really becomes option 1 AND option 2.

What do people think about this?

@dcowden
Author

dcowden commented Apr 25, 2016

While experimenting today, I ran into a problem with sending everything to stdout/stderr.

I'm running a Tomcat application, so I have at least 3 distinct types of logs:

(1) my application logs, which are currently formatted for Splunk, but can be done any way I wish
(2) the Tomcat logs, which can be controlled via Tomcat configuration files to some extent, but that is kind of a pain
(3) the Java GC logs, over which I have no control whatsoever

In my current production (non-Docker) systems, these all go to different files. Then, when they are sent to Splunk, they are processed using different parsers based on the file location. This way, we can parse the logs correctly.
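
Concretely, that per-location separation is expressed today with per-path sourcetypes in the forwarder's inputs.conf, something like this (the paths and sourcetype names are placeholders):

[monitor:///opt/app/logs/app.log]
sourcetype = myapp:json

[monitor:///opt/tomcat/logs/catalina.out]
sourcetype = tomcat:catalina

[monitor:///opt/app/logs/gc.log]
sourcetype = jvm:gc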

If we use the 'send all of the logs to stdout/stderr' strategy, this becomes much more difficult: I need one 'super parser' that can understand which type of log each line is, which is a real nightmare.

Further, even if I solve that, I have another problem: fluentd encapsulates my messages inside a JSON payload. But what if my application wanted to send fields (like the Java class name and log priority)? I cannot easily do this if my log statement is going to be squeezed into the MESSAGE field of a JSON payload.

What this is telling me is that though sending 'all' of the logs from a container to stderr/stdout sounds great, in practice it will not work well.

That leads you back to either writing to fluentd or writing to files on the container filesystem.

Fluentd is more standardized, but it will make applications fail really badly if fluentd stops working.
Using container files is more reliable, but it becomes hard to figure out where all of the files you need to monitor are in a given container. That is a pain too.

@vishh
Contributor

vishh commented Apr 29, 2016

@dcowden

But isn't this a problem with the fluentD approach too? k8s uses a network connection to send logs to fluentd right? What happens if the fluentd container dies?

Since we are storing logs on disk first, before shipping them off the node using fluentd, it is OK for fluentd to be temporarily offline. To be safe, we can keep fluentd running as it does today, where the kubelet has the Docker daemon sending logs directly to disk.

Do you mean have the container log to files on the container filesystem, or on the node filesystem via a volume mount?

As of now, containers log to stdout & stderr. Initially, we should redirect those fds to files on disk. Those files can be on volumes or in directories managed by the kubelet; ideally, they should not be on the rootfs of the container.
Many applications might want to write to files instead of stdout/stderr. In such cases, we need Kubernetes API changes to let users specify the logging directories and have the kubelet manage them.

We support attach and overriding stdout/stderr is currently not possible..
I do not understand what you mean.

What I meant was that it is not possible to redirect a container's stdout/stderr to a file, because we support the attach feature.

What is the value of fluentd in this case then? If we are agreed that logging to the filesystem is best, then the most simple and straighforward strategy is to skip fluentd completely, and simply have all of the containers mount a volume on the node filesystem and dump logs there. Then, splunk, or some other forwarder can send the logs on to another host..

We are in agreement here. What you describe would be the ideal goal. Until we reach that goal, we can continue having the Docker engine log to disk.

@edsiper

To guarantee stability, I'd prefer applications running in Kubernetes to have the least amount of dependencies. So logging from the applications to local disk, either directly or via the Docker daemon, is preferable over, say, logging to fluentd first before writing out to local disk.

@vishh
Contributor

vishh commented Apr 29, 2016

@dcowden
#24677 (comment) surfaces one of the major flaws with logging to stdout/stderr.
#13010 was an attempt to define logging volumes as a first-class feature. If we have pods express their logging configuration (files/directories & format) via logging volumes, we can then have fluentd and other such agents parse the log files using the config attached to the logging volume.

@dcowden
Author

dcowden commented Apr 29, 2016

@vishh
I skimmed #13010. Most of that discusses the implementation of the LogDir.

But the 'log format' question is a tougher one that, to some extent, drives the implementation. Let's define a 'logging source' as a source of logs having a distinct format. Let's also assume that each logging source is communicating a group of fields.

If we assume that a given pod (or even a single container) can have multiple logging sources generating logs, then we know that when they are read, the reader must have a way to deal with those logs intelligently. I suppose fluentd might be one such intelligent reader. At that point in the flow, two things must happen:

  1. The logs must be converted into a stream of events, each event having fields
  2. the k8s metadata must be added to the stream as well, as a set of other fields

1 and 2 are not trivial at all.

So I guess, combining all of this, the best strategy is:

  1. Containers write logs to multiple files in a logging directory.
    1a. If possible, containers use self-describing formats like JSON.
    1b. Containers that cannot (like Java processes emitting GC logs) must accept some way to describe how to parse the log into events.
  2. k8s can grab the logs and use the information from 1b to convert them to fielded events,
    2a. then enrich the data with k8s fields,
    2b. then send it on to fluentd.

I'm not clear at all on how exactly the steps above would work, because I don't know the k8s internals. Would it be best for k8s to consume files from fluentd? Or would it be best for k8s to natively get the files and then massage/parse them?

@vishh
Contributor

vishh commented Apr 29, 2016

From a design perspective, I'd prefer that k8s not interpret log data. k8s can (and should) help manage logs and also provide a means to attach additional metadata to logs, like logging format, application type, etc.

So altering your proposal a bit,

  1. Containers should log to either stdout/stderr, or to logging volumes.
  2. Logging volumes will include additional metadata that will help logging agents like fluentd parse log files.
  3. Logging volumes can also include retention & log rotation policies. The kubelet can perform log rotation and clean up log volumes based on the retention policy. For example, log files that are super critical can be retained by the kubelet until they have all been read by fluentd and emptied.
  4. Fluentd (or any other agent) can track these log volumes per pod, either via REST APIs or a filesystem protocol, and use the metadata attached to the log volumes to parse the log files. The kubelet can, for example, create log volumes in a well-known directory on the host, with the log volume metadata stored within those volumes.

@dcowden
Author

dcowden commented Apr 29, 2016

Yes, I like that. I would further propose several other small additions:

  1. It should also be possible to associate the additional metadata described in step 2 with the stderr/stdout stream. That way, you're not a second-class citizen from a metadata viewpoint if stderr/stdout works well for you.
    1. I'd hazard a guess that anybody using a JVM, and anyone using even remotely complex containers, will ultimately not be well served by using stderr/stdout, because there are frequently > 1 types of logs mixed together.

We have several choices about how to deal with this:
(2a) Recommend using stderr/stdout only when you know you have one type of log output and can describe it with very simple metadata, and recommend strongly that if you have more than one kind of log, you use the filesystem.
(2b) Create a log-format metadata format complex enough to represent the notion of a single stream that has more than one type of log in it.
(2c) Do not support stderr/stdout: use files, and if we need to support stdout/stderr, simply redirect those streams to files.

I'm not sure what type of log metadata you have in mind, but I think option (2b) above would be a mistake. The metadata for a single log file is hard enough -- trying to describe a single file that has more than one format in it sounds nightmarish.

I think (2c) is the most straightforward, but I'm afraid it would be viewed as too opinionated, so I feel like (2a) would probably be best.

I think we need more discussion on what the metadata format is. Fluentd already has tons of functionality to parse logs, so I assume we'd be talking about something like:

{
    "path": "/var/logs/gc.log",
    "format": "JVM-GC",        <-- a pointer to a fluentd configuration that knows how to parse this log
    "retentionPolicy": {
        "type": "size",
        "value": "10M"
    },
    "rotationPolicy": {
        "type": "postfix",
        "gzip": true,
        "suffix": ".%d"
    }
}

Or are you thinking about supporting all of the machinery necessary to process an arbitrary log format into fields?

@vishh
Contributor

vishh commented Apr 29, 2016

@dcowden
+1 for associating metadata with stdout/stderr, in addition to log files & directories.
Option 2a is what makes sense to me as well.
As for metadata specific to logging daemons, since core kube components will not be interpreting that data, we can pass them along using Annotations on log volumes.

I'd extend the API discussed here to include an Annotation field.

spec:
    volumes:
        - name: mylogs
          emptyDir: {}
    containers:
        - name: foo
          volumeMounts:
              - name: mylogs
                path: /var/log/
                policy:
                    logsDir:
                        subDir: foo
                        glob: "*.log"
                        rotate: Daily
                        annotations:
                            "fluentd-config": "actual fluentd configuration"
        - name: bar
          volumeMounts:
              - name: mylogs
                path: /var/log/
                policy:
                    logsDir:
                        subDir: bar
                        glob: "*.log"
                        rotate: Hourly
                        annotations:
                            "fluentd-config": "actual fluentd configuration"

@vishh
Contributor

vishh commented Apr 29, 2016

@thockin

@dcowden
Author

dcowden commented Apr 29, 2016

Excellent! OK, I think we have a good picture of how to proceed. There are details left, but this discussion definitely gives me a good idea of how I should proceed while staying compatible with what is to come later.

How can I help? I'm afraid I will not be able to help much: I'm a complete noob at Go, and I'm not very experienced with the codebase. But given that, if I can be of assistance, let me know and I will help if I can!

@vishh
Contributor

vishh commented Apr 29, 2016

@dcowden : As for next steps, we need an owner for this feature. @edsiper is this something you are interested in tackling?

@dcowden
Author

dcowden commented Apr 30, 2016

Ok I was thinking about this more, and I have a nagging 'detail' that I am worried could become a big deal.

The problem: some apps generate messages that cannot be enriched without parsing them

Consider this JVM gc logging format:

2015-05-26T14:45:59.690-0200: 172.829: [GC (Allocation Failure) 172.829: [DefNew: 629120K->629120K(629120K), 0.0000372 secs]172.829: [Tenured: 1203359K->755802K(1398144K), 0.1855567 secs] 1832479K->755802K(2027264K), [Metaspace: 6741K->6741K(1056768K)], 0.1856954 secs] [Times: user=0.18 sys=0.00, real=0.18 secs]

It is non-trivial to convert this into a sane, fielded message. Tools like fluentd and Splunk can do it, with some work. We've agreed that we do NOT want k8s to get into the parsing business.

The current state is broken, but it's not Kubernetes' fault.

Today, the Docker journald and json-file drivers encapsulate this type of message in a JSON envelope:

{
    "containerid": "a7b82c9sd3f92202",
    "time": "2015-05-26T14:45:59.690",
    "pid": "20122",
    "message": "2015-05-26T14:45:59.690-0200: 172.829: [GC (Allocation Failure) 172.829: [DefNew: 629120K->629120K(629120K), 0.0000372 secs]172.829: [Tenured: 1203359K->755802K(1398144K), 0.1855567 secs] 1832479K->755802K(2027264K), [Metaspace: 6741K->6741K(1056768K)], 0.1856954 secs] [Times: user=0.18 sys=0.00, real=0.18 secs]",
    ...other fields...
}

Both fluentd and Splunk will have a hard time with this type of message. It is now an 'enveloped' format: getting the top-level JSON fields out of it is easy, but getting those AND then dealing with the structure of the internal message is not easy at all.

Currently most of our applications are splunk optimized, so we have messages that look like this:

2015-05-26T14:45:59.690 class=MyJavaClass priority=2 user=joe url=http:/my/url message

And they end up enveloped like this:

{
    "containerid": "a7b82c9sd3f92202",
    "time": "2015-05-26T14:45:59.690",
    "pid": "20122",
    "message": "2015-05-26T14:45:59.690 class=MyJavaClass priority=2 user=joe url=http:/my/url message"
}

K8s isn't making things worse, and it could easily add fields to these encapsulated messages since the outer message is JSON. But the situation still stinks.

How can we collect the logs, while also letting k8s and docker add fields along the way?

  1. Make all apps use JSON. Obviously, this won't work for lots of cases, like the JVM GC logs.
  2. Stick with the envelope format, like things work today, and write fluentd and Splunk parsers that understand this format. That'd be quite a pain, but I guess it's somehow doable. Offhand, I have no idea how to do it in Splunk, though.
  3. Send logs to fluentd before Docker/k8s gets them. Fluentd gets them first, and can parse them and produce a nice JSON message. Unfortunately, this goes against our architectural desire to avoid sending messages directly to fluentd, due to reliability considerations.
  4. Have k8s do just enough parsing to detect whether an envelope is necessary. If the message already looks like JSON, the k8s fields can be added in easily. Otherwise, the fields can be appended to the end of the message in kv format, which would be fairly easy for either Splunk or fluentd to handle downstream (much easier than JSON with an odd format inside one field).

Of these options, I recommend option 4. It stinks to have k8s doing some log parsing, but that's better than accepting weird problems when the fluentd daemon goes down, or accepting that it's really hard to parse logs downstream.

I think option 1 is the right answer in the long term; nearly everyone in the logging community is in favor of JSON log messages. Option 4 would allow people/systems to transition to JSON over time.
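
To make option 4 concrete, here is a rough sketch of the 'just enough parsing' idea in Go (a hypothetical helper for illustration, not actual Kubernetes code): if a line parses as a JSON object, merge the k8s fields into it; otherwise, append them as key="value" pairs.

package main

import (
	"encoding/json"
	"fmt"
)

// enrich adds k8s metadata to a raw log line using "just enough" parsing:
// if the line is already a JSON object, the metadata is merged into it;
// otherwise the metadata is appended to the line in key="value" form.
func enrich(line string, meta map[string]string) string {
	var obj map[string]interface{}
	if err := json.Unmarshal([]byte(line), &obj); err == nil {
		// Already JSON: add the k8s fields and re-serialize.
		for k, v := range meta {
			obj[k] = v
		}
		if out, err := json.Marshal(obj); err == nil {
			return string(out)
		}
	}
	// Not JSON (e.g. a raw JVM GC line): append kv pairs at the end.
	for k, v := range meta {
		line += fmt.Sprintf(" %s=%q", k, v)
	}
	return line
}

func main() {
	meta := map[string]string{"pod": "web-1", "namespace": "default"}
	fmt.Println(enrich(`{"msg": "hello"}`, meta))
	fmt.Println(enrich("2015-05-26T14:45:59.690-0200: 172.829: [GC (Allocation Failure) ...]", meta))
}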

I'm open to other options. Sorry for the long post.

@edsiper
Contributor

edsiper commented Apr 30, 2016

@vishh definitely!

@vishh
Contributor

vishh commented May 2, 2016

Parsing the Docker daemon's log format is a general problem. For the longer term, we should work towards having containers log directly to disk, thereby not clobbering log lines.
Until that is possible, I'd prefer solving the parsing issue in fluentd or Splunk.
Given that both Docker and Kubernetes are starting to be used extensively, it probably makes sense to have plugins that can interpret Docker's log format and Kubernetes metadata.

FYI: @edsiper works on fluentd :)

@edsiper
Contributor

edsiper commented May 3, 2016

@vishh I am not sure how Docker containers are launched, but I think Kubernetes may add its metadata when starting them, e.g.:

https://docs.docker.com/engine/admin/logging/overview/

If we always use the json-file log driver, custom fields can be added as attributes via the labels/env log opts, e.g.:

$ docker run --label foo=bar -e fizz=buzz --log-opt labels=foo --log-opt env=fizz -d -P training/webapp python app.py

generates:

"attrs":{"fizz":"buzz","foo":"bar"}

Then Fluentd can tail the JSON files in the old-fashioned way.

@vishh
Contributor

vishh commented May 3, 2016

@edsiper That sounds like a good short-term solution. In the long term, though, we'd like to not use the Docker daemon for logs at all, as I mentioned earlier. That means there will be no metadata added to each log line; instead, we will have to define a new k8s plugin that can add metadata to one or more log files.

@jimmidyson how do we add k8s metadata to logs now? Is #24677 (comment) something that we should look into?

@dcowden
Author

dcowden commented May 3, 2016

@vishh adding metadata to log lines sounds tricky without knowing the log format. How do you handle multiline logs correctly?

@vishh
Contributor

vishh commented May 3, 2016

@dcowden Are you referring to existing docker support for additional metadata in logs or the long-term solution I was alluding to?

@dcowden
Author

dcowden commented May 4, 2016

@vishh I was referring to the long-term plan, when the Docker daemon is no longer in the picture. In that plan, containers will be logging to a location on the filesystem that is specified in the k8s metadata as a logging volume. k8s knows these files are logs and knows that it should add metadata to them; they are in whatever format the application is using, which could include multi-line stack traces.

How would k8s add metadata to each line in that case, without knowing whether it is a multi-line log format or not? And what format would it use to append data, given that it wouldn't be consistent with the existing log format?

@vishh
Contributor

vishh commented May 4, 2016

k8s need not add metadata to each line. It just needs to make the metadata available for logging daemons to consume along with the log files. One option would be to create metadata files inside each log volume in a curated format and have logging daemons associate log lines with the metadata appropriately.
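
Purely to illustrate that idea (nothing like this exists today; the file name and fields below are made up), such a curated metadata file dropped into a log volume might look like:

# .kubernetes-log-metadata.yaml, placed at the root of the log volume (hypothetical)
pod: web-1
namespace: default
container: tomcat
labels:
  app: web
files:
- glob: "*.log"
  format: tomcat-access    # hint for the logging agent's parser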

@dcowden
Author

dcowden commented May 4, 2016

Oh, I see, OK! How do you envision logging daemons getting the logs and metadata? Would the k8s API provide an endpoint that logging daemons consume, or would the logging daemons read the data directly off the filesystem?

I'm sorry to be asking so many questions; I need to set my system up, and I'd like to do it as similarly as possible to what will eventually become the 'right' way.

@jimmidyson
Member

@vishh Kubernetes metadata is added to all log events by our fluentd Kubernetes metadata filter. Currently this relies on extracting info from the log file name. Could you link to details on the long-term plan to bypass Docker for streaming logs, as you describe above?
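
For context, the file names the filter parses follow the /var/log/containers symlink convention on each node, roughly:

/var/log/containers/<pod-name>_<namespace>_<container-name>-<container-id>.log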

Multi-line logs are really tricky to handle in a generic way across platforms/languages/frameworks. This is one reason that structured logging is highly preferred; the basic JSON log that Docker provides makes some of this a bit easier.

@dcowden
Author

dcowden commented May 4, 2016

@jimmidyson The structured JSON logs in Docker make it harder too, when the logs were already structured before Docker got hold of them.

@lucab
Contributor

lucab commented Sep 16, 2016

@dcowden If k8s is sitting in the middle between the container and journald (e.g. if the runtime doesn't inject directly into journald), it can just attach additional fields before writing to journald. If k8s is not sitting in the log hot path (i.e. the runtime can inject directly into journald), it can just ask the runtime to do it.

@piosz
Member

piosz commented Oct 24, 2016

cc @crassirostris

@dcowden
Author

dcowden commented Oct 31, 2016

This item has generated a lot of discussion and interest--thanks to all who have helped!

I've created a document, at the request of the sig-instrumentation group, that more clearly describes the requirements, as well as the comments/thoughts from the discussion in this thread:

https://docs.google.com/document/d/1K2hh7nQ9glYzGE-5J7oKBB7oK3S_MKqwCISXZK-sB2Q

@dmcnaught

I think it's worth mentioning that you can create a container to send all the Docker logs to Splunk, which would include all Kubernetes container logs: http://blogs.splunk.com/2015/08/24/collecting-docker-logs-and-stats-with-splunk/

@halr9000

halr9000 commented Nov 1, 2016

Thanks for the link @dmcnaught, but unfortunately that technique still leaves a lot of gaps, which Dave has detailed in the Google doc above. The problem isn't so much getting data into Splunk (although this does need to be easier on our side), but that Kubernetes isn't helping to manage the multiple streams of data that applications emit.

Since that blog post, Splunk has shipped an official Docker image that includes a pre-configured Docker monitor (hub.docker.com/splunk/splunk:6.5.0-monitor). This, combined with a DaemonSet and some file monitors for your apps' other output, gets you pretty far towards an end-to-end solution.

Hal @ Splunk

@dmcnaught

dmcnaught commented Nov 1, 2016

Thanks @halr9000 - I was looking for a better article from Splunk that describes what we've done, but I didn't find one. We have done something very similar to this (https://hub.docker.com/r/splunk/universalforwarder/), and we also add our certs to encrypt data to our Splunk server. We run it as a k8s DaemonSet.
Our engineer submitted the code in Oct 2015, and it looks like it used the Splunk Dockerfile as the base. I couldn't find where he got it from, though - maybe that resource has been removed...

@halr9000

halr9000 commented Nov 1, 2016

If it was in Oct @dmcnaught, then no doubt he'd submitted to an earlier unofficial image (was under github.com/outcoldman) that unfortunately had to be pulled prior to the official one being published. Changes were quashed, which made us very sad.

@crassirostris

For those who might be interested, we recently updated the k8s documentation regarding logging: https://kubernetes.io/docs/user-guide/logging/overview/

This new page describes the current situation with logging in Kubernetes and lists solutions from the most preferable to the least preferable in most cases. It also includes an example of a sidecar container streaming logs from files to its stdout, making it possible to reuse the current node-level infrastructure to handle multiple files inside an application container while still separating the different log streams.
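
The sidecar pattern described there boils down to something like this minimal sketch (image names and paths are placeholders, not the exact example from the docs):

apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  containers:
  - name: app
    image: my-app:latest              # placeholder; writes e.g. /var/log/app/gc.log
    volumeMounts:
    - name: applogs
      mountPath: /var/log/app
  - name: gc-log-streamer              # sidecar: streams one log file to its own stdout
    image: busybox
    args: [/bin/sh, -c, 'tail -n+1 -F /var/log/app/gc.log']
    volumeMounts:
    - name: applogs
      mountPath: /var/log/app
  volumes:
  - name: applogs
    emptyDir: {}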

It's worth mentioning that we also plan to document how to configure node-level logging agent in 1.6.

@henry-hz

Stackdriver has a new sink export to Pub/Sub; this is a very elegant solution:
http://blogs.splunk.com/2016/03/23/announcing-splunk-add-on-for-google-cloud-platform-gcp-at-gcpnext16/

@gnanasekar6914

Wow, thanks for the information. I was looking for a long time for this answer in Splunk administration, and I need some information about this:

Option 1 -- k8s->fluentd-->splunk

@outcoldman

FYI for people interested in Splunk + Kubernetes: we just published the first version of our application "Monitoring Kubernetes" (https://splunkbase.splunk.com/app/3743/) and collector (https://www.outcoldsolutions.com). Please take a look at our manual on how to get started: https://www.outcoldsolutions.com/docs/monitoring-kubernetes/

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 8, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 10, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@rehevkor5

My K8S->Splunk logging is currently broken by issues in the (mostly inactive) https://github.com/brycied00d/fluent-plugin-splunk-http-eventcollector plugin, but I notice that Splunk has https://github.com/splunk/fluent-plugin-splunk-hec as well as https://github.com/splunk/splunk-connect-for-kubernetes so I'm going to give one of those a try.

@edsiper
Contributor

edsiper commented Jan 31, 2022

You should use Fluent Bit:

https://docs.fluentbit.io/manual/pipeline/outputs/splunk
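
A minimal output section along the lines of that page (the host and token are placeholders):

[OUTPUT]
    Name         splunk
    Match        *
    Host         splunk.example.com
    Port         8088
    Splunk_Token YOUR-HEC-TOKEN
    TLS          On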
