
Filebeat 6.4.0 chokes on docker continuation lines, "parsing CRI timestamp" errors #8203

Closed
towolf opened this issue Sep 3, 2018 · 29 comments

@towolf

towolf commented Sep 3, 2018

We upgraded to Filebeat 6.4.0, running in Kubernetes with the regular Docker engine and json-file logging.

Now it has begun to choke on some messages, trying to parse timestamps in CRI format. We do not use the CRI log format.

The actual file on disk looks like this:

{"log":"{ \"timestamp\":\"2018-09-03 11:25:31.950\", \"level\":\"debug\", \"http_x
{"log":"{ \"timestamp\":\"2018-09-03 11:25:31.951\", \"level\":\"debug\", \"http_x
{"log":"{ \"timestamp\":\"2018-09-03 11:25:31.951\", \"level\":\"info\", \"http_x_
{"log":"{ \"timestamp\":\"2018-09-03 11:25:31.951\", \"level\":\"debug\", \"http_x
{"log":"{ \"timestamp\":\"2018-09-03 11:25:32.361\", \"level\":\"debug\", \"http_x
{"log":"ncies\\\" : [ {\\n        \\\"currency_id\\\" : \\\"sample-client-current\
{"log":"\"user\\\" : \\\"200060\\\",\\n      \\\"lang\\\" : \\\"de\\\"\\n    }\\n 
{"log":"d\\\" : \\\"242219ea-c250-45a0-9070-d6fa65501eba\\\",\\n    \\\"account_nu
{"log":" \\\"19a8562e-78ae-45a5-842a-8a0e36206461\\\",\\n    
{"log":"{ \"timestamp\":\"2018-09-03 11:25:32.363\", \"level\":\"debug\", \"http_x
{"log":"{ \"timestamp\":\"2018-09-03 11:25:32.364\", \"level\":\"info\", \"http_x_
{"log":"{ \"timestamp\":\"2018-09-03 11:25:38.023\", \"level\":\"info\", \"http_x_
{"log":"{ \"timestamp\":\"2018-09-03 11:25:38.023\", \"level\":\"info\", \"http_x_
{"log":"{ \"timestamp\":\"2018-09-03 11:25:38.023\", \"level\":\"debug\", \"http_x
{"log":"{ \"timestamp\":\"2018-09-03 11:25:38.241\", \"level\":\"debug\", \"http_x
{"log":"{ \"timestamp\":\"2018-09-03 11:25:38.241\", \"level\":\"debug\", \"http_x
{"log":"{ \"timestamp\":\"2018-09-03 11:25:38.242\", \"level\":\"info\", \"http_x_
{"log":"{ \"timestamp\":\"2018-09-03 11:25:38.242\", \"level\":\"debug\", \"http_x
{"log":"{ \"timestamp\":\"2018-09-03 11:25:38.792\", \"level\":\"debug\", \"http_x
{"log":"encies\\\" : [ {\\n        \\\"currency_id\\\" : \\\"sample-client-current
{"log":"\\\"user\\\" : \\\"200060\\\",\\n      \\\"lang\\\" : \\\"de\\\"\\n    }\\
{"log":"id\\\" : \\\"242219ea-c250-45a0-9070-d6fa65501eba\\\",\\n    \\\"account_n
{"log":": \\\"19a8562e-78ae-45a5-842a-8a0e36206461\\\",\\n    
{"log":"{ \"timestamp\":\"2018-09-03 11:25:38.794\", \"level\":\"debug\", \"http_x
{"log":"{ \"timestamp\":\"2018-09-03 11:25:38.796\", \"level\":\"info\", \"http_x_

The errors printed are these:

2018-09-03T11:25:17.962Z	ERROR	log/harvester.go:278	Read line error: parsing CRI timestamp: parsing time ""-101\\\",\\n" as "2006-01-02T15:04:05Z07:00": cannot parse ""-101\\\",\\n" as "2006"; File: /var/lib/docker/containers/2e9a3a7e4245431ab5c09cd794fed0f6c2e564ef9daf7aa1f8ddcebd96616577/2e9a3a7e4245431ab5c09cd794fed0f6c2e564ef9daf7aa1f8ddcebd96616577-json.log
2018-09-03T11:44:28.000Z	ERROR	log/harvester.go:278	Read line error: parsing CRI timestamp: parsing time "\\\",\\n" as "2006-01-02T15:04:05Z07:00": cannot parse "\\\",\\n" as "2006"; File: /var/lib/docker/containers/2e9a3a7e4245431ab5c09cd794fed0f6c2e564ef9daf7aa1f8ddcebd96616577/2e9a3a7e4245431ab5c09cd794fed0f6c2e564ef9daf7aa1f8ddcebd96616577-json.log
2018-09-03T11:44:38.001Z	ERROR	log/harvester.go:278	Read line error: parsing CRI timestamp: parsing time "\\\",\\n" as "2006-01-02T15:04:05Z07:00": cannot parse "\\\",\\n" as "2006"; File: /var/lib/docker/containers/2e9a3a7e4245431ab5c09cd794fed0f6c2e564ef9daf7aa1f8ddcebd96616577/2e9a3a7e4245431ab5c09cd794fed0f6c2e564ef9daf7aa1f8ddcebd96616577-json.log
2018-09-03T11:44:48.001Z	ERROR	log/harvester.go:278	Read line error: parsing CRI timestamp: parsing time "\\\",\\n" as "2006-01-02T15:04:05Z07:00": cannot parse "\\\",\\n" as "2006"; File: /var/lib/docker/containers/2e9a3a7e4245431ab5c09cd794fed0f6c2e564ef9daf7aa1f8ddcebd96616577/2e9a3a7e4245431ab5c09cd794fed0f6c2e564ef9daf7aa1f8ddcebd96616577-json.log

And so on and so forth.
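
For context: Docker's json-file driver splits application lines longer than 16 KB across several JSON entries, and the "log" value of a continuation entry does not end in \n; that is what fragments like {"log":"ncies\\\" : ... in the excerpt above are. The error text quotes Go's RFC3339 layout ("2006-01-02T15:04:05Z07:00"), so the reader appears to be falling back to CRI-format parsing and treating the first whitespace-separated token of such a fragment as a timestamp. A minimal Go sketch of that parsing step, purely illustrative and not Filebeat's actual code, which reproduces the error wording:

package main

import (
	"fmt"
	"strings"
	"time"
)

// A CRI-formatted log line starts with an RFC3339 timestamp, e.g.
//   2018-09-03T11:25:31.950Z stdout F some message
// parseCRITimestamp takes the first whitespace-separated field and parses it
// with Go's RFC3339 layout, the exact layout quoted in the harvester errors.
func parseCRITimestamp(line string) (time.Time, error) {
	first := strings.SplitN(line, " ", 2)[0]
	ts, err := time.Parse(time.RFC3339, first)
	if err != nil {
		return time.Time{}, fmt.Errorf("parsing CRI timestamp: %w", err)
	}
	return ts, nil
}

func main() {
	// A well-formed CRI line parses fine.
	if _, err := parseCRITimestamp("2018-09-03T11:25:31.950Z stdout F hello"); err != nil {
		fmt.Println("unexpected:", err)
	}

	// A Docker json-file continuation fragment does not start with a
	// timestamp, so the same step fails the way the harvester logs above do.
	if _, err := parseCRITimestamp(`"-101\\\",\\n`); err != nil {
		fmt.Println(err)
	}
}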

@towolf towolf changed the title from “Filebeat 6.4.0 chokes on docker continuation lines” to “Filebeat 6.4.0 chokes on docker continuation lines, "parsing CRI timestamp" errors” on Sep 3, 2018
@exekias
Contributor

exekias commented Sep 3, 2018

Hi @towolf, thank you for reporting this. It's a duplicate of #8175, which is already fixed and will be released with the upcoming 6.4.1.

Best regards

@exekias exekias closed this as completed Sep 3, 2018
@SharpEdgeMarshall

SharpEdgeMarshall commented Sep 19, 2018

I'm still experiencing this with 6.4.1

2018-09-19T10:52:38.990Z	INFO	log/harvester.go:251	Harvester started for file: /var/lib/docker/containers/XYZ/XYZ-json.log
2018-09-19T10:52:38.990Z	ERROR	log/harvester.go:278	Read line error: parsing CRI timestamp: parsing time "+256\\\",\\\"myfield\\\":\\\"myvalue\\\"},{\\\"myfield\\\":\\\"myvalue\\\",\\\"myfield\\\":\\\"myvalue\\\",\\\"myfield\\\":\\\"\\\",\\\"myfield\\\":\\\"\\\",\\\"myfield\\\":\\\"myvalue" as "2006-01-02T15:04:05Z07:00": cannot parse "+256\\\",\\\"myfield\\\":\\\"myvalue\\\"},{\\\"myfield\\\":\\\"myvalue\\\",\\\"myfield\\\":\\\"myvalue\\\",\\\"myfield\\\":\\\"\\\",\\\"myfield\\\":\\\"\\\",\\\"myfield\\\":\\\"myvalue" as "2006"; File: /var/lib/docker/containers/XYZ/XYZ-json.log
2018-09-19T10:52:43.873Z	INFO	log/harvester.go:276	File is inactive: /var/lib/docker/containers/XYZ/XYZ-json.log. Closing because close_inactive of 5m0s reached.

@exekias
Contributor

exekias commented Sep 19, 2018

Hi @SharpEdgeMarshall, did you remove the existing registry file? Try again after removing the data/registry file.

@SharpEdgeMarshall

SharpEdgeMarshall commented Sep 19, 2018

Removed the file and restarted the k8s pods, but the problem still existed.
UPDATE: I confirm that it now seems to be gone. It took some time, maybe because Filebeat tried to reprocess the logs one more time.

@exekias
Contributor

exekias commented Sep 19, 2018

I did a quick test just now and it seems to be working for me. You will need to follow these steps:

  • Stop the Filebeat pod
  • Delete the registry file (if you do this before stopping the pod, Filebeat will recreate it on shutdown)
  • Start Filebeat with the new version

@teraserg

Worked for me too. We had Filebeat as a DaemonSet with the host filesystem mounted.
We stopped Filebeat, cleaned up all registry files on all Kubernetes nodes, and redeployed. That worked.

@a-abella

a-abella commented Oct 15, 2018

I'm experiencing similar behavior on 6.4.2 (I also saw the same behavior on 6.4.1) when restarting Filebeat. I'm using docker autodiscover and setting module inputs on container streams.

On the first run (no registry file exists), harvesters start up without a problem. On a service restart (registry file exists), a few containers exhibit the CRI timestamp problem. To work around it, I have to stop Filebeat, remove the registry, and start Filebeat again. This means I lose a few messages because of tail_files: true. I've not yet tried an initial run with tail_files: false because there's a large amount of old log data.

Error message:

ERROR	log/harvester.go:278	Read line error: parsing CRI timestamp: parsing time "01:57:22" as "2006-01-02T15:04:05Z07:00": cannot parse "7:22" as "2006"; File: /var/lib/docker/containers/abc123/abc123-json.log

The docker json log message in question:

{"log":"172.31.0.137 - - [15/Oct/2018:01:57:22 +0000] \"GET /mirror/index.html HTTP/1.1\" 200 6 \"-\" \"ELB-HealthChecker/2.0\" \"-\"\n","stream":"stdout","time":"2018-10-15T01:57:22.893981263Z"}

The docker autodiscover inputs:

filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        - condition.or:
            - contains.docker.container.image: nginx
            - contains.docker.container.image: http
          config:
            - module: nginx
              access:
                input:
                  type: docker
                  tail_files: true
                  containers.ids:
                    - ${data.docker.container.id}
                  containers.stream: stdout
              error:
                input:
                  type: docker
                  tail_files: true
                  containers.ids:
                    - ${data.docker.container.id}
                  containers.stream: stderr
        - condition:
            contains.docker.container.image: mysql
          config:
            - module: mysql
              error:
                input:
                  type: docker
                  tail_files: true
                  multiline.pattern: '^[[:space:]]'
                  multiline.negate: false
                  multiline.match: after
                  containers.ids:
                    - ${data.docker.container.id}
        - condition:
            contains.docker.container.image: mongo
          config:
            - module: mongodb
              log:
                input:
                  type: docker
                  tail_files: true
                  multiline.pattern: '^\d{4}-\d{2}-\d{2}'
                  multiline.negate: true
                  multiline.match: after
                  containers.ids:
                    - ${data.docker.container.id}
        - condition:
            and:
              - not:
                  contains.docker.container.image: nginx
              - not:
                  contains.docker.container.image: http
              - not:
                  contains.docker.container.image: mysql
              - not:
                  contains.docker.container.image: mongo
          config:
            - type: docker
              tail_files: true
              containers.ids:
                - ${data.docker.container.id}

@genuss

genuss commented Nov 1, 2018

I can confirm the behavior from the last comment on Filebeat 6.4.2. I too use docker autodiscover. The second case where the issue comes up is when I restart a container.
It looks like Filebeat cannot reopen the same log file.
To work around it I downgraded to 6.4.0 and the issue went away. @a-abella, you may want to try downgrading too.

@exekias
Contributor

exekias commented Nov 6, 2018

Hi, @a-abella and @genuss, did you clean your registry file before trying 6.4.2?

@genuss

genuss commented Nov 6, 2018

The first version of Filebeat on that server was 6.4.2, and while experimenting I also cleaned the registry file.

@a-abella

a-abella commented Nov 6, 2018

@exekias Yes, I had to remove the registry file every time I stopped the service, or a container prospector would start exhibiting the CRI timestamp problem on service start, including between updates.

@genuss Rolling back has stopped the issue from occurring, thanks for the suggestion.

@a-abella

a-abella commented Dec 1, 2018

@exekias @genuss I updated Filebeat to 6.5.1 about a week ago, and since then I have not experienced the issue again.

@genuss

genuss commented Dec 3, 2018

Thanks, @a-abella, I will check this week whether the update works for me.

@a-abella

a-abella commented Dec 3, 2018

@genuss Scratch that; I just checked and one of our servers started erroring again over the weekend. Once again, it's a docker autodiscover entry that starts the nginx module:

2018-12-02T19:35:41.015Z        INFO    input/input.go:149      input ticker stopped
2018-12-02T19:35:41.015Z        INFO    input/input.go:167      Stopping Input: 103326399326262808
2018-12-02T19:35:41.015Z        INFO    log/harvester.go:275    Reader was closed: /var/lib/docker/containers/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771-json.log. Closing.
2018-12-02T19:35:41.015Z        INFO    input/input.go:149      input ticker stopped
2018-12-02T19:35:41.015Z        INFO    input/input.go:167      Stopping Input: 1526243139307982433
2018-12-02T19:35:41.015Z        INFO    log/harvester.go:275    Reader was closed: /var/lib/docker/containers/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771-json.log. Closing.
2018-12-02T19:35:41.846Z        INFO    log/input.go:138        Configured paths: [/var/lib/docker/containers/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771/*.log]
2018-12-02T19:35:41.847Z        INFO    log/input.go:138        Configured paths: [/var/lib/docker/containers/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771/*.log]
2018-12-02T19:35:41.848Z        INFO    input/input.go:114      Starting input of type: docker; ID: 103326399326262808
2018-12-02T19:35:41.848Z        INFO    input/input.go:114      Starting input of type: docker; ID: 1526243139307982433
2018-12-02T19:35:51.849Z        INFO    log/harvester.go:254    Harvester started for file: /var/lib/docker/containers/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771-json.log
2018-12-02T19:35:51.849Z        INFO    log/harvester.go:254    Harvester started for file: /var/lib/docker/containers/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771-json.log
2018-12-02T19:35:51.852Z        ERROR   log/harvester.go:281    Read line error: parsing CRI timestamp: parsing time "0" as "2006-01-02T15:04:05Z07:00": cannot parse "0" as "2006"; File: /var/lib/docker/containers/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771-json.log
2018-12-02T19:35:51.852Z        ERROR   log/harvester.go:281    Read line error: parsing CRI timestamp: parsing time "1.1\"" as "2006-01-02T15:04:05Z07:00": cannot parse """ as "2006"; File: /var/lib/docker/containers/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771-json.log

The current registry entries:

{
    "source": "/var/lib/docker/containers/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771-json.log",
    "offset": 117105083,
    "timestamp": "2018-12-03T17:25:09.417835957Z",
    "ttl": -1,
    "type": "docker",
    "meta": {
        "stream": "stdout"
    },
    "FileStateOS": {
        "inode": 4731930,
        "device": 64514
    }
},
{
    "source": "/var/lib/docker/containers/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771/d145fe4fe0dd537b1473478d9f282e4bedce9a8c1b431c5610178c4f62fe0771-json.log",
    "offset": 130794640,
    "timestamp": "2018-12-03T17:25:09.41340366Z",
    "ttl": -1,
    "type": "docker",
    "meta": {
        "stream": "stderr"
    },
    "FileStateOS": {
        "inode": 4731930,
        "device": 64514
    }
}

Typical log content:

172.31.2.248 - - [03/Dec/2018:17:26:11 +0000] "GET /api/v3/swagger HTTP/1.1" 200 316932 "-" "ELB-HealthChecker/2.0" "-"
2018/12/03 17:26:16 [warn] 10#10: *120753 an upstream response is buffered to a temporary file /var/cache/nginx/fastcgi_temp/9/39/0000056399 while reading upstream, client: 172.31.0.254, server: default, request: "GET /api/v3/swagger HTTP/1.1", upstream: "fastcgi://10.0.5.10:3000", host: "172.31.0.14:7201"
172.31.0.254 - - [03/Dec/2018:17:26:16 +0000] "GET /api/v3/swagger HTTP/1.1" 200 316962 "-" "ELB-HealthChecker/2.0" "-"
172.31.2.248 - - [03/Dec/2018:17:26:25 +0000] "GET /api/v3/monitors/ HTTP/1.1" 200 36702 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_101)" "138.128.241.190"
172.31.2.248 - - [03/Dec/2018:17:26:26 +0000] "GET /api/v3/monitors/ HTTP/1.1" 200 36702 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_101)" "138.128.241.190"
2018/12/03 17:26:29 [warn] 10#10: *120758 an upstream response is buffered to a temporary file /var/cache/nginx/fastcgi_temp/0/40/0000056400 while reading upstream, client: 172.31.0.173, server: default, request: "GET /api/v3/swagger HTTP/1.1", upstream: "fastcgi://10.0.5.10:3000", host: "172.31.0.14:7201"
172.31.0.173 - - [03/Dec/2018:17:26:29 +0000] "GET /api/v3/swagger HTTP/1.1" 200 316962 "-" "ELB-HealthChecker/2.0" "-"


@towolf
Author

towolf commented Dec 11, 2018

FWIW we went back to this and disabled autodiscover. I think it's just more reliable:

filebeat.inputs:
    - type: docker
      containers.ids:
        - "*"

# To enable hints based autodiscover, remove `filebeat.config.inputs` configuration and uncomment this:
#filebeat.autodiscover:
#  providers:
#    - type: kubernetes
#      hints.enabled: true

@rocketraman

rocketraman commented Dec 18, 2018

@towolf Did you have to add any other configuration, beyond what you showed above, after disabling autodiscover? I'd like to do this too, as nobody from Elastic seems to care about this error and/or know how to fix it, but I would like to understand what a minimal yet roughly autodiscover-equivalent configuration would look like.

@tehho

tehho commented Dec 19, 2018

The above fix for the CRI issue worked on Filebeat 6.5.3.
I needed to SSH into the nodes and remove /var/lib/filebeat-data/release, then apply the Filebeat DaemonSet again.
Please fix this.

@Evesy

Evesy commented Dec 19, 2018

Similar to @rocketraman I've encountered the same issues on 6.5.3. I've documented my rough findings so far here: https://discuss.elastic.co/t/filebeat-6-4-2-and-6-5-1-read-line-error-parsing-cri-timestamp-and-invalid-cri-log-format/159383/6?u=evesy

@rocketraman

rocketraman commented Dec 21, 2018

@Evesy and others that may still be having this issue: as per my forum update, I believe I was able to narrow the still-occurring CRI errors in my environment down to a more specific issue related to nginx in Kubernetes with additional container annotations, and I am linking it here for posterity: #9768

@BulatSaif

Same problem on 6.5.4.
I am using Filebeat docker autodiscover.

2019-02-13T06:49:43.248Z	ERROR	log/harvester.go:281	Read line error: parsing CRI timestamp: parsing time "mg/sora-logo.d115c4b4.svg" as "2006-01-02T15:04:05Z07:00": cannot parse "mg/sora-logo.d115c4b4.svg" as "2006"; File: /var/lib/docker/containers/15c08d487a77248186fde41bea72380028247c4401f8ebe7e6fc20248988e65b/15c08d487a77248186fde41bea72380028247c4401f8ebe7e6fc20248988e65b-json.log

The problem appears with an nginx container. Logs:

{"log":"172.18.0.2 - - [12/Feb/2019:14:18:55 +0000] \"GET /css/app.d35f337d.css HTTP/1.1\" 200 201906 \"https://adminpanel.s1.dev.sora.soramitsu.co.jp/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.2 Safari/605.1.15\" \"37.29.99.122\"\n","stream":"stdout","time":"2019-02-12T14:18:55.669495346Z"}
{"log":"172.18.0.2 - - [12/Feb/2019:14:18:55 +0000] \"GET /img/sora-logo.d115c4b4.svg HTTP/1.1\" 200 4126 \"https://adminpanel.s1.dev.sora.soramitsu.co.jp/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.2 Safari/605.1.15\" \"37.29.99.122\"\n","stream":"stdout","time":"2019-02-12T14:18:55.669591175Z"}
{"log":"172.18.0.2 - - [12/Feb/2019:14:18:55 +0000] \"GET /css/chunk-a05fc3e2.5efda3e4.css HTTP/1.1\" 200 573 \"https://adminpanel.s1.dev.sora.soramitsu.co.jp/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.2 Safari/605.1.15\" \"37.29.99.122\"\n","stream":"stdout","time":"2019-02-12T14:18:55.669597943Z"}

@BulatSaif

@exekias, can we reopen the issue, since some of us still have problems?

@jsoriano
Member

@BulatSaif a fix for this will be released in 6.6.1 (#10211); if problems persist by then, we'll reopen.

@oliveiraVG

oliveiraVG commented Feb 18, 2019

@jsoriano

Do you have an ETA on the release date of 6.6.1?

If that's too far away, does the 7.0 beta have that fix?

@jsoriano
Member

jsoriano commented Feb 20, 2019

@oliveiraVG it has just been released: https://www.elastic.co/blog/elastic-stack-6-6-1-and-5-6-15-released 🙂

7.0.0-beta1 also includes #10211

@oliveiraVG

Thank you!

@ITBlogger

I'm still seeing these errors with 6.6.1

@ITBlogger

This is the kind of nginx log line it's erroring on:

{"log":"10.50.131.251 - - [27/Feb/2019:05:00:28 +0000] "GET /status HTTP/1.1" 200 8 "-" "kube-probe/1.12"\n","stream":"stdout","time":"2019-02-27T05:00:28.617850486Z"}

@jsoriano
Member

jsoriano commented Mar 1, 2019

#10951 probably helps with the timestamp issues.
