
fatal error: concurrent map iteration and map write #3735

Closed
tonobo opened this Issue Jan 25, 2018 · 11 comments

tonobo commented Jan 25, 2018

What did you do?

Just a few queries and scraping ~1.5k targets.

Environment

  • System information:
level=info ts=2018-01-25T09:28:37.474952519Z caller=main.go:225 msg="Starting Prometheus" version="(version=2.1.0, branch=HEAD, revision=85f23d82a045d103ea7f3c89a91fba4a93e6367a)"
level=info ts=2018-01-25T09:28:37.474989196Z caller=main.go:226 build_context="(go=go1.9.2, user=root@6e784304d3ff, date=20180119-12:01:23)"
level=info ts=2018-01-25T09:28:37.475001574Z caller=main.go:227 host_details="(Linux 4.10.0-37-generic #41~16.04.1-Ubuntu SMP Fri Oct 6 22:42:59 UTC 2017 x86_64 internal1 (none))"
  • Logs:

The whole Go panic: prom.log.gz

krasi-georgiev (Member) commented Jan 25, 2018

@tonobo it might help if you could provide a minimal config to replicate the issue.
By the way, does it happen on every start?

@cstyan any idea why this might happen? The locks seem to be used properly in file.go.

fatal error: concurrent map iteration and map write

goroutine 2820346 [running]:
runtime.throw(0x1c1d139, 0x26)
	/usr/local/go/src/runtime/panic.go:605 +0x95 fp=0xc5c29d5c50 sp=0xc5c29d5c30 pc=0x42bca5
runtime.mapiternext(0xc5c29d5e58)
	/usr/local/go/src/runtime/hashmap.go:778 +0x6f1 fp=0xc5c29d5ce8 sp=0xc5c29d5c50 pc=0x40a031
github.com/prometheus/prometheus/discovery/file.(*TimestampCollector).Collect(0xc4201a3b60, 0xc613b36720)
	/go/src/github.com/prometheus/prometheus/discovery/file/file.go:99 +0x17b fp=0xc5c29d5f98 sp=0xc5c29d5ce8 pc=0xaacd7b
github.com/prometheus/prometheus/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func2(0xc66aa367c0, 0xc613b36720, 0x28f9b00, 0xc4201a3b60)
	/go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/client_golang/prometheus/registry.go:382 +0x61 fp=0xc5c29d5fc0 sp=0xc5c29d5f98 pc=0x78d411
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc5c29d5fc8 sp=0xc5c29d5fc0 pc=0x45cba1
created by github.com/prometheus/prometheus/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather
	/go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/client_golang/prometheus/registry.go:380 +0x2e1
tonobo (Author) commented Jan 25, 2018

@krasi-georgiev This happened after 6 days of uptime.

I'm running the following config.

alerting:
    alertmanagers:
    -   static_configs:
        -   targets:
            - localhost:9095
global:
    evaluation_interval: 15s
    scrape_interval: 15s
rule_files:
- /etc/prometheus/rules.d/*.yml
- /etc/prometheus/git_rules.d/*.yml
scrape_configs:
-   job_name: elastic_node
    metrics_path: /node_metrics
    scrape_interval: 10s
    scrape_timeout: 10s
    static_configs:
    -   targets:
        - elastic1.example.de:9009
        - elastic2.example.de:9009
        - elastic3.example.de:9009
-   job_name: prometheus
    scrape_interval: 10s
    scrape_timeout: 10s
    static_configs:
    -   targets:
        - localhost:9090
    -   targets:
        - localhost:9100
-   file_sd_configs:
    -   files:
        - /etc/prometheus/targets.d/xxx_*.json
    job_name: xxx
    scrape_interval: 30s
    scrape_timeout: 30s
krasi-georgiev (Member) commented Jan 25, 2018

The panic happens in the file service discovery, so I think we would need the JSON files to replicate it.

file_sd_configs:
    -   files:
        - /etc/prometheus/targets.d/xxx_*.json
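
For reference, each file matched by that glob holds target groups in the usual file SD JSON shape, something like the following (the hostnames and labels here are made up, not taken from the report):

[
  {
    "targets": ["node1.example.de:9100", "node2.example.de:9100"],
    "labels": {
      "env": "production"
    }
  }
]
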
gouthamve (Member) commented Jan 25, 2018

I don't think the JSON files would help. We have all the details here, I think :)

https://github.com/prometheus/prometheus/blob/master/discovery/file/file.go#L99-L101 This is reading the fileSD.timestamps while we are modifying it elsewhere.

Now the issue is that we are doing t.lock.RLock() here: https://github.com/prometheus/prometheus/blob/master/discovery/file/file.go#L97

while the writes are guarded by a completely different lock, d.lock.Lock(): https://github.com/prometheus/prometheus/blob/master/discovery/file/file.go#L274
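
A minimal, runnable sketch of that mismatch (hypothetical type and field names, not the actual file.go code): the writer guards the map with one mutex while the reader iterates it under a different one, so the two never exclude each other and the runtime can abort with exactly this fatal error.

package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins for the real types; only the locking pattern matters here.
type discovery struct {
	lock       sync.RWMutex // guards timestamps
	timestamps map[string]float64
}

type collector struct {
	lock sync.RWMutex // a different, unrelated mutex
	disc *discovery
}

func (d *discovery) write(file string, ts float64) {
	d.lock.Lock()
	defer d.lock.Unlock()
	d.timestamps[file] = ts
}

// collectBuggy mirrors the bug: it read-locks the collector's own mutex, which the
// writer never takes, so the map iteration races with the write above.
func (c *collector) collectBuggy() int {
	c.lock.RLock()
	defer c.lock.RUnlock()
	n := 0
	for range c.disc.timestamps { // concurrent map iteration and map write
		n++
	}
	return n
}

// collectFixed takes the same mutex the writer uses (a read lock is enough),
// so iteration and writes exclude each other.
func (c *collector) collectFixed() int {
	c.disc.lock.RLock()
	defer c.disc.lock.RUnlock()
	n := 0
	for range c.disc.timestamps {
		n++
	}
	return n
}

func main() {
	d := &discovery{timestamps: map[string]float64{}}
	c := &collector{disc: d}
	done := make(chan struct{})
	go func() {
		defer close(done)
		for i := 0; i < 100000; i++ {
			d.write(fmt.Sprintf("targets_%d.json", i%10), float64(i))
		}
	}()
	for i := 0; i < 100000; i++ {
		_ = c.collectFixed() // swap in collectBuggy to reproduce the fatal error
	}
	<-done
}

Running the buggy variant under the race detector (go run -race) flags the conflict immediately, even when the runtime's own map check does not fire.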

cc @cstyan

tonobo (Author) commented Jan 25, 2018

There are 1838 JSON files; do you want all of them? Is that really required?

tonobo (Author) commented Jan 25, 2018

@gouthamve OK, I think so :D

krasi-georgiev (Member) commented Jan 25, 2018

@gouthamve Ooh, shoot, I completely overlooked this; that would be an easy fix then.

codesome (Member) commented Jan 25, 2018

Looks like a simple fix, as @gouthamve pointed out. I would like to claim this issue!

krasi-georgiev (Member) commented Jan 25, 2018

@codesome go ahead :) green light...

gouthamve added a commit that referenced this issue Jan 28, 2018

Fixed race condition in map iteration and map write in Discovery (#3735) (#3738)

* Fixed concurrent map iteration and map write in Discovery (#3735)

* discovery: Changed Lock to RLock in Collect
gouthamve (Member) commented Jan 28, 2018

Closed by: #3738

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited the conversation to collaborators on Mar 22, 2019
