Panic in EC2 discovery: NetworkInterfaces #4441

Closed
nhinds opened this Issue Jul 31, 2018 · 2 comments

nhinds commented Jul 31, 2018

Bug Report

What did you do?
Ran Prometheus 1.8.2 with EC2 service discovery across many EC2 instances in unknown states.

What did you expect to see?
EC2 service discovery works without crashing.

What did you see instead? Under which circumstances?
Prometheus panicked about a nil pointer dereference:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xa262b2]

goroutine 18710894 [running]:
github.com/prometheus/prometheus/discovery/ec2.(*Discovery).refresh.func2(0xc43bf84560, 0x1, 0x0)
        /go/src/github.com/prometheus/prometheus/discovery/ec2/ec2.go:190 +0x7f2
github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/service/ec2.(*EC2).DescribeInstancesPages.func1(0x187aaa0, 0xc43bf84560, 0x1, 0xc47d437da0)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/service/ec2/api.go:6785 +0x49
github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/aws/request.(*Request).EachPage(0xc49ec08000, 0xc4815cbb60, 0x2, 0x2)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/aws/request/request_pagination.go:98 +0xa1
github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/service/ec2.(*EC2).DescribeInstancesPages(0xc4821b45e8, 0x0, 0xc4815cbca0, 0x0, 0x0)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/service/ec2/api.go:6784 +0x11a

This looks similar to #3001, but this time the nil pointer was not in the EC2 tags; it was in the ENI code: https://github.com/prometheus/prometheus/blob/v1.8.2/discovery/ec2/ec2.go#L190

					for _, eni := range inst.NetworkInterfaces {
						subnetsMap[*eni.SubnetId] = struct{}{}
					}

I am unsure what AWS instance state caused the AWS SDK to return a nil pointer in the NetworkInterfaces field.

I have not tried reproducing this with the latest Prometheus version, as I am unsure how exactly to replicate it. However, the code in master for iterating over inst.NetworkInterfaces does not seem to handle nil pointers.
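
For illustration only, here is a minimal sketch of how the loop quoted above can panic. The `networkInterface` and `instance` types are hypothetical stand-ins that model only the fields used here; they are not the AWS SDK types, and which real instance state produces such a response remains unknown. The sketch assumes the SDK can hand back either a nil `SubnetId` or a nil slice element.

```go
package main

import "fmt"

// networkInterface and instance are hypothetical stand-ins for the SDK
// types; only the fields touched by the quoted loop are modeled.
type networkInterface struct {
	SubnetId *string
}

type instance struct {
	NetworkInterfaces []*networkInterface
}

// unsafeSubnets mirrors the quoted loop: it dereferences SubnetId
// without checking it (or the interface itself) for nil.
func unsafeSubnets(inst *instance) map[string]struct{} {
	subnetsMap := make(map[string]struct{})
	for _, eni := range inst.NetworkInterfaces {
		subnetsMap[*eni.SubnetId] = struct{}{}
	}
	return subnetsMap
}

// panics reports whether calling f triggers a panic.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	subnet := "subnet-a"
	ok := &instance{NetworkInterfaces: []*networkInterface{{SubnetId: &subnet}}}
	nilSubnet := &instance{NetworkInterfaces: []*networkInterface{{SubnetId: nil}}}
	nilEni := &instance{NetworkInterfaces: []*networkInterface{nil}}

	fmt.Println(panics(func() { unsafeSubnets(ok) }))        // false
	fmt.Println(panics(func() { unsafeSubnets(nilSubnet) })) // true
	fmt.Println(panics(func() { unsafeSubnets(nilEni) }))    // true
}
```

In this sketch the panic is recovered so the result can be printed; in the report above nothing recovers it, so the whole process goes down.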

Environment

  • System information:

    Linux 3.10.0-693.17.1.el7.x86_64 x86_64

  • Prometheus version:

    prometheus, version 1.8.2 (branch: HEAD, revision: 5211b96)
    build user: root@1412e937e4ad
    build date: 20171104-16:09:14
    go version: go1.9.2

  • Alertmanager version:
    (Not relevant)

  • Prometheus configuration file:

# Many scrape configs with configuration like:
    ec2_sd_configs:
      - region: us-west-2
        port: 9100
      - region: us-east-1
        port: 9100
      - region: us-west-2
        port: 9100
        profile: 123456789
      - region: us-east-1
        port: 9100
        profile: 123456789
...

I do not believe the scrape configuration is relevant to this issue. Please ask if a specific configuration section is required.

  • Alertmanager configuration file:
    (Not relevant)

  • Logs:

time="2018-07-31T01:24:18Z" level=info msg="Done checkpointing in-memory metrics and chunks in 1.200622853s." source="persistence.go:665" 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xa262b2]

goroutine 18710894 [running]:
github.com/prometheus/prometheus/discovery/ec2.(*Discovery).refresh.func2(0xc43bf84560, 0x1, 0x0)
        /go/src/github.com/prometheus/prometheus/discovery/ec2/ec2.go:190 +0x7f2
github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/service/ec2.(*EC2).DescribeInstancesPages.func1(0x187aaa0, 0xc43bf84560, 0x1, 0xc47d437da0)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/service/ec2/api.go:6785 +0x49
github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/aws/request.(*Request).EachPage(0xc49ec08000, 0xc4815cbb60, 0x2, 0x2)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/aws/request/request_pagination.go:98 +0xa1
github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/service/ec2.(*EC2).DescribeInstancesPages(0xc4821b45e8, 0x0, 0xc4815cbca0, 0x0, 0x0)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/service/ec2/api.go:6784 +0x11a
github.com/prometheus/prometheus/discovery/ec2.(*Discovery).refresh(0xc435f35e50, 0xc4a284a6c0, 0x0, 0x0)
        /go/src/github.com/prometheus/prometheus/discovery/ec2/ec2.go:163 +0x3c8
github.com/prometheus/prometheus/discovery/ec2.(*Discovery).Run(0xc435f35e50, 0x7f429ec048a0, 0xc455a707c0, 0xc48de17920)
        /go/src/github.com/prometheus/prometheus/discovery/ec2/ec2.go:119 +0x219
created by github.com/prometheus/prometheus/discovery.(*TargetSet).updateProviders
        /go/src/github.com/prometheus/prometheus/discovery/discovery.go:249 +0x26f
time="2018-07-31T01:26:44Z" level=info msg="Starting prometheus (version=1.8.2, branch=HEAD, revision=5211b96d4d1291c3dd1a569f711d3b301b635ecb)" source="main.go:87" 
time="2018-07-31T01:26:44Z" level=info msg="Build context (go=go1.9.2, user=root@1412e937e4ad, date=20171104-16:09:14)" source="main.go:88" 
...
time="2018-07-31T01:57:05Z" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:633" 
time="2018-07-31T01:57:07Z" level=info msg="Done checkpointing in-memory metrics and chunks in 1.855732474s." source="persistence.go:665" 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xa262b2]

goroutine 67452 [running]:
github.com/prometheus/prometheus/discovery/ec2.(*Discovery).refresh.func2(0xc440f48f20, 0x1, 0x0)
        /go/src/github.com/prometheus/prometheus/discovery/ec2/ec2.go:190 +0x7f2
github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/service/ec2.(*EC2).DescribeInstancesPages.func1(0x187aaa0, 0xc440f48f20, 0x1, 0xc452e88c00)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/service/ec2/api.go:6785 +0x49
github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/aws/request.(*Request).EachPage(0xc420308e00, 0xc42e05fb60, 0x2, 0x2)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/aws/request/request_pagination.go:98 +0xa1
github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/service/ec2.(*EC2).DescribeInstancesPages(0xc44849f690, 0x0, 0xc42e05fca0, 0x0, 0x0)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/service/ec2/api.go:6784 +0x11a
github.com/prometheus/prometheus/discovery/ec2.(*Discovery).refresh(0xc453d12870, 0xc452e96630, 0x0, 0x0)
        /go/src/github.com/prometheus/prometheus/discovery/ec2/ec2.go:163 +0x3c8
github.com/prometheus/prometheus/discovery/ec2.(*Discovery).Run(0xc453d12870, 0x7f7cba807300, 0xc440160a80, 0xc446536e40)
        /go/src/github.com/prometheus/prometheus/discovery/ec2/ec2.go:119 +0x219
created by github.com/prometheus/prometheus/discovery.(*TargetSet).updateProviders
        /go/src/github.com/prometheus/prometheus/discovery/discovery.go:249 +0x26f
time="2018-07-31T01:59:35Z" level=info msg="Starting prometheus (version=1.8.2, branch=HEAD, revision=5211b96d4d1291c3dd1a569f711d3b301b635ecb)" source="main.go:87" 
time="2018-07-31T01:59:35Z" level=info msg="Build context (go=go1.9.2, user=root@1412e937e4ad, date=20171104-16:09:14)" source="main.go:88" 
...
time="2018-07-31T04:40:39Z" level=info msg="Done checkpointing in-memory metrics and chunks in 1.498823216s." source="persistence.go:665" 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xa262b2]

goroutine 280 [running]:
github.com/prometheus/prometheus/discovery/ec2.(*Discovery).refresh.func2(0xc425f245c0, 0x1, 0x0)
        /go/src/github.com/prometheus/prometheus/discovery/ec2/ec2.go:190 +0x7f2
github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/service/ec2.(*EC2).DescribeInstancesPages.func1(0x187aaa0, 0xc425f245c0, 0x1, 0xc430a0b5c0)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/service/ec2/api.go:6785 +0x49
github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/aws/request.(*Request).EachPage(0xc420300380, 0xc435fd1b60, 0x2, 0x2)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/aws/request/request_pagination.go:98 +0xa1
github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/service/ec2.(*EC2).DescribeInstancesPages(0xc446331e80, 0x0, 0xc435fd1ca0, 0x0, 0x0)
        /go/src/github.com/prometheus/prometheus/vendor/github.com/aws/aws-sdk-go/service/ec2/api.go:6784 +0x11a
github.com/prometheus/prometheus/discovery/ec2.(*Discovery).refresh(0xc425ff7a40, 0xc44385fec0, 0x0, 0x0)
        /go/src/github.com/prometheus/prometheus/discovery/ec2/ec2.go:163 +0x3c8
github.com/prometheus/prometheus/discovery/ec2.(*Discovery).Run(0xc425ff7a40, 0x7f9267d4c810, 0xc425dd4940, 0xc42d355aa0)
        /go/src/github.com/prometheus/prometheus/discovery/ec2/ec2.go:119 +0x219
created by github.com/prometheus/prometheus/discovery.(*TargetSet).updateProviders
        /go/src/github.com/prometheus/prometheus/discovery/discovery.go:249 +0x26f
time="2018-07-31T04:44:47Z" level=info msg="Starting prometheus (version=1.8.2, branch=HEAD, revision=5211b96d4d1291c3dd1a569f711d3b301b635ecb)" source="main.go:87" 
time="2018-07-31T04:44:47Z" level=info msg="Build context (go=go1.9.2, user=root@1412e937e4ad, date=20171104-16:09:14)" source="main.go:88" 
...

brian-brazil commented Aug 1, 2018

Looks like the code hasn't changed since 1.8.2, so this is still an issue.

noqcks added a commit to noqcks/prometheus that referenced this issue Aug 7, 2018

handle nil pointer in ec2 discovery
This handles a nil pointer that was being accessed in EC2 discovery.

Fixes: prometheus#4441

Signed-off-by: noqcks <benny@noqcks.io>

brian-brazil added a commit that referenced this issue Aug 7, 2018

handle nil pointer in ec2 discovery (#4469)
This handles a nil pointer that was being accessed in EC2 discovery.

Fixes: #4441

Signed-off-by: noqcks <benny@noqcks.io>
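
The diff for this commit is not reproduced in the thread. As a rough sketch of the kind of guard that avoids the panic, assuming the nil value is either the interface entry or its `SubnetId`, something like the following would work. The `networkInterface` and `instance` types and the `subnetIDs` helper are hypothetical names used for illustration, not Prometheus or AWS SDK code.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical stand-ins modeling only the fields relevant to the panic.
type networkInterface struct{ SubnetId *string }
type instance struct{ NetworkInterfaces []*networkInterface }

// subnetIDs collects the distinct subnet IDs of an instance, skipping nil
// interfaces and interfaces without a subnet ID instead of dereferencing them.
func subnetIDs(inst *instance) []string {
	seen := make(map[string]struct{})
	for _, eni := range inst.NetworkInterfaces {
		if eni == nil || eni.SubnetId == nil {
			continue
		}
		seen[*eni.SubnetId] = struct{}{}
	}
	ids := make([]string, 0, len(seen))
	for id := range seen {
		ids = append(ids, id)
	}
	sort.Strings(ids) // map iteration order is random, so sort for stable output
	return ids
}

func main() {
	a, b := "subnet-a", "subnet-b"
	inst := &instance{NetworkInterfaces: []*networkInterface{
		{SubnetId: &b}, nil, {SubnetId: nil}, {SubnetId: &a}, {SubnetId: &a},
	}}
	fmt.Println(subnetIDs(inst)) // [subnet-a subnet-b]
}
```

Skipping the incomplete entry, rather than failing the whole refresh, keeps discovery working for the remaining instances; whether that is the right trade-off is a design choice for the maintainers.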

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
