update default image versions #3007

pgier
Contributor

@pgier pgier commented Feb 4, 2020

Prometheus image should default to latest version in compatibility
matrix. Also updates Alertmanager and Thanos to latest stable versions.

@yeya24
Contributor

yeya24 commented Feb 4, 2020

I already opened a similar PR #2901. But the CI always fails.

@pgier
Contributor Author

pgier commented Feb 4, 2020

Ah sorry, I didn't see that one!

@lilic
Contributor

lilic commented Feb 4, 2020

@yeya24 I think it's because of the ordering of your vars. Would you mind closing yours, since this one fixes a few more things? Hope that's okay. :) Thanks!

@pgier
Contributor Author

pgier commented Feb 4, 2020

@lilic I don't think it's the ordering, since if that was the issue it would probably fail early during build, wouldn't it?

@yeya24
Contributor

yeya24 commented Feb 4, 2020

@lilic Sure, I will close mine.

@lilic
Contributor

lilic commented Feb 4, 2020

Yes, it's a global var, not defined inline. Strange. 🤷‍♀

@pgier
Contributor Author

pgier commented Feb 4, 2020

Looks like the same test failure we saw before, I'll try to investigate further.

@pgier
Contributor Author

pgier commented Feb 4, 2020

I tracked the issue down to Prometheus v2.8.0 and higher. It seems to be caused by a change in the error handling in the scrape client (prometheus/prometheus#5182). The certificates are not available in the pod when the test is running, and in Prometheus 2.7.2 and lower, the invalid cert actually causes Prometheus to crash with a nil pointer dereference:

level=error ts=2020-02-04T22:23:17.800304565Z caller=scrape.go:147 component="scrape manager" scrape_pool=allns-y-promarbitraryfsacc-allowed-tls-file-q577gu-0/test/0 msg="Error creating HTTP client" err="unable to use specified CA cert /etc/ca-certificates/example-ca.pem: open /etc/ca-certificates/example-ca.pem: no such file or directory"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x669c12]

goroutine 416 [running]:
net/http.(*Client).deadline(0x0, 0xc00009cf30, 0x40bb8f, 0xc000af5fb0)
        /usr/local/go/src/net/http/client.go:187 +0x22
net/http.(*Client).do(0x0, 0xc000b55100, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/client.go:527 +0xab
net/http.(*Client).Do(0x0, 0xc000b55100, 0x23, 0xc00099df70, 0x9)
        /usr/local/go/src/net/http/client.go:509 +0x35
github.com/prometheus/prometheus/scrape.(*targetScraper).scrape(0xc000b03890, 0x1fd4a80, 0xc00007bb00, 0x1fb2780, 0xc0004d29a0, 0x0, 0x0, 0x0, 0x0)
        /app/scrape/scrape.go:471 +0x111
github.com/prometheus/prometheus/scrape.(*scrapeLoop).run(0xc000b07500, 0x6fc23ac00, 0x2540be400, 0x0)
        /app/scrape/scrape.go:813 +0x487
created by github.com/prometheus/prometheus/scrape.(*scrapePool).sync
        /app/scrape/scrape.go:336 +0x45d

And this allows the test to succeed for some reason. Prometheus 2.8.0 and later prints the error but keeps running; I guess it's not able to scrape, so the test just times out.

Anyway, this seems to be a problem in our tests and not an issue with Prometheus.
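
For readers following along, here is a minimal Go sketch of the two behaviors described above. It is not the Prometheus code: `newScrapeClient`, `scrapeIgnoringError`, and `scrapeCheckingError` are hypothetical names, and the file path is made up. It only illustrates that calling `Do` on a nil `*http.Client` panics, as in the trace, while returning the construction error keeps the process alive.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// newScrapeClient is a hypothetical stand-in for a scrape client constructor.
// It returns a nil client and an error when the CA file cannot be found.
func newScrapeClient(caFile string) (*http.Client, error) {
	if _, err := os.Stat(caFile); err != nil {
		return nil, fmt.Errorf("unable to use specified CA cert %s: %w", caFile, err)
	}
	return &http.Client{}, nil
}

// scrapeIgnoringError logs the construction error but still uses the client,
// which mirrors the crash in the trace above. It reports whether a panic occurred.
func scrapeIgnoringError(caFile string) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	client, err := newScrapeClient(caFile)
	if err != nil {
		fmt.Println("Error creating HTTP client:", err)
	}
	req, _ := http.NewRequest(http.MethodGet, "http://127.0.0.1:1/metrics", nil)
	_, _ = client.Do(req) // client is nil here, so net/http dereferences a nil pointer
	return false
}

// scrapeCheckingError returns the construction error instead of using the
// nil client, so the process keeps running but never scrapes the target.
func scrapeCheckingError(caFile string) error {
	client, err := newScrapeClient(caFile)
	if err != nil {
		return fmt.Errorf("error creating HTTP client: %w", err)
	}
	_ = client // a real scraper would issue requests with the client here
	return nil
}

func main() {
	const missing = "/nonexistent/example-ca.pem" // assumed path, for illustration only
	fmt.Println("ignoring the error panicked:", scrapeIgnoringError(missing))
	fmt.Println("checking the error returned:", scrapeCheckingError(missing))
}
```

In the second variant nothing crashes, which matches the observation that the newer versions keep running while the test waits for a scrape that never happens.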

Prometheus image should default to latest version in compatibility
matrix. Also updates Alertmanager and Thanos to latest stable versions.
The allowed-tls-file and denied-tls-file tests of service monitor config
were not able to access TLS certificate files on the local filesystem.
This issue was only exposed when updating the default Prometheus version
to 2.8.0 or later because that version introduced validation of the TLS
config.

This changes the config for these two tests to mount a secret containing
these two files.

@pgier pgier force-pushed the upgrade-default-prometheus-alertmanager-thanos-versions branch from 13fab85 to 869f7b7 on February 7, 2020 02:19

@brancz
Contributor

brancz commented Feb 7, 2020

Could we double check whether an empty file would have the same result? I think it’s just the existence of a file that is referenced.

@pgier
Contributor Author

pgier commented Feb 7, 2020

@brancz Looks like it needs a real certificate. With an empty file the test fails with a timeout, and the Prometheus logs print an error:

level=error ts=2020-02-07T13:17:00.063Z caller=manager.go:123 component="scrape manager" msg="error creating new scrape pool" err="error creating HTTP client: unable to use specified CA cert /etc/ca-certificates/cert.pem" scrape_pool=allns-y-promarbitraryfsacc-allowed-tls-file-q5c261-0/test/0
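
A small Go sketch of why an empty file is not enough, assuming the validation boils down to the standard library's `x509.CertPool.AppendCertsFromPEM` (an assumption; the exact Prometheus code path is not shown in this thread). The helpers `loadCAPool`, `selfSignedCAPEM`, and `checkFile` are hypothetical names for illustration.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"os"
	"path/filepath"
	"time"
)

// loadCAPool reads a PEM file and fails if it holds no usable certificate.
func loadCAPool(path string) (*x509.CertPool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("unable to read CA cert %s: %w", path, err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(data) {
		// Reached for an empty file: there is no PEM block to parse.
		return nil, fmt.Errorf("unable to use specified CA cert %s", path)
	}
	return pool, nil
}

// selfSignedCAPEM generates a throwaway self-signed CA certificate as PEM.
func selfSignedCAPEM() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "example-ca"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		KeyUsage:              x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

// checkFile writes content to a temporary cert.pem and tries to load it as a CA.
func checkFile(content []byte) error {
	dir, err := os.MkdirTemp("", "ca-check")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "cert.pem")
	if err := os.WriteFile(path, content, 0o600); err != nil {
		return err
	}
	_, err = loadCAPool(path)
	return err
}

func main() {
	fmt.Println("empty file:", checkFile(nil))
	certPEM, err := selfSignedCAPEM()
	if err != nil {
		panic(err)
	}
	fmt.Println("real certificate:", checkFile(certPEM))
}
```

Under that assumption, the file has to contain at least one parseable certificate, which is consistent with the tests needing real certificate data rather than a placeholder.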

Contributor

@lilic lilic left a comment

thanks! lgtm

@brancz brancz merged commit 71b5e4f into prometheus-operator:master Feb 10, 2020