Soliciting Prometheus CI requirements for Inclusive Integration with CNCF Projects #2497

Closed
hh opened this Issue Mar 15, 2017 · 34 comments

@hh
hh commented Mar 15, 2017

"CNCF is helping develop a cloud native software stack that enables cross-cloud deployments. Cross-project CI that ensures ongoing interoperability is especially valuable." - Dan Kohn Executive Director CNCF

[cncf-ci-public] CNCF CI Goals
[cncf-ci-public] Soliciting CI requirements via Project GitHub Issues

This GitHub issue is a highly visible invitation to take part in creating a cross-cloud, cross-project CI within the diverse software communities of the Cloud Native Computing Foundation.

To fully understand our needs and expectations, some help documenting the current state of the Prometheus CI and ongoing requirements of the Prometheus community would be useful.

https://github.com/cncf/wg-ci/blob/master/projects/prometheus.mkd

As we collect Prometheus and other project CI requirements, we'll use the @cncf/cncf-ci-working-group issue at cncf/wg-ci#12 and encourage you to join the discussion on the cncf-ci mailing list.

@beorn7

Member

beorn7 commented Mar 15, 2017

@sdurrheimer is this something you could lead from the Prometheus side?

@gianarb

gianarb commented Mar 15, 2017

Hello @beorn7! I pinged @brian-brazil, who forwarded me to @sdurrheimer, but we can continue here.
Maybe just verify if this list is still good and how we can make it more detailed.

I am also thinking about load and performance tests. CNCF is going to have a large number of servers that can be used to stress Prometheus. I suppose that it's something that you are already doing. We can try to run it more frequently, and there are other projects that require this kind of test.

@fabxc

Member

fabxc commented Mar 15, 2017

Hey @gianarb. Actually, we started reproducible load/performance testing only very recently; it can be found at https://github.com/prometheus/prombench.

It provisions a flexible number of AWS EC2 instances, deploys a Kubernetes cluster with a production-like workload, and can run different Prometheus versions against each other.
It would ultimately be cool to have that run nightly and give us daily performance reports on whether the current version performs as well as or better than the last one.
Currently that's tightly bound to AWS. Generalizing it or making it fit to the CNCF servers would be awesome. For that we need more information on the target environment obviously.
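
For illustration, here is a minimal sketch of the comparison such a nightly report could make. The metric names, the sample values, and the 5% tolerance are assumptions made up for this example; none of them are taken from prombench.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BenchResult:
    """Summary of one benchmark run (all fields are hypothetical)."""
    version: str
    cpu_seconds: float       # lower is better
    memory_bytes: float      # lower is better
    query_latency_ms: float  # lower is better


def regressions(baseline: BenchResult, candidate: BenchResult,
                tolerance: float = 0.05) -> Dict[str, float]:
    """Return the metrics where the candidate is worse than the baseline
    by more than `tolerance`, mapped to the relative change."""
    worse = {}
    for field in ("cpu_seconds", "memory_bytes", "query_latency_ms"):
        old = getattr(baseline, field)
        new = getattr(candidate, field)
        change = (new - old) / old
        if change > tolerance:
            worse[field] = round(change, 3)
    return worse


# Made-up numbers: the candidate uses 25% more memory than the baseline.
last = BenchResult("last-release", cpu_seconds=100.0,
                   memory_bytes=2.0e9, query_latency_ms=40.0)
head = BenchResult("head", cpu_seconds=98.0,
                   memory_bytes=2.5e9, query_latency_ms=41.0)
report = regressions(last, head)
print(report)
```

An empty report would mean the current version performs as well as or better than the last one within the chosen tolerance.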

@gianarb

gianarb commented Mar 15, 2017

"It would ultimately be cool to have that run nightly and give us daily performance reports on whether the current version performs as well or better than the last one."

@hh I think this one can be a good requirement for our working group. I mean a way to collect all the results that come from stress tests and a way to show them, maybe with some analysis, that can be applied across projects.

@fabxc I am going to have a look at the prombench project, thanks. About the target environment, I can try to get some info, but it will be something usable via an API. Let's see :)

@SpamapS

SpamapS commented Mar 15, 2017

Coming from a CI-generalist, not a Prometheus expert, but it would be good to ask "What are the most important integration points for Prometheus?" So if Prometheus has a particular set of services that it depends on, or that depend on it, testing those together is a good way to ensure that you're staying true to API contracts and detecting when your dependencies and reverse dependencies are broken too.

@fabxc

Member

fabxc commented Mar 15, 2017

Prometheus's core functionality is very much self-contained, which makes this whole aspect easier. The only integration worth testing at this point seems to be Prometheus<>Alertmanager, which doesn't have all that much surface area.

Prometheus and Alertmanager OTOH both integrate with a wide range of other systems to "find stuff to monitor" and "send alerts to other things" respectively. Those are largely contributed by other users and we have neither the expertise nor the time to test those e2e. The sheer number of them makes it infeasible. Additionally, many integrations are with commercial offerings such as PagerDuty. We don't want to sign up for paid accounts of 10 different services to run e2e tests, I suppose.

@SpamapS

SpamapS commented Mar 15, 2017

Of course, well-written services won't have a ton of integration points. Sounds like you have at least one in Alertmanager. If you're driving adoption of offerings like PagerDuty, they might be happy to give you free accounts for CI. It's certainly worth a shot, and you can feed those secrets in via the GitHub repo settings.

I also see some ZooKeeper capability in Prometheus, so it would be good to fully exercise that bit of integration, especially for things like "Do Prometheus services survive when ZK is partitioned, expanded, and degraded?".

These can be tricky to handle in limited settings like TravisCI, but they're just the sort of thing we want to help CNCF enable with Zuul's multi-node testing capabilities.

@gianarb

gianarb commented Apr 5, 2017

Hello! I shared this doc as a proposal for a very tiny and specific upgrade: https://goo.gl/1HtMND. @fabxc had a look some days ago. I am a bit busy these days because I am moving to Italy, but in the meantime I am happy to receive some feedback and ideas if you can.

@gouthamve

Member

gouthamve commented Apr 5, 2017

So, I was also thinking we could run the suite against PRs tagged "performance". A perfect use case would be @beorn7's performance PRs that were recently merged: #2559 #2527 #2528 #2529.

Here the patch was run internally and the results/graphs were presented, but an automated dashboard that exposes this info for any PR with a specific label (performance, for example) would be good.
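
As a rough sketch of the selection step only: the function below picks the PRs that carry a trigger label. It assumes each PR is a dict with a `labels` list of objects that have a `name` field, which is how GitHub's REST API shapes labels; the `should_benchmark` helper, the PR numbers, and the sample data are hypothetical.

```python
from typing import Dict, List


def should_benchmark(pr: Dict, trigger_label: str = "performance") -> bool:
    """Return True if the PR carries the trigger label (case-insensitive)."""
    names = {label["name"].lower() for label in pr.get("labels", [])}
    return trigger_label.lower() in names


# Hypothetical PR payloads, reduced to the fields used here.
prs: List[Dict] = [
    {"number": 101, "labels": [{"name": "performance"}]},
    {"number": 102, "labels": [{"name": "bug"}]},
    {"number": 103, "labels": []},
]

selected = [pr["number"] for pr in prs if should_benchmark(pr)]
print(selected)
```

Running the benchmark suite and publishing the graphs for each selected PR would be separate steps that this sketch does not cover.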

@gianarb

gianarb commented Apr 5, 2017

@gouthamve that's a good idea. I am going to add this to the doc.
Does anybody have good experience with stress testing who can help us figure out a good output format and a way to collect info for this kind of case? As I wrote, this pattern can be applied to other projects, so anything we can make "common" would be super good.

@SuperQ

Member

SuperQ commented Apr 27, 2017

A couple of integration test points come to mind:

  • Testing various service discovery methods
    • This could be difficult, but not impossible: we would need access to several cloud providers, or we would need to mock their interfaces.
  • Testing some or all of the core exporters
    • node_exporter (as mentioned above)
    • mysqld/snmp/blackbox/etc
    • This is made complicated because some require additional backends, like MySQL.
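
To make the "mock their interfaces" option concrete, here is a small sketch of how a service discovery test could run against a fake provider instead of a real cloud account. Everything here is hypothetical (the `InstanceLister` interface, `FakeCloud`, `discover_targets`, the instance fields); only the output shape follows the target-group format used by Prometheus file-based service discovery, and 9100 is the default node_exporter port.

```python
from typing import Dict, List, Protocol


class InstanceLister(Protocol):
    """Hypothetical minimal interface a cloud provider client would expose."""

    def list_instances(self) -> List[Dict[str, str]]: ...


class FakeCloud:
    """In-memory stand-in for a real provider API, used only in tests."""

    def __init__(self, instances: List[Dict[str, str]]) -> None:
        self._instances = instances

    def list_instances(self) -> List[Dict[str, str]]:
        return list(self._instances)


def discover_targets(cloud: InstanceLister, port: int = 9100) -> List[Dict]:
    """Turn running instances into target groups ({"targets", "labels"})."""
    groups = []
    for inst in cloud.list_instances():
        if inst.get("state") != "running":
            continue  # stopped instances should not be scraped
        groups.append({
            "targets": [f"{inst['ip']}:{port}"],
            "labels": {"instance_id": inst["id"]},
        })
    return groups


fake = FakeCloud([
    {"id": "i-1", "ip": "10.0.0.1", "state": "running"},
    {"id": "i-2", "ip": "10.0.0.2", "state": "stopped"},
])
groups = discover_targets(fake)
print(groups)
```

A test like this exercises the discovery logic without cloud credentials, but it cannot catch changes in the real provider APIs.
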
@brian-brazil

Member

brian-brazil commented Jul 14, 2017

May be relevant for #2935

@krasi-georgiev

Member

krasi-georgiev commented Feb 8, 2018

@gianarb, @hh I hope it is not too late to get this back in motion again.

I prepared a small draft of what we need and will wait a few days to see if the other maintainers add some more, but we have enough to start with.
https://docs.google.com/document/d/1yTPWtWERuyDRCHkyrRYkqBPlepk7CNWRAyqDMuJ9bP0/edit#

Maybe it is best if we can set up a short meetup and decide the first steps to get the party started :)

I am on the prometheus-dev channel if you want to ping me.

@lixuna

lixuna commented Feb 8, 2018

Hi, @krasi-georgiev.

We'd like to resume conversations about collaborating on Prometheus CI testing as well. FYI, the Cross-Cloud CI project has been moved into an org with sub-repos for the different components at https://github.com/crosscloudci. We'll probably want to break things out into tickets for handling specific items like the e2e tests and build configuration.

Is there an email I can use for scheduling a meeting?

@krasi-georgiev

Member

krasi-georgiev commented Feb 8, 2018

I am really glad to hear that.

I created a Google group and an event for Thursday, so anyone interested can subscribe to the group and will receive an invite.
https://groups.google.com/forum/#!topic/prometheus-ci/ghsTIGuD2WA

The date and time are not set yet, so first let's see who wants to attend, and I will reschedule if needed.

@brian-brazil

Member

brian-brazil commented Feb 8, 2018

Is there a reason we can't use the existing -developers list for this? It's not as if it gets much developer traffic already.

@lixuna

lixuna commented Feb 8, 2018

At a high-level, it looks like we have two discussions:

  1. Collaboration on a Prometheus CI cluster that meets the needs of the Prometheus project.

  2. Collaboration on testing Prometheus in the Cross-Cloud CI project that meets the goals of CNCF (e.g. Cross-Cloud Dashboard at http://cncf.ci)

As the Cross-Cloud CI project is intended to complement a project’s CI system, we think we will be able to share and re-use items (e.g. e2e tests).

We have created a ticket on the Cross-Cloud CI project crosscloudci/crosscloudci#7 for the Cross-Cloud CI testing discussion.

We can continue to use this ticket (if desired) along with either mailing list for the Prometheus CI cluster discussion.

The next CNCF CI WG meeting will be on Tue, Feb 13 at 8am Pacific Time. At the top of the meeting, there will be a presentation of the updated CI System + Dashboard (cncf.ci).

If you would like to join and discuss additional items, please feel free to add your topic to the agenda.

@krasi-georgiev

Member

krasi-georgiev commented Feb 8, 2018

@brian-brazil I didn't want to overload the dev channel, as I thought most subscribers won't be interested in this topic, and when I schedule a meeting I can assign it to a group so everyone gets an invite. Doing this in the dev channel would be a bit too much and might attract the wrong audience.

@lixuna the Google doc is just an early draft, so I was hoping to get it a bit more polished before opening an issue. A Google doc is better for collaborative editing, so I will post a message to the issue a bit later, when I get feedback from more people.

If you are suggesting using the Feb 13 meeting for the Prometheus CI requirements, I think that would be too soon to get the requirements in shape and organise the right people for the topic.

@brian-brazil

Member

brian-brazil commented Feb 8, 2018

-dev is for development discussion, and I believe this qualifies. I'd prefer we not prematurely create new mailing lists as that makes things hard to track.

@krasi-georgiev

Member

krasi-georgiev commented Feb 8, 2018

What about my other comment about the meeting invitations?

I didn't understand why this makes it hard to track. You subscribe and get an email on every post, event, etc. In general I don't mind, and I didn't consider using this list long term anyway, just until we work out how to start working on this; then we will use the usual IRC, GitHub, etc.

@brian-brazil

Member

brian-brazil commented Feb 8, 2018

You can always invite people individually.

"Didn't understand why this makes it hard to track. you subscribe and get an email on every post, event etc."

Having to spot and subscribe to every new mailing list that might crop up is not easy.

@krasi-georgiev

Member

krasi-georgiev commented Feb 8, 2018

Whichever is easiest, I don't mind at all.

@lixuna

lixuna commented Feb 9, 2018

@krasi-georgiev The CI WG presentation on Tuesday, February 13th would give an overview of the Cross-Cloud CI project and how Prometheus e2e tests provide results to the CNCF Cross-Cloud Dashboard https://cncf.ci. The next steps would be discussing collaboration on e2e tests.

We can defer discussing the Prometheus CI project (which will be more extensive), until the requirements are more polished.

@krasi-georgiev

Member

krasi-georgiev commented Feb 10, 2018

@lixuna Yep, I think it is too early to discuss the requirements. I think it is best to do this at your next meetup.

Can you send an invite for your meeting on Tuesday to prometheus-ci@googlegroups.com? I will then cancel the one I scheduled.
Would it be possible to also give us an overview of how the CNCF CI project is organised and which tools you use, so that we get an idea of how we can fit into the project?

btw @Conorbro posted in the Prometheus-dev group that he already started working on some CI tests around the Service Discovery in Prometheus so if he can join the meetup on Tuesday and give a short demo that would be a good start.

I am really excited to see where this collaboration takes us!

@krasi-georgiev

Member

krasi-georgiev commented Feb 12, 2018

@lixuna One more confirmed and a few pending for the meeting on Tuesday.
Still waiting for the invite.

@lixuna

lixuna commented Feb 12, 2018

@krasi-georgiev I've invited prometheus-ci@googlegroups.com to tomorrow's meeting. We welcome @Conorbro and anyone else who is interested in joining these public, twice monthly meetings.

For your reference, the CI WG meeting is listed on the CNCF Public Events calendar.

An overview of the Cross-cloud CI project will be posted in a new comment.

@krasi-georgiev

Member

krasi-georgiev commented Feb 12, 2018

Thanks!

@taylor

taylor commented Feb 12, 2018

@krasi-georgiev,

At a high-level, the Cross-cloud CI project consists of:

  • Cross-cloud testing system: build pipeline (optionally use a project's build artifacts), cloud provisioning (cross-cloud), app deployments and e2e testing (cross-project)
  • Status repository server: collects testing results and artifact information from the testing system
  • Status dashboard: displays latest results from the status repository server

Some of the tools used in the cross-cloud testing system include GitLab, Terraform, Cloud-init, and Helm. CI integrations use standard libraries where possible (e.g. GitLab API, Jenkins API, Docker container registries).

After successful deployment of the Prometheus app container(s), the e2e container is deployed and the tests it contains run.
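
To illustrate the flow described above (build, provision, deploy, then e2e, with results collected for the dashboard), here is a rough sketch. The stage names and the `run_pipeline` helper are hypothetical and do not reflect the actual Cross-cloud CI code; the point is only that the e2e stage runs after a successful deployment and is skipped otherwise.

```python
from typing import Callable, Dict, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Dict[str, str]:
    """Run stages in order and record a status per stage.

    After the first failure, all remaining stages are marked "skipped".
    The returned dict stands in for what a status repository would store.
    """
    status: Dict[str, str] = {}
    failed = False
    for name, step in stages:
        if failed:
            status[name] = "skipped"
            continue
        ok = step()
        status[name] = "success" if ok else "failed"
        failed = not ok
    return status


# Simulated run in which the app deployment fails.
stages: List[Stage] = [
    ("build", lambda: True),
    ("provision", lambda: True),
    ("deploy", lambda: False),
    ("e2e", lambda: True),
]
result = run_pipeline(stages)
print(result)
```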

We would welcome your help with the Prometheus project’s e2e tests that will have results displayed on https://cncf.ci/. Some ways to help include: sharing any existing e2e tests and instructions for running them, creating new e2e tests, standardizing on a location and procedure to run the Prometheus e2e tests outside of the Prometheus project.

We'll go over some of this during the CI WG call tomorrow.

@krasi-georgiev

Member

krasi-georgiev commented Feb 12, 2018

Thanks for the overview.
@Conorbro was off today, but I am hoping he finds some time to organise a short demo for tomorrow of some work he has done on e2e tests for the service discovery.

The main priority is to replicate what Fabian did a while back with https://github.com/prometheus/prombench and make it run at a larger scale on Packet.

@brancz should be able to give more details tomorrow.

@Conorbro

Contributor

Conorbro commented Feb 13, 2018

Hey folks! I can join the meeting today and explain the approach I've taken to SD CI. No demo prepared unfortunately, but the Jenkins instance I've set up with sample tests for EC2 and ZooKeeper can be viewed here.

@krasi-georgiev

Member

krasi-georgiev commented Feb 16, 2018

@brancz, @Conorbro I think we can close this, as any new discussions and proposals can be done in
https://github.com/crosscloudci/crosscloudci

@brancz

Member

brancz commented Feb 16, 2018

Sounds good to me

@krasi-georgiev

Member

krasi-georgiev commented Feb 16, 2018

@brancz I don't have access, so please feel free to close, and if someone objects we will reopen it.

@brancz brancz closed this Feb 16, 2018

@lock

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 22, 2019
