Submit Queue / munger: keep state over restarts #1042

lavalamp · 2016-05-23T22:39:51Z

Could be as simple as storing a static file in GCS or as complex as adding a database to the things we run in the utility cluster.

P0 Requirements:

remember merge stats across restarts.

lavalamp · 2016-05-23T23:31:01Z

@eparis @mikedanese @fejta maybe we can come up with a list of things we want to save.

@mhrgoog It'd be great if we could collect very explicitly in one place the set of things that are getting persisted.

ghost · 2016-05-24T04:16:15Z

Right now I am brainstorming and throwing ideas against the wall. I am not sure whether this justifies a design doc. Here are three possibilities:

1. A protobuffer that is stored in a file.
   Pros: Provides backwards compatibility and parsing.
   Cons: It's just a protobuffer. One must read in the whole structure before it can be used, and concurrent operations may not be easy.

2. Key value store: we can use some key value store.
   Pros: Concurrent use should be easier.
   Cons: Not sure which one to pick and how robust they are. Maybe this is obvious to veterans of the team.

3. Full on relational DB.
   Pros: Fun with queries and ability to mine information flexibly.
   Cons: Schema management can be a hassle.

I think even if we knew what we wanted to save now the list would change. But knowing the size of data we want would help a lot.

@fejta you have any ideas of what you are looking for?

apelisse · 2016-05-24T04:43:46Z

What problem are we trying to fix here?

fejta · 2016-05-24T05:44:10Z

@mhrgoog The data in http://submit-queue.k8s.io/#/e2e is great... except it disappears whenever we restart the merge queue. I want it serialized to a GCS object so we can avoid that.

Specifically I need to be able to measure the following each week:

What percentage of the time was the merge bot healthy last week?
Which job was the most unhealthy last week?
How many things did it merge in the past week?

Right now I cannot because whenever we restart the mergebot everything is lost. So I wind up tracking these values since 4 hours ago instead.

At this point I am not concerned about fun with queries or concurrency. I want to be able to answer those three questions.
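
A minimal sketch of how those three questions could be answered once the data survives restarts. The record shapes (`samples`, `merges`) and the function name `weekly_report` are hypothetical and not the submit queue's real format. The sketch also assumes health samples are taken at regular intervals, so that the share of healthy samples approximates the share of healthy time.

```python
from collections import Counter
from datetime import datetime, timedelta

WEEK = timedelta(days=7)


def weekly_report(samples, merges, now):
    """Answer the three weekly questions from persisted records.

    samples: list of dicts like
        {"time": datetime, "healthy": bool, "failing_jobs": [str, ...]}
    merges: list of datetime objects, one per merged PR.
    Both shapes are hypothetical, chosen only for illustration.
    """
    start = now - WEEK
    recent = [s for s in samples if start <= s["time"] <= now]

    # 1. Share of health samples that were healthy (None if there is no data).
    healthy_pct = None
    if recent:
        healthy_pct = 100.0 * sum(s["healthy"] for s in recent) / len(recent)

    # 2. Job that appeared most often in the failing list.
    failures = Counter(job for s in recent for job in s["failing_jobs"])
    worst_job = failures.most_common(1)[0][0] if failures else None

    # 3. Number of merges inside the window.
    merge_count = sum(1 for t in merges if start <= t <= now)

    return {"healthy_pct": healthy_pct, "worst_job": worst_job, "merges": merge_count}
```

Nothing here needs queries or concurrency: a flat list of timestamped records is enough, provided it is not lost on restart.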

lavalamp · 2016-05-24T05:57:40Z

I recommend a json object over a protobuf.
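
A rough sketch of what the JSON option might look like. A local file stands in for the GCS object here, and uploading it is out of scope. The function names and the default state keys are hypothetical. The write goes to a temporary file first and is then renamed, so a crash mid-write should not leave a truncated state file behind.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write state as JSON, replacing any previous file atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2, sort_keys=True)
        # Rename over the old file; atomic when both are on one filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_state(path):
    """Return the saved state, or an empty default on first start."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        # Hypothetical default shape; the real set of fields is undecided.
        return {"merges": [], "health_samples": []}
```

One practical point in favor of JSON is that the saved file stays readable by hand, and a reader can ignore keys it does not know about as the list of persisted fields changes.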

eparis · 2016-05-24T13:11:14Z

I'm hearing about saving and analyzing lists of time series data. Neither is something I want the mungegithub tools to get much better at. I'd rather mungegithub remained focused on GitHub and automating stuff around GitHub, and we build 'something else' to handle the data/visualization aspects we have been adding.

Should we be dumping the e2e test data and the history data (or having something poll them) into something like InfluxDB, which is actually designed to hold that kind of data? I think it also has easy tooling for visualizing and understanding that data.

I'm the one who started the process of showing stats in the submit-queue, but as we want more we're probably best to ask what the right solution is, not what continues to be 'easy' to bolt onto the side...

Especially since, in my mind, the thing that would be GREAT to save across restarts is the proxied cache of GitHub object state, so we don't have such a slow restart and we don't run out of API tokens on restart...

lavalamp · 2016-05-24T19:02:54Z

I think @eparis is probably right: the best thing would be to publish metrics (I think @apelisse or @rmmh already started publishing via Prometheus?) and scrape them regularly so we can get time-series data for this stuff.

...however I'm super interested in getting something working yesterday. I'm OK with different short- and long-term solutions.

eparis · 2016-05-24T19:11:40Z

@lavalamp no fights from me.

lavalamp · 2016-05-24T20:24:09Z

Remembering the queue order across restarts would be super handy, too. Right now it's chugging away on an unimportant PR instead of 1.3 PRs.

lavalamp · 2016-08-26T00:26:12Z

I don't think we have an immediate need here now. We keep the github cache and the stat-scraping script handles queue restarts. Closing.

ghost self-assigned this May 23, 2016

lavalamp added area/mungegithub kind/velocity-improvement labels May 23, 2016

ghost removed their assignment Jun 2, 2016

lavalamp closed this as completed Aug 26, 2016
