Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Make status faster (in theory). #7766
Conversation
|
I tried testing this on a controller with a couple thousand unit agents. I got up to around 2800 unit agents, and 'time juju status' when everything was happy was around 6.7s. Directly after 'juju upgrade-juju', all the agents were connecting to the new controller, CPU was generally maxed, and 'time juju status' took 2m50s. Once things quieted down a bit and the only thing in the log file was now "waking to check leases", 'time juju status' was 4.8, 5.3, 5.6, 5.8, 5.0s. Its quite hard for me to test how well 'juju status' is under actual load (how do you know if the time is different because the load is different, or if it is because the code is different). So I can't say that 2m50s is significantly different from 2m2s (that your code makes things slower under load), and variation means I can't even really say that it made a concrete difference under load. So I can't say for sure whether this made a concrete difference or not. I think the only way to find the difference would be to start over again, and run enough 'juju status' to see the difference in the noise. |
| + return noStatus, errors.Annotate(err, "could not fetch model") | ||
| + } | ||
| + if context.status, err = context.model.LoadModelStatus(); err != nil { | ||
| + return noStatus, errors.Annotate(err, "could not load model status values") |
jameinel
Aug 21, 2017
Owner
Would we want to do this contextually on "len(args.Patterns) <= 0" ? I'm just wondering if "juju status foo" is going to have to load the status of everything first.
| appUnitMap := make(map[string]*state.Unit) | ||
| for _, u := range units { | ||
| - appUnitMap[u.Name()] = u | ||
| + if u.ApplicationName() == app.Name() { |
jameinel
Aug 21, 2017
Owner
3000 units in a model with 190 applications becomes 1903000 to evaluate this loop.
If you want to do this, we need to first put the units into a map on app name => units. (which happens to be what I was specifically testing, ~190 apps, each with 15 units.)
That may not be the specific layout, but given you can convert this from O(NM) to O(N*1) by putting it into a hash table, I think its better to do so.
| machinesMap[id] = hostStatus | ||
| cache[id] = hostStatus | ||
| for _, machine := range machines[1:] { | ||
| parent, ok := cache[state.ParentId(machine.Id())] | ||
| if !ok { | ||
| - panic("We've broken an assumpution.") | ||
| + logger.Errorf("programmer error, please file a bug, reference this whole log line: %q, %q", id, machine.Id()) |
| + ipAddresses := c.ipAddresses[machineID] | ||
| + spaces := c.spaces[machineID] | ||
| + linkLayerDevices := c.linkLayerDevices[machineID] | ||
| + |
jameinel
Aug 21, 2017
Owner
A lot of this data never gets rendered in tabular format.
I wonder if we should move away from a FullStatus that generates everything and a renderer that filters out some of it.
| @@ -1062,10 +1082,24 @@ func populateStatusFromStatusInfoAndErr(agent *params.DetailedStatus, statusInfo | ||
| agent.Since = statusInfo.Since | ||
| } | ||
| +// contextMachine overloads the Status call to use the cached status values, | ||
| +// and delegates everything else to the Machine. | ||
| +// TODO: cache presence as well. |
jameinel
Aug 21, 2017
Owner
Presence is already in-memory structures, I don't think we need yet-another cache for it.
I do have code that would allow you to ask about the presence information of multiple agents in one channel send and receive, instead of each agent being a separate request.
I just didn't bother updating it, because all the accesses were thisunit.Status() so we didn't have a way to call the bulk presence function.
howbazaar
Aug 21, 2017
Owner
How efficient would it be to ask for all the unit agent presence in one go?
| @@ -246,8 +246,12 @@ func allCollections() collectionSchema { | ||
| // ----- | ||
| // These collections hold information associated with applications. | ||
| - charmsC: {}, | ||
| - applicationsC: {}, | ||
| + charmsC: {}, |
jameinel
Aug 21, 2017
Owner
I stopped here for tonight. I may try to give it another look tomorrow.
Would it be reasonable to document why we want a given index somewhere?
babbageclunk
approved these changes
Aug 22, 2017
Looks like a good change - it seems like reducing the number of queries would be worthwhile. Have you done any profiling of this code?
| if context.applications, context.units, context.latestCharms, err = | ||
| - fetchAllApplicationsAndUnits(c.api.stateAccessor, len(args.Patterns) <= 0); err != nil { | ||
| + fetchAllApplicationsAndUnits(c.api.stateAccessor, context.model, len(args.Patterns) <= 0); err != nil { |
babbageclunk
Aug 22, 2017
Member
I realise this is pre-existing, but the <= here seems needlessly obtuse. There's no way len can return a negative result, right?
| appUnitMap := make(map[string]*state.Unit) | ||
| for _, u := range units { | ||
| - appUnitMap[u.Name()] = u | ||
| + if u.ApplicationName() == app.Name() { |
jameinel
Aug 21, 2017
Owner
3000 units in a model with 190 applications becomes 1903000 to evaluate this loop.
If you want to do this, we need to first put the units into a map on app name => units. (which happens to be what I was specifically testing, ~190 apps, each with 15 units.)
That may not be the specific layout, but given you can convert this from O(NM) to O(N*1) by putting it into a hash table, I think its better to do so.
| + | ||
| +// LoadModelStatus retrieves all the status documents for the model | ||
| +// at once. Used to primarily speed up status. | ||
| +func (m *Model) LoadModelStatus() (*ModelStatus, error) { |
babbageclunk
Aug 22, 2017
Member
I think @jameinel's right - it would be a pity to dramatically slow down status with a filter pattern on a large model. Would it be hard to apply the filters here?
| + return doc, nil | ||
| +} | ||
| + | ||
| +func (m *ModelStatus) transform(doc statusDocWithID) status.StatusInfo { |
howbazaar
added some commits
Aug 22, 2017
|
$$merge$$ |
|
Status: merge request accepted. Url: http://ci.jujucharms.com/job/github-merge-juju |
howbazaar commentedAug 21, 2017
•
Edited 1 time
-
howbazaar
Aug 21, 2017
juju status can be slow on big models in big controllers.
This branch attacks the problem from two different directions. Firstly the missing indices on the applications and machine collections. Both of these are fully loaded when status is executed. In controllers with many models the query would involve table scans.
Secondly when the apiserver code was iterating over the machines an units, each one had multiple calls to some value stored in the statuses collection. Instead of executing these queries one at a time, all the statuses for the model are loaded at once at the start of the status call.
This then reduces the number of calls for the status values by 2 per machine, and 3 per unit (or 4 per unit if SetStatus has never been called on the application. Also reduces the queries on statuses history by 1 per unit by making the full workload version status document available through the cache. As well as the status caching, a single query is made to the db for all the units in the model rather than one call per application.
So, for a model with 50 applications, 2500 units and 300 machines, this branch reduces the query count by 10648 (or 13148 if no app called SetStatus).
QA steps
Run
juju status