Skip to content
This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

/api/flux/v6/images?service=<all> returns context deadline exceeded #913

Closed
foot opened this issue Jan 24, 2018 · 15 comments
Closed

/api/flux/v6/images?service=<all> returns context deadline exceeded #913

foot opened this issue Jan 24, 2018 · 15 comments

Comments

@foot
Copy link

foot commented Jan 24, 2018

The query is probably too big to lookup/encode/transmit in 10s.

Use cases:

  • "Deploy Status" Dashboard
  • building web-UI equivalent fluxctl release --all --update-image=library/hello:v2

We need to choose a subset of the tags.

Ideal response:

  • /images?service=<all>&tagCreatedAt=gtCurrent: Include all the tags where tag.CreatedAt > Current.CreatedAt
    • This would allow us to display in the UI:
      • "You are 4 tags behind latest"
      • "You are 3 weeks behind latest tag"

Small caveat: the ideal response might be too big sometimes? The next idealist response would be:

  • /images?service=<all>&tagLimit=1&onlyFilteredTags=true: include 1 tag for each image that matches whatever tagFilter is set.
    • Which would allow us to say in the ui:
      • "You are 3 weeks behind latest tag"
@foot
Copy link
Author

foot commented Feb 1, 2018

If we could declare a "tagsBefore" and "tagsAfter" additionally to "tagsLimit" (mirroring the /history?params) we would have a lot flexibility in the UI to overlay image information in a timeline like manner.

@foot
Copy link
Author

foot commented Feb 1, 2018

Ideal migration strategy would be to:

  1. Bump the api:
    • add a v7/images? endpoint that supports more options, if we immediately get a 404 show old ui and show a "please upgrade" dialog with confidence.

@squaremo
Copy link
Member

squaremo commented Feb 7, 2018

A bit of evidence for what's happening ..

If I curl the API at our dev environment, I get context exceeded (i.e., a timeout). If I port-forward to the fluxd pod and curl the same API endpoint, I get a full result -- it's about 3MB.

This makes me suspect one of a couple of causes:

  1. the total latency in transmitting this amount of data over the hops from the API to the daemon back to the API just adds up to > 10s (our API timeout)
  2. there's a maximum frame size on one of the hops which causes the response to be dropped, and we time out waiting for it to arrive.

In either case, a remedy is to reduce the amount of data sent. Currently we send O(containers x available images) items; if we summarised it to just the current image, the latest image according to filter policy, and the distance between, that would reduce it to O(containers). We would probably have to implement a shim in the service to do the summary for us if the daemon isn't new enough, but that may be good enough.

@squaremo
Copy link
Member

squaremo commented Feb 7, 2018

I'm willing to make this the default mode of the new API endpoint, and adjust the output of fluxctl list-images to correspond.

@foot
Copy link
Author

foot commented Feb 7, 2018

summarised it to just the current image, the latest image according to filter policy, and the distance between

Yep! That all sounds grand.

Service summary shim would be nice to have but the deadline exceeded is between the service and the daemon? I think a snappy 404 + please upgrade is preferable to extending deadlines perhaps.

@squaremo
Copy link
Member

squaremo commented Feb 7, 2018

I think a snappy 404 + please upgrade is preferable to extending deadlines perhaps.

If we're going to use this API for the main bit of the GUI, I'd rather not suddenly lock people out of it. I don't propose extending the timeout -- just reducing the amount of data sent over the last couple of hops back to the GUI. It may not be enough, but we'll pretty quickly see if this helps, in dev.

@foot
Copy link
Author

foot commented Feb 7, 2018

I'd rather not suddenly lock people out of it.

For sure we want everyone to have all the cool new stuff! If we can change the service to support this new API even w/ old fluxds that would be awesome. I'd imagined that we might not be able to do that, so was planning on leaving the current UI in place for older fluxds. (+please upgrade banner). Here's hoping its enough though! Would save another ui version "fork" 👌

@squaremo
Copy link
Member

squaremo commented Feb 7, 2018

Rough implementation plan:

  • Add an argument to the ListImages invocation called summary. This is a hint to the server (daemon or service) that the response should use the summary fields rather than include all information. It must be optional to observe it, so that old daemons can still process requests.
  • Add summary fields to the ListImages response. If full data is sent, but a summary is required, it can be calculated then and there. A summary response won't be sent unless requested, so there is no danger of a client receiving a response it doesn't understand.

@aaron7
Copy link
Contributor

aaron7 commented Mar 16, 2018

Current ListImages response:

[
  { 
    ID: "workload",
    Containers: [
      {
         name: "container name",
         available: [{},{},{},{},{}],
         current: {},
      }

Having a ?summary query parameter hide fields could be problematic if not well documented because it's not always clear what fields are being hidden and how this changes over time (unless ?summary removes all other fields apart from summary). We could include it in every response and use other filters for reducing the size of the response.

      {
         name: "container name",
         available: [{},{},{},{},{}],
         current: {},
         summary: {
               newAvailableTagsCount: 12,
      }

To reduce data size, we could add a limit filter which is a little bit more explicit on a per field basis. ?avaliableLimit=1 as it's in the available field where the majority of data is.

The above seem to be details which can be discussed in the PR - so I will get that up.

There is the case where if the client sets the limit but the summary information is not available then it can't calculate that information, but this shouldn't be the case if both were introudced in the same version.

@squaremo
Copy link
Member

Having a ?summary query parameter hide fields could be problematic if not well documented because it's not always clear what fields are being hidden and how this changes over time

Not sure what you mean, @aaron7 -- do you mean that we might change how ?summary works, and that could break API clients (like the Weave Cloud UI)? Or something else.

@squaremo
Copy link
Member

Oh I think I understand -- summary is too general and we might want to expand its meaning later. Yes, use something more specific, so we can fix its meaning.

@aaron7
Copy link
Contributor

aaron7 commented May 8, 2018

From in-person discussion with @squaremo :

Current ListImages response is:

[                   <------------- list of workloads
  { 
    ID: "workload",
    Containers: [                    <-------------- list of containers for a workload
      {
         name: "container name",
         available: [{},{},{},{},{}],
         current: {},
      }

We would like to add some summary information for objects under the list of Containers:

  • availableTagsCount - count of all the available tags
  • newAvailableTagsCount - count of all the new available tags (ones after the current)

@squaremo is tags the right word to use here or would images be more suitable?


These will be added as extra fields and all requests will include these by default. We will however introduce a new query parameter to allow the client to only request specific fields.

i.e. ?containerFields=name,current,availableTagsCount would only return the name, current tag object and the count of available tags. If this query parameter is not used, by default all fields would be returned.

@squaremo
Copy link
Member

squaremo commented May 8, 2018

We would like to add some summary information for objects under the list of Containers:

availableTagsCount - count of all the available tags
newAvailableTagsCount - count of all the new available tags (ones after the current)
@squaremo is tags the right word to use here or would images be more suitable?

images is better here; although the base, untagged image will be the same for a particular container, with only the tag varying, we typically use whole image refs (i.e., both the base and the tag) as a unit, and don't use tags on their own.

These will be added as extra fields and all requests will include these by default. We will however introduce a new query parameter to allow the client to only request specific fields.

... within the container struct.

i.e. ?containerFields=name,current,availableTagsCount would only return the name, current tag object and the count of available tags. If this query parameter is not used, by default all fields would be returned.

✔️ I like this approach, because it's both forward- and backward-compatible (or at least, it punts compatibility to another layer). NB we would want to return a HTTP 40x status if we don't understand any of the fields mentioned.

@squaremo
Copy link
Member

squaremo commented May 8, 2018

We may also want the fields

  • filteredImagesCount
  • newFilteredImagesCount

since they will often (always?) be more useful than the available* counts.

@foot
Copy link
Author

foot commented May 8, 2018

Yes!

Additionally it would be awesome to have

  • latest: {}: the latest image that matches any set tag filters.

Then we'd have all the summary info for the deploy-status-table and could avoid available: [] there too.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants