Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create better status for concurrency-limited & duplicated builds #9100

Closed
ericholscher opened this issue Apr 12, 2022 · 12 comments
Closed

Create better status for concurrency-limited & duplicated builds #9100

ericholscher opened this issue Apr 12, 2022 · 12 comments
Assignees
Labels
Needed: design decision A core team decision is required

Comments

@ericholscher
Copy link
Member

ericholscher commented Apr 12, 2022

Currently when a build is concurrency limited it looks like this:

Screen Shot 2022-04-12 at 9 54 58 AM

And on the list screen:

Screen Shot 2022-04-12 at 9 54 55 AM

This is not communicating to the user the actual state of things. This build isn't waiting to run, it's delayed because of a concurrency limit. We should be communicating this more clearly to users.

Similarly, duplicated builds are marked as failed:

Screen Shot 2022-04-12 at 9 55 55 AM

This makes the user think something went wrong, but the build was actually just cancelled. This should be communicated to the user in a nicer way as well.

Implementation

We could either add new statuses for these outcomes, or just change the display of the status based on the failure message. I don't feel strongly here, but it seems like an actual status is the most clear.

@agjohnson agjohnson added the Needed: design decision A core team decision is required label Apr 13, 2022
@agjohnson
Copy link
Contributor

agjohnson commented Apr 13, 2022

So, I actually think "Triggered" isn't too bad. To the user, "Triggered" vs "Queued" is still effectively "Your build will be attempted soon". To me, this doesn't feel like an important designation as we've mostly reduced the time builds sit around before being accepted by a builder. So, "Triggered" feels like it's mostly gone away and "Queued" is the state we're talking about here. And it feels safe to lump both into one state.

I would agree that "Triggered" is not ideal though. I've advocated for "Queued" instead of "Triggered" for a few of the reasons here.

I am treating "Triggered" as "Queued" visually for now in fact, and will soon release something closer to:

image

I agree surfacing build metadata would be a UX improvement though, but I think that our previous conversations here were on the right track:

#3399

The issue that we have with errors now is that the logic to determine if a build had an error is template logic 😬. These really should be some model state so that I can include basic conditional elements in multiple templates.

For example, if we wanted to show some error states on the build listing, I would inspect the build error objects attached to the build:

image

Currently when a build is concurrency limited it looks like this

I actually noticed that after a minute, concurrency limited builds are failed:

image

Notice the timeframes on the right, it seems to happen after a minute or two. That's definitely not a great designation for a build that will be retried.

Similarly, duplicated builds are marked as failed

Agreed, this also feels odd. I would probably want to show this as "Cancelled".

With attached build error metadata, we could visually show a project that failed because of a user build error vs one that failed because of an application error, all by inspecting the build error classes attached: {% if build.errors.has_class(BuildErrorApplicationError) %}We bunged something up{% endif %}

So immediately, I feel like work might be:

  • Concurrency limited builds (and maybe duplicated builds) shouldn't end up in a failed state. Perhaps the bugs we've had here are related to build state.
  • We re-raise the discuss about error modeling and get that on one of the upcoming sprints. I think we're at agreement with implementation there.

@humitos
Copy link
Member

humitos commented Apr 13, 2022

I agree with @agjohnson that Queued could be a good replacement for Triggered. However, I don't feel super strong, but considering we are going to do these changes, I think it's good to include it here.

One thing that I'd like to simplify and unify is the usage of state, status, and success to reduce complexity. We should only have state, communicating where and why in the same word the build is in the process. This means, for example, splitting the current finished state into succeeded states and failed states:

  • Succeeded states: Succeed
  • Failed states: Duplicated, Cancelled, Command failed, Timed out, Memory error, etc

This won't require checking succeed to know if it succeed or failed. Also, it will keep the data consistent in our database, avoiding having to define success immediately after the build is created (see

).

We talked about where and why (state, and status) in #7123 (comment) which could help to have some context here.

Basically, I'm proposing to expand state= field to have more states and communicate exactly what's happening with the build. If we need to communicate a longer error message, we can use Build.error as we are doing now, and Build.warnings (or Build.messages or similar) to communicate extra messages related to the build (for example a normal message could be "You can get the most from Read the Docs by using a config file. Check out our docs!" and an error message "Your build is consuming too many resources, please follow this guide to reduce them"); which is being discussed in the issue linked by Anthony: #3399

@agjohnson
Copy link
Contributor

agjohnson commented Apr 14, 2022

I agree with all of that! 👍

Edit: I thought there was a naming discrepancy with the above plan wrt state/status, but I think we actually have differing plans here.

I do agree that there are some more immediate fixes we can make on this issue though. But given the above discussion, maybe we just pick solutions that aren't at odds with our more complete vision. So, perhaps let's find a middle ground for the fixes for this issue first?

I'll open up an issue to continue this particular conversation, and let's revisit this in a sprint. I think we have really good plans here, and think we just need to formulate a plan to bring everything together. Perhaps it makes sense to talk through new build errors and build status all at once?


@humitos
Copy link
Member

humitos commented Apr 25, 2022

I'm not sure what's the immediate change wanted on this issue. It seems the change required will be dictated by the decision made in the linked issues about the new states. I think we have two separate issues here:

This build isn't waiting to run, it's delayed because of a concurrency limit. We should be communicating this more clearly to users

It seems that we are all in agreement about renaming Triggered state into Queued, which seems to reduce the confusion reported by Eric originally and it's compatible with the planned future.

duplicated builds are marked as failed

This issue won't be solved by renaming Triggered state and this one requires a final decision about the states. I'm proposing using Duplicated as a final state in #3399 (comment)

@agjohnson @ericholscher what's the wanted/required next action for this issue?

@agjohnson
Copy link
Contributor

I think we jumped right into refactoring where are refactor really isn't needed right now. To add a duplicated status, we only need to add Build.status = Build.DUPLICATED, and is possible with our current data structure.

@humitos
Copy link
Member

humitos commented Apr 26, 2022

@agjohnson I'm not sure to follow you here. We are already setting the Build.status as duplicated in

build.status = DuplicatedBuildError.status

That code makes the UI show the message "Error: Duplicated build". However, @ericholscher's concern is about marking a duplicated build as failed (state=Finished and success=False) --which I think it's the best we can do with the current implementation. If not, what is the expected combination of values for the fields state, success, status, exit_code, and error in the duplicated build scenario? Also, what are the UI changes that we want to reflect this if any?

@agjohnson
Copy link
Contributor

agjohnson commented Apr 26, 2022

Whoops, I wasn't paying close attention, my example was not the full example I meant to give 🙃

what is the expected combination of values for the fields state, success, status, exit_code, and error in the duplicated build scenario

I meant to say add a `Build.state = "cancelled", so:

Build(state="cancelled", status="duplicated", success=False)

I think cancelled should eventually be a Build.state value. I would prefer builds that are cancelled (by user, automatically, and when there is a duplicated build) to show as a soft failure state in the UI -- not the same error state as a build command failing.

So eventually, there is also:

Build(state="cancelled", status="by_user")

Build(state="cancelled", status="automatically", messages=[BuildMoreRecentCommitError()])

etc

humitos added a commit that referenced this issue Apr 27, 2022
This is just a POC to show some of the work involved in #9100
@humitos
Copy link
Member

humitos commented Apr 27, 2022

I'm still confused and I'm not sure what's the work expected. In particular, because this involves UI changes and I'm not sure how we want to visualize them.

  • What's the benefit of adding an extra "final state" if we are adding just one? It seems we can have the same behavior without adding an extra final state and just using "Finished" and checking the .status from the UI: Build(state="finished", status="cancelled_by_user"), etc
  • Adding an extra final state requires changes in the after_return Celery handler (it won't be always "Finished"), making the logic more complex. We will need to check if the build already has another final state and in that case, don't change it:
    self.update_build(BUILD_STATE_FINISHED)
  • We need to change the UI on build details to show "Cancelled" state as a final state instead of an intermediate one with the spinner next to it
  • This work involves touching build_detail.html template and Knockout.js code
  • Changes to the UI will be discarded with the new ext-theme

That said, do we want to do this work now and discard it in the next months when we release ext-theme? Is there a big difference between "Build cancelled. Duplicated build" (this change proposed) vs. "Build failed. Error: Duplicated build" (current implementation) that is worth this effort?

I opened a POC showing some of the changes required as an example, but I wasn't able to make the JS work, tho. See #9145

@humitos
Copy link
Member

humitos commented Apr 27, 2022

Also, note that "duplicated builds" feature is under a feature flag and we weren't able to remove it yet because #7306 (comment). So, I think the easiest path to move forward is to remove duplicated builds for all users until it works properly and we have the UI implemented in the ext-theme properly.

@agjohnson
Copy link
Contributor

It seems we can have the same behavior without adding an extra final state and just using "Finished" and checking the .status from the UI: Build(state="finished", status="cancelled_by_user")

Long term, this is sort of close to what I'm describing too, but I think that refactor is either far out or just a straight up no-go. Short term, I think we work with what we have and don't change any patterns around.

So for that, I don't think we should use a different pattern to describe build state. We use build.state/build.get_state_display()/etc to describe state already, and should work with that pattern. It's mostly fine and does what it needs to.

Also, Build.state = "cancelled" makes more sense to me, as there is no Build.success for "cancelled" state builds, it's neither value.

The main issue is the build.success field. As long as we have a build.success field, build.state = "finished" is going to be overloaded. The meaning of build.state = "finished" depends on build.success, and can mean the build failed, the build passed, or just the build finished. Adding build.status = "cancelled" as effectively a tertiary value to build.success would almost certainly make this confusion worse.

Long term, we can continue discussion about whether it makes sense to move build.success into a state or status value. But again, this might be no-go, it seems low value.

That said, do we want to do this work now and discard it in the next months when we release ext-theme?

Do we want to wait until the old dashboard templates are deprecated? I'm guessing no, and would probably lean towards just surfacing the state in a beta of the new templates. But we still need to add the model field value and store it in the builder code. Our old templates will need to compensate for this and skip the state.

This is where I would point back to abstracting Build data through an abstract class or using Build model methods to translate Build data. The check for finished builds should almost certainly be a model method at this point, and would allow for:

class Build:

    def is_finished(self):
        if settings.RTD_EXT_THEME_ENABLED:
            return self.state == "finished"
        else:
            return self.state in ["finished", "cancelled"]

Either way, it would be incredibly wise to discontinue hard coding values in templates, which is another reason this change is more complicated than it needs to be.

humitos added a commit that referenced this issue May 4, 2022
This commit introduces a new build final state: "Cancelled".
This state is shown in the build listing page and also in the build details
page. The build will end up on this state when:

- The user cancels the build by clicking on the button
- The build is cancelled due to inactivity
- The build is cancelled because it was duplicated

Ther are a lot of changes required for this:

 - HTML template logic for build details and listing pages
 - API changes to add a new value
 - Knockout.js logic for build details page
 - `Build` model changes to support a new state
 - Update different places of core logic to change checking for `finished` state
   to `BUILD_FINAL_STATES` instead
 - Documentation changes to mention the new state
 - Celery build handler `after_return` to decide if it's `finished` or
   `cancelled` the ending state
 - Logic to decide if the build should be killed due to inactivity

Reference #9100
@humitos
Copy link
Member

humitos commented May 4, 2022

I updated my initial draft PR in #9145 which covers this. Let's continue the conversation for the details on that PR.

@humitos
Copy link
Member

humitos commented May 23, 2022

I'm closing this issue because the actionable from it was already done in #9145. Besides, there are other issues opened to track the other parts required in the backend and in the frontend. Feel free to re-open if you consider there is more work required here.

@humitos humitos closed this as completed May 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needed: design decision A core team decision is required
Projects
Archived in project
Development

No branches or pull requests

3 participants