Skip to content

Introduce Kill command to manually abort an invocation #418

Closed
tillrohrmann wants to merge 2 commits intorestatedev:mainfrom
tillrohrmann:issue#406
Closed

Introduce Kill command to manually abort an invocation #418
tillrohrmann wants to merge 2 commits intorestatedev:mainfrom
tillrohrmann:issue#406

Conversation

@tillrohrmann
Copy link
Copy Markdown
Contributor

This PR is based on #415 in order to avoid merge conflicts. Only the last two commits are relevant for this PR.

The kill command fails an invocation with the error code 10 which
represents the aborted grpc status code. Moreover, it sends an abort
message to the invoker to stop executing this invocation.

If the specified service invocation is not executing (invoked or
suspended), then the command will be ignored.

This fixes #406.

Comment on lines +21 to +33

/// Error codes used to communicate failures of invocations. Currently, this is tied to
/// grpc's error codes. See https://github.com/grpc/grpc/blob/master/doc/statuscodes.md#status-codes-and-their-use-in-grpc
/// for more details.
pub enum ErrorCode {
Aborted = 10,
}

impl From<ErrorCode> for i32 {
fn from(value: ErrorCode) -> Self {
value as i32
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't define a new error code struct. Rather simply use Code from tonic, as we're currently doing in other places, like the invoker.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so going back to this again, here is my current reasoning about this topic:

  • This type is probably in the wrong place here and with the wrong name. As already mentioned in some other PRs, I feel common is becoming a pit where we put all the types we don't know where else to put. This mod in particular seems really off, as GenericError is not here, there's ConversionError which seems unrelated to everything else, and should probably be defined together with the conversions this belongs to.
  • The name of the type is way too generic and we already have somewhere else the terminology restate error codes. Somehow the mod/name combination of this type should explicit in some way that we're talking about an invocation error code or similar.
  • Our error code space today is the same of grpc error codes, and going forward (assuming we'll continue with grpc), it must remain at least a subset of that. We have no reason now to believe we won't continue with grpc, so I would rather not build with that in mind.

Because of all the aforementioned points, I think it makes little sense to define a new data structure for error codes, adding more and more types to maintain to our type system. If we really want to have a name/type for invocation error codes, then type InvocationErrorCode = tonic::Code is enough IMO.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have introduced the ErrorCode for the following reasons:

  • At the moment we are using grpc but it is not set in stone that this won't change in the future (e.g. supporting a different rpc protocol). Hence, I believe we should build the runtime in such a way that it can work with grpc but also with a different rpc format.
  • Having our own ErrorCodes will allow us to be more expressive and not limited by grpc's restrictions. E.g. if we support a different rpc format we might be able to use the richer error codes. For grpc we would have to translate from our ErrorCode to the grpc error codes when we interact with grpc services. This would also entail that the invoker should use our own ErrorCode instead of tonic's.
  • I would like to avoid spreading the tonic dependency throughout our code base. So far we only need the tonic dependency at the periphery.

If you have a good idea for a name, then let's change it. Also the internal organization of common can be changed.

GenericError is currently in the common::utils module. Maybe we move it over to errors. This would be unrelated to this PR.

The ConversionError is in the common crate because it is part of the storage_api crate as well as used in the storage_proto crate. Both crates don't know each other. I guess one could move the error type to storage_api, then introduce a different error for the storage_proto crate and finally add a translation from this new error type to the ConversionError in the storage_rocksdb crate. I haven't done this because it seemed too much work for too little gain at the moment.

Copy link
Copy Markdown
Contributor

@slinkydeveloper slinkydeveloper May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For grpc we would have to translate from our ErrorCode to the grpc error codes when we interact with grpc services.

But viceversa is also something we need to support, because we might receive some specific status code the user can throw in their code: https://github.com/restatedev/e2e/blob/6ea14b1a691b2eba9e42915bfd7c14170885ddcf/services/java-services/src/main/java/dev/restate/e2e/functions/errors/FailingService.java#L29. That's why I said this ErrorCode must contain at least all the error codes of tonic::Status, and why this type doesn't make much sense now, because it needs to look exactly like tonic::Code to make sense and use it in EntryResult and other places.

At the moment we are using grpc but it is not set in stone that this won't change in the future (e.g. supporting a different rpc protocol). Hence, I believe we should build the runtime in such a way that it can work with grpc but also with a different rpc format.

If and when we'll replace grpc, or add on top of grpc another rpc system, then we'll need our own error space, at least containing the grpc error codes, and provide conversions back and forth the rpc protocol specific data structures. We had this discussion many times that we shouldn't overengineer for this eventuality.

I would like to avoid spreading the tonic dependency throughout our code base. So far we only need the tonic dependency at the periphery.

Consider that tonic with default-features brings only the grpc domain types, no network code and similar, so it's quite slim.


In general, my comment is that I personally find the whole error code stuff quite a mess right now, because sometimes we use tonic::Code, sometimes i32 (because in past we already said we don't want the tonic dependency, although very slim), and now we're adding a new type, not really fully fleshed out, used only in a specific place and not replacing all the other usages of i32, tonic::Code and friends, all of that throws me a bit off because it doesn't solve any problem we currently have, it just adds complexity of the codebase adding a new thing that we need to refactor later and unify. For the sake of getting this task done, I would have preferred to simply use the tonic::Code in the state machine code (or even just the aborted code number), and then get later to the whole error code story with a proper refactoring issue, putting together all the requirements we want to fulfill.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the time being, let's keep as it is, I've opened this issue #421

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. Let's follow up on this issue.

Having the ErrorCode with the single code that is used in one place won't make things worse than having the error code hard coded there, so it should not really add complexity for future refactorings.

@tillrohrmann
Copy link
Copy Markdown
Contributor Author

Thanks for the review @slinkydeveloper. I've responded to your comments and pushed some fixup commits.

The kill command fails an invocation with the error code 10 which
represents the aborted grpc status code. Moreover, it sends an abort
message to the invoker to stop executing this invocation.

If the specified service invocation is not executing (invoked or
suspended), then the command will be ignored.

This fixes #406.
@tillrohrmann
Copy link
Copy Markdown
Contributor Author

Sorry, I had to do a force push. I hope this does not cause too much trouble.

@slinkydeveloper
Copy link
Copy Markdown
Contributor

Merged in #422

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Introduce kill state machine command (partition processor)

2 participants