Updated LRO guidelines #517

Merged: 10 commits, merged May 6, 2024
240 changes: 166 additions and 74 deletions in azure/ConsiderationsForServiceDesign.md
@@ -6,6 +6,7 @@

| Date | Notes |
| ----------- | -------------------------------------------------------------- |
| 2024-Mar-17 | Updated LRO guidelines |
| 2024-Jan-17 | Added guidelines on returning string offsets & lengths |
| 2022-Jul-15 | Update guidance on long-running operations |
| 2022-Feb-01 | Updated error guidance |
@@ -207,7 +208,7 @@ It is good practice to define the path for action operations that is easily dist
2) use a special character not in the set of valid characters for resource names to distinguish the "action" in the path.

In Azure, we recommend distinguishing action operations by appending a `:` followed by an action verb to the final path segment. For example:
```text
https://.../<resource-collection>/<resource-id>:<action>?<input parameters>
```
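To show why the `:` separator is unambiguous, here is a minimal Python sketch of how a request router might split an action path. It assumes the query string has already been removed and that `:` is never a valid character in a resource ID. `split_action` and the `publish` action are hypothetical names and are not part of any Azure library.

```python
from typing import Optional, Tuple


def split_action(path: str) -> Tuple[str, Optional[str]]:
    """Split '<collection>/<id>:<action>' into (resource path, action verb).

    Hypothetical helper: assumes the query string is already stripped and
    that ':' is not a valid character in resource IDs.
    """
    # Only the final path segment can carry an action, so look after the last '/'.
    head, slash, last = path.rpartition("/")
    resource_id, colon, action = last.partition(":")
    return head + slash + resource_id, (action if colon else None)


print(split_action("/items/FooBar:publish"))  # ('/items/FooBar', 'publish')
print(split_action("/items/FooBar"))          # ('/items/FooBar', None)
```

Because `:` cannot appear in a resource ID, an action path can never collide with a path that ends in a user-specified ID.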

@@ -216,7 +217,7 @@ cannot collide with a resource path that contains user-specified resource ids.

## Long-Running Operations

Long-running operations (LROs) are an API design pattern that should be used when the processing of
an operation may take a significant amount of time -- longer than a client will want to block
waiting for the result.

@@ -225,33 +226,111 @@ a _status monitor_, which is an ephemeral resource that will track the status an
The status monitor resource is distinct from the target resource (if any) and specific to the individual
operation request.

There are four types of LROs allowed in Azure REST APIs:

1. An LRO to create or replace a resource that involves additional long-running processing.
2. An LRO to delete a resource.
3. An LRO to perform an action on or with an existing resource (or resource collection).
4. An LRO to perform an action not related to an existing resource (or resource collection).

The following sections describe these patterns in detail.
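The following Python sketch is a compact, non-authoritative summary of the four types listed above. For each type it records the HTTP method that initiates the LRO and the success status of the initial response, as described in the sections that follow. The names `LroKind`, `INITIATION`, and `is_conforming` are hypothetical.

```python
from enum import Enum
from typing import Dict, FrozenSet, Tuple


class LroKind(Enum):
    CREATE_OR_REPLACE = 1  # create or replace a resource with additional processing
    DELETE = 2             # delete a resource
    RESOURCE_ACTION = 3    # action on or with an existing resource (or collection)
    STANDALONE_ACTION = 4  # action not related to an existing resource


# Initiating method and success status(es) of the initial response for each kind.
INITIATION: Dict[LroKind, Tuple[str, FrozenSet[int]]] = {
    LroKind.CREATE_OR_REPLACE: ("PUT", frozenset({200, 201})),  # 201 created, 200 replaced
    LroKind.DELETE: ("DELETE", frozenset({202})),
    LroKind.RESOURCE_ACTION: ("POST", frozenset({202})),        # path ends in ':<action>'
    LroKind.STANDALONE_ACTION: ("PUT", frozenset({201})),       # PUT on the operation URL
}


def is_conforming(kind: LroKind, method: str, status: int) -> bool:
    """Return True if the initial request method and response status match `kind`."""
    expected_method, expected_statuses = INITIATION[kind]
    return method.upper() == expected_method and status in expected_statuses
```

PATCH does not appear in the table because it is never used for a long-running operation.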

### Create or replace a resource requiring additional long-running processing
<a href="#put-with-additional-long-running-processing"></a> <!-- Preserve anchor of previous heading -->

A special case of long-running operations that occurs often is a PUT operation to create or replace a resource
that involves some additional long-running processing.
One example is a resource that requires physical resources (e.g. servers) to be "provisioned" to make the resource functional.

In this case:
- The operation must use the PUT method (NOTE: PATCH is never allowed here)
- The URL identifies the resource being created or replaced.
- The request and response bodies have identical schemas and represent the resource.
- The request may contain an `Operation-Id` header that the service will use as
the ID of the status monitor created for the operation.
- If the `Operation-Id` matches an existing operation and the request content is the same,
the service should treat the request as a retry and return the same response as the earlier request.
Otherwise, the service should fail the request with `409 Conflict`.

```text
PUT /items/FooBar?api-version=2022-05-01
Operation-Id: 22

{
"prop1": 555,
"prop2": "something"
}
```
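The `Operation-Id` retry rule above can be sketched on the service side as follows. This is a minimal in-memory illustration. It assumes that "the same request content" means the same JSON body regardless of key order. `OperationRegistry`, its `submit` method, and the `Conflict` error code are hypothetical.

```python
import json
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]  # (status code, response body)


class OperationRegistry:
    """Hypothetical in-memory registry of operations keyed by Operation-Id."""

    def __init__(self) -> None:
        self._ops: Dict[str, Tuple[str, Response]] = {}

    def submit(self, operation_id: str, body: dict,
               start: Callable[[dict], Response]) -> Response:
        # Canonical form of the request body, so key order does not matter.
        fingerprint = json.dumps(body, sort_keys=True)
        if operation_id in self._ops:
            previous_fingerprint, previous_response = self._ops[operation_id]
            if previous_fingerprint == fingerprint:
                # Same ID and same content: treat as a retry and replay the earlier response.
                return previous_response
            # Same ID but different content: reject with 409 Conflict.
            return 409, {"error": {"code": "Conflict",
                                   "message": f"Operation-Id {operation_id} was used with different content"}}
        response = start(body)  # start the real work only once per Operation-Id
        self._ops[operation_id] = (fingerprint, response)
        return response
```

A production service would presumably need to persist this registry so that retries still match after a restart.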

In this case the response to the initial request is a `201 Created` to indicate that
the resource has been created or `200 OK` when the resource was replaced.
The response body should be a representation of the resource that was created,
and should include a `status` field indicating the current status of the resource.
A status monitor is created to track the additional processing and the ID of the status monitor
is returned in the `Operation-Id` header of the response.
The response must also include an `Operation-Location` header for backward compatibility.
If the resource supports ETags, the response may contain an `etag` header and possibly an `etag` property in the resource.

```text
HTTP/1.1 201 Created
Operation-Id: 22
Operation-Location: https://items/operations/22
etag: "123abc"

{
"id": "FooBar",
"status": "Provisioning",
"prop1": 555,
"prop2": "something",
"etag": "123abc"
}
```
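The following sketch shows how a service might assemble the initial response described above. It returns `201 Created` or `200 OK` depending on whether the resource already existed. The handler name, the SHA-256-derived ETag, and the `<base>/operations/<id>` layout of `Operation-Location` are assumptions for illustration only.

```python
import hashlib
import json
from typing import Dict, Tuple


def put_with_provisioning(store: Dict[str, dict], resource_id: str, body: dict,
                          operation_id: str, base_url: str) -> Tuple[int, Dict[str, str], dict]:
    """Hypothetical PUT handler that starts additional long-running processing.

    Returns (status code, response headers, response body).
    """
    created = resource_id not in store
    # One simple way to derive an ETag: hash the canonical JSON of the request body.
    etag = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()[:12]
    # Server-owned fields are applied last so the client cannot override them.
    resource = {**body, "id": resource_id, "status": "Provisioning", "etag": etag}
    store[resource_id] = resource
    # (Starting the provisioning work and creating the status monitor is omitted here.)
    headers = {
        "Operation-Id": operation_id,
        # Kept for backward compatibility; this URL layout is an assumption.
        "Operation-Location": f"{base_url}/operations/{operation_id}",
        "etag": f'"{etag}"',  # the header carries the quoted ETag value
    }
    return (201 if created else 200), headers, resource
```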

The client will issue a GET to the status monitor to obtain the status of the operation performing the additional processing.

```text
GET https://items/operations/22?api-version=2022-05-01
```

When the additional processing completes, the status monitor indicates if it succeeded or failed.

```text
HTTP/1.1 200 OK

{
"id": "22",
"status": "Succeeded"
}
```

If the additional processing failed, the service may delete the original resource if it is not usable in this state,
but should clearly document this behavior.

### Long-running delete operation

A long-running delete operation returns a `202 Accepted` with a status monitor which the client uses to determine the outcome of the delete.

The resource being deleted should remain visible (returned from a GET) until the delete operation completes successfully.

When the delete operation completes successfully, a client must be able to create a new resource with the same name without conflicts.
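The visibility rules above can be illustrated with a small in-memory store. The store tracks a resource as "deleting" until its delete operation succeeds. `ResourceStore` is hypothetical, and it assumes create-only semantics, so an existing name conflicts even while it is still being deleted.

```python
from typing import Dict, Optional, Set


class ResourceStore:
    """Hypothetical in-memory store illustrating visibility during a long-running delete."""

    def __init__(self) -> None:
        self._resources: Dict[str, dict] = {}
        self._deleting: Set[str] = set()

    def create(self, name: str, body: dict) -> int:
        # Assumption: create-only semantics, so any existing name conflicts.
        if name in self._resources:
            return 409
        self._resources[name] = body
        return 201

    def get(self, name: str) -> Optional[dict]:
        # A resource stays visible while its delete is still in progress.
        return self._resources.get(name)

    def is_deleting(self, name: str) -> bool:
        return name in self._deleting

    def begin_delete(self, name: str) -> None:
        # Mark the delete as started; the resource is not removed yet.
        self._deleting.add(name)

    def complete_delete(self, name: str) -> None:
        # On success the resource disappears and its name can be reused.
        self._deleting.discard(name)
        self._resources.pop(name, None)
```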

This diagram illustrates how a long-running DELETE operation is initiated and then how the client
determines it has completed and obtains its results:

```mermaid
sequenceDiagram
participant Client
participant API Endpoint
participant Status Monitor
Client->>API Endpoint: DELETE
API Endpoint->>Client: HTTP/1.1 202 Accepted<br/>{ "id": "22", "status": "NotStarted" }
Client->>Status Monitor: GET
Status Monitor->>Client: HTTP/1.1 200 OK<br/>Retry-After: 5<br/>{ "id": "22", "status": "Running" }
Client->>Status Monitor: GET
Status Monitor->>Client: HTTP/1.1 200 OK<br/>{ "id": "22", "status": "Succeeded" }
```

1. The client sends the request to initiate the long-running DELETE operation.
The request may contain an `Operation-Id` header that the service uses as the ID of the status monitor created for the operation.

2. The service validates the request and initiates the operation processing.
@@ -260,8 +339,8 @@ Otherwise the service responds with a `202-Accepted` HTTP status code.
The response body is the status monitor for the operation including the ID, either from the request header or generated by the service.
When returning a status monitor whose status is not in a terminal state, the response must also include a `Retry-After` header indicating the minimum number of seconds the client should wait
before polling (GETing) the status monitor URL again for an update.
For backward compatibility, the response must also include an `Operation-Location` header containing the absolute URL
of the status monitor resource, including an api-version query parameter.

3. After waiting at least the amount of time specified by the previous response's `Retry-After` header,
the client issues a GET request to the status monitor using the ID in the body of the initial response.
@@ -274,14 +353,11 @@ If the operation is still being processed, the status field will contain a "non-

5. After the operation processing completes, a GET request to the status monitor returns the status monitor with a status field set to a terminal value -- `Succeeded`, `Failed`, or `Canceled` -- that indicates the result of the operation.
If the status is `Failed`, the status monitor resource contains an `error` field with a `code` and `message` that describes the failure.

6. In some cases, a long-running DELETE operation may complete before the service responds to the initial request.
In these cases, the operation should still return a `202 Accepted` with the `status` property set to the appropriate terminal state.

7. The service is responsible for purging the status monitor resource.
It should automatically purge the status monitor resource no sooner than 24 hours after the operation completes.
The service may offer DELETE of the status monitor resource due to GDPR/privacy.
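Steps 3 through 6 describe the client's polling loop. Below is a minimal Python sketch of that loop. It assumes the caller supplies a hypothetical `poll` function that GETs the status monitor by ID and returns its headers and body. It also assumes `Retry-After` carries a number of seconds, as described above.

```python
import time
from typing import Callable, Dict, Tuple

TERMINAL_STATES = {"Succeeded", "Failed", "Canceled"}

# (response headers, status monitor body) as seen by the client
Response = Tuple[Dict[str, str], dict]


def retry_after_seconds(headers: Dict[str, str], default: float) -> float:
    """Return the Retry-After delay in seconds, matching the header name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            return float(value)  # assumes the delta-seconds form, not an HTTP date
    return default


def wait_for_operation(initial: Response,
                       poll: Callable[[str], Response],
                       sleep: Callable[[float], None] = time.sleep,
                       default_delay: float = 1.0) -> dict:
    """Poll a status monitor until it reaches a terminal status and return it."""
    headers, monitor = initial
    # The initial 202 response may already be terminal (step 6), so check before polling.
    while monitor["status"] not in TERMINAL_STATES:
        sleep(retry_after_seconds(headers, default_delay))
        # `poll` is a caller-supplied function that GETs the status monitor by ID.
        headers, monitor = poll(monitor["id"])
    return monitor
```

When the loop returns `Failed`, a client would read `monitor["error"]["code"]` and `monitor["error"]["message"]`. Passing `sleep` in as a parameter keeps the loop testable without real delays.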

@@ -291,6 +367,9 @@ An action operation that is also long-running combines the [Action Operations](#
with the [Long Running Operations](#long-running-operations) pattern.

The operation is initiated with a POST operation and the operation path ends in `:<action>`.
A long-running POST should not be used to create a resource: use PUT as described above.
PATCH must never be used for long-running operations: it should be reserved for simple resource updates.
If a long-running update is required, it should be implemented with POST.

```text
POST /<service-or-resource-url>:<action>?api-version=2022-05-01
@@ -302,7 +381,7 @@ Operation-Id: 22
}
```

A long-running action operation returns a `202 Accepted` response with the status monitor in the response body.

```text
HTTP/1.1 202 Accepted
@@ -332,82 +411,95 @@ HTTP/1.1 200 OK
}
```

This diagram illustrates how a long-running action operation is initiated and then how the client
determines it has completed and obtains its results:

```mermaid
sequenceDiagram
participant Client
participant API Endpoint
participant Status Monitor
Client->>API Endpoint: POST
API Endpoint->>Client: HTTP/1.1 202 Accepted<br/>{ "id": "22", "status": "NotStarted" }
Client->>Status Monitor: GET
Status Monitor->>Client: HTTP/1.1 200 OK<br/>Retry-After: 5<br/>{ "id": "22", "status": "Running" }
Client->>Status Monitor: GET
Status Monitor->>Client: HTTP/1.1 200 OK<br/>{ "id": "22", "status": "Succeeded", "result": { ... } }
```

1. The client sends the request to initiate the long-running action operation.
The request may contain an `Operation-Id` header that the service uses as the ID of the status monitor created for the operation.

2. The service validates the request and initiates the operation processing.
If there are any problems with the request, the service responds with a `4xx` status code and error response body.
Otherwise, the service responds with a `202 Accepted` HTTP status code.
The response body is the status monitor for the operation including the ID, either from the request header or generated by the service.
When returning a status monitor whose status is not in a terminal state, the response must also include a `Retry-After` header indicating the minimum number of seconds the client should wait
before polling (GETing) the status monitor URL again for an update.
For backward compatibility, the response may also include an `Operation-Location` header containing the absolute URL
of the status monitor resource, including an api-version query parameter.

3. After waiting at least the amount of time specified by the previous response's `Retry-After` header,
the client issues a GET request to the status monitor using the ID in the body of the initial response.
The GET operation for the status monitor is documented in the REST API definition and the ID
is the last URL path segment.

4. The status monitor responds with information about the operation including its current status,
which should be represented as one of a fixed set of string values in a field named `status`.
If the operation is still being processed, the status field will contain a "non-terminal" value, like `NotStarted` or `Running`.

5. After the operation processing completes, a GET request to the status monitor returns the status monitor with a status field set to a terminal value -- `Succeeded`, `Failed`, or `Canceled` -- that indicates the result of the operation.
If the status is `Failed`, the status monitor resource contains an `error` field with a `code` and `message` that describes the failure.
If the status is `Succeeded`, the operation results (if any) are returned in the `result` field of the status monitor.
5. After the operation processing completes, a GET request to the status monitor returns the status monitor with a status field set to a terminal value -- `Succeeded`, `Failed`, or `Canceled` -- that indicates the result of the operation.
If the status is `Failed`, the status monitor resource contains an `error` field with a `code` and `message` that describes the failure.
If the status is `Succeeded`, the operation results (if any) are returned in the `result` field of the status monitor.

6. There may be some cases where a long-running action operation can be completed before the response to the initial request.
In these cases, the operation should still return a `202 Accepted` with the `status` property set to the appropriate terminal state.

7. The service is responsible for purging the status monitor resource.
It should automatically purge the status monitor resource no sooner than 24 hours after the operation completes.
The service may offer DELETE of the status monitor resource due to GDPR/privacy.
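On the service side, a status monitor for an action LRO might be modeled as below. A terminal `Succeeded` status carries any `result`, and a `Failed` status carries an `error` with a `code` and `message`. A completed monitor becomes eligible for auto-purge only after 24 hours. The `StatusMonitor` class and its methods are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Optional

TERMINAL_STATES = {"Succeeded", "Failed", "Canceled"}
MIN_RETENTION = timedelta(hours=24)  # keep completed monitors at least this long


@dataclass
class StatusMonitor:
    """Hypothetical server-side model of a status monitor for an action LRO."""
    id: str
    status: str = "NotStarted"
    result: Any = None
    error: Optional[dict] = None
    completed_at: Optional[datetime] = None

    def succeed(self, result: Any, now: datetime) -> None:
        self.status, self.result, self.completed_at = "Succeeded", result, now

    def fail(self, code: str, message: str, now: datetime) -> None:
        self.status, self.completed_at = "Failed", now
        self.error = {"code": code, "message": message}

    def to_json(self) -> dict:
        """Representation returned by GET on the status monitor."""
        body: dict = {"id": self.id, "status": self.status}
        if self.status == "Succeeded" and self.result is not None:
            body["result"] = self.result  # operation results, if any
        if self.status == "Failed":
            body["error"] = self.error
        return body

    def can_purge(self, now: datetime) -> bool:
        """True once the monitor is terminal and at least 24 hours have passed."""
        return (self.status in TERMINAL_STATES
                and self.completed_at is not None
                and now - self.completed_at >= MIN_RETENTION)
```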

### Long-running action operation not related to a resource

When a long-running action operation is not related to a specific resource (a batch operation is one example),
another approach is needed.

This type of LRO should be initiated with a PUT method on a URL that represents the operation to be performed,
and includes a final path parameter for the user-specified operation ID.
The response of the PUT includes a response body containing a representation of the status monitor for the operation
and an `Operation-Location` response header that contains the absolute URL of the status monitor.
In this type of LRO, the status monitor should include any information from the request used to initiate the operation,
so that a failed operation could be reissued if necessary.

Clients will use a GET on the status monitor URL to obtain the status and results of the operation.
Because the HTTP semantics of PUT are to create a resource, the same schema should be used for the PUT request body,
the PUT response body, and the response body of the GET for the status monitor for the operation.
For this type of LRO, the status monitor URL should be the same URL as the PUT operation.

The following examples illustrate this pattern.

```text
PUT /translate-operations/<operation-id>?api-version=2022-05-01

<JSON body with parameters for the operation>
```

Note that the client specifies the operation id in the URL path.

A successful response to the PUT operation should have a `201 Created` status and response body
that contains a representation of the status monitor _and_ any information from the request used to initiate the operation.

The service is responsible for purging the status monitor after some period of time,
but no earlier than 24 hours after the completion of the operation.
The service may offer DELETE of the status monitor resource due to GDPR/privacy.
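Below is a minimal sketch of the service side of this pattern, assuming an in-memory store and a hypothetical `base_url`. The PUT response echoes the request parameters in the status monitor, `Operation-Location` is the same URL as the PUT, and a GET returns the same schema. The handler names and any request fields are hypothetical. Rejecting a reused operation ID with `409 Conflict` is an assumption that the guidance above does not state.

```python
from typing import Dict, Tuple

API_VERSION = "2022-05-01"
Response = Tuple[int, Dict[str, str], dict]


def put_operation(store: Dict[str, dict], base_url: str,
                  operation_id: str, parameters: dict) -> Response:
    """Hypothetical handler for PUT /translate-operations/<operation-id>."""
    url = f"{base_url}/translate-operations/{operation_id}?api-version={API_VERSION}"
    if operation_id in store:
        # Assumption: a reused operation ID is rejected (not specified by the guidance).
        return 409, {}, {"error": {"code": "Conflict",
                                   "message": f"Operation {operation_id} already exists"}}
    # The status monitor echoes the request parameters so a failed operation can be reissued.
    monitor = {**parameters, "id": operation_id, "status": "NotStarted"}
    store[operation_id] = monitor
    # The status monitor lives at the same URL that the client used for the PUT.
    return 201, {"Operation-Location": url}, monitor


def get_operation(store: Dict[str, dict], operation_id: str) -> Tuple[int, dict]:
    """GET on the operation URL returns the same schema as the PUT response body."""
    return 200, store[operation_id]
```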

### Controlling a long-running operation

It might be necessary to support some control action on a long-running operation, such as cancel.
This is implemented as a POST on the status monitor endpoint with `:<action>` added.

```text
POST /<status-monitor-endpoint>:cancel?api-version=2022-05-01
```

A successful response to a control operation should be a `200 OK` with a representation of the status monitor.
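A cancel control operation might be handled as sketched below. The sketch makes three assumptions: cancellation takes effect immediately, a monitor that is already terminal is returned unchanged, and any other action is rejected with `400`. The guidance above specifies only the `200 OK` success response. `handle_control` is hypothetical.

```python
from typing import Tuple

TERMINAL_STATES = {"Succeeded", "Failed", "Canceled"}


def handle_control(monitor: dict, action: str) -> Tuple[int, dict]:
    """Hypothetical handler for POST <status-monitor-endpoint>:<action>."""
    if action != "cancel":
        # Assumption: unsupported control actions are rejected with 400 Bad Request.
        return 400, {"error": {"code": "UnsupportedAction",
                               "message": f"Unsupported action '{action}'"}}
    if monitor["status"] not in TERMINAL_STATES:
        # Assumption: cancellation takes effect immediately. A real service may first
        # need to stop in-flight work before reporting the Canceled status.
        monitor["status"] = "Canceled"
    # A successful control operation returns 200 OK with the status monitor.
    return 200, monitor
```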