Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: API review checklist #14399

Merged
merged 6 commits into from
Jan 21, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -233,6 +233,13 @@ and false.
* If a PR includes a deprecation/breaking change, notification should be sent to the
[envoy-announce](https://groups.google.com/forum/#!forum/envoy-announce) email list.

# API changes

If you change anything in the [api tree](https://github.com/envoyproxy/envoy/tree/master/api),
please read the [API Review
Checklist](https://github.com/envoyproxy/envoy/tree/master/api/review_checklist.md)
and make sure that your changes have addressed all of the considerations listed there.

# Adding new extensions

For developers adding a new extension, one can take an existing extension as the starting point.
Expand Down
8 changes: 8 additions & 0 deletions PULL_REQUESTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -111,3 +111,11 @@ PR description.
If you mark existing APIs or code as deprecated, when the next release is cut, the
deprecation script will create and assign an issue to you for
cleaning up the deprecated code path.

### <a name="api"></a>API Changes

If this PR changes anything in the [api tree](https://github.com/envoyproxy/envoy/tree/master/api),
please read the [API Review
Checklist](https://github.com/envoyproxy/envoy/tree/master/api/review_checklist.md)
and make sure that your changes have addressed all of the considerations listed there.
Any relevant considerations should be documented under "API Considerations" in the PR description.
1 change: 1 addition & 0 deletions PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,3 +22,4 @@ Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]
[Optional [API Considerations](https://github.com/envoyproxy/envoy/blob/master/api/review_checklist.md):]
6 changes: 6 additions & 0 deletions api/STYLE.md
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,12 @@ In addition, the following conventions should be followed:
pattern forces developers to explicitly choose the correct enum value for
their use case, and avoid misunderstanding of the default behavior.

* For time-related fields, prefer using the well-known types `google.protobuf.Duration` or
`google.protobuf.Timestamp` instead of raw integers for seconds.

* If a field is going to contain raw bytes rather than a human-readable string, the field should
be of type `bytes` instead of `string`.

* Proto fields should be sorted logically, not by field number.

## Package organization
Expand Down
154 changes: 154 additions & 0 deletions api/review_checklist.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,154 @@
# API Review Checklist
htuch marked this conversation as resolved.
Show resolved Hide resolved

This checklist is intended to be used when reviewing xDS API changes.
Users who wish to contribute API changes should read this and proactively
consider the answers to these questions before sending a PR.
alyssawilk marked this conversation as resolved.
Show resolved Hide resolved

## Feature Enablement
- Are the default values going to cause behavior changes for existing users
who do not know about the change and have not updated the resources being
served by their control plane?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think much of this overlaps in spirit with the GOVERNANCE runtime guard section. I know the API stuff is different since it affects more than just envoy, but it'd still be good to address the overlap, WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I think we should port the relevant parts of GOVERNANCE.md across to an API GOVERNANCE.md; it might be copy+paste.

We are working towards a multi-stakeholder governance for the APIs (under CNCF, see https://github.com/cncf/udpa/blob/master/README.md), so we should evolve the API docs with that in mind.

This might seem a minor thing, but I think this decoupling is critical to ensuring that other adopters of xDS are comfortable and don't feel locked into Envoy project governance. We will do this while retaining the agree upon foundations from GOVERNANCE.md.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a governance file in cncf/xds and just point at that from api/GOVERNANCE.md in this repo, to make it clear that it's a separate thing? Actually, for that matter, should we do the same thing with files like this one and api/STYLE.md as well?

In any case, I agree that we should document the governance of the API and get all of these docs properly aligned, but I think a lot of this is outside the scope of this PR. Can we follow up on that separately?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG, @markdroth can you file an issue on cncf/xds around this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- If yes, do we have some estimate of how many users will be affected?
- Why is it justified to change the default behavior, rather than making
this feature opt-in?
- Some possible justifications include security concerns with existing
behavior, or a desire to eliminate legacy behavior.
- What is the plan to make this change in a safe way? For example, is the
transition going to be staged over the course of several minor xDS versions?
- How will we warn users about this change?
- Possible ways to do this include release notes, announcements, warnings
from the code, etc.
- Will users have a way to disable this change if it causes problems?
- If not, why do we think that's okay? (It might be the case that we think
it will not actually affect anyone, or no one will care.)
- If so, is the mechanism to disable it part of the xDS API, or is it
acceptable to have a separate knob for this in the client? (See also
"Genericness" below -- if this is not part of the API, will every xDS
client need to add a different knob? Is consistency across clients
important for this?)

## Style
- Is the PR aligned with the [API style guidelines](STYLE.md)?

## Validation Rules
- Does the field have protocgen-validate rules to enforce constraints on
its value?
- Is the field mandatory? Does it have the required rule?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can enforce any of these with fix_format / proto scripts. If we had a required required/not_required annotation, or required "unbounded" annotations, so folks had to explicitly annotate, it'd be annoying but might catch more.
I'd file this under "food for thought" where you and the other shephards can decide if it's worth it after seeing what slips through review over time :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let @htuch think about this one.

Copy link
Member

@htuch htuch Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be done. I think if I had my wishlist of tooling, we would have something that during code review annotates the PR when a new field is introduced and is missing constraints to verify.

@markdroth can you be more explicit and say something like "Is the field mandatory? If so, are protocgen-validate constraints defined to enforce this?".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a separate bullet about having PGV rules, since that basically applies to all of the bullets in this section.

- Is the field numeric? Is it bounded correctly?
- Is the field repeated? Does it have the correct min/max items numbers? Are
the values unique?
- If a field may eventually be repeated but initially there is a desire to
cap it to a single value, consider using repeated with a max length of 1,
which is easier to relax in the future.

## Deprecations
- When a field or enum value is deprecated, according to the minor/patch
versioning policy this implies a minor version for support removal. Is the
work necessary to add support for the newer replacement field acceptable to
known xDS clients in this time frame?
- No deprecations are allowed when an alternative is "not ready yet" in any
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't we traditionally added a new field and deprecated old ones in the same PR? I guess better project party is part of pulling the APIs out here. I think it'd be helpful if before we codify this rule, we have some process around adding a field, making sure the migrations are done in both projects, being able to determine when they're done and having issues filed to do the deprecation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can play this by ear. Right now, there are only 2 clients, and the API shepherds all talk to each other, so it's fairly easy to coordinate this without much red tape. Over time, if this becomes more of a problem, we can add more process as appropriate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we redact this for now or add an "or get sign off from each project stakeholder for immediate deprecation"? Given n=2 I think that's less onerous and as we add more stakeholders I think we'd have to move towards the deprecation dance anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a sentence indicating to ask the API shepherds if you're not sure about the status of a feature in the major known clients.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm on board with playing this by ear, but the documentation this still disallows changes like I mentioned. I want us to be explicit here. I don't want to pedantic, but I really want to avoid having codified rules we encourage people in the know to ignore. If we're Ok playing it by ear, let's remove the restriction that you can't deprecate a field until the replacement is ready in all consuming stacks. If we're not OK doing this with shephard approval, we should make it clear it is now disallowed to add A and deprecate B in the same PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay, sorry -- I somehow didn't grok that it was the ability to explicitly sign off on exceptions that you wanted to call out. I've added a clause about that.

major known xDS client (Envoy or gRPC at this point), unless the
maintainers of that xDS client have signed off on the change. If you are not
sure about the current state of a feature in the major known xDS clients,
ask one of the API shepherds.
- Is this deprecated field documented in a way that points to the preferred
newer method of configuration?

## Extensibility
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we mention TypedExtensionConfig somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

- Is this feature being directly added into the API itself, or is it being
introduced via an extension point (i.e., using `TypedExtensionConfig`)?
- If not via an extension point, why not?
- If no appropriate extension point currently exists, is this a good
opportunity to add one? Can we move some existing "core" functionality
into a set of standardized plugins for an extension point?
- Do we have good documentation showing what to plug into the extension point?
(At minimum, it should have a comment pointing to the common protos to
be plugged into the extension point.)
- If an enum is being introduced, should this be a oneof with empty messages
for future API growth?
- When a new field is introduced for a distinct purpose, should this be a
message to allow for future API growth?

## Consistency
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that I'd like to do a better job of is placing protos that might eventually become shared in shared locations to begin with. We've seen historically situations where we have had to copy+paste protos to common directories or refer to protos in strange locations (e.g. some rando filter referring to the access log matcher in the access log tree). This advice also includes thinking carefully about whether some sub-messages might be one day useful elsewhere.

It's hard to be super prescriptive, but good to be thoughtful about. Basically, without major versioning as a device, refactors are messy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a bullet about this.

- Can the proposed API reuse part or all of other APIs?
- Can some other API be refactored to be part of it, or vice versa?
- Example: Can it use common types such as matcher or number?
- Are there similar structures that already exist?
- Is the naming convention consistent with other APIs?
- If there are new proto messages being added, are they in the right
place in the tree? Consider not just the current users of this proto
but also other possible uses in the future. Would it make more sense
to make the proto a generic type and put it in a common location, so
that it can be used in other places in the future?

## Interactions With Other Features
- Will this feature interact in strange ways with other features, either
existing or planned?
- For example, if you are defining a new cluster type, how will the
new type implement all of the features currently configured via CDS?
- If this is a change in the upstream side of the API, will it work properly
with LRS load reporting?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another bullet point "If this change involves extension configuration, how will it interact with ECDS?" CC @kyessenov

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

- Will there be combinations of features that won't work properly? If so,
please document each combination that won't work and justify why this is
okay. Is there some other way to structure this feature that would not
cause conflicts with other features?
- If this change involves extension configuration, how will it interact
with ECDS?

## Genericness
- Is this an Envoy-specific or proxy-specific feature? How will it apply to
xDS clients other than Envoy (e.g., gRPC)?

## Dependencies
- Does this feature pull in any new dependencies for clients?
- Are these dependencies optional or required?
- Will the dependencies cause problems for any known xDS clients (e.g.,
[Envoy's dependency policy](https://github.com/envoyproxy/envoy/blob/master/DEPENDENCY_POLICY.md))?

## Failure Modes
- What is the failure mode if this feature is configured but is not working
for some reason?
- Is the failure mode what users will expect/want?
- Is this failure mode specified in the API, or is each client expected to
handle it on its own? Consistency across clients is preferred; if there's
a reason this isn't feasible, please explain.

## Scalability
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think config size scalability is also a huge deal. We should make sure we're careful not to add things that might blow out config size, e.g. some huge proto that is mandatory and attached to every single route entry, or things that are expensive during config ingestion, such as JSON round-trips.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a couple of bullets about this.

- Does this feature add new per-request functionality? How much overhead does
it add on the per-request path?
- Are there ways that the API could be structured differently to make it
possible to implement the feature in more efficient ways? (Even if this
efficiency is not needed now, it may be something we will need in the future,
and we will save pain in the long run if we structure the API in a way that
gives us the flexibility to change the implementation later.)
- How does this feature affect config size? For example, instead of
adding a huge mandatory proto to every single route entry, consider
ways of setting it at the virtual host level and then overriding only
the parts that change on a per-route basis.
- Will the change require multiple round trips via the REST API?

## Monitoring
- Is there any behavior associated with this feature that will require
monitoring?
- How will the data be exposed to monitoring?
- Is the monitoring configuration part of the xDS API, or is it client-specific?

## Documentation
- Can a user look at the docs and understand it without a bunch of extra
context?
- Pay special attention to documentation around extensions and `typed_config`.
Users generally find this extremely confusing. There should be examples
showing how to configure extension points and optimally all known public
extensions (there is tooling work in progress to automate this).
- Larger features should contain architecture overview documentation with
relevant cross-linking.
- Relevant differences between clients need to be documented (in the future
we will build tooling to allow for common documentation as well as per-client
documentation).

## Where to Put New Protos
- The xDS API is currently partly in the Envoy repo and partly in the
cncf/xds repo. We will move pieces of the API to the latter repo
slowly over time, as they become less Envoy-specific, until eventually
the whole API has been moved there. If your change involves adding
new protos, they should generally go in the new repo.