Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: first API draft and generation #315

Merged
merged 21 commits into from
Nov 20, 2020
Merged

Conversation

robinbraemer
Copy link
Contributor

@robinbraemer robinbraemer commented Nov 14, 2020

Related issue

#311

Please consider first merging #313.

Proposed changes

The designs of the APIs are in active discussion with @zepatrik, where we decided that I am the "assignee" who takes more role in maintenance of the APIs because of my former knowledge working with APIs using ProtoBuf / gRPC / Client generation.

@zepatrik and I settled down that the two first major parts are API definitions and database schema that affect development of the ACL system, which makes this PR to get merged ASAP in order to proceed.

After this PR got merged I'll develop on the gapic client generation which users will be using as well as ACL nodes internally for their intercommunications as well as replace the current protos so @zepatrik and I can proceed developing using the new protos and the gapic generated clients.

Access control for Keto client requests itself

As per the discussion with @aeneasr and @zepatrik from the meeting, API endpoints of Keto are only authenticated using mTLS and/or authorized through an identity & access proxy (like oathkeeper).

Therefore I decided to split each "access concern" by the following gRCP services, so that administrators can also configure access by using path matching with mesh proxies like in Istio/Linkerd/...

Services http paths:

  • /keto.acl.v1.WriteService/<method>
  • /keto.acl.v1.ReadService/<method>
  • /keto.acl.v1.CheckService/<method>
  • /keto.acl.v1.WatchService/<method>
  • /keto.acl.admin.v1.AdminService/<method>

Example network policies:

  • only allow traffic from very privileged clients to operate on the AdminService service.
  • only allow traffic from specific clients to access ReadService & CheckService service.
  • only allow traffic from specific clients to access WatchService & AdminService service.
  • etc.

This design does not only allow broad access control by service, but it is also possible to further restrict method access as normal:

  • only allow traffic from specific clients to access /keto.acl.v1.ReadService/ListRelationTuples service method.
  • only allow traffic from specific clients to access /keto.acl.v1.WriteService/DeleteRelationTuples service method.
  • only allow traffic from specific clients to access /keto.acl.v1.WatchService/Watch & /keto.acl.v1.WriteService/WriteRelationTuples service methods.
  • etc.

Broadly allowing to match the service by the path prefix (e.g. /keto.acl.v1.WriteService/*) is more scalable than possible if we would have a mono service (e.g. AclService) that would have all read/write/watch/... RPCs in it.

This is because a mono service does not separate concern and would not support administrators to broadly allow clients in a network to a single one access concern. They do also not need to know of every single RPC in advance nor update their network policies in case we add additional RPCs to a service concern.

Opting for a mono service with all RPCs would also open a high security vulnerability because administrators do not know which RPCs we might add in the future.

Design details / practices

The API takes several other well known API design practices as specified in AIP (API Improvement Proposals).

Implementation note

Note, that implementing mechanisms such as filtering, ordering or zookie do not need to get implemented right away, but should be the goal to be in in the Keto v1.0.0 release, where the v1 APIs gets marked as fixed and stable!

Exposing nothing by default to keep our package API slim and later decide what to expose via `pkg`.
@robinbraemer robinbraemer added blocking Blocks milestones or other issues or pulls. rfc A request for comments to discuss and share ideas. labels Nov 14, 2020
@robinbraemer robinbraemer added this to the Next Gen Keto milestone Nov 14, 2020
@robinbraemer robinbraemer marked this pull request as draft November 14, 2020 13:51
@robinbraemer robinbraemer marked this pull request as ready for review November 14, 2020 15:02
@robinbraemer robinbraemer marked this pull request as draft November 14, 2020 15:05
@robinbraemer robinbraemer marked this pull request as ready for review November 14, 2020 15:34
@robinbraemer robinbraemer marked this pull request as draft November 15, 2020 14:14
@robinbraemer robinbraemer self-assigned this Nov 15, 2020
@robinbraemer robinbraemer marked this pull request as ready for review November 16, 2020 19:10
@robinbraemer
Copy link
Contributor Author

robinbraemer commented Nov 16, 2020

I think the first API designs are ready now, so that we can discuss/merge this PR and concentrate on other details, like Watch and Node internal API later and I can keep going! ^^ @zepatrik

@robinbraemer
Copy link
Contributor Author

robinbraemer commented Nov 17, 2020

@zepatrik are the API definitions okay so far? Do ask me anything thats unclear to you. I would also jump on a call.

@zepatrik
Copy link
Member

I'll review it in ~1h

@robinbraemer
Copy link
Contributor Author

Take your time, no pressure 🍰 😄

Copy link
Member

@zepatrik zepatrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I definitely need to learn a bit more about protobuf 😅

Makefile Outdated
# Generate after linting succeeded
#
.PHONY: buf
buf: buf-lint buf-gen
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please always add the newlines in the end of files 😉

@@ -0,0 +1,99 @@
// Copyright 2020 Google LLC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this file? I could not find any documentation in GAPIC that would explain the need for this. Also, the Copyright Google is probably a copy/paste error?
If this is required for gapic, I'd prefer adding it together with gapic client generation in one PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll remove and put it back for gapic PR then.

# Notes

> ORY Keto is still a `sandbox` project.
This makes the included api version `v1` subject
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maybe make this v0 instead? The first alpha and beta releases of keto will also be v0.x.x

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I wasn't sure whether we release next gen keto with a stable v1 directly, but this is not how it goes, since users will be using the service very early I guess.

In protobuf we can use v1alpha1, the first alpha of v1, then v1beta1 and then eventually v1 for stable definitions.

If we go that path we will always need to write version mappings of the messages versions on server side (map v1alpha1 -> v1beta1 -> v1), deprecate old versions and eventually remove the support at some point.

Sure, we can do this!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be possible to go with v0 that can have breaking changes and then move on to v1 once we release Keto v1.x.x? Otherwise, v1alpha and v1beta are good 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove support for old versions immediately and guarantee compatibility only at a point where we are confident that not much will change.

Copy link
Contributor Author

@robinbraemer robinbraemer Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on how long we have the alpha/beta versions public and how many clients are using an older version before we should remove them. Generally the goal is to reach v1 as soon as possible.

This is how versioning is usually seen for ProtoBuf APIs:

  • v1alpha - Highly subject to breaking changes, very early stage
  • v1beta - Pretty reliable, may change. Should have a long evaluation time before going v1
  • v1 - Stable, API (RPCs, fields, messages) can only be added without breaking existing functionality
  • ...perhaps a new v2... cycle when needing to break v1, still v1 has loooong time support

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that's what I think as well. So we just start with v1alpha and move on when we think it is appropriate.


// The admin service for doing administrative tasks in the ACL system.
service AdminService {
option (google.api.default_host) = "keto.exampleapis.com";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would everyone have to adapt this to their setup? What is the benefit?

Copy link
Contributor Author

@robinbraemer robinbraemer Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is the default service host that GAPIC requires to be set for every service. GAPIC generated clients have this host as default for their respective service, but our users would have to pass in their correct API endpoint they deployed and pass it to client. Google uses it to "hard code" their services hostname directly to their clients.

This is not too problematic, I think.
The only error a user would see if he creates a new client and forgets to pass in his correct host would be something like a "hostname keto.exampleapis.com not found" / DNS lookup failed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so we add this with GAPIC in a separate PR right? Also, I would prefer to then be something like

option (google.api.default_host) = "you.have.to.set.hostname.to.your.instance";

😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right!

Comment on lines 7 to 12
option go_package = "github.com/ory/keto/api/keto/acl/admin/v1;admin";
option csharp_namespace = "Ory.Keto.Acl.Admin.V1";
option java_multiple_files = true;
option java_outer_classname = "AdminServiceProto";
option java_package = "sh.ory.keto.acl.admin.v1";
option php_namespace = "Ory\\Keto\\Acl\\Admin\\V1";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these options following some best practice?

Copy link
Contributor Author

@robinbraemer robinbraemer Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, everything does! :)
As a sample reference.

In case, do you want to use another java_package domain for ORY? I guess not since ORY uses sh.ory.<project>... as java path (e.g. as here).
The API path will perfectly go hand in hand in case we want to make a Keto java client where we will have the generated proto code under package sh.ory.keto.acl....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// The relation this check.
string relation = 2;
// The concrete subject id to check.
string subject_id = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of a subject set, would this mean that it has to be encoded? Wouldn't it be better if this was of type Subject? I am not sure what the use case would be but it is really just a definition thing, so we can allow it without any drawbacks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sure!

I'll put that in, so one can Check on a SubjectSet (without expantion I guess).

What I just thought is that it might confuse API users if they see they can do a check on a SubjectSet.

  • I mean one could understand that the semantic is that the server expands the SubjectSet to subject ids, therefor making a Check request on multiple subject ids. Like WHAT?! :D

Copy link
Contributor Author

@robinbraemer robinbraemer Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW do we also want to add an skip_expand bool to the CheckRequest.

  • skip_expand defaults to false so expand checks are done by default (probably what the user expects?)
  • setting skip_expand=true skips SubjectSet expanding and any rewrite rules?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it the same if you expand a SubjectSet or not?
And isn't skip_expand the same as using read to see whether this exact tuple is stored?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I mean the expand API (2.4.5 Expand in paper), whether if we simply add a skip_expand field to the CheckRequest that the client can explicitly say whether SubjectSets and Rewrite Rules apply for a check request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would not be a breaking change so let's add it later. Tracked as #323

// *It is recommended to perform checks using slightly stale
// data (e.g. token older than 3-10 seconds) for minimum latency
// and where the application is allowed to accept slightly off checks.
bytes snaptoken = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be better to add them once we have them in the actual code? As they are not strictly required, we can first implement everything without snaptokens/zookies/... and add them in the next step? Do you think it should be in the API definition already anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I do think having them in the API right from the beginning is better.
Users need to learn and adopt the tokens beforehand so they don't need to rewrite their code logics too much.

For now they do not have any special effect since the server always uses the newest tuples.
We could make the server always return a coming_soon snaptoken for now. Not an empty one!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good idea

// result of this check.
//
// Leave this field blank if...
// - your application strictly requires to act on up-to-date data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the read-api:

If the request doesn’t contain a zookie, Zanzibar will choose a reasonably recent snapshot, possibly offering a lower latency response than if a zookie were provided.

Do you think the same applies to check or is a check request without a zookie automatically a "content-change" check that returns the new versions zookie? From my understanding it would be possible to have a check without a zookie that then is evaluated at a "reasonably recent snapshot", but that returns no new zookie. Maybe my understanding is wrong here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I'm about to redesign the CheckRequest a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// The service to watch for changes in the system,
// such as for Access Control Lists and namespace configs.
service WatchService {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The watch service will require a lot of background setup first, lets add the API definition together with that functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove it.


// Write-delta for a WriteRelationTuplesRequest.
message RelationTupleWriteDelta {
enum Action {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you add this action instead of having different rpcs? Is there a best practice regarding this defined somewhere?

Copy link
Contributor Author

@robinbraemer robinbraemer Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple reasons for this design:

  • There would be quite a few and similar looking RPCs to cover each action.
  • We can allow clients to do a transaction on multiple tuples in a single RPC call
    • This is very convenient for users in case the user wants all actions to succeed or none!
    • ...while also supporting behaviours like delete tuple A and only insert tuple B if it does not already exists, otherwise I want complete transaction to fail and give me an error
      • would be pretty easy to implement with SQL (I've done that :))

Here are some similar refs:

We can always add RPC later if required.

Copy link
Contributor Author

@robinbraemer robinbraemer Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or any other example where 2 or more modifications must succeed.

Imagine this (and I had this scenario learned the hard way):

  • Some system inserts 3 tuples in response to an event (e.g. an order, user registration, ...)
  • The system continuously fails at the 3rd insert rpc.
  • Now the system would need to rollback somehow and delete the first 2 inserted tuples for clean up.
  • ...this is just a mess, could be solved client side, but only with much overhead to get right ...You get what I mean :)

In these often cases, transactions should be offered server side.
And if the client does not need a transaction he will just make multiple RPCs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 agree, would really be awesome to have transactional requests here

Comment on lines 33 to 60
enum Action {
// Unspecified.
// The `WriteRelationTuples` rpc ignores this
// RelationTupleWriteDelta if an action was unspecified.
ACTION_UNSPECIFIED = 0;

// Like INSERT with the exception that if the RelationTuple
// already exists performs an UPDATE instead.
UPSERT = 1;

// Insertion of a new RelationTuple.
//
// The `WriteRelationTuples` rpc errors if the
// specified RelationTuple already exists.
INSERT = 2;

// Update of the existing RelationTuple with
// the intend to refresh its snapshot token.
//
// The `WriteRelationTuples` rpc errors if the
// specified RelationTuple was not found.
UPDATE = 3;

// Deletion of the RelationTuple.
// The `WriteRelationTuples` rpc returns NO error
// if the specified RelationTuple was not found.
DELETE = 4;
}
Copy link
Member

@zepatrik zepatrik Nov 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only makes sense if we can identify relation tuples somehow, but I don't think it is really possible. An update essentially means

  1. delete the old tuple
  2. insert the new one

We can add that as a separate atomic operation RPC later on, but for now clients can just do these steps themselves. As the parameters will therefore be different for the action types, we should rather separate this into individual RPCs. What do you think?

Copy link
Contributor Author

@robinbraemer robinbraemer Nov 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, e.g. the UPDATE action makes little to no sense with the Tuple model.

I'd like to have a single RPC still for transaction support upfront and that we have less Tuple mutation RPCs as possible.

How about having these actions:

  • INSERT - not errors tx if already exists
  • DELETE - not errors tx if not found

Later we can add actions like:

  • INSERT_NEW - errors tx if already exists
  • DELETE_FOUND - errors tx if not found

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a real update that allows deletion and insertion atomically in one call does make sense but is not a hard requirement, so we can add it later. The actions you proposed are perfect and can all be used in one RPC with this action enum.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracked as #328
Should we maybe remove the etags and solve everything consistently? Would be a better approach IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@robinbraemer
Copy link
Contributor Author

The snaptoken now is of type string to ease logging/debugging/console output.
The server can use hex string<->bytes mapping.

@robinbraemer
Copy link
Contributor Author

@zepatrik if I did not miss anything, we can squash and merge this baby. :)

@zepatrik
Copy link
Member

Nice 🎉

@zepatrik zepatrik merged commit bda5d8b into ory:zanzibar Nov 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocking Blocks milestones or other issues or pulls. rfc A request for comments to discuss and share ideas.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants