feat: first API draft and generation #315

robinbraemer · 2020-11-14T13:05:31Z

Related issue

Please consider first merging #313.

Proposed changes

The API designs are under active discussion with @zepatrik. We agreed that I am the "assignee" who takes the larger role in maintaining the APIs, because of my prior experience with ProtoBuf, gRPC, and client generation.

@zepatrik and I agreed that the first two major parts affecting development of the ACL system are the API definitions and the database schema, so this PR should be merged ASAP in order to proceed.

After this PR is merged I'll work on the GAPIC client generation. The generated clients will be used by users as well as by ACL nodes internally for their intercommunication. I'll also replace the current protos so that @zepatrik and I can continue developing with the new protos and the GAPIC-generated clients.

Access control for Keto client requests itself

As per the discussion with @aeneasr and @zepatrik from the meeting, API endpoints of Keto are only authenticated using mTLS and/or authorized through an identity & access proxy (like oathkeeper).

Therefore I decided to split each "access concern" into its own gRPC service, listed below, so that administrators can also configure access using path matching in mesh proxies such as Istio or Linkerd.

Service HTTP paths:

/keto.acl.v1.WriteService/<method>
/keto.acl.v1.ReadService/<method>
/keto.acl.v1.CheckService/<method>
/keto.acl.v1.WatchService/<method>
/keto.acl.admin.v1.AdminService/<method>

Example network policies:

only allow traffic from very privileged clients to operate on the AdminService service.
only allow traffic from specific clients to access ReadService & CheckService service.
only allow traffic from specific clients to access WatchService & AdminService service.
etc.

This design not only allows broad access control by service; access can also be restricted further per method, as usual:

only allow traffic from specific clients to access /keto.acl.v1.ReadService/ListRelationTuples service method.
only allow traffic from specific clients to access /keto.acl.v1.WriteService/DeleteRelationTuples service method.
only allow traffic from specific clients to access /keto.acl.v1.WatchService/Watch & /keto.acl.v1.WriteService/WriteRelationTuples service methods.
etc.

Matching a whole service by its path prefix (e.g. /keto.acl.v1.WriteService/*) scales better than would be possible with a mono service (e.g. AclService) that contains all read/write/watch/... RPCs.

This is because a mono service does not separate concerns and would not let administrators broadly grant clients in a network access to a single access concern. With separate services, administrators also do not need to know every single RPC in advance, nor update their network policies when we add RPCs to a service concern.

Opting for a mono service with all RPCs would also create a serious security risk, because administrators do not know which RPCs we might add in the future.
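To illustrate the argument above, here is a minimal sketch of how a proxy-style policy check could distinguish service-level rules (path prefix) from method-level rules (exact path). The `Rule` type, the `Allowed` function, and the client names are hypothetical and only for illustration; real mesh proxies such as Istio or Linkerd have their own policy formats.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is a hypothetical network-policy entry: a client identity plus
// either a service prefix (ending in "/") or one exact method path.
type Rule struct {
	Client string
	Path   string
}

// Allowed reports whether any rule grants client access to fullMethod,
// e.g. "/keto.acl.v1.ReadService/ListRelationTuples".
func Allowed(rules []Rule, client, fullMethod string) bool {
	for _, r := range rules {
		if r.Client != client {
			continue
		}
		if strings.HasSuffix(r.Path, "/") {
			// Service-level rule: matches every RPC of that service,
			// including RPCs added to the service later.
			if strings.HasPrefix(fullMethod, r.Path) {
				return true
			}
		} else if fullMethod == r.Path {
			// Method-level rule: matches exactly one RPC.
			return true
		}
	}
	return false
}

func main() {
	rules := []Rule{
		{Client: "writer", Path: "/keto.acl.v1.WriteService/"},
		{Client: "reader", Path: "/keto.acl.v1.ReadService/ListRelationTuples"},
	}
	fmt.Println(Allowed(rules, "writer", "/keto.acl.v1.WriteService/WriteRelationTuples"))
	fmt.Println(Allowed(rules, "reader", "/keto.acl.v1.WriteService/DeleteRelationTuples"))
}
```

With a mono service, the service-level rule would necessarily cover every read, write, and admin RPC at once, which is the risk described above.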

Design details / practices

The API adopts several well-known API design practices specified in the AIPs (API Improvement Proposals).

Deltas and States - for writing / modifying / deleting RelationTuples.
Resource freshness validation - for consistent read-modify-write request flows as specified in section 2.4.2 of the paper
Token/Cursor based Pagination - for List... RPC responses
Filtering & Ordering - for filtering and ordering List... RPC responses
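As a rough illustration of token-based pagination for the List... RPCs, the sketch below pages over an in-memory slice. Everything here is an assumption for illustration: the `ListPage` function, the default page size, and the token format (a base64-encoded offset) are not taken from this PR, and a real server would more likely encode a storage cursor in the token.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strconv"
)

// ListPage returns one page of items plus an opaque token for the next
// page. An empty next token means there are no more results.
func ListPage(items []string, pageSize int, pageToken string) ([]string, string, error) {
	start := 0
	if pageToken != "" {
		raw, err := base64.URLEncoding.DecodeString(pageToken)
		if err != nil {
			return nil, "", fmt.Errorf("invalid page token: %w", err)
		}
		start, err = strconv.Atoi(string(raw))
		if err != nil || start < 0 || start > len(items) {
			return nil, "", fmt.Errorf("invalid page token %q", pageToken)
		}
	}
	if pageSize <= 0 {
		pageSize = 50 // hypothetical server-side default
	}
	end := start + pageSize
	if end >= len(items) {
		return items[start:], "", nil
	}
	// The token is opaque to clients; here it simply hides the offset.
	next := base64.URLEncoding.EncodeToString([]byte(strconv.Itoa(end)))
	return items[start:end], next, nil
}

func main() {
	items := []string{"t1", "t2", "t3"}
	token := ""
	for {
		page, next, err := ListPage(items, 2, token)
		if err != nil {
			panic(err)
		}
		fmt.Println(page)
		if next == "" {
			break
		}
		token = next
	}
}
```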

Implementation note

Note that mechanisms such as filtering, ordering, or zookies do not need to be implemented right away, but the goal should be to have them in the Keto v1.0.0 release, where the v1 APIs get marked as fixed and stable!

Exposing nothing by default to keep our package API slim and later decide what to expose via `pkg`.

robinbraemer · 2020-11-16T19:12:52Z

I think the first API designs are ready now, so that we can discuss/merge this PR and concentrate on other details, like Watch and Node internal API later and I can keep going! ^^ @zepatrik

robinbraemer · 2020-11-17T11:57:47Z

@zepatrik are the API definitions okay so far? Do ask me about anything that's unclear to you. I would also be happy to jump on a call.

zepatrik · 2020-11-17T12:05:46Z

I'll review it in ~1h

robinbraemer · 2020-11-17T12:07:11Z

Take your time, no pressure 🍰 😄

zepatrik

Nice, I definitely need to learn a bit more about protobuf 😅

zepatrik · 2020-11-17T10:47:17Z

Makefile

+# Generate after linting succeeded
+#
+.PHONY: buf
+buf: buf-lint buf-gen


Please always add the newlines in the end of files 😉

zepatrik · 2020-11-17T13:31:23Z

api/google/api/client.proto

@@ -0,0 +1,99 @@
+// Copyright 2020 Google LLC


What is the purpose of this file? I could not find any documentation in GAPIC that would explain the need for this. Also, the Copyright Google is probably a copy/paste error?
If this is required for gapic, I'd prefer adding it together with gapic client generation in one PR.

Yes, I'll remove it and add it back in the GAPIC PR then.

zepatrik · 2020-11-17T13:33:14Z

api/keto/README.md

+# Notes
+
+> ORY Keto is still a `sandbox` project.
+This makes the included api version `v1` subject


Can we maybe make this v0 instead? The first alpha and beta releases of keto will also be v0.x.x

Okay, I wasn't sure whether we release next gen keto with a stable v1 directly, but this is not how it goes, since users will be using the service very early I guess.

In protobuf we can use v1alpha1, the first alpha of v1, then v1beta1 and then eventually v1 for stable definitions.

If we go down that path we will always need to write mappings between message versions on the server side (map v1alpha1 -> v1beta1 -> v1), deprecate old versions, and eventually remove support for them at some point.

Sure, we can do this!

Wouldn't it be possible to go with v0 that can have breaking changes and then move on to v1 once we release Keto v1.x.x? Otherwise, v1alpha and v1beta are good 👍

I would remove support for old versions immediately and guarantee compatibility only at a point where we are confident that not much will change.

It depends on how long we have the alpha/beta versions public and how many clients are using an older version before we should remove them. Generally the goal is to reach v1 as soon as possible.

This is how versioning is usually seen for ProtoBuf APIs:

v1alpha - Highly subject to breaking changes, very early stage

v1beta - Pretty reliable, may change. Should have a long evaluation time before going v1

v1 - Stable, API (RPCs, fields, messages) can only be added without breaking existing functionality

...perhaps a new v2... cycle when needing to break v1, still v1 has loooong time support

Right, that's what I think as well. So we just start with v1alpha and move on when we think it is appropriate.

zepatrik · 2020-11-17T13:57:57Z

api/keto/acl/admin/v1/admin_service.proto

+
+// The admin service for doing administrative tasks in the ACL system.
+service AdminService {
+  option (google.api.default_host) = "keto.exampleapis.com";


Would everyone have to adapt this to their setup? What is the benefit?

No, this is the default service host that GAPIC requires to be set for every service. GAPIC-generated clients use this host as the default for their respective service, but our users would have to pass the API endpoint of their own deployment to the client. Google uses it to "hard code" their services' hostnames directly into their clients.

This is not too problematic, I think.
The only error users would see if they create a new client and forget to pass in their correct host would be something like "hostname keto.exampleapis.com not found" / DNS lookup failed.

Right, so we add this with GAPIC in a separate PR, right? Also, I would then prefer it to be something like

option (google.api.default_host) = "you.have.to.set.hostname.to.your.instance";

😉

zepatrik · 2020-11-17T13:58:17Z

api/keto/acl/admin/v1/admin_service.proto

+option go_package = "github.com/ory/keto/api/keto/acl/admin/v1;admin";
+option csharp_namespace = "Ory.Keto.Acl.Admin.V1";
+option java_multiple_files = true;
+option java_outer_classname = "AdminServiceProto";
+option java_package = "sh.ory.keto.acl.admin.v1";
+option php_namespace = "Ory\\Keto\\Acl\\Admin\\V1";


Are these options following some best practice?

Yes, everything does! :)
As a sample reference.

In case, do you want to use another java_package domain for ORY? I guess not since ORY uses sh.ory.<project>... as java path (e.g. as here).
The API path will perfectly go hand in hand in case we want to make a Keto java client where we will have the generated proto code under package sh.ory.keto.acl....

Yes, you nailed it already 😉
https://github.com/ory/sdk/tree/master/clients/hydra/java#getting-started

zepatrik · 2020-11-17T14:06:16Z

api/keto/acl/v1/check_service.proto

+  // The relation this check.
+  string relation = 2;
+  // The concrete subject id to check.
+  string subject_id = 3;


In case of a subject set, would this mean that it has to be encoded? Wouldn't it be better if this was of type Subject? I am not sure what the use case would be but it is really just a definition thing, so we can allow it without any drawbacks.

Oh, sure!

I'll put that in, so one can Check on a SubjectSet (without expansion, I guess).

What I just thought is that it might confuse API users if they see that they can do a check on a SubjectSet.

I mean, one could understand the semantics to be that the server expands the SubjectSet to subject ids, therefore making a Check request on multiple subject ids. Like WHAT?! :D

BTW, do we also want to add a skip_expand bool to the CheckRequest?

skip_expand defaults to false, so expand checks are done by default (probably what the user expects?)

setting skip_expand=true skips SubjectSet expansion and any rewrite rules?

Isn't it the same if you expand a SubjectSet or not?
And isn't skip_expand the same as using read to see whether this exact tuple is stored?

Yes, I mean the expand API (2.4.5 Expand in the paper): the question is whether we simply add a skip_expand field to the CheckRequest so that the client can explicitly say whether SubjectSets and rewrite rules apply to a check request.

This would not be a breaking change so let's add it later. Tracked as #323

zepatrik · 2020-11-17T14:08:31Z

api/keto/acl/v1/check_service.proto

+  // *It is recommended to perform checks using slightly stale
+  // data (e.g. token older than 3-10 seconds) for minimum latency
+  // and where the application is allowed to accept slightly off checks.
+  bytes snaptoken = 4;


Maybe it would be better to add them once we have them in the actual code? As they are not strictly required, we can first implement everything without snaptokens/zookies/... and add them in the next step? Do you think it should be in the API definition already anyway?

Yes, I do think having them in the API right from the beginning is better.
Users need to learn and adopt the tokens beforehand so they don't need to rewrite their code logics too much.

For now they do not have any special effect since the server always uses the newest tuples.
We could make the server always return a coming_soon snaptoken for now. Not an empty one!

That is a good idea

zepatrik · 2020-11-17T14:15:37Z

api/keto/acl/v1/check_service.proto

+  // result of this check.
+  //
+  // Leave this field blank if...
+  //  - your application strictly requires to act on up-to-date data


From the read-api:

If the request doesn’t contain a zookie, Zanzibar will choose a reasonably recent snapshot, possibly offering a lower latency response than if a zookie were provided.

Do you think the same applies to check, or is a check request without a zookie automatically a "content-change" check that returns the new version's zookie? From my understanding it would be possible to have a check without a zookie that is then evaluated at a "reasonably recent snapshot", but that returns no new zookie. Maybe my understanding is wrong here.

Right, I'm about to redesign the CheckRequest a bit.

zepatrik · 2020-11-17T14:24:25Z

api/keto/acl/v1/watch_service.proto

+
+// The service to watch for changes in the system,
+// such as for Access Control Lists and namespace configs.
+service WatchService {


The watch service will require a lot of background setup first; let's add the API definition together with that functionality.

I'll remove it.

zepatrik · 2020-11-17T14:27:14Z

api/keto/acl/v1/write_service.proto

+
+// Write-delta for a WriteRelationTuplesRequest.
+message RelationTupleWriteDelta {
+  enum Action {


Why would you add this action instead of having different rpcs? Is there a best practice regarding this defined somewhere?

There are multiple reasons for this design:

There would be quite a few and similar looking RPCs to cover each action.

We can allow clients to do a transaction on multiple tuples in a single RPC call

This is very convenient for users in case the user wants all actions to succeed or none!

...while also supporting behaviours like "delete tuple A and only insert tuple B if it does not already exist, otherwise I want the complete transaction to fail and give me an error"

would be pretty easy to implement with SQL (I've done that :))

Here are some similar refs:

IAM googleapis

GKE googleapis

"Actions" can even be whole messages sometimes as in Spanner googleapis

We can always add RPC later if required.

Or any other example where 2 or more modifications must succeed.

Imagine this (I learned this scenario the hard way):

Some system inserts 3 tuples in response to an event (e.g. an order, user registration, ...)

The system continuously fails at the 3rd insert rpc.

Now the system would need to rollback somehow and delete the first 2 inserted tuples for clean up.

...this is just a mess, could be solved client side, but only with much overhead to get right ...You get what I mean :)

In these frequent cases, transactions should be offered server side.
And if clients do not need a transaction, they can just make multiple RPCs.

👍 agree, would really be awesome to have transactional requests here
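The all-or-nothing behaviour discussed in this thread can be sketched with an in-memory store. This is only an illustration under assumptions: the `Delta` and `Apply` names, the string encoding of tuples, and the copy-then-commit strategy are hypothetical, and a real implementation would rely on a database transaction instead.

```go
package main

import (
	"errors"
	"fmt"
)

type Action int

const (
	Insert Action = iota + 1 // fails if the tuple already exists
	Delete                   // succeeds even if the tuple is missing
)

// Delta is one write action on one relation tuple (encoded as a string here).
type Delta struct {
	Action Action
	Tuple  string
}

var ErrExists = errors.New("tuple already exists")

// Apply runs all deltas against a copy of store and returns the new
// state only if every delta succeeded; on error the original store is
// returned unchanged, so no manual rollback is needed.
func Apply(store map[string]bool, deltas []Delta) (map[string]bool, error) {
	next := make(map[string]bool, len(store))
	for k, v := range store {
		next[k] = v
	}
	for _, d := range deltas {
		switch d.Action {
		case Insert:
			if next[d.Tuple] {
				return store, fmt.Errorf("%q: %w", d.Tuple, ErrExists)
			}
			next[d.Tuple] = true
		case Delete:
			delete(next, d.Tuple)
		}
	}
	return next, nil
}

func main() {
	store := map[string]bool{"doc:1#viewer@bob": true}
	_, err := Apply(store, []Delta{
		{Insert, "doc:1#owner@alice"},
		{Insert, "doc:2#owner@alice"},
		{Insert, "doc:1#viewer@bob"}, // third insert fails
	})
	// The first two inserts are not visible because the batch failed.
	fmt.Println(err != nil, len(store))
}
```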

zepatrik · 2020-11-19T12:01:44Z

api/keto/acl/v1alpha1/write_service.proto

+  enum Action {
+    // Unspecified.
+    // The `WriteRelationTuples` rpc ignores this
+    // RelationTupleWriteDelta if an action was unspecified.
+    ACTION_UNSPECIFIED = 0;
+
+    // Like INSERT with the exception that if the RelationTuple
+    // already exists performs an UPDATE instead.
+    UPSERT = 1;
+
+    // Insertion of a new RelationTuple.
+    //
+    // The `WriteRelationTuples` rpc errors if the
+    // specified RelationTuple already exists.
+    INSERT = 2;
+
+    // Update of the existing RelationTuple with
+    // the intend to refresh its snapshot token.
+    //
+    // The `WriteRelationTuples` rpc errors if the
+    // specified RelationTuple was not found.
+    UPDATE = 3;
+
+    // Deletion of the RelationTuple.
+    // The `WriteRelationTuples` rpc returns NO error
+    // if the specified RelationTuple was not found.
+    DELETE = 4;
+  }


This only makes sense if we can identify relation tuples somehow, but I don't think it is really possible. An update essentially means

delete the old tuple

insert the new one

We can add that as a separate atomic operation RPC later on, but for now clients can just do these steps themselves. As the parameters will therefore be different for the action types, we should rather separate this into individual RPCs. What do you think?

Yes, e.g. the UPDATE action makes little to no sense with the Tuple model.

I'd still like to have a single RPC, to support transactions upfront and to have as few tuple mutation RPCs as possible.

How about having these actions:

INSERT - does not fail the tx if the tuple already exists

DELETE - does not fail the tx if the tuple is not found

Later we can add actions like:

INSERT_NEW - fails the tx if the tuple already exists

DELETE_FOUND - fails the tx if the tuple is not found

Having a real update that allows deletion and insertion atomically in one call does make sense but is not a hard requirement, so we can add it later. The actions you proposed are perfect and can all be used in one RPC with this action enum.

Tracked as #328
Should we maybe remove the etags and solve everything consistently? Would be a better approach IMO.
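The difference between the lenient actions agreed on in this thread (INSERT, DELETE) and the stricter ones proposed for later (INSERT_NEW, DELETE_FOUND) can be sketched as follows. The Go names, enum values, and in-memory set are assumptions for illustration and do not mirror the generated proto code.

```go
package main

import (
	"errors"
	"fmt"
)

type Action int

const (
	ActionUnspecified Action = iota
	Insert      // no error if the tuple already exists
	Delete      // no error if the tuple is not found
	InsertNew   // proposed for later: error if the tuple already exists
	DeleteFound // proposed for later: error if the tuple is not found
)

var (
	ErrAlreadyExists = errors.New("relation tuple already exists")
	ErrNotFound      = errors.New("relation tuple not found")
)

// applyAction mutates the tuple set according to one action.
func applyAction(tuples map[string]struct{}, a Action, tuple string) error {
	_, exists := tuples[tuple]
	switch a {
	case Insert:
		tuples[tuple] = struct{}{}
	case InsertNew:
		if exists {
			return ErrAlreadyExists
		}
		tuples[tuple] = struct{}{}
	case Delete:
		delete(tuples, tuple)
	case DeleteFound:
		if !exists {
			return ErrNotFound
		}
		delete(tuples, tuple)
	default:
		return fmt.Errorf("unsupported action %d", a)
	}
	return nil
}

func main() {
	tuples := map[string]struct{}{}
	fmt.Println(applyAction(tuples, Insert, "doc:1#owner@alice"))    // lenient insert
	fmt.Println(applyAction(tuples, Insert, "doc:1#owner@alice"))    // repeating it is fine
	fmt.Println(applyAction(tuples, InsertNew, "doc:1#owner@alice")) // strict insert reports the duplicate
	fmt.Println(applyAction(tuples, DeleteFound, "doc:2#owner@bob")) // strict delete reports the miss
}
```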

robinbraemer · 2020-11-19T16:56:29Z

The snaptoken now is of type string to ease logging/debugging/console output.
The server can use hex string<->bytes mapping.
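A minimal sketch of such a hex string<->bytes mapping using Go's standard `encoding/hex` package. The function names `EncodeSnaptoken` and `DecodeSnaptoken` are hypothetical, and the sample bytes are arbitrary; the actual token contents are not defined in this PR.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// EncodeSnaptoken renders the server-side token bytes as a hex string
// that is safe to log and print.
func EncodeSnaptoken(raw []byte) string {
	return hex.EncodeToString(raw)
}

// DecodeSnaptoken parses a hex snaptoken received from a client and
// returns an error for malformed input.
func DecodeSnaptoken(token string) ([]byte, error) {
	return hex.DecodeString(token)
}

func main() {
	token := EncodeSnaptoken([]byte{0x01, 0xab, 0xff})
	fmt.Println(token) // 01abff

	raw, err := DecodeSnaptoken(token)
	fmt.Println(raw, err)
}
```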

robinbraemer · 2020-11-19T17:07:26Z

@zepatrik if I did not miss anything, we can squash and merge this baby. :)

zepatrik · 2020-11-20T09:13:34Z

Nice 🎉

robinbraemer added 3 commits November 13, 2020 20:54

chore: make all go packages internal

31589cf

Exposing nothing by default to keep our package API slim and later decide what to expose via `pkg`.

feat: first WIP draft of the Keto APIs using bufbuild

a2176cc

document api layout

2289c30

robinbraemer requested review from aeneasr and zepatrik November 14, 2020 13:07

robinbraemer assigned zepatrik Nov 14, 2020

robinbraemer added blocking Blocks milestones or other issues or pulls. rfc A request for comments to discuss and share ideas. labels Nov 14, 2020

robinbraemer added this to the Next Gen Keto milestone Nov 14, 2020

robinbraemer marked this pull request as draft November 14, 2020 13:51

improve WriteService & ReadService

0f0879e

robinbraemer marked this pull request as ready for review November 14, 2020 15:02

robinbraemer marked this pull request as draft November 14, 2020 15:05

fix lint errors & split services into their own .proto

4aed11f

robinbraemer marked this pull request as ready for review November 14, 2020 15:34

correct java_outer_classname

c8b04c3

robinbraemer marked this pull request as draft November 15, 2020 14:14

robinbraemer mentioned this pull request Nov 15, 2020

Protobuf APIs & tooling #311

Closed

robinbraemer self-assigned this Nov 15, 2020

more comments & important improves

4b104d1

robinbraemer marked this pull request as ready for review November 16, 2020 19:10

robinbraemer added 2 commits November 17, 2020 12:54

update comments & remove ABAC

94c1347

fix comment

ae750db

zepatrik reviewed Nov 17, 2020

robinbraemer added 2 commits November 17, 2020 16:30

remove google proto

2e756f4

add new line

d2cd210

robinbraemer added 3 commits November 17, 2020 18:05

move v1 to v1alpha1

4d2c847

remove watch service for now

393a639

update CheckRequest

db6c6f6

zepatrik modified the milestones: Next Gen Keto - next milestone, Next Gen Keto - fist working version Nov 18, 2020

zepatrik mentioned this pull request Nov 18, 2020

feat: namespace configuration #324

Merged

update object comments

3f45059

zepatrik requested changes Nov 19, 2020

robinbraemer added 2 commits November 19, 2020 17:50

flatten object

5425b72

INSERT & DELETE actions only

456adfd

robinbraemer added 3 commits November 19, 2020 17:58

fix snaptoken type

db1c755

regenerate proto

e7440c9

remove Object message

4219c6a

chore: cosmetic improvements

63eec13

zepatrik merged commit bda5d8b into ory:zanzibar Nov 20, 2020

zepatrik mentioned this pull request Nov 20, 2020

feat: incorporate new GRPC API structure #331

Merged

robinbraemer mentioned this pull request Nov 24, 2020

Relations are not unique #292

Closed
