
WIP: Work on authenticated downloads #3167

Merged (36 commits, Nov 26, 2019)

Conversation

alexlarsson
Member

I.e. purchases, etc. Initial work only handles the lowlevel http/token stuff.

@alexlarsson force-pushed the wip/auth branch 3 times, most recently from 784c194 to 1cc12fc on October 14, 2019 13:38
@alexlarsson
Member Author

This now has the basic core for using and choosing when to use bearer tokens (although the actual tokens are just read from an env var), including the problematic case of p2p as discussed in
https://lists.freedesktop.org/archives/flatpak/2019-October/001756.html

The solution here is to protect .commit objects and to iterate on getting the tokens (while trying to batch them as much as possible), using the ostree-metadata branch to learn whether a token is needed. This is recorded as a single need_token uint32, which currently just means != 0 needs a token, but I plan to use different values to separate the donation and purchase cases so that we can avoid having to log in to get a token for things that are just donations and could be handled 100% client side.

@pwithnall @ramcq @mwleeds @dbnicholson can you have a look at this? In particular the p2p code which changes stuff around much more, and is hardly what I'd call nice. Does it seem correct? Is there some way to do this better?

@mwleeds
Collaborator

mwleeds commented Oct 14, 2019

Yeah I'll take a look at this soon.

Collaborator

@pwithnall left a comment

Also, since the lower level APIs don't allow you to pass different tokens
for different parts, change this function to support passing a subset
of the resolves, so that we can pass all that need a specific token in
one go and then call this multiple times. The way we handle this is
by saving all the original ref_to_checksum hashtables for all results
and then re-creating them with the subset of refs needed when pulling.

Could you expand the lower level APIs to support passing different tokens for different collection IDs?


I didn’t review all the code in detail, but I couldn’t see anything obviously wrong with the P2P code, and couldn’t think of a particularly better way of doing things (apart from my comment above).

app/flatpak-builtins-build-commit-from.c (outdated, resolved)
app/flatpak-builtins-build-export.c (outdated, resolved)
common/flatpak-transaction.c (outdated, resolved)
common/flatpak-transaction.c (outdated, resolved)
{
FlatpakTransactionOperation *op = l->data;

op->resolved_token = g_strdup (token);
Collaborator

Paranoia: I’d add a g_assert (op->resolved_token == NULL) or g_clear_pointer (&op->resolved_token, g_free) first, just in case this ends up getting called twice.

common/flatpak-transaction.c (outdated, resolved)
g_free (resolve);

Collaborator

Nitpick: Spurious blank line.

@alexlarsson
Member Author

@pwithnall thanks for the review, could you also have a check on the latest commits that try to use the ostree-metadata for resolving things?

@alexlarsson
Member Author

Oh, and yes, if we end up doing this we should probably expand the lower level APIs so that some of these things can be a bit cleaner. I don't want to block on that while experimenting with this, though.

@pwithnall
Collaborator

pwithnall commented Oct 15, 2019

could you also have a check on the latest commits that try to use the ostree-metadata for resolving things?

They seem reasonable from a high-level read through of them.

Before trusting the data in ostree-metadata, you need to ensure it’s signed, but that should be the case already.

@@ -3665,6 +3669,8 @@ flatpak_repo_update (OstreeRepo *repo,

/* Add bindings to the metadata. */
new_summary_commit_dict = g_variant_dict_new (new_summary);
g_variant_dict_insert_value (new_summary_commit_dict, "xa.commits",
g_variant_builder_end (&commits_builder));
Contributor

I was sad to find out that ostree-metadata doesn't contain the refs mapping already. I think this is needed, but perhaps we can do this in ostree? I.e., instead of duplicating the ostree-metadata code in both the ostree and flatpak summary generation CLI commands, it could be done in the regenerate_summary API if the repo has a collection ID. Then augment it to always include the collection refs map as in the summary.

@pwithnall I was under the impression that ostree-metadata was intended to be a replacement for the summary file. Is there a reason it doesn't have the collection refs map in it?

Collaborator

I was under the impression that ostree-metadata was intended to be a replacement for the summary file. Is there a reason it doesn't have the collection refs map in it?

It’s not meant to be a direct replacement. You still use the (unsigned) summary file for the collection–refs → commit checksums map, and then verify them using the signed binding data in the commit metadata.

Contributor

That means you have to pull each commit object for verification in the p2p case, which might require authorization for purchased refs. That means that the refs can't be reliably resolved unless the user is authenticated, which is what @alexlarsson is trying to avoid.

Do you have any objections if the collection map is included in ostree-metadata?

Member Author

The ostree-metadata data is mostly from flatpak anyway (ostree only sets two properties there), so it's not a huge difference.

So, the p2p summary file does have the commit id in it, but that is the commit id of the ref as stored in the p2p repo. What this commit adds is the commit id of the cached ref flatpak metadata in the ostree-metadata ref, which may be different from the commit id of the actual ref.

Collaborator

That means you have to pull each commit object for verification in the p2p case, which might require authorization for purchased refs. That means that the refs can't be reliably resolved unless the user is authenticated, which is what @alexlarsson is trying to avoid.

Do they have to be reliably resolved until you’re actually ready to go through the authenticator?

Do you have any objections if the collection map is included in ostree-metadata?

Yes, some mild ones: (a) it increases the size of ostree-metadata and means it has to be updated much more frequently than it currently is; (b) the collection map still needs to be in the summary file too, because otherwise it’s not possible to resolve ostree-metadata itself. Having it in two places, or parts of it in two places, is ugly and confusing.

What this commit adds is the commit id of the cached ref flatpak metadata in the ostree-metadata ref, which may be different from the commit id of the actual ref.

Sorry, I don’t understand. What do you mean by ‘cached ref’ and ‘actual ref’?

Contributor

@dbnicholson Oct 16, 2019

In order to fix this the PR adds an array that matches the order of xa.cache that just contains the commit that the cached data refers to.

That’s the bit of information I’d missed, thanks. This makes sense now, and I can’t immediately think of a better approach, assuming you can’t break backwards compatibility with the xa.cache format and change it to map from checksum → cache data, rather than ref name → cache data.

This still feels like a general ostree p2p issue to me, though, since you can't trust the ref map in the peer's summary. If you include the ref map also in the ostree-metadata commit, then you can validate the peer's mirrored ref without actually pulling it. The flow would be:

  • Fetch the summary from the peer
  • Pull refs/mirrors/<id>/ostree-metadata and verify it against the remote's trusted key
  • Verify that the commits pointed to by the collection map in the summary match the ref map in the mirrored ostree-metadata commit
  • Use the ref data and additional metadata (such as xa.cache) in the mirrored ostree-metadata, filtering the refs to those in the summary collection map

Then all you're getting from the peer is a list of which refs it has. The rest of the information comes from the remote and is verified. You can then carry on pulling mirrored commits with certainty that they came from the remote.

This is essentially the same as what this commit is doing, but done in ostree instead. And, if the full ref map is included, then you can get the verified commit timestamp from the remote as well to determine if it's older than what you already have.

Contributor

FWIW, I implemented that in ostreedev/ostree#1946. I do believe that's the right thing to do, but I've spent much less time on P2P than on other parts of ostree.

Collaborator

I don’t think you can reliably do that, since the peer might update some refs from a collection out of sync with other refs from the same collection (say, they update from a USB stick which has some, but not all, apps). At that point, whatever ostree-metadata commit they have for that collection (new or old) is going to have a mapping of collection–ref to checksum which doesn’t match some of the refs in the peer’s repository.

That’s why I think the verification of “is it OK for collection–ref R to point to commit C” needs to be done via commit bindings, since the verification data (the binding) is guaranteed to be in sync with the object you’re verifying (the commit).

I see two ways of dealing with this:

  1. Operate on commit checksums, rather than collection–refs, everywhere except in the initial mapping from collection–ref to checksum (which happens using the map from the summary file). Then the metadata for several versions (commits) of an app can be carried around without conflicting, and peers use the data for whichever commit they have for a given app.
  2. Be OK with (getting a token and) downloading commit objects which we might then throw away because their binding data doesn’t validate.

Member Author

Yeah, I don't believe the change in ostreedev/ostree#1946 is really useful/correct, and I have commented so there.

Member Author

Basically, from a practical point of view, it's not guaranteed that all the refs in a p2p peer were downloaded at the same time, but validating them against the upstream ostree-metadata would require exactly that. And anyway, we can just verify the signatures and bindings of the commit objects we download, so why would we need to verify the commits themselves against the ostree-metadata?

Now, obviously this is for the refs themselves; for the data in the ostree-metadata commit we do need to know which commits it refers to, but the solution in this PR seems more efficient and cleaner for achieving that.

@alexlarsson
Member Author

@kalev @hughsie Hey, can you take a look at the API here that gnome-software would have to implement. In particular 8134dd4

Does that seem reasonable?

@alexlarsson
Member Author

I did some work on flat-manager to support the token-type data and require authentication: flatpak/flat-manager#29

@kalev
Contributor

kalev commented Oct 23, 2019

@kalev @hughsie Hey, can you take a look at the API here that gnome-software would have to implement. In particular 8134dd4

Does that seem reasonable?

I think it looks reasonable, but as always, the real test is trying to actually use it :) We'll see how well it works once someone tries to implement the gnome-software side.

}

g_print ("Waiting for browser...\n");

Collaborator

Doesn't look like this is actually doing any waiting?

Member Author

It's waiting for the browser to handle the url and eventually get redirected to the localhost uri, telling the Authenticator the token. After that it will send the done signal and continue.

Member Author

As in, it will return to the main loop spinning code.

if (browser == NULL)
browser = "xdg-open";

/* TODO: Use better way to find default browser */
Collaborator

Could just use g_app_info_launch_default_for_uri?

Member Author

Probably

/* out */
char *resolved_commit;
GBytes *resolved_metadata;
guint64 download_size;
guint64 installed_size;
char *eol;
char *eol_rebase;
gint32 token_type;
Contributor

Maybe I missed discussion of this somewhere else in this PR, but I was hoping to understand the semantics of this a bit more. From the flat-manager side:

I’ll have a more detailed look later, but for the token type there are two answers. First the technical one: token type zero is “behave as before” (i.e. no tokens needed), and anything else is undefined in general. Any non-zero value triggers an authenticator call passing the ref and the token type. It is then up to the Authenticator what to do about it. It could even return an empty token if it wants to signal that no token was needed anyway.

The second answer is how I imagine flathub will use this. I think we’ll have two values: one for “needs purchase” and one for “needs donation flow”. Both of these will just run some webflow that ends up in a possibly empty token. However, the purchase one will start with a login and will always try to get a token, whereas the donation one will use an anonymous webflow the first time and then track locally that it already did so, so it can be skipped the next time.

Perhaps this should be a flags value rather than a specific value? Seems like that might be a little more flexible down the road.

Member Author

Not sure what you mean by "flags value" here. It's intentionally an opaque (except for zero) integer to flatpak, in order to allow each authenticator to decide how to use it as best fits the requirements of that particular backend.


data.request = request;
if (!flatpak_auth_request_ref_tokens (authenticator, request, remote, refs, cancellable, error))
return FALSE;
Collaborator

Here you are returning without resetting priv->active_webflow

Member Author

Not sure what you mean; this is an async call and it's reset a few lines below after iterating the mainloop, no?

This is unused, but can be inserted in the tests if something is
segfaulting so that you can see the backtrace.
These are explicitly made short to save space, so let's have defines
for them to make sure we don't mistype them, especially as we
will be adding new keys.
If we're not doing fancy ui, print errors on stderr. This was biting me
in some tests where I'd like to grep for some strings in the stderr
output.
Historically the p2p resolve code always did a parallel call to find
all the available commits for the refs, and then it took the results
and pulled only the commits for all the refs so that it could resolve
against the exact commits that were available (which might not match
whatever metadata we have in the local ostree-metadata copy).

This splits this into two phases, the first that uses the summary only,
and a second one that pulls the commit.

The reason for this is that we want to be able to do some stuff in between
these, such as resolving some refs via the ostree-metadata and maybe
requesting bearer tokens that we need for pulling the commit objects.
This tries to resolve the p2p resolve operation from the info in
an ostree-metadata commit. This only works if the resolve ended up
on the same commit id as what was available in the ostree-metadata,
which may not be correct if the two are not synchronized.
If the commit is available in the ostree-metadata and it matches
the latest available commit in the p2p results, then resolve it to that, so
we don't have to download the commit object.
Also, since the lower level APIs don't allow you to pass different tokens
for different parts, change this function to support passing a subset
of the resolves, so that we can pass all that need a specific token in
one go and then call this multiple times. The way we handle this is
by saving all the original ref_to_checksum hashtables for all results
and then re-creating them with the subset of refs needed when pulling.
The p2p case is kinda weird wrt tokens. We can do most of the basics,
like which refs need updating using the partial summary from the p2p
mirrors, but we can't rely 100% on the ostree-metadata info for core
info like permissions or dependencies, since it may be out-of-sync.

So, if the information in the ostree-metadata doesn't match the
commit we're resolving, the p2p resolve code pulls the actual
commit objects as part of a resolve.

Now, the commit objects are protected by bearer tokens, so we need to
pass them while doing this pull. Unfortunately the information about
which refs require tokens is part of the commit, which is a circular
dependency. We resolve this by relying on the (possibly stale, but
probably ok) copy of the need-token info in the ostree-repo metadata.

So, we do the first part of the p2p resolve, then for all the
not-yet-resolved ops (i.e. ones that actually need updates) we look
in the ostree-metadata for which refs need tokens, generate tokens
and then do the pulling with the tokens.

This is an iterative process, because resolving a ref can create more
update operations, which may need more tokens.
Unfortunately we lose some error information when we pull multiple
refs, ending up with a generic "something failed" error rather than the
401 error, so in the p2p case we can't verify that we get the right
errors.
For now this just has a portal-like API for requesting tokens
for a list of refs.
This will be used by later code in combination with the gdbus
generated code.
This is a trivial implementation of org.freedesktop.Flatpak.Authenticator
that just reads the contents of the "required-token" file and returns
that as the tokens for all refs.
When we need a bearer token, look up the configured authenticator for
the remote and ask it for tokens. Also updates the test-auth test
to use the new test authenticator instead of the previous
env var hack.
When resolving the transactions we call RequestRefTokens as needed
to get bearer tokens for some refs. These calls can also emit
the Webflow signal on the request object with a url. It is then
up to the client to show this url in some way.

Once the required operations are done in the browser it will redirect
to some url that will reach the authenticator, telling it that the
operation is done and the final result of it. At that point the
authenticator will emit the WebflowDone signal and continue.

If the client doesn't want to do the web flow it can call the close
operation on the request object.
If the request-webflow file exists, then the authenticator will listen
on a local socket and start a webflow request with a uri pointing to it.
If anything connects to the uri it will consider the flow ok and continue.
If the client calls close() instead, it will silently succeed anyway
if require-webflow doesn't exist, and fail if it exists.
These signals are emitted when the authenticator needs some kind of
web-based authentication. If the caller implements webflow-start and
returns TRUE, then it needs to show the user the URL and allow the user
to interact with it.

Typically this ends with the web-page being redirected to a url to
localhost or similar which tells the authenticator the result of the
operations. This will cause the webflow-done signal to be emitted and
the transaction operation to continue. If something goes wrong (or the
signal is not handled) it will also report webflow-done, but then the
transaction will fail in a normal way.

Generally all users of FlatpakTransaction need to do is:

 On webflow-start, show a browser window with the url and return TRUE.

 On webflow-done, close the browser window if it's still visible.

 If the user closes the browser window early, call
 flatpak_transaction_abort_webflow().
This shows the url in the default browser. Currently it just looks
at $BROWSER and falls back to xdg-open.
We just verify that we can roundtrip via the cli by setting
BROWSER=curl to "finish" the webflow.
This makes it very easy to reuse a single authenticator for several
remotes. This is useful for a default authenticator implementation
that we can ship with flatpak and use for e.g. flathub.
This allows the authenticator to handle each token type differently.
For example, this allows a "donation" type to run the donation
webflow, but not require login (and then store the fact that this was
run locally).
This allows the authenticator to directly do UI and parent it to the
relevant window. The actual parent string is specified just like
the xdg-desktop-portal one.

There is a new flatpak_transaction_set_parent_window() function that
clients can use to signal what window they want to be parented to.
This is from the summary and can be used as the default token type
if all/most refs need a token.
@alexlarsson
Member Author

Rebased, now without the authenticator. Instead it is available separately here: https://github.com/flatpak/flatpak-sample-authenticator

The long term plan is to allow installing the authenticator from flatpak itself automatically, which will make updating it easier.

@alexlarsson
Member Author

I think the basics here are ok, and I want to do an unstable release with this in it so that people can test it more easily. So, I'm going to merge this as is, even though I plan to continue working on it.


6 participants