Streaming API won't accept client_credentials Access Tokens #24116

Open
mszajna opened this issue Mar 15, 2023 · 26 comments
Labels
api (REST API, Streaming API, Web Push API), streaming (Streaming server), suggestion (Feature suggestion)

Comments

@mszajna

mszajna commented Mar 15, 2023

Steps to reproduce the problem

  1. Register an app following the docs
  2. Receive client_id and client_secret in the response
  3. Authenticate using client_credentials flow following the docs
  4. Receive an access_token
  5. Try to use the token to access streaming endpoints: curl https://streaming.mastodon.social/api/v1/streaming/public -H "Authorization: Bearer $ACCESS_TOKEN"
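
The steps above can be sketched as request builders in Python. This is a minimal sketch, not a tested client: the paths and form fields (`/api/v1/apps`, `/oauth/token`, `grant_type=client_credentials`) are given to the best of my knowledge of the Mastodon docs, the out-of-band redirect URI and the `read` scope are assumptions, and the function names are made up for illustration. Nothing is sent until a request is handed to `urllib.request.urlopen`.

```python
import urllib.parse
import urllib.request

# Assumption: out-of-band redirect URI, used when there is no callback server.
OOB_REDIRECT = "urn:ietf:wg:oauth:2.0:oob"


def _post_form(url, fields):
    """Build (but do not send) a form-encoded POST request."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def register_app_request(api_base, client_name):
    """Steps 1-2: register an app; the response should carry client_id and client_secret."""
    return _post_form(f"{api_base}/api/v1/apps", {
        "client_name": client_name,
        "redirect_uris": OOB_REDIRECT,
        "scopes": "read",
    })


def client_credentials_request(api_base, client_id, client_secret):
    """Steps 3-4: request an app token; the response should carry access_token."""
    return _post_form(f"{api_base}/oauth/token", {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": OOB_REDIRECT,
        "scope": "read",
    })


def streaming_request(streaming_base, access_token):
    """Step 5: the GET request that is reported to answer 401 for app tokens."""
    return urllib.request.Request(
        f"{streaming_base}/api/v1/streaming/public",
        headers={"Authorization": f"Bearer {access_token}"},
    )
```

Passing `streaming_request("https://streaming.mastodon.social", token)` to `urllib.request.urlopen` should raise `urllib.error.HTTPError` with status 401 on an affected server, mirroring the curl call in step 5.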

Expected behaviour

200 OK

Actual behaviour

401, {"error":"Error: Invalid access token"}

Detailed description

Since #23989 the streaming API no longer supports unauthenticated access. However, it looks like it won't accept traffic authenticated via the client_credentials flow either.

The endpoint used is documented as

GET /api/v1/streaming/public HTTP/1.1
Returns all public statuses
Returns: update, delete, status.update
OAuth: Public, or app token + read:statuses

My understanding is that "app token" refers precisely to client_credentials-authenticated machine-to-machine traffic. (And public access is gone as of #23989)

Looks like the issue relates to this query, which assumes the token has a resource_owner_id; that is not the case for machine-to-machine tokens. The problem seems to predate #23989, but before that change there was little reason to attempt to authenticate these requests.
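
To illustrate the failure mode, here is a minimal sketch using an in-memory SQLite database. The two-table schema and the `lookup` helper are hypothetical simplifications, and the join shape is an assumption about the query rather than a quote of it. The point is only that a lookup which requires a matching user row finds nothing for a token whose resource_owner_id is NULL.

```python
import sqlite3


def make_db():
    """Create a toy database with one user token and one app (client_credentials) token."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, account_id INTEGER);
        CREATE TABLE oauth_access_tokens (
            token TEXT PRIMARY KEY,
            resource_owner_id INTEGER  -- NULL for app tokens
        );
        INSERT INTO users VALUES (1, 42);
        INSERT INTO oauth_access_tokens VALUES ('user-token', 1);
        INSERT INTO oauth_access_tokens VALUES ('app-token', NULL);
    """)
    return conn


def lookup(conn, token, join_type):
    """Look a token up, joining to users with either an INNER or a LEFT join."""
    if join_type not in ("INNER", "LEFT"):
        raise ValueError(f"unsupported join type: {join_type}")
    sql = (
        "SELECT t.token, t.resource_owner_id, u.account_id "
        "FROM oauth_access_tokens AS t "
        f"{join_type} JOIN users AS u ON t.resource_owner_id = u.id "
        "WHERE t.token = ?"
    )
    return conn.execute(sql, (token,)).fetchone()
```

With the inner join, `lookup(conn, "app-token", "INNER")` returns no row at all, which a caller would naturally report as an invalid token. With the left join the row comes back, but with no account attached, so any code after the lookup has to cope with a missing account.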

Specifications

Mastodon 4.1.0

mszajna added the bug (Something isn't working) label Mar 15, 2023
@kabiiQ

kabiiQ commented May 2, 2023

Experiencing this as well trying to integrate Mastodon! Doesn't look intentional.

@Mai-Lapyst

Dug around and found out this is actually because the endpoint /oauth/token gives you a token that is not bound to any user, but the streaming API expects one here:

https://github.com/ClearlyClaire/mastodon/blob/e52169334bfbd8dc72641f340111e9db8072dd4a/streaming/index.js#L214

Using a token that is bound to a user gets the correct result. (You can simply acquire one by going to /settings/applications in the browser and either creating a new application or selecting an existing one to show and manage your user's access token.)

@ThisIsMissEm
Contributor

If @Gargron approves of supporting client credentials / app tokens in streaming's authentication strategy, and someone wishes to sponsor the development work, then I'd be happy to work on this.

I'd estimate this to be at least several hours of development work, so roughly €250-600 at a mid-market freelance development rate. (If I applied my actual freelancing rate, it'd be €500-1000.)

@astro

astro commented Jul 28, 2023

The streaming API actually accepts the token in a GET query parameter: https://mastodon.example.com/api/v1/streaming/public?access_token=...

@ThisIsMissEm
Contributor

The streaming API actually accepts the token in a GET query parameter: https://mastodon.example.com/api/v1/streaming/public?access_token=...

Yes, you can pass it there, but then it may end up in logs, which is considered unsafe. The alternative is abusing the Sec-WebSocket-Protocol header. We need to refactor this to actually use one-time/short-lived tokens issued from Rails, but this is a big breaking change.

Additionally, the token should be passed in the subscribe message (i.e., there shouldn't be different "endpoints" on streaming, just the one, and everything should happen via subscribe messages).

For eventsource, it's a bit more complex as you need a URL for it.
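
As an illustration of the logging concern above, here is a minimal sketch of redacting the access_token query parameter from a URL before it is written to a log. The helper name `redact_access_token` and the placeholder value are made up for this example; it is one possible mitigation on the logging side, not something the streaming server is known to do.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redact_access_token(url, placeholder="REDACTED"):
    """Return the URL with the value of any access_token parameter replaced."""
    parts = urlsplit(url)
    query = [
        (key, placeholder if key == "access_token" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # Rebuild the URL with the sanitized query string; other parts are untouched.
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `redact_access_token("https://mastodon.example.com/api/v1/streaming/public?access_token=abc123")` yields the same URL with `access_token=REDACTED`. This only helps for logs you control; proxies and other intermediaries may still record the full URL, which is why short-lived tokens are the more robust fix.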

@ThisIsMissEm
Contributor

But, regardless, the access_token parameter still won't work for client credentials, only for users' access tokens.

andypiper added the api (REST API, Streaming API, Web Push API) label Aug 8, 2023
@ThisIsMissEm
Contributor

@andypiper I noticed that you've labelled this as api but it also needs the streaming label, please :)

trwnh added the streaming (Streaming server) label Sep 16, 2023
@alexjc

alexjc commented Oct 14, 2023

I'm willing to put up €250 as a bounty for this, in order to make the streaming API work if you have a valid OAuth app as a client but not necessarily a user account. If someone else is also able to contribute that'd be awesome too...

(Thanks for the breakdown @ThisIsMissEm!)

@lxcode

lxcode commented Oct 14, 2023

I'd be happy to contribute as well, but AFAIK it still needs signoff from @Gargron.

@ThisIsMissEm
Contributor

Okay, given the interest, I've been taking a look this afternoon. Whilst the best way forwards on Streaming is almost certainly refactoring away from access tokens as the authentication mechanism (towards one-time URLs generated by Ruby/Rails), this would be a major breaking change, so it is something we could only do on a /api/v2/streaming endpoint.

If we say we want to have client credentials supported on the existing v1 endpoints for streaming, then we'd need to change some of the code dealing with authentication (we'd only allow client credentials to connect to public channels). We'd also need to handle the lack of an accountId in many places where it's assumed to exist, such as the connections to the "system" channels.
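
A minimal sketch of that rule, in Python for brevity rather than the streaming server's own JavaScript: a token without an account may only subscribe to public channels. The `TokenInfo` type, the `may_subscribe` function, the channel set, and the scope check are all assumptions made for illustration, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the channels treated as public for this sketch.
PUBLIC_CHANNELS = frozenset({"public", "public:local", "public:remote"})


@dataclass(frozen=True)
class TokenInfo:
    scopes: frozenset
    account_id: Optional[int] = None  # None models a client-credentials (app) token


def may_subscribe(token, channel):
    """Decide whether a token may subscribe to a channel."""
    # Every token needs a read scope before anything else is considered.
    if "read" not in token.scopes and "read:statuses" not in token.scopes:
        return False
    # No account behind the token: restrict it to public channels only.
    if token.account_id is None:
        return channel in PUBLIC_CHANNELS
    # Tokens bound to an account keep their existing access.
    return True
```

The explicit `account_id is None` branch is the important part: every code path that currently assumes an account would need an equivalent guard.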

There are also two refactors I'd like to do in the streaming code that would improve its performance. The first is merging all the system channels into one (they're relatively low volume), as system channels currently make up the bulk of redis subscriptions. The second is refactoring the filtering code to pre-calculate the filters outside of the processing of a message for sending (currently, in certain circumstances, every single message on a public channel can hit the database with a query, which produces significantly more database load).
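
The filtering refactor can be illustrated with a toy model in Python. `CountingFilterStore` is a hypothetical stand-in for the database that counts lookups, and the "blocked authors" filter is a made-up example of a filter; neither reflects the real streaming code. A real implementation would also need to refresh the pre-calculated filters when they change.

```python
class CountingFilterStore:
    """Stand-in for the database; counts how often it is queried."""

    def __init__(self, blocked_by_account):
        self._blocked = blocked_by_account
        self.queries = 0

    def blocked_authors(self, account_id):
        self.queries += 1
        return frozenset(self._blocked.get(account_id, ()))


def deliver_per_message(store, account_id, messages):
    """Per-message filtering: one store lookup for every message."""
    return [
        m for m in messages
        if m["author"] not in store.blocked_authors(account_id)
    ]


def deliver_precomputed(store, account_id, messages):
    """Pre-calculated filtering: one lookup up front, then in-memory checks."""
    blocked = store.blocked_authors(account_id)
    return [m for m in messages if m["author"] not in blocked]
```

Both functions deliver the same messages; the difference is that the first queries the store once per message and the second queries it once per batch, which is the load reduction being described.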

That said, those two things would be optional on the scale of "making client credentials work"

@ThisIsMissEm
Contributor

ThisIsMissEm commented Oct 14, 2023

I've just finished a spike on implementing support for Client Credentials in Streaming this afternoon, and I think I have it working.

Open questions we should likely consider:

  • should there be an additional scope for allowing access using client credentials to read streaming?
  • should administrators be able to disable streaming access for client credentials?
  • should administrators of instances be able to see all OAuth Applications that are registered with the instance (especially those accessing streaming APIs)?

Implementing this took me ~4-6 hours.

I did also discover an interesting security concern relating to streaming, which I'll report via security advisories (just in case there is actually a security impact)

@alexjc

alexjc commented Oct 17, 2023

@ThisIsMissEm That's awesome news! I'm all for the additional scope (if it doesn't break anything else), as that'd help with oversight and administration.

@Gargron I think this is an important feature, as many community-building services are no longer possible through code/bots now that the public/streaming APIs have been closed up in 4.2. Centralized social media have algorithmic feeds, but distributed networks like Mastodon rely on third-party bots to do community-building in an opt-in way, and allowing OAuth access is a great way to do this so that it's transparent and respectful. So, I hope this gets considered for fixing soon!

@ThisIsMissEm
Contributor

So in a big change, BuzzRelay has actually implemented proper Relay support, so this may no longer be necessary as a path forwards.

I have suggested that there should be a flow that helps an instance operator add a relay from a website, so like you visit https://relay.fedi.buzz/ and there's a form that accepts your instance URL, and then it redirects you into a flow in the admin panel to add the relay, communicating exactly what this means.

@alexjc

alexjc commented Oct 19, 2023

I feel un-easy about relying on one central service for things that should be more distributed. A functional OAuth system for clients feels like a worthwhile thing to have regardless of this...

@ThisIsMissEm
Contributor

I feel un-easy about relying on one central service for things that should be more distributed. A functional OAuth system for clients feels like a worthwhile thing to have regardless of this...

@alexjc the point is more that the most correct and consensual method of implementing this sort of data collection and redistribution is to use opt-in relays. Client credentials in the streaming API is a hack, and has privacy implications (and also data quality issues where you may not receive all the events, e.g., deletes, due to network interruptions or software restarts). Streaming isn't based on a durable queue of events, but instead on ephemeral point-in-time events: there's no way to process events that happened whilst you weren't connected.
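
The difference between ephemeral and durable delivery can be shown with a toy model in Python. Both classes are hypothetical and deliberately simplistic; they model neither the streaming server nor any real queue, only the property under discussion: whether a consumer can catch up on events published while it was away.

```python
class EphemeralStream:
    """Point-in-time delivery: only currently connected listeners see an event."""

    def __init__(self):
        self._listeners = []

    def connect(self):
        inbox = []
        self._listeners.append(inbox)
        return inbox

    def disconnect(self, inbox):
        # Compare by identity so that two equal-looking inboxes are not confused.
        self._listeners = [l for l in self._listeners if l is not inbox]

    def publish(self, event):
        for inbox in self._listeners:
            inbox.append(event)


class DurableLog:
    """Append-only log: a consumer resumes from the offset it last read."""

    def __init__(self):
        self._events = []

    def publish(self, event):
        self._events.append(event)

    def read_from(self, offset):
        """Return the events at or after offset, plus the next offset to resume from."""
        return self._events[offset:], len(self._events)
```

A consumer that disconnects from `EphemeralStream` while a delete is published never learns about it, whereas a consumer of `DurableLog` that remembers its offset picks the delete up on the next `read_from` call.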

Relays, on the other hand, follow standard ActivityPub redistribution logic and have delivery retries and timeouts, and the machinery allows for the relay to say "hey, I was down for a bit, let me query each outbox I know about to get any events I missed."

So I've mixed feelings about actually implementing this if folks are starting to switch to and support relays for this type of activity. I think effort may be better spent in supporting things like "relay requests" and an "add relay" workflow that's kinda OAuth-y in that it uses redirects to handle setting up the relay.

@ThisIsMissEm
Contributor

I think it's also compounded by there being, as I noted, no way for admins to see who is connected to streaming and extracting data from a server, whereas relays are very transparent: you have a defined list of them, and they'll only ever receive public activities.

We could also add a mechanism for "OAuth Application Management" to the admin panel, but every install of a public client (any mobile or browser-based app that doesn't have a server component) appears as an entirely new application registration, so it can be difficult to say "group by app name" or something similar, and you don't really want a table of thousands of similar rows.

@alexjc

alexjc commented Oct 19, 2023

Hmm, the fact that Relaying is opt-in completely negates its viability as a solution here. Only OAuth would really work, assuming it did work for clients — as one would expect from the API docs. This is classified as a bug in the tracker, and I agree. I think it should be fixed as the recent change to the defaults in 4.2.0 is harming the social cohesion of the fediverse.

@ThisIsMissEm
Contributor

Right, but you understand that the lack of consent here is a real conflict point, right? Like, as far as I know, a lot of the Mastodon APIs will not work without a user present, i.e., client credentials don't generally work due to the permissions system in Mastodon.

i.e., currently you can approve an OAuth 2.0 application for the admin:read:domain_blocks scope whilst not being an administrator with those permissions, and the API call just fails due to a permissions error, despite you using an access token or client credentials with the appropriate scope.
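
The mismatch described above can be sketched as a two-part check in Python: the token must carry the scope and the user behind it must hold the matching permission. The `authorize` function and the permission name used in the usage note are hypothetical, chosen only to illustrate the idea.

```python
def authorize(token_scopes, user_permissions, required_scope, required_permission):
    """Allow a call only if the token has the scope AND the user has the permission.

    user_permissions is None when there is no user behind the token,
    as with client credentials.
    """
    has_scope = required_scope in token_scopes
    has_permission = (
        user_permissions is not None
        and required_permission in user_permissions
    )
    return has_scope and has_permission
```

Under this model, `authorize({"admin:read:domain_blocks"}, None, "admin:read:domain_blocks", "manage_federation")` is false: the scope is present, but with no user there is no permission to check it against.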

@alexjc

alexjc commented Oct 19, 2023

Well, 4.2 shut down the public streaming without consent of the server admins. What you're bringing up are interesting design issues going forward, but the reality of this is an API that already exists and used to work. Now it no longer works as you'd expect from the docs. It's labeled as a bug.

@ThisIsMissEm
Contributor

Well, 4.2 shut down the public streaming without consent of the server admins.

Yes, to bring that API in line with the security constraints present on the other APIs.

It's labeled as a bug.

Labels can be wrong.

Like, we could implement this, but it does have some serious data privacy and security implications, and even though I do have a working implementation, I think it'll be blocked on rectifying the ability to control and moderate applications, as certainly not every application with client credentials will work in ethical ways (see also Cambridge Analytica, or the way in which malicious servers/people try to subvert blocks placed on them in order to cause harm).

@alexjc

alexjc commented Oct 20, 2023

Interesting how you only mentioned security concerns 15 messages into the discussion, and even then you provided a workaround that claimed it was a better technical solution with the data still available. I count at least three ways to get the data; it's a public social network after all.

To me this is only a question of HOW we want good actors to access the network, not so much about security. OAuth from a client perspective seems the cleanest, most elegant and most standard approach. (Sorry to hear the code is not as elegant as you'd like; I'd agree those refactors you mention are worth doing if it's that bad.)

@ThisIsMissEm
Contributor

ThisIsMissEm commented Oct 20, 2023 via email

@lxcode

lxcode commented Oct 26, 2023

To be clear, I think fediverse-wide relays are a good idea — they'll make lots of things easier. But coming from the T&S research community use case: what we're trying to achieve here is the general utility of "the firehose", like Twitter and now Bluesky. In this world, you could simply add rules to stream messages with hashtags related to white supremacy, NCII or child abuse across the entirety of the service, or analyze trends in a random overall sample of public messages.

Obviously servers dedicated to or tolerant of malicious activity are not going to opt in to a relay. It is prohibitive to create an account when faced with 10k+ servers. In these cases, the alternative to a public streaming API isn't relays, it's people switching to more resource-intensive web scraping or doing wasteful, wide-scale ingest of many federated timelines, then deduplicating. If one wants to study Mastodon in a manner akin to Twitter, relays will rarely work because they won't be representative.

With regards to consent, I assume users have already consented to their public messages being public when they sign up to a public instance. What we're talking about here is simply mode of access. So with an eye toward balancing these concerns, it seems like the options are a combination of:

  • New status quo. Streaming requires an account.
  • Resolving the issue here: streaming API accessible from a registered app.

Plus:

  • Opt-in relays. Not utterly useless, but not very useful. Basically unusable for any kind of research purpose.
  • Opt-out relays. This would be more stable than streaming API and would be closest to a firehose.

Regardless of whether the streaming API requires an account (I of course prefer it didn't), I think opt-out relays are a better solution than opt-in. One way to think of this could be: unless you are a private server (i.e. one that does not expose an unauthenticated feed on your front page), you by default enable the streaming API in some fashion and also opt into the relay system. Would this be potentially workable?

@erincandescent

I think "Streaming API is accessible (by default) without credentials for feeds which are public" is a fine option, but it's key (to me) that it's more of an informed decision than (just) enabled by default.

I'm not sure what being "opted-in to the relay system by default" would entail - a relay that anyone could subscribe to? a bidirectional relay? (Many of the biggest operating relays have absolutely gigantic amounts of traffic, and are known for absolutely swamping smaller instances)

@ocdtrekkie

@lxcode A big problem is that the equivalent of the firehose is the relay setup where you need to gain the server's consent. While they might have handed it out fairly freely to researchers, there are prominent stories about how sensitive Twitter considered access to the firehose: https://www.theverge.com/2022/6/8/23159898/twitter-musk-firehose-bot-complaints-data-sec-deal

Making it possible for you to access it without the server owner's permission would be dangerous and definitely not equivalent to letting anyone with an account pull it.

@astro

astro commented Dec 19, 2023

@ocdtrekkie

because of its value for ad-targeting
