Streaming API won't accept client_credentials Access Tokens #24116

mszajna · 2023-03-15T11:37:08Z

Steps to reproduce the problem

Register an app following the docs
Receive client_id and client_secret in the response
Authenticate using client_credentials flow following the docs
Receive an access_token
Try to use the token to access streaming endpoints: curl https://streaming.mastodon.social/api/v1/streaming/public -H "Authorization: Bearer $ACCESS_TOKEN"
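The steps above can be sketched in Python roughly as follows. This is a minimal sketch that only builds the two HTTP requests and does not send them. The helper names `build_token_request` and `build_streaming_request`, the example hostnames, the `read:statuses` scope and the out-of-band redirect URI are assumptions for illustration and are not taken from the report.

```python
import urllib.parse
import urllib.request


def build_token_request(instance, client_id, client_secret, scope="read:statuses"):
    """Build (but do not send) a client_credentials request to /oauth/token."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Assumed out-of-band redirect URI; use whatever the app was registered with.
        "redirect_uri": "urn:ietf:wg:oauth:2.0:oob",
        "scope": scope,
    }).encode("ascii")
    return urllib.request.Request(f"{instance}/oauth/token", data=form, method="POST")


def build_streaming_request(streaming_host, access_token):
    """Build (but do not send) the GET request from the last reproduction step."""
    return urllib.request.Request(
        f"{streaming_host}/api/v1/streaming/public",
        headers={"Authorization": f"Bearer {access_token}"},
    )
```

Passing the second request to `urllib.request.urlopen` with an app token is the call that, according to this report, fails with a 401 on Mastodon 4.1.0.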

Expected behaviour

200 OK

Actual behaviour

401, {"error":"Error: Invalid access token"}

Detailed description

Since #23989 the streaming API no longer supports unauthenticated access. However, it looks like it won't accept traffic authenticated via the client_credentials flow either.

The endpoint used is documented as

GET /api/v1/streaming/public HTTP/1.1
Returns all public statuses
Returns: update, delete, status.update
OAuth: Public, or app token + read:statuses

My understanding is that "app token" refers precisely to client_credentials-authenticated machine-to-machine traffic. (And public access is gone as of #23989)

Looks like the issue relates to this query that assumes the token has a resource_owner_id, which is not the case for machine-to-machine tokens. The problem seems to predate #23989, only that before there was little reason to attempt to authenticate these requests.
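The effect described above can be reproduced with a small, self-contained sketch. The schema below is a heavily simplified stand-in for the real tables (an assumption for illustration, run against in-memory SQLite rather than PostgreSQL), and `lookup_token` is a hypothetical helper, not the streaming server's code. It shows how an inner join to `users` returns no row for a token whose `resource_owner_id` is NULL, which a caller would then treat as an invalid token.

```python
import sqlite3


def make_demo_db():
    """Create a tiny in-memory database with one user token and one app token."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, account_id INTEGER);
        CREATE TABLE oauth_access_tokens (
            id INTEGER PRIMARY KEY, token TEXT, resource_owner_id INTEGER
        );
        INSERT INTO users VALUES (1, 100);
        INSERT INTO oauth_access_tokens VALUES (1, 'user-token', 1);
        -- client_credentials token: not bound to any user
        INSERT INTO oauth_access_tokens VALUES (2, 'app-token', NULL);
    """)
    return conn


def lookup_token(conn, token, join="INNER"):
    """Look up a token joined to its user; returns None when no row matches."""
    if join not in ("INNER", "LEFT"):
        raise ValueError("join must be 'INNER' or 'LEFT'")
    sql = f"""
        SELECT t.id, t.resource_owner_id, u.account_id
        FROM oauth_access_tokens AS t
        {join} JOIN users AS u ON t.resource_owner_id = u.id
        WHERE t.token = ?
        LIMIT 1
    """
    return conn.execute(sql, (token,)).fetchone()
```

With the inner join, `lookup_token(conn, "app-token")` finds nothing, while a left join returns the token row with empty user columns. Whether a left join is the right fix for the real query is a separate design question.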

Specifications

Mastodon 4.1.0


kabiiQ · 2023-05-02T19:24:56Z

Experiencing this as well trying to integrate Mastodon! Doesn't look intentional.

Mai-Lapyst · 2023-07-21T15:48:11Z

Dug around and found out this is actually because the endpoint /oauth/token gives you a token that is not bound to any user, but the streaming API expects one here:

https://github.com/ClearlyClaire/mastodon/blob/e52169334bfbd8dc72641f340111e9db8072dd4a/streaming/index.js#L214

Using a token that is bound to a user gets the correct result. (You can simply acquire one by going in the browser to /settings/applications and either creating a new application or selecting an existing one to show and manage your user's access token.)

ThisIsMissEm · 2023-07-27T15:07:25Z

If @Gargron approves of supporting client credentials / app tokens in streaming's authentication strategy, and someone wishes to sponsor the development work, then I'd be happy to work on this.

I'd estimate this to be at least several hours of development work, so around €250-600 to develop, at a mid-market freelance development rate. (If I applied my actual freelancing rate, it'd be €500-1000.)

astro · 2023-07-28T11:46:42Z

The streaming API actually accepts the token in a GET query parameter: https://mastodon.example.com/api/v1/streaming/public?access_token=...

ThisIsMissEm · 2023-07-29T13:35:41Z

The streaming API actually accepts the token in a GET query parameter: https://mastodon.example.com/api/v1/streaming/public?access_token=...

Yes, you can pass it there, but then it may end up in logs, which is considered unsafe. The alternative is abusing the Sec-WebSocket-Protocol header. We need to refactor this to actually use one-time/short-lived tokens issued from Rails, but this is a big breaking change.

Additionally, the token should be passed in the subscribe message (i.e., there shouldn't be different "endpoints" on streaming, just the one, and everything should happen via subscribe messages).

For eventsource, it's a bit more complex as you need a URL for it.
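The options discussed in this comment can be sketched as below. This is a minimal sketch with hypothetical helper names; it only constructs the URL, header and message and opens no connection. The subprotocol header is the one RFC 6455 calls `Sec-WebSocket-Protocol`. The subscribe message shown is the existing `type`/`stream` shape; the proposal to carry the token inside it is not specified further in this thread, so no token field is invented here.

```python
import json
import urllib.parse


def streaming_url_with_query_token(base_url, access_token):
    """Token in the query string: simple, but the full URL may end up in logs."""
    query = urllib.parse.urlencode({"access_token": access_token})
    return f"{base_url}/api/v1/streaming/public?{query}"


def websocket_headers_with_token(access_token):
    """Token carried in the WebSocket subprotocol header instead of the URL."""
    return {"Sec-WebSocket-Protocol": access_token}


def subscribe_message(stream, **params):
    """A subscribe message for one shared streaming connection."""
    return json.dumps({"type": "subscribe", "stream": stream, **params})
```

For example, `subscribe_message("hashtag", tag="foo")` produces the JSON a client would send over an already-open WebSocket to start receiving a hashtag stream.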

ThisIsMissEm · 2023-07-29T13:36:40Z

But, regardless, the access_token parameter still won't work for client credentials, only user's access tokens.

ThisIsMissEm · 2023-09-16T18:17:38Z

@andypiper I noticed that you've labelled this as api but it also needs the streaming label, please :)

alexjc · 2023-10-14T07:37:58Z

I'm willing to put up €250 as a bounty for this, in order to make the streaming API work if you have a valid OAuth app as a client but not necessarily a user account. If someone else is also able to contribute that'd be awesome too...

(Thanks for the breakdown @ThisIsMissEm!)

lxcode · 2023-10-14T13:05:41Z

I'd be happy to contribute as well, but AFAIK I think it still needs signoff from @Gargron.

ThisIsMissEm · 2023-10-14T17:51:46Z

Okay, given the interest, I've just been taking a look this afternoon. Whilst the best way forwards on Streaming is almost certainly refactoring away from access tokens as the authentication mechanism (towards one-time URLs generated by Ruby/Rails), this would be a major breaking change, so it's something we could only do on a /api/v2/streaming endpoint.

If we say we want to have client credentials supported on the existing v1 endpoints for streaming, then we'd need to change some of the code dealing with authentication (we'd only allow client credentials to connect to public channels). We'd also need to handle the lack of an accountId in many places where it's assumed to exist, such as the connections to the "system" channels.

There are also two refactors I'd like to do in the streaming code that would improve its performance. The first is merging all the system channels into one (they're relatively low volume), as currently system channels make up the bulk of Redis subscriptions. The second is refactoring the filtering code to pre-calculate the filters outside of the processing of a message for sending (currently, in certain circumstances, every single message on a public channel can hit the database with a query, which produces significantly more database load).
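The rule described here, where client credentials may only reach public channels and the account ID may be absent, could look roughly like the sketch below. Everything in it is hypothetical: the `TokenInfo` type, the `PUBLIC_CHANNELS` set, the scope check and the function name are assumptions for illustration, not the streaming server's actual code, and per-channel scope rules for user tokens are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of channels that only ever carry public data (hypothetical list).
PUBLIC_CHANNELS = {"public", "public:local", "public:remote", "hashtag", "hashtag:local"}


@dataclass
class TokenInfo:
    scopes: frozenset
    # None models a client-credentials (app) token with no resource owner.
    account_id: Optional[int] = None


def can_subscribe(token: TokenInfo, channel: str) -> bool:
    """Allow app tokens on public channels only; user tokens on any channel."""
    if "read" not in token.scopes and "read:statuses" not in token.scopes:
        return False
    if token.account_id is None:
        return channel in PUBLIC_CHANNELS
    return True
```

The point of the sketch is that every code path after authentication has to tolerate `account_id` being `None`, rather than assuming a user is always present.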

That said, those two things would be optional on the scale of "making client credentials work"

ThisIsMissEm · 2023-10-14T20:43:14Z

I've just finished a spike on implementing support for Client Credentials in Streaming this afternoon, and I think I have it working.

Open questions we should likely consider:

should there be an additional scope for allowing access using client credentials to read streaming?
should administrators be able to disable streaming access for client credentials?
should administrators of instances be able to see all OAuth Applications that are registered with the instance (especially those accessing streaming APIs)?

Implementing this took me ~4-6 hours.

I did also discover an interesting security concern relating to streaming, which I'll report via security advisories (just in case there is actually a security impact)

alexjc · 2023-10-17T07:49:57Z

@ThisIsMissEm That's awesome news! I'm all for the additional scope (if it doesn't break anything else), as that'd help with oversight and administration.

@Gargron I think this is an important feature, as many community-building services are no longer possible through code/bots now that the public/streaming APIs have been closed up in 4.2. Centralized social media have algorithmic feeds, but distributed networks like Mastodon rely on third-party bots to do community-building in an opt-in way, and allowing OAuth access is a great way to do this so that it's transparent and respectful. So, I hope this gets considered for fixing soon!

ThisIsMissEm · 2023-10-18T15:32:14Z

So in a big change, BuzzRelay has actually implemented proper Relay support, so this may no longer be necessary as a path forwards.

I have suggested that there should be a flow that helps an instance operator add a relay from a website, so like you visit https://relay.fedi.buzz/ and there's a form that accepts your instance URL, and then it redirects you into a flow in the admin panel to add the relay, communicating exactly what this means.

alexjc · 2023-10-19T08:54:45Z

I feel uneasy about relying on one central service for things that should be more distributed. A functional OAuth system for clients feels like a worthwhile thing to have regardless of this...

ThisIsMissEm · 2023-10-19T14:25:34Z

I feel uneasy about relying on one central service for things that should be more distributed. A functional OAuth system for clients feels like a worthwhile thing to have regardless of this...

@alexjc the point is more that the most correct and consensual method of implementing this sort of data collection and redistribution is to use opt-in relays. Client credentials in the streaming API is a hack, and has privacy implications, and also data quality issues where you may not receive all the events (e.g., deletes) due to network interruptions or software restarts. Streaming isn't based on a durable queue of events, but instead on ephemeral point-in-time events: there's no way to process events that happened whilst you weren't connected.

Relays on the other hand follow standard ActivityPub redistribution logic, and have delivery retries and timeouts, and the machinery allows for the relay to say "hey, I was down for a bit, let me query each outbox I know about to get any events I missed".

So I've mixed feelings about actually implementing this if folks are starting to switch to and support relays for this type of activity. I think effort may be better spent in supporting things like "relay requests" and an "add relay" workflow that's kinda oauth-y in that it uses redirects to handle setting up the relay.
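The difference between ephemeral events and a recoverable history can be illustrated with a toy model. Neither class below models Mastodon's streaming server or any relay implementation; `EphemeralChannel` and `DurableLog` are hypothetical names, and the offset-based log is only an analogy for "missed events can be fetched later", not a description of how relays recover.

```python
class EphemeralChannel:
    """Pub/sub style: an event reaches only the subscribers connected right now."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self):
        inbox = []
        self.subscribers.append(inbox)
        return inbox

    def unsubscribe(self, inbox):
        # Compare by identity so two equal-looking inboxes are not confused.
        self.subscribers = [s for s in self.subscribers if s is not inbox]

    def publish(self, event):
        for inbox in self.subscribers:
            inbox.append(event)


class DurableLog:
    """Log style: events are retained, so a consumer can resume from an offset."""

    def __init__(self):
        self.events = []

    def publish(self, event):
        self.events.append(event)

    def read_from(self, offset):
        return self.events[offset:]
```

A client that disconnects from the `EphemeralChannel` between an update and its delete never sees the delete, whereas a `DurableLog` consumer that remembers its offset can catch up after reconnecting.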

ThisIsMissEm · 2023-10-19T14:29:05Z

I think it's also compounded by there being, as I noted, no way for admins to see who is connected to streaming and extracting data from a server, whereas relays are very transparent: you have a defined list of them, and they'll only ever receive public activities.

We could also add a mechanism for "OAuth Application Management" to the admin panel, but every install of a public client (any mobile or browser-based app that doesn't have a server component) appears as an entirely new application registration, so it can be difficult to say "group by app name" or something similar, and you don't really want a table of thousands of similar rows.

alexjc · 2023-10-19T19:01:39Z

Hmm, the fact that Relaying is opt-in completely negates its viability as a solution here. Only OAuth would really work, assuming it did work for clients — as one would expect from the API docs. This is classified as a bug in the tracker, and I agree. I think it should be fixed as the recent change to the defaults in 4.2.0 is harming the social cohesion of the fediverse.

ThisIsMissEm · 2023-10-19T19:29:56Z

Right, but you understand that the lack of consent here is a real conflict point, right? Like, as far as I know, a lot of the Mastodon APIs will not work without a user present, i.e., client credentials don't generally work due to the permissions system in Mastodon.

i.e., currently you can approve an OAuth 2.0 application for the admin:read:domain_blocks scope whilst not being an administrator with those permissions, and the API call just fails due to a permissions error, despite you using an access token or client credentials with the appropriate scope.
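The two-layer check described here, OAuth scope on the token plus the permissions of the user behind it, can be sketched as follows. The function name and the permission name `manage_federation` used in the usage note are placeholders chosen for illustration, not a statement of how Mastodon names or evaluates its permissions.

```python
def api_call_allowed(token_scopes, user_permissions, required_scope, required_permission):
    """Both layers must pass: the token's scope and the user's own permission."""
    has_scope = required_scope in token_scopes
    has_permission = required_permission in user_permissions
    return has_scope and has_permission
```

A token with the `admin:read:domain_blocks` scope still fails when `user_permissions` lacks the matching permission, and a client-credentials token has no user at all, which this sketch models as an empty permission set: `api_call_allowed({"admin:read:domain_blocks"}, set(), "admin:read:domain_blocks", "manage_federation")` evaluates to `False`.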

alexjc · 2023-10-19T20:05:31Z

Well, 4.2 shut down the public streaming without consent of the server admins. What you're bringing up are interesting design issues going forward, but the reality of this is an API that already exists and used to work. Now it no longer works as you'd expect from the docs. It's labeled as a bug.

ThisIsMissEm · 2023-10-19T20:40:46Z

Well, 4.2 shut down the public streaming without consent of the server admins.

Yes, to bring that API in line with the security constraints present on the other APIs.

It's labeled as a bug.

Labels can be wrong.

Like, we could implement this, but it does have some serious data privacy and security implications, and even though I do have a working implementation, I think it'll be blocked on rectifying the ability to control and moderate applications, as certainly not every application with client credentials will work in ethical ways (see also Cambridge Analytica, or the way in which malicious servers/people try to subvert blocks placed on them in order to cause harm).

alexjc · 2023-10-20T05:25:08Z

Interesting how you only mentioned security concerns 15 messages into the discussion, and even then you provided a workaround that claimed it was a better technical solution with the data still available. I count at least three ways to get the data; it's a public social network after all.

To me this is only a question of HOW we want good actors to access the network, not so much about security. OAuth from a client perspective seems the cleanest, most elegant and most standard approach. (Sorry to hear the code is not as elegant as you'd like; I'd agree those refactors you mention are worth doing if it's that bad.)

ThisIsMissEm · 2023-10-20T08:19:24Z

ActivityPub would actually be the most standard and cleanest/elegant approach, as that's what powers the Fediverse, not Mastodon's specific APIs. Relays implement ActivityPub.

Anyway, as for "not mentioning it earlier", in August I did mention it to the FediBuzz developer: https://hachyderm.io/@thisismissem/110854193383112425

However, I often have a lot of other work going on and do not exclusively work on Mastodon's streaming server, so I may have forgotten to share my concerns here. Likewise, I'd only taken the briefest of looks back in July since I was busy with other things. Others have also raised concerns in the conversations surrounding this ticket happening in other places.

If you want to build for the Fediverse, implement something like a Relay and use ActivityPub. If you only want Mastodon, maybe streaming, but it is inherently lossy (you don't get events if not connected) and it is designed first and foremost for end-user clients of Mastodon, not for data extraction or data mining. Consent is important, and in its current form we lack the adequate tools to ensure trust and safety were we to implement this feature request.

(updated to fix GitHub mangling the email reply)

lxcode · 2023-10-26T15:14:55Z

To be clear, I think fediverse-wide relays are a good idea — they'll make lots of things easier. But coming from the T&S research community use case: what we're trying to achieve here is the general utility of "the firehose", like Twitter and now Bluesky. In this world, you could simply add rules to stream messages with hashtags related to white supremacy, NCII or child abuse across the entirety of the service, or analyze trends in a random overall sample of public messages.

Obviously servers dedicated to or tolerant of malicious activity are not going to opt in to a relay. It is prohibitive to create an account when faced with 10k+ servers. In these cases, the alternative to a public streaming API isn't relays, it's people switching to more resource-intensive web scraping or doing wasteful, wide-scale ingest of many federated timelines, then deduplicating. If one wants to study Mastodon in a manner akin to Twitter, relays will rarely work because they won't be representative.

With regards to consent, I assume users have already consented to their public messages being public when they sign up to a public instance. What we're talking about here is simply mode of access. So with an eye toward balancing these concerns, it seems like the options are a combination of:

New status quo. Streaming requires an account.
Resolving the issue here: streaming API accessible from a registered app.

Plus:

Opt-in relays. Not utterly useless, but not very useful. Basically unusable for any kind of research purpose.
Opt-out relays. This would be more stable than streaming API and would be closest to a firehose.

Regardless of whether the streaming API requires an account (I of course prefer it didn't), I think opt-out relays are a better solution than opt-in. One way to think of this could be: unless you are a private server (i.e. one that does not expose an unauthenticated feed on your front page), you by default enable the streaming API in some fashion and also opt into the relay system. Would this be potentially workable?

erincandescent · 2023-11-01T22:38:36Z

I think "Streaming API is accessible (by default) without credentials for feeds which are public" is a fine option, but it's key (to me) that it's more of an informed decision than (just) enabled by default.

I'm not sure what being "opted-in to the relay system by default" would entail - a relay that anyone could subscribe to? a bidirectional relay? (Many of the biggest operating relays have absolutely gigantic amounts of traffic, and are known for absolutely swamping smaller instances)

ocdtrekkie · 2023-12-18T22:30:50Z

@lxcode A big problem is that the equivalent of the firehose is the relay setup where you need to gain the server's consent. While they might have handed it out fairly freely to researchers, there are prominent stories about how sensitive Twitter considered access to the firehose: https://www.theverge.com/2022/6/8/23159898/twitter-musk-firehose-bot-complaints-data-sec-deal

Making it possible for you to access it without the server owner's permission would be dangerous and definitely not equivalent to letting anyone with an account pull it.

astro · 2023-12-19T01:32:51Z

@ocdtrekkie

because of its value for ad-targeting

mszajna added the bug Something isn't working label Mar 15, 2023

andypiper added the api REST API, Streaming API, Web Push API label Aug 8, 2023

trwnh added the streaming Streaming server label Sep 16, 2023

vmstan added suggestion Feature suggestion and removed bug Something isn't working labels Oct 19, 2023

This was referenced Oct 19, 2023

Possible to issue Access Tokens without correct underlying user permissions #27477

Open

Ability for Administrators to manage OAuth 2.0 applications that have access to their instance #27478

Open

ronilaukkarinen mentioned this issue Oct 26, 2023

Use relays instead of Streaming API ronilaukkarinen/fedionfire#3

Closed
