Investigate alternate private APIs that new Reddit uses #818

Open
sigaloid opened this issue Jun 4, 2023 · 28 comments

@sigaloid
Member

sigaloid commented Jun 4, 2023

Related to #785

We need to be prepared for the Reddit API changes, which arrive in a month. We can begin by looking into the private APIs that the web app uses. (We don't need to resort to web scraping yet.)

@sigaloid
Member Author

sigaloid commented Jun 5, 2023

Okay, I've done some initial work looking at the web client and getting a few requests down.

  • Subreddit info (posts, info, etc)
  • User info (comments, posts, etc)
  • Post info (comments, content, etc) (NOT USED in the web client currently AFAIK, but it still works; subject to removal?)
  • GQL: short snippets of data such as a user's karma, sub info, etc. (could be useful)

I'd like to properly reverse-engineer (RE) the GraphQL API. The web client regularly uses an SVC Shreddit HTML-rendering endpoint, especially for post comments and additional replies, and I don't want to parse HTML because I don't think it's a viable long-term strategy anyway.

If anyone has access to a mobile proxy like Charles Proxy and wants to investigate further, please do; I want to find GraphQL queries for all of the above.

You can view my current workspace:

https://hoppscotch.io

Click the box icon to import, click more, then import from Gist. Paste my gist: https://gist.github.com/sigaloid/e48750287236f42c3bc3aba2130ac675

@hogseedy

hogseedy commented Jun 5, 2023

Main Reddit GQL schema as used by the Android app (id, name, query): https://gist.github.com/hogseedy/a83a632aedbc813c4dbc0353422efd5f

This dump doesn't include the object descriptions because they're scattered all around the APK, but it's possible to extract all of them algorithmically. I'll work on it.

Example code that uses the Reddit app's anonymous mode/native authentication to query GQL (the acquired tokens also work on the public APIs; the native app uses them a lot):
https://gist.github.com/hogseedy/b149c5f1ad1b628ba00556c7d4a898f8

Hope it helps!

@sigaloid
Member Author

sigaloid commented Jun 5, 2023

Absolutely amazing. Thank you so much! This will help enormously. I'm going to see what I can do with this.

@luphoria

luphoria commented Jun 6, 2023

Main Reddit GQL schema as used by the Android app (id, name, query): https://gist.github.com/hogseedy/a83a632aedbc813c4dbc0353422efd5f

Note that we can also brute-force a (partial) schema using something like https://github.com/nikitastupin/clairvoyance. I've used it before, but for some reason it won't work with gql.reddit.com; it fails with: AttributeError: 'NoneType' object has no attribute 'get'. Perhaps I need to pass some headers.

There used to be a GraphiQL page cached on gql.reddit.com, but it now returns a 403.

@sigaloid
Member Author

sigaloid commented Jun 6, 2023

Yeah, I got the same error when trying to use clairvoyance. I don't know if it would work anyhow, as all of the query IDs seem randomly generated and a dictionary attack won't really work. (I'm not wholly familiar with GraphQL, so apologies if that's wrong.)

There's also https://graphql.kubernetes.ue1.snooguts.net, which popped up a few times in my research. It looks like another GQL endpoint of sorts and might be of interest.

@luphoria

luphoria commented Jun 6, 2023

I don't know if it would work anyhow, as all of the query IDs seem randomly generated and a dictionary attack won't really work. (I'm not wholly familiar with GraphQL, so apologies if that's wrong.)

Queries aren't randomly generated, as https://gist.github.com/hogseedy/a83a632aedbc813c4dbc0353422efd5f shows; see for example the AllPosts query.

The dictionary attack works because the GQL server (nicely named Apollo 🤪) leaks real field and parameter names by suggesting close matches for wrong guesses, which allows clairvoyance to construct a schema JSON.
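As a rough sketch of that mechanism, assuming the server returns graphql-js style validation errors of the form Cannot query field "postz" on type "Query". Did you mean "posts"?, a wordlist guess can be turned into real names by pulling the quoted suggestions out of the error message. The did_you_mean helper and the sample field names are hypothetical, and the message format for gql.reddit.com is an assumption; clairvoyance itself (a Python tool) handles much more than this.

```rust
/// Hypothetical helper: pull the suggested names out of a graphql-js style
/// validation error. Assumed message shape (not confirmed for gql.reddit.com):
///   Cannot query field "postz" on type "Query". Did you mean "posts" or "post"?
fn did_you_mean(message: &str) -> Vec<String> {
    // Only the part after "Did you mean" contains suggestions.
    let Some(start) = message.find("Did you mean") else {
        return Vec::new();
    };
    // Splitting on '"' puts every quoted name at an odd index.
    message[start..]
        .split('"')
        .enumerate()
        .filter(|(i, _)| i % 2 == 1)
        .map(|(_, name)| name.to_string())
        .collect()
}

fn main() {
    // Hypothetical error returned for a misspelled wordlist guess.
    let err = r#"Cannot query field "postz" on type "Query". Did you mean "posts" or "post"?"#;
    for name in did_you_mean(err) {
        println!("discovered field: {name}");
    }
}
```

Each discovered name can then be fed back in as a new guess, which is how a partial schema grows without introspection.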

@sigaloid
Member Author

sigaloid commented Jun 6, 2023

Ah, I thought the query that clairvoyance would guess was the 12-character ID. Hopefully that can help us find any other queries not in the Android client!

@luphoria

luphoria commented Jun 6, 2023

I did just look, and I'm not sure where that ID comes from.
It looks like the GQL endpoint is proxied or something? The queries from the web client seem messed up. But GQL is very simple in that you get what you asked for, and querying the weird IDs returns their true names. Perhaps this is intentional obfuscation. But then I'm more confused by @hogseedy's post, because it seems to reference properly structured GQL queries with no weird IDs?
If we were to include all the Android IDs in a wordlist, perhaps clairvoyance could introspect. Perhaps it would even work with just the plaintext responses the server provides. I guess it would be useful for finding all the parameters and then putting them into an easier-to-understand schema JSON. But you're likely right and it won't be of much use :/

@Opening-Button-8988

I'm reading this as a non-developer and I'm completely lost, but I just wanted to say thank you; you're doing God's work.

@maxexcloo

I saw this today too, might help?

https://api.reddiw.com/

@sigaloid
Copy link
Member Author

sigaloid commented Jun 6, 2023

That’s really interesting; I wonder how they scrape the data. I’m not sure that we’d be able to use it, since a goal is to keep everything first-party, between just the client and Reddit, for privacy reasons. Regardless, it looks like a very cool project and will be useful to a lot of people who can’t convert their API code over to another method.

@snvoid

snvoid commented Jun 9, 2023

I saw this today too, might help?

https://api.reddiw.com/

It seems like they might have been forced to shut down, as the user who posted it on Reddit (the developer of reddiw) was banned.

@Opening-Button-8988

Opening-Button-8988 commented Jun 10, 2023

@snvoid Holy crap. I was looking forward to seeing where that was going.

@ading2210

I did just look and I'm not sure where that ID comes from.
@luphoria

On the web client at least, they seem to be hardcoded into the JS. Presumably they are generated automatically by whatever build system Reddit uses. For example:

//https://www.redditstatic.com/desktop2x/Governance~ModListing~Reddit~ReportFlow.46feabff7f42cf7ddd8f.js
    "./src/redditGQL/operations/UpdateSubredditNotificationSettings.json": function(t) {
        t.exports = JSON.parse('{"id":"0af4f630a2e1"}')
    }

It should be possible to extract these by running some regexes over the JS bundles, but this may not be viable long-term.
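A minimal sketch of that extraction, written with plain string searching instead of a regex crate so it has no dependencies. It assumes every operation module looks like the snippet above (a ./src/redditGQL/operations/<Name>.json path followed by JSON.parse('{"id":"..."}')); extract_operation_ids is a hypothetical helper, and any change to Reddit's bundler output would break it.

```rust
/// Hypothetical helper: scan a webpack bundle for modules like
///   "./src/redditGQL/operations/<Name>.json": ... JSON.parse('{"id":"<id>"}')
/// and return (operation name, persisted-query id) pairs.
fn extract_operation_ids(js: &str) -> Vec<(String, String)> {
    const PATH_MARKER: &str = "redditGQL/operations/";
    const ID_MARKER: &str = r#"{"id":""#;
    let mut pairs = Vec::new();
    let mut rest = js;
    while let Some(p) = rest.find(PATH_MARKER) {
        rest = &rest[p + PATH_MARKER.len()..];
        // The operation name runs up to the ".json" suffix.
        let Some(name_end) = rest.find(".json") else { break };
        let name = &rest[..name_end];
        rest = &rest[name_end..];
        // Only accept an id that appears before the next operation module.
        let next_module = rest.find(PATH_MARKER).unwrap_or(rest.len());
        let Some(i) = rest[..next_module].find(ID_MARKER) else { continue };
        rest = &rest[i + ID_MARKER.len()..];
        let Some(id_end) = rest.find('"') else { break };
        pairs.push((name.to_string(), rest[..id_end].to_string()));
        rest = &rest[id_end..];
    }
    pairs
}

fn main() {
    // The module quoted above, pasted in as sample input.
    let bundle = r#""./src/redditGQL/operations/UpdateSubredditNotificationSettings.json": function(t) {
        t.exports = JSON.parse('{"id":"0af4f630a2e1"}')
    }"#;
    for (name, id) in extract_operation_ids(bundle) {
        println!("{name} => {id}");
    }
}
```

Skipping modules that have no id before the next operation path keeps one malformed module from shifting every later name/id pair.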

@ArtisanByteCrafter

Would the subreddits' RSS feeds provide any benefit? Appending .rss to a subreddit's URL gives its RSS feed, for example www.reddit.com/r/guitars.rss.
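For illustration, a tiny helper that builds the feed URL from a subreddit name. subreddit_rss_url is hypothetical and only assumes the www.reddit.com/r/<name>.rss pattern from the example above.

```rust
/// Hypothetical helper: build a subreddit's RSS feed URL by appending ".rss",
/// e.g. "guitars" -> "https://www.reddit.com/r/guitars.rss".
fn subreddit_rss_url(subreddit: &str) -> String {
    // Accept "guitars", "r/guitars" or "/r/guitars/" and reduce them to the bare name.
    let name = subreddit.trim_matches('/').trim_start_matches("r/");
    format!("https://www.reddit.com/r/{name}.rss")
}

fn main() {
    println!("{}", subreddit_rss_url("/r/guitars/"));
}
```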

@luphoria

luphoria commented Jun 10, 2023

@hogseedy

Example code using reddit app anonymous mode/native authentication to query GQL (acquired tokens also work on public apis, native app uses them a lot):

Can you explain this?
I'd like to start building a proxy that translates requests in the old API format into GQL requests, and I want to get that off the ground first.

@Opening-Button-8988

Opening-Button-8988 commented Jun 10, 2023

I recently migrated to reading Reddit through an RSS reader. It's honestly a much more productive workflow: no ads, no bullshit. Although I love libreddit and will still use it, I think RSS is a viable alternative, and probably a better one if you read Reddit for text-oriented content rather than entertainment and media (one limitation is that media is fetched, but videos have no audio).

@ading2210

I've managed to scrape what I believe to be the ID of every single GraphQL query that Reddit uses: https://gist.github.com/ading2210/665d4d882e584dd27030e7106d3fe561

There are 291 of them, which is more than what was previously extracted from the Android app.

@ading2210

ading2210 commented Jun 12, 2023

FYI, you need TLS spoofing (via something like curl-impersonate) on all of your requests or else Reddit will severely throttle your connection.

@gergo-salyi

FYI, you need TLS spoofing (via something like curl-impersonate) on all of your requests or else Reddit will severely throttle your connection.

In Rust, for hyper/reqwest, this means that in the worst case the setup needs a manually built TLS connector. They have an API for it in the client builder, so it can be done. One can use Wireshark to compare the ClientHello messages sent by hyper/reqwest against the browser's.

@gergo-salyi

If anyone has access to a mobile proxy like Charles Proxy and wants to investigate further, please do - I want to find GraphQL queries for all of the above.

I have a setup with mitmproxy. Here is a headers/request/response example of the Android client querying "hot" posts from a subreddit (this is the Android client, so it's obviously logged in with a Reddit account): https://gist.github.com/gergo-salyi/fb52f9c06ba1d598309189d00f778bb0

The GQL queries extracted from the Android client (posted here previously) also currently work and can be used when authenticated as an anonymous (not logged in) Chrome browser user, i.e. with the bearer, session and loid headers found in the inspector's (F12) network log.

So, for now, every GQL query can be tried this way, without the Android app's authentication. Nevertheless, if anyone needs something from the Android app's HTTP traffic log, let me know.
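A sketch of what such an anonymous request might look like, assuming the web client's persisted-query format of a JSON body {"id": ..., "variables": ...} POSTed to gql.reddit.com. The header names x-reddit-session and x-reddit-loid, the URL path, and build_gql_request itself are assumptions for illustration; copy the exact names and values from your own network log. It only assembles the request; sending it is left to whatever HTTP client you use.

```rust
/// Hypothetical container for an anonymous GQL request (not a real API).
struct GqlRequest {
    url: &'static str,
    headers: Vec<(&'static str, String)>,
    body: String,
}

/// Build a persisted-query request for `operation_id`.
/// `variables_json` must already be a JSON object, e.g. "{}".
/// Ids are expected to be short alphanumeric strings like the ones in the JS bundle;
/// anything else is rejected so it cannot break out of the JSON string.
fn build_gql_request(
    operation_id: &str,
    variables_json: &str,
    bearer: &str,
    session: &str,
    loid: &str,
) -> Option<GqlRequest> {
    if operation_id.is_empty() || !operation_id.chars().all(|c| c.is_ascii_alphanumeric()) {
        return None;
    }
    Some(GqlRequest {
        url: "https://gql.reddit.com/", // host named in this thread; path assumed
        headers: vec![
            ("Authorization", format!("Bearer {bearer}")),
            ("Content-Type", "application/json".to_string()),
            // Assumed header names; copy the exact ones from your F12 network log.
            ("x-reddit-session", session.to_string()),
            ("x-reddit-loid", loid.to_string()),
        ],
        body: format!(r#"{{"id":"{operation_id}","variables":{variables_json}}}"#),
    })
}

fn main() {
    // Id from the UpdateSubredditNotificationSettings module quoted earlier, used only
    // to show the body shape; the token values are placeholders.
    let req = build_gql_request("0af4f630a2e1", "{}", "<bearer>", "<session>", "<loid>")
        .expect("valid id");
    println!("POST {}", req.url);
    for (name, value) in &req.headers {
        println!("{name}: {value}");
    }
    println!("{}", req.body);
}
```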

@sigaloid
Copy link
Member Author

FYI, you need TLS spoofing (via something like curl-impersonate) on all of your requests or else Reddit will severely throttle your connection.

Is it a pre-existing profile? I don't imagine the Android networking stack uses, for example, the Chrome Android TLS signature.

@ading2210

ading2210 commented Jun 12, 2023

I just impersonated a desktop browser (Chrome 107) and it seems to work fine. If I don't, Reddit rate limits me to around 20 requests per minute.
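If impersonation isn't available, one fallback is to stay under that observed limit on the client side. The sketch below spaces requests evenly (3 seconds apart at 20 per minute); the Pacer type is hypothetical, and 20/min is only the throttle reported here, not a documented Reddit limit.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical client-side pacer: spaces requests evenly so that at most
/// `per_minute` are sent per minute (20/min is the throttle observed in this
/// thread, not a documented limit).
struct Pacer {
    interval: Duration,
    last: Option<Instant>,
}

impl Pacer {
    fn new(per_minute: u32) -> Self {
        // Guard against division by zero; 0 is treated as 1 request per minute.
        Pacer { interval: Duration::from_secs(60) / per_minute.max(1), last: None }
    }

    /// How long the caller must wait at time `now` before the next request.
    fn wait_time(&self, now: Instant) -> Duration {
        match self.last {
            Some(prev) => (prev + self.interval).saturating_duration_since(now),
            None => Duration::ZERO,
        }
    }

    /// Sleep until the next request is allowed, then record it as sent.
    fn acquire(&mut self) {
        thread::sleep(self.wait_time(Instant::now()));
        self.last = Some(Instant::now());
    }
}

fn main() {
    let mut pacer = Pacer::new(20);
    pacer.acquire(); // the first request goes out immediately
    println!("next request allowed in {:?}", pacer.wait_time(Instant::now()));
}
```

Even spacing is simpler than a token bucket but gives up bursts; a bucket would allow a few quick requests after idle periods while keeping the same per-minute average.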

@Tim455

Tim455 commented Jun 15, 2023

I also think that, for the time being, just impersonating a desktop browser would be good enough. If Reddit starts checking the API used against the TLS fingerprint, then impersonating the Android networking stack can be investigated; that seems like a lot of manual work that might be unnecessary.
The curl-impersonate repo also includes a library that could possibly be used (libcurl-impersonate.so), though I am not sure how it would fit in with the Cargo build system that is currently used.

@FireMasterK
Contributor

FireMasterK commented Jun 30, 2023

We can configure the cipher suites in rustls with the with_cipher_suites function.

We can then pass a custom rustls ClientConfig to the connector builder below with the with_tls_config function (in place of with_native_roots):

    let https = hyper_rustls::HttpsConnectorBuilder::new()
        .with_native_roots()
        .https_only()
        .enable_http1()
        .build();

@argium

argium commented Jul 2, 2023

What's the time frame for Reddit's API changes blocking libreddit? Do you need any assistance to prepare for the API changes?

@ghost

ghost commented Jul 27, 2023

I just impersonated a desktop browser (Chrome 107) and it seems to work fine. If I don't, Reddit rate limits me to around 20 requests per minute.

@ading2210 ehhh, can you tell me how you did it?

@FireMasterK
Contributor

@ading2210 ehhh, can you tell me how you did it?

#818 (comment)
