Investigate alternate private APIs that new Reddit uses #818
Comments
Okay, I've done some initial work looking at the web client and getting a few requests down.
I'd like to properly reverse-engineer the GraphQL API. The web client regularly uses an SVC Shreddit HTML rendering endpoint, especially for post comments and additional replies, and I don't want to parse HTML because I don't think it's a viable long-term strategy anyway. If anyone has access to a mobile proxy like Charles Proxy and wants to investigate further, please do. I want to find GraphQL queries for all of the above. You can view my current workspace: click the box icon to import, click more, then import from Gist. Paste my gist: https://gist.github.com/sigaloid/e48750287236f42c3bc3aba2130ac675 |
Main Reddit GQL schema as used by the Android app. This dump doesn't have the object descriptions, because they're scattered all around the APK, but it's possible to extract all of them algorithmically. I'll work on it. Example code using the Reddit app's anonymous mode/native authentication to query GQL (the acquired tokens also work on the public APIs; the native app uses them a lot): Hope it helps! |
Absolutely amazing. Thank you so much! This will help enormously. I'm going to see what I can do with this. |
Note we can also brute-force a (partial) schema using something like https://github.com/nikitastupin/clairvoyance. I've used it before, but for some reason it won't work with gql.reddit.com. You can see there used to be a GraphiQL instance cached on gql.reddit.com, but now it returns a 403. |
Yeah, I got the same error when trying to use clairvoyance. I don't know if it would work anyhow, as all of the query IDs seem randomly generated and a dictionary attack won't really work. (I'm not wholly familiar with GraphQL, apologies if that's wrong.) There's also |
Queries aren't randomly generated, as per https://gist.github.com/hogseedy/a83a632aedbc813c4dbc0353422efd5f. See for example The dictionary attack works because the GQL server (nicely named Apollo 🤪) leaks names based on approximations, as well as parameters, which allows clairvoyance to construct a schema json. |
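To illustrate how a suggestion leak like the one described above can be harvested, here is a minimal sketch. It assumes the server returns graphql-js style error messages of the form `Cannot query field "x" on type "T". Did you mean "y"?`; whether Reddit's endpoint emits exactly this wording is an assumption. The function name `parseFieldSuggestions` and the field names in the sample message are hypothetical, and this is not clairvoyance's actual implementation.

```javascript
// Hypothetical helper: extract the probed field, the parent type, and any
// suggested field names from a graphql-js style validation error message.
function parseFieldSuggestions(errorMessage) {
  const header = /Cannot query field "([^"]+)" on type "([^"]+)"\./.exec(errorMessage);
  if (!header) return null; // not a field-suggestion error
  const [, probedField, typeName] = header;

  // Everything after "Did you mean" is a list of quoted candidate names.
  const hintStart = errorMessage.indexOf("Did you mean");
  const suggestions = hintStart === -1
    ? []
    : Array.from(errorMessage.slice(hintStart).matchAll(/"([^"]+)"/g), (m) => m[1]);

  return { typeName, probedField, suggestions };
}

// Sample message with made-up field names, for illustration only.
const sampleError =
  'Cannot query field "subredit" on type "Query". Did you mean "subreddit" or "subredditInfo"?';
const leaked = parseFieldSuggestions(sampleError);
console.log(leaked);
```

Repeating this over a wordlist of guessed names is roughly how a partial schema could be assembled from such errors.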
Ah, I thought the query that clairvoyance would guess was the 12 letter ID. Hopefully that can help us find any other queries not in the Android client! |
I did just look and I'm not sure where that ID comes from. |
Reading this as a non-developer and I'm completely lost, but I just wanted to say thank you, you're doing God's work. |
I saw this today too, might help? |
That’s really interesting, I wonder how they scrape the data. I’m not sure that we’d be able to use it; a goal is to keep it first-party between just the client and Reddit for privacy reasons. Regardless, it looks like a very cool project and will be useful to a lot of people who don’t have the ability to convert their API code over to another method. |
Seems like they might have been forced to shut down, as the user who posted it on reddit (developer of reddiw) was banned. |
@snvoid Holy crap. I was looking forward to seeing where that was going. |
On the web client at least, they seem to be hardcoded into the JS. Presumably this is automatically generated by whatever build system they use. For example:
// https://www.redditstatic.com/desktop2x/Governance~ModListing~Reddit~ReportFlow.46feabff7f42cf7ddd8f.js
"./src/redditGQL/operations/UpdateSubredditNotificationSettings.json": function(t) {
    t.exports = JSON.parse('{"id":"0af4f630a2e1"}')
}
It should be possible to get these using some regex queries on the JS, but this may not be viable long-term. |
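The regex approach suggested above could look something like the following minimal sketch. It assumes every operation module in a bundle follows the same shape as the excerpt quoted in the comment; real bundles may be minified differently, so the pattern would likely need adjusting. The function name `extractOperationIds` is hypothetical.

```javascript
// Hypothetical helper: map GraphQL operation names to their IDs by scanning
// bundle source text for modules shaped like the excerpt above.
function extractOperationIds(bundleSource) {
  const pattern =
    /"\.\/src\/redditGQL\/operations\/([A-Za-z0-9_]+)\.json":\s*function\s*\([^)]*\)\s*\{\s*[A-Za-z_$][\w$]*\.exports\s*=\s*JSON\.parse\('(\{[^']*\})'\)/g;

  const operations = {};
  for (const match of bundleSource.matchAll(pattern)) {
    const [, name, json] = match;
    try {
      const parsed = JSON.parse(json);
      if (typeof parsed.id === "string") operations[name] = parsed.id;
    } catch {
      // Skip modules whose embedded JSON does not parse cleanly.
    }
  }
  return operations;
}

// The excerpt from the comment above, used as sample input.
const sampleBundle = `"./src/redditGQL/operations/UpdateSubredditNotificationSettings.json": function(t) {
    t.exports = JSON.parse('{"id":"0af4f630a2e1"}')
}`;
const operationIds = extractOperationIds(sampleBundle);
console.log(operationIds);
```

Because the pattern depends on the build output format, any change to Reddit's bundler configuration would probably break it, which matches the concern about long-term viability.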
Would the rss feeds of subreddits provide any benefit? Appending .rss to the url provides the rss link to a sub, like www.reddit.com/r/guitars.rss for example. |
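As a small sketch of the URL convention described above: the helper name `subredditRssUrl`, the `https://` scheme, and the name validation rule are assumptions added for illustration; only the `.rss` suffix on the subreddit URL comes from the comment.

```javascript
// Hypothetical helper: build the RSS feed URL for a subreddit by appending
// ".rss" to its URL, as described in the comment above.
function subredditRssUrl(subreddit) {
  // Accept "guitars", "r/guitars", or "/r/guitars".
  const name = subreddit.trim().replace(/^\/?r\//i, "");
  // Assumed validation rule: letters, digits, and underscores only.
  if (!/^[A-Za-z0-9_]+$/.test(name)) {
    throw new Error(`Invalid subreddit name: ${subreddit}`);
  }
  return `https://www.reddit.com/r/${name}.rss`;
}

const feedUrl = subredditRssUrl("guitars");
console.log(feedUrl);
```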
Can you explain this? |
I have recently migrated to using reddit via an RSS reader. Honestly a much more productive workflow. No ads, no bullshit. Although I love libreddit and will still use it, I think RSS is a viable alternative and probably better if you're reading reddit for text-oriented content rather than entertainment and media (one limitation is that media is fetched but videos don't get audio). |
I've managed to scrape what I believe to be the ID of every single GraphQL query that Reddit uses: https://gist.github.com/ading2210/665d4d882e584dd27030e7106d3fe561 There's 291 of them, which is more than what was extracted from the Android app previously. |
FYI, you need TLS spoofing (via something like curl-impersonate) on all of your requests or else Reddit will severely throttle your connection. |
In Rust, for hyper/reqwest, this means that in the worst case the setup needs a manually built TLS connector. There is an API for it in the client builder, so it can be done. One may use Wireshark to check the hyper/reqwest client hellos against the browser's client hellos. |
I have a setup with mitmproxy. Here is a headers-request-response example of the Android client querying "hot" posts from a subreddit (this is an Android client, so it's obviously logged in with a Reddit account): https://gist.github.com/gergo-salyi/fb52f9c06ba1d598309189d00f778bb0 The GQL queries extracted from the Android client (posted here previously) are currently also working and can be used when authenticated as an anonymous/not-logged-in Chrome browser user (I mean with the bearer + session + loid headers found in the inspector/F12 network log). So every GQL thing can be tried at the moment as such, without the Android app auth things. Nevertheless, if anyone needs something from the Android app's HTTP traffic log, tell me. |
Is it a pre-existing profile? I don't imagine the Android networking stack uses for example the Chrome Android TLS signature. |
I just impersonated a desktop browser (Chrome 107) and it seems to work fine. If I don't, Reddit rate limits me to around 20 requests per minute. |
I also think that just impersonating a desktop browser is good enough for the time being. If Reddit starts checking the API used and the TLS fingerprint together, impersonating the Android networking stack can be investigated then; until that happens, it seems like a lot of manual work that might be unnecessary. |
We can configure the cipher suites in rustls with the We can then pass a custom rustls Line 18 in ea69668
|
What's the time frame for Reddit's API changes blocking libreddit? Do you need any assistance to prepare for the API changes? |
@ading2210 ehhh, can you tell me how you did it?
|
Related to #785
We need to be prepared in a month for the Reddit API changes. We can begin by looking into the private APIs that the web app uses. (We don't need to resort to web scraping yet.)