Very puzzling issue with get_thread_content -- 503 Service Unavailable #17

Closed
PsychAnon opened this issue Aug 29, 2021 · 3 comments · Fixed by #18

PsychAnon commented Aug 29, 2021

I used the following query, which returned a dataframe with 237 threads:

mydata <- RedditExtractoR::find_thread_urls(
  keywords = "psychology",
  subreddit = "askphilosophy",
  sort_by = "relevance",
  period = "all"
)

I then used those threads in get_thread_content(mydata$url), but received the following error repeatedly:

Error in value[3L] :
Cannot read from Reddit, check your inputs or internet connection
In addition: Warning message:
In file(con, "r") :
cannot open URL 'https://www.reddit.com/r/askphilosophy/comments/o4rbkx/engels_on_the_a_priori_in_antidühring/.json?limit=500': HTTP status was '503 Service Unavailable'

Here is where it becomes puzzling: When I search for that thread in mydata$url, it's there, but it doesn't end with ".json?limit=500". That is being added somewhere in get_thread_content(). It's also only happening for that one link. When I remove just that one link from mydata before using get_thread_content(), then get_thread_content() works (i.e., it doesn't appear to be a rate limit issue). To make matters even more confusing, when I went to the thread (after removing the "/.json?limit=500") to manually check it out, nothing weird about it pops out. It's not a deleted post, a heavily downvoted post, a private post, from a private account, or anything like that. Just a normal reddit thread.

In the end I made a workaround, but I don't understand what the root cause of the issue was or how to prevent it.
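
The workaround itself isn't shown in the thread. One possible shape for it, as a minimal sketch, is to fetch the threads one URL at a time and trap errors, so that a single failing URL is recorded instead of aborting the whole run. `fetch_each_safely` is a hypothetical helper, not part of RedditExtractoR; it takes the fetching function as an argument so it can be tried without network access.

```r
# Hypothetical helper (not part of RedditExtractoR): call `fetch` on each URL
# separately and collect the URLs that raise an error.
fetch_each_safely <- function(urls, fetch) {
  results <- list()
  failed <- character(0)
  for (u in urls) {
    # An error in fetch(u) becomes NULL instead of stopping the loop.
    # Note: a fetch that legitimately returns NULL is also counted as failed.
    out <- tryCatch(fetch(u), error = function(e) NULL)
    if (is.null(out)) {
      failed <- c(failed, u)
    } else {
      results[[u]] <- out
    }
  }
  list(results = results, failed = failed)
}

# Possible usage with the data from this issue (requires network access):
# res <- fetch_each_safely(mydata$url, RedditExtractoR::get_thread_content)
# res$failed  # URLs that could not be read
```

Looping per URL is slower than a single vectorised call, but it pinpoints exactly which URL fails.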

@ivan-rivera
Owner

Thanks for reporting! A weird issue indeed. I noticed that the thread contains some special characters in it -- do the other URLs that you get from your query also contain special characters? FYI, the suffix (JSON + limit) is being appended here.

I'll look into this issue in the next few days.
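
To answer the question about the other URLs, a quick check along these lines might help. This is only a sketch: `has_non_ascii` is a hypothetical helper, and it assumes "special characters" means anything outside the ASCII range, such as the "ü" in the failing URL.

```r
# Hypothetical helper: TRUE for each URL that contains at least one
# non-ASCII character (Unicode code point above 127).
has_non_ascii <- function(urls) {
  vapply(
    urls,
    function(u) any(utf8ToInt(enc2utf8(u)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
}

# Possible usage with the data from this issue:
# mydata$url[has_non_ascii(mydata$url)]
```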

ivan-rivera changed the title from "Very puzzling issue with get_thread_content()" to "Very puzzling issue with get_thread_content -- 503 Service Unavailable" on Aug 31, 2021
@ivan-rivera
Owner

FYI, the issue is caused by special characters; I'll try to fix it.
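
For readers stuck on an older release, one plausible mitigation is to percent-encode the URL before the JSON suffix is appended. The sketch below is an assumption about how that could be done with base R's `utils::URLencode`; it is not necessarily what the actual fix in #18 does, and `build_json_url` is a hypothetical name.

```r
# Hypothetical helper: percent-encode non-ASCII characters, then append the
# same suffix that appears in the error message above.
build_json_url <- function(thread_url) {
  # enc2utf8() makes sure the bytes that get encoded are UTF-8.
  # With the default reserved = FALSE, URLencode() leaves ":" and "/" intact.
  encoded <- utils::URLencode(enc2utf8(thread_url))
  paste0(encoded, ".json?limit=500")
}

# Possible usage with the failing thread from this issue:
# build_json_url(
#   "https://www.reddit.com/r/askphilosophy/comments/o4rbkx/engels_on_the_a_priori_in_antidühring/"
# )
```

By default `URLencode` returns its input unchanged if it already contains a percent-escape, which avoids double-encoding URLs that were encoded earlier.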

@ivan-rivera
Owner

The issue is fixed, but the fix won't appear on CRAN until at least next week (counting from 31/08/21).
