Thanks for reporting! A weird issue indeed. I noticed that the thread contains some special characters in it -- do the other URLs that you get from your query also contain special characters? FYI, the suffix (JSON + limit) is being appended here.
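One way to answer the special-characters question is to flag every URL that contains a non-ASCII character and to build a percent-encoded copy of each one. This is only a sketch: `urls` is a hypothetical stand-in for `mydata$url` (the second entry is made up), and whether passing the encoded form to get_thread_content() avoids the 503 is an assumption that has not been verified here.

```r
# Hypothetical stand-in for mydata$url; the second URL is invented for contrast.
urls <- c(
  "https://www.reddit.com/r/askphilosophy/comments/o4rbkx/engels_on_the_a_priori_in_antid\u00fchring/",
  "https://www.reddit.com/r/askphilosophy/comments/abc123/some_plain_ascii_title/"
)

# TRUE for URLs containing at least one code point outside the ASCII range.
non_ascii <- vapply(
  urls,
  function(u) any(utf8ToInt(enc2utf8(u)) > 127L),
  logical(1),
  USE.NAMES = FALSE
)

# Percent-encode unsafe characters while leaving ":" "/" "?" etc. untouched
# (reserved = FALSE is the default of utils::URLencode).
encoded <- vapply(
  urls,
  function(u) utils::URLencode(enc2utf8(u)),
  character(1),
  USE.NAMES = FALSE
)

print(urls[non_ascii])     # candidates to inspect
print(encoded[non_ascii])  # percent-encoded variants to try (assumption)
```

If only the failing thread shows up in `urls[non_ascii]`, that would support the special-character explanation; if many other URLs are flagged and still work, the cause probably lies elsewhere.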
ivan-rivera changed the title from "Very puzzling issue with get_thread_content()" to "Very puzzling issue with get_thread_content -- 503 Service Unavailable" on Aug 31, 2021
I used the following query, which returned a dataframe with 237 threads:
mydata = RedditExtractoR::find_thread_urls(
  keywords = "psychology",
  subreddit = "askphilosophy",
  sort_by = "relevance",
  period = "all")
I then used those threads in get_thread_content(mydata$url), but received the following error repeatedly:
Error in value[3L] :
Cannot read from Reddit, check your inputs or internet connection
In addition: Warning message:
In file(con, "r") :
cannot open URL 'https://www.reddit.com/r/askphilosophy/comments/o4rbkx/engels_on_the_a_priori_in_antidühring/.json?limit=500': HTTP status was '503 Service Unavailable'
Here is where it becomes puzzling: When I search for that thread in mydata$url, it's there, but it doesn't end with ".json?limit=500". That is being added somewhere in get_thread_content(). It's also only happening for that one link. When I remove just that one link from mydata before using get_thread_content(), then get_thread_content() works (i.e., it doesn't appear to be a rate limit issue). To make matters even more confusing, when I went to the thread (after removing the "/.json?limit=500") to manually check it out, nothing weird about it pops out. It's not a deleted post, a heavily downvoted post, a private post, from a private account, or anything like that. Just a normal reddit thread.
In the end I made a workaround, but I don't understand what the root cause of the issue was or how to prevent it.
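For anyone hitting the same error, a generic workaround (not necessarily the one used above) is to fetch threads one URL at a time and catch errors, so a single failing thread does not abort the whole batch. The helper name `fetch_each` and its `fetch_fun` argument are hypothetical; the sketch assumes get_thread_content() can be called with a single URL.

```r
# Call fetch_fun on each URL separately; collect successes and failures.
# A NULL result is treated as a failure, so fetch_fun should not return NULL
# on success.
fetch_each <- function(urls, fetch_fun) {
  results <- lapply(urls, function(u) {
    tryCatch(
      fetch_fun(u),
      error = function(e) {
        message("Skipping ", u, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  names(results) <- urls
  failed <- vapply(results, is.null, logical(1))
  list(ok = results[!failed], failed = urls[failed])
}

# Intended usage (needs network access, so it is left commented out):
# out <- fetch_each(mydata$url, RedditExtractoR::get_thread_content)
# out$failed   # URLs that raised an error, e.g. the 503 thread
```

This does not explain the root cause, but `out$failed` gives a list of problem URLs that can be inspected or retried separately.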