AttributeError exception when a request comes back with status 403. #1336
Comments
Hi I see that Additionally, this stacktrace doesn't give us any information about where it actually failed in your code. Can you reproduce without threading and include information about what the actual request was that resulted in the 403 body? |
Please include the code that led to the error. If it's a threading call, include the function that's called by the thread. |
Hey @bboe, I am not sure how to reproduce this without threading (multiprocessing), since I am pretty sure the 403 comes from spamming the API too much.
This is not the actual code; I chopped it down to the basics so it's more digestible. |
First of all you can just return Second, can you add this code to the top of your code?
import logging
logging.basicConfig(level=logging.DEBUG) |
Unfortunately PRAW doesn't attempt to be threadsafe, so we're going to close this issue. If you can reproduce without using threads then please reply and we'll reopen it. Thanks for reporting. |
@bboe Alright thank you, I thought that using multiprocessing instead of the thread module would solve that issue. |
Are you explicitly using a threadpool? If so, then that's problematic. If you're using multiprocessing with processes and not threads (confusing, I know), then things should work. |
I don't use threading anywhere to my knowledge. I use the multiprocessing pool (https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool). If I print os.getppid() inside the function that creates a new PRAW instance (def process_submission), the processes are all different, so it is using different processes. I don't know the internals of multiprocessing or how it forks the process on Windows, so there's probably some shared state somehow? |
Oh okay. multiprocessing must internally be using threads to keep track of something. So long as the PRAW code isn't being run across threads, things should be okay. It'd still be ideal to reproduce this code without multiple processes. Or at least can you narrow your code down significantly to the smallest reproducible chunk? |
@Majiick https://praw.readthedocs.io/en/latest/getting_started/logging.html
import logging
handler = logging.StreamHandler()
handler.setLevel(logging.DEBUG)
logger = logging.getLogger('prawcore')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
This will let us see what API call caused the error. Place this at the top and re-run your file. |
Hi @bboe and @PythonCoderAS, The issue seems to be that pickling custom exceptions in Python is broken. Such is the case in prawcore/exceptions.py. You can see the explanation in https://stackoverflow.com/questions/27993567/custom-exceptions-are-not-raised-properly-when-used-in-multiprocessing-pool Problem in the code:
The fix is to do this instead:
Code to reproduce:
Edit: I realize I just leaked my Reddit dev keys in this comment, I revoked access to them in my Reddit account. |
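The failure mode described above can be shown without PRAW or multiprocessing. This is a minimal sketch, assuming a hypothetical `FakeResponse` stand-in for the HTTP response and a hypothetical `ResponseError` class with the same shape as the exception under discussion. When an exception is unpickled, Python calls the class with `self.args`, which here is the formatted message string rather than the original response object:

```python
import pickle


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; only status_code is used."""

    def __init__(self, status_code):
        self.status_code = status_code


class ResponseError(Exception):
    """Hypothetical exception shaped like the one discussed in this thread."""

    def __init__(self, response):
        self.response = response
        # Exception.args becomes the formatted message, not `response`.
        super().__init__("received {} HTTP response".format(response.status_code))


original = ResponseError(FakeResponse(403))
payload = pickle.dumps(original)  # pickling itself succeeds

try:
    # Unpickling calls ResponseError(*original.args), i.e. with a str.
    pickle.loads(payload)
    unpickle_error = None
except AttributeError as exc:
    unpickle_error = exc

print(original.args)
print(repr(unpickle_error))
```

A multiprocessing Pool pickles exceptions raised in a worker to send them back to the parent, which would explain why the parent sees an AttributeError about a string instead of the original exception.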
@Majiick i read your code and it seems that basically I have to set request to a value of None. However, I think you can change your code to use a |
Using a ThreadPool would cause other issues since Praw is not thread-safe, is that not correct? |
@Majiick it works perfectly fine as long as each thread creates its own Reddit instance. |
The documentation says to stick with multiprocessing instead of threads, but I will try with threads. https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html
|
The problem with multiple threads is the access of shared objects. In theory, as long as no objects are shared, or shared as strings, there shouldn’t be any problem. |
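The "one instance per thread, share only strings" idea can be sketched as follows. `FakeClient` is a hypothetical stand-in for a `praw.Reddit` instance and `fetch_title` is an invented method used only for illustration; the point is the `threading.local` pattern, not any PRAW API:

```python
import threading
from multiprocessing.pool import ThreadPool

# Each worker thread gets its own attribute namespace on this object.
_local = threading.local()


class FakeClient:
    """Hypothetical stand-in for a praw.Reddit instance."""

    def fetch_title(self, submission_id):
        return "title-for-{}".format(submission_id)


def get_client():
    # Create one client per worker thread on first use; never share it.
    if not hasattr(_local, "client"):
        _local.client = FakeClient()
    return _local.client


def process_submission(submission_id):
    client = get_client()
    # Return plain strings so no client-bound objects cross threads.
    return client.fetch_title(submission_id)


with ThreadPool(4) as pool:
    titles = pool.map(process_submission, ["a1", "b2", "c3"])

print(titles)
```

Because exceptions raised inside a ThreadPool worker stay in the same process, they are re-raised in the caller without a pickle round trip.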
Hi @PythonCoderAS, I've run it for a while with multiple threads and it seems to be doing fine (and takes up much less memory!). Anyway, the workaround is to use threads, but multiprocessing is still broken. |
The best way to resolve this is to include a note stating that Reddit instances should never be shared and all other instances should be converted to strings before sharing. My export PR can come in handy for this. |
@Majiick upon further testing it's only found in multiprocessing Pools.
Code
import dataclasses
import multiprocessing
import threading
import time


class MyError(Exception):
    pass


@dataclasses.dataclass
class MyItem:
    prop: object


class MyArgException(MyError):
    def __init__(self, arg):
        print("Arg Type: {}, Arg: {}".format(type(arg), arg))
        print("*"*10)
        time.sleep(1)
        super().__init__("Arg: {}".format(arg.prop))
        self.arg = arg


def main(*args):
    raise MyArgException(MyItem(5))


proc = multiprocessing.Process(target=main, args=(5,))
proc.start()
# Process, runs normally
time.sleep(5)
print("#"*25)
pool = multiprocessing.Pool(5)
thread = threading.Thread(target=pool.map, args=(main, [5]))
thread.start()
# Pool, fails
time.sleep(5)
print("#"*25)
# Main thread
main(5)
Output
|
@jarhill0 and I have identified a solution to the error, but it is very cumbersome to implement. Basically, this code:
class ResponseException(PrawcoreException):
    """Indicate that there was an error with the completed HTTP request."""

    def __init__(self, response):
        """Initialize a ResponseException instance.
        :param response: A requests.response instance.
        """
        self.response = response
        super(ResponseException, self).__init__(
            "received {} HTTP response".format(response.status_code)
        )
becomes
class ResponseException(PrawcoreException):
    """Indicate that there was an error with the completed HTTP request."""

    def __init__(self, response):
        """Initialize a ResponseException instance.
        :param response: A requests.response instance.
        """
        self.response = response
        if isinstance(response, str):
            super(ResponseException, self).__init__(response)
        else:
            super(ResponseException, self).__init__(
                "received {} HTTP response".format(response.status_code)
            )
However, this is hard to implement, and since it is an edge case (multiprocessing Pool) with a very easy workaround (use a thread pool), this will not be fixed. Documentation will be changed to reflect this. |
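For anyone revisiting this later, another general Python technique is to override `__reduce__` so that unpickling rebuilds the exception from its original constructor argument. The sketch below is not what prawcore does and was not proposed in this thread; `FakeResponse` and `PicklableResponseError` are hypothetical names, and the approach assumes the stored response object can itself be pickled:

```python
import pickle


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; only status_code is used."""

    def __init__(self, status_code):
        self.status_code = status_code


class PicklableResponseError(Exception):
    """Hypothetical exception that survives a pickle round trip."""

    def __init__(self, response):
        self.response = response
        super().__init__("received {} HTTP response".format(response.status_code))

    def __reduce__(self):
        # Rebuild from the original constructor argument rather than self.args.
        return (type(self), (self.response,))


restored = pickle.loads(pickle.dumps(PicklableResponseError(FakeResponse(403))))
print(restored.args[0], restored.response.status_code)
```

Unlike the `isinstance(response, str)` branch, this keeps `response` as a response object after unpickling, at the cost of adding a `__reduce__` method to every affected exception class.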
Note: If anyone ever wants to attempt to fix this in the future, as of 23-02-2020, you will need to replace these exceptions:
|
Describe the bug
praw throws an AttributeError exception when a request comes back with status 403.
To Reproduce
Get a 403 response from a request that praw makes.
Expected behavior
praw raises a correctly formatted ResponseException (just like it does with 404 responses) instead of AttributeError.
Code/Logs
If I print response before the exception it is a string with the value of "received 403 HTTP response".
System Info
I am running Praw with multiprocessing. Each child process initializes its own praw.Reddit instance.