Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AttributeError exception when a request comes back with status 403. #1336

Closed
Majiick opened this issue Feb 2, 2020 · 21 comments
Closed

AttributeError exception when a request comes back with status 403. #1336

Majiick opened this issue Feb 2, 2020 · 21 comments
Labels
Bug Documentation Verified

Comments

@Majiick
Copy link

@Majiick Majiick commented Feb 2, 2020

Describe the bug
praw throws an AttributeError exception when a request comes back with status 403.

To Reproduce
Get a 403 response from a request that praw makes.

Expected behavior
praw raises correctly formatted ResponseException(just like it does with 404 responses) instead of AttributeError.

Code/Logs

Traceback (most recent call last):
  File "C:\Users\*\AppData\Local\Programs\Python\Python36-32\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "C:\Users\*\AppData\Local\Programs\Python\Python36-32\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\*\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 463, in _handle_results
    task = get()
  File "C:\Users\*\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "C:\Users\*\AppData\Local\Programs\Python\Python36-32\lib\site-packages\prawcore\exceptions.py", line 49, in __init__
    .format(response.status_code))
AttributeError: 'str' object has no attribute 'status_code'

If I print response before the exception it is a string with the value of "received 403 HTTP response".

System Info
I am running Praw with multiprocessing. Each child process initializes its own praw.Reddit instance.

  • OS: Windows 10 1903
  • Python: 3.6
  • PRAW Version: 6.5.1
@bboe
Copy link
Member

@bboe bboe commented Feb 2, 2020

Hi I see that threading is your stacktrace. Are you doing something with ThreadPool?

Additionally, this stacktrace doesn't give us any information about where it actually failed in your code. Can you reproduce without threading and include information about what the actual request was that resulted in the 403 body?

@PythonCoderAS PythonCoderAS added the Bug label Feb 2, 2020
@PythonCoderAS
Copy link
Contributor

@PythonCoderAS PythonCoderAS commented Feb 2, 2020

Please include the code that led to the error. If it's a threading call, include the function that's called by the thread.

@Majiick
Copy link
Author

@Majiick Majiick commented Feb 3, 2020

Hey @bboe, I am not sure how to reproduce without threading(multiprocessing) since I am pretty sure the 403 comes from spamming the API too much.

@PythonCoderAS

def get_all_comments(submission):
    submission.comments.replace_more(limit=None)
    ret = []
    for comment in submission.comments.list():
        ret.append(comment)
    return ret

def process_submission(submission_id: str):
    reddit = praw.Reddit(client_id='placeholder',
                        client_secret='secret',
                        user_agent='my user agent')

    submission = reddit.submission(id=str(submission_id))
    comments = get_all_comments(submission)
    write_submission_and_comments_to_file(submission, comments)


pool = multiprocessing.Pool(processes=55)
while True:
    ids = getIds()
    for id in work_ids:
            responses.append(pool.apply_async(process_submission, (base36.dumps(id), )))
    for r in responses:
            status = r.get()

This is not the actual code, I chopped it down to the basics and so it's more digestible.

@PythonCoderAS
Copy link
Contributor

@PythonCoderAS PythonCoderAS commented Feb 3, 2020

First of all you can just return submission.comments.list() directly, no need to loop over it again.

Second, can you add this code to the top of your code?

import logging
logging.basicConfig(level=logging.DEBUG)

@bboe
Copy link
Member

@bboe bboe commented Feb 3, 2020

I am not sure how to reproduce without threading(multiprocessing)

Unfortunately PRAW doesn't attempt to be threadsafe, so we're going to close this issue. If you can reproduce without using threads then please reply and we'll reopen it.

Thanks for reporting.

@bboe bboe closed this as completed Feb 3, 2020
@Majiick
Copy link
Author

@Majiick Majiick commented Feb 3, 2020

@bboe Alright thank you, I thought that using multiprocessing instead of the thread module would solve that issue.

@bboe
Copy link
Member

@bboe bboe commented Feb 3, 2020

I thought that using multiprocessing instead of the thread module would solve that issue.

Are you explicitly using a threadpool? If so, then that's problematic. If you're using multiprocessing with processes and not threads (confusing, I know), then things should work.

@Majiick
Copy link
Author

@Majiick Majiick commented Feb 3, 2020

I don't use threading anywhere to my knowledge. I use the multiprocessing.Pool with apply_async just like in the code snippet in one of the previous comments. It does say in the documentation that the Pool is a pool of processes, not threads.

https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool

if I print os.getppid() inside the function which creates a new PRAW instance (def process_submission) the processes are all different. So it is using different processes.

I don't know the internals of multiprocessing or how it forks the process on Windows, so there's probably some shared state somehow?

@bboe bboe reopened this Feb 3, 2020
@bboe
Copy link
Member

@bboe bboe commented Feb 3, 2020

Oh okay. multiprocessing must internally be using threads to keep track of something. So long as the PRAW code isn't being run across threads, things should be okay. It'd still be ideal to reproduce this code without multiple processes. Or at least can you narrow your code down significantly to the smallest reproducible chunk?

@PythonCoderAS
Copy link
Contributor

@PythonCoderAS PythonCoderAS commented Feb 4, 2020

@Majiick https://praw.readthedocs.io/en/latest/getting_started/logging.html

import logging

handler = logging.StreamHandler()
handler.setLevel(logging.DEBUG)
logger = logging.getLogger('prawcore')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

This will let us see what API call caused the error. Place this at top and re-run your file.

@Majiick
Copy link
Author

@Majiick Majiick commented Feb 4, 2020

Hi @bboe and @PythonCoderAS,

The issue seems to be that pickling custom exceptions in Python is broken. Such is the case in prawcore/exceptions.py. You can see the explanation in https://stackoverflow.com/questions/27993567/custom-exceptions-are-not-raised-properly-when-used-in-multiprocessing-pool

Problem in the code:

class ResponseException(PrawcoreException):
    """Indicate that there was an error with the completed HTTP request."""

    def __init__(self, response):
        """Initialize a ResponseException instance.
        :param response: A requests.response instance.
        """
        self.response = response
        # __init__ receives a string, and later the pickling calls ResponseException(str) which is why we get AttributeError: 'str' object has no attribute 'status_code'
        super(ResponseException, self).__init__(
            "received {} HTTP response".format(response.status_code)
        )

The fix is to do this instead:

super(ResponseException, self).__init__(response)
        self.response = response
        self.e_string = "Message: {}".format('received {} HTTP response'.format(response.status_code))

Code to reproduce:

import multiprocessing
import logging
import base36
import praw
import prawcore

handler = logging.StreamHandler()
handler.setLevel(logging.DEBUG)
logger = logging.getLogger('prawcore')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

def process_submission(submission_id: str):  # blob_queue
    reddit = praw.Reddit(client_id='replace',
                        client_secret='replace',
                        user_agent='my user agent')

    try:
        submission = reddit.submission(id=str(submission_id))
        submission.comments.replace_more(limit=None)
        comments = submission.comments.list()
    except prawcore.exceptions.NotFound:
        return '404' 

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=55)
    while True:
        try:
            pool.apply_async(process_submission, ('lghtj', ))
        except prawcore.exceptions.Forbidden as e:
            print(e)

Edit: I realize I just leaked my Reddit dev keys in this comment, I revoked access to them in my Reddit account.

@PythonCoderAS
Copy link
Contributor

@PythonCoderAS PythonCoderAS commented Feb 5, 2020

@Majiick i read your code and it seems that basically I have to set request to a value of None. However, I think you can change your code to use a ThreadPool, since most of the code's time is spent on network. Verify that a ThreadPool does not not cause the error.

@PythonCoderAS PythonCoderAS added the Verified label Feb 5, 2020
@Majiick
Copy link
Author

@Majiick Majiick commented Feb 5, 2020

@Majiick i read your code and it seems that basically I have to set request to a value of None. However, I think you can change your code to use a ThreadPool, since most of the code's time is spent on network. Verify that a ThreadPool does not not cause the error.

Using a ThreadPool would cause other issues since Praw is not thread-safe, is that not correct?

@PythonCoderAS
Copy link
Contributor

@PythonCoderAS PythonCoderAS commented Feb 6, 2020

@Majiick it works perfectly fine as long as each thread makes their own Reddit.

@Majiick
Copy link
Author

@Majiick Majiick commented Feb 6, 2020

@Majiick it works perfectly fine as long as each thread makes their own Reddit.

The documentation says to stick with multiple processing instead of threads, but I will try with threads.

https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html

In theory having a unique Reddit instance for each thread should work. However, until someone perpetually volunteers to be PRAW’s thread safety instructor, little to no support will go toward any PRAW issues that could be affected by the use of multiple threads. Consider using multiple processes instead.

@PythonCoderAS
Copy link
Contributor

@PythonCoderAS PythonCoderAS commented Feb 6, 2020

The problem with multiple threads is the access of shared objects. In theory, as long as no objects are shared, or shared as strings, there shouldn’t be any problem.

@Majiick
Copy link
Author

@Majiick Majiick commented Feb 10, 2020

Hi @PythonCoderAS,

I've ran it for a while with multiple threads and it seems to be doing fine (and takes up much less memory!).

Anyway, the workaround is to use threads but the multiprocessing is still broken.

@PythonCoderAS
Copy link
Contributor

@PythonCoderAS PythonCoderAS commented Feb 10, 2020

The best way to resolve this is to include a note stating that Reddit instances should never be shared and all other instances should be converted to strings before sharing. My export PR can come in handy for this.

@PythonCoderAS PythonCoderAS added the Documentation label Feb 14, 2020
@PythonCoderAS
Copy link
Contributor

@PythonCoderAS PythonCoderAS commented Feb 23, 2020

@Majiick upon further testing it's only found in multiprocessing Pools. multiprocessing.Process works normally.

Code

import dataclasses
import multiprocessing
import threading
import time

class MyError(Exception):
    pass

@dataclasses.dataclass
class MyItem:
    prop: object

class MyArgException(MyError):
    def __init__(self, arg):
        print("Arg Type: {}, Arg: {}".format(type(arg), arg))
        print("*"*10)
        time.sleep(1)

        super().__init__("Arg: {}".format(arg.prop))
        self.arg = arg

def main(*args):
    raise MyArgException(MyItem(5))

proc = multiprocessing.Process(target=main, args=(5,))
proc.start()
# Process, runs normally
time.sleep(5)
print("#"*25)
pool = multiprocessing.Pool(5)
thread = threading.Thread(target=pool.map, args=(main, [5]))
thread.start()
# Pool, fails
time.sleep(5)
print("#"*25)
# Main thread
main(5)

Output

Arg Type: <class '__main__.MyItem'>, Arg: MyItem(prop=5)
**********
Process Process-1:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "~/Library/Preferences/PyCharm2019.3/scratches/scratch_3.py", line 23, in main
    raise MyArgException(MyItem(5))
MyArgException: Arg: 5
#########################
Arg Type: <class '__main__.MyItem'>, Arg: MyItem(prop=5)
**********
Arg Type: <class 'str'>, Arg: Arg: 5
**********
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 470, in _handle_results
    task = get()
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "~/Library/Preferences/PyCharm2019.3/scratches/scratch_3.py", line 19, in __init__
    super().__init__("Arg: {}".format(arg.prop))
AttributeError: 'str' object has no attribute 'prop'

#########################
Arg Type: <class '__main__.MyItem'>, Arg: MyItem(prop=5)
**********
Traceback (most recent call last):
  File "~/Library/Preferences/PyCharm2019.3/scratches/scratch_3.py", line 34, in <module>
    main(5)
  File "~/Library/Preferences/PyCharm2019.3/scratches/scratch_3.py", line 23, in main
    raise MyArgException(MyItem(5))
__main__.MyArgException: Arg: 5

@PythonCoderAS
Copy link
Contributor

@PythonCoderAS PythonCoderAS commented Feb 23, 2020

Me and @jarhill0 have identified a solution to the error, but it is very cumbersome to implement. Basically, this code:

class ResponseException(PrawcoreException):
    """Indicate that there was an error with the completed HTTP request."""

    def __init__(self, response):
        """Initialize a ResponseException instance.
        :param response: A requests.response instance.
        """
        self.response = response
        super(ResponseException, self).__init__(
            "received {} HTTP response".format(response.status_code)
        )

becomes

class ResponseException(PrawcoreException):
    """Indicate that there was an error with the completed HTTP request."""

    def __init__(self, response):
        """Initialize a ResponseException instance.
        :param response: A requests.response instance.
        """
        self.response = response
        if isinstance(response, str):
            super(ResponseException, self).__init__(response)
        else:
            super(ResponseException, self).__init__(
                "received {} HTTP response".format(response.status_code)
            )

However, this is hard to implement, and as it is an edge case (Multiprocessing Pool) with a very easy solution, (use Thread Pool), this will not be fixed. Documentation will be changed to reflect this.

@PythonCoderAS
Copy link
Contributor

@PythonCoderAS PythonCoderAS commented Feb 23, 2020

Note: If anyone ever wants to attempt to fix this in the future, as of 23-02-2020, you will need to replace these exceptions:

  • prawcore.exceptions.RequestException,
  • prawcore.exceptions.ResponseException,
  • prawcore.exceptions.OAuthException,
  • praw.exceptions.APIException,
  • prawcore.exceptions.BadJSON,
  • prawcore.exceptions.BadRequest,
  • prawcore.exceptions.Conflict,
  • prawcore.exceptions.Forbidden,
  • prawcore.exceptions.InsufficientScope,
  • prawcore.exceptions.InvalidToken,
  • prawcore.exceptions.NotFound,
  • prawcore.exceptions.Redirect,
  • prawcore.exceptions.ServerError,
  • prawcore.exceptions.SpecialError,
  • prawcore.exceptions.TooLarge,
  • prawcore.exceptions.UnavailableForLegalReasons,
  • praw.exceptions.WebSocketException

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Documentation Verified
Projects
None yet
Development

No branches or pull requests

3 participants