Pickle fails on BeautifulSoup's navigableString instances #45222

Closed
taleinat opened this issue Jul 19, 2007 · 4 comments
Labels
stdlib Python modules in the Lib dir

Comments

@taleinat
Contributor

BPO 1757062
Nosy @birkenfeld, @taleinat
Files
  • bug-175062.py
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2008-05-13.19:25:07.172>
    created_at = <Date 2007-07-19.18:23:56.000>
    labels = ['library']
    title = "Pickle fails on BeautifulSoup's navigableString instances"
    updated_at = <Date 2008-05-13.19:25:07.170>
    user = 'https://github.com/taleinat'

    bugs.python.org fields:

    activity = <Date 2008-05-13.19:25:07.170>
    actor = 'georg.brandl'
    assignee = 'nnorwitz'
    closed = True
    closed_date = <Date 2008-05-13.19:25:07.172>
    closer = 'georg.brandl'
    components = ['Library (Lib)']
    creation = <Date 2007-07-19.18:23:56.000>
    creator = 'taleinat'
    dependencies = []
    files = ['8306']
    hgrepos = []
    issue_num = 1757062
    keywords = []
    message_count = 4.0
    messages = ['32531', '55250', '55252', '66796']
    nosy_count = 3.0
    nosy_names = ['georg.brandl', 'taleinat', 'altherac']
    pr_nums = []
    priority = 'normal'
    resolution = 'wont fix'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1757062'
    versions = []

    @taleinat
    Contributor Author

    Trying to pickle an instance of BeautifulSoup's NavigableString class fails with:
    "RuntimeError: maximum recursion depth exceeded"

    Diagnosis: when pickling such an instance, pickle recurses without bound until it hits the maximum recursion limit. This happens regardless of the protocol used.

    Possibly related to SF bug bpo-1581183: "pickle protocol 2 failure on int subclass"
    http://sourceforge.net/tracker/index.php?func=detail&aid=1581183&group_id=5470&atid=105470

    See http://mail.python.org/pipermail/idle-dev/2007-July/002600.html (originally a bug report for IDLE on the IDLE-dev list) for details (including how to recreate the error).

    Related IDLE bug report: bpo-1757057
    https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1757057&group_id=5470

    @taleinat added the "stdlib" (Python modules in the Lib dir) label on Jul 19, 2007
    @altherac
    Mannequin

    altherac mannequin commented Aug 24, 2007

    I started by isolating the smallest piece of code that triggers the error.
    If you play a bit with NavigableString, you will end up with the
    attached code.

    As expected, this program fails with "RuntimeError: maximum recursion
    depth exceeded".
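The attachment itself is not reproduced in this thread. As a rough sketch only, here is a Python 3 analogue of the kind of class involved. `EvilString` matches the name used in this comment, but its body is an assumption, with `str` and `__str__` standing in for Python 2's `unicode` and `__unicode__`. `try_pickle` is a hypothetical helper added for illustration.

```python
import pickle


class EvilString(str):
    """Hypothetical stand-in for the attached class (Python 3 analogue)."""

    def __str__(self):
        # Mirrors the problematic pattern: the conversion hook returns the
        # subclass instance itself instead of a plain built-in string.
        return self


def try_pickle(obj, protocol):
    """Return the name of the exception raised by pickle.dumps, or None."""
    try:
        pickle.dumps(obj, protocol)
    except Exception as exc:  # report the failure instead of crashing
        return type(exc).__name__
    return None


s = EvilString("hello")
# str() hands back the very same EvilString object, not an exact str.
print(str(s) is s, type(str(s)).__name__)  # True EvilString
```

Calling `try_pickle(s, 0)` from a script is where the recursion error discussed in this thread would be expected to surface on Python 3. That call is left out of the sketch because the outcome can vary with the Python version and with how the class is made importable.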
    The evil recursion proceeds as follows:

    > File "C:\Python25\lib\pickle.py", line 1364, in dump
    > Pickler(file, protocol).dump(obj)

    Initial call to dump(), as intended.

    > File "C:\Python25\lib\pickle.py", line 224, in dump
    > self.save(obj)

    save() calls obj.__reduce_ex__(), obj being our EvilString instance.

    This call ends up in _reduce_ex(), defined in copy_reg.py at line 58, which, following my
    example, returns a tuple containing three elements:

    1. the _reconstructor function, as defined in copy_reg.py, line 46
    2. a tuple: (<class '__main__.EvilString'>, <type 'unicode'>,
      <'__main__.EvilString' instance at 0xXXXXXXXX>)
      The first element is the actual class of obj, the second is the base class,
      and the third is the current instance (known as state).
    3. an empty dict {}
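The shape of that return value can be inspected without running pickle at all, by calling `__reduce_ex__` directly. This is a sketch for Python 3, where `str` plays the role of `unicode`. `PlainSub`, `EvilString` and `describe_reduce` are illustrative names, and because the length of the returned tuple can differ between Python versions, only its first two elements are looked at.

```python
import copyreg


class PlainSub(str):
    """Illustrative well-behaved str subclass."""


class EvilString(str):
    """Illustrative subclass whose __str__ returns the instance itself."""

    def __str__(self):
        return self


def describe_reduce(obj):
    """Call obj.__reduce_ex__(0) and unpack the parts discussed above."""
    rv = obj.__reduce_ex__(0)
    func, args = rv[0], rv[1]
    cls, base, state = args
    return func, cls, base, state


func, cls, base, state = describe_reduce(PlainSub("abc"))
# Well-behaved case: the state is an exact built-in str.
print(func is copyreg._reconstructor, cls.__name__, base.__name__,
      type(state).__name__)

evil = EvilString("abc")
_, _, evil_base, evil_state = describe_reduce(evil)
# Misbehaving case: the "state" is the original instance again.
print(evil_base.__name__, evil_state is evil)
```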

    > File "C:\Python25\lib\pickle.py", line 331, in save
    > self.save_reduce(obj=obj, *rv)

    save_reduce() calls self.save() twice:

    • first on the func argument, which is the _reconstructor function. This
      call works as intended
    • next on the tuple (<class '__main__.EvilString'>, <type 'unicode'>,
      <'__main__.EvilString' instance at 0xXXXXXXXX>)

    > File "C:\Python25\lib\pickle.py", line 403, in save_reduce
    > save(args)
    > File "C:\Python25\lib\pickle.py", line 286, in save
    > f(self, obj) # Call unbound method with explicit self

    save() finds that its argument is a tuple and calls save_tuple()
    accordingly.

    > File "C:\Python25\lib\pickle.py", line 564, in save_tuple
    > save(element)

    ... and save_tuple() calls save() on each element of the tuple.
    See what's wrong?
    This means calling save() again on the EvilString instance, which in
    turn calls save_reduce() on it, and so on.
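The cycle described above can be imitated with a bounded loop instead of real recursion. This is only a sketch under assumptions: it targets Python 3 (`str` in place of `unicode`), `count_reduce_cycles` is a hypothetical helper meant for subclasses of built-in types, and it follows just the state element of the args tuple rather than everything pickle would save.

```python
class EvilString(str):
    """Illustrative subclass whose __str__ returns the instance itself."""

    def __str__(self):
        return self


def count_reduce_cycles(obj, limit=5):
    """Follow the obj -> args tuple -> state chain, as save() would.

    Returns how many times a non-builtin state came back and had to be
    reduced again, stopping at `limit` so the loop stays bounded.
    """
    cycles = 0
    current = obj
    while cycles < limit:
        args = current.__reduce_ex__(0)[1]  # (cls, base, state)
        state = args[2]
        if type(state) is args[1]:
            # State is an exact instance of the built-in base: pickle can
            # write it directly and the chain ends here.
            break
        cycles += 1
        current = state  # pickle would now call save() on this element
    return cycles


print(count_reduce_cycles(EvilString("abc")))              # 5: hits the limit
print(count_reduce_cycles(type("Ok", (str,), {})("abc")))  # 0: ends at once
```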

    The problem lies in _reduce_ex(), in the definition of the state of the
    object:

    copy_reg.py, lines 65 to 70:
    if base is object:
        state = None
    else:
        if base is self.__class__:
            raise TypeError, "can't pickle %s objects" % base.__name__
        state = base(self)

    When this code gets executed on an EvilString instance, base is the type
    'unicode'.
    Since base is neither object nor the actual class EvilString, the
    following line gets executed:
    state = base(self)

    This corresponds to unicode(self), which calls self.__unicode__(), and that
    returns an EvilString instance rather than a plain unicode object.
    And there the recursion starts.

    I don't know if this is a flaw in the design of _reduce_ex, or a flaw
    inherent to having __unicode__(self) return self.
    My guess is that it is the latter.
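If that guess is right, the fix belongs in the subclass rather than in pickle. A minimal sketch of what a compliant class could look like on Python 3, again with `str`/`__str__` as stand-ins for `unicode`/`__unicode__`; `FixedString` is a hypothetical name, not BeautifulSoup's actual code.

```python
class FixedString(str):
    """Hypothetical corrected subclass: __str__ returns an exact str."""

    def __str__(self):
        # str.__str__ returns a plain built-in string for subclass instances.
        return str.__str__(self)


s = FixedString("hello")
state = s.__reduce_ex__(0)[1][2]  # third element of the (cls, base, state) tuple
# Both the conversion result and the pickle state are now exact str objects,
# so there is nothing left that would need to be reduced again.
print(type(str(s)).__name__, type(state).__name__)  # str str
```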

    @birkenfeld
    Member

    This is indeed tricky. The docs say __unicode__ "should return a Unicode
    object", so I'm inclined to blame BeautifulSoup.

    Asking Neal for a second opinion.

    @birkenfeld
    Member

    Closing as "won't fix".

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022