
Multiprocessing Pool starmap - struct.error: 'i' format requires -2e10<=n<=2e10 #72692

Closed
JustinTing mannequin opened this issue Oct 22, 2016 · 6 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

JustinTing mannequin commented Oct 22, 2016

BPO 28506
Nosy @tim-one, @serhiy-storchaka
Superseder
  • bpo-17560: problem using multiprocessing with really big objects?
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-10-23.11:17:07.794>
    created_at = <Date 2016-10-22.16:30:52.596>
    labels = ['type-bug', 'library']
    title = "Multiprocessing Pool starmap - struct.error: 'i' format requires -2e10<=n<=2e10"
    updated_at = <Date 2017-10-23.11:17:07.793>
    user = 'https://bugs.python.org/JustinTing'

    bugs.python.org fields:

    activity = <Date 2017-10-23.11:17:07.793>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-10-23.11:17:07.794>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2016-10-22.16:30:52.596>
    creator = 'Justin Ting'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 28506
    keywords = []
    message_count = 6.0
    messages = ['279200', '279201', '279202', '279203', '279233', '304793']
    nosy_count = 3.0
    nosy_names = ['tim.peters', 'serhiy.storchaka', 'Justin Ting']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '17560'
    type = 'behavior'
    url = 'https://bugs.python.org/issue28506'
    versions = ['Python 3.5']


    JustinTing mannequin commented Oct 22, 2016

Multiprocessing is throwing this error when dealing with large amounts of data (all floating-point numbers and integers), none of which exceeds the number boundaries in the error that it throws:

    File "/root/anaconda3/lib/python3.5/multiprocessing/pool.py", line 268, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
    File "/root/anaconda3/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
    File "/root/anaconda3/lib/python3.5/multiprocessing/pool.py", line 385, in _handle_tasks
    put(task)
    File "/root/anaconda3/lib/python3.5/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
    File "/root/anaconda3/lib/python3.5/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
    struct.error: 'i' format requires -2147483648 <= number <= 2147483647

    /root/anaconda3/lib/python3.5/multiprocessing/connection.py(393)_send_bytes()
    -> header = struct.pack("!i", n)

    It works fine on any number of subsets of this data, but not when put together.
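The bound in the error message is that of a 4-byte signed integer (2**31 - 1), which is the length header `multiprocessing`'s connection layer writes before each message. A minimal sketch (not the original data) that reproduces the same `struct.error` directly:

```python
import struct

# The connection header is packed as a 4-byte signed big-endian int,
# so the pickled payload length must fit in 31 bits (about 2 GiB).
INT32_MAX = 2**31 - 1

print(struct.pack("!i", INT32_MAX))  # fine: largest encodable length

try:
    struct.pack("!i", INT32_MAX + 1)  # one byte past the limit
except struct.error as exc:
    print(exc)  # same "'i' format requires ..." message as the traceback
```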

    @JustinTing JustinTing mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Oct 22, 2016

    tim-one commented Oct 22, 2016

    This has nothing to do with the _values_ you're passing - it has to do with the length of the pickle string:

        def _send_bytes(self, buf):
            n = len(buf)
            # For wire compatibility with 3.2 and lower
        header = struct.pack("!i", n)  # <-- IT'S BLOWING UP HERE
            if n > 16384:
                ...
                self._send(header)
                self._send(buf)

    where the traceback shows it's called here:

        self._send_bytes(ForkingPickler.dumps(obj))

    Of course the less data you're passing, the smaller the pickle, and that's why it doesn't blow up if you pass subsets of the data.

    I'd suggest rethinking how you're sharing data, as pushing two-gigabyte pickle strings around is bound to be the least efficient way possible even if it didn't blow up ;-)
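A hedged sketch of the suggestion above: instead of pickling a huge array through the Pool's pipe, hand each worker a lightweight task description (a file path plus a slice) and let the worker load the data itself. The file name and `process_slice` are hypothetical, not from the original report.

```python
from multiprocessing import Pool

import numpy as np

def process_slice(path, start, end):
    # Each worker memory-maps the array and touches only its slice,
    # so nothing large ever crosses the process boundary.
    arr = np.load(path, mmap_mode="r")
    return float(arr[start:end].sum())

if __name__ == "__main__":
    # Hypothetical stand-in data; the real array would already exist.
    np.save("features.npy", np.arange(4000, dtype=np.float64))
    jobs = [("features.npy", i * 1000, (i + 1) * 1000) for i in range(4)]
    with Pool(4) as pool:
        print(pool.starmap(process_slice, jobs))
```

Only a short tuple per job is pickled, so the 2 GiB header limit never comes into play no matter how large the array on disk is.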

    @serhiy-storchaka

This looks like a duplicate of bpo-17560.


    JustinTing mannequin commented Oct 22, 2016

Ah, should have picked that up; coding at 3:30am doesn't do wonders for keeping a clear head.

    Thanks Tim, I'll keep that in mind!




    JustinTing mannequin commented Oct 23, 2016

Actually, on further inspection, I seem to be having a slightly different problem that produces the same error I initially described.

Even after modifying my code so that each forked Python process was given only the following arguments:
    args = [(None, models_shape, False, None, [start, end], 'data/qp_red_features.npy') for start, end in jobs]

    where models_shape, start, and end are only single integers, the same error still comes up as a result. Within each process, I'm reading in a (relatively small, only 12MB) .npy ndarray and taking the [start:end] slice.
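One thing worth checking here (a hedged diagnostic, not a confirmed cause): the same 4-byte length header is used for results travelling back from workers, so a large *return value* can raise the identical `struct.error` even when the arguments are tiny. Measuring the pickle yourself shows which side is oversized. The args tuple below mirrors the one in the comment (with `(3, 4)` as a placeholder for `models_shape`); the array is a hypothetical stand-in for a large result.

```python
import pickle

import numpy as np

args = (None, (3, 4), False, None, [0, 10], 'data/qp_red_features.npy')
print(len(pickle.dumps(args)))   # a few dozen bytes: the args are fine

result = np.zeros(1_000_000)     # hypothetical large return value
print(len(pickle.dumps(result))) # ~8 MB here; a result whose pickle
                                 # exceeds 2**31 - 1 bytes would hit
                                 # the same header limit on the way back
```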

    @serhiy-storchaka

    Closed as a duplicate of bpo-17560.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022