Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-34410: itertools.tee not thread-safe; can segfault #9254

Closed
wants to merge 2 commits into from

Conversation

hongweipeng
Copy link
Contributor

@hongweipeng hongweipeng commented Sep 13, 2018

This is a resolution without lock. I think it's not right in the demo 3.py , because it will use the original iterable in imap_unordered -> _guarded_task_generation.

imap_unordered_demo.py

import itertools
import multiprocessing

def f(x):
    return x

def g(x):
    return x

def main():
    pool = multiprocessing.Pool()

    #i = pool.imap_unordered(f, list(range(10)))
    i = list(range(10))

    i, i2 = itertools.tee(i)

    results = pool.imap_unordered(g, i)
    i2 = pool.imap_unordered(g, i2)

    print(list(i2))

    for x in results:
        print(x)

if __name__ == '__main__':
    main()

https://bugs.python.org/issue34410

@tirkarthi
Copy link
Member

Related PR with locking approach : #9075

Thanks

@hongweipeng
Copy link
Contributor Author

In addition, users should ensure that the critical section is thread-safe on their own python code.

The patch make tee to traverse each element returned by PyIter_Next. But if the elements returned in PyIter_Next are missing, for example:

def __next__(self):
    self.i += 1
    return self.i

Here self.i may return duplicate values due to multithreading, which will cause the elements to be missed. Users need to lock themselves, such as:

def __next__(self):
    with self.lock:
        self.i += 1
        return self.i

@rhettinger rhettinger removed their assignment Aug 22, 2019
@rhettinger
Copy link
Contributor

Serhiy, would you please take a look at this?

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a NEWS entry.

This approach can fix a crash, but it can lead to emitting items out of order (e.g. 1, 2, 4, 3, ...). This should be explicitly documented in the documentation of tee().

Also an exception can be raised even if there are cached values. For example, the original iterator emits 1, 2, and then raise an error. In thread A you get 1, then in thread B you get 1 and 2, then in thread A you get an error instead of 2.


if (value != NULL) {
teedataobject *temp = tdo;
while (temp->numread + 1 > LINKCELLS) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
while (temp->numread + 1 > LINKCELLS) {
while (temp->numread >= LINKCELLS) {

}
temp->values[temp->numread] = value;
temp->numread++;
}else if (i == tdo->numread) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}else if (i == tdo->numread) {
}
else if (i == tdo->numread || PyErr_Occurred()) {

@rhettinger
Copy link
Contributor

Closing in favor of PR 15567

@rhettinger rhettinger closed this Aug 29, 2019
@serhiy-storchaka
Copy link
Member

#15567

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants