BUG: race condition in Index.is_unique #21150

Open
adbull opened this Issue May 21, 2018 · 3 comments

adbull commented May 21, 2018

Code Sample, a copy-pastable example if possible

Input:

from concurrent.futures import ThreadPoolExecutor
import pandas as pd

x = pd.date_range('2001', '2020')
with ThreadPoolExecutor(2) as p:
    assert all(p.map(lambda x: x.is_unique, [x]*2))

Output:

Traceback (most recent call last):
  File "bug.py", line 7, in <module>
    assert all(p.map(lambda x: x.is_unique, [x]*2))
AssertionError

Problem description

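Calling Index.is_unique on the same Index from multiple threads simultaneously can return the wrong answer. In the example above, at least one call returns a falsy result even though every value in the date range is distinct.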

Expected Output

The assertion shouldn't raise: is_unique should be True in every thread.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.14-200.fc26.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C
LANG: C
LOCALE: None.None

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 4.2.1
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.2
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None

Contributor

jreback commented May 21, 2018

Virtually nothing in pandas is thread-safe, so you have to be very careful. You are welcome to submit a patch.

Contributor

jreback commented May 21, 2018

see #2728

kinow added 5 commits to kinow/pandas that referenced this issue (two on Jul 21, 2018, then Jul 27, Oct 11, and Oct 12, 2018), each with the message:

Fix for issue pandas-dev#21150, using a simple lock to prevent an issue with multiple threads accessing an Index
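
The referenced commits describe their approach as a simple lock. A caller-side version of the same idea might look like the sketch below. The names `index_lock` and `is_unique_locked` are illustrative and not part of pandas, and the sketch assumes every thread reaches the Index through the helper; it does nothing for code paths that touch the Index directly.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

index = pd.date_range('2001', '2020')

# Hypothetical module-level lock; any access that might populate the
# Index's internal caches is routed through it.
index_lock = threading.Lock()


def is_unique_locked(idx):
    """Read idx.is_unique while holding the lock, so calls are serialised."""
    with index_lock:
        return idx.is_unique


with ThreadPoolExecutor(2) as pool:
    results = list(pool.map(is_unique_locked, [index] * 2))
```

Because the calls are serialised, the first one finishes before the second starts. The cost is that the threads no longer overlap on this particular operation.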
Contributor

batterseapower commented Feb 21, 2019

It's not surprising that modifying a frame while using it from multiple threads is unsafe, but it's kind of weird that "read-only" operations like is_unique can break. In my experience at least, this race in Index is the only part of pandas that seems to go wrong when using a DataFrame in a read-only fashion from several threads. I guess there might be a few other lazily initialised caches hanging around in other parts of the codebase, though.
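
To illustrate the kind of cache being described, here is a minimal, stdlib-only sketch. `LazyUniqueness` is a hypothetical class and not pandas code. It only shows how a property that looks read-only can write to shared state on first access, and how a lock keeps other threads from seeing that state half-built.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LazyUniqueness:
    """Hypothetical stand-in for an object with a lazily built cache."""

    def __init__(self, values):
        self._values = list(values)
        self._lock = threading.Lock()
        self._seen = None      # cache, built on first access
        self._unique = None    # answer derived from the cache

    @property
    def is_unique(self):
        # Looks like a read, but the first call mutates the object.
        # Holding the lock while the cache is built means no other
        # thread can observe it in a partially populated state.
        with self._lock:
            if self._unique is None:
                self._seen = set(self._values)
                self._unique = len(self._seen) == len(self._values)
            return self._unique


obj = LazyUniqueness(range(100_000))
with ThreadPoolExecutor(4) as pool:
    answers = list(pool.map(lambda o: o.is_unique, [obj] * 4))
```

An alternative with the same effect is to build the cache eagerly in `__init__`, which trades construction time for lock-free reads.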
