DataFrame __getitem__ ~100X slower on Pandas 0.19.1 vs 0.18.1 possibly getitem caching? #14930
Comments
jreback added this to the 0.19.2 milestone Dec 20, 2016
I have a fix for this.
I changed this 3c96442 to make it lazy
jreback added Indexing Performance Regression labels Dec 20, 2016
jreback referenced this issue Dec 20, 2016
PERF: fix getitem unique_check / initialization issue #14933 (Closed)
dragoljub commented Dec 20, 2016
@jreback Thanks for looking into this. Can you provide some info about Was this regression due to something that improved performance of row-based index but possibly hurt columnar access?
See the referenced issues. They explain all.
@dragoljub short answer is this. When the The solution was to do our uniqueness checking at the same time as we are already checking for sortedness (e.g. the However it seems that this was each time constructing the index mapping (bad). cc @llllllllll
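As a rough illustration of the "lazy" idea discussed in this thread, here is a minimal sketch. `ToyIndexEngine` and all of its attributes are hypothetical names made up for this example; this is not pandas' actual engine code. It only shows the general pattern: a uniqueness check that does not build the label-to-position mapping, and a mapping that is built once, on first lookup, instead of on every access.

```python
class ToyIndexEngine:
    """Hypothetical stand-in for an index engine; not pandas' real class."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.mapping_builds = 0   # counts how often the expensive step runs
        self._mapping = None      # built lazily, on first lookup
        self._is_unique = None    # computed lazily, then cached

    def _build_mapping(self):
        # The expensive step: construct the label -> position mapping.
        self.mapping_builds += 1
        return {label: pos for pos, label in enumerate(self.labels)}

    @property
    def is_unique(self):
        # Check uniqueness without touching the mapping; cache the answer.
        if self._is_unique is None:
            self._is_unique = len(set(self.labels)) == len(self.labels)
        return self._is_unique

    def get_loc(self, label):
        # Build the mapping only if no earlier lookup has done so.
        if self._mapping is None:
            self._mapping = self._build_mapping()
        return self._mapping[label]


engine = ToyIndexEngine(["a", "b", "c"])
for _ in range(1000):
    assert engine.is_unique
    assert engine.get_loc("b") == 1

print(engine.mapping_builds)  # 1
```

In this toy version, a thousand uniqueness checks and lookups trigger a single mapping build. A variant that rebuilt the mapping inside `is_unique` would pay that cost on every access, which is the kind of behavior described above as "(bad)".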
@jreback I took a look at your proposed fix. I think it looks correct, but I think there might be a simpler fix that just makes
dragoljub commented Dec 21, 2016
@jreback Thanks for the explanation and quick response time on this.
jreback closed this in 07c83ee Dec 21, 2016
jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this issue Dec 24, 2016
5110eaf (jreback + jorisvandenbossche)
ShaharBental added a commit to ShaharBental/pandas that referenced this issue Dec 26, 2016
c36844f (jreback + ShaharBental)
dragoljub commented Dec 20, 2016
Problem description
It appears that get_item_cache() or __contains__ may have something to do with it. This affects other functionality such as df.info(), which is now also ~100X slower.
Expected Output
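The slowdown described under "Problem description" can be checked with a small timing script run under each pandas version. The sketch below is not the code from the original report; the frame shape, column names, and the helper `time_column_getitem` are assumptions chosen for illustration. It measures the average time of a single `df[col]` lookup, so the printed numbers from two installations can be compared directly.

```python
import timeit

import numpy as np
import pandas as pd


def time_column_getitem(n_rows=1000, n_cols=100, repeats=1000):
    """Return the average seconds per df[col] lookup on a synthetic frame."""
    # Hypothetical frame shape; the data from the original report is not shown.
    df = pd.DataFrame(
        np.random.randn(n_rows, n_cols),
        columns=["col_{}".format(i) for i in range(n_cols)],
    )
    # Time repeated column access, which goes through DataFrame.__getitem__.
    total = timeit.timeit(lambda: df["col_0"], number=repeats)
    return total / repeats


per_lookup = time_column_getitem()
print("pandas {}: {:.1f} microseconds per df[col]".format(
    pd.__version__, per_lookup * 1e6))
```

Running the same script in a 0.18.1 environment and a 0.19.1 environment gives two figures to compare; absolute timings depend on the machine.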
Output of pd.show_versions()
pandas: 0.19.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.10.1
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.7.1
IPython: 4.2.0
sphinx: 1.3.6
patsy: 0.4.0
dateutil: 2.5.0
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.2.5
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: None
lxml: 3.6.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None