PERF/COMPAT: define platform int to np.intp #13972
codecov-io
commented
Aug 12, 2016
Current coverage is 85.29% (diff: 100%)
@@             master   #13972   diff @@
==========================================
  Files          139      139
  Lines        50219    50245    +26
  Methods          0        0
  Messages         0        0
  Branches         0        0
==========================================
+ Hits         42819    42854    +35
+ Misses        7400     7391     -9
  Partials         0        0
sinhrks
added Performance Reshaping Windows
labels
Aug 12, 2016
jreback
commented on the diff
Aug 12, 2016
@@ -767,6 +767,40 @@ Note that the limitation is applied to ``fill_value`` which default is ``np.nan`
 - Bug in ``SparseSeries.abs`` incorrectly keeps negative ``fill_value`` (:issue:`13853`)
 - Bug in single row slicing on multi-type ``SparseDataFrame``s, types were previously forced to float (:issue:`13917`)
+Indexer dtype Changes
+^^^^^^^^^^^^^^^^^^^^^
jreback
commented on an outdated diff
Aug 12, 2016
@@ -767,6 +767,40 @@ Note that the limitation is applied to ``fill_value`` which default is ``np.nan`
 - Bug in ``SparseSeries.abs`` incorrectly keeps negative ``fill_value`` (:issue:`13853`)
 - Bug in single row slicing on multi-type ``SparseDataFrame``s, types were previously forced to float (:issue:`13917`)
+Indexer dtype Changes
+^^^^^^^^^^^^^^^^^^^^^
+
+.. note::
+
+   This change only affects 64 bit python running on Windows, and only affects relatively advanced
+   indexing operations
+
+Methods such as ``Index.get_indexer`` that return an indexer array coerce that array to a "platform int", so that it can be
+directly used in 3rd party library operations like ``numpy.take``. Previously, a platform int was defined as ``np.int_``
+which corresponds to a C integer - but the correct type, and what is being used now, is ``np.intp``, which corresponds
jreback
commented on an outdated diff
Aug 12, 2016
+
+Previous behaviour:
+
+.. code-block:: ipython
+
+   In [1]: i = pd.Index(['a', 'b', 'c'])
+
+   In [2]: i.get_indexer(['b', 'b', 'c']).dtype
+   Out[2]: dtype('int32')
+
+New behaviour:
+
+.. ipython :: python
+
+   i = pd.Index(['a', 'b', 'c'])
+   i.get_indexer(['b', 'b', 'c']).dtype
jreback
Contributor
yes this looks reasonable. go ahead and make ready on windows. Also add an asv as above.

pls update the docs as indicated. otherwise lgtm.

What else were you looking for in the docs? I did add an issue ref in the first paragraph.

ok this looks fine. can you run this on 32-bit linux as well and see that it comes up clean (IIRC you test on windows so that should be good).
jreback
added this to the
0.19.0
milestone
Aug 15, 2016
The changes I just pushed were to get this passing on 32 bit windows. I don't have anything handy to test 32 bit Linux. I can probably set up a VM, although I think it should generally act the same as 32 bit Windows (
chris-b1
referenced
this pull request
Aug 15, 2016
Closed
CLN/PERF: remove ndarray.take and platform int conversions #13924
ok, I'll run on 32 bit tomorrow; I just use a macosx / vm

This currently fails on 32-bit linux (before your PR). I recall I had to do something on windows to get this to pass. Something wrong somewhere. Your PR fixes this! yeah!
So as-is this passes on linux32 (and fixes the above). lmk when good to go.

Cool - fixed the lint issue so this should be good to go.
chris-b1
changed the title from
PERF/COMPAT: define platform int to np.intp (WIP) to PERF/COMPAT: define platform int to np.intp
Aug 16, 2016
jreback
closed this
in 0780443
Aug 17, 2016
thanks!
chris-b1 commented Aug 12, 2016
AFAIK this only affects 64 bit python on Windows.
numpy wants an np.intp (i8 on Windows) as an indexer for take, but pandas defines a "platform int" as an np.int_ (i4 on Windows). This hits performance twice, because we often start with i8, cast to i4, then numpy will cast back to i8 in its take.
This is an alternative to #13924 - there I explored replacing ndarray.take with our take_nd fully, but this approach solves the perf problem without much pain or risking new segfaults.
I'd still need to adjust a bunch of tests here to pass on Windows. This is an API change for "advanced" methods like get_indexer, but I don't think anything is necessary beyond a doc note?
ASV:
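Separately from the ASV numbers, the size mismatch described in this comment can be checked on any machine. Below is a minimal sketch using only the standard library's ctypes. It assumes, as was the case when this PR was written, that np.int_ maps to a C long and np.intp to a pointer-sized integer. needs_recast is a hypothetical helper for illustration, not part of pandas or numpy.

```python
import ctypes

# Size of a C long: the type np.int_ was based on when this PR was written
# (assumption stated in the lead-in).
c_long_bytes = ctypes.sizeof(ctypes.c_long)

# Size of a pointer: np.intp is the signed integer of this width,
# which is what numpy expects for an indexer passed to take.
pointer_bytes = ctypes.sizeof(ctypes.c_void_p)


def needs_recast(platform_int_bytes, indexer_bytes):
    """Hypothetical helper: True when an indexer stored at
    platform_int_bytes would have to be cast again before indexing."""
    return platform_int_bytes != indexer_bytes


print("C long:", c_long_bytes, "bytes; pointer:", pointer_bytes, "bytes")
print("extra cast with the old definition:", needs_recast(c_long_bytes, pointer_bytes))
```

On 64 bit Windows the two sizes are 4 and 8 bytes, so the old definition implies the extra cast; on 64 bit Linux and macOS both are 8 bytes, and on 32 bit platforms both are 4, which matches the note that only 64 bit python on Windows is affected.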