Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Add uint64 support to IntervalTree #20651

Merged
merged 2 commits into from
Apr 11, 2018

Conversation

jschendel
Copy link
Member

Note that this doesn't fully close the linked issue; this just resolves the uint64 case, which was relatively straightforward. The datetimelike case still needs to be resolved, but seems distinct enough for a different PR.

@@ -24,6 +24,7 @@ ctypedef fused scalar_t:
float32_t
int64_t
int32_t
uint64_t
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely sure if this is necessary

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine. Even if your tests pass without it, I would still keep it for consistency.

@jreback jreback added Bug Dtype Conversions Unexpected or buggy dtype conversions Interval Interval data type labels Apr 11, 2018
@jreback jreback added this to the 0.23.0 milestone Apr 11, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment, otherwise lgtm.

np.array([0, 4, -1], dtype='int64'))
with pytest.raises(KeyError):
tree.get_indexer(np.array([3.0]))
self.tree = tree('int64')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob don't need setup_method this if you pass tree into other methods

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fixture makes more sense if you plan on using more than one dtype IMO. As it stands, I think we can move the tree initialization inside setup_method (and remove the tree input to the tests).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tree fixture is using the dtype fixture when passed as a parameter to tests, so it's generating trees for all dtypes in the dtype fixture. Initializing self.tree like that is just a hack I used to create a single instance of a int64 tree, since there was one existing test that only used int64. The test passes for all dtypes though, so removed setup_method and just passed in the tree fixture.

@codecov
Copy link

codecov bot commented Apr 11, 2018

Codecov Report

Merging #20651 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20651      +/-   ##
==========================================
- Coverage   91.83%   91.81%   -0.03%     
==========================================
  Files         153      153              
  Lines       49269    49270       +1     
==========================================
- Hits        45247    45238       -9     
- Misses       4022     4032      +10
Flag Coverage Δ
#multiple 90.2% <ø> (-0.03%) ⬇️
#single 41.9% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/plotting/_converter.py 65.07% <0%> (-1.74%) ⬇️
pandas/core/indexes/datetimes.py 95.73% <0%> (ø) ⬆️
pandas/util/testing.py 84.59% <0%> (+0.2%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e6aa1c...4e0ec35. Read the comment docs.

remove setup_method, parametrize additional test
@jreback jreback merged commit 2794474 into pandas-dev:master Apr 11, 2018
@jreback
Copy link
Contributor

jreback commented Apr 11, 2018

thanks @jschendel

@jreback
Copy link
Contributor

jreback commented Apr 11, 2018

side issue: do we have tests constructing (and raising) non-supported dtypes, e.g. int16. They raise on directly constructing an IntervalTree, so should add some tests for this (I believe have tests directly constructing an II)

@jreback
Copy link
Contributor

jreback commented Apr 11, 2018

this is generating lots of build warnings

building 'pandas._libs.interval' extension
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/jreback/miniconda3/envs/pandas/include -arch x86_64 -I/Users/jreback/miniconda3/envs/pandas/include -arch x86_64 -Ipandas/_libs/src/klib -Ipandas/_libs/src -I/Users/jreback/miniconda3/envs/pandas/lib/python3.6/site-packages/numpy/core/include -I/Users/jreback/miniconda3/envs/pandas/include/python3.6m -c pandas/_libs/interval.c -o build/temp.macosx-10.7-x86_64-3.6/pandas/_libs/interval.o -Wno-unused-function
In file included from pandas/_libs/interval.c:525:
In file included from /Users/jreback/miniconda3/envs/pandas/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4:
In file included from /Users/jreback/miniconda3/envs/pandas/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:18:
In file included from /Users/jreback/miniconda3/envs/pandas/lib/python3.6/site-packages/numpy/core/include/numpy/ndarraytypes.h:1816:
/Users/jreback/miniconda3/envs/pandas/lib/python3.6/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: "Using deprecated NumPy API, disable it by "          "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by " \
 ^
pandas/_libs/interval.c:13323:33: warning: comparison of integers of different signs: 'Py_ssize_t' (aka 'long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
  for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
                      ~~~~~~~~~ ^ ~~~~~~~~~
pandas/_libs/interval.c:13580:33: warning: comparison of integers of different signs: 'Py_ssize_t' (aka 'long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
  for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
                      ~~~~~~~~~ ^ ~~~~~~~~~
pandas/_libs/interval.c:13837:33: warning: comparison of integers of different signs: 'Py_ssize_t' (aka 'long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
  for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
                      ~~~~~~~~~ ^ ~~~~~~~~~
pandas/_libs/interval.c:14094:33: warning: comparison of integers of different signs: 'Py_ssize_t' (aka 'long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
  for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
                      ~~~~~~~~~ ^ ~~~~~~~~~
pandas/_libs/interval.c:14351:33: warning: comparison of integers of different signs: 'Py_ssize_t' (aka 'long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
  for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
                      ~~~~~~~~~ ^ ~~~~~~~~~
pandas/_libs/interval.c:15478:33: warning: comparison of integers of different signs: 'Py_ssize_t' (aka 'long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
  for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
                      ~~~~~~~~~ ^ ~~~~~~~~~
pandas/_libs/interval.c:15734:33: warning: comparison of integers of different signs: 'Py_ssize_t' (aka 'long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
  for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
                      ~~~~~~~~~ ^ ~~~~~~~~~
pandas/_libs/interval.c:15990:33: warning: comparison of integers of different signs: 'Py_ssize_t' (aka 'long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
  for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
                      ~~~~~~~~~ ^ ~~~~~~~~~
pandas/_libs/interval.c:16246:33: warning: comparison of integers of different signs: 'Py_ssize_t' (aka 'long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
  for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
                      ~~~~~~~~~ ^ ~~~~~~~~~
pandas/_libs/interval.c:16502:33: warning: comparison of integers of different signs: 'Py_ssize_t' (aka 'long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
  for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
                      ~~~~~~~~~ ^ ~~~~~~~~~
pandas/_libs/interval.c:67694:139: warning: comparison of integers of different signs: '__pyx_t_5numpy_int32_t' (aka 'int') and '__pyx_t_5numpy_uint64_t' (aka 'unsigned long') [-Wsign-compare]
      __pyx_t_8 = ((*((__pyx_t_5numpy_int32_t *) ( /* dim=0 */ (__pyx_v_self->left.data + __pyx_t_11 * __pyx_v_self->left.strides[0]) ))) <= __pyx_v_point);
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ^  ~~~~~~~~~~~~~
pandas/_libs/interval.c:67697:36: warning: comparison of integers of different signs: '__pyx_t_5numpy_uint64_t' (aka 'unsigned long') and '__pyx_t_5numpy_int32_t' (aka 'int') [-Wsign-compare]
        __pyx_t_8 = (__pyx_v_point < (*((__pyx_t_5numpy_int32_t *) ( /* dim=0 */ (__pyx_v_self->right.data + __pyx_t_12 * __pyx_v_self->right.strides[0]) ))));
                     ~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pandas/_libs/interval.c:67740:34: warning: comparison of integers of different signs: '__pyx_t_5numpy_uint64_t' (aka 'unsigned long') and '__pyx_t_5numpy_int32_t' (aka 'int') [-Wsign-compare]
    __pyx_t_13 = ((__pyx_v_point < __pyx_v_self->pivot) != 0);
                   ~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~
pandas/_libs/interval.c:67788:138: warning: comparison of integers of different signs: '__pyx_t_5numpy_int32_t' (aka 'int') and '__pyx_t_5numpy_uint64_t' (aka 'unsigned long') [-Wsign-compare]
        __pyx_t_13 = ((!(((*((__pyx_t_5numpy_int32_t *) ( /* dim=0 */ (__pyx_v_values.data + __pyx_t_17 * __pyx_v_values.strides[0]) ))) <= __pyx_v_point) != 0)) != 0);
                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ^  ~~~~~~~~~~~~~

mapehe pushed a commit to mapehe/pandas that referenced this pull request Apr 12, 2018
@jschendel
Copy link
Member Author

jschendel commented Apr 13, 2018

Looking into the warnings. The warnings I get are similar but slightly different,; guessing it's a compiler thing. I have a fix that addresses the warnings, but it causes an OverflowError instead of a KeyError when querying negative values on a uint64 dtype IntervalTree. Trying to find a way to gracefully handle it.

do we have tests constructing (and raising) non-supported dtypes, e.g. int16. They raise on directly constructing an IntervalTree, so should add some tests for this

Also added a test for this on my warnings fix branch, and made some minor changes to yield a more informative error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Interval Interval data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants