Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix build warnings #27157

Merged
merged 24 commits into from Jul 2, 2019
Merged

Fix build warnings #27157

merged 24 commits into from Jul 2, 2019

Conversation

ghost
Copy link

@ghost ghost commented Jul 1, 2019

Cleans up a whole bunch. Remaining offenders are:

  • Parsers.pyx and tokenizer. Signed/unsigned hell, made worse by python/cython/c mixing.
  • interval.pyx, not touched because large PR in progress in API: Implement new indexing behavior for intervals #27100.
  • np_datetime.c and np_datetime_strings.c. These are linked into 15+ cython modules as extra object files, but they '#include <datetime.h>`, which expects only standalone modules, who properly init to do it. Kinda painful to fix, though it means each function is compiled and kept in memory 15 times.

@gfyoung gfyoung added the Build Library building on various platforms label Jul 1, 2019
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brief review from phone can dive deep later. Good to take a look and clean these up for sure.

Can you post the warnings that you are resolving?

pandas/_libs/groupby_helper.pxi.in Outdated Show resolved Hide resolved
pandas/_libs/hashtable.pxd Outdated Show resolved Hide resolved
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment

pandas/_libs/hashtable.pyx Outdated Show resolved Hide resolved
@@ -352,7 +352,7 @@ cdef class IndexEngine:

cdef Py_ssize_t _bin_search(ndarray values, object val) except -1:
cdef:
Py_ssize_t mid, lo = 0, hi = len(values) - 1
Py_ssize_t mid = 0, lo = 0, hi = len(values) - 1
Copy link
Author

@ghost ghost Jul 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a real bug I think. mid is referenced in the return value if len(values)=1 so that the main loop immediately falls through.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this would fail if len(values) == 0 right? Do you see a code sample to actually trigger that?

Copy link
Author

@ghost ghost Jul 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quite possibly this is meant to be "caller checks". But that's out of the compiler's view.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea sure. Similar comment though - good to suppress compiler warning but if there's an actual error here (which I think would still happen without the warning) would be good to address

Copy link
Author

@ghost ghost Jul 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if values is empty, the loop falls through, util.get_value_at(values, mid) gets called, which calls validate_indexer which raises. I think it's ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which raises

cython code can have weird corner cases where exceptions get ignored. Adding a test for this case seems worthwhile.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting. example?

Copy link
Author

@ghost ghost Jul 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh. I think you've nailed something here. So it's not exactly weird corner cases, but cdef functions which return exception as if they were python functions.
https://cython.readthedocs.io/en/latest/src/userguide/language_basics.html#error-return-values

validate_indexer and get_value_at are both cdef. validate_indexer is tagged except -1, so any exception translates into a -1 retval. But the caller get_value_at doesn't check that and uses -1 to reference into the the array. in this case an empty one. but it doesn't have an except return value defines. So, I'm not sure what happens.

If an exception is detected in such a function, a warning message is 
printed and the exception is ignored.

OTOH, the index constructor refuses to create empty indexes, so an end-to-end test does not seem possible. and get_value_at is used all over the place.

Hmm. will have to think about this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [18]: %%cython
    ...: 
    ...: cdef int foo() except -1:
    ...:     1/0
    ...:     cdef:
    ...:         int mid = 42
    ...:     return mid
    ...:     
    ...: cdef int bar():
    ...:     res = foo()
    ...:     return res
    ...:     
    ...: def baz():
    ...:     return bar()

In [19]: baz()
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
_cython_magic_a591cd36cd8e49116d6e370949ee24c2.pyx in _cython_magic_a591cd36cd8e49116d6e370949ee24c2.foo()

ZeroDivisionError: float division
Exception ignored in: '_cython_magic_a591cd36cd8e49116d6e370949ee24c2.bar'
Traceback (most recent call last):
  File "_cython_magic_a591cd36cd8e49116d6e370949ee24c2.pyx", line 3, in _cython_magic_a591cd36cd8e49116d6e370949ee24c2.foo
ZeroDivisionError: float division
Out[19]: 0

very interesting bug.

pandas/_libs/lib.pyx Outdated Show resolved Hide resolved
pandas/_libs/src/parser/tokenizer.c Show resolved Hide resolved
pandas/_libs/lib.pyx Outdated Show resolved Hide resolved
pandas/_libs/lib.pyx Show resolved Hide resolved
pandas/_libs/lib.pyx Show resolved Hide resolved
@@ -352,7 +352,7 @@ cdef class IndexEngine:

cdef Py_ssize_t _bin_search(ndarray values, object val) except -1:
cdef:
Py_ssize_t mid, lo = 0, hi = len(values) - 1
Py_ssize_t mid = 0, lo = 0, hi = len(values) - 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this would fail if len(values) == 0 right? Do you see a code sample to actually trigger that?

pandas/_libs/hashtable_func_helper.pxi.in Show resolved Hide resolved
@ghost
Copy link
Author

ghost commented Jul 1, 2019

Green and I think I've addressed all the comments.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nits other lgtm

pandas/_libs/src/parser/tokenizer.c Outdated Show resolved Hide resolved
pandas/_libs/hashtable.pyx Outdated Show resolved Hide resolved
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this is good. Maybe one comment to look at

pandas/_libs/src/parser/tokenizer.c Show resolved Hide resolved
@jbrockmendel
Copy link
Member

np_datetime.c and np_datetime_strings.c. These are linked into 15+ cython modules as extra object files, but they '#include <datetime.h>`, which expects only standalone modules, who properly init to do it. Kinda painful to fix, though it means each function is compiled and kept in memory 15 times.

One more for my edification. Is this because they are specified in setup.py, or because of cimports from np_datetime.pyx, or [...]? How costly is the "compile and kept in memory 15 times"?

@jreback jreback added this to the 0.25.0 milestone Jul 1, 2019
@ghost
Copy link
Author

ghost commented Jul 1, 2019

One more for my edification. Is this because they are specified in setup.py, or because of cimports
from np_datetime.pyx, or [...]? How costly is the "compile and kept in memory 15 times"?

I came across your comment in the file:

// TODO(anyone): If we can get PyDateTime_IMPORT to work, we could use
// PyDateTime_Check here, and less verbose attribute lookups.

The issue is that it's supposed to be an independent cython/cpython module, and imported by other cython modules. Then it could call IMPORT, and a few other warnings would go away.

Instead these files are compiled into object files, and linked into each module that uses them. there's no crash because each module does call IMPORT, but the compiler can't see that.

I don't think compilation is an issue, because:

  1. its not a large file
  2. cython generates monstrosities, so any true c pales
  3. On most dev machine, you have ccache, so (maybe, possibly) the compilation result is cached
  4. depending on how distutils compiles things, it's linked 15 times, but the object is the same dependency for all of them in the Makefile sense, so compiled only once/

as for memory waste it's 2019, so no not an issue.

In summary: more of a smell than a perf/resource issue.

@jreback jreback merged commit 527e714 into pandas-dev:master Jul 2, 2019
@jreback
Copy link
Contributor

jreback commented Jul 2, 2019

ok thanks

@ghost ghost deleted the fix_warnings branch July 2, 2019 01:18
@ghost ghost mentioned this pull request Jul 11, 2019
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Library building on various platforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants