Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: rolling with MSVC 2017 build #21813

Merged
merged 12 commits into from Jul 26, 2018

Conversation

Projects
None yet
8 participants
@chris-b1
Copy link
Contributor

chris-b1 commented Jul 8, 2018

  • closes #21786
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

For now just trying 3.7 on appveyor, not sure if all necessary conda packages are there

PYTHON_VERSION: "3.7"
PYTHON_ARCH: "64"
CONDA_PY: "37"
CONDA_NPY: "113"

This comment has been minimized.

@jreback

jreback Jul 8, 2018

Contributor

can you make this 114?

This comment has been minimized.

@jreback

jreback Jul 8, 2018

Contributor

this also very likely won't work, very little is built for 3.7 :<

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

TomAugspurger commented Jul 8, 2018

The c3i_test channel may have some missing packages.

@codecov

This comment has been minimized.

Copy link

codecov bot commented Jul 8, 2018

Codecov Report

Merging #21813 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21813      +/-   ##
==========================================
+ Coverage   91.99%      92%   +<.01%     
==========================================
  Files         167      170       +3     
  Lines       50578    50553      -25     
==========================================
- Hits        46530    46510      -20     
+ Misses       4048     4043       -5
Flag Coverage Δ
#multiple 90.4% <ø> (ø) ⬆️
#single 42.21% <ø> (+0.04%) ⬆️
Impacted Files Coverage Δ
pandas/io/formats/console.py 65.15% <0%> (-4.55%) ⬇️
pandas/core/frame.py 97.19% <0%> (ø) ⬆️
pandas/core/generic.py 96.47% <0%> (ø) ⬆️
pandas/core/internals/blocks.py 94.46% <0%> (ø)
pandas/core/internals/concat.py 98.37% <0%> (ø)
pandas/core/internals/managers.py 96.52% <0%> (ø)
pandas/core/indexes/base.py 96.37% <0%> (ø) ⬆️
pandas/core/arrays/categorical.py 95.95% <0%> (ø) ⬆️
pandas/io/formats/excel.py 97.39% <0%> (+0.01%) ⬆️
pandas/core/reshape/merge.py 94.16% <0%> (+0.05%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0828c25...5d40eae. Read the comment docs.

@chris-b1

This comment has been minimized.

Copy link
Contributor Author

chris-b1 commented Jul 9, 2018

Well, good news is we got a repro of this on our CI - it's not a python 3.7 issue, but instead a Visual Studio 2017 one.
https://ci.appveyor.com/project/pandas-dev/pandas/build/1.0.15961/job/r9axp415no6jubnr#L2059

Still not convinced this isn't a compiler bug, but having trouble to get a simple case to reproduce - one option would be to build our wheels VS 2015 - but annoying, b/c python forces the newest version, so can't have 2017 installed. I'll keep trying some things.

@chris-b1 chris-b1 changed the title BUG: rolling with 3.7 and windows BUG: rolling with MSVC 2017 build Jul 11, 2018

@chris-b1

This comment has been minimized.

Copy link
Contributor Author

chris-b1 commented Jul 11, 2018

This is kind of horrible, but it does fix the problem (just a lint issue I pushed for) and I'm out of ideas - any thoughts or suggestions @jreback, @TomAugspurger ?

@@ -654,16 +654,21 @@ cdef inline void add_var(double val, double *nobs, double *mean_x,
double *ssqdm_x) nogil:
""" add a value from the var calc """
cdef double delta

This comment has been minimized.

@jreback

jreback Jul 11, 2018

Contributor

doing

if val != val:
    return

# rest of code

might work (and be more clear here)

This comment has been minimized.

@chris-b1

chris-b1 Jul 12, 2018

Author Contributor

Great suggestion, sadly didn't work

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

TomAugspurger commented Jul 12, 2018

@chris-b1 could you summarize the issue? You think MSVC is just optimizing away if val == val, which isn't true when val is NaN?

@chris-b1

This comment has been minimized.

Copy link
Contributor Author

chris-b1 commented Jul 12, 2018

That's right, optimizing maybe not the right word, but somehow the add_var function is being turned into a no-op, the mean_x, ssqdm_x, and nobs variables don't get updated. Adding any kind of unconditional action to the function fixes it - originally discovered by noticing it works with a print statement thrown in.

And yes, val==val is a not-a-NaN check. We use it all over the place in this module, so it's not as if that check is universally being treated wrong, but something about this particular context makes MSVC think it's unreachable or something.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

TomAugspurger commented Jul 12, 2018

Ok, thanks...

@jjhelmus do we know anyone at Microsoft who works on their compilers that might be able to provide insight on #21813 (comment) and #21813 (comment)?

@jjhelmus

This comment has been minimized.

Copy link
Contributor

jjhelmus commented Jul 13, 2018

@zooba might be able to connect you to the right team at Microsoft

@zooba

This comment has been minimized.

Copy link
Contributor

zooba commented Jul 13, 2018

I can try, but I haven't been having as much luck recently.

Since this is Cython, can someone post the generated code for the comparison? There's a much better chance of getting something useful if they're looking at the code they actually compile, and not the code someone else is compiling.

@chris-b1

This comment has been minimized.

Copy link
Contributor Author

chris-b1 commented Jul 13, 2018

It's Cython, so of course the bigger file is a tangled mess, but the particular function is easy to read - lines of gist here.
https://gist.github.com/chris-b1/cd974fb6b4165285a48b6d6c12129b40#file-window-cpp-L11028-L11096

Called in a few places, but for instance, here is a problem one.
https://gist.github.com/chris-b1/cd974fb6b4165285a48b6d6c12129b40#file-window-cpp-L11948

Cython isn't doing anything particular interesting - it's a cdef function so basically translates line for line into c++.

I can post it later, not on this machine, but I've also tried to write a simpler, plain c++ example that triggers the problem, but it always worked - really not sure why.

@zooba

This comment has been minimized.

Copy link
Contributor

zooba commented Jul 13, 2018

I can reproduce it with this snippet (has a few tricks to try and defeat optimisations, but they are likely unnecessary):

#include <math.h>
#include <stdio.h>

double get_value(int i) {
    return i > 1 ? NAN : 7;
}

int check_default(double v1, double v2) {
    return ((v1 == v2) != 0);
}

#pragma float_control(precise, off, push)
int check_fast(double v1, double v2) {
    return ((v1 == v2) != 0);
}
#pragma float_control(pop)


int main(int argc, char **argv) {
    double x1 = get_value(argc);
    double x2 = get_value(argc);
    fprintf(stdout, "default: %d\nfast:    %d\n",
        check_default(x1, x2), check_fast(x1, x2)
    );
}

The comparison doesn't behave properly when you disable precise floating point operations. You may be doing this in the build with /fp:fast (if you build this snippet with that option, both checks return '1'; if you build with the default /fp:precise, it works). In fact, the docs for this flag specifically call out NaN comparisons as something that only work in precise mode.

So I'd guess you need to update the compiler flags for this one, or possibly add an explicit isnan() call.

@jbrockmendel

This comment has been minimized.

Copy link
Member

jbrockmendel commented Jul 14, 2018

Is the MSVC info something that can be inferred from platform.uname (at compile-time)?

@chris-b1

This comment has been minimized.

Copy link
Contributor Author

chris-b1 commented Jul 15, 2018

Thanks for taking a look at this @zooba - unfortunately that's not exactly the same problem I'm seeing here, the compiled code is acting as if (v1 == v2) always evaluates to False/0. We are building with /fp:precise (via defaults, but I've also tried manually overriding)

@chris-b1

This comment has been minimized.

Copy link
Contributor Author

chris-b1 commented Jul 15, 2018

Interestingly using isnan from <math.h> does solve the problem here - still feels like MSVC is doing something wrong in the former case, but that's a reasonable enough workaround.

@zooba

This comment has been minimized.

Copy link
Contributor

zooba commented Jul 16, 2018

the compiled code is acting as if (v1 == v2) always evaluates to False/0

Do you have a disassembly showing this? This would be very strange to see as a compiler optimization (unless you mean (v == v) always evaluates to True/1?)

b/c python forces the newest version, so can't have 2017 installed

Just saw this while rereading the thread. If you start with all the tools on PATH and set DISTUTILS_USE_SDK=1 then it'll skip the autodetection and use whatever you provide. (And if you help encourage the ecosystem to get off distutils completely, then more of these problems go away :) )

@jbrockmendel

This comment has been minimized.

Copy link
Member

jbrockmendel commented Jul 16, 2018

@chris-b1

This comment has been minimized.

Copy link
Contributor Author

chris-b1 commented Jul 16, 2018

@zooba - do you know how to pull symbols into the dissambly from a .pyd? I was playing around with dumpbin / compile flags, but never got anything better than, e.g. below, where everything is still in offsets.

 dumpbin.exe /DISASM pandas\_libs\window.cp36-win_amd64.pyd /OUT:tmp.txt

image

@chris-b1

This comment has been minimized.

Copy link
Contributor Author

chris-b1 commented Jul 16, 2018

do you know how to pull symbols into the dissambly from a .pyd

Answering my own question - I was missing '/DEBUG' on extra_link_args

@chris-b1

This comment has been minimized.

Copy link
Contributor Author

chris-b1 commented Jul 16, 2018

Alright @zooba - I think I have this to a reasonable bit of a dis-assembly - pushing the bounds of my knowledge but looks like a code-generation bug.

Here's MSVC 2015 (printfs surround a call to add_var) which works correctly. In particular the highlighted lines are the NaN check.
https://gist.github.com/chris-b1/1049fdabafc56b2a25de7ad377016baa#file-msvc2015-asm-L6-L8

And here is MSVC 2017. The function call doesn't get inlined - half of the NaN check gets done (jn) - then the stack is being set up, then the other half the check (jne). I believe in setting up the stack, the ZF flag originally set by ucomisd gets set back to 0 (from the sub) so the jne check will always be true, and the function unconditionally exits early.
https://gist.github.com/chris-b1/199f818c1fbb8f8c4108aa9e430df4f7#file-msvc2017-asm-L15-L19

@zooba

This comment has been minimized.

Copy link
Contributor

zooba commented Jul 16, 2018

@chris-b1 Yep, you're absolutely right. Thanks for tracking that down, I'll pass it on!

@zooba

This comment has been minimized.

Copy link
Contributor

zooba commented Jul 16, 2018

Here is the public side of the report: https://developercommunity.visualstudio.com/content/problem/294290/incorrect-codegen-for-double-equality.html Feel free to add pings there whenever you like

@chris-b1

This comment has been minimized.

Copy link
Contributor Author

chris-b1 commented Jul 16, 2018

Thanks @zooba!

and thanks @jbrockmendel - that's what I'll have to do to fix the 2.7 build, more modern MSVCs are basically standards compliant and have isnan where it should be.

@jreback

This comment has been minimized.

Copy link
Contributor

jreback commented Jul 17, 2018

does using the fix proposed in #21940 help here? e.g. npy_isnan?

@chris-b1

This comment has been minimized.

Copy link
Contributor Author

chris-b1 commented Jul 17, 2018

does using the fix proposed in #21940 help here? e.g. npy_isnan?

It does (latest commit has that and is passing on appveyor) - but less sure we want to apply that universally, I'll try perf on @jbrockmendel PR later, suspect it may be worse on Windows.

@chris-b1 chris-b1 force-pushed the chris-b1:roll-var-37 branch from 8decc5a to f0628c0 Jul 22, 2018

chris-b1 added some commits Jul 22, 2018

...

@jreback jreback added this to the 0.23.4 milestone Jul 25, 2018

@jreback

This comment has been minimized.

Copy link
Contributor

jreback commented Jul 25, 2018

@chris-b1 this looks good. can you rebase once more.

perf looks ok? (even if its slightly worse this is a bug :>)

Go ahead and merge on green.

@chris-b1

This comment has been minimized.

Copy link
Contributor Author

chris-b1 commented Jul 26, 2018

@jreback thanks, yeah perf will be fine, just slightly slower on Windows for this particular function

@chris-b1 chris-b1 merged commit 7a2fbce into pandas-dev:master Jul 26, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@meeseeksdev

This comment has been minimized.

Copy link

meeseeksdev bot commented Jul 26, 2018

There seem to be a conflict, please backport manually

minggli added a commit to minggli/pandas that referenced this pull request Jul 28, 2018

Merge branch 'master' into bugfix/replace_recursion
* master:
  BENCH: asv csv reading benchmarks no longer read StringIO objects off the end (pandas-dev#21807)
  BUG: df.agg, df.transform and df.apply use different methods when axis=1 than when axis=0 (pandas-dev#21224)
  BUG: bug in GroupBy.count where arg minlength passed to np.bincount must be None for np<1.13 (pandas-dev#21957)
  CLN: Vbench to asv conversion script (pandas-dev#22089)
  consistent docstring (pandas-dev#22066)
  TST: skip pytables test with not-updated pytables conda package (pandas-dev#22099)
  CLN: Remove Legacy MultiIndex Index Compatibility (pandas-dev#21740)
  DOC: Reword doc for filepath_or_buffer in read_csv (pandas-dev#22058)
  BUG: rolling with MSVC 2017 build (pandas-dev#21813)

@TomAugspurger TomAugspurger referenced this pull request Jul 30, 2018

Closed

RLS: 0.23.4 #22128

@jreback

This comment has been minimized.

Copy link
Contributor

jreback commented Jul 30, 2018

cc @chris-b1 do you want to see if you can backprot to 23.x branch?

TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request Aug 2, 2018

BUG: rolling with MSVC 2017 build (pandas-dev#21813)
* Appveyor 3.7

* ci package list

* change image type

* try hack fix

* lint

* use isnan on problem function

* use numpy compat isnan

* use right isnan

* work around OSX math undefs

* cleanup const

* fix reversion

* ...

(cherry picked from commit 7a2fbce)

TomAugspurger added a commit that referenced this pull request Aug 3, 2018

BUG: rolling with MSVC 2017 build (#21813)
* Appveyor 3.7

* ci package list

* change image type

* try hack fix

* lint

* use isnan on problem function

* use numpy compat isnan

* use right isnan

* work around OSX math undefs

* cleanup const

* fix reversion

* ...

(cherry picked from commit 7a2fbce)

alimcmaster1 added a commit to alimcmaster1/pandas that referenced this pull request Aug 12, 2018

BUG: rolling with MSVC 2017 build (pandas-dev#21813)
* Appveyor 3.7

* ci package list

* change image type

* try hack fix

* lint

* use isnan on problem function

* use numpy compat isnan

* use right isnan

* work around OSX math undefs

* cleanup const

* fix reversion

* ...

@WillAyd WillAyd added the Window label Sep 4, 2018

Sup3rGeo added a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

BUG: rolling with MSVC 2017 build (pandas-dev#21813)
* Appveyor 3.7

* ci package list

* change image type

* try hack fix

* lint

* use isnan on problem function

* use numpy compat isnan

* use right isnan

* work around OSX math undefs

* cleanup const

* fix reversion

* ...

@TomAugspurger TomAugspurger referenced this pull request Oct 16, 2018

Merged

Windows CI #23182

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.