Random appveyor x86 build failures #792

Closed
jagerman opened this Issue Apr 9, 2017 · 21 comments

2 participants
@jagerman
Member

jagerman commented Apr 9, 2017

I'm seeing occasional random build failures on the AppVeyor x86 builds, always during the linking stage. I suspect we're running into memory limits, perhaps combined with the addition of the /m flag for parallel building and AppVeyor's new multi-CPU support. (I haven't yet seen a failure for exactly the same builds on https://ci.appveyor.com/project/jagerman/pybind11.) If this persists, perhaps we should turn off the /m flag? (I'm assuming, though I may be wrong, that builds default to 2 cores for the main project and 1 for my free open-source account.)

@jagerman

Member

jagerman commented Apr 9, 2017

Another option: we could disable the Eigen tests on the two AppVeyor x86 builds; in practice, that seems to be the most resource-heavy test script.

@jagerman

Member

jagerman commented Apr 9, 2017

Example: pre-merge build and post-merge build, with no changes to master between the PR and the merge.

@wjakob

Member

wjakob commented Apr 9, 2017

Removing the /m flag sounds good to me, that doesn't really seem to have helped in any way.

@jagerman

Member

jagerman commented Apr 9, 2017

I removed it; we'll see if it helps.
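For readers unfamiliar with the flag, here is a minimal sketch of what dropping /m amounts to, assuming the build step invokes MSBuild through "cmake --build". The step below is hypothetical and is not the project's actual appveyor.yml; it only illustrates that /m is MSBuild's parallel-build switch and that "cmake --build" forwards everything after "--" to the native build tool.

```yaml
# Hypothetical AppVeyor build step (not copied from the project's appveyor.yml).
build_script:
  # Before: "--" passes /m through to MSBuild, which then builds
  # independent projects in parallel across the available cores.
  # - cmake --build . --config Release -- /m
  # After: without /m, MSBuild builds projects one at a time,
  # trading speed for lower peak memory use.
  - cmake --build . --config Release
```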

@jagerman

Member

jagerman commented Apr 9, 2017

Another failure without /m (this time on the jagerman project, not wjakob): https://ci.appveyor.com/project/jagerman/pybind11/build/1.0.529

And this strange one: https://ci.appveyor.com/project/wjakob/pybind11/build/1.0.1648/job/jrflasp8e5tsr95l

I have a feeling that there are some appveyor x86 issues.

@jagerman

Member

jagerman commented Apr 10, 2017

And another, which worked here.

@jagerman

Member

jagerman commented Apr 10, 2017

Now every PR is triggering it, even on x64. Um...

@wjakob

Member

wjakob commented Apr 10, 2017

Huh? I don't see that here.

@jagerman

Member

jagerman commented Apr 10, 2017

Okay, not every; but it did show up on the first and second x64 builds, so I guess this isn't purely x86-specific.

I'm trying to reproduce it in a Win 10 VM, watching the compiler and linker memory usage; no issues so far (max memory usage slightly over 500 MB), but I had only the VS 2017 RC installed. I'm updating now to investigate some more.

@jagerman

Member

jagerman commented Apr 11, 2017

I got an RDP connection to the AppVeyor VM immediately after a build failure had occurred on it, ran the build manually, and it completed successfully without error. I couldn't find anything in the logs to suggest a problem.

(For future reference, changing the on_failure to:

on_failure:
  - ps: $blockRdp = $true; iex ((new-object net.webclient).DownloadString('https://raw.githubusercontent.com/appveyor/ci/master/scripts/enable-rdp.ps1'))

will block the VM and print RDP connection info in the build log, but the VM only stays alive for 1 hour.)

@wjakob

Member

wjakob commented Apr 11, 2017

The RDP feature is neat, but I don't think sticking it into on_failure is a good idea because it will block all AppVeyor builds for 1 hour.

@wjakob

Member

wjakob commented Apr 11, 2017

I suspect that there is some kind of MSVC ICE (internal compiler error) Heisenbug. When cl.exe crashes with an ICE, the panic message is generally not picked up by AppVeyor (likely due to the custom Appveyor.MSBuildLogger.dll solution).

@jagerman

Member

jagerman commented Apr 11, 2017

Oh, for sure; I wasn't suggesting we add it, just making a note for how to use it.

@jagerman

Member

jagerman commented Apr 11, 2017

From the appveyor forum bug report:

I have a feeling that "Command exited with code -1" error might be something related to VS updates as 15.1 is not the latest version, but 15.1.1 was recently released: https://www.visualstudio.com/en-us/news/releasenotes/vs2017-relnotes

We are going to roll out image update with 15.1.1 in the coming days - will see if that fixes the issue.

@jagerman

Member

jagerman commented Apr 17, 2017

I disabled fast_finish (b4cbd7a) at least for now, so that the random failures don't prevent other jobs from running.
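A minimal sketch of the setting in question, assuming the standard appveyor.yml layout; the rest of the project's matrix definition is omitted and is not reproduced from commit b4cbd7a. With fast_finish enabled, AppVeyor finishes the whole build as soon as one job fails, so a spurious linker failure in one x86 job would hide the results of every other job.

```yaml
# Excerpt of an appveyor.yml matrix section (other matrix entries omitted).
matrix:
  # true: stop the build as soon as any one job fails.
  # false: let every job in the matrix run to completion,
  # so one random failure does not cancel the remaining jobs.
  fast_finish: false
```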

@jagerman

Member

jagerman commented Apr 18, 2017

Confirmation -- it's not just us!

@jagerman

Member

jagerman commented Apr 19, 2017

AppVeyor believes this is fixed now, as of about an hour ago; I'll close this, but will reopen it if the "Command exited with code -1" error comes up again!

@jagerman jagerman closed this Apr 19, 2017

@jagerman

Member

jagerman commented May 19, 2017

Appveyor believes this is fixed with the most recent update.

@jagerman jagerman closed this May 19, 2017
