WIP: windows builder provisioning #2

Closed
wants to merge 7 commits into from

tkelman
Contributor

@tkelman tkelman commented Sep 19, 2014

ref JuliaCI/julia-buildbot#3

Not fully ready yet, but in case you want to have a look at it

For now my Vagrantfile is pretty much just

Vagrant.configure("2") do |config|
  config.vm.box = "opentable/win-2012r2-standard-amd64-nocm"
  config.vm.communicator = "winrm" # important!
  config.vm.provision "shell", path: "support/cygwin_juliadev.ps1"
  config.vm.network "public_network", bridge: 'eth0'
end

But I think to be proper about licenses we will need to figure out how to build our own vagrant boxes from authorized evaluation images.

@tkelman
Contributor Author

tkelman commented Oct 7, 2014

@staticfloat let me know if you get a chance to look at this at some point. The PowerShell provisioning script here actually works: it builds Julia, runs the tests, and gives results consistent with what I see locally. I moved away from chocolatey and now just call the cygwin setup exe directly from the script.

The opentable box I'm using needs manual tweaking of the password policy. I reported the issue and they said they're in the process of fixing it. They used packer to generate the box, and their code for doing so is posted here: https://github.com/opentable/packer-images/tree/master/windows/templates/windows-2012R2-serverstandard-amd64

I haven't gone so far as to reproduce building a new box with packer yet; I guess that would be the next thing to try here.

@staticfloat
Owner

You've done a great amount of work here, @tkelman. I'll try to get to this on the weekend.

don't bother with chocolatey, it's being finicky

cyg-get was a pretty thin wrapper around the cygwin setup exe anyway
@tkelman
Contributor Author

tkelman commented Oct 9, 2014

I seem to have done something that started causing cygwin's perl to constantly segfault, so I decided to try using various packer templates that are out there to make a new box. This worked.

git clone https://github.com/joefitzgerald/packer-windows
# could maybe add this as a submodule?
cd packer-windows
# patch to disable windows updates
curl https://gist.githubusercontent.com/anonymous/499ebeff1dcaae77c2a1/raw | git apply
packer build windows_2012_r2.json
# wait an hour or two
cd .. # assuming we started here in julia-vagrant
cp win2012r2.vagrant Vagrantfile
vagrant up --no-provision && vagrant provision

Should be able to adjust the provisioning script to do parallel make, run the tests, etc (the backtrace test has been failing for win64 on master for some time, so if you get that far it's not new). I imagine it should be split up somehow into "install stuff and build the deps" in one step, then "build julia on an ongoing basis" as a separate setup.
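
One way to picture that split is a script with two separate entry points. The sketch below is illustrative only: the function names `phase_deps` and `phase_build`, the `RUN` hook, and the `JOBS` variable are hypothetical and not taken from the provisioning script in this PR. It assumes the usual Julia make targets (`make -C deps`, `make`, `make testall`) and, by default, only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: a one-time "build the deps" phase and a repeatable "build julia" phase.
# phase_deps, phase_build, RUN and JOBS are hypothetical names, not from the PR's script.

# RUN defaults to "echo", so commands are printed rather than executed.
# Set RUN to the empty string to really run them from a Julia source checkout.
RUN="${RUN-echo}"

phase_deps() {
    # One-time step: build the bundled dependencies.
    $RUN make -C deps
}

phase_build() {
    # Ongoing step: parallel build, then the full test suite.
    $RUN make -j"${JOBS:-2}"
    $RUN make testall
}
```

Calling `phase_deps` once when the box is created and `phase_build` on every subsequent run would keep the slow dependency build out of the regular cycle.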

@staticfloat
Owner

Since OpenStack is a KVM-based virtualization engine, I've started trying to use KVM with packer locally to build the images. That's the only way I could get CentOS working, for instance, because it could never survive the VMWare/VirtualBox -> KVM transition that I have set up for Ubuntu and friends. I don't know if Windows will have the same issues (I would guess probably not; Windows has been pretty resilient in the past for me), but it's something to watch out for. I'm trying it out right now to see if it will indeed make the transition.

@tkelman
Contributor Author

tkelman commented Oct 14, 2014

Hm, good to know. Are you running the buildbot VMs on openstack then?

This is actually working fairly well for me with VirtualBox, except I'm hitting an annoying 2-hour timeout somewhere. I've been digging around trying to find where it might be coming from, no conclusive leads yet. Vagrant just craps out (or maybe the provisioning command in powershell exits?) after roughly 2 hours, no matter what.

@staticfloat
Owner

That is annoying. I'm currently trying to build on vmware, we'll see how it goes.

@staticfloat
Owner

And yes, all of our build machines (except OSX) are on openstack. OSX won't virtualize under KVM.

@tkelman
Contributor Author

tkelman commented Oct 14, 2014

Gotcha. I do actually have an OSX VM (hackintosh bootloader) in VirtualBox on my Windows host, but it's so slow and inconvenient I can't really use it for anything Julia-related. Maybe if I packerized it I'd have an easier time.

With enough persistence and repeated attempts at provisioning, the Windows VM will eventually finish building the deps. With all deps in place, the full-build provisioning step should take about half an hour (with 2 cores for C compiling and test running) each for win32 and win64, less if you skip make dist or play it a little dangerous and just do make clean instead of make cleanall.
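
The "repeated attempts" part can be automated with a small retry wrapper. This is a minimal sketch, not something from the PR: the `retry` helper and the limit of 5 attempts are hypothetical, and it assumes the provisioning command exits non-zero when it hits the timeout.

```shell
#!/bin/sh
# retry MAX CMD...: run CMD until it succeeds, giving up after MAX attempts.
# Hypothetical helper; assumes CMD exits non-zero when it fails or times out.
retry() {
    max="$1"
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Usage (the limit of 5 is an arbitrary choice):
#   retry 5 vagrant provision
```

Because the deps build picks up where it left off, each retry should have less work to do than the one before.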

@staticfloat
Owner

My default template for builder VMs is 4 CPUs and 4GB of memory, so hopefully we can cut that down a little bit. I expect the windows VMs to be slower than Linux, but I will be sad if they take more than 20 minutes for a rebuild.

Also, some of your recent work just shed some light on why the OSX buildbot builds are so much slower than the Linux builds: some of the dependencies (such as openlibm) get rebuilt on OSX but not on Linux. I'll look into it one of these days, but I'm certain that install_name_tool is screwing us over somewhere.

@tkelman
Contributor Author

tkelman commented Oct 14, 2014

Some time-consuming steps are inherently serial right now: bootstrapping the sysimg, and (I think) packaging the installer. But more cores will help with the tests and the C part of the build. I'm also doing both win32 and win64 within the same VM, which you may want to split up, but that would take a lot of extra space, since the whole OS and all but 2 of the Cygwin packages (plus their dependencies) are shared between the two.

You know more about what install_name_tool is doing than I do, but maybe it has a preserve timestamp flag?

@staticfloat
Owner

Sigh, I tried converting the vmware box (which works just fine) to a kvm image and running it on OpenStack. It didn't work, probably because the transition was just too much for the poor thing. I guess the next thing I'll try is building a box using packer's qemu builder and seeing if I can get that to build.

@staticfloat
Owner

Alright, I had the opportunity to run through your steps manually (not using vagrant) on KVM. Your steps work nicely (I had to manually finagle Cygwin a bit, but nothing too terrible), and I have a running Windows 8.1 Pro VM right now. (I didn't go for Server because I didn't have easy access to an ISO to install from, but that is probably the superior choice.)

Julia builds, but fails the test suite (on 32-bit) due to some issue in Dates, but I don't think that's anything new:

julia> Dates.DateTime("[14:51:00.118]","[HH:MM:SS.sss]")
ERROR: InexactError()

@tkelman
Contributor Author

tkelman commented Nov 12, 2014

yay!

Can you remember what kind of finagling was required?

The test failure is quite odd, I'm not seeing that right now...

  | | |_| | | | (_| |  |  Version 0.4.0-dev+1572 (2014-11-12 00:38 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 0086b52* (0 days old master)
|__/                   |  i686-w64-mingw32

julia> Dates.DateTime("[14:51:00.118]","[HH:MM:SS.sss]")
0001-01-01T14:51:00.118

But I do see the numbers test fail if I run in serial, JuliaLang/julia#8895 - does deleting sys.dll help any with the dates failure?

@tkelman
Contributor Author

tkelman commented Nov 12, 2014

cc @quinnj it might be a win7 vs win8 difference, not sure

@staticfloat
Owner

I'm thinking more likely it's a CPU capabilities difference. E.g. I don't have SSE enabled or something ridiculous like that. I'll look into it in a day or two.

@quinnj

quinnj commented Nov 12, 2014

Hmmmm......I thought we had resolved all the 32-bit issues. I can check a new 32-bit build.

@tkelman
Contributor Author

tkelman commented Nov 12, 2014

It sounds like it may be something specific to Elliot's VM configuration, so maybe don't worry too much about it for now.

@staticfloat
Owner

@quinnj The issue does indeed seem to have disappeared now that I've enabled some CPU features. Amazing that Windows 8.1 will run on a VM without SSE.

I'm now failing the backtrace test on 32 and 64-bit. Is that normal? If so, I think we're well on our way to having buildbot nightlies.

@tkelman
Contributor Author

tkelman commented Nov 13, 2014

I'm now failing the backtrace test on 32 and 64-bit. Is that normal?

64 bit, on master, yes. 32 bit, no, at least not as of the last time I built Julia master a few days ago.

@tkelman
Contributor Author

tkelman commented Nov 14, 2014

Oh right, but on 32 bit I do see failures in the numbers test unless I delete sys.dll, JuliaLang/julia#8895

@tkelman
Contributor Author

tkelman commented Nov 15, 2014

@staticfloat
Owner

Yes! Except it's being uploaded as linux-x86_64
On Nov 14, 2014 7:02 PM, "Tony Kelman" notifications@github.com wrote:

64 bit success?
http://buildbot.e.ip.saba.us:8010/builders/package_win8.1-x64/builds/4


Reply to this email directly or view it on GitHub
#2 (comment)
.

@staticfloat
Owner

Alright, could you take a look at the latest windows binaries? These were built on the buildbot farm. Neither passes the tests completely; I'm hoping that there is a good reason for it.

Also, they are SO MUCH SLOWER than an ubuntu machine. My goodness. I set it going on a full build when I go to sleep, and it's not done by the time I wake up in the morning.

Here are direct links to the latest windows binaries, in case Isaiah uploads his own: 32-bit, 64-bit

@tkelman
Contributor Author

tkelman commented Nov 16, 2014

I do see the same dates test failure on win32, not sure what's up.

Julia Version 0.4.0-dev+1627
Commit 74b8a6b (2014-11-15 17:04 UTC)
Platform Info:
  System: Windows (i686-w64-mingw32)
  CPU: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
  WORD_SIZE: 32
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Penryn)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

 in error at error.jl:21
 in runtests at interactiveutil.jl:383
 in runtests at interactiveutil.jl:372

julia> Dates.DateTime("[14:51:00.118]","[HH:MM:SS.sss]")
ERROR: InexactError()
 in slotparse at dates/io.jl:99
 in getslot at dates/io.jl:108
 in parse at dates/io.jl:120

@staticfloat
Owner

@ihnorton @tkelman What are your make incantations for building on windows? Do you build via cygwin? What do you set MARCH to?

@tkelman
Contributor Author

tkelman commented Nov 16, 2014

I've never set MARCH to anything, but also never tried too hard to make a portable binary. These latest binaries were ending up with a sys.dll file as well, which isn't really supposed to make it into the binaries at this point.

@tkelman
Contributor Author

tkelman commented Nov 17, 2014

Interesting - so when I use julia-0.4.0-dev-1f9a5f0-win32.exe which was built from the old system, everything (except the win64 backtrace test) passes https://ci.appveyor.com/project/tkelman/julia-nightly-packaging/build/1.0.53

But using julia-0.4.0-1f9a5f00b8-win32.exe from the new buildbot fails the dates test https://ci.appveyor.com/project/tkelman/julia-nightly-packaging/build/1.0.54/job/vtm4jq4wgs3ykgy9

I'm going to guess MARCH is the only difference, and try locally with MARCH=i686 to see if I can reproduce it with a local build. Hopefully it doesn't need all of the dependencies to be rebuilt with the MARCH flag set to get it to happen.
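
For comparing the two configurations, the only thing that changes is whether MARCH is passed to make. A rough sketch, assuming MARCH is given on the make command line (the helper name `march_build_cmd` is hypothetical, and the script only prints the invocations rather than running them):

```shell
#!/bin/sh
# Sketch: print the make invocation for one MARCH value so builds can be compared.
# An empty argument means "do not set MARCH", as in a default local build.
# march_build_cmd is a hypothetical helper name.
march_build_cmd() {
    if [ -n "${1:-}" ]; then
        printf 'make MARCH=%s\n' "$1"
    else
        printf 'make\n'
    fi
}

# Print the default invocation and the one with MARCH=i686.
for march in "" i686; do
    march_build_cmd "$march"
done
```

Running each printed command in a separate clean checkout would avoid mixing objects compiled with different flags.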

Also, they are SO MUCH SLOWER than an ubuntu machine. My goodness. I set it going on a full build when I go to sleep, and it's not done by the time I wake up in the morning.

That's just the dependencies though, right? Other than shrinking base and moving more of them out to binary packages, I'm not sure how much we can do about that for now. At steady state, once all dependencies are compiled, it looks like it's taking something like 30-35 minutes per build?

@staticfloat
Owner

Yes, I don't think there's much we can do regarding speed. ;)

@staticfloat
Owner

32-bit is now built with MARCH=pentium4. How's it look for you?

@tkelman
Contributor Author

tkelman commented Nov 19, 2014

I'll check. I think the makefile logic somewhere for whether we delete sys.dll doesn't expect binary builds to be setting MARCH. So we'll need to fix that one way or another. I don't think any Windows users have complained about binary incompatibility with @ihnorton's builds, which AFAIK don't set MARCH.

@tkelman
Contributor Author

tkelman commented Nov 19, 2014

pentium4 looks like it passed all tests on AppVeyor (except backtrace, which I skipped since it's a known failure on win64): https://ci.appveyor.com/project/tkelman/julia-nightly-packaging/build/1.0.55

Man, fast AppVeyor Pro is awesome. I wish I had tried it earlier, and if I had a real job I'd just buy it myself.

and yes these binaries have been including sys.dll which I think we need to fix
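
Until that is fixed, a packaging job could at least detect the problem before uploading. A minimal sketch, assuming the files to be packaged are staged in some directory passed as an argument (the function name `has_sys_dll` and the directory layout are hypothetical):

```shell
#!/bin/sh
# has_sys_dll DIR: succeed if a file named sys.dll exists anywhere under DIR.
# Hypothetical helper; DIR stands for wherever the binary distribution is staged.
has_sys_dll() {
    [ -n "$(find "$1" -type f -name 'sys.dll' 2>/dev/null | head -n 1)" ]
}

# Usage (staging directory name is an assumption):
#   if has_sys_dll ./staging; then echo "sys.dll would be shipped" >&2; fi
```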

@staticfloat
Owner

I think our sys.dll deletion machinery gets turned off when MARCH is set.

@tkelman
Contributor Author

tkelman commented Nov 22, 2014

Which is another indication that @ihnorton's builds are in all likelihood not setting MARCH, or they are manually deleting sys.dll. If it's the former, it's interesting that no one has complained about the Windows binaries not working on older machines?

@tkelman
Contributor Author

tkelman commented Nov 22, 2014

Should we either stop setting MARCH for the windows builds, or add a copy of https://github.com/JuliaLang/julia/blob/ac1371136e1aae1ca7366b4d19cc0ffa0d659d72/Makefile#L349 a few lines below, under the ifeq ($(OS), WINNT)?

@staticfloat
Owner

I'm pretty sure we still need to set MARCH. Let's add the extra line below, and push that to release-0.3 as well.

@staticfloat
Owner

@tkelman JuliaLang/julia@c9da7cb

Once the buildbots crunch through that latest commit, we should be good to go w.r.t. this.

and set HOMEDRIVE and HOMEPATH, because windows
@tkelman
Contributor Author

tkelman commented Jul 1, 2015

@vtjnash see the script in this PR for cygwin setup, or here JuliaCI/julia-buildbot#20 (comment) for MSYS2

@tkelman tkelman closed this Jan 29, 2016
3 participants