Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows support #755

Merged
merged 18 commits into from
May 7, 2022
Merged

Windows support #755

merged 18 commits into from
May 7, 2022

Conversation

zbeekman
Copy link
Collaborator

@zbeekman zbeekman commented Apr 26, 2022

coverage on master
Codecov branch

Summary of changes

  • Support building OpenCoarrays natively on Windows without relying on WSL or other heavyweight options using GCC/GFortran 11 and Intel MPI
  • CMake build system can now better handle spaces in paths in most/many cases (this is quite common on Windows machines)
  • A probable bug was fixed in which a CMake generator expression was used in a try_compile() statement. This was definitely breaking windows builds and seems like it should have been breaking linux & macOS builds too, since the try_compile() doesn't cause an actual build system generation so the angle bracket syntax text of the generator expression was just getting injected directly into a makefile or compile command.
  • Mostly updated & modernized the usage of the overhauled FindMPI CMake module--deprecated variables and options were removed in favor of their modern counterparts, but a few instances of the old variables may have been overlooked.
  • The caf and cafrun scripts were made more robust which in turn lets them be called directly from a bash or bash-like shell on Windows or a native Windows cmd.exe shell (with Git BASH installed). A long standing typo in the caf script was also fixed 😄
  • Utilize more intelligent threading library support provided by CMake since the opencoarrays runtime library explicitly utilizes pthreads. On windows this is potentially an issue, and, at the very least, the caf wrapper script will need to ensure that the compiler/linker enables a pthread compatible threading solution. This should make the build more robust on other platforms too.

Rationale for changes

These changes were made to support the ability to build and run OpenCoarrays and therefore coarray Fortran applications on Windows and to generally improve the robustness of the build system. WSL and other heavyweight *nix compatibility solutions are often unpalatable or unacceptable to some users and organizations. Requiring the now freely available Intel MPI implementation (part of Intel's OneAPI set of developer tools) is acceptable because 1) It's the only sufficiently modern and up-to-date MPI solution that I am aware that runs natively on windows, 2) it is now available free of cost and 3) many HPC computing centers, government agencies, etc. already have support contracts with Intel and use the Intel compilers/OneAPI. In addition to Intel OneAPI, the GFortran and GCC native windows compilers are required along with CMake and unix make/gmake, and a bash shell like the one that comes with git-bash for windows. Binaries produced with the OpenCoarrays caf wrapper script should be redistributable without all these other requirements--only requiring the intel MPI runtime--assuming that the user doesn't intentionally try to create dynamically linked executables.

ToDo

  • Add documentation about using/building on windows
  • Fix the perl style tests and shell script tests on windows (this might just mean disabling them)
  • Add windows CI testing
  • Create a batch file wrapper version of the caf and cafrun scripts so the build will proceed under powershell or cmd.exe (But you'll still need bash installed somewhere & on the path) The batchfile would just pass arguments through to caf/cafrun.
  • Try to get CMake's findMPI module working on windows... right now my work around is a hack and this would make things more robust
  • Verify modern usage of FindMPI has been updated everywhere and bump CMake minimum version and policies accordingly.
  • Verify redistributability of statically linked binaries built with caf wrapper script
  • Test if caf wrapper script encapsulates all the required intel MPI environment information to be used outside of the intel configured CMD.exe shell.
  • Investigate what the installation currently does on windows, get it working if it is broken and figure out what a good/canonical install location and directory structure would be.

Additional info and certifications

This pull request (PR) is a:

  • Bug fix
  • Feature addition
  • Other, Please describe:

I certify that

  • I certify that:
    • I have reviewed and followed the contributing guidelines
    • I will wait at least 24 hours before self-approving the PR to give another
      OpenCoarrays developer a chance to review my proposed code
    • I have not introduced errant white space (no trailing white space or white space errors may
      be introduced)
    • I have added an explanation of what these changes do and why they should be included
    • I have checked to ensure there aren't other open Pull Requests for the same change
    • I have you written new tests for these changes
    • I have successfully tested these changes locally
    • I have commented any non-trivial, non-obvious code changes
    • The commits are logically atomic, self consistent and coherent
    • The commit messages follow best practices
    • Test coverage is maintained or increased after this is merged

Code coverage data

coverage on master

Add some quotes to paths passed as compiler flag arguments in CMake
This fixes a broken build on windows, and seems like it should not
have been working when using Makefile generators on any platform,
i.e., macOS & other *nix
The library now builds with Intel MPI (from Intel OneAPI) and GCC +
GFortran. There are still some path handling quirks and string quoting
things that need to be resolved to get the caf and cafrun wrapper
scripts working reliably. A test program can be built and run by
tweaking/fixing the commands attempted by the caf script. The program
executes correctly and then encounters an error during MPI
finalize/when shutting down.
No special changes needed beyond robustness improvements
Use CMakes FindThreads feature/module to find pthreads since
they are used in the mpi opencoarrays library. As a result
the caf wrapper script should also link in/build with pthreads
Copy link
Collaborator

@everythingfunctional everythingfunctional left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the extent that I fully understand the CMake stuff, I have reviewed the changes and they seem reasonable. I have also attempted to test things out on Windows, with much struggle. Here are some notes on what I was able to accomplish.

I used a Windows Command Prompt. I first ran the setvars.bat script provided by Intel to setup the environment. I created opencoarrays-build and opencoarrays-install folders. From the opencoarrays-build folder, I used cmake-gui ../OpenCoarrays to set the Generator to "Unix Makefiles", the compilers to those installed by the Equation.com gcc installer, and the install folder to opencoarrays-install. I was then able to, from the same command prompt, run make and the make test to compile and run the test suite. A few tests failed, specifically register_alloc_comp_1,2 and 3, test-installation-scripts.sh, caf, and cafrun. It also popped up a lot of prompts from Windows Defender, but they did not reappear after having accepted them all and running the suite again. The failure of the last 3 tests is probably expected, since a Windows command prompt doesn't understand how to execute a shell script. make install then copied the expected things over to opencoarrays-install. From a git-bash terminal I can then use the provided caf and cafrun scripts to compile and execute a "Hello, World", although each image segfaults on program shutdown. Using fpm with caf and cafrun does not work however, as it uses execute_command_line, which executes in a cmd environment, which can't execute the scripts.

Sorry for the wall of text there, but I wanted to document as much as I could of what I did.

In all, I think this is a huge leap forward, but I would ask for at least one additional thing to be included in this PR. Update the README.md and/or INSTALL.md to provide better instructions for building and using on Windows. A future PR for use of caf and cafrun in a native Windows environment would be a good idea, but it's not super urgent.

@zbeekman
Copy link
Collaborator Author

@everythingfunctional thanks so much for the test drive and review!!! I'm impressed and happy that you got so far without any adequate documentation. This is my first priority and on the top of my todo list and I requirement before I change this PR from draft status.

I used a Windows Command Prompt. I first ran the setvars.bat script provided by Intel to setup the environment. I created opencoarrays-build and opencoarrays-install folders. From the opencoarrays-build folder, I used cmake-gui ../OpenCoarrays to set the Generator to "Unix Makefiles", the compilers to those installed by the Equation.com gcc installer, and the install folder to opencoarrays-install. I was then able to, from the same command prompt, run make and the make test to compile and run the test suite. A few tests failed, specifically register_alloc_comp_1,2 and 3, test-installation-scripts.sh, caf, and cafrun. It also popped up a lot of prompts from Windows Defender, but they did not reappear after having accepted them all and running the suite again. The failure of the last 3 tests is probably expected, since a Windows command prompt doesn't understand how to execute a shell script. make install then copied the expected things over to opencoarrays-install. From a git-bash terminal I can then use the provided caf and cafrun scripts to compile and execute a "Hello, World", although each image segfaults on program shutdown. Using fpm with caf and cafrun does not work however, as it uses execute_command_line, which executes in a cmd environment, which can't execute the scripts.

These are known issues, some/most of which are documented in the todo list in the main PR comment/description

  • register_alloc_comp_{1..3} fail for me as well, and, do so on macOS as well: They fail with MPICH too, but not OpenMPI, IIRC. So this issue isn't limited to windows. This appears to be a bug in OpenCoarrays most likely, or MPICH based MPI implementations, or in my introspection build logic.
  • The final test failures need better logic to guard against executing them in an environment missing developer specific packages like perl and are testing style things mostly. This is something I plan to fix/improve before merging.
  • Concerning shells and fpm: I'm contemplating ways of trying to improve shell portability and alternative options. Right now the caf and cafrun scripts are the only thing preventing the build & testing to be able to take place entirely within a native CMD or powershell environment AKAICT. I'm brainstorming ways to create thin wrappers to call them from a CMD.exe shell or similar. Likewise, Find_MPI isn't working from the bash shell, likely because the MPI compiler wrapper scripts provided by Intel are batch files and can't be executed from a bash shell in a straightforward way. I'm still contemplating ways to improve general useability and cut out or abstract and upstream complexity.
  • The program segfault on exit is something I have been seeing too. Interestingly, with a slightly different (older) version of IntelOneAPI and pulling in a windowspthread implementation that I wasn't picking up in my environment they don't see this problem. I should check if some of the pthread locking calls are made during shutdown/finalization. I seem to recall having this problem or a similar problem with certain MPI implementations on macOS as well. So it may be a bigger OC or MPI implementation bug.

In all, I think this is a huge leap forward, but I would ask for at least one additional thing to be included in this PR. Update the README.md and/or INSTALL.md to provide better instructions for building and using on Windows. A future PR for use of caf and cafrun in a native Windows environment would be a good idea, but it's not super urgent.

I completely agree. Lack of documentation is the number one blocker of this PR and an absolute requirement before merging. That's why it's at the top of the ToDo list. 😉

Thanks again for the feedback!

I've pasted the segfault during shutdown info from a debug build below in the event that you or @vehre have any insight into what's happening.

$ cafrun -np 2 ./hello_caf.exe
 Hello from            2  of            2 !
 Hello from            1  of            2 !

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0xadf38fdb
#1  0xadf2ef04
#2  0xadf0fef1
#3  0x78797ff7
#4  0x78ab20ce
#5  0x78a61453
#6  0x78ab0bfd
#7  0xadf27185
#8  0xadee3471
#9  0xadee34e7
#10  0xadee1683
#11  0xadee13c0
#12  0xadee14f5
#13  0x784f7033
#14  0x78a62650
#15  0xffffffff

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0xadf38fdb
#1  0xadf2ef04
#2  0xadf0fef1
#3  0x78797ff7
#4  0x78ab20ce
#5  0x78a61453
#6  0x78ab0bfd
#7  0xadf27185
#8  0xadee3471
#9  0xadee34e7
#10  0xadee1683
#11  0xadee13c0
#12  0xadee14f5
#13  0x784f7033
#14  0x78a62650
#15  0xffffffff
1/2: _gfortran_caf_init(957) The mpi memory model is: unified  (0x2, 1).
1/2: finalize_internal(972) (status_code = 0)
1/2: finalize_internal(1026) In barrier for finalize...1/2: finalize_internal(1059) Freed all slave tokens.
2/2: finalize_internal(972) (status_code = 0)
2/2: finalize_internal(1026) In barrier for finalize...2/2: finalize_internal(1059) Freed all slave tokens.
Error: Command:
   `C:/Program Files (x86)/Intel/oneAPI/mpi/latest/bin/mpiexec.exe -np 2 ./hello_caf.exe`
failed to run.

The intrinsic FindMPI module seems like it has some bugs or
documentation bugs. I got this working, but it is a bit of black magic
as to how/why it works. It simplifies the code, however, and should
offload some maintanence upstream, so it's worth doing.
Some usage of deprecated variables provided in older version of
FindMPI were removed, updating them to the new, correct variables.

The cmake_minimum_required_version was also bumped along with CMake
policies.
These scripts get configured to call the bash version found during
CMake configuration time. Using this version/installation of bash, the
batch files call the corresponding bash caf & cafrun scripts and pass
all arguments through. This *should* allow things like fpm to work
with the caf wrapper scripts, *I beleive* (untested)

CAVEAT: WSL's bash will dump you into your WSL home directory and
there's no obvious work around. If you put Git-Bash's bash.exe (in the
bin subdirectory) first on your path you *should* be in business (I
think). There may be weird edge cases.
If you have Git-Bash's bash.exe in the front of your path,
you can now do everything from within the Intel OneAPI CMD.exe shell.
If you experience issues you should launch git-bash from the OneAPI
CMD.exe shell and it will inherit your environment variables and be
more robust. If CMake finds WSL, you're probably hosed.
A bunch of the variable manipulation should be abstracted as a
function.
windows. Also invoke perl directly so style/linting on windows works.
@zbeekman zbeekman marked this pull request as ready for review May 5, 2022 20:00
@zbeekman
Copy link
Collaborator Author

zbeekman commented May 5, 2022

@everythingfunctional @rouson let me know if the updated build instructions in INSTALL.md make sense and you can follow them. WSL's bash can cause problems, and I know the instructions are somewhat long and involved but I tried to have a good mix of hand-holding and robustly doing things on the users end; more advanced users will probably see shortcuts and alternative routes to get to the same endpoint.

Also, I've added batch-script wrappers for caf and cafrun. If the bin subdirectory of the opencoarrays installation directory is on your path, you should be able to call caf and cafrun natively, and hopefully (untested 🤞) this means you might be able to pass them as the compiler to fpm.

INSTALL.md Outdated Show resolved Hide resolved
@@ -448,6 +527,10 @@ file.
[INSTALL.pdf]: https://md2pdf.herokuapp.com/sourceryinstitute/OpenCoarrays/blob/master/INSTALL.pdf
[INSTALL.md]: https://github.com/sourceryinstitute/OpenCoarrays/blob/master/INSTALL.md
[CMake]: https://cmake.org
[Git-Bash]: https://gitforwindows.org/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[Git-Bash]: https://gitforwindows.org/
[Git BASH]: https://gitforwindows.org/

##### Pre-requisits #####

* [CMake]
* [Git-Bash]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* [Git-Bash]
* [Git BASH]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems inconsistent with other references to Bash within our own code-base and, indeed with other references to Bash throughout the internet I think I'm going to leave it as is for now. It also appears that Git-Bash has been rebranded to Git for Windows... so maybe we should update these references to that?

2. Launch the 64-bit Intel OneAPI developer shell. This will either be in the
start menu, or you may need to figure out how to launch it manually. See the
note below for help.
3. Ensure that cmake, git-bash, gfortran, gcc, make are available on your default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
3. Ensure that cmake, git-bash, gfortran, gcc, make are available on your default
3. Ensure that cmake, git BASH, gfortran, gcc, make are available on your default

note below for help.
3. Ensure that cmake, git-bash, gfortran, gcc, make are available on your default
path. The installers should have taken care of this for you.
5. Launch a Git-Bash bash shell from the OneAPI environment, or ensure Git-Bash's
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
5. Launch a Git-Bash bash shell from the OneAPI environment, or ensure Git-Bash's
5. Launch a Git BASH shell from the OneAPI environment, or ensure Git-Bash's

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Git-Bash is referring to the software distribution, and importantly, ships both a git-bash.exe and a bash.exe (as well as a sh.exe). So the repetition is intentional and specific: We want bash.exe from within the Git-Bash (or Git for Windows) distribution, NOT git-bash.exe. In most circumstances git-bash.exe would be OK, but I know that if you're using ConEMU terminal emulator git-bash.exe causes problems whereas bash.exe does not. So I think this wording should stay.

INSTALL.md Outdated
2. Editing your system wide `%PATH%` variable, or updating it in the local shell
so that git-bash's `bin` subdirectory (containing `bash.exe`) is at the front
of your `PATH`: `set PATH=C:\path\to\Git-Bash\bin;%PATH%`.
6. At this point typing `bash --version` should show you you're using Git-Bash, *AND*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
6. At this point typing `bash --version` should show you you're using Git-Bash, *AND*
6. At this point, typing `bash --version` should show you you're using Git BASH, *AND*

INSTALL.md Outdated Show resolved Hide resolved
INSTALL.md Outdated Show resolved Hide resolved
INSTALL.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@everythingfunctional everythingfunctional left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to follow the instructions and then able to use the provided caf and cafrun to compile and execute code. Great work.

INSTALL.md Outdated Show resolved Hide resolved
note below for help.
3. Ensure that cmake, git-bash, gfortran, gcc, make are available on your default
path. The installers should have taken care of this for you.
5. Launch a Git-Bash bash shell from the OneAPI environment, or ensure Git-Bash's
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
5. Launch a Git-Bash bash shell from the OneAPI environment, or ensure Git-Bash's
5. Ensure Git-Bash's

If I launch a bash shell, it no longer has access to mpifc. It still has access to ifort, so I'm not sure why.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence describes both of the substeps--substep 1 corresponds to "Launch a Git-Bash bash shell from the OneAPI environment" and substep 2 corresponds to "ensure Git-Bash's bash.exe is first on your path". Therefore I think the original text makes more sense unless you think there is a problem with my approach/description or can think of a clearer way to phrase it.

If I launch a bash shell, it no longer has access to mpifc. It still has access to ifort, so I'm not sure why.

This is because of the way Windows handles file types: by filename extension and because the Intel compiler wrapper scripts are batchfile scripts. When you call mpifc cmd.exe (or powershell))

Thanks to @rouson and @everythingfunctional for the review and great suggestions!

Co-authored-by: Damian Rouson <rouson@lbl.gov>
Co-authored-by: Brad Richardson <everythingfunctional@protonmail.com>
@zbeekman
Copy link
Collaborator Author

zbeekman commented May 7, 2022

@rouson @everythingfunctional Thanks so much for the reviews! Many great suggestions and improvements.

@rouson Please see this comment (1) and this comment (2) about "Git-Bash" vs "Git BASH" and "a Git-Bash bash shell" vs "a Git BASH shell" respectively. Right now I think the original wording/phrasing should stand unless I'm misunderstanding something.

@everythingfunctional I would like my wording/explanation to be clearer too, but think that the repetitive phrasing should stay, as explained in this comment unless there is a better way to say it. (But I can't think of one.)

@zbeekman zbeekman merged commit 2b21cef into main May 7, 2022
@zbeekman zbeekman deleted the windows-support branch May 7, 2022 02:25
@rouson
Copy link
Member

rouson commented May 7, 2022

@zbeekman all looks great. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants