[REVIEW]: Shenfun: High Performance Computing Platform for the Spectral Galerkin method #1071

Open
whedon opened this Issue Nov 8, 2018 · 15 comments

whedon commented Nov 8, 2018

Submitting author: @mikaem (Mikael Mortensen)
Repository: https://github.com/spectralDNS/shenfun
Version: 1.2.0
Editor: @katyhuff
Reviewer: @lucydot, @lindsayad
Archive: Pending

Status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/43f64b8a0ef42408c72acead37717ec6"><img src="http://joss.theoj.org/papers/43f64b8a0ef42408c72acead37717ec6/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/43f64b8a0ef42408c72acead37717ec6/status.svg)](http://joss.theoj.org/papers/43f64b8a0ef42408c72acead37717ec6)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@lucydot & @lindsayad, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @katyhuff know.

Please try and complete your review in the next two weeks

Review checklist for @lucydot

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: Does the release version given match the GitHub release (1.2.0)?
  • Authorship: Has the submitting author (@mikaem) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

Review checklist for @lindsayad

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: Does the release version given match the GitHub release (1.2.0)?
  • Authorship: Has the submitting author (@mikaem) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

whedon commented Nov 8, 2018

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @lucydot, it looks like you're currently assigned as the reviewer for this paper 🎉.

⭐️ Important ⭐️

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

whedon commented Nov 8, 2018

Attempting PDF compilation. Reticulating splines etc...

whedon commented Nov 8, 2018

lucydot commented Nov 11, 2018

In general, shenfun is packaged and documented very nicely. I need a few more days to look at functionality, but here are my comments so far:

Documentation

  • I really like the paper published on RawGit (Unfortunately I noticed that RawGit is shutting down next year, so you'll have to move it across to Github Pages or similar at some point).
  • It might be useful to state in the readme that shenfun is a Python 3 package (compatible with Python 2?)
  • I think you could move a very short summary of Shenfun's unique selling points (global shape functions which lead to more accurate approximations than FEniCS / ability to run DNS on supercomputers using an accessible high-level language) to the readme and/or landing page of read the docs.
  • You could give instructions for running pytest locally (python -m pytest) and explicitly point to where the Travis builds are.
  • You could include a "how to cite" section in your documentation (Zenodo DOI until the JOSS paper is published), including a bibtex entry.
  • There are no clear guidelines for people who might want to contribute to the software.

Installation

  • I used conda to install mpi4py and FFTW, followed by pip install shenfun. There was a problem with pip as cython was not installed - I think this is because it is under setup_requires rather than install_requires in the setup.py file, so pip ignores it: pytest-dev/pytest-xdist#136. After pip installing cython, pip install shenfun worked fine.

Functionality

  • I've been working through the Klein-Gordon equation demo. I've run it on up to 24 cores; doubling the cores gives a ~1.5x speed-up, so the MPI parallelisation looks like it is working well. I'd like a few more days to play with the functionality before ticking it off.

lindsayad commented Nov 11, 2018

I've been very impressed by the documentation, and I've enjoyed my first exposure to a spectral element implementation.

I'm also exploring the Klein-Gordon problem and I'm a little curious about some of the timings. I'm running on my laptop, which has four physical CPU cores, each capable of hyperthreading. Results:

(spectralDNS) lindad@localhost:~/scratch$ time mpiexec -np 1 python ./klein-gordon.py 

real	0m15.007s
user	0m14.711s
sys	0m0.273s
(spectralDNS) lindad@localhost:~/scratch$ time mpiexec -np 2 python ./klein-gordon.py 

real	0m11.127s
user	0m19.624s
sys	0m0.520s
(spectralDNS) lindad@localhost:~/scratch$ time mpiexec -np 4 python ./klein-gordon.py 

real	0m9.876s
user	0m33.977s
sys	0m1.127s
(spectralDNS) lindad@localhost:~/scratch$ time mpiexec -np 8 python ./klein-gordon.py 

real	0m11.531s
user	1m12.890s
sys	0m5.291s

So I do see a speed-up while increasing my real cpu count, but a stagnation or performance decline when using hyperthreading. The latter doesn't necessarily mean too much to me for reasons like this. However, I guess I'm a little curious about the relatively small speed-up from 1 to 4 cores. Is there a lot of serial computation or communication? I'm also curious how @lucydot did her timings? I also understand that scaling studies are fraught with peril, especially for novice users of software.

lindsayad commented Nov 11, 2018

Sorry, much better to use the time module as it's already used in the solver script. Limiting output to rank 0:

(spectralDNS) lindad@localhost:~/scratch$ mpiexec -np 8 python ./klein-gordon.py 
Time  8.530541181564331
(spectralDNS) lindad@localhost:~/scratch$ mpiexec -np 4 python ./klein-gordon.py 
Time  7.4511120319366455
(spectralDNS) lindad@localhost:~/scratch$ mpiexec -np 2 python ./klein-gordon.py 
Time  8.857075214385986
(spectralDNS) lindad@localhost:~/scratch$ mpiexec -np 1 python ./klein-gordon.py 
Time  14.115585088729858

So very strong performance gain moving from 1 to 2 procs, but seemingly diminishing returns moving from 2 to 4. With four procs, there should still be roughly 64**3 / 4 = 65536 degrees of freedom per process, so it seems like still plenty of work for all. Any comment on this? It's not something I need to dwell on, but curious.
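
To make the scaling question concrete, here is a minimal sketch that turns the four wall-clock times reported above into speed-up and parallel efficiency figures. The numbers are copied from the rank-0 output in this comment; the variable names are illustrative only, and speed-up is taken as T(1)/T(p) with efficiency as speed-up divided by p.

```python
# Wall-clock times (seconds) copied from the rank-0 output above,
# keyed by the number of MPI processes.
timings = {
    1: 14.115585088729858,
    2: 8.857075214385986,
    4: 7.4511120319366455,
    8: 8.530541181564331,
}

t_serial = timings[1]

# Speed-up relative to the single-process run: S(p) = T(1) / T(p).
speedup = {p: t_serial / t for p, t in timings.items()}

# Parallel efficiency: E(p) = S(p) / p (1.0 would be perfect scaling).
efficiency = {p: s / p for p, s in speedup.items()}

for p in sorted(timings):
    print(f"{p:>2} procs: speed-up {speedup[p]:.2f}, efficiency {efficiency[p]:.0%}")
```

With these timings the speed-up is roughly 1.6 on 2 processes and 1.9 on 4, and it drops again on 8, which matches the diminishing returns described above.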

mikaem commented Nov 12, 2018

@lucydot Thanks a lot for very good and constructive feedback! I'll start by answering some of your questions/comments below:

Documentation

- I really like the paper published on RawGit (Unfortunately I noticed that RawGit is shutting down next year, so you'll have to move it across to Github Pages or similar at some point).

Thanks for the heads up, I did not know that. I'll look into other options.

- It might be useful to state in the readme that shenfun is a Python 3 package (compatible with Python 2?)

Yes, I think you're right. A short section with dependencies could be in the readme. I'll add it.

- I think you could move a very short summary of Shenfun's unique selling points (global shape functions which lead to more accurate approximations than FEniCS / ability to run DNS on supercomputers using an accessible high-level language) to the readme and/or landing page of read the docs.

Very good idea:-)

- You could give instructions for running pytest locally (python -m pytest) and explicitly point to where the Travis builds are.
- You could include a "how to cite" section in your documentation (Zenodo DOI until the JOSS paper is published), including a bibtex entry.
- There are no clear guidelines for people who might want to contribute to the software.

All good and valid points. I'll add these at appropriate locations to the documentation.

mikaem commented Nov 12, 2018

@lindsayad Thank you for the fast and nice feedback:-)
Some comments on performance. Spectral methods make use of global basis functions, so the communication load is very high (MPI Alltoall), much higher than, for example, for a finite element method that only needs to communicate data on the interfaces between distributed meshes. For this reason you usually need a good computer (preferably a high-performance supercomputer:-)) with a fast interconnect between CPUs to see good speedup. Most laptops do not have good enough interconnect speeds to achieve good scaling, but a Cray XC does:-).

For a laptop with 4 cores (like my own 3-year-old MacBook Pro) you should be able to get speedup all the way up to 4 cores, though, even if not perfect. However, you should use the keyword slab=True when creating the TensorProductSpace:

T = TensorProductSpace(comm, (K0, K1, K2), slab=True)

This has to do with how the data are decomposed and distributed between processors. A slab decomposition uses one processor group, whereas with slab=False you will use as many groups as possible, here 2. The advantage of using 2 groups is that you can use many more processors than with only one group. But if you are only going to use 4 processors, then one group is actually faster. That may explain your results with 4 processors.

Another reason can be the hardware itself. Many machines with 4 cores come with two sockets, with two cores in each socket, and communication inside the same socket is faster than between sockets. As such, you may get a nice speedup going from one to two processes, but not as good moving to four. On a supercomputer the communication speeds are much higher than in your laptop, and speedup should be more or less perfect up to thousands of processors. In the end this all depends on the hardware.
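
The difference between one and two processor groups can be illustrated without MPI. The sketch below is a generic picture of slab versus pencil decompositions of a 64**3 grid and is not taken from shenfun's source: the helper `local_shape` and the choice of which axes are split are assumptions made for illustration, and the process limits ignore details such as the shortened axis of a real-to-complex transform.

```python
def local_shape(global_shape, procs_per_axis):
    """Largest local block when axis i is split over procs_per_axis[i] processes.

    Hypothetical helper for illustration; uses ceiling division so that
    uneven splits report the biggest block any process would hold.
    """
    return tuple(-(-n // p) for n, p in zip(global_shape, procs_per_axis))


grid = (64, 64, 64)

# Slab: one processor group, only the first axis is distributed.
slab_block = local_shape(grid, (4, 1, 1))

# Pencil: two processor groups, the first two axes are distributed (2 x 2).
pencil_block = local_shape(grid, (2, 2, 1))

# Rough upper limits on the process count for each layout:
# a slab needs at least one plane per process, a pencil one line per process.
max_procs_slab = grid[0]
max_procs_pencil = grid[0] * grid[1]

print("slab block:  ", slab_block, "max processes:", max_procs_slab)
print("pencil block:", pencil_block, "max processes:", max_procs_pencil)
```

On 4 processes both layouts hold the same number of points per process (65536), so the choice does not change the amount of local work. What changes is the communication pattern, since a pencil layout needs a redistribution within each of its two groups, whereas the slab layout needs only one, at the price of a much lower ceiling on the process count.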

Hyperthreading is a different story. You need to activate it using, e.g.,

T = TensorProductSpace(comm, (K0, K1, K2), slab=True, **{'threads': 2})

This will use the FFTW library with hyperthreading (OpenMP). Again, speedup is very hardware dependent; I have seen speedup on some computers, but nothing on others. I'm not an expert though.

mikaem commented Nov 12, 2018

@lucydot Regarding installation, what was the problem with Cython? My understanding is that if cython is in setup_requires, then pip will install it if it is missing? And cython is only required for building the code, not for running shenfun, which is why I did not include it under install_requires.

mikaem commented Nov 12, 2018

@lucydot Regarding functionality please see my response to @lindsayad. For scaling tests please use slab=True

lindsayad commented Nov 12, 2018

mikaem commented Nov 12, 2018

Ok. Should be fixed by adding cython to install_requires, right?

lucydot commented Nov 12, 2018

Hi @mikaem, thank you for your post re: scaling up. @lindsayad I was running the calculations on HPC which explains why I got a better scale-up. Yes, moving to install_requires should fix it.
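
For readers unfamiliar with the two keywords, the following is a hypothetical sketch of the change discussed here, written as the keyword arguments one would pass to setuptools.setup(). It is not shenfun's actual setup.py: the dictionary name is made up, the project's full dependency list is deliberately not reproduced, and whether this alone resolves a build-time import of Cython depends on how the real setup.py is written.

```python
# Hypothetical excerpt, for illustration only. In a real setup.py these
# keyword arguments would be passed as setuptools.setup(**SETUP_KWARGS).
SETUP_KWARGS = {
    "name": "shenfun",
    # Needed while setup.py itself runs (building the Cython extensions).
    "setup_requires": ["cython"],
    # Installed by pip together with the package; adding cython here is
    # the change proposed in this thread. Other runtime dependencies
    # are omitted from this sketch.
    "install_requires": ["cython"],
}

for key in ("setup_requires", "install_requires"):
    print(key, "->", SETUP_KWARGS[key])
```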

lucydot commented Nov 15, 2018

Hi @mikaem, I've taken a look at your updated documentation, and I'm happy to sign off on my review.
cc @katyhuff

mikaem commented Nov 15, 2018

Great, thanks a lot @lucydot :-)
