Planning for 7.7/8.0 #3191

Closed
peastman opened this issue Jul 28, 2021 · 117 comments

Comments

@peastman
Member

peastman commented Jul 28, 2021

We should start considering what features we want to include in the next release after 7.6 (whatever we end up calling it). Here's my first pass at reviewing open feature requests. To start with, here are some minor changes I suggest we include. I expect each of them to take a day or less to implement, so they won't take long.

#3185: Detect non-physical parameters
#3124: Add box shape option in addSolvent()
#3111: Calculate certain derivatives more robustly
#3071: Reporter options when resuming a simulation
#3053: Improve accuracy of Fortran constants
#2913: Use RMSDForce when defining molecules
#2870: Option not to overwrite metadynamics biases
#2655: Have Context store current step count
#97: Warn if unused flags are present

Here are some larger features that I also suggest including. They'll take more work.

#3181, #1195: Improve reporting of template matching errors
#3123, #2816: Support the latest version of AMOEBA
#3104, #2675: Support GB with newer force fields
#2955: Option to reduce CPU usage with OpenCL
#2921: Improve robustness of adding membranes
#2513: Improvements to Drude particles

The following all need research or design. I suggest investigating them, after which we might or might not decide to include them in this release.

#3097: Include search path when compiling CUDA code
#3077: Adaptive barostat interval algorithm
#3054: Recompute long-range dispersion correction when parameter offsets change
#2898: Prevent barostat from re-imaging molecules
#2871, #1305: PyPI packages
#2757: Allow ForceField to store extra descriptors for a residue
#2725: Provide a public method in forcefield.py to get templates and list unmatched residues
#2514: Fully flexible unit cells during NPT simulations
#1911: Copy CustomIntegrator Python attributes and methods
#1155: Report minimization progress

There also are a couple of other new force fields it would be good to include. We don't currently have open feature requests for them.

  • Amber19
  • GLYCAM

I understand Amber19 has issues that may make it difficult to support. We should at least investigate it.

@peastman
Member Author

I added one more to the list of features needing design that we should investigate:

#2514: Fully flexible unit cells during NPT simulations

@jchodera
Member

@peastman : What if we call this the "initial draft" of features for 7.7 and tag all these issues for that milestone?

If we decide to change the milestone to 8.0, we can simply rename the milestone to 8.0 and they'll all update.
If we decide to punt some of these or add others, we can easily re-tag the issues.

@jchodera
Member

Some specific feedback about some larger features:

#3181, #1195: Improve reporting of template matching errors

This would be incredibly useful, but we would have to balance investment in the OpenMM ForceField ecosystem against the huge investment into the Open Force Field force fields, which will produce their first biopolymer force fields this year.

#3123, #2816: Support the latest version of AMOEBA

Again, I'd be cautious here: How useful is the "latest version of AMOEBA"? Is there an automated parameterization scheme we can leverage? Would it be more useful to implement more "a la carte" polarizability for the Open Force Field effort to use to integrate polarization, given that they are producing easy-to-use, open source, fully automated parameterization tools?

#3104, #2675: Support GB with newer force fields

The solution we identified---use the assignment scheme already within OpenMM for validation and just bend it to also apply parameters---should be quick to implement. Beyond ~1 day worth of effort, probably not worth further investment.

#2921: Improve robustness of adding membranes

The whole setup pipeline is our weakest link here. We might discuss how we could link up with others that have more programmatic ways to prepare systems for the longer term. Notably, HTMD (from @giadefa) has some very robust programmatic approaches to preparing membrane systems.

#2513: Improvements to Drude particles

These all sound sensible and not that time-consuming to implement!

For the "more research needed" features, we might seriously also discuss (perhaps at the next OpenMM call)

  • What else can we do to make it easy to use QML potentials? What about ML collective variables or integrators?
  • Are there any sensible solutions for constant-pH dynamics?
  • What features can we incorporate that will make it easier to build useful alchemical free energy tools (such as the alchemical transfer plugin)?

@jchodera
Member

Also, this fancier hydrogen mass repartitioning feature just came across my radar, and might be an easy feature to include.

@peastman
Member Author

How useful is the "latest version of AMOEBA"? Is there an automated parameterization scheme we can leverage?

This doesn't involve parametrizing anything ourselves. It just means converting the latest version of the force field included with Tinker. We've had a lot of requests for it.

@jchodera
Member

This doesn't involve parametrizing anything ourselves. It just means converting the latest version of the force field included with Tinker. We've had a lot of requests for it.

I'm asking "how useful is the latest version of AMOEBA to our users"? Can they simulate protein-ligand complexes?

Even if we don't have to parameterize anything, if our users have to run a ton of MP2 calculations and derive parameters by hand, it's not going to be too useful to them either.

@peastman
Member Author

I don't think protein-ligand simulations are one of the main things AMOEBA gets used for.

@jchodera
Member

I don't think protein-ligand simulations are one of the main things AMOEBA gets used for.

Well, that's because the tools to do this don't exist.

Can you point me to the latest AMOEBA publications, parameters, and tools?

We can do a little poll to see how helpful this actually would be to our users.

I suspect it really isn't as helpful as focusing on things that can actually be used to model protein-ligand complexes.

@peastman
Member Author

It's a common request. It comes up regularly both on the forum and in emails people send me. Protein-ligand complexes are only one narrow slice of what OpenMM gets used for.

@jchodera
Member

I'm extremely dubious: I just don't think that a force field only useful for protein-in-solvent---without even cofactors---is all that useful.

Can you point me to the latest AMOEBA publications, parameters, and tools?

We can do a little poll to see how helpful this actually would be to our users.

Users can still use Tinker and Tinker-HP (which now supports GPUs) to run those simulations. The developers are funded by an ERC Synergy grant that provides millions of Euros to support that work. I just don't see this as being a useful priority for us, though supporting mix-and-match components of the potential could be highly useful above and beyond what Tinker provides.

@peastman
Member Author

I'm extremely dubious: I just don't think that a force field only useful for protein-in-solvent---without even cofactors---is all that useful.

Can you point me to the latest AMOEBA publications, parameters, and tools?

No, but I'm sure @jayponder can.

@jchodera
Member

Here's an idea: Could we make it easy to fit AMOEBA parameters for new small molecules by leveraging our QML potentials? Could we build a tool that would use one of the QML potentials we provide to generate AMOEBA parameters users could then easily use to simulate protein:ligand complexes with AMOEBA? And potentially support QML/AMOEBA simulations in a manner analogous to this really exciting QM/AMOEBA work from the AMOEBA folks? This could leverage our strengths.

@jmichel80

Hi @peastman I wonder what's the status of supporting more easily PME in alchemical free energy OpenMM implementations. Last time I looked into it, it wasn't straightforward to use softcore potentials in reciprocal space calculations. I know @jchodera uses (or used to use) an end-states reweighting approach that I believe is not generally applicable.
I'm generally interested in easier routes to robustly handle net charge changes in FEP calculations done with OpenMM. There have been periodic discussions on GitHub about this over the years, e.g. #2011. Are those issues still relevant to the current codebase?

@raimis
Contributor

raimis commented Jul 30, 2021

I vote for #1155 (report minimization progress)! Ideally, it should be possible to get State at each step of LocalEnergyMinimizer. In ACEMD, we are not using LocalEnergyMinimizer, just because we cannot report to a user what the minimizer is doing.
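
Until something like #1155 exists, one possible workaround is to run the minimizer in short chunks and report the energy between them. The sketch below shows only that pattern, using a toy one-dimensional system in place of a Context. `minimize_with_progress`, `ToySystem`, and all parameter names are hypothetical. With OpenMM, `run_chunk` would presumably wrap `LocalEnergyMinimizer.minimize(context, tolerance, maxIterations)` and `get_energy` would wrap `context.getState(getEnergy=True).getPotentialEnergy()`.

```python
from typing import Callable, List, Tuple


def minimize_with_progress(
    run_chunk: Callable[[int], None],
    get_energy: Callable[[], float],
    chunk_size: int = 10,
    max_chunks: int = 100,
    energy_tol: float = 1e-6,
) -> List[Tuple[int, float]]:
    """Run a minimizer in short chunks and record the energy after each one."""
    history = [(0, get_energy())]
    for chunk in range(1, max_chunks + 1):
        run_chunk(chunk_size)
        energy = get_energy()
        history.append((chunk * chunk_size, energy))
        # Stop once a whole chunk no longer lowers the energy appreciably.
        if abs(history[-2][1] - energy) < energy_tol:
            break
    return history


class ToySystem:
    """Toy stand-in for a Context: steepest descent on E(x) = x^2."""

    def __init__(self, x: float) -> None:
        self.x = x

    def energy(self) -> float:
        return self.x ** 2

    def descend(self, iterations: int, step: float = 0.1) -> None:
        for _ in range(iterations):
            self.x -= step * 2.0 * self.x  # the gradient of x^2 is 2x


toy = ToySystem(x=3.0)
progress = minimize_with_progress(toy.descend, toy.energy, chunk_size=5)
for iteration, energy in progress:
    print(f"iteration {iteration:4d}  energy {energy:.6g}")
```

Note that restarting a minimizer between chunks discards whatever history it keeps internally, so the path to convergence will generally differ from a single uninterrupted run. That is one reason a built-in progress callback would still be preferable.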

@raimis
Contributor

raimis commented Jul 30, 2021

Also, it would be good to have #2898 (prevent barostat from re-imaging molecules). We are seeing a lot of confused clients thinking that something went wrong with a system when the barostat wrapped it.

@jayponder

jayponder commented Jul 30, 2021

Sorry, I've literally been in the wilderness most of this past week, and away from internet and cell phone connections...

The standard AMOEBA parameterization tool is POLTYPE, which is available on Github from https://github.com/pren/poltype. Actually, instead you want the POLTYPE2 version, as that's the version that's been under active development for some time now (months to years..). It's fronted and primarily developed by Pengyu Ren's lab, but lots of people are working on it. The newer POLTYPE2 is not "perfect", but it's pretty good at this point. That's what everyone in the AMOEBA community is using for ligands, cofactors, etc. And it's the "official" AMOEBA parameter generator.

POLTYPE is Python-based, and as of now you need to download it and run it yourself. It requires a Tinker installation, and access to either Psi4 or Gaussian. The plan is to put up a fully automated server site, but we're not there yet and likely won't be in the near future.

Under the hood POLTYPE is roughly analogous to GAFF, CGenFF and LigParGen, but does significantly more semi-serious to serious QM in order to fit AMOEBA values- especially for electrostatics/polarization, but also for stuff like torsions on POLTYPE auto-generated model fragments, etc. These days it's fast enough to fairly easily run sets of ligands/drugs, even larger ones. Though you won't want to try to push all of PubChem through it :)

@jayponder

jayponder commented Jul 30, 2021

Users can still use Tinker and Tinker-HP (which now supports GPUs) to run those simulations. The developers are funded by an ERC Synergy grant that provides millions of Euros to support that work.

LOL :) That ERC grant is spearheaded by Jean-Philip Piquemal and others in France. I think I'm listed somewhere in that huge proposal as a deep collaborator. It's a lot of funding, but as far as I can tell it's supporting dozens of different projects and topics, most of which are not AMOEBA-related. And I've never seen a penny of that money...

For what it's worth, Pengyu is developing POLTYPE without any direct funding. And my entire group is one postdoc and two grad students, whom I struggle to keep fed... We do the best we can to advance Tinker/AMOEBA and try to make things available via GitHub. While we do now have our own GPU-capable codes (Tinker9 and Tinker-HP; see https://github.com/TinkerTools, which also has the main POLTYPE2 version), we would like to and intend to keep a fully correct, efficient and up-to-date version of AMOEBA running in OpenMM for the broader OpenMM community to use.

@peastman
Member Author

What is the output of POLTYPE? It would be great if we could make it so molecules parametrized with it can be used in OpenMM.

@jayponder

jayponder commented Jul 30, 2021

It generates a complete Tinker-compatible parameter file for the input molecule(s). As of now, I believe input is either SMILES strings, MOL2 files, or Tinker coordinates files. I think you folks already have tools to automate the conversion of the Tinker parameter file format to your XML-style format? We'd probably prefer to not build different outputs directly into the code, but we could discuss that...

@peastman
Member Author

We do have a quite old, kind of hacked together script for converting Tinker parameter files to OpenMM force fields. It only supports the specific features that we needed for converting AMOEBA, so it isn't really a general purpose Tinker parameter converter. I'll need to update it for the features in the newer AMOEBA.

The two obvious approaches would be either for POLTYPE to generate OpenMM XML files, or for OpenMM to have a well supported class for reading Tinker parameter files. I think both approaches could be made to work. The first approach obviously only needs to support the specific features found in POLTYPE output files. For the second approach, we would need to decide how general to make it.

@pren your thoughts on this would be great.
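
To make the second approach concrete, here is a minimal sketch of what reading a Tinker parameter file might involve. It handles only `atom` records and emits an OpenMM-style `<AtomTypes>` block. It is not the existing conversion script, and a real reader would also need multipoles, polarization, torsions, and the rest. The function name, the small element table, and the sample records are illustrative assumptions, not copied from any distributed parameter file.

```python
import shlex
import xml.etree.ElementTree as ET

# Small lookup for the example; a real converter would cover the periodic table.
ELEMENT_BY_ATOMIC_NUMBER = {1: "H", 6: "C", 7: "N", 8: "O", 16: "S"}


def tinker_atoms_to_ffxml(prm_text: str) -> str:
    """Convert Tinker `atom` records into an OpenMM-style <AtomTypes> block."""
    root = ET.Element("ForceField")
    atom_types = ET.SubElement(root, "AtomTypes")
    for line in prm_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("atom "):
            continue  # skip comments, blank lines, and other record types
        # shlex keeps the quoted description together as a single field.
        fields = shlex.split(stripped)
        type_id, class_id, _symbol, _description, atomic_number, mass = fields[1:7]
        ET.SubElement(
            atom_types,
            "Type",
            {
                "name": type_id,
                "class": class_id,
                "element": ELEMENT_BY_ATOMIC_NUMBER[int(atomic_number)],
                "mass": mass,
            },
        )
    return ET.tostring(root, encoding="unicode")


# Illustrative records in the Tinker layout:
# atom <type> <class> <symbol> "<description>" <atomic number> <mass> <valence>
SAMPLE_PRM = '''
atom          1    1    N     "Example N"                    7    14.003    3
atom          2    2    CA    "Example CA"                   6    12.000    4
bond          1    2          375.00     1.4370
'''

xml_text = tinker_atoms_to_ffxml(SAMPLE_PRM)
print(xml_text)
```

How general to make such a reader is exactly the open design question: every additional Tinker keyword needs its own mapping onto a force field element.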

@pren

pren commented Jul 30, 2021 via email

@peastman
Member Author

We've already got an XML file for the standard AMOEBA parameters, and we'll be creating an updated one for the newer version. So all that needs to be converted is the extra parameters generated by POLTYPE.

@jchodera
Member

Hi @jayponder @pren! Thanks for pointing out POLTYPE.

A few questions:

  • Is this the current POLTYPE repo? https://github.com/pren/poltype
  • I don't see any releases. Is this code in production now, or still in active development?
  • If users were to run a simulation with POLTYPE-produced parameters, how would they cite it?
  • Which version(s) of AMOEBA does POLTYPE produce parameters to be compatible with, and where can we find that version of the AMOEBA parameters and associated citations?
  • I see that poltype is a Python script, rather than an installable Python package. Is there a plan to make it an installable package?
  • Is there a list of dependencies for POLTYPE somewhere?
  • From what I can tell, it at least requires an installed version of Tinker and Gaussian (which is commercial). Is it compatible with any open source QM codes like psi4?
  • It also requires a copy of GDMA to be installed. Where is GDMA available?
  • Since the main tool is Python, it would be likely much easier to add capabilities to POLTYPE to optionally have it output the OpenMM ffxml or System serialized XML directly than to have a separate converter script. Would it be acceptable if we added this in a pull request at some point?
  • There also seem to be some Perl components. Is it necessary these remain in Perl?
  • We're not committing to this, but if we decided to invest the effort into somehow making this easily conda-installable and runnable by users in a fully open source manner, would you folks be OK with that? As of right now, it seems like it would be nightmarishly complicated to determine what dependencies are needed, install them, and automate running it on a new molecule to generate new parameters.
  • If you folks already have plans to make this simple for folks to use, we would want to avoid duplicating effort. Is this on your roadmap?

@jayponder

jayponder commented Jul 30, 2021

Hi John,

Hi @jayponder @pren! Thanks for pointing out POLTYPE.

A few questions:

  • Is this the current POLTYPE repo? https://github.com/pren/poltype

Yes. Though as per my earlier reply, see the "poltype2" branch.

  • I don't see any releases. Is this code in production now, or still in active development?

It is in active development. Though the current "development version", found in the "poltype2" repo is stable enough for general use. Lots of us are using it almost daily. If you prefer a different nomenclature, it's probably what you might consider an "alpha" or "beta" release version.

  • If users were to run a simulation with POLTYPE-produced parameters, how would they cite it?

I would suggest citing Pengyu's POLTYPE GitHub site, and the original POLTYPE paper: "Automation of AMOEBA polarizable force field parameterization for small molecules", J. C. Wu, G. Chattree and P. Ren, Theoretical Chemistry Accounts, 131, 1138 (2012). There is not a publication for POLTYPE2 at present, though I'm sure Pengyu will publish one in due course. I'll leave it to him to elaborate on publication plans.

  • Which version(s) of AMOEBA does POLTYPE produce parameters to be compatible with, and where can we find that version of the AMOEBA parameters and associated citations?

There are not really different "versions" of AMOEBA, just slight tweaks to the parameter sets over time. For proteins, nucleic acids, water, ions, etc. the current "production" parameters are all in the "amoebabio18.prm" parameter file distributed with the Tinker codes. This is our unified parameter set from circa 2018. The component pieces (proteins, etc.) were all published separately, and references can be found near the top of the parameter file. These are the parameters to be used with small organics, ligands, etc. generated via POLTYPE2.

  • I see that poltype is a Python script, rather than an installable Python package. Is there a plan to make it an installable package?

I'll let Pengyu reply here. But anyone with even minimal Python experience can install it.

  • Is there a list of dependencies for POLTYPE somewhere?

See the README_INSTALL.MD file.

  • From what I can tell, it at least requires an installed version of Tinker and Gaussian (which is commercial). Is it compatible with any open source QM codes like psi4?

As mentioned in my email above, you need either Psi4 or Gaussian. I am personally a big fan of Psi4, and if there are remaining dependencies specific for Gaussian, we should try to eliminate them so that only Psi4 is actually required (in case Gaussian is not available). One possible issue I am aware of is that Psi4 does not have gradients of implicit solvation models. This may be essentially the only remaining pure Gaussian dependency.

  • It also requires a copy of GDMA to be installed. Where is GDMA available?

GDMA is distributed with Psi4 as an option to be included at build time, and is (I think) included in the executable with their Conda distribution. For use with Gaussian, you need to download the freely available GDMA code from Anthony Stone's web site. It is a small Fortran90 code that is relatively simple and trivially fast to build.

  • Since the main tool is Python, it would be likely much easier to add capabilities to POLTYPE to optionally have it output the OpenMM ffxml or System serialized XML directly than to have a separate converter script. Would it be acceptable if we added this in a pull request at some point?

Again, up to Pengyu. But my guess is we would probably prefer for the translation to be done on the OpenMM end instead of on our end. This is a valid point for discussion.

  • There also seem to be some Perl components. Is it necessary these remain in Perl?

I was actually not even aware of this, and don't know what any Perl code (!) is doing. My guess is that it's some small, trivial thing. Have to get back to you on this...

  • We're not committing to this, but if we decided to invest the effort into somehow making this easily conda-installable and runnable by users in a fully open source manner, would you folks be OK with that? As of right now, it seems like it would be nightmarishly complicated to determine what dependencies are needed, install them, and automate running it on a new molecule to generate new parameters.

This is not really for you guys to do, and would probably be frustrating at present as the POLTYPE2 code is still changing frequently. POLTYPE2 is a Tinker family code, and developed by Pengyu's group. My intent in raising this whole conversation was not for you folks to feel like you have to distribute it as part of OpenMM. I simply wanted everyone to know that a parameterization tool is already available, and is under very active development. As I noted earlier, our eventual intent is to make AMOEBA parameterization, at least for molecules up to some reasonable size limit, available as a web service.

  • If you folks already have plans to make this simple for folks to use, we would want to avoid duplicating effort. Is this on your roadmap?

As above, this is for us to do. As you say, this is definitely on our roadmap, as it is essentially the GAFF or CGenFF for AMOEBA, and it belongs with Tinker and AMOEBA. We will try to make it easy for OpenMM users to access, and we are certainly open to suggestions to make that simpler. But Pengyu, and the rest of the Tinker and AMOEBA crew as needed, will develop, package and distribute POLTYPE and future parameterization tools.

Best, Jay

@jchodera
Member

Yes. As per my earlier reply, see the "poltype2" branch.

Goodness, somehow I had totally missed #3191 (comment), which anticipated the answers to many of my questions! Apologies for this, and thanks for the detailed reply. Going through this now...

@jchodera
Member

Hi @peastman I wonder what's the status of supporting more easily PME in alchemical free energy OpenMM implementations.

@jmichel80 : You'll be excited to check out #3173, which allows you to disable the direct-space PME calculation in NonbondedForce. This means that you can use offsets to turn on/off or change charges for multiple groups of atoms in NonbondedForce and exactly include their reciprocal-space PME contributions.

The direct-space PME contributions can be handled by a CustomNonbondedForce. Exceptions could either remain in NonbondedForce or CustomBondForce, depending on whether you need to alchemically modify them.

Bookkeeping is still very complicated here, but this at least means you can fully treat softcore electrostatics as well in a flexible manner while retaining exact PME at the endstates.
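
Part of that bookkeeping is just arithmetic. A parameter offset adds a global parameter times a per-particle scale to the base charge, so interpolating a charge between two end states means choosing the base value and the scale appropriately. The sketch below shows only that arithmetic in plain Python. The helper names and the charges are hypothetical. In an actual System the pairs would presumably be passed to `NonbondedForce.setParticleParameters()` and `addParticleParameterOffset()` together with a parameter registered through `addGlobalParameter()`, with the direct-space part switched off via `setIncludeDirectSpace(False)`.

```python
from typing import List, Sequence, Tuple


def charge_offsets(
    q_start: Sequence[float], q_end: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (base_charge, charge_scale) per particle.

    With these values, base + lam * scale equals q_start at lam = 0
    and q_end at lam = 1.
    """
    return [(q0, q1 - q0) for q0, q1 in zip(q_start, q_end)]


def effective_charges(
    offsets: Sequence[Tuple[float, float]], lam: float
) -> List[float]:
    """Charge of each particle for a given value of the global parameter."""
    return [base + lam * scale for base, scale in offsets]


# Hypothetical charges for a three-site group whose charges are turned off.
q_on = [-0.5, 0.25, 0.25]
q_off = [0.0, 0.0, 0.0]

offsets = charge_offsets(q_on, q_off)
for lam in (0.0, 0.5, 1.0):
    print(lam, effective_charges(offsets, lam))
```

Because the reciprocal-space term is evaluated from these effective charges, it remains exact at both end states. The softcore treatment then only has to be implemented for the direct-space part that is moved into the custom forces.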

@jchodera
Member

While we do now have our own GPU-capable codes (Tinker9 and Tinker-HP; see https://github.com/TinkerTools), we would like to and intend to keep a fully correct, efficient and up-to-date version of AMOEBA running in OpenMM for the broader OpenMM community to use.

This is fantastic, @jayponder, since this would not even be possible without your help.

I was hoping we could focus our limited resources (one NIH grant that was cut by >35%) on what would deliver the most unique value to users without being duplicative of other better-resourced projects. From Twitter, it looked like Tinker-HP was ready to launch into the stratosphere. :)

@jayponder

jayponder commented Jul 30, 2021

Hi John, One needs to be careful interpreting Twitter... I believe this has been amply proven by our collective experience with the political dysfunction here in America over the past few years.

@peastman
Member Author

peastman commented Nov 9, 2021

(On this topic: I just noticed GraphQL support for creating and editing GitHub Discussions posts that could allow us to migrate programmatically!)

Thanks! I'll take a look at it.

@jchodera
Member

jchodera commented Nov 9, 2021

Every member of the team should be monitoring that forum and participating in it. It's the primary place where users ask questions.

@peastman is right here. I do find it hard to use and pretty outdated, but we should be continuing to fully support it until we can gracefully transition to a more modern platform like GitHub Discussions.

@mark-cresset

I believe there are errors in both the PDB file and the force field. First, the CONECT records indicate two bonds between the NLN and the 0GB. One is from ND2 to C1, which I think is correct, and the other is from OD1 to H2O. That indicates a hydrogen is forming two covalent bonds???

One of the (many) braindead features of the PDB format is that CONECT records don't necessarily imply covalent bonds. In particular, the spec for the CONECT record says:

"For hydrogen bonds, when the hydrogen atom is present in the coordinates, a CONECT record between the hydrogen atom and its acceptor atom is generated."

So the CONECT record here is quite possibly correct, albeit incredibly unhelpful :).
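
As a rough illustration of how a tool might guard against this, the sketch below reads CONECT records and flags any hydrogen that ends up with more than one partner, since at most one of those connections is likely to be covalent. It assumes the fixed-column layout of the version 3.3 format (partner serials in columns 12-31, element in columns 77-78). The helper functions, the residue name, and the sample atoms are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def read_connectivity(pdb_text: str) -> Tuple[Dict[int, str], Dict[int, Set[int]]]:
    """Collect element symbols and CONECT partners, keyed by atom serial."""
    elements: Dict[int, str] = {}
    partners: Dict[int, Set[int]] = defaultdict(set)
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            elements[int(line[6:11])] = line[76:78].strip().upper()
        elif record == "CONECT":
            serial = int(line[6:11])
            # Partner serials sit in four 5-column fields (columns 12-31).
            for start in range(11, 31, 5):
                field = line[start:start + 5].strip()
                if field:
                    partners[serial].add(int(field))
                    partners[int(field)].add(serial)
    return elements, partners


def hydrogens_with_extra_partners(pdb_text: str) -> Dict[int, List[int]]:
    """Map each hydrogen that has more than one CONECT partner to its partners."""
    elements, partners = read_connectivity(pdb_text)
    return {
        serial: sorted(partners[serial])
        for serial, element in elements.items()
        if element == "H" and len(partners[serial]) > 1
    }


def hetatm(serial: int, name: str, element: str) -> str:
    """Hypothetical helper: build a minimal fixed-column HETATM record."""
    head = f"HETATM{serial:5d} {name:<4s} LIG A{1:4d}".ljust(30)
    coords = f"{0.0:8.3f}" * 3 + f"{1.0:6.2f}{0.0:6.2f}"
    return (head + coords).ljust(76) + f"{element:>2s}"


def conect(serial: int, *others: int) -> str:
    """Hypothetical helper: build a CONECT record."""
    return "CONECT" + f"{serial:5d}" + "".join(f"{other:5d}" for other in others)


SAMPLE = "\n".join([
    hetatm(1, "ND2", "N"), hetatm(2, "C1", "C"), hetatm(3, "OD1", "O"),
    hetatm(4, "O2", "O"), hetatm(5, "H2O", "H"),
    conect(1, 2),  # covalent link
    conect(4, 5),  # hydrogen to its parent oxygen
    conect(3, 5),  # hydrogen to its acceptor: a hydrogen bond
])

flagged = hydrogens_with_extra_partners(SAMPLE)
print(flagged)
```

This is only a heuristic. It cannot tell which of the flagged connections is the covalent one, so a real tool would still need geometry or residue templates to decide.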

@tristanic
Contributor

"For hydrogen bonds, when the hydrogen atom is present in the coordinates, a CONECT record between the hydrogen atom and its acceptor atom is generated."

... wow. I did not know that. Insane.

@peastman
Member Author

peastman commented Dec 1, 2021

Another reason to encourage people to move to PDBx/mmCIF. It provides an enumeration of connection types so you can distinguish a covalent bond from a hydrogen bond.
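
For example, the `_struct_conn` category carries a `conn_type_id` item with values such as `covale` and `hydrog`, so a reader can filter on it directly. The sketch below is a deliberately minimal loop reader: it assumes the category is written as a `loop_` with one header per line and single-line rows. It is not a general mmCIF parser, and the function name and sample rows are hypothetical.

```python
import shlex
from typing import Dict, List


def read_loop(cif_text: str, category: str) -> List[Dict[str, str]]:
    """Return the rows of one simple loop_ category as dictionaries."""
    prefix = category + "."
    columns: List[str] = []
    rows: List[Dict[str, str]] = []
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line.startswith(prefix):
            columns.append(line[len(prefix):])  # a header of the target loop
        elif columns and line and not line.startswith(("_", "#", "loop_")):
            values = shlex.split(line)  # respects quoted values
            if len(values) == len(columns):
                rows.append(dict(zip(columns, values)))
        elif columns and rows:
            break  # the target loop has ended
    return rows


# Hypothetical rows describing one covalent link and one hydrogen bond.
SAMPLE_CIF = """
loop_
_struct_conn.id
_struct_conn.conn_type_id
_struct_conn.ptnr1_label_comp_id
_struct_conn.ptnr1_label_atom_id
_struct_conn.ptnr2_label_comp_id
_struct_conn.ptnr2_label_atom_id
covale1 covale NLN ND2 0GB C1
hydrog1 hydrog NLN OD1 0GB H2O
#
"""

connections = read_loop(SAMPLE_CIF, "_struct_conn")
covalent = [row for row in connections if row["conn_type_id"] == "covale"]
for row in covalent:
    print(row["ptnr1_label_atom_id"], "-", row["ptnr2_label_atom_id"])
```

Only the rows typed as `covale` would then be turned into bonds in a topology. The hydrogen bond stays available as an annotation instead of being mistaken for a second covalent bond on the hydrogen.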

@mark-cresset

Another reason to encourage people to move to PDBx/mmCIF. It provides an enumeration of connection types so you can distinguish a covalent bond from a hydrogen bond.

But still (generally) no bond orders. Sigh.

@jchodera
Member

jchodera commented Dec 3, 2021

But still (generally) no bond orders. Sigh.

I believe we need both bond orders and formal charges in order to easily interconvert representations between toolkits. PDBx/mmCIF provides (optional) capabilities for both.

@ijpulidos and the OpenFF crew are talking to the RCSB folks soon---it's possible we could persuade them to include these in published models in the RCSB, but given these models frequently lack many atoms, it wouldn't make any sense to do this. However, as a community standard for interchange, it could be sensible to standardize around PDBx/mmCIF with the bond orders and formal charges specified.

@peastman
Member Author

peastman commented Dec 3, 2021

I think we're getting close to being ready to build a release candidate. Are there any remaining fixes anyone particularly wants to be sure we get in first?

@zhang-ivy
Contributor

It would be great to see #3124 addressed!

@raimis
Contributor

raimis commented Dec 3, 2021

#1155 has been waiting for 6 years.

@peastman
Member Author

peastman commented Dec 3, 2021

Both of those are new features. Since we're past the beta, we're only doing bug fixes and documentation changes right now. We decided to cut the feature list short so we could get the context handling changes out sooner.

@peastman
Member Author

peastman commented Dec 7, 2021

Last chance to speak up before I start building the release candidate!

@tristanic
Contributor

tristanic commented Dec 7, 2021 via email

@ijpulidos
Contributor

How about #3302 ?

@peastman
Member Author

peastman commented Dec 7, 2021

What in particular? There's some discussion of possible new features there, but I don't see anything that's a clear bug fix?

@jchodera
Member

jchodera commented Dec 7, 2021

We're hoping we can squeeze in a solution to #3301 (comment), and also get a couple of more days of data on the execution time cycling problem before locking into the release candidate!

@peastman
Member Author

peastman commented Dec 7, 2021

That's a feature, not a bug fix. This release already has a lot of optimizations to the long range correction. We can hopefully find ways to make it even faster (perhaps with #3368), but that's a feature for the next release.

With the execution time cycling, is there any reason to think it's caused by a bug? As far as I can tell, everything works correctly. The only question is whether we can find ways to make it faster, and even at that we're only talking about a small speedup.

@jchodera
Member

jchodera commented Dec 8, 2021

That's a feature, not a bug fix.

7.7.0 is a feature release, not a bugfix release, right?

This release already has a lot of optimizations to the long range correction

And none of them have actually fixed the 100x slowdown because we still haven't implemented the One Small Useful API extension I suggested. :)

With the execution time cycling, is there any reason to think it's caused by a bug? As far as I can tell, everything works correctly.

Unclear, which is why I asked for another couple of days to convince ourselves it isn't.

The only question is whether we can find ways to make it faster, and even at that we're only talking about a small speedup.

In some cases, it's 2x.

@peastman
Member Author

peastman commented Dec 8, 2021

7.7.0 is a feature release, not a bugfix release, right?

All features need to go in before the beta. We already cut a bunch of features so we could get out the context handling changes (needed for ML) as soon as possible. When we're all ready to build the release candidate, that's not the time for adding new features. At this point we're down to only merging high priority bug fixes.

@peastman
Member Author

peastman commented Dec 9, 2021

It's been another two days. Shall I start building it?

@peastman
Member Author

peastman commented Dec 26, 2021

After many trials with conda-forge, the release is now out! I'm now making my way through the release checklist.

  • Create GitHub release
  • Create conda packages
  • Perform minimal test
  • Update documentation on the website
  • Post announcement on the forum
  • Post announcement on OpenMM Twitter
  • Update benchmarks
  • Create new versions of downstream packages
  • Update the version number in conda recipe for nightly builds

@jchodera
Member

Hooray!

Where on this list should I tweet an announcement of the new release, and to which URL should I point them---the forum announcement or the GitHub release notes?

For benchmarks, I'm easily able to collect benchmarks in the short term for these GPUs:

  • NVIDIA A10, A40, A100
  • NVIDIA RTX 2080, 2080 Ti
  • Google Compute Engine T4, V100, P100, P4, K80

Once @dotsdl is able to build the new FAH OpenMM core22 from the openmm-7.7.0 conda-forge package, I can quickly set up and run a benchmark across all GPUs on Folding@home.

For the future, how do you want to discuss improvements/updates to the release checklist process for the next release? Should I open an issue for that?

@peastman
Member Author

Where on this list should I tweet an announcement of the new release, and to which URL should I point them---the forum announcement or the GitHub release notes?

It can be at the same time as the forum announcement. I would point to the GitHub release.

For benchmarks, I'm easily able to collect benchmarks in the short term for these GPUs:

That would be great. Hopefully I can also benchmark a variety of GPUs on the NVIDIA cluster. Overhauling our benchmarks section would be useful. We probably want to have lots of benchmarks for just a few GPUs, plus just a couple of benchmarks that are run on many different GPUs. Folding@Home would be really useful for the latter. For the former, we need a controlled environment that will be consistent from one release to the next so we can track changes in performance.

For the future, how do you want to discuss improvements/updates to the release checklist process for the next release? Should I open an issue for that?

Sounds good.

@peastman
Member Author

How about these benchmarks:

  • Full benchmark suite on an A100. It's a high end current generation GPU, so it shows off what we can do on the best hardware.
  • Full benchmark suite on a Titan V. It's an older but still pretty fast GPU, and I've used it for benchmarking the last several releases so we can directly compare.
  • Pick just a few of the larger systems (perhaps apoa1pme, amber20-cellulose, and amber20-stmv) to benchmark on multiple A100s to show how the performance scales.
  • Pick just a couple of benchmarks (perhaps apoa1pme and amber20-cellulose) to run on as many different GPUs as possible.

I can run the first three, and Folding@home would be perfect for the fourth one.

@jchodera
Member

jchodera commented Dec 27, 2021

It can be at the same time as the forum announcement. I would point to the GitHub release.

Can we add a box for this in #3191 (comment) if you'd like me to do this now, then?

That would be great. Hopefully I can also benchmark a variety of GPUs on the NVIDIA cluster. Overhauling our benchmarks section would be useful. We probably want to have lots of benchmarks for just a few GPUs, plus just a couple of benchmarks that are run on many different GPUs. Folding@Home would be really useful for the latter. For the former, we need a controlled environment that will be consistent from one release to the next so we can track changes in performance.

If we're going to overhaul the page, let's think about what our users want from us a bit more deeply. Perhaps we could open a new thread to discuss?

There are four main categories of benchmarks we would like to provide information for:

  1. At-a-glance comparison of MD package performance. These benchmarks provide comparison on standard systems and hardware between different packages (e.g. OpenMM, Amber, NAMD, gromacs, etc.) to help practitioners select performant software for their application classes.
  2. Cloud hardware performance. These benchmarks provide some useful assessments of the price:performance ratio of different GPUs available on cloud (AWS, GCE, Azure) providers on several workload categories of interest.
  3. HPC and high-end workstation performance. These benchmarks provide useful assessments of price:performance ratio for purchasing hardware for HPC clusters (passively-cooled GPUs) and workstations (high-end actively cooled GPUs).
  4. Folding@home. These benchmarks provide useful assessments of price:performance/PPD ratio for purchasing hardware for personal machines that might contribute to Folding@home workloads.

If we're going to rework the benchmarks page, perhaps we can think about how best to address these four categories of use cases?

In the interim, we can certainly update the benchmarks right away as we collect data for your suggestion #3191 (comment)

Does that sound reasonable?

For running the benchmarks now, I can collect data on a single A100, but don't have access to multiple A100s or a Titan V. The Folding@home benchmarks will follow in a week or two (depending on when @dotsdl has time to build/test/deploy.)

@peastman
Member Author

Can we add a box for this in #3191 (comment) if you'd like me to do this now, then?

Done.

Perhaps we could open a new thread to discuss?

Sounds good.

For running the benchmarks now, I can collect data on a single A100, but don't have access to multiple A100s or a Titan V.

I do, so I can run them.

@jchodera
Member

Tweet posted and checkbox ticked: https://twitter.com/openmm_toolkit/status/1475569472693886982

@peastman
Member Author

We now have updated versions of OpenMM-Torch and OpenMM-Setup. I think those are the only downstream packages we need to update. Which means this issue can now be closed!
