
[DISCUSSION] Allow conda install geoschem ? #47

Closed
JiaweiZhuang opened this issue Aug 23, 2019 · 8 comments
Labels
category: Feature Request New feature or request

Comments

@JiaweiZhuang
Contributor

JiaweiZhuang commented Aug 23, 2019

Just learned that making a new conda package is not too difficult (just made one for xesmf). For example, see the config file and build script for ESMF. It shows how to handle MPI/NetCDF dependencies. It is possible to follow a similar pattern to allow:

conda install -c conda-forge geoschem
conda install -c conda-forge gchp

ESMF is already on conda which makes things a lot simpler.

How does it work?
Conda ships its own compilers: gcc/gfortran by default on Linux and clang on Mac. The package is pre-compiled and stored on Anaconda Cloud; users then run conda install to download the compiled binary.

Limitations?

No compile-time configurations allowed. However, with FlexGrid I believe there are almost no compile-time flags.

Versioning is not a big concern as you can compile & upload all major versions, just like other packages.

Compared to containers?

Pros: Conda is more lightweight and doesn't require Docker/Singularity installed.
Cons: Container environment also allows you to re-compile from source.

Compared to Spack?

Frankly, the two approaches are quite similar: both can install in user space without root permission.
Pros: More people know how to use conda (for Python env) than Spack; conda install is faster, as the package is pre-compiled.
Cons: Spack allows fine-tuning compile flags and using other compilers (e.g. ifort).
To get optimal performance, especially on HPC systems, Spack is still the best choice. But if you just want a reasonable performance with gfortran -O3, conda is good enough.

Does this sound useful for users?

@LiamBindle you might be interested in this. Seems a fun exercise.

@LiamBindle
Contributor

LiamBindle commented Aug 25, 2019

Hi @JiaweiZhuang thanks for the mention!

Yeah a conda package for GEOS-Chem would be great! Regarding GEOS-Chem Classic, I agree it probably wouldn't be too difficult, but I do see two challenges:

  1. Run directories still need to be created from the unit tester, so users would have to be sure to check out the same version of the unit tester that conda installed. This will be fixed by the rundir wizard for GEOS-Chem Classic. Do you think it would be worth waiting until then?
  2. Chemistry mechanisms are still compile-time changes. A possible workaround would be to ship geos executables for all three (e.g. geos-standard, geos-tropchem, and geos-soa_svpoa) and decide at runtime which one actually gets called based on calling ./getRunInfo.
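A minimal sketch of the runtime-dispatch workaround in item 2, assuming the three executables are installed on PATH under the names listed above and that the mechanism name has already been obtained (for example from ./getRunInfo, whose exact interface is not shown here). The mapping, the function names, and the `geos-dispatch` command name are hypothetical.

```python
import shutil
import subprocess
import sys

# Hypothetical mapping from a chemistry mechanism name (as it might be
# reported for the run directory) to one of the pre-built executables.
EXECUTABLES = {
    "standard": "geos-standard",
    "tropchem": "geos-tropchem",
    "soa_svpoa": "geos-soa_svpoa",
}


def select_executable(mechanism: str) -> str:
    """Return the executable name for a mechanism, or raise ValueError."""
    key = mechanism.strip().lower()  # tolerate whitespace/case from script output
    try:
        return EXECUTABLES[key]
    except KeyError:
        supported = ", ".join(sorted(EXECUTABLES))
        raise ValueError(
            f"unsupported mechanism {mechanism!r}; expected one of: {supported}"
        ) from None


def run(mechanism: str) -> int:
    """Locate the matching executable on PATH and run it, returning its exit code."""
    exe = select_executable(mechanism)
    path = shutil.which(exe)
    if path is None:
        raise FileNotFoundError(f"{exe} not found on PATH")
    return subprocess.call([path])


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: geos-dispatch MECHANISM")
    sys.exit(run(sys.argv[1]))
```

Keeping the choice in a small lookup table means adding or dropping a mechanism is a one-line change, and unsupported mechanisms fail loudly instead of silently running the wrong binary.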

For GCHP I think we still have the same issues, but it also might be worth waiting until ESMF is made an external library so we can depend on the conda distribution of it.

Regarding Spack vs Conda vs containers, I don't think it needs to be one or the other. I think each serves a slightly different audience.

| | User needs a cluster? | Target user is someone who wants... |
|---|---|---|
| conda | Yes | A quick and easy way to install a prebuilt GEOS-Chem |
| spack | Yes | To build GEOS-Chem from source or with optimized compiler options |
| containers | No | To run GEOS-Chem on the cloud |

One of my motivations for the CMake work was that it would make automated builds easier (less sensitive to the build environment, with error checking at build-configuration time), which I hope will make it easier to add and maintain support for things like conda, spack, and singularity. Striving to support all three might be a good goal!

I think this is a great idea and I'd be happy to help out in getting a conda package for GEOS-Chem going!

What do you think?

Btw, it sounds like GMAO has some interest in spack too (see GEOS-ESM/MAPL#25).

@JiaweiZhuang
Contributor Author

JiaweiZhuang commented Aug 26, 2019

This will be fixed by the rundir wizard for GEOS-Chem Classic. Do you think it would be worth waiting until then?

Definitely. There is no urgency for this.

Chemistry mechanisms are still compile-time changes.

I would probably just support geos-standard to avoid maintenance troubles. Users who want more customization should compile from source code instead...

Regarding Spack vs Conda vs containers, I don't think it needs to be one or the other. I think each serves a slightly different audience!

I agree. My only concern is adding maintenance burden on GCST, and whether the benefits justify this additional maintenance (although updating a package should be as simple as changing the version name and rerunning the build).

Anaconda cloud tracks how many times a package gets downloaded, so we will be able to see whether this is used at all, and decide whether to continue supporting it.

For GCHP I think we still have the same issues, but it also might be worth waiting until ESMF is made an external library so we can depend on the conda distribution of it.

Given that GC-classic is not hard to compile, the most useful case is probably allowing conda install gchp so users can grab a standard implementation of GCHP without having to learn anything about containers or Spack. ESMF 8.0.0 should be available on conda by the end of this year; I also need it for xesmf.

@LiamBindle
Contributor

LiamBindle commented Aug 26, 2019

That all sounds good to me. I could tackle the GC-Classic conda package once the rundir wizard is added and GCHP once CMake is added.

@yantosca
Contributor

Hi --

This all sounds interesting. I don't know about Conda, but I think you can point Spack to a Git repository that it should pull code from.

The one downside of this is that people might get a little too comfortable with installing pre-built GEOS-Chem and then start demanding that we make available all of the varieties. (It's just human nature.) I think it's good to have the standard version but then for anything beyond that, users should clone GC from the Git repo.

@lizziel
Contributor

lizziel commented Aug 26, 2019

A few questions:

  1. Is there a develop mode such that the GEOS-Chem install comes with the source code and links to github?
  2. Creating a run directory will require the source code. Is this what you meant by run directory wizard GC Classic?
  3. Who is the target user for this?
  4. We have ready-to-run binary available on AWS cloud. How does this expand on that accessibility?

I think making gcpy available on conda is higher priority than GEOS-Chem, but I'd like to hear more.

@JiaweiZhuang
Contributor Author

JiaweiZhuang commented Aug 29, 2019

  1. Is there a develop mode such that the GEOS-Chem install comes with the source code and links to github?

Yes, for example the ESMF config file pulls from ESMF's git repo. The version can be specified by git_tag.

  2. Creating a run directory will require the source code. Is this what you meant by run directory wizard GC Classic?

I think the run directory can be treated as "small data files" and just shipped with the compiled binary.
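A minimal sketch of that idea, assuming a hypothetical package layout in which run-directory templates are installed under `<prefix>/share/geoschem/rundirs/<name>` next to the binary. `create_rundir` and the directory layout are illustrative only, not an existing GEOS-Chem tool.

```python
import shutil
import sys
from pathlib import Path


def create_rundir(template: str, dest: str, prefix: str = sys.prefix) -> Path:
    """Copy a packaged run-directory template into a new user directory.

    Assumes (hypothetically) that templates live under
    <prefix>/share/geoschem/rundirs/<template>.
    """
    src = Path(prefix) / "share" / "geoschem" / "rundirs" / template
    if not src.is_dir():
        raise FileNotFoundError(f"no run directory template at {src}")
    target = Path(dest)
    # copytree raises FileExistsError if the target already exists,
    # so an existing run directory is never silently overwritten.
    shutil.copytree(src, target)
    return target
```

Because the templates are plain files, they can be versioned together with the compiled binary, so a run directory always matches the installed model version.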

  3. Who is the target user for this?
  4. We have ready-to-run binary available on AWS cloud. How does this expand on that accessibility?

It is probably the quickest way to install GCHP on a user's own cluster or server: no containers needed, and no need to spend a long time building from source as with Spack (which can also have compiler-specific problems for GCHP).

Think about why Anaconda took off in the numerical Python community: it makes SciPy (mostly wrappers around Fortran & C functions) and all SciPy-related packages very easy to install. Otherwise people would have to build the Fortran dependencies manually. From a code perspective, numerical models are somewhat like SciPy... This packaging method works great as long as people don't need to change the source code.

making gcpy available on conda

For pure Python packages this is very easy to do!

@msulprizio msulprizio changed the title Allow conda install geoschem ? [FEATURE REQUEST] Allow conda install geoschem ? Sep 4, 2019
@msulprizio msulprizio added the category: Feature Request New feature or request label Sep 4, 2019
@yantosca yantosca changed the title [FEATURE REQUEST] Allow conda install geoschem ? [DISCUSSION] Allow conda install geoschem ? Oct 17, 2019
@kilicomu
Contributor

kilicomu commented Nov 3, 2019

it makes SciPy (mostly wrappers around Fortran & C functions) and all SciPy-related packages very easy to install

I'm not sure that GEOS-Chem is comparable with SciPy. SciPy is typically used as a supporting library for whatever it is you are working on, is it not? In the case of GEOS-Chem, GEOS-Chem is the 'whatever it is you are working on', and the SciPy equivalent would be something like the NetCDF4 libraries (which are already straightforward to install with conda).

If the GEOS-Chem community largely consisted of model runners, and not people who are both running and modifying the model to explore science questions, then straightforward installation of a pre-compiled model would undoubtedly be valuable. I'm not sure that the GEOS-Chem community is this way, leaving me wondering 'what is the substantial value in developing and supporting an additional way to get the model?'.

@yantosca
Contributor

I will close out this discussion but feel free to reopen.

6 participants