
Add build dependencies to meta.yaml #2132

Merged (28 commits) on Aug 22, 2022

Conversation

@rth (Member) commented Jan 24, 2022

This adds build dependencies to meta.yaml (host dependencies, in conda vocabulary). The end goal is faster parallel builds, since there would be fewer dependency-tree bottlenecks and sometimes fewer runtime dependencies overall (we are currently mixing the two).

Closes #1476
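
For illustration, the host/run split might look roughly like this in a package's meta.yaml (the package contents below are made up to show the pattern, not taken from this PR's diff):

requirements:
  host:
    - numpy          # needed in the cross-compilation environment to build the wheel
  run:
    - numpy          # also needed at runtime in Pyodide
    - packaging      # pure-Python, runtime only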

TODO:

  • finish migrating all packages to differentiate between run and host dependencies (adapted from conda-forge)
  • fix tests
  • add docs

@hoodmane mentioned this pull request on Feb 14, 2022
@henryiii (Contributor) commented Mar 5, 2022

This also removes the manual workaround for the circular dependency issue.

Do we really need "run" dependencies at all, though? Once the wheel is built, can't it just declare install_requires on other packages, and micropip.install can pick them up just like it would normally? For example, if we built yt, then micropip.install("yt") would install any install_requires that are missing just like normal, skipping any pre-built ones (numpy, for example). Currently one built package (yt 3.0) requires a bunch of minor pure-Python dependencies that could be downloaded from PyPI instead of being included in a growing package list.

You'd still need a few pure-Python packages for building, like setuptools_scm, but not things like pqdm, which are runtime-only (at least I guess it doesn't show a progress bar when building! :) ).

@hoodmane (Member) commented Mar 5, 2022

Yeah @henryiii, that is the direction I was thinking of going as well. If a package is pure Python and we don't have to patch it, we shouldn't have to build it. Ideally we could still do "static" ahead-of-time dependency resolution, but I was thinking we should just add something like micropip.freeze(), which stores all current dependency-resolution info into a lock file. This way, if people need some pure-Python package for their app, they can do micropip.install("some_pkg") (which can do the expensive resolution), then micropip.freeze(), and bundle the resulting lock file with their app, so that when they need some_pkg it can be downloaded in parallel with all of its dependencies.
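
A sketch of the workflow being proposed (micropip.freeze() did not exist when this comment was written; the function usage and lockfile name below are illustrative):

# Do the expensive dependency resolution once, at development time.
import micropip

async def make_app_lockfile():
    await micropip.install("some_pkg")
    # Snapshot the resolved package set into a lockfile that gets bundled with
    # the app, so at runtime some_pkg and all of its dependencies can be
    # downloaded in parallel without re-resolving anything.
    lockfile_contents = micropip.freeze()
    with open("app-lock.json", "w") as f:
        f.write(lockfile_contents)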

@rth (Member, Author) commented Mar 5, 2022

Once the wheel is built, can't it just install_requires on other packages, and micropip.install can pick them up just like it would normally?

To get the dependencies you first need to download the wheel, so it's less efficient than having a separate metadata file: you cannot start downloading the dependencies in parallel. Also,

  • our dependencies don't necessarily match the ones in install_requires; for instance, distutils from the stdlib is unvendored and is a separate dependency. Also, code-wise it seems more reliable and simpler to have the full dependency graph in advance and download everything needed, rather than trying to reconstruct it one wheel at a time.
  • Other packages are .tar files, not wheels (e.g. CLAPACK, tests), so we couldn't rely on wheel metadata for them anyway.

Currently one built package (yt 3.0) requires a bunch of minor pure python dependencies that could be downloaded from PyPI instead of being included in a growing package list.

I see what you mean. We could indeed also indicate in some way in meta.yaml that a dependency should be fetched from PyPI directly, say,

requirements:
  run:
    - numpy
    - matplotlib
    - sympy
    - another_package:pypi

as long as there is an entry in packages.json for the core packages, that would address my concerns above.

but I was thinking we should just add something like micropip.freeze() which stores all current dependency resolution info into a lock file.

+1, but I also see a slightly different use case here: packaging apps, as opposed to using the default distribution, say, in a REPL.

@henryiii (Contributor) commented Mar 5, 2022

If implemented, please use conda's syntax for that!

requirements:
  run:
    - numpy
    - matplotlib
    - sympy
    - pip:
        - another_package

(This is mostly from environment.yml's syntax, but we could use it here).

Though I think it would be better to use install_requires and just list Pyodide-specific dependencies here (like CLAPACK, etc.). That would be less maintenance. You don't need to reconstruct the dependencies unless you are installing or testing. In fact, maybe the minimal dependencies could be listed in "run" and used for testing, but when a user runs micropip.install, the full install_requires would be used, grabbing things like "ipython" in yt's case that we don't need for testing. And yes, you'd need a method to save the downloaded wheels for distribution, to avoid the install step for things you want pre-installed.

@hoodmane (Member) commented Mar 5, 2022

the minimal dependencies could be listed in "run", and that would be used for testing

This does still add a lot of maintenance work. Probably what we should do is only list dependencies as an override for what's in install_requires. Perhaps we could have a key to add extra stuff like distlib and another key to replace the install_requires entirely?
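
A rough sketch of what that could look like (the run-extra and run-replace key names are hypothetical, not something this PR defines):

requirements:
  run-extra:        # hypothetical: added on top of the package's install_requires
    - distlib
  run-replace:      # hypothetical: if present, used instead of install_requires entirely
    - numpy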

And yes, you'd need a method to save the download wheels for distribution to avoid the install step for things you want pre-installed.

Well, aside from test unvendoring, I'm not sure it makes a huge difference at runtime whether the package is downloaded from our CDN or from PyPI. The main issue is just that the dependencies need to be recorded ahead of time so that they can all be downloaded at the same time.

@hoodmane mentioned this pull request on Mar 13, 2022
@rth (Member, Author) commented Apr 1, 2022

Once the wheel is built, can't it just install_requires on other packages, and micropip.install can pick them up just like it would normally?

Another reason why this wouldn't work is partial builds. When the user asks to build a single package, we need to know whether any of its dependencies also need to be built, even if the package doesn't depend on them at build time. For instance, there are a number of packages that have run dependencies ['numpy', 'scipy'] and host dependencies ['numpy']. If one is doing a custom build of Pyodide and the run dependencies are not indicated, scipy will not be built.

As far as I understand, with respect to build isolation added by @hoodmane,

  • pure-Python packages needed at build time would generally be handled by build isolation via pyproject.toml directly
  • C libraries (e.g. libxml, needed by lxml) or Python packages with C extensions would need to be handled by this system.
    • one could say that this corresponds to packages in UNISOLATED_PACKAGES, but I'm not sure that's exactly true. For instance, according to that list, scikit-learn runs with build isolation. However, that only works because numpy and scipy were built before, and we drop the exact dependency versions (please correct if needed). So in that case, we would still need to specify numpy and scipy as host requirements with this system, so they are built at the right time. It is a somewhat confusing situation.
  • finally, other packages need to be present in the host environment (e.g. Cython)

@hoodmane (Member) left a comment


Thanks for working on this @rth.

As I said I think the correct setup would be:

  1. libraries should only be host dependencies, never run or build dependencies
  2. shared libraries should always be both host and run dependencies but never build dependencies
  3. Python packages can be run dependencies. If they are needed at build time, they should be host dependencies if cross-build-env: true, in other cases they should be build dependencies.

I think one conclusion from this is that it is possible to programmatically determine whether a package should be a host or a build dependency, though it's probably better to separate them since they are conceptually and practically distinct.
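
A hedged meta.yaml sketch of rules 1-3 above (the package and dependency names are illustrative, not taken from this PR's diff):

requirements:
  host:
    - libxml          # rule 1: a library is a host dependency only
    - openssl         # rule 2: a shared library goes in host...
    - numpy           # rule 3: a Python package needed at build time, allowed in host
                      #         because it has cross-build-env: true
  run:
    - openssl         # rule 2: ...and also in run
    - numpy           # rule 3: Python packages can also be run dependencies
    - packaging       # pure-Python, runtime only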

Review comments (resolved, now outdated) on: packages/libxml/meta.yaml, docs/development/meta-yaml.md, packages/cffi/meta.yaml
@ryanking13 (Member)

3. If they are needed at build time, they should be host dependencies if cross-build-env: true, in other cases they should be build dependencies.

Then we should probably also raise an error when a Python package is specified as a host dependency but that package's cross-build-env is False?

@hoodmane (Member)

probably we should also raise error when a Python package is specified as a host dependency but cross-build-env of that package is False?

Yes. I think a subsequent PR should add build dependencies and also add this logic requiring host dependencies to have shared-library: True or cross-build-env: True.
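
A sketch of the check being discussed (the data structure and function names below are illustrative, not the actual pyodide-build API):

# Hypothetical validation pass over the parsed meta.yaml files.
from dataclasses import dataclass, field

@dataclass
class PackageSpec:
    name: str
    host_dependencies: list[str] = field(default_factory=list)
    shared_library: bool = False
    cross_build_env: bool = False

def check_host_dependencies(packages: dict[str, PackageSpec]) -> None:
    """Raise if a host dependency is a plain Python package, i.e. it has
    neither shared-library: true nor cross-build-env: true."""
    for pkg in packages.values():
        for dep_name in pkg.host_dependencies:
            dep = packages[dep_name]
            if not (dep.shared_library or dep.cross_build_env):
                raise ValueError(
                    f"{pkg.name}: host dependency {dep.name} must set "
                    "shared-library: true or cross-build-env: true"
                )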

@ryanking13 (Member) commented May 19, 2022

I think a subsequent PR should add build dependencies

I am a little bit confused. What do we specify as a build dependency? If some pure-Python packages are required during the build, doesn't pypa/build automatically install them (assuming they are correctly specified in setup.py or pyproject.toml, ...) in its isolated environment, so we don't need to care?

@hoodmane (Member) commented May 19, 2022

some pure python packages are required during the build, doesn't pypa/build automatically install them (if they are correctly specified in setup.py or pyproject.toml, ...) in its isolated environments so we don't need to care?

That's a fair point. Usually but not always. Some packages conditionally recythonize only if Cython is already available. Sometimes setuptools releases a regression that breaks matplotlib builds. Also, there is something weird with pythran. When we have constraints that are different than the install_requires constraints that the package lists, it would be useful to have a place to put them.
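
A sketch of what such a place might look like (the build: section below is hypothetical; this PR only adds host dependencies, and the pins are purely illustrative):

requirements:
  host:
    - numpy
  build:                  # hypothetical key, discussed but not added by this PR
    - Cython==0.29.32     # force re-cythonizing with a known-good version
    - setuptools<65       # pin around an upstream regression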

@hoodmane (Member)

@rth Can we merge this? It seems like a nice improvement and CI passes.

@rth (Member, Author) commented May 24, 2022

Thanks for the review! I'll try to get this done this week.

CI passes

I realized there is still some issue with build packages --only "*" as currently that CI job doesn't actually build packages. So it's not ready to be merged for now.

@ryanking13 (Member) commented Jul 11, 2022

I realized there is still some issue with build packages --only "*" as currently that CI job doesn't actually build packages. So it's not ready to be merged for now.

I think it works correctly. It is expected behavior that the build-packages CI job does not build any packages but generates the repodata.json file from already-built packages.

@ryanking13 (Member) commented Aug 17, 2022

@rth It would be great if we could merge this before we get more packages. I think this worked correctly before, and it still will unless I broke something while fixing merge conflicts.

@rth (Member, Author) commented Aug 18, 2022

Thanks for continuing it, @ryanking13! Agreed. The reason it was still not merged is that I was not 100% confident it would work as expected in all edge cases: the test suite for dependency resolution doesn't really cover all the cases in the real dependency tree (it's hard to do so), and rebuilding the full tree to test this was a bit frustrating.

But overall yes, let's merge, it's rather low risk. If there are any issues with partial builds we discover later, we still have plenty of time to fix those before the next release.

@rth marked this pull request as ready for review on August 18, 2022 at 12:01
@ryanking13 (Member)

the test suite on dependency resolution doesn't really cover all the cases in the real dependency tree (it's hard to do so) and rebuilding the full tree to test this was a bit frustrating.

Yes, there are probably some packages whose host dependencies we didn't check correctly, but we can discover those later.

@ryanking13 (Member) left a comment


The remaining test error comes from #2975


Successfully merging this pull request may close these issues.

Separate build/host and run dependencies in meta.yaml
4 participants