Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Generate meta.yaml dependencies #45

Closed
wants to merge 6 commits into from

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Mar 28, 2023

This PR is an alternative to #28 that is based on generating separate dependency files instead of modifying meta.yaml in place. It is a much simpler changeset as a result. I considered putting the data into the conda_build_config.yaml file instead of using separate YAML files for each dependency set, but I rejected that option because dfg is based around writing separate files sections for every include list and the only way to map that to a single conda_build_config.yaml file is to do a lot of in-place overwriting, which we have generally decided to avoid unless absolutely necessary, i.e. for pyproject.toml (otherwise we could just do that for meta.yaml too). Willing to be convinced otherwise though.

In order to support split recipes, this approach requires writing separate files for every section in every output, which may be too verbose for our liking. Curious to hear opinions there.

@vyasr vyasr self-assigned this Mar 28, 2023
@vyasr
Copy link
Contributor Author

vyasr commented Mar 28, 2023

A partial example of this can be seen in rapidsai/cudf#13022

Comment on lines -171 to -180
"if": {
"properties": { "table": { "const": "project.optional-dependencies" } }
},
"then": {
"required": ["key"]
},
"else": {
"not": {
"required": ["key"]
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was having trouble getting this conditional to behave correctly. According to the official JSON Schema docs it should have been sufficient to add a "required": ["table"] to the if statement, but no matter what I did the validator insisted that the key property was required when only the section key was present. I'll need to revisit this but didn't want to block the rest of the review on a schema issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC @csadorf if you have thoughts here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So when you simply added a "required": ["table"] to the "if" block and removed the "then" and "else" blocks it would always require it? That seems indeed very odd. Can you provide some valid/invalid examples that we can test against?

@vyasr
Copy link
Contributor Author

vyasr commented Mar 28, 2023

For versions that need to be in conda_build_config.yaml, one thought that I had is that I'm fairly certain that there's no specific reason you have to encode only a version in that file, you could also include the package names themselves. For instance, in this first example I believe the following would be equally valid:

python:
    - python 2.7
    - python 3.5

with

package:
    name: compiled-code
    version: 1.0

requirements:
    build:
        - {{ python }}
    run:
        - {{ python }}

If that works then we ought to be able to generate the entire config file with dfg. Not sure if we want to go that route or if that is in scope for this PR, though.

@vyasr
Copy link
Contributor Author

vyasr commented Mar 28, 2023

We should also keep in mind the alternative proposed by @jakirkham in rapidsai/rmm#1220. I've already outlined my thoughts on the pros and cons there, so just restarting that conversation now.

Copy link
Member

@ajschmidt8 ajschmidt8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach a lot better.

I'll summarize our huddle about this PR here for posterity:

I had some concerns about how we'd account for these Jinja functions provided by conda build.

The results of our conversation were:

  • We can likely drop the use of pin_compatible altogether and just manually manage those version relationships wherever necessary. You mentioned we have to do this for wheels anyway
  • We'd like to continue using pin_subpackage since we won't be able to emulate it's exact=True flag functionality (which is pretty much the only flag that use)
  • We can likely drop the use of compiler altogether assuming:
    • We can identify the packages that result from calling compiler() for different architectures
    • Conda provides us with some way to load external files with variable names (e.g. meta_dependencies_x86_64_build.yaml vs meta_dependencies_aarch64_build.yaml)

@ajschmidt8
Copy link
Member

After reading about the compiler() function in the two sections below, I don't think we need to use them at all.

It seems like that function is mostly used for cross-compiling, which we don't do (we natively compile on amd64 and arm64).

From these conda-forge docs, it seems that compiler('cuda') is only responsible for adding nvcc to the build requirements and also adding cudatoolkit to the run requirements. But we generally ignore those run requirements anyway (1,2) since we manage that ourselves.

So I'm assuming we can just replace compiler('cuda') with nvcc.

@jakirkham, can you corroborate all this?

Copy link
Contributor

@csadorf csadorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this seems like a rather trivial change and I would have no major objections. But I'm wondering whether it is too trivial, considering that a conda_meta file can be fairly complex.

Can you add at least one valid example input and expected output (and thus test)?

How does this mesh with tools like Grayskull?

Comment on lines 25 to +26
default_conda_dir = "conda/environments"
default_conda_meta_dir = "conda/recipes/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why to use a trailing backslash here, but not for the other dirs?

Comment on lines +265 to +266
extras.get("output", ""),
extras.get("section", ""),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
extras.get("output", ""),
extras.get("section", ""),
extras.get("output"),
extras.get("section"),

Since you are filtering on None anyways.

@@ -55,6 +56,8 @@ def dedupe(dependencies):
"""
deduped = sorted({dep for dep in dependencies if not isinstance(dep, dict)})
dict_deps = defaultdict(list)
# The purpose of the outer loop is to support nested dependency lists such as the
# `pip:` list. If multiple are present, they must be internally deduped as well.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated?

Comment on lines +89 to +92

Notes
-----
An empty `gridspec` dict will result in an empty dict as the single yielded value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated?

@@ -131,6 +138,12 @@ def make_dependency_file(
"dependencies": dependencies,
}
)
elif file_type == str(OutputTypes.CONDA_META):
file_contents += yaml.dump(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assume that the header will be valid for all generated file types?

elif file_type == str(OutputTypes.CONDA_META):
file_contents += yaml.dump(
{
"dependencies": dependencies,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "dependencies" section of a conda-meta file is called "requirements". What am I missing here?

@@ -162,22 +163,16 @@
"extras": {
"type": "object",
"properties": {
"section": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the keyword "section" originate from the conda docs? I am wondering whether it would make sense to better namespace our "extras", something like "conda-meta-section" instead of just "section".

Comment on lines -171 to -180
"if": {
"properties": { "table": { "const": "project.optional-dependencies" } }
},
"then": {
"required": ["key"]
},
"else": {
"not": {
"required": ["key"]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So when you simply added a "required": ["table"] to the "if" block and removed the "then" and "else" blocks it would always require it? That seems indeed very odd. Can you provide some valid/invalid examples that we can test against?

@vyasr
Copy link
Contributor Author

vyasr commented Apr 13, 2023

Overall this seems like a rather trivial change and I would have no major objections. But I'm wondering whether it is too trivial, considering that a conda_meta file can be fairly complex.

To be clear, we are not generating a full recipe anymore, just simple YAML files containing dependency lists. That's basically a subset of what we already need for conda environment.yaml files (same thing, but without the channels). That's why this is so simple right now.

Can you add at least one valid example input and expected output (and thus test)?

I will when I get a chance. This work has temporarily been deprioritized since we don't want to make this kind of change until after we have CUDA 12 compatible recipes, but once that's done I'll be getting back to this and will add some then.

How does this mesh with tools like Grayskull?

Grayskull's goals are a bit different. IIUC it's for generating an entire recipe given a PyPI package or a repo. However, the resulting package will be too simple for our purposes since it won't encode all of the extra information we put in our recipes around things like run_exports for instance, nor will it support complex features like split outputs. This PR is an approach that lets us share dependencies with the rest of our infrastructure while still having the flexibility to fill out the non-dependency parts of the recipe, which strikes the best balance for us.

@csadorf
Copy link
Contributor

csadorf commented Apr 14, 2023

Overall this seems like a rather trivial change and I would have no major objections. But I'm wondering whether it is too trivial, considering that a conda_meta file can be fairly complex.

To be clear, we are not generating a full recipe anymore, just simple YAML files containing dependency lists. That's basically a subset of what we already need for conda environment.yaml files (same thing, but without the channels). That's why this is so simple right now.

So the conda-meta semantics here means "this will be needed for a conda recipe" rather than "we are generating a conda-meta file"?

How does this mesh with tools like Grayskull?

Grayskull's goals are a bit different. IIUC it's for generating an entire recipe given a PyPI package or a repo. However, the resulting package will be too simple for our purposes since it won't encode all of the extra information we put in our recipes around things like run_exports for instance, nor will it support complex features like split outputs. This PR is an approach that lets us share dependencies with the rest of our infrastructure while still having the flexibility to fill out the non-dependency parts of the recipe, which strikes the best balance for us.

Sure, but I would be surprised if some of our downstream processing tools wouldn't share at least some of the logic. Or on the flipside, maybe a Grayskull generated recipe could be further processed. Or we add missing features to Grayskull. Maybe none of this is appropriate, but I wanted to mention this to make sure it has at least been considered.

@vyasr
Copy link
Contributor Author

vyasr commented Apr 19, 2023

So the conda-meta semantics here means "this will be needed for a conda recipe" rather than "we are generating a conda-meta file"?

Correct. Happy to rename if you think there is a better way to convey this.

Sure, but I would be surprised if some of our downstream processing tools wouldn't share at least some of the logic. Or on the flipside, maybe a Grayskull generated recipe could be further processed. Or we add missing features to Grayskull. Maybe none of this is appropriate, but I wanted to mention this to make sure it has at least been considered.

These are good thoughts. My opinion here is that conda recipes are fairly complex configuration files, and we don't want to be in the business of dealing with them directly. dfg is purely about dependencies.

In the case of pyproject.toml files, we can modify in place largely because a package exists (tomlkit) that supports format-preserving in-place modification. Since conda recipes are a superset of YAML (they also support Jinja etc), we have no way to do the same without rolling our own parser. Moreover, the complexity of split recipes means that we would need a lot of complexity in dfg to handle writing to subsections that are in split outputs vs at the top level of a recipe. As a result, we chose to go this route of generating separate files for each dependency set. Even if we were able to modify in place, however, we still wouldn't want to modify any section aside from dependency lists. As a result, I would expect that the only overlap we have with Grayskull is in Grayskull's ability to generate the dependency sections of meta.yaml from either setup.py or pyproject.toml (I haven't looked at how it does this). I'm not sure how much work it's worth to try and find that common ground in order to share code/logic.

@vyasr
Copy link
Contributor Author

vyasr commented May 8, 2024

I'm going to close this for now. A decent bit has changed since this initial implementation. Also, depending on if we move forward with rapidsai/build-planning#47 there may be much better ways to accomplish our goals in this PR (i.e. writing out meta.yaml from a template directly). We should still address this issue, but the code in this PR is probably no longer the best way to do so.

@vyasr vyasr closed this May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants