Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for metadata hook #130

Closed
MrMino opened this issue Oct 8, 2020 · 26 comments · Fixed by #217
Closed

Support for metadata hook #130

MrMino opened this issue Oct 8, 2020 · 26 comments · Fixed by #217
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@MrMino
Copy link

MrMino commented Oct 8, 2020

Am I missing something, or is prepare_metadata_for_build_wheel() not implemented?

It seems like it's something that isn't specified in PEP-517, but I was of the impression, that metadata hook is a mandatory call for the frontend, which will call that hook before build_wheel(), if available. After all, it's not up to the build frontend whether metadata is needed, the responsibility for that decision lies on the code that actually builds the distribution.

Am I correct in my interpretation of the spec?

@FFY00
Copy link
Member

FFY00 commented Oct 8, 2020

prepare_metadata_for_build_wheel is an optional hook. AFAIK it exists to only generate the metadata, which you might want to do in some situations.

Note the last sentence refeering to that hook in the PEP:

If a build frontend needs this information and the method is not defined, it should call build_wheel and look at the resulting metadata directly.

So basically, if you want to get the metadata you can call prepare_metadata_for_build_wheel if it exists, which would presumabily be faster than build_wheel. If it is not present, you are forced to do the full build using build_wheel and then get the metadata from there 😊

But are you having any particular issue? Perhaps I can help there.

@MrMino
Copy link
Author

MrMino commented Oct 9, 2020

@FFY00 Not really much of an issue, it's just that my build_wheel assumes that prepare_metadata_for_build_wheel has actually been called, and trips up when metadata=None. I assumed optionality of it for the backend, and mandatoriness for the frontend.

Assumption is the mother of all f-ups, I guess.

It would be nice to have an API endpoint in ProjectBuilder for it though.

@FFY00
Copy link
Member

FFY00 commented Oct 9, 2020

Yeah, I believe most bugs and vulnerabilities in software are in some way or another due to bad assumptions 😅

We could totally add support for prepare_metadata_for_build_wheel in ProjectBuilder.

@FFY00 FFY00 added enhancement New feature or request good first issue Good for newcomers labels Oct 9, 2020
@gaborbernat
Copy link
Contributor

@FFY00 Not really much of an issue, it's just that my build_wheel assumes that prepare_metadata_for_build_wheel has actually been called, and trips up when metadata=None. I assumed optionality of it for the backend, and mandatoriness for the frontend.

The frontend is not mandated to call it. The backend can always call it, if it wasn't called it yet during the build wheel call. This is what setuptools does.

Can you clarify what is your build_wheel refering to here?

@pganssle
Copy link
Member

pganssle commented Oct 9, 2020

Can you clarify what is your build_wheel refering to here?

build_wheel is one of the mandatory PEP 517 hooks.

I would agree that @MrMino should fall back to calling his own prepare_metadata_for_build_wheel if it's not supplied, though I'm mildly surprised that this hasn't come up before — why does pip unconditionally call prepare_metadata_for_build_wheel? Is it because it can determine using the metadata whether it needs to continue with the build or something? (e.g. don't bother executing build_wheel if python_requires doesn't match?)

@pganssle
Copy link
Member

pganssle commented Oct 9, 2020

I would agree that @MrMino should fall back to calling his own prepare_metadata_for_build_wheel if it's not supplied

Actually, a thought occurs — PEP 517 always specifies that these hooks are called in their own independent processes, so actually calling the other hook may not be the same as having the front-end call it. Still, PEP 517 does say this:

If the build frontend has previously called prepare_metadata_for_build_wheel and depends on the wheel resulting from this call to have metadata matching this earlier call, then it should provide the path to the created .dist-info directory as the metadata_directory argument.

Which pretty clearly indicates that the front-end is not required to call prepare_metadata_for_build_wheel before calling build_wheel. Unless we actually need the information in prepare_metadata_for_build_wheel, we shouldn't call it IMO, since errors like this will help people stay spec-compliant.

@uranusjr
Copy link
Member

uranusjr commented Dec 21, 2020

The pep517 library implements automatic fallback for prepare_metadata_for_build_wheel:
https://github.com/pypa/pep517/blob/9b744ec98e6571ecd0a278b2bab0e0c6ccdb97a3/pep517/_in_process.py#L117-L133

pip unconditionally calls this function, not the backend implementation directly.

@MrMino
Copy link
Author

MrMino commented Dec 22, 2020

@pganssle after building a build backend from scratch for the second time, this interpretation of the spec still doesn't make sense to me.

Making it optional for the frontend means, that If I'm going to provide this hook and it hasn't been called, I'll have to call that code from build_wheel myself.

It's getting called one way or another. Functionally there will be no difference, apart maybe from the fact that it's a separate process and the frontend has a way of specifying the area of the filesystem where the "preparing" should happen. Which, if we accept the optionality, it shouldn't really care about in the first place.

Optionality for the frontend means that there's no practical difference for when it is there as opposed to when it's not.

If this point is interpreted the way you propose, it ceases to have any normative content. Why is it even specified in the spec then?

@FFY00
Copy link
Member

FFY00 commented Dec 22, 2020

That is true for the use-case where you want to call build_wheel, but prepare_metadata_for_build_wheel isn't there for that use case. It is there so that we can get the metadata without building the wheel.

@MrMino
Copy link
Author

MrMino commented Dec 22, 2020

@FFY00 That's one thing I haven't thought through. But on the first glance it still seems useless.

The frontend has to use the metadata. It either:

  1. Calls the hook, parses .dist-info.
  2. Doesn't call the hook, unpacks the wheel, parses .dist-info.

That point would still seem to not serve much purpose. I'm not going to argue, I'm just looking for a way of making it less hand-wavy in my mind.

@gaborbernat
Copy link
Contributor

What point? Option 1 serves to be a faster way of option 2.

@FFY00
Copy link
Member

FFY00 commented Dec 22, 2020

The issue here is when you have backends that take a long time, which is usually the case of backends that compile native code for eg. Those backends can opt in on this hook to result in a better experience for frontends. I can see the argument that it is a bit of work for forntends to implement this hook with the fallback to build_wheel, but it is still much better to have it and allow frontends to use it than not. As @uranusjr pointed out, the pep517 package already implements this (with the fallback) anyway so it isn't really a big deal.

@MrMino
Copy link
Author

MrMino commented Dec 22, 2020

What point? Option 1 serves to be a faster way of option 2.

@gaborbernat If the hook is optional for the frontend, the point in the spec about it ceases to have any normative content, rendering it useless.

For the build_wheel case, it's not having any practical difference.
For taking metadata out without building a wheel, the frontend has to implement at least 2., which is a superset of 1.

@FFY00
Copy link
Member

FFY00 commented Dec 22, 2020

For taking metadata out without building a wheel, the frontend has to implement at least 2., which is a superset of 1.

No? But I guess that might depend on your backend's design... I would say most backends generate the metadata first and only then build stuff.

@MrMino
Copy link
Author

MrMino commented Dec 22, 2020

@FFY00 Yes, for the frontend - it is. For if you have to unpack the wheel to get .dist-info/METADATA you might as well call the hook and parse it in metadata_directory. Unless we're talking in-memory metadata parsing directly from ZipInfo, but that usecase doesn't really seem to make sense in the context of a build frontend.

@MrMino
Copy link
Author

MrMino commented Dec 22, 2020

BTW. ensuring we're on the same page, I'm talking about it being "mandatory call for the frontend in case the hook is there".

Optionality for the backend is mentioned directly in the spec and makes sense when there's no way of producing valid metadata before starting the build.

@gaborbernat
Copy link
Contributor

@gaborbernat If the hook is optional for the frontend, the point in the spec about it ceases to have any normative content, rendering it useless.

I don't follow your thinking here 🤔

@MrMino
Copy link
Author

MrMino commented Dec 22, 2020

@gaborbernat

What I mean by "normative content" is a specification point, that establishes something that simplifies or improves implementations in some way, e.g. by reducing the amount of checks needed for two parties to be compatible. As opposed to rules that don't add anything, e.g. "parties may do X, or may do ¬X" - such statements have no substance to them.

I proposed, that if you assume the hook to be an optional call for the frontend, this part of PEP-517 becomes logically equivalent to such tautology. The backend might add it, it might not - it changes nothing for neither party, as both have to stay compatible with each other's "non-hook" versions, so who cares.

  • If you assume that the hook is an optional call for a frontend, the backend implementation does not get simplified in any way.
  • For a frontend, this call is a 2-3 line detour from what it would have to do anyway. Not including it does not simplify the code in any significant way, rather it slows the whole process down in case the frontend needs the metadata.

I assumed that the build frontend depends on the metadata in some way, which I now realize might not be the case. It still seems like an unnecessary degree of freedom that complicates things for build backends though.

@pganssle
Copy link
Member

I proposed, that if you assume the hook to be an optional call for the frontend, this part of PEP-517 becomes logically equivalent to such tautology. The backend might add it, it might not - it changes nothing for neither party, so who cares

The value of that section in a world where the hook is optional for both the front-end and the back-end (the world we live in...), is that if the hook exists, it must have the specified behavior and interface.

In the case where the front-end just wants the metadata, it can check if the hook is implemented and if so use the generated metadata. In the case where the back-end needs a wheel, there is no reason to call prepare_metadata_for_build_wheel directly.

@gaborbernat
Copy link
Contributor

gaborbernat commented Dec 22, 2020

The may parts are escape hatchets to speed up some operations. The benefit for users is faster turnaround, which should be enough motivation for the backends that can provide it to implement it. Without enforcing for every backend to provide it. Performance is important too, not just functionality and correctness.

@MrMino
Copy link
Author

MrMino commented Dec 22, 2020

TL;DR - it now makes sense, sorry for bothering you needlessly, thanks for being patient 😉.

Performance is important too, not just functionality and correctness.

I assumed that every build frontend will need the metadata at some point, which I realized is not the case. If it would, not calling the hook doesn't make sense, as you would have to go out of your way slowing down the implementation by extracting the info from the resulting file instead of just adding the hook call.

It now makes sense, I can happily concede on that point 😊.

The value of that section (...) is that if the hook exists, it must have the specified behavior and interface.

It does not have any value if it doesn't serve any practical purpose, hence arguments above.

In the case where the back-end needs a wheel, there is no reason to call prepare_metadata_for_build_wheel directly.

Err, I think you meant front-end there? Why would I need a wheel to build a wheel 😄 ?

(important distinction between integration frontends and build frontends has not been made here yet, have we been talking past each other because of that?)

@pganssle
Copy link
Member

I assumed that every build frontend will need the metadata at some point, which I realized is not the case.

I think this is not quite right, since if you know you need a wheel, you can skip the metadata call and build the wheel directly, then look at its metadata. The only reasons you'd call prepare_metadata_for_build_wheel are if you are not planning to build a wheel, you don't know if you need to build a wheel until after you have looked at the metadata, or you know you want to build a wheel but you can do some useful work in parallel with the build that requires the wheel metadata. Otherwise there's no reason.

@MrMino
Copy link
Author

MrMino commented Dec 22, 2020

if you know you need a wheel, you can skip the metadata call and build the wheel directly, then look at its metadata

@pganssle But if that's the case, is there a reason not to do it in the frontend if the hook is there? That's not what breaks my argument: the fact that the build frontend might not care about metadata at all is.

Having a convoluted method of getting metadata post-build does not mean the hook should be frontend-optional.

If every build frontend has to care about metadata (which is the false premise I used in my argumentation), then not calling the hook is just plain wrong. Unless you want to go into the implementation around the IO, the behavior required for the frontend to satisfy the missing hook (unpacking .dist-info/METADATA somewhere, then parsing it) is a superset of the one required when the hook is there (having the info somewhere, then parsing it).

I've misread the intention behind the hook, but I'd say the PEP could really use a section that outlines how these hooks are used and why they are optional. It's obvious when you're a maintainer of an integration frontend that has to build wheels, but I doubt that it's exclusively the target audience of the spec.

@uranusjr
Copy link
Member

So what’s the resolution here? I’m looking into implementing this now, but have a few questions how the interface:

  • What should happen when prepare_metadata_for_build_wheel does not exist? Should build_wheel be called, or should the API simply raise an error?
  • What should be the return value? A path to the result .dist-info directory? Where should the directory be?
  • How should the “pass the previously generated .dist-info directory to build_wheel” part of PEP 517 be satisfied? Should ProjectBuilder.build() accept an additional optional argument?
  • What would the CLI look like for all the above? Should there be a CLI at all?

@FFY00
Copy link
Member

FFY00 commented Jan 29, 2021

What should happen when prepare_metadata_for_build_wheel does not exist? Should build_wheel be called, or should the API simply raise an error?

Yes, I think so. As outlined here, this method is mainly for people wanting the metadata. If they want to build the distribution, they can call build_wheel directly, so people calling this probably want the metadata. I feel the correct thing to do would be to always give them the metadata. pep517 already implements this fallback mechanism IIRC. If in the future we see the need to also have the other approach available, we can add an argument to the method.

What should be the return value? A path to the result .dist-info directory? Where should the directory be?

The returned value should be what the backend returns. The user should provide a metadata_directory as an argument. Pretty much the same as the hook itself.

How should the “pass the previously generated .dist-info directory to build_wheel” part of PEP 517 be satisfied? Should ProjectBuilder.build() accept an additional optional argument?

I feel we should save this information internally, and have the hook use it if it is set.

What would the CLI look like for all the above? Should there be a CLI at all?

No CLI in the normal command. I was going to comment in #198, but I like your proposal to expose the API as CLI, so we would expose it there.

@uranusjr
Copy link
Member

uranusjr commented Jan 29, 2021

I feel the correct thing to do would be to always give them the metadata.

I feel we should save this information internally, and have the hook use it if it is set.

The fact that pep517 implements a fallback is actually kind of problematic here. We should only pass the prepared metadata directory to build_wheel when the backend actually implements prepare_metadata_for_wheel, but since pep517 always returns the metadata directory, build can’t tell how it did that.

The returned value should be what the backend returns.

PEP 517 indicates the name of the generated .dist-info directory to be returned (it must return the basename). But it says the same for build_wheel and build_sdist, and build currently returns the full path anyway. So I assume prepare_metadata_for_build_wheel should return the full path as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants