Add a specification of the entry_points.txt file #390

Merged: 17 commits, Oct 26, 2017


takluyver
Member

As discussed on the distutils-sig mailing list, I would like entry points to be documented as an interoperable standard. I have a particular interest in this as I've written implementations of both creating entry points (flit) and loading them (entrypoints).

This is my attempt to document the entry points mechanism as it already exists, based on reading code and the pkg_resources API docs.

Member

pfmoore left a comment

Overall, this looks pretty good. Just a few comments, mostly minor.

of a better term, especially where it points to a function to launch a
program.

There is also an optional piece property: the **extras** are a set of strings
Member

Don't need the word "piece"?

Member Author

Thanks, will fix.

There is also an optional piece property: the **extras** are a set of strings
identifying optional features of the distribution providing the entry point.
If these are specified, the entry point requires the dependencies of those
'extras'. See the metadata field :ref:`metadata_provides_extra`.
Member

This is a bit difficult to follow. When you say "the entry point requires", do you mean "accessing the object that the object path refers to, or using that object, needs the requirements associated with the extra"? Or do you mean "if the requirements specified by the extra aren't present, readers should ignore this entry point and act as if it wasn't present"?

Member Author

I've deliberately left this somewhat ambiguous. From the pkg_resources API docs, it looks like the extras were meant to control the thing setuptools did with .egg directories added to sys.path according to what dependencies an application specified. Presumably, loading an entry point would configure sys.path with the dependencies for its extras.

Since the recommended tools are now pip and virtualenvs, that approach doesn't quite make sense. None of the entry points in packages I have installed actually specify extras.

Maybe we should put a note that the extras aren't widely used, or even that they're no longer recommended? I'm not sure how true that is, though.

Member

Maybe simply say that extras are a setuptools extension? Standard writers shouldn't write them, and readers should ignore them.

optionxform = staticmethod(str)

The entry points file always uses ``=`` to delimit names from values (whereas
configparser also allows using ``:``).
Member

"must always use =" reads better as a specification.

Member Author

Thanks.
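For reference, the two configparser adjustments discussed in this hunk (keeping names case-sensitive via optionxform, and accepting only = as the delimiter) can be combined into a small reader. This is a sketch only: EntryPointsParser and read_entry_points are hypothetical names, not part of setuptools, pkg_resources or the spec, and interpolation=None is an added assumption so that a % in a value is taken literally.

```python
import configparser


class EntryPointsParser(configparser.ConfigParser):
    """ConfigParser tuned for entry_points.txt (hypothetical helper)."""

    # Keep entry point names exactly as written; the default
    # optionxform would lower-case them.
    optionxform = staticmethod(str)


def read_entry_points(text):
    """Return {group: {name: value}} from the text of an entry_points.txt file."""
    # Only '=' separates names from values, so the ':' inside an object
    # reference is never mistaken for a delimiter.
    parser = EntryPointsParser(delimiters=('=',), interpolation=None)
    parser.read_string(text)
    return {group: dict(parser[group]) for group in parser.sections()}


SAMPLE = """\
[console_scripts]
Foo = foomod:main
"""

print(read_entry_points(SAMPLE))  # {'console_scripts': {'Foo': 'foomod:main'}}
```

Note that the value is returned unparsed, extras included; splitting it into an object reference and extras is a separate step.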

names, and the values encode both the object path and the optional extras.
If extras are used, they are a comma-separated list inside square brackets.
There may be zero, one or multiple spaces between the object name and the
opening square bracket.
Member

Question: Are spaces allowed within the extras list? I.e., is [ a, b , c] allowed? And is it equivalent to [a,b,c]? If I were writing a reader, I'd probably return a list of extras, so the spec needs to be clear on this.

If we want to be more formal as a specification (I'm not sure to what extent that would be useful, rather than over-pedantic) we should probably write out a proper syntax for the values. Things that are vague in the above:

  1. How should content after the closing ] of the extras be interpreted?
  2. Are spaces allowed in the object path part?
  3. What about control characters (tab, or other non-printable characters)?
  4. Non-ASCII data? Elements of the object path follow the rules for Python names, so they are fine. But what are the rules for extras? What about things like the Unicode non-breaking space character?

In practice, I'd go with a pedantic view that extras should follow the Python name syntax for consistency, and whitespace (defined as anything the .strip() string method will remove) is allowed between elements of the extras, but not in the object path. No trailing text (except whitespace) is allowed after the value.

So we have name(.name)*(:name(.name)*)?(\[name(,name)*\])?, where name follows the rules for Python identifiers, whitespace is only allowed around [, ] and , in the extras section, and no trailing data (including comments as defined by configparser) is allowed.

However, I came up with this purely in the abstract. As this is an existing format, the rules setuptools defines take precedence (if they can be confirmed).
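The strict grammar proposed in this comment can be sketched as a regular expression, for example to validate what a writer emits. This is an illustration of the proposal, not of what setuptools accepts; NAME only approximates Python identifier rules (a letter or underscore followed by word characters), and STRICT_VALUE and is_strict_value are hypothetical names.

```python
import re

# Approximation of a Python identifier: a non-digit word character
# followed by word characters (Unicode-aware in Python 3's re).
NAME = r"[^\W\d]\w*"
DOTTED = rf"{NAME}(?:\.{NAME})*"

# Module path, optional ':' plus attribute path, optional [extras].
# Whitespace is allowed only around '[', ']' and ','; nothing may
# follow the closing ']'.
STRICT_VALUE = re.compile(
    rf"(?P<module>{DOTTED})"
    rf"(?::(?P<attrs>{DOTTED}))?"
    rf"(?:\s*\[\s*(?P<extras>{NAME}(?:\s*,\s*{NAME})*)\s*\])?"
)


def is_strict_value(value):
    """Return True if value matches the proposed strict syntax exactly."""
    return STRICT_VALUE.fullmatch(value) is not None
```

Under this sketch, "foomod:main_bar [bar,baz]" is accepted, while "mod.submod : obj" (spaces around the colon) is rejected.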

Member

I'd also just drop in to say that it might be helpful to be a bit restrictive about the whitespace if setuptools turns out to be extremely lenient, since this would be a good opportunity to fix that.

I'm speculating so take it with a grain of salt fwiw. :P

Member Author

I think spacing is allowed around and between the extras, but I couldn't confirm that through looking at the code in the time I had. It winds up in pyparsing code in packaging.requirements, if someone else wants to take a look.

pkg_resources allows spaces around the colon, but not in the dotted parts (mod.submod : obj.attr). Every example I have installed has the object path without any spaces, though. I think it would be reasonable to try to tighten the spec there, but there's bound to be one or two packages that do use the spaces.

Member

Again, rather than try too hard to understand what setuptools does, I'd be inclined to just specify that writers must be strict and readers must be lenient (using proper "specification-y" terms like "writers must not put whitespace between..." and "readers must accept and ignore whitespace..."). That way again, we allow any variations setuptools introduces to simply be setuptools extensions to the standard.
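On the reader side of "writers strict, readers lenient", a value could be split into its object reference and extras while ignoring whitespace. This is a sketch: split_value is a hypothetical name, and the exact tolerance (whitespace around ':', before '[', and around the extras and commas; empty extras items skipped) is an assumption rather than documented setuptools behaviour.

```python
def split_value(value):
    """Split an entry point value into (object reference, list of extras).

    Lenient reader sketch: whitespace around ':', before '[' and around
    the extras and commas is accepted and dropped.
    """
    ref, bracket, rest = value.partition('[')
    extras = []
    if bracket:
        inner, closing, trailing = rest.partition(']')
        # Reject a missing ']' or anything other than whitespace after it.
        if not closing or trailing.strip():
            raise ValueError(f"malformed extras in entry point value: {value!r}")
        # Empty items such as in '[a,,b]' are skipped rather than rejected.
        extras = [item.strip() for item in inner.split(',') if item.strip()]
    modname, sep, qualname = ref.partition(':')
    ref = modname.strip() + (':' + qualname.strip() if sep else '')
    return ref, extras
```

With this reader, "foomod:main_bar [bar,baz]" and "foomod:main_bar[ bar , baz ]" both yield ('foomod:main_bar', ['bar', 'baz']).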

@takluyver
Member Author

@pfmoore I added notes along the lines you suggested.


Conceptually, an entry point is defined by three required properties:

The **group** an entry point belongs to indicates what sort of object it
Member

📝 Maybe "group that an entry point belongs to"?

Member

I'd go with a bulleted list:

  • group: ...
  • name: ...
  • object reference: ...

I think we should avoid calling the object reference a path, since "path" is already an overloaded term (referring to both actual filesystem paths and lists of import system search locations). However, "reference" makes sense, since we literally expect the given information to resolve to a runtime object reference.

My only other terminology question would be whether "category" might be a better choice than "group". Either works (so I wouldn't mind leaving it as is), but the potential meanings of "category" are narrower, so it should translate to other languages more reliably.

Member Author

Thanks, done.

[console_scripts]
foo = foomod:main
# One which depends on extras:
foobar = foomod:main_bar [bar,baz]
Member

pradyunsg Oct 19, 2017

🎨 Maybe foobar = foomod:main_bar[bar, baz]?

Member Author

Examples I've seen leave a space there:

And I marginally prefer it with a space for aesthetics. I'm happy to change it if most people prefer no space, though.

Member

No issues either way tbh.

If these are specified, the entry point requires the dependencies of those
'extras'. See the metadata field :ref:`metadata_provides_extra`.

The precise functionality of entry points with extras is tied to setuptools'
Member

pradyunsg Oct 19, 2017

I'd be inclined to just specify that writers must be strict and readers must be lenient (using proper "specification-y" terms like "writers must not put whitespace between..." and "readers must accept and ignore whitespace...").

+1 to what @pfmoore said. I think adding a note about how the readers and writers of this metadata should treat extras would be useful.

Member

So, this feature doesn't really make sense outside of the setuptools dynamic dependency resolution model. At the specification level, I think we can treat this similar to the way we treat name conflicts within a category: say that the semantics are up to the category consumer, and we only specify the syntax.

For the syntactic aspect, we can delegate to PEP 508, since the grammar for that defines an extras node: https://www.python.org/dev/peps/pep-0508/#grammar

Member Author

I added a note about the whitespace, in the file format section. Does that cover what you mean?

I'm reluctant to try to say much about what readers and writers should do with extras, because I just don't know. It looks to me like they only really make sense in a setuptools-managed egg world, but maybe other toolchains could have a way to use them. I think that the best I can do is describe what they mean, and warn that the behaviour is undefined (i.e. this paragraph).

Member

LGTM.

Agreed. The more I think about it, the more I'd say that from a "standard format" perspective, we can stick with treating extras as a setuptools-only proprietary extension that we acknowledge the existence of but that new tools should simply ignore.

Member

Does that cover what you mean?

Indeed. :)

I'd missed that.

Member

ncoghlan left a comment

Oops, I got distracted and almost left this pending instead of actually sending the comments :)

This mostly looks good to me (thanks!); some comments on specific details inline.

example:

- Distributions can specify ``console_scripts`` entry points, each referring to
a function. When *pip* installs the distribution, it will create a
Member

To make it clearer that pip is just a specific example here, perhaps word it as "When pip (or another console_scripts aware installer) installs the distribution, it will create ..."
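To make the console_scripts case concrete, the wrapper an installer generates roughly imports the referenced callable and passes its return value to sys.exit(). The sketch below only approximates that shape; make_wrapper_source is a hypothetical helper, real installers (pip via distlib) use their own templates with extra details, and rejecting references without a ':' part is an assumption.

```python
def make_wrapper_source(object_ref):
    """Return Python source for a simplified console_scripts wrapper."""
    modname, _, qualname = object_ref.partition(':')
    if not qualname:
        # Assumption: a console script must name a callable, not a module.
        raise ValueError("a console_scripts entry point must name a callable")
    # Import the first attribute from the module; any dotted remainder is
    # then reached by attribute access in the call expression.
    top = qualname.split('.')[0]
    return (
        "import sys\n"
        f"from {modname} import {top}\n"
        "\n"
        "if __name__ == '__main__':\n"
        f"    sys.exit({qualname}())\n"
    )
```

For example, make_wrapper_source('foomod:main') produces a script containing "from foomod import main" and "sys.exit(main())".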



The **name** identifies this entry point within its group. The precise meaning
of this is up to the consumer. For console scripts, the name of the entry point
is the command that will be used to launch it.
Member

This should state that while entry point names within a category must always be unique within a given project, it's up to the category consumer how name conflicts between different entry point providers are handled (for example, console_scripts will often end up in a common directory, so packages that define scripts with the same name usually won't be able to be installed at the same time into the same environment).

    for attr in attrs.split('.'):
        obj = getattr(obj, attr)
else:
    obj = importlib.import_module(object_path)
Member

Slightly shorter code:

modname, qualname_separator, qualname = object_ref.partition(':')
obj = importlib.import_module(modname)
if qualname_separator:
    for attr in qualname.split('.'):
        obj = getattr(obj, attr)

The attribute lookup loop can be spelled obj = operator.attrgetter(qualname)(obj), but I don't think that really makes the example code clearer, whereas switching to str.partition more clearly emphasises the module-import-with-optional-attribute-lookup semantics.
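The partition-based snippet suggested above can be wrapped into a self-contained function. load_object is a hypothetical name; stripping whitespace around the ':' is an added assumption, following the lenient-reader suggestion and the observation that pkg_resources tolerates spaces there.

```python
import importlib


def load_object(object_ref):
    """Resolve 'module.path:attr.path' (attribute part optional) to an object."""
    modname, qualname_separator, qualname = object_ref.partition(':')
    # Lenient reading (assumption): ignore whitespace around ':'.
    obj = importlib.import_module(modname.strip())
    if qualname_separator:
        # Walk the dotted attribute path one step at a time.
        for attr in qualname.strip().split('.'):
            obj = getattr(obj, attr)
    return obj
```

For example, load_object('os.path:join') returns os.path.join, and load_object('json') returns the json module itself.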

provides. For instance, the group ``console_scripts`` is for entry points
referring to functions which can be used as a command, while
``pygments.styles`` is the group for classes defining pygments styles.
The consumer typically defines the expected interface.
Member

We should include an explicit recommendation here that category consumers should use their PyPI name to ensure that their category names don't conflict with categories defined by other projects.

The comparable sentence in PEP 518 (for the [tool] table in pyproject.toml) reads: "... a project can use the subtable tool.$NAME if, and only if, they own the entry for $NAME in the Cheeseshop/PyPI."

Since this is documenting an existing standard, the wording shouldn't be as strong in this case, but we can still nudge future category definitions in that direction.

Member Author

Done.

@dstufft
Member

dstufft commented Oct 20, 2017

As I've said on distutils-sig, I am -1 on this pull request unless it is specific to defining what we need for console_scripts and no more.

@takluyver
Member Author

@ncoghlan 'group' may not be the clearest term, but it is already the term setuptools docs and API use for a group/category/interface of entry points, so I think it's worth using the same name here.

I couldn't see any clear existing name for what I chose to call an 'object path', so I've renamed it to 'object reference' as you suggested.

@ncoghlan
Member

@takluyver +1 for maintaining consistency with established terminology where it exists.

:doc:`/guides/creating-and-discovering-plugins`.

Entry points were developed as part of setuptools. This document aims to
describe the mechanism that is already a de-facto standard.
Member

ncoghlan Oct 20, 2017

Based on the distutils-sig discussion, I'd rephrase this paragraph:

The entry point file format was originally developed to allow packages built with setuptools to provide integration point metadata that would be read at runtime with pkg_resources. It is now defined as a PyPA interoperability specification in order to allow build tools other than setuptools to publish pkg_resources-compatible entry point metadata, and runtime libraries other than pkg_resources to portably read published entry point metadata (potentially with different metadata caching and conflict resolution strategies).

of this is up to the consumer. For console scripts, the name of the entry point
is the command that will be used to launch it. Within a distribution, entry
point names should be unique. If different distributions provide the same
name, the consumer decides how to handle that.
Member

I'd suggest ending this with "... how to handle such conflicts." rather than "... how to handle that."

Member

ncoghlan left a comment

A couple of wording comments inline, but aside from that, I like this version.

@takluyver
Member Author

Thanks, I applied both of your suggestions.

of this is up to the consumer. For console scripts, the name of the entry point
is the command that will be used to launch it. Within a distribution, entry
point names should be unique. If different distributions provide the same
name, the consumer decides how to handle such conflicts.
Member

If we're going to do this, then we should do it correctly and not bake in undefined behavior like this. We can look at a long history of undefined behavior and what happens when you add it. Typically it means you either end up needing to add conditionals based on implementation (ifdefs for GCC vs Clang, browser user agent sniffing, etc.), one implementation becomes the de facto standard and everyone else tries to match its behavior bug for bug (CPython vs PyPy), or people just give up and say that thing is only supported using a particular implementation ("This site best viewed in IE").

I don't care what mechanism is selected, but it should be a standard mechanism.

Member Author

This can vary based on use case; if you're looking for plugins, you can ignore the names entirely and load all entry points, even if the names conflict. If you're looking for something with a specific name, there will be situations where it makes sense to error out on a conflict, to warn and pick one, or to take the first and stop looking.

pkg_resources leaves it up to the application what to do with conflicting entry points (its API is an iterator that yields all matching entry points). I don't think it makes sense to try to specify this.
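The plugin case (load everything, ignore names) needs no conflict resolution, but a consumer looking entry points up by name has to pick one of the strategies listed above. A sketch of that choice as a consumer-side policy; resolve_by_name, the (distribution, name, value) input tuples and the policy names 'error', 'warn' and 'first' are all hypothetical.

```python
import warnings
from collections import defaultdict


def resolve_by_name(entry_points, policy='error'):
    """Pick one value per entry point name from (dist, name, value) tuples.

    policy: 'error' raises on a conflict, 'warn' warns and keeps the first
    provider, 'first' silently keeps the first provider.
    """
    if policy not in ('error', 'warn', 'first'):
        raise ValueError(f"unknown policy: {policy!r}")
    by_name = defaultdict(list)
    for dist, name, value in entry_points:
        by_name[name].append((dist, value))

    chosen = {}
    for name, candidates in by_name.items():
        if len(candidates) > 1:
            dists = ', '.join(dist for dist, _ in candidates)
            message = f"entry point {name!r} provided by: {dists}"
            if policy == 'error':
                raise ValueError(message)
            if policy == 'warn':
                warnings.warn(message + "; using the first")
        chosen[name] = candidates[0][1]
    return chosen
```

Which policy is right depends on the group, which is the point made above: the consumer, not the file format, decides.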

want of a better term, especially where it points to a function to launch a
program.

There is also an optional property: the **extras** are a set of strings
Member

We should probably be stronger here and suggest that extras should not be used in entry points at all, but that parsers should still be able to parse them for historical reasons.

Member

+1 for this approach: new publishers should feel free to not support this feature, and new consumers may want to disallow it as well (but will need to handle it if they want to support reading arbitrary existing entry point groups).

Entry points are defined in a file called ``entry_points.txt`` in the
``*.dist-info`` directory of the distribution. This is the directory described
in :pep:`376` for installed distributions, and in :pep:`427` for wheels.
The file uses the UTF-8 character encoding.
Member

I forget offhand what setuptools does here, but we should ensure that it's doing this.

Member Author

I tried to track it down through pkg_resources, and it looks like that is at least the intention.

https://github.com/pypa/setuptools/blob/403bfce4ab920823cc4ba0b5ca5ac0d1b213513d/pkg_resources/__init__.py#L1494
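To show where the file lives and how its encoding matters, a sketch that scans one directory of installed distributions for *.dist-info/entry_points.txt and decodes each file explicitly as UTF-8 rather than with the locale default. iter_entry_points is a hypothetical helper; it is not how pkg_resources or importlib.metadata actually locate distributions.

```python
import configparser
from pathlib import Path


def iter_entry_points(site_dir):
    """Yield (dist-info dir name, group, name, value) found under site_dir."""
    for dist_info in sorted(Path(site_dir).glob('*.dist-info')):
        ep_file = dist_info / 'entry_points.txt'
        if not ep_file.is_file():
            continue  # not every distribution defines entry points
        parser = configparser.ConfigParser(delimiters=('=',), interpolation=None)
        parser.optionxform = str  # keep entry point names case-sensitive
        # The file is specified as UTF-8, so never rely on the locale default.
        parser.read_string(ep_file.read_text(encoding='utf-8'))
        for group in parser.sections():
            for name, value in parser.items(group):
                yield dist_info.name, group, name, value
```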


Conceptually, an entry point is defined by three required properties:

- The **group** that an entry point belongs to indicates what sort of object it
Member

I believe setuptools has restrictions on what a valid group name is, but I'm not sure offhand. If so, they should be encoded here.

Member Author

consumers defining a new group should use names starting with a PyPI name
owned by the consumer project, followed by ``.``.

- The **name** identifies this entry point within its group. The precise meaning
Member

I believe setuptools has restrictions on what a valid name is, but I'm not sure offhand. If so, they should be encoded here. At the very least, I know console_scripts does not allow anything with a / in the name.

Member Author

The regex in pkg_resources appears to allow any character, though it cannot contain =, and can't end with a space. It would probably be appropriate to recommend a more restrictive character set for new entry names.

https://github.com/pypa/setuptools/blob/403bfce4ab920823cc4ba0b5ca5ac0d1b213513d/pkg_resources/__init__.py#L2449
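Following the description above (no =, no trailing space) plus the idea of recommending a narrower character set for new names, a publisher-side check might look like this. check_name is hypothetical, and the recommended set [A-Za-z0-9._-] is an assumption chosen for illustration (it also excludes /); it is not taken from the pkg_resources regex.

```python
import re

# Hypothetical recommended character set for *new* entry point names.
RECOMMENDED_NAME = re.compile(r"[A-Za-z0-9._-]+")


def check_name(name):
    """Return a list of problems with an entry point name (empty if fine)."""
    problems = []
    if '=' in name:
        problems.append("contains '=' (would be read as the delimiter)")
    if name != name.rstrip():
        problems.append("ends with whitespace")
    if not RECOMMENDED_NAME.fullmatch(name):
        problems.append("uses characters outside the recommended set")
    return problems
```

The first two checks reflect hard limits described in the comment; the third is only a recommendation, so a tool might warn rather than fail on it.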

===============

Two groups of entry points have special significant in packaging:
``console_scripts`` and ``gui_scripts``. In both groups, the name of the entry
Member

From memory, I believe that this is more complex than that. I know there are restrictions about having / in the name, and something makes me feel like we actually treat console_scripts as case insensitive. There is likely more.

Member

Because console scripts translate to filenames, they are case insensitive because (some of) the target filesystems are.

It obviously depends on the group what's allowed (and we can't do anything about that). However, the spec probably does need to say something about who enforces restrictions - is it down to the producer to reject invalid entry point names? Or to the consumer to deal with them? The former is basically impractical, but the consumer may well not be able to deal gracefully with the situation. We have this with pip - packages can specify two console scripts differing only in case, and pip then ends up having to write a broken installation on a case-insensitive system like Windows.

I don't think the spec can do anything other than say "any other restrictions are the responsibility of the publisher and subscriber to manage". That sucks, but it's in the nature of de facto standards that you have to live with such suckiness.

Member Author

I can't see any code in pip to block / in script names - does setuptools do something special?

I'm adding a note about case sensitivity.
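Since script names become filenames, a publisher or installer could flag console_scripts or gui_scripts names that differ only in case before writing anything. A sketch; case_conflicts is hypothetical, and str.casefold() only approximates how case-insensitive filesystems compare names.

```python
from collections import defaultdict


def case_conflicts(names):
    """Group script names that differ only in case (casefold comparison)."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Only groups with more than one spelling are conflicts.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

For example, case_conflicts(['foo', 'Foo', 'bar']) returns [['Foo', 'foo']].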

@dstufft
Member

dstufft commented Oct 20, 2017 via email

@takluyver
Member Author

Oh, my mistake, I missed that there are other APIs to load entry points. However, as far as I can see in the docs, the others only work on one distribution at a time. In any case, applications do get to choose how conflicts are handled.


As files are created from the names, and some filesystems are case-insensitive,
packages should avoid using names in these groups which differ only in case.
The behaviour of install tools when names differ only in case is undefined.
Member

IMO, this is starting to veer away from specifying the entry point file format into defining the specifics of how individual publishers and subscribers use them.

I think that the "entry points file format" spec should say nothing other than "the console_scripts and gui_scripts groups are reserved for use by packaging systems". We could have another spec that defines how wrapper scripts are defined using those entry points, but if we put that information in this spec, we do start to hit the issues @dstufft is concerned about, of conflating specifications that pip/setuptools/wheel/distlib care about with those that they don't.

Member Author

takluyver Oct 20, 2017

This final section is intended to document the use of entry points in packaging systems, after the earlier sections have described the file format and the general semantics. This is kind of combining two specifications into one, but they are clearly separated by the headings.

I can take this section out or move it to a separate document if people want. But it would be a pretty short document, so I'm saying 'practicality beats purity' unless there's a consensus to do that.

Member

Right. Reading this in context, I'm OK with it. Sorry, GitHub's UI for long-running reviews is horrible; it's way too easy to lose context and/or whole comments :-(

Use for scripts
===============

Two groups of entry points have special significant in packaging:
Member

"significance"

@theacodes
Member

I'm okay with us documenting this, but considering this is under "specifications" it seems odd that this document does not reference a PEP or a reference implementation - are we just documenting setuptools internals? If the answer is "yes" that's fine, I just want us to be clear about how "specific" the "specifications" are.

@takluyver
Member Author

It's documenting an existing mechanism. I wouldn't call it setuptools 'internals' any more: multiple tools already have implementations to read and write it, so I consider it a de-facto interoperability standard. Donald disagrees with me.

On top of documenting the mechanism as it already exists, this makes a few recommendations about how tools should use it - e.g. namespacing entry point group names to avoid clashes.

@ncoghlan
Member

The reference publishing tool is setuptools (as with PKG-INFO), the reference subscriber is pkg_resources.

@ncoghlan
Member

@jaraco's approval for elevation to an interoperability specification: pypa/setuptools#1179 (comment)

ncoghlan merged commit 34c37f0 into pypa:master on Oct 26, 2017
@ncoghlan
Member

Thanks @takluyver - with @jaraco's +1, I've gone ahead and published this version.

If we find any further clarifications or corrections are needed, they can go in a new PR (as with any of the specifications).

@takluyver
Member Author

takluyver commented Oct 26, 2017 via email

takluyver deleted the entrypoints-spec branch on April 25, 2021 11:24
6 participants