ridiculously long instance names #2343

kbuzzard · 2023-07-21T16:25:01Z

Prerequisites

Put an X between the brackets on this line if you have done all of the following:
- Checked that your issue isn't already filed.
- Reduced the issue to a self-contained, reproducible test case.

Description

Lean 4's default instance naming algorithm, combined with mathlib's rich mathematics hierarchies, adds up to auto-generated instance names which are 2500 characters or more.

Steps to Reproduce

Make a gigantic topology and algebra hierarchy
Do something mathematically innocuous, like putting a normed space structure on a certain space of linear maps.
Marvel at the auto-generated name ContinuousLinearMap.instNormedSpaceContinuousLinearMapToSemiringToDivisionSemiringToSemifieldToFieldToNormedFieldIdToNonAssocSemiringContinuousLinearMapToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToAddCommMonoidToAddCommGroupToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToAddCommMonoidToAddCommGroupToModuleToModuleTopologicalSpaceToTopologicalAddGroupAddCommMonoidToContinuousAddToAddGroupToSeminormedAddGroupContinuousLinearMapToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToAddCommMonoidToAddCommGroupToModuleTopologicalSpaceToTopologicalAddGroupAddCommMonoidToContinuousAddToAddGroupToSeminormedAddGroupModuleSmulCommClass_selfToCommMonoidToCommRingToEuclideanDomainToMulActionToMonoidWithZeroToZeroToNegZeroClassToSubNegZeroMonoidToSubtractionMonoidToDivisionAddCommMonoidToMulActionWithZeroContinuousConstSMulToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToSeminormedRingToSeminormedCommRingToNormedCommRingToSMulToZeroToAddMonoidToSMulZeroClassToZeroToSMulWithZeroContinuousSMulToZeroToCommMonoidWithZeroToCommGroupWithZeroBoundedSMulModuleSmulCommClass_selfToMulActionToZeroToNegZeroClassToSubNegZeroMonoidToSubtractionMonoidToDivisionAddCommMonoidToMulActionWithZeroContinuousConstSMulToSMulToZeroToAddMonoidToSMulZeroClassToSMulWithZeroContinuousSMulBoundedSMulInstSeminormedAddCommGroupContinuousLinearMapToSemiringToDivisionSemiringToSemifieldToFieldToNormedFieldIdToNonAssocSemiringContinuousLinearMapToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToAddCommMonoidToAddCommGroupToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToAddCommMonoidToAddCommGroupToModuleToModuleTopologicalSpaceToTopologicalAddGroupAddCommMonoidToContinuousAddToAddGroupToSeminormedAddGroupContinuousLinearMapToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToAddCommMonoidToAddCommGroupToModuleTopologicalSpaceToTopologicalAddGroupAddCommMonoidToContinuousAddToAddGroupToSeminormedAddGroupModuleSmulCommClass_selfToCommMonoidToCommRingToEuclideanDomainToMulActionToMonoidWithZeroToZeroToNegZeroClassToSubNegZeroMonoidToSubtractionMonoidToDivisionAddCommMonoidToMulActionWithZeroContinuousConstSMulToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToSeminormedRingToSeminormedCommRingToNormedCommRingToSMulToZeroToAddMonoidToSMulZeroClassToZeroToSMulWithZeroContinuousSMulToZeroToCommMonoidWithZeroToCommGroupWithZeroBoundedSMulModuleSmulCommClass_selfToMulActionToZeroToNegZeroClassToSubNegZeroMonoidToSubtractionMonoidToDivisionAddCommMonoidToMulActionWithZeroContinuousConstSMulToSMulToZeroToAddMonoidToSMulZeroClassToSMulWithZeroContinuousSMulBoundedSMul

Versions

e.g. Lean (version 4.0.0-nightly-2023-06-20, commit a44dd71ad62a, Release) (OS independent)

Additional Information

What prompted me to open this issue was the fact that these instance names now pollute search; any reasonably small string of reasonable characters is a substring of the above instance name. Others are worried about blow-up continuing and causing weird problems down the line. Another observation is that this causes problems with the docs: the relevant link for this instance doesn't allow you to look at the equations for the instance -- you get the message One or more equations did not get rendered due to their size.

In community Lean 3 there was a more reasonable algorithm, which unfortunately I do not know the details of, although I do remember that it was changed from the Lean 3.4.2 algorithm (which was sometimes returning instance names which were too short :-) in the sense that they were missing relevant information or in the wrong namespace or something). It would be great to see this more controlled algorithm in Lean 4 before the exponential growth of auto-generated instance names starts to cause weird problems.

The text was updated successfully, but these errors were encountered:

Kha · 2023-07-26T15:10:59Z

Let's start by reviewing the tried and potential naming schemes (numbering is purely for disambiguation)

current Lean 4 scheme: ignores binders and camel case-concatenates names of all referenced constants and constant-likes (Type/Prop/ForAll) in the elaborated instance type. Low chance of collisions but see above for downsides. We still append a disambiguating index when necessary and possible (i.e. when not in a diamond situation), which does happen in practice
previous Lean 4 scheme: similar to above, but using the concrete syntax tree, thus ignoring any inferred information. This led to undesirable collisions as documented in Names of inferred instances clash #951. An upside of any syntactic approach is that the output should at most be linear in the input size.
syntactic approach including binders: as mentioned in Names of inferred instances clash #951. It would probably have to include non-explicit binders to avoid collisions, which should still be less verbose than what we have now.
Lean 3c scheme: a syntactic-type solution like 2 but with more heuristics, including for the namespace, documented at https://github.com/leanprover-community/lean/blob/master/src/frontends/lean/decl_util.cpp#L58-L68. Actually, it works on the pre-term of the type and thus is not even applicable 1:1 to Lean 4, though eagerly unfolding macros might get close enough.
namespacing using the module name: as briefly mentioned and immediately discarded by me in Names of inferred instances clash #951. It does prevent any chance of collisions. My concern with introducing these namespaces was with auto completion, documentation, and any other kind of declaration listing, though we could work on de-emphasizing them in all these cases - after all, people should not care about the name of an unnamed instance in most cases. However, it is unclear to me how the full declaration name would be assembled specifically - if we discard the current namespace, we break scoped unnamed instances. If we include it, we will generate quite strange names full of redundancies.
using hygiened (i.e. inaccessible) names: this would be the natural, straight-forward solution if we see unnamed instances as a macro for named instances. Like the namespacing approach, collision freedom would be guaranteed. It would also guarantee inability to access the name at all, which is most probably a too-principled stance. Indeed, an unscientific grep of mathlib4 shows a few hundred potential references to unnamed instances: https://gist.github.com/Kha/d42f60d22f60e72bf7c2bd2b4cdb141f.
We could work on enabling these use cases without having to reference the instance by name, but even just for debugging it can be helpful to be able to write down a fully elaborated expression in surface syntax; which, to be fair, is a UX limitation of hygiene in general.

It would seem that the Lean 3c approach presented a complex but practical compromise somewhere between approaches 1 and 2. If there were any remaining issues or annoyances with it, it would be great to document them here. But in the end, I really would like to ensure that we never have to think about instance name collisions again. And discouraging named access to unnamed instances, without outright preventing it, also seems reasonable for the sake of robustness. Thus I'm inclined to at least explore the potential design and impact of the latter two approaches instead of designing yet another heuristic that may or may not work out. Comments welcome.

semorrison · 2023-08-07T00:45:52Z

I would like to pursue option 6. I don't think it is an obstacle that Mathlib currently references unnamed instances by name. No one is happy about this, and it should be a relatively easy refactor to just avoid this.

(I would probably do this by locally modifying Lean to break all of the auto-names in any dumb way, and then fix everything again.)

semorrison · 2023-08-07T02:37:40Z

Ah -- I have encountered a problem with using entirely inaccessible names: sometimes we need to nolint instances created using deriving. The example I found first was in Mathlib.CategoryTheory.Limits.Shapes.WidePullbacks

/-- The type of arrows for the shape indexing a wide pullback. -/
inductive Hom : WidePullbackShape J → WidePullbackShape J → Type w
  | id : ∀ X, Hom X X
  | term : ∀ j : J, Hom (some j) none
  deriving DecidableEq
#align category_theory.limits.wide_pullback_shape.hom CategoryTheory.Limits.WidePullbackShape.Hom

attribute [nolint unusedArguments] instDecidableEqHom

Possibly the correct response to this is to improve the deriving handler!

Kha · 2023-08-07T08:38:22Z

I would like to pursue option 6

I'm very happy to hear that!

sometimes we need to nolint instances created using deriving

Should inaccessible declarations be linted at all?

digama0 · 2023-08-07T08:44:30Z

It would really help if there was a reliable @[automatically_generated] attribute like in rust... right now we have half a dozen approximations to this implemented in lean, std and mathlib.

eric-wieser · 2023-08-07T13:10:16Z

using hygiened (i.e. inaccessible) names

My main concern with this approach is that we lose the ability to have conversations on Zulip like:

A: Do we have the fact that LinearMaps form a module

B: Yes, it's docs#LinearMap.instModuleLinearMap

Without having to first make a PR to give the instance an explicit name if it didn't already have one.

digama0 · 2023-08-07T14:46:28Z

yes, I am concerned that such options will just push us to never use anonymous instances at all in mathlib, which feels like a failure of the naming heuristic. And with things like deriving DecidableEq this is even worse because then we won't be able to use the sugar (unless it gets an option like deriving DecidableEq (name := instDecidableEqNat)).

gebner · 2023-08-07T20:16:17Z

Should inaccessible declarations be linted at all?

We have plenty of linters for simp lemmas and type class instances that ensure a property of the whole simp set / instance set. Simp will apply lemmas whether they are inaccessible or not, so they need to be linted either way.

And tbh this looks like a bug in the derive handler:

attribute [nolint unusedArguments] instDecidableEqHom

…6423) Per leanprover/lean4#2343, we are going to need to change the automatic generation of instance names, as they become too long. This PR ensures that everywhere in Mathlib that refers to an instance by name, that name is given explicitly, rather than being automatically generated. There are four exceptions, which are now commented, with links to leanprover/lean4#2343. This was implemented by running Mathlib against a modified Lean that appended `_ᾰ` to all automatically generated names, and fixing everything. Co-authored-by: Scott Morrison <scott.morrison@gmail.com>

kbuzzard · 2023-08-09T14:01:00Z

B: Yes, it's docs#LinearMap.instModuleLinearMap

Isn't the correct answer "Yes, the typeclass inference system knows this, it should just work"?

eric-wieser · 2023-08-09T14:04:17Z

No, because that doesn't tell them "it's only true under assumptions XYZ"

kbuzzard · 2023-08-10T08:20:59Z

OK then I vote for making the PR to give the instance an explicit name if it didn't already have one, because I suspect that this is a pretty marginal concern in practice. It is also a step forward from "Yes, it's docs#<2500 character string>.

Is it clear that one would not be able to link to an inaccessible term? Could the answer be "yes it's docs#inst37, sorry we don't have an explicit name for it, but the typeclass system knows it so it should just work"? I have no idea about how the docs work.

eric-wieser · 2023-08-10T09:07:21Z

yes, I am concerned that such options will just push us to never use anonymous instances at all in mathlib, which feels like a failure of the naming heuristic. And with things like deriving DecidableEq this is even worse because then we won't be able to use the sugar (unless it gets an option like deriving DecidableEq (name := instDecidableEqNat)).

In case you missed it, @digama0: leanprover-community/mathlib4#6423 (now merged) seems like a big step along this path to never using anonymous instances.

…6423) Per leanprover/lean4#2343, we are going to need to change the automatic generation of instance names, as they become too long. This PR ensures that everywhere in Mathlib that refers to an instance by name, that name is given explicitly, rather than being automatically generated. There are four exceptions, which are now commented, with links to leanprover/lean4#2343. This was implemented by running Mathlib against a modified Lean that appended `_ᾰ` to all automatically generated names, and fixing everything. Co-authored-by: Scott Morrison <scott.morrison@gmail.com>

Kha · 2023-08-10T14:41:55Z

Is it clear that one would not be able to link to an inaccessible term?

I don't know (@hargoniX?) but it would be good to ensure doc-gen4 can reasonably handle them before touching the naming scheme

semorrison · 2023-08-10T23:38:55Z

Noting that my PR leanprover-community/mathlib4#6423 was not intended as a vote for "Mathlib should never ever use anonymous instance". Instead it was to decouple Mathlib from the current (fairly obviously unsatisfactory) scheme, so that we have room to change the current scheme without having to fix a million regressions in Mathlib.

If we can come up with a good replacement scheme, and implement it, then of course it's great if Mathlib is happy to go back to relying on the automatically generated names!

…6423) Per leanprover/lean4#2343, we are going to need to change the automatic generation of instance names, as they become too long. This PR ensures that everywhere in Mathlib that refers to an instance by name, that name is given explicitly, rather than being automatically generated. There are four exceptions, which are now commented, with links to leanprover/lean4#2343. This was implemented by running Mathlib against a modified Lean that appended `_ᾰ` to all automatically generated names, and fixing everything. Co-authored-by: Scott Morrison <scott.morrison@gmail.com>

utensil · 2023-10-09T10:24:45Z

using hygiened (i.e. inaccessible) names

My main concern with this approach is that we lose the ability to have conversations on Zulip like:

A: Do we have the fact that LinearMaps form a module

B: Yes, it's docs#LinearMap.instModuleLinearMap

Without having to first make a PR to give the instance an explicit name if it didn't already have one.

What about choosing a name prefix based on option 2 (linear in the input size) then a short hygiened (i.e. inaccessible) name such as a hash (like the short git commit id) of the name scheme from option 1 as postfix? e.g. instModuleLinearMap_3e71a6e which should remain readable and search-friendly for its prefix and much less likely to clash due to the cryptographic properties of the hash algorithm.

This way one can still talk about doc#LinearMap.instModuleLinearMap as it usually jumps to a unique doc, and gracefully downgrade to a search result to disambiguate.

hargoniX · 2023-10-09T13:55:48Z

Is it clear that one would not be able to link to an inaccessible term?

I don't know (@hargoniX?) but it would be good to ensure doc-gen4 can reasonably handle them before touching the naming scheme

doc-gen has a custom filter for declarations it is not supposed to render, as long as this filter is made not to trigger on auto-named instances they will keep showing up just fine.
Assuming they remain accessible as definitions in the Environment but I assume that's a given.

kbuzzard · 2024-01-28T13:17:21Z

@kmill what is the situation with #3089 ? It's still a draft?

Someone suggested to me that we write a linter which checks to see if an auto-generated instance name is too long and if so then CI fails and the user is forced to supply a more sensible name. This would be another workaround for this issue but I'd rather see a PR merged which fixes this high priority mathlib issue properly. It's embarrassing to see members of our community answering questions on the Zulip with links to explicit declarations in the docs which are 1000 characters long.

Implements a new method to generate instance names for anonymous instances that uses a heuristic that tends to produce shorter names. A design goal is to make them relatively unique within projects and definitely unique across projects, while also using accessible names so that they can be referred to as needed, both in Lean code and in discussions. The new method also takes into account binders provided to the instance, and it adds project-based suffixes. Despite this, a median new name is 73% its original auto-generated length. (Compare: [old generated names](https://gist.github.com/kmill/b72bb43f5b01dafef41eb1d2e57a8237) and [new generated names](https://gist.github.com/kmill/393acc82e7a8d67fc7387829f4ed547e).) Some notes: * The naming is sensitive to what is explicitly provided as a binder vs what is provided via a `variable`. It does not make use of `variable`s since, when names are generated, it is not yet known which variables are used in the body of the instance. * If the instance name refers to declarations in the current "project" (given by the root module), then it does not add a suffix. Otherwise, it adds the project name as a suffix to protect against cross-project collisions. * `set_option trace.Elab.instance.mkInstanceName true` can be used to see what name the auto-generator would give, even if the instance already has an explicit name. There were a number of instances that were referred to explicitly in meta code, and these have been given explicit names. Removes the unused `Lean.Elab.mkFreshInstanceName` along with the Command state's `nextInstIdx`. Fixes #2343

kmill · 2024-04-13T20:31:46Z

For the record, here's an explanation of the scheme that I used in #3098:

It elaborates the instance's binders and type and then creates a pi type with all of this.
It creates something like a Lean 3 pre-expression from this type, deleting all instance arguments, (most) implicit arguments, all universe levels, and the domains to dependent foralls. It leaves implict arguments that are types (except Sort _) or type families, to avoid regression with respect to Names of inferred instances clash #951. (Note: the resulting expression isn't well-formed anymore.)
It essentially follows the Lean 4 algorithm on the resulting expression. But, it skips all subexpressions it has seen before during the traversal. For top-level pi types, it also uses the pattern FooOfBar rather than ForallBarFoo. It initializes the set of seen subexpressions with all prefixes of the current namespace that correspond to declarations.
The root module of the current module is called the "current project". If there is no component of the name that came from a constant in the current project, then the name gets a project suffix.
If after all this the name already exists, it gets a numerical suffix.

Kha self-assigned this Jul 21, 2023

utensil mentioned this issue Oct 9, 2023

Improve UX for long instance names leanprover/doc-gen4#156

Open

semorrison added the Mathlib4 high prio Mathlib4 high priority issue label Oct 11, 2023

digama0 mentioned this issue Nov 19, 2023

Unnamed instances lead to ridiculously long instance names in the docs leanprover-community/mathlib4#8521

Closed

kmill mentioned this issue Dec 18, 2023

feat: shorten auto-generated instance names #3089

Merged

Kha assigned kmill and unassigned Kha Mar 6, 2024

ocfnash mentioned this issue Apr 1, 2024

[Merged by Bors] - feat(Algebra/Lie): define derivations on Lie algebras leanprover-community/mathlib4#11790

Closed

kmill closed this as completed in #3089 Apr 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ridiculously long instance names #2343

ridiculously long instance names #2343

kbuzzard commented Jul 21, 2023 •

edited

Loading

Kha commented Jul 26, 2023

semorrison commented Aug 7, 2023

semorrison commented Aug 7, 2023

Kha commented Aug 7, 2023

digama0 commented Aug 7, 2023

eric-wieser commented Aug 7, 2023

digama0 commented Aug 7, 2023

gebner commented Aug 7, 2023

kbuzzard commented Aug 9, 2023

eric-wieser commented Aug 9, 2023

kbuzzard commented Aug 10, 2023

eric-wieser commented Aug 10, 2023 •

edited

Loading

Kha commented Aug 10, 2023

semorrison commented Aug 10, 2023

utensil commented Oct 9, 2023 •

edited

Loading

hargoniX commented Oct 9, 2023 •

edited

Loading

kbuzzard commented Jan 28, 2024

kmill commented Apr 13, 2024 •

edited

Loading

ridiculously long instance names #2343

ridiculously long instance names #2343

Comments

kbuzzard commented Jul 21, 2023 • edited Loading

Prerequisites

Description

Steps to Reproduce

Versions

Additional Information

Kha commented Jul 26, 2023

semorrison commented Aug 7, 2023

semorrison commented Aug 7, 2023

Kha commented Aug 7, 2023

digama0 commented Aug 7, 2023

eric-wieser commented Aug 7, 2023

digama0 commented Aug 7, 2023

gebner commented Aug 7, 2023

kbuzzard commented Aug 9, 2023

eric-wieser commented Aug 9, 2023

kbuzzard commented Aug 10, 2023

eric-wieser commented Aug 10, 2023 • edited Loading

Kha commented Aug 10, 2023

semorrison commented Aug 10, 2023

utensil commented Oct 9, 2023 • edited Loading

hargoniX commented Oct 9, 2023 • edited Loading

kbuzzard commented Jan 28, 2024

kmill commented Apr 13, 2024 • edited Loading

kbuzzard commented Jul 21, 2023 •

edited

Loading

eric-wieser commented Aug 10, 2023 •

edited

Loading

utensil commented Oct 9, 2023 •

edited

Loading

hargoniX commented Oct 9, 2023 •

edited

Loading

kmill commented Apr 13, 2024 •

edited

Loading