Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ridiculously long instance names #2343

Closed
1 task done
kbuzzard opened this issue Jul 21, 2023 · 18 comments · Fixed by #3089
Closed
1 task done

ridiculously long instance names #2343

kbuzzard opened this issue Jul 21, 2023 · 18 comments · Fixed by #3089
Assignees
Labels
Mathlib4 high prio Mathlib4 high priority issue

Comments

@kbuzzard
Copy link
Contributor

kbuzzard commented Jul 21, 2023

Prerequisites

  • Put an X between the brackets on this line if you have done all of the following:
    • Checked that your issue isn't already filed.
    • Reduced the issue to a self-contained, reproducible test case.

Description

Lean 4's default instance naming algorithm, combined with mathlib's rich mathematics hierarchies, adds up to auto-generated instance names which are 2500 characters or more.

Steps to Reproduce

  1. Make a gigantic topology and algebra hierarchy
  2. Do something mathematically innocuous, like putting a normed space structure on a certain space of linear maps.
  3. Marvel at the auto-generated name ContinuousLinearMap.instNormedSpaceContinuousLinearMapToSemiringToDivisionSemiringToSemifieldToFieldToNormedFieldIdToNonAssocSemiringContinuousLinearMapToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToAddCommMonoidToAddCommGroupToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToAddCommMonoidToAddCommGroupToModuleToModuleTopologicalSpaceToTopologicalAddGroupAddCommMonoidToContinuousAddToAddGroupToSeminormedAddGroupContinuousLinearMapToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToAddCommMonoidToAddCommGroupToModuleTopologicalSpaceToTopologicalAddGroupAddCommMonoidToContinuousAddToAddGroupToSeminormedAddGroupModuleSmulCommClass_selfToCommMonoidToCommRingToEuclideanDomainToMulActionToMonoidWithZeroToZeroToNegZeroClassToSubNegZeroMonoidToSubtractionMonoidToDivisionAddCommMonoidToMulActionWithZeroContinuousConstSMulToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToSeminormedRingToSeminormedCommRingToNormedCommRingToSMulToZeroToAddMonoidToSMulZeroClassToZeroToSMulWithZeroContinuousSMulToZeroToCommMonoidWithZeroToCommGroupWithZeroBoundedSMulModuleSmulCommClass_selfToMulActionToZeroToNegZeroClassToSubNegZeroMonoidToSubtractionMonoidToDivisionAddCommMonoidToMulActionWithZeroContinuousConstSMulToSMulToZeroToAddMonoidToSMulZeroClassToSMulWithZeroContinuousSMulBoundedSMulInstSeminormedAddCommGroupContinuousLinearMapToSemiringToDivisionSemiringToSemifieldToFieldToNormedFieldIdToNonAssocSemiringContinuousLinearMapToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToAddCommMonoidToAddCommGroupToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToAddCommMonoidToAddCommGroupToModuleToModuleTopologicalSpaceToTopologicalAddGroupAddCommMonoidToContinuousAddToAddGroupToSeminormedAddGroupContinuousLinearMapToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToAddCommMonoidToAddCommGroupToModuleTopologicalSpaceToTopologicalAddGroupAddCommMonoidToContinuousAddToAddGroupToSeminormedAddGroupModuleSmulCommClass_selfToCommMonoidToCommRingToEuclideanDomainToMulActionToMonoidWithZeroToZeroToNegZeroClassToSubNegZeroMonoidToSubtractionMonoidToDivisionAddCommMonoidToMulActionWithZeroContinuousConstSMulToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToSeminormedRingToSeminormedCommRingToNormedCommRingToSMulToZeroToAddMonoidToSMulZeroClassToZeroToSMulWithZeroContinuousSMulToZeroToCommMonoidWithZeroToCommGroupWithZeroBoundedSMulModuleSmulCommClass_selfToMulActionToZeroToNegZeroClassToSubNegZeroMonoidToSubtractionMonoidToDivisionAddCommMonoidToMulActionWithZeroContinuousConstSMulToSMulToZeroToAddMonoidToSMulZeroClassToSMulWithZeroContinuousSMulBoundedSMul

Versions

e.g. Lean (version 4.0.0-nightly-2023-06-20, commit a44dd71ad62a, Release) (OS independent)

Additional Information

What prompted me to open this issue was the fact that these instance names now pollute search; any reasonably small string of reasonable characters is a substring of the above instance name. Others are worried about blow-up continuing and causing weird problems down the line. Another observation is that this causes problems with the docs: the relevant link for this instance doesn't allow you to look at the equations for the instance -- you get the message One or more equations did not get rendered due to their size.

In community Lean 3 there was a more reasonable algorithm, which unfortunately I do not know the details of, although I do remember that it was changed from the Lean 3.4.2 algorithm (which was sometimes returning instance names which were too short :-) in the sense that they were missing relevant information or in the wrong namespace or something). It would be great to see this more controlled algorithm in Lean 4 before the exponential growth of auto-generated instance names starts to cause weird problems.

@Kha Kha self-assigned this Jul 21, 2023
@Kha
Copy link
Member

Kha commented Jul 26, 2023

Let's start by reviewing the tried and potential naming schemes (numbering is purely for disambiguation)

  1. current Lean 4 scheme: ignores binders and camel case-concatenates names of all referenced constants and constant-likes (Type/Prop/ForAll) in the elaborated instance type. Low chance of collisions but see above for downsides. We still append a disambiguating index when necessary and possible (i.e. when not in a diamond situation), which does happen in practice
    image

  2. previous Lean 4 scheme: similar to above, but using the concrete syntax tree, thus ignoring any inferred information. This led to undesirable collisions as documented in Names of inferred instances clash #951. An upside of any syntactic approach is that the output should at most be linear in the input size.

  3. syntactic approach including binders: as mentioned in Names of inferred instances clash #951. It would probably have to include non-explicit binders to avoid collisions, which should still be less verbose than what we have now.

  4. Lean 3c scheme: a syntactic-type solution like 2 but with more heuristics, including for the namespace, documented at https://github.com/leanprover-community/lean/blob/master/src/frontends/lean/decl_util.cpp#L58-L68. Actually, it works on the pre-term of the type and thus is not even applicable 1:1 to Lean 4, though eagerly unfolding macros might get close enough.

  5. namespacing using the module name: as briefly mentioned and immediately discarded by me in Names of inferred instances clash #951. It does prevent any chance of collisions. My concern with introducing these namespaces was with auto completion, documentation, and any other kind of declaration listing, though we could work on de-emphasizing them in all these cases - after all, people should not care about the name of an unnamed instance in most cases. However, it is unclear to me how the full declaration name would be assembled specifically - if we discard the current namespace, we break scoped unnamed instances. If we include it, we will generate quite strange names full of redundancies.

  6. using hygiened (i.e. inaccessible) names: this would be the natural, straight-forward solution if we see unnamed instances as a macro for named instances. Like the namespacing approach, collision freedom would be guaranteed. It would also guarantee inability to access the name at all, which is most probably a too-principled stance. Indeed, an unscientific grep of mathlib4 shows a few hundred potential references to unnamed instances: https://gist.github.com/Kha/d42f60d22f60e72bf7c2bd2b4cdb141f.
    We could work on enabling these use cases without having to reference the instance by name, but even just for debugging it can be helpful to be able to write down a fully elaborated expression in surface syntax; which, to be fair, is a UX limitation of hygiene in general.


It would seem that the Lean 3c approach presented a complex but practical compromise somewhere between approaches 1 and 2. If there were any remaining issues or annoyances with it, it would be great to document them here. But in the end, I really would like to ensure that we never have to think about instance name collisions again. And discouraging named access to unnamed instances, without outright preventing it, also seems reasonable for the sake of robustness. Thus I'm inclined to at least explore the potential design and impact of the latter two approaches instead of designing yet another heuristic that may or may not work out. Comments welcome.

@semorrison
Copy link
Collaborator

I would like to pursue option 6. I don't think it is an obstacle that Mathlib currently references unnamed instances by name. No one is happy about this, and it should be a relatively easy refactor to just avoid this.

(I would probably do this by locally modifying Lean to break all of the auto-names in any dumb way, and then fix everything again.)

@semorrison
Copy link
Collaborator

Ah -- I have encountered a problem with using entirely inaccessible names: sometimes we need to nolint instances created using deriving. The example I found first was in Mathlib.CategoryTheory.Limits.Shapes.WidePullbacks

/-- The type of arrows for the shape indexing a wide pullback. -/
inductive Hom : WidePullbackShape J → WidePullbackShape J → Type w
  | id : ∀ X, Hom X X
  | term : ∀ j : J, Hom (some j) none
  deriving DecidableEq
#align category_theory.limits.wide_pullback_shape.hom CategoryTheory.Limits.WidePullbackShape.Hom

attribute [nolint unusedArguments] instDecidableEqHom

Possibly the correct response to this is to improve the deriving handler!

@Kha
Copy link
Member

Kha commented Aug 7, 2023

I would like to pursue option 6

I'm very happy to hear that!

sometimes we need to nolint instances created using deriving

Should inaccessible declarations be linted at all?

@digama0
Copy link
Collaborator

digama0 commented Aug 7, 2023

It would really help if there was a reliable @[automatically_generated] attribute like in rust... right now we have half a dozen approximations to this implemented in lean, std and mathlib.

@eric-wieser
Copy link
Contributor

using hygiened (i.e. inaccessible) names

My main concern with this approach is that we lose the ability to have conversations on Zulip like:

A: Do we have the fact that LinearMaps form a module

B: Yes, it's docs#LinearMap.instModuleLinearMap

Without having to first make a PR to give the instance an explicit name if it didn't already have one.

@digama0
Copy link
Collaborator

digama0 commented Aug 7, 2023

yes, I am concerned that such options will just push us to never use anonymous instances at all in mathlib, which feels like a failure of the naming heuristic. And with things like deriving DecidableEq this is even worse because then we won't be able to use the sugar (unless it gets an option like deriving DecidableEq (name := instDecidableEqNat)).

@gebner
Copy link
Member

gebner commented Aug 7, 2023

Should inaccessible declarations be linted at all?

We have plenty of linters for simp lemmas and type class instances that ensure a property of the whole simp set / instance set. Simp will apply lemmas whether they are inaccessible or not, so they need to be linted either way.

And tbh this looks like a bug in the derive handler:

attribute [nolint unusedArguments] instDecidableEqHom

bors bot pushed a commit to leanprover-community/mathlib4 that referenced this issue Aug 9, 2023
…6423)

Per leanprover/lean4#2343, we are going to need to change the automatic generation of instance names, as they become too long.

This PR ensures that everywhere in Mathlib that refers to an instance by name, that name is given explicitly, rather than being automatically generated.

There are four exceptions, which are now commented, with links to leanprover/lean4#2343.

This was implemented by running Mathlib against a modified Lean that appended `_ᾰ` to all automatically generated names, and fixing everything.



Co-authored-by: Scott Morrison <scott.morrison@gmail.com>
@kbuzzard
Copy link
Contributor Author

kbuzzard commented Aug 9, 2023

B: Yes, it's docs#LinearMap.instModuleLinearMap

Isn't the correct answer "Yes, the typeclass inference system knows this, it should just work"?

@eric-wieser
Copy link
Contributor

No, because that doesn't tell them "it's only true under assumptions XYZ"

@kbuzzard
Copy link
Contributor Author

OK then I vote for making the PR to give the instance an explicit name if it didn't already have one, because I suspect that this is a pretty marginal concern in practice. It is also a step forward from "Yes, it's docs#<2500 character string>.

Is it clear that one would not be able to link to an inaccessible term? Could the answer be "yes it's docs#inst37, sorry we don't have an explicit name for it, but the typeclass system knows it so it should just work"? I have no idea about how the docs work.

@eric-wieser
Copy link
Contributor

eric-wieser commented Aug 10, 2023

yes, I am concerned that such options will just push us to never use anonymous instances at all in mathlib, which feels like a failure of the naming heuristic. And with things like deriving DecidableEq this is even worse because then we won't be able to use the sugar (unless it gets an option like deriving DecidableEq (name := instDecidableEqNat)).

In case you missed it, @digama0: leanprover-community/mathlib4#6423 (now merged) seems like a big step along this path to never using anonymous instances.

bors bot pushed a commit to leanprover-community/mathlib4 that referenced this issue Aug 10, 2023
…6423)

Per leanprover/lean4#2343, we are going to need to change the automatic generation of instance names, as they become too long.

This PR ensures that everywhere in Mathlib that refers to an instance by name, that name is given explicitly, rather than being automatically generated.

There are four exceptions, which are now commented, with links to leanprover/lean4#2343.

This was implemented by running Mathlib against a modified Lean that appended `_ᾰ` to all automatically generated names, and fixing everything.



Co-authored-by: Scott Morrison <scott.morrison@gmail.com>
bors bot pushed a commit to leanprover-community/mathlib4 that referenced this issue Aug 10, 2023
…6423)

Per leanprover/lean4#2343, we are going to need to change the automatic generation of instance names, as they become too long.

This PR ensures that everywhere in Mathlib that refers to an instance by name, that name is given explicitly, rather than being automatically generated.

There are four exceptions, which are now commented, with links to leanprover/lean4#2343.

This was implemented by running Mathlib against a modified Lean that appended `_ᾰ` to all automatically generated names, and fixing everything.



Co-authored-by: Scott Morrison <scott.morrison@gmail.com>
@Kha
Copy link
Member

Kha commented Aug 10, 2023

Is it clear that one would not be able to link to an inaccessible term?

I don't know (@hargoniX?) but it would be good to ensure doc-gen4 can reasonably handle them before touching the naming scheme

@semorrison
Copy link
Collaborator

Noting that my PR leanprover-community/mathlib4#6423 was not intended as a vote for "Mathlib should never ever use anonymous instance". Instead it was to decouple Mathlib from the current (fairly obviously unsatisfactory) scheme, so that we have room to change the current scheme without having to fix a million regressions in Mathlib.

If we can come up with a good replacement scheme, and implement it, then of course it's great if Mathlib is happy to go back to relying on the automatically generated names!

semorrison added a commit to leanprover-community/mathlib4 that referenced this issue Aug 14, 2023
…6423)

Per leanprover/lean4#2343, we are going to need to change the automatic generation of instance names, as they become too long.

This PR ensures that everywhere in Mathlib that refers to an instance by name, that name is given explicitly, rather than being automatically generated.

There are four exceptions, which are now commented, with links to leanprover/lean4#2343.

This was implemented by running Mathlib against a modified Lean that appended `_ᾰ` to all automatically generated names, and fixing everything.



Co-authored-by: Scott Morrison <scott.morrison@gmail.com>
semorrison added a commit to leanprover-community/mathlib4 that referenced this issue Aug 15, 2023
…6423)

Per leanprover/lean4#2343, we are going to need to change the automatic generation of instance names, as they become too long.

This PR ensures that everywhere in Mathlib that refers to an instance by name, that name is given explicitly, rather than being automatically generated.

There are four exceptions, which are now commented, with links to leanprover/lean4#2343.

This was implemented by running Mathlib against a modified Lean that appended `_ᾰ` to all automatically generated names, and fixing everything.



Co-authored-by: Scott Morrison <scott.morrison@gmail.com>
@utensil
Copy link

utensil commented Oct 9, 2023

using hygiened (i.e. inaccessible) names

My main concern with this approach is that we lose the ability to have conversations on Zulip like:

A: Do we have the fact that LinearMaps form a module

B: Yes, it's docs#LinearMap.instModuleLinearMap

Without having to first make a PR to give the instance an explicit name if it didn't already have one.

What about choosing a name prefix based on option 2 (linear in the input size) then a short hygiened (i.e. inaccessible) name such as a hash (like the short git commit id) of the name scheme from option 1 as postfix? e.g. instModuleLinearMap_3e71a6e which should remain readable and search-friendly for its prefix and much less likely to clash due to the cryptographic properties of the hash algorithm.

This way one can still talk about doc#LinearMap.instModuleLinearMap as it usually jumps to a unique doc, and gracefully downgrade to a search result to disambiguate.

@hargoniX
Copy link
Contributor

hargoniX commented Oct 9, 2023

Is it clear that one would not be able to link to an inaccessible term?

I don't know (@hargoniX?) but it would be good to ensure doc-gen4 can reasonably handle them before touching the naming scheme

doc-gen has a custom filter for declarations it is not supposed to render, as long as this filter is made not to trigger on auto-named instances they will keep showing up just fine.
Assuming they remain accessible as definitions in the Environment but I assume that's a given.

@kbuzzard
Copy link
Contributor Author

@kmill what is the situation with #3089 ? It's still a draft?

Someone suggested to me that we write a linter which checks to see if an auto-generated instance name is too long and if so then CI fails and the user is forced to supply a more sensible name. This would be another workaround for this issue but I'd rather see a PR merged which fixes this high priority mathlib issue properly. It's embarrassing to see members of our community answering questions on the Zulip with links to explicit declarations in the docs which are 1000 characters long.

@Kha Kha assigned kmill and unassigned Kha Mar 6, 2024
github-merge-queue bot pushed a commit that referenced this issue Apr 13, 2024
Implements a new method to generate instance names for anonymous
instances that uses a heuristic that tends to produce shorter names. A
design goal is to make them relatively unique within projects and
definitely unique across projects, while also using accessible names so
that they can be referred to as needed, both in Lean code and in
discussions.

The new method also takes into account binders provided to the instance,
and it adds project-based suffixes. Despite this, a median new name is
73% its original auto-generated length. (Compare: [old generated
names](https://gist.github.com/kmill/b72bb43f5b01dafef41eb1d2e57a8237)
and [new generated
names](https://gist.github.com/kmill/393acc82e7a8d67fc7387829f4ed547e).)

Some notes:
* The naming is sensitive to what is explicitly provided as a binder vs
what is provided via a `variable`. It does not make use of `variable`s
since, when names are generated, it is not yet known which variables are
used in the body of the instance.
* If the instance name refers to declarations in the current "project"
(given by the root module), then it does not add a suffix. Otherwise, it
adds the project name as a suffix to protect against cross-project
collisions.
* `set_option trace.Elab.instance.mkInstanceName true` can be used to
see what name the auto-generator would give, even if the instance
already has an explicit name.

There were a number of instances that were referred to explicitly in
meta code, and these have been given explicit names.

Removes the unused `Lean.Elab.mkFreshInstanceName` along with the
Command state's `nextInstIdx`.

Fixes #2343
@kmill
Copy link
Collaborator

kmill commented Apr 13, 2024

For the record, here's an explanation of the scheme that I used in #3098:

  1. It elaborates the instance's binders and type and then creates a pi type with all of this.
  2. It creates something like a Lean 3 pre-expression from this type, deleting all instance arguments, (most) implicit arguments, all universe levels, and the domains to dependent foralls. It leaves implict arguments that are types (except Sort _) or type families, to avoid regression with respect to Names of inferred instances clash #951. (Note: the resulting expression isn't well-formed anymore.)
  3. It essentially follows the Lean 4 algorithm on the resulting expression. But, it skips all subexpressions it has seen before during the traversal. For top-level pi types, it also uses the pattern FooOfBar rather than ForallBarFoo. It initializes the set of seen subexpressions with all prefixes of the current namespace that correspond to declarations.
  4. The root module of the current module is called the "current project". If there is no component of the name that came from a constant in the current project, then the name gets a project suffix.
  5. If after all this the name already exists, it gets a numerical suffix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Mathlib4 high prio Mathlib4 high priority issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants