Skip to content

Conversation

ctchou
Copy link

@ctchou ctchou commented Oct 3, 2025

This patch adds a new datatype ωList for representing infinite sequences, which is basically a wrapper around the type ℕ → α which supports the dot-notation and the analogues of many familiar API functions of List α.

In my automata theory project:
https://github.com/ctchou/AutomataTheory
I originally used naked ℕ → _ types to represent infinite sequences but grew unsatisfied with that approach. First of all, I couldn't use the dot-notation that would allow me to use familiar names (from List) for such common API functions as take, drop, extract, append, etc. Secondly, I ended up re-proving many theorems already in the Stream' datatype. So I switched to using Stream'. (Note the prime in Stream'; Stream without the prime is a completely different datatype.). But after a long discussion, in particular with Mario Carneiro who wrote Stream':
https://leanprover.zulipchat.com/#narrow/channel/287929-mathlib4/topic/Why.20is.20Stream'.20called.20Stream'.3F/with/543032393
we concluded that it is probably best to introduce a new data type to represent infinite sequences. I chose the name ωList because this new type is an infinite analogue of List and will share with it many API function names. I did not use the word "sequence" because unfortunately that word now means something different in mathlib:
https://leanprover-community.github.io/mathlib4_docs/Mathlib/Data/Seq/Defs.html
and "sequence" just has too many different meanings in mathematics in general.

Currently the code of ωList is basically copied from Stream' with suitable renaming. But I expect to extend it in the future. (For example, extract is not yet defined.)

@ctchou ctchou requested a review from fmontesi as a code owner October 3, 2025 22:20
@fmontesi
Copy link
Collaborator

fmontesi commented Oct 6, 2025

I really like the name ωList!

@chenson2018 this might be relevant for work on semantics and infinite computations in languages as well. I also suspect this would really benefit from grind annotations. Please feel free to comment.

@fmontesi fmontesi requested a review from chenson2018 October 6, 2025 11:07
@ctchou
Copy link
Author

ctchou commented Oct 6, 2025

Yes, ωList is intended for use in modeling and reasoning about infinite computations. After another PR to enhance ωList further, I'll work on ωLanguage which is the counterpart of:
https://leanprover-community.github.io/mathlib4_docs/Mathlib/Computability/Language.html
on infinite words.

@chenson2018
Copy link
Collaborator

Sorry to rehash this, but before we duplicate definitions from Mathlib downstream, I want to make sure I understand the rationale completely. Here's my brief summary:

  • Mathlib's Stream' is meant to emulate a coinductive type, using ℕ → α since this is not yet supported
  • Mario argues that eventually when coinduction is supported, that it will be a separate type

What I don't quite get is what we get out of anticipating this change. Could you briefly explain what is the downside of just using Stream' here, which you've already ported your automata formalization to use? (Besides the naming, I think we all agree that your ωList is much nicer)

@ctchou
Copy link
Author

ctchou commented Oct 6, 2025

ωList is a fundamental type on which all theories involving infinite words and executions will be built. The name ωList will appear everywhere in those theories. I don't think we want to do have to change many files in the future, should Stream' go in a different direction. And I will be extending ωList with definitions and theorems not currently in Stream'. I think the history of Stream' in mathlib involved a lot of mistakes in design and name choices. There is no point for cslib to continue to be beholden to those mistakes.

@chenson2018
Copy link
Collaborator

ωList is a fundamental type on which all theories involving infinite words and executions will be built. The name ωList will appear everywhere in those theories. I don't think we want to do have to change many files in the future, should Stream' goes in a different direction. And I will be extend ωList with definitions and theorems not currently in Stream'.

But if we were to continue using Stream', we could just as well add these additional definitions and theorems here in Cslib, right? If this leaves only the maintenance due to Stream' changing upstream as a concern, my counterargument would be:

  • I really think it is going to be some time until coinductive data lands in the compiler
  • Even when it does land, we don't yet know the scope of incompatibility.
  • I'd rather deal with the potential issue of migration then rather then preempt it and for an indeterminate time period maintain a parallel implementation of Stream'

@ctchou
Copy link
Author

ctchou commented Oct 6, 2025

What is wrong with our own version of ωList that is completely under our control? Do you really want to suffer through having the making-no-sense-at-all name Stream' everywhere? Take a look at my project which uses Stream' and you'll see what I mean:
https://github.com/ctchou/AutomataTheory
We are already redefining the automata stuffs in mathlib. By your logic, we shouldn't be doing that, should we?

@ctchou
Copy link
Author

ctchou commented Oct 6, 2025

Rebase on the current main to fix conflicts.

@chenson2018
Copy link
Collaborator

What is wrong with our own version of ωList that is completely under our control? Do you really want to suffer through having the making-no-sense-at-all name Stream' everywhere? Take a look at my project which uses Stream' and you'll see what I mean: https://github.com/ctchou/AutomataTheory

I have looked at your repo already. I agree that the primed name is unfortunate, but it is ultimately a cosmetic concern versus potential maintaince burden. I would push back against calling it "making-no-sense-at-all", as I would be perfectly fine with Stream (with no prime). If the name truly is that unbearable, I would just make ωList an abbrev, but I am having trouble seeing a justification for duplicating all this code.

We are already redefining the automata stuffs in mathlib. By your logic, we shouldn't be doing that, should we?

There is a difference between redefining Stream' versus automata. Compare how they are respectively used:

chenson@pop-os:~/git/mathlib4$ rg --files-with-matches " (DFA|NFA) " Mathlib
Mathlib/Computability/DFA.lean
Mathlib/Computability/MyhillNerode.lean
Mathlib/Computability/EpsilonNFA.lean
Mathlib/Computability/NFA.lean

chenson@pop-os:~/git/mathlib4$ rg --files-with-matches "Stream'" Mathlib
Mathlib/Control/LawfulFix.lean
Mathlib/Control/Fix.lean
Mathlib/Combinatorics/Hindman.lean
Mathlib/Data/WSeq/Relation.lean
Mathlib/Data/WSeq/Productive.lean
Mathlib/Data/WSeq/Basic.lean
Mathlib/Data/WSeq/Defs.lean
Mathlib/Data/Stream/Init.lean
Mathlib/Data/Stream/Defs.lean
Mathlib/Data/Seq/Parallel.lean
Mathlib/Data/Seq/Computation.lean
Mathlib/Data/Seq/Basic.lean
Mathlib/Data/Seq/Defs.lean
Mathlib/Topology/MetricSpace/CantorScheme.lean
Mathlib/Algebra/ContinuedFractions/ConvergentsEquiv.lean
Mathlib/Algebra/ContinuedFractions/Basic.lean
Mathlib/Algebra/ContinuedFractions/Computation/Basic.lean
Mathlib/Algebra/ContinuedFractions/Computation/TerminatesIffRat.lean
Mathlib/Algebra/ContinuedFractions/Computation/Approximations.lean
Mathlib/Algebra/ContinuedFractions/Computation/Translations.lean
Mathlib/Algebra/ContinuedFractions/TerminatedStable.lean
Mathlib/Algebra/ContinuedFractions/Translations.lean

Stream' is highly entrenched in the import structure, and arguably a much more fundamental concept. On the other hand, there has been some positive reception to upstreaming or downsteaming Mathlib's automata definitions, which are isolated at the ends of the import structure. It's a smaller scope and potentially a temporary divergence.

@ctchou
Copy link
Author

ctchou commented Oct 6, 2025

I'm not sure an abbreviation will work. The theorems for Stream' are in the Stream' namespace. You either have to open that namespace, or refer to the name explicitly. In either case the thin veneer disappears.

And I fail to see why the use of Stream' by other theories in mathlib is relevant at all. Does having our version of ωList make any of those theories unusable?

@chenson2018
Copy link
Collaborator

And I fail to see why the use of Stream' by other theories in mathlib is relevant at all. Does having our version of ωList make any of those theories unusable?

Having a duplicated version of Stream' means that when we inevitably want to use one of these areas of Mathlib or additional API is added to Stream' that we have to either duplicate more theorems or prove equivalence if they interact with ωList.

To me, it sounds like an unnecessary maintenance burden when the downside of the alternative is a slightly undesirable name.

The decision falls to @fmontesi as the lead maintainer, but I have not seen any arguments that convince me that this is better than just using Stream'.

@ctchou
Copy link
Author

ctchou commented Oct 6, 2025

I don't understand why there would be any "interaction" between ωList and Stream'. Neither is defined in terms of the other and neither is in the dependency hierarchy of the other. They are two completely different things as far as Lean is concerned.

Sure, there are currently many theorems which are common to both except for identifier names. But that's really a consequence of the unfortunate decision by Mario to define a supposedly co-inductive type before co-induction is available in Lean and thereby "hijacked" the sequence type ℕ → _. In my opinion, that's a historical mistake that we are under no obligation to perpetuate. With a separate ωList type, the Stream' type can change in whatever way in the future and we will not be impacted. As to the extra maintenance burden, if we want to model and reason about infinite computations in the future, it's only fair that we will do the work of maintaining a data type for modeling infinite computation, isn't it? In fact, I am working on two PRs extending ωList right now. The first is to add the definition of and some theorems about ωList.extract (the counterpart of List.extract) and the second is about ωList.infOcc for reasoning about the set of elements that appears infinitely often in an ωList. (Does this notion make any sense in a co-inductive view of infinite sequences?)

@ctchou
Copy link
Author

ctchou commented Oct 7, 2025

Please take a look at #86, where I got rid of the API function get and commit to the notion that ωList α is really an abstraction of ℕ → α. Note that for s : Stream' and n : ℕ, that s n works and equal to s.get n now is an accidental consequence of the current implementation of Stream' and certainly does not accord with Mario's intended meaning for Stream'. But for our purpose, we want ωList α to be just an abstraction of ℕ → α and the function application notation is shorter and easier to read than the get function. For example, consider the definitions DA.infRun, and NA.InfRun, BuchiNA.Accepts, and ωList.infOcc in the following commit:
ca190eb
Using get will make all those expressions longer and harder to read. If we do not use get and Stream' changes internal implementation in the future, it will be a lot more work than a global name change to fix the problem.

@chenson2018
Copy link
Collaborator

But for our purpose, we want ωList α to be just an abstraction of ℕ → α ... Using get will make all those expressions longer and harder to read. If we do not use get and Stream' changes internal implementation in the future, it will be a lot more work than a global name change to fix the problem.

Is this not what we refer to as "defeq abuse", in that you are relying on the definitional equality of the type implemention? This is highly discouraged in Mathlib for this reason, among others.

I'll give one more related argument in favor of continuing to use Stream'. Say that in the near future coinductive types land in the compiler and the implementation is changed in a way that causes issues as you describe above. I have little doubt that Mathlib will follow at minimum its usual deprecation timeline of six months, and given that Steam' has other dependencies, there will certainly be mitigation of the most common migration issues, possibly even defining a separate type for ℕ → α that Mathlib will adopt.

Why not use Steam' as is in the meantime, and follow Mathlib's lead once this happens? In my view the downside of a displeasing name is far overshadowed by the benefits of not indefinitely having duplicate code and keeping us in sync with Mathlib.

@ctchou
Copy link
Author

ctchou commented Oct 7, 2025

Why on earth is this "defeq abuse"? People use naked ℕ → α to represent infinite sequences all the time in Lean. I'm introducing a type wrapper only because I also want to use dot-notation and have suggestive API function names similar to those in List.

If you persist in not letting me define ωList, I'm going back to ℕ → α. I'm not going to sprinkle completely useless get in my code.

@chenson2018
Copy link
Collaborator

Why on earth is this "defeq abuse"? People use naked ℕ → α to represent infinite sequences all the time in Lean. I'm introducing a type wrapper only because I also want to use dot-notation and have suggestive API function names similar to those in List.

Please correct me if I'm wrong, but my impression was that having s : Stream α then relying on its implementation to get s n : α instead of using the API fits this description. Perhaps relatively minor, as you could have something like an GetElem instance and write s[n].

If you persist in not letting me define ωList, I'm going back to ℕ → α. I'm not going to sprinkle completely useless get in my code.

While I feel pretty strongly that we should use Steam' (or I suppose ℕ → α would be fine too...), I am not currently a code owner for this directory. As I mentioned above, I will defer to @fmontesi once he has a chance to respond. Let's wait and see what he says.

@ctchou
Copy link
Author

ctchou commented Oct 7, 2025

Why do I want to use the [...] notation? As I pointed out, people use ℕ → α to represent infinite sequences all the time in Lean. (Read "Mathematics in Lean", for example.) Do they use GetElem because conceptually they are "really" dealing with infinite sequences rather than functions? I want to do exactly the same thing, except that I also want to use dot-notation and nice and short API names.

The ONLY reason that I used Stream' in my project was that it allows me to use dot-notation and nice API names. After corresponding with Mario, I now realize that iIt was a pure historical accident that Stream' is now an abstracted version of ℕ → α. Stream' was designed to represent a co-inductive data type, not infinite sequences of type ℕ → α.

@chenson2018
Copy link
Collaborator

chenson2018 commented Oct 7, 2025

Why do I want to use the [...] notation? As I pointed out, people use ℕ → α to represent infinite sequences all the time in Lean. (Read "Mathematics in Lean", for example.) Do they use GetElem because conceptually they are "really" dealing with infinite sequences rather than functions? I want to do exactly the same, except that I also want to use dot-notation and nice and short API names.

This is really a distraction from my primary issue (duplication of code I don't see a benefit for) but I can respond briefly.

In MIL, to my knowledge, they always use ℕ → α directly, not wrapped in a type like Stream'. Once you introduce a type wrapper, you have a question of defeq abuse. As a rule of thumb, imagine if you instead made Stream' a structure with one field of type ℕ → α, which you could not just apply a natural number to. I only suggested GetElem notation as something that would abstract from this.

I will stress again, this is a secondary concern. Mathlib isn't perfect about this either, the exact definition of defeq abuse is a bit more subtle, and I am much more concerned with the general idea of duplicating this code.

It is late in my time zone, let's resume this tomorrow after Fabrizio has had a chance to read through our discussion.

@ctchou
Copy link
Author

ctchou commented Oct 7, 2025

Rebase on the current "main".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants