[DOC] Clarify Class#subclasses behavior quirks #5480

Merged
merged 1 commit into ruby:master from docs-class-subclasses-clarification on Dec 11, 2022


zverok
Contributor

@zverok zverok commented Jan 24, 2022

As per discussion in Feature #18273, explain the non-deterministic nature of the method.

Rendering of a version achieved after some discussion:
[screenshot of the rendered documentation]

@zverok zverok added the Documentation Improvements to documentation. label Jan 24, 2022
@zverok zverok requested a review from byroot January 24, 2022 22:32
@zverok zverok self-assigned this Jan 24, 2022
Member

@byroot byroot left a comment

While I believe warning about this in the doc is useful, I wouldn't use the non-deterministic terminology.

I think it would be better to refer to those as weak references.

I also think it would be better to frame it as how `subclasses` doesn't prevent subclasses from being GCed, rather than framing it as cruft.

class.c Outdated
Comment on lines 1510 to 1511
* is not a representation of some internal state but is
* calculated dynamically. It might lead to a non-deterministic
Member

is not a representation of some internal state but is calculated dynamically

It very much is a representation of some internal state. A class basically has a weak list of references to its subclasses.
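To make the "weak list" idea concrete, here is a rough user-level analogue. It is a hypothetical sketch, not how CRuby implements it: the `WeakSubclassTracking` module and its `tracked_subclasses` method are made up for illustration, and Ruby 3.1+ is assumed so the result can be compared with `Class#subclasses`. Direct subclasses are recorded from the `inherited` hook into an `ObjectSpace::WeakMap`, which does not keep its entries alive.

```ruby
# Hypothetical helper; not part of Ruby or any library.
module WeakSubclassTracking
  # Ruby calls this on the superclass every time it is subclassed.
  def inherited(subclass)
    super
    @weak_subclasses ||= ObjectSpace::WeakMap.new
    # The WeakMap does not keep `subclass` alive: if nothing else references
    # it, the entry may vanish once the GC reclaims the class.
    @weak_subclasses[subclass] = subclass
  end

  # Direct subclasses that have not been collected yet (order unspecified).
  def tracked_subclasses
    @weak_subclasses ? @weak_subclasses.keys : []
  end
end

class Base
  extend WeakSubclassTracking
end

class Child < Base; end
class GrandChild < Child; end

Base.tracked_subclasses  # => [Child]
Child.tracked_subclasses # => [GrandChild]
```

Like the built-in method, this only lists direct subclasses, because `inherited` fires on the immediate parent.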

class.c Outdated
@@ -1504,6 +1504,35 @@ class_descendants(VALUE klass, bool immediate_only)
* A.subclasses #=> [D, B]
* B.subclasses #=> [C]
* C.subclasses #=> []
*
*
* Note that unlike Module#ancestors, this method's result
Member

The only real difference with ancestors is that one is a strong reference and the other a weak reference. So yes, if you hold a module, its ancestors won't ever be GCed; if you hold a class, its subclasses might be GCed.
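A small sketch of that asymmetry (illustrative names, Ruby 3.1+ assumed): walking up from a class follows strong references, while `A.subclasses` reports `sub` only because a local variable still holds it.

```ruby
class A; end

sub = Class.new(A)

# Upward links are strong: while `sub` is alive, its superclass and every
# other ancestor stay reachable through it.
sub.superclass            # => A
sub.ancestors.include?(A) # => true

# The downward link is weak: A lists `sub` only because the local variable
# `sub` still references it; A itself does not keep it alive.
A.subclasses.include?(sub) # => true
```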

Contributor

@fxn fxn Jan 25, 2022

Hmmm, that's from the point of view of who implements the language.

From the point of view of the programmer, it's not about strong/weak. From the point of view of the programmer, superclasses and mixins are added to the ancestor chain. There's no API to alter the ancestor chain, in particular for removing or replacing.

That said, the point of my feedback is not so much about whether there's an internal materialization of the collection or the collection is computed walking some internal structures. I don't think we need to word it this way.

The reality is that this method returns the "subclasses that are alive in memory", which is a weird way to say it. So wording this in terms of weak references may be worth a try.

The user has to know that the same exact program, without touching a comma, can behave in a different way, like the synthetic

Class.new
Object.subclasses

does.
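A minimal sketch of the "same program, different output" point, assuming Ruby 3.1+ (`Base` and `create_and_discard` are hypothetical names). The script deliberately does not claim a specific result: once the only reference is gone, whether `GC.start` actually reclaims the class depends on the implementation and its internals.

```ruby
class Base; end

# Creates an anonymous subclass and returns nil, so no reference to the
# new class survives once this method returns.
def create_and_discard(parent)
  Class.new(parent)
  nil
end

create_and_discard(Base)
GC.start

# The language promises nothing here: the orphaned class may or may not
# still be listed, and the answer can differ between runs and Rubies.
puts "Base.subclasses.size after GC: #{Base.subclasses.size}"
```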

Member

The reality is that this method returns the "subclasses that are alive in memory", which is a weird way to say it

Yes, all Ruby objects are either alive in memory or you can't interact with them, which is why I'm not keen on this kind of phrasing.

I think a simple reminder that subclasses may be GCed should be enough.

class.c Outdated
* GC.start
*
* A.subclasses
* # => []
Contributor

This can't be in the docs, it is not guaranteed.

@byroot
Member

byroot commented Jan 25, 2022

So the current documentation is:

Returns an array of classes where the receiver is the direct superclass of the class, excluding singleton classes.
The order of the returned array is not defined.

What about adding something along the line of:

Note that classes that are no longer reachable and are about to be garbage collected are returned as well.

@fxn
Contributor

fxn commented Jan 25, 2022

I like it. It's concise and enough.

@zverok
Contributor Author

zverok commented Jan 25, 2022

@byroot @fxn
I see your points.

I am a bit unsure about this wording, though:

Note that classes that are no longer reachable and are about to be garbage collected are returned as well.

For me, this has some vibe of a "note to self" (like, experienced Rubyists/language authors saying "yeah, we didn't forget to mention it"). In my view, one of the important audiences of documentation is people new to the language: maybe they are new to programming in general, maybe not; maybe they are new to dynamic/garbage-collected languages.

From my memory of tinkering with Ruby many years ago, and from my experience of teaching Ruby for several years, I believe that "interestingly named" methods of core classes draw the attention of those trying to build a mental model of the language, to understand how its entities are related and how they behave.

And I suspect that (I believe it is part of what we discussed with @fxn on the tracker) for some readers, the presence of subclasses might indicate that the list keeps references to all subclasses that were created during the program execution; for others, the existence of dynamically created subclasses might be a novel and unusual idea (as far as I can remember, I "discovered" some_var = Class.new not in the first months of my life with Ruby, maybe not even in the first years).

WDYT about this:

  1. Word a note this way (the only text to add to the current docs, under the examples), trying to unify our ideas: "Note that dynamically created subclasses may disappear from the list when there are no references to them left (they become unreachable), but the exact moment of their disappearance is not guaranteed and depends on garbage collection." It is a bit over-explanatory, but still short, and gives some insight into "what might be going on" for less experienced developers.
  2. Leave only this part of my example (demonstrating how dynamic subclasses are created, and what "no references left" means):
     class A; end

     # create dynamic subclass, not associated with a constant
     c = Class.new(A)
     A.subclasses
     # => [#<Class:0x00007fe8cc5e3690>]

     # drop the reference to subclass, it can be garbage-collected now
     c = nil

     A.subclasses
     # It can be
     #  => [#<Class:0x00007fe8cc5e3690>]
     # ...or
     #  => []
     # ...depending on whether garbage collector was run

WDYT?

@byroot
Member

byroot commented Jan 25, 2022

I am a bit unsure about this wording

I'm not particularly attached to the specific wording, it's more about the overall point.

WDYT about this

IMHO, Class#subclasses is not the right place to include a long explanation on how GC languages work.

I'm also worried that having a 1ish-line description followed by 10ish lines of what seems like a warning might put people off, especially when it's something that few people will encounter. Active Support's implementation of Class#subclasses had exactly the same behavior for over a decade, and I don't think I ever saw any concerns or questions about it anywhere.

Eventually mentioning that anonymous classes are returned too might be interesting.
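For illustration (Ruby 3.1+, hypothetical class names), anonymous subclasses come back alongside named ones. Since `Module#name` returns `nil` for a class never assigned to a constant, it is one way to tell them apart:

```ruby
class A; end
class B < A; end
anon = Class.new(A) # anonymous: never assigned to a constant

# Named and anonymous subclasses are returned together; partition them
# by whether Module#name is nil.
named, anonymous = A.subclasses.partition(&:name)
named             # => [B]
anonymous == [anon] # => true, as long as `anon` is still referenced
```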

But ultimately I'm a fairly lousy documenter, so if you feel strongly about some part, go for it.

@fxn
Contributor

fxn commented Jan 25, 2022

IMHO, Class#subclasses is not the right place to include a long explanation on how GC languages work.

Agree.

Also, this is not the place to explain in detail what a class object is, and that they can go out of scope and be garbage collected like any other object. That's learning the Ruby language.

So, I believe we need to say something to have a more complete description of what the method does, but there has to be a balance and the small print has to be short.

The small print may make some readers wonder: "wait, a class can be GCed, how's that possible?", and that would make them go and learn the part of Ruby they did not know yet. But that has to be elsewhere in my opinion.

@zverok
Contributor Author

zverok commented Jan 26, 2022

@byroot @fxn TBH, I already feel bad about wasting so much of your time/resource on discussing a small paragraph in docs (I am saying it honestly, and not as a passive-aggressive attack!)

Therefore, I want to clarify my general intentions one more time (because it would be helpful for future work), and after that, I'll go with whatever the consensus will be. Or we can just close this PR, actually :)

So, there are a few things I generally consider:

  1. It is not that we have an abundance of publicly accessible, open, and "official" (or at least well-maintained, regularly updated, and easy to find) Ruby docs. Basically, the class-by-class and method-by-method reference on docs.ruby-lang.org (and its copies of various quality and usability on other sites) is all we have. That's why we are frequently deviating from "let's keep the reference lean, they should learn the language elsewhere": there is no "elsewhere".
  2. Ruby is distinctive in that almost all of its functionality is grouped into classes/methods, and that's where people look for the docs first (saying this from my own experience, when it was a public rendering of "Programming Ruby 1.6", and from the experience of my mentees). We don't have just "a chapter on enumeration" somewhere, but we have Enumerable; we don't have "a chapter on math", we have Numeric; we don't have "system/core functionality", we have Kernel, etc. That's where most of the explanations live anyway. The balance should be kept (we have some criminally overdocumented methods/classes, too; it is a thing, when something simple has 5 screens of docs and details and becomes intimidating), but I don't feel like I was trying to squeeze a whole book on subclasses in here :)
  3. When thinking about "appropriate" method docs, I prefer to think about "who would read this doc and in what situation" (I understand that it diverges from the "minimal objective documentation" goal, but I believe it aligns pretty well with other method docs and with the points above). Someone looking for .subclasses is probably doing some metaprogramming experiments and/or designing a DSL; at that point, any experimenting/testing would make "oh, it works with anonymous subclasses" and "oh, they can be removed" a valuable piece of knowledge.
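A rough sketch of the DSL/plugin-discovery use case described above (the `Exporter` classes are hypothetical; Ruby 3.1+ assumed). Because the order of `subclasses` is not defined, the sketch sorts the result, and it skips anonymous subclasses since they have no name to sort by:

```ruby
# Hypothetical plugin base class for a DSL; the names are illustrative only.
class Exporter
  # Class#subclasses returns direct subclasses in no defined order, so sort
  # them for a stable listing. Anonymous subclasses (nil name) are skipped.
  def self.available
    subclasses.select(&:name).sort_by(&:name)
  end
end

class JsonExporter < Exporter; end
class CsvExporter < Exporter; end

Exporter.available # => [CsvExporter, JsonExporter]
```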

So... I think @byroot is onto something with "Eventually mentioning that anonymous classes are returned too might be interesting.", and I propose this form as a balance between "Mario, your details are in another castle!" and too much fine print, and a compilation of various versions proposed during the discussion. That might be the whole doc for the method:

Returns an array of classes where the receiver is the
direct superclass of the class, excluding singleton classes.
The order of the returned array is not defined.

class A; end
class B < A; end
class C < B; end
class D < A; end

A.subclasses        #=> [D, B]
B.subclasses        #=> [C]
C.subclasses        #=> []

Anonymous subclasses (not associated with a constant) are
returned, too:

c = Class.new(A)
A.subclasses        # => [#<Class:0x00007f003c77bd78>, D, B]

Note that the parent does not hold references to subclasses
and doesn't prevent them from being garbage collected. This
means that the subclass might disappear when all references
to it are dropped:

# drop the reference to subclass, it can be garbage-collected now
c = nil

A.subclasses
# It can be
#  => [#<Class:0x00007f003c77bd78>, D, B]
# ...or just
#  => [D, B]
# ...depending on whether garbage collector was run

This way the "fine print about disappearance" resides in the end (after a more commonly useful example of the dynamic subclass), and doesn't look like "the main thing that docs are saying".

PS: @fxn you got me extremely confused 😂 It was from your initial concern that this effort started, but now we seem to "change places" in who considers what is important!

@fxn
Contributor

fxn commented Jan 26, 2022

I already feel bad about wasting so much of your time/resource on discussing a small paragraph in docs

No problem at all! Designing an API and writing good docs is not trivial at all. Your contribution and the discussion are necessary and totally welcome. You think about something as much as you need to in order to come up with a final result you like. And that final result is often the product of discussions, listening to different points of view, and sleeping on the topic.

I am not sure about anonymous classes. A subclass is a parent/child relationship, and no names are involved in that concept, neither for the parent nor for the child. In docs, you are sometimes a bit redundant for didactic purposes, though. The balance here, whether to mention this or not, I'd leave to @byroot.

@zverok your example with c = nil would be a way to illustrate what we said before. Personally, I like it too, as a companion to the note.

PS: @fxn you got me extremely confused 😂 It was from your initial concern that this effort started, but now we seem to "change places" in who considers what is important!

Sorry for the confusion!

In Redmine, my position was that I believe this shouldn't be in Ruby, and I tried to explain why. And I say that from a pure API-design POV, in the sense that I have a huge respect for @byroot. It's just that in this particular case, we see it differently. Of course, this API was blessed by Ruby core, so all my respect there too, and opening the discussion was a way to say: "hey, have you considered this from this angle?" If the answer is, "yeah, it is fine", I accept it.

Now, if the method has to stay, then I believe the original docs are incomplete. We need to be more precise. You can present it like happy path first, and then small print. But we need some small print because this is core Ruby.

@fxn
Contributor

fxn commented Jan 26, 2022

Followups:

c = Class.new(A)
A.subclasses # => [#<Class:0x00007f003c77bd78>, D, B]

this, in isolation, is not guaranteed. After line 1, c is unused and the GC could consider the class to be eligible. You see? This method is tricky.

But we need some small print because this is core Ruby.

I didn't express myself very well here. All documentation should excel at being didactic, correct, and comprehensive. As a programmer, I need everything the method does to be described so that I am informed.

I was trying to say that if there is one body of documentation that has to be this way, it is the documentation of the core language and its standard library (which definitely has room for improvement in this regard).

@byroot
Member

byroot commented Jan 26, 2022

After line 1, c is unused and the GC could consider the class to be eligible.

No, you'd need to exit the current method or block or whatever the scope is.

e.g.:

def foo
  c = Class.new(A)
  1_000.times { GC.start } # GC won't ever collect `c` here.
end

@fxn
Contributor

fxn commented Jan 26, 2022

No, you'd need to exit the current method or block or whatever the scope is.

One thing is how CRuby works internally today, and a different thing is what the public contract is: what you can rely on because Ruby, the language, guarantees it in the abstract. When you document, you need to abide strictly by the public contract.

In particular, GCs can change over time, and different Rubies have different GCs.

I don't think the garbage collector has any public contract. You don't even know what GC.start does.

@fxn
Contributor

fxn commented Jan 26, 2022

You don't even know what GC.start does.

You being a generic you there :).

@fxn
Contributor

fxn commented Jan 26, 2022

Let me explain a bit more what I am saying.

As a developer trying to squeeze out the last drop of performance, you know the internals, you know how things work in some particular version of Ruby, and you write programs that take advantage of those undocumented things. Like, better intern this string, better do this this way or that way because while on paper they are the same, we know this one performs better than the other today. And when you upgrade, maybe you tweak again for the particularities of the new version.

But, as a documenter, you have to forget all that. You only have the public contract, and you have to carefully and consistently abide by it.

Ruby does not tell you when something is eligible for GC, or when GC runs, or what a GC run does. And Ruby could, in theory, be smart enough to understand that the value is not used anywhere anymore and can be garbage collected right away. Nothing tells you that this does not happen.

If you don't have guarantees as a public contract, you can't assume any particular behavior in the official documentation.

@zverok
Contributor Author

zverok commented Jan 26, 2022

If you don't have guarantees as a public contract, you can't assume any particular behavior in the official documentation.

I believe there is a great difference between documentation statements like "Foo will do this" or "Foo will fail if ...", and examples. Throughout the Ruby documentation, examples are informal, relying on a vague feeling of "demonstrating the essence of the behavior". No, we can't guarantee the particular GC behavior (especially if we try to extend the guarantee to any implementation in any year in the future). Just as we can't guarantee the object id of some example object (and still show it in examples), or the behavior of a particular IO operation on some OS (and still demonstrate how File, IO and puts work), or time-related methods, etc.

I have a strong belief that most examples' goal is to "give an idea", not "show the formally verifiable and 100% reproducible demonstration of behavior, or show nothing". And I believe that "giving an idea" of dynamic subclass in subclasses output is a sensible goal.

@fxn
Contributor

fxn commented Jan 26, 2022

I believe there is a great difference between documentation statements like "Foo will do this" or "Foo will fail if ...", and examples.

No, no, the examples are public contract too.

As we can't guarantee the object id of some example object (and still show it in examples)

That's a different level. In some examples, the outcome obviously depends on the individual execution and the reader understands that. If that is not obvious, the documentation has to make it explicit.

In a case like

c = Class.new(A)
A.subclasses # => [#<Class:0x00007f003c77bd78>, D, B]

I don't think it is obvious that the class may or may not be there, and people could assume that is guaranteed.

And that is my whole point about this method, you can't rely on it in the general case. Same program, different output.

@zverok zverok force-pushed the docs-class-subclasses-clarification branch from 374b71b to ee0f046 January 26, 2022 21:12
@zverok
Contributor Author

zverok commented Jan 26, 2022

@fxn I appreciate your choice of a hill to die on but I don't see it this way, honestly.
I believe in simple yet consistent explanations and informal examples.

ATM, I don't know what we can do here.

@nobu @jeremyevans Maybe you can advise on the best way to proceed?

Long story short: I am trying to update Class#subclasses docs to give reasonable and not overwhelming insight on dynamic subclasses, but there are doubts about whether it should be mentioned, how it should be stated, and what's appropriate for examples. What should be done in this case?

Contributor

@jeremyevans jeremyevans left a comment

The current language in the pull request seems fine to me. I would remove the extra empty comment line added at the top, though.

@fxn
Contributor

fxn commented Jan 26, 2022

I appreciate your choice of a hill to die

I don't like that you put it in that way, honestly.

One way to avoid relying on unspecified behavior is to sidestep it, like the original patch did:

class A; end

c = Class.new(A)
A.subclasses # => [#<Class:0x00007f003c77bd78>]

c = nil
A.subclasses # => [#<Class:0x00007f003c77bd78>] or []

because writing it like this, you know for sure what happens in line 4, and do not need to even mention the other case.

@fxn
Contributor

fxn commented Jan 26, 2022

@jeremyevans we were writing at the same time. If the current patch is good for you, I'm fine.

@zverok
Contributor Author

zverok commented Jan 26, 2022

I appreciate your choice of a hill to die

I don't like that you put it in that way, honestly.

@fxn Sorry, it was a joke in bad taste (I am Ukrainian, which means I can be clumsy with English, and I also come from a post-Soviet culture of fairly adversarial online discussions; I'm still trying to get rid of bad habits).
I didn't mean any offense and have only respect for you.

@fxn
Contributor

fxn commented Jan 26, 2022

@zverok one of my first comments was "I like it. It's concise and enough.". That was the end of it for me.

I appreciate the clarification; in text, I could not see a joke in those words.

@zverok
Contributor Author

zverok commented Jan 26, 2022

@fxn All I tried to say (with a badly chosen idiom!) is that I see your concern about whether the example invites the assumption that a dynamic subclass will always be returned by subclasses, but I can't share the feeling that this distinction is that important.

@zverok zverok force-pushed the docs-class-subclasses-clarification branch from ee0f046 to 274399d January 26, 2022 21:51
@zverok
Contributor Author

zverok commented Jan 26, 2022

@jeremyevans

I would remove the extra empty comment line added at the top, though.

Done. And commits squashed.

@byroot
Member

byroot commented Jan 26, 2022

One thing is how CRuby works internally today, and a different thing is which is the public contract.

It is public contract. APIs such as eval, binding.local_variable_get, etc. imply that variables stay valid until the scope is exited. It's probably in the Ruby spec somewhere.

This was discussed at length in many tickets, for a Ruby implementation to change this, it would have to get rid of several very important APIs. So no, it's not an implementation detail, it's the semantic of Ruby.
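A sketch of the argument (hypothetical names, Ruby 3.1+ assumed): a local that is never used again syntactically can still be read later through `binding`, so an implementation supporting `Binding#local_variable_get` cannot simply treat it as dead after its last visible use. This does not prove what any spec says; it only shows the kind of API referred to above.

```ruby
class A; end

def still_reachable
  c = Class.new(A)
  GC.start
  # `c` is not used directly after the assignment, yet it can still be read
  # through the current binding, so the class must survive while this
  # method is running.
  binding.local_variable_get(:c)
end

klass = still_reachable
klass.superclass             # => A
A.subclasses.include?(klass) # => true
```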

@fxn
Contributor

fxn commented Jan 26, 2022

It is public contract. APIs such as eval, binding.local_variable_get, etc. imply that variables stay valid until the scope is exited. It's probably in the Ruby spec somewhere.

Or maybe a smart analyzer could understand that those need the variable in scope, like c = nil does, but that otherwise the reference is orphan.

However, this method is your baby, you are fine with it, and @jeremyevans is fine too. I am fine with it too then.

@fxn
Contributor

fxn commented Jan 27, 2022

All I tried to say (with a badly chosen idiom!) is that I see your concern about whether the example invites the assumption that a dynamic subclass will always be returned by subclasses, but I can't share the feeling that this distinction is that important.

My doubts about this method are conceptual; they are about API design. To me, an API at this level should not need to depend on GC. This is a criterion that is not shared, and I consider the discussion over in that sense.

However, you seem to think "dynamic" classes are kind of rare, or exceptional, or something. I don't know what you mean exactly by "dynamic" classes. All classes are created equal, whether assigned to a constant, to a variable, or to nothing. But when Rails reloads, there may be thousands and thousands of orphan classes in memory. Every reload, in every Rails application in the world, generates orphan subclasses. So the situation where unreachable classes may show up non-deterministically is actually very common.
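To illustrate the reload scenario in plain Ruby (a simplified stand-in for what a code reloader does, not the actual Rails or Zeitwerk mechanism; `Base` and `Widget` are hypothetical), removing a constant and defining a new class under the same name leaves the old class object around, and it keeps showing up in `Base.subclasses` while anything references it:

```ruby
class Base; end

# First "load": define Widget < Base.
Object.const_set(:Widget, Class.new(Base))
old_widget = Widget

# Simulated "reload": drop the constant and define a fresh class.
Object.send(:remove_const, :Widget)
Object.const_set(:Widget, Class.new(Base))

# Both class objects are still referenced (old_widget and Widget).
Base.subclasses.size                 # => 2
Base.subclasses.include?(old_widget) # => true
old_widget.equal?(Widget)            # => false

# Now nothing references the old class; it may keep appearing in
# Base.subclasses until (and unless) the GC reclaims it.
old_widget = nil
```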

Let me stress, though, that in the discussion I am not thinking about Rails or about Zeitwerk, just API design.

Active Support has a similar API. But in my view Active Support may take some license. A Ruby core API has different demands of consistency and rigor.

As per discussion in [Feature #18273], explain the
non-deterministic nature of the method.
@zverok zverok force-pushed the docs-class-subclasses-clarification branch from 274399d to 038b1b1 December 10, 2022 11:20
@zverok zverok merged commit f07897f into ruby:master Dec 11, 2022
@zverok zverok deleted the docs-class-subclasses-clarification branch December 11, 2022 16:43
4 participants