Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Codeblock-scoped # optional annotations for doctests (modularization) #35750

Closed
mkoeppe opened this issue Jun 9, 2023 · 36 comments · Fixed by #35749
Closed

Codeblock-scoped # optional annotations for doctests (modularization) #35750

mkoeppe opened this issue Jun 9, 2023 · 36 comments · Fixed by #35749

Comments

@mkoeppe
Copy link
Member

mkoeppe commented Jun 9, 2023

They would enable us to:

  • keep doctest annotations sufficiently fine-grained for the purposes of test coverage in modularized tests
  • reduce the visual clutter in source code and formatted documentation
  • reduce the maintenance effort (in particular when using primitive text editors).

See

Switching from the current line-level annotations to block-level annotations (where it makes sense) can be easily automated. We should include such an automation tool in Sage. It would complement the other development tools provided to assist with modularization tasks. That's:

See also:

Part of:

@kwankyu
Copy link
Collaborator

kwankyu commented Jun 14, 2023

How about this?

        Persistent optional tags work::

            sage: # long time
            sage: QQbar(I)^10000
            1
            sage: QQbar(I)^10000  # not tested
            I

            sage: # optional - sage.rings.finite_rings
            sage: GF(7)
            Finite Field of size 7
            sage: GF(10)  # not tested
            Traceback (most recent call last):
            ...
            ValueError: the order of a finite field must be a prime power    

A persistent optional tag is triggered by a line sage: # ... (note empty code). Then it is effectively added at the end of each line of the block following the tag.

This seems better (and easy) than adding the tag to the start of a Sphinx directive ::, which I originally suggested.

It is easy since I could implement the persistent optional tag by less than 10 lines :-)

@mkoeppe
Copy link
Member Author

mkoeppe commented Jun 15, 2023

I think it would be good if the syntax was explicit about affecting a block rather than a line.
For example:

            sage: # doctest block: long time
            sage: QQbar(I)^10000
            1
            sage: QQbar(I)^10000  # not tested
            I

            sage: # doctest block: optional - sage.rings.finite_rings
            sage: GF(7)
            Finite Field of size 7
            sage: GF(10)  # not tested
            Traceback (most recent call last):
            ...
            ValueError: the order of a finite field must be a prime power    

And I'm not sure that a simple blank line should end a block in this sense. For other purposes, these blank lines have no syntactic meaning. So perhaps it should be

But overall these details don't matter much to me, as long as it can be kept in a format that makes writing automatic code maintenance scripts not too hard.

Probably you'll want feedback from people who participated in the discussion in https://groups.google.com/g/sage-devel/c/utA0N1it0Eo; @kcrisman @tscrim @tobiasdiez

@tobiasdiez
Copy link
Contributor

xdoctest handles this as follows: https://xdoctest.readthedocs.io/en/latest/xdoctest.directive.html

# An inline directive appears on the same line as a command and

# only applies to the current line.

raise AssertionError('this will not be run (a)')  # xdoctest: +SKIP

print('This line will print: (A)')

print('This line will print: (B)')

# However, if a directive appears on its own line, then it applies

# too all subsequent lines.

# xdoctest: +SKIP()

raise AssertionError('this will not be run (b)')

print('This line will not print: (A)')

# Note, that SKIP is simply a state and can be disabled to allow

# the program to continue executing.

# xdoctest: -SKIP

print('This line will print: (C)')

print('This line will print: (D)')

# This applies to inline directives as well

# xdoctest: +SKIP("an assertion would occur")

raise AssertionError('this will not be run (c)')

print('This line will print: (E)')  # xdoctest: -SKIP

raise AssertionError('this will not be run (d)')

# xdoctest: -SKIP("a reason can be given as an argument")

print('This line will print: (F)')

@mkoeppe
Copy link
Member Author

mkoeppe commented Jun 15, 2023

@kwankyu
Copy link
Collaborator

kwankyu commented Jun 15, 2023

I think it would be good if the syntax was explicit about affecting a block rather than a line.

These optional tags already require the reader's understanding of the mechanism of optional tags. Then it is not a big deal that users learn about "persistent optional tags" in addition, which I think is quite natural and self-evident that it is a block tag (it is already a coding practice to add a comment above the relevant code block) . Once learnt, "doctest block:" would look just another cluttering.

But overall these details don't matter much to me, as long as it can be kept in a format that makes writing automatic code maintenance scripts not too hard.

The script would convert

sage: line1 # optional - sage.modules
sage: line2 # optional - sage.modules
<BLANKLINE>

to

<BLANKLINE>
sage: # optional - sage.modules
sage: line1
sage: line2
<BLANKLINE>

and

sage: line0
sage: line1 # optional - sage.modules
sage: line2 # optional - sage.modules
<BLANKLINE>

to

sage: line 0
<BLANKLINE>
sage: # optional - sage.modules
sage: line1
sage: line2
<BLANKLINE>

@kwankyu
Copy link
Collaborator

kwankyu commented Jun 15, 2023

Given that "persistent optional tag" is very light weight (it takes 10 lines of code), xdoctest seems too much as a solution to our problem. Whether we need the full power of xdoctest in sage is another matter.

@DaveWitteMorris
Copy link
Member

Matthias wrote on sage-devel:
Fine with me to introduce something like "# module - sage.groups" with identical semantics as "# optional - sage.groups". I'd suggest to take such discussions to #35750

It seems to me that this would be more self-explanatory if we use something like "# requires sage.groups" or "# requires import of sage.groups"

@kwankyu
Copy link
Collaborator

kwankyu commented Jun 16, 2023

# optional - has a big merit that it has long been used for doctests depending on optional packages as in # optional - mathematica. Matthias just extended its use to distribution packages (# optional - sage.modules).

I don't think using just different phrases does not add much but lose the merit.

@DaveWitteMorris
Copy link
Member

I agree that # optional has worked well for doctests, especially since packages are classified as "optional". But for examples in documentation that an end-user will see, I think something like # requires ... is much more helpful. It is saying that if you want to do the example yourself, then you need to havesage.modules. I think that almost anyone who sees requires <whatever> will realize that they need to figure out a way to get <whatever>. (Maybe a different phrase would be even better.) But I don't think that the tag optional <whatever> conveys the same message to a non-expert. Indeed, I don't think I would understand what it meant if I weren't already involved with sage development.

@mkoeppe
Copy link
Member Author

mkoeppe commented Jun 16, 2023

How about # standard - sage.groups then

@kwankyu
Copy link
Collaborator

kwankyu commented Jun 16, 2023

I think that almost anyone who sees requires <whatever> will realize that they need to figure out a way to get <whatever>.

Right. So that is more problematic in a sense because in reality they don't need to do anything to run the doctests since <whatever> is already in sage. Those tags are really for the (yet imaginary) users of selected pieces of sage.

@kwankyu
Copy link
Collaborator

kwankyu commented Jun 16, 2023

So I think it is more important now that we hint (or educate) the user that they just can ignore the optional tags, for which "optional" seems better than "requires" (which scares a user :-)

@DaveWitteMorris
Copy link
Member

You are right that the word "requires" is not helpful for users of the distribution (although I think it is good for the "(yet imaginary) users"). Maybe the word "uses" would be better? For a user of the distribution, this is just a comment about what is being used, but for a user who is importing a-la-carte, this is telling them what they need. There may well be an even better word.

Following up on Matthias's suggestion, it would be more informative to say something like # uses sage.groups (a standard module) and sage.mathematica (an optional module), but I would guess we all agree that is too unwieldy.

Perhaps part of our nonagreement is that I am only talking about examples, not all doctests. My comments are intended to be directed at what will show up in the (printed or html or ...) documentation, so I am not trying to say anything about code that appears in a TEST:: block. In my way of thinking, doctests are for developers, but I hope that examples in the documentation can be clear to novice users, with as little special education as possible. Perhaps I'm not being realistic.

@kwankyu
Copy link
Collaborator

kwankyu commented Jun 16, 2023

We have now the following candidates:

(o) optional # optional - sage.groups (current)
(m) module # module - sage.groups
(s) standard # standard - sage.groups
(r) requires # requires sage.groups
(u) uses # uses sage.groups
(n) needs # needs sage.groups
(a) assumes # assumes sage.groups

I added the last one.

Anyway, this is orthogonal with the idea of persistent tags I suggested before.

@mathzeta
Copy link

If those annotations correspond to actual modules, then other candidates are # import sage.groups and # imports sage.groups. Then it is easy to see how they can be used on a line of their own, instead of codeblock-scoped.

@kwankyu
Copy link
Collaborator

kwankyu commented Jun 16, 2023

No. It just signals that the distribution package providing the feature named sage.groups (which of course means group functionality) should be installed for the doctests to work.

@kwankyu
Copy link
Collaborator

kwankyu commented Jun 16, 2023

+1 to (n). It reads well.

@tobiasdiez
Copy link
Contributor

Would it work to explicitly import things from other distributions (not as an comment, but as a direct import), then analyze the imports and if they point to a different distribution then ignore the lines following the import?

This would also have the huge advantage to support users that want to use sage as a normal python package without loading the sage environment.

@mkoeppe
Copy link
Member Author

mkoeppe commented Jun 16, 2023

Would it work to explicitly import things from other distributions (not as an comment, but as a direct import), then analyze the imports and if they point to a different distribution then ignore the lines following the import?

I think we would be able to generate import statements from the # optional annotations.

The necessary cross-referencing information is going to be in sage.features.sagemath.
See:

This would also have the huge advantage to support users that want to use sage as a normal python package without loading the sage environment.

+1.

I could imagine that the code blocks in the formatted HTML could offers tabs for different views, as is done for example in https://setuptools.pypa.io/en/latest/userguide/package_discovery.html: The example in monolithic Sage (goes through preparser, no imports), and the example in plain Python (some prettified version of the preparsed example + imports). This would also address @williamstein's comment (https://groups.google.com/g/sage-devel/c/utA0N1it0Eo/m/79aD4-RQAAAJ)

@kwankyu
Copy link
Collaborator

kwankyu commented Jun 17, 2023

I chose "needs" as an alternative to "optional" in the linked PR. Let me know if you object.

"needs" is from https://groups.google.com/g/sage-devel/c/lb1Eq8o0a5I/m/dwofof9dAAAJ

@kwankyu
Copy link
Collaborator

kwankyu commented Jun 17, 2023

One problem with "needs" (and other phrases as well in this matter except "optional") is that people may use the word in normal comments. One instance was revealed in the linked PR (there is a comment "# needs 2 period").

Instead of trying to maker our parser smarter, I think it is easy and safer to use more formal "needs:" (note the colon at the end) for our purpose. So # needs: sage.groups. Any idea?

@kwankyu
Copy link
Collaborator

kwankyu commented Jun 17, 2023

I decided to go with "needs" (colon or hyphen can follow optionally though), revising the problematic doctests.

@DaveWitteMorris
Copy link
Member

I think this is much better than "optional", so I do not object. However, "needs" is almost synonymous with "requires" in this context, so I'm surprised that you chose it. I think "uses" would be better, but I do not feel strongly and do not object.

@kwankyu
Copy link
Collaborator

kwankyu commented Jun 21, 2023

I think this is much better than "optional", so I do not object.

Thanks.

However, "needs" is almost synonymous with "requires" in this context, so I'm surprised that you chose it.

People seemed to want something more direct than "optional" in that context, and "needs" sounds (to me yes) gentle and soft compared with other candidates.

I also realized, by Matthias' "standard" candidate, that "optional" in that context is not quite right since "sage.groups" is not as "optional" as say "mathematica".

@mkoeppe
Copy link
Member Author

mkoeppe commented Jun 21, 2023

Yes, that's exactly why I suggested "standard" - which continues to have the same technical meaning in the modularized setting as in the Sage distribution

@DaveWitteMorris
Copy link
Member

I think the word "standard" is short for something like "requires the following standard module" (or "assumes the following standard module has been imported"). My objection is that I don't think an uneducated user will understand this shorthand, and it seems to me that the word "needs" (or "requires" or "uses") is more self-explanatory -- it seems to me that it is better at getting across the main idea without requiring further explanation. (And I like the word "uses" in the distribution context, because it sounds more like a comment that can be ignored, whereas "needs" and "requires" would seem to indicate that something needs to be done.)

By the way, if "uses" (or a similar word) is chosen, then perhaps something like "uses module:" (or "uses modules:" when there is more than one) would be better if it is ok to have more than one word here. Specifically, it would pretty much eliminate the possibility of random comments that happen to have the form picked up by the doctester.

I may be wrong on all of this, but thought I should share my opinions in case they are useful. My main point is that, for the reason in my first paragraph, I think "needs"/"requires"/"uses" (or something similar), is better than "standard" and "optional" for the user documentation. Choosing between "needs"/"requires"/"uses" may be bikeshedding, so I really don't want to push on that.

@soehms
Copy link
Member

soehms commented Jun 21, 2023

And I like the word "uses" in the distribution context, because it sounds more like a comment that can be ignored, whereas "needs" and "requires" would seem to indicate that something needs to be done.

+1

By the way, if "uses" (or a similar word) is chosen, then perhaps something like "uses module:" (or "uses modules:" when there is more than one) would be better if it is ok to have more than one word here. Specifically, it would pretty much eliminate the possibility of random comments that happen to have the form picked up by the doctester.

I think checking that the words after the keyword are the names of features would be secure enough.

@kwankyu
Copy link
Collaborator

kwankyu commented Jun 22, 2023

There are 5 doctest instances that have dangling comments starting with "# uses ..." while there was 1 doctest instance that has a dangling comment starting with "# needs ...", which was fixed by #35779. Hence "needs" is slightly safer than "uses" as a tag.

"# needs sage.groups" really means that the doctest needs the distribution package installed that provides the feature named "sage.groups". It is somewhat misleading to say that the doctest "uses" the module (python module or python package) "sage.groups".

@soehms
Copy link
Member

soehms commented Jun 22, 2023

There are 5 doctest instances that have dangling comments starting with "# uses ..." while there was 1 doctest instance that has a dangling comment starting with "# needs ...", which was fixed by #35779. Hence "needs" is slightly safer than "uses" as a tag.

"# needs sage.groups" really means that the doctest needs the distribution package installed that provides the feature named "sage.groups". It is somewhat misleading to say that the doctest "uses" the module (python module or python package) "sage.groups".

I agree from the point of view of a developer. But from the point of view of a user who sees this in the documentation, the other choice would be less irritating.

@mkoeppe
Copy link
Member Author

mkoeppe commented Jun 22, 2023

How about in the block-scoped annotations, we spell it out as # uses standard feature - sage.groups, whereas in the line annotations we use the shortening # standard - sage.groups.

The maintenance script for rewriting annotations (wishlist item) can bring annotations to this standard form

The keyword recognized would just be standard, the rest would be ignored noise

@kwankyu
Copy link
Collaborator

kwankyu commented Jun 22, 2023

... Choosing between "needs"/"requires"/"uses" may be bikeshedding, so I really don't want to push on that.

There is a reason why the term bikeshedding was invented. It is so common. In our case, the problem is not trivial as all developers and users would use and see "needs", "requires", "uses", etc.

Hence to settle down this problem rapidly without spending too much time for discussions and discussions, let's accept that we really care about "bikeshedding".

I would go straight for a sage-devel voting with the following 4 candidates:

(A) "needs"

sage: # needs sage.rings.number_field
sage: QQbar(I)^2
-1
sage: QQbar(I)^1000000000 # long time
1

sage: # needs sage.rings.finite_rings
sage: F = GF(7)
sage: F(1) + QQbar(1)  # needs sage.rings.number_field
...

(B) "requires"

sage: # requires sage.rings.number_field
sage: QQbar(I)^2
-1
sage: QQbar(I)^1000000000  # long time
1

sage: # requires sage.rings.finite_rings
sage: F = GF(7)
sage: F(1) + QQbar(1)  # requires sage.rings.number_field
...

(C) "uses"

sage: # uses sage.rings.number_field
sage: QQbar(I)^2
-1
sage: QQbar(I)^1000000000  # long time
1

sage: # uses sage.rings.finite_rings
sage: F = GF(7)
sage: F(1) + QQbar(1)  # uses sage.rings.number_field
...

(D) "standard"

sage: # use standard feature - sage.rings.number_field
sage: QQbar(I)^2
-1
sage: QQbar(I)^1000000000  # long time
1

sage: # use standard feature - sage.rings.finite_rings
sage: F = GF(7)
sage: F(1) + QQbar(1)  # standard - sage.rings.number_field
...

Do you agree?

@kwankyu
Copy link
Collaborator

kwankyu commented Jun 22, 2023

Don't worry about minor details or punctuations. As Matthias said, the keyword would be recognized, the rest would be ignored noise.

But for the voting, let me know what to edit to make your favorite candidate look best.

@DaveWitteMorris
Copy link
Member

Having a vote is fine with me, but I think it might be a good idea to have a comment period on sage-devel before calling the vote, because someone may want to suggest another candidate.

Is there a mistake in the second example of (D)? I don't think "uses" should appear after the hyphen.

I don't understand the # not tested examples. The answer is not correct, so what are they supposed to illustrate? I guess that will be part of the explanation of the vote.

@mkoeppe
Copy link
Member Author

mkoeppe commented Jun 22, 2023

I agree, other examples for the # not tested annotation, for example very long running time, would better

@williamstein
Copy link
Contributor

I like "needs" because "uses" and "requires" are very close to the keywords in Julia and Javascript for importing modules, whereas "needs" doesn't resemble the package import keyword for any programming language (according to a little search I just did). Thus using "needs" could avoid confusion.

@kwankyu
Copy link
Collaborator

kwankyu commented Jul 7, 2023

Supported by the voting result

https://groups.google.com/g/sage-devel/c/MtS2u3VbJEo

we can go ahead with the keyword "needs". I thank you all who participated in the discussion.

@vbraun vbraun closed this as completed Jul 16, 2023
@mkoeppe mkoeppe added this to the sage-10.1 milestone Jul 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment