Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repetitive doctests in sage.structure.coerce_dict #26238

Open
embray opened this issue Sep 11, 2018 · 11 comments
Open

Repetitive doctests in sage.structure.coerce_dict #26238

embray opened this issue Sep 11, 2018 · 11 comments

Comments

@embray
Copy link
Contributor

embray commented Sep 11, 2018

The sage.structure.coerce_dict has many examples of the problem I've seen before where entire doctests are documented verbatim, or sometimes almost verbatim, in order to satisfy the poorly-worded "all def statements must have an examples block in their docstring" rule.

Some of these are not even useful in the spirit of that rule, because they are written as "indirect doctests" that don't actually usefully demonstrate how the class is used internally. This is especially true for the examples in MonoDictEraser and TripleDictEraser. These could use better tests/examples. Or none at all if it's not practical to do so.

Component: misc

Issue created by migration from https://trac.sagemath.org/ticket/26238

@embray embray added this to the sage-8.4 milestone Sep 11, 2018
@embray
Copy link
Contributor Author

embray commented Sep 11, 2018

comment:1

There's also a lot of repetition between the implementations of MonoDict and TripleDict. Some of this may be unavoidable, but I suspect much of it isn't. There are obvious cases of copy-and-paste in docstrings, e.g. one which refers to "MonoDict" where it should read "TripleDict".

For the most part all that separates these classes is the number of elements in the key. The question is can this be generalized a bit more without sacrificing performance. I suspect yes but it's low priority for me to try.

@embray
Copy link
Contributor Author

embray commented Sep 11, 2018

comment:2

It's also too bad this is kind of buried as sage.structure.coerce_dict. I was aware this module existed but this is the first time I looked closely at it. I thought it might be very particular to the Sage coercion model in some way, but not necessarily. It seems like it might have use cases beyond that. I wonder if some more generalized version of it would be useful to make available as a stand-alone module.

@embray embray modified the milestones: sage-8.4, sage-8.5 Oct 28, 2018
@embray
Copy link
Contributor Author

embray commented Dec 28, 2018

comment:4

Retargeting some of my tickets (somewhat optimistically for now).

@embray embray modified the milestones: sage-8.5, sage-8.7 Dec 28, 2018
@nbruin
Copy link
Contributor

nbruin commented Jan 2, 2019

comment:5

Replying to @embray:

For the most part all that separates these classes is the number of elements in the key. The question is can this be generalized a bit more without sacrificing performance. I suspect yes but it's low priority for me to try.

Yes, it can be generalized, but it's not clear there's an application for it. The code presently has unrolled loops in the hope that we get optimal performance that way. It would seem natural that a general version would be a bit slower. My guess would be that it would still be worth keeping the MonoDict (also because that's a lot closer to a normal dictionary), and since a TripleDict is the only other instance we'd use, there seems to be no gain in generalizing. That was my intuition when it was written.

@jdemeyer
Copy link

jdemeyer commented Jan 2, 2019

comment:6

Replying to @embray:

The question is can this be generalized a bit more without sacrificing performance.

By making it more general, you will always sacrifice performance a bit.

Since TripleDict plays an extremely important role in the coercion model, it affects every single arithmetic operation between different parents. So performance matters a lot.

@jdemeyer
Copy link

jdemeyer commented Jan 2, 2019

comment:7

One solution to reduce code duplication without losing any performance would be to use a templating language on the source code (say jinja2 since Sage ships with that or maybe even C++ templates). We could deal with MonoDict as a special n==1 case where needed.

@embray
Copy link
Contributor Author

embray commented Jan 2, 2019

comment:8

Yes, I had thought of either trying to cobble together something with C preprocessor macros, or with C++. I agree with Nils that it doesn't make sense to instantiate more versions of this than we need; I just meant that the code could probably be generalized in some way that requires less duplication of code.

@jdemeyer
Copy link

jdemeyer commented Jan 2, 2019

comment:9

The question is... is it worth doing that?

@nbruin
Copy link
Contributor

nbruin commented Jan 2, 2019

comment:10

Replying to @embray:

I just meant that the code could probably be generalized in some way that requires less duplication of code.

The use of any templating makes the code harder to understand, at least because it requires the reader to understand the templating in addition to cython and the inner workings of Python. It also makes it harder to spot places where optimizations can be made.

So I'd say there's a significant cost to using a less direct way of generating the code. You'd have to show a significant benefit to justify it. In this case it would be "maintainability". I'm not so sure you really improve that. There is indeed some repetition of code, but it's very straightforward code and the total body of code is not that bad either. So I'd think there's no room to recoup the cost of templating.

The doctests on the other hand: yes some of those are only there to cover the coverage metric. There would probably be room for improving the tests. The actual coverage of the functionality is pretty good with some of the more centralized tests, though, so I'm not sure you'll find bugs or improve the code by improving the doctests.

Concering the doctest coverage metric: I think that metric has the advantage of being very simple and has a net positive effect, so I'd be in favour of keeping the metric.

@embray
Copy link
Contributor Author

embray commented Jan 3, 2019

comment:11

Fair enough; I don't disagree I was just wondering out loud. In this ticket I'm more concerned about the poor and/or repetitive testing.

@embray
Copy link
Contributor Author

embray commented Mar 25, 2019

comment:12

Removing most of the rest of my open tickets out of the 8.7 milestone, which should be closed.

@embray embray removed this from the sage-8.7 milestone Mar 25, 2019
@embray embray added the pending label Mar 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants