Skip to content

Error handling: split stack frames over two lines; adjust more manual examples#6263

Open
fingolfin wants to merge 6 commits intomasterfrom
codex/even-nicer-error-handling
Open

Error handling: split stack frames over two lines; adjust more manual examples#6263
fingolfin wants to merge 6 commits intomasterfrom
codex/even-nicer-error-handling

Conversation

@fingolfin
Copy link
Member

@fingolfin fingolfin commented Mar 11, 2026

Render visible stack frames with numbered prefixes on one line and an aligned location line below, and give compiled GAP frames a stacktrace-specific format.

Also update a bunch of examples in the manual and code comments to reflect the new stack trace formatting.

AI-assisted with Codex for implementation, tests, and documentation.

Follow up to PR #6257


There are several aspects to this PR one may wish to discuss

  1. It splits each stack trace line into two; with the file/location on its own line. I borrowed the formatting from Julia, because I know and like it, but am completely open to alternatives. I would also understand if people prefer the more compact old printing; but for me, it is often difficult to weed through a long and complex GAP stack trace, and often what I want most are the file:line pairs (as I just cmd-click on them in my terminal to look at the code they reference).

  2. It updates more examples, and in some cases edits them; in some cases I decided it is best to just delete something. Please have a look if you agree

  3. The implementation: I feel PRINT_CURRENT_STATEMENT gets more and more complex. I've thought about converting parts of it to GAP code, but this always ends up raising questions about "but won't this need to allocate or cause additional errors, and we can afford neither". So I decided to leave it as it is, to be refactor in a future PR. But feel free to disagree / suggest better alternatives.


Further ideas (for follow-up PRs):

  • investigate using color or bold style (if available; possibly configurable in the same way the color prompts are)
  • show the name of the active function in addition or even instead of the statement
    • pro: the function name provides extra context
    • pro: this is like every (?) other system seems to do it (e.g. gdb, Julia, Python)
    • con: adding a third line (or filling up one of the existing lines) is not only a blessing
    • hence why I mention that this could also be instead of showing the active statement/expression... but again that's not a clear win/loss:
      • the statement can be multiple lines or just generally very long, which is confusing
      • it also can be formatted differently from how it appears in the code
      • still it can be helpful in pinning down to which place exactly the stack trace refers... especially if the code was edited after being loaded, so that line numbers are off, then this is nice for "adjusting" to that
  • change DownEnv/UpEnv to always re-print the stacktrace
  • ... and maybe more, your ideas here

@fingolfin fingolfin added release notes: not needed PRs introducing changes that are wholly irrelevant to the release notes topic: error handling labels Mar 11, 2026
gap> dive(100);
gap> OnBreak:= function() Where(1); end; # shorter traceback
function( ) ... end
gap> OnBreak:= function() Where(2); end;; # shorter traceback
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "1" showed 2 level due to a bug that I fixed in #6257, so I've now changed it to 2 levels to more closely match was the example showed before

[2] dive( depth - 1 );
@ *stdin*:4
... at *stdin*:12
you may 'return;'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I think one still can quit; but the message changed. Maybe a bug? But I think not caused by this PR. Someone should investigate (I might, later)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its a bug, but I am sure its not cause but this PR as well and its irrelevant to the changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked: in GAP 4.4 it still printed:

Entering break read-eval-print loop ...
you can 'quit;' to quit to outer loop, or
you can 'return;' after assigning a value to continue

but starting with 4.5, the "quit" line was gone. I think I know why (certain changes to kernel error handling), but am too lazy to dig up the exact commit: it'll not be in the public history anyway, plus I don't see a point: clearly we want to show this info, yet error handling has substantially changed since then, so we are better of just adding it back now without caring how it used to be done. I'll do that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is not so small, so for now I filed an issue, let's focus on the rest of this PR first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

SCRMakeStabStrong( S.stabilizer, [ g ], param, orbits, where, basesize,
base, correct, missing, false ); called from
SCRMakeStabStrong( S.stabilizer, [ g ], param, orbits, where, basesize,
exceeded the permitted memory (`-o' command line option)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to drop the stacktrace: the example can't be reproduced (missing inputs) but it is also irrelevant for the point this section is trying to make

As usual you can leave the break loop with <C>quit;</C>.
If you enter <C>return <A>return-expr</A>;</C> the value of the expression
<A>return-expr</A> is assigned to the variable,
and execution continues after the assignment.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was straight out wrong (we removed this capability years ago as it seemed to be only a source of grief)

Entering break read-eval-print loop ...
you can 'quit;' to quit to outer loop, or
you can 'return;' to continue
gap> StabChain(SymmetricGroup(1000)); # After this we typed ^C
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had to go from 100 to 1000 as I otherwise couldn't press Ctrl-C quickly enough 😁

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would resolve that too.
I'm happy with that.

lib/grpfp.gd Outdated
Comment on lines +81 to +85
## *[1] TCENUM.CosetTableFromGensAndRels( fgens, grels, fsgens )
## [2] CosetTableFromGensAndRels( fgens, grels, fsgens )
## [3] TryCosetTableInWholeGroup( H )
## [4] CosetTableInWholeGroup( H )
## [5] IndexInWholeGroup( H )
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's actually fake, the AI decided to do that, and I forgot to replace it: problem is that the example code above doesn't "work" anymore, in that Index( f, u ); just instantly prints infinity :-). Which of course is a nice improvement, but it means some extra work is needed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh I couldn't reproduce it because the wonderful FGA package was loaded. Without it, I got the error. Will update the comment then.

## <Log><![CDATA[
## gap> SylowSubgroup( s4, 6 );
## Error, SylowSubgroup: <p> must be a prime called from
## <compiled or corrupted call value> called from
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh.... good that we replace that now

## Length( Enumerator( C ) ) called from
## Size( UnderlyingCollection( enum ) ) called from
## Length( Enumerator( C ) ) called from
## would otherwise end up in an infinite recursion.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seemed silly to hack the code to again produce that bug... just saying "this code used to fail" should be enough (or, one might argue, still too much, we have .tst files for such examples... ;-) )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fingolfin I would definitely move to to tst files and present only working examples in docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be resolved now, I think.

if (activeContext != Fail) {
Pr(context == activeContext ? "*[%d] " : " [%d] ",
INT_INTOBJ(level), 0);
snprintf(prefix, sizeof(prefix), "%c%*s[%lu] ",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already use fprintf in this file, so I thought it's OK to allow the AI this ...

Pr("<<compiled GAP function \"%g\">>", (Int)funcname, 0);
}
else {
Pr("<<compiled GAP function>>", 0, 0);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We deliberately duplicate some PrintKernelFunction functionality to get a more consistent output


while (totalDepthInt > 0) {
prefixWidth++;
totalDepthInt /= 10;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fingolfin Do you like this prefix change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure which part you mean? Whether I like the way this code is written (not particular, but it will be replaced in a follow-up), or the feature it enables (indenting the [1], [2] "prefix" so that if they go double-digits, they keep right aligned) ? That I like (and made it add), it really helps reading the stack trace if it has 10 or more levels.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.

gap>
gap> func( );
Error,
Error,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fingolfin How about removing this whitespace at the end?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is genuinely output by Error, so removing it here would cause a test failure. I guess we could try to get rid of it here, but I am not sure it's worth the effort: anyone who writes Error() hopefully only does that in test code, not in something shipping to users...

[2] dive( depth - 1 );
@ *stdin*:4
... at *stdin*:12
you may 'return;'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its a bug, but I am sure its not cause but this PR as well and its irrelevant to the changes.

## Length( Enumerator( C ) ) called from
## Size( UnderlyingCollection( enum ) ) called from
## Length( Enumerator( C ) ) called from
## would otherwise end up in an infinite recursion.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fingolfin I would definitely move to to tst files and present only working examples in docs.

src/calls.c Outdated
Obj body = BODY_FUNC(func);
Obj filename = body ? GET_FILENAME_BODY(body) : 0;
if (filename) {
// A precise location means this is pure kernel code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fingolfin Do you think these comments are valuable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no. I explained to the AI what these different cases mean (so that it could adjust or copy them as needed); and then asked it to turn my remarks into comments, but it didn't do a good job: the first two are indeed "obvious" from the code context; and the third one is misleading: this is not a "legacy" case, but a matter of some code, here MAKE_BITFIELDS, producing function objects with arguably incomplete metadata.

I'll revise this.

fingolfin and others added 3 commits March 13, 2026 09:38
Render visible stack frames with numbered prefixes on one
line and an aligned location line below, and give compiled
GAP frames a stacktrace-specific format.

AI-assisted with Codex for implementation, tests, and
documentation.

Co-authored-by: Codex <codex@openai.com>
Copy link
Contributor

@james-d-mitchell james-d-mitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks really good (in terms of the way the stack traces look, I haven't really looked at the implementation).

[4] String( record.(nam) ) at GAPROOT/lib/record.gi:LINE called from
[5] String( record.(nam) ) at GAPROOT/lib/record.gi:LINE called from
*[1] String( record.(nam) )
@ GAPROOT/lib/record.gi:LINE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really supposed to be LINE?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The script tst/testspecial/run_gap.sh intentionally substitutes the strings GAPROOT and LINE into these files; otherwise we'd constantly get failures caused by e.g. someone inserting a line into lib/record.gi.

## <function>( <arguments> ) called from read-eval-loop
## Entering break read-eval-print loop ...
## Error, the residues must be equal modulo 2
## *[1] Error( "the residues must be equal modulo ", g.gcd );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it slightly vexing to see the error message in line 251 and then again in 252, but I understand that this is a necessity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed...

So, there is a way: in other system, we wouldn't print the statement with the error here, we would print the function containing the statement... but I'd rather leave that to a follow-up PR, as I believe it may be controversial...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea of follow-up pull request is great.
I fell like @james-d-mitchell, but I see... the controversy. 😼
I would close it.

*[1] <<compiled GAP code>> from GAPROOT/lib/oper1.g:LINE in function INSTALL_METHOD called from
[2] <<compiled GAP code>> from GAPROOT/lib/oper1.g:LINE in function InstallMethod called from
*[1] <<compiled GAP function "INSTALL_METHOD">>
@ GAPROOT/lib/oper1.g:LINE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More instances of "LINE" which should probably be a number no?

@fingolfin fingolfin force-pushed the codex/even-nicer-error-handling branch from 8a76023 to 6cc08be Compare March 13, 2026 17:41
@fingolfin
Copy link
Member Author

@james-d-mitchell @limakzi thank you for your reviews. I've now tried to address everything. Please user the "Resolve conversation" buttons suitably if you are satisfied; or else please feel free to ask follow-ups

Copy link
Member

@limakzi limakzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

[2] dive( depth - 1 );
@ *stdin*:4
... at *stdin*:12
you may 'return;'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

Entering break read-eval-print loop ...
you can 'quit;' to quit to outer loop, or
you can 'return;' to continue
gap> StabChain(SymmetricGroup(1000)); # After this we typed ^C
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would resolve that too.
I'm happy with that.

## <function>( <arguments> ) called from read-eval-loop
## Entering break read-eval-print loop ...
## Error, the residues must be equal modulo 2
## *[1] Error( "the residues must be equal modulo ", g.gcd );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea of follow-up pull request is great.
I fell like @james-d-mitchell, but I see... the controversy. 😼
I would close it.

## Length( Enumerator( C ) ) called from
## Size( UnderlyingCollection( enum ) ) called from
## Length( Enumerator( C ) ) called from
## would otherwise end up in an infinite recursion.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be resolved now, I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release notes: not needed PRs introducing changes that are wholly irrelevant to the release notes topic: error handling

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants