Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update framepointers tests to avoid false positive with inlined C functions #11594

Merged
merged 5 commits into from
Oct 6, 2023

Conversation

fabbing
Copy link
Contributor

@fabbing fabbing commented Oct 3, 2022

As explained correctly in 11582 we don't have much control over the backtrace look with C functions: gcc can choose to inline or not some of them.
We probably still want to have minimal checks beyond caml_program (the starting point where our framepointers must be checked) while avoiding false positive on inlined function.
This PR proposes adding an awk script to validate functions from main..caml_start_program are the expected ones and in correct order but discard them, so .reference contains only the least likely to change.

This PR stops the frame-pointers backtrace at caml_program to ignore the differences in case of a different inlining decision by the C compiler.
It also does all of the previous "backtrace post-processing" in situ of the stack-walking-with-frame-pointers C code. So awk is not needed, and the previous dependency on sed or sh are also completely removed.

@dra27
Copy link
Member

dra27 commented Oct 3, 2022

I like the idea, but I really don't think we should introduce awk into the testsuite runners - although it's mitigated by the fact that frame-pointers mode is Linux-only (I think??), it's still a portability nightmare.

My instinct on this is to move the processing of the stack trace from the shell script back into ocamltest - i.e. have the test programs emit a fairly raw backtrace as now, but instead of using a run hook to replace native with a new action. The string processing is not particularly onerous (and could use ocamllex, if really needed). @shindere, what do you think?

Equally, are we straying into over-thinking all this (@xavierleroy, what do you think??)

@fabbing
Copy link
Contributor Author

fabbing commented Oct 3, 2022

Yes, at the moment, frame-pointers support is only for amd64 on Linux (it should however work on other Unixes).

@xavierleroy
Copy link
Contributor

Equally, are we straying into over-thinking all this

Quite possibly :-) My general impressions:

  • Some tests are very useful to get the implementation right, yet quite hard to turn into non-regression tests afterwards. A typical example is testing for fairness of scheduling. Maybe this is one of those tests. In which case we just keep them around but don't mark them as ocamltest tests.
  • Mixing sh, sed and awk in the same script is getting ugly, not even mentioning the portability problems @dra27 alludes to. I'm pretty sure all the filtering can be done in pure sed, or in pure sh, but the code wouldn't be pretty.
  • I still wish we'd standardize on Python for all the scripts of the OCaml distribution. It's not a great language but at least all the kids learn it in school these days, so we'll have maintainers when we retire.
  • No way you'd change ocamltest to handle the postprocessing of the trace! If you want to do your postprocessing in OCaml, you can write it in OCaml, have it compiled by ocamltest, and call it from the run script. Leave ocamltest as generic as possible, please.
  • The C code in fp_backtrace.c could also filter / sanitize / check the validity of the trace itself before printing it, or not print the trace but produce an OCaml list of function names which would then be filtered / sanitized / checked by OCaml code.

@shindere
Copy link
Contributor

shindere commented Oct 9, 2022 via email

@xavierleroy
Copy link
Contributor

This is off-topic for this PR, but:

First, it is reported to be easy to write scripts that are not portable from one version of Python to another. I don't know to which extent this is true.

This can't be worse than our current gawk vs mawk 1.3.3 vs mawk 1.3.4 situation, not to mention the odd non-GNU sed.

Second, I have the idea that the Python requirement is not as easily satisfied on Windows that it is on Linux, that's why there is
resistance to depending on it.

I see Python in the Microsoft Store and as Cygwin packages. Nobody bothered to explain the Windows issues yet; I'm all ears.

@nojb
Copy link
Contributor

nojb commented Oct 11, 2022

I see Python in the Microsoft Store and as Cygwin packages. Nobody bothered to explain the Windows issues yet; I'm all ears.

No issues that I am aware of. It is easy to install under Cygwin in any case.

@dra27
Copy link
Member

dra27 commented Oct 11, 2022

I'm not sure whether this is focussed at our "infrastructure" (testsuite, as here, support scripts like tools/check-typo, tools/pre-commit-githook, etc.) or the "build system" (i.e. what's needed for make world.opt). On the build system side, I personally don't think it's at all good that we switch to having opam switch create foo ocaml.5.something.0 require a working Python installation (on any system) - but that's no reflection of Python or Windows, it's just on adding a not-installed-by-default-everywhere dependency. I also FWIW have "glaciated" work from last year on the build system side that gets rid of all of the relatively tiny uses of sed and awk in the build system.

For the infrastructure, I personally anticipate that we swap one set of papercuts for another, but I don't have a strong objection. I think it is good, for example, that the installation instructions for tools/pre-commit-githook involve copying it to the correct place in your OCaml clone and it then just works (including with Git-for-Windows) and that becomes fractionally more complicated if it's written in Python. Similarly, I think it's good that if you've managed to build OCaml, make -C testsuite all should also just work and we'd potentially lose that, too. But all of that only affects the contribution process to OCaml - that should of course be as easy as possible, but I think it's axiomatically less important than the user experience of OCaml.

For the actual Windows papercuts which may or may not happen. We've already mentioned two entirely different distributions of Python, both assuming different file system conventions. It's that kind of thing - I would anticipate that where occasionally now we hit a funny locale, an overly-stringent POSIX awk, we'll start hitting an unrecognised path format because the user provided us with a Python from something we (and possibly they) hadn't thought of. It doesn't mean it'd be a bad idea to switch to Python, but if we're going to switch to something, I'd prefer it to be utopia, that's all! Given that it's often been me doing it, I kinda prefer chasing down a GNU-ism on my aged Macbook, with the POSIX spec in the other hand, to trying to trace exactly which executable has invoked what in a cryptic CI run and why the Windows file path hasn't been accepted by it (which I have to do quite a lot where git is concerned in the opam project, where we hit exactly this kind of problem and annoyingly regularly).

Another attempt to ignore potentially inlined C functions while still
checking reasonable behavior
In frame-pointers tests, the backtrace post-processing is performed
directly in fp_backtrace, and so, no longer depends on
filter-locations.sh (that was using sed and awk scripts).
@fabbing
Copy link
Contributor Author

fabbing commented Mar 10, 2023

The last commit moves all the backtrace post-processing completely into fp_backtrace.c, so filter-locations.sh is no longer needed, nor the awk and sed scripts it was using.

Hopefully this commit also passes the INRIA CI with more exotic configurations!

@fabbing
Copy link
Contributor Author

fabbing commented Mar 15, 2023

It looks fine on the Inria CI as well!
This PR is again ready for review.

@fabbing fabbing mentioned this pull request Jun 1, 2023
5 tasks
@shindere
Copy link
Contributor

I'm sorry but I can't review this PR myself.

Has anything been decided?

@gasche
Copy link
Member

gasche commented Sep 20, 2023

In #12588 I wrote logic in C to stop the printed backtrace at caml_startup | caml_main. This is an alternative approach to the present awk-script suggestion that could prove more consensual.

@shindere
Copy link
Contributor

shindere commented Sep 20, 2023 via email

@gasche
Copy link
Member

gasche commented Sep 20, 2023

No I didn't, I had forgotten that the present PR existed.

Because AFAIUI this PR does get rid of scripts by doing everything in C
already.

Indeed. On the other hand, the change to the C code in the present PR adds 117 lines and removes 71, while my PR adds 22 lines and removes 6. Maybe we can still hope for faster review.

@fabbing
Copy link
Contributor Author

fabbing commented Sep 20, 2023

This PR has evolved quite a bit from the very first implementation, to try and account for commenters opinion, especially on the delicate topic of awk, sed and scripts portability.
My bad for not updating the PR description accordingly.

Indeed. On the other hand, the change to the C code in the present PR adds 117 lines and removes 71, while my PR adds 22 lines and removes 6. Maybe we can still hope for faster review.

The bigger diff comes from that the all backtrace processing is now done in pure C, with POSIX regex, so it's obviously more C code but also removes the need for any of the sh scripts and sed calls.

@damiendoligez damiendoligez self-assigned this Sep 20, 2023

if (!sigsetjmp(resume_buf, 1)) {
*prev = fi->prev;
*retaddr = fi->retaddr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I read correctly, that the signal mask and long jump business in safe_read was here to handle the read to retaddr, and this is why the present PR remove the function (and it auxiliary functions)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is absolutely correct.
Previously, fp_backtrace tried to unwind the whole stack, but the linked list of 'saved rbp' was not always NULL terminated, as I expected. safe_read installed a temporary signal handler to gracefully handle an invalid pointer deference attempt.

Stopping the backtrace (and thus stack unwinding) at caml_program means this dubious trick is no longer needed.

Copy link
Member

@Octachron Octachron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing #12588 and this PR, I feel that #12588 is the simpler fix, whereas this PR ends up with the simpler code:

  • all processing is in one place and one language
  • the regexp matching code is as complex as before
  • removing signal handling and longjumps from the test code feels like a clear win.

Now that I have reviewed both PRs, I am in favor of merging this version.

return 0;
//regoff_t len = funcname->rm_eo - funcname->rm_so;
//return strnstr(symbol + funcname->rm_so, CAML_ENTRY, len) == 0;
return strstr(symbol + funcname->rm_so, CAML_ENTRY) != NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems slightly too lax, since we are testing that symbol at offset funcname contains CAML_ENTRY somewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right!
I don't know why I stopped using strncmp like I did as in is_from_executable... Fixed in 3dc39fe.

@Octachron Octachron merged commit 6a0a0a4 into ocaml:trunk Oct 6, 2023
9 checks passed
@Octachron
Copy link
Member

Squashed and merged, thanks for the simplification and sorry for the late review of the final version !

sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Oct 8, 2023
…ctions (ocaml#11594)

Another attempt to ignore potentially inlined C functions while still
checking reasonable behavior

In frame-pointers tests, the backtrace post-processing is performed
directly in fp_backtrace, and so, no longer depends on
filter-locations.sh (that was using sed and awk scripts).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants