Merge structured output #286

spoutn1k · 2024-01-20T13:17:41Z

For your viewing pleasure, the changes made by lefessan updated to fit with the new commits made on master. I aligned the whitespace to the master branch and made sure all the default tests pass. This is not ready for merging; any input is highly appreciated as to feature completion.

Namely, I noticed the JSON outputs puts everything in quotes, even integers, which I think is not ~~intended~~ ideal and I will ~~change~~ make toggleable.

Add options "-B / --structured-output FORMAT" to print the output of strace in a more parsable format. Currently, we provide "json" for JSON format and "ocaml" for OCaml format. Given the extend of modifications, we don't aim at covering all the features of strace, but only the most commonly used, and keep full textual backward compatibility. See file README-format for more information

…tempt-1

spoutn1k · 2024-01-21T18:47:03Z

There seems to have been a decision made to convert flags to strings. This is fine, but the implementation went a little overboard and in some cases the flags objects are opened within one another resulting in dubious output.

After fixing the merging errors from the cherry picked commits we have this kind of output:

{
    "cmd": "mmap",
    "args": [null, "8192", {
        "flags": ["PROT_READ", "PROT_WRITE"] <--- This is normal, the prot field of mmap
    }, {
        "flags": [{
            "flags": ["MAP_PRIVATE", {
                "flags": ["MAP_ANONYMOUS"]   <--- This is not great
            }]
        }]
    }, "-1", "0"],
    "return": "0x75bd0874c000"
}

Now it looks like the tprint_flags_begin/end and methods have been added to most system calls, along with being inserted in the printflags_ex function that is responsible for the printing, creating this nesting.

Now I was able to fix this for mmap by giving the hand fully to the common denominator printflags_ex, and letting it manage all the flag printing. However, that takes away a lot of power from each implementation and will make me remove all the tprint_flags_begin/end from a lot of files, but will be easier to maintain in the long run.

Any input is appreciated, would you rather have this or letting the syscall-specific implementation manage the flags ?

For instance in mem.c, in the custom flag printing function print_mmap_flags:

[...]

tprint_flags_begin(); // Open the flag object here --------------------------

// Print the type mask first, then remove it from the flag integer ----------
printxvals_ex(flags & MAP_TYPE, "MAP_???", XLAT_STYLE_ABBREV,
	      mmap_flags, NULL);
flags &= ~MAP_TYPE;

// Same for another specific mask, but print later --------------------------
const unsigned int mask = MAP_HUGE_MASK << MAP_HUGE_SHIFT;
const unsigned int hugetlb_value = flags & mask;

flags &= ~mask;
if (flags) {
	tprint_flags_or();
	// Print the flags, opening and closing a nested flag object /!\ ----
	printflags_ex(flags, NULL, XLAT_STYLE_ABBREV,
		      mmap_flags, NULL);
}

if (hugetlb_value) {
	tprint_flags_or();
	tprint_shift_begin();
	PRINT_VAL_U(hugetlb_value >> MAP_HUGE_SHIFT);
	tprint_shift();
	/*
	 * print_xlat_u is not used here because the whole thing
	 * is potentially inside a comment already.
	 */
	tprints_string("MAP_HUGE_SHIFT");
	tprint_shift_end();
}

// Close the flag object ----------------------------------------------------
tprint_flags_end();

[...]

I was able to fix this by removing the open/closing of the flags, then in the case of the structured output disabling the removing of the flags (&= ~ operations). This worked, but forces every implementation to pass ALL its flags to printflags_ex. On the other hand, we let each implementation manage its stuff, but that is A LOT of stuff to tune in case of a change.

spoutn1k · 2024-01-21T18:50:36Z

Noting that "cmd" is a poor choice for the name of an intercepted syscall. I will also definitely focus on the JSON output, as Ocaml is nice and dandy but I never needed it real life after I left prep school (désolé l'Inria)

ldv-alt · 2024-01-21T19:38:40Z

Now it looks like the tprint_flags_begin/end and methods have been added to most system calls, along with being inserted in the printflags_ex function that is responsible for the printing, creating this nesting.

At least in master branch I don't see printflags_ex calling any of tprint_flags_begin/end.

Now I was able to fix this for mmap by giving the hand fully to the common denominator printflags_ex, and letting it manage all the flag printing. However, that takes away a lot of power from each implementation and will make me remove all the tprint_flags_begin/end from a lot of files, but will be easier to maintain in the long run.

tprint_flags_begin/end were added in those cases where printxval*/printflags* are intertwined in ways that would make a potential generic printing method too complex to use. I don't see how have you managed to avoid this complexity.

Any input is appreciated, would you rather have this or letting the syscall-specific implementation manage the flags ?

The idea is to move heavy lifting away from individual syscall dissectors when possible, but it's surely not possible in all cases as some syscalls are quite complex.

// Print the flags, opening and closing a nested flag object /!\ ----
printflags_ex(flags, NULL, XLAT_STYLE_ABBREV,
mmap_flags, NULL);

printflags_ex is a low level method, it is not expected to open/close a nested flags object.
Also, printflags64_in is also not expected to open/close a nested flags object:

static inline int
printflags64_in(const struct xlat *x, uint64_t flags, const char *dflt)
{
        return printflags_ex(flags, dflt, XLAT_STYLE_DEFAULT, x, NULL);
}

ldv-alt · 2024-01-21T19:40:12Z

Noting that "cmd" is a poor choice for the name of an intercepted syscall.

I agree, cmd is misleading. I have no idea why it's not something more obvious like syscall.

spoutn1k · 2024-01-22T04:41:56Z

At least in master branch I don't see printflags_ex calling any of tprint_flags_begin/end.
printflags_ex is a low level method, it is not expected to open/close a nested flags object.
Also, printflags64_in is also not expected to open/close a nested flags object:

Yeah those were added in the structured output commit:

https://github.com/lefessan/strace/blob/9c054003d463e3589ff30f22f6b228cb4ece3baf/src/xlat.c#L431

If you are telling me they are not intended to be here this answers a lot of my questions about the implementation details.

spoutn1k · 2024-01-23T18:37:59Z

From my understanding, the tests make a system call, then use the results to generate the expected output, am I correct ? I wanted to see if I could hijack the current framework to make it compatible with both 'raw' and structured output, but it seems pretty non trivial, correct ?

ldv-alt · 2024-01-23T19:03:07Z

From my understanding, the tests make a system call, then use the results to generate the expected output, am I correct?

Yes, most of the tests fall into this category.

I wanted to see if I could hijack the current framework to make it compatible with both 'raw' and structured output, but it seems pretty non trivial, correct ?

Yes, you're spot on. In fact, I see it as the main obstacle: without tests there is no way to tell whether the structured output is correct, but the tests are not written in a way to test any output format besides the traditional.

spoutn1k · 2024-01-29T04:37:32Z

Ok, so what should the tests include ? Can I introduce a dependency on python to have access to a JSON formatter ?

I am thinking of testing the contents of the structured output and not character by character, hence using a third party tool to format the output in order to compare the results on a common basis.

ldv-alt · 2024-01-29T19:34:31Z

Ok, so what should the tests include ? Can I introduce a dependency on python to have access to a JSON formatter ?

I am thinking of testing the contents of the structured output and not character by character, hence using a third party tool to format the output in order to compare the results on a common basis.

A third party tool for json tests is OK. I suppose there are quite a few tools to choose from, and I'd naturally prefer something that won't bring a lot of dependencies.

ldv-alt · 2024-02-04T00:30:21Z

@parport0 additionally suggested that a converter from json output into traditional strace output would allow reusing most of the existing strace tests for testing of structured output.

spoutn1k · 2024-02-04T04:40:26Z

Feels like a chicken and egg problem to me but if anyone wants to give it a shot they are welcome.

I am making little progress due to work, but I am still working on the PR and will contact the ML if there are obstacles I cannot get over

tests/init.sh

spoutn1k · 2024-02-19T17:42:48Z

I have a test framework, now I need to make at least the first one pass. I am trying to print brackets before and after the main loop with JSON, to make the whole output valid as a JSON object. What would be the best way to make sure that every output file has a [ written to it when opened and ] when closed ?

ldv-alt · 2024-02-19T23:47:10Z

Common log is FILE *shared_log, per-tracee logs are tcp->outf. The former is opened in init() and closed in terminate(), the latter is opened in after_successful_attach() and closed in droptcb().

spoutn1k · 2024-03-01T17:59:47Z

I am happy to report that the first ever JSON output test is passing. Writing JSON tests is unfortunately a tedious and error-prone process, and I am not 100% familiar with the code so I made several shortcuts that will have to be reckoned with.

However, I am pleased with the result and wish to push further. I need however to discuss the matter of introducing external libraries to strace, as I am convinced delegating the work of formatting JSON output to a specialized library would help to write cleaner code and not spend energy formatting (I had to create a special function for the last element of an object, to avoid printing an output-breaking comma).

In lustre-adjacent tools we use json-c and it comes on a majority of distributions, even relics (from my POV) like Centos-7. JSON support can be enabled at configure if the library is found. Would that be acceptable ?

ldv-alt · 2024-03-02T09:38:06Z

In lustre-adjacent tools we use json-c and it comes on a majority of distributions, even relics (from my POV) like Centos-7. JSON support can be enabled at configure if the library is found. Would that be acceptable ?

I'm not quite familiar with libraries that are designed to produce json output, but from a cursory look json-c looks fine.

parport0 · 2024-03-03T15:42:17Z

Not too long ago I chose cjson for one of my dayjob projects, primarily because it is just one well-embeddable MIT-licensed header file. I did not compare it with json-c.

spoutn1k · 2024-03-03T16:31:20Z

Damn I do like a header only library but cJSON seems a lot less supported than json-c, so it might be a riskier bet in the long run. I have not yet started implementing anything however, so either can still work

ldv-alt · 2024-03-03T16:52:18Z

Not too long ago I chose cjson for one of my dayjob projects, primarily because it is just one well-embeddable MIT-licensed header file. I did not compare it with json-c.

If you're talking about cJSON, then it has cJSON.c as well. Also, its README claims that cJSON doesn't support strings that contain the zero character '\0' or \u0000. This is impossible with the current API because strings are zero terminated, this might be an issue for us.

NULL as an address is JSON's null

spoutn1k · 2024-03-03T20:17:42Z

Was there not a PR about disabling string output size limit ? I think it is very relevant to this PR about structured output to not have human-readable partial strings sprinkled in machine-readable format. However disabling this seems less than trivial, as it seems the best way is to increase DEFAULT_STRLEN in case of structured output.

Hex and octal are int in OCaml, quoted in JSON

spoutn1k · 2024-03-07T12:31:52Z

Realistically, what tests are critical and what tests can be overlooked ? I am alone and 1000 tests to rewrite seems like a lot.

ldv-alt · 2024-03-09T10:50:51Z

Realistically, what tests are critical and what tests can be overlooked ? I am alone and 1000 tests to rewrite seems like a lot.

To be honest, I didn't think you're going to manually rewrite all the tests.

spoutn1k · 2024-03-13T15:08:14Z

Well due to the nature of the current tests I do not see another way of implementing them other than a manual rewrite ...

lefessan and others added 10 commits November 30, 2022 15:26

Merge commit '2f36e9f9e9fc6824fdba2bac693bad74d99418d5' into merge-at…

c7f7546

…tempt-1

Fix misc merge issues

752c938

Merge remote-tracking branch 'upstream/master' into back-to-the-2024

29f3340

Merge fixes

9f7c987

Output proper formatted output

f2da272

Align whitespace to upstream/master

fa9e75b

Merge branch 'strace:master' into fix-formatted-output

61b10b8

Fix whitespace inconsistencies in ptrace.c

c1572f4

Debugging nested flags

0392c86

ldv-alt added enhancement project/structured labels Jan 21, 2024

Remove flag management from xlat helpers

b988953

WIP structured test format proposal

90735fe

github-advanced-security bot found potential problems Feb 7, 2024

View reviewed changes

tests/init.sh Fixed Show fixed Hide fixed

tests/init.sh Fixed Show fixed Hide fixed

spoutn1k added 2 commits February 10, 2024 21:21

Fix proper POSIX and args not getting passed

5d6b94e

Format using python when tests are tagged with JSON

d27604b

github-advanced-security bot found potential problems Feb 10, 2024

View reviewed changes

tests/init.sh Fixed Show fixed Hide fixed

Fix type -p not POSIX

a34796e

Do not question the wisdom of the ancients

6f3ca93

ldv-alt reviewed Feb 13, 2024

View reviewed changes

tests/init.sh Outdated Show resolved Hide resolved

Separate strace run and formatting

d70646a

spoutn1k added 3 commits February 26, 2024 21:01

Valid JSON output

f795ecd

Omit print of invalid characters in indirect

dd723cc

Update json test output

9f20b86

spoutn1k force-pushed the fix-formatted-output branch from 8960b56 to 9f20b86 Compare March 1, 2024 17:51

spoutn1k added 3 commits March 3, 2024 22:29

JSON output: int now int, hex quoted

6727c99

NULL as an address is JSON's null

Reflect changes in the test

e0c432a

Change syscall field name to 'syscall'

73bced6

spoutn1k added 4 commits March 6, 2024 21:00

recvfrom@json test

8b68a2c

Ignore all future @JSON binaries

44f9c97

Add open@json test

9da951e

In structured output, hex == oct

b04a386

Hex and octal are int in OCaml, quoted in JSON

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge structured output #286

Merge structured output #286

spoutn1k commented Jan 20, 2024 •

edited

spoutn1k commented Jan 21, 2024 •

edited

spoutn1k commented Jan 21, 2024 •

edited

ldv-alt commented Jan 21, 2024

ldv-alt commented Jan 21, 2024

spoutn1k commented Jan 22, 2024

spoutn1k commented Jan 23, 2024

ldv-alt commented Jan 23, 2024

spoutn1k commented Jan 29, 2024 •

edited

ldv-alt commented Jan 29, 2024

ldv-alt commented Feb 4, 2024

spoutn1k commented Feb 4, 2024

spoutn1k commented Feb 19, 2024

ldv-alt commented Feb 19, 2024

spoutn1k commented Mar 1, 2024 •

edited

ldv-alt commented Mar 2, 2024

parport0 commented Mar 3, 2024

spoutn1k commented Mar 3, 2024

ldv-alt commented Mar 3, 2024

spoutn1k commented Mar 3, 2024

spoutn1k commented Mar 7, 2024

ldv-alt commented Mar 9, 2024

spoutn1k commented Mar 13, 2024

Merge structured output #286

Are you sure you want to change the base?

Merge structured output #286

Conversation

spoutn1k commented Jan 20, 2024 • edited

spoutn1k commented Jan 21, 2024 • edited

spoutn1k commented Jan 21, 2024 • edited

ldv-alt commented Jan 21, 2024

ldv-alt commented Jan 21, 2024

spoutn1k commented Jan 22, 2024

spoutn1k commented Jan 23, 2024

ldv-alt commented Jan 23, 2024

spoutn1k commented Jan 29, 2024 • edited

ldv-alt commented Jan 29, 2024

ldv-alt commented Feb 4, 2024

spoutn1k commented Feb 4, 2024

spoutn1k commented Feb 19, 2024

ldv-alt commented Feb 19, 2024

spoutn1k commented Mar 1, 2024 • edited

ldv-alt commented Mar 2, 2024

parport0 commented Mar 3, 2024

spoutn1k commented Mar 3, 2024

ldv-alt commented Mar 3, 2024

spoutn1k commented Mar 3, 2024

spoutn1k commented Mar 7, 2024

ldv-alt commented Mar 9, 2024

spoutn1k commented Mar 13, 2024

spoutn1k commented Jan 20, 2024 •

edited

spoutn1k commented Jan 21, 2024 •

edited

spoutn1k commented Jan 21, 2024 •

edited

spoutn1k commented Jan 29, 2024 •

edited

spoutn1k commented Mar 1, 2024 •

edited