Skip to content
This repository has been archived by the owner on May 3, 2023. It is now read-only.

read()/write() buffers sometimes not printed correctly #42

Closed
Tracked by #727
alban opened this issue May 14, 2020 · 3 comments
Closed
Tracked by #727

read()/write() buffers sometimes not printed correctly #42

alban opened this issue May 14, 2020 · 3 comments
Assignees

Comments

@alban
Copy link
Member

alban commented May 14, 2020

Following the instructions from demo-traceloop.md, I sometimes have this problem:

  • The buffer of the read() syscall ("3*7*2\n") is printed on a separate line.
  • The buffer of the write() syscall is not printed at all, even though it returns 3.
00:00.481572346 cpu#3 pid 388925 [bc] read("", 5330944, 4096)...
00:00.481963718 "3*7*2\n"
00:00.482122689 cpu#5 pid 388924 [sh] ...write() = 6
00:00.482136409 cpu#5 pid 388924 [sh] exit_group(0)...
00:00.482504750 cpu#3 pid 388925 [bc] ...read() = 6
00:00.482628831 cpu#3 pid 388925 [bc] write(1, "", 3) = 3
00:00.482914162 cpu#3 pid 388925 [bc] read("", 5330944, 4096)...
00:00.482996992 cpu#3 pid 388925 [bc] exit_group(0)...
@eiffel-fl
Copy link

eiffel-fl commented Oct 25, 2021

Hi.

I investigated on the code.

First, the bug does not occur everytime, I would say it occurs 1/30 of the time when we are not lucky.

Regarding the code, I think the following:

  1. I would say that BPF code is correct.
  2. The problem comes from our handling in events.go.

More particularly, I think this snippet cause the problem:

if e.Typ == 0 && i+e.ContNr < len(events) {
	for j := 0; j < e.ContNr; j++ {
		param := events[i+1+j].Param
		paramIdx := events[i+1+j].ParamIdx
		argsStr[paramIdx] = &param
	}
	i += e.ContNr
}

Indeed, we do not check the type of events[i+1+j] to see if this event continues the event before.
When we have multiple CPU, I think this is possible that events following one event are not continuing it but are totally different.

So, I would suggest to change it like this:

diff --git a/pkg/straceback/event.go b/pkg/straceback/event.go
index 7f942fc..ff96e1d 100644
--- a/pkg/straceback/event.go
+++ b/pkg/straceback/event.go
@@ -139,10 +139,16 @@ func eventsToString(events []Event) (ret string) {
                switch e.Typ {
                case 0:
                        var argsStr [6]*string
-                       if e.Typ == 0 && i+e.ContNr < len(events) {
+                       if i+e.ContNr < len(events) {
                                for j := 0; j < e.ContNr; j++ {
-                                       param := events[i+1+j].Param
-                                       paramIdx := events[i+1+j].ParamIdx
+                                       continuedEvent := events[i + 1 + j]
+
+                                       if continuedEvent.Typ != 2 {
+                                               continue
+                                       }
+
+                                       param := continuedEvent.Param
+                                       paramIdx := continuedEvent.ParamIdx
                                        ret += fmt.Sprintf("Cont.type: %d Param: %s ParamIdx: %d ", events[i+1+j].Typ, param, paramIdx)
                                        argsStr[paramIdx] = &param
                                }

But I am not sure this would suffice as I was not able to test it thoroughly.

Best regards.

@alban
Copy link
Member Author

alban commented Oct 27, 2021

I think it is a good idea to check the type of the event before processing it. But I don't think it would fix this:

  • After fetching events from ring buffers of each cpu, the events are re-ordered by the timestamp field in userspace.
  • The timestamps of SYSCALL_EVENT_TYPE_CONT events are set to the same value as the event SYSCALL_EVENT_TYPE_ENTER that created them.
  • So SYSCALL_EVENT_TYPE_CONT events from the same syscall are next to each other.

Example (source):

        00:00.039139009 cpu#0 pid 42218 [runc:[2:INIT]] ...openat() = 3
        00:00.039139009 cpu#0 pid 42218 [runc:[2:INIT]] ...openat() = 3
        00:00.039172211 cpu#0 pid 42218 [runc:[2:INIT]] futex(94492690465560, 129, 1, 0, 0, 0)...
        00:00.039172211 cpu#0 pid 42218 [runc:[2:INIT]] futex(94492690465560, 129, 1, 0, 0, 0) = 1
        00:00.039182512 cpu#0 pid 42218 [runc:[2:INIT]] ...futex() = 1
        00:00.039189913 cpu#0 pid 42218 [runc:[2:INIT]] write("", 824634585143, 1)...
        00:00.039189913 "0"
        00:00.039189913 "0"
        00:00.039193813 cpu#1 pid 42219 [runc:[2:INIT]] ...futex() = 0
        00:00.039210214 cpu#0 pid 42218 [runc:[2:INIT]] ...write() = 1
        00:00.039210214 cpu#0 pid 42218 [runc:[2:INIT]] ...write() = 1
        00:00.039212015 cpu#1 pid 42219 [runc:[2:INIT]] epoll_pwait(7, 140642389710816, 128, 0, 0, 973)...
        00:00.039216815 cpu#0 pid 42218 [runc:[2:INIT]] close(5)...
        00:00.039216815 cpu#0 pid 42218 [runc:[2:INIT]] close(5)...
        00:00.039219015 cpu#1 pid 42219 [runc:[2:INIT]] ...epoll_pwait() = 0
        00:00.039222615 cpu#0 pid 42218 [runc:[2:INIT]] ...close() = 0

If you add the test if continuedEvent.Typ != 2, I think it should break from the loop instead of continuing to avoid consuming events.

It seems the root cause is that all events for cpu#0 are present twice in the array. This would cause the printing problem because the second "write" event would be consumed by eventsToString() while trying to get the "cont" event. Then, the 2 "cont" events are printed out of context.

@eiffel-fl
Copy link

Closing as traceloop was totally reworked in inspektor-gadget/inspektor-gadget#1023.
I am wondering if we should even archive this repository?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants