-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
run: Fix a lot of leakages and implement support for multiple socket filter programs #2264
run: Fix a lot of leakages and implement support for multiple socket filter programs #2264
Conversation
@mauriciovasquezbernal I find this approach much better, LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks for the fix and multiple socket filter program feature
@@ -124,11 +124,7 @@ func (t *Tracer) Init(gadgetCtx gadgets.GadgetContext) error { | |||
return nil | |||
} | |||
|
|||
// Close is needed because of the StartStopGadget interface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I remember the importance of the override
keyword in C++
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks OK from code inspection.
Can you please add a Fixes:
tag to the first commit?
} | ||
socketFilterFound = true | ||
err := t.networkTracer.AttachProg(t.collection.Programs[progName]) | ||
networkTracer := t.networkTracers[p.Name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you test if ok
to avoid a SEGFAULT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that case is possible. If that happens, it seems there's something very broken, so crashing is fine.
The GadgetStartStop interface was removed a long time ago in a3e5871 ("pkg/runtime/local: Move context and timeout management into tracers"), so the Stop() method was never being called. It was causing to leak all BPF objects of the gadgets being run. This commit also fixes a leakage on the network tracer as it was never being closed. Fixes: 07ca0fb ("Initial support for containerized gadgets") Fixes: c1cc966 ("implement networking gadgets for containerized gadgets") Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com>
175be40
to
966b473
Compare
The logic to create the network tracer (needed by socket filter programs) was placed in NewInstance(), unfortunately it has a lot of drawbacks: 1. NewInstance() is called many times in the workflow without running the gadget, it was causing the network tracer to be instantiated multiple times without any purpose. 2. Creating the network tracer requires root. It was making `ig --help` to print some annoying warnings (#2108) 3. The network tracer was always created, even for gadgets that don't need it, also only a single socket filter program was supported. The main motivation to have this logic on NewInstance() was that the network tracer has to be created before calling AttachContainer(). This commit moves all that logic to Init() (that is called before AttachContainer()) and enables the support for multiple socket filter programs on a gadget. Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com>
966b473
to
b663533
Compare
All of this started while investigating this comment, which refers to #2108. I found that we weren't releasing the network tracer, and then I found that the issue was worst and we're not releasing any programs at all when running image-based gadgets.
How to test
Before this fix
Before running any gadget, there are some programs and maps in the node:
Run a gadget, and terminate it as soon as it prints some events.
The number of open programs and maps increased:
If you run the gadget again, they keep increasing
The situation is worst with a networking gadget:
A lot of maps named tail_call and the dns program itself are leaked
After this fix
Before running any gadget, there are some programs and maps in the node:
Run a gadget, and terminate it as soon as it prints some events.
The number of open programs and maps is the same as before
Running the dns gadget has the same behaviour
In this case the numbers increased because the socket enricher is lazily-loaded, if we run again the numbers keep constant:
Fixes #2108