
NETOBSERV-59 Create oc plugin to display flow table #1

Merged 9 commits into netobserv:main on Feb 7, 2024

Conversation

@jpinsonneau (Contributor) commented Nov 17, 2023

This PR implements a first version of the oc netobserv CLI:

  • Capture flows
  • Capture packets
  • Basic filter capabilities

(screenshots: flow-table, packet-table)

Dependencies

netobserv/netobserv-ebpf-agent#224

Comment on lines +22 to +23
# image: quay.io/netobserv/netobserv-ebpf-agent:main
image: quay.io/jpinsonn/netobserv-ebpf-agent:flow-stream-1
Contributor Author:

netobserv/netobserv-ebpf-agent#224 needs to be merged first

Contributor:

I wonder if you can reuse the current gRPC exporter from the eBPF agent instead of creating a new TCP exporter?

Contributor Author:

Would that work with `oc port-forward`?

Contributor:

gRPC uses TCP, so I would think so.

Comment on lines 35 to 37
# TODO: find why this breaks collection: EOF
#- name: ENABLE_PKT_DROPS
# value: "true"
Contributor Author:

Uncomment this to give pktDrop a try.

After running it for some time, I get an EOF on the TCP connection read.

Contributor Author:

It seems I was just unlucky! It's working now.

@stleerh (Contributor) left a comment:

It should provide a script or a way to uninstall this on your cluster.

.gitignore Outdated
@@ -0,0 +1,2 @@
build
output
Contributor:

A number of new files in this commit are missing the linefeed character on the last line. We should clean this up.

Makefile Outdated
OUTPUT := $(DIST_DIR)/$(NAME)
DOCKER := $(shell which podman)
ifeq ($(DOCKER),)
DOCKER := $(shell which docker)
Contributor:

Perhaps call it OCI or OCI_BIN instead of DOCKER.

Contributor Author:

Sure, the Dockerfile is not even provided yet. I'm thinking about being able to run the CLI as an extra pod inside the cluster and transfer the generated files locally before deleting it.

That would avoid running all the port-forwards and reduce transfers / latency.
I just need to ensure the terminal renders properly using `oc rsh <pod>` / `oc exec <pod>`.

README.md Outdated
@@ -3,18 +3,78 @@
network-observability-cli is a lightweight Flow and Packet visualisation tool.
It deploy [netobserv eBPF agent](https://github.com/netobserv/netobserv-ebpf-agent) on your k8s cluster to collect flows or packets from nodes network interfaces
Contributor:

Change: "deploys"

@@ -3,18 +3,78 @@
network-observability-cli is a lightweight Flow and Packet visualisation tool.
It deploy [netobserv eBPF agent](https://github.com/netobserv/netobserv-ebpf-agent) on your k8s cluster to collect flows or packets from nodes network interfaces
and streams data to a local collector for analysis and visualisation.
Output files are generated under output/flow and output/pcap directories per host name
Output files are generated under `output/flow` and `output/pcap` directories per host name
Contributor:

Add a section on Prerequisites that describes what you need to have (e.g. a kind or OpenShift cluster).

)

const (
flowsToShow = 40
Contributor:

Make it configurable?

Contributor Author:

I actually did, and even more! I'll make a demo to show all the capabilities.


// lock since we are updating lastFlows concurrently
mutex.Lock()

Contributor:

Unfortunately, this defeats the benefits of using goroutines.

Contributor Author:

Correct; the problem is that I'm merging X sources (one per node) into a single display.
Would it be more performant to save flows per node in different slices or maps, and call mutex.Lock only when reading them for display?

c := exec.Command("clear")
c.Stdout = os.Stdout
c.Run()

Contributor:

I know this tool is not meant to be scalable, but clearing and redrawing the screen for each flow is going to make it very limited. A better alternative is to periodically update the screen at a certain interval.

Contributor Author:

I've added a mechanism that updates only once a delay has elapsed.

if err != nil {
log.Errorf("Create file %s failed: %v", filename, err.Error())
log.Fatal(err)
}
@stleerh (Contributor) commented Nov 17, 2023:

It would be very useful if you could choose the stdout format, which could just be JSON instead of the UI table. I know you can technically get this with the `tee` command, but if it didn't output to the UI table, it would be more scalable. This would be a simpler way to export flows than what we have today, particularly if you just want to test this out and see what data you're getting.

Contributor Author:

I've added some predefined view capabilities, including raw JSON 👍

if err != nil {
log.Fatal(err)
}
go managePcapTable(PcapResult{NodeName: filename, PacketCount: 1, ByteCount: int64(n)})
Contributor:

Not sure how useful this will be by only having node, packet and byte counts, even with a filter applied.

Contributor Author:

This can become huge in a short time, since everything is saved to the output folder.
It's more an indicator to help you avoid filling up your storage than anything else.

@jpinsonneau (Contributor Author):

It should provide a script or a way to uninstall this on your cluster.

The cleanup is automatic, but I can provide a script that allows users to trigger it manually 👍

@jpinsonneau (Contributor Author):

@jotak @msherif1234 @OlivierCazade how do you see the next steps for this?

I can merge as is if you agree, and we can improve the approach if needed. Let me know what works best for you 🐱

@OlivierCazade:

I am fine with merging and improving from here. This makes iterations easier to read.

IMO the first question we need to answer is: do we want to keep this fully independent, or do we want to use a NOO stack already present?

Calling this the network observability CLI might be a bit misleading. I would expect a CLI to be the terminal equivalent of the frontend, not an independent project with its own capture stack.

If we keep this independent, I would suggest renaming it to avoid confusion.

@jpinsonneau (Contributor Author):

IMO the first question we need to answer is: do we want to keep this fully independent, or do we want to use a NOO stack already present?

I'm open to any solution that deploys as fast as applying the YAML in this PR. I think that's the key to this tool.

  • Relying on the operator catalog to install netobserv is probably a bit slow and would not work on every cluster
  • Deploying the controller manager via a YAML is quite fast (same as `make deploy` from the operator repo)

That could indeed mean less maintenance across the different configs when we introduce new features.
We first need to investigate @jotak's attempt at merging the eBPF/FLP components, to see if it's worthwhile and whether it will be introduced in the operator.

Calling this the network observability CLI might be a bit misleading. I would expect a CLI to be the terminal equivalent of the frontend, not an independent project with its own capture stack.
If we keep this independent, I would suggest renaming it to avoid confusion.

It will never be the full equivalent, since we can't handle Loki here. The capture stack must be different, but it can be part of the operator.
Do you have any suggestion in mind?

func runFlowCapture(cmd *cobra.Command, args []string) {
go scanner()

wg.Add(len(addresses))
Member:

wg and goroutines seem wrongly implemented here; we should have a wg.Done() call somewhere... It probably doesn't do what we'd expect.
I'm trying to change it and make sure it doesn't break things.

Contributor Author:

OK, thanks; feel free to update this PR directly if you want to.

Member:

Done, if you want to review my 2 commits.

Contributor Author:

Code looks good! I'm giving it a last try before merging.

@jotak (Member) commented Feb 7, 2024:

@jpinsonneau I'm OK to merge; there are a couple of points, though, that I've noted and that we may address in a follow-up:

  • Ctrl-C doesn't work on packet capture for me (it works in flow capture) - I haven't investigated...
  • currently, local ports are hardcoded; we should make port-forwarding more robust and able to deal with ports already in use
  • refine packaging; installing YAML files in /usr/bin doesn't sound correct, I guess? Perhaps we could embed the resources in the scripts at build time?

/lgtm

@jpinsonneau (Contributor Author):

I have created a follow-up for code improvements:
https://issues.redhat.com/browse/NETOBSERV-1484

Thanks !

@jpinsonneau jpinsonneau merged commit 43fcbf0 into netobserv:main Feb 7, 2024