global-query #43

Closed
els0r opened this issue Jan 30, 2023 · 20 comments · Fixed by #109

@els0r
Owner

els0r commented Jan 30, 2023

Distributed querying for goQuery, aggregating results.Result structures. This allows running queries and flow aggregations against a global fleet that has goProbe/goQuery deployed.

@els0r els0r added this to the v4 Release milestone Jan 30, 2023
@els0r els0r added the feature New feature or request label Jan 30, 2023
@els0r els0r self-assigned this Jan 30, 2023
@els0r
Owner Author

els0r commented Mar 2, 2023

I will overwrite the branch from #43. Too much changed under the hood. I've saved the relevant bits in global-query.

Way to go about this:

  • introduction of the query.Executor interface, which goDB will implement for use in goQuery as well as in modules inside the global-query tool
  • properly separate query results and goDB (global-query must not import goDB)
  • put together the global-query tool

@els0r
Owner Author

els0r commented Mar 24, 2023

Will start work on this next week.

@els0r
Owner Author

els0r commented Apr 5, 2023

Saw https://github.com/els0r/goProbe/actions/runs/4613869362/jobs/8156300674, which seems unrelated to this issue, but still needs to be investigated.

The problem is a race condition in line 273 of the DBWorkManager

					w.nWorkloadsProcessed++

That number is also accessed in line 256 with:

				logger.Infof("Query cancelled (workload %d / %d)...", w.nWorkloadsProcessed, w.nWorkloads)
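A minimal sketch of one way to remove this kind of race, assuming the counter is incremented by worker goroutines and read by a logging path on another goroutine. The `workManager` type and its methods are hypothetical stand-ins, not the actual DBWorkManager code; only the two field names are taken from the snippets above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// workManager is a hypothetical stand-in for the DBWorkManager; only the
// two counters relevant to the race are modelled.
type workManager struct {
	nWorkloads          uint64
	nWorkloadsProcessed atomic.Uint64 // written by workers, read by the logger
}

// processAll simulates concurrent workers that each finish one workload.
func (w *workManager) processAll() {
	var wg sync.WaitGroup
	for i := uint64(0); i < w.nWorkloads; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Atomic increment instead of the racy w.nWorkloadsProcessed++.
			w.nWorkloadsProcessed.Add(1)
		}()
	}
	wg.Wait()
}

// progress may be called from any goroutine, e.g. when a query is cancelled.
func (w *workManager) progress() string {
	return fmt.Sprintf("workload %d / %d", w.nWorkloadsProcessed.Load(), w.nWorkloads)
}

func main() {
	w := &workManager{nWorkloads: 100}
	w.processAll()
	fmt.Println(w.progress())
}
```

`atomic.Uint64` needs Go 1.19 or newer. A mutex around both accesses would work as well; the atomic is simply the smaller change for a single counter.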

@els0r
Owner Author

els0r commented Apr 5, 2023

@fako1024 : need help with testing the very first version of global-query. Would be glad if you could give this a spin on your sensors with:

Then, run queries with the tool with

go run cmd/global-query/main.go \
  --config cmd/global-query/local-config.yaml \
  --hosts.querier.config cmd/global-query/api-client-querier.yaml \
  -q <comma-separated-host-list> <goquery args>

@fako1024
Collaborator

fako1024 commented Apr 5, 2023

Raised #94 and will take care of the race condition. As for the global query I'll gladly give it a shot today. ❤️

@fako1024
Collaborator

fako1024 commented Apr 5, 2023

@els0r OK, first attempt: Deployment went fine, nodes are reachable. However, there seems to be an issue with the encoding of the query (probably with the attributes). Whatever I do, I get an HTTP 400 from both sensors:

On the caller (my laptop):

└─ $ ▶ ./global-query --config ./local-config.yaml --hosts.querier.config api-client-querier.yaml -q fw-1,fw-2 -i eth0 -n 10 talk_conv 
ts=2023-04-05T09:27:08Z level=info caller=cmd/root.go:219 msg="setting up queriers" app_name=global-query app_version=872abf81 hosts=fw-1,fw-2 query="{\"ifaces\":[\"eth0\"],\"query_type\":\"talk_conv\",\"attributes\":[{},{}],\"direction\":\"bi-directional\",\"from\":1678012028,\"to\":1680686828,\"format\":\"txt\",\"limit\":10,\"sort_by\":\"bytes\",\"dns_resolution\":{\"dns_timeout\":1000000000,\"max_rows\":25},\"db\":\"\"}"
ts=2023-04-05T09:27:08Z level=debug caller=hosts/query.go:124 msg="running query" app_name=global-query app_version=872abf81 hostname=fw-1
ts=2023-04-05T09:27:08Z level=info caller=client/client.go:132 msg="creating new request" app_name=global-query app_version=872abf81 hostname=fw-1 method=POST url=http://10.1.10.2:18081/api/v1/_query
ts=2023-04-05T09:27:08Z level=debug caller=hosts/query.go:124 msg="running query" app_name=global-query app_version=872abf81 hostname=fw-2
ts=2023-04-05T09:27:08Z level=info caller=client/client.go:132 msg="creating new request" app_name=global-query app_version=872abf81 hostname=fw-2 method=POST url=http://10.1.20.2:18081/api/v1/_query
ts=2023-04-05T09:27:08Z level=error caller=hosts/query.go:184 msg="failed to run query: 400 Bad Request" app_name=global-query app_version=872abf81 hostname=fw-1
ts=2023-04-05T09:27:08Z level=error caller=hosts/query.go:184 msg="failed to run query: 400 Bad Request" app_name=global-query app_version=872abf81 hostname=fw-2
Status "empty": query returned no results
Hosts with errors: 2

        #    host    status                                 message
                                                                       
        1    fw-1     error    failed to run query: 400 Bad Request
        2    fw-2     error    failed to run query: 400 Bad Request

On both sensors (same error each time, no matter what kind of query type I use):

2023-04-05T11:26:03.312+0200    error   errors/errors.go:34     failed to decode query statement: query.Statement.Attributes: []types.Attribute: decode non empty interface: can not unmarshal into nil, error found in #10 byte of ...|ributes":[{},{}],"di|..., bigger context ...|":["eth0"],"query_type":"talk_conv","attributes":[{},{}],"direction":"bi-directional","from":1678011|...{"app_name": "goProbe_alpine", "app_version": "872abf81"}

This seems fishy: attributes":[{},{}] ...

@fako1024
Collaborator

fako1024 commented Apr 5, 2023

Sidenote: This error message probably needs some "love":

└─ $ ▶ ./global-query 
Failed to read in config: Config File ".cmd" Not Found in "[/home/fako]"

As in: If neither config nor command line parameters are specified complain about that specifically (and I assume the .cmd without prefix is also unintentional)...

@els0r
Owner Author

els0r commented Apr 7, 2023

@fako1024 : this thing is going places. You can test with latest changes and should be able to get a result.

Will still need polishing, as well as the introduction of the host attribute during printing. But it's a start. Have fun!

@fako1024
Collaborator

fako1024 commented Apr 7, 2023

Alright, feedback from the next testing round, we're getting a lot closer:

  • The goProbe binary breaks due to this in goProbe.go (the Version magic might need some love):
        appName := filepath.Base(os.Args[0])
        appVersion := version.GitSHA[0:8]    // HERE BE OUCHIE

        if flags.CmdLine.Version {
                fmt.Printf("goProbe\n%s", version.Version())
                os.Exit(0)
        }
fw-1:~# ./goProbe_alpine --config /root/goprobe_test.conf
panic: runtime error: slice bounds out of range [:8] with length 0

goroutine 1 [running]:
main.main()
        /tmp/goProbe/cmd/goProbe/goProbe.go:64 +0x16a8
  • Performing the queries seems to work (as in, an output table is generated 🥳 ), but according to the output it says that the query returned no results:
ts=2023-04-07T12:01:51Z level=info caller=cmd/root.go:218 msg="setting up queriers" app_name=global-query app_version=devel hosts=fw-1,fw-2 query="{\"query\":\"talk_conv\",\"ifaces\":\"eth0\",\"first\":\"Tue Mar  7 14:01:51 2023\",\"last\":\"Fri Apr  7 14:01:51 2023\",\"format\":\"txt\",\"sort_by\":\"bytes\",\"num_results\":10,\"dns_resolution\":{\"enabled\":false,\"timeout\":1000000000,\"max_rows\":25},\"max_mem_pct\":60}"
ts=2023-04-07T12:01:51Z level=debug caller=hosts/query.go:124 msg="running query" app_name=global-query app_version=devel hostname=fw-1
ts=2023-04-07T12:01:51Z level=info caller=client/client.go:132 msg="creating new request" app_name=global-query app_version=devel hostname=fw-1 method=POST url=http://10.1.10.2:18081/api/v1/_query
ts=2023-04-07T12:01:51Z level=debug caller=hosts/query.go:124 msg="running query" app_name=global-query app_version=devel hostname=fw-2
ts=2023-04-07T12:01:51Z level=info caller=client/client.go:132 msg="creating new request" app_name=global-query app_version=devel hostname=fw-2 method=POST url=http://10.1.20.2:18081/api/v1/_query

                                                   packets   packets             bytes      bytes       
                     sip                     dip        in       out      %         in        out      %
      2a00:12e8:1:a:2::7  2a04:4540:8200:99::de2    1.63 M    2.64 M  37.36    1.72 GB    2.22 GB  49.83
  2a01:4f8:191:31c1:3::3  2a04:4540:8200:99::de2  152.34 k    1.51 M  14.54   26.13 MB    1.79 GB  22.90
            85.158.5.153          144.76.128.146  337.83 k  234.84 k   5.01  316.82 MB   20.75 MB   4.16
            85.158.5.153          144.76.128.146  234.44 k  337.64 k   5.00   20.71 MB  316.16 MB   4.15
  2a04:4540:8200:99::d08  2a01:4f8:191:31c1:3::3   28.65 k  204.15 k   2.03    4.72 MB  234.95 MB   2.95
  2a04:4540:8200:99::d08      2a00:12e8:1:a:2::7   35.61 k  231.00 k   2.33    5.59 MB  214.39 MB   2.71
            85.158.5.153           144.76.51.230  163.42 k  103.54 k   2.33  186.18 MB    8.42 MB   2.40
            85.158.5.153           144.76.51.230  103.34 k  163.31 k   2.33    8.40 MB  185.79 MB   2.39
             46.22.10.26            85.158.5.154  137.90 k  134.52 k   2.38   42.64 MB   66.16 MB   1.34
            85.158.5.153           144.76.51.228  180.52 k  158.20 k   2.96   70.04 MB   17.77 MB   1.08
                                                       ...       ...               ...        ...       
                                                    4.35 M    7.09 M           2.60 GB    5.32 GB       
                                                                                                 
                 Totals:                                     11.44 M                      7.92 GB  

Timespan / Interface : [2023-04-05 16:04:41, 2023-04-06 08:45:13] / 
Sorted by            : accumulated data volume (sent and received)
Query stats          : displayed top 10 hits out of 2.00 k in 94ms

Hosts with errors: 2

        #    host    status                      message
                                                            
        1    fw-1     empty    query returned no results
        2    fw-2     empty    query returned no results

If I limit the query to one of the two hosts, I get the respective subset of the results, so it clearly fetches data and the errors are a lie.

  • I'd expect the table to show the hostname (or ID) to indicate where the flow / row came from.
  • Last version showed a log message on the receiving goProbe sensor whenever an API call came in. Did that behavior change in this version? Having an info level message for each API call is probably a good idea (to know who queries my data and when)...
  • Can you maybe rebase / merge current develop into the branch (it's still crashing from Panic during DB writeout (invalid IPv4 / IPv6 key) due to multiple regressions #99 , which makes testing a bit more complicated 😛 )?
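Regarding the slice-bounds panic in the first bullet: a small sketch of a guarded version, assuming the commit hash is an ordinary string that is empty when nothing was injected at build time. `shortSHA` is a hypothetical helper, the `devel` fallback mirrors the `app_version=devel` seen in the log output above, and the 12-character hash in `main` is made up for the example.

```go
package main

import "fmt"

// shortSHA returns at most n leading characters of sha and falls back to
// "devel" when no commit hash is available.
func shortSHA(sha string, n int) string {
	if sha == "" {
		return "devel"
	}
	if len(sha) < n {
		return sha
	}
	return sha[:n]
}

func main() {
	fmt.Println(shortSHA("872abf81d0c4", 8)) // made-up full hash
	fmt.Println(shortSHA("", 8))             // no hash injected at build time
}
```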

@els0r
Owner Author

els0r commented Apr 8, 2023

@fako1024 : next iteration ready. Have fun 😎

  • I'd expect the table to show the hostname (or ID) to indicate where the flow / row came from.
  • Last version showed a log message on the receiving goProbe sensor whenever an API call came in. Did that behavior change in this version? Having an info level message for each API call is probably a good idea (to know who queries my data and when)...
  • Can you maybe rebase / merge current develop into the branch (it's still crashing from Panic during DB writeout (invalid IPv4 / IPv6 key) due to multiple regressions #99 , which makes testing a bit more complicated 😛 )?

Will revise the request logging when switching API to gin. Agree with your observation.

@fako1024
Collaborator

fako1024 commented Apr 8, 2023

Next test looks awesome already:

  • No more crashes
  • Hostname displayed correctly
  • All queries I've tested work (obviously that's just a very biased subset)
  • Query speed / performance is good, even for raw queries with lots of flows (>100k), also no errors / API issues during such queries (will be interesting to see how we can merge that with Protect raw/time queries #75 later)
  • Sorting looks good, both for aggregate and raw queries

Only found that the -resolve flag doesn't do anything (no error, no nothing), but I guess that's unrelated, right? Do we need an issue to fix that for the release?

Kudos, this looks like an awesome first shot. Can't wait to take a look at the API for integration into other frameworks... 😍 !!!

Only remark: As for the HTTP client interaction here I can recommend a cool & well maintained package from some dude (nudge) that simplifies handling of the HTTP client and makes the code more readable (and might also simplify extending the client functionality later on because it already comes with a lot of features, even production-level tested client certificate handling and the likes). Just saying 😉

This is it: https://github.com/fako1024/httpc

@els0r
Owner Author

els0r commented Apr 9, 2023

Thanks for the feedback. Glad to see the results looking consistent. As for the resolve queries, those need a separate investigation.

Some other news:

  • have yourself a distributed query server with the server command of global-query 😎
  • httpc is in. Thanks for the pointer

It doesn't end here. I'll move the single call part of global-query to goQuery. Why? Because that's the CLI tool we use to run queries, and there's already a client for global-query. That means you don't have to duplicate much or learn to call a new tool: only add one or two config parameters to goQuery and you're done. Will keep you posted.

@fako1024
Collaborator

fako1024 commented Apr 9, 2023

Nice! Moving the query stuff to goQuery sounds like a good plan. Basically being able to perform local and global queries (which use the same syntax / logic anyway) with a single tool is probably for the best.

On a different note though: With the recent changes of yesterday / today the query tool stopped working (independent of the server mode). Doing the same queries as yesterday I now get nothing (literally, no error, no feedback, just a return code of 1, so something probably didn't work the way it should):

└─ $ ▶ ./global-query --config ./local-config.yaml --hosts.querier.config api-client-querier.yaml -q fw-1,fw-2 -i eth0 -n 10 talk_conv
fako @ fako-x1 /tmp/goProbe/cmd/global-query (43-global-query *)
└─ $ ▶ echo $?
1

Diving a little deeper I figured as much: In cmd/global-query/cmd/root.go error handling is incomplete (the error is never shown and we just exit):

func Execute() {
        err := rootCmd.Execute()
        if err != nil {
                os.Exit(3)
        }
}

Throwing the error at that place at least tells me what's going on:

unknown command "talk_conv" for "global-query"

Could it be that the cobra command logic has a conflict between the flags / arguments of global-query and goQuery now that you're trying to handle both in one binary?

Just running in server mode seems to work:

└─ $ ▶ ./global-query --config ./local-config.yaml --hosts.querier.config api-client-querier.yaml  server
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /_query                   --> github.com/els0r/goProbe/pkg/global-query/server.(*Server).postQuery-fm (2 handlers)
ts=2023-04-09T10:25:42Z level=info caller=cmd/server.go:72 msg="starting API server" app_name=global-query app_version=devel addr=localhost:8145

@fako1024
Collaborator

fako1024 commented Apr 9, 2023

Also maybe a more fundamental question: What exactly is it that I can POST to the endpoint? I figured it should be something along the lines of query.Args, but no matter what I POST to /_query I get nothing (again, no error, no feedback) so it's a bit hard for me to test this properly. Can you maybe provide me with some pointers?

@els0r
Owner Author

els0r commented Apr 9, 2023

Interesting. Thanks for the feedback. Then, the latest commit messed up something.

The POST should get the query args and then return the result.

I'll look at it in the context of moving all the code in entrypoint to the appropriate routine in goQuery (which does the call using the global-query client).

Stay tuned

@els0r
Owner Author

els0r commented Apr 10, 2023

Thanks for all the testing so far!!!

As for proper feedback/error messages: that still needs a bit of love on the client side (I may need your help with regard to how httpc is best used for that).

The recent commit can somewhat be called MVP for the entire query system. It should work on your machine. Invocation via goQuery works as follows:

./goQuery --query.server.addr <host>:<port> -i eth0 -f -5m -n 10 sip,dip -q fw-1,fw-2

or

./goQuery --config goquery-conf.yaml -i eth0 -f -5m -n 10 sip,dip -q a,b

The query server is started with

./global-query --config global-query-conf.yaml server --server.addr localhost:8888

Naturally, this system needs quite some testing in #74 to make sure there are no known regressions/edge cases that aren't covered.

@fako1024 : if you can confirm that it's now working in your infrastructure, I'll open the PR.

@fako1024
Collaborator

fako1024 commented Apr 10, 2023

Alrighty, got it to run. Some feedback:

  • global-query is still very quiet when I enter wrong parameters (unrelated to error handling when querying). At first I forgot that the tool is now only the server and tried to submit the query again. It silently exited with no feedback (probably related / similar to Command-line error handling is incoherent (and misses certain issues) #104 ).
  • When performing the query with goQuery it ignores the -n parameter (but only when querying more than one host via the -q parameter 🤣):
./goQuery --query.server.addr 127.0.0.1:8888 -i eth0 -f -10m -q fw-2 -n 10 sip,dip
...
Query stats          : displayed top 10 hits out of 10 in 16ms
###########################
./goQuery --query.server.addr 127.0.0.1:8888 -i eth0 -f -10m -q fw-1,fw-2 -n 10 sip,dip
...
Query stats          : displayed top 733 hits out of 733 in 21ms

I'm unsure whether it's only that parameter, but since I doubt that you manually re-coded every parameter, chances are that more parameters are affected and not forwarded to the sensor host.
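Whatever the root cause turns out to be, the limit also has to be applied once more after the per-host results are merged, since two hosts returning 10 rows each would otherwise yield up to 20. A sketch under that assumption; `row` and `mergeTopN` are hypothetical and much simpler than the real results.Result handling.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical, simplified result row.
type row struct {
	Host  string
	SIP   string
	DIP   string
	Bytes uint64
}

// mergeTopN concatenates per-host rows, sorts them by bytes (descending)
// and truncates to n rows. n <= 0 means "no limit".
func mergeTopN(perHost map[string][]row, n int) []row {
	var merged []row
	for _, rows := range perHost {
		merged = append(merged, rows...)
	}
	sort.Slice(merged, func(i, j int) bool {
		a, b := merged[i], merged[j]
		if a.Bytes != b.Bytes {
			return a.Bytes > b.Bytes
		}
		// Tie-breakers keep the output deterministic despite map iteration order.
		if a.Host != b.Host {
			return a.Host < b.Host
		}
		if a.SIP != b.SIP {
			return a.SIP < b.SIP
		}
		return a.DIP < b.DIP
	})
	if n > 0 && len(merged) > n {
		merged = merged[:n]
	}
	return merged
}

func main() {
	perHost := map[string][]row{
		"fw-1": {{Host: "fw-1", SIP: "10.0.0.1", DIP: "10.0.0.2", Bytes: 300}, {Host: "fw-1", SIP: "10.0.0.3", DIP: "10.0.0.4", Bytes: 100}},
		"fw-2": {{Host: "fw-2", SIP: "10.0.1.1", DIP: "10.0.1.2", Bytes: 200}},
	}
	for _, r := range mergeTopN(perHost, 2) {
		fmt.Println(r.Host, r.SIP, r.DIP, r.Bytes)
	}
}
```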

  • For readability it might make sense to add some more love to the hostname / hostid columns (let's face it: when using goQuery nobody really cares about the hostid 😛). Maybe we could make this configurable and by default only display the hostname in interactive mode? Or shorten the hostid like a commit hash by default?
  • When calling goQuery against a non-running server it crashes:
└─ $ ▶ ./goQuery --query.server.addr 127.0.0.1:8888 -i eth0 -f -10m -q fw-1,fw-2 --resolve -n 10 -c "dip = 1.1.1.1" talk_conv
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x8f8b60]

goroutine 1 [running]:
github.com/els0r/goProbe/pkg/global-query/api/client.(*Client).Query.func1(0xe57780?, {0xc000094200?, 0xa1666e?})
	/tmp/goProbe/pkg/global-query/api/client/client.go:84
github.com/fako1024/httpc.(*Request).RunWithContext(0xc000094100, {0xaed440, 0xc0001b7580})
	/home/fako/Develop/go/pkg/mod/github.com/fako1024/httpc@v1.0.13/httpc.go:403 +0xb91
github.com/els0r/goProbe/pkg/global-query/api/client.(*Client).Query(0xc00007c040, {0xaed440, 0xc0001b7580}, 0x1?)
	/tmp/goProbe/pkg/global-query/api/client/client.go:91 +0x285
github.com/els0r/goProbe/pkg/global-query/api/client.(*Client).Run(0xc0001c1440?, {0xaed440?, 0xc0001b7580?}, 0x0?)
	/tmp/goProbe/pkg/global-query/api/client/client.go:70 +0x25
github.com/els0r/goProbe/cmd/goQuery/commands.entrypoint(0xe0d980, {0xc0001be2a0, 0x1, 0xe})
	/tmp/goProbe/cmd/goQuery/commands/root.go:269 +0xb02
github.com/spf13/cobra.(*Command).execute(0xe0d980, {0xc0001a6010, 0xe, 0xe})
	/home/fako/Develop/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:872 +0x694
github.com/spf13/cobra.(*Command).ExecuteC(0xe0d980)
	/home/fako/Develop/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:990 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
	/home/fako/Develop/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:918
github.com/els0r/goProbe/cmd/goQuery/commands.Execute()
	/tmp/goProbe/cmd/goQuery/commands/root.go:47 +0x25
main.main()
	/tmp/goProbe/cmd/goQuery/main.go:6 +0x17

I think the reason is that you did specify a function, but not the intervals. I'll quickly check if that's something that needs handling in httpc (at least an error) or if I'm mistaken...

One more, unrelated thing: Is the Go API for the server abstract enough already so that it can be integrated into other tools (use case: I happen to have a central control server already and don't want to deploy another binary / independent microservice - instead I just want to integrate the global query API into my existing tool)?

@els0r
Owner Author

els0r commented Apr 10, 2023

Thx! Will have a look at the open points.

As for your question if the Go API is abstract enough: I would say so - as long as you use gin-gonic as the API router 🤓. What I could imagine is to make the server.postQuery registration public.

@fako1024
Collaborator

fako1024 commented Apr 10, 2023

Thx! Will have a look at the open points.

As for your question if the Go API is abstract enough: I would say so - as long as you use gin-gonic as the API router 🤓. What I could imagine is to make the server.postQuery registration public.

That would of course be an interesting addition, right. But I was more referring to a "layer below". Let's assume the following scenario: I detect an interesting IP somewhere and now want to figure out if that IP appeared somewhere globally. I should be able to programmatically construct the query.Args in my Go code and then perform a query just like the server / goQuery does (after all, that's how you integrated it into goQuery). Question is only: How much work do I need to do?

@els0r
Owner Author

els0r commented Apr 10, 2023

Answer would be: not much work 😄 . As you said, you can create the query args programmatically and then run it using the global-query client's Run (or Query) method. That's it. Job done.

I'm assuming, the whole thing would use JSON under the hood, since you are not planning to display text to a user, but further process the information retrieved.

If you do want to display it to a user, take note of the following code inside cmd/goQuery/commands/root.go:

		// make sure that the hostname is present in the query type (and therefore output)
		// The assumption being that a human will have better knowledge
		// of hostnames than of their ID counterparts
		if queryArgs.Format == "txt" {
			if !strings.Contains(queryArgs.Query, types.HostnameName) {
				queryArgs.Query += "," + types.HostnameName
			}
		}

That's just adding the label hostname to the output in case a human operator is looking at the results.
