
Refactor metadata concept #50

Closed · 4 tasks done
fako1024 opened this issue Feb 18, 2023 · 7 comments · Fixed by #53

fako1024 (Collaborator) commented Feb 18, 2023

Currently, GPFile block metadata is stored in .gpf.meta files, one metadata file per GPFile. At the interface level this is invisible to the caller (GPFile essentially handles its respective header under the hood). Unfortunately, there are a few issues with the whole concept that need to be tackled:

  • Both goProbe and goQuery have to open one additional (tiny) file for each day that is accessed, effectively doubling the number of IOPS
  • The metadata is stored and processed via a line-based protocol (read / written line by line), causing a huge amount of overhead
  • The whole concept of a "hidden" metadata file that gets handled behind the scenes feels slightly unclean

DoD

  • Switch GPFile metadata storage to a single metadata file per "day" / folder
  • Choose highly efficient serialization protocol (and potentially compression)
  • Reading: Load metadata upon access to anything in a folder and inject it into the GPFile constructor (or maybe even better: an abstract "Folder" that lazy-loads "Columns" (i.e. GPFiles) upon access, acting as manager - see the sketch after this list)
  • Writing: Update metadata upon successful write of all blocks to the directory (caveat: what happens if there is a failure during write?)
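
To make the "folder acting as manager" idea a bit more concrete, here is a minimal sketch of what such a per-day directory object could look like (all names - Dir, DirMetadata, Column, the .blockmeta file name - are placeholders for illustration, not the final API):

package gpfile

import (
	"os"
	"path/filepath"
)

// DirMetadata is a stand-in for the single per-day metadata blob.
type DirMetadata struct {
	// per-column block offsets / sizes would live here
}

// Dir represents one day of data and owns both the metadata and the column files.
type Dir struct {
	path    string
	meta    *DirMetadata
	columns map[string]*os.File // lazily opened column (GPFile) handles
}

// Open loads the single metadata file for the day; the column files themselves
// are not touched yet, avoiding the extra per-day IOPS.
func Open(path string) (*Dir, error) {
	meta, err := readMetadata(filepath.Join(path, ".blockmeta")) // file name is an assumption
	if err != nil {
		return nil, err
	}
	return &Dir{path: path, meta: meta, columns: map[string]*os.File{}}, nil
}

// Column lazily opens the requested column file on first access.
func (d *Dir) Column(name string) (*os.File, error) {
	if f, ok := d.columns[name]; ok {
		return f, nil
	}
	f, err := os.Open(filepath.Join(d.path, name+".gpf"))
	if err != nil {
		return nil, err
	}
	d.columns[name] = f
	return f, nil
}

// readMetadata is a placeholder for deserializing the day-level metadata.
func readMetadata(path string) (*DirMetadata, error) {
	if _, err := os.Stat(path); err != nil {
		return nil, err
	}
	return &DirMetadata{}, nil
}
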
@fako1024 added the feature (New feature or request) and performance (Performance / optimization related topics) labels on Feb 18, 2023
@fako1024 added this to the v4 Release milestone on Feb 18, 2023
@fako1024 self-assigned this on Feb 18, 2023
fako1024 (Collaborator, Author) commented Feb 22, 2023

Implementation looks decent already. Latest commit (8af308741c49b4b696f5ac5f20c97ab83a0705d8) yields the following results compared to current develop (so on top of the already implemented optimizations, cf. #44):

DB Size on disk:         959856 kiB (- 5 %)
Max resident memory:     655220 kiB (± 0 %)
Query runtime:            03.28 s   (-57 %)
CPU time:         (8.78 + 0.93) s   (-44 %)
Allocation size:           2.38 GiB (-31 %)
Allocated objects:     11245629     (-60 %)

As expected: This speeds up things quite a lot (the memory management improvements were mostly byproducts and not a direct consequence of the new metadata concept per se).

@els0r I assume the effects could be even more pronounced on a system with a spinning disk, so whenever you have time, feel free to give it a shot on a host of your choice (obviously the DB has to be converted from scratch - luckily conversion is now about 5-10x faster than before as well).

Things done:

  • Hide all GPFile interaction behind a GPDir structure (representing a day of data)
  • Add customized binary data serialization for metadata (optimized for speed & size on disk, without even having to rely on compression, by means of delta encoding and removal of redundancies - see the sketch after this list)
  • Use MMAP access for reading metadata to avoid any allocations and maximize throughput for these small but common accesses
  • Allow for sequential access to all blocks without having to go through hoops with timestamps / map access all around the place
  • Generalize & optimize memory pool concept for our use case (channel based / sync.Pool based depending on situation)
  • Support for bulk writes to DB (only used in legacy conversion tool to speed up things and reduce I/O load)
  • Refactor column concept (move to types sub-package as it is used in multiple places)
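
To illustrate the delta-encoding part of that custom serialization (names and layout are purely illustrative, not the actual on-disk format): block timestamps are strictly increasing, so storing varint-encoded deltas instead of full int64 values shrinks each entry to a couple of bytes.

package main

import (
	"encoding/binary"
	"fmt"
)

// encodeTimestamps stores the first timestamp verbatim and only the (small) deltas after it.
func encodeTimestamps(ts []int64) []byte {
	buf := make([]byte, 0, len(ts)*binary.MaxVarintLen64)
	var prev int64
	for i, t := range ts {
		v := t
		if i > 0 {
			v = t - prev // delta to the previous block timestamp
		}
		buf = binary.AppendVarint(buf, v)
		prev = t
	}
	return buf
}

// decodeTimestamps reverses the delta encoding for n entries.
func decodeTimestamps(buf []byte, n int) []int64 {
	ts := make([]int64, 0, n)
	var prev int64
	for i := 0; i < n; i++ {
		v, k := binary.Varint(buf)
		buf = buf[k:]
		if i > 0 {
			v += prev
		}
		ts = append(ts, v)
		prev = v
	}
	return ts
}

func main() {
	ts := []int64{1676678400, 1676678700, 1676679000, 1676679300} // 5-minute blocks
	enc := encodeTimestamps(ts)
	fmt.Println(len(enc), "bytes instead of", len(ts)*8) // deltas need far fewer bytes
	fmt.Println(decodeTimestamps(enc, len(ts)))
}

Since consecutive blocks are typically written at a fixed interval, most deltas are small and identical, which is what makes this pay off even without compression.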

fako1024 (Collaborator, Author) commented Feb 23, 2023

@els0r As for the legacy / additional metadata we still have:

type BlockMetadata struct {
	Timestamp            int64 `json:"timestamp"`
	PcapPacketsReceived  int   `json:"pcap_packets_received"`
	PcapPacketsDropped   int   `json:"pcap_packets_dropped"`
	PcapPacketsIfDropped int   `json:"pcap_packets_if_dropped"`
	PacketsLogged        int   `json:"packets_logged"`

	// As in Summary
	FlowCount uint64 `json:"flowcount"`
	Traffic   uint64 `json:"traffic"`
}

My proposal about those:

  • Timestamp -> Drop (directly available from dir and / or block timestamps)
  • PcapPacketsReceived -> Drop (Can be computed on metadata level from sum of packets in all flows minus packets dropped if required)
  • PcapPacketsDropped / PcapPacketsIfDropped -> Merge into one variable (PacketsDropped), since most sources don't even distinguish the "type" of drop, and move it into the new GPDir metadata (because that data is interesting)
  • PacketsLogged -> Drop (Not sure what that even is TBH, but if something was logged we can probably determine that number from elsewhere if required)
  • FlowCount -> Drop (Can be computed on metadata level from the sum of the block lengths)
  • Traffic -> Drop (Can be computed on metadata level from the sum of all bytes of the blocks - see the sketch below for both)
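
For illustration, deriving those two on the metadata level boils down to something like this (field names are placeholders, not the actual layout):

// blockMeta is a hypothetical per-block metadata entry.
type blockMeta struct {
	NumFlows       uint64 // number of flows in this block
	Traffic        uint64 // bytes in this block
	PacketsDropped uint64 // the one capture-level counter proposed to be kept
}

// dirTotals reconstructs the day-level counters from the per-block metadata,
// replacing the stored FlowCount / Traffic fields.
func dirTotals(blocks []blockMeta) (flowCount, traffic uint64) {
	for _, b := range blocks {
		flowCount += b.NumFlows
		traffic += b.Traffic
	}
	return
}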

So from my point of view, that leaves us with only the number of dropped packets on capture level (per block) to keep. What do you think? And should I take care of adding PacketsDropped to the new metadata on the current branch (and leave the cleanup of the "old" metadata to you, as discussed previously)?

fako1024 (Collaborator, Author) commented Feb 23, 2023

Some additional minor improvements to memory management (LZ4 cgo interface typing and some trivial zero-allocation string handling when dealing with paths) yield a further reduction in the number of allocated objects (the rest isn't affected within uncertainties, since these objects are small and don't incur any significant CPU load):

Allocated objects:     7464476     (-74 % w.r.t. develop)

els0r (Owner) commented Feb 24, 2023

Alright. Running the tests as we speak. Still in progress, since the conversion tool craps out for some days of data:

ERRO[0007] failed to get block from file set: unexpected amount of bytes after decompression, want 248, have -13
ERRO[0007] failed to get block from file set: unexpected amount of bytes after decompression, want 828, have -13
ERRO[0007] failed to get block from file set: unexpected amount of bytes after decompression, want 1172, have -13
ERRO[0007] failed to get block from file set: unexpected amount of bytes after decompression, want 544, have -13
ERRO[0007] failed to get block from file set: unexpected amount of bytes after decompression, want 932, have -13
ERRO[0007] failed to get block from file set: unexpected amount of bytes after decompression, want 436, have -13
panic: runtime error: index out of range [0] with length 0

goroutine 6 [running]:
github.com/els0r/goProbe/pkg/goDB/storage/gpfile.(*GPDir).Marshal(0xc000000000, {0x698840, 0xc00019a130})
	workspace/goProbe/pkg/goDB/storage/gpfile/gpdir.go:249 +0x565
github.com/els0r/goProbe/pkg/goDB/storage/gpfile.(*GPDir).Close(0xc000000000)
	workspace/goProbe/pkg/goDB/storage/gpfile/gpdir.go:330 +0x271
github.com/els0r/goProbe/pkg/goDB.(*DBWriter).WriteBulk(0xc000135e50, {0x0, 0x0, 0x25?}, 0xc000135d98?)
	workspace/goProbe/pkg/goDB/db_writer.go:186 +0x394
main.converter.convertDir({{0x7ffd2a1a4c6c?, 0x4?}, 0xc000016300?}, {{0xc00001e6b9?, 0xc000135f80?}, {0xc00020a080?, 0x1?}}, 0x0)
	workspace/goProbe/cmd/legacy/main.go:214 +0x8ed
main.main.func1()
	workspace/goProbe/cmd/legacy/main.go:60 +0x153
created by main.main
	workspace/goProbe/cmd/legacy/main.go:58 +0x2b2

I suspect this is due to faulty data inside the origin database, but I'm wondering whether we should just skip and continue writing.

As for the tests, I also ran into some other panics while querying:

./goQuery-60071cc5 -d db-v4-60071cc5 -i lacp1 -f -15m  sip,dip,dport,proto -n 5
panic: runtime error: slice bounds out of range [:8] with capacity 0

goroutine 1 [running]:
github.com/els0r/goProbe/pkg/goDB/storage/gpfile.(*GPDir).Unmarshal(0xc00016aaa0, {0x86d578?, 0xc0000143c8?})
	workspace/goProbe/pkg/goDB/storage/gpfile/gpdir.go:183 +0x76f
github.com/els0r/goProbe/pkg/goDB/storage/gpfile.(*GPDir).Open(0xc00016aaa0)
	workspace/goProbe/pkg/goDB/storage/gpfile/gpdir.go:111 +0x2ef
github.com/els0r/goProbe/pkg/goDB.(*DBWorkManager).CreateWorkerJobs(0xc000133880, 0x63f8203a, 0x63f823be, 0xc000138120)
	workspace/goProbe/pkg/goDB/DBWorkManager.go:126 +0x4ea
github.com/els0r/goProbe/pkg/query.createWorkManager({0x7ffca0373c4f?, 0x400?}, {0x7ffca0373c61, 0x5}, 0x203000?, 0x41799f?, 0x1000?, 0x7f87f0fcffff?)
	workspace/goProbe/pkg/query/query.go:449 +0xfa
github.com/els0r/goProbe/pkg/query.(*Statement).Execute(0xc000152000, {0x86d298, 0xc00001c170})
	workspace/goProbe/pkg/query/query.go:234 +0x948
github.com/els0r/goProbe/cmd/goQuery/commands.entrypoint(0xa48780, {0xc000138090, 0x1, 0x9})
	workspace/goProbe/cmd/goQuery/commands/root.go:184 +0x81e
github.com/spf13/cobra.(*Command).execute(0xa48780, {0xc000000150, 0x9, 0x9})
	/root/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:872 +0x694
github.com/spf13/cobra.(*Command).ExecuteC(0xa48780)
	/root/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:990 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
	/root/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:918
github.com/els0r/goProbe/cmd/goQuery/commands.Execute()
	workspace/goProbe/cmd/goQuery/commands/root.go:36 +0x25
main.main()
	workspace/goProbe/cmd/goQuery/main.go:6 +0x17

Not sure if the latter is related to the former in the sense that the conversion failed altogether.

fako1024 (Collaborator, Author) commented Feb 24, 2023

Thanks for testing! There's a few things that come to mind:

  • Obviously we should continue with the conversion; I'll modify the conversion tool accordingly.
  • The panic(s) shouldn't happen. I suspect I just missed a bounds check on the input / output, never assuming an empty slice. Will fix that, too (a rough sketch of both changes follows after this list).
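
Roughly what I have in mind for both points (sketch only, with hypothetical helpers standing in for the real conversion-tool code):

package main

import (
	"errors"
	"fmt"
	"log"
)

// getBlock / writeBlock are stand-ins for the actual file-set accessors.
var errDecompress = errors.New("unexpected amount of bytes after decompression")

func getBlock(ts int64) ([]byte, error) {
	if ts%3 == 0 { // simulate an occasional corrupt block
		return nil, errDecompress
	}
	return []byte{0x01, 0x02}, nil
}

func writeBlock(ts int64, data []byte) error { return nil }

func convertDay(timestamps []int64) error {
	for _, ts := range timestamps {
		block, err := getBlock(ts)
		if err != nil {
			// Point 1: log and skip the corrupt block instead of aborting the whole day
			log.Printf("failed to get block %d from file set: %v - skipping", ts, err)
			continue
		}
		// Point 2: guard against empty blocks before any slicing / indexing
		if len(block) == 0 {
			return fmt.Errorf("empty block at timestamp %d", ts)
		}
		if err := writeBlock(ts, block); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := convertDay([]int64{1, 2, 3, 4, 5, 6}); err != nil {
		log.Fatal(err)
	}
}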

Aside from those two obvious things, I'd like to make sure that the source DB is really corrupt and that there's no error on our end. Is there any way you could check the affected blocks for consistency during conversion? Also, is the source a full legacy DB, or are you converting from the "intermediate" DB you converted at an earlier stage of our current efforts? I'm asking because I can't tell from the traces what kind of source it is - in theory the tool should support both the legacy and the more "modern" format, but if it's the latter it may just be an error on my part, since I didn't have a DB to test that path on.

Recent commit(s) contain fixes for the above issues. Could you maybe try the following to make sure this really is a corruption on the source side: run a legacy goQuery on the pre-conversion DB in console logging mode; if those blocks are indeed corrupt, it should throw an error on all of them. If that's the case, all good - otherwise we're probably missing something on the conversion side (e.g. a mismatch in the cloned lz4cust code or similar)...

els0r (Owner) commented Feb 24, 2023

And something to sweeten your weekend. Conversion of a 19G DB in 5 minutes.
Easy peasy. With that speed, I have no problem just running an sshdo before
the rollout to v4.

[host:/shared/data/OSAGtmi]# du -sh db
19G db
[host:/shared/data/OSAGtmi]# du -sh db-v4-60071cc5/
12G db-v4-60071cc5/

Let's take the heaviest interface we can find:

[host:/shared/data/OSAGtmi]# goquery --list

      Iface    # of flows      Traffic    From                   Until
  ---------    ----------    ---------    -------------------    -------------------
        ...
      eth13      326.93 M     44.75 TB    2022-08-28 02:01:33    2023-02-24 18:54:36

(see what I did there ;)). That should be enough to get goquery to sweat a little bit.

[host:/shared/data/OSAGtmi]# goquery -i eth13 -f 1661644800 -l 1664496000 -n 5 sip,dip,dport,proto

                                              packets   packets                    bytes      bytes
           sip            dip  dport  proto        in       out      %                in        out      %
  10.103.64.25  10.224.200.80    445    TCP  167.17 M  414.97 M   3.23         960.68 GB   64.48 GB  14.77
   172.30.0.18   10.103.15.25    514    UDP      0.00  181.55 M   1.01           0.00  B  122.77 GB   1.77
   10.99.64.93   10.103.15.25    514    UDP      0.00  123.12 M   0.68           0.00  B   91.84 GB   1.32
  10.100.1.228      10.14.4.7    445    TCP   14.81 M   13.19 M   0.16         890.91 MB   90.85 GB   1.32
  10.224.15.51   10.103.15.25    514    UDP      0.00  116.55 M   0.65           0.00  B   75.24 GB   1.08
                                                  ...       ...                      ...        ...
                                               7.90 G   10.14 G                  3.09 TB    3.69 TB

       Totals:                                          18.04 G                             6.78 TB

Timespan / Interface : [2022-08-28 01:56:33, 2022-09-30 02:01:33] / eth13
Sorted by            : accumulated data volume (sent and received)
Query stats          : 1.03 M hits in 4.6s
Conditions:          : dport != 5551

With the new version that boils down to

[host:/shared/data/OSAGtmi]# ./goQuery-60071cc5 -d db-v4-60071cc5 -i eth13 -c "dport != 5551" -f 1661644800 -l 1664496000 -n 5 sip,dip,dport,proto

                                              packets   packets                    bytes      bytes
           sip            dip  dport  proto        in       out      %                in        out      %
  10.103.64.25  10.224.200.80    445    TCP  167.17 M  414.97 M   3.23         960.68 GB   64.48 GB  14.77
   172.30.0.18   10.103.15.25    514    UDP      0.00  181.54 M   1.01           0.00  B  122.76 GB   1.77   <-- this one differs due to the slightly different end dates; if I use '2022-09-30 02:01:33', the numbers are identical
   10.99.64.93   10.103.15.25    514    UDP      0.00  123.12 M   0.68           0.00  B   91.84 GB   1.32
  10.100.1.228      10.14.4.7    445    TCP   14.81 M   13.19 M   0.16         890.91 MB   90.85 GB   1.32
  10.224.15.51   10.103.15.25    514    UDP      0.00  116.55 M   0.65           0.00  B   75.23 GB   1.08
                                                  ...       ...                      ...        ...
                                               7.90 G   10.14 G                  3.09 TB    3.69 TB

       Totals:                                          18.04 G                             6.78 TB

Timespan / Interface : [2022-08-28 01:56:33, 2022-09-30 02:00:00] / eth13
Sorted by            : accumulated data volume (sent and received)
Query stats          : displayed top 5 hits out of 1.03 M in 2.9s
Conditions:          : dport != 5551

You essentially chopped query times in half. <3 seconds to crawl through
flow metadata of 1/3 of a billion flows and 44 TB of traffic: drop the mic
and grab a brewski.

Toodles,
L

fako1024 (Collaborator, Author) commented Feb 25, 2023

Nice, like it! Thanks a lot for confirming! I hope it's going to be even better once the changes from #47 are completed and merged. I would assume that the reduced GC load is even more pronounced the bigger the underlying DB is... Stay tuned!

Aside from that, the latest commit on this branch removes the legacy .meta information in favor of a streamlined, singular set of metadata governed by GPDir (as discussed yesterday). It should now contain all interesting global and per-block stats while still minimizing the storage requirements / access overhead. The legacy tool got faster as well (didn't measure, but I'd say another factor of roughly 2x, because it no longer has to write the duplicate / slow metadata). In addition, the interface summary.json has been removed. Consequently, the list target is broken for now (I left a TODO there). On the other hand, the any (or sub-set of interfaces) query works even after conversion again, because I've already provided a quick-shot implementation to fetch the pure list of interfaces from the DB.
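
The quick shot essentially just relies on the one-sub-directory-per-interface layout of the DB - roughly along these lines (simplified sketch, not the exact code):

package main

import (
	"fmt"
	"log"
	"os"
	"sort"
)

// listInterfaces returns the interface names by scanning the DB base directory,
// relying on the one-sub-directory-per-interface layout (no summary.json needed).
func listInterfaces(dbPath string) ([]string, error) {
	entries, err := os.ReadDir(dbPath)
	if err != nil {
		return nil, err
	}
	var ifaces []string
	for _, entry := range entries {
		if entry.IsDir() {
			ifaces = append(ifaces, entry.Name())
		}
	}
	sort.Strings(ifaces)
	return ifaces, nil
}

func main() {
	ifaces, err := listInterfaces("/path/to/db")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(ifaces)
}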

The DB got even smaller by another 8% w.r.t. the results posted above (probably less in your scenario because you have a better raw-to-metadata ratio to begin with, but for a "commodity" host / server like mine it's substantial):

DB Size on disk:         882008 kiB (- 8 %)

I'll finish up, extend the analyze-metadata tool to provide some insights into this new data and submit a PR.
