Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move to GoFlow2 for adding IPFIX dataLinkFrameSection support #75

Closed
ChrisDeh opened this issue Nov 2, 2021 · 10 comments
Closed

Move to GoFlow2 for adding IPFIX dataLinkFrameSection support #75

ChrisDeh opened this issue Nov 2, 2021 · 10 comments
Labels
ingest Issues regarding ingesting Data into NetMeta

Comments

@ChrisDeh
Copy link

ChrisDeh commented Nov 2, 2021

After opening a feature request for adding dataLinkFrameSection Support to goflow (cloudflare/goflow#111), i was told this is available in a fork of goflow called goflow2 (https://github.com/netsampler/goflow2)

I'd like to discuss the possibitly for moving to goflow2 as this looks way more active as a project than cloudflares goflow and improvements seem to be more likely.

@leoluk
Copy link
Member

leoluk commented Nov 2, 2021

Agreed, goflow2 is the way to go. Would you be interested in submitting a PR?

@ChrisDeh
Copy link
Author

ChrisDeh commented Nov 2, 2021

I'll try to submit a PR for that

@leoluk
Copy link
Member

leoluk commented Nov 4, 2021

FYI so you don't fall down the same rabbit hole: I gave it a quick try and it appears there are dependency conflicts that Bazel can't resolve. Will look into it.

leoluk added a commit that referenced this issue Nov 4, 2021
portmirror needs to be fixed in a follow-up.

Fixes #75
leoluk added a commit that referenced this issue Nov 5, 2021
portmirror needs to be migrated in a follow-up
in order to get rid of the dependency. The protos
are compatible, so it should continue to work.

Fixes #75
@fionera
Copy link
Collaborator

fionera commented Sep 14, 2022

goflow2 development is also already getting slower and they arent really sure how they want to do things:

Sadly goflow2 on our develop branch doesnt work with kafka because they changed the protobuf file multiple times and clickhouse doesnt support reading the new format with the old proto file. Thats one of the reasons that the develop branch is broken right now.

We have these options for fixing it:

  • Continue using (legacy) goflow
  • Continue using goflow2 and update the protofile (with the possibility that it gets changed again)
    • I saw the ProtobufSingle in Clickhouse and have to check if it works with our usecase and maybe supports the different files
  • Continue using goflow2, switch to JSON as format and use a VIEW in Clickhouse to adapt to possible changes (this should be benchmarked and checked how much overhead it created)
  • Fork goflow2 and maintain it ourself

@leoluk
Copy link
Member

leoluk commented Sep 15, 2022

Continue using (legacy) goflow

There is nothing obviously wrong with this - it works fine as-is. It just isn't very actively developed, but that may be a feature :-)

How did it break? The whole point of using protobuf in the first place is maintaining backwards compatibility, so I'm surprised it no longer works. None of the changes here seem problematic: https://github.com/netsampler/goflow2/commits/main/pb/flow.proto

@fionera
Copy link
Collaborator

fionera commented Sep 15, 2022

I dont know anymore if I checked the resulting data but the clickhouse was always throwing the following exception:

2022.07.22 11:30:27.119927 [ 80 ] {} void DB::StorageKafka::threadFunc(): Code: 444, e.displayText() = DB::Exception: Protobuf messages are corrupted or don't match the provided schema. Please note that Protobuf stream is length-delimited: every message is prefixed by its length in varint., Stack trace (when copying this message, always include the lines below)

So either the protobuf message changes in combination with the required length-delimiter is not working or goflow2 doesnt write it length-delimited. I will test that again

@leoluk
Copy link
Member

leoluk commented Sep 15, 2022

Oh no: https://sourcegraph.com/github.com/ClickHouse/ClickHouse@9f5cd35a6963cc556a51218b46b0754dcac7306a/-/blob/src/Formats/ProtobufReader.cpp?L382%3A5=#tab=references

That could pretty much be anything. Might require some ClickHouse gdb-ing to figure out...

@leoluk
Copy link
Member

leoluk commented Sep 15, 2022

Missing fixed length would be my main suspicion. Other than that, it's actually pretty hard to break proto compatibility (the field names aren't in the binary format, so unless the field numbers or types change, nothing should break even if everything is renamed like goflow2 did).

@lspgn
Copy link

lspgn commented Oct 7, 2022

Hello!
I'm the maintainer of GoFlow2.
I did some breaking changes a while ago as I wanted a clean slate originally but protobuf should have more or less the same fields.
You need to use the following in the CLI arguments of GoFlow2:

  • -format=pb ; and
  • -format.protobuf.fixedlen=true

Just make sure to use the protobuf with the names corresponding to the column names.

I just tried it locally with ClickHouse 22.6.9.11 and 21.5.6.6. No issue.

Let me know if I can help

@fionera fionera added the ingest Issues regarding ingesting Data into NetMeta label Dec 11, 2022
@fionera
Copy link
Collaborator

fionera commented Apr 17, 2023

Implemented with f4beb9c

@fionera fionera closed this as completed Apr 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ingest Issues regarding ingesting Data into NetMeta
Projects
None yet
Development

No branches or pull requests

4 participants