@KalmanMeth
Collaborator

@KalmanMeth KalmanMeth commented Feb 16, 2022

Took some inspiration from go-pipes.
This change allows the output of the same ingest stage to be directed to multiple pipelines.
We allow a single ingest stage and a single decode stage.
After that, transform, extract, encode, and write stages may appear in any order.
Sample syntax:

pipeline:
  - stage: ingest
    name: ingest1
    params:
      ingest:
        type: file_loop
        file:
          filename: playground/goflow2_input.txt
  - stage: decode
    name: decode1
    follows: ingest1
    params:
      decode:
        type: json
  - stage: transform
    name: generic1
    follows: decode1
    params:
      transform:
        type: generic
        generic:
          rules:
             ..........
  - stage: write
    name: write1
    follows: generic1
    params:
      write:
        type: stdout
  - stage: transform
    name: generic2
    follows: decode1
    params:
      transform:
        type: generic
        generic:
          rules:
             .......
  - stage: write
    name: write2
    follows: generic2
    params:
      write:
        type: stdout

This example shows two generic transforms receiving the same flows from the decoder; each applies its own transformation and writes its own output.
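To make the `follows` wiring concrete, here is a minimal Go sketch of how a list of stages like the one above could be turned into a fan-out graph. The `Stage` type and the `successors` function are hypothetical names used only for illustration; they are not taken from this PR. The sketch also assumes that the single stage without a `follows` field is the ingest stage.

```go
package main

import (
	"fmt"
	"sort"
)

// Stage mirrors one entry of the `pipeline` list in the sample above.
// The type and its field names are illustrative, not the project's own.
type Stage struct {
	Name    string
	Follows string // empty for the root (ingest) stage
}

// successors maps each stage name to the names of the stages that follow it.
// It rejects duplicate names, references to unknown stages, and any
// configuration that does not have exactly one root stage.
func successors(stages []Stage) (map[string][]string, error) {
	known := make(map[string]bool, len(stages))
	for _, s := range stages {
		if known[s.Name] {
			return nil, fmt.Errorf("duplicate stage name %q", s.Name)
		}
		known[s.Name] = true
	}
	next := make(map[string][]string)
	roots := 0
	for _, s := range stages {
		if s.Follows == "" {
			roots++
			continue
		}
		if !known[s.Follows] {
			return nil, fmt.Errorf("stage %q follows unknown stage %q", s.Name, s.Follows)
		}
		next[s.Follows] = append(next[s.Follows], s.Name)
	}
	if roots != 1 {
		return nil, fmt.Errorf("expected exactly one root stage, found %d", roots)
	}
	return next, nil
}

func main() {
	stages := []Stage{
		{Name: "ingest1"},
		{Name: "decode1", Follows: "ingest1"},
		{Name: "generic1", Follows: "decode1"},
		{Name: "write1", Follows: "generic1"},
		{Name: "generic2", Follows: "decode1"},
		{Name: "write2", Follows: "generic2"},
	}
	next, err := successors(stages)
	if err != nil {
		fmt.Println("invalid pipeline:", err)
		return
	}
	// Sort the keys so the printed order is stable.
	names := make([]string, 0, len(next))
	for name := range next {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s -> %v\n", name, next[name])
	}
}
```

With the sample configuration, `decode1` ends up with two successors (`generic1` and `generic2`), which is the fan-out described above.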

@KalmanMeth
Collaborator Author

The current state of the code does not allow command-line parameters. With multiple repeating stages in any order, we need to be explicit as to which stage (with its parameters) comes after which other stage. Let's discuss what command-line parameters can still be provided.

@KalmanMeth
Collaborator Author

@eranra @mariomac I would like to discuss the syntax of the YAML config file and which command-line parameters can still be supported.

@KalmanMeth
Collaborator Author

Revised format of the yaml file:

pipeline:
  - stage: ingest
    name: ingest1
  - stage: decode
    name: decode1
    follows: ingest1
  - stage: transform
    name: generic1
    follows: decode1
  - stage: write
    name: write1
    follows: generic1
  - stage: transform
    name: generic2
    follows: decode1
  - stage: write
    name: write2
    follows: generic2
parameters:
  - name: decode1
    decode:
      type: json
  - name: generic1
    transform:
      type: generic
      generic:
        rules:
          - input: Bytes
            output: v1_bytes
          - input: DstAddr
            output: v1_dstAddr
          - input: Packets
            output: v1_packets
          - input: SrcPort
            output: v1_srcPort
  - name: write1
    write:
      type: stdout
  - name: generic2
    transform:
      type: generic
      generic:
        rules:
          - input: Bytes
            output: v2_bytes
          - input: DstAddr
            output: v2_dstAddr
          - input: Packets
            output: v2_packets
          - input: SrcPort
            output: v2_srcPort
  - name: ingest1
    ingest:
      type: file_loop
      file:
        filename: playground/goflow2_input.txt
  - name: write2
    write:
      type: stdout

I updated all the tests to work with the new format.
write-loki does not yet correctly combine its default settings with those specified in the YAML.
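In the revised format, the `pipeline` list carries only the topology and the `parameters` list carries the settings, with the stage name as the join key. A minimal Go sketch of that lookup follows. The `PipelineEntry` and `StageParams` types and the `bindParams` function are hypothetical names chosen for this illustration and do not come from the PR.

```go
package main

import "fmt"

// PipelineEntry is one item of the `pipeline` list (topology only).
type PipelineEntry struct {
	Name    string
	Follows string
}

// StageParams is one item of the `parameters` list. Settings holds the
// remaining keys of the entry, e.g. {"decode": {"type": "json"}}.
type StageParams struct {
	Name     string
	Settings map[string]interface{}
}

// bindParams looks up the parameters of every pipeline stage by name.
// The order of the `parameters` list does not matter. Entries under
// `parameters` that no stage refers to are ignored in this sketch.
func bindParams(pipeline []PipelineEntry, params []StageParams) (map[string]StageParams, error) {
	byName := make(map[string]StageParams, len(params))
	for _, p := range params {
		if _, dup := byName[p.Name]; dup {
			return nil, fmt.Errorf("parameters for %q defined more than once", p.Name)
		}
		byName[p.Name] = p
	}
	bound := make(map[string]StageParams, len(pipeline))
	for _, e := range pipeline {
		p, ok := byName[e.Name]
		if !ok {
			return nil, fmt.Errorf("stage %q has no entry under parameters", e.Name)
		}
		bound[e.Name] = p
	}
	return bound, nil
}

func main() {
	pipeline := []PipelineEntry{
		{Name: "ingest1"},
		{Name: "decode1", Follows: "ingest1"},
	}
	// Deliberately listed in a different order than the pipeline.
	params := []StageParams{
		{Name: "decode1", Settings: map[string]interface{}{
			"decode": map[string]interface{}{"type": "json"},
		}},
		{Name: "ingest1", Settings: map[string]interface{}{
			"ingest": map[string]interface{}{"type": "file_loop"},
		}},
	}
	bound, err := bindParams(pipeline, params)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	for _, e := range pipeline {
		fmt.Printf("%s: %v\n", e.Name, bound[e.Name].Settings)
	}
}
```

Joining by name is what lets the sample above list `ingest1` near the end of `parameters` even though it is the first stage of the pipeline.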

@KalmanMeth
Collaborator Author

KalmanMeth commented Feb 28, 2022

Refactored some code and fixed the tests for write-loki. Updated the config format to remove redundancy. New format:

pipeline:
  - name: ingest1
  - name: decode1
    follows: ingest1
  - name: generic1
    follows: decode1
  - name: write1
    follows: generic1
  - name: generic2
    follows: decode1
  - name: write2
    follows: generic2
parameters:
  - name: decode1
    decode:
      type: json
  - name: generic1
    transform:
      type: generic
      generic:
        rules:
          - input: Bytes
            output: v1_bytes
          - input: DstAddr
            output: v1_dstAddr
          - input: Packets
            output: v1_packets
          - input: SrcPort
            output: v1_srcPort
  - name: write1
    write:
      type: stdout
  - name: generic2
    transform:
      type: generic
      generic:
        rules:
          - input: Bytes
            output: v2_bytes
          - input: DstAddr
            output: v2_dstAddr
          - input: Packets
            output: v2_packets
          - input: SrcPort
            output: v2_srcPort
  - name: ingest1
    ingest:
      type: file_loop
      file:
        filename: playground/goflow2_input.txt
  - name: write2
    write:
      type: stdout
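With the `stage` field removed from the `pipeline` list, the kind of each stage is only visible through the key present in its `parameters` entry (`ingest`, `decode`, `transform`, `write`, and so on). The Go sketch below shows one way a reader of this format could recover it. This is an assumption about how the format can be interpreted, not a description of the PR's implementation; `stageKinds` and `stageKind` are hypothetical names.

```go
package main

import "fmt"

// stageKinds lists the stage kinds mentioned in this PR's description.
var stageKinds = []string{"ingest", "decode", "transform", "extract", "encode", "write"}

// stageKind returns the single stage kind whose key is present in one
// `parameters` entry. It fails if no kind, or more than one kind, is set.
func stageKind(entry map[string]interface{}) (string, error) {
	found := ""
	for _, kind := range stageKinds {
		if _, ok := entry[kind]; !ok {
			continue
		}
		if found != "" {
			return "", fmt.Errorf("entry sets both %q and %q", found, kind)
		}
		found = kind
	}
	if found == "" {
		return "", fmt.Errorf("entry sets no known stage kind")
	}
	return found, nil
}

func main() {
	// Equivalent to the `decode1` entry of the sample `parameters` list.
	entry := map[string]interface{}{
		"name":   "decode1",
		"decode": map[string]interface{}{"type": "json"},
	}
	kind, err := stageKind(entry)
	if err != nil {
		fmt.Println("invalid entry:", err)
		return
	}
	fmt.Println(entry["name"], "is a", kind, "stage")
}
```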

@KalmanMeth KalmanMeth changed the title from "multipipe: work-in-progress" to "multipipe" Feb 28, 2022
@codecov-commenter

codecov-commenter commented Feb 28, 2022

Codecov Report

Merging #83 (e4bf6c3) into main (510ca74) will decrease coverage by 1.62%.
The diff coverage is 52.95%.

@@            Coverage Diff             @@
##             main      #83      +/-   ##
==========================================
- Coverage   56.05%   54.43%   -1.63%     
==========================================
  Files          45       46       +1     
  Lines        2501     2662     +161     
==========================================
+ Hits         1402     1449      +47     
- Misses       1016     1118     +102     
- Partials       83       95      +12     
Flag Coverage Δ
unittests 54.43% <52.95%> (-1.63%) ⬇️

Impacted Files                                  Coverage Δ
cmd/flowlogs-pipeline/main.go                   0.00% <0.00%> (ø)
pkg/confgen/confgen.go                          46.87% <0.00%> (ø)
pkg/confgen/flowlogs2metrics_config.go          0.00% <0.00%> (ø)
pkg/pipeline/extract/aggregate/aggregate.go     97.00% <ø> (ø)
pkg/pipeline/ingest/ingest_collector.go         0.00% <0.00%> (ø)
pkg/pipeline/utils/exit.go                      100.00% <ø> (ø)
pkg/config/config.go                            50.00% <50.00%> (ø)
pkg/pipeline/extract/aggregate/aggregates.go    74.28% <50.00%> (+5.39%) ⬆️
pkg/pipeline/pipeline.go                        55.24% <56.45%> (-19.12%) ⬇️
pkg/test/utils.go                               78.94% <60.00%> (-21.06%) ⬇️
... and 20 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 510ca74...e4bf6c3. Read the comment docs.

  - name: write1
    write:
      type: stdout
>>>>>>> implemented multi-pipe and ported tests; Loki not working properly

This line seems to have been leaked from a merge resolution.

Collaborator

@ronensc ronensc left a comment

This PR includes many changes, so the review has many comments too...

A general note: I've seen many variables with json in their name whose actual type is map[string]interface{}.
Personally, I wouldn't include json in their name because they are in an internal format, not in json format.
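A small Go sketch of the distinction behind this naming note: the raw bytes are JSON, but once decoded the value is a plain Go map. The names `decodeLine`, `rawJSON`, and `flowEntry` are hypothetical and only illustrate the suggestion; they are not identifiers from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeLine parses one JSON-encoded line into the generic map form.
// Only the input is JSON; the returned value is an ordinary Go map.
func decodeLine(line []byte) (map[string]interface{}, error) {
	var entry map[string]interface{}
	if err := json.Unmarshal(line, &entry); err != nil {
		return nil, err
	}
	return entry, nil
}

func main() {
	// rawJSON really is JSON text, so "JSON" in its name is accurate.
	rawJSON := []byte(`{"Bytes": 20800, "SrcPort": 443}`)

	// flowEntry is the decoded, internal representation, so its name
	// describes what it holds rather than the format it came from.
	flowEntry, err := decodeLine(rawJSON)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(flowEntry["SrcPort"])
}
```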

log "github.com/sirupsen/logrus"
)

// for specified params structure, return its corresponding (json) string from config.parameters
Collaborator

Ditto

@ronensc
Collaborator

ronensc commented Mar 1, 2022

@KalmanMeth you probably know but just to make sure, in the Conversation tab, GitHub hides 15 of my comments because there are too many. You need to click "Load more" to view them.

Collaborator

@eranra eranra left a comment

@KalmanMeth I don't see the api.md file in this PR. It should be updated to reflect all the changes and improvements you made to the API.

@eranra
Collaborator

eranra commented Mar 1, 2022

@KalmanMeth the flowlogs-pipeline.conf.yaml file needs to be updated to reflect the new structure. For that file to be generated, you need to change https://github.com/netobserv/flowlogs-pipeline/blob/be404e4866d9f957d0e66bb4bd8f638abae6a7b1/pkg/confgen/flowlogs2metrics_config.go so that it creates the configuration in the new format.

BTW: you need to execute make generate-configuration to create the flowlogs-pipeline.conf.yaml file

@KalmanMeth
Collaborator Author

> @KalmanMeth you probably know but just to make sure, in the Conversation tab, GitHub hides 15 of my comments because there are too many. You need to click "Load more" to view them.

I missed them earlier. Now they are addressed.

Collaborator

@eranra eranra left a comment

Looks good to me :-)

@KalmanMeth KalmanMeth merged commit 6d1be78 into netobserv:main Mar 2, 2022
@KalmanMeth KalmanMeth deleted the multipipe branch January 29, 2023 11:08