Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Run processors in config order #12113

Merged
merged 5 commits into from
Nov 8, 2022
Merged

fix: Run processors in config order #12113

merged 5 commits into from
Nov 8, 2022

Conversation

sspaink
Copy link
Contributor

@sspaink sspaink commented Oct 27, 2022

resolves #8016

This PR updates the sorting logic for processors so that the line number in the TOML config is considered. I made the design decision that if a user defined order is provided it will take precedence over the defined configuration order. So even if only one processors is defined with an order, and its order is high (e.g. "order=100") it will be run first. While it wasn't documented explicitly before, the examples made it seem order is required to start at 1 and for this change it makes sense to enforce it so that we can detect if an order is defined (default value will be of course 0). I updated the documentation accordingly, please advise if it isn't clear enough.

To make sure this solution works I ran the test 10,000 times locally, to ensure the new sorting works as expected.

go test -run TestConfig_MultipleProcessorsOrder -count 10000 ./config

If you use the current sorting logic instead, and run the same test multiple times it will fail.

func (rp RunningProcessors) Less(i, j int) bool { return rp[i].Config.Order < rp[j].Config.Order }

The reason the processors are random without sorting based on line number, is because the TOML AST table provides a map containing all the plugins and Go doesn't guarantee the order for maps. Duplicate plugins are actually stored in the same map key, and each duplicate plugins unique configuration is stored in a slice so these are sorted correctly. Therefore this random behavior is noticeable when you define multiple unique processor plugins. Therefore the test data I added makes sure to define multiple different processors in the test config so that the problem arises quickly when you run the test multiple times.

@telegraf-tiger telegraf-tiger bot added the feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin label Oct 27, 2022
@sspaink sspaink added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Oct 27, 2022
@sspaink sspaink changed the title feat: Run processors in config order fix: Run processors in config order Oct 27, 2022
@sspaink sspaink added fix pr to fix corresponding bug and removed feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin labels Oct 27, 2022
Copy link
Contributor

@MyaLongmire MyaLongmire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code looks good. Should there also be ordering for aggregators?

@srebhan
Copy link
Contributor

srebhan commented Oct 27, 2022

Hmmm using the line will now randomly mix the order if you use multiple files, doesn't it? Before the order was kept within the files at least.

Sebastian Spaink added 2 commits October 31, 2022 12:55
Order needs to be sorted across files
Non-ordered processors take precedence
@sspaink
Copy link
Contributor Author

sspaink commented Oct 31, 2022

@srebhan good catch! I hadn't considered what would happen when using multiple configuration files. So before all processors were added to c.Processors and then sorted each time LoadConfigData is called with

return rp[i].Config.Order < rp[j].Config.Order

To make sure I got my logic right, so that means:

  1. Processors without a defined order will run before processors with a defined order
  2. When using multiple config files with processors that don't have a order, will stay in place
  3. When using multiple config files with processors that have a defined order, they will be sorted against processors in all files

I've updated the code so that the above properties are maintained and that the processors are only sorted based on line for each file by associating each processor with a unique ID.

So the order:

  1. Processors without "order" for file 1 sorted based on config order
  2. Processors without "order" for file 2 sorted based on config order
  3. Processors without "order" for file ??? sorted based on config order
  4. Processors with "order" across all files sorted based on provided number

Does this seem like expected behavior?

@sspaink
Copy link
Contributor Author

sspaink commented Nov 2, 2022

@srebhan did you want to take another look before we merged it?

@Hipska Hipska self-requested a review November 8, 2022 12:47
Copy link
Contributor

@Hipska Hipska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great improvement!

I think this will make the order config less useful as you can now just be sure by reordering the config file?
Nevermind this, It still makes sense to have a processor in a second config file you want to be executed between other processors from a previous config file.

docs/CONFIGURATION.md Outdated Show resolved Hide resolved
@telegraf-tiger
Copy link
Contributor

telegraf-tiger bot commented Nov 8, 2022

@sspaink sspaink merged commit 6a29dcb into master Nov 8, 2022
@sspaink sspaink deleted the processorconfig branch November 8, 2022 21:16
@Hipska
Copy link
Contributor

Hipska commented Nov 9, 2022

Hey guys, I didn't have time for final review..

The tests aren't specific enough as processors have the same name, they should also have an alias or something uniquely identifying in the expectedOrder.

@matthijskooijman
Copy link
Contributor

While it wasn't documented explicitly before, the examples made it seem order is required to start at 1 and for this change it makes sense to enforce it so that we can detect if an order is defined (default value will be of course 0). I updated the documentation accordingly, please advise if it isn't clear enough.

Looking at the code, it does not seem that this is (still) true, I do not see any code that enforces order to start at 1. Also, I would suggest to not require this at all, since it makes maintaining config files easier if you can just do (BASIC-style) order of 10,20,30, etc. so you can easily insert some processors in the chain later without having to renumber everything, or you can add some organization to your processors by putting all that needs to be early in e.g. 100-200, later ones in 200-300, etc.

@srebhan
Copy link
Contributor

srebhan commented Nov 9, 2022

Sorry, I didn't find the time to take another look yet. My proposal would have been to initialize Order to some high number in the processor model and use stable sort. I think this PR pretends to guarantee a certain order but actually might generate a random order from config loading...

@matthijskooijman
Copy link
Contributor

Sorry, I didn't find the time to take another look yet. My proposal would have been to initialize Order to some high number in the processor model and use stable sort. I think this PR pretends to guarantee a certain order but actually might generate a random order from config loading...

That was also my impression from looking at the code - that sorting is not stable, though I believe it does always result in an order that satisfies the ordering requirements (e.g. all unordered before ordered etc).

However, I do also think that ordering could become simpler and stable by implementing something like:

  1. Compare elements by order (for entries without order, just use 0, or MAX_INT maybe)
  2. If order is equal, compare by file id
  3. If file id is equal, compare by line number

Step 2 could be problematic if file ids are not predictable (but I guess they could be made predictable by loading files in lexical order). Omitting step 2 would prevent stable sorting (you might get ties in the line number comparison), though you could maybe swap step 2 and 3 so line number decides the order and file id is only used to break ties.

@Hipska
Copy link
Contributor

Hipska commented Nov 9, 2022

In my idea the loading of config files should be in this order:

  1. telegraf.conf
  2. telegraf.d/*.conf (alphabetical order)

Thus the resulting config and their ordering would be like if you had 1 conf file with all these files concatenated in the order defined above.

@srebhan
Copy link
Contributor

srebhan commented Nov 9, 2022

However, I do also think that ordering could become simpler and stable by implementing something like:

  1. Compare elements by order (for entries without order, just use 0, or MAX_INT maybe)
  2. If order is equal, compare by file id
  3. If file id is equal, compare by line number

I really don't get the reason for all this "order by line"... First of all, if you need a guaranteed order, use order, otherwise you don't case obviously (which is true for the majority of use-cases IMO). Everything else is trying to provide an order where we cannot make any guarantee. If we specify a directory, files might get loaded in random order, so we should sort those. Then what if we do have multiple directories? Do you sort by order of commandline-options or by name of the directories or by the files within? I really do have the fear that this looks like we guarantee an order but actually do not.

And using the line number is nonsense IMO as the TOML parser will give the plugins in the order specified in the file. So a stable-sort will preserve this order. The only question is should ordered processors be executed before unordered ones or after. Before this PR, they were executed before (order for them was initialized to zero) the ordered ones. With this PR, the ordering is enormously complex:

  • if we use only a single config file, the unordered ones are first, then the ordered ones with line-number as criterium if order is not decisive
  • if every processor is defined in a separate file without order specified, the ordering is as before (unordered first, the ordered in order ;-))
  • if multiple files with mixed order and unordered, we get an interlaced complex ordering pattern. The processors get grouped by file first (in random order as uuid is random) and within each group we get pattern 2.

I think you can get more easily understandable orderings with less code by just using the "old" way and use a stable sort under the assumption that the TOML parser will provide the plugin sections in order. Just my two cent...

@matthijskooijman
Copy link
Contributor

I really don't get the reason for all this "order by line"... First of all, if you need a guaranteed order, use order, otherwise you don't case obviously (which is true for the majority of use-cases IMO).

For me, having to explicitly specify order is cumbersome and prone to mistakes. Simply writing the processors in the order they need to run is, to me, a lot more intuitive and makes the config more readable by providing visual structure. But I can see this can be largely a matter of taste.

One reason to want to require explicit ordering could be if, lacking an explicit order, you'd want to support running processors in parallel, but I don't think the current config structure allows for this at all. So if you have to pick some order, might as well use the file order.

If we specify a directory, files might get loaded in random order, so we should sort those. Then what if we do have multiple directories? Do you sort by order of commandline-options or by name of the directories or by the files within? I really do have the fear that this looks like we guarantee an order but actually do not.

That's a good point, but it also seems this is solvable. Alternatively, you could ignore this and only do order-by-file-order within each file, with no guarantees about how files are ordered relative to each other (which I think is how I'd be using file ordering anyway, though maybe others have other expectations).

I think you can get more easily understandable orderings with less code by just using the "old" way and use a stable sort under the assumption that the TOML parser will provide the plugin sections in order. Just my two cent...

That sounds like a reasonable approach, yeah. I guess that (at least that behavior) is what I sortof aimed for in my previous comment, except I misunderstood the meaning of stablesort (thinking it referred to a sorting criterium that would result in an unambiguous/stable ordering, but I see now that it is a sort algorithm that preserves relative ordering of equal keys).

@sspaink
Copy link
Contributor Author

sspaink commented Nov 9, 2022

Seems this was merged too quickly, sorry about that! Thanks everyone for the input but I am not entirely following what the problem is quite yet, so couple of questions.

And using the line number is nonsense IMO as the TOML parser will give the plugins in the order specified in the file. So a stable-sort will preserve this order.

@srebhan I was under the impression this isn't the case, the TOML parser returns the plugins stored in a map (https://github.com/influxdata/telegraf/blob/master/config/config.go#L455) so it isn't guaranteed the order returned matches the file order. Is there another place the parser does maintain the order that we can use?

if multiple files with mixed order and unordered, we get an interlaced complex ordering pattern. The processors get grouped by file first (in random order as uuid is random) and within each group we get pattern 2.

I intended for the multiple configuration file sorting behavior to stay the same as before, were processor plugins with a defined "order" will be sorted based on all plugins across all files. And the only grouping will be for the plugins without a defined order, which should stay the same because isn't return rp[i].Config.Order < rp[j].Config.Order a stable comparison? No swapping will be made if the values are equal? I definitely could be overlooking something here but I not sure how this change is different from before?

Alternatively, you could ignore this and only do order-by-file-order within each file, with no guarantees about how files are ordered relative to each other

From the discussion in the issue this was what I understood as the expected behavior, the only exception would be maintaining previous behavior when an "order" is defined.

@Hipska
Copy link
Contributor

Hipska commented Nov 9, 2022

@sspaink could you already fix the current tests so there is no ambiguity in the expected plugin order list?

@srebhan
Copy link
Contributor

srebhan commented Nov 9, 2022

@sspaink this one isn't problematic as it will only return the category (inputs, outputs, etc) in random order, but I completely overlooked this map which is indeed problematic as you say. Now from what I want is to rather sort the processors here by line before adding them to c.Processors and at the end of loading sort the processors by order with a stable sort. This greatly simplifies the code in the model, makes config-loading specific fields superfluous in the model and guarantees the loading order to be preserved with the files. What do you think?

Btw. the ID you introduced will give me great headaches in #12166... :-)

@sspaink
Copy link
Contributor Author

sspaink commented Nov 9, 2022

After discussion I think we are clear on the next steps to help reduce the complexity of the sort and ensure that multiple configuration files are handled as expected.

  1. For each file sort all the processor plugins by line number and keep this list in a separate slice
  2. Once all files have been processed, finalize c.Processor by appending all separate slices
  3. Sort c.Processor by order as it did originally

I'll work on creating a new pull request to implement this and I will look into updating the tests as well so there is no ambiguity.

@Hipska
Copy link
Contributor

Hipska commented Nov 10, 2022

Sounds correct, let us know here the PR number 😉

@srebhan srebhan mentioned this pull request Nov 30, 2022
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix pr to fix corresponding bug ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

processors should run in the order they appear in the config
7 participants