-
Notifications
You must be signed in to change notification settings - Fork 336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support regex in topic protobuf mappings #584
Conversation
|
I fixed merge conflicts. @weeco / @rikimaru0345 can you take a look on this? |
Hey @upils , |
@weeco thanks for your answer. I have now added matching topics in the strictMappingsByTopic (see https://github.com/redpanda-data/console/pull/584/files#diff-cb0859875b7a3634d6b559a35d024fc14ba558a7960fabbd6287d716423246a7R296) so a given topic never have to be searched twice in the regexMappingsByTopic. I wrote a benchmark with a variable proportion of strict/regex mappings. Before the fix, performances were degrading following the ratio of regex mappings:
After the fix, it shows no significant difference:
I may add more use cases to the benchmark if I think of some corner cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey! Your feature is awesome and I really need it for my deployment!
Here you can find the solution how to avoid performance degradation to get a green light for this feature!
It also will be nice to have a small example in the configuration mapping in the same project. Because it is the only source of information right behind the mapping data.
Have a good day!
|
||
var match bool | ||
for _, rMapping := range s.regexMappingsByTopic { | ||
match = rMapping.r.MatchString(topicName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a performance degradation for a non-matched topic.
Each time for non-matched topics, the algorithm iterate and match all known regex.
The maximum number of searches has to be estimated as total unique topic names O(N).
I suggest saving the nil
result to the strict mapping s.strictMappingsByTopic[topicName]
and checking on the nil
output at the beginning of this function to avoid another loop searching by regex.
233cfd9
to
4e3d357
Compare
@upils were you able to sign the Contributor License Agreement? I honestly can't tell from the bot 😅 Looks like once that is sorted and you get an approval, this PR is ready for prime time! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@upils , @brycefisher , @weeco
Please, take a look at my suggestions in the comments.
With real-world test case suggestions.
I hope it will help Redpanda Console to perform at maximum speed with less configuration required!
backend/pkg/proto/service.go
Outdated
fsSvc *filesystem.Service | ||
schemaSvc *schema.Service | ||
strictMappingsByTopic map[string]config.ProtoTopicMapping | ||
regexMappingsByTopic map[string]regexProtoTopicMapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are only iterated by regexMapping but never search in the map by regex definition. So the map data structure is not suited for this.
Another point is key-value ordering by range: the key order does not guarantee for map.
A single topic may match by more than one regex.
So I suggest iterating in the same order as mappings appear in config.
In Go, the order in which the range keyword iterates over the keys of a map is not specified and is not guaranteed to be the same from one iteration to the next.
That is why I suggest using the array or Slice.
mappingsByTopic map[string]config.ProtoTopicMapping
mappingsRegex []regexProtoTopicMapping
Tests
First test case for validation:
mappings:
- topicName: tb_core.notifications.*
isRegex: true
valueProtoType: transport.ToCoreNotificationMsg
- topicName: tb_core.*
isRegex: true
valueProtoType: transport.ToCoreMsg
inputs (first matches once, second matches for both):
tb_core.notifications.tb-core-0
tb_core.0
outputs:
transport.ToCoreNotificationMsg
transport.ToCoreMsg
Second test case (first regex always win):
mappings:
- topicName: tb_core.*
valueProtoType: transport.ToCoreMsg
- topicName: tb_core.notifications.*
valueProtoType: transport.ToCoreNotificationMsg
inputs (first matches once, second matches for both):
tb_core.notifications.tb-core-0
tb_core.0
outputs:
transport.ToCoreMsg
transport.ToCoreMsg
return mapping, nil | ||
} | ||
|
||
var match bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, validate the complexity with a simple test.
It is convenient to extract the method that iterates by range s.regexMappingsByTopic.
Let's assume that the method will have the name findMappingByRegex(topicName string) (mapping config.ProtoTopicMapping, err error)
Test case
Call getMatchingMapping
for each topic 3 times. Verify exactly one method call findMappingByRegex
Mappings:
- topicName: "tb_core.0"
isRegex: false
valueProtoType: transport.ToCoreMsg
- topicName: "tb_core.notifications.*"
isRegex: true
valueProtoType: transport.ToCoreNotificationMsg
Inputs (strict, regex, none-mapping):
tb_core.0
tb_core.notifications.tb-core-0
tb_rule_engine.main.9
Verify call getMatchingMapping (times), call findMappingByRegex
(times):
3, 0 strict
3, 1 regex
3, 1 no match
TopicName string `yaml:"topicName"` | ||
|
||
// IsRegex indicates if the topicName must be handled as a regex and match multiple topic names | ||
IsRegex bool `yaml:"isRegex"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After all suggestions...
Why can we not approach any topic name as a regex?
Why do we need to increase the complexity of the mapping?
Let's try to compile every topicName
as regex.
For the Kafka name policy, it is completely fine!
If the topic name can not be compiled, let's put a warning to the logs and recognize such a topicName
as a strict match to not break any exotic deployments.
Yes, but I guess I messed up with the mail address I used in my commits. I will rewrite them and it should be good. |
raising it up |
@upils ping on rewriting the commit emails. |
redpanda-data#1122) * backend: check for connect clusters in new connect service startup to avoid confusing logs * backend: improve check for connect cluster len * backend: code whitespace cleanup
…e as a regex first
Hey @smatvienko-tb @twmb @brycefisher . Sorry for this long hiatus. I fixed the commit author and started to apply the changes suggested by @smatvienko-tb . I need to add test cases but let me know if this is what you had in mind. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I left two comments. Overall I'd like to suggest to change this PR so that regex strings are recognized if the input is wrapped into two slashes (see my other PR comment). We can implement a custom type mappings:
- topicName: normal-exact-match
keyProtoType: "foo"
- topicName: /some-topic-prefix-matc.*/
keyProtoType: "bar" You can do so by creating a type such as: type RegexpOrLiteral struct {
literal string
*regexp.Regexp
}
func (r *RegexpOrLiteral) String() string {
if r.literal != "" {
return r.literal
}
if r.Regexp != nil {
return r.Regexp.String()
}
return ""
}
// UnmarshalText unmarshals json into a regexp.Regexp
func (r *RegexpOrLiteral) UnmarshalText(b []byte) error {
regex, err := compileRegex(string(b))
if err == nil {
r.Regexp = regex
}
r.literal = string(b)
return nil
}
// MarshalText marshals regexp.Regexp as string
func (r *RegexpOrLiteral) MarshalText() ([]byte, error) {
if r.Regexp != nil {
return []byte(r.Regexp.String()), nil
}
return []byte(r.literal), nil
} Additionally koanf (our config management lib) needs to be configured so that it will try the TextUnmarshalling, by adding the console/backend/pkg/config/config.go Line 133 in 94ef19b
The console/backend/pkg/config/util_regex.go Lines 19 to 36 in 94ef19b
What do you think @upils @smatvienko-tb ? |
logger.Info("creating Kafka connect HTTP clients and testing connectivity to all clusters") | ||
clientsByCluster := make(map[string]*ClientWithConfig) | ||
|
||
if len(cfg.Clusters) == 0 { | ||
return &Service{ | ||
Cfg: cfg, | ||
Logger: logger, | ||
ClientsByCluster: clientsByCluster, | ||
Interceptor: interceptor.NewInterceptor(), | ||
}, nil | ||
} | ||
|
||
// 1. Create a client for each configured Connect cluster | ||
clientsByCluster := make(map[string]*ClientWithConfig) | ||
logger.Info("creating Kafka connect HTTP clients and testing connectivity to all clusters") | ||
|
||
for _, clusterCfg := range cfg.Clusters { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, I assume something went wrong when syncing your fork? Please fix this
r, err := regexp.Compile(mapping.TopicName) | ||
if err != nil { | ||
strictMappingsByTopic[mapping.TopicName] = mapping | ||
logger.Warn("topic name handled as a strict match", zap.String("topic_name", mapping.TopicName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as of today where everyone use "strict matches" this would print one warning message for every single configured topic name. I think we need to change this. I agree that we want to be explicit about what's being used since the regexp compilation can fail.
I suggested something in my comment to overcome this issue
@brycefisher if you'd like to take this on, would you be able to fork and address suggestions from @weeco just above? |
Yes! Happy to put the finishing touches on this. |
We'll move this to unscheduled on our side, but please ping once y'all are able to uptake this and we can re-review. |
This should be addressed now via #1291. |
Fixing #405
This is still a work in progress (need more manual and unit tests) but If you find time you can validate you are OK with the direction I am heading.
I have concerns about performance if every message end up matching with the last regex in a long list. I may add a benchmark to get an idea of the difference between regex matching vs directly matching in the map (I expect a huge difference).