Replace config subscriber thread with handler-based config update mechanism #352
Conversation
Replaced by handler-based config update mechanism
```python
source: MessageSource[Tin],
sink: MessageSink[Tout],
handler_factory: HandlerFactory[Tin, Tout],
handler_registry: HandlerRegistry[Tin, Tout],
```
We now have to pass the registry, since we want to be able to insert a custom handler for config messages, i.e., the processor cannot create the registry itself.
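For illustration, a minimal sketch of the new wiring, assuming a `register` method on the registry and a `ConfigHandler` constructed by the service (these names and signatures are illustrative, not the exact API):

```python
# Sketch only: constructor and register() signatures are assumptions based on the
# parameters shown above, not the actual implementation.
handler_registry = HandlerRegistry()

# Insert a custom handler for config/command messages before starting the processor,
# which is why the registry can no longer be created inside the processor itself.
config_handler = ConfigHandler()
handler_registry.register(
    key=MessageKey(topic='my_config_topic', source_name=''),  # hypothetical topic
    handler=config_handler,
)

processor = StreamProcessor(
    source=source,
    sink=sink,
    handler_factory=handler_factory,
    handler_registry=handler_registry,  # now passed in from the outside
)
```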
```python
if var.dtype == sc.DType.datetime64:
    timedelta = var - sc.epoch(unit=var.unit)
    return dataarray_da00.Variable(
        name=name,
        data=timedelta.values,
        axes=list(var.dims),
        shape=var.shape,
        unit=f'datetime64[{var.unit}]',
    )
```
Changes in this file are unrelated. The timeseries helper service previously only worked with --sink=png, now it also works with --sink=kafka.
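For context, the branch above converts scipp datetime64 data into integer offsets from the Unix epoch so it can be serialized; a standalone scipp illustration (not the service code itself):

```python
import scipp as sc

# A datetime64 coordinate in nanoseconds since the Unix epoch.
times = sc.datetimes(dims=['time'], values=['2024-01-01T00:00:00'], unit='ns')

# Subtracting the epoch yields plain int64 durations, which serialize easily;
# the original unit is kept so the datetimes can be reconstructed on the other end.
offsets = times - sc.epoch(unit=times.unit)
print(offsets.dtype)   # int64
print(offsets.values)  # nanoseconds since 1970-01-01T00:00:00
```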
```
processor = StreamProcessor(
    source=source,
    sink=KafkaSink(kafka_config=kafka_config, serializer=serializer),
builder = DataServiceBuilder(
```
All the fake_* services were previously not using DataServiceBuilder. Now they can, by adding a new method to the builder. This is in a sense unrelated to the main change.
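As a rough illustration of the pattern (the builder method and its parameters here are hypothetical; only `DataServiceBuilder` and `KafkaSink` appear in the diff):

```python
# Hypothetical sketch of a fake_* service wired through the builder instead of
# constructing a StreamProcessor by hand. Names below are illustrative only.
builder = DataServiceBuilder(
    instrument='dummy',
    name='fake_detectors',
    handler_factory=handler_factory,
)
service = builder.from_source(  # hypothetical new builder method
    source=source,
    sink=KafkaSink(kafka_config=kafka_config, serializer=serializer),
)
service.start()
```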
YooSunYoung left a comment
I have a minor comment and some concerns about consuming messages and potential blocking.
But we can just revisit this if we ever run into that issue.
```python
key = MessageKey(topic=message.topic(), source_name='')
legacy_key = message.key().decode('utf-8')  # See 286
value = message.value()
return Message(key=key, timestamp=0, value={'key': legacy_key, 'value': value})
```
Kafka messages have a timestamp (a timestamp-type plus the actual timestamp), so we can probably just use it, unless 0 has a specific meaning in this context.
Ah, is that why I got a tuple of two ints as a timestamp? I tried using it first but could not make sense of it.
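For reference, confluent-kafka's `Message.timestamp()` returns a `(timestamp_type, timestamp)` tuple, with the timestamp in milliseconds since the epoch; something along these lines could extract it (a sketch, not part of this PR):

```python
from confluent_kafka import TIMESTAMP_NOT_AVAILABLE

def message_timestamp_ms(message) -> int:
    # Message.timestamp() returns (timestamp_type, timestamp). The type says whether
    # it is producer create-time, broker log-append time, or not available at all.
    ts_type, ts = message.timestamp()
    if ts_type == TIMESTAMP_NOT_AVAILABLE:
        return 0  # fall back to the placeholder currently used
    return ts
```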
```python
def consume(self, num_messages: int, timeout: float) -> list[KafkaMessage]:
    messages = []
    for consumer in self._consumers:
        messages.extend(consumer.consume(num_messages, timeout))
```
I think they shouldn't have the same timeout for all the different topics, unless it is very small (~0).
If num_messages and timeout are not tuned, topics that have many messages might fall behind.
Summary from discussion: we may want a different configuration for control messages, but it is unclear what the correct approach is right now. We will leave this as-is for now and address it once we know how this behaves in practice.
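If this does become a problem, one possible direction (purely illustrative; the per-topic numbers and structure are assumptions) would be a separate consume budget per consumer, so a slow or empty control topic cannot stall data topics:

```python
# Illustrative sketch only, not the current implementation.
consume_params = {
    'data': (1000, 0.05),    # (num_messages, timeout in seconds)
    'config': (10, 0.001),   # near-zero timeout for control/config messages
}

def consume_all(consumers: dict) -> list:
    messages = []
    for name, consumer in consumers.items():
        num_messages, timeout = consume_params.get(name, (100, 0.01))
        messages.extend(consumer.consume(num_messages, timeout))
    return messages
```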
While working on the mechanism to configure data reduction workflows at service runtime, it became clear that the current mechanism for listening to config (aka command) messages was insufficient and cumbersome. Previously, it worked as follows:

- `ConfigSubscriber` would consume from a config/command topic. It was running in a separate thread.

For live data reduction, we needed a mechanism to "take action" on certain config/command messages. The `ConfigSubscriber` could not do this. It therefore made more sense to redo the mechanism from scratch, handling config/command messages with the handler mechanism, just like data messages. Key changes are as follows:

- `ConfigSubscriber` is removed and replaced by `ConfigHandler`.
- `ConfigHandler` is created ahead of time; data message handlers are created by the handler factory as before.
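A minimal sketch of the idea, assuming a generic `handle` method shared with data handlers and the `Message` layout from the snippets above (the action-registration helper is illustrative):

```python
# Illustrative sketch: config/command messages flow through the same handler
# interface as data messages, instead of a separate ConfigSubscriber thread.
class ConfigHandler:
    def __init__(self) -> None:
        self._actions = {}  # config key -> callback, e.g. updating a workflow parameter

    def register_action(self, key: str, action) -> None:
        # Hypothetical helper so a service can "take action" on specific config keys.
        self._actions[key] = action

    def handle(self, messages: list) -> list:
        for message in messages:
            key = message.value['key']      # matches the legacy-key wrapping shown above
            value = message.value['value']
            if key in self._actions:
                self._actions[key](value)
        return []  # config handlers produce no output messages for the sink
```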