-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for managing offsets in Connect #121
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the proposal. I left some comments ... but apart from them, I have a bit mixed feelings about it.
I think it is well designed in terms of the API and I'm not sure it could be done in some significantly better way. But it seems very expensive for something what seems like a pretty niche feature:
- The initial implementation will be non-trivial
- The maintenance effort will be very significant because you have a new custom resource watches, CRDs, you will need to have some metrics => historically, all of these are very expensive to maintain.
So I wonder if it is really worth the effort. And I guess for me it falls to whether we still expect that we can disable the Connect REST API access completely to users using the KafkaConnector resources. If we still think we can cut off the REST API access, why haven't we done it yet? Or if we think it is not something we can really do and that users will always want the REST API access, isn't it better to leave this on the REST API and focus on securing the REST API?
Or did I somehow missed the importance of this feature and it is now (despite being missing for many years) expected that users will be reseting the offsets twice per day?
I wouldn't say it is a niche feature, but I do think it is something that users won't do especially frequently. I think for me the most valuable feature is being able to delete offsets before deleting a connector. So at a minimum I think we should support the delete offset path, which also happens to be the simplest and least additional maintenance. As a bonus we could also look at providing an option that allows users to request we clean up the offsets before deleting a connector, although I don't think we'd be able to do that in a way that guarantees it was successful (e.g. if the operator goes down half way through stopping, deleting offsets and deleting the connector). In terms of supporting securing the Connect REST API that's still stuck behind the other certificate work which is obviously progressing, but isn't a small task. In terms of listing and altering offsets, I do think those are less likely to be used often. If we switched to using a configmap as you suggested for the altering of offsets then perhaps we could do it without a separate CR to reduce the additional maintenance? Then for listing offsets we could again reduce the support, so e.g allow a user to request the offsets using an annotation or something, but make it a one time thing, so to get the updated offsets you have to add the annotation again. @mimaison what are your thoughts on how frequently these endpoints would be used? |
These update/delete endpoints are not expected to be used a lot apart during development. You typically only need to update offsets when setting up a new pipeline or when handling issues like an unprocessable record. The list offsets endpoint is useful to monitor a connector and validate it is making progress. This is especially the case with MirrorMaker at least until KIP-971 gets implemented. Still being able to fully operate connectors via CRDs would be nice. Securing the REST API is in my opinion more fundamental but it's an orthogonal issue and seems to be "in progress". |
@mimaison do you think it will be a common usecase to want to continually fetch offsets, or would having the user annotate the CR every time they want the offsets to be listed be sufficient for most usecases? |
I'm not sure continually fetching offsets is very important. You can track consumer offsets for sink connectors and for source connectors the offsets are specific for the connector and may not may a lot of users. In my opinion the main values of the API are 1) retrieving offsets to update them (or apply them to another Connect cluster), 2) see they are changing to validate the connector is making progress. Today the main reason for continuous fetching would be MirrorMaker as in this case source offsets are Kafka offsets and understandable by users. But as I said, this should be addressed by KIP-971. |
@scholzj @mimaison @ppatierno thanks for your reviews. I've updated the proposal to hopefully simplify things from a maintenance perspective based on our discussions. In summary the changes are:
Let me know if you think this works better |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SOme nits. But LGTM otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks.
You do not mention anything about MM2 -> Is it because the MM2 source connectors store the offsets as regular commits in the source cluster? Or will we possibly need a separate proposal to cver MM2 in the future? (In both cases, it might be worth explaining it in the affected projects section)?
That's a good point, if the users are using the KafkaConnector CR they can manage the offsets for the MM2 source connectors like any other source connector. However, if they are using the KafkaMirrorMaker2 CR I think we should also provide support. Since the format of the offsets is known for these I think we can give a more guided experience, rather than allowing a free-form ConfigMap, so I've added a note in the Affected/not affected projects that this should be covered in a separate proposal. |
@mimaison @ppatierno when you have some time please take another look at the proposal with the recent updates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I think this simpler approach works well and it should still enable users to perform all the offsets operations on their connectors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I added two more comments/nits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I think this will make a nice improvement. I left a few minor comments, but over all this LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks.
Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
* Update to only list offsets once, not continually * Update list and alter offsets to use a ConfigMap * Add option to reset offsets when deleting a connector Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
* Use more complex API for ConfigMap names to make it extensible. * Align behaviour when Connector is not stopped. * Remove reference to "file" in ConfigMap. Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
* Remove section on resetting offsets when removing connectors. * Make listOffsets and alterOffsets properties required. * Leave the annotation on the resource when an error occurs. Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
* Change annotation to connector-offsets, not camel case. * Clarifying overwriting of data filed in ConfigMap. * Add ownerReferences to ConfigMap examples. * Clarify operator will validatate for syntatically valid json. * Explicitly state that offsets.json must be present in alter ConfigMap. * Explicitly state that operator will ignore other fields in alter ConfigMap. Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
This has now 3 binding and 2 non-binding +1 votes. If you have any comments or plan to review it, please do so until Friday. Otherwise, we will close it as approved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice clear proposal. I left some suggestions.
The proposed solution seems easy to set up and use: user-friendly
Co-authored-by: PaulRMellor <47596553+PaulRMellor@users.noreply.github.com> Signed-off-by: Kate Stanley <11195226+katheris@users.noreply.github.com>
Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
Approved with 5 binding and 2 non-binding +1 votes. |
No description provided.