-
Notifications
You must be signed in to change notification settings - Fork 493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Merged by Bors] - feat: added topic deduplication mechanism 1/2 #3392
Conversation
@@ -182,6 +182,14 @@ pub enum ErrorCode { | |||
#[fluvio(tag = 9000)] | |||
#[error("a compression error occurred in the SPU")] | |||
CompressionError, | |||
|
|||
// Deduplication | |||
#[fluvio(tag = 10000)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid adding new error codes until a new strategy is decided. Not show stopper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Especially, on SPU/SC there is no much sense in enum of errors.
@@ -90,6 +90,33 @@ spec: | |||
maxPartitionSize: | |||
type: integer | |||
minimum: 2048 | |||
deduplication: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bit concerned about duplicating schema. let's create issue to see if we can have better reusability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we plan to move away from k8s deps, maybe re-factoring here is not worth it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple of minor comments. otherwise LGTM. Nice work
pub struct Bounds { | ||
#[cfg_attr(feature = "use_serde", serde(default, skip_serializing_if = "is_zero"))] | ||
pub count: u64, | ||
#[cfg_attr( | ||
feature = "use_serde", | ||
serde( | ||
default, | ||
skip_serializing_if = "Option::is_none", | ||
with = "humantime_serde" | ||
) | ||
)] | ||
pub age: Option<Duration>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not something to change for this PR, but I wonder if there is some commonality with Lookback logic we can reuse between the two (slightly different) uses as a generic "record (pre-?)selection"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although Bounds and Lookback are common now, they might differ later. Bound covers both preselection(Lookback) and limits for runtime stream processing (via SmartModule init params).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice work!
bors r+ |
This is the first part of [Dedup Impl PR](#3385) split for a more convenient review process.
Pull request successfully merged into master. Build succeeded! The publicly hosted instance of bors-ng is deprecated and will go away soon. If you want to self-host your own instance, instructions are here. If you want to switch to GitHub's built-in merge queue, visit their help page. |
Upd.: Split into two parts: [first](#3392) Added topic-level deduplication mechanism. Key impacted components: - Topic Config - Fluvio CLI (topic creation, topic list, topic describe) - SPU (replica and producer handler) - SC - K8s CRDs (topic and partition) The feature requires for Dedup SmartModule filter to be provided. It is coming, but the one from the examples can be used to try it out: ```bash cd smartmodule/examples/filter_hashset smdk build smdk load ``` then create a file `topic.yaml`: ```yaml version: 0.1.0 meta: name: dedup-topic deduplication: bounds: count: 10 filter: transform: uses: infinyon/fluvio-smartmodule-filter-hashset@0.1.0 ``` and ```bash fluvio topic create -c topic.yaml ``` with such configuration the window will be last 10 records. Closes infinyon/roadmap#114
Upd.: Split into two parts: [first](#3392) Added topic-level deduplication mechanism. Key impacted components: - Topic Config - Fluvio CLI (topic creation, topic list, topic describe) - SPU (replica and producer handler) - SC - K8s CRDs (topic and partition) The feature requires for Dedup SmartModule filter to be provided. It is coming, but the one from the examples can be used to try it out: ```bash cd smartmodule/examples/filter_hashset smdk build smdk load ``` then create a file `topic.yaml`: ```yaml version: 0.1.0 meta: name: dedup-topic deduplication: bounds: count: 10 filter: transform: uses: infinyon/fluvio-smartmodule-filter-hashset@0.1.0 ``` and ```bash fluvio topic create -c topic.yaml ``` with such configuration the window will be last 10 records. Closes infinyon/roadmap#114
This is the first part of Dedup Impl PR split for a more convenient review process.