Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - feat: added topic deduplication mechanism 1/2 #3392

Closed
wants to merge 1 commit into from
Closed

[Merged by Bors] - feat: added topic deduplication mechanism 1/2 #3392

wants to merge 1 commit into from

Conversation

galibey
Copy link
Contributor

@galibey galibey commented Jul 12, 2023

This is the first part of Dedup Impl PR split for a more convenient review process.

@@ -182,6 +182,14 @@ pub enum ErrorCode {
#[fluvio(tag = 9000)]
#[error("a compression error occurred in the SPU")]
CompressionError,

// Deduplication
#[fluvio(tag = 10000)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid adding new error codes until a new strategy is decided. Not show stopper

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Especially, on SPU/SC there is no much sense in enum of errors.

@@ -90,6 +90,33 @@ spec:
maxPartitionSize:
type: integer
minimum: 2048
deduplication:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit concerned about duplicating schema. let's create issue to see if we can have better reusability

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we plan to move away from k8s deps, maybe re-factoring here is not worth it?

Copy link
Contributor

@sehz sehz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple of minor comments. otherwise LGTM. Nice work

Comment on lines +23 to +35
pub struct Bounds {
#[cfg_attr(feature = "use_serde", serde(default, skip_serializing_if = "is_zero"))]
pub count: u64,
#[cfg_attr(
feature = "use_serde",
serde(
default,
skip_serializing_if = "Option::is_none",
with = "humantime_serde"
)
)]
pub age: Option<Duration>,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not something to change for this PR, but I wonder if there is some commonality with Lookback logic we can reuse between the two (slightly different) uses as a generic "record (pre-?)selection"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although Bounds and Lookback are common now, they might differ later. Bound covers both preselection(Lookback) and limits for runtime stream processing (via SmartModule init params).

Copy link
Contributor

@digikata digikata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work!

@galibey
Copy link
Contributor Author

galibey commented Jul 12, 2023

bors r+

bors bot pushed a commit that referenced this pull request Jul 12, 2023
This is the first part of [Dedup Impl PR](#3385) split for a more convenient review process.
@bors
Copy link

bors bot commented Jul 12, 2023

Pull request successfully merged into master.

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot changed the title feat: added topic deduplication mechanism 1/2 [Merged by Bors] - feat: added topic deduplication mechanism 1/2 Jul 12, 2023
@bors bors bot closed this Jul 12, 2023
@galibey galibey deleted the feat/dedup-impl-part-1 branch July 12, 2023 17:05
bors bot pushed a commit that referenced this pull request Jul 14, 2023
Upd.: Split into two parts: [first](#3392)

Added topic-level deduplication mechanism. 
Key impacted components:
 - Topic Config
 - Fluvio CLI (topic creation, topic list, topic describe)
 - SPU (replica and producer handler)
 - SC
 - K8s CRDs (topic and partition)

The feature requires for Dedup SmartModule filter to be provided. It is coming, but the one from the examples can be used to try it out:
```bash
cd smartmodule/examples/filter_hashset
smdk build
smdk load
```
then create a file `topic.yaml`:
```yaml
version: 0.1.0
meta:
  name: dedup-topic
deduplication:
  bounds:
    count: 10
  filter:
    transform:
      uses: infinyon/fluvio-smartmodule-filter-hashset@0.1.0
```
and 
```bash
fluvio topic create -c topic.yaml
```
with such configuration the window will be last 10 records.

Closes infinyon/roadmap#114
bors bot pushed a commit that referenced this pull request Jul 14, 2023
Upd.: Split into two parts: [first](#3392)

Added topic-level deduplication mechanism. 
Key impacted components:
 - Topic Config
 - Fluvio CLI (topic creation, topic list, topic describe)
 - SPU (replica and producer handler)
 - SC
 - K8s CRDs (topic and partition)

The feature requires for Dedup SmartModule filter to be provided. It is coming, but the one from the examples can be used to try it out:
```bash
cd smartmodule/examples/filter_hashset
smdk build
smdk load
```
then create a file `topic.yaml`:
```yaml
version: 0.1.0
meta:
  name: dedup-topic
deduplication:
  bounds:
    count: 10
  filter:
    transform:
      uses: infinyon/fluvio-smartmodule-filter-hashset@0.1.0
```
and 
```bash
fluvio topic create -c topic.yaml
```
with such configuration the window will be last 10 records.

Closes infinyon/roadmap#114
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants