[FEATURE] Large backlog processing when materialized view cold start #90

Open
dai-chen opened this issue Oct 23, 2023 · 0 comments
Labels
enhancement New feature or request

Is your feature request related to a problem?

When auto refresh starts on a materialized view (MV), FileStreamSource generates an initial source file list. By default, it returns the files sorted by modification time. For an aggregate MV, if the event timestamps in these files are not strictly aligned with the file modification times, out-of-order (late-arriving) data may be dropped because the watermark has already advanced past it.
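
To make the failure mode concrete, here is a minimal pure-Python sketch, not Spark code. It assumes a simplified watermark model: one file per micro-batch, the watermark moves to the maximum event time seen minus a fixed delay after each batch, and any event older than the current watermark is dropped. The `SourceFile` class, the `simulate` function, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SourceFile:
    name: str
    modified_time: int  # from file metadata, in seconds
    event_times: list   # event timestamps stored in the file, in seconds


def simulate(files, watermark_delay):
    """Process one file per micro-batch; return (kept, dropped) event times."""
    watermark = float("-inf")
    max_event_time = float("-inf")
    kept, dropped = [], []
    for f in files:
        for t in f.event_times:
            max_event_time = max(max_event_time, t)
            if t < watermark:
                dropped.append(t)  # too late: the watermark already passed it
            else:
                kept.append(t)
        # Simplified model: the watermark only advances between batches.
        watermark = max(watermark, max_event_time - watermark_delay)
    return kept, dropped


# File "b" holds older events but was written (or copied) later than "a".
files = [
    SourceFile("a", modified_time=100, event_times=[1000, 1010]),
    SourceFile("b", modified_time=200, event_times=[100, 110]),
]

by_mtime = sorted(files, key=lambda f: f.modified_time)
kept_mtime, dropped_mtime = simulate(by_mtime, watermark_delay=60)

by_event = sorted(files, key=lambda f: min(f.event_times))
kept_event, dropped_event = simulate(by_event, watermark_delay=60)
```

In this toy setup, processing in modification-time order drops both events from file "b", while processing in event-time order keeps all four events.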

What solution would you like?

Provide a source option that lets users enable the MV cold start to process the backlog sorted by the event timestamp in the data rather than by the file modification time in the file metadata.
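
A rough sketch of what such an ordering switch could look like, in plain Python. Everything here is an assumption for illustration: the `BacklogFile` record, the `order_backlog` function, and the `sort_by` values are hypothetical names, not an existing option. It also assumes the minimum event timestamp per file is already known, which in practice would require reading the data or some file-level statistics.

```python
from typing import NamedTuple


class BacklogFile(NamedTuple):
    path: str
    modified_time: int   # from file metadata
    min_event_time: int  # assumed to be known per file (hypothetical)


def order_backlog(files, sort_by="modified_time"):
    """Return the cold-start backlog in the requested processing order."""
    if sort_by == "modified_time":
        # Current default behavior as described in this issue.
        return sorted(files, key=lambda f: f.modified_time)
    if sort_by == "event_time":
        # Proposed behavior: oldest events first, ties broken by mtime.
        return sorted(files, key=lambda f: (f.min_event_time, f.modified_time))
    raise ValueError(f"unsupported sort_by: {sort_by!r}")


backlog = [
    BacklogFile("part-a", modified_time=100, min_event_time=1000),
    BacklogFile("part-b", modified_time=200, min_event_time=100),
    BacklogFile("part-c", modified_time=300, min_event_time=500),
]

default_order = [f.path for f in order_backlog(backlog)]
event_order = [f.path for f in order_backlog(backlog, sort_by="event_time")]
```

Note that sorting by the minimum event timestamp only reduces disorder. If the event-time ranges of different files overlap, some late data can still arrive behind the watermark.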

What alternatives have you considered?

Guide users to specify an initial starting point with a WHERE clause and, at the same time, configure a large watermark delay so that out-of-order data is not dropped. However, this may consume more memory, because aggregate windows stay open for a long time while waiting for possible late data.
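
The memory concern can be illustrated with a small pure-Python sketch. It assumes tumbling windows and a simplified rule that a window stays in state until the watermark (maximum event time minus the delay) passes its end. The function name and the sample numbers are hypothetical.

```python
def open_window_count(event_times, window_size, watermark_delay):
    """Count tumbling windows still held in state after seeing event_times."""
    watermark = max(event_times) - watermark_delay
    # Start of the tumbling window each event falls into.
    window_starts = {t - t % window_size for t in event_times}
    # A window can be finalized only once the watermark passes its end.
    return sum(1 for start in window_starts if start + window_size > watermark)


# One hour of events, one every 10 seconds, aggregated in 1-minute windows.
events = list(range(0, 3600, 10))

small_delay_windows = open_window_count(events, window_size=60, watermark_delay=60)
large_delay_windows = open_window_count(events, window_size=60, watermark_delay=3600)
```

With a 1-minute delay only the 2 most recent windows remain open, whereas a 1-hour delay keeps all 60 windows in state.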
