Add deterministic upload mode #602

Merged
HenryCaiHaiying merged 1 commit into pinterest:master from glasser:glasser/deterministic-upload on Mar 2, 2019

glasser commented Mar 1, 2019

This is to avoid #600.

In this mode, decisions about whether to upload files are only based on
properties of the input messages themselves: timestamps and input message
payload size. We don't care about real-world time, disk file timestamps, or log
file size; we don't support upload on shutdown; and we check for uploads after
every message.
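
The decision rule described above can be sketched roughly as follows. This is a minimal illustration, not Secor's actual code: the class name `DeterministicUploadPolicy`, its methods, the "non-positive means disabled" convention, and the use of `>=` for the threshold comparison are all assumptions made for the example.

```java
// Hypothetical sketch of a deterministic upload check; names are illustrative
// and do not come from the Secor code base.
final class DeterministicUploadPolicy {
    // Assumption: a non-positive limit means "this criterion is disabled".
    private final long maxTimestampRangeMillis;
    private final long maxInputPayloadBytes;

    private long minTimestamp = Long.MAX_VALUE;
    private long maxTimestamp = Long.MIN_VALUE;
    private long payloadBytes = 0;

    DeterministicUploadPolicy(long maxTimestampRangeMillis, long maxInputPayloadBytes) {
        if (maxTimestampRangeMillis <= 0 && maxInputPayloadBytes <= 0) {
            throw new IllegalArgumentException("configure at least one limit");
        }
        this.maxTimestampRangeMillis = maxTimestampRangeMillis;
        this.maxInputPayloadBytes = maxInputPayloadBytes;
    }

    /**
     * Records one input message and reports whether the current file should
     * be uploaded. Only message properties are consulted: no wall-clock time,
     * no file modification time, no on-disk file size.
     */
    boolean recordAndCheck(long messageTimestampMillis, int messagePayloadBytes) {
        minTimestamp = Math.min(minTimestamp, messageTimestampMillis);
        maxTimestamp = Math.max(maxTimestamp, messageTimestampMillis);
        payloadBytes += messagePayloadBytes;

        boolean rangeReached = maxTimestampRangeMillis > 0
                && maxTimestamp - minTimestamp >= maxTimestampRangeMillis;
        boolean sizeReached = maxInputPayloadBytes > 0
                && payloadBytes >= maxInputPayloadBytes;
        return rangeReached || sizeReached;
    }

    /** Clears the accumulated state after an upload. */
    void reset() {
        minTimestamp = Long.MAX_VALUE;
        maxTimestamp = Long.MIN_VALUE;
        payloadBytes = 0;
    }
}
```

Because the check runs after every message and depends only on the message sequence, replaying the same messages yields the same upload boundaries under this sketch.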

Configuration:

  • Set secor.upload.deterministic=true
  • Configure at least one of secor.max.file.timestamp.range.millis and
    secor.max.input.payload.size.bytes.
  • If you've configured secor.max.file.timestamp.range.millis, you must
    set kafka.useTimestamp=true and ensure that your FileReader/FileWriter
    supports timestamps.
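
As an illustration, a configuration along these lines would enable the mode with both criteria. The property names are the ones listed above; the numeric values (a one-hour timestamp range and roughly 200 MB of input payload) are arbitrary assumptions for the example, not recommended settings.

```properties
# Enable deterministic upload mode.
secor.upload.deterministic=true

# Example values only: upload once a file spans one hour of message timestamps...
secor.max.file.timestamp.range.millis=3600000
# ...or once about 200 MB of input payload has been consumed.
secor.max.input.payload.size.bytes=200000000

# Required because the timestamp range criterion is configured.
kafka.useTimestamp=true
```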

glasser commented Mar 1, 2019

I don't know whether this fix is suitable for upstreaming or whether it's what the project wants, but it's what we're going to try for our installation.

@glasser glasser marked this pull request as ready for review March 1, 2019 06:15
@glasser glasser force-pushed the glasser/deterministic-upload branch 5 times, most recently from 0e78c64 to 1143908 Compare March 1, 2019 06:26
@glasser glasser force-pushed the glasser/deterministic-upload branch from 1143908 to 19dfa6f Compare March 1, 2019 09:11
@HenryCaiHaiying

Looks good to me. I will merge this PR; it's a good fix for Secor.

@HenryCaiHaiying HenryCaiHaiying merged commit 7f1c4b2 into pinterest:master Mar 2, 2019
@jeremyplichtafc

@glasser - I was reading your code and trying to understand something. If deterministic mode only looks at the timestamp range of the messages for a partition and the input size for that partition, won't you have the case where a partition that gets only a handful of messages (never triggering the time or size criteria) is never uploaded? Especially if you are splitting topics into smaller partitions using SplitByFieldMessageParser.
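
The concern can be illustrated with a small self-contained sketch. The class `IdlePartitionDemo`, the helper `everTriggers`, and the threshold values are hypothetical and only mimic the criteria described in this PR; they are not taken from Secor.

```java
// Hypothetical illustration of the low-volume partition case; not Secor code.
final class IdlePartitionDemo {
    /**
     * Returns true if, at any point while consuming the given messages, the
     * timestamp range or the accumulated payload size reaches its limit.
     */
    static boolean everTriggers(long[] timestamps, int[] sizes,
                                long maxRangeMillis, long maxBytes) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long bytes = 0;
        for (int i = 0; i < timestamps.length; i++) {
            min = Math.min(min, timestamps[i]);
            max = Math.max(max, timestamps[i]);
            bytes += sizes[i];
            if (max - min >= maxRangeMillis || bytes >= maxBytes) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long oneHour = 3_600_000L;   // assumed timestamp range limit
        long oneMegabyte = 1_000_000L; // assumed payload size limit

        // Three small messages within two seconds, then nothing more arrives.
        boolean quiet = everTriggers(new long[] {0L, 1_000L, 2_000L},
                new int[] {100, 100, 100}, oneHour, oneMegabyte);

        // A partition whose messages eventually span a full hour.
        boolean busy = everTriggers(new long[] {0L, 3_600_000L},
                new int[] {100, 100}, oneHour, oneMegabyte);

        System.out.println("quiet partition triggers upload: " + quiet);
        System.out.println("busy partition triggers upload: " + busy);
    }
}
```

Under these assumptions the quiet partition never crosses either limit, so its file would stay local until more messages arrive, since nothing tied to wall-clock time forces an upload.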

glasser commented May 10, 2019

That's probably correct, and it should be mentioned in the docs for the option. Could you open a PR?
