Add FileOutput file rotation #976
Comments
👍 :-) |
Taking a look at this, committer is responsible for writing to a given file in increments of up to 10000 bytes. It's fairly simple to modify this code to respect a threshold and rotate when necessary. However, since the messages are decoded into bytes at this point, this wouldn't respect message boundaries and so a message may be split across two files. One option I'm considering: don't decode the messages in receiver, and pass an array of messages instead of bytes over the channel to the committer. This would delay recycling the message pack (it'd need to be done when the message is written in committer). Another option is that there might be a simple way of delimiting messages in the byte stream (I'm not familiar with the format), which could then be used to prevent splits across files. |
The first option that you're considering is a no-go; we definitely don't want to be pushing the message objects further along. In fact, we're considering changing the API so that outputs don't receive messages at all, but instead only get []byte slices directly from the encoder (see #930). You're right that the data is raw bytes by that point, but the receiver is careful to only place full message encodings into the slices that go out over the batchChan. One possibility would be to make sure that the committer never writes a partial batch, since the end of a batch will always correspond to the end of a message. If you really want the file writing batch size to be of a finer grain than the batchChan batch size (i.e. you really want to be able to write partial batches out to disk) then you'll need to use framing. You could consider using the same framing that we use elsewhere, described here and implemented here. I'd lean towards the first choice, myself, to avoid all of the extra mem copies that the use of framing would likely introduce. |
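To illustrate the "never write a partial batch" idea, here is a minimal Go sketch. It is not Heka's committer: `batchRotator`, its fields, and the in-memory buffers standing in for files are all hypothetical names chosen for the example. It assumes only what the comment above states, namely that every batch ends on a message boundary, so rotation is checked between batches and never in the middle of one.

```go
package main

import (
	"bytes"
	"fmt"
)

// batchRotator is a hypothetical helper, not Heka's committer. It writes
// every batch whole, so a message is never split across two files.
type batchRotator struct {
	threshold int             // start a new file rather than exceed this many bytes
	files     []*bytes.Buffer // in-memory stand-ins for files on disk
	written   int             // bytes written to the current file
}

// WriteBatch appends one complete batch. Rotation is only considered here,
// between batches, because each batch ends on a message boundary.
func (r *batchRotator) WriteBatch(batch []byte) {
	needNew := len(r.files) == 0 ||
		(r.written > 0 && r.written+len(batch) > r.threshold)
	if needNew {
		r.files = append(r.files, &bytes.Buffer{})
		r.written = 0
	}
	cur := r.files[len(r.files)-1]
	cur.Write(batch) // bytes.Buffer.Write never returns an error
	r.written += len(batch)
}

func main() {
	r := &batchRotator{threshold: 10}
	for _, b := range []string{"aaaa\n", "bbbb\n", "cccc\n"} {
		r.WriteBatch([]byte(b))
	}
	for i, f := range r.files {
		fmt.Printf("file %d: %q\n", i, f.String())
	}
}
```

A batch larger than the threshold still goes into a single (fresh) file in this sketch, so the threshold is a soft limit; that trade-off avoids the framing and extra copies mentioned above.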
We do plan on adding this feature, but we're going to keep it simple, supporting rotation based on time intervals only, emitting files with timestamps embedded in the filename. |
That is good news, I'm very interested in this feature. If I understand correctly, it means we'll be able to create a new file every day?
Do you know what the roadmap is for this feature? |
I am also very interested in this feature. Do you think it will be possible to see it in the next release? |
Probably not, alas... I believe @4r9h might take a look at this soon, but I don't know what the timing will be, and there are other issues that are higher priority that we're currently tackling. |
@rafrombrc I'll try.... is there any eta for the next release? |
Targeting a 0.9 release by the end of next week, or first week of Feb at the latest. |
I would also like to see it in the next release, and that could be a good way for me to start with golang. @rafrombrc: do you have any requirements, guidance for this feature? for example what will the configuration look like? Do you want the rotation time interval to be configurable or should it be hardcoded to one day? how would you make the path to the output files configurable so that one could output a file tree as shown in my previous comment? |
@bbinet I'll look into this today/tomorrow, and I'll let you know if I can't make it or if I take something else instead. |
Here are my thoughts on how this should work:
I think this is useful enough w/o being so heavy as to force us to add a new dependency for parsing cron format, etc. Does this all make sense? |
Sounds okay. I'll try to hack something asap (probably during the weekend). |
Thanks @rafrombrc and @4r9h , it sounds good. |
Hi @rafrombrc @bbinet, sorry for the sudden change of heart, but I probably won't be able to finish this feature before the release next week. I'll gladly take any task with a longer ETA. |
Ok, I am starting to work on this right now: I hope I can send a PR before the 0.9 release. |
should fix mozilla-services#976 (untested yet)
Here is the pull request: #1294. |
We've had several requests for FileOutput to be able to do file rotation w/o the use of an external rotation tool, in part b/c fewer tools, and in part b/c Heka needs to get a HUP signal to actually notice that a file has been rotated out from under it, and the person running Heka doesn't always know when rotation has happened and when HUP needs to be sent.