This repository has been archived by the owner on Apr 2, 2024. It is now read-only.

Add FileOutput file rotation #976

Closed
rafrombrc opened this issue Jul 22, 2014 · 17 comments · Fixed by #1294

Comments

@rafrombrc
Contributor

We've had several requests for FileOutput to be able to do file rotation w/o the use of an external rotation tool, partly b/c that means fewer tools, and partly b/c Heka needs to receive a HUP signal to notice that a file has been rotated out from under it, and the person running Heka doesn't always know when rotation has happened and therefore when the HUP needs to be sent.

@skurfuerst

👍 :-)

@mattrco
Contributor

mattrco commented Sep 1, 2014

Looking at this, the committer is responsible for writing to a given file in increments of up to 10000 bytes.

It's fairly simple to modify this code to respect a size threshold and rotate when necessary. However, since the messages have already been encoded into bytes at this point, this wouldn't respect message boundaries, so a message may be split across two files.

One option I'm considering: don't encode the messages in the receiver, and pass an array of messages instead of bytes over the channel to the committer. This would delay recycling the message pack (it'd need to be done when the message is written in the committer).

Another option is that there might be a simple way of delimiting messages in the byte stream (I'm not familiar with the format), which could then be used to prevent splits across files.

@rafrombrc
Contributor Author

The first option that you're considering is a no-go; we definitely don't want to be pushing the message objects further along. In fact, we're considering changing the API so that outputs don't receive messages at all, but instead only get []byte slices directly from the encoder (see #930).

You're right that the data is raw bytes by that point, but the receiver is careful to only place full message encodings into the slices that go out over the batchChan. One possibility would be to make sure that the committer never writes a partial batch, since the end of a batch will always correspond to the end of a message.

If you really want the file writing batch size to be of a finer grain than the batchChan batch size (i.e. you really want to be able to write partial batches out to disk) then you'll need to use framing. You could consider using the same framing that we use elsewhere, described here and implemented here. I'd lean towards the first choice, myself, to avoid all of the extra mem copies that the use of framing would likely introduce.
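The first approach (never writing a partial batch, so rotation only ever happens between batches) can be sketched in Go as below. This is a hypothetical illustration, not Heka's committer: `rotatingCommitter`, `maxBytes`, and the in-memory buffers standing in for files are all assumptions, and the sketch relies on the guarantee stated above that every batch ends on a message boundary.

```go
package main

import (
	"bytes"
	"fmt"
)

// rotatingCommitter is a hypothetical stand-in for FileOutput's committer.
// Each batch it receives is assumed to hold only whole message encodings,
// so rotating strictly between batches never splits a message.
type rotatingCommitter struct {
	maxBytes int             // rotation threshold (hypothetical setting)
	written  int             // bytes written to the current file
	files    []*bytes.Buffer // in-memory stand-ins for rotated files
}

func newRotatingCommitter(maxBytes int) *rotatingCommitter {
	return &rotatingCommitter{maxBytes: maxBytes, files: []*bytes.Buffer{new(bytes.Buffer)}}
}

// commit writes one whole batch. If the current file already holds data and
// this batch would push it past the threshold, it rotates first, so a batch
// (and therefore a message) is never split across files.
func (c *rotatingCommitter) commit(batch []byte) error {
	if c.written > 0 && c.written+len(batch) > c.maxBytes {
		c.files = append(c.files, new(bytes.Buffer))
		c.written = 0
	}
	n, err := c.files[len(c.files)-1].Write(batch)
	c.written += n
	return err
}

func main() {
	c := newRotatingCommitter(10)
	for _, b := range []string{"msg1\n", "msg2\n", "msg3\n"} {
		if err := c.commit([]byte(b)); err != nil {
			panic(err)
		}
	}
	for i, f := range c.files {
		fmt.Printf("file %d: %q\n", i, f.String())
	}
	// prints:
	// file 0: "msg1\nmsg2\n"
	// file 1: "msg3\n"
}
```

Because the size check happens before a batch is written, a file may end up smaller than the threshold, and a single batch larger than `maxBytes` is still written whole rather than split.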

@rafrombrc rafrombrc added this to the 0.8 milestone Sep 3, 2014
@rafrombrc
Contributor Author

We do plan on adding this feature, but we're going to keep it simple, supporting rotation based on time intervals only, emitting files with timestamps embedded in the filename.

@bbinet
Contributor

bbinet commented Jan 14, 2015

That is good news, I'm very interested in this feature.

If I understand correctly, this means we'll be able to create a new file every day?
For example, would the FileOutput plugin be able to generate the following file structure:

.
|-- 2014
|   `-- 12
|       |-- 29.log
|       |-- 30.log
|       `-- 31.log
`-- 2015
    `-- 01
        |-- 01.log
        `-- 02.log

Do you know what the roadmap for this feature is?
I'm not proficient with golang, but I can provide feedback and help test this feature.

@rafrombrc rafrombrc assigned rafrombrc and unassigned rafrombrc Jan 16, 2015
@djamelfel

I am also very interested in this feature.

Do you think it will be possible to see it in the next release?

@rafrombrc
Contributor Author

Probably not, alas... I believe @4r9h might take a look at this soon, but I don't know what the timing will be, and there are other issues that are higher priority that we're currently tackling.

@jotes
Contributor

jotes commented Jan 20, 2015

@rafrombrc I'll try... is there an ETA for the next release?

@rafrombrc
Contributor Author

Targeting a 0.9 release by the end of next week, or the first week of Feb at the latest.

@bbinet
Contributor

bbinet commented Jan 21, 2015

I would also like to see it in the next release, and that could be a good way for me to start with golang.
@4r9h if you plan to work on it before the next release, please go ahead, else I'd like to have a try.

@rafrombrc: do you have any requirements or guidance for this feature? For example, what will the configuration look like? Do you want the rotation time interval to be configurable, or should it be hardcoded to one day? How would you make the path to the output files configurable so that one could output a file tree like the one shown in my previous comment?

@jotes
Contributor

jotes commented Jan 21, 2015

@bbinet I'll look into this today or tomorrow and let you know if I can't make it, in which case I'll take something else.
@rafrombrc it would be awesome if you could share some ideas about the final feature set for this implementation.

@rafrombrc
Contributor Author

Here are my thoughts on how this should work:

  • We support a limited number of time intervals: hourly, every 4 hours, every 12 hours, or daily. I'm not married to that exact set, but what's important is that an interval goes evenly into 24 hours.
  • If you use rotation, regardless of when you start Heka, the files will be named relative to midnight of the day. So if you pick hourly, file rotation happens on the hour, you don't end up rotating files at 42 minutes after every hour, or something like that. If you pick every 4 hours, it happens at midnight, 4am, 8am, 12pm, 4pm, 8pm, etc.
  • If Heka starts and a file already exists for the current interval, the existing file should be appended to.
  • If rotation is in use, then the output file name can support Go's time.Format syntax to embed timestamps in the filename. This should allow folks to do nested folders just by putting a path separator in the filename. So to achieve what @bbinet asked for above, you'd use a setting of path = "2006/01/02.log".

I think this is useful enough w/o being so heavy as to force us to add a new dependency for parsing cron format, etc. Does this all make sense?

@jotes
Contributor

jotes commented Jan 23, 2015

Sounds okay. I'll try to hack something asap (probably during the weekend).


@bbinet
Contributor

bbinet commented Jan 23, 2015

Thanks @rafrombrc and @4r9h , it sounds good.
@4r9h: I'm happy to help and if you don't find time to work on it, please tell me as I can have a try early next week.

@jotes
Contributor

jotes commented Jan 25, 2015

Hi @rafrombrc @bbinet, sorry for the sudden change of heart, but I probably won't be able to finish this feature before next week's release. I'll gladly take any task with a longer ETA.
I hope I didn't cause you too many problems.

@bbinet
Contributor

bbinet commented Jan 26, 2015

Ok, I am starting to work on this right now; I hope to send a PR before the 0.9 release.

bbinet added a commit to bbinet/heka that referenced this issue Jan 28, 2015
bbinet added a commit to bbinet/heka that referenced this issue Jan 29, 2015
@bbinet
Contributor

bbinet commented Jan 29, 2015

Here is the pull request: #1294.
I would be happy to apply any changes that are needed so that it can be included in the next release.

relistan pushed a commit to Nitro/heka that referenced this issue Jan 26, 2016

6 participants