-
Notifications
You must be signed in to change notification settings - Fork 418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix broken state w/ multiple logs; improve rename handling; add reconnect timeout option; resume old files; start from beginning of file if created after startup (same as LogStash) #151
Conversation
…hen read error during ack
…, and (re?)implement configurable dead_time
I found a solution for the issue elastic#149
Looks like travis is failing on this PR. Looks like stud is Gemlock'd to 0.0.5 and Stud.Temporary was not introduced until 0.0.14 |
Sorry for the delay in response. We'll get to this soon! |
No worries Jordan! I noticed the gofmt commits so just merging them in to fix the conflicts. |
… startup, while running always start beginning (match LogStash behaviour) - also store the start offset straight away in statefile to ensure it persists through restart even if file does not change
OK, so I fixed 144 too in my pull as I needed it too and it didn't make sense to fix it twice on two branches. If we want to reduce changes let me know and we can close this PR and I'll reopen with just the original fixes in the title (I kept a branch for it) - then when happy I can resubmit 144. I also fixed the Gemfile.lock version for stub in hopes of fixing Travis-CI build - the commit message vanished in my rebase though. |
Fixed: file offset still calculated wrong. File was detected as truncated again :-( Fixed: loosing the beginning of a partial line Fixed: A CR is also removed from the EOL Known issues: If the last line was logged with CRLF as EOF then the file offset is not correct calculated (missing one byte for the CR) in the Registry
OK that didn't fix it because it still refers to lumberjack... I've fixed the tests now and I will add some new tests to test the bugs we're fixing. Just adding one for #164 at the moment too. |
Cool. I have merged #164 which is now complete, this PR #151, and also the tests PR #172 into https://github.com/driskell/logstash-forwarder/tree/driskell_merge_pr164 |
…g, reported in issue elastic#144
I'm working with @polyfractal to get another Elasticsearch dev up to speed on this project so we can hopefully put more energy into fixes and PR management. Again, sorry for the delay. |
…olang os.SameFile() that works on windows; implement windows specific file state code to persist vol/idxhi/idxlo for rotation detection
…on/rename through restarts) due to the inaccessible os pkg struct entries we would need
Thanks @jordansissel I've done some more restructuring on the branch for this pull - but not pushed it to GitHub just yet:
I also branched from this PR to implement a multiline codec with "negate" and "pattern" options - default is to use a codec called "plain" which simply passes each line onwards. This is merged into driskell_merge_pr164 (turned out to be a shit branch name) along with the other changes on this branch so are visible online there. Lastly, the protocol stuff has smoothed my network beyond bounds and fixed all of our duplication issues - throughput on evening shipping from legacy systems is also phenomenally smoother and quicker! I stuck with my email and did the less weird data frame version 2 for partial ack - added benefit there is its compatible with old LogStash server gem. Jason |
I'm keen to help out to fix this up. Is it best we chat somehow at some point (IRC?) or have you started working on your own fixes? The channel from harvester to prospector to facilitate resume on dead time should be ok. The design deviation going from harvester/prospector direct to registrar I think is probably the main concern here. I think it's important all offset changes are saved as soon as they need to be, on start as well as on acknowledgement. But I can split that into another branch and another PR if it unblocked this one. Thanks for your time and I appreciate the pain in such a large code review! |
it would be great if directions on how to proceed with this would be discussed. (splitting to multiple pull requests or reviewing it as a whole) |
Not sure if you made any decision yet, on whether to do things yourself and only use this PR for reference, or if you were still open to merging if certain things are changed. Will it help if I remove the harvester -> registrar channel (that saves initial offsets immediately) to reduce the PR? It will put it back to the original design you had too, with the only adjustment the two-way prospector<->harvester for offset resume after dead time:
I know some of the variable names are a bit off too and I can go through making them more explanatory - or that can wait if you're happy with them. |
Tidies a bit and improved some variable naming. Two functional tweaks: always save a state even if its empty - simplifies code; don't clean out memorised file state for removed files until harvester stops (more logical) Removed the harvester->registrar channel (my plan is another PR in future with an event channel to allow runtime clean up of deleted files from state file - current way to clean is restart LSF) |
Merged master to resolve conflict with merged #198 |
…ry so we can safely track file rotations across restarts and maintain exact behaviour across all platforms. If the Go library changes to become incompatible a WARNING message will print and we will resort to old behaviour: having no ability to track rotations across restarts.
Managed to get Windows to save the vol/idxhi/idxlo which Go uses for "os.SameFile" so we can detect file rotation during startup. Seems to work. Might be something you want or not but it's there for review. |
OK FYI - there will be no more change to this. (Unless to tidy and requested.) This contains only the wide changes I did to fix the various issues with rename detection, start offsets, dead time and Windows compatibility. The only additional feature should be the configurable reconnect timeout which can be removed if needed. I can split the bug fixes further but from the beginning I'd fixed so much they tied together so it may be difficult. If this is not to be and you're doing your own thing then let me know. |
Gonna give you a cut down with minimal change to start with. |
hm what happended to this? I was eagerly waiting for logrotate support. Why was it closed? |
Hi Guys, I'm not sure what the problem is, but I am restarting LF on a constantly changing file and I am still getting "Skipped old file". The file is an apache access json log which obviously does not sleep a long time. Any way to manually tell LF to reconsider it? (I use version 0.3.1) |
@teu simplest workaround seems to be |
Hi Jordan / People,
I needed to do some changes to get the forwarder to work safely for me as the registrar persistence was not working properly at all. On each iteration, it only saves persistence for log files mentioned in the current batch of events due to a scoping problem on the state. But even fixing the scope means during startup, until all log files have been touched there is risk of state loss if another restart happens.
Also, if a read ACK occurred (seems very often caused when the remote is logstash with redis output where redis has hit congestion_threshold) it would reconnect immediately and spam reconnect due to missing sleep. When it does sleep on other forms of errors, it reconnect in 1s too, which for me was too fast.
Lastly, I needed old files to be reopened even if the scanning stopped after 24 hours. Currently, the harvester stops after 24 hours on a file. The prospector has no knowledge of this so keeps the file in fileinfo[path] and never starts it again even if the file is modified again.
I also found a possibly logic problem during rotation where it might reopen the rotated file and reimport it all if it detected the new file BEFORE the rotated one. When detecting new file it would overwrite the fileinfo[path] for the previous, meaning the is_renamed_file fails when it does get to the rotated file. I think nobody sees this as Glob() maybe returning in alphabetical order, and most log rotate is in such a way it maintains correct order. Minor - but I adjusted to avoid this edge case as it would be bound to confuse the hell out of the first person that messes with the log rotate settings.
I have posted in the following pull regarding the changes I have done:
#123
I've also seen this just appear by jhanson for the dead time config which conflicts with mine unfortunately. His approach was different and added it as a command line option.
#150
I've signed the CLA for a Kibana change I did earlier so that should be sorted. I really do apologise for not discussing this all in mailing list. I did look through the mailing list but seems activity for forwarder is low. It was something I just needed to get working and I did get a little carried away.
I welcome all feedback and comments, not all of my changes will be perfect or desired. I've cleaned up my commits today in a new branch and I'm willing to create other branches with cherry picked if desired.
Also, apologies for the long post. I do ramble on sometimes!
Regards,
Jason