duplicate logs? #149

Closed
sysmonk opened this issue Jan 31, 2014 · 5 comments

sysmonk commented Jan 31, 2014

While searching for some logs, I noticed that there are duplicate logs in Elasticsearch.
The entries are exactly the same (same server, file, timestamp, message, etc.), except that the 'offset' differs by 2 (e.g. one entry has offset 100, the other has offset 102). Logstash-forwarder was not restarted for at least half a day after the current log file was rotated.

I've got another case where there are even 3 copies of the same log line, but they have the same offset, and logstash-forwarder was restarted a few times before the log got rotated.

tzahari pushed a commit to tzahari/logstash-forwarder that referenced this issue Feb 14, 2014

tzahari commented Feb 14, 2014

I detected the same problem and found a way to reproduce it.
If a line does not end with a newline, readLine returns an incomplete line, but the offset is still advanced by one extra byte for the newline. The offset therefore ends up past the end of the file, and the harvester concludes that the file was truncated.
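A minimal sketch of the offset bookkeeping described above, assuming the harvester adds one byte per line for the newline it expects and treats "saved offset > file size" as truncation. `naiveAdvance` and `looksTruncated` are hypothetical names for illustration, not the harvester's actual functions.

```go
package main

import "fmt"

// naiveAdvance (hypothetical) assumes every line returned by the reader was
// terminated by exactly one '\n', so it always adds 1 for the newline.
func naiveAdvance(offset int64, line []byte) int64 {
	return offset + int64(len(line)) + 1
}

// looksTruncated (hypothetical) flags truncation when the saved offset
// points past the current end of the file.
func looksTruncated(offset, fileSize int64) bool {
	return fileSize < offset
}

func main() {
	// The file holds only "Hello": 5 bytes, no trailing newline.
	const fileSize = 5
	offset := naiveAdvance(0, []byte("Hello"))
	fmt.Println(offset, looksTruncated(offset, fileSize)) // 6 true
}
```

Because the phantom newline byte pushes the offset to 6 while the file is only 5 bytes long, the next size check rewinds to the beginning, and every line already shipped is sent again.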

Steps to reproduce:

echo -n "Hello" >> logfile.log  && sleep 15 && echo " World" >> logfile.log 

After "Hello" is written, the harvester thinks the file was truncated and starts again from the beginning:

2014/02/14 12:11:27.477930 Registrar received 1 events
2014/02/14 12:11:32.473682 Registrar received 1 events
2014/02/14 12:11:37.475362 File truncated, seeking to beginning: logfile.log
2014/02/14 12:11:40.033208 Registrar received 5 events
2014/02/14 12:11:47.488347 Registrar received 1 events
2014/02/14 12:11:52.486090 File truncated, seeking to beginning: logfile.log

I created a patch for it. Please try it: tzahari@2c7d915

driskell commented

ReadLine returning an incomplete line should be fine because of is_partial.
Is it because ReadLine treats EOF as an end of line?
If so, the PR will also fix another issue I've seen, where lines get split.

Did you find out whether that is the case, i.e. whether ReadLine treats EOF as an end-of-line?

tzahari commented Feb 25, 2014

Hi Driskell,
I think you are right.
Just to be sure, I added some debug output to the original version of the harvester and ran some tests.
If I append an incomplete line (without a newline) to the log file, ReadLine (from Go) returns buffer != nil, is_partial=false and err=nil.
Therefore the harvester's readline function returns an incomplete string!
Afterwards the offset is calculated incorrectly and a file truncation is detected.
So yes: ReadLine treats EOF as an end-of-line and does not mark the line as incomplete.
I hope that helps.
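The behaviour tzahari observed can be checked in isolation with the standard library's `bufio.Reader.ReadLine`. This sketch (the helpers `readLineResult` and `firstReadLine` are illustrative, not from the harvester) shows that the first call returns the same result for "Hello\n" and for "Hello" with no trailing newline, so `isPrefix` (the harvester's is_partial) cannot distinguish them; EOF only surfaces on the following call.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readLineResult captures everything one ReadLine call reports.
type readLineResult struct {
	line     string
	isPrefix bool
	err      error
}

// firstReadLine returns the result of the first ReadLine call on input.
func firstReadLine(input string) readLineResult {
	r := bufio.NewReader(strings.NewReader(input))
	line, isPrefix, err := r.ReadLine()
	return readLineResult{string(line), isPrefix, err}
}

func main() {
	// ReadLine strips the "\n" in the first case and silently treats EOF
	// as the line end in the second, so both results are identical.
	fmt.Printf("%+v\n", firstReadLine("Hello\n"))
	fmt.Printf("%+v\n", firstReadLine("Hello"))

	// Only the next call on the unterminated input reports io.EOF.
	r := bufio.NewReader(strings.NewReader("Hello"))
	r.ReadLine()
	_, _, err := r.ReadLine()
	fmt.Println(err == io.EOF) // true
}
```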

driskell commented

Hi tzahari,

Nice work. So it does look like ReadLine() will actually treat EOF as an end-of-line terminator... that's not even documented :)

I just checked the source for ReadLine, and it does essentially treat EOF as end-of-line, hence this problem. It looks like your fix will also fix my partial-line issue, where I sometimes see entries in which a line has been split in two. Presumably the line was written with two write() calls: LF picked up the first write as an entire line (because it hit EOF), then discovered the remainder of the line and sent it separately as a new line. My logs update quickly and always end with newlines, which I guess is the only reason I never hit this truncation bug.
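The split-event scenario can be reproduced with a temporary file and two separate handles: one writes a line in two pieces, the other polls it with `ReadLine` in between. `splitRead` is a hypothetical helper; this is a sketch of the mechanism under that assumption, not logstash-forwarder's code.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// splitRead (hypothetical) writes one log line in two write() calls and
// reads with ReadLine after each, returning what a ReadLine-based
// harvester would have shipped as separate events.
func splitRead(first, second string) ([]string, error) {
	w, err := os.CreateTemp("", "split-*.log")
	if err != nil {
		return nil, err
	}
	defer os.Remove(w.Name())
	defer w.Close()

	// Separate read handle, so writes do not move the reader's offset.
	r, err := os.Open(w.Name())
	if err != nil {
		return nil, err
	}
	defer r.Close()
	br := bufio.NewReader(r)

	var shipped []string
	for _, chunk := range []string{first, second} {
		if _, err := w.WriteString(chunk); err != nil {
			return nil, err
		}
		// On the first pass ReadLine hits EOF mid-line and still returns
		// the fragment with err == nil, as if it were a complete line.
		line, _, err := br.ReadLine()
		if err != nil {
			return nil, err
		}
		shipped = append(shipped, string(line))
	}
	return shipped, nil
}

func main() {
	lines, err := splitRead("Hel", "lo\n")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", lines) // ["Hel" "lo"]
}
```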

Thanks and nice work :)

Jason

driskell added a commit to driskell/logstash-forwarder that referenced this issue Feb 28, 2014
driskell pushed a commit to driskell/logstash-forwarder that referenced this issue Feb 28, 2014
…ines causing split events and sometimes incorrect truncation detection resulting in full rescan of entire log file
jordansissel commented

I think this is fixed in master, which uses ReadBytes('\n') instead of ReadLine.

@jordansissel jordansissel modified the milestone: 0.4.0 Mar 4, 2015