duplicate logs? #149

sysmonk · 2014-01-31T15:46:16Z

While searching for some logs, I noticed that there are duplicate logs in ElasticSearch.
The logs are exactly the same (same server, file, timestamp, message, etc.), except that the 'offset' differs by 2 (i.e. one log has offset 100 and the other has offset 102). Logstash-forwarder was not restarted for at least half a day after the current logfile was rotated.

I've got another case where there are even 3 copies of the log, but they have the same offset, and logstash-forwarder was restarted a few times before the log got rotated.

tzahari · 2014-02-14T14:47:35Z

I detected the same problem and found a way to reproduce this issue.
If a line is not terminated by a newline, readLine returns an incomplete line, and the offset is still advanced by + 1 (for the newline). The offset therefore ends up beyond the file size, and the harvester thinks the file was truncated.
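
To make the arithmetic concrete, here is a minimal sketch of the bookkeeping described above. The helper names `nextOffset` and `looksTruncated` are hypothetical and only illustrate the reasoning; they are not taken from the logstash-forwarder source.

```go
package main

import "fmt"

// nextOffset is a hypothetical helper that mimics the bookkeeping described
// above: every returned line is assumed to have been terminated by one '\n',
// so the offset advances by len(line) + 1.
func nextOffset(offset int64, line []byte) int64 {
	return offset + int64(len(line)) + 1
}

// looksTruncated is a hypothetical version of the truncation check: a stored
// offset beyond the current file size is taken to mean the file shrank.
func looksTruncated(offset, fileSize int64) bool {
	return offset > fileSize
}

func main() {
	content := []byte("Hello") // 5 bytes on disk, no trailing newline yet
	offset := nextOffset(0, content)
	fmt.Printf("offset=%d size=%d truncated=%v\n",
		offset, len(content), looksTruncated(offset, int64(len(content))))
}
```

With a 5-byte line and no newline on disk, the computed offset is 6, which is past the end of the file, so the check misfires.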

Steps to reproduce:

echo -n "Hello" >> logfile.log  && sleep 15 && echo " World" >> logfile.log

After the "Hello" the harvester thinks the file was truncated and starts from the beginning.

2014/02/14 12:11:27.477930 Registrar received 1 events
2014/02/14 12:11:32.473682 Registrar received 1 events
2014/02/14 12:11:37.475362 File truncated, seeking to beginning: logfile.log
2014/02/14 12:11:40.033208 Registrar received 5 events
2014/02/14 12:11:47.488347 Registrar received 1 events
2014/02/14 12:11:52.486090 File truncated, seeking to beginning: logfile.log

I created a patch for it. Please try it: tzahari@2c7d915

driskell · 2014-02-25T08:11:19Z

ReadLine returning an incomplete line should be fine because of is_partial.
Is it because ReadLine treats EOF as end of line?...
If so the PR will fix another issue I've seen... where lines get split.

Did you find out if that is the case? Does ReadLine treat EOF as an end-of-line?

tzahari · 2014-02-25T09:14:21Z

Hi Driskell,
I think you are right.
Just to be sure, I added some debug output to the original version of the Harvester and ran some tests.
If I append an incomplete line (without a newline) to the log file, ReadLine (from Go) returns buffer != nil, is_partial=false and err=nil.
Therefore the readline function of the harvester returns an incomplete string!
Afterwards the offset is calculated incorrectly and a file truncation is detected.
So, yes: ReadLine treats EOF as an end-of-line and does not mark the line as incomplete.
I hope that helps.
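
The behaviour reported above can be checked in isolation with a small sketch. It uses only `bufio.Reader.ReadLine` from the Go standard library; the wrapper `readOne` is a hypothetical name, and an in-memory string stands in for the log file.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readOne (hypothetical helper) returns what bufio.Reader.ReadLine reports
// for the first line of s. The string stands in for the log file contents.
func readOne(s string) (string, bool, error) {
	r := bufio.NewReader(strings.NewReader(s))
	line, isPrefix, err := r.ReadLine()
	return string(line), isPrefix, err
}

func main() {
	// One input ends in a newline, the other stops at EOF mid-line.
	for _, in := range []string{"Hello\n", "Hello"} {
		line, isPrefix, err := readOne(in)
		fmt.Printf("input=%q line=%q isPrefix=%v err=%v\n", in, line, isPrefix, err)
	}
}
```

Both inputs yield the same triple, so a caller cannot tell from ReadLine's return values alone whether the newline was actually present.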

driskell · 2014-02-25T09:21:34Z

Hi tzahari,

Nice work. So it does look like ReadLine() will actually treat EOF as end-of-line terminator... that's not even documented :)

Just checked the source for ReadLine and indeed it essentially treats EOF as end-of-line, hence this problem. Looks like your fix will also fix my partial line issue, where I sometimes see entries where the line has been split in two. The line must have been written in two write() calls: logstash-forwarder picked up the first write as an entire line (it hit EOF), then discovered the remainder of the line and sent that separately as a new line. Because I have fast-updating logs, and logs that always have new lines, I guess that's the only reason I never hit this truncation bug.

Thanks and nice work :)

Jason

…ines causing split events and sometimes incorrect truncation detection resulting in full rescan of entire log file

jordansissel · 2015-03-04T00:47:34Z

I think this is fixed in master, which uses ReadBytes('\n') instead of ReadLine.
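
As a rough illustration of why ReadBytes helps: `bufio.Reader.ReadBytes('\n')` returns `io.EOF` together with whatever partial data it read, so a caller can hold that data back until the newline arrives. The sketch below is an assumption about how such buffering could look, not the code in master; `lineReader`, `next` and `collect` are hypothetical names, and a `bytes.Buffer` stands in for a growing log file.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// lineReader is a hypothetical helper that only emits newline-terminated lines.
type lineReader struct {
	r       *bufio.Reader
	pending []byte // bytes read so far that are not yet terminated by '\n'
}

// next returns a complete line (without the trailing '\n') and true, or
// "" and false when only a partial line is available so far.
// "\r\n" endings are not handled in this sketch.
func (l *lineReader) next() (string, bool, error) {
	chunk, err := l.r.ReadBytes('\n')
	l.pending = append(l.pending, chunk...)
	if err == io.EOF {
		return "", false, nil // keep the partial data and wait for more
	}
	if err != nil {
		return "", false, err
	}
	line := string(l.pending[:len(l.pending)-1])
	l.pending = l.pending[:0]
	return line, true, nil
}

// collect simulates a file that grows by one write at a time and returns
// every complete line seen across all writes.
func collect(writes ...string) []string {
	var file bytes.Buffer
	lr := &lineReader{r: bufio.NewReader(&file)}
	var lines []string
	for _, w := range writes {
		file.WriteString(w)
		for {
			line, ok, err := lr.next()
			if err != nil || !ok {
				break
			}
			lines = append(lines, line)
		}
	}
	return lines
}

func main() {
	// Same shape as the reproduction: "Hello" first, " World\n" later.
	fmt.Printf("%q\n", collect("Hello", " World\n"))
}
```

In this sketch the two writes from the reproduction are emitted as a single line, and nothing is emitted while the line is still incomplete, so an offset would only need to advance once the newline has actually been read.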

tzahari pushed a commit to tzahari/logstash-forwarder that referenced this issue Feb 14, 2014

Bugfix for logging lines without or delayed newlines

2c7d915

I found a solution for the issue elastic#149

tzahari mentioned this issue Feb 17, 2014

Bugfix for log lines without or delayed newlines results in a file truncated detection and reading the file again #164

Closed

driskell added a commit to driskell/logstash-forwarder that referenced this issue Feb 28, 2014

Fix stud version requirement and test spec, and add new tests for cur…

9e49618

…rent issues elastic#149, elastic#144 and elastic#167

driskell mentioned this issue Feb 28, 2014

Fix stud version requirement and test spec, and add new tests for current issues #172

Closed

jordansissel closed this as completed Mar 4, 2015

jordansissel modified the milestone: 0.4.0 Mar 4, 2015
