Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix for log lines without or delayed newlines results in a file truncated detection and reading the file again #164

Closed
wants to merge 3 commits into from

Conversation

tzahari
Copy link

@tzahari tzahari commented Feb 17, 2014

Changed the file reading from ReadLine() to ReadBytes('\n').
Now the file is not longer recognized as truncated (and start reading the file again from the beginning).
Details here: #149

The problem is that ReadLine() http://golang.org/pkg/bufio/#Reader.ReadLine returns also a incomplete line (without newline) but there is no indicator that the newline is missing.
Therefor I changed the reading to ReadBytes() http://golang.org/pkg/bufio/#Reader.ReadBytes

@driskell
Copy link
Contributor

Just for reference, as described in #149 we discovered ReadLine() actually treats EOF as an end-of-line delimiter (I checked the pkg source) - which is the root cause of the issue.

@driskell
Copy link
Contributor

Hi tzahari,
Looks like this won't work properly with resume as the +1 still needs to be there on the offset.
Offset will be behind by a number of bytes equal to the number of lines previously imported.
I've not tested this hypothesis but it seems to be the case running through the code...
Can you put the +1 back in and try it out?
Jason

@driskell
Copy link
Contributor

I've been testing without the removal of +1 and it does work OK, no issue with the reproduction steps you provided before. So if you can put that +1 back in that will make this fix perfect :-)
Thanks for sorting it!

@tzahari
Copy link
Author

tzahari commented Feb 28, 2014

Hi Jason,
Thanks for the hint. I am currently checking it and found a possible another issue on windows system with CRLF. The current version of the method would return the line with a CR. I think it would be better to remove this also.
What do you think?

@driskell
Copy link
Contributor

Hi tzahari,

I agree. I think the best way seems to be, readline() returns another value - the number of characters stripped - and that is added to offset.

Jason.

@driskell
Copy link
Contributor

Hi tzahari,

Found another possible issue - if the newline does not appear for more than 10 seconds, we lose all data up until the line because it's kept in buffer and we lose it when we return nil, io.EOF.

I think it need to do one of:

  1. Before returning EOF after the timeout, seek back to the beginning of that line so the next call reads it again
  2. More drastic... the buffer needs to be scoped to the caller so it can pass it back in on subsequent calls

Jason

Fixed: file offset still calculated wrong. File was detected as
truncated again :-(
Fixed: loosing the beginning of a partial line
Fixed: A CR is also removed from the EOL

Known issues:
If the last line was logged with CRLF as EOF then the file offset is
not correct calculated (missing one byte for the CR) in the Registry
@tzahari
Copy link
Author

tzahari commented Feb 28, 2014

Hi Jason,
I fixed some more issues and tested a lot. tzahari@2c7d915
I'll hope everything is correct now.
Thanks for your response.
Best Regards
Thomas

@driskell
Copy link
Contributor

Cool

That issue with the registrar... I've fixed that in my PR by moving the offset change to BEFORE the registrar event is sent. So the registrar actually has the calculated offset and doesn't need to do anything.

I'll drop a comment on one bit I'm not sure why was moved.

I've also got the tests working here so I can apply your PR to my PR, fix conflicts, and see if the test passes (currently fails of course!)

Regards

Jason

}

if segment != nil && len(segment) > 0 {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 145. Just needs blank line removing here and at end of block - forget what I said about moved - got it now, needed bufferSize earlier ^^ Coolios

@driskell
Copy link
Contributor

Hi tzahari,

I got the tests working again and created a test, but it fails.
I just worked out the problem - the offset in the file is basically at EOF because the data is buffered.
ReadBytes is pulling from a buffer. So when we call readline(), we've already read the partial line. What we need is offset from the beginning of the partial line.

So looks like the buffer is gonna need to be held by the caller, Harvest() and it passes it back in on each subsequent call. If readline() returns a line it will need to clear the buffer (since it won't go out of scope anymore).

Jason

@driskell
Copy link
Contributor

So I'm thinking if we do this in Harvest()

  // TODO(sissel): Make the buffer size tunable at start-time
  reader := bufio.NewReaderSize(h.file, 16<<10) // 16kb buffer by default
  buffer := new(bytes.Buffer)

  var read_timeout = 10 * time.Second
  last_read_time := time.Now()
  for {
    text, bytesread, err := h.readline(reader, buffer, read_timeout)

Then the later bit. Don't worry about the +1 in the registrar - my PR ultimately fixes it but its minor - with CRLF it just means we resume and read a blank line - no bad thing really

offset += int64(bytesread)

Then we have for readline()

func (h *Harvester) readline(reader *bufio.Reader, buffer *bytes.Buffer, eof_timeout time.Duration) (*string, int, error) {
  var is_partial bool = true
  var newline_length int = 1
  start_time := time.Now()

  for {
    segment, err := reader.ReadBytes('\n')

    if segment != nil && len(segment) > 0 {
      if segment[len(segment)-1] == '\n' {
        // Found a complete line
        is_partial = false

        // Check if also a CR present
        if len(segment) > 1 && segment[len(segment)-2] == '\r' {
          newline_length++
        }
      }

      // TODO(sissel): if buffer exceeds a certain length, maybe report an error condition? chop it?
      buffer.Write(segment)
    }

    if err != nil {
      if err == io.EOF && is_partial {
        time.Sleep(1 * time.Second) // TODO(sissel): Implement backoff

        // Give up waiting for data after a certain amount of time.
        // If we time out, return the error (eof)
        if time.Since(start_time) > eof_timeout {
          return nil, 0, err
        }
        continue
      } else {
        log.Println(err)
        return nil, 0, err // TODO(sissel): don't do this?
      }
    }

    // If we got a full line, return the whole line without the EOL chars (CRLF or LF)
    if !is_partial {
      // Get the str length with the EOL chars (LF or CRLF)
      bufferSize := buffer.Len()
      str := new(string)
      *str = buffer.String()[:bufferSize - newline_length]
      // Reset the buffer for the next line
      buffer.Reset()
      return str, bufferSize, nil
    }
  } /* forever read chunks */

  return nil, 0, nil
}

Using the above, the test I've made passes!

@driskell
Copy link
Contributor

FYI - just updated my comment with the fixed code - I simplified the CRLF detection and fixed a few formatting bits

@tzahari
Copy link
Author

tzahari commented Feb 28, 2014

Hi Jason,
Thanks for your optimizations and help.
Do you have any ideas why the Travis CI build fails every time?
Have a nice weekend.
Thomas

@driskell
Copy link
Contributor

No problem!
It's because the test suite is expecting newer gems than what the Gemspec.lock specifies.
And also because it still refers to lumberjack and not logstash-forwarder - looks like the rename wasn't finished for the tests.

I fixed it all though in another PR - and also added test cases for this issue and the couple others I fixed. That one doesn't fail because of the above reasons, it just fails because the tests fail (which is good!)

Once the test PR is merged, this one too, and my one, it should start succeeding. Although Jordan might kill me when he sees the conflicts lol... but when I did a test merge it didn't look too bad...

Have a good weekend too :-)
Thanks for letting everyone know the issue and its good to have it sorted!
I may email Jordan at some point if I can track down contact info and offer to merge the PRs for him so he can do one feel sweep - this is assuming he doesn't think my code is crap though :)

@driskell
Copy link
Contributor

When I say merge the PR for him - I mean into one big PR. I'm not a maintainer! lol

@glongman
Copy link

any news of forward movement on this fix?

@driskell
Copy link
Contributor

Hi glongman,
It's complete. Just needs merge.

You can find a fully working logstash-forwarder including this and more in my github:
Http://github.com/driskell/logstash-forwarder

@jordansissel
Copy link
Contributor

Apologies for the delay in attending to many of these pull requests. The PRs now appear to be some kind of spaghetti of merges crossing into other PRs. My plan is to spend some time next week untangling what has occurred in my absence. Hopefully will have some of these things merged soon.

Thanks again for your patience!

@danslimmon
Copy link

I can confirm that this fix works. I've been using it for a week, and I haven't had any more floods of repeated log data.

for {
segment, is_partial, err := reader.ReadLine()
segment, err := reader.ReadBytes('\n')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't fully read this patch, but it looks like this reimplements go's ReadLine() method with respect to partial detection. Is there a bug we should file upstream to the Go developers? Put differently, what caused this to need reimplementing of is_partial?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As described in #149 we discovered ReadLine() actually treats EOF as an end-of-line delimiter (I checked the pkg source) - which is the root cause of the issue.
So if you have data at the end of the log, with no new line, ReadLine() returns normally with is_partial set to false and you can never know if there was a new line or not - and LSF assumes there was (the +1 we have for offset). So we end up offset too far ahead of the Size() and detect truncation, wrongly.

It's also worth noting http://golang.org/pkg/bufio/#Reader.ReadLine

ReadLine is a low-level line-reading primitive. Most callers should use ReadBytes('\n') or ReadString('\n') instead or use a Scanner.

Using ReadBytes('\n') we can safely ensure that we don't treat EOF as a line delimiter and properly count the \r\n bytes at the end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think Go have documented the behaviour at EOF. Note the second sentence.

The text returned from ReadLine does not include the line end ("\r\n" or "\n"). No indication or error is given if the input ends without a final line end.

So an upstream patch is probably not viable as it's the defined behaviour.

@tzahari
Copy link
Author

tzahari commented Jun 6, 2014

Hi @jordansissel ,
Can I support you to merge this pull request?
Do not hesitate to contact me.
Regards
Thomas

@driskell
Copy link
Contributor

@twood9003 are you sure it's the same issue? This was a very specific issue and you haven't mentioned any of the symptoms you are seeing.

If you could raise a new issue it would be better and we can all look at it with fresh eyes in the event it is different. It helps too as this is now closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants