file tailing not working when a relative log path name is passed to --logs flag #151
Tailing only picks up from the end of the file when running normally. It doesn't read the entire file from the beginning except in one-shot mode, so that's a red herring. The first case looks like a bug, because the last lines of the log show it sees the update to the logs/a.log file -- but you've got a custom progs path, so what is the source to
Is there a situation where it is beneficial to have the prior day's work ("reading the entire file from the beginning") later in the day in a time-series data structure? Acknowledge your gap, but ignore the data that came from it. If you can backfill, great, but every program in the chain (Prometheus, statsd, collectd, etc.) needs to understand you are sending past metrics. With past data (especially time-series data), it's almost never good to process old events as now-events. You don't want a huge anomalous spike immediately after coming back from your gap; it throws off your alerting systems.
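To make the alerting concern concrete, here is a minimal sketch (with made-up numbers; the `rate` helper is hypothetical, mimicking a Prometheus-style rate over adjacent scrapes) of how replaying a backlog as now-events produces an anomalous spike:

```go
package main

import "fmt"

// rate computes the per-second increase of a counter between two scrapes,
// the way a Prometheus-style rate() over adjacent samples would.
func rate(prev, cur, seconds float64) float64 {
	return (cur - prev) / seconds
}

func main() {
	// Steady traffic: 1 log line per second, scraped every 60 seconds.
	fmt.Println(rate(3600, 3660, 60)) // 1 line/s

	// After a one-hour gap, replaying the 3600 missed lines as "now"
	// events makes a single scrape interval look like 61 lines/s: an
	// anomalous spike that can trip rate-based alerts.
	fmt.Println(rate(3660, 3660+3600+60, 60)) // 61 lines/s
}
```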
@jaqx0r count_lines.mtail is the example provided in the examples folder:
For any program I've written, it works with rc2 and doesn't work with rc10. one_shot mode provides the right results, but tailing does not process the lines added after starting.
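For reference, the line-counting program under discussion is tiny; a sketch of it (along the lines of the linecount.mtail shipped in mtail's examples folder -- the exact source may differ by version) is just a counter bumped on every line:

```mtail
counter line_count

/$/ {
  line_count++
}
```

With this loaded via the progs flag, each line appended to a tailed log should increment line_count on the /metrics endpoint.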
I've recorded a screencast showing the issue.
Cannot replicate with latest pull from master. https://asciinema.org/a/Z3ApggBKCiQSRbyM7vw7HZDds
@jnovack if I run it inside docker as you did, it also works for me. But if I run it on the local OS (I've tested Ubuntu 16.10, WSL on Windows, an old Red Hat, and OSX), tailing doesn't work. If I compile the rc2 version instead, then tailing works.
Could it be some difference in behavior between Dockerized versions and versions running natively?
Did you build latest master? Docker builds the current directory, the
I don't keep
I cloned from GitHub without selecting a label:
I think I'm running exactly the same as you @jnovack 😀
Yes, but your original post, which started this issue, was version |
Same issue @jnovack
So, just to be clear: Docker works, the native binary does NOT. You're really going to make me install
You don't need to install Go, you can download a release 😉
What is fun is that if I build rc2 it works; every version since rc3 doesn't. Yes, I did try them all 😓 @jnovack
I'm not concerned about the old. Can you confirm (just hit the Thumbs Up) that
I confirm that the docker version binary
Ok.. I confirm. This is valid. https://asciinema.org/a/m5WVxAS5ALDEgNmpurcy20NVg
Thank goodness.... I thought I was going nuts 😜 |
@jaqx0r this seems to be the bad commit: 7aacfb9. Works prior to this commit... ALTHOUGH, I see that it's commented out in Perhaps it's an issue in
Thanks both of you for the triage! I'll take a look in a moment.
I can't explain why I made that change. The loop won't iterate if there
are zero matches, but the error return is probably triggering an early
return up the stack by accident.
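A hedged sketch of that failure mode (the names and structure here are hypothetical, not mtail's actual code): a glob with zero matches returns an error instead of being a no-op, and a caller treating that error as fatal aborts the watch setup early.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// watchGlob illustrates the suspect change: it treats zero glob matches
// as an error instead of a harmless empty result.
func watchGlob(pattern string) ([]string, error) {
	matches, err := filepath.Glob(pattern)
	if err != nil {
		return nil, err
	}
	if len(matches) == 0 {
		// Returning an error here propagates up the stack and can
		// abort the surrounding setup by accident, even though
		// "no matches yet" is a normal state for a log that will
		// only be created later.
		return nil, errors.New("no files matched " + pattern)
	}
	return matches, nil
}

func main() {
	if _, err := watchGlob("no-such-file-*.log"); err != nil {
		fmt.Println("setup aborted:", err)
	}
}
```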
Curiously, that code is commented out at HEAD. @jnovack sorry, I just saw you already said that. I've been thinking of ditching afero -- it continually ends up triggering race detector problems. I only used it to speed up unit tests, but if we are blind to actual filesystem issues, that's no good.
@riclib @jnovack what platforms are you testing on? I can't reproduce it on my Debian system at 517231f -- I see the line counter increase on /metrics using the same setup that @jnovack provided in the asciinema link. @riclib, you said you confirmed the docker version works on all platforms and the native binary does not work on all platforms, but I can't understand how you saw it not working. Can you please, for one last datapoint,
@jaqx0r I can reproduce it on my Ubuntu 17.10 and on my Mac, OSX 10.13.4.
I can get you access to the Ubuntu box; it's one of our test servers. I've seen the issue not happening on Docker for Mac on the same Mac above.
Just tried doing the make clean, still the same issue @jaqx0r
@jaqx0r reproducing the issue is easy. When the issue happens, appends to logs being tracked don't generate changes to metrics. If you use the simple linecount program, it's enough to test.
In the mtail log you see, for the above:
Heh, it's not easy for me to reproduce :-)
Can you run your mtail with flags --logtostderr --vmodule=tail=2,log_watcher=2 and send the output, annotated with when you append to the log being watched?

Can you both also pull to HEAD (currently f7ece4f) and run the shell script tests/tail_test.sh (from the top of the source tree, possibly modifying the path to mtail if necessary) and tell me if that script repros on your systems.
…On 23 April 2018 at 14:36, Ricardo Liberato ***@***.***> wrote:
mtail -logtostderr -v 1 -progs ./progs -logs a.log
***@***.*** ~/mt curl http://localhost:3903/metrics
# TYPE line_count counter
# line_count defined at linecount.mtail:4:9-18
line_count{prog="linecount.mtail"} 0
***@***.*** ~/mt curl http://localhost:3903/metrics
# TYPE line_count counter
# line_count defined at linecount.mtail:4:9-18
line_count{prog="linecount.mtail"} 0
***@***.*** ~/mt echo `date` >> a.log
***@***.*** ~/mt curl http://localhost:3903/metrics
# TYPE line_count counter
# line_count defined at linecount.mtail:4:9-18
line_count{prog="linecount.mtail"} 0
***@***.*** ~/mt
Session with the parameters you asked for above:
Running tests/tail_test.sh:
Interesting! Your output is missing log lines that I see when you reproduce the problem. There should be calls to handleLogEvent at tail.go:170 after you add the log lines to your log file. Finding out why that is will lead us in the right direction. Also, when you run my test script, your line_count metric and debug var are 2, which is what I expected to see from mtail correctly working. (I've discovered another bug in this process, the counter should be 3, but that's a different story.) So... something in your test setup that's not in my test setup is affecting this too?
tail.go:448 shows us where we head to tail.go:170, inside a conditional, "isWatching". So for some reason your mtail doesn't think you're watching that file for updates.
My test setup looks like this: I have a subfolder in my home directory called mt.
Interestingly... if I place the logs in /tmp/test/logfile, it works.
Seems to be a much more mundane error. If I put a full absolute path it works; if I put a relative path it doesn't, @jaqx0r. 😱 If I put /Home/riclib/mt/a.log (same file) as input to --logs, then tailing works.
Yep, that's what I'm thinking too. I managed to reproduce the bug with a relative path to the logfile directly, instead of either an absolute path or a relative path to the directory holding the log, and I'm seeing log lines that indicate that the pattern match from fsnotify is different than what's in the 'watched' map. I'm trying to use filepath.Abs to resolve the absolute path of the files for comparison, but it's making some other tests fail, so the fix will take a little while. I'm glad I was able to reproduce it here though. I didn't realise before that you were using relative paths to a single file, and that's what it took to trigger the bug.
Well, following the traditions of our ancestors, this bug is hereby named "riclib". :) |
Please fetch head and try it out. |
Seems to be doing a one-shot... I only added one line and...
OK, not fixed :( |
This time for sure! |
Tested and fix works. Thanks @jaqx0r !
woot!
Tagged as rc11
…On 23 April 2018 at 22:41, Ricardo Liberato ***@***.***> wrote:
***@***.*** ~/mt clear
***@***.*** ~/mt echo `date` >> a.log
***@***.*** ~/mt echo `date` >> a.log
***@***.*** ~/mt curl http://localhost:3903/metrics
# TYPE line_count counter
# line_count defined at linecount.mtail:4:9-18
line_count{prog="linecount.mtail"} 2
***@***.*** ~/mt echo `date` >> a.log
***@***.*** ~/mt curl http://localhost:3903/metrics
# TYPE line_count counter
# line_count defined at linecount.mtail:4:9-18
line_count{prog="linecount.mtail"} 3
Hi!
Using the count_lines sample, since version rc4 I can't get tailing to work. I get the right results if I run mtail in one_shot mode, but no metric updates on the Prometheus metrics endpoint otherwise.
log:
At 06:10:27 I added a line to the tailed file using
echo "Hello" >> logs/a.log
In the metrics endpoint I get:
The same scenario works perfectly under version rc2:
with metrics output:
In both versions, running with the one_shot option works perfectly.
Am I doing something wrong or is tailing broken currently?
I've tried with the precompiled binaries and builds I made myself. I've tested on Ubuntu 16.04, an old Red Hat, and WSL on Windows with Ubuntu 16.04.