Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

file tailing not working when a relative log path name is passed to --logs flag #151

Closed
riclib opened this issue Apr 5, 2018 · 43 comments
Closed
Assignees
Labels
bug This is considered a bug and shall get fixed

Comments

@riclib
Copy link

riclib commented Apr 5, 2018

Hi!

Using the count_lines sample, since version rc4 I can't get tailing to work. I get the right results if I run mtail in one_shot mode, but no metric on the Prometheus metrics endpoint.

log:

$ ./mtail_rc10 --progs="/apps/solidmon/agent/mtail/programs/" --logs="logs/a.log" -v=10 -logtostderr
I0405 06:10:19.350138   62463 main.go:108] mtail version v3.0.0-rc10 git revision 60ed333f3672ec835c09c507f095fdd0be6ec1d7 go version go1.9.4
I0405 06:10:19.350205   62463 main.go:109] Commandline: ["./mtail_rc10" "--progs=/apps/solidmon/agent/mtail/programs/" "--logs=logs/a.log" "-v=10" "-logtostderr"]
I0405 06:10:19.350429   62463 lexer.go:179] Emitting COUNTER at count_lines.mtail:2:23
I0405 06:10:19.350455   62463 lexer.go:179] Emitting ID at count_lines.mtail:2:9-18
I0405 06:10:19.350467   62463 lexer.go:179] Emitting NL at count_lines.mtail:3:20
I0405 06:10:19.350478   62463 lexer.go:179] Emitting DIV at count_lines.mtail:3:1
I0405 06:10:19.350506   62463 yaccpar:954] position marked at count_lines.mtail:3:1
I0405 06:10:19.350515   62463 driver.go:76] Entering regex
I0405 06:10:19.350521   62463 lexer.go:179] Emitting REGEX at count_lines.mtail:3:2
I0405 06:10:19.350530   62463 lexer.go:555] Exiting regex
I0405 06:10:19.350535   62463 lexer.go:556] Regex at line 2, startcol 2, col 2
I0405 06:10:19.350542   62463 lexer.go:179] Emitting DIV at count_lines.mtail:3:3
I0405 06:10:19.350554   62463 lexer.go:179] Emitting LCURLY at count_lines.mtail:3:5
I0405 06:10:19.350564   62463 lexer.go:179] Emitting NL at count_lines.mtail:4:7
I0405 06:10:19.350574   62463 lexer.go:179] Emitting ID at count_lines.mtail:4:3-12
I0405 06:10:19.350585   62463 lexer.go:179] Emitting INC at count_lines.mtail:4:13-14
I0405 06:10:19.350595   62463 lexer.go:179] Emitting NL at count_lines.mtail:5:16
I0405 06:10:19.350605   62463 lexer.go:179] Emitting RCURLY at count_lines.mtail:5:1
I0405 06:10:19.350618   62463 lexer.go:179] Emitting EOF at count_lines.mtail:5:2
I0405 06:10:19.350660   62463 checker.go:84] found sym &{line_count variable typeVar4 count_lines.mtail:2:9-18 <nil> 0 false}
I0405 06:10:19.350691   62463 types.go:285] Unifying Int and typeVar4
I0405 06:10:19.350700   62463 types.go:285] Unifying typeVar4 and Int
I0405 06:10:19.350707   62463 types.go:300] Making "typeVar4" type "Int"
I0405 06:10:19.350746   62463 loader.go:200] Loaded program count_lines.mtail
I0405 06:10:19.350755   62463 loader.go:219] Program count_lines.mtail has goroutine marker 0x636f756e
I0405 06:10:19.350785   62463 vm.go:640] Starting program count_lines.mtail
I0405 06:10:19.350800   62463 loader.go:223] Started count_lines.mtail
I0405 06:10:19.350849   62463 mtail.go:55] Tail pattern "logs/a.log"
I0405 06:10:19.350872   62463 tail.go:129] glob matches: [logs/a.log]
I0405 06:10:19.350931   62463 tail.go:222] Read: 0 EOF
I0405 06:10:19.350942   62463 tail.go:227] Suspected truncation.
I0405 06:10:19.350948   62463 tail.go:194] current seek position at 629148964
I0405 06:10:19.350955   62463 tail.go:204] File size is 629148964
I0405 06:10:19.350961   62463 tail.go:230] handletrunc with error 'no truncate appears to have occurred'
I0405 06:10:19.350975   62463 tail.go:404] EOF on first read
I0405 06:10:19.351001   62463 tail.go:423] Tailing logs/a.log
I0405 06:10:19.351137   62463 mtail.go:227] Listening on port :3903
I0405 06:10:27.916078   62463 log_watcher.go:72] watcher event "logs/a.log": WRITE
I0405 06:10:27.916197   62463 log_watcher.go:72] watcher event "logs/a.log": WRITE
I0405 06:10:27.916243   62463 loader.go:97] Skipping logs/a.log due to file extension.
I0405 06:10:27.916260   62463 loader.go:97] Skipping logs/a.log due to file extension.
I0405 06:10:27.916307   62463 tail.go:435] Event type watcher.UpdateEvent{Pathname:"logs/a.log"}
I0405 06:10:27.916334   62463 tail.go:435] Event type watcher.UpdateEvent{Pathname:"logs/a.log"}

At 06:10:27 I added a line to the tailed file using echo "Hello" >>logs/a.log.
in the metrics endpoint I get:

# TYPE line_count counter
# line_count defined at count_lines.mtail:2:9-18
line_count{prog="count_lines.mtail"} 0

The same scenario works perfectly under version rc2:

$ ./mtail_rc2 --progs="/apps/solidmon/agent/mtail/programs/" --logs="logs/a.log" -v=10 -logtostderr  I0405 06:14:47.558366   63852 main.go:101] mtail version v3.0.0-rc2 git revision 5e6d38908091a8648c0f26c44ebd708e241f3814 go version go1.9.4
I0405 06:14:47.558424   63852 main.go:102] Commandline: ["./mtail_rc2" "--progs=/apps/solidmon/agent/mtail/programs/" "--logs=logs/a.log" "-v=10" "-logtostderr"]
I0405 06:14:47.558702   63852 loader.go:196] Loaded program count_lines.mtail
I0405 06:14:47.558713   63852 loader.go:215] Program count_lines.mtail has goroutine marker 0x636f756e
I0405 06:14:47.558730   63852 loader.go:218] Started count_lines.mtail
I0405 06:14:47.558772   63852 mtail.go:103] Tail pattern "logs/a.log"
I0405 06:14:47.558796   63852 tail.go:125] glob matches: [logs/a.log]
I0405 06:14:47.559063   63852 tail.go:349] Tailing /apps/solidmon/agent/mtail/logs/a.log
I0405 06:14:47.559087   63852 vm.go:715] Starting program count_lines.mtail
I0405 06:14:47.559131   63852 mtail.go:290] Listening on port :3903

with metrics output:

# TYPE line_count counter
# line_count defined at count_lines.mtail:2:9-18
line_count{prog="count_lines.mtail"} 1

In both versions running with the option one_shot works perfectly.

Am I doing something wrong or is tailing broken currently?

I've tried with the precompiled binaries and builds I made myself. I've tested in ubuntu 16.04, an old redhat and wsl on windows with Ubuntu 16.04.

@jaqx0r
Copy link
Contributor

jaqx0r commented Apr 19, 2018

Tailing only picks up from the end of file when running normally. It doesn't read the entire file from the beginning except in one-shot mode, so that's a red herring.

The first case looks like a bug, because the last lines of the log show it sees the update to the logs/a.log file -- but you've got a custom progs path so what is the source to count_lines.mtail?

@jnovack
Copy link
Contributor

jnovack commented Apr 19, 2018

Is there a situation where it is beneficial to have the days prior work ("reading the entire file from beginning") later on in the day in a time series data structure? Acknowledge your gap, but ignore the data that came from it. If you can backfill, great, but every program in the chain needs to understand (prometheus, statsd, collectd, etc) you are sending past metrics.

In the event of past data (especially time series data), it's almost never good to process old events as now events. You don't want to have a huge (anomalous spike) immediately coming back from your gap, throws off your alerting systems.

@riclib
Copy link
Author

riclib commented Apr 21, 2018

@jaqx0r count_lines.mtail is the example provided in the examples folder:

# Copyright 2011 Google Inc. All Rights Reserved.
# This file is available under the Apache license.

counter line_count

/$/ {
  line_count++
}

For any program I've written it works with rc2 and doesn't work with rc10. One_shot mode provides the right results but tailing does not process the lines added after starting.

@riclib
Copy link
Author

riclib commented Apr 21, 2018

I've recorded a screencast showing the issue.
https://drive.google.com/file/d/1-19KjVQMZtIkermHOz7M0aorqaCDhKvk/view?usp=sharing

@jnovack
Copy link
Contributor

jnovack commented Apr 21, 2018

Cannot replicate with latest pull from master.

https://asciinema.org/a/Z3ApggBKCiQSRbyM7vw7HZDds

git clone https://github.com/google/mtail.git
cd mtail
docker build -t mtail .
echo "line" > foo.log
docker run --rm -v `pwd`/examples:/opt/progs -v `pwd`/foo.log:/opt/foo.log \
     -p 3903:3903 mtail -progs /opt/progs/linecount.mtail -logs /opt/foo.log
curl http://localhost:3903
echo "line" >> foo.log
echo "line" >> foo.log
echo "line" >> foo.log
curl http://localhost:3903

@riclib
Copy link
Author

riclib commented Apr 22, 2018

@jnovack if I run it inside docker as you did it also works for me.

But If I run it on the local os (and I've tested ubuntu 16.10, wsl on windows, an old redhat and running os OSX) tailing doesn't work. If I compile instead the rc2 version than tailing works.

@riclib
Copy link
Author

riclib commented Apr 22, 2018

Could it be some difference in behavior between dockered versions and versions running natively?

@jnovack
Copy link
Contributor

jnovack commented Apr 22, 2018

did you build latest master? docker builds the current directory, the mtail_rc10 binary is a few commits back from latest master.

$ ./mtail_rc10 --version
mtail version v3.0.0-rc10 git revision 60ed333f3672ec835c09c507f095fdd0be6ec1d7 go version go1.9.4
$ docker run -it --rm mtail --version                       
mtail version v3.0.0-rc10-106-g517231f git revision 517231f79449b457d7709eb6cc34bbba77bc1065 go version go1.10.1

I don't keep go on my machine (communist, I know), which is why I was using the docker version.

@riclib
Copy link
Author

riclib commented Apr 22, 2018

I cloned from github without selecting a label:

 riclib@libMac  ~/go/src/github.com/google/mtail   master  git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
 riclib@libMac  ~/go/src/github.com/google/mtail   master 

 riclib@libMac  ~/go/src/github.com/google/mtail   master  mtail --version
mtail version v3.0.0-rc10-106-g517231f git revision 517231f79449b457d7709eb6cc34bbba77bc1065 go version go1.10.1

I think I'm running exactly the same as you @jnovack 😀

@jnovack
Copy link
Contributor

jnovack commented Apr 22, 2018

Yes, but your original post, which started this issue, was version v3.0.0-rc10. The one you JUST built was v3.0.0-rc10-106-g517231f. Perform the same test with v3.0.0-rc10-106-g517231f.

@riclib
Copy link
Author

riclib commented Apr 22, 2018

image

same issue @jnovack

@jnovack
Copy link
Contributor

jnovack commented Apr 22, 2018

So, just to be clear. Docker works, native binary does NOT.

You really going to make me install go on my machine??

@riclib
Copy link
Author

riclib commented Apr 22, 2018

you don't need to install go, you can download a release 😉

@riclib
Copy link
Author

riclib commented Apr 22, 2018

What is fun is that if I build rc2 it works, every version since rc3 doesn't. Yes, I did try them all 😓 @jnovack

@jnovack
Copy link
Contributor

jnovack commented Apr 22, 2018

I'm not concerned about the old rc releases at this point. I'm concerned that you are suggesting there's a inconsistency between v3.0.0-rc10-106-g517231f running in Docker, and v3.0.0-rc10-106-g517231f running as a native OSX binary.

Can you confirm (just hit the Thumbs Up), that v3.0.0-rc10-106-g517231f Docker WORKS as expected (does NOT have the issue), and v3.0.0-rc10-106-g517231f native OSX binary does NOT work (has the issue)?

@riclib
Copy link
Author

riclib commented Apr 22, 2018

I confirm that the docker version binary v3.0.0-rc10-106-g517231f works and the native (OSX+Ubuntu 16.04+Windows wsl ubuntu 16.10) binary v3.0.0-rc10-106-g517231f doesn't. I only mention rc2 because it may help narrow down when the non regression was introduced (if one was) @jnovack .

@jnovack
Copy link
Contributor

jnovack commented Apr 22, 2018

Ok.. I confirm. This is valid.

https://asciinema.org/a/m5WVxAS5ALDEgNmpurcy20NVg

  • Docker version WORKS as expected (no issue).
  • Native Binary does NOT work (exhibits issue).

@riclib
Copy link
Author

riclib commented Apr 22, 2018

Thank goodness.... I thought I was going nuts 😜

@jnovack
Copy link
Contributor

jnovack commented Apr 22, 2018

@jaqx0r this seems to be the bad commit. 7aacfb9

Works prior to this commit... ALTHOUGH, I see that it's commented out in latest. So, I'm confused.

Perhaps it's an issue in spf13/afero? My guess is that CircleCI is doing testing in Docker Containers (as they SHOULD!), but that is why you are not seeing the tests fail.

@jaqx0r
Copy link
Contributor

jaqx0r commented Apr 23, 2018 via email

@jaqx0r
Copy link
Contributor

jaqx0r commented Apr 23, 2018 via email

@jaqx0r
Copy link
Contributor

jaqx0r commented Apr 23, 2018

Curiously, that code is commented out at HEAD. @jnovack sorry I just saw you already said that.

I've been thinking of ditching afero -- it continually ends up triggering race detector problems. I only used it to speed up unit tests but if we are blind to actual filesystem issues that's no good.

@jaqx0r
Copy link
Contributor

jaqx0r commented Apr 23, 2018

@riclib @jnovack what platforms are you testing on? I can't reproduce it on my Debian system at 517231f -- I see the line counter increase on /metrics using the same setup that @jnovack provided in the asciinema link.

@riclib you said you confirmed the docker version works on all platforms and the native binary does not work on all platforms, but I can't understand how you saw it not working.

Can you please, for one last datapoint, make clean mtail so it forces a go get -u to update all the dependencies? I worry that you've cached an older version of afero (and I haven't made any attempt to vendor the dependencies.)

@riclib
Copy link
Author

riclib commented Apr 23, 2018

@jaqx0r I can reproduce it on my ubuntu 17.10 and on my mac OSX 10.13.4

 riclib@solidmon  ~  lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 17.10
Release:	17.10
Codename:	artful
 riclib@solidmon  ~ 

I can get you access to the ubuntu box, it's one of our test servers.

I've seen the issue not happening on docker for Mac on the same mac above.

@riclib
Copy link
Author

riclib commented Apr 23, 2018

just tried doing the make clean, still same issue @jaqx0r

@riclib
Copy link
Author

riclib commented Apr 23, 2018

@jaqx0r reproducing the issue is easy. When the issue happens appends to logs being tracked don't generate changees to metrics. If you use the simple linecount program it's enough to test.

mtail -logtostderr -v 1 -progs ./progs -logs a.log
riclib@libMac  ~/mt  curl http://localhost:3903/metrics
# TYPE line_count counter
# line_count defined at linecount.mtail:4:9-18
line_count{prog="linecount.mtail"} 0
 riclib@libMac  ~/mt  echo `date` >> a.log
 riclib@libMac  ~/mt  curl http://localhost:3903/metrics
# TYPE line_count counter
# line_count defined at linecount.mtail:4:9-18
line_count{prog="linecount.mtail"} 0
 riclib@libMac  ~/mt 

on the mtail log you see for above:

 riclib@libMac  ~/mt  mtail -logtostderr -v 1 -progs ./progs -logs a.log
I0423 07:35:04.404764    9280 main.go:108] mtail version v3.0.0-rc10-106-g517231f git revision 517231f79449b457d7709eb6cc34bbba77bc1065 go version go1.10.1
I0423 07:35:04.404903    9280 main.go:109] Commandline: ["mtail" "-logtostderr" "-v" "1" "-progs" "./progs" "-logs" "a.log"]
I0423 07:35:04.406213    9280 loader.go:200] Loaded program linecount.mtail
I0423 07:35:04.406227    9280 loader.go:219] Program linecount.mtail has goroutine marker 0x6c696e65
I0423 07:35:04.406274    9280 vm.go:680] Starting program linecount.mtail
I0423 07:35:04.406284    9280 loader.go:223] Started linecount.mtail
I0423 07:35:04.406328    9280 mtail.go:55] Tail pattern "a.log"
I0423 07:35:04.406356    9280 tail.go:129] glob matches: [a.log]
I0423 07:35:04.406494    9280 tail.go:417] EOF on first read
I0423 07:35:04.406502    9280 tail.go:436] Tailing a.log
I0423 07:35:04.406582    9280 mtail.go:226] Listening on port :3903

@jaqx0r
Copy link
Contributor

jaqx0r commented Apr 23, 2018 via email

@riclib
Copy link
Author

riclib commented Apr 23, 2018

Session with the parameters you asked above
I've annotated with --->

 riclib@libMac  ~/mt  mtail -logtostderr -v 1 -progs ./progs -logs a.log --vmodule=tail=2,log_watcher=2
I0423 13:49:28.052774   11362 main.go:108] mtail version v3.0.0-rc10-106-g517231f git revision 517231f79449b457d7709eb6cc34bbba77bc1065 go version go1.10.1
I0423 13:49:28.052879   11362 main.go:109] Commandline: ["mtail" "-logtostderr" "-v" "1" "-progs" "./progs" "-logs" "a.log" "--vmodule=tail=2,log_watcher=2"]
I0423 13:49:28.054567   11362 loader.go:200] Loaded program linecount.mtail
I0423 13:49:28.054577   11362 loader.go:219] Program linecount.mtail has goroutine marker 0x6c696e65
I0423 13:49:28.055028   11362 vm.go:680] Starting program linecount.mtail
I0423 13:49:28.055042   11362 loader.go:223] Started linecount.mtail
I0423 13:49:28.055098   11362 mtail.go:55] Tail pattern "a.log"
I0423 13:49:28.055544   11362 tail.go:129] glob matches: [a.log]
I0423 13:49:28.055572   11362 tail.go:342] openlogPath a.log false
I0423 13:49:28.055655   11362 tail.go:359] open succeeded a.log
I0423 13:49:28.055681   11362 tail.go:222] Read: 0 EOF
I0423 13:49:28.055692   11362 tail.go:227] Suspected truncation.
I0423 13:49:28.055698   11362 tail.go:194] current seek position at 189
I0423 13:49:28.055710   11362 tail.go:204] File size is 189
I0423 13:49:28.055717   11362 tail.go:230] handletrunc with error 'no truncate appears to have occurred'
I0423 13:49:28.055722   11362 tail.go:417] EOF on first read
I0423 13:49:28.055727   11362 tail.go:436] Tailing a.log
I0423 13:49:28.055830   11362 mtail.go:226] Listening on port :3903

---> will add line now
I0423 13:50:12.973235   11362 log_watcher.go:72] watcher event "a.log": WRITE
I0423 13:50:12.973350   11362 tail.go:448] Event type watcher.UpdateEvent{Pathname:"a.log"}
I0423 13:50:14.154772   11362 log_watcher.go:72] watcher event "a.log": CHMOD
I0423 13:50:14.154817   11362 tail.go:448] Event type watcher.UpdateEvent{Pathname:"a.log"}


---> will add second line now
I0423 13:50:38.486833   11362 log_watcher.go:72] watcher event "a.log": WRITE
I0423 13:50:38.486875   11362 tail.go:448] Event type watcher.UpdateEvent{Pathname:"a.log"}
I0423 13:50:39.818438   11362 log_watcher.go:72] watcher event "a.log": CHMOD
I0423 13:50:39.818530   11362 tail.go:448] Event type watcher.UpdateEvent{Pathname:"a.log"}



---> output from curl
 ✘ riclib@libMac  ~/mt  curl http://localhost:3903/metrics
# TYPE line_count counter
# line_count defined at linecount.mtail:4:9-18
line_count{prog="linecount.mtail"} 0


---> will CTRL-C now
^CI0423 13:51:55.872449   11362 mtail.go:251] Received SIGTERM, exiting...
I0423 13:51:55.878180   11362 mtail.go:263] Shutdown requested.
I0423 13:51:55.915231   11362 log_watcher.go:89] Shutting down log watcher.
I0423 13:51:55.915275   11362 tail.go:483] Shutting down tailer.
I0423 13:51:55.915317   11362 loader.go:370] Shutting down loader.
I0423 13:51:55.915334   11362 vm.go:686] Stopping program linecount.mtail
I0423 13:51:55.915347   11362 mtail.go:283] All done.

@riclib
Copy link
Author

riclib commented Apr 23, 2018

running tests/test_tail.sh:

 riclib@libMac  ~/go/src/github.com/google/mtail   master  tests/tail_test.sh
I0423 13:54:39.276756   11490 main.go:108] mtail version v3.0.0-rc10-106-g517231f git revision 517231f79449b457d7709eb6cc34bbba77bc1065 go version go1.10.1
I0423 13:54:39.276860   11490 main.go:109] Commandline: ["mtail" "--progs" "examples/linecount.mtail" "--logs" "/tmp/test/logs/*" "--logtostderr" "-v" "1" "--vmodule=tail=2,log_watcher=2"]
I0423 13:54:39.279005   11490 loader.go:200] Loaded program linecount.mtail
I0423 13:54:39.279015   11490 loader.go:219] Program linecount.mtail has goroutine marker 0x6c696e65
I0423 13:54:39.279412   11490 vm.go:680] Starting program linecount.mtail
I0423 13:54:39.279424   11490 loader.go:223] Started linecount.mtail
I0423 13:54:39.279471   11490 mtail.go:55] Tail pattern "/tmp/test/logs/*"
I0423 13:54:39.279826   11490 tail.go:129] glob matches: []
I0423 13:54:39.279982   11490 mtail.go:226] Listening on port :3903
APPENDING
I0423 13:54:40.241482   11490 log_watcher.go:72] watcher event "/tmp/test/logs/log": CREATE
I0423 13:54:40.241607   11490 tail.go:448] Event type watcher.CreateEvent{Pathname:"/tmp/test/logs/log"}
I0423 13:54:40.242108   11490 tail.go:466] New file "/tmp/test/logs/log" matched existing glob "/tmp/test/logs/*"
I0423 13:54:40.242147   11490 tail.go:342] openlogPath /tmp/test/logs/log false
I0423 13:54:40.242216   11490 tail.go:359] open succeeded /tmp/test/logs/log
I0423 13:54:40.242290   11490 tail.go:222] Read: 0 EOF
I0423 13:54:40.242319   11490 tail.go:227] Suspected truncation.
I0423 13:54:40.242338   11490 tail.go:194] current seek position at 2
I0423 13:54:40.242355   11490 tail.go:204] File size is 2
I0423 13:54:40.242376   11490 tail.go:230] handletrunc with error 'no truncate appears to have occurred'
I0423 13:54:40.242388   11490 tail.go:417] EOF on first read
I0423 13:54:40.242408   11490 tail.go:436] Tailing /tmp/test/logs/log
APPENDING
I0423 13:54:41.239661   11490 log_watcher.go:72] watcher event "/tmp/test/logs/log": WRITE
I0423 13:54:41.239744   11490 tail.go:448] Event type watcher.UpdateEvent{Pathname:"/tmp/test/logs/log"}
I0423 13:54:41.239803   11490 tail.go:170] handleLogUpdate /tmp/test/logs/log
I0423 13:54:41.239830   11490 tail.go:222] Read: 2 <nil>
I0423 13:54:41.239854   11490 tail.go:222] Read: 0 EOF
APPENDING
I0423 13:54:42.244536   11490 log_watcher.go:72] watcher event "/tmp/test/logs/log": WRITE
I0423 13:54:42.244568   11490 tail.go:448] Event type watcher.UpdateEvent{Pathname:"/tmp/test/logs/log"}
I0423 13:54:42.244581   11490 tail.go:170] handleLogUpdate /tmp/test/logs/log
I0423 13:54:42.244609   11490 tail.go:222] Read: 2 <nil>
I0423 13:54:42.244624   11490 tail.go:222] Read: 0 EOF
debug/vars:
"cmdline": ["mtail","--progs","examples/linecount.mtail","--logs","/tmp/test/logs/*","--logtostderr","-v","1","--vmodule=tail=2,log_watcher=2"],
"line_count": 2,
"log_count": 1,
"log_watcher_error_count": 0,
"log_watcher_event_count": {"/tmp/test/logs/log": 3},
"prog_loads_total": {"linecount.mtail": 1},
metrics:
# TYPE line_count counter
# line_count defined at linecount.mtail:4:9-18
line_count{prog="linecount.mtail"} 2
I0423 13:54:44.302544   11490 mtail.go:251] Received SIGTERM, exiting...
I0423 13:54:44.302558   11490 mtail.go:263] Shutdown requested.
I0423 13:54:44.390432   11490 log_watcher.go:89] Shutting down log watcher.
I0423 13:54:44.390482   11490 tail.go:483] Shutting down tailer.
I0423 13:54:44.390507   11490 loader.go:370] Shutting down loader.
I0423 13:54:44.390519   11490 vm.go:686] Stopping program linecount.mtail
I0423 13:54:44.390528   11490 mtail.go:283] All done.

@jaqx0r
Copy link
Contributor

jaqx0r commented Apr 23, 2018

Interesting!

Your output is missing log lines that I see when you reproduce the problem. There should be calls to handleLogEvent on taill.go:170 after you add the log lines to your log file. Finding out why that is will lead us in the right direction.

Also, when you run my test script, your line_count metric and debug var are 2, which is what I expected to see from mtail correctly working. (I've discovered another bug in this process, the counter should be 3, but that's a different story.) So.. something in your test setup that's not in my test setup is affecting this too?

@jaqx0r
Copy link
Contributor

jaqx0r commented Apr 23, 2018

tail.go:448 shows us where we head to tail.go:170, inside a conditional "isWatching" So for some reason your mtail doesn't think you're watching that file for updates.

@riclib
Copy link
Author

riclib commented Apr 23, 2018

my test setup looks like this:

I have a subfolder in my home directory called mt

 riclib@libMac  ~/mt  ls -lR
total 8
-rw-r--r--  1 riclib  staff  249 Apr 23 13:50 a.log
drwxr-xr-x  3 riclib  staff   96 Apr 21 18:24 progs

./progs:
total 8
-rw-r--r--  1 riclib  staff  145 Apr 21 18:24 linecount.mtail
 riclib@libMac  ~/mt  cat progs/linecount.mtail
# Copyright 2011 Google Inc. All Rights Reserved.
# This file is available under the Apache license.

counter line_count

/$/ {
  line_count++
}
 riclib@libMac  ~/mt 

@riclib
Copy link
Author

riclib commented Apr 23, 2018

Interestingly.... If I place the logs in /tmp/test/logfile...

...it works

 riclib@libMac  ~/mt  mtail -logtostderr -v 1 -progs ./progs -logs /tmp/test/logfile  --vmodule=tail=2,log_watcher=2
I0423 14:13:57.030071   11644 main.go:108] mtail version v3.0.0-rc10-106-g517231f git revision 517231f79449b457d7709eb6cc34bbba77bc1065 go version go1.10.1
I0423 14:13:57.030182   11644 main.go:109] Commandline: ["mtail" "-logtostderr" "-v" "1" "-progs" "./progs" "-logs" "/tmp/test/logfile" "--vmodule=tail=2,log_watcher=2"]
I0423 14:13:57.034632   11644 loader.go:200] Loaded program linecount.mtail
I0423 14:13:57.034652   11644 loader.go:219] Program linecount.mtail has goroutine marker 0x6c696e65
I0423 14:13:57.035106   11644 vm.go:680] Starting program linecount.mtail
I0423 14:13:57.035122   11644 loader.go:223] Started linecount.mtail
I0423 14:13:57.035554   11644 mtail.go:55] Tail pattern "/tmp/test/logfile"
I0423 14:13:57.035941   11644 tail.go:129] glob matches: [/tmp/test/logfile]
I0423 14:13:57.035966   11644 tail.go:342] openlogPath /tmp/test/logfile false
I0423 14:13:57.036101   11644 tail.go:359] open succeeded /tmp/test/logfile
I0423 14:13:57.036392   11644 tail.go:222] Read: 0 EOF
I0423 14:13:57.036406   11644 tail.go:227] Suspected truncation.
I0423 14:13:57.036414   11644 tail.go:194] current seek position at 30
I0423 14:13:57.036429   11644 tail.go:204] File size is 30
I0423 14:13:57.036435   11644 tail.go:230] handletrunc with error 'no truncate appears to have occurred'
I0423 14:13:57.036440   11644 tail.go:417] EOF on first read
I0423 14:13:57.036445   11644 tail.go:436] Tailing /tmp/test/logfile
I0423 14:13:57.036507   11644 mtail.go:226] Listening on port :3903
I0423 14:14:39.878693   11644 log_watcher.go:72] watcher event "/tmp/test/logfile": WRITE
I0423 14:14:39.879421   11644 tail.go:448] Event type watcher.UpdateEvent{Pathname:"/tmp/test/logfile"}
I0423 14:14:39.880173   11644 tail.go:170] handleLogUpdate /tmp/test/logfile
I0423 14:14:39.880194   11644 tail.go:222] Read: 30 <nil>
I0423 14:14:39.880215   11644 tail.go:222] Read: 0 EOF
I0423 14:14:43.769332   11644 log_watcher.go:72] watcher event "/tmp/test/logfile": WRITE
I0423 14:14:43.769387   11644 tail.go:448] Event type watcher.UpdateEvent{Pathname:"/tmp/test/logfile"}
I0423 14:14:43.769404   11644 tail.go:170] handleLogUpdate /tmp/test/logfile
I0423 14:14:43.769426   11644 tail.go:222] Read: 30 <nil>
I0423 14:14:43.769443   11644 tail.go:222] Read: 0 EOF
^CI0423 14:15:36.866174   11644 mtail.go:251] Received SIGTERM, exiting...
I0423 14:15:36.871942   11644 mtail.go:263] Shutdown requested.
I0423 14:15:36.900978   11644 log_watcher.go:89] Shutting down log watcher.
I0423 14:15:36.901007   11644 tail.go:483] Shutting down tailer.
I0423 14:15:36.901033   11644 loader.go:370] Shutting down loader.
I0423 14:15:36.901044   11644 vm.go:686] Stopping program linecount.mtail
I0423 14:15:36.901052   11644 mtail.go:283] All done.

@riclib
Copy link
Author

riclib commented Apr 23, 2018

Seems to be a much more mundane error. If I put a full absolute path it works, if I put a relative path it doesn't @jaqx0r. 😱

If I put /Home/riclib/mt/a.log (same file) as input to --logs than tailing works.
mtail -logtostderr -v 1 -progs ./progs -logs /Users/riclib/mt/a.log --vmodule=tail=2,log_watcher=2

@jaqx0r
Copy link
Contributor

jaqx0r commented Apr 23, 2018

Yep, that's what I'm thinkgin too. I managed to reproduce the bug with a relative path to the logfile directly, instead of either an absolute path or a relative path to the directory holding the log, and I'm seeing log lines that indicate that the pattern match from fsnotify is different than what's in the 'watched' map.

I'm trying to use filepath.Abs to resolve the absolute path of the files for comparison, but it's making some other tests fail so the fix will take a little while.

I'm glad I was able to reproduce it here though. I didn't realise before that you were using relative paths to a single file and that's what it took to trigger the bug.

@jaqx0r jaqx0r changed the title file tailing not working - one_shot works file tailing not working when a relative log path name is passed to --logs flag Apr 23, 2018
@jaqx0r jaqx0r assigned jaqx0r and unassigned riclib Apr 23, 2018
@jaqx0r jaqx0r added the bug This is considered a bug and shall get fixed label Apr 23, 2018
@jnovack
Copy link
Contributor

jnovack commented Apr 23, 2018

Well, following the traditions of our ancestors, this bug is hereby named "riclib". :)

@jaqx0r jaqx0r closed this as completed in 50689db Apr 23, 2018
@jaqx0r
Copy link
Contributor

jaqx0r commented Apr 23, 2018

Please fetch head and try it out.

@riclib
Copy link
Author

riclib commented Apr 23, 2018

Seems to be doing a one shot.... I only added one line and...

 riclib@libMac  ~/mt  mtail -logtostderr -v 1 -progs ./progs -logs a.log  --vmodule=tail=2,log_watcher=2
I0423 15:08:26.529708   12304 main.go:108] mtail version v3.0.0-rc10-114-g133942e git revision 133942ec574f2df4a7a7f3842b8652679a269e36 go version go1.10.1
I0423 15:08:26.529842   12304 main.go:109] Commandline: ["mtail" "-logtostderr" "-v" "1" "-progs" "./progs" "-logs" "a.log" "--vmodule=tail=2,log_watcher=2"]
I0423 15:08:26.530735   12304 loader.go:200] Loaded program linecount.mtail
I0423 15:08:26.530749   12304 loader.go:219] Program linecount.mtail has goroutine marker 0x6c696e65
I0423 15:08:26.530806   12304 vm.go:680] Starting program linecount.mtail
I0423 15:08:26.530818   12304 loader.go:223] Started linecount.mtail
I0423 15:08:26.530865   12304 mtail.go:55] Tail pattern "a.log"
I0423 15:08:26.530906   12304 tail.go:148] glob matches: [a.log]
I0423 15:08:26.530939   12304 tail.go:129] Resolved path for lookup "/Users/riclib/mt/a.log"
I0423 15:08:26.530957   12304 tail.go:106] Adding a watch on resolved path "/Users/riclib/mt/a.log"
I0423 15:08:26.530991   12304 tail.go:363] openlogPath a.log false
I0423 15:08:26.531008   12304 tail.go:129] Resolved path for lookup "/Users/riclib/mt"
I0423 15:08:26.531014   12304 tail.go:106] Adding a watch on resolved path "/Users/riclib/mt"
I0423 15:08:26.531125   12304 tail.go:382] open succeeded a.log
I0423 15:08:26.531143   12304 tail.go:429] Adding a file watch on "a.log"
I0423 15:08:26.531159   12304 tail.go:106] Adding a watch on resolved path "/Users/riclib/mt/a.log"
I0423 15:08:26.531181   12304 tail.go:242] Read: 0 EOF
I0423 15:08:26.531194   12304 tail.go:247] Suspected truncation.
I0423 15:08:26.531202   12304 tail.go:214] current seek position at 429
I0423 15:08:26.531211   12304 tail.go:224] File size is 429
I0423 15:08:26.531218   12304 tail.go:250] handletrunc with error 'no truncate appears to have occurred'
I0423 15:08:26.531223   12304 tail.go:441] EOF on first read
I0423 15:08:26.531229   12304 tail.go:460] Tailing a.log
I0423 15:08:26.531245   12304 tail.go:129] Resolved path for lookup "/Users/riclib/mt"
I0423 15:08:26.543862   12304 mtail.go:226] Listening on port :3903

----> before adding one line to log
 riclib@libMac  ~/mt  curl http://localhost:3903/metrics
# TYPE line_count counter
# line_count defined at linecount.mtail:4:9-18
line_count{prog="linecount.mtail"} 0


----> will add one line now
I0423 15:09:22.540047   12304 log_watcher.go:72] watcher event "/Users/riclib/mt/a.log": WRITE
I0423 15:09:22.540135   12304 tail.go:472] Event type watcher.UpdateEvent{Pathname:"/Users/riclib/mt/a.log"}
I0423 15:09:22.540173   12304 tail.go:129] Resolved path for lookup "/Users/riclib/mt/a.log"
I0423 15:09:22.540184   12304 tail.go:190] handleLogUpdate /Users/riclib/mt/a.log
W0423 15:09:22.540192   12304 tail.go:195] No file descriptor found for "/Users/riclib/mt/a.log", but is being watched; opening
I0423 15:09:22.540207   12304 tail.go:363] openlogPath /Users/riclib/mt/a.log true
I0423 15:09:22.540216   12304 tail.go:129] Resolved path for lookup "/Users/riclib/mt"
I0423 15:09:22.540275   12304 tail.go:382] open succeeded /Users/riclib/mt/a.log
I0423 15:09:22.541581   12304 tail.go:382] open succeeded /Users/riclib/mt/a.log
I0423 15:09:22.544192   12304 tail.go:382] open succeeded /Users/riclib/mt/a.log
I0423 15:09:22.549175   12304 tail.go:382] open succeeded /Users/riclib/mt/a.log
I0423 15:09:22.549205   12304 tail.go:429] Adding a file watch on "/Users/riclib/mt/a.log"
I0423 15:09:22.549214   12304 tail.go:106] Adding a watch on resolved path "/Users/riclib/mt/a.log"
I0423 15:09:22.549242   12304 tail.go:242] Read: 459 <nil>
I0423 15:09:22.549454   12304 tail.go:242] Read: 0 EOF
I0423 15:09:22.549468   12304 tail.go:441] EOF on first read
I0423 15:09:22.549475   12304 tail.go:460] Tailing /Users/riclib/mt/a.log
I0423 15:09:23.727615   12304 log_watcher.go:72] watcher event "/Users/riclib/mt/a.log": CHMOD
I0423 15:09:23.727684   12304 tail.go:472] Event type watcher.UpdateEvent{Pathname:"/Users/riclib/mt/a.log"}
I0423 15:09:23.727726   12304 tail.go:129] Resolved path for lookup "/Users/riclib/mt/a.log"
I0423 15:09:23.727737   12304 tail.go:190] handleLogUpdate /Users/riclib/mt/a.log
I0423 15:09:23.727756   12304 tail.go:242] Read: 0 EOF
I0423 15:09:23.727767   12304 tail.go:247] Suspected truncation.
I0423 15:09:23.727783   12304 tail.go:214] current seek position at 459
I0423 15:09:23.727808   12304 tail.go:224] File size is 459
I0423 15:09:23.727828   12304 tail.go:250] handletrunc with error 'no truncate appears to have occurred'

----> result
 riclib@libMac  ~/mt  curl http://localhost:3903/metrics
# TYPE line_count counter
# line_count defined at linecount.mtail:4:9-18
line_count{prog="linecount.mtail"} 18

@jaqx0r
Copy link
Contributor

jaqx0r commented Apr 23, 2018

OK, not fixed :(

@jaqx0r jaqx0r reopened this Apr 23, 2018
@riclib
Copy link
Author

riclib commented Apr 23, 2018

bugs called riclib are like cockroaches, hard to get rid of @jaqx0r @jnovack 😉

@jaqx0r jaqx0r closed this as completed in 56d35b6 Apr 23, 2018
@jaqx0r
Copy link
Contributor

jaqx0r commented Apr 23, 2018

This time for sure!

@riclib
Copy link
Author

riclib commented Apr 23, 2018

Tested and fix works. Thanks @jaqx0r !

 riclib@libMac  ~/mt  clear
 riclib@libMac  ~/mt  echo `date` >> a.log
 riclib@libMac  ~/mt  echo `date` >> a.log
 riclib@libMac  ~/mt  curl http://localhost:3903/metrics
# TYPE line_count counter
# line_count defined at linecount.mtail:4:9-18
line_count{prog="linecount.mtail"} 2
 riclib@libMac  ~/mt  echo `date` >> a.log
 riclib@libMac  ~/mt  curl http://localhost:3903/metrics
# TYPE line_count counter
# line_count defined at linecount.mtail:4:9-18
line_count{prog="linecount.mtail"} 3

@jaqx0r
Copy link
Contributor

jaqx0r commented Apr 23, 2018 via email

jaqx0r added a commit that referenced this issue Apr 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This is considered a bug and shall get fixed
Projects
None yet
Development

No branches or pull requests

3 participants