grok should ignore tilde backup files when processing patterns_dir #2271

Closed
mrec opened this issue Dec 19, 2014 · 6 comments

@mrec

mrec commented Dec 19, 2014

(This comes from the discussion of #2244)

When testing a config using grok and custom patterns, a user will often be editing pattern definition files in patterns_dir between run attempts. Many (most?) Linux-ey text editors create backup files, named as the original filename plus a ~ suffix, in the same location as the original; even though they aren't hidden these are often invisible by default in file browsers. When dealing with multiple pattern definition files, and especially when renaming them, it's possible to have a lot of these tilde files lying around after a while.

grok currently reads everything in patterns_dir, including any tilde backups. It quite reasonably doesn't define the order in which it reads them, and it doesn't warn if e.g. the definition of MYPATTERN in a stale patterns~ or previousfilename~ backup file overrides the definition of MYPATTERN in patterns. Hilarity ensues. Also hair-tearing, teeth-gnashing, bad language and various other undesirable outcomes.

I propose that grok should ignore any files in patterns_dir ending in a ~. There may be other things it'd be beneficial to blacklist too, but this seems like a good start.
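A minimal sketch of the proposed behaviour, written in Python for illustration only; it is not grok's actual code. The function name `pattern_files` and the demo directory are hypothetical. The only rule taken from the proposal is that names ending in `~` are skipped.

```python
import os
import tempfile


def pattern_files(patterns_dir):
    """Return the regular files in patterns_dir, skipping tilde backups.

    Hypothetical helper: the name and return shape are assumptions,
    not part of grok.
    """
    names = []
    for name in sorted(os.listdir(patterns_dir)):
        path = os.path.join(patterns_dir, name)
        if not os.path.isfile(path):
            continue  # ignore subdirectories and other non-files
        if name.endswith("~"):
            continue  # editor backup such as "patterns~"
        names.append(name)
    return names


# Demo: one live pattern file and two stale backups.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("patterns", "patterns~", "previousfilename~"):
        with open(os.path.join(demo_dir, name), "w") as handle:
            handle.write("MYPATTERN foo\n")
    print(pattern_files(demo_dir))  # prints ['patterns']
```

Sorting the listing also makes the read order deterministic, which is a separate design choice from the tilde filter.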

@jordansissel
Contributor

I like the idea, but I'm not sold on the proposed solution. What editor uses a '~' suffix for backup files? Emacs? Vim uses .<Filename>.swp by default, although some users configure backup files to go to an entirely different directory and wouldn't be impacted by this problem.

Tested on my OSX 10.10 laptop:

  • vim with no configuration loaded: .some_file_name.swp
  • emacs (default config): while it was running, it added a weird .#<filename> symlink pointing at 'user@hostname.pid'
  • emacs: after quitting, there was a file named #<filename># in the current directory
  • atom (default config): no backup file or other state file was found in the current directory. I have an ancient version of Atom that I never use so I'm not sure if the behavior has changed in the past year.

I haven't tested any other editors, but this behavior seems hardly uniform :(

I'd be open to maybe having the default be more explicit in what filenames are accepted. Perhaps only files matching /[0-9A-Za-z_-]+/ would be ok? This would ignore any dot files and any weird local state files created by the editors I've tested so far.
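To see what such a whitelist would accept, here is a small Python sketch. It assumes the pattern is meant to match the whole filename, which the comment does not state. Without anchoring, a name like `patterns~` would still match on its first characters. The helper name `is_accepted` is hypothetical.

```python
import re

# The character class from the comment above. fullmatch() requires the
# entire name to consist of these characters (an assumption about intent).
ALLOWED_NAME = re.compile(r"[0-9A-Za-z_-]+")


def is_accepted(name):
    """Hypothetical check: True if the whole filename is whitelisted."""
    return ALLOWED_NAME.fullmatch(name) is not None


for name in (
    "patterns",
    "my-patterns_2",
    ".patterns.swp",   # vim swap file
    "#patterns#",      # emacs autosave file
    "patterns~",       # tilde backup
    "patterns.txt",    # ordinary file extension
):
    print(f"{name!r}: {'accepted' if is_accepted(name) else 'ignored'}")
```

Under this reading the editor state files are all ignored, but so is any filename with an extension, because `.` is not in the character class.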

If you use a different editor or an editor tested above with a different behavior, please update this ticket with what backup or local state files are created by your editor so we can try to generalize a pattern here.

@mrec
Author

mrec commented Dec 20, 2014

@jordansissel - I was using gedit on Oracle Linux at the time, but a quick Google to see how widespread this practice was suggests that emacs, vi and nano all follow the tilde convention.

Edit: not sure about a whitelist; there's a danger it'll just end up frustrating people the other way when their pattern files get unexpectedly ignored. I can't imagine anyone choosing to tilde-suffix their pattern files, but I can definitely imagine people adding file extensions, which would fail your proposed test. Dotfiles proper sound like another good candidate for blacklisting, though.

@jordansissel
Contributor

@mrec See my previous comment. Neither the "default" (as provided by either OSX or homebrew, I don't know) emacs nor the "default" (same) vim on OSX use tilde-suffix.

Some additional exploration:

  • Ubuntu 12.04 default vim uses .<file>.swp not tilde.
  • Ubuntu 12.04 default emacs uses the same weird .#<file> symlink pointing to user@hostname.pid:timestamp. It seems to eventually write a .#<file># as a backup.
  • Ubuntu 12.04 default nano does not appear to use any local backup file

Of vim, emacs, and nano, I can't see any examples of this tilde convention.

@mrec
Author

mrec commented Dec 20, 2014

@jordansissel I'm not disputing what you're seeing, but unless a lot of people out there are lying, that's not the whole story. Timing here is unfortunate since I'm not going to have access to any *NIX machines for the next two weeks, but Google shows many, many mentions of tildes, including from OSX and Ubuntu users. They do fall off after 2011, so maybe some distros changed their default configs around then.

Original problem was seen in Oracle Linux 6.4, which was released in 2013.

@magnusbaeck
Contributor

@jordansissel Emacs's .#filename and #filename# files aren't backup files but interlock and autosave files, respectively. Neither of which should be read by grok, obviously.

Emacs has had *~ backup files since at least the mid-90s and unless my sysadmin has re-enabled that feature it's still enabled by default in Ubuntu 12.04.

Ignoring .*, #*, and *~ should take care of all types of state files and backup files that I've heard about. Restricting the names to [0-9A-Za-z_-]+ would be unnecessarily strict and would prevent users in most European countries (never mind most countries outside of Europe) from naming files in their native language.
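The three globs above can be sketched as a blacklist check. This is a Python illustration under stated assumptions, not grok's implementation; the names `IGNORED_GLOBS` and `is_ignored` are hypothetical.

```python
from fnmatch import fnmatchcase

# The three globs from the comment above: dotfiles, "#"-prefixed files
# (emacs autosave), and tilde-suffixed backups.
IGNORED_GLOBS = (".*", "#*", "*~")


def is_ignored(name):
    """Hypothetical check: True if the filename matches any ignored glob."""
    return any(fnmatchcase(name, glob) for glob in IGNORED_GLOBS)


for name in (
    "patterns",
    "patterns.txt",    # extensions stay allowed
    "mönster",         # non-ASCII names stay allowed
    ".patterns.swp",   # vim swap file
    ".#patterns",      # emacs interlock symlink
    "#patterns#",      # emacs autosave file
    "patterns~",       # tilde backup
):
    print(f"{name!r}: {'ignored' if is_ignored(name) else 'read'}")
```

Unlike a character-class whitelist, this rejects only the known state and backup shapes, so filenames with extensions or non-ASCII characters are still read.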

@jordansissel
Contributor

For Logstash 1.5.0, we've moved all plugins to individual repositories, so I have moved this issue to logstash-plugins/logstash-filter-grok#33. Let's continue the discussion there! :)
