
Move filewatch lib here, refactor and add new features. #171

Merged

Conversation

guyboertje
Contributor

@guyboertje guyboertje commented Mar 16, 2018

NOTES for Reviewers:

There are a lot of code changes, mainly to split the monolithic procedures filewatch had before into separate classes.

BEFORE (filewatch 0.9.0):
watched_file.rb -> WatchedFile
  Responsibilities
    Holds lifecycle state
    Holds file info of physical file
    Holds last-known progress position

tail_base.rb & observing_tail.rb -> ObservingTail
  @sincedb -> 'sincedb records' hash, inode => position (int) 
  Responsibilities
    Reading and writing sincedb file
    Reacting to the block callback signals (e.g. create, unignore)
    Opening files
    Adding or removing sincedb hash entries depending on the callback signal
    Reading the opened file, calling the file input with content and updating the watched_file position and sincedb position
    
watch.rb -> Watch class
  Holds WatchedFile hash, path => WatchedFile instance 
  Responsibilities
    "Discovery"
      Discover new files in glob, create WatchedFile instance
      Discard if path is excluded or ignorable
      Add to @files.
    "Iteration" with block based callback to ObserveTail
      select and iterate over watched_files in closed state
        move it to watched state if necessary.
      select and iterate over watched_files in ignored state
        move it to watched state if necessary and signal 'unignore'
      if there is space in the active files window
        select some files and iterate over watched_files in watched state
          move to the active state
          signal 'create' or 'create_initial'
      else
        periodically warn that there are watched entries waiting to be activated.
      select and iterate over watched_files in active state
        if its not readable
          prepare to delete entry
          move to unwatched state
          signal 'delete'
        if its closable
          signal 'timeout'
          move to closed state
        if it has changed inode
          swap the inode
          signal 'delete'
          signal 'create'
          # probable bug, should be delete (old inode), swap, create perhaps
        if it has shrunk in size
          signal 'delete'
          signal 'create'
        if it has grown
          signal 'modify'
      actually delete entries
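The active-state checks in the iteration above can be sketched as a tiny state machine. This is hypothetical, simplified Ruby mirroring the outline's signal names, not the plugin's actual classes:

```ruby
# Hypothetical, simplified sketch of the active-state checks listed above.
# WatchedFileSketch and the signal symbols follow the outline only.
class WatchedFileSketch
  attr_accessor :size, :last_size

  def initialize(size = 0)
    @size = size
    @last_size = size
  end
end

# Returns the signals the iteration would emit for an active file.
def signals_for(watched_file)
  signals = []
  if watched_file.size < watched_file.last_size
    signals << :delete << :create   # shrunk: re-read from the start
  elsif watched_file.size > watched_file.last_size
    signals << :modify              # grown: read the appended bytes
  end
  watched_file.last_size = watched_file.size
  signals
end
```

The real implementation also handles inode swaps, unreadable files and closable files; the sketch covers only the shrink/grow branches.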

AFTER (filewatch in file input):
  bootstrap.rb
    Defines Constants and InodeStruct as well as Windows vs *nix specific adaptations.
    Requires dependencies

  watched_files_collection.rb -> WatchedFilesCollection
    Wraps the watched files hash, with extra methods.

  sincedb_collection.rb -> SincedbCollection
    Wraps the sincedb hash, with extra methods.
    Read and write physical sincedb file.
    Serialize and deserialize (v1, v2) records.
    Encapsulates logic for finding and associating discovered files with previously seen sincedb records.
  
  sincedb_value.rb -> SincedbValue
    Is the 'value' in the sincedb collection.
    Tracks position, last changed time and last known path.
    Holds the currently associated WatchedFile reference.
  
  settings.rb -> Settings
    The options passed from the user config + defaults
    Globally available under the OPTS constant
  
  discoverer.rb -> Discoverer
    All discovery functionality moved to this class.
    Holds references to watched_files and sincedb collections.
    Adds and associates discovered files to the collections.
  
  watch.rb -> Watch
    Provides mutex access to the discover, iterate and quit functions.
    Delegates discover to Discoverer
    Delegates processing of closed, ignored, watched and active watched_files collection value entries to the appropriate processor (Tail or Read).
  
  observing_base.rb -> ObservingBase
    Is a mixin module for ObservingTail and ObservingRead
    Provides the external entry points
      new(opts)
      quit
      watch_this(path)
  
  observing_tail.rb -> ObservingTail
    Builds TailHandlers::Processor.new
           TailHandlers::Dispatch.new
  
  observing_read.rb -> ObservingRead
    Builds ReadHandlers::Processor.new
           ReadHandlers::Dispatch.new
  
  tail_handlers/*
    Processor and Dispatch classes to adapt the WatchedFile handling for Tail.
    Handler classes (provide specific `handle_specifically(watched_file)` and `update_existing_specifically(watched_file, sincedb_value)`):
      Base (handlers super class)
      Create
      CreateInitial
      Delete
      Grow
      Shrink
      Timeout
      Unignore
  
  read_handlers/*
    Processor and Dispatch classes to adapt the WatchedFile handling for Read.
    Handler classes (provide specific `handle_specifically(watched_file)`):
      Base (handlers super class)
      ReadFile
      ReadZipFile
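The Handler hierarchy described above (a Base "handlers super class" plus per-signal subclasses supplying `handle_specifically`) can be sketched as a template-method pattern. Names are illustrative, taken from the outline rather than the plugin's source:

```ruby
# Hypothetical sketch of the Handler interface described above: Base
# provides the shared flow; concrete handlers supply handle_specifically.
class BaseHandler
  def handle(watched_file)
    # shared pre-processing would go here (logging, sincedb lookup, ...)
    handle_specifically(watched_file)
  end

  def handle_specifically(_watched_file)
    raise NotImplementedError, "subclasses must implement handle_specifically"
  end
end

class TimeoutHandler < BaseHandler
  def handle_specifically(watched_file)
    "timeout: #{watched_file}"
  end
end
```

The Processor then only needs to pick a handler for each signal and call `handle`, keeping the per-signal behaviour isolated in one small class each.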

Rakefile Outdated
end

desc "Run full check with custom Logstash path"
task :custom_ls_check, :ls_dir do |task, args|
Contributor

not sure why this is required? what I usually do is

LOGSTASH_SOURCE=1 bundle exec ...

not sure we need/want to add yet another custom rake task?

Contributor Author

I use this to run the specs against any downloaded LS as well as my LS dev repo.
I have this:

logstash-2.1.3
logstash-2.2.2
logstash-2.3.4
logstash-2.4.0
logstash-2.4.1
logstash-5.0.0
logstash-5.2.1
logstash-5.2.2
logstash-5.3.0
logstash-5.4.3
logstash-5.5.0
logstash-5.5.1
logstash-5.6.2
logstash-5.6.4
logstash-5.6.8-SNAPSHOT
logstash-6.0.0
logstash-6.1.0
logstash-6.1.1
logstash-6.1.2
logstash-6.2.1
logstash-6.2.2

Contributor

ok, that looks super useful, my problem is that this is specific to your workflow and is not really documented or used anywhere else, or is it?

project.sourceCompatibility = 1.8

dependencies {
compileOnly group: 'org.jruby', name: 'jruby-complete', version: "9.1.13.0"
Contributor

do we need this double dependencies on jruby-complete ?

Contributor Author

I'm unsure. If buildscript defines repositories and dependencies - does this mean that we can remove the root repositories and dependencies at lines 41 and 48?

Contributor Author

@guyboertje guyboertje Apr 3, 2018

removed buildscript - seems to work.

@colinsurprenant
Contributor

About rake/gradle: I see we have very different rake/gradle tasks from other "hybrid" java plugins. I am wondering if we want to or should try to be consistent with that?

# all the required constants and files
# defined in one place
module FileWatch
HOST_OS_WINDOWS = (RbConfig::CONFIG['host_os'] =~ /mswin|mingw|cygwin/) != nil
Contributor

FYI in Logstash https://github.com/elastic/logstash/blob/5e49b5d9cbb0a66482555a2ae57c0e89379f3923/lib/bootstrap/environment.rb#L45-L47 this is how we test for the windows platform. Should we be consistent?

    def windows?
      ::Gem.win_platform?
    end

Contributor Author

Thanks for pointing this out. It is a leftover from filewatch being standalone.

Will refactor this to use logstash/environment.


if HOST_OS_WINDOWS
require "winhelper"
FILEWATCH_INODE_METHOD = :win_inode
Contributor

suggestion: like we do in LS for the windows? method, instead of using a constant maybe it would be cleaner to define methods like win_inodes? and nix_inodes?

FILEWATCH_INODE_METHOD = :nix_inode
end

if defined?(JRUBY_VERSION)
Contributor

in which case would that be undefined?

Contributor Author

👍

require_relative "read_handlers/base"

# TODO [guy] make this a config option, perhaps.
CurrentSerializerClass = SincedbRecordSerializer
Contributor

since we don't support alternate serializers, I suggest we drop the CurrentSerializerClass and directly set the SincedbRecordSerializer into SincedbCollection constructor.

Contributor Author

I'm thinking that we will need to serialize to a ES based Shared State store soon - effectively JSON.

Contributor

yeah, but I am not sure this "empty" decoupling brings any actual or future value? for now it's just an "empty" abstraction.

Contributor Author

I can remove it.

@input.empty?
end
end
end
Contributor

logstash core exposes BufferedTokenizer (logstash-core/lib/logstash/util/buftok.rb) do we also need it here?

Contributor Author

Will switch. 👍

Contributor Author

This is an addition attr_reader :input. I will have to reopen the class.

Contributor

Let me see... AFAICT, attr_reader :input is only used for the FileWatch::WatchedFile.reset_buffer method?

def reset_buffer
  @buffer.input.clear
end

But this could be replaced with the BufferedTokenizer.flush method. It will be slightly less efficient but this is not a code hotspot so it should not have any impact?
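The flush-based replacement suggested above can be sketched with a minimal stand-in tokenizer. MiniTokenizer is hypothetical, standing in for LogStash's BufferedTokenizer; only the flush-instead-of-input.clear idea is the point:

```ruby
# Minimal stand-in for a line tokenizer, illustrating the suggestion
# above: use flush to discard a buffered partial line instead of
# exposing the internal input buffer via attr_reader.
class MiniTokenizer
  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @input = +""
  end

  # Appends data and returns the complete lines found so far.
  def extract(data)
    @input << data
    lines = @input.split(@delimiter, -1)
    @input = lines.pop || +""  # keep the trailing partial line buffered
    lines
  end

  # Returns and clears any buffered partial content.
  def flush
    buffered = @input
    @input = +""
    buffered
  end
end

# reset_buffer via flush, no attr_reader :input needed.
def reset_buffer(buffer)
  buffer.flush
  nil
end
```

With this shape, WatchedFile never needs to reach into the tokenizer's internals, so the Logstash core BufferedTokenizer can be reused as-is.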

end

def add_path(path)
return if @watching.member?(path)
Contributor

this will likely fail if paths are not normalized, is that a consideration here?

Contributor

let me rephrase that: this will not catch duplicate paths unless they are normalized, is that a consideration here?

Contributor Author

👍

Contributor Author

I looked at this again. The path is either a file or a glob. If the glob is functionally the same as one we have seen before but not the same string, it will be added; but at discovery any already discovered files will not be added to the collection again - so only a small overhead.

I checked the filewatch repo. This has been so since Aug 30, 2011. I don't recall any issues with this.
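The normalization concern discussed above can be sketched with a hypothetical registry that expands paths before the membership check, so "./logs/a.log" and "logs/a.log" dedupe to one entry. PathRegistry and its method names are illustrative, not plugin code:

```ruby
require "set"

# Hypothetical sketch of normalizing paths before the member? check
# discussed above, so textually different but equivalent paths dedupe.
class PathRegistry
  def initialize
    @watching = Set.new
  end

  # Returns true if the path was newly added, false if already watched.
  def add_path(path)
    normalized = File.expand_path(path)  # resolve ".", "..", relative paths
    return false if @watching.member?(normalized)
    @watching << normalized
    true
  end
end
```

As noted in the thread, globs that are functionally equivalent but textually different would still slip past a string-based check; expansion only catches the simple cases.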

Contributor

ok, that seems reasonable ...

# end
#
# If your temp directory is not on the same filesystem as the file you're
# trying to write, you can provide a different temporary directory.
Contributor

should we add a check for this? or we could create the tmp file in the same directory as the target file?

Contributor Author

I don't know. It's copied verbatim from the rails code. We control the call to this method.

Contributor

ok, upon closer inspection the implementation is not sync'ed with the comments. there is no usage of a temp directory and this implementation does not match the rails link above.

Contributor Author

I'll remove the comment.

@@ -0,0 +1,68 @@
# encoding: utf-8
require_relative 'bootstrap' unless defined?(FileWatch)
Contributor

why is this required?

Contributor Author

IMO because I wanted the file input to not need to know about bootstrap.rb. So the entry point is either ObservingTail or ObservingRead.

Contributor

but can't the require order be done in a way that we don't need this kind of conditional require? in my experience these become hard to maintain.

Contributor Author

I can remove the unless.

include LogStash::Util::Loggable
include ObservingBase

def build_specific_processor(settings)
Contributor

as discussed, I think it would be useful to document somehow the interface for the ObservingXxx classes

Contributor Author

👍

# observer here is the file input
dispatcher = TailHandlers::Dispatch.new(sincedb_collection, observer, @settings)
watch.subscribe(dispatcher)
sincedb_collection.write("subscribe complete - shutting down")
Contributor

"subscribe complete - shutting down" is inconsistent with the ObservingRead class

Contributor Author

Will make the same.

end
end end

require_relative "dispatch"
Contributor

having a require at the end of the file is a bit weird. would it be better to do a require "base" at the beginning of dispatch.rb?

Contributor Author

👍

# If 'delete' is specified then the file will be deleted.
# If 'log' is specified then the full path of the file is logged to the file specified
# in the `file_completed_log_path` setting.
config :file_completed_action, :validate => ["delete", "log"], :default => "delete"
Contributor

  • could there be just a "do nothing" option? no delete, no log?
  • also, are both options mutually exclusive?

Contributor Author

"do nothing" - I suppose so. This was a last minute change - people have asked for this since forever. I was going to generate a 'special' event tagged with EOF having the path and force people to handle it in their output section - IIRC Jordan suggested to not do the event thing and give people an action to take.

Doing nothing is problematic though, because if a user is reading, say, 10000 files, the user does not know when a file is done with. The discover glob order is indeterminate and OS specific. Do you think we should sort the watched file collection before the iteration? The order would then be known and we could document that. Then again, with striped reading, files will be done with in an order determined by their size, the smaller files finishing before the larger files regardless of the glob order or the collection's sorted order.

"do both" - yeah sure.


It would be great to reorder files.
It would be even better to have an option to tell how the sort is done: by file name, or by last modified time (from the oldest to the newest).
I’d love the second option.

Contributor Author

@fbaligand
Do you see yourself using striped reading?


@guyboertje
I just looked at "striped reading"; if I understand correctly, it allows reading files in parallel, with a chunked read mechanism.
IMHO, it doesn't allow reading files in a particular order. That feature is particularly useful when start_position=beginning and, at Logstash startup, there are already numerous files. It can be important that files are read in a specific order: from the oldest to the newest, or alphabetically.

Contributor Author

@guyboertje guyboertje Apr 4, 2018

@fbaligand
Sorry, my question was a general one, not related to discovered file collection order.

I am happy to add collection ordering, with a new settings file_sort_by -> "date"|"path" and file_sort_direction -> "asc"|"desc".
Q: do these ^ settings names and values make sense? Are they useful in tail mode (or which defaults are better for existing users in tail mode, IMO path + asc)?


That sounds great !

  • Maybe replacing the "date" value with "last_modified" would be more meaningful?
  • Currently (so in tail mode), even if start_position=end, the file input continues to discover new files every 15 seconds (by default) and read them from the beginning. During these 15s, several files could arrive; that's why I find it relevant to apply ordering in tail mode.
  • IMO, I prefer the default to be date + asc, as it means that files are processed as they arrived in time.


@guyboertje
I just looked at FileBeat options for that (https://www.elastic.co/guide/en/beats/filebeat/current/configuration-filebeat-options.html#scan-sort).
They name it scan.sort and scan.order.
I like the prefix scan_, I find it meaningful.
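The file_sort_by / file_sort_direction (or Filebeat-style scan.sort / scan.order) idea discussed above can be sketched as a small sort helper. The setting names and the [path, mtime] data shape are assumptions from the thread, not the final plugin API:

```ruby
# Hypothetical sketch of the sorting settings discussed above.
# files is an array of [path, mtime_seconds] pairs.
def sort_discovered(files, sort_by: "path", direction: "asc")
  key_index = sort_by == "date" ? 1 : 0          # "date" sorts by mtime
  sorted = files.sort_by { |entry| entry[key_index] }
  direction == "desc" ? sorted.reverse : sorted
end
```

With "date" + "asc" (the default preferred in the thread for read mode), files are processed in the order they arrived in time.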

# If no changes are detected in tracked files in the last N days their sincedb
# tracking record will expire and not be persisted.
# This option protects against the well known inode recycling problem. (add reference)
config :sincedb_clean_after, :validate => :number, :default => 14 # days
Contributor

idea: could the actual file cleanup/delete be triggered by the sincedb cleanup?


I love this new option !
it could potentially fix issue #173


One other feature to fix the inode recycling problem would be to regularly remove entries in sincedb where the inode's file does not exist anymore. For example every discover_interval seconds.
I would love such a feature !
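The sincedb_clean_after behaviour described above (expire tracking records untouched for N days so recycled inodes are not mistaken for old files) can be sketched like this. The record shape is hypothetical; the real sincedb values are richer:

```ruby
# Sketch of sincedb_clean_after: drop records whose last-change time
# is older than the cutoff, before the sincedb is persisted.
SECONDS_PER_DAY = 24 * 3600

def expire_sincedb(records, clean_after_days, now = Time.now.to_i)
  cutoff = now - clean_after_days * SECONDS_PER_DAY
  records.reject { |_inode, value| value[:last_changed] < cutoff }
end
```

An inode freed by a deleted file and later reused then gets a fresh record (position 0) instead of inheriting the stale position of the old file.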

# See the option `max_open_files` for more info, but note in "read" mode, close_older is
# not required as files are closed at EOF
# The default set internally is very large (the maximum positive Ruby integer)
config :read_iterations, :validate => :number, :default => (2**(0.size * 8 - 2) - 1)
Contributor

I don't understand this option - maybe we should rephrase it.

Contributor

why not reuse the FIXNUM_MAX constant?

Contributor Author

👍 Constant.

Is the description confusing or the setting name? both?

Contributor

for me both I guess, but if we have a good description then maybe the setting name will make more sense to me 😆

# Specify the size in bytes of each chunk read in each read iteration
# See `read_iterations` to see why and when to change this from the default.
# The default set internally is 32768 (32KB)
config :file_chunk_size, :validate => :number, :default => 32768
Contributor

I see that 32768 is defined twice (also in bootstrap.rb), should it be a constant?

Contributor Author

👍


# Join by ',' to make it easy for folks to know their own sincedb
# generated path (vs, say, inspecting the @path array)
@sincedb_path = File.join(sincedb_dir, ".sincedb_" + Digest::MD5.hexdigest(@path.join(",")))
@sincedb_path = sincedb_dir.join(".sincedb_" + Digest::MD5.hexdigest(@path.join(",")))
Contributor

I notice code duplication with the sincedb filename generation, we should DRY it up
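The duplication flagged above could be DRYed up with one helper that both call sites share. This is an illustrative sketch of the idea, not the final refactor; the join-by-',' comment is preserved from the original code:

```ruby
require "digest/md5"

# Join by ',' to make it easy for folks to reproduce their own
# generated sincedb filename from their path config.
def sincedb_filename_for(paths)
  ".sincedb_" + Digest::MD5.hexdigest(paths.join(","))
end

def sincedb_path_for(sincedb_dir, paths)
  File.join(sincedb_dir, sincedb_filename_for(paths))
end
```

Both the legacy and the new sincedb path derivation then go through one function, which also makes the legacy-file rename logic easier to test.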

end

if tail_mode?
require "filewatch/observing_tail"
Contributor

do we need dynamic class loading here? I usually try to avoid that unless there is a good reason to. Sometimes, depending on test coverage, it will hide errors in one or the other file.

@colinsurprenant
Contributor

Ok, just completed a first review pass.
Note that given the complexity & amount of code here (it is basically a completely new file input plugin) it will be hard to meticulously review all the code here. We will have to trust the tests/specs and also proceed with some integration tests and/or beta tests.

Have we considered creating a whole new plugin instead of updating the existing one?

require "jruby_file_watch"
end

if HOST_OS_WINDOWS && defined?(JRUBY_VERSION)
Contributor Author

Note to self: JRuby is guaranteed.

# or allow you to specify any delimiter token you so choose, which will then
# be used by String#split to tokenize the input data

attr_reader :delimiter, :input
Contributor

I also note (following our discussion on attr_reader :input below) that this copy of BufferedTokenizer also exposes delimiter but AFAICT this is not required by our code? I will comment on where I believe this was required, but I think it is actually a bug?

end

def buffer_delimiter_byte_size
@buffer.delimiter_byte_size
Contributor

AFAICT BufferedTokenizer does not have a delimiter_byte_size method. It looks like the original intent might have been @buffer.delimiter.bytesize ? hence the need to expose attr_reader :delimiter ?
In any case, can't this be replaced with:

def buffer_delimiter_byte_size
  @settings.delimiter_byte_size
end

or

def buffer_delimiter_byte_size
  @settings.delimiter.bytesize
end

so that attr_reader delimiter is not required and we can reuse the Logstash BufferedTokenizer ?

@guyboertje
Contributor Author

CHANGES

Rakefile task :custom_ls_check. Removed

compileOnly group: 'org.jruby'. Removed

change def windows?. Done

define methods like win_inodes? and nix_inodes? I decided to define two modules to mix into the WatchedFile class depending on LogStash::Environment.windows? or not. Better?

defined?(JRUBY_VERSION). Fixed

CurrentSerializerClass = SincedbRecordSerializer. Removed CurrentSerializerClass, use SincedbRecordSerializer

use logstash core BufferedTokenizer. See below

also exposes delimiter. See below

replaced with the BufferedTokenizer.flush method? I opted for this solution, removed buftok file but not the test - making sure the LS version of BufferedTokenizer still does what we need it to do

return if @watching.member?(path). Answered inline, this code has been as such since Aug 30, 2011. I don't recall any issues with this

trying to write, you can provide a different temporary directory. should we add a check for this? Replied inline, IMO no action needed

require_relative 'bootstrap' unless defined?(FileWatch). Completely reorganised require order throughout

document somehow the interface contracts. Added documentation in the ObservingBase class. However, in preparing the documentation it became clear to me that the Processor - Dispatch separation no longer made sense, so I merged Dispatch into Processor with the necessary adjustments. Renamed ReadHandlers and TailHandlers to ReadMode and TailMode, and moved the Handler classes into a Handlers module namespace to better reflect their intention without needing to add _handler to each file and class name

sincedb_collection.write("subscribe complete - shutting down"). Unified both arguments to this call

require_relative "dispatch" - require at the end of the file is a bit weird. Completely reorganised require order throughout

WatchedFile buffer_delimiter_byte_size unused. Removed

could there be just a "do nothing" option? no delete, no log? Answered inline. Do nothing is undesirable IMO

also, are both options mutually exclusive? Refactored to do one, the other, or both via a third option, log_and_delete. QUESTION: Should we do any file-exists checks when writing? How will a user effectively rotate the file?

read_iterations - I don't understand this option. Renamed to file_chunk_count and reworded comments - better?

I see that 32768 is defined twice (also in bootstrap.rb), should it be a constant? Refactored to make both constants - needed require "filewatch/bootstrap" very early

sincedb filename generation, we should DRY it up. Refactored to handle sincedb_path determinism in one pass. Added a test to verify that a legacy sincedb file was renamed

do we need dynamic class loading here? Refactored to load both module trees but only use one

ADDITIONS

Added sorting of watched_file_collection as per Fabian's request. Added sort specs.

run_thread_proc.call
sleep 0.25
sleep 0.2
Contributor

idem

Contributor Author

👍 both

expect(subject.extract("hello\r\nworld\r\n")).to eq ["hello", "world"]
end
end
end
Contributor

👍

@colinsurprenant
Contributor

@guyboertje did another review pass! good stuff.
one observation worth mentioning: we seem to have an inconsistency problem WRT env vars... I think we need to verify this against https://github.com/elastic/logstash/blob/57e7a8a56bdbf81647d6e116e75263796d6daf47/bin/logstash.lib.sh#L44-L48

@fbaligand

@guyboertje
As requested, I just opened an issue to add "clean_removed" option : #176

sleep 0.25
first_lsof = lsof_proc.call
expect(first_lsof.scan(file_path).size).to eq(1)
wait(1).for{lsof_proc.call.scan(file_path).size}.to eq(1)
Contributor

👍

@@ -0,0 +1 @@
1.0.0
Contributor

ok! I wonder how we manage jar versions in other plugins?! this seems like a reasonable strategy. the other strategy would be to not have any version number on the jar since it is not published outside the plugin?

@colinsurprenant
Contributor

I left a few more comments but overall I am LGTM at this stage!

  • Maybe a closer look at the ensure block potential issue
  • I suggest we do a beta testing cycle since this is basically an almost complete rewrite?

@colinsurprenant
Contributor

Ah also, for the jar versioning - we should open a separate thread/issue to define a consistent strategy for jar(s) versioning across plugins.

@guyboertje
Contributor Author

guyboertje commented Apr 23, 2018

For my part, I like the jar to have a version, more so if the Java code is where the main functionality lies. Mainly for my own peace of mind, in seeing which version is inside the gem (I had to yank dissect 1.1.3 because it had the jar from 1.1.1 in it.)

I suppose, for all hybrid plugins, we need to make sure that no jars get committed and pushed to GitHub, i.e. a gem publish will always rebuild the latest jar and include it in the gem.

@guyboertje
Contributor Author

About the GZIP ensure block.
This is shamelessly copied from the S3 input.
I will move the code to a private method and wrap the method call in a rescue.

begin
closeable.close
rescue Exception # IOException can be thrown by any of the Java classes that implement the Closeable interface.
# ignore this
Contributor

👍
I'd suggest adding a log.warn here with something like "ignoring close exception blablabla" so that if something else breaks somewhere else, we can maybe see a relation with a close exception if it happened.


private

def close_java_closeable(closeable)
Contributor

or maybe close_and_ignore or something like that? because this method does 2 things, it closes but also explicitly ignores exceptions.

begin
closeable.close
rescue Exception # IOException can be thrown by any of the Java classes that implement the Closeable interface.
# ignore this
logger.warn("Ignoring an IOException when closing an instance of #{closeable.class.name}")
Contributor

suggestion: add the exception detail too?

end
sincedb_collection.unset_watched_file(watched_file)
end

private

def close_java_closeable(closeable)
def close_and_ignore_ioexception(closeable)
Contributor

👍

rescue Exception # IOException can be thrown by any of the Java classes that implement the Closeable interface.
logger.warn("Ignoring an IOException when closing an instance of #{closeable.class.name}")
rescue Exception => e # IOException can be thrown by any of the Java classes that implement the Closeable interface.
logger.warn("Ignoring an IOException when closing an instance of #{closeable.class.name}", "exception" => e.class.name, "message" => e.message)
Contributor

I believe our "normal" exception logging pattern is more like this no?

logger.warn("some problem happened", :exception => e, :backtrace => e.backtrace)

Contributor Author

I am a little bit wary of passing an object as the value in a logging statement because it simply calls to_s on the object.
I will change this case to :exception => e

We don't need the backtrace in this case because we know exactly where it is happening.

Contributor

👍

@colinsurprenant
Contributor

While looking at the exception logging throughout the plugin, I think we should follow up with a cleanup issue to have a more uniform way of logging using the following pattern:

logger.warn("some problem happened", :exception => e, :backtrace => e.backtrace)
logger.info("some info", :attribute1 => foo.value , :attribute2 => bar.value)
...

@colinsurprenant
Contributor

LGTM²

@fbaligand

Great !

@guyboertje
Contributor Author

Now for the doc changes.

@guyboertje
Contributor Author

I am going to merge this and deal with the doc changes in a new PR.

@guyboertje guyboertje merged commit c23d776 into logstash-plugins:master Apr 24, 2018
@guyboertje guyboertje deleted the feature/import-filewatch branch April 24, 2018 14:07