FilterTable is intended to help you author "plural" resources.
Plural resources examine platform objects in bulk. For example, sorting through which packages are installed on a system, or which virtual machines are on a cloud provider. You don't know the identifiers of the objects, but you may know some of their properties, and you may want to filter the objects based on those - for example, all processes running more than an hour, or all VMs on a particular subnet.
Singular resources, in contrast, are designed to examine a particular platform object in detail, when you know an identifier. For example, you would use a singular resource to fetch a VM by its ID, then interrogate its networking configuration. Singular resources are able to provide richer properties and matchers than plural resources, because the semantics are clearer.
If you can't tell if the resource you are authoring is singular or plural, STOP and consult with a team member. This is a fundamental design question, and while we have had some resources that "straddle the fence" in the past, they are very difficult to use and maintain.
Suppose you have a person, and you want to represent that person's shoes. Should you use FilterTable for that?
NO. FilterTable is intended to represent pluralities inherent to the resource itself, not a property of the resource. So, you would use FilterTable to represent people. To represent shoes, you could have a simple, dumb array-of-strings property on Person. Or, you could create a new resource, Shoe or Shoes, which has a person_name or person_id property. Or you could expose a complex structure as a low-level property, and create mid-level properties/matchers that compute on the values internally (`shoe_fit?`, `has_shoes_for_occasion?('red_carpet')`).
May a class have multiple FilterTable installations? In theory, yes; that would be used to implement different data fetching or caching strategies. It is a very advanced usage and, as far as I know, no core resources currently do this.
Suppose you are writing a resource, `things`, and you want it to behave like any plural resource (we'll explore what that means in a moment). A basic implementation looks like this:
require 'inspec/utils/filter'

class Thing < Inspec.resource(1)
  #... other Resource DSL work goes here ...

  # FilterTable setup
  filter_table_config = FilterTable.create
  filter_table_config.register_column(:thing_ids, field: :thing_id)
  filter_table_config.register_column(:colors, field: :color, style: :simple)
  filter_table_config.install_filter_methods_on_resource(self, :fetch_data)

  def fetch_data
    # This method should return an array of hashes - the raw data. We'll hard code it here.
    [
      { thing_id: 1, color: :red },
      { thing_id: 2, color: :blue, tackiness: 'very' },
      { thing_id: 3, color: :red },
    ]
  end

  def some_other_property
    # We'll examine this later
  end
end
Note that all of the methods on `filter_table_config` support chaining, so you will sometimes see it as:
filter_table_config = FilterTable.create
  .register_column(:thing_ids, field: :thing_id)
  .register_column(:colors, field: :color, style: :simple)
  .install_filter_methods_on_resource(self, :fetch_data)
etc.
With a (fairly standard) implementation like that above, what behavior do you get out of the box?
In the past, you needed to request certain methods be installed. These are now installed automatically: `where`, `entries`, `raw_data`, `count`, and `exist?`. You only have to declare the columns unique to your resource, and then attach the data fetcher.
Nothing special immediately happens to your class or instances of it. The data fetcher is not called yet.
When most of the following methods are called, it may trigger the instantiation of a FilterTable::Table anonymous subclass. That instance will have called the raw data fetcher, and will wrap the raw data inside it. Many of the following methods return the Table instance.
The resource class gains a method, `where`. If called with a single `nil` param or no params, it will call the data fetcher method, wrap it up, and return the Table instance. Calling `where` in other modes will do the same thing, but will filter the data.
describe things.where(nil) do
  it { should exist }
  its('count') { should cmp 3 }
end

# This works, too, but for different internal reasons
describe things.where do
  it { should exist }
  its('count') { should cmp 3 }
end
If you call `where` as a method with no block and passing hash params, with keys you know are in the raw data, it will fetch the raw data, then filter row-wise and return the resulting Table.
Multiple criteria are joined with a logical AND.
The filtering is fancy, not just straight equality.
describe things.where(color: :red) do
  its('count') { should cmp 2 }
end

# Regexes
describe things.where(color: /^re/) do
  its('count') { should cmp 2 }
end

# It eventually falls out to === comparison
# Here, range membership 1..2
describe things.where(thing_id: (1..2)) do
  its('count') { should cmp 2 }
end

# Things that don't exist are silently ignored, but do not match
# See https://github.com/chef/inspec/issues/2943
describe things.where(none_such: :nope) do
  its('count') { should cmp 0 }
end

# Irregular rows are supported
# Only one row has the :tackiness key, with value 'very'.
describe things.where(tackiness: 'very') do
  its('count') { should cmp 1 }
end
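The matching rules above can be sketched in plain Ruby. This is a simplified illustration rather than FilterTable's actual code: `sketch_where` and `SKETCH_ROWS` are hypothetical names, and the real implementation may handle more cases. It assumes that every criterion must match (logical AND) and that each one is tested with `===`.

```ruby
# Hypothetical stand-in for the raw data returned by fetch_data above.
SKETCH_ROWS = [
  { thing_id: 1, color: :red },
  { thing_id: 2, color: :blue, tackiness: 'very' },
  { thing_id: 3, color: :red },
].freeze

# Simplified illustration only; not FilterTable's real implementation.
# A row is kept when every criterion matches (logical AND).
# Each criterion is tested with ===, so literals, Regexps and Ranges all work.
# A key that is absent from a row never matches.
def sketch_where(rows, criteria)
  rows.select do |row|
    criteria.all? { |field, wanted| row.key?(field) && wanted === row[field] }
  end
end

sketch_where(SKETCH_ROWS, color: :red).length          # => 2
sketch_where(SKETCH_ROWS, thing_id: (1..2)).length     # => 2
sketch_where(SKETCH_ROWS, none_such: :nope).length     # => 0
```

Because the test is `wanted === row[field]`, the criterion value decides what "matching" means: a Range tests membership, a Regexp tests a pattern, and a plain value tests equality.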
You can also call the `where` method with a block. The block is executed row-wise. If it returns truthy, the row is included in the results. Additionally, within the block each field declared with the `register_column` configuration method is available as a data accessor.
# You can have any logic you want in the block
describe things.where { true } do
  its('count') { should cmp 3 }
end

# You can access any fields you declared using `register_column`
describe things.where { thing_id > 2 } do
  its('count') { should cmp 1 }
end
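One way to picture how bare column names become available inside the block is sketched below. It assumes each row is wrapped in a Struct that has one member per declared column, and that the block is evaluated against that Struct. `SketchEntry` and `sketch_block_where` are hypothetical names; this is not FilterTable's actual code.

```ruby
# Hypothetical sketch: one Struct member per column declared with register_column.
SketchEntry = Struct.new(:thing_id, :color)

# Keep the rows for which the block, evaluated against the row's Struct,
# returns a truthy value.
def sketch_block_where(rows, &block)
  rows.select do |row|
    entry = SketchEntry.new(row[:thing_id], row[:color])
    # Inside the block, bare names such as `thing_id` resolve to Struct members.
    entry.instance_eval(&block)
  end
end

sketch_rows = [
  { thing_id: 1, color: :red },
  { thing_id: 2, color: :blue, tackiness: 'very' },
  { thing_id: 3, color: :red },
]

sketch_block_where(sketch_rows) { thing_id > 2 }.length   # => 1
```

Under this assumption, a key that exists in the raw data but was never declared as a column (such as `tackiness`) is not a Struct member, so referring to it in the block raises a NameError.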
The first time `where` is called, the data fetcher method is called. `where` performs filtration on the raw data table. It then constructs a new FilterTable::Table, directly passing in the filtered raw data; this is then the return value from `where`.
# This only calls fetch_data once
describe things.where(color: :red).where { thing_id > 2 } do
  its('count') { should cmp 1 }
end
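The reason chaining does not re-fetch can be sketched as follows. The class names `ChainTable` and `ChainThings` are hypothetical, and the sketch only supports hash criteria; it is meant to show the shape of the idea, not the real FilterTable classes.

```ruby
# Hypothetical sketch of a table that filters the rows it already holds.
class ChainTable
  attr_reader :raw_data

  def initialize(raw_data)
    @raw_data = raw_data
  end

  # Filters the wrapped rows and returns a NEW table; no fetch happens here.
  def where(criteria = {})
    kept = raw_data.select { |row| criteria.all? { |k, v| v === row[k] } }
    ChainTable.new(kept)
  end

  def count
    raw_data.length
  end
end

# Hypothetical sketch of the resource: only its `where` touches the fetcher.
class ChainThings
  attr_reader :fetch_calls

  def initialize
    @fetch_calls = 0
  end

  def fetch_data
    @fetch_calls += 1
    [
      { thing_id: 1, color: :red },
      { thing_id: 2, color: :blue },
      { thing_id: 3, color: :red },
    ]
  end

  def where(criteria = {})
    ChainTable.new(fetch_data).where(criteria)
  end
end

chain_things = ChainThings.new
chain_things.where(color: :red).where(thing_id: 3).count   # => 1
chain_things.fetch_calls                                    # => 1
```

The first `where` is called on the resource and triggers the fetch; the second is called on the returned table, which already holds the filtered rows.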
Some other methods return a Table object, and they may be chained without a re-fetch as well.
`entries` is another pre-defined method. It is much simpler than `where`; in fact, its behavior is unrelated. It returns an encapsulated version of the raw data: a plain Array containing Structs as row entries. Each Struct has an attribute for each time you called `register_column`.
Overall, in my opinion, `entries` is less useful than `raw_data` (which returns the raw data). Wrapping in Structs does not seem to add much benefit.
Importantly, note that the return value of `entries` is not the resource, nor the Table - in other words, you cannot chain it. However, you can call `entries` on any Table.
If you call `entries` without chaining it after `where`, calling `entries` will trigger the call to the data fetching method.
# Access the entries array
describe things.entries do
  # This is Array#count, not the resource's `count` method
  its('count') { should cmp 3 }
end

# Access the entries array after chaining off of where
describe things.where(color: :red).entries do
  # This is Array#count, not the resource's or table's `count` method
  its('count') { should cmp 2 }
end

# You can access the struct elements as a method, as a hash keyed on symbol, or as a hash keyed on string
describe things.entries.first.color do
  it { should cmp :red }
end
describe things.entries.first[:color] do
  it { should cmp :red }
end
describe things.entries.first['color'] do
  it { should cmp :red }
end
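The three access styles shown above are ordinary Ruby Struct behavior. The sketch below uses a hypothetical `SketchRow` Struct to stand in for the row type; it assumes one member per declared column and is not the Struct that FilterTable itself builds.

```ruby
# Hypothetical row type: one member per column declared with register_column.
SketchRow = Struct.new(:thing_id, :color)

sketch_raw = [
  { thing_id: 1, color: :red },
  { thing_id: 2, color: :blue, tackiness: 'very' },
  { thing_id: 3, color: :red },
]

# Wrap each raw hash in a Struct, keeping only the declared columns.
sketch_entries = sketch_raw.map { |row| SketchRow.new(row[:thing_id], row[:color]) }

first = sketch_entries.first
first.color        # => :red  (method access)
first[:color]      # => :red  (symbol key)
first['color']     # => :red  (string key)

# The undeclared :tackiness key is not carried over to the Struct.
sketch_entries[1].respond_to?(:tackiness)   # => false
```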
This `register_custom_matcher` call:
filter_table_config.register_custom_matcher(:exist?) { |filter_table| !filter_table.entries.empty? }
causes a new method to be defined on both the resource class and the Table class. The body of the method is taken from the block that is provided. When the method is called, it will receive the FilterTable::Table instance as its first parameter. (It may also accept a second param, but that doesn't make sense for this method - see `thing_ids`.)
As when you are implementing matchers on a singular resource, the only thing that distinguishes this as a matcher is the fact that it ends in `?`.
# Bare call on the matcher (called as a method on the resource)
describe things do
  it { should exist }
end

# Chained on where (called as a method on the Table)
describe things.where(color: :red) do
  it { should exist }
end
This `register_custom_property` call:
filter_table_config.register_custom_property(:count) { |filter_table| filter_table.entries.count }
causes a new method to be defined on both the resource class and the Table class. As with `exist?`, the body is taken from the block.
# Bare call on the property (called as a method on the resource)
describe things do
  its('count') { should cmp 3 }
end

# Chained on where (called as a method on the Table)
describe things.where(color: :red) do
  its('count') { should cmp 2 }
end
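How a single block can end up as a method on two classes can be sketched with `define_method`. Everything named here (`MatcherTable`, `MatcherResource`, `sketch_register`) is hypothetical, and the delegation from resource to table is an assumption about the general shape rather than a description of FilterTable's internals.

```ruby
# Hypothetical table: just holds its entries.
class MatcherTable
  attr_reader :entries

  def initialize(entries)
    @entries = entries
  end
end

# Hypothetical resource: `table` stands in for "fetch the data and wrap it".
class MatcherResource
  def initialize(rows)
    @rows = rows
  end

  def table
    MatcherTable.new(@rows)
  end
end

# Define `name` on both classes. On the table, the block receives the table
# as its first parameter. On the resource, the call is forwarded to a table.
def sketch_register(name, &body)
  MatcherTable.define_method(name) { |*args| body.call(self, *args) }
  MatcherResource.define_method(name) { |*args| table.public_send(name, *args) }
end

sketch_register(:exist?) { |filter_table| !filter_table.entries.empty? }
sketch_register(:count)  { |filter_table| filter_table.entries.count }

MatcherResource.new([{ thing_id: 1 }]).exist?   # => true
MatcherResource.new([]).count                   # => 0
```

In this sketch, a matcher and a property are registered the same way; only the trailing `?` in the name marks `exist?` as a matcher.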
This `register_column` call:
filter_table_config.register_column(:thing_ids, field: :thing_id)
will cause a method to be defined on both the resource and the Table. Note that this `register_column` call does not provide a block, so FilterTable::Factory generates a method body. The `:field` option specifies which column to access in the raw data (that is, which hash key in the array-of-hashes).
The implementation provided by Factory changes behavior based on calling pattern. If no params or block is provided, a simple array is returned, containing the column-wise values in the raw data.
# Use it to check for presence / absence of a member
# This retains nice output formatting - we're testing on a Table associated with a Things resource
describe things.where(color: :red) do
  its('thing_ids') { should include 3 }
end

# Equivalent but with poor formatting - we're testing an anonymous array
describe things.where(color: :red).thing_ids do
  it { should include 3 }
end

# Use as a test-less enumerator
things.where(color: :red).thing_ids.each do |thing_id|
  # Do something with thing_id, maybe
  # describe thing(thing_id) do ...
end

# Can be used without where - enumerates all Thing IDs with no filter
things.thing_ids.each do |thing_id|
  # Do something with thing_id, maybe
  # describe thing(thing_id) do ...
end
The `colors` method behaves just like `thing_ids`, except that it returns the values of the `color` column. In addition, the `style: :simple` option causes it to flatten and uniq the array of values when called without args or a block.
# Three rows in the data: red, blue, red
describe things.colors do
  its('count') { should cmp 2 }
  it { should include :red }
  it { should include :blue }
end
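The effect of `style: :simple` as described above can be reproduced with plain Array methods. The helper names `sketch_column` and `sketch_simple_column` are hypothetical; this shows the described flatten-and-uniq behavior only, not the code that FilterTable generates.

```ruby
# Hypothetical helper: the column-wise values, one per row (no :simple style).
def sketch_column(rows, field)
  rows.map { |row| row[field] }
end

# Hypothetical helper: the same values, flattened and de-duplicated,
# as described for style: :simple.
def sketch_simple_column(rows, field)
  sketch_column(rows, field).flatten.uniq
end

color_rows = [
  { thing_id: 1, color: :red },
  { thing_id: 2, color: :blue },
  { thing_id: 3, color: :red },
]

sketch_column(color_rows, :color)          # => [:red, :blue, :red]
sketch_simple_column(color_rows, :color)   # => [:red, :blue]
```

Flattening matters when a column holds arrays: rows with `[:a, :b]` and `[:b]` would yield `[:a, :b]` rather than an array of arrays.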
Calling `colors` with a value filters the rows on that column and returns a Table. You also get this for `thing_ids`; it is unrelated to `style: :simple` for `colors`.
People definitely use this in the wild. It reads badly to me; I think this is a legacy usage that we should consider deprecating. To me, this seems to imply that there is a sub-resource (here, colors) we are auditing. At least two core resources (`xinetd_conf` and `users`) advocate this as their primary use.
# Filter on colors
describe things.colors(:red) do
  its('count') { should cmp 2 }
end

# Same, but doesn't imply we're now operating on some 'color' resource
describe things.where(color: :red) do
  its('count') { should cmp 2 }
end
Calling `colors` with a block also filters the rows. You also get this for `thing_ids`; it is unrelated to `style: :simple` for `colors`.
I haven't seen this used in the wild, but its existence gives me a headache.
# Examples A, B, C, and D are semantically the same

# A: Filter both on colors and the block
describe things.colors(:red) { thing_id < 2 } do
  its('count') { should cmp 1 }
  its('thing_ids') { should include 1 }
end

# B: Use one where block
describe things.where { color == :red && thing_id < 2 } do
  its('count') { should cmp 1 }
  its('thing_ids') { should include 1 }
end

# C: Use two where blocks
describe things.where { color == :red }.where { thing_id < 2 } do
  its('count') { should cmp 1 }
  its('thing_ids') { should include 1 }
end

# D: Use a where param and a where block
describe things.where(color: :red) { thing_id < 2 } do
  its('count') { should cmp 1 }
  its('thing_ids') { should include 1 }
end

# This has nothing to do with colors at all, and may be broken - the lack of an arg to `colors` may make it never match
describe things.colors { thing_id < 2 } do
  its('count') { should cmp 1 }
end
People definitely use `raw_data` out in the wild. Unlike `entries`, which wraps each row in a Struct and omits undeclared fields, `raw_data` simply returns the actual raw data array-of-hashes. It is not `dup`'d.
tacky_things = things.where(color: :blue).raw_data.select { |row| row[:tackiness] }
tacky_things.map { |row| row[:thing_id] }.each do |thing_id|
  # Use to audit a singular Thing
  describe thing(thing_id) do
    it { should_not be_paisley }
  end
end
A Table also exposes the resource it came from as `resource_instance`. You could use this to do something fairly complicated.
describe things.where do # Just getting a Table object
  its('resource_instance.some_method') { should cmp 'some_value' }
end
However, the resource instance won't know about the filtration, so I'm not sure what good this does. Chances are, someone is doing something horrid using this feature in the wild.
In some cases, the raw data may require multiple actions to populate. For example, if you wanted a list of processes, and their open files, you might need to call 'ps' once, then 'lsof' one or more times. That would become slow, and so you would only want to do it if you knew it was going to be used.
Lazy loaded columns are absent in the raw data, until they are accessed (either by method-where, block-where, or a list property). When they are accessed, a user-provided Lambda is called, which populates one or more columns. FilterTable remembers which lazy columns have been populated, and will not call the lambda again.
If you know you want to access the resource instance in your lazy method callback, see `lazy_instance`.
You declare a field to be lazy by providing an option, `lazy`, whose value is the lambda to be called.
You can use the 'stabby lambda' syntax:
filter_table_config.register_column(
  :open_files,
  field: :files,
  lazy: ->(r, c, t) { r[:files] = lookup_files_for_pid(r[:pid]) },
)
You can also refer to a class method. You cannot use an instance method, because FilterTable binds to the resource class, not the resource instance.
def self.populate_lsof(row, criteria, table)
  row[:files] = ...
end

filter_table_config.register_column(
  :open_files,
  field: :files,
  lazy: method(:populate_lsof),
)
The lambda will be provided three arguments:
`row`. This is a Hash, the current row of the raw_data. You will likely need to examine this to find an ID value or other field that will act as a search key for your fetch. You are expected to add one or more entries to this hash, as a result of your fetch.
`condition`. In some cases, a condition (desired value) is provided; the semantics of this are up to you.
`table`. A reference to the FilterTable. You can use this to access other context - including the entire raw data (`table.raw_data`) or the resource instance (`table.resource_instance`).
Lazy-loading will not clobber an existing value in raw data. For example:
# Your raw data table:
[
  { id: 1 },
  { id: 2, color: :blue },
  { id: 3 },
]

# On lazy load, set all rows to color red
filter_table_config.register_column(
  :colors,
  field: :color,
  lazy: ->(r, c, t) { r[:color] = :red },
)

# Trigger a fetch
my_resource.colors # => [:red, :blue, :red]

# Raw data now:
[
  { id: 1, color: :red },
  { id: 2, color: :blue },
  { id: 3, color: :red },
]
Note that not only was the `:color` blue not overwritten, in fact the fetcher lambda was only called twice.
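The no-clobber behavior can be sketched in a few lines of plain Ruby. `sketch_populate_lazy` is a hypothetical helper, and the sketch assumes that a row is skipped whenever it already has the key; it passes `nil` for the condition and table arguments to keep things small.

```ruby
# Hypothetical stand-in for triggering a lazy column; not FilterTable's real code.
def sketch_populate_lazy(rows, field, fetcher)
  rows.each do |row|
    next if row.key?(field)       # no-clobber: leave existing values alone
    fetcher.call(row, nil, nil)   # condition and table are omitted in this sketch
  end
  rows.map { |row| row[field] }
end

lazy_rows = [{ id: 1 }, { id: 2, color: :blue }, { id: 3 }]
fetch_count = 0
make_red = ->(row, _condition, _table) { fetch_count += 1; row[:color] = :red }

lazy_colors = sketch_populate_lazy(lazy_rows, :color, make_red)
# lazy_colors is [:red, :blue, :red], and fetch_count is 2:
# the row that already had a color never reached the fetcher.
```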
A single fetcher may populate more than one column. If your fetching action provides you with data to populate multiple columns, you are free to set any columns you wish in the `row`.
You can even have multiple lazy columns share an implementation; the first one to be called will populate all the columns that share that implementation, and if any of the others are later triggered, the no-clobber effect will kick in, and the fetcher will not be called again.
A fetcher may also populate an entire column at once. Using `table.raw_data`, you could perform a column-at-once population. After the fetcher was called for the first row, all other rows would already be populated, so the fetcher would not be called again due to the no-clobber effect.
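A column-at-once fetcher might look like the sketch below. `SketchLazyTable`, `sketch_trigger_lazy` and `sketch_bulk_lookup` are all hypothetical: the Struct stands in for the table argument (only `raw_data` is modeled), and the bulk lookup stands in for a single expensive call that returns data for every row.

```ruby
# Hypothetical stand-in for the table passed to the fetcher.
SketchLazyTable = Struct.new(:raw_data)

# Hypothetical trigger: call the fetcher only for rows still missing the field.
def sketch_trigger_lazy(table, field, fetcher)
  table.raw_data.each do |row|
    next if row.key?(field)          # no-clobber: skip rows already populated
    fetcher.call(row, nil, table)
  end
  table.raw_data.map { |row| row[field] }
end

# Hypothetical expensive call that answers for all ids in one go.
def sketch_bulk_lookup(ids)
  ids.to_h { |id| [id, "owner-#{id}"] }
end

# On its first call, the fetcher fills the :owner column for EVERY row,
# so the trigger finds nothing left to do for the remaining rows.
bulk_fetcher = lambda do |_row, _condition, table|
  owners = sketch_bulk_lookup(table.raw_data.map { |r| r[:id] })
  table.raw_data.each { |r| r[:owner] ||= owners[r[:id]] }
end

bulk_table = SketchLazyTable.new([{ id: 1 }, { id: 2 }, { id: 3 }])
sketch_trigger_lazy(bulk_table, :owner, bulk_fetcher)
# => ["owner-1", "owner-2", "owner-3"]
```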
If you want to do lazy loading with an instance method of the resource, use the `lazy_instance` option to set the callback.
filter_table_config.register_column(
  :colors,
  field: :color,
  lazy_instance: :make_it_red,
)

# instance, not class method
def make_it_red(row, condition, table)
  row[:color] = :red
end
The method will be provided three arguments:
`row`. This is a Hash, the current row of the raw_data. You will likely need to examine this to find an ID value or other field that will act as a search key for your fetch. You are expected to add one or more entries to this hash, as a result of your fetch.
`condition`. In some cases, a condition (desired value) is provided; the semantics of this are up to you.
`table`. A reference to the FilterTable. You can use this to access other context - including the entire raw data (`table.raw_data`).
To me, calling things.thing_ids should always return the same type of value. But if you call it with args or a block, it not only runs a filter, but also changes its return type to Table.
# This is an Array of color values (symbols, here)
things.colors
# This is a FilterTable::Table and these are equivalent
things.colors(:red)
things.where(color: :red)
# This is a FilterTable::Table and these are equivalent
things.colors { color == :red } # I think there is a bug here which makes this never match
things.where(color: :red)
`entries` will only know about the fields declared by `register_column` with `field:`. And...
Each time you call `register_custom_property`, `register_custom_matcher` or `register_column` - even for things like `count` and `exist?` - that will add an attribute to the Struct that is used to represent a row. Those attributes will always be nil.
This is because the raw data fetcher is not called until as late as possible. That's good - it might be expensive - but it also means we can't scan it for columns. There are ways around that.
You can't use a column name in a `where` block unless it was declared as a field using `register_column`.
# This will give a NameError - :tackiness is in the raw
# data hash but not declared using `register_column`.
describe things.where { tackiness == 'very' } do
  its('count') { should cmp 1 }
end
# NameError: undefined local variable or method `tackiness' for #<struct :exists?=nil, count=nil, id=nil>

# But this works:
describe things.where(tackiness: 'very') do
  its('count') { should cmp 1 }
end
You can't get to the resource or the table from there. (It's the Entry Struct type).
You can in fact get the FilterTable::Table instance by calling `where` with no args. But that is not obvious.
Especially while developing in inspec shell, it would be nice to be able to get at the FilterTable::Factory object, perhaps to add more accessors.