CLI fix and doc #38

Merged
merged 5 commits into from Sep 22, 2021
1 change: 1 addition & 0 deletions .yardopts
@@ -4,6 +4,7 @@
--charset utf-8
--no-private
--markup markdown
--markup-provider redcarpet
-
LICENSE.txt
doc/file_registry_entry.md
2 changes: 1 addition & 1 deletion Gemfile.lock
@@ -1,7 +1,7 @@
PATH
remote: .
specs:
-kiba-extend (2.4.0)
+kiba-extend (2.4.1)
activesupport (~> 6.1.4)
csv (~> 3.0)
dry-configurable (~> 0.11)
2 changes: 1 addition & 1 deletion Thorfile
@@ -3,4 +3,4 @@
require 'bundler/setup'
require_relative 'lib/kiba/extend'

-Dir["./lib/kiba/extend/tasks/**/*.thor"].sort.each { |f| load f }
+Dir["./lib/tasks/**/*.thor"].sort.each { |f| load f }
103 changes: 103 additions & 0 deletions doc/cli.md
@@ -0,0 +1,103 @@
# Command line interface (CLI) for running jobs/tasks

`kiba-extend` uses [Thor](http://whatisthor.com/) to provide a command line interface for working with your ETL project.

I chose Thor over Rake because it is awkward to pass options/parameters in Rake, and because automated testing of Rake tasks is convoluted. ([ref](https://technology.doximity.com/articles/move-over-rake-thor-is-the-new-king))

## Help on the CLI

The following command will list all available tasks.

`thor -T`

This lets you search for only tasks beginning with "reg":

`thor list reg`

Some of the task descriptions may be truncated in the display, though. This also doesn't tell you what parameters/options you can pass in.

To get more details on a given task:

`thor --help TASKNAME`

For example: `thor --help reg:list` or `thor --help jobs:tagged`

### Conventions in the help

#### Plain parameters

When you see:

```
Usage:
thor jobs:tagged TAG
```

The all-caps word is a placeholder for a parameter that gets passed in without an option flag. For example, the following returns a list of jobs tagged with "report":

`thor jobs:tagged report`

#### Boolean options

Boolean options are presented a bit oddly in the help. For example:

```
Options:
r, [--run], [--no-run] # Whether to run the matching jobs
```

Any of the following will work, according to your preference.

To find, list, and run the jobs:

```
thor jobs:tagged report -r true
thor jobs:tagged cspace --run
thor jobs:tagged -r true cspace
thor jobs:tagged --run true cspace
thor jobs:tagged --run cspace
```

To find and list the jobs without running them:

```
thor jobs:tagged report -r false
thor jobs:tagged cspace --no-run
thor jobs:tagged -r false cspace
thor jobs:tagged --run false cspace
thor jobs:tagged --no-run cspace
```

However, the following **does** run the jobs, so use one of the more straightforward options above:

```
thor jobs:tagged cspace --no-run true
```

#### Other options

```
Options:
v, [--verbosity=VERBOSITY] # Only relevant if run=true. How much info to print to screen
# Default: normal
# Possible values: minimal, normal, verbose
```

In this case, replace the all-caps word with one of the possible values (if any are listed), or with your own free-form string.

To use the full option name:

`thor jobs:tagged cspace --run --verbosity=verbose`

To use the alias:

`thor jobs:tagged cspace --run -v verbose`

## Architecture/design

Thor tasks are defined in `kiba-extend/lib/tasks`.

There is a Thorfile in the `kiba-extend` base directory that autoloads those tasks and runs the CLI when you type thor commands. (How this works is some kinda Ruby/Thor library magic I haven't dug into fully.)

If your ETL project follows the repo template/FWM example, its base directory will also have a Thorfile, which loads all of `kiba-extend`'s tasks as well as any you create in your own repo's `/lib/tasks` directory.
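
To make the loading step described above a bit more concrete, here is a minimal sketch of how a project Thorfile might gather task files from both `kiba-extend` and the project itself. The helper name `thor_task_files` and the directory names are assumptions made up for illustration; they are not part of `kiba-extend`. A real Thorfile would call `load` on each returned path instead of printing it.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: collect every .thor file under each base directory's
# lib/tasks tree. Each directory's files are sorted so load order is stable.
def thor_task_files(*base_dirs)
  base_dirs.flat_map do |base|
    Dir[File.join(base, 'lib', 'tasks', '**', '*.thor')].sort
  end
end

# Demonstrate with throwaway directories standing in for the gem and a project.
Dir.mktmpdir do |root|
  gem_dir = File.join(root, 'kiba-extend')
  project_dir = File.join(root, 'my_project')

  # Fake task files for the gem
  %w[jobs.thor jobs/tagged.thor].each do |rel|
    path = File.join(gem_dir, 'lib', 'tasks', rel)
    FileUtils.mkdir_p(File.dirname(path))
    FileUtils.touch(path)
  end

  # Fake task file for the project
  custom = File.join(project_dir, 'lib', 'tasks', 'custom.thor')
  FileUtils.mkdir_p(File.dirname(custom))
  FileUtils.touch(custom)

  # A Thorfile would do `load file` here; the sketch only lists the paths.
  thor_task_files(gem_dir, project_dir).each do |file|
    puts file.delete_prefix("#{root}/")
  end
end
```

Listing the gem's directory first means its tasks are defined before any project-specific tasks that might reopen the same classes.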

10 changes: 6 additions & 4 deletions doc/file_registry_entry.md
@@ -1,12 +1,14 @@
# File Registry Entry

-## PATH_REQ
+## Note: SourceDestRegistry

-Constant registering the known source/destination classes and whether each requires a file path for read/write.
+`Kiba::Extend::Registry::FileRegistryEntry` mixes in the `Kiba::Extend::Registry::SourceDestRegistry` module, which provides certain information about source and destination types necessary for validating them and preparing entries using them for use in jobs.

-If you create or incorporate a new source/destination class, you will get a warning if you use it and do not register it here.
+If you create/use a new source or destination type in your File Registry, it will need to be added to `SourceDestRegistryConstant` or you will get errors.

-## reghash
+(This is one of the signs I made a poor design choice around FileRegistryEntry modeling, which I now think needs to be re-implemented using the Strategy pattern or something else. But here it is for now.)

+## File Registry Data hashes in your ETL application

A file registry entry is initialized with a Hash of data about the file. This Hash will be sent from your ETL application.

Empty file removed docs/.keep
Empty file.
24 changes: 13 additions & 11 deletions lib/kiba/extend.rb
@@ -12,7 +12,7 @@
require 'byebug'
require 'xxhash'

-# require 'kiba/extend/version'
+require 'kiba/extend/registry/file_registry'

# Default CSV options
CSVOPT = { headers: true, header_converters: :symbol }.freeze
@@ -36,39 +36,41 @@ module Extend
require rbfile.delete_prefix("#{File.expand_path(__dir__)}/lib/")
end

+registry = Kiba::Extend::Registry::FileRegistry.new

# So we can call Kiba.job_segment
Kiba.extend(Kiba::Extend::Jobs::JobSegmenter)

# Default options for reading/writing CSVs
-setting :csvopts, { headers: true, header_converters: %i[symbol downcase] }, reader: true
+setting :csvopts, default: { headers: true, header_converters: %i[symbol downcase] }, reader: true

# Default settings for Lambda destination
-setting :lambdaopts, { on_write: ->(r) { accumulator << r } }, reader: true
+setting :lambdaopts, default: { on_write: ->(r) { accumulator << r } }, reader: true

# Default delimiter for splitting/joining values in multi-valued fields
-setting :delim, ';', reader: true
+setting :delim, default: ';', reader: true

# Default source class for jobs
-setting :source, Kiba::Common::Sources::CSV, reader: true
+setting :source, default: Kiba::Common::Sources::CSV, reader: true

# Default destination class for jobs
-setting :destination, Kiba::Extend::Destinations::CSV, reader: true
+setting :destination, default: Kiba::Extend::Destinations::CSV, reader: true

# Prefix for warnings from the ETL
-setting :warning_label, 'KIBA WARNING', reader: true
+setting :warning_label, default: 'KIBA WARNING', reader: true

-setting :registry, Kiba::Extend::Registry::FileRegistry.new, reader: true
+setting :registry, default: registry, reader: true

setting :job, reader: true do
# Whether to output results to STDOUT for debugging
-setting :show_me, false, reader: true
+setting :show_me, default: false, reader: true
# Whether to have computer say something when job is complete
-setting :tell_me, false, reader: true
+setting :tell_me, default: false, reader: true
# How much output about jobs to output to STDOUT
# :debug - tells you A LOT - helpful when developing pipelines and debugging
# :normal - reports what is running, from where, and the results
# :minimal - bare minimum
-setting :verbosity, :normal, reader: true
+setting :verbosity, default: :normal, reader: true
end

# strips, collapses multiple spaces, removes terminal commas, strips again
2 changes: 1 addition & 1 deletion lib/kiba/extend/version.rb
@@ -2,6 +2,6 @@

module Kiba
module Extend
-VERSION = '2.4.0'
+VERSION = '2.4.1'
end
end
14 changes: 4 additions & 10 deletions lib/tasks/jobs.thor
@@ -1,13 +1,7 @@
require 'thor'
+require_relative 'runnable'

-class Jobs < Thor
+# Tasks that list jobs and optionally run them
+class Jobs < Runnable
class_option :run, required: false, type: :boolean, default: false, aliases: :r,
-desc: 'Whether to run the matching jobs'
-class_option :show, required: false, type: :boolean, default: false, aliases: :s,
-desc: 'Only relevant if run=true. Whether to print job results to STDOUT'
-class_option :tell, required: false, type: :boolean, default: false, aliases: :t,
-desc: 'Only relevant if run=true. Whether to SAY job is complete. Useful for long running jobs'
-class_option :verbosity, required: false, type: :string, default: 'normal', aliases: :v,
-desc: 'Only relevant if run=true. How much info to print to screen',
-enum: ['minimal', 'normal', 'verbose']
+desc: 'Whether to run the matching jobs. Default: false'
end
13 changes: 10 additions & 3 deletions lib/tasks/jobs/tagged.thor
@@ -1,5 +1,11 @@
-class Jobs < Thor
-desc 'tagged TAG', 'List entries tagged with given tag'
+class Jobs < Runnable
+desc 'tagged TAG', 'List entries tagged with given tag and optionally run them'
+long_desc <<~LONG
+Lists entries tagged with given tag and optionally run them

+NOTE that the show, tell, and verbosity options are only relevant if you indicate the jobs should be run.
+LONG

def tagged(tag)
getter = Kiba::Extend::Registry::RegistryEntrySelector.new
result = getter.tagged_any(tag)
@@ -8,6 +14,7 @@ class Jobs < Thor
Kiba::Extend::Registry::RegistryList.new(result)

return unless options[:run]
-result.each{ |res| puts "RUN #{res}" }

+result.map(&:key).each{ |key| run_job(key) }
end
end
23 changes: 23 additions & 0 deletions lib/tasks/jobs/tagged_and.thor
@@ -0,0 +1,23 @@
class Jobs < Runnable
desc 'tagged_and', 'List entries tagged with given tags, ANDed together, and optionally run them'
long_desc <<~LONG
List entries tagged with given tags, ANDed together, and optionally run them

NOTE that the show, tell, and verbosity options are only relevant if you indicate the jobs should be run.
LONG

option :tags, required: true, type: :array, aliases: :t, banner: 'TAG1 TAG2',
desc: 'The tags for which to return entries'

def tagged_and
getter = Kiba::Extend::Registry::RegistryEntrySelector.new
result = getter.tagged_all(options[:tags])
return if result.empty?

Kiba::Extend::Registry::RegistryList.new(result)

return unless options[:run]

result.map(&:key).each{ |key| run_job(key) }
end
end
23 changes: 23 additions & 0 deletions lib/tasks/jobs/tagged_or.thor
@@ -0,0 +1,23 @@
class Jobs < Runnable
desc 'tagged_or', 'List entries tagged with given tags, ORed together, and optionally run them'
long_desc <<~LONG
List entries tagged with given tags, ORed together, and optionally run them

NOTE that the show, tell, and verbosity options are only relevant if you indicate the jobs should be run.
LONG

option :tags, required: true, type: :array, aliases: :t, banner: 'TAG1 TAG2',
desc: 'The tags for which to return entries'

def tagged_or
getter = Kiba::Extend::Registry::RegistryEntrySelector.new
result = getter.tagged_any(options[:tags])
return if result.empty?

Kiba::Extend::Registry::RegistryList.new(result)

return unless options[:run]

result.map(&:key).each{ |key| run_job(key) }
end
end
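
The difference between the two new tasks comes down to `tagged_all` (AND) versus `tagged_any` (OR) on the selector. The selector's implementation is not part of this diff, so the following is only a minimal sketch of the set logic, assuming each registry entry exposes a list of tags. The `Entry` struct, the standalone method signatures, and the sample keys and tags are hypothetical.

```ruby
# Hypothetical stand-in for a registry entry: just a key and its tags.
Entry = Struct.new(:key, :tags)

# OR: keep entries carrying at least one of the given tags.
# Array() lets callers pass a single tag (as `jobs:tagged` does) or a list.
def tagged_any(entries, tags)
  wanted = Array(tags).map(&:to_s)
  entries.select { |entry| (entry.tags.map(&:to_s) & wanted).any? }
end

# AND: keep entries carrying every one of the given tags.
def tagged_all(entries, tags)
  wanted = Array(tags).map(&:to_s)
  entries.select { |entry| (wanted - entry.tags.map(&:to_s)).empty? }
end

entries = [
  Entry.new(:objects_report, %w[report cspace]),
  Entry.new(:names_clean, %w[cspace]),
  Entry.new(:warnings_report, %w[report warnings])
]

p tagged_any(entries, %w[report warnings]).map(&:key) # => [:objects_report, :warnings_report]
p tagged_all(entries, %w[report warnings]).map(&:key) # => [:warnings_report]
p tagged_any(entries, 'cspace').map(&:key)            # => [:objects_report, :names_clean]
```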
23 changes: 0 additions & 23 deletions lib/tasks/reg.thor
@@ -7,29 +7,6 @@ class Reg < Thor
puts Kiba::Extend::Registry::RegistryList.new
end


-desc 'tagged_and', 'List entries tagged with given tags, ANDed together'
-option :tags, required: true, type: :array, aliases: :t, banner: 'reports warnings',
-desc: 'The tags for which to return entries'
-def tagged_and
-getter = Kiba::Extend::Registry::RegistryEntrySelector.new
-result = getter.tagged_all(options[:tags])
-return if result.empty?

-Kiba::Extend::Registry::RegistryList.new(result)
-end

-desc 'tagged_or', 'List entries tagged with given tags, ORed together'
-option :tags, required: true, type: :array, aliases: :t, banner: 'reports warnings',
-desc: 'The tags for which to return entries'
-def tagged_or
-getter = Kiba::Extend::Registry::RegistryEntrySelector.new
-result = getter.tagged_any(options[:tags])
-return if result.empty?

-Kiba::Extend::Registry::RegistryList.new(result)
-end

desc 'tags', 'List tags used in the registry'
def tags
tags = []
33 changes: 2 additions & 31 deletions lib/tasks/run.thor
@@ -1,34 +1,5 @@
require 'thor'
+require_relative 'runnable'

-class Run < Thor
-class_option :show, required: false, type: :boolean, default: false, aliases: :s,
-desc: 'Whether to print job results to STDOUT'
-class_option :tell, required: false, type: :boolean, default: false, aliases: :t,
-desc: 'Whether to SAY job is complete. Useful for long running jobs'
-class_option :verbosity, required: false, type: :string, default: 'normal', aliases: :v,
-desc: 'How much info to print to screen',
-enum: ['minimal', 'normal', 'verbose']
+class Run < Runnable

-private

-def preprocess_options
-Kiba::Extend.config.job.show_me = options[:show]
-Kiba::Extend.config.job.tell_me = options[:tell]
-Kiba::Extend.config.job.verbosity = options[:verbosity].to_sym
-end

-def resolve_job(key)
-Kiba::Extend.registry.resolve(key)
-rescue Dry::Container::Error
-puts "No job with key: #{key}"
-:failure
-end

-def resolve_creator(job)
-creator = job.creator
-return creator if creator

-puts "No creator method for #{job.key}"
-:failure
-end
end
12 changes: 3 additions & 9 deletions lib/tasks/run/job.thor
@@ -1,16 +1,10 @@
require 'thor'

-class Run < Thor
+class Run < Runnable
desc 'job KEY', 'runs the specified job'

def job(key)
-preprocess_options

-job = resolve_job(key)
-exit if job == :failure

-creator = resolve_creator(job)
-exit if creator == :failure

-creator.call
+run_job(key)
end
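
The `Runnable` class that `Jobs` and `Run` now inherit from is not shown in this diff. As a rough, plain-Ruby sketch of what its `run_job` might do, assume it chains the `resolve_job` and `resolve_creator` helpers removed from `run.thor`. `JobRunner`, `JobEntry`, the Hash-based registry, and the job keys are hypothetical stand-ins. The sketch returns `:failure` where the removed code called `exit`, and it leaves out Thor and `preprocess_options` so it can run on its own.

```ruby
# Hypothetical stand-in for a registry entry with a key and a callable creator.
JobEntry = Struct.new(:key, :creator)

# Plain-Ruby sketch of the shared job-running steps.
class JobRunner
  # registry: a Hash of key => JobEntry (an assumption; the real registry
  # is resolved via Kiba::Extend.registry.resolve)
  def initialize(registry)
    @registry = registry
  end

  # Resolve the job, then its creator, then call the creator.
  # Returns :failure when either lookup fails.
  def run_job(key)
    job = resolve_job(key)
    return :failure if job == :failure

    creator = resolve_creator(job)
    return :failure if creator == :failure

    creator.call
  end

  private

  # Look up the job, reporting unknown keys instead of raising.
  def resolve_job(key)
    @registry.fetch(key)
  rescue KeyError
    puts "No job with key: #{key}"
    :failure
  end

  # Return the job's creator, reporting jobs that have none.
  def resolve_creator(job)
    creator = job.creator
    return creator if creator

    puts "No creator method for #{job.key}"
    :failure
  end
end

registry = {
  'prep__objects' => JobEntry.new('prep__objects', -> { :ran }),
  'prep__orphan' => JobEntry.new('prep__orphan', nil)
}
runner = JobRunner.new(registry)

p runner.run_job('prep__objects') # => :ran
p runner.run_job('prep__orphan')  # prints the "No creator" message, then :failure
p runner.run_job('missing')       # prints the "No job" message, then :failure
```

Putting these steps in one shared method is what lets `jobs:tagged`, `jobs:tagged_and`, `jobs:tagged_or`, and `run:job` each reduce to a call to `run_job(key)`.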
end