Skip to content

Commit

Permalink
Derivative attach queue (#83)
Browse files Browse the repository at this point in the history
* PluggableDerivativeService will not invoke (plugin for) creation if that
plugin's destination name already has an existing derivative file.

* DerivativeAttachment model fields, tests implemented, including
validation.

* WorkDerivatives attach (and consequently commit!) methods log attachment
to RDBMS, via NewspaperWorks::DerivativeAttachment.

* PluggableDerivativeService refactored to filter vs. logged attachments
of pre-existing derivatives, in a more testable way.

* `WorkFiles` `.derivatives` method constructs same WorkDerivatives, only
once.

* WorkFiles delegates construction of derivatives adapter to WorkFile.

* Parent/child relationships for WorkFile <>-- WorkDerivatives, with
ability to follow .parent reference/relationship from WorkDerivatives
object.

* IngestFileRelation model, with tests.

* WorkFile adapter can be constructed with nil fileset, the use of which
is as a mere sentinel value, planned for use by WorkDerivatives.

* DerivativeAttachment model permits nil fileset reference, and save
methods of WorkDerivatives handle nil fileset with exception.

* Make WorkDerivatives logging to DerivativeAttachment happen during
assignment, not attachment, and keep the entries in the table in sync
with transient values for assigned queue.

* WorkDerivatives logs relationship between primary and derivative file
path, esp. for cases of no fileset (deferred attachment use case).

* Remove strict equivalence check from test for WorkFiles.derivatives

* WorkDerivatives.commit_queued! method and integration test.

* Install generator injects `:after_create_fileset` override of Hyrax
default subscriber into consuming application.

    This injected subscriber block then calls
    `NewspaperWorks::Data.handle_after_create_fileset` method,
    which handles deferred derivative attachment.

* On deferred attachment of derivatives, ensure that fileset id is set on
DerivativeAttachment table/model during callback, as insurance that the
derivative creation plugins will not overwrite.
  • Loading branch information
seanupton authored and ebenenglish committed Jan 11, 2019
1 parent c8b8ff0 commit 42c3db7
Show file tree
Hide file tree
Showing 17 changed files with 463 additions and 25 deletions.
13 changes: 13 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -117,6 +117,19 @@ _More here soon!_
be edited to make uploads optional for (all) work types, by setting
`config.work_requires_files = false`.

* NewspaperWorks overrides Hyrax's default `:after_create_fileset` event
handler, in order to attach pre-existing derivatives in some ingest
use cases. The file attachment adapters for NewspaperWorks use this
callback to allow programmatic assignment of pre-existing derivative
files before the primary file's file set has been created for a new
work. The callback ensures that derivative files are attached,
stored using Hyrax file/path naming conventions, once the file set
has been created. Because the Hyrax callback registry only allows single
subscribers to any event, application developers who overwrite
this handler, or integrate other gems that do likewise, must take care
to create a custom composition that ensures all work and queued jobs
desired run after this object lifecycle event.

## Development and Testing with Vagrant
* clone samvera-vagrant

Expand Down
8 changes: 8 additions & 0 deletions app/models/newspaper_works/derivative_attachment.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
module NewspaperWorks
class DerivativeAttachment < ApplicationRecord
# We can store nil/optional fileset as interim value before fileset
# construction, but we require at minimum, path, destination_name
validates :path, presence: true
validates :destination_name, presence: true
end
end
14 changes: 14 additions & 0 deletions app/models/newspaper_works/ingest_file_relation.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
module NewspaperWorks
class IngestFileRelation < ApplicationRecord
validates :file_path, presence: true
validates :derivative_path, presence: true

# Query by file path for all derivatives, as de-duplicated array of
# derivative paths.
# @param path [String] Path to primary file
# @return [Array<String>] de-duplicated array of derivative paths.
def self.derivatives_for_file(path)
where(file_path: path).pluck(:derivative_path).uniq
end
end
end
48 changes: 46 additions & 2 deletions app/services/newspaper_works/pluggable_derivative_service.rb
Original file line number Diff line number Diff line change
Expand Up @@ -54,17 +54,61 @@ def respond_to_missing?(method_name)
self.class.allowed_methods.include?(method_name) || super
end

# get derivative services relevant to method name and file_set context
# -- omits plugins if particular destination exists or will soon.
def services(method_name)
result = plugins.map { |plugin| plugin.new(file_set) }.select(&:valid?)
result.select do |plugin|
dest = nil
dest = plugin.class.target_ext if plugin.class.respond_to?(:target_ext)
!skip_destination?(method_name, dest)
end
end

def method_missing(name, *args, **opts, &block)
if respond_to_missing?(name)
# we have an allowed method, construct services and include all valid
# services for the file_set
services = plugins.map { |plugin| plugin.new(file_set) }.select(&:valid?)
# services = plugins.map { |plugin| plugin.new(file_set) }.select(&:valid?)
# run all valid services, in order:
services.each do |plugin|
services(name).each do |plugin|
plugin.send(name, *args)
end
else
super
end
end

private

def skip_destination?(method_name, destination_name)
return false if file_set.id.nil? || destination_name.nil?
return false unless method_name == :create_derivatives
# skip :create_derivatives if existing --> do not re-create
existing_derivative?(destination_name) ||
impending_derivative?(destination_name)
end

def existing_derivative?(name)
path = derivative_path_factory.derivative_path_for_reference(
file_set,
name
)
File.exist?(path)
end

# is there an impending attachment from ingest logged to db?
# -- avoids stomping over pre-made derivative
# for which an attachment is still in-progress.
def impending_derivative?(name)
result = NewspaperWorks::DerivativeAttachment.find_by(
fileset_id: file_set.id,
destination_name: name
)
!result.nil?
end

def derivative_path_factory
Hyrax::DerivativePath
end
end
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
class CreateNewspaperWorksDerivativeAttachments < ActiveRecord::Migration[5.0]
def change
create_table :newspaper_works_derivative_attachments do |t|
t.string :fileset_id
t.string :path
t.string :destination_name

t.timestamps
end
add_index :newspaper_works_derivative_attachments, :fileset_id
end
end
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
class CreateNewspaperWorksIngestFileRelations < ActiveRecord::Migration[5.0]
def change
create_table :newspaper_works_ingest_file_relations do |t|
t.string :file_path
t.string :derivative_path

t.timestamps
end
add_index :newspaper_works_ingest_file_relations, :file_path
end
end
15 changes: 13 additions & 2 deletions lib/generators/newspaper_works/install_generator.rb
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,13 @@
module NewspaperWorks
# Install Generator Class
class InstallGenerator < Rails::Generators::Base
source_root File.expand_path('../templates', __FILE__)
source_root File.expand_path('templates', __FILE__)

def copy_migrations
rake "newspaper_works:install:migrations"
end

# rubocop:disable Metrics/MethodLength
def register_worktypes
inject_into_file 'config/initializers/hyrax.rb',
after: "Hyrax.config do |config|\n" do
Expand All @@ -14,9 +19,15 @@ def register_worktypes
" config.register_curation_concern :newspaper_issue\n" \
" config.register_curation_concern :newspaper_page\n" \
" config.register_curation_concern :newspaper_title\n" \
' # == END GENERATED newspaper_works CONFIG == '
"\n" \
" config.callback.set(:after_create_fileset) do |file_set, user|\n" \
" require 'newspaper_works'\n" \
" NewspaperWorks::Data.handle_after_create_fileset(file_set, user)\n" \
" end\n" \
" #== END GENERATED newspaper_works CONFIG ==\n\n"
end
end
# rubocop:enable Metrics/MethodLength

def inject_routes
inject_into_file 'config/routes.rb',
Expand Down
19 changes: 19 additions & 0 deletions lib/newspaper_works/data.rb
Original file line number Diff line number Diff line change
Expand Up @@ -8,5 +8,24 @@ module NewspaperWorks
# Module for data access helper / adapter classes supporting, enhancing
# NewspaperWorks work models
module Data
# Handler for after_create_fileset, to be called by block subscribing to
# and overriding default Hyrax `:after_create_fileset` handler, via
# app integrating newspaper_works.
def self.handle_after_create_fileset(file_set, user)
handle_queued_derivative_attachments(file_set)
# Hyrax queues this job by default, and since newspaper_works
# overrides the single subscriber Hyrax uses to do so, we
# must call this here:
FileSetAttachedEventJob.perform_later(file_set, user)
end

def self.handle_queued_derivative_attachments(file_set)
return if file_set.import_url.nil?
work = file_set.member_of.select(&:work?)[0]
derivatives = NewspaperWorks::Data::WorkDerivatives.of(work)
# For now, becuase this is IO-bound operation, it makes sense to have
# this not be a job, but run inline:
derivatives.commit_queued!(file_set)
end
end
end
99 changes: 95 additions & 4 deletions lib/newspaper_works/data/work_derivatives.rb
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,10 @@ class WorkDerivatives
# @return [FileSet] fileset for work, with regard to these derivatives
attr_accessor :fileset

# Parent pointer to WorkFile object representing fileset
# @return [NewspaperWorks::Data::WorkFile] WorkFile for fileset, work pair
attr_accessor :parent

# Assigned attachment queue (of paths)
# @return [Array<String>] list of paths queued for attachment
attr_accessor :assigned
Expand All @@ -37,12 +41,12 @@ class << self
end

# alternate constructor spelling:
def self.of(work, fileset = nil)
new(work, fileset)
def self.of(work, fileset = nil, parent = nil)
new(work, fileset, parent)
end

# Adapt work and either specific or first fileset
def initialize(work, fileset = nil)
def initialize(work, fileset = nil, parent = nil)
# adapted context usually work, may be string id of FileSet
@work = work
@fileset = fileset.nil? ? first_fileset : fileset
Expand All @@ -52,6 +56,8 @@ def initialize(work, fileset = nil)
@assigned = []
# un-assignments for deletion
@unassigned = []
# parent is NewspaperWorks::Data::WorkFile object for derivatives
@parent = parent
end

# Assignment state
Expand All @@ -69,6 +75,9 @@ def assign(path)
path = normalize_path(path)
validate_path(path)
@assigned.push(path)
# We are keeping assignment both in ephemeral, transient @assigned
# and mirroring to db to share context with other components:
log_assignment(path, path_destination_name(path))
end

# Assign a destination name to unassigned queue for deletion -- OR --
Expand All @@ -77,7 +86,10 @@ def assign(path)
# or source path
def unassign(name)
# if name is queued path, remove from @assigned queue:
@assigned.delete(name) if @assigned.include?(name)
if @assigned.include?(name)
@assigned.delete(name)
unlog_assignment(name, path_destination_name(name))
end
# if name is known destination name, remove
@unassigned.push(name) if exist?(name)
end
Expand All @@ -94,10 +106,35 @@ def commit!
@unassigned = []
end

# Given a fileset meeting both of the following conditions:
# 1. a non-nil import_url value;
# 2. is attached to a work (persisted in Fedora, if not yet in Solr)...
# ...this method gets associated derivative paths queued and attach all.
# @param file_set [FileSet] saved file set, attached to work,
# with identifier, and a non-nil import_url
def commit_queued!(file_set)
raise ArgumentError('No FileSet import_url') if file_set.import_url.nil?
import_path = file_url_to_path(file_set.import_url)
work = file_set.member_of.select(&:work?)[0]
raise ArgumentError('Work not found for fileset') if work.nil?
derivatives = WorkDerivatives.of(work, file_set)
IngestFileRelation.derivatives_for_file(import_path).each do |path|
next unless File.exist?(path)
attachment_record = DerivativeAttachment.where(path: path).first
derivatives.attach(path, attachment_record.destination_name)
# update previously nil fileset id
attachment_record.fileset_id = file_set.id
attachment_record.save!
end
@fileset ||= file_set
load_paths
end

# attach a single derivative file to work
# @param file [String, IO] path to file or IO object
# @param name [String] destination name, usually file extension
def attach(file, name)
raise RuntimeError('Cannot save for nil fileset') if fileset.nil?
mkdir_pairtree
path = path_factory.derivative_path_for_reference(fileset, name)
# if file argument is path, copy file
Expand All @@ -119,6 +156,7 @@ def attach(file, name)
# Delete a derivative file from work, by destination name
# @param name [String] destination name, usually file extension
def delete(name, force: nil)
raise RuntimeError('Cannot save for nil fileset') if fileset.nil?
path = path_factory.derivative_path_for_reference(fileset, name)
# will remove file, if it exists; won't remove pairtree, even
# if it becomes empty, as that is excess scope.
Expand Down Expand Up @@ -177,6 +215,59 @@ def data(name)

private

def primary_file_path
if fileset.nil?
# if there is a nil fileset, we look for *intent* in the form
# of the first assigned file path for single-file work.
work_file = parent
return if work_file.nil?
work_files = work_file.parent
return if work_files.nil?
work_files.assigned[0]
else
file_url_to_path(fileset.import_url) unless fileset.import_url.nil?
end
end

def file_url_to_path(url)
url.gsub('file://', '')
end

def log_primary_file_relation(path)
file_path = primary_file_path
return if file_path.nil?
NewspaperWorks::IngestFileRelation.create!(
file_path: file_path,
derivative_path: path
)
end

def log_assignment(path, name)
NewspaperWorks::DerivativeAttachment.create!(
fileset_id: fileset_id,
path: path,
destination_name: name
)
log_primary_file_relation(path)
end

def unlog_assignment(path, name)
if fileset_id.nil?
NewspaperWorks::DerivativeAttachment.where(
path: path,
destination_name: name
).destroy_all
else
NewspaperWorks::DerivativeAttachment.where(
fileset_id: fileset_id,
path: path,
destination_name: name
).destroy_all
end
# note: there is deliberately no attempt to "unlog" primary
# file relation, as leaving it should have no side-effect.
end

# Load all paths/names to @paths once, upon first access
def load_paths
fsid = fileset_id
Expand Down
Loading

0 comments on commit 42c3db7

Please sign in to comment.