Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

valkyrie: use storage adapters for derivatives #5626

Merged
merged 25 commits into from
Oct 24, 2022
Merged

valkyrie: use storage adapters for derivatives #5626

merged 25 commits into from
Oct 24, 2022

Conversation

dunn
Copy link
Contributor

@dunn dunn commented May 12, 2022

Currently when derivatives are generated for uploaded files, Hyrax tries to save them directly to the filesystem, and subsequent access also assumes they will exist in a conventional local location:

# Path on file system where derivative file is stored

When an application is using a different Valkyrie storage adapter, such as valkyrie-shrine for S3, these assumptions fail, and derivative storage doesn't work. This PR teaches Hyrax to use the defined derivatives_storage_adapter instead of directly accessing the filesystem.

  • Valkyrie-based upload tooling is abstracted into ValkyrieUpload in order to be used for original and derivative file upload
  • a new controller pattern is used to create routes for derivatives; these are used instead of the ThumbnailIndexer for FileSets (future work should replace all uses of ThumbnailIndexer in Valkyrie contexts)

@samvera/hyrax-code-reviewers

@dunn dunn requested review from no-reply and tpendragon May 12, 2022 17:39
@dunn dunn force-pushed the valkyrie-upload branch 5 times, most recently from c1475c9 to 22bce4e Compare May 19, 2022 19:39
@dunn dunn marked this pull request as ready for review May 19, 2022 19:46
@dunn
Copy link
Contributor Author

dunn commented May 19, 2022

not sure why we're seeing these errors again; the storage_adapter#upload is definitely happening first: https://app.circleci.com/pipelines/github/samvera/hyrax/7673/workflows/3d3461d7-7092-4999-9d7a-943421b3f1c1/jobs/61138

@dunn
Copy link
Contributor Author

dunn commented May 19, 2022

it might be that #upload is silently failing beforehand?

From: /app/samvera/hyrax-engine/lib/wings/valkyrie/persister.rb:41 Wings::Valkyrie::Persister#save:

    26: def save(resource:)
    27:   af_object = resource_factory.from_resource(resource: resource)
    28:
    29:   check_lock_tokens(af_object: af_object, resource: resource)
    30:
    31:   # the #save! api differs between ActiveFedora::Base and ActiveFedora::File objects,
    32:   # if we get a falsey response, we expect we have a File that has failed to save due
    33:   # to empty content
    34:   af_object.save! ||
    35:     raise(FailedSaveError.new("#{af_object.class}#save! returned non-true. It might be missing required content.", obj: af_object))
    36:
    37:   resource_factory.to_resource(object: af_object)
    38: rescue ActiveFedora::RecordInvalid, RuntimeError => err
    39:   require 'pry'; binding.pry
    40:   raise MissingOrUnsavedFileError.new(err.message, obj: af_object) if
 => 41:     err.message == 'Save the file first'
    42:
    43:   raise FailedSaveError.new(err.message, obj: af_object)
    44: end

[6] pry(#<Wings::Valkyrie::Persister>)> resource.file_identifier
=> #<Valkyrie::ID:0x00007fc931752ef0 @id="fedora://fcrepo:8080/rest/test/4c/14/17/d3/4c1417d3-8d3c-459b-9927-22c9906add91/files/b1af2514-f8d9-43ac-8213-f7745a1275e8">
[7] pry(#<Wings::Valkyrie::Persister>)> resource.file_identifier.id
=> "fedora://fcrepo:8080/rest/test/4c/14/17/d3/4c1417d3-8d3c-459b-9927-22c9906add91/files/b1af2514-f8d9-43ac-8213-f7745a1275e8"
[8] pry(#<Wings::Valkyrie::Persister>)> Hyrax.query_service.find_by(id: resource.file_identifier.id)
Ldp::BadRequest: Path contains empty element! /test/fe/do/ra/:/fedora://fcrepo:8080/rest/test/4c/14/17/d3/4c1417d3-8d3c-459b-9927-22c9906add91/files/b1af2514-f8d9-43ac-8213-f7745a1275e8
from /usr/local/bundle/gems/ldp-1.0.3/lib/ldp/client/methods.rb:118:in `block in check_for_errors'
Caused by RuntimeError: Save the file first
from /usr/local/bundle/gems/active-fedora-13.2.7/lib/active_fedora/with_metadata/metadata_node.rb:49:in `save'

@dunn
Copy link
Contributor Author

dunn commented May 31, 2022

back to this error:

Wings::Valkyrie::Persister::MissingOrUnsavedFileError:
  Failed to save object {obj}.
  Wings tried to save metadata for a file which has not been saved. Fedora creates a metadata node when the file is created, so it's not possible to add metadata for a file until the file contents are persisted.
    Use the Hyrax.storage_adapter to save the file before trying to save metadata.
  Save the file first
./lib/wings/valkyrie/persister.rb:39:in `rescue in save'
./lib/wings/valkyrie/persister.rb:26:in `save'
./app/services/hyrax/valkyrie_upload.rb:51:in `upload'
./app/services/hyrax/valkyrie_persist_derivatives.rb:33:in `call'
./spec/services/hyrax/valkyrie_persist_derivatives_spec.rb:18:in `block (4 levels) in <top (required)>'

https://app.circleci.com/pipelines/github/samvera/hyrax/7742/workflows/3d3d6833-fa41-449a-90bc-383be21406b2/jobs/61662

@dunn
Copy link
Contributor Author

dunn commented May 31, 2022

this looks to be where the File ID gets mangled by AF:

From: /usr/local/bundle/gems/active-fedora-13.2.7/lib/active_fedora/ldp_resource_service.rb:30 ActiveFedora::LdpResourceService#to_uri:

    29: def to_uri(klass, id)
 => 30:   klass.id_to_uri(id)
    31: end

[10] pry(#<ActiveFedora::LdpResourceService>)> id
=> "disk:///app/samvera/hyrax-webapp/derivatives/cb/f3/a3/cbf3a38cacd74715bd86816ad7a89636/3-thumbnail.jpeg"
[11] pry(#<ActiveFedora::LdpResourceService>)> klass.id_to_uri(id)
=> "http://fcrepo:8080/rest/test/di/sk/:/disk:///app/samvera/hyrax-webapp/derivatives/cb/f3/a3/cbf3a38cacd74715bd86816ad7a89636/3-thumbnail.jpeg"

Or more specifically by Noid::Rails.treeify

@dunn dunn force-pushed the valkyrie-upload branch 4 times, most recently from e151a22 to 4a02cd6 Compare June 17, 2022 21:51
.dassie/Gemfile Outdated Show resolved Hide resolved
@dunn
Copy link
Contributor Author

dunn commented Jul 5, 2022

one error left

Failure/Error:
  Wings::ActiveFedoraConverter::FileMetadataNode(resource.class)
                              .new(file_identifier: Array(resource.file_identifier)
    .map(&:to_s))

ArgumentError:
  You attempted to set the property `has_model' of  to a scalar value. However, this property is declared as being multivalued.
./vendor/bundle/ruby/2.7.0/gems/active-fedora-13.2.7/lib/active_fedora/attributes/property_builder.rb:20:in `has_model='
./vendor/bundle/ruby/2.7.0/gems/active-fedora-13.2.7/lib/active_fedora/core.rb:111:in `assert_content_model'
./vendor/bundle/ruby/2.7.0/gems/active-fedora-13.2.7/lib/active_fedora/core.rb:38:in `initialize'
./lib/wings/active_fedora_converter/instance_builder.rb:34:in `new'
./lib/wings/active_fedora_converter/instance_builder.rb:34:in `build'
./lib/wings/active_fedora_converter.rb:105:in `instance'
./lib/wings/active_fedora_converter.rb:69:in `convert'
./lib/wings/valkyrie/resource_factory.rb:41:in `from_resource'
./lib/wings/valkyrie/persister.rb:27:in `save'

@dunn
Copy link
Contributor Author

dunn commented Jul 6, 2022

now hang on a second

From: /usr/local/bundle/gems/active-fedora-13.2.7/lib/active_fedora/core.rb:113 ActiveFedora::Core#assert_content_model:

    110: def assert_content_model
    111:   require 'pry'; binding.pry
    112:
 => 113:   self.has_model = self.class.to_rdf_representation
    114: end

[1] pry(#<Wings(Wings(Hyrax::FileMetadata))>)> self.class.to_rdf_representation
=> "Wings(Wings(Hyrax::FileMetadata))"

@dlpierce dlpierce added the notes-valkyrie Release Notes: Valkyrie specific label Aug 12, 2022
@dlpierce
Copy link
Contributor

Rebased on main and got specs passing and files ingesting.

@dlpierce dlpierce added this to Ready for Review in Hyrax-Valkyrization Sep 16, 2022
dunn and others added 23 commits October 3, 2022 16:36
it's not handled correctly, and sometimes passed directly to the underlying
process (e.g., shrine)
don't try to infer the id of the file metadata node. these resources are handled
differently by different adapters, just making one with a file_identifer set
doesn't do the trick.

see, e.g., the usage here:
https://github.com/samvera/hyrax/blob/main/spec/models/hyrax/file_metadata_spec.rb#L27-L43
all but one of the arguments to `ValkyrieUpload` are arguments to a specific
action (an upload), but the `storage_adapter` parameter is more stable and
cumbersome to pass per upload.

in the default case, this doesn't much matter, but when using one of the
alternative adapters (e.g. for derivatives), we can reduce the dependencies for
collaborator objects by allowing them to use a preconfigured instance of the
upload service. e.g:

```ruby
derivatives_uploader.upload(filename: 'test.txt', file_set: my_fs, io: StringIO.new('test'))
```

instead of:

```ruby
derivatives_adapter = Hyrax.config.derivatives_storage_adapter
Hyrax::ValkyrieUpload.file(filename: 'test.txt', file_set: my_fs, io: StringIO.new('test'))
```
Previously the final group of characters was not included in the extracted id.
Attempting to resave a converted resource resulted in an error due to the attribute setter now being defined by ActiveFedora instead of ActiveTriples.
Add mock ids to satisfy services that need ids.
Include mock_thumbnail in file_ids
thumbnail_path should include the fileset id
@dunn
Copy link
Contributor Author

dunn commented Oct 21, 2022

@dlpierce your additions all look good!

@dlpierce dlpierce merged commit a0f8455 into main Oct 24, 2022
@dlpierce dlpierce deleted the valkyrie-upload branch October 24, 2022 19:38
@dunn dunn moved this from Ready for Review to DONE in Hyrax-Valkyrization Jan 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
notes-valkyrie Release Notes: Valkyrie specific
Projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants