Spike: Estimate for upgrading GBH bulkrax #77

Closed
1 task
jillpe opened this issue Aug 21, 2023 · 5 comments
jillpe commented Aug 21, 2023

Summary

Time box for 1/2 day

Acceptance Criteria

  • Effort to upgrade bulkrax is estimated

ShanaLMoore commented Aug 21, 2023

Must be done in order to determine the work involved in valkyrizing Bulkrax.

@ShanaLMoore ShanaLMoore self-assigned this Aug 21, 2023

ShanaLMoore commented Aug 21, 2023

DEV NOTES

  1. dry-monads conflict: see the WIP Bulkrax branch hyrax-4-support, together with the GBH WIP PR
    • Hyrax 4.0 requires dry-monads ~> 1.5
    • Bulkrax 5.3.0 requires dry-monads ~> 1.4.0
    • Would we have to wait until the app is fully on Rails 6?
    • DEPENDENCIES_NEXT=1 rails db:migrate:status doesn't show all of the expected Bulkrax migrations; the dates stop at 2021. As a result, a dry run of a CSV import against the upgrade fails because Bulkrax is looking for properties that don't exist. Where are all the other migrations? 🤔

[image: db:migrate:status output for the upgraded app]

vs latest migrations

[image: latest Bulkrax migrations]
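
To illustrate the dry-monads conflict from item 1, here is a minimal sketch using RubyGems' own requirement classes. The two constraints are the ones quoted above; the candidate version strings are chosen only for illustration and are not meant as a list of actual dry-monads releases.

```ruby
require "rubygems" # Gem::Requirement and Gem::Version ship with Ruby

# Constraints as quoted in the dev notes.
hyrax_requirement   = Gem::Requirement.new("~> 1.5")   # >= 1.5, < 2.0
bulkrax_requirement = Gem::Requirement.new("~> 1.4.0") # >= 1.4.0, < 1.5

# Illustrative candidate versions (assumption, not a release list).
candidates = %w[1.4.0 1.4.4 1.5.0 1.6.0].map { |v| Gem::Version.new(v) }

# Keep only versions that both gems would accept.
compatible = candidates.select do |version|
  hyrax_requirement.satisfied_by?(version) &&
    bulkrax_requirement.satisfied_by?(version)
end

puts "Versions satisfying both constraints: #{compatible.map(&:to_s).inspect}"
```

Because `~> 1.4.0` caps the version below 1.5 while `~> 1.5` starts at 1.5, no version can satisfy both, which is why Bundler cannot resolve the pair without a change on the Bulkrax side.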
  2. GBH already has its own branch: gbh-patch
  3. Bulkrax overrides:
    • app/jobs/bulkrax/child_relationships_job_decorator.rb
      # OVERRIDE bulkrax v.1.0.0 to add a limit to the job rescheduling
      # while forming relationships to child works that were found
    • app/jobs/bulkrax/delete_work_job.rb
      • includes their custom AMS::AssetDestroyer
    • app/jobs/bulkrax/import_work_job.rb
      # OVERRIDE Bulkrax 1.0.2 to rescue errors (and to add queued indexing if App.rails_5_1?)
      # overriding the xml parser to remove the 'multiple' import_type option,
      # as this app currently does not support it
    • app/factories/bulkrax/object_factory.rb
    • They don't have child/parent relationships set up. Will they need it?
      • If so, how could this work for their PbcoreXML and Manifest parsers?
  4. Bulkrax release notes
    • We will need to go through the release note changes. One that immediately comes to mind: Bulkrax added an ability that users need to consider in their applications.
      [image: screenshot of the release note]
  5. We'd only need to consider changes to their CSV and XML importers. We don't need to worry about export functionality since the client isn't actively using it.
  6. Methods like #collection_field and #children_field no longer exist.
  7. collection_field_mapping is no longer supported as of Hyrax 3. I see several references to it in GBH.
  8. GBH has custom header processing. All properties are prefixed with a model name, in the form [model].[property name], e.g. Asset.id => id
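
As a rough sketch of the header convention described in the last item (model name, period, property name), the helper below strips the model prefix. `strip_model_prefix` is a hypothetical name and not GBH's actual implementation; only `Asset.id => id` comes from the notes, the other headers are made up.

```ruby
# Hypothetical helper: drop a leading "Model." prefix from a CSV header.
# A header without a prefix is returned unchanged.
def strip_model_prefix(header)
  header.to_s.sub(/\A[A-Z]\w*\./, "")
end

# "Asset.id" is from the notes; the other two headers are illustrative.
headers    = ["Asset.id", "Asset.title", "id"]
normalized = headers.map { |header| strip_model_prefix(header) }

puts normalized.inspect
```

Any upgrade would need to confirm that whatever Bulkrax 5.x does with headers still runs after (or can be taught about) this kind of normalization.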

My rough estimate is a 5 or 8 as there will be discovery work to see what breaks along the way, which will then require planning + implementation hrs to fix it. There may be some hurdles too.

5: "5 --> This task has several unknowns or will require code changes in multiple places in the app. You have an idea of direction but will need to do some discovery in order to fully plan and execute the solution."
8: "8 --> The requirements are well defined but will require a complex solution. Much of the work on this will be spent investigating and there are many unknowns, but the task still feels accomplishable."

ShanaLMoore added a commit to samvera/bulkrax that referenced this issue Aug 21, 2023
Hyrax 4.0.0 requires a dependency upgrade for
dry-monads. I could not upgrade GBH's bulkrax without
doing this change.

- Issue:
- scientist-softserv/ams#77
- Ref:
- https://github.com/samvera/hyrax/blob/cbe9278b919485f90a37630d3f3157ecef59cd7c/hyrax.gemspec#L48
@ShanaLMoore ShanaLMoore added the "needs discussion" label (has open questions or need for discussion) Aug 21, 2023
ShanaLMoore added a commit to WGBH-MLA/ams that referenced this issue Aug 21, 2023
upgrade bulkrax from v1.0.2 => v5.3.0

Issue:
- scientist-softserv#77
@ShanaLMoore

image

@ShanaLMoore

Q: Do we want to backport Kiah's branch into gbh without upgrading?

@ShanaLMoore

Consider looking at the Bulkrax migration install step: there is a rake task that copies all of the engine's migrations into the app.
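
A small sketch of how the missing migrations could be spotted before running that task. It assumes the usual Rails engine behaviour, where copied migrations get a new timestamp and a `.<engine>.rb` suffix; all file names below are hypothetical and `migration_name` is an illustrative helper, not part of Bulkrax.

```ruby
# Reduce a migration file name to its descriptive part, ignoring the
# timestamp prefix and the ".bulkrax" suffix Rails adds when copying
# engine migrations into an app.
def migration_name(file)
  File.basename(file).sub(/\A\d+_/, "").sub(/(\.bulkrax)?\.rb\z/, "")
end

# Hypothetical listings of the engine's and the app's db/migrate folders.
engine_files = %w[
  20190101000000_create_example_importers.rb
  20210101000000_add_example_status.rb
  20230101000000_add_example_index.rb
]
app_files = %w[
  20210801000000_create_example_importers.bulkrax.rb
  20210801000001_add_example_status.bulkrax.rb
]

missing = engine_files.map { |f| migration_name(f) } -
          app_files.map { |f| migration_name(f) }

puts "Not yet copied into the app: #{missing.inspect}"
```

In a real check the two arrays would come from `Dir.glob` over the gem's and the app's `db/migrate` directories.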

@ShanaLMoore ShanaLMoore removed the "needs discussion" label (has open questions or need for discussion) Aug 24, 2023
ShanaLMoore added a commit to samvera/bulkrax that referenced this issue Apr 2, 2024
* create an object factory that supports Valkyrie

All code in this commit has been adapted from Surfliner:
https://github.com/surfliner/surfliner-mirror

* temp gem conflict workaround

* ⚙️ upgrade dry-monads dependency to ~> 1.5.0

Hyrax 4.0.0 requires a dependency upgrade for
dry-monads. I could not upgrade GBH's bulkrax without
doing this change.

- Issue:
- scientist-softserv/ams#77
- Ref:
- https://github.com/samvera/hyrax/blob/cbe9278b919485f90a37630d3f3157ecef59cd7c/hyrax.gemspec#L48

* 🧹 Add extra parameter for fill_in_blank_source_identifiers

gbh got an error that we were passing too many arguments
when setting the source_identifier in the bulkrax config.
ref:
- https://github.com/samvera-labs/bulkrax/wiki/Configuring-Bulkrax#source-identifier

* Revert ":broom: Add extra parameter for fill_in_blank_source_identifiers"

This reverts commit df96de6.

* 🧹 delegate create_parent_child_relationships from importer to parser

* allow ruby 3 syntax in migrations

* 🧹 change exists? to exist? to support Ruby 3.2

* 🚧 add support for Hyrax 5, valkyrie and ruby 3.2

* add temp workaround for blank title and creator

* ⚙️ Switch find methods with custom queries for Valkyrie

* hyrax 4 permission service does both valk and non-valk

* new bagit

* handle validation failure

* better failure detection for valkyrie object

* fix validation message

* importer failure helpers

* improve multiple detection in matchers

* fix matcher on missing field

* rob cant remember that its include?

* Appeasing rubocop

* ♻️ Handle exist? and/or exists? for finding objects

See inline comments
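
For context, Ruby 3.2 removed the long-deprecated `File.exists?` and `Dir.exists?`, while some finders still only offer `exists?`. A minimal sketch of handling both spellings follows; `record_exists?` and the two finder classes are hypothetical stand-ins, not Bulkrax code.

```ruby
# Hypothetical helper: call whichever existence predicate the finder offers,
# preferring the newer `exist?` spelling.
def record_exists?(finder, id)
  if finder.respond_to?(:exist?)
    finder.exist?(id)
  elsif finder.respond_to?(:exists?)
    finder.exists?(id)
  else
    raise ArgumentError, "#{finder} responds to neither exist? nor exists?"
  end
end

# Stand-in finder that only knows the newer method name.
class NewStyleFinder
  def self.exist?(id)
    id == "abc"
  end
end

# Stand-in finder that only knows the older method name.
class OldStyleFinder
  def self.exists?(id)
    id == "abc"
  end
end

puts record_exists?(NewStyleFinder, "abc")
puts record_exists?(OldStyleFinder, "abc")
```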

* Add dry/monads require for specs

* I897 Bulkrax readiness for Hyku 6 and Hyrax 4 & 5 (#898)

* 🧹 relocates transactions from initializer file

Issue:
- #897

Co-Authored-By: LaRita Robinson <laritakr@users.noreply.github.com>

* 🧹 Add specs for container.rb, relocate files

Co-Authored-By: LaRita Robinson <laritakr@users.noreply.github.com>

* 🧹 normalize magic strings into constants for referencing later

Convert the create_with_bulk_behavior and update_with_bulk_behavior to a constant; that way we can reference it in IiifPrint and document the “magic” string.

Co-Authored-By: LaRita Robinson <laritakr@users.noreply.github.com>

* 🧹 correct camel case to constant notation for easier referencing

Co-Authored-By: LaRita Robinson <laritakr@users.noreply.github.com>

* 💄 rubocop fixes

Co-Authored-By: LaRita Robinson <laritakr@users.noreply.github.com>

* Update app/factories/bulkrax/valkyrie_object_factory.rb

* Update spec/bulkrax/transactions/container_spec.rb

* 🧹 Move container & steps

Match Hyrax convention by using bulkrax/transactions.

* restructure org to run specs locally

receiving error when trying to run the entire spec suite due to restructuring files but not moving the spec file.

* 🚧 WIP: Consolidate HasMatchers with HasMappingExt

Remove HasMappingExt and consolidate logic within HasMatchers. HasMatchers should handle both cases, when objects are ActiveFedora vs Valkyrie.

* 🧹 Fix Specs & add Valkyrie Specs

* 🧹 Fix Rubocop complaint

* 🧹 Address Valkyrie's determination of multiple?

* 🧹 Address permitted attributes

In Valkyrie, we use the schema to identify the permitted attributes. All
allowed attributes should be on the schema, so no additional attributes
should be required.

Also add a fallback for permitted attributes in case an ActiveFedora
model class goes through the ValkyrieObjectFactory. This supports the
case where we want to always force a Valkyrie resource to be created,
regardless of the model name given.

* 🧹 Update TODO comment

Adjust TODO message because referring to a handler that doesn't exist
anywhere is confusing. We may need to register steps for file sets once
the behavior is implemented.

---------

Co-authored-by: LaRita Robinson <laritakr@users.noreply.github.com>
Co-authored-by: Jeremy Friesen <jeremy.n.friesen@gmail.com>
Co-authored-by: LaRita Robinson <larita@scientist.com>

* 📚 Adding documentation for configuration (#896)

This builds on a [question asked in Slack][1]

[1]: https://samvera.slack.com/archives/C03S9FS60KW/p1705681632335919

* ♻️ Extract Bulkrax::FactoryClassFinder (#900)

This refactor introduces consolidating logic for determining an entry's
factory_class.

The goal is to begin to allow for us to have a CSV record that says
"model = Work" and to use a "WorkResource".

Note, there are downstream implementations that overwrite
`factory_class` and we'll need to consider how we approach that.
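
One way to picture the "model = Work" to "WorkResource" idea is the sketch below. It is an assumption-laden illustration, not the actual `Bulkrax::FactoryClassFinder`: `find_factory_class` is a hypothetical name, and the three classes are empty stand-ins.

```ruby
# Stand-ins: an ActiveFedora-style model, its Valkyrie-style counterpart,
# and a model that has no "Resource" counterpart.
class Work; end
class WorkResource; end
class Image; end

# Hypothetical finder: prefer "<Name>Resource" when that constant exists,
# otherwise fall back to "<Name>"; return nil when neither is defined.
# Expects a valid constant name such as "Work".
def find_factory_class(model_name)
  ["#{model_name}Resource", model_name].each do |candidate|
    return Object.const_get(candidate) if Object.const_defined?(candidate)
  end
  nil
end

puts find_factory_class("Work")  # resolves to the Resource variant
puts find_factory_class("Image") # falls back to the plain model
```

Downstream apps that overwrite `factory_class` would bypass a lookup like this, which is the concern the commit message raises.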

* 🐛 [i134] - Fix missing translations

Missing translations were evaluating to false.

Issue:
- scientist-softserv/hykuup_knapsack#134

* Renaming method for parity

* ♻️ Favor Bulkrax's persistence layer

Instead of direct calls to a deprecated service favor a persistence
layer call; one that defines an interface.

Note this means we need to implement the methods in the Valkyrie
adapter; but those should be trivial.

* ♻️ Favor Bulkrax.persistence_adapter over ActiveFedora::Base

* Moving methods to adapter pattern

* use find_by_source_identifier instead of find_by_bulkrax_identifier (#907)

* i903 - move bulkrax identifier custom queries into bulkrax

move bulkrax identifier custom queries into bulkrax

Issue:
- scientist-softserv/hykuup_knapsack#136

* make find_by_source_identifier dynamic

Import a csv with child works. The forming of relationships is not working. Part of the problem is the find_by_bulkrax_identifier call.

From GBH, this used to be find_by_bulkrax_identifier which not all clients will configure as their source identifier. Instead we need to ask for the source identifier and use that for the sql query. This commit goes along with a PR from Hyku which currently has the find_by_source_identifier.rb files defined.

Issue:
- scientist-softserv/hykuup_knapsack#128

Co-Authored-By: Kirk Wang <k3wang@gmail.com>

* remove files: they live in Hyku for now

Co-Authored-By: Kirk Wang <k3wang@gmail.com>

* 🧹 Place custom queries back in Bulkrax

* 🧹 remove misleading comment

Co-Authored-By: Kirk Wang <k3wang@gmail.com>

* 🧹 Entry is a required argument when initializing ObjectFactory

Fix for broken specs

Co-Authored-By: Kirk Wang <k3wang@gmail.com>

* revert changes to pass Entry arg

The object factory already has work_identifier: parser.work_identifier. we don't need the entry argument after all.

ref:
- https://github.com/samvera/bulkrax/blob/main/app/models/concerns/bulkrax/import_behavior.rb#L181

Co-Authored-By: Kirk Wang <k3wang@gmail.com>

---------

Co-authored-by: Kirk Wang <k3wang@gmail.com>
Co-authored-by: Kirk Wang <kirk.wang@scientist.com>

* 🧹 Make CreateRelationshipJob work for Valkyrie (#908)

* 🧹 Make the relationships job work for Valkyrie

This will add a relationships path for Valkyrie objects.  It also will
add a transactions call so set child flag will fire off in IIIF Print.

ref:
  - scientist-softserv/hykuup_knapsack#141

* 💄 rubocop fix

Co-Authored-By: Kirk Wang <k3wang@gmail.com>

* ♻️ Adjust rescue logic to move closer to error

This also adds some consideration for refactoring the queries to instead
use the persistence layer.

* Adding notes about transactions

---------

Co-authored-by: Shana Moore <shana@scientist.com>
Co-authored-by: Jeremy Friesen <jeremy.n.friesen@gmail.com>

* Add todo comment

Co-Authored-By: Kirk Wang <k3wang@gmail.com>

* 🎁 Switch transaction to listener

This commit will switch the membership transaction to a listener.

* ♻️ Migrate persistence layer methods to object factory (#911)

* ♻️ Migrate persistence layer methods to object factory

In review of the code and in brief discussion with @orangewolf, the
methods of the persistence layer could be added to the object factory.

We already were configuring the corresponding object factory for each
implementation of Bulkrax; so leveraging that configuration made
tremendous sense.

The methods on the persistence layer remain helpful (perhaps necessary)
for documented reasons in the `Bulkrax::ObjectFactoryInterface` module.

See:

- #895 and its discussion

* 🎁 Add Valkyrie object factory interface methods

* 🧹 Favor interface based exception

Given that we are not directly exposing ActiveFedora nor Hyrax nor
Valkyrie objects, we want to translate/transform exceptions into a
common exception based on an interface.

That way downstream implementers can catch the Bulkrax specific error
and not need to do things such as `if
defined?(ActiveFedora::RecordInvalid) rescue
ActiveFedora::RecordInvalid`

It's just funny looking.
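
A minimal sketch of the translation idea, using stand-in names throughout: `ExampleImporter::RecordInvalid` plays the role of the common error and `BackendSpecificError` stands in for something like `ActiveFedora::RecordInvalid`. None of these names are taken from Bulkrax.

```ruby
module ExampleImporter
  # Hypothetical common error callers can rescue regardless of backend.
  class RecordInvalid < StandardError; end

  # Stand-in for a backend-specific exception.
  class BackendSpecificError < StandardError; end

  # Save a record, translating the backend error into the common one.
  def self.save!(record)
    raise BackendSpecificError, "title can't be blank" if record[:title].to_s.empty?

    record
  rescue BackendSpecificError => e
    # Ruby keeps the original exception available as `cause`.
    raise RecordInvalid, e.message
  end
end

begin
  ExampleImporter.save!({ title: "" })
rescue ExampleImporter::RecordInvalid => e
  puts "Rescued #{e.class}: #{e.message} (cause: #{e.cause.class})"
end
```

Callers then need only one `rescue` clause and no `defined?` checks for backend constants.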

* 🧹 Get exporters to work

This commit contains various changes to get the
exporters to work correctly.

* make updates work

* 🧹 Make DeleteJob work with new class method .find (#912)

* 🧹 Make DeleteJob work with new class method .find

The DeleteJob previously was not working with the
old factory#find method because when it is doing a
delete action, the parsed_metadata does not get
generated like during a regular import.  Because
of this, the #search_by_identifier method fails to
find anything because we don't have a
`work_identifier` field which would have come from
the parsed_metadata.  So instead, we are using the
new class method .find which will take an id
(which we find on the raw_metadata) to find the
object.  We make sure to reindex and publish the
action to any relevant listeners.

* 🎁 Implement a #delete method for the ObjectFactory

This commit will add a delete method to the ObjectFactory and the
ValkyrieObjectFactory so we can avoid unnecessary conditionals.

* 🧹 Rework factories to implement delete method

This cuts down on the method chaining.

* ♻️ Remove constant

This creates hard to parse chatter, and is not needed as we were relying
on it for IIIF Print to be able to reference.

* ♻️ Reworking structure

The Hyrax transactions create a lot of pre-amble and post-amble for
performing the save.  This commit attempts to consolidate logic to
reduce redundancy of that boilerplate.

Further, it adds handling for creating collections.

We still need to handle form validation.

* Adding index to schema

* ♻️ Favor asking about model_name over class (#934)

Given our effort at lazy migration in Bulkrax we want to do a bit more
sniffing regarding the objects.  This is not quite adequate for the
general case of Collections but it is an improvement.

Ideally we should be interrogating the class and asking
`klass.collection?` but there are some confounding edge cases around
routing that we are in this pickle.

```ruby
irb(main):002:0> CollectionResource.model_name
=>
 @collection="collections",
 @element="collection",
 @human="Collection",
 @i18n_key=:collection,
 @klass=CollectionResource,
 @name="CollectionResource",
 @param_key="collection",
 @plural="collections",
 @route_key="collections",
 @singular="collection",
 @singular_route_key="collection">
irb(main):003:0> Collection.model_name
=>
 @collection="collections",
 @element="collection",
 @human="Collection",
 @i18n_key=:collection,
 @klass=Collection,
 @name="Collection",
 @param_key="collection",
 @plural="collections",
 @route_key="collections",
 @singular="collection",
 @singular_route_key="collection">
irb(main):004:0>
```

* Favor object factory for find

* ♻️ Fix return value of transaction create

* Correct Hyrax.object_factory -> Bulkrax.object_factory

* Download cloud files later (#930)

* 🎁 Reschedule ImporterJob if downloads aren't done

This commit will add a check in the `ImporterJob` to see if the cloud
files finished downloading.  If they haven't, the job will be
rescheduled until they are.

* 🎁 Download Cloud Files later

This commit will bring in changes from `5.3.1-british_library` to move
the download of cloud files to a background job.

---------

Co-authored-by: Jeremy Friesen <jeremy.n.friesen@gmail.com>

* ♻️ Favor configuration over hard-coding and reaching assumptions

The main "flip" of logic is that we can remove the `curation_concern?`
method because we can instead ask "if Collection || FileSet" and infer
when that is false that we have a work.

This means removing the very reaching assumption of Hyku and its
implementation foibles for work types.

* ♻️ Extract Bulkrax.collection_class_method

Instead of relying on the hard-coding, allow for configuration.

Co-authored-by: Shana Moore <shana.lavina.moore@gmail.com>

* ♻️ Favor Bulkrax.collection_model_class

Co-authored-by: Shana Moore <shana@scientist.com>

* ♻️ Favor Bulkrax.object_factory.find

Instead of relying on the direct call to a constant.

Co-authored-by: Shana Moore <shana@scientist.com>

* ♻️ Extract Bulkrax.object_factory.save! method for

We have a place where we try to call save! directly.  We do need to pass
a user for save event; hence the added method.

* ♻️ Favor using object_factory for save!

Co-authored-by: Shana Moore <shana@scientist.com>

* ♻️ Extract Hyrax.object_factory.search_by_property

There is a duplication of this logic elsewhere, but I first wanted to
extract common logic then begin extracting full replacement and
conforming object interface for Valkyrie.

* ♻️ Extract method for Valkyrization

We cannot directly query the class.  But must instead favor the
object_factory.

* 🎁 Adding query for find_by_model_and_property_value

* ♻️ Remove custom Valkyrie search_by_identifer

The super method was refined to use the class object factory; making it
redundant and flexible in the same manner as
`Bulkrax::ObjectFactory#search_by_identifer`.

* ♻️ Favor internal_resource definitions (when available)

* ♻️ Extract internal_resources method for curation concerns

* ♻️ Favor Bulkrax.object_factory and add fault tolerance

* Addressing TODO and minor refactoring

* I161 import collection resources (#933)

* 🚧 WIP: Import Collection Resource

A user should be able to import a collection resource. In this commit, we are able to successfully import and create collection resources. From the console we can see that the collection formed relationships with works, but the frontend's count and display show 0 relationships. Additionally, we are unable to re-run the importer without receiving errors on the collection entry.

TODO: specs, refactor,

Issue:
- scientist-softserv/hykuup_knapsack#161

* remove unused code

* refactor #conditionally_destroy_existing_files

This refactor was necessary because even though klass == ImageResource, which inherits from Valkyrie::Resource through its chain, klass === Valkyrie::Resource was returning false.
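
The behaviour described here follows from how `Module#===` works: it asks whether the right-hand operand is an instance of the receiver, not whether one class inherits from another. A small sketch with stand-in classes (`BaseResource` takes the place of `Valkyrie::Resource`):

```ruby
# Stand-ins for Valkyrie::Resource and a work type inheriting from it.
class BaseResource; end
class ImageResource < BaseResource; end

klass = ImageResource

# Module#=== checks "is the argument an instance of the receiver?".
class_vs_class    = (klass === BaseResource)     # a Class object is not an ImageResource
reversed          = (BaseResource === klass)     # nor is it a BaseResource instance
instance_check    = (BaseResource === klass.new) # an instance does match
inheritance_check = (klass <= BaseResource)      # Module#<= tests ancestry

puts [class_vs_class, reversed, instance_check, inheritance_check].inspect
```

For class-level checks, `klass <= SomeBase` or `klass.ancestors.include?(SomeBase)` expresses the intent; `===` fits when an instance is at hand.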

* exclude CollectionResource class from #destroy_existing_files

* WIP - try to import filesets with valkyrie resources

* Revert "WIP - try to import filesets with valkyrie resources"

This reverts commit 4ab31b6.

* 💄 rubocop fix

* i162 - import valkyrie works with filesets (#936)

* Revert "WIP - try to import filesets with valkyrie resources"

This reverts commit 4ab31b6.

* WIP

* WIP - try to import filesets with valkyrie resources

* 🚧 WIP: get filesets to import via bulkrax x valkyrie

* 🎉 WIP: filesets to imports via bulkrax x valkyrie

There's still a lot to clean up here, but the import is successful in this commit.

* 💄 rubocop fixes

* uncomment #get_s3_files call and add collections to configuration

* Update object_factory.rb

* ♻️ Move method and remove single instance definition

I'm unclear why we were defining methods on the conf instance;
especially given that these exist on the configuration.

With this refactor, we're favoring using the Configuration object as the
container.

* Revert changes due to refactor coming in from main

* address errors post big refactor

* Refactoring for consistent method signatures

Also avoiding setting an unused instance variable

* 🐛 remove passing user to work_resource add_file_sets and save merge to strategies

Importing a CSV of valkyrie works, collections, files and relationships is working at this point 🎉

* 🎁 Adding a new transaction step to handle different association

* ♻️ Extract update_index method to object factory

* ♻️ Extract object factory method

* ♻️ Extract add_resource_to_collection method

* ♻️ XIT out the mockery and stubbery of a spec

* ♻️ Extract method publish and add_child_to_parent_work

* ♻️ Rename method as it's not conditional

Yes, it is conditional but it operates on arrays that could be empty.

* Remove add to collection step

* 🐛 Fix publish parameter mismatch

* Removing custom transaction container.  We weren't using it

* Favor keyword args instead of hashes

* 💄fixing typo

* 🎁 Add update_collection to valkyrie object factory

* 💄 endless and ever appeasing of the coppers

---------

Co-authored-by: Jeremy Friesen <jeremy.n.friesen@gmail.com>

---------

Co-authored-by: Jeremy Friesen <jeremy.n.friesen@gmail.com>

* ♻️ Extract logic for add_user_to_collection_permissions

* 📚 Tidying documentation

* ♻️ Refactor Object Factories to leverage more inheritance

* ♻️ Extract abstract class for ObjectFactory

In constructing object inheritance, a more robust strategy is to create
an abstract class and then have classes directly extend that abstract
class.

This helps define and narrow an interface.
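
A bare-bones sketch of the abstract-class pattern described above. `AbstractFactory` and `InMemoryFactory` are illustrative names and the single `.find` method is an assumed interface; the actual Bulkrax interface is larger and is not reproduced here.

```ruby
# Abstract parent: declares the interface and fails loudly if a subclass
# forgets to implement part of it.
class AbstractFactory
  def self.find(_id)
    raise NotImplementedError, "#{name} must implement .find"
  end
end

# Concrete factory extending the abstract class directly.
class InMemoryFactory < AbstractFactory
  STORE = { "1" => "first record" }.freeze

  def self.find(id)
    STORE[id]
  end
end

puts InMemoryFactory.find("1")

begin
  AbstractFactory.find("1")
rescue NotImplementedError => e
  puts e.message
end
```

Note that `NotImplementedError` descends from `ScriptError`, so a bare `rescue` will not catch it; it has to be named explicitly.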

* ♻️ Move method to interface

This is used in both ObjectFactory and ValkyrieObjectFactory

* ♻️ Organizing code for Valkyrie Object Factory

* Refactoring method names for sorting order

* ♻️ Handle Valkyrie::Resource situation

* ♻️ Puzzling through implementation details

* ♻️ Extract method to enable removal of conditionals

* ♻️ Extract FileFactory::InnerWorkings

The goal of this extraction is to minimize the exposed interface of what
is quite complicated and state dependent logic.

* ♻️ Refactor to extract local variable

* Adding class attribute for Bulkrax::FileFactory

* ♻️ Adding inner methods for file factory interaction

* 🐛🏳️ post Big refactor fixes

Refactoring caused some bugs. At this point we are able to successfully import CSV works again.

* fix typo

* 🧹 Add case for `'collectionresource'`

In Valkyrie Hyku we're using CollectionResource and this was not being
recognized by the CSV parser.
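
A sketch of the kind of case-insensitive check this implies. `collection_model?` is a hypothetical helper, not the CSV parser's actual method; it relies on `String#casecmp?`, available since Ruby 2.4.

```ruby
# Model names that should be treated as collections (compared without
# regard to case).
COLLECTION_MODEL_NAMES = %w[collection collectionresource].freeze

# Hypothetical helper: does this CSV "model" value name a collection?
def collection_model?(value)
  candidate = value.to_s.strip
  COLLECTION_MODEL_NAMES.any? { |name| name.casecmp?(candidate) }
end

puts collection_model?("CollectionResource")
puts collection_model?("Work")
```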

* reload the object before calling persisted? on it

resolves failure saying that errors is undefined. object.persisted? returned false even though we could see that they got created in the UI.

* 💄 rubocop fix

* 🐛 Add return in ObjectFactory if valkyrie

Adding this early return here so we don't go down to the #where and
trigger a NoMethodError. What it seems to be doing is checking
Postgres for the object and, if it doesn't find it, trying Fedora;
however, Valkyrie objects don't respond to #where, so it throws an error.

* save parent object to establish relationships

This fixes the reason why works weren't forming relationships with other works

* Add FileSet branch to coercer conditional

This is in prep to handle Hyrax::FileSets being imported as rows.

* Add commit to clarify casecmp in CsvParser

* 🎁 Add ability to use tar.gz files

This commit will allow users to use tar.gz files as well as zip files
for importing.

* 🐛 Changing guard to #respond_to?(:where)

A spec was failing with the previous way we were checking.
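
The guard can be pictured with two stand-in classes, one that offers a query-style `.where` and one that does not. `search` and both classes are hypothetical and only illustrate the `respond_to?(:where)` check.

```ruby
# Stand-in for a model class with a query-style .where.
class QueryableModel
  RECORDS = [{ id: "1", title: "Example" }].freeze

  def self.where(conditions)
    RECORDS.select { |record| conditions.all? { |key, value| record[key] == value } }
  end
end

# Stand-in for a Valkyrie-style resource class with no .where.
class PlainResource; end

# Hypothetical search: return early instead of triggering NoMethodError.
def search(klass, conditions)
  return nil unless klass.respond_to?(:where)

  klass.where(conditions).first
end

puts search(QueryableModel, { id: "1" }).inspect
puts search(PlainResource, { id: "1" }).inspect
```

Asking what the class can do, rather than what kind of class it is, keeps the check valid for other object factories as well.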

* 🎁 Change glyphicon to font awesome

Hyrax 4+ applications use font awesome and not glyphicon. This commit
will convert all glyphicon to font awesome.

* Add require ruby-progressbar (#942)

Update bulkrax_tasks.rake

Fixes #941

* 🐛 Ensure we include visibility and other keywords for collection

Related to:

- scientist-softserv/hykuup_knapsack#182

Co-authored-by: LaRita Robinson <laritakr@users.noreply.github.com>

* 🐛 Fix visibility check on the object

This commit will add a guard for visibility because it is not on a
valkyrie resource.

* 🐛 Save provided visibility from CSV

CSV provided visibility was being clobbered in the ImportCollectionJob.

Refs scientist-softserv/hykuup_knapsack#182

* ♻️ Extract methods for better composition

* ♻️ Extracting object factory methods

I want to avoid having conditionals regarding object factories.  This
violates the polymorphism and means that other implementors that choose
a different `Bulkrax.object_factory` will have unintended consequences.

* 💄 endless and ever appeasing of the coppers

* ♻️ Favor object factory over hard-coded

* Amend the see/refer documentation for parser

* 💄 endless and ever appeasing of the coppers

* Updating test schema

* Remove transactions from initialization

* ♻️ Remove explicit calls to AdminSet

* 📚 Adding TODO items

---------

Co-authored-by: Benjamin Kiah Stroud <32469930+bkiahstroud@users.noreply.github.com>
Co-authored-by: Rob Kaufman <rob@notch8.com>
Co-authored-by: Kirk Wang <kirk.wang@scientist.com>
Co-authored-by: Jeremy Friesen <jeremy.n.friesen@gmail.com>
Co-authored-by: LaRita Robinson <laritakr@users.noreply.github.com>
Co-authored-by: LaRita Robinson <larita@scientist.com>
Co-authored-by: Kirk Wang <k3wang@gmail.com>
Co-authored-by: Dan Kerchner <kerchner@users.noreply.github.com>