HykuAddons: Import Mode and Bulkrax
This article is based on a Draft Pull Request started by Tom Giles. It was transcribed from a conversation in which Tom explained the work he had completed and the issues that had arisen. The branch can be treated as a spike for someone else to take inspiration from and to see how the problem was first attempted.
The principle was initially to reduce the load on the web nodes by allowing customer onboarding imports to be moved to separate worker queues. However, because of the way the Bulkrax gem is structured and its reliance on perform_now in some circumstances, this didn't resolve many issues and led to downtime.
Other issues involve customers needing to toggle the feature flipper on and then off again afterwards, which was frequently the cause of lost jobs and of failures to perform basic admin tasks.
By default six queues are created for each account:
- default
- import
- export
- <tenant>_import_default
- <tenant>_import_import
- <tenant>_import_export
HykuAddons contains an override called HykuAddons::ImportMode, which is injected into ActiveJob::Base inside the Engine. By default this checks whether the Flipflop feature has been enabled and, if so, switches the queue name. The main problem with this is that the user must manually enable and disable the feature; otherwise normal jobs are added to the wrong queue and, once the feature is switched back, they will not be executed.
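To illustrate the failure mode, here is a minimal sketch in plain Ruby. It is not the HykuAddons::ImportMode code: the FeatureFlags class, the queue_for helper and the tenant name "acme" are hypothetical stand-ins, and the real override hooks into ActiveJob::Base and reads a Flipflop feature. The only thing taken from this article is the <tenant>_import_<queue> naming.

```ruby
# Hypothetical stand-in for a Flipflop feature toggle.
class FeatureFlags
  attr_writer :import_mode

  def initialize
    @import_mode = false
  end

  def import_mode?
    @import_mode
  end
end

# Assumed naming scheme, based on the queue list in this article:
# <tenant>_import_<base queue> while import mode is switched on.
def queue_for(base_queue, tenant:, flags:)
  flags.import_mode? ? "#{tenant}_import_#{base_queue}" : base_queue.to_s
end

flags = FeatureFlags.new
puts queue_for(:default, tenant: "acme", flags: flags)

flags.import_mode = true
# Every job enqueued while the flag is on is routed to the import
# queue, whether or not it belongs to an import.
puts queue_for(:default, tenant: "acme", flags: flags)
```

Because the flag is global to the tenant, an unrelated admin job enqueued while it is on ends up on the import queue as well.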
The PR tries a different approach, whereby the number of entries is used to decide whether it is necessary to switch the queue names.
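The idea can be sketched as follows. This is an illustration under assumptions rather than the code in the PR: the threshold value of 100, the queue_for_import helper and the tenant name "acme" are all hypothetical, and the actual threshold is not stated in this article.

```ruby
# Hypothetical threshold; the value used in the PR is not stated here.
IMPORT_ENTRY_THRESHOLD = 100

# Route a job to the tenant import queue only when the import is large.
def queue_for_import(base_queue, tenant:, entry_count:, threshold: IMPORT_ENTRY_THRESHOLD)
  if entry_count > threshold
    "#{tenant}_import_#{base_queue}"
  else
    base_queue.to_s
  end
end

puts queue_for_import(:import, tenant: "acme", entry_count: 5)
puts queue_for_import(:import, tenant: "acme", entry_count: 5_000)
```

With this approach nobody has to remember to switch a flag back, because the decision is made per job from the size of the import.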
The HykuAddons engine.rb has a list of modules which are injected into different jobs. These modules make up four strategies to find what we are referring to as "portable" objects (Bulkrax uses "importexport"):
- BulkraxEntryBehavior - Find Bulkrax::Entry records from the first argument
- PortableActiveFedoraBehavior - Use ActiveFedora to find the record source identifier and use that to find the Bulkrax::Entry instance
- PortableBulkraxImporterBehavior - Defines portable_object as the source/subject of a job, i.e. works, collections, Bulkrax importer before they are discovered.
- PortableGenericBehavior - Objects are passed directly and each has either a source identifier or an ID
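As a rough illustration of how per-job strategy modules can each expose a portable_object, here is a sketch of two of the four strategies. Everything prefixed with Sketch is hypothetical and stands in for the real Bulkrax, ActiveFedora and HykuAddons classes. The lookup hash replaces a database query, and only the portable_object method name comes from this article.

```ruby
# Hypothetical stand-ins; these are not the real Bulkrax or ActiveFedora models.
SketchEntry = Struct.new(:identifier)
SketchWork = Struct.new(:source_identifier)

# Stand-in for a database lookup of entries by source identifier.
SKETCH_ENTRIES = { "work-1" => SketchEntry.new("work-1") }.freeze

# Strategy sketch 1: the entry is the job's first argument.
module SketchEntryBehavior
  def portable_object
    arguments.first
  end
end

# Strategy sketch 2: the first argument is a work; use its source
# identifier to look up the matching entry.
module SketchActiveFedoraBehavior
  def portable_object
    SKETCH_ENTRIES[arguments.first.source_identifier]
  end
end

# Minimal job double exposing `arguments`, as ActiveJob jobs do.
class SketchJob
  attr_reader :arguments

  def initialize(*arguments)
    @arguments = arguments
  end
end

class SketchEntryJob < SketchJob
  include SketchEntryBehavior
end

class SketchWorkJob < SketchJob
  include SketchActiveFedoraBehavior
end

entry = SKETCH_ENTRIES.fetch("work-1")
puts SketchEntryJob.new(entry).portable_object.identifier
puts SketchWorkJob.new(SketchWork.new("work-1")).portable_object.identifier
```

Each job class gets the module that matches the shape of its arguments, so code that picks the queue only needs to call portable_object.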
Inside import_mode_spec.rb, a loop iterates over the strategies and sets up the app according to their requirements. The only expectation is that the queue name is correct for the size of the total results returned.
As far as we currently know, the queue assignment is correct. However, jobs seem to be triggered with perform_now, which bypasses the worker queues completely: the job runs immediately within the web process for that request.
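The difference can be shown with a toy model. This sketch does not use ActiveJob or Sidekiq: ToyJob, QUEUES and the queue name are hypothetical, and the two class methods only imitate the deferred and immediate behaviour described above.

```ruby
# Toy model only: ToyJob and QUEUES are hypothetical and do not use ActiveJob.
QUEUES = Hash.new { |hash, name| hash[name] = [] }

class ToyJob
  # Deferred: the payload waits on the named queue until a worker drains it.
  def self.perform_later(queue, payload)
    QUEUES[queue] << payload
    :enqueued
  end

  # Immediate: runs in the calling process; the queue name is never used.
  def self.perform_now(_queue, payload)
    new.perform(payload)
  end

  def perform(payload)
    "processed #{payload} in pid #{Process.pid}"
  end
end

ToyJob.perform_later("acme_import_import", "row 1")
result = ToyJob.perform_now("acme_import_import", "row 2")

# Only "row 1" is waiting on the queue; "row 2" already ran in this process.
puts QUEUES["acme_import_import"].inspect
puts result
```

However carefully the queue name is chosen, it has no effect on a job that is run immediately.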
To see the jobs running:
- Edit your sidekiq.yml file and add the required queues for your local tenant:
---
# ...
:queues:
- export
- import
- default
- <tenant>_import_import
- <tenant>_import_default
You can then pass the file location to your sidekiq worker execution command in your local environment.
- Inside the Dashboard Importers section, create an importer using the "Ubiquity Repositories CSV" parser, and specify the following file: bulkrax-import.csv
- Watch jobs being created and added in the /sidekiq endpoint interface.
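Before starting the worker, it can be worth confirming that the queue list in the YAML file actually contains the tenant queues. The following is a small sketch of such a check, assuming a tenant called "acme" and an inline copy of the file contents instead of the file on disk. The missing_queues helper is hypothetical.

```ruby
require "yaml"

# Inline copy of a sidekiq.yml for a hypothetical tenant called "acme".
SIDEKIQ_YAML = <<~YAML
  ---
  :queues:
  - export
  - import
  - default
  - acme_import_import
  - acme_import_default
YAML

# Returns the expected queue names that the config does not list.
def missing_queues(yaml_text, expected)
  # The :queues key is parsed as a Ruby Symbol, so Symbol must be permitted.
  config = YAML.safe_load(yaml_text, permitted_classes: [Symbol])
  expected - Array(config[:queues])
end

expected = %w[acme_import_import acme_import_default]
puts missing_queues(SIDEKIQ_YAML, expected).inspect
```

An empty list means every expected queue is present in the file.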
When this is tried with perform_later, you may find that Sidekiq cannot find the CSV file at its temp file location, which causes an error when it tries to access the file.
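The exact cause has not been confirmed. One plausible explanation, assumed here purely for illustration, is that the temp file is cleaned up (or lives on the web container's filesystem) before the worker gets to it. The sketch below reproduces that situation in plain Ruby. The simulate_deferred_csv_read helper and the CSV contents are made up.

```ruby
require "tempfile"

# Returns [readable during the request, readable later by a worker].
def simulate_deferred_csv_read
  # Simulate an uploaded CSV that only lives in a temp location.
  upload = Tempfile.new(["bulkrax-import", ".csv"])
  upload.write("title\nExample\n")
  upload.flush

  # A deferred job can only carry the path, not the open file handle.
  csv_path = upload.path
  readable_during_request = File.exist?(csv_path)

  # The request finishes and the temp file is cleaned up.
  upload.close
  upload.unlink

  # Later, the worker tries to open the path it was given.
  readable_by_worker = File.exist?(csv_path)

  [readable_during_request, readable_by_worker]
end

during_request, by_worker = simulate_deferred_csv_read
puts "readable during request: #{during_request}"
puts "readable by worker: #{by_worker}"
```

With perform_now the file is read while it still exists, which would explain why the immediate path works and the deferred one does not.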
perform_now is called in a number of places, including those listed below. However, as stated above, Sidekiq might not be able to access the CSV:
- CollectionBehavior#call_collection_job
- HykuAddons::CsvParser#perform_method
Some ideas for next steps that could be taken to resolve some of the issues laid out in this article:
- We want to see it behave in the specs as it would in production. This has not been possible yet: the specs pass, but the same behaviour is not seen in the dashboard importer.
- We need to compare the logs to see which processes are picking up which jobs
- The web container seems to run a lot of jobs when an importer runs. What are they?
- When the specs run a Bulkrax import with an entry total over the threshold, has the worker actually performed the import?
- What happens when the Rails server is started in production mode?
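For the log comparison, one option is to tag each job's log output with the process that ran it. This is only a sketch: the job_trace_line helper, the job name and the queue name are hypothetical, and where such a line would be emitted in the application is left open.

```ruby
require "socket"

# Hypothetical helper: builds one log line identifying the process
# (host and pid) that is running a given job.
def job_trace_line(job_name, queue_name)
  format("job=%s queue=%s host=%s pid=%d",
         job_name, queue_name, Socket.gethostname, Process.pid)
end

puts job_trace_line("ExampleJob", "acme_import_import")
```

Lines with the web container's host or pid would then show which jobs ran inside the web process instead of on a worker.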
Because of the current state of developer flux, making changes to a very complicated, highly customised and critically important section of the application might not be prudent at this time. If changes to the way this feature works ended up preventing new customers from being onboarded, that could have real financial consequences for the company. A new developer with little or no experience might not be able to fix any issues, and so no customer imports could be made.
To test this feature in a production-like environment, a staging environment would need to be created that is separate from the current feature-branch-to-main flow. This could mean a staging environment that allows a specific branch or commit to be deployed, without requiring any interaction with the main production-ready branch.