I11 job split pdfs into child works #83

Merged
jeremyf merged 6 commits into main from i11-job-split-pdfs-into-child-works
Feb 6, 2023

Conversation

laritakr
Contributor

Ref #11

  • Create jobs to split pdfs into child works, using Hyrax's BatchCreateJob and CreateChildWork
  • Create job to add child works to parent work
  • Create services to split PDFs into TIFFs
  • Migration to create IiifPrintPendingRelationships to track child works and add to parent
  • Allow functionality to be added to work types by including configuration in the model: `include IiifPrint.model_configuration(pdf_split_child_model: GenericWork)`
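
For readers unfamiliar with the `include SomeModule.some_method(...)` idiom used in the last bullet, here is a minimal sketch of how a parameterized include can work in plain Ruby. `SketchPrint`, `ExampleWork`, `ExamplePage`, and `sketch_print_config` are hypothetical names invented for illustration; this is not IiifPrint's actual implementation.

```ruby
# Hypothetical stand-in for a gem-level configuration hook (not IiifPrint code).
module SketchPrint
  # Builds a new anonymous module; including it stores the options on the host class.
  def self.model_configuration(**options)
    Module.new do
      # Ruby calls `included` with the class that performs the include.
      define_singleton_method(:included) do |base|
        base.define_singleton_method(:sketch_print_config) { options }
      end
    end
  end
end

# Hypothetical child model that would hold one split page.
class ExamplePage; end

# A work type that opts in to PDF splitting.
class ExampleWork
  include SketchPrint.model_configuration(pdf_split_child_model: ExamplePage)
end

# A work type that does not opt in and is left untouched.
class PlainWork; end

puts ExampleWork.sketch_print_config.inspect
puts PlainWork.respond_to?(:sketch_print_config)
```

Under this pattern only the classes that include the configured module gain the extra behavior, which matches the stated goal of enabling splitting per work type.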

Specs are still pending completion. Skeleton spec files have been added with TODO notes.

Ref issue #11

* Create jobs to split pdfs into child works, using Hyrax's BatchCreateJob and CreateChildWork
* Create job to add child works to parent work
* Create services to split PDFs into TIFFs
* Migration to create IiifPrintPendingRelationships to track child works and add to parent
* Allow functionality to be added to work models by including configuration
  `include IiifPrint.model_configuration(pdf_split_child_model: GenericWork)`

Specs are still pending completion at this point.
Create TODO specs or passing specs for all files related to jobs issue.
@laritakr laritakr mentioned this pull request Jan 25, 2023
4 tasks
@laritakr laritakr marked this pull request as ready for review January 26, 2023 19:25
@laritakr
Contributor Author

laritakr commented Jan 26, 2023

To test via Hyku, create a Generic Work with a PDF file, either through the UI or through Bulkrax. (If using Bulkrax, the file name must be in the same CSV row as the work itself.)

Other work types are not configured to split PDFs, and should behave normally.

You should see:

  • parent work is created with the pdf in a fileset with the derivative thumbnail.
  • each page of the PDF is split as a TIFF into its own Image child work. The title will match the title of the parent, with the PDF sequence and page numbers added.
  • when viewing the parent work show page, the PDF fileset will appear first, and all child works will follow in page number sequence.
  • if multiple PDFs are attached to the parent, sequence should still work appropriately as long as all PDFs are added at the same time. (see note re edit below)
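
The titling and ordering described in the list above can be sketched roughly as follows. The exact title format is not stated in this PR, so the `"<title> - PDF <n>, page <m>"` pattern and the `child_page_titles` helper are assumptions made purely for illustration.

```ruby
# Hypothetical helper: builds one child title per page, in display order.
# page_counts holds the number of pages in each attached PDF, in upload order.
def child_page_titles(parent_title, page_counts)
  page_counts.each_with_index.flat_map do |page_count, pdf_index|
    (1..page_count).map do |page|
      # The PDF sequence and page number are appended to the parent title.
      format('%s - PDF %d, page %d', parent_title, pdf_index + 1, page)
    end
  end
end

# Two PDFs attached at the same time: the first has 2 pages, the second has 1.
titles = child_page_titles('Annual Report', [2, 1])
puts titles
```

Because the PDF sequence comes from the position in a single upload batch, a sketch like this also shows why adding another PDF in a later edit would restart the sequence unless the existing count were taken into account.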

Note:

  • editing and adding a second PDF to a work will currently result in duplicate file names because PDF sequencing when adding PDFs is not yet implemented.
  • relationships are created last. Until the relationships are completed, the new works will not appear on the parent work.

Screenshot 2023-01-27 at 1 33 05 PM

Screenshot 2023-01-27 at 1 36 27 PM

@@ -28,16 +30,20 @@ def hold_upload_paths(env)
   return if upload_ids.empty?
   uploads = Hyrax::UploadedFile.find(upload_ids)
   paths = uploads.map(&method(:upload_path))
-  @pdf_paths = paths.select { |path| path.end_with?('.pdf') }
+  @pdf_paths = paths.select { |path| path.end_with?('.pdf', '.PDF') }
Contributor

just in case there's something weird like a_file.Pdf consider using case insensitive regex

@pdf_paths = paths.select { |path| path.match? /\.pdf$/i }

Contributor Author

Possibly? I was thinking a TODO may be necessary to find a way to identify PDFs differently, which is why I left it this way for now. Remote URLs can contain PDF files but don't always end in .pdf (e.g. Google Docs links).
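
As one possible direction for that TODO, here is a minimal sketch that combines a case-insensitive extension check with a content check. It relies on the fact that PDF files begin with the bytes `%PDF-`. The helper names `pdf_extension?` and `pdf_signature?` are hypothetical, and the content check assumes the file has already been downloaded to a local path.

```ruby
require 'tempfile'

# Leading bytes of a PDF file, as a binary string.
PDF_MAGIC = '%PDF-'.b

# Case-insensitive extension test; also accepts names like a_file.Pdf.
def pdf_extension?(path)
  File.extname(path).casecmp?('.pdf')
end

# Content test: reads the first bytes of a local file and compares them
# with the PDF signature. Returns false for empty files.
def pdf_signature?(path)
  File.open(path, 'rb') { |io| io.read(PDF_MAGIC.bytesize) == PDF_MAGIC }
end

# A downloaded file whose name carries no .pdf extension.
Tempfile.create('download') do |file|
  file.binmode
  file.write("%PDF-1.7\n")
  file.flush
  puts pdf_extension?(file.path)
  puts pdf_signature?(file.path)
end
```

This is a simple check that only looks at the very start of the file, so it is a starting point rather than a complete PDF validator.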

Contributor

@kirkkwang kirkkwang left a comment

Looks good to me, thanks for answering the questions and awesome job on this feature!

@ShanaLMoore
Contributor

blocked by #100

@ShanaLMoore ShanaLMoore self-assigned this Feb 2, 2023
@ShanaLMoore ShanaLMoore removed their assignment Feb 2, 2023
@jeremyf jeremyf merged commit f729884 into main Feb 6, 2023
@jeremyf jeremyf deleted the i11-job-split-pdfs-into-child-works branch February 6, 2023 19:24