make CreateIssuePageJob more generic #11

Closed
2 tasks
jillpe opened this issue Dec 5, 2022 · 12 comments
@jillpe

jillpe commented Dec 5, 2022

Summary

CreateIssuePageJob should be more generic.

It should expose a class method that takes the type of work to split the PDF into as an argument, e.g. #split_pdf_into("Book").

Currently the newspaper gem doesn't give you a choice: all created children are of type NewspaperPage.

ref: https://github.com/samvera-labs/newspaper_works/blob/bbe8736b0cf622d9c2577faf7a6f0a4e8dd1452c/lib/newspaper_works/ingest/newspaper_issue_ingest.rb#L41

Acceptance Criteria

  • CreateIssuePageJob is more generic
  • #split_pdf_into takes an argument of a work type
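
A minimal sketch of what the acceptance criteria could look like, using plain Ruby stand-ins. `PdfSplitter`, its `page_count` argument, and the `Book` / `NewspaperPage` structs are hypothetical names for illustration and are not the gem's actual API. The sketch only shows the work type being passed in rather than hard-coded.

```ruby
# Hypothetical stand-ins for work models; real models would come from the host app.
Book = Struct.new(:title)
NewspaperPage = Struct.new(:title)

# Hypothetical splitter; page_count stands in for actually reading a PDF.
class PdfSplitter
  def initialize(parent_title:, page_count:)
    @parent_title = parent_title
    @page_count = page_count
  end

  # Accepts either a class or a class name, e.g. split_pdf_into("Book").
  # Returns one new work of the requested type per page.
  def split_pdf_into(work_type)
    klass = work_type.is_a?(Class) ? work_type : Object.const_get(work_type.to_s)
    (1..@page_count).map do |page|
      klass.new("#{@parent_title} - page #{page}")
    end
  end
end
```

With this shape, `PdfSplitter.new(parent_title: "Moby Dick", page_count: 3).split_pdf_into("Book")` would build three `Book` objects instead of always building `NewspaperPage` objects.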
@kirkkwang changed the title from "make CreateissuePageJob more generic" to "make CreateIssuePageJob more generic" Dec 8, 2022
@ShanaLMoore
Contributor

renamed to CreatePagesJob per LaRita. Currently the functionality of this job is all commented out.

@ShanaLMoore
Contributor

ShanaLMoore commented Dec 12, 2022

Maybe reference and compare it to the ConvertPdftoJpg job in LV.

@laritakr self-assigned this Dec 12, 2022
@laritakr
Contributor

Test comment for @ShanaLMoore

@laritakr
Contributor

Note: Per discussion at 12/14 standup, this job will be based on the Newspaper gem's job, where one or more PDFs are added to a parent work, and processing will split each PDF into individual works with multiple FileSets for each page of the PDF.

@jillpe
Author

jillpe commented Jan 3, 2023

#32 depends on this ticket

@ShanaLMoore
Contributor

ShanaLMoore commented Jan 16, 2023

TODO: Question about tiffs

@kirkkwang to help change TIFFs to JPGs for now to unblock LaRita. We will revisit this when we have Rob's attention.

We will split this off to its own ticket.

scientist-softserv/utk-hyku#310

@ShanaLMoore
Contributor

TODO: @laritakr to add QA instructions with a demo/screenshots to this ticket. We will spin off the specs for this work in a separate ticket, to unblock other iiif_print work.

laritakr added a commit that referenced this issue Jan 24, 2023
Ref issue #11

* Create jobs to split pdfs into child works, using Hyrax's BatchCreateJob and CreateChildWork
* Create job to add child works to parent work
* Create services to split PDFs into TIFFs
* Migration to create IiifPrintPendingRelationships to track child works and add to parent
* Allow functionality to be added to work models by including configuration
  `include IiifPrint.model_configuration(pdf_split_child_model: GenericWork)`

Specs are still pending completion at this point.
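
For illustration, one way a call like `IiifPrint.model_configuration(...)` can return something includable is to build an anonymous module on the fly. The sketch below is an assumption about the general Ruby pattern, not the gem's implementation. `SketchIiifPrint`, `pdf_splitting_enabled?`, and the empty `Image` class are hypothetical names.

```ruby
# Hypothetical namespace, named differently from the real gem on purpose.
module SketchIiifPrint
  # Returns a fresh module. When a class includes it, the `included` hook
  # records the chosen child model on that class.
  def self.model_configuration(pdf_split_child_model: nil)
    Module.new do
      define_singleton_method(:included) do |base|
        base.define_singleton_method(:pdf_split_child_model) { pdf_split_child_model }
        base.define_singleton_method(:pdf_splitting_enabled?) { !pdf_split_child_model.nil? }
      end
    end
  end
end

# A work type that opts in to PDF splitting, mirroring the commit message.
class GenericWork
  include SketchIiifPrint.model_configuration(pdf_split_child_model: GenericWork)
end

# A work type without the configuration is left untouched.
class Image
end
```

Because the configuration lives in the `include` line, a work type that omits it simply never gains the splitting behavior.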
@laritakr
Contributor

laritakr commented Jan 31, 2023

To test via Hyku, create a Generic Work with a PDF file. This can be done via the UI or via Bulkrax. (If via Bulkrax, the file name must be in the same CSV row as the work itself.)

Other work types are not configured to split PDFs, and should behave normally.

You should see:

  • parent work is created with the PDF in a fileset with the derivative thumbnail.
  • each page of the PDF is split as a TIFF into its own Image child work. The title will match the title of the parent, with the PDF sequence and page numbers added.
  • when viewing the parent work show page, the PDF fileset will appear first, and all child works will follow in page number sequence.
  • if multiple PDFs are attached to the parent, sequence should still work appropriately as long as all PDFs are added at the same time. (see note re edit below)

Note:

  • editing and adding a second PDF to a work will currently result in duplicate file names because PDF sequencing when adding PDFs is not yet implemented.
  • relationships are created last. Until the relationships are completed, the new works will not appear on the parent work.
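
A rough sketch of the titling and sequencing described above, in plain Ruby. The exact title format is not stated in this thread, so the `"PDF n, page m"` format and the helper names `child_title` and `child_titles` are assumptions for illustration only.

```ruby
# Builds one child title from the parent title plus PDF sequence and page number.
# The format string is an assumed example, not the real one.
def child_title(parent_title, pdf_sequence, page_number)
  format("%s - PDF %d, page %d", parent_title, pdf_sequence, page_number)
end

# pdf_page_counts holds the page count of each PDF, in the order the PDFs
# were attached. Titles come back in PDF order, then page order.
def child_titles(parent_title, pdf_page_counts)
  pdf_page_counts.each_with_index.flat_map do |page_count, index|
    (1..page_count).map { |page| child_title(parent_title, index + 1, page) }
  end
end
```

In this sketch, numbering all PDFs in one call keeps titles unique, while calling `child_titles` again for a PDF added later restarts the sequence at 1. That is one way the duplicate names mentioned in the note could come about.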

Image

Image

@laritakr
Contributor

laritakr commented Feb 2, 2023

Blocked by deploy failure. #100

@ShanaLMoore
Contributor

ShanaLMoore commented Feb 6, 2023

@laritakr did this get QA'd? I see it moved to this column but don't see notes detailing the results (unless the previous comment got edited with them?).

@laritakr
Contributor

laritakr commented Feb 7, 2023 via email

@laritakr
Contributor

laritakr commented Feb 7, 2023

Note that the relationship job is delayed 10 minutes, so there can be a lag before the child works appear on the parent.
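
To show why child works only appear after the relationship step runs, here is a plain Ruby sketch of deferred relationships. `PendingRelationship`, `ParentWork`, and `attach_pending_children` are hypothetical stand-ins, and the scheduling delay itself is left out. It is not how the gem's job or the IiifPrintPendingRelationships table is actually implemented.

```ruby
# Hypothetical record of a child that still needs to be attached to a parent.
PendingRelationship = Struct.new(:parent_id, :child_title, :order)

# Hypothetical parent work; members stays empty until relationships are processed.
class ParentWork
  attr_reader :id, :members

  def initialize(id)
    @id = id
    @members = []
  end
end

# Attaches this parent's pending children in page order and returns the
# relationships that are still pending for other parents.
def attach_pending_children(parent, pending)
  mine = pending.select { |rel| rel.parent_id == parent.id }.sort_by(&:order)
  mine.each { |rel| parent.members << rel.child_title }
  pending - mine
end
```

Until something like `attach_pending_children` has run, the parent's member list is empty even though the child works already exist, which matches the lag described above.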

Test with one PDF
Test work with multiple PDFs

Projects
Status: Done

4 participants