Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDF.js not loading PDFs prior to splitting #180

Closed
KatharineV opened this issue Apr 12, 2024 · 14 comments
Closed

PDF.js not loading PDFs prior to splitting #180

KatharineV opened this issue Apr 12, 2024 · 14 comments
Labels
bug something isn't working Knapsack Upgrade needs rework issue needs additional work

Comments

@KatharineV
Copy link

After deploying Knapsack, ADL and SDAPI are not loading PDF works in the PDF.js viewer prior to splitting. The PDF.js viewer should display PDFs immediately, and the UV should display them after relationship jobs and splitting run later.

This bug is affected works uploaded before and after the cutover to Knapsack. Works uploaded before Knapsack are showing a gray box on the page where PDF.js should render, with an error message instead of the PDF. Works uploaded after Knapsack are showing the PDF icon and no viewer at all.

Production has this error. Staging looks like it hasn't cut over the Knapsack, based on the appearance of the dashboard. So I haven't testing staging as much.

Old Works not currently showing PDF.js (uploaded 2022 to the present):

ADL prod:
https://adl.b2.adventistdigitallibrary.org/concern/published_works/a8d40de8-a116-4bd5-890a-98edf027b44d
Image

New Works not currently showing PDF.js (works uploaded 4/11 and 4/12):
The ADL production example was imported from the OAI set. The SDAPI prod example was uploaded with CSV + Files using Bulkrax.

SDAPI prod:
https://sdapi.b2.adventistdigitallibrary.org/concern/journal_articles/dialogue_2023_1_9_13_a_sower_went_out_to_sow
Image

ADL prod:
https://adl.b2.adventistdigitallibrary.org/concern/published_works/20186590_the_worker_s_bulletin_january_15_1918
Image

@ShanaLMoore
Copy link
Contributor

ShanaLMoore commented Apr 13, 2024

Locally created an Image work type and attached a pdf file:

Image

Shouldn't we be hitting iiif print? in the stack trace. that iiif_viewer? doesn't call iiif_media as shown in the error above.

Image

@ShanaLMoore
Copy link
Contributor

ShanaLMoore commented Apr 13, 2024

After fixing the above error (should we create a separate ticket?), this PR may solve this issue.

ShanaLMoore referenced this issue in samvera/hyku Apr 13, 2024
We needed to move everything into a web subdirectory of PDF.js for the app to find files.

Related Issues:
- https://github.com/scientist-softserv/adventist-dl/issues/748
- https://github.com/scientist-softserv/adventist-dl/issues/746
@ShanaLMoore
Copy link
Contributor

ShanaLMoore commented Apr 13, 2024

Oddly, after a few page refreshes I also see this error.

Putting my debugger in Hyku::WorkShowPresenterDecorator#video_embed_viewer? isn't stopping, as if the decorator file isn't getting picked up. Hyrax::ImagePresenter < Hyku::WorkShowPresenter which is what the knapsack should be decorating.

Image

@kirkkwang
Copy link
Contributor

kirkkwang commented Apr 15, 2024

@KatharineV the three examples you provided are now displaying the viewer. We had to reindex the file sets. Still not 100% sure what happened but it could have been just bad timing. I feel that if we initiate a reindex of everything, it'll fix a lot of these issues though that might be overkill since we still don't know if it's just a bad timing thing isolated to the date of the cutover to Knapsack.

@KatharineV
Copy link
Author

Team, I see improved functionality with PDF.js, but we aren't back to normal yet. I see a work uploaded on 4/16 is displaying the PDF.js viewer, but the progress bar spins for several minutes without (so far) displaying the PDF.

https://adl.b2.adventistdigitallibrary.org/concern/published_works/22266526_cardinal_2015

Image

This is a work that has not split, so I expect it to load quickly and be viewable in the PDF.js viewer. That behavior is not happening as expected.

@KatharineV
Copy link
Author

For comparison, here's a work that was uploaded several years ago (12/21), and it behaves as expected.

https://adl.b2.adventistdigitallibrary.org/concern/published_works/20122750_the_home_missionary_december_1_1889

The PDF.js viewer loads the document almost instantaneously, and it can be fully read.

@KatharineV
Copy link
Author

Team, the behavior on the SDAPI tenant is behind ADL. The PDF.js viewer is still not loading on SDAPI, and neither is the UV. Yikes! Thanks for helping us fix this. Here's a sample work that doesn't show any in-browser viewer, despite the Items list including the full PDF and the split child works:

https://sdapi.b2.adventistdigitallibrary.org/concern/journal_articles/dialogue_2023_2_18_20_hearing_god_above_the_headlines

Image

Image

@ShanaLMoore ShanaLMoore added the needs rework issue needs additional work label Apr 18, 2024
@ShanaLMoore
Copy link
Contributor

ShanaLMoore commented Apr 18, 2024

Noting that the file in question is very large, over a gig in size. Wondering if that's part of the issue. digging in 👀

@kirkkwang
Copy link
Contributor

This one looked like a reindex issue again, i reindexed this particular work to get it to show but we'll likely need to do a file set reindex to hit all of them

image

@ShanaLMoore
Copy link
Contributor

@kirkkwang that's a great find but I wonder what's causing this. We've seen it multiple times now with newly added works. Why isn't the indexing happening?

@kirkkwang
Copy link
Contributor

@ShanaLMoore this work was ingested on the 12th, that was the cutover day right? I wonder if that timing had something to do with it.

@ShanaLMoore
Copy link
Contributor

ShanaLMoore commented Apr 23, 2024

This still seems like it's an issue. If the PDF.js is supposed to render while we wait on the split to switchover to the UV, here is what I'm seeing: (a blank UV while we wait for the children's jobs to complete)

Image

@ShanaLMoore
Copy link
Contributor

Disregard my previous comment. The PDF.js rendering first is only configured to happen with remote urls from s3 (ie: this was set up specifically as a workaround for the preprocessed files of the mass migrations.) Manually created works are not expected to render the PDF.js first, per Kirk and LaRita.

cc @KatharineV Could you confirm if this is still an issue for you?

@KatharineV
Copy link
Author

@ShanaLMoore Yeah! I do recall the PDF.js workaround being implemented specifically to account for the mass migration failing to split fast enough for the UV. However, the PDF.js has been working (pre-Knapsack) for all files uploaded to our sites. Not just new S3-linked uploads. And based on my testing today, PDF.js is indeed working in that more expansive capacity--it is rendering everything we need it to render on production right now. So, thanks for the work on this ticket, even if it crept beyond the scope of what LaRita and Kirk intended for PDF.js in the beginning! The PDF.js band-aid is actually much more useful than expected. It's providing a critical service, and it's working again as it was prior to Knapsack.

Here's the tl;dr of functionality I'm seeing and that I'd like to keep, regardless of the intentions of the workaround...

PDF.js works for new works uploaded via the UI (example: uploaded yesterday) or CSV/zip files (example: uploaded today). All PDFs render in the PDF.js before they split and display in the UV. Files uploaded prior to the team installing a PDF viewer in our application are now rendering in PDF.js, since they won't split until we tell them to (example: one of the first works loaded to Hyku back in 2021). The way the behavior has been working from installation of PDF.js to Knapsack is more fully realized than the behavior I saw when I created this ticket. We have that full functionality back in place as of my testing today. I understand if the intention was more narrow, but the functionality we received in the end is greater, and I hate to see any part of this platform move backwards in functionality. We need a way to read every PDF in the repository or we can't launch this site, and the work you've done to get PDF.js to where it is today has solved that problem. I hope it's alright to attach our hopes to keeping this functionality exactly as it is, at least! haha

Screenshot showing PDF.js for the work uploaded today via files & CSV, which demonstrates that PDF.js is working for more than S3 links via OAI:
image

@jillpe jillpe closed this as completed Apr 30, 2024
@kirkkwang kirkkwang transferred this issue from scientist-softserv/adventist-dl May 10, 2024
@jillpe jillpe reopened this May 10, 2024
@jillpe jillpe closed this as completed May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something isn't working Knapsack Upgrade needs rework issue needs additional work
Projects
Status: Done
Archived in project
Development

No branches or pull requests

4 participants