Expose a mechanism to expire the image server cache for one or more objects #743

Open
anarchivist opened this issue Feb 2, 2021 · 8 comments

Comments

@anarchivist

anarchivist commented Feb 2, 2021

@andrewjbtw shared that our long-lasting caching of imageserver resources is having a negative impact when images are remediated in SDR.

This has impacted us a few times recently, both with remediating digital borrowing objects, and with dealing with remediated objects in a digital collections project with an external partner. In each case what happened was:

  1. Images were accessioned into the SDR
  2. The viewer displayed those images
  3. The images were revised and re-accessioned using the same file names as before
  4. The viewer still displayed the old versions of the images for days or even weeks, even though the files on Stacks were the new versions
  5. The short-term workaround (remediating files using a different name) added unnecessary overhead, and impacted high-profile work or time-sensitive requests

We should thus expose a mechanism that allows us to expire the image server's cache for a given object. While we should work with Andrew on the specific needs further, this should be accessible at least to repository admins/service managers.

Andrew adds:

I think the most common use cases will be for individual druids or subsets of individual druids found during a QC process rather than at the level of an entire collection. I think if there is the ability to act on multiple druids in a batch, then that could cover the collection use case via getting a list of all the druids in the collection.

@anarchivist anarchivist changed the title from "Expose a mechanism to expire the image server cache for an object/collection" to "Expose a mechanism to expire the image server cache for an object" on Feb 2, 2021
@anarchivist
Author

Seems related to #503 as well.

@anarchivist anarchivist changed the title from "Expose a mechanism to expire the image server cache for an object" to "Expose a mechanism to expire the image server cache for one or more objects" on Feb 6, 2021
@edsu

edsu commented Jul 11, 2022

If Cantaloupe is responsible for holding onto the cached derivative I wonder if various workflows could POST to the PurgeItemFromCache API endpoint when derivatives need regeneration?
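
A minimal sketch of what such a call could look like, assuming Cantaloupe's HTTP API is enabled, accepts task submissions as JSON at `/tasks`, and is protected by HTTP Basic auth. The base URL, the identifier format, and the helper name `build_purge_request` are all hypothetical here; the function only builds the request so it can be inspected before anything is sent.

```python
import base64
import json
import urllib.request


def build_purge_request(base_url, identifier, username=None, secret=None):
    """Build (but do not send) a request asking Cantaloupe to purge one item.

    Assumption: tasks are submitted as JSON to `<base_url>/tasks`.
    `identifier` must be whatever identifier the image server uses for the
    source image; how that maps to a druid and file name is not covered here.
    """
    body = json.dumps(
        {"verb": "PurgeItemFromCache", "identifier": identifier}
    ).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/tasks",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    # Assumption: the API endpoint uses HTTP Basic auth when credentials are set.
    if username is not None and secret is not None:
        token = base64.b64encode(f"{username}:{secret}".encode("utf-8"))
        request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def build_purge_requests(base_url, identifiers, username=None, secret=None):
    """Build one purge request per identifier, for batch use."""
    return [
        build_purge_request(base_url, identifier, username, secret)
        for identifier in identifiers
    ]
```

A workflow could then pass each request to `urllib.request.urlopen` once the affected identifiers are known. Building one request per identifier keeps a batch over several objects as a plain loop.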

@cbeer
Member

cbeer commented Jul 12, 2022

Yes, now that we're on Cantaloupe 5.

@edsu

edsu commented Jul 12, 2022

@cbeer I'm new to the SUL architecture, so this might be way off, but does it seem like PURL is the right place to post to PurgeItemFromCache when a public facing object is created/updated?

@cbeer
Member

cbeer commented Jul 12, 2022

DOR robots just dump files on NFS mounts shared with stacks and purl, so neither stacks nor purl knows what's changed.

I think purl-fetcher (or some new service listening to the purl-fetcher data feed?) would be our best bet for now. It gets pinged when objects are republished (not sure about files shelved...).

@mjgiarlo
Member

@edsu I chatted with @cbeer and we think it could work to hit Cantaloupe's purge API from DSA's shelving service, in here somewhere: https://github.com/sul-dlss/dor-services-app/blob/main/app/services/shelving_service.rb#L19-L34

Neither of us loves coupling the management side of SDR to the access side more than it is now, but until there's a shelving API (not currently planned or even designed), this is our best approach to improving caching behavior. Do you want to take a hack at writing up an issue and tossing it in ready?

@cbeer anything we need to know about hitting Cantaloupe's APIs? (AuthN, etc.)

@cbeer
Member

cbeer commented Jul 13, 2022

I don't think we know anything about Cantaloupe's APIs.

@mjgiarlo
Member

I can confirm Cantaloupe's HTTP API is enabled and working.

Credentials are here: https://github.com/sul-dlss/puppet/blob/production/modules/profile/files/cantaloupe/cantaloupe505.properties#L125-L127

Prod URL (load-balanced): http://imageserver-prod.stanford.edu
Stage URL (load-balanced): http://imageserver-stage.stanford.edu

Note that these respond via HTTP, not HTTPS.
