Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

browse-everything support #1076

Merged
merged 21 commits into from Jun 12, 2018

Conversation

Projects
None yet
1 participant
@jrochkind
Copy link
Contributor

jrochkind commented May 30, 2018

With extensive refactoring of actor/jobs stuff for ingest that turned out kind of neccesary to make it work right. :(

Closes #1021

jrochkind added some commits May 17, 2018

override sufia template to get correct browse-everything data-target
for edit forms.

This is the fix in hyrax, although I don't understand why that DOM ID makes sense in the first place
get batch upload to work with browse-everything
lots of sufia overrides taken from scholarsphere's version of fixing this, at:
psu-stewardship/scholarsphere@5529a50

We had to tweak a few things, not sure why.

Not sure B-E uploads are _actually_ working right yet, sometimes derivatives don't seem to be created for some reason,
but we're getting closer.
Complete override of CreateWithFilesActors and it's dependencies
It now does the work of CreateWithRemoteFilesActor too (that one has been zeroed out).

To avoid doing things in the foreground web server loop that should be in a bg job, and generally make this saner and easier to reason about. It was such a mess, it just made sense to take it over.
Custom ImportUrlJob
based on scholarsphere.

* not create file as ruby TempFile, so it doesn't disappear being GC'd when another job needs it
* get correct filename into sufia file

TODO: Clean up remote downloaded file at some point when it's done, prob after CreateDerivatives
on ingest, create FileSets serially
in a new job. cause file sets can't be attached to work in parallel, they were fighting with each other.

but we still need it in the bg not fg cause it's slow on bulk ingest.

Then the new AttachFileSetsJobs kicks off a job per file to actually ingest bytestreams
Deal with our new ImportUrlJob signature.
Yes, this is all messy.
browse-everything working local file, use standard location
So other jobs can find it rather than re-download, and so we can clean it up at the end.
use same job queue for AttachRemoteFileJob
To keep our resource consumption more similar to stock before browse-everything support,
and cause I found MORE places in the stack with cooperating jobs that assume the file will be avail on the same server. :(
@jrochkind

This comment has been minimized.

Copy link
Contributor Author

jrochkind commented May 31, 2018

This majorly refactors ingest behavior, but I haven't written any tests. :( I just got overwhelmed, there are so many moving parts, and testing file ingest is hard (and slow) to begin with. It's possible Scholarsphere has some we could copy or crib from. It's possible that should be a pre-req for merging this, not sure.

@jrochkind

This comment has been minimized.

Copy link
Contributor Author

jrochkind commented May 31, 2018

There is half-finished support in sufia for actually giving you some status/progress info on ingests as the files are slowly added. It, I believe, remains half-finished and not doing too much in Hyrax, so we wouldn't have gotten it by upgrading to hyrax.

This would be a nice thing to have, it is not included in this PR at present, we can talk about if we need to/want to prioritize providing such a UI, either based on existing stack code or not.

remove unneccesary browse-everything warning
should not apply to us with S3, and is a cheesy warning instead of fixing the issue anyway.
samvera/hyrax#3090

@jrochkind jrochkind force-pushed the browse_everything branch from b76f66e to c2e56a7 Jun 6, 2018

jrochkind added some commits May 31, 2018

refactor AttachFileSetsJob to be more efficient
Extracting some things from FileSetActor#create_metadata. Trying to avoid saving Fedora objects multiple times cause it's slow, and to kick off jobs to actually ingest bytestreams as early as possible

@jrochkind jrochkind force-pushed the browse_everything branch from c2e56a7 to 36fbad3 Jun 6, 2018

jrochkind added some commits Jun 6, 2018

try to get around weird non-reproducible bug
https://app.honeybadger.io/projects/53196/faults/38119137

Don't really know what we're doing, but on a wing and a prayer, alas.
batch create, make 'set all to this resource type' work for browse-ev…
…erything too

Also be slightly less crazy JS code. It's still crazy JS code though.

@jrochkind jrochkind changed the title WIP browse-everything support browse-everything support Jun 12, 2018

@jrochkind jrochkind merged commit 5f5c15c into master Jun 12, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@jrochkind jrochkind deleted the browse_everything branch Jun 12, 2018

@jrochkind

This comment has been minimized.

Copy link
Contributor Author

jrochkind commented Jun 28, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.