Handle Split Media Objects #111

Closed
carrickr opened this issue Jun 9, 2017 · 13 comments

@carrickr

carrickr commented Jun 9, 2017

In Avalon 6 the Johnny Cash item only has 28 sections:

http://avalon.repo.rdc.library.northwestern.edu/media_objects/pk02c977h

In Avalon 4 it has 104 sections:

https://media.northwestern.edu/media_objects/numedia:9916

It used to have 104 in Avalon 6, but the number has decreased.

@carrickr

carrickr commented Jun 9, 2017

This persists through a reindex, so it isn't just the Solr core; the sections have detached in Fedora.

@carrickr

carrickr commented Jun 9, 2017

The media object appears to have split, with each one getting some of the sections in the custody proceedings.

@davidschober

@carrickr do we know how prevalent this is?

carrickr changed the title from "Figure Out if We're Losing Sections" to "Handle Split Media Objects" Jun 12, 2017
carrickr self-assigned this Jun 12, 2017
@carrickr

carrickr commented Jun 12, 2017

I can't find the split Johnny Cash using identifier_ssim.

I was able to find one duplicate via this method:

t148fh13t
9k41zd48h

t148fh13t has no media associated with it; 9k41zd48h has 102 parts, which matches the section count on numedia:10247 in legacy Avalon. So t148fh13t should probably be deleted.

@carrickr

Once the rogue worker instance is under less load I'll see about auditing master files; splits have it pretty tied up, though.

@carrickr

["Masterfile k0698770z appears to be unattached",
 "Masterfile assignment error for z603qx574, it should be assigned to media object  it is assigned to 12579s327",
 "Masterfile assignment error for vq27zn73r, it should be assigned to media object  it is assigned to g445cd18p",
 "Masterfile assignment error for zg64tm30p, it should be assigned to media object  it is assigned to k930bx056"]

After running my crawl I only find 4 that are unassigned. @mbklein, this seems way too low given that Johnny Cash alone is missing more.

Code for the crawl is here:

https://gist.github.com/carrickr/004246549ce1606abe01fca0006cf4af
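
For readers who can't open the gist, the kind of check that yields messages like those above can be sketched in plain Ruby. This is not the gist's code: `audit_master_files` and the hash-based records are hypothetical stand-ins for Avalon's MasterFile and MediaObject models, and the rule it applies (a master file's parent should list that master file back) is an assumption about what such a crawl verifies.

```ruby
# Hypothetical sketch: plain hashes stand in for MasterFile and MediaObject
# records, which in Avalon would be loaded from Fedora.
#   master_files:  [{ id: "...", media_object_id: "..." or nil }, ...]
#   media_objects: [{ id: "...", master_file_ids: ["...", ...] }, ...]
def audit_master_files(master_files, media_objects)
  # Index: master file id => ids of the media objects that list it.
  claimed_by = Hash.new { |hash, key| hash[key] = [] }
  media_objects.each do |mo|
    mo[:master_file_ids].each { |mf_id| claimed_by[mf_id] << mo[:id] }
  end

  master_files.each_with_object([]) do |mf, problems|
    parent = mf[:media_object_id]
    claimants = claimed_by[mf[:id]]
    if parent.nil? && claimants.empty?
      # Neither side of the relationship knows about the other.
      problems << "Masterfile #{mf[:id]} appears to be unattached"
    elsif !claimants.include?(parent)
      # The master file's parent pointer disagrees with the media objects.
      problems << "Masterfile assignment error for #{mf[:id]}: " \
                  "points at #{parent.inspect}, listed by #{claimants.inspect}"
    end
  end
end
```

Checking the relationship from both sides is the design choice here: a crawl that only follows the master file's parent pointer would not notice a media object that has silently dropped the file from its own list.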

@carrickr

carrickr commented Jun 14, 2017

One thing to note is that we have 10682 master files on AWS and 10829 on Avalon 4; I will check to see what we are missing. I know AWS should be lower due to grooming of dead master files; I'll see if we overgroomed.

@carrickr

Items not in Avalon 6 are:

["numedia:10059", "numedia:10004", "numedia:10007", "numedia:10006", "numedia:10051", "numedia:10058", "numedia:10053", "numedia:10008", "numedia:10060", "numedia:10005", "numedia:10069", "numedia:10073", "numedia:10071", "numedia:10084", "numedia:10089", "numedia:10087", "numedia:10096", "numedia:10092", "numedia:10090", "numedia:10011", "numedia:10009", "numedia:10019", "numedia:10016", "numedia:10010", "numedia:10028", "numedia:10025", "numedia:10020", "numedia:10037", "numedia:10036", "numedia:10031", "numedia:10043", "numedia:10114", "numedia:9980", "numedia:9974", "numedia:9978", "numedia:9979", "numedia:9981", "numedia:9970", "numedia:9973", "numedia:11975", "numedia:11944", "numedia:11981", "numedia:11978", "numedia:11856", "numedia:11999", "numedia:11960", "numedia:11966", "numedia:11972", "numedia:11996", "numedia:11993", "numedia:11990", "numedia:11902", "numedia:11888", "numedia:12005", "numedia:12011", "numedia:11802", "numedia:9924", "numedia:9955", "numedia:9945", "numedia:9922", "numedia:9920", "numedia:9921", "numedia:9948", "numedia:9944", "numedia:9949", "numedia:9947", "numedia:9953", "numedia:9954", "numedia:9950", "numedia:9951", "numedia:9956", "numedia:9957", "numedia:9959", "numedia:9917", "numedia:9918", "numedia:9919", "numedia:11007", "numedia:11048", "numedia:11045", "numedia:11022", "numedia:11063", "numedia:11004", "numedia:11016", "numedia:11017", "numedia:11014", "numedia:11015", "numedia:11012", "numedia:11013", "numedia:11010", "numedia:11011", "numedia:11018", "numedia:11019", "numedia:11005", "numedia:11006", "numedia:11009", "numedia:11008", "numedia:11052", "numedia:11050", "numedia:11051", "numedia:11059", "numedia:11056", "numedia:11055", "numedia:11049", "numedia:11047", "numedia:11046", "numedia:11020", "numedia:11023", "numedia:11021", "numedia:11060", "numedia:23009", "numedia:14439", "numedia:14434", "numedia:14425", "numedia:14431", "numedia:14418", "numedia:15841", "numedia:26956", "numedia:47967", "numedia:20102", 
"numedia:20667", "numedia:20664", "numedia:26969", "numedia:26965", "numedia:26967", "numedia:21854", "numedia:23406", "numedia:24005", "numedia:24387", "numedia:24583", "numedia:24612", "numedia:24611", "numedia:24610", "numedia:47958", "numedia:3864", "numedia:26937", "numedia:26939", "numedia:26938", "numedia:26940", "numedia:26941", "numedia:26942", "numedia:26955", "numedia:26963", "numedia:28747", "numedia:28742", "numedia:28741", "numedia:28745", "numedia:28743", "numedia:28744", "numedia:28859", "numedia:28746", "numedia:28862", "numedia:46828", "numedia:47954", "numedia:47960"]

I will do more analysis on whether those deserved to be groomed or not.
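
A comparison like this boils down to a set difference over legacy identifiers. A minimal sketch, assuming both systems' identifiers have already been exported to arrays of strings; `missing_from_six` and its arguments are hypothetical names, not part of Avalon:

```ruby
require "set"

# Returns the identifiers present in the Avalon 4 export but absent from
# the Avalon 6 export, preserving the Avalon 4 ordering.
def missing_from_six(avalon4_ids, avalon6_ids)
  migrated = avalon6_ids.to_set # Set lookup avoids rescanning the array
  avalon4_ids.reject { |id| migrated.include?(id) }
end
```

The result only says what is absent; whether each absent item was correctly groomed still has to be decided by hand.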

@carrickr

  • k0698770z is an orphan: its media object was deleted in Avalon 4 but the master file was missed. Drowned.

@carrickr

z603qx574 belonged to numedia:10247, which is now 9k41zd48h, but was never attached. It was stuck in the cleaning up state (the counts match for the media object in 4 and AWS at 102 each). This was orphaned way back in 4. Deleted.

@carrickr

carrickr commented Jun 15, 2017

106 split objects, although all are numedia:10247, which is good.

["numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247", "numedia:10247"]

Many of these have no master files though; I will look further.
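
One way to turn a flat list like the one above into split candidates is to group the Avalon 6 objects by the legacy identifier they were migrated from and keep only the identifiers claimed more than once. This is a sketch under assumptions: `split_candidates` and the `:legacy_id` key are hypothetical, and the hashes stand in for whatever the Solr query or crawl actually returns.

```ruby
# Groups objects by legacy identifier and returns only the identifiers
# that more than one Avalon 6 object claims, mapped to those objects' ids.
#   objects: [{ id: "...", legacy_id: "..." }, ...]
def split_candidates(objects)
  objects
    .group_by { |obj| obj[:legacy_id] }
    .select { |_legacy_id, group| group.size > 1 }
    .transform_values { |group| group.map { |obj| obj[:id] } }
end
```

Each candidate can then be checked the same way as any other suspected duplicate, by comparing the number of master files attached to each object.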

@carrickr

Okay, looks like Beethoven split, but 9k41zd48h got custody of the kids.

irb(main):092:0> MediaObject.find('9k41zd48h').master_files.size
=> 102

irb(main):093:0> MediaObject.find('t148fh13t').master_files.size
=> 0

So t148fh13t is clear for drowning.

@carrickr

carrickr commented Jun 15, 2017

The only split object has been deleted, so this can be closed. The impetus for the whole thing (Johnny Cash missing files) is handled in #142 since it turns out they never need to be remigrated.

I'll run through #142, and all pids found there can be used to supplement the pids we'll migrate in #143.
