New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve work indexing of representative data #736

Merged
merged 5 commits into from Jul 28, 2017

Conversation

Projects
None yet
2 participants
@jrochkind
Contributor

jrochkind commented Jul 26, 2017

With tests for edge cases I could think of.
Ended up simpler code doing some TDD-ish dev, hooray.

Tricky taking care of reindexing a work when it has a child work as a representative, and that child work's representative has changed. Have to put some expensive solr lookups in indexing.

Tests are pretty slow, maybe another 30s to these new tests. Best I could come up with so far. Wary of mocking too much stuff that I'm not super confident won't change in a future version, causing tests to false pass.

improve work indexing of representative data
With tests for edge cases I could think of.
Ended up simpler code doing some TDD-ish dev, hooray.

@jrochkind jrochkind changed the title from improve work indexing of representative data to WIP improve work indexing of representative data Jul 26, 2017

@jrochkind jrochkind changed the title from WIP improve work indexing of representative data to improve work indexing of representative data Jul 26, 2017

child_work.representative = new_file_set
indexed_parent = SolrDocument.find(work.id)
expect(indexed_parent["representative_original_file_id_tesim"]).not_to include(new_file_set.original_file.id)

This comment has been minimized.

@hackmastera

hackmastera Jul 26, 2017

Contributor

do we want contain_exactly instead of include?

@hackmastera

hackmastera Jul 26, 2017

Contributor

do we want contain_exactly instead of include?

This comment has been minimized.

@jrochkind

jrochkind Jul 26, 2017

Contributor

not_to contain_exactly does not mean the same thing as not_to include. Like it could include those things, but not exactly, and still pass.

I guess I could say I expect it to eq [] if that's what I expect. I guess I'll look more into what I expect.

@jrochkind

jrochkind Jul 26, 2017

Contributor

not_to contain_exactly does not mean the same thing as not_to include. Like it could include those things, but not exactly, and still pass.

I guess I could say I expect it to eq [] if that's what I expect. I guess I'll look more into what I expect.

@jrochkind

This comment has been minimized.

Show comment
Hide comment
@jrochkind

jrochkind Jul 26, 2017

Contributor

deployed to staging, started rake chf:reindex with this on staging, under screen.

Contributor

jrochkind commented Jul 26, 2017

deployed to staging, started rake chf:reindex with this on staging, under screen.

@jrochkind

This comment has been minimized.

Show comment
Hide comment
@jrochkind

jrochkind Jul 27, 2017

Contributor

Indexing completed on staging, without errors. Completed in 2 hours 50 minutes.

I realize this is potentially doing a bit of extra work on bulk-reindex, due to the code to trigger an index on 'parent' work using a child work as a representative, when a child work is reindexed. Some works could end up reindexed multiple times in a bulk reindex.

Still 2 hours 50 minutes seems about what we expect for a bulk reindex, not sure if it's worth trying to fix.

Edited: Actually, no, this won't be triggered in reindexing, because our reindexing is still using the (deprecated) ActiveFedora::Base.find(ActiveFedora::Base.uri_to_id(uri)).to_solr, so update_index will not be triggered. Even if we change this, our indexing's attempts to use batch adds probably will never call update_index on individual items, so prob all good.

Contributor

jrochkind commented Jul 27, 2017

Indexing completed on staging, without errors. Completed in 2 hours 50 minutes.

I realize this is potentially doing a bit of extra work on bulk-reindex, due to the code to trigger an index on 'parent' work using a child work as a representative, when a child work is reindexed. Some works could end up reindexed multiple times in a bulk reindex.

Still 2 hours 50 minutes seems about what we expect for a bulk reindex, not sure if it's worth trying to fix.

Edited: Actually, no, this won't be triggered in reindexing, because our reindexing is still using the (deprecated) ActiveFedora::Base.find(ActiveFedora::Base.uri_to_id(uri)).to_solr, so update_index will not be triggered. Even if we change this, our indexing's attempts to use batch adds probably will never call update_index on individual items, so prob all good.

@jrochkind

This comment has been minimized.

Show comment
Hide comment
@jrochkind

jrochkind Jul 27, 2017

Contributor

Did a few test edits on staging. All seemed to work. Set up a "Work1 has representative Work2 which has representative FileSet" situation, edited Work2's representative fileset, Verified that Work1 did have it's index updated.

Merge to master?

Contributor

jrochkind commented Jul 27, 2017

Did a few test edits on staging. All seemed to work. Set up a "Work1 has representative Work2 which has representative FileSet" situation, edited Work2's representative fileset, Verified that Work1 did have it's index updated.

Merge to master?

@hackmastera hackmastera merged commit 076d3e2 into master Jul 28, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@hackmastera hackmastera deleted the fix_work_representative_indexing branch Jul 28, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment