Consider adding a "Write-Only" Adapter #598

tpendragon · 2018-09-12T18:37:49Z

Right now we index via a full MetadataAdapter, but it leads to bloat in the index and may result in strange inconsistencies between what you want for pulling out a Valkyrie object and what you want for search and display.

An Indexer would probably just have a pattern for building one up, some sane defaults, save_all, and delete_all.

When developing the indexer's API, think of it as a write-only MetadataAdapter. This will make it easy for downstream users to transition.

The text was updated successfully, but these errors were encountered:

tpendragon · 2018-09-12T18:38:00Z

@awead and @Cam156 would you mind commenting?

awead · 2018-09-13T12:22:18Z

Maybe we could approach this in the same way we do RDF predicate mappings.

tpendragon · 2018-09-18T20:00:43Z

@awead Can you expand on what you mean?

awead · 2018-09-18T20:15:32Z

@tpendragon I was thinking of architectural consistency. We can inject different kinds of mechanisms to map properties to RDF predicates. Perhaps we can use the same technique for indexing? It's similar: mapping properties to one or more solr fields.

tpendragon · 2018-10-10T03:50:52Z

Some discussions from tonight: Is an indexer just a "write only" persister? What's the difference between an indexer and any sort of transformation logic from the hash that is a Valkyrie::Resource to some other representation? Is it useful to have APIs here?

elrayle · 2019-06-18T14:31:59Z

I like the idea of an indexer abstraction in Valkyrie. My thoughts on this...

Having an abstraction allows Hyrax to write its indexers compliant with the Valkyrie indexer abstraction. Then Hyrax can be delivered out of the box with the Solr indexer and allow sites to switch to using an elastic search indexer just by changing the registration. Something like...

  Valkyrie::IndexAdapter.register(
    Valkyrie::SolrIndexAdapter,
    :solr_indexer
  )
  Valkyrie.config.indexer_adapter = :solr_indexer

# BECOMES

  Valkyrie::IndexAdapter.register(
    Valkyrie::ElasticSearchIndexAdapter,
    :elastic_indexer
  )
  Valkyrie.config.indexer_adapter = :elastic_indexer

Additional benefits of moving away from a fully functioning metadata adapter...

removes solr doc bloat as it removes the requirement that all attributes be in the solr doc to allow reconstruction of the resource from the solr doc. Solr doc bloat has the potential to effect search results.
provides more flexibility in the handling of fields (e.g. dates can be in the solr doc as dates optimized for range searching as a date facet)

elrayle · 2019-06-18T14:32:52Z

@tpendragon I see there hasn't been much movement on this. Are there plans to move this forward?

tpendragon · 2019-06-20T23:44:35Z

@elrayle Sorry, yes. We intend to do this, but it needs some research into the best interface. It looks like y'alls work on Wings is going to be a good smoke test on how useful the abstraction will be.

hackartisan · 2019-06-27T19:17:54Z

Thoughts and updates from initial progress on this in Hyrax / Wings.

I wrote an indexing adapter that registers the same way other adapters do. I started working on a solr indexing adapter, hoping to re-use valkyrie code directly. Initially it looked like I may be able to use the solr metadata adapter repository, but that didn't pan out because:

it required a resource factory and that required a metadata adapter, so I wondered if some of those classes could maybe be detangled a bit. Maybe this is a non-issue if we go with @tpendragon's proposal to make the metadata adapter take a "write-only" option.
in the context of an indexing adapter or write-only adapter, you don't want to return the document translated back into a valkyrie resource. and you can't actually because you won't likely be storing all that data in the index. This work happens in the repository, which you'd likely want to be able to share whether you have a separate registry or a write-only option. So it seems like we'll want to pull that transformation up a level or add in switches or something.

I am essentially pulling code out of valkyrie and simplifying it a bunch instead of actually reusing anything. Figuring out how the indexing behavior itself should work, I stripped out some logic that seems too opinionated, e.g. places where all the values are indexed by default -- these should probably be extracted into a proper indexer that you can configure to use or not use. https://github.com/samvera/valkyrie/blob/master/lib/valkyrie/persistence/solr/model_converter.rb#L63

hackartisan · 2019-06-27T19:27:06Z

In working on the indexer concept in Hyrax, a difference between active fedora's generate_solr_document and valkyrie's to_solr seems worthy of further discussion. The valkyrie solr metadata adapter builds the index by running through all the indexers and concatenating a big hash. I'm thinking of a scenario where Hyrax has defined its core indexers and someone else wants to remove one of those values, maybe because they want it in a field with a different name, or maybe they don't want it at all. I think they'd need to override the Hyrax indexer class in that case, and they'd just configure that in the initializer. That's probably fine. But the active fedora way of doing it is to build the hash through yielding a block; I think that allows just deleting a field and putting in a different field, which feels more light-weight.

@tpendragon commented: I'm not sure how I feel about mutating the hash, but there's actually a ticket in Valkyrie that proposes yielding the context: #599...
I'll be happy if we're able to figure out a good pattern, nobody's ever really liked to_solr and the current indexer is basically just an OO version of that.

Is it important to be able to mutate the actual hash? @no-reply ?

escowles · 2019-06-27T19:30:20Z

Assuming that the Hyrax indexing config is roughly the same as Figgy (https://github.com/pulibrary/figgy/blob/master/config/initializers/valkyrie.rb#L224-L238), couldn't an implementation remove the indexing class they didn't want from Hyrax's config somehow?

no-reply · 2020-01-03T23:28:18Z

I've become convinced that the #to_solr method composition approach is actually quite powerful for Hyrax's use cases.

Decoupling it entirely from the model seems good. We've done that in Hyrax with a factory method Hyrax::ValkyrieIndexer.for(resource: my_resource). From the old ActiveFedora::IndexingService classes, we drop #generate_solr_document in favor of just #to_solr. Modules can be injected with include and prepend, and we have a houndstooth style config-based module builder system with include/prepend Hryax::Indexer(:core_metadata)

tpendragon added the question label Sep 12, 2018

awead mentioned this issue Oct 23, 2018

Indexing fields in solr for search versus functionality #616

Closed

This was referenced Jun 18, 2019

setup solr indexers to use valkyrie/wings samvera/hyrax#3800

Closed

[wings] Inline #query, #post, and #get into solr_service from AF samvera/hyrax#3834

Merged

no-reply mentioned this issue Oct 16, 2019

Implement Valkyrie native Collections as PcdmCollection samvera/hyrax#4076

Merged

tpendragon changed the title ~~Consider adding an "Indexer" concept.~~ Consider adding a "Write-Only" Adapter Oct 25, 2019

no-reply mentioned this issue Jan 3, 2020

Write Metadata and ACL changes to valkyrie index with listeners samvera/hyrax#4175

Merged

tpendragon added this to 2.2.0 (Indexers) in 2022 Roadmap Jan 6, 2020

tpendragon added this to Indexing in Samvera Connect 2020 Hackathon Oct 14, 2020

tpendragon moved this from Indexing to Ready in Samvera Connect 2020 Hackathon Oct 14, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider adding a "Write-Only" Adapter #598

Consider adding a "Write-Only" Adapter #598

tpendragon commented Sep 12, 2018 •

edited by hackartisan

tpendragon commented Sep 12, 2018

awead commented Sep 13, 2018

tpendragon commented Sep 18, 2018

awead commented Sep 18, 2018

tpendragon commented Oct 10, 2018

elrayle commented Jun 18, 2019

elrayle commented Jun 18, 2019

tpendragon commented Jun 20, 2019

hackartisan commented Jun 27, 2019

hackartisan commented Jun 27, 2019

escowles commented Jun 27, 2019

no-reply commented Jan 3, 2020

Consider adding a "Write-Only" Adapter #598

Consider adding a "Write-Only" Adapter #598

Comments

tpendragon commented Sep 12, 2018 • edited by hackartisan

tpendragon commented Sep 12, 2018

awead commented Sep 13, 2018

tpendragon commented Sep 18, 2018

awead commented Sep 18, 2018

tpendragon commented Oct 10, 2018

elrayle commented Jun 18, 2019

elrayle commented Jun 18, 2019

tpendragon commented Jun 20, 2019

hackartisan commented Jun 27, 2019

hackartisan commented Jun 27, 2019

escowles commented Jun 27, 2019

no-reply commented Jan 3, 2020

tpendragon commented Sep 12, 2018 •

edited by hackartisan