Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding a "Write-Only" Adapter #598

Open
tpendragon opened this issue Sep 12, 2018 · 12 comments
Open

Consider adding a "Write-Only" Adapter #598

tpendragon opened this issue Sep 12, 2018 · 12 comments

Comments

@tpendragon
Copy link
Collaborator

tpendragon commented Sep 12, 2018

Right now we index via a full MetadataAdapter, but it leads to bloat in the index and may result in strange inconsistencies between what you want for pulling out a Valkyrie object and what you want for search and display.

An Indexer would probably just have a pattern for building one up, some sane defaults, save_all, and delete_all.

When developing the indexer's API, think of it as a write-only MetadataAdapter. This will make it easy for downstream users to transition.

@tpendragon
Copy link
Collaborator Author

@awead and @Cam156 would you mind commenting?

@awead
Copy link
Contributor

awead commented Sep 13, 2018

Maybe we could approach this in the same way we do RDF predicate mappings.

@tpendragon
Copy link
Collaborator Author

@awead Can you expand on what you mean?

@awead
Copy link
Contributor

awead commented Sep 18, 2018

@tpendragon I was thinking of architectural consistency. We can inject different kinds of mechanisms to map properties to RDF predicates. Perhaps we can use the same technique for indexing? It's similar: mapping properties to one or more solr fields.

@tpendragon
Copy link
Collaborator Author

Some discussions from tonight: Is an indexer just a "write only" persister? What's the difference between an indexer and any sort of transformation logic from the hash that is a Valkyrie::Resource to some other representation? Is it useful to have APIs here?

@elrayle
Copy link
Contributor

elrayle commented Jun 18, 2019

I like the idea of an indexer abstraction in Valkyrie. My thoughts on this...

Having an abstraction allows Hyrax to write its indexers compliant with the Valkyrie indexer abstraction. Then Hyrax can be delivered out of the box with the Solr indexer and allow sites to switch to using an elastic search indexer just by changing the registration. Something like...

  Valkyrie::IndexAdapter.register(
    Valkyrie::SolrIndexAdapter,
    :solr_indexer
  )
  Valkyrie.config.indexer_adapter = :solr_indexer

# BECOMES

  Valkyrie::IndexAdapter.register(
    Valkyrie::ElasticSearchIndexAdapter,
    :elastic_indexer
  )
  Valkyrie.config.indexer_adapter = :elastic_indexer

Additional benefits of moving away from a fully functioning metadata adapter...

  • removes solr doc bloat as it removes the requirement that all attributes be in the solr doc to allow reconstruction of the resource from the solr doc. Solr doc bloat has the potential to effect search results.
  • provides more flexibility in the handling of fields (e.g. dates can be in the solr doc as dates optimized for range searching as a date facet)

@elrayle
Copy link
Contributor

elrayle commented Jun 18, 2019

@tpendragon I see there hasn't been much movement on this. Are there plans to move this forward?

@tpendragon
Copy link
Collaborator Author

@elrayle Sorry, yes. We intend to do this, but it needs some research into the best interface. It looks like y'alls work on Wings is going to be a good smoke test on how useful the abstraction will be.

@hackartisan
Copy link
Contributor

Thoughts and updates from initial progress on this in Hyrax / Wings.

I wrote an indexing adapter that registers the same way other adapters do. I started working on a solr indexing adapter, hoping to re-use valkyrie code directly. Initially it looked like I may be able to use the solr metadata adapter repository, but that didn't pan out because:

  1. it required a resource factory and that required a metadata adapter, so I wondered if some of those classes could maybe be detangled a bit. Maybe this is a non-issue if we go with @tpendragon's proposal to make the metadata adapter take a "write-only" option.

  2. in the context of an indexing adapter or write-only adapter, you don't want to return the document translated back into a valkyrie resource. and you can't actually because you won't likely be storing all that data in the index. This work happens in the repository, which you'd likely want to be able to share whether you have a separate registry or a write-only option. So it seems like we'll want to pull that transformation up a level or add in switches or something.

I am essentially pulling code out of valkyrie and simplifying it a bunch instead of actually reusing anything. Figuring out how the indexing behavior itself should work, I stripped out some logic that seems too opinionated, e.g. places where all the values are indexed by default -- these should probably be extracted into a proper indexer that you can configure to use or not use. https://github.com/samvera/valkyrie/blob/master/lib/valkyrie/persistence/solr/model_converter.rb#L63

@hackartisan
Copy link
Contributor

In working on the indexer concept in Hyrax, a difference between active fedora's generate_solr_document and valkyrie's to_solr seems worthy of further discussion. The valkyrie solr metadata adapter builds the index by running through all the indexers and concatenating a big hash. I'm thinking of a scenario where Hyrax has defined its core indexers and someone else wants to remove one of those values, maybe because they want it in a field with a different name, or maybe they don't want it at all. I think they'd need to override the Hyrax indexer class in that case, and they'd just configure that in the initializer. That's probably fine. But the active fedora way of doing it is to build the hash through yielding a block; I think that allows just deleting a field and putting in a different field, which feels more light-weight.

@tpendragon commented: I'm not sure how I feel about mutating the hash, but there's actually a ticket in Valkyrie that proposes yielding the context: #599...
I'll be happy if we're able to figure out a good pattern, nobody's ever really liked to_solr and the current indexer is basically just an OO version of that.

Is it important to be able to mutate the actual hash? @no-reply ?

@escowles
Copy link
Contributor

Assuming that the Hyrax indexing config is roughly the same as Figgy (https://github.com/pulibrary/figgy/blob/master/config/initializers/valkyrie.rb#L224-L238), couldn't an implementation remove the indexing class they didn't want from Hyrax's config somehow?

@tpendragon tpendragon changed the title Consider adding an "Indexer" concept. Consider adding a "Write-Only" Adapter Oct 25, 2019
@no-reply
Copy link
Member

no-reply commented Jan 3, 2020

I've become convinced that the #to_solr method composition approach is actually quite powerful for Hyrax's use cases.

Decoupling it entirely from the model seems good. We've done that in Hyrax with a factory method Hyrax::ValkyrieIndexer.for(resource: my_resource). From the old ActiveFedora::IndexingService classes, we drop #generate_solr_document in favor of just #to_solr. Modules can be injected with include and prepend, and we have a houndstooth style config-based module builder system with include/prepend Hryax::Indexer(:core_metadata)

@tpendragon tpendragon added this to 2.2.0 (Indexers) in 2022 Roadmap Jan 6, 2020
@tpendragon tpendragon moved this from Indexing to Ready in Samvera Connect 2020 Hackathon Oct 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
2022 Roadmap
  
2.2.0 (Indexers)
Development

No branches or pull requests

6 participants