This repository has been archived by the owner on Jun 1, 2021. It is now read-only.

Enable global meta data for batch commit identification #9

Open
tkurz opened this issue Nov 6, 2017 · 1 comment

@tkurz

tkurz commented Nov 6, 2017

Current State

Vind's SearchServer (https://javadoc.io/page/com.rbmhtechnology.vind/vind/latest/com/rbmhtechnology/vind/api/SearchServer.html) provides several methods for indexing documents:

  • void index(Document... doc)
  • void index(List<Document> doc)
  • void indexBean(List<Object> t)
  • void indexBean(Object... t)

Internally, all of these methods trigger an indexing process but not a commit. This is intended behaviour, as the server itself can handle commits internally much more efficiently. Note that there are also commit methods, which guarantee that all pending indexing processes are committed (with all the negative consequences for performance).

Problem

In applications that support Read-Your-Writes, this behaviour can be a problem: the application has to guarantee an always up-to-date index and is thus forced to issue many hard commits.

Idea

Vind could support version numbering for indexing processes, so that an application could check which version was indexed most recently (and thus determine, via an additional method, whether the required index updates have already been processed). The version could be an internal counter or a counter maintained by the application, which could lead to the following API:

  • long index(List<Document> doc)
  • void index(List<Document> doc, long version)

Note that the other methods would work analogously. To query the latest index version, there could be methods like:

  • long getLatestVersion()
  • boolean isVersionIndexed(long version)

In addition, each Document could carry an additional version field.
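
The proposed API could be sketched roughly as follows. This is only an in-memory illustration under stated assumptions, not Vind code: the class name VersionedIndexSketch and the onBackendCommit hook are hypothetical, documents are represented as plain strings, and the version is assumed to be an internal counter that increases by one per index call.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of the proposed version API; not part of Vind. */
class VersionedIndexSketch {
    // Version handed out to the most recent index request.
    private final AtomicLong lastAssigned = new AtomicLong(0);
    // Highest version whose documents are visible to searches.
    private final AtomicLong lastIndexed = new AtomicLong(0);

    /** Accepts a batch and returns the version assigned to it. */
    long index(List<String> docs) {
        // A real implementation would hand the documents to the backend here.
        return lastAssigned.incrementAndGet();
    }

    /** Hypothetical hook: the backend reports that everything up to `version` is visible. */
    void onBackendCommit(long version) {
        lastIndexed.accumulateAndGet(version, Math::max);
    }

    long getLatestVersion() {
        return lastIndexed.get();
    }

    boolean isVersionIndexed(long version) {
        return version <= lastIndexed.get();
    }
}
```

With something like this, an application that needs Read-Your-Writes could keep the version returned by index and check isVersionIndexed before reading, instead of forcing a hard commit after every write.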

@alfonso-noriega

A solution could be to make use of Solr document versioning:

  • If the parameter versions=true is added to the index request, the Solr response will provide the new version of each document being updated. So we could change the API to
    Map<String,Long> index(List<Document>)
    and the client could manage versioning in its own application.
  • In the Solr schema, a new multi-valued field holding the version history would be added to each document, so that a later update cannot make the expected version impossible to find when we check the document version.
  • As mentioned earlier in the issue, a new method to check whether the document has already been indexed or not:
    Boolean isVersionIndexed(String docId, Long version)
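
The per-document approach could look roughly like the sketch below. It is an in-memory stand-in that makes no Solr calls: the class name DocVersionSketch and the commit method are hypothetical, documents are reduced to their IDs, and a simple counter replaces the versions Solr would assign (real Solr versions are not consecutive numbers).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for the per-document approach; not Vind or Solr code. */
class DocVersionSketch {
    // Stand-in for the version the backend would assign to each update.
    private long clock = 0;
    // docId -> versions accepted but not yet visible.
    private final Map<String, Set<Long>> pending = new HashMap<>();
    // docId -> every version that became visible (the multi-valued history field).
    private final Map<String, Set<Long>> history = new HashMap<>();

    /** Returns docId -> assigned version, mirroring Map<String,Long> index(List<Document>). */
    synchronized Map<String, Long> index(List<String> docIds) {
        Map<String, Long> assigned = new HashMap<>();
        for (String id : docIds) {
            long version = ++clock;
            pending.computeIfAbsent(id, k -> new HashSet<>()).add(version);
            assigned.put(id, version);
        }
        return assigned;
    }

    /** Stand-in for a backend commit: all pending versions move into the history. */
    synchronized void commit() {
        pending.forEach((id, versions) ->
                history.computeIfAbsent(id, k -> new HashSet<>()).addAll(versions));
        pending.clear();
    }

    /** True if this version of the document is, or once was, visible in the index. */
    synchronized boolean isVersionIndexed(String docId, Long version) {
        return history.getOrDefault(docId, Collections.emptySet()).contains(version);
    }
}
```

Keeping the full history per document is what lets a check for an older version still succeed after a later update has replaced it.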
