Search Index Handling & Synchronization #522

Closed
Jotschi opened this issue Oct 25, 2018 · 4 comments

@Jotschi

Jotschi commented Oct 25, 2018

Abstract

Index operations (create, update, delete) should run asynchronously. The current synchronous operations slow down write requests. They are currently required because Gentics Mesh does not yet support REST filtering (#485).

Tasks

  • Replace sync with async operations
  • Add circuit breakers to detect index operation failures
  • Index operation failures should trigger a recovery mode in which a sync is executed
  • Add verticle for recovery operations. The verticle should periodically check whether the needed indices exist. Unknown indices with the configured prefix should be removed.
  • Gentics Mesh should keep a reference of the current index state to automatically pick up the sync process. This should be done to avoid full-sync operations. The sync state needs to be stored in a dedicated ES index or in the index metadata if possible.
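
To make the circuit breaker and recovery tasks above more concrete, here is a minimal sketch in plain Java. It is not the Gentics Mesh implementation: `IndexCircuitBreaker`, the failure threshold, and the `onOpen` callback (which would schedule the index sync) are all hypothetical names chosen for illustration.

```java
/** Hypothetical, minimal circuit breaker for index operations (illustrative only). */
final class IndexCircuitBreaker {

    enum State { CLOSED, OPEN }

    private final int failureThreshold;
    private final Runnable onOpen; // assumption: this would trigger the recovery sync
    private int consecutiveFailures = 0;
    private State state = State.CLOSED;

    IndexCircuitBreaker(int failureThreshold, Runnable onOpen) {
        this.failureThreshold = failureThreshold;
        this.onOpen = onOpen;
    }

    synchronized State state() {
        return state;
    }

    /**
     * Runs the index operation unless the breaker is open.
     * Returns true only when the operation ran and succeeded.
     */
    synchronized boolean execute(Runnable indexOperation) {
        if (state == State.OPEN) {
            // Skip the operation; the recovery sync is expected to repair the index.
            return false;
        }
        try {
            indexOperation.run();
            consecutiveFailures = 0;
            return true;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                onOpen.run(); // enter recovery mode exactly once per trip
            }
            return false;
        }
    }

    /** To be called once the recovery sync has finished. */
    synchronized void reset() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }
}
```

While the breaker is open, operations are dropped rather than queued. That is only safe under the assumption that the recovery sync rebuilds the missing state afterwards.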

Operation Handling

  • Use regular mesh events to determine which operations need to be executed on the elasticsearch indices.
  • Add root cause to events. We can use the root cause to skip certain events. (e.g. Project deletion does not need to remove documents from the index. Instead the whole index can be dropped)
  • Operations need to be batched for a given time window.
  • Batch processing will be triggered once the batch size limit has been reached.
  • Batch processing can also be triggered once a schema migration completed event has been received.
  • The tx handlers of mesh rest operations collect the events (similar to SQBs). This collection will be processed once the tx has been committed in order to trigger the events.
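
The batching rules above could be sketched roughly as follows. This is an illustration only: `IndexEventBatcher`, `IndexEvent`, the `PROJECT_DELETED` cause string, and the explicit `nowMillis` parameter (used instead of a real timer so the behaviour stays deterministic) are assumptions, not Mesh API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical batcher for index events; all names are illustrative. */
final class IndexEventBatcher {

    /** Minimal event: what happened, to which element, and an optional root cause. */
    static final class IndexEvent {
        final String type;
        final String uuid;
        final String cause; // null when the event has no root cause

        IndexEvent(String type, String uuid, String cause) {
            this.type = type;
            this.uuid = uuid;
            this.cause = cause;
        }
    }

    private final int maxBatchSize;
    private final long windowMillis;
    private final Consumer<List<IndexEvent>> bulkProcessor;
    private final List<IndexEvent> pending = new ArrayList<>();
    private long windowStart;

    IndexEventBatcher(int maxBatchSize, long windowMillis, Consumer<List<IndexEvent>> bulkProcessor) {
        this.maxBatchSize = maxBatchSize;
        this.windowMillis = windowMillis;
        this.bulkProcessor = bulkProcessor;
    }

    /** Called with the events a tx collected, but only after that tx was committed. */
    synchronized void onCommitted(List<IndexEvent> txEvents, long nowMillis) {
        for (IndexEvent event : txEvents) {
            // A project deletion drops the whole index, so per-document events are skipped.
            if ("PROJECT_DELETED".equals(event.cause)) {
                continue;
            }
            if (pending.isEmpty()) {
                windowStart = nowMillis; // the time window starts with the first pending event
            }
            pending.add(event);
            if (pending.size() >= maxBatchSize) {
                flush(); // batch size limit reached
            }
        }
    }

    /** Called periodically; flushes once the time window has elapsed. */
    synchronized void tick(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - windowStart >= windowMillis) {
            flush();
        }
    }

    /** Can also be called directly, e.g. when a schema migration has completed. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<IndexEvent> batch = new ArrayList<>(pending);
        pending.clear();
        bulkProcessor.accept(batch);
    }
}
```
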
@philippguertler

philippguertler commented Feb 4, 2019

Here is an incomplete list of dependencies between elasticsearch documents. We have to keep that in mind when updating documents.

  • Updating Groups

    • Update group name in User documents
  • Updating TagFamily

    • Update tagFamily in Tag documents
    • Update tagFamily in Node documents
  • Updating Schema (name)

    • Update schema in Node documents
  • Updating Project (name)

    • Update project in Tag documents
    • Update project in Node documents
  • Updating Tags

    • Update tags in Node documents
  • Add Tags to Node

    • Update tags in Node documents
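
One way to keep these dependencies in mind in code is a lookup table from the updated element type to the document types that embed it. The sketch below only encodes the (incomplete) list above; the class and enum names are hypothetical.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical lookup of which document types embed data from an updated element. */
final class DocumentDependencies {

    enum Type { USER, GROUP, TAG_FAMILY, TAG, SCHEMA, PROJECT, NODE }

    private static final Map<Type, Set<Type>> DEPENDENTS = new EnumMap<>(Type.class);

    static {
        DEPENDENTS.put(Type.GROUP, EnumSet.of(Type.USER));           // group name in User documents
        DEPENDENTS.put(Type.TAG_FAMILY, EnumSet.of(Type.TAG, Type.NODE));
        DEPENDENTS.put(Type.SCHEMA, EnumSet.of(Type.NODE));          // schema name in Node documents
        DEPENDENTS.put(Type.PROJECT, EnumSet.of(Type.TAG, Type.NODE));
        DEPENDENTS.put(Type.TAG, EnumSet.of(Type.NODE));             // tags in Node documents
    }

    /** Returns the document types that need a refresh when an element of this type changes. */
    static Set<Type> dependentsOf(Type updated) {
        Set<Type> dependents = DEPENDENTS.getOrDefault(updated, EnumSet.noneOf(Type.class));
        return Collections.unmodifiableSet(dependents);
    }
}
```

Adding tags to a node is not listed in the table because it changes the Node document itself rather than a dependent document.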

@Jotschi

Jotschi commented Feb 13, 2019

TODO

  • Add cause field handling (to reduce bogus operations)
    Do we want to omit those entries or process them directly? Processing them directly may be desired when invoking a large delete operation which will be committed in batches.
    Option: We could just remove the HAS_PROJECT edge and consider the project as deleted. However, we would still need to deal with the webroot index (in the HAS_FIELD_CONTAINER edge).

  • Event handling missing: Schema, Microschema - Refactor Event Handling #649

  • Revisit node event handling - Refactor Event Handling #649

  • Add new Index Sync method

  • Add perm event handler - We need to add dedicated POJOs for the TagFamily, Tag, Project, ... ROLE_PERMISSIONS_CHANGED events.

  • Rework bootstrap handling for ES startup

  • Add resilience handling for ES

  • Search options: Back pressure buffer size, bulk size (ES bulks), bulk timeout

  • Change logger for Events to Trace

@philippguertler

A big breaking change is that mutating actions to mesh don't wait until the changes are synced to elasticsearch anymore.

A possible solution to that would be to add an option to the search endpoints to wait for elasticsearch to be synced before the search request is sent. This option could be turned on by default at first, which would not introduce any breaking changes at all.
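
A rough sketch of how such a wait option might work, assuming the server tracks how many index operations are still pending. `IndexSyncTracker`, the `wait` flag, and the timeout are hypothetical and do not describe the actual Mesh search endpoint parameters.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical tracker for pending index operations (illustrative only). */
final class IndexSyncTracker {

    private int pending = 0;

    synchronized void operationQueued() {
        pending++;
    }

    synchronized void operationDone() {
        pending--;
        if (pending == 0) {
            notifyAll(); // wake up searches that wait for the index to be in sync
        }
    }

    /** Blocks until no operations are pending or the timeout elapsed; true means idle. */
    synchronized boolean awaitIdle(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (pending > 0) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Runs the query, optionally waiting (best effort) for the index to catch up first. */
    static <T> T search(IndexSyncTracker tracker, boolean wait, long timeoutMillis, Supplier<T> query) {
        if (wait) {
            tracker.awaitIdle(timeoutMillis);
        }
        return query.get();
    }
}
```

With `wait` defaulting to true, existing clients would keep seeing their own writes in search results, at the cost of slower searches while the index is catching up.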

@Jotschi Jotschi moved this from Won't have to Must Have in Release 1.0 Estimation Apr 24, 2019
@Jotschi Jotschi added this to the 1.0.0 milestone Apr 24, 2019
@Jotschi Jotschi added this to In progress in 1.0 Apr 24, 2019
@Jotschi

Jotschi commented May 8, 2019

Suggested solution has been implemented.

@Jotschi Jotschi closed this as completed May 8, 2019
1.0 automation moved this from In progress to Reviewed / Merged May 8, 2019
@Jotschi Jotschi moved this from Reviewed / Merged to Released in 1.0 May 8, 2019