
feat(elastic): sub index by thing type#1735

Merged
olovy merged 4 commits into develop from feature/es-sub-index
Mar 27, 2026
Conversation

Contributor

@olovy olovy commented Mar 27, 2026

Add support for splitting the ES index into different parts based on thing type.
Based on the great work by @andersju in #1569

A new parameter in secret.properties specifies which types (and their subtypes) are placed in their own "subindex".
All other documents remain in the main index.

Example: elasticSubIndexTypes = Work,Instance,Item

https://localhost:9200/_cat/indices?v

health status index                   uuid                   pri rep docs.count docs.deleted store.size pri.store.size dataset.size
green  open   libris_local_7          KZTFSH3HSgydWH_ti0lKEg  15   0     766864            0        1gb            1gb          1gb
green  open   libris_local-item_7     2gCdmXtnQnu_TeJph6KV1A  15   0       6736            0     26.7mb         26.7mb       26.7mb
green  open   libris_local-instance_7 IDzSl0NOR4i8W1n-_ykFdg  15   0      84688            0    247.6mb        247.6mb      247.6mb
green  open   libris_local-work_7     UY6dlSZPQ3mhy3f3H8Xqig  15   0      83488            0    265.7mb        265.7mb      265.7mb
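The type-to-index routing behind the listing above can be sketched roughly as follows. This is an illustrative sketch, not XL's actual code: the `SUBTYPES` table is a hypothetical stand-in for the type hierarchy that, per this PR, comes from JsonLd, and the `index_for` helper and naming pattern are inferred from the index names shown.

```python
# Sketch (not XL's actual implementation) of routing a document to an index
# based on elasticSubIndexTypes. SUBTYPES is a hypothetical subtype -> parent
# table standing in for the real JsonLd type hierarchy.

SUB_INDEX_TYPES = "Work,Instance,Item".split(",")  # value from secret.properties

SUBTYPES = {"Text": "Work", "Print": "Instance"}  # hypothetical examples

def index_for(doc_type, base="libris_local", version=7):
    """Return the name of the index a document of doc_type is written to."""
    parent = SUBTYPES.get(doc_type, doc_type)
    if parent in SUB_INDEX_TYPES:
        return f"{base}-{parent.lower()}_{version}"
    return f"{base}_{version}"  # everything else stays in the main index

print(index_for("Text"))   # subtype of Work -> libris_local-work_7
print(index_for("Agent"))  # not configured -> main index libris_local_7
```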

See https://github.com/libris/devops/pull/298 for how to set it up.

Summary of changes:
Mostly plumbing to make ElasticSearch and some related places aware of the subindices:

  • Add configuration parameter elasticSubIndexTypes
  • Refactor ElasticSearch initialization since it now needs JsonLd
  • Make indexing aware of subindices
  • Add parameter for which indices to query - when not specified, query all
  • Use type information in search2 to only query relevant indices
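The last two bullets can be sketched as follows, assuming the index names from the listing above; the function name and shapes here are illustrative, not XL's actual API. When a query filters on types, only the matching subindices (or the main index, for types without one) are searched; with no type filter, all indices are.

```python
# Sketch (illustrative names, not XL's API) of query-side index selection:
# with a type filter, search only the relevant indices; with none, search all.

SUB_INDEX_TYPES = {"Work", "Instance", "Item"}
ALL_INDICES = ["libris_local_7", "libris_local-work_7",
               "libris_local-instance_7", "libris_local-item_7"]

def indices_to_query(query_types=None):
    """Return the list of indices to search for the given queried types."""
    if not query_types:  # no type filter -> query all indices
        return list(ALL_INDICES)
    chosen = set()
    for t in query_types:
        if t in SUB_INDEX_TYPES:
            chosen.add(f"libris_local-{t.lower()}_7")
        else:
            chosen.add("libris_local_7")  # type lives in the main index
    return sorted(chosen)

print(indices_to_query(["Work"]))           # ['libris_local-work_7']
print(indices_to_query(["Work", "Agent"]))  # ['libris_local-work_7', 'libris_local_7']
```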

Contributor Author

olovy commented Mar 27, 2026

(fixing tests)

Contributor Author

olovy commented Mar 27, 2026

Merging this so we can do a test indexing on QA over the weekend.
Please still add your reviews.

@olovy olovy merged commit cf91b20 into develop Mar 27, 2026
1 check passed
@olovy olovy deleted the feature/es-sub-index branch March 27, 2026 15:28
Member

andersju commented Mar 30, 2026

Great work! I'll look at it more closely but one thing we really should do right away is change number_of_shards in libris_config.json. We've been using 15 which was (probably?) reasonable for a single 700GB+ index, but if we have a bunch of smaller ones, it's way too much and will negatively affect performance.

The general guidelines from https://www.elastic.co/docs/deploy-manage/production-guidance/optimize-performance/size-shards are "Aim for shard sizes between 10GB and 50GB" and "Keep the number of documents on each shard below 200 million".

The item/work/instance indices on QA are now 170-180 GB each, so maybe 5 shards each? And a single shard for the tiny ones. Ideally we'd set an index-specific number, but until then I suggest we lower it to 5 in libris_config.json.
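A quick sanity check of the proposed shard count against the quoted 10-50 GB guideline, using the index sizes mentioned above:

```python
# Back-of-envelope check: ~170-180 GB indices split into 5 shards each
# land within the 10-50 GB per-shard guideline quoted above.

GUIDELINE_GB = (10, 50)

def gb_per_shard(index_size_gb, shards):
    return index_size_gb / shards

for size_gb in (170, 180):
    per_shard = gb_per_shard(size_gb, 5)  # proposed number_of_shards = 5
    ok = GUIDELINE_GB[0] <= per_shard <= GUIDELINE_GB[1]
    print(f"{size_gb} GB / 5 shards = {per_shard:.0f} GB per shard, "
          f"within guideline: {ok}")
# 170 GB / 5 shards = 34 GB per shard, within guideline: True
# 180 GB / 5 shards = 36 GB per shard, within guideline: True
```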

Contributor Author

olovy commented Mar 30, 2026

> Great work! I'll look at it more closely but one thing we really should do right away is change number_of_shards in libris_config.json. We've been using 15 which was (probably?) reasonable for a single 700GB+ index, but if we have a bunch of smaller ones, it's way too much and will negatively affect performance.
>
> The general guidelines from https://www.elastic.co/docs/deploy-manage/production-guidance/optimize-performance/size-shards are "Aim for shard sizes between 10GB and 50GB" and "Keep the number of documents on each shard below 200 million".
>
> The item/work/instance indices on QA are now 170-180 GB each, so maybe 5 shards each? And a single shard for the tiny ones. Ideally we'd set an index-specific number, but until then I suggest we lower it to 5 in libris_config.json.

Yes!!
Wanted to see where they landed size-wise. 5 sounds like a good start.
