Update search because it can now be instantiated #1170

Merged: mmattel merged 3 commits into master from search_can_now_be_instantinated on Jun 20, 2025

@mmattel mmattel commented Jun 16, 2025

Contributor

Fixes: #1135 (Allow scaling of the search service)

The search documentation is updated to match the new functionality of the search service.

  • Relocate content to have it in one place
  • Update search restriction for scaling
  • Text updates

No backport, as this is part of the upcoming 7.2.

Rendered locally, looks good.

@mmattel mmattel requested review from 2403905 and phil-davis June 16, 2025 15:04
@mmattel mmattel added the documentation Improvements or additions to documentation label Jun 16, 2025
@mmattel mmattel force-pushed the search_can_now_be_instantinated branch from e17fabc to 7c67e2a June 16, 2025 15:38
@mmattel mmattel requested a review from jvillafanez June 17, 2025 07:16
NOTE: Both metadata and content extractions are stored as indexes via the search service. Keep in mind that indexing requires adequate storage capacity, and this requirement will grow over time. To prevent the index from filling up the file system and rendering Infinite Scale unusable, it should reside on its own file system.

- You can change the path to where search maintains its data in case the filesystem gets close to full and you need to relocate the data. Stop the service, move the data, reconfigure the path in the environment variable and restart the service.
+ In case the file system gets close to full and you need to relocate {service_name} data, you can change the path to where {service_name} maintains its index data.

Member

I'm pretty sure you'll need to copy the index data into the new location. Just changing the location will break things because oCIS won't find the index data.
It's mentioned below... maybe it's better to link to where the actual info is

Contributor Author

There is a separate section about this, see Relocating the Index. It is also part of the table of contents.
This passage should only state that relocation is possible.
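
For readers following this thread, the relocation steps quoted in the diff above (stop the service, move the data, reconfigure the path, restart) could be sketched roughly as below. This is only an illustration of the copy step, not the documented procedure: the helper name `relocate_index` is hypothetical, and it copies instead of moving so that the original index survives until the service has been checked against the new location.

```python
import shutil
from pathlib import Path


def relocate_index(old_path: Path, new_path: Path) -> Path:
    """Copy the index data from old_path to new_path and return new_path.

    Hypothetical helper: the search service is assumed to be stopped
    before this is called, so the index files are not changing.
    """
    if not old_path.is_dir():
        raise FileNotFoundError(f"index directory not found: {old_path}")
    if new_path.exists() and any(new_path.iterdir()):
        # Refuse to mix the index with data that is already there.
        raise FileExistsError(f"target is not empty: {new_path}")
    # Copy rather than move; the old data can be deleted manually once
    # the service has been verified with the new path.
    shutil.copytree(old_path, new_path, dirs_exist_ok=True)
    return new_path
```

After the copy, the path environment variable of the search service would be pointed at the new directory and the service restarted. Which variable that is depends on the deployment, so it is left out of the sketch; the Relocating the Index section remains the authoritative description.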

- * The embedded `basic` configuration provides metadata extraction which is always on.
- * The `tika` configuration, which _additionally_ provides content extraction, if installed and configured.
+ * The embedded `basic` configuration provides metadata extraction which is always on. This includes all data that _describes_ the file like `Name`, `Size`, `MimeType`, `Tags` and `Mtime`.
+ * The `tika` configuration, which _additionally_ provides content extraction, if installed and configured. This includes all data that _relates to content_ of the file like `words`, `geo data`, `exif data` etc.

Member

Can we extract exif data and geo data with the ocis_full example? I guess tika can be configured to extract the data, but I'm not sure if oCIS is prepared to handle it. If we don't have instructions to set it up, I'd rather skip this part.

@mmattel mmattel Jun 17, 2025

Contributor Author

Good point.
I think that this is twofold:

  1. The data is extracted and stored in the bleve index. This comes from tika, which does (or should do) it without extra config. I have not heard any reports that ocis logs an error when tika is on and an image is saved. The question is whether we can prove that the data is present in the index.
  2. How to access the data from the UI. We are using KQL, or at least a subset of it, see the search section in the dev docs.

With ocis_full on Hetzner, such a test could be done...
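
As a rough idea of what such a test query could look like, here is a minimal sketch that builds a KQL-style search string from property restrictions. Everything in it is an assumption for illustration: the helper names `kql_term` and `kql_and` are made up, and the property names `name` and `content` in the usage note would need to be checked against the subset of KQL that the search service actually supports.

```python
def kql_term(prop: str, value: str) -> str:
    """Return one quoted property restriction, e.g. name:"report"."""
    if '"' in value:
        # Escaping rules are not covered here, so quotes are rejected.
        raise ValueError("double quotes in values are not supported")
    return f'{prop}:"{value}"'


def kql_and(*terms: str) -> str:
    """Join several restrictions so that all of them must match."""
    return " AND ".join(terms)
```

For example, `kql_and(kql_term("name", "holiday.jpg"), kql_term("content", "Berlin"))` produces the string `name:"holiday.jpg" AND content:"Berlin"`. If a query restricted to a content property returns an uploaded file, that would suggest the extracted data did reach the index.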

@mmattel mmattel requested a review from kobergj June 20, 2025 08:36
Comment thread modules/ROOT/pages/deployment/services/s-list/search.adoc Outdated
Co-authored-by: kobergj <juliankoberg@googlemail.com>
@mmattel mmattel merged commit 8cbe9ea into master Jun 20, 2025
1 check passed
@mmattel mmattel deleted the search_can_now_be_instantinated branch June 20, 2025 14:16