ISPN-DOC Document remaining indexing & query features #10788

Merged
merged 5 commits into from Apr 14, 2023

Conversation

@fax4ever fax4ever commented Apr 5, 2023

@fax4ever fax4ever requested a review from domiborges April 5, 2023 06:26
@fax4ever fax4ever changed the title ISPN-DOC Document update index schema ISPN-DOC Document remaining indexes feature Apr 5, 2023
@fax4ever fax4ever changed the title ISPN-DOC Document remaining indexes feature ISPN-DOC Document remaining indexing & query features Apr 5, 2023

[source,options="nowrap",subs=attributes+]
----
POST /v2/caches/{cacheName}/search/indexes?action=mass-index
POST /v2/caches/{cacheName}/search/indexes?action=reindex
Member

@fax4ever I saw that mass-index is deprecated. Is this ok?

Rebuilding indexes can take a long time to complete because the process takes place for all data in the grid.
While the rebuild operation is in progress, queries might also return fewer results.

[NOTE]
Member

@wfink mentioned this information. I'm not sure if it applies to reindexing and also updating indexes.

Contributor

I'm not sure what you mean by reindexing vs. updating an index.
To me, reindexing means deleting the index and creating a new one, because every entry changed as a result of a configuration change.
Updating an index means changing a single entry because its data changed (which is the ongoing work).

@domiborges
Member

@fax4ever thank you so much for the docs. I added some docs suggestions. I'll review Querying ProtoStream Common Types when I'm back on Tuesday.

Contributor Author

@fax4ever fax4ever left a comment

Very nice job @dvagnero. I have only a few further ideas here.

The alternative to reindex is the update index schema, which allows you to acquire some schema changes without touching the preexisting index data.
Operation is much faster, but it is not applicable to all the changes:
= Updating index schema
Update index schema operation lets you acquire additional schema changes without downtime.
Contributor Author

s/without downtime/with a minimal downtime/


[IMPORTANT]
====
Update index schema operation can be used only if we have done only additive changes, meaning that preexisting fields shouldn't be
affected by the changes.
When you change definitions of indexed types or analyzers or when you delete them, you must rebuild the index.
Contributor Author

I am not sure it is the same concept. What I wanted to stress here is that this function is supposed to be used only if the schema changes do not affect pre-existing indexed fields.

= Updating index schema
Update index schema operation lets you acquire additional schema changes without downtime.
When the schema changes don't affect the existing schema model, you can update the index schema without removing and recreating previously indexed data.
Updating index schema is much faster than rebuilding the index but you can update schema only when your changes does not affect existing index schema.
Contributor Author

I would say when your changes do not affect pre-existing indexed fields.
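
To make the "additive changes only" rule discussed above concrete, here is a minimal sketch in Python. It is not Infinispan code: the function name `choose_index_action` and the model of a schema as a plain mapping from field name to field definition are assumptions made purely for illustration.

```python
def choose_index_action(old_fields: dict, new_fields: dict) -> str:
    """Pick an index action for a schema change (hypothetical helper).

    Schemas are modeled as {field_name: definition} mappings, which is an
    illustrative assumption and not how Infinispan represents them.
    """
    for name, definition in old_fields.items():
        # A pre-existing indexed field was removed or redefined:
        # the change is not additive, so the index must be rebuilt.
        if name not in new_fields or new_fields[name] != definition:
            return "reindex"
    # Every pre-existing field is untouched, so updating the schema is enough.
    return "updateSchema"
```

For example, adding a `year` field next to an unchanged `title` field yields `"updateSchema"`, while changing the definition of `title` yields `"reindex"`.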

== Re-indexing Data
Re-index all data in caches with `POST` requests and the `?action=mass-index` parameter.
== Rebuilding indexes
When you change existing Protobuf schema definition at runtime or when the index is persistent, you must reindex the data in the index.
Contributor Author

I'm not sure about the sentence:
when the index is persistent, you must reindex the data in the index.

I would write something more general:
or in case of any kind of misalignment between the cache data and the indexes

== Update index schema
Update the index schema in caches with `POST` requests and the `?action=updateSchema` parameter.
== Updating index schema
Update index schema operation lets you acquire additional schema changes without downtime.
Contributor Author

I would write with a minimal downtime

Comment on lines 40 to 46
[IMPORTANT]
====
When you change existing Protobuf schema definition, you must reindex the data in the index.
Rebuilding indexes has impact on performance and can take a long time to complete because the process takes place for all data in the grid.
While the rebuild operation is in progress, queries might also return fewer results.
====

Contributor Author

It is possible that the change has no impact on the indexes, e.g. if the related caches are not indexed.
I would write something more general, such as:

If the change has an impact on some indexed caches, these should be aligned if we want the change to be visible for them. To align the index schema, it is possible to reindex (valid for the general case) or update the index schema (only if the changes do not touch pre-existing fields).
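
As an illustration of the two alignment options, the sketch below builds the corresponding `POST` requests with the Python standard library. The path and the `reindex` and `updateSchema` action values come from the snippets quoted in this review. The base URL `http://localhost:11222/rest` and the helper name `index_action_request` are assumptions, and authentication is omitted.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: a local server with the REST endpoint under /rest.
BASE_URL = "http://localhost:11222/rest"

def index_action_request(cache_name: str, action: str) -> Request:
    """Build (but do not send) a POST request for an index action."""
    if action not in ("reindex", "updateSchema"):
        raise ValueError(f"unsupported index action: {action}")
    # Percent-encode the cache name so it is safe as a single path segment.
    cache = quote(cache_name, safe="")
    url = f"{BASE_URL}/v2/caches/{cache}/search/indexes?action={action}"
    return Request(url, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send the request to a running server. Use `reindex` for the general case and `updateSchema` only when the changes do not touch pre-existing fields.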

@domiborges
Member

Hi @fax4ever, I made some updates based on your feedback. I'm still working on the BigInteger and BigDecimal. I will have updates tomorrow.

Contributor Author

@fax4ever fax4ever left a comment

Thank you @dvagnero, the changes look great.
I would say that the "Document update index schema" is mergeable.
Thanks also for reviewing the "Document BigDecimal and BigInteger Ickle queries" changes.

@domiborges
Member

@fax4ever let me know what you think about the updates.

@fax4ever fax4ever merged commit 45a32d8 into infinispan:main Apr 14, 2023
3 of 4 checks passed
@fax4ever fax4ever deleted the doc branch April 14, 2023 15:45
@fax4ever
Contributor Author

The changes look great, thanks @dvagnero.
I think we're done here, so I merged it.
I'm going to prepare the backport.
