Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DATAES-233 Utility class to support rolling index strategy #184

Open
wants to merge 5 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@zzt93
Copy link
Contributor

commented Jul 5, 2017

  • You have read the Spring Data contribution guidelines.
  • There is a ticket in the bug tracker for the project in our JIRA.
  • You use the code formatters provided here and have them applied to your changes. Don’t submit any formatting related changes.
  • You submit test cases (unit or integration tests) that back your changes.
  • You added yourself as author in the headers of the classes you touched. Amend the date range in the Apache license header if needed. For new types, add the license header (copy from another file and set the current year only).

zzt93 added some commits Jul 4, 2017

DATAES-233 - Thread-safe utility for rolling index and more explicit …
…annotations for a set of similar indices.

The utility class should be registered as a bean (or spring boot auto config) to be used in index name expression language, which is a wrapper of thread local variable to avoid multi-thread issues.
DATAES-233 - Thread-safe utility for rolling index and more explicit …
…annotations for a set of similar indices.

Add test cases for utility class.
DATAES-233 - Thread-safe utility for rolling index and more explicit …
…annotations for a set of similar indices.

Update bugs in test cases and add comments in `Documents` annotation.
DATAES-233 - Thread-safe utility for rolling index and more explicit …
…annotations for a set of similar indices.

Add ticket in test.
DATAES-233 - Thread-safe utility for rolling index and more explicit …
…annotations for a set of similar indices.

Remove deprecated constructor.

@odrotbohm odrotbohm force-pushed the spring-projects:master branch from 020b5e1 to ba3eba5 Jun 18, 2018

@edwardcapriolo

This comment has been minimized.

Copy link

commented Apr 27, 2019

Wouldnt using aliases accomplish the same thing?

@zzt93

This comment has been minimized.

Copy link
Contributor Author

commented May 6, 2019

It seems not same.

  • Alias use different names point to a single index;
  • Index sharding using same clazz with suffix to point to multiple indicies, to use different index at different situation;
@edwardcapriolo

This comment has been minimized.

Copy link

commented May 6, 2019

The complex part will be how to decide what index to writer to and read from fo a given search.You end up trying to build a table-by-day like soluttion

Page<Chat> s1 = repository.findBySender("s1", PageRequest.of(0, 10));
// then
assertThat(s1.getTotalElements() == 2, is(true));
System.out.println(s1.getContent());

This comment has been minimized.

Copy link
@edwardcapriolo
Page<Chat> s1 = repository.findBySender("s1", PageRequest.of(0, 10));
// then
assertThat(s1.getTotalElements() == 1, is(true));
System.out.println(s1.getContent());

This comment has been minimized.

Copy link
@edwardcapriolo
@Getter
@NoArgsConstructor
@Builder
@Documents(indexPattern = "chat_#{indexNameVar.toString()}", type = "chat")

This comment has been minimized.

Copy link
@edwardcapriolo

edwardcapriolo May 6, 2019

Can this reference a field in the document that contains date information?It feels like the reference should be to something in the object to be persisted

This comment has been minimized.

Copy link
@edwardcapriolo

edwardcapriolo May 6, 2019

@field(name="createDate" store=false index= false)
private Date createDate

or @IndexVar
String getIndexName(){
return //date based logic based on pojo
}

@documents(indexPattern = "chat_#{getIndexName()}", type = "chat")

@mp911de mp911de force-pushed the spring-projects:master branch from 01cda35 to b6fa4c8 May 6, 2019

@edwardcapriolo

This comment has been minimized.

Copy link

commented Jul 16, 2019

Can one of the contributers comment on the approach and the code. table by day is very popular and if there is no support many of my applications can not use spring-data-elastic

@zzt93

This comment has been minimized.

Copy link
Contributor Author

commented Jul 16, 2019

@edwardcapriolo This PR is opened two years ago, I will try to recall the thought and consider your suggestions to adjust the implementation.

@sothawo

This comment has been minimized.

Copy link
Collaborator

commented Jul 17, 2019

I just wrote a comment in the jira issue, I would like to first challenge this solution, if it the best way to solve a problem like rolling indices.

@zzt93

This comment has been minimized.

Copy link
Contributor Author

commented Jul 17, 2019

@sothawo I read your comment in jira. As far as I can see, the best practice is to put the index template in ES at the beginning, so we don't need to do any check for index existence and let ES do other thing. Then, we can just save entity to ES easily.

As far as about the index name binding mechanism, @edwardcapriolo provides a suggestions: using a field in that entity, which is more intuitive. In that case, we can't use a global bean for root of Spring EL. We need to set the root object (Again, using thread local context to parse index name) before saving this kind of document, however, in that case, It seems only making index obj a little easier, and not for all other operation. Besides, it break the backward compatibility, so I wonder whether it is a better solution.

@sothawo How do you think?

@sothawo

This comment has been minimized.

Copy link
Collaborator

commented Jul 17, 2019

why a ThreadLocal? What if some process submits tasks to a threadpool that insert into ES? Then these tasks run on different threads but should probably use the same index, although there are different name reolvers?

Having the option to define a field on the object to provide the indexname - perhaps together with a @Transient annotation to not store it in the index - might be a solution which could override the indexname set in the @Document annotation. Then the indexname creation is completely out of Spring Data Elasticsearch and up to the code creating the objects to be stored in the index.

But we need to have control over the creation of the index, because if we let Elasticsearch just create an index and store some document in it, then Elasticsearch will use the default mapping for the properties of the document, and we loose all the information that might be annotated with @Field or the other relevant configurations that must be put into the mapping.

@zzt93

This comment has been minimized.

Copy link
Contributor Author

commented Jul 18, 2019

Maybe I not express clearly, but ThreadLocal is not the core of index suffix. The suffix must be decided by biz logic (may be time or id etc), then use a threadlocal bean to store and parse. ThreadLocal is just to avoid thread issue in a web service, any other way to solve concurrent bean access is fine.

When it comes to index creation, index template is more convenient and suitable for this kind index partitioning as its official guide said. We can use @Field and relevant conf to create right index template, rather than manually create index.

@sothawo

This comment has been minimized.

Copy link
Collaborator

commented Jul 18, 2019

The creation of a name for the index is business logic, that's right. The question is, how is the index name for the current document passed into Spring Data Elasticsearch?

This can be for example by a bean resolved via SpEL. If this is not feasible, we could add an annotation like @IndexName which can be put on a property of the document designating the value of this property being the index name.

But there is the problem that for new indices, the ElasticsearchOperations.createIndex() and ElasticsearchOperations.putMapping() methods need to be called to have a consistent mapping across the different indices. Applications using ElasticsearchOperations implementations (*Template classes) directly, they can handle this by themselves.

But when the *Repository classes are used, this must be handled be the Repository implementations and this is currently not supported. (1)

A different solution would be to support index templates like you mentioned, but this is not implemented yet. (2)

I will create two tickets later for (1) and (2)

Edit:
when implementing (2) we would not need any custom index creation and mapping

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.