ISPN-6395 Unify clustered queries with non clustered queries #5600

Closed
wants to merge 5 commits into from

@gustavocoding gustavocoding commented Nov 21, 2017

https://issues.jboss.org/browse/ISPN-6395

Preview, just to gather feedback (from @anistor mainly) on the backwards compatible API changes.

This PR introduces an IndexedQueryMode that specifies how an indexed query is executed. BROADCAST means the query is sent to all nodes (and the results are aggregated), while CALLER FETCH (the default) executes the query directly on the caller (and thus needs a distributed index to work).

Why is this relevant? Because with BROADCAST, each node can index its own data in its own index, so both indexing and querying are far more scalable than with a single global index handled by the InfinispanIndexManager.
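The difference between the two modes can be illustrated with a small, self-contained sketch. Everything in it (`QueryModeSketch`, `Node`, `execute`) is a hypothetical stand-in invented for illustration, not the Infinispan API: BROADCAST asks every node to search its own local index and merges the partial results, while FETCH runs once on the caller against a shared index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins for illustration only; none of these are Infinispan classes.
final class QueryModeSketch {

    // Mirrors the two execution modes described in the PR text.
    enum Mode { BROADCAST, FETCH }

    // A "node" whose index covers only the entries it was given.
    static final class Node {
        final List<String> localIndex;

        Node(List<String> localIndex) {
            this.localIndex = localIndex;
        }

        List<String> search(String term) {
            return localIndex.stream()
                    .filter(entry -> entry.contains(term))
                    .collect(Collectors.toList());
        }
    }

    // BROADCAST: query every node and merge the partial results.
    // FETCH: run the query once, on the caller, against a shared index.
    static List<String> execute(Mode mode, String term, List<Node> nodes, Node sharedIndex) {
        if (mode == Mode.FETCH) {
            return sharedIndex.search(term);
        }
        List<String> merged = new ArrayList<>();
        for (Node node : nodes) {
            merged.addAll(node.search(term));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                new Node(Arrays.asList("apple pie", "banana")),
                new Node(Arrays.asList("apple tart")));
        Node shared = new Node(Arrays.asList("apple pie", "banana", "apple tart"));
        System.out.println(execute(Mode.BROADCAST, "apple", nodes, shared));
        System.out.println(execute(Mode.FETCH, "apple", nodes, shared));
    }
}
```

In this toy setup both modes return the same hits; the point is only where the index lives and where the work happens.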

@gustavocoding
Author

Updated

Member

@tristantarrant tristantarrant left a comment

Minimal stuff

ConfigurationBuilder configurationBuilder = new ConfigurationBuilder();
configurationBuilder.clustering().cacheMode(CacheMode.LOCAL);
configurationBuilder.indexing().index(Index.ALL)
      .addProperty("default.directory_provider", "ram");
Member

s/ram/local-heap/

Author

Muscle memory :)

 *
 * @since 9.2
 */
public enum IndexedQueryMode {
Member

Shouldn't we also have an AUTO mode which chooses the best strategy depending on the underlying indexing mode ?

Author

👍 I was planning to add it later (another PR)

Member

+1 for introducing an AUTO mode. I think that mode should become our default. Do we actually need a value called DEFAULT?

I think the one we now call DEFAULT should be named differently. We need to find a better name that describes in a word that: 'it is executed in a single phase in the caller'. What is the opposite of clustered/distributed? Can't find a good name right now. :)

And then we could also add a DEFAULT enum value, for convenience, that is just an alias to one of the other constants (and warn users that the DEFAULT might be different in next major ispn version, as we might invent new query mechanisms). :)
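As a rough sketch of the AUTO and DEFAULT ideas raised here, under stated assumptions: the enum name `QueryModeChoice`, the `resolve` method, and the rule "shared index means caller-side, per-node index means broadcast" are all hypothetical, and FETCH is used for the caller-side mode because that is the name the thread settles on later.

```java
// Hypothetical sketch; the enum name, the resolve() method and its rule are assumptions.
enum QueryModeChoice {
    FETCH,      // single phase, executed in the caller
    BROADCAST,  // sent to all nodes, results aggregated
    AUTO;       // pick one of the above based on how the index is set up

    // Convenience alias rather than a separate constant, so it can be
    // pointed at a different mode in a later major version.
    static final QueryModeChoice DEFAULT = AUTO;

    // Assumed rule: a shared index can be read from the caller,
    // while per-node indexes require a broadcast.
    QueryModeChoice resolve(boolean indexIsShared) {
        if (this != AUTO) {
            return this;
        }
        return indexIsShared ? FETCH : BROADCAST;
    }
}
```

With this shape, `QueryModeChoice.DEFAULT.resolve(false)` yields BROADCAST, and an explicitly chosen mode is never overridden.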

Author

AUTO will come in a later PR, I possibly will be able to squeeze it in before the year ends.

Suggestions for renaming DEFAULT ?

  • SINGLE_PHASE
  • SINGLE_STEP
  • LOCAL
  • NON_BROADCAST :)
  • CALLER

Author

I've been thinking, and all the names above have issues apart from NON_BROADCAST, which is silly. I'll go with FETCH, which I believe accurately reflects the fact that the query is run locally and reads the index, potentially from remote nodes.

@gustavocoding
Author

gustavocoding commented Dec 6, 2017

Updated again. I removed one commit which was just about refactoring the REST search related tests, as broadcast support for REST is not 100% yet.

@gustavocoding
Author

Updated one more time. Removed a leftover "implements Serializable".

@anistor
Member

anistor commented Dec 6, 2017

I'll have a look again also.

@gustavocoding
Author

CI seems unstable. Triggering another build

 * Creates a Query based on an Ickle query string
 * @param queryMode the {@link IndexedQueryMode} dictating the indexed query execution mode if applicable.
 */
Query create(String queryString, IndexedQueryMode queryMode);
Member

I'm looking for the way to build a QueryBuilder-DSL based query and also specify IndexedQueryMode but could not find it. Maybe I'm missing something.

And another thing, I'm not sure whether we need to specify IndexedQueryMode at query creation time; maybe it's better to specify it at execution time.

Author

I did not touch the DSL; I thought it'd be in maintenance mode only?

WRT IndexedQueryMode at execution time, I've chosen at creation time because:

  • It's how it's working now: SearchManager.getQuery vs SearchManager.getClusteredQuery
  • I tried to avoid having to change the API everywhere, i.e., query.list(mode), query.iterator(mode), query.getResultSize(mode) for each of the query types
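The second bullet can be sketched as follows. This is a minimal illustration with hypothetical types (`CreationTimeModeSketch`, its nested `Query` and `Mode`), not the real query API: because the mode is captured when the query is created, the execution methods keep their existing, mode-free signatures.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration; not the actual Infinispan query API.
final class CreationTimeModeSketch {

    enum Mode { FETCH, BROADCAST }

    // Execution methods stay as they are: no mode parameter on any of them.
    interface Query {
        List<String> list();
        int getResultSize();
    }

    // The mode is supplied once, at creation time, and remembered by the query.
    static Query create(String queryString, Mode mode) {
        // Fake result that just records how the query would have been executed.
        List<String> results = Arrays.asList(mode + ":" + queryString);
        return new Query() {
            @Override
            public List<String> list() {
                return results;
            }

            @Override
            public int getResultSize() {
                return results.size();
            }
        };
    }
}
```

The alternative discussed above, passing the mode at execution time, would instead add an overload such as `list(mode)` to every execution method of every query type.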

@gustavocoding
Author

CI is green!

@gustavocoding gustavocoding added this to the 9.2.0.Beta2 milestone Dec 7, 2017
return firstResult;
}

public void sort(Sort sort) {
Member

setSort? to match with getSort.

return hsQuery;
}

public void setMaxResults(int maxResults) {
Member

I'd keep getter-setter pairs like getMaxResults/setMaxResults and getFirstResult/setFirstResult together without mixing them with other methods like sort.

@gustavocoding
Author

Updated again. Exposed the IndexedQueryMode for REST and Hot Rod. Did not change previous commits, so the reviews done so far are still valid!


public HSQuery getHsQuery(AdvancedCache<?, ?> cache) {
   if (hsQuery == null) {
      QueryEngine queryEngine = cache.getComponentRegistry().getComponent(EmbeddedQueryEngine.class);
Member

Would it be possible to pass the EmbeddedQueryEngine somehow to QueryDefinition instead of grabbing it from the component registry? Maybe in the QueryDefinition(String queryString) constructor.

Author

Unfortunately no. The QueryDefinition is what is broadcast, so the serialization layer is responsible for constructing it.

@anistor
Member

anistor commented Dec 7, 2017

Ok. I'm still looking here at some aspects. Please do not merge it yet.

@gustavocoding
Author

Addressed reviews, changed IndexedQueryMode.DEFAULT to IndexedQueryMode.FETCH and updated documentation.

   return super.maxResults(maxResults);
}

@Override
public CacheQuery<E> firstResult(int firstResult) {
   this.firstResult = firstResult;
   this.queryDefinition.setFirstResult(firstResult);
   return this;
}

@Override
public CacheQuery<E> sort(Sort sort) {
   this.sort = sort;
Member

Is the sort field used at all? I see it being set but don't see it used anywhere.

Author

sort is used in the ctor of DistributedIterator

Author

My bad, was looking at the wrong branch. sort is unused, I'll remove it

}

public void setNamedParameters(Map<String, Object> params) {
if (params != null) params.forEach(this.namedParameters::put);
Member

Is forEach better than putAll? I suppose we also need to take care of the case when params is null.

   if (params == null) {
       namedParameters.clear();
   } else {
       namedParameters.putAll(params); 
   }
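For reference, the suggested setter can be tried in isolation with a minimal sketch. The holder class `NamedParametersSketch` and its getter are hypothetical; only the method body follows the suggestion above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder class; only setNamedParameters follows the review suggestion.
final class NamedParametersSketch {

    private final Map<String, Object> namedParameters = new HashMap<>();

    void setNamedParameters(Map<String, Object> params) {
        if (params == null) {
            // A null argument resets the parameters instead of being silently ignored.
            namedParameters.clear();
        } else {
            // putAll copies every entry in one call; no per-entry lambda is needed.
            namedParameters.putAll(params);
        }
    }

    Map<String, Object> getNamedParameters() {
        return namedParameters;
    }
}
```

Note that `putAll`, like the original `forEach` version, merges into the existing entries rather than replacing them; whether a non-null call should also clear first is a separate design decision.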

Author

ok

if (parsingResult.hasGroupingOrAggregations()) {
throw log.groupAggregationsNotSupported();
}
LuceneQueryParsingResult luceneParsingResult = transformParsingResult(parsingResult, EMPTY_MAP);
Member

Replacing EMPTY_MAP with emptyMap() spares us a generics warning.
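A small sketch of why the warning goes away, assuming EMPTY_MAP here refers to the raw `java.util.Collections.EMPTY_MAP` constant (the snippet does not show the import, so this is a guess). The class `EmptyMapSketch` and the method `countParams` are hypothetical.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical helper class for illustration.
final class EmptyMapSketch {

    static int countParams(Map<String, Object> namedParameters) {
        return namedParameters.size();
    }

    static int demo() {
        // Collections.EMPTY_MAP is declared as a raw Map, so passing it to a
        // Map<String, Object> parameter compiles only with an unchecked warning.
        // Collections.emptyMap() is a generic method; on Java 8 and later the
        // compiler infers Map<String, Object> from the parameter type.
        return countParams(Collections.emptyMap());
    }
}
```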

@gustavocoding
Author

Updated

@@ -97,10 +97,14 @@ protected ModelFactory getModelFactory() {
return cache;
}

protected int getNodesCount() {
Member

I don't think this is used. The derived class overrides createCacheManagers anyway.

Author

I see, let me sort this out

   output.writeUTF(object.getQueryString().get());
} else {
   output.writeBoolean(false);
   output.writeObject(object.getHsQuery(null));
Member

I don't understand how this can work with null cache parameter.

Author

Oops, let me clean this up

@gustavocoding
Author

addressed latest reviews

}

public HSQuery getHsQuery() {
return hsQuery;
Member

Calling this method should throw an IllegalStateException if hsQuery is null, i.e. if initialize(...) was not called beforehand.
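A minimal sketch of the guard being asked for, under assumptions: `LazyQueryDefinitionSketch` is a hypothetical stand-in for QueryDefinition, a plain `Object` stands in for HSQuery, and the no-argument `initialize()` stands in for `initialize(cache)`.

```java
// Hypothetical stand-in for QueryDefinition; the field types are simplified.
final class LazyQueryDefinitionSketch {

    private final String queryString;
    private Object hsQuery; // stand-in for HSQuery, built only by initialize()

    LazyQueryDefinitionSketch(String queryString) {
        this.queryString = queryString;
    }

    // Stand-in for initialize(cache): derive the executable query on the local node.
    void initialize() {
        if (hsQuery == null) {
            hsQuery = "parsed:" + queryString;
        }
    }

    Object getHsQuery() {
        if (hsQuery == null) {
            // Fail fast instead of handing a null to the caller.
            throw new IllegalStateException("initialize() must be called before getHsQuery()");
        }
        return hsQuery;
    }
}
```

Failing fast here turns a later NullPointerException at some distant call site into an error that names the missing step.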

@@ -57,11 +60,26 @@ public CacheQueryImpl(Query luceneQuery, SearchIntegrator searchFactory, Advance
cache, keyTransformationHandler);
}

public CacheQueryImpl(Query luceneQuery, SearchIntegrator searchFactory, AdvancedCache<?, ?> cache,
Member

One of the constructors in this class is no longer used.

@@ -59,10 +61,12 @@ private QueryRequest getQueryRequest() throws IOException {
if (request.method() == HttpMethod.GET) {
String queryString = getParameterValue(QUERY_STRING);
Member

The whole body of this if statement would look better if extracted as a separate getQueryFromString method, in the spirit of getQueryFromJSON.

- public ClusteredQueryCommandWorker getCommand(Cache<?, ?> cache, HSQuery query, UUID lazyQueryId,
-       int docIndex) {
+ public ClusteredQueryCommandWorker getCommand(Cache<?, ?> cache, QueryDefinition queryDefinition, UUID lazyQueryId,
+       int docIndex) {
     ClusteredQueryCommandWorker command = null;
Member

Declaration and initialization of command can be done in same line.

} else {
   queryDefinition.initialize(cache);
   HSQuery hsQuery = queryDefinition.getHsQuery();
   CacheQuery cacheQuery = new CacheQueryImpl<>(hsQuery, cache, keyTransformationHandler);
Member

These two lines can become return new CacheQueryImpl<E>(hsQuery, cache, keyTransformationHandler); to avoid the warning.

@anistor
Member

anistor commented Dec 13, 2017

I'm happy with this refactoring. I can still spot some small design issues that we can fix later.

The existence of RemoteQueryDefinition and HsQueryRequest seems to be a symptom of misplaced responsibility. It all starts with QueryDefinition.initialize, which IMO should actually be placed inside QueryEngine, not QueryDefinition. Doing that refactoring will remove the need for RemoteQueryDefinition, which now exists just to differentiate between embedded and remote case, but that differentiation can be done inside the query engine itself. Also, HsQueryRequest is just a data holder that carries the return value of QueryEngine.createHsQuery. If QueryDefinition.initialize is moved to QueryEngine we would also not need this anymore.

I did not think about it in detail, but maybe we would also need to make QueryDefinition mutable for QueryEngine and immutable for external parties. In that case we can extract QueryDefinition as an immutable interface (exposing getters only), and its implementation class could have package-local setters accessible to QueryEngine only.

But let's leave those improvements for another day. I'll merge this today as it is after you have applied the last 2-3 minor changes I suggested. Thanks @gustavonalle !
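The interface split floated in this comment might look roughly like the sketch below. All names (`QueryDefinitionView`, `MutableQueryDefinition`, `QueryEngineSketch`) and the single `maxResults` property are hypothetical; the sketch only shows the visibility arrangement, with all three types assumed to live in the same package.

```java
// Read-only view handed to external parties: getters only.
interface QueryDefinitionView {
    String getQueryString();
    int getMaxResults();
}

// Implementation class; its setter is package-private, so only code in the
// same package (the engine, in this sketch) can mutate it.
final class MutableQueryDefinition implements QueryDefinitionView {

    private final String queryString;
    private int maxResults = 100; // arbitrary default chosen for the sketch

    MutableQueryDefinition(String queryString) {
        this.queryString = queryString;
    }

    @Override
    public String getQueryString() {
        return queryString;
    }

    @Override
    public int getMaxResults() {
        return maxResults;
    }

    void setMaxResults(int maxResults) {
        this.maxResults = maxResults;
    }
}

// Stand-in for the query engine: mutates the definition, exposes only the view.
final class QueryEngineSketch {

    static QueryDefinitionView define(String queryString, int maxResults) {
        MutableQueryDefinition definition = new MutableQueryDefinition(queryString);
        definition.setMaxResults(maxResults);
        return definition;
    }
}
```

Callers outside the package see only `QueryDefinitionView`, so they cannot reach the setter without an explicit downcast to a class they cannot name.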

@gustavocoding
Author

Updated

@gustavocoding
Author

@anistor Created https://issues.jboss.org/browse/ISPN-8628 to further refactor it

@anistor
Member

anistor commented Dec 13, 2017

Integrated in master. Thanks @gustavonalle !

@anistor anistor closed this Dec 13, 2017
LuceneQueryParsingResult luceneParsingResult = transformParsingResult(parsingResult, nameParameters);
org.apache.lucene.search.Query luceneQuery = makeTypeQuery(luceneParsingResult.getQuery(), luceneParsingResult.getTargetEntityName());
SearchIntegrator searchFactory = getSearchFactory();
HSQuery hsQuery = metadata == null ? searchFactory.createHSQuery(luceneQuery) : searchFactory.createHSQuery(luceneQuery);
Member

I believe here you intended to write HSQuery hsQuery = metadata == null ? searchFactory.createHSQuery(luceneQuery) : searchFactory.createHSQuery(luceneQuery, metadata);

Member

Fixed this here: 29e92ee

@gustavocoding gustavocoding deleted the ISPN-6395 branch February 20, 2018 12:13