[TMP] IBX-11146: Updated filtering logic to use LanguageCode criterion with {terms} clause #107



Related PRs:
Description:
Problem
When many languages are configured in Ibexa (e.g., more than 20-30), the language fallback logic in NativeCoreFilter generates a quadratically growing number of boolean clauses. This causes Solr queries to fail with a maxBooleanClauses error (the default limit is 1024).
The problematic query structure for exclusion was:
NOT (content_language_codes_ms:"lang1" OR content_language_codes_ms:"lang2" ...)
For a fallback chain, this exclusion is repeated for every priority level, leading to O(N^2) complexity in clause count.
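To make the clause growth concrete, here is a minimal Python sketch of the exclusion pattern above. It is a simplified model, not the actual NativeCoreFilter code (which is PHP): it assumes each priority level matches its own language and excludes every higher-priority language with an OR group. The helper names or_exclusion and count_fallback_clauses are hypothetical.

```python
# Simplified model (an assumption, not the real NativeCoreFilter logic):
# every priority level matches its own language and excludes all
# higher-priority languages through an OR group.
FIELD = "content_language_codes_ms"


def or_exclusion(languages):
    """Build NOT (field:"a" OR field:"b" ...): one boolean clause per language."""
    clauses = [f'{FIELD}:"{lang}"' for lang in languages]
    return "NOT (" + " OR ".join(clauses) + ")"


def count_fallback_clauses(languages):
    """Count the term clauses produced across the whole fallback chain."""
    total = 0
    for level, _lang in enumerate(languages):
        total += 1      # the clause that matches this level's language
        total += level  # one exclusion clause per higher-priority language
    return total


languages = [f"lang{i}" for i in range(1, 51)]
print(or_exclusion(languages[:2]))
# NOT (content_language_codes_ms:"lang1" OR content_language_codes_ms:"lang2")
print(count_fallback_clauses(languages))
# 1275
```

Under this model, 50 languages already produce 1275 clauses, above the default limit of 1024. The real query may contain additional clauses per level, so the actual threshold can be lower.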
Solution
Optimized the query generation to use Solr's Terms Query Parser ({!terms}) for language exclusions. This parser treats a list of values as a single query clause, regardless of the number of elements.
The new query structure is:
_query_:"{!terms f=content_language_codes_ms}lang1,lang2..."
This reduces the complexity from O(N^2) boolean clauses to O(N) single clauses, effectively bypassing the maxBooleanClauses limit for this use case.
Applies strtolower() to the values because the Terms parser bypasses analysis, while the underlying field (content_language_codes_ms) is lowercased in the schema.
For QA:
Don't focus on the 50-language case. Check that site access language combinations work properly, taking the alwaysAvailable flag, fallback, default languages, and so on into account.
Documentation: