Skip to content

Conversation

jbaiera
Copy link
Member

@jbaiera jbaiera commented Jun 9, 2021

A change that went in a while ago has made changes to the wildcard query functionality. It now functions as it had historically. This PR updates the testing logic to make sure the wildcard tests account for these differences.

@jbaiera
Copy link
Member Author

jbaiera commented Jun 9, 2021

See also elastic/elasticsearch#71751

@danhermann danhermann self-requested a review June 10, 2021 15:14
def testDataSourcePushDown12And() {
val df = esDataSource("pd_and")
var filter = df.filter(df("reason").isNotNull.and(df("airport").endsWith("O")))
var filter = df.filter(df("reason").isNotNull.and(df("tag").equalTo("jan")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you help me understand why df("airport").endsWith("O") changed to df("tag").equalTo("jan") here and on sql-20/AbstractScalaEsSparkSQL.scala:1104?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The endsWith predicate translates into a wildcard query. Usually we would lowercase the O character to make the query string be *o. Wildcard queries are term level queries and do not have analyzers applied to them. When run in "strict" mode, the query keeps the character uppercased (*O) which does not match the term in lucene. Lowercasing the O in the test would lead to Spark filtering the data out from the results (since "o" != "O").

I swapped the endsWith predicate on the And-testing-methods because I wanted to bring back positive matches from the query in all test cases instead of assuming empty results when just strict mode is enabled.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more concisely: The bug deals with strict mode being enabled, and this test shouldn't be affected by that setting. Removing the feature that does depend on the setting from the test (hopefully) makes it simpler to reason about.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, thanks for the explanation. LGTM!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@jbaiera jbaiera merged commit 42dfee6 into elastic:master Jun 10, 2021
@jbaiera jbaiera deleted the fix-failing-tests branch June 10, 2021 21:13
jbaiera added a commit to jbaiera/elasticsearch-hadoop that referenced this pull request Jun 10, 2021
* Handle changes to the wildcard query in strict mode

* fix more test
jbaiera added a commit to jbaiera/elasticsearch-hadoop that referenced this pull request Jun 10, 2021
* Handle changes to the wildcard query in strict mode

* fix more test
jbaiera added a commit that referenced this pull request Jun 10, 2021
* Handle changes to the wildcard query in strict mode

* fix more test
jbaiera added a commit that referenced this pull request Jun 10, 2021
* Handle changes to the wildcard query in strict mode

* fix more test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants