Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring of RegexpQuery #11896

Conversation

Projects
None yet
4 participants
@alexksikes
Copy link
Contributor

commented Jun 26, 2015

Relates to #10217

This PR is against the query-refactoring branch.

@javanna

This comment has been minimized.

Copy link
Member

commented Jul 1, 2015

heya @alexksikes sorry may I ask you to rebase please? #11974 changed things quite a bit, for the better!

@alexksikes alexksikes force-pushed the alexksikes:feature/query-refactoring-regexp branch Jul 2, 2015

@alexksikes

This comment has been minimized.

Copy link
Contributor Author

commented Jul 2, 2015

@javanna OK you can go ahead, it's rebased. Thank you.

@javanna

View changes

core/src/test/java/org/elasticsearch/index/query/RegexpQueryBuilderTest.java Outdated
flags.add(flag);
}
}
query.flags(randomFrom(flags));

This comment has been minimized.

Copy link
@javanna

This comment has been minimized.

Copy link
@alexksikes

alexksikes Jul 2, 2015

Author Contributor

I think what I wanted to do is generate arrays of unique flags. I'll fix this.

@javanna

View changes

core/src/test/java/org/elasticsearch/index/query/RegexpQueryBuilderTest.java Outdated
assertThat(commonTermsQueryBuilder.validate().validationErrors().size(), is(1));

commonTermsQueryBuilder = new RegexpQueryBuilder("field", "regex");
assertNull(commonTermsQueryBuilder.validate());

This comment has been minimized.

Copy link
@javanna

javanna Jul 2, 2015

Member

very minor comment, but with randomization we could also test the case when there are multiple errors. Just randomize the builder that you create, based on what you set to it keep track of how many errors you expect, then base the assertion on it. A single randomized builder creation and a single assertion at the end that will cover all the cases.

This comment has been minimized.

Copy link
@alexksikes

alexksikes Jul 2, 2015

Author Contributor

I'm not sure to know how to make this builder fail a random number of times.

This comment has been minimized.

Copy link
@javanna

javanna Jul 2, 2015

Member

by passing in "", null I think? you can skip randomizing with such a simple query if you want though, just add the case where you have both errors at the same time.

@javanna

View changes

core/src/main/java/org/elasticsearch/index/query/RegexpQueryBuilder.java Outdated

private final String fieldName;

private final String value;

This comment has been minimized.

Copy link
@javanna

javanna Jul 2, 2015

Member

I think the regexp should be an object here. We should be consistent with what we did in term query and similar. The value is an object and can either be a number, string or BytesRef. We use internally bytesref for string when parsing through the parser, but java api users can set string and we take care of the conversion.

This comment has been minimized.

Copy link
@alexksikes

alexksikes Jul 2, 2015

Author Contributor

Yes I was wondering, should we be consistent across all queries? Here RegexpQueryBuilder takes a String, all that was changed is the name of the variable from rexgexp to value.

This comment has been minimized.

Copy link
@javanna

javanna Jul 2, 2015

Member

well if it was a string all the way I wouldn't change it maybe, but given that the parser parses an object, I think this was a bug in the java api previously. We should rather have an Object here than rcompareed to moving to Strings everywhere

@javanna

View changes

core/src/main/java/org/elasticsearch/index/query/RegexpQueryParser.java Outdated
@@ -72,11 +61,11 @@ public Query parse(QueryParseContext parseContext) throws IOException, QueryPars
currentFieldName = parser.currentName();
} else {
if ("value".equals(currentFieldName)) {
value = parser.objectBytes();
value = parser.text();

This comment has been minimized.

Copy link
@javanna

javanna Jul 2, 2015

Member

this should stay as-is I think

This comment has been minimized.

Copy link
@alexksikes

alexksikes Jul 2, 2015

Author Contributor

I changed this to parser.text() because the original builder only takes strings. This should still correctly parse 1, only that it will be converted to "1". Do we want to make all query builders that originally take strings also take objects such as number, BytesRef or dates? I am happy either way.

This comment has been minimized.

Copy link
@javanna

javanna Jul 2, 2015

Member

before the refactoring the parser holds most of the code. builders are used only from the java api and what they do is they allow you to set stuff which gets then serialized in json format anyway for the parser to parse it on each data node. if the parser previously supported objects, but the builder only strings, that is a bug in the java api to me. What rules is the parser, and we should do what the parser supported in our new builder, which will be also used for communication between the nodes when the refactoring is done, meaning that if we keep string the bug will not only be in the java api but in es core in general.

This comment has been minimized.

Copy link
@alexksikes

alexksikes Jul 2, 2015

Author Contributor

Thanks for your answer. Yes that makes it clear. So we should consider this a bug in the java API and fix it.

@javanna

View changes

core/src/main/java/org/elasticsearch/index/query/RegexpQueryParser.java Outdated
@@ -96,7 +85,7 @@ public Query parse(QueryParseContext parseContext) throws IOException, QueryPars
queryName = parser.text();
} else {
fieldName = currentFieldName;
value = parser.objectBytes();
value = parser.text();

This comment has been minimized.

Copy link
@javanna

javanna Jul 2, 2015

Member

this should stay as-is I think

@javanna

This comment has been minimized.

Copy link
Member

commented Jul 2, 2015

I did a first round of review, left a few comments

@javanna

View changes

core/src/test/java/org/elasticsearch/index/query/RegexpQueryBuilderTest.java Outdated

@Test
public void testValidate() {
RegexpQueryBuilder commonTermsQueryBuilder = new RegexpQueryBuilder("", "regex");

This comment has been minimized.

Copy link
@javanna

javanna Jul 2, 2015

Member

rename variable to regexpQueryBuilder?

@javanna

View changes

core/src/main/java/org/elasticsearch/index/query/RegexpQueryBuilder.java Outdated

private final String fieldName;

private final Object value;

This comment has been minimized.

Copy link
@javanna

javanna Jul 6, 2015

Member

same as I mentioned in the prefix query PR, wa may want to backport this fix for the java api to master.

This comment has been minimized.

Copy link
@alexksikes

alexksikes Jul 6, 2015

Author Contributor

ok here #12067

This comment has been minimized.

Copy link
@javanna

javanna Jul 7, 2015

Member

I meant changing the value to Object from String... not sure what #12067 does to be honest. Does it fix a bug?

This comment has been minimized.

Copy link
@alexksikes

alexksikes Jul 7, 2015

Author Contributor

The documentation mentions that default regex flag is ALL which isn't ordinal value of -1.

This comment has been minimized.

Copy link
@javanna

javanna Jul 7, 2015

Member

ok didn't know that. yet another bug fixed in master then it seems

@javanna

View changes

core/src/test/java/org/elasticsearch/index/query/RegexpQueryBuilderTest.java Outdated

if (randomBoolean()) {
List<RegexpFlag> flags = new ArrayList<>(); // used to test combination of flags
for (RegexpFlag flag : RegexpFlag.values()) {

This comment has been minimized.

Copy link
@javanna

javanna Jul 6, 2015

Member

why not deciding upfront how many flags you want (randomized) and then call randomFrom(RegexpFlag.values()) n times?

@javanna

View changes

core/src/test/java/org/elasticsearch/index/query/RegexpQueryBuilderTest.java Outdated
}
value = randomDouble();
break;
default:

This comment has been minimized.

Copy link
@javanna

javanna Jul 6, 2015

Member

no tests for dates, they are not supported here?

@javanna

This comment has been minimized.

Copy link
Member

commented Jul 6, 2015

left a few comments

@alexksikes

This comment has been minimized.

Copy link
Contributor Author

commented Jul 7, 2015

@javanna thanks for the review, I updated the PR.

@javanna

View changes

core/src/test/java/org/elasticsearch/index/query/BaseQueryTestCase.java Outdated
@@ -131,6 +132,7 @@ protected void configure() {
String type = randomAsciiOfLengthBetween(1, 10);
mapperService.merge(type, new CompressedXContent(PutMappingRequest.buildFromSimplifiedDef(type,
DATE_FIELD_NAME, "type=date",
DATE_FIELD_NAME_REGEX, "type=date,format=YYYY\\-MM",

This comment has been minimized.

Copy link
@javanna

javanna Jul 7, 2015

Member

couldn't we re-use the existing date field that we already have?

This comment has been minimized.

Copy link
@alexksikes

alexksikes Jul 7, 2015

Author Contributor

i did try that but got failures when it gets to handle regex on dates. The accepted date format need all special regex parameters to be escaped, so the regex (here just a plain date) could be valid.

@javanna

View changes

core/src/test/java/org/elasticsearch/index/query/RegexpQueryBuilderTest.java Outdated

if (randomBoolean()) {
List<RegexpFlag> flags = new ArrayList<>();
for (int i = 0; i < randomInt(5); i++) {

This comment has been minimized.

Copy link
@javanna

javanna Jul 7, 2015

Member

randomInt cannot be within the loop, otherwise it changes at every iteration...

This comment has been minimized.

Copy link
@alexksikes

alexksikes Jul 7, 2015

Author Contributor

late night coding leads to silly mistakes like that :) Thanks for catching it!

@javanna

View changes

core/src/test/java/org/elasticsearch/index/query/RegexpQueryBuilderTest.java Outdated
// NO COMMIT: although this is correctly accepted as a date
// the regex query will miserably fail with a end-of-string expected at position ..
// does it even make sense to have regex on dates?
fieldName = DATE_FIELD_NAME_REGEX;

This comment has been minimized.

Copy link
@javanna

javanna Jul 7, 2015

Member

I am not sure either, I was just wondering what we support and what we do not support. can you test against master?

@javanna

This comment has been minimized.

Copy link
Member

commented Jul 7, 2015

I did another round of review, maybe @cbuescher wants to have a quick look too?

@cbuescher

View changes

core/src/main/java/org/elasticsearch/index/query/RegexpQueryBuilder.java Outdated
/**
* Sets the regexp maxDeterminizedStates.
*/
public RegexpQueryBuilder maxDeterminizedStates(int value) {
this.maxDeterminizedStates = value;
this.maxDetermizedStatesSet = true;

This comment has been minimized.

Copy link
@cbuescher

cbuescher Jul 7, 2015

Member

This was a bug previously? Seems so, just trying to understand.

This comment has been minimized.

Copy link
@javanna

javanna Jul 7, 2015

Member

oh boy I didn't spot this :) we should definitely backport this to master

This comment has been minimized.

Copy link
@alexksikes

alexksikes Jul 7, 2015

Author Contributor

while refactoring some of the queries I did find quite a few bugs just like that. Should we always backport them to master? What's our overall policy on this matter?

This comment has been minimized.

Copy link
@javanna

javanna Jul 7, 2015

Member

I think you can answer for yourself at least for this one: would you be comfortable with 2.0 released with still this bug in it? Mayeb this fix should even be backported to 1.x, it's a bug and should be treated as such

This comment has been minimized.

Copy link
@alexksikes

alexksikes Jul 7, 2015

Author Contributor

If you ask me I'd be tempted to say that a bug is a bug and should definitely be backported. However, there are bugs which are only relevant in master, and so I thought that the query refactoring branch would be merged to master before 2.0 release. For this bug, I'll submit a fix for sure.

This comment has been minimized.

Copy link
@javanna

javanna Jul 7, 2015

Member

good! as for when we merge the branch...well we will do it when it's ready, most likely not before 2.0 but we don't know yet. One other thing about backporting fixes is that the branch is already big enough with the changes that we are making. If we can isolate non related fixes we simplify things a lot and clarify what happened when for the future.

This comment has been minimized.

Copy link
@javanna

javanna Jul 23, 2015

Member

didn't you send a PR against master to fix this?

This comment has been minimized.

Copy link
@alexksikes

alexksikes Jul 23, 2015

Author Contributor

it wasn't a bug: #12083

@cbuescher

This comment has been minimized.

Copy link
Member

commented Jul 7, 2015

Did a quick round of review as well. Beeing late to the party, I first found the change from allowed regex in Builder from String to general Object confusing, also introducing much complexity that wasn't there before. I tried to follow the argument in the comments, especially seeing that value was Object and also parsed as that in the current parser code. However I feel that using numbers/date values as regex is a little bit strange, would be interested in the reason why this is allowed in the parser (it apparently was introduced long ago with Commit 22077d1).
Looking at how value = parser.objectBytes(); currenty cannot return Date object, I think this doesn't have to be supported since any value will be converted using toString() later anyway.

alexksikes added a commit to alexksikes/elasticsearch that referenced this pull request Jul 7, 2015

Fix RegexpQueryBuilder to support an Object value
The parser takes an Object value, so should the builder.

Relates to elastic#11896
@javanna

This comment has been minimized.

Copy link
Member

commented Jul 7, 2015

I share your concerns @cbuescher around supporting Object rather than String, but thought that we already support objects everywhere (parser + mappers) so this really seemed like a leftover in the java api and it makes sense to fix it. let's ask for a second opinion from @jpountz ? :)

alexksikes added a commit to alexksikes/elasticsearch that referenced this pull request Jul 7, 2015

fix RegexpQueryBuilder#maxDeterminizedStates
Value was improperly set to `true`.

Relates to elastic#11896

alexksikes added a commit that referenced this pull request Jul 7, 2015

fix RegexpQueryBuilder#maxDeterminizedStates
Value was improperly set to true.

Relates to #11896
Closes #12083

alexksikes added a commit that referenced this pull request Jul 7, 2015

@alexksikes

This comment has been minimized.

Copy link
Contributor Author

commented Jul 13, 2015

@javanna Thanks for the review. I updated the PR accordingly.

@javanna

View changes

core/src/main/java/org/elasticsearch/index/query/RegexpQueryBuilder.java Outdated

/**
* Returns the value used in this query.
* If necessary, converts internal {@link org.apache.lucene.util.BytesRef} representation back to string.

This comment has been minimized.

Copy link
@javanna

javanna Jul 13, 2015

Member

this comment is outdated

@javanna

View changes

core/src/test/java/org/elasticsearch/index/query/RegexpQueryBuilderTest.java Outdated
public class RegexpQueryBuilderTest extends BaseQueryTestCase<RegexpQueryBuilder> {

private static final String[] REWRITE_METHOD = new String[]{"constant_score_auto", "scoring_boolean", "constant_score_boolean",
"constant_score_filter", "top_terms_boost_1", "top_terms_1"};

This comment has been minimized.

Copy link
@javanna

javanna Jul 13, 2015

Member

didn't we say that we are going to use constants from QueryParsers? maybe I am missing something though

This comment has been minimized.

Copy link
@alexksikes

alexksikes Jul 13, 2015

Author Contributor

Yes I was thinking about getting the method after a rebase, after getting Fuzzy in.

@javanna

View changes

core/src/main/java/org/elasticsearch/index/query/RegexpQueryParser.java Outdated

Object value = null;
String value = null;

This comment has been minimized.

Copy link
@javanna

javanna Jul 13, 2015

Member

we shouldn't make this change here in the branch, but rather in master. Let's get that one in first in master and then this one in our branch.

@javanna

This comment has been minimized.

Copy link
Member

commented Jul 13, 2015

did another round, left a few comments

@alexksikes

This comment has been minimized.

Copy link
Contributor Author

commented Jul 13, 2015

@javanna I updated the PR accordingly. Thank you.

@javanna

This comment has been minimized.

Copy link
Member

commented Jul 13, 2015

I think this is ready (besides my last small comment above), but it needs some rebasing before it can get in

@alexksikes alexksikes force-pushed the alexksikes:feature/query-refactoring-regexp branch Jul 13, 2015

@alexksikes

This comment has been minimized.

Copy link
Contributor Author

commented Jul 13, 2015

@javanna It's rebased

@javanna

This comment has been minimized.

Copy link
Member

commented Jul 13, 2015

I think we may need to rebase again after #12200 I would wait for that one.

alexksikes added a commit to alexksikes/elasticsearch that referenced this pull request Jul 13, 2015

alexksikes added a commit that referenced this pull request Jul 16, 2015

@alexksikes alexksikes force-pushed the alexksikes:feature/query-refactoring-regexp branch Jul 23, 2015

@alexksikes

This comment has been minimized.

Copy link
Contributor Author

commented Jul 23, 2015

@javanna It's rebased you can take a look now.

@javanna

View changes

core/src/test/java/org/elasticsearch/index/query/RegexpQueryBuilderTest.java Outdated

@Override
protected Query doCreateExpectedQuery(RegexpQueryBuilder queryBuilder, QueryParseContext context) throws IOException {
// NO COMMIT: fix to be removed to avoid NPE on unmapped fields

This comment has been minimized.

Copy link
@javanna

javanna Jul 23, 2015

Member

you can replace with //norelease so we don't forget but at least you can get this in while we fix this problem in master.

This comment has been minimized.

Copy link
@alexksikes

alexksikes Jul 23, 2015

Author Contributor

sure i'll replace that

@javanna

This comment has been minimized.

Copy link
Member

commented Jul 23, 2015

left two minor comments, LGTM though

alexksikes added a commit that referenced this pull request Jul 23, 2015

Refactoring of RegexpQuery
Relates to #10217
Closes #11896

This PR is against the query-refactoring branch.

@alexksikes alexksikes closed this Jul 23, 2015

@alexksikes alexksikes force-pushed the alexksikes:feature/query-refactoring-regexp branch to 93674b8 Jul 23, 2015

@alexksikes alexksikes deleted the alexksikes:feature/query-refactoring-regexp branch Jul 23, 2015

mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015

Refactoring of RegexpQuery
Relates to elastic#10217
Closes elastic#11896

This PR is against the query-refactoring branch.

mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015

fix RegexpQueryBuilder#maxDeterminizedStates
Value was improperly set to true.

Relates to elastic#11896
Closes elastic#12083

@clintongormley clintongormley removed the review label Aug 7, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.