Default position_offset_gap to 100 #12544

Closed
wants to merge 2 commits into
base: master
from

6 participants
@nik9000
Contributor

nik9000 commented Jul 29, 2015

This is much more fiddly than it looks because of the way position_offset_gap
is applied in StringFieldMapper. Instead of setting the default to 10 it's
simpler to make sure that all the analyzers default to 10 and that
StringFieldMapper doesn't override the default unless the user specifies
something different.

Closes #7268

Contributor

nik9000 commented Jul 29, 2015

WIP because it needs tests and more testing. This is way, way harder than it ought to be.

Contributor

nik9000 commented Jul 29, 2015

Oh and it needs documentation changes and breaking changes documentation.

Contributor

rmuir commented Jul 30, 2015

Which one is it? The position gap or the offset gap? Lucene has both, and they have different meanings. Changing the former to this value makes sense; changing the latter will break many things.
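To make the distinction concrete, here is a minimal sketch of how the two gaps act on different counters. It is a simplified model, not Lucene code: the class `PositionVsOffsetSketch`, its `Token` type, and the whitespace tokenization are all assumptions made for illustration. In Lucene itself the two values come from `Analyzer.getPositionIncrementGap(fieldName)` and `Analyzer.getOffsetGap(fieldName)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Simplified model (not Lucene code) of how positions and character offsets
 * advance across the values of a multi-valued field.
 */
class PositionVsOffsetSketch {

    /** A token with its position (token ordinal) and its character offsets. */
    static final class Token {
        final String term;
        final int position;
        final int startOffset;
        final int endOffset;

        Token(String term, int position, int startOffset, int endOffset) {
            this.term = term;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return term + " pos=" + position + " offsets=[" + startOffset + "," + endOffset + ")";
        }
    }

    /** Whitespace-tokenizes each value and applies both gaps between values. */
    static List<Token> analyze(List<String> values, int positionGap, int offsetGap) {
        List<Token> tokens = new ArrayList<>();
        int position = -1;  // position of the last token emitted so far
        int offsetBase = 0; // character offset at which the current value starts
        for (String value : values) {
            int cursor = 0;
            for (String term : value.trim().split("\\s+")) {
                if (term.isEmpty()) {
                    continue; // blank value, nothing to emit
                }
                int start = value.indexOf(term, cursor);
                int end = start + term.length();
                position++;
                tokens.add(new Token(term, position, offsetBase + start, offsetBase + end));
                cursor = end;
            }
            position += positionGap;                  // consulted by phrase/proximity matching
            offsetBase += value.length() + offsetGap; // consulted when mapping tokens back to text
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : analyze(Arrays.asList("one", "two three"), 100, 1)) {
            System.out.println(t);
        }
    }
}
```

In this model a position gap of 100 moves "two" to position 101 while its character offsets stay at [4,7). Only the position counter is affected by the setting under discussion.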

Contributor

rmuir commented Jul 30, 2015

And please, please, we have to rename this, because it's totally meaningless. Positions and offsets are separate things...

Contributor

nik9000 commented Jul 30, 2015

I understand the rename. It's the positions.

Contributor

nik9000 commented Aug 3, 2015

Does anyone have an opinion on what would be ok behavior from a backwards compatibility standpoint?

Contributor

nik9000 commented Aug 3, 2015

Does anyone have an opinion on what would be ok behavior from a backwards compatibility standpoint?

Talked with @dakrone and we decided that the most correct thing was to make position_offset_gap immutable once the index is created, even if it isn't set. So indexes created before 2.0.0-beta1 will always have a default position_offset_gap of 0 regardless of which version of Elasticsearch operates on them. Indexes created by versions of Elasticsearch on or after 2.0.0-beta1 will default to 10 like the issue says they should.

Contributor

nik9000 commented Aug 3, 2015

I believe this is ready for review. @jpountz - it's more fiddly than I expected at first so it might be worth a review from you or someone who's pretty experienced with mapping.

@mute - I think this is the right way to do #12538. It's much more complicated than it looked and certainly wasn't really low-hanging fruit.

Contributor

nik9000 commented Aug 11, 2015

I'd love this in 2.0.0 at some point. It'll need a review soon if it is going to.

@mikemccand reviewed docs/reference/mapping/types/core-types.asciidoc (outdated)
@mikemccand reviewed docs/reference/analysis/analyzers/custom-analyzer.asciidoc (outdated)
@mikemccand reviewed ...src/main/java/org/elasticsearch/index/mapper/core/StringFieldMapper.java (outdated)

Member

clintongormley commented Aug 11, 2015

I agree that 10 is small, and could quite easily overlap with a typical slop value. Personally I'd go for e.g. 100.

@mikemccand reviewed ...src/main/java/org/elasticsearch/index/mapper/core/StringFieldMapper.java (outdated)
@mikemccand reviewed docs/reference/migration/migrate_2_0.asciidoc (outdated)

Contributor

nik9000 commented Aug 11, 2015

Ok - I'll make these changes. I don't think I'll have time today because I'm in class and can't really concentrate. But tonight or tomorrow "morning".

analyzer = (NamedAnalyzer) analyzerF;
if (overridePositionOffsetGap >= 0 && analyzer.getPositionIncrementGap(analyzer.name()) != overridePositionOffsetGap) {

@mikemccand

mikemccand Aug 11, 2015

Contributor

Maybe add some comments here explaining? I'm confused...

@nik9000

nik9000 Aug 11, 2015

Contributor

Yeah - this was confusing when I wrote it. It's funky and deserves more comments....

@nik9000

nik9000 Aug 24, 2015

Contributor

Ok - added some comments above.
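For readers puzzling over the same condition, here is a minimal sketch of what the check quoted in this review thread appears to guard. It assumes that a negative override means "the mapping did not set a gap" and that an override which differs from the analyzer's own gap leads to the analyzer being re-wrapped with the new gap. `NamedAnalyzerModel`, `UNSET`, and `resolve` are hypothetical stand-ins, not Elasticsearch classes.

```java
/**
 * Simplified model of the override check discussed above. This is not
 * Elasticsearch code: NamedAnalyzerModel is a hypothetical stand-in for an
 * analyzer that carries a name and a position gap.
 */
class GapOverrideSketch {

    /** Assumed sentinel meaning "the mapping did not specify a gap". */
    static final int UNSET = -1;

    static final class NamedAnalyzerModel {
        final String name;
        final int positionGap;

        NamedAnalyzerModel(String name, int positionGap) {
            this.name = name;
            this.positionGap = positionGap;
        }
    }

    /**
     * Returns the analyzer to use for a field. The analyzer is only re-wrapped
     * when the mapping explicitly sets a gap (override >= 0) AND that gap
     * differs from what the analyzer already reports; otherwise the analyzer's
     * own gap, whether its default or a custom-analyzer setting, is kept.
     */
    static NamedAnalyzerModel resolve(NamedAnalyzerModel analyzer, int overrideGap) {
        if (overrideGap >= 0 && analyzer.positionGap != overrideGap) {
            return new NamedAnalyzerModel(analyzer.name, overrideGap);
        }
        return analyzer;
    }

    public static void main(String[] args) {
        NamedAnalyzerModel standard = new NamedAnalyzerModel("standard", 100);
        System.out.println(resolve(standard, UNSET).positionGap); // analyzer's own gap is kept
        System.out.println(resolve(standard, 0).positionGap);     // explicit mapping value wins
    }
}
```

The second half of the condition looks like a way to avoid allocating a wrapper when the mapping's value already matches the analyzer's, though that reading is an inference from the snippet rather than something stated in the thread.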

Contributor

mikemccand commented Aug 11, 2015

Thanks @nik9000, this is a great change (so proximity-aware queries never match across two values of a multi-valued field). I just left some minor comments.

Can you open a follow-on issue to rename positionOffsetGap? Need not block this good change, but this is extremely confusing :) As @rmuir said, position gap and offset gap are wildly different things!
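The effect on proximity queries can be sketched with a toy model. The code below is an assumption-laden simplification: it tokenizes on whitespace, looks only at the first occurrence of each term, and treats slop as the number of extra positions allowed between two in-order terms, which is narrower than Lucene's actual sloppy phrase matching. `ProximityGapSketch` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Toy model (not Lucene code) of a two-term, in-order phrase match over a
 * multi-valued field with a configurable position gap between values.
 */
class ProximityGapSketch {

    /** Position of the first occurrence of term, or -1 if it is absent. */
    static int positionOf(List<String> values, int gap, String term) {
        int position = -1; // position of the last token seen so far
        for (String value : values) {
            for (String token : value.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                position++;
                if (token.equals(term)) {
                    return position;
                }
            }
            position += gap; // the gap is inserted between consecutive values
        }
        return -1;
    }

    /** True if second follows first with at most slop extra positions between them. */
    static boolean phraseMatches(List<String> values, int gap, String first, String second, int slop) {
        int a = positionOf(values, gap, first);
        int b = positionOf(values, gap, second);
        if (a < 0 || b <= a) {
            return false;
        }
        return (b - a - 1) <= slop;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("John Abraham", "Lincoln Smith");
        System.out.println(phraseMatches(names, 0, "Abraham", "Lincoln", 0));   // gap 0
        System.out.println(phraseMatches(names, 100, "Abraham", "Lincoln", 0)); // gap 100
    }
}
```

With a gap of 0 the phrase "Abraham Lincoln" matches even though the two words come from different values; with a gap of 100 it does not match until the slop reaches 100. In the same model a gap of 10 is already bridged by a slop of 10, which is the overlap concern raised earlier in the thread.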

Contributor

nik9000 commented Aug 11, 2015

Can you open a follow-on issue to rename positionOffsetGap? Need not block this good change, but this is extremely confusing :) As @rmuir said, position gap and offset gap are wildly different things!

#12562

@nik9000 nik9000 added v2.1.0 and removed v2.0.0 labels Aug 24, 2015

Contributor

nik9000 commented Aug 24, 2015

Swapped 2.0 for 2.1 because we're too late in the 2.0 release cycle to get this merged there.

I'm going to pick this one back up in a few minutes and have another read through. I'll see if I can address that last open comment of @mikemccand and rebase. And I'll change all the 2.0s into 2.1s in the code.

@nik9000 nik9000 changed the title from Mapping: Default position_offset_gap to 10 to Mapping: Default position_offset_gap to 100 Aug 24, 2015

Mapping: Default position_offset_gap to 100
This is much more fiddly than you'd expect it to be because of the way
position_offset_gap is applied in StringFieldMapper. Instead of setting
the default to 100 it's simpler to make sure that all the analyzers default
to 100 and that StringFieldMapper doesn't override the default unless the
user specifies something different. Unless the index was created before
2.1, in which case the old default of 0 has to take effect.

Also position_offset_gap values less than 0 aren't allowed at all.

New tests test that:
1. the new default doesn't match phrases across values with reasonably low
slop (5)
2. the new default does match phrases across values with reasonably high
slop (50)
3. you can override the value and phrases work as you'd expect
4. if you leave the value undefined in the mapping and define it on a
custom analyzer the value from the custom analyzer shines through
Closes #7268
Fix rebase mistake
My rebase added core-types back, which was removed in the meantime. This
properly re-removes the file and moves the updated documentation to the
string field mapper docs.
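The precedence described in the first commit message above can be sketched roughly as follows. This is a hedged illustration, not the merged code: versions are modeled as plain integers, `effectiveGap` and its parameters are hypothetical, and the cutoff is passed in because the thread moves it between 2.0 and 2.1. Only the values 100 and 0 and the rejection of negative gaps come from the source.

```java
/**
 * Rough model of how the effective position gap is chosen, following the
 * commit message. Not Elasticsearch code; names and the integer version
 * encoding are assumptions made for illustration.
 */
class DefaultGapSketch {

    static final int POSITION_OFFSET_GAP = 100;          // new default from the commit message
    static final int POSITION_OFFSET_GAP_PRE_CUTOFF = 0; // old default for older indexes

    /**
     * @param indexCreatedVersion version that created the index (hypothetical int encoding)
     * @param cutoffVersion       first version that uses the new default
     * @param mappingGap          gap set in the field mapping, or null if unset
     * @param analyzerGap         gap set on a custom analyzer, or null if unset
     */
    static int effectiveGap(int indexCreatedVersion, int cutoffVersion,
                            Integer mappingGap, Integer analyzerGap) {
        if (mappingGap != null) {
            if (mappingGap < 0) {
                throw new IllegalArgumentException("position_offset_gap must be >= 0, got " + mappingGap);
            }
            return mappingGap; // an explicit mapping value overrides everything else
        }
        if (analyzerGap != null) {
            return analyzerGap; // custom analyzer value "shines through"
        }
        // Nothing was set: the default is fixed by the version that created the index.
        return indexCreatedVersion < cutoffVersion
                ? POSITION_OFFSET_GAP_PRE_CUTOFF
                : POSITION_OFFSET_GAP;
    }

    public static void main(String[] args) {
        int cutoff = 20100; // hypothetical encoding of the cutoff version
        System.out.println(effectiveGap(10700, cutoff, null, null)); // older index keeps the old default
        System.out.println(effectiveGap(20100, cutoff, null, null)); // newer index gets the new default
    }
}
```

Keying the default on the index creation version rather than the running version is what keeps the setting effectively immutable for existing indexes, as discussed earlier in the thread.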
Contributor

nik9000 commented Aug 24, 2015

@mikemccand - would you mind having another look at this? It's now ready for review again based on 2.1.

Contributor

mikemccand commented Aug 24, 2015

LGTM, thanks @nik9000!

Contributor

s1monw commented Aug 25, 2015

@nik9000 this looks good. I think this should go into 2.0 though.

Contributor

nik9000 commented Aug 25, 2015

@nik9000 this looks good I think this should go into 2.0 though

Ok - I'll merge to 2.1 now and backport it. I'm not super comfortable sticking it in 2.0 but I'll try because you want it in there.

@nik9000 nik9000 changed the title from Mapping: Default position_offset_gap to 100 to Default position_offset_gap to 100 Aug 25, 2015

Contributor

nik9000 commented Aug 25, 2015

I merged this into master but GitHub doesn't recognize it - probably because I merged it locally and pushed. That failed so I rebased onto elastic's master and squashed. At this point GitHub must have lost any connection to the patch.

I'll leave this open to work the backport to 2.0. I'll close it once I've merged there.

* values.
*/
public static final int POSITION_OFFSET_GAP = 100;
public static final int POSITION_OFFSET_GAP_PRE_2_1 = 0;

@xuzha

xuzha Aug 25, 2015

Contributor

This change is going to the 2.0 branch, right? Do we need to change this to 2_0?

@nik9000

nik9000 Aug 25, 2015

Contributor

Sadly, yup. I had originally targeted 2.0, then gave up on making it in, then @s1monw said we should jam it in there anyway. So I got what I had reviewed merged as quickly as I could. It's looking like it'll be reasonably quick to merge to 2.0 so I'll do that and then go fix master so it makes sense.

@nik9000 nik9000 added the v2.0.0 label Aug 25, 2015

=== Mapping changes
==== position_offset_gap
The default `position_offset_grap` is now 100. Indexes created in Elasticsearch

@xuzha

xuzha Aug 25, 2015

Contributor

position_offset_grap -> position_offset_gap

@nik9000

nik9000 Aug 25, 2015

Contributor

Ok - Will fix this one right quick.

Contributor

nik9000 commented Aug 25, 2015

And merged to 2.0. For my last trick on this pull request I'll fix master so that it has the same version range checks as 2.0. And then finally, finally, this is done.

Contributor

nik9000 commented Aug 25, 2015

And merged to 2.0. For my last trick on this pull request I'll fix master so that it has the same version range checks as 2.0. And then finally, finally, this is done.

Scratch that, I'll send that as another pull request.
