New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose multi-valued dates to scripts and document painless's date functions #22875

Merged
merged 8 commits into from Feb 2, 2017

Conversation

Projects
None yet
3 participants
@nik9000
Contributor

nik9000 commented Jan 30, 2017

  • Adds ScriptDocValues.Longs#getDates which can be used to iterate over all date doc values. Attempts to minimize the number of allocations involved without adding a lot of management burden.
  • Documents how date functions are exposed to painless. Other than getDates above, Painless had fairly decent, undocumented date support.
  • Adds some tests for painless and dates.
  • Improves the error message when a script attempts to write to a ScriptDocValues.
  • Removes some unneeded wrapping on getValues. Now that we have the nicer error messages it is more obvious why it is unneeded.

Closes #22162

@rjernst

This comment has been minimized.

Member

rjernst commented Jan 30, 2017

Instead of continuing down the rabbit hole of "dates as longs" and the getDate() type methods, can we actually add direct support to ScriptDocValues for date types? Yes, it will still be backed by a long as far as doc values are concerned, but then we would not need getDate(). I just don't want to continue expanding that hack, when it should be removed.

@nik9000

This comment has been minimized.

Contributor

nik9000 commented Jan 30, 2017

I'm all for making dates dates instead of longs. I think it'd break backwards compatibility but it wouldn't be too bad because it is already funny.

@clintongormley do you think we can break backwards compatibility for date fields in scripts in 5.3 or 5.4? If we can't then maybe we should do something like what this PR proposes and pull dates into their own ScriptDocValues type in 6.0?

@clintongormley

This comment has been minimized.

Member

clintongormley commented Jan 31, 2017

@clintongormley do you think we can break backwards compatibility for date fields in scripts in 5.3 or 5.4?

No

If we can't then maybe we should do something like what this PR proposes and pull dates into their own ScriptDocValues type in 6.0?

Is there nothing we can do to support both? Would be good to have deprecation warnings and give people a migration path in the same version. Even if it is a setting that they can change to enable to new behaviour.

@nik9000

This comment has been minimized.

Contributor

nik9000 commented Jan 31, 2017

Is there nothing we can do to support both?

I don't think there is any non-breaking way to do it. We could minimize the breakage by supporting getDate in the ScriptDocsValues for dates, but we'd still break because doc.some_date[0] would return a ReadableDateTime instead of a long. Presumably most people actually using the date as a date are calling getDate now anyway so it is unlikely to be a "big" break.

The other thing we'd lose is the ability to interpret longs as ReadableDateTimes on the fly. Right now users can call getDate on any number and get a date as millis since epoch out of it. We could give them an alternate way to do that with painless - exposing new MutableDateTime or something. I don't think this is super common though.

@clintongormley

This comment has been minimized.

Member

clintongormley commented Jan 31, 2017

We could minimize the breakage by supporting getDate in the ScriptDocsValues for dates,

That's what I was thinking of.

but we'd still break because doc.some_date[0] would return a ReadableDateTime instead of a long

Right. Hmmm I'm really torn here. It's a breaking change, but it would be a significant improvement to the user experience today. As you say, there's a good chance that this won't hurt many users, but I'm still reluctant to do this in a minor release, unless there's a setting (opt in? opt out) to control behaviour.

What about an opt in setting, and deprecation logging that points towards the setting so that users have the opportunity to migrate? I know that sounds complex, but what do you think?

@nik9000

This comment has been minimized.

Contributor

nik9000 commented Jan 31, 2017

What about an opt in setting, and deprecation logging that points towards the setting so that users have the opportunity to migrate? I know that sounds complex, but what do you think?

I think it sounds complex, yeah. What I'd prefer to do is:

  1. Implement something like this PR in 5.x and master.
  2. Build on top of that to make a ScriptDocValues for dates in master and call out the breaking change there. So 6.0 gets the breaking change but even that is fairly minimal. We'd leave getDate and deprecate it is master, to be removed in 7.0. That way upgrading to 6.0 is smooth.

So then:
In 6.0 the breaking change is that doc.some_date.value changes from a long to a ReadableDateTime. But users were likely using getDate or getDates anyway. Those still exist but are now deprecated.
In 7.0 the breaking change would be that getDate and getDates are removed.

@clintongormley

This comment has been minimized.

Member

clintongormley commented Jan 31, 2017

That sounds better to me - thank you

@nik9000

This comment has been minimized.

Contributor

nik9000 commented Jan 31, 2017

@rjernst, how do you feel about my plan in #22875 (comment) ?

@rjernst

This comment has been minimized.

Member

rjernst commented Jan 31, 2017

I'm not thrilled about the length of it, but I can live with it.

@nik9000

This comment has been minimized.

Contributor

nik9000 commented Jan 31, 2017

@rjernst would you like to review this PR then, as step 1 in the three step plan?

@rjernst

Looks good, I left a few suggestions.

core/src/main/java/org/elasticsearch/index/fielddata/AbstractScriptDocValues.java Outdated
/**
* Implements {@link AbstractList} to throw helpful error messages when someone attempts to modify the doc values list.
*/
public abstract class AbstractScriptDocValues<T> extends AbstractList<T> {

This comment has been minimized.

@rjernst

rjernst Jan 31, 2017

Member

Could this just be a private inner class for ScriptDocValues? I don't like it being public.

This comment has been minimized.

@rjernst

rjernst Jan 31, 2017

Member

And can you explain why this is more helpful than using unmodifiableList?

This comment has been minimized.

@nik9000

nik9000 Jan 31, 2017

Contributor

It returns better error messages when you try to modify it. The javadoc mentions that. There is no other reason for the class.

This comment has been minimized.

@nik9000

nik9000 Jan 31, 2017

Contributor

Since ScriptDocValues is an interface this can't be a private inner class. I can only be public. I could have made this private or package private but I didn't because I figured implementers may want to use it. We do let plugins add mappers.

It might make more sense to make ScriptDocValues an abstract class and add these?

This comment has been minimized.

@rjernst

rjernst Jan 31, 2017

Member

Sure, changing ScriptDocValues to an abstract class sounds fine to me.

core/src/main/java/org/elasticsearch/index/fielddata/AbstractScriptDocValues.java Outdated
/**
* Implements {@link AbstractList} to throw helpful error messages when someone attempts to modify the doc values list.
*/
public abstract class AbstractScriptDocValues<T> extends AbstractList<T> {

This comment has been minimized.

@rjernst

rjernst Jan 31, 2017

Member

Maybe call this AbstractUnmodifiableList?

core/src/main/java/org/elasticsearch/index/fielddata/ScriptDocValues.java Outdated
private final SortedNumericDocValues values;
private final MutableDateTime date = new MutableDateTime(0, DateTimeZone.UTC);
/**
* Values wrapped in {@link MutableDateTime}. Null by default an allocated on first usage so there is no cost if not used. We keep

This comment has been minimized.

@rjernst

rjernst Jan 31, 2017

Member

Nit: shorter max column for javadocs?

core/src/main/java/org/elasticsearch/index/fielddata/ScriptDocValues.java Outdated
dates = new MutableDateTime[ArrayUtil.oversize(max(1, values.count()), RamUsageEstimator.NUM_BYTES_OBJECT_REF)];
dates[index] = new MutableDateTime(value, DateTimeZone.UTC);
} else {
if (index >= dates.length) {

This comment has been minimized.

@rjernst

rjernst Jan 31, 2017

Member

Can we just keep it simple here and eagerly allocate the array to full size (and fill it completely) above? These should be small sizes (single digits). I don't think we need to optimize as if this array could be hundreds of elements.

@nik9000

This comment has been minimized.

Contributor

nik9000 commented Feb 1, 2017

@rjernst I pushed a few more patches. Would you like to take a look?

@rjernst

rjernst approved these changes Feb 1, 2017

/**
* Return a copy of the list of the values for the current document.
*/
List<T> getValues();
public final List<T> getValues() {

This comment has been minimized.

@rjernst

rjernst Feb 1, 2017

Member

I'm curious why we have getValues if all of these classes already extend list? Nothing to do with this PR, just something I noticed: since we are looking at breaking changes in the future, maybe this could be cleaned up too (either extend list, or have getValues())

This comment has been minimized.

@nik9000

nik9000 Feb 2, 2017

Contributor

I've opened #22919 to talk about it.

}
return;
}
if (values.count() > dates.length) {

This comment has been minimized.

@rjernst

rjernst Feb 1, 2017

Member

Can you add a note here that this would only happen when switching to a different document?

This comment has been minimized.

@nik9000

nik9000 Feb 2, 2017

Contributor

Sure!

nik9000 added some commits Jan 30, 2017

Add support for `dates` on `ScriptDocValues.Longs`
Implemented by wrapping an array of reused `ModuleDateTime`s that
we grow when needed. The `ModuleDateTime`s are reused when we
move to the next document.

Also improves the error message returned when attempting to modify
the `ScriptdocValues` and removes a couple of allocations.

Relates to #22162
Make ScriptDocValues an abstract class
Add implementations of all the List modification methods
that throw nice error messages.

@nik9000 nik9000 merged commit dacc150 into elastic:master Feb 2, 2017

1 of 2 checks passed

elasticsearch-ci Build finished.
Details
CLA Commit author has signed the CLA
Details
@nik9000

This comment has been minimized.

Contributor

nik9000 commented Feb 2, 2017

master: dacc150
5.x: 8c590ea

nik9000 added a commit that referenced this pull request Feb 2, 2017

Expose multi-valued dates to scripts and document painless's date fun…
…ctions (#22875)

Implemented by wrapping an array of reused `ModuleDateTime`s that
we grow when needed. The `ModuleDateTime`s are reused when we
move to the next document.

Also improves the error message returned when attempting to modify
the `ScriptdocValues`, removes a couple of allocations, and documents
that the date functions are available in Painless.

Relates to #22162

nik9000 added a commit that referenced this pull request Feb 6, 2017

Make dates be ReadableDateTimes in scripts (#22948)
Instead of longs. If you want millis since epoch you can call doc.date_field.value.millis.

Relates to #22875
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment