Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose multi-valued dates to scripts and document painless's date functions #22875

Merged
merged 8 commits into from Feb 2, 2017

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Jan 30, 2017

  • Adds ScriptDocValues.Longs#getDates which can be used to iterate over all date doc values. Attempts to minimize the number of allocations involved without adding a lot of management burden.
  • Documents how date functions are exposed to painless. Other than getDates above, Painless had fairly decent, undocumented date support.
  • Adds some tests for painless and dates.
  • Improves the error message when a script attempts to write to a ScriptDocValues.
  • Removes some unneeded wrapping on getValues. Now that we have the nicer error messages it is more obvious why it is unneeded.

Closes #22162

@nik9000 nik9000 added :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache review v5.3.0 v6.0.0-alpha1 labels Jan 30, 2017
@rjernst
Copy link
Member

rjernst commented Jan 30, 2017

Instead of continuing down the rabbit hole of "dates as longs" and the getDate() type methods, can we actually add direct support to ScriptDocValues for date types? Yes, it will still be backed by a long as far as doc values are concerned, but then we would not need getDate(). I just don't want to continue expanding that hack, when it should be removed.

@nik9000
Copy link
Member Author

nik9000 commented Jan 30, 2017

I'm all for making dates dates instead of longs. I think it'd break backwards compatibility but it wouldn't be too bad because it is already funny.

@clintongormley do you think we can break backwards compatibility for date fields in scripts in 5.3 or 5.4? If we can't then maybe we should do something like what this PR proposes and pull dates into their own ScriptDocValues type in 6.0?

@clintongormley
Copy link

clintongormley commented Jan 31, 2017

@clintongormley do you think we can break backwards compatibility for date fields in scripts in 5.3 or 5.4?

No

If we can't then maybe we should do something like what this PR proposes and pull dates into their own ScriptDocValues type in 6.0?

Is there nothing we can do to support both? Would be good to have deprecation warnings and give people a migration path in the same version. Even if it is a setting that they can change to enable to new behaviour.

@nik9000
Copy link
Member Author

nik9000 commented Jan 31, 2017

Is there nothing we can do to support both?

I don't think there is any non-breaking way to do it. We could minimize the breakage by supporting getDate in the ScriptDocsValues for dates, but we'd still break because doc.some_date[0] would return a ReadableDateTime instead of a long. Presumably most people actually using the date as a date are calling getDate now anyway so it is unlikely to be a "big" break.

The other thing we'd lose is the ability to interpret longs as ReadableDateTimes on the fly. Right now users can call getDate on any number and get a date as millis since epoch out of it. We could give them an alternate way to do that with painless - exposing new MutableDateTime or something. I don't think this is super common though.

@clintongormley
Copy link

We could minimize the breakage by supporting getDate in the ScriptDocsValues for dates,

That's what I was thinking of.

but we'd still break because doc.some_date[0] would return a ReadableDateTime instead of a long

Right. Hmmm I'm really torn here. It's a breaking change, but it would be a significant improvement to the user experience today. As you say, there's a good chance that this won't hurt many users, but I'm still reluctant to do this in a minor release, unless there's a setting (opt in? opt out) to control behaviour.

What about an opt in setting, and deprecation logging that points towards the setting so that users have the opportunity to migrate? I know that sounds complex, but what do you think?

@nik9000
Copy link
Member Author

nik9000 commented Jan 31, 2017

What about an opt in setting, and deprecation logging that points towards the setting so that users have the opportunity to migrate? I know that sounds complex, but what do you think?

I think it sounds complex, yeah. What I'd prefer to do is:

  1. Implement something like this PR in 5.x and master.
  2. Build on top of that to make a ScriptDocValues for dates in master and call out the breaking change there. So 6.0 gets the breaking change but even that is fairly minimal. We'd leave getDate and deprecate it is master, to be removed in 7.0. That way upgrading to 6.0 is smooth.

So then:
In 6.0 the breaking change is that doc.some_date.value changes from a long to a ReadableDateTime. But users were likely using getDate or getDates anyway. Those still exist but are now deprecated.
In 7.0 the breaking change would be that getDate and getDates are removed.

@clintongormley
Copy link

That sounds better to me - thank you

@nik9000
Copy link
Member Author

nik9000 commented Jan 31, 2017

@rjernst, how do you feel about my plan in #22875 (comment) ?

@rjernst
Copy link
Member

rjernst commented Jan 31, 2017

I'm not thrilled about the length of it, but I can live with it.

@nik9000
Copy link
Member Author

nik9000 commented Jan 31, 2017

@rjernst would you like to review this PR then, as step 1 in the three step plan?

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I left a few suggestions.

/**
* Implements {@link AbstractList} to throw helpful error messages when someone attempts to modify the doc values list.
*/
public abstract class AbstractScriptDocValues<T> extends AbstractList<T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just be a private inner class for ScriptDocValues? I don't like it being public.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And can you explain why this is more helpful than using unmodifiableList?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returns better error messages when you try to modify it. The javadoc mentions that. There is no other reason for the class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ScriptDocValues is an interface this can't be a private inner class. I can only be public. I could have made this private or package private but I didn't because I figured implementers may want to use it. We do let plugins add mappers.

It might make more sense to make ScriptDocValues an abstract class and add these?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, changing ScriptDocValues to an abstract class sounds fine to me.

/**
* Implements {@link AbstractList} to throw helpful error messages when someone attempts to modify the doc values list.
*/
public abstract class AbstractScriptDocValues<T> extends AbstractList<T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this AbstractUnmodifiableList?


private final SortedNumericDocValues values;
private final MutableDateTime date = new MutableDateTime(0, DateTimeZone.UTC);
/**
* Values wrapped in {@link MutableDateTime}. Null by default an allocated on first usage so there is no cost if not used. We keep
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: shorter max column for javadocs?

dates = new MutableDateTime[ArrayUtil.oversize(max(1, values.count()), RamUsageEstimator.NUM_BYTES_OBJECT_REF)];
dates[index] = new MutableDateTime(value, DateTimeZone.UTC);
} else {
if (index >= dates.length) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just keep it simple here and eagerly allocate the array to full size (and fill it completely) above? These should be small sizes (single digits). I don't think we need to optimize as if this array could be hundreds of elements.

@nik9000
Copy link
Member Author

nik9000 commented Feb 1, 2017

@rjernst I pushed a few more patches. Would you like to take a look?


/**
* Return a copy of the list of the values for the current document.
*/
List<T> getValues();
public final List<T> getValues() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why we have getValues if all of these classes already extend list? Nothing to do with this PR, just something I noticed: since we are looking at breaking changes in the future, maybe this could be cleaned up too (either extend list, or have getValues())

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened #22919 to talk about it.

}
return;
}
if (values.count() > dates.length) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a note here that this would only happen when switching to a different document?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

Implemented by wrapping an array of reused `ModuleDateTime`s that
we grow when needed. The `ModuleDateTime`s are reused when we
move to the next document.

Also improves the error message returned when attempting to modify
the `ScriptdocValues` and removes a couple of allocations.

Relates to elastic#22162
Add implementations of all the List modification methods
that throw nice error messages.
@nik9000 nik9000 merged commit dacc150 into elastic:master Feb 2, 2017
@nik9000
Copy link
Member Author

nik9000 commented Feb 2, 2017

master: dacc150
5.x: 8c590ea

nik9000 added a commit that referenced this pull request Feb 2, 2017
…ctions (#22875)

Implemented by wrapping an array of reused `ModuleDateTime`s that
we grow when needed. The `ModuleDateTime`s are reused when we
move to the next document.

Also improves the error message returned when attempting to modify
the `ScriptdocValues`, removes a couple of allocations, and documents
that the date functions are available in Painless.

Relates to #22162
nik9000 added a commit that referenced this pull request Feb 6, 2017
Instead of longs. If you want millis since epoch you can call doc.date_field.value.millis.

Relates to #22875
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement v5.3.0 v6.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants