Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Painless error: ZonedDateTime and getDayOfMonth() #24383

Closed
paulohss opened this issue Apr 28, 2017 · 12 comments
Closed

Painless error: ZonedDateTime and getDayOfMonth() #24383

paulohss opened this issue Apr 28, 2017 · 12 comments
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache feedback_needed

Comments

@paulohss
Copy link

paulohss commented Apr 28, 2017

Elasticsearch version:
5.2.2

JVM version:
Java 8 - Update 121

OS version:
Win 10

Description of the problem including expected versus actual behavior:

I've stumbled upon an issue. The following aggregation is hiding some buckets:

GET /unittesttg1_tg1_fq1/_search?size=0
{
  "size": 0,
  "aggs": {
    "groupby": {
      "terms": {
        "script": {
          "inline": "ZonedDateTime.ofInstant(Instant.ofEpochMilli(doc['LAST_MODIFIED_DATE'].value), ZoneId.of('UTC')).getDayOfMonth()"
        }
      }
    }
  }
}

The indexes that I'm querying against has "Day of Month" raging from 1 to 31. However, it's been returned just a fraction of them.

The data sample can be found in my dropbox, I will grant access to it to the person that eventually come to investigate the issue.

Thanks

@clintongormley
Copy link

Please provide a complete recreation of the problem in this issue, and we'll be happy to help.

@paulohss
Copy link
Author

Hi,

Thanks for the reply.

Ive got a index with a date field that contains all possible "day of month" within it.

After running the mentioned Painless script, some of the days are not returned.
For example, "05", "02" don't show up in the response. Whereas "01", "31" appears normally.

The data sample I have has all days of month, from 1 to 31.

I can provide you the data sample (which exemplifies the issue) via Dropbox, however, I will need your email.

@nik9000
Copy link
Member

nik9000 commented Apr 28, 2017

I can provide you the data sample (which exemplifies the issue) via Dropbox, however, I will need your email.

My email is in my github profile and I can have a look at some point. You are better off providing a recreation bash script or a sequence of CONSOLE commands via gist. So long as it doesn't take hundreds of items.

@paulohss
Copy link
Author

paulohss commented May 1, 2017

All right, I've managed to create a small set of data that indeed replicates the issue.
Please, find the steps to reproduce it in the text file attached:

Es bug 24383.txt

@paulohss
Copy link
Author

paulohss commented May 2, 2017

Hi @nik9000, @clintongormley
Just wondering if you received the script with the data sample.

Thanks

@jimczi
Copy link
Contributor

jimczi commented May 2, 2017

@paulohss the terms aggregation returns the top 10 term sorted by frequency so the result you're seeing is expected. Instead you should maybe use a date_histogram: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html
If you don't know how to use it please open a new topic in the discuss forum:
https://discuss.elastic.co/c/elasticsearch and I'll be happy to help there.
We reserve github for verified bugs and new features.

@jimczi jimczi closed this as completed May 2, 2017
@paulohss
Copy link
Author

paulohss commented May 2, 2017

Sorry @jimczi , but it doesn't seem right.

If you try it with GetDayOfWeek() it works just fine.
It fails only when using DayOfMonth() and Hour().

Date Histogram is not helpful in this user case. I explain:


Currently Elastic provides a way to aggregate by "Quarter" and "Week of the Year", "Day of Month" through date histogram. Although it doesn't yield unique buckets as a response
It still does the job, however, when it comes to filtering, the task gets complicated.

For example, let's assume I've got the following data:

1 - 2016/01/12 - Peter
2 - 2013/05/22 - Charles
3 - 2005/07/11 - Mary
4 - 1999/11/09 - Jerry
5 - 1997/09/30 - Charles

If I were to aggregate this data and filter as following:

  A) Only the "Monday's" and "Thursday's"
  B) Regardless the year, month or day
  C) With time zone support

I'm able to simply run this query:

{
"size": 0,
"aggs": {
"groupby": {
"terms": {
"script": {
"inline": "ZonedDateTime.ofInstant(Instant.ofEpochMilli(doc['LAST_MODIFIED_DATE'].value), ZoneId.of('+05:00')).getDayOfWeek()"
}
}
}
},
"query": {
"bool": {
"filter": [
{
"script": {
"script": {
"inline": " (ZonedDateTime.ofInstant(Instant.ofEpochMilli(doc['LAST_MODIFIED_DATE'].value), ZoneId.of('+05:00')).getDayOfWeek() == DayOfWeek.TUESDAY) || (ZonedDateTime.ofInstant(Instant.ofEpochMilli(doc['LAST_MODIFIED_DATE'].value), ZoneId.of('+05:00')).getDayOfWeek() == DayOfWeek.THURSDAY) "
}
}
}
]
}
}
}

Unfortunately, there is no way to achieve this for "Quarter" and "Week of the Year" and"Day Of Month". At least, I couldn't find it anywhere in the official/unofficial docs and forums.

(Imagine try to filter all day "01" of each month throughout years worth of data using range filters....)

I was oriented by an Elastic forum collaborator (below) to open a request in here, so I'm doing it.

Thanks all and, please, few free to clarify it even further.

@jimczi
Copy link
Contributor

jimczi commented May 2, 2017

If you try it with GetDayOfWeek() it works just fine.
It fails only when using DayOfMonth() and Hour().

There are 7 days in the week so a terms aggregation of size 10 will return all days. For month or hours you'll need to change the size of your terms aggregation in order to ensure that all buckets are returned (since the default value of 10 is not enough):

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_size

@paulohss
Copy link
Author

paulohss commented May 2, 2017

Will try.

Thanks for the response.

@paulohss
Copy link
Author

paulohss commented May 3, 2017

Hi @jimczi ,

It worked all right, thanks for the guidance.
Final question (I've already thrown this one in the discuss forum, they've told come here):

For example, as we discussed, I'm able to aggregate by 'hour' with time zone support, using this script:

ZonedDateTime.ofInstant(Instant.ofEpochMilli(doc['LAST_MODIFIED_DATE'].value), ZoneId.of('UTC')).getHour()

Same for 'getDayOfMonth()' or 'getDayOfWeek()'.

Is there a similar way to do it for 'week of the year' and 'quarter'?

@tiangolo
Copy link

@paulohss That last comment ended up being quite helpful for what I'm needing... 😄

I think you can get a week of the year by taking the day of the year and dividing it by 7 (7 days a week). I'm doing something similar to get 15 minute buckets of the day.

I think something like this could work for that (I just tried it):

{
  "query": {
    "match_all": {}
  },
  "size": 0, 
  "aggs": {
    "by_hour": {
      "terms": {
        "size": 1000, 
        "script": {
          "lang": "painless",
          "source": "ZonedDateTime zonedDateTime = ZonedDateTime.ofInstant(Instant.ofEpochMilli(doc.created_at_desktop.value), ZoneId.of(params.tz)); return zonedDateTime.getDayOfYear() / 7",
          "params": {"tz": "-05:00"}
        }
      }
    }
  }
}

@paulohss
Copy link
Author

paulohss commented Sep 12, 2017 via email

@clintongormley clintongormley added :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache and removed :Plugin Lang Painless labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache feedback_needed
Projects
None yet
Development

No branches or pull requests

5 participants