
[feature request] index name logstash pattern. #2

Closed
jeesim2 opened this issue May 30, 2016 · 6 comments

jeesim2 commented May 30, 2016

Hello.

Thanks for developing such a cool utility.
I have moved from Logstash to esbulk.
By the way, I am missing a small piece of functionality in this utility.

We usually have log files that contain a date field, and we create indices with the Logstash index pattern (e.g. logstash-2016.05.30).
But in some (or many) cases, the dates in a single file can be spread over several days, particularly when a local-date-based rolling strategy is enforced.

For example, event_20160530.json may contain these lines:

...
{"time":"2016-05-30T00:00:00.000+0900"} // <--- log 1 ( 05/29 15:00 in UTC )
{"time":"2016-05-30T10:00:00.000+0900"} // <--- log 2 ( 05/30 01:00 in UTC )
...

However, Elasticsearch and Kibana force conversion to UTC.
So log 1 has to go into logstash-2016.05.29 and log 2 into logstash-2016.05.30.

I know it is not a simple problem,
but could you please consider a feature something like this?

-index logstash-{yyyy.MM.dd} -date_field time -date_field_pattern yyyy-MM-dd'T'hh:mm:ss.SSSZ
miku (Owner) commented May 30, 2016

Thanks for the suggestion. Indeed, I haven't had such a use case, but it looks interesting. If I understand correctly, a single file's records should end up in different indices, based on a certain property of each record.

I'm not sure yet whether this is a genuine fast-indexing use case or more of a preprocessing thing (split the input file on the correct boundaries, then index). Let me think about it.

jeesim2 (Author) commented May 31, 2016

@miku Thanks for the response.
As you know, logstash-input-file was not designed to read large, complete files:
logstash-plugins/logstash-input-file#78
So in many cases esbulk can be an alternative, including for me.

> whether this is a genuine fast indexing use case

I agree that bulk processing's first goal is fast indexing.

> split input file on correct boundaries, then index

I have also considered doing some preprocessing to split the input into per-date files, but since I would have to do that every day, it is a bit of a burden.

Cheers,
Jihun

miku (Owner) commented Jun 22, 2016

Just a quick update: I implemented a first version of dynamic date support - here's a short screencast.

For a given file like this:

$ cat fixtures/dynamic-1.ldj
{"time":"2016-05-01", "name": "a"}
{"time":"2016-05-02", "name": "b"}
{"time":"2016-05-03", "name": "c"}

One can use a Go-style date layout to specify a date field and a date field layout:

$ esbulk -verbose -index test-{2006-01-02} -date-field time \
         -date-field-layout 2006-01-02 fixtures/dynamic-1.ldj

The result would be three indices: test-2016-05-01, test-2016-05-02, test-2016-05-03 with one document each.
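The core idea can be sketched in a few lines of Go. This is not esbulk's actual implementation, just a minimal illustration of how the dynamic index name might be derived; the function name indexNameFor and the {…} placeholder handling are assumptions for this sketch:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// indexNameFor derives the target index for one JSON document: it reads the
// configured date field, parses its value with the given Go reference-time
// layout, and substitutes the formatted date into the {...} placeholder of
// the index pattern, e.g. "test-{2006-01-02}" -> "test-2016-05-01".
func indexNameFor(doc []byte, pattern, field, layout string) (string, error) {
	var m map[string]interface{}
	if err := json.Unmarshal(doc, &m); err != nil {
		return "", err
	}
	raw, ok := m[field].(string)
	if !ok {
		return "", fmt.Errorf("missing or non-string date field: %s", field)
	}
	t, err := time.Parse(layout, raw)
	if err != nil {
		return "", err
	}
	open := strings.Index(pattern, "{")
	end := strings.Index(pattern, "}")
	if open < 0 || end < open {
		return pattern, nil // no placeholder: static index name
	}
	return pattern[:open] + t.Format(pattern[open+1:end]) + pattern[end+1:], nil
}

func main() {
	doc := []byte(`{"time":"2016-05-01", "name": "a"}`)
	name, err := indexNameFor(doc, "test-{2006-01-02}", "time", "2006-01-02")
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // test-2016-05-01
}
```

Note that the placeholder between the braces is itself a Go reference-time layout, so the same string serves as both the parse layout (via -date-field-layout) and the output format.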

Another example:

$ cat fixtures/dynamic-2.ldj
{"time":"2016-05-30T10:00:00.000+0900", "name": "a"}
{"time":"2016-05-30T00:00:00.000+0900", "name": "b"}

$ esbulk -verbose -index test-{2006-01-02} -date-field time \
         -date-field-layout 2006-01-02T15:04:05Z0700 fixtures/dynamic-2.ldj

The result would be two indices, test-2016-05-29 and test-2016-05-30, due to the conversion to UTC.


Just a few points that make this feature somewhat difficult, at least with the current overall implementation:

  • For the dynamic index feature, we have to parse each document as JSON. With a slower Elasticsearch this might not be a big issue, but it should be benchmarked.
  • What to do about time zones. As implemented now, a value like 2016-05-30T10:00:00.000+0900 will be parsed as a date with a time zone. As I understand it, it would be better for Kibana to convert these dates to UTC. Maybe there is a need for another option, like -convert-to-utc or something like that.

Here's another screencast, showing UTC conversion.

The code for all this is in https://github.com/miku/esbulk/tree/issue-1, feel free to check it out and test it. I am still a bit hesitant to include this, but if you think it would be useful, I will certainly consider it.

miku (Owner) commented Sep 8, 2016

I'm afraid I cannot implement this at the moment. It would add yet another two flags and I cannot think of an easy way to support this for now.

miku closed this as completed on Sep 8, 2016
jeesim2 (Author) commented Sep 18, 2016

@miku thank you for the feedback!

miku (Owner) commented Jul 12, 2018

For the sake of completeness: there is a processor type that can route documents based on a date:

> The purpose of this processor is to point documents to the right time-based index based on a date or timestamp field in a document by using the date math index name support.
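For anyone landing here later, a minimal ingest pipeline using that date_index_name processor might look roughly like this (registered via PUT _ingest/pipeline/route-by-time; the field name comes from this thread, while the parameter values are illustrative and should be checked against the Elasticsearch documentation for your version):

```json
{
  "description": "Route documents into daily logstash-* indices by their time field",
  "processors": [
    {
      "date_index_name": {
        "field": "time",
        "index_name_prefix": "logstash-",
        "date_rounding": "d",
        "date_formats": ["yyyy-MM-dd'T'HH:mm:ss.SSSZ"],
        "index_name_format": "yyyy.MM.dd"
      }
    }
  ]
}
```

Documents indexed with this pipeline are rewritten to target a date-math index name, so a record stamped 2016-05-30T00:00:00.000+0900 would be routed to the UTC day logstash-2016.05.29.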
