Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for Date in index #22

Merged
merged 4 commits into from
Feb 6, 2019
Merged

Allow for Date in index #22

merged 4 commits into from
Feb 6, 2019

Conversation

tdeo
Copy link
Contributor

@tdeo tdeo commented Jan 29, 2019

Logstash default option for ElasticSearch is to include the date in the index: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-index

This is quite useful for housekeeping in general, and I'd love to see a similar feature here, I made a quick amendment to support something similar to start a discussion about that, please let me know what you think

@mcollina
Copy link
Member

mcollina commented Feb 1, 2019

@delvedor can you please check this?

@mcollina
Copy link
Member

mcollina commented Feb 1, 2019

@tdeo do we need to change the docs? What's the impact of this change?

@tdeo
Copy link
Contributor Author

tdeo commented Feb 2, 2019

We should indeed mention this is the docs. I'll add it to the PR. I think the change should have no user impact as elasticsearch enforces downcase indexes in the first place

I'll run some benchmark to see the performance impact

@tdeo
Copy link
Contributor Author

tdeo commented Feb 2, 2019

About the performance, here's a quick sample, using the following test file: $ for i in seq 1 100000; do echo '{"a":1}' >> test.txt; done

Before this PR:

$ time cat test.txt | node pino-elasticsearch.js -i pino

real	0m10.554s
user	0m3.303s
sys	0m0.168s
$ time cat test.txt | node pino-elasticsearch.js -i pino

real	0m9.961s
user	0m3.229s
sys	0m0.160s
$ time cat test.txt | node pino-elasticsearch.js -i pino-%{DATE}

real	0m1.940s
user	0m1.675s
sys	0m0.058s

I suspect using the pino-%{DATE} index goes a lot faster because it's an invalid index name and ElasticSearch silently drops the entries.

After:

$ time cat test.txt | node pino-elasticsearch.js -i pino 

real	0m9.496s
user	0m2.448s
sys	0m0.112s
$ time cat test.txt | node pino-elasticsearch.js -i pino 

real	0m9.159s
user	0m2.635s
sys	0m0.114s
$ time cat test.txt | node pino-elasticsearch.js -i pino-%{DATE}

real	0m10.100s
user	0m3.118s
sys	0m0.178s
$ time cat test.txt | node pino-elasticsearch.js -i pino-%{DATE}

real	0m9.884s
user	0m3.232s
sys	0m0.140s

There doesn't seem to be any significant performance impact

Copy link
Collaborator

@delvedor delvedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as the index name respects the index name limitations looks good to me.
Date math indexes are supported and encouraged since they are handy!

For the scope of this module though, I would suggest exposing this as an option, since it is possible that someone doesn't want this functionality 👍

@tdeo
Copy link
Contributor Author

tdeo commented Feb 2, 2019

@delvedor in your opinion should we favor using YYYY.MM.DD format here instead of YYYY-MM-DD since that's the default ElasticSearch expects and the default that Logstash send ?

I'm not really sure that needs to be optional, nobody should be using "%{DATE}" in their index name, or in such case it's ignored by ElasticSearch as it doesn't respect the naming limitations.

@mcollina what's your opinion on this option ? do you think we need to support a pattern replacement or only a command line option --append-date-to-index ? And should we support multiple date formats ?

@delvedor
Copy link
Collaborator

delvedor commented Feb 2, 2019

@tdeo I've seen use them both. The important thing is to be consistent, otherwise, your search experience will not be that good ;)
I would recommend to expose it as an option and leave to the user this choice.

@mcollina
Copy link
Member

mcollina commented Feb 2, 2019

@delvedor I think this is already an option, so we should be ok.

@tdeo is this good to land?

@tdeo
Copy link
Contributor Author

tdeo commented Feb 3, 2019

@mcollina It should be good (I'm running the fork on a small webapp already and properly getting logs to ES), however, I'm not sure how much the test I added covers regarding the write and writev methods

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add a test for writev by copying what

test('store lines in bulk', { timeout }, (t) => {
is doing.

t.deepEqual(response._source, body, 'obj matches')
})
})
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move it to the end of the file?

@mcollina mcollina merged commit dc0ecb3 into pinojs:master Feb 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants