
Add filter by size #1612

Merged · 5 commits · Aug 10, 2021

Conversation

IndraGunawan
Contributor

Fixes #1151

Proposed Changes

  • Add `size` filter type to filter indices based on either their primary-shard size or their total size.
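A sketch of how the new filter might appear in a Curator action file. The parameter names (`size_threshold`, `size_behavior`, `threshold_behavior`) and the GB unit reflect this PR's implementation as later documented, but verify against the released docs for your Curator version:

```yaml
filters:
- filtertype: size
  size_threshold: 1.0          # threshold in gigabytes
  size_behavior: primary       # compare primary-shard size only; "total" includes replicas
  threshold_behavior: less_than  # match indices smaller than the threshold
```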

@untergeek
Member

Thank you for the pull request. I'm curious. This is an unusual feature, at least to me. What's the use case? Why filter by size? I can see some value in an edge case where I have a ton of tiny indices and I have re-indexed everything to a much larger index and want to purge the tiny indices. But outside of an edge case, I (me personally) can't see where this is useful. That doesn't mean I won't merge this PR, I'm just curious.

@IndraGunawan
Contributor Author

IndraGunawan commented Jul 22, 2021

Here is my use case. I'm using daily indices to store logs, and I set `number_of_shards` in the index template to a fixed value. On some days there are not many logs, so we can shrink the previous day's (H+1) indices whose size is below N GB down to a single shard. This helps reduce the number of shards that store only small amounts of data.

ES7's shard limit defaults to 1,000 per node (I know this value can be changed).

TIP: Small shards result in small segments, which increases overhead. Aim to keep the average shard size between at least a few GB and a few tens of GB. For use-cases with time-based data, it is common to see shards between 20GB and 40GB in size. (https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster)
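For reference, the per-node shard limit mentioned above is controlled by a cluster setting. My understanding is that it is a dynamic setting (adjustable via the `_cluster/settings` API); the default shown is the ES7 value:

```yaml
# elasticsearch.yml equivalent of the ES7 default (1,000 shards per data node);
# also settable dynamically through the _cluster/settings API
cluster.max_shards_per_node: 1000
```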

@untergeek
Member

Clever! I get it now! That's a clever way to address your need.

So, to follow up on my curiosity: why not use rollover indices and just let them grow to a size? Is there a compelling reason to keep strictly daily indices rather than letting them fill shards until they're between 20GB and 40GB?

@IndraGunawan
Contributor Author

IndraGunawan commented Jul 22, 2021

It's related to data retention. For example, we want to keep the last month of indices in the hot tier. After one month plus one day, we take a daily snapshot of each index, say `production-logs-20210622`, put it in a repository, and a snapshot retention policy removes snapshots older than one year.

Someday we may want to restore a snapshot (or specific indices) for a specific date, say 2021/05/01, so we restore the `production-logs-20210501` snapshot. By keeping daily indices we can be sure the data will be there, because each snapshot is filtered to that specific date's indices. I don't know how to do that with rollover indices, or maybe I'm missing something?

By adding this capability we can reduce the number of shards in hot-tier storage.
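Putting the pieces of this use case together, a Curator action file could combine the new size filter with the existing shrink action. This is a sketch under stated assumptions: the index prefix `production-logs-`, the 10 GB threshold, and the one-day age are illustrative values, and the shrink options shown (`shrink_node`, `number_of_shards`, `delete_after`) should be checked against the Curator shrink-action docs:

```yaml
actions:
  1:
    action: shrink
    description: >-
      Shrink yesterday's small daily log indices down to a single shard.
    options:
      shrink_node: DETERMINISTIC   # let Curator pick the target node
      number_of_shards: 1
      delete_after: True           # remove the source index once shrunk
    filters:
    - filtertype: pattern
      kind: prefix
      value: production-logs-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y%m%d'
      unit: days
      unit_count: 1
    - filtertype: size             # the filter added by this PR
      size_threshold: 10.0         # gigabytes
      size_behavior: primary
      threshold_behavior: less_than
```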

@IndraGunawan IndraGunawan changed the title Filter by size Add filter by size Jul 22, 2021
@IndraGunawan
Contributor Author

@untergeek any updates on this?

@untergeek
Member

Apologies. I had to be away from work for an unexpected funeral and an expected wedding. I should hopefully be able to address this shortly.

@untergeek untergeek merged commit c99abbf into elastic:master Aug 10, 2021
@IndraGunawan IndraGunawan deleted the filter_by_size branch August 10, 2021 18:02
TinLe pushed a commit to TinLe/curator that referenced this pull request Nov 16, 2021
* add size filtertype

* fix SyntaxWarning, revert untouched file

* fix wrong dictionary key

* add tests

* fix tests
untergeek added a commit that referenced this pull request Jan 31, 2023
7.x branch updates

  - This is a simplified release for `pip` and Docker only. It only works
    with Elasticsearch 7.x and is functionally identical to 5.8.4

  - Curator is now version locked. Curator v7.x will only work with Elasticsearch v7.x
  - Going forward, Curator will only be released as a tarball via GitHub, as an `sdist` or
    `wheel` via `pip` on PyPI, and to Docker Hub. There will no longer be RPM, DEB, or Windows
    ZIP releases. I am sorry if this is inconvenient, but one of the reasons the development and
    release cycle was delayed so long is because of how painfully difficult it was to do releases.
  - Curator will only work with Python 3.8+, and will more tightly follow the Python version releases.

  - Python 3.11.1 is fully supported, and all versions of Python 3.8+ should be fully supported.
  - Use `hatch` and `hatchling` for package building & publishing
  - Because of `hatch` and `pyproject.toml`, the release version still only needs to be tracked
    in `curator/_version.py`.
  - Maintain the barest `setup.py` for building a binary version of Curator for Docker using
    `cx_Freeze`.
  - Remove `setup.cfg`, `requirements.txt`, `MANIFEST.in`, and other files as functionality
    is now handled by `pyproject.toml` and doing `pip install .` to grab dependencies and
    install them. YAY! Only one place to track dependencies now!!!
  - Preliminarily updated the docs.
  - Migrate towards `pytest` and away from `nose` tests.
  - Scripts provided now that aid in producing and destroying Docker containers for testing. See
    `docker_test/scripts/create.sh`. To spin up a numbered version release of Elasticsearch, run
    `docker_test/scripts/create.sh 7.17.8`. It will download any necessary images, launch them,
    and tell you when it's ready, as well as provide `REMOTE_ES_SERVER` environment variables for
    testing the `reindex` action, e.g.
    `REMOTE_ES_SERVER="172.16.0.1:9201" pytest --cov=curator`. These tests are skipped
    if this value is not provided. To clean up afterwards, run `docker_test/scripts/destroy.sh`
  - Add filter by size feature. #1612 (IndraGunawan)
  - Update Elasticsearch client to 7.17.8
Successfully merging this pull request may close these issues.

[FEATURE] Add per index_space based filtertype