Add filter by size #1612
Thank you for the pull request. I'm curious. This is an unusual feature, at least to me. What's the use case? Why filter by size? I can see some value in an edge case where I have a ton of tiny indices and I have re-indexed everything to a much larger index and want to purge the tiny indices. But outside of an edge case, I (me personally) can't see where this is useful. That doesn't mean I won't merge this PR, I'm just curious.
Here is my use case. I'm using daily indices to store logs, and ES7 has a shard limit that defaults to 1,000 per node (I know we can change this value).

TIP: Small shards result in small segments, which increases overhead. Aim to keep the average shard size between at least a few GB and a few tens of GB. For use-cases with time-based data, it is common to see shards between 20GB and 40GB in size. (https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster)
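To make the arithmetic behind this use case concrete, here is a minimal sketch. The function names and the figures (365 days of retention, 5 primaries, 1 replica, 2 GB per day) are assumptions for illustration, not numbers from this thread. It assumes the per-node limit counts both primary and replica shards.

```python
# Hypothetical helpers: none of these names come from Curator or Elasticsearch.
DEFAULT_SHARD_LIMIT_PER_NODE = 1000  # the ES7 default mentioned above


def total_shards(days: int, primaries: int, replicas: int) -> int:
    """Shards held by `days` daily indices, counting replica copies."""
    return days * primaries * (1 + replicas)


def avg_primary_shard_gb(index_size_gb: float, primaries: int) -> float:
    """Average primary shard size for one index of the given size."""
    return index_size_gb / primaries


if __name__ == "__main__":
    # Assumed example: one year of daily indices, 5 primaries, 1 replica.
    shards = total_shards(days=365, primaries=5, replicas=1)
    print("total shards:", shards)
    print("nodes needed at default limit:",
          -(-shards // DEFAULT_SHARD_LIMIT_PER_NODE))  # ceiling division
    # Assumed 2 GB of logs per day: far below the 20GB-40GB guidance.
    print("avg primary shard size (GB):", avg_primary_shard_gb(2.0, 5))
```

With small daily volumes, the shard count grows with retention while each shard stays far below the size recommended in the quoted tip.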
Clever! I get it now! That's a clever way to address your need. So to follow up in my curiosity, why not use rollover indices and just let them grow to a size? Is there a compelling reason to keep absolute daily indices rather than just letting them fill shards until they're between 20GB and 40GB?
It's related to data retention. For example, we want to keep the last month of indices in the
Someday we may want to restore a snapshot (or specific indices) from a specific date, let's say 2021/05/01, so we can restore
By adding this capability we can reduce the number of shards in
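As a rough illustration of what filtering by size means in this discussion, here is a minimal sketch in plain Python. It is not the code from this pull request and does not use Curator's API; the function name, parameters, and index names are hypothetical, and sizes are assumed to be available as a name-to-bytes mapping.

```python
from typing import Dict, List

GB = 1024 ** 3  # bytes per gigabyte (binary)


def filter_by_size(
    index_sizes_bytes: Dict[str, int],
    threshold_gb: float,
    behavior: str = "greater_than",
) -> List[str]:
    """Return index names whose size is above or below a threshold.

    Hypothetical helper: `behavior` is either "greater_than" or "less_than".
    """
    if behavior not in ("greater_than", "less_than"):
        raise ValueError(f"unknown behavior: {behavior!r}")
    matched = []
    for name, size in index_sizes_bytes.items():
        size_gb = size / GB
        if behavior == "greater_than" and size_gb > threshold_gb:
            matched.append(name)
        elif behavior == "less_than" and size_gb < threshold_gb:
            matched.append(name)
    return sorted(matched)


if __name__ == "__main__":
    # Made-up daily indices and sizes, for illustration only.
    sizes = {
        "logs-2021.05.01": 200 * 1024 ** 2,  # about 0.2 GB
        "logs-2021.05.02": 30 * GB,
        "logs-2021.05.03": 512 * 1024 ** 2,  # 0.5 GB
    }
    print(filter_by_size(sizes, threshold_gb=1.0, behavior="less_than"))
```

Selecting the small indices this way is what would let an action (for example, a merge or a delete) target only them, which is the shard-reduction idea described above.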
@untergeek any updates on this?
Apologies. I had to be away from work for an unexpected funeral and an expected wedding. I should hopefully be able to address this shortly.
* add size filtertype
* fix SyntaxWarning, revert untouched file
* fix wrong dictionary key
* add tests
* fix tests
7.x branch updates

- This release is a simplified release for only `pip` and Docker. It only works with Elasticsearch 7.x and is functionally identical to 5.8.4
- Curator is now version locked. Curator v7.x will only work with Elasticsearch v7.x
- Going forward, Curator will only be released as a tarball via GitHub, as an `sdist` or `wheel` via `pip` on PyPI, and to Docker Hub. There will no longer be RPM, DEB, or Windows ZIP releases. I am sorry if this is inconvenient, but one of the reasons the development and release cycle was delayed so long is because of how painfully difficult it was to do releases.
- Curator will only work with Python 3.8+, and will more tightly follow the Python version releases.
- Python 3.11.1 is fully supported, and all versions of Python 3.8+ should be fully supported.
- Use `hatch` and `hatchling` for package building & publishing
- Because of `hatch` and `pyproject.toml`, the release version still only needs to be tracked in `curator/_version.py`.
- Maintain the barest `setup.py` for building a binary version of Curator for Docker using `cx_Freeze`.
- Remove `setup.cfg`, `requirements.txt`, `MANIFEST.in`, and other files as functionality is now handled by `pyproject.toml` and doing `pip install .` to grab dependencies and install them. YAY! Only one place to track dependencies now!!!
- Preliminarily updated the docs.
- Migrate towards `pytest` and away from `nose` tests.
- Scripts provided now that aid in producing and destroying Docker containers for testing. See `docker_test/scripts/create.sh`. To spin up a numbered version release of Elasticsearch, run `docker_test/scripts/create.sh 7.17.8`. It will download any necessary images, launch them, and tell you when it's ready, as well as provide `REMOTE_ES_SERVER` environment variables for testing the `reindex` action, e.g. `REMOTE_ES_SERVER="172.16.0.1:9201" pytest --cov=curator`. These tests are skipped if this value is not provided. To clean up afterwards, run `docker_test/scripts/destroy.sh`
- Add filter by size feature. #1612 (IndraGunawan)
- Update Elasticsearch client to 7.17.8
Fixes #1151
Proposed Changes