Looking for an elasticsearch alternative #4201
Description
Is your feature request related to a problem? Please describe.
The raw data size (in NDJSON) that our Elasticsearch cluster currently handles is 2.5 TB.
We are a small company without in-house Elasticsearch expertise, so our cluster has had numerous operational problems and we've experienced several downtimes over the past year, mainly due to poor configuration on our part.
I was impressed by the ease of use of Quickwit - I was able to deploy a cluster easily, and this is the main reason I am looking to switch from Elasticsearch to Quickwit. We can easily set up a cluster of 3 machines with 512 GB of RAM each.
The query speeds for less than 10_000 rows are also very promising.
The problem is that we can't retrieve more than 10_000 records at a time.
I was able to comment out the check at https://github.com/quickwit-oss/quickwit/blob/969a96175139f5f2608b4cd6aa9b0a2b47706f6e/quickwit/quickwit-search/src/root.rs#L364C31-L364C31 and rebuild Quickwit.
The problem is that I still can't retrieve 500_000 results, for instance. I tried fetching 10_000 results at a time, using start_offset to get the next 10_000, but the memory required to do so is larger than the actual size of the dataset. Moreover, I stopped the query after 2 minutes, as the time Quickwit needed to retrieve the results was too long anyway.
Here is the very naive code I used:
```javascript
const axios = require('axios');

async function fetchAll() {
  let returnedHits = 0;
  let maxHits = 100_000;
  let offset = 0;
  let hits;
  do {
    const response = await axios.get(
      'http://127.0.0.1:7280/api/v1/abl/search?query=*&max_hits=10000&start_offset=' + offset
    );
    // num_hits is the total match count reported by Quickwit.
    maxHits = response.data.num_hits;
    hits = response.data.hits;
    returnedHits += hits.length;
    offset += hits.length;
    // Stop when a page comes back empty, otherwise this loops forever.
  } while (hits.length > 0 && returnedHits < maxHits);
}
```
Describe the solution you'd like
I would like to be able to retrieve 1M or 10M records at a time for a query, using something like Elasticsearch's scroll API.
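A scroll-style loop might look like the sketch below. The `/api/v1/_elastic` path, the `scroll=1m` parameter, and the response shape are assumptions borrowed from Elasticsearch's scroll API, not confirmed Quickwit behavior; it uses Node's built-in `fetch` rather than axios.

```javascript
// Sketch only: assumes an Elasticsearch-compatible scroll endpoint
// (the `/api/v1/_elastic` paths and response shape are assumptions).
async function scrollAll(baseUrl, index, query, pageSize = 10000) {
  const all = [];
  // Open a scroll context; `scroll=1m` keeps it alive for one minute per page.
  let res = await fetch(
    `${baseUrl}/api/v1/_elastic/${index}/_search?scroll=1m&size=${pageSize}&q=${encodeURIComponent(query)}`
  );
  let body = await res.json();
  let hits = body.hits.hits;
  while (hits.length > 0) {
    all.push(...hits);
    // Fetch the next page using the scroll id returned by the previous call.
    res = await fetch(`${baseUrl}/api/v1/_elastic/_search/scroll`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ scroll: '1m', scroll_id: body._scroll_id }),
    });
    body = await res.json();
    hits = body.hits.hits;
  }
  return all;
}
```

Compared with start_offset paging, each scroll page would continue from a server-side cursor, so the cost per page stays flat instead of growing with the offset.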
For each user of our platform, I would also like to store the set of records they have already processed and skip those on the next processing request. If I were able to scroll through 1M records, I could filter out the already-processed records by keeping another data source - I don't expect Quickwit to handle this, although it would be amazing (if time permits, I would like to look into and experiment with tenancy for this).
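The client-side filtering described above could be a minimal set lookup per batch; the sketch below assumes each hit carries a unique `record_id` field (a hypothetical name).

```javascript
// Sketch: drop hits this user has already processed, assuming a unique
// `record_id` field on each hit (the field name is an assumption).
function filterUnprocessed(hits, processedIds) {
  // processedIds is a Set of ids this user has already handled.
  return hits.filter((hit) => !processedIds.has(hit.record_id));
}

// Example: mark whatever survives a pass as processed for the next request.
const processed = new Set(['a']);
const batch = [{ record_id: 'a' }, { record_id: 'b' }];
const fresh = filterUnprocessed(batch, processed);
// fresh now contains only the record with id 'b'.
fresh.forEach((hit) => processed.add(hit.record_id));
```

The `Set` would live in whatever external store tracks per-user state; only the filtering step needs the scrolled batch.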
Describe alternatives you've considered
Maybe a low-level Rust implementation could solve this issue, but I don't understand Quickwit well enough to do this myself. I am also not a Rust developer.
Additional context
I would consider consulting services to port our Elasticsearch cluster to Quickwit, if that is something you are interested in.