Looking for an elasticsearch alternative #4201
Description
Is your feature request related to a problem? Please describe.
The raw data size (in NDJSON) that our Elasticsearch cluster currently handles is 2.5 TB.
We are a small company without in-house Elasticsearch expertise, so our cluster has had numerous operational problems and we've experienced several downtimes over the past year, mainly due to poor configuration on our part.
I was impressed by the ease of use of Quickwit - I was able to deploy a cluster easily, and this is the main reason I am looking to switch from Elasticsearch to Quickwit. We can easily set up a cluster of 3 machines with 512 GB of RAM each.
The query speeds for less than 10_000 rows are also very promising.
The problem is that we can't retrieve more than 10_000 records at a time.
I was able to comment out the check at https://github.com/quickwit-oss/quickwit/blob/969a96175139f5f2608b4cd6aa9b0a2b47706f6e/quickwit/quickwit-search/src/root.rs#L364C31-L364C31 and rebuild Quickwit.
The problem is that I still can't retrieve 500_000 results, for instance. I tried fetching 10_000 results at a time, using start_offset to get the next 10_000, but the memory required to do so is larger than the actual size of the dataset. Moreover, I stopped the query after 2 minutes, as the time Quickwit needed to retrieve the results was too long anyway.
Here is the very naive code I used:
```javascript
const axios = require('axios');

async function fetchAll() {
  let returnedHits = 0;
  let maxHits = 100_000;
  let offset = 0;
  let hits;
  do {
    const response = await axios.get(
      'http://127.0.0.1:7280/api/v1/abl/search?query=*&max_hits=10000&start_offset=' + offset
    );
    // num_hits is the total match count reported by Quickwit.
    maxHits = response.data.num_hits;
    hits = response.data.hits;
    returnedHits += hits.length;
    offset += hits.length;
    // Stop when a page comes back empty, otherwise this loops forever.
  } while (hits.length > 0 && returnedHits < maxHits);
}
```
Describe the solution you'd like
I would like to be able to retrieve 1M or 10M records at a time for a query, using something like Elasticsearch's scroll API.
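A scroll-style loop might look like the sketch below. The `/api/v1/_elastic` path, the `scroll=1m` parameter, and the response shape are assumptions borrowed from Elasticsearch's scroll API, not confirmed Quickwit behavior; it uses Node's built-in `fetch` rather than axios.

```javascript
// Sketch only: assumes an Elasticsearch-compatible scroll endpoint
// (the `/api/v1/_elastic` paths and response shape are assumptions).
async function scrollAll(baseUrl, index, query, pageSize = 10000) {
  const all = [];
  // Open a scroll context; `scroll=1m` keeps it alive for one minute per page.
  let res = await fetch(
    `${baseUrl}/api/v1/_elastic/${index}/_search?scroll=1m&size=${pageSize}&q=${encodeURIComponent(query)}`
  );
  let body = await res.json();
  let hits = body.hits.hits;
  while (hits.length > 0) {
    all.push(...hits);
    // Fetch the next page using the scroll id returned by the previous call.
    res = await fetch(`${baseUrl}/api/v1/_elastic/_search/scroll`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ scroll: '1m', scroll_id: body._scroll_id }),
    });
    body = await res.json();
    hits = body.hits.hits;
  }
  return all;
}
```

Compared with start_offset paging, each scroll page would continue from a server-side cursor, so the cost per page stays flat instead of growing with the offset.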
For each user of our platform, I would also like to store the set of records they have already processed and skip those on the next processing request. If I were able to scroll through 1M records, I could filter out the already-processed records by keeping another data source - I don't expect Quickwit to handle this, although it would be amazing (if time permits, I would like to look into and experiment with tenancy for this).
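The client-side filtering described above could be a minimal set lookup per batch; the sketch below assumes each hit carries a unique `record_id` field (a hypothetical name).

```javascript
// Sketch: drop hits this user has already processed, assuming a unique
// `record_id` field on each hit (the field name is an assumption).
function filterUnprocessed(hits, processedIds) {
  // processedIds is a Set of ids this user has already handled.
  return hits.filter((hit) => !processedIds.has(hit.record_id));
}

// Example: mark whatever survives a pass as processed for the next request.
const processed = new Set(['a']);
const batch = [{ record_id: 'a' }, { record_id: 'b' }];
const fresh = filterUnprocessed(batch, processed);
// fresh now contains only the record with id 'b'.
fresh.forEach((hit) => processed.add(hit.record_id));
```

The `Set` would live in whatever external store tracks per-user state; only the filtering step needs the scrolled batch.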
Describe alternatives you've considered
Maybe a low-level Rust implementation could solve this issue, but I don't understand Quickwit well enough to do this myself. I am also not a Rust developer.
Additional context
I would consider consulting services to port our Elasticsearch cluster to Quickwit, if that is something you are interested in.