Not-expected loss of performance during indexation (between v0.22.0 and v0.23.0) #1804

Closed
curquiza opened this issue Oct 12, 2021 · 19 comments
Labels: bug (Something isn't working as expected), performance (Related to the performance in terms of search/indexation speed or RAM/CPU/Disk consumption)

Comments

@curquiza (Member)

Following: #1798

It seems like there is a loss of performance during indexation between v0.22.0 and v0.23.0.

We need to investigate and fix this.
@MarinPostma has already started a first investigation into the part he updated between v0.22.0 and v0.23.0: #1798 (comment)

curquiza added the bug label on Oct 12, 2021
@VinceG commented Oct 12, 2021

We've upgraded to v0.23, and the time it takes to index has increased significantly. Since we upgraded from v0.20 we had to re-index everything, and it takes hours or days to index 100K records. The server load also increases to the point where it can no longer handle anything and has to be rebooted.

We also tried increasing the hardware (on AWS): on a server with 16 or 32 cores and 128GB of RAM the same thing happens. It just runs for a few minutes and then the load increases so much it can't do anything else.

[Screenshot: Screen Shot 2021-10-12 at 10 45 24 AM]

@curquiza (Member Author) commented Oct 12, 2021

Hello @VinceG! It is not normal that it takes that amount of time for 100K records.
Do you push your documents one by one?

We always recommend pushing your documents to MeiliSearch in batches and not one by one. The maximum payload size of MeiliSearch is 100Mb and can optionally be increased, which means most datasets can be pushed with a single call to the API.
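
For reference, here is a minimal sketch of what a batched push looks like over the HTTP API (Python with requests; the index name and the documents are made up for illustration):

    import requests

    MEILI_URL = "http://127.0.0.1:7700"
    INDEX = "products"  # hypothetical index name

    # One request carries the whole batch instead of one request per document.
    documents = [
        {"id": 1, "title": "First record"},
        {"id": 2, "title": "Second record"},
        # ... as many as fit under the configured payload size limit
    ]

    resp = requests.post(
        f"{MEILI_URL}/indexes/{INDEX}/documents",
        json=documents,
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())  # contains the id of the enqueued update, which can be polled for status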

Also, it is really weird that you have this issue on such a big machine. We have succeeded in adding millions of documents using a machine with 8GB of RAM.
Does your dataset only contain 100k documents?

@VinceG commented Oct 12, 2021

@curquiza we have more, but right now we just tried to index 100K and it's still going.

We use Laravel and Laravel Scout:
https://laravel.com/docs/8.x/scout#batch-import

It seems like it does batching, since we only have 550 updates enqueued.

It does send over 200 records at a time, but I don't think it reaches 100MB.

Any idea what I can do to troubleshoot?

Thank you.

@kdevan commented Oct 12, 2021

I have a document set of around 10 million; this is for e-commerce, where each document contains a title, a description, and then a handful of other short attributes like brand, merchant, and category. The titles and descriptions are no more than a few hundred characters.

First I tried on a server with 16 GB of RAM with a chunk size of 20,000 and a payload size limit of "256 Mb". That took about 8 hours to get through 4 million documents. After that, it took about an additional 24 hours to add another 2 million records and reach 6 million before I killed the process.

Now I'm trying on a server with 128 GB of RAM with a chunk size of 10,000,000 and a payload size limit of "30 Gb". It has been hanging on indexing for about 8 hours now.

My current plan is to lower the chunk size until I find something that works and then hope it can finish in a reasonable amount of time. Any suggestions are welcome.

I'm using Laravel with Scout and Postgres.

EDIT: 100,000 didn't work. 50,000 worked and gave the best performance so far: 6 million records in about 8 hours.
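
For anyone chunking outside of Scout, here is a rough sketch of the same pattern against the documents endpoint (Python; the index name and the document source are hypothetical, and the 50,000 chunk size mirrors the edit above):

    import requests

    MEILI_URL = "http://127.0.0.1:7700"
    INDEX = "products"   # hypothetical index name
    CHUNK_SIZE = 50_000  # the batch size that worked best in the tests above


    def chunked(documents, size):
        """Yield lists of at most `size` documents from any iterable."""
        batch = []
        for doc in documents:
            batch.append(doc)
            if len(batch) == size:
                yield batch
                batch = []
        if batch:
            yield batch


    def index_all(documents):
        """POST documents in fixed-size batches; each call enqueues one update."""
        for batch in chunked(documents, CHUNK_SIZE):
            # Each serialized batch must stay under the configured payload size limit.
            resp = requests.post(
                f"{MEILI_URL}/indexes/{INDEX}/documents",
                json=batch,
                timeout=600,  # large batches can take a while to upload
            )
            resp.raise_for_status()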

@bb commented Oct 13, 2021

Thanks for transferring this here, @curquiza.

Looking at the comments above, there seem to be multiple discussions ongoing. I think the primary goal here is preventing regressions, i.e. making sure that in newer versions, things stay at least as fast as they were in an older version.

Here's some overview from my data. It's not really comparable except within a row, due to specific machines, OS, etc., and in general due to the small sample size.

| Dataset / settings | 0.19.0 | 0.20.0 | 0.21.1 | 0.22.0 | 0.23.1 |
|---|---|---|---|---|---|
| ~6k records, 19MB JSON, Docker Mac, some settings regarding filterable attributes | | | | ~3 s | ~34 s |
| ~6k records, 19MB JSON, Docker Mac, default settings (nothing specified) | | | | ~37 s | ~180 s |
| ~6k records, 19MB JSON, Docker Linux, default settings (nothing specified) | ~6.1 s | ~5.7 s | ~26.7 s | ~20.8 s | ~16.1 s |
| ~800k records, Docker Linux | ~518 s | ~396 s | ~551 s | ~480 s | ~350 s |

So, there's a clear jump between 0.20 and 0.21, especially on small data sets. On the other hand, the service became much more responsive during indexing in 0.21+.

On Linux, Meilisearch is getting faster and faster, but on Docker for Mac (which is actually Linux running in a VM) it gets worse from 0.22 to 0.23, at least for rather small data sets.

The observation which led to this ticket is the jump from 0.22 to 0.23, which I observe in Docker for Mac but not on Linux. Maybe it's related to Docker for Mac's poor disk I/O performance? Maybe it's too little RAM (or even swapping; Docker Desktop is configured with 6GB) or too few cores (5 configured), whereas the same test on the Linux host has plenty of RAM and 12 cores.
Any hints on what I could look into further or what to try?

@alexchuin

I have the same problem here; the difference is huge...

I insert 10k documents:
v0.22.0: "POST /indexes/sirene/documents HTTP/1.1" 202 14 "-" "Symfony HttpClient/Curl" 0.504601
v0.23.1: "POST /indexes/sirene/documents HTTP/1.1" 202 15 "-" "Symfony HttpClient/Curl" 43.649841

0.5s versus 43.6s.
I'm here if you need some tests/info :)

curquiza changed the title "Loss of performance during indexation" → "Not expected loss of performance during indexation (between v0.22.0 and v0.23.0)" on Oct 13, 2021
curquiza changed the title "Not expected loss of performance during indexation (between v0.22.0 and v0.23.0)" → "Not-expected loss of performance during indexation (between v0.22.0 and v0.23.0)" on Oct 13, 2021
@ManyTheFish (Member)

Hello @alexchuin,

Do you have the problem with Docker for Mac too?

Which dataset do you use?

Hey @bb,

Do you run your Docker on a Mac M1? (I haven't managed to reproduce your bug on my Intel machine.)

Which dataset do you use?

Thanks to both of you for the help 😃

@alexchuin

@ManyTheFish
Meilisearch runs on Clever Cloud (L/XL instances).
It's a dataset with an id, a name, and two dates.

@bb commented Oct 13, 2021

@ManyTheFish no, I have an old Intel MacBook Pro. It's a private dataset which I'd need to clean up or anonymize before sharing.

@ManyTheFish (Member)

Hey @alexchuin @bb,
It's the indexing time that is impacted and not the uploading time, correct?

@alexchuin

@ManyTheFish for me it's the uploading time (when I POST 10k documents with the PHP API)

@VinceG commented Oct 13, 2021

FWIW, at least for us it was the indexing of the queued updates. On an empty index it took less than a second for the first few hundred records; as we sent more documents to index (500 or 1000 per batch), each update took longer than the previous one. So it was something like:

1 - 0.5s
2 - 0.9s
3 - 1.1s
4 - 2.3s
.....
.....
.....
100 - 43s
101 - 46s

etc...

It kept increasing in duration with each update. Also, since we send updates to delete or update individual records, those take a few seconds as well, even for just one document.

And to be clear, I was testing with an 8-core machine with 16GB of RAM as well as with one with 92 cores and 756GB, and it was exactly the same thing. The load was always high if we sent multiple batches; it seems like maybe nginx was the bottleneck, or something was going on when we sent multiple updates using queue workers. If I run the batch updates synchronously, it is slow but the load doesn't go up.

hope that helps.

@bb commented Oct 13, 2021

I just provided @ManyTheFish with my test dataset and my settings via Slack.

The test is a fresh db folder, Meilisearch running in Docker for Mac, and three calls:

  1. settings
  2. delete_all_documents
  3. add_documents (all at once)

In the last runs comparing 0.22.0 vs 0.23.1 on this specific dataset, I had 14 times slower indexing (1.6 s → 22.7 s) and 3x slower uploads (5 s → 17 s).

The indexing time can be observed by looking at the duration field of the DocumentsAddition request.
The uploading time can be seen by looking at the difference in enqueuedAt between ClearAll and DocumentsAddition. The calls are made directly one after the other in a Ruby script where the documents were already a prepared array of hashes which just needed to be serialized.
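
As an illustration, here is a rough sketch of pulling both numbers out of the pre-v0.24 updates endpoint (Python; the index name is made up, and the field names follow the status payloads pasted below):

    from datetime import datetime, timezone

    import requests

    MEILI_URL = "http://127.0.0.1:7700"
    INDEX = "my-index"  # hypothetical index name


    def parse_ts(ts):
        """Parse timestamps like 2021-10-13T20:26:42.306488448Z
        (nanosecond precision, Z suffix) into aware datetimes."""
        date_part, frac = ts.rstrip("Z").split(".")
        micros = frac[:6].ljust(6, "0")  # truncate nanoseconds to microseconds
        dt = datetime.fromisoformat(f"{date_part}.{micros}")
        return dt.replace(tzinfo=timezone.utc)


    updates = requests.get(f"{MEILI_URL}/indexes/{INDEX}/updates", timeout=30).json()
    clear_all = next(u for u in updates if u["type"]["name"] == "ClearAll")
    addition = next(u for u in updates if u["type"]["name"] == "DocumentsAddition")

    # Indexing time: reported directly on the DocumentsAddition update.
    print("indexing:", addition["duration"], "s")

    # Uploading time: gap between the two enqueuedAt timestamps,
    # since the two calls were fired back to back.
    gap = parse_ts(addition["enqueuedAt"]) - parse_ts(clear_all["enqueuedAt"])
    print("upload (approx):", gap.total_seconds(), "s")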

0.22.0 status updates

(note: update IDs are higher here because I had run the same settings call multiple times before)

    {
        "status": "processed",
        "updateId": 5,
        "type": {
            "name": "ClearAll"
        },
        "duration": 0.015,
        "enqueuedAt": "2021-10-13T20:24:41.363020307Z",
        "processedAt": "2021-10-13T20:24:41.386188054Z"
    },
    {
        "status": "processed",
        "updateId": 6,
        "type": {
            "name": "DocumentsAddition",
            "number": 5942
        },
        "duration": 1.605,
        "enqueuedAt": "2021-10-13T20:24:46.195683759Z",
        "processedAt": "2021-10-13T20:24:47.810306346Z"
    }

0.23.1 status updates

    {
        "status": "processed",
        "updateId": 1,
        "type": {
            "name": "ClearAll"
        },
        "duration": 0.005,
        "enqueuedAt": "2021-10-13T20:26:25.268266946Z",
        "processedAt": "2021-10-13T20:26:25.277046910Z"
    },
    {
        "status": "processed",
        "updateId": 2,
        "type": {
            "name": "DocumentsAddition",
            "number": 5942
        },
        "duration": 22.701999999999998,
        "enqueuedAt": "2021-10-13T20:26:42.306488448Z",
        "processedAt": "2021-10-13T20:27:05.022123113Z"
    }

By the way, the Ruby script's HTTP call to add_documents timed out in 0.23 but not in 0.22. My timeout is configured to be 10 s.

Net::ReadTimeout: Net::ReadTimeout with #<TCPSocket:(closed)>
from /Users/bb/.rvm/rubies/ruby-3.0.2/lib/ruby/3.0.0/net/protocol.rb:219:in `rbuf_fill'

@bb commented Oct 13, 2021

@VinceG I also observed consecutive updates on a non-empty DB becoming slower; however, that was already the case in earlier versions and AFAIK is not different between v0.22 and v0.23.

You should really try a bigger batch size - for me, it's no problem to send 1 million documents at once... it is a giant payload and takes several minutes. If you need to send smaller batches, you could set searchableAttributes and filterableAttributes to [] before adding documents and set them back to your specific settings afterwards.
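
A rough sketch of that workaround over the HTTP API (Python; the index name and the restored attribute lists are made up, and the method on the settings route is what the 0.2x series accepted):

    import requests

    MEILI_URL = "http://127.0.0.1:7700"
    INDEX = "products"  # hypothetical index name

    # Hypothetical settings to restore once all documents are in.
    REAL_SETTINGS = {
        "searchableAttributes": ["title", "description"],
        "filterableAttributes": ["brand", "category"],
    }


    def update_settings(settings):
        # On the 0.2x series the settings route accepted POST; newer versions use PATCH.
        resp = requests.post(f"{MEILI_URL}/indexes/{INDEX}/settings", json=settings, timeout=30)
        resp.raise_for_status()


    def add_documents(batch):
        resp = requests.post(f"{MEILI_URL}/indexes/{INDEX}/documents", json=batch, timeout=600)
        resp.raise_for_status()


    def bulk_import(batches):
        # 1. Empty the expensive settings so each batch indexes quickly.
        update_settings({"searchableAttributes": [], "filterableAttributes": []})
        # 2. Push all batches.
        for batch in batches:
            add_documents(batch)
        # 3. Restore the real settings; the documents are then re-indexed once with them.
        update_settings(REAL_SETTINGS)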

@bb commented Oct 14, 2021

The reason for the slowdown in 0.23 on Docker for Mac is the way bind mounts are handled. The regression I observed appears only when running in Docker and only when using a bind mount, i.e. a local path (as compared to a named volume).

This compose.yaml is slow (but wasn't slow in 0.22):

services:
  meilisearch:
    restart: "no"
    image: getmeili/meilisearch:v0.23.1
    volumes:
      - ./data-meili:/data.ms
    environment:
      MEILI_ENV: development
    ports:
      - 127.0.0.1:7700:7700

while this is fast:

services:
  meilisearch:
    restart: "no"
    image: getmeili/meilisearch:v0.23.1
    volumes:
      - meili:/data.ms
    environment:
      MEILI_ENV: development
    ports:
      - 127.0.0.1:7700:7700
volumes:
  meili:

Not specifying any volume is also fast.

From Docker's side, this happens because Docker for Mac manually tries to keep the internal volume and the host file system in sync, which causes a lot of copying. So I guess 0.22 just kept the files open, while 0.23 opens and closes the files, which keeps Docker for Mac very busy.

@ManyTheFish is already working on a fix and can probably explain more.

@curquiza (Member Author)

#1824 fixes part of the issue: the slowdown when using Docker with a mounted volume.
The loss of performance when loading documents is still present, and Marin is working on it.

curquiza added this to the v0.24.0 milestone on Oct 18, 2021
@MarinPostma (Contributor)

We have just merged some optimizations in #1847 that should make ingestion time more acceptable. It is not going to be as fast as before, but it is a serious improvement nonetheless.

curquiza added the performance label on Oct 26, 2021
curquiza removed this from the v0.24.0 milestone on Oct 26, 2021
@mgralikowski

The same observation with 0.24.0 and a DigitalOcean droplet.

The first new documents take less than a second. Batch inserts take ~3 seconds. When we have more data (100+), every next insert takes more and more time; for now, it is more than 1 minute :( Any updates about it? In this situation we will be forced to change the search driver, because users can't wait minutes to see their records.

There is no difference between the 2 and 4 CPU DO droplets.

@curquiza (Member Author) commented Feb 2, 2022

Hello everyone reading this issue.

The issue regarding the loss of performance between v0.22.0 and v0.23.0 has been fixed. Since that is the specific purpose of this issue, I am closing it.

However, for anyone having issues with indexation performance in general, please refer to this discussion. Your feedback will be really appreciated 🙂

curquiza closed this as completed on Feb 2, 2022