Not-expected loss of performance during indexation (between v0.22.0 and v0.23.0) #1804

Closed
curquiza opened this issue Oct 12, 2021 · 19 comments
Labels: bug (Something isn't working as expected), performance (Related to the performance in terms of search/indexation speed or RAM/CPU/Disk consumption)

Comments

@curquiza (Member)

Following: #1798

It seems like there is a loss of performance during indexation between v0.22.0 and v0.23.0.

We need to investigate and fix this.
@MarinPostma has already started a first investigation into the part he updated between v0.22.0 and v0.23.0: #1798 (comment)

curquiza added the bug label on Oct 12, 2021
@VinceG commented Oct 12, 2021

We've upgraded to v0.23, and the time it takes to index has increased significantly. Since we upgraded from v0.20 we had to re-index everything, and it takes hours or days to index 100K records. The server load also increases to the point where it can no longer handle anything and has to be rebooted.

We also tried increasing the hardware (on AWS): on a server with 16 or 32 cores and 128GB of RAM the same thing happens. It just runs for a few minutes and then the load increases so much it can't do anything else.

[Screenshot: Screen Shot 2021-10-12 at 10 45 24 AM]

@curquiza (Member Author) commented Oct 12, 2021

Hello @VinceG! It is not normal that it takes that amount of time for 100K records.
Do you push your documents one by one?

We always recommend pushing your documents to MeiliSearch in batches and not one by one. The maximum payload size of MeiliSearch is 100Mb and can optionally be increased, which means most datasets can be pushed with a single call to the API.
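
For reference, here is a minimal sketch of what a batched push looks like over the HTTP API (Python with requests; the index name and the documents are made up for illustration):

    import requests

    MEILI_URL = "http://127.0.0.1:7700"
    INDEX = "products"  # hypothetical index name

    # One request carries the whole batch instead of one request per document.
    documents = [
        {"id": 1, "title": "First record"},
        {"id": 2, "title": "Second record"},
        # ... as many as fit under the configured payload size limit
    ]

    resp = requests.post(
        f"{MEILI_URL}/indexes/{INDEX}/documents",
        json=documents,
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())  # contains the id of the enqueued update, which can be polled for status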

Also, it is really weird that you have this issue on such a big machine. We have succeeded in adding millions of documents using a machine with 8GB of RAM.
Does your dataset only contain 100k documents?

@VinceG commented Oct 12, 2021

@curquiza we have more, but right now we just tried to index 100K and it's still going.

We use Laravel and Laravel Scout:
https://laravel.com/docs/8.x/scout#batch-import

It seems like it does batching, since we only have 550 updates enqueued.

It does send over 200 records at a time, but I don't think it reaches 100MB.

Any idea what I can do to troubleshoot?

Thank you.

@kdevan commented Oct 12, 2021

I have a document set of around 10 million; this is for e-commerce, where each document contains a title, a description, and then a handful of other short attributes like brand, merchant, and category. The titles and descriptions are no more than a few hundred characters.

First I tried on a server with 16 GB of RAM with a chunk size of 20,000 and a payload size limit of "256 Mb". That took about 8 hours to get through 4 million documents. After that, it took about an additional 24 hours to add another 2 million records and reach 6 million before I killed the process.

Now I'm trying on a server with 128 GB of RAM with a chunk size of 10,000,000 and a payload size limit of "30 Gb". It has been hanging on indexing for about 8 hours now.

My current plan is to lower the chunk size until I find something that works and then hope it can finish in a reasonable amount of time. Any suggestions are welcome.

I'm using Laravel with Scout and Postgres.

EDIT: 100,000 didn't work. 50,000 worked and gave the best performance so far: 6 million records in about 8 hours.
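
For anyone chunking outside of Scout, here is a rough sketch of the same pattern against the documents endpoint (Python; the index name and the document source are hypothetical, and the 50,000 chunk size mirrors the edit above):

    import requests

    MEILI_URL = "http://127.0.0.1:7700"
    INDEX = "products"   # hypothetical index name
    CHUNK_SIZE = 50_000  # the batch size that worked best in the tests above


    def chunked(documents, size):
        """Yield lists of at most `size` documents from any iterable."""
        batch = []
        for doc in documents:
            batch.append(doc)
            if len(batch) == size:
                yield batch
                batch = []
        if batch:
            yield batch


    def index_all(documents):
        """POST documents in fixed-size batches; each call enqueues one update."""
        for batch in chunked(documents, CHUNK_SIZE):
            # Each serialized batch must stay under the configured payload size limit.
            resp = requests.post(
                f"{MEILI_URL}/indexes/{INDEX}/documents",
                json=batch,
                timeout=600,  # large batches can take a while to upload
            )
            resp.raise_for_status()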

@bb commented Oct 13, 2021

Thanks for transferring this here, @curquiza.

Looking at the comments above, there seem to be multiple discussions ongoing. I think the primary goal here is preventing regressions, i.e. making sure that in newer versions, things stay at least as fast as they were in an older version.

Here's some overview from my data. It's not really comparable except within a row, due to specific machines, OS, etc., and in general due to the small sample size.

| Dataset / settings | 0.19.0 | 0.20.0 | 0.21.1 | 0.22.0 | 0.23.1 |
|---|---|---|---|---|---|
| ~6k records, 19MB JSON, Docker Mac, some settings regarding filterable attributes | | | | ~3 s | ~34 s |
| ~6k records, 19MB JSON, Docker Mac, default settings (nothing specified) | | | | ~37 s | ~180 s |
| ~6k records, 19MB JSON, Docker Linux, default settings (nothing specified) | ~6.1 s | ~5.7 s | ~26.7 s | ~20.8 s | ~16.1 s |
| ~800k records, Docker Linux | ~518 s | ~396 s | ~551 s | ~480 s | ~350 s |

So, there's a clear jump between 0.20 and 0.21, especially on small data sets. On the other hand, the service became much more responsive during indexing in 0.21+.

On Linux, Meilisearch is getting faster and faster, but on Docker for Mac (which is actually Linux running in a VM) it gets worse from 0.22 to 0.23, at least for rather small data sets.

The observation which led to this ticket is the jump from 0.22 to 0.23, which I observe in Docker for Mac but not on Linux. Maybe it's related to Docker for Mac's poor disk I/O performance? Maybe it's too little RAM (or even swapping; Docker Desktop is configured with 6GB) or too few cores (5 configured), whereas the same test on the Linux host has plenty of RAM and 12 cores.
Any hints on what I could look into further or what to try?

@alexchuin

I have the same problem here; the difference is huge...

I insert 10k documents:
v0.22.0: "POST /indexes/sirene/documents HTTP/1.1" 202 14 "-" "Symfony HttpClient/Curl" 0.504601
v0.23.1: "POST /indexes/sirene/documents HTTP/1.1" 202 15 "-" "Symfony HttpClient/Curl" 43.649841

0.5s versus 43.6s.
I'm here if you need some tests/info :)

curquiza changed the title "Loss of performance during indexation" → "Not expected loss of performance during indexation (between v0.22.0 and v0.23.0)" on Oct 13, 2021
curquiza changed the title "Not expected loss of performance during indexation (between v0.22.0 and v0.23.0)" → "Not-expected loss of performance during indexation (between v0.22.0 and v0.23.0)" on Oct 13, 2021
@ManyTheFish (Member)

Hello @alexchuin,

Do you have the problem with Docker for Mac too?

Which dataset do you use?

Hey @bb,

Do you run your Docker on a Mac M1? (I haven't managed to reproduce your bug on my Intel machine.)

Which dataset do you use?

Thanks to both of you for the help 😃

@alexchuin

@ManyTheFish
Meilisearch runs on Clever Cloud (L/XL instances).
It's a dataset with an id, a name, and two dates.

@bb commented Oct 13, 2021

@ManyTheFish no, I have an old Intel MacBook Pro. It's a private dataset which I'd need to clean up or anonymize before sharing.

@ManyTheFish (Member)

Hey @alexchuin @bb,
It's the indexing time that is impacted and not the uploading time, correct?

@alexchuin

@ManyTheFish for me it's the uploading time (when I POST 10k documents with the PHP API)

@VinceG commented Oct 13, 2021

FWIW, at least for us it was the indexing of the queued updates. On an empty index it took less than a second for the first few hundred records; as we sent more documents to index (500 or 1000 per batch), each update took longer than the previous one. So it was something like:

1 - 0.5s
2 - 0.9s
3 - 1.1s
4 - 2.3s
.....
.....
.....
100 - 43s
101 - 46s

etc...

It kept increasing in duration with each update. Also, since we send updates to delete or update individual records, those take a few seconds as well, even for just one document.

And to be clear, I was testing with an 8-core machine with 16GB of RAM as well as with one with 92 cores and 756GB, and it was exactly the same thing. The load was always high if we sent multiple batches; it seems like maybe nginx was the bottleneck, or something was going on when we sent multiple updates using queue workers. If I run the batch updates synchronously, it is slow but the load doesn't go up.

hope that helps.

@bb commented Oct 13, 2021

I just provided @ManyTheFish with my test dataset and my settings via Slack.

The test is a fresh db folder, Meilisearch running in Docker for Mac, and three calls:

  1. settings
  2. delete_all_documents
  3. add_documents (all at once)

In the last runs comparing 0.22.0 vs 0.23.1 on this specific dataset, I had 14 times slower indexing (1.6 s → 22.7 s) and 3x slower uploads (5 s → 17 s).

The indexing time can be observed by looking at the duration field of the DocumentsAddition request.
The uploading time can be seen by looking at the difference in enqueuedAt between ClearAll and DocumentsAddition. The calls are made directly one after the other in a Ruby script where the documents were already a prepared array of hashes which just needed to be serialized.
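
As an illustration, here is a rough sketch of pulling both numbers out of the pre-v0.24 updates endpoint (Python; the index name is made up, and the field names follow the status payloads pasted below):

    from datetime import datetime, timezone

    import requests

    MEILI_URL = "http://127.0.0.1:7700"
    INDEX = "my-index"  # hypothetical index name


    def parse_ts(ts):
        """Parse timestamps like 2021-10-13T20:26:42.306488448Z
        (nanosecond precision, Z suffix) into aware datetimes."""
        date_part, frac = ts.rstrip("Z").split(".")
        micros = frac[:6].ljust(6, "0")  # truncate nanoseconds to microseconds
        dt = datetime.fromisoformat(f"{date_part}.{micros}")
        return dt.replace(tzinfo=timezone.utc)


    updates = requests.get(f"{MEILI_URL}/indexes/{INDEX}/updates", timeout=30).json()
    clear_all = next(u for u in updates if u["type"]["name"] == "ClearAll")
    addition = next(u for u in updates if u["type"]["name"] == "DocumentsAddition")

    # Indexing time: reported directly on the DocumentsAddition update.
    print("indexing:", addition["duration"], "s")

    # Uploading time: gap between the two enqueuedAt timestamps,
    # since the two calls were fired back to back.
    gap = parse_ts(addition["enqueuedAt"]) - parse_ts(clear_all["enqueuedAt"])
    print("upload (approx):", gap.total_seconds(), "s")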

0.22.0 status updates

(note: update IDs are higher here because I had run the same settings call multiple times before)

    {
        "status": "processed",
        "updateId": 5,
        "type": {
            "name": "ClearAll"
        },
        "duration": 0.015,
        "enqueuedAt": "2021-10-13T20:24:41.363020307Z",
        "processedAt": "2021-10-13T20:24:41.386188054Z"
    },
    {
        "status": "processed",
        "updateId": 6,
        "type": {
            "name": "DocumentsAddition",
            "number": 5942
        },
        "duration": 1.605,
        "enqueuedAt": "2021-10-13T20:24:46.195683759Z",
        "processedAt": "2021-10-13T20:24:47.810306346Z"
    }

0.23.1 status updates

    {
        "status": "processed",
        "updateId": 1,
        "type": {
            "name": "ClearAll"
        },
        "duration": 0.005,
        "enqueuedAt": "2021-10-13T20:26:25.268266946Z",
        "processedAt": "2021-10-13T20:26:25.277046910Z"
    },
    {
        "status": "processed",
        "updateId": 2,
        "type": {
            "name": "DocumentsAddition",
            "number": 5942
        },
        "duration": 22.701999999999998,
        "enqueuedAt": "2021-10-13T20:26:42.306488448Z",
        "processedAt": "2021-10-13T20:27:05.022123113Z"
    }

By the way, the Ruby script's HTTP call to add_documents timed out in 0.23 but not in 0.22. My timeout is configured to be 10 s.

Net::ReadTimeout: Net::ReadTimeout with #<TCPSocket:(closed)>
from /Users/bb/.rvm/rubies/ruby-3.0.2/lib/ruby/3.0.0/net/protocol.rb:219:in `rbuf_fill'

@bb commented Oct 13, 2021

@VinceG I also observed consecutive updates on a non-empty DB becoming slower; however, that was already the case in earlier versions and AFAIK is not different between v0.22 and v0.23.

You should really try a bigger batch size - for me, it's no problem to send 1 million documents at once... it is a giant payload and takes several minutes. If you need to send smaller batches, you could set searchableAttributes and filterableAttributes to [] before adding documents and set them back to your specific settings afterwards.
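
A rough sketch of that workaround over the HTTP API (Python; the index name and the restored attribute lists are made up, and the method on the settings route is what the 0.2x series accepted):

    import requests

    MEILI_URL = "http://127.0.0.1:7700"
    INDEX = "products"  # hypothetical index name

    # Hypothetical settings to restore once all documents are in.
    REAL_SETTINGS = {
        "searchableAttributes": ["title", "description"],
        "filterableAttributes": ["brand", "category"],
    }


    def update_settings(settings):
        # On the 0.2x series the settings route accepted POST; newer versions use PATCH.
        resp = requests.post(f"{MEILI_URL}/indexes/{INDEX}/settings", json=settings, timeout=30)
        resp.raise_for_status()


    def add_documents(batch):
        resp = requests.post(f"{MEILI_URL}/indexes/{INDEX}/documents", json=batch, timeout=600)
        resp.raise_for_status()


    def bulk_import(batches):
        # 1. Empty the expensive settings so each batch indexes quickly.
        update_settings({"searchableAttributes": [], "filterableAttributes": []})
        # 2. Push all batches.
        for batch in batches:
            add_documents(batch)
        # 3. Restore the real settings; the documents are then re-indexed once with them.
        update_settings(REAL_SETTINGS)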

@bb commented Oct 14, 2021

The reason for the slowdown in 0.23 on Docker for Mac is the way bind mounts are handled. The regression I observed appears only when running in Docker and only when using a bind mount, i.e. a local path (as compared to a named volume).

This compose.yaml is slow (but wasn't slow in 0.22):

services:
  meilisearch:
    restart: "no"
    image: getmeili/meilisearch:v0.23.1
    volumes:
      - ./data-meili:/data.ms
    environment:
      MEILI_ENV: development
    ports:
      - 127.0.0.1:7700:7700

while this is fast:

services:
  meilisearch:
    restart: "no"
    image: getmeili/meilisearch:v0.23.1
    volumes:
      - meili:/data.ms
    environment:
      MEILI_ENV: development
    ports:
      - 127.0.0.1:7700:7700
volumes:
  meili:

Not specifying any volume is also fast.

From Docker's side, this happens because Docker for Mac manually tries to keep the internal volume and the host file system in sync, which causes a lot of copying. So I guess 0.22 just kept the files open, while 0.23 opens and closes the files, which keeps Docker for Mac very busy.

@ManyTheFish is already working on a fix and can probably explain more.

@curquiza (Member Author)

#1824 fixes part of the issue: the slowdown when using Docker with a mounted volume.
The loss of performance when loading documents is still present, and Marin is working on it.

curquiza added this to the v0.24.0 milestone on Oct 18, 2021
@MarinPostma (Contributor)

We have just merged some optimizations in #1847 that should make ingestion time more acceptable. It is not going to be as fast as before, but it is a serious improvement nonetheless.

curquiza added the performance label on Oct 26, 2021
curquiza removed this from the v0.24.0 milestone on Oct 26, 2021
@mgralikowski

The same observation with 0.24.0 and a DigitalOcean droplet.

The first new documents take less than a second. Batch inserts take ~3 seconds. When we have more data (100+), every next insert takes more and more time; for now, it is more than 1 minute :( Any updates about it? In this situation we will be forced to change the search driver, because users can't wait minutes to see their records.

There is no difference between the 2 and 4 CPU DO droplets.

@curquiza (Member Author) commented Feb 2, 2022

Hello everyone reading this issue.

The issue regarding the loss of performance between v0.22.0 and v0.23.0 has been fixed. Since that is the specific purpose of this issue, I am closing it.

However, for anyone having issues with indexation performance in general, please refer to this discussion. Your feedback will be really appreciated 🙂

curquiza closed this as completed on Feb 2, 2022