Skip to content

Asynchronous indexing can break large servers #30012

@gamort

Description

@gamort

Large stores with extremely active product databases can suffer many related problems with indexers not being kept up to date. Products can disappear from the website, product data can be out of date on the website, excessive cache flushing can cause huge server load spikes.

This is due to the manner in which triggers are created by the core magento code in

INSERT INTO will insert a new record regardless of whether one exists or not.
REPLACE INTO will as the name says, replace a record - and in this case it will generate a new version_id since that is an auto_increment field.

Preconditions (*)

  1. An extremely large database. Example 500,000 products
  2. An active update schedule

Steps to reproduce (*)

  1. Create a sample database with 500,000 products
  2. Enable Asynchronous indexing
  3. Reduce crons to every 10 minutes
  4. Run a mass script to change the inventory for the products every 30 seconds. So within 10 minutes, you will have made 10 million updates

Expected result (*)

  1. When the indexer runs, it will index the latest product information once for each index

Actual result (*)

It will actually try to index each product 20 times, once for each update

Additional information to reproduce the issue

  1. Change the price of the product with SKU 24-MB01 20 times via API rest/all/V1/products/24-MB01
  2. Doing the price change data inserted in the table catalog_product_price_cl
  3. Adjust the DEFAULT_BATCH_SIZE in file lib/internal/Magento/Framework/Mview/View.php with 10 from 1000.
  4. Enable the DB logs via the bin/magento dev:query-log:enable command.
  5. grep the text to get the SQL query on table catalog_product_price_cl. Please find attached the screenshot of db.log:

image

The query always returns entity_id 1 each time.

In this case, it seems that entity_id 1 will process many times, instead of only 1 time, and should be on the latest one.

Please provide Severity assessment for the Issue as Reporter. This information will help during Confirmation and Issue triage processes.

  • Severity: S0 - Affects critical data or functionality and leaves users without workaround.

Metadata

Metadata

Labels

Area: FrameworkComponent: IndexerIssue: ConfirmedGate 3 Passed. Manual verification of the issue completed. Issue is confirmedPriority: P0This generally occurs in cases when the entire functionality is blocked.Priority: P1Once P0 defects have been fixed, a defect having this priority is the next candidate for fixing.Progress: doneReproduced on 2.4.xThe issue has been reproduced on latest 2.4-develop branchSeverity: S0A problem that is blocking the ability to work. An immediate fix is needed.Triage: Dev.ExperienceIssue related to Developer Experience and needs help with Triage to Confirm or Reject itTriage: Performance

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions