Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

catalog_product_index_price reindexing occasionally resulting in missing products on site #35616

Closed
1 of 5 tasks
rob-gonz opened this issue Jun 14, 2022 · 23 comments
Closed
1 of 5 tasks
Assignees
Labels
Issue: needs update Additional information is require, waiting for response Reported on 2.4.3-p2 Indicates original Magento version for the Issue report. Triage: Dev.Experience Issue related to Developer Experience and needs help with Triage to Confirm or Reject it

Comments

@rob-gonz
Copy link

Preconditions and environment

  • Magento 2.4.3-p2
  • Running in split database mode
  • PHP 7.4

Steps to reproduce

This bug is hard to reproduce when you want it to occur. It may be easier to reproduce if you were to have everything set up just perfectly. This includes scripts controlling which and how many products to pull in a product collection on a clean cache(no cache).

  1. Use an instance which has more products than what the current batch size is for catalog_product_index_price.
  2. Be sure nothing outside of configuration has been cached (makes replication easier, caching does not have to be disabled)
  3. Begin watching the record count in catalog_product_index_price
  4. Trigger a catalog_product_index_price reindex by invalidating/resetting the index (DO NOT run it manually/directly)
  5. Watch the count in catalog_product_index_price, you will see it drop by the batch_size used for catalog_product_index_price reindexing.
  6. At this point run queries/requests/collections to get product information.

Expected result

catalog_product_index_price (or any table index) should NEVER have the number of items in it reduced. Item information should always be here it should never be removed at any time, for any purpose.

Actual result

What will happen is products that are removed from the catalog_product_index_price index for the purpose of reindexing will be missing from the queries/collections and they will get cached that way in magento2. If you are running a properly configured environment this will also cause the frontend categories to also be missing these products and can be cached not only in M2 but in edge caching. Moreover, when using a 3rd party search tool these products will be missing entirely from the search.

Additional information

Why is this an issue?
We have experienced occasional loss of sales due to missing products on our site. We have been tracking this issue for a bit. We couldn't believe it at first and it still feels like this is a major oversight unless we are missing something.

How did you discover the bug
We realized that when doing a manually reindex (forced) it would create a replica table and rename. That's fine - that works great. However, when reindex is running from the cronjob it is using the DELETE command to delete a batch of N products and then insert a batch of N products. This means that at any given time during indexing you could be missing up to 10,000 versions/variations/pricing of a product.

How do you expect Adobe to Fix This
Stop deleting items from this table and then inserting them. Use the 'REPLACE INTO' Query to make updates. If there are records that do need to be removed those should be removed through a separate deletion or by periodically running reindex that involves a table rename.

** Additional note**
Consider not using batch size paging that uses 'count(*)' to determine what items to page through next. When you get up beyond 8 or 10 million records in a database it slows the performance down. Use the entity_id to key off for paging it is SIGNIFICANTLY faster. We have already modified an indexer because it was taking several hours to run. We increased the speed 7x with this change.

Release note

No response

Triage and priority

  • Severity: S0 - Affects critical data or functionality and leaves users without workaround.
  • Severity: S1 - Affects critical data or functionality and forces users to employ a workaround.
  • Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.
  • Severity: S3 - Affects non-critical data or functionality and does not force users to employ a workaround.
  • Severity: S4 - Affects aesthetics, professional look and feel, “quality” or “usability”.
@m2-assistant
Copy link

m2-assistant bot commented Jun 14, 2022

Hi @rob-gonz. Thank you for your report.
To speed up processing of this issue, make sure that you provided the following information:

  • Summary of the issue
  • Information on your environment
  • Steps to reproduce
  • Expected and actual results

Make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, Add a comment to the issue:

@magento give me 2.4-develop instance - upcoming 2.4.x release

For more details, review the Magento Contributor Assistant documentation.

Add a comment to assign the issue: @magento I am working on this

To learn more about issue processing workflow, refer to the Code Contributions.


⚠️ According to the Magento Contribution requirements, all issues must go through the Community Contributions Triage process. Community Contributions Triage is a public meeting.

🕙 You can find the schedule on the Magento Community Calendar page.

📞 The triage of issues happens in the queue order. If you want to speed up the delivery of your contribution, join the Community Contributions Triage session to discuss the appropriate ticket.

✏️ Feel free to post questions/proposals/feedback related to the Community Contributions Triage process to the corresponding Slack Channel

@rob-gonz
Copy link
Author

Please note. The documentation states that catalog_product_index_price using table switching (https://developer.adobe.com/commerce/php/development/components/indexing/optimization/). It does not use table switching when the indexer is running off of the cronjob. We have confirmed this - maybe it does on occasion. Typically it does not.

@m2-assistant
Copy link

m2-assistant bot commented Jun 20, 2022

Hi @engcom-Delta. Thank you for working on this issue.
In order to make sure that issue has enough information and ready for development, please read and check the following instruction: 👇

  • 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).

    DetailsIf the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please, edit issue description if needed, until label Issue: Format is valid appears.

  • 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add Issue: Clear Description label to the issue by yourself.

  • 3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

  • 4. Verify that the issue is reproducible on 2.4-develop branch

    Details- Add the comment @magento give me 2.4-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.4-develop branch, please, add the label Reproduced on 2.4.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and stop verification process here!

  • 5. Add label Issue: Confirmed once verification is complete.

  • 6. Make sure that automatic system confirms that report has been added to the backlog.

@duckchip
Copy link
Contributor

@rob-gonz We are experiencing some very similar issues on 2.4.3-p2 with a product database of 30K products
Would you mind sharing the patch you applied?

Thx!!

@m2-assistant
Copy link

m2-assistant bot commented Jul 1, 2022

Hi @engcom-November. Thank you for working on this issue.
In order to make sure that issue has enough information and ready for development, please read and check the following instruction: 👇

  • 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).

    DetailsIf the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please, edit issue description if needed, until label Issue: Format is valid appears.

  • 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add Issue: Clear Description label to the issue by yourself.

  • 3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

  • 4. Verify that the issue is reproducible on 2.4-develop branch

    Details- Add the comment @magento give me 2.4-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.4-develop branch, please, add the label Reproduced on 2.4.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and stop verification process here!

  • 5. Add label Issue: Confirmed once verification is complete.

  • 6. Make sure that automatic system confirms that report has been added to the backlog.

@engcom-November
Copy link
Contributor

Verified the issue on Magento 2.4-develop branch but could not able to reproduce the issue.
No items are getting reduced in catalog_product_index_price db table after triggering a catalog_product_index_price reindex by invalidating/resetting the index
image
image
Kindly recheck the issue on Magento 2.4-develop branch as its the upcoming 2.4.x release and provide missing steps if any if the issue is still reproducible.

@engcom-November engcom-November added the Issue: needs update Additional information is require, waiting for response label Jul 8, 2022
@m2-community-project m2-community-project bot moved this from Ready for Confirmation to Needs Update in Issue Confirmation and Triage Board Jul 8, 2022
@rob-gonz
Copy link
Author

rob-gonz commented Jul 8, 2022

@engcom-November This is a scale problem with the M2 handling of catalog_product_index_price indexer. You are not testing the scenario described.

As stated above, the number of products you have used to test this scenario is significantly insufficient. You have not replicated the scenario I described and that is why you were not able to reproduce the issue. The speed at which the database updates is beyond the capability of a human to run a select search at the cmd line for catalog_product_index_price. The 462 rows is complete in ~ 1 second, if not much less, once the query is made. Even then, the insert command used will cause table locking for a majority of the request because the table has to lock while building its index making it even harder for your Select to detect an issue at such a small count.

We are experiencing this on a catalog_product_index_price index table with > 16 million rows, with the default batch size of 10k. You will only experience the issue if you have more than 10k products. Even then, you will only be able to detect the issue if you have more than probably 50k products. Preferably, you should test with a minimum of 100k products in the catalog_product_index_price. Even then a script will be necessary to pull the count(*) of catalog_product_index_price 2 to 3 times per second. Then it will become obvious what we are describing. At the size of our catalog_product_index_price it is much easier to observe.

For example:
image
You can see in the image above the catalog_product_index_price has fewer products than the replica table. On a forced re-index, the replica table is fully built and then copied/renamed to catalog_product_index_price. On a reset/cronjob re-index things are reindexed in place, however, data is deleted from the index. This perfectly shows the issue at hand.

When you have 24/7 traffic and are constantly running automated price, inventory, and content updates the indexer is always reindexing something. Meaning there is a significantly higher chance for the collection to not get the proper products it would normally. This is what we are experiencing.

@rob-gonz
Copy link
Author

rob-gonz commented Jul 11, 2022

@rob-gonz We are experiencing some very similar issues on 2.4.3-p2 with a product database of 30K products Would you mind sharing the patch you applied?

Thx!!

@duckchip, glad to hear others pay attention to indexers as much as we do. We do not currently have a fix for this. We know what needs to be fixed, but we are looking to Adobe to fix the issue because they are making significant database design changes in the coming versions. It's also a core issue they need to address. I've heard that a 2.4.4+ update will have a direct effect on this indexer table.

@rob-gonz
Copy link
Author

@engcom-November, This is affecting users' experience. When can we expect a fix?

@bka
Copy link
Contributor

bka commented Jul 24, 2022

Thank you @rob-gonz for sharing this. We have similar symptoms in one of our shops and I think we are experiencing the same issue here. I want to share some of my findings related to this issue.

TLDR: As already mentioned catalog_product_index_price should never have items reduced during reindexing, even if the time window is small, it may lead to missing products.

Occasionally, for no reason we have some categories in frontend showing the message: "We can't find products matching the selection.", although there are products in these categories. After clearing the cache, everything is fine again. There was not way to reproduce this, until I found this issue and gave it another try.

I created a test script to reproduce this issue. I will attach it here. Let me explain, how it works:

  1. It fakes some product data updates by inserting entity_ids into table catalog_product_price_cl.
  2. It will watch the count of table catalog_product_index_price and print any differences.
  3. The price indexer (cron) will fetch updates from the changelog table and perform a partial reindex.

vokoscreenNG-2022-07-24_08-43-40 mkv

You will see the count changing during reindex process. It is only a small time window but in some scenarios it may be enough to encounter empty categories in frontend. I think we are encountering this issue so often because we are doing a lot of price updates in this shop which leads to many price index updates.

Digging into deeper into the code, I think the root cause is inside Catalog/Model/Indexer/Product/Price/AbstractAction.php:_reindexRows. We have a deleteIndexData call there, which reduces the number of entries inside the index table for a short period of time, until new data is inserted.

Selection_336

If I remove this function call everything seems to be fine and the issue does not appear anymore. I'm aware that this is not a proper solution but I will try it as a workaround because some orphaned entries inside catalog_product_index_price are much better than empty categories in frontend.

I can confirm this issue on a Mageto 2.4.4 installation. I did not test latest 2.4-develop, but I expect the issue to be still present because code of Catalog/Model/Indexer/Product/Price/AbstractAction.php:_reindexRows is very much the same.

@rob-gonz
Copy link
Author

@bka, thank you for the work you did to provide even more evidence of the issue and provide detailed findings here for the community.

Cheers!

We'll see what we can come up with when we explore the issue again.

@lbajsarowicz
Copy link
Contributor

lbajsarowicz commented Jul 27, 2022

Very well-known issue, sorry I haven't noticed that before.

You can easily fix it with the patch from @magento/creativestyle team:
https://github.com/magesuite/magento-patches/blob/9271a0fe59ca008c04ad1b73b6581a1d4582754e/essential/magento/module-catalog/m242-wrap-price-indexer-with-transaction.patch#L1-L79

@bka
Copy link
Contributor

bka commented Jul 27, 2022

@lbajsarowicz thanks for that hint

@diwipl
Copy link

diwipl commented Jul 27, 2022

Honestly this is just the tip of the iceberg, most of the indexes have the same behavior (lack of transaction), there are much more patches we created for similar issues as yours:

https://github.com/magesuite/magento-patches/blob/master/essential/magento/module-inventory-indexer/wrap-inventory-indexer-with-transaction.patch
https://github.com/magesuite/magento-patches/blob/master/essential/magento/module-catalog-inventory/wrap-cataloginventory-indexer-with-transaction.patch
https://github.com/magesuite/magento-patches/blob/master/essential/magento/module-catalog-search/m242-remove-deletion-of-products-in-partial-indexing.patch
https://github.com/magesuite/magento-patches/blob/9271a0fe59ca008c04ad1b73b6581a1d4582754e/essential/smile/elasticsuite/m24-search-index-delete-products-directly-before-inserting.patch

@hostep
Copy link
Contributor

hostep commented Jul 28, 2022

Great info here!

Why aren't these fixes being contributed back as pull requests to Magento? (I understand the PR process is a big pain in the ass here, but at least it would make the fixes more visible to the entire community)

@diwipl
Copy link

diwipl commented Jul 28, 2022

To be honest, nobody from our side had time to create PRs taking into account how long the process takes.

Since we are open source with our patches we are open for anyone to take those changes and prepare proper PRs based on them.

@engcom-November engcom-November added the Triage: Dev.Experience Issue related to Developer Experience and needs help with Triage to Confirm or Reject it label Aug 2, 2022
@engcom-November engcom-November removed their assignment Aug 2, 2022
@m2-assistant
Copy link

m2-assistant bot commented Aug 4, 2022

Hi @engcom-Echo. Thank you for working on this issue.
In order to make sure that issue has enough information and ready for development, please read and check the following instruction: 👇

    1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).
      DetailsIf the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please, edit issue description if needed, until label Issue: Format is valid appears.
    1. Verify that issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add Issue: Clear Description label to the issue by yourself.
    1. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.
    1. Verify that the issue is reproducible on 2.4-develop branch
      Details- Add the comment @magento give me 2.4-develop instance to deploy test instance on Magento infrastructure.
      - If the issue is reproducible on 2.4-develop branch, please, add the label Reproduced on 2.4.x.
      - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and stop verification process here!

@hostep
Copy link
Contributor

hostep commented Aug 5, 2022

I've noticed 2 commits that got merged here 11 hours ago, under the name "ACP2E-1028: Current Indexer implementation for large catalogs is causing items to not appear on site", that sounds related to this issue report. Unfortunately there is no clean merge commit, it's part of a huge merge commit with a bunch of different tickets, so here are the individual commits:

Not sure if these will fix the problem, but the description sounds related...

@engcom-Echo
Copy link
Contributor

Hello @rob-gonz,

Thanks for raising your concern and make patience.

We have tried to reproduce the issue in Magento 2.4-develop branch, but for us, the issue is not reproducible. We have followed the below steps:

  1. Install the fresh Magento instance.
  2. Create the approx 40 k product.
  3. Run the cache flush.
  4. Run the reindex for catalog_product_index_price.
  5. Check the count catalog_product_index_price.

As per your suggestion we have check on more than 10 k product but we are not getting the any differences.
we are getting the same count in catalog_product_index_price and catalog_product_index_price_replica also.

Please find the below screen shot for your references.
Screen for before reindex and after the reindex of catalog_product_index_price.

image

Screen for count of catalog_product_index_price and catalog_product_index_price_replica table.

image

Please check if i missed anything and kindly check on Magento 2.4-develop branch and provide more information with exact steps and screenshot so that we can reproduce this issue.

Thanks

@rob-gonz
Copy link
Author

@engcom-Echo , I appreciate you are still trying to reproduce - I really do. However, you failed to follow the steps I outlined. If they are not followed exactly you can not reproduce the issue. The initial reindex is NOT the problem. It is once a reindex is occurring based on a cronjob. The issue also only occurs during the reindex NOT after. Running the reindex directly runs a reindex that generates the replica then swaps the tables. The cronjob reindexes for this table does it in place.

  1. Install the fresh Magento instance.
  2. Create the approx 40 k product.
  3. Run the cache flush.
  4. Run the reindex for catalog_product_index_price (bin/magento index:reindex). This gives you a start state.
  5. Update the 40k products by adding 1 penny to their price or adding inventory to them. This queues up a backlog of changes
  6. Build a script that will capture and log the count of catalog_product_index_price and catalog_product_index_price_replica every .5s. (Maybe do this much sooner than this step, maybe this is step 2a)
  7. Reset the catalog_product_index_price (bin/magento index:reset catalogproduct_price or whatever the correct name is.)
  8. Immediately run the script built in step 6.
  9. Wait for catalog_product_index_price to be completely reindexed.
  10. Review your logs from the step 6 script. You will see instances where the replica and the primary index do NOT have the same number of rows.

With only 40k products you are still going to be challenged to see it by hand. It can not be done after a reindex. It has to be done during a reindex to see the problem. Running a script will greatly increase your chances of reproducing it. You still may have to try a time or two. Maybe add multiple storefronts and associate the products so you can multiply the number of products in the index.

I could reproduce it by hand because my catalog_product_price_index table has 15+ million rows. I had over an hour in which I could reproduce the issue. Meaning there was over an hour in which items on my site could be queried, perceived to not exist, and thus be cached as if they did not exist for many hours after resulting in lost sales.

If it helps you at all this: https://github.com/magesuite/magento-patches/blob/9271a0fe59ca008c04ad1b73b6581a1d4582754e/essential/magento/module-catalog/m242-wrap-price-indexer-with-transaction.patch#L1-L79 posted by @lbajsarowicz actually fixes the issue.

@m2-assistant
Copy link

m2-assistant bot commented Oct 25, 2022

Hi @engcom-Hotel. Thank you for working on this issue.
In order to make sure that issue has enough information and ready for development, please read and check the following instruction: 👇

  • 1. Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).

    DetailsIf the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please, edit issue description if needed, until label Issue: Format is valid appears.

  • 2. Verify that issue has a meaningful description and provides enough information to reproduce the issue. If the report is valid, add Issue: Clear Description label to the issue by yourself.

  • 3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

  • 4. Verify that the issue is reproducible on 2.4-develop branch

    Details- Add the comment @magento give me 2.4-develop instance to deploy test instance on Magento infrastructure.
    - If the issue is reproducible on 2.4-develop branch, please, add the label Reproduced on 2.4.x.
    - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and stop verification process here!

  • 5. Add label Issue: Confirmed once verification is complete.

  • 6. Make sure that automatic system confirms that report has been added to the backlog.

@engcom-Hotel engcom-Hotel added the Reported on 2.4.3-p2 Indicates original Magento version for the Issue report. label Oct 26, 2022
@engcom-Hotel
Copy link
Contributor

Hello @rob-gonz,

Thanks for the detailed steps for issue reproduction. However, we have tried to reproduce the issue in the latest 2.4-develop branch but it seems that the flow is working as expected. We have a script as stated by you in this #35616 (comment).

I am attaching the same below for reference:

35616.php.zip

I printed the logs when we have the count of both the tables (catalog_product_index_price_replica, catalog_product_index_price) not equal. and we have observed that the count is not matching for some point of time because Magento tries to copy data from catalog_product_index_price to catalog_product_index_price_replica table. Each time 60000 records have been inserted.

Please refer to the below log file for reference:

35616.txt

We have tried it with almost 58000 records:

image

I think we are missing some steps.

Thanks

@engcom-Hotel
Copy link
Contributor

Dear @rob-gonz,

We have noticed that this issue has not been updated for a period of 14 Days. Hence we assume that this issue is fixed now, so we are closing it. Please raise a fresh ticket or reopen this ticket if you need more assistance on this.

Regards

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: needs update Additional information is require, waiting for response Reported on 2.4.3-p2 Indicates original Magento version for the Issue report. Triage: Dev.Experience Issue related to Developer Experience and needs help with Triage to Confirm or Reject it
Projects
None yet
Development

No branches or pull requests

10 participants