[Issue] Improve performance of indexer when many attribute options #36386

@m2-assistant

This issue is automatically created based on existing pull request: #36306: Improve performance of indexer when many attribute options


Description

When there are many product attribute options, the indexer takes a long time to complete. This change improves performance so that the indexing process is much quicker.

I've run a stand-alone performance comparison of the two approaches (script below). With 100,000 values in the $options array, the foreach approach, which calls in_array for every option, takes around 5.527919 seconds, whereas the array_intersect_key approach takes around 0.012552 seconds and produces identical results.

The real-world task that triggered this investigation (and resulted in this code change) was a reindex on a Magento 2.3.6 website with 10,000 options; before the change it took around 5 minutes, and after the change it took around 40 seconds.
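The gap in the stand-alone comparison is consistent with in_array scanning the whole value list once per option, while key-based lookups avoid that scan. As a minimal sketch of the same idea (not the change made in the pull request), the foreach loop can be kept and only the membership test swapped for an isset check on a flipped array. The function name labelsViaIsset and the sample data are hypothetical.

```php
<?php

// Hypothetical helper; the name and sample data are illustrative only.
function labelsViaIsset(array $options, array $attributeValues): array
{
    // array_flip() turns each value into an array key, so the membership
    // test becomes a key lookup instead of a scan of the whole list.
    // Note: array_flip() only accepts int and string values.
    $wanted = array_flip($attributeValues);

    $labels = [];
    foreach ($options as $option) {
        if (isset($wanted[$option['value']])) {
            $labels[] = $option['label'];
        }
    }

    return $labels;
}

$options = [
    ['label' => 'Red', 'value' => 10],
    ['label' => 'Green', 'value' => 20],
    ['label' => 'Blue', 'value' => 30],
];

$labels = labelsViaIsset($options, [30, 10]);
print_r($labels); // Red and Blue, in the order they appear in $options
```

Unlike array_intersect_key, this variant does not depend on how $options itself is indexed, which may matter if the option array is not keyed by option value.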

Manual testing scenarios

Example speed comparison script
<?php

// Variables
$rounds = 10;
$optionCount = 100_000;

// Set up
$options = [];
$attributeValues = [];

for ($i = 0; $i < $optionCount; $i++) {
    $options[$i] = [
        'label' => "label $i",
        'value' => $i,
    ];

    $attributeValues[] = $i;
}

// helper for consistently-formatted numbers
function out($label, $value) {
    printf("%s\t%f\n", $label, $value);
}

// ---------------------
// Method 1: foreach + in_array (scans $attributeValues once per option)
echo "\n\nMETHOD 1\n\n";
$timers = [];
for ($i = 0; $i < $rounds; $i++) {
    $attributeLabels = [];

    $start = microtime(true);

    foreach ($options as $option) {
        if (\in_array($option['value'], $attributeValues)) {
            $attributeLabels[] = $option['label'];
        }
    }

    $stop = microtime(true);
    $duration = $stop - $start;
    $timers[] = $duration;

    out($i, $duration);
}
out('avg', array_sum($timers) / count($timers));

// ---------------------
// Method 2: array_intersect_key + array_flip (relies on $options being keyed by option value)
echo "\n\nMETHOD 2\n\n";
$timers = [];
for ($i = 0; $i < $rounds; $i++) {
    $start = microtime(true);

    $options2 = array_intersect_key($options, array_flip($attributeValues));
    $attributeLabels2 = array_column($options2, 'label');

    $stop = microtime(true);
    $duration = $stop - $start;
    $timers[] = $duration;

    out($i, $duration);
}
out('avg', array_sum($timers) / count($timers));


// ---------------------
// Compare the labels produced by the last round of each method
if ($attributeLabels === $attributeLabels2) {
    echo "\n\nidentical end result\n";
} else {
    echo "\n\ndifferent end result\n";
}

Given there are no functionality changes in this pull request, the website should behave identically before and after the change. The main difference is the speed of a full reindex when product attributes have a very large number of options.
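One precondition is worth checking when verifying that behaviour is unchanged: in the benchmark script, $options is indexed by the option value, and array_intersect_key matches on those keys. The sketch below illustrates this with hypothetical function names and sample data; it assumes nothing about how the real indexer builds its option array, which may well guarantee this keying.

```php
<?php

// Hypothetical helpers mirroring the two methods in the benchmark script.
function viaInArray(array $options, array $values): array
{
    $labels = [];
    foreach ($options as $option) {
        if (\in_array($option['value'], $values)) {
            $labels[] = $option['label'];
        }
    }
    return $labels;
}

function viaIntersectKey(array $options, array $values): array
{
    // Matches on the KEYS of $options, not on $option['value'].
    $selected = array_intersect_key($options, array_flip($values));
    return array_column($selected, 'label');
}

// Case 1: array keys equal the option values (as in the benchmark).
$keyedByValue = [
    10 => ['label' => 'Red', 'value' => 10],
    20 => ['label' => 'Green', 'value' => 20],
    30 => ['label' => 'Blue', 'value' => 30],
];

// Case 2: same options, but indexed 0, 1, 2.
$listIndexed = array_values($keyedByValue);

$values = [30, 10];

$sameWhenKeyed = viaInArray($keyedByValue, $values) === viaIntersectKey($keyedByValue, $values);
$sameWhenListIndexed = viaInArray($listIndexed, $values) === viaIntersectKey($listIndexed, $values);

var_dump($sameWhenKeyed);       // bool(true)
var_dump($sameWhenListIndexed); // bool(false)
```

In the second case array_intersect_key finds no matching keys and returns no labels, so the two methods are only interchangeable when the option array is keyed by value.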

The existing unit tests already confirm that these functions behave as expected. (Yes, I've checked that the unit tests cover these functions.)

Contribution checklist

  • Pull request has a meaningful description of its purpose
  • All commits are accompanied by meaningful commit messages
  • All new or changed code is covered with unit/integration tests (if applicable)
  • README.md files for modified modules are updated and included in the pull request if any README.md predefined sections require an update
  • All automated tests passed successfully (all builds are green)

Metadata

    Labels

    • Area: Performance
    • Component: Indexer
    • Issue: Confirmed (Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed)
    • Priority: P2 (A defect with this priority could have functionality issues which are not to expectations.)
    • Progress: done
    • Reported on 2.3.6 (Indicates original Magento version for the Issue report.)
    • Reproduced on 2.4.x (The issue has been reproduced on latest 2.4-develop branch)
