This issue is automatically created based on existing pull request: #36306: Improve performance of indexer when many attribute options
Description
When a product attribute has many options, the indexer takes a long time to complete. This change speeds up the indexing process considerably.
I've performed a stand-alone performance analysis of the two approaches, and I can confirm that with 100,000 values in the $options array, the foreach approach takes around 5.527919 seconds, whereas the array_intersect_key approach takes around 0.012552 seconds to produce identical results.
The real-world task that triggered this investigation (and resulted in this code change) was on a Magento 2.3.6 website with 10,000 options; reindexing took around 5 minutes before the change and around 40 seconds after it.
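To illustrate where the speed-up most likely comes from, here is a minimal sketch on a tiny data set. The function name `labelsForValues` and the sample options are hypothetical and not taken from the Magento code base. The idea is that `array_flip` turns the wanted values into array keys, so `array_intersect_key` can match each option by key instead of scanning the whole value list with `in_array` for every option.

```php
<?php
// Hypothetical helper, for illustration only (not the actual Magento code).
// Assumes $options is keyed by the option value, as in the comparison script.
function labelsForValues(array $options, array $attributeValues): array
{
    // array_flip turns the wanted values into keys, e.g. [30, 10] => [30 => 0, 10 => 1].
    $wanted = array_flip($attributeValues);

    // Keep only the options whose key is one of the wanted values.
    // The order of $options is preserved.
    $selected = array_intersect_key($options, $wanted);

    // Pull out the labels as a plain, re-indexed list.
    return array_column($selected, 'label');
}

// Hypothetical sample data.
$options = [
    10 => ['label' => 'Red',   'value' => 10],
    20 => ['label' => 'Green', 'value' => 20],
    30 => ['label' => 'Blue',  'value' => 30],
];

print_r(labelsForValues($options, [30, 10])); // ['Red', 'Blue']
```

Note that the result follows the order of `$options`, not the order of the requested values, which matches what the foreach approach produces.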
Manual testing scenarios
Example speed comparison script
<?php
// Variables
$rounds = 10;
$optionCount = 100_000;

// Set up
$options = [];
$attributeValues = [];
for ($i = 0; $i < $optionCount; $i++) {
    $options[$i] = [
        'label' => "label $i",
        'value' => $i,
    ];
    $attributeValues[] = $i;
}

// helper for consistently-formatted numbers
function out($label, $value) {
    printf("%s\t%f\n", $label, $value);
}

// ---------------------
echo "\n\nMETHOD 1\n\n";
$timers = [];
for ($i = 0; $i < $rounds; $i++) {
    $attributeLabels = [];
    $start = microtime(true);
    foreach ($options as $option) {
        if (\in_array($option['value'], $attributeValues)) {
            $attributeLabels[] = $option['label'];
        }
    }
    $stop = microtime(true);
    $duration = $stop - $start;
    $timers[] = $duration;
    out($i, $duration);
}
out('avg', array_sum($timers) / count($timers));

// ---------------------
echo "\n\nMETHOD 2\n\n";
$timers = [];
for ($i = 0; $i < $rounds; $i++) {
    $start = microtime(true);
    $options2 = array_intersect_key($options, array_flip($attributeValues));
    $attributeLabels2 = array_column($options2, 'label');
    $stop = microtime(true);
    $duration = $stop - $start;
    $timers[] = $duration;
    out($i, $duration);
}
out('avg', array_sum($timers) / count($timers));

// ---------------------
if ($attributeLabels === $attributeLabels2) {
    echo "\n\nidentical end result\n";
} else {
    echo "\n\ndifferent end result\n";
}
Given that there are no functional changes in this pull request, the website should behave identically before and after the change. The main difference is the speed of a full reindex when product attributes have a very large number of options.
The existing unit tests already confirm that these functions behave as expected. (Yes, I've checked that the unit tests cover these functions and that the functions behave as expected.)
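One assumption worth spelling out: in the comparison script, $options is keyed by the same integers that are used as option values, which is what allows array_intersect_key to match on keys. Whether the real indexer code keys its options this way is not shown in this description. As a hedged sketch, if the options were a plain list instead, they could be re-keyed by value first. The helper name `labelsFromUnkeyedOptions` is hypothetical.

```php
<?php
// Hypothetical helper, for illustration only (not the actual Magento code).
// Handles options that are a plain list, NOT keyed by their value.
function labelsFromUnkeyedOptions(array $options, array $attributeValues): array
{
    // Build a value => label map. If two options share a value,
    // the later one overwrites the earlier one.
    $labelsByValue = array_column($options, 'label', 'value');

    // array_flip only accepts integer and string values.
    $wanted = array_flip($attributeValues);

    // Keep the labels whose value was requested, then re-index the result.
    return array_values(array_intersect_key($labelsByValue, $wanted));
}

// Hypothetical sample data.
$options = [
    ['label' => 'Red',   'value' => 10],
    ['label' => 'Green', 'value' => 20],
    ['label' => 'Blue',  'value' => 30],
];

print_r(labelsFromUnkeyedOptions($options, [30, 10])); // ['Red', 'Blue']
```

Unlike the foreach approach, this variant would return only one label when several options share the same value, so it is equivalent only when option values are unique.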
Contribution checklist
- Pull request has a meaningful description of its purpose
- All commits are accompanied by meaningful commit messages
- All new or changed code is covered with unit/integration tests (if applicable)
- README.md files for modified modules are updated and included in the pull request if any README.md predefined sections require an update
- All automated tests passed successfully (all builds are green)