Ecommerce Performance | Use a more efficient data structure for retrieving Elastic Search aggregations? #5478

Open
andreas-gruenwald opened this issue Dec 18, 2019 · 2 comments


@andreas-gruenwald andreas-gruenwald commented Dec 18, 2019

Performance Improvement

Problem Summary

In one of our projects, retrieving a simple category filter takes more than 1.2 seconds.
I found that some lines of code and data structures used for loading Elasticsearch aggregations can be improved.
In particular, the conversion of aggregations and buckets in the Elasticsearch product list seems to consume a lot of time for large datasets (more than 20,000 categories in total; 20 on the first level).

Problem Details

This is the initial test code:

public function testControllerAction() {
    // $rootCategory, $currentCategory, $renderingScriptPath and $forceRootCategorySetup
    // are assumed to be defined by the surrounding test setup (not shown).
    $ecFactory = \Pimcore\Bundle\EcommerceFrameworkBundle\Factory::getInstance();

    // Configure the category filter definition.
    $includeParentCategories = true;
    $filter = new \Pimcore\Model\DataObject\Fieldcollection\Data\FilterCategory();
    $filter->setRootCategory($rootCategory);
    $filter->setScriptPath($renderingScriptPath);
    $filter->setIncludeParentCategories($includeParentCategories);
    $filterService = $ecFactory->getFilterService($ecFactory->getEnvironment()->getCurrentAssortmentTenant());

    // Prepare the product list and register the group-by on the parent category IDs.
    $productList = $ecFactory->getIndexService()->getProductListForCurrentTenant();
    $productList->prepareGroupBySystemValues('system.parentCategoryIds', true);
    $productList->setCategory(!$forceRootCategorySetup ? $currentCategory : $rootCategory);

    // Rendering the filter frontend triggers the loading of the group-by values.
    $renderedCategoryNav = $filterService->getFilterFrontend(
        $filter,
        $productList,
        [
            'categoryIds' => $rootCategory->getId(),
            'env' => $this->getEnv(),
            'actualCurrentCategoryId' => $currentCategory->getId()
        ]
    );
}

I dug into ProductList\ElasticSearch\AbstractElasticSearch and doLoadGroupByValues and found that extracting the filter aggregations and buckets is very time-intensive.

The following code sequence took 568 ms with the debugger enabled.

if ($result['aggregations']) {
    foreach ($result['aggregations'] as $fieldname => $aggregation) {
        $buckets = $this->searchForBuckets($aggregation);
        $groupByValueResult = [];
        if ($buckets) {
            foreach ($buckets as $bucket) {
                if ($this->getVariantMode() == self::VARIANT_MODE_INCLUDE_PARENT_OBJECT) {
                    $groupByValueResult[] = ['value' => $bucket['key'], 'count' => $bucket['objectCount']['value']];
                } else {
                    $data = $this->convertBucketValues($bucket); // support subaggregations
                    $groupByValueResult[] = $data;
                }
            }
        }
        $this->preparedGroupByValuesResults[$fieldname] = $groupByValueResult;
    }
}

Solution Concept

I used the following code to demonstrate that the retrieval could be done much faster: 32ms (vs. original 568ms).

if ($result['aggregations']) {
    foreach ($result['aggregations'] as $fieldname => $aggregation) {
        $buckets = $aggregation['buckets'];
        // Rename the bucket fields via a JSON round trip instead of converting each bucket in PHP.
        $json = json_encode($buckets);
        $json = str_replace('key":', 'value":', $json);
        $json = str_replace('doc_count":', 'count":', $json);
        $buckets = json_decode($json, true);
        // Store the converted buckets (the original snippet assigned the undefined $groupByValueResult here).
        $this->preparedGroupByValuesResults[$fieldname] = $buckets;
    }
}

With the debugger disabled, it is still 27 ms vs. 94 ms!
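
For anyone who wants to time this kind of conversion outside of Pimcore, here is a minimal sketch. It does not reproduce the original code path (searchForBuckets and convertBucketValues are not involved); it only compares a plain PHP loop with the JSON round trip on synthetic buckets. The helper name averageMillis and the bucket data are made up for illustration, and the numbers it prints will differ from the measurements above.

```php
<?php
// Hypothetical helper: runs $fn $iterations times and returns the average duration in milliseconds.
function averageMillis(callable $fn, int $iterations = 100): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e6 / $iterations;
}

// Synthetic buckets standing in for a large aggregation result (assumption, not real data).
$buckets = [];
for ($i = 0; $i < 20000; $i++) {
    $buckets[] = ['key' => $i, 'doc_count' => $i % 50];
}

// Variant 1: rename the fields with a plain loop.
$loopMs = averageMillis(function () use ($buckets) {
    $out = [];
    foreach ($buckets as $bucket) {
        $out[] = ['value' => $bucket['key'], 'count' => $bucket['doc_count']];
    }
    return $out;
}, 10);

// Variant 2: rename the fields via a JSON round trip.
$jsonMs = averageMillis(function () use ($buckets) {
    $json = json_encode($buckets);
    $json = str_replace(['"key":', '"doc_count":'], ['"value":', '"count":'], $json);
    return json_decode($json, true);
}, 10);

printf("plain loop: %.3f ms, JSON round trip: %.3f ms\n", $loopMs, $jsonMs);
```

Running both variants with the debugger on and off should show how much of the difference comes from the debugger itself.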

This example is just a demonstration that arrays as a data structure can cause a lot of performance overhead. The code above would need careful refactoring before it could be used. In general, it might be very helpful to profile the Pimcore ecommerce product lists with larger numbers of categories/aggregations, as such profiling turns out to be very useful for identifying potential performance issues.
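
One thing a refactoring would have to watch for: a replacement pattern without the opening quote, such as key":, also matches any field whose name merely ends in "key". The sketch below shows the difference on a single bucket; the sort_key field is hypothetical and only there to trigger the effect.

```php
<?php
// One sample bucket; "sort_key" is a made-up field name for illustration.
$buckets = [['key' => 7, 'doc_count' => 3, 'sort_key' => 'a']];
$json = json_encode($buckets);

// Pattern without the opening quote: also rewrites "sort_key" to "sort_value".
$loose = json_decode(str_replace('key":', 'value":', $json), true);

// Pattern anchored on the opening quote: only the field named exactly "key" is renamed.
$strict = json_decode(str_replace('"key":', '"value":', $json), true);

var_export(array_keys($loose[0]));
echo "\n";
var_export(array_keys($strict[0]));
echo "\n";
```

The first list contains sort_value, the second still contains sort_key.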


@fashxp fashxp commented Dec 19, 2019

Can you provide a PR, with comments explaining why we are doing it this way?


@andreas-gruenwald andreas-gruenwald commented Dec 20, 2019

I am not sure whether this solution is ready for a PR yet, as it might be fragile with regard to nested aggregations, etc. We will investigate it within the project, and I will create a PR as soon as there is a stable outcome.
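
To make the nested-aggregation concern concrete, here is a rough sketch of a rename that works on the decoded array and recurses into sub-aggregations explicitly. It is not the approach measured above and has not been benchmarked; the function name renameBucketKeys and the sample data (including the sub aggregation name) are assumptions for illustration only.

```php
<?php
// Hypothetical helper: renames "key" to "value" and "doc_count" to "count"
// at every nesting level, matching field names exactly.
function renameBucketKeys(array $node): array
{
    $map = ['key' => 'value', 'doc_count' => 'count'];
    $out = [];
    foreach ($node as $k => $v) {
        // Only string keys can be field names; integer keys are list positions.
        $newKey = (is_string($k) && isset($map[$k])) ? $map[$k] : $k;
        $out[$newKey] = is_array($v) ? renameBucketKeys($v) : $v;
    }
    return $out;
}

// Sample bucket with a nested sub-aggregation (made-up data).
$nested = [
    [
        'key' => 'shoes',
        'doc_count' => 5,
        'sub' => ['buckets' => [['key' => 'red', 'doc_count' => 2]]],
    ],
];

var_export(renameBucketKeys($nested));
```

Whether nested buckets should be renamed at all, or left for the existing sub-aggregation handling, is exactly the kind of question that would need to be settled before a PR.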
