Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise SiteTree search when elemental is installed #1067

Closed
emteknetnz opened this issue Jun 22, 2023 · 4 comments
Closed

Optimise SiteTree search when elemental is installed #1067

emteknetnz opened this issue Jun 22, 2023 · 4 comments

Comments

@emteknetnz
Copy link
Member

emteknetnz commented Jun 22, 2023

I raised this card after getting feedback from a couple of developers about poor performance in the CMS being a pain point, particularly on a large customer website - slack thread (internal) https://silverstripeltd.slack.com/archives/CLXKD9X51/p1614061096035300?thread_ts=1614061096.035300&cid=CLXKD9X51

The site tree search (magnifying glass at the top right of the "pages" seciton in the CMS) is a performance bottleneck, and gets significantly worse with elemental installed. Performance scales badly and get worse the more pages + elements there are

I had a quick look through the code path with elemental installed, specifically in ElementSiteTreeFilterSearch (which copy pastes a lot from CMSSiteTreeFilter and overrides it, rather than being an extension)

  • There's a lot of PartialMatchFilter on multiple database fields, which translation to a bunch of SQL WHERE Field LIKE '%search_term%' on unindexed columns that contain text, as you'd expect, probably can't do much easily without adding in something like elastic specifically for CMS search, which is non-trivial
  • More interestingly, there's this seemingly unoptimized code:
        // Get an array of SiteTree record IDs that match the search term in nested element data
        /** @var ArrayList $siteTrees */
        $siteTrees = $query->filterByCallback(function (SiteTree $siteTree) {
            // Filter by elemental PHP
            if (!$siteTree->hasExtension(ElementalPageExtension::class)) {
                return false;
            }

            // Check whether the search term exists in the nested page content
            $pageContent = $siteTree->getElementsForSearch();
            return stripos($pageContent ?? '', $this->params['Term'] ?? '') !== false;
        });

        if ($siteTrees->count()) {
            // Apply the list of IDs as an extra filter
            $this->extraTermFilters['ID:ExactMatch'] = $siteTrees->column('ID');
        }

Where $siteTree->getElementsForSearch(); will do a bunch of N+1 stuff in ElementalPageExtension

Given the recent work on eager loading which specifically deals with N + 1 I think this would be a good candidate for using the new eagerLoading() API, or at least doing something similar in a more manual approach where the aim is to do "one big query early" rather than "multiple little queries in a loop"

Acceptance criteria

  • Review site tree search and identify possible performance gain from using Eager loading, especially Elemental Content search
  • Validate performance gain for a site with a substantial number of blocks and page (~2000 pages, ~5000 blocks, ~5 block types)
  • Presuming they are performance gains, report them on the 5.1 changelog
  • Update relevant modules to require framework 5.1

Related

Follow up issues

PRs

@emteknetnz emteknetnz changed the title Loop at optimising SiteTree search with elemental Loop at optimising SiteTree search when elemental is installed Jun 22, 2023
@emteknetnz emteknetnz changed the title Loop at optimising SiteTree search when elemental is installed Optimise SiteTree search when elemental is installed Jun 22, 2023
@emteknetnz
Copy link
Member Author

emteknetnz commented Jun 26, 2023

This is actually seems very unoptimized ... though at the same time I'm not sure if there's scope to optimize.BaseElement::getContentForSearchIndex() will call BaseElement::forTemplate() which will call ViewableData::renderWith() ... so fully rendering every element. About as inefficient as you could get

This makes sense as it's a safe way to ensure that you're only "indexing for search" fields that are actually shown. This is nice for developers because you don't need to separately config searchable fields on each element - however big downside here is we don't know ahead of times which fields to search for content in without a render.

e.g. the basic content block ElementContent has private static $db = ['HTML' => 'HTMLText'] and no $searchable_fields config

If we're to optimize this we'd probably need to just assume every varchar/htmltext style field on every element is cms search indexable and then maybe provide an opt-out via config

@maxime-rainville
Copy link
Contributor

maxime-rainville commented Jun 29, 2023

If your block has relations that are used to generate the content, I take it those won't be pick up with that render_elements_for_cms_search turn off?

@emteknetnz
Copy link
Member Author

That's correct, only content directly on the element will be picked up by search. The assumption I'm making is that bulk of content relevant to search will be on the element itself and not on a relation. This will lead to a loss of search accuracy in some cases, though the idea is that increase in performance outweighs this downside.

@GuySartorelli
Copy link
Member

PRs merged, no remaining ACs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants