Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH Optimise site search #1069

Merged
merged 1 commit into from
Jul 4, 2023

Conversation

emteknetnz
Copy link
Member

@emteknetnz emteknetnz commented Jun 27, 2023

Issue #1067

There's a couple of bits to this PR:

A) Utilise eager loading - this gives a small performance gain of ~9%

B) Don't render elemental templates, instead only look at database values on the element (won't include any child DataObjects) - this gave a large performance gain, over 100% improvement

There's likely going to be a "loss in accuracy" (though may also be a gain) in exchange for the massive performance boost - the change in accuracy is a result of:

  • more content being searched across, because now fields not on the rendered template will be searched against
  • less content being searched across, because any child objects on the element that are normally rendered won't be search against

We'd also need to agree whether to make this an opt-in or an opt-out. In this PR I've got it as the more conservative opt-in, though I'd prefer opt-out as most people will likely not even realise it's an option available, and I given it's "only" cms page search I don't think any possible loss in accuracy really warrants the fact that page search really is horrible slow without this optimization (I've had people tell me about 10 second response times, and even sites timing out).

Options:

    1. opt-in in CMS 5.1, stays as opt-in for CMS 6
    1. opt-in in CMS 5.1, change to opt-out for CMS 6
    1. opt-out in CMS 5.1 (my recommendation)

Benchmarks

Running a CMS search for "lorem" against 1,000 pages with 5 blocks each (total of 5,000 blocks) - data generated with https://github.com/emteknetnz/silverstripe-test-data - total of 3 runs - times are in seconds

render elements - eager load off - (current) - avg 5.81
5.92
5.63
5.87

render elements - eager load on - avg 5.35
5.35
5.21
5.50

unrendered - eager load off - avg 2.36
2.37
2.35
2.36

unrendered - eager load on - avg 2.15
2.09
2.24
2.13

@michalkleiner
Copy link
Contributor

As long as this had an extension point to be able to change what it indexes then it should be ok, I guess. We rely on shorcodes being processed as those could bring in content that is not in any of the db fields directly and should be indexed for the given element, so for me it should support that, too. The list of excluded fields could be a map to iterate over, or two maps where one are fields and other are trailing strings like ID, Hash etc.. so that we don't create lists of comparisons that would be harder to manage. Could be config, too.

Copy link
Member

@GuySartorelli GuySartorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously since this is draft I'm mostly just doing a "is this approach okay" review - but I've highlighted a few things that will need to change eventually in any case.

I think this has to be opt in, since we're changing the way developers even define what is searchable for their blocks in the cms, let alone how. The general approach seems sensible - though I wonder if there will be projects that could benefit from this new config that won't just go straight to setting search_for_term_in_content to false... but in lieu of a way to validate that ahead of time, I see no reason to not go ahead with this.

docs/en/searching-blocks.md Outdated Show resolved Hide resolved
docs/en/searching-blocks.md Outdated Show resolved Hide resolved
docs/en/searching-blocks.md Outdated Show resolved Hide resolved
src/Extensions/ElementalPageExtension.php Show resolved Hide resolved
src/Extensions/ElementalPageExtension.php Show resolved Hide resolved
src/Models/BaseElement.php Outdated Show resolved Hide resolved
src/Models/BaseElement.php Outdated Show resolved Hide resolved
src/Models/BaseElement.php Outdated Show resolved Hide resolved
src/Models/BaseElement.php Outdated Show resolved Hide resolved
// Check whether the search term exists in the nested page content
$pageContent = $siteTree->getElementsForSearch();
} else {
$pageContent = $siteTree->getContentFromElementsForCmsSearch();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we're just looking at DB fields directly here... is there a way we can craft this as a db query instead of using stripos? I suspect that would be faster. Though if that's too complex to do right now we can always merge this PR and look at that as a potential enhancement down the line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah ...... technically. Though it would involve working out what all of the fields to query are ahead of time, and then putting all of those into a big $filter and putting :PartialMatch on all the things. PartialMatch translates to SQL WHERE <field> LIKE '%<searchterm>%'

Feels much more like a future ENH to me rather than a do now thing. It may be faster and should def use less PHP + memory, though more database

@emteknetnz emteknetnz force-pushed the pulls/5/search branch 4 times, most recently from eb993cd to af60a0d Compare July 3, 2023 01:20
Copy link
Member

@GuySartorelli GuySartorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give this a more thorough review once we've resolved the current conversations about the approach - but it sounds like we're getting to a consensus now.

docs/en/searching-blocks.md Outdated Show resolved Hide resolved
docs/en/searching-blocks.md Outdated Show resolved Hide resolved
src/Controllers/ElementSiteTreeFilterSearch.php Outdated Show resolved Hide resolved
src/Controllers/ElementSiteTreeFilterSearch.php Outdated Show resolved Hide resolved
src/Models/BaseElement.php Outdated Show resolved Hide resolved
src/Models/BaseElement.php Outdated Show resolved Hide resolved
@emteknetnz emteknetnz force-pushed the pulls/5/search branch 7 times, most recently from 8549742 to 0c03596 Compare July 3, 2023 06:17
@emteknetnz emteknetnz marked this pull request as ready for review July 3, 2023 06:29
src/Models/BaseElement.php Outdated Show resolved Hide resolved
src/Models/BaseElement.php Outdated Show resolved Hide resolved
src/Models/BaseElement.php Outdated Show resolved Hide resolved
src/Models/BaseElement.php Outdated Show resolved Hide resolved
src/Models/BaseElement.php Outdated Show resolved Hide resolved
src/Models/BaseElement.php Outdated Show resolved Hide resolved
src/Models/BaseElement.php Outdated Show resolved Hide resolved
src/Models/BaseElement.php Outdated Show resolved Hide resolved
src/Models/BaseElement.php Outdated Show resolved Hide resolved
tests/BaseElementTest.php Outdated Show resolved Hide resolved
@emteknetnz emteknetnz force-pushed the pulls/5/search branch 2 times, most recently from 39de753 to 9a55042 Compare July 4, 2023 04:12
Copy link
Member

@GuySartorelli GuySartorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, works well locally.

There's a subtle side-effect where when rendering templates, if the title of the block isn't displayed (which is the default for most blocks), it won't be included in search - but with the new feature, it will, which is actually a slight improvement to search fidelity 😅

@GuySartorelli GuySartorelli merged commit 331b4d3 into silverstripe:5 Jul 4, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants