Eliminate crippling performance of content search plugin for large sites with custom fields #18915
Pull Request for Issue # .
For MySQL-based sites with a large number of articles (more than a few hundred) that have custom fields enabled, the com_content search plugin ends up being so slow that it can cripple a site completely. For a test dataset (details below) of 3,900 articles with 2 custom fields each, the database query for a search took between 700,000 and 1,300,000 milliseconds (roughly 12 to 22 minutes) to return a set of search results, on an i7-6700K 4 GHz 8-thread processor with 32 GB of RAM. In addition, an 'All words' search on a multi-select field incorrectly returns zero results.
Postgres performs a lot better (query times of around 400 milliseconds), but the changes described below still cut query times by 50%.
Summary of Changes
The plugin has been rewritten. Rather than joining all the custom field rows and searching them at the same time as all the other content fields, it now searches the custom fields first and incorporates the results into the main query. In Postgres this can be done as a subquery, but MySQL treats this independent subquery as a 'dependent subquery', with punishing performance implications. This may partly explain why MySQL performs so badly with the existing code.
Testing this requires a large dataset of articles with custom fields. It can be generated using com_overload (Copyright (C) 2011 Nicholas K. Dionysopoulos); an updated version, attached below, works in Joomla 3.8 and also creates 2 custom fields and populates them for the articles.
Install this component and open it in the backend. A decent dataset to test with uses 3 categories, 3 levels and 100 articles, which gives 3,900 articles (WARNING: generating the data takes several minutes).
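Assuming com_overload creates 3 top-level categories, gives every category 3 subcategories down to 3 levels, and places 100 articles in each category, the article count works out as:

```latex
\underbrace{3 + 3^{2} + 3^{3}}_{\text{categories}} = 39, \qquad 39 \times 100 = 3900 \ \text{articles}
```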
It adds a text field containing a random value from "one" to "seven", and a multi-select list field holding 2 random values from opt1, opt2, opt3, opt4 and opt5.
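A rough Python stand-in for the generated field data. It assumes that each selected option of a multi-select field is stored as its own row in the field values table; the function name `generate_field_rows`, the field names and the row layout are hypothetical.

```python
import random

TEXT_VALUES = ["one", "two", "three", "four", "five", "six", "seven"]
LIST_OPTIONS = ["opt1", "opt2", "opt3", "opt4", "opt5"]


def generate_field_rows(article_ids, seed=0):
    """Return (field_name, item_id, value) rows for the two test custom fields."""
    rng = random.Random(seed)
    rows = []
    for article_id in article_ids:
        # Text field: a single random value from "one" to "seven".
        rows.append(("text_field", str(article_id), rng.choice(TEXT_VALUES)))
        # Multi-select field: 2 distinct random options, one row per selected option.
        for option in rng.sample(LIST_OPTIONS, 2):
            rows.append(("list_field", str(article_id), option))
    return rows


rows = generate_field_rows(range(1, 3901))
print(len(rows))  # 3 rows per article: 11700
```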
Then use the search module or com_search in the frontend and search for "seven", "opt1", "opt1 opt2", etc.
Expected result: the correct set of results within a couple of seconds at most.
If you apply the patch, you will get a VERY FAST result for the slow searches and the correct results for the 'All words' search on the multi-select field.
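One plausible reason an 'All words' search on a multi-select field returns nothing is that, if each selected option is stored as a separate row, no single row contains every word. The sketch below (Python sqlite3; the table layout and both queries are illustrative assumptions, not the plugin's code) compares per-row matching with grouping the rows per item and counting how many words the item covers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fields_values (field_id INTEGER, item_id TEXT, value TEXT)")
# Article 7 has opt1 and opt2 selected, stored as two separate rows.
conn.executemany("INSERT INTO fields_values VALUES (?, ?, ?)",
                 [(2, "7", "opt1"), (2, "7", "opt2"), (2, "8", "opt1"), (2, "8", "opt3")])

words = ["opt1", "opt2"]
patterns = [f"%{w}%" for w in words]

# Per-row matching: every word must appear in the same row's value.
row_cond = " AND ".join("value LIKE ?" for _ in words)
per_row = conn.execute(
    f"SELECT DISTINCT item_id FROM fields_values WHERE {row_cond}", patterns).fetchall()

# Per-item matching: MAX(...) is 1 if any row of the item matches that word;
# the item qualifies when the number of matched words equals len(words).
item_cond = " + ".join("MAX(value LIKE ?)" for _ in words)
per_item = conn.execute(
    f"SELECT item_id FROM fields_values GROUP BY item_id HAVING {item_cond} = ?",
    patterns + [len(words)]).fetchall()

print(per_row)   # []
print(per_item)  # [('7',)]
```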
I have tested the changes on Postgres, where a subquery can be used instead of a separate list of ids from a different query. I have not been able to test this on SQL Server.
Documentation Changes Required
… maintain backwards compatibility Fixes issue #18203
…gin against custom fields
…em of dependent sub-queries isn't an issue
nibra left a comment:
Good approach, actually exactly what I was thinking of. Just a few things:
Thanks @mbabker, not sure where the extra tab came from. That has tidied up the diff a lot.
Do we really care about aligning assignment operators?
@nibra I'll open a new PR later to implement the idea of fetching all the field values in one pass. In my opinion we need to do the same elsewhere in core too, instead of running duplicated or repetitive queries.
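A sketch of the "one pass" idea: instead of querying field values separately for each article, fetch them for all result ids with a single `IN` query and group them in memory. Python with sqlite3; the table layout and the function name `load_field_values` are hypothetical.

```python
import sqlite3
from collections import defaultdict


def load_field_values(conn, item_ids):
    """Fetch custom field values for many items with a single query."""
    if not item_ids:
        return {}
    placeholders = ",".join("?" * len(item_ids))
    rows = conn.execute(
        f"SELECT item_id, field_id, value FROM fields_values "
        f"WHERE item_id IN ({placeholders}) ORDER BY item_id, field_id, value",
        [str(i) for i in item_ids])  # item_id is stored as text in this sketch
    values = defaultdict(lambda: defaultdict(list))
    for item_id, field_id, value in rows:
        # Multi-value fields produce several rows; collect them into a list.
        values[item_id][field_id].append(value)
    return {item: dict(fields) for item, fields in values.items()}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fields_values (field_id INTEGER, item_id TEXT, value TEXT)")
conn.executemany("INSERT INTO fields_values VALUES (?, ?, ?)",
                 [(1, "1", "seven"), (2, "1", "opt1"), (2, "1", "opt2"), (1, "2", "three")])
print(load_field_values(conn, [1, 2]))
# {'1': {1: ['seven'], 2: ['opt1', 'opt2']}, '2': {1: ['three']}}
```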
Hi @laoneo - see my comments on your other PR.
Re the server-type-dependent code: as I explained in my comment in the code, the subquery approach should be the fastest, since the database engine can apply its own optimisations. Unfortunately, MySQL has a bug where the independent subquery is treated as if it were dependent and gets executed once for every article searched, with a ridiculous performance hit.
Postgres handles the query correctly and is about twice as fast with a subquery as with the ids passed in as a string.
So it's a trade-off between easier-to-read code and a small performance hit in Postgres.
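A sketch of the server-type switch being discussed: use an `IN` subquery where the engine plans it as independent (Postgres), and otherwise run the field search once and inline the resulting ids. The function name `custom_field_condition`, the server-type strings and the table aliases are assumptions for illustration; the real plugin builds its query in PHP.

```python
def custom_field_condition(server_type, field_search_sql, run_query):
    """Return the WHERE fragment that pulls in articles matched via custom fields.

    field_search_sql: SQL selecting matching item ids from the field values table.
    run_query: callable that executes SQL and returns a list of ids.
    """
    if server_type == "postgresql":
        # Let the engine plan the subquery itself; a.id is cast to text
        # because item_id is assumed to be a varchar column.
        return f"a.id::text IN ({field_search_sql})"
    # MySQL may treat the subquery as dependent and re-run it per article,
    # so execute the field search once and inline the distinct ids instead.
    ids = sorted({int(i) for i in run_query(field_search_sql)})
    if not ids:
        return "0 = 1"  # always false, so it adds nothing when OR-ed into the main query
    return "a.id IN (" + ",".join(str(i) for i in ids) + ")"


sql = "SELECT fv.item_id FROM fields_values AS fv WHERE fv.value LIKE '%opt1%'"
print(custom_field_condition("mysql", sql, lambda q: ["12", "3", "12"]))  # a.id IN (3,12)
```

The cost of the branch is two code paths to maintain, which is the readability side of the trade-off.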
The offending statement is when we have
We need an explicit cast.
Really sorry about the Postgres issue. I had 2 development sites running and had fixed the casting in the Postgres one, but hadn't committed from there.
JDatabaseQuery doesn't implement castAsInteger, which would be the most efficient cast for the match, so I'm using castAsChar on a.id instead.
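A sketch of the cast issue, assuming the field values table stores `item_id` as a varchar while the article `id` is an integer. Postgres will not compare an integer with a varchar implicitly (it reports that the operator does not exist), so one side needs a cast; casting `a.id` to text mirrors the castAsChar approach. The helper `cast_as_char` is hypothetical and only imitates per-driver casting in Python; it is not the JDatabaseQuery API. sqlite3 stands in for the database.

```python
import sqlite3


def cast_as_char(server_type, expr):
    """Hypothetical helper: cast an SQL expression to a character type."""
    if server_type == "postgresql":
        return f"{expr}::text"          # Postgres cast syntax
    return f"CAST({expr} AS CHAR)"      # standard CAST form for other engines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE fields_values (field_id INTEGER, item_id VARCHAR(255), value TEXT)")
conn.executemany("INSERT INTO content VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
conn.executemany("INSERT INTO fields_values VALUES (?, ?, ?)", [(1, "2", "opt1"), (1, "3", "opt2")])

# Compare text with text: the integer id is cast to match the varchar item_id.
match = cast_as_char("sqlite", "a.id")
sql = (f"SELECT a.id FROM content AS a WHERE {match} IN "
       "(SELECT fv.item_id FROM fields_values AS fv WHERE fv.value LIKE ?) ORDER BY a.id")
rows = conn.execute(sql, ("%opt1%",)).fetchall()
print(rows)                                # [(2,)]
print(cast_as_char("postgresql", "a.id"))  # a.id::text
```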
Good catch, I hope #18961 will fill the gap.
We should remember to use it if and when #18961 is merged, but for now we should go ahead without it.