Eliminate crippling performance of content search plugin for large sites with custom fields #18915

GeraintEdwards · 2017-11-29T14:16:50Z

Pull Request for Issue # .

For MySQL-based sites with a large number of articles (more than a few hundred) that have custom fields enabled, the com_content search plugin ends up being so slow that it can cripple a site completely. For a test dataset (details below) of 3,900 articles with 2 custom fields each, the database query for a search took between 700,000 and 1,300,000 milliseconds (roughly 12 to 22 minutes) to return a set of search results, on a 4 GHz i7-6700K (4 cores, 8 threads) with 32 GB of RAM. In addition, a 'match all' search on a multi-select field incorrectly returns zero results.

Postgres performs a LOT better (query times in the region of 400 milliseconds) but the changes described below still cut the query times by 50%.

Summary of Changes

The plugin has been rewritten. Rather than joining all the custom field value rows and searching them at the same time as the other content fields, we now search the custom fields first and incorporate the matching article ids into the main query. In Postgres this can be done with a subquery, but MySQL treats this independent subquery as a 'dependent subquery', with punishing performance implications, so for MySQL the ids are fetched in a separate query and passed into the main query as a list. This may partly explain why MySQL performs so badly with the existing code.
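A minimal sketch of the two strategies described above. The helper name `buildFieldIdClause`, its parameters and the `$fetchIds` callback are hypothetical and stand in for the plugin's real query code; the sketch only shows how the WHERE fragment differs between MySQL (ids fetched first and inlined) and engines that handle the independent subquery well (subquery embedded).

```php
<?php
/**
 * Hypothetical helper (not the plugin's actual code): build the WHERE
 * fragment restricting articles to those whose custom field values matched.
 *
 * @param   string    $serverType  Database server type, e.g. 'mysql' or 'postgresql'
 * @param   string    $subQuery    SQL that selects the matching fields_values item ids
 * @param   callable  $fetchIds    Runs $subQuery and returns the ids (used on MySQL only)
 * @param   string    $idColumn    Article id expression; pass a cast expression where the engine needs one
 *
 * @return  string|null  The WHERE fragment, or null when nothing matched
 */
function buildFieldIdClause($serverType, $subQuery, callable $fetchIds, $idColumn = 'a.id')
{
	if ($serverType === 'mysql')
	{
		// Run the field search as a separate query so MySQL cannot execute it
		// as a dependent subquery once per article; inline the ids as integers
		$ids = array_map('intval', $fetchIds($subQuery));

		return count($ids) ? $idColumn . ' IN (' . implode(',', $ids) . ')' : null;
	}

	// Engines that optimise the independent subquery correctly get it embedded
	return $idColumn . ' IN (' . $subQuery . ')';
}
```

The `$idColumn` parameter exists because, as alikon points out further down the thread, PostgreSQL needs the article id cast before it can be compared with the VARCHAR `item_id` column.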

Testing Instructions

This test requires a large dataset of articles and custom fields. It can be generated using com_overload (Copyright (C) 2011 Nicholas K. Dionysopoulos). An updated version is attached below that works with Joomla 3.8 and also creates 2 custom fields and populates them for each article.

Install this component and open it in the backend. A decent dataset to test with uses 3 categories, 3 levels and 100 articles, which gives 3,900 articles (WARNING: generating the data takes several minutes).
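Assuming com_overload creates 3 top-level categories, gives every category 3 child categories down to the third level, and puts 100 articles in each category (an assumption, but one that is consistent with the 3,900 figure), the total works out as:

```latex
(3 + 3^2 + 3^3) \times 100 = (3 + 9 + 27) \times 100 = 39 \times 100 = 3900
```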

This adds a text field with a random value from "one" to "seven" and a multi-select list field with 2 random values from opt1, opt2, opt3, opt4 and opt5.

Then use the search module/com_search in the frontend and search for

"seven" or "opt1" or "opt1 opt2" etc.

Expected result

1. The correct set of results within a couple of seconds at most.

2. Searching for 'opt1 opt2' with 'match all' should return the matching set of articles.

Actual result

1. Go and make a cup of coffee and then drink it, call your friends, have another cup of coffee: depending on your processor, you are likely to be waiting 20 minutes for the results to return.
2. The search for 'opt1 opt2' with 'match all' returns NO results, which is NOT correct since there are several hundred matches.

If you apply the patch you will get a VERY fast result for 1. and the correct results for 2.

Discussion Point

I have tested the changes on Postgres, which handles the subquery efficiently, so there it is not necessary to fetch a separate list of ids with a different query. I have not been able to test this on SQL Server.

Documentation Changes Required

None

GeraintEdwards · 2017-11-29T14:18:07Z

This is the version of com_overload that can generate the test dataset:
com_overload.zip

nibra

Good approach, actually exactly what I was thinking of. Just a few things:

Use Joomla's code style (e.g. the diff is broken because you're not formatting your code correctly)
Remove unused code instead of commenting it out
In the cleanup loop, you have all the article ids for the custom fields already, so you could retrieve all of the custom fields at once outside of the loop.
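The suggestion in the last point can be sketched as one query plus a single grouping pass in PHP: fetch all field values with `WHERE item_id IN (...)` for the collected article ids, then index them by article so the loop does lookups instead of per-article queries. The helper name and the row shape (`item_id`, `field_id`, `value`, modelled on `#__fields_values`) are assumptions, as is the idea that a multi-value field contributes one row per selected value.

```php
<?php
/**
 * Hypothetical helper: group custom field value rows, as returned by a single
 * "WHERE item_id IN (...)" query, by article id so the cleanup loop can look
 * them up instead of querying once per article.
 *
 * @param   array  $rows  Each row: ['item_id' => string, 'field_id' => int, 'value' => string]
 *
 * @return  array  article id => field id => list of values
 */
function groupFieldValuesByItem(array $rows)
{
	$grouped = array();

	foreach ($rows as $row)
	{
		// item_id is a VARCHAR column, so normalise it to an integer key
		$itemId  = (int) $row['item_id'];
		$fieldId = (int) $row['field_id'];

		// Assumed: a multi-select field yields one row per selected value
		$grouped[$itemId][$fieldId][] = $row['value'];
	}

	return $grouped;
}
```

Inside the loop, `$grouped[$article->id]` (if set) then holds every field value for that article.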

GeraintEdwards · 2017-11-29T18:49:14Z

Agree entirely about the article ids and fetching the custom fields in one go. We need to do the same elsewhere in custom fields too. I'll look at the code formatting, but my IDE is set up to match Joomla's standards already.

mbabker · 2017-11-29T18:52:24Z

I'll look at the code formatting - but my IDE is setup to match Joomla's standards already

There's an extra indent on the method's code. Take that out and it should be mostly OK at a cursory glance.

GeraintEdwards · 2017-11-29T19:15:22Z

Thanks @mbabker, not sure where the extra tab came from. That has tidied up the diff a lot.

Do we really care about aligning assignment operators?

@nibra I'll do a new PR later to implement the idea of picking up all the field values in one pass. In my opinion we need to do the same elsewhere in the core too, instead of running duplicated or repetitive queries.

brianteeman · 2017-11-29T19:18:46Z

Do we really care about aligning assignment operators?

It does make code more readable.

csthomas · 2017-11-29T19:23:22Z

plugins/search/content/content.php

+				}
+
+				if ($serverType == "mysql")
+				{


IMO we should always use this variant, i.e. 2 queries. A complex query takes up too much memory.

In an ideal world the subquery approach should be the fastest, since the database engine can do its own optimisations. Unfortunately MySQL had a bug where the independent subquery is treated as if it were dependent and gets executed once for every article searched (with a ridiculous performance hit).

Postgres treats the query correctly and is about twice as fast with a subquery as it is when the field ids are passed in as a string.

So it's a trade-off between easier-to-read code and a small performance hit in Postgres.

alikon · 2017-11-29T19:28:28Z

I've made a quick test on MySQL and after applying the PR there's no need to

make a cup of coffee and then drink it, call your friends, have another cup of coffee

to get results 👍

I'll run some tests on PostgreSQL too.

Quy · 2017-11-29T19:32:08Z

plugins/search/content/content.php

+
+						if ($serverType == "mysql")
+						{
+


Remove blank line.

Quy · 2017-11-29T19:32:24Z

plugins/search/content/content.php

+							$fieldids = $db->loadColumn();
+							if (count($fieldids))
+							{
+


Remove blank line.

Quy · 2017-11-29T19:32:50Z

plugins/search/content/content.php

+
+					if ($serverType == "mysql")
+					{
+


Remove blank line.

Quy · 2017-11-29T19:33:00Z

plugins/search/content/content.php

+						$fieldids = $db->loadColumn();
+						if (count($fieldids))
+						{
+


Remove blank line.

nibra · 2017-11-29T23:04:03Z

plugins/search/content/content.php

+			  ->where('(f.context IS NULL OR f.context = ' . $db->q('com_content.article') . ')')
+			  ->where('(f.state IS NULL OR f.state = 1)')
+			  ->where('(f.access IS NULL OR f.access IN (' . $groups . '))');
+			 */



Remove this dead code

Do you still need the comment? If yes, neded is misspelled.

angieradtke · 2017-11-30T08:08:35Z

I tested your changes on a real project: 8 fields per article across 6,000 articles in 4 different languages.
It seems to work .-)
Thanks, Angie

angieradtke · 2017-11-30T08:23:01Z

One more idea: maybe we should add a parameter, "search in fields as well", so it is more flexible.

laoneo · 2017-11-30T08:38:04Z

Thanks @GeraintEdwards for the proper test instructions. Now I could easily reproduce the issue on my dev server.
But is there really no way to do this independently of the server type, and in a simpler way? I tried a different approach in #18929 that splits up the queries. What do you think?

GeraintEdwards · 2017-11-30T10:48:42Z

Hi @laoneo - see my comments on your other PR.

Re the server-type-dependent code: as I explained in my comment in the code, the subquery approach should be the fastest, since the database engine can do its own optimisations. Unfortunately MySQL had a bug where the independent subquery is treated as if it were dependent and gets executed once for every article searched (with a ridiculous performance hit).

Postgres treats the query correctly and is about twice as fast with a subquery as it is when the field ids are passed in as a string.

So it's a trade-off between easier-to-read code and a small performance hit in Postgres.

GeraintEdwards · 2017-11-30T11:07:50Z

@angieradtke I suppose it does make sense to be able to mark specific fields as 'searchable' and others not. That would require a new config option for individual fields. @laoneo what do you think?

laoneo · 2017-11-30T11:52:51Z

What's the reason not to search in a field? If you want such an option, I would add it as a parameter of the search plugin that defines which fields to search.

laoneo · 2017-11-30T11:55:12Z

So its a trade off between easier to read code and a small performance hit in Postgres.

I'm more worried about other extension developers who want to implement a search plugin. The content search plugin mostly serves as their example, and it is already very complex.

alikon · 2017-12-02T08:18:07Z

I have tested this item 🔴 unsuccessfully on 5eab93c

on postgresql

This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/18915.

alikon · 2017-12-02T08:26:39Z

The offending statement is a.id IN (cfv.item_id),
because we are comparing the id field of the #__content table, which is a SERIAL, with the item_id field of the #__fields_values table, which is a VARCHAR.

We need an explicit cast:

from
$wheres2[] = 'a.id IN( ' . (string) $subQuery . ')';
to
$wheres2[] = 'CAST(a.id AS VARCHAR) IN(' . (string) $subQuery . ')';

lines 125,177,216
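A minimal sketch of applying the cast only where it is needed. The `CAST(a.id AS VARCHAR)` text follows alikon's suggestion above; the helper names and the `'postgresql'` server type string are assumptions, and later in this PR the cast is produced with `$subQuery->castAsChar('a.id')` instead of a hand-written string. SQL Server behaviour was not tested in this thread.

```php
<?php
/**
 * Hypothetical helper: the article id expression to compare with
 * #__fields_values.item_id, which is a VARCHAR column.
 */
function articleIdExpression($serverType)
{
	if ($serverType === 'postgresql')
	{
		// PostgreSQL has no integer = varchar operator, so cast the id to text
		return 'CAST(a.id AS VARCHAR)';
	}

	// MySQL converts between INT and VARCHAR implicitly
	return 'a.id';
}

/**
 * Hypothetical helper: the "id IN (subquery)" fragment with the cast applied.
 */
function fieldSubQueryClause($serverType, $subQuery)
{
	return articleIdExpression($serverType) . ' IN (' . $subQuery . ')';
}
```

Casting the integer side to text rather than the VARCHAR side to integer avoids conversion errors on non-numeric `item_id` values, at the cost of not being able to use the integer index on `a.id` (GeraintEdwards notes below that a castAsInteger would be more efficient).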

alikon

use CAST for postgresql

alikon · 2017-12-02T08:27:11Z

plugins/search/content/content.php

+					}
+					else
+					{
+						$wheres[] = 'a.id IN( ' . (string) $subQuery . ')';


use CAST for postgresql

alikon · 2017-12-02T08:27:45Z

plugins/search/content/content.php

+						}
+						else
+						{
+							$wheres2[] = 'a.id IN( ' . (string) $subQuery . ')';


use CAST for postgresql

alikon · 2017-12-02T08:28:00Z

plugins/search/content/content.php

+					$fieldids = $db->loadColumn();
+					if (count($fieldids))
+					{
+						$wheres2[] = 'a.id IN(' . implode(",", $fieldids) . ')';


use CAST for postgresql

alikon · 2017-12-02T08:30:12Z

With these changes in place, performance is better even on PostgreSQL.

wilsonge · 2017-12-02T17:14:57Z

Removing RTC so we can fix postgres :)

GeraintEdwards · 2017-12-03T09:24:28Z

Really sorry about the Postgres issue: I had 2 development sites running and fixed the casting in the Postgres version but hadn't committed from there.

JDatabaseQuery doesn't implement castAsInteger, which would be the most efficient casting/matching, so I'm using castAsChar on a.id for the match instead.

alikon · 2017-12-03T11:19:23Z

JDatabaseQuery doesn't implement castAsInteger which would be the most efficient casting/matching so I'm using castAsChar or a.id for the match instead.

Good catch, I hope #18961 will fill the gap.

We should remember to use it if and when #18961 is merged, but for now we should go ahead without it 😍

alikon · 2017-12-03T11:20:21Z

I have tested this item ✅ successfully on 351b0c6

MySQL & PostgreSQL

nibra · 2017-12-03T11:50:16Z

@angieradtke, @tonypartridge, could you please re-test?

angieradtke · 2017-12-03T14:00:18Z

Second retest successful on a live project .-)

nibra

Only one CS issue left: Please add spaces around concat operators. Then all checks will pass and this is ready for commit.

nibra · 2017-12-03T14:38:20Z

plugins/search/content/content.php

+				}
+				else
+				{
+					$wheres2[] = $subQuery->castAsChar('a.id').' IN( ' . (string) $subQuery . ')';


Please add spaces around concat operator

nibra · 2017-12-03T14:39:02Z

plugins/search/content/content.php

+						}
+						else
+						{
+							$wheres2[] = $subQuery->castAsChar('a.id').' IN( ' . (string) $subQuery . ')';


Please add spaces around concat operator

nibra · 2017-12-03T14:39:22Z

plugins/search/content/content.php

+					}
+					else
+					{
+						$wheres[] = $subQuery->castAsChar('a.id').' IN( ' . (string) $subQuery . ')';


Please add spaces around concat operator

ghost · 2017-12-03T14:50:23Z

@angieradtke please mark your test as successful:

Open the Issue Tracker
Log in with your GitHub account
Click the blue "Test this" button above the author's picture
Mark your test as successful
Hit "Submit test result"

angieradtke · 2017-12-03T16:06:52Z

I have tested this item ✅ successfully on 6e1c893

nibra

Thank you!

wilsonge · 2017-12-03T16:31:10Z

I tweaked a couple of small c/s issues and merged. Thank you so much for this :)

joomlabeat · 2018-01-18T20:28:12Z

Do we actually need the GROUP BY statement in the plugin's query? It doesn't seem to be needed here, and removing it results in an extra ~30% gain in performance.

zero-24 · 2018-01-18T21:26:20Z

@joomlabeat as you can see, this is an old issue / merged PR. Can you please open a new issue linking to this one? Otherwise this will get missed quickly. Thanks!

nibra · 2018-01-18T22:43:15Z

Yes, the GROUP BY is required by the SQL standard (MySQL can deal without it, others can't). AFAIK the drivers do not build the GROUP BY clauses transparently when needed (which is what they should do).

csthomas · 2018-01-19T08:06:30Z

@joomlabeat is right, line 288 can be completely removed.

There is only (#__content INNER JOIN #__categories), there are no duplicates and the query has no aggregate function.

geraintedwards and others added 5 commits October 3, 2017 12:54

proxy function for window.listItemTask shouljd have a return value to maintain backwards compatability Fixes issue joomla#18203

68a4875

Merge remote-tracking branch 'upstream/staging' into staging

f6cbb9d

Massive performance gain for MySQL searches by com_content search plugin against custom fields

96bb234

Make distinction between mysql and postgres/other dbs where the problem of dependent sub-queries isn't an issue

4023d45

Make distinction between mysql and postgres/other dbs where the problem of dependent sub-queries isn't an issue

954b863

joomla-cms-bot added the PR-staging label Nov 29, 2017

code sniffer failures

f558fac

nibra requested changes Nov 29, 2017

Removed extra indent on method code

db6012b

csthomas reviewed Nov 29, 2017

Quy reviewed Nov 29, 2017

nibra reviewed Nov 29, 2017

code styling changes (as per comments)

8065805

removing commented out code

laoneo mentioned this pull request Nov 30, 2017

[com_fields] Simplify content search in fields #18929

Closed

joomla-cms-bot added the RTC This Pull Request is Ready To Commit label Dec 1, 2017

alikon suggested changes Dec 2, 2017

wilsonge removed the RTC This Pull Request is Ready To Commit label Dec 2, 2017

GeraintEdwards added 2 commits December 3, 2017 09:05

custom fields item id needs casting to an integer for Postgre

ab2916a

minor code styling tweaks

5bfd4e5

mysql doesn't support CAST so apply only to postgres code on a.id (not as efficient)

351b0c6

alikon mentioned this pull request Dec 3, 2017

improve CAST() with CAST('field' AS integer) #18961

Closed

nibra requested changes Dec 3, 2017

CS

6e1c893

nibra approved these changes Dec 3, 2017

Fix small code style issues

d3726fb

wilsonge merged commit 50e1d61 into joomla:staging Dec 3, 2017

wilsonge added this to the Joomla 3.8.3 milestone Dec 3, 2017

Quy mentioned this pull request Apr 25, 2018

Search fields SQL query breaks search #18838

Closed
