Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long "simple search" queries get "stuck" in optimizer (using internal search)

drtjmb opened this issue Aug 13, 2013 · 1 comment


Copy link

drtjmb commented Aug 13, 2013

Internal simple search for N terms (eg. "foo bar baz" N = 3) produces an SQL query that performs multiple table joins (see below).

For N > 10 I have reports of the query remaining in the "statistics" state (ie. developing a query execution plan) for a very long time (hours).

MySQL chooses the optimum plan by looking at each possible plan; the number of plans is N! - so with a large number of joins there are a very large number of possible plans to be evaluated, in fact MySQL spends longer evaluating query plans than it would to execute the query.

Server setting optimizer_search_depth controls depth of plan evaluations (default = 62). Setting this to a low value limits the number of plans that will be evaluated - in the case of simple search setting this to 3 or 4 drastically reduces query times (to a few seconds). Obviously the trade off is that MySQL might miss the most optimal plan - but time saved by not looking for an optimal plan outweighed any performance benefit in this case.

Ideal solution is obviously to produce better SQL :)

Workaround is to set optimizer_search_depth at the global MySQL level or when EPrints connects to the DB - Database::connect():

--- perl_lib/EPrints/Database/     2013-08-13 13:19:21.965667487 +0100
+++ perl_lib/EPrints/Database/  2013-08-13 13:20:35.421666314 +0100
@@ -239,6 +239,8 @@
        if( $rc )
                $self->do("SET NAMES 'utf8'");
+               $self->do('SET @@session.optimizer_search_depth = 3;');

        return $rc;

Simplified SQL extract from running a simple search with N terms:

SELECT `eprint`.`eprintid` FROM 
    `eprint` LEFT JOIN `eprint__ordervalues_en` ON `eprint`.`eprintid`=`eprint__ordervalues_en`.`eprintid`, 
    (SELECT `eprint`.`eprintid` AS `eprintid` FROM <UNIONs searching for "foo">) AS `and_140576402459664_0`, 
    (SELECT `eprint`.`eprintid` AS `eprintid` FROM <UNIONs searching for "bar">) AS `and_140576402459664_1`, 
    (SELECT `eprint`.`eprintid` AS `eprintid` FROM <UNIONs searching for "baz">) AS `and_140576402459664_2`, 
    (SELECT `eprint`.`eprintid` AS `eprintid` FROM <UNIONS searching for Nth term>) AS `and_140576402459664_N',

(Thanks to DL, DBA Administrator @ UEA)

Copy link

sebastfr commented Aug 13, 2013

Ideal solution is obviously to produce better SQL

Or use Xapian?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet

No branches or pull requests

4 participants