Join GitHub today
Archiving performance: cache the slow segment subqueries that are using LIKE constraint #8850
if we are using segment with
My proposal is to run such subqueries at the beginning of archiving process and cache them in a file. Therefore query for each report in Piwik will use the list of IDs instead of executing a subquery which may be really slow.
Here are some results for a subquery that returns more than 2000 action IDs:
FYI: I replaced real URL with an example.
Some calculations. In the best case we save here 21 seconds. Assuming that in Piwik we have 100 reports, it would speed up the archiving process by more than 30mins.
Some additional info:
Subquery cache is enabled.
Not easily do-able. and not sure that we want to try this... It's complex: a segment like
A thought: one of the issue is that the subquery is slow because full text searching in the large table can be slow. To improve the speed of full text searching (ie. Contains, Does not contain) idea suggested by @diosmosis - maybe we could use a FULL TEXT index on the
(we should use Boolean full-text searches to go above 50% threshold, and tweak appropriately innodb_ft_cache_size and innodb_ft_total_cache_size to size the fulltext index enough for the log_action table)
First thing to check when experiencing this issue is:
In general, I didn't jump into implementing this solution because it has challenges
One use case where caching IDs won't work
Imagine you create a edge case segment "Page URL contains 'i'" on demo.piwik.org (a medium sized piwik)
Creating a temp table with idaction has 1,255,519 results (took ~ 3 seconds).
Running the query re-using the temporary table (took 35 seconds:)
---> in this case, because there are 1.2 M ids, printing IDs in query would not work and exceed
This was referenced
Sep 28, 2015
@mattab have you tested it with browser archiving? I received feedback that it doesn't help in that case. Is it possible that each concurrent report request is trying to run segment subquery to warmup the cache? Currently it helps for sure with CLI archiving but we need to check browser archiving as well.