New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Auto load pages on content search #6901
Conversation
@@ -40,7 +40,7 @@ public function actionResults() | |||
{ | |||
$resultSet = null; | |||
|
|||
$this->searchRequest = new SearchRequest(); | |||
$this->searchRequest = new SearchRequest(['pageSize' => 3]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurabakhtin Search queries are very resource-intensive. I would prefer a solution where we make a prefetch with a page size of e.g. 30 and cache it (content id's). And then use this cache for the streaming response and always deliver it in junks of 3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@luke- I have done the caching with implementing new option SearchRequest::$cachePageNumber
.
If call new SearchRequest(['pageSize' => 3, 'cachePageNumber' => 10])
it means 30 records will be cached on first request, i.e. 10 pages, when going to 11th page then next search process will be called and results will be cached as well for pages from 11 to 20.
Please note I have implemented new abstract method runSearch()
, review how old method search()
should be modified for SolrDriver like I did for Mysql
and ZendLucence
drivers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurabakhtin Is there a reason why you didn't add a new AbstractDriver::searchCached(SearchRequest $request): ResultSet)
method? (instead runSearch
) This could be non-abstract and only contain the page caching logic?
The real search logic could stay in the abstract AbstractDriver::search(SearchRequest $request): ResultSet)
method.
Both methods search
and searchCached
could stay public, and searchCached
just adds a layer on top of search
.
I see one disadvantage, because the search()
returns an array of content
models (ResultSet
). But I think if we get back 100
plain Content
models and only store the IDs
in the cache, the performance should be ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@luke- I did that way because I thought it is better to keep a calling the code SearchDriver->search(...)
from place where we should not have a logic. I.e. it would be better to call the method from Controller side and keep all logic inside the method search()
.
I mean a developer of the Search Controller part should not think about what code is executed inside the method search()
.
However maybe you are right because I have implemented it so the developer does it by new param 'cachePageNumber' => 10
, i.e. currently my code looks like this in the Search Controller:
$searchRequest = new SearchRequest(['cachePageNumber' => 10]);
$resultSet = $searchDriver->search($searchRequest);
and if I have understood you corrently new code should be like this:
$searchRequest = new SearchRequest();
$resultSet = $searchDriver->searchCached($searchRequest, $cachePageNumber = 10);
right? Should I redo it?
I see one disadvantage, because the
search()
returns an array ofcontent
models (ResultSet
). But I think if we get back100
plainContent
models and only store theIDs
in the cache, the performance should be ok.
Currently all Search Drivers return an array of Content
records.
MySQL does a code like Content::find()->where(...)->all()
.
Zend and Solr do this:
$content = Content::findOne(['id' => $contentId]);
if ($content !== null) {
$resultSet->results[] = $content;
} else {
throw new Exception('Could not load result! Content ID: ' . $contentId);
// ToDo: Delete Result
Yii::error("Could not load search result content: " . $contentId);
}
If we will modify all Search Drivers to return only Content IDs then we could remove the code from Zend and Solr drivers, so a removing the code Content::findOne(['id' => $contentId])
could improve a performance.
But if we will keep it then a performance may be worse, because the code Content::findOne()
will be run twice(before caching and after get from cache).
In additional we will need to implement new method ResultSet->getResults()
in order to use it instead of current code in view files: <?php foreach ($resultSet->results as $result): ?>
:
class ResultSet {
public function getResults(): array
{
return Content::findAll(['IN', 'id', $this->results]);
}
}
Do you agree such modifications?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@luke- I did that way because I thought it is better to keep a calling the code
SearchDriver->search(...)
from place where we should not have a logic. I.e. it would be better to call the method from Controller side and keep all logic inside the methodsearch()
. I mean a developer of the Search Controller part should not think about what code is executed inside the methodsearch()
. However maybe you are right because I have implemented it so the developer does it by new param'cachePageNumber' => 10
, i.e. currently my code looks like this in the Search Controller:$searchRequest = new SearchRequest(['cachePageNumber' => 10]); $resultSet = $searchDriver->search($searchRequest);
and if I have understood you corrently new code should be like this:
$searchRequest = new SearchRequest(); $resultSet = $searchDriver->searchCached($searchRequest, $cachePageNumber = 10);
right? Should I redo it?
Yes, please do.
I would prefer this, because then, this code block https://github.com/humhub/humhub/pull/6901/files#diff-b226b67b6a65e71e7d905c17f216f4eeaab4a2387a29d40ce5996999884fef29R39-R70 - which is rather complex is only in the searchCached
method, and it's always obvious caching related.
I see one disadvantage, because the
search()
returns an array ofcontent
models (ResultSet
). But I think if we get back100
plainContent
models and only store theIDs
in the cache, the performance should be ok.Currently all Search Drivers return an array of
Content
records. MySQL does a code likeContent::find()->where(...)->all()
. Zend and Solr do this:$content = Content::findOne(['id' => $contentId]); if ($content !== null) { $resultSet->results[] = $content; } else { throw new Exception('Could not load result! Content ID: ' . $contentId); // ToDo: Delete Result Yii::error("Could not load search result content: " . $contentId); }
If we will modify all Search Drivers to return only Content IDs then we could remove the code from Zend and Solr drivers, so a removing the code
Content::findOne(['id' => $contentId])
could improve a performance.But if we will keep it then a performance may be worse, because the code
Content::findOne()
will be run twice(before caching and after get from cache).In additional we will need to implement new method
ResultSet->getResults()
in order to use it instead of current code in view files:<?php foreach ($resultSet->results as $result): ?>
:class ResultSet { public function getResults(): array { return Content::findAll(['IN', 'id', $this->results]); } }
Do you agree such modifications?
Hmm, that`s tricky.
What about (not sure it's possible), maybe, when ResultSet
:
$results
attribute isContent[]|int[]
- Is serializable, on serialize, all
$results
entries which are type ofContent
are converted intoint
(content ID) - Implement Iterator, only related/required $results are always be loaded from int to object, ideally as batch using
IN
sql.
Like this (similar to your current approach):
class AbstractDriver {
public function searchCached(SearchRequest $request): ResultSet {
// $request->pageSize = 4
$largeResultSet = Yii::$app->cache->getOrSet('some-unique', function () {
$requestLarge = clone $request;
$requestLarge->pageSize * 10;
return $this->search($requestLarge);
});
return array_slice(/* of $largeResultSet */);
}
public function search(SearchRequest $request): ResultSet {
return $resultSet;
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please do.
I would prefer this, because then, this code block https://github.com/humhub/humhub/pull/6901/files#diff-b226b67b6a65e71e7d905c17f216f4eeaab4a2387a29d40ce5996999884fef29R39-R70 - which is rather complex is only in the
searchCached
method, and it's always obvious caching related.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurabakhtin Thanks, looks good for me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about (not sure it's possible), maybe, when
ResultSet
:
$results
attribute isContent[]|int[]
- Is serializable, on serialize, all
$results
entries which are type ofContent
are converted intoint
(content ID)- Implement Iterator, only related/required $results are always be loaded from int to object, ideally as batch using
IN
sql.
What kind of change does this PR introduce? (check at least one)
The PR fulfills these requirements:
develop
branch, not themaster
branch if no hotfixOther information:
Issue: https://github.com/humhub/humhub-internal/issues/207