Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto load pages on content search #6901

Merged
merged 8 commits into from Apr 2, 2024
Merged

Conversation

yurabakhtin
Copy link
Contributor

@yurabakhtin yurabakhtin commented Mar 26, 2024

What kind of change does this PR introduce? (check at least one)

  • Feature

The PR fulfills these requirements:

  • It's submitted to the develop branch, not the master branch if no hotfix
  • All tests are passing
  • Changelog was modified

Other information:
Issue: https://github.com/humhub/humhub-internal/issues/207

@yurabakhtin yurabakhtin requested a review from luke- March 26, 2024 14:37
@@ -40,7 +40,7 @@ public function actionResults()
{
$resultSet = null;

$this->searchRequest = new SearchRequest();
$this->searchRequest = new SearchRequest(['pageSize' => 3]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurabakhtin Search queries are very resource-intensive. I would prefer a solution where we make a prefetch with a page size of e.g. 30 and cache it (content id's). And then use this cache for the streaming response and always deliver it in junks of 3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luke- I think it is possible like I did here for Search Providers, but we need to do an additional modification in the pagination part.
I have already started this but don't have time to finish this today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luke- I have done the caching with implementing new option SearchRequest::$cachePageNumber.
If call new SearchRequest(['pageSize' => 3, 'cachePageNumber' => 10]) it means 30 records will be cached on first request, i.e. 10 pages, when going to 11th page then next search process will be called and results will be cached as well for pages from 11 to 20.

Please note I have implemented new abstract method runSearch(), review how old method search() should be modified for SolrDriver like I did for Mysql and ZendLucence drivers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurabakhtin Is there a reason why you didn't add a new AbstractDriver::searchCached(SearchRequest $request): ResultSet) method? (instead runSearch) This could be non-abstract and only contain the page caching logic?

The real search logic could stay in the abstract AbstractDriver::search(SearchRequest $request): ResultSet) method.

Both methods search and searchCached could stay public, and searchCached just adds a layer on top of search.


I see one disadvantage, because the search() returns an array of content models (ResultSet). But I think if we get back 100 plain Content models and only store the IDs in the cache, the performance should be ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luke- I did that way because I thought it is better to keep a calling the code SearchDriver->search(...) from place where we should not have a logic. I.e. it would be better to call the method from Controller side and keep all logic inside the method search().
I mean a developer of the Search Controller part should not think about what code is executed inside the method search().
However maybe you are right because I have implemented it so the developer does it by new param 'cachePageNumber' => 10, i.e. currently my code looks like this in the Search Controller:

$searchRequest = new SearchRequest(['cachePageNumber' => 10]);

$resultSet = $searchDriver->search($searchRequest);

and if I have understood you corrently new code should be like this:

$searchRequest = new SearchRequest();

$resultSet = $searchDriver->searchCached($searchRequest, $cachePageNumber = 10);

right? Should I redo it?


I see one disadvantage, because the search() returns an array of content models (ResultSet). But I think if we get back 100 plain Content models and only store the IDs in the cache, the performance should be ok.

Currently all Search Drivers return an array of Content records.
MySQL does a code like Content::find()->where(...)->all().
Zend and Solr do this:

$content = Content::findOne(['id' => $contentId]);
if ($content !== null) {
    $resultSet->results[] = $content;
} else {
    throw new Exception('Could not load result! Content ID: ' . $contentId);
    // ToDo: Delete Result
    Yii::error("Could not load search result content: " . $contentId);
}

If we will modify all Search Drivers to return only Content IDs then we could remove the code from Zend and Solr drivers, so a removing the code Content::findOne(['id' => $contentId]) could improve a performance.

But if we will keep it then a performance may be worse, because the code Content::findOne() will be run twice(before caching and after get from cache).

In additional we will need to implement new method ResultSet->getResults() in order to use it instead of current code in view files: <?php foreach ($resultSet->results as $result): ?>:

class ResultSet {
    public function getResults(): array
    {
        return Content::findAll(['IN', 'id', $this->results]);
    }
}

Do you agree such modifications?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luke- I did that way because I thought it is better to keep a calling the code SearchDriver->search(...) from place where we should not have a logic. I.e. it would be better to call the method from Controller side and keep all logic inside the method search(). I mean a developer of the Search Controller part should not think about what code is executed inside the method search(). However maybe you are right because I have implemented it so the developer does it by new param 'cachePageNumber' => 10, i.e. currently my code looks like this in the Search Controller:

$searchRequest = new SearchRequest(['cachePageNumber' => 10]);

$resultSet = $searchDriver->search($searchRequest);

and if I have understood you corrently new code should be like this:

$searchRequest = new SearchRequest();

$resultSet = $searchDriver->searchCached($searchRequest, $cachePageNumber = 10);

right? Should I redo it?

Yes, please do.

I would prefer this, because then, this code block https://github.com/humhub/humhub/pull/6901/files#diff-b226b67b6a65e71e7d905c17f216f4eeaab4a2387a29d40ce5996999884fef29R39-R70 - which is rather complex is only in the searchCached method, and it's always obvious caching related.

I see one disadvantage, because the search() returns an array of content models (ResultSet). But I think if we get back 100 plain Content models and only store the IDs in the cache, the performance should be ok.

Currently all Search Drivers return an array of Content records. MySQL does a code like Content::find()->where(...)->all(). Zend and Solr do this:

$content = Content::findOne(['id' => $contentId]);
if ($content !== null) {
    $resultSet->results[] = $content;
} else {
    throw new Exception('Could not load result! Content ID: ' . $contentId);
    // ToDo: Delete Result
    Yii::error("Could not load search result content: " . $contentId);
}

If we will modify all Search Drivers to return only Content IDs then we could remove the code from Zend and Solr drivers, so a removing the code Content::findOne(['id' => $contentId]) could improve a performance.

But if we will keep it then a performance may be worse, because the code Content::findOne() will be run twice(before caching and after get from cache).

In additional we will need to implement new method ResultSet->getResults() in order to use it instead of current code in view files: <?php foreach ($resultSet->results as $result): ?>:

class ResultSet {
    public function getResults(): array
    {
        return Content::findAll(['IN', 'id', $this->results]);
    }
}

Do you agree such modifications?

Hmm, that`s tricky.

What about (not sure it's possible), maybe, when ResultSet:

  • $results attribute is Content[]|int[]
  • Is serializable, on serialize, all $results entries which are type of Content are converted into int (content ID)
  • Implement Iterator, only related/required $results are always be loaded from int to object, ideally as batch using IN sql.

Like this (similar to your current approach):

class AbstractDriver {

   public function searchCached(SearchRequest $request): ResultSet {
         // $request->pageSize = 4

         $largeResultSet = Yii::$app->cache->getOrSet('some-unique', function () {
                    $requestLarge = clone $request;
                    $requestLarge->pageSize * 10;
                    return $this->search($requestLarge);
         });

         return array_slice(/* of $largeResultSet */);
   }


   public function search(SearchRequest $request): ResultSet {
            return $resultSet;
   }

}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please do.

I would prefer this, because then, this code block https://github.com/humhub/humhub/pull/6901/files#diff-b226b67b6a65e71e7d905c17f216f4eeaab4a2387a29d40ce5996999884fef29R39-R70 - which is rather complex is only in the searchCached method, and it's always obvious caching related.

@luke- I have done this in the commit 24ed4ee.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurabakhtin Thanks, looks good for me!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about (not sure it's possible), maybe, when ResultSet:

  • $results attribute is Content[]|int[]
  • Is serializable, on serialize, all $results entries which are type of Content are converted into int (content ID)
  • Implement Iterator, only related/required $results are always be loaded from int to object, ideally as batch using IN sql.

@luke- Please review commit 5e87703.

@luke- luke- added this pull request to the merge queue Apr 2, 2024
Merged via the queue into develop with commit 317d598 Apr 2, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants