Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX Unpublising parent pages should include child pages #89

Merged
merged 9 commits into from Apr 12, 2024

Conversation

ssmarco
Copy link
Collaborator

@ssmarco ssmarco commented Sep 28, 2023

Issue #88

Uses SiteTree::enforce_strict_hierarchy config in getting the dependent documents for indexing child pages.

@ssmarco ssmarco changed the title FIX Unpublising parent pages should include chid pages FIX Unpublising parent pages should include child pages Sep 28, 2023
@ssmarco ssmarco force-pushed the feature/3-remove-children-from-index branch from 9c67c32 to 737945e Compare September 28, 2023 13:19
foreach ($page->AllChildren() as $record) {
$document = DataObjectDocument::create($record);
$docs[$document->getIdentifier()] = $document;
$docs = array_merge($docs, $document->getDependentDocuments());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can calling getDependentDocuments() here ever result in an infinite loop?

Copy link
Collaborator Author

@ssmarco ssmarco Oct 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my theory is correct, then the CMS will prevent us from this happening especially on a SiteTree which I used as a guide.

https://github.com/silverstripe/silverstripe-cms/blob/668744e728154818873bc24cb0b8d45f73728b95/code/Model/SiteTree.php#L1752
The delete method on each child page executes on before delete as well which is kind of the same thing which is to recursively call the delete function.

My initial change was to account only the immediate children (SiteTree) of a certain page but @michalkleiner made a valid point to account other relationships on each child pages hence made the changes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am thinking of asking other vendors to stress test this implementation since this issue came from them since they have more content.

Issue [silverstripe#88](silverstripe#88) Uses `SiteTree::enforce_strict_hierarchy` config in getting the dependent documents for indexing child pages.
@ssmarco ssmarco force-pushed the feature/3-remove-children-from-index branch from 737945e to 188f6bd Compare October 5, 2023 00:48
@GuySartorelli
Copy link
Member

I don't have enough context into this module (let alone a way to test it) to feel comfortable approving it, I was only looking at it 'cause Max asked me to. But from what I can tell, it looks okay.

@ssmarco
Copy link
Collaborator Author

ssmarco commented Oct 5, 2023

I don't have enough context into this module (let alone a way to test it) to feel comfortable approving it, I was only looking at it 'cause Max asked me to. But from what I can tell, it looks okay.

If you are keen, we can have a catch-up to set-up Elastic Search on your local.

@ssmarco ssmarco force-pushed the feature/3-remove-children-from-index branch from 45775e0 to 0f9e26c Compare April 5, 2024 00:48
@ssmarco ssmarco force-pushed the feature/3-remove-children-from-index branch from 0f9e26c to 4109616 Compare April 5, 2024 05:29
@ssmarco
Copy link
Collaborator Author

ssmarco commented Apr 5, 2024

@blueo Ready for another look. Added a few more tests cases and comments where I find interesting to know and for future reference.

Copy link
Member

@GuySartorelli GuySartorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this locally - it works as expected. All descendents of the page being unpublished (and that page itself) are removed from the index.
No pages which should not be removed are removed.

I'll leave it to bernie to merge (or to comment if he has any requested changes)

P.S. loving your DDEV addon, Marco. It makes testing this so easy

Copy link
Collaborator

@blueo blueo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great changes - particularly the test coverage. I have a couple of questions around where the sitetree logic should go and the way we're getting 'removed' objects

src/DataObject/DataObjectDocument.php Outdated Show resolved Hide resolved

// Taking into account that this queued job has a reference of existing child pages
// We need to make sure that we are able to send these pages to ElasticSearch etc. for removal
$oldRecord = $doc->getDataObject();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__unserialise does this a bit differently by getting the 'latest' version - do we want to do that to be consistent? I"m assuming this is about getting documents that have been removed (ie were not picked up by the previous condition because of stage=Live)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested your __unserialize PR on top of this and your fix works as expected. As discussed, this job only checks for relationships between dataobjects and not their contents. Sending the live data is what the other PR has resolved a seemingly module wide issue which this PR also needs.

Copy link
Collaborator

@blueo blueo Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good to know they work together - I was thinking though - should these two code blocks work the same way?

  1. From this Job
                    $oldRecord = $doc->getDataObject();

                    if ($oldRecord->isArchived() || $oldRecord->isOnDraft()) {
                        $document = DataObjectDocument::create($oldRecord);
                        $carry[$document->getIdentifier()] = $document;
                    }
  1. from the unserialise function
if (!$dataObject && DataObject::has_extension($data['className'], Versioned::class) && $data['fallback']) {
            // get the latest version - usually this is an object that has been deleted
            $dataObject = Versioned::get_latest_version(
                $data['className'],
                $data['id']
            );
        }

eg your one does a check for isArchived + isOnDraft if there is no DataObject - where the unserialize does a Versioned::get_latest_version call. OR does it not matter and we get the same result in the end anyway?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed - the __unserialize function will look after returning a draft/deleted version of the dataobject - so we don't need this check $oldRecord->isArchived() || $oldRecord->isOnDraft() - we can simply add the current dataobject - the job will then look after either indexing or removing it from the index

@@ -16,7 +17,7 @@
class RemoveDataObjectJobTest extends SearchServiceTest
{

protected static $fixture_file = '../fixtures.yml'; // phpcs:ignore
protected static $fixture_file = '../fixtures.yml'; // @phpcs:ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK you shouldn't need the @ - ideally just ignore a specific error

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was torn in between following commenting out a specific error since it makes the line of code looks rather long and unpleasant to the eyes. I just felt that this error is not really an error for a Silverstripe developer and should be an acceptable convention rather than following 3rd-party conventions that does not care about Silverstripe.

@@ -76,11 +97,25 @@ public function testJob(): void

$resultTitles = [];

// This determines whether the document should be added or removed from from the index
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great additions

Co-authored-by: Bernard Hamlin <948122+blueo@users.noreply.github.com>
@ssmarco ssmarco force-pushed the feature/3-remove-children-from-index branch from 00384eb to de17f14 Compare April 10, 2024 04:32
Copy link
Collaborator

@blueo blueo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @ssmarco . I think this is ready to go now.

@blueo blueo merged commit 0bede9c into silverstripe:3 Apr 12, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants