Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Command to deduplicate contacts #5344

Merged
merged 24 commits into from May 25, 2018

Conversation

alanhartless
Copy link
Contributor

@alanhartless alanhartless commented Nov 17, 2017

Q A
Bug fix?
New feature? y
Automated tests included? y
Related user documentation PR URL todo
Related developer documentation PR URL
Issues addressed (#s or URLs)
BC breaks?
Deprecations?

Description:

This PR adds a command php app/console mautic:contacts:dedup to find contacts with the same unique identifiers and merge them. By default, the records are merged from oldest into newest but that can be changed with the flag --newest-into-oldest.

Regardless of which record is merged into which, the latest modified record's custom field data is used (except where it is empty as we will error on the side of not completely losing data). Also the latest last active date is preserved and the oldest date identified date.

Contacts with conflicting unique identifiers when there are more than one unique identifier are not merged.

Steps to test this PR:

  1. Hack the database to update multiple contacts with the same email OR manually edit through the UI to ensure there are multiple contacts with the same unique identifiers.
  2. Run the command and they should be merged along with all of their behavior data.

List deprecations along with the new alternative:

  1. LeadModel::mergeLeads is deprecated. ContactMerger should be used instead

@alanhartless alanhartless added feature A new feature for inclusion in minor or major releases needs-automated-tests PR's that need automated tests before they can be merged WIP PR's that are not ready for review and are currently in progress labels Nov 17, 2017
@alanhartless alanhartless changed the title Command to deduplicate contacts WIP - Command to deduplicate contacts Nov 17, 2017
@dongilbert
Copy link
Member

+1 works well. Tentative approval pending UT's

@javjim
Copy link

javjim commented Nov 17, 2017

works properly for me

@alanhartless alanhartless force-pushed the feature-dedup-command branch 2 times, most recently from 789cf21 to 9ab8020 Compare November 23, 2017 16:37
@luizeof
Copy link
Member

luizeof commented Dec 7, 2017

@alanhartless .... 2.12.0 fix duplicating contacts on import .... this pull will be released on 2.12.0 too?

Thanks ;)

@vesper8
Copy link

vesper8 commented Apr 7, 2018

I need to use this, I have a bunch of duplicate contacts and am looking for a safe way to merge them

2.12 came out already but this didn't make it in? How can I use this now? I must create a fork or is there another way?

@alanhartless alanhartless added this to the 2.14.0 milestone Apr 17, 2018
@alanhartless alanhartless added ready-to-test PR's that are ready to test needs-documentation PR's that need documentation before they can be merged and removed needs-automated-tests PR's that need automated tests before they can be merged WIP PR's that are not ready for review and are currently in progress labels Apr 19, 2018
@alanhartless alanhartless changed the title WIP - Command to deduplicate contacts Command to deduplicate contacts Apr 19, 2018
@mautibot mautibot added the pending-test-confirmation PR's that require one test before they can be merged label Apr 26, 2018
protected function execute(InputInterface $input, OutputInterface $output)
{
/** @var ContactDeduper $deduper */
$deduper = $this->getContainer()->get('mautic.lead.deduper');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use DI for new commands?

use Symfony\Component\Console\Input\InputOption;
use Symfony\Component\Console\Output\OutputInterface;

class DedupCommand extends ModeratedCommand
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the dedup real word in English? I think DeduplicateCommand would be a better name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably more slang but I can rename it.

/**
* @var ContactMerger
*/
private $merger;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why not to use $contactMerger?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason why not. I'll change it.

/**
* @var LeadRepository
*/
private $repository;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$leadRepository?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason why not. I'll change it.


namespace Mautic\LeadBundle\Exception;

class ValueNotMergeable extends \Exception
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be ValueNotMergeableException

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

*/
public function __construct($newerValue, $olderValue)
{
parent::__construct(var_export($newerValue, true).' / '.var_export($olderValue, true));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is var_export here? Should be a comment here I think, if there is a real reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to add in the message what the value actually was. $newerValue can be anything so var_export converts it to a string for the exception message.

*
* @return Lead
*/
public function mergeLeads(Lead $lead, Lead $lead2, $autoMode = true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change here? I think this method just should call the new method, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was relocated from elsewhere in the class. But there is a circular dependency if I use the new service here. Will have to create some kind of legacy class just to get around that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to address this by passing the container to a legacy class to keep from having the circular dependency. Will likely need to leverage this class for some other methods in the LeadModel to remove more deprecated code/circular dependencies. I want to get rid of checkForDuplicateContact as well and use the ContactDeduper class instead.

Copy link
Contributor

@Maxell92 Maxell92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me

@mautibot mautibot added the code-review PR's that require a code review before merging label May 8, 2018
@escopecz escopecz self-assigned this May 8, 2018
Copy link
Sponsor Member

@escopecz escopecz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks nice. Will test it. Just found minor issues in the doc blocks.

<?php

/*
* @copyright 2017 Mautic Contributors. All rights reserved
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should read 2018. Also check other file annotations.


/*
* @copyright 2017 Mautic Contributors. All rights reserved
* @author Mautic, Inc.
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be Inc, or should it? Also check other file annotations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well technically it is Mautic, Inc contributing this code. So I think this is fine.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want every company/individual who creates/modifies the code to add itself as the author? It says "Mautic" on all files I randomly checked. Maybe @dbhurley could way in.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw Mautic, Inc. on another file from 2014. So if that's not new, let's ignore this discussion.

* @param LeadModel $leadModel
* @param MergeRecordRepository $repo
* @param LoggerInterface $logger
*/
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dispatcher param is missing in the docblock

<?php

/*
* @copyright 2017 Mautic Contributors. All rights reserved
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to be PITA, but could you check all the file annotations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I technically wrote these files in 2017 :-)

@mautibot mautibot removed the code-review PR's that require a code review before merging label May 9, 2018
Copy link
Sponsor Member

@escopecz escopecz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works fine for me. Thanks Alan!

@escopecz escopecz added ready-to-commit PR's with 2 successful tests, 1 approval, automated tests and docs and is ready to be merged and removed pending-test-confirmation PR's that require one test before they can be merged ready-to-test PR's that are ready to test labels May 9, 2018
@dbhurley dbhurley added Backlog and removed ready-to-commit PR's with 2 successful tests, 1 approval, automated tests and docs and is ready to be merged labels May 18, 2018
@dbhurley dbhurley removed this from the 2.14.0 milestone May 18, 2018
@alanhartless alanhartless added this to the 2.14.0 milestone May 25, 2018
@alanhartless alanhartless merged commit a3427ee into mautic:staging May 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A new feature for inclusion in minor or major releases needs-documentation PR's that need documentation before they can be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants