# Automatic Record Merge in HubSpot

We merge HubSpot records automatically using HubSpot's "Merge records" tool, so every merge follows the rules described in the HubSpot documentation:

**Doc:** https://knowledge.hubspot.com/crm-setup/merge-records

We run this HubSpot tool through the API.
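For reference, each merge is a single `POST` to the CRM v3 merge endpoint, with the two record IDs in the body. The sketch below uses the same URL and field names as the merge loop in this notebook, but it only builds the request pieces and sends nothing. `build_merge_request` is a hypothetical helper name.

```python
import json


def build_merge_request(object_type, primary_id, id_to_merge):
    """Hypothetical helper: return the URL and JSON body for one HubSpot merge call.

    Uses the endpoint and field names from this notebook's merge loop;
    nothing is sent over the network here.
    """
    url = 'https://api.hubapi.com/crm/v3/objects/{}/merge'.format(object_type)
    payload = json.dumps({'primaryObjectId': primary_id,
                          'objectIdToMerge': id_to_merge})
    return url, payload


url, payload = build_merge_request('contacts', 1926602, 2052853)
print(url)      # https://api.hubapi.com/crm/v3/objects/contacts/merge
print(payload)  # {"primaryObjectId": 1926602, "objectIdToMerge": 2052853}
```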

The code works from the "Key" column of the duplicates file, which we obtain by first running the duplicate-detection functions. All records that share a Key are treated as duplicates of each other and are merged into a single record.
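The duplicate-detection functions that produce the Key are not part of this notebook. Judging only from the sample rows shown further down (for example `Aimee` + `Caulk` + `destinationhotels` gives `aimeecaulkdestinationhotels`), the key appears to be first name, last name and company, lowercased, with non-alphanumeric characters and a suffix such as `LLC` removed. The sketch below is a hypothetical `build_key` under that assumption; it is not the data team's actual function, and the suffix list is made up.

```python
import re

# Hypothetical list of company suffixes to drop: an assumption, not the real rule set
COMPANY_SUFFIXES = ('llc', 'inc', 'ltd', 'corp')


def build_key(first_name, last_name, company):
    """Hypothetical key builder: lowercase, keep only a-z/0-9, drop company suffixes."""
    words = company.lower().split()
    # Drop words that are (after removing punctuation) a known company suffix
    words = [w for w in words if re.sub(r'[^a-z0-9]', '', w) not in COMPANY_SUFFIXES]
    raw = first_name + last_name + ''.join(words)
    return re.sub(r'[^a-z0-9]', '', raw.lower())


print(build_key('Aimee', 'Caulk', 'destinationhotels'))          # aimeecaulkdestinationhotels
print(build_key('Ashley', 'Murdock', 'Successful in Pink LLC'))  # ashleymurdocksuccessfulinpink
print(build_key('Castel', 'Valere‐Couturier', 'Sound Off'))      # castelvalerecouturiersoundoff
```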

## Set environment variables

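```python
import os
```

```python
# Inputs: edit these values before running.
# The library location is stored as "library_path", not "path": on Windows, environment
# variable names are case-insensitive, so setting "path" would overwrite the system PATH.
os.environ["library_path"] = r'C:\Users\Admin\Documents\GitHub\Operational-Library-For-Data-Engineers'
os.environ["access_token"] = ''  # HubSpot API access token (sent as a Bearer token)
os.environ["records_object_type"] = 'contacts'
os.environ["file_path"] = r'C:\Users\Admin\Desktop\85ztypmvx - Fit City Adventures - Contact Clean Up - Support - 1398'
os.environ["file_name"] = 'Duplicate Contacts Records Found.xlsx - Sheet1.csv'
```

## Libraries

```python
import requests
import json

import pandas as pd
import numpy as np

from IPython.display import Markdown, HTML, display, clear_output


def printmd(string):
    """Render a string as Markdown in the notebook output."""
    display(Markdown(string))
```

#### Data team library

Add your local copy of the data team library to the Python path so its helpers can be imported.

```python
import sys

path = os.getenv('library_path')
sys.path.insert(0, path)

# Module and function names are spelled exactly as they are in the library
from data_tranfromations.delte_unnecessary_blank_spaces import delte_unnecessary_blank_spaces
```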
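The source of `delte_unnecessary_blank_spaces` is not shown here; judging only by its name, it removes redundant whitespace. If you need to run the notebook without the data team library, a stand-in along these lines might do. `remove_extra_spaces` is a hypothetical name, and its behavior (trim both ends, collapse inner runs of whitespace to one space) is an assumption about what the original does.

```python
def remove_extra_spaces(value):
    """Hypothetical stand-in: trim the ends and collapse runs of whitespace to one space.

    Non-string values (e.g. numbers read from the CSV) are converted to str first.
    """
    return ' '.join(str(value).split())


print(repr(remove_extra_spaces('  aimee   caulk ')))  # 'aimee caulk'
print(repr(remove_extra_spaces('   ')))               # ''
```

A key made only of spaces becomes `''` with this stand-in, so the empty-key check later in the notebook would still catch it.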

## Parameters - Input Values

#### Access API

```python
access_token = os.getenv('access_token')  # Input

headers = {'Content-Type': 'application/json',
           'authorization': 'Bearer {}'.format(access_token)}
```
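With the default empty `access_token` from the environment cell, every request would go out as `Bearer ` with no token and be rejected by HubSpot, and the merge loop would then fail when it reads the response. A guard like the hypothetical `build_headers` below fails fast instead; it builds the same two headers as the cell above.

```python
def build_headers(access_token):
    """Hypothetical helper: same headers as above, but refuse a missing or blank token."""
    # os.getenv returns None when the variable is unset, so check for that too
    if not access_token or not access_token.strip():
        raise ValueError('access_token is empty: set os.environ["access_token"] first')
    return {'Content-Type': 'application/json',
            'authorization': 'Bearer {}'.format(access_token.strip())}
```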

#### Info Records

```python
records_object_type = os.getenv('records_object_type')  # e.g. 'contacts'
```

#### Frame

```python
file_path = os.getenv('file_path')
file_name = os.getenv('file_name')

# os.path.join inserts the correct path separator for the current OS
df = pd.read_csv(os.path.join(file_path, file_name))
```

```python
# Replace missing values with empty strings so blank cells compare equal to ''
df = df.replace(np.nan, '')
```
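A possible variation is to keep blanks as empty strings from the start: `pd.read_csv` with `keep_default_na=False` does not turn empty cells into `NaN`, and `dtype=str` reads every column as text, so IDs are never converted to floats when a column has gaps. This is a suggested alternative, not what the notebook does; the CSV text below is a small made-up snippet shaped like the sample rows shown further down.

```python
import io

import pandas as pd

# Made-up CSV snippet shaped like the duplicates file (two rows, one blank phone)
csv_text = (
    'Record ID,First Name,Last Name,Phone Number,Key\n'
    '1926602,Aimee,Caulk,,aimeecaulkdestinationhotels\n'
    '2052853,Aimee,Caulk,(480) 894-1400,aimeecaulkdestinationhotels\n'
)

# keep_default_na=False: empty cells stay '' instead of NaN
# dtype=str: every column is read as text
frame = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)

print(repr(frame.at[0, 'Phone Number']))  # ''
print(frame.at[1, 'Record ID'])           # 2052853
```

The merge loop converts IDs with `int(...)`, which works the same on these string IDs.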

```python
printmd("<h3><span style='color:blue'>You will work with {} records</span></h3>".format(len(df)))
```

<h3><span style='color:blue'>You will work with 85 records</span></h3>

```python
df.head(5)
```

| Unnamed: 0 | Record ID | First Name | Last Name | Phone Number | Email | Street Address | Associated Company | Record View | Key |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1926602 | Aimee | Caulk | | acaulk@destinationhotels.com | | destinationhotels | https://app.hubspot.com/contacts/8062547/recor... | aimeecaulkdestinationhotels |
| 1 | 2052853 | Aimee | Caulk | (480) 894-1400 | aimee.caulk@destinationhotels.com | Southeast Valley | destinationhotels | https://app.hubspot.com/contacts/8062547/recor... | aimeecaulkdestinationhotels |
| 2 | 1681301 | Ashley | Murdock | 323-717-2908 | ashleymurdock@successfulinpink.com | | Successful in Pink LLC | https://app.hubspot.com/contacts/8062547/recor... | ashleymurdocksuccessfulinpink |
| 3 | 1757351 | Ashley | Murdock | | ashley@successfulinpink.com | | Successful in Pink LLC | https://app.hubspot.com/contacts/8062547/recor... | ashleymurdocksuccessfulinpink |
| 4 | 120109 | Castel | Valere‐Couturier | | cvalerecouturier@soundoffexperience.com | | Sound Off | https://app.hubspot.com/contacts/8062547/recor... | castelvalerecouturiersoundoff |


## Empty Keys

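Before merging, we run a quick check on the keys: no key may be blank. Every record with an empty key shares the same key (`''`), so the merge loop would treat all of them as duplicates of one another and merge unrelated records together. Merging on empty keys could even wipe out entire databases.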
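A quick way to see the risk: grouping by Key puts every blank-key record into the same group, however unrelated those records are. The toy frame below is made up for illustration.

```python
import pandas as pd

# Made-up records: two real duplicates plus three unrelated records with blank keys
records = pd.DataFrame({
    'Record ID': [101, 102, 201, 202, 203],
    'Key': ['aimeecaulkdestinationhotels', 'aimeecaulkdestinationhotels', '', '', ''],
})

group_sizes = records.groupby('Key').size()
print(group_sizes.to_dict())
# {'': 3, 'aimeecaulkdestinationhotels': 2}
```

The three blank-key records form one "duplicate" group, so the merge loop would collapse them into a single record even though nothing links them.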

```python
# Remove unnecessary blank spaces from every key with the data team library helper
df['Key'] = df['Key'].apply(delte_unnecessary_blank_spaces)
```

```python
detect_empty_keys = df.loc[df['Key'] == '']

if not detect_empty_keys.empty:
    printmd("<h3><span style='color:red'>There are keys configured as empty</span></h3> Please check these empty keys [''], otherwise you will end up making merges that should not be made.")
else:
    printmd("<h3><span style='color:green'>Keys are properly configured</span></h3> Go ahead!")
```
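The check above only prints a warning, so nothing stops the merge cells from running afterwards. Because merges cannot be undone, a stricter variation is to raise an exception instead. `assert_no_empty_keys` is a hypothetical helper, not part of the data team library.

```python
import pandas as pd


def assert_no_empty_keys(frame, key_column='Key'):
    """Hypothetical guard: raise if any key is missing, empty, or whitespace only."""
    keys = frame[key_column]
    blank = keys.isna() | (keys.astype(str).str.strip() == '')
    if blank.any():
        raise ValueError('{} record(s) have an empty {} at rows {}'.format(
            int(blank.sum()), key_column, list(frame.index[blank])))
```

Calling `assert_no_empty_keys(df)` right before the merge loop stops the notebook there if any key is blank.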

<h3><span style='color:green'>Keys are properly configured</span></h3> Go ahead!

```python
detect_empty_keys
```

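| Unnamed: 0 | Record ID | First Name | Last Name | Phone Number | Email | Street Address | Associated Company | Record View | Key |
|---|---|---|---|---|---|---|---|---|---|

*(no rows: no empty keys were found)*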


## Define number of duplicates

This frame keeps one row per unique key, so its length is the number of duplicate groups. Iterating over it lets us walk through the whole duplicates frame, one group of duplicates at a time.

```python
# One row per unique Key: each row identifies one group of duplicate records
duplicates = df.drop_duplicates(subset=['Key'])
duplicates = duplicates.reset_index(drop=True)
```
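Each group of n records needs n - 1 merge calls (row 0 is the starting primary record and every other row is merged into it), so the total number of API calls is the number of records minus the number of groups. For this file, 85 records in 42 groups (the final output "Loop # 41 of 41" counts from 0) means 43 merge calls. The hypothetical `count_merge_calls` below computes that count with `groupby`; the toy frame is made up.

```python
import pandas as pd


def count_merge_calls(frame, key_column='Key'):
    """Number of merge API calls needed: each group of n records needs n - 1 merges."""
    sizes = frame.groupby(key_column).size()
    return int((sizes - 1).sum())


# Made-up example: one group of 2, one group of 3, one unique record
sample = pd.DataFrame({'Key': ['a', 'a', 'b', 'b', 'b', 'c']})
print(count_merge_calls(sample))  # 3
```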

## Merge Records

```python
url = 'https://api.hubapi.com/crm/v3/objects/{}/merge'.format(records_object_type)

for i in range(len(duplicates)):

    clear_output(wait=True)

    print('Loop # {} of {}'.format(i, len(duplicates) - 1))

    ## Our key to detect duplicates
    key = duplicates.at[i, 'Key']

    ## All records that share this key
    mini = df.loc[df['Key'] == key]

    ## To select the newest or oldest record as primary_record_id [ascending=True/False]
    #mini = mini.sort_values(by='Create Date', ascending=True)
    mini = mini.reset_index(drop=True)

    # # # Loop for merging
    primary_record_id = int(mini.at[0, 'Record ID'])  # Start record (row 0)

    for j in range(1, len(mini)):  # Start at 1 because row 0 is the initial primary record

        to_merge_record_id = int(mini.at[j, 'Record ID'])

        payload = json.dumps({'primaryObjectId': primary_record_id,
                              'objectIdToMerge': to_merge_record_id})

        api_response = requests.request("POST", url, data=payload, headers=headers)

        # Stop on any HTTP error: merges cannot be undone, and an error response has no 'id'
        api_response.raise_for_status()

        ## The merge produces a new record with a new ID, which becomes the
        ## primary_record_id for the next merge in this group
        primary_record_id = api_response.json()['id']

        print('Merge # {} - Merging {} into {}'.format(j, to_merge_record_id, primary_record_id))
        print(api_response)

    break  ## Stop after the first group so you can review the result: there is no un-merge
```
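Because there is no un-merge, it can help to review the full plan before sending any request. The sketch below walks the groups the same way as the loop above (row 0 is the initial primary record, the other rows are merged in order) but only collects the pairs. Per the comment in the loop, each merge returns a new record ID that becomes the next primary, so only the first pair of each group has a known primary ID; later steps use the placeholder `'<previous result>'`. `plan_merges` is a hypothetical helper, and the sample frame reuses IDs and keys from the rows shown earlier.

```python
import pandas as pd


def plan_merges(frame, key_column='Key', id_column='Record ID'):
    """Hypothetical dry run: list (key, primary, record_to_merge) without calling the API."""
    plan = []
    # sort=False keeps groups in the order their keys first appear, like the loop above
    for key, group in frame.groupby(key_column, sort=False):
        ids = [int(x) for x in group[id_column]]
        for position, to_merge in enumerate(ids[1:]):
            # After the first merge, the primary is the ID returned by the previous call
            primary = ids[0] if position == 0 else '<previous result>'
            plan.append((key, primary, to_merge))
    return plan


sample = pd.DataFrame({
    'Record ID': [1926602, 2052853, 1681301, 1757351],
    'Key': ['aimeecaulkdestinationhotels', 'aimeecaulkdestinationhotels',
            'ashleymurdocksuccessfulinpink', 'ashleymurdocksuccessfulinpink'],
})
for step in plan_merges(sample):
    print(step)
```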

```
Loop # 41 of 41
Merge # 1 - Merging 1654501 into 1856001
<Response [200]>
```
