# Create redo records

This Jupyter notebook shows how to create a Senzing "redo record".
It assumes that the G2 database is empty.

The steps are to create very similar records under different data sources,
then delete one of those records. The deletion produces a "redo record".


More information:

1. [GitHub repository](https://github.com/Senzing/docker-jupyter)
1. [Senzing documentation](http://docs.senzing.com/?python#g2config)

## Table of contents

1. [Prepare environment](#Prepare-environment)
    1. [Initialize Senzing configuration](#Initialize-Senzing-configuration)
    1. [Initialize python environment](#Initialize-python-environment)
    1. [Helper class for JSON rendering](#Helper-class-for-JSON-rendering)
    1. [System path](#System-path)
    1. [Initialize variables](#Initialize-variables)
1. [G2Engine](#G2Engine)
    1. [Senzing initialization](#Senzing-initialization)
    1. [primeEngine](#primeEngine)
1. [Configuration](#Configuration)
    1. [One time configuration initialization](#One-time-configuration-initialization)
    1. [Variable initialization](#Variable-initialization)
    1. [Create add data source function](#Create-add-data-source-function)
    1. [Create add record function](#Create-add-record-function)
1. [Redo record](#Redo-record)
    1. [Print data sources](#Print-data-sources)
    1. [Add data sources and records](#Add-data-sources-and-records)
    1. [Delete record](#Delete-record)
    1. [Count redo records](#Count-redo-records)
    1. [Print data sources again](#Print-data-sources-again)

## Prepare environment

### Initialize Senzing configuration

Run [senzing-G2ConfigMgr-reference.ipynb](senzing-G2ConfigMgr-reference.ipynb)
to install a Senzing Engine configuration in the database.

### Initialize python environment

In [None]:
import os
import sys
import json

# For RenderJSON

import uuid
from IPython.display import display_javascript, display_html, display

### Helper class for JSON rendering

A class for pretty-printing JSON.
Senzing does not require it,
but it helps visualize JSON output.

In [None]:
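class RenderJSON(object):
    """Render JSON as a collapsible tree in the notebook output."""

    def __init__(self, json_data):
        if isinstance(json_data, (dict, list)):
            # Serialize Python objects so the JavaScript below receives valid JSON.
            self.json_str = json.dumps(json_data)
        elif isinstance(json_data, (bytes, bytearray)):
            # The Senzing calls in this notebook fill bytearray buffers with JSON text.
            self.json_str = json_data.decode()
        else:
            # Anything else is assumed to be a JSON string already.
            self.json_str = json_data
        # Unique element ID so that each rendered cell gets its own <div>.
        self.uuid = str(uuid.uuid4())

    def _ipython_display_(self):
        display_html('<div id="{}" style="height:100%; width:100%; background-color: LightCyan"></div>'.format(self.uuid), raw=True)
        display_javascript("""
        require(["https://rawgit.com/caldwell/renderjson/master/renderjson.js"], function() {
        document.getElementById('%s').appendChild(renderjson(%s))
        });
        """ % (self.uuid, self.json_str), raw=True)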

### System path

Add the Senzing Python directory to the system path so that the G2 modules can be imported.

In [None]:
python_path = "{0}/python".format(
    os.environ.get("SENZING_G2_DIR", "/opt/senzing/g2"))
sys.path.append(python_path)
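
Re-running the cell above appends the same directory to `sys.path` each time. The following is a minimal sketch of an idempotent variant; the helper names `senzing_python_path` and `append_once` are hypothetical and not part of Senzing.

```python
import os
import sys


def senzing_python_path(environ=os.environ):
    """Build the G2 Python directory from SENZING_G2_DIR, using the notebook's default."""
    g2_dir = environ.get("SENZING_G2_DIR", "/opt/senzing/g2")
    return "{0}/python".format(g2_dir)


def append_once(path, search_path=sys.path):
    """Append path to search_path only if it is not already present."""
    if path not in search_path:
        search_path.append(path)
    return search_path


# Safe to run repeatedly: the directory is added at most once.
append_once(senzing_python_path())
append_once(senzing_python_path())
```

Passing the environment and the search path as parameters keeps the helpers easy to check without touching the real `sys.path`.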

### Initialize variables

Create the variables used to initialize G2Engine, G2ConfigMgr, and G2Config.

In [None]:
config_path = os.environ.get("SENZING_ETC_DIR", "/etc/opt/senzing")
support_path = os.environ.get("SENZING_DATA_VERSION_DIR", "/opt/senzing/data")

resource_path = "{0}/resources".format(
    os.environ.get("SENZING_G2_DIR", "/opt/senzing/g2"))

sql_connection = os.environ.get(
    "SENZING_SQL_CONNECTION", "sqlite3://na:na@/var/opt/senzing/sqlite/G2C.db")

verbose_logging = False

senzing_config_dictionary = {
    "PIPELINE": {
        "CONFIGPATH": config_path,
        "SUPPORTPATH": support_path,
        "RESOURCEPATH": resource_path
    },
    "SQL": {
        "CONNECTION": sql_connection,
    }
}

senzing_config_json = json.dumps(senzing_config_dictionary)
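
Before handing the JSON to the engine, it can help to confirm that every key set above is present and non-empty. This is a rough sketch only: `EXPECTED_KEYS` and `missing_config_keys` are hypothetical names, and the key list simply mirrors the dictionary in this notebook rather than any official Senzing schema.

```python
import json

# Keys taken from the dictionary built in this notebook (not an official schema).
EXPECTED_KEYS = {
    "PIPELINE": ("CONFIGPATH", "SUPPORTPATH", "RESOURCEPATH"),
    "SQL": ("CONNECTION",),
}


def missing_config_keys(config_json):
    """Return 'SECTION.KEY' names that are absent or empty in the JSON text."""
    config = json.loads(config_json)
    missing = []
    for section, keys in EXPECTED_KEYS.items():
        for key in keys:
            if not config.get(section, {}).get(key):
                missing.append("{0}.{1}".format(section, key))
    return missing


example_json = json.dumps({
    "PIPELINE": {
        "CONFIGPATH": "/etc/opt/senzing",
        "SUPPORTPATH": "/opt/senzing/data",
        "RESOURCEPATH": "/opt/senzing/g2/resources",
    },
    "SQL": {"CONNECTION": "sqlite3://na:na@/var/opt/senzing/sqlite/G2C.db"},
})
print(missing_config_keys(example_json))
```

In the notebook, the same check could be applied to `senzing_config_json`.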

## G2Engine

### Senzing initialization

Create instances of G2Engine, G2ConfigMgr, and G2Config.

In [None]:
import G2Exception

In [None]:
from G2Engine import G2Engine
g2_engine = G2Engine()
g2_engine_flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS

return_code = g2_engine.initV2(
    "pyG2EngineForRedoRecords",
    senzing_config_json,
    verbose_logging)

print("Return Code: {0}".format(return_code))

In [None]:
from G2ConfigMgr import G2ConfigMgr
g2_configuration_manager = G2ConfigMgr()

return_code = g2_configuration_manager.initV2(
    "pyG2ConfigMgrForRedoRecords",
    senzing_config_json,
    verbose_logging)

print("Return Code: {0}".format(return_code))

In [None]:
from G2Config import G2Config
g2_config = G2Config()

return_code = g2_config.initV2(
    "pyG2ConfigForRedoRecords",
    senzing_config_json,
    verbose_logging)

print("Return Code: {0}".format(return_code))
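
The cells above only print each return code. A small helper can turn an unexpected code into an error, so that a failed initialization stops the notebook early. This sketch assumes that `0` means success and that `None` means the call returned nothing; both are assumptions about the Senzing Python wrapper, and `check_return_code` is a hypothetical name.

```python
def check_return_code(step_name, return_code):
    """Return the code unchanged, or raise if it signals a failure.

    Assumption: 0 means success, and None means the call returned nothing.
    """
    if return_code not in (0, None):
        raise RuntimeError(
            "{0} failed with return code {1}".format(step_name, return_code))
    return return_code


# Example with a literal value; in the notebook the second argument would be
# the value returned by a call such as g2_engine.initV2(...).
print("Return Code: {0}".format(check_return_code("example step", 0)))
```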

### primeEngine

In [None]:
return_code = g2_engine.primeEngine()
print("Return Code: {0}".format(return_code))

## Configuration

### One time configuration initialization

Load the default configuration if one exists; otherwise, create a new configuration in memory.

In [None]:
config_id_bytearray = bytearray()
return_code = g2_configuration_manager.getDefaultConfigID(config_id_bytearray)
print("Return Code: {0}".format(return_code))

if config_id_bytearray:
    config_id_int = int(config_id_bytearray)
    configuration_bytearray = bytearray()
    g2_configuration_manager.getConfig(config_id_int, configuration_bytearray)
    configuration_json = configuration_bytearray.decode()
    config_handle = g2_config.load(configuration_json)
else:
    config_handle = g2_config.create()
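
The branch above relies on two Python behaviors: an empty `bytearray` is falsy, and `int()` accepts a `bytearray` of ASCII digits. The sketch below isolates that conversion; `parse_config_id` is a hypothetical helper, and the sample buffers are made-up values, not real configuration IDs.

```python
def parse_config_id(config_id_bytearray):
    """Return the configuration ID as an int, or None when the buffer is empty."""
    if not config_id_bytearray:
        # Nothing was written into the buffer, so there is no ID to parse.
        return None
    # int() parses ASCII digits directly from bytes and bytearray objects.
    return int(config_id_bytearray)


print(parse_config_id(bytearray()))
print(parse_config_id(bytearray(b"42")))
```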

### Variable initialization

In [None]:
load_id = None

### Create add data source function

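Create a data source with a name of the form `TEST_DATA_SOURCE_nnn`.
The function also saves the updated configuration, makes it the default,
and reinitializes the engine so that it uses the new data source.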

In [None]:
def add_data_source(datasource_suffix):
    datasource_prefix = "TEST_DATA_SOURCE_"
    datasource_id = "{0}{1}".format(datasource_prefix, datasource_suffix)
    configuration_comment = "Added {}".format(datasource_id)

    # Add the data source to the in-memory configuration.
    g2_config.addDataSource(config_handle, datasource_id)

    # Export the in-memory configuration as JSON.
    configuration_bytearray = bytearray()
    return_code = g2_config.save(config_handle, configuration_bytearray)
    configuration_json = configuration_bytearray.decode()

    # Register the configuration and make it the default.
    configuration_id_bytearray = bytearray()
    g2_configuration_manager.addConfig(configuration_json, configuration_comment, configuration_id_bytearray)
    g2_configuration_manager.setDefaultConfigID(configuration_id_bytearray)

    # Reinitialize the engine with the new configuration ID.
    g2_engine.reinitV2(configuration_id_bytearray)

### Create add record function

Create a record whose ID has the form `RECORD_nnn`.
**Note:** every record is essentially the same; only the `DRIVERS_LICENSE_NUMBER` varies slightly.

In [None]:
def add_record(record_id_suffix, datasource_suffix):
    datasource_prefix = "TEST_DATA_SOURCE_"
    record_id_prefix = "RECORD_"
    datasource_id = "{0}{1}".format(datasource_prefix, datasource_suffix)
    record_id = "{0}{1}".format(record_id_prefix, record_id_suffix)
    data = {
        "NAMES": [{
            "NAME_TYPE": "PRIMARY",
            "NAME_LAST": "Smith",
            "NAME_FIRST": "John",
            "NAME_MIDDLE": "M"
        }],
        "PASSPORT_NUMBER": "PP11111",
        "PASSPORT_COUNTRY": "US",
        "DRIVERS_LICENSE_NUMBER": "DL1{:04d}".format(record_id_suffix),
        "SSN_NUMBER": "111-11-1111"
    }
    data_as_json = json.dumps(data)
    g2_engine.addRecord(
        datasource_id,
        record_id,
        data_as_json,
        load_id)
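
To see how similar the records are, the payload can be built without calling the engine. The sketch below copies the record layout from `add_record` into a hypothetical helper, `build_record`, and lists the fields that differ between two suffixes.

```python
def build_record(record_id_suffix):
    """Build the same payload as add_record, without sending it to Senzing."""
    return {
        "NAMES": [{
            "NAME_TYPE": "PRIMARY",
            "NAME_LAST": "Smith",
            "NAME_FIRST": "John",
            "NAME_MIDDLE": "M"
        }],
        "PASSPORT_NUMBER": "PP11111",
        "PASSPORT_COUNTRY": "US",
        "DRIVERS_LICENSE_NUMBER": "DL1{:04d}".format(record_id_suffix),
        "SSN_NUMBER": "111-11-1111"
    }


first = build_record(1)
second = build_record(2)

# Collect the keys whose values differ between the two payloads.
differing = sorted(key for key in first if first[key] != second[key])
print(differing)  # ['DRIVERS_LICENSE_NUMBER']
print(first["DRIVERS_LICENSE_NUMBER"], second["DRIVERS_LICENSE_NUMBER"])  # DL10001 DL10002
```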

## Redo record

### Print data sources

Print the list of currently defined data sources.

In [None]:
datasources_bytearray = bytearray()
return_code = g2_config.listDataSources(config_handle, datasources_bytearray)
datasources_dictionary = json.loads(datasources_bytearray.decode())
RenderJSON(datasources_dictionary)
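
`RenderJSON` depends on JavaScript loaded in the browser. Where that is not available, for example in a static export of the notebook, an indented text rendering is a possible fallback. This is a sketch using only the standard library; `to_pretty_json` is a hypothetical helper, and the sample payload is made up for illustration and is not the actual shape of Senzing's data source listing.

```python
import json


def to_pretty_json(json_data):
    """Return an indented JSON string for a dict, list, bytes, bytearray, or str."""
    if isinstance(json_data, (bytes, bytearray)):
        # Buffers filled by the calls in this notebook hold JSON text.
        json_data = json_data.decode()
    if isinstance(json_data, str):
        # Parse first so that the output is indented consistently.
        json_data = json.loads(json_data)
    return json.dumps(json_data, indent=2, sort_keys=True)


# Made-up payload, for illustration only.
print(to_pretty_json(bytearray(b'{"b": 1, "a": [1, 2]}')))
```

In the notebook, `print(to_pretty_json(datasources_bytearray))` would show the listing as plain text.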

### Add data sources and records

In [None]:
add_data_source(1)
add_record(1, 1)
add_record(2, 1)
add_data_source(2)
add_record(3, 2)
add_record(4, 2)
add_data_source(3)
add_record(5, 3)
add_record(6, 3)
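
The calls above place two records in each of three data sources. The sketch below rebuilds that layout as a list of `(data source, record ID)` pairs without touching Senzing, which makes it easy to confirm that `RECORD_5` belongs to `TEST_DATA_SOURCE_3`, the pair used in the delete step. The helper `planned_records` is hypothetical.

```python
def planned_records(source_count=3, records_per_source=2):
    """List the (data source ID, record ID) pairs created by the cell above."""
    plan = []
    record_suffix = 1
    for source_suffix in range(1, source_count + 1):
        for _ in range(records_per_source):
            plan.append((
                "TEST_DATA_SOURCE_{0}".format(source_suffix),
                "RECORD_{0}".format(record_suffix),
            ))
            # Record suffixes keep counting up across data sources.
            record_suffix += 1
    return plan


for datasource_id, record_id in planned_records():
    print(datasource_id, record_id)
```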

### Delete record

Deleting a record will create a "redo record".

In [None]:
return_code = g2_engine.deleteRecord("TEST_DATA_SOURCE_3", "RECORD_5", load_id)

print("Return Code: {0}".format(return_code))

### Count redo records

`count_of_redo_records` shows how many redo records are in Senzing's redo queue.

In [None]:
count_of_redo_records = g2_engine.countRedoRecords()

print("Number of redo records: {0}".format(count_of_redo_records))

### Print data sources again

Print the list of currently defined data sources again.
It now includes the data sources added to the configuration above.

In [None]:
datasources_bytearray = bytearray()
return_code = g2_config.listDataSources(config_handle, datasources_bytearray)
datasources_dictionary = json.loads(datasources_bytearray.decode())
RenderJSON(datasources_dictionary)