# G2Engine

In [None]:
import os
import sys
import json

# For RenderJSON

import uuid
from IPython.display import display_javascript, display_html, display

## Helper class for JSON rendering

In [None]:
class RenderJSON(object):
    def __init__(self, json_data):
        if isinstance(json_data, dict):
            self.json_str = json.dumps(json_data)
        else:
            self.json_str = json_data
        self.uuid = str(uuid.uuid4())

    def _ipython_display_(self):
        display_html('<div id="{}" style="height: 600px; width:100%; background-color: LightCyan"></div>'.format(self.uuid), raw=True)
        display_javascript("""
        require(["https://rawgit.com/caldwell/renderjson/master/renderjson.js"], function() {
        document.getElementById('%s').appendChild(renderjson(%s))
        });
        """ % (self.uuid, self.json_str), raw=True)

## System path

Update system path.

In [None]:
sys.path.append('/opt/senzing/g2/python')

# G2Engine
The G2Engine API...

In [None]:
from G2Engine import G2Engine
import G2Exception

## Initialize variables

Create variables used for G2Engine.

In [None]:
module_name = 'pyG2EngineForAddRecord'
senzing_python_directory = "/opt/senzing/g2/python"
verbose_logging = False

config_dict = {
    "PIPELINE": {
        "CONFIGPATH": "/etc/opt/senzing",        
        "SUPPORTPATH": "/opt/senzing/data",
        "RESOURCEPATH": "/opt/senzing/g2/resources"
    },
    "SQL": {
        "CONNECTION": "sqlite3://na:na@/var/opt/senzing/sqlite/G2C.db",
    }
}
config_json = json.dumps(config_dict)
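
The configuration above hard-codes every path. As a rough sketch only, the same JSON document could be produced by a small helper so that individual values can be overridden per environment. `build_config_json` and its parameter names are hypothetical and not part of the Senzing API; the defaults simply repeat the values used above.

```python
import json


def build_config_json(config_path="/etc/opt/senzing",
                      support_path="/opt/senzing/data",
                      resource_path="/opt/senzing/g2/resources",
                      connection="sqlite3://na:na@/var/opt/senzing/sqlite/G2C.db"):
    """Return the engine configuration as a JSON string (hypothetical helper)."""
    config = {
        "PIPELINE": {
            "CONFIGPATH": config_path,
            "SUPPORTPATH": support_path,
            "RESOURCEPATH": resource_path,
        },
        "SQL": {"CONNECTION": connection},
    }
    return json.dumps(config)


# With no arguments, the helper reproduces the configuration shown above.
example_config_json = build_config_json()
print(example_config_json)
```

The result is a plain string, so it can be passed wherever `config_json` is used.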

## Initialization

To start using Senzing G2Engine, create and initialize an instance.
This should be done once per process.
The `initV2()` method accepts the following parameters:

- **module_name:** A short name given to this instance of the G2Engine object.
- **config_json:** A JSON string containing configuration parameters.
- **verbose_logging:** A boolean which enables diagnostic logging.
- **config_id:** (optional) The identifier value for the engine configuration can be returned here.

Calling this function will return "0" upon success.

In [None]:
g2_engine = G2Engine()
return_code = g2_engine.initV2(module_name, config_json, verbose_logging)
print("Return Code: {0}".format(return_code))
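
The calls in this notebook report success with a return code of 0. A minimal sketch of a helper that turns any other code into an exception follows. `check_return_code` is a hypothetical name, not a Senzing function, and the sketch assumes the return code is an integer.

```python
def check_return_code(return_code, operation):
    """Raise RuntimeError when a call reports a non-zero return code."""
    if return_code != 0:
        raise RuntimeError(
            "{0} failed with return code {1}".format(operation, return_code))
    return return_code


# A zero return code passes through unchanged.
check_return_code(0, "initV2")
```

In a notebook cell this could wrap a call such as `g2_engine.initV2(...)`, so that a failure stops execution instead of only being printed.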

## Prime Engine

The `primeEngine()` method may optionally be called to pre-initialize some of the heavier weight internal resources of the G2 engine.

In [None]:
return_code = g2_engine.primeEngine()
print("Return Code: {0}".format(return_code))

## Parameters

The following variables are used as parameters to the Senzing API.

In [None]:
datasource_code_1 = "TEST"
record_id_1 = "1"
datasource_code_2 = "TEST"
record_id_2 = "2"
datasource_code_3 = "TEST"
record_id_3 = "3"
datasource_code_4 = "TEST"
record_id_4 = "4"
datasource_code_5 = "TEST"
record_id_5 = "5"
datasource_code_6 = "TEST"
record_id_6 = "6"
datasource_code_7 = "TEST"
record_id_7 = "7"

load_id = None
flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS

Initial data.

In [None]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Smith",
        "NAME_FIRST": "John",
        "NAME_MIDDLE": "M"
    }],
    "PASSPORT_NUMBER": "PP11111",
    "PASSPORT_COUNTRY": "US",
    "DRIVERS_LICENSE_NUMBER": "DL11111",
    "SSN_NUMBER": "111-11-1111"
}
data_as_json = json.dumps(data)

## addRecord()

Once the Senzing engine is initialized, use `addRecord()` to load a record into the Senzing repository. `addRecord()` can be called as many times as desired and from multiple threads at the same time. It returns "0" upon success and accepts four parameters as input:

- **datasource_code:** The name of the data source the record is associated with. This value is configurable to the system
- **record_id:** The record ID, used to identify distinct records
- **data_as_json:** A JSON document with the attribute data for the record
- **load_id:** The observation load ID for the record; the value can be null, in which case it defaults to datasource_code

In [None]:
return_code = g2_engine.addRecord(datasource_code_1, record_id_1, data_as_json, load_id)
print("Return Code: {0}".format(return_code))
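
Because `addRecord()` may be called from multiple threads, a thread pool is one way to load several records at once. The sketch below is illustrative only: `StubEngine` is a hypothetical stand-in that mimics the four-argument `addRecord()` call so the example runs without Senzing installed, and `load_records` is likewise a made-up helper name.

```python
import json
from concurrent.futures import ThreadPoolExecutor


class StubEngine(object):
    """Hypothetical stand-in for G2Engine so the sketch runs without Senzing."""

    def addRecord(self, datasource_code, record_id, data_as_json, load_id=None):
        json.loads(data_as_json)  # Fail early on malformed JSON.
        return 0


def load_records(engine, datasource_code, records, max_workers=4):
    """Call addRecord() for each (record_id, data) pair from a thread pool."""
    def load_one(item):
        record_id, data = item
        return engine.addRecord(datasource_code, record_id, json.dumps(data), None)

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map() preserves input order in the returned codes.
        return list(executor.map(load_one, records))


sample_records = [
    ("1001", {"NAMES": [{"NAME_TYPE": "PRIMARY", "NAME_LAST": "Smith", "NAME_FIRST": "John"}]}),
    ("1002", {"NAMES": [{"NAME_TYPE": "PRIMARY", "NAME_LAST": "Miller", "NAME_FIRST": "Max"}]}),
]
print(load_records(StubEngine(), "TEST", sample_records))
```

With a real engine, the `g2_engine` instance would be passed in place of `StubEngine()`.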

## Retrieve a Record
Use `getRecordV2()` to retrieve a single record from the data repository. The record is written in JSON form to a user-designated buffer, and the function itself returns "0" upon success. Once the Senzing engine is initialized, `getRecordV2()` can be called as many times as desired and from multiple threads at the same time. The `getRecordV2()` function accepts the following parameters as input:

- **datasource_code:** The name of the data source the record is associated with. This value is configurable to the system
- **record_id:** The record ID, used to identify the record for retrieval
- **flags:** Control flags for specifying what data about the record to retrieve
- **response_bytearray:** A memory buffer for returning the response document; if an error occurred, an error response is stored here
- **bufSize (C only):** The max number of bytes that can be stored in response. The response buffer MUST be able to hold at least this many bytes
- **resizeFunc (C only):** A function pointer that can be used to resize the memory buffer specified in the response argument. This function will be called to allocate more memory if the response buffer is not large enough. This argument may be NULL. If so, the function will return an error if the result is larger than the buffer

In [None]:
response_bytearray = bytearray("", 'utf-8')
return_code = g2_engine.getRecordV2(datasource_code_1, record_id_1, flags, response_bytearray)
response_dictionary = json.loads(response_bytearray)

print("Return Code: {0}".format(return_code))
RenderJSON(response_dictionary)

The function `getRecordV2()` is an improved version of `getRecord()` that also allows you to use control flags. The `getRecord()` function has been deprecated.
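
The response arrives in a `bytearray`, which `json.loads()` only accepts directly on Python 3.6 and later. As a minimal sketch, the hypothetical helper `parse_response` below decodes the buffer explicitly and treats an empty buffer as an empty result. The sample payload is made up for illustration and is not real engine output.

```python
import json


def parse_response(response_bytearray):
    """Decode a response buffer into a dictionary; an empty buffer yields {}."""
    if not response_bytearray:
        return {}
    return json.loads(response_bytearray.decode("utf-8"))


# Made-up payload standing in for an engine response.
sample_buffer = bytearray('{"DATA_SOURCE": "TEST", "RECORD_ID": "1"}', "utf-8")
print(parse_response(sample_buffer))
print(parse_response(bytearray()))
```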

## Entity Search
##### By Record

Entity searching is a key component for interactive use of Entity Resolution intelligence. The core Senzing engine provides real-time search capabilities that are easily accessed via the Senzing API. Senzing offers methods for entity searching, all of which can be called as many times as desired and from multiple threads at the same time (and all of which return "0" upon success).

Use `getEntityByRecordIDV2()` to retrieve entity data based on the ID of a record that belongs to the resolved entity. This function accepts the following parameters as input:

- **datasource_code:** The name of the data source the record is associated with. This value is configurable to the system
- **record_id:** The record ID, used to identify the record whose resolved entity is retrieved
- **flags:** Control flags for specifying what data about the entity to retrieve
- **response_bytearray:** A memory buffer for returning the response document; if an error occurred, an error response is stored here
- **bufSize (C only):** The max number of bytes that can be stored in response. The response buffer MUST be able to hold at least this many bytes
- **resizeFunc (C only):** A function pointer that can be used to resize the memory buffer specified in the response argument. This function will be called to allocate more memory if the response buffer is not large enough. This argument may be NULL. If so, the function will return an error if the result is larger than the buffer

In [None]:
response_bytearray = bytearray()
return_code = g2_engine.getEntityByRecordIDV2(datasource_code_1, record_id_1, flags, response_bytearray)
response_dictionary = json.loads(response_bytearray)

entity_id_1 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

print("Return Code: {0}".format(return_code))
RenderJSON(response_dictionary)

## Entity Search
##### By Entity

Entity searching is a key component for interactive use of Entity Resolution intelligence. The core Senzing engine provides real-time search capabilities that are easily accessed via the Senzing API. Senzing offers methods for entity searching, all of which can be called as many times as desired and from multiple threads at the same time (and all of which return "0" upon success).

Use `getEntityByEntityIDV2()` to retrieve entity data based on the ID of a resolved entity. This function accepts the following parameters as input:

- **entity_id:** The numeric ID of a resolved entity
- **flags:** Control flags for specifying what data about the entity to retrieve
- **response_bytearray:** A memory buffer for returning the response document; if an error occurred, an error response is stored here
- **bufSize (C only):** The max number of bytes that can be stored in response. The response buffer MUST be able to hold at least this many bytes
- **resizeFunc (C only):** A function pointer that can be used to resize the memory buffer specified in the response argument. This function will be called to allocate more memory if the response buffer is not large enough. This argument may be NULL. If so, the function will return an error if the result is larger than the buffer

In [None]:
response_bytearray = bytearray()
return_code = g2_engine.getEntityByEntityIDV2(entity_id_1, flags, response_bytearray)
response_dictionary = json.loads(response_bytearray)

print("Return Code: {0}".format(return_code))
RenderJSON(response_dictionary)

The `getEntityByEntityIDV2()` and `getEntityByRecordIDV2()` functions are improved versions of `getEntityByEntityID()` and `getEntityByRecordID()`. The new function signatures add control flags.

The `getEntityByEntityID()` and `getEntityByRecordID()` functions are deprecated.

## Search By Attributes

Entity searching is a key component for interactive use of Entity Resolution intelligence. The core Senzing engine provides real-time search capabilities that are easily accessed via the Senzing API. Senzing offers a method for entity searching by attributes, which can be called as many times as desired and from multiple threads at the same time, and which returns "0" upon success.

Use `searchByAttributes()` to retrieve entity data based on a user-specified set of entity attributes. This function accepts the following parameters as input:

- **data_as_json:** A JSON document with the attribute data to search for
- **response_bytearray:** A memory buffer for returning the response document; if an error occurred, an error response is stored here
- **bufSize (C only):** The max number of bytes that can be stored in response. The response buffer MUST be able to hold at least this many bytes
- **resizeFunc (C only):** A function pointer that can be used to resize the memory buffer specified in the response argument. This function will be called to allocate more memory if the response buffer is not large enough. This argument may be NULL. If so, the function will return an error if the result is larger than the buffer

In [None]:
response_bytearray = bytearray()
return_code = g2_engine.searchByAttributes(data_as_json, response_bytearray)
response_dictionary = json.loads(response_bytearray)

print("Return Code: {0}".format(return_code))
RenderJSON(response_dictionary)

## Search By Attributes V2

The `searchByAttributesV2()` function is similar to `searchByAttributes()` but is preferred: it has improved functionality and a better standardized output structure.

Use `searchByAttributesV2()` to retrieve entity data based on a user-specified set of entity attributes. This function accepts the following parameters as input:

- **data_as_json:** A JSON document with the attribute data to search for
- **flags:** Operational flags
- **response_bytearray:** A memory buffer for returning the response document; if an error occurred, an error response is stored here
- **bufSize (C only):** The max number of bytes that can be stored in response. The response buffer MUST be able to hold at least this many bytes
- **resizeFunc (C only):** A function pointer that can be used to resize the memory buffer specified in the response argument. This function will be called to allocate more memory if the response buffer is not large enough. This argument may be NULL. If so, the function will return an error if the result is larger than the buffer

In [None]:
response_bytearray = bytearray()
return_code = g2_engine.searchByAttributesV2(data_as_json, flags, response_bytearray)
response_dictionary = json.loads(response_bytearray)

print("Return Code: {0}".format(return_code))
RenderJSON(response_dictionary)

## Replace the record
Use the `replaceRecord()` function to update or replace a record in the data repository (if the record doesn't exist, a new record is added to the data repository). Like the above functions, `replaceRecord()` returns "0" upon success, and it can be called as many times as desired and from multiple threads at the same time. The `replaceRecord()` function accepts four parameters as input:

- **datasource_code:** The name of the data source the record is associated with. This value is configurable to the system
- **record_id:** The record ID, used to identify distinct records
- **data_as_json:** A JSON document with the attribute data for the record
- **load_id:** The observation load ID for the record; the value can be null, in which case it defaults to datasource_code

In [None]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Miller",
        "NAME_FIRST": "John",
        "NAME_MIDDLE": "M"
    }],
    "PASSPORT_NUMBER": "PP11111",
    "PASSPORT_COUNTRY": "US",
    "DRIVERS_LICENSE_NUMBER": "DL11111",
    "SSN_NUMBER": "111-11-1111"
}
data_as_json = json.dumps(data)

return_code = g2_engine.replaceRecord(datasource_code_1, record_id_1, data_as_json, load_id)

print("Return Code: {0}".format(return_code))

## Export JSON Entity Report

There are three steps to exporting resolved entity data from the G2Engine object in JSON format. First, use the `exportJSONEntityReport()` method to generate a long integer, referred to here as an 'exportHandle'. The `exportJSONEntityReport()` method accepts one parameter as input:

- **flags**: An integer specifying which entity details should be included in the export. See the "Entity Export Flags" section for further details.

Second, use the `fetchNext()` method to read the exportHandle and export a row of JSON output containing the entity data for a single entity. Note that successive calls of `fetchNext()` will export successive rows of entity data. The `fetchNext()` method accepts the following parameters as input:

- **export_handle:** A long integer from which resolved entity data may be read and exported
- **response_bytearray:** A memory buffer for returning the response document; if an error occurred, an error response is stored here.

In [None]:
export_handle = g2_engine.exportJSONEntityReport(flags)

while True:
    response_bytearray = bytearray()
    g2_engine.fetchNext(export_handle, response_bytearray)
    if not response_bytearray:
        break
    response_dictionary = json.loads(response_bytearray)        
    response = json.dumps(response_dictionary, sort_keys=True, indent=4)        
    print(response)
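
The fetch loop above can also be wrapped in a generator, which keeps the "call until the buffer comes back empty" logic in one place. This is a sketch under assumptions: `iter_export_rows` and `make_stub_fetch_next` are hypothetical names, and the stub replays canned rows so the example runs without an engine.

```python
import json


def iter_export_rows(fetch_next, export_handle):
    """Yield each exported row as text until the buffer comes back empty."""
    while True:
        response_bytearray = bytearray()
        fetch_next(export_handle, response_bytearray)
        if not response_bytearray:
            return
        yield response_bytearray.decode("utf-8")


def make_stub_fetch_next(rows):
    """Hypothetical stand-in for g2_engine.fetchNext that replays canned rows."""
    pending = list(rows)

    def fetch_next(export_handle, response_bytearray):
        if pending:
            response_bytearray.extend(pending.pop(0).encode("utf-8"))
        return 0

    return fetch_next


# Made-up rows that reuse the RESOLVED_ENTITY / ENTITY_ID keys seen earlier.
stub_fetch_next = make_stub_fetch_next([
    '{"RESOLVED_ENTITY": {"ENTITY_ID": 1}}',
    '{"RESOLVED_ENTITY": {"ENTITY_ID": 2}}',
])
for row in iter_export_rows(stub_fetch_next, export_handle=0):
    print(json.loads(row)["RESOLVED_ENTITY"]["ENTITY_ID"])
```

With a real engine, `g2_engine.fetchNext` and the handle returned by `exportJSONEntityReport()` would take the place of the stub and the dummy handle.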

## Export CSV Entity Report

There are three steps to exporting resolved entity data from the G2Engine object in CSV format. First, use the `exportCSVEntityReportV2()` method to generate a long integer, referred to here as an 'exportHandle'.

The `exportCSVEntityReportV2()` method accepts these parameters as input:

- **csv_column_list:** A comma-separated list of column names for the CSV export. (These are listed a little further down.)
- **flags:** An integer specifying which entity details should be included in the export. See the "Entity Export Flags" section for further details.

Second, use the `fetchNext()` method to read the exportHandle and export a row of CSV output containing the entity data for a single entity. Note that the first call of `fetchNext()` will yield a header row, and that successive calls of `fetchNext()` will export successive rows of entity data. The `fetchNext()` method accepts the following parameters as input:

- **export_handle:** A long integer from which resolved entity data may be read and exported
- **response_bytearray:** A memory buffer for returning the response document; if an error occurred, an error response is stored here
- **bufSize (C only):** The max number of bytes that can be stored in response. The response buffer MUST be able to hold at least this many bytes

In [None]:
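# Note: this cell calls exportCSVEntityReport(), which takes only the flags,
# rather than the exportCSVEntityReportV2() method described above.
export_handle = g2_engine.exportCSVEntityReport(flags)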

while True:
    response_bytearray = bytearray()
    g2_engine.fetchNext(export_handle, response_bytearray)
    if not response_bytearray:
        break
    print(response_bytearray.decode())
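
Since the first fetched row is a header, the decoded rows can be handed to Python's `csv` module to get one dictionary per entity row. The sketch below assumes each fetched row is a single CSV line. The column names in `sample_rows` are placeholders chosen for illustration and are not taken from an actual export, and `parse_csv_export` is a hypothetical helper name.

```python
import csv
import io


def parse_csv_export(rows):
    """Turn exported CSV lines (header first) into a list of dictionaries."""
    text = "".join(row if row.endswith("\n") else row + "\n" for row in rows)
    return list(csv.DictReader(io.StringIO(text)))


# Placeholder rows; real column names depend on the export that was requested.
sample_rows = [
    "RESOLVED_ENTITY_ID,DATA_SOURCE,RECORD_ID\n",
    '1,"TEST","1"\n',
    '1,"TEST","2"\n',
]
for record in parse_csv_export(sample_rows):
    print(record["RESOLVED_ENTITY_ID"], record["RECORD_ID"])
```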

## Finding Paths
The `findPathByEntityID()` and `findPathByRecordID()` functions can be used to find single relationship paths between two entities. Paths are found using known relationships with other entities.

Entities can be searched for by either Entity ID or by Record ID, depending on which function is chosen.

These functions have the following parameters:

- **entity_id_2:** The entity ID for the starting entity of the search path
- **entity_id_3:** The entity ID for the ending entity of the search path
- **datasource_code_2:** The data source for the starting entity of the search path
- **datasource_code_3:** The data source for the ending entity of the search path
- **record_id_2:** The record ID for the starting entity of the search path
- **record_id_3:** The record ID for the ending entity of the search path
- **max_degree:** The number of relationship degrees to search

First, create some records to compare. Can you see what this record has in common with the previous one?

In [None]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Miller",
        "NAME_FIRST": "Max",
        "NAME_MIDDLE": "W"
    }],
    "SSN_NUMBER": "111-11-1111"
}
data_as_json = json.dumps(data)

return_code = g2_engine.replaceRecord(datasource_code_2, record_id_2, data_as_json, None)

print("Return Code: {0}".format(return_code))

Replace values for Record #3

In [None]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Miller",
        "NAME_FIRST": "Mildred"
    }],
    "SSN_NUMBER": "111-11-1111"
}
data_as_json = json.dumps(data)

return_code = g2_engine.replaceRecord(datasource_code_3, record_id_3, data_as_json, None)

print("Return Code: {0}".format(return_code))

Locate "entity identifier" for Record #1

In [None]:
response_bytearray = bytearray()
return_code = g2_engine.getEntityByRecordID(datasource_code_1, record_id_1, response_bytearray)
response_dictionary = json.loads(response_bytearray)
entity_id_1 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

print("Return Code: {0}\nEntity ID: {1}".format(return_code, entity_id_1))

Locate "entity identifier" for Record #2

In [None]:
response_bytearray = bytearray()
return_code = g2_engine.getEntityByRecordID(datasource_code_2, record_id_2, response_bytearray)
response_dictionary = json.loads(response_bytearray)
entity_id_2 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

print("Return Code: {0}\nEntity ID: {1}".format(return_code, entity_id_2))

Locate "entity identifier" for Record #3

In [None]:
response_bytearray = bytearray()
return_code = g2_engine.getEntityByRecordID(datasource_code_3, record_id_3, response_bytearray)
response_dictionary = json.loads(response_bytearray)
entity_id_3 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

print("Return Code: {0}\nEntity ID: {1}".format(return_code, entity_id_3))
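
The three lookups above repeat the same steps: allocate a buffer, call the engine, parse the JSON, and read `RESOLVED_ENTITY.ENTITY_ID`. A small helper could capture that pattern. In this sketch `entity_id_for_record` is a hypothetical name, and `stub_get_entity_by_record_id` is a made-up stand-in with the same three-argument shape as the call used above, so the example runs without an engine.

```python
import json


def entity_id_for_record(get_entity_by_record_id, datasource_code, record_id):
    """Return (return_code, entity_id) for the entity that contains a record."""
    response_bytearray = bytearray()
    return_code = get_entity_by_record_id(datasource_code, record_id, response_bytearray)
    response_dictionary = json.loads(response_bytearray.decode("utf-8"))
    return return_code, response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]


def stub_get_entity_by_record_id(datasource_code, record_id, response_bytearray):
    """Hypothetical stand-in for g2_engine.getEntityByRecordID."""
    payload = {"RESOLVED_ENTITY": {"ENTITY_ID": int(record_id)}}
    response_bytearray.extend(json.dumps(payload).encode("utf-8"))
    return 0


print(entity_id_for_record(stub_get_entity_by_record_id, "TEST", "2"))
```

With a real engine, the bound method `g2_engine.getEntityByRecordID` would be passed as the first argument.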

## `findPathByEntityID()`

In [None]:
# Define search variables.

max_degree = 3

# Find the path by entity ID.

response_bytearray = bytearray([])
return_code = g2_engine.findPathByEntityID(entity_id_2, entity_id_3, max_degree, response_bytearray)
response_dictionary = json.loads(response_bytearray)

# Print the results.

print("Return Code: {0}".format(return_code))
RenderJSON(response_dictionary)

## `findPathByEntityIDV2()`
The function `findPathByEntityIDV2()` is an improved version of `findPathByEntityID()` that also allows you to use control flags.

In [None]:
# Define search variables.

max_degree = 3

# Find the path by entity ID.

response_bytearray = bytearray([])
return_code = g2_engine.findPathByEntityIDV2(entity_id_2, entity_id_3, max_degree, flags, response_bytearray)
response_dictionary = json.loads(response_bytearray)

# Print the results.

print("Return Code: {0}".format(return_code))
RenderJSON(response_dictionary)

## `findPathByRecordID()`

In [None]:
# Define search variables.

max_degree = 3

# Find the path by record ID.

response_bytearray = bytearray([])
return_code = g2_engine.findPathByRecordID(datasource_code_2, record_id_2,
                                           datasource_code_3, record_id_3,
                                           max_degree, response_bytearray)
response_dictionary = json.loads(response_bytearray)

# Print the results.

print("Return Code: {0}".format(return_code))
RenderJSON(response_dictionary)

## `findPathByRecordIDV2()`
The function `findPathByRecordIDV2()` is an improved version of `findPathByRecordID()` that also allows you to use control flags.

In [None]:
# Define search variables.

max_degree = 3

# Find the path by record ID.

response_bytearray = bytearray([])
g2_engine.findPathByRecordIDV2(datasource_code_2, record_id_2,
                               datasource_code_3, record_id_3,
                               max_degree, flags, response_bytearray)
response_dictionary = json.loads(response_bytearray)

# Print the results.

RenderJSON(response_dictionary)

## Finding Paths with Exclusions
The `findPathExcludingByEntityID()` and `findPathExcludingByRecordID()` functions can be used to find single relationship paths between two entities. Paths are found using known relationships with other entities. In addition, these functions find paths that exclude certain entities from being on the path.

Entities can be searched for by either Entity ID or by Record ID, depending on which function is chosen. Additionally, entities to be excluded can also be specified by either Entity ID or by Record ID.

When excluding entities, the user may choose to either (a) strictly exclude the entities, or (b) prefer to exclude the entities, but still include them if no other path is found. By default, entities will be strictly excluded. A "preferred exclude" may be done by specifying the `G2_FIND_PATH_PREFER_EXCLUDE` control flag.

These functions have the following parameters:

- **entity_id_2:** The entity ID for the starting entity of the search path
- **entity_id_3:** The entity ID for the ending entity of the search path
- **datasource_code_2:** The data source for the starting entity of the search path
- **datasource_code_3:** The data source for the ending entity of the search path
- **record_id_2:** The record ID for the starting entity of the search path
- **record_id_3:** The record ID for the ending entity of the search path
- **max_degree:** The number of relationship degrees to search
- **excluded_entities_as_json:** Entities that should be avoided on the path (JSON document)
- **flags:** Operational flags
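
The exclusion arguments are small JSON documents. As a rough sketch, the two hypothetical helpers below build them from plain Python values, using the same `ENTITIES` and `RECORDS` shapes as the cells in this section. The helper names are illustrative and not part of the Senzing API.

```python
import json


def entities_as_json(entity_ids):
    """Build the {"ENTITIES": [...]} document from a list of entity IDs."""
    return json.dumps(
        {"ENTITIES": [{"ENTITY_ID": entity_id} for entity_id in entity_ids]})


def records_as_json(records):
    """Build the {"RECORDS": [...]} document from (datasource_code, record_id) pairs."""
    return json.dumps({"RECORDS": [
        {"RECORD_ID": record_id, "DATA_SOURCE": datasource_code}
        for datasource_code, record_id in records
    ]})


print(entities_as_json([1]))
print(records_as_json([("TEST", "1")]))
```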

## `findPathExcludingByEntityID()`

In [None]:
# Define search variables.

max_degree = 4
excluded_entities = {
    "ENTITIES": [{
        "ENTITY_ID": entity_id_1
    }]}
excluded_entities_as_json = json.dumps(excluded_entities)

# Find the path by entity ID.

response_bytearray = bytearray([])
return_code = g2_engine.findPathExcludingByEntityID(entity_id_2, entity_id_3,
                                                    max_degree, excluded_entities_as_json,
                                                    flags, response_bytearray)
response_dictionary = json.loads(response_bytearray)

# Print the results.

print("Return Code: {0}".format(return_code))
RenderJSON(response_dictionary)

## `findPathExcludingByRecordID()`

In [None]:
# Define search variables.

excluded_records = {
    "RECORDS": [{
        "RECORD_ID": record_id_1,
        "DATA_SOURCE": datasource_code_1
    }]}
excluded_records_as_json = json.dumps(excluded_records)

# Find the path by record ID.

response_bytearray = bytearray([])
g2_engine.findPathExcludingByRecordID(datasource_code_2, record_id_2,
                                      datasource_code_3, record_id_3,
                                      max_degree, excluded_records_as_json,
                                      flags, response_bytearray)
response_dictionary = json.loads(response_bytearray)

# Print the results.

RenderJSON(response_dictionary)

## Finding Paths with Required Sources
The `findPathIncludingSourceByEntityID()` and `findPathIncludingSourceByRecordID()` functions can be used to find single relationship paths between two entities. In addition, one of the entities along the path must include a specified data source.

Entities can be searched for by either Entity ID or by Record ID, depending on which function is chosen. The required data source or sources are specified by a JSON document list.

Specific entities may also be excluded, using the same methodology as the `findPathExcludingByEntityID()` and `findPathExcludingByRecordID()` functions use.

These functions have the following parameters:

- **entity_id_2:** The entity ID for the starting entity of the search path
- **entity_id_3:** The entity ID for the ending entity of the search path
- **datasource_code_2:** The data source for the starting entity of the search path
- **datasource_code_3:** The data source for the ending entity of the search path
- **record_id_2:** The record ID for the starting entity of the search path
- **record_id_3:** The record ID for the ending entity of the search path
- **max_degree:** The number of relationship degrees to search
- **excluded_entities_as_json:** Entities that should be avoided on the path (JSON document)
- **required_dsrcs_as_json:** Data sources, at least one of which must be included by an entity on the path (JSON document)
- **flags:** Operational flags
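
The required-sources argument is a JSON document with a `DATA_SOURCES` list, as in the cell that follows. The sketch below shows one way to build it from a list of data source codes. `required_sources_as_json` is a hypothetical helper, and dropping duplicate codes is a defensive assumption rather than a documented requirement.

```python
import json


def required_sources_as_json(datasource_codes):
    """Build the {"DATA_SOURCES": [...]} document, dropping duplicate codes."""
    unique_codes = []
    for code in datasource_codes:
        if code not in unique_codes:
            unique_codes.append(code)
    return json.dumps({"DATA_SOURCES": unique_codes})


print(required_sources_as_json(["TEST", "TEST"]))
```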

## `findPathIncludingSourceByEntityID()`

In [None]:
# Define search variables.

max_degree = 4
excluded_entities = {
    "ENTITIES": [{
        "ENTITY_ID": entity_id_1
    }]}
excluded_entities_as_json = json.dumps(excluded_entities)
required_dsrcs = {
    "DATA_SOURCES": [
        datasource_code_1
    ]}
required_dsrcs_as_json = json.dumps(required_dsrcs)

# Find the path by entity ID.

response_bytearray = bytearray([])
g2_engine.findPathIncludingSourceByEntityID(entity_id_2, entity_id_3, max_degree,
                                            excluded_entities_as_json,
                                            required_dsrcs_as_json,
                                            flags, response_bytearray)
response_dictionary = json.loads(response_bytearray)

# Print the results.

RenderJSON(response_dictionary)

## `findPathIncludingSourceByRecordID()`

In [None]:
# Define search variables.

excluded_records = {
    "RECORDS": [{
        "RECORD_ID": record_id_1,
        "DATA_SOURCE": datasource_code_1
    }]}
excluded_records_as_json = json.dumps(excluded_records)

# Find the path by record ID.

response_bytearray = bytearray([])
g2_engine.findPathIncludingSourceByRecordID(datasource_code_2, record_id_2,
                                            datasource_code_3, record_id_3,
                                            max_degree, 
                                            excluded_records_as_json,
                                            required_dsrcs_as_json,
                                            flags, response_bytearray)
response_dictionary = json.loads(response_bytearray)

# Print the results.

RenderJSON(response_dictionary)

## Finding Networks

The `findNetworkByEntityID()` and `findNetworkByRecordID()` functions can be used to find all entities surrounding a requested set of entities. This includes the requested entities, paths between them, and relations to other nearby entities.

Entities can be searched for by either Entity ID or by Record ID, depending on which function is chosen.

These functions have the following parameters:

- **entity_list_as_json:** A list of entities, specified by Entity ID (JSON document)
- **record_list_as_json:** A list of entities, specified by Record ID (JSON document)
- **max_degree:** The maximum number of degrees in paths between search entities
- **buildout_degree:** The number of degrees of relationships to show around each search entity
- **max_entities:** The maximum number of entities to return in the discovered network

They also have various arguments used to return response documents.

The functions return a JSON document that identifies the path between each set of search entities (if the path exists), and the information on the entities in question (search entities, path entities, and build-out entities).

In [None]:
# Define search variables.

entity_list = {
    "ENTITIES": [{
        "ENTITY_ID": entity_id_1
    }, {
        "ENTITY_ID": entity_id_2
    }, {
        "ENTITY_ID": entity_id_3
    }]}
entity_list_as_json = json.dumps(entity_list)
max_degree = 2
buildout_degree = 1
max_entities = 12

# Find the network by entity ID.

response_bytearray = bytearray()
g2_engine.findNetworkByEntityID(entity_list_as_json, max_degree, buildout_degree, max_entities, response_bytearray)
response_dictionary = json.loads(response_bytearray)

# Print the results.

RenderJSON(response_dictionary)

## `findNetworkByRecordID()`

In [None]:
# Define search variables.

record_list = {
    "RECORDS": [{
        "RECORD_ID": record_id_1,
        "DATA_SOURCE": datasource_code_1
    }, {
        "RECORD_ID": record_id_2,
        "DATA_SOURCE": datasource_code_2
    }, {
        "RECORD_ID": record_id_3,
        "DATA_SOURCE": datasource_code_3
    }]}
record_list_as_json = json.dumps(record_list)


# Find the network by record ID.

response_bytearray = bytearray()
g2_engine.findNetworkByRecordID(record_list_as_json, max_degree, buildout_degree, max_entities, response_bytearray)
response_dictionary = json.loads(response_bytearray)

# Print the results.

RenderJSON(response_dictionary)

## `findNetworkByEntityIDV2()`
The function `findNetworkByEntityIDV2()` is an improved version of `findNetworkByEntityID()` that also allows you to use control flags.

In [None]:
# Define search variables.

entity_list = {
    "ENTITIES": [{
        "ENTITY_ID": entity_id_1
    }, {
        "ENTITY_ID": entity_id_2
    }, {
        "ENTITY_ID": entity_id_3
    }]}
entity_list_as_json = json.dumps(entity_list)
max_degree = 2
buildout_degree = 1
max_entities = 12

# Find the network by entity ID.

response_bytearray = bytearray()
g2_engine.findNetworkByEntityIDV2(entity_list_as_json, max_degree, 
                                  buildout_degree, max_entities,
                                  flags, response_bytearray)
response_dictionary = json.loads(response_bytearray)

# Print the results.

RenderJSON(response_dictionary)

## `findNetworkByRecordIDV2()`
The function `findNetworkByRecordIDV2()` is an improved version of `findNetworkByRecordID()` that also allows you to use control flags.

In [None]:
# Define search variables.

record_list = {
    "RECORDS": [{
        "RECORD_ID": record_id_1,
        "DATA_SOURCE": datasource_code_1
    }, {
        "RECORD_ID": record_id_2,
        "DATA_SOURCE": datasource_code_2
    }, {
        "RECORD_ID": record_id_3,
        "DATA_SOURCE": datasource_code_3
    }]}
record_list_as_json = json.dumps(record_list)

# Find the network by record ID.

response_bytearray = bytearray()
g2_engine.findNetworkByRecordIDV2(record_list_as_json, max_degree,
                                  buildout_degree, max_entities,
                                  flags, response_bytearray)
response_dictionary = json.loads(response_bytearray)

# Print the results.

RenderJSON(response_dictionary)

## Redo Processing
Redo records are automatically created by Senzing when certain conditions occur where it believes more processing may be needed. Some examples:
* A value becomes generic and previous decisions may need to be revisited
* Clean up after some record deletes
* Detected related entities were being changed at the same time
* A table inconsistency exists, potentially after a non-graceful shutdown

First, we need a total of 6 records, so let's add a few more.

Create Record and Entity #4

In [None]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Owens",
        "NAME_FIRST": "Lily"
    }],
    "SSN_NUMBER": "111-11-1111"
}
data_as_json = json.dumps(data)

return_code = g2_engine.replaceRecord(datasource_code_4, record_id_4, data_as_json, None)

print("Return Code: {0}".format(return_code))

In [None]:
response_bytearray = bytearray()
return_code = g2_engine.getEntityByRecordID(datasource_code_4, record_id_4, response_bytearray)
response_dictionary = json.loads(response_bytearray)
entity_id_4 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]
print("Return Code: {0}\nEntity ID: {1}".format(return_code, entity_id_4))

Create Record and Entity #5

In [None]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Bauler",
        "NAME_FIRST": "August",
        "NAME_MIDDLE": "E"
    }],
    "SSN_NUMBER": "111-11-1111"
}
data_as_json = json.dumps(data)

return_code = g2_engine.replaceRecord(datasource_code_5, record_id_5, data_as_json, None)

print("Return Code: {0}".format(return_code))

In [None]:
response_bytearray = bytearray()
return_code = g2_engine.getEntityByRecordID(datasource_code_5, record_id_5, response_bytearray)
response_dictionary = json.loads(response_bytearray)
entity_id_5 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]
print("Return Code: {0}\nEntity ID: {1}".format(return_code, entity_id_5))

Create Record and Entity #6

In [None]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Barcy",
        "NAME_FIRST": "Brian",
        "NAME_MIDDLE": "H"
    }],
    "SSN_NUMBER": "111-11-1111"
}
data_as_json = json.dumps(data)

return_code = g2_engine.replaceRecord(datasource_code_6, record_id_6, data_as_json, None)

print("Return Code: {0}".format(return_code))

In [None]:
response_bytearray = bytearray()
return_code = g2_engine.getEntityByRecordID(datasource_code_6, record_id_6, response_bytearray)
response_dictionary = json.loads(response_bytearray)
entity_id_6 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]
print("Return Code: {0}\nEntity ID: {1}".format(return_code, entity_id_6))

Create Record and Entity #7

In [None]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Miller",
        "NAME_FIRST": "Jack",
        "NAME_MIDDLE": "H"
    }],
    "SSN_NUMBER": "111-11-1111"
}
data_as_json = json.dumps(data)

return_code = g2_engine.replaceRecord(datasource_code_7, record_id_7, data_as_json, None)

print("Return Code: {0}".format(return_code))

In [None]:
response_bytearray = bytearray()
return_code = g2_engine.getEntityByRecordID(datasource_code_7, record_id_7, response_bytearray)
response_dictionary = json.loads(response_bytearray)
entity_id_7 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]
print("Return Code: {0}\nEntity ID: {1}".format(return_code, entity_id_7))
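
The four records above share one shape: a primary name plus an SSN. As a minimal sketch, the JSON could be produced by a small function; `build_person_record` is a hypothetical helper (not part of the Senzing API) that only mirrors the keys used in the cells above.

```python
import json


def build_person_record(last_name, first_name, middle_name=None, ssn=None):
    """Return a JSON string with the same keys as the records created above."""
    name = {
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": last_name,
        "NAME_FIRST": first_name,
    }
    # The middle name and SSN are only included when provided.
    if middle_name is not None:
        name["NAME_MIDDLE"] = middle_name
    record = {"NAMES": [name]}
    if ssn is not None:
        record["SSN_NUMBER"] = ssn
    return json.dumps(record)


example_record_json = build_person_record("Bauler", "August", "E", "111-11-1111")
print(example_record_json)
```

The returned string could then be passed to `replaceRecord()` in place of `data_as_json`.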

## Counting the number of redos
This returns the number of redos within the processed records that are awaiting processing.

In [None]:
redo_count = g2_engine.countRedoRecords()
print("Redo record count: {0}".format(redo_count))

## Getting a redo record
Gets a redo record so that it can be processed.

In [None]:
response_bytearray = bytearray()
return_code = g2_engine.getRedoRecord(response_bytearray)
print("Return Code: {0}".format(return_code))

if (return_code == 0 and response_bytearray):
    g2_engine.process(response_bytearray.decode())

## Processing redo records
This processes the next redo record and returns it. If `processRedoRecord()` returns 0 and `response_bytearray` is empty, there are no more redo records to process, and calling `countRedoRecords()` again will return 0.
Processing a redo record can create more redo records in certain situations.

In [None]:
response_bytearray = bytearray()
return_code = g2_engine.processRedoRecord(response_bytearray)
print("Return Code: {0}".format(return_code))

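# Pretty-print the XML response, if there is one.

xml_string = response_bytearray.decode()
if len(xml_string) > 0:
    from xml.dom import minidom
    # Use a distinct variable name so the "xml" package is not shadowed.
    xml_document = minidom.parseString(xml_string)
    xml_pretty_string = xml_document.toprettyxml()
    print(xml_pretty_string)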
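
Because processing one redo record may create others, redo processing is usually repeated until the response comes back empty. The loop below is a minimal sketch of that idea. `FakeEngine` is a hypothetical stand-in that only imitates the calling convention shown in this notebook (fill the caller's bytearray, return 0), so the loop can run without Senzing installed; `drain_redo_records` and `max_iterations` are also illustrative names, not part of the Senzing API.

```python
class FakeEngine:
    """Hypothetical stand-in for G2Engine, used only to exercise the loop."""

    def __init__(self, redo_records):
        self._pending = list(redo_records)

    def countRedoRecords(self):
        return len(self._pending)

    def processRedoRecord(self, response):
        # Imitate the notebook's convention: write into the caller's bytearray.
        del response[:]
        if self._pending:
            response.extend(self._pending.pop(0).encode())
        return 0


def drain_redo_records(engine, max_iterations=1000):
    """Call processRedoRecord() until the response is empty; return the responses."""
    processed = []
    for _ in range(max_iterations):
        response = bytearray()
        return_code = engine.processRedoRecord(response)
        if return_code != 0:
            raise RuntimeError("processRedoRecord returned {0}".format(return_code))
        if not response:
            break  # An empty response means nothing is left to process.
        processed.append(response.decode())
    return processed


fake_engine = FakeEngine(["<redo>1</redo>", "<redo>2</redo>"])
processed_records = drain_redo_records(fake_engine)
print("Processed {0} redo records".format(len(processed_records)))
print("Remaining: {0}".format(fake_engine.countRedoRecords()))
```

The `max_iterations` cap is a design choice that guards against an endless loop if new redo records keep being generated.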

## Deleting Records
Use `deleteRecord()` to remove a record from the data repository; it returns `0` on success. `deleteRecord()` can be called as many times as desired, and from multiple threads at the same time. The `deleteRecord()` function accepts three parameters as input:

- **datasource_code:** The name of the data source the record is associated with. This value is configurable to the system.
- **record_id:** The record ID, used to identify distinct records.
- **load_id:** The observation load ID for the record. The value can be `None`, in which case it defaults to the data source code.

In [None]:
return_code = g2_engine.deleteRecord(datasource_code_1, record_id_1, load_id)
print("Return Code: {0}".format(return_code))

Attempt to get the record again. It should error and give an output similar to "Unknown record".

In [None]:
try:
    response_bytearray = bytearray()
    return_code = g2_engine.getRecord(datasource_code_1, record_id_1, response_bytearray)
    response_dictionary = json.loads(response_bytearray)
    response = json.dumps(response_dictionary, sort_keys=True, indent=4)
    print("Return Code: {0}\n{1}".format(return_code, response))
except G2Exception.G2ModuleGenericException as err:
    print("Exception: {0}".format(err))

## Purge Repository
To purge the G2 repository, use the aptly named `purgeRepository()` method. This will remove every record in your current repository.

In [None]:
g2_engine.purgeRepository()