# G2Engine

In [1]:
import os
import sys
import json

## System path

Update system path.

In [2]:
sys.path.append('/opt/senzing/g2/python')

# G2Engine
The G2Engine API...

In [3]:
import G2Exception
from G2Engine import G2Engine

## Initialize variables

Create variables used for G2Engine.

In [None]:
module_name = 'pyG2EngineForAddRecord'
senzing_directory = os.environ.get("SENZING_DIR", "/opt/senzing")
senzing_python_directory = "{0}/g2/python".format(senzing_directory)
g2module_ini_pathname = "{0}/G2Module.ini".format(senzing_python_directory)
verbose_logging = True

## Initialization

To start using Senzing G2Engine, create and initialize an instance.
This should be done once per process.
The `init()` method accepts the following parameters:

- **module_name:** A short name given to this instance of the G2 engine (i.e. your G2Module object)
- **g2module_ini_pathname:** A fully qualified path to the G2 engine INI file (often /opt/senzing/g2/python/G2Module.ini)
- **verbose_logging:** A boolean which enables diagnostic logging - this will print a massive amount of information to stdout (default = False)
- **config_id:** (optional) The identifier value for the engine configuration can be returned here.

Calling this function will return "0" upon success.

In [None]:
g2_engine = G2Engine()
result = g2_engine.init(module_name, g2module_ini_pathname, verbose_logging)
print(result)
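Most cells in this notebook print the integer status code and leave it to the reader to notice a non-zero value. Below is a minimal sketch of a helper that fails fast instead. It assumes only the convention stated above, that `0` means success. `check_result` is a hypothetical name and is not part of the G2Engine API. Depending on the Senzing version, failures may also surface as exceptions from the `G2Exception` module.

```python
def check_result(result, operation):
    """Raise if a G2Engine-style status code is non-zero (hypothetical helper).

    Assumes the convention described above: 0 means success.
    """
    if result != 0:
        raise RuntimeError("{0} failed with status {1}".format(operation, result))
    return result


# With the engine, usage would look like:
#   check_result(g2_engine.init(module_name, g2module_ini_pathname, verbose_logging), "init")
print(check_result(0, "init"))  # -> 0
```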

## Prime Engine

The `primeEngine()` method may optionally be called to pre-initialize some of the heavier weight internal resources of the G2 engine.

In [None]:
response = g2_engine.primeEngine()

## addRecord()

Once the Senzing engine is initialized, use `addRecord()` to load a record into the Senzing repository. `addRecord()` can be called as many times as desired and from multiple threads at the same time. It returns "0" upon success and accepts four parameters as input:

- **datasource_code:** The name of the data source the record is associated with. Data source codes are defined in the Senzing configuration
- **record_id_1:** The record ID, used to identify distinct records
- **data_string:** A JSON document with the attribute data for the record
- **load_id:** The observation load ID for the record; the value may be `None`, in which case it defaults to the data source code


In [None]:
datasource_code = "TEST"
record_id_1 = "1"
load_id = None
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Smith",
        "NAME_FIRST": "John",
        "NAME_MIDDLE": "M"
    }],
    "PASSPORT_NUMBER": "PP11111",
    "PASSPORT_COUNTRY": "US",
    "DRIVERS_LICENSE_NUMBER": "DL11111",
    "SSN_NUMBER": "111-11-1111"
}
data_string = json.dumps(data)

result = g2_engine.addRecord(datasource_code, record_id_1, data_string, load_id)
print(result)
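Because `addRecord()` takes one record per call, loading a list of records means looping over that call. Below is a minimal sketch that assumes records are Python dictionaries keyed by record ID. `load_records` and `FakeEngine` are hypothetical names. `FakeEngine` only stands in for `G2Engine` so the loop can run without a Senzing installation, and the sample records are made up.

```python
import json


class FakeEngine:
    """Hypothetical stand-in for G2Engine that just remembers addRecord() calls."""

    def __init__(self):
        self.calls = []

    def addRecord(self, datasource_code, record_id, data_string, load_id):
        self.calls.append((datasource_code, record_id, data_string, load_id))
        return 0


def load_records(engine, datasource_code, records, load_id=None):
    """Call addRecord() once per record; return the IDs whose status was not 0."""
    failed = []
    for record_id, attributes in records.items():
        data_string = json.dumps(attributes)
        result = engine.addRecord(datasource_code, record_id, data_string, load_id)
        if result != 0:
            failed.append(record_id)
    return failed


records = {
    "100": {"NAMES": [{"NAME_TYPE": "PRIMARY", "NAME_LAST": "Smith", "NAME_FIRST": "Jane"}]},
    "101": {"NAMES": [{"NAME_TYPE": "PRIMARY", "NAME_LAST": "Jones", "NAME_FIRST": "Ann"}]},
}
engine = FakeEngine()
print(load_records(engine, "TEST", records))  # -> []
print(len(engine.calls))  # -> 2
```

The text notes that `addRecord()` is safe to call from multiple threads, so a thread pool could parallelize this loop. The sequential version keeps the sketch simple.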

## getRecord()

Use `getRecord()` to retrieve a single record from the data repository. The record is written as a JSON document into a caller-supplied buffer, and the function itself returns "0" upon success. Once the Senzing engine is initialized, `getRecord()` can be called as many times as desired and from multiple threads at the same time. The `getRecord()` function accepts the following parameters as input:

- **datasource_code:** The name of the data source the record is associated with. Data source codes are defined in the Senzing configuration
- **record_id_1:** The record ID, used to identify the record for retrieval
- **response_string:** A memory buffer for returning the response document; if an error occurred, an error response is stored here

In [None]:
response_string = bytearray("", 'utf-8')
result = g2_engine.getRecord(datasource_code, record_id_1, response_string)

response_dictionary = json.loads(response_string)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))
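Most read calls in this notebook follow the same pattern. You create an empty `bytearray` and pass it to the engine method, which fills it in place. You then parse the bytes as JSON. Below is a minimal sketch that factors that pattern into one helper. `call_for_json` is a hypothetical name. `fake_get_record` is a stub that only imitates the in-place filling so the helper can run without Senzing, and its payload is made up.

```python
import json


def call_for_json(method, *args):
    """Call an engine method that writes JSON into a trailing bytearray buffer.

    Returns (status, parsed_document). Hypothetical helper, not part of G2Engine.
    """
    response_string = bytearray()
    status = method(*args, response_string)
    document = json.loads(response_string) if response_string else None
    return status, document


def fake_get_record(datasource_code, record_id, response_string):
    """Hypothetical stub that fills the buffer in place, as getRecord() does."""
    payload = {"DATA_SOURCE": datasource_code, "RECORD_ID": record_id}
    response_string.extend(json.dumps(payload).encode("utf-8"))
    return 0


status, document = call_for_json(fake_get_record, "TEST", "1")
print(status, document["RECORD_ID"])  # -> 0 1
```

With the real engine, the call would presumably be `call_for_json(g2_engine.getRecord, datasource_code, record_id_1)`.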

## getEntityByRecordID()

Use `getEntityByRecordID()` to retrieve entity data based on the record ID of a particular data record. This function accepts the following parameters as input:

- **datasource_code:** The name of the data source the record is associated with. Data source codes are defined in the Senzing configuration
- **record_id_1:** The record ID for a particular data record
- **response_string:** A memory buffer for returning the response document; if an error occurred, an error response is stored here.

In [None]:
response_string = bytearray()
result = g2_engine.getEntityByRecordID(datasource_code, record_id_1, response_string)

response_dictionary = json.loads(response_string)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## getEntityByEntityID()

Use `getEntityByEntityID()` to retrieve entity data based on the ID of a resolved identity. This function accepts the following parameters as input:

- **entity_id_1:** The numeric ID of a resolved entity.
- **response_string:** A memory buffer for returning the response document; if an error occurred, an error response is stored here.

In [None]:
# Because entity IDs can change, this assumes you've run getEntityByRecordID()
# to get the latest entity ID.

entity_id_1 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]
response_string = bytearray()
result = g2_engine.getEntityByEntityID(entity_id_1, response_string)

response_dictionary = json.loads(response_string)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))
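Entity IDs can change as records are added, replaced, or deleted, so this notebook repeatedly re-reads them with `getEntityByRecordID()`. Below is a minimal sketch of a helper that wraps that lookup using the `RESOLVED_ENTITY` and `ENTITY_ID` keys read in the cells here. `entity_id_for_record` and `FakeEngine` are hypothetical. The stub returns made-up entity IDs, and maps two records to one entity to mimic records resolving together, only so the example runs without Senzing.

```python
import json


def entity_id_for_record(engine, datasource_code, record_id):
    """Look up the current entity ID for a record (hypothetical helper)."""
    response_string = bytearray()
    engine.getEntityByRecordID(datasource_code, record_id, response_string)
    response_dictionary = json.loads(response_string)
    return response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]


class FakeEngine:
    """Hypothetical stand-in that maps (data source, record ID) to made-up entity IDs."""

    def __init__(self, entity_ids):
        self.entity_ids = entity_ids

    def getEntityByRecordID(self, datasource_code, record_id, response_string):
        entity_id = self.entity_ids[(datasource_code, record_id)]
        document = {"RESOLVED_ENTITY": {"ENTITY_ID": entity_id}}
        response_string.extend(json.dumps(document).encode("utf-8"))
        return 0


engine = FakeEngine({("TEST", "2"): 7, ("TEST", "3"): 7})
print(entity_id_for_record(engine, "TEST", "2"))  # -> 7
```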

## searchByAttributes()

Use `searchByAttributes()` to retrieve entity data based on a user-specified set of entity attributes. This function accepts the following parameters as input:

- **data_string:** A JSON document with the attribute data to search for
- **response_string:** A memory buffer for returning the response document; if an error occurred, an error response is stored here.

In [None]:
response_string = bytearray()
result = g2_engine.searchByAttributes(data_string, response_string)

response_dictionary = json.loads(response_string)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## replaceRecord()
Use the `replaceRecord()` function to update or replace a record in the data repository. If the record doesn't exist, a new record is added to the data repository. Like the above functions, `replaceRecord()` returns "0" upon success, and it can be called as many times as desired and from multiple threads at the same time. The `replaceRecord()` function accepts four parameters as input:

- **datasource_code:** The name of the data source the record is associated with. Data source codes are defined in the Senzing configuration
- **record_id_1:** The record ID, used to identify distinct records
- **data_string:** A JSON document with the attribute data for the record
- **load_id:** The observation load ID for the record; the value may be `None`, in which case it defaults to the data source code

In [None]:
datasource_code = "TEST"
record_id_1 = "1"
load_id = None
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Miller",
        "NAME_FIRST": "John",
        "NAME_MIDDLE": "M"
    }],
    "PASSPORT_NUMBER": "PP11111",
    "PASSPORT_COUNTRY": "US",
    "DRIVERS_LICENSE_NUMBER": "DL11111",
    "SSN_NUMBER": "111-11-1111"
}
data_string = json.dumps(data)
result = g2_engine.replaceRecord(datasource_code, record_id_1, data_string, load_id)
print(result)

## Export JSON Entity Report

There are three steps to exporting resolved entity data from the G2Engine object in JSON format. First, use the `exportJSONEntityReport()` method to generate a long integer, referred to here as an 'exportHandle'. The `exportJSONEntityReport()` method accepts one parameter as input:

- **flags**: An integer specifying which entity details should be included in the export. See the "Entity Export Flags" section for further details.

Second, use the `fetchNext()` method to read from the export handle and return one row of JSON output containing the data for a single entity. Successive calls of `fetchNext()` return successive rows, and an empty response means the export is exhausted. The `fetchNext()` method accepts the following parameters as input:

- **export_handle:** A long integer from which resolved entity data may be read and exported
- **response_string:** A memory buffer for returning the response document; if an error occurred, an error response is stored here.

Third, call `closeExport()` with the export handle to release it once all rows have been read.

In [None]:
flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS
export_handle = g2_engine.exportJSONEntityReport(flags)
while True:
    response_string = bytearray()
    g2_engine.fetchNext(export_handle, response_string)
    if not response_string:
        break  # An empty buffer means every entity has been exported.
    response_dictionary = json.loads(response_string)
    response = json.dumps(response_dictionary, sort_keys=True, indent=4)
    print(response)
g2_engine.closeExport(export_handle)
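The export loop has to keep calling `fetchNext()` until it gets an empty buffer back. Below is a minimal sketch that moves that termination test into a generator so callers can simply iterate. It assumes, as described above, that `fetchNext()` fills a `bytearray` with one row and leaves it empty when the export is exhausted. `iter_export` and `FakeEngine` are hypothetical, and the stub rows are made up.

```python
def iter_export(engine, export_handle):
    """Yield each non-empty row from fetchNext() until an empty buffer comes back.

    Hypothetical helper, not part of the G2Engine API.
    """
    while True:
        response_string = bytearray()
        engine.fetchNext(export_handle, response_string)
        if not response_string:
            return
        yield response_string.decode("utf-8")


class FakeEngine:
    """Hypothetical stand-in that serves a fixed list of rows for any handle."""

    def __init__(self, rows):
        self.rows = list(rows)

    def fetchNext(self, export_handle, response_string):
        if self.rows:
            response_string.extend(self.rows.pop(0).encode("utf-8"))
        return 0


engine = FakeEngine([
    '{"RESOLVED_ENTITY": {"ENTITY_ID": 1}}',
    '{"RESOLVED_ENTITY": {"ENTITY_ID": 2}}',
])
rows = list(iter_export(engine, export_handle=123))
print(len(rows))  # -> 2
```

Each yielded row can then be parsed with `json.loads` for the JSON export, or handed to the `csv` module for the CSV export.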

## Export CSV Entity Report

There are three steps to exporting resolved entity data from the G2Engine object in CSV format. First, use the `exportCSVEntityReport()` method to generate a long integer, referred to here as an 'exportHandle'. The `exportCSVEntityReport()` method accepts one parameter as input:

- **flags:** An integer specifying which entity details should be included in the export. See the "Entity Export Flags" section for further details.

Second, use the `fetchNext()` method to read from the export handle and return one row of CSV output containing the data for a single entity. Note that the first call of `fetchNext()` may yield a header row, and that successive calls of `fetchNext()` return successive rows of entity data. The `fetchNext()` method accepts the following parameters as input:

- **export_handle:** A long integer from which resolved entity data may be read and exported
- **response_string:** A memory buffer for returning the response document; if an error occurred, an error response is stored here

Third, call `closeExport()` with the export handle to release it once all rows have been read.

In [None]:
flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS
export_handle = g2_engine.exportCSVEntityReport(flags)
while True:
    response_string = bytearray()
    g2_engine.fetchNext(export_handle, response_string)
    if not response_string:
        break  # An empty buffer means every row has been exported.
    print(response_string.decode().rstrip("\r\n"))
g2_engine.closeExport(export_handle)
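Once the CSV rows have been collected, Python's `csv` module can turn them into dictionaries keyed by column name. Below is a minimal sketch that assumes the first row is the header row, which the text above says may be the case. `parse_csv_rows` is a hypothetical name. The sample header and values are made up for illustration and are not the actual column set Senzing exports.

```python
import csv


def parse_csv_rows(rows):
    """Turn CSV export rows (header first) into a list of dicts (hypothetical helper)."""
    # csv.DictReader accepts any iterable of lines; strip line endings first.
    reader = csv.DictReader(row.rstrip("\r\n") for row in rows)
    return [dict(row) for row in reader]


sample_rows = [
    "RESOLVED_ENTITY_ID,DATA_SOURCE,RECORD_ID\n",
    "1,TEST,2\n",
    "1,TEST,3\n",
]
entities = parse_csv_rows(sample_rows)
print(entities[1]["RECORD_ID"])  # -> 3
```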

## Finding Paths
The `findPathByEntityID()` and `findPathByRecordID()` functions can be used to find a single relationship path between two entities. Paths are found using known relationships with other entities.

Entities can be searched for by either Entity ID or by Record ID, depending on which function is chosen.

These functions have the following parameters:

- **entity_id_2:** The entity ID for the starting entity of the search path
- **entity_id_3:** The entity ID for the ending entity of the search path
- **datasource_code_2:** The data source for the starting entity of the search path
- **record_id_2:** The record ID for the starting entity of the search path
- **datasource_code_3:** The data source for the ending entity of the search path
- **record_id_3:** The record ID for the ending entity of the search path
- **max_degree:** The number of relationship degrees to search
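To make `max_degree` concrete, a path of degree N passes through N relationships. The toy breadth-first search below runs over a made-up relationship graph to illustrate the idea. It is not Senzing's path-finding implementation, and `find_path` and the graph are hypothetical.

```python
from collections import deque


def find_path(relationships, start, end, max_degree):
    """Breadth-first search for a shortest path of at most max_degree hops.

    relationships maps an entity ID to the IDs it is directly related to.
    Returns the list of entity IDs on the path, or None if none is found.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        if len(path) - 1 == max_degree:
            continue  # This path already uses every allowed degree.
        for neighbor in relationships.get(path[-1], ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# A made-up relationship graph: 2 - 10 - 11 - 3
relationships = {2: [10], 10: [2, 11], 11: [10, 3], 3: [11]}
print(find_path(relationships, 2, 3, 3))  # -> [2, 10, 11, 3]
print(find_path(relationships, 2, 3, 2))  # -> None
```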

First, create a few more records so that you have something to compare. Can you see what these records have in common with the previous one?

In [None]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Miller",
        "NAME_FIRST": "Max",
        "NAME_MIDDLE": "W"
    }],
    "SSN_NUMBER": "111-11-1111"
}

data_string = json.dumps(data)
result = g2_engine.replaceRecord("TEST", "2", data_string, None)
print(result)

data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Miller",
        "NAME_FIRST": "Mildred"
    }],
    "SSN_NUMBER": "111-11-1111"
}

data_string = json.dumps(data)
result = g2_engine.replaceRecord("TEST", "3", data_string, None)
print(result)

response_string = bytearray()
result = g2_engine.getEntityByRecordID("TEST", "2", response_string)
response_dictionary = json.loads(response_string)
entity_id_2 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

response_string = bytearray()
result = g2_engine.getEntityByRecordID("TEST", "3", response_string)
response_dictionary = json.loads(response_string)
entity_id_3 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

## `findPathByEntityID()`

In [None]:
# Define search variables.

max_degree = 3

# Find the path by entity ID.

response = bytearray([])
result = g2_engine.findPathByEntityID(entity_id_2, entity_id_3, max_degree, response)

# Print the results.

response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## `findPathByRecordID()`

In [None]:
# Define search variables.

datasource_code_2 = "TEST"
record_id_2 = "2"
datasource_code_3 = "TEST"
record_id_3 = "3"
max_degree = 3

# Find the path by record ID.

response = bytearray([])
result = g2_engine.findPathByRecordID(datasource_code_2, record_id_2,
                                      datasource_code_3, record_id_3,
                                      max_degree, response)

# Print the results.

response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## Finding Paths with Exclusions
The `findPathExcludingByEntityID()` and `findPathExcludingByRecordID()` functions can be used to find a single relationship path between two entities. Paths are found using known relationships with other entities. In addition, these functions can exclude certain entities from appearing on the path.

Entities can be searched for by either Entity ID or by Record ID, depending on which function is chosen. Additionally, entities to be excluded can also be specified by either Entity ID or by Record ID.

When excluding entities, the user may choose to either (a) strictly exclude the entities, or (b) prefer to exclude the entities, but still include them if no other path is found. By default, entities will be strictly excluded. A "preferred exclude" may be done by specifying the G2_FIND_PATH_PREFER_EXCLUDE control flag.

These functions have the following parameters:

- **entity_id_2:** The entity ID for the starting entity of the search path
- **entity_id_3:** The entity ID for the ending entity of the search path
- **datasource_code_2:** The data source for the starting entity of the search path
- **record_id_2:** The record ID for the starting entity of the search path
- **datasource_code_3:** The data source for the ending entity of the search path
- **record_id_3:** The record ID for the ending entity of the search path
- **max_degree:** The number of relationship degrees to search
- **excluded_entities:** Entities that should be avoided on the path (JSON document)
- **flags:** Operational flags
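The difference between a strict exclude and a "preferred exclude" can be illustrated with a toy search over a made-up graph. In the sketch below, a strict exclude never uses the excluded entities. A preferred exclude falls back to an unrestricted search when no path avoids them. That fallback is a simplification of the behavior described above, not Senzing's implementation. `find_path_excluding` and the graph are hypothetical, and `prefer_exclude` only models the `G2_FIND_PATH_PREFER_EXCLUDE` flag as a plain boolean.

```python
from collections import deque


def find_path_excluding(relationships, start, end, max_degree, excluded, prefer_exclude=False):
    """Toy path search that avoids entities in `excluded`."""

    def search(blocked):
        # Breadth-first search limited to max_degree hops, skipping blocked entities.
        queue = deque([[start]])
        visited = {start}
        while queue:
            path = queue.popleft()
            if path[-1] == end:
                return path
            if len(path) - 1 == max_degree:
                continue
            for neighbor in relationships.get(path[-1], ()):
                if neighbor not in visited and neighbor not in blocked:
                    visited.add(neighbor)
                    queue.append(path + [neighbor])
        return None

    path = search(set(excluded))
    if path is None and prefer_exclude:
        path = search(set())  # Preferred exclude: allow excluded entities as a last resort.
    return path


# Made-up graph with a short route through 10 and a longer one through 20 and 21.
relationships = {2: [10, 20], 10: [2, 3], 20: [2, 21], 21: [20, 3], 3: [10, 21]}
print(find_path_excluding(relationships, 2, 3, 3, excluded={10}))  # -> [2, 20, 21, 3]
print(find_path_excluding(relationships, 2, 3, 2, excluded={10}))  # -> None
print(find_path_excluding(relationships, 2, 3, 2, excluded={10}, prefer_exclude=True))  # -> [2, 10, 3]
```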

## `findPathExcludingByEntityID()`

In [None]:
# Define search variables.

max_degree = 4
excluded_entities = {
    "ENTITIES": [{
        "ENTITY_ID": entity_id_2
    }]
}
flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS
excluded_string = json.dumps(excluded_entities)

# Find the path by entity ID.

response = bytearray([])
result = g2_engine.findPathExcludingByEntityID(entity_id_2, entity_id_3, max_degree, excluded_string, flags, response)

# Print the results.

response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## `findPathExcludingByRecordID()`

In [None]:
# Define search variables.

datasource_code_2 = "TEST"
record_id_2 = "2"
datasource_code_3 = "TEST"
record_id_3 = "3"
excluded_records = {
    "RECORDS": [{
        "RECORD_ID": "1",
        "DATA_SOURCE": "TEST"
    }]
}
excluded_string = json.dumps(excluded_records)

# Find the path by record ID.

response = bytearray([])
result = g2_engine.findPathExcludingByRecordID(datasource_code_2, record_id_2,
                                               datasource_code_3, record_id_3,
                                               max_degree, excluded_string, flags, response)

# Print the results.

response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## Finding Paths with Required Sources
The `findPathIncludingSourceByEntityID()` and `findPathIncludingSourceByRecordID()` functions can be used to find a single relationship path between two entities. In addition, at least one of the entities along the path must include a record from one of the specified data sources.

Entities can be searched for by either Entity ID or by Record ID, depending on which function is chosen. The required data source or sources are specified as a list in a JSON document.

Specific entities may also be excluded, using the same methodology as the `findPathExcludingByEntityID()` and `findPathExcludingByRecordID()` functions.

These functions have the following parameters:

- **entity_id_2:** The entity ID for the starting entity of the search path
- **entity_id_3:** The entity ID for the ending entity of the search path
- **datasource_code_2:** The data source for the starting entity of the search path
- **record_id_2:** The record ID for the starting entity of the search path
- **datasource_code_3:** The data source for the ending entity of the search path
- **record_id_3:** The record ID for the ending entity of the search path
- **max_degree:** The number of relationship degrees to search
- **excluded_entities:** Entities that should be avoided on the path (JSON document)
- **required_datasources:** Data sources, at least one of which must be represented by an entity on the path (JSON document)
- **flags:** Operational flags
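The exclusion and required-source arguments are JSON strings whose shapes appear in the cells of this notebook (`ENTITIES`, `RECORDS`, and `DATA_SOURCES`). Below is a small sketch of builder functions that produce those same shapes, which also makes it harder to serialize the wrong dictionary by mistake. The function names are hypothetical. Only the key names come from this notebook.

```python
import json


def excluded_entities_json(entity_ids):
    """Build an ENTITIES exclusion document from entity IDs (hypothetical helper)."""
    return json.dumps({"ENTITIES": [{"ENTITY_ID": entity_id} for entity_id in entity_ids]})


def excluded_records_json(records):
    """Build a RECORDS exclusion document from (data_source, record_id) pairs."""
    return json.dumps({
        "RECORDS": [
            {"DATA_SOURCE": data_source, "RECORD_ID": record_id}
            for data_source, record_id in records
        ]
    })


def required_sources_json(data_sources):
    """Build a DATA_SOURCES document listing the required data sources."""
    return json.dumps({"DATA_SOURCES": list(data_sources)})


print(excluded_entities_json([1]))
# -> {"ENTITIES": [{"ENTITY_ID": 1}]}
print(excluded_records_json([("TEST", "1")]))
# -> {"RECORDS": [{"DATA_SOURCE": "TEST", "RECORD_ID": "1"}]}
print(required_sources_json(["TEST"]))
# -> {"DATA_SOURCES": ["TEST"]}
```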

## `findPathIncludingSourceByEntityID()`

In [None]:
#  Find entity_id_1 again, as it may have changed.

response_string = bytearray()
result = g2_engine.getEntityByRecordID(datasource_code, record_id_1, response_string)

response_dictionary = json.loads(response_string)
entity_id_1 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

# Define search variables.

print("Entity IDs: {0} {1} {2}".format(entity_id_1, entity_id_2, entity_id_3))

max_degree = 4
excluded_entities = {
    "ENTITIES": [{
        "ENTITY_ID": entity_id_1
    }]
}
excluded_string = json.dumps(excluded_entities)

required_datasources = {
    "DATA_SOURCES": [
        "TEST"
    ]
}
required_string = json.dumps(required_datasources)

flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS

# Find the path by entity ID.

response = bytearray([])
result = g2_engine.findPathIncludingSourceByEntityID(entity_id_2, entity_id_3,
                                                     max_degree,
                                                     excluded_string, required_string,
                                                     flags, response)

# Print the results.

response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## `findPathIncludingSourceByRecordID()`

In [None]:
# Define search variables.

datasource_code_2 = "TEST"
record_id_2 = "2"
datasource_code_3 = "TEST"
record_id_3 = "3"
excluded_records = {
    "RECORDS": [{
        "RECORD_ID": "1",
        "DATA_SOURCE": "TEST"
    }]
}
excluded_string = json.dumps(excluded_records)

# Find the path by record ID.

response = bytearray([])
result = g2_engine.findPathIncludingSourceByRecordID(datasource_code_2, record_id_2,
                                                     datasource_code_3, record_id_3,
                                                     max_degree,
                                                     excluded_string, required_string,
                                                     flags, response)

# Print the results.

response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## Redo Processing
Redo records are automatically created by Senzing when it determines that more processing may be needed. Some examples:
* A value becomes generic, and previous decisions may need to be revisited
* Cleanup is needed after some record deletes
* Related entities were detected being changed at the same time
* A table inconsistency exists, potentially after a non-graceful shutdown

First, add four more records (record IDs 4 through 7). Like the earlier records, they share the same `SSN_NUMBER`, which may lead the engine to revisit earlier decisions and create redo records.

In [None]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Owens",
        "NAME_FIRST": "Lily"
    }],
    "SSN_NUMBER": "111-11-1111"
}
data_string = json.dumps(data)
result = g2_engine.replaceRecord("TEST", "4", data_string, None)
print(result)

data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Bauler",
        "NAME_FIRST": "August",
        "NAME_MIDDLE": "E"
    }],
    "SSN_NUMBER": "111-11-1111"
}
data_string = json.dumps(data)
result = g2_engine.replaceRecord("TEST", "5", data_string, None)
print(result)

data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Barcy",
        "NAME_FIRST": "Brian",
        "NAME_MIDDLE": "H"
    }],
    "SSN_NUMBER": "111-11-1111"
}
data_string = json.dumps(data)
result = g2_engine.replaceRecord("TEST", "6", data_string, None)
print(result)

data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Miller",
        "NAME_FIRST": "Jack",
        "NAME_MIDDLE": "H"
    }],
    "SSN_NUMBER": "111-11-1111"
}
data_string = json.dumps(data)
result = g2_engine.replaceRecord("TEST", "7", data_string, None)
print(result)

response_string = bytearray()
result = g2_engine.getEntityByRecordID("TEST", "4", response_string)
response_dictionary = json.loads(response_string)
entity_id_4 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

response_string = bytearray()
result = g2_engine.getEntityByRecordID("TEST", "5", response_string)
response_dictionary = json.loads(response_string)
entity_id_5 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

response_string = bytearray()
result = g2_engine.getEntityByRecordID("TEST", "6", response_string)
response_dictionary = json.loads(response_string)
entity_id_6 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

response_string = bytearray()
result = g2_engine.getEntityByRecordID("TEST", "7", response_string)
response_dictionary = json.loads(response_string)
entity_id_7 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

## Counting the number of redos
`countRedoRecords()` returns the number of redo records that are awaiting processing.

In [None]:
response = g2_engine.countRedoRecords()
print(response)

## Getting a redo record
`getRedoRecord()` retrieves the next redo record so that it can be processed; the cell below passes it to `process()`.

In [None]:
response_string = bytearray()
response = g2_engine.getRedoRecord(response_string)
print(response)
if response == 0 and response_string:
    g2_engine.process(response_string.decode())

## Processing redo records
`processRedoRecord()` processes the next redo record and writes it to the response buffer. If it returns 0 and `response_string` is empty, there are no more redo records to process, and calling `countRedoRecords()` again will return 0.
Processing a redo record can itself create more redo records in certain situations.

In [None]:
response_string = bytearray()
response = g2_engine.processRedoRecord(response_string)
print(response)
print(response_string.decode())
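To work through every pending redo record, the call above can be repeated until the buffer comes back empty. Below is a minimal sketch of that loop. It assumes, per the text above, that an empty response buffer means no redo records are left. It caps the number of iterations because processing a redo record can create new ones. `drain_redo_records` and `FakeEngine` are hypothetical, and the stub's redo payloads are made up.

```python
def drain_redo_records(engine, max_records=10000):
    """Process redo records until processRedoRecord() returns an empty buffer.

    Hypothetical helper; returns how many redo records were processed.
    """
    processed = 0
    while processed < max_records:
        response_string = bytearray()
        engine.processRedoRecord(response_string)
        if not response_string:
            break  # Empty buffer: nothing left to process.
        processed += 1
    return processed


class FakeEngine:
    """Hypothetical stand-in holding a queue of redo records."""

    def __init__(self, redo_records):
        self.redo_records = list(redo_records)

    def processRedoRecord(self, response_string):
        if self.redo_records:
            response_string.extend(self.redo_records.pop(0).encode("utf-8"))
        return 0


engine = FakeEngine(['{"REDO": 1}', '{"REDO": 2}', '{"REDO": 3}'])
print(drain_redo_records(engine))  # -> 3
```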

## Deleting Records
Use `deleteRecord()` to remove a record from the data repository; it returns "0" upon success. `deleteRecord()` can be called as many times as desired and from multiple threads at the same time. The `deleteRecord()` function accepts three parameters as input:

- **datasource_code:** The name of the data source the record is associated with. Data source codes are defined in the Senzing configuration
- **record_id_1:** The record ID, used to identify distinct records
- **load_id:** The observation load ID for the record; the value may be `None`, in which case it defaults to the data source code

In [None]:
datasource_code = 'TEST'
record_id_1 = '1'
load_id = None
result = g2_engine.deleteRecord(datasource_code, record_id_1, load_id)
print(result)

Attempt to get the record again. It should raise an error with a message similar to "Unknown record".

In [None]:
response_string = bytearray()

try:
    result = g2_engine.getRecord(datasource_code, record_id_1, response_string)

    response_dictionary = json.loads(response_string)
    response = json.dumps(response_dictionary, sort_keys=True, indent=4)
    print("Result: {0}\n{1}".format(result, response))
except G2Exception.G2ModuleGenericException as error:
    print("Expected error: {0}".format(error))

## Purge Repository
To purge the G2 repository, use the aptly named `purgeRepository()` method. This will remove every record in your current repository.

In [None]:
g2_engine.purgeRepository()