# Tutorial 2

# Work in progress

In the previous tutorial, a record was added using the default data source "TEST".  
A Senzing "data source" is an identifier that distinguishes where a record came 
from.  Using the default is fine for a quick tutorial.  In practice, however, a 
separate Senzing data source is usually defined for each originating database or 
other source of data, for example databases belonging to different companies.  
Senzing data sources need only be created once.
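
To make the idea concrete, here is a minimal sketch of how records from two originating systems might be tagged with a data source. The record contents, the `CUSTOMER` and `WATCHLIST` codes, and the `DATA_SOURCE` / `RECORD_ID` / `NAME_FULL` key names are assumptions made for illustration; nothing here calls Senzing.

```python
from collections import defaultdict

# Hypothetical records from two originating systems. The key names and
# values are illustrative assumptions, not output from Senzing.
records = [
    {"DATA_SOURCE": "CUSTOMER", "RECORD_ID": "1001", "NAME_FULL": "Pat Smith"},
    {"DATA_SOURCE": "CUSTOMER", "RECORD_ID": "1002", "NAME_FULL": "Lee Jones"},
    {"DATA_SOURCE": "WATCHLIST", "RECORD_ID": "1001", "NAME_FULL": "Patricia Smith"},
]

# Group record IDs by data source. In this sketch the same RECORD_ID can
# appear under two data sources without colliding, because the grouping
# key is the data source code.
ids_by_source = defaultdict(list)
for record in records:
    ids_by_source[record["DATA_SOURCE"]].append(record["RECORD_ID"])

for source, ids in sorted(ids_by_source.items()):
    print(source, ids)
```

The point of the sketch is only that the data source code travels with each record, so records from different systems stay distinguishable.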

In this tutorial, we'll look at two ways to add a Senzing data source.

1. Programmatically using Python.
1. Using a command line utility.

More information: 

* [G2Engine Reference](senzing-G2Engine-reference.ipynb) __FIXME__

## Prepare Environment

To start, we need a few Python modules.  The only unusual import here is 
`IPython.display`, which is used to format JSON output in JupyterLab notebooks.

In [1]:
import os
import sys
import json

from IPython.display import JSON

### Initialize variables

This next section creates and populates some boilerplate variables and a default 
configuration for the Senzing engine.  In a later tutorial, we'll take a deeper 
dive into the Senzing configuration.  

In [2]:
%run init-config.ipynb

Stored 'senzing_config_json' (str)
Default config already set
Stored 'config_id_bytearray' (bytearray)


In [3]:
%store -r senzing_config_json

In [4]:
JSON(json.loads(senzing_config_json))

<IPython.core.display.JSON object>
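
Outside of a notebook, where the `JSON` display helper is not available, a similar view can be produced with the standard `json` module. The following is a minimal sketch; `example_config_json` is a hand-written stand-in for `senzing_config_json`, and its keys, paths and connection string are placeholders rather than values from this tutorial's environment.

```python
import json

# Stand-in for senzing_config_json. The keys, paths and connection
# string are placeholders chosen for this sketch.
example_config_json = json.dumps({
    "PIPELINE": {
        "CONFIGPATH": "/path/to/config",
        "RESOURCEPATH": "/path/to/resources",
        "SUPPORTPATH": "/path/to/support",
    },
    "SQL": {"CONNECTION": "sqlite3://na:na@/path/to/G2C.db"},
})

# Parse the string, then re-serialise it with indentation for reading.
config = json.loads(example_config_json)
pretty = json.dumps(config, indent=2, sort_keys=True)
print(pretty)
```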

## G2Config

G2Config is one of the base APIs for talking to Senzing.  Similar to [Tutorial-1](tutorial-1.ipynb), we import G2Config and a Senzing-specific exception.  

For now, we'll catch every exception as the base `G2Exception` and simply 
print it.

In [5]:
from senzing import G2Config, G2Exception

### Initialize G2Config

Details at [G2Config Initialization](senzing-G2Config-reference.ipynb#G2Config-Initialization).

In [None]:
g2_config = G2Config()
try:
    g2_config.init(module_name, senzing_config_json, verbose_logging)

except G2Exception as err:
    print(err)

### Create configuration handle

Details at [G2Config.create](senzing-G2Config-reference.ipynb#create). __FIXME__

In [6]:
try:
    config_handle = g2_config.create()
    
except G2Exception as err:
    print(err)

senzing.G2Exception.G2MalformedJsonException: 30121E|JSON Parsing Failure [code=1,offset=0]
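
The `JSON Parsing Failure` above reports `offset=0`, which is consistent with input that is not JSON from its very first character, such as an empty string. The output does not show which input was rejected, so treat the following only as a sketch of one way to check a string before handing it to an API. The helper name `describe_json_problem` is hypothetical.

```python
import json


def describe_json_problem(text):
    """Return None if text parses as JSON, else a short description."""
    try:
        json.loads(text)
    except (TypeError, ValueError) as err:
        # json.JSONDecodeError is a subclass of ValueError; TypeError
        # covers inputs such as None.
        return f"{type(err).__name__}: {err}"
    return None


for candidate in ['{"SQL": {}}', "", "not json"]:
    problem = describe_json_problem(candidate)
    print(repr(candidate), "->", problem or "ok")
```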


### List DataSources

Call G2Config's `listDataSources()` method and pretty-print results.

Details at [G2Config.listDataSources](senzing-G2Config-reference.ipynb#listDataSources). __FIXME__

In [7]:
response_bytearray = bytearray()
try:
    g2_config.listDataSources(config_handle, response_bytearray)

except G2Exception as err:
    print(err)
    
JSON(json.loads(response_bytearray))

NameError: name 'config_handle' is not defined
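
The `NameError` follows from the previous cell: `create()` raised before `config_handle` was assigned, and the `except` block only printed the error. One way to make that dependency explicit is to set the name to `None` first and test it before use. This is a minimal sketch; `create_handle` is a hypothetical stand-in that always fails and is not a Senzing call.

```python
def create_handle():
    """Hypothetical stand-in for a call that fails, as create() did above."""
    raise RuntimeError("simulated failure")


config_handle = None  # defined up front so later cells can test it
try:
    config_handle = create_handle()
except RuntimeError as err:
    print(err)

# Later steps check the handle instead of assuming it exists.
if config_handle is None:
    print("No configuration handle; skipping the listing step.")
else:
    print("Handle available:", config_handle)
```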

### Add DataSource

Call G2Config's `addDataSource()` method.  The response is collected in `response_bytearray`.

Details at [G2Config.addDataSource](senzing-G2Config-reference.ipynb#addDataSource). __FIXME__

In [None]:
datasource = {
    "DSRC_CODE": "CUSTOMER"
}
datasource_code = json.dumps(datasource)

response_bytearray = bytearray()
try:
    g2_config.addDataSource(config_handle, datasource_code, response_bytearray)

except G2Exception as err:
    print(err)
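
When several data sources need to be added, the JSON argument can be built by a small helper. This is a sketch under stated assumptions: `make_datasource_json` is a hypothetical name, and upper-casing the code is a convention chosen here to match the upper-case codes used in this tutorial, not a documented requirement.

```python
import json


def make_datasource_json(code):
    """Build the JSON string for one data source code (hypothetical helper)."""
    cleaned = code.strip().upper()
    if not cleaned:
        raise ValueError("data source code must not be empty")
    # "DSRC_CODE" is the key used in the cell above.
    return json.dumps({"DSRC_CODE": cleaned})


print(make_datasource_json("customer"))
```

The resulting string has the same form as the `datasource_code` value used in the cell above.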

### List DataSources again

Call G2Config's `listDataSources()` method and pretty-print results.
Notice that the list now contains the newly added data source code "CUSTOMER".

Details at [G2Config.listDataSources](senzing-G2Config-reference.ipynb#listDataSources).

In [None]:
response_bytearray = bytearray()
try:
    g2_config.listDataSources(config_handle, response_bytearray)

except G2Exception as err:
    print(err)
    
JSON(json.loads(response_bytearray))
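
Rather than checking the list by eye, the response can also be inspected in code. The sketch below uses a hand-written sample in place of `response_bytearray`; the `DATA_SOURCES` / `DSRC_ID` / `DSRC_CODE` layout and the ID values are assumptions for illustration, so adjust the keys to whatever the real response contains.

```python
import json

# Hand-written sample standing in for response_bytearray. The layout
# and the ID values are assumptions made for this sketch.
sample_response = bytearray(
    b'{"DATA_SOURCES": [{"DSRC_ID": 1, "DSRC_CODE": "TEST"},'
    b' {"DSRC_ID": 2, "DSRC_CODE": "SEARCH"},'
    b' {"DSRC_ID": 1001, "DSRC_CODE": "CUSTOMER"}]}'
)


def datasource_codes(response):
    """Return the data source codes found in a JSON response."""
    parsed = json.loads(response)  # json.loads accepts a bytearray directly
    return [entry["DSRC_CODE"] for entry in parsed.get("DATA_SOURCES", [])]


codes = datasource_codes(sample_response)
print("CUSTOMER" in codes, codes)
```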

### Close configuration handle

Details at [G2Config.close](senzing-G2Config-reference.ipynb#close).

In [None]:
g2_config.close(config_handle)

## Add a data source using the command line tools

This part of the tutorial takes place outside of JupyterLab, on the command line.  

### Join the docker container

To join the running Docker container, execute the following in a terminal.

```console
docker exec -it jupyter /bin/bash
```

This assumes the container was named `jupyter` when started.  Once the command 
runs, the container's bash prompt should be available and look similar to:

```console
(base) jovyan@800dbaffa890:~$ 
```

The exact numbers and letters between the `@` and `:` will be different, but otherwise 
it should look very similar.

### Running the tools

Senzing comes with several command line tools.  The tool used to add a 
data source is `G2ConfigTool.py`.  Run it:

```console
(base) jovyan@800dbaffa890:~$ G2ConfigTool.py

Initializing Senzing engines...

Welcome to G2Config Tool. Type help or ? to list commands.

(g2cfg)
```

When the `g2cfg` prompt is available, type `help` or `?` to see a list of available 
commands.  Start by trying the `listDataSources` command.

```console
(g2cfg) listDataSources

{"id": 1, "dataSource": "TEST"}
{"id": 2, "dataSource": "SEARCH"}

(g2cfg)
```

Tip: tab completion is enabled on both the command line and the `g2cfg` prompt.

To add a data source, use the `addDataSource` command:

```console
(g2cfg) addDataSource

Missing argument(s), syntax:

	addDataSource {"dataSource": "<dataSource_name>"}
```

Notice that the tool gives helpful hints if the correct arguments are not provided.  
Provide the correct arguments and then use the `save` command to save the configuration 
changes to the database.

```console
(g2cfg) addDataSource {"dataSource": "CUSTOMERS"}

Successfully added!

(g2cfg) save

WARNING: This will immediately update the current configuration in the Senzing repository with the current configuration!

Are you certain you wish to proceed and save changes? (y/n)  y

...lots of omitted output...

(g2cfg) 
```

Again list the data sources:

```console
(g2cfg) listDataSources

{"id": 1, "dataSource": "TEST"}
{"id": 2, "dataSource": "SEARCH"}
{"id": 1000, "dataSource": "CUSTOMERS"}

(g2cfg)
```
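
If the listing is captured to a file or pasted into Python, each line can be read as a small JSON object. The following sketch parses the output shown above to confirm the new entry; it works on a copied string and does not invoke the tool.

```python
import json

# Output copied from the listDataSources step above.
listing = """
{"id": 1, "dataSource": "TEST"}
{"id": 2, "dataSource": "SEARCH"}
{"id": 1000, "dataSource": "CUSTOMERS"}
"""

# Each non-blank line is a JSON object, so parse them one at a time.
ids_by_name = {}
for line in listing.splitlines():
    line = line.strip()
    if line:
        entry = json.loads(line)
        ids_by_name[entry["dataSource"]] = entry["id"]

print(ids_by_name)
print("CUSTOMERS" in ids_by_name)
```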

The specified data source has been added to the database.  
Now exit the tool with the `exit` command.  If you've forgotten to save your 
configuration changes, `exit` will prompt you and you can save at that time.

```console
(g2cfg) exit

There are unsaved changes, would you like to save first? (y/n)
```
