
IBM InfoSphere Information Server Connectors

IBM InfoSphere Information Server is a commercially available data integration, quality and governance suite from IBM. It comprises multiple modules, and this repository contains Egeria connectors for some of those modules:

  • IBM InfoSphere Information Governance Catalog is the metadata repository module within the suite, commonly referred to simply as "IGC". While the most recent versions of the software provide their own connectivity to OMRS cohorts, an example implementation of such connectivity is also provided here, both as a reference and as an integration point for older versions of the software (from v11.5.0.1 onwards).

    Note that currently the implemented connector is read-only: it only implements those methods necessary to search, retrieve, and communicate metadata from IGC out into the cohort -- it does not currently implement the ability to update IGC based on events received from other members of the cohort.

    Furthermore, only a subset of the overall Open Metadata Types are currently implemented.

  • IBM InfoSphere DataStage is a high-performance ETL module within the suite, and is pre-integrated with IGC. The connector implemented for this module is a Data Engine Proxy, translating the creation and update of DataStage ETL routines (jobs and sequences) into the appropriate Egeria components to represent and participate in end-to-end data lineage.

How it works

The IBM IGC Repository Connector works through a combination of the following:

  • IBM IGC's REST API, itself abstracted through the IGC REST Client Library
  • IBM InfoSphere Information Server's embedded Apache Kafka event bus
    • specifically the InfosphereEvents topic (hence the need to enable events in the setup)
  • Some IGC extensions that implement specific additional functionality
  • Egeria's Metadata Repository Proxy Services

The IBM DataStage Data Engine Proxy Connector works through a combination of the following:

Getting started

Enable IGC's events

To start using the connector, you will need an IGC environment running either version 11.5 or 11.7 of the software. (The connector automatically detects which version is in use as part of its initialization.) You will first need to enable event notification in your IGC environment:

  1. Navigate to "Administration".
  2. Navigate to "Event Notification" within the "Setup" heading.
  3. Toggle "Enable" and then click "Save and Close".

There should not be any need to restart the environment after enabling the event notification.

Build connector and copy to OMAG Server Platform

After building the connector project (mvn clean install) the connector is available as:

distribution/target/egeria-connector-ibm-information-server-package-VERSION.jar

Simply copy this file to a location where it can be run alongside the OMAG Server Platform from the Egeria core (in the example below, the file would be copied to /lib/egeria-connector-ibm-information-server-package-VERSION.jar).
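
The copy step can be scripted. The following is a minimal sketch, assuming the project has already been built with mvn clean install; the function name copy_connector_jar and both of its arguments are hypothetical and not part of the connector or of Egeria.

```python
import glob
import shutil
from pathlib import Path


def copy_connector_jar(project_root: str, lib_dir: str) -> Path:
    """Copy the built connector package jar into the platform's library directory.

    Hypothetical helper: project_root is the checkout of this repository and
    lib_dir is the directory later passed to the platform as loader.path.
    """
    pattern = str(
        Path(project_root)
        / "distribution"
        / "target"
        / "egeria-connector-ibm-information-server-package-*.jar"
    )
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(
            f"No connector jar matches {pattern}; run 'mvn clean install' first"
        )
    target_dir = Path(lib_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # If several versions have been built, take the last one in sorted order.
    return Path(shutil.copy2(matches[-1], target_dir))
```

For the example used in this README, the call would be copy_connector_jar(".", "/lib") from the root of the repository.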

Configure security

There are multiple options for configuring the security of your environment for this connector, but whichever you choose must be in place before you start the connector itself (see the steps below).

If you simply want to test things out, and are not concerned about security, the simplest (but most insecure) option is to set the environment variable STRICT_SSL to false using something like the following prior to starting up the OMAG Server Platform:

export STRICT_SSL=false

Note that this will disable all certificate validation for SSL connections made between Egeria and your IGC environment, and is therefore inherently insecure.

Start up the OMAG Server Platform

You can start up the OMAG Server Platform with this connector ready to be configured by running:

$ java -Dloader.path=/lib -jar server-chassis-spring-VERSION.jar

(This command will start up the OMAG Server Platform, including all libraries in the /lib directory as part of the classpath of the OMAG Server Platform.)
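
If you launch the platform from a script, the security setting and the launch command can be prepared together. The following is a minimal sketch rather than an official launcher: the function platform_launch and its parameters are hypothetical, and only the java command line and the STRICT_SSL variable come from the instructions above.

```python
import os
from typing import Dict, List, Optional, Tuple


def platform_launch(
    chassis_jar: str,
    lib_dir: str = "/lib",
    strict_ssl: bool = True,
    base_env: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Build the command and environment for starting the OMAG Server Platform.

    Hypothetical helper: nothing is executed here, the caller decides how to
    run the returned command.
    """
    # Start from the current environment unless the caller supplies one.
    env = dict(os.environ if base_env is None else base_env)
    if not strict_ssl:
        # Insecure testing option described above: disables certificate validation.
        env["STRICT_SSL"] = "false"
    # When strict_ssl is True, any inherited STRICT_SSL value is left untouched.
    command = ["java", f"-Dloader.path={lib_dir}", "-jar", chassis_jar]
    return command, env
```

The result can be passed to subprocess.run(command, env=env). Because the variable is placed in the environment of the java process itself, it is in effect before the connector is started.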

Configure the IGC connector

You will need to configure the OMAG Server Platform as follows (order is important) to make use of the IGC connector. For example payloads and endpoints, see the Postman samples.

  1. Configure your event bus for Egeria, by POSTing a payload like the following:

    {
        "producer": {
            "bootstrap.servers":"localhost:9092"
        },
        "consumer": {
            "bootstrap.servers":"localhost:9092"
        }
    }

    to:

    POST http://localhost:8080/open-metadata/admin-services/users/{{user}}/servers/{{server}}/event-bus?connectorProvider=org.odpi.openmetadata.adapters.eventbus.topic.kafka.KafkaOpenMetadataTopicProvider&topicURLRoot=OMRSTopic
    
  2. Configure the cohort, by POSTing something like the following:

    POST http://localhost:8080/open-metadata/admin-services/users/{{user}}/servers/{{server}}/cohorts/cocoCohort
    
  3. Configure the IGC connector, by POSTing a payload like the following:

    {
        "ibm.igc.services.host": "{{igc_host}}",
        "ibm.igc.services.port": "{{igc_port}}",
        "ibm.igc.username": "{{igc_user}}",
        "ibm.igc.password": "{{igc_password}}"
    }

    to:

    POST http://localhost:8080/open-metadata/admin-services/users/{{user}}/servers/{{server}}/local-repository/mode/repository-proxy/details?connectorProvider=org.odpi.egeria.connectors.ibm.igc.repositoryconnector.IGCOMRSRepositoryConnectorProvider
    

    The payload should include the hostname and port of your IGC environment's domain (services) tier, and a username and password through which the REST API can be accessed.

    Note that you also need to provide the connectorProvider parameter, set to the name of the IGC connectorProvider class (value as given above).

  4. Configure the event mapper for IGC, by POSTing something like the following:

    POST http://localhost:8080/open-metadata/admin-services/users/{{user}}/servers/{{server}}/local-repository/event-mapper-details?connectorProvider=org.odpi.egeria.connectors.ibm.igc.eventmapper.IGCOMRSRepositoryEventMapperProvider&eventSource=my.igc.services.host.com:59092
    

    The hostname provided at the end should be the host on which your IGC-embedded Kafka bus is running, and should include the appropriate port number for connecting to that bus. (For v11.5 this is your domain (services) tier and port 59092, whereas in the latest versions of 11.7 it may be running on your Unified Governance / Enterprise Search tier, on port 9092.)

  5. The connector and event mapper should now be configured, and you should now be able to start the instance by POSTing something like the following:

    POST http://localhost:8080/open-metadata/admin-services/users/{{user}}/servers/{{server}}/instance
    

After following these instructions, your IGC instance will be participating in the Egeria cohort. For those objects supported by the connector, any new instances or updates to existing instances should result in that metadata automatically being communicated out to the rest of the cohort.
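
The five requests above can be assembled in order by a small script. The following is a sketch under stated assumptions: the Step type and the functions igc_config_steps and post are hypothetical names, while the endpoint paths, provider class names and payload keys are copied from the steps above. The Postman samples remain the reference for exact payloads.

```python
import json
import urllib.request
from typing import List, NamedTuple, Optional
from urllib.parse import urlencode

KAFKA_TOPIC_PROVIDER = "org.odpi.openmetadata.adapters.eventbus.topic.kafka.KafkaOpenMetadataTopicProvider"
IGC_CONNECTOR_PROVIDER = "org.odpi.egeria.connectors.ibm.igc.repositoryconnector.IGCOMRSRepositoryConnectorProvider"
IGC_EVENT_MAPPER_PROVIDER = "org.odpi.egeria.connectors.ibm.igc.eventmapper.IGCOMRSRepositoryEventMapperProvider"


class Step(NamedTuple):
    url: str
    body: Optional[str]  # JSON payload, or None when the request has no body


def igc_config_steps(base_url: str, user: str, server: str, cohort: str,
                     kafka: str, igc_host: str, igc_port: str,
                     igc_user: str, igc_password: str,
                     event_source: str) -> List[Step]:
    """Return the IGC configuration requests in the order they must be sent."""
    root = f"{base_url}/open-metadata/admin-services/users/{user}/servers/{server}"
    event_bus = {
        "producer": {"bootstrap.servers": kafka},
        "consumer": {"bootstrap.servers": kafka},
    }
    igc = {
        "ibm.igc.services.host": igc_host,
        "ibm.igc.services.port": igc_port,
        "ibm.igc.username": igc_user,
        "ibm.igc.password": igc_password,
    }

    def query(params: dict) -> str:
        # Keep ':' unescaped so host:port values read as in the examples above.
        return urlencode(params, safe=":")

    return [
        Step(f"{root}/event-bus?" + query({"connectorProvider": KAFKA_TOPIC_PROVIDER,
                                           "topicURLRoot": "OMRSTopic"}),
             json.dumps(event_bus)),
        Step(f"{root}/cohorts/{cohort}", None),
        Step(f"{root}/local-repository/mode/repository-proxy/details?"
             + query({"connectorProvider": IGC_CONNECTOR_PROVIDER}),
             json.dumps(igc)),
        Step(f"{root}/local-repository/event-mapper-details?"
             + query({"connectorProvider": IGC_EVENT_MAPPER_PROVIDER,
                      "eventSource": event_source}),
             None),
        Step(f"{root}/instance", None),
    ]


def post(step: Step) -> int:
    """Send one step as an HTTP POST and return the response status code."""
    data = step.body.encode("utf-8") if step.body is not None else None
    request = urllib.request.Request(
        step.url, data=data, method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling post on each element of the returned list, in order, mirrors steps 1 to 5. The event_source argument is passed in rather than derived from the IGC version, because the tier and port of the embedded Kafka bus differ between environments as noted in step 4.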

Configure the DataStage connector

You will need to configure the OMAG Server Platform as follows (order is important) to make use of the DataStage connector. For example payloads and endpoints, see the Postman samples.

  1. Configure a local Egeria metadata repository for the access services, by POSTing something like the following (to use the graph repository):

    POST http://localhost:8080/open-metadata/admin-services/users/{{user}}/servers/{{omas_server}}/local-repository/mode/local-graph-repository
    
  2. Configure your event bus for the access services, by POSTing a payload like the following:

    {
        "producer": {
            "bootstrap.servers":"localhost:9092"
        },
        "consumer": {
            "bootstrap.servers":"localhost:9092"
        }
    }

    to:

    POST http://localhost:8080/open-metadata/admin-services/users/{{user}}/servers/{{omas_server}}/event-bus?connectorProvider=org.odpi.openmetadata.adapters.eventbus.topic.kafka.KafkaOpenMetadataTopicProvider&topicURLRoot=OMRSTopic
    
  3. Enable the access services by POSTing something like the following:

    POST http://localhost:8080/open-metadata/admin-services/users/{{user}}/servers/{{omas_server}}/access-services?serviceMode=ENABLED
    
  4. The access services should now be configured, and you should now be able to start them by POSTing something like the following:

    POST http://localhost:8080/open-metadata/admin-services/users/{{user}}/servers/{{omas_server}}/instance
    
  5. Configure a local metadata repository for the DataStage connector, by POSTing something like the following (to use the in-memory repository):

    POST http://localhost:8080/open-metadata/admin-services/users/{{user}}/servers/{{ds_server}}/local-repository/mode/in-memory-repository
    
  6. Configure the DataStage connector, by POSTing a payload like the following:

    {
        "class": "DataEngineProxyConfig",
        "accessServiceRootURL": "http://localhost:8080",
        "accessServiceServerName": "omas",
        "dataEngineProxyProvider": "org.odpi.egeria.connectors.ibm.datastage.dataengineconnector.DataStageConnectorProvider",
        "pollForChanges": true,
        "pollIntervalInSeconds": 60,
        "dataEngineConfig": {
            "ibm.igc.services.host": "{{igc_host}}",
            "ibm.igc.services.port": "{{igc_port}}",
            "ibm.igc.username": "{{igc_user}}",
            "ibm.igc.password": "{{igc_password}}"
        }
    }

    to:

    POST http://localhost:8080/open-metadata/admin-services/users/{{user}}/servers/{{ds_server}}/data-engine-proxy-service/configuration
    

    The payload should include the hostname and port of your IGC environment's domain (services) tier, and a username and password through which the REST API can be accessed.

    Note that you also need to provide the dataEngineProxyProvider property in the payload, set to the name of the DataStage connector provider class (value as given above).

    Finally, note that the payload specifies that the connector should poll for changes at a particular interval. This is because changes to DataStage routines within DataStage do not trigger events into IGC's embedded Kafka topic (at least for older versions of Information Server), so the connector must poll periodically to detect them. You can modify the interval if you want the connector to wait more or less time between each check for changes.

  7. The connector should now be configured, and you should now be able to start the instance by POSTing something like the following:

    POST http://localhost:8080/open-metadata/admin-services/users/{{user}}/servers/{{ds_server}}/instance
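
The seven requests above target two different servers, so it can help to generate them in order. The following is a sketch under stated assumptions: the function names proxy_config and datastage_config_requests are hypothetical, and the sketch assumes that accessServiceServerName should match the server on which the access services were configured in steps 1 to 4. Endpoint paths, provider class names and payload keys are copied from the steps above.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode

KAFKA_TOPIC_PROVIDER = "org.odpi.openmetadata.adapters.eventbus.topic.kafka.KafkaOpenMetadataTopicProvider"
DATASTAGE_PROVIDER = "org.odpi.egeria.connectors.ibm.datastage.dataengineconnector.DataStageConnectorProvider"

# (URL to POST to, JSON-serializable payload or None when there is no body)
Request = Tuple[str, Optional[Dict]]


def proxy_config(base_url: str, omas_server: str, igc: Dict[str, str],
                 poll_interval_seconds: int = 60) -> Dict:
    """Build the Data Engine Proxy payload shown in step 6."""
    return {
        "class": "DataEngineProxyConfig",
        "accessServiceRootURL": base_url,
        # Assumption: this names the server that hosts the access services.
        "accessServiceServerName": omas_server,
        "dataEngineProxyProvider": DATASTAGE_PROVIDER,
        "pollForChanges": True,
        "pollIntervalInSeconds": poll_interval_seconds,
        "dataEngineConfig": igc,
    }


def datastage_config_requests(base_url: str, user: str, omas_server: str,
                              ds_server: str, kafka: str, igc: Dict[str, str],
                              poll_interval_seconds: int = 60) -> List[Request]:
    """Return the DataStage configuration requests in the order they must be sent."""
    admin = f"{base_url}/open-metadata/admin-services/users/{user}/servers"
    omas = f"{admin}/{omas_server}"
    ds = f"{admin}/{ds_server}"
    bus_query = urlencode({"connectorProvider": KAFKA_TOPIC_PROVIDER,
                           "topicURLRoot": "OMRSTopic"})
    bus = {
        "producer": {"bootstrap.servers": kafka},
        "consumer": {"bootstrap.servers": kafka},
    }
    return [
        # Steps 1 to 4: the server hosting the access services.
        (f"{omas}/local-repository/mode/local-graph-repository", None),
        (f"{omas}/event-bus?{bus_query}", bus),
        (f"{omas}/access-services?serviceMode=ENABLED", None),
        (f"{omas}/instance", None),
        # Steps 5 to 7: the server hosting the DataStage connector.
        (f"{ds}/local-repository/mode/in-memory-repository", None),
        (f"{ds}/data-engine-proxy-service/configuration",
         proxy_config(base_url, omas_server, igc, poll_interval_seconds)),
        (f"{ds}/instance", None),
    ]
```

Each pair can be sent with any HTTP client as a POST, serializing the payload with json.dumps when it is not None. Keeping the access services requests ahead of the connector requests preserves the ordering that the instructions call out as important.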
    

After following these instructions, your DataStage environment will be polled for any changes to DataStage jobs and sequences, including the creation of new ones. For those objects supported by the connector, any new instances or updates to existing instances should result in that metadata automatically being communicated to the Data Engine OMAS within the number of seconds specified by pollIntervalInSeconds (though be aware that a large number of changes may take some time to synchronize).
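
To illustrate why the polling interval bounds how quickly a change becomes visible, here is a generic interval-polling loop. It is not the connector's actual implementation: the callbacks fetch_changed_since and publish, and the parameters max_cycles, clock and sleep, are all hypothetical and exist only for this sketch.

```python
import time
from typing import Callable, Iterable, Optional


def poll_for_changes(
    fetch_changed_since: Callable[[float], Iterable[str]],
    publish: Callable[[str], None],
    poll_interval_seconds: float = 60,
    max_cycles: Optional[int] = None,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Repeatedly ask for jobs changed since the last check and publish each one.

    fetch_changed_since stands in for a query against IGC, and publish stands
    in for sending the job to the Data Engine OMAS.
    """
    last_sync = 0.0  # the first cycle picks up everything
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        # Record the start time first so changes made during this cycle
        # are picked up by the next one.
        cycle_started = clock()
        for job in fetch_changed_since(last_sync):
            publish(job)
        last_sync = cycle_started
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            sleep(poll_interval_seconds)
```

In a loop of this shape, a change made just after a check is only seen at the next check, and a cycle that has many jobs to publish delays the start of the next wait.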

Loading samples

If you have a completely empty IGC environment, you may want to load some sample metadata to explore further.

Samples are provided under egeria/open-metadata-resources/open-metadata-deployment/sample-data/.

For example, there you will find a Coco Pharmaceuticals set of samples. These samples are provided as a set of content that can be automatically loaded to IGC using Ansible and a number of publicly available Ansible roles. (See instructions via the link itself.)

Assuming you have first set up and configured your IGC environment as part of an Egeria cohort, loading the samples will generate numerous events out to the rest of the cohort for all of the different object types, relationships and classifications covered by the samples: GlossaryTerms, RelationalTables, SemanticAssignments, Confidentiality, and so on.


License: CC BY 4.0, Copyright Contributors to the ODPi Egeria project.
