Iglu client configuration

Oguzhan Unlu edited this page Nov 13, 2017 · 13 revisions


All Iglu clients are configured using a standard JSON file, which is itself a self-describing JSON. You can find its JSON Schema here:

http://iglucentral.com/schemas/com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1
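A self-describing JSON wraps its payload in a `data` field and names its own JSON Schema in a `schema` field, using an `iglu:` URI of the form `iglu:vendor/name/format/version`. The minimal Python sketch below splits a resolver configuration into those parts. It is not part of any Iglu client, and the function names are hypothetical.

```python
import json

def parse_iglu_uri(uri):
    """Split an iglu: URI into (vendor, name, format, version)."""
    prefix = "iglu:"
    if not uri.startswith(prefix):
        raise ValueError("not an iglu: URI: " + uri)
    parts = uri[len(prefix):].split("/")
    if len(parts) != 4:
        raise ValueError("expected vendor/name/format/version: " + uri)
    vendor, name, fmt, version = parts
    return vendor, name, fmt, version

def unwrap_self_describing(text):
    """Return (schema key tuple, data payload) from a self-describing JSON string."""
    doc = json.loads(text)
    return parse_iglu_uri(doc["schema"]), doc["data"]

config_text = """
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {"cacheSize": 500, "repositories": []}
}
"""
key, data = unwrap_self_describing(config_text)
print(key)                # the schema this configuration declares itself against
print(data["cacheSize"])  # the actual configuration lives under "data"
```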

Standard configuration

The standard configuration for a "vanilla" Iglu client which only uses Iglu Central looks like this:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      }
    ]
  }
}

A few notes:

  • cacheSize determines how many individual schemas the Iglu client keeps in its cache, saving repeated lookups
  • repositories is a JSON array of repositories to look up schemas in
  • name is a human-readable label for the repository, and connection tells the client how to reach it (here, over HTTP at the given uri)
  • priority and vendorPrefixes help the resolver decide which repository to check first for a given schema. For details, see Iglu's repository resolution algorithm.
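To illustrate what `cacheSize` bounds, here is a minimal Python sketch of a schema cache that holds at most `cacheSize` entries. It assumes least-recently-used eviction and a hypothetical `fetch` callback standing in for a repository lookup; the eviction policy of a given Iglu client may differ.

```python
from collections import OrderedDict

class SchemaCache:
    """Keep at most `cache_size` schemas, evicting the least recently used (assumed policy)."""

    def __init__(self, cache_size, fetch):
        self.cache_size = cache_size
        self.fetch = fetch            # hypothetical: looks a schema up in a repository
        self.entries = OrderedDict()  # schema URI -> schema, oldest first
        self.lookups = 0              # number of calls to fetch (cache misses)

    def get(self, schema_uri):
        if schema_uri in self.entries:
            self.entries.move_to_end(schema_uri)  # mark as recently used
            return self.entries[schema_uri]
        self.lookups += 1
        schema = self.fetch(schema_uri)
        self.entries[schema_uri] = schema
        if len(self.entries) > self.cache_size:
            self.entries.popitem(last=False)      # evict the least recently used entry
        return schema

cache = SchemaCache(2, fetch=lambda uri: {"self": uri})
cache.get("iglu:com.acme/a/jsonschema/1-0-0")
cache.get("iglu:com.acme/a/jsonschema/1-0-0")  # served from the cache, no lookup
cache.get("iglu:com.acme/b/jsonschema/1-0-0")
cache.get("iglu:com.acme/c/jsonschema/1-0-0")  # cache is full, so .../a/... is evicted
print(cache.lookups)
print(list(cache.entries))
```

A larger `cacheSize` means fewer repeated lookups when many distinct schemas are in use, at the cost of memory.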

In many cases this standard configuration will be sufficient for your application. For example, you will find this Iglu configuration embedded in the Snowplow EmrEtlRunner's configuration file.

However, if you have set up your own Iglu repository, you will need to update your Iglu client's configuration so that it knows about the new repository. Read on for two examples:

Example 1: two remote repositories

In this example we add your new remote repository to the configuration file, alongside Iglu Central. We keep Iglu Central in the configuration because we still want to look up some of the public JSON Schemas available from Iglu Central.

This is a very typical configuration for someone using Iglu with Snowplow and their own data schemas.

Check out the JSON:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 1000,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      },
      {
        "name": "Acme Iglu Repo (HTTP)",
        "priority": 5,
        "vendorPrefixes": [ "com.acme" ],
        "connection": {
          "http": {
            "uri": "http://iglu.acme.com"
          }
        }
      }
    ]
  }
}

Some notes on this:

  • We have doubled the cache size because we expect many more schemas to be fetched from Acme's remote repository
  • The order in which the repositories are listed does not matter
  • The vendorPrefixes and priority have been tweaked to make repository lookups as efficient as possible; check out Iglu's Schema resolution algorithm for details. Note that a lower priority number ranks the repository higher: for example, a repository with priority 0 supersedes one with priority 1.
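The ordering described in these notes can be sketched in Python as a sort. This is a simplified illustration, assuming that repositories whose vendorPrefixes match the start of the schema's vendor are tried first, and that ties are broken by the lower priority number. Iglu's Schema resolution algorithm page remains the authoritative description, and `resolution_order` is a hypothetical name.

```python
def resolution_order(vendor, repositories):
    """Return repository names in the order they would be tried for `vendor` (simplified)."""
    def sort_key(repo):
        matched = any(vendor.startswith(p) for p in repo.get("vendorPrefixes", []))
        # False sorts before True, so matching repositories come first;
        # within each group, a lower priority number is tried earlier.
        return (not matched, repo["priority"])
    return [repo["name"] for repo in sorted(repositories, key=sort_key)]

repositories = [
    {"name": "Iglu Central", "priority": 0, "vendorPrefixes": ["com.snowplowanalytics"]},
    {"name": "Acme Iglu Repo (HTTP)", "priority": 5, "vendorPrefixes": ["com.acme"]},
]

print(resolution_order("com.acme", repositories))
print(resolution_order("com.snowplowanalytics.snowplow", repositories))
print(resolution_order("com.example", repositories))  # no prefix matches: priority decides
```

Because the result depends only on the sort key, reversing the `repositories` list yields the same order as long as the priorities differ, which is consistent with the note that listing order does not matter.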

Example 2: just one embedded repository

In this example we have removed Iglu Central and added a single embedded repository. Embedded means that the JSON Schema files are bundled with the application itself, alongside the Iglu client.

Check out the JSON:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 10,
    "repositories": [
      {
        "name": "Acme Iglu Repo (Embedded)",
        "priority": 1,
        "vendorPrefixes": [ "com.acme.bootstrap" ],
        "connection": {
          "embedded": {
            "path": "/iglu-embed-path"
          }
        }
      }
    ]
  }
}

Some notes on this:

  • We know that we will be embedding only 10 schemas in the repository, so we reduce the cacheSize to 10
  • We have removed Iglu Central entirely, so we will only search for schemas in the embedded Acme Iglu Repo
  • The embedded Acme Iglu Repo stores schemas in its host application's /iglu-embed-path resource path. See setting up a JVM embedded repo for an example.
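An embedded repository is read from the host application's resources rather than over HTTP. The Python sketch below shows how a schema key might map to a file under /iglu-embed-path. It assumes the same `schemas/<vendor>/<name>/<format>/<version>` layout that Iglu Central's schema URLs use, and it substitutes a local directory for JVM classpath resources. The function names and the `config` schema name are hypothetical.

```python
import json
import os
import tempfile

def embedded_schema_path(root, repo_path, vendor, name, fmt, version):
    """Build the file path of a schema inside an embedded repository.

    `root` stands in for the application's resource root (an assumption:
    on the JVM these files would be classpath resources instead).
    """
    relative = repo_path.lstrip("/")
    return os.path.join(root, relative, "schemas", vendor, name, fmt, version)

def load_embedded_schema(root, repo_path, vendor, name, fmt, version):
    """Read and parse one schema file from the embedded repository."""
    path = embedded_schema_path(root, repo_path, vendor, name, fmt, version)
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Build a throwaway resource tree containing one schema, then read it back.
with tempfile.TemporaryDirectory() as root:
    path = embedded_schema_path(root, "/iglu-embed-path",
                                "com.acme.bootstrap", "config", "jsonschema", "1-0-0")
    os.makedirs(os.path.dirname(path))
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"type": "object"}, f)
    schema = load_embedded_schema(root, "/iglu-embed-path",
                                  "com.acme.bootstrap", "config", "jsonschema", "1-0-0")
    print(schema)
```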