Releases: snowplow/enrich

5.0.0

14 Jun 08:47

This release adds the ability to emit failed events to a third stream, in the exact same format as enriched events (TSV). For each error that occurred, a failure entity is added to the derived_contexts field.
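
For illustration, the failure entity attached to a failed event is a self-describing JSON along these lines (a trimmed sketch; the schema URI and field names are best-guess assumptions, see the docs for the authoritative format):

{
    "schema": "iglu:com.snowplowanalytics.snowplow/failure/jsonschema/1-0-0",
    "data": {
        "failureType": "ValidationError",
        "errors": [ ... ],
        "timestamp": "2024-06-14T08:47:00Z",
        "componentName": "enrich",
        "componentVersion": "5.0.0"
    }
}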

CHANGELOG

  • Add possibility to emit failed events in TSV format into a third stream (#872)

4.2.1

04 Jun 12:27
  • Replace mysql-connector with mariadb-client
  • Upgrade fs2-kafka to 3.5.1
  • Update schema for com.mandrill/message_opened/jsonschema/1-0-3 (#893)
  • Bump sbt-snowplow-release to 0.3.2 (#892)
  • Don't pass field value to ValidatorReport if validation fails (#892)

Full Changelog: 4.2.0...4.2.1

4.2.0

08 Apr 10:31

This release brings a few changes to some of the bad rows emitted by Enrich.

What's new

Switch from EnrichmentFailures to SchemaViolations for some errors

The following errors were previously emitted as EnrichmentFailures bad rows and are now emitted as SchemaViolations (against the atomic schema); a sketch of such a bad row follows the list:

  • When the context added by an enrichment is invalid.
  • When something goes wrong while the input fields of the HTTP request are mapped to the fields of the enriched event (e.g. when tr_tt is converted from string to number and mapped to tr_total).
  • When an atomic field is longer than the limit.
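
For reference, a SchemaViolations bad row is a self-describing JSON shaped roughly as follows (a heavily trimmed sketch; the exact schema version and payload contents will vary):

{
    "schema": "iglu:com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema/2-0-0",
    "data": {
        "processor": { "artifact": "enrich", "version": "4.2.0" },
        "failure": {
            "timestamp": "2024-04-08T10:31:00Z",
            "messages": [ ... ]
        },
        "payload": { ... }
    }
}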

More errors wrapped inside the same bad row

Before 4.2.0, if there was any error in the mapping of the atomic fields, a bad row would get emitted right away and we would not try to validate the entities and the unstructured event. All these errors are now wrapped inside a single SchemaViolations bad row.

Likewise, before 4.2.0, when an enrichment context was invalid, we emitted a bad row right away without checking the lengths of the atomic fields. Now all these errors are wrapped inside a single SchemaViolations bad row.

In short, 4.2.0 is more exhaustive in the errors that get wrapped inside a bad row.

Upgrading to 4.2.0

When upgrading from 4.0.0 or 4.1.0, you only need to bump the version.

Check out the Enrich documentation for the full guide on running and configuring the app.

CHANGELOG

  • Switch bad row type from EnrichmentFailures to SchemaViolations for some errors (#883)
  • Bump libs (#888)

4.1.0

22 Feb 14:56

What's new

Cross Navigation Enrichment

In this version, we introduce a new enrichment: the Cross Navigation Enrichment. It parses the extended cross-navigation format in the _sp querystring parameter and attaches the cross_navigation context to the event.

The _sp parameter can be attached by our Web (see cross-domain tracking) and mobile trackers and contains user, session and app identifiers (e.g. domain user and session IDs, business user ID, source app ID). The information to include in the parameter is configurable in the trackers. This is useful for tracking the movement of users across different apps and platforms.

The extended cross navigation format can be described by _sp={domainUserId}.{timestamp}.{sessionId}.{subjectUserId}.{sourceId}.{platform}.{reason}
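
As a purely hypothetical illustration (all identifiers are made up, and any encoding applied to individual fields is ignored here), a decorated link could look like:

https://shop.acme.com/checkout?_sp=7d75c6b3-90f6-4a2a-a8e4-2b35d4b0e5b1.1684425282000.c7a3f1e2-9a4b-4d2e-8b1a-3f5c6d7e8f90.user-42.shop-website.web.navigation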

More information about this enrichment can be found here.

Multiple JS enrichments support

Starting with this version, it is possible to have multiple JS enrichments. This makes it easy to implement new enrichments in JavaScript and add them to Enrich. Currently, the order in which they run is not defined.

Passing an object of parameters to the JS enrichment

As mentioned above, we added support for multiple JS enrichments. This simplifies implementing custom enrichments in JavaScript and adding them to Enrich. However, most enrichments take parameters. To avoid having to change the JavaScript code (and re-encode it in Base64) every time a parameter changes, we've added the capability to pass these parameters in the enrichment configuration.

For example (where the value of script would be your Base64-encoded JavaScript):

{
    "schema": "iglu:com.snowplowanalytics.snowplow/javascript_script_config/jsonschema/1-0-1",
    "data": {
        "vendor": "com.snowplowanalytics.snowplow",
        "name": "javascript_script_config",
        "enabled": true,
        "parameters": {
            "script": "script",
            "config": {
                "foo": 3,
                "nested": {
                    "bar": "test"
                }
            }
        }
    }
}

The parameter object can be accessed in the JavaScript enrichment code via the second parameter of the process function, for example:

function process(event, params) {
  // params is the object under the "config" key of the enrichment
  // configuration, so params.nested.bar resolves to "test" here
  event.setApp_id(params.nested.bar);
  return [];
}

Authentication with Azure Event Hubs using OAuth2 in Enrich Kafka

Enrich Kafka can now authenticate with Azure Event Hubs using OAuth2. To use this authentication method, you don't have to pass anything extra in the config: it is enough to remove the security.protocol, sasl.mechanism and sasl.jaas.config properties from the consumerConf and producerConf sections. The application sets the necessary properties to the required values by default.
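
For illustration, a trimmed input section could then look like this (a sketch only; the topic and namespace values are made up):

"input": {
    "topicName": "collector-payloads"
    "bootstrapServers": "mynamespace.servicebus.windows.net:9093"
    # No security.protocol, sasl.mechanism or sasl.jaas.config here:
    # with OAuth2 against Event Hubs, Enrich sets these itself
    "consumerConf": {
        "auto.offset.reset": "latest"
    }
}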

Stopping publishing jar files

Starting with 4.1.0, we no longer publish jar files of the applications. If you are still using jar files to run the application, we recommend switching to Docker. You can find the Docker running instructions on the docs page of the respective component.

Changelog

  • enrich-kafka: authenticate with Event Hubs using OAuth2 (#863)
  • Add Cross Navigation Enrichment (#855)
  • Allow multiple javascript enrichments (#868)
  • Add the message delayed event to Mandrill adapter and update schema versions (#815)
  • Stop publishing fat jars (#862)
  • Allow passing an object of parameters to the JS enrichment (#871)
  • Make cookie extractor enrichment case insensitive (#877)
  • Add tracking scenario ID in observed_event if defined (#807)
  • Rename tracking_scenario to event_specification (#879)
  • Bump nimbus-jose-jwt to 9.37.2 (#880)
  • Bump postgres driver to 42.7.2 (#880)

4.0.1

12 Feb 08:48

Patch release bringing a fix for the region provider chain in the S3 client.

See more details in #866

4.0.0

29 Jan 15:17

What's new

Atomic fields lengths configurable

Several atomic fields, such as mkt_clickid, have length limits defined (in this case, 128 characters). Recent versions of Enrich enforce these limits, so that oversized data does not break loading into the warehouse columns. However, over time we’ve observed that valid data does not always fit these limits. For example, TikTok click ids can be up to 500 (or 1000, according to some sources) characters long.

In this release, we are adding a way to configure the limits, and we are increasing the default limits for several fields:

  • mkt_clickid limit increased from 128 to 1000
  • page_url limit increased from 4096 to 10000
  • page_referrer limit increased from 4096 to 10000

Depending on your configuration, this might be a breaking change:

  • If you have featureFlags.acceptInvalid set to true in Enrich, then you probably don’t need to worry, because you had no validation in the first place (although we do recommend enabling it).
  • If you have featureFlags.acceptInvalid set to false (default), then previously invalid events might become valid (which is a good thing), and you need to prepare your warehouse for this eventuality:
    • For Redshift, you should resize the respective columns, e.g. to VARCHAR(1000) for mkt_clickid. If you don’t, Redshift will truncate the values.
    • For Snowflake and Databricks, we recommend removing the VARCHAR limit altogether. Otherwise, loading might break with longer values. Alternatively, you can alter the Enrich configuration to revert the changes in the defaults.
    • For BigQuery, no steps are necessary.

Below is an example of how to configure these limits:

{
  ...
  # Optional. Configuration section for various validation-oriented settings.
  "validation": {

    # Optional. Configuration for custom maximum atomic fields (strings) length.
    # Map-like structure with keys being field names and values being their max allowed length
    "atomicFieldsLimits": {
        "app_id": 5
        "mkt_clickid": 100000
        # ...and any other 'atomic' field with custom limit
    }
  }
}

Azure Blob Storage support

enrich-kafka can now download enrichment assets (e.g. the MaxMind database) from Azure Blob Storage.
See the configuration reference for the setup.
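
For illustration, an asset URI in an enrichment configuration (here the IP lookups enrichment) could then point at a blob container; the account, container and path below are placeholders, and the exact URI scheme is an assumption, so check the configuration reference:

"parameters": {
    "geo": {
        "database": "GeoLite2-City.mmdb",
        "uri": "https://ACCOUNT.blob.core.windows.net/CONTAINER/maxmind"
    }
}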

New license

Following our recent licensing announcement, Enrich is now released under the Snowplow Limited Use License Agreement.

stream-enrich assets and enrich-rabbitmq deprecated

As announced a while ago, stream-enrich assets and enrich-rabbitmq are now deprecated.
Only one asset now exists for each type of message queue.
The setup guide for each can be found on this page.

Upgrading to 4.0.0

The migration guide can be found on this page.

Changelog

  • Bump aws-msk-iam-auth to 2.0.3 (#857)
  • Scan enrich-kafka and enrich-nsq Docker images (#857)
  • Remove lacework workflow (#859)
  • Use SLF4J for Cats Effect starvation warning message (#858)
  • Bump jackson to 2.16.1 (#857)
  • Bump azure-identity to 1.11.1 (#857)
  • Bump http4s to 0.23.25 (#857)
  • Bump fs2-blobstore to 0.9.12 (#857)
  • Bump AWS SDK v2 to 2.23.9 (#857)
  • Bump AWS SDK to 1.12.643 (#857)
  • Bump mysql-connector-j to 8.3.0 (#857)
  • Make atomic field limits configurable (#850)
  • Switch from Blaze client to Ember client (#853)
  • Upgrade to Cats Effect 3 ecosystem (#837)
  • Add headset to the list of valid platform codes (#851)
  • Add mandatory SLULA license acceptance flag (#848)
  • Move to Snowplow Limited Use License (#846)
  • Add different types of authentication for azure blob storage (#845)
  • Remove config logging (#843)
  • enrich-kafka: support for multiple Azure blob storage accounts (#842)
  • enrich-kafka: add blob storage support (#831)
  • Deprecate enrich-rabbitmq (#822)
  • Deprecate Stream Enrich (#788)

3.9.0

15 Nov 23:01

This release bumps dependencies to address potential security vulnerabilities. It also sets the user-agent header in the Pubsub publisher and consumer.

Changelog

  • enrich-pubsub: set UserAgent header in Pubsub publisher and consumer (#826)
  • Bump sbt-snowplow-release to 0.3.1 (#827)
  • Bump netty to 4.1.100.Final (#827)

3.8.2

07 Sep 14:42

Patch release that updates the default schemas for Sendgrid events.

CHANGELOG

  • Update default schema versions for the Sendgrid adapter to 3-0-0 (#797)
  • Change IP lookup enrichment example config (#810)
  • Change PII enrichment config example (#811)

3.8.1

09 Aug 09:27

It is now possible to ignore API and SQL enrichment errors thanks to a new parameter: ignoreOnError (SQL and API). When set to true, no bad row is emitted if the enrichment fails, and the enriched event is emitted without the context that the enrichment would have added.
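
For illustration, here is a trimmed API enrichment configuration with the new flag (a sketch; the schema version shown and the omitted fields are assumptions, see the linked docs for the full reference):

{
    "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/api_request_enrichment_config/jsonschema/1-0-2",
    "data": {
        "vendor": "com.snowplowanalytics.snowplow.enrichments",
        "name": "api_request_enrichment_config",
        "enabled": true,
        "parameters": {
            ...
            "ignoreOnError": true
        }
    }
}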

S3 and GCS dependencies were added to the enrich-nsq asset so that it can be used in Mini, following our plan to deprecate the Stream Enrich assets.

CHANGELOG

  • Github Actions: split testing and releasing (#806)
  • Bump AWS SDK to 1.12.506 (#805)
  • Bump snakeyaml to 1.33 (#804)
  • Bump jackson to 2.15.2 (#803)
  • Bump uap-java to 1.5.4 (#802)
  • Bump log4j to 2.20.0 (#801)
  • Remove bench module (#799)
  • enrich-nsq: add S3 and GCS dependencies (#793)
  • Add eventVolume and platform to observed_event (#795)
  • Makes schemas configurable in adapters (#791)
  • Update Iglu Scala Client to 1.5.0 (#794)
  • common: ignore API/SQL enrichments when failing (#760)

3.8.0

03 May 13:14

This version comes with a new Enrich app, enrich-nsq. It also brings the following improvements:

  • Superseding schemas
  • Improvements in the API/SQL enrichments
  • Making derived contexts accessible to the JavaScript enrichment

Superseding schemas

Schemas define the structure of the data that you collect. Each schema defines what fields are recorded with each event that is captured, and provides validation criteria for each field. Schemas are also used to describe the structure of entities that are attached to events.

However, there are some cases where we want to replace a schema version in incoming events with another version, due to some problem in the tracking code. The new superseding schemas feature makes this possible.

So, how does this work exactly? If we want a schema to be replaced by another one, we state this with the $supersededBy field of the schema. Later, when an event with the superseded schema arrives, the superseded schema version is replaced by the specified superseding schema version.
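
For illustration, a superseded schema declares its replacement via a top-level $supersededBy field (a trimmed sketch with made-up vendor and name; only the relevant parts are shown):

{
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "self": {
        "vendor": "com.acme",
        "name": "my_event",
        "format": "jsonschema",
        "version": "1-0-0"
    },
    "$supersededBy": "1-0-1",
    ...
}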

Improvements in API/SQL enrichments

The API enrichment lets you perform dimension widening on a Snowplow event via your own or a third-party proprietary HTTP(S) API. The SQL enrichment is the relational database counterpart of the API enrichment: it allows you to use a relational database to perform dimension widening.

Enrich caches the results of the API requests and SQL queries made by the API/SQL enrichments, to avoid continuous calls. We've made some improvements to the caching of errors:

  • The TTL for errors is set to a tenth of the TTL for successful results. This way, API/SQL requests can be retried sooner in case of a cached error (e.g. with a 60-second TTL for successes, errors are cached for 6 seconds).
  • When we get an error, the error is cached but we return the last known good value for further processing. This fallback allows Enrich to produce fewer bad rows in case of 'getting stuck' with errors in the enrichment cache.

More details about the caching improvements can be found here.

We've also made some changes to the way we handle database connections in the SQL enrichment. These changes should lead to acquiring database connections more efficiently and making better use of existing connections.

enrich-nsq, new member of the 2nd generation Enrich apps

In this release, enrich-nsq becomes the newest member of the 2nd generation Enrich apps. It allows reading from and writing to NSQ topics.

Instructions to set up and configure Enrich can be found on our docs website.

Making derived contexts accessible to the JavaScript enrichment

Previously, the JavaScript enrichment allowed users to call event.getDerived_contexts(); however, it always returned null. Starting with Enrich 3.8.0, it is possible to access derived contexts in the JavaScript enrichment.
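
For example, a JavaScript enrichment can now branch on the derived contexts produced by the enrichments that ran before it (a minimal sketch that assumes nothing about the exact representation beyond it being non-null once populated):

function process(event) {
  var derived = event.getDerived_contexts();
  if (derived) {
    // Derived contexts added earlier in the enrichment process are
    // now visible here, instead of this getter always returning null
  }
  return [];
}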

Changelog

  • Take superseding schema into account during validation (#751)
  • common: Provide derived contexts to JS enrichment (#769)
  • Scan Docker images in Snyk Github action (#772)
  • common: do not validate enrichment names (#767)
  • common: SQL enrichment: get connection only if request not cached (#765)
  • common: SQL enrichment: put getConnection in Blocker (#763)
  • common-fs2: fix env var substitution for JSON files (#753)
  • Add enrich-nsq (#740)
  • fix: add mskAuth to kafka dependencies (#746)
  • common: improve caching in API/SQL enrichments (#747)