
Webtrends Content Filter Tutorial

Christian Kreutzfeldt edited this page Apr 22, 2015 · 1 revision

This tutorial shows how to set up a pipeline that reads content from a source, filters it for specific values and copies the results into a destination (here: a Kafka topic).

To learn more about the configuration format, please read the associated documentation on that topic.

We assume that you have a standalone processing node up and running. If not, see the build and deployment instructions for guidance.

General settings

The pipeline set up in this use case is named webtrends-filter-to-kafka.

Queues

As this pipeline reads content from a source, forwards it to a filter and finally writes it to a sink, we need two queues for exchanging data between pipeline components:

  • webtrends-content (raw data received from the Webtrends streaming API)
  • filtered-content (filtered content)

Components

The pipeline configures three components:

The source component establishes a connection with the Webtrends streaming API using the spqr-webtrends source.

All incoming data is handed over to the direct response operator, which filters the content for a specific value in data.cs-host using the filter operator found in spqr-json.

The matching events are finally written to Kafka using the emitter found in spqr-kafka.
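The filter step described above can be pictured with a short Python sketch. It is not the spqr-json implementation: the helper names, the sample events and the matching rule (the expression is treated as a regular expression that must match the whole field value) are assumptions made for illustration only.

```python
import json
import re


def resolve_path(event, dotted_path):
    """Walk a dotted path such as 'data.cs-host' through nested dicts.

    Returns None if any step of the path is missing.
    """
    node = event
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def matches(event, path, expression):
    """True if the string found at `path` fully matches `expression`.

    Assumption: the expression is a regular expression. Under that reading
    the unescaped dots in 'www.my-site.com' match any character.
    """
    value = resolve_path(event, path)
    return isinstance(value, str) and re.fullmatch(expression, value) is not None


# Hypothetical events standing in for messages on the webtrends-content queue.
raw_events = [
    '{"data": {"cs-host": "www.my-site.com"}}',
    '{"data": {"cs-host": "www.other-site.com"}}',
    '{"meta": {}}',
]

# Only matching events would be forwarded to the filtered-content queue.
filtered = [
    event
    for event in map(json.loads, raw_events)
    if matches(event, "data.cs-host", "www.my-site.com")
]
print(len(filtered))
```

Events that lack the field are dropped here rather than raising an error. Whether the real operator behaves the same way is not stated on this page.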

Deployment

To bring the pipeline to life, we assume that you have a Kafka instance available somewhere and that a processing node is running in standalone mode (to keep things simple).

As the processing node comes with a set of pre-installed components, all you have to do is POST the configuration shown below to the processing node running on localhost at port 7070. The configuration is assumed to live in the file webtrends-filter-to-kafka.json:

curl -H "Content-Type: application/json" -X POST -d @webtrends-filter-to-kafka.json http://localhost:7070/pipelines/
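If curl is not available, the same request can be sent from Python's standard library. This is a minimal sketch that mirrors the curl call above; the function names are hypothetical, and the 10-second timeout is an arbitrary choice.

```python
import urllib.request


def build_pipeline_request(config_path, url="http://localhost:7070/pipelines/"):
    """Build the POST request carrying the pipeline configuration file."""
    with open(config_path, "rb") as handle:
        body = handle.read()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_pipeline(config_path):
    """Send the configuration to the processing node and return the raw reply."""
    request = build_pipeline_request(config_path)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

Calling `submit_pipeline("webtrends-filter-to-kafka.json")` requires a processing node listening on port 7070, just like the curl command.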

The response should look like the following and the Kafka topic should receive incoming data:

{
  "state": "OK",
  "msg": "",
  "pid": "webtrends-filter-to-kafka"
}
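A script that submits the pipeline may want to check the reply before moving on. The sketch below assumes the response body is valid JSON carrying the state, msg and pid fields shown above; the function name is hypothetical, and any state other than "OK" is treated as a failure, which is an assumption rather than documented behaviour.

```python
import json


def pipeline_id_from_response(body):
    """Return the pipeline id from a reply, or raise if the state is not OK."""
    reply = json.loads(body)
    if reply.get("state") != "OK":
        raise RuntimeError(
            f"pipeline not accepted: state={reply.get('state')!r}, "
            f"msg={reply.get('msg')!r}"
        )
    return reply["pid"]


# Sample reply modelled on the response shown above.
sample = '{"state": "OK", "msg": "", "pid": "webtrends-filter-to-kafka"}'
print(pipeline_id_from_response(sample))
```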

Configuration

{
  "id" : "webtrends-filter-to-kafka",
  "queues" : [ 
    { "id" : "webtrends-content", "queueSettings" : null },
    { "id" : "filtered-content", "queueSettings" : null } 
  ],
  "components" : [ {
    "id" : "webtrends-stream-reader",
    "type" : "SOURCE",
    "name" : "webtrendsSource",
    "version" : "0.0.1",
    "settings" : {
      "webtrends.stream.version" : "2.1",
      "webtrends.auth.audience" : "auth.webtrends.com",
      "webtrends.stream.type" : "return_all",
      "webtrends.schema.version" : "2.1",
      "webtrends.auth.scope" : "sapi.webtrends.com",
      "webtrends.stream.url" : "ws://sapi.webtrends.com/streaming",
      "webtrends.auth.url" : "https://sauth.webtrends.com/v1/token",
      "webtrends.client.id" : "<your_client_id>",
      "webtrends.client.secret" : "<your_client_secret>",
      "webtrends.stream.query" : "select data.*"
    },
    "fromQueue" : "",
    "toQueue" : "webtrends-content"
  }, 
  
  {
    "id" : "webtrends-content-filter",
    "type" : "DIRECT_RESPONSE_OPERATOR",
    "name" : "jsonContentFilter",
    "version" : "0.0.1",
    "settings" : {
      "field.1.path": "data.cs-host",
      "field.1.expression": "www.my-site.com",
      "field.1.type": "STRING"
    },
    "fromQueue": "webtrends-content",
    "toQueue": "filtered-content"
  },
  
  {
    "id" : "kafka-topic-emitter",
    "type" : "EMITTER",
    "name" : "kafkaEmitter",
    "version" : "0.0.1",
    "settings" : {
      "clientId" : "webtrendsToKafka",
      "topic" : "webtrends",
      "metadataBrokerList" : "localhost:9092",
      "zookeeperConnect" : "localhost:2181",
      "messageAcking" : "false",
      "charset" : "UTF-8"
    },
    "fromQueue" : "filtered-content",
    "toQueue" : ""
  } ]
}
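
Before posting a configuration like the one above, it can help to confirm that the queues and components line up. The following Python sketch checks a trimmed copy of the configuration (settings omitted). The rules it applies, such as every declared queue needing one writer and one reader, are assumptions drawn from this tutorial's description, not validation performed by the processing node itself.

```python
def check_wiring(config):
    """Return a list of wiring problems; an empty list means the queues line up."""
    declared = {queue["id"] for queue in config["queues"]}
    written, read = set(), set()
    problems = []
    for comp in config["components"]:
        source_queue = comp.get("fromQueue") or ""
        target_queue = comp.get("toQueue") or ""
        for label, queue in (("fromQueue", source_queue), ("toQueue", target_queue)):
            if queue and queue not in declared:
                problems.append(f"{comp['id']}: {label} '{queue}' is not declared")
        if comp["type"] == "SOURCE" and source_queue:
            problems.append(f"{comp['id']}: a source should not read from a queue")
        if comp["type"] == "EMITTER" and target_queue:
            problems.append(f"{comp['id']}: an emitter should not write to a queue")
        read.add(source_queue)
        written.add(target_queue)
    for queue in sorted(declared):
        if queue not in written:
            problems.append(f"queue '{queue}' has no writer")
        if queue not in read:
            problems.append(f"queue '{queue}' has no reader")
    return problems


# Trimmed copy of the configuration above; component settings are left out.
config = {
    "id": "webtrends-filter-to-kafka",
    "queues": [{"id": "webtrends-content"}, {"id": "filtered-content"}],
    "components": [
        {"id": "webtrends-stream-reader", "type": "SOURCE",
         "fromQueue": "", "toQueue": "webtrends-content"},
        {"id": "webtrends-content-filter", "type": "DIRECT_RESPONSE_OPERATOR",
         "fromQueue": "webtrends-content", "toQueue": "filtered-content"},
        {"id": "kafka-topic-emitter", "type": "EMITTER",
         "fromQueue": "filtered-content", "toQueue": ""},
    ],
}
print(check_wiring(config))
```

For the full file, load it with `json.load` and pass the resulting dictionary to `check_wiring`.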