Webtrends Content Filter Tutorial
This tutorial shows how to set up a pipeline that reads content from a source, filters it for specific values, and copies the results to a destination (here, a Kafka topic).
To learn more about the configuration format, please read the associated documentation on that topic.
We assume that you have a standalone processing node up and running. If not, see the build and deployment instructions for guidance.
The pipeline set up in this use case is named webtrends-filter-to-kafka.
As this pipeline reads content from a source, forwards it to a filter, and finally writes it to a sink, we need two queues for exchanging data between pipeline components:
- webtrends-content (raw data received from the Webtrends Streaming API)
- filtered-content (filtered content)
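The role of the two queues can be illustrated with a small in-memory sketch. It is not SPQR code: the function name `run_once` and the `keep` predicate are hypothetical, and plain Python deques stand in for the queues that the processing node manages itself.

```python
from collections import deque


def run_once(raw_events, keep):
    """Push events through source -> filter -> sink using two in-memory queues."""
    webtrends_content = deque()  # stands in for the "webtrends-content" queue
    filtered_content = deque()   # stands in for the "filtered-content" queue
    emitted = []                 # stands in for the destination

    # Source: writes raw events into the first queue.
    webtrends_content.extend(raw_events)

    # Filter: reads from the first queue and writes matches into the second.
    while webtrends_content:
        event = webtrends_content.popleft()
        if keep(event):
            filtered_content.append(event)

    # Sink: drains the second queue.
    while filtered_content:
        emitted.append(filtered_content.popleft())
    return emitted
```

Each stage only knows the queue it reads from and the queue it writes to, which is why the configuration needs exactly one queue per hand-over between components.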
The pipeline configures three components:
The source component establishes a connection with the Webtrends Streaming API using the spqr-webtrends source.
All incoming data is handed over to the direct response operator, which filters the content for a specific value in data.cs-host using the filter operator found in spqr-json.
The matching events are finally written to Kafka using the emitter found in spqr-kafka.
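The filtering step can be pictured with the following minimal sketch. It is an illustration, not the spqr-json implementation: the helper names `resolve_path` and `matches` are hypothetical, and it assumes that the configured expression is applied as a regular expression that must match the whole field value, which the setting name suggests but this tutorial does not confirm.

```python
import json
import re


def resolve_path(event, dotted_path):
    """Walk a dotted path such as 'data.cs-host' through nested dicts.

    Returns None if any segment of the path is missing.
    """
    current = event
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(event, path, expression):
    """Return True if the string value found at `path` matches `expression`."""
    value = resolve_path(event, path)
    # Assumption: the expression is a regex that must match the whole value.
    return isinstance(value, str) and re.fullmatch(expression, value) is not None


# Hypothetical events shaped like the "data.*" fields the filter inspects.
hit = json.loads('{"data": {"cs-host": "www.my-site.com"}}')
miss = json.loads('{"data": {"cs-host": "www.other-site.com"}}')
```

Under these assumptions, `matches(hit, "data.cs-host", "www.my-site.com")` is true and the same call with `miss` is false. Note that an unescaped dot in a regular expression matches any character, so a stricter pattern would escape the dots.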
To bring the pipeline to life, we assume that you have a Kafka instance available somewhere and that a processing node is running in standalone mode (to keep things simple).
As the processing node comes with a set of pre-installed components, all you have to do is POST the configuration shown below to the processing node running on localhost at port 7070 (the configuration is assumed to live in the file webtrends-filter-to-kafka.json):
curl -H "Content-Type: application/json" -X POST -d @webtrends-filter-to-kafka.json http://localhost:7070/pipelines/
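For scripted deployments, the same request can be sent from Python using only the standard library. This is a sketch under the assumptions already made by the curl call (node on localhost, port 7070, path /pipelines/); the function names `build_pipeline_request` and `submit_pipeline` are hypothetical.

```python
import json
import urllib.request

PIPELINES_URL = "http://localhost:7070/pipelines/"  # endpoint from the curl call


def build_pipeline_request(config_text, url=PIPELINES_URL):
    """Build (but do not send) the POST request for a pipeline configuration."""
    # Parsing first catches malformed JSON before anything reaches the node.
    json.loads(config_text)
    return urllib.request.Request(
        url,
        data=config_text.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_pipeline(config_path, url=PIPELINES_URL):
    """Send the configuration file and return the raw response body as text."""
    with open(config_path, encoding="utf-8") as handle:
        request = build_pipeline_request(handle.read(), url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

Calling `submit_pipeline("webtrends-filter-to-kafka.json")` corresponds to the curl command; building the request separately from sending it makes the JSON check usable without a running node.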
The response should look like the following and the Kafka topic should receive incoming data:
{
  "state" : "OK",
  "msg" : "",
  "pid" : "webtrends-filter-to-kafka"
}
The complete pipeline configuration (webtrends-filter-to-kafka.json) reads as follows:
{
  "id" : "webtrends-filter-to-kafka",
  "queues" : [
    { "id" : "webtrends-content", "queueSettings" : null },
    { "id" : "filtered-content", "queueSettings" : null }
  ],
  "components" : [ {
    "id" : "webtrends-stream-reader",
    "type" : "SOURCE",
    "name" : "webtrendsSource",
    "version" : "0.0.1",
    "settings" : {
      "webtrends.stream.version" : "2.1",
      "webtrends.auth.audience" : "auth.webtrends.com",
      "webtrends.stream.type" : "return_all",
      "webtrends.schema.version" : "2.1",
      "webtrends.auth.scope" : "sapi.webtrends.com",
      "webtrends.stream.url" : "ws://sapi.webtrends.com/streaming",
      "webtrends.auth.url" : "https://sauth.webtrends.com/v1/token",
      "webtrends.client.id" : "<your_client_id>",
      "webtrends.client.secret" : "<your_client_secret>",
      "webtrends.stream.query" : "select data.*"
    },
    "fromQueue" : "",
    "toQueue" : "webtrends-content"
  },
  {
    "id" : "webtrends-content-filter",
    "type" : "DIRECT_RESPONSE_OPERATOR",
    "name" : "jsonContentFilter",
    "version" : "0.0.1",
    "settings" : {
      "field.1.path" : "data.cs-host",
      "field.1.expression" : "www.my-site.com",
      "field.1.type" : "STRING"
    },
    "fromQueue" : "webtrends-content",
    "toQueue" : "filtered-content"
  },
  {
    "id" : "kafka-topic-emitter",
    "type" : "EMITTER",
    "name" : "kafkaEmitter",
    "version" : "0.0.1",
    "settings" : {
      "clientId" : "webtrendsToKafka",
      "topic" : "webtrends",
      "metadataBrokerList" : "localhost:9092",
      "zookeeperConnect" : "localhost:2181",
      "messageAcking" : "false",
      "charset" : "UTF-8"
    },
    "fromQueue" : "filtered-content",
    "toQueue" : ""
  } ]
}
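Before posting a configuration like the one above, it can help to check that every fromQueue and toQueue refers to a queue declared in the queues section. The following sketch does only that; the function name `check_queue_wiring` is hypothetical, and the check is a local convenience rather than the validation the processing node itself performs.

```python
import json


def check_queue_wiring(config):
    """Return a list of problems found in the queue references of a pipeline."""
    declared = {queue["id"] for queue in config.get("queues", [])}
    problems = []
    for component in config.get("components", []):
        for key in ("fromQueue", "toQueue"):
            reference = component.get(key, "")
            # An empty string means "no queue on this side", as used by the
            # source (no input queue) and the emitter (no output queue).
            if reference and reference not in declared:
                problems.append(
                    f'{component.get("id")}: {key} "{reference}" is not a declared queue'
                )
    return problems


# A deliberately broken, shortened configuration for illustration.
broken = json.loads("""
{
  "id" : "example",
  "queues" : [ { "id" : "webtrends-content", "queueSettings" : null } ],
  "components" : [
    { "id" : "reader", "type" : "SOURCE", "fromQueue" : "", "toQueue" : "webtrends-content" },
    { "id" : "filter", "type" : "DIRECT_RESPONSE_OPERATOR",
      "fromQueue" : "webtrends-content", "toQueue" : "filtered-content" }
  ]
}
""")
```

Here `check_queue_wiring(broken)` reports one problem, because filtered-content is referenced but never declared. Loading webtrends-filter-to-kafka.json with `json.load` and passing the result to the same function should return an empty list.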
SPQR - stream processing and querying in realtime by Otto Group