Collector

xinglang edited this page Feb 11, 2015 · 1 revision

The Collector is the stage that ingests events. It exposes a RESTful endpoint that accepts events in JSON format, either as a single JSON event or as a list of JSON events.

The Collector has a pluggable Validator. If validation fails, it returns an error message to the client. If validation passes, it performs geo enrichment and device classification via Esper EPL, then forwards the events to the next stage, the sessionizer.

Geo enrichment uses the MaxMind GeoLite2 geo database, and device classification uses UADetector (http://uadetector.sourceforge.net/).

This product includes GeoLite2 data created by MaxMind, available from http://www.maxmind.com.

Event Model

The Collector comes with a sample event model for user behavior tracking; the model can be extended.

  1. An IP address (ipv4 or ipv6) is required for geo enrichment.
  2. ua is required for device classification.
  3. si is the stream id; it is mandatory.
  4. ct is the capture time, used as the event timestamp; it is optional. If it is missing, the current system time is used.
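
The rules above can be mirrored on the client side before an event is sent. The following Python sketch is only an illustration and is not the Collector's Validator: the helper names prepare_event and can_geo_enrich are hypothetical, and the sketch assumes ct is expressed in epoch milliseconds, which this page does not state.

```python
import time


def prepare_event(event):
    """Return a copy of the event with the mandatory/optional rules applied.

    Hypothetical client-side helper; the Collector applies its own rules.
    """
    if not event.get("si"):
        raise ValueError("si (stream id) is mandatory")
    prepared = dict(event)
    if "ct" not in prepared:
        # Assumption: capture time is epoch milliseconds (stored as a Long).
        prepared["ct"] = int(time.time() * 1000)
    return prepared


def can_geo_enrich(event):
    """Geo enrichment needs an IP address: ipv4 or ipv6."""
    return bool(event.get("ipv4") or event.get("ipv6"))
```

For example, prepare_event({"si": "stream-1", "ipv4": "192.0.2.10"}) returns the same event with ct filled in, while an event without si raises ValueError.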

Raw Event tags

  • Stream Id - si
  • Tenant - tn
  • Origin - or
  • Capture time - ct
  • User Agent - us
  • IPV4 address - ipv4
  • IPV6 address - ipv6
  • Referrer - rf
  • Event type - et

Only capture time is a Long; all other tags are Strings.

Geo tags

  • City - _cty
  • Continent - _con
  • Country - _cn
  • Region - _rgn
  • Longitude - _lon
  • Latitude - _lat
  • Country ISO Code - _tlcn

Device tags

  • Device category - _dd_dc
  • OS Family - _dd_os
  • OS Version - _dd_osv
  • User Agent Family - _dd_bf
  • User Agent Type - _dd_d
  • User Agent version - _dd_bv
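
A downstream consumer may want to separate the tags added by enrichment from the raw tags sent by the client. The sketch below groups an event's keys using the geo and device tag names listed above. The function name split_tags is hypothetical, and any event passed to it here is illustrative rather than real Collector output.

```python
# Tag names copied from the Geo tags and Device tags lists on this page.
GEO_TAGS = {"_cty", "_con", "_cn", "_rgn", "_lon", "_lat", "_tlcn"}
DEVICE_TAGS = {"_dd_dc", "_dd_os", "_dd_osv", "_dd_bf", "_dd_d", "_dd_bv"}


def split_tags(event):
    """Group an event's tags into raw, geo, and device dictionaries."""
    groups = {"raw": {}, "geo": {}, "device": {}}
    for key, value in event.items():
        if key in GEO_TAGS:
            groups["geo"][key] = value
        elif key in DEVICE_TAGS:
            groups["device"][key] = value
        else:
            # Anything else is treated as a raw (client-supplied) tag.
            groups["raw"][key] = value
    return groups
```

Because the event model can be extended, unknown keys fall into the raw group rather than being rejected.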

API

There is a pluggable Validator to validate the events. See Validator

Deployment

The Collector is a Jetstream app that can run in Docker. It exposes the following ports:

  1. 9999 for monitoring
  2. 8080 for the REST endpoint
  3. 15590 for inbound replay messages

The REST endpoint paths:

  1. /pulsar/ingest/PulsarRawEvent - For a single event
  2. /pulsar/batchingest/PulsarRawEvent - For a batch of events

Both request and response payloads are in JSON format. A batch is sent as a JSON array.
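
As a rough illustration of calling these endpoints, the Python sketch below builds a request for either path using only the standard library. Several details are assumptions not stated on this page: the host (localhost), the HTTP method (POST), and the Content-Type header. The function name build_ingest_request is hypothetical.

```python
import json
import urllib.request

# Assumption: the Collector runs locally; 8080 is the REST port listed above.
BASE_URL = "http://localhost:8080"


def build_ingest_request(events, base_url=BASE_URL):
    """Build a request for one event (dict) or a batch of events (list)."""
    if isinstance(events, list):
        path = "/pulsar/batchingest/PulsarRawEvent"  # batch: JSON array
    else:
        path = "/pulsar/ingest/PulsarRawEvent"  # single JSON event
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",  # assumed method
    )
```

Passing the returned object to urllib.request.urlopen would send it to a running Collector; the JSON response body can then be decoded with json.loads.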