Open Source ETL designed for and dedicated to Log processing and transformation

SkaETL

SkaLogs ETL is a real-time ETL designed for and dedicated to logs and events.

Core features:

  • Centralized Logstash Configuration
  • Log Parsing Simulations based on extensive list of common pre-set patterns
  • Consumer Processes: Ingestion Pipeline handling through guided workflow
    • Ingestion (from specific Kafka topic)
    • Parsing: ability to handle multiple input formats:
      • CEF (HP Arcsight/MicroFocus),
      • Nitro (McAfee),
      • GROK,
      • CSV,
      • JSON as string
    • Parsing Simulations (ability to simulate multiple preset grok patterns on a JSON log)
    • Transformation: add csv lookup, add field, add geolocalization, capitalize, delete field, format boolean, format date, format double, format email, format geopoint, format ip, format long, hash, lookup external, lookup list, lower case, rename field, swap case, trim, uncapitalize, upper case.
    • Metrics
      • functions: count, count-distinct, sum, avg, min, max, stddev, mean,
      • window types: tumbling, hopping, session,
      • time units: seconds, minutes, hours, days,
      • join types: none, inner, outer, left.
    • Notifications
      • email
      • Slack
  • Build data referential on the fly based on events processed by SkaETL
  • Build metrics on the fly (standard statistical & count functions) before storing in ES (avoids computations in ES and reduces the resources dedicated to the ES cluster)
    • Create new mathematical functions to extend standard statistical metrics
  • Create threshold and notifications
  • Preview live data (before storing and indexing in ES)
    • At ingestion in Kafka
    • After Parsing
    • After Transforming
  • Output: ES, Kafka
  • Notifications: email, Slack
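
As a rough illustration of what a few of the transformations listed above (rename field, trim, lower case, delete field, add field) do to an event, here is a minimal Java sketch. It treats an event as a plain `Map<String, Object>`. The class `TransformationSketch` and its methods are hypothetical names chosen for this example; they are not SkaETL's API, and SkaETL's own implementation may differ.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only: these helpers are hypothetical and are NOT SkaETL's API.
final class TransformationSketch {

    // "rename field": move the value stored under `from` to the key `to`.
    static void renameField(Map<String, Object> event, String from, String to) {
        if (event.containsKey(from)) {
            event.put(to, event.remove(from));
        }
    }

    // "trim": strip leading and trailing whitespace from a string field.
    static void trim(Map<String, Object> event, String field) {
        Object value = event.get(field);
        if (value instanceof String) {
            event.put(field, ((String) value).trim());
        }
    }

    // "lower case": lower-case a string field, independent of the default locale.
    static void lowerCase(Map<String, Object> event, String field) {
        Object value = event.get(field);
        if (value instanceof String) {
            event.put(field, ((String) value).toLowerCase(Locale.ROOT));
        }
    }

    // "delete field": drop a field from the event if it is present.
    static void deleteField(Map<String, Object> event, String field) {
        event.remove(field);
    }

    // "add field": set a field to a constant value.
    static void addField(Map<String, Object> event, String field, Object value) {
        event.put(field, value);
    }

    public static void main(String[] args) {
        // A made-up parsed log event, used only for demonstration.
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("src", "  10.0.0.1 ");
        event.put("LEVEL", "ERROR");
        event.put("debug", "x");

        trim(event, "src");
        renameField(event, "LEVEL", "level");
        lowerCase(event, "level");
        deleteField(event, "debug");
        addField(event, "env", "prod");

        System.out.println(event);
    }
}
```

Each helper mutates the event in place and is a no-op when the field is missing or not a string, so the steps can be chained in any order without extra checks.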

SkaETL parses and enhances data from Kafka topics and sends it to any of the following outputs:

  • Kafka (enhanced topics)
  • Elasticsearch
  • Notifications: email, Slack
  • more to come...

Detailed features:

  • Real Time: real-time streaming (Kafka), transformation, analysis, standardization, calculations and visualization of all ingested and processed data
  • Guided Workflows:
    • "consumer processes" (data ingestion pipelines) to guide you through transformation, normalization, analysis - avoiding the tedious task of transforming different types of Logs via Logstash
    • Optional metrics computations via simple functions or complex customized functions via SkaLogs Language
    • Optional alerts and notifications
    • Referentials creation for further reuse
  • Logstash Configuration Generator: on the fly Logstash configuration generator
  • Parsing: grok, nitro, cef, with simulation tool
  • Error Retry Mechanism: automated mechanism for re-processing data ingestion errors
  • Referentials: create referentials for further reuse
  • CMDB: create IT inventory referential
  • Computations (Metrics): precompute all your metrics before storing your results in ES (reduces the use of ES resources and the number of ES licenses)
  • SkaLogs Language: Complex queries, complex computations, event correlations (SIEM) and calculations, with an easy-to-use SQL-like language
  • Monitoring - Alerts: Real-time monitoring, alerts and notifications based on events and thresholds
  • Visualization: dashboard to monitor in real time all your ingestion processes, metrics, referentials and the Kafka live stream
  • Output: Kafka, ES, email, Slack, more to come...
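
To make the idea of precomputed metrics and threshold-based alerts more concrete, the following Java sketch shows one possible way to compute a `count` over a tumbling window (one of the window types listed under Metrics) and compare it against a threshold. It is a simplified, in-memory illustration under stated assumptions: the class `TumblingCountSketch` and its methods are hypothetical, it does not use Kafka, and it is not how SkaETL itself implements metrics.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: class and method names are hypothetical, NOT SkaETL's API.
final class TumblingCountSketch {

    private final long windowMillis;
    // Window start timestamp (ms) -> number of events recorded in that window.
    private final Map<Long, Long> counts = new TreeMap<>();

    TumblingCountSketch(long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        this.windowMillis = windowMillis;
    }

    // Tumbling windows are fixed-size and non-overlapping, so every event
    // belongs to exactly one window: the one starting at floor(ts / size) * size.
    long windowStart(long timestampMillis) {
        return Math.floorDiv(timestampMillis, windowMillis) * windowMillis;
    }

    // Count one event with the given event timestamp.
    void record(long timestampMillis) {
        counts.merge(windowStart(timestampMillis), 1L, Long::sum);
    }

    // Number of events recorded in the window that starts at windowStartMillis.
    long countFor(long windowStartMillis) {
        return counts.getOrDefault(windowStartMillis, 0L);
    }

    // Threshold check: true when the window's count is strictly above the threshold.
    boolean exceeds(long windowStartMillis, long threshold) {
        return countFor(windowStartMillis) > threshold;
    }

    public static void main(String[] args) {
        // One-minute tumbling windows over made-up event timestamps.
        TumblingCountSketch errors = new TumblingCountSketch(60_000L);
        long[] timestamps = {1_000L, 30_000L, 59_999L, 60_000L};
        for (long ts : timestamps) {
            errors.record(ts);
        }
        System.out.println("count in window starting at 0: " + errors.countFor(0L));
        System.out.println("count in window starting at 60000: " + errors.countFor(60_000L));
        System.out.println("alert for window starting at 0: " + errors.exceeds(0L, 2L));
    }
}
```

Only the per-window count would need to be written to the output, rather than every raw event, which is the kind of saving the precomputation feature describes.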

Requirements

  • Java >= 1.8
  • Kafka 1.0.0

Building the Source

SkaETL is built using Apache Maven.

Build the full project and run tests:

$ mvn clean install

Build without tests:

$ mvn clean install -DskipTests

License

SkaETL is released under Apache License 2.0.