Please read the "Dependencies" chapter below.
Probespawner builds on a number of great pieces of software and depends directly on several of them; links are provided for each.
Some packages, namely jar files, are bundled here for convenience, but you should upgrade them if you use this project.
All of probespawner's source is public domain - see the LICENSE.md file.
Below you'll find the instructions to install and use Probespawner in a *NIX environment.
[Click here for instructions on how to install and use it on Windows](https://github.com/filipealmeida/probespawner/blob/master/INSTALL.windows.md)
- Download Java (1.7+) - https://java.com/en/download/
- Install Java and have `java` on your path
- Download Jython - http://www.jython.org/downloads.html
- Install Jython: `java -jar jython-installer-2.7.0.jar -s -d targetDirectory`
- Grab probespawner from GitHub or download it from wherever it's available - https://github.com/filipealmeida/probespawner/
- Expand the tarball/zip you've downloaded
- Enter probespawner's directory
- Have `jython` on your path
- Run `./probespawner.sh <YOURCONFIG.json>` on Linux/Mac or `probespawner.bat <YOURCONFIG.json>` on Windows
Probespawner is a small jython program initially designed to repeat JDBC queries periodically and write their output to an Elasticsearch cluster.
By now it's something of a cross between a log shipper and a crontab.
It can periodically perform JDBC queries, JMX queries and operations, and/or command executions, outputting its parsed data (usually as JSON) to Elasticsearch, RabbitMQ/AMQP queues, OpenTSDB, files and/or STDOUT.
It's no substitute for a log shipper, but it comes in handy and packs a number of interesting jython examples on how to achieve just that.
Though immature and not production ready, it's fairly easy to adapt/extend, and it has already been really useful for monitoring and troubleshooting systems, databases and java applications (so far).
The simple answer is "just because".
Probespawner was written initially to perform some tasks that the elasticsearch-river-jdbc feeder did not address, and to work around the bugs and difficulties of setting up such a river/feeder (plus rivers are apparently now deprecated).
Further work extended from there to help with troubleshooting, monitoring and performance statistics on the OS and applications.
See the examples folder for some practical uses.
An effort will be made to document some of the things done using probespawner; a few examples:
- Collect AWR from OracleDB, DMV data from Microsoft SQL Server and performance schema data from MySQL. Index the data on Elasticsearch; insight through Kibana. SQL Server DMVs example here.
- Collect netstat information periodically, send it through RabbitMQ to Elasticsearch. D3JS renders force-directed graphs from the information with a brush date/time interval selector; this animates the graph of network conversations as you slide through a time interval. Example here.
- Collect top information, ship it through the pipeline to Elasticsearch. A Kibana dashboard allows quick browsing through the process history, correlating with machine resources, documenting blocking conditions and wait events. Example here.
- Collect stack traces periodically from application servers while monitoring the resources of a JVM using JMXProbe. Data shipped through the pipeline (RabbitMQ) is made available to performance engineers, application testers, master troubleshooters and developers for the many reasons you might imagine. Example here.
- Collect vmstat info, write metrics to OpenTSDB via socket.
Probespawner reads a JSON configuration file describing a list of inputs and outputs, much like logstash.
The inputs provided are either JMX (probing a JVM), JDBC (querying a database) or the execution of programs on different platforms.
Each is called a probe.
The data acquired cyclically from these input sources is sent to Elasticsearch, RabbitMQ, OpenTSDB, stdout or a file.
Basically, for each input you have defined, probespawner launches a (java) thread as illustrated in jython's concurrency manual.
Each thread is an instance of a probe that performs:
- Periodic acquisition of records from a database, writing these to an Elasticsearch cluster (using Elasticsearch's Java API).
- Periodic acquisition of JMX attributes from a JVM instance, output to an index of your choice on your Elasticsearch cluster and to a file on your filesystem.
- Periodic parsing of the top command, sending data to a RabbitMQ queue.
- Sending metric data from command execution to OpenTSDB.
- Periodic execution of any task you design for your own probe, doing whatever you want with the results, for instance, writing them to STDOUT.
- Jython 2.5.3+ - http://www.jython.org/downloads.html
- Jyson 1.0.2+ - http://opensource.xhaus.com/projects/jyson
- JodaTime 2.7+ - https://github.com/JodaOrg/joda-time
- Tomcat’s 7.0.9+ (connection pool classes) - http://tomcat.apache.org/download-70.cgi
- Elasticsearch 1.5.0+ - https://www.elastic.co/downloads
- RabbitMQ 3.5.3+ - https://www.rabbitmq.com
- Mysql - http://dev.mysql.com/downloads/connector/j/
- Oracle - http://www.oracle.com/technetwork/apps-tech/jdbc-112010-090769.html
- MSSQL - http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=11774, http://go.microsoft.com/fwlink/?LinkId=245496
About the use of Tomcat's connection pool: zxJDBC could have been used to attain the same objective, but since some code using it was already around, it stood.
Minimum set of jars needed to run probespawner:
CLASSPATH=jyson-1.0.2.jar:joda-time-2.7.jar
Minimum set of jars needed to run probespawner with Tomcat's connection pool:
CLASSPATH=jyson-1.0.2.jar:joda-time-2.7.jar:tomcat-jdbc.jar:tomcat-juli.jar
Minimum set of jars needed to run probespawner for JDBC using the "cooldbprobe" module with MySQL as a source:
CLASSPATH=jyson-1.0.2.jar:joda-time-2.7.jar:tomcat-jdbc.jar:tomcat-juli.jar:mysql-connector-java-5.1.20-bin.jar
JSON input configuration:
...
"probemodule": { "module": "cooldbprobe", "name" : "DatabaseProbe" },
"url": "jdbc:mysql://localhost:3306/mysql",
"driverClassName": "com.mysql.jdbc.Driver",
"username": "root",
"password": "password",
...
Minimum set of jars needed to run probespawner for JDBC using the "cooldbprobe" module with OracleDB as a source:
CLASSPATH=jyson-1.0.2.jar:joda-time-2.7.jar:tomcat-jdbc.jar:tomcat-juli.jar:ojdbc6.jar
JSON input configuration:
...
"probemodule": { "module": "cooldbprobe", "name" : "DatabaseProbe" },
"url": "jdbc:oracle:thin:@localhost:1521:ORCL",
"driverClassName": "oracle.jdbc.OracleDriver",
"username": "suchuser",
"password": "suchpassword",
...
Minimum set of jars needed to run probespawner for JDBC using the "cooldbprobe" module with Microsoft SQL Server (mssql) as a source:
CLASSPATH=jyson-1.0.2.jar:joda-time-2.7.jar:tomcat-jdbc.jar:tomcat-juli.jar:sqljdbc4.jar
For reference, a full classpath with all the optional jars (Elasticsearch 2.x client, RabbitMQ, JDBC drivers, JBoss/WebLogic JMX clients):
CLASSPATH=compress-lzf-1.0.2.jar:elasticsearch-2.1.1.jar:guava-18.0.jar:hppc-0.7.1.jar:jackson-core-2.6.2.jar:jboss-client.jar:joda-time-2.7.jar:jsr166e-1.1.0.jar:jyson-1.0.2.jar:lucene-core-5.3.1.jar:mysql-connector-java-5.1.20-bin.jar:netty-3.10.5.Final.jar:ojdbc6.jar:rabbitmq-client.jar:sqljdbc4.jar:t-digest-3.0.jar:tomcat-jdbc.jar:tomcat-juli.jar:wlclient.jar:wljmxclient.jar
JSON input configuration:
...
"probemodule": { "module": "cooldbprobe", "name" : "DatabaseProbe" },
"url": "jdbc:sqlserver://suchhost:1433;databaseName=Master",
"driverClassName": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
"username": "suchuser",
"password": "suchpassword",
...
Minimum set of jars needed to run probespawner for JMX using the "jmxprobe" module (here with JBoss remoting-jmx):
CLASSPATH=jyson-1.0.2.jar:joda-time-2.7.jar:tomcat-jdbc.jar:tomcat-juli.jar:jboss-client.jar
JSON input configuration:
...
"url":"service:jmx:remoting-jmx://localhost:8000",
"username": "admin",
"password": "password",
"alias": "hostname.project",
"queries" : [
{
"object_name" : "java.lang:type=Memory",
"attributes" : [ "NonHeapMemoryUsage", "HeapMemoryUsage" ]
}, {
"object_name" : "com.conceptwave.AVM:type=CWAVM"
}, {
"object_name" : "com.conceptwave.AVM:type=PEQueues,name=Participants Queue"
}, {
"object_name" : "java.lang:type=GarbageCollector,name=*",
"attributes" : [ "CollectionCount", "CollectionTime" ]
}
],
"description": "Obtains JMX metrics from JVM instance, stores on elasticsearch",
"probemodule": { "module": "jmxprobe", "name" : "JMXProbe" },
...
Jars needed for the Elasticsearch 2.x output (taken from the Elasticsearch distribution's lib directory):
CLASSPATH=/foo/bar/opt/elasticsearch-2.1.1/lib/elasticsearch-2.1.1.jar:/foo/bar/opt/elasticsearch-2.1.1/lib/lucene-core-5.3.1.jar:/foo/bar/opt/elasticsearch-2.1.1/lib/guava-18.0.jar:/foo/bar/opt/elasticsearch-2.1.1/lib/jsr166e-1.1.0.jar:/foo/bar/opt/elasticsearch-2.1.1/lib/hppc-0.7.1.jar:/foo/bar/opt/elasticsearch-2.1.1/lib/jackson-core-2.6.2.jar:/foo/bar/opt/elasticsearch-2.1.1/lib/compress-lzf-1.0.2.jar:/foo/bar/opt/elasticsearch-2.1.1/lib/t-digest-3.0.jar:/foo/bar/opt/elasticsearch-2.1.1/lib/netty-3.10.5.Final.jar
JSON output configuration:
"class": "elasticsearch",
"outputmodule": { "module": "jelh2", "name" : "Elasticsearch" },
The package contains the following files:
- dummyprobe.py - Do not be misled by the name into thinking this is a throwaway probe. The name was a poor choice, but this is immature code, so it's staying. This is the class from which the others inherit, and it's the real skeleton for probes.
- cooldbprobe.py - Same as databaseprobe.py but uses JodaTime for time parameters; timestamps are in milliseconds and datetime strings are in ISO8601 format, looking like "2015-11-21T15:14:15.912+05:00".
- jelh.py - Jython's Elasticsearch Little Helper, the module responsible for the bulk requests to elasticsearch.
- jmxprobe.py - This probe executes JMX requests in a JVM and reports its results to the outputs configured in the JSON setup. NOTE: it discards bad objects/attributes on failure but keeps working when that happens. Adjust to your needs (look for the try/except in the "tick" method).
- probespawner.ini - Logging configuration.
- probespawner.py - The croupier: it reads and deals out the configuration for the probes. Lastly, it instantiates all of them and waits for them to finish. It's a simple threadpool, but it would be great if it dealt with interrupts and failures from its workers.
- probespawner.sh - Launches probespawner with its argument as the configuration.
- probespawner.bat - Launches probespawner with its argument as the configuration, in case you're using Microsoft Windows.
- testprobe.py - The real dummy probe; it merely states that it is a test and sends a sample dictionary to the configured outputs.
- example.json - Sample JSON configuration for a JDBC input, a JMX input, any input. Click here for contents
- zxdatabaseprobe.py - Same as databaseprobe.py but dispensing with Tomcat's connection pool (uses zxJDBC).
- databaseprobe.py - Amazingly, this does not depend on "dummyprobe.py"; it was the initial code of a more monolithic probe that existed in the past. This probe executes any given query in a database and reports its results (if any) to the outputs configured in the JSON setup. It's here for legacy reasons; you should ignore this probe.
- execprobe.py - Executes "command" every cycle and reports its output.
- linuxtopprobe.py - Executes the top command on linux boxes every cycle, parses it and reports its output in an elasticsearch-friendly fashion.
- netstats.py - Executes the "netstat -s" command on linux boxes every cycle, parses it and reports its output in an elasticsearch-friendly fashion.
- netstatntc.py - Executes the "netstat -ntce" command on linux boxes every cycle, parses it and reports its output in an elasticsearch-friendly fashion.
- rmqlh.py - Jython’s RabbitMQ Little Helper, the module responsible for pushing to RabbitMQ queues.
- opentsdblh.py - Jython’s OpenTSDB (time-series database) Little Helper, the module responsible for pushing to such backend.
- jelh2.py - Jython’s Elasticsearch 2.0 Little Helper, the module responsible for the bulk requests to elasticsearch versions 2+.
Instead of reading this section you can refer to the example.json file in the repo/zip; there you'll find examples of most configurations.
Where a field's description is omitted, the field has no effect.
The list of possible fields for inputs and outputs is shown below:
Field | Description |
---|---|
description | Small description of your input; it'll be used to name its Java thread |
probemodule | Dictionary with “module” and “name” keys specifying the module and name to import as the probe for one input |
module | The jython module that contains the probe, e.g.: databaseprobe |
name | The name to import that will be used by probespawner to instantiate the thread, e.g.: DatabaseProbe |
output | List of outputs to write the acquired data, e.g.: [“elasticsearchJMXoutput”]. See “outputs” below |
interval | Interval in seconds to wait for the next iteration. The time spent executing a cycle is subtracted from this value in every iteration |
maxCycles | How many cycles before exiting the thread |
storeCycleState | Stores parameters and information about cycles, like number of cycles, start and end times, etc. e.g.: "storeCycleState": { "enabled": true, "filename": "sincedb.json"} |
transform | Transformation of message to a string e.g.: “jmx.$data.attribute $cycle.laststart $data.number host=suchhost $config.maxCycles“ |
messageTemplate | A dictionary to append to be sent/added in the output message, can use $cycle, $config and $data variables |
There are three possible modules: "cooldbprobe", "databaseprobe" and "zxdatabaseprobe".
The first two use Tomcat's JDBC connection pool.
Field | Description |
---|---|
url | Connection URL, e.g.: jdbc:mysql://mysqlhost:3306/INFORMATION_SCHEMA |
driverClassName | The driver classname, must be in the CLASSPATH, e.g.: com.mysql.jdbc.Driver |
username | Username to connect to the database |
password | Password to connect to the database |
dbProperties | Property dictionary with JDBC driver specific properties. Overrides the database properties passed into the Driver.connect(String, Properties) method (see setDbProperties method from Tomcat’s connection pool) |
minIdle | See Tomcat’s connection pool documentation (number of minimum idle connections) |
maxIdle | See Tomcat’s connection pool documentation |
maxAge | See Tomcat’s connection pool documentation |
validationQuery | See Tomcat’s connection pool documentation (validation query, everytime a handle is obtained from the pool) |
initSQL | See Tomcat’s connection pool documentation (initial SQL when creating a connection) |
sql | List of query objects |
.. statement | The statement itself |
.. parameter | Parameters for the query and their initial setup, see "Statement parameters" table below |
.. id | Id for your query, useful for debugging |
Parameters you can use for your prepared statements (`?` in the statement definition, see example.json).
Field | Description |
---|---|
start | Unix timestamp in your script environment timezone at your cycle start (see python’s time.time() function), e.g.: 1427976119.921 |
laststart | Unix timestamp of your previous cycle start |
end | Unix timestamp of the end of last cycle |
numCycles | Number of cycles started |
qstart | Unix timestamp of a given query start |
qlaststart | Unix timestamp of the last time a given query started |
qend | Unix timestamp of the last time a given query ended |
startdt | Same as start but an ISO8601 datetime, e.g.: "2015-04-02 12:00:00.000000". Every parameter with the suffix dt is a date in such format |
laststartdt | Same as laststart in ISO8601 |
enddt | Same as end in ISO8601 |
qstartdt | Same as qstart in ISO8601 |
qlaststartdt | Same as qlaststart in ISO8601 |
qenddt | Same as qend in ISO8601 |
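For example, an incremental query that only fetches rows created since the previous cycle could bind these parameters to the `?` placeholders of a query object. This is a sketch: the table and column names are made up, and it assumes the `$cycle.` prefix applies to these parameters as it does in the "anyother" and transform examples elsewhere in this document:

```json
{
  "id": "newEvents",
  "statement": "SELECT * FROM events WHERE created_at > ? AND created_at <= ?",
  "parameter": [ "$cycle.laststartdt", "$cycle.startdt" ]
}
```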
cooldbprobe packs a few extras and has some differences from the above:
Field | Description |
---|---|
start | Unix timestamp in milliseconds in your script environment timezone at your cycle start (see python’s time.time() function), e.g.: 1427976119921, the getTimeMillis() from JodaTime |
laststart | Unix timestamp in milliseconds of your previous cycle start |
end | Unix timestamp in milliseconds of the end of last cycle |
numCycles | Number of cycles started |
ignoreFieldIfEmptyString | If set to true, removes empty string values from obtained rows |
qstart | Unix timestamp in milliseconds of a given query start |
qlaststart | Unix timestamp in milliseconds of the last time a given query started |
qend | Unix timestamp in milliseconds of the last time a given query ended |
startdt | Same as start but an ISO8601 datetime, e.g.: "2014-11-22T12:13:03.991+05:00". Every parameter with the suffix "dt" is a date in such format |
laststartdt | Same as laststart in ISO8601, e.g.: “2014-11-22T12:13:03.991+05:00” |
enddt | Same as end in ISO8601, e.g.: “2014-11-22T12:13:03.991+05:00” |
qstartdt | Same as qstart in ISO8601, e.g.: “2014-11-22T12:13:03.991+05:00” |
qlaststartdt | Same as qlaststart in ISO8601, e.g.: “2014-11-22T12:13:03.991+05:00” |
qenddt | Same as qend in ISO8601, e.g.: “2014-11-22T12:13:03.991+05:00” |
qelapsed | Elapsed time in milliseconds from start of execution until resultset traversal and insert in the outputs |
anyother | If you specify any other field in your input queries you can use it as a parameter in your query, e.g.: { "statement": "select ?", "parameter": ["$cycle.myparameter"], "myparameter": "testparameter" } |
Field | Description |
---|---|
host | JMX host |
port | JMX port |
username | Username for JMX connection |
password | Password for JMX connection |
attributes | List of metrics to obtain from JMX, e.g.: ["java.lang:type=Memory/HeapMemoryUsage", "java.lang:type=Runtime/Uptime"] |
operations | List of operations to execute via JMX, e.g: [{ "name": "java.lang:type=Threading/dumpAllThreads", "params": [ true, true ], "signatures": [ "boolean", "boolean" ] }, { "name": "java.lang:type=Threading/findDeadlockedThreads" } ] |
arrayElementsToRecord | Set this to true to expand an array if such is returned to your request |
queries | Array of queries in logstash fashion, see the logstash configuration example |
alias | Prefix for metric names |
compositeDataToManyRecords | true or false, splits the returned object into many records; for the otsdb output the preferred value is true |
Field | Description |
---|---|
command | Set this to the command you want to execute every cycle |
regexp | Named groups regex, python style, to parse and name your fields |
metrics | List of group names which are metrics. A document/JSON entry per metric will be generated. Every metric value will be converted to a float. |
terms | Same as metrics but values won't be converted to floats |
decimalMark | Decimal mark separator for number parsing |
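Putting those fields together, a hypothetical execprobe input that parses Linux load averages might look like the sketch below. The class name "ExecProbe" and the output name are guesses for illustration - check execprobe.py and example.json for the actual names:

```json
{
  "description": "load average probe",
  "probemodule": { "module": "execprobe", "name": "ExecProbe" },
  "interval": 10,
  "command": "cat /proc/loadavg",
  "regexp": "^(?P<load1>\\S+) (?P<load5>\\S+) (?P<load15>\\S+)",
  "metrics": [ "load1", "load5", "load15" ],
  "output": [ "stdoutOutput" ]
}
```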
Field | Description |
---|---|
command | Set this to change the “top -Hbi -n20 -d5 -w512” command that gets executed every cycle. |
Field | Description |
---|---|
command | Set this to change the “netstat -s” command that gets executed every cycle. |
Field | Description |
---|---|
command | Set this to change the “netstat -ntc” command that gets executed every cycle. |
Field | Description |
---|---|
class | The class of your output, one of “elasticsearch”, “rabbitmq”, “file” or “stdout” |
outputmodule | Alike the input, your module and name to import e.g.: { "module": "jelh", "name" : "Elasticsearch" } or { "module": "rmqlh", "name" : "RabbitMQ" } |
codec | Transformations to the data, e.g.: json_lines (see rabbitMQ example) |
messageTemplate | A dictionary to append to be sent/added in the output message, can use $cycle, $config and $data variables |
Field | Description |
---|---|
cluster | A string with the clustername, defaults to “elasticsearch” |
outputmodule | { "module": "jelh", "name" : "Elasticsearch" } |
hosts | List of hosts:ports, e.g.: [“10.0.0.1:9300”, “10.0.0.2:9300”] If host and port are also specified, it’ll be added to this list |
host | Hostname/IP of node (defaults to “localhost”) |
port | Port for transport, defaults to 9300 |
options | Any options you want to add to the elasticsearch client configuration, e.g.: { "cluster.name": "suchcluster", "client.transport.ping_timeout": "5s", "client.transport.nodes_sampler_interval": "5s", "client.transport.ignore_cluster_name": false, "client.transport.sniff": true } Overrides the cluster name ("cluster") |
indexPrefix | Index name prefix, defaults to “sampleindex” |
indexSuffix | Defaults to "-%Y.%m.%d" but can be a JDBC fieldname. If the fieldname parses as an ISO8601 date string, its info will be used and suffixed with "%Y.%m.%d" |
type | Document type, e.g.: “jdbc” |
indexSettings | Elasticsearch JSON for index settings |
typeMapping | Elasticsearch type mapping, e.g.: "jdbc": { "properties" : { "@timestamp" : { "type" : "date" } } } |
index_settings | Overrides indexSettings |
time_mapping | Overrides typeMapping |
bulkActions | Number of documents to keep before flushing data |
bulkSize | ignored for the time being |
flushInterval | ignored for the time being |
concurrentRequests | ignored for the time being |
actionRetryTimeout | Number of seconds to sleep before re-executing the elasticsearch action in progress |
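Combining the fields above, an Elasticsearch output definition might look like this sketch (cluster name, host addresses and the bulkActions value are made up for illustration):

```json
{
  "class": "elasticsearch",
  "outputmodule": { "module": "jelh2", "name": "Elasticsearch" },
  "cluster": "suchcluster",
  "hosts": [ "10.0.0.1:9300", "10.0.0.2:9300" ],
  "indexPrefix": "probespawner",
  "indexSuffix": "-%Y.%m.%d",
  "type": "jdbc",
  "bulkActions": 500
}
```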
Field | Description |
---|---|
outputmodule | { "module": "rmqlh", "name" : "RabbitMQ" } |
queue_name | queue to write to |
addresses | list of addresses (for failover) e.g.: ["suchhost:5672", "suchhost:5672"] |
host | your RabbitMQ host |
port | your AMQP port |
virtualhost | your known virtualhost |
username | your username |
password | your password |
uri | all of the above, overrides all, e.g.: amqp://myuser:mypassword@suchhost:5672/vhost |
networkRecoveryInterval | Sets connection recovery interval. Default is 5000. |
automaticRecoveryEnabled | if true, enables connection recovery |
topologyRecoveryEnabled | Enables or disables topology recovery |
routingKey | routing key for your data |
exchange | an exchange name if any |
declareQueue | If true, declares (creates) the queue; if false, just binds to it. true or false (defaults to false) |
passive | true or false (defaults to false) |
durable | true or false (defaults to false) |
exclusive | true or false (defaults to false) |
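As a sketch, a RabbitMQ output using the fields above might look like this (the URI and queue name are made up; per the table, when "uri" is present it overrides the individual host/port/credential fields):

```json
{
  "class": "rabbitmq",
  "outputmodule": { "module": "rmqlh", "name": "RabbitMQ" },
  "uri": "amqp://myuser:mypassword@suchhost:5672/vhost",
  "queue_name": "probespawner",
  "declareQueue": true,
  "durable": true,
  "codec": "json_lines"
}
```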
Field | Description |
---|---|
outputmodule | { "module": "opentsdblh", "name" : "OpenTSDB" } |
queue_name | queue to write to |
addresses | list of addresses (for failover) e.g.: ["suchhost:5672", "suchhost:5672"] |
host | your OpenTSDB host |
port | your OpenTSDB port |
metricPrefix | prefix to add to your OTSDB metrics, defaults to "probespawner" |
metric_field | field with your metric name |
value_field | field with the value for your metric |
metrics | list of fields that are metrics to be stored |
blacklist | list of blacklisted tags (keys to remove from metric push) |
tags | array of extra tags for metrics e.g.: [ "sometag=somevalue", "othertag=othervalue" ] |
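A hypothetical OpenTSDB output using these fields might look like the sketch below (host, port, metric and tag values are made up for illustration):

```json
{
  "outputmodule": { "module": "opentsdblh", "name": "OpenTSDB" },
  "host": "suchhost",
  "port": 4242,
  "metricPrefix": "probespawner",
  "metrics": [ "load1", "load5", "load15" ],
  "tags": [ "host=suchhost" ]
}
```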
Field | Description |
---|---|
codec | "json_lines"; no other is available. |
Field | Description |
---|---|
codec | "json_lines"; no other is available. |
filename | The filename to write the information to in the format established by the codec. |
Now you need to assemble a JSON file indicating which inputs are to be spawned, i.e. which probes are to be launched.
Hence the "input" field.
This field tells probespawner which inputs are to be processed by the probe threads.
Field | Description |
---|---|
input | List of inputs to be launched by probespawner, e.g.: [“JMXInput”, “JDBCInput”] |
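Putting it all together, a configuration file could look roughly like the sketch below, where each name listed in "input" (and each name an input lists in "output") is assumed to resolve to an object defined elsewhere in the file. Treat this as a guess at the overall shape and check example.json for the authoritative structure:

```json
{
  "input": [ "sampleJDBCInput" ],
  "sampleJDBCInput": {
    "description": "sample JDBC input",
    "probemodule": { "module": "cooldbprobe", "name": "DatabaseProbe" },
    "interval": 60,
    "url": "jdbc:mysql://localhost:3306/mysql",
    "driverClassName": "com.mysql.jdbc.Driver",
    "username": "suchuser",
    "password": "suchpassword",
    "sql": [ { "id": "sample", "statement": "SELECT 1" } ],
    "output": [ "stdoutOutput" ]
  },
  "stdoutOutput": {
    "class": "stdout",
    "codec": "json_lines"
  }
}
```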
Refer to the example.json file; there you will find an example with many combinations of the above options.
That should suffice for almost everything you have in mind for recipes with probespawner.
Below is a sample classpath for using probespawner and a few JDBC drivers:
export CLASSPATH=/home/suchuser/opt/apache-tomcat-7.0.59/lib/tomcat-jdbc.jar:/home/suchuser/opt/apache-tomcat-7.0.59/bin/tomcat-juli.jar:/home/suchuser/var/lib/java/jyson-1.0.2/lib/jyson-1.0.2.jar:/home/suchuser/var/lib/java/mysql-connector-java-5.1.20-bin.jar:/home/suchuser/var/lib/java/sqljdbc_4.0/enu/sqljdbc4.jar:/home/suchuser/var/lib/java/sqljdbc_4.0/enu/sqljdbc.jar:/home/suchuser/opt/rabbitmq-java-client-bin-3.5.3/rabbitmq-client.jar
You can also invoke probespawner directly with jython:
jython probespawner.py --configuration=example.json
The shell script calls "find" in the CWD and adds all *.jar files to the classpath before calling jython:
probespawner.sh example.json
The batch file lists all jars in the "jars" directory and adds them to the classpath before calling jython:
probespawner.bat example.json
Just in case you want to develop a probe of your own, be warned: this is immature code with no support. Having said that, take a look at the "testprobe.py" file.
This is the minimum probe; the only thing it does is write the `{ 'test': 'yes' }` dictionary to the outputs configured in your JSON file.
To accomplish that, your class must override the "tick" method, which is called at every cycle, meaning every "interval" seconds as defined in your input.
The method "initialize" is another popular one to override; it's called before the probe starts cycling. This is usually where you read the configuration keys from the JSON file and set up and initialize your probe.
username = self.getInputProperty("username")
This grabs the "username" content of your defined input in the JSON file and returns its value (or None).
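To make the tick/initialize/getInputProperty pattern concrete, here is a runnable sketch of a minimal probe. The real base class lives in dummyprobe.py; the stub below only mimics the hooks discussed above so the example runs standalone, and the `sendDocument` method name is an assumption made for illustration - check dummyprobe.py for the actual API:

```python
class DummyProbeStub(object):
    """Stand-in for dummyprobe's base class (for illustration only)."""
    def __init__(self, input_config):
        self.input = input_config
        self.documents = []  # stands in for the configured outputs

    def getInputProperty(self, key):
        # Returns the named key from this input's JSON config, or None
        return self.input.get(key)

    def sendDocument(self, doc):
        # The real base class would ship this to the configured outputs
        # (Elasticsearch, RabbitMQ, stdout, ...)
        self.documents.append(doc)


class GreetingProbe(DummyProbeStub):
    """Minimal probe: reads a config key once, emits a dict every tick."""
    def initialize(self):
        # Called once, before the probe starts cycling
        self.username = self.getInputProperty("username")

    def tick(self):
        # Called every "interval" seconds
        self.sendDocument({"test": "yes", "username": self.username})


probe = GreetingProbe({"username": "suchuser", "interval": 60})
probe.initialize()
probe.tick()
print(probe.documents[0])  # {'test': 'yes', 'username': 'suchuser'}
```

In the real thing, probespawner instantiates your class from the "probemodule" dictionary of the input and drives the cycling for you, so you only supply the two overridden methods.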