StreamSets Data Collector - Continuous big data and cloud platform ingest infrastructure
aerospike-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
apache-kafka_0_10-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
apache-kafka_0_11-lib SDC-9627: Upgrade Kafka dependency in 1.0 and 0.11 stage libraries Aug 2, 2018
apache-kafka_0_8_1-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
apache-kafka_0_8_2-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
apache-kafka_0_9-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
apache-kafka_1_0-lib SDC-9627: Upgrade Kafka dependency in 1.0 and 0.11 stage libraries Aug 2, 2018
apache-kafka_1_1-lib SDC-9489. Add stage library for Kafka 1.1.0 Jul 25, 2018
apache-kudu_1_0-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
apache-kudu_1_1-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
apache-kudu_1_2-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
apache-kudu_1_3-lib SDC-9600: Update StageLib for Kudu 1.3 and 1.7 to latest patch release Jul 24, 2018
apache-kudu_1_4-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
apache-kudu_1_5-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
apache-kudu_1_6-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
apache-kudu_1_7-lib SDC-9600: Update StageLib for Kudu 1.3 and 1.7 to latest patch release Jul 24, 2018
apache-pulsar_2-lib SDC-9711. Add support for Pulsar 2.1.0 Aug 6, 2018
apache-solr_6_1_0-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
aws-lib SDC-9779. S3 origin incorrectly validate saved offset for Excel data … Aug 16, 2018
azure-keyvault-credentialstore-lib SDC-9648. Library label not displayed for streamsets-datacollector-az… Jul 30, 2018
azure-lib SDC-9720: Populate hide stage field in stage definitions for error st… Aug 7, 2018
basic-lib SDC-7620 Fix typo/grammar in Field Flattener description for Remove F… Aug 15, 2018
bigtable-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
bootstrap Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cassandra-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cassandra_3-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cdh-spark_2_1-lib SDC-9629. HDP libs have incorrect driver jar set in stage lib properties Jul 26, 2018
cdh-spark_2_2-lib SDC-9629. HDP libs have incorrect driver jar set in stage lib properties Jul 26, 2018
cdh-spark_2_3-lib SDC-9629. HDP libs have incorrect driver jar set in stage lib properties Jul 26, 2018
cdh_5_10-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cdh_5_11-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cdh_5_12-lib SDC-9714. Conflicting Jackson dependencies in cdh/hdp stage libs Aug 8, 2018
cdh_5_13-lib SDC-9714. Conflicting Jackson dependencies in cdh/hdp stage libs Aug 8, 2018
cdh_5_14-lib SDC-9714. Conflicting Jackson dependencies in cdh/hdp stage libs Aug 8, 2018
cdh_5_15-lib SDC-9714. Conflicting Jackson dependencies in cdh/hdp stage libs Aug 8, 2018
cdh_5_2-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cdh_5_3-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cdh_5_4-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cdh_5_5-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cdh_5_7-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cdh_5_8-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cdh_5_9-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cdh_6_0-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cdh_kafka_1_2-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cdh_kafka_1_3-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cdh_kafka_2_0-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cdh_kafka_2_1-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cdh_kafka_3_0-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cdh_spark_2_1_r1-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cli Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
client-api SDC-9557: Failure Snapshot: Add boolean failure flag to SnapshotInfo Aug 2, 2018
cloudera-integration SDC-9575: SCH Registration: Update CSD to use the new command line tool Aug 2, 2018
cluster-bootstrap-api Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cluster-bootstrap Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cluster-common Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cluster-hdfs-protolib SDC-9639. Upgrade from older version of SDC fails hadoop fs origin stage Jul 27, 2018
cluster-kafka-protolib SDC-9676. ClusterKafkaConsumer needs to close KafkaConsumer before sp… Aug 1, 2018
common-ui SDC-9758. Usage Statistics modal dialog not sending correct checkbox … Aug 13, 2018
common SDC-9706. Delimited files with null header values causes data to get … Aug 3, 2018
commonlib SDC-9668. Automatically remove byte order mark (BOM) from XML files Jul 31, 2018
container-common SDC-9621 expression language function to split a string by some delim… Jul 30, 2018
container SDC-9673: SDC UI displays logs for unrelated pipelines Aug 16, 2018
couchbase-protolib SDC-9713: Couchbase destination: minor label edits Aug 6, 2018
couchbase_5-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
cyberark-credentialstore-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
datacollector-ui SDC-9828. UI: Add word wrap to description field in General pane Aug 18, 2018
dataformats-lib SDC-9519: Validation exception -- Amazon S3 destination for Avro data… Jul 19, 2018
dev-lib SDC-9709. Add Edge execution mode to Dev Data Generator origin Aug 6, 2018
dev-support SDC-9524.Updating pom files to support global version update using ve… Jul 19, 2018
dir-spooler-protolib SDC-9275. Directory Origin fails with read order set to Timestamp Jul 17, 2018
dist SDC-9783. Update release pom to use zip for windows SDCE binaries pt2 Aug 16, 2018
docs [doc] updated 3.4.x help Aug 8, 2018
e2e-tests Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
elasticsearch-protolib SDC-9720: Populate hide stage field in stage definitions for error st… Aug 7, 2018
elasticsearch_5-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
emr-protolib SDC-9363. EMR cluster: enable logging should be configurable Jul 18, 2018
emr_hadoop_2_8_3-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
flume-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
google-cloud-lib SDC-9720: Populate hide stage field in stage definitions for error st… Aug 7, 2018
google-common Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
groovy-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
groovy_2_4-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
guavasupport SDC-9440: Remove runner id from metric names Jul 24, 2018
hadoop-common Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
hbase-protolib SDC-9520. Fix Hbase zookeeperQuorum validation Jul 18, 2018
hdfs-protolib SDC-9653. Migrate HDFS Executor miniIT tests Aug 13, 2018
hdp-stagelib-base Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
hdp_2_2-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
hdp_2_3-hive1-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
hdp_2_3-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
hdp_2_4-hive1-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
hdp_2_4-lib SDC-9714. Conflicting Jackson dependencies in cdh/hdp stage libs Aug 8, 2018
hdp_2_5-flume-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
hdp_2_5-lib SDC-9714. Conflicting Jackson dependencies in cdh/hdp stage libs Aug 8, 2018
hdp_2_6-flume-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
hdp_2_6-hive2-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
hdp_2_6-lib SDC-9714. Conflicting Jackson dependencies in cdh/hdp stage libs Aug 8, 2018
hdp_2_6_1-hive1-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
hdp_2_6_2-hive1-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
hive-protolib SDC-9562: Hive components should explicitly expose username and password Jul 24, 2018
httpcommonlib SDC-9430. NPE in parsing in HTTP Client Jul 31, 2018
influxdb_0_9-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
integration-testing SDC-6435: SDC source code incorrectly says it's licensed to the ASF Jun 13, 2017
jdbc-lib SDC-9710: Timestamp(9) not woking correctly Aug 15, 2018
jks-credentialstore-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
jms-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
json-dto Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
jython-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
jython_2_7-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
kafka-common Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
kafka_multisource-0_10-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
kafka_multisource-0_9-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
kafka_multisource-protolib SDC-9553. Support LineageEvents in Kafka Producer and multitopic cons… Jul 24, 2018
kafka_source-protolib SDC-6371. Runtime confs used by driver not evaluated in cluster mode Jul 18, 2018
kafka_target-protolib SDC-9720: Populate hide stage field in stage definitions for error st… Aug 7, 2018
kinesis-lib SDC-9720: Populate hide stage field in stage definitions for error st… Aug 7, 2018
kinetica-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
kinetica_6_0-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
kinetica_6_1-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
kudu-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
lookup-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
mapr-cluster-bootstrap Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
mapr-common Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
mapr_5_0-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
mapr_5_1-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
mapr_5_2-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
mapr_6_0-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
mapr_6_0-mep4-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
mapr_json-5_2-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
mapr_json-6_0-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
mapr_json-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
mapr_spark_2_1_mep_3_0-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
maprdb-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
mapreduce-protolib SDC-9718. mapReduce pom clean up Aug 8, 2018
maprfs-protolib SDC-8751 maximize the space taken by stage library icons Jul 19, 2018
maprstreams-common Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
maprstreams-multisource-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
maprstreams-source-protolib SDC-8751 maximize the space taken by stage library icons Jul 19, 2018
maprstreams-target-protolib SDC-9720: Populate hide stage field in stage definitions for error st… Aug 7, 2018
mesos-bootstrap Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
messaging-client Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
miniIT Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
miniSDC Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
mongodb-protolib SDC-8751 maximize the space taken by stage library icons Jul 19, 2018
mongodb_3-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
mysql-binlog-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
net-commonlib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
omniture-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
pulsar-protolib SDC-9760. Correct Capital letters in some Pulsar configuration labels Aug 14, 2018
rabbitmq-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
rbgen-maven-plugin Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
redis-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
release SDC-9780. Update release pom to use zip for windows SDCE binaries Aug 16, 2018
root-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
root-proto SDC-9524.Updating pom files to support global version update using ve… Jul 19, 2018
root Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
rpm Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
salesforce-lib SDC-9762. NPE in Salesforce Origin with subquery Aug 15, 2018
scripting-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
sdc-kafka-api Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
sdc-kafka_0_10 Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
sdc-kafka_0_11 Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
sdc-kafka_0_8 Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
sdc-kafka_0_9-common Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
sdc-kafka_0_9 Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
sdc-kafka_0_9_mapr_5_1 Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
sdc-kafka_0_9_mapr_5_2 Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
sdc-kafka_1_0 Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
sdc-solr-api Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
sdc-solr_6 Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
sdc-solr_cdh_4 Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
sdk Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
solr-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
spark-executor-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
spark-processor-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
sso Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
stage-lib-archetype Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
stage-lib-parent Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
stagesupport SDC-9531. Implement Tensorflow Processor - Data Collector Jul 22, 2018
stats-lib SDC-9561. System pipeline goes into fail/retry loop Jul 20, 2018
tensorflow-lib SDC-9766. TensorFlow Processor - validate field path exists or not an… Aug 16, 2018
testing Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
utils Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
vault-credentialstore-lib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
wholefile-converter-protolib Updated version to 3.5.0-SNAPSHOT Jul 17, 2018
wholefile-transformer-lib SDC-9544. Whole File Transformer - Rename the temporary directory pro… Jul 23, 2018
windows-lib SDC-9692. Windows Event Log - Add new log name type custom Aug 2, 2018
.gitignore SDC-8467. Update StageContext to satisfy API-178 Feb 22, 2018
BUILD.md SDC-9586. Build fails in GRUNT TEST phase Jul 26, 2018
CONTRIBUTING.md SDC-2077. Fix Contributor License Agreement link in datacollector/CON… Nov 25, 2015
LICENSE.txt SDC-6455: Remove license from license.md file Jun 14, 2017
NOTICE.txt SDC-8618. Update NOTICE.txt Mar 21, 2018
README.md SDC-6913. Update readme.md file with changelog link to Docs whats new… Jul 29, 2017
datacollector_splash.png SDC-1765. Cleaning up markdown files and testing pull request workflow. Sep 14, 2015
dependency-check-suppression.xml SDC-6435: SDC source code incorrectly says it's licensed to the ASF Jun 13, 2017
pom.xml SDC-9489. Add stage library for Kafka 1.1.0 Jul 25, 2018

README.md

What is StreamSets Data Collector?

StreamSets Data Collector is an enterprise-grade, open source, continuous big data ingestion infrastructure. It has an advanced, easy-to-use user interface that lets data scientists, developers, and data infrastructure teams create data pipelines in a fraction of the time typically required for complex ingest scenarios. Out of the box, StreamSets Data Collector reads from and writes to a large number of endpoints, including S3, JDBC, Hadoop, Kafka, Cassandra, and many others. You can use Python, JavaScript, and Java Expression Language, in addition to a large number of pre-built stages, to transform and process the data on the fly. For fault tolerance and scale-out, you can set up data pipelines in cluster mode and perform fine-grained monitoring at every stage of the pipeline.
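
As a rough illustration of the on-the-fly transformation described above, the following standalone Python sketch mimics what a scripting stage does: it loops over a batch of records, transforms each one, and routes failures to an error sink rather than failing the whole batch. The `Record`, `Sink`, and `process_batch` names are hypothetical stand-ins written for this example, not the Data Collector API; see the product documentation for the actual scripting interface.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Record:
    """Hypothetical stand-in for a pipeline record (not the real SDC type)."""
    value: Dict[str, Any]


@dataclass
class Sink:
    """Hypothetical collector for records sent downstream or to error handling."""
    items: List[Tuple[Record, str]] = field(default_factory=list)

    def write(self, record: Record, message: str = "") -> None:
        self.items.append((record, message))


def process_batch(records: List[Record], output: Sink, error: Sink) -> None:
    """Normalize the 'name' field of each record; send bad records to `error`."""
    for record in records:
        try:
            record.value["name"] = record.value["name"].strip().title()
            output.write(record)
        except (KeyError, AttributeError) as exc:
            # Missing or non-string 'name': route the record to the error sink
            # so the rest of the batch can still be processed.
            error.write(record, repr(exc))


# Example batch: one well-formed record and one without a 'name' field.
batch = [Record({"name": "  ada lovelace "}), Record({"id": 7})]
output, error = Sink(), Sink()
process_batch(batch, output, error)
```

After the call, `output.items` holds the normalized record and `error.items` holds the record that lacked a `name` field, together with the reason it was rejected.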

To learn more, check out http://streamsets.com

License

StreamSets Data Collector is built on open source technologies. Our code is licensed under the Apache License 2.0.

Getting Help

A good place to start is http://streamsets.com/community. On that page you will find all the ways you can reach us and the channels our team monitors. You can post questions on the Google Groups list sdc-user or on StackExchange using the tag #StreamSets. Report bugs at http://issues.streamsets.com or tweet at us with #StreamSets.

If you need help with production systems, you can check out the variety of support options offered on our support page.

Contributing code

We welcome contributors. Please check out our guidelines to get started.

Changelog

See the latest changelog