StreamSets Data Collector - Continuous big data and cloud platform ingest infrastructure
aerospike-lib
apache-kafka_0_10-lib
apache-kafka_0_11-lib
apache-kafka_0_8_1-lib
apache-kafka_0_8_2-lib
apache-kafka_0_9-lib
apache-kafka_1_0-lib
apache-kafka_1_1-lib
apache-kafka_2_0-lib
apache-kudu_1_0-lib
apache-kudu_1_1-lib
apache-kudu_1_2-lib
apache-kudu_1_3-lib
apache-kudu_1_4-lib
apache-kudu_1_5-lib
apache-kudu_1_6-lib
apache-kudu_1_7-lib
apache-pulsar_2-lib
apache-solr_6_1_0-lib
aws-lib
aws-secrets-manager-credentialstore-lib
aws-support
azure-keyvault-credentialstore-lib
azure-lib
basic-lib
bigtable-lib
bootstrap
cassandra-protolib
cassandra_3-lib
cdh-spark_2_1-lib
cdh-spark_2_2-lib
cdh-spark_2_3-lib
cdh_5_10-lib
cdh_5_11-lib
cdh_5_12-lib
cdh_5_13-lib
cdh_5_14-lib
cdh_5_15-lib
cdh_5_2-lib
cdh_5_3-lib
cdh_5_4-lib
cdh_5_5-lib
cdh_5_7-lib
cdh_5_8-lib
cdh_5_9-lib
cdh_6_0-lib
cdh_kafka_1_2-lib
cdh_kafka_1_3-lib
cdh_kafka_2_0-lib
cdh_kafka_2_1-lib
cdh_kafka_3_0-lib
cdh_kafka_3_1-lib
cdh_spark_2_1_r1-lib
cli
client-api
cloudera-integration
cluster-bootstrap-api
cluster-bootstrap
cluster-common
cluster-hdfs-protolib
cluster-kafka-protolib
common
commonlib
container-common
container
couchbase-protolib
couchbase_5-lib
crypto-lib
cyberark-credentialstore-lib
databricks-ml-protolib
databricks-ml_2-lib
datacollector-ui
dataformats-lib
dev-lib
dev-support
dir-spooler-protolib
dist
docs
e2e-tests
elasticsearch-protolib
elasticsearch_5-lib
emr-protolib
emr_hadoop_2_8_3-lib
flume-protolib
google-cloud-lib
google-common
groovy-protolib
groovy_2_4-lib
guavasupport
hadoop-common
hbase-protolib
hdfs-protolib
hdp-stagelib-base
hdp_2_2-lib
hdp_2_3-hive1-lib
hdp_2_3-lib
hdp_2_4-hive1-lib
hdp_2_4-lib
hdp_2_5-flume-lib
hdp_2_5-lib
hdp_2_6-flume-lib
hdp_2_6-hive2-lib
hdp_2_6-lib
hdp_2_6_1-hive1-lib
hdp_2_6_2-hive1-lib
hive-protolib
httpcommonlib
influxdb_0_9-lib
integration-testing
jdbc-lib
jdbc-protolib
jks-credentialstore-lib
jms-lib
json-dto
jython-protolib
jython_2_7-lib
kafka-common
kafka_multisource-0_10-protolib
kafka_multisource-0_9-protolib
kafka_multisource-protolib
kafka_source-protolib
kafka_target-protolib
kinesis-lib
kinetica-protolib
kinetica_6_0-lib
kinetica_6_1-lib
kinetica_6_2-lib
kudu-protolib
lookup-protolib
mapr-cluster-bootstrap
mapr-cluster-bootstrap_2_2
mapr-common
mapr_5_0-lib
mapr_5_1-lib
mapr_5_2-lib
mapr_6_0-lib
mapr_6_0-mep4-lib
mapr_6_0-mep5-lib
mapr_json-5_2-protolib
mapr_json-6_0-protolib
mapr_json-protolib
mapr_spark_2_1_mep_3_0-lib
maprdb-protolib
mapreduce-protolib
maprfs-protolib
maprstreams-common
maprstreams-multisource-protolib
maprstreams-source-protolib
maprstreams-target-protolib
mesos-bootstrap
messaging-client
miniSDC
mleap-lib
mongodb-protolib
mongodb_3-lib
mysql-binlog-lib
net-commonlib
omniture-lib
pulsar-protolib
rabbitmq-lib
rbgen-maven-plugin
redis-lib
release
root-lib
root-proto
root
rpm
salesforce-lib
scripting-protolib
sdc-hbase-0_98
sdc-hbase-2_0
sdc-hbase-api
sdc-kafka-api
sdc-kafka_0_10
sdc-kafka_0_11-common
sdc-kafka_0_11
sdc-kafka_0_8
sdc-kafka_0_9-common
sdc-kafka_0_9
sdc-kafka_0_9_mapr_5_1
sdc-kafka_0_9_mapr_5_2
sdc-kafka_1_0
sdc-solr-api
sdc-solr_6
sdc-solr_7
sdc-solr_cdh_4
sdk
solr-protolib
spark-executor-protolib
spark-processor-protolib
sso
stage-lib-archetype
stage-lib-parent
stagesupport
stats-lib
tensorflow-lib
testing
utils
vault-credentialstore-lib
wholefile-converter-protolib
wholefile-transformer-lib
windows-lib
.gitignore
BUILD.md
CONTRIBUTING.md
LICENSE.txt
NOTICE.txt
README.md
datacollector_splash.png
dependency-check-suppression.xml
pom.xml

README.md

What is StreamSets Data Collector?

StreamSets Data Collector is an enterprise-grade, open source infrastructure for continuous big data ingestion. Its advanced, easy-to-use user interface lets data scientists, developers, and data infrastructure teams create data pipelines in a fraction of the time typically required for complex ingest scenarios. Out of the box, StreamSets Data Collector reads from and writes to a large number of endpoints, including S3, JDBC, Hadoop, Kafka, Cassandra, and many others. You can use Python, JavaScript, and Java Expression Language, in addition to a large number of pre-built stages, to transform and process data on the fly. For fault tolerance and scale-out, you can set up data pipelines in cluster mode and perform fine-grained monitoring at every stage of the pipeline.

To learn more, check out http://streamsets.com

Building StreamSets Data Collector

To build StreamSets Data Collector from source code, see BUILD.md for details.

License

StreamSets Data Collector is built on open source technologies. Our code is licensed under the Apache License 2.0.

Getting Help

A good place to start is http://streamsets.com/community. That page lists all the ways you can reach us and the channels our team monitors. You can post questions on the sdc-user Google Group or on StackExchange using the tag #StreamSets. Report bugs at http://issues.streamsets.com or tweet at us with #StreamSets.

If you need help with production systems, see the support options offered on our support page.

Contributing Code

We welcome contributors. Please check out our guidelines to get started.

Changelog

See the latest changelog