Gobblin is a distributed big data integration framework (ingestion, replication, compliance, retention) for batch and streaming systems. Gobblin features integrations with Apache Hadoop, Apache Kafka, Salesforce, S3, MySQL, Google, etc.
Latest commit 044073e Jun 26, 2017 @arjun4084346 arjun4084346 committed with htran1 adding delimited identifier to column names in sql queries (#1964)
* adding back tick to column names so it can execute sql query which has sql keywords as column names
* adding different delimited identifiers for different sql systems
bin Need to clean up the locks directory if gobblin cannot be stopped gra… Apr 26, 2017
buildSrc/src/main/groovy/gobblin/gradle Adding java8 build support; build.gradle clean up (#1528) Jan 12, 2017
conf Adding a custom rebalancer GobblinJobRebalancer for long-running tasks ( May 2, 2017
gobblin-admin Changed license to Apache 2.0 in source files for incubation Jan 6, 2017
gobblin-api adding delimited identifier to column names in sql queries (#1964) Jun 27, 2017
gobblin-audit Changed license to Apache 2.0 in source files for incubation Jan 6, 2017
gobblin-aws Set the job.id and task.id in the MDC for improved logging (#967) Mar 9, 2017
gobblin-cluster Test fix for GobblinClusterKillTest to work with new output directory… Jun 26, 2017
gobblin-compaction Add AsyncHttpJoinConverter (#1965) Jun 24, 2017
gobblin-config-management Implemented Job-level white/black list for replication (#1975) Jun 26, 2017
gobblin-core-base Add AsyncHttpJoinConverter (#1965) Jun 24, 2017
gobblin-core adding delimited identifier to column names in sql queries (#1964) Jun 27, 2017
gobblin-data-management Implemented Job-level white/black list for replication (#1975) Jun 26, 2017
gobblin-distribution Refactor the Crypto, Codec, and Metadata concepts out into their own Mar 10, 2017
gobblin-docker gobblin-docker files for 0.9.0 and 0.10.0 release (#1898) Jun 2, 2017
gobblin-docs update case study to latest version of code (#1902) Jun 12, 2017
gobblin-example Added a test file based source and max files to pull to all file base… Jun 19, 2017
gobblin-hive-registration Added additional timers to kafka source and Hive publisher. (#1957) Jun 19, 2017
gobblin-metastore Support shorten dataset state store name Apr 13, 2017
gobblin-metrics-libs adding more information in events for gobblintrackingevent_distcp_ng (#… Jun 12, 2017
gobblin-modules Fix R2 DataMap conversion from nested objects in a json string (#1977) Jun 26, 2017
gobblin-oozie/src/test/resources/local Changed license to Apache 2.0 in source files for incubation Jan 6, 2017
gobblin-rest-service Upgrade pegasus dependency (#1928) Jun 8, 2017
gobblin-restli Upgrade pegasus dependency (#1928) Jun 8, 2017
gobblin-runtime-hadoop Adding FileBasedJobLockFactory and FileBasedJobLockFactoryManager (#1513 Jan 12, 2017
gobblin-runtime Fix TextFileBasedSourceTest (#1971) Jun 25, 2017
gobblin-salesforce Add salesorce day based dynamic partitioning (#1762) Apr 11, 2017
gobblin-service Added Service Metric name constants class May 5, 2017
gobblin-test-harness Move integration test into gobblin-test-harness Jun 12, 2017
gobblin-test-utils Add the ability to pass a seed to the TestCredentialStore in case we Jun 8, 2017
gobblin-test/resource Changed license to Apache 2.0 in source files for incubation Jan 6, 2017
gobblin-tunnel Set the job.id and task.id in the MDC for improved logging (#967) Mar 9, 2017
gobblin-utility Http join converter and broker fixes (#1963) Jun 20, 2017
gobblin-yarn Fix for #1598 (#1864): Fixed NPE when Yarn container is killed. Jun 7, 2017
gradle Add AsyncHttpJoinConverter (#1965) Jun 24, 2017
ligradle/findbugs Changed license to Apache 2.0 in source files for incubation Jan 6, 2017
maven-sonatype Changed license to Apache 2.0 in source files for incubation Jan 6, 2017
travis Allow building of Java8 modules (#1556) Jan 18, 2017
.gitignore Ignore metastore_db and mock-couchbase in git (#1558) Jan 18, 2017
.travis.yml Upgrade to java 8. (#1842) May 17, 2017
CHANGELOG.md Release 0.10.0 (#1820) May 5, 2017
LICENSE Add License/Notice/Readme.md according to Linkedin Open Source Requir… Nov 21, 2014
NOTICE Changed license to Apache 2.0 in source files for incubation Jan 6, 2017
README.md Updated Readme to include Gitter link Jan 12, 2017
build.gradle Upgrade pegasus dependency (#1928) Jun 8, 2017
defaultEnvironment.gradle Changed license to Apache 2.0 in source files for incubation Jan 6, 2017
gobblin-flavored-build.gradle Changed license to Apache 2.0 in source files for incubation Jan 6, 2017
gradle.properties Changed license to Apache 2.0 in source files for incubation Jan 6, 2017
gradlew Changed license to Apache 2.0 in source files for incubation Jan 6, 2017
gradlew.bat Changed license to Apache 2.0 in source files for incubation Jan 6, 2017
mkdocs.yml RestApiExtractor documentation (#1683) Mar 15, 2017
query_github_issues.py Formatting fix Jan 12, 2017
readthedocs.yml Initial commit for mkdocs and readthedocs integration Mar 9, 2016
settings.gradle Adding service module Jan 31, 2017

README.md

Gobblin

Gobblin is a universal data ingestion framework for extracting, transforming, and loading large volumes of data from a variety of data sources (e.g., databases, REST APIs, FTP/SFTP servers, and filers) onto Hadoop. Gobblin handles the routine tasks common to all data ingestion ETLs, including job/task scheduling, task partitioning, error handling, state management, data quality checking, and data publishing. Gobblin ingests data from different data sources in the same execution framework and manages the metadata of different sources in one place. This, combined with other features such as auto scalability, fault tolerance, data quality assurance, extensibility, and the ability to handle data model evolution, makes Gobblin an easy-to-use, self-service, and efficient data ingestion framework.
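
To make the extract, transform, quality-check, and publish flow described above more concrete, here is a minimal, self-contained Java sketch. It is only an illustration of the general idea: `IngestionSketch`, `RecordSource`, and `runTask` are hypothetical names invented for this example and are not Gobblin classes or APIs. Refer to the Gobblin documentation for the actual constructs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Illustrative sketch only: none of these types belong to Gobblin.
public class IngestionSketch {

    /** Hypothetical stand-in for a data source that yields raw records. */
    public interface RecordSource<T> {
        List<T> extract();
    }

    /**
     * Runs one in-memory "task": extract each record, convert it,
     * keep it only if it passes the quality check, and return the
     * records that would be published.
     */
    public static <S, D> List<D> runTask(RecordSource<S> source,
                                         Function<S, D> converter,
                                         Predicate<D> qualityCheck) {
        List<D> published = new ArrayList<>();
        for (S raw : source.extract()) {
            D converted = converter.apply(raw);   // transform step
            if (qualityCheck.test(converted)) {   // data quality check
                published.add(converted);         // publish step
            }
        }
        return published;
    }

    public static void main(String[] args) {
        // Assumed toy input: three raw strings, one of them empty.
        RecordSource<String> source = () -> Arrays.asList(" alice ", "", "bob");
        List<String> published = runTask(source, String::trim, s -> !s.isEmpty());
        System.out.println(published); // prints [alice, bob]
    }
}
```

The sketch keeps everything in a single thread and in memory. Scheduling, partitioning, state management, and fault tolerance, which the paragraph above lists as framework responsibilities, are deliberately left out.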

Quick Links

  • Documentation: Check out the Gobblin documentation for a complete description of Gobblin's features
  • Powered By: Check out the list of companies known to use Gobblin
  • Architecture: The Gobblin Architecture page has a full explanation of Gobblin's architecture
  • Getting Started with Gobblin: Refer to the Getting Started Guide on how to get started with Gobblin
  • Building Gobblin: Refer to the page Building Gobblin for directions on how to build Gobblin
  • Javadocs: The full JavaDocs for each released version of Gobblin can be found here
  • Gobblin chat room: Gitter chat room for Gobblin developers and users here