Spark ingestion tool for Big Data, written in Scala.
Build tool: SBT
Requires Spark 1.4+
Version 0.1 - under development
- Scala - an object-oriented meets functional programming language.
- SBT - the interactive build tool for Scala.
- ScalaTest - a testing framework whose central concept is the suite, a collection of zero to many tests.
- Scalastyle - examines Scala code and flags potential problems with it.
- Apache Spark - a fast and general engine for large-scale data processing.
- Travis CI - a hosted, distributed continuous integration service used to build and test software projects.
Run the script compiler.sh:
cd big-shipper
./compiler.sh
This script generates target/scala-[version, e.g. 2.10]/BigShipper-assembly-0.1.jar with all dependencies embedded.
spark-submit --class main.Shipper target/scala-[version, like: 2.10]/BigShipper-assembly-0.1.jar -c /path_to/config.json --loglevel debug
{
  "SOURCE": {
    "TYPE": "delimitedfile",
    "FIELDS": [
      { "NAME": "field1", "TYPE": "int" },
      { "NAME": "field2", "TYPE": "string" },
      { "NAME": "field3", "TYPE": "decimal" }
    ],
    "DELIMITER_RAW": "|",
    "DIR_RAW_FILES": "/user/NAME/data_201703{2[7-9],3[0-1]}.txt"
  },
  "TARGET": {
    "TYPE": "hive",
    "ACTION": "append",
    "HIVE_TABLE": "table_name_here"
  }
}
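For illustration, the SOURCE.FIELDS entries of such a config could be mapped to a Spark SQL DDL-style schema string along these lines (the Field case class and toDdlSchema helper are hypothetical sketches, not part of Big Shipper):

```scala
// Hypothetical helper (not part of Big Shipper): turn the NAME/TYPE pairs
// from SOURCE.FIELDS into a Spark SQL DDL-style schema string.
case class Field(name: String, fieldType: String)

def toDdlSchema(fields: Seq[Field]): String =
  fields.map(f => s"${f.name} ${f.fieldType.toUpperCase}").mkString(", ")

// The three fields from the example config above:
val fields = Seq(
  Field("field1", "int"),
  Field("field2", "string"),
  Field("field3", "decimal")
)

println(toDdlSchema(fields)) // field1 INT, field2 STRING, field3 DECIMAL
```

A string in this shape can then be handed to Spark's schema APIs when registering the target table.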
SOURCE.TYPE: Type of the source file(s). Values: [delimitedfile, json]
SOURCE.FIELDS.TYPE: Data type for each field. Values: [bigint, int, smallint, tinyint, double, decimal, float, byte, string, date, timestamp, boolean]
SOURCE.DIR_RAW_FILES: HDFS path with a glob pattern to select files, or a local path starting with file://.
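The brace pattern in the example config selects files for 2017-03-27 through 2017-03-31. One way to sanity-check such a pattern is to express it as an equivalent regular expression in plain Scala (illustrative only; this is not how Hadoop itself expands globs):

```scala
// Regex equivalent of the glob data_201703{2[7-9],3[0-1]}.txt:
// matches day 27-29 or 30-31 of March 2017.
val dayPattern = """data_201703(2[7-9]|3[01])\.txt""".r

// Candidate filenames for days 25 through 31:
val candidates = (25 to 31).map(d => s"data_201703$d.txt")
val matched = candidates.filter(f => dayPattern.pattern.matcher(f).matches)

println(matched.mkString(", ")) // prints the five filenames for days 27-31
```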
MIT License