hadoop-connectors

Apache Hadoop connectors for Pravega.

Description

An implementation of a Hadoop input format for Pravega (with wordcount examples). It uses the Pravega batch client to read all existing events in parallel.

Build

The build script handles Pravega as a source dependency, meaning that the connector is linked to a specific commit of Pravega (as opposed to a specific release version) in order to facilitate co-development. This is accomplished with a combination of a git submodule and Gradle's composite build feature.

Cloning the repository

When cloning the connector repository, be sure to instruct git to check out submodules recursively, e.g.:

git clone --recurse-submodules https://github.com/pravega/hadoop-connectors.git

To update an existing repository:

git submodule update --init --recursive

Building Pravega

Pravega is built automatically by the connector build script.

Building Hadoop Connector

Build the connector:

./gradlew build       # without dependencies
./gradlew shadowJar   # with dependencies

Test

./gradlew test

Usage

        Configuration conf = PravegaInputFormat.builder()
            .withScope("myScope")
            .forStream("myStream")
            .withURI("tcp://127.0.0.1:9090")
            .withDeserializer(io.pravega.client.stream.impl.JavaSerializer.class.getName())
            // Optional: set start and end positions.
            // Typically, the start positions are the end positions of the previous job,
            // so that only newly written events are processed. If no start positions
            // are set, reading starts from the very beginning of the stream.
            .startPositions(startPos)
            .endPositions(endPos)
            .build();

        Job job = Job.getInstance(conf);
        job.setInputFormatClass(PravegaInputFormat.class);

        // NOTE:
        // 1. You can also pass an existing job 'Configuration' instance to create the builder:
        //     "PravegaInputFormat.builder(conf)"
        // 2. The key class is 'EventKey', but you will rarely need it.
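
The comment about start and end positions describes a chaining pattern: each job begins where the previous one ended, so no event is processed twice. The sketch below illustrates that idea with a toy model only. It does not use the Pravega or Hadoop APIs: the "stream" is an in-memory list, a "position" is a plain integer offset, and the class and method names (IncrementalWordCountSketch, countRange) are hypothetical. In the connector itself, the positions are the values passed to startPositions and endPositions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a "stream" is an in-memory list and a "position" is an int offset.
// This is an assumption for illustration, not how Pravega represents positions.
public class IncrementalWordCountSketch {

    // Counts the words in events[start, end) and merges them into the running totals.
    public static void countRange(List<String> events, int start, int end,
                                  Map<String, Integer> totals) {
        for (String event : events.subList(start, end)) {
            for (String word : event.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    totals.merge(word, 1, Integer::sum);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> stream = new ArrayList<>();
        Map<String, Integer> totals = new HashMap<>();

        // Job 1: no start position is set, so it reads from the very beginning.
        stream.add("hello pravega");
        stream.add("hello hadoop");
        int end1 = stream.size();
        countRange(stream, 0, end1, totals);

        // A new event arrives after job 1 has finished.
        stream.add("hello again");

        // Job 2: starts at job 1's end position, so only the new event is processed.
        int end2 = stream.size();
        countRange(stream, end1, end2, totals);

        System.out.println(totals.get("hello"));   // 3
        System.out.println(totals.get("pravega")); // 1
    }
}
```

If job 2 started from offset 0 instead of end1, the first two events would be counted again and "hello" would come out as 5 rather than 3, which is the double processing that chaining the positions avoids.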
