Amazon Kinesis Connector Library
The Amazon Kinesis Connector Library helps Java developers integrate Amazon Kinesis with other AWS services. The current version of the library provides connectors for Amazon DynamoDB, Amazon Redshift, and Amazon S3. The library also includes sample connectors of each type, plus Apache Ant build files for running the samples.
- Amazon Kinesis Client Library: In order to use the Amazon Kinesis Connector Library, you'll also need the Amazon Kinesis Client Library.
- Java 1.7: The Amazon Kinesis Client Library requires Java 1.7 (Java SE 7) or later.
- SQL driver (Amazon Redshift only): If you're using an Amazon Redshift connector, you'll need a driver that will allow your SQL client to connect to your Amazon Redshift cluster. For more information, see Download the Client Tools and the Drivers in the Amazon Redshift Getting Started Guide.
Each Amazon Kinesis connector application is a pipeline that determines how records from an Amazon Kinesis stream will be handled. Records are retrieved from the stream, transformed according to a user-defined data model, buffered for batch processing, and then emitted to the appropriate AWS service.
A connector pipeline uses the following interfaces:
- IKinesisConnectorPipeline: The pipeline implementation itself.
- ITransformer: Defines how records from the Amazon Kinesis stream are transformed to fit the user-defined data model. Includes methods for custom serializers and deserializers.
- IFilter and IBuffer: IFilter defines a method for excluding irrelevant records from processing. IBuffer defines a system for buffering the set of records to be processed; it specifies limits on the number of records and on the total byte count.
- IEmitter: Defines a method that makes client calls to other AWS services and persists the records stored in the buffer. The records can also be sent to another Amazon Kinesis stream.
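The flow described above (transform, filter, buffer, emit) can be sketched roughly as follows. This is a minimal illustration only: the SketchTransformer, SketchFilter, SketchEmitter, and PipelineSketch types are hypothetical stand-ins and not the library's interfaces, whose signatures differ. The sketch flushes on record count only and leaves out the byte-count limit. It uses lambdas, so it assumes Java 8 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for ITransformer, IFilter, and IEmitter.
// The real interfaces in the library have different method signatures.
interface SketchTransformer<T> { T toClass(byte[] rawRecord); }
interface SketchFilter<T> { boolean keepRecord(T record); }
interface SketchEmitter<T> { void emit(List<T> batch); }

class PipelineSketch<T> {
    private final SketchTransformer<T> transformer;
    private final SketchFilter<T> filter;
    private final SketchEmitter<T> emitter;
    private final int maxRecords;                  // flush threshold (record count only)
    private final List<T> buffer = new ArrayList<>();

    PipelineSketch(SketchTransformer<T> transformer, SketchFilter<T> filter,
                   SketchEmitter<T> emitter, int maxRecords) {
        this.transformer = transformer;
        this.filter = filter;
        this.emitter = emitter;
        this.maxRecords = maxRecords;
    }

    // Transform the raw record, drop it if filtered out, buffer it otherwise,
    // and emit the batch once the buffer reaches its record limit.
    void process(byte[] rawRecord) {
        T record = transformer.toClass(rawRecord);
        if (!filter.keepRecord(record)) {
            return;
        }
        buffer.add(record);
        if (buffer.size() >= maxRecords) {
            flush();
        }
    }

    // Emit whatever is buffered (if anything) and clear the buffer.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        emitter.emit(new ArrayList<>(buffer));
        buffer.clear();
    }

    // Small demonstration: UTF-8 strings in, empty strings filtered, batches of two.
    static List<List<String>> demo() {
        List<List<String>> emitted = new ArrayList<>();
        PipelineSketch<String> pipeline = new PipelineSketch<String>(
                raw -> new String(raw, StandardCharsets.UTF_8),
                s -> !s.isEmpty(),
                emitted::add,
                2);
        for (String s : new String[] {"a", "", "b", "c"}) {
            pipeline.process(s.getBytes(StandardCharsets.UTF_8));
        }
        pipeline.flush(); // emit the partial final batch
        return emitted;
    }
}
```

In the library itself, this orchestration is handled for you by KinesisConnectorRecordProcessor; the sketch only shows the order in which the four roles are applied.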
Each connector depends on the implementation of KinesisConnectorRecordProcessor to manage the pipeline. The KinesisConnectorRecordProcessor class implements the IRecordProcessor interface in the Amazon Kinesis Client Library.
The library includes implementations for use with Amazon DynamoDB, Amazon Redshift, and Amazon S3. This section provides a few notes about each connector type. For full details, see the samples and the Javadoc.
- DynamoDBTransformer: Implement the fromClass method to map your data model to a format that's compatible with the AmazonDynamoDB client (Map<String,AttributeValue>).
- For more information on Amazon DynamoDB formats and putting items, see Working with Items Using the AWS SDK for Java Low-Level API in the Amazon DynamoDB Developer Guide.
- RedshiftTransformer: Implement the toDelimitedString method to output a delimited-string representation of your data model. The string must be compatible with an Amazon Redshift COPY command.
- For more information about Amazon Redshift copy operations and manifests, see COPY and Using a manifest to specify data files in the Amazon Redshift Developer Guide.
- S3Emitter: This class writes the buffer contents to a single file in Amazon S3. The file name is determined by the Amazon Kinesis sequence numbers of the first and last records in the buffer. For more information about sequence numbers, see Add Data to a Stream in the Amazon Kinesis Developer Guide.
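As a rough illustration of two of the notes above, the following sketch shows what a delimited-string conversion and a sequence-number-based file name might look like. The User class, the newline record terminator, and the exact "first-last" file name format are assumptions made for this example; check the sample transformers and the S3Emitter source for the actual behavior. The sketch does not escape delimiter characters that appear inside field values.

```java
class ConnectorOutputSketch {

    // Hypothetical data model, used only for this illustration.
    static final class User {
        final int userId;
        final String name;
        final String city;

        User(int userId, String name, String city) {
            this.userId = userId;
            this.name = name;
            this.city = city;
        }
    }

    // Builds one delimited line per record, in a fixed column order that would
    // have to match the column order of the target Amazon Redshift table.
    static String toDelimitedString(User user, char delimiter) {
        return new StringBuilder()
                .append(user.userId).append(delimiter)
                .append(user.name).append(delimiter)
                .append(user.city).append('\n')
                .toString();
    }

    // Derives a file name from the sequence numbers of the first and last
    // records in the buffer. The "-" separator is an assumption.
    static String s3FileName(String firstSequenceNumber, String lastSequenceNumber) {
        return firstSequenceNumber + "-" + lastSequenceNumber;
    }
}
```

Naming each file after the sequence-number range it covers makes it possible to tell which portion of the stream a given file contains.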
Set the following variables (common to all connector types) in kinesis.connectors.KinesisConnectorConfiguration:
- AWSCredentialsProvider: Specify the implementation of AWSCredentialsProvider that supplies your AWS credentials.
- APP_NAME: The Amazon Kinesis application name (not the connector application name) for use with kinesis.clientlibrary.lib.worker.KinesisClientLibConfiguration. For more information, see Developing Record Consumer Applications in the Amazon Kinesis Developer Guide.
- KINESIS_ENDPOINT and KINESIS_INPUT_STREAM: The endpoint and name of the Amazon Kinesis stream that contains the data you're connecting to other AWS services.
Service-specific configuration variables are set in the respective emitter implementations (e.g., kinesis.connectors.dynamodb.DynamoDBEmitter).
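One way to picture the common settings is as key-value properties that are read and validated at startup. The sketch below is an assumption-laden illustration: the ConnectorConfigSketch class is hypothetical, and the property keys (appName, kinesisEndpoint, kinesisInputStream) are chosen to mirror the variable names above, so verify the real keys against KinesisConnectorConfiguration and the sample .properties files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for the common settings; not the library's class.
class ConnectorConfigSketch {
    final String appName;
    final String kinesisEndpoint;
    final String kinesisInputStream;

    ConnectorConfigSketch(Properties properties) {
        // Key names are assumptions that mirror the variable names in the text.
        this.appName = required(properties, "appName");
        this.kinesisEndpoint = required(properties, "kinesisEndpoint");
        this.kinesisInputStream = required(properties, "kinesisInputStream");
    }

    // Fails fast when a setting is missing or blank, instead of at first use.
    private static String required(Properties properties, String key) {
        String value = properties.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required property: " + key);
        }
        return value.trim();
    }

    // Parses properties-file text (key=value lines) into a configuration.
    static ConnectorConfigSketch parse(String propertiesText) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ConnectorConfigSketch(properties);
    }
}
```

For example, parsing the three lines `appName = myConnectorApp`, `kinesisEndpoint = https://kinesis.us-east-1.amazonaws.com`, and `kinesisInputStream = myInputStream` (placeholder values) yields a configuration with those three fields populated.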
The samples folder contains common classes for all the samples. The subfolders contain implementations of the pipeline and executor classes, along with Apache Ant build.xml files for running the samples.
Each sample uses the following files:
- StreamSource.java: A simple application that sends records to an Amazon Kinesis stream.
- users.txt: JSON records that are parsed line by line by the StreamSource program; the basis of KinesisMessageModel.
- KinesisMessageModel.java: The data model for the users.txt records.
- KinesisConnectorExecutor.java: An abstract implementation of an Amazon Kinesis connector application, which includes these features:
- Configures the constructor, using the samples.utils package and the .properties file in the sample subfolder.
- Provides the getKinesisConnectorRecordProcessorFactory() method, which is implemented by the executors in the sample subfolders; each executor returns an instance of a factory configured with the appropriate pipeline.
- Provides a run() method for spawning a worker thread that uses the result of getKinesisConnectorRecordProcessorFactory().
- .properties: The service-specific key-value properties for configuring the connector.
- <service/type>Pipeline: The implementation of IKinesisConnectorPipeline for the sample. Each pipeline class returns a service-specific transformer and emitter, as well as simple buffer and filter implementations (BasicMemoryBuffer and AllPassFilter).
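The relationship between the abstract executor and the per-sample executors described above is a template-method arrangement, which might be sketched as follows. All names here (ConnectorExecutorSketch, S3ExecutorSketch, getRecordProcessorFactory) are hypothetical stand-ins; the real KinesisConnectorExecutor works with the Amazon Kinesis Client Library's worker and record processor factory types, which are replaced here by Supplier and Runnable so the example stays self-contained. It assumes Java 8 or later.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

// Stand-in for the abstract executor: subclasses choose the factory,
// and run() uses whatever factory they return.
abstract class ConnectorExecutorSketch implements Runnable {

    // Stand-in for getKinesisConnectorRecordProcessorFactory().
    protected abstract Supplier<Runnable> getRecordProcessorFactory();

    @Override
    public void run() {
        Runnable processor = getRecordProcessorFactory().get();
        Thread worker = new Thread(processor, "connector-worker");
        worker.start();
        try {
            worker.join(); // wait so the demonstration finishes deterministically
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

// Stand-in for a sample-specific executor that wires in its own pipeline.
class S3ExecutorSketch extends ConnectorExecutorSketch {
    final List<String> events = new CopyOnWriteArrayList<>();

    @Override
    protected Supplier<Runnable> getRecordProcessorFactory() {
        // A real executor would return a factory configured with its pipeline.
        return () -> () -> events.add("processed with S3 pipeline");
    }
}
```

Calling `new S3ExecutorSketch().run()` starts the worker thread, runs the processor produced by the factory, and records a single event. The shared run() logic lives in one place while each sample supplies only its pipeline-specific factory.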
Running a Sample
To run a sample, complete these steps:
- Edit the *.properties file, adding your AWS credentials and any necessary AWS resource configurations.
- Note: In the samples, KinesisConnectorExecutor uses the DefaultAWSCredentialsProviderChain, which looks for credentials supplied by environment variables, system properties, or an IAM role on Amazon EC2. If you prefer to specify your AWS credentials in a properties file on the classpath, edit the sample code to use ClasspathPropertiesFileCredentialsProvider instead.
- Confirm that the required AWS resources exist, or set the flags in the *.properties file to indicate that resources should be created when the sample is run.
- Within the sample folder, execute ant run.