
Create an ItemReader that reads from an InputStream [BATCH-2695] #912

Closed
spring-issuemaster opened this issue Mar 1, 2018 · 6 comments

@spring-issuemaster spring-issuemaster commented Mar 1, 2018

Michael Minella opened BATCH-2695 and commented

A common request is to be able to read files from S3 without downloading them first. To do this, a reader would need to read from an InputStream instead of a local file. This issue is to explore a mechanism for doing so.
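The idea can be sketched with the JDK alone. This is a minimal illustration under stated assumptions, not a proposed Spring Batch API: the hypothetical `InputStreamLineReader` treats each line as one item, wraps any InputStream (for example the content stream of an S3 object) in a BufferedReader, and follows the ItemReader convention of returning null once the input is exhausted. It assumes UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical reader (not a Spring Batch class): one item per line,
// read lazily from any InputStream, e.g. an S3 object's content stream.
class InputStreamLineReader implements AutoCloseable {

    private final BufferedReader reader;

    InputStreamLineReader(InputStream in) {
        // Assumes UTF-8 text; bytes are pulled from the stream only as needed.
        this.reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    /** Returns the next line, or null when the stream is exhausted (ItemReader-style). */
    String read() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class InputStreamLineReaderDemo {
    public static void main(String[] args) {
        byte[] data = "a,1\nb,2\nc,3\n".getBytes(StandardCharsets.UTF_8);
        int count = 0;
        try (InputStreamLineReader reader = new InputStreamLineReader(new ByteArrayInputStream(data))) {
            while (reader.read() != null) {
                count++;
            }
        }
        System.out.println(count); // prints 3
    }
}
```

Because the BufferedReader pulls data on demand, only small buffers sit in memory rather than the whole file, which is the property the request is after.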


Affects: 4.0.0

Reference URL: https://stackoverflow.com/questions/30832041/spring-batch-read-files-from-aws-s3

Issue Links:

Backported to: 4.1.0.M3

@spring-issuemaster spring-issuemaster commented Mar 1, 2018

Gary Russell commented

Michael Minella, spring-integration-aws already has an S3StreamingMessageSource.

It uses the S3RemoteFileTemplate.

cc Artem Bilan

@spring-issuemaster spring-issuemaster commented Mar 1, 2018

Artem Bilan commented

You just need simple code like this:

```java
InputStream s3ObjectInputStream = this.amazonS3.getObject(bucketName, key).getObjectContent();
```

@spring-issuemaster spring-issuemaster commented Aug 21, 2018

Mahmoud Ben Hassine commented

Thank you, Gary Russell and Artem Bilan.

Michael Minella, as discussed, I first checked whether it is possible to read data from a URL without downloading it, using a UrlResource. It is. To test that, I uploaded a flat file (random data, 1M records, approx. 50 MB) to S3 and wrote the following test:

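```java
import org.junit.Assert;
import org.junit.Test;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.core.io.UrlResource;

public class S3ReadTest { // enclosing class name is illustrative

	@Test
	public void testReadDataFromS3() throws Exception {
		// given: a flat file served over HTTPS from S3, wrapped in a UrlResource
		UrlResource resource = new UrlResource("https://s3.eu-west-3.amazonaws.com/benas-data/data.csv");
		FlatFileItemReader<String> itemReader = new FlatFileItemReaderBuilder<String>()
				.name("dataReader")
				.resource(resource)
				.lineMapper(new PassThroughLineMapper())
				.build();

		// when: read every line as one item, streaming from the URL
		int itemCount = 0;
		itemReader.open(new ExecutionContext());
		while (itemReader.read() != null) {
			itemCount++;
		}
		itemReader.close();

		// then
		Assert.assertEquals(1000000, itemCount);
	}
}
```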

The test passes. The file is not downloaded locally; it is streamed directly from S3.

The good news is that all file readers in Spring Batch (FlatFileItemReader, StaxEventItemReader and JsonItemReader) are based on the (powerful!) Resource abstraction, so it is possible to read not only flat files but also XML and JSON files from a URL (our XML and JSON tests pass when reading data directly from GitHub, see here).

One important part of this user story is making sure that Spring Batch mechanics (skip, restart, etc.) still work when streaming data from a URL. I wrote a test suite for these features here, and it passes too.
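To illustrate why restart can work on a stream that cannot seek, here is a simplified JDK-only sketch under stated assumptions: the reader stores how many items it has returned in a plain Map (standing in for Spring Batch's ExecutionContext), and on reopening it opens a fresh stream and reads and discards that many lines before resuming. As far as I know, FlatFileItemReader follows a similar skip-lines-on-restart approach, but the class `RestartableLineReader` and the key `"read.count"` below are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical restartable reader (not Spring Batch API). A Map stands in
// for the ExecutionContext that Spring Batch persists between runs.
class RestartableLineReader {

    static final String COUNT_KEY = "read.count"; // hypothetical key name

    private final Supplier<InputStream> source; // opens a fresh stream on each call
    private BufferedReader reader;
    private int count;

    RestartableLineReader(Supplier<InputStream> source) {
        this.source = source;
    }

    /** Opens a new stream and, on restart, skips the lines already read. */
    void open(Map<String, Object> context) {
        reader = new BufferedReader(new InputStreamReader(source.get(), StandardCharsets.UTF_8));
        count = 0;
        int alreadyRead = (Integer) context.getOrDefault(COUNT_KEY, 0);
        // A plain stream cannot seek, so "jumping" means reading and discarding lines.
        while (count < alreadyRead && readLine() != null) {
            count++;
        }
    }

    /** Returns the next line, or null at end of input. */
    String read() {
        String line = readLine();
        if (line != null) {
            count++;
        }
        return line;
    }

    /** Records progress, as a chunk commit would in a real job. */
    void update(Map<String, Object> context) {
        context.put(COUNT_KEY, count);
    }

    void close() {
        try {
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private String readLine() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class RestartDemo {
    public static void main(String[] args) {
        byte[] data = "r1\nr2\nr3\nr4\n".getBytes(StandardCharsets.UTF_8);
        Supplier<InputStream> source = () -> new ByteArrayInputStream(data);
        Map<String, Object> context = new HashMap<>();

        // First run: read two items, save progress, then "fail".
        RestartableLineReader first = new RestartableLineReader(source);
        first.open(context);
        first.read();
        first.read();
        first.update(context);
        first.close();

        // Restart: a fresh stream is opened and the first two lines are skipped.
        RestartableLineReader second = new RestartableLineReader(source);
        second.open(context);
        System.out.println(second.read()); // prints r3
        second.close();
    }
}
```

The cost of this design is that a restart re-reads (and, for a remote source such as S3, re-transfers) every line before the saved position.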

@spring-issuemaster spring-issuemaster commented Aug 31, 2018

Mahmoud Ben Hassine commented

Works as designed with a UrlResource.

@spring-issuemaster spring-issuemaster commented Aug 31, 2018

Artem Bilan commented

Does this mean that, for a plain in-memory InputStream, it would be enough to wrap it in an InputStreamResource and reuse the FlatFileItemReaderBuilder mentioned above?

Is this documented anywhere?

Thanks

@spring-issuemaster spring-issuemaster commented Aug 31, 2018

Mahmoud Ben Hassine commented

Artem Bilan, yes, that should work. The documentation states that the reader expects a Spring Framework (SF) Resource and links to the SF docs, so any Resource implementation should work.
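One caveat worth keeping in mind with an in-memory InputStream: Spring's InputStreamResource hands out its stream only once (a second getInputStream() call fails with an IllegalStateException), whereas resources such as UrlResource or ByteArrayResource open a fresh stream on each call. Reading the data once works either way; anything that needs to open the same resource a second time needs a reopenable one. The JDK-only sketch below mimics that difference with a hypothetical `StreamSource` interface and hypothetical `BytesSource`/`OneShotSource` classes; it is not the Spring Resource API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical stand-in for a Resource: something that can hand out an InputStream.
interface StreamSource {
    InputStream open();
}

// Reopenable: every call returns a fresh stream over the same bytes
// (comparable in spirit to ByteArrayResource).
class BytesSource implements StreamSource {
    private final byte[] bytes;

    BytesSource(byte[] bytes) {
        this.bytes = bytes.clone();
    }

    @Override
    public InputStream open() {
        return new ByteArrayInputStream(bytes);
    }
}

// One-shot: wraps an existing stream, so it can only be handed out once
// (comparable in spirit to InputStreamResource).
class OneShotSource implements StreamSource {
    private final InputStream stream;
    private boolean consumed;

    OneShotSource(InputStream stream) {
        this.stream = stream;
    }

    @Override
    public InputStream open() {
        if (consumed) {
            throw new IllegalStateException("stream has already been read");
        }
        consumed = true;
        return stream;
    }
}

class StreamSourceDemo {
    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        StreamSource reopenable = new BytesSource(data);
        reopenable.open();
        reopenable.open(); // fine: a new stream each time

        StreamSource oneShot = new OneShotSource(new ByteArrayInputStream(data));
        oneShot.open();
        try {
            oneShot.open();
        } catch (IllegalStateException e) {
            // prints "second open failed: stream has already been read"
            System.out.println("second open failed: " + e.getMessage());
        }
    }
}
```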
