Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix es.output.json for Cascading. #898

Merged
merged 2 commits into from
Dec 8, 2016
Merged

Conversation

jbaiera
Copy link
Member

@jbaiera jbaiera commented Nov 17, 2016

If es.output.json is enabled, instead of deserializing JSON data into a map object ES-Hadoop will simply return the raw JSON string to the user. This is returned from the Hadoop RecordReader as a Text object containing the string data. When the Cascading integration receives data from a RecordReader or from a ScrollQuery, it automatically assumes that the data is a Map object. This leads to a cast exception when it receives a Text object instead.

This relates to issue #885.

Added tests to verify functionality.
fixes elastic#885
Pipe pipe = new Pipe("copy");

Tap out = new HadoopPrintStreamTap(Stream.NULL);
build(cfg(), in, out, pipe);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my education, do we check the result somewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no great way to check the result in this test case since it's writing in a separate process. We're mostly just checking that the job completes instead of throwing.

context[0] = sourceCall.getInput().createKey();
context[1] = sourceCall.getInput().createValue();
// as the tuple _might_ vary (some objects might be missing), we use a map rather then a collection
Settings settings = loadSettings(flowProcess.getConfigCopy(), true);
context[2] = CascadingUtils.alias(settings);
context[3] = settings.getOutputAsJson();
Copy link
Member

@tlrx tlrx Nov 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to use a constant for the index in the context array, something like
context[OUTPUT_AS_JSON] = settings.getOutputAsJson();

That would help to read the context array as well as quickly identify where it is read & set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed 9ef7ffe

Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much to say, it looks good to me with my very limited knowledge of Cascading and its integration.

I left 2 minor comments

@jbaiera jbaiera merged commit 130c6bf into elastic:5.x Dec 8, 2016
@jbaiera jbaiera deleted the fix-cascading-json branch December 8, 2016 02:57
jbaiera added a commit that referenced this pull request May 8, 2017
Added tests to verify functionality.
fixes #885
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants