-
Notifications
You must be signed in to change notification settings - Fork 263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KITE-1081 FS writers to use the current view schema #421
KITE-1081 FS writers to use the current view schema #421
Conversation
@@ -118,7 +118,7 @@ private FileSystemView(FileSystemView<?> view, Schema schema, Class<E> type) { | |||
writer = PartitionedDatasetWriter.newWriter(this); | |||
} else { | |||
writer = FileSystemWriter.newWriter( | |||
fs, root, -1, -1 /* get from descriptor */, dataset.getDescriptor()); | |||
fs, root, -1, -1 /* get from descriptor */, dataset.getDescriptor(), this.getSchema()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AbstractRefinableView#getSchema
method returns the accessor's read schema. That's useful for callers, but the writers should be initialized using getAccessor().getWriteSchema()
. The two schemas can differ slightly for reflected records.
I am not sure if you got a chance to look into my latest commit addressing all your comments. After adding this new commit, build is failing for CDH5 alone and looking into the logs it doesn't seem those errors were introduced by this new commit, I could be wrong. |
@rdblue Have you had a chance to review the update that @megnataraj provided? |
@megnataraj and @chris-hirstein, I've looked at the update and it looks mostly good. My comments on TestAvroWriter#newWriter still applies though. That writer should use the schema that was passed in. |
@rdblue It looks like @megnataraj added an update, can you take another look? Thanks! |
Re-kicked off Travis build b/c failures seemed unrelated to the PR. +1 for this PR pending successful build for non-flaky reasons. |
I'm not sure that the tests are exercising the correct code path, so I'd appreciate it if you didn't merge this until I have time to take a closer look. Sorry to be a blocker here. |
@rdblue Will you have time to review this change soon? |
|
||
@Test | ||
public void testWriteWithOldSchema() throws IOException { | ||
Schema writerSchema = SchemaBuilder.record("Message").fields() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This schema isn't compatible with TEST_SCHEMA
, which is the dataset's schema. The required string field "message" is missing. This works because you're passing this writerSchema
into newReader
.
The problem is that I'm not sure this is testing something well-defined. It validates that this writerSchema
as passed to FileSystemWriter.newWriter
is used for the file (otherwise writes would fail) but I don't see the value of using it as the read schema because we expect that it is a valid read schema for both TEST_SCHEMA
and itself.
I think this should pass the writerSchema
you have here to newWriter
, create a file, and validate that the file's schema is actually this one and not TEST_SCHEMA
. At the writer level, that's what we are interested in. The other behavior (visible/hidden files and other checks) is validated in other tests and can probably be removed.
I also think there needs to be an overall test from the view layer. The motivation is to show that a view with an older write schema works. So I would like to see a view test that creates a dataset with a 2-field schema (one optional field) and writes through a view with a write-compatible schema that is missing the optional field. That validates then end-to-end behavior that we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rdblue this unit test is now updated to validate the file's schema is same as writerSchema.
Also A new test has been added to TestRefinableView, where reader writer compatibility is verified by using two schemas (Value, TestValue).
Let us know if these are the changes you're expecting, or if there are other modifications that needs to be done
@chris-hirstein, @megnataraj, thanks for the reminder. I've posted a comment about making sure the tests cover what we need to. I hope that is clear enough but if it isn't please ask questions about it and I'll make sure they get answered. |
I've updated this pull request as per the changes suggested by @rdblue |
@anandsagark, I'll look today. |
+1 overall. I'm willing to merge unless there are objections from @rdblue or @mkwhitacre. (I might quibble that the TestValue class could do with a more descriptive name, like "ValueWithoutOptionalField", but I don't think that's enough to block it.) |
I don't think that the new test actually tests what it should. It looks like the schemas for the two values are the same. I think the dataset should have a 2-field record and the code should write a 1-field record. But, I've been holding this up for too long and I'm fairly confident that it works, so I'm not going to shoot it down. Thanks for being patient, everyone. |
@rdblue Ugh, you're right. Got my wires crossed on which schema the view pointed to. @anandsagark, if I'm reading this right I think this just means making testReaderWriterCompatibleSchema use a dataset that uses the "testDescriptor" rather than the "valueDescriptor" so we write to a different schema than the writer has. |
…ord, read with two field record
…t 2-field record, writer 1-field record
+1 Test looks good to me. Thanks @anandsagark |
I kicked off the build again (looks like there was a spurious failure on the default profile). I'll merge pending a clean build on default and CDH5 (I know the CDH4 profile is failing for unrelated reasons.) |
The build is clean modulo the known CDH4 issue and we have +1's. Merging. Thanks everyone! |
…rrent-view-schema KITE-1081 FS writers to use the current view schema
No description provided.