-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-1167007: Validate no null values for not null cols #739
SNOW-1167007: Validate no null values for not null cols #739
Conversation
bc0795c
to
bf9ed76
Compare
List<String> nullValueNotNullCols = new ArrayList<>(); | ||
for (String columnName : this.nonNullableFieldNames) { | ||
if (inputColNamesMap.containsKey(columnName) | ||
&& row.get(inputColNamesMap.get(columnName)) == null) { | ||
nullValueNotNullCols.add(statsMap.get(columnName).getColumnDisplayName()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already throw an exception at the end of parseColumnValueToParquet
when we insert null into a non-nullable column, should be just catch that and add the col name to the nullValueNotNullCols list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it is the right place. The verifyInputColumns
is called before and sounds like the right place for such validation. verifyInputColumns
is also in the abstract class, the parseColumnValueToParquet
only in one implementation (Parquet).
If we catched the exception in insertRows function we would only add single column to the nullValueNotNullCols even if there is more columns with null values. We would need to pass the error object into the addRow
function to catch all the exception that may be thrown in a loop.
Therefore i think the verifyInputColumns
is much better place for the validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, makes sense to me, my concern was that in the worst case, we're iterating through all the data again to check this which is not ideal so I was hoping that we can do this as part of data validation.
@@ -442,6 +442,26 @@ Set<String> verifyInputColumns( | |||
"Values for all non-nullable columns must be specified, rowIndex:%d", rowIndex)); | |||
} | |||
|
|||
// Check for null values in not-nullable columns in the row | |||
List<String> nullValueNotNullCols = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to differentiate between missingCols
and nullValueNotNullCols
?
Is there a conceptual difference between the cases when a column is present with null
value and when it isn't present at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was thinking about this, the only thing is the naming of missingCols. But if we do like this then we don't need to adapt anything in kafka push code. What u think @sfc-gh-tzhang ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with combing it and we will need to update the name to make it clear, the only concern is that we're updating a public API, but I highly doubt any customer is using this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we would need to change the name anyways (so we still would need changes in kafka push code) and it changes also the public API i would keep the lists separated.
The benefits I see from 2 separate lists:
- not breaking the public api (mentioned above)
- we give more detailed information what is going on. Missing columns vs null values is a bit different and might help in debugging in future.
So I would leave it as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
@@ -442,6 +442,26 @@ Set<String> verifyInputColumns( | |||
"Values for all non-nullable columns must be specified, rowIndex:%d", rowIndex)); | |||
} | |||
|
|||
// Check for null values in not-nullable columns in the row | |||
List<String> nullValueNotNullCols = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with combing it and we will need to update the name to make it clear, the only concern is that we're updating a public API, but I highly doubt any customer is using this.
List<String> nullValueNotNullCols = new ArrayList<>(); | ||
for (String columnName : this.nonNullableFieldNames) { | ||
if (inputColNamesMap.containsKey(columnName) | ||
&& row.get(inputColNamesMap.get(columnName)) == null) { | ||
nullValueNotNullCols.add(statsMap.get(columnName).getColumnDisplayName()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, makes sense to me, my concern was that in the worst case, we're iterating through all the data again to check this which is not ideal so I was hoping that we can do this as part of data validation.
Validate if row contains null values for non-nullable columns. If yes, add such column names to InsertValidationResponse error as NullValueForNotNullColNames so it can be used for schema evolution.