-
Notifications
You must be signed in to change notification settings - Fork 3
Multivalue default support #23
Conversation
Codecov Report
@@ Coverage Diff @@
## main #23 +/- ##
============================================
- Coverage 52.40% 52.26% -0.15%
- Complexity 33 34 +1
============================================
Files 12 12
Lines 395 398 +3
Branches 30 32 +2
============================================
+ Hits 207 208 +1
Misses 179 179
- Partials 9 11 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
| if (defaultVal == JsonProperties.NULL_VALUE) { | ||
| defaultVal = null; | ||
| } | ||
| if (!AvroUtils.isSingleValueField(field) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you this to run for MAP types as well? If not, it might be better to move this into the else block on L163. The default value is set to "" for MAP type explicitly so I'd assume something might be depending on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm not familiar with the story behind the map type (because it really feels from this file like it's a hack we placed on top of pinot), but maps won't hit this for a couple different reasons - first, pinot's util, AvroUtils.isSingleValueField does not treat maps as multi valued - only arrays (see https://github.com/apache/incubator-pinot/blob/e892cb2942f1f41ac7056d977eeca0e5b22897d0/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroUtils.java#L229-L236). Second, the default value of a map, based on the above logic, will always be a string, based on the logic on lines 159-162 above - so none of the conditions are reachable.
On top of all that, based on my current understanding of the code around defaults, if the map code fell into here, it would actually be more correct (because a map is being split into two multivalued string cols, each of which would make sense to default to null). Setting an empty string I suspect has the same effect due to pinot's defaults for a string data type, but haven't confirmed because in practice we never use a default value for any of our map cols. In other words, we're generating:
{
"name": "tags__KEYS",
"dataType": "STRING",
"singleValueField": false,
"defaultNullValue": ""
}instead of
{
"name": "tags__KEYS",
"dataType": "STRING",
"singleValueField": false
}The reason I'd rather keep the logic after the map block you mentioned is the code feels separated into first initializing a default value, then normalizing the default values for pinot - and this feels more like the latter.
So all that said, happy to either add a comment or rearrange for clarity - @buchi-busireddy what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your reason makes sense to me so I think we should have comments at the least on why it's added later. Otherwise, I'm fine.
We should probably fix the map logic at some point and make it simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left the empty string with a todo since I'm not sure if that change has any unintended side effects, but otherwise rearranged the logic so the default value overrides exist in sequence.
Previously, an empty array was stringified when setting as a default value for a multivalued col. See the setter logic here: https://github.com/apache/incubator-pinot/blob/master/pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java#L170-L186
The updated behavior matches pinot's ingestion handling of an empty array/list for a multivalued col:
https://github.com/apache/incubator-pinot/blob/master/pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/BaseRecordExtractor.java#L108-L111