Skip to content
This repository was archived by the owner on Nov 12, 2024. It is now read-only.

Conversation

@aaron-steinfeld
Copy link
Contributor

@aaron-steinfeld aaron-steinfeld commented Dec 28, 2020

Previously, an empty array was stringified when setting as a default value for a multivalued col. See the setter logic here: https://github.com/apache/incubator-pinot/blob/master/pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java#L170-L186

The updated behavior matches pinot's ingestion handling of an empty array/list for a multivalued col:
https://github.com/apache/incubator-pinot/blob/master/pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/BaseRecordExtractor.java#L108-L111

@codecov
Copy link

codecov bot commented Dec 28, 2020

Codecov Report

Merging #23 (148921e) into main (08ec1d0) will decrease coverage by 0.14%.
The diff coverage is 60.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##               main      #23      +/-   ##
============================================
- Coverage     52.40%   52.26%   -0.15%     
- Complexity       33       34       +1     
============================================
  Files            12       12              
  Lines           395      398       +3     
  Branches         30       32       +2     
============================================
+ Hits            207      208       +1     
  Misses          179      179              
- Partials          9       11       +2     
Flag Coverage Δ Complexity Δ
unit 52.26% <60.00%> (-0.15%) 0.00 <0.00> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ Complexity Δ
.../hypertrace/core/viewcreator/pinot/PinotUtils.java 71.85% <60.00%> (-0.88%) 19.00 <0.00> (+1.00) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 08ec1d0...148921e. Read the comment docs.

tim-mwangi
tim-mwangi previously approved these changes Dec 28, 2020
if (defaultVal == JsonProperties.NULL_VALUE) {
defaultVal = null;
}
if (!AvroUtils.isSingleValueField(field)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you this to run for MAP types as well? If not, it might be better to move this into the else block on L163. The default value is set to "" for MAP type explicitly so I'd assume something might be depending on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm not familiar with the story behind the map type (because it really feels from this file like it's a hack we placed on top of pinot), but maps won't hit this for a couple different reasons - first, pinot's util, AvroUtils.isSingleValueField does not treat maps as multi valued - only arrays (see https://github.com/apache/incubator-pinot/blob/e892cb2942f1f41ac7056d977eeca0e5b22897d0/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroUtils.java#L229-L236). Second, the default value of a map, based on the above logic, will always be a string, based on the logic on lines 159-162 above - so none of the conditions are reachable.

On top of all that, based on my current understanding of the code around defaults, if the map code fell into here, it would actually be more correct (because a map is being split into two multivalued string cols, each of which would make sense to default to null). Setting an empty string I suspect has the same effect due to pinot's defaults for a string data type, but haven't confirmed because in practice we never use a default value for any of our map cols. In other words, we're generating:

    {
      "name": "tags__KEYS",
      "dataType": "STRING",
      "singleValueField": false,
      "defaultNullValue": ""
    }

instead of

    {
      "name": "tags__KEYS",
      "dataType": "STRING",
      "singleValueField": false
    }

The reason I'd rather keep the logic after the map block you mentioned is the code feels separated into first initializing a default value, then normalizing the default values for pinot - and this feels more like the latter.

So all that said, happy to either add a comment or rearrange for clarity - @buchi-busireddy what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your reason makes sense to me so I think we should have comments at the least on why it's added later. Otherwise, I'm fine.
We should probably fix the map logic at some point and make it simpler.

Copy link
Contributor Author

@aaron-steinfeld aaron-steinfeld Dec 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left the empty string with a todo since I'm not sure if that change has any unintended side effects, but otherwise rearranged the logic so the default value overrides exist in sequence.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants