Null values should not be converted to the "null" string if the field is not Optional #106

Closed


cosimomeli
Hi,
I found "null" strings in my topics, and digging into the code I found that every value is stringified when the String field is not Optional, so a null becomes the literal string "null".
I don't want to debate the "serialize everything" approach, but at the very least null values should fail when the field can't be null.
If someone really wants the string "null" somewhere, that can be done with a pipeline.
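A minimal Python sketch of the behaviour requested above. It models the String conversion as a hypothetical `to_string_field(value, optional)` helper with a hypothetical `DataException`; neither is the connector's real (Java) API. Non-null values are still stringified, a null passes through for an optional field, and a null for a required field raises instead of becoming "null".

```python
class DataException(Exception):
    """Hypothetical stand-in for a conversion error (not the connector's real type)."""


def to_string_field(value, optional):
    """Convert a BSON-like value for a String schema field (hypothetical helper)."""
    if value is None:
        if optional:
            return None  # optional field: null stays null
        # required field: fail instead of emitting the literal string "null"
        raise DataException("null value for a non-optional String field")
    # "serialize everything": any non-null value is still stringified
    return str(value)


print(to_string_field(42, optional=False))    # 42
print(to_string_field(None, optional=True))   # None
```

With this rule, the literal "null" only reaches a topic if a pipeline puts it there on purpose.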

rozza (Member) left a comment

LGTM

rozza self-requested a review on July 11, 2022 09:46

rozza (Member) left a comment

Having re-reviewed, I think this is incorrect.

The complexity really comes from the difference in BsonNull being a value for a field and the lack of a field existing. So these documents are not equal:

{a: 1, b: null} and {a: 1}.

It's important to note that all types are convertible to string, including BsonNull. So I think the fix should be in the string handling of the BsonNull type.

Where a value is missing, e.g. the b field in {a: 1}, that would be validated by the schema repository inside Kafka.

Ross

I'd prefer to make BsonNull an empty string rather than "null".
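A sketch of the reviewer's alternative, under simplifying assumptions: a BSON document is modelled as a Python dict, an explicit BsonNull is `None`, and `stringify_document` is a hypothetical helper, not connector code. An explicit null becomes the empty string, while an absent field is omitted so that schema validation can decide whether its absence is acceptable.

```python
def stringify_document(doc, string_fields):
    """Stringify the listed fields of a dict-modelled BSON document.

    Hypothetical helper for illustration only. An explicit null (None)
    becomes "" instead of "null"; a field absent from `doc` is omitted
    rather than being treated as null.
    """
    out = {}
    for name in string_fields:
        if name not in doc:
            continue  # missing field: distinct from an explicit null
        value = doc[name]
        out[name] = "" if value is None else str(value)
    return out


# The two documents from the comment stay distinguishable after conversion.
print(stringify_document({"a": 1, "b": None}, ["a", "b"]))  # {'a': '1', 'b': ''}
print(stringify_document({"a": 1}, ["a", "b"]))             # {'a': '1'}
```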

cosimomeli (Author)

> The complexity really comes from the difference in BsonNull being a value for a field and the lack of a field existing. So these documents are not equal:
>
> {a: 1, b: null} and {a: 1}.

Yes, these documents are not equal, but if my schema says b is of type String and can't be null, then b is null in both documents from the schema's point of view. Whether the documents are equivalent depends on the schema.
In the example where b is of type String and can't be null, {a: 1, b: null} and {a: 1} are equivalent and neither can pass the schema check, whereas {a: 1, b: null} and {a: 1, b: ""} are not equivalent, because the empty string is a valid String.
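The author's argument can be sketched as a schema check, again with dict-modelled documents and a hypothetical `is_valid` helper whose schema maps each field name to an expected Python type and an optional flag. Because the check reads fields with `dict.get`, a missing field and an explicit null both come back as `None`, so both documents fail when `b` is a required String, while the empty string passes.

```python
def is_valid(doc, schema):
    """Validate a dict-modelled document against a hypothetical schema.

    `schema` maps field name -> (expected_type, optional).
    """
    for name, (expected_type, optional) in schema.items():
        value = doc.get(name)  # None both when the field is missing and when it is null
        if value is None:
            if not optional:
                return False  # a required field has no value
        elif not isinstance(value, expected_type):
            return False  # value present but of the wrong type
    return True


# b is a required String; a is a required integer.
SCHEMA = {"a": (int, False), "b": (str, False)}

print(is_valid({"a": 1, "b": None}, SCHEMA))  # False
print(is_valid({"a": 1}, SCHEMA))             # False
print(is_valid({"a": 1, "b": ""}, SCHEMA))    # True
```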

> I'd prefer to make BsonNull an empty string rather than "null".

In Avro and in the Connect Schema, null is a primitive type, so casting null to the empty string is a type change and a way around an explicit check someone has put in the schema ("I want this data not empty"). If a cast from null to the empty string is needed, it can be done in the pipeline with a simple $set.
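For the pipeline route, a sketch of such a `$set` stage. It assumes the source connector's `pipeline` setting (a JSON array of aggregation stages) and that the field lives at `fullDocument.b` in the change-stream event; both the path and the helper name `null_to_empty_stage` are illustrative. `$ifNull` returns its fallback when the expression is null or missing, so an absent field also ends up as the empty string.

```python
import json


def null_to_empty_stage(path):
    """Build a $set stage that replaces a null or missing `path` with "".

    `path` is a dotted field path in the change-stream event, e.g. "fullDocument.b".
    """
    return {"$set": {path: {"$ifNull": ["$" + path, ""]}}}


pipeline = [null_to_empty_stage("fullDocument.b")]
print(json.dumps(pipeline))
# [{"$set": {"fullDocument.b": {"$ifNull": ["$fullDocument.b", ""]}}}]
```

This keeps the conversion itself strict and makes the null-to-empty cast an explicit, per-field choice.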

The current implementation just hides a data inconsistency by pushing around a "null" string (which is not much different from an empty string) and polluting the data.

mattytr2

Any news on this?

cosimomeli closed this on Apr 26, 2024
3 participants