New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unknown field support for csharp #3936

Merged
merged 9 commits into from Dec 13, 2017

Conversation

Projects
None yet
3 participants
@anandolee
Contributor

anandolee commented Nov 21, 2017

No description provided.

@anandolee anandolee requested a review from jskeet Nov 21, 2017

@googlebot googlebot added the cla: yes label Nov 21, 2017

anandolee added some commits Nov 22, 2017

using NUnit.Framework;
namespace Google.Protobuf
{

This comment has been minimized.

@jskeet

jskeet Nov 23, 2017

Contributor

Just starting to review this - I'll be sending a full review later. But before I start, a general point: please reformat all the C# code to use spaces rather than tabs for consistency. (If you're using Visual Studio, it's easy to set this as the default, then use Ctrl-E, D to reformat a whole document.)

@jskeet

jskeet Nov 23, 2017

Contributor

Just starting to review this - I'll be sending a full review later. But before I start, a general point: please reformat all the C# code to use spaces rather than tabs for consistency. (If you're using Visual Studio, it's easy to set this as the default, then use Ctrl-E, D to reformat a whole document.)

This comment has been minimized.

@anandolee

anandolee Nov 30, 2017

Contributor

Done

@anandolee

anandolee Nov 30, 2017

Contributor

Done

@jskeet

We need to think carefully about the mutability of UnknownField and UnknownFieldSet, along with how "empty" can be efficiently implemented in both cases.

There are two options I'd consider:

  • Have an immutable "empty" instance, and make all "mutation" methods return a value of the same type, documenting that the call will mutate any non-empty values and return this, or create a new instance and return it if it's called on empty

  • Use null for empty instead

The two options have different pros and cons.

The global nature of preserving unknown fields worries me as well - is that what's happening in other languages? What's the migration aspect described in CodedInputStream.cs?

Show outdated Hide outdated csharp/src/Google.Protobuf/CodedInputStream.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/CodedInputStream.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownField.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownField.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownField.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/Collections/Lists.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownField.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownField.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownFieldSet.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownFieldSet.cs Outdated
@anandolee

This comment has been minimized.

Show comment
Hide comment
@anandolee

anandolee Nov 30, 2017

Contributor

Yes, preserving unknown fields are happening in all other languages from 3.4.0. C++ have already changed the default behavior to preserve in 3.5.0 You can see the release log for more info.

For the DefaultInstance issue, I've remove the one in UnknownFieldSet and use null instead. The UnknownField may still need a default one. I don't quit understand your option 1. Do you mean that add a check in each mutate method, just return if it is the DefaultInstance ?

Contributor

anandolee commented Nov 30, 2017

Yes, preserving unknown fields are happening in all other languages from 3.4.0. C++ have already changed the default behavior to preserve in 3.5.0 You can see the release log for more info.

For the DefaultInstance issue, I've remove the one in UnknownFieldSet and use null instead. The UnknownField may still need a default one. I don't quit understand your option 1. Do you mean that add a check in each mutate method, just return if it is the DefaultInstance ?

@jskeet

This comment has been minimized.

Show comment
Hide comment
@jskeet

jskeet Nov 30, 2017

Contributor

I'd suggest leaving the field null by default even if unknown fields are enabled - I'd expect most parse operations not to end up finding any unknown fields, so let's avoid the extra allocation.

Instead, if you create a static method in UnknownFieldSet that accepts an existing field set, and creates a new one if necessary, then you can have:

unknownFields = UnknownFields.MergeFieldFrom(unknownFields, input)

In equality, you can just use

Equals(unknownFields, other.unknownFields)

... which will call Equals if and only if both values are non-null. Note that this relies on a field set never being empty - null is effectively the representation of an empty unknown field set.

I'd also suggest renaming the field to _unknownFields so that it won't conflict with a message field called "unknown_fields".

Contributor

jskeet commented Nov 30, 2017

I'd suggest leaving the field null by default even if unknown fields are enabled - I'd expect most parse operations not to end up finding any unknown fields, so let's avoid the extra allocation.

Instead, if you create a static method in UnknownFieldSet that accepts an existing field set, and creates a new one if necessary, then you can have:

unknownFields = UnknownFields.MergeFieldFrom(unknownFields, input)

In equality, you can just use

Equals(unknownFields, other.unknownFields)

... which will call Equals if and only if both values are non-null. Note that this relies on a field set never being empty - null is effectively the representation of an empty unknown field set.

I'd also suggest renaming the field to _unknownFields so that it won't conflict with a message field called "unknown_fields".

@anandolee

This comment has been minimized.

Show comment
Hide comment
@anandolee

anandolee Dec 1, 2017

Contributor

Have removed the DefaultInstance.
Changed unknownFields to _unknownFields in code gen.
Also changed to use Equals(_unknownFields, other._unknownFields) in code gen.

For unknownFields = UnknownFields.MergeFieldFrom(unknownFields, input) suggestion, I think it is not needed. Unknown fields are most used in code gen which does not need this support

Contributor

anandolee commented Dec 1, 2017

Have removed the DefaultInstance.
Changed unknownFields to _unknownFields in code gen.
Also changed to use Equals(_unknownFields, other._unknownFields) in code gen.

For unknownFields = UnknownFields.MergeFieldFrom(unknownFields, input) suggestion, I think it is not needed. Unknown fields are most used in code gen which does not need this support

@jskeet

This comment has been minimized.

Show comment
Hide comment
@jskeet

jskeet Dec 1, 2017

Contributor

For unknownFields = UnknownFields.MergeFieldFrom(unknownFields, input) suggestion, I think it is not needed. Unknown fields are most used in code gen which does not need this support

I'm not sure what you mean. The point is that that would allow you to keep _unknownFields null until it needs to be populated. Anyway, I'll try to review the new code very early next week.

Contributor

jskeet commented Dec 1, 2017

For unknownFields = UnknownFields.MergeFieldFrom(unknownFields, input) suggestion, I think it is not needed. Unknown fields are most used in code gen which does not need this support

I'm not sure what you mean. The point is that that would allow you to keep _unknownFields null until it needs to be populated. Anyway, I'll try to review the new code very early next week.

@anandolee

This comment has been minimized.

Show comment
Hide comment
@anandolee

anandolee Dec 1, 2017

Contributor

Oh, that's what you mean. I've added those methods.
Thanks for all your feedback

Contributor

anandolee commented Dec 1, 2017

Oh, that's what you mean. I've added those methods.
Thanks for all your feedback

@jskeet

This comment has been minimized.

Show comment
Hide comment
@jskeet

jskeet Dec 4, 2017

Contributor

I think you need to regenerate all the C# files - it won't build at the moment.
If we're using null for a missing field set, do we still need Clear() on either UnknownFieldSet or UnknownField?

Contributor

jskeet commented Dec 4, 2017

I think you need to regenerate all the C# files - it won't build at the moment.
If we're using null for a missing field set, do we still need Clear() on either UnknownFieldSet or UnknownField?

@jskeet

This comment has been minimized.

Show comment
Hide comment
@jskeet

jskeet Dec 5, 2017

Contributor

Can I check, is this now ready for complete review as far as you're aware?

Contributor

jskeet commented Dec 5, 2017

Can I check, is this now ready for complete review as far as you're aware?

@anandolee

This comment has been minimized.

Show comment
Hide comment
@anandolee

anandolee Dec 6, 2017

Contributor

Yes, I think it is ready fro complete review

Contributor

anandolee commented Dec 6, 2017

Yes, I think it is ready fro complete review

@jskeet

Definitely getting there - sorry for this taking so many iterations.

Is the intention that users should be able to get at the unknown fields for a message? Currently I can only see them being visible by serializing the message.

Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownFieldSet.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownFieldSet.cs Outdated
public override int GetHashCode()
{
int ret = 1;
foreach (KeyValuePair<int, UnknownField> field in fields)

This comment has been minimized.

@jskeet

jskeet Dec 6, 2017

Contributor

Presumably you're using ^ to make the order irrelevant? It might be worth explicitly stating that.

@jskeet

jskeet Dec 6, 2017

Contributor

Presumably you're using ^ to make the order irrelevant? It might be worth explicitly stating that.

This comment has been minimized.

@anandolee

anandolee Dec 7, 2017

Contributor

Done

@anandolee

anandolee Dec 7, 2017

Contributor

Done

Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownFieldSet.cs Outdated
private UnknownField GetOrAddField(int number)
{
if (lastField != null && number == lastFieldNumber)

This comment has been minimized.

@jskeet

jskeet Dec 6, 2017

Contributor

When would lastField be null, but lastFieldNumber be equal to number in an unexpected way?

(In other words, can we just remove the first condition?)

@jskeet

jskeet Dec 6, 2017

Contributor

When would lastField be null, but lastFieldNumber be equal to number in an unexpected way?

(In other words, can we just remove the first condition?)

This comment has been minimized.

@anandolee

anandolee Dec 7, 2017

Contributor

Done

@anandolee

anandolee Dec 7, 2017

Contributor

Done

Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownFieldSet.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownFieldSet.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownField.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownField.cs Outdated
return result;
}
/// <summary>

This comment has been minimized.

@jskeet

jskeet Dec 6, 2017

Contributor

Currently all of the methods below here can be internal, can't they? (I'm generally in favour of keeping things internal or private until we definitely need them to be public.)

@jskeet

jskeet Dec 6, 2017

Contributor

Currently all of the methods below here can be internal, can't they? (I'm generally in favour of keeping things internal or private until we definitely need them to be public.)

This comment has been minimized.

@anandolee

anandolee Dec 7, 2017

Contributor

Done

@anandolee

anandolee Dec 7, 2017

Contributor

Done

@anandolee

Thanks for all your feedbacks

Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownField.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownField.cs Outdated
return result;
}
/// <summary>

This comment has been minimized.

@anandolee

anandolee Dec 7, 2017

Contributor

Done

@anandolee

anandolee Dec 7, 2017

Contributor

Done

Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownFieldSet.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownFieldSet.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownFieldSet.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownFieldSet.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownFieldSet.cs Outdated
Show outdated Hide outdated csharp/src/Google.Protobuf/UnknownFieldSet.cs Outdated
public override int GetHashCode()
{
int ret = 1;
foreach (KeyValuePair<int, UnknownField> field in fields)

This comment has been minimized.

@anandolee

anandolee Dec 7, 2017

Contributor

Done

@anandolee

anandolee Dec 7, 2017

Contributor

Done

@anandolee

This comment has been minimized.

Show comment
Hide comment
@anandolee

anandolee Dec 7, 2017

Contributor

It is not required to provide APIs to get Unknown Field. Some languages like c++ have this API, some(like python) do not provide it. We could not provide it at least now

Contributor

anandolee commented Dec 7, 2017

It is not required to provide APIs to get Unknown Field. Some languages like c++ have this API, some(like python) do not provide it. We could not provide it at least now

@jskeet

Okay, hopefully my last set of comments now :)

If we're not exposing the fields themselves, let's really not expose them:

  • Make UnknownField internal
  • Make the UnknownFieldSet constructor internal
  • Make UnknownFieldSet.AddOrReplaceField internal

Importantly, at that point the only way of getting an UnknownFieldSet instance is to populate one with a field. That's great because it means we never need to worry about comparing null with an empty UnknownFieldSet - whereas otherwise, both would effectively represent "there aren't any unknown fields in this message". If we revisit that decision, we'll need to provide an easy way of saying "I have two UnknownFieldSet references, either of which may be null and either of which may be non-null, but referring to an empty field set - please consider null and empty to be equivalent."

Just two more questions:

  • Is it deliberate that Clone doesn't clone unknown fields? (That could be slightly surprising in some cases.)
  • Can I confirm that we'll be removing the property in CodedInputStream before the next public release? It's not clear what's meant by "once migration is done".
@anandolee

This comment has been minimized.

Show comment
Hide comment
@anandolee

anandolee Dec 8, 2017

Contributor

Have changed the 3 methods to internal.

Should also clone the unknown fields. I've added the corresponding code and tests. Sorry I missed it.

We don't have a clear timeline for the migration, this is the java version:
https://github.com/google/protobuf/blob/master/java/core/src/main/java/com/google/protobuf/CodedInputStream.java#L470
I also moved the PreserveUnknownsDefault check from generated code to UnknownFieldSet.cs for future compatibility reason

Contributor

anandolee commented Dec 8, 2017

Have changed the 3 methods to internal.

Should also clone the unknown fields. I've added the corresponding code and tests. Sorry I missed it.

We don't have a clear timeline for the migration, this is the java version:
https://github.com/google/protobuf/blob/master/java/core/src/main/java/com/google/protobuf/CodedInputStream.java#L470
I also moved the PreserveUnknownsDefault check from generated code to UnknownFieldSet.cs for future compatibility reason

@jskeet

jskeet approved these changes Dec 8, 2017

Just spotted one last thing that can be made internal.

So long as we don't plan on releasing a version before PreserveUnknownsDefault is removed, I'm otherwise happy.

Show outdated Hide outdated csharp/src/Google.Protobuf/CodedInputStream.cs Outdated
@anandolee

This comment has been minimized.

Show comment
Hide comment
@anandolee

anandolee Dec 8, 2017

Contributor

For other languages, we have released a version that have PreserveUnknownsDefault (default false). Thus users can migrate their code. Then release a version with PreserveUnknownsDefault true and the last step is remove it (None of them have removed PreserveUnknownsDefault).

So we are indeed going to release a version which have PreserveUnknownsDefault.

Contributor

anandolee commented Dec 8, 2017

For other languages, we have released a version that have PreserveUnknownsDefault (default false). Thus users can migrate their code. Then release a version with PreserveUnknownsDefault true and the last step is remove it (None of them have removed PreserveUnknownsDefault).

So we are indeed going to release a version which have PreserveUnknownsDefault.

@jskeet

This comment has been minimized.

Show comment
Hide comment
@jskeet

jskeet Dec 8, 2017

Contributor

So we are indeed going to release a version which have PreserveUnknownsDefault.

Then removing the property later will be a breaking change, meaning that the library should be revved to version 4.x to follow semantic versioning. I'm very, very keen on not breaking compatibility like this. Changing the default over time seems reasonable, but I'm nervous over it being part of the public API.

One alternative option would be to make it an internal property that could be controlled via an external mechanism, e.g. an environment variable. That's not great either, but at least it doesn't completely violate compatibility.

Contributor

jskeet commented Dec 8, 2017

So we are indeed going to release a version which have PreserveUnknownsDefault.

Then removing the property later will be a breaking change, meaning that the library should be revved to version 4.x to follow semantic versioning. I'm very, very keen on not breaking compatibility like this. Changing the default over time seems reasonable, but I'm nervous over it being part of the public API.

One alternative option would be to make it an internal property that could be controlled via an external mechanism, e.g. an environment variable. That's not great either, but at least it doesn't completely violate compatibility.

@anandolee

This comment has been minimized.

Show comment
Hide comment
@anandolee

anandolee Dec 8, 2017

Contributor

Just discussed with Jisi. He thinks it is OK to change the behavior directly(always preserve) and add a DiscardUnknownFields(), as C# does not have to support internal usages. Does this make sense?

Contributor

anandolee commented Dec 8, 2017

Just discussed with Jisi. He thinks it is OK to change the behavior directly(always preserve) and add a DiscardUnknownFields(), as C# does not have to support internal usages. Does this make sense?

@jskeet

This comment has been minimized.

Show comment
Hide comment
@jskeet

jskeet Dec 8, 2017

Contributor

So remove the property, change codegen as if it were always true, and add the method? Makes sense to me.

Unfortunately we can't add the DiscardUnknownFields method to the interface itself (again, a breaking change) - that's the next piece of design work, I guess. Maybe make the "just always preserve them" change, get that merged, and then work on discarding?

Contributor

jskeet commented Dec 8, 2017

So remove the property, change codegen as if it were always true, and add the method? Makes sense to me.

Unfortunately we can't add the DiscardUnknownFields method to the interface itself (again, a breaking change) - that's the next piece of design work, I guess. Maybe make the "just always preserve them" change, get that merged, and then work on discarding?

@anandolee

This comment has been minimized.

Show comment
Hide comment
@anandolee

anandolee Dec 8, 2017

Contributor

Have removed the DiscardUnknownFields, so the behavior is changed to always preserve.
Sounds good to work on discarding after this merge.

Contributor

anandolee commented Dec 8, 2017

Have removed the DiscardUnknownFields, so the behavior is changed to always preserve.
Sounds good to work on discarding after this merge.

@jskeet

I haven't re-reviewed in detail, but the change looks good from a skim. Let me know if there's anything else you need me to look at specifically.

As I've merged the double.NaN work, you may want to rebuild and regenerate to make the rebase easier.

anandolee added some commits Dec 12, 2017

Remove unknown field test from conformance failure_list_csharp.txt
Comments out failure tests in compatibility_tests (Need to add back after DiscardUnknownFields is supported)
@anandolee

This comment has been minimized.

Show comment
Hide comment
@anandolee

anandolee Dec 13, 2017

Contributor

ok to test

Contributor

anandolee commented Dec 13, 2017

ok to test

@anandolee anandolee merged commit bfd254e into protocolbuffers:master Dec 13, 2017

21 of 23 checks passed

Jenkins Build finished.
Details
MacOS JRuby (Allowed Failure) Kokoro build finished
Details
Jenkins 32bit Build finished.
Details
Linux C# Kokoro build finished
Details
Linux C++ Distcheck Kokoro build finished
Details
Linux Java Compatibility Kokoro build finished
Details
Linux Python Compatibility Kokoro build finished
Details
MacOS C++ Kokoro build finished
Details
MacOS C++ Distcheck Kokoro build finished
Details
MacOS JavaScript Kokoro build finished
Details
MacOS Obj-C CocoaPods Integration Kokoro build finished
Details
MacOS Obj-C OS X Kokoro build finished
Details
MacOS Obj-C iOS Debug (Allowed Failure) Kokoro build finished
Details
MacOS Obj-C iOS Release (Allowed Failure) Kokoro build finished
Details
MacOS PHP5.6 Kokoro build finished
Details
MacOS PHP7.0 Kokoro build finished
Details
MacOS Python Kokoro build finished
Details
MacOS Python C++ (Allowed Failure) Kokoro build finished
Details
MacOS Ruby 2.1 Kokoro build finished
Details
MacOS Ruby 2.2 (Allowed Failure) Kokoro build finished
Details
cla/google All necessary CLAs are signed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment