Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for tracking unknown fields. #129

Merged
merged 7 commits into from
Sep 1, 2017
Merged

Conversation

judah
Copy link
Collaborator

@judah judah commented Aug 31, 2017

  • Every message has a new field containing a list of unknown fields:
    unknownFields :: Message msg => Lens' msg [TaggedValue]
  • Unknown fields are preserved by decodeMessage, encodeMessage,
    and showMessage
  • Unknown fields still cause an error for readMessage.

A few TODOs:

  • For now, unknown groups are printed sub-optimally by showMessage: the
    start/end group tags (and everything in between) all get displayed as
    individual fields, rather than being organized into a sub-struct.
  • The discardUnknownFields function isn't recursive, unlike in other
    languages.
  • The Ord instance doesn't try to do anything special, just treating
    the unknown fields as a list of values. If it really matters then
    discardUnknownFields can help resolve the ambiguity.

- Every message has a new field containing a list of unknown fields:
  ```unknownFields :: Message msg => Lens' msg [TaggedValue]```
- Unknown fields are preserved by `decodeMessage`, `encodeMessage`,
  and `showMessage`
- Unknown fields still cause an error for `readMessage`.

A few TODOs:
- For now, unknown groups are printed sub-optimally by `showMessage`: the
  start/end group tags (and everything in between) all get displayed as
  individual fields, rather than being organized into a sub-struct.
- The `discardUnknownFields` function isn't recursive, unlike in other
  languages.
- The Ord instance doesn't try to do anything special, just treating
  the unknown fields as a list of values.  If it really matters then
  `discardUnknownFields` can help resolve the ambiguity.
@@ -14,6 +14,7 @@ module Data.ProtoLens.Arbitrary
) where

import Data.ProtoLens.Message
import Data.ProtoLens.Encoding.Wire
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need this import?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -53,20 +53,23 @@ parseMessage :: forall msg . Message msg => Parser () -> Parser msg
parseMessage end = do
(msg, unsetFields) <- loop def requiredFields
if Map.null unsetFields
then return $ reverseRepeatedFields fields msg
then return $ over unknownFields reverse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why reversed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind. it's immediately afterwards :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line causes the list to be accumulated in reversed order:
https://github.com/google/proto-lens/pull/129/files/137e2aab090f96cd7d254531d41cd6537b28ca22#diff-f6d9e63733ce67e5fce8bbe122336aa5R64

It's similar to what we do for repeated fields:
https://github.com/google/proto-lens/pull/129/files/137e2aab090f96cd7d254531d41cd6537b28ca22#diff-f6d9e63733ce67e5fce8bbe122336aa5R126

A little bit of a hack, but at least isolated to this file.

Originally we used Seq for repeated fields, but it seemed to make the API unnecessarily complicated. (And right now it's tricky to have a different type just for accumulation during decode.)

else fail $ "Missing required fields "
++ show (map fieldDescriptorName
$ Map.elems $ unsetFields)
where
fields = fieldsByTag descriptor
addUnknown :: TaggedValue -> msg -> msg
addUnknown !f = over unknownFields (\(!xs) -> f : xs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be cleaner as over unknownFields (f :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs the bang pattern or else horrible space leaks happen. I've sent out #130 to add a helper function and use it elsewhere in this file; once that's in I'll use it here too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, missed the bang. makes sense

requiredFields = Map.filter isRequired fields
loop :: msg -> Map.Map Tag (FieldDescriptor msg)
-> Parser (msg, Map.Map Tag (FieldDescriptor msg))
loop msg unsetFields = ((msg, unsetFields) <$ end)
<|> do
tv@(TaggedValue tag _) <- getTaggedValue
case Map.lookup tag fields of
Nothing -> loop msg unsetFields
Nothing -> (loop $! addUnknown tv msg) unsetFields
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the strictness? to avoid big thunks if a big message is entirely unknown?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, pretty much. In particular, the addUnknown thunk might otherwise hold onto the old value of msg, which can be painful if there are a bunch of unknown fields in a row (e.g.: an unknown, unpacked repeated field).

Copy link
Contributor

@isovector isovector left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great!

@judah judah merged commit e5aa6cf into google:master Sep 1, 2017
avdv pushed a commit to avdv/proto-lens that referenced this pull request Aug 9, 2023
- Every message has a new field containing a list of unknown fields:
  ```unknownFields :: Message msg => Lens' msg [TaggedValue]```
- Unknown fields are preserved by `decodeMessage`, `encodeMessage`,
  and `showMessage`
- Unknown fields still cause an error for `readMessage`.

A few TODOs:
- For now, unknown groups are printed sub-optimally by `showMessage`: the
  start/end group tags (and everything in between) all get displayed as
  individual fields, rather than being organized into a sub-struct.
- The `discardUnknownFields` function isn't recursive, unlike in other
  languages.
- The Ord instance doesn't try to do anything special, just treating
  the unknown fields as a list of values.  If it really matters then
  `discardUnknownFields` can help resolve the ambiguity.
ylecornec pushed a commit to ylecornec/proto-lens that referenced this pull request Feb 19, 2024
- Every message has a new field containing a list of unknown fields:
  ```unknownFields :: Message msg => Lens' msg [TaggedValue]```
- Unknown fields are preserved by `decodeMessage`, `encodeMessage`,
  and `showMessage`
- Unknown fields still cause an error for `readMessage`.

A few TODOs:
- For now, unknown groups are printed sub-optimally by `showMessage`: the
  start/end group tags (and everything in between) all get displayed as
  individual fields, rather than being organized into a sub-struct.
- The `discardUnknownFields` function isn't recursive, unlike in other
  languages.
- The Ord instance doesn't try to do anything special, just treating
  the unknown fields as a list of values.  If it really matters then
  `discardUnknownFields` can help resolve the ambiguity.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants