Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve to json support #3365

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

sernst
Copy link

@sernst sernst commented Jul 2, 2023

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Introduce to_dict to the objects included in the existing
JSON serialization process for ReadableSpan, MetricsData,
LogRecord, and Resource objects. This includes adding
to_dict to objects that are included within the serialized
data structures of these objects. In places where repr()
serialization was used, it has been replaced by a
JSON-compatible serialization instead. Inconsistencies between
null and empty string values were preserved, but in cases
where attributes are optional, an empty dictionary is provided
as well to be more consistent with cases where attributes are
not optional and an empty dictionary represents no attributes
were specified on the containing object.

These changes also included:

  1. Dictionary typing was included for all the to_dict
       methods for clarity in subsequent usage.
  2. DataT and DataPointT were did not include the
       exponential histogram types in point.py, and so those
       were added with new to_json and to_dict methods as
       well for consistency. It appears that the exponential
       types were added later and including them in the types
       might have been overlooked. Please let me know if that
       is a misunderstanding on my part.
  3. OrderedDict was removed in a number of places
       associated with the existing to_json functionality
       given its redundancy for Python 3.7+ compatibility.
       I was assuming this was legacy code for previous
       compatibility, but please let me know if that's not
       the case as well.
  4. to_dict was added to objects like SpanContext,
       Link, and Event that were previously being
       serialized by static methods within the ReadableSpan
       class and accessing private/protected members.
       This simplified the serialization in the ReadableSpan
       class and those methods were removed. However, once
       again, let me know if there was a larger purpose to
       those I could not find.

Finally, I used to_dict as the method names here to be
consistent with other related usages. For example,
dataclasses.asdict(). But, mostly because that was by
far the most popular usage within the larger community:

328k files found on GitHub that define to_dict functions,
which include some of the most popular Python libraries
to date:
https://github.com/search?q=%22def+to_dict%28%22+language%3APython&type=code&p=1&l=Python

versus

3.3k files found on GitHub that define to_dictionary
functions:
https://github.com/search?q=%22def+to_dictionary%28%22+language%3APython&type=code&l=Python

However, if there is a preference for this library to use
to_dictionary instead let me know and I will adjust.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • Passes tox testing as outlined in CONTRIBUTING.md

Does This PR Require a Contrib Repo Change?

  • No.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@sernst sernst requested a review from a team as a code owner July 2, 2023 13:58
@sernst sernst mentioned this pull request Jul 2, 2023
1 task
Copy link
Member

@pmcollins pmcollins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a fan of these changes.

@@ -194,30 +210,28 @@ def __eq__(self, other: object) -> bool:
return NotImplemented
return self.__dict__ == other.__dict__

def to_dict(self) -> LogRecordDict:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not add a new method here, but use __iter__ and dict instead:

# dict_test.py
class LogRecord:

    def __init__(self, body, severity_number):

        self.body = body
        self.severity_number = severity_number

    def __iter__(self):

        for key in self.__dict__:
            yield key, getattr(self, key)


log_record = LogRecord("the body", "1")

print(dict(log_record))
tigre@hilleman:~/sandbox$ python3 dict_test.py 
{'body': 'the body', 'severity_number': '1'}

@@ -130,6 +131,13 @@ def attributes(self) -> types.Attributes:
pass


class LinkDict(typing.TypedDict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typing.TypedDict was introduced in 3.8. This is why tests are failing for pypy and 3.7.

@lzchen
Copy link
Contributor

lzchen commented Aug 16, 2023

@sernst
Are you still working on this?

@sernst
Copy link
Author

sernst commented Aug 18, 2023

@lzchen Yep, will re-engage here soon.

Currently, the `LogRecord.to_json`
method serializes the resource object
using `repr` of its attributes. This
differs from how the serialization
process is handled in `ReadableSpan.to_json`
and `MetricsData.to_json`, which utilize the
`Resource.to_json` functionality directly.
Using `repr` does not produce a json-parseable
output and doesn't follow the same depth of
serialization as the other two signal types.
Therefore, this change carries over the
serialization process from the spans and
metrics signal types to the logs type.

Fixes open-telemetry#3345
Introduce `to_dict` to the objects included in the existing
JSON serialization process for `ReadableSpan`, `MetricsData`,
`LogRecord`, and `Resource` objects. This includes adding
`to_dict` to objects that are included within the serialized
data structures of these objects. In places where `repr()`
serialization was used, it has been replaced by a
JSON-compatible serialization instead. Inconsistencies between
null and empty string values were preserved, but in cases
where attributes are optional, an empty dictionary is provided
as well to be more consistent with cases where attributes are
not optional and an empty dictionary represents no attributes
were specified on the containing object.

These changes also included:
1. Dictionary typing was included for all the `to_dict`
   methods for clarity in subsequent usage.
2. `DataT` and `DataPointT` were did not include the
   exponential histogram types in point.py, and so those
   were added with new `to_json` and `to_dict` methods as
   well for consistency. It appears that the exponential
   types were added later and including them in the types
   might have been overlooked. Please let me know if that
   is a misunderstanding on my part.
3. OrderedDict was removed in a number of places
   associated with the existing `to_json` functionality
   given its redundancy for Python 3.7+ compatibility.
   I was assuming this was legacy code for previous
   compatibility, but please let me know if that's not
   the case as well.
4. `to_dict` was added to objects like `SpanContext`,
   `Link`, and `Event` that were previously being
   serialized by static methods within the `ReadableSpan`
   class and accessing private/protected members.
   This simplified the serialization in the `ReadableSpan`
   class and those methods were removed. However, once
   again, let me know if there was a larger purpose to
   those I could not find.

Finally, I used `to_dict` as the method names here to be
consistent with other related usages. For example,
`dataclasses.asdict()`. But, mostly because that was by
far the most popular usage within the larger community:

328k files found on GitHub that define `to_dict` functions,
which include some of the most popular Python libraries
to date:
https://github.com/search?q=%22def+to_dict%28%22+language%3APython&type=code&p=1&l=Python

versus

3.3k files found on GitHub that define `to_dictionary`
functions:
https://github.com/search?q=%22def+to_dictionary%28%22+language%3APython&type=code&l=Python

However, if there is a preference for this library to use
`to_dictionary` instead let me know and I will adjust.

Fixes open-telemetry#3364
@ocelotl ocelotl self-assigned this Feb 2, 2024
@pmcollins
Copy link
Member

Hi all, thank you for your contributions up to this point -- I think this is a positive PR. Can we work on getting it across the line? I'm happy to help too.

IMO these to_dict implementations are helpful:
e.g. https://github.com/open-telemetry/opentelemetry-python/pull/3780/files#r1524771566

@sernst
Copy link
Author

sernst commented Mar 18, 2024

I'm happy to pick it back up if others think it is useful. It didn't seem like there was much excitement for it, and I wasn't really excited to back support for 3.7 given that version was EoL by the Python community in general already. But I believe that the extended lifetime support for this project beyond the Python community's accepted EoL lifetime is now passed as well, which would simplify the process. Some decisions need to be made on the exact implementation though, and I'm not sure what comments made here are opinion versus decision. If some of that can be cleared up, I'd be happy to give it a refresh assuming that the code hasn't drifted too much to make it a redo.

@pmcollins
Copy link
Member

I'm happy to pick it back up if others think it is useful. It didn't seem like there was much excitement for it, and I wasn't really excited to back support for 3.7 given that version was EoL by the Python community in general already. But I believe that the extended lifetime support for this project beyond the Python community's accepted EoL lifetime is now passed as well, which would simplify the process. Some decisions need to be made on the exact implementation though, and I'm not sure what comments made here are opinion versus decision. If some of that can be cleared up, I'd be happy to give it a refresh assuming that the code hasn't drifted too much to make it a redo.

Awesome -- I think this repo would benefit from these changes.

And yes, since 3.7 is no longer supported that conversation probably can be resolved now. (I have resolved my comments above as well.)

Looks like we need to agree upon whether we use to_dict vs __iter__? to_dict seems a bit more straightforward and complementary to the existing to_json methods but I'd like to hear your thoughts @ocelotl.

@ocelotl
Copy link
Contributor

ocelotl commented Mar 20, 2024

I'm happy to pick it back up if others think it is useful. It didn't seem like there was much excitement for it, and I wasn't really excited to back support for 3.7 given that version was EoL by the Python community in general already. But I believe that the extended lifetime support for this project beyond the Python community's accepted EoL lifetime is now passed as well, which would simplify the process. Some decisions need to be made on the exact implementation though, and I'm not sure what comments made here are opinion versus decision. If some of that can be cleared up, I'd be happy to give it a refresh assuming that the code hasn't drifted too much to make it a redo.

Awesome -- I think this repo would benefit from these changes.

And yes, since 3.7 is no longer supported that conversation probably can be resolved now. (I have resolved my comments above as well.)

Looks like we need to agree upon whether we use to_dict vs __iter__? to_dict seems a bit more straightforward and complementary to the existing to_json methods but I'd like to hear your thoughts @ocelotl.

We should not add a method named to_dict. Every other object that can be converted to a dictionary gets converted to a dictionary by using dict(), this should not be the exception.

BTW, I was working on a PR that would add __repr__ to all classes: #3662. Had to put it in standby for the time being but I think there is a pattern in the way #3662 represents objects as strings that could be used here too.

@pmcollins
Copy link
Member

Sounds fine to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants