Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Centralize JSON SerDe into helper functions #2259

Merged

Conversation

BuzzCutNorman
Copy link
Contributor

@BuzzCutNorman BuzzCutNorman commented Feb 20, 2024

I was wondering if it made sense to move all the serialize and deserialize functions into a helper library that we could reference in other areas. I thought singer_sdk.helper._util could be a place for it.

Copy link

codspeed-hq bot commented Feb 20, 2024

CodSpeed Performance Report

Merging #2259 will not alter performance

Comparing BuzzCutNorman:refactor-move-json-serde-to-helpers (f848370) with main (b065e5e)

Summary

✅ 6 untouched benchmarks

Copy link

codecov bot commented Feb 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.41%. Comparing base (b065e5e) to head (f848370).
Report is 143 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2259      +/-   ##
==========================================
+ Coverage   89.23%   89.41%   +0.18%     
==========================================
  Files          54       58       +4     
  Lines        4782     4779       -3     
  Branches      935      933       -2     
==========================================
+ Hits         4267     4273       +6     
+ Misses        359      353       -6     
+ Partials      156      153       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@BuzzCutNorman BuzzCutNorman changed the title moved deserialize_json and added serialize_json to _util refactor: Relocate json serialize and deserialize functions in the helpers Feb 20, 2024
@BuzzCutNorman
Copy link
Contributor Author

BuzzCutNorman commented Feb 20, 2024

@edgarrmondragon when you get a chance, please take a look and let me know what you think of this idea. I was working on the msgspec pr and found I had to copy the json encoder code over from io_base to _flattening. After doing that I thought it might make sense to move these functions into the helpers.

@BuzzCutNorman BuzzCutNorman changed the title refactor: Relocate json serialize and deserialize functions in the helpers refactor: Relocate json serialize and deserialize functions into the helpers Feb 20, 2024
@edgarrmondragon edgarrmondragon force-pushed the refactor-move-json-serde-to-helpers branch from ed8f39f to b9faa7f Compare July 12, 2024 22:40
@edgarrmondragon
Copy link
Collaborator

So we currently have this

$ rg -. serde singer_sdk
singer_sdk/helpers/_flattening.py
13:from singer_sdk._singerlib.serde import serialize_json

singer_sdk/contrib/batch_encoder_jsonl.py
9:from singer_sdk._singerlib.serde import serialize_json

singer_sdk/_singerlib/messages.py
11:from singer_sdk._singerlib.serde import serialize_json

singer_sdk/tap_base.py
13:from singer_sdk._singerlib.serde import serialize_json

singer_sdk/io_base.py
14:from singer_sdk._singerlib.serde import deserialize_json

singer_sdk/connectors/sql.py
15:from singer_sdk._singerlib import CatalogEntry, MetadataMapping, Schema, serde
1167:        return serde.serialize_json(obj)
1183:        return serde.deserialize_json(json_str)

singer_sdk/sinks/core.py
19:from singer_sdk._singerlib.serde import deserialize_json

And I think not all of these need to factor out SerDe. In particular, tap_base.py only uses json.dumps to print the discovered catalog. That operation doesn't need to be extremely performant nor does it need (so far as a I know) special decimal or NaN handling.

The others, I'm not sure currently:

  • Flattening may benefit since it's done to all records in a stream.
  • JSONL Batch reading/writing could also benefit, maybe greatly
  • Anything to do with Singer message writing and reading does certainly benefit
  • SQL JSON column reading/writing is an interesting case. I think it might benefit.

These four cases still architecturally benefit from having a centralized approach that was missing previously, and it was causing inconsistent handling of whitespace and decimals.

@edgarrmondragon edgarrmondragon changed the title refactor: Relocate json serialize and deserialize functions into the helpers refactor: Centralize JSON SerDe into helper functions Jul 12, 2024
@edgarrmondragon edgarrmondragon force-pushed the refactor-move-json-serde-to-helpers branch from 804537f to dc5c7a1 Compare July 12, 2024 23:15
@edgarrmondragon edgarrmondragon force-pushed the refactor-move-json-serde-to-helpers branch from ff38602 to 96525f1 Compare July 13, 2024 00:33
This way we can catch something like `msgspec.DecodeError` and raise
the more generic `InvalidInputLine`.
@edgarrmondragon edgarrmondragon force-pushed the refactor-move-json-serde-to-helpers branch from 946a0cc to 5d98422 Compare July 13, 2024 02:55
@edgarrmondragon edgarrmondragon force-pushed the refactor-move-json-serde-to-helpers branch from 5d98422 to 8f390f6 Compare July 13, 2024 02:57
@edgarrmondragon edgarrmondragon force-pushed the refactor-move-json-serde-to-helpers branch from 20a75cd to a5f67a5 Compare July 13, 2024 03:08
@edgarrmondragon
Copy link
Collaborator

edgarrmondragon commented Jul 16, 2024

These are the only breaking API changes in this PR according to griffe:

singer_sdk/mapper_base.py:0: InlineMapper.format_message: Public object was removed
singer_sdk/mapper_base.py:0: InlineMapper.write_message: Public object was removed
singer_sdk/mapper_base.py:0: InlineMapper.listen: Public object was removed
singer_sdk/mapper_base.py:0: InlineMapper.deserialize_json: Public object was removed
singer_sdk/tap_base.py:0: SQLTap.format_message: Public object was removed
singer_sdk/tap_base.py:0: SQLTap.write_message: Public object was removed
singer_sdk/tap_base.py:0: Tap.format_message: Public object was removed
singer_sdk/tap_base.py:0: Tap.write_message: Public object was removed
singer_sdk/target_base.py:0: SQLTarget.listen: Public object was removed
singer_sdk/target_base.py:0: SQLTarget.deserialize_json: Public object was removed
singer_sdk/target_base.py:0: Target.listen: Public object was removed
singer_sdk/target_base.py:0: Target.deserialize_json: Public object was removed
singer_sdk/io_base.py:0: logger: Public object was removed
singer_sdk/io_base.py:0: SingerReader: Public object was removed
singer_sdk/io_base.py:0: SingerWriter: Public object was removed

I think these are acceptable. The plugin class methods weren't really removed, but the superclass now lives in the private _singerlib submodule. The important singer_sdk.io_base members are not really removed thanks to the __all__ declaration in singer_sdk._singerlib._encoding.

@edgarrmondragon
Copy link
Collaborator

Thanks @BuzzCutNorman!

@edgarrmondragon edgarrmondragon merged commit 4a066fe into meltano:main Jul 16, 2024
29 of 30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants