Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(taps): Change SQLStream.schema into a cached property #1745

Merged

Conversation

mjsqu
Copy link
Contributor

@mjsqu mjsqu commented Jun 2, 2023

Schema is called at least once per record and does not change within a stream, so it should be cached for performance benefits.

https://app.slack.com/client/TFG99TU9K/C013Z450LCD/thread/C013Z450LCD-1685703921.187269


📚 Documentation preview 📚: https://meltano-sdk--1745.org.readthedocs.build/en/1745/

Schema is called at least once per record and does not change within a stream, so it should be cached for performance benefits.
@edgarrmondragon edgarrmondragon changed the title feat(taps): Change SQLStream schema into a cached_property feat(taps): Change SQLStream.schema into a cached property Jun 5, 2023
@edgarrmondragon
Copy link
Collaborator

@mjsqu can you fix the failing mypy check? Some of the conversation in python/mypy#5107 (comment) might be helpful.

Thanks!

mjsqu and others added 2 commits June 6, 2023 07:38
mypy checks fail because the schema return type is lru_cache:

```
singer_sdk/streams/sql.py:82: error: Signature of "schema" incompatible with supertype "Stream"  [override]
singer_sdk/streams/sql.py:154: error: Argument "schema" to "get_selected_schema" has incompatible type "_lru_cache_wrapper[Dict[Any, Any]]"; expected "Dict[Any, Any]"  [arg-type]
Found 2 errors in 1 file (checked 59 source files)
```

The update should pass over the `_lru_cache_wrapper` type and return a `Dict` as expected
@mjsqu
Copy link
Contributor Author

mjsqu commented Jun 5, 2023

I had a quick attempt at this, but it didn't work particularly well - plus I think the proposed solution in the thread sets the lru_cache decorator to do nothing (pass), which removes the benefit?

I think I need to follow the example by @qbatten:
0b5d6b0

mjsqu and others added 4 commits June 6, 2023 10:41
- Remove `lru_cache`
- Set `_cached_schema` up as a class variable to hold the schema for each stream
- Check for presence of `self._cached_schema` on the stream
@codecov
Copy link

codecov bot commented Jun 5, 2023

Codecov Report

Merging #1745 (d3c8da5) into main (cbb858a) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1745   +/-   ##
=======================================
  Coverage   85.47%   85.48%           
=======================================
  Files          59       59           
  Lines        4888     4891    +3     
  Branches      802      803    +1     
=======================================
+ Hits         4178     4181    +3     
  Misses        510      510           
  Partials      200      200           
Impacted Files Coverage Δ
singer_sdk/streams/sql.py 86.95% <100.00%> (+0.59%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@mjsqu
Copy link
Contributor Author

mjsqu commented Jun 5, 2023

@edgarrmondragon 🥳

Thanks for the inspiration @qbatten

Done a quick test in https://github.com/mjsqu/tap-db2/blob/feature/meltano_sdk/tap_db2/client.py - still getting the speed benefit

@edgarrmondragon edgarrmondragon changed the title feat(taps): Change SQLStream.schema into a cached property refactor(taps): Change SQLStream.schema into a cached property Jun 5, 2023
Copy link
Collaborator

@edgarrmondragon edgarrmondragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mjsqu!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants