
fix: Removed unnecessary and problematic column caching #2352

Merged
merged 6 commits into meltano:main from raulbonet:rb/2325/deprecate-caching on Apr 16, 2024

Conversation

raulbonet
Contributor

@raulbonet raulbonet commented Mar 30, 2024

fixes #2325


📚 Documentation preview 📚: https://meltano-sdk--2352.org.readthedocs.build/en/2352/


codspeed-hq bot commented Mar 30, 2024

CodSpeed Performance Report

Merging #2352 will not alter performance

Comparing raulbonet:rb/2325/deprecate-caching (3f50ae4) with main (1d1afbe)

Summary

✅ 6 untouched benchmarks


codecov bot commented Mar 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.97%. Comparing base (4f07b67) to head (3f50ae4).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2352      +/-   ##
==========================================
- Coverage   88.98%   88.97%   -0.01%     
==========================================
  Files          54       54              
  Lines        4767     4763       -4     
  Branches      926      924       -2     
==========================================
- Hits         4242     4238       -4     
  Misses        364      364              
  Partials      161      161              

☔ View full report in Codecov by Sentry.

@pnadolny13
Contributor

pnadolny13 commented Apr 1, 2024

I agree that this is a bug and that it should be fixed, but I also worry about reverting back to the poor performance we were seeing in target-postgres MeltanoLabs/target-postgres#192 (i.e. prepare table took 30s per table) as a result. I haven't dug into this at all, but I wonder if there's a simple alternative way to fix the cache rather than removing it completely. cc @edgarrmondragon in case you have ideas.

@edgarrmondragon
Collaborator

edgarrmondragon commented Apr 1, 2024

> I agree that this is a bug and that it should be fixed, but I also worry about reverting back to the poor performance we were seeing in target-postgres MeltanoLabs/target-postgres#192 (i.e. prepare table took 30s per table) as a result. I haven't dug into this at all, but I wonder if there's a simple alternative way to fix the cache rather than removing it completely. cc @edgarrmondragon in case you have ideas.

Yeah, I worry too about reverting without more context. A closed issue is linked from this PR, so I'd prefer that a new issue be opened with clear instructions on how to reproduce the bad behavior.

@raulbonet
Contributor Author

Hi guys,

Thank you for your answers. I think either I am failing to understand the code, or we are not understanding each other.

  1. The column caching is currently not even being used in target-postgres.

The method get_table_columns() of the SDK, where the caching was implemented, is overridden there. This is precisely why your current tests are not failing in target-postgres:

def get_table_columns(  # type: ignore[override]
    self,
    schema_name: str,
    table_name: str,
    connection: sa.engine.Connection,
    column_names: list[str] | None = None,
) -> dict[str, sa.Column]:
    """Return a mapping of table columns.

    Overridden to support schema_name.

    Args:
        schema_name: schema name.
        table_name: table name to get columns for.
        connection: database connection.
        column_names: A list of column names to filter to.

    Returns:
        A mapping of column names to column objects.
    """
    inspector = sa.inspect(connection)
    columns = inspector.get_columns(table_name, schema_name)

    return {
        col_meta["name"]: sa.Column(
            col_meta["name"],
            col_meta["type"],
            nullable=col_meta.get("nullable", False),
        )
        for col_meta in columns
        if not column_names
        or col_meta["name"].casefold() in {col.casefold() for col in column_names}
    }
  2. I am precisely trying to deprecate some of these custom methods in target-postgres in order to use the built-in ones from the SDK and, in doing so, I started to leverage the caching and tests started to fail.

  3. The issue was closed by mistake: I merged the changes into my personal fork, which also closed the issue automatically; I have since re-opened it.

@edgarrmondragon, can you please tell me what context you are missing? As for reproducing the error, that is also covered in the original issue.
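For what it's worth, the failure mode in #2325 can be reduced to a few lines. This is a minimal sketch with hypothetical names (`Connector`, `_tables` as a stand-in for the real database), not the SDK's actual implementation: a connector shared between sinks caches a table's columns on first access, and a later schema change made through the same connector is never reflected in the cache.

```python
# Hypothetical minimal reproduction of the stale-cache problem this PR fixes.
class Connector:
    def __init__(self):
        self._column_cache = {}  # table name -> cached column names
        self._tables = {"users": ["id", "name"]}  # stand-in for the real DB

    def get_table_columns(self, table):
        # The problematic pattern: populate the cache once, never invalidate.
        if table not in self._column_cache:
            self._column_cache[table] = list(self._tables[table])
        return self._column_cache[table]

    def add_column(self, table, column):
        # DDL goes straight to the "database", bypassing the cache.
        self._tables[table].append(column)


shared = Connector()
before = shared.get_table_columns("users")  # cache is populated here
shared.add_column("users", "email")         # another sink evolves the schema
after = shared.get_table_columns("users")   # still returns the stale cache

assert "email" not in after  # the column exists in the DB but not in the cache
```

With the caching removed, `get_table_columns` would re-inspect the table and pick up the new column on every call, at the cost of the extra round trips discussed above.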

@raulbonet
Contributor Author

raulbonet commented Apr 2, 2024

@pnadolny13 regarding ideas on how to fix this while still leveraging caching, if I may, it comes back to what I was saying here:

I think the separation of concerns between SQLConnection and SQLSink is currently a bit unclear, which in my opinion is why target-postgres ended up diverging so much from the Meltano SDK: the methods in SQLConnection try both to encapsulate wrappers for common operations and to manage the optimization of those operations.

BUT, at the same time, the SQLConnection is reused, while what gets created every time there is a schema change is a SQLSink, not a SQLConnection.

And whenever you want to do something slightly out of the box, you need to override the whole method instead of being able to call super() and then do your custom thing afterwards (which is what happened in target-postgres).

If we maintain this logic (a sink is supposed to have a stable schema), then all caching logic has to live on the Sink, including the management of transactions (what you do in Postgres).

However, I am not sure the problems you had in target-postgres came from this. In my opinion, the biggest problem I currently see in target-postgres (I mentioned it somewhere, though I don't remember where) is that prepare_table() is called inside process_batch(), which drastically increases the number of calls.

But maybe I am just not understanding the code, I am the new guy here! :)
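The call-count concern above can be made concrete with a toy sketch (hypothetical functions, a counter standing in for the expensive inspection/DDL work): preparing the table inside `process_batch()` pays the cost once per batch, while preparing it once when the sink is created pays it once per (stream, schema) pair, since a sink's schema is assumed stable for its lifetime.

```python
# Toy comparison of per-batch vs. once-per-sink table preparation.
calls = {"per_batch": 0, "per_sink": 0}


def prepare_table(counter_key):
    calls[counter_key] += 1  # stands in for expensive inspection/DDL


def process_batches_naive(n_batches):
    # Pattern described for target-postgres: prepare inside every batch.
    for _ in range(n_batches):
        prepare_table("per_batch")


def process_batches_prepared_once(n_batches):
    # Prepare once up front; the schema is stable for the sink's lifetime.
    prepare_table("per_sink")
    for _ in range(n_batches):
        pass  # just load the batch


process_batches_naive(100)
process_batches_prepared_once(100)
# calls["per_batch"] == 100, calls["per_sink"] == 1
```

Under this framing, the growth in preparation calls is linear in the number of batches, which is already enough to explain multi-second `prepare table` times on wide tables without any caching at the connector level.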

@edgarrmondragon edgarrmondragon changed the title fix: deprecate column caching fix: Removed unnecessary and problematic column caching Apr 12, 2024
@edgarrmondragon
Collaborator

edgarrmondragon commented Apr 12, 2024

I think I understand the context a bit better now. It seems that #1864 made the sort of caching introduced by #1779 unnecessary?

Note: I think this is the same cause behind MeltanoLabs/target-snowflake#165 so we may wanna apply these changes there too?

@edgarrmondragon
Collaborator

I'm happy to merge this and ship a prerelease that we can test against MeltanoLabs/target-postgres :)

@edgarrmondragon edgarrmondragon added this pull request to the merge queue Apr 16, 2024
Merged via the queue into meltano:main with commit 834ea2d Apr 16, 2024
30 checks passed

Successfully merging this pull request may close these issues.

bug: Cache SQL columns and schemas does not work with a shared connector
4 participants