fix: Removed unnecessary and problematic column caching #2352
Conversation
CodSpeed Performance Report: Merging #2352 will not alter performance.
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #2352      +/-   ##
==========================================
- Coverage   88.98%   88.97%   -0.01%
==========================================
  Files          54       54
  Lines        4767     4763       -4
  Branches      926      924       -2
==========================================
- Hits         4242     4238       -4
  Misses        364      364
  Partials      161      161
```

☔ View full report in Codecov by Sentry.
I agree that this is a bug and that it should be fixed, but I also worry about reverting back to the poor performance we were seeing in target-postgres (MeltanoLabs/target-postgres#192, i.e. prepare table took 30s per table) as a result. I haven't dug into this at all, but I wonder if there's a simple alternative way to fix the cache rather than removing it completely. cc @edgarrmondragon in case you have ideas.
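One possible shape for such an alternative, as a minimal hypothetical sketch (the class and method names here are invented, and this is not the SDK's actual connector code): keep the column cache, but invalidate a table's entry whenever the connector itself alters that table, so repeated lookups stay fast without going stale.

```python
import sqlalchemy as sa


class CachingConnectorSketch:
    """Hypothetical sketch: cache column metadata per (schema, table),
    invalidating an entry whenever we alter that table ourselves."""

    def __init__(self, engine: sa.engine.Engine) -> None:
        self._engine = engine
        self._column_cache: dict[tuple[str, str], dict[str, sa.Column]] = {}

    def get_table_columns(
        self, schema_name: str, table_name: str
    ) -> dict[str, sa.Column]:
        key = (schema_name, table_name)
        if key not in self._column_cache:
            # Only pay the inspection cost on a cache miss.
            inspector = sa.inspect(self._engine)
            self._column_cache[key] = {
                col["name"]: sa.Column(col["name"], col["type"])
                for col in inspector.get_columns(table_name, schema_name)
            }
        return self._column_cache[key]

    def add_column(
        self, schema_name: str, table_name: str, column: sa.Column
    ) -> None:
        col_type = column.type.compile(self._engine.dialect)
        with self._engine.begin() as conn:
            conn.execute(
                sa.text(
                    f'ALTER TABLE "{schema_name}"."{table_name}" '
                    f'ADD COLUMN "{column.name}" {col_type}'
                )
            )
        # The table just changed under us: drop the cached entry so the
        # next get_table_columns() call re-inspects the live table.
        self._column_cache.pop((schema_name, table_name), None)
```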
Yeah, I worry too about reverting without more context. A closed issue is linked from this PR, so I'd prefer a new issue be opened with clear instructions on how to reproduce the bad behavior.
Hi guys, thank you for your answers. I think either I am not managing to understand the code, or we are not understanding each other.

The method in question is:
```python
def get_table_columns(  # type: ignore[override]
    self,
    schema_name: str,
    table_name: str,
    connection: sa.engine.Connection,
    column_names: list[str] | None = None,
) -> dict[str, sa.Column]:
    """Return a list of table columns.

    Overridden to support schema_name.

    Args:
        schema_name: schema name.
        table_name: table name to get columns for.
        connection: database connection.
        column_names: A list of column names to filter to.

    Returns:
        An ordered list of column objects.
    """
    inspector = sa.inspect(connection)
    columns = inspector.get_columns(table_name, schema_name)
    return {
        col_meta["name"]: sa.Column(
            col_meta["name"],
            col_meta["type"],
            nullable=col_meta.get("nullable", False),
        )
        for col_meta in columns
        if not column_names
        or col_meta["name"].casefold() in {col.casefold() for col in column_names}
    }
```
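To make the failure mode concrete, here is a minimal repro sketch of why caching the result of this lookup can go stale (using an in-memory SQLite database; the table and column names are made up):

```python
import sqlalchemy as sa

engine = sa.create_engine("sqlite:///:memory:")
with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE users (id INTEGER)"))

    print([c["name"] for c in sa.inspect(conn).get_columns("users")])
    # -> ['id']

    # Simulate a mid-run schema change, like an ALTER issued while
    # preparing the table for a widened stream schema.
    conn.execute(sa.text("ALTER TABLE users ADD COLUMN email TEXT"))

    # A fresh inspection sees the new column; a cache keyed only on the
    # table name would still serve the stale ['id'] result here.
    print([c["name"] for c in sa.inspect(conn).get_columns("users")])
    # -> ['id', 'email']
```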
@edgarrmondragon, can you please tell me what context you are missing? As for reproducing the error, that is also covered in the original issue.
@pnadolny13, about ideas on how to fix this while still leveraging caching, if I may, it's a bit what I was saying here: I think the separation of concerns between the `SQLConnector` and the `SQLSink` is currently a bit unclear.

BUT, at the same time, the `SQLConnector` is re-used, and what's created every time there is a schema change is a new `SQLSink`. And whenever you want to do something a bit out of the box, you need to override the whole method, instead of being able to call the base implementation.

If we maintain this logic (a sink is supposed to have a stable schema), then all caching logic has to go on the Sink, including the managing of transactions (which is what you do in Postgres). However, I am not sure whether the problems you had in target-postgres (MeltanoLabs/target-postgres#192) would come back.

But maybe I am just not understanding the code, I am the new guy here! :)
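If that "stable schema per sink" assumption holds, one hypothetical way to place the cache on the sink side could look like the sketch below (the names are invented, not the SDK's actual API): resolve the column map once when the sink is created and reuse it for the sink's lifetime, leaving the shared connector stateless.

```python
import sqlalchemy as sa


class SinkWithColumnCacheSketch:
    """Hypothetical sketch: a sink resolves its column map once, at
    creation time, and reuses it for its whole lifetime. The shared
    connector engine stays stateless and safe to reuse across sinks."""

    def __init__(
        self,
        engine: sa.engine.Engine,
        schema_name: str,
        table_name: str,
    ) -> None:
        # One inspection per sink, not one per batch of records. A schema
        # change in the stream creates a new sink, and therefore a fresh
        # lookup, so this cache can never go stale.
        inspector = sa.inspect(engine)
        self.columns: dict[str, sa.Column] = {
            col["name"]: sa.Column(col["name"], col["type"])
            for col in inspector.get_columns(table_name, schema_name)
        }
```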
I think I understand the context a bit better now. It seems that #1864 made the sort of caching introduced by #1779 unnecessary? Note: I think this is the same cause behind MeltanoLabs/target-snowflake#165, so we may wanna apply these changes there too?
I'm happy to merge this and ship a prerelease that we can test against MeltanoLabs/target-postgres :)
fixes #2325
📚 Documentation preview 📚: https://meltano-sdk--2352.org.readthedocs.build/en/2352/