-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: Deprecate and standardize exists_table #2905
Conversation
|
can you rebase |
|
can you rebase |
|
@datapythonista can you rebase here |
Unit Test Results 19 files 19 suites 1h 47m 15s ⏱️ Results for commit ffe03a2. ♻️ This comment has been updated with latest results. |
| @@ -266,11 +266,15 @@ def test_close_drops_temp_tables(con, test_data_dir): | |||
|
|
|||
| table = con.parquet_file(hdfs_path) | |||
|
|
|||
| name = table.op().name | |||
| assert con.exists_table(name) is True | |||
| qualified_name = table.op().name | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading through this test, and the code path it's trying to exercise I no longer think this test is useful nor is there any real way to make this test reliable.
The main question here is: what should happen to any object that is referencing a temporary object from an open database connection, when that connection is closed?
I don't think this behavior is well defined, nor should we try to define it. However, since we're depending on __del__ to clean up ImpalaTemporaryTables (roughly speaking) when they are garbage collected, we should definitely not be dropping tables when a connection is closed.
I will submit a PR to address that.
In this PR I think you can remove this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so this is actually a different problem (though the lifetime problem still exists).
The issue is that con.exists_table(name) has different semantics than what you're inlining here.
exists_table(like=name) will return true if the list of tables matching name is non-empty.
What you've written here is to assert that whatever tables are listed contains name as an exact match.
The important thing to note is that this is only true when con is connected to the same database that name is contained in. Otherwise, the test will fail.
The solution is to write the assertion as assert con.list_tables(like=name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you assert the FutureWarning in a test (I think this is how we handled other deprecated items)
also add a release note
| @@ -266,11 +266,15 @@ def test_close_drops_temp_tables(con, test_data_dir): | |||
|
|
|||
| table = con.parquet_file(hdfs_path) | |||
|
|
|||
| name = table.op().name | |||
| assert con.exists_table(name) is True | |||
| qualified_name = table.op().name | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so this is actually a different problem (though the lifetime problem still exists).
The issue is that con.exists_table(name) has different semantics than what you're inlining here.
exists_table(like=name) will return true if the list of tables matching name is non-empty.
What you've written here is to assert that whatever tables are listed contains name as an exact match.
The important thing to note is that this is only true when con is connected to the same database that name is contained in. Otherwise, the test will fail.
The solution is to write the assertion as assert con.list_tables(like=name).
|
I saw the But I don't think the problem with the test is that, I did try what you say, and also printed the tables in the database to see if it was a prefix problem, or something related, but the table just doesn't exist. |
|
This is green now. If it's still find to delete the tricky test, this should be ready. |
Can you clarify this a bit? Ibis doesn't currently have this behavior. Here's me executing a few
The table exists, it's just in a different database. What's happening is the prefix is being split off by a regular expression and that prefix is used as the database in a SHOW TABLES IN __ibis_tmp LIKE '__ibis_tmp_<SOME UUID>'query. The We should not delete the test until we understand why it fails. |
ibis/backends/base/__init__.py
Outdated
| '`name in client.list_tables()` instead.', | ||
| FutureWarning, | ||
| ) | ||
| return name in self.client.list_tables(database=database) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be implemented the way it was before as len(self.client.list_tables(like=name, database=database)) > 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name won't match tables in other databases, so I don't think you can use in here. Since this isn't doing something trivial, should we reconsider deprecating it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Standardizing it seems like a good idea FWIW, just not sure about removal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand what's Impala doing. I don't think other backends are affected.
We've got a syntax for checking other databases, name in con.list_tables(database='foo'). And I think it's better than this magic failing the test. Just the explicit syntax doesn't seem to work in Impala (or there was an error in my code when I tried that, but I don't think so). I'm using the previous syntax of checking the list of tables with the like param for now, so we can get tests passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand what's Impala doing. I don't think other backends are affected.
Did you read my comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I think it's better than this magic failing the test.
Are you referring to the regular expression splitting?
Just the explicit syntax doesn't seem to work in Impala
Are you sure about that? Here's an example that seems to work just fine:
(Pdb) list
265
266 table = con.parquet_file(hdfs_path)
267
268 name = table.op().name
269 breakpoint()
270 -> assert con.exists_table(name) is True
271 con.close()
272
273 assert not con.exists_table(name)
274
275
(Pdb) con.list_tables('__ibis_tmp_81c73096e0924eafba359d2a1de85f85', database='__ibis_tmp')
['__ibis_tmp_81c73096e0924eafba359d2a1de85f85']
(Pdb) name
'__ibis_tmp.`__ibis_tmp_81c73096e0924eafba359d2a1de85f85`'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mhh, I see, you're right. I didn't fully understand the problem, now it's clear. Then, what's different in Impala is that other backends don't support fully qualified table names, as Impala handles in list_tables. Not sure why the test wasn't working when I splittted the table myself in the test, I guess I missed something.
In any case, I think this PR is fine now. Or is there anything else you would change?
|
I think this is good to go! @datapythonista Thanks for putting up with my reviews! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. can you rebase & also i think we have a link somewhere for the deprecated apis? pls add to that
|
Good point, added to #2863 and rebased |
|
thanks @datapythonista! |
Not very useful, and backends reimplement it for no reason.