
CLN: Deprecate and standardize exists_table #2905

Merged: 32 commits, merged Sep 22, 2021


@datapythonista (Contributor)

exists_table is not very useful, and backends reimplement it for no reason.

@datapythonista added the refactor and backends labels Aug 16, 2021
@jreback (Contributor) commented Aug 17, 2021

Can you rebase?

@jreback (Contributor) commented Aug 30, 2021

Can you rebase?

@datapythonista mentioned this pull request Sep 6, 2021
@jreback (Contributor) commented Sep 10, 2021

@datapythonista can you rebase here?

@github-actions (Contributor) bot commented Sep 22, 2021

Unit Test Results

19 files, 19 suites, 1h 47m 15s ⏱️
10 777 tests: 8 735 passed ✔️, 2 042 skipped 💤, 0 failed ❌
54 579 runs: 43 801 passed ✔️, 10 778 skipped 💤, 0 failed ❌

Results for commit ffe03a2.

♻️ This comment has been updated with latest results.

@@ -266,11 +266,15 @@ def test_close_drops_temp_tables(con, test_data_dir):

     table = con.parquet_file(hdfs_path)

-    name = table.op().name
-    assert con.exists_table(name) is True
+    qualified_name = table.op().name
@cpcloud (Member), Sep 22, 2021

After reading through this test and the code path it's trying to exercise, I no longer think this test is useful, nor is there any real way to make it reliable.

The main question here is: what should happen to any object that is referencing a temporary object from an open database connection, when that connection is closed?

I don't think this behavior is well defined, nor should we try to define it. However, since we're depending on __del__ to clean up ImpalaTemporaryTables (roughly speaking) when they are garbage collected, we should definitely not be dropping tables when a connection is closed.
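A minimal sketch of the lifetime model described above, using hypothetical FakeCatalog and TemporaryTable classes rather than Ibis's actual ImpalaTemporaryTable: the table is dropped in __del__ when its handle is garbage collected, so closing the connection never needs to drop it. It assumes CPython, where reference counting runs __del__ as soon as the last reference disappears; other interpreters may finalize later.

```python
class FakeCatalog:
    """Hypothetical stand-in for a connection's table catalog (not Ibis code)."""

    def __init__(self):
        self.tables = set()

    def create_table(self, name):
        self.tables.add(name)

    def drop_table(self, name):
        # Dropping a table that is already gone is a no-op here.
        self.tables.discard(name)


class TemporaryTable:
    """Hypothetical handle: the table lives exactly as long as this object."""

    def __init__(self, catalog, name):
        self.catalog = catalog
        self.name = name
        catalog.create_table(name)

    def __del__(self):
        # Cleanup is tied to garbage collection of the handle,
        # not to closing the connection.
        self.catalog.drop_table(self.name)


catalog = FakeCatalog()
tmp = TemporaryTable(catalog, "__ibis_tmp_demo")
assert "__ibis_tmp_demo" in catalog.tables
del tmp  # CPython: refcount reaches zero and __del__ runs immediately
assert "__ibis_tmp_demo" not in catalog.tables
```

If close() also dropped temporary tables, a handle that is still alive would refer to a table that no longer exists, which is the conflict described above.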

I will submit a PR to address that.

In this PR I think you can remove this test.

Member

Ok, so this is actually a different problem (though the lifetime problem still exists).

The issue is that con.exists_table(name) has different semantics than what you're inlining here.

exists_table(name) returns True if list_tables(like=name) is non-empty, i.e. if any table matches name as a pattern.

What you've written here asserts that the listed tables contain name as an exact match.

The important thing to note is that this is only true when con is connected to the same database that name is contained in. Otherwise, the test will fail.

The solution is to write the assertion as assert con.list_tables(like=name).
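To make the difference concrete, here is a small sketch against a hypothetical in-memory catalog. CATALOG, CURRENT_DATABASE and the helper functions are illustrative, not Ibis APIs, and treating like as an fnmatch-style glob is an assumption (real LIKE pattern syntax is backend specific).

```python
import fnmatch

# Hypothetical in-memory catalog: database name -> table names (not Ibis code).
CATALOG = {
    "default": ["alltypes", "functional_alltypes"],
    "__ibis_tmp": ["__ibis_tmp_81c73096"],
}
CURRENT_DATABASE = "default"


def list_tables(like=None, database=None):
    """List tables in `database`, defaulting to the current database.

    Assumption: `like` behaves like a glob, so a pattern without
    wildcards matches only the exact table name.
    """
    names = CATALOG.get(database or CURRENT_DATABASE, [])
    if like is None:
        return list(names)
    return [n for n in names if fnmatch.fnmatchcase(n, like)]


def exists_via_like(name, database=None):
    # Pattern semantics: true if any table matches `name`.
    return len(list_tables(like=name, database=database)) > 0


def exists_via_in(name, database=None):
    # Inlined semantics: `name` must literally appear in one database's list.
    return name in list_tables(database=database)


# Exact name in the current database: both checks agree.
assert exists_via_like("alltypes") and exists_via_in("alltypes")
# A wildcard pattern matches via `like`, but is not a literal table name.
assert exists_via_like("all*") and not exists_via_in("all*")
# The temp table lives in another database, so the bare membership check misses it.
assert not exists_via_in("__ibis_tmp_81c73096")
assert exists_via_in("__ibis_tmp_81c73096", database="__ibis_tmp")
```

The last two checks mirror the point above: exact membership only succeeds against the database that actually holds the table.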

@jreback (Contributor) left a comment

Can you assert the FutureWarning in a test? (I think this is how we handled other deprecated items.)

Also, please add a release note.
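A sketch of such a test, using only the standard library so it runs on its own. exists_table here is a hypothetical free function standing in for the deprecated backend method, and the warning text is illustrative rather than the PR's wording.

```python
import warnings


def exists_table(tables, name):
    """Hypothetical deprecated helper standing in for a backend method."""
    warnings.warn(
        "exists_table is deprecated",  # illustrative wording only
        FutureWarning,
        stacklevel=2,
    )
    return name in tables


def test_exists_table_warns():
    with warnings.catch_warnings(record=True) as caught:
        # Record the warning even if it was already emitted once before.
        warnings.simplefilter("always")
        result = exists_table(["alltypes"], "alltypes")
    assert result is True
    assert any(issubclass(w.category, FutureWarning) for w in caught)


test_exists_table_warns()
```

In a pytest suite the same check is typically written as with pytest.warns(FutureWarning): followed by the call.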

@jreback added this to the Next release milestone Sep 22, 2021

@datapythonista (Contributor, Author)

I saw the like issue. I think it's a bug, but I didn't overthink it, as exists_table is being deprecated anyway. I don't think con.exists_table('cat') should be True if a table catalog exists but no table cat. This is indirectly being fixed with this cleanup.

But I don't think that's the problem with the test. I did try what you suggested, and also printed the tables in the database to see whether it was a prefix problem or something related, but the table just doesn't exist.

@datapythonista (Contributor, Author)

This is green now. If it's still fine to delete the tricky test, this should be ready.

@cpcloud (Member) commented Sep 22, 2021

> I saw the like issue. I think it's a bug, but I didn't overthink it, as exists_table is being deprecated anyway. I don't think con.exists_table('cat') should be True if a table catalog exists but no table cat. This is indirectly being fixed with this cleanup.

Can you clarify this a bit? Ibis doesn't currently have this behavior. Here's me executing a few exists_table calls in the test:

(Pdb) list
265
266         table = con.parquet_file(hdfs_path)
267
268         name = table.op().name
269         breakpoint()
270  ->     assert con.exists_table(name) is True
271         con.close()
272
273         assert not con.exists_table(name)
274
275
(Pdb) con.list_tables()
['alltypes', 'functional_alltypes', 'tpch_customer', 'tpch_lineitem', 'tpch_nation', 'tpch_orders', 'tpch_part', 'tpch_partsupp', 'tpch_region', 'tpch_region_avro', 'tpch_supplier']
(Pdb) con.exists_table('all')
False
(Pdb) con.exists_table('func')
False
(Pdb) con.exists_table('tpch_')
False

> But I don't think that's the problem with the test. I did try what you suggested, and also printed the tables in the database to see whether it was a prefix problem or something related, but the table just doesn't exist.

The table exists; it's just in a different database.

What's happening is that the database prefix is split off the qualified name by a regular expression, and that prefix is used as the database in a query of the form

SHOW TABLES IN __ibis_tmp LIKE '__ibis_tmp_<SOME UUID>'

The LIKE doesn't contain any wildcards, so it'll perform an exact match.
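A rough sketch of that splitting step. The regular expression, split_qualified_name, show_tables_query and the "default" database name are assumptions for illustration, not Ibis's actual implementation; only the query shape and the example name come from this thread.

```python
import re

# Assumed shape of a qualified name: an optional, possibly backquoted database,
# a dot, then a possibly backquoted table. Illustrative, not Ibis's actual regex.
QUALIFIED_NAME = re.compile(r"^(?:`?([^`.]+)`?\.)?`?([^`]+)`?$")


def split_qualified_name(name):
    """Return (database, table); database is None when the name is unqualified."""
    match = QUALIFIED_NAME.match(name)
    if match is None:
        raise ValueError(f"cannot parse table name: {name!r}")
    return match.group(1), match.group(2)


def show_tables_query(name, current_database="default"):
    """Build a SHOW TABLES ... LIKE query for a possibly qualified name."""
    database, table = split_qualified_name(name)
    return f"SHOW TABLES IN {database or current_database} LIKE '{table}'"


name = "__ibis_tmp.`__ibis_tmp_81c73096e0924eafba359d2a1de85f85`"
print(show_tables_query(name))
# SHOW TABLES IN __ibis_tmp LIKE '__ibis_tmp_81c73096e0924eafba359d2a1de85f85'
```

This also shows why an inlined membership check fails for this name: list_tables() returns bare table names, while name carries the __ibis_tmp. prefix and backquotes.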

We should not delete the test until we understand why it fails.

'`name in client.list_tables()` instead.',
FutureWarning,
)
return name in self.client.list_tables(database=database)
Member

Should this be implemented the way it was before, as len(self.client.list_tables(like=name, database=database)) > 0?
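The like-based check can be combined with the PR's goal of a single shared implementation. A minimal sketch, assuming a hypothetical BaseBackend/InMemoryBackend pair (not Ibis's real class hierarchy): the base class owns the deprecated exists_table, and each backend only provides list_tables.

```python
import warnings


class BaseBackend:
    """Hypothetical base class: concrete backends only implement list_tables."""

    def list_tables(self, like=None, database=None):
        raise NotImplementedError

    def exists_table(self, name, database=None):
        # One shared (deprecated) implementation instead of one per backend.
        warnings.warn(
            "exists_table is deprecated",  # illustrative wording only
            FutureWarning,
            stacklevel=2,
        )
        # Previous semantics: true if any table matches `name` via `like`.
        return len(self.list_tables(like=name, database=database)) > 0


class InMemoryBackend(BaseBackend):
    """Hypothetical backend backed by a dict of database -> table names."""

    def __init__(self, tables, current_database="default"):
        self._tables = tables
        self._current_database = current_database

    def list_tables(self, like=None, database=None):
        names = self._tables.get(database or self._current_database, [])
        # Simplification: `like` is treated as an exact name (no wildcards).
        return [n for n in names if like is None or n == like]
```

Keeping the like-based check preserves the old behavior for callers while the method is deprecated, and backends no longer need their own copies.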

Member

name won't match tables in other databases, so I don't think you can use the in operator here. Since this isn't doing something trivial, should we reconsider deprecating it?

Member

Standardizing it seems like a good idea FWIW, just not sure about removal.

Contributor Author

I don't fully understand what Impala is doing. I don't think other backends are affected.

We've got a syntax for checking other databases: name in con.list_tables(database='foo'). And I think it's better than this magic failing the test. Just the explicit syntax doesn't seem to work in Impala (or there was an error in my code when I tried that, but I don't think so). For now I'm using the previous approach of checking the list of tables with the like parameter, so we can get tests passing.

Member

> I don't fully understand what Impala is doing. I don't think other backends are affected.

Did you read my comment?

@cpcloud (Member), Sep 22, 2021

> And I think it's better than this magic failing the test.

Are you referring to the regular expression splitting?

> Just the explicit syntax doesn't seem to work in Impala

Are you sure about that? Here's an example that seems to work just fine:

(Pdb) list
265
266         table = con.parquet_file(hdfs_path)
267
268         name = table.op().name
269         breakpoint()
270  ->     assert con.exists_table(name) is True
271         con.close()
272
273         assert not con.exists_table(name)
274
275
(Pdb) con.list_tables('__ibis_tmp_81c73096e0924eafba359d2a1de85f85', database='__ibis_tmp')
['__ibis_tmp_81c73096e0924eafba359d2a1de85f85']
(Pdb) name
'__ibis_tmp.`__ibis_tmp_81c73096e0924eafba359d2a1de85f85`'

Contributor Author

Hmm, I see, you're right. I didn't fully understand the problem; now it's clear. So what's different in Impala is that its list_tables handles fully qualified table names, which other backends don't support. I'm not sure why the test wasn't working when I split the table name myself in the test; I guess I missed something.

In any case, I think this PR is fine now. Or is there anything else you would change?

@cpcloud (Member) commented Sep 22, 2021

I think this is good to go! @datapythonista Thanks for putting up with my reviews!

@jreback (Contributor) left a comment

LGTM. Can you rebase? Also, I think we have a link somewhere listing the deprecated APIs; please add this there.

@datapythonista (Contributor, Author)

Good point, added to #2863 and rebased.

@cpcloud merged commit ffe03a2 into ibis-project:master Sep 22, 2021
@cpcloud (Member) commented Sep 22, 2021

Thanks @datapythonista!

@cpcloud removed this from the Next release milestone Jan 7, 2022
Labels: backends (Issues related to all backends), refactor (Issues or PRs related to refactoring the codebase)
Projects: None yet
3 participants