db.table_names() comes up empty on GCS #673

Closed
wjones127 opened this issue Dec 1, 2023 · 2 comments
Labels: bug, needs triaging, Python

Comments

@wjones127
Contributor

From a user report:

>>> import lancedb
>>> db = lancedb.connect("gs://xxxx/lance")
>>> db.table_names()
[]
>>> db.open_table("meta")
LanceTable(meta)
>>> t = db.open_table("meta")
>>> len(t.to_pandas())
11
>>>

We use PyArrow filesystems under the hood to introspect the bucket:

paths = filesystem.get_file_info(
    fs.FileSelector(get_uri_location(self.uri))
)

However, PyArrow's GCSFileSystem is known to report common prefixes as empty if there isn't a "directory marker" (see apache/arrow#32403), and object-store-rs doesn't create these directory markers.

I suspect that is the underlying issue. We could either create the directory markers or we could use some other mechanism to list the bucket.
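To illustrate the "some other mechanism" option, here is a minimal sketch that derives table names from a flat listing of object keys, so it does not depend on directory markers. The function name table_names_from_keys is hypothetical, and the sketch assumes each table lives under a "<name>.lance" prefix directly below the database root; it is not the fix that was eventually merged.

```python
def table_names_from_keys(keys, root):
    """Derive table names from a flat list of object keys under `root`.

    Assumption: each table is stored under a "<name>.lance/" prefix
    directly below the database root.
    """
    root = root.strip("/")
    prefix = root + "/" if root else ""
    names = set()
    for key in keys:
        if not key.startswith(prefix):
            continue  # object lives outside the database root
        rest = key[len(prefix):]
        # Only the first path component below the root can name a table.
        first, sep, _ = rest.partition("/")
        if sep and first.endswith(".lance"):
            names.add(first[: -len(".lance")])
    return sorted(names)


# Example keys as a flat object listing might return them (made-up values).
example_keys = [
    "lance/meta.lance/_versions/1.manifest",
    "lance/meta.lance/data/a.lance",
    "lance/other.lance/_versions/1.manifest",
    "elsewhere/x.lance/y",
]
print(table_names_from_keys(example_keys, "lance"))
```

Because the names come from the object keys themselves, an empty "directory" report from the filesystem layer would not hide existing tables.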

@wjones127 wjones127 added bug Something isn't working Python Python SDK labels Dec 1, 2023
@wjones127 wjones127 self-assigned this Dec 1, 2023
@sergiocorreia

FWIW, I see the same symptom on a Windows installation (Windows 10, Python 3.12), but for a different reason.

Given a path C:/foo/bar.lancedb, get_uri_location() returns /foo/bar.lancedb, which then fails when passed to filesystem.get_file_info() because, with the drive letter stripped, it is no longer a valid path.

If, instead of doing

paths = filesystem.get_file_info(
    fs.FileSelector(get_uri_location(self.uri))
)

we do

paths = filesystem.get_file_info(
    fs.FileSelector(self.uri)
)

then the code works correctly on Windows.
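The behavior described above is consistent with a helper that runs the path through a URL parser, which treats the drive letter as a URL scheme. The sketch below reproduces that with the standard library only. Both function names are hypothetical stand-ins, not LanceDB's actual get_uri_location(), and the single-letter-scheme check is just one possible workaround.

```python
from urllib.parse import urlparse


def naive_uri_location(uri):
    """Hypothetical stand-in for a URL-parsing location helper."""
    parsed = urlparse(uri)
    if not parsed.netloc:
        return parsed.path
    return parsed.netloc + parsed.path


def drive_aware_uri_location(uri):
    """Same idea, but leaves Windows drive paths untouched."""
    parsed = urlparse(uri)
    # A one-letter "scheme" is almost certainly a drive letter such as C:
    if len(parsed.scheme) == 1:
        return uri
    return naive_uri_location(uri)


print(urlparse("C:/foo/bar.lancedb").scheme)           # drive letter parsed as scheme
print(naive_uri_location("C:/foo/bar.lancedb"))        # drive letter is lost
print(drive_aware_uri_location("C:/foo/bar.lancedb"))  # path is preserved
print(drive_aware_uri_location("gs://xxxx/lance"))     # bucket URIs still resolve
```

Passing self.uri directly, as suggested above, avoids the problem for local Windows paths; whether that is also safe for URIs such as gs://... is not established in this thread.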

@changhiskhan changhiskhan added the needs triaging An outstanding issue that have not yet received a response and needs to be triaged label Dec 21, 2023
@westonpace
Contributor

The table_names feature should be fixed in #1059
