Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to dataset URI creation for LanceTable #703

Closed
jaycgibbs opened this issue Dec 14, 2023 · 9 comments · Fixed by #724
Closed

Changes to dataset URI creation for LanceTable #703

jaycgibbs opened this issue Dec 14, 2023 · 9 comments · Fixed by #724

Comments

@jaycgibbs
Copy link

This is a bit of a niche problem that would matter primarily for Windows users, but dataset URI creation within lancedb.table.LanceTable._dataset_uri right now makes use of os.path.join which, on a Windows machine, means that a connection URI joined with a table name inserts backslashes instead of forward slashes. Would it be possible to either have this section modified to account for different filesystems? Apologies if this is already solvable / fixable via some method that I'm not aware of

@changhiskhan
Copy link
Contributor

Ok so it seems like we can use os.sep for this purpose. But it's not super straight forward. In windows, if the url is remote (eg S3), then you still want the backslashes instead of os.sep. To distinguish the two, we could perhaps use urlparse to get the uri scheme.

My understanding on windows is that the scheme for local paths will return one of 1) "", 2) "file", or 3) a single drive letter. I don't have a windows machine - would you be able to confirm that?

For the local path then, we can do something like os.path.join(base, os.sep, subdir). Could you try this on your windows machine? os.path.join("C:\Downloads", os.sep, "lancedb") does this give the expected result?

Thanks!

@jaycgibbs
Copy link
Author

On my (Windows) machine,

  • os.path.join("C:\Downloads", os.sep, "lancedb") returns C:\lancedb (ignores the previous segment)
  • os.path.join("C:\Downloads", "lancedb") returns C:\Downloads\lancedb

@changhiskhan
Copy link
Contributor

And just to clarify, does the backslash screw up the path reference on windows? What error are you seeing?

@sergiocorreia
Copy link

FYI, os.path.join does not expect os.sep as one of its arguments docs

On an urelated note, maybe consider using pathlib:

>>> from pathlib import Path
>>>Path("C:\Downloads") / "lancedb"
WindowsPath('C:/Downloads/lancedb')

>>>str(Path("C:\Downloads") / "lancedb")
'C:\\Downloads\\lancedb'

@jaycgibbs
Copy link
Author

jaycgibbs commented Dec 18, 2023

@changhiskhan It doesnt cause an error -- in my case I was creating a LanceTable in a given namespace / folder within S3. But because of the incorrect slash, a table with the name folder\table_name was created rather than a table within the folder

agree with @sergiocorreia that pathlib seems like the best thing to use here

@changhiskhan
Copy link
Contributor

FYI, os.path.join does not expect os.sep as one of its arguments docs

On an urelated note, maybe consider using pathlib:

>>> from pathlib import Path
>>>Path("C:\Downloads") / "lancedb"
WindowsPath('C:/Downloads/lancedb')

>>>str(Path("C:\Downloads") / "lancedb")
'C:\\Downloads\\lancedb'

The main issue is that pathlib doesn't work well for remote uri's. But I guess if we have to parse the uri scheme anyways it's probably the only option here.

@sergiocorreia
Copy link

The main issue is that pathlib doesn't work well for remote uri's. But I guess if we have to parse the uri scheme anyways it's probably the only option here.

If you go with pathlib, maybe then consider epath from Google: https://github.com/google/etils/tree/main/etils/epath which allows gs://, aws://, s3:// , etc.

See also this link for the motivation: https://discuss.python.org/t/make-pathlib-extensible/3428/144

@changhiskhan
Copy link
Contributor

etils looks interesting. Definitely nice to have one api for manipulating paths. It requires python 3.10 though and we still have a bunch of folks on py3.9 and EOL isn't very soon.

changhiskhan added a commit that referenced this issue Dec 20, 2023
Use pathlib for local paths so that pathlib
can handle the correct separator on windows.

Closes #703
@changhiskhan
Copy link
Contributor

@sergiocorreia @jaycgibbs lmk if you have thoughts on the PR and whether it'll solve your problem

wjones127 pushed a commit that referenced this issue Dec 20, 2023
Use pathlib for local paths so that pathlib
can handle the correct separator on windows.

Closes #703
changhiskhan added a commit that referenced this issue Dec 20, 2023
Use pathlib for local paths so that pathlib
can handle the correct separator on windows.

Closes #703
changhiskhan added a commit that referenced this issue Dec 20, 2023
Use pathlib for local paths so that pathlib
can handle the correct separator on windows.

Closes #703

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
raghavdixit99 pushed a commit to raghavdixit99/lancedb that referenced this issue Apr 5, 2024
Use pathlib for local paths so that pathlib
can handle the correct separator on windows.

Closes lancedb#703

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
westonpace pushed a commit that referenced this issue Apr 5, 2024
Use pathlib for local paths so that pathlib
can handle the correct separator on windows.

Closes #703

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants