-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes to dataset URI creation for LanceTable #703
Comments
Ok so it seems like we can use os.sep for this purpose. But it's not super straight forward. In windows, if the url is remote (eg S3), then you still want the backslashes instead of os.sep. To distinguish the two, we could perhaps use urlparse to get the uri scheme. My understanding on windows is that the scheme for local paths will return one of 1) "", 2) "file", or 3) a single drive letter. I don't have a windows machine - would you be able to confirm that? For the local path then, we can do something like os.path.join(base, os.sep, subdir). Could you try this on your windows machine? os.path.join("C:\Downloads", os.sep, "lancedb") does this give the expected result? Thanks! |
On my (Windows) machine,
|
And just to clarify, does the backslash screw up the path reference on windows? What error are you seeing? |
FYI, On an urelated note, maybe consider using pathlib:
|
@changhiskhan It doesnt cause an error -- in my case I was creating a LanceTable in a given namespace / folder within S3. But because of the incorrect slash, a table with the name agree with @sergiocorreia that pathlib seems like the best thing to use here |
The main issue is that pathlib doesn't work well for remote uri's. But I guess if we have to parse the uri scheme anyways it's probably the only option here. |
If you go with pathlib, maybe then consider epath from Google: https://github.com/google/etils/tree/main/etils/epath which allows gs://, aws://, s3:// , etc. See also this link for the motivation: https://discuss.python.org/t/make-pathlib-extensible/3428/144 |
etils looks interesting. Definitely nice to have one api for manipulating paths. It requires python 3.10 though and we still have a bunch of folks on py3.9 and EOL isn't very soon. |
Use pathlib for local paths so that pathlib can handle the correct separator on windows. Closes #703
@sergiocorreia @jaycgibbs lmk if you have thoughts on the PR and whether it'll solve your problem |
Use pathlib for local paths so that pathlib can handle the correct separator on windows. Closes #703
Use pathlib for local paths so that pathlib can handle the correct separator on windows. Closes #703
Use pathlib for local paths so that pathlib can handle the correct separator on windows. Closes #703 --------- Co-authored-by: Will Jones <willjones127@gmail.com>
Use pathlib for local paths so that pathlib can handle the correct separator on windows. Closes lancedb#703 --------- Co-authored-by: Will Jones <willjones127@gmail.com>
Use pathlib for local paths so that pathlib can handle the correct separator on windows. Closes #703 --------- Co-authored-by: Will Jones <willjones127@gmail.com>
This is a bit of a niche problem that would matter primarily for Windows users, but dataset URI creation within
lancedb.table.LanceTable._dataset_uri
right now makes use ofos.path.join
which, on a Windows machine, means that a connection URI joined with a table name inserts backslashes instead of forward slashes. Would it be possible to either have this section modified to account for different filesystems? Apologies if this is already solvable / fixable via some method that I'm not aware ofThe text was updated successfully, but these errors were encountered: