-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DriverManager::get_drivers_for_filename
for the ability to auto detect compatible Driver
s for writing data
#510
Conversation
Tested locally for:
Note: The original code suggested in #506 had the tests for writing capabilities of the driver, and I've kept that as, for reading a file, |
The tests pass locally but are failing in CI, So I thought it could be because we need to install something for GPKG driver to be present in the ubuntu distribution. So I've added conditional in the test to only test extension if the driver we're looking for is already there. It should work. |
Found the problem: The previous versions don't support Relevant link: https://gdal.org/drivers/vector/gpkg.html#compressed-files
But
So I just made a condition on the |
I checked that all upto gdal 3.1.3 has That's why I left it as it was. This link shows that ubuntu-lts has gdal 3.0 https://askubuntu.com/questions/1267844/installing-libgdal-dev-on-ubuntu-20-04 I've made it to check for that in the test as well. |
Ok, should I remove the ES test? I don't understand it, and I just added it for complete tests. I can't find if something changed in version 3.1 for that. Or should I add the condition of >= 3010000 for it as well. |
EDIT: Found the problem, you can only read the last 2 paragraph.
I don't know why it's failing there. the things we're checking seems fine. Should we just put lol, the name was different The logic for the function is correct, tests are troublesome. |
src/driver.rs
Outdated
/// ```text | ||
/// ["GPKG"] | ||
/// ``` | ||
pub fn get_drivers_for_filename(filename: &str, is_vector: bool) -> Vec<Driver> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is is_vector
necessary? Is it possible to have multiple drivers that read the same file extension but differentiate on vector vs. raster data? If so, is this something we need to impose on every user (IOW, could we let the caller deal with that if it's a rare edge case?)
I guess what I'm wondering here is if we should only be evaluating against DMD_EXTENSION
and DMD_EXTENSIONS
, and defer capability filtering to a different mechanism (e.g. extension method on Driver
). The method name implies we're only looking at matching drivers against filename (extension).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. In principle, the user might have all sorts of extra drivers installed. Matching the GDAL logic seemed like a safe approach.
We could ask Even, I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic I went by is this: there can be multiple drivers for the same extension, and some could support raster some vector and some both. And this function is (related with other comments) made to know the driver in advance while creating a file as if we have a file already, then we can call Dataset::open
without driver information. But we cannot create and save data without deciding the driver first. And if we're writing a data, we'd want the list of drivers that supports that type of data, hence that bool.
It was also in the function Inicola sent me as an example, so it's my understanding of that, not my innovation.
src/driver.rs
Outdated
let can_create = d.metadata_item("DCAP_CREATE", "").is_some() | ||
|| d.metadata_item("DCAP_CREATECOPY", "").is_some(); | ||
let check_vector = is_vector && d.metadata_item("DCAP_VECTOR", "").is_some(); | ||
let check_raster = !is_vector && d.metadata_item("DCAP_RASTER", "").is_some(); | ||
let check_vector_translate = | ||
is_vector && d.metadata_item("DCAP_VECTOR_TRANSLATE_FROM", "").is_some(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above... i.e. should we consider selection and filtering two different operations?
src/driver.rs
Outdated
let filename = filename.to_ascii_lowercase(); | ||
let e = match filename.rsplit_once('.') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to ensure true logical parity with GDAL C behavior, we'd use CPLGetExtension
. 50/50 on whether that's good idea or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did consider using the rust's "extension()" method on the Path
as well, but considering that it can only do the last "." onwards, which requires us to do the endswith
on the filename for .shp.zip
and .gpkg.shp
, I thought just using them directly might be better. And the online protocol addresses with prefix (like Elasticsearch
in the test) may not be taken as a valid path, so I just did the string thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have this work for Path
s. I think you can use extension
and then file_stem
and again extension
to check for this without that much ceremony.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Path
works fine with those, especially as they're just valid file names on Linux. Path::new
is infallible.
I'd like this because otherwise it's pretty annoying to convert a Path
or PathBuf
to a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's exactly I don't want to do it lol.
Like for that: startswith()
we need to do for "PG:" and similar paths, we'd need to convert the path into string right? And iirc it can't convert it into direct string and has to go through Debug trait and ends up with quoted string or something.
I can try, but unless we can use the Path
args without having to turn it back to string, it feels like we'll just end up asking for a Path and then internally convert it into string anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you have a Path it's harder to get a string.
That conflicts with the first sentence :D
Users can just call [to_string_lossy](https://doc.rust-lang.org/std/path/struct.Path.html#method.to_string_lossy)
. If it's easy for us, it's easy for them.
Anyway, I'm trying to do it with AsRef<Path>
. I'll make a commit if I am successful.
Edit: users have to call it once to pass the path as string currently.
If we take Path, we have to call to_string_lossy
for extension, for filename (second extension), and again to check for prefix...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's easy for us, it's easy for them.
It's easy for us in the sense that it's infallible, it requires no unsafe code, and it's almost always efficient. It's harder for them because it's ugly and maybe they won't know about to_string_lossy
so they might try to debug-format them, as you wanted to do.
There's also something to be said about being lossy, but I guess we only support UTF-8 file names, while GDAL might, in principle, support arbitrary byte sequences). But we hide it, and we could support non-UTF-8 file names today -- at least on Linux -- without changing our API.
On the other hand, if we make the user call to_string_lossy
on a non-UTF-8 path, they'll get a different name at the output. It's fine when we do it (and also when the user does it) because it won't affect the driver selection in practice. But as you can see, it's a lot of relatively subtle stuff to think about.
Also, you'll see that to_string_lossy
returns a Cow<'_, str>
, which might cause some people to give up and go back to Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, I'm currently having that same problem. it seems I can't use Cow<'_, str> on pattern. And might have to do, Variants of Cow::
. At this point doing if statement seems easier than match. It's really a lot of complications when we have to turn it back into string. :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, let's leave it like this, I'll try to clean it up tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the abovementioned problem with Cow
, and not being able to use match properly and so on. I converted it into String after fighting with borrow checking for a while trying to use str.
src/driver.rs
Outdated
for i in 0..DriverManager::count() { | ||
let d = DriverManager::get_driver(i).expect("Index for this loop should be valid"); | ||
let can_create = d.metadata_item("DCAP_CREATE", "").is_some() | ||
|| d.metadata_item("DCAP_CREATECOPY", "").is_some(); | ||
let check_vector = is_vector && d.metadata_item("DCAP_VECTOR", "").is_some(); | ||
let check_raster = !is_vector && d.metadata_item("DCAP_RASTER", "").is_some(); | ||
let check_vector_translate = | ||
is_vector && d.metadata_item("DCAP_VECTOR_TRANSLATE_FROM", "").is_some(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See this description of what GDAL does when Dataset::open
is called.
https://github.com/OSGeo/gdal/blob/master/gcore/gdaldataset.cpp#L3630-L3645
From my 5 minute reading it looks like some of the logic is delegated to GDALIdentifyDriverEx
:
https://github.com/OSGeo/gdal/blob/master/gcore/gdaldriver.cpp#L2649C25-L2649C45
My concern here is not wanting to diverge from the accepted GDAL logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GDALIdentifyDriverEx
is used for opening, but that's usually not necessary.
The functions I linked to in #506 (comment) is used by the GDAL binaries to pick an output format. If we offer this function, we should match the latter, and use a better name (get_driver(s)_for_writing
or something like that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do need to change the function name. I was also looking for ideas as the ones I came up got too long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functions I linked to in #506 (comment) is used by the GDAL binaries to pick an output format.
Ahhh... I thought there was something already in the code, but didn't realize it was C++ only :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is for binary apps. Not the part of the gdal
library. Otherwise, I was planning on just using the function from gdal-sys
.
src/driver.rs
Outdated
/// drivers and returns the matches. The determined driver is | ||
/// checked for writing capabilities as | ||
/// [`Dataset::open`](Dataset::open) can already open datasets | ||
/// with just path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand why we limit returned drivers to ones with "write" capabilities... maybe read-only use cases might also want a list of drivers to pick from (e.g. when reading JP2k, there are several options and you might want to inspect the metadata to decide on which to select).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, initially I was thinking that, but then I realized that when we already have a dataset, we can just call Dataset::open
and then look at the driver. Having a dataset you can open means you have a lot in your arsenal to detect the driver, like the metadata you mentioned.
The use case we're thinking is to when we need to write a dataset, currently the only way I could find to write to file is to create a file from a driver, which requires you to know the driver in advance. Of course, you can make the user pass the driver name along with the filename and call it a day (that's what I was doing on one of my program), but all users are not likely to know about the driver names. And if we can more or less guess based on the file extension, then we make it convenient and remove any user error (could write a Tiff file in .shp extension file and it'll write it). And since GIS programs let you write to any file, making the program only dependent on the filename makes it easier to integrate with other programs as well.
I think renaming the function to mention it's for writing should help with it.
That's the appropriate place :) |
Thank you. That was more of a comment due to worry that the tests might fail on the CI or on other people's computer because some drivers are not available. But that should be fixed now. Thought the tests did give a lot of trouble for other reasons. |
Nit: maybe we should use something more common like PG instead of ElasticSearch? |
Done. I also renamed the funtion. Idk if PG will also have some quirks on the tests like ElasticSearch though. Hopefully not |
src/driver.rs
Outdated
/// ```text | ||
/// ["GPKG"] | ||
/// ``` | ||
pub fn guess_drivers_for_write(filename: &str, is_vector: bool) -> Vec<Driver> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to bikeshed here, but should we consider:
- Also providing
guess_drivers_for_read
- Have the signature
guess_drivers_for_write<P: AsRef<Path>>(filename: P, ...)
to be more coherent with other methods accepting files. (&str
isAsRef<Path>
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about guess_drivers_for_read
. There's already an API that works for existing files, and do we really need to find a driver for a file that doesn't exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did consider using AsRef<Path>
, but I thought the connections like PG:https://xxxxxx
might not work or have unexpected problems. And we need to convert it back to string to use the "endswith" or "split" for double extensions. Hence, I went with string. I haven't tested if those connections work as path or not though.
LGTM |
This is my vote, assuming the semantics are/will be the same. Would be nice to expose the capability for users using older versions of 3.x, but then add a |
Yeah, I think we can add it now (with an appropriate signature for the native one), with a note that it's implemented differently before 3.9. |
I pushed a couple of changes, but still want to do another pass before merging, sorry. |
Wouldn't the name |
Yeah, probably. I had |
It's still a path in case of PG and others, just not a local one. Should be fine if people check the doc strings, if they don't I thought |
should help implementing georust/gdal#510
should help implementing georust/gdal#510
This is finalized, right? No more modifications? |
@@ -10,6 +10,8 @@ | |||
|
|||
- <https://github.com/georust/gdal/pull/504> | |||
|
|||
- Added `DriverManager::guess_driver_for_write` and `DriverManager::guess_drivers_for_write` for the ability to auto detect compatible `Driver`(s) for writing data. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to add the PR link here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. Done.
src/driver.rs
Outdated
@@ -461,6 +592,11 @@ impl DriverManager { | |||
} | |||
} | |||
|
|||
pub enum DriverProperties { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the name DriverType
, as "Properties" usually implies a collection of settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I was initially thinking that properties might be weird. It's renamed now.
Looks good to me. @lnicola OK if I merge? |
Yes, but I think it still needs a rebase. I'm a little worried about that enum, since we should probably add a variat (bit) to warn when there are multiple matching drivers (GDAL has a second parameter, but we don't need to match that). That would make the name a little unfortunate, and something like DriverPreference or GetOutputDriverFlags would be nicer. But let's merge this now, as I'm not going to have much time for it in the near future and I don't want to keep delaying it forever. |
Is a "Squash and merge" sufficient? |
Feel free to try it, I'm seeing:
|
@Atreyagaurav thanks for your hard, patient work on this. We all appreciate it! :) |
Thank you everyone for reviewing and the guidance. And for making gdal available in rust. I'll try to contribute more if I find some simple issues I can fix. |
CHANGES.md
if knowledge of this change could be valuable to users.Fixes #506
I have the tests at the end of
src/driver.rs
, let me know if I should remove it.I've only tested for getting the drivers here, but I tried using that function to see if the code I wrote that wrote GPKG files can write SHP now it works. Though some driver specific functions seem to make it not work if we're not careful (like
out_data.start_transaction()?
works for GPKG but not SHP).