New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
change Seek API to be less enum-y #969
Conversation
Yeah I agree in general with these kinds of changes (enum to method). |
👍 I agree with the reasoning here. As a side note, this style of overloading is also not very common as a convention (in contrast to trait-based overloading). |
Linux and the BSDs support more than the three ways listed above to seek in a file:
Using separate functions is even less extensible than enums. |
@mahkoh I don't see much difference in the extensibility between enum SeekFrom {
Start(u64),
Current(i64),
End(i64),
#[cfg(any(target_os="linux",target_os="bsd",...))]
Data(u64),
#[cfg(any(target_os="linux",target_os="bsd",...))]
Hole(u64),
} and fn seek_from_start(&mut self, offset: u64) -> Result<i64>;
fn seek_from_current(&mut self, offset: i64) -> Result<i64>;
fn seek_from_end(&mut self, offset: i64) -> Result<i64>;
#[cfg(any(target_os="linux",target_os="bsd",...))]
fn seek_for_data(&mut self, offset: u64) -> Result<i64>;
#[cfg(any(target_os="linux",target_os="bsd",...))]
fn seek_for_hole(&mut self, offset: u64) -> Result<i64>; |
If you want five methods just for seeking that's probably fine. POSIX manages to combine 4 billion different ways to seek in a file in one method. |
The safe and extensible way to define this API is to use a custom type: mod file {
pub struct Seek(pub i32);
pub const SEEK_SET: Seek = Seek(0);
pub const SEEK_CUR: Seek = Seek(1);
pub const SEEK_END: Seek = Seek(2);
impl File {
fn seek(&mut self, pos: i64, whence: Seek) -> Result<i64> {
}
}
} |
@mahkoh Even if we went back to this, I'm not sure it's a good idea to expose a public constructor for the whence parameter. How would a user know what magic numbers they were allowed to supply? And if we make it private, it becomes very similar to how old_io works right now (and runs into the same concerns raised by @kennytm above) I suppose the question should be: why should we want to combine 4 billion (or rather, 5) methods into a single one? |
I suppose it might be helpful to note that this RFC is for the It might make sense to add some special |
By looking them up in the OS manual. No harm will come from invalid numbers.
You mean like |
It's not combining anything. The OS provides one function and you're trying to split it up because the number of valid inputs in one of the arguments is small. Quite similar to OpenOptions. Just change it to |
Platform-specific APIs can be added via extension traits (which I thought was our strategy). Otherwise the current, previous, and proposed APIs are all inappropriate for handling OS-specific extensions. If we indeed want to transparently support those, then the only tractable design would be basically what the C-API is: magic numbers. |
My point was that there aren't actually 4 billion (or 9223372036854775807, if you'd prefer your newer numbering): we've got 5. And each of them has a special meaning not communicated by In any case, if we do want to expose underlying OS mechanisms to |
The design I posted above supports extensions without extension traits. A seek operation that is not wrapped by the standard library can be introduced by the user simply by defining: const SEEK_DATA: Seek = Seek(linux_consts::SEEK_DATA); With extension traits they will have to write yet another
There is nothing magic about numbers with well known names such as SEEK_DATA. The only thing that is accomplished by using a fixed set of enum variants or functions is that you lock the user into your idea of which platform specific functions should be provided. |
To be clear I wasn't belittling the notion of using plain numbers; just using the best term I know for it. I'm not a huge fan of their usage but it can make sense for an OS-level abstraction. |
|
@mahkoh I think "magic numbers" is a fair characterization of using special numbers when you really mean semantically different things. Replacing random C constants with enums and typed values is a huge part of what gives rust the safe, high-level flavor that it has, whereas the C APIs we bind to are (while usually very flexible) not ergonomic to use directly. I'm not convinced your scheme of using constants is any better than just using an enum. All I'm really seeing as an advantage is that it breaks the currently tight abstraction over the underlying APIs, which allows you to introduce new seek modes. This abstraction leak is both a good and a bad thing. @mahkoh if I'm right, you're viewing it as a positive since it allows you to easily introduce your API for non-standard extensions you might use, but from the POV of the rust project right now, it's actually better for the abstraction to be tight and only extensible via
This is a completely valid concern, but I also view it as a rather short-term issue which we shouldn't let dictate long-term API strategy. Eventually we will be able to expose more and more platform specific APIs and bindings, these things are just hidden right now for the purposes of stability for 1.0. |
That doesn't make any sense. The expression "magic number" has a well known definition that I already posted above.
You should know that "safe" has a very specific meaning in Rust. Doing a wrong seek operation is not unsafe. It's very interesting how you incorrectly apply a word with negative connotation (magic number) to my idea of how this API should look like and then incorrectly apply a word with a positive connotation (safe) to your idea how the API should look like.
Should it be more high level than Python at the cost of flexibility?
The value that is passed to seek in my API is typed.
I do not understand why lseek(fd, offset, SEEK_SET); is less ergonomic than file.seek(Seek::Set(offset));
Correct, me. I don't have to wait for the library maintainers to maybe introduce new variants that can do what I want.
Everything becomes non-standard when the standard you're working with is the intersection of what all supported operating systems provide. Right now you cannot even set the mode on a directory you create with Rust. Everything will have mode 0o777. Because Windows doesn't have modes so that's non-standard. Another example:
With words such as "cleanly" you can justify just about anything. If we're just talking about "stabilize": glibc has had a stable API and ABI for much longer than Rust exists. In contrast to that: Rust has to break the API if it ever introduces a new seek mode into the enum. There is nothing "unstable" about the API I posted above.
Platform specific extension traits do not create good APIs. Assuming the current seek implementation with an enum, the My API above has this covered as the only thing you have to do to extend it to a platform specific functionality is to define a new constant. And then you can continue using the API you've used for years but with new functionality. The current "stabilization process" confuses "stable" with "locked down". |
Expanding on the terrible API extension traits provide: If you use extension traits, then you have to use one function for every platform specific detail. This does not work: if cfg!(target_os = "dragonfly") {
file.seek_dragonfly(...);
} else if cfg!(target_os = "linux") {
file.seek_linux(...);
} etc. This is already a problem with #[cfg(target_os = "dragonfly")]
fn my_oneliner(...) {
file.seek_dragonfly(...);
}
#[cfg(target_os = "linux")]
fn my_oneliner(...) {
file.seek_linux(...);
}
my__oneliner(...); |
@mahkoh the methods don't have to have different names (I don't see how What about, say, fn seek<T: SeekWhence>(&mut self, seek: T)
// or
fn seek<T: SeekWhence>(&mut self, pos: i64, whence: T) used like, |
You can't open a file with the same options on linux and windows but with mode set on linux. You cannot do if cfg!(unix) {
options.mode(...);
} The name of the function is not the only problem. If you reference an item that doesn't exist on the target platform, e.g. SEEK_DATA on windows, then you have to move that piece of code into its own function, away from the rest of the logic.
Probably depends on how |
@huonw That's interesting... My feeling is that a strong API should:
I don't think the second point above can be supported in current Rust, but the API you described might eventually allow a future Rust to support what I'm talking about. To spitball, something like this would eventually be neat to support: // MyType lazily reads from, say, an open socket, and allows seeking through all data that's
// currently been read.
// does not support SEEK_END, since that would require reading all data to find EOF,
// which violates expectation that seeking is a fast operation.
// this use-case also suggests a SEEK_FORWARD and SEEK_REVERSE SeekWhence,
// which act like SEEK_CUR but only allow advancing in a desired direction.
// note that SEEK_FORWARD and SEEK_REVERSE don't actually have analogues
// in the POSIX seek routine.
impl Seek for MyType {
fn seek(&mut self, offset: i64, whence: SeekSet) -> i64 {
self.my_seek_set(offset);
}
fn seek(&mut self, offset: i64, whence: SeekCur) -> i64 {
if offset < 0 {
self.rewind(-offset)
} else {
self.advance(offset)
}
}
// do not implement `seek` for SeekEnd: op not supported for MyType.
}
let x = MyType::new();
x.seek(10, SeekSet);
// x.seek(0, SeekEnd); <- fails to compile, because `seek` is not implemented with SeekEnd parameterization.
fn blah<T: Seek>(x: &mut T) {
// using SeekEnd variant would have to propagate to function signature somehow?
x.seek(0, SeekEnd);
}
// blah(&mut x); <- fails to compile, somehow? blah uses SeekEnd,
// which MyType doesn't provide an implementation for, so the attempt
// to call `blah` should fail to compile, or at least to link. This won't work in this form, but I feel like there's the germ of something here... The Rust generics compilation model (in which a different function may be generated for each unique parameterization of a type) almost seems to allow explicitly controlling the function generation for a specific parameterization. If you can do that, you might also want to avoid generating functions for unsupported parameterizations. I'm not sure this is a good idea, but it's interesting to me all the same. And I like that using a |
@aidancully We can implement something similar using multidispatch: pub trait Seek<T=SeekFrom> {
fn seek(&mut self, offset: T);
} All seekable streams would implement I'm not familiar enough with this to say whether that's a good idea or not, but it's possible. |
👍 to changing the |
Enum params is OK for me. It's common in api design, and seek is not the
|
And It's verbose to have three seek methods. On Mar 20, 2015 8:27 AM, "Liigo Zhuang" com.liigo@gmail.com wrote:
|
The question whether is this particular case, where there are only three variants, it makes sense to have the complication of an enum instead of just having three methods. I find having three methods to be much simpler, cleaner, and easy to use.
I think it's much more verbose to have to type |
Also, from a purely bikeshedding perspective, what about |
As for @kennytm 's code above, enum SeekFrom {
Start(u64),
Current(i64),
End(i64),
#[cfg(any(target_os="linux",target_os="bsd",...))]
Data(u64),
#[cfg(any(target_os="linux",target_os="bsd",...))]
Hole(u64),
} now match Start(0) {
Start(_) | Current(_) | End(_) =>{}
_ =>() // error here for Windows
} so current API do has extensibility problem. EDIT: we may add a |
@rkjnsn " |
Unfortunately I dropped the ball on this one; we need to stabilize this API for beta, and at this point there's not time to make this change. (I personally much prefer the proposed API, but there wasn't a clear consensus anyway, and it's used rarely enough that it probably doesn't matter much). I'm going to close this RFC. Thanks again, @gankro! |
:( Oh well |
The current
seek
API pretty closely mirrors the relevant OS APIs by having a single method with the seek-style specified as a runtime flag. However this just makes usage of the API more awkward. This API is logically 3 distinct methods that are being squashed into one by overloading on an enum variant.I consider this to be a popular "trap" in Rust API design; you see several similar methods and reason that they can in fact be unified be taking an enum, reducing the combinatorics and surface-area of the API. However this is simply shuffling the complexity from methods to variants. It requires the caller to pull the enum in scope, and since the prevailing style is to not pull in variants, leads to stuttery incantations like
reader.seek(SeekFrom::Start(offset))
. In addition it requires a newcomer to lookup what on earth SeekFrom even is.Therefore this amendment proposes the following changes:
reader.seek(SeekFrom::Start(offset))
->reader.seek_from_start(offset)
reader.seek(SeekFrom::Current(offset))
->reader.seek_from_current(offset)
reader.seek(SeekFrom::End(offset))
->reader.seek_from_end(offset)
Since the relevant OS APIs are a single method, this increases the implementation burden on library authors. However I believe that the Rust philosophy generally favours library consumers over authors (see e.g. where the burden of
unsafe
primarily lays). This particular API is one where implementation should be a rare one-off thing, while consumption is most certainly not.