Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change Seek API to be less enum-y #969

Closed

Conversation

Gankra
Copy link
Contributor

@Gankra Gankra commented Mar 12, 2015

The current seek API pretty closely mirrors the relevant OS APIs by having a single method with the seek-style specified as a runtime flag. However this just makes usage of the API more awkward. This API is logically 3 distinct methods that are being squashed into one by overloading on an enum variant.

I consider this to be a popular "trap" in Rust API design; you see several similar methods and reason that they can in fact be unified be taking an enum, reducing the combinatorics and surface-area of the API. However this is simply shuffling the complexity from methods to variants. It requires the caller to pull the enum in scope, and since the prevailing style is to not pull in variants, leads to stuttery incantations like reader.seek(SeekFrom::Start(offset)). In addition it requires a newcomer to lookup what on earth SeekFrom even is.

Therefore this amendment proposes the following changes:

  • reader.seek(SeekFrom::Start(offset)) -> reader.seek_from_start(offset)
  • reader.seek(SeekFrom::Current(offset)) -> reader.seek_from_current(offset)
  • reader.seek(SeekFrom::End(offset)) -> reader.seek_from_end(offset)

Since the relevant OS APIs are a single method, this increases the implementation burden on library authors. However I believe that the Rust philosophy generally favours library consumers over authors (see e.g. where the burden of unsafe primarily lays). This particular API is one where implementation should be a rare one-off thing, while consumption is most certainly not.

@blaenk
Copy link
Contributor

blaenk commented Mar 12, 2015

Yeah I agree in general with these kinds of changes (enum to method).

@aturon
Copy link
Member

aturon commented Mar 12, 2015

👍 I agree with the reasoning here.

As a side note, this style of overloading is also not very common as a convention (in contrast to trait-based overloading).

@mahkoh
Copy link
Contributor

mahkoh commented Mar 12, 2015

Linux and the BSDs support more than the three ways listed above to seek in a file:

SEEK_DATA and SEEK_HOLE are nonstandard extensions also present in Solaris, FreeBSD, and DragonFly BSD; they are proposed for inclusion in the next POSIX revision (Issue 8).

Using separate functions is even less extensible than enums.

@kennytm
Copy link
Member

kennytm commented Mar 12, 2015

@mahkoh I don't see much difference in the extensibility between

enum SeekFrom {
   Start(u64),
   Current(i64), 
   End(i64),
   #[cfg(any(target_os="linux",target_os="bsd",...))]
   Data(u64),
   #[cfg(any(target_os="linux",target_os="bsd",...))]
   Hole(u64),
}

and

fn seek_from_start(&mut self, offset: u64) -> Result<i64>;

fn seek_from_current(&mut self, offset: i64) -> Result<i64>;

fn seek_from_end(&mut self, offset: i64) -> Result<i64>;

#[cfg(any(target_os="linux",target_os="bsd",...))]
fn seek_for_data(&mut self, offset: u64) -> Result<i64>;

#[cfg(any(target_os="linux",target_os="bsd",...))]
fn seek_for_hole(&mut self, offset: u64) -> Result<i64>;

@mahkoh
Copy link
Contributor

mahkoh commented Mar 12, 2015

If you want five methods just for seeking that's probably fine. POSIX manages to combine 4 billion different ways to seek in a file in one method.

@mahkoh
Copy link
Contributor

mahkoh commented Mar 12, 2015

The safe and extensible way to define this API is to use a custom type:

mod file {
    pub struct Seek(pub i32);

    pub const SEEK_SET: Seek = Seek(0);
    pub const SEEK_CUR: Seek = Seek(1);
    pub const SEEK_END: Seek = Seek(2);

    impl File {
        fn seek(&mut self, pos: i64, whence: Seek) -> Result<i64> {
        }
    }
}

@codyps
Copy link

codyps commented Mar 12, 2015

@mahkoh
The separate pos and whence methodology (which is used in old_io) runs into this issue that one needs to cast (via as) the pos parameter quite a bit if only seeking from the start or end, as those variants typically have u64 typed values (not i64). When I converted some of my code that uses seek from old_io to new io, I was able to remove a bunch of as i64, which I'd really like to not add back in.

Even if we went back to this, I'm not sure it's a good idea to expose a public constructor for the whence parameter. How would a user know what magic numbers they were allowed to supply? And if we make it private, it becomes very similar to how old_io works right now (and runs into the same concerns raised by @kennytm above)

I suppose the question should be: why should we want to combine 4 billion (or rather, 5) methods into a single one?

@codyps
Copy link

codyps commented Mar 12, 2015

I suppose it might be helpful to note that this RFC is for the Seek API, not the File api, and additional functions based on OS support for seeking file descriptors doesn't make sense (as Seek is used for things not backed by file descriptors).

It might make sense to add some special seek variants to File, but I'm not sure that's in scope here.

@mahkoh
Copy link
Contributor

mahkoh commented Mar 12, 2015

How would a user know what magic numbers they were allowed to supply?

By looking them up in the OS manual. No harm will come from invalid numbers.

why should we want to combine 4 billion (or rather, 5) methods into a single one?

You mean like seek_from_start_0, seek_from_start_1, seek_from_start_2, and so on? That would be 9223372036854775807 functions, not 4 billion.

@mahkoh
Copy link
Contributor

mahkoh commented Mar 12, 2015

why should we want to combine 4 billion (or rather, 5) methods into a single one?

It's not combining anything. The OS provides one function and you're trying to split it up because the number of valid inputs in one of the arguments is small.

Quite similar to OpenOptions. Just change it to open_read, open_read_write, open_read_write_append, open_write, and open_write_append.

@Gankra
Copy link
Contributor Author

Gankra commented Mar 12, 2015

Platform-specific APIs can be added via extension traits (which I thought was our strategy). Otherwise the current, previous, and proposed APIs are all inappropriate for handling OS-specific extensions. If we indeed want to transparently support those, then the only tractable design would be basically what the C-API is: magic numbers.

@codyps
Copy link

codyps commented Mar 12, 2015

POSIX manages to combine 4 billion different ways to seek in a file in one method.

why should we want to combine 4 billion (or rather, 5) methods into a single one?

You mean like seek_from_start_0, seek_from_start_1, seek_from_start_2, and so on? That would be 9223372036854775807 functions, not 4 billion.

My point was that there aren't actually 4 billion (or 9223372036854775807, if you'd prefer your newer numbering): we've got 5. And each of them has a special meaning not communicated by seek_5334

In any case, if we do want to expose underlying OS mechanisms to seek in interesting ways, I'm not sure that should go on trait Seek. Seek ends up being used for things that aren't OS provided, and burdening all the users & implementers (who aren't using file descriptors) with an API designed to work with FILE * or int fd may not be the best choice.

@mahkoh
Copy link
Contributor

mahkoh commented Mar 12, 2015

Platform-specific APIs can be added via extension traits (which I thought was our strategy). Otherwise the current, previous, and proposed APIs are all inappropriate for handling OS-specific extensions.

The design I posted above supports extensions without extension traits. A seek operation that is not wrapped by the standard library can be introduced by the user simply by defining:

const SEEK_DATA: Seek = Seek(linux_consts::SEEK_DATA);

With extension traits they will have to write yet another lseek binding even though it's already there in the stdlib.

magic numbers

There is nothing magic about numbers with well known names such as SEEK_DATA. The only thing that is accomplished by using a fixed set of enum variants or functions is that you lock the user into your idea of which platform specific functions should be provided.

@Gankra
Copy link
Contributor Author

Gankra commented Mar 12, 2015

To be clear I wasn't belittling the notion of using plain numbers; just using the best term I know for it. I'm not a huge fan of their usage but it can make sense for an OS-level abstraction.

@mahkoh
Copy link
Contributor

mahkoh commented Mar 12, 2015

In computer programming, the term magic number
[...]
Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants

@reem
Copy link

reem commented Mar 13, 2015

@mahkoh I think "magic numbers" is a fair characterization of using special numbers when you really mean semantically different things. Replacing random C constants with enums and typed values is a huge part of what gives rust the safe, high-level flavor that it has, whereas the C APIs we bind to are (while usually very flexible) not ergonomic to use directly.

I'm not convinced your scheme of using constants is any better than just using an enum. All I'm really seeing as an advantage is that it breaks the currently tight abstraction over the underlying APIs, which allows you to introduce new seek modes.

This abstraction leak is both a good and a bad thing. @mahkoh if I'm right, you're viewing it as a positive since it allows you to easily introduce your API for non-standard extensions you might use, but from the POV of the rust project right now, it's actually better for the abstraction to be tight and only extensible via std::os::platform extension APIs. A tight abstraction allows for clear behavior and stability, whereas an open abstraction like you're discussing is much harder to cleanly stabilize.

With extension traits they will have to write yet another lseek binding even though
it's already there in the stdlib.

This is a completely valid concern, but I also view it as a rather short-term issue which we shouldn't let dictate long-term API strategy. Eventually we will be able to expose more and more platform specific APIs and bindings, these things are just hidden right now for the purposes of stability for 1.0.

@mahkoh
Copy link
Contributor

mahkoh commented Mar 13, 2015

I think "magic numbers" is a fair characterization of using special numbers when you really mean semantically different things.

That doesn't make any sense. The expression "magic number" has a well known definition that I already posted above.

Replacing random C constants with enums and typed values is a huge part of what gives rust the safe

You should know that "safe" has a very specific meaning in Rust. Doing a wrong seek operation is not unsafe. It's very interesting how you incorrectly apply a word with negative connotation (magic number) to my idea of how this API should look like and then incorrectly apply a word with a positive connotation (safe) to your idea how the API should look like.

high-level flavor that it has

Should it be more high level than Python at the cost of flexibility?

Replacing random C constants with enums and typed values

The value that is passed to seek in my API is typed.

whereas the C APIs we bind to are (while usually very flexible) not ergonomic to use directly

I do not understand why

lseek(fd, offset, SEEK_SET);

is less ergonomic than

file.seek(Seek::Set(offset));

which allows you to introduce new seek modes.

Correct, me. I don't have to wait for the library maintainers to maybe introduce new variants that can do what I want.

you're viewing it as a positive since it allows you to easily introduce your API for non-standard extensions

Everything becomes non-standard when the standard you're working with is the intersection of what all supported operating systems provide. Right now you cannot even set the mode on a directory you create with Rust. Everything will have mode 0o777. Because Windows doesn't have modes so that's non-standard. Another example: File cannot even open files larger than 4G on 32 bit unix because that would require a non-standard API.

A tight abstraction allows for clear behavior and stability, whereas an open abstraction like you're discussing is much harder to cleanly stabilize.

With words such as "cleanly" you can justify just about anything. If we're just talking about "stabilize": glibc has had a stable API and ABI for much longer than Rust exists. In contrast to that: Rust has to break the API if it ever introduces a new seek mode into the enum.

There is nothing "unstable" about the API I posted above.

Eventually we will be able to expose more and more platform specific APIs and bindings

Platform specific extension traits do not create good APIs. Assuming the current seek implementation with an enum, the seek function will always be limited to the existing three cross platform seek modes because you cannot change a function with an extension trait. So the function for platforms that support SEEK_DATA will be called what? seek_linux, seek_freebsd, seek_dragonfly? Not all unixes support SEEK_DATA, so it makes no sense to provide it via the unix module.

My API above has this covered as the only thing you have to do to extend it to a platform specific functionality is to define a new constant. And then you can continue using the API you've used for years but with new functionality.

The current "stabilization process" confuses "stable" with "locked down".

@mahkoh
Copy link
Contributor

mahkoh commented Mar 13, 2015

Expanding on the terrible API extension traits provide: If you use extension traits, then you have to use one function for every platform specific detail. This does not work:

if cfg!(target_os = "dragonfly") {
    file.seek_dragonfly(...);
} else if cfg!(target_os = "linux") {
    file.seek_linux(...);
}

etc. This is already a problem with OpenOptionExt today. You have to write code like this:

#[cfg(target_os = "dragonfly")]
fn my_oneliner(...) {
    file.seek_dragonfly(...);
}

#[cfg(target_os = "linux")]
fn my_oneliner(...) {
    file.seek_linux(...);
}

my__oneliner(...);

@huonw
Copy link
Member

huonw commented Mar 13, 2015

@mahkoh the methods don't have to have different names (I don't see how OpenOptionExt has this problem).


What about, say,

fn seek<T: SeekWhence>(&mut self, seek: T)
// or
fn seek<T: SeekWhence>(&mut self, pos: i64, whence: T)

used like, file.seek(io::SeekStart(10)), or file.seek(10, linux::SeekData), although this is possibly still quite "enum"-y.

@mahkoh
Copy link
Contributor

mahkoh commented Mar 13, 2015

I don't see how OpenOptionExt has this problem

You can't open a file with the same options on linux and windows but with mode set on linux. You cannot do

if cfg!(unix) {
    options.mode(...);
}

The name of the function is not the only problem. If you reference an item that doesn't exist on the target platform, e.g. SEEK_DATA on windows, then you have to move that piece of code into its own function, away from the rest of the logic.

What about, say,

Probably depends on how SeekWhence looks like.

@aidancully
Copy link

@huonw That's interesting... My feeling is that a strong API should:

  • evaluate to lseek (or whatever) for raw-file-type Seek back-ends;
  • allow evaluating to distinct functions for other Seek back-ends.

I don't think the second point above can be supported in current Rust, but the API you described might eventually allow a future Rust to support what I'm talking about.

To spitball, something like this would eventually be neat to support:

// MyType lazily reads from, say, an open socket, and allows seeking through all data that's
// currently been read.
// does not support SEEK_END, since that would require reading all data to find EOF,
// which violates expectation that seeking is a fast operation.
// this use-case also suggests a SEEK_FORWARD and SEEK_REVERSE SeekWhence,
// which act like SEEK_CUR but only allow advancing in a desired direction.
// note that SEEK_FORWARD and SEEK_REVERSE don't actually have analogues
// in the POSIX seek routine.
impl Seek for MyType {
  fn seek(&mut self, offset: i64, whence: SeekSet) -> i64 {
    self.my_seek_set(offset);
  }
  fn seek(&mut self, offset: i64, whence: SeekCur) -> i64 {
    if offset < 0 {
      self.rewind(-offset)
    } else {
      self.advance(offset)
    }
  }
  // do not implement `seek` for SeekEnd: op not supported for MyType.
}
let x = MyType::new();
x.seek(10, SeekSet);
// x.seek(0, SeekEnd); <- fails to compile, because `seek` is not implemented with SeekEnd parameterization.
fn blah<T: Seek>(x: &mut T) {
  // using SeekEnd variant would have to propagate to function signature somehow?
  x.seek(0, SeekEnd);
}
// blah(&mut x); <- fails to compile, somehow? blah uses SeekEnd,
// which MyType doesn't provide an implementation for, so the attempt
// to call `blah` should fail to compile, or at least to link.

This won't work in this form, but I feel like there's the germ of something here... The Rust generics compilation model (in which a different function may be generated for each unique parameterization of a type) almost seems to allow explicitly controlling the function generation for a specific parameterization. If you can do that, you might also want to avoid generating functions for unsupported parameterizations.

I'm not sure this is a good idea, but it's interesting to me all the same. And I like that using a SeekWhence trait would be compatible with a crazy idea like this, if this crazy idea was ever thought useful to add to the language.

@lambda-fairy
Copy link
Contributor

@aidancully We can implement something similar using multidispatch:

pub trait Seek<T=SeekFrom> {
    fn seek(&mut self, offset: T);
}

All seekable streams would implement Seek<SeekFrom>, but files would implement Seek<platform_name_here::FancySeekExtensions> as well.

I'm not familiar enough with this to say whether that's a good idea or not, but it's possible.

@aturon aturon self-assigned this Mar 19, 2015
@rkjnsn
Copy link
Contributor

rkjnsn commented Mar 19, 2015

👍 to changing the Seek trait to use three methods. It makes it much nicer to use. Since this is a general trait, which covers Cursors and other non-file things, I don't think supporting platform-dependent seek functionality should be a consideration here. Types that need other methods of seeking (os-specific file objects, among others) will provide it independently of this trait.

@liigo
Copy link
Contributor

liigo commented Mar 20, 2015

Enum params is OK for me. It's common in api design, and seek is not the
only one.
On Mar 20, 2015 7:43 AM, "rkjnsn" notifications@github.com wrote:

[image: 👍] to changing the Seek trait to use three methods. It makes
it much nicer to use. Since this is a general trait, which covers Cursors
and other non-file things, I don't think supporting platform-dependent seek
functionality should be a consideration here. Types that need other methods
of seeking (os-specific file objects, among others) will provide it
independently of this trait.


Reply to this email directly or view it on GitHub
#969 (comment).

@liigo
Copy link
Contributor

liigo commented Mar 20, 2015

And It's verbose to have three seek methods.

On Mar 20, 2015 8:27 AM, "Liigo Zhuang" com.liigo@gmail.com wrote:

Enum params is OK for me. It's common in api design, and seek is not the
only one.

@rkjnsn
Copy link
Contributor

rkjnsn commented Mar 20, 2015

@liigo

Enum params is OK for me. It's common in api design, and seek is not the only one.

The question whether is this particular case, where there are only three variants, it makes sense to have the complication of an enum instead of just having three methods. I find having three methods to be much simpler, cleaner, and easy to use.

And It's verbose to have three seek methods.

I think it's much more verbose to have to type reader.seek(SeekFrom::Start(offset)) repeatedly in user code that to just be able to do reader.seek_from_start(offset)

@rkjnsn
Copy link
Contributor

rkjnsn commented Mar 20, 2015

Also, from a purely bikeshedding perspective, what about seek_absolute and seek_relative instead of seek_from_start and seek_from_current?

@bombless
Copy link

As for @kennytm 's code above,

enum SeekFrom {
   Start(u64),
   Current(i64), 
   End(i64),
   #[cfg(any(target_os="linux",target_os="bsd",...))]
   Data(u64),
   #[cfg(any(target_os="linux",target_os="bsd",...))]
   Hole(u64),
}

now

match Start(0) {
    Start(_) | Current(_) | End(_) =>{}
    _ =>() // error here for Windows
}

so current API do has extensibility problem.
+1 for this RFC.

EDIT: we may add a Dummy variant here to solve the problem though, but I think it's not elegant enough

@codyps
Copy link

codyps commented Mar 20, 2015

@rkjnsn "relative to what?" isn't answered by seek_relative(), hence the _from_foo naming.

@aturon
Copy link
Member

aturon commented Mar 31, 2015

Unfortunately I dropped the ball on this one; we need to stabilize this API for beta, and at this point there's not time to make this change. (I personally much prefer the proposed API, but there wasn't a clear consensus anyway, and it's used rarely enough that it probably doesn't matter much).

I'm going to close this RFC. Thanks again, @gankro!

@aturon aturon closed this Mar 31, 2015
@Gankra
Copy link
Contributor Author

Gankra commented Mar 31, 2015

:(

Oh well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet