New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add resource-handling IO functions in pervasives #640

Open
wants to merge 3 commits into
base: trunk
from

Conversation

Projects
None yet
@c-cube
Copy link
Contributor

c-cube commented Jun 28, 2016

Two things here:

  • functions to open files, give the channel to a continuation, and automatically close the file afterwards
  • a function to read a whole in_channel into a string. This is yet another piece of code that gets rewritten literally everywhere.
with e ->
g x;
raise e

let open_out_gen mode perm name =

This comment has been minimized.

@nojb

nojb Jun 28, 2016

Contributor

This does not have the right semantics if g raises. Better to use

match f x with
| res -> g x;  res
| exception e -> g x; raise e

This comment has been minimized.

@dra27

dra27 Jan 27, 2017

Contributor

Noting the various comments about this below, I realise that the uses of finally_ here shouldn't cause this semantics error, but is there any reason to leave this erroneous version of finally_ in and not replace it with @nojb's better suggestion (which doesn't try to call g twice if g raises)?

This comment has been minimized.

@c-cube

c-cube Jan 27, 2017

Author Contributor

Indeed, fixed. I hadn't paid enough attention to that.

This comment has been minimized.

@mseri

mseri Aug 31, 2017

Contributor

The same finally_ is defined in otherlibs/threads/pervasives.ml as well. I think it would be better to define it just once in pervasives and expose it, then you can call it in threads and not risk to have different implementations lying around.

@hcarty

This comment has been minimized.

Copy link
Contributor

hcarty commented Jun 28, 2016

Maybe this would be a good time to add a module like IO or Channel to house all of these functions. I know that the existing ones are in Pervasives but it seems reasonable to consider putting aliases to existing in/out channel functions as well as new functions into a module if the number of functions is going to grow.

@c-cube

This comment has been minimized.

Copy link
Contributor Author

c-cube commented Jun 28, 2016

@hcarty indeed, I would love a IO module, indeed, with shorter names, etc. I'm just afraid it would break a lot of user code. We need input from the maintainers on this.

@nojb indeed, it's not clear what should happen if f returns but close fails in general. Here I just use close_noerr which I believe doesn't raise at all…

Also, I need to write some tests, obviously (especially for input_string_all).

@@ -903,6 +943,11 @@ val really_input_string : in_channel -> int -> string
characters have been read.
@since 4.02.0 *)

val input_string_all : in_channel -> string

This comment has been minimized.

@dbuenzli

dbuenzli Jun 28, 2016

Contributor

I find the name of this function ugly. Here are a few alternative proposals input_contents, string_of_in_channel. in_channel_to_string.

This comment has been minimized.

@c-cube

c-cube Jun 28, 2016

Author Contributor

I have no strong opinion, but it sounds like string_of_in_channel would fit well within the stdlib. I'd like more people to give their opinion on this, but the name will probably change indeed.

This comment has been minimized.

@bluddy

bluddy Jun 28, 2016

We already have the pattern of input_line, input_byte etc. We also have the odd really_input_string. Maybe input_file, implying that you're reading the whole file?

Another function I use is input_lines, which creates a list of all the lines.

This comment has been minimized.

@c-cube

c-cube Jun 28, 2016

Author Contributor

@bluddy it doesn't have to be a file :-)
But indeed, input_lines would be nice too, it's often very convenient. In the absence of iterators(!), returning a list is acceptable.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

alainfrisch commented Jun 28, 2016

I would love a IO module, indeed, with shorter names, etc. I'm just afraid it would break a lot of user code.

Is this about possible clashes of unit names? There is indeed currently no good story about that. One possible direction is to always use names such as camlFoo for new modules, and provide perhaps a default map file so that user code can simply refer to Foo. Another direction is to add sub-modules to Pervasives (e.g. Pervasives.IO), but this is not very coherent with the current design, and does not allow to benefit from the removal of unused units at link time.

@c-cube

This comment has been minimized.

Copy link
Contributor Author

c-cube commented Jun 28, 2016

@alainfrisch precisely, that is my concern (I'd also like to add an Option module, but it would pose the same problem). I don't really know how what to do ^^.

Maybe with module aliases, have CamlOption, CamlIO, and then aliases in Pervasives?

@bluddy

This comment has been minimized.

Copy link

bluddy commented Jun 28, 2016

@c-cube if we're going to have submodules via aliases, how about Std as the parent module? Much better than Pervasives IMO. Let's start making this thing right.

Actually, C++ is considering having a new stdlib and incrementing the postfix number. This would potentially work really nicely for us with modules and allow us to increment the postfix: Std would be a logical remaking of the current standard library, and draw from the existing modules. We would at some point create an Std2, using combinations of possibly the same underlying modules, and not have to worry about backwards compatibility (since Std will still be there), etc.

@c-cube

This comment has been minimized.

Copy link
Contributor Author

c-cube commented Jun 28, 2016

@bluddy my goal here is not to engineer a new, incompatible stdlib… I just want more things in it :p
So module aliases in Pervasives would do fine, I think.

@bluddy

This comment has been minimized.

Copy link

bluddy commented Jun 28, 2016

@c-cube Since you've already started this effort, how about another PR with moving everything to the Std module (or whichever one you prefer), while keeping everything old for backwards compatibility? Also, possibly reorganizing things to make more sense (e.g. things in Unix should be specifically Unix), getting rid of Stream in the new Std etc. When/if we want to remake the stdlib, we move to Std2 if needed.

@alainfrisch since you brought up the concept of changes to stdlib, what do you think of this idea?

@alainfrisch

This comment has been minimized.

Copy link
Contributor

alainfrisch commented Jun 28, 2016

since you brought up the concept of changes to stdlib, what do you think of this idea?

I'm not sure I follow your proposal. What would go in Std? Only aliases to new Caml* modules and to existing modules? Or actual code? If we want to propose a whole new structure for the stdlib, one should use this opportunity for redesigning many aspects (exceptions, naming, use of optional arguments) and cleaning old stuff (removing deprecated modules and functions). I'm not sure this can be achieved with a simple Github PR, though.

@c-cube

This comment has been minimized.

Copy link
Contributor Author

c-cube commented Jun 28, 2016

edit: wrong PR -_-

@bluddy

This comment has been minimized.

Copy link

bluddy commented Jun 28, 2016

@alainfrisch Mostly I'm trying to bounce ideas around. For years there's been this de facto approach (justified or not) of the standard library being a monolithic thing that cannot be changed. And now we've breached that wall, and I think we're (or I am) investigating the consequences. So true, we want to add some functionality. But as soon as we add it (and I'm fine with going ahead and doing it), we bump into the issue of the stdlib and its relatively bad module name choices, the fact that adding more modules will conflict with user modules, etc. So I want to think about ideas of moving away from the current position to one where the stdlib occupies mainly one global, short module name (something like Std), and where module names make sense. And we can do that with backwards compatibility since everything else will remain. Additionally, we don't necessarily need to make the first iteration perfect (ie. the enemy of the good) because we can increment the posfix and have Std2, Std3 etc. as needed (within reason, of course). These would all be modules with module aliases to other modules (though namespaces of course would also be nice if we ever get them).

So a first move might be just shifting all the current functionality to Std and organizing it better. A further move might be a more long-term project of designing a new stdlib with improved functions, and placing everything in Std2. This will be very hard to do in anything resembling a stable library, so Std can be the stable library and for a while Std2 will be entirely experimental. Etc.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

alainfrisch commented Jun 28, 2016

For years there's been this de facto approach (justified or not) of the standard library being a monolithic thing that cannot be changed. And now we've breached that wall...

Really, nothing changed in the policy w.r.t. evolutions of the standard library. All previous releases already accepted contributions. The recent addition to CONTRIBUTING.md was simply to counter a widespread misconception about the stdlib being stalled, or somehow tied to specific needs of the compiler itself.

I think it's more productive to focus on lifting specific limitations of the stdlib than trying to experiment with very different designs, which is better left to external projects.

@bluddy

This comment has been minimized.

Copy link

bluddy commented Jun 28, 2016

I think it's more productive to focus on lifting specific limitations of the stdlib than trying to experiment with very different designs, which is better left to external projects.

@alainfrisch That has been the approach so far, but I'm questioning whether it has to be the approach. If our assumption is that we have to live with the decisions made in the standard library forever, that simply isn't true if you have a single 'namespace' like Std, which you can increment when a reboot is needed. And many people look to the standard library for guidance -- they don't want to choose between Core, Batteries or Containers. Why not use the experience gained from creating these external libraries to make a new, clean standard library? Or at the very least, for a single iteration, moving everything into Std and cleaning up the cruft that has built up over time?

@damiendoligez

This comment has been minimized.

Copy link
Member

damiendoligez commented Jul 1, 2016

@bluddy

Why not use the experience gained from creating these external libraries to make a new, clean standard library?

The answer to this question is mostly lack of manpower. I would very much like to see a good proposal for versioning the standard library, designing an improved version, then deprecating the current version, but that's a large amount of work and I don't know who's going to do it.

@mrvn

This comment has been minimized.

Copy link
Contributor

mrvn commented Jul 4, 2016

@c-cube

indeed, it's not clear what should happen if f returns but close fails in general. Here I just use close_noerr which I believe doesn't raise at all…

That only works for reads. For writes to ensure the data is actually written successfully one needs to check errors for both fsync() and close(). This would require returning both the result of f and the exception thrown by fsync or close. I believe the current design would lead to people not checking for errors.

@c-cube

This comment has been minimized.

Copy link
Contributor Author

c-cube commented Jul 30, 2016

@mrvn the last commit uses close_out in the success path, and close_out_noerr when an exception occurred.

@c-cube c-cube force-pushed the c-cube:stdlib-safe-io branch from abcc272 to 62d3337 Sep 23, 2016

@c-cube

This comment has been minimized.

Copy link
Contributor Author

c-cube commented Sep 23, 2016

Note: had to bootstrap because pervasives.cmi changed.

@c-cube c-cube force-pushed the c-cube:stdlib-safe-io branch from 62d3337 to 209a333 Jan 5, 2017

@c-cube

This comment has been minimized.

Copy link
Contributor Author

c-cube commented Jan 7, 2017

I just saw https://ocaml.org/learn/tutorials/file_manipulation.html, and am a bit puzzled that in 2017, the recommended way of manipulating files in OCaml is unsafe.

The current code is:

open Printf
  
let file = "example.dat"
let message = "Hello!"
  
let () =
  (* Write message to file *)
  let oc = open_out file in    (* create or truncate file, return channel *)
  fprintf oc "%s\n" message;   (* write something *)   
  close_out oc;                (* flush and close the channel *)
  
  (* Read file and display the first line *)
  let ic = open_in file in
  try 
    let line = input_line ic in  (* read line from in_channel and discard \n *)
    print_endline line;          (* write the result to stdout *)
    flush stdout;                (* write on the underlying device now *)
    close_in ic                  (* close the input channel *) 
  
  with e ->                      (* some unexpected exception occurs *)
    close_in_noerr ic;           (* emergency closing *)
    raise e                      (* exit with error: files are closed but
                                    channels are not flushed *)

but imho it would be much better as:

open Printf
  
let file = "example.dat"
let message = "Hello!"
  
let () =
  (* Write message to file *)
  with_open_out file (fun oc -> fprintf oc "%s\n" message);
  
  (* Read file and display the first line *)
  with_open_in file (fun ic ->
    let line = input_line ic in
    print_endline line;
    flush stdout);
@bobzhang

This comment has been minimized.

Copy link
Member

bobzhang commented Jan 7, 2017

I don't think we should add anything to pervasive unless the benefit is huge

@c-cube

This comment has been minimized.

Copy link
Contributor Author

c-cube commented Jan 7, 2017

I'd argue the benefit in this case is huge. But even then, we can consider having a sub-module IO in any case (not a separate one, just a sub-module of Pervasives)?

@quicquid

This comment has been minimized.

Copy link

quicquid commented Jan 9, 2017

I had the impression that most people write their own (possibly flawed) version of with_open_in, since file I/O occurs so regularly. Since a sizable number of people would profit, I'd not underestimate the impact of the PR. Moreover, I welcome a functional wrapper around I/O. Even in the more complicated context of Prolog, there are predicates like phrase_from_file which provide a pure file interface. It would be great to have this kind of abstraction in OCaml too.

@c-cube c-cube referenced this pull request Jan 13, 2017

Merged

Prefix the stdlib #1010

@c-cube

This comment has been minimized.

Copy link
Contributor Author

c-cube commented Jan 27, 2017

So, any chance this makes it into 4.05? :-/

@dra27

This comment has been minimized.

Copy link
Contributor

dra27 commented Jan 27, 2017

This seems unlikely until #1010 is merged (and there are still a couple of issues being worked on around that)

@mseri

This comment has been minimized.

Copy link
Contributor

mseri commented Aug 31, 2017

Is there any chance of finally getting exposed in the stdlib? It's of very common use and I have seen it redefined in plenty of places. Would be great to have it in the standard library.

In fact, I had come here to make a PR to add it but it looks like it's already part of this PR (only hidden)

@ksromanov

This comment has been minimized.

Copy link

ksromanov commented Mar 21, 2018

Yes, please. finally and bracket in separate commits, so that it would be easy to include only one of them.

@bobzhang

This comment has been minimized.

Copy link
Member

bobzhang commented Mar 21, 2018

I know my 2 cents is not very expensive, but I really wish we are more cautious to add functions to pervasive directly, that means the function name is part of the language. A qualified module would be much better, like IO or better names you have in mind

@damiendoligez damiendoligez removed this from the consider-for-4.07 milestone May 18, 2018

@ksromanov

This comment has been minimized.

Copy link

ksromanov commented Jun 13, 2018

Actually there is a function try_finally in the tree - https://github.com/ocaml/ocaml/blob/trunk/utils/misc.ml#L27

So, it just needs to be exported. ;-)

@c-cube c-cube force-pushed the c-cube:stdlib-safe-io branch from 8097ada to bf4f85f Aug 8, 2018

@c-cube

This comment has been minimized.

Copy link
Contributor Author

c-cube commented Aug 8, 2018

I've just updated this on top of #1855, but maybe it's still time to add an IO module with nicer (shorter) names and the safe functions in it… What do y'all think?

c-cube added some commits Jun 28, 2016

add resource-handling IO functions in pervasives
- safe `with_open_{in,out}` functions
- add `input_lines` to read all the lines into a list
- also modify otherlibs/threads/pervasives.ml

@c-cube c-cube force-pushed the c-cube:stdlib-safe-io branch from bf9e6fd to abaa120 Aug 8, 2018

@copy copy referenced this pull request Aug 8, 2018

Closed

[RFC] Add Bigstring module #1961

@alainfrisch

This comment has been minimized.

Copy link
Contributor

alainfrisch commented Aug 29, 2018

I've just updated this on top of #1855, but maybe it's still time to add an IO module with nicer (shorter) names and the safe functions in it… What do y'all think?

I think it's a good idea, in line with the introduction of modules to collect functions around built-in or stdlib types (Float, Option, Result). Since there are two types of channels, I would argue for introducing two modules (In_channel, Out_channel).

(On a related note, I've asked myself several times whether keeping the management of channel buffers in the runtime system, and not in OCaml code, is a good idea. I've observed that looping over characters of a file with input_char can be really slow. Implementing the buffering in OCaml, with the RTS exposing lower-level stuff, might be much faster, since the fast path of input_char could be inlined, and one would commonly avoid the OCaml->C bridge.)

@diml

This comment has been minimized.

Copy link
Member

diml commented Aug 29, 2018

(On a related note, I've asked myself several times whether keeping the management of channel buffers in the runtime system, and not in OCaml code, is a good idea. I've observed that looping over characters of a file with input_char can be really slow. Implementing the buffering in OCaml, with the RTS exposing lower-level stuff, might be much faster, since the fast path of input_char could be inlined, and one would commonly avoid the OCaml->C bridge.)

Agreed, and now we can even use a bigarray for the buffer since it's in the stdlib.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Aug 30, 2018

This discussion is going off on a tangent. Could we please converge on a decision on the original PR? (with_open functions).

@alainfrisch
Copy link
Contributor

alainfrisch left a comment

More and more people share the opinion that nothing (at least non-operator values) should be added to Stdlib at toplevel anymore. So let's indeed put the new functions in new module(s).

@c-cube

This comment has been minimized.

Copy link
Contributor Author

c-cube commented Oct 26, 2018

Following #2111 it'd be neat to decide on what to name this new module (and the types inside) before updating the code. Are there drawbacks to In_channel and Out_channel?

@alainfrisch

This comment has been minimized.

Copy link
Contributor

alainfrisch commented Oct 26, 2018

{Ìn,Out}_channel would be coherent with other stdlib modules named after their builtin type. However, the types are not really built-in, and we might want to encourage people to use the qualified name; so we could decide to introduce any name (with Stdlib.in_channel becoming a deprecated alias for Foo.t).

Assuming we keep e.g. In_channel, there remains to pick names for values. In_channel.input_string is a bit redundant, but In_channel.string feels weird.

@dbuenzli

This comment has been minimized.

Copy link
Contributor

dbuenzli commented Oct 26, 2018

Assuming we keep e.g. In_channel, there remains to pick names for values. In_channel.input_string is a bit redundant, but In_channel.string feels weird

In_channel.read_string maybe ? The pattern would be map input_* -> In_channel.read_* and output_* -> Out_channel.write_*.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Oct 26, 2018

Input and Output could work a little better than In_channel and Out_channel. The type names could be Input.channel and Output.channel.

I'm still unsure how to translate output_string, print_string and prerr_string, i.e. the generic functions and their specializations to stdout and stderr. Likewise for input_line and read_line.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

alainfrisch commented Oct 26, 2018

Do we want to provide functions that return options instead of possibly raising End_of_file? With the current system, the overhead (esp. when reading character by character) could be quite high, but this might addressed in the future (e.g. an attribute on external primitives to return "unboxed" optional values, the boxing into Some being done on the OCaml side, hopefully fused with the client code that immediately deconstructs the option).

@alainfrisch

This comment has been minimized.

Copy link
Contributor

alainfrisch commented Oct 26, 2018

About the naming: if we keep explicit verbs, they will somehow be redundant with the module names if we keep two modules for input and output. I'm not against it, but one could then also consider having a single IO or Channel module (I know, I'm the one who suggest In_channel/Out_channel first, but now is a good time to explore the possibilities).

@lpw25

This comment has been minimized.

Copy link
Contributor

lpw25 commented Oct 26, 2018

Channel.Input.t and Channel.Output.t?

@c-cube

This comment has been minimized.

Copy link
Contributor Author

c-cube commented Oct 26, 2018

I also like the idea of having a single Channel module, in particular because we could have a phantom type for [`input | `output] Channel.t so as to have only one close function.

edit: but of course this requires more changes.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

alainfrisch commented Oct 26, 2018

we could have a phantom type for [input | output] Channel.t so as to have only one close function.

I don't think this is a good idea. Combining two functions into one does not justify the risk of bad error messages, and extra complexity in the interface. The programmer will always know which kind of channel is used and can easily decide which function to call.

@dbuenzli

This comment has been minimized.

Copy link
Contributor

dbuenzli commented Oct 26, 2018

I don't know how much people feel about not following the M.t convention for channels but having IO.{in,out}_channel or Channel.{in_t,out_t} (I'd lean towards that one) would basically allow to migrate all of the Pervasives channel functions as they exist there without the whole thing feeling totally redundant.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

alainfrisch commented Oct 26, 2018

M.t (together with equal, compare, hash) is useful to allow using directly e.g. Map.Make(M.t). But we could have IO.In.t, IO.Out.t, and all actual i/o functions directly in IO.

(It's funny how we depart from suggested stdlib guidelines even before they are adopted.)

@dbuenzli

This comment has been minimized.

Copy link
Contributor

dbuenzli commented Oct 27, 2018

(It's funny how we depart from suggested stdlib guidelines even before they are adopted.)

I personally never had to map channels, rather their fds, but why not. However I would then rather suggest Channel.{In,Out}.t and the functions in Channel. For these reasons:

  1. I'd prefer if we can keep the IO name for potential more generalized notions of IO that do not rely on fds that may emerge in the future (more specifically thinking about type aliases to bytes -['a]-> unit and unit -['a]-> bytes).
  2. Channel is the name under which these abstractions have been known for a long time and I see no really good reason to change it.
@dbuenzli

This comment has been minimized.

Copy link
Contributor

dbuenzli commented Oct 27, 2018

But we could have IO.In.t, IO.Out.t, and all actual i/o functions directly in IO.

Oh and btw. this would certainly also violate the inexisting stdlib guidelines that the functions regarding a type should belong to its module. Just saying ;-) I'm fine with transgression.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

alainfrisch commented Nov 6, 2018

However I would then rather suggest Channel.{In,Out}.t and the functions in Channel

This looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment