Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.Sign up
[patch] Weak format checking #5949
Original bug ID: 5949
I wrote a patch to complete typing tests of formats and fix some related bugs.
For some examples that I put in "Steps to reproduce" about buffer overflows, they only "have an invalid behavior" on a 32 bits machine. You have just to increase numbers to obtain similar problems on a 64 bits machine.
Steps to reproduce
Printf.printf "%5000000000d" 42;;
Printf.printf "%3.+f" 3.14;;
Printf.printf "%2147483548d" 42;;
Printf.printf "%1.1s" "abc";;
The attached patch fixes all of these problems.
Note that this patch also resolves the bug 0004799.
Comment author: @pierreweis
I checked your proposed modifications, and have a few questions:
Comment author: bvaugon
Thanks for your answer and questions!
I agree it is impossible to avoid [ Invalid_argument "String.create" ] because of lots of reasons, but my modifications about limitations of "padding" is principally to prevent "Segmentation fault". These kind of "Segmentation faults" are due to buffer overflows in the runtime when size are not representables. Since Sys.max_string_length depends on architecture, it is also impossible to statically bound paddings.
I consider "%1.1s" as invalid because, with the current runtime implementation, when this format is used with Printf.printf, it always raises [ Invalid_argument "Printf: bad conversion %s, at char number 0 in format string ``%1.1s''" ] at run time. So, ".1" with 's' does not mean anything and should be refused statically, not dynamically.
The format "%F" is used to print a float with the OCaml lexical convention. The modifier '+' does not do anything with "%F" in the current stdlib/runtime implementation. An OCaml user which writes "%+F" probably want to force the sign printing and the current implementation simply ignore the '+'.
For "%0s", it is a bit different. I think in hindsight that it could be accepted. My first idea cames from the fact that it is equivalent to "%s", but this kind of syntax ("%0s") could be usefull for OCaml code generation, so you can forget my remark.
Sorry for "\n%", the ocamlc error message is correct. I meant "\n%k", for example, which generate a bad error message.
I also agree that it should be interresting to have a complete documentation and a clean semantic of OCaml formats.
With the oral (in principle) approval of some OCaml developpers, I am currently finished writing a first (but complete) trial version of a new implementation of Printf, Scanf and Format stdlib modules where formats are not represented (as currently) by strings but are statically compiled into GADTs trees. This modifications obviously impacts the OCaml compilers (typecore.ml and predef.ml), and also Pervasives for compatibility since format types are currently defined in this module. Unlike the current implementation, my Printf, Scanf and Format modules implementations are "well typed", they does not use Obj.magic, nor dynamic format strings parsing when printing and scanning, nor this kind of horrible things. It also has significantly better performances. I try very hard to be compatible with the current implementation, except for some error messages and errors wich are now detected statically rather than dynamically.
So it is not necessarily a good idea to completely preserve the current semantic, especially for examples likes above that no reasonable programmer writes. When finished, this huge patch (maybe a few thousand lines) will probably need to be more tested, adapted and maybe partially redesigned, I suggest to have profound discussions about what shoud be kept and what should be changed from the current semantic of formats.
Comment author: @gasche
Now that Benoît's GADT-based implementation is merged, none of the example he gives have the problems initially reported -- some of them eg. \n escaping issues were fairly orthogonal but were fixed in his patch as well. In particular, all known-crash related to formats are fixed, so this issue is not a blocker anymore for 4.02.
(I am not fond of having a test on the padding length at the type-checker level, which is the way Benoît fixed those issues. It catches the problem, which is good, but maybe we'll solve the issue in another way in the future.)
Note however that one of the things Benoît did not change (mostly) is the runtime semantics of formats: how atomic formats are interpreted by the printf/format/scanf functions. In many case this was implemented by calling the equivalent C formatting function, which is the cause of many format bugs we've had. Keeping this unchanged was the right choice (Benoît's work was orthogonal as it was about format type-checking), in particular the only sane choice to try to preserve compatibility, but this means that the nasty C bits remain and that we will probably still find new bugs (even probably segfaults) due to broken libc on some systems or just bad interaction on our part with the C formatters.
I'm marking this issue resolved (all the reported problems were fixed... by the reporter himself), but we'll talk formats again.