Str library: $ inconsistency #7024
Original bug ID: 7024
The Str library states that the $ metacharacter "[m]atches at end of line (either at the end of the matched string, or just before a newline character)". However, it appears to match only before LF, not before other line endings such as CRLF. The documentation is therefore inconsistent with the observed behaviour.
Steps to reproduce
From an OCaml toplevel:
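The original toplevel session is not preserved in this copy of the report. As a minimal sketch of the reported behaviour (the pattern `a$` and the sample strings are illustrative, not taken from the original report), the following compares an LF-terminated line with a CRLF-terminated one:

```ocaml
(* Load Str into the toplevel; it is a separate library, not part of Stdlib. *)
#load "str.cma";;

(* Illustrative pattern: the letter 'a' immediately followed by end of line. *)
let re = Str.regexp "a$";;

(* LF ending: the character after 'a' is '\n', so '$' matches. *)
let lf = Str.string_match re "a\nb" 0;;

(* CRLF ending: the character after 'a' is '\r', so '$' does not match. *)
let crlf = Str.string_match re "a\r\nb" 0;;
```

In the toplevel, `lf` evaluates to `true` and `crlf` to `false`.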
Comment author: @gasche
The standard library input functions will (when the file is read in text mode rather than binary mode) translate \r\n into \n at read time under Windows. This means that you should not manipulate strings containing \r\n in the OCaml world, and in particular that Str can assume that lines end with \n.
Do you have a particular reason for manipulating raw strings that have not been read in O_TEXT mode?
Comment author: flindgren
The program reads many files through the Unix module, which does not seem to support text mode. Some text files may embed binary data and so cannot be read in a translating mode.
But regardless of whether my use case is valid, is this LF-only behaviour documented somewhere? The documentation of Str only refers generally to line endings and newlines, without specifying which kind is meant. Is it documented elsewhere?
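For data read through `Unix` rather than a text-mode channel, one possible workaround is to perform the CRLF-to-LF translation yourself before running Str, which is roughly what text-mode channels do on Windows. The sketch below uses a hypothetical helper, `normalize_crlf`, which is not part of any library. It should only be applied to the text portions of a file, since rewriting `\r\n` inside embedded binary data would corrupt that data.

```ocaml
#load "str.cma";;

(* Hypothetical helper (not a library function): replace every "\r\n"
   with "\n" so that Str's '$', which only recognises '\n', behaves as
   expected. Apply only to text, never to embedded binary data. *)
let normalize_crlf (s : string) : string =
  Str.global_replace (Str.regexp "\r\n") "\n" s;;

let () =
  let raw = "first\r\nsecond\r\n" in
  let re = Str.regexp "first$" in
  Printf.printf "raw: %b, normalized: %b\n"
    (Str.string_match re raw 0)
    (Str.string_match re (normalize_crlf raw) 0);;
```

On the raw CRLF string the pattern fails to match; after normalization it matches.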
Comment author: @xavierleroy
There is a general assumption in OCaml libraries that "newline" means '\n' (LF). I agree it's not stated explicitly anywhere.
Would it be enough to document this? E.g. for Str,
'$' ... [m]atches at end of line: either at the end of the matched string, or just before a '\n' character
I don't feel like adding a special case to the regexp matcher so that '$' also matches just before a "\r\n" sequence.
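If the resolution is to document the LF-only behaviour, code that must accept both conventions can make the carriage return explicit in the pattern instead. A minimal sketch, assuming the pattern of interest ends in `$` (the leading `a` is illustrative): writing `\r?$` lets the match succeed just before either `\n` or `\r\n`. The trade-off is that, on a CRLF line, the `\r` becomes part of the matched text.

```ocaml
#load "str.cma";;

(* Optional '\r' before '$': accepts both "\n" and "\r\n" line endings.
   The OCaml string literal contains a real carriage-return character,
   which Str treats as an ordinary literal character. *)
let line_end_a = Str.regexp "a\r?$";;

let () =
  List.iter
    (fun s ->
      Printf.printf "%S matches: %b\n" s (Str.string_match line_end_a s 0))
    [ "a\nb"; "a\r\nb"; "ab" ];;
```

After a successful match on `"a\r\nb"`, `Str.matched_string` returns `"a\r"`, so callers that need the bare line content have to strip the trailing carriage return themselves.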