Strange sscanf input segfault #5380

Closed
vicuna opened this issue Oct 22, 2011 · 8 comments

commented Oct 22, 2011

Original bug ID: 5380
Reporter: bluestorm
Assigned to: @pierreweis
Status: closed (set by @pierreweis on 2011-12-15T08:50:14Z)
Resolution: fixed
Priority: normal
Severity: crash
Version: 3.12.0
Category: ~DO NOT USE (was: OCaml general)
Has duplicate: #5605
Related to: #5973 #6115
Monitored by: @gasche

Bug description

Following a question from Jianzhou Zhao on the beginners list (using Scanf to parse %-separated strings), I tried the following code, which typechecks but segfaults on my machine:

Scanf.sscanf "string1%string2" "%s@%%s" (fun s1 s2 -> s1, s2) ();;

I have reproduced the bug on 3.12.0, 3.10.2 and 3.11.2.

commented Oct 22, 2011

Comment author: bluestorm

Note: also reproduced in SVN trunk.

commented Oct 24, 2011

Comment author: @pierreweis

Thank you for reporting this strange and puzzling bug.

I will correct it as soon as possible; for the time being, you may use the "%," conversion separator to clearly delineate your two conversions:

    Objective Caml version 3.12.2+dev1 (2011-08-03)

Scanf.sscanf "string1%string2" "%s@%%,%s" (fun s1 s2 -> s1, s2);;

- : string * string = ("string1", "string2")

As an additional benefit, you will also get:

Scanf.sscanf "string1%string2" "%s@%%,%s" (fun s1 s2 -> s1, s2) ();;

Error: This expression has type string * string
but an expression was expected of type 'a -> 'b

which is indeed the expected behaviour of the type checker.

Stay tuned for a complete correction in the working sources.

commented Oct 24, 2011

Comment author: @xclerc

Tentative fix by revision 11233 in branch "version/3.12".

commented Oct 25, 2011

Comment author: @xclerc

Reverted the commit, the tentative fix being awfully wrong.

commented Oct 25, 2011

Comment author: @pierreweis

I fixed the bug. It was indeed not trivial to correct and was hiding there for years!

In short, previous versions of the compiler accepted incorrect format strings that should not have been typable (although those format strings conformed to the documentation). Hence the segfault you observed :(

To correct this nasty situation, I was obliged to slightly modify the conventions for the @ character in format strings, as follows.

As you may know, %% is equivalent to a plain % character and @@ to a plain @ character. I had to add the extra convention that @% is equivalent to a plain % character. As a consequence, some code that was (wrongly) accepted before may now fail to compile with a typing error. Consider for instance the "@%s" format string; before the correction of the bug it was made of one plain @ character followed by a string conversion; now, the new convention turns @% to a plain % character: the format string is thus equivalent to 2 plain characters, and the string conversion has vanished. To correct this format, simply double the @ to recover the plain @ character, writing "@@%s".

The good news is that the corrected format string is already valid and equivalent to the wrong one in all the distributed versions of the compiler (including 3.10, 3.11, 3.12, and the SVN trunk).

You can therefore safely correct your code in advance and avoid any bad surprise with forthcoming versions of OCaml!

commented Nov 8, 2011

Comment author: @pierreweis

This is indeed a tough issue!

My correction was still buggy. I am reopening the bug report to find a better, correct way to get rid of the bug.

commented Dec 15, 2011

Comment author: @pierreweis

The solution was simply to enforce the current treatment of '%' in format strings:

Every occurrence of '%' in a format string is considered as introducing a conversion, unless escaped as "%%" to stand for a plain '%' character. This rule now stands within character ranges and format string indications.

So, to read a string until a plain character '%' you must write the format string "%s@%%".
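As a small illustration of this rule, here is a sketch that splits the string from the original report at its first '%'. It assumes a recent OCaml release in which the fix is in place; the helper name split_at_percent is hypothetical and not part of Scanf. In the format, "%s@%%" reads a string up to a plain '%' and consumes that character, and the final "%s" reads the rest.

```ocaml
(* Hypothetical helper: split a string at its first '%' character.
   "%s@%%" reads up to a plain '%' and consumes it; the trailing "%s"
   then reads the remainder (up to whitespace or end of input). *)
let split_at_percent (s : string) : string * string =
  Scanf.sscanf s "%s@%%%s" (fun s1 s2 -> (s1, s2))

let () =
  let s1, s2 = split_at_percent "string1%string2" in
  Printf.printf "%s / %s\n" s1 s2
```

Note that the callback takes exactly two arguments here; applying the result to an extra () argument, as in the original report, should be rejected by the type checker rather than crash at run time.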

Mutatis mutandis, the same rule applies to '@' characters in format strings:

Every occurrence of '@' in a format string is considered as introducing a format string indication, unless escaped as "%@" to stand for a plain '@' character. This rule now stands within character ranges and format string indications.

So, to read a string until a plain character '@' you must write the format string "%s@%@".

For the sake of backward compatibility, occurrences of '@' that do not start a valid format string indication are still accepted.

This unified treatment of '%' and '@' in format strings corrects the bug and preserves existing programs.
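Returning to the question that started this report (parsing %-separated strings), the "%s@%%" format can be applied repeatedly to one scanning buffer. The following is only a sketch under the assumption of a recent OCaml release; the function name fields_of_percent_separated is hypothetical, while Scanf.Scanning.from_string, Scanf.Scanning.end_of_input and Scanf.bscanf are standard library functions.

```ocaml
(* Hypothetical helper: return all '%'-separated fields of [s].
   Each call to bscanf reads one field up to the next plain '%'
   (consuming it) or up to the end of input. *)
let fields_of_percent_separated (s : string) : string list =
  let ib = Scanf.Scanning.from_string s in
  let rec loop acc =
    if Scanf.Scanning.end_of_input ib then List.rev acc
    else
      let field = Scanf.bscanf ib "%s@%%" (fun field -> field) in
      loop (field :: acc)
  in
  loop []

let () =
  fields_of_percent_separated "a%b c%d"
  |> List.iter print_endline
```

Because the loop stops as soon as the input is exhausted, an empty field after a trailing '%' is not reported; handle that case separately if it matters for your data.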

commented Dec 15, 2011

Comment author: @pierreweis

Corrected in version 3.12.2.
