Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug in printf with mix and match of % and %x$ #324

Closed
stephane-chazelas opened this issue Sep 11, 2021 · 4 comments
Closed

bug in printf with mix and match of % and %x$ #324

stephane-chazelas opened this issue Sep 11, 2021 · 4 comments
Labels
bug Something is not working

Comments

@stephane-chazelas
Copy link

stephane-chazelas commented Sep 11, 2021

$ printf '%1$s %1$b\n' '\\\\'
\\\\ b
$ printf '%b %1$s\n' '\\\\'
\\

expected:

$ printf '%1$s %1$b\n' '\\\\'
\\\\ \\
$ printf '%b %1$s\n' '\\\\'
\\ \\\\

(I don't know if it's limited to %b or if that applies to other format specifiers)

(issue already mentioned on the austin-group-l mailing list where %x$y is being discussed atm)

@stephane-chazelas stephane-chazelas changed the title bug in printf's when %b and %x$ are used in same format bug in printf when %b and %x$ are used in same format Sep 11, 2021
@hyenias
Copy link

hyenias commented Sep 11, 2021

I did not understand the title of this issue. But after some trail-and-error and some investigation I believe I understand the trouble you expressed. Trouble occurs when x$ aka parameter field is used. (see n$ parameter field ).

Example: Reorder parameters and repeat the 2nd:

$ printf '%3$s %2$s %1$s %2$s\n' a b c
c b a b

More troubled output:

$ printf '%b %1$s\n' '\\\\'
\\
$ printf '%1$b %1$s\n' '\\\\'
\\ \\
$ printf '%s %1$s\n' '\\\\'
\\\\
$ printf '%s %1$s\n' 'a'
a
$ printf '%1$s %1$s\n' 'a'
a a

@McDutchie McDutchie added the bug Something is not working label Sep 11, 2021
@phidebian
Copy link

Yall, I bumped into a problem the other day, and after investigating, think it all converge to this #324, as @hyenias stated, the title is confusing as (to me) the problem as nothing to do with %b (or conversion specifier more generally).

To me the title could be problem with mix'n'match of % and %x$

In the following examples

  • 1s: %s mean 1st sequential (positional) format
  • 1i:%2$s means 1st indexed format referencing index [2]

Examples WITHOUT mix'n'match (all results correct)

TC$ printf '1s:%s 2s:%s\n' 1 2 3 4
1s:1 2s:2
1s:3 2s:4

TC$ printf '1i:%2$s 2i:%1$s\n' 1 2 3 4
1i:2 2i:1
1i:4 2i:3

Examples WITH mix'n'match

TC$ printf '1s:%s 2i:%1$s\n' 1 2 3 4  
1s:1 2i:2  # expect 1 1 No reasons to get 2
1s:3 2i:4  # expect 2 2 Only 1 arg consumed on previous fmt scan
           # expect 3 3
           # expect 4 4 As we consume args with this fmt string

TC$ printf '1i:%2$s 2s:%s\n' 1 2 3 4  
1i:2 2s:3 # expect 2 1  %s is the 1st sequential access get 1, next %s would get 2, etc...
1i: 2s:   # expect 3 2 as arg#1 was consumed by previous iteration now argv is 2 3 4

Indeed when I say 'should' is subject to my interpretation, and it seems that is not normalised, to me indexed access don't advance parameter access in argv[], just pick argv[ndx] while sequential (positional) access do consume (advance) argv[].

There is 2 ways to fix that.

  • The C99 way, i.e warn or better for ksh simply reject mix'n'match format string.
  • Fix it with the semantic I gave above in the 'should' explanation.

The C way

TC$ echo $'
> #include <stdio.h>
> int main(int c, char **v){printf("1s:%s 1i:%1$s\\\\n","1");}
> ' | cc -x c - -o yo && ./yo  && rm yo
<stdin>: In function 'main':
<stdin>:3:1: warning: '$'operand number used after format without operand number [-Wformat=]
1s:1 1i:1

As we see here C complains about using %1$ while %s as been used already, yet this is a warning, and result is correct.

A more complete example showing indexed access don't advance anything while sequential access do advance parameter access.

echo $'
#include <stdio.h>
int main(int c, char **v){printf("1s:%s 1i:%2$s 2s:%s 2i%1$s\\\\n","1","2");}
' | cc -x c - -o yo && ./yo  && rm yo
<stdin>: In function 'main':
<stdin>:3:1: warning: '$'operand number used after format without operand number [-Wformat=]
1s:1 1i:2 2s:2 2i1

As we see here despite the warning indexed access access the right argv[ndx] while sequential access do avance in argv[] without bad side effect on argv[ndx].

More thorogh example on ksh side shows that mix'n'match is bugged because sequential access do increment argv[], then subsequent [ndx] are offsetted then screwed...

I investigate later...

@phidebian
Copy link

phidebian commented Mar 30, 2023

Hi All,
I recovered some cycles, so I could cut a patch, this is a fix candidate that could be used 'as is' or may be we could adjust some behaviors.

The patch is posted at
https://github.com/phidebian/ksh/tree/bug-324-1
and is ksh93/ksh PR
#617

In a nutshell the patch behave like if the printf format string is pre-processed before working replacing any sequential argv[] access % and * (with * being width/prec) with its auto mapped index equivalent %x$ *x$

The printf(3) I have says.

       itly  which  argument  is taken, at each place where an argument is re-
       quired, by writing "%m$" instead of '%' and "*m$" instead of '*', where
       the  decimal integer m denotes the position in the argument list of the
       desired argument, indexed starting from 1.  Thus,

           printf("%*d", width, num);

       and

           printf("%2$*1$d", width, num);

This is how the patch behave, yet internally this is not what is implemented, I tried to not touch any existing utility function like sffmtpos(), but instead work around its deficiencies.

The same with extend(), instead of changing it I implement yet another function around it reload() to work around the type mistmatch clash, the extend() being unchanged in case it is used by other code than printf (may be scanf)...

I used this script to generate the printf.sh I added in the test suite, this script can be used locally to double check the results we got, we can use it with zsh as well, and probably with other shell I don't have.

Only when reviewer and testers have double check all this we could accept this patch.

As it is now, the output are almost on pare with zsh, when diff occurs, they are zsh deficiencies (IMHO), but indeed this is debatable.

Again this snipped is for generating src/cmd/ksh93/tests/printf.sh so it may look ugly, yet it is usable locally
You shold hack the K= path

#!/bin/bash
# ./printf.sh [-o] [ShellName]

o=0 ; [ "$1" = "-o" ] && o=1 && shift
K=/d1/ksh/arch/linux.i386-64/bin/ksh # Fix candidate
[ "$1" ] && K="$1"                   # production zsh | ksh
 
for f in \
'%s %s\n'  \
'%s %1$s\n' \
'%1$s %s\n' \
'%1$s %1$s\n' \
'%1$s %s %s %2$s\n' \
'%s %1$s %s %s\n' \
'%s %1$s %2$s %s\n' \
'%2$s %s\n' \
'%2$s %1$s %2$s %1$s\n' \
'%2$s %1$s %s %2$s %1$s %s\n' \
'%*2$s %s\n' \
'%*2$.*3$s %s\n' \
'%*3$.*2$s %s\n' \
'%*s %*.*s\n' \
'%*d %*.*d\n' \
'%*f %*.*f\n' \
'%3$*2$.*1$s %2$s %1$s\n' \
'%s %5$s\n' \
; do
echo

((o)) &&
{ echo "printf '$f' 1 2 3 4 5 6 7 8 9 0"
  $K -c "printf '$f' 1 2 3 4 5 6 7 8 9 0"
  echo
  continue
}

x=$($K -c "printf '$f' 1 2 3 4 5 6 7 8 9 0")
echo "x='$x";echo "'"
echo "f='$f'"
echo 'T $LINENO "$f" "$x" # err_exit src/cmd/ksh93/tests/shtests grep -c accounting'
done

This can be used like, -b stand for bugged, -f stand for fixed

./printf.sh >ksh-f.out 2>&1
./printf.sh ksh >ksh-b.out 2>&1
./printf.sh zsh >zsh.out 2>&1
diff zsh.out ksh-f.out 
diff ksh-b.out ksh-f.out 

A fragment of an .out file look like this

x='1 2
3 4
5 6
7 8
9 0
'
f='%s %s\n'
T $LINENO "$f" "$x" # err_exit src/cmd/ksh93/tests/shtests grep -c accounting

The x= is what we expect (in the QA suite), yet when used locally this is what we got for printf format string depicted by f='...' and knowing that all test will receive 1 2 3 4 5 6 7 8 9 0 as argv[] i.e printf "$f" 1 2 3 4 5 6 7 8 9 0

Ignore the T line that is how QA run the test, the comment is a reminder how shtests counts unitary test, took me some time to get it, so I record it in the printf.sh so next time I'll know.

So lemme know for any trouble with that so I can peek/poke a bit more in there if needed.

@hyenias
Copy link

hyenias commented Apr 6, 2023

See #613 for notes and such.

@hyenias hyenias mentioned this issue Apr 8, 2023
@McDutchie McDutchie changed the title bug in printf when %b and %x$ are used in same format bug in printf with mix and match of % and %x$ May 17, 2023
McDutchie pushed a commit that referenced this issue May 17, 2023
@stephane-chazelas reported:
>     $ printf '%1$s %1$b\n' '\\\\'
>     \\\\ b
>     $ printf '%b %1$s\n' '\\\\'
>     \\
>
> expected:
>
>     $ printf '%1$s %1$b\n' '\\\\'
>     \\\\ \\
>     $ printf '%b %1$s\n' '\\\\'
>     \\ \\\\

@hyenias commented:
> After some trial-and-error and some investigation I believe I
> understand the trouble you expressed. Trouble occurs when x$ aka
> parameter field is used[*].
>
> Example: Reorder parameters and repeat the 2nd:
>
>     $ printf '%3$s %2$s %1$s %2$s\n' a b c
>     c b a b
>
> More troubled output:
>
>     $ printf '%b %1$s\n' '\\\\'
>     \\
>     $ printf '%1$b %1$s\n' '\\\\'
>     \\ \\
>     $ printf '%s %1$s\n' '\\\\'
>     \\\\
>     $ printf '%s %1$s\n' 'a'
>     a
>     $ printf '%1$s %1$s\n' 'a'
>     a a
>
> [*] See n$ parameter field: https://en.wikipedia.org/wiki/Printf

Analysis and fix:

printf format string can handle sequential argv[] access via % and
* as well as indexed argv[] access via %x$ *x$ (* used for
width/prec)

In a given format string it is possible to mix'n'match sequential/
indexed access, and mix'n'match the argv[] types.

'%*2$s' 1 2  Indexed access for 2$ making argv[1] an int, seq
    access for 's' then accessing arv[0] for s, here we mix seq/ndx
    access.

'%*2$s %s' 1 2 Same as above, the 1st format makes argv[0] a
    'string' argv[1] an 'int' then the second format make argv[1] a
    'string' this is a mixed type access for arv[1].

The former algo for printf-format handling builtin (say printf) is
to save the builtin argv[] in a data structure called struct printf
located in src/cmd/ksh93/bltins/print.c. This is located in a .c
file, meaning it is private to the ksh builtin.

Yet, all the code to scan a printf format string (not limited to
printf, may be scanf too) is located in lib/libast/sfio/ meaning
all sorts of jazz to to setup and use the private struct printf
access at sfio level via function pointers.

This printf data struct has a member called nextarg[] that is
initially setup to the builtin argv[]. It is called nextarg,
because on sequential access the % or * format access *nextarg++,
meaning each sequential access, consumes an arg. This is a major
design flaw, because it can't support indexed access %x$ *x$ very
well.

For this, on 1st indexed access, a new sfio level (not builtin)
data structure is built called fp[]. It rescans the format string
to find out all the indexed access, enters them in this kind of
cache, recording the type/value of the argv[x].

Now another problem occurs: the type entered is what we see on 1st
index 'x' occurence, so if we got '%2$d %2$f' then for occurence 2
i.e. fp[1] we enter type 'd' and fetch the value nextarg[1]. Since
nextarg may have been moved forward by sequential access, it
provides a bogus value. Then later access to %$2f will retrieve the
cached value for fp[1], find an 'int', and ksh is unable to convert
it to 'double' and all sorts of crash occur.

This is a complete design flaw, may be due to initial design for
sequential access only (like bash) then piled on a kludge to try an
indexed access.

This patch will leave all the sfio infrastructure (functions)
untouched, yet will implement a workaround on each flaw. The
reasons not to touch the existing functions is that the format scan
may be used by scanf() and I'm not fixing scanf for now.

The workaroud is based on adding more members in the struct printf
data structure, basically add argv0, the inital backup of argv[]
received by the builtin. nextarg is left unchanged, the former
infrastructure still maintained i.e. advance; they are tied to
sequential access. Yet, indexed access is the argv0[x] access with
lazy conversion, i.e. type conversion on demand, i.e. late in the
printf() process.

In short, this fix works 'as if' the format string is pre-pocessed,
replacing any non-indexed access (i.e. seq access) by an explicitly
computed index access, the index generated being simply the seq
index i.e. something similar to:

  +----++------+-++-----------+-- Seq
  |    ||      | ||  ndx ---+ |
  v    vv      v vv         v v
'%s   %*s     %*.*s       %*1$s'   expanded to (ignore spaces)
'%1$s %3$*2$s %6$*4$.*5$s %7$*1$s'
  ^    ^  ^    ^  ^   ^    ^  ^
  |    |  |    |  |   |    |  |
  +----+--+----+--+---+----+--+-- Computed Seq index.
  Note the index inversion due to the syntax of the format string
  printf '%*.*s' width prec data; is equivalent to
  printf '%3$*1$.*2$'  width prec data;

Here I keep all the infrastructure, yet working around bugs.

src/lib/libast/sfio/sfvprintf.c:
- Implement argp, argn processing so that each argv[] cached is on
  its right position.
- Use nargs/xargs make sure we accees the correct argv[] position.
- argv[] vs fp[] type mismatch we do a late argv[] evaluation
  (uncached).

src/lib/libast/sfio/sftable.c:
- sffmtpos() duplicates the format string processing done here in
  sfvprintf(), so we implement the same argp, argn processing so
  that each argv[] cached is on its right position.

src/cmd/ksh93/bltins/print.c:
- Add reload() function. It is called when the caller wants to
  solve a fp[x] cache miss, i.e., a type mismatch between the
  cached type/value and the desired new type. It does the same as
  extend(), i.e. setup type/value for a given argp and then bump
  nextarg, but also saves/restores nextarg. This because 'maybe'
  extend is used by other code (scanf?).

Resolves: #324
McDutchie added a commit that referenced this issue May 18, 2023
The second part of the patch from one of the PR comments[*] didn't
make it into the PR, so it failed to fix one of the originally
reported reproducers. This patch applies that fix.

This corrects the following reproducer:

    $ printf '%b %1$s\n' '\\\\'
    \\

to:

    $ printf '%b %1$s\n' '\\\\'
    \\ \\\\

[*] #617 (comment)
McDutchie added a commit that referenced this issue May 18, 2023
The second part of the patch from one of the PR comments[*] didn't
make it into the PR, so it failed to fix one of the originally
reported reproducers. This patch applies that fix.

This corrects the following reproducer:

    $ printf '%b %1$s\n' '\\\\'
    \\

to:

    $ printf '%b %1$s\n' '\\\\'
    \\ \\\\

[*] #617 (comment)
McDutchie added a commit that referenced this issue May 18, 2023
The second part of the patch from one of the PR comments[*] didn't
make it into the PR, so it failed to fix one of the originally
reported reproducers. This patch applies that fix.

This corrects the following reproducer:

    $ printf '%b %1$s\n' '\\\\'
    \\ \\

to:

    $ printf '%b %1$s\n' '\\\\'
    \\ \\\\

[*] #617 (comment)

Co-authored-by: Martijn Dekker <martijn@inlv.org>
McDutchie added a commit that referenced this issue May 18, 2023
The second part of the patch from one of the PR comments[*] didn't
make it into the PR, so it failed to fix one of the originally
reported reproducers. This patch applies that fix.

This corrects the following reproducer:

    $ printf '%b %1$s\n' '\\\\'
    \\ \\

to:

    $ printf '%b %1$s\n' '\\\\'
    \\ \\\\

[*] #617 (comment)

Co-authored-by: Martijn Dekker <martijn@inlv.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is not working
Projects
None yet
Development

No branches or pull requests

4 participants