-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 324-1 #617
Bug 324-1 #617
Conversation
printf format string can handle sequential argv[] access via % and * as well as indexed argv[] access via %x$ *x$ (* used for width/prec) In a given format string it is possible to mix'n'match sequential/indexed access, and mix'n'match the argv[] types. '%*2$s' 1 2 Indexed access for 2$ making argv[1] an int, seq access for 's' then accessing arv[0] for s, here we mix seq/ndx access. '%*2$s %s' 1 2 Same as above, the 1st format makes argv[0] a 'string' argv[1] an 'int' then the second format make argv[1] a 'string' this is a mixed type access for arv[1]. The former algo for printf-format handling builtin (say printf) is to save the builting argv[] in a data structure called struct printf located in src/cmd/ksh93/bltins/print.c, as you noted here this is located on a .c file meaning it is private to the builtin of the ksh. Yet, all the code to scan a printf format string (not limited to printf, may be scanf too) is located in lib/libast/sfio/* meaning all sort of jazz to to setup and use the private struct printf access at sfio level via function pointers. This printf data struct has a member called nextarg[] that is initially setup to the builtin argv[]. It is called nextarg, because on sequential access the % or * format access *nextarg++ meaning each sequential access consume an arg. This is a major design flaw, because it can't support indexed access %x$ *x$ very will, for this on 1st indexed access, a new sfio level (not builtin) data structure is build called fp[], it rescan the format string to find out all the indexed access, enter them in this kind of cache, it record the type/value of the argv[x]. Now another problem occurs, the type enter is what we see on 1st index 'x' occurence, so if we got '%2$d %2$f' then for occurence 2 i.e fp[1] we enter type 'd' and fetch the value nextarg[1], since nextarg may have been moved forward by sequential access it provide bogus value, then later access to %$2f will retrieve the cahced value for fp[1] find an 'int' and ksh is unable to convert it to 'double' and all sort of crash occurs. This is a complete design flaw, may be due to initial design for sequential access only (like bash) then piled on a kludge to try an indexed access. This patch will leave all the sfio infrastruct (functions) untouched, yet will implement workaround on each flaws, the reasons not to touch the existing function is that the format scan may be used by scanf() and I don't fix scanf for now. The workaroud is based on adding more members in the struct printf data structure, basicall add argv0 the inital backup of argv[] received by the builtin, nextarg is left unchanged, the former infrastructure still maintain i.e advance, it are sequential accesse tie, yet indexed access is the argv0[x] access with lazy conversion, i.e type conversion on demand, i.e late in the printf() process. In short this fix works 'as if' the format string is pre-pocessed, replacing any non indexed access (i.e seq access) by an explicit computed index access, the index generated beeing simply the seq index i.e something similar to. +----++------+-++-----------+-- Seq | || | || ndx ---+ | v vv v vv v v '%s %*s %*.*s %*1$s' expansed to (ignore spaces) '%1$s %3$*2$s %6$*4$.*5$s %7$*1$s' ^ ^ ^ ^ ^ ^ ^ ^ | | | | | | | | +----+--+----+--+---+----+--+-- Computed Seq index. Note the index inversion due to the syntax of the format string printf '%*.*s' width prec data; is equivalent to printf '%3$*1$.*2$' width prec data; Here I keep all the infrastructure, yet working around bugs - exend() function src/cmd/ksh93/bltins/print.c do setup type/value for a given argp then bump nextarg, to work around I made a new function reload()that do the same but save/restore nextarg. This because 'may be' extend is used by other code (scanf?) - sffmtpos() duplicate the format string processing done here sfvprintf(), so we implement the same argp, argn processing so that each argv[] cached is on its right position. - sfvprintf() same as above with use nargs/xargs make sure we accees the correct argv[] position. argv[] vs fp[] type mismatch we do a late argv[] evaluation (uncached).
When I try to grab this PR's diff info, I get the following:
|
Is this the way it is suppose to work now? $ f=$'%3$s %s %s %s\n' ksh -c 'printf "$f" a b c d e f'
c d e f
$
$ f=$'%3$s %s %s %s\n' ksh617 -c 'printf "$f" a b c d e f'
c a b c
f d e f |
Quick speed check on above. $ cat test.ksh
#!/bin/ksh
integer i
time {
for ((i=500000;i--;)) do; printf "$f" a b c d e f > /dev/null; done
}
$
$ f=$'%3$s %s %s %s\n' ksh test.ksh
real 0m06.123s
user 0m03.723s
sys 0m02.356s
$ f=$'%3$s %s %s %s\n' ksh test.ksh
real 0m06.086s
user 0m03.728s
sys 0m02.316s
$ f=$'%3$s %s %s %s\n' ksh test.ksh
real 0m06.011s
user 0m03.602s
sys 0m02.367s
$
$ f=$'%3$s %s %s %s\n' ksh617 test.ksh
real 0m05.884s
user 0m03.459s
sys 0m02.382s
$ f=$'%3$s %s %s %s\n' ksh617 test.ksh
real 0m05.782s
user 0m03.378s
sys 0m02.362s
$ f=$'%3$s %s %s %s\n' ksh617 test.ksh
real 0m05.706s
user 0m03.290s
sys 0m02.374s
$
# Best run out of 3 using real user time. PR617 is approximately %5 faster for this one test.
$ echo $((6.011/5.706))
1.05345250613389415 |
I don't know, I still haven't wrapped my head around this whole bug. But I will note the zsh behaviour is identical to the patched ksh's behaviour above. |
Sorry I didn't spotted the post. Yes I have a big problem with my work env, and it looks like my editor (emacs autoindent, untabify, C templates, etc..) produce incompatible src, because it get confuse with all this space/tab jazz, and at the moment I don't have a process to reinject my changes properly. Sorry about that I try to work out a way to not pollute your sources. So far this is a candidate, so may be some points or some behavior got to be adjusted, bugs fixed.
Yes this is what the patch produce, and I think it is correct unless one say otherwise. The patch implement as per printf(3) docco auto numbering of non indexed argv[] access.
Where we clearly see that non indexed are sequentially numbered, and the fact the
On my side I got
On this Production ksh logic was more or less forward progress i.e it does something along *argv++ to access that lead to error, beside producing strange things like
not talk about infinite loops or segv. I guess in fact nobody ever relying of mix'n'match indexing vs non-indexing access because it was too buggy so I gues we will not break anything. What works with prod ksh is non mix'n'match.
Then the bumpy area
Let me know if you think something must be adjusted. Cheers, |
- Fix coding style to match surroudning code - Remove extraneous spaces/tabs - Change space indentation to tab indentation
@phidebian Thanks for the clarification. As this is a difficult area with the way printf has been used for so many years, it took many explanations for me to get on the same path as you. I believe I understand now and will start testing more later today. New/Fixed/Changed printf behavior for ksh: |
Performance using best out of 3 runs, ksh vs ksh617:
Found no errors so far. Will continue testing tomorrow. |
Regarding perf. I tried not to change the original code logic much, but instead put some band aids around faulty code, while to be honest we should rework this code path, the actual code as 3 or 4 format (don't remember exactly) string scan when at least 1 indexed access exist, while theorically this all things can be done in 1 pass. My fix try not to upset the code base more than necessary, i.e I reused most of the original code,
Perf diff occurs when mixing types, something impossible to do with a compiler (C) but doable with an interpreter,
Because I use the std method for conversion this fix a relativly safe, I didn't entered all the possible in the QA, I have a QA script generator, I should test all the possible type mix double/int double/str and the infamous %()T I didn't check that at all. may be it a segv. I give it a more thorough QA when I get back, but for the main use we should be good enough. Cheers, |
For clean indexed access: %2$s %1$s %3$s: 5.542/5.732s, 0.966852756454989532 = 4.3% slower |
Hand testing completed. I could not break this new preprocessing auto-numbering string formatting approach to resolve the mix'n'match issues. However, the %s type field output is still not producing correct results when proceeded by a %b type field. Original 2nd reproducer of #324 is still not working.
Additionally, hard coding all the parameters did not help. So now, the parameter indexing issue is resolved but the %b special processing appears to be applied to %s. See
|
Hurray, BE's back, so I cutted the 2nd round for the patch candidate. As I envisioned, I lost sigth of the initial request because the initial request of the kind The fix kept the initial logic to build the fp[] array (kinda argv[] cache for type/value), because I didn't wanted to change the original code too much, but then when a type mismatch occurs like
Here 1$ is many types, there are all late converted. And the the infamous My test regarding type mismatch forgot to look the base, now I look at it and it is fixed.
For the record, we are almost on pare with ZSH except
Other diff with ZSH
|
Alrighty. You have pointed out more wrong stuff with ZSH as compared what you have coded for ksh's new printf behavior. So far, what you have done all makes sense to me. As I do not want to spend another 6 hours or so revalidating another set of code in your adjusted bug-324-2, I went through this following procedure to find the code changes you did in order to correct the last outstanding issue with %b as PR branches are made to track the flow of changes. @hyenias procedure followed for work on an existing PR needing edits. This has the result of creating a new branch named pr_617 on my local machine so that I can hand edit the changes found. The author that created the PR would already have such a branch and may need to perform git pull to sync it up with the ksh93/master/617 branch as @McDutchie adjusted your code to the repo's standards already. I do not want to trash his work on reformatting your code nor do I want to start from scratch and test everything again.
Then, I opened your bug-324-2 branch along with the ksh93/617 branch and stared-and-compared. For the sake of my time as I am not the author that is more familiar with his own code, I skipped comments and tests just looked for code impacting changes. Here is what I found: diff --git a/src/lib/libast/sfio/sfvprintf.c b/src/lib/libast/sfio/sfvprintf.c
index 9454c936..92855b30 100644
--- a/src/lib/libast/sfio/sfvprintf.c
+++ b/src/lib/libast/sfio/sfvprintf.c
@@ -319,9 +319,9 @@ loop_fmt :
{ if(!fp &&
!(fp = (*_Sffmtposf)(f,oform,oargs,ft,0)) )
goto pop_fmt;
+ if(n > xargs)
+ xargs = n;
}
- if(n > xargs)
- xargs = n;
else
n=++nargs;
@@ -600,9 +600,8 @@ loop_fmt :
argp = ++nargs;
if(fp)
{ if(ft && ft->extf)
- { if(fmt == fp[argp].ft.fmt)
- { if(fp[argp].ft.fmt != fp[argp].fmt)
- fmt = fp[argp].ft.fmt;
+ { if(fmt == fp[argp].ft.fmt && fp[argp].ft.fmt == fp[argp].fmt)
+ {
argv = fp[argp].argv;
size = fp[argp].ft.size;
} I spot checked and it all seems to work for me. Does the above diff cover all of your changes from @McDutchie last commit 366770d on this PR? I would recommend updating your local copy of this PR #617 then making all the necessary changes to the code base then push back up to this PR. @phidebian Thank you for all the work you have done on this PR to resolve issue #324. |
I am not too familiar with the code review process you follow, I should read it... meanwhile WATCHOUT, the thread here is called bug-324-1, may be because I did a PR, while my final last patch candidate is https://github.com/phidebian/ksh/tree/bug-324-2, didn't made a PR because didn't want to pollute the process if it was not good. If you spent 6hour on the 324-1 you did the bulk of it, 324-2 is just 1 test line change from
The first (original) test catch type mismatch between the currently scanned The second test comes when the 1st succeed as in Regarding perf, I decided not to update the cache, i.e the the first cache entry setup is kept all the time, because there is no assumtion about what torture we will get next, beside all this mix'n'match is marginal, people use it normally, i.e don't really venture in grey area like Note that before working on this fix, I made another one who was fixing nothing, but just reject C illegal combo, as explained C don't allow mix'n'match in index vs non index access, but then @hyenias said ksh is not C that I understand, the fact we are re-doing the scanning for long argv[] is something C don't do and we enjoy (hum I got to torture C vsnprintf(...va_list) may be this one loops on argv[] too never tried that ). @McDutchie for this bug-324-2 I inaugurated my new code review process before committing, and then before push, before PR, etc... I should not convert random Have a good one, and thanx @hyenias I rekon this all thing is marginal, most people use printf normally and generally don't torture it like this, yet when torture happen we got to produce something with a given logic, producing predictable result, and admitting that non indexed access are auto numbered may looks odd, but at list understandable. Now the goodies, do you know the
Not very well documented, I got bitten the hardway during fix, thanx to |
@phidebian I did not know about ksh's printf format string concatenation with delimiter of $ printf '%..:s PM\n' 12 30 45
12:30:45 PM
$ printf '%..-s\n' 04 09 2023
04-09-2023 |
I wonder why this thing was invented, yet it allow things like building a PATH like variable from an array
It allow also the generation of a comma separated list of word without to worry about leading/trailing excess comma, could be usefull in docco printout, or C code generation :-)
I wonder how many undocumentd features exist like the |
Notably: Remove long, fix-specific comments from source code. These will be reused in the final git commit message instead, for documentation findable via e.g. 'git blame'. They won't make sense on their own 20 years from now without the context of the commit. sfio.h: Rename reload to reloadf for consistency with extf, eventf. tests/{shtests,printf.sh}: Instead of err_exit in comments, just grep -c '^T ' to count the number of tests in printf.sh.
@stephane-chazelas reported: > $ printf '%1$s %1$b\n' '\\\\' > \\\\ b > $ printf '%b %1$s\n' '\\\\' > \\ > > expected: > > $ printf '%1$s %1$b\n' '\\\\' > \\\\ \\ > $ printf '%b %1$s\n' '\\\\' > \\ \\\\ @hyenias commented: > After some trial-and-error and some investigation I believe I > understand the trouble you expressed. Trouble occurs when x$ aka > parameter field is used[*]. > > Example: Reorder parameters and repeat the 2nd: > > $ printf '%3$s %2$s %1$s %2$s\n' a b c > c b a b > > More troubled output: > > $ printf '%b %1$s\n' '\\\\' > \\ > $ printf '%1$b %1$s\n' '\\\\' > \\ \\ > $ printf '%s %1$s\n' '\\\\' > \\\\ > $ printf '%s %1$s\n' 'a' > a > $ printf '%1$s %1$s\n' 'a' > a a > > [*] See n$ parameter field: https://en.wikipedia.org/wiki/Printf Analysis and fix: printf format string can handle sequential argv[] access via % and * as well as indexed argv[] access via %x$ *x$ (* used for width/prec) In a given format string it is possible to mix'n'match sequential/ indexed access, and mix'n'match the argv[] types. '%*2$s' 1 2 Indexed access for 2$ making argv[1] an int, seq access for 's' then accessing arv[0] for s, here we mix seq/ndx access. '%*2$s %s' 1 2 Same as above, the 1st format makes argv[0] a 'string' argv[1] an 'int' then the second format make argv[1] a 'string' this is a mixed type access for arv[1]. The former algo for printf-format handling builtin (say printf) is to save the builtin argv[] in a data structure called struct printf located in src/cmd/ksh93/bltins/print.c. This is located in a .c file, meaning it is private to the ksh builtin. Yet, all the code to scan a printf format string (not limited to printf, may be scanf too) is located in lib/libast/sfio/ meaning all sorts of jazz to to setup and use the private struct printf access at sfio level via function pointers. This printf data struct has a member called nextarg[] that is initially setup to the builtin argv[]. It is called nextarg, because on sequential access the % or * format access *nextarg++, meaning each sequential access, consumes an arg. This is a major design flaw, because it can't support indexed access %x$ *x$ very well. For this, on 1st indexed access, a new sfio level (not builtin) data structure is built called fp[]. It rescans the format string to find out all the indexed access, enters them in this kind of cache, recording the type/value of the argv[x]. Now another problem occurs: the type entered is what we see on 1st index 'x' occurence, so if we got '%2$d %2$f' then for occurence 2 i.e. fp[1] we enter type 'd' and fetch the value nextarg[1]. Since nextarg may have been moved forward by sequential access, it provides a bogus value. Then later access to %$2f will retrieve the cached value for fp[1], find an 'int', and ksh is unable to convert it to 'double' and all sorts of crash occur. This is a complete design flaw, may be due to initial design for sequential access only (like bash) then piled on a kludge to try an indexed access. This patch will leave all the sfio infrastructure (functions) untouched, yet will implement a workaround on each flaw. The reasons not to touch the existing functions is that the format scan may be used by scanf() and I'm not fixing scanf for now. The workaroud is based on adding more members in the struct printf data structure, basically add argv0, the inital backup of argv[] received by the builtin. nextarg is left unchanged, the former infrastructure still maintained i.e. advance; they are tied to sequential access. Yet, indexed access is the argv0[x] access with lazy conversion, i.e. type conversion on demand, i.e. late in the printf() process. In short, this fix works 'as if' the format string is pre-pocessed, replacing any non-indexed access (i.e. seq access) by an explicitly computed index access, the index generated being simply the seq index i.e. something similar to: +----++------+-++-----------+-- Seq | || | || ndx ---+ | v vv v vv v v '%s %*s %*.*s %*1$s' expanded to (ignore spaces) '%1$s %3$*2$s %6$*4$.*5$s %7$*1$s' ^ ^ ^ ^ ^ ^ ^ ^ | | | | | | | | +----+--+----+--+---+----+--+-- Computed Seq index. Note the index inversion due to the syntax of the format string printf '%*.*s' width prec data; is equivalent to printf '%3$*1$.*2$' width prec data; Here I keep all the infrastructure, yet working around bugs. src/lib/libast/sfio/sfvprintf.c: - Implement argp, argn processing so that each argv[] cached is on its right position. - Use nargs/xargs make sure we accees the correct argv[] position. - argv[] vs fp[] type mismatch we do a late argv[] evaluation (uncached). src/lib/libast/sfio/sftable.c: - sffmtpos() duplicates the format string processing done here in sfvprintf(), so we implement the same argp, argn processing so that each argv[] cached is on its right position. src/cmd/ksh93/bltins/print.c: - Add reload() function. It is called when the caller wants to solve a fp[x] cache miss, i.e., a type mismatch between the cached type/value and the desired new type. It does the same as extend(), i.e. setup type/value for a given argp and then bump nextarg, but also saves/restores nextarg. This because 'maybe' extend is used by other code (scanf?). Resolves: #324
The second part of the patch from one of the PR comments[*] didn't make it into the PR, so it failed to fix one of the originally reported reproducers. This patch applies that fix. This corrects the following reproducer: $ printf '%b %1$s\n' '\\\\' \\ to: $ printf '%b %1$s\n' '\\\\' \\ \\\\ [*] #617 (comment)
The second part of the patch from one of the PR comments[*] didn't make it into the PR, so it failed to fix one of the originally reported reproducers. This patch applies that fix. This corrects the following reproducer: $ printf '%b %1$s\n' '\\\\' \\ to: $ printf '%b %1$s\n' '\\\\' \\ \\\\ [*] #617 (comment)
The second part of the patch from one of the PR comments[*] didn't make it into the PR, so it failed to fix one of the originally reported reproducers. This patch applies that fix. This corrects the following reproducer: $ printf '%b %1$s\n' '\\\\' \\ \\ to: $ printf '%b %1$s\n' '\\\\' \\ \\\\ [*] #617 (comment) Co-authored-by: Martijn Dekker <martijn@inlv.org>
The second part of the patch from one of the PR comments[*] didn't make it into the PR, so it failed to fix one of the originally reported reproducers. This patch applies that fix. This corrects the following reproducer: $ printf '%b %1$s\n' '\\\\' \\ \\ to: $ printf '%b %1$s\n' '\\\\' \\ \\\\ [*] #617 (comment) Co-authored-by: Martijn Dekker <martijn@inlv.org>
This is a patch candidate for ksh93 Issue#324