Why is $1 == 0 when $1 is apparently "uninitialized" #175

Closed
raygard opened this issue Apr 3, 2023 · 7 comments

Comments

@raygard

raygard commented Apr 3, 2023

nawk '{print ($1 == 0)}' lf (where file lf contains a single linefeed) prints 0, as do mawk and gawk.

I think $1 should have the POSIX "uninitialized" value, and using typeof() in gawk says it's "unassigned". POSIX says comparing numeric to uninitialized should do a numeric compare, and uninitialized has numeric value 0, so this should print 1 (true), shouldn't it?

I'm probably missing something obvious, but what?

(I know onetrueawk is not necessarily POSIX-conformant, but I do not know where else to ask Arnold or other awk experts about this.)
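
A minimal sketch for reproducing the comparison above without creating the file lf. It assumes some awk is on PATH, and compare_empty_field is a hypothetical helper name. Per the reports in this thread, the first line prints 0 under nawk, mawk and gawk and 1 under busybox awk, so no single expected value is shown for it.

```shell
# Hypothetical helper: feed one empty line to awk and compare $1 three ways.
compare_empty_field() {
    printf '\n' | awk '{
        # Result depends on the implementation (see the thread).
        print "field_eq_zero", ($1 == 0)
        # Comparison against the empty string.
        print "field_eq_empty", ($1 == "")
        # Adding 0 forces a numeric comparison.
        print "forced_numeric", ($1 + 0 == 0)
    }'
}
compare_empty_field
```

Adding 0 to the field is the usual way to force a numeric comparison when that is what is intended.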

@plan9
Collaborator

plan9 commented Apr 3, 2023

maybe the wording in the standard is confusing you. uninitialized $expr field references result in an empty string, which is non-zero, and all the implementations are in agreement. [including mks awk, i might add]

plan9 closed this as completed Apr 3, 2023
@raygard
Author

raygard commented Apr 3, 2023

Thank you for the prompt reply, and I really am not trying to be a smart-aleck here, but the standard is confusing me because it's clearly in contradiction to the major implementations. BTW busybox awk prints 1 for the above.
I am working on my own implementation and find it confusing when the major implementations differ from the standard. Is it OK if I continue to ask here about divergence between implementations and the standard? Is there a better venue for such questions?
Incidentally, POSIX says that field references beyond $NF "shall evaluate to the uninitialized value", and also that "Field variables shall have the uninitialized value when created from $0 using FS and the variable does not contain any characters." Apparently existing implementations (except busybox, as noted) instead evaluate such references as empty strings (without creating a field, in the former case) or create the fields as empty string variables. I will do the same, and perhaps raise the issue with the POSIX folks.
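
The two cases quoted from POSIX (an empty field created by FS, and a reference beyond $NF) can be probed with a sketch like the following. It assumes an awk on PATH; probe_fields and the sample record a,,c are made up for illustration. The comparisons against 0 are the ones on which implementations and the standard's text disagree, so their output is left for the reader to observe.

```shell
# Hypothetical helper: the record "a,,c" split on commas has NF == 3,
# so $2 is an empty field and $5 lies beyond NF.
probe_fields() {
    printf 'a,,c\n' | awk -F, '{
        print "empty_eq_zero", ($2 == 0)
        print "empty_eq_str", ($2 == "")
        print "beyond_eq_zero", ($5 == 0)
        print "beyond_eq_str", ($5 == "")
    }'
}
probe_fields
```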

@plan9
Collaborator

plan9 commented Apr 4, 2023

I know what the standard says. It has a very specific definition of an uninitialized value:

"An uninitialized value shall have both a numeric value of zero and a string value of the empty string."

that is your clue. an uninitialized $field will always evaluate to the empty string [see e.g. the "Expressions in Decreasing Precedence in awk" table].

perhaps you have some other notion of "uninitialized" value in mind, one without a string representation, which would mean any field that is dereferenced but is beyond NF could not be used at all. that's an interesting problem, but not in the awk standard.
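
For contrast, the dual nature of the uninitialized value quoted above is easy to see with an ordinary variable that is never assigned. This is a small sketch assuming an awk on PATH; uninit_var and the variable name x are hypothetical.

```shell
# Hypothetical helper: x is never assigned, so it holds the uninitialized value.
uninit_var() {
    awk 'BEGIN {
        # Fields: x equals 0, x equals "", length of x, numeric value of x.
        printf "%d %d %d %d\n", (x == 0), (x == ""), length(x), x + 0
    }'
}
uninit_var   # prints: 1 1 0 0
```

An unassigned variable compares equal to both 0 and the empty string, which is the behavior the original question expected of $1.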

@arnoldrobbins
Collaborator

arnoldrobbins commented Apr 4, 2023

@raygard This is as good a place as any to raise issues.

FYI, the standard isn't always right, unfortunately. A while back I emailed in a large number of issues and was given a bug reporting account to report them, instead, but I ran out of steam. :-(

If you haven't, I suggest reading Part I of the gawk manual which covers standard awk pretty well. I also suggest going through all the "dark corners" in the manual; you can get to them via the index. HTH.

@raygard
Author

raygard commented Apr 5, 2023

@plan9 and @arnoldrobbins, thank you. I found a StackOverflow post (https://stackoverflow.com/questions/51632945/in-awk-why-does-a-nonexistent-field-like-nf1-not-equal-zero) from @benhoyt asking about this exact issue 5 years ago, when Ben was working on his own goawk implementation. This triggered a POSIX defect report (https://www.austingroupbugs.net/view.php?id=1198) because the standard disagrees with major implementations. POSIX changed the precedence table row from "$expr | Field reference | String | N/A" to "$expr | Field reference | Uninitialized or string | N/A", so that the table no longer conflicts with the text, rather than changing the text to agree with gawk/mawk/BWK awk.

Arnold, I will check out the dark corners items in the gawk manual. I am working on a minimal implementation. I will try to comply with existing practice where it conflicts with the standard.

@plan9
Collaborator

plan9 commented Apr 6, 2023

@raygard i think you are misinterpreting what's happening.

there was no change to the standard in https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html

what you see on that page is an interpretation with suggested changes proposed and approved for future consideration. this does not mean the awk standard has changed. if and when an attempt is made to update/rewrite it, which requires going through the full standards-committee process and multiple levels of approvals, these suggested interpretations and changes will be considered, and the committee may accept or reject them.

of course anyone is welcome to read this interpretation as correct and future standard, and implement it, but I will suggest that compatibility with the existing "non-compliant" implementations may prove advantageous for now.

@raygard
Author

raygard commented Apr 7, 2023

@plan9, you are right that I misunderstood "Accepted as Marked" and "Proposed => Approved", etc. as indicating that these changes were destined for the next release. On a related note, I see there is much about the process at (https://www.opengroup.org/austin/docs/austin_sd6.txt). I'll check it out more when I get a chance. Thanks.
