fish converts an empty string to zero in numeric contexts #3346

Closed
krader1961 opened this Issue Sep 1, 2016 · 5 comments

Comments

Projects
None yet
4 participants
@krader1961
Contributor

krader1961 commented Sep 1, 2016

Several recent issues involving the test and printf builtins brought to light that fish is inconsistent about how it treats an empty, null, string when converting it to a number. For example, test "" -ge 0 is considered true because the empty string is converted to a zero. Doing printf '%f\n' "" prints 0.000000. Yet the similar printf '%d\n' "" on macOS Sierra (as of git commit eeb42f5) reports the error ": Invalid argument". Whereas printf '%f\n' "" prints "0.00000". On the other hand the printf '%d\n' "" on Ubuntu 16.04 (with the same version of fish) prints "0".

While I am curious why Ubuntu and macOS Sierra handle the printf '%d\n' "" case differently it's orthogonal to the core point of this issue. Which is, fish should consistently treat an empty, null, string (i.e., "") as an invalid number. The current behavior is problematic. It will also be inconsistent if the change I believe @floam is working on to make the test "" -ge 0 scenario be an error rather than return a true (zero) status is merged.

See also issue #3334.

@krader1961 krader1961 added the bug label Sep 1, 2016

@krader1961 krader1961 added this to the fish-future milestone Sep 1, 2016

@floam

This comment has been minimized.

Show comment
Hide comment
@floam

floam Sep 7, 2016

Member

While I am curious why Ubuntu and macOS Sierra handle the printf '%d\n' "" case differently it's orthogonal to the core point of this issue.

Actually, the OS X/Linux difference here and the cause of the builtin_test failure are likely quite related: checking strtol success.

Consider how we were checking for a successful conversion by wcstoll in the test builtin:

static bool parse_number(const wcstring &arg, long long *out, wcstring_list_t &errors) {
     const wchar_t *str = arg.c_str();
     wchar_t *endptr = NULL;
     *out = wcstoll(str, &endptr, 10);
     return endptr && *endptr == L'\0';
}

wcstoll returns 0 in the case of failure. Here's the manpage suggesting how to test that the string was not invalid:

... strtol() stores the address of the first invalid
character in *endptr. If there were no digits at all, however, strtol()
stores the original value of str in *endptr. (Thus, if *str is not \0
but **endptr is \0 on return, the entire string was valid.)

By only doing half of this by checking *endptr == '\0' but not str, we end up storing 0 into out and then return true for a failed conversion if there were no digits at all and the input was an empty string. The correct check is *str != L'\0' && *endptr == L'\0'.

Linux vs. OS X

In my changes I went further than using only the prescribed test from the manpage for builtin_test, and I also am checking errno because I wanted to give a more helpful error message to the user (please don't mind some aspects of the diff which are not relevant to this discussion and are in-progress):

static bool parse_number(const wcstring &arg, long long *out, wcstring_list_t &errors) {
+    using namespace test_expressions;
     const wchar_t *str = arg.c_str();
     wchar_t *endptr = NULL;
+    // from the manual:
+    // If endptr is not NULL, wcstrtoll() stores the address of the first invalid
+    // character in *endptr.  If there were no digits at all, strtol() stores
+    // the original value of str in *endptr.  (Thus, if *str is not L'\0' but
+    // **endptr is `\0' on return, the entire string was valid.)
+    
+    errno = 0;
     *out = wcstoll(str, &endptr, 10);
-    return endptr && *endptr == L'\0';
+
+    // XXX DONTCOMMITAARON use &errors instead before committing
+    if (errno == EINVAL) {
+        debug(0, L"Invalid digit: '%S'", endptr);
+        return false;
+    } else if (errno == ERANGE) {
+        debug(0, L"Out of range digit: '%S'", endptr);
+        return false;
+    }
+
+    return *str != L'\0' && *endptr == L'\0';

I recall that according to the manpage, the behavior of errno being set EINVAL where conversion fails after no valid digits might be a platform-specific feature, although I never investigated what that would mean practically:

The strtol(), strtoll(), strtoimax(), and strtoq() functions return the
result of the conversion, unless the value would underflow or overflow.
If no conversion could be performed, 0 is returned and the global vari-
able errno is set to EINVAL (the last feature is not portable across all
platforms)
. If an overflow or underflow occurs, errno is set to ERANGE
and the function return value is clamped according to the following ta-
ble.

My hunch would be that in printf we intended to always do what I did up above, and it worked on OS X, but unfortunately by relying on errno to verify the conversion (which will turn out to work differently for OS X versus Linux here for me to be correct), and then not correctly checking both str and endptr, a major divergence in behavior is possible and that's what is happening.

So I might not want to check for EINVAL like I do above in my WIP. If it's not portable, it will be better to report the "invalid digit" error outside this function when it returns false due to the improvement to the return expression. We will want to make sure not to rely on EINVAL in this context for anything but diagnostics, if it isn't portable, in fish.

Member

floam commented Sep 7, 2016

While I am curious why Ubuntu and macOS Sierra handle the printf '%d\n' "" case differently it's orthogonal to the core point of this issue.

Actually, the OS X/Linux difference here and the cause of the builtin_test failure are likely quite related: checking strtol success.

Consider how we were checking for a successful conversion by wcstoll in the test builtin:

static bool parse_number(const wcstring &arg, long long *out, wcstring_list_t &errors) {
     const wchar_t *str = arg.c_str();
     wchar_t *endptr = NULL;
     *out = wcstoll(str, &endptr, 10);
     return endptr && *endptr == L'\0';
}

wcstoll returns 0 in the case of failure. Here's the manpage suggesting how to test that the string was not invalid:

... strtol() stores the address of the first invalid
character in *endptr. If there were no digits at all, however, strtol()
stores the original value of str in *endptr. (Thus, if *str is not \0
but **endptr is \0 on return, the entire string was valid.)

By only doing half of this by checking *endptr == '\0' but not str, we end up storing 0 into out and then return true for a failed conversion if there were no digits at all and the input was an empty string. The correct check is *str != L'\0' && *endptr == L'\0'.

Linux vs. OS X

In my changes I went further than using only the prescribed test from the manpage for builtin_test, and I also am checking errno because I wanted to give a more helpful error message to the user (please don't mind some aspects of the diff which are not relevant to this discussion and are in-progress):

static bool parse_number(const wcstring &arg, long long *out, wcstring_list_t &errors) {
+    using namespace test_expressions;
     const wchar_t *str = arg.c_str();
     wchar_t *endptr = NULL;
+    // from the manual:
+    // If endptr is not NULL, wcstrtoll() stores the address of the first invalid
+    // character in *endptr.  If there were no digits at all, strtol() stores
+    // the original value of str in *endptr.  (Thus, if *str is not L'\0' but
+    // **endptr is `\0' on return, the entire string was valid.)
+    
+    errno = 0;
     *out = wcstoll(str, &endptr, 10);
-    return endptr && *endptr == L'\0';
+
+    // XXX DONTCOMMITAARON use &errors instead before committing
+    if (errno == EINVAL) {
+        debug(0, L"Invalid digit: '%S'", endptr);
+        return false;
+    } else if (errno == ERANGE) {
+        debug(0, L"Out of range digit: '%S'", endptr);
+        return false;
+    }
+
+    return *str != L'\0' && *endptr == L'\0';

I recall that according to the manpage, the behavior of errno being set EINVAL where conversion fails after no valid digits might be a platform-specific feature, although I never investigated what that would mean practically:

The strtol(), strtoll(), strtoimax(), and strtoq() functions return the
result of the conversion, unless the value would underflow or overflow.
If no conversion could be performed, 0 is returned and the global vari-
able errno is set to EINVAL (the last feature is not portable across all
platforms)
. If an overflow or underflow occurs, errno is set to ERANGE
and the function return value is clamped according to the following ta-
ble.

My hunch would be that in printf we intended to always do what I did up above, and it worked on OS X, but unfortunately by relying on errno to verify the conversion (which will turn out to work differently for OS X versus Linux here for me to be correct), and then not correctly checking both str and endptr, a major divergence in behavior is possible and that's what is happening.

So I might not want to check for EINVAL like I do above in my WIP. If it's not portable, it will be better to report the "invalid digit" error outside this function when it returns false due to the improvement to the return expression. We will want to make sure not to rely on EINVAL in this context for anything but diagnostics, if it isn't portable, in fish.

@floam

This comment has been minimized.

Show comment
Hide comment
@floam

floam Sep 7, 2016

Member

Indeed, commenting out the errno check in verify_numeric causes OS X to revert to the reported broken Linux behavior:

void builtin_printf_state_t::verify_numeric(const wchar_t *s, const wchar_t *end, int errcode) {
    /*if (errcode != 0) {
        this->fatal_error(L"%ls: %s", s, strerror(errcode));
    } else*/ if (*end) {
        if (s == end)
            this->fatal_error(_(L"%ls: expected a numeric value"), s);
        else
            this->fatal_error(_(L"%ls: value not completely converted"), s);
    }
}
> printf \%d ''
0⏎                        

It needs to check *end and *s correctly in the else if.

Member

floam commented Sep 7, 2016

Indeed, commenting out the errno check in verify_numeric causes OS X to revert to the reported broken Linux behavior:

void builtin_printf_state_t::verify_numeric(const wchar_t *s, const wchar_t *end, int errcode) {
    /*if (errcode != 0) {
        this->fatal_error(L"%ls: %s", s, strerror(errcode));
    } else*/ if (*end) {
        if (s == end)
            this->fatal_error(_(L"%ls: expected a numeric value"), s);
        else
            this->fatal_error(_(L"%ls: value not completely converted"), s);
    }
}
> printf \%d ''
0⏎                        

It needs to check *end and *s correctly in the else if.

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Sep 29, 2016

Contributor

I see that @floam added a change, commit f843eb3 and PR #3405, intended to at least partially address this issue. Is that true, @floam? Given that commit and your earlier comments on this issue would you like to take ownership of it?

Contributor

krader1961 commented Sep 29, 2016

I see that @floam added a change, commit f843eb3 and PR #3405, intended to at least partially address this issue. Is that true, @floam? Given that commit and your earlier comments on this issue would you like to take ownership of it?

@ridiculousfish

This comment has been minimized.

Show comment
Hide comment
@ridiculousfish

ridiculousfish Oct 2, 2016

Member

FYI I reverted f843eb3

Member

ridiculousfish commented Oct 2, 2016

FYI I reverted f843eb3

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Nov 22, 2016

Contributor

Fixing issue #3570 caused me to notice that there are numerous places in the fish code that set errno to zero before performing a string to int conversion then checking the value of errno. That is fubar.

Contributor

krader1961 commented Nov 22, 2016

Fixing issue #3570 caused me to notice that there are numerous places in the fish code that set errno to zero before performing a string to int conversion then checking the value of errno. That is fubar.

@krader1961 krader1961 self-assigned this Nov 24, 2016

krader1961 added a commit to krader1961/fish-shell that referenced this issue Nov 25, 2016

fix handling of odd strings by `test` builtin
The `test` builtin currently has unexpected behavior with respect to
expressions such as `'' -eq 0`. That currently evaluates to true with a
return status of zero. This change addresses that oddity while also
ensuring that other unusual strings (e.g., numbers with leading and
trailing whitespace) are handled consistently.

Fixes #3346

@krader1961 krader1961 modified the milestones: fish 2.5.0, fish-future Nov 25, 2016

krader1961 added a commit that referenced this issue Nov 28, 2016

emit error message when test is given invalid int
This augments the previous change for issue #3346 by adding an error
message when an invalid integer is seen. This change is likely to be
controversial so I'm not going to squash it into the previous change.

@faho faho added the release notes label Dec 2, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment