Incorrect result for (== 1) . length . filter (== ',') depending on -O level #197
Comments
Ah, you were faster than me ^^. |
Removing this RULE from
{-# RULES
"TEXT ==N/length -> compareLength/==EQ" [~1] forall t n.
    eqInt (length t) n = compareLength t n == EQ
  #-}
EDIT: posted the wrong RULE. |
If you are hit by this, I'm currently using the following workaround: strictly evaluate |
So, I've narrowed this down a little more by looking at Core. Please tell me if I'm barking up the wrong tree here. I'm currently working with this code:
wat :: Bool
wat = length (filter (== ',') "0,00") == 1
This is the code that gets emitted with optimizations turned on:
wat
wat =
  case unpackCString# "0,00"#
  of _ { Text dt_a6KV dt1_a6KW dt2_a6KX ->
  case tagToEnum# (<# dt2_a6KX 1#) of _ {
    False ->
      let {
        len_s6W8
        len_s6W8 = uncheckedIShiftRA# dt2_a6KX 1# } in
      case tagToEnum# (># len_s6W8 1#) of _ {
        False ->
          let {
            $j_a6LS
            $j_a6LS =
              \ _ ->
                let {
                  end_a6KU
                  end_a6KU = +# dt1_a6KW dt2_a6KX } in
                letrec {
                  $wloop_cmp_s6Zq
                  $wloop_cmp_s6Zq =
                    \ ww_s6Zk ww1_s6Zo ->
                      case tagToEnum# (>=# ww1_s6Zo end_a6KU) of _ {
                        False ->
                          case indexWord16Array# dt_a6KV ww1_s6Zo of r#_a6L7 { __DEFAULT ->
                          case tagToEnum# (geWord# r#_a6L7 55296##) of _ {
                            False ->
                              case chr# (word2Int# r#_a6L7) of _ {
                                __DEFAULT -> $wloop_cmp_s6Zq ww_s6Zk (+# ww1_s6Zo 1#);
                                ','# ->
                                  case tagToEnum# (># ww_s6Zk 1#) of _ {
                                    False -> $wloop_cmp_s6Zq (+# ww_s6Zk 1#) (+# ww1_s6Zo 1#);
                                    True -> GT
                                  }
                              };
                            True ->
                              case tagToEnum# (leWord# r#_a6L7 56319##) of _ {
                                False ->
                                  case chr# (word2Int# r#_a6L7) of _ {
                                    __DEFAULT -> $wloop_cmp_s6Zq ww_s6Zk (+# ww1_s6Zo 1#);
                                    ','# ->
                                      case tagToEnum# (># ww_s6Zk 1#) of _ {
                                        False -> $wloop_cmp_s6Zq (+# ww_s6Zk 1#) (+# ww1_s6Zo 1#);
                                        True -> GT
                                      }
                                  };
                                True ->
                                  case indexWord16Array# dt_a6KV (+# ww1_s6Zo 1#)
                                  of r#1_a6Li { __DEFAULT ->
                                  case chr#
                                         (+#
                                            (+#
                                               (uncheckedIShiftL#
                                                  (-# (word2Int# r#_a6L7) 55296#) 10#)
                                               (-# (word2Int# r#1_a6Li) 56320#))
                                            65536#)
                                  of _ {
                                    __DEFAULT -> $wloop_cmp_s6Zq ww_s6Zk (+# ww1_s6Zo 2#);
                                    ','# ->
                                      case tagToEnum# (># ww_s6Zk 1#) of _ {
                                        False -> $wloop_cmp_s6Zq (+# ww_s6Zk 1#) (+# ww1_s6Zo 2#);
                                        True -> GT
                                      }
                                  }
                                  }
                              }
                          }
                          };
                        True ->
                          case tagToEnum# (<# ww_s6Zk 1#) of _ {
                            False ->
                              case ww_s6Zk of _ {
                                __DEFAULT -> GT;
                                1# -> EQ
                              };
                            True -> LT
                          }
                      }; } in
                $wloop_cmp_s6Zq 0# dt1_a6KW } in
          case len_s6W8 of _ {
            __DEFAULT ->
              case $j_a6LS void# of _ {
                __DEFAULT -> False;
                EQ -> True
              };
            1# ->
              case dt2_a6KX of _ {
                __DEFAULT ->
                  case $j_a6LS void# of _ {
                    __DEFAULT -> False;
                    EQ -> True
                  };
                1# -> True
              }
          };
        True -> False
      };
    True -> False
  }
  }
What I found interesting was the length check at the beginning:
  of _ { Text dt_a6KV dt1_a6KW dt2_a6KX ->
  case tagToEnum# (<# dt2_a6KX 1#) of _ {
    False ->
      let {
        len_s6W8
        len_s6W8 = uncheckedIShiftRA# dt2_a6KX 1# } in
      case tagToEnum# (># len_s6W8 1#) of _ {
I've written a little piece of code to emulate this check:
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Text
import Data.Text.Internal
import GHC.Exts
import GHC.Int
main :: IO ()
main = do
  print (showText ("0,00" :: Text))
  print (tagToEnum# ((>#) (uncheckedIShiftRA64# 4# 1#) 1#) :: Bool)
Turns out it yields |
This looks a lot like it. I've tried Looks like this piece of code forces me into the false len_s6W8 = uncheckedIShiftRA# dt2_a6KX 1# Where |
I took a look at the following function and added some tracing:
compareLengthI :: Integral a => Stream Char -> a -> Ordering
compareLengthI (Stream next s0 len) n =
    case compareSize (trace ("len: " ++ show len) len) (fromIntegral n) of
      Just o  -> trace ("n: " ++ show (fromIntegral n)) (trace ("o: " ++ show o) o)
      Nothing -> trace ("n: " ++ show (fromIntegral n)) (loop_cmp 0 s0)
    where
      loop_cmp !z s = case next s of
                        Done       -> compare z n
                        Skip    s' -> loop_cmp z s'
                        Yield _ s' | z > n     -> GT
                                   | otherwise -> loop_cmp (z + 1) s'
The output I get is rather interesting
This looks like it's comparing the length of "0,00" (which has a length of
{-# RULES
"TEXT ==N/length -> compareLength/==EQ" [~1] forall t n.
    eqInt (length t) n = compareLength t n == EQ
  #-} |
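For readers unfamiliar with the RULE quoted above: it replaces an equality test on `length` with a call to `compareLength`, which may stop early instead of counting every character. The following is only a minimal sketch of the two forms side by side; the helper names `viaLength` and `viaCompareLength` are made up for illustration, while `Data.Text.length`, `Data.Text.compareLength` and `Data.Text.filter` are the library's public functions.

```haskell
module Main (main) where

import qualified Data.Text as T

-- What the programmer writes: count all characters, then compare.
viaLength :: T.Text -> Int -> Bool
viaLength t n = T.length t == n

-- What the RULE turns it into: compare against n directly.
-- (Hypothetical helper name; the RULE itself works on the expression level.)
viaCompareLength :: T.Text -> Int -> Bool
viaCompareLength t n = T.compareLength t n == EQ

main :: IO ()
main = do
  let commas = T.filter (== ',') (T.pack "0,00")
  print (viaLength commas 1)
  print (viaCompareLength commas 1)
```

Both forms are meant to agree for every input; the report in this thread is about a case where, under optimization, they did not.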
Rewriting
wat :: Bool
wat = x `seq` length x == 1
  where
    x = filter (== ',') "1,00"
Which yields the following correct output:
The optimized version compares the length of the unfiltered string to 1 if the filter expression has not been forced beforehand. |
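A self-contained version of the workaround described above might look like the sketch below. It assumes the qualified import `T` and the names `watDirect` and `watForced`, which are mine and not from the original report. Whether `seq` reliably prevents the problematic rewrite on every affected release is an assumption based on the observation in this comment, not something the sketch proves.

```haskell
module Main (main) where

import qualified Data.Text as T

-- Direct formulation from the report: count the commas and compare to 1.
watDirect :: Bool
watDirect = T.length (T.filter (== ',') (T.pack "0,00")) == 1

-- Workaround from this thread: force the filtered Text to weak head
-- normal form first, then take its length.
watForced :: Bool
watForced = x `seq` T.length x == 1
  where
    x = T.filter (== ',') (T.pack "0,00")

main :: IO ()
main = do
  print watDirect
  print watForced
```

On a text release that contains the fix, both lines should print True; the interesting comparison is building this with -O0 and -O1 against an affected release.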
Looking at the core2core output: if my assumptions are correct, this is the last version of the code that is correct; after that it seems to be checking the length of the original string.
wat
wat =
  case compareLengthI
         $fIntegralInt
         (filter
            (\ ds_d6kf ->
               case ds_d6kf of _ { C# x_a6u0 ->
               case x_a6u0 of _ {
                 __DEFAULT -> False;
                 ','# -> True
               }
               })
            (stream (unpackCString# "0,00"#)))
         (I# 1#)
  of _ {
    __DEFAULT -> False;
    EQ -> True
  }
This happens quite early in the process. I've disabled the "TEXT literal" RULE (which also fires quite early, hence my suspicion that it's somehow involved) and that seems to fix the behavior as well, hopefully without breaking stream fusion as my other attempts might. |
Fix coming thanks to @nomeata. |
@nomeata Can I read up somewhere on how you debugged this? :) |
Well, after seeing in the Core that streams are involved, my first hypothesis was that In the code I saw that streams cache information about their length (upper bound and lower bounds, if present). And that made me think: I wonder if these caches are always up-to-date. And that hit it on the spot: The There are quite a few more functions where the |
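To make the stale-cache hypothesis concrete, here is a deliberately simplified model using only base. It is not text's actual code: the `Size` representation, `streamList`, `filterWith`, `filterStale` and `filterSound` are hypothetical names for illustration. The only thing it tries to show is that a filter which passes the cached size hint through unchanged lets a length comparison answer from the hint of the unfiltered input.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE ExistentialQuantification #-}
module Main (main) where

-- Hypothetical, simplified size hint: a lower bound and an optional upper bound.
data Size = Size Int (Maybe Int)

data Step s a = Done | Skip s | Yield a s

-- A stream carries its stepper, its initial state and a cached size hint.
data Stream a = forall s. Stream (s -> Step s a) s Size

-- Streaming a list: here the hint is exact.
streamList :: [a] -> Stream a
streamList xs = Stream next xs (Size n (Just n))
  where
    n = length xs
    next []       = Done
    next (y : ys) = Yield y ys

-- Filtering, parameterised over what happens to the cached hint.
filterWith :: (Size -> Size) -> (a -> Bool) -> Stream a -> Stream a
filterWith fixHint p (Stream next s0 hint) = Stream next' s0 (fixHint hint)
  where
    next' s = case next s of
      Done    -> Done
      Skip s' -> Skip s'
      Yield x s'
        | p x       -> Yield x s'
        | otherwise -> Skip s'

filterStale, filterSound :: (a -> Bool) -> Stream a -> Stream a
filterStale = filterWith id                           -- hint left untouched
filterSound = filterWith (\(Size _ hi) -> Size 0 hi)  -- lower bound reset to 0

-- Answer from the hint alone when it is decisive.
compareSize :: Size -> Int -> Maybe Ordering
compareSize (Size lo hi) n
  | lo > n                        = Just GT
  | Just h <- hi, h < n           = Just LT
  | Just h <- hi, lo == h, h == n = Just EQ
  | otherwise                     = Nothing

-- Same shape as the compareLengthI quoted earlier in this thread.
compareLength :: Stream a -> Int -> Ordering
compareLength (Stream next s0 hint) n =
  case compareSize hint n of
    Just o  -> o
    Nothing -> loop 0 s0
  where
    loop !z s = case next s of
      Done    -> compare z n
      Skip s' -> loop z s'
      Yield _ s'
        | z > n     -> GT
        | otherwise -> loop (z + 1) s'

main :: IO ()
main = do
  let commaCountIsOne f = compareLength (f (== ',') (streamList "0,00")) 1 == EQ
  print (commaCountIsOne filterStale)  -- False: answered from the stale hint
  print (commaCountIsOne filterSound)  -- True: the hint is inconclusive, so it counts
```

With the stale hint the comparison never looks at the elements at all, which matches the observation that the optimized program compares the length of the unfiltered string to 1.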
Although the bug might well be in
Although I suspect that this comment reflects the state before the (Judging from the code I cannot exploit this bug to override arbitrary memory…) |
@bgamari, you started fixing this bug, didn’t you? Are you going to submit a patch? (Don’t forget the other functions that don’t update the `len` field accordingly.) |
Joachim Breitner <notifications@github.com> writes:
> @bgamari, you started fixing this bug, didn’t you? Are you going to submit a patch? (Don’t forget the other functions that don’t update the `len` field accordingly.)
Indeed I did; it's been on my list of things to finish up but other things have been getting in the way. I'll be hiking tomorrow, but I'll try to get to it on Monday.
|
Ok, no hurry from my side, I just wanted to make sure it is on someone’s list. |
I'm beginning to think that it is the If we really want the |
@bgamari here's something I should have done right away; bisecting through hackage releases: The result appears to be that the sample program above reports the correct result with likely caused the regression. However, the commit EDIT: so it was rather a66cbb7 (motivated by #76 & #75).... and so this regression was introduced by @ekmett it seems |
Oh dear, sorry, I cited the wrong commit; I meant a66cbb7. I have updated the original comment as well. |
My vague recollection of the original intent was to get a conservative LT answer based on a bound of how bad the expansion could be. We should be able to salvage something similar. |
Alright, fair enough; that should be doable |
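One possible reading of the "conservative LT" idea, sketched with base only: if an operation can grow each element into at most k elements, the upper bound can be scaled by k, and LT is answered only when even that worst case stays below n. All names here (`UpperBound`, `expandBound`, `conservativeLT`) are hypothetical and not taken from text; overflow of the multiplication is ignored for brevity.

```haskell
module Main (main) where

-- Hypothetical hint: an upper bound on the element count, if one is known.
newtype UpperBound = UpperBound (Maybe Int)

-- Assumed worst case: every element expands into at most k elements.
expandBound :: Int -> UpperBound -> UpperBound
expandBound k (UpperBound hi) = UpperBound (fmap (* k) hi)

-- Answer LT only when the worst case is still below n; otherwise make no
-- claim, so the caller has to traverse the stream.
conservativeLT :: UpperBound -> Int -> Maybe Ordering
conservativeLT (UpperBound (Just hi)) n
  | hi < n = Just LT
conservativeLT _ _ = Nothing

main :: IO ()
main = do
  let hint = expandBound 3 (UpperBound (Just 2))  -- at most 6 elements
  print (conservativeLT hint 10)  -- Just LT
  print (conservativeLT hint 5)   -- Nothing
```

The point of the design is that the shortcut may only fire when it cannot be wrong; in every other case it falls back to counting.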
There were size hint issues throughout the fusion implementation. See haskell#197.
This fixes a variety of size hint bugs in text's fusion framework. These issues fell broadly into two classes,
* Code point/code unit confusion
* Inappropriate bounds
It seems most of the latter were introduced when the Size type was extended to track both upper and lower bounds in f4fc30c. These could manifest in a variety of issues similar to haskell#197.
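The code point/code unit distinction can be illustrated with base alone. The Core dump in this thread reads the array with indexWord16Array# and decodes surrogate pairs, so the text version under discussion stores UTF-16 code units; the helpers below (`utf16Units`, `codeUnits`, `charBounds`) are hypothetical names for a sketch of how a code-unit count bounds the number of characters.

```haskell
module Main (main) where

import Data.Bits (shiftR)
import Data.Char (ord)

-- Number of UTF-16 code units needed for one code point.
utf16Units :: Char -> Int
utf16Units c
  | ord c >= 0x10000 = 2  -- encoded as a surrogate pair
  | otherwise        = 1

-- Code-unit length of a string, as a UTF-16 array would store it.
codeUnits :: String -> Int
codeUnits = sum . map utf16Units

-- Bounds on the number of code points given only the code-unit count:
-- at least half (everything could be surrogate pairs), at most all of them.
charBounds :: Int -> (Int, Int)
charBounds units = (units `shiftR` 1, units)

main :: IO ()
main = do
  print (codeUnits "0,00", length "0,00")          -- (4,4)
  print (charBounds (codeUnits "0,00"))            -- (2,4)
  print (codeUnits "a\x1D11E", length "a\x1D11E")  -- (3,2)
```

The lower bound of 2 for "0,00" is the same value the emitted Core computes with uncheckedIShiftRA# and then compares against 1, which is only sound as long as the hint still describes the stream being measured.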
Fix usage of size hints which resulted in serious bugs such as operations like `(== 1) . length . filter (== ',')` (see #197) giving wrong results.
This was fixed by #200 |
(cherry picked from commit 758c116)
This fixes a variety of size hint bugs in text's fusion framework. These issues fell broadly into two classes,
* Code point/code unit confusion
* Inappropriate bounds
It seems most of the latter were introduced when the Size type was extended to track both upper and lower bounds in f4fc30c. These could manifest in a variety of issues similar to haskell#197. (cherry picked from commit cfb8278)
@raichoo reports a nasty bug related to text (it's still unclear what exactly is going on here, although I suspect text's RULE framework to be the culprit -- I hope so at least, as the alternative would mean that several GHC releases out there may generate wrong code):
when compiling this with -O0 or running in GHCi, the proper True output is generated. However, when compiling with -O1 the wrong False output is generated.