Skip to content
This repository has been archived by the owner on Oct 14, 2023. It is now read-only.

Cannot build lean on linux armv7l (ubuntu 16) due to segfault compiling library #1679

Open
abliss opened this issue Jun 18, 2017 · 30 comments
Open

Comments

@abliss
Copy link

abliss commented Jun 18, 2017

Prerequisites

  • [X ] Put an X between the brackets on this line if you have done all of the following:
    • Checked that your issue isn't already filed.
    • Reduced the issue to a self-contained, reproducible test case.

Description

The build of lean fails due to a segfault.

Steps to Reproduce

  1. git clone https://github.com/leanprover/lean [I cloned at 8849df]; cd lean
  2. mkdir -p build/debug; cd build/debug; cmake -D CMAKE_BUILD_TYPE=Debug ../../src
  3. make

Expected behavior: [What you expect to happen]
make should succeed

Actual behavior: [What actually happens]
Make segfaults, apparently during a compile of the lean core?

70%] Built target smt2_frontend
[ 70%] Built target leanstatic
[ 71%] Built target lean
/home/abliss/proj/lean/library/init/core.lean: subtype.has_sizeofSegmentation fault
CMakeFiles/standard_lib.dir/build.make:57: recipe for target 'CMakeFiles/standard_lib' failed
make[2]: *** [CMakeFiles/standard_lib] Error 139
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/standard_lib.dir/all' failed
make[1]: *** [CMakeFiles/standard_lib.dir/all] Error 2
Makefile:160: recipe for target 'all' failed
make: *** [all] Error 2

(The exact target printed in the last line varies slightly.)

Reproduces how often: [What percentage of the time does it reproduce?]
5/5

Versions

$ bin/lean --version
Lean (version 3.1.1, commit 8849df0d3ee1, Debug)
$ uname -a
Linux localhost 3.4.0-gad29d11 #1 SMP PREEMPT Sat Jul 2 00:00:53 UTC 2016 armv7l armv7l armv7l GNU/Linux
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"

Additional Information

I'll try attaching a core dump, but it's 46M and I don't know if github will allow it. It seems to be some problem during the handling of an exception:

at /home/abliss/proj/lean/src/library/module_mgr.cpp:368
#16 0x0073d04c in main (argc=2, argv=0xbebcd074) at /home/abliss/proj/lean/src/shell/lean.cpp:636
(gdb)
#0  0x00bc8b10 in lean::scanner::next (this=0xbebcae80) at /home/abliss/proj/lean/src/frontends/lean/scanner.cpp:99
#1  0x00bc8c02 in lean::scanner::throw_exception (this=0xbebcae80, msg=0xcd3a48 "unexpected token")
    at /home/abliss/proj/lean/src/frontends/lean/scanner.cpp:119
#2  0x00bca304 in lean::scanner::read_key_cmd_id (this=0xbebcae80) at /home/abliss/proj/lean/src/frontends/lean/scanner.cpp:590
#3  0x00bca6fa in lean::scanner::scan (this=0xbebcae80, env=...) at /home/abliss/proj/lean/src/frontends/lean/scanner.cpp:659
#4  0x0095a6c6 in lean::parser::scan (this=0xbebcae10) at /home/abliss/proj/lean/src/frontends/lean/parser.cpp:228
#5  0x0096a6c8 in lean::parser::next (this=0xbebcae10) at /home/abliss/proj/lean/src/frontends/lean/parser.h:282
#6  0x00966630 in lean::parser::parse_imports (this=0xbebcae10, fingerprint=@0xbebcacd0: 2974842506, imports=std::vector of length 6, capacity 8 = {...})
    at /home/abliss/proj/lean/src/frontends/lean/parser.cpp:2401
#7  0x0096711e in lean::parser::get_imports (this=0xbebcae10, imports=std::vector of length 6, capacity 8 = {...})
    at /home/abliss/proj/lean/src/frontends/lean/parser.cpp:2477
#8  0x008b2ab8 in lean::module_mgr::get_direct_imports (this=0xbebcc694, id="/home/abliss/proj/lean/library/init/category/default.lean",
    contents="/-\nCopyright (c) 2016 Microsoft Corporation. All rights reserved.\nReleased under Apache 2.0 license as described in the file LICENSE.\nAuthors: Leonardo de Moura\n-/\nprelude\nimport init.category.applica"...) at /home/abliss/proj/lean/src/library/module_mgr.cpp:415
#9  0x008b179e in lean::module_mgr::build_lean (this=0xbebcc694, id="/home/abliss/proj/lean/library/init/category/default.lean",
    contents="/-\nCopyright (c) 2016 Microsoft Corporation. All rights reserved.\nReleased under Apache 2.0 license as described in the file LICENSE.\nAuthors: Leonardo de Moura\n-/\nprelude\nimport init.category.applica"..., mtime=1497112814, module_stack=lean::rb_tree = {...})
    at /home/abliss/proj/lean/src/library/module_mgr.cpp:227
#10 0x008b1054 in lean::module_mgr::build_module (this=0xbebcc694, id="/home/abliss/proj/lean/library/init/category/default.lean", can_use_olean=true,
    module_stack=lean::rb_tree = {...}) at /home/abliss/proj/lean/src/library/module_mgr.cpp:208
#11 0x008b1904 in lean::module_mgr::build_lean (this=0xbebcc694, id="/home/abliss/proj/lean/library/init/default.lean",
    contents="/-\nCopyright (c) 2014 Microsoft Corporation. All rights reserved.\nReleased under Apache 2.0 license as described in the file LICENSE.\nAuthors: Leonardo de Moura\n-/\nprelude\nimport init.core init.logic "..., mtime=1497112814, module_stack=lean::rb_tree = {...})
    at /home/abliss/proj/lean/src/library/module_mgr.cpp:240
#12 0x008b1054 in lean::module_mgr::build_module (this=0xbebcc694, id="/home/abliss/proj/lean/library/init/default.lean", can_use_olean=true,
    module_stack=lean::rb_tree = {...}) at /home/abliss/proj/lean/src/library/module_mgr.cpp:208
#13 0x008b1904 in lean::module_mgr::build_lean (this=0xbebcc694, id="/home/abliss/proj/lean/library/data/bitvec.lean",
    contents="/-\nCopyright (c) 2015 Joe Hendrix. All rights reserved.\nReleased under Apache 2.0 license as described in the file LICENSE.\nAuthors: Joe Hendrix, Sebastian Ullrich\n\nBasic operations on bitvectors.\n\nTh"..., mtime=1497112814, module_stack=lean::rb_tree = {...})
    at /home/abliss/proj/lean/src/library/module_mgr.cpp:240
#14 0x008b1054 in lean::module_mgr::build_module (this=0xbebcc694, id="/home/abliss/proj/lean/library/data/bitvec.lean", can_use_olean=true,
    module_stack=lean::rb_tree = {...}) at /home/abliss/proj/lean/src/library/module_mgr.cpp:208
#15 0x008b2684 in lean::module_mgr::get_module (this=0xbebcc694, id="/home/abliss/proj/lean/library/data/bitvec.lean")
    at /home/abliss/proj/lean/src/library/module_mgr.cpp:368
#16 0x0073d04c in main (argc=2, argv=0xbebcd074) at /home/abliss/proj/lean/src/shell/lean.cpp:636
@abliss
Copy link
Author

abliss commented Jun 18, 2017

Here's a zipfile containing the core dump.

core.zip

#0  0x00bc8b10 in lean::scanner::next (this=0xbebcae80) at /home/abliss/proj/lean/src/frontends/lean/scanner.cpp:99
                                             99          m_curr = m_curr_line[m_spos];                                                                                                                       [Current thread is 1 (Thread 0xb6f4d000 (LWP 19129))]                                                                                                           (gd
(gdb) print m_spos
$1 = -2147483648
(gdb) print m_curr_line
$2 = ""

At a glance, it would appear that (1) scanner.m_spos is a signed int, when it should probably be unsigned, and (2) some bug has caused it to overflow during the scanning of some file.

Here's some more context on the exception that was being thrown:

(gdb) up                                                                                                                                                        #1  0x00bc8c02 in lean::scanner::throw_exception (this=0xbebcae80, msg=0xcd3a48 "unexpected token")                                                                 at /home/abliss/proj/lean/src/frontends/lean/scanner.cpp:119                                                                                                119             next();                                                                                                                                         
(gdb) print msg                                                                                                                                                 
$3 = 0xcd3a48 "unexpected token"                                                                                                                                
(gdb) print m_stream_name                                                                                                                                       
$4 = "/home/abliss/proj/lean/library/init/category/default.lean"                                                                                                
(gdb) print pos
$5 = {first = 8, second = 73}

Stepping through the live code in gdb, the problem seems to be that the test curr() != EOF never returns true, since m_curr is (char)255, and EOF seems to be an int valued at -1.

This does seem to be borne out by the compiler warning:

[ 62%] Building CXX object frontends/lean/CMakeFiles/lean_frontend.dir/scanner.cpp.o                                                                            
/home/abliss/proj/lean/src/frontends/lean/scanner.cpp: In member function ‘void lean::scanner::fetch_line()’:                                                   
/home/abliss/proj/lean/src/frontends/lean/scanner.cpp:79:20: warning: comparison is always false due to limited range of data type [-Wtype-limits]
                       if (m_curr == EOF) m_curr = 0;

For reference, here is the location of the EOF definition:

#define EOF 99999999
 ^
In file included from /usr/arm-linux-gnueabihf/include/stdio.h:74:0,
                 from /usr/include/c++/5/cstdio:42,
                 from /usr/include/c++/5/ext/string_conversions.h:43,
                 from /usr/include/c++/5/bits/basic_string.h:5249,
                 from /usr/include/c++/5/string:52,
                 from /usr/include/c++/5/bits/locale_classes.h:40,
                 from /usr/include/c++/5/bits/ios_base.h:41,
                 from /usr/include/c++/5/ios:42,
                 from /usr/include/c++/5/ostream:38,
                 from /usr/include/c++/5/iostream:39,
                 from /home/abliss/proj/lean/src/util/debug.cpp:8:
/usr/arm-linux-gnueabihf/include/libio.h:62:0: note: this is the location of the previous definition
 # define EOF (-1)
 ^

... and here's a small example showing the difference in architecture:

#include <iostream>                                                                                                                                             
int main() {
  char c = -1;
  return (c == EOF);
}

When I compile that with g++ -std=c++11, it returns 0 on my 32-bin armhf machine, and 1 on my amd64 machine (where lean seems to compile and run fine). I tried this with g++ 4.8, 4.9, and 5.4.

Actually here's an even smaller example:

int main() { return (char)-1== -1;}

This returns 1 on amd64 and 0 on armhf when simply compiled with g++. Conversely, this returns 1 and 0 the opposite way:

int main() { return (char)255 == 255;}

So I guess char is an unsigned type on armhf and a signed type on amd64. Who knew?

@abliss
Copy link
Author

abliss commented Jun 18, 2017

Anyway, it seems like the least invasive fix for this is to add -fsigned-char to the CXX_FLAGS in CMakeLists.txt. But I would suggest also enabling at least -Werror=type-limits . And the fact that this manifested as an overflow and segfault might point to safer ways to code the scanner's exception handling.

@leodemoura
Copy link
Member

@abliss Thanks for reporting the problem.

I checked the C++ documentation, and the meaning of char is platform dependent. I didn't know, I thought it was always signed :(
http://en.cppreference.com/w/cpp/language/types
Here is the relevant part:

signed char - type for signed character representation.
unsigned char - type for unsigned character representation. Also used to inspect object representations (raw memory).
char - type for character representation which can be most efficiently processed on the target system (has the same representation and alignment as either signed char or unsigned char, but is always a distinct type). Multibyte characters strings use this type to represent code units. The character types are large enough to represent 256 different values (in order to be suitable for storing UTF-8 encoded data) (since C++14). The signedness of char depends on the compiler and the target platform: the defaults for ARM and PowerPC are typically unsigned, the defaults for x86 and x64 are typically signed.

BTW, according to the standard EOF is a macro definition of type int that expands into a negative integral constant expression (generally, -1).

@gebner @Kha @jroesch I think we should go with @abliss' suggestion (i.e., add -fsigned-char) at least in the near future. What do you think?

@leodemoura
Copy link
Member

BTW, note that according to the standard there is no guarantee that EOF fits in a signed char.

@leodemoura
Copy link
Member

@abliss I don't have an ARM machine to make experiments.
Could you please try the following experiment?

  • Replace all occurrences of EOF with std::char_traits<char>::eof() at src/frontends/lean/scanner.cpp, and check whether it fixes the build problem.

@abliss
Copy link
Author

abliss commented Jun 18, 2017

Thanks @leodemoura , I'll give that a try and report back. Meanwhile, adding -fsigned-char allowed the build to progress (even with -Werror), but it eventualy failed again like this:

es/lean.dir/server.cpp.o                                                                                                                                        [ 71%] Building CXX object shell/CMakeFiles/lean.dir/leandoc.cpp.o
[ 71%] Linking CXX executable lean
[ 71%] Built target lean
[ 71%] Built target standard_lib
Scanning dependencies of target leanpkg
/home/abliss/proj/lean/library/init/core.lean:1:0: error: corrupted binary file
/home/abliss/proj/lean/library/init/default.lean:1:0: error: corrupted binary file
/home/abliss/proj/lean/leanpkg/leanpkg/main.lean:1:0: error: corrupted binary file
/home/abliss/proj/lean/library/init/core.lean:1:0: error: corrupted binary file
/home/abliss/proj/lean/library/init/default.lean:1:0: error: corrupted binary file
/home/abliss/proj/lean/leanpkg/leanpkg/resolve.lean:1:0: error: corrupted binary file
/home/abliss/proj/lean/library/init/core.lean:1:0: error: corrupted binary file
/home/abliss/proj/lean/library/init/default.lean:1:0: error: corrupted binary file
/home/abliss/proj/lean/leanpkg/leanpkg/manifest.lean:1:0: error: corrupted binary file
/home/abliss/proj/lean/library/init/core.lean:1:0: error: corrupted binary file
/home/abliss/proj/lean/library/init/default.lean:1:0: error: corrupted binary file
/home/abliss/proj/lean/leanpkg/leanpkg/toml.lean:1:0: error: corrupted binary file
/home/abliss/proj/lean/library/init/core.lean:1:0: error: corrupted binary file
/home/abliss/proj/lean/library/init/default.lean:1:0: error: corrupted binary file
/home/abliss/proj/lean/library/data/buffer/parser.lean:1:0: error: corrupted binary file
/home/abliss/proj/lean/leanpkg/leanpkg/toml.lean:1:0: error: corrupted binary file
/home/abliss/proj/lean/leanpkg/leanpkg/toml.lean:6:0: error: invalid import: init
could not resolve import: init

So it seems that there may be another arch-specific assumption buried in there somewhere.

@leodemoura
Copy link
Member

So it seems that there may be another arch-specific assumption buried in there somewhere.

I think it is the same assumption. The whole code base is assuming char is signed.

@leodemoura
Copy link
Member

Sorry, my previous comment doesn't make sense. The -fsigned-char should make sure char is signed in the whole system.

@leodemoura
Copy link
Member

@Kha @gebner @jroesch Do you have access to an ARM machine?

@abliss
Copy link
Author

abliss commented Jun 19, 2017

Yeah, I figured that the -fsigned-char should've fixed it too, but it seems that the serializer code is unable to re-read the coe.olean file that it wrote. I'm stepping through it now trying to figure out exactly where it went wrong.

BTW, if you want to experiment locally, you might have some luck installing the Android NDK on your x86 machine. I believe it comes with a cross-compiling toolkit and a QEMU emulator. (I'm running out of an ubuntu chroot on my quad-core Nexus 7; the qemu performance will likely be much slower.)

@abliss
Copy link
Author

abliss commented Jun 19, 2017

BTW is there documentation on the .olean format somewhere? I'm not sure whether it's writing the .oleans badly or just reading them badly. The .olean files generated on ARM don't match byte-for-byte the ones from x86, but I'm not sure whether they're supposed to be platform-independent (or even reproducible).

@leodemoura
Copy link
Member

@abliss .olean is based on serialization. Here is a potential red flag:
https://github.com/leanprover/lean/blob/master/src/util/serializer.cpp#L66-L77
The C++ standard does not guarantee that EOF fits in a char.
When you compile with -fsigned-char, does the following program return 1 on your 32-bin armhf machine?

#include <iostream>                                                                                                                                             
int main() {
  char c = -1;
  return (c == EOF);
}

@gebner
Copy link
Member

gebner commented Jun 19, 2017

@leodemoura The EOF value in the scanner is completely arbitrary (it just shouldn't be 0 or a character that we actually want to read). I'll change it to 0xFF.

I have now rebooted my Raspberry Pi for the first time in 2 years and will try to fix this issue:

Last login: Fri May 29 08:34:25 2015 from 10.57.0.247

@leodemoura
Copy link
Member

@gebner Thanks @gebner!

@gebner
Copy link
Member

gebner commented Jun 19, 2017

The problem already happens when reading init/core.olean. Bisecting shows that it happens at the psigma.has_sizeof instance. Weirdly enough, switching the sigma and psigma instances makes the error go away.

I'm now compiling a debug build with the latest changes. Each build takes about 8 hours on my rpi, so I'm afraid this will take a bit of time.

@leodemoura
Copy link
Member

I'm now compiling a debug build with the latest changes. Each build takes about 8 hours on my rpi, so I'm afraid this will take a bit of time.

@gebner Thanks again for your heroic efforts!

@abliss
Copy link
Author

abliss commented Jun 19, 2017

I tried reducing the library to just init/core.lean. If I run lean --make, it seems to occasionally produce a valid core.olean. But most of the time it is corrupt.

Is there expected variation in a produced .olean file (timestamps, random numbers, threading?) or should it be deterministic based on the inputs?

(I did notice that if I run it under valgrind on ARM, it reported a handful of operations on undefined data, but this was due to a bug in valgrind: https://lists.launchpad.net/touch-packages/msg70519.html ... after upgrading valgrind from 3.11 to 3.13, it seems to be running cleanly on ARM as well so far.)

I tried this experiment to answer my own question:

abliss@localhost:~/proj/lean/lib2/init$ while date; do 
  LEAN_PATH=.. lean test.lean;
  mv core.olean core.olean.$?.`date +%s`; 
  LEAN_PATH=.. lean --make -j1; 
done;

On ARM, This produced no successes, and a string of failures in 4 different flavors:

$ sha1sum core.olean.1.* | sort | cut -c 1-40 | uniq -c
      1 1d2086ceecc2d829ae70de961863ae8ad1a7b7b6
      4 c3bf280912c8bf4bc806562da7ec3ab85081f898
      6 ce35d173064a8d7c2dcace23366403a22f2c6bf5
     24 df408af14f8460f9fa738a840ba678f6334b74eb

(without the -j 1 flag, I seem to get different output every time.)

On amd64, this produced a string of 15 identical successes:

$ sha1sum core.olean.* | sort | cut -c 1-54 | uniq -c
     15 5d3d8c79ad61cdaa8b61c2161f43263f16453f74  core.olean.0

Moving one of these successful core.olean files to the ARM machine allowed it to run correctly, so it seems that deserialization is fine, but serialization is flaky.

I wonder if there might be a thread synch issue (even with -j1), which for some reason only affects ARM (different scheduler, or maybe just slower). Other than that I don't know what could be causing the nondeterministic corruption.

@gebner
Copy link
Member

gebner commented Jun 21, 2017

Thanks for figuring out that it works with the olean file generated on amd64. I have tried to add a few checks in the code to narrow down the problem.

From a top-down view, an olean file consists of a bit of metadata (lean version, module dependencies), followed by the actual content (a byte array) plus a hash code (which we don't check for performance reasons). For some reason, when reading the olean file, already this actual content part is corrupted and the hash code is wrong.

Is there expected variation in a produced .olean file (timestamps, random numbers, threading?)

Yes, mainly due to multi-threading. AFAICT we don't use random numbers or store timestamps. The variation comes mainly from the mk_fresh_name function, which generates globally unique names.

I wonder if there might be a thread synch issue (even with -j1), which for some reason only affects ARM

You can run lean with the -j0 option, which disables multi-threading (almost) completely. We use this option for deterministic tests. However the bug is there even with -j0.

Typically these non-deterministic threading bugs arise due to timing differences: the ARM machine is simply slower and hence hits some corner cases. We've had this situation with slow debug builds in the past as well.

@abliss
Copy link
Author

abliss commented Jun 21, 2017

Thanks for those details. I can confirm that the bug still happens for me with -j 0, and it happens in exactly the same place as you: psigma.sizeof (core.olean line 462), even though my quad-core ARM Nexus 7 sounds much faster than your RPi (debug build is well under an hour). Pretty strange!

@gebner
Copy link
Member

gebner commented Jun 23, 2017

This is terribly, terribly weird. We seem to lose 3 bytes when writing the blob section of the olean file. All of these bytes have the value 0xFF (there are other 0xFF bytes that make it through, though..).

Stepping through with the debugger, when writing, the blob has size 152240--this is consistent between amd64 and arm and also the size we write into the file. However 3 out of those 152240 bytes don't make it to the file. So when reading, we are 3 bytes short....

Here is the full diff:

[gebner:~/lean/library/init] master+* ± diff <(od -w1 -vAn -tx1z core.olean) <(od -w1 -vAn -tx1z core.olean.klikia)
15,18c15,18
<  0b  >.<
<  c6  >.<
<  7b  >{<
<  3e  >><
---
>  78  >x<
>  46  >F<
>  6f  >o<
>  fe  >.<
40960d40959
<  ff  >.<
57344d57342
<  ff  >.<
113032c113030
<  13  >.<
---
>  a6  >.<
136846,136847c136844,136845
<  15  >.<
<  85  >.<
---
>  44  >D<
>  92  >.<
136861,136862c136859,136860
<  15  >.<
<  82  >.<
---
>  44  >D<
>  8f  >.<
138994c138992
<  13  >.<
---
>  10  >.<
138998,138999c138996,138997
<  0f  >.<
<  8d  >.<
---
>  47  >G<
>  80  >.<
139009c139007
<  13  >.<
---
>  10  >.<
139013,139014c139011,139012
<  0f  >.<
<  8a  >.<
---
>  47  >G<
>  7d  >}<
139264d139261
<  ff  >.<
145170c145167
<  14  >.<
---
>  10  >.<
145174,145175c145171,145172
<  15  >.<
<  16  >.<
---
>  4b  >K<
>  37  >7<
145185c145182
<  14  >.<
---
>  10  >.<
145189,145190c145186,145187
<  15  >.<
<  15  >.<
---
>  4b  >K<
>  36  >6<

@johoelzl
Copy link
Contributor

johoelzl commented Jun 23, 2017

Are these 3 bytes the only ff bytes?

Even worse: they get lost at 0xXXX000 (4096) page boundaries!

@gebner
Copy link
Member

gebner commented Jun 23, 2017

Are these 3 bytes the only ff bytes?

That was my first thought as well, but no, other 0xFF bytes survive just fine.

Even worse: they get lost at 0xXXX000 (4096) page boundaries!

Good catch! Now the issue becomes interesting! Even better, it only happens at some 4096*n page boundaries, namely n=10, n=14, and n=34 ?!?

@gebner
Copy link
Member

gebner commented Jun 23, 2017

The 4096 byte boundaries align perfectly with the 8192 byte buffer that libstdc++ uses:

write(4, "oleanfile\0\3\2\1\377xFo\376\0\0\377\0\2R\260TK\0Prop"..., 8192) = 8192
write(4, "\2\377\0\0\4\1PInfo\0\0\347\200\nATTR\0\0i\377\0\0\3\350\0\347\0\0"..., 8192) = 8192
write(4, "\205\10\2x\342\202\201\0\4\0\1\10\2y\342\202\201\0\4\0\1\10\2x\342\202\202\0\4\0P\10"..., 8192) = 8192
write(4, "\377\0\0\7\262\7\0\205\0\0\272\0\377\0\0\7\271\7\0\205\0\0\222\7\2_\0\0\0\377\0\0"..., 8192) = 8192
write(4, "\0\26\0\377\0\0\nz\0\333\7\0\377\0\0\2z\4\0\377\0\0\nw\7\0\377\0\0\2{\0"..., 8191) = 8191
write(4, "\0\0\r\1\10\0\377\0\0\2\366\0\6\0\1\6\0\377\0\0\f\362\0\26\10\0\377\0\0\2\370\0"..., 8192) = 8192
write(4, "\377\0\0\3n\0\6\6\0\377\0\0\17\336\0\2\0\2\0\275\6\0\2\6\6\3\0\367\1\1\1\0"..., 8191) = 8191
write(4, "\0\0\3\277\4\0\377\0\0\21\356\10\0\377\0\0\3\275\4\0\377\0\0\21\353\10\0\377\0\0\3\277"..., 8192) = 8192
write(4, "\377\0\0\24N\0\272\0\272\2\377\0\0\4\1PInfo\0\0\377\0\0\4M\377\0\0\1\30"..., 8192) = 8192
write(4, "\6\3\4\0\377\0\0\4\275rec\0\2\1\1\0\1\1\4\0e\0\1\7\0\377\0\0\4\307\1"..., 8192) = 8192
write(4, "\0\377\0\0\31k\0\1\0\377\0\0\31l\0\377\0\0\30\234\7\0\377\0\0\5\37\4\0>\7\0"..., 8192) = 8192
write(4, "\4\0\377\0\0\5\237mk\0\1\0\377\0\0\1\332\0\26\0\2\10\0\377\0\0\5\243\0\6\0\377"..., 8192) = 8192
write(4, "\2\0\377\0\0\2\5\0\377\0\0\2\6\0\272\7\0\377\0\0\6\21\4\0\377\0\0\36\226\6\6\0"..., 8192) = 8192
write(4, "\3\350\0\377\0\0\6\200\0class\0\0\0\377\0\0\6\200gind\0\0\0\2\1\0"..., 8192) = 8192
write(4, "C\0\0\377\0\0\0075\0\377\0\0#m\2\0\4\0\377\0\0\0079\0\0\377\0\0\0078\0\0"..., 8192) = 8192
write(4, "\1\0e\1\10\0\377\0\0\7\326\0\0\4\0\377\0\0\n\376\1\0\377\0\0\7\333\10\0\377\0\0"..., 8192) = 8192
write(4, "\0\377\0\0\10<\0\0\t\0\t\0\1\0\2VMC\0\0\377\0\0\10;\0\377\0\0'h\6"..., 8191) = 8191
write(4, "\0\0\10\261\2\1\4\0e\1\4\0\326\0P\0\26\0\1\0\2\0\377\0\0)e\7\0\377\0\0"..., 8192) = 8192

Well, except for the times where it flushes a buffer with 8191 bytes instead of 8192?!?!

@gebner
Copy link
Member

gebner commented Jun 23, 2017

And I have a reproducible minimal test:

#include <iostream>
#include <fstream>

int main() {
  constexpr auto fn = "buffer_0xff.out";
  constexpr unsigned size = 1000000;

  { // write a million times 0xFF
    std::ofstream out(fn, std::ios_base::binary);
    for (unsigned i = 0; i < size; i++) {
      out.put(0xFF);
    }
  }

  auto actual = std::ifstream(fn, std::ios_base::ate | std::ios_base::binary).tellg();
  std::cout << "The file " << fn << " has size " << actual
    << " (expected: " << size << ")" << std::endl;
}

The output depends on -fsigned-char:

$ c++ -fsigned-char buffer_0xff.cc -o buffer_0xff && ./buffer_0xff
The file buffer_0xff.out has size 999877 (expected: 1000000)
$ c++ buffer_0xff.cc -o buffer_0xff && ./buffer_0xff
The file buffer_0xff.out has size 1000000 (expected: 1000000)

@gebner
Copy link
Member

gebner commented Jun 23, 2017

Upstream knows about this breakage: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47275

By changing the signedness of char you change the ABI and everything must use the same ABI.

Indeed, all the bets are off here unless the library is both built (installed) and then used with the same signedness for char. Thus, closing as invalid.

So, this means we must not use -fsigned-char. I've removed the flag in 7342741

We'll have to fix all places where we assume that char is signed.

@abliss
Copy link
Author

abliss commented Jun 23, 2017

Amazing. I get the exact same result on my device (999877 with -fsigned-char, 1000000 without). Nice detective work.

Sorry for suggesting -fsigned-char in the first place. I had originally worried it might break the ABI and cause problems with libstdc++, but then everything seemed to go so swimmingly, I figured there must be some magical reason it would be okay. I never expected it to cause this kind of bizarre failure.

@gebner
Copy link
Member

gebner commented Jun 25, 2017

@abliss The latest git version can now compile the standard library (well, half of it, my rpi is slow). Can you confirm that it compiles on your device, and could you please run the tests to make sure everything works?

@moritayasuaki
Copy link

I think curr() function should return int, because EOF is not char but int.
#1701

@abliss
Copy link
Author

abliss commented Jun 28, 2017

Thanks for the fix @gebner . Here's what I've got on my device at commit ce5ca7:

46/1146 Test #1146: lp_tst ......................................................   Passed    0.05 sec

99% tests passed, 3 tests failed out of 1146

Total Test time (real) = 37334.13 sec

The following tests FAILED:
          2 - leanchecker (Timeout)
        359 - leanruntest_all (Timeout)
        1071 - leanittest_complete_trailing_period.lean (Failed)
Errors while running CTest
Makefile:127: recipe for target 'test' failed

(btw it failed at first because make clean didn't delete the corrupt .olean files; is that expected?)

@gebner
Copy link
Member

gebner commented Jun 28, 2017

Here's what I've got on my device at commit ce5ca7:

Thanks so much for running the tests! I'll take a look at leanittest_complete_trailing_period.lean.

(btw it failed at first because make clean didn't delete the corrupt .olean files; is that expected?)

Yes, unfortunately there is no way to add additional commands to the clean target that is generated by CMake. You can call make clean-olean to clean the olean files.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants