Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SageMath fails to build on on Fedora 28 with gcc 8.1 #25391

Closed
sagetrac-Vaush mannequin opened this issue May 18, 2018 · 72 comments
Closed

SageMath fails to build on on Fedora 28 with gcc 8.1 #25391

sagetrac-Vaush mannequin opened this issue May 18, 2018 · 72 comments

Comments

@sagetrac-Vaush
Copy link
Mannequin

sagetrac-Vaush mannequin commented May 18, 2018

SageMath 8.3.beta1 building produced 3 different errors when run on Fedora 28 64bit with gcc 8.1, specifically when building the packages:

The 1st error manifests itself as a failure to import the crypt module during the build of Python 3.
It is due to an unspecified issue with Fedora 28 implementation of crypt(), which as reported in https://fedoraproject.org/wiki/Changes/Replace_glibc_libcrypt_with_libxcrypt is not the standard glibc implementation.

This was fixed in https://bugs.python.org/issue32635

Upstream: Fixed upstream, in a later stable release.

CC: @dimpase

Component: build

Keywords: gcc8.1 python3 fedora28 compilation build

Author: Dario Asprone

Branch/Commit: u/Vaush/sagemath_fails_to_build_on_on_fedora_28_with_gcc_8_1 @ 35c3c33

Reviewer: Dima Pasechnik

Issue created by migration from https://trac.sagemath.org/ticket/25391

@sagetrac-Vaush sagetrac-Vaush mannequin added this to the sage-8.3 milestone May 18, 2018
@sagetrac-Vaush sagetrac-Vaush mannequin added c: build labels May 18, 2018
@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented May 18, 2018

Linbox patch

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented May 18, 2018

Attachment: multiple_templates.patch.gz

Python2 patch

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented May 18, 2018

Attachment: packing.patch.gz

Attachment: crypt.patch.gz

Python3 patch - 1 of 2

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented May 18, 2018

Python3 patch - 2 of 2

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented May 18, 2018

Dependencies: #25204,#25353

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented May 18, 2018

comment:1

Attachment: zzz_conf.patch.gz

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented May 18, 2018

comment:2

I added dependencies to #25204,#25353 which respectively fix the python 2 issue and the linbox (and fflas, which is not described here) issue.
The commit to be made will be based on them, and will only modify the python 3 package

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented May 18, 2018

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented May 18, 2018

New commits:

4c1223eFixed python 3.6.1 crypt issue on Fedora 28
cb566eafixing gcc-8.1 compilation errors.
b20fd48Merge remote-tracking branch 'trac/u/cpernet/fflas_and_linbox_broken_with_gcc_8_1_0' into t/25391/sagemath_fails_to_build_on_on_fedora_28_with_gcc_8_1
d14acaeUpgrade to Python 2.7.15
473f214Merge remote-tracking branch 'trac/u/jdemeyer/upgrade_to_python_2_7_15' into t/25391/sagemath_fails_to_build_on_on_fedora_28_with_gcc_8_1

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented May 18, 2018

Commit: 473f214

@dimpase
Copy link
Member

dimpase commented May 19, 2018

comment:6

I'll be checking that this does not break on other systems.

@dimpase
Copy link
Member

dimpase commented May 19, 2018

comment:7

Have you run tests on your branch?: make ptest
(or make ptestlong for more tests)

This is something the ticket's author should do in such cases.
(It would not be surprising if something unrelated to your changes fails, as gcc 8 is pretty much untested, but good to know anyway).

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented May 19, 2018

comment:8

I haven't, but will do

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 25, 2018

Changed commit from 473f214 to 5be7adc

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 25, 2018

Branch pushed to git repo; I updated commit sha1. New commits:

5be7adcMerge branch 'master' into t/25391/sagemath_fails_to_build_on_on_fedora_28_with_gcc_8_1

@dimpase
Copy link
Member

dimpase commented Jun 20, 2018

comment:10

What is happening with this on the current 8.3.beta6?
Note that python2 has been updated.

@dimpase
Copy link
Member

dimpase commented Jun 20, 2018

comment:11

For Python3, it's apparently python/cpython#4691 which is still under review.

@jdemeyer
Copy link

comment:12

I am missing some context and documentation. The current branch only fixes Python 3. That is not consistent with the ticket description which mentions several packages and doesn't even mention the actual failure that the Python 3 patch is trying to fix.

@jdemeyer
Copy link

comment:13

The patch files should also contain some description, at a minimum a link to the upstream issue.

And if you add a patch to a package, you should bump the package version.

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented Jun 20, 2018

comment:14

Ok, I'm sorry if the description or the execution aren't up to par, this is my first ticket, so I'm getting used to it.

The ticket was originally opened to describe the bug. After that, I succeeded in manually fixing all the bugs listed in the description, but it was brought to my attention that two of my bugs were solved in separate tickets at the same time (see the dependencies field), so I merged those tickets and added my own fix for python3, thus fixing all the issues and solving the problem stated in the ticket title, that is SageMath not compiling on Fedora 28 with gcc 8.1.

Since the commits include the fixes for the other 2 bugs, even if by merging, I thought this counted as having a fix for all 3 of them in the branch, is this wrong?

There is a description for the python 3 bug, since the bug is literally "Python tries to call crypt instead of crypt_r on Fedora, and this for some reason breaks things, so let's allow it to call crypt_r", and the reason why it breaks things is, as pointed out by dimpase, apparently that fedora has a completely different crypt implementation, as can be seen here https://fedoraproject.org/wiki/Changes/Replace_glibc_libcrypt_with_libxcrypt.

I'll expand on the description a bit, anyway, to make this clearer, and also to reflect the fact that this ticket only directly fixes the python3 bug, since it's only reported in the comments.
I'll also quickly add a description and a link to the patch

@sagetrac-Vaush

This comment has been minimized.

@dimpase
Copy link
Member

dimpase commented Jun 21, 2018

comment:36

Replying to @sagetrac-Vaush:

I can't post the code because I didn't use any, I just stepped through the code with gdb. I checked the stack trace for the segfault, set a breakpoint to the line that called the function which calls crypt, and then stepped through manually to understand what was happening.

I can try and get some proof from gdb later in the day, but there's not much more I can do.

My understanding is that crypt returns a pointer to a statically allocated location. Indeed, its man page says: return value points to static data whose content is overwritten by each call.
Whereas crypt_r does something else, as such design as for crypt is not thread-safe.

Perhaps this unusual design of crypt is the reason the breakage is not bigger?

@dimpase
Copy link
Member

dimpase commented Jun 21, 2018

comment:37

In fact, the call to Py_BuildValue is

return Py_BuildValue("s", crypt(word, salt));

(where we see a warning in the logs).

Here "s" stands for string
It would be very interesting to see what values of word and salt are used.
Otherwise what length it assumes by default is anyone's guess (perhaps that's why you see some bits chopped off). Indeed, it does

Convert a null-terminated C string to a Python str object using 'utf-8' encoding. 

Note that Python 2 does not do utf-8 here, so this might be a problem.

How about you try modifying the patch as I suggest in comment 31 (removing crypt_r-related chunk)? This would call crypt, but supply a correct prototype.

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented Jun 21, 2018

comment:38

I can do that later if you want, but it's not the string that gets chopped, it's the string's address.

@dimpase
Copy link
Member

dimpase commented Jun 21, 2018

comment:39

What I don't know is what is supposed to happen on the import attempt. Does crypt get called at all? This does not seem to be necessary, as the import is about setting up a structure to call crypt from Python, not about calling it with any values.

Did you see crypt actually being called? And with what values of word and salt?
This is something that is totally unclear. If they are set to bad values, or just left uninitialised, stuff can happen...

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented Jun 21, 2018

comment:40

I remember them being set up correctly, and the call worked. Still, I am recompiling the whole of sage with debug symbols, I'll step through it again and report when I have news

@jdemeyer
Copy link

comment:41

Replying to @sagetrac-Vaush:

and then stepped through manually to understand what was happening.

Posting the actual output from gdb would have been useful here...

@jdemeyer
Copy link

comment:42

Replying to @dimpase:

Here "s" stands for string
It would be very interesting to see what values of word and salt are used.

That's what my debugging patch attachment: crypt_debug.patch does.

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented Jun 21, 2018

comment:43

Replying to @jdemeyer:

Replying to @sagetrac-Vaush:

and then stepped through manually to understand what was happening.

Posting the actual output from gdb would have been useful here...

Yes, the point is that I don't have it anymore, since I did that 2 or 3 weeks ago. As I said, I am now recompiling sage with debug symbols to redo it and post the output.

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented Jun 21, 2018

Attachment: gdb_screenshot.png

Screenshot of gdb's output

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented Jun 21, 2018

comment:44

Ok, for some reason I can't seem to step into crypt anymore, while earlier I could see the fact that crypt on my system was a wrap around crypt_r.

Anyway, I used jdemeyer's patch, and I'm attaching gdb's output.

What happens is that Python now segfaults on the printing attempt.
I don't have any way of confirming what I said earlier about the value being returned by crypt being fine, but causing segfaults when accessed outside of it.

@dimpase
Copy link
Member

dimpase commented Jun 21, 2018

comment:45

Could you post the output of ldd called on the corresponding *.so file?

Did you use a custom-built crypt library at your 1st attempt?

And, finally, could you add a print to show the value of the pointer res
before the print that causes the crash? (E.g. it could return NULL, or some
obviously invalid value...)

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented Jun 21, 2018

comment:46

I don't remember using any custom-built library, so the answer would be no.

This is the output of ldd on the built python executable and on libcrypt:

[vaush@jesu-c290 src]$ ldd /lib64/libcrypt.so.1
	linux-vdso.so.1 (0x00007ffe913d6000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f32ada5e000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f32ae046000)
[vaush@jesu-c290 src]$ ldd ./python
	linux-vdso.so.1 (0x00007ffd247ec000)
	libpython3.6dm.so.1.0 => not found
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd8b8512000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fd8b830e000)
	libutil.so.1 => /lib64/libutil.so.1 (0x00007fd8b810b000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fd8b7d77000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fd8b79b8000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fd8b8731000)

Finally, no, I can't print anything about it, unless you mean before the return value of crypt is assigned to it, since we introduced it with jdemeyer's patch.

@dimpase
Copy link
Member

dimpase commented Jun 21, 2018

comment:47

Replying to @sagetrac-Vaush:

I don't remember using any custom-built library, so the answer would be no.

This is the output of ldd on the built python executable and on libcrypt:

no, I mean the crypt..so or _crypt..so or whatever it is called; on one system I have it's here:

$ ldd local/lib/python3.6/lib-dynload/_crypt.cpython-36m-x86_64-linux-gnu.so
        linux-vdso.so.1 =>  (0x00007fffdff5e000)
        libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007da738262000)
        libpython3.6m.so.1.0 => /home/dima/sagetrac-mirror/local/lib/libpython3.6m.so.1.0 (0x00007da737d32000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007da737b15000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007da73774b000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007da737547000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007da737344000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007da73703b000)
        /lib64/ld-linux-x86-64.so.2 (0x00007da73869c000)

Finally, no, I can't print anything about it, unless you mean before the return value of crypt is assigned to it, since we introduced it with jdemeyer's patch.

I meant, certainly, that you modify Jeroen's patch so that you print the value of the
pointer res (i.e. the address) after the call to crypt(), before printing the
string it points to (the latter print crashes).

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented Jun 21, 2018

comment:48

Ok, sorry for the misunderstandings.

Here is the output of ldd:

vaush@jesu-c290:src$ ldd build/lib.linux-x86_64-3.6-pydebug/_crypt.cpython-36dm-x86_64-linux-gnu.so
	linux-vdso.so.1 (0x00007ffeca7f6000)
	libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fe30d625000)
	libpython3.6dm.so.1.0 => /run/media/vaush/SageMath/SageMath/local/var/tmp/sage/build/python3-3.6.1.p1/src/libpython3.6dm.so.1.0 (0x00007fe30d0c4000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe30cea5000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fe30cae6000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fe30c8e2000)
	libutil.so.1 => /lib64/libutil.so.1 (0x00007fe30c6df000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fe30c34b000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fe30da50000)

While now python outputs this during compilation:

[python3-3.6.1.p1] crypt('', '$6$Ukf2OXfG3jHYFFCX')
[python3-3.6.1.p1] Address = '0xc4c020'

As you can see, the address is compatible with the theory I presented, that in some way the value of the pointer gets chopped when coming out of crypt. Still no success in stepping into crypt, but I don't see why, it was so easy the first time around...

@dimpase
Copy link
Member

dimpase commented Jun 21, 2018

comment:49

Replying to @sagetrac-Vaush:

Still no success in stepping into crypt, but I don't see why, it was so easy the first time around...

I don't see how you'd step in a system library in a meaningful way - they usually are stripped, you won't see any symbols.

Have a look at https://fedoraproject.org/wiki/StackTraces#Installing_debuginfo_RPMs_using_DNF

@sagetrac-Vaush
Copy link
Mannequin Author

sagetrac-Vaush mannequin commented Jun 21, 2018

comment:50

I installed the debug infos already, to no avail.
Anyway, I know that it shouldn't be possible, but the first time around I just naively did it, and stepped into the function in some way

@jdemeyer
Copy link

comment:51

I have a theory. The implicit declaration of function 'crypt' warning causes GCC to guess the return type as int (which is 32 bits). Now in reality the return type is char * (64 bits). So that might explain the zeroed bits.

If this theory is correct, then simply adding

#include <crypt.h>

should fix the problem.

Could you try just adding that line to _cryptmodule.c?

@dimpase
Copy link
Member

dimpase commented Jun 22, 2018

comment:52

Replying to @jdemeyer:

I have a theory. The implicit declaration of function 'crypt' warning causes GCC to guess the return type as int (which is 32 bits). Now in reality the return type is char * (64 bits). So that might explain the zeroed bits.

If this theory is correct, then simply adding

#include <crypt.h>

should fix the problem.

Could you try just adding that line to _cryptmodule.c?

Or, as I have been saying already, just change the patch you are adding so that it drops crypt_r chunk, as it would be precisely the same thing...
Anyhow, this theory is easy to confirm:

/* cry.c */
#include <stdio.h>
#ifdef CRYP
#include <crypt.h>
#endif
int main()
{
char *word="blah";
char *salt="foo";
char *res;
res = crypt(word, salt);
printf("%s\n",res);
return 0;
}

Now trying this out, with the right include:

$ gcc -DCRYP cry.c -lcrypt
$ ./a.out 
fo9NXBQQtNBSA

and without include

$ gcc cry.c -lcrypt
cry.c: In function ‘main’:
cry.c:10:7: warning: implicit declaration of function ‘crypt’ [-Wimplicit-function-declaration]
 res = crypt(word, salt);
       ^~~~~
cry.c:10:5: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
 res = crypt(word, salt);
     ^
$ ./a.out 
Segmentation fault

Morale: pay attention to compiler warnings :-)

@jdemeyer
Copy link

comment:53

Replying to @dimpase:

Morale: pay attention to compiler warnings :-)

Absolutely. I was totally confused earlier because the upstream PR incorrectly states "Because crypt() only depends on primitive C types, we currently get
away with calling it without it being declared."

@jdemeyer
Copy link

comment:54

This should be fixed in 3.7.0, so we should just upgrade when that version comes out.

@jdemeyer
Copy link

Changed upstream from Reported upstream. Developers acknowledge bug. to Fixed upstream, in a later stable release.

@jdemeyer

This comment has been minimized.

@jdemeyer jdemeyer removed this from the sage-8.3 milestone Jun 22, 2018
@vbraun
Copy link
Member

vbraun commented Jul 4, 2018

comment:55

I made #25771 to update to Python 3.6.6 which contains the backported crypt fix

@sagetrac-Vaush

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants