Test suite issues on different Debian release architectures #1370
Can you please upload the full contents of `tests/testsuite.dir`?
On Wed, Apr 15, 2020 at 11:44:09AM -0700, Szabolcs Horvát wrote:
> Can you please upload the full contents of `tests/testsuite.dir`?
As I tried to explain: I have no access to this data on the Debian autobuilders, but I will try to do a build in i386 emulation and upload the result (hopefully reproducing the same issues).
This is the build in i386 emulation. The same tests fail as on the real hardware.
Thanks for the report, Andreas! The bug you linked seemed to suggest that these failures started to happen only with 0.8.1. That would be strange, as these functions were not touched, at least not in any meaningful way. If it is not too much trouble, could you please try the same with 0.8.0 on the very same system (same hardware, same compiler)?
Did anything else change, other than going from igraph 0.8.0 to 0.8.1? I assume you built igraph with all external packages (external ARPACK, LAPACK, BLAS, etc.). Did those dependencies change? Since some of these failures look like they may be due to numerical issues, could the compiler (a different compiler version or different optimizations) be at least partly responsible?
This problem was already observed in issue #678. There, the discussion led us to relax the testing, because it does not seem to be possible to always guarantee consistent output (but feel free to suggest possibilities). We have not yet gotten around to correcting this. I presume this is necessary to be able to include the new `igraph` release in Debian? If so, we should treat this with a higher priority.
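A minimal sketch of what relaxing such a test can look like in C, assuming the test compares numbers directly instead of diffing printed output: two values match when they agree within an absolute plus a relative tolerance. The helper names `approx_equal` and `vectors_approx_equal` and the tolerance values are hypothetical, not part of igraph's test utilities.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test helper (not igraph API): true when
   |a - b| <= atol + rtol * |b|, i.e. a and b agree within an absolute
   tolerance atol plus a relative tolerance rtol. */
static bool approx_equal(double a, double b, double atol, double rtol) {
    return fabs(a - b) <= atol + rtol * fabs(b);
}

/* Element-wise comparison of two vectors of length n using the same
   tolerances; stops at the first mismatching element. */
static bool vectors_approx_equal(const double *x, const double *y, size_t n,
                                 double atol, double rtol) {
    for (size_t i = 0; i < n; i++) {
        if (!approx_equal(x[i], y[i], atol, rtol)) {
            return false;
        }
    }
    return true;
}
```

For example, with `atol = rtol = 1e-6`, an expected `0.316228` and a computed `0.31622776601683794` compare equal, while the same pair fails with tolerances of `1e-9`; picking the tolerance remains a per-test decision.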
On Wed, Apr 15, 2020 at 12:52:06PM -0700, Vincent Traag wrote:
> This problem was already observed in issue #678. There, the discussion led us to relax the testing, because it does not seem to be possible to always guarantee consistent output (but feel free to suggest possibilities). We have not yet gotten around to correcting this. I presume this is necessary to be able to include the new `igraph` release in Debian? If so, we should treat this with a higher priority.
I can confirm that it really affects the Debian distribution. The builds that passed before were of 0.7.1. If you do not change the tests, the new 0.8.x series will not migrate to Debian testing and will ultimately not be released with the stable distribution.
Kind regards, Andreas.
@tillea A lot of the affected code was already present in 0.7.1, so it is surprising that the tests passed before but fail now. If it is not too much trouble, could you please help figure out what exactly changed? I strongly suspect that it is not just igraph but the way you build it (dependencies, compiler, hardware). I can't speak for others, but personally I am stuck at home with limited access to different systems to test on, which makes it difficult to get to the bottom of the problem. Still, I tried it on whatever I could get access to. All tests pass on:
On Thu, Apr 16, 2020 at 02:40:47AM -0700, Szabolcs Horvát wrote:
> @tillea A lot of the affected code was already present in 0.7.1, so it is surprising that the tests passed before but fail now. If it is not too much trouble, could you please help figure out what exactly changed? I strongly suspect that it is not just igraph but the way you build it (dependencies, compiler, hardware).
> I can't speak for others, but personally I am stuck at home with limited access to different systems to test on, which makes it difficult to get to the bottom of the problem. Still, I tried it on whatever I could get access to. All tests pass on:
Speaking for myself: I have my fingers in about 1000 packages in Debian. Currently, packages that are highly relevant for COVID-19 have high priority, and I have no time to dig deeper into this issue. If the solution is simply relaxing the testing criteria a bit (which is completely in line with your suspicion that the compiler or hardware might have changed in the meantime), then please do so and we will try uploading again.
Thanks a lot for your understanding, Andreas.
My proposal: let's try to get this sorted out for i386 in some simulated environment (if we can reproduce this) and try the build again if it passes on i386. Maybe the relaxations that we build into the tests would also make them pass on other architectures.
@szhorvat When testing on Raspbian, did you build it with external LAPACK/ARPACK libraries? If so, did you use the official Raspbian packages or some alternative LAPACK/ARPACK implementation?
@tillea I don't have much experience with simulated i386 environments. Is there a quickstart guide somewhere on the Debian pages that describes how to do this? I can try VirtualBox on my Mac or QEMU on an amd64-based Linux server at the university if either of these is a feasible option.
@ntamas No, I didn't build with external packages. This issue is almost certainly not because of i386, but because of using external packages. I do see some failures on macOS when using all external packages.
It would be better to start by fixing up the tests so that they do not depend on the behaviour of the external dependencies (when those behaviours are correct). I suggest doing this first and leaving the i386 stuff for later. This would take quite a bit of work, because none of us has a full understanding of all the functionality included in igraph. For example, several of the SCG tests fail, and one would need to read up on spectral coarse graining before being able to tell whether a failure is a true problem or just an issue with the test being too specific.
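One common way for a test to depend on the external BLAS/LAPACK/ARPACK is eigenvector sign: an eigenvector is only determined up to a scalar factor, so two correct implementations may return `v` and `-v`. A minimal sketch of a hypothetical test-side fix, assuming the vector is available as a plain `double` array (not an igraph vector type), flips the sign so that the first entry with magnitude above `eps` is positive before printing. Whether this is the cause of the failing SCG tests is not established in this thread.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not igraph API): make the sign of an eigenvector
   deterministic by flipping it so that the first entry whose magnitude
   exceeds eps is positive. Entries at or below eps are skipped when
   looking for that reference entry. */
static void normalize_eigenvector_sign(double *v, size_t n, double eps) {
    for (size_t i = 0; i < n; i++) {
        if (fabs(v[i]) > eps) {
            if (v[i] < 0) {
                for (size_t j = 0; j < n; j++) {
                    /* Keep zeros as +0.0 so that flipping never
                       produces a -0.0 that would print as "-0". */
                    v[j] = (v[j] == 0.0) ? 0.0 : -v[j];
                }
            }
            return;
        }
    }
    /* All entries are numerically zero: nothing to normalize. */
}
```

Zero entries stay `+0.0` when the vector is flipped, so the normalization itself does not introduce a `-0` into the printed output.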
There might be several problems, and one of them is related to i386:
@AdrianBunk Do you know if x87 FPU instructions are supported at all on x86_64? On macOS, if I use
I do not know if this is a limitation of clang, a limitation of macOS, or a limitation of the x86_64 instruction set. If it were possible to use these instructions even in 64-bit mode, that would make it much easier for us to fix these problems.
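Whether a build evaluates floating-point intermediates in x87 extended precision, one suspected source of the i386-only differences, can be checked from C through the standard `FLT_EVAL_METHOD` macro in `<float.h>`. The sketch below only reports that value; the meanings follow the C standard, while the "typical" platform values in the comments are common compiler defaults rather than guarantees.

```c
#include <float.h>
#include <stdio.h>

/* FLT_EVAL_METHOD (C99, <float.h>) describes how intermediate
   floating-point results are evaluated:
    0  in the range and precision of the operand type
       (typical for x86_64 using SSE2),
    1  float and double operations as double,
    2  all operations as long double
       (typical for i386 builds using the x87 FPU),
   -1  indeterminable; other values are implementation-defined. */
static const char *eval_method_description(int method) {
    switch (method) {
    case 0:  return "in the range and precision of the operand type";
    case 1:  return "float and double as double";
    case 2:  return "all as long double (e.g. x87 extended precision)";
    case -1: return "indeterminable";
    default: return "implementation-defined";
    }
}

/* Print the value this translation unit was compiled with. */
static void report_eval_method(FILE *out) {
    fprintf(out, "FLT_EVAL_METHOD = %d: %s\n", (int) FLT_EVAL_METHOD,
            eval_method_description((int) FLT_EVAL_METHOD));
}
```

A value of 2 means intermediate results may carry more precision than a `double` can store, so results can depend on when the compiler spills values to memory, which is one way the same source can produce slightly different numbers on i386.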
I split off the x87 issues into #1371, also to reduce the noise in this thread.
Ignore these symbol errors; this is a Debian packaging issue. Debian has a way to track which symbols a shared library provides and when they were introduced. It can then generate versioned dependencies for packages using the library, based on the versions in which the symbols they use were introduced, instead of enforcing a dependency on the latest version of the library.
Since the i386-specific problems have moved elsewhere, the problem in question is:
Failing architectures for that:
The ppc failures alone cover big- and little-endian as well as 32- and 64-bit, so there is no obvious pattern in which architectures fail.
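@tillea Small note: I believe you do not need to pass `--with-external-f2c` to `./configure` when you already have `--with-external-blas --with-external-lapack --with-external-arpack`. f2c simply won't be used.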
On Thu, Apr 16, 2020 at 03:33:28AM -0700, Szabolcs Horvát wrote:
> @ntamas No, I didn't build with external packages. This issue is almost certainly not because of i386, but because of using external packages. I do see some failures on macOS when using *all* external packages.
Debian is definitely using all external packages that are packaged for Debian.
On Thu, Apr 16, 2020 at 03:18:10AM -0700, Tamás Nepusz wrote:
> My proposal: let's try to get this sorted out for i386 in some simulated environment (if we can reproduce this) and try the build again if it passes on i386. Maybe the relaxations that we build into the tests would also make them pass on other architectures.
I think this is a pretty sensible approach.
> @tillea I don't have much experience with simulated i386 environments. Is there a quickstart guide somewhere on the Debian pages that describes how to do this? I can try VirtualBox on my Mac or QEMU on an amd64-based Linux server at the university if either of these is a feasible option.
The Debian build process is usually done in a clean chroot in which only the packages needed for the build are installed. There is some way (which I forgot, sorry) to create a multiarch i386 chroot to build for i386 on an amd64 machine. I think you are better served with QEMU (which I don't have experience with) than with such a chroot. Let me know if I should dig for this information anyway.
Kind regards, Andreas.
On Thu, Apr 16, 2020 at 07:15:31AM -0700, Szabolcs Horvát wrote:
> @tillea Small note: I believe you do not need to pass `--with-external-f2c` to `./configure` when you already have `--with-external-blas --with-external-lapack --with-external-arpack`. f2c simply won't be used.
Fixed in Git. Thanks for the hint, Andreas.
They are downloaded and installed at the beginning of each build, as shown in the build logs at https://buildd.debian.org/status/package.php?p=igraph&suite=sid
Not sure whether it is related to any of the test failures, but some of the compile warnings in the build logs seem to point at real bugs.
Something like this or code causing
Thanks @AdrianBunk, nice catch! I'll fix this soon, although it's probably unrelated (it is triggered only when reading GML files, and none of the problematic test cases do that). I have fixed two test case failures on my computer (for the external ARPACK case) in e3f48fe and 55f88af; these will be cherry-picked back to the master branch soon.
f03d914 fixes the issue raised by @AdrianBunk; c7fa487 probably fixes test case 128 (129 in the develop branch). Test case 251 is most likely fixed as well. Test case 195 seems to be unrelated to BLAS / LAPACK / ARPACK (the Infomap algorithm uses none of these), so we need to keep investigating there.
On Thu, Apr 16, 2020 at 12:29:45PM -0700, Tamás Nepusz wrote:
> f03d914 fixes the issue raised by @AdrianBunk; c7fa487 probably fixes test case 128 (129 in the develop branch). Test case 251 is most likely fixed as well. Test case 195 seems to be unrelated to BLAS / LAPACK / ARPACK (the Infomap algorithm uses none of these), so we need to keep investigating there.
Thanks a lot for working on this! Andreas.
I believe that Infomap uses an eigenvector computation to obtain the stationary probabilities of a random walk. Presumably that also involves some of the external libraries?
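For reference, the stationary probabilities of a random walk form the leading left eigenvector of the transition matrix (eigenvalue 1), and they can be computed by plain power iteration without BLAS, LAPACK, or ARPACK. The following is a minimal sketch under that assumption, using a small dense row-stochastic matrix; the function name is hypothetical, and it is not a description of how igraph's Infomap code actually computes these probabilities, which is the open question here.

```c
#include <math.h>
#include <stddef.h>

/* Minimal sketch (not igraph code): stationary distribution of a random
   walk with a dense, row-stochastic n x n transition matrix P stored in
   row-major order, found by power iteration pi <- pi * P.
   `work` is scratch space of length n. Returns the number of iterations
   used, or -1 if the L1 change did not drop below tol in max_iter steps. */
static int stationary_distribution(const double *P, size_t n, double *pi,
                                   double *work, double tol, int max_iter) {
    for (size_t i = 0; i < n; i++) {
        pi[i] = 1.0 / (double) n;           /* start from the uniform vector */
    }
    for (int iter = 1; iter <= max_iter; iter++) {
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t i = 0; i < n; i++) {
                s += pi[i] * P[i * n + j];   /* (pi * P)_j */
            }
            work[j] = s;
        }
        double diff = 0.0;
        for (size_t j = 0; j < n; j++) {
            diff += fabs(work[j] - pi[j]);
            pi[j] = work[j];
        }
        if (diff < tol) {
            return iter;
        }
    }
    return -1;
}
```

Power iteration converges for irreducible, aperiodic chains; for periodic ones it may fail to converge, which is why the function reports failure instead of looping indefinitely. For the two-state chain `{0.9, 0.1, 0.5, 0.5}`, it approaches `(5/6, 1/6)`.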
Hmmm, you may be right. Nevertheless, I still need to reproduce the issue somehow before I can proceed further. I'll try an i386 chroot somewhere and see if the bug surfaces there.
Managed to reproduce the failures in an i386 chroot. Most of these (the ones that I actually understand) seem to be numerical inaccuracies and/or changes in labeling of clusterings that are semantically equivalent. I'll try to make them pass by modifying the test cases.
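Clusterings that differ only in how the clusters are numbered can be made comparable by relabeling each membership vector in order of first appearance before printing it. A sketch of such a hypothetical helper, assuming labels are non-negative integers smaller than the number of vertices `n` and that the caller supplies a scratch array of `n` ints:

```c
#include <stddef.h>

/* Hypothetical helper (not igraph API): relabel a membership vector so
   that cluster IDs appear in order of first occurrence (0, 1, 2, ...).
   Two clusterings that differ only by a permutation of labels map to the
   same vector. Assumes 0 <= membership[i] < n; `map` is scratch space
   for n entries. */
static void canonicalize_membership(int *membership, size_t n, int *map) {
    int next = 0;
    for (size_t i = 0; i < n; i++) {
        map[i] = -1;                       /* -1 means "label not seen yet" */
    }
    for (size_t i = 0; i < n; i++) {
        int label = membership[i];
        if (map[label] < 0) {
            map[label] = next++;           /* first time we see this label */
        }
        membership[i] = map[label];
    }
}
```

For example, `{2, 2, 0, 1, 0}` and `{1, 1, 2, 0, 2}` describe the same partition and both become `{0, 0, 1, 2, 1}`.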
On Fri, Apr 17, 2020 at 09:30:09AM -0700, Tamás Nepusz wrote:
> Managed to reproduce the failures in an i386 chroot. Most of these (the ones that I actually understand) seem to be numerical inaccuracies and/or changes in labeling of clusterings that are semantically equivalent. I'll try to make them pass by modifying the test cases.
Thanks for this effort and the good news, Andreas.
@tillea All tests pass now on i386 with the recent patches I have committed to the `develop` branch -- these will be merged back to `master` soon as well.
Is there a way to re-run all the tests on the Debian build architectures without making a new upstream release first? Say, if I send you a tarball with these 9 patches, can they be applied in the Debian build process temporarily until we have a new release? We can push this through in a few days if the patches seem to work in the Debian build environment as well.
On Mon, Apr 20, 2020 at 07:01:29AM -0700, Tamás Nepusz wrote:
> @tillea All tests pass now on i386 with the recent patches I have committed to the `develop` branch -- these will be merged back to `master` soon as well.
Great.
> Is there a way to re-run all the tests on the Debian build architectures without making a new upstream release first? Say, if I send you a tarball with these 9 patches, can they be applied in the Debian build process temporarily until we have a new release? We can push this through in a few days if the patches seem to work in the Debian build environment as well.
I could add your patches to the package as `quilt` patches and upload it. This would give us the full matrix of build failures across architectures.
Kind regards
Andreas.
That's great -- here are the patches: patches.tar.gz.
On Mon, Apr 20, 2020 at 08:39:39AM -0700, Tamás Nepusz wrote:
> That's great -- here are the patches: [patches.tar.gz](https://github.com/igraph/igraph/files/4504539/patches.tar.gz).
I'll do my best, but it might take me until tomorrow to apply this. Thanks a lot in any case, Andreas.
On Mon, Apr 20, 2020 at 08:39:39AM -0700, Tamás Nepusz wrote:
> That's great -- here are the patches: [patches.tar.gz](https://github.com/igraph/igraph/files/4504539/patches.tar.gz).
I just uploaded a new version including your patches. Feel free to watch
https://buildd.debian.org/status/package.php?p=igraph
over the next couple of hours, once the autobuilders pick up the new upload.
Thanks a lot for your cooperation, Andreas.
Hi,
Okay, this remaining issue will be easy to fix; it is simply a printing issue (the test case prints
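I'm a bit baffled by this; I have no idea how the test case managed to end up with `-0+0i`, but here's a stab in the dark: 6fb8f81. I cannot reproduce the issue in my i386 chroot, so I don't know whether it will work or not.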
On Tue, Apr 21, 2020 at 01:31:11PM -0700, Tamás Nepusz wrote:
> I'm a bit baffled by this; I have no idea how the test case managed to end up with `-0+0i`, but here's a stab in the dark: 6fb8f81. I cannot reproduce the issue in my i386 chroot, so I don't know whether it will work or not.
I can only confirm that the test fails in exactly the same way on arm64. I'll try this patch and let you know. Thanks for working on this.
On Tue, Apr 21, 2020 at 01:31:11PM -0700, Tamás Nepusz wrote:
> I'm a bit baffled by this; I have no idea how the test case managed to end up with `-0+0i`, but here's a stab in the dark: 6fb8f81. I cannot reproduce the issue in my i386 chroot, so I don't know whether it will work or not.
For me, the failure remains:
cat tests/testsuite.dir/251/testsuite.log
# -*- compilation -*-
251. scg.at:71: testing SCG of a graph, stochastic matrix (igraph_scg) : ...
./scg.at:73: ${CC} ${CFLAGS} ${abs_top_srcdir}/examples/simple/scg2.c -I${abs_top_srcdir}/include -I${abs_top_builddir}/include -L${abs_top_builddir}/src/.libs -ligraph -lm -o itest
./scg.at:73: cat ${abs_top_srcdir}/examples/'simple/scg2.out' | sed "s/@Version@/$(cat ${abs_top_srcdir}/IGRAPH_VERSION)/g" > expout
./scg.at:73:
./scg.at:73: DYLD_LIBRARY_PATH=${abs_top_builddir}/src/.libs${DYLD_LIBRARY_PATH+:$DYLD_LIBRARY_PATH} LD_LIBRARY_PATH=${abs_top_builddir}/src/.libs${LD_LIBRARY_PATH+:$LD_LIBRARY_PATH} ./itest
--- expout 2020-04-22 06:17:00.675501581 +0000
+++ /build/igraph-0.8.1+ds/tests/testsuite.dir/at-groups/251/stdout 2020-04-22 06:17:00.683501786 +0000
@@ -47,7 +47,7 @@
 0+0i
 0.316228+0i
 -0.316228+0i
-0+0i
+-0+0i
 0.365148+0i
 0.365148+0i
 0.365148+0i
@@ -109,7 +109,7 @@
 0.316228+0i 0+0i
 0.316228+0i 0.316228+0i
 0.316228+0i -0.316228+0i
-0.316228+0i 0+0i
+0.316228+0i -0+0i
 0.316228+0i 0.365148+0i
 0.316228+0i 0.365148+0i
 0.316228+0i 0.365148+0i
251. scg.at:71: 251. SCG of a graph, stochastic matrix (igraph_scg) : (scg.at:71): FAILED (scg.at:73)
If all else fails, we could simply skip this single test.
Okay, I think I'll just bite the bullet and write a custom matrix printer function for the test cases that replaces all occurrences of
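One way to keep `-0` out of printed test output is to clean the numbers before formatting them rather than post-processing the text. In this hypothetical sketch (not the patch referenced later in this thread), any real or imaginary part whose magnitude is below `eps` is replaced by `+0.0`. Because `fabs(-0.0)` is `0.0`, this also catches an exact negative zero, which `printf`'s `%g` prints as `-0`. The output format mimics expected-output lines such as `0.316228+0i`.

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical helper: map values with |x| < eps to +0.0. Since
   fabs(-0.0) == 0.0, an exact negative zero is also replaced, as long as
   eps > 0. Tiny round-off values such as -1e-17 are zapped too. */
static double zap_small(double x, double eps) {
    return fabs(x) < eps ? 0.0 : x;
}

/* Format a complex number as "re+imi" (e.g. "0.316228+0i") after zapping
   tiny parts. Returns what snprintf returns: the number of characters
   that would have been written, excluding the terminating NUL. */
static int format_complex_clean(char *buf, size_t size, double re, double im,
                                double eps) {
    return snprintf(buf, size, "%g%+gi", zap_small(re, eps),
                    zap_small(im, eps));
}
```

`eps` must be strictly positive for the negative-zero case to be caught. Note also that `-0.0 == 0.0` is true in C, so an equality test alone cannot tell the two zeros apart; `signbit()` can.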
@ntamas Maybe a macro in the test utilities would help? Something like:
Then this could be used in all functions that print
@ntamas On my machine, if
I suspect that it won't help here; what probably happens is that we end up with a small value in the real or the imaginary part of the complex number that is smaller than the printing accuracy but larger than the epsilon value that I use to force "small" values to zero. Therefore, the number would not satisfy the
Could be, but I'd expect most tiny numbers to print like
@tillea I hope this will be the last one: https://github.com/igraph/igraph/commit/ead7e0c59b8bb1745e0e50c892d986f9ccdb7f4f.patch Can you please try adding this to the patchset? It should explicitly avoid printing `-0` in the test case.
On Thu, Apr 23, 2020 at 01:30:22PM -0700, Tamás Nepusz wrote:
> @tillea I hope this will be the last one: https://github.com/igraph/igraph/commit/ead7e0c59b8bb1745e0e50c892d986f9ccdb7f4f.patch
> Can you please try adding this to the patchset? It should explicitly avoid printing `-0` in the test case.
It seems to work. I'll upload to the autobuilder network soon and will check
https://buildd.debian.org/status/package.php?p=igraph
in a couple of hours.
Thanks a lot, Andreas.
Yay, passed everywhere except the `kfreebsd` ones. Should we be worried about those? (It is due to missing build deps.) If this is okay, we will release 0.8.2 soon and then Debian can ship an unpatched 0.8.2 release to get the same result.
On Fri, Apr 24, 2020 at 07:40:40AM -0700, Tamás Nepusz wrote:
> Yay, passed everywhere except the `kfreebsd` ones. Should we be worried about those? (It is due to missing build deps.)
Yes, you cannot do anything about missing Build-Depends.
> If this is okay, we will release 0.8.2 soon and then Debian can ship an unpatched 0.8.2 release to get the same result.
That would be nice. However, if you plan any more changes, take your time: the patches do not really hurt in practice. Feel free to close this issue.
Thanks a lot for your support, Andreas.
Hi,
there is a new bug report concerning different Debian release architectures. For example, the arm64 build log mentions the following errors:
There are even more errors on the i386 architecture. I'll try to reproduce these by emulating that architecture and will attach the results in another post to this bug report.
Kind regards, Andreas.