Test suite error on mips64el architecture #1469
Comments
Since we don't have access to this architecture (Travis does not support it), would you be able to try the latest `master` branch? The test suite has been improved since 0.8.2, so this problem may already have been fixed. We are preparing for a 0.8.3 release, so this would be the perfect time to fix any remaining problems. |
Hi Szabolcs,
thanks again for your always super fast response.
On Thu, Sep 17, 2020 at 02:24:40AM -0700, Szabolcs Horvát wrote:
Since we don't have access to this architecture (Travis does not support it), would you be able to try the latest `master` branch?
Theoretically I could but for the moment I'm swamped with work so I think this time consuming test is out of scope for the next couple of days / weeks for me.
The test suite has been improved since 0.8.2, so this problem may already have been fixed.
We are preparing for a 0.8.3 release, so this would be the perfect time to fix any remaining problems.
I think if you have good reasons to assume that 0.8.3 will fix the problem, the easiest (= least time-consuming) way on my side would be to simply try 0.8.3 and see what happens on all Debian architectures. If the issue really remains, there is probably no better way than doing the debugging (maybe with the help of some mips64el porter).
Kind regards, Andreas.
|
I just double-checked, and unfortunately I could not find any changes that could have fixed this (though I may have missed some). This is likely a numerical precision issue (i.e. more of a problem with the test than the library). If you can retrieve |
I do have one idea about how to improve this test, but it would still need to be tested ... there are no guarantees that it fixes this issue. |
On Thu, Sep 17, 2020 at 07:53:28AM -0700, Szabolcs Horvát wrote:
I do have one idea about how to improve this test, but it would still need to be tested ... there are no guarantees that it fixes this issue.
I have asked for help here:
https://lists.debian.org/debian-mips/2020/09/msg00005.html
If this does not work I really need a couple of days to investigate myself.
|
Thank you so much for the help with this Andreas! There is no rush, if we don't fix it for 0.8.3, we'll fix it for the next release. I did spend some more time yesterday thinking about what may be going wrong, and I am actually quite puzzled: it seems unlikely that a numerical precision issue could cause this failure. Still, there is something that doesn't quite look right in the test. Normalization (the |
Just noting that the issue does not seem to be fully reproducible according to this message on the Debian-MIPS mailing list: |
On Wed, Sep 23, 2020 at 05:16:42AM -0700, Szabolcs Horvát wrote:
Just noting that the issue does not seem to be fully reproducible according to this message on the Debian-MIPS mailing list:
https://lists.debian.org/debian-mips/2020/09/msg00009.html
Ahhh, lazy me intended to forward this to the GitHub issue tracker, but was
obviously not fast enough. Thanks a lot for reading there yourself.
I'm tempted to say: just release your next version and then we will see what
happens.
Kind regards, Andreas.
|
@tillea 0.8.3 is out now. When you have time, let us know how it fares on MIPS. |
Hi,
On Sat, Oct 03, 2020 at 12:44:36AM -0700, Szabolcs Horvát wrote:
@tillea 0.8.3 is out now. When you have time, let us know how it fares on MIPS.
Unfortunately we are not lucky here:
https://buildd.debian.org/status/fetch.php?pkg=igraph&arch=mips64el&ver=0.8.3%2Bds-1&stamp=1601805295&raw=0
keeps on saying
[igraph 0.8.3] testsuite: 228 failed
I guess you might really need tests/testsuite.log now, right?
Kind regards, Andreas.
|
Yes, that would be quite useful. But there is no hurry. |
@tillea Did you manage to retrieve the `testsuite.log` directory? |
On Fri, Oct 23, 2020 at 04:56:34AM -0700, Szabolcs Horvát wrote:
@tillea Did you manage to retrieve the `testsuite.log` directory?
Not yet. I'm in real life mode - hopefully next week. Thanks a lot for your patience, Andreas.
|
Just another small reminder @tillea :-) If you don't have time now, I suggest closing this issue for now. MIPS seems to be a mostly defunct architecture, the issue is only intermittently reproducible, and we don't have a MIPS computer to test on. It seems it may not be worth the effort either on our or on your part. Still, if you can retrieve the logs, I'd like to take a look, in case they reveal a hidden issue that may affect other platforms too. In the meantime, a lot of housecleaning has been done for igraph. If we're lucky, the next version (0.9) won't have this problem. If it does, the issue can be re-opened at that time. |
Hi Debian Mips team,
the package igraph fails on mips64el autobuilder[1] with
...
228: Eigenvector centrality (igraph_eigenvector_centrality): FAILED (arpack.at:59)
...
ERROR: 264 tests were run,
1 failed unexpectedly.
2 tests were skipped.
## -------------------------- ##
## testsuite.log was created. ##
## -------------------------- ##
Please send `tests/testsuite.log' and all information you think might help:
To: <igraph@igraph.org>
Subject: [igraph 0.8.3] testsuite: 228 failed
You may investigate any problem if you feel able to do so, in which
case the test suite provides a good starting point. Its output may
be found below `tests/testsuite.dir'.
I tried to track down the issue on eller.debian.org to create the said
tests/testsuite.dir since upstream needs this to investigate.
Unfortunately I was not able to reproduce the issue there but got:
...
228: Eigenvector centrality (igraph_eigenvector_centrality): ok
...
## ------------- ##
## Test results. ##
## ------------- ##
263 tests were successful.
3 tests were skipped.
make[3]: Leaving directory '/home/tille/igraph-0.8.3+ds/tests'
Any idea why the package builds on eller but not on the autobuilders?
Kind regards, Andreas.
[1] https://buildd.debian.org/status/fetch.php?pkg=igraph&arch=mips64el&ver=0.8.3%2Bds-1&stamp=1601885094&raw=0
|
tests.zip |
Thanks @tillea! That result is super-weird: some of the results are zeros when they shouldn't be. Something is definitely broken, but this is going to be very hard to debug without access to a machine where the issue is reproducible. I also wonder if the issue is in igraph, or in ARPACK (which igraph uses for this specific calculation). |
Here's the diff so others don't have to download the archive:
# -*- compilation -*-
228. arpack.at:57: testing Eigenvector centrality (igraph_eigenvector_centrality): ...
./arpack.at:59: ${CC} ${CFLAGS} ${abs_top_srcdir}/examples/simple/eigenvector_centrality.c -I${abs_top_srcdir}/include -I${abs_top_builddir}/include -L${abs_top_builddir}/src/.libs -ligraph -lm -o itest
./arpack.at:59: cat ${abs_top_srcdir}/examples/'simple/eigenvector_centrality.out' | sed "s/@VERSION@/$(cat ${abs_top_srcdir}/IGRAPH_VERSION)/g" > expout
./arpack.at:59:
./arpack.at:59: DYLD_LIBRARY_PATH=${abs_top_builddir}/src/.libs${DYLD_LIBRARY_PATH+:$DYLD_LIBRARY_PATH} LD_LIBRARY_PATH=${abs_top_builddir}/src/.libs${LD_LIBRARY_PATH+:$LD_LIBRARY_PATH} ./itest ${SED_PIPE_CRLF2LF}
--- expout 2020-11-27 18:01:46.065124982 +0000
+++ /home/tille/igraph-0.8.3+ds/tests/testsuite.dir/at-groups/228/stdout 2020-11-27 18:01:46.113125752 +0000
@@ -1,3 +1,3 @@
- 1.0000 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005
+ 1.0000 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005 0.1005
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
228. arpack.at:57: 228. Eigenvector centrality (igraph_eigenvector_centrality): (arpack.at:57): FAILED (arpack.at:59) |
On Fri, Nov 27, 2020 at 12:00:38PM -0800, Szabolcs Horvát wrote:
Thanks @tillea! That result is super-weird: some of the results are zeros when they shouldn't be. *Something* is definitely broken, but this is going to be very hard to debug without access to a machine where the issue is reproducible.
The fact that it took me such a long time is a sign that I'm too busy with lots of other stuff to provide more help than I did just now.
I also wonder if the issue is in igraph, or in ARPACK (which igraph uses for this specific calculation).
I need to admit that I have no idea about either igraph or ARPACK. I just remember that there was once an issue with ARPACK in connection with igraph.
Kind regards, Andreas.
|
Some update: I managed to boot a |
Apparently the issue is fixed in version 0.8.5. |
Maybe I was too fast: I confused ppc64el with mips64el. My mistake. Let's see. |
No related code was touched in 0.8.5. I suspect that this isn't even an igraph problem, but an ARPACK one. |
igraph version 0.8.5 built well on mips64el arch: https://buildd.debian.org/status/package.php?p=igraph&suite=sid |
This is consistent with what I've seen on my machine in a QEMU |
Actually the issue is still there. The test randomly failed. |
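I finally isolated the place where the issue emerges. It is indeed an arpack issue. In src/arpack.c, in function igraph_arpack_rssolve, the dsaupd_ loop works fine in the sense that the loop keeps giving the same output when the random seed is fixed. The issue emerges at the postamble of the loop: dseupd_ gives random output under the same conditions. I will submit a bug report to the Debian maintainer of arpack. Meanwhile the question is how to neutralize this issue. |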
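The kind of check described above (fix the seed, repeat the call, compare outputs) can be sketched generically. This is only an illustration in Python, not the code used in the thread: `is_reproducible`, `deterministic_step` and `leaky_step` are hypothetical names, and the two step functions are stand-ins that do not call ARPACK.

```python
import itertools
import random


def is_reproducible(compute, seed, runs=5):
    """Call compute(rng) several times, each with a freshly seeded RNG,
    and report whether every run produced the same output."""
    outputs = [compute(random.Random(seed)) for _ in range(runs)]
    return all(out == outputs[0] for out in outputs[1:])


def deterministic_step(rng):
    # Stand-in for a routine whose only source of randomness is the seeded RNG.
    return [rng.random() for _ in range(4)]


_hidden_state = itertools.count()  # hypothetical state that the seed does not control


def leaky_step(rng):
    # Stand-in for a routine whose output also depends on state outside the seed.
    hidden = next(_hidden_state)
    return [rng.random() + hidden for _ in range(4)]


stable = is_reproducible(deterministic_step, seed=42)
unstable = is_reproducible(leaky_step, seed=42)
```

With a fixed seed the first stand-in repeats its output and the second does not, which is the pattern reported for the dsaupd_ loop and the dseupd_ call respectively.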
On Fri, Jan 29, 2021 at 09:21:10AM -0800, Jérôme Benoit wrote:
I finally isolated the place where the issue emerges. It is indeed an arpack issue.
In src/arpack.c in function igraph_arpack_rssolve the dsaupd_ loop works fine
in the sense that the loop keeps giving the same output when the random seed is fixed.
The issue emerges at the postamble of the loop, dseupd_ gives random output in the same conditions.
I will submit a bugreport to the debian maintainer of arpack
Meanwhile the question is how to neutralize this issue.
Given that, are you still willing to implement a --without-scg option ?
Do you see any chance to provide a patch for arpack? I could forward it to the Debian arpack maintainer.
|
I can extract a C code that reproduces the issue, but not a patch as I am not familiar with the arpack library. |
A patch for arpack is the best path. I will give it a try. |
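Meanwhile, I have noticed `#if HAVE_GFORTRAN` ... `#else` ... `#endif` in src/arpack.c but I could not figure out where `HAVE_GFORTRAN` is actually set. Is it dead code? |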
On Fri, Jan 29, 2021 at 10:41:26PM -0800, Jérôme Benoit wrote:
Meanwhile, I have noticed `#if HAVE_GFORTRAN` ... `#else` ... `#endif` in src/arpack.c but I could not figure out where `HAVE_GFORTRAN` is actually set. Is it dead code?
I have no idea. What about opening a corresponding issue with arpack upstream?
|
I believe this is used by the R interface of igraph (just like all the |
Note also that |
@vtraag The error emerges after a call to dseupd_ which is an arpack function. |
I believe @vtraag was responding to Andreas about the |
My two cents: if Re the |
I am working on a test case. |
I finally have a test case. I submitted the bug to the arpack[-ng] Debian maintainer (#981646). It is a regression, as the test works with the previous package, 3.7.0. It might be a gcc-10 or an arpack 3.8.0 issue, but not a numerical issue. |
Awesome @jgmbenoit , thanks a lot, we are very grateful for your work on this! I'll keep an eye on the Debian issue. |
I forwarded the issue to upstream: #294. |
Wow, that response ... I am concerned about two things:
Here are things we can do:
Personally, I would recommend option 1. Thoughts @ntamas @jgmbenoit @tillea ? Also, how does this situation affect ARPACK-NG's inclusion in Debian, especially on MIPS? |
I am wondering, since ARPACK seems a bit unstable, and since sometimes it returns a completely wrong result (not the wrong eigenvector, but something that is not an eigenvector), would it make sense to verify the result in cases like eigenvector centrality? Computing eigenvectors is hard, but verifying them is easy. If the result is not as expected to some precision, show a warning to the user. Given the nature of numeric computation, there will always be pathological cases that it won't handle. So verification would be useful, regardless of this specific MIPS issue. But: Does ARPACK itself not verify the result? That would be strange. |
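A minimal sketch of such a verification, assuming a dense symmetric adjacency matrix and plain Python lists. None of this is igraph or ARPACK code: the function names, the tolerance and the 4-vertex star graph are illustrative assumptions. The idea is to estimate the eigenvalue with the Rayleigh quotient and then check that the residual ||Ax - λx|| is small relative to ||x||.

```python
import math


def matvec(adj, x):
    """Multiply a dense square matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in adj]


def eigen_residual(adj, x):
    """Return (lam, rel) for a candidate eigenvector x.

    lam is the Rayleigh quotient (x . Ax) / (x . x);
    rel is ||Ax - lam * x|| / ||x||, close to 0 for a true eigenvector.
    """
    ax = matvec(adj, x)
    xx = sum(xi * xi for xi in x)
    if xx == 0.0:
        raise ValueError("candidate vector is all zeros")
    lam = sum(xi * axi for xi, axi in zip(x, ax)) / xx
    res = math.sqrt(sum((axi - lam * xi) ** 2 for xi, axi in zip(x, ax)))
    return lam, res / math.sqrt(xx)


def looks_like_eigenvector(adj, x, tol=1e-8):
    """True if the relative residual is below tol, scaled by |lam|."""
    lam, rel = eigen_residual(adj, x)
    return rel <= tol * max(1.0, abs(lam))


# Hypothetical example: a star graph with one centre (vertex 0) and three leaves.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
leaf = 1.0 / math.sqrt(3)
good = [1.0, leaf, leaf, leaf]   # leading eigenvector, eigenvalue sqrt(3)
bad = [1.0, leaf, leaf, 0.0]     # same vector with one entry zeroed out

good_ok = looks_like_eigenvector(star, good)
bad_ok = looks_like_eigenvector(star, bad)
```

The check costs one matrix-vector product, so it is cheap compared with the eigensolver itself. A vector with spuriously zeroed entries, like `bad` here, fails it clearly.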
I am in favour of 1, namely, to remove igraph from the mips64el Debian distribution, at least temporarily (the issue may disappear just as it appeared). For scientific software, this is reasonable. For administrative software, it would be necessary to be able to build it (build + test + no serious issue) on all the architectures supported by Debian. A big umbrella scientific package like SageMath does not fulfil this criterion (see its current build status), but at least it runs on amd64 and arm64 computers. Solution 2 is not reasonable. Solution 3 will eat precious time for nothing. |
Thanks @jgmbenoit, sounds good! Hopefully this can be re-evaluated when a new ARPACK version (or Fortran compiler?) comes out. |
I have never been really convinced that we are using ARPACK correctly and not missing something obvious. After all, Octave uses ARPACK and does not seem to double-check ARPACK's results either. (I've spent lots of time reading the C code of Octave when I was trying to track down ARPACK-related errors in igraph back in the old days). Adding a check does not hurt (performance-wise) and it is simple enough to do, but I'd rather leave it for after 0.9; we still have enough stuff on our plate for that release. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions. |
On Mon, Apr 05, 2021 at 01:07:29PM -0700, stale[bot] wrote:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Just pinging that issue, Andreas.
|
This problem seems to have appeared again in a different form, mentioned here by @jgmbenoit : This time, I extracted the values from the build log. For the "large unweighted graph", the results are at the end of this comment. Dividing both vectors by the first element (to have comparable normalization) shows that the only difference is where ARPACK returned zeros. The other vector elements are in fact the same. PRPACK result:
ARPACK results are:
|
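The comparison described in the previous comment (divide each vector by its first element, then look for entries that still differ) can be sketched as follows. The two vectors are made-up placeholders, not the values from the build log, and the function names are hypothetical.

```python
def normalize_by_first(v):
    """Scale a vector so that its first element becomes 1."""
    first = v[0]
    if first == 0:
        raise ValueError("first element is zero; pick another reference entry")
    return [x / first for x in v]


def differing_indices(a, b, tol=1e-6):
    """Indices where a and b still differ after both are divided by their first element."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    na, nb = normalize_by_first(a), normalize_by_first(b)
    return [i for i, (x, y) in enumerate(zip(na, nb)) if abs(x - y) > tol]


# Placeholder data: same direction, different normalization, two entries zeroed.
prpack_like = [0.2, 0.1, 0.1, 0.1, 0.1]
arpack_like = [1.0, 0.5, 0.0, 0.0, 0.5]

diff = differing_indices(prpack_like, arpack_like)
zeros = [i for i, x in enumerate(arpack_like) if x == 0.0]
```

If `diff` and `zeros` contain the same indices, the two results agree everywhere except where one solver returned zeros.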
Hi,
the Debian packaged version of igraph 0.8.2 has a build issue on the mips64el architecture (and only on this architecture). You can find a full build log here. I recommend searching for the string
in this longish build log. From my previous experience with igraph issues I know you usually are asking for the content of
tests/testsuite.dir
. If you need it in this case as well, I could try to find a mips64el machine and fetch that information from the build, but I have no way to access one "easily" at the moment.
Kind regards, Andreas.