broken OpenBLAS on Ubuntu 20.04 #2437

Closed
conradsnicta opened this issue Jun 1, 2020 · 19 comments

@conradsnicta
Contributor

conradsnicta commented Jun 1, 2020

This is a heads up in case other people run into this problem.

It appears that OpenBLAS 0.3.8 shipped with Ubuntu 20.04 is broken. Many armadillo tests just freeze up. With mlpack tests you may also get wrong results and strange warnings. For example:
** On entry to DLASCL parameter number 4 had an illegal value
** On entry to DLASCL parameter number 5 had an illegal value

Some of the problems can be worked around by removing all default libopenblas packages and installing only the libopenblas-openmp-dev package. However, this doesn't solve all issues.

The real fix is to first remove all libopenblas packages and then manually upgrade to OpenBLAS 0.3.9 (or later versions), which seems to work correctly. Armadillo will need to be reinstalled manually in order to make use of the upgraded OpenBLAS.
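
As a rough aid for the version check, here is a minimal sketch that compares an upstream OpenBLAS version string against 0.3.9. The helper name `version_at_least` is hypothetical, and the sketch assumes GNU `sort` with `-V` support. It handles plain upstream versions only, not Debian/Ubuntu package revisions such as `0.3.8+ds-1`.

```shell
# Hypothetical helper: succeed if version $1 is >= version $2.
# Assumes GNU coreutils, whose sort supports -V (version sort).
version_at_least() {
    lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
    [ "$lowest" = "$2" ]
}

# Example: report whether a given upstream version predates 0.3.9.
if version_at_least "0.3.8" "0.3.9"; then
    echo "0.3.8 is new enough"
else
    echo "0.3.8 is older than 0.3.9"
fi
```

A distribution may also ship a patched build of an older upstream version, so this check can only flag a version as suspect, not prove it broken.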

Extract of relevant changes in the OpenBLAS 0.3.9 changelog:

  • Fixed a miscompilation of the GETRF functions with CMAKE
  • Fixed a long-standing error with undeclared register clobbers in the DSCAL microkernel for Haswell, SkylakeX and Zen exposed by gcc 9.2
  • Fixed a long-standing bug in the SSE implementation of the IAMAX functions

Not sure if the folks at Ubuntu do proper QA before making LTS releases, especially for central software like OpenBLAS.

To (partly) detect if a broken OpenBLAS is used during the build of Armadillo, I've added the build time option BUILD_SMOKE_TEST to armadillo's cmake installer. This is part of the recent Armadillo 9.880 release. Example usage:

tar xf armadillo-9.880.1.tar.xz
cd armadillo-9.880.1
./configure -DBUILD_SMOKE_TEST=ON
sudo make install
ctest

If a broken OpenBLAS is present, ctest should freeze and never finish.
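
Because a frozen ctest never returns, a deadline can turn the hang into a visible failure. The following is a minimal sketch, assuming GNU coreutils `timeout` (which exits with status 124 when the limit expires). The function name `run_with_deadline` and the 600-second limit are illustrative choices and not part of Armadillo.

```shell
# Hypothetical helper: run a command under a time limit and flag a probable hang.
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
run_with_deadline() {
    limit="$1"
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout reports an expired limit with exit status 124.
        echo "command exceeded ${limit}s: possible hang" >&2
    fi
    return "$status"
}

# Example (run from the Armadillo build directory):
#   run_with_deadline 600 ctest
```

Any other non-zero status is passed through unchanged, so ordinary test failures remain distinguishable from a timeout.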

@eddelbuettel
Contributor

Thanks for reporting it. It also bubbled a few weeks ago on the r-sig-debian list (for R users on Debian/Ubuntu, run by the R project) [1] -- after which I reported it on debian-science (list for Debian developers working on packages like R, Octave and others) [2] and the bug system. The bug thread is the most comprehensive with the clearest tracking: https://bugs.debian.org/961725

The issue, from what we now know, is

  • libopenblas-dev defaults to libopenblas-pthread-dev, which pulls in libopenblas0-pthread
  • this library immediately locks up on a mutex in BLAS code; the simplest example may be
    Rscript -e 'example(solve)'
  • the problem can be avoided by switching out libopenblas-pthread-dev for libopenblas-openmp-dev and removing libopenblas0-pthread, as it works with libopenblas0-openmp
  • the problem does not affect the Debian package from which the Ubuntu package is built
  • one compiler flag seems at the core of this
  • the Ubuntu package may hence be rebuilt

The most recent, accurate and focused discussion is at the bug tracker / mailing list. Everybody can follow up there too.
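
The package swap described in the list above could be sketched as follows. This is an illustration only: the function name `swap_openblas_to_openmp` and the `APPLY` variable are hypothetical, and by default the script only prints the commands rather than running them. The package names are the ones mentioned in this thread.

```shell
# Hypothetical helper: print (or, with APPLY=1, run) the pthread -> OpenMP swap.
swap_openblas_to_openmp() {
    if [ "${APPLY:-0}" = "1" ]; then
        run=""        # execute the commands
    else
        run="echo"    # dry run: only print them
    fi
    # Install the OpenMP variant first so libopenblas-dev stays satisfied.
    $run sudo apt install libopenblas-openmp-dev
    # Then remove the pthread variant that locks up.
    $run sudo apt remove libopenblas-pthread-dev libopenblas0-pthread
}

swap_openblas_to_openmp   # prints the two commands without changing anything
```

Running with `APPLY=1` would execute the commands; a quick `Rscript -e 'example(solve)'` afterwards is the check used in this thread.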

[1] Initial thread on r-sig-debian first wrongly attributing it to viridis: April thread https://stat.ethz.ch/pipermail/r-sig-debian/2020-April/003159.html and May thread https://stat.ethz.ch/pipermail/r-sig-debian/2020-May/003169.html. Another thread in May: https://stat.ethz.ch/pipermail/r-sig-debian/2020-May/003173.html
[2] April thread https://lists.debian.org/debian-science/2020/04/msg00081.html and May thread https://lists.debian.org/debian-science/2020/05/msg00000.html

@conradsnicta
Contributor Author

Further info on the linker option used by Ubuntu which seems to be causing this problem (with thanks to @martin-frbg):
https://software.intel.com/content/www/us/en/develop/articles/performance-tools-for-software-developers-bsymbolic-can-cause-dangerous-side-effects.html

Extracts:

  • "... Unfortunately, -Bsymbolic is a dangerous option which can often result in some nonintuitive side effects."
  • "-Bsymbolic ... [turns] off symbol preemption in the DSO to which it was applied. ... This can cause unintended and dangerous behavior."

So why is this used by Ubuntu?

@martin-frbg

martin-frbg commented Jun 4, 2020

Copypasting myself from the OpenBLAS ticket:

https://bugs.launchpad.net/ubuntu/+source/libxfont/+bug/230460 (and their bug 226156 linked from it) seems to have highlighted, back in 2008, the general problem with having -Bsymbolic-functions as the default buildpackage-LDFLAGS, and a comment in https://bugs.launchpad.net/ubuntu/+source/libxfont/+bug/226156 explains (apparently quoting a chat):

<slangasek> bryce: -Wl,-Bsymbolic-functions is a single option; its purpose is to optimize the start-time symbol resolution at the expense of some "correctness", by causing any references to symbols available within the lib itself to be bound at build time
<bryce> slangasek: ah; is it a new addition? is there risk in turning it off?
<slangasek> bryce: there's no risk at all in turning it off, AFAIK it's entirely a performance thing

Yet they still seem to be doing it by default, only disabling on a case-by-case basis (which probably goes wrong sometimes as upstream code, maintainers or build scripts change).
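
For readers who want to see or drop the flag locally: on Debian and Ubuntu, `dpkg-buildflags --get LDFLAGS` prints the default linker flags, and package maintainers can set `DEB_LDFLAGS_MAINT_STRIP` to remove individual flags. The sketch below shows the same idea on a plain flag string. The function name `strip_bsymbolic_functions` is hypothetical, and this thread does not say that this is how the Ubuntu OpenBLAS package was eventually fixed.

```shell
# Hypothetical helper: remove -Wl,-Bsymbolic-functions from a
# space-separated list of linker flags and print the rest.
strip_bsymbolic_functions() {
    kept=""
    for flag in $1; do
        case "$flag" in
            -Wl,-Bsymbolic-functions) ;;              # drop this flag
            *) kept="${kept:+$kept }$flag" ;;         # keep everything else
        esac
    done
    printf '%s\n' "$kept"
}

# Example with a flag string of the kind dpkg-buildflags may report:
strip_bsymbolic_functions "-Wl,-Bsymbolic-functions -Wl,-z,relro"
```

The filtered string could then be exported as `LDFLAGS` for a local build, for example when rebuilding OpenBLAS by hand.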

@conradsnicta
Contributor Author

<slangasek> ... at the expense of some "correctness" ...

Umm... yeah. There might be a lot of undiplomatic things to be said about these kinds of approaches. Maybe a response along the lines of Justin Trudeau would be appropriate here, including the uncomfortable 21 second pause: "We all watch in horror and consternation at what's going on at the Ubuntu camp".

@eddelbuettel
Contributor

It was twelve years ago and AFAIK Steve (who is an entirely reasonable former Midwesterner) would be willing to revisit / whitelist more packages.

Fix also forthcoming per message on the Debian bug report ticket.

@martin-frbg

martin-frbg commented Jun 4, 2020

Just to clarify, I was only quoting what is already out on the 'net, not trying to assign blame to anyone. From what I believe I have learned through this, it would probably be possible to come up with a --dynamic-list of symbols that should stay preemptable, though that could change over time. (I am a bit curious why the Ubuntu folks would insist on using this option if it is just for a tiny gain in startup speed, and also why the associated problems would hit OpenBLAS only now.)

@martin-frbg

martin-frbg commented Jun 4, 2020

BTW on the Debian ticket I now see it attributed to some change in OpenBLAS 0.3.7, which I assume could only be my OpenMathLib/OpenBLAS#2136 (which in theory should only be relevant if you are building with the new option USE_LOCKING=1 in conjunction with USE_THREAD=0).
EDIT: or is this solely about a restructuring of the Debian package rather than the upstream code?

@rcurtin
Member

rcurtin commented Jun 7, 2020

Thanks everyone for the comments here and @conradsnicta for bringing up the issue. I'm not sure when we'll see our first bug report relating to this, but certainly this is a really nice thing to know and will save hours of digging. Glad to hear fixes are in progress---perhaps we won't see any issues here at all. 🤞 :)

@conradsnicta
Contributor Author

The updated OpenBLAS package is currently in the approval queue for Ubuntu 20.04:
https://launchpad.net/ubuntu/focal/+queue?queue_state=1

@conradsnicta
Contributor Author

A proposed fix is available. However, it won't be pushed as an official update in Ubuntu 20.04 until people actually report that it fixes their problem. This can be done by replying to the Ubuntu bug report on Launchpad.

See comment 9 in https://bugs.launchpad.net/ubuntu/+source/openblas/+bug/1860601

@zoq
Member

zoq commented Jun 11, 2020

Will test the version over the weekend.

@eddelbuettel
Contributor

I could too as I am on 20.04. Someone poke me here to remind me and / or shoot me the apt-get line for that proposed repo ...

@eddelbuettel
Contributor

Nah. The earlier link was more explicit about the focal-proposed repo which I need to enable, I suppose. Thanks for the pointer to Graham's PPA; the flexibility of that system is great (I have a few PPAs under my handle too) but these tend to become stale so I don't rush to add them. Thanks though.

@conradsnicta
Contributor Author

@eddelbuettel
Contributor

Thanks @conradsnicta ! All good here

To recap I did

  1. A quick cat / sudo editor to create /etc/apt/sources.list.d/ubuntu-proposed.list (any name ending in .list in that directory will do):
# cf https://wiki.ubuntu.com/Testing/EnableProposed
# Enable Ubuntu proposed archive
deb http://archive.ubuntu.com/ubuntu/ focal-proposed restricted main multiverse universe
  2. sudo apt update and examining apt upgrade. I had an otherwise current system and it suggested 50+ packages, so I did not go that route.

  3. Replaced the workaround packages (using OpenMP instead of pthread) and tested:

sudo apt install libopenblas-dev libopenblas-openmp-dev libopenblas0 libopenblas0-openmp
Rscript -e 'example(solve)'
  4. Reinstalled the default pthread one and tested again:
sudo apt install libopenblas-pthread-dev libopenblas0-pthread
Rscript -e 'example(solve)'
  5. Commented out the proposed entry from 1. above to revert the system back to "normal" use (plus the few extra entries I have for CRAN, external software, ...). One could also delete the file.

  6. Reported success at https://bugs.launchpad.net/ubuntu/+source/openblas/+bug/1860601
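
The first step of the recap can be sketched as a small script. This is a minimal illustration: the function name `write_proposed_list` is hypothetical, and it takes the target path as an argument so that it can be tried on a scratch file before writing under /etc/apt (which needs root).

```shell
# Hypothetical helper: write the focal-proposed entry to the given file.
# Defaults to the path used in the recap above.
write_proposed_list() {
    target="${1:-/etc/apt/sources.list.d/ubuntu-proposed.list}"
    cat > "$target" <<'EOF'
# cf https://wiki.ubuntu.com/Testing/EnableProposed
# Enable Ubuntu proposed archive
deb http://archive.ubuntu.com/ubuntu/ focal-proposed restricted main multiverse universe
EOF
}

# Example: try it on a temporary file first and inspect the result.
scratch="$(mktemp)"
write_proposed_list "$scratch"
cat "$scratch"
rm -f "$scratch"
```

To avoid pulling in the 50+ unrelated proposed packages, apt also accepts a per-package release suffix, for example `sudo apt install libopenblas0-pthread/focal-proposed`; whether that resolves all dependencies cleanly here is an assumption that would need checking on the actual system.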

@conradsnicta
Contributor Author

Fix finally released for Ubuntu 20.04.

$ sudo aptitude show libopenblas0
Package: libopenblas0                    
Version: 0.3.8+ds-1ubuntu0.20.04.1

Thanks to @eddelbuettel for doing the verification.

@eddelbuettel
Contributor

I think I can confirm from my 20.04 box:

edd@rob:~$ apt list -a libopenblas0                 # new way to list
Listing... Done
libopenblas0/focal-updates,now 0.3.8+ds-1ubuntu0.20.04.1 amd64 [installed,automatic]
libopenblas0/focal 0.3.8+ds-1 amd64

edd@rob:~$ apt-cache policy libopenblas0            # older way  
libopenblas0:
  Installed: 0.3.8+ds-1ubuntu0.20.04.1
  Candidate: 0.3.8+ds-1ubuntu0.20.04.1
  Version table:
 *** 0.3.8+ds-1ubuntu0.20.04.1 500
        500 http://us.archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages
        100 /var/lib/dpkg/status
     0.3.8+ds-1 500
        500 http://us.archive.ubuntu.com/ubuntu focal/universe amd64 Packages
edd@rob:~$ 

So the candidate I am running with from focal-proposed does indeed appear to be in focal-updates. Yay!
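
The same check can be scripted. Below is a minimal sketch that pulls the installed version out of `apt-cache policy` output; the function name `installed_version` is hypothetical, and the fixed version string is the one reported in this thread.

```shell
# Hypothetical helper: read `apt-cache policy <pkg>` output on stdin
# and print the value of the "Installed:" field.
installed_version() {
    awk '$1 == "Installed:" { print $2; exit }'
}

# Example usage on a real system:
#   apt-cache policy libopenblas0 | installed_version
#
# Compare the result with the fixed version named in this thread:
#   [ "$(apt-cache policy libopenblas0 | installed_version)" = "0.3.8+ds-1ubuntu0.20.04.1" ]
```

An exact string comparison is deliberately simple; later updates will carry a different version string, so a mismatch does not by itself mean the broken build is installed.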
