-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PowerPC without altivec causes crash (e.g. G3) #88
Comments
I think 6c0b761 should fix it. Please verify. |
That's pretty awful. Why not just this:
No point doing a text scapper for something the kernel has a binary interface for already. |
getauxval() is a relatively recent addition to glibc. |
2.16 is not that recent. That's 4 years ago. |
I do like the apple versus linux/android handling, and the option to force altivec though. At the very least checking the environment variables first and returning if set would save people the overhead of running the awful /proc/cpuinfo search if they specified what they want already. |
The emphasis was on the word relatively and 4 years is actually not that long ago. For example, the https://wiki.debian.org/DebianWheezy distribution seems to have eglibc 2.13 and is supported until 31st May 2018 (though not on the powerpc arch). |
/proc/cpuinfo parsing is how we handle this same problem on ARM processors, and the code contained in the referenced commit is largely cribbed from that implementation (refer to https://github.com/libjpeg-turbo/libjpeg-turbo/blob/master/simd/jsimd_arm.c#L32-L123.) And yes, it's a little ugly looking, but we already had this same discussion regarding ARM five years ago (https://sourceforge.net/p/libjpeg-turbo/patches/7/). At the time, I raised the same objections that @lsorense is raising now, but the code has proven to work fine over the five years that it has been in use on ARM platforms. I chose this solution for PowerPC because, after reading the thread on Debian.org, it seemed that the more simple approaches would force a dependency on a particular glibc version (which I am unwilling to do.) Furthermore, I spent two hours of my (unpaid) labor vetting this particular commit and verifying that it works properly on all of the PowerPC platforms I can get my hands on, including the servers in the Oregon State University Open Source Lab (on which the AltiVec code was developed initially) and an OS X Snow Leopard virtual machine running Rosetta. If you have a simpler solution that doesn't require a specific glibc version, then I'm all ears. Otherwise, please test my proposed solution and give me concrete feedback as to whether it works or doesn't work. I already lost $4000 on the AltiVec implementation, because I let IBM talk me into doing that work for a flat fee (and they then accused me of malfeasance after I chose to push out the code to 1.5 instead of including it in 1.4.2.) This AltiVec stuff has already left a bad taste in my mouth, so I really don't want to spend any more time on it than is absolutely necessary. If there are reasons to believe that parsing /proc/cpuinfo won't work properly on some platforms, then please list those reasons. |
Would you accept a patch that parses /proc/self/auxv instead? That seems like it would be much more efficient and a lot less code and would not require glibc support. I am just checking which kernel introduced that value, but I suspect it is quite a long time ago. Of course, it would have to be a patch that includes the other stuff your patch has for apple and such. |
So /proc/self/auxv was around in 2.6.12 when linux went to get, so that would be 11 years ago, and was clearly there before that too. |
Yeah, but... The whole point of this exercise is to extend support to 1990's PowerPC machines, is it not? Doesn't make sense to me to require an 11-year-old O/S when you're trying to support a 20-year-old CPU. I honestly know nothing about /proc/self/auxv, although it seems that Linaro proposed doing the same thing for ARM: https://wiki.linaro.org/Resources/HowTo/DeterminingCPUFeatures @ssvb may have better insight than I do regarding the relative merits of that approach, but my personal inclination is: why fix something that isn't broken? I don't see why parsing /proc/cpuinfo is any better or worse than parsing /proc/self/auxv, unless the /proc/cpuinfo interface is subject to change (but even if it is, accommodating that change would be a simple matter of adding new rules to the parser.) You seem to be having a knee-jerk reaction to it, but you haven't provided concrete reasons why that approach is problematic. Parsing /proc/cpuinfo is a proven solution that has not historically caused any maintenance problems, so I fail to see why it's so awful. It works, and I've already put in the leg work to validate that. I'm disinclined to spend more hours validating another solution unless you can make a strong argument for why it's better. |
I think using a proper interface instead of parsing text files is always preferable. Theauxv interface proposed by Lennart sounds gine. It has to be proven, but I suppose there is also a performance penalty in parsing a strings. It is not just about supporting an older CPU on old OS, but also newer OS: I think kernel 2.6 is quite acceptable, |
The /proc/self/auxv parsing is another possible approach. However it used to have some nasty problems too. The point is that it is a binary file, arranged as Now a tricky thing is that QEMU, valgrind and other emulators also exist (so that you can, for example, run powerpc binaries on x86). And their older versions did not emulate this /proc/self/auxv file correctly. And because the /proc/self/auxv format is binary, we can't easily detect if it is 32-bit or 64-bit and whether it is coming from the emulated system of from the host system. This was a major PITA in practice (causing crashes because of misinterpreting information), so parsing the text file /proc/cpuinfo turned out to be a safer choice. Personally, I think that |
Well the auxv interface exists for ld-linux.so to know the capabilities to use. It is a stable well defined interface. glibc appears to have had elf.h support the interface definition since 2005, but unfortunately has only added the nice getauxvec function in 2012. The method used by linaro is quite good, except it requires you to be main to get the envp that you use to get the auxv. Using /proc/self/auxv on the other hand can be done from any function and hence a library, which is more convinient in this case. String parsing in C is awful, and /proc/cpuinfo is rather large on some systems, and having to read through that for a keyword in something that by definition is meant to be human readable only is pretty slow and inefficient. Going through a list of about 20 intergers seems a lot faster, less error prone (After all what if a keyword shows up elsewhere in cpuinfo and not just the flags line?). And since the interface is the official one used by the elf loader, it better stay consistent and working. i am willing to write up a patch for powerpc and arm as well that uses the auxv. |
How much time are we potentially saving this way? It would be nice to have some performance numbers. In both cases there are file open and file read syscalls, which might (or might not) be expensive. |
If qemu had a bug in the auxv emulation, that would have caused other things to break too, so I don't think that is relevant. That's just a bug and fixed long ago it would seem. I will try to compare both methods to see if I can get any timing info on it. |
Sorry, but no argument I've heard thus far has convinced me that there is anything wrong with my proposed solution. @lsorense, please stop using the word "awful" unless you are willing to back it up with concrete, objective evidence. Something isn't "awful" just because you personally don't like it. We're talking about a proven solution that has worked fine for five years. And everyone please give the performance FUD a rest, OK? This interface is read exactly once during the execution of libjpeg-turbo, and in reality, it is less than 2 kilobytes of data. The same technique is being used quite successfully on a variety of relatively slow mobile devices. I'm the one that you guys need to convince, here. So convince me, or I'm going with what I have. I am unwilling to entertain a new interface for ARM. That interface works and has been thoroughly tested. I do not waste time fixing things that aren't broken. |
Well, the point is that neither As for the emulator "bugs", it means that they need to intercept the
If some application links a few dozens of various multimedia libraries and each one of them does such stunts for the CPU runtime features detection at start, then the application startup time might be affected a little bit. I believe that Mozilla had some patch for implementing their own runtime CPU features detection code and then patching every third-party library in their source tree to use this common code. Now I don't see anything like this in a more recent Mozilla code, so they have probably decided that it is not worth the efforts and abandoned this idea :-) Also |
Well I did find one big difference: Using auxv works with qemu. Using cpuinfo does not since when using qemu-user (which I do a lot for testing things), /proc/cpuinfo contains the host system data, not the emulated system. /proc/self/auxv on the other hand is emulated correctly, since it has to be, or ld-linux.so would fail to handle cpu features correctly. With newer glibc versions it would appear that glibc will actually fail to work correctly if auxv isn't emulated correctly too since it is now starting to use that to determine the cpu features as far as I can tell. When running 1.5.0, altivec is assumed, and it works with qemu when emulating a 7457 (which does altivec) and crashes when emulating a 750 (which does not have altivec). With the current patch, it works everywhere because it never uses altivec even when it should on qemu, while doing the right thing on real hardware where /proc/cpuinfo exists. Of course with a newer glibc and using the getauxval, you don't even need /proc to be mounted to do the right thing, but most systems have that mounted, so that probably isn't a big deal in general. Sure would be nice if that function had existed earlier. Other than adding a configure test for it, I can't think of a good way to use getauxval if it exists, and fall back to reading /proc/self/auxv or /proc/cpuinfo otherwise, since that would clearly be ideal. This little test program has a working function that does correctly work under qemu as well as on real hardware (tried on power7 which has altivec and e300 which does not).:
Changing altivec to neon and PPC_FEATURE_HAS_ALTIVEC to HWCAP_ARM_NEON works on arm, including on qemu (which with current libjpeg-turbo code always turns off neon when run under qemu-user even though it should be using the neon emulation, unless forced with the environment variable). |
Well, now it should not be very difficult for you to understand that both If you think that QEMU should also emulate correct Why are we even wasting time discussing this? Basically, you are proposing the code change, which does not solve any real problem. Also you are just claiming that there must be some improvement, while not providing any performance numbers. And if these changes are applied, then do we have to hope that nobody has any old problematic version of QEMU or valgrind anymore (or some other similar tool)? What's the point of this exercise?
The ld-linux.so does not need to read the
Why should it use the NEON emulation? Is QEMU emulating NEON optimized functions faster than the equivalent generic C functions? Do you have some performance numbers? And just for testing purposes you can always compile libjpeg-turbo with "-mfpu=neon" option in CFLAGS, in this case the runtime CPU features check is bypassed. |
I don't think qemu has a reason to emulate /proc/cpuinfo, since any application using it is wrong. The kernel developers seem pretty clear on that. It is human readable and not meant for anything else. Sure they try not to break anything, because even if it wasn't supposed to be used, lots of programs went and did it anyhow. As for why you would want neon under qemu? Well because you actually wanted it to run the same code as real hardware would so you can test things that way. Having to do special compiles or environments pretty much defeats the purpose of that. So the current detection code is "broken", "inefficient", etc. You don't care. Fine. I guess I will stop trying to be helpful. And just because you haven't noticed a problem in 5 years does not mean it works. Clearly under qemu it didn't, since it uses something that isn't meant for the purpose. I will stop wasting my time I guess. Most projects like code improvements. I guess this one is different. |
Says who? Sorry, but a reference would be very much appreciated.
Well, you are just doing it wrong. One can't really only test software in QEMU and hope that it will always work fine on real hardware. There are many things which are a little bit different, most notably the unaligned data access handling. Actually Codesorcery released a broken toolchain at least once, because they had only tested it with QEMU. I guess, they have learned their lesson. And the
Err, did you actually miss my explanations? There was simply no better choice. Everything else just sucked more.
The library at least always worked correctly for JPEG encoding/decoding. Would you have preferred libjpeg-turbo failing when run under qemu-user? The QEMU developers were aware of this /proc/self/auxv problem all the time and eventually fixed it. Yeah, of course we could have claimed that it's a QEMU bug and none of our concern, in order to "motivate" them better. But why stirring up conflict and causing troubles for the end users if an alternative solution exists?
Most projects don't like dangerous changes and potential regressions too. |
Why don't you check for getauxval() and use it in case it is available, while falling back to cpuinfo parsing if not? |
The problem is-- how do we check for the existence of getauxval() at run time? I am not interested in introducing dlopen()/dlsym() code-- that would make libjpeg-turbo depend on libdl, which is unacceptable. Also not interested in build-time detection, because that would mean that the getauxval() code wouldn't be included unless libjpeg-turbo was built on a newer system. I have to support any Linux distro that is actively supported by its distributor, and that means that the enterprise distros are largely what set the minimum requirements. Because libjpeg-turbo feeds into VirtualGL and TurboVNC, which are enterprise-class products, the system requirements of those products extend down into libjpeg-turbo as well. That means I have to currently support RHEL 5 (glibc 2.5) and Ubuntu 12.04 LTS (glibc 2.15) and SLES 11 (glibc 2.9.) Next year, RHEL 5 and Ubuntu 12.04 LTS will leave Production Phase, so I will no longer need to actively support those distros with new releases of libjpeg-turbo. That will push me to RHEL 6 (glibc 2.12) and Ubuntu 14.04 LTS (glibc 2.19) as minimums. However, it will be 2019 before SLES 11 leaves the General Support phase and 2020 before RHEL 6 leaves Production Phase, and that would be the earliest that I could get away with requiring glibc 2.16 or later (the only way I get paid for my work on open source is by contracting with big corporations and other large organizations, and a lot of them refuse to upgrade until they absolutely have to.) In short, I think getauxval() is a non-starter at the moment, so that leaves us with some other means of reading auxv. We can't do the envp trick, because we're a library, so that leaves parsing /proc/self/auxv. In general, it seems like the only real benefits of parsing /proc/self/auxv are:
The thing is-- it is entirely possible for qemu to present a different /proc/cpuinfo interface to programs running within it. For instance, I have a Google Nexus 5X with two Cortex-A57 (out-of-order) cores and four Cortex-A53 (in-order) cores, and Android presents a different /proc/cpuinfo interface depending on which core my process is bound to (taskset'ing it to a different core causes the /proc/cpuinfo interface to look different to the process, so the ARM feature detection code in libjpeg-turbo still works properly.) I'm not saying that adding that feature to qemu would be particularly easy. I'm just saying that it's possible. If it seems unlikely that it will ever happen, then I think we can work around the issue in one of the following ways:
|
Doing runtime detection of getauxval is clearly not a good idea. Doing build time detection on the other hand is. If someone is doing a build that they expect to run on RHEL5, then they better build on that. If they don't, that wpi;d seem incompetent and just asking for trouble. Building on a newer system and hoping it just happens to work on an older one with different library versions just isn't a good idea. This seems to be how pretty much ever program using autoconf does things. Detect what is there, and use the best option on this target, and fall back to less desirable but functional alternatives otherwise, possible allowing the user to override the detection and say "I really don't want to use this feature even if you find it". /proc/cpuinfo can get rather large, although hopefully the parser will stop when it sees the expected keyword and ignore the rest of the content for the other large number of cpu cores. I haven't seen any systems with lots of cores that didn't have SIMD. I actually thought from the previous explanations that the environment variables did force the use, and given their names, that certainly would be what I would have expected to happen. Disabling the detection if the compile time flags force on SIMD, well that makes good sense too. I wouldn't even expect a program compiled with -maltivec to run on a non altivec capable CPU. One might get lucky, but I wouldn't count on it, so that seems perfectly reasonable. As for x86, well cpuid is pretty good, as long as you know about all the erratas for the few CPUs that return bad data there for some features. I don't think anyone cares about pre pentium chips anymore, and I don't think I have ever heard of a SIMD related errata, so it's probably safe, even if using getauxval would actually be simpler code by just using the data the kernel already found and provided to the application. Perhaps in the future this would be a worthwhile cleanup to consider. |
The JSIMD_FORCE* environment variables previously meant "force the use of this instruction set if it is available but others are available as well", but that did nothing on ARM platforms, since there is only ever one instruction set available. Since the ARM and MIPS CPU feature detection code is less than bulletproof, and since there is only one SIMD instruction set (currently) supported on those platforms, it makes sense for the JSIMD_FORCE* environment variables on those platforms to actually force the use of the SIMD instruction set, thus bypassing the CPU feature detection code. This addresses a concern raised in #88 whereby parsing /proc/cpuinfo didn't work within a QEMU environment. This at least provides a workaround, allowing users to force-enable or force-disable SIMD instructions for ARM and MIPS builds of libjpeg-turbo.
Moreover, the downside is that parsing /proc/self/auxv fails in a pretty nasty way with older versions of QEMU. Which might be still in use in the long term supported Linux distributions. This is somewhat similar to the getauxval() situation.
I'm not a big fan of this idea. Right now the environment variables are "idiot proof" and the worst thing that may happen is that just some performance is lost. It is only a convenient way to select between multiple safe options (for example, MMX vs. SSE2 vs. no SIMD at all). Changing this behaviour may backfire in the case if some developers try to play with libjpeg-turbo environment variables and then somehow forget to remove them before shipping their software to the users.
The QEMU users don't have any problem in the first place and don't need special workarounds (though doing better emulation of /proc/cpuinfo would be nice). The whole point is that testing with QEMU has some limitations:
In other words, QEMU is good for doing quick preliminary "smoke testing" during development. The final testing is always done on real hardware. |
As much as I respect everyone's opinion, it's impossible to make everyone happy. Those who develop decide, and here's what I decided:
If there are any concrete, non-philosophical objections to this (i.e. "it doesn't work in such-and-such case"), then please post them, but I'm done with philosophical and hypothetical arguments. This has already occupied way too much of my time. |
The altivec_detection_fix branch has been merged into master and dev. I consider this matter resolved. |
As referenced in the Debian mailing list thread Using the library on G3 or other powerpc processors without AltiVec causes illegal instructions.
Setting JSIMD_FORCENONE=1 fixes the issue, meaning processor feature autodetection is not implemented on PPC or anyway not working.
The text was updated successfully, but these errors were encountered: