Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PowerPC without altivec causes crash (e.g. G3) #88

Closed
rmottola opened this issue Jul 6, 2016 · 27 comments
Closed

PowerPC without altivec causes crash (e.g. G3) #88

rmottola opened this issue Jul 6, 2016 · 27 comments

Comments

@rmottola
Copy link

rmottola commented Jul 6, 2016

As referenced in the Debian mailing list thread Using the library on G3 or other powerpc processors without AltiVec causes illegal instructions.

Setting JSIMD_FORCENONE=1 fixes the issue, meaning processor feature autodetection is not implemented on PPC or anyway not working.

@dcommander
Copy link
Member

I think 6c0b761 should fix it. Please verify.

@lsorense
Copy link

lsorense commented Jul 6, 2016

That's pretty awful. Why not just this:

diff -r -u libjpeg-turbo-1.5.0.orig/simd/jsimd_powerpc.c libjpeg-turbo-1.5.0/simd/jsimd_powerpc.c
--- libjpeg-turbo-1.5.0.orig/simd/jsimd_powerpc.c       2016-06-07 17:33:40.000000000 +0000
+++ libjpeg-turbo-1.5.0/simd/jsimd_powerpc.c    2016-07-06 17:32:53.000000000 +0000
@@ -21,6 +21,7 @@
 #include "../jdct.h"
 #include "../jsimddct.h"
 #include "jsimd.h"
+#include <sys/auxv.h>

 static unsigned int simd_support = ~0;

@@ -32,7 +33,10 @@
   if (simd_support != ~0U)
     return;

-  simd_support = JSIMD_ALTIVEC;
+  if (getauxval(AT_HWCAP) & PPC_FEATURE_HAS_ALTIVEC)
+    simd_support = JSIMD_ALTIVEC;
+  else
+    simd_support = 0;

   /* Force different settings through environment variables */
   env = getenv("JSIMD_FORCENONE");

No point doing a text scapper for something the kernel has a binary interface for already.

@ssvb
Copy link

ssvb commented Jul 6, 2016

getauxval() is a relatively recent addition to glibc.

@lsorense
Copy link

lsorense commented Jul 6, 2016

2.16 is not that recent. That's 4 years ago.

@lsorense
Copy link

lsorense commented Jul 6, 2016

I do like the apple versus linux/android handling, and the option to force altivec though.

At the very least checking the environment variables first and returning if set would save people the overhead of running the awful /proc/cpuinfo search if they specified what they want already.

@ssvb
Copy link

ssvb commented Jul 6, 2016

The emphasis was on the word relatively and 4 years is actually not that long ago. For example, the https://wiki.debian.org/DebianWheezy distribution seems to have eglibc 2.13 and is supported until 31st May 2018 (though not on the powerpc arch).

@dcommander
Copy link
Member

/proc/cpuinfo parsing is how we handle this same problem on ARM processors, and the code contained in the referenced commit is largely cribbed from that implementation (refer to https://github.com/libjpeg-turbo/libjpeg-turbo/blob/master/simd/jsimd_arm.c#L32-L123.) And yes, it's a little ugly looking, but we already had this same discussion regarding ARM five years ago (https://sourceforge.net/p/libjpeg-turbo/patches/7/). At the time, I raised the same objections that @lsorense is raising now, but the code has proven to work fine over the five years that it has been in use on ARM platforms.

I chose this solution for PowerPC because, after reading the thread on Debian.org, it seemed that the more simple approaches would force a dependency on a particular glibc version (which I am unwilling to do.) Furthermore, I spent two hours of my (unpaid) labor vetting this particular commit and verifying that it works properly on all of the PowerPC platforms I can get my hands on, including the servers in the Oregon State University Open Source Lab (on which the AltiVec code was developed initially) and an OS X Snow Leopard virtual machine running Rosetta. If you have a simpler solution that doesn't require a specific glibc version, then I'm all ears. Otherwise, please test my proposed solution and give me concrete feedback as to whether it works or doesn't work.

I already lost $4000 on the AltiVec implementation, because I let IBM talk me into doing that work for a flat fee (and they then accused me of malfeasance after I chose to push out the code to 1.5 instead of including it in 1.4.2.) This AltiVec stuff has already left a bad taste in my mouth, so I really don't want to spend any more time on it than is absolutely necessary. If there are reasons to believe that parsing /proc/cpuinfo won't work properly on some platforms, then please list those reasons.

@dcommander dcommander added the bug label Jul 6, 2016
@lsorense
Copy link

lsorense commented Jul 6, 2016

Would you accept a patch that parses /proc/self/auxv instead? That seems like it would be much more efficient and a lot less code and would not require glibc support. I am just checking which kernel introduced that value, but I suspect it is quite a long time ago. Of course, it would have to be a patch that includes the other stuff your patch has for apple and such.

@lsorense
Copy link

lsorense commented Jul 6, 2016

So /proc/self/auxv was around in 2.6.12 when linux went to get, so that would be 11 years ago, and was clearly there before that too.

@dcommander
Copy link
Member

Yeah, but... The whole point of this exercise is to extend support to 1990's PowerPC machines, is it not? Doesn't make sense to me to require an 11-year-old O/S when you're trying to support a 20-year-old CPU. I honestly know nothing about /proc/self/auxv, although it seems that Linaro proposed doing the same thing for ARM:

https://wiki.linaro.org/Resources/HowTo/DeterminingCPUFeatures

@ssvb may have better insight than I do regarding the relative merits of that approach, but my personal inclination is: why fix something that isn't broken? I don't see why parsing /proc/cpuinfo is any better or worse than parsing /proc/self/auxv, unless the /proc/cpuinfo interface is subject to change (but even if it is, accommodating that change would be a simple matter of adding new rules to the parser.) You seem to be having a knee-jerk reaction to it, but you haven't provided concrete reasons why that approach is problematic. Parsing /proc/cpuinfo is a proven solution that has not historically caused any maintenance problems, so I fail to see why it's so awful. It works, and I've already put in the leg work to validate that. I'm disinclined to spend more hours validating another solution unless you can make a strong argument for why it's better.

@rmottola
Copy link
Author

rmottola commented Jul 6, 2016

I think using a proper interface instead of parsing text files is always preferable. Theauxv interface proposed by Lennart sounds gine. It has to be proven, but I suppose there is also a performance penalty in parsing a strings.

It is not just about supporting an older CPU on old OS, but also newer OS: I think kernel 2.6 is quite acceptable,
Last but not least: I reported the issue for ppc 740/750, but other people reported it for P5020. the PowerPC e5500 family ism uch more recent and current and has no AltiVec.

@ssvb
Copy link

ssvb commented Jul 6, 2016

The /proc/self/auxv parsing is another possible approach. However it used to have some nasty problems too. The point is that it is a binary file, arranged as {32-bit key, 32-bit value} pairs on 32-bit systems and {64-bit key, 64-bit value} pairs on 64-bit systems. Depending on the 32-bit/64-bit mode of the running process, the Linux kernel provides the right variant of this binary when somebody tries to open and read this file. Seems to be somewhat wacky but not really troublesome so far?

Now a tricky thing is that QEMU, valgrind and other emulators also exist (so that you can, for example, run powerpc binaries on x86). And their older versions did not emulate this /proc/self/auxv file correctly. And because the /proc/self/auxv format is binary, we can't easily detect if it is 32-bit or 64-bit and whether it is coming from the emulated system of from the host system. This was a major PITA in practice (causing crashes because of misinterpreting information), so parsing the text file /proc/cpuinfo turned out to be a safer choice.

Personally, I think that getauxval() is the best choice in the long run. But then libjpeg-turbo needs to have a configure check to be sure that this function is really supported (the glibc version may be loo old or the system may use some other C library).

@lsorense
Copy link

lsorense commented Jul 6, 2016

Well the auxv interface exists for ld-linux.so to know the capabilities to use. It is a stable well defined interface. glibc appears to have had elf.h support the interface definition since 2005, but unfortunately has only added the nice getauxvec function in 2012. The method used by linaro is quite good, except it requires you to be main to get the envp that you use to get the auxv. Using /proc/self/auxv on the other hand can be done from any function and hence a library, which is more convinient in this case.

String parsing in C is awful, and /proc/cpuinfo is rather large on some systems, and having to read through that for a keyword in something that by definition is meant to be human readable only is pretty slow and inefficient. Going through a list of about 20 intergers seems a lot faster, less error prone (After all what if a keyword shows up elsewhere in cpuinfo and not just the flags line?). And since the interface is the official one used by the elf loader, it better stay consistent and working.

i am willing to write up a patch for powerpc and arm as well that uses the auxv.

@ssvb
Copy link

ssvb commented Jul 6, 2016

@lsorense

Would you accept a patch that parses /proc/self/auxv instead? That seems like it would be much more efficient and a lot less code and would not require glibc support.

How much time are we potentially saving this way? It would be nice to have some performance numbers. In both cases there are file open and file read syscalls, which might (or might not) be expensive.

@lsorense
Copy link

lsorense commented Jul 6, 2016

If qemu had a bug in the auxv emulation, that would have caused other things to break too, so I don't think that is relevant. That's just a bug and fixed long ago it would seem. I will try to compare both methods to see if I can get any timing info on it.

@dcommander
Copy link
Member

Sorry, but no argument I've heard thus far has convinced me that there is anything wrong with my proposed solution. @lsorense, please stop using the word "awful" unless you are willing to back it up with concrete, objective evidence. Something isn't "awful" just because you personally don't like it. We're talking about a proven solution that has worked fine for five years.

And everyone please give the performance FUD a rest, OK? This interface is read exactly once during the execution of libjpeg-turbo, and in reality, it is less than 2 kilobytes of data. The same technique is being used quite successfully on a variety of relatively slow mobile devices.

I'm the one that you guys need to convince, here. So convince me, or I'm going with what I have.

I am unwilling to entertain a new interface for ARM. That interface works and has been thoroughly tested. I do not waste time fixing things that aren't broken.

@ssvb
Copy link

ssvb commented Jul 7, 2016

@lsorense

If qemu had a bug in the auxv emulation, that would have caused other things to break too, so I don't think that is relevant. That's just a bug and fixed long ago it would seem.

Well, the point is that neither /proc/self/auxv nor /proc/cpuinfo parsing is pretty. But the /proc/self/auxv method used to actually cause very real problems for very real users (here is a valgrind bugreport - https://bugs.kde.org/show_bug.cgi?id=253519), while the /proc/cpuinfo method always worked fine. In my opinion the choice between these two methods is very obvious. As @dcommander said, there is no point fixing something that is not broken. And regarding text files in general, there are also such things as XML, JSON, etc.

As for the emulator "bugs", it means that they need to intercept the /proc/self/auxv read operations and fixup them on the fly instead of just providing transparent access to the host file system. This is an "ugly" thing from their point of view and you can't really blame them for being a bit reluctant to fix this crap. And we had at least QEMU and Valgrind both having problems with this.

@dcommander

And everyone please give the performance FUD a rest, OK? This interface is read exactly once during the execution of libjpeg-turbo, and in reality, it is less than 2 kilobytes of data. The same technique is being used quite successfully on a variety of relatively slow mobile devices.

If some application links a few dozens of various multimedia libraries and each one of them does such stunts for the CPU runtime features detection at start, then the application startup time might be affected a little bit. I believe that Mozilla had some patch for implementing their own runtime CPU features detection code and then patching every third-party library in their source tree to use this common code. Now I don't see anything like this in a more recent Mozilla code, so they have probably decided that it is not worth the efforts and abandoned this idea :-)

Also /proc might be sometimes not mounted (for example in chroot), and this causes troubles for both /proc/self/auxv and /proc/cpuinfo methods. Too bad that the glibc developers have introduced the dedicated getauxval() function so late.

@lsorense
Copy link

lsorense commented Jul 7, 2016

Well I did find one big difference: Using auxv works with qemu. Using cpuinfo does not since when using qemu-user (which I do a lot for testing things), /proc/cpuinfo contains the host system data, not the emulated system. /proc/self/auxv on the other hand is emulated correctly, since it has to be, or ld-linux.so would fail to handle cpu features correctly. With newer glibc versions it would appear that glibc will actually fail to work correctly if auxv isn't emulated correctly too since it is now starting to use that to determine the cpu features as far as I can tell.

When running 1.5.0, altivec is assumed, and it works with qemu when emulating a 7457 (which does altivec) and crashes when emulating a 750 (which does not have altivec). With the current patch, it works everywhere because it never uses altivec even when it should on qemu, while doing the right thing on real hardware where /proc/cpuinfo exists.

Of course with a newer glibc and using the getauxval, you don't even need /proc to be mounted to do the right thing, but most systems have that mounted, so that probably isn't a big deal in general. Sure would be nice if that function had existed earlier. Other than adding a configure test for it, I can't think of a good way to use getauxval if it exists, and fall back to reading /proc/self/auxv or /proc/cpuinfo otherwise, since that would clearly be ideal.

This little test program has a working function that does correctly work under qemu as well as on real hardware (tried on power7 which has altivec and e300 which does not).:

#include <stdio.h>
#include <elf.h>
#include <sys/auxv.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <limits.h>

int check_for_altivec() {
  int auxvfd;
  void *auxvmmap;
#if ( __WORDSIZE == 32 )
  Elf32_auxv_t *auxv;
#else
  Elf64 auxv_t *auxv;
#endif
  int altivec_support = 0;

  auxvfd = open("/proc/self/auxv", O_RDONLY);
  if (auxvfd == -1) {
    fprintf(stderr, "Could not open /proc/self/auxv");
    return 0; /* Assume no altivec */
  }
  auxvmmap = mmap(NULL, 1024, PROT_READ, MAP_SHARED, auxvfd, 0);
  if (auxvmmap) {
    auxv = auxvmmap;
    while(auxv && auxv->a_type != AT_HWCAP) {
      auxv++;
    }
    if (auxv) {
      altivec_support = auxv->a_un.a_val & PPC_FEATURE_HAS_ALTIVEC;
    }
    munmap(auxvmmap, 1024);
  }
  close(auxvfd);
  return altivec_support ? 1 : 0;
}

int main() {
  printf("Supports altivec: %d\n", check_for_altivec());
  return 0;
}

Changing altivec to neon and PPC_FEATURE_HAS_ALTIVEC to HWCAP_ARM_NEON works on arm, including on qemu (which with current libjpeg-turbo code always turns off neon when run under qemu-user even though it should be using the neon emulation, unless forced with the environment variable).

@ssvb
Copy link

ssvb commented Jul 7, 2016

Well I did find one big difference: Using auxv works with qemu. Using cpuinfo does not since when using qemu-user (which I do a lot for testing things), /proc/cpuinfo contains the host system data, not the emulated system.

Well, now it should not be very difficult for you to understand that both /proc/cpuinfo and /proc/self/auxv contained the host system data when run in qemu-user and valgrind back in 2011. And parsing the wrong /proc/self/auxv binary data crapped out, while parsing the wrong text did not lead to catastrophic results. Needless to say that the getauxval() function did not even exist at that time.

If you think that QEMU should also emulate correct /proc/cpuinfo contents, then you can submit a patch to QEMU.

Why are we even wasting time discussing this? Basically, you are proposing the code change, which does not solve any real problem. Also you are just claiming that there must be some improvement, while not providing any performance numbers. And if these changes are applied, then do we have to hope that nobody has any old problematic version of QEMU or valgrind anymore (or some other similar tool)? What's the point of this exercise?

/proc/self/auxv on the other hand is emulated correctly, since it has to be, or ld-linux.so would fail to handle cpu features correctly.

The ld-linux.so does not need to read the /proc/self/auxv file because this information is available in the process address space and the pointer to it is available on stack. The getauxval() function does not need to read this file either. So the correct /proc/self/auxv file emulation is not critical for this functionality.

Changing altivec to neon and PPC_FEATURE_HAS_ALTIVEC to HWCAP_ARM_NEON works on arm, including on qemu (which with current libjpeg-turbo code always turns off neon when run under qemu-user even though it should be using the neon emulation, unless forced with the environment variable).

Why should it use the NEON emulation? Is QEMU emulating NEON optimized functions faster than the equivalent generic C functions? Do you have some performance numbers?

And just for testing purposes you can always compile libjpeg-turbo with "-mfpu=neon" option in CFLAGS, in this case the runtime CPU features check is bypassed.

@lsorense
Copy link

lsorense commented Jul 7, 2016

I don't think qemu has a reason to emulate /proc/cpuinfo, since any application using it is wrong. The kernel developers seem pretty clear on that. It is human readable and not meant for anything else. Sure they try not to break anything, because even if it wasn't supposed to be used, lots of programs went and did it anyhow.

As for why you would want neon under qemu? Well because you actually wanted it to run the same code as real hardware would so you can test things that way. Having to do special compiles or environments pretty much defeats the purpose of that.

So the current detection code is "broken", "inefficient", etc. You don't care. Fine. I guess I will stop trying to be helpful. And just because you haven't noticed a problem in 5 years does not mean it works. Clearly under qemu it didn't, since it uses something that isn't meant for the purpose.

I will stop wasting my time I guess. Most projects like code improvements. I guess this one is different.

@ssvb
Copy link

ssvb commented Jul 7, 2016

I don't think qemu has a reason to emulate /proc/cpuinfo, since any application using it is wrong. The kernel developers seem pretty clear on that. It is human readable and not meant for anything else. Sure they try not to break anything, because even if it wasn't supposed to be used, lots of programs went and did it anyhow.

Says who? Sorry, but a reference would be very much appreciated.

As for why you would want neon under qemu? Well because you actually wanted it to run the same code as real hardware would so you can test things that way. Having to do special compiles or environments pretty much defeats the purpose of that.

Well, you are just doing it wrong. One can't really only test software in QEMU and hope that it will always work fine on real hardware. There are many things which are a little bit different, most notably the unaligned data access handling. Actually Codesorcery released a broken toolchain at least once, because they had only tested it with QEMU. I guess, they have learned their lesson. And the /proc/cpuinfo handling is also one of such minor differences. If you really want to have more accurate emulation in QEMU, then you need to take care of these things.

So the current detection code is "broken", "inefficient", etc. You don't care. Fine. I guess I will stop trying to be helpful. And just because you haven't noticed a problem in 5 years does not mean it works.

Err, did you actually miss my explanations? There was simply no better choice. Everything else just sucked more.

Clearly under qemu it didn't, since it uses something that isn't meant for the purpose.

The library at least always worked correctly for JPEG encoding/decoding. Would you have preferred libjpeg-turbo failing when run under qemu-user? The QEMU developers were aware of this /proc/self/auxv problem all the time and eventually fixed it. Yeah, of course we could have claimed that it's a QEMU bug and none of our concern, in order to "motivate" them better. But why stirring up conflict and causing troubles for the end users if an alternative solution exists?

I will stop wasting my time I guess. Most projects like code improvements. I guess this one is different.

Most projects don't like dangerous changes and potential regressions too.

@rmottola
Copy link
Author

rmottola commented Jul 7, 2016

Why don't you check for getauxval() and use it in case it is available, while falling back to cpuinfo parsing if not?
What way on Linux systems with current libc, you get the "fast path" and if it is not working a bug in e.g. quemu can be claimed. Otherwise your now proven system can be used (which we know may fail too in emulation depending if cpuinfo is being emulated too or not).
This discussion has shown that neither path is really safe in every case.

@dcommander
Copy link
Member

The problem is-- how do we check for the existence of getauxval() at run time? I am not interested in introducing dlopen()/dlsym() code-- that would make libjpeg-turbo depend on libdl, which is unacceptable. Also not interested in build-time detection, because that would mean that the getauxval() code wouldn't be included unless libjpeg-turbo was built on a newer system. I have to support any Linux distro that is actively supported by its distributor, and that means that the enterprise distros are largely what set the minimum requirements. Because libjpeg-turbo feeds into VirtualGL and TurboVNC, which are enterprise-class products, the system requirements of those products extend down into libjpeg-turbo as well. That means I have to currently support RHEL 5 (glibc 2.5) and Ubuntu 12.04 LTS (glibc 2.15) and SLES 11 (glibc 2.9.) Next year, RHEL 5 and Ubuntu 12.04 LTS will leave Production Phase, so I will no longer need to actively support those distros with new releases of libjpeg-turbo. That will push me to RHEL 6 (glibc 2.12) and Ubuntu 14.04 LTS (glibc 2.19) as minimums. However, it will be 2019 before SLES 11 leaves the General Support phase and 2020 before RHEL 6 leaves Production Phase, and that would be the earliest that I could get away with requiring glibc 2.16 or later (the only way I get paid for my work on open source is by contracting with big corporations and other large organizations, and a lot of them refuse to upgrade until they absolutely have to.) In short, I think getauxval() is a non-starter at the moment, so that leaves us with some other means of reading auxv. We can't do the envp trick, because we're a library, so that leaves parsing /proc/self/auxv.

In general, it seems like the only real benefits of parsing /proc/self/auxv are:

  • It satisfies some arbitrary standard of code purity (I really do not care about this.)
  • It is potentially faster, but that hasn't been proven yet (I honestly don't think the amount of time it takes to parse /proc/cpuinfo is going to be noticeable in any real end-user sense.)
  • It works in qemu. Great, but not a sufficient argument, given the downsides

The thing is-- it is entirely possible for qemu to present a different /proc/cpuinfo interface to programs running within it. For instance, I have a Google Nexus 5X with two Cortex-A57 (out-of-order) cores and four Cortex-A53 (in-order) cores, and Android presents a different /proc/cpuinfo interface depending on which core my process is bound to (taskset'ing it to a different core causes the /proc/cpuinfo interface to look different to the process, so the ARM feature detection code in libjpeg-turbo still works properly.) I'm not saying that adding that feature to qemu would be particularly easy. I'm just saying that it's possible. If it seems unlikely that it will ever happen, then I think we can work around the issue in one of the following ways:

  • A new environment variable that really forces the use of AltiVec or NEON instructions. Currently, the JSIMD_FORCE* environment variables really mean "if this instruction set is available but others are as well, use this one." That makes sense on x86, because the feature detection code is bulletproof (uses the cpuid instruction.) Thus, it's guaranteed that if, for instance, SSE2 isn't detected, forcing the use of that instruction set will crash the program. But in the case of PowerPC and ARM, only one SIMD instruction set is ever going to be available, so JSIMD_FORCEALTIVEC and JSIMD_FORCENEON don't currently do anything. I think it would be much more useful to change the meaning of those environment variables to: "always use SIMD and bypass the CPU feature detection code", which would provide a workaround for qemu users.
  • Always enable AltiVec if the code is built with -maltivec. This would require some slight modifications to the build system. Currently, when building on PowerPC, -maltivec is passed to all C files under simd/ for simplicity, but it really needs to be passed just to the intrinsics files, not to jsimd_powerpc.c. That would allow me to introduce a similar mechanism on PowerPC to the one we use on ARM, whereby the CPU feature detection code is not compiled in if the code is built with the appropriate -m option to explicitly enable SIMD instructions.

@lsorense
Copy link

lsorense commented Jul 7, 2016

Doing runtime detection of getauxval is clearly not a good idea.

Doing build time detection on the other hand is.

If someone is doing a build that they expect to run on RHEL5, then they better build on that. If they don't, that wpi;d seem incompetent and just asking for trouble. Building on a newer system and hoping it just happens to work on an older one with different library versions just isn't a good idea. This seems to be how pretty much ever program using autoconf does things. Detect what is there, and use the best option on this target, and fall back to less desirable but functional alternatives otherwise, possible allowing the user to override the detection and say "I really don't want to use this feature even if you find it".

/proc/cpuinfo can get rather large, although hopefully the parser will stop when it sees the expected keyword and ignore the rest of the content for the other large number of cpu cores. I haven't seen any systems with lots of cores that didn't have SIMD.

I actually thought from the previous explanations that the environment variables did force the use, and given their names, that certainly would be what I would have expected to happen.

Disabling the detection if the compile time flags force on SIMD, well that makes good sense too. I wouldn't even expect a program compiled with -maltivec to run on a non altivec capable CPU. One might get lucky, but I wouldn't count on it, so that seems perfectly reasonable.

As for x86, well cpuid is pretty good, as long as you know about all the erratas for the few CPUs that return bad data there for some features. I don't think anyone cares about pre pentium chips anymore, and I don't think I have ever heard of a SIMD related errata, so it's probably safe, even if using getauxval would actually be simpler code by just using the data the kernel already found and provided to the application. Perhaps in the future this would be a worthwhile cleanup to consider.

dcommander added a commit that referenced this issue Jul 7, 2016
The JSIMD_FORCE* environment variables previously meant "force the use
of this instruction set if it is available but others are available as
well", but that did nothing on ARM platforms, since there is only ever
one instruction set available.  Since the ARM and MIPS CPU feature
detection code is less than bulletproof, and since there is only one
SIMD instruction set (currently) supported on those platforms, it makes
sense for the JSIMD_FORCE* environment variables on those platforms to
actually force the use of the SIMD instruction set, thus bypassing the
CPU feature detection code.

This addresses a concern raised in #88 whereby parsing /proc/cpuinfo
didn't work within a QEMU environment.  This at least provides a
workaround, allowing users to force-enable or force-disable SIMD
instructions for ARM and MIPS builds of libjpeg-turbo.
@ssvb
Copy link

ssvb commented Jul 7, 2016

@dcommander

In general, it seems like the only real benefits of parsing /proc/self/auxv are:
...
It works in qemu. Great, but not a sufficient argument, given the downsides

Moreover, the downside is that parsing /proc/self/auxv fails in a pretty nasty way with older versions of QEMU. Which might be still in use in the long term supported Linux distributions. This is somewhat similar to the getauxval() situation.

I think it would be much more useful to change the meaning of those environment variables to: "always use SIMD and bypass the CPU feature detection code",

I'm not a big fan of this idea. Right now the environment variables are "idiot proof" and the worst thing that may happen is that just some performance is lost. It is only a convenient way to select between multiple safe options (for example, MMX vs. SSE2 vs. no SIMD at all).

Changing this behaviour may backfire in the case if some developers try to play with libjpeg-turbo environment variables and then somehow forget to remove them before shipping their software to the users.

which would provide a workaround for QEMU users.

The QEMU users don't have any problem in the first place and don't need special workarounds (though doing better emulation of /proc/cpuinfo would be nice). The whole point is that testing with QEMU has some limitations:

  • If the software does not work correctly in QEMU, then there is likely a bug either in the software or in QEMU itself.
  • But if the software works correctly in QEMU, then this does not really mean anything. It can't be used as a substitute for testing on real hardware!

In other words, QEMU is good for doing quick preliminary "smoke testing" during development. The final testing is always done on real hardware.

@dcommander
Copy link
Member

dcommander commented Jul 7, 2016

As much as I respect everyone's opinion, it's impossible to make everyone happy. Those who develop decide, and here's what I decided:

  • The current JSIMD_FORCENEON environment variable is meaningless. Because it is idiot-proof, as you assert, setting it to 1 will never override the CPU feature detection. Thus, it made sense to change the behavior of that environment variable for ARM platforms, and to add a similar environment variable (JSIMD_FORCEDSPR2) on MIPS. In both cases, these environment variables bypass CPU feature detection and really force the use of NEON or DSPR2 instructions. Since those environment variables are and will remain undocumented, I see no harm in this. JSIMD_FORCENEON did nothing. Now it actually does something.
  • I modified the code in the temporary branch (https://github.com/libjpeg-turbo/libjpeg-turbo/tree/altivec_detection_fix) so that it will always enable AltiVec and #ifdef out the CPU feature detection code if the user specifies -maltivec as part of the CFLAGS.
  • I modified the JSIMD_FORCEALTIVEC environment variable so that it similarly force-enables AltiVec, regardless of CPU feature detection.
  • At this time, I do not see sufficient reason to adopt any auxv-based solution. Parsing /proc/cpuinfo works fine. If you believe otherwise, then provide concrete data supporting your claim.

If there are any concrete, non-philosophical objections to this (i.e. "it doesn't work in such-and-such case"), then please post them, but I'm done with philosophical and hypothetical arguments. This has already occupied way too much of my time.

@dcommander
Copy link
Member

The altivec_detection_fix branch has been merged into master and dev. I consider this matter resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants