jnr-ffi fails LoadLibraryTest ppc64el on ubuntu 14. #33
an interesting thing we see in the above trace is:
We installed the openjdk from the utopic repository with "sudo apt-get install openjdk-7-jdk". It would appear that maven is identifying the "arch" as ppc64 rather than ppc64el. That might or might not be an issue for this function. The output of uname -a is:
|
I also noticed that "execstack" error, which previously I've only seen when the JVM loads the incorrect version of a library (in the previous case it was loading Linux-x86 on Linux-ARM). It's difficult for me to diagnose this without being able to run the build and tests myself. I would suggest the following:
You could be right; it may simply not be detecting the platform or endianness correctly. |
what was pasted above was the entire stdout/stderr output, was there another place i should be finding log files? |
Yes, look at the log output. There should be test results under target/surefire, I believe, which will show the full errors. |
found it and I see:
Now I have a TypeAliases.java in place (ralphbel@tulgpu002:~/jnr-ffi$ ls ./src/main/java/jnr/ffi/provider/jffi/platform/ppc64le/linux/TypeAliases.java), so there is something more basic going on. I am tracking down where it determines what it is going to load. |
If you need more assistance investigating, post the full failure error using https://gist.github.com. Otherwise keep me posted :-) |
the complete log is here: |
Ok, another flag to pass to that JVM: [EDITED] -verbose:jni It's definitely having problems loading the library, and seems to be picking up the big-endian version. You might have better luck poking at this locally by turning on or adding some logging or doing some step-debugging to see why it's finding the wrong library (if that is indeed the case). |
Edited my comment...the flag is -verbose:jni |
I instrumented this code:
spit out this:
So, where does this property get set? |
Where do i add that flag on the mvn command line? |
I think you can pass it through to maven and its sub-JVMs with JAVA_OPTS env var. I'm not sure about that line of code or why/whether it would be null. I don't suppose there's a possibility to get shell access to a ppc64le machine, is there? It would help us collaborate. |
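One way to check whether a flag such as -verbose:jni actually reached the JVM that runs the tests is to print that JVM's input arguments from inside it. The sketch below uses the standard `java.lang.management` API; the class name `JvmArgs` and its helper methods are hypothetical and not part of jnr-ffi or maven.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper for illustration; not part of jnr-ffi or maven.
class JvmArgs {
    /** Returns the arguments the current JVM was started with (e.g. -verbose:jni). */
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** True if the given flag appears verbatim in the argument list. */
    static boolean hasFlag(List<String> args, String flag) {
        return args.contains(flag);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = inputArguments();
        System.out.println("JVM args: " + jvmArgs);
        System.out.println("-verbose:jni present: " + hasFlag(jvmArgs, "-verbose:jni"));
    }
}
```

Calling `JvmArgs.inputArguments()` from a test would show the arguments of the JVM that actually executes that test, which may be a forked JVM rather than the one running maven itself.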
that was a red-herring: i changed the instrumentation to:
and got the following telemetry:
I think i have to dig into the class.forName... function. |
Somewhere before this it will have selected the .so to become that tmp/jffi so. |
Do you know offhand what code selected the .so file here and how it decided which one it needed to select? Getting company-external access to a ppc64el machine is something I'm trying to push through our management. That is going to take a bit of time. We first have to reconfigure the machines to be accessible from outside our firewall (something we are planning for this cluster, for other collaborations, but have not done yet). And then there is the process of getting you and other collaborators approved for access to these machines. We realize that since we are serious about the OpenPower initiative, we are going to have to be proactive about making resources available for the open source community to solve problems like this. For now, you have me to work through. |
ok, this is interesting: I traced the problem into NativeRuntime and instrumented the following code:
I also instrumented the code that called it.
Here is the telemetry:
It would appear that the attempt to access SingletonHolder.INSTANCE throws an exception. Also, there is no trace of the stdout output from this code:
That appears to imply that this static initializer is not getting called:
I was under the impression that private static final stuff would be called prior to any other code being initialized. Of course, that could also mean that the System.out is not initialized at this point either. Is there a way to trace the code during the static init phases? |
Static final stuff will run when the class is first accessed. Any classes the static initializer(s) run will also run their static init, and so on. So if you're not getting past the |
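To make the static-initialization behavior concrete, here is a minimal sketch in plain Java. The class names `Holder` and `StaticInitDemo` are hypothetical and this is not jnr-ffi code. It shows that a holder class's static initializer runs on first access, and that an exception thrown there reaches the caller as `ExceptionInInitializerError` with the real failure as its cause.

```java
// Hypothetical names throughout; this is not jnr-ffi code.
class Holder {
    static final Object INSTANCE;

    static {
        // Runs once, the first time Holder is actively used.
        System.err.println("Holder static init starting");
        INSTANCE = create();
    }

    private static Object create() {
        // Stand-in for a constructor that fails during setup.
        throw new IllegalStateException("simulated failure in constructor");
    }
}

class StaticInitDemo {
    /** Touches Holder.INSTANCE; returns the root cause of an init failure, or null. */
    static Throwable probe() {
        try {
            Object unused = Holder.INSTANCE; // triggers Holder's static initializer
            return null;
        } catch (ExceptionInInitializerError e) {
            return e.getCause(); // the exception thrown inside the static block
        }
    }

    public static void main(String[] args) {
        Throwable cause = probe();
        if (cause != null) {
            cause.printStackTrace(); // shows where initialization really failed
        }
    }
}
```

Printing to System.err at the top of the static block and unwrapping `getCause()` at the access site is one way to trace failures during static init. Note that any later access to the same class fails with `NoClassDefFoundError` ("Could not initialize class ...") instead, so the first error is the informative one.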
From your description, the static function should be able to do a print. We don't see the output for the print in this instrumentation:
which seems to imply something important, and maybe explains the exception when the access happens... |
seems to be dying in the AbstractRuntime constructor, investigating. |
I did this instrumentation:
And got
So that says this statement is throwing the exception.
I'm not seeing the "ordinal" function in the jnr-ffi code... is this some basic Java function? I see that ADDRESS is part of:
|
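For reference, `ordinal()` is a standard method on every Java enum (`java.lang.Enum.ordinal()`); it returns the constant's zero-based position in the declaration. A common pattern is to use it as an index into a lookup array. The sketch below uses a hypothetical `Kind` enum and `NAMES` table, not jnr-ffi's actual types, and it assumes (this is not confirmed by the trace above) that one way such a lookup can fail is an unfilled slot.

```java
// Hypothetical enum and table for illustration; not jnr-ffi's types.
enum Kind { SCHAR, SINT, ADDRESS }

class OrdinalDemo {
    // One slot per enum constant, indexed by ordinal().
    static final String[] NAMES = new String[Kind.values().length];

    static {
        NAMES[Kind.SCHAR.ordinal()] = "signed char";
        NAMES[Kind.SINT.ordinal()] = "signed int";
        // ADDRESS is deliberately left unset to mimic a table that was not fully populated.
    }

    /** Returns the length of the name for a kind; throws NullPointerException if the slot is empty. */
    static int nameLength(Kind k) {
        return NAMES[k.ordinal()].length();
    }

    public static void main(String[] args) {
        System.out.println(Kind.ADDRESS.ordinal()); // position of ADDRESS in the declaration
        System.out.println(nameLength(Kind.SINT));
        try {
            nameLength(Kind.ADDRESS);
        } catch (NullPointerException e) {
            System.err.println("empty slot for " + Kind.ADDRESS);
        }
    }
}
```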
Now i'm confused: I traced the problem back into what fills out the types array in
I instrumented the code TypeDelegate:
And it generated the following telemetry:
This appears to imply that this function throws an exception.
Did I just chase this problem back into another library, the implementation of com.kenai.jffi.Type? If so, how do I go about debugging...? |
I'm not sure if this is significant or not, but I see this in the logs output:
And the system property os.arch is being picked up by the application and its supporting libraries at some point. I pulled the source code from maven and found the line of code that prints:
That comes from the maven code in CLIReporting.java here:
That pulls the information from:
If i do the same in the jnr-jffi code we see:
This would indicate that at least part of the stuff maven relies on misreports the OS architecture. I am more concerned about what is putting this system property information into the system, as that is likely screwing something up as well. |
Yikes, this comes all the way from the OpenJDK implementation: I wrote a very simple Java properties dumper.
Ran it and we get:
I figure that the system can't figure out which of the jffi-1.2.8-native.jar classes to load, although I have not caught it in the act of doing that yet. So now to figure out why this is not reporting the correct value and how to get it fixed... |
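The dumper used above was not preserved in this thread. A minimal sketch of what such a properties dumper could look like follows; the class name `PropsDump` is hypothetical, and `native.byte.order` is a label made up for this example, not a real system property. It prints the platform-related properties next to the JVM's native byte order, which is what distinguishes ppc64 from ppc64le when os.arch does not.

```java
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the dumper that was actually run in this thread.
class PropsDump {
    /** Collects platform-related system properties plus the native byte order. */
    static Map<String, String> collect() {
        Map<String, String> out = new LinkedHashMap<>();
        String[] keys = {"os.name", "os.arch", "os.version", "java.vendor", "java.version"};
        for (String key : keys) {
            out.put(key, System.getProperty(key));
        }
        // "native.byte.order" is a made-up label, not a real system property.
        out.put("native.byte.order", ByteOrder.nativeOrder().toString());
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : collect().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```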
The same issue happens with java7 as well as java8. |
Ok, so I guess what we need to do is add logic under the "ppc64" section to check the platform endianness, since OpenJDK appears to report "ppc64" regardless of endianness. I'll make that change and you can try it out. You mentioned some deeper problems in the other PR...will you open another issue for those problems? |
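A rough sketch of the kind of check proposed here, assuming the JVM's native byte order is a reliable signal: when os.arch is reported as "ppc64", consult `java.nio.ByteOrder.nativeOrder()` to tell the two variants apart. The class `ArchResolver` and the returned name "ppc64le" are assumptions for illustration; this is not the change that was actually committed to jnr-ffi or jffi.

```java
import java.nio.ByteOrder;

// Hypothetical helper; not the actual jnr-ffi/jffi platform detection code.
class ArchResolver {
    /** Maps a reported os.arch plus a byte order to a more specific architecture name. */
    static String resolve(String osArch, ByteOrder order) {
        if ("ppc64".equals(osArch) && ByteOrder.LITTLE_ENDIAN.equals(order)) {
            return "ppc64le"; // the JVM reported plain ppc64 on a little-endian system
        }
        return osArch; // every other case passes through unchanged
    }

    /** Resolves the architecture of the running JVM. */
    static String current() {
        return resolve(System.getProperty("os.arch"), ByteOrder.nativeOrder());
    }

    public static void main(String[] args) {
        System.out.println("os.arch  = " + System.getProperty("os.arch"));
        System.out.println("resolved = " + current());
    }
}
```

Keeping the mapping in a pure function of (os.arch, byte order) means it can be exercised on any machine, without access to ppc64le hardware.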
That would be one way to try to fix that now. However, it would only fix this module's attempt at finding the proper jar modules, and not others. For ppc64el support on openJDK, the openJDK distribution from ubuntu needs to be fixed. It should be reporting the same value as uname -p, which on the ubuntu ppc64le system reports:
The proper fix for this is to get the openJDK distro fixed. |
I have the power to file OpenJDK bugs...I will file one for this issue. Meanwhile, it's certainly not wrong for us to detect LE in the ppc64 section and do the right thing with it. |
no, it would not be "wrong" to try to correct this in this library, but this library will NOT be the only thing that blows up because the os.arch does not match the uname -p output. |
Oops, my bug was a duplicate. Here's a better one: https://bugs.openjdk.java.net/browse/JDK-8073139 |
NOTE: this may not be an openJDK issue but an issue with how that openJDK gets into the ubuntu distro... I took a look at the openJDK source files, and it would appear that the ARCH value is supposed to come from the uname -p command during the build of the binaries. |
If OpenJDK is not at fault then it could be that Ubuntu is cross-compiling the ppc64le binaries without compensating for uname, maybe? |
This bug: https://bugs.openjdk.java.net/browse/JDK-8073139 is interesting. It appears to imply that there was a decision not to give the architecture a unique name. Maybe my analysis that it came from uname -p in the openjdk makefiles is incorrect. One interesting thing about this is that the IBM jdk reports os.arch the same way that uname -p does. So, can we find out from openJDK whether os.arch is working as designed, or whether there is going to be a change to this? Or was this some distribution error? |
I took a look at the proposed code change, and I was incorrect, this is not a distro error, but an error in how the openJDK was done. It would be good to get some idea when this will be fixed, otherwise, a lot of java code is going to just break. |
@gnu-andrew has a webrev (patch) on that issue you could use to build your own OpenJDK with the correct arch...perhaps we could try that for now so we can progress to the next issue? |
We could work the problem on the IBM jdk I have installed on this system if you like? That would be a lot less work than figuring out how to build and patch the openJDK. Or do you think sticking with the openJDK is the only path we should pursue here? |
Oh sure, using IBM JDK is probably a fine way to go too! FWIW, building OpenJDK is easy...once you clone it and run the "get_sources.sh" script (if I remember the name right), it's just ./configure and then make all. |
two potential things to tackle here. Should I work off of the code I forked, where I inserted the Missing Type aliases, or the master branch... should I make a pull request for the ppc64 changes, since only the ppc64le changes were made to the tree? |
Also, is there a better way for us to collaborate, than exchanging e-mail or bouncing messages through this issue? |
@ralphbellofatto Two suggestions:
|
Added ticket: #36 to deal with |
I think this is fixed now. |
I just downloaded the test on a ppc64le system running ubuntu 14.10 and the summary test results show:
|
We are attempting to get jnr-ffi to work on the openPower ppc64el architecture. After adding the TypeAlias.java file and adding PPC64EL to the Platform.java file, we were able to get a clean compile and the tests starting up without complaining that it can't find the TypeAlias file.
However, when we attempt a maven test, they ALL fail. Tracing this, we found that all tests fail during the LoadLibrary section, so we isolated just this test, as all tests do this as their first step.
when we execute we see:
Enabling -X on this we see: