
Fix for cygwin setup on 64 bit. #302

Merged
merged 15 commits into jpype-project:master from Thrameos:cygwin-setup on Mar 28, 2018

Conversation

@Thrameos
Contributor

Thrameos commented Mar 27, 2018

Cygwin was having issues because it uses the win32 jni_md.h, which is not compatible. I bypassed this in the case where the native jni.h was found by supplying my own typedefs. However, in the testing framework the native one was not found, sending us to the fallback, which of course did not take the same bypass. Thus I have hardcoded Cygwin to always take the fallback, removing the jlong incompatibility.

@marscher

Collaborator

marscher commented Mar 27, 2018

I think it is better to include jni.h only where it is really needed. Is there a good reason to do it in jpype.h? This is a very central spot.

@Thrameos

Contributor

Thrameos commented Mar 27, 2018

It was hidden one layer lower in jp_utility.h, so either way we were getting jni.h. All I was doing was moving it one level higher, exposing the fact that we were pulling it in everywhere. We do need the types declared in jni.h, but we don't need anything else except in two of the wrapper files. I am not sure how we would get only the types.

@Thrameos

Contributor

Thrameos commented Mar 27, 2018

I can certainly move it back to the original place, but I moved it up when I had to grep to figure out how anything was getting jlong. Call it a style thing, but I don't really like huge headers like that pulled in several layers deep. If we are pulling in something big and nasty like that everywhere, which we did indirectly (jpype.h -> jp_utility.h -> jni.h), we should do it at the top level (jpype.h -> jni.h).

@marscher

Collaborator

marscher commented Mar 27, 2018

Ah, I see: jp_utility.h is also included in jpype.h.

@@ -10,43 +10,43 @@ environment:
   matrix:
   - PYTHON: "python3"
     CYGWIN: "C:\\cygwin"
-    JAVA_HOME: "C:\\Program Files (x86)\\Java\\jdk1.8.0"
+    JAVA_HOME: "C:\\jdk8"

@marscher

marscher Mar 27, 2018

Collaborator

Does chocolatey take care of the bitness of the host? Or are we now testing against only one bitness, e.g. 32-bit?

@Thrameos

Thrameos Mar 27, 2018

Contributor

I believe it does. Looking at the logs, cinst was passively installing jdk8 along with ant.

@Thrameos

Contributor

Thrameos commented Mar 27, 2018

Okay, I am going to have to mess with the AppVeyor pattern a bit. For some reason we are not consistently getting JAVA_HOME set up, so we are often failing to build our test suite. That is what triggered the jlong issue in the first place. Fixing the jlong problem just pushed the failure farther down the script, where ant failed to build the required harness.

I need to add more checks to install.sh so that we fail much earlier if something vital is wrong, or we will get led on wild goose chases like this.
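A fail-fast guard of the kind described above might look like the following (a hedged sketch; the function name and messages are illustrative, not the actual contents of install.sh):

```shell
# Illustrative fail-fast check for install.sh: bail out early if
# JAVA_HOME is unset or does not point at a usable JDK.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "error: JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "${JAVA_HOME}/bin/javac" ] && [ ! -x "${JAVA_HOME}/bin/javac.exe" ]; then
    echo "error: JAVA_HOME does not contain a JDK: ${JAVA_HOME}" >&2
    return 1
  fi
}
```

Calling such a check at the top of the script (and exiting on failure) would surface a broken JAVA_HOME immediately instead of letting ant fail much later in the build.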

@marscher

Collaborator

marscher commented Mar 27, 2018

@Thrameos

Contributor

Thrameos commented Mar 27, 2018

Yes, the base cause of the issue was that cinst put Java in a different place and appveyor.yml had a hard-coded path. This caused the build process to try to build without JAVA_HOME set, which triggered a whole bunch of code paths that I don't evaluate locally. Had it been clear what the original error was, we would have fixed it in the test harness.

That said my original code was a hack that should not exist and I am sure others triggered the bug just like the test harness did. So fixing it to use the same code path every time is likely the better solution.

I would like to try to work the following into the test pattern:

  1. Don't run a separate AppVeyor job for with and without numpy. Instead one job should build both and test them. AppVeyor jobs are really slow, and the actual build and test is a small portion of the whole.
  2. Test multiple versions of Java in the same pass (same reason as the previous point).
  3. Bomb-proof the process so that it bails quickly if something can't proceed, so it doesn't get stuck doing useless work.

Unfortunately, AppVeyor gets stuck so often that trying to debug it remotely is a real pain. The AppVeyor job queue insists on fully testing even if a patch has been superseded. It is going to take me a very long time to get those patterns fully tested. I will likely have to push in a test patch with all but one test disabled to speed up development.

@marscher

Collaborator

marscher commented Mar 27, 2018

This is why I asked @originell to give me AppVeyor rights, which he did in the meantime, so we can enable the rolling-build option. This one cancels builds for superseded patches/commits.
I don't know about the implications of merging separate job configs into one build run. I mean, the whole idea of having separate build boxes is to avoid messed-up configurations. We should use a tool like tox or conda-build (which I'd prefer) to set up build matrices in one box, as such a tool ensures (or at least does a good job of ensuring) that we have the exact configuration we want.

@Thrameos

Contributor

Thrameos commented Mar 27, 2018

I will defer to you on the testing matrix. I believe I can have more than one jpype module built in the same run, so something simple like trying with one switch or another could be done in the same build. I have multiple builds (win-py2, win-py3, cygwin-py2, and cygwin-py3) in the same directory on my development machine.

I agree it is nicest if we don't have to manage any special switching in the build process but instead use a pure matrix. However, if we really want to test (win, cygwin) x (x86, x64) x (2.7, 3.5, 3.6) x (JDK7, JDK8, JDK9) x (w/o numpy, w/ numpy), it is just going to be too big a matrix for AppVeyor. If I force multiple tests together, reducing this to (win, cygwin) x (x86, x64) x (2.7, 3.6) with evaluation of (JDK7, JDK8, JDK9) x (w/o numpy, w/ numpy), then at least the matrix is reasonable. It does mean calling ant once for each version of Java, and then calling nose for each of those builds.

If you can switch the pattern to use conda-build as an example, it would certainly help. CI is not something I do much, and it tends to be more of a distraction for me.
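The "call ant once per JDK, then nose for each build" idea could be sketched roughly as follows (the JDK paths are hypothetical, and the real ant/nose invocations are shown as comments since this is only a structural outline, not the project's actual CI script):

```shell
# Illustrative shape of a single CI job covering several JDKs.
# Paths are hypothetical; the real build/test commands are left as
# comments because this is only a sketch of the loop structure.
run_matrix() {
  for jdk in jdk7 jdk8 jdk9; do
    JAVA_HOME="C:/$jdk"
    echo "building test harness with JAVA_HOME=$JAVA_HOME"
    # ant -f test/build.xml       # build the Java test harness for this JDK
    # nosetests test              # run the Python suite against that build
  done
}
```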

@Thrameos

Contributor

Thrameos commented Mar 28, 2018

Okay, getting much closer. We are still getting a random failure in the AppVeyor environment, which is not reproducible, that is preventing Cygwin from finding jvm.dll. And we have successfully replicated the threading error on Cygwin python2. The obvious question is whether python2 was built with thread support.

Thrameos added some commits Mar 28, 2018

@originell

Collaborator

originell commented Mar 28, 2018

@marscher you should've just gotten an email from AppVeyor, I hope :) I added you with nearly all rights. Hope it helps. If not, I'll try to be super responsive today so we can move quickly on this.

Thrameos added some commits Mar 28, 2018

@Thrameos

Contributor

Thrameos commented Mar 28, 2018

I have now tried virtually every combination I can think of. I will have to find a way to replicate the problem. The mystery to me is that I know we have had this working in the past. Given that we never specified the correct 32-bit JVM with the 32-bit Python, how could this ever have worked?

@Thrameos

Contributor

Thrameos commented Mar 28, 2018

Okay, as if the 32-bit failure wasn't bad enough, when I disabled that test the second test gave a false negative...

testPublicMethodPropertyOverlap (test.jpypetest.PropertiesTestCase) ... ok
Test multiple proxy calls with different threads at the same time. ... Fatal Python error: This thread state must be current when releasing
XXX lineno: 220, opcode: 0
XXX lineno: 220, opcode: 0
java.lang.RuntimeException: Python exception thrown: TypeError: attribute name must be string, not 'type'
	at jpype.JPypeInvocationHandler.hostInvoke(Native Method)
	at jpype.JPypeInvocationHandler.invoke(JPypeInvocationHandler.java:10)
	at com.sun.proxy.$Proxy1.testMethodObject(Unknown Source)
	at jpype.proxy.ProxyExecutor$ProxyCaller.call(Unknown Source)
	at jpype.proxy.ProxyExecutor$ProxyCaller.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
java.lang.RuntimeException: Python exception thrown: SystemError: unknown opcode
	at jpype.JPypeInvocationHandler.hostInvoke(Native Method)
	at jpype.JPypeInvocationHandler.invoke(JPypeInvocationHandler.java:10)
	at com.sun.proxy.$Proxy1.testMethodObject(Unknown Source)
	at jpype.proxy.ProxyExecutor$ProxyCaller.call(Unknown Source)
	at jpype.proxy.ProxyExecutor$ProxyCaller.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
java.lang.RuntimeException: Python exception thrown: SystemError: unknown opcode
	at jpype.JPypeInvocationHandler.hostInvoke(Native Method)
	at jpype.JPypeInvocationHandler.invoke(JPypeInvocationHandler.java:10)
	at com.sun.proxy.$Proxy1.testMethodObject(Unknown Source)
	at jpype.proxy.ProxyExecutor$ProxyCaller.call(Unknown Source)
	at jpype.proxy.ProxyExecutor$ProxyCaller.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
XXX lineno: 220, opcode: 0
XXX lineno: 220, opcode: 0
java.lang.RuntimeException: Python exception thrown: SystemError: unknown opcode
	at jpype.JPypeInvocationHandler.hostInvoke(Native Method)
	at jpype.JPypeInvocationHandler.invoke(JPypeInvocationHandler.java:10)
	at com.sun.proxy.$Proxy1.testMethodObject(Unknown Source)
	at jpype.proxy.ProxyExecutor$ProxyCaller.call(Unknown Source)
	at jpype.proxy.ProxyExecutor$ProxyCaller.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
[thread 1504 also had an error]
[thread 2432 also had an error]
[thread 1036 also had an error]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00000003fdae783a, pid=2560, tid=0x000000000000032c
#
# JRE version: Java(TM) SE Runtime Environment (8.0_162-b12) (build 1.8.0_162-b12)
@Thrameos

Contributor

Thrameos commented Mar 28, 2018

Something is bothering me about the Windows Anaconda 32-bit test.

Downloading JDK from http://download.oracle.com/otn-pub/java/jdk/8u162-b12/0da788060d494f5095bf8624735fa2f1/jdk-8u162-windows-x64.exe

Why is it using the 64-bit JDK?

@Thrameos

Contributor

Thrameos commented Mar 28, 2018

@marscher I have disabled all of the failing tests so that we have a baseline configuration for the CI. I placed an issue in the tracker to mark the defective platforms for now. Assuming the latest build passes, we should squash the branch and merge it in. We can then rebase some of those outstanding pulls to see what, if anything, is broken. I am leaving this unsquashed here so you can browse the failures to see the failure modes.

@marscher

Collaborator

marscher commented Mar 28, 2018

Thanks for your effort here. Is Cygwin really such a common platform? I mean we have Anaconda for Windows, which provides tons of packages (not restricted to Python itself).
Did you disable Travis in this PR?

@Thrameos

Contributor

Thrameos commented Mar 28, 2018

I use Cygwin mostly as a development platform because it is separate from my production line, which runs on Anaconda. The debugging tools are better for tracking down segfaults. I think I have seen several other users post issues regarding Cygwin, so it is certainly getting used, though likely just as a poor man's unix. That said, if something has problems on Cygwin it is not the end of the world. Issues that happen on 64-bit/python 3.6 will get cleaned up, as those are the paths I test, but if another contributor wants to keep the rest going I am fine with it.

I don't see anything in my changes that should have disabled Travis. It was certainly not intended.

@marscher

Collaborator

marscher commented Mar 28, 2018

Ok. Dunno why Travis is not being executed any more (e.g. too many builds, huh?). Maybe somebody will volunteer to set up Cygwin. But it seems to be quite a hassle.

@marscher marscher merged commit 205d9ff into jpype-project:master Mar 28, 2018

1 check was pending

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete

@Thrameos Thrameos deleted the Thrameos:cygwin-setup branch Mar 28, 2018
