Cannot import some Lucene classes using OpenJDK 11 #838

afalquina · 2020-08-07T17:35:49Z

I am using Lucene 8.6.0 on a project of mine. I am using Python 3.8.2 on Pop! OS (Ubuntu) and Python 3.8.5 on RHEL 7.8.

The following code fails on OpenJDK 11 and OpenJDK 14 but works just fine on OpenJDK 8:

$ export CLASSPATH=lib/lucene-core-8.6.0.jar
$ python
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jpype
>>> import jpype.imports
>>> print(jpype.getDefaultJVMPath())
/usr/lib/jvm/java-14-openjdk-amd64/lib/server/libjvm.so
>>> jpype.startJVM(jpype.getDefaultJVMPath())
>>> from org.apache.lucene.search import BooleanClause
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 652, in _load_unlocked
AttributeError: type object 'org.apache.lucene.search.BooleanClause' has no attribute 'loader'

Am I doing something wrong?

Thrameos · 2020-08-07T17:52:47Z

I see nothing obviously wrong with the line. The error trace shows the failure is not in JPype code but in the Python import bootstrap. So my guess is that something is interfering with the loading process (such as a directory or module named "org" in the Python path).

I would proceed by using JClass to perform the class load instead of the import. If it works, then the issue is something in the Python loading system. I would start that debugging by just importing "org" and seeing whether it has a "__file__" attribute, which would show where it is coming from. Repeat the process for org.apache and so forth. You can also add a few "print" statements to jpype/imports.py to figure out the difference in the path that was taken up to that import statement.
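
The check for a stray "org" on the Python path can be sketched without starting the JVM at all. This is only an illustration, not part of JPype: the helper name find_python_shadow is made up, and it only looks at ordinary file-system entries on sys.path, since that is the case suspected above.

```python
import importlib.machinery


def find_python_shadow(name, search_path=None):
    """Return file-system locations that would provide a Python module `name`.

    Hypothetical helper: it asks only the standard path-based finder, so
    packages served by JPype's own import hook are not reported. An empty
    list means nothing on the path shadows the Java package name.
    """
    # search_path=None makes PathFinder fall back to sys.path.
    spec = importlib.machinery.PathFinder.find_spec(name, search_path)
    if spec is None:
        return []
    if spec.submodule_search_locations is not None:
        # Regular or namespace package: report its directories.
        return list(spec.submodule_search_locations)
    # Plain module: report the file it would be loaded from.
    return [spec.origin]


if __name__ == "__main__":
    for top_level in ("org", "com", "java"):
        print(top_level, find_python_shadow(top_level))
```

If this prints a non-empty list for "org", a local directory or module is a likely suspect. An empty list points back toward the Java side.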

afalquina · 2020-08-07T18:26:29Z

I'll try that. What baffles me, though, is that it works with OpenJDK 8. Just changing to OpenJDK 14 triggers the error.

afalquina · 2020-08-07T18:39:15Z

OK. I have tried the following. First with OpenJDK 14:

$ python
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jpype
>>> import jpype.imports
>>> jpype.startJVM(jpype.getDefaultJVMPath())
>>> import org
>>> dir(org)
['apache', 'graalvm', 'ietf', 'jcp', 'w3c', 'xml']
>>> import org.apache
>>> dir(org.apache)
['lucene']
>>> import org.apache.lucene
>>> dir(org.apache.lucene)
['analysis', 'codecs', 'document', 'index', 'search', 'store', 'util']
>>> import org.apache.lucene.search
>>> dir(org.apache.lucene.search)
['TopFieldCollector']

And then with OpenJDK 8:

$ python                                
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jpype
>>> import jpype.imports
>>> jpype.startJVM(jpype.getDefaultJVMPath())
>>> import org
>>> dir(org)
['apache', 'classpath', 'ietf', 'jcp', 'omg', 'w3c', 'xml']
>>> import org.apache
>>> dir(org.apache)
['lucene']
>>> import org.apache.lucene
>>> dir(org.apache.lucene)
['LucenePackage', 'analysis', 'codecs', 'document', 'geo', 'index', 'search', 'store', 'util']
>>> import org.apache.lucene.search
>>> dir(org.apache.lucene.search)
['AutomatonQuery', 'BlendedTermQuery', 'BlockMaxDISI', 'BooleanClause', 'BooleanQuery', 'BoostAttribute', 'BoostAttributeImpl', 'BoostQuery', 'BulkScorer', 'CachingCollector', 'CollectionStatistics', 'CollectionTerminatedException', 'Collector', 'CollectorManager', 'ConjunctionDISI', 'ConstantScoreQuery', 'ConstantScoreScorer', 'ConstantScoreWeight', 'ControlledRealTimeReopenThread', 'DisiPriorityQueue', 'DisiWrapper', 'DisjunctionDISIApproximation', 'DisjunctionMaxQuery', 'DocIdSet', 'DocIdSetIterator', 'DocValuesFieldExistsQuery', 'DocValuesRewriteMethod', 'DoubleValues', 'DoubleValuesSource', 'Explanation', 'FieldComparator', 'FieldComparatorSource', 'FieldDoc', 'FieldValueHitQueue', 'FilterCollector', 'FilterLeafCollector', 'FilterMatchesIterator', 'FilterScorable', 'FilterScorer', 'FilterWeight', 'FilteredDocIdSetIterator', 'FuzzyQuery', 'FuzzyTermsEnum', 'ImpactsDISI', 'IndexOrDocValuesQuery', 'IndexSearcher', 'LRUQueryCache', 'LeafCollector', 'LeafFieldComparator', 'LeafSimScorer', 'LiveFieldValues', 'LongValues', 'LongValuesSource', 'MatchAllDocsQuery', 'MatchNoDocsQuery', 'Matches', 'MatchesIterator', 'MatchesUtils', 'MaxNonCompetitiveBoostAttribute', 'MaxNonCompetitiveBoostAttributeImpl', 'MultiCollector', 'MultiCollectorManager', 'MultiPhraseQuery', 'MultiTermQuery', 'NGramPhraseQuery', 'NamedMatches', 'NormsFieldExistsQuery', 'PhraseQuery', 'PointInSetQuery', 'PointRangeQuery', 'PositiveScoresOnlyCollector', 'PrefixQuery', 'Query', 'QueryCache', 'QueryCachingPolicy', 'QueryRescorer', 'QueryVisitor', 'ReferenceManager', 'RegexpQuery', 'Rescorer', 'Scorable', 'ScoreCachingWrappingScorer', 'ScoreDoc', 'ScoreMode', 'Scorer', 'ScorerSupplier', 'ScoringRewrite', 'SearcherFactory', 'SearcherLifetimeManager', 'SearcherManager', 'SegmentCacheable', 'SimpleCollector', 'SimpleFieldComparator', 'Sort', 'SortField', 'SortRescorer', 'SortedNumericSelector', 'SortedNumericSortField', 'SortedSetSelector', 'SortedSetSortField', 'SynonymQuery', 'TermInSetQuery', 
'TermQuery', 'TermRangeQuery', 'TermStatistics', 'TimeLimitingCollector', 'TopDocs', 'TopDocsCollector', 'TopFieldCollector', 'TopFieldDocs', 'TopScoreDocCollector', 'TopTermsRewrite', 'TotalHitCountCollector', 'TotalHits', 'TwoPhaseIterator', 'UsageTrackingQueryCachingPolicy', 'Weight', 'WildcardQuery', 'similarities', 'spans']

For some reason, the import finds far fewer classes on the newer JVM.

What can I do to investigate this further?

Thrameos · 2020-08-07T19:42:50Z

That gives me a very good start. Is the org.apache.lucene jar file publicly available (would I be able to replicate this myself)? The problem is likely in org.jpype.pkg.PackageManager, which is responsible for getting the list of packages. It was tested on OpenJDK from 8 to 11 and has had no issues, but if there was a change in Java or if something is going wrong in the code (an exception or the like), then I could see the behavior you describe happening.

The next step for you would be to see if you can load the class using JClass instead. If you can't, then the problem could be a class initializer problem rather than the import system. Knowing which side of the equation to look at will help.

Thrameos · 2020-08-07T19:44:51Z

I have one other idea. JPype only declares something as viewable if it is a public class, and it uses the byte code to figure that out. If there is a change in the byte code that my routine does not handle, I could see a failure. In the infinite wisdom of the original Jar format, you have to parse through 100 fields to get the public flag.
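
To illustrate why the public flag is awkward to reach: in a compiled class file the access flags come after the constant pool, and the pool has to be walked entry by entry because its entries vary in size. The following is a sketch based on the published class file layout, not JPype's actual routine (which is written in Java). The function names are made up for the example.

```python
import struct

ACC_PUBLIC = 0x0001

# Payload sizes in bytes for the fixed-size constant pool tags.
_FIXED_SIZES = {
    3: 4, 4: 4,        # Integer, Float
    5: 8, 6: 8,        # Long, Double (each occupies two pool slots)
    7: 2, 8: 2,        # Class, String
    9: 4, 10: 4, 11: 4, 12: 4,  # Fieldref, Methodref, InterfaceMethodref, NameAndType
    15: 3, 16: 2,      # MethodHandle, MethodType
    17: 4, 18: 4,      # Dynamic, InvokeDynamic
    19: 2, 20: 2,      # Module, Package
}


def class_access_flags(data):
    """Return the access_flags field of a class file given as bytes."""
    magic, _minor, _major, cp_count = struct.unpack_from(">IHHH", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file")
    pos = 10
    index = 1  # constant pool indices start at 1
    while index < cp_count:
        tag = data[pos]
        pos += 1
        if tag == 1:  # Utf8: two-byte length followed by that many bytes
            (length,) = struct.unpack_from(">H", data, pos)
            pos += 2 + length
        elif tag in _FIXED_SIZES:
            pos += _FIXED_SIZES[tag]
        else:
            raise ValueError("unknown constant pool tag %d" % tag)
        index += 2 if tag in (5, 6) else 1
    (flags,) = struct.unpack_from(">H", data, pos)
    return flags


def is_public_class(data):
    """True if the class file bytes describe a public class."""
    return bool(class_access_flags(data) & ACC_PUBLIC)
```

An unknown tag raises an error here rather than being skipped, because guessing its size would misalign every later entry. That is the kind of silent failure a byte code format change could cause.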

afalquina · 2020-08-07T20:16:54Z

The jar is available here. The file contains several jars. You'll need lucene-core-8.6.0.jar.

Is this what you meant when you said “use JClass”?

$ python                                
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jpype
>>> print(jpype.getDefaultJVMPath())
/usr/lib/jvm/java-14-openjdk-amd64/lib/server/libjvm.so
>>> jpype.startJVM(jpype.getDefaultJVMPath())
>>> BooleanClause = jpype.JClass("org.apache.lucene.search.BooleanClause")
>>> BooleanClause
<java class 'org.apache.lucene.search.BooleanClause'>

I am using the same jar on both JVM 8 and JVM 11/14, so I guess that the byte code is always the same. Can the byte code API have changed between JVMs?

Thrameos · 2020-08-07T20:19:35Z

BooleanClause = jpype.JClass('org.apache.lucene.search.BooleanClause')

Some jars do carry different byte code per JVM version, if the developers want to offer additional features on later versions. But it is pretty rare.

afalquina · 2020-08-07T20:38:12Z

Well, the JClass code works on both JVM 8 and JVM 14. At least it does not throw any exceptions…
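
Since JClass succeeds where the import fails, one possible stopgap on affected setups is to try the import first and fall back to a direct class load. This is a sketch only: load_class is a hypothetical helper, and the fallback is passed in as a parameter so the function itself does not depend on JPype.

```python
import importlib


def load_class(qualified_name, fallback):
    """Import `package.Class` via the import system, else call `fallback`.

    Hypothetical helper. `fallback` is any callable that takes the fully
    qualified name and returns the class, for example jpype.JClass.
    """
    package, _, simple = qualified_name.rpartition(".")
    try:
        module = importlib.import_module(package)
        return getattr(module, simple)
    except (ImportError, AttributeError, ValueError):
        # The import machinery could not produce the class by name;
        # hand the fully qualified name to the fallback loader instead.
        return fallback(qualified_name)
```

With the JVM already started, the call would look like load_class("org.apache.lucene.search.BooleanClause", jpype.JClass). Whether the import branch fails cleanly on every affected version is an assumption; the thread only confirms that JClass itself works.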

Thrameos · 2020-08-19T17:51:37Z

Still waiting on my development machine. I have not forgotten.

afalquina · 2020-08-20T07:43:20Z

Thanks! Is there anything I can do on my side?

Thrameos · 2020-11-14T05:03:43Z

I believe another post on Stack Overflow found that this has something to do with the length of the import. So something must be chopping the import string. I will try to run this to ground when I can replicate it.

afalquina · 2020-11-14T08:49:20Z

Thanks for the update. As always, is there anything I can do to help?

Thrameos · 2020-11-21T05:24:15Z

I investigated this bug. The answer isn't very satisfying. The jar file in question is a multi-version jar with both Java 8 and Java 9 layers.

Unfortunately, when I request the directory on Java 9, it only gives me the contents of the Java 9 layer and not the Java 8 layer. As the class specified only exists in the Java 8 layer, the requested class appears to be missing. JPype then tries to throw an exception by calling Java's forName. But when it does so, rather than getting an error, Java returns the class from the Java 8 layer. This causes the import system to panic, resulting in the incorrect error report.

The bug is not really in JPype, as it is calling getResources just as it should to get a directory of the contents. It is the JVM implementation that is incorrectly giving me empty contents. This is similar to the issue with an obfuscated jar where the directories were missing entirely.

So how do we go about addressing this issue? By the time we find the class it is already too late, as we were given a chance to produce a member before find_spec was called. Thus the only way to resolve it would be to try a forName when we do a get property and see if that resolves. Unfortunately, that will only work if the package structure is only one level deep.

Oddly, when I search for "/org/apache" it does the right thing and returns two directories. So I need to investigate further.

   777 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/SpanWeight$TermMatch.class
  8993 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/SpanWeight.class
  2391 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/SpanWithinQuery$SpanWithinWeight$1.class
  3261 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/SpanWithinQuery$SpanWithinWeight.class
  3102 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/SpanWithinQuery.class
  1851 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/Spans.class
  4004 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/TermSpans.class
   136 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/package-info.class
     0 Tue Jul 07 12:46:32 PDT 2020 META-INF/versions/9/org/apache/lucene/search/
  1455 Tue Jul 07 12:46:32 PDT 2020 META-INF/versions/9/org/apache/lucene/search/BooleanScorer$TailPriorityQueue.class
  3432 Tue Jul 07 12:46:32 PDT 2020 META-INF/versions/9/org/apache/lucene/search/PointInSetQuery$SinglePointVisitor.class
  6931 Tue Jul 07 12:46:32 PDT 2020 META-INF/versions/9/org/apache/lucene/search/PointRangeQuery$1.class
 14775 Tue Jul 07 12:46:32 PDT 2020 META-INF/versions/9/org/apache/lucene/search/TopFieldCollector.class
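
The same layering can be inspected from Python by reading the jar as a zip archive. The sketch below groups the top-level classes of one package by layer ("base" for the ordinary entries, or the version number under META-INF/versions/). The function name classes_by_layer is made up for this example, and it says nothing about how the JVM itself resolves the layers.

```python
import io
import zipfile
from collections import defaultdict


def classes_by_layer(jar, package):
    """Map each jar layer to the top-level class names it holds for `package`.

    `jar` is a path or a binary file-like object. Nested classes (names
    containing '$') and subpackages are skipped.
    """
    prefix = package.replace(".", "/") + "/"
    layers = defaultdict(set)
    with zipfile.ZipFile(jar) as zf:
        for name in zf.namelist():
            layer = "base"
            if name.startswith("META-INF/versions/"):
                parts = name.split("/", 3)
                if len(parts) < 4:
                    continue
                layer, name = parts[2], parts[3]
            if not name.startswith(prefix) or not name.endswith(".class"):
                continue
            simple = name[len(prefix):-len(".class")]
            if "/" in simple or "$" in simple:
                continue
            layers[layer].add(simple)
    return dict(layers)


if __name__ == "__main__":
    # Small in-memory jar that mimics the layout in the listing above.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("org/apache/lucene/search/BooleanClause.class", b"")
        zf.writestr("org/apache/lucene/search/TopFieldCollector.class", b"")
        zf.writestr("META-INF/versions/9/org/apache/lucene/search/TopFieldCollector.class", b"")
    buf.seek(0)
    print(classes_by_layer(buf, "org.apache.lucene.search"))
```

Run against lucene-core-8.6.0.jar with the package org.apache.lucene.search, the "9" layer should contain only TopFieldCollector, going by the listing above. That matches what dir() reported on OpenJDK 14.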

Thrameos · 2020-11-21T06:52:41Z

Okay, I believe I found a workaround that will fix this behavior in versions going forward. The bug was absolutely obnoxious, as there is nothing that would indicate that MRJAR files would do something like this. I looked into this several times, but reading the code and docs gave me no clues. Your example eventually led me to unpack the jar file, which showed me that the directory entries are being misreported by Java.

Thanks again for the bug report and sorry it took so long to find a resolution.

afalquina · 2020-11-21T13:23:08Z

Thank you!

Thrameos added the bug Unable to deliver desired behavior (crash, fail, untested) label Nov 14, 2020

Thrameos mentioned this issue Nov 21, 2020

Fix for MRJAR bug. #895

Merged

Thrameos closed this as completed in #895 Nov 26, 2020
