Cannot import some Lucene classes using OpenJDK 11 #838

Closed
afalquina opened this issue Aug 7, 2020 · 15 comments · Fixed by #895
Labels
bug Unable to deliver desired behavior (crash, fail, untested)

Comments

@afalquina

I am using Lucene 8.6.0 on a project of mine. I am using Python 3.8.2 on Pop! OS (Ubuntu) and Python 3.8.5 on RHEL 7.8.

The following code fails on OpenJDK 11 and OpenJDK 14 but works just fine on OpenJDK 8:

$ export CLASSPATH=lib/lucene-core-8.6.0.jar
$ python
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jpype
>>> import jpype.imports
>>> print(jpype.getDefaultJVMPath())
/usr/lib/jvm/java-14-openjdk-amd64/lib/server/libjvm.so
>>> jpype.startJVM(jpype.getDefaultJVMPath())
>>> from org.apache.lucene.search import BooleanClause
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 652, in _load_unlocked
AttributeError: type object 'org.apache.lucene.search.BooleanClause' has no attribute 'loader'

Am I doing something wrong?

@Thrameos
Contributor

Thrameos commented Aug 7, 2020

I see nothing obviously wrong with the line. The thing is that the error trace shows the failure is not in JPype code but in the Python import bootstrap. So my guess is that something is interfering with the loading process (such as a directory or module named "org" on the Python path).

I would proceed by using JClass to perform the class load instead of the import. If that works, then the issue is something in the Python loading system. I would start that debugging by just importing "org" and checking whether it has a "__file__" attribute, so that I can see where it is coming from. Repeat the process for org.apache and so forth. You can also add a few "print" statements to jpype/imports.py to figure out the path that was taken up to that import statement.
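
The check for a shadowing "org" module can be sketched with the standard library alone. This is a minimal sketch, not part of JPype: the helper name `find_python_shadow` is hypothetical, and it relies on `importlib.machinery.PathFinder`, which only searches `sys.path`, so any import hook that jpype.imports installs is not consulted.

```python
import importlib.machinery


def find_python_shadow(name, path=None):
    """Return where a plain Python module or package called `name` lives.

    Only sys.path (or `path`, if given) is searched, so a result other
    than None means an ordinary Python module could shadow a Java package
    of the same name. `find_python_shadow` is a hypothetical helper name.
    """
    spec = importlib.machinery.PathFinder.find_spec(name, path)
    if spec is None:
        return None
    if spec.origin is not None:
        # Regular module or package: origin is the file it is loaded from.
        return spec.origin
    # Namespace package: no single file, so report the directories instead.
    return ", ".join(spec.submodule_search_locations)


if __name__ == "__main__":
    # None here suggests nothing named "org" on sys.path is in the way.
    print(find_python_shadow("org"))
```

If this prints a path, that directory or file is the first thing to rename or remove from the Python path before retrying the import.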

@afalquina
Author

I'll try that. What baffles me, though, is that it works with OpenJDK 8. Just changing to OpenJDK 14 triggers the error.

@afalquina
Author

OK. I have tried the following. First with OpenJDK 14:

$ python
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jpype
>>> import jpype.imports
>>> jpype.startJVM(jpype.getDefaultJVMPath())
>>> import org
>>> dir(org)
['apache', 'graalvm', 'ietf', 'jcp', 'w3c', 'xml']
>>> import org.apache
>>> dir(org.apache)
['lucene']
>>> import org.apache.lucene
>>> dir(org.apache.lucene)
['analysis', 'codecs', 'document', 'index', 'search', 'store', 'util']
>>> import org.apache.lucene.search
>>> dir(org.apache.lucene.search)
['TopFieldCollector']

And then with OpenJDK 8:

$ python                                
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jpype
>>> import jpype.imports
>>> jpype.startJVM(jpype.getDefaultJVMPath())
>>> import org
>>> dir(org)
['apache', 'classpath', 'ietf', 'jcp', 'omg', 'w3c', 'xml']
>>> import org.apache
>>> dir(org.apache)
['lucene']
>>> import org.apache.lucene
>>> dir(org.apache.lucene)
['LucenePackage', 'analysis', 'codecs', 'document', 'geo', 'index', 'search', 'store', 'util']
>>> import org.apache.lucene.search
>>> dir(org.apache.lucene.search)
['AutomatonQuery', 'BlendedTermQuery', 'BlockMaxDISI', 'BooleanClause', 'BooleanQuery', 'BoostAttribute', 'BoostAttributeImpl', 'BoostQuery', 'BulkScorer', 'CachingCollector', 'CollectionStatistics', 'CollectionTerminatedException', 'Collector', 'CollectorManager', 'ConjunctionDISI', 'ConstantScoreQuery', 'ConstantScoreScorer', 'ConstantScoreWeight', 'ControlledRealTimeReopenThread', 'DisiPriorityQueue', 'DisiWrapper', 'DisjunctionDISIApproximation', 'DisjunctionMaxQuery', 'DocIdSet', 'DocIdSetIterator', 'DocValuesFieldExistsQuery', 'DocValuesRewriteMethod', 'DoubleValues', 'DoubleValuesSource', 'Explanation', 'FieldComparator', 'FieldComparatorSource', 'FieldDoc', 'FieldValueHitQueue', 'FilterCollector', 'FilterLeafCollector', 'FilterMatchesIterator', 'FilterScorable', 'FilterScorer', 'FilterWeight', 'FilteredDocIdSetIterator', 'FuzzyQuery', 'FuzzyTermsEnum', 'ImpactsDISI', 'IndexOrDocValuesQuery', 'IndexSearcher', 'LRUQueryCache', 'LeafCollector', 'LeafFieldComparator', 'LeafSimScorer', 'LiveFieldValues', 'LongValues', 'LongValuesSource', 'MatchAllDocsQuery', 'MatchNoDocsQuery', 'Matches', 'MatchesIterator', 'MatchesUtils', 'MaxNonCompetitiveBoostAttribute', 'MaxNonCompetitiveBoostAttributeImpl', 'MultiCollector', 'MultiCollectorManager', 'MultiPhraseQuery', 'MultiTermQuery', 'NGramPhraseQuery', 'NamedMatches', 'NormsFieldExistsQuery', 'PhraseQuery', 'PointInSetQuery', 'PointRangeQuery', 'PositiveScoresOnlyCollector', 'PrefixQuery', 'Query', 'QueryCache', 'QueryCachingPolicy', 'QueryRescorer', 'QueryVisitor', 'ReferenceManager', 'RegexpQuery', 'Rescorer', 'Scorable', 'ScoreCachingWrappingScorer', 'ScoreDoc', 'ScoreMode', 'Scorer', 'ScorerSupplier', 'ScoringRewrite', 'SearcherFactory', 'SearcherLifetimeManager', 'SearcherManager', 'SegmentCacheable', 'SimpleCollector', 'SimpleFieldComparator', 'Sort', 'SortField', 'SortRescorer', 'SortedNumericSelector', 'SortedNumericSortField', 'SortedSetSelector', 'SortedSetSortField', 'SynonymQuery', 'TermInSetQuery', 
'TermQuery', 'TermRangeQuery', 'TermStatistics', 'TimeLimitingCollector', 'TopDocs', 'TopDocsCollector', 'TopFieldCollector', 'TopFieldDocs', 'TopScoreDocCollector', 'TopTermsRewrite', 'TotalHitCountCollector', 'TotalHits', 'TwoPhaseIterator', 'UsageTrackingQueryCachingPolicy', 'Weight', 'WildcardQuery', 'similarities', 'spans']

For some reason, the import finds far fewer classes on the newer JVM.

What can I do to investigate this further?

@Thrameos
Contributor

Thrameos commented Aug 7, 2020

That gives me a very good start. Is the org.apache.lucene jar file publicly available (would I be able to replicate this myself)? The problem is likely in org.jpype.pkg.PackageManager, which is responsible for getting the list of packages. It was tested on OpenJDK from 8 to 11 and has had no issues, but if there was a change in Java, or if something is going wrong in the code (an exception or the like), then I could see the behavior you describe happening.

The next step for you would be to see if you can load the class using JClass instead. If you can't do that, then the problem could be a class initializer problem rather than the import system. Knowing which side of the equation to look on will help.

@Thrameos
Contributor

Thrameos commented Aug 7, 2020

I have one other idea. JPype only declares something as viewable if it is a public class, and it uses the byte code to figure that out. If there is a change in the byte code that my routine does not handle, I could see it failing. In the infinite wisdom of the original Jar format, you have to parse through 100 fields to get the public flag.
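
To illustrate why reading the public flag takes so much parsing: in a compiled class file, the access flags come only after the variable-length constant pool, so every pool entry has to be skipped first. The following is a rough Python sketch of that walk based on the class file layout in the JVM specification. It is not JPype's actual routine, and the names `class_access_flags` and `is_public_class` are made up for this example.

```python
import struct

ACC_PUBLIC = 0x0001

# Payload size in bytes for each fixed-size constant pool tag.
_FIXED_SIZES = {
    3: 4, 4: 4,            # Integer, Float
    5: 8, 6: 8,            # Long, Double (each occupies two pool slots)
    7: 2, 8: 2,            # Class, String
    9: 4, 10: 4, 11: 4,    # Fieldref, Methodref, InterfaceMethodref
    12: 4,                 # NameAndType
    15: 3, 16: 2,          # MethodHandle, MethodType
    17: 4, 18: 4,          # Dynamic, InvokeDynamic
    19: 2, 20: 2,          # Module, Package
}


def class_access_flags(data: bytes) -> int:
    """Return the access_flags field of a compiled .class file."""
    magic, _minor, _major, pool_count = struct.unpack_from(">IHHH", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file")
    pos = 10
    index = 1  # constant pool indices start at 1
    while index < pool_count:
        tag = data[pos]
        pos += 1
        if tag == 1:  # Utf8: u2 length followed by that many bytes
            (length,) = struct.unpack_from(">H", data, pos)
            pos += 2 + length
        elif tag in _FIXED_SIZES:
            pos += _FIXED_SIZES[tag]
        else:
            raise ValueError(f"unknown constant pool tag {tag}")
        index += 2 if tag in (5, 6) else 1
    (flags,) = struct.unpack_from(">H", data, pos)
    return flags


def is_public_class(data: bytes) -> bool:
    """True if the class file declares a public class or interface."""
    return bool(class_access_flags(data) & ACC_PUBLIC)
```

A parser like this fails on any constant pool tag it does not know about, which is one way a newer class file version could trip up a routine written against an older one.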

@afalquina
Author

The jar is available here. The file contains several jars. You'll need lucene-core-8.6.0.jar.

Is this what you meant when you said “use JClass”?

$ python                                
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jpype
>>> print(jpype.getDefaultJVMPath())
/usr/lib/jvm/java-14-openjdk-amd64/lib/server/libjvm.so
>>> jpype.startJVM(jpype.getDefaultJVMPath())
>>> BooleanClause = jpype.JClass("org.apache.lucene.search.BooleanClause")
>>> BooleanClause
<java class 'org.apache.lucene.search.BooleanClause'>

I am using the same jar on both JVM 8 and JVM 11/14, so I guess that the byte code is always the same. Can the byte code API have changed between JVMs?

@Thrameos
Contributor

Thrameos commented Aug 7, 2020

BooleanClause = jpype.JClass('org.apache.lucene.search.BooleanClause')

Some jars do carry different byte code per JVM version, if the developers want to use additional features on later versions. But it is pretty rare.

@afalquina
Author

afalquina commented Aug 7, 2020 via email

@Thrameos
Contributor

Still waiting on my development machine. I have not forgotten.

@afalquina
Author

Thanks! Is there anything I can do on my side?

@Thrameos
Contributor

I believe another post on Stack Overflow found that this has something to do with the length of the import, so something must be chopping the import string. I will try to run this to ground when I can replicate it.

@Thrameos Thrameos added the bug Unable to deliver desired behavior (crash, fail, untested) label Nov 14, 2020
@afalquina
Author

Thanks for the update. As always, is there anything I can do to help?

@Thrameos
Contributor

I investigated this bug. The result is not very satisfying. The jar file in question is a multi-version jar with both Java 8 and Java 9 layers.

Unfortunately, when I request the directory on Java 9, it gives me only the contents of the Java 9 layer and not the Java 8 layer. As the requested class exists only in the Java 8 layer, it is missing from the listing. JPype then tries to raise an exception by calling Java's Class.forName. But when it does so, rather than getting an error, Java returns the class from the Java 8 layer. This causes the import system to panic, resulting in the incorrect error report.

The bug is not really in JPype, as it is calling getResources just as it should to get a directory of the contents. It is the JVM implementation that is incorrectly giving me empty content. This is similar to the issue with an obfuscated jar where the directories were missing entirely.

So how do we go about addressing this issue? By the time we find the class it is already too late, as we were given a chance to produce a member before find_spec was called. Thus the only way to resolve it would be to try a Class.forName when we do a get property and see if that resolves. Unfortunately, that will only work if the package structure is only one level deep.
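
The "try a lookup on get property" idea can be sketched in plain Python. This is only an illustration of the fallback pattern, not JPype's code: `LazyPackage` and `resolver` are hypothetical names, and a dictionary lookup stands in for the Class.forName call that a real implementation would make.

```python
class LazyPackage:
    """A package-like object whose listing may be incomplete.

    `listed` is what the directory scan reported; `resolver` is any
    callable that maps a fully qualified name to an object and raises
    LookupError when the name does not exist.
    """

    def __init__(self, name, listed, resolver):
        self._name = name
        self._listed = set(listed)
        self._resolver = resolver

    def __dir__(self):
        return sorted(self._listed)

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup has already failed.
        if attr.startswith("_"):
            raise AttributeError(attr)
        try:
            value = self._resolver(f"{self._name}.{attr}")
        except LookupError as exc:
            raise AttributeError(attr) from exc
        # Cache the result so later lookups and dir() see the member.
        self._listed.add(attr)
        setattr(self, attr, value)
        return value


if __name__ == "__main__":
    known = {"org.apache.lucene.search.BooleanClause": "<BooleanClause>"}
    pkg = LazyPackage("org.apache.lucene.search",
                      ["TopFieldCollector"], known.__getitem__)
    print(dir(pkg))           # BooleanClause is not listed yet
    print(pkg.BooleanClause)  # resolved on demand by the fallback
```

The fallback can only answer for a name that is asked for directly, which matches the limitation described above: a missing intermediate package would never be listed, so nothing below it would be requested.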

Oddly, when I search for "/org/apache" it does the right thing and returns two directories. So I need to investigate further.

   777 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/SpanWeight$TermMatch.class
  8993 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/SpanWeight.class
  2391 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/SpanWithinQuery$SpanWithinWeight$1.class
  3261 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/SpanWithinQuery$SpanWithinWeight.class
  3102 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/SpanWithinQuery.class
  1851 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/Spans.class
  4004 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/TermSpans.class
   136 Tue Jul 07 12:46:30 PDT 2020 org/apache/lucene/search/spans/package-info.class
     0 Tue Jul 07 12:46:32 PDT 2020 META-INF/versions/9/org/apache/lucene/search/
  1455 Tue Jul 07 12:46:32 PDT 2020 META-INF/versions/9/org/apache/lucene/search/BooleanScorer$TailPriorityQueue.class
  3432 Tue Jul 07 12:46:32 PDT 2020 META-INF/versions/9/org/apache/lucene/search/PointInSetQuery$SinglePointVisitor.class
  6931 Tue Jul 07 12:46:32 PDT 2020 META-INF/versions/9/org/apache/lucene/search/PointRangeQuery$1.class
 14775 Tue Jul 07 12:46:32 PDT 2020 META-INF/versions/9/org/apache/lucene/search/TopFieldCollector.class
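
The listing shows the shape of the problem: the same package directory appears once in the base layer and once under META-INF/versions/9/, and a complete view of the package needs both. As a rough sketch of that merge using Python's `zipfile` module (the function name `list_package_classes` is invented here, and this is not how JPype's PackageManager is implemented):

```python
import io
import zipfile


def list_package_classes(jar, package, java_version):
    """List top-level classes of `package`, merging multi-release layers.

    `jar` is a path or a file-like object. The base layer is always
    read; layers under META-INF/versions/N/ are added for every N from
    9 up to `java_version`.
    """
    prefix = package.replace(".", "/") + "/"
    roots = [""] + [f"META-INF/versions/{v}/"
                    for v in range(9, java_version + 1)]
    found = set()
    with zipfile.ZipFile(jar) as zf:
        for entry in zf.namelist():
            for root in roots:
                full = root + prefix
                if not entry.startswith(full):
                    continue
                rest = entry[len(full):]
                # Keep direct members only; skip subpackages, nested
                # classes ($), and package-info.
                if (rest.endswith(".class") and "/" not in rest
                        and "$" not in rest
                        and rest != "package-info.class"):
                    found.add(rest[:-len(".class")])
    return sorted(found)


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in (
            "org/apache/lucene/search/BooleanClause.class",
            "org/apache/lucene/search/TopFieldCollector.class",
            "META-INF/versions/9/org/apache/lucene/search/TopFieldCollector.class",
        ):
            zf.writestr(name, b"")
    print(list_package_classes(buf, "org.apache.lucene.search", 11))
```

Reading only the versioned directory would report TopFieldCollector alone, which is what the failing dir() output on OpenJDK 14 looked like.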

@Thrameos
Contributor

Okay, I believe I found a workaround that will fix this behavior in versions going forward. The bug was absolutely obnoxious, as there is nothing to indicate that MRJAR files would do something like this. I looked into this several times, but reading the code and docs gave me no clues. Your example eventually led me to unpack the jar file, which showed me that the directory entries are being misreported by Java.

Thanks again for the bug report and sorry it took so long to find a resolution.

@afalquina
Author

Thank you!
