bug in lucene RecordId property #6459

publicocean0 · 2016-07-23T18:49:34Z

OrientDB Version, operating system, or hardware.

v2.2.6 SNAPSHOT

Operating System

[x] Linux
[ ] MacOSX
[ ] Windows
[ ] Other Unix
[ ] Other, name?

Expected behavior and actual behavior

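Expected: a plain query on ref returns the matching rows.

select from C where ref = #33:360
12 rows

Actual: the same lookup through the Lucene index returns nothing, whichever escaping is used.

Test 1:
select from C WHERE [index, ref,startOffset,endOffset] LUCENE "ref:#33\\:360"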
0 rows
select from C WHERE [index, ref,startOffset,endOffset] LUCENE "ref:'#33\\:360'"
0 rows
select from C WHERE [index, ref,startOffset,endOffset] LUCENE "ref:#33:360"
0 rows
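
For reference, the escaped form used in the first query of test 1 can be produced programmatically. The sketch below is a hand-rolled, stdlib-only helper (the class and method names are hypothetical) that backslash-escapes the characters assumed here to be special in Lucene's classic query syntax; Lucene itself provides QueryParser.escape for the same purpose. It only shows how the query text is formed. It does not explain the 0 rows, since the escaped query above already fails. Inside an OrientDB SQL string literal the backslash is additionally doubled, as in the queries above.

```java
// Hypothetical helper, not part of OrientDB or Lucene.
final class RidQueryEscape {

    // Assumption: the special characters of Lucene's classic query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefixes every special character with a backslash.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 4);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds a "field:value" clause with the value escaped.
    static String fieldQuery(String field, String value) {
        return field + ":" + escape(value);
    }

    public static void main(String[] args) {
        // Prints: ref:#33\:360
        System.out.println(fieldQuery("ref", "#33:360"));
    }
}
```

Note that '#' is left untouched and only ':' is escaped, which matches the first variant tried in test 1.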

publicocean0 · 2016-07-24T10:45:20Z

Suggestions for new engine

I looked into the source a bit to understand the reason for these index bugs.

There are a lot of problems:

1. Inside a transaction the Lucene Document is built 3 times (a performance degradation on multiple insertions). I did not investigate this part in depth, but it seems there is a bug. The index entry could be inserted just before the document insertion completes (after the RID is created, not before), because inserting a document together with its index entries is an atomic operation. That would mean building the document just once, not 3 times.
2. A select can produce a completely wrong result depending on the order of "[property1, property2]" in the query. This part seems to be used only for the order of the OCompositeKey items.
3. The RID is not inserted because, instead of an OCompositeKey with a single RID, an OCompositeKey with multiple keys is built. This bug is again related to key mapping (the ADT behind OCompositeKey is too simple).
4. OCompositeKey and the key container model seem to cause a lot of problems because the container is simply a list and not a MultiValueMap (see for example https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/map/MultiValueMap.html). There is a lot of code just for alignment, and a complex algorithm for generating the composite key and keeping the order (and sometimes this order can still be lost). Points 2 and 3 would also be solved indirectly by a multimap, with no additional code. The container could be built simply with a map, which would also improve performance and remove a lot of handling code. The Lucene Document itself is a multimap: it is the same abstract model used by the most famous index engine. This would also make it very simple to fix other bugs present in manual indexes (reported as manual index bug #6457), fix this bug, and handle future extensions easily.
5. Note that the searcher is already thread safe, so locking is not necessary.
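
The list-versus-multimap argument above can be illustrated with a small sketch. It is not OrientDB code and does not reproduce how OCompositeKey is actually built; the class names, field names, and values are hypothetical. It only shows why binding values to field names by position is sensitive to the order in which fields are listed, while a field-name to values multimap is not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: positional key list versus field-name multimap.
final class FieldKeySketch {

    // Positional model: the i-th value is bound to the i-th listed field.
    static Map<String, Object> bindByPosition(List<String> fields, List<Object> values) {
        Map<String, Object> bound = new LinkedHashMap<>();
        for (int i = 0; i < fields.size(); i++) {
            bound.put(fields.get(i), values.get(i));
        }
        return bound;
    }

    // Multimap model: every value is stored under its own field name,
    // and a field may hold several values.
    static final class FieldMultimap {
        private final Map<String, List<Object>> entries = new LinkedHashMap<>();

        void put(String field, Object value) {
            entries.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
        }

        List<Object> get(String field) {
            return entries.getOrDefault(field, Collections.emptyList());
        }
    }

    public static void main(String[] args) {
        // Values arrive in the order (country, name).
        List<Object> values = Arrays.asList("#10:0", "London");

        // Fields listed in the opposite order: values land on the wrong names.
        System.out.println(bindByPosition(Arrays.asList("name", "country"), values));

        // With a multimap, insertion order does not affect the binding.
        FieldMultimap key = new FieldMultimap();
        key.put("name", "London");
        key.put("country", "#10:0");
        System.out.println(key.get("country"));
    }
}
```

In the positional call the name field ends up holding "#10:0", which is the kind of mismatch described in point 2. The multimap lookup returns the country value regardless of the order in which entries were added.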

wolf4ood · 2016-07-24T11:55:35Z

Hi @publicocean0,

Which type is ref?

1. Actually it should be 2 times: once for the insert in the TX and once at commit time.
2. The select does not take the order of [property1, property2] in the query into account. That notation is the only way to trigger the Lucene index. See http://orientdb.com/docs/2.2/Full-Text-Index.html#working-with-multiple-fields for how to search a single field in a query.

3 and 4) OCompositeKey is an OrientDB concept. It works just fine for the native OrientDB indexes and, as far as I know, less well with external indexes like Lucene.

publicocean0 · 2016-07-24T12:36:41Z

Hi @maggiolo00

ref type is LINK.

OK, so 3 times globally.
I deduced it by testing in debug mode inside OrientDB. If I put [property2,property] in the query, the document is built wrong (I saw the createField method called with a value associated with the wrong field name). In the past I developed a small framework with Lucene for indexing a persisted cache, and I did not need any extra info for querying. I don't know what you mean by "That is the only way to make trigger the lucene index". Searching a single property was just an example to explain the bug (it does not work with multiple properties either).

3,4) In my opinion an ADT (abstract data type) should be designed to work well with all the use cases. Otherwise you will lose too much time fixing (if possible) side effects in the bad cases.

wolf4ood · 2016-07-24T13:56:47Z

hi @publicocean0

Why 3 times globally?

Do you have a test case to reproduce the [property2,property] issue?

What do you mean by extra info for querying?

What I meant is that the notation [property,property1] is necessary in order to enable the indexed query from the query engine.

publicocean0 · 2016-07-30T20:04:35Z

You can create a single test in the Lucene project with a query over 2 properties of a document (this is obviously a simplified test for a multi-field query, based on your test sources):

City.name
City.country, where country is a link to a Country document

country and name are mapped in the index schema.

Country.name is another property.

You want to search in Lucene for all cities with country=#10:0, where #10:0 is a valid country.

select from city where ['country','name'] lucene "country:#10:0 and name:London"
select from city where [name,'country'] lucene "country:#10:0 and name:London"

In my case this query is sufficient, but this example also shows another current limitation that was already reported (possible enhancements for next releases #5136, subsection INDEX ON SUBNODE):

select from city where ['country','name'] lucene "country.name:'UK' and name:London"

Here country.name would be a field mapped in the index schema, and this is not possible.

publicocean0 · 2016-08-04T20:37:44Z

Another error found: #6524. Lucene allows searching in collections.

robfrank · 2016-08-04T20:49:24Z

If you have some snippets of code or SQL to share with us, it would be much appreciated.

publicocean0 · 2016-08-04T21:22:20Z

Regarding my last post: it is very simple. You can create a simplified example with a Lucene schema over 2 fields: field1 is a map|list, field2 is a list|map.
It is a bug related to the ADT model discussed above in "Suggestions for new engine".

robfrank · 2016-08-04T21:39:15Z

Have you got some snippets with these examples to speed up my work?
Regarding the suggestions: the Lucene index implements OrientDB's internal interface and behaviour.
The implementation enables search to work inside a transaction and handles "delta" changes to documents. We can improve/fix the implementation, but it is quite impossible to redesign the Lucene index implementation.

publicocean0 · 2016-08-06T09:02:37Z

My suggestion was not to redesign the Lucene index, but to replace the ADT above it. The Lucene index implementation would remain practically identical.

lvca assigned wolf4ood and robfrank and unassigned wolf4ood Jul 23, 2016

lvca added this to the 2.2.x (next hotfix) milestone Jul 23, 2016

lvca added the bug label Sep 23, 2016

wolf4ood assigned wolf4ood and unassigned robfrank Nov 21, 2016

andrii0lomakin closed this as completed Aug 5, 2021
