bug in lucene RecordId property #6459

Closed
4 tasks
publicocean0 opened this issue Jul 23, 2016 · 10 comments
@publicocean0

OrientDB Version, operating system, or hardware.

  • v2.2.6 SNAPSHOT[ ]

Operating System

  • [x] Linux
  • MacOSX
  • Windows
  • Other Unix
  • Other, name?

Expected behavior and actual behavior

select from C where ref =#33:360
12 rows 



test 1: 
select from C  WHERE [index, ref,startOffset,endOffset] LUCENE "ref:#33\\:360"
0 rows
select from C WHERE [index, ref,startOffset,endOffset] LUCENE "ref:'#33\\:360'"
0 rows
select from C WHERE [index, ref,startOffset,endOffset] LUCENE "ref:#33:360"
0 rows
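
For reference, the escaped form used in test 1 can be derived mechanically. The sketch below is a minimal Python helper, not OrientDB or Lucene code: the function names are hypothetical, the character set is an assumption based on the special characters of Lucene's classic query syntax, and the backslash doubling simply follows the form used in the queries above. It only shows where `#33\\:360` comes from; as reported, the escaped query still returns 0 rows.

```python
# Characters treated as special by Lucene's classic query syntax (assumed set).
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene_term(term: str) -> str:
    """Backslash-escape every special character in a single query term."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def as_sql_string_body(term: str) -> str:
    """Double the backslashes, following the form used in the queries above."""
    return term.replace("\\", "\\\\")


rid = "#33:360"
escaped = escape_lucene_term(rid)  # '#' is left alone, ':' is escaped
print("ref:" + as_sql_string_body(escaped))
```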
@lvca lvca assigned wolf4ood and robfrank and unassigned wolf4ood Jul 23, 2016
@lvca lvca added this to the 2.2.x (next hotfix) milestone Jul 23, 2016
@publicocean0
Author

publicocean0 commented Jul 24, 2016

Suggestions for new engine

I looked through the source to understand the cause of these index bugs.

There are several problems:

  1. Inside a transaction, the Lucene Document is built 3 times, which degrades performance for multiple insertions. I did not investigate this part in depth, but it looks like a bug. The index entry could be written just before the document insertion completes (after the RID is created, not before), because inserting a document together with its index entries is one atomic operation. That means building the document once, not 3 times.
  2. A select can return completely wrong results depending on the order of "[property1, property2]" in the query. This part seems to be used only for the order of the OCompositeKey items.
  3. The RID is not indexed because, instead of an OCompositeKey holding a single RID, an OCompositeKey with multiple keys is built. This bug is again related to key mapping (the ADT behind OCompositeKey is too simple).
  4. OCompositeKey and the key container model seem to cause many problems because the key is a simple list and not a MultiValueMap (for example https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/map/MultiValueMap.html). There is a lot of code just for alignment, plus a complex algorithm to generate the composite key and keep the order (and sometimes this order can be lost). Points 2 and 3 would also be solved indirectly by a multimap, with no additional code.
    The key could be built simply with a map, which would also improve performance and remove a lot of handling code. The Lucene Document itself is a multimap: it is the same abstract model used by the most popular index engine. This would also make it simple to fix other bugs in manual indexes (reported as manual index bug #6457), fix this bug, and handle future extensions easily.
  5. Note that the searcher is already thread safe, so a lock is not necessary.
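
To make point 4 concrete, here is a minimal Python sketch of a field-name to values container. It is an illustration of the multimap idea only: `MultiValueKey` and `build_key` are hypothetical names, the offset values are made up, and nothing here reflects how OCompositeKey or the Lucene Document is actually implemented.

```python
from collections import defaultdict


class MultiValueKey:
    """Hypothetical container mapping each field name to a list of values."""

    def __init__(self):
        self._fields = defaultdict(list)

    def add(self, field, value):
        # A field may receive several values, e.g. for a collection property.
        self._fields[field].append(value)

    def get(self, field):
        # Use dict.get so that looking up a missing field does not create it.
        return list(self._fields.get(field, []))

    def as_dict(self):
        return {field: list(values) for field, values in self._fields.items()}


def build_key(pairs):
    """Build a key from (field, value) pairs given in any order."""
    key = MultiValueKey()
    for field, value in pairs:
        key.add(field, value)
    return key


# The same fields supplied in two different orders (illustrative values).
a = build_key([("ref", "#33:360"), ("startOffset", 0), ("endOffset", 12)])
b = build_key([("endOffset", 12), ("ref", "#33:360"), ("startOffset", 0)])
print(a.as_dict() == b.as_dict())
```

Because values are looked up by field name, the order in which fields are supplied does not affect which value belongs to which field.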

@wolf4ood
Member

hi @publicocean0

which type is ref?

  1. Actually it should be 2 times: once for the insert in the TX and
    once at commit time.

  2. The select does not take the order of [property1, property2] in the query into account. That notation is the only way to trigger the Lucene index. See
    http://orientdb.com/docs/2.2/Full-Text-Index.html#working-with-multiple-fields
    for how to search a single field in a query.

3 and 4) OCompositeKey is an OrientDB concept. It works just fine for the native OrientDB indexes and, I know, less well with external indexes like Lucene.

@publicocean0
Author

publicocean0 commented Jul 24, 2016

Hi @maggiolo00

ref type is LINK.

  1. OK, so 3 times globally.

  2. I deduced it by testing in debug mode inside OrientDB. If I put [property2,property] in the query, the document is built wrongly (I saw the createField method called with a value associated with the wrong field name). In the past I developed a small framework with Lucene for indexing a persisted cache, and I did not need any extra info for querying. I don't know what you mean by "That is the only way to make trigger the lucene index". Searching a single property was just an example to explain the bug (it does not work with multiple properties either).

3,4) In my opinion an ADT (abstract data type) should be designed to work well with all the use cases; otherwise you will lose too much time fixing (if possible) the side effects in the bad cases.
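
The behaviour described in point 2 can be pictured with a small Python sketch. This is an assumption-laden illustration of the reported symptom, not OrientDB's code: `label_by_position`, `label_by_name`, the field names and the placeholder values are all hypothetical.

```python
# Hypothetical order of the fields in the index definition.
INDEX_FIELDS = ["property", "property2"]

# Values stored in index-definition order (placeholders).
values = ["value-of-property", "value-of-property2"]


def label_by_position(field_order, key_values):
    """Pair names and values purely by position, as a list-based key would."""
    return dict(zip(field_order, key_values))


def label_by_name(named_values):
    """Keep each value attached to its own field name."""
    return dict(named_values)


# Field order as written in the query differs from the index definition.
query_order = ["property2", "property"]

print(label_by_position(query_order, values))             # names and values are swapped
print(label_by_name(zip(INDEX_FIELDS, values)))           # names and values stay aligned
```

When names and values are matched only by position, a different field order in the query attaches each value to the wrong field name, which is the symptom reported for createField.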

@wolf4ood
Member

hi @publicocean0

why 3 times globally?

do you have a test case to reproduce [property2,property] issue?

What do you mean by extra info for querying?

It means that the [property,property1] notation is necessary to enable the indexed query from the query engine.

@publicocean0
Author

publicocean0 commented Jul 30, 2016

You can create a single test in the lucene project with a query over 2 properties of a document (this is obviously a simplified multi-field query test, based on your test sources):
City.name
City.country, where country is a link to a Country document

country and name are mapped in the index schema.

Country.name is another property.

You want to search in Lucene for all cities with country=#10:0, where #10:0 is a valid country.

  1. select from city where ['country','name'] lucene "country:#10:0 and name:London"
  2. select from city where [name,'country'] lucene "country:#10:0 and name:London"
    In my case this query is sufficient, but
    this example also shows another current limitation that was already reported (possible enhancements for next releases #5136, subsection INDEX ON SUBNODE).

select from city where ['country','name'] lucene "country.name:'UK' and name:London"
This query, where country.name would be a field mapped in the index schema, is not possible.

@publicocean0
Author

Another error found: #6524. Lucene allows searching in collections.

@robfrank
Copy link
Contributor

robfrank commented Aug 4, 2016

If you have some snippets of code or SQL to share with us, it would be much appreciated.

@publicocean0
Author

publicocean0 commented Aug 4, 2016

Regarding my last post: it is very simple. You can create a simplified example with a Lucene index over 2 fields: field1 is a map|list, field2 is a list|map.
It is a bug related to the ADT model discussed above under "Suggestions for new engine".

@robfrank
Contributor

robfrank commented Aug 4, 2016

Have you got some snippet with these examples to speed-up my work?
Regarding the suggestions, the Lucene index implements OrientDB's internal interface and behaviour.
The implementation enables search to work inside a transaction and handles "delta" changes to documents. We can improve or fix the implementation, but it is quite impossible to redesign the Lucene index implementation.

@publicocean0
Author

My suggestion was not to redesign the Lucene index, but to replace the ADT above it. The Lucene index implementation would remain practically identical.

@lvca lvca added the bug label Sep 23, 2016
@wolf4ood wolf4ood assigned wolf4ood and unassigned robfrank Nov 21, 2016