Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

select is using indexes other than lucene even though "lucene" keyword is used #7193

Closed
scrabb opened this issue Feb 23, 2017 · 21 comments
Closed
Assignees
Labels

Comments

@scrabb
Copy link

scrabb commented Feb 23, 2017

OrientDB Version: 2.2.17

OS: windows

Expected behavior

I have 2 indexes for the same attribute in a vertex, one is a non unique hash index and the other is a lucene index.
Using a select statement with explain like
EXPLAIN SELECT FROM Person where Name lucene "Bob"
I would expect it to just show that the lucene index was used and not the hash index.

Actual behavior

The explain does not mention the lucene index and only mentions the hash index.
This then has a large impact on the performance of the query.

@luigidellaquila
Copy link
Member

Hi @scrabb

I did a quick debug on this, it seems to be only a problem with the EXPLAIN that reports wrong info. In fact the hash index is never used with LUCENE operator.
I'm checking how to fix it

Thanks

Luigi

@luigidellaquila
Copy link
Member

Hi @scrabb

I just pushed a fix for this on branch 2.2.x

Closing, thanks!

Luigi

@scrabb
Copy link
Author

scrabb commented Apr 7, 2017

Hello,

I've upgraded to 2.2.18 but i'm still seeing issues with having 2 different types of index on the same attribute.
If i use
SELECT FROM Person where Name lucene "Bob"
the query is extremely slow (with around 2 million Person vertices) and often returns no results due to timeout.
If i use
SELECT FROM index:Person.Name where key lucene "Bob"
the query is really fast (less than 1 second).

Also, the explain says that it does not use the lucene index in the first query.

Any help is much appreciated.

Steve

@luigidellaquila
Copy link
Member

Hi @scrabb

Do you have a dataset to reproduce the issue? I think I'm missing something about your schema and index definitions...

Thanks

Luigi

@scrabb
Copy link
Author

scrabb commented Apr 7, 2017

I'm unable to share the dataset due to confidentiality but i'll try and give more details on my schema in an abstract fashion.

I have a Vertex with a property called Name. I have a custom index engine based on the HashIndex called ASID.
I then create 2 indexes for the property in Java, one using our custom index engine and the other using lucene.
In studio indexes list the 2 indexes appear as
Person.Asid, NOTUNIQUE,Perosn,[Name],ASID
Person.Name,FULLTEXT,Person,[Name],LUCENE

A query like
select from Person where Name='bob'
is fast and explain does state that it uses the Person.Asid index

A query like
select from Person where Name lucene 'bob'
is very slow and explain states that it uses the Person.Asid index

A query like
select from index:Person.Name where key lucene 'bob'
is very fast

Also note that we do have a hierarchy of classes so Person does have subclasses and each of those has a further 2 indexes of ASID and LUCENE.

@luigidellaquila
Copy link
Member

luigidellaquila commented Apr 7, 2017

Hi @scrabb

Could you please post the result of the following queries:

select from (select expand(classes) from metadata:schema) where name = 'Person'

select from (select expand(indexes) from metadata:indexmanager) where name like 'Person%'

explain EXPLAIN SELECT FROM Person where Name lucene "Bob"

explain EXPLAIN SELECT FROM Person where Name = "Bob"

Thanks

Luigi

@scrabb
Copy link
Author

scrabb commented Apr 7, 2017

I'll send the results of those 4 commands in 4 separate comments

@scrabb
Copy link
Author

scrabb commented Apr 7, 2017

select from (select expand(classes) from metadata:schema) where name = 'Person'

{
"result": [
{
"@type": "d",
"@Version": 0,
"name": "Person",
"shortName": null,
"description": null,
"defaultClusterId": 225,
"clusterIds": [
225,
226,
227,
228,
229,
230,
231,
232
],
"clusterSelection": "round-robin",
"overSize": 0,
"strictMode": false,
"abstract": false,
"properties": [
{
"@type": "d",
"@Version": 0,
"name": "Name",
"type": 12,
"globalId": 35,
"mandatory": false,
"readonly": false,
"notNull": false,
"defaultValue": null,
"min": null,
"max": null,
"linkedType": 7,
"customFields": null,
"collate": "default",
"description": null
},
{
"@type": "d",
"@Version": 0,
"name": "Example",
"type": 10,
"globalId": 38,
"mandatory": false,
"readonly": false,
"notNull": false,
"defaultValue": null,
"min": null,
"max": null,
"linkedType": 7,
"customFields": null,
"collate": "default",
"description": null
},
{
"@type": "d",
"@Version": 0,
"name": "Alias",
"type": 10,
"globalId": 37,
"mandatory": false,
"readonly": false,
"notNull": false,
"defaultValue": null,
"min": null,
"max": null,
"linkedType": 7,
"customFields": null,
"collate": "default",
"description": null
}
],
"superClass": "V",
"superClasses": [
"V"
],
"customFields": null,
"@fieldTypes": "overSize=f,properties=e"
}
],
"notification": "Query executed in 0.042 sec. Returned 1 record(s)"
}

@scrabb
Copy link
Author

scrabb commented Apr 7, 2017

select expand(indexes) from metadata:indexmanager where name like 'Person%'

did not return any results, i tried with just

select expand(indexes) from metadata:indexmanager

and got just 10 results with a mixture of indexes from the subclasses of Person and indexes from other vertices

@luigidellaquila
Copy link
Member

Sorry @scrabb

I updated the second query, please check my previous comment for the right statement

Thanks

Luigi

@scrabb
Copy link
Author

scrabb commented Apr 7, 2017

explain EXPLAIN SELECT FROM Person where Name lucene "Bob"

timed out on me (our server has a timeout of 1 minute for all queries). I changed it to

explain EXPLAIN SELECT FROM Person where Name lucene "Bob" timeout 5000 return

and got

{
"result": [
{
"@type": "d",
"@Version": 0,
"limit": -1,
"fetchingFromTargetElapsed": 5093,
"involvedIndexes": [
"PersonC1.Asid1"
],
"elapsed": 5110.417,
"resultType": "document",
"resultSize": 1,
"@fieldTypes": "fetchingFromTargetElapsed=l,involvedIndexes=e,elapsed=f"
}
],
"notification": "Query executed in 5.178 sec. Returned 1 record(s)"
}

PersonC1 is one of the subclasses of Person.

@scrabb
Copy link
Author

scrabb commented Apr 7, 2017

explain EXPLAIN SELECT FROM Person where Name = "Bob"

{
"result": [
{
"@type": "d",
"@Version": 0,
"documentReads": 2,
"fullySortedByIndex": false,
"compositeIndexUsed": 1,
"current": "#3461:5308",
"documentAnalyzedCompatibleClass": 2,
"recordReads": 2,
"involvedIndexes": [
"Person.Asid1"
],
"limit": -1,
"fetchingFromTargetElapsed": 1,
"indexIsUsedInOrderBy": false,
"evaluated": 2,
"elapsed": 29.085169,
"resultType": "document",
"resultSize": 1,
"@fieldTypes": "documentReads=l,compositeIndexUsed=l,current=x,documentAnalyzedCompatibleClass=l,recordReads=l,involvedIndexes=e,fetchingFromTargetElapsed=l,evaluated=l,elapsed=f"
}
],
"notification": "Query executed in 0.035 sec. Returned 1 record(s)"
}

@scrabb
Copy link
Author

scrabb commented Apr 7, 2017

select from (select expand(indexes) from metadata:indexmanager) where name like 'Person%'

this returned 10 results all from the indexes on the subclasses of Person so i did a minor change to

select from (select expand(indexes) from metadata:indexmanager) where name like 'Person.%'

Note: I removed the cluster information from this result as the list was really long as there are about 100 subclasses of Person

{
    "result": [
        {
            "@type": "d",
            "@version": 0,
            "type": "NOTUNIQUE",
            "name": "Person.Asid2",
            "indexVersion": 2,
            "indexDefinition": {
                "@type": "d",
                "@version": 0,
                "className": "Person",
                "field": "Alias",
                "keyType": "STRING",
                "collate": "default",
                "nullValuesIgnored": false
            },
            "indexDefinitionClass": "com.orientechnologies.orient.core.index.OPropertyListIndexDefinition",
            "clusters": [

            ],
            "algorithm": "ASID",
            "valueContainerAlgorithm": "SBTREEBONSAISET",
            "@fieldTypes": "clusters=e"
        },
        {
            "@type": "d",
            "@version": 0,
            "type": "FULLTEXT",
            "name": "Person.Example",
            "indexVersion": -1,
            "indexDefinition": {
                "@type": "d",
                "@version": 0,
                "className": "Person",
                "field": "Example",
                "keyType": "STRING",
                "collate": "default",
                "nullValuesIgnored": false
            },
            "indexDefinitionClass": "com.orientechnologies.orient.core.index.OPropertyListIndexDefinition",
            "clusters": [

            ],
            "algorithm": "LUCENE",
            "valueContainerAlgorithm": "SBTREEBONSAISET",
            "metadata": {
                "@type": "d",
                "@version": 0,
                "query_analyzer": "org.apache.lucene.analysis.standard.StandardAnalyzer",
                "index_analyzer": "org.apache.lucene.analysis.en.EnglishAnalyzer"
            },
            "@fieldTypes": "clusters=e"
        },
        {
            "@type": "d",
            "@version": 0,
            "type": "FULLTEXT",
            "name": "Person.Name",
            "indexVersion": -1,
            "indexDefinition": {
                "@type": "d",
                "@version": 0,
                "className": "Person",
                "field": "Name",
                "keyType": "STRING",
                "collate": "default",
                "nullValuesIgnored": false,
                "mapIndexBy": "VALUE"
            },
            "indexDefinitionClass": "com.orientechnologies.orient.core.index.OPropertyMapIndexDefinition",
            "clusters": [

            ],
            "algorithm": "LUCENE",
            "valueContainerAlgorithm": "SBTREEBONSAISET",
            "metadata": {
                "@type": "d",
                "@version": 0,
                "query_analyzer": "org.apache.lucene.analysis.standard.StandardAnalyzer",
                "index_analyzer": "org.apache.lucene.analysis.en.EnglishAnalyzer"
            },
            "@fieldTypes": "clusters=e"
        },
        {
            "@type": "d",
            "@version": 0,
            "type": "FULLTEXT",
            "name": "Person.Alias",
            "indexVersion": -1,
            "indexDefinition": {
                "@type": "d",
                "@version": 0,
                "className": "Person",
                "field": "Alias",
                "keyType": "STRING",
                "collate": "default",
                "nullValuesIgnored": false
            },
            "indexDefinitionClass": "com.orientechnologies.orient.core.index.OPropertyListIndexDefinition",
            "clusters": [

            ],
            "algorithm": "LUCENE",
            "valueContainerAlgorithm": "SBTREEBONSAISET",
            "metadata": {
                "@type": "d",
                "@version": 0,
                "query_analyzer": "org.apache.lucene.analysis.standard.StandardAnalyzer",
                "index_analyzer": "org.apache.lucene.analysis.en.EnglishAnalyzer"
            },
            "@fieldTypes": "clusters=e"
        },
        {
            "@type": "d",
            "@version": 0,
            "type": "NOTUNIQUE",
            "name": "Person.Asid1",
            "indexVersion": 2,
            "indexDefinition": {
                "@type": "d",
                "@version": 0,
                "className": "Person",
                "field": "Name",
                "keyType": "STRING",
                "collate": "default",
                "nullValuesIgnored": false,
                "mapIndexBy": "VALUE"
            },
            "indexDefinitionClass": "com.orientechnologies.orient.core.index.OPropertyMapIndexDefinition",
            "clusters": [

            ],
            "algorithm": "ASID",
            "valueContainerAlgorithm": "SBTREEBONSAISET",
            "@fieldTypes": "clusters=e"
        }
    ],
    "notification": "Query executed in 0.024 sec. Returned 5 record(s)"
}

@luigidellaquila
Copy link
Member

Hi @scrabb

What is ASID algorithm? Did you define your own indexing algorithm?

Thanks

Luigi

@scrabb
Copy link
Author

scrabb commented Apr 7, 2017

ASID is a custom index engine based on the hash index, i basically have a custom GET and PUT as documented here

@scrabb
Copy link
Author

scrabb commented Apr 7, 2017

@luigidellaquila
Copy link
Member

Ok, it explains a lot of things, I'll check the executor and see if there is a problem with custom indexes mixed with Lucene

Thanks

Luigi

@clepelli
Copy link

clepelli commented May 31, 2017

I also see the same behaviour, I think the most simple scenario to reproduce is having a class with both a unique index (default SBTREE) and a lucene fulltext index on the same parameter (a string of course).
There is an exception thrown at each query :
java.lang.ClassCastException: com.orientechnologies.lucene.collections.OFullTextCompositeKey cannot be cast to java.lang.String at com.orientechnologies.common.serialization.types.OStringSerializer.preprocess(OStringSerializer.java:33) at com.orientechnologies.orient.core.index.sbtree.local.OSBTree.get(OSBTree.java:198) at com.orientechnologies.orient.core.index.engine.OSBTreeIndexEngine.get(OSBTreeIndexEngine.java:128) at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.doGetIndexValue(OAbstractPaginatedStorage.java:1764) at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.getIndexValue(OAbstractPaginatedStorage.java:1753) at com.orientechnologies.orient.core.index.OIndexOneValue.get(OIndexOneValue.java:58) at com.orientechnologies.orient.core.index.OIndexOneValue.get(OIndexOneValue.java:40) at com.orientechnologies.orient.core.index.OIndexAbstractDelegate.get(OIndexAbstractDelegate.java:58) at com.orientechnologies.orient.core.index.OIndexTxAwareOneValue.get(OIndexTxAwareOneValue.java:262) at com.orientechnologies.orient.core.index.OIndexTxAwareOneValue.get(OIndexTxAwareOneValue.java:40) at com.orientechnologies.lucene.operator.OLuceneTextOperator.executeIndexQuery(OLuceneTextOperator.java:82) at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.searchForIndexes(OCommandExecutorSQLSelect.java:2173) at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.searchInClasses(OCommandExecutorSQLSelect.java:997) at com.orientechnologies.orient.core.sql.OCommandExecutorSQLResultsetAbstract.assignTarget(OCommandExecutorSQLResultsetAbstract.java:209) at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.assignTarget(OCommandExecutorSQLSelect.java:526) at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.executeSearch(OCommandExecutorSQLSelect.java:508) at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.execute(OCommandExecutorSQLSelect.java:484) at com.orientechnologies.orient.core.sql.OCommandExecutorSQLDelegate.execute(OCommandExecutorSQLDelegate.java:74) at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.executeCommand(OAbstractPaginatedStorage.java:2578) at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.command(OAbstractPaginatedStorage.java:2524) at com.orientechnologies.orient.core.command.OCommandRequestTextAbstract.execute(OCommandRequestTextAbstract.java:69) at com.tinkerpop.blueprints.impls.orient.OrientGraphCommand.execute(OrientGraphCommand.java:49)

Then the query is slowly executed...
If I delete the unique index and keep only the lucene one, the same query is fast. (But I need the unique index !). It is 100% reproducible.

For context :

  • the OClass extends V and the result of the search goes through gremlin 2.6 pipes after that.
  • I have detected this on v2.2.15 and it is still there on v2.2.20
  • I use orientdb server embedded

@apapacy
Copy link

apapacy commented Jun 25, 2017

Im use orientdb-community-2.2.22 on ubuntu 16 and create 2 indexes on same column: (1)full_text_lucene and (2)dictionary or nounuque and have same problem.

Im run 2 select:
select * from class where field='value'
select * from class where field lucene 'val~0.5'

Both select try to use same index and one of its work geopardic slow (thet select do not use index)

What index is used depends on the order of their creation.

But select from index:class.field key=/lucene val...' ... work fine.

It is desirable to use by default full_text index on full_text, full_text_lucene on lucene search(select where) and dictionary index on other search (select where).

@clepelli
Copy link

I confirm apapacy's workaround works.
By the way, what is the most efficient way to query table content from the index query, wich returns rids ?
Select from Class where rid in (select rid from index:...) ?

@luigidellaquila
Copy link
Member

Hi @clepelli

The most efficient way to get records is:

SELECT expand(rid) from index:...

Thanks

Luigi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

7 participants