Elasticsearch Highlighting problem (0.90.1) #3200

Closed
kuttiKumarv opened this issue on Jun 18, 2013 · 6 comments

kuttiKumarv commented Jun 18, 2013

I am upgrading Elasticsearch from 0.20.2 to 0.90.1 and came across the following issue.
Highlighting worked as expected in 0.20.2, but the same request fails in 0.90.1 when the highlighted field uses an analyzer with "type": "pattern".

Steps to reproduce the issue

Environment:
JDK 1.7, Windows 7, Elasticsearch 0.90.1; documents were created and queried with the elasticsearch-head plugin.

Step 1:
Defined the mappings and settings for index test_hightlight / type hightlight.

http://localhost:9200/test_hightlight [POST]

{
  "settings": {
    "index": {
      "number_of_shards": 6,
      "number_of_replicas": 2,
      "analysis": {
        "analyzer": {
          "CommaAnalyzer": {
            "type": "pattern",
            "flags": "DOTALL",
            "lowercase": "true",
            "pattern": ",",
            "stopwords": "none"
          }
        }
      }
    }
  },
  "mappings": {
    "hightlight": {
      "properties": {
        "documentName": {
          "analyzer": "CommaAnalyzer",
          "type": "string"
        },
        "description": {
          "analyzer": "CommaAnalyzer",
          "type": "string"
        }
      }
    }
  }
}
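With "pattern": "," and "lowercase": "true", CommaAnalyzer uses the regex as a separator: it splits the field text on commas and lowercases each piece, so a value that contains no commas (like the documentName values indexed in Step 2) ends up as a single token. The Java sketch below approximates that tokenization with java.util.regex. It assumes the pattern analyzer drops empty pieces, and it ignores the "stopwords" setting; the class CommaTokens is hypothetical and is not Elasticsearch's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

class CommaTokens {
    // Separator regex from the "pattern" setting; DOTALL mirrors "flags" but has no effect on ",".
    private static final Pattern SEPARATOR = Pattern.compile(",", Pattern.DOTALL);

    /** Rough model of the pattern analyzer: split on the regex, drop empty pieces, lowercase the rest. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : SEPARATOR.split(text)) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // No commas: the whole value becomes one lowercased token.
        System.out.println(tokenize("business JSON Let business manage it for you Kutti Kumar"));
        // prints [business json let business manage it for you kutti kumar]

        // Commas present: one token per non-empty piece.
        System.out.println(tokenize("Kutti Kumar,Business,,JSON"));
        // prints [kutti kumar, business, json]
    }
}
```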

Step 2:
Indexed the following documents into the newly created index.

http://localhost:9200/test_hightlight/hightlight/1001  [POST]
{
    "documentName":"business Contract JSON business vendor and rep credentialing program ensures that Kutti Kumar and reps you are doing business with meet your requirements and are sound business partners With the business Small Business Package you can be where the buyers business are Not only do you get Business access to more than Business 1800 business hospitals you can BUSINESS promote your business in the only credentialed supplier sourcing tool used by Business healthcare organizations across business the country",
    "description":"Manage Kutti Kumar access and influence permissions Monitor Kutti Kumar sanction and financial details"

}

http://localhost:9200/test_hightlight/hightlight/1002  [POST]
{
    "documentName":"business JSON Communicating and managing those standards across all of your vendors can be a coordination nightmare Let business manage it for you Kutti Kumar",
    "description":"notifications and management enable you to have better insight to your business Kutti Kumar"

}

http://localhost:9200/test_hightlight/hightlight/1003  [POST]
{
  "documentName": "Kutti Kumar business JSON Communicating and managing those standards across all of your vendors can be a coordination nightmare Let business manage it for Kutti Kumar",
  "description": "Kutti Kumar notifications and management enable Kutti Kumar to have better insight to your business"
}

Step 3:
Executed the following query:

URL: http://localhost:9200/test_hightlight/
Query:

{
  "timeout": 60000,
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "business",
          "default_operator": "and"
        }
      }
    }
  },
  "explain": false,
  "highlight": {
    "pre_tags": [
      ""
    ],
    "post_tags": [
      "
"
    ],
    "fields": {
      "documentName": {
        "fragment_size": 20,
        "number_of_fragments": 5,
        "fragment_offset": 0
      }
    }
  }
}

Got the following error

{

took: 20
timed_out: false
_shards: {
    total: 6
    successful: 3
    failed: 3
    failures: [
        {
            index: test_hightlight
            shard: 3
            status: 500
            reason: FetchPhaseExecutionException[[test_hightlight][3]: query[_all:business],from[0],size[10]: Fetch Failed [Failed to highlight field [documentName]]]; nested: IOException[Stream closed]; 
        }
        {
            index: test_hightlight
            shard: 4
            status: 500
            reason: FetchPhaseExecutionException[[test_hightlight][4]: query[_all:business],from[0],size[10]: Fetch Failed [Failed to highlight field [documentName]]]; nested: IOException[Stream closed]; 
        }
        {
            index: test_hightlight
            shard: 5
            status: 500
            reason: FetchPhaseExecutionException[[test_hightlight][5]: query[_all:business],from[0],size[10]: Fetch Failed [Failed to highlight field [documentName]]]; nested: IOException[Stream closed]; 
        }
    ]
}
hits: {
    total: 3
    max_score: 0.12557761
    hits: [ ]
}

}

Note: the same query works as expected with Elasticsearch 0.20.2.

s1monw commented Jun 18, 2013

Can you provide a full gist that reproduces this problem?

kuttiKumarv commented Jun 19, 2013

Hi s1monw,
Windows environment:
JDK 1.7, Windows 7, Elasticsearch 0.90.1; documents were created and queried with the elasticsearch-head plugin.

jpountz commented Jun 21, 2013

The JSON looks invalid: "pattern": "\," should be replaced by either "pattern": "," or "pattern": "\\,".

I tried with both options and didn't manage to reproduce the issue with 0.90.1. Could you provide us with a bash script containing a set of curl commands that always reproduce the problem?
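In JSON, a backslash inside a string may only introduce \", \\, \/, \b, \f, \n, \r, \t, or a \u escape followed by four hex digits, so "\," is not a legal string while "," and "\\," both are. The Java sketch below checks only that escape rule (it is not a full JSON parser); the class JsonEscapeCheck is hypothetical. Either suggested fix gives the analyzer an equivalent regex: "\\," decodes to \, and java.util.regex treats a backslash before a non-alphabetic character such as ',' as a quote, so it still matches a literal comma.

```java
class JsonEscapeCheck {
    // Characters allowed right after a backslash in a JSON string, apart from the 'u' form (RFC 8259, section 7).
    private static final String SIMPLE_ESCAPES = "\"\\/bfnrt";
    private static final String HEX_DIGITS = "0123456789abcdefABCDEF";

    /**
     * Checks only the backslash escapes in the body of a JSON string literal
     * (the characters between the quotes). This is not a full JSON validator.
     */
    static boolean hasValidEscapes(String body) {
        int i = 0;
        while (i < body.length()) {
            if (body.charAt(i) != '\\') {
                i++;
                continue;
            }
            if (i + 1 >= body.length()) {
                return false; // a trailing backslash escapes nothing
            }
            char next = body.charAt(i + 1);
            if (SIMPLE_ESCAPES.indexOf(next) >= 0) {
                i += 2;
            } else if (next == 'u') {
                // The 'u' form needs exactly four hex digits after it.
                if (i + 6 > body.length()) {
                    return false;
                }
                for (int k = i + 2; k < i + 6; k++) {
                    if (HEX_DIGITS.indexOf(body.charAt(k)) < 0) {
                        return false;
                    }
                }
                i += 6;
            } else {
                return false; // e.g. a backslash followed by ','
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasValidEscapes(","));     // true: no escape at all
        System.out.println(hasValidEscapes("\\\\,")); // true: body \\, is an escaped backslash, then a comma
        System.out.println(hasValidEscapes("\\,"));   // false: body \, uses an illegal escape
    }
}
```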

ghost assigned jpountz on Jun 21, 2013

kuttiKumarv commented Jun 26, 2013

# Delete previous tests

curl -XDELETE 'http://127.0.0.1:9200/test_hightlight/?pretty=1'

# Step 1:
# Defined the mappings and settings for index test_hightlight / type hightlight.

curl -XPUT 'http://127.0.0.1:9200/test_hightlight/?pretty=1' -d '
{
  "settings": {
    "index": {
      "number_of_shards": 6,
      "number_of_replicas": 2,
      "analysis": {
        "analyzer": {
          "CommaAnalyzer": {
            "type": "pattern",
            "flags": "DOTALL",
            "lowercase": "true",
            "pattern": ",",
            "stopwords": "none"
          }
        }
      }
    }
  },
  "mappings": {
    "hightlight": {
      "properties": {
        "documentName": {
          "analyzer": "CommaAnalyzer",
          "type": "string"
        },
        "description": {
          "analyzer": "CommaAnalyzer",
          "type": "string"
        }
      }
    }
  }
}
'

# Step 2:
# Indexed the following documents into the newly created index.

curl -X POST 'http://localhost:9200/test_hightlight/hightlight/1001' -d '
{
"documentName":"business Contract JSON business vendor and rep credentialing program ensures that Kutti Kumar and reps you are doing business with meet your requirements and are sound business partners With the business Small Business Package you can be where the buyers business are Not only do you get Business access to more than Business 1800 business hospitals you can BUSINESS promote your business in the only credentialed supplier sourcing tool used by Business healthcare organizations across business the country",
"description":"Manage Kutti Kumar access and influence permissions Monitor Kutti Kumar sanction and financial details"
}'
curl -X POST 'http://localhost:9200/test_hightlight/hightlight/1002' -d '
{
"documentName":"business JSON Communicating and managing those standards across all of your vendors can be a coordination nightmare Let business manage it for you Kutti Kumar",
"description":"notifications and management enable you to have better insight to your business Kutti Kumar"
}'
curl -X POST 'http://localhost:9200/test_hightlight/hightlight/1003' -d '
{
"documentName": "Kutti Kumar business JSON Communicating and managing those standards across all of your vendors can be a coordination nightmare Let business manage it for Kutti Kumar",
"description": "Kutti Kumar notifications and management enable Kutti Kumar to have better insight to your business"
}'

# Step 3:
# Executed the following query
curl -XGET 'http://127.0.0.1:9200/test_hightlight/hightlight/_search?pretty=1' -d '
{
  "timeout": 60000,
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "business",
          "default_operator": "and"
        }
      }
    }
  },
  "explain": false,
  "highlight": {
    "pre_tags": [
      ""
    ],
    "post_tags": [
      "
"
    ],
    "fields": {
      "documentName": {
        "fragment_size": 20,
        "number_of_fragments": 5,
        "fragment_offset": 0
      }
    }
  }
}'

Got the following error

{

took: 20
timed_out: false
_shards: {
    total: 6
    successful: 3
    failed: 3
    failures: [
        {
            index: test_hightlight
            shard: 3
            status: 500
            reason: FetchPhaseExecutionException[[test_hightlight][3]: query[_all:business],from[0],size[10]: Fetch Failed [Failed to highlight field [documentName]]]; nested: IOException[Stream closed]; 
        }
        {
            index: test_hightlight
            shard: 4
            status: 500
            reason: FetchPhaseExecutionException[[test_hightlight][4]: query[_all:business],from[0],size[10]: Fetch Failed [Failed to highlight field [documentName]]]; nested: IOException[Stream closed]; 
        }
        {
            index: test_hightlight
            shard: 5
            status: 500
            reason: FetchPhaseExecutionException[[test_hightlight][5]: query[_all:business],from[0],size[10]: Fetch Failed [Failed to highlight field [documentName]]]; nested: IOException[Stream closed]; 
        }
    ]
}
hits: {
    total: 3
    max_score: 0.12557761
    hits: [ ]
}

}

jpountz closed this in c37de66 on Jul 8, 2013

jpountz added a commit that referenced this issue Jul 8, 2013

Don't reset TokenStreams twice when highlighting.
When using PlainHighlighter, TokenStreams are reset both before highlighting
and at the beginning of highlighting, causing issues with analyzers that read
in reset() such as PatternAnalyzer. This commit removes the call to reset which
was performed before passing the TokenStream to the highlighter.

Close #3200
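The failure mode described in this commit message can be modelled without Lucene. In the Java sketch below, the hypothetical ReadOnResetStream reads its whole Reader inside reset() and then closes it; that is an assumption about how the pattern analyzer's tokenizer behaved in this version, not the actual Lucene or Elasticsearch code. A second reset() therefore makes java.io.StringReader throw IOException("Stream closed"), the same nested message seen in the shard failures above, while dropping the extra reset (as the commit does) lets the single reset at the start of highlighting succeed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class DoubleResetDemo {
    /** Hypothetical stand-in for a token stream whose reset() consumes and closes its input. */
    static final class ReadOnResetStream {
        private final Reader input;
        private final List<String> tokens = new ArrayList<>();

        ReadOnResetStream(Reader input) {
            this.input = input;
        }

        /** Reads the entire input, closes the Reader, then splits the text on commas. */
        void reset() throws IOException {
            StringBuilder text = new StringBuilder();
            try {
                char[] buf = new char[1024];
                int n;
                while ((n = input.read(buf)) != -1) {
                    text.append(buf, 0, n);
                }
            } finally {
                input.close(); // any later read() on this Reader throws IOException
            }
            tokens.clear();
            for (String piece : text.toString().split(",")) {
                if (!piece.isEmpty()) {
                    tokens.add(piece);
                }
            }
        }

        List<String> tokens() {
            return tokens;
        }
    }

    /** Models the highlighting call sequence, optionally with the extra reset the commit removed. */
    static String highlight(boolean extraResetBeforeHighlighting) {
        ReadOnResetStream stream = new ReadOnResetStream(new StringReader("business,json"));
        try {
            if (extraResetBeforeHighlighting) {
                stream.reset(); // reset performed before handing the stream to the highlighter
            }
            stream.reset(); // reset performed at the beginning of highlighting
            return "tokens=" + stream.tokens();
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(highlight(true));  // IOException: Stream closed
        System.out.println(highlight(false)); // tokens=[business, json]
    }
}
```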
jpountz commented Jul 8, 2013

@kuttiKumarv Thanks for the detailed steps, I managed to reproduce the issue and fix it!

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015