Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filtering by document ID and NOT IN (..) intermittently returns incorrect results if the NOT IN portion of the WHERE clause falls after the id portion #196

Closed
caseyf opened this issue May 22, 2019 · 5 comments

Comments

Projects
None yet
4 participants
@caseyf
Copy link

commented May 22, 2019

Environment

Manticore Search version:

3.0.0

OS version:

Linux 4.1 (Gentoo)

Build version:

3.0.0 official release

Problem

Description of the issue:

Queries like select id from index_1, index_2 where id in (...) and attribute not in (...) intermittently return incorrect results if all of the following conditions are met:

  1. more than 1 index is searched

  2. id in (...) is part of the where clause

  3. a attr not in (...) or attr != ? comes AFTER the id portion of the where clause

I am able to workaround this issue by relocating all NOT IN and != segments to the beginning of the where clause.

Steps to reproduce

config file:

    
searchd {
    
  listen = 0.0.0.0:9301:mysql41
    
  pid_file = searchd.pid
    
  binlog_path =
    
}
    

    
index patterns_search_base {
    
}
    

    
index patterns_search_1 : patterns_search_base {
    
  source    = patterns_search_1
    
  path      = ./patterns_search_1
    
}
    

    
index patterns_search_2 : patterns_search_base {
    
  source    = patterns_search_2
    
  path      = ./patterns_search_2
    
}
    

Index files:

https://www.dropbox.com/s/47celsagkcdzvjj/index-files.tar.gz?dl=1

Correct result

    
mysql>  select id from patterns_search_1, patterns_search_2 where id in (6748,9436) and pattern_author_id not in (1) ;
    
 ------ 
    
| 6748 |
    
| 9436 |
    
 ------ 
    
2 rows in set (0.00 sec)
    

Test script:

    
#!/bin/bash
    

    
PORT=9301
    
TEST_QUERY="select id from patterns_search_1, patterns_search_2 where id in (6748,9436) and pattern_author_id not in (1);"
    

    
for i in `seq 1 10000`;
    
do
    
  result=`echo $TEST_QUERY |   mysql -N -P $PORT -h 127.0.0.1 -u root | wc -l`
    
  if [ "$result" -lt 2 ]
    
  then
    
    echo "[`date`] iteration $i did not find the correct number of results: $result"
    
  fi
    
done
    

Incorrect results:

$ ./test-script

[Wed May 22 14:42:36 EDT 2019] iteration 23 did not find the correct number of results: 1

[Wed May 22 14:42:36 EDT 2019] iteration 30 did not find the correct number of results: 1

[Wed May 22 14:42:37 EDT 2019] iteration 182 did not find the correct number of results: 1

[Wed May 22 14:42:39 EDT 2019] iteration 513 did not find the correct number of results: 1

[Wed May 22 14:42:40 EDT 2019] iteration 668 did not find the correct number of results: 1

[Wed May 22 14:42:41 EDT 2019] iteration 687 did not find the correct number of results: 1

[Wed May 22 14:42:41 EDT 2019] iteration 707 did not find the correct number of results: 1

Other notes

The following test queries also return incorrect results:

change attribute NOT IN list to !=:

    
select id from patterns_search_1, patterns_search_2 where id in (6748, 9436) and  pattern_author_id !=1;
    

The following test queries do not return any incorrect results:

change id IN list to use OR:

    
select id from patterns_search_1, patterns_search_2 where (id = 6748 or id = 9436) and  pattern_author_id  not in (1)
    

    
select id from patterns_search_1, patterns_search_2 where (id = 6748 or id = 9436) and  pattern_author_id !=1
    

move id IN list to the end:

    
select id from patterns_search_1, patterns_search_2 where  pattern_author_id not in (1) and id in (6748,9436);
    

    
select id from patterns_search_1, patterns_search_2 where  pattern_author_id !=1 and id in (6748,9436);
    

    
select id from patterns_search_1, patterns_search_2 where  pattern_author_id !=1 and (id  = 6748 or id = 9436);
    
```
  

  

@caseyf caseyf changed the title Filtering by document ID and `NOT IN (any_attribute)` intermittently returns incorrect results if the `NOT IN` portion of the WHERE clause falls after the `id` portion Filtering by document ID and NOT IN (..) intermittently returns incorrect results if the NOT IN portion of the WHERE clause falls after the `id` portion May 22, 2019

@caseyf caseyf changed the title Filtering by document ID and NOT IN (..) intermittently returns incorrect results if the NOT IN portion of the WHERE clause falls after the `id` portion Filtering by document ID and NOT IN (..) intermittently returns incorrect results if the NOT IN portion of the WHERE clause falls after the id portion May 22, 2019

@tomatolog

This comment has been minimized.

Copy link
Contributor

commented May 22, 2019

I uploaded index files your provided to our FTP into github-issue-196 going to check issue

@tomatolog

This comment has been minimized.

Copy link
Contributor

commented May 22, 2019

yes I see issue here and inform you on fix

@tomatolog tomatolog added the bug label May 22, 2019

@caseyf

This comment has been minimized.

Copy link
Author

commented May 23, 2019

I think I had the cause of the bug wrong - it seems to occur in other instances:

Incorrect results:

select * from patterns_search_dist where  id in (1, 2, 3...) and yarn_weight_id in (5)

moving the id list to the end yields the correct results:

select * from patterns_search_dist where  yarn_weight_id in (5) and id in (1, 2, 3...)
@githubmanticore

This comment has been minimized.

Copy link
Contributor

commented May 25, 2019

➤ Ilya Kuznetsov commented:

Fixed in 5802b85

@glookka glookka closed this May 25, 2019

@tomatolog

This comment has been minimized.

Copy link
Contributor

commented May 27, 2019

you have to update your daemon to get issue fixed, no need to reindex your data

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.