Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix application of docid killlists from other indexes. #1660

Merged
merged 1 commit into from Dec 19, 2023

Conversation

raxoft
Copy link
Contributor

@raxoft raxoft commented Dec 4, 2023

Type of Change (select one):

  • Bug fix

Description of the Change:

The direction of indexes in ApplyKillListsTo is swapped in case of docid killists - they are accidentally applied from target to source, not from source to target. So it has never worked properly, the docid killlists completely messed up other indexes. This fix corrects that.

Related Issue (provide the link):
None.

The direction of indexes in ApplyKillListsTo is swapped in case of docid killists - they are accidentally applied from target to source, not from source to target. This fix corrects that.
@sanikolaev
Copy link
Collaborator

Thanks for the PR! We'll review it.

@sanikolaev
Copy link
Collaborator

they are accidentally applied from target to source, not from source to target

@raxoft How do I reproduce this?

The most basic example doesn't reveal the issue:

➜  ~ cat kl_test.conf
searchd {
    listen = 9315:mysql41
    log = searchd.log
    pid_file = searchd.pid
    binlog_path =
}

source main {
    type = csvpipe
    csvpipe_command = echo "1,a" && echo "2,b" && echo "3,c"
    csvpipe_field = f
}

index main {
    type = plain
    source = main
    path = /tmp/kl_main
}

source delta {
    type = csvpipe
    csvpipe_command = echo "3,c"
    csvpipe_field = f
}

index delta {
    type = plain
    source = delta
    path = /tmp/kl_delta
    killlist_target = main:id
}

➜  ~ indexer -c kl_test.conf --all
Manticore 6.2.13 6a2f4a2d6@231124 dev (columnar 2.2.5 975172c@231117) (secondary 2.2.5 975172c@231117) (knn 2.2.5 975172c@231117)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

using config file '/Users/sn/kl_test.conf'...
indexing table 'main'...
collected 3 docs, 0.0 MB
creating secondary index
creating lookup: 0.0 Kdocs, 100.0% done
sorted 0.0 Mhits, 100.0% done
total 3 docs, 3 bytes
total 0.044 sec, 67 bytes/sec, 67.83 docs/sec
indexing table 'delta'...
collected 1 docs, 0.0 MB
creating secondary index
creating lookup: 0.0 Kdocs, 100.0% done
sorted 0.0 Mhits, 100.0% done
total 1 docs, 1 bytes
total 0.021 sec, 46 bytes/sec, 46.24 docs/sec
total 6 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 31 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg

➜  ~ searchd -c kl_test.conf
Manticore 6.2.13 6a2f4a2d6@231124 dev (columnar 2.2.5 975172c@231117) (secondary 2.2.5 975172c@231117) (knn 2.2.5 975172c@231117)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[40:14.398] [31625859] using config file '/Users/sn/kl_test.conf' (499 chars)...
starting daemon version '6.2.13 6a2f4a2d6@231124 dev (columnar 2.2.5 975172c@231117) (secondary 2.2.5 975172c@231117) (knn 2.2.5 975172c@231117)' ...
listening on all interfaces for mysql, port=9315
precaching table 'main'
precaching table 'delta'
applying killlist of table 'delta'
precached 2 tables in 0.001 sec

➜  ~ mysql -v -P9315 -h0 -e "select * from main; select * from delta"
--------------
select * from main
--------------

+------+------+
| id   | f    |
+------+------+
|    1 | a    |
|    2 | b    |
+------+------+
--------------
select * from delta
--------------

+------+------+
| id   | f    |
+------+------+
|    3 | c    |
+------+------+

@sanikolaev sanikolaev added the waiting Waiting for the original poster (in most cases) or something else label Dec 6, 2023
@raxoft
Copy link
Contributor Author

raxoft commented Dec 6, 2023

Following the example above, try rebuilding the index main with --rotate after you start searchd. You will see that the delta will not affect that and it will contain all ids, including duplicate. However, with this simple setup, you will not see that it was applied in the opposite direction, perhaps because searchd will not consider that table modified. At least I was not able to make it work with ids only. However, if you add killlists with ids identical to the ids in the index, you should see that it actually kills the ids from the index which was not rotated, too.

It becomes even more apparent if you add a third index, which can target either delta or both delta and main. Just try rotating each of the indexes in turn and see how the killlists are applied incorrectly.

(In addition to the tests, it's also apparent from the source that the parameter passed to KillExistingDocids is the target index, while in case of KillMulti the recipient is the target. Unfortunate API choice, but that's what it is. Few more lines above that in ApplyIndexKillList you can see both being used in the correct direction.)

@sanikolaev
Copy link
Collaborator

However, if you add killlists with ids identical to the ids in the index, you should see that it actually kills the ids from the index which was not rotated, too.

I'm not sure how to do it.

@raxoft can you please modify my example, so we can see the issue clearly? We'll need it for the test, too.

@raxoft
Copy link
Contributor Author

raxoft commented Dec 10, 2023

@raxoft can you please modify my example, so we can see the issue clearly? We'll need it for the test, too.

Sorry for late response. Yes, I will try to prepare one next week.

@raxoft
Copy link
Contributor Author

raxoft commented Dec 18, 2023

OK, here is an example setup which messes up the other index than the rotated one:

➜  ~ cat kl_test.conf
searchd {
    listen = 9315:mysql41
    log = searchd.log
    pid_file = searchd.pid
    binlog_path =
}

source test {
    type = xmlpipe2
    xmlpipe_command = /usr/bin/cat test_kl.xml
    xmlpipe_fixup_utf8 = 1
}

index ancient {
    type = plain
    source = test
    path = /tmp/kl_ancient
}

index archive {
    type = plain
    source = test
    path = /tmp/kl_archive
    killlist_target = ancient
}

index year {
    type = plain
    source = test
    path = /tmp/kl_year
    killlist_target = archive, ancient
}

➜  ~ cat kl_test.xml
<?xml version="1.0" encoding="UTF-8"?>
<sphinx:docset>
 <sphinx:schema>
  <sphinx:field name="text"/>
 </sphinx:schema>
 <sphinx:document id="1">
  <text>Test</text>
 </sphinx:document>
 <sphinx:killlist>
  <id>1</id>
 </sphinx:killlist>
</sphinx:docset>

Like in your example above, build all indexes, then run searchd. Checking indexes will show year contains one document, as expected, while the other two indexes are empty. Now rotate archive. Afterwards, year will be empty, which is wrong. You can now rotate year, to fill it again. Now rotate ancient, and again, year will be empty.

The example might be reduced to two indexes perhaps, I guess, but this is the setup I had, so I went with that.

Hope this helps.

@glookka glookka merged commit 8e54a54 into manticoresoftware:master Dec 19, 2023
110 checks passed
@tomatolog
Copy link
Contributor

maybe worth to add ubertest to make sure the behavior will not change in the future?

@sanikolaev sanikolaev removed the waiting Waiting for the original poster (in most cases) or something else label Dec 25, 2023
@sanikolaev
Copy link
Collaborator

maybe worth to add ubertest to make sure the behavior will not change in the future?

I've created a separate task about it #1691

@sanikolaev sanikolaev added the rel::upcoming Upcoming release label Dec 25, 2023
@raxoft
Copy link
Contributor Author

raxoft commented Jan 7, 2024

BTW, on a related note - I have noticed that after merging two indexes with indexer and using the --rotate option, it seems that only killlist applied is if one of those indexes targets the other. No other killlists from other indices seem to be applied. Is there some intentional reason behind this logic, or is it another bug related to indexes?

@sanikolaev
Copy link
Collaborator

@raxoft can you please create a separate issue with a minimal reproducible example about it? It looks like a bug at first glance, but we'd need to check deeper.

@sanikolaev sanikolaev added the bug label Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug rel::upcoming Upcoming release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants