[Bug]: Search results are different after and before compaction for binary data in standalone mode #15783

Closed
1 task done
binbinlv opened this issue Feb 28, 2022 · 3 comments

@binbinlv
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20220225-adca79fa
- Deployment mode(standalone or cluster): standalone
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus==2.0.1.dev3
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Search results for binary data differ before and after compaction in standalone mode.

Expected Behavior

Search results are the same before and after compaction.

Steps To Reproduce

    @pytest.mark.tags(CaseLabel.L1)
    def test_compact_after_binary_index(self):
        """
        target: test compact after create index
        method: 1.insert binary data into two segments
                2.create binary index
                3.compact
                4.search
        expected: Verify segment info and index info
        """
        collection_w = self.init_collection_wrap(name=cf.gen_unique_str(prefix), shards_num=1,
                                                 schema=cf.gen_default_binary_collection_schema())
        for i in range(2):
            df, _ = cf.gen_default_binary_dataframe_data()
            collection_w.insert(data=df)
            assert collection_w.num_entities == (i + 1) * ct.default_nb

        # create index
        collection_w.create_index(ct.default_binary_vec_field_name, ct.default_binary_index)
        log.debug(collection_w.index())

        collection_w.load()

        search_params = {"metric_type": "JACCARD", "params": {"nprobe": 10}}
        vectors = cf.gen_binary_vectors(ct.default_nq, ct.default_dim)[1]
        search_res_one, _ = collection_w.search(vectors,
                                                ct.default_binary_vec_field_name,
                                                search_params, ct.default_limit)
        assert len(search_res_one) == ct.default_nq
        for hits in search_res_one:
            assert len(hits) == ct.default_limit

        # compact
        collection_w.compact()
        collection_w.wait_for_compaction_completed()
        collection_w.get_compaction_plans(check_task=CheckTasks.check_merge_compact)

        # verify index re-build and re-load
        search_params = {"metric_type": "L1", "params": {"nprobe": 10}}
        search_res_two, _ = collection_w.search(vectors,
                                                ct.default_binary_vec_field_name,
                                                search_params, ct.default_limit,
                                                check_task=CheckTasks.err_res,
                                                check_items={ct.err_code: 1,
                                                             ct.err_msg: "metric type not found: (L1)"})

        # verify search result
        search_params = {"metric_type": "JACCARD", "params": {"nprobe": 10}}
        search_res_two, _ = collection_w.search(vectors,
                                                ct.default_binary_vec_field_name,
                                                search_params, ct.default_limit)
        for i in range(ct.default_nq):
            for j in range(ct.default_limit):
                assert search_res_two[i][j].id == search_res_one[i][j].id

Anything else?

1 Log:
milvus-standalone-58-pymilvus-e2e-logs.tar.gz

2 Failed timeline:
[2022-02-27T06:24:24.884Z] [gw4] [ 12%] FAILED testcases/test_compaction.py::TestCompactionOperation::test_compact_after_binary_index

3 collection name:
compact_XKXdTNW0

@binbinlv binbinlv added the kind/bug Issues or changes related a bug label Feb 28, 2022
@binbinlv binbinlv added this to the 2.0.2 milestone Feb 28, 2022
@congqixia
Contributor

There are three issues here:

Test code does not cover the case it means to cover

    # compact
    collection_w.compact()
    collection_w.wait_for_compaction_completed()
    collection_w.get_compaction_plans(check_task=CheckTasks.check_merge_compact)

    # verify index re-build and re-load
    search_params = {"metric_type": "L1", "params": {"nprobe": 10}}
    search_res_two, _ = collection_w.search(vectors,
                                            ct.default_binary_vec_field_name,
                                            search_params, ct.default_limit,
                                            check_task=CheckTasks.err_res,
                                            check_items={ct.err_code: 1,
                                                         ct.err_msg: "metric type not found: (L1)"})

This code simply waits for the compaction to finish. However, the hand-off procedure is not guaranteed to have finished by then. If an index is defined on the collection, QueryCluster performs the hand-off only after the index is built. In fact, the case always passed before because the hand-off had not yet happened when the search task was dispatched.
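One way a test could avoid racing the hand-off is to poll for a condition instead of searching right after compaction finishes. The sketch below is a generic polling helper, not part of the Milvus test framework. `wait_until` and `handoff_done` are hypothetical names, and the predicate is a simulated stand-in, since the thread does not say which check confirms that the hand-off is complete.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 60.0,
               interval: float = 1.0) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated stand-in for a real hand-off check (hypothetical):
# it reports "done" on the third poll.
polls = {"count": 0}


def handoff_done() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


handoff_ok = wait_until(handoff_done, timeout=5.0, interval=0.01)
```

In a real test, the post-compaction search would run only if the helper returned True, and a False result would be reported as a timeout rather than as a result mismatch.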

Search recall rate is not 100%

The test builds an index with nlist = 128 but searches with nprobe = 10, so only 10 of the 128 clusters are probed and the recall rate may be less than 100%.
We cannot test the equality of segments with this setup.
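To illustrate why probing fewer clusters than the index contains can miss true neighbors, here is a toy sketch. It assumes an IVF-style index that assigns each vector to its nearest centroid and searches only the `nprobe` closest lists, as the nlist and nprobe parameters suggest. It uses 1-D integers and absolute distance instead of binary vectors, and all names (`ivf_search`, `brute_force`, `recall`) are hypothetical, not Milvus APIs.

```python
from typing import List, Sequence


def brute_force(points: Sequence[int], query: int, k: int) -> List[int]:
    """Exact top-k by absolute distance (ties broken by value)."""
    return sorted(points, key=lambda p: (abs(p - query), p))[:k]


def ivf_search(points: Sequence[int], centroids: Sequence[int],
               query: int, nprobe: int, k: int) -> List[int]:
    """Toy IVF search: probe only the `nprobe` lists nearest to the query."""
    # Build inverted lists: each point goes to its nearest centroid.
    lists = {c: [] for c in centroids}
    for p in points:
        nearest = min(centroids, key=lambda c: (abs(p - c), c))
        lists[nearest].append(p)
    # Pick the lists whose centroids are closest to the query.
    probed = sorted(centroids, key=lambda c: (abs(query - c), c))[:nprobe]
    candidates = [p for c in probed for p in lists[c]]
    return brute_force(candidates, query, k)


def recall(approx: Sequence[int], exact: Sequence[int]) -> float:
    """Fraction of the exact top-k that the approximate search returned."""
    return len(set(approx) & set(exact)) / len(exact)


points = [0, 10, 49, 51, 90, 100]
centroids = [0, 100]          # nlist = 2 in this toy setup
query, k = 48, 2

exact = brute_force(points, query, k)                           # [49, 51]
partial = ivf_search(points, centroids, query, nprobe=1, k=k)   # [49, 10]
full = ivf_search(points, centroids, query, nprobe=2, k=k)      # [49, 51]
```

With `nprobe=1` the neighbor 51 sits in the unprobed list, so recall is 0.5. Probing every list recovers the exact result. Because the partitions depend on the data in each segment, an ID-by-ID comparison before and after compaction is only meaningful when the search is exhaustive.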

JACCARD metric type may not be suitable for K-Means methods

This should be described in a separate issue.
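The thread does not explain this concern. One possible reading, offered only as an assumption, is that K-Means centroids are component-wise means, and the mean of several binary vectors is generally not binary, so a set-based metric such as Jaccard does not apply to it directly. The sketch below computes Jaccard distance on 0/1 sequences and shows such a non-binary mean. The function names are hypothetical, and the all-zero convention is a choice made for this example.

```python
from typing import List, Sequence


def jaccard_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard distance between two equal-length 0/1 sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    union = sum(1 for x, y in zip(a, b) if x or y)
    if union == 0:
        # Convention chosen for this sketch: two all-zero vectors are identical.
        return 0.0
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    return 1.0 - intersection / union


def mean_vector(vectors: Sequence[Sequence[int]]) -> List[float]:
    """Component-wise mean, as a K-Means centroid update would compute."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


a = [1, 1, 0, 0]
b = [1, 0, 1, 0]

distance = jaccard_distance(a, b)   # 1 shared bit out of 3 set bits overall
centroid = mean_vector([a, b])      # [1.0, 0.5, 0.5, 0.0], no longer binary
```

Whether this is the problem the commenter had in mind is not confirmed anywhere in the thread.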

@congqixia
Contributor

/unassign
/assign @ThreadDao

@binbinlv
Contributor Author

binbinlv commented Mar 7, 2022

Fixed and closed.

@binbinlv binbinlv closed this as completed Mar 7, 2022