[Bug]: [benchmark][standalone] Milvus panic segment not found in concurrent DML scene #34325

Closed
1 task done
wangting0128 opened this issue Jul 2, 2024 · 6 comments
Labels: kind/bug, test/benchmark, triage/accepted

@wangting0128
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:2.4-20240701-3c5ad499-amd64 
- Deployment mode(standalone or cluster):standalone
- MQ type(rocksmq, pulsar or kafka):rocksmq    
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.0rc66
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: inverted-corn-1719849600
test case name: test_inverted_locust_partition_key_dml_standalone

server:

[2024-07-01 19:26:33,087 -  INFO - fouram]: [Base] Deploy initial state: 
I0701 16:10:49.442587     420 request.go:665] Waited for 1.176647648s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/policy/v1beta1?timeout=32s
I0701 16:10:59.641718     420 request.go:665] Waited for 6.997821513s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/node.k8s.io/v1beta1?timeout=32s
NAME                                                              READY   STATUS                   RESTARTS          AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-149600-2-79-3547-etcd-0                             1/1     Running                  0                 4m52s   10.104.26.247   4am-node32   <none>           <none>
inverted-corn-149600-2-79-3547-milvus-standalone-7df89f646ds4sl   1/1     Running                  3 (2m32s ago)     4m52s   10.104.26.248   4am-node32   <none>           <none>
inverted-corn-149600-2-79-3547-minio-58c7ccf54f-jxkps             1/1     Running                  0                 4m52s   10.104.16.168   4am-node21   <none>           <none> (base.py:261)
[2024-07-01 19:26:33,087 -  INFO - fouram]: [Cmd Exe]  kubectl get pods  -n qa-milvus  -o wide | grep -E 'NAME|inverted-corn-149600-2-79-3547-milvus|inverted-corn-149600-2-79-3547-minio|inverted-corn-149600-2-79-3547-etcd|inverted-corn-149600-2-79-3547-pulsar|inverted-corn-149600-2-79-3547-zookeeper|inverted-corn-149600-2-79-3547-kafka|inverted-corn-149600-2-79-3547-log|inverted-corn-149600-2-79-3547-tikv'  (util_cmd.py:14)
[2024-07-01 19:26:49,663 -  INFO - fouram]: [CliClient] pod details of release(inverted-corn-149600-2-79-3547): 
 I0701 19:26:34.337262     530 request.go:665] Waited for 1.177881965s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/resolution.tekton.dev/v1beta1?timeout=32s
I0701 19:26:44.537756     530 request.go:665] Waited for 6.997021061s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/dashboard.tekton.dev/v1alpha1?timeout=32s
NAME                                                              READY   STATUS                        RESTARTS          AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-149600-2-79-3547-etcd-0                             1/1     Running                       0                 3h20m   10.104.26.247   4am-node32   <none>           <none>
inverted-corn-149600-2-79-3547-milvus-standalone-7df89f646ds4sl   1/1     Running                       4 (127m ago)      3h20m   10.104.26.248   4am-node32   <none>           <none>
inverted-corn-149600-2-79-3547-minio-58c7ccf54f-jxkps             1/1     Running                       0                 3h20m   10.104.16.168   4am-node21   <none>           <none> (cli_client.py:144)

[image]

[image: Screenshot 2024-07-02 10 52 01]

client pod name: inverted-corn-1719849600-788157396
client error time:
2024-07-01 17:19:07,081 ~ 2024-07-01 19:28:11,949
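
The restart timing can be cross-checked against the client error window. A rough sketch, assuming the `4 (127m ago)` value was captured at about 19:26:49 (the timestamp of the second pod listing) and that kubectl reports the age in whole minutes, so the result is only accurate to about a minute:

```python
from datetime import datetime, timedelta

# Timestamp of the second pod listing in the server log above.
listing_time = datetime(2024, 7, 1, 19, 26, 49)
# "4 (127m ago)": age of the most recent restart, in whole minutes.
last_restart = listing_time - timedelta(minutes=127)
# Start of the client error window reported above (milliseconds dropped).
first_client_error = datetime(2024, 7, 1, 17, 19, 7)

gap_seconds = abs((last_restart - first_client_error).total_seconds())
print(last_restart.isoformat(sep=" "))  # 2024-07-01 17:19:49
print(gap_seconds)                      # 42.0
```

Under these assumptions the fourth restart of the standalone pod falls within roughly a minute of the first client error, which is consistent with the panic being what the client observed.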

Expected Behavior

No response

Steps To Reproduce

concurrent test that measures RT and QPS

        :purpose:  `partition_key: scalar enable partition_key(num_partitions=128)`
            verify the concurrent DML scenario in which
            the scalar fields `id` (pk) and `int64_1` have INVERTED indexes and partition_key is enabled on the `int64_1` field

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'int64_1': is_partition_key
            2. build indexes:
                IVF_FLAT: 'float_vector'
                INVERTED: 'id', 'int64_1'
            3. insert 5 million rows
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - insert
                - delete
                - flush
                - release
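
The steps above can be sketched with pymilvus roughly as follows. This is not the fouram test code: the collection name, index names, the `int64_1` values, the random vectors (the real test uses the sift dataset), and the serialized request loop are all assumptions made for illustration, and the real test drives step 7 with 20 concurrent Locust users.

```python
import random

DIM = 128  # vector dimension from the test steps


def make_batch(start_id, nb, dim=DIM):
    """Build one column-ordered insert batch: [id, float_vector, int64_1]."""
    ids = list(range(start_id, start_id + nb))
    vectors = [[random.random() for _ in range(dim)] for _ in ids]
    # Assumption: int64_1 mirrors id; the real dataset values are not shown.
    return [ids, vectors, list(ids)]


def build_indexes(coll):
    """Steps 2 and 5: IVF_FLAT on the vector, INVERTED on both scalars."""
    coll.create_index(
        "float_vector",
        {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 1024}},
    )
    # Index names are hypothetical; they only keep the two indexes distinct.
    coll.create_index("id", {"index_type": "INVERTED"}, index_name="id_inverted")
    coll.create_index("int64_1", {"index_type": "INVERTED"}, index_name="int64_1_inverted")


def reproduce(host="localhost", port="19530", total=5_000_000, batch=50_000, rounds=100):
    # Imported lazily so make_batch() can be used without pymilvus installed.
    from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

    connections.connect(host=host, port=port)
    schema = CollectionSchema([
        FieldSchema("id", DataType.INT64, is_primary=True),
        FieldSchema("float_vector", DataType.FLOAT_VECTOR, dim=DIM),
        FieldSchema("int64_1", DataType.INT64, is_partition_key=True),
    ])
    # Hypothetical collection name; shards and partitions follow the report.
    coll = Collection("inverted_dml_repro", schema, shards_num=2, num_partitions=128)

    build_indexes(coll)                      # step 2
    for start in range(0, total, batch):     # step 3
        coll.insert(make_batch(start, batch))
    coll.flush()                             # step 4
    build_indexes(coll)                      # step 5
    coll.load()                              # step 6

    # Step 7, serialized here for brevity; the real test issues these concurrently.
    for _ in range(rounds):
        op = random.choice(["insert", "delete", "flush", "release"])
        if op == "insert":
            coll.insert(make_batch(random.randrange(total), 10))
        elif op == "delete":
            ids = random.sample(range(total), 9)
            coll.delete(f"id in {ids}")
        elif op == "flush":
            coll.flush(timeout=180)
        else:
            coll.release(timeout=30)
```

`reproduce()` is defined but not called here; it should only be pointed at a disposable standalone instance. A serialized loop may not trigger the panic, since the report describes a concurrent scenario.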

Milvus Log

No response

Anything else?

test result:

[2024-07-01 19:26:18,617 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-07-01 19:26:18,617 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-07-01 19:26:18,617 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-07-01 19:26:18,617 -  INFO - fouram]: grpc     delete                                                                          2223     3(0.13%) |     46       0   30669      2 |    0.21        0.00 (stats.py:789)
[2024-07-01 19:26:18,617 -  INFO - fouram]: grpc     flush                                                                           2155 1041(48.31%) |  99229     504  361609 136000 |    0.20        0.10 (stats.py:789)
[2024-07-01 19:26:18,618 -  INFO - fouram]: grpc     insert                                                                          2128     0(0.00%) |    239       6  129764     13 |    0.20        0.00 (stats.py:789)
[2024-07-01 19:26:18,618 -  INFO - fouram]: grpc     release                                                                         2255     3(0.13%) |     67       0   30673      2 |    0.21        0.00 (stats.py:789)
[2024-07-01 19:26:18,618 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-07-01 19:26:18,618 -  INFO - fouram]:          Aggregated                                                                      8761 1047(11.95%) |  24495       0  361609      9 |    0.81        0.10 (stats.py:789)
[2024-07-01 19:26:18,618 -  INFO - fouram]:  (stats.py:790)
[2024-07-01 19:26:18,620 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'standalone',
            'config_name': 'standalone_8c16m',
            'config': {'standalone': {'resources': {'limits': {'cpu': '8.0',
                                                               'memory': '16Gi'},
                                                    'requests': {'cpu': '5.0',
                                                                 'memory': '9Gi'}}},
                       'cluster': {'enabled': False},
                       'etcd': {'replicaCount': 1,
                                'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'minio': {'mode': 'standalone',
                                 'metrics': {'podMonitor': {'enabled': True}}},
                       'pulsar': {'enabled': False},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': '2.4-20240701-3c5ad499-amd64'}}},
            'host': 'inverted-corn-149600-2-79-3547-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_inverted_locust_partition_key_dml_standalone',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 128,
                                                    'scalars_index': {'id': {'index_type': 'INVERTED'},
                                                                      'int64_1': {'index_type': 'INVERTED'}},
                                                    'scalars_params': {'int64_1': {'params': {'is_partition_key': True}}},
                                                    'dataset_name': 'sift',
                                                    'dataset_size': 5000000,
                                                    'ni_per': 50000},
                                 'collection_params': {'other_fields': ['int64_1'],
                                                       'shards_num': 2,
                                                       'num_partitions': 128},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'IVF_FLAT',
                                                  'index_param': {'nlist': 1024}},
                                 'concurrent_params': {'concurrent_number': 20,
                                                       'during_time': '3h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'insert',
                                                       'weight': 1,
                                                       'params': {'nb': 10,
                                                                  'timeout': 180,
                                                                  'random_id': True,
                                                                  'random_vector': True,
                                                                  'varchar_filled': False,
                                                                  'start_id': 0}},
                                                      {'type': 'delete',
                                                       'weight': 1,
                                                       'params': {'expr': '',
                                                                  'delete_length': 9,
                                                                  'timeout': 30}},
                                                      {'type': 'flush',
                                                       'weight': 1,
                                                       'params': {'timeout': 180}},
                                                      {'type': 'release',
                                                       'weight': 1,
                                                       'params': {'timeout': 30}}]},
            'run_id': 2024070199707716,
            'datetime': '2024-07-01 16:06:10.584222',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 709.6943,
                                      'id': {'RT': 1.0496},
                                      'int64_1': {'RT': 1.0175}},
                            'insert': {'total_time': 176.862,
                                       'VPS': 28270.6291,
                                       'batch_time': 1.7686,
                                       'batch': 50000},
                            'flush': {'RT': 6.6009},
                            'load': {'RT': 6.1357},
                            'Locust': {'Aggregated': {'Requests': 8761,
                                                      'Fails': 1047,
                                                      'RPS': 0.81,
                                                      'fail_s': 0.12,
                                                      'RT_max': 361609.5,
                                                      'RT_avg': 24495.41,
                                                      'TP50': 9,
                                                      'TP99': 182000.0},
                                       'delete': {'Requests': 2223,
                                                  'Fails': 3,
                                                  'RPS': 0.21,
                                                  'fail_s': 0.0,
                                                  'RT_max': 30669.76,
                                                  'RT_avg': 46.62,
                                                  'TP50': 2,
                                                  'TP99': 90},
                                       'flush': {'Requests': 2155,
                                                 'Fails': 1041,
                                                 'RPS': 0.2,
                                                 'fail_s': 0.48,
                                                 'RT_max': 361609.5,
                                                 'RT_avg': 99229.53,
                                                 'TP50': 136000.0,
                                                 'TP99': 190000.0},
                                       'insert': {'Requests': 2128,
                                                  'Fails': 0,
                                                  'RPS': 0.2,
                                                  'fail_s': 0.0,
                                                  'RT_max': 129764.23,
                                                  'RT_avg': 239.5,
                                                  'TP50': 13,
                                                  'TP99': 1100.0},
                                       'release': {'Requests': 2255,
                                                   'Fails': 3,
                                                   'RPS': 0.21,
                                                   'fail_s': 0.0,
                                                   'RT_max': 30673.81,
                                                   'RT_avg': 67.1,
                                                   'TP50': 2,
                                                   'TP99': 90}}}}}
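
A note on reading the report: the `fail_s` values do not match the `failures/s` column of the Locust table (0.10 for both flush and Aggregated). They appear instead to be failure ratios, that is, fails divided by requests. This is inferred from the numbers, not stated by the report. A quick check:

```python
# (requests, fails) per operation, copied from the Locust table above.
stats = {
    "delete": (2223, 3),
    "flush": (2155, 1041),
    "insert": (2128, 0),
    "release": (2255, 3),
}


def fail_ratio(requests, fails):
    """Fraction of failed requests, rounded to two decimals like the report."""
    return round(fails / requests, 2) if requests else 0.0


ratios = {name: fail_ratio(reqs, fails) for name, (reqs, fails) in stats.items()}
total_requests = sum(reqs for reqs, _ in stats.values())
total_fails = sum(fails for _, fails in stats.values())
ratios["Aggregated"] = fail_ratio(total_requests, total_fails)

print(total_requests, total_fails)  # 8761 1047
print(ratios)
# {'delete': 0.0, 'flush': 0.48, 'insert': 0.0, 'release': 0.0, 'Aggregated': 0.12}
```

These values reproduce every `fail_s` entry in the report, so nearly half of all flush requests failed while insert, delete, and release were almost unaffected.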
@wangting0128 wangting0128 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. test/benchmark benchmark test labels Jul 2, 2024
@wangting0128 wangting0128 added this to the 2.4.6 milestone Jul 2, 2024
@yanliang567
Contributor

/assign @weiliu1031
/unassign

@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 2, 2024
@xiaofan-luan
Contributor

/assign @wangting0128

@wangting0128
Contributor Author

Same case, different panic

#34376

@weiliu1031
Contributor

Please verify this with the latest image.

@weiliu1031
Contributor

/assign @wangting0128

@yanliang567 yanliang567 removed the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Jul 8, 2024
@wangting0128
Contributor Author

verification passed

argo task: inverted-corn-1720386000
image: 2.4-20240705-326370c1-amd64
