test: add remove tests for delete chunks #1620

CatStark · 2021-01-07T08:18:54Z

First draft of tests for CRUD operations at the chunk level.
This is only an initial overview.

cristianmtr · 2021-01-07T08:21:11Z

tests/integration/crud/test_chunks.py

+
+@pytest.fixture
+def config(tmpdir):
+    os.environ['JINA_TOPK_DIR'] = str(tmpdir)


Why TOPK_DIR?

github-actions · 2021-01-07T08:33:33Z

Latency summary

Current PR yields:

🐢🐢 index QPS at 1224, delta to last 3 avg.: -13%
😶 query QPS at 29, delta to last 3 avg.: +4%

Breakdown

Version	Index QPS	Query QPS
current	1224	29
`0.9.19`	1253	27
`0.9.18`	1593	27

Backed by latency-tracking. Further commits will update this comment.

codecov · 2021-01-07T08:47:12Z

Codecov Report

Merging #1620 (fbb7e8c) into master (650f87e) will increase coverage by 0.14%.
The diff coverage is 94.68%.

@@            Coverage Diff             @@
##           master    #1620      +/-   ##
==========================================
+ Coverage   85.18%   85.33%   +0.14%     
==========================================
  Files         134      135       +1     
  Lines        6865     6893      +28     
==========================================
+ Hits         5848     5882      +34     
+ Misses       1017     1011       -6

Impacted Files	Coverage Δ
jina/enums.py	`95.91% <ø> (ø)`
jina/executors/metas.py	`96.87% <ø> (ø)`
jina/optimizers/discovery.py	`80.26% <ø> (ø)`
jina/peapods/pods/helper.py	`96.80% <ø> (-0.10%)`	⬇️
jina/peapods/runtimes/zmq/zed.py	`89.92% <ø> (-1.44%)`	⬇️
jina/optimizers/flow_runner.py	`84.44% <75.00%> (-15.56%)`	⬇️
jina/clients/sugary_io.py	`96.36% <100.00%> (+0.53%)`	⬆️
jina/executors/compound.py	`89.31% <100.00%> (+3.71%)`	⬆️
jina/executors/decorators.py	`91.27% <100.00%> (ø)`
jina/executors/indexers/vector.py	`93.26% <100.00%> (+0.14%)`	⬆️
... and 9 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8878ccd...a800012.

JoanFM · 2021-01-07T08:49:48Z

tests/integration/crud/test_chunks.py

+from jina import Document
+from jina.flow import Flow
+
+random.seed(0)


can this be put inside the config?
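
For illustration, a minimal sketch of what seeding inside the fixture body could look like. It uses only the standard library so it runs on its own; `configure_test_env` stands in for the body of the `config` fixture, and the `JINA_EXAMPLE_DIR` variable name is hypothetical. Only `random.seed(0)` and the idea of a per-test `config` setup come from the diff.

```python
import os
import random
import tempfile


def configure_test_env(workdir):
    """Stand-in for the body of the `config` fixture (hypothetical helper)."""
    # Seeding here means the seed is reset before every test that uses the
    # fixture, instead of once at module import time.
    random.seed(0)
    # Hypothetical variable name, used only for this sketch.
    os.environ['JINA_EXAMPLE_DIR'] = str(workdir)


def sample_after_setup():
    """Run the setup, then draw a few random numbers."""
    with tempfile.TemporaryDirectory() as workdir:
        configure_test_env(workdir)
        return [random.random() for _ in range(3)]


first = sample_after_setup()
second = sample_after_setup()
```

Because the seed is reset on every setup, both runs draw the same values, which would not hold for a module-level `random.seed(0)` once an earlier test has consumed random numbers.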

cristianmtr · 2021-01-22T11:51:27Z

CHANGELOG.md

@@ -10,6 +10,7 @@



+


Why is this here as a change? Perhaps rebase?

cristianmtr · 2021-01-22T11:53:10Z

tests/integration/crud/chunks/test_chunks.py

+def config(tmpdir):
+    random.seed(0)
+    np.random.seed(0)
+    os.environ['JINA_TOPK_DIR'] = str(tmpdir)


Why TOPK_DIR? Please pick a better name.

cristianmtr · 2021-01-22T11:55:00Z

tests/integration/crud/chunks/test_chunks.py

+    with Flow.load_config(flow_file) as index_flow:
+        index_flow.index(
+            input_fn=document_generator(content_same=content_same, start=0, num_docs=num_docs, num_chunks=num_chunks))
+    validate_index_size(50) #5 chunks for each of the 10 docs


I would make the 50 depend explicitly on num_docs and num_chunks: num_docs * num_chunks
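
A rough sketch of the idea, under stated assumptions: `build_fake_index` is a hypothetical stand-in for the indexed data, and this `validate_index_size` takes the index as an argument, which differs from the one-argument helper in the diff.

```python
def build_fake_index(num_docs, num_chunks):
    """Hypothetical stand-in for the index: one entry per (doc, chunk) pair."""
    return [
        (doc_id, chunk_id)
        for doc_id in range(num_docs)
        for chunk_id in range(num_chunks)
    ]


def validate_index_size(index, expected_size):
    """Assumed signature for this sketch; fails if the index has the wrong size."""
    assert len(index) == expected_size, (
        f'expected {expected_size} entries, found {len(index)}'
    )


num_docs, num_chunks = 10, 5
index = build_fake_index(num_docs, num_chunks)

# The expected size follows from the parameters, so changing num_docs or
# num_chunks cannot leave a stale literal behind.
validate_index_size(index, num_docs * num_chunks)
```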

cristianmtr · 2021-01-22T11:57:19Z

tests/integration/crud/chunks/test_chunks.py

+c_embedding = np.random.random([9])
+
+
+def document_generator(content_same, start, num_docs, num_chunks):


content_same is not relevant for your test unless you also plan to add DocIDCache.
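
As a sketch only, the generator without `content_same` might look like this. Plain dicts stand in for `jina.Document`, and the id and text scheme is invented for illustration; only the `start`, `num_docs`, and `num_chunks` parameters come from the diff.

```python
def document_generator(start, num_docs, num_chunks):
    """Yield num_docs documents, each carrying num_chunks chunks.

    Plain dicts are used as stand-ins for Document objects (assumption).
    """
    for doc_id in range(start, start + num_docs):
        chunks = [
            # Hypothetical id scheme: '<doc>-<chunk index>'.
            {'id': f'{doc_id}-{chunk_idx}', 'text': f'chunk {chunk_idx} of doc {doc_id}'}
            for chunk_idx in range(num_chunks)
        ]
        yield {'id': str(doc_id), 'text': f'doc {doc_id}', 'chunks': chunks}


docs = list(document_generator(start=0, num_docs=10, num_chunks=5))
total_chunks = sum(len(doc['chunks']) for doc in docs)
```

With the flag gone, each test needs only one parametrized case per flow file.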

cristianmtr · 2021-01-22T11:58:45Z

tests/integration/crud/chunks/test_chunks.py

+    ['flow_vector.yml', False],
+    ['flow_vector.yml', True]
+])
+def test_update_vector(config, mocker, flow_file, content_same):


Again, remove content_same. It's irrelevant without the cache.

cristianmtr · 2021-01-22T12:35:18Z

tests/integration/crud/simple/chunks/test_chunks.py

+    random.seed(0)
+    np.random.seed(0)
+    os.environ['JINA_CRUD_CHUNKS'] = str(tmpdir)
+    os.environ['JINA_TOPK'] = '10'


Suggested change

-    os.environ['JINA_TOPK'] = '10'
+    os.environ['JINA_TOPK'] = str(TOP_K)
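
A small sketch of why a shared constant helps, assuming `TOP_K` is a module-level integer in the test file (the value 10 is taken from the diff; the plain `config` function stands in for the fixture):

```python
import os

TOP_K = 10  # single source of truth for the top-k value used by the tests


def config():
    """Stand-in for the fixture body: export TOP_K for the flow to read."""
    # Environment values must be strings, hence str(TOP_K).
    os.environ['JINA_TOPK'] = str(TOP_K)


config()

# Anything that later checks result counts can compare against TOP_K
# instead of repeating the literal.
top_k_from_env = int(os.environ['JINA_TOPK'])
```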

cristianmtr · 2021-01-22T12:35:51Z

tests/integration/crud/simple/chunks/test_chunks.py

+    ['flow_vector.yml', False],
+    ['flow_vector.yml', True]
+])
+def test_delete_vector(config, mocker, flow_file, content_same):


I think I mentioned this on the previous version, but again: content_same should be removed. It's unnecessary without the cache in front.

CatStark · 2021-01-22

Yes, I'll make the changes now.

cristianmtr · 2021-01-22T12:36:40Z

tests/integration/crud/simple/chunks/test_chunks.py

+    with Flow.load_config(flow_file) as index_flow:
+        index_flow.index(
+            input_fn=document_generator(content_same=content_same, start=0, num_docs=num_docs, num_chunks=num_chunks))
+    validate_index_size(50) #5 chunks for each of the 10 docs


I would remove the comment and make the expected size explicit:

Suggested change

-    validate_index_size(50) #5 chunks for each of the 10 docs
+    validate_index_size(num_chunks*num_docs)

cristianmtr

some minor things

test: add remove tests for delete chunks

b42e34b

CatStark requested a review from a team as a code owner January 7, 2021 08:18

CatStark requested review from nan-wang and deepankarm January 7, 2021 08:18

CatStark marked this pull request as draft January 7, 2021 08:19

CatStark removed the request for review from deepankarm January 7, 2021 08:19

jina-bot added size/S, area/testing labels Jan 7, 2021

CatStark requested review from cristianmtr and maximilianwerk and removed request for nan-wang January 7, 2021 08:19

cristianmtr reviewed Jan 7, 2021

JoanFM reviewed Jan 7, 2021

fix: change level of recursion for chunks

31fbe64

jina-bot added size/M and removed size/S labels Jan 7, 2021

CatStark added 5 commits January 10, 2021 14:11

test: add test update vector with chunks

9697c3d

test: add test update vector with chunks

7bf4c5d

Revert "fix: change level of recursion for chunks"

8a71f71

This reverts commit 31fbe64.

Merge branch 'master' of https://github.com/jina-ai/jina into test-chunks

9597194

fix: small refactoring

e404781

florian-hoenicke previously approved these changes Jan 22, 2021

fix: small refactor

898db65

CatStark dismissed florian-hoenicke’s stale review via 898db65 January 22, 2021 11:24

jina-bot added area/core, area/docs, area/entrypoint, area/helper labels Jan 22, 2021

jina-bot added area/network, component/peapod labels Jan 22, 2021

CatStark added 3 commits January 22, 2021 12:35

Revert "fix: small refactoring"

21895b2

This reverts commit e404781.

Revert "Revert "fix: small refactoring""

3bbdfd8

This reverts commit 21895b2.

Revert "fix: small refactor"

9faee62

This reverts commit 898db65, reversing changes made to e404781.

jina-bot added size/L and removed size/M labels Jan 22, 2021

cristianmtr reviewed Jan 22, 2021

cristianmtr reviewed Jan 22, 2021

CatStark removed the request for review from maximilianwerk January 22, 2021 12:20

fix: cleanup

87a358e

cristianmtr force-pushed the test-chunks branch from 9faee62 to 87a358e Compare January 22, 2021 12:31

CatStark marked this pull request as ready for review January 22, 2021 12:33

jina-bot added size/M and removed size/L labels Jan 22, 2021

cristianmtr reviewed Jan 22, 2021

cristianmtr suggested changes Jan 22, 2021

CatStark added 2 commits January 22, 2021 13:40

Merge branch 'test-chunks' of https://github.com/jina-ai/jina into test-chunks

e5597df

fix: remove content_same

a800012

CatStark requested a review from cristianmtr January 22, 2021 14:01

hanxiao approved these changes Jan 22, 2021

hanxiao merged commit e29734b into master Jan 22, 2021

hanxiao deleted the test-chunks branch January 22, 2021 14:10
