Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: add remove tests for delete chunks #1620

Merged
merged 14 commits into from
Jan 22, 2021
Merged

test: add remove tests for delete chunks #1620

merged 14 commits into from
Jan 22, 2021

Conversation

CatStark
Copy link
Member

@CatStark CatStark commented Jan 7, 2021

First draft of tests for crud operations at chunk level.
This is just the first overview

@CatStark CatStark requested a review from a team as a code owner January 7, 2021 08:18
@CatStark CatStark marked this pull request as draft January 7, 2021 08:19
@CatStark CatStark removed the request for review from deepankarm January 7, 2021 08:19
@jina-bot jina-bot added size/S area/testing This issue/PR affects testing labels Jan 7, 2021
@CatStark CatStark requested review from cristianmtr and maximilianwerk and removed request for nan-wang January 7, 2021 08:19

@pytest.fixture
def config(tmpdir):
os.environ['JINA_TOPK_DIR'] = str(tmpdir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why TOPK_DIR?

@github-actions
Copy link

github-actions bot commented Jan 7, 2021

Latency summary

Current PR yields:

  • 🐢🐢 index QPS at 1224, delta to last 3 avg.: -13%
  • 😶 query QPS at 29, delta to last 3 avg.: +4%

Breakdown

Version Index QPS Query QPS
current 1224 29
0.9.19 1253 27
0.9.18 1593 27

Backed by latency-tracking. Further commits will update this comment.

@codecov
Copy link

codecov bot commented Jan 7, 2021

Codecov Report

Merging #1620 (fbb7e8c) into master (650f87e) will increase coverage by 0.14%.
The diff coverage is 94.68%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1620      +/-   ##
==========================================
+ Coverage   85.18%   85.33%   +0.14%     
==========================================
  Files         134      135       +1     
  Lines        6865     6893      +28     
==========================================
+ Hits         5848     5882      +34     
+ Misses       1017     1011       -6     
Impacted Files Coverage Δ
jina/enums.py 95.91% <ø> (ø)
jina/executors/metas.py 96.87% <ø> (ø)
jina/optimizers/discovery.py 80.26% <ø> (ø)
jina/peapods/pods/helper.py 96.80% <ø> (-0.10%) ⬇️
jina/peapods/runtimes/zmq/zed.py 89.92% <ø> (-1.44%) ⬇️
jina/optimizers/flow_runner.py 84.44% <75.00%> (-15.56%) ⬇️
jina/clients/sugary_io.py 96.36% <100.00%> (+0.53%) ⬆️
jina/executors/compound.py 89.31% <100.00%> (+3.71%) ⬆️
jina/executors/decorators.py 91.27% <100.00%> (ø)
jina/executors/indexers/vector.py 93.26% <100.00%> (+0.14%) ⬆️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8878ccd...a800012. Read the comment docs.

from jina import Document
from jina.flow import Flow

random.seed(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be put inside the config?

@jina-bot jina-bot added size/M and removed size/S labels Jan 7, 2021
@jina-bot jina-bot added area/core This issue/PR affects the core codebase area/docs This issue/PR affects the docs area/entrypoint This issue/PR affects the entrypoint codebase area/helper This issue/PR affects the helper functionality labels Jan 22, 2021
@jina-bot jina-bot added area/network This issue/PR affects network functionality component/peapod labels Jan 22, 2021
This reverts commit 898db65, reversing
changes made to e404781.
@jina-bot jina-bot added size/L and removed size/M labels Jan 22, 2021
CHANGELOG.md Outdated
@@ -10,6 +10,7 @@




Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this here as a change? Perhaps rebase?

def config(tmpdir):
random.seed(0)
np.random.seed(0)
os.environ['JINA_TOPK_DIR'] = str(tmpdir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why TOPK_DIR? Better name

with Flow.load_config(flow_file) as index_flow:
index_flow.index(
input_fn=document_generator(content_same=content_same, start=0, num_docs=num_docs, num_chunks=num_chunks))
validate_index_size(50) #5 chunks for each of the 10 docs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make the 50 dependent on the num_docs and num_chunks explicitly =num_docs*num_chunks

c_embedding = np.random.random([9])


def document_generator(content_same, start, num_docs, num_chunks):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

content_same is not relevant for your test, unless you plan on adding DocIDCache as well

['flow_vector.yml', False],
['flow_vector.yml', True]
])
def test_update_vector(config, mocker, flow_file, content_same):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, remove content_same It's irrelevant without the cache

@CatStark CatStark removed the request for review from maximilianwerk January 22, 2021 12:20
@CatStark CatStark marked this pull request as ready for review January 22, 2021 12:33
@jina-bot jina-bot added size/M and removed size/L labels Jan 22, 2021
random.seed(0)
np.random.seed(0)
os.environ['JINA_CRUD_CHUNKS'] = str(tmpdir)
os.environ['JINA_TOPK'] = '10'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
os.environ['JINA_TOPK'] = '10'
os.environ['JINA_TOPK'] = str(TOP_K)

['flow_vector.yml', False],
['flow_vector.yml', True]
])
def test_delete_vector(config, mocker, flow_file, content_same):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I mentioned in the previous version, but again: content_same should be removed. It's unnecessary without the cache in front

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll make the changes now

with Flow.load_config(flow_file) as index_flow:
index_flow.index(
input_fn=document_generator(content_same=content_same, start=0, num_docs=num_docs, num_chunks=num_chunks))
validate_index_size(50) #5 chunks for each of the 10 docs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove the comment and

Suggested change
validate_index_size(50) #5 chunks for each of the 10 docs
validate_index_size(num_chunks*num_docs)

to make it clear

Copy link
Contributor

@cristianmtr cristianmtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor things

@hanxiao hanxiao merged commit e29734b into master Jan 22, 2021
@hanxiao hanxiao deleted the test-chunks branch January 22, 2021 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core This issue/PR affects the core codebase area/docs This issue/PR affects the docs area/entrypoint This issue/PR affects the entrypoint codebase area/helper This issue/PR affects the helper functionality area/network This issue/PR affects network functionality area/testing This issue/PR affects testing component/peapod size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants