test: add remove tests for delete chunks #1620
@pytest.fixture
def config(tmpdir):
    os.environ['JINA_TOPK_DIR'] = str(tmpdir)
Why TOPK_DIR?
Codecov Report
@@ Coverage Diff @@
## master #1620 +/- ##
==========================================
+ Coverage 85.18% 85.33% +0.14%
==========================================
Files 134 135 +1
Lines 6865 6893 +28
==========================================
+ Hits 5848 5882 +34
+ Misses 1017 1011 -6
from jina import Document
from jina.flow import Flow

random.seed(0)
Can this be put inside the config fixture?
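One way to read this suggestion: seed the random generator inside the shared setup so that every test starts from the same state. The sketch below uses only the standard library; `configure_test_env` and `sample_values` are hypothetical names, and in the actual test file the body of `configure_test_env` would presumably live inside the pytest `config` fixture.

```python
import os
import random


def configure_test_env(workdir, seed=0):
    """Hypothetical helper: seed the RNG and point the test at its workspace."""
    random.seed(seed)
    os.environ['JINA_TOPK_DIR'] = str(workdir)


def sample_values(n=3):
    # Stand-in for any test code that draws random numbers.
    return [random.random() for _ in range(n)]


configure_test_env('/tmp/run_a')
first = sample_values()
configure_test_env('/tmp/run_b')
second = sample_values()
# Both runs start from the same seed, so they draw the same values.
assert first == second
```

Keeping the seed next to the environment setup means a test cannot pick up the workspace without also picking up the deterministic random state.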
This reverts commit 31fbe64.
CHANGELOG.md (outdated)
@@ -10,6 +10,7 @@
Why is this here as a change? Perhaps rebase?
def config(tmpdir):
    random.seed(0)
    np.random.seed(0)
    os.environ['JINA_TOPK_DIR'] = str(tmpdir)
Why TOPK_DIR? Please pick a better name.
with Flow.load_config(flow_file) as index_flow:
    index_flow.index(
        input_fn=document_generator(content_same=content_same, start=0, num_docs=num_docs, num_chunks=num_chunks))
validate_index_size(50)  # 5 chunks for each of the 10 docs
I would make the 50 depend explicitly on num_docs and num_chunks: num_docs * num_chunks.
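The point can be illustrated without Jina. In this sketch, `make_docs` and `count_indexed_chunks` are hypothetical stand-ins for `document_generator` and for whatever `validate_index_size` inspects; documents are plain dicts, not Jina `Document` objects.

```python
def make_docs(start, num_docs, num_chunks):
    """Hypothetical stand-in: yield docs, each carrying num_chunks chunks."""
    for doc_id in range(start, start + num_docs):
        chunks = [{'id': f'{doc_id}-{c}'} for c in range(num_chunks)]
        yield {'id': doc_id, 'chunks': chunks}


def count_indexed_chunks(docs):
    # Stand-in for the index: every chunk becomes exactly one entry.
    return sum(len(doc['chunks']) for doc in docs)


num_docs, num_chunks = 10, 5
indexed = count_indexed_chunks(make_docs(0, num_docs, num_chunks))
# The expectation follows the parameters instead of a hard-coded 50.
assert indexed == num_docs * num_chunks
```

With the expected size derived from the parameters, changing `num_docs` or `num_chunks` in the test does not leave a stale magic number behind.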
c_embedding = np.random.random([9])


def document_generator(content_same, start, num_docs, num_chunks):
content_same is not relevant for your test, unless you plan on adding DocIDCache as well.
    ['flow_vector.yml', False],
    ['flow_vector.yml', True]
])
def test_update_vector(config, mocker, flow_file, content_same):
Again, remove content_same. It's irrelevant without the cache.
9faee62 to 87a358e
random.seed(0)
np.random.seed(0)
os.environ['JINA_CRUD_CHUNKS'] = str(tmpdir)
os.environ['JINA_TOPK'] = '10'
Suggested change:
- os.environ['JINA_TOPK'] = '10'
+ os.environ['JINA_TOPK'] = str(TOP_K)
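A sketch of what the suggestion amounts to, assuming `TOP_K` is a module-level constant in the test file (its definition is not shown in the diff, and the value 10 is taken from the hard-coded line being replaced). `set_topk_env` and `check_result_count` are hypothetical helpers used only for illustration.

```python
import os

TOP_K = 10  # assumed value, taken from the hard-coded '10' being replaced


def set_topk_env():
    # Environment variables must be strings, hence str(TOP_K).
    os.environ['JINA_TOPK'] = str(TOP_K)


def check_result_count(matches):
    # Hypothetical assertion helper that reuses the same constant.
    return len(matches) <= TOP_K


set_topk_env()
# The environment and the assertions now share one source of truth.
assert int(os.environ['JINA_TOPK']) == TOP_K
```

If the limit ever changes, only the constant needs editing; the environment setup and any result-count checks stay in sync.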
    ['flow_vector.yml', False],
    ['flow_vector.yml', True]
])
def test_delete_vector(config, mocker, flow_file, content_same):
I think I mentioned this on the previous version, but again: content_same should be removed. It's unnecessary without the cache in front.
Yes, I'll make the changes now
with Flow.load_config(flow_file) as index_flow:
    index_flow.index(
        input_fn=document_generator(content_same=content_same, start=0, num_docs=num_docs, num_chunks=num_chunks))
validate_index_size(50)  # 5 chunks for each of the 10 docs
I would remove the comment and compute the size from the parameters, to make it clear.
Suggested change:
- validate_index_size(50) #5 chunks for each of the 10 docs
+ validate_index_size(num_chunks*num_docs)
Some minor things.
First draft of tests for CRUD operations at chunk level.
This is just a first overview.