Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: sharding non existent #1719

Merged
merged 11 commits into from
Jan 22, 2021
Merged

test: sharding non existent #1719

merged 11 commits into from
Jan 22, 2021

Conversation

florian-hoenicke
Copy link
Member

Needs to be fixed. We insert 201 documents but only get 101 back.

@jina-bot jina-bot added size/S area/testing This issue/PR affects testing labels Jan 17, 2021
@codecov
Copy link

codecov bot commented Jan 17, 2021

Codecov Report

Merging #1719 (5363d25) into master (75f9569) will increase coverage by 1.94%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1719      +/-   ##
==========================================
+ Coverage   83.32%   85.27%   +1.94%     
==========================================
  Files         134      134              
  Lines        6878     6884       +6     
==========================================
+ Hits         5731     5870     +139     
+ Misses       1147     1014     -133     
Impacted Files Coverage Δ
jina/drivers/convert.py 100.00% <ø> (+3.70%) ⬆️
jina/drivers/evaluate.py 98.27% <ø> (+1.72%) ⬆️
jina/drivers/predict.py 89.65% <ø> (-0.18%) ⬇️
jina/drivers/reduce.py 100.00% <100.00%> (ø)
jina/logging/profile.py 70.40% <0.00%> (+3.20%) ⬆️
jina/parsers/helloworld.py 100.00% <0.00%> (+3.84%) ⬆️
jina/helloworld/helper.py 90.52% <0.00%> (+90.52%) ⬆️
jina/helloworld/__init__.py 100.00% <0.00%> (+100.00%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 75f9569...8f790f4. Read the comment docs.

@github-actions
Copy link

github-actions bot commented Jan 17, 2021

Latency summary

Current PR yields:

  • 🐢🐢 index QPS at 1199, delta to last 3 avg.: -12%
  • 😶 query QPS at 27, delta to last 3 avg.: +2%

Breakdown

Version Index QPS Query QPS
current 1199 27
0.9.19 1186 26
0.9.18 1565 25

Backed by latency-tracking. Further commits will update this comment.

@JoanFM
Copy link
Member

JoanFM commented Jan 17, 2021

This is expected as the KVSearchDriver removes the non-found documents.

@florian-hoenicke
Copy link
Member Author

This is expected as the KVSearchDriver removes the non-found documents.

In this case we index random_docs(0, 201) and query random_docs(0, 220) but just get 101 documents back. Expected would be to get 201 documents.

@JoanFM
Copy link
Member

JoanFM commented Jan 18, 2021

This is expected as the KVSearchDriver removes the non-found documents.

In this case we index random_docs(0, 201) and query random_docs(0, 220) but just get 101 documents back. Expected would be to get 201 documents.

The expected result is that u will get in output as many documents as they are found in the KVIndex. Do the 220 documents guarantee that 201 of them have id that had been previously indexed?

@JoanFM
Copy link
Member

JoanFM commented Jan 18, 2021

This is expected as the KVSearchDriver removes the non-found documents.

In this case we index random_docs(0, 201) and query random_docs(0, 220) but just get 101 documents back. Expected would be to get 201 documents.

And if it is the case, u need uses_after to merge all the documents at root level, so that the Request is rebuilt.

What is happening in this test is that, the first shard finishing returns its results. And that is why u just get a portion, what u would maybe see is 2 respones

@florian-hoenicke
Copy link
Member Author

The expected result is that u will get in output as many documents as they are found in the KVIndex. Do the 220 documents guarantee that 201 of them have id that had been previously indexed?

Yes

And if it is the case, u need uses_after to merge all the documents at root level, so that the Request is rebuilt.

Good point. However, I tried it out with uses_after='_merge', as well and it did not work.

@jina-bot jina-bot added the area/core This issue/PR affects the core codebase label Jan 18, 2021
@JoanFM
Copy link
Member

JoanFM commented Jan 18, 2021

The expected result is that u will get in output as many documents as they are found in the KVIndex. Do the 220 documents guarantee that 201 of them have id that had been previously indexed?

Yes

And if it is the case, u need uses_after to merge all the documents at root level, so that the Request is rebuilt.

Good point. However, I tried it out with uses_after='_merge', as well and it did not work.

what traversal paths are used there?

@florian-hoenicke
Copy link
Member Author

As discussed with @JoanFM supporting sharding for plain kv-indexers would be usefull but still requires some conceptual thinking and might take some time to implement.
In this example we use polling all to retrieve results from two shards.
The problem is that the ReduceDriver seems not to work on document level but only on matches.

@JoanFM
Copy link
Member

JoanFM commented Jan 18, 2021

As discussed with @JoanFM supporting sharding for plain kv-indexers would be usefull but still requires some conceptual thinking and might take some time to implement.
In this example we use polling all to retrieve results from two shards.
The problem is that the ReduceDriver seems not to work on document level but only on matches.

ReduceAllDriver ...

@maximilianwerk maximilianwerk marked this pull request as ready for review January 21, 2021 15:57
@maximilianwerk maximilianwerk requested a review from a team as a code owner January 21, 2021 15:57
def _apply_root(
self,
docs: 'DocumentSet',
field: str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we say is None, should we also hide field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

field contains "docs".
The context_doc is set to None

docs.append(doc)
request = self.msg.request
request.body.ClearField(field)
request.docs.extend(docs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I guess you saw the problem I had?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoanFM The problem was that the docs were stored in the body.docs as well as in docs?

JoanFM
JoanFM previously approved these changes Jan 21, 2021
Copy link
Contributor

@cristianmtr cristianmtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some quick clarifications

@property
def docs(self):
if self.expect_parts > 1:
return (d for r in reversed(self.partial_reqs) for d in r.docs)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoanFM do we reverse the requests in order to wait for the last one to arrive?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was like this, so to be honest not sure why it was there in the first place

@florian-hoenicke
Copy link
Member Author

@hanxiao could you review this?
On this pr @JoanFM @maximilianwerk and me worked collaboratively together.
Means no one of us can approve it.

@hanxiao hanxiao merged commit 86853bf into master Jan 22, 2021
@hanxiao hanxiao deleted the test-sharding-non-existent branch January 22, 2021 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core This issue/PR affects the core codebase area/testing This issue/PR affects testing component/driver component/resource size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants