perf: create DocumentArray index map lazily #3944

Merged
merged 4 commits into from Nov 18, 2021

alaeddine-13
Contributor

@alaeddine-13 alaeddine-13 commented Nov 17, 2021

closes https://github.com/jina-ai/internal-tasks/issues/279

results:

from jina import Document
from jina.logging.profile import TimeContext

doc = Document()
with TimeContext('appending chunks'):
    for _ in range(10000):
        doc.chunks.append(
            Document(
                text='hey',
                offset=0,
                weight=1.0,
                location=[1, 2],
            )
        )

on master:

appending chunks ...	appending chunks takes 31 seconds (31.40s)

on feature branch:

appending chunks ...	appending chunks takes 0 seconds (0.64s)

@github-actions github-actions bot added size/S area/core This issue/PR affects the core codebase component/type labels Nov 17, 2021
@codecov

codecov bot commented Nov 17, 2021

Codecov Report

Merging #3944 (c3ee85b) into master (3957743) will increase coverage by 0.04%.
The diff coverage is 96.15%.

@@            Coverage Diff             @@
##           master    #3944      +/-   ##
==========================================
+ Coverage   89.04%   89.08%   +0.04%     
==========================================
  Files         180      180              
  Lines       12629    12642      +13     
==========================================
+ Hits        11245    11262      +17     
+ Misses       1384     1380       -4     
| Flag | Coverage Δ |
|---|---|
| daemon | 43.41% <15.38%> (-0.03%) ⬇️ |
| jina | 87.32% <96.15%> (+0.04%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| jina/types/arrays/document.py | 92.48% <94.73%> (-0.27%) ⬇️ |
| jina/types/document/graph.py | 92.45% <100.00%> (ø) |
| jina/math/evaluation.py | 87.50% <0.00%> (-5.36%) ⬇️ |
| jina/peapods/zmq/__init__.py | 89.06% <0.00%> (-1.40%) ⬇️ |
| jina/peapods/runtimes/zmq/zed.py | 91.32% <0.00%> (-0.46%) ⬇️ |
| jina/helper.py | 83.27% <0.00%> (+0.17%) ⬆️ |
| jina/peapods/peas/__init__.py | 87.23% <0.00%> (+2.12%) ⬆️ |
| jina/types/arrays/mixins/evaluation.py | 82.35% <0.00%> (+2.35%) ⬆️ |
| jina/peapods/pods/compound.py | 90.05% <0.00%> (+3.31%) ⬆️ |
| jina/peapods/stream/client.py | 90.90% <0.00%> (+12.12%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b4f6d5c...c3ee85b.

@github-actions

github-actions bot commented Nov 17, 2021

Latency summary

Current PR yields:

  • 🐎🐎🐎🐎 index QPS at 1467, delta to last 2 avg.: +26%
  • 🐎🐎🐎🐎 query QPS at 65, delta to last 2 avg.: +26%
  • 🐎🐎🐎🐎 dam extend QPS at 42484, delta to last 2 avg.: +31%
  • 🐎🐎🐎🐎 avg flow time within 1.4928 seconds, delta to last 2 avg.: +0%
  • 🐢🐢 import jina within 0.387 seconds, delta to last 2 avg.: -17%

Breakdown

| Version | Index QPS | Query QPS | DAM Extend QPS | Avg Flow Time (s) | Import Time (s) |
|---|---|---|---|---|---|
| current | 1467 | 65 | 42484 | 1.4928 | 0.387 |
| 2.4.5 | 1224 | 53 | 33468 | 1.5396 | 0.4513 |
| 2.4.4 | 1094 | 48 | 31193 | 1.4183 | 0.4912 |

Backed by latency-tracking. Further commits will update this comment.

self._id_to_index = None

@property
def id_to_index(self) -> Dict:
Member

make it private

Contributor Author

it's used in jina/types/document/graph.py so it can't be private (IDE used to show a warning)

Member

let's still make it private and ignore graphDoc for now. graphDoc has been removed from our docs and will very likely be deprecated in 3.0

Contributor Author

changed
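
For readers following this thread, here is a minimal sketch of the lazy pattern under discussion. It is not the code from this PR: `LazyIndexedList`, `append`, and `index_of` are hypothetical names on a toy container, and only `_id_to_index` and `_index_map` mirror names that appear in the conversation. The assumption is that the id-to-offset map stays `None` until something asks for it, so appends on a fresh container do no map work.

```python
from typing import Dict, List, Optional


class LazyIndexedList:
    """Toy container for illustration; NOT jina's DocumentArray."""

    def __init__(self) -> None:
        self._ids: List[str] = []
        # None means "map not built yet"; nothing is computed up front.
        self._id_to_index: Optional[Dict[str, int]] = None

    @property
    def _index_map(self) -> Dict[str, int]:
        # Build the id -> offset map on first access only.
        if self._id_to_index is None:
            self._id_to_index = {id_: i for i, id_ in enumerate(self._ids)}
        return self._id_to_index

    def append(self, id_: str) -> None:
        # If the map already exists, keep it in sync with one O(1) update;
        # otherwise skip map bookkeeping entirely.
        if self._id_to_index is not None:
            self._id_to_index[id_] = len(self._ids)
        self._ids.append(id_)

    def index_of(self, id_: str) -> int:
        # Lookup by id goes through the private, lazily built map.
        return self._index_map[id_]
```

With this shape, a loop of appends never touches the map, and the first lookup by id pays the one-time cost of building it.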

@alaeddine-13 alaeddine-13 changed the title feat: create DocumentArray index map lazily perf: create DocumentArray index map lazily Nov 17, 2021
bwanglzu
bwanglzu previously approved these changes Nov 17, 2021
Member

@bwanglzu bwanglzu left a comment

LGTM! i also think we can safely ignore graph document at the moment

Member

@JoanFM JoanFM left a comment

Add a test in the types tests that makes sure a DocumentArray has _id_to_index set to None at the beginning?

@alaeddine-13 alaeddine-13 marked this pull request as ready for review November 18, 2021 07:20
@alaeddine-13
Contributor Author

Add a test in the types tests that makes sure a DocumentArray has _id_to_index set to None at the beginning?

added

@github-actions github-actions bot added the area/testing This issue/PR affects testing label Nov 18, 2021
assert da._id_to_index is None

# build index map
assert len(da._index_map.keys()) == 100
Member

I would change the test to show that the _index_map is not None after accessing by id

Contributor Author

done
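
One way the requested check could look, sketched against a stand-in class so it runs without jina installed. `TinyArray` and the test name are hypothetical, and the real test in this PR may differ. The point is the order of assertions: the map is `None` before any access by id and populated afterwards.

```python
from typing import Dict, List, Optional


class TinyArray:
    """Hypothetical stand-in for a DocumentArray-like container."""

    def __init__(self, ids: List[str]) -> None:
        self._ids = list(ids)
        self._id_to_index: Optional[Dict[str, int]] = None

    def __getitem__(self, id_: str) -> int:
        # Access by id triggers the one-time map construction.
        if self._id_to_index is None:
            self._id_to_index = {v: i for i, v in enumerate(self._ids)}
        return self._id_to_index[id_]


def test_index_map_is_built_on_first_id_access() -> None:
    da = TinyArray([f'id-{i}' for i in range(100)])
    # Nothing is built at construction time.
    assert da._id_to_index is None
    # Access by id returns the right offset...
    assert da['id-7'] == 7
    # ...and leaves a fully populated map behind.
    assert da._id_to_index is not None
    assert len(da._id_to_index) == 100
```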

Member

@JoanFM JoanFM left a comment

LGTM! Great job!

@JoanFM JoanFM merged commit ea5519a into master Nov 18, 2021
@JoanFM JoanFM deleted the feat-lazy-index-map branch November 18, 2021 12:06
Labels
area/core This issue/PR affects the core codebase area/testing This issue/PR affects testing component/type size/S

4 participants