Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(binarypb): delete on dump #2102

Merged
merged 1 commit into from Mar 5, 2021
Merged

feat(binarypb): delete on dump #2102

merged 1 commit into from Mar 5, 2021

Conversation

cristianmtr
Copy link
Contributor

@cristianmtr cristianmtr commented Mar 2, 2021

  • basic tests pass
  • add benchmark

Benchmark

delete_on_dump=True, entries=100000. took 1.7688062191009521 seconds
delete_on_dump=False, entries=100000. took 0.001649618148803711 seconds

Code:

@pytest.mark.skipif('GITHUB_WORKFLOW' in os.environ, reason='skip the network test on github workflow')
@pytest.mark.parametrize('delete_on_dump', [True, False])
def test_binarypb_benchmark(test_metas, delete_on_dump):
    entries = 100000
    nr_to_update = 10000
    keys = np.arange(entries)
    values = np.random.randint(0, 10, size=entries).astype(bytes)

    with BinaryPbIndexer(metas=test_metas, delete_on_dump=delete_on_dump) as idxer:
        idxer.add(keys, values)
        idxer.save()
        assert idxer.size == entries
        save_abspath = idxer.save_abspath

    new_values = np.random.randint(0, 10, size=nr_to_update).astype(bytes)

    with BaseIndexer.load(save_abspath) as idxer:
        idxer.update(keys[:nr_to_update], new_values)
        time_now = time.time()
        idxer.save()

    time_end = time.time()
    print(f'{delete_on_dump=}, entries={entries}. took {time_end - time_now} seconds')

@jina-bot jina-bot added the size/M label Mar 2, 2021
@cristianmtr cristianmtr linked an issue Mar 2, 2021 that may be closed by this pull request
2 tasks
@jina-bot jina-bot added area/core This issue/PR affects the core codebase area/testing This issue/PR affects testing component/executor executor/indexer labels Mar 2, 2021
@cristianmtr cristianmtr requested a review from JoanFM March 2, 2021 10:57
@github-actions
Copy link

github-actions bot commented Mar 2, 2021

Latency summary

Current PR yields:

  • 😶 index QPS at 1117, delta to last 3 avg.: +0%
  • 😶 query QPS at 18, delta to last 3 avg.: -4%

Breakdown

Version Index QPS Query QPS
current 1117 18
1.0.7 1116 18
1.0.6 1111 18

Backed by latency-tracking. Further commits will update this comment.

@codecov
Copy link

codecov bot commented Mar 2, 2021

Codecov Report

Merging #2102 (1e537f6) into master (cb40b44) will decrease coverage by 22.12%.
The diff coverage is 40.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #2102       +/-   ##
===========================================
- Coverage   89.74%   67.61%   -22.13%     
===========================================
  Files         211      192       -19     
  Lines       11054    10531      -523     
===========================================
- Hits         9920     7121     -2799     
- Misses       1134     3410     +2276     
Flag Coverage Δ
daemon ?
jina 67.61% <40.00%> (-22.60%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
jina/executors/indexers/keyvalue.py 73.04% <40.00%> (-25.66%) ⬇️
jina/schemas/pod.py 0.00% <0.00%> (-100.00%) ⬇️
jina/parsers/ping.py 0.00% <0.00%> (-100.00%) ⬇️
jina/schemas/flow.py 0.00% <0.00%> (-100.00%) ⬇️
jina/schemas/meta.py 0.00% <0.00%> (-100.00%) ⬇️
jina/docker/helper.py 0.00% <0.00%> (-100.00%) ⬇️
jina/schemas/driver.py 0.00% <0.00%> (-100.00%) ⬇️
jina/parsers/hub/new.py 0.00% <0.00%> (-100.00%) ⬇️
jina/schemas/request.py 0.00% <0.00%> (-100.00%) ⬇️
jina/parsers/hub/list.py 0.00% <0.00%> (-100.00%) ⬇️
... and 126 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cb40b44...1e537f6. Read the comment docs.

@cristianmtr cristianmtr requested a review from JoanFM March 2, 2021 12:28
@cristianmtr cristianmtr force-pushed the feat-real-delete-kv branch 2 times, most recently from c2e896a to 76e090d Compare March 2, 2021 13:49
@cristianmtr cristianmtr marked this pull request as ready for review March 2, 2021 14:10
@cristianmtr cristianmtr requested a review from a team as a code owner March 2, 2021 14:10
@cristianmtr
Copy link
Contributor Author

#2102 (comment)

@JoanFM Refactoring the handlers to be context managers broke a lot of tests. The issue: the write handler needs to be initialized in different modes (append or create) depending on is_exist. But it's not. Need to investigate why. But I will revert this change from this PR and open a new one after this is merged.

nan-wang
nan-wang previously approved these changes Mar 3, 2021
Copy link
Member

@nan-wang nan-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM👍

JoanFM
JoanFM previously approved these changes Mar 3, 2021
@cristianmtr cristianmtr dismissed stale reviews from JoanFM and nan-wang via aac6e65 March 3, 2021 09:49
@jina-bot jina-bot added area/cicd This issue/PR affects the cicd pipeline area/housekeeping This issue/PR is housekeeping labels Mar 3, 2021
JoanFM
JoanFM previously approved these changes Mar 4, 2021
jina/executors/indexers/keyvalue.py Outdated Show resolved Hide resolved
@cristianmtr cristianmtr merged commit 3af2905 into master Mar 5, 2021
@cristianmtr cristianmtr deleted the feat-real-delete-kv branch March 5, 2021 11:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cicd This issue/PR affects the cicd pipeline area/core This issue/PR affects the core codebase area/housekeeping This issue/PR is housekeeping area/testing This issue/PR affects testing component/executor executor/indexer size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Permanently delete documents from storage
4 participants