Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report shard transfer progress #3555

Merged
merged 5 commits into from
Feb 23, 2024
Merged

Report shard transfer progress #3555

merged 5 commits into from
Feb 23, 2024

Conversation

xzfc
Copy link
Contributor

@xzfc xzfc commented Feb 7, 2024

Closes #3433.

The shard progress state is tracked locally inside TransferTasksPool (aka Collection::transfer_tasks).
Since it's local, only the sender node shows the progress in the result of GET collections/{name}/cluster.

Output of GET collections/{name}/cluster:

{"result": {
  "shard_transfers": [
    {
      "shard_id": 1,
      "from": 3351940210608703,
      "to": 7522668664620757,
      "sync": false,
      "method": "stream_records",
      ↓ ↓ ↓   This field is added   ↓ ↓ ↓
      "comment": "Transferring records (136000/193725), started 15s ago, ETA: 6.83s"
      ↑ ↑ ↑   This field is added  ↑ ↑ ↑
    }
  ],
  "peer_id": 3351940210608703,
  "local_shards": [
    {
      "shard_id": 1,
      "points_count": 178839,
      "state": "Active"
    }, 
  ],
  "remote_shards": [
    {
      "shard_id": 1,
      "peer_id": 7522668664620757,
      "state": "Partial"
    }, 
  ], 
}, }

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

@xzfc xzfc marked this pull request as draft February 7, 2024 02:37
@xzfc xzfc requested a review from timvisee February 7, 2024 02:43
@xzfc xzfc marked this pull request as ready for review February 8, 2024 21:38
@xzfc xzfc changed the title [WIP] Report shard transfer progress Report shard transfer progress Feb 8, 2024
lib/collection/src/shards/transfer/stream_records.rs Outdated Show resolved Hide resolved
lib/collection/src/common/eta_calculator.rs Outdated Show resolved Hide resolved
@xzfc xzfc marked this pull request as draft February 14, 2024 01:10
@xzfc xzfc force-pushed the 3433-transfer-progress branch 2 times, most recently from 2827213 to edc2e1a Compare February 19, 2024 09:51
@xzfc xzfc marked this pull request as ready for review February 19, 2024 09:52
@xzfc xzfc requested a review from timvisee February 19, 2024 09:52
@agourlay
Copy link
Member

"eta_seconds": 73.496774076

I think truncating to two decimals should be enough precision :)

@xzfc
Copy link
Contributor Author

xzfc commented Feb 20, 2024

I think truncating to two decimals should be enough precision :)

Done.

Copy link
Member

@agourlay agourlay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tests and switching to the ringbuffer crate 👍

@xzfc
Copy link
Contributor Author

xzfc commented Feb 22, 2024

After a discussion, switched to a human-readable text format, e.g.

"comment": "Transferring records (136000/193725), started 15s ago, ETA: 6.83s"

@agourlay agourlay self-requested a review February 22, 2024 18:35
@agourlay
Copy link
Member

After a discussion, switched to a human-readable text format, e.g.

Can you please share why this decision was taken? :)

@xzfc
Copy link
Contributor Author

xzfc commented Feb 22, 2024

@agourlay
The text-based format is enough for debugging purposes, and it is flexible enough to cover other transfer types.
OTOH, a strict schema will be too complicated and we don't see any meaningful way to use it in automation.

Like, we can't just to put these values on Grafana dashboard: if we aggregate on points_transferred and points_total over all transfers, we'll get saw graphs which is not very informative.

We want to add total active transfers count metric later to put it on Grafana, but it's outside scope of this PR.

Also, while previous schema was fine for record streaming, the delta/snapshot transfer will complicate reporting by introducing steps (i.e. first download a blob, then transfer records), and we do not want to complicate the schema because of it.

We considered these alternatives:

  • Just percentage.
    { "progress": 35.0 }
    
  • Generic counter name.
    { "items_transferred": 123, "items_total": 12345 }
    
  • Multiple counters.
    { "points_transferred": …, "points_total": …,
      "bytes_transferred”: …, “bytes_total”: … }
    
  • Dynamic JSON object. e.g. progress: Map<String, ProgressItem>
    { "points": { "transferred": 30, "total": 12345 },
      "snapshot": { "transferred": 12345, "total": 12345 },
      "…": { … }
    }
    
  • String just for users, not for machines.
    { "progress": "Step 2/4: Transferring records (123/12345)…" }
    
  • Steps counter.
    { "step_current": 2,
      "step_total": 4,
      "step_progress_current": 123,
      "step_progress_total": 12345 }
    

Copy link
Member

@timvisee timvisee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌

@xzfc xzfc merged commit ad606e7 into dev Feb 23, 2024
17 checks passed
@xzfc xzfc deleted the 3433-transfer-progress branch February 23, 2024 10:40
timvisee pushed a commit that referenced this pull request Mar 5, 2024
* Shard transfer progress

* Add test

* Round to two decimals

* Use ringbuffer crate

* Text-based human readable report
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants