Make progress indicator more predictable #286

Open
mvglasow opened this issue Aug 17, 2022 · 0 comments

Running duperemove gives a percentage indicator, which, however, presents two surprises:

  • The percentage indicator is limited to the indexing phase. When it reaches 100%, it doesn’t mean everything is deduplicated – we’re just moving on to the next phase. (Office Space, anyone?)
  • Percentages for the indexing phase seem to be calculated exclusively on file count, not on block count. This may lead to surprises if the big files are concentrated underneath one particular directory: once indexing enters or leaves that directory (say, around the 50% mark), progress will seem to speed up or slow down dramatically.

For first-time users, this can be misleading. If progress reaches 10% after a day, one might expect the whole process to take 10 days, only to find 19 days later that indexing is still in progress, and another 2 days later that indexing is just one out of two or three phases.

The holy grail of UX for progress would be a near-steady rate, though I understand that may be difficult, depending on circumstances.

Suggestions:

  • Calculate the total number of files as well as the total number of blocks – should be fairly easy to do, just add up file sizes in blocks.
  • For index progress, calculate the average of file and block progress. For example, after processing 80% of files but only 20% of blocks, progress should be 50% (currently 80%).
  • To account for the total number of phases:
    • Easy: display something like phase 1/3, 80%
    • Advanced: guesstimate the percentage of total time each phase will take, and scale accordingly. For example, indexing might go from 0% to 40%, loading indexes from 40% to 60%, and actual deduplication from 60% to 100%.
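
The first two suggestions could look roughly like the sketch below. It is written in Python for brevity and is not duperemove code; the names `blocks_in_file` and `index_progress`, their parameters, and the 128 KiB block size used as a default are assumptions made for illustration.

```python
def blocks_in_file(size_bytes, block_size=128 * 1024):
    """Number of blocks needed to cover a file (block_size is an assumed default)."""
    # Ceiling division: a partially filled last block still counts as one block.
    return -(-size_bytes // block_size)


def index_progress(files_done, files_total, blocks_done, blocks_total):
    """Indexing progress in [0.0, 1.0] as the mean of file and block progress."""
    # An empty dimension (no files or no blocks) is treated as already complete.
    file_frac = files_done / files_total if files_total else 1.0
    block_frac = blocks_done / blocks_total if blocks_total else 1.0
    return (file_frac + block_frac) / 2


if __name__ == "__main__":
    # The example from the issue: 80% of files but only 20% of blocks processed.
    print(f"{index_progress(80, 100, 20, 100):.0%}")
```

The total block count would be the sum of `blocks_in_file(size)` over all files found during the initial scan, so it is known before indexing starts.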

For the last point, the advanced solution is probably suitable only if the duration of phases relative to each other is somewhat predictable from the moment indexing starts (else, extrapolations based on progress will become unreliable again). The easy solution is probably preferable if the duration of the phases is highly variable and cannot be predicted from the start.
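
Both variants of the phase handling can be sketched as follows. This is an illustration in Python, not duperemove code; the function names are hypothetical, and the weights (40%, 20%, 40%) are simply the example figures from the suggestion above, not measured values.

```python
# Assumed share of total run time per phase: indexing, loading indexes, deduplication.
PHASE_WEIGHTS = (0.4, 0.2, 0.4)


def easy_label(phase_index, phase_count, phase_frac):
    """Easy variant: report the phase number and the progress within that phase."""
    return f"phase {phase_index + 1}/{phase_count}, {round(phase_frac * 100)}%"


def overall_progress(phase_index, phase_frac, weights=PHASE_WEIGHTS):
    """Advanced variant: scale per-phase progress into one overall fraction."""
    # Everything before the current phase counts as fully done.
    completed = sum(weights[:phase_index])
    return completed + weights[phase_index] * phase_frac


if __name__ == "__main__":
    print(easy_label(0, 3, 0.8))
    print(f"{overall_progress(1, 0.5):.0%}")
```

With these weights, being halfway through the second phase maps to 40% + 0.5 × 20% = 50% overall. If the weights are poor guesses, the overall figure misleads in the same way as the current indicator, which is the argument for the easy variant.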
