Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor pattern matching [cps/tasks/download.py: parse yt-dlp output for download progress percentage] #88

Merged
merged 5 commits into from
Jan 11, 2024

Conversation

deldesir
Copy link
Collaborator

@deldesir deldesir commented Jan 10, 2024

These changes enhance the download task in several ways:

  • The download progress is now calculated dynamically based on the subprocess output
  • Updated the pattern to identify progress updates more reliably during the download process.
  • Cleaned up the code by removing redundant lines. Also, added the missing import statement for the 're' module.

Smoke and feature tested on Ubuntu 24.04

  • I reviewed the changes in the download task logic and subprocess handling.

  • I confirm that the dynamic progress calculation reflects the download progress.

  • I verified the improved subprocess output pattern for reliable progress tracking.

  • I checked the adjusted completion condition for 99% progress at 100% subprocess completion

Testing Steps in Calibre-Web UI

  • Initiate a download task for a YouTube url.
  • Monitor the progress updates in the tasks' view.
  • Verify that the progress is accurately calculated and reaches 100% upon successful completion.

Adjusted the download progress polling to dynamically capture progress updates from the subprocess output, enhancing accuracy.
@deldesir deldesir requested a review from holta January 10, 2024 19:16
@deldesir deldesir self-assigned this Jan 10, 2024
@deldesir deldesir marked this pull request as draft January 10, 2024 19:16
@holta holta added the enhancement New feature or request label Jan 10, 2024
Fix indentation
@deldesir deldesir marked this pull request as ready for review January 10, 2024 19:56
Copy link
Member

@holta holta Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deldesir Special Request:

Please communicate in 3-6 words the central purpose of each commit.

Not in the hidden secondary commit message. But in the actual commit message.

With non-descriptive commit messages like "Update download.py" it becomes unnecessarily complicated for everyone to understand (1) what is happening and (2) why. 😕

Goal: Commit Messages should Communicate Clear Intent! ✅

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I thought I wrote "Fix indentation"

Copy link
Collaborator Author

@deldesir deldesir Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I wrote it in description field. I thought I was putting the commit message in the title. It was an accident, sorry.

Copy link
Member

@holta holta Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everyone benefits when motivation is clear & concise e.g. anything like:

  • "cps/tasks/download.py: Fix indentation"

  • "download.py: Fix indenting in ~30 lines"

(Or anything similar!)

else:
log.error("Failed to send the list of requested files to %s", self.original_url)
self.progress = 0.99
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I record the first 100% progress cycle as %99 so the script won't stop before yt-dlp finishes. There is often one or more quick 0%-100% cycles after the first one (for example the thumbnail and/or subtitles downloads)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely don't understand it yet, ah well 😎

But thanks so much for explaining!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I record the first 100% progress cycle as %99 so the script won't stop before yt-dlp finishes. There is often one or more quick 0%-100% cycles after the first one (for example the thumbnail and/or subtitles downloads)

@deldesir ideally put some-or-all of this explanation in-line as a comment, near the Python code in question?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put this explanation (or similar) into a comment near the actual Python code.

Then let's move forward with merging this PR.

Thanks!

@holta
Copy link
Member

holta commented Jan 10, 2024

Has this PR been quickly tested?

If so we should feel free to merge it, to keep community momentum!

@deldesir
Copy link
Collaborator Author

Yes, it is well tested.

Smoke and feature tested on Ubuntu 24.04

  • I reviewed the changes in the download task logic and subprocess handling.
  • I confirm that the dynamic progress calculation reflects the download progress.
  • I verified the improved subprocess output pattern for reliable progress tracking.
  • I checked the adjusted completion condition for 99% progress at 100% subprocess completion

Testing Steps in Calibre-Web UI

  • Initiate a download task for a YouTube url.
  • Monitor the progress updates in the tasks' view.
  • Verify that the progress is accurately calculated and reaches 100% upon successful completion.

@deldesir
Copy link
Collaborator Author

deldesir commented Jan 10, 2024

while p.poll() is None:
line = p.stdout.readline()
if line:
if pattern_progress in line:
percentage = int(re.search(r'\d+', line).group())
if percentage < 100:
self.progress = percentage / 100
else:
self.progress = 0.99
p.wait()

The first 100% progress cycle is recorded as %99. This prevent the script from being stopped before yt-dlp finishes. There is often one or more quick 0%-100% cycles after the first one (for example the thumbnail and/or subtitles downloads)

@@ -41,44 +42,42 @@ def run(self, worker_thread):
p = process_open(subprocess_args, newlines=True)

# Define the pattern for the subprocess output
pattern_analyze = r"Running ANALYZE"
pattern_download = r"'action': 'download'"
pattern_progress = r"downloading"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word "downloading" is an extremely common word, that might appear anywhere on a line in future.

Does the following make the regex a bit more precise, and future-proof?

Suggested change
pattern_progress = r"downloading"
# Equivalent Regex: https://github.com/iiab/calibre-web/blob/3c3c77f4dbf54ff093c3e3a68ac9b93a522c4070/scripts/lb-wrapper#L26
pattern_progress = r"^downloading"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use additional testing! Thanks @deldesir and @EMG70 if you can reconfirm Line 46 (now merged) is working well!

Copy link
Member

@holta holta Jan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad News:

Regex "^downloading" was being used incorrectly yesterday. Functional testing didn't happen when it should have (to catch this serious oversight). ❌

Good News:

PR's like #90, #91 and #93 are getting much smaller and hence tangibly clearer ~ "TDD Craftsmanship" Organic + Incremental + Visible ~ nurturing a Responsive Culture of Collaboration.

By learning from our mistakes (the only way!) one day at a time 🌱 🌱 🌱

Co-authored-by: A Holt <holta@users.noreply.github.com>
@holta
Copy link
Member

holta commented Jan 11, 2024

Legibility = Understandability needs to be the central driving force going forward:

  • Sweeping indentation changes make PR's almost impossible to read ❌

  • Large-scale indentation changes need to be moved to a completely separate PR in future ✔️

Thank @deldesir for taking approachable (small) PR's very seriously!

AI² = Articulate Assumptions, Intent & Intuition

@holta holta changed the title Refactor pattern matching Refactor pattern matching [cps/tasks/download.py: parse yt-dlp output for download progress percentage] Jan 11, 2024
Co-authored-by: A Holt <holta@users.noreply.github.com>
@holta holta merged commit d594aeb into iiab:master Jan 11, 2024
@holta
Copy link
Member

holta commented Jan 11, 2024

@deldesir any quick tip(s) you can provide to @EMG70 to help him reconfirm / validate this just-merged PR?

e.g. Is there anything specific he should look for in "Tasks" view today or in coming days?

# Define the pattern for the subprocess output
# Equivalent Regex: https://github.com/iiab/calibre-web/blob/3c3c77f4dbf54ff093c3e3a68ac9b93a522c4070/scripts/lb-wrapper#L26
pattern_progress = r"^downloading"

Related:

@holta holta mentioned this pull request Jan 11, 2024
@deldesir
Copy link
Collaborator Author

@deldesir any quick tip(s) you can provide to @EMG70 to help him reconfirm / validate this just-merged PR?

A positive scenario testing should finish with 100% recorded in tasks' progress column.

e.g. Is there anything specific he should look for in "Tasks" view today or in coming days?

We will work today on tasks' messages printed in Task column (this is a discussion for another PR)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants