feat: Add overview extraction for myvideo extractor #29
Merged
This change implements extraction of episode overviews for the myvideo.net.tw extractor. The extractor now parses and stores the overview text found in `span[class='episodeIntro movieIntro'] blockquote`. It has also been made more resilient: when season-level information (such as the title and overall description) is missing from the page, it logs the problem and proceeds with episode data extraction.
Pull Request Overview
Adds extraction of episode overview and improves resilience when season-level data is missing.
- Wraps season-level selectors in a try/except to continue on missing title or description
- Extracts episode overview text from the `span.episodeIntro blockquote` element
- Leaves `episode_overview` blank if the blockquote is absent
Comments suppressed due to low confidence (2)
tmdb-import/extractors/myvideo.py:47
- New behavior for extracting `episode_overview` should be covered by tests, including cases where the blockquote is present and absent.
episode_overview_element = episode.find_element(By.CSS_SELECTOR, value="span[class='episodeIntro movieIntro'] blockquote")
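Tests for the present and absent cases could look roughly like the sketch below. It assumes the lookup is factored into a hypothetical `extract_episode_overview(episode)` helper that uses the `find_elements` approach the reviewer suggests for lines +47 to +48. `FakeElement` is a hypothetical stand-in for a Selenium `WebElement`, and the selector type is written as the string `"css selector"` (the value of `By.CSS_SELECTOR`) so the sketch runs without a browser or Selenium installed.

```python
import unittest

# Same selector the PR uses for the episode overview.
OVERVIEW_SELECTOR = "span[class='episodeIntro movieIntro'] blockquote"
# String value of selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def extract_episode_overview(episode):
    """Hypothetical helper: return the overview text, or "" if the blockquote is absent."""
    elements = episode.find_elements(CSS_SELECTOR, OVERVIEW_SELECTOR)
    return elements[0].text if elements else ""


class FakeElement:
    """Minimal stand-in for a Selenium WebElement (hypothetical, test-only)."""

    def __init__(self, text="", children=None):
        self.text = text
        # Maps (by, value) pairs to the child elements a lookup should return.
        self._children = children or {}

    def find_elements(self, by, value):
        # Like Selenium, return an empty list when nothing matches.
        return self._children.get((by, value), [])


class ExtractEpisodeOverviewTest(unittest.TestCase):
    def test_blockquote_present(self):
        quote = FakeElement(text="Episode synopsis")
        episode = FakeElement(children={(CSS_SELECTOR, OVERVIEW_SELECTOR): [quote]})
        self.assertEqual(extract_episode_overview(episode), "Episode synopsis")

    def test_blockquote_absent(self):
        episode = FakeElement()
        self.assertEqual(extract_episode_overview(episode), "")
```

Saved as a test module, this runs under `python -m unittest` or pytest; in the extractor itself the call would use `By.CSS_SELECTOR` as before.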
tmdb-import/extractors/myvideo.py:15
- The `try:` block is mis-indented compared to its nested code, which will raise an `IndentationError`. Align the `try` and matching `except` with the other statements at the same indentation level.
try:
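For reference, a correctly aligned season-level block might look like this sketch: `try:` and `except` sit at the same indentation as the surrounding statements, and only their bodies are indented one level deeper. The function name and its selector parameters are hypothetical, because the actual season-level selectors are not shown in this review; `page` is assumed to be any object exposing Selenium's `find_element(by, value)`.

```python
import logging

# String value of selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def extract_season_info(page, title_selector, overview_selector):
    """Hypothetical helper: return (title, overview), using "" for anything missing."""
    season_title = ""
    season_overview = ""
    try:
        # Body of the try block: one level deeper than `try:` itself.
        season_title = page.find_element(CSS_SELECTOR, title_selector).text
        season_overview = page.find_element(CSS_SELECTOR, overview_selector).text
    except Exception as e:  # aligned with `try:`; narrowing this is discussed in the review
        logging.warning(f"Could not extract season-level information: {e}")
    # Execution continues here either way, so episode extraction can still run.
    return season_title, season_overview
```

If the title is found but the description is not, the title is kept and only the description falls back to an empty string.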
Comment on lines +47 to +48
episode_overview_element = episode.find_element(By.CSS_SELECTOR, value="span[class='episodeIntro movieIntro'] blockquote")
episode_overview = episode_overview_element.text if episode_overview_element else ""
Using `find_element` will raise a `NoSuchElementException` if no blockquote exists, so the `else ""` fallback in the following `if episode_overview_element` expression can never take effect. Consider using `find_elements` and checking for a non-empty list, or wrapping this call in its own `try`/`except`.
Suggested change
Current:
episode_overview_element = episode.find_element(By.CSS_SELECTOR, value="span[class='episodeIntro movieIntro'] blockquote")
episode_overview = episode_overview_element.text if episode_overview_element else ""
Suggested:
episode_overview_elements = episode.find_elements(By.CSS_SELECTOR, value="span[class='episodeIntro movieIntro'] blockquote")
episode_overview = episode_overview_elements[0].text if episode_overview_elements else ""
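The reviewer's other option, keeping `find_element` and wrapping the lookup in its own `try`/`except`, could look roughly like this. It is a sketch: the helper name is hypothetical, and the `ImportError` fallback exists only so the snippet runs without Selenium installed; in the extractor, `NoSuchElementException` would be imported from `selenium.common.exceptions` directly.

```python
try:
    from selenium.common.exceptions import NoSuchElementException
except ImportError:
    # Stand-in so this sketch runs without Selenium; not needed in the extractor.
    class NoSuchElementException(Exception):
        pass

OVERVIEW_SELECTOR = "span[class='episodeIntro movieIntro'] blockquote"
CSS_SELECTOR = "css selector"  # string value of By.CSS_SELECTOR


def extract_episode_overview(episode):
    """Hypothetical helper: return the overview text, or "" if no blockquote exists."""
    try:
        element = episode.find_element(CSS_SELECTOR, OVERVIEW_SELECTOR)
    except NoSuchElementException:
        # find_element raises instead of returning None, so the fallback lives here.
        return ""
    return element.text
```

Both variants return an empty string when the blockquote is missing; `find_elements` avoids using an exception for an expected case, while this version makes the "absent" path explicit and catches only the one exception that signals it.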
Comment on lines +26 to +27
except Exception as e:
    logging.warning(f"Could not extract season-level information: {e}")
Catching the base Exception may hide unexpected errors. It’s better to catch specific exceptions (e.g., NoSuchElementException) to avoid masking real issues.
Suggested change
Current:
except Exception as e:
    logging.warning(f"Could not extract season-level information: {e}")
Suggested:
except NoSuchElementException as e:
    logging.warning(f"Element not found during season-level extraction: {e}")
except WebDriverException as e:
    logging.warning(f"Web driver error during season-level extraction: {e}")
except Exception as e:
    logging.critical(f"Unexpected error during season-level extraction: {e}")
Both exception classes must be imported from `selenium.common.exceptions` for this suggestion to run.
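Because `NoSuchElementException` is a subclass of `WebDriverException` in Selenium, handler order matters: the more specific clause must come first, as it does in the suggestion. The sketch below illustrates that ordering with a hypothetical `run_season_step` wrapper that reports which handler fired. Unlike the suggestion, it re-raises unexpected errors after logging them, which is one way to avoid masking real issues. The `ImportError` fallback only mirrors Selenium's class hierarchy so the snippet runs without Selenium installed.

```python
import logging

try:
    from selenium.common.exceptions import NoSuchElementException, WebDriverException
except ImportError:
    # Stand-ins mirroring Selenium's hierarchy so the sketch runs without it.
    class WebDriverException(Exception):
        pass

    class NoSuchElementException(WebDriverException):
        pass


def run_season_step(step):
    """Hypothetical wrapper: run `step()` and report which handler caught a failure."""
    try:
        step()
    except NoSuchElementException as e:
        # Most specific first; placed after WebDriverException it would never match.
        logging.warning(f"Element not found during season-level extraction: {e}")
        return "missing-element"
    except WebDriverException as e:
        logging.warning(f"Web driver error during season-level extraction: {e}")
        return "driver-error"
    except Exception as e:
        logging.critical(f"Unexpected error during season-level extraction: {e}")
        raise  # surface genuine bugs instead of silently continuing
    return "ok"
```

Whether to re-raise is a design choice: it stops the scrape on real bugs, whereas the suggested version logs them and carries on with episode extraction.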