Conversation

karilaa-dev (Owner) commented Jan 15, 2026

PR Type

Enhancement, Bug fix


Description

  • Implement 3-part retry strategy with proxy rotation
  • Add ProxySession class for per-request proxy state management
  • Refactor video/music extraction with separate retry logic per stage
  • Add TikTokInvalidLinkError for URL resolution failures
  • Remove dead code and improve logging configuration


Diagram Walkthrough

flowchart LR
  A[Request Start] --> B[ProxySession Init]
  B --> C[Part 1: URL Resolution]
  C -->|Success| D[Part 2: Video Info Extraction]
  C -->|Failure| E[Rotate Proxy]
  E --> C
  D -->|Success| F[Part 3: Download]
  D -->|Failure| E
  F -->|Success| G[Return Result]
  F -->|Failure| E

File Walkthrough

Relevant files

Configuration changes (2 files)
  • config.py: Add retry and logging configuration (+45/-5)
  • loader.py: Update logging configuration (+23/-12)

Enhancement (9 files)
  • get_inline.py: Simplify to use built-in retry (+6/-33)
  • get_music.py: Simplify to use built-in retry (+5/-26)
  • get_video.py: Simplify to use built-in retry (+11/-29)
  • video_types.py: Add slideshow retry with error handling (+121/-58)
  • __init__.py: Export TikTokInvalidLinkError (+2/-0)
  • client.py: Implement 3-part retry with ProxySession (+769/-294)
  • exceptions.py: Add TikTokInvalidLinkError (+6/-0)
  • models.py: Add proxy session to VideoInfo (+9/-1)
  • proxy_manager.py: Add proxy auth encoding (+25/-1)

Documentation (4 files)
  • .env.example: Add LOG_LEVEL configuration (+3/-0)
  • AGENTS.md: Add Claude Code guidance (+55/-0)
  • CLAUDE.md: Add symlink to AGENTS.md (+1/-0)
  • CODEBASE_MAP.md: Add comprehensive codebase documentation (+536/-0)

Generate comprehensive codebase map documenting architecture, module
purposes, data flows, and navigation guides. Add CLAUDE.md for quick
reference when working with Claude Code.

Restructure video extraction into 3 separate retry parts, each with
its own configurable retry count and instant proxy rotation on failure:

- Part 1: URL resolution (short URLs to full URLs)
- Part 2: Video info extraction (metadata and video data)
- Part 3: Download (video/images/music)

Key changes:
- Add ProxySession class to manage proxy state per request flow
- Add TikTokInvalidLinkError for Part 1 failures
- Add new env vars: URL_RESOLVE_MAX_RETRIES, VIDEO_INFO_MAX_RETRIES,
  DOWNLOAD_MAX_RETRIES (all default to 3)
- Refactor video() and music() to use internal 3-part retry
- Add download_slideshow_images() with individual image retry
- Update handlers to use simplified video()/music() API
- Add TikTok status code handling (10204=deleted, 10222=private)

The same proxy persists across all parts unless a retry is triggered,
at which point it rotates to the next proxy for instant retry.
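
The flow described above amounts to one shared ProxySession threaded through
three retrying stages. The sketch below is only an illustration pieced together
from this description and the reviewer excerpts further down; the helper name
_resolve_url_with_retry and the VideoInfo attributes video_url, duration, and
content are assumptions, not names confirmed by the diff.

# Illustrative sketch of video(), not the PR's actual code
async def video(self, url: str) -> "VideoInfo":
    proxy_session = ProxySession(ProxyManager.get_instance())

    # Part 1: short URL -> full URL (URL_RESOLVE_MAX_RETRIES attempts,
    # raises TikTokInvalidLinkError when resolution keeps failing)
    full_url = await self._resolve_url_with_retry(url, proxy_session)

    # Part 2: metadata and direct media URLs (VIDEO_INFO_MAX_RETRIES attempts);
    # permanent errors (deleted/private/region) are raised immediately
    info = await self._extract_video_info_with_retry(full_url, proxy_session)

    # Part 3: download bytes (DOWNLOAD_MAX_RETRIES attempts); slideshows take
    # the per-image retry path, everything else the single-video path
    if info.type == "images":
        info.content = await self.download_slideshow_images(info, proxy_session)
    else:
        info.content = await self._download_video_with_retry(
            info.video_url,
            info._download_context,
            proxy_session,
            duration=info.duration,
        )
    return info

Each stage calls proxy_session.get_proxy(), so all three see the same proxy
until one of them fails and calls rotate_proxy().
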
Document the new ProxySession class and 3-part retry architecture
in CODEBASE_MAP.md, including updated diagrams, method references,
and environment variable configuration.

- Remove unused _extract_raw_data_sync method (~207 lines of dead code)
- Move config import to module level in client.py (removes 7 inline imports)
- Add error handling for slideshow fallback path when all images fail

zam-review bot commented Jan 15, 2026

PR Reviewer Guide 🔍

(Review updated until commit 50661ad)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 Security concerns

Sensitive information exposure:
The _strip_proxy_auth function in tiktok_api/client.py (lines 60-79) properly removes credentials from proxy URLs for logging. However, verify that all log statements throughout the codebase use this function when logging proxy information. The proxy_manager.py now URL-encodes credentials (lines 45-63), which is good practice, but ensure the encoded credentials are never logged in error messages or debug output.
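
For reference, the idea behind such a helper can be sketched with nothing but
the standard library; this is an illustrative version, not the actual
_strip_proxy_auth implementation from client.py:

from typing import Optional
from urllib.parse import urlsplit, urlunsplit

def strip_proxy_auth_for_logging(proxy_url: Optional[str]) -> str:
    """Return a log-safe proxy URL: credentials masked, scheme/host/port kept."""
    if not proxy_url:
        return "direct connection"
    parts = urlsplit(proxy_url)
    if parts.username is None and parts.password is None:
        return proxy_url
    host = parts.hostname or ""
    port = f":{parts.port}" if parts.port else ""
    return urlunsplit((parts.scheme, f"***:***@{host}{port}", parts.path, "", ""))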

⚡ Recommended focus areas for review

Thread Safety Concern

The ProxySession class uses instance variables (_current_proxy, _initialized) without any locking mechanism. While ProxyManager has a rotation lock, ProxySession itself could have race conditions if the same session instance is used concurrently across multiple async tasks. Consider adding thread safety guarantees or documenting that ProxySession should not be shared across concurrent operations.

@dataclass
class ProxySession:
    """Manages proxy state for a single video request flow.

    This class ensures that the same proxy is used across all parts of the
    video extraction flow (URL resolution -> video info -> download) unless
    a retry is triggered, in which case the proxy is rotated.

    The proxy is lazily initialized on first use (get_proxy) and can be
    explicitly rotated via rotate_proxy() when retrying a failed operation.

    Usage:
        proxy_manager = ProxyManager.get_instance()
        session = ProxySession(proxy_manager)

        # Part 1, 2, 3 use the same proxy
        proxy = session.get_proxy()

        # On retry, rotate to a new proxy
        proxy = session.rotate_proxy()
    """

    proxy_manager: Optional["ProxyManager"]
    _current_proxy: Optional[str] = field(default=None, init=False)
    _initialized: bool = field(default=False, init=False)

    def get_proxy(self) -> Optional[str]:
        """Get the current proxy (lazily initialized on first call).

        Returns:
            Proxy URL string, or None for direct connection.
        """
        if not self._initialized:
            self._initialized = True
            if self.proxy_manager:
                self._current_proxy = self.proxy_manager.get_next_proxy()
                logger.debug(
                    f"ProxySession initialized with proxy: "
                    f"{_strip_proxy_auth(self._current_proxy)}"
                )
            else:
                logger.debug("ProxySession initialized with direct connection")
        return self._current_proxy

    def rotate_proxy(self) -> Optional[str]:
        """Rotate to the next proxy in the rotation (for retries).

        Returns:
            New proxy URL string, or None for direct connection.
        """
        if self.proxy_manager:
            old_proxy = self._current_proxy
            self._current_proxy = self.proxy_manager.get_next_proxy()
            logger.debug(
                f"ProxySession rotated: {_strip_proxy_auth(old_proxy)} -> "
                f"{_strip_proxy_auth(self._current_proxy)}"
            )
        self._initialized = True
        return self._current_proxy
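
If a session ever did need to be shared between concurrent tasks, one option is
a variant whose methods are coroutines guarded by an asyncio.Lock. The class
below is a sketch of that idea, not code from this PR; the simpler fix is to
keep creating one ProxySession per request flow, as the docstring above implies.

import asyncio
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LockedProxySession:
    """Async-safe variant sketch: serializes access to the proxy state."""

    proxy_manager: Optional["ProxyManager"]
    _current_proxy: Optional[str] = field(default=None, init=False)
    _initialized: bool = field(default=False, init=False)
    _lock: asyncio.Lock = field(default_factory=asyncio.Lock, init=False)

    async def get_proxy(self) -> Optional[str]:
        async with self._lock:
            if not self._initialized:
                self._initialized = True
                if self.proxy_manager:
                    self._current_proxy = self.proxy_manager.get_next_proxy()
            return self._current_proxy

    async def rotate_proxy(self) -> Optional[str]:
        async with self._lock:
            if self.proxy_manager:
                self._current_proxy = self.proxy_manager.get_next_proxy()
            self._initialized = True
            return self._current_proxy

Callers would then have to await get_proxy()/rotate_proxy(), which is the main
cost of this approach compared to the synchronous methods above.
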
Resource Cleanup Risk

In download_slideshow_images, if an exception occurs during the download process, the download_context stored in video_info._download_context might not be properly cleaned up. The method raises TikTokNetworkError when images fail, but the context cleanup happens only in the calling code. Verify that all exception paths properly clean up resources to prevent yt-dlp resource leaks.

async def download_slideshow_images(
    self,
    video_info: VideoInfo,
    proxy_session: ProxySession,
    max_retries: Optional[int] = None,
    max_concurrent: Optional[int] = None,
) -> list[bytes]:
    """Download all slideshow images with individual retry per image.

    Part 3 of the 3-part retry strategy for slideshows.

    Downloads all images in parallel. If an individual image fails,
    retries only that image with a new proxy (up to max_retries per image).
    Successfully downloaded images are kept.

    Args:
        video_info: VideoInfo object containing image URLs in data field
        proxy_session: ProxySession for proxy management
        max_retries: Maximum retry attempts per image (default from config)
        max_concurrent: Maximum concurrent downloads (default from config)

    Returns:
        List of image bytes in the same order as input URLs

    Raises:
        TikTokNetworkError: If any image fails to download after all retries
        ValueError: If video_info is not a slideshow or has no download context
    """
    if video_info.type != "images":
        raise ValueError("video_info is not a slideshow")

    if not video_info._download_context:
        raise ValueError("video_info has no download context")

    if max_retries is None:
        retry_config = config.get("retry", {})
        max_retries = retry_config.get("download_max_retries", 3)

    if max_concurrent is None:
        perf_config = config.get("performance", {})
        max_concurrent = perf_config.get("max_concurrent_images", 20)

    image_urls: list[str] = video_info.data  # type: ignore
    num_images = len(image_urls)

    # Track results and retry counts per image
    results: list[Optional[bytes]] = [None] * num_images
    retry_counts: list[int] = [0] * num_images
    failed_indices: set[int] = set(range(num_images))  # All start as "to download"

    semaphore = asyncio.Semaphore(max_concurrent)

    async def download_single_image(
        index: int, url: str, proxy: Optional[str]
    ) -> Tuple[int, Optional[bytes], Optional[Exception]]:
        """Download a single image with semaphore limiting."""
        async with semaphore:
            context_with_proxy = {**video_info._download_context, "proxy": proxy}
            try:
                result = await self._download_media_async(
                    url, context_with_proxy, max_retries=1
                )
                return index, result, None
            except Exception as e:
                return index, None, e

    # First pass: download all images in parallel
    proxy = proxy_session.get_proxy()
    logger.debug(
        f"Downloading {num_images} slideshow images via {_strip_proxy_auth(proxy)}"
    )

    tasks = [
        download_single_image(i, url, proxy) for i, url in enumerate(image_urls)
    ]
    first_pass_results = await asyncio.gather(*tasks, return_exceptions=True)

    # Process first pass results
    for result in first_pass_results:
        if isinstance(result, Exception):
            # Task itself raised exception (shouldn't happen with our wrapper)
            continue
        index, data, error = result
        if data is not None:
            results[index] = data
            failed_indices.discard(index)
        else:
            retry_counts[index] = 1

    # Retry failed images individually
    while failed_indices:
        # Check if any images have exhausted retries
        exhausted = [i for i in failed_indices if retry_counts[i] >= max_retries]
        if exhausted:
            logger.error(
                f"Failed to download {len(exhausted)} images after {max_retries} "
                f"attempts each: indices {exhausted}"
            )
            raise TikTokNetworkError(
                f"Failed to download {len(exhausted)} slideshow images"
            )

        # Rotate proxy for retries
        proxy = proxy_session.rotate_proxy()
        logger.debug(
            f"Retrying {len(failed_indices)} failed images "
            f"via {_strip_proxy_auth(proxy)}"
        )

        # Retry all currently failed images
        retry_tasks = [
            download_single_image(i, image_urls[i], proxy) for i in failed_indices
        ]
        retry_results = await asyncio.gather(*retry_tasks, return_exceptions=True)

        # Process retry results
        for result in retry_results:
            if isinstance(result, Exception):
                continue
            index, data, error = result
            retry_counts[index] += 1
            if data is not None:
                results[index] = data
                failed_indices.discard(index)

    # All images downloaded successfully
    logger.info(f"Successfully downloaded {num_images} slideshow images")
    return results  # type: ignore
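
One way to make that guarantee explicit is a try/finally at the call site, so
the context is released on success, on TikTokNetworkError, and on cancellation
alike. _cleanup_download_context below is a hypothetical helper standing in for
whatever release the calling code actually performs on
video_info._download_context (e.g. closing the stored yt-dlp objects):

async def download_slideshow_safely(client, video_info, proxy_session) -> list[bytes]:
    """Call-site sketch: never leak the download context, even when Part 3 raises."""
    try:
        return await client.download_slideshow_images(video_info, proxy_session)
    finally:
        client._cleanup_download_context(video_info._download_context)
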
Inconsistent Retry Logic

The retry strategy treats different error types inconsistently across the 3 parts. Part 2 (_extract_video_info_with_retry) immediately raises permanent errors (deleted, private, region), but Part 3 (_download_video_with_retry) retries all errors including network errors. Consider whether certain download errors (like 404, 403) should also be treated as permanent errors to avoid unnecessary retries.

async def _download_video_with_retry(
    self,
    video_url: str,
    download_context: dict[str, Any],
    proxy_session: ProxySession,
    duration: Optional[int] = None,
    max_retries: Optional[int] = None,
) -> bytes:
    """Download video with retry and proxy rotation.

    Part 3 of the 3-part retry strategy for videos.

    Args:
        video_url: Direct video URL from TikTok CDN
        download_context: Context containing ydl, ie, referer_url
        proxy_session: ProxySession for proxy management
        duration: Video duration in seconds (for streaming decision)
        max_retries: Maximum retry attempts (default from config)

    Returns:
        Video bytes

    Raises:
        TikTokNetworkError: Download failed after all retries
    """
    if max_retries is None:
        retry_config = config.get("retry", {})
        max_retries = retry_config.get("download_max_retries", 3)

    last_error: Optional[Exception] = None

    for attempt in range(1, max_retries + 1):
        proxy = proxy_session.get_proxy()
        logger.debug(
            f"Video download attempt {attempt}/{max_retries} "
            f"via {_strip_proxy_auth(proxy)}"
        )

        # Update download context with current proxy
        context_with_proxy = {**download_context, "proxy": proxy}

        try:
            # Use single CDN attempt per proxy (cdn_max_retries=1)
            result = await self._download_media_async(
                video_url,
                context_with_proxy,
                duration=duration,
                max_retries=1,  # Single attempt per proxy
            )

            if result is not None:
                return result

            # Download returned None - treat as failure
            last_error = TikTokNetworkError("Download returned empty result")

        except Exception as e:
            last_error = e

        # Rotate proxy for next attempt (instant retry since different IP)
        if attempt < max_retries:
            proxy_session.rotate_proxy()
            logger.warning(
                f"Video download attempt {attempt}/{max_retries} failed, "
                f"rotating proxy: {last_error}"
            )

    # All attempts exhausted
    logger.error(
        f"Video download failed after {max_retries} attempts: {last_error}"
    )
    raise TikTokNetworkError(f"Failed to download video after {max_retries} attempts")
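
If some download failures should be treated as permanent, the except block
above could classify the error before rotating. The helper below is an
illustrative sketch; the chosen status codes and the status/code attribute
lookup are assumptions about what the underlying errors expose:

from typing import Optional

# Statuses a different proxy is unlikely to fix (illustrative choice)
PERMANENT_DOWNLOAD_STATUSES = {401, 403, 404, 410}

def is_permanent_download_error(error: Exception) -> bool:
    """Best-effort check for failures that should skip proxy rotation."""
    status: Optional[int] = getattr(error, "status", None) or getattr(error, "code", None)
    return status in PERMANENT_DOWNLOAD_STATUSES

With such a check, the retry loop could re-raise immediately
(for example "if is_permanent_download_error(e): raise") instead of burning
the remaining attempts and proxies.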

Repository owner deleted a comment from zam-review bot Jan 15, 2026
Repository owner deleted a comment from zam-review bot Jan 15, 2026
Repository owner deleted a comment from zam-review bot Jan 15, 2026

zam-review bot commented Jan 15, 2026

PR Description updated to latest commit (50661ad)

Repository owner deleted a comment from zam-review bot Jan 15, 2026
Repository owner deleted a comment from zam-review bot Jan 15, 2026

karilaa-dev (Owner, Author) commented:

/describe

karilaa-dev (Owner, Author) commented:

/review

zam-review bot commented Jan 15, 2026

PR Description updated to latest commit (50661ad)

zam-review bot commented Jan 15, 2026

Persistent review updated to latest commit 50661ad

Replace old RETRY_MAX_ATTEMPTS and RETRY_REQUEST_TIMEOUT with granular
retry configuration for each part of the extraction flow:
- URL_RESOLVE_MAX_RETRIES (Part 1)
- VIDEO_INFO_MAX_RETRIES (Part 2)
- DOWNLOAD_MAX_RETRIES (Part 3)
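
As a rough illustration of how these granular limits might be read (the exact
config.py structure is not shown in this excerpt, so treat the shape below as
an assumption), the defaults of 3 per part would look like:

import os

def load_retry_config() -> dict[str, int]:
    """Read the per-part retry limits from the environment, defaulting to 3."""
    return {
        "url_resolve_max_retries": int(os.getenv("URL_RESOLVE_MAX_RETRIES", "3")),
        "video_info_max_retries": int(os.getenv("VIDEO_INFO_MAX_RETRIES", "3")),
        "download_max_retries": int(os.getenv("DOWNLOAD_MAX_RETRIES", "3")),
    }
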
karilaa-dev merged commit 940f8c1 into main on Jan 15, 2026
karilaa-dev deleted the temp-fix branch on January 17, 2026 at 23:46