Conversation

karilaa-dev (Owner) commented Jan 15, 2026

PR Type

Enhancement, Bug fix


Description

  • Implement 3-part retry strategy with proxy rotation
  • Add ProxySession class for per-request proxy state management
  • Refactor video/music extraction with separate retry logic per stage
  • Add TikTokInvalidLinkError for URL resolution failures
  • Remove dead code and improve logging configuration


Diagram Walkthrough

flowchart LR
  A[Request Start] --> B[ProxySession Init]
  B --> C[Part 1: URL Resolution]
  C -->|Success| D[Part 2: Video Info Extraction]
  C -->|Failure| E[Rotate Proxy]
  E --> C
  D -->|Success| F[Part 3: Download]
  D -->|Failure| E
  F -->|Success| G[Return Result]
  F -->|Failure| E

File Walkthrough

Relevant files

Configuration changes (2 files)
  • config.py: Add retry and logging configuration (+45/-5)
  • loader.py: Update logging configuration (+23/-12)

Enhancement (9 files)
  • get_inline.py: Simplify to use built-in retry (+6/-33)
  • get_music.py: Simplify to use built-in retry (+5/-26)
  • get_video.py: Simplify to use built-in retry (+11/-29)
  • video_types.py: Add slideshow retry with error handling (+121/-58)
  • __init__.py: Export TikTokInvalidLinkError (+2/-0)
  • client.py: Implement 3-part retry with ProxySession (+769/-294)
  • exceptions.py: Add TikTokInvalidLinkError (+6/-0)
  • models.py: Add proxy session to VideoInfo (+9/-1)
  • proxy_manager.py: Add proxy auth encoding (+25/-1)

Documentation (4 files)
  • .env.example: Add LOG_LEVEL configuration (+3/-0)
  • AGENTS.md: Add Claude Code guidance (+55/-0)
  • CLAUDE.md: Add symlink to AGENTS.md (+1/-0)
  • CODEBASE_MAP.md: Add comprehensive codebase documentation (+536/-0)

Generate comprehensive codebase map documenting architecture, module
purposes, data flows, and navigation guides. Add CLAUDE.md for quick
reference when working with Claude Code.

Restructure video extraction into 3 separate retry parts, each with
its own configurable retry count and instant proxy rotation on failure:

- Part 1: URL resolution (short URLs to full URLs)
- Part 2: Video info extraction (metadata and video data)
- Part 3: Download (video/images/music)

Key changes:
- Add ProxySession class to manage proxy state per request flow
- Add TikTokInvalidLinkError for Part 1 failures
- Add new env vars: URL_RESOLVE_MAX_RETRIES, VIDEO_INFO_MAX_RETRIES,
  DOWNLOAD_MAX_RETRIES (all default to 3)
- Refactor video() and music() to use internal 3-part retry
- Add download_slideshow_images() with individual image retry
- Update handlers to use simplified video()/music() API
- Add TikTok status code handling (10204=deleted, 10222=private)

The same proxy persists across all parts unless a retry is triggered,
at which point it rotates to the next proxy for instant retry.
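
The flow described above amounts to one shared ProxySession threaded through
three retrying stages. The sketch below is only an illustration pieced together
from this description and the reviewer excerpts further down; the helper name
_resolve_url_with_retry and the VideoInfo attributes video_url, duration, and
content are assumptions, not names confirmed by the diff.

# Illustrative sketch of video(), not the PR's actual code
async def video(self, url: str) -> "VideoInfo":
    proxy_session = ProxySession(ProxyManager.get_instance())

    # Part 1: short URL -> full URL (URL_RESOLVE_MAX_RETRIES attempts,
    # raises TikTokInvalidLinkError when resolution keeps failing)
    full_url = await self._resolve_url_with_retry(url, proxy_session)

    # Part 2: metadata and direct media URLs (VIDEO_INFO_MAX_RETRIES attempts);
    # permanent errors (deleted/private/region) are raised immediately
    info = await self._extract_video_info_with_retry(full_url, proxy_session)

    # Part 3: download bytes (DOWNLOAD_MAX_RETRIES attempts); slideshows take
    # the per-image retry path, everything else the single-video path
    if info.type == "images":
        info.content = await self.download_slideshow_images(info, proxy_session)
    else:
        info.content = await self._download_video_with_retry(
            info.video_url,
            info._download_context,
            proxy_session,
            duration=info.duration,
        )
    return info

Each stage calls proxy_session.get_proxy(), so all three see the same proxy
until one of them fails and calls rotate_proxy().
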
Document the new ProxySession class and 3-part retry architecture
in CODEBASE_MAP.md, including updated diagrams, method references,
and environment variable configuration.

- Remove unused _extract_raw_data_sync method (~207 lines of dead code)
- Move config import to module level in client.py (removes 7 inline imports)
- Add error handling for slideshow fallback path when all images fail

zam-review bot commented Jan 15, 2026

PR Reviewer Guide 🔍

(Review updated until commit 50661ad)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 Security concerns

Sensitive information exposure:
The _strip_proxy_auth function in tiktok_api/client.py (lines 60-79) properly removes credentials from proxy URLs for logging. However, verify that all log statements throughout the codebase use this function when logging proxy information. The proxy_manager.py now URL-encodes credentials (lines 45-63), which is good practice, but ensure the encoded credentials are never logged in error messages or debug output.
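
For reference, the idea behind such a helper can be sketched with nothing but
the standard library; this is an illustrative version, not the actual
_strip_proxy_auth implementation from client.py:

from typing import Optional
from urllib.parse import urlsplit, urlunsplit

def strip_proxy_auth_for_logging(proxy_url: Optional[str]) -> str:
    """Return a log-safe proxy URL: credentials masked, scheme/host/port kept."""
    if not proxy_url:
        return "direct connection"
    parts = urlsplit(proxy_url)
    if parts.username is None and parts.password is None:
        return proxy_url
    host = parts.hostname or ""
    port = f":{parts.port}" if parts.port else ""
    return urlunsplit((parts.scheme, f"***:***@{host}{port}", parts.path, "", ""))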

⚡ Recommended focus areas for review

Thread Safety Concern

The ProxySession class uses instance variables (_current_proxy, _initialized) without any locking mechanism. While ProxyManager has a rotation lock, ProxySession itself could have race conditions if the same session instance is used concurrently across multiple async tasks. Consider adding thread safety guarantees or documenting that ProxySession should not be shared across concurrent operations.

@dataclass
class ProxySession:
    """Manages proxy state for a single video request flow.

    This class ensures that the same proxy is used across all parts of the
    video extraction flow (URL resolution -> video info -> download) unless
    a retry is triggered, in which case the proxy is rotated.

    The proxy is lazily initialized on first use (get_proxy) and can be
    explicitly rotated via rotate_proxy() when retrying a failed operation.

    Usage:
        proxy_manager = ProxyManager.get_instance()
        session = ProxySession(proxy_manager)

        # Part 1, 2, 3 use the same proxy
        proxy = session.get_proxy()

        # On retry, rotate to a new proxy
        proxy = session.rotate_proxy()
    """

    proxy_manager: Optional["ProxyManager"]
    _current_proxy: Optional[str] = field(default=None, init=False)
    _initialized: bool = field(default=False, init=False)

    def get_proxy(self) -> Optional[str]:
        """Get the current proxy (lazily initialized on first call).

        Returns:
            Proxy URL string, or None for direct connection.
        """
        if not self._initialized:
            self._initialized = True
            if self.proxy_manager:
                self._current_proxy = self.proxy_manager.get_next_proxy()
                logger.debug(
                    f"ProxySession initialized with proxy: "
                    f"{_strip_proxy_auth(self._current_proxy)}"
                )
            else:
                logger.debug("ProxySession initialized with direct connection")
        return self._current_proxy

    def rotate_proxy(self) -> Optional[str]:
        """Rotate to the next proxy in the rotation (for retries).

        Returns:
            New proxy URL string, or None for direct connection.
        """
        if self.proxy_manager:
            old_proxy = self._current_proxy
            self._current_proxy = self.proxy_manager.get_next_proxy()
            logger.debug(
                f"ProxySession rotated: {_strip_proxy_auth(old_proxy)} -> "
                f"{_strip_proxy_auth(self._current_proxy)}"
            )
        self._initialized = True
        return self._current_proxy
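
If a session ever did need to be shared between concurrent tasks, one option is
a variant whose methods are coroutines guarded by an asyncio.Lock. The class
below is a sketch of that idea, not code from this PR; the simpler fix is to
keep creating one ProxySession per request flow, as the docstring above implies.

import asyncio
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LockedProxySession:
    """Async-safe variant sketch: serializes access to the proxy state."""

    proxy_manager: Optional["ProxyManager"]
    _current_proxy: Optional[str] = field(default=None, init=False)
    _initialized: bool = field(default=False, init=False)
    _lock: asyncio.Lock = field(default_factory=asyncio.Lock, init=False)

    async def get_proxy(self) -> Optional[str]:
        async with self._lock:
            if not self._initialized:
                self._initialized = True
                if self.proxy_manager:
                    self._current_proxy = self.proxy_manager.get_next_proxy()
            return self._current_proxy

    async def rotate_proxy(self) -> Optional[str]:
        async with self._lock:
            if self.proxy_manager:
                self._current_proxy = self.proxy_manager.get_next_proxy()
            self._initialized = True
            return self._current_proxy

Callers would then have to await get_proxy()/rotate_proxy(), which is the main
cost of this approach compared to the synchronous methods above.
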
Resource Cleanup Risk

In download_slideshow_images, if an exception occurs during the download process, the download_context stored in video_info._download_context might not be properly cleaned up. The method raises TikTokNetworkError when images fail, but the context cleanup happens only in the calling code. Verify that all exception paths properly clean up resources to prevent yt-dlp resource leaks.

async def download_slideshow_images(
    self,
    video_info: VideoInfo,
    proxy_session: ProxySession,
    max_retries: Optional[int] = None,
    max_concurrent: Optional[int] = None,
) -> list[bytes]:
    """Download all slideshow images with individual retry per image.

    Part 3 of the 3-part retry strategy for slideshows.

    Downloads all images in parallel. If an individual image fails,
    retries only that image with a new proxy (up to max_retries per image).
    Successfully downloaded images are kept.

    Args:
        video_info: VideoInfo object containing image URLs in data field
        proxy_session: ProxySession for proxy management
        max_retries: Maximum retry attempts per image (default from config)
        max_concurrent: Maximum concurrent downloads (default from config)

    Returns:
        List of image bytes in the same order as input URLs

    Raises:
        TikTokNetworkError: If any image fails to download after all retries
        ValueError: If video_info is not a slideshow or has no download context
    """
    if video_info.type != "images":
        raise ValueError("video_info is not a slideshow")

    if not video_info._download_context:
        raise ValueError("video_info has no download context")

    if max_retries is None:
        retry_config = config.get("retry", {})
        max_retries = retry_config.get("download_max_retries", 3)

    if max_concurrent is None:
        perf_config = config.get("performance", {})
        max_concurrent = perf_config.get("max_concurrent_images", 20)

    image_urls: list[str] = video_info.data  # type: ignore
    num_images = len(image_urls)

    # Track results and retry counts per image
    results: list[Optional[bytes]] = [None] * num_images
    retry_counts: list[int] = [0] * num_images
    failed_indices: set[int] = set(range(num_images))  # All start as "to download"

    semaphore = asyncio.Semaphore(max_concurrent)

    async def download_single_image(
        index: int, url: str, proxy: Optional[str]
    ) -> Tuple[int, Optional[bytes], Optional[Exception]]:
        """Download a single image with semaphore limiting."""
        async with semaphore:
            context_with_proxy = {**video_info._download_context, "proxy": proxy}
            try:
                result = await self._download_media_async(
                    url, context_with_proxy, max_retries=1
                )
                return index, result, None
            except Exception as e:
                return index, None, e

    # First pass: download all images in parallel
    proxy = proxy_session.get_proxy()
    logger.debug(
        f"Downloading {num_images} slideshow images via {_strip_proxy_auth(proxy)}"
    )

    tasks = [
        download_single_image(i, url, proxy) for i, url in enumerate(image_urls)
    ]
    first_pass_results = await asyncio.gather(*tasks, return_exceptions=True)

    # Process first pass results
    for result in first_pass_results:
        if isinstance(result, Exception):
            # Task itself raised exception (shouldn't happen with our wrapper)
            continue
        index, data, error = result
        if data is not None:
            results[index] = data
            failed_indices.discard(index)
        else:
            retry_counts[index] = 1

    # Retry failed images individually
    while failed_indices:
        # Check if any images have exhausted retries
        exhausted = [i for i in failed_indices if retry_counts[i] >= max_retries]
        if exhausted:
            logger.error(
                f"Failed to download {len(exhausted)} images after {max_retries} "
                f"attempts each: indices {exhausted}"
            )
            raise TikTokNetworkError(
                f"Failed to download {len(exhausted)} slideshow images"
            )

        # Rotate proxy for retries
        proxy = proxy_session.rotate_proxy()
        logger.debug(
            f"Retrying {len(failed_indices)} failed images "
            f"via {_strip_proxy_auth(proxy)}"
        )

        # Retry all currently failed images
        retry_tasks = [
            download_single_image(i, image_urls[i], proxy) for i in failed_indices
        ]
        retry_results = await asyncio.gather(*retry_tasks, return_exceptions=True)

        # Process retry results
        for result in retry_results:
            if isinstance(result, Exception):
                continue
            index, data, error = result
            retry_counts[index] += 1
            if data is not None:
                results[index] = data
                failed_indices.discard(index)

    # All images downloaded successfully
    logger.info(f"Successfully downloaded {num_images} slideshow images")
    return results  # type: ignore
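
One way to make that guarantee explicit is a try/finally at the call site, so
the context is released on success, on TikTokNetworkError, and on cancellation
alike. _cleanup_download_context below is a hypothetical helper standing in for
whatever release the calling code actually performs on
video_info._download_context (e.g. closing the stored yt-dlp objects):

async def download_slideshow_safely(client, video_info, proxy_session) -> list[bytes]:
    """Call-site sketch: never leak the download context, even when Part 3 raises."""
    try:
        return await client.download_slideshow_images(video_info, proxy_session)
    finally:
        client._cleanup_download_context(video_info._download_context)
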
Inconsistent Retry Logic

The retry strategy treats different error types inconsistently across the 3 parts. Part 2 (_extract_video_info_with_retry) immediately raises permanent errors (deleted, private, region), but Part 3 (_download_video_with_retry) retries all errors including network errors. Consider whether certain download errors (like 404, 403) should also be treated as permanent errors to avoid unnecessary retries.

async def _download_video_with_retry(
    self,
    video_url: str,
    download_context: dict[str, Any],
    proxy_session: ProxySession,
    duration: Optional[int] = None,
    max_retries: Optional[int] = None,
) -> bytes:
    """Download video with retry and proxy rotation.

    Part 3 of the 3-part retry strategy for videos.

    Args:
        video_url: Direct video URL from TikTok CDN
        download_context: Context containing ydl, ie, referer_url
        proxy_session: ProxySession for proxy management
        duration: Video duration in seconds (for streaming decision)
        max_retries: Maximum retry attempts (default from config)

    Returns:
        Video bytes

    Raises:
        TikTokNetworkError: Download failed after all retries
    """
    if max_retries is None:
        retry_config = config.get("retry", {})
        max_retries = retry_config.get("download_max_retries", 3)

    last_error: Optional[Exception] = None

    for attempt in range(1, max_retries + 1):
        proxy = proxy_session.get_proxy()
        logger.debug(
            f"Video download attempt {attempt}/{max_retries} "
            f"via {_strip_proxy_auth(proxy)}"
        )

        # Update download context with current proxy
        context_with_proxy = {**download_context, "proxy": proxy}

        try:
            # Use single CDN attempt per proxy (cdn_max_retries=1)
            result = await self._download_media_async(
                video_url,
                context_with_proxy,
                duration=duration,
                max_retries=1,  # Single attempt per proxy
            )

            if result is not None:
                return result

            # Download returned None - treat as failure
            last_error = TikTokNetworkError("Download returned empty result")

        except Exception as e:
            last_error = e

        # Rotate proxy for next attempt (instant retry since different IP)
        if attempt < max_retries:
            proxy_session.rotate_proxy()
            logger.warning(
                f"Video download attempt {attempt}/{max_retries} failed, "
                f"rotating proxy: {last_error}"
            )

    # All attempts exhausted
    logger.error(
        f"Video download failed after {max_retries} attempts: {last_error}"
    )
    raise TikTokNetworkError(f"Failed to download video after {max_retries} attempts")
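
If some download failures should be treated as permanent, the except block
above could classify the error before rotating. The helper below is an
illustrative sketch; the chosen status codes and the status/code attribute
lookup are assumptions about what the underlying errors expose:

from typing import Optional

# Statuses a different proxy is unlikely to fix (illustrative choice)
PERMANENT_DOWNLOAD_STATUSES = {401, 403, 404, 410}

def is_permanent_download_error(error: Exception) -> bool:
    """Best-effort check for failures that should skip proxy rotation."""
    status: Optional[int] = getattr(error, "status", None) or getattr(error, "code", None)
    return status in PERMANENT_DOWNLOAD_STATUSES

With such a check, the retry loop could re-raise immediately
(for example "if is_permanent_download_error(e): raise") instead of burning
the remaining attempts and proxies.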

Repository owner deleted a comment from zam-review bot Jan 15, 2026
Repository owner deleted a comment from zam-review bot Jan 15, 2026
Repository owner deleted a comment from zam-review bot Jan 15, 2026

zam-review bot commented Jan 15, 2026

PR Description updated to latest commit (50661ad)

Repository owner deleted a comment from zam-review bot Jan 15, 2026
Repository owner deleted a comment from zam-review bot Jan 15, 2026

karilaa-dev (Owner, Author) commented:

/describe

karilaa-dev (Owner, Author) commented:

/review

zam-review bot commented Jan 15, 2026

PR Description updated to latest commit (50661ad)

zam-review bot commented Jan 15, 2026

Persistent review updated to latest commit 50661ad

Replace old RETRY_MAX_ATTEMPTS and RETRY_REQUEST_TIMEOUT with granular
retry configuration for each part of the extraction flow:
- URL_RESOLVE_MAX_RETRIES (Part 1)
- VIDEO_INFO_MAX_RETRIES (Part 2)
- DOWNLOAD_MAX_RETRIES (Part 3)
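
As a rough illustration of how these granular limits might be read (the exact
config.py structure is not shown in this excerpt, so treat the shape below as
an assumption), the defaults of 3 per part would look like:

import os

def load_retry_config() -> dict[str, int]:
    """Read the per-part retry limits from the environment, defaulting to 3."""
    return {
        "url_resolve_max_retries": int(os.getenv("URL_RESOLVE_MAX_RETRIES", "3")),
        "video_info_max_retries": int(os.getenv("VIDEO_INFO_MAX_RETRIES", "3")),
        "download_max_retries": int(os.getenv("DOWNLOAD_MAX_RETRIES", "3")),
    }
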
karilaa-dev merged commit 940f8c1 into main on Jan 15, 2026
karilaa-dev deleted the temp-fix branch on January 17, 2026 at 23:46