@grantdfoster grantdfoster commented Jul 28, 2025

PR Description: Enhanced Twitter Apify Integration for Followers/Following Data Collection

Overview

This PR adds Apify integration to the Twitter scraping capabilities. It introduces a new job type that retrieves Twitter follower and following data through Apify's premium actor services, giving the platform a production-grade scraping backend for this data.

Key Improvements

🔧 New Twitter Apify Job Type

  • Added TwitterApifyJob as a new job type alongside existing TwitterJob, TwitterCredentialJob, and TwitterApiJob
  • Implemented dedicated ApifyScrapeStrategy for handling Apify-specific operations
  • Enhanced capability detection to automatically include Apify capabilities when API key is available

🏗️ New Infrastructure Components

  • pkg/client/apify_client.go: Generic Apify API client with full actor lifecycle management (run, poll, retrieve results)
  • internal/jobs/twitterapify/: Complete Twitter-specific Apify integration module
    • client.go: Twitter-specific Apify operations using the premium follower scraper actor
    • scraper.go: High-level scraper interface for followers/following operations

📊 Enhanced Data Collection

  • Followers Collection: Retrieve user followers with pagination support through Apify's premium actors
  • Following Collection: Retrieve user following lists with proper pagination
  • Advanced Pagination: Base64-encoded cursor system for seamless pagination across large datasets
  • Production Constraints: Handles Apify actor requirements (minimum 200 results, proper input validation)
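
A Base64-encoded cursor and the 200-result minimum could look something like the sketch below. The cursor payload (a JSON object with an offset), the function names, and the choice to raise small requests to the minimum rather than reject them are assumptions for illustration; the PR description does not specify any of them.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// minResults reflects the minimum of 200 results stated in the PR description.
const minResults = 200

// cursor is a hypothetical payload; the real cursor contents are not documented here.
type cursor struct {
	Offset int `json:"offset"`
}

// EncodeCursor serializes the offset as JSON and wraps it in Base64.
func EncodeCursor(offset int) (string, error) {
	raw, err := json.Marshal(cursor{Offset: offset})
	if err != nil {
		return "", fmt.Errorf("marshal cursor: %w", err)
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

// DecodeCursor reverses EncodeCursor. An empty string means "start from the beginning".
func DecodeCursor(s string) (int, error) {
	if s == "" {
		return 0, nil
	}
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return 0, fmt.Errorf("decode cursor: %w", err)
	}
	var c cursor
	if err := json.Unmarshal(raw, &c); err != nil {
		return 0, fmt.Errorf("parse cursor: %w", err)
	}
	if c.Offset < 0 {
		return 0, fmt.Errorf("negative offset %d", c.Offset)
	}
	return c.Offset, nil
}

// ClampMaxResults raises requests below the minimum up to minResults.
// Rejecting such requests instead would be an equally valid design.
func ClampMaxResults(requested int) int {
	if requested < minResults {
		return minResults
	}
	return requested
}

func main() {
	next, err := EncodeCursor(ClampMaxResults(50))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	offset, err := DecodeCursor(next)
	fmt.Println(next, offset, err)
}
```

An opaque cursor of this kind lets the server change what it stores (offset, token, timestamp) without breaking callers, who only pass the string back unchanged.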

🔄 Improved Job Execution Strategy

  • Intelligent Prioritization: Updated DefaultScrapeStrategy to prioritize Apify for followers/following operations when available
  • Centralized Type Safety: Migrated to centralized tee-types argument unmarshalling for better type validation
  • Enhanced Error Handling: Comprehensive error handling with proper statistics tracking

⚙️ Configuration & Environment

  • Added APIFY_API_KEY environment variable support
  • Enhanced capability detection to automatically include Apify capabilities when API key is configured
  • Updated job configuration structure to support Apify authentication
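
Capability detection driven by APIFY_API_KEY could be sketched as below. Only the environment variable name comes from the PR description; the DetectCapabilities function, its signature, and the capability strings are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// apifyKeyEnv is the variable named in the PR description.
const apifyKeyEnv = "APIFY_API_KEY"

// DetectCapabilities returns the base capabilities, plus the Apify-backed
// ones when a non-empty key is present. getenv is injected so the function
// can be tested without touching the process environment.
func DetectCapabilities(getenv func(string) string, base []string) []string {
	caps := append([]string{}, base...) // copy so the caller's slice is not modified
	if strings.TrimSpace(getenv(apifyKeyEnv)) != "" {
		// Capability names are assumed for this example.
		caps = append(caps, "getfollowers", "getfollowing")
	}
	return caps
}

func main() {
	caps := DetectCapabilities(os.Getenv, []string{"searchbyquery"})
	fmt.Println(caps)
}
```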

🧪 Testing & Quality Improvements

  • Extensive test updates to use centralized tee-types structures
  • Updated existing Twitter tests to handle new argument validation
  • Improved error handling and unmarshalling validation in test scenarios

🛠️ Technical Debt Resolution

  • TODO Resolution: Replaced manual string-based capability checking with type-safe enum validation
  • Type Safety: Enhanced argument validation using centralized unmarshalling from tee-types
  • Code Organization: Better separation of concerns with dedicated Apify modules
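
To illustrate the move from string comparisons to typed validation, here is a minimal sketch of unmarshalling job arguments into a struct with an enum-like capability field. The Capability type, the FollowArgs struct, its JSON field names, and UnmarshalFollowArgs are invented for this example and do not reproduce the tee-types API.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Capability is a hypothetical typed enum replacing raw string checks.
type Capability string

const (
	CapGetFollowers Capability = "getfollowers"
	CapGetFollowing Capability = "getfollowing"
)

// Valid reports whether c is one of the known capabilities.
func (c Capability) Valid() bool {
	switch c {
	case CapGetFollowers, CapGetFollowing:
		return true
	}
	return false
}

// FollowArgs is an assumed argument shape; field names are illustrative.
type FollowArgs struct {
	Type       Capability `json:"type"`
	Username   string     `json:"username"`
	MaxResults int        `json:"max_results"`
}

// UnmarshalFollowArgs decodes and validates in one place, so every job
// handler gets the same checks instead of repeating them.
func UnmarshalFollowArgs(raw []byte) (*FollowArgs, error) {
	var a FollowArgs
	if err := json.Unmarshal(raw, &a); err != nil {
		return nil, fmt.Errorf("unmarshal arguments: %w", err)
	}
	if !a.Type.Valid() {
		return nil, fmt.Errorf("unsupported capability %q", a.Type)
	}
	if a.Username == "" {
		return nil, errors.New("username is required")
	}
	return &a, nil
}

func main() {
	args, err := UnmarshalFollowArgs([]byte(`{"type":"getfollowers","username":"example","max_results":200}`))
	fmt.Println(args, err)
}
```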

Impact

This change provides an alternative path for Twitter data collection. It is most useful for large-scale follower/following operations, where Apify's premium infrastructure offers better reliability and compliance than direct scraping methods. The implementation maintains backward compatibility with the existing job types.

Files Modified

  • 14 files changed: 1,059 additions, 220 deletions
  • New modules: Complete Apify integration infrastructure
  • Enhanced: Twitter job execution, capability detection, configuration management
  • Improved: Type safety, error handling, and test coverage

@mudler mudler requested review from Copilot and mcamou August 8, 2025 06:45

@mudler mudler requested a review from Copilot August 8, 2025 10:42
Copilot AI left a comment

Pull Request Overview

This PR introduces comprehensive Apify integration for enhanced Twitter follower/following data collection, adding a new TwitterApifyJob type with robust infrastructure and improved argument validation across the codebase.

Key changes include:

  • New Apify Twitter job type - TwitterApifyJob with dedicated client infrastructure for reliable follower/following scraping
  • Enhanced argument validation - Migrated to centralized type-safe unmarshalling from tee-types for better validation and error handling
  • Intelligent job prioritization - Updated DefaultScrapeStrategy to prioritize Apify for follower operations when available

Reviewed Changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 4 comments.

File | Description
pkg/client/apify_client.go | New generic Apify API client with actor lifecycle management
internal/jobs/twitterapify/ | Complete Twitter-specific Apify integration module
internal/jobs/twitter.go | Major refactor with new scrape strategies and centralized argument validation
internal/jobs/twitter_test.go | Updated tests for new argument types and added Apify integration tests
internal/jobs/webscraper.go | Migrated to centralized argument unmarshalling
tee/masa-tee-worker.json | Added APIFY_API_KEY environment variable support

@rapidfix rapidfix requested a review from mcamou August 12, 2025 06:48
@mcamou mcamou left a comment

Looks good, let's do it! Thanks for your patience!

@grantdfoster grantdfoster merged commit 2b21733 into main Aug 12, 2025
5 checks passed