
Implement two-sided platform optimal design for task-worker matching #2

Draft
Copilot wants to merge 3 commits into master from copilot/design-two-sided-platform


Copilot AI commented Nov 18, 2025

Description

Implements a two-sided platform matching system for optimal task-worker assignment in distributed crawling scenarios. Uses a greedy matching algorithm with configurable weights to assign tasks to workers based on priority, capacity, performance, and specialization.

Core Components:

  • Task: Represents crawling tasks with priority, complexity, and metadata
  • Worker: Represents crawler instances with capacity, specialization, and performance scores
  • PlatformMatcher: Implements an O(n×m) greedy matching algorithm (n tasks, m workers) with quality scoring
  • Match: Represents task-worker assignments with quality metrics
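
For readers who want a concrete picture of these components, here is a minimal sketch of how the three data types might be modeled. The field names for Task and Worker follow the usage example in this description; the Match fields, the value ranges, and the validation rules are assumptions for illustration and may differ from what src/crawlee/platform_matching.py actually does.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class Task:
    """A crawling task. The value ranges checked below are assumptions."""
    id: str
    url: str
    priority: float
    estimated_complexity: float
    metadata: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.priority < 0:
            raise ValueError('priority must be non-negative')
        if not 0.0 <= self.estimated_complexity <= 1.0:
            raise ValueError('estimated_complexity must be in [0, 1]')


@dataclass(frozen=True)
class Worker:
    """A crawler instance. The value ranges checked below are assumptions."""
    id: str
    capacity: float
    performance_score: float
    specialization: Optional[str] = None

    def __post_init__(self) -> None:
        if self.capacity < 0:
            raise ValueError('capacity must be non-negative')
        if not 0.0 <= self.performance_score <= 1.0:
            raise ValueError('performance_score must be in [0, 1]')


@dataclass(frozen=True)
class Match:
    """A task-worker assignment; these field names are hypothetical."""
    task_id: str
    worker_id: str
    quality: float
```

Freezing the dataclasses keeps the attributes from being reassigned once matching starts, which is one simple way to avoid accidental mutation mid-run.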

Algorithm:

  • Sorts tasks by priority (highest first)
  • Assigns each task to best available worker based on weighted factors:
    • Capacity fit (how well worker capacity matches task complexity)
    • Worker performance score
    • Task priority
    • 10% quality bonus for specialization matches
  • One-to-one assignment with capacity constraints (workers need ≥50% of required capacity)
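
The steps above can be sketched as a short standalone function. This is an illustration under stated assumptions, not the code in this PR: the capacity-fit formula, the normalization of priority by 10, the `task_type` field (a stand-in for metadata['type']), and the names `score` and `greedy_match` are all hypothetical. The resulting scores are not expected to reproduce the efficiency figure reported in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    id: str
    priority: float               # assumed to lie on a 0-10 scale
    estimated_complexity: float   # assumed to lie on a 0-1 scale
    task_type: Optional[str] = None  # hypothetical stand-in for metadata['type']


@dataclass
class Worker:
    id: str
    capacity: float
    performance_score: float
    specialization: Optional[str] = None


def score(task: Task, worker: Worker,
          w_cap: float = 0.3, w_perf: float = 0.3, w_prio: float = 0.4) -> float:
    """Weighted match quality; the exact formula is an assumption."""
    if task.estimated_complexity > 0:
        # Fit is 1.0 when capacity covers the task, otherwise the covered fraction.
        fit = min(worker.capacity / task.estimated_complexity, 1.0)
    else:
        fit = 1.0
    quality = (w_cap * fit
               + w_perf * worker.performance_score
               + w_prio * task.priority / 10.0)
    # 10% bonus when the worker's specialization matches the task type.
    if worker.specialization is not None and worker.specialization == task.task_type:
        quality *= 1.10
    return quality


def greedy_match(tasks: list[Task], workers: list[Worker]) -> list[tuple[str, str, float]]:
    """Assign each task, highest priority first, to the best free worker."""
    free = {w.id: w for w in workers}
    matches = []
    for task in sorted(tasks, key=lambda t: t.priority, reverse=True):
        # Capacity constraint: worker needs at least 50% of the required capacity.
        eligible = [w for w in free.values()
                    if w.capacity >= 0.5 * task.estimated_complexity]
        if not eligible:
            continue  # task stays unassigned
        best = max(eligible, key=lambda w: score(task, w))
        matches.append((task.id, best.id, score(task, best)))
        del free[best.id]  # one-to-one: a worker takes at most one task
    return matches
```

Each task scans the remaining workers once, which gives the O(n×m) scan cost on top of the initial sort. Because the choice is greedy, an early high-priority task can claim a worker that would have produced a better overall assignment elsewhere; the approach trades global optimality for simplicity.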

Usage:

from crawlee import PlatformMatcher, Task, Worker

tasks = [
    Task(id='api', url='https://api.example.com', priority=8.0, 
         estimated_complexity=0.7, metadata={'type': 'api'}),
    Task(id='page', url='https://example.com', priority=5.0, 
         estimated_complexity=0.4),
]

workers = [
    Worker(id='w1', capacity=0.8, specialization='api', performance_score=0.95),
    Worker(id='w2', capacity=0.6, performance_score=0.85),
]

matcher = PlatformMatcher(capacity_weight=0.3, performance_weight=0.3, priority_weight=0.4)
matches = matcher.match_tasks_to_workers(tasks, workers)
efficiency = matcher.calculate_platform_efficiency(matches)  # 86.92%

Files Added:

  • src/crawlee/platform_matching.py - Core implementation (221 LOC)
  • tests/unit/test_platform_matching.py - 19 unit tests
  • docs/examples/code_examples/two_sided_platform_matching.py - Working example
  • docs/guides/two_sided_platform_design.md - API documentation and use cases

Issues

  • Closes: N/A (new feature implementation)

Testing

  • 19 unit tests covering matching logic, validation, edge cases, and integration scenarios
  • Example script demonstrating realistic matching with 86.92% platform efficiency
  • Type checking passes (mypy)
  • Security scan passes (CodeQL: 0 alerts)

Checklist

  • CI passed

Original prompt

Two-sided Platform Optimal Design



Copilot AI self-assigned this Nov 18, 2025
Copilot AI and others added 2 commits November 18, 2025 00:59
Co-authored-by: riverodense <168745379+riverodense@users.noreply.github.com>
Co-authored-by: riverodense <168745379+riverodense@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Optimize design for two-sided platform" to "Implement two-sided platform optimal design for task-worker matching" Nov 18, 2025
Copilot AI requested a review from riverodense November 18, 2025 01:05