feat: Implement true pagination everywhere with cursor-based pagination #152

Feature: True Pagination for All List Queries

Business Value

Enable efficient, scalable data retrieval for large DeepSource accounts with hundreds or thousands of projects, issues, and analysis runs. This feature ensures complete, deterministic results without silent truncation, improving reliability for enterprise users and enabling proper integration with automation tools that need to process all available data.

User Story

As a DeepSource MCP server user with a large account
I want to retrieve all available data through paginated requests
So that I can process complete datasets without missing items or hitting memory limits

Gherkin Specification

Feature: Cursor-Based Pagination for All List Queries
  All list operations in the DeepSource MCP server should support cursor-based pagination
  to handle large datasets efficiently and provide deterministic, complete results.

  Background:
    Given the DeepSource API is available
    And the user has a valid DEEPSOURCE_API_KEY configured
    And the account has more than 100 items in various collections

  Scenario: Paginating through projects list
    Given the account has 250 projects
    When I request projects with page_size of 50
    Then I should receive the first 50 projects
    And the response should include a next_cursor
    And the response should indicate has_more_pages is true
    When I request projects with the next_cursor
    Then I should receive the next 50 projects
    And I can continue paginating until all 250 projects are retrieved

  Scenario: Paginating through issues with filters
    Given a project has 500 issues
    When I request issues with analyzer filter "python" and page_size of 25
    Then I should receive the first 25 Python issues
    And the pagination should respect the filter throughout all pages
    When I continue paginating with the same filter
    Then I should receive all Python issues across multiple pages

  Scenario: Handling max_pages limit
    Given a project has 1000 issues
    When I request issues with page_size of 100 and max_pages of 3
    Then I should receive exactly 300 issues across 3 pages
    And the response should indicate that the limit was reached
    And the response should provide a cursor to continue if needed

  Scenario: Backward pagination support
    Given I have a cursor from page 3 of results
    When I request the previous page using backward pagination
    Then I should receive page 2 of results
    And I should be able to navigate bidirectionally through pages

  Scenario Outline: Pagination for different query types
    Given the <collection> has <total_items> items
    When I request <collection> with page_size of <page_size>
    Then pagination should work correctly for <collection>
    And all <total_items> items should be retrievable

    Examples:
      | collection                  | total_items | page_size |
      | projects                    | 150         | 50        |
      | project_issues              | 500         | 100       |
      | runs                        | 200         | 25        |
      | dependency_vulnerabilities  | 300         | 50        |
      | quality_metrics             | 75          | 20        |

  Scenario: Consistent ordering across pages
    Given a collection with items created at different times
    When I paginate through the collection
    Then items should appear in consistent order across all pages
    And no items should be duplicated across pages
    And no items should be skipped

  Scenario: Empty result handling
    Given a project with no issues
    When I request issues with pagination parameters
    Then I should receive an empty items array
    And has_more_pages should be false
    And no cursor should be provided

  Scenario: Single page result
    Given a project with 15 issues
    When I request issues with page_size of 20
    Then I should receive all 15 issues
    And has_more_pages should be false
    And no next_cursor should be provided

  Scenario: Invalid cursor handling
    Given an invalid or expired cursor
    When I request data with this cursor
    Then I should receive a clear error message
    And the error should suggest starting fresh pagination
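
For illustration, here is a minimal sketch of how a client could walk the "Handling max_pages limit" scenario above. The tool name, argument names, and response fields are assumptions chosen to match this specification, not a final contract.

  // Hypothetical sketch: request up to 3 pages of 100 issues, then hand back the
  // continuation cursor so the caller can resume later. Names are illustrative.
  type ToolCall = (
    name: string,
    args: Record<string, unknown>,
  ) => Promise<{
    items: unknown[];
    pagination: { has_more_pages: boolean; next_cursor?: string };
  }>;

  async function collectIssuePages(callTool: ToolCall, projectKey: string) {
    const items: unknown[] = [];
    let cursor: string | undefined;

    for (let page = 0; page < 3; page += 1) {
      const response = await callTool('project_issues', {
        projectKey,
        page_size: 100,
        cursor,
      });
      items.push(...response.items);
      if (!response.pagination.has_more_pages) return { items, nextCursor: undefined };
      cursor = response.pagination.next_cursor;
    }
    // Page cap reached: issues collected so far, cursor preserved for continuation.
    return { items, nextCursor: cursor };
  }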

Acceptance Criteria

  • All list endpoints support cursor-based pagination parameters:
    • page_size (or first/last for GraphQL compatibility)
    • cursor (or after/before for forward/backward pagination)
    • max_pages to limit total pages retrieved
  • Pagination works consistently across all list queries:
    • projects
    • project_issues
    • runs
    • recent_run_issues
    • dependency_vulnerabilities
    • quality_metrics (when applicable)
    • Any future list endpoints
  • Response format includes pagination metadata (see the response-shape sketch after this list):
    • items array with actual data
    • pagination object with:
      • has_more_pages boolean
      • next_cursor (when more pages exist)
      • previous_cursor (for backward navigation)
      • total_count (when available from API)
      • page_size (actual items returned)
  • Filters and sorting are maintained across pagination
  • Performance is optimized:
    • Queries request only needed fields
    • Appropriate default page sizes (10-50 items)
    • Exponential backoff for rate limiting
  • Error handling for pagination edge cases:
    • Invalid cursors
    • Exceeding max_pages
    • API rate limits
  • Backward compatibility maintained:
    • Existing queries without pagination work as before
    • Default behavior returns first page of results
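
As a concrete sketch of the response shape called out above, the wrapper could look like the TypeScript interfaces below; the field names follow this list, but the types and optionality are assumptions rather than a finalized API.

  // Sketch of the paginated response wrapper (field names per the acceptance
  // criteria above; exact types and optionality are assumptions).
  interface PaginationInfo {
    has_more_pages: boolean;
    next_cursor?: string;     // present only when more pages exist
    previous_cursor?: string; // for backward navigation
    total_count?: number;     // when available from the API
    page_size: number;        // actual number of items returned
  }

  interface PaginatedResult<T> {
    items: T[];
    pagination: PaginationInfo;
  }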

Non-Goals

  • This feature will NOT implement offset-based pagination (use cursor-based only)
  • Will NOT cache pages across requests (each request fetches fresh data)
  • Will NOT implement client-side sorting (rely on API ordering)
  • Out of scope: Infinite scroll or streaming APIs
  • Will NOT modify the underlying DeepSource GraphQL API

Risks & Mitigations

  • Risk: Breaking changes for existing users
    Mitigation: Maintain backward compatibility by making pagination optional with sensible defaults

  • Risk: Performance degradation for small datasets
    Mitigation: Use intelligent defaults that don't paginate unnecessarily for small result sets

  • Risk: Cursor expiration or invalidation
    Mitigation: Provide clear error messages and recovery instructions

  • Risk: Memory issues when max_pages is set too high
    Mitigation: Enforce a reasonable upper bound on max_pages and return a continuation cursor so callers can resume in smaller batches rather than holding everything in memory

Technical Considerations

  • Architecture impact:

    • Modify all client classes to support pagination parameters
    • Update response models to include pagination metadata
    • Enhance GraphQL query builders for cursor support
  • Performance considerations:

    • Implement request batching where appropriate
    • Consider connection pooling for multiple page requests
    • Monitor and log pagination performance metrics
  • Implementation approach:

    • Utilize existing Relay-style pagination from GraphQL
    • Extend PaginationParams type for all list methods
    • Create consistent pagination response wrapper
    • Update GraphQL queries to include pageInfo fields (see the query sketch below)
  • Dependencies:

    • No new external dependencies required
    • Leverage existing GraphQL pagination support
    • Use current error handling and retry logic
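
To make the Relay-style approach above concrete, a query could embed the standard pageInfo block as in the sketch below. The surrounding query and field names are placeholders, not the actual DeepSource schema; only the pageInfo fields follow the Relay connection specification.

  // Illustrative Relay-style query in a TypeScript template string. Query and
  // field names are placeholders; only pageInfo follows the Relay connection spec.
  const PROJECT_ISSUES_QUERY = `
    query ProjectIssues($name: String!, $first: Int, $after: String) {
      repository(name: $name) {
        issues(first: $first, after: $after) {
          pageInfo {
            hasNextPage
            hasPreviousPage
            startCursor
            endCursor
          }
          edges {
            node {
              id
              title
            }
          }
        }
      }
    }
  `;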

Testing Requirements

  • Unit tests for pagination logic in each client class
  • Integration tests with mock data for multi-page scenarios
  • Property-based tests for pagination invariants (see the test sketch after this list):
    • No duplicates across pages
    • Complete coverage of all items
    • Consistent ordering
  • Edge case testing:
    • Empty results
    • Single page results
    • Maximum page limits
    • Invalid cursors
  • Performance tests for large datasets
  • Backward compatibility tests
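
A sketch of one invariant test is shown below, assuming a vitest-style runner and a stand-in paging function; it should be adapted to the project's actual test framework and client API.

  import { describe, expect, it } from 'vitest';

  // Stand-in paging function used to express the invariants; real tests would
  // drive the client classes against mocked multi-page API responses.
  function pageThrough(ids: number[], pageSize: number): number[][] {
    const pages: number[][] = [];
    for (let i = 0; i < ids.length; i += pageSize) {
      pages.push(ids.slice(i, i + pageSize));
    }
    return pages;
  }

  describe('pagination invariants', () => {
    it('covers every item exactly once, in a stable order', () => {
      const ids = Array.from({ length: 250 }, (_, i) => i);
      const pages = pageThrough(ids, 50);
      const seen = pages.flat();

      expect(new Set(seen).size).toBe(seen.length); // no duplicates across pages
      expect(seen).toEqual(ids);                    // no items skipped, order preserved
    });
  });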

Definition of Done

  • All list endpoints support cursor-based pagination
  • Pagination parameters are documented in tool definitions
  • Response format includes complete pagination metadata
  • All tests passing with >80% coverage
  • Documentation updated with pagination examples
  • Performance benchmarks show no regression
  • Large account testing confirms complete data retrieval
  • Backward compatibility verified
  • Error scenarios handled gracefully
  • Code reviewed and approved

Implementation Notes

  1. Priority Order:

    • Start with project_issues (most likely to have large datasets)
    • Then runs and dependency_vulnerabilities
    • Finally projects and other endpoints
  2. Reusable Components:

    • Create generic pagination utilities in src/utils/pagination/ (see the sketch after these notes)
    • Extend existing helper functions
    • Build consistent response formatters
  3. Migration Strategy:

    • Phase 1: Add pagination support without breaking changes
    • Phase 2: Update documentation and examples
    • Phase 3: Deprecate non-paginated responses (future)
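
As a possible starting point for the generic utilities mentioned in item 2, an async generator could walk cursors until exhaustion or a page cap; the file name, PageResult shape, and fetchPage signature below are assumptions, not existing code.

  // Sketch for src/utils/pagination/iterate.ts (file name and types are assumptions).
  export interface PageResult<T> {
    items: T[];
    nextCursor?: string;
    hasMorePages: boolean;
  }

  export async function* iteratePages<T>(
    fetchPage: (cursor?: string) => Promise<PageResult<T>>,
    maxPages = Infinity,
  ): AsyncGenerator<T[], void, undefined> {
    let cursor: string | undefined;
    let pages = 0;

    while (pages < maxPages) {
      const page = await fetchPage(cursor);
      yield page.items;
      pages += 1;
      if (!page.hasMorePages || !page.nextCursor) return;
      cursor = page.nextCursor;
    }
  }

Callers could then consume pages with for await...of, keeping memory bounded to a single page at a time while still being able to stop at a max_pages cap.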

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com
