
Add shared markitdown MCP workflow and command-based resource summarizer with persistent storage #1292

Merged
pelikhan merged 5 commits into main from
copilot/add-shared-workflow-markitdown
Oct 7, 2025
Contributor

Copilot AI commented Oct 7, 2025

This PR adds support for Microsoft's markitdown MCP server and implements a command-based resource summarizer workflow with persistent storage across runs.

Changes

Shared MCP Configuration

Created .github/workflows/shared/markitdown-mcp.md - a reusable configuration for the markitdown MCP server that can be imported into any workflow:

mcp-servers:
  markitdown:
    container: "ghcr.io/microsoft/markitdown"
    allowed: ["*"]

This uses a Docker container-based transport for reliable execution and follows the MCP stdio pattern. The shared configuration promotes DRY principles and consistency across workflows.

Resource Summarizer Workflow

Created .github/workflows/pdf-summary.md - a command-based agentic workflow that analyzes and summarizes resources from URLs:

Triggers:

  • Command trigger: Responds to /summarize in issues and comments
  • Workflow dispatch: Manual trigger with url and query inputs

How it works:

  1. Identifies URLs from command text or workflow_dispatch inputs
  2. Uses the markitdown MCP server to convert resources to markdown (supports PDF, HTML, Word, PowerPoint, images, and more)
  3. Analyzes the content to answer the provided query
  4. Stores analysis results in cache-memory for persistent knowledge building
  5. Posts a formatted response as a comment

Key features:

  • Flexible queries: Supports custom questions or defaults to "summarize in the context of this repository"
  • Multiple resource types: Handles any resource that markitdown can convert (PDF, HTML, DOCX, PPTX, images, etc.)
  • Persistent storage: Uses cache-memory to store analysis results across workflow runs, building a knowledge base over time
  • Knowledge building: Can reference previous analyses to provide more contextual insights
  • Secure comment creation: Uses safe-outputs.add-comment pattern with minimal permissions (contents: read, actions: read)
  • Repository context: Analyzes resources specifically in the context of the current repository
  • Timeout protection: 15-minute timeout prevents runaway executions

Technical Details

The markitdown MCP server is configured in the compiled workflow as:

{
  "mcpServers": {
    "markitdown": {
      "type": "local",
      "command": "docker",
      "tools": ["*"],
      "args": ["run", "--rm", "-i", "ghcr.io/microsoft/markitdown"]
    }
  }
}
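
The relationship between the shared `container` setting and the compiled entry above can be sketched as follows. This is only an illustration of how the two snippets line up, not the compiler's actual code; the function name `compile_mcp_server` is hypothetical.

```python
import json


def compile_mcp_server(name: str, config: dict) -> dict:
    """Map a container-based MCP server entry to a stdio `docker run` entry.

    Hypothetical helper: it mirrors the two snippets shown in this PR,
    not the real workflow compiler.
    """
    return {
        name: {
            "type": "local",
            "command": "docker",
            "tools": list(config.get("allowed", [])),
            # --rm removes the container on exit; -i keeps stdin open,
            # which the MCP stdio transport needs.
            "args": ["run", "--rm", "-i", config["container"]],
        }
    }


shared = {"container": "ghcr.io/microsoft/markitdown", "allowed": ["*"]}
compiled = {"mcpServers": compile_mcp_server("markitdown", shared)}
print(json.dumps(compiled, indent=2))
```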

Cache-memory is enabled to provide persistent file storage at /tmp/cache-memory/ for storing analysis results as structured JSON files.
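
A minimal sketch of how analysis results might be written to and read back from that directory. The record fields (`url`, `query`, `summary`) and the hash-based file naming are assumptions for illustration; the PR does not specify the JSON schema the agent uses.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional

CACHE_DIR = Path("/tmp/cache-memory")  # storage path named in this PR


def analysis_path(url: str, cache_dir: Path = CACHE_DIR) -> Path:
    # Hypothetical naming scheme: one JSON file per URL, keyed by a short hash.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return cache_dir / f"analysis-{digest}.json"


def store_analysis(url: str, query: str, summary: str,
                   cache_dir: Path = CACHE_DIR) -> Path:
    """Write one analysis record as structured JSON and return its path."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = analysis_path(url, cache_dir)
    record = {"url": url, "query": query, "summary": summary}
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


def load_analysis(url: str, cache_dir: Path = CACHE_DIR) -> Optional[dict]:
    """Return a previously stored record for this URL, or None if absent."""
    path = analysis_path(url, cache_dir)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

Checking `load_analysis` before converting a resource is one way a later run could reuse an earlier result instead of repeating the analysis.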

The workflow uses the imports field in the frontmatter to merge the shared MCP configuration, demonstrating the reusability pattern for MCP server definitions across multiple workflows.

Example Usage

Using the command trigger:

/summarize https://example.com/research-paper.pdf

Or with a specific query:

/summarize https://example.com/security-guide.pdf what are the key security recommendations?
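
The two command forms above could be split into URLs and a query roughly like this. In the actual workflow the agent interprets the command text itself, so this parser is only an illustration; `parse_summarize_command` is a hypothetical name, and the sketch assumes URLs are separated by whitespace.

```python
import re

DEFAULT_QUERY = "summarize in the context of this repository"
URL_PATTERN = re.compile(r"https?://\S+")


def parse_summarize_command(text: str) -> tuple[list[str], str]:
    """Split a `/summarize` comment into (urls, query)."""
    body = text.strip()
    match = re.match(r"/summarize\b", body)
    if match is None:
        raise ValueError("not a /summarize command")
    body = body[match.end():]
    urls = URL_PATTERN.findall(body)
    # Whatever remains after removing the URLs is treated as the query.
    query = " ".join(URL_PATTERN.sub(" ", body).split())
    return urls, query or DEFAULT_QUERY


print(parse_summarize_command("/summarize https://example.com/research-paper.pdf"))
print(parse_summarize_command(
    "/summarize https://example.com/security-guide.pdf "
    "what are the key security recommendations?"
))
```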

Using workflow dispatch:

  1. Navigate to Actions → Resource Summarizer Agent → Run workflow
  2. Enter URL(s) in the url field (comma-separated for multiple)
  3. Optionally enter a custom query (defaults to "summarize in the context of this repository")
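
For the dispatch inputs, the comma-separated `url` field and the optional `query` could be normalized as in the sketch below. The function name is hypothetical, and the agent may handle these inputs differently in practice.

```python
DEFAULT_QUERY = "summarize in the context of this repository"


def normalize_dispatch_inputs(url: str, query: str = "") -> tuple[list[str], str]:
    """Split the comma-separated url input and apply the default query."""
    urls = [part.strip() for part in url.split(",") if part.strip()]
    if not urls:
        raise ValueError("the url input is required")
    return urls, query.strip() or DEFAULT_QUERY


print(normalize_dispatch_inputs(
    "https://example.com/a.pdf, https://example.com/b.html"
))
```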

The workflow will:

  1. Convert the resource(s) to markdown using markitdown
  2. Analyze the content to answer the query
  3. Store the analysis results in cache-memory for future reference
  4. Post a formatted response as a comment with findings and repository context

This makes external resources more accessible by converting them to markdown, providing intelligent analysis within the repository context, and building a persistent knowledge base of analyzed resources.

Testing

  • ✅ All unit tests pass
  • ✅ Workflow compiles successfully (179.0 KB lock file generated)
  • ✅ MCP server configuration validated
  • ✅ Command trigger and workflow_dispatch inputs properly configured
  • ✅ Cache-memory tool configured for persistent storage
  • ✅ Follows repository conventions and security best practices
Original prompt

Add a shared workflow that uses the Microsoft markitdown mcp (use container)

Add an agentic workflow that triggers on push with pdf files, and generates a summary in a new pull request (file.pdf -> file.summary.pdf) using markitdown.



Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add shared workflow for markdown summary generation" to "Add shared markitdown MCP workflow and PDF summary generator" Oct 7, 2025
Copilot AI requested a review from pelikhan October 7, 2025 05:22
Collaborator

pelikhan commented Oct 7, 2025

@copilot update prompt to skip if summary file already exists

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented Oct 7, 2025

@copilot update prompt to skip if summary file already exists

Updated the prompt to skip PDFs that already have existing .summary.md files. The workflow now:

  1. Checks for existing .summary.md files before processing each PDF
  2. Skips PDFs that already have summaries to avoid duplicate work
  3. Lists skipped files in the PR description
  4. Exits gracefully if all PDFs already have summaries (no empty PRs)
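
The skip check described in this list is carried out by the agent following the prompt, but the underlying logic can be sketched in a few lines. The helper names are hypothetical; the `.summary.md` naming comes from this comment.

```python
from pathlib import Path


def summary_path(pdf: Path) -> Path:
    # report.pdf -> report.summary.md, next to the PDF
    return pdf.with_name(pdf.stem + ".summary.md")


def partition_pdfs(pdfs: list[Path]) -> tuple[list[Path], list[Path]]:
    """Return (to_process, skipped); a PDF is skipped if its summary exists."""
    to_process: list[Path] = []
    skipped: list[Path] = []
    for pdf in pdfs:
        if summary_path(pdf).exists():
            skipped.append(pdf)
        else:
            to_process.append(pdf)
    return to_process, skipped
```

If `to_process` comes back empty, there is nothing to summarize and no pull request needs to be opened; `skipped` is what would be listed in the PR description.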

Changes in commit 9c7b74e.

Collaborator

pelikhan commented Oct 7, 2025

@copilot change the way the workflow works:

It is a command workflow "summarize" that takes URLs of resources to analyze and, optionally, a query.

Also support workflow_dispatch with url, query inputs

The agent should identify the resources, use markitdown to convert them to markdown, then analyze them to respond to the query. If missing, the query is "summarize in the context of this repository".

…spatch

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented Oct 7, 2025

@copilot change the way the workflow works:

It is a command workflow "summarize" that takes URLs of resources to analyze and, optionally, a query....

Changed the workflow to a command-based resource summarizer:

Command Trigger: /summarize in issues and comments

  • Example: /summarize https://example.com/document.pdf
  • Optional query: /summarize https://example.com/doc.pdf what are the security implications?
  • Default query: "summarize in the context of this repository"

Workflow Dispatch: Manual trigger with inputs

  • url: URL(s) to analyze (required, supports comma-separated)
  • query: Question to answer (optional, defaults to repository context summary)

How it works:

  1. Identifies URLs from command or workflow_dispatch input
  2. Uses markitdown MCP to convert resources to markdown
  3. Analyzes content to answer the query
  4. Posts formatted response as a comment

Changes in commit 0bf3e7d.

Copilot AI changed the title from "Add shared markitdown MCP workflow and PDF summary generator" to "Add shared markitdown MCP workflow and command-based resource summarizer" Oct 7, 2025
Collaborator

pelikhan commented Oct 7, 2025

@copilot prompt to store results in the cache-memory tool as well.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented Oct 7, 2025

@copilot prompt to store results in the cache-memory tool as well.

Added cache-memory tool and updated the prompt to store analysis results:

Cache-Memory Configuration:

  • Enabled cache-memory: true in workflow tools
  • Storage location: /tmp/cache-memory/
  • Persists across workflow runs using GitHub Actions cache

Prompt Updates:

  1. Added Step 5: "Store Results in Cache Memory" with instructions to save analysis results
  2. Added "Cache Memory Usage" section explaining how to use persistent storage
  3. Updated Current Context to include persistent storage path
  4. Added example JSON structure for storing analyses

Benefits:

  • Builds knowledge over time about analyzed resources
  • Can reference previous analyses for context
  • Tracks patterns across multiple resource analyses
  • Avoids redundant analysis of previously reviewed resources

Changes in commit 7c2a8f5.

Copilot AI changed the title from "Add shared markitdown MCP workflow and command-based resource summarizer" to "Add shared markitdown MCP workflow and command-based resource summarizer with persistent storage" Oct 7, 2025
pelikhan marked this pull request as ready for review October 7, 2025 06:03
pelikhan merged commit 3ba100b into main Oct 7, 2025
26 of 31 checks passed
pelikhan deleted the copilot/add-shared-workflow-markitdown branch October 7, 2025 06:16