

ups216 (Member) commented Apr 29, 2025

This pull request updates the server project's data processing pipeline and refreshes metadata in various JSON files. Key changes include adding batch-processing support to the process_githubinfo.ts script, updating metadata logs, and refreshing server data with the latest GitHub statistics.

Enhancements to the process_githubinfo.ts script:

  • Added support for a --batch_size argument that limits the number of files processed in a single run. This includes parsing the command-line argument, limiting the pending files to the batch size, and reporting progress. (server/src/data/process_githubinfo.ts) [1] [2] [3] [4]
  • Updated the loadProcessedLog function to reset the log if all files have already been processed. (server/src/data/process_githubinfo.ts)
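The batch-selection step described above can be sketched as follows. This is a minimal sketch, not the code from the PR: the helper names parseBatchSize and selectBatch, the acceptance of both "--batch_size 200" and "--batch_size=200", and the rejection of non-positive or non-numeric values are all assumptions. In the real script the argument list would presumably come from process.argv.slice(2).

```typescript
// Hypothetical sketch: names and accepted flag forms are assumptions,
// not taken from process_githubinfo.ts.

/** Parse "--batch_size N" or "--batch_size=N"; undefined when the flag is absent. */
function parseBatchSize(argv: readonly string[]): number | undefined {
  const flag = "--batch_size";
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    let raw: string;
    if (arg === flag) {
      // Value is the next argument; a missing value is treated as invalid.
      raw = i + 1 < argv.length ? argv[i + 1] : "";
    } else if (arg.startsWith(flag + "=")) {
      raw = arg.slice(flag.length + 1);
    } else {
      continue;
    }
    // Require digits only (parseInt would accept "12abc") and a positive value.
    if (!/^\d+$/.test(raw) || Number(raw) <= 0) {
      throw new Error(`Invalid ${flag} value: "${raw}"`);
    }
    return Number(raw);
  }
  return undefined;
}

/** Skip already-processed files, then cap the remainder at batchSize (if given). */
function selectBatch(
  allFiles: readonly string[],
  processed: ReadonlySet<string>,
  batchSize?: number,
): string[] {
  const pending = allFiles.filter((f) => !processed.has(f));
  return batchSize === undefined ? pending : pending.slice(0, batchSize);
}

// Usage with illustrative file names.
const allFiles = ["a.json", "b.json", "c.json", "d.json"];
const processed = new Set(["a.json"]);
const batch = selectBatch(allFiles, processed, parseBatchSize(["--batch_size", "2"]));
const remaining = allFiles.length - processed.size;
console.log(`Processing ${batch.length} of ${remaining} remaining files`);
// prints "Processing 2 of 3 remaining files"
```

Filtering out processed files before slicing means each run picks up where the previous one stopped, so repeated runs with the same batch size eventually cover every file.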

Metadata and configuration updates:

  • Updated the package.json script names to reflect the new crawl-servers-postprocess task and added a default batch size of 200 for the process_githubinfo script. (server/package.json)
  • Refreshed metadata in mcp_servers_official_list.json and process_githubinfo.log.json to reflect the latest extraction and processing timestamps. (server/src/data/mcp_servers_official_list.json, server/src/data/process_githubinfo.log.json) [1] [2]

Data updates for server JSON files:

  • Updated GitHub-related statistics (e.g., githubStars, githubForks) for various server JSON files in the split directory to reflect the latest data. [1] [2] [3] [4] and others
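The PR does not show how the refreshed statistics are written into the split files. A minimal sketch, assuming each file holds a JSON object with githubStars and githubForks fields and that the counts come from the GitHub REST API "get a repository" response (its stargazers_count and forks_count fields). The names ServerRecord, RepoApiFields and applyStats are hypothetical.

```typescript
// Hypothetical shapes: only the fields named in the PR are modeled explicitly.
interface ServerRecord {
  githubStars?: number;
  githubForks?: number;
  [key: string]: unknown;
}

// Subset of the GitHub REST "get a repository" response.
interface RepoApiFields {
  stargazers_count: number;
  forks_count: number;
}

/** Copy the record with refreshed counts and report whether either count changed. */
function applyStats(
  record: ServerRecord,
  repo: RepoApiFields,
): { record: ServerRecord; changed: boolean } {
  const changed =
    record.githubStars !== repo.stargazers_count || record.githubForks !== repo.forks_count;
  return {
    record: { ...record, githubStars: repo.stargazers_count, githubForks: repo.forks_count },
    changed,
  };
}

// Usage with made-up numbers (not values from this PR).
const result = applyStats(
  { name: "fetch", githubStars: 10, githubForks: 2 },
  { stargazers_count: 12, forks_count: 2 },
);
if (result.changed) {
  // A caller would write JSON.stringify(result.record, null, 2) back to the file here.
  console.log(JSON.stringify(result.record));
  // prints {"name":"fetch","githubStars":12,"githubForks":2}
}
```

Returning changed alongside the record lets a caller skip rewriting files whose counts did not move, so a refresh only touches records whose statistics actually changed.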

These changes improve the maintainability of the data processing pipeline while ensuring up-to-date server metadata.

ups216 requested a review from Copilot April 29, 2025 08:07
ups216 self-assigned this Apr 29, 2025
ups216 added the enhancement New feature or request label Apr 29, 2025

Copilot AI left a comment


Pull Request Overview

This PR enhances the GitHub info collection process by introducing batch processing in the data pipeline and refreshing metadata logs for improved accuracy.

  • Adds a new --batch_size argument to process_githubinfo.ts that limits the number of files processed in each run.
  • Updates the loadProcessedLog function to automatically reset if all files have been processed.
  • Improves console logging to inform users about the progress and batch details.
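The automatic reset of loadProcessedLog can be illustrated with a small pure function. This is a hypothetical sketch: the ProcessedLog shape and the resetIfComplete name are assumptions, and the real process_githubinfo.log.json format may differ.

```typescript
// Hypothetical log shape; the real process_githubinfo.log.json may differ.
interface ProcessedLog {
  processedFiles: string[];
}

/**
 * If every current file already appears in the log, return a fresh empty log
 * so the next run starts a new pass; otherwise return the log unchanged.
 */
function resetIfComplete(log: ProcessedLog, allFiles: readonly string[]): ProcessedLog {
  const done = new Set(log.processedFiles);
  // every() is true for an empty array, so require at least one file.
  const allDone = allFiles.length > 0 && allFiles.every((f) => done.has(f));
  return allDone ? { processedFiles: [] } : log;
}

const files = ["a.json", "b.json"];
console.log(resetIfComplete({ processedFiles: ["a.json", "b.json"] }, files).processedFiles.length); // 0
console.log(resetIfComplete({ processedFiles: ["a.json"] }, files).processedFiles.length); // 1
```

Checking against the current file list rather than a stored count means newly added server files keep the log from resetting until they too have been processed.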
Files not reviewed (19)
  • server/package.json: Language not supported
  • server/src/data/mcp_servers_official_list.json: Language not supported
  • server/src/data/process_githubinfo.log.json: Language not supported
  • server/src/data/split/00739d70-842d-4d36-bb98-04bde51a2220_kubernetes.json: Language not supported
  • server/src/data/split/0092c98a-a48a-41fd-bdc8-e69c66e207a8_youtubesubtitles.json: Language not supported
  • server/src/data/split/00cca75b-f426-4279-ae1c-7e6b80636b50_youtubetranscripts.json: Language not supported
  • server/src/data/split/00f1ad7c-0109-47d9-893d-004fe0b555f7_agentcarefhiremr.json: Language not supported
  • server/src/data/split/022d2a79-e4da-47b2-a138-2560f3822400_elevenlabstexttospeech.json: Language not supported
  • server/src/data/split/0272fd9a-3249-4990-bed5-d66aaa4ebdb4_browseruse.json: Language not supported
  • server/src/data/split/02f9e25f-4904-4350-8a2e-a6a0a69f9dfb_onenote.json: Language not supported
  • server/src/data/split/03490ac9-ce58-485d-9166-a295173de417_fetch.json: Language not supported
  • server/src/data/split/04373d12-e650-4b53-b91b-4945607391e2_postgrest.json: Language not supported
  • server/src/data/split/045d532b-d1a6-41f0-a836-b8c40e2caacc_awsknowledgebase.json: Language not supported
  • server/src/data/split/05460e9c-6aab-4f63-9e94-0c5c0f61bc78_inboxzero.json: Language not supported
  • server/src/data/split/05e19a5d-df7f-497e-9272-b36744db5cd5_cognee.json: Language not supported
  • server/src/data/split/0730e308-7fb2-452e-a423-07c6c90e4ed0_mcpazuredevopsserver.json: Language not supported
  • server/src/data/split/075a1d44-e621-461f-b3ae-b454ef1ef34b_deepseekreasoner.json: Language not supported
  • server/src/data/split/078c5c1e-892f-46fd-8646-e9a884f36fc0_zig.json: Language not supported
  • server/src/data/split/07eb49c7-9a55-4a00-b1a2-ba7fbef84fe6_devdb.json: Language not supported

ups216 merged commit 909611d into main Apr 29, 2025

Labels

enhancement New feature or request


2 participants