Clone tools locally #697

Open
paulzierep wants to merge 21 commits into main from clone-tools-locally

paulzierep (Collaborator) commented Jun 26, 2026

This PR changes the extract_all_tools logic: instead of using the GitHub API, it clones each repo and crawls the local checkout.
This way we do not need a GitHub API secret and can run the full pipeline with --test for every PR, which makes it much harder to break things.
It also reduces the time for the full tool fetching from hours to less than 10 minutes.
It also uses the galaxy-utils macro expansion, which improves the information for many tools (for example better inputs / outputs) and finds 70 more tools overall.
I am currently running the full workflow to check for negative effects. Only merge once we are sure.
The full description is in the new changelog.md.
https://github.com/galaxyproject/galaxy_codex/actions/runs/28222587334

I made the PR from the galaxyproject repo so I can run the CI quickly; my repo is hitting a limit :)
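
To illustrate the clone-and-crawl idea described above, here is a minimal sketch, not the code from this PR. The function names are hypothetical, and the rule "a folder containing a `.shed.yml` file is one tool suite" is an assumption made for the example.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_clone_command(url: str, dest: Path, depth: int = 1) -> list[str]:
    """Return the git command for a shallow clone (nothing is executed here)."""
    return ["git", "clone", "--depth", str(depth), url, str(dest)]


def clone_or_pull(url: str, dest: Path) -> None:
    """Clone the repo if it is missing, otherwise fast-forward the existing checkout."""
    if (dest / ".git").is_dir():
        subprocess.run(["git", "-C", str(dest), "pull", "--ff-only"], check=True)
    else:
        subprocess.run(build_clone_command(url, dest), check=True)


def find_tool_folders(repo_root: Path) -> list[Path]:
    """Assumption: every folder that holds a .shed.yml file is one tool suite."""
    return sorted(p.parent for p in repo_root.rglob(".shed.yml"))
```

Because the crawl only touches the local filesystem, it needs no API token and is not subject to API rate limits.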

- clone_repositories() clones/pulls repos locally instead of PyGithub API
- Proper XML macro expansion via galaxy.util.xml_macros
- Supports non-GitHub URLs (GitLab, self-hosted)
- Shallow clones (--depth 1) by default for CI efficiency
- Repo URL deduplication prevents wasted clones
- --workers N for parallel parsing
- No --api flag or GITHUB_API_KEY required
- Fixes: 74 previously missed tools found, 36 conda packages gained
- Suite parsed folder: use GitHub repo URL + tree/master + relative path
  instead of local filesystem path
- Suite version: resolve combined macro tokens (e.g. @TOOL_VERSION@)
  using values from macro XML files before stripping +galaxy suffix
- Suite first commit date: deepen shallow clones (git fetch --deepen 1000)
  when no commit history is found for a tool folder
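
The suite version item in the list above (resolve macro tokens, then strip the +galaxy suffix) could look roughly like the following. This is a sketch under assumptions: `load_tokens`, `resolve_version`, the sample macro XML, and its token values are all made up for illustration and are not taken from the PR.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

# Hypothetical macros file content; real wrappers keep this in macros.xml.
MACROS_XML = """<macros>
    <token name="@TOOL_VERSION@">0.23.2</token>
    <token name="@VERSION_SUFFIX@">1</token>
</macros>"""


def load_tokens(macros_xml: str) -> dict[str, str]:
    """Collect <token name="@X@">value</token> entries into a dict."""
    root = ET.fromstring(macros_xml)
    return {t.attrib["name"]: (t.text or "").strip() for t in root.iter("token")}


def resolve_version(raw: str, tokens: dict[str, str]) -> str:
    """Substitute known @TOKEN@ placeholders, then drop the +galaxy suffix."""
    # Repeat a bounded number of times so tokens that reference other tokens resolve.
    for _ in range(10):
        resolved = re.sub(
            r"@[A-Z0-9_]+@", lambda m: tokens.get(m.group(0), m.group(0)), raw
        )
        if resolved == raw:
            break
        raw = resolved
    # Unknown tokens are left in place; only the part before "+galaxy" is kept.
    return raw.split("+galaxy")[0]
```

Resolving tokens first matters because a version such as `@TOOL_VERSION@+galaxy@VERSION_SUFFIX@` would otherwise be stripped down to the unresolved placeholder.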

paulzierep (Collaborator, Author) commented

The comparison using IUC looks good. @scorreard, I think this one is ready; we can only fully test it in production because of the PAT.

| Field | Old (main) | New (PR) | Δ |
| --- | --- | --- | --- |
| Suite ID | 776/776 (100%) | 833/833 (100%) | 0pp |
| Tool IDs | 775/776 (99%) | 832/833 (99%) | 0pp |
| Tool output formats | 728/776 (93%) | 778/833 (93%) | 0pp |
| Description | 773/776 (99%) | 830/833 (99%) | 0pp |
| First commit date | 776/776 (100%) | 833/833 (100%) | 0pp |
| Homepage | 760/776 (97%) | 817/833 (98%) | +1pp |
| Version | 776/776 (100%) | 833/833 (100%) | 0pp |
| Conda package | 745/776 (96%) | 806/833 (96%) | 0pp |
| Latest conda version | 663/776 (85%) | 685/833 (82%) | -3pp |
| Version status | 776/776 (100%) | 833/833 (100%) | 0pp |
| Categories | 776/776 (100%) | 833/833 (100%) | 0pp |
| EDAM operations | 381/776 (49%) | 424/833 (50%) | +1pp |
| EDAM operations (reduced) | 375/776 (48%) | 418/833 (50%) | +2pp |
| EDAM topics | 392/776 (50%) | 433/833 (51%) | +1pp |
| EDAM topics (reduced) | 389/776 (50%) | 430/833 (51%) | +1pp |
| Owner | 775/776 (99%) | 832/833 (99%) | 0pp |
| Source | 774/776 (99%) | 831/833 (99%) | 0pp |
| Parsed folder | 776/776 (100%) | 833/833 (100%) | 0pp |
| bio.tools ID | 422/776 (54%) | 465/833 (55%) | +1pp |
| bio.tools name | 421/776 (54%) | 464/833 (55%) | +1pp |
| bio.tools description | 421/776 (54%) | 464/833 (55%) | +1pp |
| biii ID | 0/776 (0%) | 17/833 (2%) | +2pp |
| Related Workflows | 444/776 (57%) | 482/833 (57%) | 0pp |
| Related Tutorials | 258/776 (33%) | 289/833 (34%) | +1pp |
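
For anyone re-checking the Δ column: the figures above are consistent with truncating each coverage percentage to a whole number and then subtracting. The helper below is a hypothetical sketch of that arithmetic, not the script that produced the table.

```python
def coverage_delta(old_filled: int, old_total: int, new_filled: int, new_total: int):
    """Return (old %, new %, delta in percentage points), using truncated percentages."""
    old_pct = 100 * old_filled // old_total
    new_pct = 100 * new_filled // new_total
    return old_pct, new_pct, new_pct - old_pct


def format_delta(delta: int) -> str:
    """Render a delta the way the table does: +1pp, -3pp, 0pp."""
    return f"{delta:+d}pp" if delta else "0pp"
```

For example, the "Latest conda version" row is `coverage_delta(663, 776, 685, 833)`: coverage drops in relative terms even though the absolute count rises, because the total number of suites grew from 776 to 833.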

- _normalize_repo_url: trailing slash, .git, whitespace
- _repo_name_from_url: org-repo name extraction
- get_first_commit_for_local_folder: shallow clone deepening, empty output
- Install scholarly (already in requirements.txt) instead of try/except hack
- Fix test_extract_galaxy_workflows.py import path to work without PYTHONPATH
- Replace synthetic test fixtures with real wrappers from CI test repo
- Test get_tool_metadata_from_local with fastp (macros) and 2d_auto_threshold (no macros)
- Add tests for 22 previously untested functions
- Fix lint: ruff, black, isort, mypy all passing
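
The URL helpers tested above (`_normalize_repo_url`, `_repo_name_from_url`) are only named in this PR, not shown. As a rough, assumed reconstruction of the behavior the test descriptions imply (trailing slash, `.git`, whitespace, org-repo naming), a sketch with hypothetical public names might be:

```python
from __future__ import annotations

from urllib.parse import urlparse


def normalize_repo_url(url: str) -> str:
    """Strip whitespace, trailing slashes and a trailing .git suffix."""
    url = url.strip().rstrip("/")
    if url.endswith(".git"):
        url = url[: -len(".git")]
    return url.rstrip("/")


def repo_name_from_url(url: str) -> str:
    """Build an 'org-repo' name from the last two path segments of the URL."""
    path = urlparse(normalize_repo_url(url)).path
    parts = [p for p in path.split("/") if p]
    return "-".join(parts[-2:])


def deduplicate_repo_urls(urls: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized URL, preserving order."""
    seen: dict[str, None] = {}
    for url in urls:
        seen.setdefault(normalize_repo_url(url), None)
    return list(seen)
```

Normalizing before deduplicating is what prevents the same repository from being cloned twice when it is listed once with and once without `.git`. Using only the URL path also keeps the naming independent of the host, so GitLab or self-hosted URLs work the same way.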