Contributor

@kamieyy kamieyy commented Jul 28, 2025

Description

Adds support for downloading the OpenAI Whisper large-v3 model from Hugging Face using MLCFlow.

Related Issue

Fixes #544

🧾 PR Checklist

  • Target branch is dev

📌 Note: PRs must be raised against dev. Do not commit directly to main.

@kamieyy kamieyy requested a review from a team as a code owner July 28, 2025 16:09
Contributor

github-actions bot commented Jul 28, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@anandhu-eng
Contributor

Hi @kamieyy, thanks for the effort in raising this PR.

To align with the structure and practices we follow in this repo, a few changes are needed before we can move forward. Specifically, we typically expose model and dataset downloads through the meta configuration, which allows for more consistent automation. For reference, see how this was handled for Llama2 here.

Since you're new to the repo, I'd highly recommend going through the MLCFlow documentation to get a clearer picture of how the execution flow is structured.

Feel free to revise the PR based on this. Let me know if you need help; I'll be happy to guide you through it!

@kamieyy
Contributor Author

kamieyy commented Jul 29, 2025

Hi @anandhu-eng, thanks for the review!

I've made the following changes to meta.yaml:

  • Added a prehook for the Hugging Face model download
  • Configured variations and env keys for automation

Please let me know if anything needs further adjustment or cleanup. And thanks again for pointing me in the right direction.

@kamieyy kamieyy force-pushed the add-whisper-support branch from 7d122a5 to 4934283 Compare July 29, 2025 16:55
The automation framework expects every dependency to have a tags string; otherwise it throws an error. The relevant line in automation/script/module.py is:

dep_tags_list = dep.get('tags').split(",")
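
The failure mode described in this comment can be sketched in a few lines of plain Python. This is only an illustration, not the framework's code: the two helper functions and the sample dependency dicts are hypothetical, and only the `dep.get('tags').split(",")` expression comes from the thread. When a dependency has no `tags` key, `dict.get` returns `None`, and calling `.split` on `None` raises an `AttributeError`.

```python
def split_dep_tags(dep):
    """Same expression as the quoted line; fails when 'tags' is absent."""
    return dep.get('tags').split(",")


def split_dep_tags_checked(dep):
    """Hypothetical guarded variant that reports the missing key clearly."""
    tags = dep.get('tags')
    if not isinstance(tags, str) or not tags.strip():
        raise ValueError(f"dependency is missing a 'tags' string: {dep!r}")
    return [t.strip() for t in tags.split(",")]


# Illustrative dependency entries (not copied from the real meta.yaml).
dep_with_tags = {"names": ["example-dep"], "tags": "setup,ml-model"}
dep_without_tags = {"names": ["example-dep"]}

print(split_dep_tags(dep_with_tags))

try:
    split_dep_tags(dep_without_tags)
except AttributeError as exc:
    # dict.get returned None, so .split does not exist on it
    print("unguarded lookup failed:", exc)

try:
    split_dep_tags_checked(dep_without_tags)
except ValueError as exc:
    print("guarded lookup failed:", exc)
```

In practice the simpler fix is the one taken in this PR: make sure every dependency entry in meta.yaml carries a `tags` string.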
@kamieyy kamieyy requested a review from anandhu-eng August 1, 2025 19:55
@kamieyy
Contributor Author

kamieyy commented Aug 4, 2025

recheck

- MLC_OUTDIRNAME
names:
- whisper-outdir-setup
tags: setup,ml-model
Collaborator

Do we have this mlc script?

Contributor Author

@arjunsuresh Thanks for pointing it out. I've changed it to 'dae' like the others.

kamieyy and others added 2 commits August 5, 2025 11:06
removed

  - MLC_OUTDIRNAME
  names:
  - whisper-outdir-setup
  tags: setup,ml-model
@kamieyy
Contributor Author

kamieyy commented Aug 5, 2025

There are currently two CI errors: one related to insufficient disk space on the CI runner (OSError: [Errno 28] No space left on device), and another due to missing 'offline' scenario results for the PointPainting model (SubmissionCheckerError: Offline scenario results not found). Based on my review, I'm not sure if these issues are caused by the changes introduced in this PR. Is there anything I'm missing?

@arjunsuresh
Collaborator

You can ignore the test failures. Two of those are due to a pending PR in the inference repository.

@anandhu-eng anandhu-eng merged commit 30ada65 into mlcommons:dev Aug 6, 2025
105 of 107 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Aug 6, 2025
@anandhu-eng
Contributor

Thanks again, @kamieyy!

@kamieyy kamieyy deleted the add-whisper-support branch August 6, 2025 15:59

3 participants