Add HuggingFace Llama3.2 1B to benchmark #5368
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5368

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Cancelled Job

As of commit e2779ee with merge base 8460d42:
NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed 449b4d1 to b48035a (Compare)

Force-pushed 53e7756 to a13a44b (Compare)
@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Force-pushed a13a44b to 97050c2 (Compare)

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Uploading the model artifacts to GitHub was skipped: https://github.com/pytorch/executorch/actions/runs/10858058150/job/30136354800. I don't see the reason in the log. The model artifacts are placed under

Oops, the size of the exported model is 11+ GB, I think. Uploading such a large file to GitHub takes too long and the job timed out. I need to rework the upload part here, since GitHub doesn't scale to files this large, so we need to go straight to S3.
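The routing decision described above can be sketched as a small helper. The 2 GB cutoff below is an illustrative assumption, not a documented GitHub limit; the point is that an 11+ GB export should never go through the GitHub artifact upload path.

```python
# Hypothetical size-based routing for benchmark artifacts: small files go to
# GitHub artifacts, anything large goes straight to S3. The cutoff value is
# an assumption for illustration, not a documented GitHub limit.
GH_UPLOAD_CUTOFF_BYTES = 2 * 1024**3  # 2 GiB, illustrative

def pick_upload_target(artifact_size_bytes: int) -> str:
    """Return 'github' for small artifacts and 's3' for large ones."""
    return "github" if artifact_size_bytes <= GH_UPLOAD_CUTOFF_BYTES else "s3"
```

With this, the 11+ GB exported model would be routed to S3 while small `.pte` files could still use the regular GitHub artifact upload.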
Force-pushed cd4c507 to 60b62d3 (Compare)

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Force-pushed 60b62d3 to 009f932 (Compare)

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Tried running gemma-2b on a Google Pixel 8 Pro (with 12 GB RAM). The failure is the same: some I/O failures when connecting to the device in the pool: https://github.com/pytorch/executorch/actions/runs/10908663134/job/30277474048. In the stack trace I see there is a call

I'm checking the AWS docs on this (https://docs.aws.amazon.com/devicefarm/latest/developerguide/limits.html), and they mention a 4 GB limit, but that's for the size of the app, not the extra data archive. Let me run this manually through the AWS UI and see if it accepts the model. The archive size is 5.4 GB: https://github.com/pytorch/executorch/actions/runs/10908663134/job/30278173066#step:11:38. IIRC, llama2 7b works, but its archive is only ~3 GB.
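The size comparison being debated above can be made explicit. Note the assumption: the documented 4 GB Device Farm limit applies to the app package, and whether the same cap applies to the extra-data archive is exactly what the manual AWS UI run is meant to verify.

```python
# Sanity check against the AWS Device Farm limit discussed above. Whether
# the documented 4 GB app limit also applies to the extra-data archive is
# an open question here, so treating it as a hard cap is an assumption.
DEVICE_FARM_LIMIT_GB = 4.0

def fits_device_farm(archive_size_gb: float,
                     limit_gb: float = DEVICE_FARM_LIMIT_GB) -> bool:
    """Return True if the archive is within the assumed size limit."""
    return archive_size_gb <= limit_gb
```

Under this assumption, the 5.4 GB gemma-2b archive would be rejected while the ~3 GB llama2 7b archive would pass, which matches the observed behavior.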
Force-pushed 9e89593 to b2d837e (Compare)

Force-pushed f936584 to 7b55bb9 (Compare)

Force-pushed 7b55bb9 to 6cb6af9 (Compare)

Force-pushed cb3efe3 to bedecd8 (Compare)
SpinQuant and QLoRA are passing.

Original BF16 is passing:
Force-pushed bedecd8 to e2779ee (Compare)

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Decided to leave the logic for running the 1B model in scheduled jobs to a separate PR to simplify the review, as it will require significant refactoring of the workflow.

Just a data point: I see your test run shows up on the dashboard at https://hud.pytorch.org/benchmark/llms?startTime=Wed%2C%2011%20Dec%202024%2004%3A07%3A58%20GMT&stopTime=Wed%2C%2018%20Dec%202024%2004%3A07%3A58%20GMT&granularity=hour&lBranch=add_hf_model_to_benchinfra&lCommit=e2779ee5cbe666072a2d0f7a6821d640a11d1ad9&rBranch=add_hf_model_to_benchinfra&rCommit=e2779ee5cbe666072a2d0f7a6821d640a11d1ad9&repoName=pytorch%2Fexecutorch&modelName=All%20Models&backendName=All%20Backends&dtypeName=All%20DType&deviceName=All%20Devices, and the extraction logic looks wrong: for the llama model, the backend and benchmark configs are swapped. I guess this is what you mean by introducing the new
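The swapped-field symptom described above is typical of positional unpacking. As a purely hypothetical illustration (the field names and record layout here are assumptions, not the dashboard's actual code), keyed lookups avoid the class of bug where backend and config trade places:

```python
# Hypothetical sketch of the swapped-field bug: if extraction code unpacks a
# record positionally and the source order changes, backend and config swap.
# Keyed access is order-independent. All names here are illustrative.
from typing import NamedTuple

class BenchmarkEntry(NamedTuple):
    model: str
    backend: str
    config: str

def extract(record: dict) -> BenchmarkEntry:
    # Look fields up by key rather than position, so reordering the source
    # record cannot swap backend and config.
    return BenchmarkEntry(model=record["model"],
                          backend=record["backend"],
                          config=record["config"])
```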
Add llama3.2 1b from Hugging Face to the benchmark with the following configs:
Switched to the memory-intensive runners in the benchmark workflow to reduce operating cost.
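The configs exercised in this PR can be summarized from the conversation above (SpinQuant, QLoRA, and the original BF16 checkpoint all pass). The key names below are illustrative assumptions, not the actual workflow inputs:

```python
# Benchmark config matrix for the Hugging Face llama3.2 1b entry, inferred
# from the conversation (SpinQuant, QLoRA, original BF16). Keys and values
# are illustrative; the real workflow inputs may be named differently.
LLAMA32_1B_CONFIGS = [
    {"model": "llama3.2-1b", "quantization": "spinquant"},
    {"model": "llama3.2-1b", "quantization": "qlora"},
    {"model": "llama3.2-1b", "quantization": None, "dtype": "bf16"},
]

def config_names(configs):
    """Human-readable label for each benchmark configuration."""
    return [c.get("quantization") or c.get("dtype", "default") for c in configs]
```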