Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Gemini Code Assist summary: This pull request enhances the evaluation framework for the Llama 3.1-8B model by integrating dedicated accuracy testing capabilities. It updates existing benchmark configurations to align with `.jsonl` dataset formats, fine-tunes generation parameters for consistency, and refines runtime and client settings to improve the reliability and efficiency of both performance and accuracy measurements. The changes are designed to provide a more robust system for assessing the model's summarization quality using ROUGE metrics.
Pull request overview
This PR adds configuration for running accuracy benchmarks on the Llama 3.1-8B model using the CNN/DailyMail dataset. The changes update the example configurations to support accuracy evaluation with ROUGE scoring and adjust various runtime parameters for proper accuracy measurement.
Changes:
- Updated model and dataset configurations to support accuracy evaluation with ROUGE metrics
- Changed temperature from 1.0 to 0.0 for deterministic generation during accuracy runs
- Modified dataset file format from `.json` to `.jsonl` across all configurations
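The `.json` → `.jsonl` switch means each record is serialized as one JSON object per line rather than as a single JSON array. A minimal sketch of the conversion (the helper name and record fields here are hypothetical, not taken from `download_cnndm.py`):

```python
import json

def json_array_to_jsonl(records):
    """Serialize a list of records as JSONL: one JSON object per line.

    Illustrative only -- the actual download script's logic may differ.
    """
    return "\n".join(json.dumps(r) for r in records) + "\n"

rows = [{"input": "article text", "output": "summary"},
        {"input": "another article", "output": "its summary"}]
print(json_array_to_jsonl(rows), end="")  # each dict becomes its own line
```

JSONL is convenient for large datasets because consumers can stream records line by line instead of parsing one large array into memory.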
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| examples/05_Llama3.1-8B_Example/online_llama3_8b_cnn.yaml | Added accuracy dataset configuration with ROUGE evaluation, updated load pattern to concurrency-based, and changed file format to jsonl |
| examples/05_Llama3.1-8B_Example/offline_llama3_8b_cnn.yaml | Updated to use jsonl format, adjusted runtime durations, and increased worker count |
| examples/05_Llama3.1-8B_Example/download_cnndm.py | Changed output file extension from .json to .jsonl |
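For context on the ROUGE evaluation these configs enable: ROUGE-1 is the unigram-overlap F1 between a generated summary and the reference. A toy pure-Python version (real accuracy runs would use a proper ROUGE implementation with tokenization and stemming; this is only illustrative of what the metric measures):

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Toy ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the cat sat on the mat", "the cat lay on the mat"), 3))  # -> 0.833
```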
Code Review
This pull request adds configuration for Llama3.1-8B accuracy runs. The changes are generally good, updating file formats to JSONL, adjusting model parameters for deterministic output, and setting up datasets for accuracy and performance tests. My review focuses on improving the portability and clarity of the example configuration files. I've suggested making the model path more generic and refactoring the dataset configuration to remove redundancy and use a more descriptive name.
Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
(Two review threads on src/inference_endpoint/dataset_manager/predefined/cnndailymail/__init__.py were resolved; one was marked outdated.)
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
```python
    force_regenerate: bool = False,
) -> "Dataset":
    transforms = (transforms or []) + cls.PRESETS.llama3()
    df = cls.generate(force=force_regenerate, datasets_dir=datasets_dir)
    return cls(df, transforms=transforms, repeats=num_repeats)
```
The transforms parameter is being overridden by appending preset transforms, which prevents callers from fully controlling the transform pipeline. Consider either applying presets first (so caller transforms can override) or making the preset application optional via a parameter.
Suggested change:

```python
    apply_presets: bool = True,
    force_regenerate: bool = False,
) -> "Dataset":
    if apply_presets:
        final_transforms: list[Transform] = (transforms or []) + cls.PRESETS.llama3()
    else:
        final_transforms = transforms or []
    df = cls.generate(force=force_regenerate, datasets_dir=datasets_dir)
    return cls(df, transforms=final_transforms, repeats=num_repeats)
```
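The ordering concern behind this suggestion can be seen with a toy pipeline (stand-in callables, not the repo's actual `Transform` type): when transforms conflict, whichever runs last wins, so appending presets after the caller's transforms lets the preset silently override them.

```python
PRESET = [str.lower]   # stand-in for cls.PRESETS.llama3()
caller = [str.upper]   # a caller-supplied transform

def apply(text, transforms):
    # run each transform in order, feeding each output to the next
    for t in transforms:
        text = t(text)
    return text

print(apply("Hello", caller + PRESET))  # caller first, preset last -> "hello"
print(apply("Hello", PRESET + caller))  # preset first, caller last -> "HELLO"
```

This is why applying presets first (or gating them behind a flag) gives callers final say over the pipeline.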
…/__init__.py Sanitize logger Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
```python
) -> "Dataset":
    transforms = cls.PRESETS.llama3() + (transforms or [])
```
The preset transforms are hardcoded to llama3(), but the function accepts a transforms parameter that could conflict with or duplicate the preset transforms. Consider either making the preset configurable or documenting that custom transforms will be appended after the llama3 preset.
Suggested change:

```python
    preset_transforms: list[Transform] | None = None,
) -> "Dataset":
    # By default, use the llama3 preset and append any user-provided transforms.
    # Callers can override or disable presets by passing `preset_transforms`.
    if transforms is None:
        transforms = []
    if preset_transforms is None:
        preset_transforms = cls.PRESETS.llama3()
    transforms = preset_transforms + transforms
```
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.
@nvzhihanj @arekay-nv please review when you get the chance. thx
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.
What does this PR do?
Adds configuration for the Llama 3.1-8B accuracy run.
Requires #97
ROUGE scores:
Type of change
Related issues
Resolves #56
Testing
Checklist