Aime seq #90
Merged
Commits (78)
bc915db  readme updates
e290211  fix links
72b9b9e  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
3a20480  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
f58154e  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
601c074  enable passing args and kwrgs from data loader to model
efde053  make model args consistent
9b9c5c0  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
e3a295c  enable chat mode inference
ef3238f  linting
2d6f72b  tmp sample usage
7c63409  merge main
2282fe1  fix failing tests
cdc87bf  abort chat after first failure
83445bc  fix datetime decoding issue
19c57bb  formatting
ce1b2fe  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
b2a8376  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
b2391c1  Merge branch 'main' into dl_updates
645eefa  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
08c4fbe  Merge branch 'main' into aime-seq
a90494b  added prompt templates for sequential scenario
b622c03  makes model output col configurable
5df66a2  seq user conf
db64ccc  fix obv issues
b03cc11  cleanup
89daef5  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
4669530  flow fix
77ebe46  prompt fixes
410cfe2  rm redundant line
d9988df  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
12dffdf  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
d832380  merge conflict
2e60522  union all iter results
da05363  cleanup
18c589f  emptiness
111ea88  typo fix
2db7e46  aime extractor fix
7053f65  formatting
173538a  resolve concurrency issues
d77dfad  formatting
382f452  thread safety
8fa0931  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
03c848c  merge with main
5c790b1  bug fixes
2c45d8b  revert to single model uinstance
b88b67f  merge conflict res
37f2f6f  aggregation bug fix
a738891  bug fix
4123c89  col name simplification
e81748a  dedup subset
1b6af70  formatting
536b6f7  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
10b0aba  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
f9c2a48  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
e2947ba  allow sys msg as part of model config
57504cb  merge
f676e6e  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
9efef0d  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
269d296  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
fd84d48  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
81504c2  pull
3a1cd28  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
ca1bfea  merge with main
f758195  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
3d53322  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
f9b2046  merge with main
b4b56da  merge with main
ec366ed  merge with main
559d495  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
4f5e487  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
f71926e  Merge branch 'main' of https://github.com/microsoft/eureka-ml-insights
a54fe84  merge main
f7c1c5b  sync with latest aime
4b95930  aime seq test
10b14bb  new chat test model
d5a2e2c  made EndpointModel importable
71fc25c  revert
eureka_ml_insights/prompt_templates/aime_templates/hint_creation.jinja (new file, 8 additions, 0 deletions):

```jinja
- You are a teacher providing hints to guide a student.
- The answer the student gave is INCORRECT. FIRST, list all the incorrect numerical answers so far in a enumeration like list.
- Reflect verbally on what likely went wrong and offer a hint to the student, but not the solution.
<question>{{ prompt }}</question>

<studentanswer and past teacher hints>
previous_messages[1:]
</studentanswer and past teacher hints>"
```
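Note that only `{{ prompt }}` is a Jinja variable here; `previous_messages[1:]` appears as literal text, and the actual chat history reaches the teacher model separately through the `previous_messages` column of its data loader (see the pipeline config below). A minimal rendering sketch, assuming Jinja2 and a hypothetical problem string:

```python
# Hypothetical rendering of hint_creation.jinja with Jinja2 (values are illustrative).
from jinja2 import Template

source = open(
    "eureka_ml_insights/prompt_templates/aime_templates/hint_creation.jinja"
).read()
teacher_prompt = Template(source).render(prompt="Find the remainder when 9! is divided by 11.")
print(teacher_prompt)  # only the <question> tag is filled in; the rest renders verbatim
```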
eureka_ml_insights/prompt_templates/aime_templates/prompt_w_hint.jinja (new file, 1 addition, 0 deletions):

```jinja
That's incorrect. You have made {{attempt_id}} attempts, and they were all wrong. Here are some thoughts\n {{teacher_hint}}.\n\n Try harder, you can do it!
```
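Here `{{attempt_id}}` is supplied by the `AddColumnAndData("attempt_id", i)` transform and `{{teacher_hint}}` by renaming the teacher's `model_output` column, both visible in the pipeline config below. A minimal rendering sketch, again assuming Jinja2 and illustrative values:

```python
# Hypothetical rendering of prompt_w_hint.jinja (values are illustrative).
from jinja2 import Template

source = open(
    "eureka_ml_insights/prompt_templates/aime_templates/prompt_w_hint.jinja"
).read()
retry_prompt = Template(source).render(
    attempt_id=2,
    teacher_hint="Recheck the modular arithmetic in your last step.",
)
print(retry_prompt)
```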
New pipeline configuration module (new file, 231 additions, 0 deletions):

```python
import os
from typing import Any

from eureka_ml_insights.configs import (
    DataProcessingConfig,
    DataSetConfig,
    DataUnionConfig,
    InferenceConfig,
    ModelConfig,
    PipelineConfig,
    PromptProcessingConfig,
)
from eureka_ml_insights.core import (
    DataProcessing,
    DataUnion,
    Inference,
    PromptProcessing,
)
from eureka_ml_insights.data_utils import (
    AddColumnAndData,
    ColumnRename,
    CopyColumn,
    DataReader,
    RunPythonTransform,
    SamplerTransform,
    SequenceTransform,
)
from eureka_ml_insights.data_utils.aime_utils import AIMEExtractAnswer
from eureka_ml_insights.data_utils.data import MMDataLoader
from eureka_ml_insights.metrics.metrics_base import (
    ExactMatch,
    MetricBasedVerifier,
)

from .aime import AIME_PIPELINE

DEFAULT_N_ITER = 3
RESULT_COLS = [
    "attempt_id",
    "model_output",
    "uid",
    "prompt",
    "ground_truth",
    "Year",
    "Part",
    "ID",
    "extracted_answer",
    "verification_result",
    "usage",
]
resume_from_dict = {}


class AIME_SEQ_PIPELINE(AIME_PIPELINE):
    """This class specifies the config for running the AIME benchmark sequentially on any model."""

    def configure_pipeline(
        self, model_config: ModelConfig, resume_from: str = None, **kwargs: dict[str, Any]
    ) -> PipelineConfig:
        # This call to super() configures the initial prompt processing and final eval
        # reporting components, which are reused here.
        super().configure_pipeline(model_config, resume_from, **kwargs)

        n_iter = kwargs.get("n_iter", DEFAULT_N_ITER)
        # Uncomment to sample a subset of the data for debugging:
        # self.data_processing_comp.data_reader_config.init_args["transform"].transforms.append(
        #     SamplerTransform(sample_count=2, random_seed=42)
        # )
        component_configs = [self.data_processing_comp]
        for i in range(1, n_iter + 1):
            # Student inference component; reads prompts from the last prompt processing component.
            last_prompt_proc_comp = component_configs[-1]
            self.student_inference_comp = InferenceConfig(
                component_type=Inference,
                model_config=model_config,
                data_loader_config=DataSetConfig(
                    MMDataLoader,
                    {
                        "path": os.path.join(last_prompt_proc_comp.output_dir, "transformed_data.jsonl"),
                        # After the first iteration, the previous messages must be passed
                        # through the data loader config.
                        "misc_columns": ["previous_messages"] if i > 1 else None,
                    },
                ),
                output_dir=os.path.join(self.log_dir, f"student_inference_result_{i}"),
                resume_from=resume_from_dict.get(i, None),
                chat_mode=True,
            )
            component_configs.append(self.student_inference_comp)

            # Answer extraction and metric-based verification.
            self.verification_comp = DataProcessingConfig(
                component_type=DataProcessing,
                data_reader_config=DataSetConfig(
                    DataReader,
                    {
                        "path": os.path.join(self.student_inference_comp.output_dir, "inference_result.jsonl"),
                        "format": ".jsonl",
                        "transform": SequenceTransform(
                            [
                                # Extract and verify the student answer.
                                AIMEExtractAnswer("model_output", "extracted_answer"),
                                MetricBasedVerifier(ExactMatch, "extracted_answer"),
                                AddColumnAndData("attempt_id", i),
                                CopyColumn(column_name_src="model_output", column_name_dst="student_output"),
                            ]
                        ),
                    },
                ),
                output_dir=os.path.join(self.log_dir, f"verification_{i}"),
            )
            component_configs.append(self.verification_comp)

            # last_agg_dir maintains a link to the most recent inference results to be used for
            # evaluation. It is updated each iteration to point to the union of results from
            # all iterations so far.
            if i > 1:
                self.last_inference_result_join_comp = DataUnionConfig(
                    component_type=DataUnion,
                    data_reader_config=DataSetConfig(
                        DataReader,
                        {
                            "path": os.path.join(self.verification_comp.output_dir, "transformed_data.jsonl"),
                            "format": ".jsonl",
                        },
                    ),
                    other_data_reader_config=DataSetConfig(
                        DataReader,
                        {
                            "path": os.path.join(last_agg_dir, "transformed_data.jsonl"),
                            "format": ".jsonl",
                        },
                    ),
                    output_data_columns=RESULT_COLS,
                    dedupe_cols=["ID", "attempt_id"],
                    output_dir=os.path.join(self.log_dir, f"last_inference_result_join_{i}"),
                )
                last_agg_dir = self.last_inference_result_join_comp.output_dir
                component_configs.append(self.last_inference_result_join_comp)
            else:
                last_agg_dir = self.verification_comp.output_dir

            # Filter out the rows with a correct answer; only incorrect rows continue.
            self.filtering_comp = DataProcessingConfig(
                component_type=DataProcessing,
                data_reader_config=DataSetConfig(
                    DataReader,
                    {
                        "path": os.path.join(self.verification_comp.output_dir, "transformed_data.jsonl"),
                        "format": ".jsonl",
                        "transform": RunPythonTransform(python_code="df = df[df['verification_result'] != 'correct']"),
                    },
                ),
                output_dir=os.path.join(self.log_dir, f"filtering_{i}"),
            )
            component_configs.append(self.filtering_comp)

            # Create a new prompt asking the teacher model to provide hints.
            self.hint_processing_comp = PromptProcessingConfig(
                component_type=PromptProcessing,
                data_reader_config=DataSetConfig(
                    DataReader,
                    {
                        "path": os.path.join(self.filtering_comp.output_dir, "transformed_data.jsonl"),
                        "format": ".jsonl",
                    },
                ),
                prompt_template_path=os.path.join(
                    os.path.dirname(__file__), "../prompt_templates/aime_templates/hint_creation.jinja"
                ),
                output_dir=os.path.join(self.log_dir, f"hint_processing_output_{i}"),
            )
            component_configs.append(self.hint_processing_comp)

            # Inference component that asks the teacher model to provide hints.
            self.teacher_inference_comp = InferenceConfig(
                component_type=Inference,
                model_config=model_config,
                data_loader_config=DataSetConfig(
                    MMDataLoader,
                    {
                        "path": os.path.join(self.hint_processing_comp.output_dir, "transformed_data.jsonl"),
                        "misc_columns": ["previous_messages"],
                    },
                ),
                output_dir=os.path.join(self.log_dir, f"teacher_inference_result_{i}"),
                max_concurrent=10,
                chat_mode=False,
            )
            component_configs.append(self.teacher_inference_comp)

            # Prompt processing that asks the student to try again, with the teacher's hint.
            self.prompt_processing_with_hint = PromptProcessingConfig(
                component_type=PromptProcessing,
                data_reader_config=DataSetConfig(
                    DataReader,
                    {
                        "path": os.path.join(self.teacher_inference_comp.output_dir, "inference_result.jsonl"),
                        "format": ".jsonl",
                        "transform": ColumnRename(name_mapping={"model_output": "teacher_hint"}),
                    },
                ),
                prompt_template_path=os.path.join(
                    os.path.dirname(__file__), "../prompt_templates/aime_templates/prompt_w_hint.jinja"
                ),
                output_dir=os.path.join(self.log_dir, f"teacher_hint_prompt_{i}"),
            )
            component_configs.append(self.prompt_processing_with_hint)

        # Pass the combined results from all iterations to the eval reporting component.
        self.final_preeval_data_processing.data_reader_config.init_args["path"] = os.path.join(
            last_agg_dir, "transformed_data.jsonl"
        )

        component_configs.extend(
            [
                self.final_preeval_data_processing,
                self.evalreporting_comp,
                self.data_post_processing_addmv,
                self.mv_evalreporting_comp,
                self.posteval_data_post_processing_comp,
                self.bon_evalreporting_comp,
                self.won_evalreporting_comp,
            ]
        )

        # Configure the pipeline.
        return PipelineConfig(
            component_configs,
            self.log_dir,
        )
```
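Stripped of the config plumbing, the loop above wires a sequential teacher-student retry protocol: the student answers, the answer is verified, correct rows drop out, and the remaining rows get a teacher hint and a retry prompt. Below is a self-contained sketch of that control flow; the model and extractor are stubs passed in by the caller, and none of these names are the repo's API:

```python
# Illustrative sketch of the sequential retry flow; not the repo's API.
from typing import Callable


def sequential_retry(
    problem: str,
    ground_truth: str,
    ask_model: Callable[[list], str],      # stub standing in for the Inference component
    extract_answer: Callable[[str], str],  # stub standing in for AIMEExtractAnswer
    n_iter: int = 3,
) -> list:
    """Run up to n_iter student attempts, inserting a teacher hint after each failure."""
    messages = [{"role": "user", "content": problem}]
    results = []
    for attempt_id in range(1, n_iter + 1):
        student_output = ask_model(messages)
        extracted = extract_answer(student_output)
        verdict = "correct" if extracted == ground_truth else "incorrect"
        # One row per attempt, as in the union of per-iteration verification outputs.
        results.append(
            {"attempt_id": attempt_id, "extracted_answer": extracted, "verification_result": verdict}
        )
        if verdict == "correct":
            break  # the filtering component drops correct rows from later iterations
        messages.append({"role": "assistant", "content": student_output})
        # Teacher reviews the history and produces a hint (role of hint_creation.jinja).
        hint = ask_model(
            [{"role": "user",
              "content": f"<question>{problem}</question> The student's answers so far were "
                         "wrong. Offer a hint, not the solution."}]
        )
        # Student is asked to retry with the hint (role of prompt_w_hint.jinja).
        messages.append(
            {"role": "user",
             "content": f"That's incorrect. You have made {attempt_id} attempts. "
                        f"Here are some thoughts\n {hint}.\n\n Try harder, you can do it!"}
        )
    return results
```

In the real pipeline, each of these steps is a separate on-disk component (JSONL in, JSONL out), and the DataUnion component concatenates the per-attempt rows across iterations before they reach eval reporting.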