Feature/code eval metrics #29

yisz · 2024-02-16T02:06:16Z

Deterministic Code Eval metrics

string match (Exact and Fuzzy match score)
Python AST similarity

	🚀 This PR description was created by Ellipsis for commit `967273e`.

Summary:

This PR introduces CodeStringMatch and PythonASTSimilarity metrics for code evaluation, includes their tests, updates the relevant documentation, and makes unrelated changes to docs/README.md.

Key points:

Introduced CodeStringMatch and PythonASTSimilarity metrics in continuous_eval/metrics/code_deterministic_metrics.py.
Imported these metrics in continuous_eval/metrics/__init__.py.
Added tests for these metrics in tests/code_metrics_test.py.
Updated documentation in /docs/src/content/docs/metrics/Code/Deterministic/python_ast_similarity.md and /docs/src/content/docs/metrics/Code/Deterministic/string_match.md.
Made unrelated changes to docs/README.md.

Generated with ❤️ by ellipsis.dev

ellipsis-dev

Changes requested. Reviewed entire PR up to commit 665be35

Reviewed 394 lines of code across 4 files in 1 minute(s) and 10 second(s).

See details

Skipped files: 2 (please contact us to request support for these files): poetry.lock, pyproject.toml
Confidence threshold: 50%
Drafted 0 additional comments.
Workflow ID: wflow_SZUmwxIbI5H7NJ0C

Something look wrong? You can customize Ellipsis by editing the ellipsis.yaml for this repository.

Generated with ❤️ by ellipsis.dev

ellipsis-dev · 2024-02-16T02:07:36Z

continuous_eval/metrics/code_deterministic_metrics.py

+    Modifications: Adjusted to be used in the context of generated code evaluation
+    '''
+
+    def _compare_ASTs(ast_a: AST, ast_b: AST, reorder_depth: int) -> int:


The docstring for the _compare_ASTs method says it returns a boolean, but the method actually returns an integer. Please update the docstring to reflect the actual return type.

@ellipsis-dev can you fix this docstrings for me?

@yisz, I have addressed your comments in pull request #30

ellipsis-dev · 2024-02-16T02:07:36Z

continuous_eval/metrics/code_deterministic_metrics.py

+        """
+        return len(list(ast.walk(root)))
+
+    def _apply_weights_to_subtrees(weight: float, subtree: AST) -> float:


The weights applied in the _apply_weights_to_subtrees method are hardcoded and not explained. Consider defining these weights as constants at the top of the file, with comments explaining their purpose and how they were determined.

ellipsis-dev · 2024-02-16T02:07:36Z

continuous_eval/metrics/code_deterministic_metrics.py

+        ]
+
+        similarity_scores = [
+            PythonASTSimilarity._compare_subtrees(answer_subtree, ground_truth_subtree, 1000)[0]


The reorder_depth parameter is hardcoded as 1000 in the calculate method. This could potentially lead to performance issues for large ASTs. Consider making this a configurable parameter.

…_mj7A [Ellipsis] Change return type of `_compare_ASTs` function in docstring

ellipsis-dev

Changes requested. Incremental review on commit 24518d0

Reviewed 401 lines of code across 3 files in 3 minute(s) and 1 second(s).

See details

Skipped files: 0
Confidence threshold: 50%
Drafted 2 additional comments.
Workflow ID: wflow_Kw60igPcehGZ3deG

View 2 draft comments

These comments were drafted by Ellipsis, but were filtered out of the final review. They're included here so you can see our internal thought process and help you configure your ellipsis.yaml.

Drafted 2 additional comments

Comment at `continuous_eval/metrics/code_deterministic_metrics.py:304`

Consider logging the SyntaxError message for easier debugging when parsing the answer and ground truth code.

Comment at `continuous_eval/metrics/code_deterministic_metrics.py:292`

Consider making the reorder_depth in the _compare_many method configurable through method parameters or class attributes for greater flexibility.

Something look wrong? You can customize Ellipsis by editing the ellipsis.yaml for this repository.

Generated with ❤️ by ellipsis.dev

ellipsis-dev · 2024-02-16T21:14:25Z

continuous_eval/metrics/code_deterministic_metrics.py

+class PythonASTSimilarity(Metric):
+    """
+    The following functions are adapted from python-ast-comparison by Pedro Salazar Paredes
+    Copyright (c) 2023 Pedro Salazar Paredes


The copyright year in the docstring seems to be incorrect. It is mentioned as 2023, which is a future year. Please correct it.

ellipsis-dev · 2024-02-16T21:14:25Z

continuous_eval/metrics/code_deterministic_metrics.py

+            answer_tree = ast.parse(answer, mode="exec")
+            ground_truth_trees = [ast.parse(gt, mode="exec") for gt in ground_truths]
+        except SyntaxError as e:
+            return {"Python_AST_Similarity": -1.0}


Returning a Python_AST_Similarity score of -1.0 when a SyntaxError is caught could be misleading. Consider returning None or raising an exception to indicate that an error occurred.

ellipsis-dev · 2024-02-16T21:14:25Z

continuous_eval/metrics/code_deterministic_metrics.py

+        """
+        return len(list(ast.walk(root)))
+
+    def _apply_weights_to_subtrees(self, weight: float, subtree: ast.AST) -> float:


Consider making the weights applied to the subtrees in the _apply_weights_to_subtrees method configurable through method parameters or class attributes for greater flexibility.

ellipsis-dev

Looks good to me! Incremental review on commit 0ab1174

Reviewed 121 lines of code across 3 files in 1 minute(s) and 38 second(s).

See details

Skipped files: 1 (please contact us to request support for these files): docs/astro.config.mjs
Confidence threshold: 50%
Drafted 3 additional comments.
Workflow ID: wflow_2wBmE1swc6G3R4Zo

View 3 draft comments

These comments were drafted by Ellipsis, but were filtered out of the final review. They're included here so you can see our internal thought process and help you configure your ellipsis.yaml.

Drafted 3 comments under confidence threshold

Filtered comment at `/docs/src/content/docs/metrics/Code/Deterministic/python_ast_similarity.md:11`

Confidence changes required: 50%

Commentary: The documentation for the new metrics seems to be well written and clear. The examples provided are helpful and the definitions are concise. However, I would like to see the actual implementation of these metrics to ensure they are correctly implemented and match the descriptions provided in the documentation.

The documentation is clear and concise. Please ensure the implementation matches the descriptions provided here.

Filtered comment at `/docs/src/content/docs/metrics/Code/Deterministic/string_match.md:9`

Confidence changes required: 50%

Commentary: The documentation for the StringMatch metric is also clear and concise. The examples provided are helpful and the definitions are concise. However, I would like to see the actual implementation of these metrics to ensure they are correctly implemented and match the descriptions provided in the documentation.

The documentation is clear and concise. Please ensure the implementation matches the descriptions provided here.

Filtered comment at `/docs/src/content/docs/metrics/Code/LLM-Based/llm-based.md:9`

Confidence changes required: 33%

Commentary: The LLM-based Code Metrics documentation does not provide any details about the metrics. It only mentions that they are coming soon. This is not an issue, but it would be helpful to have a brief description of what these metrics will measure and how they will be used.

Consider providing a brief description of what the LLM-Based Code Evaluation Metrics will measure and how they will be used, even if they are not yet implemented.

Something look wrong? You can customize Ellipsis by editing the ellipsis.yaml for this repository.

Generated with ❤️ by ellipsis.dev

ellipsis-dev

Looks good to me! Incremental review on commit 515d087

Reviewed 54 lines of code across 1 files in 1 minute(s) and 19 second(s).

See details

Skipped files: 0
Confidence threshold: 50%
Drafted 1 additional comments.
Workflow ID: wflow_RjiYXYv77PQIjqqx

View 1 draft comments

These comments were drafted by Ellipsis, but were filtered out of the final review. They're included here so you can see our internal thought process and help you configure your ellipsis.yaml.

Drafted 1 additional comments

Comment at `docs/README.md:1`

This change removes a lot of information about the project structure. Consider keeping this information or ensuring it's available elsewhere in the documentation.

Something look wrong? You can customize Ellipsis by editing the ellipsis.yaml for this repository.

Generated with ❤️ by ellipsis.dev

ellipsis-dev

Looks good to me! Incremental review on commit 967273e

Reviewed 606 lines of code across 8 files in 1 minute(s) and 25 second(s).

See details

Skipped files: 3 (please contact us to request support for these files): docs/astro.config.mjs, poetry.lock, pyproject.toml
Confidence threshold: 50%
Drafted 1 additional comments.
Workflow ID: wflow_5bztvcJN8lNxyPn6

View 1 draft comments

These comments were drafted by Ellipsis, but were filtered out of the final review. They're included here so you can see our internal thought process and help you configure your ellipsis.yaml.

Drafted 1 additional comments

Comment at `continuous_eval/metrics/init.py:33`

It's a good practice to end a file with a newline. Please add a newline at the end of this file.

Something look wrong? You can customize Ellipsis by editing the ellipsis.yaml for this repository.

Generated with ❤️ by ellipsis.dev

yisz added 2 commits February 15, 2024 17:34

code deterministic metrics

54de5d3

added code metric pytest

665be35

yisz requested a review from pantonante February 16, 2024 02:06

ellipsis-dev bot reviewed Feb 16, 2024

View reviewed changes

address comments left by @yisz on #29 (Feature/code eval metrics);

1eb8ba1

ellipsis-dev bot mentioned this pull request Feb 16, 2024

[Ellipsis] Change return type of _compare_ASTs function #30

Merged

yisz and others added 2 commits February 15, 2024 21:16

Merge pull request #30 from relari-ai/ellipsis/wflow_YiIQGNULMvuMTwsb…

0a44fdd

…_mj7A [Ellipsis] Change return type of `_compare_ASTs` function in docstring

Small fix and more tests

24518d0

ellipsis-dev bot reviewed Feb 16, 2024

View reviewed changes

update docs for code metrics

0ab1174

ellipsis-dev bot reviewed Feb 16, 2024

View reviewed changes

Remove startlight info in readme

515d087

ellipsis-dev bot reviewed Feb 16, 2024

View reviewed changes

yisz merged commit 967273e into main Feb 16, 2024
2 checks passed

ellipsis-dev bot reviewed Feb 16, 2024

View reviewed changes

pantonante deleted the feature/code-eval-metrics branch February 19, 2024 01:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/code eval metrics #29

Feature/code eval metrics #29

yisz commented Feb 16, 2024 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

ellipsis-dev bot Feb 16, 2024

yisz Feb 16, 2024

ellipsis-dev bot Feb 16, 2024

ellipsis-dev bot Feb 16, 2024

ellipsis-dev bot Feb 16, 2024

ellipsis-dev bot left a comment

ellipsis-dev bot Feb 16, 2024

ellipsis-dev bot Feb 16, 2024

ellipsis-dev bot Feb 16, 2024

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

Feature/code eval metrics #29

Feature/code eval metrics #29

Conversation

yisz commented Feb 16, 2024 • edited by ellipsis-dev bot Loading

Summary:

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot Feb 16, 2024

Choose a reason for hiding this comment

yisz Feb 16, 2024

Choose a reason for hiding this comment

ellipsis-dev bot Feb 16, 2024

Choose a reason for hiding this comment

ellipsis-dev bot Feb 16, 2024

Choose a reason for hiding this comment

ellipsis-dev bot Feb 16, 2024

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Drafted 2 additional comments

Comment at continuous_eval/metrics/code_deterministic_metrics.py:304

Comment at continuous_eval/metrics/code_deterministic_metrics.py:292

ellipsis-dev bot Feb 16, 2024

Choose a reason for hiding this comment

ellipsis-dev bot Feb 16, 2024

Choose a reason for hiding this comment

ellipsis-dev bot Feb 16, 2024

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Drafted 3 comments under confidence threshold

Filtered comment at /docs/src/content/docs/metrics/Code/Deterministic/python_ast_similarity.md:11

Filtered comment at /docs/src/content/docs/metrics/Code/Deterministic/string_match.md:9

Filtered comment at /docs/src/content/docs/metrics/Code/LLM-Based/llm-based.md:9

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Drafted 1 additional comments

Comment at docs/README.md:1

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Drafted 1 additional comments

Comment at continuous_eval/metrics/__init__.py:33

yisz commented Feb 16, 2024 •

edited by ellipsis-dev bot

Loading

Comment at `continuous_eval/metrics/code_deterministic_metrics.py:304`

Comment at `continuous_eval/metrics/code_deterministic_metrics.py:292`

Filtered comment at `/docs/src/content/docs/metrics/Code/Deterministic/python_ast_similarity.md:11`

Filtered comment at `/docs/src/content/docs/metrics/Code/Deterministic/string_match.md:9`

Filtered comment at `/docs/src/content/docs/metrics/Code/LLM-Based/llm-based.md:9`

Comment at `docs/README.md:1`

Comment at `continuous_eval/metrics/init.py:33`