Working draft for CodeTFv3 data model #1012

drdavella · 2025-03-03T18:24:06Z

Overview

Introduces a new fix-oriented CodeTF data model as the implementation for v3.

sonarqubecloud · 2025-03-03T18:24:40Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

drdavella · 2025-03-03T18:26:57Z

src/codemodder/codetf/v3/codetf.py

+
+class AIMetadata(BaseModel):
+    provider: Optional[str] = None
+    models: Optional[list[str]] = None


Multiple models may be involved in fix generation.
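To show why `models` is a list rather than a single string, here is a minimal sketch. It uses a stdlib `dataclass` as a stand-in for the pydantic `BaseModel` in the diff, and the provider and model names are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AIMetadata:
    """Stdlib stand-in for the pydantic AIMetadata model in the diff."""

    provider: Optional[str] = None
    # A list, because several models may contribute to a single fix
    models: Optional[list[str]] = None


# Hypothetical example: one model drafts the fix, another reviews it
meta = AIMetadata(provider="example-provider", models=["drafting-model", "review-model"])
print(meta.models)  # prints ['drafting-model', 'review-model']
```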

drdavella · 2025-03-03T18:27:26Z

src/codemodder/codetf/v3/codetf.py

+    fixStatus: FixStatus
+    changeSets: list[ChangeSet]
+    fixMetadata: Optional[FixMetadata] = None
+    fixQuality: Optional[FixQuality] = None


Fix quality evaluation belongs to the entire result, not just to a single file/diff.
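A minimal sketch of that placement, again using stdlib dataclasses in place of pydantic. `ChangeSet` is simplified to a path and a diff, and the single `score` field on `FixQuality` is a hypothetical placeholder, since its contents are not shown in this thread.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChangeSet:
    """Simplified per-file change; the real model carries more detail."""

    path: str
    diff: str


@dataclass
class FixQuality:
    """Hypothetical single-score quality evaluation."""

    score: float


@dataclass
class FixResult:
    # A fix may touch several files...
    changeSets: list[ChangeSet] = field(default_factory=list)
    # ...but it is evaluated once, as a whole
    fixQuality: Optional[FixQuality] = None


result = FixResult(
    changeSets=[
        ChangeSet(path="app/views.py", diff="(diff omitted)"),
        ChangeSet(path="app/utils.py", diff="(diff omitted)"),
    ],
    fixQuality=FixQuality(score=0.9),
)
```

Two changed files still yield exactly one quality evaluation, which is the point of moving `fixQuality` off the per-file level.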

drdavella · 2025-03-03T18:27:52Z

src/codemodder/codetf/v3/codetf.py

+class FixResult(BaseModel):
+    """Result corresponding to a single finding"""
+
+    finding: Finding


There is now a strict 1:1 mapping between finding and fix.
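One way a consumer might check the 1:1 mapping is sketched below. It assumes, hypothetically, that each `Finding` carries a unique `id`; `duplicated_findings` is an illustrative helper, not part of the draft.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    id: str  # hypothetical: assumes each finding carries a unique id


@dataclass
class FixResult:
    finding: Finding


def duplicated_findings(results: list[FixResult]) -> set[str]:
    """Return ids of findings that appear in more than one FixResult."""
    counts = Counter(r.finding.id for r in results)
    return {finding_id for finding_id, n in counts.items() if n > 1}


results = [FixResult(Finding("f1")), FixResult(Finding("f2")), FixResult(Finding("f1"))]
print(duplicated_findings(results))  # prints {'f1'}
```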

drdavella · 2025-03-03T18:28:44Z

src/codemodder/codetf/v3/codetf.py

+    fixMetadata: Optional[FixMetadata] = None
+    fixQuality: Optional[FixQuality] = None
+    # A description of the reasoning process that led to the fix
+    reasoningSteps: Optional[list[str]] = None


Potentially each entry here should be something richer than a flat string: we might want to incorporate steps that represent user input/feedback.

I could quibble that we have no idea what form that would take, or whether we should design for anything beyond a str now, but if you have short-term thoughts that this is a promising data type, I'm okay with it.
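For discussion, here is one possible shape for a richer entry. `ReasoningStep`, `StepSource`, and `from_flat` are all hypothetical names; nothing in the draft defines them. The sketch also shows that the current `list[str]` form can be upgraded without loss later, so starting with plain strings need not block this.

```python
from dataclasses import dataclass
from enum import Enum


class StepSource(Enum):
    # Hypothetical sources; nothing in the draft defines these
    MODEL = "model"
    USER_FEEDBACK = "user_feedback"


@dataclass
class ReasoningStep:
    source: StepSource
    text: str


def from_flat(steps: list[str]) -> list[ReasoningStep]:
    """Upgrade the draft's list[str] form: treat every flat string as a model step."""
    return [ReasoningStep(StepSource.MODEL, s) for s in steps]


steps = from_flat(["Identify tainted input", "Apply parameterized query"])
steps.append(ReasoningStep(StepSource.USER_FEEDBACK, "Prefer the ORM helper"))
```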

drdavella · 2025-03-03T18:29:38Z

src/codemodder/codetf/v3/codetf.py

+
+class CodeTF(CodeTFWriter, BaseModel):
+    run: Run
+    results: list[FixResult]


My expectation is that each finding in the input will have a corresponding result. I don't believe that's enforceable in the spec, and it probably should not be since we want to allow for the possibility of streaming subsets of results.

> I don't believe that's enforceable in the spec, and it probably should not be since we want to allow for the possibility of streaming subsets of results.

I'm not sure why these requirements are at odds. Aren't we only giving it 1 input as well? We had discussed as an input a single SARIF, and a single result ID (or something). In this case, we'd expect 0 or 1 FixResult.

> Aren't we only giving it 1 input as well? We had discussed as an input a single SARIF, and a single result ID (or something).

As a practical matter, I'm still not convinced that's the best approach, although we can work toward that setup. In any case, I don't think we want to preclude the possibility of batch processing.

> In this case, we'd expect 0 or 1 FixResult.

I would argue that the absence of any FixResult should be an exceptional condition, and we should consider making FixResult required. Because a FixResult encodes fix status, we should return one with a failure status if processing fails or is skipped.
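A minimal sketch of the "every finding gets a result" proposal, covering both the single-finding and batch cases. The `FixStatus` members and the `finding_id` field are hypothetical, since the draft's enum values are not shown in this thread, and `complete_results` is an illustrative helper. It would only apply to a complete run, not to a streamed subset of results.

```python
from dataclasses import dataclass
from enum import Enum


class FixStatus(Enum):
    # Hypothetical values; the draft's FixStatus members are not shown here
    FIXED = "fixed"
    SKIPPED = "skipped"
    FAILED = "failed"


@dataclass
class FixResult:
    finding_id: str
    fixStatus: FixStatus


def complete_results(finding_ids: list[str], results: list[FixResult]) -> list[FixResult]:
    """Ensure each input finding has a result, marking unprocessed ones as FAILED."""
    seen = {r.finding_id for r in results}
    missing = [FixResult(fid, FixStatus.FAILED) for fid in finding_ids if fid not in seen]
    return results + missing


# Batch of two findings where only one was processed
out = complete_results(["f1", "f2"], [FixResult("f1", FixStatus.FIXED)])
```

With a single input finding the output always has length one, so "0 or 1 FixResult" collapses to exactly one.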

drdavella · 2025-03-03T18:30:39Z

src/codemodder/codetf/v3/codetf.py

+
+class PackageAction(BaseModel):
+    action: Action
+    result: PackageResult


I'm not sure I ever fully understood the rationale for the result field: it might be worth discussing whether it's still necessary.

I think associating a change with a dependency lets downstream consumers tell users to add a dependency themselves (for example, if adding it failed), or warn them that the change increases reliance on a dependency.
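To make that use case concrete, here is a sketch of how a consumer could read the `result` field. The enum members and the `package` field are assumptions modeled on what we believe the v2 types contain, and `packages_to_add_manually` is a hypothetical helper.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Assumed members
    ADD = "add"
    REMOVE = "remove"


class PackageResult(Enum):
    # Assumed members
    COMPLETED = "completed"
    FAILED = "failed"
    SKIPPED = "skipped"


@dataclass
class PackageAction:
    action: Action
    result: PackageResult
    package: str  # assumed field naming the dependency


def packages_to_add_manually(actions: list[PackageAction]) -> list[str]:
    """Dependencies the fix needs but did not add itself (failed or skipped)."""
    return [
        a.package
        for a in actions
        if a.action is Action.ADD and a.result is not PackageResult.COMPLETED
    ]


actions = [
    PackageAction(Action.ADD, PackageResult.COMPLETED, "pkg-a"),
    PackageAction(Action.ADD, PackageResult.FAILED, "pkg-b"),
]
```

Without `result`, a consumer could not distinguish `pkg-a` (already handled) from `pkg-b` (needs user action).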

drdavella · 2025-03-06T15:23:56Z

I am merging this because it is still in draft status and does not affect any production codepaths. However, making it available will be useful for prototyping new features and functionality.

drdavella added 2 commits March 3, 2025 10:11

Move current CodeTF implementation to v2

3c77c77

Add initial draft of CodeTF v3 model

155b1f1

drdavella requested review from andrecsilva and clavedeluna as code owners March 3, 2025 18:24

drdavella requested a review from nahsra March 3, 2025 18:26

drdavella commented Mar 3, 2025


clavedeluna approved these changes Mar 3, 2025


drdavella added this pull request to the merge queue Mar 6, 2025

Merged via the queue into main with commit 160ada0 Mar 6, 2025
15 checks passed

drdavella deleted the codetf-v3-working-draft branch March 6, 2025 15:33

3 participants