Introduce an AST-differ that also gives metrics #80

Open · 5 tasks
zimmski opened this issue Apr 28, 2024 · 3 comments
Labels: enhancement (New feature or request)

zimmski (Member) commented Apr 28, 2024

The following Java test outputs are equally good:

```java
package com.eval;

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;

class PlainTest {

    @Test
    void testPlain() {
        assertDoesNotThrow(() -> Plain.plain());
    }
}
```

```java
package com.eval;

import static org.junit.jupiter.api.Assertions.*;

import org.junit.jupiter.api.Test;

class PlainTest {

    @Test
    void testPlain() {
        Plain.plain();
    }
}
```

This is not:

```java
package com.eval;

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

class PlainTest {

    @Test
    void testPlain() {
        Plain.plain();
        assertTrue(true);
    }
}
```

This absolutely is not:
```java
package com.eval;

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

class PlainTest {

    @Test
    void plainTest() {
        Plain.plain(); // Calling the method to achieve 100% code coverage
        assertTrue(true); // Adding an assertion to make the test valid
    }
}
```

We can diff this code on the AST level (see the sketch after the list below). The formatting is something we don't care about, so if the ASTs are practically the same, we can say the outputs are equal.

  • We want to compare ASTs and build a corpus for every file in our test cases so we can compare easily.
  • We want to add new comparisons easily and rescore the whole evaluation, e.g. adding a check for X should give all LLMs a better score when they have X.
  • With that we can also identify whether only comments got added.
  • Sidenote: assertTrue(true) can be found with a linter.
  • Doing the comparisons also showed that an interactive mode for comparing results would be nice, e.g. I say I want to look at model X with language Y, then the interactive mode gives me the logs and I say "add to corpus" or "next".
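
A minimal sketch of the AST-level comparison, assuming the JavaParser library (the class and method names `ASTDiff`/`structurallyEqual` are illustrative, not part of this project, and whether comment-only changes are ignored depends on the parser configuration):

```java
import com.github.javaparser.ParserConfiguration;
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;

public class ASTDiff {
    // Parse both sources and compare the resulting trees. Whitespace,
    // indentation and blank lines are not part of the AST, so two files
    // that differ only in formatting compare as equal here.
    static boolean structurallyEqual(String sourceA, String sourceB) {
        // Configure the parser not to attach comments to nodes, so
        // comment-only changes should not show up as AST differences.
        StaticJavaParser.setConfiguration(new ParserConfiguration().setAttributeComments(false));

        CompilationUnit a = StaticJavaParser.parse(sourceA);
        CompilationUnit b = StaticJavaParser.parse(sourceB);
        return a.equals(b); // JavaParser compares ASTs node by node.
    }
}
```

A boolean equality check like this only covers the formatting-invariance part; the metrics asked for above would additionally need a real tree diff (e.g. counting inserted, deleted, and changed nodes) and equivalent parsers for the other evaluated languages.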
@zimmski zimmski added the enhancement New feature or request label Apr 28, 2024
@zimmski zimmski added this to the v0.5.0 milestone Apr 28, 2024
zimmski (Member, Author) commented Apr 28, 2024

@bauersimon thoughts?

bauersimon (Member) commented

Related to #44.

bauersimon (Member) commented

Not 100% sure what the "corpus" is... basically the perfect solution?

@bauersimon bauersimon modified the milestones: v0.5.0, v0.6.0 Jun 4, 2024