Introduce an AST-differ that also gives metrics #80

Open · 5 tasks
zimmski opened this issue Apr 28, 2024 · 3 comments
Labels: enhancement (New feature or request)

zimmski (Member) commented Apr 28, 2024

The following Java test outputs are equally good:

```java
package com.eval;

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;

class PlainTest {

    @Test
    void testPlain() {
        assertDoesNotThrow(() -> Plain.plain());
    }
}
```

```java
package com.eval;

import static org.junit.jupiter.api.Assertions.*;

import org.junit.jupiter.api.Test;

class PlainTest {

    @Test
    void testPlain() {
        Plain.plain();
    }
}
```

This is not:

```java
package com.eval;

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

class PlainTest {

    @Test
    void testPlain() {
        Plain.plain();
        assertTrue(true);
    }
}
```

This absolutely is not:
```java
package com.eval;

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

class PlainTest {

    @Test
    void plainTest() {
        Plain.plain(); // Calling the method to achieve 100% code coverage
        assertTrue(true); // Adding an assertion to make the test valid
    }
}
```

We can diff this code on the AST level (see the sketch after the list below). The formatting is something we don't care about, so if the ASTs are practically the same, we can say the outputs are equal.

  • We want to compare ASTs and build a corpus for every file in our test cases so we can compare easily.
  • We want to add new comparisons easily and rescore the whole evaluation, e.g. adding a check for X should give all LLMs a better score when they have X.
  • With that we can also identify whether only comments got added.
  • Sidenote: assertTrue(true) can be found with a linter.
  • Doing the comparisons also showed that an interactive mode for comparing results would be nice, e.g. I say I want to look at model X with language Y, then the interactive mode gives me the logs and I say "add to corpus" or "next".
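
A minimal sketch of the AST-level comparison, assuming the JavaParser library (the class and method names `ASTDiff`/`structurallyEqual` are illustrative, not part of this project, and whether comment-only changes are ignored depends on the parser configuration):

```java
import com.github.javaparser.ParserConfiguration;
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;

public class ASTDiff {
    // Parse both sources and compare the resulting trees. Whitespace,
    // indentation and blank lines are not part of the AST, so two files
    // that differ only in formatting compare as equal here.
    static boolean structurallyEqual(String sourceA, String sourceB) {
        // Configure the parser not to attach comments to nodes, so
        // comment-only changes should not show up as AST differences.
        StaticJavaParser.setConfiguration(new ParserConfiguration().setAttributeComments(false));

        CompilationUnit a = StaticJavaParser.parse(sourceA);
        CompilationUnit b = StaticJavaParser.parse(sourceB);
        return a.equals(b); // JavaParser compares ASTs node by node.
    }
}
```

A boolean equality check like this only covers the formatting-invariance part; the metrics asked for above would additionally need a real tree diff (e.g. counting inserted, deleted, and changed nodes) and equivalent parsers for the other evaluated languages.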
@zimmski zimmski added the enhancement New feature or request label Apr 28, 2024
@zimmski zimmski added this to the v0.5.0 milestone Apr 28, 2024
zimmski (Member, Author) commented Apr 28, 2024

@bauersimon thoughts?

bauersimon (Member) commented

Related to #44.

bauersimon (Member) commented

Not 100% sure what the "corpus" is... basically the perfect solution?

@bauersimon bauersimon modified the milestones: v0.5.0, v0.6.0 Jun 4, 2024