Compilable code #74

Closed
Debdeep1998 opened this issue Nov 26, 2022 · 2 comments
Comments

@Debdeep1998

Debdeep1998 commented Nov 26, 2022

I've fine-tuned CodeT5 large on a small Python dataset (~1700 data points). The results are more or less correct, but the code is not always compilable (due to inconsistent spacing and newline characters). Any idea how to fix this? Also, how does CodeBLEU work if the code generated by the model isn't compilable? The model might generate non-compilable code during the initial phases of training, right?

@yuewang-cuhk
Contributor

Hi there, we cannot guarantee that the generated code is compilable or well formatted, as we directly use the code files without normalization or refactoring for pretraining. You might consider adding a post-processing step to reformat the code generated by our models.
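
For illustration only, here is a minimal sketch of what such a post-processing step could look like for Python output. It is not part of the CodeT5 codebase; the function names `is_compilable`, `normalize_whitespace`, and `postprocess` are hypothetical, and the 4-space tab width is an assumption. It only applies conservative whitespace fixes and then reports whether the result parses, so it will not repair code whose indentation structure is actually wrong.

```python
import ast
import textwrap
from typing import Tuple


def is_compilable(code: str) -> bool:
    """Return True if `code` parses as Python source (nothing is executed)."""
    try:
        ast.parse(code)
        return True
    except (SyntaxError, ValueError):
        # IndentationError and TabError are subclasses of SyntaxError.
        return False


def normalize_whitespace(code: str, tab_width: int = 4) -> str:
    """Apply simple whitespace fixes (hypothetical helper).

    Caveat: expanding tabs also affects tab characters inside string literals.
    """
    # Unify line endings to "\n".
    code = code.replace("\r\n", "\n").replace("\r", "\n")
    # Replace tabs with spaces so indentation is not mixed.
    code = code.expandtabs(tab_width)
    # Remove indentation that is common to every line.
    code = textwrap.dedent(code)
    # Strip trailing whitespace, then leading/trailing blank lines.
    lines = [line.rstrip() for line in code.split("\n")]
    return "\n".join(lines).strip("\n") + "\n"


def postprocess(code: str) -> Tuple[str, bool]:
    """Return the cleaned code and a flag saying whether it parses."""
    cleaned = normalize_whitespace(code)
    return cleaned, is_compilable(cleaned)


if __name__ == "__main__":
    raw = "\t\tdef add(a, b):\r\n\t\t\treturn a + b   \r\n"
    cleaned, ok = postprocess(raw)
    print(ok)
    print(cleaned)
```

Samples that parse after this step could then be passed to a formatter such as black or autopep8 for consistent style; samples that still fail to parse could be counted or filtered out during evaluation.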

@Debdeep1998
Author

Hi, thanks. Can you point us to the post-processing steps that we might need to adopt?
