Compilable code #74

Debdeep1998 · 2022-11-26T06:08:55Z

I've fine-tuned CodeT5-large on a small Python dataset (~1,700 data points). The results are more or less correct, but the code is not always compilable (due to inconsistent spacing and newline characters). Any idea how to fix this? Also, how does CodeBLEU work if the code generated by the model isn't compilable? The model might generate non-compilable code during the initial phases of training, right?

yuewang-cuhk · 2022-12-21T02:51:13Z

Hi there, we cannot guarantee that the generated code is compilable or well formatted, as we directly use the code files for pretraining without normalization or refactoring. You might consider adding a post-processing step to reformat the code generated by our models.
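
As a rough illustration of what such a post-processing step could look like for Python output, here is a minimal sketch using only the standard library. It is not part of CodeT5; the function names (`normalize_whitespace`, `is_compilable`, `postprocess`) and the 4-space tab size are assumptions made for this example. It normalizes line endings, tabs, and trailing whitespace, then uses `ast.parse` to check whether the result is syntactically valid.

```python
import ast
import textwrap


def normalize_whitespace(source: str, tab_size: int = 4) -> str:
    """Hypothetical helper: clean up common whitespace noise in generated code."""
    # Unify Windows/old-Mac line endings to "\n".
    text = source.replace("\r\n", "\n").replace("\r", "\n")
    # Expand tabs to spaces and drop trailing whitespace on every line.
    lines = [line.expandtabs(tab_size).rstrip() for line in text.split("\n")]
    # Remove indentation shared by all lines, plus leading/trailing blank lines.
    text = textwrap.dedent("\n".join(lines)).strip("\n")
    return text + "\n"


def is_compilable(source: str) -> bool:
    """Return True if the source parses as Python (syntax check only)."""
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        # IndentationError is a subclass of SyntaxError.
        return False


def postprocess(generated: str):
    """Normalize the generated code and report whether it parses."""
    cleaned = normalize_whitespace(generated)
    return cleaned, is_compilable(cleaned)


if __name__ == "__main__":
    raw = "\tdef add(a, b):\r\n\t\treturn a + b   \r\n"
    cleaned, ok = postprocess(raw)
    print(cleaned, ok)
```

This only repairs uniform whitespace problems; it cannot recover indentation that is genuinely wrong. Samples that still fail the `ast.parse` check could be filtered out or regenerated, and samples that pass could optionally be run through a code formatter afterwards.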

Debdeep1998 · 2022-12-22T03:11:05Z

Hi, thanks. Could you point us to the post-processing steps we would need to adopt?

yuewang-cuhk closed this as completed Dec 21, 2022
