Questions about the inputs for getting embeddings #204

rongqipan · 2022-12-25T22:21:03Z

Hi,

Thanks for your work.
I tried to use CodeBERT, GraphCodeBERT and UniXcoder to extract Java code embeddings.
As input to the models, I used only the Java source code, in the form [CLS][JavaCode][SEP].

Should I also add comments to the inputs?
For GraphCodeBERT and UniXcoder, should I also add the dataflow and the flattened AST as input? I care about the execution time of the approach, so would adding that information (comments, dataflow and AST) make computing the embeddings much slower?

I would appreciate your kind suggestions,

Thanks.

guoday · 2023-01-09T11:31:37Z

It's better to include comments.
You don't need to add dataflow or the flattened AST as input; the original code is enough. If you want to extract code embeddings, I suggest UniXcoder, which performed better in my tests on most datasets.
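
For readers who want to try the plain [CLS][JavaCode][SEP] setup described above, here is a minimal sketch rather than the maintainers' own script. It assumes the Hugging Face `transformers` and `torch` packages and the `microsoft/codebert-base` checkpoint; the function names `build_input_tokens` and `embed_code`, the 512-token limit, and the choice of the [CLS] vector as the embedding are all illustrative assumptions. UniXcoder may need its own loading code, which is not shown here.

```python
from typing import List

# Assumed limit: RoBERTa-style encoders such as CodeBERT accept at most 512 tokens.
MAX_LENGTH = 512


def build_input_tokens(
    code_tokens: List[str],
    cls_token: str = "<s>",
    sep_token: str = "</s>",
    max_length: int = MAX_LENGTH,
) -> List[str]:
    """Lay the sequence out as [CLS] code [SEP], truncating the code to fit."""
    return [cls_token] + code_tokens[: max_length - 2] + [sep_token]


def embed_code(source: str, model_name: str = "microsoft/codebert-base"):
    """Return one vector for `source` (hypothetical helper, not an official API)."""
    # Heavy dependencies are imported here so build_input_tokens works without them.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    # The source is passed as written, comments included; no dataflow or AST is added.
    tokens = build_input_tokens(
        tokenizer.tokenize(source), tokenizer.cls_token, tokenizer.sep_token
    )
    input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

    with torch.no_grad():
        hidden = model(input_ids).last_hidden_state  # (1, seq_len, hidden_size)

    # Assumption: use the vector at the [CLS] position as the code embedding.
    return hidden[0, 0]
```

Calling `embed_code("// adds two ints\nint add(int a, int b) { return a + b; }")` downloads the checkpoint on first use and returns a one-dimensional tensor. Because the input is truncated to a fixed maximum length, keeping comments changes the token count only up to that cap.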

rongqipan · 2023-01-13T19:11:08Z

Thanks for your reply and kind suggestions : )

celbree closed this as completed Feb 20, 2023
