
can I get embedding for java code snippet? #164

Closed
ramsey-coding opened this issue Aug 20, 2022 · 6 comments


@ramsey-coding

I am looking at the following example for extracting a code embedding.

# Encode maximum function
func = "def f(a,b): if a>b: return a else return b"
tokens_ids = model.tokenize([func],max_length=512,mode="<encoder-only>")
source_ids = torch.tensor(tokens_ids).to(device)
tokens_embeddings,max_func_embedding = model(source_ids)

So I suppose I can get the embedding for a Python function from max_func_embedding.
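For context, here is a minimal sketch of what the second return value appears to represent. As far as I can tell from unixcoder.py, the function-level embedding is the average of the token embeddings over non-padding positions; treat that as an assumption and check the source before relying on it. The function name masked_mean_pool, the toy vectors, and the padding id below are hypothetical and only illustrate the pooling step in plain Python.

```python
from typing import List, Sequence


def masked_mean_pool(token_embeddings: Sequence[Sequence[float]],
                     token_ids: Sequence[int],
                     pad_id: int) -> List[float]:
    """Average the embeddings of all non-padding tokens in one sequence."""
    kept = [vec for vec, tok in zip(token_embeddings, token_ids) if tok != pad_id]
    if not kept:
        raise ValueError("sequence contains only padding tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy example: three tokens with 2-dimensional embeddings; the last one is padding.
toy_token_embeddings = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
toy_token_ids = [5, 6, 1]  # 1 plays the role of the padding id in this toy case
toy_sentence_embedding = masked_mean_pool(toy_token_embeddings, toy_token_ids, pad_id=1)
print(toy_sentence_embedding)  # [2.0, 3.0]
```

The padding vector is ignored, so the result depends only on the real tokens, which is why one vector per input string comes back regardless of its length.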

However, I have the following three questions:

a) Can I use CodeBERT to extract an embedding for Java code?

b) Can I feed incomplete JavaScript code to extract an embedding, or does the code snippet need to be complete?

And most importantly:
c) Can I feed multiple functions and get an embedding for the whole snippet?

Let's say the code snippet has two functions and is slightly incomplete:

testCreateProcessDefinitionQuery ( ) { 
org . foxbpm . engine . repository . ProcessDefinitionQuery processQuery = modelService . createProcessDefinitionQuery ( ) ; "<AssertPlaceHolder>" ; 
} 

createProcessDefinitionQuery ( ) { 
return new org . foxbpm . engine . impl . model . ProcessDefinitionQueryImpl ( commandExecutor ) ; 
}

Will UniXcoder even generate an embedding for this case?

@ramsey-coding
Author

@guoday please help 🙏

@guoday
Contributor

guoday commented Aug 21, 2022

a) You can. Please follow the CodeBERT README, but the embeddings are not good.
b) You can feed incomplete JavaScript code.
c) You can try, but I am not sure how well the embeddings will perform, since we only used single functions as pre-training data.
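Since pre-training used single functions, one workaround to experiment with is splitting the snippet into its functions, embedding each one separately, and averaging the results. The sketch below is only an illustration of that idea, not something recommended in this thread: split_top_level_blocks, mean_vector, and embed_per_function are hypothetical helpers, the brace counting is deliberately naive, and averaging is an assumed way to combine the vectors. The embed argument stands in for whatever model call you use.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def split_top_level_blocks(snippet: str) -> List[str]:
    """Split a snippet into chunks that each end at a top-level closing brace.

    Naive brace counting: braces inside string literals or comments are not
    handled, and an unterminated trailing chunk is kept as-is.
    """
    chunks, depth, start = [], 0, 0
    for i, ch in enumerate(snippet):
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                chunks.append(snippet[start:i + 1].strip())
                start = i + 1
    tail = snippet[start:].strip()
    if tail:
        chunks.append(tail)
    return chunks


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def embed_per_function(snippet: str, embed: Callable[[str], Vector]) -> Vector:
    """Embed each top-level block separately, then average the results."""
    return mean_vector([embed(chunk) for chunk in split_top_level_blocks(snippet)])


snippet = """testCreateProcessDefinitionQuery ( ) {
org . foxbpm . engine . repository . ProcessDefinitionQuery processQuery = modelService . createProcessDefinitionQuery ( ) ; "<AssertPlaceHolder>" ;
}

createProcessDefinitionQuery ( ) {
return new org . foxbpm . engine . impl . model . ProcessDefinitionQueryImpl ( commandExecutor ) ;
}"""

functions = split_top_level_blocks(snippet)
print(len(functions))  # 2
```

With a real model, embed could wrap the tokenize and forward calls from the example at the top of this issue and return the function-level vector as a list of floats. Whether the averaged vector works better than embedding the whole snippet at once is something you would need to measure on your own task.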

@ramsey-coding
Author

@guoday What would your answer be if I tried this with UniXcoder? Really looking forward to your response.

@guoday
Contributor

guoday commented Aug 21, 2022

You can try to use UniXcoder.

@ramsey-coding
Author

@guoday Is UniXcoder a better fit for feeding multiple functions and getting an embedding for the whole snippet? Example below:

testCreateProcessDefinitionQuery ( ) { 
org . foxbpm . engine . repository . ProcessDefinitionQuery processQuery = modelService . createProcessDefinitionQuery ( ) ; "<AssertPlaceHolder>" ; 
} 

createProcessDefinitionQuery ( ) { 
return new org . foxbpm . engine . impl . model . ProcessDefinitionQueryImpl ( commandExecutor ) ; 
}

@guoday
Contributor

guoday commented Aug 21, 2022

yes
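A minimal sketch of what feeding the whole two-function snippet to UniXcoder might look like, reusing the tokenize and forward calls from the example quoted at the top of this issue. It assumes unixcoder.py from the repository is on the import path and that torch and transformers are installed; load_unixcoder and embed_whole_snippet are hypothetical wrapper names, and the "microsoft/unixcoder-base" checkpoint is an assumed choice. Nothing here says anything about embedding quality for multi-function or incomplete input.

```python
JAVA_SNIPPET = """testCreateProcessDefinitionQuery ( ) {
org . foxbpm . engine . repository . ProcessDefinitionQuery processQuery = modelService . createProcessDefinitionQuery ( ) ; "<AssertPlaceHolder>" ;
}

createProcessDefinitionQuery ( ) {
return new org . foxbpm . engine . impl . model . ProcessDefinitionQueryImpl ( commandExecutor ) ;
}"""


def load_unixcoder(device: str = "cpu"):
    """Load UniXcoder; assumes unixcoder.py from the repository is importable."""
    # Lazy import: requires the repository file plus torch and transformers.
    from unixcoder import UniXcoder
    model = UniXcoder("microsoft/unixcoder-base")
    model.to(device)
    model.eval()
    return model


def embed_whole_snippet(code: str, model, device: str = "cpu", max_length: int = 512):
    """Return one embedding tensor for the entire snippet, treated as a single input."""
    import torch  # lazy import so the helpers can be defined without torch installed
    # Inputs longer than max_length tokens are truncated by tokenize.
    tokens_ids = model.tokenize([code], max_length=max_length, mode="<encoder-only>")
    source_ids = torch.tensor(tokens_ids).to(device)
    with torch.no_grad():
        _, snippet_embedding = model(source_ids)
    return snippet_embedding


# Usage (downloads the checkpoint on first call):
#   model = load_unixcoder()
#   embedding = embed_whole_snippet(JAVA_SNIPPET, model)
```

The model does not parse the code, so a snippet with two functions or a missing class wrapper is tokenized like any other string; only the truncation limit constrains what can be fed in.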

@celbree celbree closed this as completed Nov 11, 2022