Inference for java code summarization #10

Closed
lyriccoder opened this issue Oct 1, 2021 · 4 comments

Comments

@lyriccoder

lyriccoder commented Oct 1, 2021

Is it possible to run code summarization on raw Java code?

I can't find an inference example for code summarization. Could you please provide one?
For example, I would expect code like the following:

from transformers import RobertaTokenizer,  WHICH_MODELTO_USE

tokenizer = RobertaTokenizer.from_pretrained('Salesforce/codet5-base')
model = WHICH_MODELTO_USE.from_pretrained('Salesforce/codet5-base')

java_code = 'int i = 0; ++i;  int b = runSomeFunction(i); extract(b);'
code_summarization = model.predict(java_code)
print(code_summarization)

The expected result is the following:
'Extracts and returns max value'

Is such a prediction possible? What I can't understand is how the code is converted into the vector that is used to predict the summary, without any pretraining procedure.

Could you please provide an example?

@mosh98

mosh98 commented Oct 4, 2021

Hi,
I modified a small package to work with CodeT5. You can try it out here: https://github.com/mosh98/simpleT5

If you want to look at the prediction example, you can also find the snippet here:
https://github.com/mosh98/simpleT5/blob/40bd043ab9d83122db2c55385f469c00b23f2aff/simplet5/simplet5.py#L410

Hope it helps

@lyriccoder
Author

Thank you for your work. Is there a pre-trained model for Java code summarization? Unfortunately, my network doesn't allow downloading anything from within Python (from_pretrained fails because of the network settings).

Could you please provide an already fine-tuned model for Java code summarization (on Google Drive or something similar)? I tried to run fine-tuning, but it reported that even 30 GB of video memory is not enough (70 GB is required).

@mosh98

mosh98 commented Oct 5, 2021 via email

@yuewang-cuhk
Contributor

Hi there, please refer to our newly released multi-lingual CodeT5-base model (codet5-base-multi-sum) fine-tuned for code summarization, which also achieves SOTA performance for Java.
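
For readers looking for the inference snippet requested at the top of this thread, here is a minimal sketch using that checkpoint. It assumes the checkpoint is loaded with `RobertaTokenizer` and `T5ForConditionalGeneration` from `transformers`, and that PyTorch is installed. The helper names `load_model` and `summarize`, the `offline` flag, and the length limits are illustrative choices, not part of the CodeT5 repository. Passing a local directory path together with `offline=True` is one possible way to work around a network that blocks downloads, assuming the checkpoint files were copied to that directory beforehand.

```python
from typing import Any, Tuple

# Checkpoint name taken from the maintainer's reply above.
DEFAULT_CHECKPOINT = "Salesforce/codet5-base-multi-sum"


def load_model(name_or_path: str = DEFAULT_CHECKPOINT,
               offline: bool = False) -> Tuple[Any, Any]:
    """Load tokenizer and model from the Hub or from a local directory.

    `name_or_path` and `offline` are illustrative parameter names. With
    offline=True, transformers only reads files that are already on disk.
    """
    # Imported lazily so that `summarize` can be used without importing
    # transformers at module load time.
    from transformers import RobertaTokenizer, T5ForConditionalGeneration

    tokenizer = RobertaTokenizer.from_pretrained(
        name_or_path, local_files_only=offline)
    model = T5ForConditionalGeneration.from_pretrained(
        name_or_path, local_files_only=offline)
    model.eval()  # inference only, no fine-tuning needed
    return tokenizer, model


def summarize(code: str, tokenizer: Any, model: Any,
              max_length: int = 20) -> str:
    """Generate a short natural-language summary for a code snippet."""
    # The tokenizer turns the raw source text into a tensor of token ids;
    # this is the "code to vector" step asked about in the question.
    input_ids = tokenizer(code, return_tensors="pt",
                          truncation=True, max_length=512).input_ids
    # The encoder-decoder model generates summary token ids from those ids.
    generated_ids = model.generate(input_ids, max_length=max_length)
    # Decode the generated ids back into text.
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)
```

Calling `tokenizer, model = load_model()` followed by `print(summarize(java_code, tokenizer, model))` would then print a generated summary for the Java snippet. The exact wording of the summary depends on the checkpoint and the generation settings, so it is not shown here.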
