Issues related to experimentation results #8
Hi, I guess there is a problem in the data preprocessing stage.
Thanks for your response and clarification. I created the dataset by extracting code methods along with their associated comments; the comments serve as each code method's natural language description (NLD). I then randomly distributed all the (code method, NLD) pairs into training, validation, and testing sets. Finally, I got train.txt, valid.txt, and test.txt files in the format described previously:
I don't have an idea about positive and negative sampling. How should I split my dataset into positive and negative samples, and what is the purpose of this kind of sampling? I also could not understand this step in the paper. Do I need to randomly assign labels 1 and 0 to each instance and make sure that the training, validation, and testing sets each contain balanced positive and negative samples? Secondly, I have created test data in the same format as the training and validation sets. Will that work, or do I need to perform some other steps? Please let me know your advice and guidance.
In this fine-tuning step, we learn representations of code and natural language (NL) through a binary classification task. So the dataset should contain some positive examples (code and NL from the same instance) and some negative examples (code and NL from different instances). The dataset you created contains only positive examples (each instance is denoted as (c, w)). We can randomly replace the code or the NL to construct negative examples. In our setting, the negative samples consist of a balanced number of instances with randomly replaced NL (i.e. (c, ŵ)) and randomly replaced code (i.e. (ĉ, w)). If you create test data in the same format as the training and validation sets, you can only get the classification accuracy. Maybe that is enough for you. We need to calculate MRR to be consistent with the baselines, so we created a test set in accordance with the baselines. You can decide for yourself whether to keep the data format consistent.
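The construction described above can be sketched in a few lines. This is an illustration of the idea, not code from the repository; the function name and half-and-half split are my own, and it assumes all input pairs are distinct positives:

```python
import random

def build_classification_examples(pairs, seed=0):
    """Given positive (code, nl) pairs, build a balanced binary-classification set.

    Returns (code, nl, label) triples: the originals labelled 1, plus an equal
    number of negatives labelled 0, made by swapping in the NL (c, w_hat) or
    the code (c_hat, w) from a randomly chosen *other* instance.
    """
    rng = random.Random(seed)
    examples = [(c, w, 1) for c, w in pairs]           # positives: (c, w)
    for i, (c, w) in enumerate(pairs):
        j = rng.choice([k for k in range(len(pairs)) if k != i])
        if i % 2 == 0:                                  # half: replaced NL -> (c, w_hat)
            examples.append((c, pairs[j][1], 0))
        else:                                           # half: replaced code -> (c_hat, w)
            examples.append((pairs[j][0], w, 0))
    rng.shuffle(examples)
    return examples
```

The same procedure, applied independently to the training and validation splits, yields the balanced label distribution mentioned in the reply.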
Thanks for the clarification.
It creates a text file of around 2 GB named batch_0.txt. I am confused why so many instances from the test_0.jsonl file end up in the batch_0.txt file?
Can you please clarify all the above points and let me know your feedback?
Thanks for the clarification and guidance:
However, when I change it to the following, CodeBERT successfully fine-tunes on the multi-class classification task:
Please let me know your advice and guidance.
Thanks for the clarification. Can you please let me know about the following concerns:
```python
from transformers import RobertaTokenizer, RobertaForMaskedLM, pipeline

model = RobertaForMaskedLM.from_pretrained("microsoft/codebert-base-mlm")
tokenizer = RobertaTokenizer.from_pretrained("microsoft/codebert-base-mlm")

NL = "Calculates the maximum timeGradient of all Terminations. Not supported timeGradients (-1.0) are ignored."
PL = "@Override public double calculatePhaseTimeGradient(AbstractPhaseScope phaseScope) { double timeGradient = 0.0; for (Termination termination : terminationList) { double nextTimeGradient = termination.calculatePhaseTimeGradient(phaseScope); if (nextTimeGradient >= 0.0) { timeGradient = Math.<mask>(timeGradient, nextTimeGradient); } } return timeGradient; }"
CODE = NL + " " + PL

fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)
outputs = fill_mask(CODE)
print(outputs)
```

Outputs:

```
{'sequence': '<s> Calculates the maximum timeGradient of all Terminations. Not supported timeGradients (-1.0) are ignored. @Override public double calculatePhaseTimeGradient(AbstractPhaseScope phaseScope) { double timeGradient = 0.0; for (Termination termination : terminationList) { double nextTimeGradient = termination.calculatePhaseTimeGradient(phaseScope); if (nextTimeGradient >= 0.0) { timeGradient = Math.max(timeGradient, nextTimeGradient); } } return timeGradient; }</s>', 'score': 0.9246102571487427, 'token': 29459}
{'sequence': '<s> Calculates the maximum timeGradient of all Terminations. Not supported timeGradients (-1.0) are ignored. @Override public double calculatePhaseTimeGradient(AbstractPhaseScope phaseScope) { double timeGradient = 0.0; for (Termination termination : terminationList) { double nextTimeGradient = termination.calculatePhaseTimeGradient(phaseScope); if (nextTimeGradient >= 0.0) { timeGradient = Math. max(timeGradient, nextTimeGradient); } } return timeGradient; }</s>', 'score': 0.035343579947948456, 'token': 19220}
{'sequence': '<s> Calculates the maximum timeGradient of all Terminations. Not supported timeGradients (-1.0) are ignored. @Override public double calculatePhaseTimeGradient(AbstractPhaseScope phaseScope) { double timeGradient = 0.0; for (Termination termination : terminationList) { double nextTimeGradient = termination.calculatePhaseTimeGradient(phaseScope); if (nextTimeGradient >= 0.0) { timeGradient = Math.Max(timeGradient, nextTimeGradient); } } return timeGradient; }</s>', 'score': 0.013716962188482285, 'token': 19854}
{'sequence': '<s> Calculates the maximum timeGradient of all Terminations. Not supported timeGradients (-1.0) are ignored. @Override public double calculatePhaseTimeGradient(AbstractPhaseScope phaseScope) { double timeGradient = 0.0; for (Termination termination : terminationList) { double nextTimeGradient = termination.calculatePhaseTimeGradient(phaseScope); if (nextTimeGradient >= 0.0) { timeGradient = Math.min(timeGradient, nextTimeGradient); } } return timeGradient; }</s>', 'score': 0.009721478447318077, 'token': 4691}
{'sequence': '<s> Calculates the maximum timeGradient of all Terminations. Not supported timeGradients (-1.0) are ignored. @Override public double calculatePhaseTimeGradient(AbstractPhaseScope phaseScope) { double timeGradient = 0.0; for (Termination termination : terminationList) { double nextTimeGradient = termination.calculatePhaseTimeGradient(phaseScope); if (nextTimeGradient >= 0.0) { timeGradient = Math.MAX(timeGradient, nextTimeGradient); } } return timeGradient; }</s>', 'score': 0.005634027533233166, 'token': 30187}
```
Thanks for the clarification and kind cooperation. Can you please let me know about the following concerns?
Hi,
I have constructed a new dataset [train.txt, test.txt, valid.txt] with the following format:
`1<CODESPLIT>URL<CODESPLIT>returnType.methodName<CODESPLIT>[docString]<CODESPLIT>[code]`
I have placed constant values such as "1", "URL", and "returnType.methodName" for the whole dataset.
When I run the following script, I get results such as [acc = 1.0, acc_and_f1 = 1.0, and f1 = 1.0]:
Following are the learning rate and loss graphs:
However, when I run the following two scripts, I get an MRR of 0.0031. I am not sure why. Why is the MRR value so low?
```shell
python CodeBERT/codesearch/mrr.py
```
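For context on why an MRR near 0.003 is suspicious: MRR averages the reciprocal rank of the correct candidate over all queries, and purely random scoring over roughly a thousand candidates per query already yields an MRR on the order of 10⁻³, so a value like 0.0031 suggests the scores are not matching queries to their true candidates. A minimal illustration of the metric itself (this is my sketch, not the repository's mrr.py; it assumes the ground-truth candidate's score is stored at index 0 of each query's score list):

```python
def mean_reciprocal_rank(batched_scores):
    """batched_scores: one list of candidate scores per query, with the
    ground-truth candidate's score at index 0 by convention.

    The correct candidate's rank is 1 plus the number of distractors that
    score at least as high; MRR is the mean of 1/rank over all queries.
    """
    total = 0.0
    for scores in batched_scores:
        correct = scores[0]
        rank = 1 + sum(1 for s in scores[1:] if s >= correct)
        total += 1.0 / rank
    return total / len(batched_scores)
```

With a perfect ranker every query contributes 1/1 and MRR is 1.0; with the correct candidate consistently buried deep in the candidate list, each query contributes a tiny reciprocal rank and the mean collapses toward zero.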
Secondly, does Table 2 in the paper represent MRR values generated from the above scripts?
Finally, what is the difference between the jsonl and text file data formats? I guess the jsonl files are used in the documentation generation experiments? For this purpose, I constructed jsonl files containing the same data in jsonl format as follows, where only code_tokens and docstring_tokens contain the token lists of the code snippet and the natural language description. Is this the right approach?
`{"repo": "", "path": "", "func_name": "", "original_string": "", "language": "lang", "code": "", "code_tokens": [], "docstring": "", "docstring_tokens": [], "sha": "", "url": "", "partition": ""}
Kindly, let me know about my concerns.
`
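Filling that schema from an existing (docstring, code) pair could look like the sketch below. This is an illustration under my own assumptions, not code from the repository: the function name is hypothetical, metadata fields I have no values for are left empty as in the question, and the tokenization is naive whitespace splitting rather than the language-aware tokenizers the published datasets use:

```python
import json

def to_jsonl_line(code, docstring, language="java", partition="test"):
    """Serialize one (code, docstring) pair into a CodeSearchNet-style record.

    code_tokens/docstring_tokens use naive whitespace splitting here, which is
    only a stand-in for proper language-aware tokenization.
    """
    record = {
        "repo": "", "path": "", "func_name": "",
        "original_string": code,
        "language": language,
        "code": code,
        "code_tokens": code.split(),
        "docstring": docstring,
        "docstring_tokens": docstring.split(),
        "sha": "", "url": "", "partition": partition,
    }
    return json.dumps(record)
```

Writing one such line per pair to a file produces a jsonl file in the format shown above.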