
Model Inferencing #117

Closed
lokesh-ixo opened this issue Apr 12, 2022 · 11 comments
@lokesh-ixo

Hi, I'm using clone detection in GraphCodeBERT and I'm trying to run inference from the saved checkpoint. I load the saved model from a folder called "roberta" that contains all the prerequisites I got from the Hugging Face transformers hub. I run inference with the command given for inference:

!python run.py --output_dir=saved_models --config_name=roberta --model_name_or_path=roberta --tokenizer_name=roberta --do_eval --train_data_file=dataset/train.txt --eval_data_file=dataset/valid.txt --test_data_file=dataset/test.txt --epoch 1 --code_length 128 --eval_batch_size 4 --learning_rate 2e-5 --max_grad_norm 1.0 --evaluate_during_training

While running this command, which loads from the "roberta" folder, I get the error "ValueError: The state dictionary of the model you are training to load is corrupted. Are you sure it was properly saved?". When I load the model separately and print it, it is an OrderedDict instead of a state dict. I tried rerunning and fine-tuning the model again to save it correctly, but it does the same thing and saves in OrderedDict format, and maybe this is causing the error.
[screenshot: error traceback]

PS:
What I'm trying to do: given two pieces of code in dataset.json and test.txt, where all the indexes and actual predictions are listed, I need to run inference with the saved model and get a prediction of whether the pair is similar or not, either 0 or 1. How do I go about this? If possible, give me a brief outline of the process for just this problem.

@guoday
Contributor

guoday commented Apr 12, 2022

Did you change the save path? The "roberta" folder is a pre-trained model, and you can't save the model into this folder when you fine-tune it.

Please follow https://github.com/microsoft/CodeBERT/tree/master/GraphCodeBERT/clonedetection and the results will be printed to saved_models/predictions.txt:

with open(os.path.join(args.output_dir,"predictions.txt"),'w') as f:
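
For reference, each line of that file pairs the two example indices with the predicted label. A minimal sketch of the idea (not the repository's exact code; the helper name and variables here are illustrative only):

import os

def write_predictions(output_dir, pairs, preds):
    # pairs: iterable of (idx1, idx2) index pairs; preds: iterable of 0/1 predictions.
    with open(os.path.join(output_dir, "predictions.txt"), "w") as f:
        for (idx1, idx2), pred in zip(pairs, preds):
            # One tab-separated line per pair: idx1, idx2, predicted label.
            f.write(str(idx1) + "\t" + str(idx2) + "\t" + str(int(pred)) + "\n")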

@lokesh-ixo
Author


So after fine-tuning the model, I downloaded the fine-tuned model, added it to the roberta folder, and renamed it to "pytorch_model.bin". Now I'm trying to run eval with the model loaded from the "roberta" folder.

@guoday
Contributor

guoday commented Apr 12, 2022

You can't replace "roberta/pytorch_model.bin" with the fine-tuned model; you need to keep the roberta folder unchanged. The fine-tuned model is reloaded here:

checkpoint_prefix = 'checkpoint-best-f1/model.bin'
output_dir = os.path.join(args.output_dir, '{}'.format(checkpoint_prefix))
model.load_state_dict(torch.load(output_dir))
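
A minimal sketch of that reload step, assuming model is the clone-detection model already built by run.py and output_dir is the --output_dir you pass on the command line (e.g. saved_models); the function name is hypothetical:

import os
import torch

def load_finetuned(model, output_dir, device="cpu"):
    # run.py saves the best fine-tuned weights as checkpoint-best-f1/model.bin under --output_dir.
    checkpoint_path = os.path.join(output_dir, "checkpoint-best-f1", "model.bin")
    model.load_state_dict(torch.load(checkpoint_path, map_location=device))
    model.to(device)
    model.eval()  # inference mode: disable dropout
    return model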

@lokesh-ixo
Author


So I was able to load and run the fine-tuned model for evaluation only, using the following command:

!python run.py --output_dir=saved_models --config_name=microsoft/graphcodebert-base --model_name_or_path=microsoft/graphcodebert-base --tokenizer_name=microsoft/graphcodebert-base --do_eval --train_data_file=dataset/train.txt --eval_data_file=dataset/valid.txt --test_data_file=dataset/test.txt --epoch 1 --code_length 128 --eval_batch_size 4 --learning_rate 2e-5 --max_grad_norm 1.0 --evaluate_during_training

I'm running this on the dataset provided in the GitHub repo itself. Now I have created my own dataset with func and idx fields and created valid.txt as well. How do I evaluate this with the model and generate predictions.txt for clone detection with GraphCodeBERT?

@guoday
Contributor

guoday commented Apr 12, 2022

You can use the same format to save your dataset, and then replace our dataset with your own.
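
For example, a hedged sketch of the expected layout (the toy snippets and paths below are illustrative only): data.jsonl holds one JSON object per line with "func" and "idx" fields, and train/valid/test.txt hold one pair per line, tab-separated as described further down in this thread.

import json

# Two toy functions, each with a unique string index (the contents are arbitrary code).
snippets = [
    {"func": "def add(a, b):\n    return a + b\n", "idx": "0"},
    {"func": "def sub(a, b):\n    return a - b\n", "idx": "1"},
]

# data.jsonl: one JSON object per line.
with open("dataset/data.jsonl", "w") as f:
    for js in snippets:
        f.write(json.dumps(js) + "\n")

# test.txt: one pair per line, tab-separated: idx1, idx2, 0/1 label.
with open("dataset/test.txt", "w") as f:
    f.write("0\t1\t0\n")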

@lokesh-ixo
Author

I need to create the dataset in Python, and I'm facing a few errors when it is done and tested with the model: json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 414 (char 413), and another JSONDecodeError as well. Can you please advise, or provide a simple example here, say for adding two numbers and subtracting two numbers in Python? If you provide such a simple input it will help me a lot to create the custom data and test it with the model.

PS: I tried following the same instructions and created the dataset.
Below is an example of what I have created:

{"func": " public class MinimumValueExample \n {\n \n static int findMin(int inputArr[], int index, int size)\n {\n \n if(size == 1)\n {\n return inputArr[index];\n }\n \n return Math.min( inputArr[index], findMin(inputArr, index + 1, size - 1));\n }\n public static void main(String argvs[]) \n {\n int numArr[] = { 27, 56, 90, 12, 120, 263 }; \n System.out.println("The input Array is : " + Arrays.toString(numArr));\n int length = numArr.length; \n int minVal = findMin(numArr, 0, length); \n \n System.out.print("Minimum element of the input array is: " + minVal + " \n ");\n }\n }", "idx": "1212121"}

@guoday
Contributor

guoday commented Apr 12, 2022

Your data is not in correct JSON format. Please use json.dumps to dump each func. Here is an example:

import json

js = {"func": """public class MinimumValueExample \n {\n \n static int findMin(int inputArr[], int index, int size)\n""", "idx": "123"}
with open("valid.txt", "w") as f:
    f.write(json.dumps(js) + "\n")
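
As a quick sanity check before running the model (a hedged snippet, not part of the repository code), you can read the file back and make sure every line parses:

import json

with open("valid.txt") as f:
    for lineno, line in enumerate(f, 1):
        js = json.loads(line)  # raises json.JSONDecodeError if the line is malformed
        assert "func" in js and "idx" in js, "line %d is missing func/idx" % lineno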

@lokesh-ixo
Author

lokesh-ixo commented Apr 13, 2022

So I dumped the data as you described above and got rid of the JSON error. Now I'm getting a ValueError.
[screenshot: ValueError traceback]

I have only 3 entries in my test.txt file.
[screenshot: test.txt contents]

And I have successfully added the code that I want to test to the data.jsonl file too.
[screenshot: data.jsonl contents]

After all this, when I run the command:

!python run.py --output_dir=saved_models --config_name=microsoft/graphcodebert-base --model_name_or_path=microsoft/graphcodebert-base --tokenizer_name=microsoft/graphcodebert-base --do_test --train_data_file=dataset/train.txt --test_data_file=dataset/test.txt --epoch 1 --code_length 256 --data_flow_length 64 --learning_rate 2e-5 --max_grad_norm 1.0 --seed 123456 2>&1 | tee custom_saved_models/test.log

I get this ValueError. How should it be solved?

@guoday
Contributor

guoday commented Apr 13, 2022

For the test.txt file, please use a tab character (\t) to separate the fields instead of spaces.
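
For example (a hedged illustration with made-up indices), one line of test.txt could be written like this:

# Write one pair line: idx1, idx2, label -- tab-separated, not space-separated.
with open("dataset/test.txt", "w") as f:
    f.write("1212121\t1212122\t1\n")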

@lokesh-ixo
Author

So I saved the fine-tuned model and I'm trying to load and use it, and I'm getting the following error:
ValueError: The state dictionary of the model you are training to load is corrupted. Are you sure it was properly saved?
[screenshot: model loading error]

What should I do to load the saved model and use it for inference?

@guoday
Contributor

guoday commented Apr 17, 2022

Don't change any of the code you showed. If you want to reload the fine-tuned model from another path, just change this:

checkpoint_prefix = 'checkpoint-best-f1/model.bin'
output_dir = os.path.join(args.output_dir, '{}'.format(checkpoint_prefix))
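
For instance (a sketch only; custom_saved_models below is a hypothetical location, substitute wherever your fine-tuned model.bin actually lives):

# Inside run.py, only the path changes; the load call itself stays the same.
checkpoint_prefix = 'checkpoint-best-f1/model.bin'
output_dir = os.path.join('custom_saved_models', '{}'.format(checkpoint_prefix))
model.load_state_dict(torch.load(output_dir))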
