## Compare the results of the 2-encoders and 1-encoder models

**This notebook compares the results of the 2-encoders and 1-encoder models.**

The 2-encoders model is trained on both the review comment (*rnl*) and the function that the comment refers to (*ms*).
The 1-encoder model is trained on the function under review (*ms*) only.

**Results** 
- The 2-encoders model performs better than the 1-encoder model.
- Overall, our results fall short of those reported in the paper.
    - In the paper, the 1-encoder model with a beam size of 1 achieves a 2.91% perfect prediction rate. In our experiment, the same configuration achieves about 1.5%.
    - In the paper, the 2-encoders model with a beam size of 1 achieves a 12.16% perfect prediction rate. In our experiment, the same configuration achieves about 4.7%.
- The gap may be due to the difference in the datasets. The paper uses data from both Gerrit and GitHub, for a total of 17,194 training examples, whereas our experiment uses data from GitHub only, for a total of 8,315 training examples.
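
To put these figures side by side, the short sketch below computes how our rates and training-set size relate to the paper's. It only uses the numbers quoted in the list above; the variable and function names are illustrative and do not come from the notebook.

```python
# Perfect prediction rates (%) at beam size 1, as quoted in the results list.
paper = {"1-encoder": 2.91, "2-encoders": 12.16}
ours = {"1-encoder": 1.5, "2-encoders": 4.7}

# Training-set sizes quoted in the results list.
paper_train_size = 17194
our_train_size = 8315


def fraction_of_paper(model):
    """Our perfect prediction rate as a fraction of the paper's, for one model."""
    return ours[model] / paper[model]


for model in paper:
    print(f"{model}: {fraction_of_paper(model):.0%} of the paper's rate")
print(f"training set: {our_train_size / paper_train_size:.0%} of the paper's size")
```

With these inputs, our training set is roughly 48% of the paper's size, while our rates are roughly 52% (1-encoder) and 39% (2-encoders) of the paper's. This is consistent with, but does not prove, the dataset-size explanation.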

In [1]:
def read_lines(path):
    # Read a text file into a list of lines (trailing newlines kept, as with readlines()).
    with open(path, "r") as f:
        return f.readlines()


two_encoders_predictions_path = "./code/2-encoders/predictions.txt"
one_encoder_predictions_path = "./code/1-encoder/predictions.txt"

two_encoders_target_path = "./datasets/2-encoders/test/tgt-test.txt"
two_encoders_ms_path = "./datasets/2-encoders/test/src1-test.txt"
two_encoders_rnl_path = "./datasets/2-encoders/test/src2-test.txt"

one_encoder_target_path = "./datasets/1-encoder/test/tgt-test.txt"
one_encoder_ms_path = "./datasets/1-encoder/test/src-test.txt"

two_encoders_predictions = read_lines(two_encoders_predictions_path)
one_encoder_predictions = read_lines(one_encoder_predictions_path)

two_encoders_target = read_lines(two_encoders_target_path)
two_encoders_ms = read_lines(two_encoders_ms_path)
two_encoders_rnl = read_lines(two_encoders_rnl_path)

one_encoder_target = read_lines(one_encoder_target_path)
one_encoder_ms = read_lines(one_encoder_ms_path)




In [2]:
def correct_predictions_percentage(predictions, targets):
    # Percentage of predictions that match their target exactly (perfect predictions).
    assert len(predictions) == len(targets)
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(predictions) * 100

print("2 encoders accuracy percentage: ", correct_predictions_percentage(two_encoders_predictions, two_encoders_target), "%")
print("1 encoder accuracy percentage: ", correct_predictions_percentage(one_encoder_predictions, one_encoder_target), "%")

2 encoders accuracy percentage:  4.693140794223827 %
1 encoder accuracy percentage:  1.5643802647412757 %
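
The comparison above is plain string equality on lines as returned by `readlines()`, so stray whitespace or a missing trailing newline on the last line of a file would count as a mismatch. As a cross-check, here is a minimal sketch of a whitespace-insensitive variant. `normalized_match_rate` is a hypothetical helper, not part of the notebook, and whether it changes the reported numbers depends on the actual files.

```python
def normalized_match_rate(predictions, targets):
    """Percentage of predictions equal to their target after collapsing whitespace."""
    assert len(predictions) == len(targets)
    if not predictions:
        # Avoid dividing by zero on empty input.
        return 0.0
    matches = sum(
        " ".join(p.split()) == " ".join(t.split())
        for p, t in zip(predictions, targets)
    )
    return matches / len(predictions) * 100


# Toy data: the first pair differs only in whitespace, the second pair differs in content.
print(normalized_match_rate(["a  b\n", "c\n"], ["a b", "d\n"]))
```

If this rate is noticeably higher than the exact-match rate on the real files, the gap comes from formatting rather than from model errors.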


In [3]:
def get_correct_predictions(predictions, targets):
    # Indices of the predictions that match their target exactly.
    assert len(predictions) == len(targets)
    return [i for i, (p, t) in enumerate(zip(predictions, targets)) if p == t]

two_encoders_correct_predictions = get_correct_predictions(two_encoders_predictions, two_encoders_target)
one_encoder_correct_predictions = get_correct_predictions(one_encoder_predictions, one_encoder_target)

print("2 encoders correct predictions: ", two_encoders_correct_predictions)
print("1 encoder correct predictions: ", one_encoder_correct_predictions)

2 encoders correct predictions:  [90, 95, 134, 136, 140, 191, 238, 263, 278, 287, 296, 337, 351, 367, 380, 382, 398, 420, 435, 456, 484, 492, 525, 574, 606, 613, 649, 657, 675, 705, 713, 720, 729, 781, 791, 796, 797, 800, 821]
1 encoder correct predictions:  [91, 135, 145, 215, 359, 421, 553, 614, 618, 632, 715, 802, 823]
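
The index lists above contain 39 entries for the 2-encoders model and 13 for the 1-encoder model. Combined with the percentages printed earlier, this lets us recover the size of each test set as a sanity check. The sketch below uses only those printed values; the names are illustrative.

```python
# Number of indices in each list printed above.
correct_counts = {"2-encoders": 39, "1-encoder": 13}

# Perfect prediction percentages printed by the earlier cell.
rates = {"2-encoders": 4.693140794223827, "1-encoder": 1.5643802647412757}


def implied_test_size(model):
    """Test-set size implied by a count of correct predictions and its percentage."""
    return round(correct_counts[model] / (rates[model] / 100))


for model in correct_counts:
    print(model, implied_test_size(model))
```

Both models come out at 831 test examples, so the two percentages are computed over test sets of the same size.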


In [10]:
def get_sample_two_encoders(index):
    print("MS: ", two_encoders_ms[index])
    print("RNL: ", two_encoders_rnl[index])
    print("Prediction: ", two_encoders_predictions[index])
    print("Target: ", two_encoders_target[index])


def get_sample_one_encoder(index):
    print("MS: ", one_encoder_ms[index])
    print("Prediction: ", one_encoder_predictions[index])
    print("Target: ", one_encoder_target[index])


In [11]:
print("Sample 2 encoder: ")
get_sample_two_encoders(two_encoders_correct_predictions[0])
print("*" * 50)
print("Sample 1 encoder: ")
get_sample_one_encoder(one_encoder_correct_predictions[0])

Sample 2 encoder: 
MS:  public void VAR_1 ( VAR_2 VAR_1 ) { this . VAR_1 = VAR_1 ; } }

RNL:  Could use final VAR_2 VAR_1 please

Prediction:  public void VAR_1 ( final VAR_2 VAR_1 ) { this . VAR_1 = VAR_1 ; } }

Target:  public void VAR_1 ( final VAR_2 VAR_1 ) { this . VAR_1 = VAR_1 ; } }

**************************************************
Sample 1 encoder: 
MS:  protected synchronized void METHOD_1 ( ) { }

Prediction:  protected void METHOD_1 ( ) { }

Target:  protected void METHOD_1 ( ) { }
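
To see what the model changed in samples like these, one option is a token-level diff between the original function (MS) and the prediction. The following is a minimal sketch using the standard-library `difflib`. `token_diff` is a hypothetical helper, not part of the notebook, and it assumes the code is space-tokenized as in the samples above.

```python
import difflib


def token_diff(before, after):
    """Return (added, removed) token lists between two space-tokenized code strings."""
    a, b = before.split(), after.split()
    added, removed = [], []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(a[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(b[j1:j2])
    return added, removed


# The two samples printed above (MS, then Prediction).
samples = [
    ("public void VAR_1 ( VAR_2 VAR_1 ) { this . VAR_1 = VAR_1 ; } }",
     "public void VAR_1 ( final VAR_2 VAR_1 ) { this . VAR_1 = VAR_1 ; } }"),
    ("protected synchronized void METHOD_1 ( ) { }",
     "protected void METHOD_1 ( ) { }"),
]
for ms, prediction in samples:
    added, removed = token_diff(ms, prediction)
    print("added:", added, "removed:", removed)
```

For the two samples shown, this reports `final` as the only added token in the first case and `synchronized` as the only removed token in the second.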



## Training Graphs

### 1-encoder model

<img width="545" alt="image" src="https://user-images.githubusercontent.com/46859098/236644947-06124430-fd71-42f5-a590-013978daf9dc.png">

<img width="535" alt="image" src="https://user-images.githubusercontent.com/46859098/236645065-2a1b1cfc-52fc-4381-9591-80d8f45d1a68.png">

<img width="549" alt="image" src="https://user-images.githubusercontent.com/46859098/236645089-2ba583a8-6aa7-46cd-8cd6-a11a05ee9f7b.png">

### 2-encoders model

<img width="547" alt="image" src="https://user-images.githubusercontent.com/46859098/236645239-136b89a0-9812-4c55-a024-28537f7e69b5.png">
<img width="542" alt="image" src="https://user-images.githubusercontent.com/46859098/236645258-b9672f9b-fdc0-43e7-af5b-ea75d3a6782d.png">
<img width="537" alt="image" src="https://user-images.githubusercontent.com/46859098/236645268-f216ce2d-ccd9-481b-9929-fd64059dfbbf.png">
