## Compare the results of the 2-encoders and 1-encoder models

**This notebook compares the results of the 2-encoders and 1-encoder models.**

The 2-encoders model is trained on the review comment (*rnl*) and the function that the comment is attached to (*ms*).
The 1-encoder model is trained on the function under review (*ms*) only.

**Results** 
- The 2-encoders model performs better than the 1-encoder model.
- Overall, our results fall short of those reported in the paper. Perfect prediction rates:
    - 1-encoder model, beam size 1: 2.91% in the paper, 1.56% in our experiment.
    - 1-encoder model, beam size 10: 15.76% in the paper, 10.35% in our experiment.
    - 2-encoders model, beam size 1: 12.16% in the paper, 4.69% in our experiment.
    - 2-encoders model, beam size 10: 30.72% in the paper, 20.94% in our experiment.

- This gap may be due to differences in the datasets. The paper combines data from Gerrit and GitHub, for a total of 17194 training instances, whereas our experiment uses GitHub data only, for a total of 8315 training instances. In addition, the paper draws on 2566 different GitHub projects, while our experiment draws on 1986.
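
To make the comparison easier to scan, the snippet below tabulates the gap between the paper and our experiment for each configuration. It is a minimal sketch: the paper values are copied from the list above, our values are the rates measured later in this notebook rounded to two decimals, and the names `RESULTS` and `gap_to_paper` are introduced here for illustration only.

```python
# Perfect prediction rates in percent, stored as (paper, ours).
# Keys are (model, beam size). "ours" is rounded to two decimals.
RESULTS = {
    ("1-encoder", 1): (2.91, 1.56),
    ("1-encoder", 10): (15.76, 10.35),
    ("2-encoders", 1): (12.16, 4.69),
    ("2-encoders", 10): (30.72, 20.94),
}


def gap_to_paper(results):
    """Return the absolute gap (paper minus ours) in percentage points."""
    return {config: round(paper - ours, 2) for config, (paper, ours) in results.items()}


for (model, beam), gap in gap_to_paper(RESULTS).items():
    print(f"{model}, beam size {beam}: {gap} percentage points below the paper")
```

In every configuration the 2-encoders model stays ahead of the 1-encoder model, which matches the first bullet above.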

In [1]:
import pandas as pd

## Load the data

In [2]:
two_encoders_one_beam_predictions_path = "./code/2-encoders/predictions.txt"
two_encoders_ten_beam_predictions_path = "./code/2-encoders/10_beam_predictions.txt"

one_encoder_one_beam_predictions_path = "./code/1-encoder/predictions.txt"
one_encoder_ten_beam_predictions_path = "./code/1-encoder/10_beam_predictions.txt"


two_encoders_target_path = "./datasets/2-encoders/test/tgt-test.txt"
two_encoders_ms_path = "./datasets/2-encoders/test/src1-test.txt"
two_encoders_rnl_path = "./datasets/2-encoders/test/src2-test.txt"

one_encoder_target_path = "./datasets/1-encoder/test/tgt-test.txt"
one_encoder_ms_path = "./datasets/1-encoder/test/src-test.txt"

dataset_with_mapping_path = "./datasets/ms_mr_rnl_map_dataset.csv"


def read_lines(path):
    # Read a text file into a list of lines (trailing newlines are kept)
    # and close the file handle once reading is done.
    with open(path, "r") as f:
        return f.readlines()


two_encoders_one_beam_predictions = read_lines(two_encoders_one_beam_predictions_path)
two_encoders_ten_beam_predictions = read_lines(two_encoders_ten_beam_predictions_path)

one_encoder_one_beam_predictions = read_lines(one_encoder_one_beam_predictions_path)
one_encoder_ten_beam_predictions = read_lines(one_encoder_ten_beam_predictions_path)

two_encoders_target = read_lines(two_encoders_target_path)
two_encoders_ms = read_lines(two_encoders_ms_path)
two_encoders_rnl = read_lines(two_encoders_rnl_path)

one_encoder_target = read_lines(one_encoder_target_path)
one_encoder_ms = read_lines(one_encoder_ms_path)

dataset_with_mapping = pd.read_csv(dataset_with_mapping_path)

In [3]:
mapped_data = "../data/ms_mr_rnl_map_dataset.csv"

## Perfect Prediction Rate

In [4]:
def correct_predictions_percentage_10_beam(predictions, targets):
    # for the 10 predictions check if any of them is correct
    current_target_index = 0
    correct_predictions = 0
    for i in range(0, len(predictions), 10):
        for j in range(10):
            if predictions[i + j].strip() == targets[current_target_index].strip():
                correct_predictions += 1
                break
        current_target_index += 1
        
    return correct_predictions / len(targets) * 100

In [5]:
def get_correct_predictions_10_beam(predictions, targets):
    # For each target, record the first of its 10 beam predictions that matches exactly
    current_target_index = 0
    correct_predictions_list = []
    for i in range(0, len(predictions), 10):
        for j in range(10):
            if predictions[i + j].strip() == targets[current_target_index].strip():
                correct_prediction_positions = {
                    "prediction": i + j,  # index into the flat predictions list
                    "target": current_target_index,  # index into the targets list
                }
                correct_predictions_list.append(correct_prediction_positions)
                break
        current_target_index += 1

    return correct_predictions_list

In [6]:
def correct_predictions_percentage(predictions, targets):
    assert len(predictions) == len(targets)
    correct = 0
    for i in range(len(predictions)):
        if predictions[i] == targets[i]:
            correct += 1
    return correct / len(targets) * 100

print("2 encoders perfect prediction percentage: ", correct_predictions_percentage(two_encoders_one_beam_predictions, two_encoders_target), "%")
print("2 encoders 10 beam perfect prediction  percentage: ", correct_predictions_percentage_10_beam(two_encoders_ten_beam_predictions, two_encoders_target), "%")

print("1 encoder perfect prediction  percentage: ", correct_predictions_percentage(one_encoder_one_beam_predictions, one_encoder_target), "%")
print("1 encoder 10 beam perfect prediction  percentage: ", correct_predictions_percentage_10_beam(one_encoder_ten_beam_predictions, one_encoder_target), "%")

2 encoders perfect prediction percentage:  4.693140794223827 %
2 encoders 10 beam perfect prediction  percentage:  20.938628158844764 %
1 encoder perfect prediction  percentage:  1.5643802647412757 %
1 encoder 10 beam perfect prediction  percentage:  10.348977135980746 %
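
The 1-beam and 10-beam percentage functions above share the same logic, so they could be folded into one function parameterized by beam size. The following is a minimal sketch, not the code that produced the numbers above. The name `perfect_prediction_rate` and the `beam_size` parameter are introduced here, and the function assumes that the predictions file holds exactly `beam_size` consecutive lines per target. It also strips whitespace in both cases, whereas the 1-beam function above compares raw lines, so its 1-beam result could differ slightly.

```python
def perfect_prediction_rate(predictions, targets, beam_size=1):
    """Percentage of targets matched exactly by any of their beam_size candidates."""
    if not targets:
        raise ValueError("targets must not be empty")
    if len(predictions) != len(targets) * beam_size:
        raise ValueError(
            f"expected {len(targets) * beam_size} predictions, got {len(predictions)}"
        )

    hits = 0
    for target_index, target in enumerate(targets):
        # Candidates for this target occupy one contiguous slice of the flat list.
        start = target_index * beam_size
        candidates = predictions[start:start + beam_size]
        if any(candidate.strip() == target.strip() for candidate in candidates):
            hits += 1
    return hits / len(targets) * 100


# Toy data (not taken from the experiment): two targets, two candidates each.
toy_predictions = ["a\n", "b\n", "x\n", "c\n"]
toy_targets = ["b\n", "d\n"]
print(perfect_prediction_rate(toy_predictions, toy_targets, beam_size=2))
```

The explicit length check turns a silent misalignment between the two files into an error instead of a misleading percentage.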


## Perfect Prediction Samples

In [7]:
def get_correct_predictions(predictions, targets):
    assert len(predictions) == len(targets)
    correct = []
    for i in range(len(predictions)):
        if predictions[i] == targets[i]:
            correct.append(i)
    return correct

two_encoders_one_beam_correct_predictions = get_correct_predictions(two_encoders_one_beam_predictions, two_encoders_target)
two_encoders_ten_beam_correct_predictions = get_correct_predictions_10_beam(two_encoders_ten_beam_predictions, two_encoders_target)

one_encoder_one_beam_correct_predictions = get_correct_predictions(one_encoder_one_beam_predictions, one_encoder_target)
one_encoder_ten_beam_correct_predictions = get_correct_predictions_10_beam(one_encoder_ten_beam_predictions, one_encoder_target)

print("2 encoders 1 beam correct predictions: ", two_encoders_one_beam_correct_predictions)
print("2 encoders 10 beam correct predictions: ", two_encoders_ten_beam_correct_predictions)

print("1 encoder correct predictions: ", one_encoder_one_beam_correct_predictions)
print("1 encoder 10 beam correct predictions: ", one_encoder_ten_beam_correct_predictions)

2 encoders 1 beam correct predictions:  [90, 95, 134, 136, 140, 191, 238, 263, 278, 287, 296, 337, 351, 367, 380, 382, 398, 420, 435, 456, 484, 492, 525, 574, 606, 613, 649, 657, 675, 705, 713, 720, 729, 781, 791, 796, 797, 800, 821]
2 encoders 10 beam correct predictions:  [{'prediction': 131, 'target': 13}, {'prediction': 191, 'target': 19}, {'prediction': 384, 'target': 38}, {'prediction': 391, 'target': 39}, {'prediction': 482, 'target': 48}, {'prediction': 741, 'target': 74}, {'prediction': 754, 'target': 75}, {'prediction': 891, 'target': 89}, {'prediction': 900, 'target': 90}, {'prediction': 930, 'target': 93}, {'prediction': 950, 'target': 95}, {'prediction': 1041, 'target': 104}, {'prediction': 1141, 'target': 114}, {'prediction': 1179, 'target': 117}, {'prediction': 1332, 'target': 133}, {'prediction': 1361, 'target': 136}, {'prediction': 1400, 'target': 140}, {'prediction': 1425, 'target': 142}, {'prediction': 1502, 'target': 150}, {'prediction': 1576, 'target': 157}, {'pred
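
Each `prediction` value in the 10-beam output is an index into the flat predictions list, so the target index and the position inside the beam can be recovered with integer division. The sketch below shows this; `beam_position` is a name introduced here. Reading the remainder as a rank assumes the predictions file lists candidates best-first, which is common for beam search output but is not verified in this notebook.

```python
BEAM_SIZE = 10  # number of candidate lines per target in the 10-beam predictions file


def beam_position(prediction_index, beam_size=BEAM_SIZE):
    """Split a flat prediction index into (target index, position within the beam)."""
    target_index, position = divmod(prediction_index, beam_size)
    return target_index, position


# Entries taken from the output above.
for entry in [{"prediction": 131, "target": 13}, {"prediction": 384, "target": 38}]:
    target_index, position = beam_position(entry["prediction"])
    print(entry, "-> target", target_index, "beam position", position)
```

For example, prediction index 131 belongs to target 13 and sits at position 1 of its beam (counting from 0).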

In [8]:
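import ast

def get_mapping(ms):
    # Look up the abstraction map stored for this abstracted method
    ms = ms.replace("\n", "")
    map_str = dataset_with_mapping[dataset_with_mapping["ms"] == ms]["map"].values[0]
    # literal_eval parses the stored dict literal without executing arbitrary code
    mapping = ast.literal_eval(map_str)
    return mapping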

def revert_abstracted_data(data, mapping):
    for key, value in mapping.items():
        data = data.replace(value, key)
    return data

def print_sample(ms, rnl, prediction, target, mapping):
    print("MS Abstracted: ", ms)
    print("MS: ", revert_abstracted_data(ms, mapping))

    if rnl is not None:
        print("RNL Abstracted: ", rnl)
        print("RNL: ", revert_abstracted_data(rnl, mapping))

    print("Prediction Abstracted: ", prediction)
    print("Prediction: ", revert_abstracted_data(prediction, mapping))

    print("Target Abstracted: ", target)
    print("Target: ", revert_abstracted_data(target, mapping))

def get_sample(index, ms, rnl, one_beam_predictions, target, ten_beam_predictions=None, prediction_index=None, target_index=None):
    if ten_beam_predictions is not None:
        mapping = get_mapping(ms[target_index])
        prediction = ten_beam_predictions[prediction_index]
        target_sample = target[target_index]
        print_sample(ms[target_index], rnl[target_index] if rnl else None, prediction, target_sample, mapping)
    else:
        mapping = get_mapping(ms[index])
        print_sample(ms[index], rnl[index] if rnl else None, one_beam_predictions[index], target[index], mapping)
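
One caveat of reverting with `str.replace`: an abstract token such as `VAR_1` is a prefix of `VAR_10`, so replacing substrings can corrupt longer tokens. A token-level variant avoids this. The sketch below is an alternative, not the function used for the samples in this notebook. It assumes the mapping has the form `{original: abstract}` (as implied by `data.replace(value, key)` above) and that abstract tokens are separated by spaces. The names `revert_abstracted_tokens` and `example_mapping` are hypothetical, and the example mapping is made up for illustration.

```python
def revert_with_replace(data, mapping):
    # Same substring-based logic as revert_abstracted_data above, for comparison.
    for original, abstract in mapping.items():
        data = data.replace(abstract, original)
    return data


def revert_abstracted_tokens(data, mapping):
    """Swap whole space-separated tokens back to their original identifiers."""
    abstract_to_original = {abstract: original for original, abstract in mapping.items()}
    # split() also drops the trailing newline and collapses repeated whitespace.
    return " ".join(abstract_to_original.get(token, token) for token in data.split())


# Hypothetical mapping, assumed to be {original: abstract}.
example_mapping = {"count": "VAR_1", "total": "VAR_10"}
example_line = "VAR_1 = VAR_10 ;\n"

print(revert_with_replace(example_line, example_mapping))       # VAR_10 is corrupted
print(revert_abstracted_tokens(example_line, example_mapping))  # whole tokens only
```

The trade-off is that the token-level version normalizes whitespace, so it only suits data that is already tokenized with single spaces, as the samples in this notebook appear to be.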


#### 1 Beam Size Samples

In [9]:
print("Sample 2 encoder 1 beam size: ")
get_sample(two_encoders_one_beam_correct_predictions[0], two_encoders_ms, two_encoders_rnl, two_encoders_one_beam_predictions, two_encoders_target)
print("*" * 50)
print("Sample 1 encoder 1 beam size: ")
get_sample(one_encoder_one_beam_correct_predictions[0], one_encoder_ms, None, one_encoder_one_beam_predictions, one_encoder_target)

Sample 2 encoder 1 beam size: 
MS Abstracted:  public void VAR_1 ( VAR_2 VAR_1 ) { this . VAR_1 = VAR_1 ; } }

MS:  public void analysisReportScreen ( AnalysisReportScreen analysisReportScreen ) { this . analysisReportScreen = analysisReportScreen ; } }

RNL Abstracted:  Could use final VAR_2 VAR_1 please

RNL:  Could use final AnalysisReportScreen analysisReportScreen please

Prediction Abstracted:  public void VAR_1 ( final VAR_2 VAR_1 ) { this . VAR_1 = VAR_1 ; } }

Prediction:  public void analysisReportScreen ( final AnalysisReportScreen analysisReportScreen ) { this . analysisReportScreen = analysisReportScreen ; } }

Target Abstracted:  public void VAR_1 ( final VAR_2 VAR_1 ) { this . VAR_1 = VAR_1 ; } }

Target:  public void analysisReportScreen ( final AnalysisReportScreen analysisReportScreen ) { this . analysisReportScreen = analysisReportScreen ; } }

**************************************************
Sample 1 encoder 1 beam size: 
MS Abstracted:  protected synchronized voi

#### 10 Beam Size Samples

In [10]:
print("Sample 2 encoder with 10 beam: ")
get_sample(None, two_encoders_ms, two_encoders_rnl, None, two_encoders_target, two_encoders_ten_beam_predictions, two_encoders_ten_beam_correct_predictions[0]["prediction"], two_encoders_ten_beam_correct_predictions[0]["target"])
print("*" * 50)
print("Sample 1 encoder with 10 beam: ")
get_sample(None, one_encoder_ms, None, None, one_encoder_target, one_encoder_ten_beam_predictions, one_encoder_ten_beam_correct_predictions[0]["prediction"], one_encoder_ten_beam_correct_predictions[0]["target"])


Sample 2 encoder with 10 beam: 
MS Abstracted:  public static TYPE_1 METHOD_1 ( String VAR_1 ) { try { return TYPE_1 . METHOD_2 ( VAR_1 ) ; } catch ( final TYPE_2 VAR_2 ) { TYPE_3 VAR_3 = TYPE_4 . METHOD_1 ( VAR_1 ) ; if ( null == VAR_3 ) return null ; try { return TYPE_1 . METHOD_3 ( VAR_3 ) ; } catch ( final TYPE_2 VAR_4 ) { return null ; } } }

MS:  public static DateTimeZone getTimeZone ( String timezoneParam ) { try { return DateTimeZone . forID ( timezoneParam ) ; } catch ( final IllegalArgumentException e ) { TimeZone zone = ZoneInfo . getTimeZone ( timezoneParam ) ; if ( null == zone ) return null ; try { return DateTimeZone . forTimeZone ( zone ) ; } catch ( final IllegalArgumentException e1 ) { return null ; } } }

RNL Abstracted:  Can write curly braces block instead 1 line

RNL:  Can write curly braces block instead 1 line

Prediction Abstracted:  public static TYPE_1 METHOD_1 ( String VAR_1 ) { try { return TYPE_1 . METHOD_2 ( VAR_1 ) ; } catch ( final TYPE_2 VAR_2 ) { TYP

## Training Graphs

## 1-encoder model

### 1 Beam

<img width="545" alt="image" src="https://user-images.githubusercontent.com/46859098/236644947-06124430-fd71-42f5-a590-013978daf9dc.png">

<img width="535" alt="image" src="https://user-images.githubusercontent.com/46859098/236645065-2a1b1cfc-52fc-4381-9591-80d8f45d1a68.png">

<img width="549" alt="image" src="https://user-images.githubusercontent.com/46859098/236645089-2ba583a8-6aa7-46cd-8cd6-a11a05ee9f7b.png">



### 10 Beam

<img width="469" alt="image" src="https://user-images.githubusercontent.com/46859098/236666450-f9af1965-f257-4b4a-8448-e662e75a38aa.png">
<img width="782" alt="image" src="https://user-images.githubusercontent.com/46859098/236666474-141b9ca1-5c8c-4ed2-828e-0e2281293120.png">


## 2-encoders model

### 1 Beam

<img width="547" alt="image" src="https://user-images.githubusercontent.com/46859098/236645239-136b89a0-9812-4c55-a024-28537f7e69b5.png">
<img width="542" alt="image" src="https://user-images.githubusercontent.com/46859098/236645258-b9672f9b-fdc0-43e7-af5b-ea75d3a6782d.png">
<img width="537" alt="image" src="https://user-images.githubusercontent.com/46859098/236645268-f216ce2d-ccd9-481b-9929-fd64059dfbbf.png">


### 10 Beam

<img width="423" alt="image" src="https://user-images.githubusercontent.com/46859098/236663474-f63967e2-38af-4a8b-b64c-34065e730b35.png">

<img width="792" alt="image" src="https://user-images.githubusercontent.com/46859098/236663493-c2f67c00-0f37-4b2d-90fb-b63b78744670.png">
