# Introduction - Oracle Version 2

This notebook loads the **bug reports** and **test cases** datasets, together with the **feature matrices** built from the expert and volunteer responses collected in the PyBossa applications, and creates the **oracle** dataset from them.

In this notebook we create a version of the oracle based on the results of the empirical study conducted with volunteers through the PyBossa application. The relationship between bug reports and test cases is established through the Firefox features shared by both artifacts.

This oracle is expected to be more precise than the version created in the previous notebook (__tc_br_orc_v1_gen__), since the trace links are created based on the existing relationship between each bug report and a given Firefox feature.

# Load Libraries and Data

In [48]:
from mod_finder_util import mod_finder_util
mod_finder_util.add_modules_origin_search_path()

import pandas as pd
import numpy as np
from tqdm import tqdm

from modules.utils import aux_functions
from modules.utils import data_origin as do

import modules.utils.firefox_dataset_p2 as fd

In [49]:
testcases = fd.Datasets.read_testcases_df()
bugreports = fd.Datasets.read_selected_bugreports_df()
features = fd.Datasets.read_features_df()

print()
expert_matrix = fd.Feat_BR_Oracles.read_feat_br_expert_df()
volunteers_matrix = fd.Feat_BR_Oracles.read_feat_br_volunteers_df()
exp_vol_union_matrix = fd.Feat_BR_Oracles.read_feat_br_expert_volunteers_union_df()
exp_vol_intersec_matrix = fd.Feat_BR_Oracles.read_feat_br_expert_volunteers_intersec_df()

print()
br_2_feature_matrix_final = fd.Feat_BR_Oracles.read_br_2_features_matrix_final_df()

TestCases.shape: (195, 12)
SelectedBugReports.shape: (91, 18)
Features.shape: (19, 8)

Feat_BR Expert Matrix shape: (91, 19)
Feat_BR Volunteers Matrix shape: (91, 19)
Expert and Volunteers Matrix UNION.shape: (91, 19)
Expert and Volunteers Matrix INTERSEC.shape: (91, 19)

BR_2_Features Matrix Final.shape: (91, 5)


## Test Cases x Bug Reports Trace Matrix

### Checking Link Condition Function

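The link condition function decides whether a given cell in the oracle holds a positive link (1) or a negative link (0). A test case is linked to a bug report when the test case's feature ID appears among the feature IDs assigned to that bug report by the selected data origin.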

In [50]:
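def check_link_condition(br, tc, data_origin):
    # Select the column holding the feature IDs assigned by the chosen data origin
    if data_origin == do.DataOrigin.VOLUNTEERS:
        col_name = "Features_IDs_vol_m"
    elif data_origin == do.DataOrigin.EXPERT:
        col_name = "Features_IDs_exp_m"
    elif data_origin == do.DataOrigin.VOLUNTEERS_AND_EXPERT_UNION:
        col_name = "Features_IDs_exp_vol_union_m"
    elif data_origin == do.DataOrigin.VOLUNTEERS_AND_EXPERT_INTERSEC:
        col_name = "Features_IDs_exp_vol_intersec_m"
    else:
        # Fail early instead of looking up an empty column name
        raise ValueError("Unsupported data origin: {}".format(data_origin))
    
    # Feature IDs are stored as a space-separated string, indexed by bug number
    br_feature_ids = br_2_feature_matrix_final.at[str(br.Bug_Number), col_name].split(" ")
    return str(tc.Feature_ID) in br_feature_ids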
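
The membership test used by the link condition can be tried in isolation. The following is a minimal sketch on made-up data: `toy_matrix`, `has_link`, and the bug numbers and feature IDs are hypothetical and only mimic the layout implied by the function above (bug numbers as string index, feature IDs as a space-separated string).

```python
import pandas as pd

# Hypothetical toy data: one row per bug report, feature IDs separated by spaces
toy_matrix = pd.DataFrame(
    {"Features_IDs_exp_m": ["1 4", "", "7"]},
    index=["1001", "1002", "1003"],
)

def has_link(bug_number, feature_id, col_name="Features_IDs_exp_m"):
    # Split the string into individual IDs so that "1" does not match "14"
    feature_ids = toy_matrix.at[str(bug_number), col_name].split(" ")
    return str(feature_id) in feature_ids

print(has_link(1001, 4))   # True: feature 4 is assigned to bug 1001
print(has_link(1001, 14))  # False: no substring matching
print(has_link(1002, 4))   # False: bug 1002 has no features assigned
```

Splitting before the membership test matters: checking `"1" in "14"` directly on the string would report a link that does not exist.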

### Generate Oracles

In [51]:
def generate_oracle(data_origin):
    # Columns are bug report numbers, rows are test case numbers
    cols = [br.Bug_Number for idx,br in bugreports.iterrows()]
    index = [tc.TC_Number for idx,tc in testcases.iterrows()]
    # Start with no links (all zeros); only positive links are set below
    oracle_df = pd.DataFrame(columns=cols, index=index, data=np.zeros(shape=(len(testcases),len(bugreports))), dtype='int8')
    for idx_1,br in tqdm(bugreports.iterrows()):
        for idx_2,tc in testcases.iterrows():
            if check_link_condition(br, tc, data_origin):
                oracle_df.at[tc.TC_Number, br.Bug_Number] = 1
    
    oracle_df.index.name = 'TC_Number'
    oracle_df.columns.name = 'Bug_Number'
    return oracle_df

oracle_volunteers_df = generate_oracle(do.DataOrigin.VOLUNTEERS)
oracle_expert_df = generate_oracle(do.DataOrigin.EXPERT)
oracle_expert_volunteers_union_df = generate_oracle(do.DataOrigin.VOLUNTEERS_AND_EXPERT_UNION)
oracle_expert_volunteers_intersec_df = generate_oracle(do.DataOrigin.VOLUNTEERS_AND_EXPERT_INTERSEC)

print('oracle_volunteers_df.shape: {}'.format(oracle_volunteers_df.shape))
print('oracle_expert_df.shape: {}'.format(oracle_expert_df.shape))
print('oracle_expert_volunteers_union_df.shape: {}'.format(oracle_expert_volunteers_union_df.shape))
print('oracle_expert_volunteers_intersec_df.shape: {}'.format(oracle_expert_volunteers_intersec_df.shape))

91it [00:02, 41.86it/s]
91it [00:02, 41.87it/s]
91it [00:02, 41.95it/s]
91it [00:02, 42.55it/s]

oracle_volunteers_df.shape: (195, 91)
oracle_expert_df.shape: (195, 91)
oracle_expert_volunteers_union_df.shape: (195, 91)
oracle_expert_volunteers_intersec_df.shape: (195, 91)
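
To make the layout of the resulting matrices concrete (test cases as rows, bug reports as columns), here is a small sketch that builds the same kind of 0/1 matrix on toy data, one column at a time instead of one cell at a time. All names and values (`toy_testcases`, `toy_br_features`, `toy_oracle`) are hypothetical; this is an illustration of the idea, not the notebook's implementation.

```python
import pandas as pd

# Hypothetical toy inputs mirroring the columns used in the notebook
toy_testcases = pd.DataFrame({"TC_Number": [1, 2, 3], "Feature_ID": [4, 7, 4]})
# Bug number -> space-separated feature IDs (assumed storage format)
toy_br_features = pd.Series({"1001": "1 4", "1002": "", "1003": "7"})

# Split each bug report's feature string into a set of IDs once
br_feature_sets = toy_br_features.str.split(" ").apply(set)

# One column per bug report: 1 where the test case's feature is in the bug's set
tc_features = toy_testcases["Feature_ID"].astype(str)
toy_oracle = pd.DataFrame(
    {bug: tc_features.isin(feats).astype("int8").to_numpy()
     for bug, feats in br_feature_sets.items()},
    index=toy_testcases["TC_Number"].to_numpy(),
)
toy_oracle.index.name = "TC_Number"
toy_oracle.columns.name = "Bug_Number"

print(toy_oracle.shape)  # (3, 3): 3 test cases x 3 bug reports
print(toy_oracle)
```

On the real data the same shape logic gives 195 test cases by 91 bug reports, which matches the shapes printed above.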





### Save Oracles

In [52]:
fd.Tc_BR_Oracles.write_oracle_expert_df(oracle_expert_df)
fd.Tc_BR_Oracles.write_oracle_volunteers_df(oracle_volunteers_df)
fd.Tc_BR_Oracles.write_oracle_expert_volunteers_union_df(oracle_expert_volunteers_union_df)
fd.Tc_BR_Oracles.write_oracle_expert_volunteers_intersec_df(oracle_expert_volunteers_intersec_df)

OracleExpert.shape: (195, 91)
OracleVolunteers.shape: (195, 91)
OracleExpertVolunteers_UNION.shape: (195, 91)
OracleExpertVolunteers_INTERSEC.shape: (195, 91)
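
The storage format used by the `fd.Tc_BR_Oracles.write_*` helpers is not shown in this notebook. As a rough sketch only, assuming an oracle were persisted as CSV, a round trip with plain pandas could look like the following; the toy matrix and the in-memory buffer are hypothetical stand-ins for the real oracle and file.

```python
import io
import pandas as pd

# Hypothetical toy oracle: rows are test cases, columns are bug reports
toy_oracle = pd.DataFrame(
    [[1, 0], [0, 1], [1, 0]],
    index=pd.Index([1, 2, 3], name="TC_Number"),
    columns=pd.Index(["1001", "1002"], name="Bug_Number"),
    dtype="int8",
)

# Write to an in-memory buffer instead of a real file (assumption: CSV format)
buffer = io.StringIO()
toy_oracle.to_csv(buffer)
buffer.seek(0)

# Read it back; the columns name and the int8 dtype are not stored in CSV
restored = pd.read_csv(buffer, index_col="TC_Number").astype("int8")
restored.columns.name = "Bug_Number"

print(restored.shape)
print((restored.to_numpy() == toy_oracle.to_numpy()).all())
```

Checking the shape after reading back, as the output above does for the saved oracles, is a cheap way to confirm that no rows or columns were lost on the way to disk.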
