# Introduction - Including Complementary Days 01-02/06/2016

Notebook to preprocess the **NEW** **bug reports** and **test cases** datasets and create from them the **oracle** dataset.

This notebook performs the following steps:

* the testcases dataset is loaded, cleaned, and preprocessed;
* the bugreport datasets are loaded, joined, and cleaned, duplicates are removed, and the final dataset is preprocessed;
* the bugreports_final dataset is created. This dataset supports the empirical study carried out at a later stage.


## Load Libraries and Data

In [22]:
from mod_finder_util import mod_finder_util
mod_finder_util.add_modules_origin_search_path()

import pandas as pd
import numpy as np
# sklearn.externals.joblib was removed from scikit-learn; import joblib directly
from joblib import Parallel, delayed
from tqdm import tqdm

from modules.utils import aux_functions

In [23]:
testcases = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/docs_english/TC/testcases.csv')
print('Test Cases Shape: {}'.format(testcases.shape))

bugreports_p1 = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/docs_english/BR/all_bugs_p1.csv', sep="|")
bugreports_p2 = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/docs_english/BR/all_bugs_p2.csv', sep='|')
bugreports_p3 = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/docs_english/BR/all_bugs_p3.csv', sep='|')
bugreports_p4 = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/docs_english/BR/all_bugs_p4.csv', sep='|')

bugreports = pd.concat([bugreports_p1, bugreports_p2, bugreports_p3, bugreports_p4])
print('Bug Reports shape: {}'.format(bugreports.shape))

Test Cases Shape: (207, 10)
Bug Reports shape: (35977, 10)
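
The loading step above reads several pipe-separated parts and stacks them with `pd.concat`. The following is a minimal sketch of that pattern on in-memory stand-ins; the two toy parts and their contents are hypothetical and not taken from the real files. It also shows that a plain `concat` keeps each part's own row labels, whereas `ignore_index=True` renumbers them.

```python
import io
import pandas as pd

# Two tiny in-memory stand-ins for the pipe-separated all_bugs_p*.csv parts
# (hypothetical content, for illustration only).
part_1 = io.StringIO("Bug_Number|Summary\n1|first bug\n2|second bug\n")
part_2 = io.StringIO("Bug_Number|Summary\n3|third bug\n")

frames = [pd.read_csv(part, sep='|') for part in (part_1, part_2)]

# Plain concat keeps each part's own row labels, so labels repeat.
stacked = pd.concat(frames)

# ignore_index=True renumbers the rows 0..n-1 instead.
renumbered = pd.concat(frames, ignore_index=True)

print(stacked.index.tolist())     # [0, 1, 0]
print(renumbered.index.tolist())  # [0, 1, 2]
```

Repeated row labels are harmless for the steps in this notebook, because the later join is keyed on `Bug_Number` rather than on the row labels.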


## BugReports

### Removing Duplicate Bugs

In [24]:
print('BR previous shape: {}'.format(bugreports.shape))
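# keep=False drops every row whose Bug_Number occurs more than once, so no copy of a duplicated bug is kept
bugreports.drop_duplicates('Bug_Number', keep=False, inplace=True)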
print('BR shape: {}'.format(bugreports.shape))

BR previous shape: (35977, 10)
BR shape: (35336, 10)
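
The cell above calls `drop_duplicates` with `keep=False`, which discards every row that shares a `Bug_Number` with another row instead of keeping one representative. A small sketch on a toy frame (hypothetical data) contrasts this with `keep='first'`; which behaviour is wanted for the study is not stated in the notebook.

```python
import pandas as pd

# Toy frame in which Bug_Number 11 appears twice (hypothetical data).
toy = pd.DataFrame({
    'Bug_Number': [10, 11, 11, 12],
    'Summary': ['a', 'b', 'b again', 'c'],
})

# keep=False removes all rows of a duplicated key.
drop_all = toy.drop_duplicates(subset='Bug_Number', keep=False)

# keep='first' retains the first occurrence of each key.
keep_first = toy.drop_duplicates(subset='Bug_Number', keep='first')

print(drop_all['Bug_Number'].tolist())    # [10, 12]
print(keep_first['Bug_Number'].tolist())  # [10, 11, 12]
```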


### Bug Reports Additional Infos

In [25]:
bugreports_add_info_df = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/docs_english/BR/all_bugs_add_info.csv', sep='|')
print('BugReportsAddInfo.shape: {}'.format(bugreports_add_info_df.shape))

bugreports_add_info_2_df = pd.read_csv('../data/mozilla_firefox_v2/firefoxDataset/docs_english/BR/all_bugs_add_info_2.csv', sep='|')
print('BugReportsAddInfo2.shape: {}'.format(bugreports_add_info_2_df.shape))

bugreports_final = bugreports.set_index('Bug_Number').join(other=bugreports_add_info_df.set_index('Bug_Number'))
bugreports_final.reset_index(inplace=True)

print('Final_BugReports.shape: {}'.format(bugreports_final.shape))

BugReportsAddInfo.shape: (37530, 7)
BugReportsAddInfo2.shape: (22, 7)
Final_BugReports.shape: (35336, 16)
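
`DataFrame.join` performs a left join by default, which is why the row count stays at 35336 after the join. Note that `bugreports_add_info_2_df` is loaded above but not used in the join. If the intent were to include it, one possible approach is sketched below on toy data; the frames `bugs`, `extra_1`, and `extra_2` are hypothetical stand-ins, and this is an assumption about intent, not the notebook's method.

```python
import pandas as pd

# Hypothetical stand-ins for the bug reports and the two add-info files.
bugs = pd.DataFrame({'Bug_Number': [1, 2, 3], 'Summary': ['a', 'b', 'c']})
extra_1 = pd.DataFrame({'Bug_Number': [1, 2], 'Status': ['NEW', 'RESOLVED']})
extra_2 = pd.DataFrame({'Bug_Number': [3], 'Status': ['NEW']})

# Left join with the first file only: bug 3 has no match, so Status is NaN.
only_first = (bugs.set_index('Bug_Number')
                  .join(extra_1.set_index('Bug_Number'))
                  .reset_index())

# Stack both add-info frames first, then join once.
extra_all = pd.concat([extra_1, extra_2], ignore_index=True)
with_both = (bugs.set_index('Bug_Number')
                 .join(extra_all.set_index('Bug_Number'))
                 .reset_index())

print(only_first['Status'].isna().sum())  # 1
print(with_both['Status'].isna().sum())   # 0
```

If the stacked add-info frame contained the same `Bug_Number` more than once, the left join would repeat the matching bug report rows, so a duplicate check on the right-hand side would be worth adding first.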


### Bug Reports Names and Descriptions

In [26]:
bugreports_final['br_name'] = bugreports_final.apply(lambda row : 'BR_' + str(row['Bug_Number']) + '_SRC', axis=1)
bugreports_final['br_desc'] = bugreports_final.apply(lambda row : ' '.join([str(el) for el in row]), axis=1) 
bugreports_final.head()

Unnamed: 0,Bug_Number,Summary,Platform,Component,Version,Creation_Time,Whiteboard,QA_Whiteboard,First_Comment_Text,First_Comment_Creation_Time,Status,Product,Priority,Resolution,Severity,Is_Confirmed,br_name,br_desc
0,506297,Livemarks with null site/feed uris cause sync ...,All,Sync,unspecified,2009-07-24T17:08:43Z,,,2009-07-24 09:54:28 FaultTolerance D...,2009-07-24T17:08:43Z,RESOLVED,Firefox,--,FIXED,normal,True,BR_506297_SRC,506297 Livemarks with null site/feed uris caus...
1,506338,Enhance Crash Recovery to better help the user,All,Session Restore,Trunk,2009-07-24T19:17:21Z,[crashkill][crashkill-metrics],,When our users crash they are pretty much in t...,2009-07-24T19:17:21Z,NEW,Firefox,--,,enhancement,True,BR_506338_SRC,506338 Enhance Crash Recovery to better help t...
2,506507,Dragging multiple bookmarks in the bookmarks s...,x86,Bookmarks & History,Trunk,2009-07-26T06:16:02Z,,,User-Agent: Mozilla/5.0 (Windows; U; Win...,2009-07-26T06:16:02Z,RESOLVED,Firefox,--,WORKSFORME,normal,True,BR_506507_SRC,506507 Dragging multiple bookmarks in the book...
3,506550,Unreliable Back Button navigating nytimes.com,x86,Extension Compatibility,3.5 Branch,2009-07-26T16:12:10Z,[caused by adblock plus][platform-rel-NYTimes],,User-Agent: Mozilla/5.0 (Windows; U; Win...,2009-07-26T16:12:10Z,RESOLVED,Firefox,--,FIXED,normal,False,BR_506550_SRC,506550 Unreliable Back Button navigating nytim...
4,506575,ALT + F4 when dropdown of autocomplete is open...,x86,Address Bar,3.5 Branch,2009-07-26T20:14:54Z,,,Pressing ALT + F4 when the autocomplete dropdo...,2009-07-26T20:14:54Z,NEW,Firefox,P5,,normal,True,BR_506575_SRC,506575 ALT + F4 when dropdown of autocomplete ...
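
Row-wise `apply` with a Python lambda can be slow on tens of thousands of rows. The sketch below, on a toy frame with hypothetical values, builds the same kind of name with vectorized string operations and the same kind of description by iterating over plain tuples, then checks that both variants agree with the row-wise version. It is an alternative under these assumptions, not a replacement prescribed by the notebook.

```python
import pandas as pd

# Toy frame with one missing value (hypothetical data).
toy = pd.DataFrame({
    'Bug_Number': [506297, 506338],
    'Summary': ['Livemarks bug', 'Crash recovery'],
    'Whiteboard': [float('nan'), '[crashkill]'],
})

# Row-wise versions, as in the cell above.
row_wise_name = toy.apply(lambda row: 'BR_' + str(row['Bug_Number']) + '_SRC', axis=1)
row_wise_desc = toy.apply(lambda row: ' '.join(str(el) for el in row), axis=1)

# Vectorized name: cast the column to str once and concatenate.
vector_name = 'BR_' + toy['Bug_Number'].astype(str) + '_SRC'

# Description from plain tuples; missing values are stringified, as in the row-wise version.
tuple_desc = pd.Series(
    [' '.join(map(str, values)) for values in toy.itertuples(index=False, name=None)],
    index=toy.index,
)

print(vector_name.tolist())  # ['BR_506297_SRC', 'BR_506338_SRC']
print(row_wise_name.tolist() == vector_name.tolist())
print(row_wise_desc.tolist() == tuple_desc.tolist())
```

As in the original cell, the order of operations matters: a description built after the name column is added will also contain the name.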


### Select Bug Reports from Days 01-02/06/2016

In [27]:
bugreports_final[bugreports_final.Bug_Number.isin(bugreports_p4.Bug_Number)].Bug_Number.values

array([ 882753,  945665, 1127927, 1154922, 1223550, 1265967, 1266270,
       1271395, 1271766, 1271774, 1274459, 1274712, 1276070, 1276152,
       1276447, 1276656, 1276818, 1276884, 1276966, 1277114, 1277151,
       1277257])

## TestCases

### Test Cases Names and Descriptions

In [28]:
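# Use the column label instead of positional row[0], which newer pandas versions deprecate for labelled rows
testcases['tc_name'] = testcases.apply(lambda row : 'TC_' + str(row['TC_Number']) + '_TRG', axis=1)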
testcases['tc_desc'] = testcases.apply(lambda row : ' '.join([str(el) for el in row]), axis=1)
testcases.head()

Unnamed: 0,TC_Number,TestDay,Feature_ID,Firefox_Feature,Gen_Title,Crt_Nr,Title,Preconditions,Steps,Expected_Result,tc_name,tc_desc
0,1,20181221,20,<notificationbox> and <notification> changes,<notificationbox> and <notification> changes,1,Notification - Popup Block,,1. Launch Firefox\n2. Navigate to http://www.p...,1. Firefox is successfully launched\n9. The al...,TC_1_TRG,1 20181221 20 <notificationbox> and <notificat...
1,2,20181221,20,<notificationbox> and <notification> changes,<notificationbox> and <notification> changes,2,Notification - Process Hang,,"1. Launch Firefox\n2. In the URL bar, navigate...",1. Firefox is successfully launched\n2. Firefo...,TC_2_TRG,2 20181221 20 <notificationbox> and <notificat...
2,3,20181221,20,<notificationbox> and <notification> changes,<notificationbox> and <notification> changes,3,Verify Notifications appear in RTL Mode,,"1. Launch Firefox\n2. In about:config, change ...",1. Firefox is successfully launched\n2.The for...,TC_3_TRG,3 20181221 20 <notificationbox> and <notificat...
3,4,20181221,20,<notificationbox> and <notification> changes,<notificationbox> and <notification> changes,4,Verify Notifications appear in High Contrast M...,,"1. While the browser is in High Contrast Mode,...",1. Firefox has been launched.\n2. Firefox begi...,TC_4_TRG,4 20181221 20 <notificationbox> and <notificat...
4,5,20181221,20,<notificationbox> and <notification> changes,<notificationbox> and <notification> changes,5,Verify notifications react to differing Zoom l...,,"1. While the browser is in High Contrast Mode,...",1. Firefox has been launched.\n2. Firefox begi...,TC_5_TRG,5 20181221 20 <notificationbox> and <notificat...


### Count Test Cases By FeatureID

In [35]:
testcases[['Feature_ID','Title']].groupby(['Feature_ID']).count()

Feature_ID,Title
1,13
2,11
3,22
4,6
5,8
6,31
7,6
8,2
9,8
10,3


## Save Datasets

In [13]:
bugreports_final.to_csv('../data/mozilla_firefox_v2/firefoxDataset/docs_english/BR/bugreports_final.csv', index=False)
testcases.to_csv('../data/mozilla_firefox_v2/firefoxDataset/docs_english/TC/testcases_final.csv', index=False)