## Hypothesis Test for Hypothesis 1

* Null hypothesis: time to fix bugs >= time to fix other issues
* Alternative hypothesis (the one we want to test): time to fix bugs < time to fix other issues

To start, we read the CSV file created by `issue-extractor.py`.

In [25]:
import pandas as pd

issues_table = pd.read_csv("issues-firefox.csv")
issues_table["created_at"] = pd.to_datetime(issues_table["created_at"])
issues_table["closed_at"] = pd.to_datetime(issues_table["closed_at"])
issues_table

Unnamed: 0,issue_id,creator,created_at,closed_at,labels,n_comments,n_reactions,is_locked
0,942298762,mcl21014,2021-07-12 17:47:45+00:00,NaT,['Bug 🐞'],2,0,False
1,942207474,SimonBasca,2021-07-12 15:58:23+00:00,NaT,['Bug 🐞'],2,0,False
2,942168193,SimonBasca,2021-07-12 15:13:30+00:00,NaT,['Bug 🐞'],0,0,False
3,942151305,SimonBasca,2021-07-12 14:57:32+00:00,NaT,['Bug 🐞'],0,0,False
4,942144807,SimonBasca,2021-07-12 14:50:34+00:00,NaT,['Bug 🐞'],0,0,False
...,...,...,...,...,...,...,...,...
1255,564664520,muhasturk,2020-02-13 12:55:37+00:00,2021-03-08 11:53:49+00:00,"['3', 'Contributor OK', 'P2', 'Q3', 'qa-triaged']",3,0,False
1256,564638507,yusadogru,2020-02-13 12:04:49+00:00,2020-02-18 16:25:29+00:00,"['1', 'P3']",0,0,False
1257,564354707,athomasmoz,2020-02-13 00:00:47+00:00,NaT,['ux:l'],0,0,False
1258,564201713,athomasmoz,2020-02-12 18:59:48+00:00,NaT,['ux:m'],1,0,False


Next we select the data we need: we drop issues that were never closed, compute the time to fix, and filter out outliers, keeping only issues whose time to fix lies within one standard deviation of the mean.

In [65]:
import numpy as np

# Keep only closed issues (open issues have closed_at == NaT)
issues_limpo = issues_table[issues_table["closed_at"].notna()].copy()
# Compute the time to fix
issues_limpo["fix_time"] = issues_limpo["closed_at"] - issues_limpo["created_at"]
# Remove outliers: keep rows within one standard deviation of the mean
deviation = np.abs(issues_limpo["fix_time"] - issues_limpo["fix_time"].mean())
issues_limpo = issues_limpo[deviation <= issues_limpo["fix_time"].std()]
issues_limpo

Unnamed: 0,issue_id,creator,created_at,closed_at,labels,n_comments,n_reactions,is_locked,fix_time
7,941292917,pyrho,2021-07-10 17:36:10+00:00,2021-07-10 19:29:03+00:00,['Bug 🐞'],1,0,False,0 days 01:52:53
8,941292126,pyrho,2021-07-10 17:31:30+00:00,2021-07-12 08:45:20+00:00,"['Bug 🐞', 'qa-triaged']",1,0,False,1 days 15:13:50
14,939351307,st3fan,2021-07-08 00:07:36+00:00,2021-07-09 17:01:09+00:00,['P16107'],0,0,False,1 days 16:53:33
31,927507674,AaronMT,2021-06-22 18:11:42+00:00,2021-06-24 19:28:21+00:00,['Bug 🐞'],1,0,False,2 days 01:16:39
32,927286648,isabelrios,2021-06-22 14:18:20+00:00,2021-06-23 08:03:37+00:00,"['eng:automation', 'eng:intermittent-test']",0,0,False,0 days 17:45:17
...,...,...,...,...,...,...,...,...,...
1251,567355182,muhasturk,2020-02-19 06:54:08+00:00,2020-03-11 15:36:28+00:00,"['QA Verified', 'contrib-patch']",1,0,False,21 days 08:42:20
1252,567202901,athomasmoz,2020-02-18 22:40:15+00:00,2020-03-19 18:59:05+00:00,['3'],3,0,False,29 days 20:18:50
1253,566974047,isabelrios,2020-02-18 15:32:53+00:00,2020-03-04 08:49:38+00:00,['eng:automation'],0,0,False,14 days 17:16:45
1254,566930955,isabelrios,2020-02-18 14:29:55+00:00,2020-03-12 16:05:52+00:00,"['Bug 🐞', 'Needs-Strings', 'Needs-UX', 'P2', '...",7,0,False,23 days 01:35:57
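
The filtering step can be checked on a small example. The sketch below uses hypothetical toy data (the `toy` frame and the `fix_hours` column are not part of the notebook) and converts the time to fix to hours so the arithmetic is easy to follow by hand. It assumes the same rule as above: drop open issues, then keep rows within one sample standard deviation of the mean.

```python
import pandas as pd

# Hypothetical toy data: five closed issues and one still open (closed_at is NaT).
toy = pd.DataFrame({
    "created_at": pd.to_datetime(["2021-01-01 00:00"] * 6, utc=True),
    "closed_at": pd.to_datetime(
        ["2021-01-01 01:00", "2021-01-01 02:00", "2021-01-01 03:00",
         "2021-01-01 04:00", "2021-01-05 04:00", None],
        utc=True,
    ),
})

# Drop the open issue explicitly instead of relying on a string comparison.
closed = toy[toy["closed_at"].notna()].copy()

# Time to fix in hours: 1, 2, 3, 4 and 100.
closed["fix_hours"] = (
    closed["closed_at"] - closed["created_at"]
).dt.total_seconds() / 3600

# Keep rows whose distance from the mean is at most one standard deviation.
mean = closed["fix_hours"].mean()
std = closed["fix_hours"].std()  # pandas uses the sample standard deviation
kept = closed[(closed["fix_hours"] - mean).abs() <= std]

print(kept["fix_hours"].tolist())  # [1.0, 2.0, 3.0, 4.0]
```

Here the mean is 22 hours and the standard deviation is about 43.6 hours, so only the 100-hour issue is removed. On skewed data like this, a one-standard-deviation cutoff mostly trims the long tail of slow fixes.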


Now we split the issues into bugs and non-bugs and extract only the time to fix, which is all the hypothesis test needs.

In [84]:
ttf_bugs = issues_limpo[issues_limpo["labels"].astype(str).str.contains('Bug 🐞')]["fix_time"]
ttf_nonbugs = issues_limpo[~issues_limpo["labels"].astype(str).str.contains('Bug 🐞')]["fix_time"]
ttf_bugs, ttf_nonbugs

(7       0 days 01:52:53
 8       1 days 15:13:50
 31      2 days 01:16:39
 33      1 days 04:34:57
 39      0 days 02:08:29
              ...       
 1238   35 days 05:41:45
 1240   29 days 08:49:31
 1241   29 days 11:17:02
 1250   26 days 07:35:42
 1254   23 days 01:35:57
 Name: fix_time, Length: 300, dtype: timedelta64[ns],
 14      1 days 16:53:33
 32      0 days 17:45:17
 34      1 days 00:13:18
 38      5 days 00:58:55
 40      0 days 07:33:26
              ...       
 1245    6 days 00:28:25
 1251   21 days 08:42:20
 1252   29 days 20:18:50
 1253   14 days 17:16:45
 1256    5 days 04:20:40
 Name: fix_time, Length: 320, dtype: timedelta64[ns])

Running the Mann-Whitney U test with the one-sided alternative `<` (`alternative="less"`) gives us the result we are looking for.

In [83]:
from scipy import stats

# One-sided test: do bugs tend to have smaller fix times than other issues?
mwu, p_value = stats.mannwhitneyu(ttf_bugs, ttf_nonbugs, alternative="less")

if p_value > 0.05:
    print("Null hypothesis not rejected: we cannot conclude that bugs take less time")
else:
    print("Alternative hypothesis accepted: bugs take less time to be fixed")

Alternative hypothesis accepted: bugs take less time to be fixed


Since the p-value fell below the 0.05 threshold used above, we reject the null hypothesis: the data support the alternative hypothesis, the one we wanted to test, that bugs take less time to fix than other issues.
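
To make the test less of a black box, here is a minimal sketch of what the U statistic counts, under one common convention: the number of pairs in which a value from the first sample exceeds a value from the second, with ties counted as one half. The function `u_statistic` and the sample values are hypothetical and only illustrate the idea; they are not the notebook's data and not SciPy's implementation.

```python
def u_statistic(first, second):
    """Count pairs where a value from `first` exceeds one from `second`.

    Ties contribute 0.5 each.
    """
    u = 0.0
    for a in first:
        for b in second:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical fix times in hours.
bug_hours = [2, 5, 9, 30]
other_hours = [6, 24, 48, 72, 96]

u_bugs = u_statistic(bug_hours, other_hours)
max_u = len(bug_hours) * len(other_hours)

print(u_bugs, "out of", max_u)  # 3.0 out of 20
```

A bug took longer than a non-bug issue in only 3 of the 20 pairs. A small U for the first sample is the kind of evidence a one-sided `less` alternative looks for; the p-value measures how unlikely such a small U would be if both groups came from the same distribution.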