# Research Question 3

## Can we assess the quality of generated commit messages by comparing their style?

Strategy: cluster the style embeddings with k-means (for numbers of styles starting from 2) and evaluate the messages in each cluster with a quality model.

For each cluster, print summary statistics of the quality scores (mean, std, min, max, quantiles) to check whether the style embedding carries information about quality.

https://arxiv.org/pdf/2006.00843.pdf
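A minimal sketch of this strategy, assuming the style embeddings are already in a NumPy array and that some quality model has produced one score per message. Here `embeddings` and `quality_scores` are random placeholder data, and the helper name `quality_by_cluster` and the range of k values are illustrative choices, not part of this notebook:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Placeholder inputs (assumption): 500 style embeddings of dimension 16
# and one quality score per message from a hypothetical quality model.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 16))
quality_scores = rng.uniform(0.0, 1.0, size=500)


def quality_by_cluster(embeddings, quality_scores, n_styles):
    """Cluster embeddings with k-means and summarize the quality per cluster."""
    labels = KMeans(n_clusters=n_styles, n_init=10, random_state=0).fit_predict(embeddings)
    frame = pd.DataFrame({"cluster": labels, "quality": quality_scores})
    # describe() reports count, mean, std, min, 25%/50%/75% quantiles and max
    return frame.groupby("cluster")["quality"].describe()


# One summary table per number of styles, starting from 2
summaries = {k: quality_by_cluster(embeddings, quality_scores, k) for k in range(2, 6)}
print(summaries[2])
```

If the style embedding carries information about quality, the per-cluster means and quantiles should differ noticeably between clusters; with the random placeholder scores above they will not.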

In [12]:
import torch
import numpy as np
import pandas as pd
from tqdm import tqdm
from sklearn.metrics.pairwise import euclidean_distances

import warnings
warnings.filterwarnings('ignore')

import sys
sys.path.append("..")
from util.style_model import StyleModel

In [13]:
test_data = pd.read_pickle('../data/04-1c_Authors_Test_Set.pkl')

In [14]:
model = StyleModel()
model.eval()  # inference mode (disables dropout, if the model uses any)
model.load_state_dict(torch.load('../model/Authors_StyleModel.pt', map_location='cpu'))

<All keys matched successfully>

In [15]:
messages = test_data["message"].tolist()

vectors = []

with torch.no_grad():  # no gradients are needed to compute embeddings
    for message in tqdm(messages):
        vectors.append(model(message).squeeze().detach().numpy())

vectors = np.array(vectors)

100%|██████████| 14536/14536 [03:53<00:00, 62.16it/s]


In [16]:
authors_centroids = {}

emails = test_data['author_email'].to_numpy()

for author in np.unique(emails):
    # Boolean mask selects the author's rows by position, matching the rows of `vectors`
    authors_centroids[author] = vectors[emails == author].mean(axis=0)
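The per-author centroids can also be computed in one step by grouping a DataFrame of embeddings, which does not rely on the DataFrame index lining up with the rows of `vectors`. A sketch with small placeholder data (the e-mail addresses and vectors are made up; in this notebook the labels would be `test_data["author_email"].to_numpy()`):

```python
import numpy as np
import pandas as pd

# Placeholder data (assumption): one 3-dimensional embedding per commit message
vectors = np.array([[0.0, 0.0, 1.0],
                    [0.0, 2.0, 1.0],
                    [4.0, 0.0, 0.0],
                    [2.0, 2.0, 2.0]])
author_emails = ["a@example.com", "a@example.com", "b@example.com", "b@example.com"]

# Row i of the embedding frame belongs to author_emails[i]; the mean of each
# group is that author's style centroid (groups come out sorted by e-mail).
centroids = pd.DataFrame(vectors).groupby(np.asarray(author_emails)).mean()
centroids_array = centroids.to_numpy()

print(centroids)
```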

In [17]:
centroids_array = np.array(list(authors_centroids.values()))

## Self-written Messages

In [18]:
good_message_embedding = model("MINOR Removed unused jQuery.dialog creation in CMSMain.AddFor.js, which causes mem leaks (now uses dedicated pages/add UI) ").detach().numpy()
bad_message_embedding = model("Update files").detach().numpy()
worst_message_embedding = model("12345").detach().numpy()

In [25]:
# One column per self-written message, one row per author centroid
distances = pd.DataFrame({
    "Good Message: \"MINOR Removed unused ...\"": euclidean_distances(centroids_array, good_message_embedding).ravel(),
    "Bad Message: \"Update files\"": euclidean_distances(centroids_array, bad_message_embedding).ravel(),
    "Worst Message: \"12345\"": euclidean_distances(centroids_array, worst_message_embedding).ravel(),
})

distances.index = [f"Author {i}" for i in range(1, len(distances) + 1)]

# Styler.set_precision was removed in pandas 2.0; format(precision=...) replaces it
distances.style.background_gradient(cmap='coolwarm', axis=None).format(precision=3)

| | Good Message: "MINOR Removed unused ..." | Bad Message: "Update files" | Worst Message: "12345" |
|---|---|---|---|
| Author 1 | 0.82 | 1.071 | 1.642 |
| Author 2 | 0.588 | 1.157 | 1.738 |
| Author 3 | 0.625 | 1.07 | 1.724 |
| Author 4 | 0.638 | 1.08 | 1.677 |
| Author 5 | 0.613 | 1.106 | 1.771 |
| Author 6 | 0.631 | 1.049 | 1.686 |
| Author 7 | 0.626 | 1.14 | 1.747 |
| Author 8 | 0.607 | 1.09 | 1.728 |
| Author 9 | 0.676 | 0.952 | 1.702 |
| Author 10 | 0.621 | 1.005 | 1.719 |
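The thresholds applied to the dataset messages use the mean Euclidean distance between a message embedding and all author centroids. As a sketch, that score can be written with NumPy broadcasting instead of `euclidean_distances`; the function name `mean_centroid_distance` and the toy vectors are illustrative:

```python
import numpy as np


def mean_centroid_distance(embeddings, centroids):
    """Mean Euclidean distance from each embedding to every centroid.

    embeddings: array of shape (n_messages, dim), or a single vector of shape (dim,)
    centroids:  array of shape (n_authors, dim)
    returns:    array of shape (n_messages,)
    """
    embeddings = np.atleast_2d(embeddings)
    # (n_messages, 1, dim) - (1, n_authors, dim) -> (n_messages, n_authors, dim)
    diffs = embeddings[:, None, :] - centroids[None, :, :]
    distances = np.linalg.norm(diffs, axis=-1)  # (n_messages, n_authors)
    return distances.mean(axis=1)


centroids = np.array([[0.0, 0.0], [6.0, 8.0]])
messages = np.array([[0.0, 0.0], [3.0, 4.0]])
print(mean_centroid_distance(messages, centroids))  # → [5. 5.]
```

Broadcasting materializes the full `(n_messages, n_authors, dim)` difference array, so for all 14536 messages with a large embedding dimension `euclidean_distances` is the more memory-friendly choice.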


## Messages from the dataset

In [37]:
bad_messages = []

for i, vector in enumerate(vectors):
    if np.mean(euclidean_distances(centroids_array, vector.reshape(1, -1))) > 1.4:
        bad_messages.append(messages[i])

print("\n\nDetected", len(bad_messages), "messages of bad quality out of a total of", len(vectors), "messages:\n")
print(*bad_messages, sep="\n\n")



Detected 32 messages of bad quality out of a total of 14536 messages:

[mordred] Add common panels: About, Overview, DataStatus

[panels] discourse panel now uses discourse enrich name

[enrich][studies] With "--no_inc" flag, don't use last enrich in studies

[enrich][meetup] Add group topics as an array to be used directly in kibana

Fix grid search reporting of prediction error for classification.

PUB-<I>: Repro. Now get NaN for cluster centroids for covtype on 1 JVM.

PUBDEV-<I>: Add assertions for Inf/Nan for float/double in GBM prediction.

LImit the POJO preview for K-Means to 1M entries in the centroid table.

Revert back to JavaRNG, the old default behavior. This unbreaks all the tests.

If users of Object[][][] (or similar) experience problems, we might need to revert this.

LSP: Fix error when getting transport pid on Windows

File switcher: Fix error when it tries to get data from the Outline Explorer after closing all files

Dependencies: Improve wording of missing deps 
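The loop above recomputes the distances one message at a time. A vectorized sketch computes the score for all messages at once, prints its summary statistics (mean, std, min, max, quantiles) and applies both thresholds used in this notebook (1.4 and 0.7). The embeddings, centroids and message texts below are placeholders:

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import euclidean_distances

# Placeholder data (assumption): 2-D embeddings and two author centroids
centroids_array = np.array([[0.0, 0.0], [1.0, 0.0]])
vectors = np.array([[0.5, 0.0], [0.5, 1.0], [0.5, 3.0]])
messages = ["close to both authors", "in between", "far from both authors"]

# Shape (n_messages, n_authors); the mean over authors gives one score per message
scores = euclidean_distances(vectors, centroids_array).mean(axis=1)
print(pd.Series(scores).describe())

far_messages = [m for m, s in zip(messages, scores) if s > 1.4]
close_messages = [m for m, s in zip(messages, scores) if s < 0.7]
print(far_messages)
print(close_messages)
```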

In [38]:
good_messages = []

for i, vector in enumerate(vectors):
    if np.mean(euclidean_distances(centroids_array, vector.reshape(1, -1))) < 0.7:
        good_messages.append(messages[i])

print("\n\nDetected", len(good_messages), "messages of good quality out of a total of", len(vectors), "messages:\n")
print(*good_messages, sep="\n\n")



Detected 5429 messages of good quality out of a total of 14536 messages:

[release] Update version number to <I>

[logs] Remove logs related to getting last update.

[release] Update version number to <I>

[tests] Remove not needed imports

[task_collection] Add a specific tag param to arthur collected raw items

[enrich][twitter] Check that twitter is in projects map before using it

Before trying to get the map between a hashtag and a projects,
check that twitter is included in the projects map.

[gh2arthur] Use "scm" and data source for git in projects table.

[enrich] Support in safe_index to receive None as the name of the index

[release] Update version number to <I>

[release] Update version number to <I>

[enrich][discourse] The category_id must be a string in the projects mapping

[release] Update version number to <I>

[release] Update version number to <I>

[config] Add a set_param method to change a config param

[release] Update version number to <I>

Increase version pac