# Overview
When clustering GitHub Issues, we ultimately want to link those clusters of issues back to the code base, and the most straightforward way to do that is through the commits that are linked to the issues within the cluster. The goal is to create a concise, helpful, and actionable summary of the commits within an issue cluster.

This notebook addresses [Develop Cluster Commits Summary](https://github.com/it-com-engineering/CodeIQ/issues/68) and also builds a simple UI around the results of the topic modeling, allowing us to easily view the results and break them down by both repository and cluster.

The code that returns the summary has been added to `RepoClusters.get_repo_cluster_commit_report()`.

## Motivation
The driving idea behind examining the commits is the assumption that a commit "solves" the issue it is linked to. If a cluster of issues comes with a group of commits, then the solution to a new issue added to that cluster will likely resemble the code changes in the cluster's commits. The commit summary would therefore serve as a solution recommendation for a brand new ticket, giving a developer a jump-start in discovering which parts of the code base need to be modified to resolve the new issue.

## Areas to Explore
* Sort the commits by time and then show how the files within the commits changed over time
* Apply the [Apriori](https://en.wikipedia.org/wiki/Apriori_algorithm) algorithm to the sets of changed files within each commit to develop association rules describing which files are likely to be changed when certain other file(s) are changed
    * Example: `(file_5, file_2 => file_1)` states that commits which change `file_5` and `file_2` often change `file_1` as well
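
To make the association-rule idea concrete, here is a minimal, self-contained sketch of Apriori over sets of changed files. It is not the code behind `RepoClusters.get_repo_cluster_commit_report()`; the function name `mine_file_rules`, its parameters, the thresholds, and the sample commits are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_file_rules(commits, min_support=0.4, min_confidence=0.8, max_size=3):
    """Hypothetical helper: mine rules (antecedent, consequent, support, confidence).

    `commits` is a list of sets, each holding the files changed in one commit.
    """
    transactions = [frozenset(c) for c in commits]
    n = len(transactions)
    if n == 0:
        return []

    support = {}  # frequent itemset -> fraction of commits containing it
    current = [frozenset([f]) for f in {f for t in transactions for f in t}]
    size = 1
    while current and size <= max_size:
        counts = Counter()
        for t in transactions:
            for candidate in current:
                if candidate <= t:
                    counts[candidate] += 1
        frequent = {c: k / n for c, k in counts.items() if k / n >= min_support}
        support.update(frequent)

        # Join step: merge frequent itemsets into candidates one item larger.
        size += 1
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == size}
        # Prune step: keep a candidate only if all of its subsets are frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, size - 1))
        ]

    rules = []
    for itemset, sup in support.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                # confidence = support(antecedent and consequent) / support(antecedent)
                confidence = sup / support[antecedent]
                if confidence >= min_confidence:
                    consequent = itemset - antecedent
                    rules.append((tuple(sorted(antecedent)),
                                  tuple(sorted(consequent)), sup, confidence))
    return sorted(rules, key=lambda rule: (-rule[3], -rule[2], rule[0], rule[1]))


# Made-up commits, each represented by the set of files it changed.
example_commits = [
    {"file_1", "file_2", "file_5"},
    {"file_1", "file_2", "file_5"},
    {"file_1", "file_2", "file_5"},
    {"file_1", "file_3"},
    {"file_2", "file_4"},
]
rules = mine_file_rules(example_commits)
for antecedent, consequent, sup, confidence in rules:
    print(f"{antecedent} => {consequent}  support={sup:.2f} confidence={confidence:.2f}")
```

In this made-up data, every commit that touches both `file_2` and `file_5` also touches `file_1`, so the rule `(file_2, file_5) => (file_1,)` is reported. The reverse direction is not guaranteed: `file_1` also changes in a commit without the other two, so `(file_1,) => (file_2, file_5)` falls below the confidence threshold.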

### Some available stats
* For a single commit
    * Files
        * list all files modified within the commit
        * list the change_type of each file
        * nloc changed in each file
        * complexity of each file
    * Methods
        * list all of the methods modified within the file/commit
        * nloc changed in each method
        * complexity of each method
        * whether the method parameters changed at all
        * list of authors who made the changes
        
* For a cluster of commits
    * Files
        * list all files modified within all of the commits
            * sort by (open question): number of commits the file was changed in, nloc changed, or increase/decrease in complexity
        * show Counter of change_type for each file
        * average nloc changed per file across all of the commits
        * average complexity of the file across all of the commits
    * Methods
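
The cluster-level file stats listed above could be aggregated roughly as follows. This is only a sketch under stated assumptions: the per-file record layout (dicts with `filename`, `change_type`, `nloc`, and `complexity` keys), the helper name `summarize_cluster_files`, and the sample values are hypothetical and do not reflect the actual `RepoClusters` implementation.

```python
from collections import Counter, defaultdict
from statistics import mean


def summarize_cluster_files(commits):
    """Hypothetical helper: aggregate per-file stats over a cluster's commits.

    `commits` is a list of commits; each commit is a list of per-file records
    (dicts with the assumed keys filename, change_type, nloc, complexity).
    A file is assumed to appear at most once per commit.
    """
    records_by_file = defaultdict(list)
    for files in commits:
        for record in files:
            records_by_file[record["filename"]].append(record)

    report = []
    for filename, records in records_by_file.items():
        report.append({
            "filename": filename,
            "n_commits": len(records),
            "change_types": Counter(r["change_type"] for r in records),
            "avg_nloc": mean(r["nloc"] for r in records),
            "avg_complexity": mean(r["complexity"] for r in records),
        })
    # Sort by the number of commits a file was changed in, most-changed first.
    return sorted(report, key=lambda row: (-row["n_commits"], row["filename"]))


# Made-up records for two commits in one cluster.
example_cluster = [
    [
        {"filename": "a.py", "change_type": "MODIFY", "nloc": 10, "complexity": 4},
        {"filename": "b.py", "change_type": "ADD", "nloc": 20, "complexity": 2},
    ],
    [
        {"filename": "a.py", "change_type": "MODIFY", "nloc": 30, "complexity": 6},
    ],
]
file_report = summarize_cluster_files(example_cluster)
for row in file_report:
    print(row)
```

Sorting by commit count is just one of the candidate orderings; swapping the sort key for `avg_nloc` or `avg_complexity` gives the other views.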
        

In [None]:
%load_ext autoreload
%autoreload 2

import pandas as pd
pd.options.mode.chained_assignment = None 

In [33]:
from RepoClusters import RepoClusters
repoClusters = RepoClusters("data/12_02_graph.xgmml")
repoClusters.vectorize().cluster(k=5)
repo_ids = repoClusters.get_repo_ids()

In [12]:
import ipywidgets as widgets

def get_repo_cluster_issue_descriptions(repoClusters, repo_id, cluster_id):
    return repoClusters.get_repo_cluster_issues(repo_id, cluster_id)["description"].tolist()

repo_outputs = {}
accordions = {}
for repo_id in repoClusters.repo_ids:
    
    repo_outputs[repo_id] = [widgets.Output() for _ in repoClusters.get_repo_cluster_ids(repo_id)]
    accordions[repo_id] = widgets.Accordion(children=repo_outputs[repo_id])

    for output, cluster_id in zip(repo_outputs[repo_id], repoClusters.get_repo_cluster_ids(repo_id)):
        # Build the commit report once per cluster and reuse it for both sections
        report = repoClusters.get_repo_cluster_commit_report(repo_id, cluster_id)
        with output:
            print("Number of Issues")
            print(len(repoClusters.get_repo_cluster_issues(repo_id, cluster_id)))
            print("\nAssociation Rules")
            display(report["Association Rules"].round(2))
            print("\nFile Report")
            display(report["File Report"])
            print("Issue Descriptions")
            description_labels = [widgets.Label(value=description) 
                                  for description in get_repo_cluster_issue_descriptions(repoClusters, repo_id, cluster_id)]
            display(widgets.VBox(description_labels))

    for idx, cluster_id in enumerate(repoClusters.get_repo_cluster_ids(repo_id)):
        accordions[repo_id].set_title(idx, repoClusters.get_repo_cluster_name(repo_id, cluster_id))

tab_nest = widgets.Tab()
tab_nest.children = list(accordions.values())
for idx, repo_id in enumerate(repoClusters.repo_ids):
    tab_nest.set_title(idx, repo_id)
tab_nest  

Tab(children=(Accordion(children=(Output(), Output(), Output(), Output(), Output()), _titles={'0': 'Mind a Pul…

In [80]:
repo_ids = repoClusters.get_repo_ids()


repo_id = repo_ids[1]
cluster_id = 1

print(f"Number of issues in topic: {len(get_repo_cluster_issue_descriptions(repoClusters, repo_id, cluster_id))}")
print(repoClusters.get_repo_cluster_name(repo_id, cluster_id))
print("\nIssues\n")
# Fetch and sort the cluster's issues once, then iterate over the columns we print
issues = repoClusters.get_repo_cluster_issues(repo_id, cluster_id).sort_values("distance")
columns = ["number", "distance", "title", "description"]
for number, distance, title, description in issues[columns].itertuples(index=False, name=None):
    print(title)
    print(number)
    print(distance)
    print(description)
    print("\n---------------------------------------------\n")

Can't start Npm
710
1.979217508526888
[2019-03-10T13_38_53_805Z-debug.log](https://github.com/kelektiv/node.bcrypt.js/files/2949514/2019-03-10T13_38_53_805Z-debug.log)  I was fixing some issues with bcrypt but didn't figured out if I corrected it or not. Now is the problem with "gulp build", maybe I did something wrong? Can you help me plz?  node: v10.15.3 npm: 6.4.1   

---------------------------------------------

Comparison fails under Ubuntu 16.03 if the hash was generated on a Windows 10 machine
639
1.9793622618597801
I ran into an issue with invalid salt revisions and failed comparisons when I generated hashes on Windows 10 (locally) and compared them under Ubuntu 16.03 (Heroku). Both sides use Node v8.11.3.  `bcrypt.hashSync('hello', 12)` on a windows machine results in something like `$2b$12$kSvPARGDmLE6lC0JKq0cXuRVgkqUljlGvF5IKeavwkwOCh8/JEXDC`.  Against my expectations the following returns false under Ubuntu/Heroku: `bcrypt.compareSync('hello', '$2b$12$kSvPARGDmLE6lC0JKq0cX


Fallback to source compile if pre-built module is ABI incompatible
530
2.4304290419304007
A pre-built module which is ABI incompatible should be detected and the installation should perform a source build. Needed for Alpine Linux Docker images.  Related issues: #528   Waiting for: https://github.com/mapbox/node-pre-gyp/issues/309

---------------------------------------------

mod; support io.js
269
2.4380415687360784
nan 1.5.0 adds support for io.js allowing the bindings to build 

---------------------------------------------

Why is async 'recommended'?
303
2.438284757420275
Is there a performance reason for using the async methods? Isn't hashing CPU bound? 

---------------------------------------------

updated async readme
481
2.4400410513807573
Feedback appreciated on this..

---------------------------------------------

fix: propagate async context
583
2.4432896691297006

---------------------------------------------

Add version 0.8.7 to changelog
429
2.4464828920168684


--