# Ant

## 1. Consolidate commits
---

To get the release dates, I look for dates of commits tagged for release. To do this, I ran git log as follows:
```sh
git log --no-walk --tags --pretty="%%aD h \t %d" --decorate=auto > ant_tags.csv
```
This is saved in `ant_tags.csv`


Next, to get all commits made to date, I ran the following command:
```sh
git log --pretty=format:"%h%ad%s" > ant_commits.csv
```
This is saved as ant_tags.csv.

I had to preprocess the csvs to remove any timezone info. I used the following:
```sh
/^(?:Z|[+-](?:2[0-3]|[01][0-9]):[0-5][0-9])$/
```

The goal of the rest of this code is combine the above two files to create a consolidated list of commits and release dates as seperate csv files. Each of the individual files will pertain to one release, i.e., all the commits that have been made post that release and prior to the next release.

### 1.1.  Imports

In [1]:
from __future__ import print_function
import os, sys, shlex
import pandas as pd
from pdb import set_trace
import warnings
from datetime import datetime
import dateutil
import subprocess
from glob2 import glob

warnings.filterwarnings("ignore")

root = os.path.join(os.getcwd().split("bug_miner")[0], "bug_miner")
if root not in sys.path:
    sys.path.append(root)

### 1.2. Read the csv files

1. The date is raw, so read it and add column labels.
2. Convert column 1 (which is a string) to a datetime format.
3. Sort data chronologically


In [2]:
def get_data_as_dframe():
    releases = pd.read_csv("ant_tags.csv", delimiter="  ", header=None)
    releases.columns = ["Timestamp", "Commit_ID", "Version"]
    all_commits = pd.read_csv("ant_commits.txt", delimiter="___", header=None)
    all_commits.columns = ["Timestamp", "Commit_ID", "Commit Message"]
    "Formate Datetime"
    releases["Timestamp"] = releases["Timestamp"].apply(lambda x: dateutil.parser.parse(x))
    all_commits["Timestamp"] = all_commits["Timestamp"].apply(lambda x: dateutil.parser.parse(x))
    "Sort data chronologically"
    releases = releases.sort_values(by="Timestamp").reset_index(drop=True)
    all_commits = all_commits.sort_values(by="Timestamp").reset_index(drop=True)
    
    return all_commits, releases

### 1.3. Find commits between releases

In [3]:
all_commits, releases = get_data_as_dframe()
commits_made = dict()
for current_release_date, next_release_date, current_release_version in zip(releases["Timestamp"][:-1], releases["Timestamp"][1:], releases["Version"][:-1]):
    commits_made.update({current_release_version: list()})
    for index, commit in all_commits.iterrows():
        if current_release_date <= commit["Timestamp"] < next_release_date:
            commits_made[current_release_version].append(commit.values.tolist()[1:])

### 1.4. Save commits between each of the releases as csv (to be processed later..)

In [4]:
for version, commits_between in commits_made.iteritems():
    commits = pd.DataFrame(commits_between, columns=["Hash", "Message"])
    spath = os.path.abspath(os.path.join(root, "bug_miner/projects/consolidated/ant", "ant-{}.csv".format(version)))
    commits.to_csv(spath, index=False)

## 2. Create A Dataset

Now the task is to compute CK Metrics using the [ckjm tool](https://github.com/dspinellis/ckjm) for one release, then, look for files that changed after one release and before the next release using the commit hashes.

### 2.1 Compute CK Metrics for releases

In [None]:
metrics = []
# Location of the ckjm metrics suit
ckjm_path = os.path.expanduser("~/git/ckjm/target/runable-ckjm_ext-2.3-SNAPSHOT.jar")
# The root dir of the test project's git repo.
project_src = os.path.join(root, "bug_miner/projects/raw/ant/") 
# Get all the release data
version_commits = glob(os.path.join(root, "bug_miner/projects/consolidated/ant/*.csv"))

def git_checkout(hash):
    return subprocess.call("git checkout -f {}".format(hash), stdout=subprocess.PIPE, stderr=open(os.devnull, "w"), shell=True)

def run_ckjm():
    for class_file in glob(project_src+"/**/*.class"):        
        cmd = shlex.split("java -jar {} -s {}".format(ckjm_path, class_file))
        print(subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()[0])
        set_trace()

    
#     print(subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=open(os.devnull, "w"))

def save_metrics():
    for version in version_commits:
        hashes = pd.read_csv(version)["Hash"]
        git_checkout(hashes.values[0])
        metrics = run_ckjm().communicate()
        set_trace()
        
#     print("<metrics>", "\n".join(metrics), "</metrics>", sep="\n", file=open(os.path.join(save_path, version + ".xml"), "w+"))
    
    
if __name__ == "__main__":
    save_metrics()

org.apache.tools.ant.IntrospectionHelper$1 2 2 0 6 3 0 0 6 0 0.5000 17 0.0000 0 0.8571 0.7000 0 0 6.5000
 ~ Object create(org.apache.tools.ant.Project project, Object parent, Object ignore): 1
 ~ void <init>(org.apache.tools.ant.IntrospectionHelper this$0, java.lang.reflect.Method m, Object): 1


> <ipython-input-13-7b63432500bb>(13)run_ckjm()
-> for class_file in glob(project_src+"/**/*.class"):


In [None]:
version_commits