Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
252 changes: 252 additions & 0 deletions src/main/kotlin/app/CodeLongevity.kt
Original file line number Diff line number Diff line change
@@ -0,0 +1,252 @@
// Copyright 2017 Sourcerer Inc. All Rights Reserved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

author

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np, I'll file a follow up email on it

// Author: Alexander Surkov (alex@sourcerer.io)

package app

import org.eclipse.jgit.diff.DiffFormatter
import org.eclipse.jgit.diff.DiffEntry
import org.eclipse.jgit.diff.RawText
import org.eclipse.jgit.internal.storage.file.FileRepository
import org.eclipse.jgit.lib.ObjectId
import org.eclipse.jgit.lib.ObjectLoader
import org.eclipse.jgit.lib.Repository
import org.eclipse.jgit.revwalk.RevCommit
import org.eclipse.jgit.revwalk.RevWalk
import org.eclipse.jgit.treewalk.TreeWalk
import org.eclipse.jgit.util.io.DisabledOutputStream

import java.text.SimpleDateFormat
import java.util.Date

/**
* Represents a code line in a file revision.
*/
class RevCommitLine(val commit: RevCommit, val file: String, val line: Int)

/**
* Represents a code line in repo's history.
*
* TODO(Alex): the text arg is solely for testing proposes (remove it)
*/
class CodeLine(val from: RevCommitLine, val to: RevCommitLine, val text: String) {

// TODO(alex): oldId and newId may be computed as a hash built from commit,
// file name and line number, if we are going to send the data outside a
// local machine.

/**
* Id of the code line in a revision when the line was added. Used to
* identify a line and update its lifetime computed at the previous
* iteration.
*/
val oldId: String = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we don't send lines to server oldId and newId is hashes of commits in which they appeared and deleted?
Do we want to send lines to server?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the server should keep (hash, age) pairs, which are used for age calculation for sure and to update age of living lines on a subsequent run. The server may but doesn't have to keep the records of deleted lines.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of storing them on the server? We could use local storage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we should avoid sending anything to server except for stuff we can't do without. we don't need some blogger saying to everyone that sourcerer uploads per-line information of your proprietary code. if it can be done locally, let's do it locally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I see how commit info is different from line info when uploaded from a local machine in terms of blogger's speculations, but I guess lines can be stored and processed locally if we can set up a local DB for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please no line info uploads to server. Local DB is fine assuming it does not introduce new dependencies to the app. A plain text file is probably the best.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok, however it's out of scope of this pull request, which was about alg to compute lines age. So, what's left before we can call it done? I assume, revert Main.kt changes, remove testing func, anything else? Or do you want to extend the scope to support lines age storage?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading it now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just in case, it's not a wanted change, right? :)


/**
* Id of the code line in a revision, where the line was deleted, or a head
* revision, if the line is alive.
*/
val newId: String = ""

/**
* The code line's age in seconds.
*/
val age = to.commit.getCommitTime() - from.commit.getCommitTime()

/**
* A pretty print of a code line; debugging.
*/
override fun toString() : String {
val df = SimpleDateFormat("yyyy-MM-dd HH:mm z")
val fd = df.format(Date(from.commit.getCommitTime().toLong() * 1000))
val td = df.format(Date(to.commit.getCommitTime().toLong() * 1000))
val fc = "${from.commit.getName()} '${from.commit.getShortMessage()}'"
val tc = "${to.commit.getName()} '${to.commit.getShortMessage()}'"
return "Line '$text' - '${from.file}:${from.line}' added in $fc $fd\n" +
" last known as '${to.file}:${to.line}' in $tc $td"
}
}

/**
* Used to compute age of code lines in the repo.
*/
class CodeLongevity(repoPath: String, tailRev: String) {
val repo = FileRepository(repoPath)
val head: RevCommit =
RevWalk(repo).parseCommit(repo.resolve("refs/heads/master"))
val tail: RevCommit? =
if (tailRev != "") RevWalk(repo).parseCommit(repo.resolve(tailRev))
else null

/**
* A list of all code lines, both alive and deleted, between the given
* revisions.
*/
var codeLines: MutableList<CodeLine> = mutableListOf()

init {
compute()
}

// TODO(alex) debugging, remove it
fun ohNoDoesItReallyWork(email: String) {
var sum: Long = 0
var total: Long = 0
for (line in codeLines) {
val author = line.from.commit.getAuthorIdent()
if (author.getEmailAddress() != email) {
continue
}
println(line.toString())
println(" Age: ${line.age} secs")
sum += line.age
total++
}

//println("All lines:")
//codeLines.forEach { line -> line.printme() }

var avg = if (total > 0) sum / total else 0
println("avg code line age for <$email> is ${avg} seconds, lines total: ${total}")
}

/**
* Scans through the repo for alive and deleted code lines, and stores them
* in the [codeLines] list.
*/
private fun compute() {
val treeWalk = TreeWalk(repo)
treeWalk.setRecursive(true)
treeWalk.addTree(head.getTree())

val files: MutableMap<String, ArrayList<RevCommitLine>> = mutableMapOf()

// Build a map of file names and their code lines.
while (treeWalk.next()) {
val path = treeWalk.getPathString()
val fileLoader = repo.open(treeWalk.getObjectId(0))
if (!RawText.isBinary(fileLoader.openStream())) {
val fileText = RawText(fileLoader.getBytes())
var lines = ArrayList<RevCommitLine>(fileText.size())
for (idx in 0 .. fileText.size() - 1) {
lines.add(RevCommitLine(head, path, idx))
}
files.put(path, lines)
}
}

val df = DiffFormatter(DisabledOutputStream.INSTANCE)
df.setRepository(repo)
df.setDetectRenames(true)

val revWalk = RevWalk(repo)
revWalk.markStart(head)

var commit: RevCommit? = revWalk.next() // move the walker to the head
while (commit != null && commit != tail) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably just while (commit != tail) will work just as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if tail it not empty and not a tail (just a random revision), then we have to crash, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asurkov yes, you right, keep it then

var parentCommit: RevCommit? = revWalk.next()

println("commit: ${commit.getName()}; '${commit.getShortMessage()}'")
if (parentCommit != null) {
println("parent commit: ${parentCommit.getName()}; '${parentCommit.getShortMessage()}'")
}
else {
println("parent commit: null")
}

// A step back in commits history. Update the files map according
// to the diff.
val diffs = df.scan(parentCommit, commit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is scan works if parentCommit is null? Probably precondition while would be more readable and no additional test would be required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's the way to obtain a first commit changes / could you give an example?

Copy link
Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

var commit: RevCommit? = revWalk.next()
while (commit != null && commit != tail) {
    var parentCommit: RevCommit? = revWalk.next()
    ...
    commit = parentCommit
}

for (diff in diffs) {
val oldPath = diff.getOldPath()
val oldId = diff.getOldId().toObjectId()
val newPath = diff.getNewPath()
val newId = diff.getNewId().toObjectId()
println("old: '$oldPath', new: '$newPath'")

// Skip binary files.
var fileId = if (newPath != DiffEntry.DEV_NULL) newId else oldId
if (RawText.isBinary(repo.open(fileId).openStream())) {
continue
}

// TODO(alex): does it happen in the wilds?
if (diff.changeType == DiffEntry.ChangeType.COPY) {
continue
}

// File was deleted, put its lines into the files map.
if (diff.changeType == DiffEntry.ChangeType.DELETE) {
val fileLoader = repo.open(oldId)
val fileText = RawText(fileLoader.getBytes())
var lines = ArrayList<RevCommitLine>(fileText.size())
for (idx in 0 .. fileText.size() - 1) {
lines.add(RevCommitLine(commit, oldPath, idx))
}
files.put(oldPath, lines)
}

// If a file was deleted, then the new path is /dev/null.
val path = if (newPath != DiffEntry.DEV_NULL) newPath else oldPath
var lines = files.get(path)!!

// Update the lines array to match the diff's edit list changes.
// Traverse the edit list backwards to keep indices of the edit
// list and the lines array in sync.
val editList = df.toFileHeader(diff).toEditList().asReversed()
for (edit in editList) {
val delStart = edit.getBeginA()
val delEnd = edit.getEndA()
val delCount = edit.getLengthA()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't delCount == delEnd - delStart ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's same, they provide then method, I use it. I'm ok with either way and can change it if find it more readable.

var insStart = edit.getBeginB()
var insEnd = edit.getEndB()
val insCount = edit.getLengthB()
println("del ($delStart, $delEnd), ins ($insStart, $insEnd)")

// Deletion case. Chase down the deleted lines through the
// history.
if (delCount > 0) {
var tmpLines = ArrayList<RevCommitLine>(delCount)
for (idx in delStart .. delEnd - 1) {
tmpLines.add(RevCommitLine(commit, oldPath, idx))
}
lines.addAll(delStart, tmpLines)
}

// Insertion case. Track it.
if (insCount > 0) {
val fileLoader = repo.open(newId)
val fileText = RawText(fileLoader.getBytes())

for (idx in insStart .. insEnd - 1) {
val from = RevCommitLine(commit, newPath, idx)
var to = lines.get(idx)
val cl = CodeLine(from, to, fileText.getString(idx))
codeLines.add(cl)
}
lines.subList(insStart, insEnd).clear()
}
}

// File was renamed, tweak the files map.
if (diff.changeType == DiffEntry.ChangeType.RENAME) {
files.set(oldPath, files.remove(newPath)!!)
}
}
commit = parentCommit
}

// If a tail revision was given then the map has to contain unclaimed
// code lines, i.e. the lines added before the tail revision. Push
// them all into the result lines list, so the caller can update their
// ages properly.
if (tail != null) {
for ((file, lines) in files) {
for (idx in 0 .. lines.size - 1) {
val from = RevCommitLine(tail, file, idx)
val cl = CodeLine(from, lines[idx], "no data (too lazy to compute)")
codeLines.add(cl)
}
}
}
}
}