This repository has been archived by the owner on Mar 28, 2023. It is now read-only.

meetuparchive/git-linecat

git linecat

😽 a utility for transforming and categorizing git log output

🤔 about

The only constant in software is change, which raises the question: what patterns of change occur in your software project?

Git is a database of change, but it does not provide an interface for analyzing that change. This is where git-linecat can help.

📦 install

🍺 Via Homebrew

$ brew tap meetup/tools
$ brew install git-linecat

🏷️ Via GitHub Releases

Prebuilt binaries for OSX and Linux are available for download directly from GitHub Releases:

$ curl -L \
 "https://github.com/meetup/git-linecat/releases/download/v0.0.0/git-linecat-v0.0.0-$(uname -s)-$(uname -m).tar.gz" \
  | tar -xz

🤸 usage

Expects input in the form

$ git log --pretty=format:'"%H","%ae","%ai"' --numstat --no-merges

Emits output as newline-delimited JSON for further analysis.
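
To make the input and output shapes concrete, here is a minimal Python sketch of the transformation. It is not git-linecat's actual implementation: the field names are borrowed from the Athena table definition in this README, the `linecat` function name and the `ext` derivation are assumptions, the timestamp is passed through as git prints it, and `category` is left out because the categorization rules are not described here.

```python
import csv
import json
import os


def linecat(lines, repo):
    """Yield one dict per changed file from `git log --numstat` text (hypothetical helper)."""
    sha = author = timestamp = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # blank lines separate commits
        if line.startswith('"'):
            # Commit header produced by --pretty=format:'"%H","%ae","%ai"'
            sha, author, timestamp = next(csv.reader([line]))
            continue
        # numstat line: additions<TAB>deletions<TAB>path
        added, deleted, path = line.split("\t", 2)
        yield {
            "repo": repo,
            "sha": sha,
            "author": author,
            "timestamp": timestamp,
            "path": path,
            # Assumption: the extension is whatever follows the last dot
            "ext": os.path.splitext(path)[1].lstrip("."),
            # git prints "-" for binary files; this sketch counts them as 0
            "additions": 0 if added == "-" else int(added),
            "deletions": 0 if deleted == "-" else int(deleted),
        }


# Made-up sample input in the expected format
sample = [
    '"abc123","dev@example.com","2018-03-01 10:00:00 -0500"\n',
    "10\t2\tsrc/main.rs\n",
    "-\t-\tlogo.png\n",
    "\n",
    '"def456","ops@example.com","2018-03-02 11:30:00 -0500"\n',
    "1\t1\tCODEOWNERS\n",
]

records = list(linecat(sample, "your/repo"))
# Newline-delimited JSON: one JSON object per line
ndjson = "\n".join(json.dumps(r) for r in records)
print(ndjson)
```

Each changed file becomes its own record carrying the commit's sha, author, and timestamp, which is what lets the queries further down group by path, extension, or author.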

👩‍🔬 analyzing data

AWS Athena makes it easy to both ask and answer questions about your JSON-formatted git data.

You can load git data into AWS Athena by piping git log into git-linecat along with a repository name, then uploading the result to AWS S3.

$ git log --pretty=format:'"%H","%ae","%ai"' --numstat --no-merges \
	| git-linecat -r your/repo \
	| aws s3 cp - s3://your-s3-bucket/linecat.json

In the Athena console, create a "table" for your data. A table is simply a pointer to an S3 bucket where your data is stored, plus a description of the shape of that data.

CREATE EXTERNAL TABLE if not exists gitlog (
	repo string,
	sha string,
	author string,
	`timestamp` date,
	path string,
	category string,
	ext string,
	additions int,
	deletions int
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://your-s3-bucket/'

🔎 Sample queries

top kinds of files by frequency of change

select ext, count(*) as cnt
from gitlog
group by ext
order by cnt desc
limit 10

top paths by frequency of change

select count(*) as cnt, path
from gitlog
group by path
order by cnt desc
limit 10

top paths introducing net additions to code

select path, sum(additions - deletions) as net_adds
from gitlog
group by path
order by net_adds desc
limit 10

top changers of code ownership

select count(*) as changes, author
from gitlog
where path = 'CODEOWNERS'
group by author
order by changes desc
limit 10
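
For a quick check without Athena, the same aggregations can be approximated locally over the newline-delimited JSON. The sketch below is illustrative only: the sample records are made up, and the field names are assumed to match the table definition above.

```python
import json
from collections import Counter, defaultdict

# Made-up sample rows as (author, path, ext, additions, deletions)
rows = [
    ("dev@example.com", "src/main.rs", "rs", 10, 2),
    ("dev@example.com", "src/main.rs", "rs", 5, 1),
    ("ops@example.com", "README.md", "md", 3, 0),
    ("ops@example.com", "CODEOWNERS", "", 1, 1),
    ("dev@example.com", "CODEOWNERS", "", 2, 0),
    ("ops@example.com", "CODEOWNERS", "", 1, 0),
]
# Serialize to newline-delimited JSON to mimic the tool's output format
ndjson = "\n".join(
    json.dumps(
        {"author": a, "path": p, "ext": e, "additions": add, "deletions": rem}
    )
    for a, p, e, add, rem in rows
)

# Parse one JSON object per non-empty line
records = [json.loads(line) for line in ndjson.splitlines() if line.strip()]

# top kinds of files by frequency of change
by_ext = Counter(r["ext"] for r in records).most_common(10)

# top paths introducing net additions to code
net = defaultdict(int)
for r in records:
    net[r["path"]] += r["additions"] - r["deletions"]
top_net = sorted(net.items(), key=lambda kv: kv[1], reverse=True)[:10]

# top changers of code ownership
owners = Counter(
    r["author"] for r in records if r["path"] == "CODEOWNERS"
).most_common(10)

print(by_ext)
print(top_net)
print(owners)
```

This only suits data that fits in memory; for larger histories or multiple repositories, the Athena queries above are the better fit.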

tips

You may find the Presto functions that Athena supports helpful when authoring queries.

👩‍🏭 development

This is a Rust application. Grab the Rust toolchain with rustup.

Meetup, Inc.
