Analyze forks network to find hidden gems
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
bin
config
deps
example/mouseless
out
scripts
.gitignore
.gitmodules
Dockerfile
LICENSE
README.md
docker-compose.yml

README.md

git-forks-analysis

Analyze forks network to find interesting forks, commits, file changes.

Why build it?

  • Frustration with navigation forks graph / network on GitHub
  • Too many forks with nothing interesting and noisy commits
  • Wanting to find changes made to a file, function or a set of lines across the network to avoid duplicating the work (fixing similar bugs or building similar features)

How does it work?

  • Gets all forks and all branches into a single repository
  • Make available git data mining tools such as gitinspector and git_stats. Use the tools in this repository as they have been modified to work across all branches
  • View other Tips below on how to search across the forks network using Git commands

How to install it?

git clone https://github.com/hbt/git-forks-analysis
docker-compose pull 

How to use it?

To generate HTML visualization of forks

cd bin

./analyze-forks username repository

# retrieve all direct forks for https://github.com/hbt/mouseless
./analyze-forks hbt mouseless

# retrieve all forks recursively (the whole network) https://github.com/frost-nzcr4/find_forks -- purposefully chose a small repo to avoid running this by mistake
./analyze-forks-deep frost-nzcr4 find_forks

Calling the gitinspector directly via CLI

./bin/gitinspector /out/mouseless
./bin/gitinspector --grading=true /out/mouseless
./bin/gitinspector --grading=true  --format=html /out/mouseless

What does it look like?

How to find interesting forks?

How to search for commits per file across all forks and branches?

cd out/mouseless
git log --all background_scripts/extension-reloader.js

# include diffs
git log -p --all background_scripts/extension-reloader.js

How to search for changes in a function across all forks and branches?

# look for contributions to a specific function across all forks
git log --all -p -L ":getCenters":lib/model/PersonInfo.php --ignore-all-space --ignore-space-change --ignore-space-at-eol --ignore-blank-lines


# also accepts line numbers range
git log --all -p -L 13,20:lib/model/PersonInfo.php --ignore-all-space --ignore-space-change --ignore-space-at-eol --ignore-blank-lines

#modify ~/.gitattributes to add language support
#*.php diff=php
#*.js diff=node

#normalize the repo in case of ^M
#Note: this might corrupt some file formats (e.g binary, images etc.)
#https://superuser.com/questions/293941/rewrite-git-history-to-replace-all-crlf-to-lf

#**Note: perform all operations on tmpfs. Much faster**
#normalize the whole repo and its history
# specific file  (much faster) -- few minutes
git filter-branch --tree-filter 'git ls-files lib/model/PersonInfo.php -z | xargs -0 fromdos' -- --all

#whole repo but takes longer
git filter-branch --tree-filter 'git ls-files -z | xargs -0 fromdos' -- --all


#Alternative if language is not properly supported is to use pickaxe
git log --all -p -S"function createHints"  content_scripts/hints.js

# for some languages, --function-context works well
git log --all -p -ScreateHints --function-context content_scripts/hints.js

How to search commits that have not been merged?

git log -p --all --not master --no-merges

How to find most modified files in forks?

git log --all --pretty=format: --name-only --not master --no-merges | sort | uniq -c | sort -rg | head -10

Other git data mining tools worth a mention

Contribute: Get in touch if you have a git data mining tool recommendation