Kospex is a CLI that aims to "look at the guts of your code" to help gain insights into your developers and technology landscape. It uses the database structure from the excellent Mergestat lite to model data from git repositories.
kospex is currently a collection of tools, bound together by Python in a Docker container. It works by analysing cloned repositories on the filesystem.
See the section "Git code layout for running analysis" below. For a simple one-off analysis, just clone some repos into a directory as a starting point.
Ideally, a structure like
/BASE/GIT_SERVER/ORG/REPO
git clone https://github.com/kospex/kospex.git
Optional but strongly recommended: use a Python virtual env.
pip install -r requirements.txt
Follow the instructions for installing scc
export PATH=$PATH:$PWD
Add this directory to your path; the kospex toolkit is a collection of Python executables.
If you are ok to use the ~/code directory for cloned repos, then run:
kospex init --default
kospex sync [GIT_REPO]
kospex developers -repo [GIT_REPO]
kospex tech-landscape -metadata
You can also use the kgit command to clone and sync a repo you have access to
kgit clone -sync -repo https://github.com/mergestat/mergestat-lite
The above command will clone into the KOSPEX_CODE/GIT_SERVER/ORG/REPO structure
One option, if you're inspecting code on your own laptop, is to use your home directory.
~/kospex/
We'll place config files and the kospex DB (SQLite3) in here for sync'ed data.
~/code/
This should be your GIT_DATA_DIRECTORY with a structure like
GIT_SERVER/ORG/REPO
For example, in your ~/code it might look like:
github.com/kospex/kospex
github.com/mergestat/mergestat-lite
This gives us a deterministic way of separating different orgs, and potentially different instances as well (e.g. you have an on-premises Bitbucket and also use GitHub.com).
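The layout above can be sketched in a few lines of Python. This is only an illustration of the GIT_SERVER/ORG/REPO convention; `clone_path` is a hypothetical helper, not a kospex function, and it assumes an HTTPS clone URL of the form https://SERVER/ORG/REPO.

```python
from pathlib import Path
from urllib.parse import urlparse


def clone_path(base_dir, clone_url):
    """Map an HTTPS clone URL to BASE/GIT_SERVER/ORG/REPO.

    Hypothetical helper for illustration only; not part of kospex.
    """
    parsed = urlparse(clone_url)
    parts = [p for p in parsed.path.split("/") if p]
    if not parsed.netloc or len(parts) < 2:
        raise ValueError(f"expected https://SERVER/ORG/REPO, got {clone_url!r}")
    org, repo = parts[0], parts[1]
    # Drop a trailing ".git" so the directory is named after the repo itself.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return Path(base_dir) / parsed.netloc / org / repo


if __name__ == "__main__":
    print(clone_path("/BASE", "https://github.com/kospex/kospex.git"))
```

With this convention, two repos with the same name under different orgs or servers never collide on disk.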
- Identify technology landscape
- Identify active developers (e.g. who's had their code committed in the last 90 days)
- Identify key person or offboarding risk
- Identify potential complexity challenges (or conceptual integrity concerns)
- Aggregate repo metadata into a single database for easier and faster querying
kospex sync PATH/TO/GIT_REPO
Most functions require the data to be synced.
List the active developers (90 days) in the given repo (sync required)
kospex developers -repo PATH/TO/REPO
List the developers in the given repo_id (sync'ed data in the kospex DB)
kospex developers -repo_id=github.com~ORG~REPO
Use -days NUM to show developers seen in the last NUM days (e.g. 90 or 365)
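To make the "active in the last N days" idea concrete, here is a minimal sketch of the filtering logic. It is not how kospex implements the command; `active_developers` and the `(author_email, committed_at)` input shape are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def active_developers(commits, days=90, now=None):
    """Return sorted author emails with at least one commit in the last `days` days.

    `commits` is an iterable of (author_email, committed_at) pairs, where
    committed_at is a timezone-aware datetime. Hypothetical helper, not kospex code.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    # A set removes duplicates when one author has several recent commits.
    return sorted({email for email, when in commits if when >= cutoff})


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        ("alice@example.com", now - timedelta(days=10)),
        ("bob@example.com", now - timedelta(days=200)),
    ]
    print(active_developers(sample, days=90))
    print(active_developers(sample, days=365))
```

Widening the window from 90 to 365 days picks up developers who have committed within the year but not recently.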
List the overall tech stack, based on file extension, for all sync'ed repos
kospex tech-landscape
List the overall tech stack for all sync'ed repos (using scc)
kospex tech-landscape -metadata
List the overall tech stack for a repo (using scc)
kospex tech-landscape -repo PATH/TO/REPO
List the tech stack in the given repo_id (sync'ed data in the kospex DB)
kospex tech-landscape -repo_id=github.com~ORG~REPO
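The extension-based view of a tech stack can be approximated by counting file suffixes in a checkout. The sketch below only illustrates that idea; `extension_counts` is a hypothetical helper, and the real command may group or label results differently.

```python
from collections import Counter
from pathlib import Path


def extension_counts(repo_dir):
    """Count files per extension under repo_dir, skipping the .git directory.

    Hypothetical helper for illustration; not the kospex implementation.
    """
    root = Path(repo_dir)
    counts = Counter()
    for path in root.rglob("*"):
        # Ignore git's own object store and anything that is not a regular file.
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        counts[path.suffix.lower() or "(none)"] += 1
    return counts


if __name__ == "__main__":
    for ext, n in extension_counts(".").most_common(10):
        print(f"{ext}\t{n}")
```

A plain extension count is only an indicator: it says nothing about lines of code or complexity, which is where a tool like scc adds detail.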
- Precompute data where possible and useful
- Flatten tables, data warehouse style, to enable easier querying and slicing by git server, owner and repo
- Be as agnostic as possible to the git provider (e.g. GitHub, Bitbucket, GitLab) for base use cases
- Be mindful that "there is no perfect", only indicators
- Separate cloning and pull updates from the analysis
- Build out automated functional and regression testing (currently manual)
- Build the ability to identify key person or offboarding risk
- Improve use case documentation
- Provide a mechanism to better map author_emails to users
Most tables have a column, _repo_id, in the format of GIT_SERVER~OWNER~REPO
So for the repository https://github.com/kospex/kospex the _repo_id would be github.com~kospex~kospex
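As a rough sketch of the _repo_id format, the helpers below build an ID from a repository URL and split it back into its parts. Both function names are hypothetical, and the code assumes the separator is a plain ASCII tilde (~) and that none of the three parts contains one.

```python
from urllib.parse import urlparse

SEPARATOR = "~"  # assumption: plain ASCII tilde between server, owner and repo


def repo_id_from_url(url):
    """Build a GIT_SERVER~OWNER~REPO identifier from an HTTPS repo URL.

    Hypothetical helper for illustration; not part of kospex.
    """
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if not parsed.netloc or len(parts) < 2:
        raise ValueError(f"expected https://SERVER/OWNER/REPO, got {url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return SEPARATOR.join([parsed.netloc, owner, repo])


def split_repo_id(repo_id):
    """Split an identifier back into (git_server, owner, repo)."""
    server, owner, repo = repo_id.split(SEPARATOR)
    return server, owner, repo


if __name__ == "__main__":
    rid = repo_id_from_url("https://github.com/kospex/kospex")
    print(rid)  # github.com~kospex~kospex
    print(split_repo_id(rid))
```

Keeping server, owner and repo in one flat key is what makes it easy to filter any table by a single repo, or by prefix for a whole org or server.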
Most queries use author_email from git to mean a "developer". This is not always accurate as GitHub
We're aiming to [k]now your c[o]de by in[spe]cting the haruspe[x]. From Wikipedia: the Latin terms haruspex and haruspicina are from an archaic word, hīra = "entrails, intestines".
So we're going to help look at the "guts of your code" to gain an understanding of the applications, technology landscape (sprawl?) and developers.