Calculate Git LFS data (storage) usage for each repo #133

Open
anthonyfok opened this issue Feb 8, 2022 · 0 comments

anthonyfok commented Feb 8, 2022

While GitHub does show Git LFS data usage, it only lists the top 5 repos and hides the rest.

It would be nice to have a daily-updated web page and/or a CLI tool that shows our Git LFS data usage across all repos, both for diagnostics and to estimate the bandwidth needed for mirroring or cloning repos.

$ gh repo list OpenDRR --limit 5 --public

Showing 5 of 35 repositories in @OpenDRR that match your search

OpenDRR/opendrr-api    REST API for OpenDRR data / API REST pour les données OpenDRR                                                  public  1h
OpenDRR/model-factory  OpenQuake compilation and data manipulation scripts                                                            public  11h
OpenDRR/python-env     Docker image for Linux based python environment                                                                public  4d
OpenDRR/riskprofiler   Web Application to Support Disaster Resilience / Application web pour soutenir la résilience aux catastrophes  public  5d
OpenDRR/boundaries     Boundary geometries for model results in Geopackage format.                                                    public  5d

Mini HOWTOs

To get a list of all our repos (including private and archived ones):

gh repo list OpenDRR --limit 200 | cut -f1

(or borrow from @DamonU2's work on #125, where a direct API call is used.)

For each repo (using OpenDRR/boundaries as example):

To clone a repo without checking out LFS files:

GIT_LFS_SKIP_SMUDGE=1 gh repo clone OpenDRR/boundaries

To sum up LFS data storage usage for all files in the repo:

~/OpenDRR/boundaries$ git lfs ls-files --debug | grep size: | grep -o '[0-9]\+' | paste -sd + - | bc | numfmt --to=iec --round=nearest --format="%.2f"
8.34G
~/OpenDRR/boundaries$ git lfs ls-files --debug --all | grep size: | grep -o '[0-9]\+' | paste -sd + - | bc | numfmt --to=iec --round=nearest --format="%.2f"
20.50G

where the relevant options for git lfs ls-files are:

  • -d --debug:
    Show as much information as possible about a LFS file. This is intended
    for manual inspection; the exact format may change at any time.

  • -a --all:
    Inspects the full history of the repository, not the current HEAD (or other
    provided reference). This will include previous versions of LFS objects that
    are no longer found in the current tree.
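
As a possible alternative to the grep/paste/bc pipeline above, the summation can be sketched with a single awk call. The function name `sum_lfs_sizes` is hypothetical, and the sketch assumes each LFS object is reported on a line whose first field is `size:` followed by a byte count. Since the `--debug` format may change at any time, this may need adjusting.

```shell
# Hypothetical helper (not part of git-lfs): sum the byte counts found on
# "size:" lines of `git lfs ls-files --debug` output read from stdin.
sum_lfs_sizes() {
    # %.0f is used instead of %d because some awk implementations
    # clamp %d to 32 bits, which would truncate multi-gigabyte totals.
    awk '$1 == "size:" { total += $2 } END { printf "%.0f\n", total }'
}

# Usage, from inside a clone (add --all to include the full history):
#   git lfs ls-files --debug | sum_lfs_sizes
```

Matching on the first field rather than grepping for `size:` anywhere on the line avoids counting digits from a file path that happens to contain that string.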

The 20.50G figure matches that reported by GitLab at https://gitlab.com/groups/OpenDRR/-/usage_quotas#storage-quota-tab. It is actually 20.50 GiB (1 GiB = 1024^3 bytes). When numfmt --to=si is used, it is 22.01 GB (1 GB = 1000^3 bytes).
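
To make the GiB versus GB difference concrete, here is a small sketch that prints one byte count in both units. The function name `bytes_to_gib_and_gb` is hypothetical, and the byte count 22011707392 is only an illustrative value (exactly 20.5 × 1024^3), not the measured total for the repo.

```shell
# Hypothetical helper: print a byte count in binary (GiB) and decimal (GB) units.
bytes_to_gib_and_gb() {
    awk -v bytes="$1" 'BEGIN {
        printf "%.2f GiB\n", bytes / (1024 ^ 3)   # 1 GiB = 1024^3 bytes
        printf "%.2f GB\n",  bytes / (1000 ^ 3)   # 1 GB  = 1000^3 bytes
    }'
}

# Illustrative input only: 20.5 * 1024^3 bytes.
bytes_to_gib_and_gb 22011707392
# 20.50 GiB
# 22.01 GB
```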

While git lfs ls-files --size also gives size information, it reports sizes in human-readable form (e.g. 2.5 GB) and is thus not as precise.

Credit (for the use of paste, bc and numfmt): linux - Sum up numbers with KB/MB/GB/TB/PB... suffixes - Unix & Linux Stack Exchange
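
Putting the per-repo steps above together, a CLI tool could look roughly like the sketch below. The function name `lfs_usage_report` and the output layout are hypothetical. It assumes `gh` and `git lfs` are installed and authenticated, and that each LFS object appears on a `size:` line in the `--debug` output. Each repo is still cloned in full (minus LFS content), so this is not cheap to run.

```shell
# Hypothetical helper: for each "owner/repo" name read from stdin, print
# "<total LFS bytes><TAB><owner/repo>", counting the full history (--all).
lfs_usage_report() {
    workdir=$(mktemp -d) || return 1
    while IFS= read -r repo; do
        [ -n "$repo" ] || continue
        dest="$workdir/$(basename "$repo")"
        # Skip the smudge filter so no LFS content is downloaded.
        # </dev/null keeps the commands from consuming the repo list on stdin.
        GIT_LFS_SKIP_SMUDGE=1 gh repo clone "$repo" "$dest" \
            </dev/null >/dev/null 2>&1 || continue
        total=$(git -C "$dest" lfs ls-files --debug --all </dev/null |
            awk '$1 == "size:" { sum += $2 } END { printf "%.0f\n", sum }')
        printf '%s\t%s\n' "$total" "$repo"
    done
    rm -rf "$workdir"
}

# Usage (largest first):
#   gh repo list OpenDRR --limit 200 | cut -f1 | lfs_usage_report | sort -rn
```

Repos that fail to clone are silently skipped here; a real tool would probably want to report them.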


For the size of the Git repo itself without counting LFS storage:

$ curl https://api.github.com/repos/OpenDRR/boundaries 2> /dev/null | grep size | tr -dc '[:digit:]'
7294

Credit: https://stackoverflow.com/questions/8646517/how-can-i-see-the-size-of-a-github-repository-before-cloning-it
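
The grep/tr extraction above concatenates every digit on every line containing "size", which could give a wrong number if the response ever has more than one such line (for a fork, I believe the nested parent and source objects carry their own size fields). A slightly stricter sketch follows. The function name `repo_size_from_json` is hypothetical, and it assumes the API's pretty-printed layout with top-level keys indented by exactly two spaces. The value is understood to be in kilobytes, which is worth checking against the API documentation.

```shell
# Hypothetical helper: read a repository JSON document (as returned by
# https://api.github.com/repos/OWNER/REPO) on stdin and print the first
# top-level "size" value. Assumes one key per line, two-space indent.
repo_size_from_json() {
    sed -n 's/^  "size": *\([0-9][0-9]*\),\{0,1\}$/\1/p' | head -n 1
}

# Usage:
#   curl -s https://api.github.com/repos/OpenDRR/boundaries | repo_size_from_json
```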

There is also https://github.com/github/git-sizer which "[c]ompute[s] various size metrics for a Git repository, flagging those that might cause problems".

@anthonyfok anthonyfok added the Task label Feb 8, 2022
@anthonyfok anthonyfok added this to the Sprint 51 milestone Feb 8, 2022
@anthonyfok anthonyfok self-assigned this Feb 8, 2022
anthonyfok added a commit to anthonyfok/github-experiments that referenced this issue Feb 9, 2022
@anthonyfok anthonyfok modified the milestones: Sprint 51, Sprint 52 Feb 15, 2022
@anthonyfok anthonyfok modified the milestones: Sprint 52, Sprint 53 Feb 28, 2022
@drotheram drotheram modified the milestones: Sprint 53, Sprint 56 Apr 29, 2022