[add] Calculate disk usage of fetched model and display the same #38

Merged
merged 1 commit into from
Mar 30, 2022

@Riyabelle25 Riyabelle25 commented Mar 28, 2022

Solves #7
This uses the GitHub API to get information about the model repo in question, then greps the API response to show the repo size in KB.
My changes to achieve this are:

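        import os

        # model_id and echo() come from the surrounding code.
        url = "https://github.com/ersilia-os/{0}".format(model_id)
        # Extract "owner/repo" from the URL, query the GitHub API, and keep the "size" line.
        # The pipeline is a raw string so that "\." reaches perl unchanged.
        cmd = "echo " + url + r" | perl -ne 'print $1 if m!([^/]+/[^/]+?)(?:\.git)?$!' | xargs -I{} curl -s -k https://api.github.com/repos/'{}' | grep size"
        echo(
            "The disk storage of this model in KB is"
        )
        os.system(cmd)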

Here is how the output currently looks:
[Image: Screenshot 2022-03-28 at 3 30 23 PM]
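
For comparison, the same lookup could be done without a shell pipeline. The following is only a minimal sketch, not the code merged in this PR: it assumes the standard library is enough, and the helper names (`repo_slug`, `size_from_response`, `fetch_repo_size_kb`) are hypothetical. It relies on the `size` field of the repository API response, which is the value the `grep size` step above picks out.

```python
import json
import re
import urllib.request


def repo_slug(url):
    """Return 'owner/repo' from a GitHub URL, dropping an optional '.git' suffix."""
    # Same pattern as the perl one-liner in the PR description.
    match = re.search(r"([^/]+/[^/]+?)(?:\.git)?$", url)
    if match is None:
        raise ValueError("not a repository URL: " + url)
    return match.group(1)


def size_from_response(body):
    """Read the 'size' field (in KB) from a repository API response body."""
    return int(json.loads(body)["size"])


def fetch_repo_size_kb(url, timeout=10):
    """Query https://api.github.com/repos/<owner>/<repo> and return its size in KB."""
    api_url = "https://api.github.com/repos/" + repo_slug(url)
    with urllib.request.urlopen(api_url, timeout=timeout) as response:
        return size_from_response(response.read().decode("utf-8"))
```

Parsing the JSON avoids depending on how the response happens to be laid out line by line. `fetch_repo_size_kb` needs network access and, when unauthenticated, is subject to the GitHub API rate limit.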

@Riyabelle25
Contributor Author

Hi @miquelduranfrigola, please review my solution and let me know your thoughts.
This is a quick fix, as I have used the GitHub API before in previous open-source projects.

@miquelduranfrigola
Member

Hi @Riyabelle25 - many thanks for this good first step. Is there a way to track the size of files stored in Git LFS through the GitHub API? I like your solution but, since big files are stored in Git LFS, those would be the most important to track.

@Riyabelle25
Contributor Author

Riyabelle25 commented Mar 28, 2022

The GitHub API currently does not support tracking the size of LFS files, though there is a feature request for it.

I have been exploring other solutions, and a tentative one would be:

  1. cat the .gitattributes to get the file extensions that are tracked by LFS.
  2. Iterate through the remote repo and find the files ending with those extensions.
  3. grep through each file's content for its listed size. (I read that pointer files mention the size of the LFS file.) For example:

     outs:
     - md5: 6781e0baec8d65b9a95a3e879a5098d1
       size: 800
       path: data.h5

  4. Add up the sizes to get the total size.
An easier quick fix might be to use the GitHub API to search through the model's repo for pointer files and then add up their sizes. I will look into this approach tomorrow.
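
One possible way to enumerate candidate files through the API is sketched below. It is an assumption on my side that the Git Trees endpoint (`/repos/{owner}/{repo}/git/trees/{ref}?recursive=1`) is a suitable substitute for a search; the helper names and the `ref` argument are hypothetical. Note that the `size` reported for each tree entry is the size of the blob stored in git, so for an LFS-tracked file it would be the pointer's size, not the size of the large file.

```python
import json


def tree_url(slug, ref):
    """Build the recursive Git Trees API URL for 'owner/repo' at a branch, tag, or SHA."""
    return "https://api.github.com/repos/{0}/git/trees/{1}?recursive=1".format(slug, ref)


def blob_paths(tree_response_body, suffixes):
    """Return paths of file entries ('blob' type) whose name ends with one of `suffixes`."""
    tree = json.loads(tree_response_body).get("tree", [])
    return [
        entry["path"]
        for entry in tree
        if entry.get("type") == "blob" and entry["path"].endswith(tuple(suffixes))
    ]
```

The returned paths would then have to be fetched and scanned for their listed size.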

Member

@miquelduranfrigola miquelduranfrigola left a comment

I am merging this PR (thanks!) while we keep working on the Git LFS issue.

@miquelduranfrigola miquelduranfrigola merged commit 85f19ee into ersilia-os:master Mar 30, 2022
@Riyabelle25
Contributor Author

Thank you @miquelduranfrigola!
