Add progress bar when fetching a model #6

Closed
miquelduranfrigola opened this issue Mar 21, 2022 · 22 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@miquelduranfrigola
Member

Background

The fetch command downloads a model from the GitHub repository corresponding to the model's identifier, for example, eos4e40. In verbose mode (ersilia -v fetch eos4e40), the logging is displayed on the terminal. However, without verbose mode (ersilia fetch eos4e40), the user has no sense of how far the download and setup of the model on the local computer have progressed.

Requested feature

Add a progress bar or percentage to the fetch command. Progress should ideally capture:

  1. Downloading all files from the model repository. A challenge is that large files are stored as part of Git LFS, which may make it more difficult to estimate the size of the repository.
  2. Time needed to perform the packing functions, in particular, creating conda environments or docker containers.

miquelduranfrigola added the enhancement, good first issue, and help wanted labels on Mar 21, 2022
@victorabba
Contributor

I would love to contribute to this issue.

@KundaiChasinda

I would like to contribute to this issue @miquelduranfrigola

@prtk2001

I'm currently working on this issue @miquelduranfrigola

@GemmaTuron
Member

Hello @KundaiChasinda, @prtk2001

Before continuing work on this issue, please go to issue #36, as it is a required initial step for working on the project!
Thanks!

@miquelduranfrigola
Member Author

Hi @prtk2001 thanks for your interest in this issue. Can you briefly explain to me your approach to the problem? I just want to make sure that your solution corresponds to what we need :)

@prtk2001

prtk2001 commented Mar 28, 2022

Hi @miquelduranfrigola, my basic approach is to use tqdm. The problem I'm encountering is getting the size of files that are tracked through git-lfs: what I actually get is the size of the pointer file. I'm trying to figure out a way around that.
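
One possible angle on the pointer problem, offered as a sketch rather than as anything Ersilia implements: a Git LFS pointer file is a small text file that itself records the real object size on a `size` line. The helper name `lfs_pointer_size` and the example pointer contents below are hypothetical.

```python
from typing import Optional

LFS_VERSION_PREFIX = "version https://git-lfs.github.com/spec/"


def lfs_pointer_size(pointer_text: str) -> Optional[int]:
    """Return the real object size recorded in a Git LFS pointer file.

    Returns None if the text does not look like an LFS pointer.
    """
    lines = pointer_text.splitlines()
    # The first line of a pointer file names the LFS spec version.
    if not lines or not lines[0].startswith(LFS_VERSION_PREFIX):
        return None
    for line in lines[1:]:
        key, _, value = line.partition(" ")
        # The "size" key holds the size in bytes of the real file.
        if key == "size" and value.isdigit():
            return int(value)
    return None


# Hypothetical pointer contents, for illustration only.
example_pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393\n"
    "size 12345\n"
)
print(lfs_pointer_size(example_pointer))
```

Summing these sizes over all pointer files in a repository would give a total that could be passed to tqdm as `total=...`.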

@miquelduranfrigola
Member Author

Indeed. Getting the size of git-lfs files is key here. Please see the related issue #7, currently being tackled by @Riyabelle25.

Using tqdm sounds good to me 👌

@miquelduranfrigola
Member Author

One possible workaround would be to calculate the size of the model and store this information as metadata in each model repository. For example, for model eos4e40 (https://github.com/ersilia-os/eos4e40) we could add a metadata.json file containing model size information. What are your thoughts?
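
As a rough illustration of this workaround, and assuming a metadata.json with a `size_bytes` field (the field name and the numbers are invented here, not an agreed schema), the stored size could be turned into a percentage like this:

```python
import json


def load_total_bytes(metadata_text: str) -> int:
    """Read the expected model size from metadata.json contents."""
    metadata = json.loads(metadata_text)
    # "size_bytes" is a hypothetical field name, not an agreed schema.
    return int(metadata["size_bytes"])


def percent_done(downloaded_bytes: int, total_bytes: int) -> float:
    """Return download progress as a percentage, capped at 100."""
    if total_bytes <= 0:
        return 0.0
    return min(100.0, 100.0 * downloaded_bytes / total_bytes)


# Hypothetical metadata contents for illustration.
example_metadata = '{"model_id": "eos4e40", "size_bytes": 2000}'
total = load_total_bytes(example_metadata)
print(percent_done(500, total))
```

With tqdm, the same total could instead be passed as `tqdm(total=total, unit="B", unit_scale=True)` and advanced with `update(n)` as bytes arrive.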

@prtk2001

This is a unique approach. I would love to implement it.

@Rufaida98
Contributor

@miquelduranfrigola your idea of calculating the size, so that it shows how much progress has been made, is an interesting approach.

@miquelduranfrigola
Member Author

Alright, so let's take this avenue, if you both agree @prtk2001 and @Rufaida98! It will not be straightforward, though.

Many things happen at "fetching" time (ersilia fetch ...), including downloading, creating folders, creating environments, deleting folders, doing tests, etc.

I suggest the following solution:

  1. Create a metadata.json file where each step is stored along with the time taken on a standard computer with average internet bandwidth (e.g. Ersilia's workstation).
  2. The metadata file can also contain total disk usage.
  3. This metadata file can be stored in every model repository.

Then, at fetching time:

  4. The first thing we can do is check the metadata.json file.
  5. Since this file contains steps and estimated timepoints, we have a way of building a progress bar.

So I suggest the following. I (@miquelduranfrigola) will work on points 1, 2 and 3. As soon as I am done, I will notify you. Then @prtk2001 and @Rufaida98 can suggest an approach for points 4 and 5. What do you think?
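
To make the idea of a time-based progress bar concrete, here is a minimal sketch. It assumes the metadata lists each step with an estimated duration in seconds; the step names, the `seconds` field and the numbers are all invented for illustration.

```python
from typing import Dict, List, Tuple


def progress_after_each_step(steps: List[Dict]) -> List[Tuple[str, float]]:
    """Return (step name, cumulative percent) pairs from estimated times."""
    total = sum(step["seconds"] for step in steps)
    elapsed = 0.0
    progress = []
    for step in steps:
        elapsed += step["seconds"]
        # Each step advances the bar in proportion to its estimated time.
        percent = 100.0 * elapsed / total if total else 100.0
        progress.append((step["name"], percent))
    return progress


# Hypothetical step estimates, as they might appear in metadata.json.
example_steps = [
    {"name": "download", "seconds": 30},
    {"name": "create_environment", "seconds": 60},
    {"name": "test", "seconds": 10},
]
for name, percent in progress_after_each_step(example_steps):
    print(f"{name}: {percent:.0f}%")
```

Weighting by estimated time rather than by step count keeps the bar from jumping when one step, such as creating a conda environment, dominates the total.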

@Rufaida98
Contributor

@miquelduranfrigola Amazing! I really want to learn, and I'm happy to work with a colleague, @prtk2001 :)

@prtk2001

@miquelduranfrigola sounds good, I'm in!
Same from this side, @Rufaida98 :)

@Riyabelle25
Contributor

@miquelduranfrigola I'd love to work with/help you on pts 1,2,3. If you have an approach in mind (aside from this), do share!

@mahamtariq58

Hi! Hope you are well.
I have completed the installation steps and I am really interested in this issue. Can I start working on this issue as well?

@miquelduranfrigola
Member Author

Hi @Riyabelle25 and @prtk2001 I am very delayed with this - my apologies. Please give me some time, I am aware I am the bottleneck. I hope you are still in.

@Riyabelle25, my approach will be to add checkpoints in the fetching process (simple JSON files stored on disk). I have used workflow managers in the past, but in this case I want to keep it simple (Ersilia already has too many dependencies).
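
A dependency-free sketch of what such on-disk checkpoints might look like follows. The file layout, the function names `write_checkpoint` and `completed_steps`, and the step names are assumptions made for illustration, not Ersilia's actual implementation.

```python
import json
import time
from pathlib import Path
from typing import List


def write_checkpoint(directory: Path, step: str) -> Path:
    """Record that a fetch step finished by writing a small JSON file."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{step}.json"
    path.write_text(json.dumps({"step": step, "finished_at": time.time()}))
    return path


def completed_steps(directory: Path) -> List[str]:
    """Return the names of steps with a checkpoint file, sorted by name."""
    if not directory.is_dir():
        return []
    names = []
    for checkpoint in directory.glob("*.json"):
        record = json.loads(checkpoint.read_text())
        names.append(record["step"])
    return sorted(names)
```

A progress display could then poll `completed_steps` and compare the result against the known list of steps, using only the standard library.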

@mahamtariq58 thanks for volunteering. I think that at this point we do not need assistance with this. Many thanks, though!

@Riyabelle25
Contributor

Riyabelle25 commented Apr 8, 2022

@miquelduranfrigola apologies for sounding noob-ish 🙈. Your approach is to add Pythonic checkpoints during the ersilia fetch command, correct?
Does this ultimately translate to adding checkpoints during git fetch itself, since that's how we're fetching the model from its repository?

@Riyabelle25
Contributor

Screenshot 2022-04-08 at 5 06 01 PM

Aight, so this is how I'm calculating the time taken for each step (defined in `fetch.py`) and showing the same in the CLI.

Opening a PR for the same now, @miquelduranfrigola do take a look 😄
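
The screenshot itself is not reproduced here. As a generic illustration of per-step timing (not the code from the PR), a small context manager could record and print how long each step takes. The name `timed_step` and the step name are hypothetical.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed_step(name: str, timings: Dict[str, float]) -> Iterator[None]:
    """Measure how long the enclosed block takes and report it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the step raises an exception.
        timings[name] = time.perf_counter() - start
        print(f"{name}: {timings[name]:.2f}s")


timings: Dict[str, float] = {}
with timed_step("example_step", timings):
    sum(range(100_000))  # stand-in for real work such as a download
```

Timings gathered this way on a reference machine could also feed the per-step estimates discussed earlier in the thread.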

@Rufaida98
Contributor

Following the suggestions from @miquelduranfrigola and @Riyabelle25 for the metadata.json file, I think both of your approaches are amazing. What do you think about checking the metadata file by defining a test function that performs the encoding and outputs the size, using "assert", e.g. (assert len(encode...)==f.tell)?
Sorry for talking a lot, but let me know your opinions, because you are more experienced than me :)

@miquelduranfrigola
Member Author

Hi @Rufaida98, thanks for giving it some thought. Have you checked the open PR thread, #188?
I would like to understand what you mean by encoding in this context. I am sure your suggestion makes sense, but I don't fully understand it at the moment. Can you elaborate a bit more? Thanks!

@GemmaTuron
Member

Hi all!

As an update, @nataliyah123 is working on this issue. We will keep you posted on the progress!

@GemmaTuron
Member

Hi all,

We have decided to temporarily leave this feature as is, see #528

miquelduranfrigola pushed a commit that referenced this issue Dec 21, 2023
Add Github action to lint when PR comment is made

8 participants