Add progress bar when fetching a model #6

miquelduranfrigola · 2022-03-21T10:00:03Z

Background

The fetch command downloads a model from the GitHub repository corresponding to the model's identifier, for example, eos4e40. In verbose mode (ersilia -v fetch eos4e40), the log is displayed on the terminal. However, if verbose mode is not used (ersilia fetch eos4e40), the user has no sense of progress while the model is downloaded and set up on the local computer.

Requested feature

Add a progress bar or percentage to the fetch command. Progress should ideally capture:

Downloading all files from the model repository. A challenge is that large files are stored as part of Git LFS, which may make it more difficult to estimate the size of the repository.
Time needed to perform the packing functions, in particular, creating conda environments or docker containers.


victorabba · 2022-03-27T15:18:04Z

I would love to contribute to this issue.

KundaiChasinda · 2022-03-27T19:46:16Z

I would like to contribute to this issue @miquelduranfrigola

prtk2001 · 2022-03-28T02:44:22Z

I'm currently working on this issue @miquelduranfrigola

GemmaTuron · 2022-03-28T06:22:23Z

Hello @KundaiChasinda, @prtk2001

Before continuing work on this issue, please go to issue #36, as this is a required initial step for working on the project!
Thanks!

miquelduranfrigola · 2022-03-28T16:22:44Z

Hi @prtk2001 thanks for your interest in this issue. Can you briefly explain to me your approach to the problem? I just want to make sure that your solution corresponds to what we need :)

prtk2001 · 2022-03-28T17:05:39Z

Hi @miquelduranfrigola, my basic approach is to use tqdm. The problem I'm encountering is fetching the size of files that are tracked through git-lfs: it actually shows the size of the pointer file. I'm trying to figure out a way around that.
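To illustrate the pointer problem: a Git LFS pointer is a small text file with a `version` line, an `oid` line, and a `size` line holding the real object size in bytes. The sketch below reads that `size` value when a file looks like a pointer and falls back to the on-disk size otherwise. The helper names (`lfs_pointer_size`, `real_file_size`) and the 1024-byte cutoff are assumptions made for illustration, not part of Ersilia.

```python
from pathlib import Path
from typing import Optional

# First line of a Git LFS pointer file.
LFS_SPEC_PREFIX = "version https://git-lfs.github.com/spec/v1"

# Assumption: pointer files are tiny, so anything larger is treated as real content.
MAX_POINTER_BYTES = 1024


def lfs_pointer_size(text: str) -> Optional[int]:
    """Return the object size recorded in an LFS pointer, or None if not a pointer."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith(LFS_SPEC_PREFIX):
        return None
    for line in lines[1:]:
        key, _, value = line.partition(" ")
        if key == "size" and value.isdigit():
            return int(value)
    return None


def real_file_size(path: Path) -> int:
    """Size in bytes of the tracked object if `path` is a pointer, else the file size."""
    on_disk = path.stat().st_size
    if on_disk < MAX_POINTER_BYTES:
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            # Small binary file, so it cannot be a pointer.
            return on_disk
        size = lfs_pointer_size(text)
        if size is not None:
            return size
    return on_disk
```

Summing `real_file_size` over a checked-out repository would give a total that could serve as the `total` of a tqdm bar, provided the pointers have not yet been replaced by the real files.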

miquelduranfrigola · 2022-03-28T17:35:47Z

Indeed. Getting the size of git-lfs files is key here. Please see related issue #7, currently being tackled by @Riyabelle25.

Using tqdm sounds good to me 👌

miquelduranfrigola · 2022-03-29T15:55:02Z

One possible workaround would be to calculate the size of the model and store this information as metadata in each model repository, correspondingly. For example, for model eos4e40 (https://github.com/ersilia-os/eos4e40) we could add a metadata.json file containing model size information. What are your thoughts?
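A minimal sketch of how such a size entry could be produced, assuming the model repository has been fully downloaded (LFS files included) to a local folder. The key names (`model_id`, `size_bytes`) and the helper functions are hypothetical; the thread does not define a schema for metadata.json.

```python
import json
import os
from pathlib import Path

METADATA_FILE = "metadata.json"  # file name proposed in the thread


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under `root`, skipping Git internals."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune the .git folder so repository history is not counted.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if name == METADATA_FILE:
                # Do not count the metadata file itself.
                continue
            file_path = Path(dirpath) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def write_size_metadata(root: Path, model_id: str) -> Path:
    """Write a metadata.json with the total size of the model folder."""
    metadata = {
        "model_id": model_id,  # hypothetical key
        "size_bytes": directory_size_bytes(root),  # hypothetical key
    }
    out_path = root / METADATA_FILE
    out_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return out_path
```

Because the file is small and not stored in LFS, it could be read before anything else is downloaded, which sidesteps the pointer-size problem.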

prtk2001 · 2022-03-29T16:11:39Z

This is a unique approach, I would love to implement it

Rufaida98 · 2022-03-30T05:09:58Z

@miquelduranfrigola your idea of calculating the size to show how much progress has been made is an interesting approach

miquelduranfrigola · 2022-03-30T17:45:21Z

Alright, so let's take this avenue, if you both agree @prtk2001 and @Rufaida98! It will not be straightforward, though.

Many things happen at "fetching" time (ersilia fetch ...), including downloading, creating folders, creating environments, deleting folders, doing tests, etc.

I suggest the following solution:

1. Create a metadata.json file where each step is stored along with the time taken on a standard computer with average internet bandwidth (e.g. Ersilia's workstation).
2. The metadata file can also contain total disk usage.
3. This metadata file can be stored in every model repository.

Then, at fetching time:

4. The first thing we can do is check the metadata.json file.
5. Since this file contains steps and estimated timepoints, we have a way of building a progress bar.
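A rough sketch of how step estimates could drive a progress display, assuming the metadata stores a list of steps with an estimated duration in seconds. The layout (`name`, `seconds`) and both function names are hypothetical; only the standard library is used here.

```python
from typing import Dict, List, Tuple


def cumulative_progress(steps: List[Dict]) -> List[Tuple[str, float]]:
    """Map each step to the fraction of total estimated time elapsed once it finishes."""
    total = sum(step["seconds"] for step in steps)
    if total <= 0:
        raise ValueError("total estimated time must be positive")
    elapsed = 0.0
    progress = []
    for step in steps:
        elapsed += step["seconds"]
        progress.append((step["name"], elapsed / total))
    return progress


def render_bar(fraction: float, width: int = 20) -> str:
    """Render a plain-text bar for a fraction between 0 and 1."""
    fraction = min(max(fraction, 0.0), 1.0)
    filled = int(round(fraction * width))
    return "[" + "#" * filled + "-" * (width - filled) + "] " + f"{fraction:4.0%}"


# Hypothetical step estimates, as they might be read from metadata.json.
example_steps = [
    {"name": "download", "seconds": 10},
    {"name": "create environment", "seconds": 30},
    {"name": "test", "seconds": 60},
]

for name, fraction in cumulative_progress(example_steps):
    print(f"{render_bar(fraction)} {name}")
```

With tqdm, the same idea would be a bar whose `total` is the summed estimate, advanced with `update()` by each step's estimated seconds when that step completes.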

So I suggest the following. @miquelduranfrigola (me) works on points 1,2,3. As soon as I am done, I will notify you. Then @prtk2001 and @Rufaida98 can suggest an approach for 4 and 5. What do you think?

Rufaida98 · 2022-03-30T18:05:31Z

@miquelduranfrigola Amazing! I really want to learn and am happy to work with a colleague @prtk2001 :)

prtk2001 · 2022-03-30T18:24:12Z

@miquelduranfrigola sounds good, I'm in!
Same from this side @Rufaida98 :)

Riyabelle25 · 2022-03-30T21:43:51Z

@miquelduranfrigola I'd love to work with/help you on points 1, 2, 3. If you have an approach in mind (aside from this), do share!

mahamtariq58 · 2022-03-31T20:13:12Z

Hi! Hope you are well.
I have completed the installation steps and I am really interested in this issue. Can I start working on this issue as well?

miquelduranfrigola · 2022-04-08T06:08:05Z

Hi @Riyabelle25 and @prtk2001 I am very delayed with this - my apologies. Please give me some time, I am aware I am the bottleneck. I hope you are still in.

@Riyabelle25, my approach will be to add checkpoints in the fetching process (simple JSON files stored on disk). I have used workflow managers in the past, but in this case I want to keep it simple (Ersilia already has too many dependencies).
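As a sketch of what simple on-disk JSON checkpoints might look like: each completed step is recorded with a timestamp in one small file inside the model folder. The file name `fetch_checkpoints.json` and the helper names are hypothetical and do not describe Ersilia's actual implementation.

```python
import json
import time
from pathlib import Path
from typing import Dict

CHECKPOINT_FILE = "fetch_checkpoints.json"  # hypothetical file name


def load_checkpoints(folder: Path) -> Dict[str, float]:
    """Return {step name: completion timestamp}; empty if nothing was recorded yet."""
    path = folder / CHECKPOINT_FILE
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def mark_step_done(folder: Path, step: str) -> None:
    """Record that `step` has finished, keeping earlier checkpoints."""
    checkpoints = load_checkpoints(folder)
    checkpoints[step] = time.time()
    (folder / CHECKPOINT_FILE).write_text(
        json.dumps(checkpoints, indent=2), encoding="utf-8"
    )


def is_step_done(folder: Path, step: str) -> bool:
    """Check whether `step` was already completed in a previous or current run."""
    return step in load_checkpoints(folder)
```

Keeping the state in a plain JSON file avoids adding a workflow-manager dependency, and the same file could be read to report how far a fetch has progressed.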

@mahamtariq58 thanks for volunteering. I think that at this point we do not need assistance with this. Many thanks, though!

Riyabelle25 · 2022-04-08T07:35:16Z

@miquelduranfrigola apologies for sounding noob-ish 🙈, your approach is to add pythonic checkpoints during the ersilia fetch command, correct?
Does this ultimately translate to adding checkpoints during git fetch itself, as that's how we're fetching the model from its repository?

Riyabelle25 · 2022-04-08T11:47:50Z

Aight, so this is how I'm calculating the time taken for each step (defined in `fetch.py`) and showing the same in the CLI.

Opening a PR for the same now, @miquelduranfrigola do take a look 😄
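For readers without access to the PR, per-step timing of this kind could be sketched as below. This is not the code from the PR: the `timed_step` helper and the step names are hypothetical, and the sketch only shows the general technique of wrapping each step in a timer and printing the result.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed_step(name: str, timings: Dict[str, float]) -> Iterator[None]:
    """Time the enclosed block, store the duration, and print it to the terminal."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the step raises, so failures still show their duration.
        timings[name] = time.perf_counter() - start
        print(f"{name}: {timings[name]:.2f}s")


def run_steps() -> Dict[str, float]:
    """Run two placeholder steps and return their measured durations."""
    timings: Dict[str, float] = {}
    with timed_step("download", timings):
        time.sleep(0.01)  # placeholder for the real work
    with timed_step("pack", timings):
        time.sleep(0.01)  # placeholder for the real work
    return timings
```

The collected durations are the kind of per-step numbers that could later be stored in metadata.json as estimates.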

Rufaida98 · 2022-04-10T10:22:09Z

Following @miquelduranfrigola's and @Riyabelle25's suggestions for the metadata.json file, I think both of your approaches are amazing. What do you think about checking the metadata file by defining a test function that performs the encoding and outputs the size, using "assert", e.g. (assert len(encode...)==f.tell)?
Sorry for talking a lot, but let me know your opinions because you are more experienced than me :)

miquelduranfrigola · 2022-04-11T06:09:31Z

Hi @Rufaida98 thanks for giving it some thought. Have you checked the open PR thread: #188 ?
I would like to understand what you mean by encoding in this context. I am sure your suggestion makes sense but I don't fully understand it at the moment. Can you elaborate a bit more? Thanks!

GemmaTuron · 2022-10-02T17:07:19Z

Hi all!

As an update, @nataliyah123 is working on this issue; we will keep you posted on the progress!

GemmaTuron · 2023-01-12T12:11:39Z

Hi all,

We have decided to temporarily leave this feature as is, see #528

miquelduranfrigola added the enhancement, good first issue, and help wanted labels Mar 21, 2022

GemmaTuron added the code label Mar 27, 2022

miquelduranfrigola assigned prtk2001 Mar 28, 2022

miquelduranfrigola assigned Rufaida98 Mar 30, 2022

miquelduranfrigola mentioned this issue Mar 30, 2022

Calculate disk usage of a model #7

Closed

This was referenced Apr 8, 2022

[add] Calculates time taken for each step on fetching a model #188

Closed

Outreachy Code Project: Riya Elizabeth John #159

Closed

Rufaida98 mentioned this issue Apr 19, 2022

Outreachy Code Project: <@Rufaida98> #115

Closed

26 tasks

GemmaTuron removed the good first issue label Oct 10, 2022

pavithranair mentioned this issue Jan 8, 2023

📑 Feature Request: Progress bar for model fetching #528

Closed

GemmaTuron closed this as completed Jan 12, 2023

miquelduranfrigola pushed a commit that referenced this issue Dec 21, 2023

Merge pull request #6 from hcs-t4sg/anthony/add-black-linter

54049f2

Add Github action to lint when PR comment is made
