
[ENH] benchmarks: add tqdm progress bar #6623

Open · wants to merge 9 commits into base: main
Conversation

Spinachboul (Author)

Reference Issues/PRs

Fixes Issue #6560

What does this implement/fix? Explain your changes.

Adds a progress bar using the tqdm library.

What should a reviewer concentrate their feedback on?

Focus on testing the progress bar while calling the API.

Did you add any tests for the change?

Yes, I tested the progress bar functionality using a sample model registry and model validation.

[screenshot: Capture]

@@ -6,6 +6,7 @@
 from typing import List, Optional, Union

 import pandas as pd
+from tqdm import tqdm
Collaborator:

Is tqdm part of our core dependencies, or implied by those? If not, this is going to fail.

Author:

I'm afraid tqdm is an independent package.

Collaborator @yarnabrina commented Jun 16, 2024:

I'm not doubting that it's a package of its own, but it could be implied by other dependencies of sktime. For example, statsforecast (part of the forecasting subset) implies tqdm, and so on.

As far as I know, none of the core dependencies, e.g. pandas, numpy, scikit-learn etc., imply tqdm, so this line will fail CI. What you should do is move this import inside your functions/methods and use tqdm only if it is present. There is no need to make progress bars mandatory, especially if someone uses this in a non-interactive setup.
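
For illustration, a minimal sketch of that pattern (the helper name _maybe_tqdm is hypothetical, not part of sktime or this PR):

from importlib.util import find_spec

def _maybe_tqdm(iterable, enabled=True):
    """Wrap an iterable in a tqdm progress bar only if tqdm is installed."""
    if enabled and find_spec("tqdm") is not None:
        from tqdm import tqdm  # imported lazily, so tqdm stays a soft dependency
        return tqdm(iterable)
    return iterable  # plain iteration, e.g. in a non-interactive setup

# usage: the loop works identically whether or not tqdm is installed
for item in _maybe_tqdm(range(100)):
    pass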

Collaborator @fkiraly left a comment:

I think this PR makes more substantial changes than just adding a progress bar? For instance, it changes the logger files or the structure of the code.

Change requests:

  • Leave the logic unchanged in this PR. If you want to change or improve it, you are very welcome to, but that should be a separate PR.
  • I would recommend a refactor: factor out the loop body into a single private function, that is, a single execution of a validation_spec/model_spec pair (see the sketch below). This will play well with parallelization efforts later.
  • We need to make sure that the use of tqdm is optional, and that the code still runs if tqdm is not installed.
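
For illustration, a minimal sketch of the suggested refactor; the helper name _run_validation_model and the registry/spec interfaces are assumptions based on this thread, not the PR's actual code:

def _run_validation_model(validation_spec, model_spec):
    """Run a single (validation, model) pair and return one result row.

    Factoring the loop body out like this makes it straightforward to
    parallelize the (validation, model) grid later, e.g. with joblib.
    """
    runtime_secs = validation_spec.run(model_spec)  # assumed interface
    return {
        "validation_id": validation_spec.id,
        "model_id": model_spec.id,
        "runtime_secs": runtime_secs,
    }

def run(validation_registry, model_registry):
    results = []
    for validation_spec in validation_registry.all():
        for model_spec in model_registry.all():
            results.append(_run_validation_model(validation_spec, model_spec))
    return results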

@fkiraly changed the title from "Add Progress bar" to "[ENH] benchmarks: add tqdm progress bar" on Jun 16, 2024
@fkiraly added labels "enhancement (Adding new functionality)" and "module:metrics&benchmarking (metrics and benchmarking modules)" on Jun 16, 2024
@Spinachboul (Author):

@fkiraly
Sure then! I'll keep the base logic of the code unchanged and just add the progress bars in the console.
Could I import the dependencies inside the functions to avoid CI errors during checks?

@fkiraly (Collaborator) commented Jun 19, 2024:

> Could I import the dependencies inside the functions to avoid CI errors during checks?

Yes, I would also make sure that everything runs without the tqdm dependency as well.

@Spinachboul (Author):

@fkiraly & @benHeid
I hope this should be the final commit before merging!
Please let me know if any changes need to be made to the code.

Collaborator @fkiraly left a comment:

Thanks!

I think there should be a verbose mode and a non-verbose mode; I would switch that with a verbose arg that can be True or False (default). If False, tqdm is not imported and nothing is printed.

I think this is quite easy to achieve, since you can do an easy switch on the generator used in the two for loops, e.g., if verbose: model_registry_generator = tqdm(...), etc.

There should also be some consistency with the logger use.

@Spinachboul (Author) commented Jun 21, 2024:

@fkiraly & @benHeid
Would this be to your liking?
I have included verbose as one of the params, which controls the verbosity of the progress bars.
If verbose=False, the fallback tqdm function I defined lets the program remain functional; otherwise, it invokes progress bars while executing the for loops.
I have also made the program consistent by using logger.info() instead of print statements.

[screenshot: Capture]
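
A guess at the fallback pattern described above (hypothetical code; the actual change is only visible in the screenshot):

# verbose comes from the surrounding run(...) signature
if verbose:
    from tqdm import tqdm
else:
    # no-op stand-in: returns the iterable unchanged, so the for loops
    # below run identically with progress bars disabled
    def tqdm(iterable, *args, **kwargs):
        return iterable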

@fkiraly (Collaborator) commented Jun 21, 2024:

> Would this be to your liking?

This works but imo it is unnecessarily obscure. How about

if verbose and tqdm_available:
    validation_gen = tqdm(validation_registry.all())
else:
    validation_gen = validation_registry.all()

@yarnabrina (Collaborator):

I'm unable to see any conditions regarding verbose mode in the changes tab of this GitHub PR currently. I'm also unable to see any check for whether the tqdm soft dependency is installed or not.

Is it just me? @fkiraly, are you able to see them?

Contributor @benHeid left a comment:

Thank you for the contribution. I have two requests that would improve the user experience.

-    for validation_spec in validation_registry.all():
+    print("Printing Validations...")
+    print("\n")
+    for validation_spec in tqdm(validation_registry.all()):
Contributor:

I would propose to avoid the print statements; you can add a description of the loop to the tqdm call instead.
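
For example, using tqdm's built-in desc argument for the label (the description text is illustrative, and validation_registry is assumed from the diff above):

from tqdm import tqdm

# the description is shown as the progress-bar label instead of a print
for validation_spec in tqdm(validation_registry.all(), desc="Running validations"):
    ...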

-        for model_spec in model_registry.all():
+        print("Running models...")
+        print("\n")
+        for model_spec in tqdm(model_registry.all()):
Contributor:

I would propose to set leave=False on the second loop. This removes the inner progress bar after the loop finishes and avoids a wall of progress bars, so the user does not have to scroll up to see the outer loop.
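
Concretely, something like this, with the registries assumed from the diffs above (leave and desc are standard tqdm arguments):

from tqdm import tqdm

for validation_spec in tqdm(validation_registry.all(), desc="Validations"):
    # leave=False clears the inner bar once it finishes, so only the
    # outer bar stays on screen instead of one bar per outer iteration
    for model_spec in tqdm(model_registry.all(), desc="Models", leave=False):
        ...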

@fkiraly (Collaborator) commented Jun 21, 2024:

> I'm unable to see any conditions in the changes tab on this GitHub PR currently regarding verbose mode. I'm also unable to see any check for presence of the tqdm soft dependency being installed or not.

I think the changes are only in the screenshot, which does not correspond to the state of the fork. @Spinachboul, can you please make sure you update the PR to the most recent state?

@Spinachboul (Author):

@fkiraly | @benHeid | @yarnabrina | @achieveordie
I have updated the PR in the following ways:

  • Added verbosity control for the progress bars, as suggested by @fkiraly
  • Removed all the print statements and the additional logger.info() calls, while adding descriptions to the progress bars, as suggested by @benHeid
  • Removed the code obscurity by importing tqdm conditionally, as suggested by @fkiraly
  • Also updated the params in the main function!

@@ -1,4 +1,3 @@
-"""Interface for running a registry of models on a registry of validations."""
Collaborator:

why did you remove this line?

@@ -58,16 +61,42 @@ def run(
     results_df = pd.DataFrame(columns=["validation_id", "model_id", "runtime_secs"])
     results_df["runtime_secs"] = results_df["runtime_secs"].astype(int)

+    tqdm_available = False
+    if verbose:
+        try:
Collaborator @fkiraly commented Jun 22, 2024:

try/except is not good style if you know the condition you want to check.
In that case, simply check the condition; you can do this with _check_soft_dependencies, as I have recommended. Kindly search the code base if you want to see examples.
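
A sketch of what this could look like; the import path is an assumption, and the exact call form should be checked against existing usages in the sktime code base:

# assumed location of sktime's soft-dependency checker
from sktime.utils.validation._dependencies import _check_soft_dependencies

# severity="none" checks availability silently instead of raising or warning
tqdm_available = _check_soft_dependencies("tqdm", severity="none")

if verbose and tqdm_available:  # verbose from the surrounding run(...) signature
    from tqdm import tqdm  # soft dependency: imported only when actually needed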

@@ -79,15 +108,11 @@ def run(
            ):
                logger.info(
                    f"Skipping validation - model: "
                    f"{validation_spec.id} - {model_spec.id}"
                    ", as found prior result in results."
Collaborator:

why are you removing the comma?

)
continue

logger.info(
Collaborator:

why are you removing this?

Collaborator @fkiraly left a comment:

Great! Left some comments above.

@Spinachboul (Author):

Sure!
I'll fix them soon.
Thanks for the review!

@Spinachboul (Author):

@fkiraly
Made the requested changes!
Please review.

@@ -58,16 +65,37 @@ def run(
     results_df = pd.DataFrame(columns=["validation_id", "model_id", "runtime_secs"])
     results_df["runtime_secs"] = results_df["runtime_secs"].astype(int)

+    tqdm_available, _ = _check_soft_dependencies("tqdm", severity="warning")
Collaborator:

I think the warning should be raised only if verbose, with a clear error message. Otherwise, I would just leave it silent.

Collaborator:

If you want to warn, you should use warn from sktime.utils.warnings, which receives a reference to self, so users can turn it off via set_config.

Author:

@fkiraly

try:
    results_df = pd.read_csv(results_path)
except FileNotFoundError:
    results_df = pd.DataFrame(columns=["validation_id", "model_id", "runtime_secs"])
    results_df["runtime_secs"] = results_df["runtime_secs"].astype(int)

tqdm_available, _ = _check_soft_dependencies("tqdm", severity="none")

if verbose and not tqdm_available:
    warn("tqdm is not installed. Continuing without progress bars.", logger)

if tqdm_available:
    from tqdm import tqdm

Something like this ⬆ ?

This has not been committed!
