[ENH] benchmarks: add tqdm progress bar #6623
base: main
Conversation
```diff
@@ -6,6 +6,7 @@
 from typing import List, Optional, Union

 import pandas as pd
+from tqdm import tqdm
```
Is `tqdm` part of our core dependencies, or implied by those? If not, this is going to fail.
I'm afraid `tqdm` is an independent package.
I'm not doubting that it's a package of its own, but it could be implied by other dependencies of sktime. For example, `statsforecast` (part of the `forecasting` subset) will imply `tqdm`, and so on.
As far as I know, none of the core dependencies, e.g. `pandas`, `numpy`, `scikit-learn` etc., will imply `tqdm`, so this line will fail CI. What you should do is move this import inside your functions/methods and use `tqdm` only if it is present. There is no need to make progress bars mandatory, especially if someone uses it in a non-interactive setup.
I think this PR makes more substantial changes than just adding a progress bar? For instance, it changes the logger files or the structure of the code.
Change requests:
- leave the logic unchanged in this PR. If you want to change or improve, you are much welcome, but it should be a separate PR.
- I would recommend a refactor: factor out the loop into a single private function, that is, a single execution of `validation_spec` and `model_spec`. This will play well with parallelization efforts later.
- we need to make sure that use of `tqdm` is optional, and the code still runs if `tqdm` is not installed.
@fkiraly
Yes, I would also make sure that everything runs without the `tqdm` package.
Thanks!
I think there should be a verbose mode and a non-verbose mode; I would switch that with a `verbose` arg that can be True or False (default). If False, `tqdm` is not imported, and no prints are done.
I think this is quite easy to achieve, since you can do an easy switch on the generator used in the two `for` loops, e.g., `if verbose: model_registry_generator = tqdm(...)` etc.
There should also be some consistency with the `logger` use.
@fkiraly & @benHeid
This works, but imo it is unnecessarily obscure. How about:

```python
if verbose and tqdm_available:
    validation_gen = tqdm(validation_registry.all())
else:
    validation_gen = validation_registry.all()
```

I'm unable to see any conditions in the changes tab on this GitHub PR currently regarding verbose mode. I'm also unable to see any check for the presence of the `tqdm` soft dependency being installed or not. Is it just me? @fkiraly, are you able to see?
Thank you for the contribution. I have two requests that would improve the user experience.
```diff
-for validation_spec in validation_registry.all():
+print("Printing Validations...")
+print("\n")
+for validation_spec in tqdm(validation_registry.all()):
```
I would propose to avoid the print statements; you can add a description of the loop into the tqdm call.
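For illustration, a minimal sketch of replacing the print calls with tqdm's `desc` argument; the fallback shim is only there to keep the snippet runnable without `tqdm` installed:

```python
try:
    from tqdm import tqdm
except ImportError:
    def tqdm(iterable, **kwargs):  # no-op stand-in for the sketch
        return iterable

seen = []
# desc labels the progress bar itself, so no separate prints are needed
for validation_spec in tqdm(["v1", "v2"], desc="Running validations"):
    seen.append(validation_spec)
```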
```diff
-for model_spec in model_registry.all():
+print("Running models...")
+print("\n")
+for model_spec in tqdm(model_registry.all()):
```
I would propose to set `leave=False` in the second loop. This would remove the inner progress bar after the loop is finished, avoiding a wall of progress bars, so the user does not have to scroll up to see the outer loop.
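A sketch of the nested loops with `leave=False` on the inner bar, so each finished inner bar is cleared instead of piling up; the registry lists are stand-ins, and the shim only keeps the snippet runnable without `tqdm`:

```python
try:
    from tqdm import tqdm
except ImportError:
    def tqdm(iterable, **kwargs):  # no-op stand-in for the sketch
        return iterable

validations = ["v1", "v2"]          # stand-in for validation_registry.all()
models = ["m1", "m2", "m3"]         # stand-in for model_registry.all()

runs = 0
for validation_spec in tqdm(validations, desc="Validations"):
    # leave=False erases the inner bar once it completes,
    # so only the outer bar stays on screen
    for model_spec in tqdm(models, desc="Models", leave=False):
        runs += 1
```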
I think the changes are only in the screenshot, which does not correspond to the state of the fork. @Spinachboul, can you please make sure you update the PR to the most recent state?
@fkiraly @benHeid @yarnabrina @achieveordie
```diff
@@ -1,4 +1,3 @@
-"""Interface for running a registry of models on a registry of validations."""
```
Why did you remove this line?
```diff
@@ -58,16 +61,42 @@ def run(
 results_df = pd.DataFrame(columns=["validation_id", "model_id", "runtime_secs"])
 results_df["runtime_secs"] = results_df["runtime_secs"].astype(int)

+tqdm_available = False
+if verbose:
+    try:
```
try/except is not good style if you know the condition you want to check.
In that case, simply check the condition; you can do this with `_check_soft_dependencies`, as I have recommended. Kindly search the code base if you want to see examples.
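A condition-style check, as the reviewer suggests, can be sketched with the stdlib's `importlib.util.find_spec`; in sktime itself, the internal `_check_soft_dependencies` utility would play this role, so the snippet below is only an illustrative stand-in:

```python
import importlib.util

# test availability as a condition, instead of try/except around the import
tqdm_available = importlib.util.find_spec("tqdm") is not None

if tqdm_available:
    from tqdm import tqdm  # safe: the module spec was found
```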
```diff
@@ -79,15 +108,11 @@ def run(
 ):
     logger.info(
         f"Skipping validation - model: "
         f"{validation_spec.id} - {model_spec.id}"
         ", as found prior result in results."
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you removing the comma?
Why are you removing this?
Great! Left some comments above.
Sure!
@fkiraly
```diff
@@ -58,16 +65,37 @@ def run(
 results_df = pd.DataFrame(columns=["validation_id", "model_id", "runtime_secs"])
 results_df["runtime_secs"] = results_df["runtime_secs"].astype(int)

+tqdm_available, _ = _check_soft_dependencies("tqdm", severity="warning")
```
I think the warning should be raised only if `verbose`, with a clear error message. Otherwise, I would just leave it silent.
If you want to warn, you should use `warn` from `sktime.utils.warnings`, which receives a reference from `self`, so users can turn it off via `set_config`.
```python
try:
    results_df = pd.read_csv(results_path)
except FileNotFoundError:
    results_df = pd.DataFrame(columns=["validation_id", "model_id", "runtime_secs"])
    results_df["runtime_secs"] = results_df["runtime_secs"].astype(int)

tqdm_available, _ = _check_soft_dependencies("tqdm", severity="none")
if verbose and not tqdm_available:
    warn("tqdm is not installed. Continuing without progress bars.", logger)
if tqdm_available:
    from tqdm import tqdm
```

Something like this ⬆ ?
This has not been committed!
Reference Issues/PRs
Fixes issue #6560
What does this implement/fix? Explain your changes.
Adds a progress bar using the tqdm library.
What should a reviewer concentrate their feedback on?
Focus on testing the progress bar while calling the API.
Did you add any tests for the change?
Yes, tested the progress bar functionality using a sample model registry and model validation.