# Add Progress Bars to Jupyter and Python Apps Using `tqdm`

The `tqdm` library provides a simple and intuitive way of adding progress bars, not only to your CLIs or Python console outputs, but also to Jupyter notebooks.

The library and documentation can be found [here](https://github.com/tqdm/tqdm).

To install the library: `pip install tqdm`

In [1]:
from tqdm import tqdm

If you plan to use this in a Jupyter notebook (as we are here), then although most of the examples I show you here will work fine, you really should be using the Jupyter-specific `tqdm` object.

To do so we should import the various objects we'll be using in this notebook from `tqdm.notebook` instead, for example:

In [2]:
from tqdm.notebook import tqdm

In addition, when using the Jupyter version of these components, you will likely also need to install the `ipywidgets` library in your virtual environment:

```bash
pip install ipywidgets
```

So we have two "flavors" of these progress bar components.

I'll refer to the components imported from `tqdm` as "console" variants, and those imported from `tqdm.notebook` as the Jupyter variants.

As we'll see in this video, you can use either variant in a Jupyter notebook, except in a few cases (one of which I'll point out later in the video).

The objects we use, whether from `tqdm` or `tqdm.notebook`, are instantiated and configured the same way - the Jupyter variant simply changes how the progress bar is rendered inside Jupyter.
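If the same code has to run both in a notebook and in a plain console, one option (not used in the rest of this notebook) is `tqdm.auto`. As far as I know, it picks the notebook variant when it detects a Jupyter kernel and falls back to the console variant otherwise. A minimal sketch:

```python
from time import sleep

# tqdm.auto selects the console or the notebook flavor when it is imported
from tqdm.auto import tqdm

processed = []
for ch in tqdm("Python rocks!", desc="auto"):
    sleep(0.01)
    processed.append(ch)
```

The loop itself is identical to what you would write with either flavor, only the import changes.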

## Out of the Box Functionality

Let's first look at some simple ways to use the default functionality.

The simplest way to use `tqdm` is to wrap it around an iterable: `tqdm` returns an iterable you can loop over as usual, but one that now has the hooks it needs to track progress.

In [3]:
from tqdm.notebook import tqdm

from time import sleep

for x in tqdm("Python rocks!"):
    sleep(0.25)

  0%|          | 0/13 [00:00<?, ?it/s]

As you can see `tqdm` provides a lot of functionality out of the box. It knows how many elements are in the iterable, the elapsed time, the average iterations per second, and even tries to estimate the remaining time.

One thing to note here is that `tqdm` queried our iterable to find out how many elements are in it.

So what about iterators, where doing so would consume the iterator?

Let's try it using the console version of `tqdm` first:

In [4]:
from tqdm import tqdm

g = (i**2 for i in range(10))

for x in tqdm(g):
    sleep(0.25)

10it [00:02,  3.90it/s]


As you can see, given an iterator the behavior is slightly different: `tqdm` only displays the number of iterations, the elapsed time, and the average number of iterations per second.
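To make the difference concrete, here is a small sketch using the console variant. It inspects the `total` attribute of the bar for a sized iterable and for a generator, and then passes `total=` explicitly for a generator whose length we happen to know. The `io.StringIO` buffer passed as `file=` is only there to capture the bar as text instead of printing it; the variable names are my own.

```python
import io

from tqdm import tqdm

buf = io.StringIO()  # capture the rendered bars instead of printing them

# A sized iterable: tqdm can call len() on it
sized = tqdm("Python rocks!", file=buf)
sized_total = sized.total
sized.close()

# A generator has no len(), so the total is unknown
gen = tqdm((i**2 for i in range(10)), file=buf)
gen_total = gen.total
gen.close()

# If we know the length ourselves, we can pass it in
squares = (i**2 for i in range(10))
results = [x for x in tqdm(squares, total=10, file=buf)]

print(sized_total, gen_total)
```

With `total=10` supplied, the bar for the generator can show the percentage and the remaining-time estimate again.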

We would expect the Jupyter version to behave similarly, but it does not (I assume this is an issue in the library, not by design). I'll show you how you can customize the output later in this video, so this is not a big deal.

In [5]:
from tqdm.notebook import tqdm

In [6]:
g = (i**2 for i in range(10))

for x in tqdm(g):
    sleep(0.25)

0it [00:00, ?it/s]

If you are just looping over a range, `tqdm` has a slightly more optimised way of dealing with this:

In [7]:
from tqdm import trange

for x in trange(3, 100, 3):
    sleep(0.25)

100%|███████████████████████████████████████████| 33/33 [00:08<00:00,  3.92it/s]


In [8]:
from tqdm.notebook import trange

for x in trange(3, 100, 3):
    sleep(0.25)

  0%|          | 0/33 [00:00<?, ?it/s]
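As far as I can tell from the docs, `trange(...)` is a shortcut for `tqdm(range(...))`, so both loops in this sketch iterate over exactly the same values:

```python
from tqdm import tqdm, trange

# trange takes the same arguments as range
with_trange = [x for x in trange(3, 100, 3)]

# the equivalent spelled out with tqdm and range
with_tqdm = [x for x in tqdm(range(3, 100, 3))]

print(with_trange == with_tqdm)
```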

And lastly, we can also update the progress bar manually. To do so we use a context manager to "open" a new progress bar. Technically you could bypass the context manager, but this is just like opening files: if you bypass it, you need to make sure you close the progress bar yourself, so I don't recommend that approach unless something in your code structure requires it.

In [9]:
from tqdm import tqdm

with tqdm() as pbar:
    for x in "Python rocks!":
        sleep(0.25)
        pbar.update(1)
    

13it [00:03,  3.91it/s]


As we can see, this behaves very similarly to the iterator case. This is because `tqdm` does not have any knowledge of the "total size" of the progress bar. We can easily specify this value (if we know it):

In [10]:
from tqdm import tqdm

iterable = "Python rocks!"

with tqdm(total=len(iterable)) as pbar:
    for x in iterable:
        sleep(0.25)
        pbar.update(1)

100%|███████████████████████████████████████████| 13/13 [00:03<00:00,  3.90it/s]


In [11]:
from tqdm.notebook import tqdm

iterable = "Python rocks!"

with tqdm(total=len(iterable)) as pbar:
    for x in iterable:
        sleep(0.25)
        pbar.update(1)

  0%|          | 0/13 [00:00<?, ?it/s]
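For completeness, this is roughly what bypassing the context manager looks like. It is only a sketch of the approach I said I don't recommend: the `try`/`finally` is my own addition to make sure `close()` runs even if the loop raises.

```python
from time import sleep

from tqdm import tqdm

iterable = "Python rocks!"

pbar = tqdm(total=len(iterable))  # no context manager this time
try:
    for x in iterable:
        sleep(0.01)
        pbar.update(1)
finally:
    pbar.close()  # we are responsible for closing the bar ourselves
```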

When using this manual approach, we can also set the description in the progress bar ourselves:

In [12]:
from tqdm.notebook import tqdm

iterable = "Python rocks!"

with tqdm(total=len(iterable)) as pbar:
    for x in iterable:
        sleep(0.25)
        pbar.set_description(f"processing {x}")
        pbar.update(1)

  0%|          | 0/13 [00:00<?, ?it/s]
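A related method is `set_postfix`, which appends extra `key=value` information at the end of the bar rather than at the start. Here is a small sketch using the console variant; the vowel counter is just a made-up statistic for illustration, and the `io.StringIO` buffer only captures the output as text.

```python
import io

from tqdm import tqdm

buf = io.StringIO()  # capture the bar as text
iterable = "Python rocks!"

with tqdm(total=len(iterable), file=buf) as pbar:
    seen_vowels = 0  # hypothetical running statistic
    for x in iterable:
        if x.lower() in "aeiou":
            seen_vowels += 1
        pbar.set_description(f"processing {x}")  # shown before the bar
        pbar.set_postfix(vowels=seen_vowels)     # shown after the stats
        pbar.update(1)
```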

## Customizing Things

The behavior and display of the progress bar can be customized in many different ways (see the docs for the full range of options).

You can customize things like the width of the bar, the scale of the iteration count, how the progress bar displays its various sub-parts (like the stats, timings, etc.), and a slew of other settings. See [here](https://github.com/tqdm/tqdm?tab=readme-ov-file#parameters).

Let's first look at the scaling. I'll use the "console version", but you can of course use the "notebook version".

In [13]:
from tqdm import tqdm

In [14]:
for i in tqdm(range(50_000_000)):
    pass   

100%|█████████████████████████| 50000000/50000000 [00:04<00:00, 11654180.78it/s]


Now let's ask `tqdm` to scale the iteration count:

In [15]:
for i in tqdm(range(50_000_000), unit_scale=True):
    pass  

100%|█████████████████████████████████████| 50.0M/50.0M [00:04<00:00, 11.9Mit/s]


While we're at it, let's change that `it` label used as the unit:

In [16]:
for i in tqdm(range(50_000_000), unit_scale=True, unit="rec"):
    pass  

100%|████████████████████████████████████| 50.0M/50.0M [00:04<00:00, 11.9Mrec/s]
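The same two arguments are handy when the thing you are counting is bytes rather than loop iterations. The sketch below combines them with `unit_divisor=1024` and the manual `update` approach from earlier; the chunk size and the in-memory chunks are made up to stand in for data read from a file or a socket.

```python
import io

from tqdm import tqdm

buf = io.StringIO()               # capture the bar as text
chunk_size = 512                  # hypothetical chunk size in bytes
chunks = [b"x" * chunk_size] * 8  # stand-in for data read from a file

with tqdm(
    total=chunk_size * len(chunks),
    unit="B",
    unit_scale=True,
    unit_divisor=1024,            # scale in steps of 1024 instead of 1000
    file=buf,
) as pbar:
    for chunk in chunks:
        pbar.update(len(chunk))   # advance by bytes, not by iterations
```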


And change the bar color:

In [17]:
for i in tqdm(range(50_000_000), unit_scale=True, unit="rec", colour="blue"):
    pass  

100%|████████████████████████████████████| 50.0M/50.0M [00:04<00:00, 11.9Mrec/s]


You can also specify colors using hex: 

In [18]:
for i in tqdm(range(50_000_000), unit_scale=True, unit="rec", colour="#FF0000"):
    pass  

100%|████████████████████████████████████| 50.0M/50.0M [00:04<00:00, 11.8Mrec/s]


Next let's look at how we can format the bar display itself, using several built-in "fields" that `tqdm` provides:

- `l_bar`
- `r_bar`
- `n`
- `n_fmt`
- `total`
- `total_fmt`
- `percentage`
- `elapsed`
- `elapsed_s`
- `ncols`
- `nrows`
- `desc`
- `unit`
- `rate`
- `rate_fmt`
- `rate_noinv`
- `rate_noinv_fmt`
- `rate_inv`
- `rate_inv_fmt`
- `postfix`
- `unit_divisor`
- `remaining`
- `remaining_s`
- `eta`

The default display for `tqdm` is:
```python
{l_bar}{bar}{r_bar}
```
where 
```python
l_bar --> {desc}: {percentage:3.0f}%|
```
and
```python
r_bar --> | {n_fmt}/{total_fmt} [{elapsed}<{remaining}, {rate_fmt}{postfix}]
```
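These templates use the same replacement-field syntax as Python's `str.format`, so you can get a feel for them without running a progress bar at all. In this sketch the field values are made up, standing in for what `tqdm` would supply:

```python
# The default left-hand part, plus a small custom template
l_bar_template = "{desc}: {percentage:3.0f}%|"
custom_template = "[{percentage:.1f}%] {n_fmt}/{total_fmt}"

# Hypothetical values, standing in for what tqdm fills in at render time
fields = {
    "desc": "Importing data",
    "percentage": 42.0,
    "n_fmt": "21.0M",
    "total_fmt": "50.0M",
}

l_bar = l_bar_template.format(**fields)
custom = custom_template.format(**fields)

print(l_bar)   # Importing data:  42%|
print(custom)  # [42.0%] 21.0M/50.0M
```

Note how `{percentage:3.0f}` pads the value to three characters, which keeps the bar from shifting as the percentage grows from one digit to three.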

We specify the exact progress bar definition using the `bar_format` argument, and we can also specify a custom prefix:

In [19]:
for x in tqdm(
    range(10_000_000),
    desc="Importing data",
    colour="blue",
    bar_format="{l_bar}{bar}"
):
    pass

Importing data: 100%|███████████████████████████████████████████████████████████


Let's get a bit more fancy:

In [20]:
for x in tqdm(
    range(50_000_000),
    desc="Importing data",
    colour="blue",
    unit_scale=True,
    unit="recs",
    bar_format="{desc}| processing {n_fmt} of {total_fmt} {unit} [{percentage:.1f}%]|{bar}|{elapsed}s"
):
    pass

Importing data| processing 50.0M of 50.0M recs [100.0%]|█████████████████|00:04s


And just to make sure, let's use the Jupyter specific output:

In [21]:
from tqdm.notebook import tqdm

for x in tqdm(
    range(50_000_000),
    desc="Importing data",
    colour="blue",
    unit_scale=True,
    unit="recs",
    bar_format="{desc}| processing {n_fmt} of {total_fmt} {unit} [{percentage:.1f}%]|{bar}|{elapsed}s"
):
    pass

Importing data| processing 0.00 of 50.0M recs [0.0%]|          |00:00s

You'll probably have noticed that the progress bar keeps shifting around a bit, unlike the "console version".

This is because my Jupyter notebooks display text in a non-monospaced font, whereas my console uses a monospaced one.

Another thing to note is that you may want to define a custom "base" `tqdm` object with your display settings, so that you don't have to redefine them everywhere.

A simple way to do this would be to use `partial`. Something like this:

In [22]:
from functools import partial

In [23]:
import_pbar = partial(
    tqdm, 
    desc="Importing data",
    colour="blue",
    unit_scale=True,
    unit="recs",
    bar_format="{desc}| processing {n_fmt} of {total_fmt} {unit} [{percentage:.1f}%]|{bar}|{elapsed}s"
)

And now we can use it this way:

In [24]:
for x in import_pbar(range(50_000_000)):
    pass

Importing data| processing 0.00 of 50.0M recs [0.0%]|          |00:00s
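One nice property of `partial` is that keyword arguments passed at call time take precedence over the ones stored in the partial object, so the "base" bar can still be tweaked per use. A small sketch using the console variant, with a shortened `bar_format` and an `io.StringIO` buffer to capture the output (both are my own choices for the example):

```python
import io
from functools import partial

from tqdm import tqdm

buf = io.StringIO()  # capture the bars as text

base_pbar = partial(
    tqdm,
    desc="Importing data",
    unit_scale=True,
    unit="recs",
    bar_format="{desc}| {n_fmt} of {total_fmt} {unit} |{bar}| {elapsed}",
    file=buf,
)

# Uses the stored description
for x in base_pbar(range(1_000)):
    pass

# A keyword passed at call time overrides the stored one
for x in base_pbar(range(1_000), desc="Exporting data"):
    pass

output = buf.getvalue()
```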

## Nested Progress Bars

All the examples I just showed you appeared to work fine in a REPL (such as this Jupyter notebook). However, not everything that would work fine in a standard console output (what we did above), will work quite as expected in Jupyter.

For example, we can do nested progress bars this way:

In [25]:
from tqdm import trange

for i in trange(2, desc="i"):
    for j in trange(2, desc=f"j ({i=})", leave=False):
        for k in trange(2, desc=f"k ({i=} {j=})", leave=False):
            sleep(0.1)

i:   0%|                                                  | 0/2 [00:00<?, ?it/s]
j (i=0):   0%|                                            | 0/2 [00:00<?, ?it/s]

k (i=0 j=0):   0%|                                        | 0/2 [00:00<?, ?it/s]

k (i=0 j=0):  50%|████████████████                | 1/2 [00:00<00:00,  9.52it/s]

k (i=0 j=0): 100%|████████████████████████████████| 2/2 [00:00<00:00,  9.46it/s]

j (i=0):  50%|██████████████████                  | 1/2 [00:00<00:00,  4.44it/s]

k (i=0 j=1):   0%|                                        | 0/2 [00:00<?, ?it/s]

k (i=0 j=1):  50%|████████████████                | 1/2 [00:00<00:00,  9.67it/s]

k (i=0 j=1): 100%|████████████████████████████████| 2/2 [00:00<00:00,  9.38it/s]
j (i=0): 100%|█████████████████████████████████

As you can see, the output just kept adding to the preceding output, and that's obviously not what we want here (it will work just fine in a console output).

So, we definitely have to use the Jupyter specific object here:

In [26]:
from tqdm.notebook import tqdm

for i in tqdm(range(5), desc="i"):
    for j in tqdm(range(5), desc=f"j ({i=})", leave=False):
        for k in tqdm(range(5), desc=f"k ({i=} {j=})", leave=False):
            sleep(0.1)

i:   0%|          | 0/5 [00:00<?, ?it/s]

j (i=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=0 j=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=0 j=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=0 j=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=0 j=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=0 j=4):   0%|          | 0/5 [00:00<?, ?it/s]

j (i=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=1 j=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=1 j=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=1 j=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=1 j=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=1 j=4):   0%|          | 0/5 [00:00<?, ?it/s]

j (i=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=2 j=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=2 j=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=2 j=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=2 j=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=2 j=4):   0%|          | 0/5 [00:00<?, ?it/s]

j (i=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=3 j=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=3 j=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=3 j=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=3 j=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=3 j=4):   0%|          | 0/5 [00:00<?, ?it/s]

j (i=4):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=4 j=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=4 j=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=4 j=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=4 j=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=4 j=4):   0%|          | 0/5 [00:00<?, ?it/s]

Or, using the notebook variant of `trange`, we can write exactly the same code as before:

In [27]:
from tqdm.notebook import trange

for i in trange(5, desc="i"):
    for j in trange(5, desc=f"j ({i=})", leave=False):
        for k in trange(5, desc=f"k ({i=} {j=})", leave=False):
            sleep(0.1)

i:   0%|          | 0/5 [00:00<?, ?it/s]

j (i=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=0 j=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=0 j=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=0 j=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=0 j=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=0 j=4):   0%|          | 0/5 [00:00<?, ?it/s]

j (i=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=1 j=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=1 j=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=1 j=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=1 j=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=1 j=4):   0%|          | 0/5 [00:00<?, ?it/s]

j (i=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=2 j=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=2 j=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=2 j=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=2 j=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=2 j=4):   0%|          | 0/5 [00:00<?, ?it/s]

j (i=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=3 j=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=3 j=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=3 j=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=3 j=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=3 j=4):   0%|          | 0/5 [00:00<?, ?it/s]

j (i=4):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=4 j=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=4 j=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=4 j=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=4 j=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=4 j=4):   0%|          | 0/5 [00:00<?, ?it/s]

I should also explain the `leave` argument. It defaults to `True`, which means that `tqdm` leaves the completed bar on the screen. Setting it to `False` removes the bar once it completes, and that's precisely what we want for the nested bars. Remember that a new bar is created every time an inner loop starts, so once that loop has finished we want its completed bar removed and replaced by the new one from the next iteration.

Let's see what happens if we don't remove the bars:

In [28]:
for i in trange(5, desc="i"):
    for j in trange(5, desc=f"j ({i=})", leave=True):
        for k in trange(5, desc=f"k ({i=} {j=})", leave=True):
            sleep(0.01)

i:   0%|          | 0/5 [00:00<?, ?it/s]

j (i=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=0 j=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=0 j=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=0 j=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=0 j=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=0 j=4):   0%|          | 0/5 [00:00<?, ?it/s]

j (i=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=1 j=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=1 j=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=1 j=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=1 j=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=1 j=4):   0%|          | 0/5 [00:00<?, ?it/s]

j (i=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=2 j=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=2 j=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=2 j=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=2 j=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=2 j=4):   0%|          | 0/5 [00:00<?, ?it/s]

j (i=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=3 j=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=3 j=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=3 j=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=3 j=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=3 j=4):   0%|          | 0/5 [00:00<?, ?it/s]

j (i=4):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=4 j=0):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=4 j=1):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=4 j=2):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=4 j=3):   0%|          | 0/5 [00:00<?, ?it/s]

k (i=4 j=4):   0%|          | 0/5 [00:00<?, ?it/s]

## Conclusion

This is a very neat and easy to use library, giving a lot of functionality out of the box, along with a slew of customization options.

It has much more functionality than I presented here: it works fine with multi-threading, is async aware, and of course works in CLI-style apps.

It even supports more advanced features such as callbacks, hooks, Pandas, Keras, and Dask integration, and much, much more.