### Add progress bar estimates

Because it is generally hard to predict how long parallel processing will take for huge matrices (e.g. $N \times N$ with $N=10^6$), we can add a [tqdm](https://github.com/tqdm/tqdm)-like progress bar that shows the elapsed time and an estimate of the remaining time.

To make the progress bar work inside Numba-compiled blocks, the [numba-progress](https://github.com/mortacious/numba-progress) library is used.

In [1]:
import numpy as np
from chunkdot.cosine_similarity_top_k import cosine_similarity_top_k


n_items = 15000
embedding_dim = 256
float_type = 'float64'
max_memory = 5 * 2**30  # 5 GiB of RAM (X * 2**30 bytes = X GiB)
top_k = 16

np.random.seed(42)

# create random embeddings
embeddings = np.random.randn(int(n_items), int(embedding_dim)).astype(float_type)

In [2]:
# default behavior: no progress bar (show_progress=False)
similarities = cosine_similarity_top_k(
    embeddings=embeddings,
    top_k=top_k,
    normalize=True,
    max_memory=max_memory,
    force_memory=False,
    show_progress=False,
)

In [3]:
# tqdm-like progress bar with elapsed/remaining time estimates (show_progress=True)
similarities = cosine_similarity_top_k(
    embeddings=embeddings,
    top_k=top_k,
    normalize=True,
    max_memory=max_memory,
    force_memory=False,
    show_progress=True,
)

Perform chunked matrix multiplication: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 43.0/43 [00:03<00:00, 12.73it/s]
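
The bar above reports the processed chunks, the elapsed time, the estimated remaining time and the rate. As a rough illustration of where such an estimate comes from, here is a simplified sketch; it is not the actual tqdm or numba-progress code, and the helper names `estimate_remaining` and `format_interval` are hypothetical. The remaining time is taken as the number of outstanding iterations divided by the average rate observed so far:

```python
def estimate_remaining(done, total, elapsed_s):
    """Return (rate in it/s, estimated remaining seconds) from the average throughput."""
    if done <= 0 or elapsed_s <= 0:
        # Nothing has been measured yet, so no estimate is possible.
        return 0.0, float("inf")
    rate = done / elapsed_s
    remaining_s = (total - done) / rate
    return rate, remaining_s


def format_interval(seconds):
    """Format a duration in seconds as MM:SS, like the timings shown in the bar."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


# Hypothetical snapshot: 10 of 43 chunks processed after 0.8 seconds.
rate, remaining_s = estimate_remaining(done=10, total=43, elapsed_s=0.8)
print(f"{rate:.2f} it/s, about {format_interval(remaining_s)} remaining")
```

tqdm itself additionally smooths the rate over recent iterations (see its `smoothing` parameter), so its estimate reacts to speed changes faster than this plain average does.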
