# `heptabot` inference time and RAM usage

In this notebook we measure the performance of the current `heptabot` version. It was tested on a [vast.ai](https://vast.ai/console/create/) instance created from the `tensorflow/tensorflow:2.3.0-gpu-jupyter` image and set up with our [Install](https://github.com/lcl-hse/heptabot/blob/gpu-tpu/notebooks/Install.ipynb) procedure. Since `heptabot` is currently optimized for a system with an NVIDIA GeForce GTX 1080 Ti-class graphics card and 16 GB of total system RAM, the results are shown for such a system.

First, we check the Python version and enter our working directory. Keep in mind that the code is executed within the `heptabot` virtual environment.

In [1]:
!python --version

Python 3.6.9


In [2]:
%cd ../

/root/heptabot


Let's get the current load on the GPU:

In [3]:
!nvidia-smi

Thu Mar 18 00:50:01 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  GeForce GTX 108...  On   | 00000000:03:00.0 Off |                  N/A |
| 20%   34C    P8     7W / 220W |      1MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
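For scripted measurements, the same memory figures can also be read in machine-readable form. The sketch below assumes `nvidia-smi` is on the `PATH` and uses its `--query-gpu` and `--format=csv,noheader,nounits` options; the helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical and not part of `heptabot`.

```python
import subprocess
from typing import List, Tuple


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int]]:
    """Parse `memory.used,memory.total` CSV rows (values in MiB, no header or units)."""
    readings = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        readings.append((used, total))
    return readings


def query_gpu_memory() -> List[Tuple[int, int]]:
    """Return (used MiB, total MiB) for every visible GPU by calling nvidia-smi."""
    # stdout=PIPE + universal_newlines keeps this compatible with Python 3.6
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout
    return parse_gpu_memory(out)
```

For the idle card above, the query would return a single row equivalent to `parse_gpu_memory("1, 11178")`, i.e. `[(1, 11178)]`.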

Next, the total amount of used RAM:

In [4]:
!free -h

              total        used        free      shared  buff/cache   available
Mem:            15G        3.9G        7.4G        198M        4.2G         11G
Swap:          7.6G        1.8G        5.8G
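A similar reading can be taken from Python by parsing `/proc/meminfo` (Linux only). This is a sketch: it reports `MemTotal` and `MemAvailable` and treats their difference as memory in use, which approximates but does not exactly reproduce the `used` column of `free`, whose definition differs between `procps` versions. The function names are hypothetical.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map /proc/meminfo field names to their numeric values (KiB for memory sizes)."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # the "kB" unit suffix is dropped
    return fields


def memory_in_use_gib(text: str) -> float:
    """Approximate RAM in use as MemTotal - MemAvailable, converted from KiB to GiB."""
    info = parse_meminfo(text)
    return (info["MemTotal"] - info["MemAvailable"]) / 1024 ** 2


# Usage on a live Linux system:
# with open("/proc/meminfo") as f:
#     print("RAM in use: {:.1f} GiB".format(memory_in_use_gib(f.read())))
```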


And the current CPU tasks:

In [5]:
!top -H -n 1

[?1h=[H[2J[mtop - 00:50:43 up 57 days, 17:45,  2 users,  load average: 6.62, 6.63, 6.69[m[m[m[m[K
Threads:[m[m[1m  38 [m[mtotal,[m[m[1m   1 [m[mrunning,[m[m[1m  37 [m[msleeping,[m[m[1m   0 [m[mstopped,[m[m[1m   0 [m[mzombie[m[m[m[m[K
%Cpu(s):[m[m[1m 42.5 [m[mus,[m[m[1m 32.0 [m[msy,[m[m[1m  1.0 [m[mni,[m[m[1m 24.0 [m[mid,[m[m[1m  0.3 [m[mwa,[m[m[1m  0.0 [m[mhi,[m[m[1m  0.1 [m[msi,[m[m[1m  0.0 [m[mst[m[m[m[m[K
KiB Mem :[m[m[1m 16352884 [m[mtotal,[m[m[1m  8173968 [m[mfree,[m[m[1m  3737368 [m[mused,[m[m[1m  4441548 [m[mbuff/cache[m[m[m[m[K
KiB Swap:[m[m[1m  8000508 [m[mtotal,[m[m[1m  6121240 [m[mfree,[m[m[1m  1879268 [m[mused.[m[m[1m 12089668 [m[mavail Mem [m[m[m[m[K
[K
[7m  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND      [m[m[K
[m    1 root      20   0   20124   3200   3200 S  0.0  0.0   0:00.16 bash         [m[m[K
[

The models can also be placed in CPU RAM instead of on the GPU: to do so, uncomment and execute the code in the following cell. Since we want to test the models on the GPU, this code is left commented out.

In [None]:
# import os

# os.environ["MODEL_PLACE"] = "cpu"
# os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
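The commented-out cell works because environment variables set through `os.environ` in the notebook kernel are inherited by child processes started afterwards, including the `models.py` server launched below. An alternative sketch, which leaves the kernel's own environment untouched, is to build the environment explicitly and pass it to `subprocess.Popen` through its `env` argument. We assume here that `models.py` reads `MODEL_PLACE` at start-up, as the cell above implies; `cpu_only_env` is a hypothetical helper.

```python
import os
import subprocess
import sys


def cpu_only_env(base=None):
    """Return a copy of the environment that hides all GPUs and requests CPU placement."""
    env = dict(os.environ if base is None else base)  # copy, never mutate the original
    env["MODEL_PLACE"] = "cpu"           # assumed to be read by models.py
    env["CUDA_VISIBLE_DEVICES"] = "-1"   # hide every CUDA device from the child
    return env


# Show that a child process sees the variables while os.environ stays unchanged:
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['MODEL_PLACE'])"],
    env=cpu_only_env(), stdout=subprocess.PIPE, universal_newlines=True, check=True,
)
print(child.stdout.strip())  # cpu
```

To start the server on the CPU this way, one could pass `env=cpu_only_env()` to the `subprocess.Popen` call used below; `bash` forwards the environment to `python models.py`.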

Now let's initialize the system. This will also download the `sentence_transformers` model if it has not been downloaded earlier.

In [6]:
%%writefile prompt_run.sh
source ~/mambaforge/etc/profile.d/conda.sh
conda activate heptabot
pyro4-ns &
sleep 5; python models.py &
sleep 70

Writing prompt_run.sh


In [7]:
import os
import subprocess

!chmod +x prompt_run.sh
subprocess.Popen(["/bin/bash", os.path.join(os.path.abspath(os.getcwd()), "prompt_run.sh")])

<subprocess.Popen at 0x7f7a20e633c8>
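Since `subprocess.Popen` returns immediately, the notebook does not itself wait for the models to finish loading; the fixed `sleep` calls in `prompt_run.sh` only delay the steps inside the script. A more explicit check, sketched below, polls the `Pyro4` name server until the `heptabot.heptamodel` name is registered. It uses `Pyro4.locateNS()` and the name server's `lookup()` method, which raise `Pyro4.errors.NamingError` while the name server or the object is not yet available. It assumes that `models.py` registers the object only after the models are loaded; the helper names, timeout and polling interval are our own choices.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Call `predicate` every `interval` seconds until it returns True or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def heptamodel_registered(name: str = "heptabot.heptamodel") -> bool:
    """Return True once `name` can be looked up in the Pyro4 name server."""
    import Pyro4  # imported lazily so that wait_until() works without Pyro4
    import Pyro4.errors
    try:
        with Pyro4.locateNS() as ns:  # the name server proxy is closed after the lookup
            ns.lookup(name)
        return True
    except (Pyro4.errors.NamingError, Pyro4.errors.CommunicationError):
        return False


# Usage in the notebook (blocks until the model is registered or 5 minutes pass):
# assert wait_until(heptamodel_registered), "heptabot model did not come up in time"
```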

Note that the models are now placed on the GPU:

In [8]:
!nvidia-smi

Thu Mar 18 01:50:37 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  GeForce GTX 108...  On   | 00000000:03:00.0 Off |                  N/A |
| 20%   34C    P8     8W / 220W |  10657MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

The model is now served through `Pyro4`, but we still have to connect to it:

In [9]:
import os
import pickle
import Pyro4
import Pyro4.util
from tqdm import tqdm  # progress bar, used by the benchmark cell for non-correction tasks

Heptamodel = Pyro4.Proxy("PYRONAME:heptabot.heptamodel")
batchify, process_batch, result_to_div = Heptamodel.batchify, Heptamodel.process_batch, Heptamodel.result_to_div

Now let's unpack our example texts (each 300 words long, as measured by `nltk.word_tokenize`) and perform the correction. The correction results will be written to the `output` directory.

In [10]:
!mkdir inputs
!mkdir output
!mv ./assets/example_texts.zip .
!unzip -q example_texts.zip -d inputs

texts = {}

for f in os.listdir("inputs"):
  with open(os.path.join("inputs", f), "r", encoding="utf-8") as infile:
    texts[f[:-4]] = infile.read()

task_type = "correction"
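The loading loop above derives each text ID by cutting the last four characters (presumably a `.txt` extension) from the file name. A slightly more defensive sketch uses `os.path.splitext`, skips non-files, and re-checks the stated text length. The word count is delegated to a tokenizer function: in the notebook that would be `nltk.word_tokenize` (which needs the NLTK `punkt` data), while the default here is plain whitespace splitting so the sketch runs without NLTK. `load_texts` is a hypothetical helper.

```python
import os
from typing import Callable, Dict, List


def load_texts(folder: str, tokenize: Callable[[str], List[str]] = str.split) -> Dict[str, str]:
    """Read every file in `folder`, keyed by file name without extension, and print token counts."""
    texts = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        text_id, _ext = os.path.splitext(name)
        with open(path, "r", encoding="utf-8") as infile:
            texts[text_id] = infile.read()
        print(text_id, len(tokenize(texts[text_id])), "tokens")
    return texts


# In the notebook, matching the measurement described above:
# import nltk
# texts = load_texts("inputs", tokenize=nltk.word_tokenize)
```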

Now we actually perform the correction and benchmark the performance. The time needed to process one document is determined as the average over 5 texts with the same number of words.

In [11]:
%%time

# Split each text into model-sized batches (a remote call to the Pyro4 server)
prepared_data = {}
for textid in texts:
    batches, delims = batchify(texts[textid], task_type)
    prepared_data[textid] = (batches, delims)

# Turn the {{ ... }} placeholders of the HTML template into str.format() fields
with open("./templates/result.html", "r") as inres:
    outhtml = inres.read()
outhtml = outhtml.replace("{{ which_font }}", "{0}").replace("{{ response }}", "{1}").replace("{{ task_type }}", "{2}")

processed_texts = {}
which_font = "" if task_type == "correction" else "font-family: Ubuntu Mono; letter-spacing: -0.5px;"
task_str = "text" if task_type == "correction" else "sentences"

for textid in prepared_data:
    batches, delims = prepared_data[textid]
    processed = []

    # Run the model on every batch; show a progress bar only for non-correction tasks
    if task_type != "correction":
        print("Processing text with ID", textid)
        for batch in tqdm(batches):
            processed.append(process_batch(batch))
    else:
        for batch in batches:
            processed.append(process_batch(batch))
    # Flatten the per-batch results and render them as HTML
    plist = [item for subl in processed for item in subl]
    response = result_to_div(texts[textid], plist, delims, task_type)

    proc_html = outhtml.format(which_font, response, task_str)
    with open(os.path.join("output", textid+".html"), "w", encoding="utf-8") as outfile:
        outfile.write(proc_html)

CPU times: user 42.7 ms, sys: 5.4 ms, total: 48.1 ms
Wall time: 1min 44s
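The `%%time` figures above cover all five texts together, so the average per document is 1 min 44 s / 5 = 104 s / 5 ≈ 20.8 s. Note also that `%%time` reports CPU time only for the notebook kernel process: the small CPU time (48.1 ms) reflects that the actual inference runs in the separate `models.py` process reached through `Pyro4`, so wall time is the relevant metric here. To see the spread between documents rather than only the mean, each text can be timed individually. The sketch below uses `time.perf_counter`; `time_per_item` is a hypothetical helper, and `correct_text` in the usage comment is a hypothetical wrapper around the body of the loop above.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, Tuple


def time_per_item(fn: Callable[[str], object], item_ids: Iterable[str]) -> Tuple[Dict[str, float], float]:
    """Run fn(item_id) for each ID and return per-item wall times in seconds and their mean."""
    timings = {}
    for item_id in item_ids:
        start = time.perf_counter()
        fn(item_id)
        timings[item_id] = time.perf_counter() - start
    return timings, statistics.mean(list(timings.values()))


# Hypothetical usage, with correct_text(textid) wrapping the batchify/process_batch/
# result_to_div steps from the cell above:
# per_text, mean_s = time_per_item(correct_text, texts)
# print({k: round(v, 1) for k, v in per_text.items()}, "mean:", round(mean_s, 1), "s")
```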


Let's check if something has changed on the GPU:

In [12]:
!nvidia-smi

Thu Mar 18 01:55:56 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  GeForce GTX 108...  On   | 00000000:03:00.0 Off |                  N/A |
| 40%   70C    P2    79W / 220W |  11001MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

Finally, let's check our RAM and running processes again:

In [13]:
!free -h

              total        used        free      shared  buff/cache   available
Mem:            15G        9.5G        3.4G        234M        2.7G        5.5G
Swap:          7.6G        2.0G        5.7G


In [14]:
!top -H -n 1

[?1h=[H[2J[mtop - 01:55:57 up 57 days, 18:50,  2 users,  load average: 8.41, 7.08, 6.86[m[m[m[m[K
Threads:[m[m[1m 114 [m[mtotal,[m[m[1m   1 [m[mrunning,[m[m[1m 113 [m[msleeping,[m[m[1m   0 [m[mstopped,[m[m[1m   0 [m[mzombie[m[m[m[m[K
%Cpu(s):[m[m[1m 42.5 [m[mus,[m[m[1m 32.0 [m[msy,[m[m[1m  1.0 [m[mni,[m[m[1m 24.0 [m[mid,[m[m[1m  0.3 [m[mwa,[m[m[1m  0.0 [m[mhi,[m[m[1m  0.1 [m[msi,[m[m[1m  0.0 [m[mst[m[m[m[m[K
KiB Mem :[m[m[1m 16352884 [m[mtotal,[m[m[1m  3537236 [m[mfree,[m[m[1m 10002692 [m[mused,[m[m[1m  2812956 [m[mbuff/cache[m[m[m[m[K
KiB Swap:[m[m[1m  8000508 [m[mtotal,[m[m[1m  5943832 [m[mfree,[m[m[1m  2056676 [m[mused.[m[m[1m  5787368 [m[mavail Mem [m[m[m[m[K
[K
[7m  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND      [m[m[K
[m[1m 1785 root      20   0   34516   3592   3168 R  6.2  0.0   0:00.01 top          [m[m[

And that's it – our benchmark ends here!