
Use mmap.PAGESIZE constant as value for first read #505

Merged

Conversation

xavfernandez
Contributor

This size can have a real impact on performance when a lot of files need
to be read.
The performance relies heavily on the file system cache and having the
possibility to adapt _INITIAL_READ_SIZE can help on memory constrained
systems.

@xavfernandez
Contributor Author

With 60000 small files, switching _INITIAL_READ_SIZE from 65535 to 4096 brings

$ python read_profiler.py
Wed Jan 29 22:06:19 2020    generate_latest.proc

         5223058 function calls (5223048 primitive calls) in 14.600 seconds

   Ordered by: cumulative time
   List reduced from 109 to 9 due to restriction <9>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   14.600   14.600 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/exposition.py:71(generate_latest)
        3    0.001    0.000   14.600    4.867 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/registry.py:69(collect)
        1    0.000    0.000   14.599   14.599 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/multiprocess.py:149(collect)
        1    0.000    0.000   14.315   14.315 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/multiprocess.py:35(merge)
        1    0.523    0.523   14.197   14.197 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/multiprocess.py:46(_read_metrics)
    60000    0.326    0.000   12.231    0.000 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/mmap_dict.py:81(read_all_values_from_file)
    60000   10.948    0.000   10.948    0.000 {method 'read' of '_io.BufferedReader' objects}
    60000    0.912    0.000    0.912    0.000 {built-in method io.open}
   403750    0.387    0.000    0.555    0.000 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/mmap_dict.py:25(_read_all_values)

$ python read_profiler.py 
Wed Jan 29 22:07:31 2020    generate_latest.proc

         5223058 function calls (5223048 primitive calls) in 14.737 seconds

   Ordered by: cumulative time
   List reduced from 109 to 9 due to restriction <9>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   14.737   14.737 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/exposition.py:71(generate_latest)
        3    0.001    0.000   14.737    4.912 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/registry.py:69(collect)
        1    0.000    0.000   14.736   14.736 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/multiprocess.py:149(collect)
        1    0.000    0.000   14.568   14.568 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/multiprocess.py:35(merge)
        1    0.532    0.532   14.464   14.464 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/multiprocess.py:46(_read_metrics)
    60000    0.324    0.000   12.480    0.000 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/mmap_dict.py:81(read_all_values_from_file)
    60000   11.277    0.000   11.277    0.000 {method 'read' of '_io.BufferedReader' objects}
    60000    0.836    0.000    0.836    0.000 {built-in method io.open}
   403750    0.387    0.000    0.560    0.000 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/mmap_dict.py:25(_read_all_values)

(with stable timings across multiple runs)

to

$ python read_profiler.py 
Wed Jan 29 22:13:55 2020    generate_latest.proc

         5223058 function calls (5223048 primitive calls) in 11.745 seconds

   Ordered by: cumulative time
   List reduced from 109 to 9 due to restriction <9>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   11.745   11.745 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/exposition.py:71(generate_latest)
        3    0.001    0.000   11.745    3.915 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/registry.py:69(collect)
        1    0.000    0.000   11.744   11.744 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/multiprocess.py:149(collect)
        1    0.000    0.000   11.579   11.579 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/multiprocess.py:35(merge)
        1    0.700    0.700   11.473   11.473 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/multiprocess.py:46(_read_metrics)
    60000    0.412    0.000    8.875    0.000 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/mmap_dict.py:81(read_all_values_from_file)
    60000    7.359    0.000    7.359    0.000 {method 'read' of '_io.BufferedReader' objects}
    60000    1.051    0.000    1.051    0.000 {built-in method io.open}
   403750    0.528    0.000    0.750    0.000 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/mmap_dict.py:25(_read_all_values)

$ python read_profiler.py 
Wed Jan 29 22:13:58 2020    generate_latest.proc

         5223058 function calls (5223048 primitive calls) in 2.063 seconds

   Ordered by: cumulative time
   List reduced from 109 to 9 due to restriction <9>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.063    2.063 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/exposition.py:71(generate_latest)
        3    0.001    0.000    2.062    0.687 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/registry.py:69(collect)
        1    0.000    0.000    2.061    2.061 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/multiprocess.py:149(collect)
        1    0.000    0.000    1.893    1.893 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/multiprocess.py:35(merge)
        1    0.313    0.313    1.784    1.784 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/multiprocess.py:46(_read_metrics)
    60000    0.128    0.000    0.505    0.000 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/mmap_dict.py:81(read_all_values_from_file)
   403750    0.252    0.000    0.367    0.000 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/mmap_dict.py:25(_read_all_values)
   343750    0.149    0.000    0.347    0.000 /home/xafer/.virtualenvs/prometheus_profile/lib/python3.8/site-packages/prometheus_client/metrics_core.py:38(add_sample)
    60000    0.260    0.000    0.260    0.000 {built-in method io.open}

With a strong speedup on the second run thanks to the OS filesystem cache.

I admit that such numbers depend heavily on:

  • number of files
  • how many bytes are used in each file (typically my metric files have a small number of metrics and use much less than 65000 bytes)
  • how much memory is available for the file system cache

Therefore I don't think there is an _INITIAL_READ_SIZE value that is always optimal, but easing its configuration would be useful.
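As a rough illustration of why this matters (a back-of-the-envelope sketch, assuming every initial read actually pulls its full size into the page cache):

# Approximate page-cache footprint of one pass over 60000 small metric files.
files = 60000
for initial_read_size in (65535, 4096):
    footprint = files * initial_read_size
    print(initial_read_size, '->', round(footprint / 1024 ** 3, 2), 'GiB')
# 65535 -> ~3.66 GiB of cached data per pass, versus 4096 -> ~0.23 GiB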

@brian-brazil
Contributor

It's not good practice to poke around internal constants of code. If there's a better value we should change it.

@xavfernandez
Contributor Author

It's not good practice to poke around internal constants of code. If there's a better value we should change it.

I agree it is not ideal.
But I don't think there is a better value for all use cases.

A small _INITIAL_READ_SIZE value will be detrimental to a use case with a medium/large number of big metrics files (since read_all_values_from_file will need to perform two reads).
A big _INITIAL_READ_SIZE value will be detrimental to a use case with a very large number of small files (such as mine), where the amount of file cache used is needlessly large.

I thought about adding a file_usage dict to keep track of the number of bytes used for each filename.
Something like:

diff --git a/prometheus_client/mmap_dict.py b/prometheus_client/mmap_dict.py
index 274b738..257f31f 100644
--- a/prometheus_client/mmap_dict.py
+++ b/prometheus_client/mmap_dict.py
@@ -79,14 +79,16 @@ class MmapedDict(object):
                     self._positions[key] = pos
 
     @staticmethod
-    def read_all_values_from_file(filename):
+    def read_all_values_from_file(filename, file_usage):
+        initial_read_size = file_usage.get(filename, 65535)
         with open(filename, 'rb') as infp:
             # Read the first block of data, including the first 4 bytes which tell us
             # how much of the file (which is preallocated to _INITIAL_MMAP_SIZE bytes) is occupied.
-            data = infp.read(65535)
+            data = infp.read(initial_read_size)
             used = _unpack_integer(data, 0)[0]
             if used > len(data):  # Then read in the rest, if needed.
                 data += infp.read(used - len(data))
+            file_usage[filename] = used
         return _read_all_values(data, used)
 
     def _init_value(self, key):
diff --git a/prometheus_client/multiprocess.py b/prometheus_client/multiprocess.py
index 4f2611b..4ad90b8 100644
--- a/prometheus_client/multiprocess.py
+++ b/prometheus_client/multiprocess.py
@@ -29,6 +29,9 @@ class MultiProcessCollector(object):
         self._path = path
         if registry:
             registry.register(self)
+        # Book-keeper of bytes used in each file
+        # to only read what is needed
+        self.file_usage = {}
 
     @staticmethod
     def merge(files, accumulate=True):
@@ -58,7 +61,7 @@ class MultiProcessCollector(object):
             parts = os.path.basename(f).split('_')
             typ = parts[0]
             try:
-                file_values = MmapedDict.read_all_values_from_file(f)
+                file_values = MmapedDict.read_all_values_from_file(f, self.file_usage)
             except FileNotFoundError:
                 if typ == 'gauge' and parts[1] in ('liveall', 'livesum'):
                     # Those files can disappear between the glob of collect

Would this make more sense?

@brian-brazil
Contributor

Thinking a bit more, I'm not sure that this should make a difference at all. If the file is sparse then its unwritten pages will be handled via the zero page, which doesn't use the file cache.

Do you have benchmarks?

@xavfernandez
Contributor Author

xavfernandez commented Jan 30, 2020

Well my benchmark is kinda primitive.

I create my 60k files by launching this Python program 60k times:

import os
os.environ['prometheus_multiproc_dir'] = 'tmp'

import random

import prometheus_client


METRIC = prometheus_client.Summary(
    'some_duration_seconds',
    'time spent somewhere',
    labelnames=('label1', 'second_label'),
    namespace='profile',
    subsystem='lot_of_files',
)

METRIC.labels(random.choice(('success', 'error')), f'test{random.randint(0, 10)}').observe(random.random())
METRIC.labels(random.choice(('success', 'error')), f'test{random.randint(0, 10)}').observe(random.random())
METRIC.labels(random.choice(('success', 'error')), f'test{random.randint(0, 10)}').observe(random.random())

and I time it with a simplified version of what the library performs:

# This is read_simple.py
import os
import sys
import time

BLOCK_SIZE = int(sys.argv[1])
print('BLOCK_SIZE', BLOCK_SIZE)

for i in range(10):
    start = time.perf_counter()
    for filename in os.listdir('tmp'):
        if filename.endswith('.db'):
            with open(os.path.join('tmp', filename), 'rb') as infp:
                data = infp.read(BLOCK_SIZE)
    print(time.perf_counter() - start)

Before each run, I wipe the file cache with echo 1 > /proc/sys/vm/drop_caches.
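For reproducibility, the same wipe can be scripted (a sketch; dropping caches is Linux-specific and requires root):

# Drop the Linux page cache before a cold-cache run, mirroring the manual
# `echo 1 > /proc/sys/vm/drop_caches` step above. Requires root.
import subprocess

def drop_caches():
    subprocess.run(['sync'], check=True)
    with open('/proc/sys/vm/drop_caches', 'w') as f:
        f.write('1\n')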


With a bunch of free memory (~ 26 GB) and a read size of 65535 bytes:

$ ls tmp/ | wc -l
60000
$ free
              total        used        free      shared  buff/cache   available
Mem:       32537864     5188908    26183948      668028     1165008    26234928
Swap:             0           0           0
$ python read_simple.py 65535
BLOCK_SIZE 65535
12.427678888998344
0.9561608399963006
0.8500389300024835
0.8534953799971845
0.8507285489904461
0.8569888329948299
0.8777932550001424
0.8816033009934472
0.8621571760013467
0.8545428220095346

With a bunch of free memory (~ 26 GB) and a read size of 4096 bytes:

$ free
              total        used        free      shared  buff/cache   available
Mem:       32537864     5148264    26250408      645920     1139192    26298164
Swap:             0           0           0
$ python read_simple.py 4096
BLOCK_SIZE 4096
10.32166458798747
0.4410311039973749
0.39926853599899914
0.4384756700019352
0.4004494149994571
0.40352619100303855
0.3965474760043435
0.3989230669976678
0.39654788600455504
0.4002341330051422

Now with ~ 2 GB of free memory and a read size of 65535 bytes:

$ free
              total        used        free      shared  buff/cache   available
Mem:       32537864    29285200     2293668      663724      958996     2171332
Swap:             0           0           0
$ python read_simple.py 65535
BLOCK_SIZE 65535
12.443866290006554
11.633168715998181
12.085821863001911
11.749885090990574
12.8851961879991
12.385939428990241
11.35605888698774
11.659072276990628
11.294882508998853
11.368941648004693

And finally with ~ 2 GB of free memory and a read size of 4096 bytes:

$ free
              total        used        free      shared  buff/cache   available
Mem:       32537864    29260832     2405596      646916      871436     2247648
Swap:             0           0           0
$ python read_simple.py 4096
BLOCK_SIZE 4096
12.362105008010985
0.40657658501004335
0.3870056059968192
0.4089814879989717
0.381813013998908
0.3841210550017422
0.38265788700664416
0.38066420500399545
0.3836389770003734
0.3772799519938417

And sure enough, if I constrain the system even further so that less than 500 MB of memory is free, the performance is bad even with 4096:

$ free
              total        used        free      shared  buff/cache   available
Mem:       32537864    31400812      335304      621992      801748      155100
Swap:             0           0           0
$ python read_simple.py 4096
BLOCK_SIZE 4096
11.380508398986422
10.212263110995991
12.595423141989158
11.597996773009072
9.844564571001683
11.246935460992972
11.166533361989423
10.640836154998397
10.942926752992207
11.625870076008141

And in the same way, even with 25 GB of available memory, the performance suffers if I push the read size to 655350 bytes (~650 kB):

$ free
              total        used        free      shared  buff/cache   available
Mem:       32537864     5259756    26470680      632860      807428    26287820
Swap:             0           0           0
$ python read_simple.py 655350
BLOCK_SIZE 655350
22.036876083991956
20.698863850993803
23.11240423300478
21.609464114997536
21.656152483003098
22.610070716997143
22.320307180998498
24.78865629399661
23.689863965002587
20.960628273998736

@brian-brazil
Contributor

brian-brazil commented Jan 30, 2020 via email

@xavfernandez
Contributor Author

But why 4k and not 1k or 8k? Depending on the number of files, the number of bytes used and the available memory, surely the optimum value might differ?

@brian-brazil
Contributor

4k is the page size, so lower than that won't help. The performance numbers for 4k are acceptable, so I don't see a need to consider 8k - unless you've more data.
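For reference, the page size can be checked from Python; on a typical x86-64 Linux box both of these print 4096:

import mmap
import os

print(mmap.PAGESIZE)               # the page size constant exposed by the mmap module
print(os.sysconf('SC_PAGE_SIZE'))  # the same value as reported by the OS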

This size can have a real impact on performance when a lot of files need
to be read.
The performance relies heavily on the file system cache so reading as
few pages as possible helps a lot on memory-constrained systems.

Signed-off-by: Xavier Fernandez <xavier.fernandez@polyconseil.fr>
@xavfernandez
Contributor Author

4k is the page size, so lower than that won't help.
Yup, that's why I planned to use it (by monkey-patching _INITIAL_READ_SIZE).
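Roughly, that workaround would have looked like this (a sketch, assuming the constant stays module-level and is looked up at call time, as in the first version of this PR):

# Override the initial read size before any multiprocess collection happens.
# Assumes mmap_dict looks _INITIAL_READ_SIZE up at call time rather than
# capturing it at import.
import mmap

from prometheus_client import mmap_dict

mmap_dict._INITIAL_READ_SIZE = mmap.PAGESIZE  # 4096 on most systems, instead of 65535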

The performance numbers for 4k are acceptable, so I don't see a need to consider 8k - unless you've more data.

I tried to generate less but bigger files with:

for i in range(180):
    METRIC.labels(random.choice(('success', 'error')), f'test{i}').observe(random.random())

The resulting files ended up using ~58 kB each.

And I updated the read_simple.py script to perform a second read when needed:

import os
import struct
import sys
import time

BLOCK_SIZE = int(sys.argv[1])
print('BLOCK_SIZE', BLOCK_SIZE)

_unpack_integer = struct.Struct(b'i').unpack_from

for i in range(10):
    start = time.perf_counter()
    for filename in os.listdir('tmp'):
        if filename.endswith('.db'):
            with open(os.path.join('tmp', filename), 'rb') as infp:
                data = infp.read(BLOCK_SIZE)
                used = _unpack_integer(data, 0)[0]
                if used > len(data):  # Then read in the rest, if needed.
                    data += infp.read(used - len(data))
    print(time.perf_counter() - start)

(which means a 4096-byte first read needs a second read of roughly 54 kB for these files, while 65535 covers them in a single read).

With very little RAM, the performance is slightly better with 65535 but not dramatically so:

$ free
              total        used        free      shared  buff/cache   available
Mem:       32537864    31451820      230012      757992      856032        8264
Swap:             0           0           0
$ python read_simple.py 4096
BLOCK_SIZE 4096
0.850281688006362
0.8136642899917206
0.8808399739937158
0.908141173000331
0.9539070020109648
0.9785966670024209
1.115910766995512
0.9385855119908229
0.8873319859994808
0.8499457229918335

$ free
              total        used        free      shared  buff/cache   available
Mem:       32537864    31455976      232368      750908      849520       11492
Swap:             0           0           0
$ python read_simple.py 65535
BLOCK_SIZE 65535
0.633211662992835
0.6896844110015081
0.680830130004324
0.6960915100062266
0.762219433003338
0.6661083700018935
0.6907059020013548
0.7428541709959973
0.6809142359998077
0.6796893150021788

And with a lot of RAM, as expected, there is little impact:

$ python read_simple.py 65535
BLOCK_SIZE 65535
1.1218670280068181
0.057647009001811966
0.031712533003883436
0.03322139800002333
0.03348920198914129
0.03354687200044282
0.03345788500155322
0.032953472997178324
0.03325909799605142
0.03286989399930462

$ python read_simple.py 4096
BLOCK_SIZE 4096
1.152369550007279
0.05512778200500179
0.03482597500260454
0.03846739999426063
0.038542508002137765
0.03865903599944431
0.03924053700757213
0.0384809130046051
0.03836918300657999
0.038300098007312045

So 4096 seems like a good compromise 👍

I updated the PR to use mmap.PAGESIZE.
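For reference, with that change the read path looks roughly like this (a sketch mirroring the diff quoted earlier; _read_all_values is the helper that shows up in the profiles above):

import mmap
import struct

from prometheus_client.mmap_dict import _read_all_values

_unpack_integer = struct.Struct(b'i').unpack_from
_INITIAL_READ_SIZE = mmap.PAGESIZE  # typically 4096, instead of the old hard-coded 65535

def read_all_values_from_file(filename):
    with open(filename, 'rb') as infp:
        # First read: one page, which covers the 4-byte "used" header and, for
        # small files, the whole payload.
        data = infp.read(_INITIAL_READ_SIZE)
        used = _unpack_integer(data, 0)[0]
        if used > len(data):  # Second read only when the file uses more than one page.
            data += infp.read(used - len(data))
    return _read_all_values(data, used)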

@xavfernandez xavfernandez changed the title Extract initial read size in a constant Use mmap.PAGESIZE constant as value for first read Jan 30, 2020
@brian-brazil brian-brazil merged commit e583016 into prometheus:master Jan 31, 2020
@brian-brazil
Contributor

Yip, that seems reasonable. Thanks!

@xavfernandez xavfernandez deleted the xfernandez/extract_constant branch January 31, 2020 09:58
@xavfernandez
Contributor Author

Thanks for merging :)
Any idea when the next version will be released?

@brian-brazil
Contributor

I don't think there's anything pressing for a release currently; I'll probably wait until the OpenMetrics stuff is a bit more finalised.
