formula_images memory manager issue #55
> The extra 0.5GiB used between …

Python often holds on to memory instead of releasing it back to the OS, so that it can quickly allocate memory again without having to ask the OS for more. I think that's the most likely explanation for why those logs show memory increasing to 1.5GiB and then staying there.
@LachlanStuart this spare 0.5GiB could be invested in formula images and reduce the number of COS requests. Do you have a suggestion for using it better? How can we force Python to release this memory?
@omerb01 Python releasing the memory won't help. After the formula images are cleared and the temporary buffer for pickling is gone, Python just holds the empty memory so that it doesn't need to waste time returning it to the OS and then re-requesting it later. The issue is that during saving (inside `…`) …
@LachlanStuart I see... at the beginning of the project we needed to use COS with file-like objects too, and I found an open-source GitHub repository which may help, it's called `smart_open`.
@omerb01 I haven't seen that before, but it looks like it would work. It seems to work by making a writable object that starts a multipart upload and pushes a new part every time enough data has been written to it.
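For illustration, a minimal sketch of that pattern (my own sketch, not `smart_open`'s actual implementation), assuming the standard boto3/ibm_boto3 multipart-upload client calls (`create_multipart_upload`, `upload_part`, `complete_multipart_upload`):

```python
import io

class MultipartUploadWriter(io.RawIOBase):
    """File-like sink that streams writes to COS/S3 as multipart-upload parts,
    so only one part's worth of data is buffered in memory at a time."""
    PART_SIZE = 5 * 1024 * 1024  # S3/COS requires parts >= 5 MiB (except the last)

    def __init__(self, s3_client, bucket, key):
        self.s3 = s3_client
        self.bucket, self.key = bucket, key
        self.buffer = bytearray()
        self.parts = []
        self.upload_id = self.s3.create_multipart_upload(
            Bucket=bucket, Key=key)['UploadId']

    def write(self, data):
        self.buffer += data
        if len(self.buffer) >= self.PART_SIZE:
            self._flush_part()
        return len(data)

    def _flush_part(self):
        part_number = len(self.parts) + 1
        resp = self.s3.upload_part(
            Bucket=self.bucket, Key=self.key, UploadId=self.upload_id,
            PartNumber=part_number, Body=bytes(self.buffer))
        self.parts.append({'ETag': resp['ETag'], 'PartNumber': part_number})
        self.buffer.clear()

    def close(self):
        if not self.closed:
            if self.buffer:
                self._flush_part()  # final part may be smaller than PART_SIZE
            self.s3.complete_multipart_upload(
                Bucket=self.bucket, Key=self.key, UploadId=self.upload_id,
                MultipartUpload={'Parts': self.parts})
        super().close()
```

With an object like this, `pickle.dump(obj, writer)` only keeps memory bounded if the pickler actually calls `write()` incrementally, which turns out to be the crux of this issue.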
@LachlanStuart seems that it didn't solve the issue. I used:

```python
import pickle
import ibm_boto3
from smart_open import open

ibm_cos_session = ibm_boto3.session.Session(aws_access_key_id='****',
                                            aws_secret_access_key='****')
transport_params = {
    'session': ibm_cos_session,
    'resource_kwargs': {'endpoint_url': 'https://s3.****.cloud-object-storage.appdomain.cloud'}
}
# bucket, key and self.formula_images come from the surrounding pipeline code
with open(f's3://{bucket}/{key}', 'wb', transport_params=transport_params) as data_stream:
    pickle.dump(self.formula_images, data_stream)
```

Example of an activation log ("memory1" was measured before saving, "memory2" was measured after clearing `formula_images`): …
@omerb01 I dug a bit deeper and found a few frustrating things about Python's pickler: it doesn't flush its output buffer while pickling, and it makes copies of numpy arrays as it serializes them.
Here's my test code in case you want to try it:

```python
import pickle, resource, numpy as np
from scipy.sparse import coo_matrix

class BlackHole:
    """Write-only sink that records how pickle calls write(), without storing data."""
    def __init__(self):
        self.cnt = 0
        self.biggest = 0
        self.total = 0

    def __del__(self):
        print(f'writes: {self.cnt} biggest: {self.biggest} total: {self.total}')

    def write(self, bytes):
        self.cnt += 1
        self.biggest = max(self.biggest, len(bytes))
        self.total += len(bytes)

# Uncomment one of the below datasets
# big_dict = dict((i, coo_matrix(np.arange(10000))) for i in range(10000))  # coo_matrixes
big_dict = dict((i, np.arange(10000)) for i in range(10000))  # numpy arrays
# big_dict = dict((i, list(range(10000))) for i in range(10000))  # pure Python objects

print(f'Max memory usage before: {resource.getrusage(resource.RUSAGE_SELF).ru_maxrss} kiB')

# Uncomment one of the below implementations

# normal pickle (C implementation)
# pickle.dump(big_dict, BlackHole())

# Python implementation
# pickle._dump(big_dict, BlackHole())

# C implementation ("fast mode")
# p = pickle.Pickler(BlackHole())
# p.fast = True
# p.dump(big_dict)
# del p  # needed to trigger BlackHole.__del__

# Python implementation ("fast mode")
# p = pickle._Pickler(BlackHole())
# p.fast = True
# p.dump(big_dict)
# del p  # needed to trigger BlackHole.__del__

print(f'Max memory usage after: {resource.getrusage(resource.RUSAGE_SELF).ru_maxrss} kiB')
```

Note that the "Max memory usage" metric can't be reset, so Python should be restarted after every run of this test. Currently the code needs memory for 3x the size of …

To fix the "never flushing the output buffer" problem, there are a few options; one workaround is sketched below.
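For example, a minimal sketch (my illustration) of one such workaround: serializing the dict in fixed-size chunks, so that no single `dump` call has to buffer the whole payload:

```python
import pickle
from itertools import islice

def dump_dict_in_chunks(d, stream, chunk_size=1000):
    """Pickle `d` as a sequence of smaller dicts appended to the same stream;
    each chunk's output buffer is released before the next chunk is built."""
    items = iter(d.items())
    while True:
        chunk = dict(islice(items, chunk_size))
        if not chunk:
            break
        pickle.dump(chunk, stream, protocol=pickle.HIGHEST_PROTOCOL)

def load_dict_in_chunks(stream):
    """Reassemble a dict written by dump_dict_in_chunks."""
    result = {}
    while True:
        try:
            result.update(pickle.load(stream))
        except EOFError:  # no more pickled chunks in the stream
            break
    return result
```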
To fix the "numpy copies" problem, there are also a few options; one that exists in newer Python versions is sketched below.
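For instance, pickle protocol 5 (PEP 574, available since Python 3.8) supports out-of-band buffers, which let the pickler hand back zero-copy views of array memory instead of copying the bytes into the pickle stream. A minimal sketch, assuming numpy's support for out-of-band pickling:

```python
import pickle
import numpy as np

arr = np.arange(10_000)

# With buffer_callback set, buffer-backed data is handed to the callback as
# PickleBuffer objects (zero-copy views) instead of being copied in-band.
buffers = []
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

# Deserializing requires supplying the same buffers back.
restored = pickle.loads(payload, buffers=buffers)
assert np.array_equal(arr, restored)
```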
For these steps forward, I think we also need to decide which solutions should be implemented in PyWren's …
The "numpy copies" problem has actually already been reported to Python, but it unfortunately seems to be stuck in PR: python/cpython#13036
@LachlanStuart based on your script above, I created a script that prints the actual memory peak when doing an operation, for example pickling objects: https://gist.github.com/JosepSampe/25d2f1bdf8250ec56f4e739d8c2b4e6e

Based on the results, it seems that using fast mode in Python 3.7 & Python 3.8 (either the C or the Python implementation) does not have extra memory consumption:

Python3.6: …
Python3.7: …
Python3.8: …
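As an illustration of that kind of measurement (a minimal sketch, not JosepSampe's actual gist): run the operation in a fresh child process, since `ru_maxrss` is a high-water mark that can't be reset within a process:

```python
import multiprocessing as mp
import pickle
import resource
import numpy as np

def _measure(target, queue):
    # ru_maxrss is reported in kiB on Linux (bytes on macOS)
    before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    target()
    after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    queue.put(after - before)

def peak_memory_increase_kib(target):
    """Run `target` in a fresh process and return its peak-RSS increase."""
    queue = mp.Queue()
    proc = mp.Process(target=_measure, args=(target, queue))
    proc.start()
    proc.join()
    return queue.get()

def pickle_numpy_dict():
    big_dict = {i: np.arange(10000) for i in range(10000)}
    pickle.dumps(big_dict)  # discard the result; we only care about the peak

if __name__ == '__main__':
    print(f'peak increase: {peak_memory_increase_kib(pickle_numpy_dict)} kiB')
```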
@LachlanStuart this is not the same as what you stated in the previous comment, so can you confirm this? In my case …
@JosepSampe I don't have the Python 3.7 environment that I used for that experiment anymore. It's possible I wasn't on the latest patch version (3.7.6), because I just grabbed an existing environment that was already set up. I've re-tested on 3.7.6 and got the same results as you.
@omerb01 what is the status of it? |
@gilv solving this issue requires moving to Python 3.7.
https://github.com/metaspace2020/pywren-annotation-pipeline/blob/095b4dcce9141b0f530e94fd163fe3bf1447ea52/annotation_pipeline/image.py#L39
I haven't succeeded in figuring out why yet, but it seems that something is wrong with the memory manager when it clears the `formula_images` dict. I know that Python's garbage collector runs from time to time and frees unreachable objects, so I even tried to run it explicitly with `gc.collect()`, and it still shows the same output.

Example activation log of `annotate()` ("memory1" is the action's memory before the data is cleared and "memory2" is the action's memory after the data is cleared, measured by `pywren_ibm_cloud.utils.get_current_memory_usage()`): …

In the example, all "memory2" records should be around 1GB instead of 1.5GB.