[Actor] Possible extra memory consumption #37291

Open
zhaolazy opened this issue Jul 11, 2023 · 1 comment
Labels
bug (Something that is supposed to be working; but isn't), core (Issues that should be addressed in Ray Core), core-scheduler, P1 (Issue that should be fixed within a few weeks), stability

Comments

@zhaolazy

What happened + What you expected to happen

When using Ray actors, I have noticed that Ray occupies memory that cannot be released. For example, a numpy array in my script takes approximately 3.8 GB, but profiling shows that the Ray actor consistently occupies roughly twice that amount. This can easily lead to out-of-memory (OOM) errors in high-concurrency scenarios. Is there a good way to release this memory?

(Actor2 pid=24216) Line #    Mem usage    Increment  Occurrences   Line Contents
(Actor2 pid=24216) =============================================================
(Actor2 pid=24216)     34    125.1 MiB    125.1 MiB           1       @profile
(Actor2 pid=24216)     35                                             def get(self):
(Actor2 pid=24216)     36    125.9 MiB      0.8 MiB           1           arrays = self._driver.gen()
(Actor2 pid=24216)     37                                                 # array = copy.deepcopy(array)
(Actor2 pid=24216)     38    125.9 MiB      0.0 MiB           8           arrays = [array for array in arrays]
(Actor2 pid=24216)     39   7754.7 MiB   7628.8 MiB           1           ret = np.vstack(arrays)
(Actor2 pid=24216)     40   7754.7 MiB      0.0 MiB           1           del arrays
(Actor2 pid=24216)     41   7754.7 MiB      0.0 MiB           1           print(ret.nbytes/1024/1024)
(Actor2 pid=24216)     42   7754.7 MiB      0.0 MiB           1           return ret
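
My current understanding (an assumption on my part, not confirmed): ray.get deserializes large numpy arrays zero-copy, so every element of arrays is a read-only view backed by the shared-memory object store, and np.vstack then materializes a second, private copy on the actor heap. Both are resident at the same time, which would explain the jump to ~7.7 GiB above. A standalone sketch to check this, separate from the reproduction script below (the exact flag values may differ across Ray versions):

import numpy as np
import ray

ray.init()

@ray.remote
def gen():
    return np.random.rand(10_000_000)  # ~80 MB

arr = ray.get(gen.remote())
print(arr.flags["OWNDATA"])      # False: backed by the object store, not the heap
print(arr.flags["WRITEABLE"])    # False: zero-copy, read-only view
stacked = np.vstack([arr])       # forces a full private copy on the heap
print(stacked.flags["OWNDATA"])  # True: this copy lives in the process heap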

  ##### HOW TO RELEASE THIS 3.9GB? ####
(Actor2 pid=29293) Line #    Mem usage    Increment  Occurrences   Line Contents
(Actor2 pid=29293) =============================================================
(Actor2 pid=29293)     34   3941.8 MiB   3941.8 MiB           1       @profile
(Actor2 pid=29293)     35                                             def get(self):
(Actor2 pid=29293)     36   3941.8 MiB      0.0 MiB           1           arrays = self._driver.gen()
(Actor2 pid=29293)     37                                                 # array = copy.deepcopy(array)
(Actor2 pid=29293)     38   3941.8 MiB      0.0 MiB           8           arrays = [array for array in arrays]
(Actor2 pid=29293)     39   7756.5 MiB   3814.7 MiB           1           ret = np.vstack(arrays)
(Actor2 pid=29293)     40   7756.5 MiB      0.0 MiB           1           del arrays
(Actor2 pid=29293)     41   7756.5 MiB      0.0 MiB           1           print(ret.nbytes/1024/1024)
(Actor2 pid=29293)     42   7756.5 MiB      0.0 MiB           1           return ret
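
The ~3.9 GiB that is already resident at the start of this second call looks like the np.vstack copy from the previous call, which the allocator may be holding on to instead of returning to the OS. A hedged experiment I can run after each call (glibc/Linux only, not a Ray API, and not a confirmed fix):

import ctypes
import gc

def trim_heap():
    # Drop unreachable Python objects first, then ask glibc to return
    # free heap pages to the OS. glibc-specific; assumes a Linux worker.
    gc.collect()
    ctypes.CDLL("libc.so.6").malloc_trim(0)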

Versions / Dependencies

Ray 2.4
Python 3.8
Ubuntu 20.04 (Linux)

Reproduction script

import ray
import numpy as np
import time
from memory_profiler import profile

class Driver:
    def __init__(self) -> None:
        self._actors = [
            Actor1.remote()
            for i in range(5)
        ]

    def gen(self):
        return ray.get([
            actor.gen.remote()
            for actor in self._actors
        ])

@ray.remote
class Actor1:
    def gen(self):
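        # 100_000_000 float64 values: ~763 MiB per actor, ~3.8 GiB across the 5 Actor1 instances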
        self._x = np.random.rand(100000000)
        return self._x


@ray.remote
class Actor2:
    def __init__(self, driver: Driver):
        self._driver = driver
    
    @profile
    def get(self):
        arrays = self._driver.gen()
        arrays = [array for array in arrays]
        ret = np.vstack(arrays)
        del arrays
        print(ret.nbytes/1024/1024)
        return ret
    
    @profile
    def x(self):
        time.sleep(10)
        return 1+1


if __name__ == "__main__":
    configs = {
        "memory_monitor_refresh_ms": 0,
        "memory_usage_threshold": 1,
        "free_objects_period_milliseconds": 0,
    }
    ray.init(_system_config=configs)

    driver = Driver()
    a2_ref = Actor2.remote(driver)

    while True:
        ray.get(a2_ref.get.remote())
        ray.get(a2_ref.x.remote())
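
A hedged variant of Actor2.get that I am considering (it reuses ray, np, and Driver from the script above and is not verified to lower peak RSS, since the shared-memory mappings may still be counted against the actor): copy each object-store-backed chunk into a preallocated buffer and drop the reference immediately, instead of keeping every chunk pinned until np.vstack finishes.

@ray.remote
class Actor2Chunked:
    def __init__(self, driver: Driver):
        self._driver = driver

    def get(self):
        arrays = self._driver.gen()   # read-only, object-store-backed views
        rows = len(arrays)
        cols = arrays[0].shape[0]
        ret = np.empty((rows, cols), dtype=arrays[0].dtype)
        for i in range(rows):
            ret[i, :] = arrays[i]     # copy one chunk into the private buffer
            arrays[i] = None          # drop the view so the object can be evicted
        del arrays
        return ret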

Issue Severity

None

@zhaolazy added the bug (Something that is supposed to be working; but isn't) and triage (Needs triage, e.g. priority, bug/not-bug, and owning component) labels on Jul 11, 2023
@zhaolazy
Author

Could anybody please help me? 😢

@jjyao added the core (Issues that should be addressed in Ray Core) label on Jul 17, 2023
@rkooo567 added the P1 (Issue that should be fixed within a few weeks) label and removed the triage label on Jul 17, 2023