This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

question: Do I need to use ray.init() before using the Ray Accelerator? #7

Closed
MohammedAljahdali opened this issue Mar 6, 2021 · 12 comments

Comments

@MohammedAljahdali

Hey, thank you for creating a much-needed library.

I am very new to Ray, and I already had a project built around PL. I looked around for how to add the Ray distributed training backend to my project, and I found this library, which lets me keep using the PL Trainer.

Now I am trying to use the accelerator on my local machine, but I have failed to do so. I think it's a really simple issue caused by my lack of knowledge.

This is the bit where I add the accelerator:

if accelerator_use:
    ray.init()
    accelerator = RayAccelerator(num_workers=4, cpus_per_worker=1, use_gpu=True)
else:
    accelerator = None

I tried without calling ray.init() and got an error, and when I add ray.init() I get this:

2021-03-06 14:54:45,217 WARNING worker.py:1107 -- The actor or task with ID ffffffffffffffff63964fa4841d4a2ecb45751801000000 cannot be scheduled right now. It requires {CPU: 1.000000}, {GPU: 1.000000} for placement, but this node only has remaining {7.000000/8.000000 CPU, 7.177734 GiB/7.177734 GiB memory, 0.000000/1.000000 GPU, 1.000000/1.000000 node:172.20.10.2, 2.441406 GiB/2.441406 GiB object_store_memory}
. In total there are 0 pending tasks and 6 pending actors on this node. This is likely due to all cluster resources being claimed by actors. To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster. You can ignore this message if this Ray cluster is expected to auto-scale.
@amogkam
Collaborator

amogkam commented Mar 7, 2021

Hey @MohammedAljahdali, thanks for the question. Yes, you need to call ray.init() before instantiating the accelerator.

The error you are seeing is, I think, due to the number of GPUs available on your machine. It seems there is only 1 GPU on your machine, but you are using 4 workers, and since use_gpu is set to True, each worker will attempt to reserve 1 GPU. That is only feasible if there are at least 4 GPUs on your machine.

@amogkam
Collaborator

amogkam commented Mar 7, 2021

Actually, it is pretty simple to automatically call ray.init() in the __init__ of the accelerators if Ray is not already initialized. Thanks for bringing this up; I will make that change to this library!

@MohammedAljahdali
Author

Thank you for the help, and I'm happy my question inspired a change.

Currently I am facing a new issue, and I think it's related to Ray itself: whenever I run any program with Ray, the program never gets executed and the script just ends. I ran into this when using the accelerator and when using the tuner, each on its own.
I also tried this example and the same issue happened:

    import time
    import ray

    ray.init()

    @ray.remote
    def do_some_work(x):
        time.sleep(1)  # Replace this with the work you need to do.
        return x

    start = time.time()
    results = [do_some_work.remote(x) for x in range(4)]
    print("duration =", time.time() - start)
    print("results = ", results)

Also, the dashboard does not open for me.

I will open an issue on the Ray repo.

@amogkam
Collaborator

amogkam commented Mar 7, 2021

What do you mean that the program never gets executed? Are the print statements not being printed out?

Also, just an FYI: calling do_some_work.remote() returns immediately with an object reference (a future). You then have to call ray.get(...) on the returned reference to actually block until the function finishes and get the real result. You can find more information here: https://docs.ray.io/en/master/ray-overview/index.html#parallelizing-python-java-functions-with-ray-tasks.

Also, could you describe your dashboard issue further? When you run ray.init(), the dashboard should show up at http://localhost:8265.

And one last thing: can you make sure you are using the latest version of Ray? Thanks.

@MohammedAljahdali
Author

I am running version 1.2.0, which I think is the latest. As for programs not being executed: yes, the print statements are not printed out. Thank you for the clarification about the example, but to be honest I just copied it to check whether the issue was with Ray or with my main script. I tried opening the dashboard at http://localhost:8265, but it did not open either; I get "This site can't be reached". The link printed in the command line is http://127.0.0.1:8265.

@amogkam
Collaborator

amogkam commented Mar 7, 2021

Hmm the print statements not being shown is odd. Is there any output being shown at all that you can share?

@MohammedAljahdali
Author

I just reinstalled Ray and nothing changed. This is the output of running the example above:

(dl_env) G:\AI_projects\DL_Projects\ArabicText\ArabicText>python try_ray.py
2021-03-07 11:08:12,082 INFO services.py:1172 -- View the Ray dashboard at http://127.0.0.1:8265

(dl_env) G:\AI_projects\DL_Projects\ArabicText\ArabicText>

@amogkam
Collaborator

amogkam commented Mar 7, 2021

@MohammedAljahdali wow ok this is really odd. Could you post this as an issue on the main ray repo? It'll be easier to get more help there.


@amogkam
Copy link
Collaborator

amogkam commented Mar 7, 2021

@MohammedAljahdali can you post a bug report here as well- https://github.com/ray-project/ray/issues

@MohammedAljahdali
Author

Yeah, sure. Thank you a lot for the help.

@MohammedAljahdali
Author

I opened an issue; please check it out: ray-project/ray#14522
