Problem with Openfl Gramine "error: PAL failed at ../../Pal/src/db_main.c:pal_main:513 (exitcode = 4, reason=No 'loader.entrypoint' is specified in the manifest)" #503
Comments
Hey! Thanks for reporting this. This issue has been addressed, and I believe it should be gone in the next OpenFL release.
I agree with your note regarding step 7 in the manual, but it would also be the wrong place to explain the certification process.
I tried the latest build, "openfl 1.4", and to run openfl-gramine I had to install openfl 1.4 in the Docker image as well. For that I changed the file "*venv/lib/python3.8/site-packages/openfl-gramine/Dockerfile.gramine", but after that I got a new error.
I have tried it with multiple templates and the error persists there as well.
Hi @gagandeep987123, could you specify the command that led to the error above? Just looking through the error, it looks like either you don't have root permissions or there is no space left on the device (df -kh .)
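The two conditions suggested above (missing root permissions, or a full disk) can be checked quickly before rebuilding. The snippet below is a minimal sketch using only the Python standard library; the function name `check_environment` is hypothetical and is not part of OpenFL or Gramine.

```python
import os
import shutil


def check_environment(path="."):
    """Report free space under `path` and whether we run as root.

    Hypothetical helper for illustration only; not an OpenFL or Gramine API.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    # os.geteuid() exists only on Unix-like systems, so guard the call.
    is_root = hasattr(os, "geteuid") and os.geteuid() == 0
    return {"free_gib": usage.free / 2**30, "is_root": is_root}


if __name__ == "__main__":
    report = check_environment(".")
    print(f"free space: {report['free_gib']:.1f} GiB, root: {report['is_root']}")
```

This reports roughly the same information as `df -kh .` plus an effective-UID check; how much free space the image build actually needs is not stated in this thread.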
Hi @mansishr
Also, I made other changes to the "*venv/lib/python3.8/site-packages/openfl-gramine/Dockerfile.gramine"
I am using the latest release from GitHub for the openfl installation. I am also attaching the output from running the script (please remove .pdf from output.pdf).
I also got this. It looks like TensorFlow tries to create a temporary directory somewhere inside an enclave, which is not a good idea in the first place.
In order for TF to create a directory at runtime inside an enclave, we would need to mount that directory from the host area to the enclave, something like this. For the syntax that we currently support with OpenFL, the lines below can be added to the manifest here:
It is strange that we need to mount the temp folder and cannot just allow using it inside an enclave. I am positive it worked before 😅😅
So I made the following changes
But I am still getting the same error.
It should be done in a different way. We need to add that mounting line to the Gramine manifest template. I will try to do this in a separate branch. I am still not sure whether it is safe to mount /tmp into the enclave, so I will ask the Gramine guys.
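For readers unfamiliar with what such a mounting line looks like, here is a minimal sketch that renders one. It assumes Gramine's older per-key `fs.mount.<name>.type/path/uri` manifest syntax; the exact lines used in the OpenFL manifest template are not shown in this thread, and the helper name `render_mount` is hypothetical.

```python
def render_mount(name, path, uri, fs_type="chroot"):
    """Render a host-to-enclave mount entry as manifest text.

    Assumption: the legacy Gramine `fs.mount.<name>.*` key syntax.
    Hypothetical helper, not part of OpenFL or Gramine.
    """
    return "\n".join([
        f'fs.mount.{name}.type = "{fs_type}"',
        f'fs.mount.{name}.path = "{path}"',  # path as seen inside the enclave
        f'fs.mount.{name}.uri = "{uri}"',    # host location being mounted
    ])


# Example: expose the host /tmp at /tmp inside the enclave.
snippet = render_mount("tmp", "/tmp", "file:/tmp")
print(snippet)
```

Whether mounting `/tmp` is acceptable from a security standpoint is a separate question, as the surrounding comments point out.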
Try this branch, it worked for me!
Thanks for a working branch @igor-davidyuk. Ideally, even allowing the /tmp directory is unsafe, but since we already list it as an allowed file in the manifest for this example, it should be okay to mount it as well. We should definitely consult the Gramine team, too.
Yes, it no longer stops at that step 😄 but it stops a bit further on. With the new branch the aggregator starts, but the example itself gives an error.
The above error keeps happening in a loop.
Hi @gagandeep987123, can you try out the example with "torch_unet_kvasir_gramine_ready"? We are aware of the issue of the hash not being valid (it came up pretty recently). We'll resolve this soon, but in the meantime, please comment out this line and proceed. Regarding the multiprocessing issue that you see, it is a known issue that Python's
Hi @mansishr, I am getting the same error.
Hi @gagandeep987123, sorry for the late response. Multiprocessing is getting triggered through the use of TensorBoard's summary writer. Please disable
@mansishr Is it working for you? I am still getting the error. I just changed the file as you suggested via a change in Dockerfile.gramine
Hi @gagandeep987123, could you attach the files where you made the changes? Also, you would need to run through all the steps again and rebuild the image after any changes to the plan.
Hi @gagandeep987123, please let us know whether the issue got resolved.
It is working. Thanks for the help.
Describe the bug
I am attempting to run the FL example as given in the manual and am getting this error on the aggregator.
Also, step 7 in the manual is presented somewhat vaguely for a first-time user.
I used the setup given here as a workspace and template, but it gave the above error when I tried to start the federation on the aggregator machine.