TEEC_InvokeCommand(forward) failed when running fl_tee_layerwise.sh on HiKey 960 #3

Open
HenryHu2000 opened this issue Apr 16, 2023 · 4 comments

Comments

@HenryHu2000
Contributor

HenryHu2000 commented Apr 16, 2023

Hello @mofanv,
I attempted to run fl_tee_layerwise.sh on a HiKey 960, the same board used in the original PPFL paper, "PPFL: Privacy-preserving Federated Learning with Trusted Execution Environments". However, I'm getting TEEC_InvokeCommand(forward) failed 0xffff3024 origin 0x3, the same error reported in mofanv/darknetz#14 and mofanv/darknetz#29. Other scripts, such as fl_tee_standard_noss.sh and fl_tee_standard_ss.sh, run correctly.

The cfg files required by fl_tee_layerwise.sh (greedy-cnn-aux.cfg, greedy-cnn-layer1.cfg, greedy-cnn-layer2.cfg, greedy-cnn-layer3.cfg and mnist_greedy-cnn.cfg) do not exist under tz_datasets/cfg, so I copied them manually from PPFL/server_side_sgx/cfg.
Error log:

  ============= initialization =============
  ============= layer 1 =============
  ============= round 1 =============
  ============= copy weights server -> client 1 =============
  Warning: Permanently added '[127.0.0.1]:8888' (ECDSA) to the list of known hosts.
  
  real    0m1.711s
  user    0m0.008s
  sys     0m0.000s
  tee weights: 82356 Bytes
  ============= ssh to the client and local training =============
  layer     filters    size              input                output
      0 conv_TA    2  3 x 3 / 1    32 x  32 x   3   ->    32 x  32 x   2  0.000 BFLOPs
      1 conv_TA    2  3 x 3 / 1    32 x  32 x   2   ->    32 x  32 x   2  0.000 BFLOPs
      2 connected_TA                         2048  ->    10
  Prepare session with the TA
  Begin darknet
  mnist_greedy-cnn
  1
  workspace_size=110592
      3 softmax_TA                                       10
      4 cost_TA                                          10
  Loading weights from /root/models/mnist/mnist_greedy-cnn_global.weights...Done!
  Learning Rate: 0.01, Momentum: 0.9, Decay: 5e-05
  3000
  32 28
  output file: /media/results/train_mnist_greedy-cnn_pps0_ppe4.txt
  current_batch=10 
  Loaded: 0.003913 seconds
  darknetp: TEEC_InvokeCommand(forward) failed 0xffff3024 origin 0x3
  
  real    0m1.594s
  user    0m0.003s
  sys     0m0.005s

I checked mofanv/darknetz#14 and mofanv/darknetz#29 and tried increasing TA_STACK_SIZE and TA_DATA_SIZE in ta/include/user_ta_header_defines.h to the values below, but I still get the error. I cannot increase them further, because that causes a TEEC_Opensession failed with code 0xffff000c origin 0x3 error, as in mofanv/darknetz#32.

/* Provisioned stack size */
#define TA_STACK_SIZE			(1 * 1024 * 1024)

/* Provisioned heap size for TEE_Malloc() and friends */
#define TA_DATA_SIZE			(12 * 1024 * 1024)
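For reference, the two result codes mentioned above can be mapped to their GlobalPlatform TEE Client API names. The helper below is only a lookup sketch: the function names are hypothetical, and only the codes that appear in this issue are covered. Note that 0xffff3024 (TEEC_ERROR_TARGET_DEAD) means the TA died during the call, which by itself does not say why, whereas 0xffff000c is an explicit out-of-memory result.

```shell
#!/bin/sh
# Hypothetical helpers: map the result and origin codes seen in this issue
# to their GlobalPlatform TEE Client API names. Other codes are not covered.

decode_teec_result() {
    case "$1" in
        0xffff000c|0xFFFF000C) echo "TEEC_ERROR_OUT_OF_MEMORY" ;;
        0xffff3024|0xFFFF3024) echo "TEEC_ERROR_TARGET_DEAD" ;;
        *) echo "unknown" ;;
    esac
}

decode_teec_origin() {
    case "$1" in
        0x1) echo "TEEC_ORIGIN_API" ;;
        0x2) echo "TEEC_ORIGIN_COMMS" ;;
        0x3) echo "TEEC_ORIGIN_TEE" ;;
        0x4) echo "TEEC_ORIGIN_TRUSTED_APP" ;;
        *) echo "unknown" ;;
    esac
}
```

For example, `decode_teec_result 0xffff3024` and `decode_teec_origin 0x3` describe the forward failure in the log, and `decode_teec_result 0xffff000c` describes the session-open failure.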

I isolated the failing command, darknetp classifier train -pp_start_f 0 -pp_end 4 -ss 2 "cfg/mnist.dataset" "cfg/mnist_greedy-cnn.cfg" "/root/models/mnist/mnist_greedy-cnn_global.weights", and ran it manually on the client. It fails with -pp_start_f 0 -pp_end 4 but runs with -pp_start_f 0 -pp_end 3, so layer 4 appears to be the one that cannot fit into TEE memory.
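The manual bisection over -pp_end could be scripted roughly as follows. This is a sketch under assumptions: it assumes darknetp exits with a non-zero status when TEEC_InvokeCommand fails (the log line has the shape of an errx message), that every value in the tested range is a valid -pp_end, and that waiting for each successful training run to finish is acceptable. The DARKNETP variable and the function name are hypothetical, added only so the loop can be dry-run against a stub.

```shell
#!/bin/sh
# Sketch: report the smallest -pp_end in [first, last] for which training fails.
# DARKNETP is a hypothetical override; by default the real binary is used.
DARKNETP="${DARKNETP:-darknetp}"

first_failing_pp_end() {
    pp_end="$1"
    last="$2"
    while [ "$pp_end" -le "$last" ]; do
        # Same arguments as the isolated command; only -pp_end varies.
        if ! "$DARKNETP" classifier train -pp_start_f 0 -pp_end "$pp_end" -ss 2 \
            "cfg/mnist.dataset" "cfg/mnist_greedy-cnn.cfg" \
            "/root/models/mnist/mnist_greedy-cnn_global.weights" >/dev/null 2>&1; then
            echo "$pp_end"
            return 0
        fi
        pp_end=$((pp_end + 1))
    done
    echo "none"
}
```

Calling `first_failing_pp_end 1 4` on the client would then print the first failing value, or `none` if every run in the range succeeds.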

Do you know what the original configuration used in PPFL: Privacy-preserving Federated Learning with Trusted Execution Environments was? Thank you!

@mofanv
Owner

mofanv commented Apr 20, 2023

Hi @HenryHu2000, the TEEC_InvokeCommand(forward) failed 0xffff3024 origin 0x3 error is typically caused by secure memory limits: the TA runs out of memory when a layer's weight matrix is created during the forward pass. In your test, however, the layers are quite small and do not seem large enough to trigger this problem.
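A rough size check supports the observation that the layers are small. The sketch below uses stock Darknet conventions (4-byte floats, an im2col workspace of out_h * out_w * size * size * channels floats for a convolutional layer); that the darknetz TA follows the same conventions, and that the weights file carries a 20-byte header, are assumptions. Under them, the results match the "tee weights: 82356 Bytes" and "workspace_size=110592" lines in the log, both far below the 10 to 12 MiB TA_DATA_SIZE settings tried.

```shell
#!/bin/sh
# Sketch: estimate weight and workspace sizes for layers 0-2 in the log.
# Formulas follow stock Darknet; their use inside the TA is an assumption.
FLOAT_BYTES=4

# Weights plus biases of a conv layer: filters * size * size * channels + filters.
conv_params() { echo $(( $1 * $2 * $2 * $3 + $1 )); }

# Weights plus biases of a connected layer: inputs * outputs + outputs.
connected_params() { echo $(( $1 * $2 + $2 )); }

# im2col workspace of a conv layer in bytes: out_h, out_w, size, channels.
conv_workspace_bytes() { echo $(( $1 * $2 * $3 * $3 * $4 * FLOAT_BYTES )); }

# Layer 0: 2 filters, 3x3, 3 input channels. Layer 1: 2 filters, 3x3, 2 channels.
# Layer 2: connected, 2048 -> 10.
params=$(( $(conv_params 2 3 3) + $(conv_params 2 3 2) + $(connected_params 2048 10) ))

# The 20-byte file header is an assumption (version fields plus a counter).
weight_bytes=$(( params * FLOAT_BYTES + 20 ))

# Layer 0 has the larger workspace of the two conv layers.
workspace_bytes=$(conv_workspace_bytes 32 32 3 3)

echo "weights:   $weight_bytes bytes"
echo "workspace: $workspace_bytes bytes"
```

This estimate ignores per-batch activations and any buffers the TA allocates beyond weights and workspace, so it is a lower bound rather than a full memory budget.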

@mofanv
Owner

mofanv commented Apr 20, 2023

The cfg files you mentioned are not in tz_datasets/cfg but in server_side_sgx/cfg. You may try running again with these cfg files in place.

@HenryHu2000
Contributor Author

HenryHu2000 commented Apr 20, 2023

> The cfg files you mentioned are not in tz_datasets/cfg but in server_side_sgx/cfg. You may try running again with these cfg files in place.

Hi @mofanv, thanks for your reply. Yes, I used the cfg files from server_side_sgx/cfg and still get these errors. Without them, fl_tee_layerwise.sh doesn't run at all.

@HenryHu2000
Contributor Author

HenryHu2000 commented Apr 20, 2023

> Hi @HenryHu2000, the TEEC_InvokeCommand(forward) failed 0xffff3024 origin 0x3 error is typically caused by secure memory limits: the TA runs out of memory when a layer's weight matrix is created during the forward pass. In your test, however, the layers are quite small and do not seem large enough to trigger this problem.

I followed exactly the same configuration as in the paper. I tried the following three configurations with fl_tee_layerwise.sh, but none of them worked:

  • Device=HiKey 960, TA_STACK_SIZE=1 * 1024 * 1024, TA_DATA_SIZE=10 * 1024 * 1024 (default settings from the repo)
  • Device=HiKey 960, TA_STACK_SIZE=1 * 1024 * 1024, TA_DATA_SIZE=12 * 1024 * 1024
  • Device=Raspberry Pi 3, TA_STACK_SIZE=1 * 1024 * 1024, TA_DATA_SIZE=6 * 1024 * 1024

However, other scripts such as fl_tee_standard_noss.sh and fl_tee_standard_ss.sh do run correctly. Changing the flag -ss 2 to -ss 1 also seems to avoid the error, but I suspect that defeats the intended purpose.
