
Troubleshooting

Helena Barmer edited this page Aug 18, 2019 · 13 revisions

Troubleshooting Guide

While working on the federated learning project on Raspberry Pis, you will run into plenty of challenges, especially those frustrating exceptions! Not to worry: we have compiled all the errors that each of us encountered on this journey, along with the lessons we learnt.

Installing PyTorch on Raspberry Pi

"Compiling PyTorch on my Raspberry Pi take forever! It has frozen on 37% for a long time now."

  • Cause: Compiling PyTorch on a Raspberry Pi simply takes a long time.
  • Fix: As long as no errors are showing in your terminal and the green light on your Raspberry Pi is on, you will just have to wait.

ModuleNotFoundError: No module named 'torch._C'

  • Cause 1: This exception occurs if you run Python from the directory where you built PyTorch. The build directory contains a folder named torch, and the interpreter tries to import the package from that folder instead of from the installed library.
  • Cause 2: This exception also occurs if the installed PyTorch library was not compiled with the right version of GCC.
  • Fix 1: cd (change directory) to a different location, then launch Python and import torch.
  • Fix 2: Compile PyTorch with GCC 8.2, the version present in Raspbian Buster, and then install PyTorch.
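Cause 1 can be checked before launching Python. The snippet below is a minimal sketch, not part of the original tutorial: it assumes that the shadowing comes from a folder (or file) named torch in the directory you start Python from, and the helper name shadows_package is made up for illustration.

```python
import os


def shadows_package(directory, package="torch"):
    """Return True if `directory` holds a folder or module that would shadow `package`."""
    folder = os.path.join(directory, package)
    module = os.path.join(directory, package + ".py")
    return os.path.isdir(folder) or os.path.isfile(module)


# Check the directory you are about to start Python from.
# True means: cd somewhere else before running "import torch".
print(shadows_package(os.getcwd()))
```

If this prints True inside your PyTorch build folder, Fix 1 is the one to try first.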

Original error was: libf77blas.so.3: cannot open shared object file, while importing torch in Python

  • Cause: You probably missed installing some of the PyTorch dependencies listed in the project tutorial.
  • Fix: Run the following to install PyTorch’s dependencies:

sudo apt install libopenblas-dev libblas-dev m4 cmake cython python3-dev python3-yaml python3-setuptools
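To see whether a shared library is visible to the system at all, something like the following may help. It is a rough sketch using ctypes.util.find_library from the standard library. The name "f77blas" is taken from the error message and "openblas" and "blas" from the apt line above. The lookup can miss libraries in unusual locations, so treat the result as a hint only.

```python
from ctypes.util import find_library


def missing_libraries(names):
    """Return the subset of `names` that the system library lookup cannot find."""
    return [name for name in names if find_library(name) is None]


# An empty list means all three libraries were found.
print(missing_libraries(["f77blas", "openblas", "blas"]))
```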

Failed to run 'bash tools/build_pytorch_libs.sh --use-cuda --use-nnpack --use mkldnn --use qnnpack caffe2' while trying to build PyTorch

  • Cause: You probably forgot to set the environment variables before starting the build. Note that this needs to be done every time your Raspberry Pi is restarted.
  • Fix: Type in the following to set the environment variables:

export NO_CUDA=1
export NO_DISTRIBUTED=1
export NO_MKLDNN=1
export NO_NNPACK=1
export NO_QNNPACK=1

  • Note: You can also add these lines to your ~/.bashrc file for the duration of the build, so that they survive a restart.
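Since these variables are lost on every restart, a quick check before kicking off a long build can save hours. The sketch below assumes the five flags listed above are the complete set and that each must equal "1". The helper name missing_build_flags is hypothetical.

```python
import os

# Flags listed above; each is expected to be set to "1" before the build starts.
REQUIRED_FLAGS = ("NO_CUDA", "NO_DISTRIBUTED", "NO_MKLDNN", "NO_NNPACK", "NO_QNNPACK")


def missing_build_flags(env=None):
    """Return the required flags that are unset or not equal to "1"."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_FLAGS if env.get(name) != "1"]


# An empty list means the environment is ready for the build.
print(missing_build_flags())
```

Run it in the same terminal session you intend to build from, because the exports only apply to that shell.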

Compiling fails and throws a GCC error

  • Cause: You may be using the latest Raspbian version (Buster) on an RPi3 or RPi3+. Raspbian Buster comes with GCC 8.x and Stretch with GCC 6.x; GCC 8.x will fail to build PyTorch on an RPi3+ or older.
  • Fix: Downgrade to Raspbian Stretch. You can find older Raspbian images here.

Installing PySyft on Raspberry Pi or PC

Could not find a version that satisfies the requirement torch>=1.1 pysyft

  • Cause: This exception occurs when you try to install syft via pip, because the latest version of PySyft requires PyTorch v1.1 to be installed.
  • Fix 1: Upgrade torch to v1.1. Type in the following:

pip install --upgrade torch

  • Fix 2: If you are using PyTorch v1.0 and need to stick with it, install an older version of PySyft without its dependencies. Installing via pip with dependencies would lead to the same error again, so type in the following instead:

pip install syft==0.1.13a1 --no-dependencies

Syft version 0.1.13a1 seems to be compatible with torch v1.0. Now you need to install the dependencies separately. Type in:

pip3 install flask-socketio lz4 msgpack websockets zstd

Try importing PySyft using: import syft. This should execute without any further exceptions.
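To decide between Fix 1 and Fix 2, it helps to know whether your installed torch already meets the torch>=1.1 requirement. Here is a minimal sketch that compares only the major and minor version numbers. The helper names are made up for illustration, and the parsing is deliberately simple rather than a full version parser.

```python
import re


def version_tuple(version):
    """Turn a string such as '1.0.1.post2' into a (major, minor) tuple."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def satisfies_torch_requirement(version, minimum=(1, 1)):
    """Return True if `version` is at least `minimum` (major, minor)."""
    return version_tuple(version) >= minimum


# Typical use, once torch imports cleanly:
#   import torch
#   print(satisfies_torch_requirement(torch.__version__))
```

False means you are on PyTorch v1.0 or older, so either upgrade (Fix 1) or pin the older PySyft (Fix 2).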


ModuleNotFoundError: No module named 'websocket'

Try:

  • $ pip3 install websocket_client
  • $ pip3 install Flask flask-socketio lz4 msgpack websockets zstd
  • $ python3
  • >>> import syft
  • #If no errors syft is successfully installed!

Running the Federated RNN code in Jupyter notebook

OverflowError: timeout doesn't fit into C timeval

  • Cause: This error occurs on Windows machines because of the very large timeout value PySyft gives to the websocket.
  • Fix: Edit the websocket_client.py file in ..\Lib\site-packages\syft\workers of your Python location. Look for the TIMEOUT_INTERVAL variable, which is set to 9_999_999. Remove one 9 so that it reads 9_999_99 (that is, 999,999).
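If you would rather script the edit than do it by hand, the idea can be sketched as a text substitution. This is only an illustration: it assumes the literal 9_999_999 appears on the same line as TIMEOUT_INTERVAL, the helper name lower_timeout is hypothetical, and reading and writing the actual file is left to you.

```python
import re


def lower_timeout(source):
    """Replace 9_999_999 with 999_999 on lines that mention TIMEOUT_INTERVAL."""
    pattern = re.compile(r"^(.*TIMEOUT_INTERVAL.*?)9_999_999\b(.*)$", re.MULTILINE)
    return pattern.sub(r"\g<1>999_999\g<2>", source)


# Underscores in Python numeric literals are only visual separators,
# so 9_999_99 and 999_999 are the same number.
print(9_999_99 == 999_999)
```

Keep a backup copy of websocket_client.py before overwriting it.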

The following code snippet in a Jupyter notebook cell takes a long time to process:

print("Generating list of batches for the workers...")
list_federated_train_loader = list(federated_train_loader)

  • Cause: Well, it does take a looong time to process!
  • Fix: Be patient, grab a cup of coffee, and wait! It takes anywhere from about 20 minutes to 3 hours or more, depending on the processing power of your machine.
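While waiting, it is reassuring to see that something is still happening. The helper below is a small sketch (the name materialize_with_progress is made up) that does the same job as list(...) but prints a status line every so often. It works on any iterable and does not make the loader faster.

```python
import time


def materialize_with_progress(iterable, every=100):
    """Collect `iterable` into a list, printing a status line every `every` items."""
    items = []
    start = time.time()
    for count, item in enumerate(iterable, start=1):
        items.append(item)
        if count % every == 0:
            print(f"{count} batches generated after {time.time() - start:.0f} s")
    return items


# In the notebook this would replace the list(...) call:
#   list_federated_train_loader = materialize_with_progress(federated_train_loader)
```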

ImportError: Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib

  • Cause: The latest package version requires libomp, which may not be installed on some devices.
  • Fix: Install libomp via Homebrew on macOS or apt-get on Linux.

WebSocket connection throws an error while opening, or closes abruptly while running

  • First, check that your RPi connections are running properly.
  • Cause: This problem is most likely caused by having different PySyft versions on your devices.
  • Fix 1: Make sure the latest version is installed on every device. You can git pull from the PySyft repo and rerun the build and install process (it will take less time than the first build).
  • Fix 2: Restart the notebook kernel and clear all outputs. Changed or updated packages may not work properly until the kernel is restarted.
  • Note 1: The latest version only works with PyTorch 1.1.0.
  • Note 2: Updating also fixes a couple of bugs in the tutorial files, and the worker connection status will now be visible on the RPis.
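For the first check in the list above, a plain TCP probe from your PC tells you whether a worker's port is reachable at all. This is a minimal sketch using only the standard library. It does not perform a WebSocket handshake, and the address and port in the comment are placeholders, not values from the tutorial.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example with placeholder values; use your own RPi address and worker port:
#   print(can_connect("192.168.0.10", 8777))
```

If this returns False, look at the network and the server script first. If it returns True and the connection still drops, mismatched PySyft versions become the more likely suspect.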

Running server workers in Raspberry Pi

No module named syft, even though it is present

  • Cause: This error occurred for me when I ran the start_websocket_servers.py script using sudo.
  • Fix: The error disappeared when I ran the script without sudo.
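One plausible explanation, which the notes above do not confirm, is that sudo starts Python in an environment that cannot see packages installed for your own user. A quick way to compare the two is to run the sketch below once normally and once with sudo. The helper name describe_module is made up for illustration.

```python
import importlib.util
import sys


def describe_module(name):
    """Return (interpreter path, module location or None) for the running Python."""
    spec = importlib.util.find_spec(name)
    location = spec.origin if spec is not None else None
    return sys.executable, location


# None in the second position means this interpreter cannot find syft.
print(describe_module("syft"))
```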

Train RNN error

TypeError: 'Tensor' object is not callable

  • Cause: This error occurs at step 21 of the Jupyter notebook 'Federated Recurrent Neural Network'.
  • Fix: Open ..\syft\frameworks\torch\hook.py and, at line 356, change self.native_param_data(new_data) to self.native_param_data.set_(new_data).