Multi-Resolution Fault Injector (MRFI) is a powerful neural network fault injection tool based on PyTorch.

Compared with previous network-level fault injection frameworks, MRFI's biggest feature is that it can flexibly adapt its injection configuration to different experimental needs. The injection config and observations for each layer can be set independently in one clear config file. MRFI also provides a large number of commonly used fault injection methods and error models, and allows customization. For preliminary experiments where a complex configuration is unnecessary, MRFI also provides a simple API for coarse-grained fault injection experiments.
Read detailed usage in the MRFI Documents >>
Fault injection is a classic and important method for studying reliability, and it is crucial for designing and deploying neural networks in safety-critical applications. Low-level, hardware-based fault injection methods are time-consuming, while high-level methods based on the network model usually neglect details. MRFI is therefore designed as an efficient and highly configurable multi-resolution network-level fault injection tool.

Learn more from our MRFI paper on arXiv. For fault injection tools at other levels, see also Neural-Network-Reliability-Analysis-Toolbox.
```bash
# Install by pip
pip install MRFI

# Or download from GitHub
git clone https://github.com/fffasttime/MRFI.git
pip install -r requirements.txt
```
The following code performs a quantized random integer bit-flip injection on LeNet and finds the relation between bit error rate (BER) and classification accuracy.
```python
from dataset.lenet_cifar import make_testloader, LeNet
from mrfi import MRFI, EasyConfig
from mrfi.experiment import Acc_experiment, Acc_golden, BER_Acc_experiment

testloader = make_testloader(1000, batch_size=128)  # test on 1000 CIFAR-10 images

# Create the fault injection model
fi_model = MRFI(LeNet(trained=True).eval(),
                EasyConfig.load_preset('default_fi'))

################### A simple accuracy experiment #####################
# Test accuracy under fault injection with the selection rate 1e-3 specified in "default_fi"
print('FI Acc: ', Acc_experiment(fi_model, testloader))

# Test accuracy without fault injection
with fi_model.golden_run():
    print('golden run Acc: ', Acc_experiment(fi_model, testloader))

######################## BER_Acc_experiment ###########################
# Get the selector handler, because BER_Acc_experiment modifies the
# selection rate during the experiment
selector_cfg = fi_model.get_activation_configs('selector')
BER, Acc = BER_Acc_experiment(fi_model, selector_cfg,
                              make_testloader(1000, batch_size=128),
                              [1e-6, 1e-5, 1e-4, 1e-3])
print('Bit error rate and accuracy: ', BER, Acc)
```
A possible output looks like this:
```
FI Acc:  0.597
golden run Acc:  0.638
Bit error rate and accuracy:  [1.e-06 1.e-05 1.e-04 1.e-03] [0.637 0.624 0.539 0.247]
```
The content of the corresponding EasyConfig file `default_fi.yaml`:
```yaml
faultinject:
  - type: activation
    quantization:
      method: SymmericQuantization
      dynamic_range: auto
      bit_width: 16
      enabled: True
    selector:
      method: RandomPositionByRate
      poisson: True
      rate: 1e-3
    error_mode:
      method: IntSignBitFlip
      bit_width: 16
    module_name: [conv, fc] # Match layers whose names contain these strings
    module_type: [Conv2d, Linear] # Or match layers of these type names
```
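Conceptually, this config quantizes each activation to a 16-bit signed integer, flips the sign bit at randomly selected positions, and dequantizes again. The sketch below illustrates that error model in plain Python; the helper names are illustrative, not MRFI's API.

```python
def sym_quantize(x, dynamic_range, bit_width=16):
    """Map a float in [-dynamic_range, dynamic_range] to a signed integer."""
    qmax = 2 ** (bit_width - 1) - 1
    scale = dynamic_range / qmax
    q = round(x / scale)
    return max(-qmax, min(qmax, q)), scale

def sign_bit_flip(q, bit_width=16):
    """Flip the sign (highest) bit of a two's-complement integer."""
    mask = 1 << (bit_width - 1)
    u = q & (2 ** bit_width - 1)   # view as unsigned
    u ^= mask                      # flip the top bit
    if u >= mask:                  # convert back to signed
        u -= 2 ** bit_width
    return u

# A small positive activation becomes a large negative one after the flip
q, scale = sym_quantize(0.5, dynamic_range=4.0)
faulty = sign_bit_flip(q) * scale  # roughly -3.5
```

This is why single sign-bit flips tend to be so damaging: a value near zero is mapped to the opposite end of the dynamic range.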
For detailed usage, see MRFI Doc - Usage.
Explore MRFI's internal functions in MRFI Doc - function table.
Activation injection
- Fixed position (Permanent fault)
- Runtime random position (Transient fault)
Weight injection
- Fixed position (Permanent fault)
- Runtime random position (Transient fault)
Injection on quantization model
- Post-training quantization
- Dynamic quantization
- Fine-grained quantization parameters config
- Add custom quantization
Error mode
- Integer bit flip
- Float bit flip
- Stuck-at fault (SetValue)
- Random value
- Add random noise
Internal observation & visualize
- Activation & Weight observer
- Error propagation observer
- Easy to save and visualize results, works well with `numpy` and `matplotlib`
Flexibility
- Add custom `error_mode`, `selector`, `quantization` and `observer`
- Distinguish network-level, layer-level, channel-level, neuron-level and bit-level fault tolerance difference
Performance
- Automatically use GPU for network inference and fault injection
- The selector-injector design is significantly faster than generating a probability for every position when performing random error injection
- Accelerate error impact analysis through internal observer metrics rather than the original accuracy metric
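The selector-injector idea can be illustrated in a few lines: instead of running one Bernoulli trial per element (O(n) work even when almost nothing flips), draw the error count from a Poisson distribution and sample only that many positions. This is a conceptual sketch, not MRFI's implementation.

```python
import math
import random

def select_positions(n_elements, rate, rng=random):
    """Sample fault positions directly: draw the error count k from a
    Poisson distribution with mean n_elements * rate, then pick k
    distinct positions. Work is O(k), not O(n_elements)."""
    mean = n_elements * rate
    # Knuth's Poisson sampler (adequate for small means)
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            break
        k += 1
    return rng.sample(range(n_elements), min(k, n_elements))

# At rate 1e-5 over a million activations, only ~10 positions are drawn
positions = select_positions(10**6, 1e-5)
```

For typical bit error rates (1e-6 to 1e-3), the sampled position count is tiny compared with the tensor size, which is where the speedup comes from.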
Fine-grained configuration
- By python code
- By .yaml config file
- By GUI
Evaluating fault tolerance policies
- Selective protection at different levels
- More fault tolerance methods may be supported later (e.g. fault-tolerant retraining, range-based filters)
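A range-based filter of the kind mentioned above can be sketched simply: clamp each activation into the range observed during a fault-free (golden) run, so that the large out-of-range values produced by bit flips are suppressed. This is an illustrative sketch, not an MRFI API.

```python
def range_filter(activations, lo, hi):
    """Clamp each activation into the range [lo, hi] profiled on a
    golden run, suppressing out-of-range faulty values."""
    return [min(max(a, lo), hi) for a in activations]

# A sign-bit flip pushed one activation far out of range; clamping contains it
filtered = range_filter([0.1, -3.5, 0.8], lo=0.0, hi=1.0)
print(filtered)  # [0.1, 0.0, 0.8]
```

Because most catastrophic faults manifest as extreme values, even this crude filter recovers much of the lost accuracy at negligible runtime cost.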
Code in `mrfi/` is tested to ensure correctness. Test cases are under `test/`.
Setting `logging.basicConfig(level=logging.DEBUG)` produces more runtime logging, which may be helpful when MRFI does not perform as expected.
```bash
pytest --cov=mrfi --cov-report=html
```

The coverage report is written to `htmlcov/index.html`.
Modify the markdown content files under `docs/`.

```bash
mkdocs build      # build the static site
mkdocs serve      # preview locally
mkdocs gh-deploy  # deploy to GitHub Pages
```
I have tried to test MRFI thoroughly, but given its many features there may still be undiscovered issues. If you find a problem, please open an issue with enough code and configuration to analyze the cause, or submit a pull request with a fix.