When a program uses tensor parallelism, different ranks may receive the same op_config, so the tuning process can be duplicated across ranks. Consider the following:
cpu 0: tune op0, save runtime into local database id 0
cpu 1: tune op0, save runtime into local database id 0
Both ranks write to the same local database entry (id 0 -> id 0), and this concurrent overwrite can potentially corrupt the local runtime module.
This may be related to some of the bugs in issue #186.
Recommended solution:
- Save the op into the database under a spin lock.
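
A minimal sketch of the spin-lock idea, not the project's actual implementation. It assumes the local database is a single JSON file on a local filesystem shared by the ranks; `spin_lock`, `save_op`, the `.lock` / `.tmp` suffixes and the "first writer wins" policy are all hypothetical names and choices made for illustration.

```python
import json
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def spin_lock(lock_path, poll_interval=0.01, timeout=30.0):
    """Spin until an exclusive lock file can be created; remove it on exit."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file exists, so only one process wins.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_interval)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def save_op(db_path, op_key, runtime_entry):
    """Hypothetical helper: store one tuned op in the JSON database under the lock."""
    with spin_lock(db_path + ".lock"):
        db = {}
        if os.path.exists(db_path):
            with open(db_path) as f:
                db = json.load(f)
        # Assumed policy: keep the first result, so a duplicate tune from
        # another rank does not overwrite an entry that is already saved.
        db.setdefault(op_key, runtime_entry)
        # Write to a temp file and rename it, so the database file is never
        # left half-written.
        tmp_path = db_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(db, f)
        os.replace(tmp_path, db_path)
        return db[op_key]


if __name__ == "__main__":
    db_file = os.path.join(tempfile.mkdtemp(), "tune_db.json")
    # Two ranks tuning the same op: the second call finds the entry already saved.
    print(save_op(db_file, "op0", {"tuned_by_rank": 0}))
    print(save_op(db_file, "op0", {"tuned_by_rank": 1}))
```

One caveat with this sketch: if a process dies while holding the lock, the lock file stays behind and other ranks will spin until the timeout, so a real implementation would need some stale-lock handling.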
TODO Items: