This repo contains a Keras implementation, ready to integrate into existing models, of the modality fusion algorithm described in "Bio-inspired modality fusion for active speaker detection", a research paper by G. Assunção et al.
This Keras layer enables the fusion of embeddings originating from different modalities (e.g. vision, sound). Succinctly, the layer integrates uni-sensory information from multiple sources through feedback stimulation of spatially proximal neural regions. This implementation fuses two modalities, although the methodology is theoretically scalable to N modalities.
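To give a rough idea of how a two-input fusion layer slots into a Keras pipeline, here is a minimal sketch. It is not the layer shipped in this repo and it does not reproduce the algorithm from the paper: `ToyFeedbackFusion`, its `units` argument, the sigmoid gating, and the embedding sizes are all hypothetical stand-ins, chosen only to show the call pattern (a list of two embeddings in, one fused tensor out). It assumes TensorFlow 2.x is installed.

```python
import tensorflow as tf
from tensorflow import keras


class ToyFeedbackFusion(keras.layers.Layer):
    """Hypothetical stand-in for a two-modality fusion layer.

    NOT the layer from this repo: each modality is projected to a common
    size and then rescaled by a gate computed from the other modality,
    as a crude illustration of cross-modal feedback.
    """

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        # One projection per modality.
        self.proj_a = keras.layers.Dense(units)
        self.proj_b = keras.layers.Dense(units)
        # Gates in (0, 1), each driven by the opposite modality.
        self.gate_from_b = keras.layers.Dense(units, activation="sigmoid")
        self.gate_from_a = keras.layers.Dense(units, activation="sigmoid")

    def call(self, inputs):
        emb_a, emb_b = inputs  # exactly two modalities
        a = self.proj_a(emb_a)
        b = self.proj_b(emb_b)
        a_mod = a * self.gate_from_b(b)  # modality A modulated by B
        b_mod = b * self.gate_from_a(a)  # modality B modulated by A
        return tf.concat([a_mod, b_mod], axis=-1)


# Assumed embedding sizes: 128-d vision, 64-d audio, batch of 4.
vision = tf.random.normal((4, 128))
audio = tf.random.normal((4, 64))

fused = ToyFeedbackFusion(units=32)([vision, audio])
print(fused.shape)
```

The fused tensor can then be passed to any downstream Keras layers, such as a classification head. To use the actual layer from this repo, substitute it for the toy class above and follow its own constructor arguments.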
If you use this code, or an altered version of it, in your own research, please cite the corresponding paper, available on MDPI:
Assunção, G.; Gonçalves, N.; Menezes, P. Bio-Inspired Modality Fusion for Active Speaker Detection. Appl. Sci. 2021, 11, 3397. https://doi.org/10.3390/app11083397
All code is licensed under the GNU General Public License v3.0.
Please refer to the LICENSE file in this repository for specific language governing permissions and limitations under the License.