Our project is developed here. The target completion date is August 01, 2021.
FastAudioVisual is a tool that helps us develop and analyse research in the audiovisual domain. The framework of this project is as follows:
As we can see, this project consists of several parts. Here are the details of each part.
- DataRegular: Different file structures across research datasets cause many problems. In this part we develop a series of functions to make your database regular for the following steps. All of these functions are tested on RAVDESS, a large database for multimodal emotion recognition (a filename-parsing sketch follows this list).
- FeatureExtract: Feature extraction is important for model study. Many features can be extracted as model input. For audio, MFCC, FBank, zero-crossing rate, and so on can be used; for visual input, grayscale, RGB, and optical flow diagrams can be used. In this part, we will build some APIs to extract these features (an audio-feature sketch follows this list).
- SampleModel: With the development of hardware, deep learning has made significant progress in every area, and many areas have been reshaped by it. Therefore, we collect some classical models for basic research. This part gives you enough material for evaluation and experiments (a backbone-loading sketch follows this list). (In the beginning, I struggled to choose between PyTorch and fastai.)
- ModelDesign: In this part, we focus on audiovisual fusion methods and model design for other audiovisual domains (including losses, frameworks, and other tricks). It collects some research work and code. We can also replace SampleModel with designs from this part to get better results (a late-fusion sketch follows this list).
- Analysis: Based on the above parts, we will use some tools to analyse the results of the experiments, such as confusion matrices, CAM, and feature distributions (a confusion-matrix sketch follows this list).
- Test: Some demos for using this project.
- Others: Some papers and blog posts for this area.
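As a rough illustration of what the DataRegular functions aim at, here is a minimal sketch that turns a RAVDESS file name, which encodes its labels as seven dash-separated two-digit codes, into a label dictionary. The helper `parse_ravdess_name` and the field names are my own labels for this example, not the project's actual API.

```python
from pathlib import Path

# RAVDESS file names such as "03-01-06-01-02-01-12.wav" encode their labels
# as seven two-digit fields. The field names below follow the RAVDESS
# documentation; the helper itself is only illustrative.
RAVDESS_FIELDS = (
    "modality",       # 01 full-AV, 02 video-only, 03 audio-only
    "vocal_channel",  # 01 speech, 02 song
    "emotion",        # 01 neutral ... 08 surprised
    "intensity",      # 01 normal, 02 strong
    "statement",
    "repetition",
    "actor",          # odd numbers male, even numbers female
)

def parse_ravdess_name(path):
    """Hypothetical helper: map one RAVDESS file path to a label dict."""
    codes = Path(path).stem.split("-")
    if len(codes) != len(RAVDESS_FIELDS):
        raise ValueError(f"{path} does not look like a RAVDESS file name")
    return dict(zip(RAVDESS_FIELDS, codes))

print(parse_ravdess_name("03-01-06-01-02-01-12.wav")["emotion"])  # "06" (fearful)
```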
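For the audio side of FeatureExtract, the sketch below shows one possible way to compute MFCC, FBank, and zero-crossing-rate features with librosa. The function name `extract_audio_features`, the choice of librosa, and the parameter values are only assumptions for illustration.

```python
import librosa

def extract_audio_features(wav_path, sr=16000, n_mfcc=40, n_mels=40):
    """Illustrative sketch: return MFCC, log-Mel filterbank (FBank), and
    zero-crossing-rate features for one audio file."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)            # (n_mfcc, frames)
    fbank = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))    # (n_mels, frames)
    zcr = librosa.feature.zero_crossing_rate(y)                       # (1, frames)
    return {"mfcc": mfcc, "fbank": fbank, "zcr": zcr}
```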
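For SampleModel, one small sketch of reusing a classical backbone as a visual feature extractor in PyTorch/torchvision. The choice of ResNet-18 and the `weights` argument (which requires a recent torchvision) are just examples, not a fixed part of this project.

```python
import torch
import torchvision.models as models

# Use a classical image backbone as a visual feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()           # keep the 512-d feature vector
with torch.no_grad():
    feats = backbone(torch.randn(1, 3, 224, 224))
print(feats.shape)                          # torch.Size([1, 512])
```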
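For ModelDesign, a common starting point is simple late fusion of per-modality embeddings. The sketch below (PyTorch, with placeholder dimensions and class count) just concatenates audio and visual embeddings before classification; it is one baseline idea, not the project's final fusion method.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Sketch of a late-fusion head: concatenate audio and visual
    embeddings, then classify. Dimensions are placeholders."""
    def __init__(self, audio_dim=128, visual_dim=256, num_classes=8):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(audio_dim + visual_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, audio_emb, visual_emb):
        return self.fuse(torch.cat([audio_emb, visual_emb], dim=-1))

# Example: a batch of 4 samples with pre-computed embeddings.
logits = LateFusionClassifier()(torch.randn(4, 128), torch.randn(4, 256))
print(logits.shape)  # torch.Size([4, 8])
```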
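For the Analysis part, a confusion matrix is easy to produce with scikit-learn; the labels and predictions below are made up purely to show the call.

```python
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Toy labels/predictions just to show the call; in practice these come
# from the trained audiovisual model.
y_true = ["happy", "sad", "angry", "happy", "sad", "angry"]
y_pred = ["happy", "sad", "happy", "happy", "angry", "angry"]

cm = confusion_matrix(y_true, y_pred, labels=["angry", "happy", "sad"])
ConfusionMatrixDisplay(cm, display_labels=["angry", "happy", "sad"]).plot()
plt.show()
```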
In general, all of these designs are meant to help you develop your audiovisual research quickly with this tool!