-
Notifications
You must be signed in to change notification settings - Fork 6
smartcameras/AV3T
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Date: 07/02/2019 % Author: Xinyuan Qian (x.qian@qmul.ac.uk) % % Please go through the readme.m and License.doc files before using the software. % % Requested citation acknowledgement when using this software: % [1] X. Qian, A. Brutti, O. Lanz, M. Omologo and A. Cavallaro, "Multi-speaker tracking from an audio-visual sensing device" in IEEE Transactions on Multimedia, Feb 2018, accepted. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % This folder contains the MATLAB code of our proposed 'AV3T' tracker for multi-speaker tracking in 3D % using the multi-modal signals captured by a small-size co-located audio-visual sensing platform. % % % The evaluation is made on two datasets: % (1) CAV3D - collected and annotated by ourselves % (2) AV16.3 - % G. Lathoud, J.-M. Odobez, and D. Gatica-Perez, “AV16.3:an audio-visual corpus for speaker localization and tracking,” in Machine Learning for Multimodal Interaction. Martigny, Switzerland: Springer, Jun 2004. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % To execute the tracker, you need to % (1) download the CAV3D dataset: https://speechtek.fbk.eu/cav3d-dataset/ % (2) download our re-arranged AV16.3 dataset: https://drive.google.com/file/d/19vA1tSf9EauMBxZr4wkbS1Ra27hVlJpS/view?usp=sharing % (3) change the 'datapath' in 'test_main.m' to the data directory % (4) run 'test_main.m' % In the 'test_main.m' file, you can change the parameters to specify % (i) the test dataset i.e. either AV16.3 or CAV3D and sequences % (ii) tracking mode i.e. audio-visual audio-only or video-only % (iii) face detection removal index => to test the system robustness of the face detection inputs % (iv) the number of particle (per target), and the experiment iterations % (v) visualization and saving flag % % % Note: % The camera projection MEX-files ('projectin/mex/*.mex') were compiled from C++ code, % which are not compatible to different machines (especially if you have the linux environment). If you cannot execute % the default '.mex' files, you need to compile them by yourself by running the code 'projection/mex/cpp2mex.m' % Some attention points: % (1) make sure you have the right compiler: https://uk.mathworks.com/support/requirements/supported-compilers.html % (2) make sure you have installed OpenCV % (3) make sure you have installed the MATLAB Computer Vision System Toolbox and it's OpenCV Interface: % https://uk.mathworks.com/matlabcentral/fileexchange/47953-computer-vision-system-toolbox-opencv-interface % (you may follow compilation instructions in the enclosed video.) % If you still have problems during the compilation, please ask for the MathWorks technical support, % probably by creating a service request here: https://uk.mathworks.com/support/contact_us.html % % % The proposed 3D multi-speaker tracker can be adapted to any audio-visual sensing platforms % as long as signals are synchronized and calibration parameters are available. % To execute on a new audio-visual data, you need to % (1) compute and update the synchronization information, camera calibration parameters and microphone position in 'readParas.m' % (2) compute the face detection results, save the results, and update the filename in 'readParas/readfaceDetect.m' % the face detector we used in [1] can be found in https://github.com/tornadomeet/mxnet-face % the detection bounding box should be saved in format: [#frame, #detection, topleft_x, topleft_y, bottomright_x, bottomright_y, probability, ...] % (3) compute the GCC-PHAT results, save the results, and update 'readParas/extractGCF3Dres.m' % % For any questions, please contact x.qian@qmul.ac.uk
About
Impementation of "Multi-speaker tracking from an audio-visual sensing device" - IEEE Transactions on Multimedia,
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published