Recent work in visual sound source localization relies on semantic audio-visual representations learned in a self-supervised manner and, by design, excludes the temporal information present in videos. While effective on widely used benchmark datasets, these methods fall short in challenging scenarios such as urban traffic. This work introduces temporal context into state-of-the-art sound source localization methods for urban scenes, using optical flow to encode motion information.
Demo video: `result.video.mp4`
Install the dependencies with `pip install -r requirements.txt`.
We use the Urban Sound and Sight (Urbansas) dataset to train and evaluate our models. Evaluation on Urbansas can be run as follows:

- Prepare the data for evaluation (a minimal sketch of the flow computation is shown after these steps):

  ```
  python src/data/prepare_data.py -d PATH_TO_URBANSAS
  python src/data/calc_flow.py -d PATH_TO_URBANSAS
  ```

- Run the evaluation. Several different models can be used, and the model is passed as an argument:

  ```
  python evaluate.py -m MODEL
  ```
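For illustration, here is a minimal sketch of what the flow pre-computation step can look like, using OpenCV's Farneback estimator. This is not the actual implementation of `src/data/calc_flow.py`, which may use a different flow estimator, resolution, and storage format:

```python
# Illustrative only: dense optical flow between consecutive video frames,
# cached to disk. The repository's calc_flow.py may differ.
import cv2
import numpy as np

def compute_flow_for_video(video_path: str, out_path: str) -> None:
    """Compute Farneback dense optical flow for each consecutive frame pair."""
    cap = cv2.VideoCapture(video_path)
    flows = []
    prev_gray = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is not None:
            # (H, W, 2) array of per-pixel (dx, dy) displacements.
            flow = cv2.calcOpticalFlowFarneback(
                prev_gray, gray, None,
                pyr_scale=0.5, levels=3, winsize=15,
                iterations=3, poly_n=5, poly_sigma=1.2, flags=0,
            )
            flows.append(flow.astype(np.float16))  # reduce storage size
        prev_gray = gray
    cap.release()
    if flows:
        np.save(out_path, np.stack(flows))
```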
The following model choices are available:

- `rcgrad` - pretrained model from *How to Listen? Rethinking Visual Sound Source Localization* (Paper) (Repo)
- `flow` - optical flow used as localization maps (see the sketch after this list)
- `flowgrad-H`, `flowgrad-EN`, `flowgrad-IC` - variants of FlowGrad (refer to the paper for details)
- `yolo_baseline`, `yolo_topline` - vision-only object detection models used as baselines. The topline adds motion-based filtering (stationary objects are discarded).
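As rough intuition for the `flow` and `flowgrad-*` options, the sketch below shows one way optical flow can serve as a localization map and be fused with an audio-visual map such as the one produced by RCGrad. This is an assumption-laden illustration only: the element-wise product stands in for the 'H' variant by analogy to a Hadamard product, and the actual FlowGrad formulations are described in the paper.

```python
# Illustrative sketch, not the repository's implementation: use dense optical
# flow magnitude as a localization map and fuse it with an audio-visual map.
import numpy as np

def flow_to_localization_map(flow: np.ndarray) -> np.ndarray:
    """Normalize per-pixel flow magnitude to [0, 1].

    flow: (H, W, 2) array of per-pixel (dx, dy) displacements.
    """
    magnitude = np.linalg.norm(flow, axis=-1)
    return magnitude / (magnitude.max() + 1e-8)

def fuse_maps(av_map: np.ndarray, flow_map: np.ndarray) -> np.ndarray:
    """Element-wise (Hadamard) product of an audio-visual localization map and
    a flow-based map -- an assumed stand-in for the idea behind the 'H'
    variant, not its actual formulation."""
    fused = av_map * flow_map
    return fused / (fused.max() + 1e-8)

# Toy usage with random placeholder inputs:
rng = np.random.default_rng(0)
flow = rng.normal(size=(224, 224, 2))   # stand-in for a real flow field
av_map = rng.random((224, 224))         # stand-in for an RCGrad map
loc_map = fuse_maps(av_map, flow_to_localization_map(flow))
```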
Results on Urbansas:

| Model | IoU (τ = 0.5) | AUC |
|---|---|---|
| Vision-only+CF+TF (topline) | 0.68 | 0.51 |
| Optical flow (baseline) | 0.33 | 0.23 |
| RCGrad | 0.16 | 0.13 |
| FlowGrad-H | 0.50 | 0.30 |
| FlowGrad-IC | 0.26 | 0.18 |
| FlowGrad-EN | 0.37 | 0.23 |
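For context, IoU and AUC here are presumably computed along the lines of the standard sound source localization protocol, in which a localization map is binarized and scored against the annotated region with IoU, and AUC summarizes success rates over a sweep of IoU thresholds. The sketch below is a generic illustration of that protocol under those assumptions, not the exact code used to produce the table:

```python
# Generic illustration of IoU / AUC scoring for localization maps; the
# evaluation code in this repository may differ in details (e.g. how the
# binarization threshold is chosen and how AUC is integrated).
import numpy as np

def localization_iou(pred_map: np.ndarray, gt_mask: np.ndarray,
                     bin_thresh: float = 0.5) -> float:
    """IoU between a binarized localization map and a ground-truth mask."""
    pred = pred_map >= bin_thresh
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)

def success_auc(per_sample_ious: np.ndarray, num_thresholds: int = 21) -> float:
    """Average success rate (IoU >= tau) over a uniform sweep of tau in [0, 1],
    approximating the area under the success-vs-threshold curve."""
    taus = np.linspace(0.0, 1.0, num_thresholds)
    success_rates = [(per_sample_ious >= tau).mean() for tau in taus]
    return float(np.mean(success_rates))
```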
Planned features:

- Inference on custom videos
- Training
- Support for other benchmark datasets for Visual Sound Source Localization
If you have any ideas or feature requests, please feel free to raise an issue!
This work builds upon and borrows heavily from hohsiangwu/rethinking-visual-sound-localization.