This repository has been archived by the owner on Oct 31, 2023. It is now read-only.


VisualEchoes: Spatial Image Representation Learning through Echolocation

This repository contains the RGB / Depth / Echo data used for spatial image representation learning in our ECCV 2020 paper VisualEchoes. [Project Page]

VisualEchoes: Spatial Image Representation Learning through Echolocation
Ruohan Gao1,3, Changan Chen1,3, Ziad Al-Halah1, Carl Schissler2, Kristen Grauman1,3
1UT Austin, 2Facebook Reality Lab, 3Facebook AI Research
In European Conference on Computer Vision (ECCV), 2020

If you find our data or project useful in your research, please cite:

  title = {VisualEchoes: Spatial Image Representation Learning through Echolocation},
  author = {Gao, Ruohan and Chen, Changan and Al-Halah, Ziad and Schissler, Carl and Grauman, Kristen},
  booktitle = {ECCV},
  year = {2020}

Demo Video of Echolocation Simulation

We show an example of the agent navigating in one Replica scene and performing echolocation. The agent emits 3 ms chirp signals that sweep from 20 Hz to 20 kHz and receives echo responses from the room. The echoes resulting from the emitted chirps reflect the scene geometry.

Overview of VisualEchoes Dataset

We provide the (RGB, Depth, Echo) data generated using habitat-sim and sound-spaces on the Replica dataset. The source audio "chirp" we use is a sweep signal from 20 Hz to 20 kHz (the human-audible range) with a duration of 3 ms (3ms_sweep.wav). The echoes are obtained by convolving the 1 s audio (with the sweep signal in the first 3 ms) with the corresponding binaural echolocation room impulse responses (RIRs) for each of the four orientations (0°, 90°, 180°, 270°) at each agent location. When the agent emits sound from its position, convolving the emitted omnidirectional audio with the corresponding binaural RIR yields the binaural echo response from the environment that the agent hears when facing that orientation.
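The convolution step described above can be sketched as follows. This is a minimal illustration, not the script used to produce the released data: the function name `render_binaural_echo`, the 44.1 kHz sample rate, the `(2, num_samples)` RIR layout, and the toy signals are all assumptions made for the example.

```python
import numpy as np


def render_binaural_echo(source, rir):
    """Convolve a mono source with a two-channel RIR (hypothetical helper).

    source: 1-D array of length n (the emitted omnidirectional audio).
    rir:    array of shape (2, m), one impulse response per ear.
    Returns an array of shape (2, n + m - 1): the left and right echo channels.
    """
    source = np.asarray(source, dtype=np.float64)
    rir = np.asarray(rir, dtype=np.float64)
    if rir.ndim != 2 or rir.shape[0] != 2:
        raise ValueError("expected a binaural RIR of shape (2, num_samples)")
    # Each ear hears the source filtered by its own impulse response.
    return np.stack([np.convolve(source, channel) for channel in rir])


# Toy inputs (assumptions, not dataset files).
sample_rate = 44100                     # assumed sample rate
source = np.zeros(sample_rate)          # 1 s clip, silent after the first 3 ms
source[: int(0.003 * sample_rate)] = 1.0  # placeholder standing in for the sweep

rir = np.zeros((2, 1000))
rir[0, 100] = 0.5                       # left ear: one reflection after 100 samples
rir[1, 150] = 0.25                      # right ear: weaker, later reflection

echo = render_binaural_echo(source, rir)
```

The full convolution is longer than the source by the RIR length minus one sample; whether the released echoes are trimmed to a fixed length is not stated here, so the sketch leaves the output untrimmed.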

Data Download

  1. The VisualEchoes dataset contains the RGB images, depth maps, and echo responses generated from the Replica dataset. Run the commands below to download the RGB-depth pairs (4 different resolutions), echoes, and room impulse responses used for echolocation for the 1,740 navigable locations x 4 orientations = 6,960 agent states used in the paper.
# rgb-depth pairs of 4 different resolutions
# dictionary is in the format of {scene:{(location, orientation): {'rgb':rgb_image, 'depth':depth_map}}}

# echo responses for the 3ms sweep signal at all navigable locations
#    ├── echoes_navigable                          
#    │       └── [scene]                         (scene name)
#    │           └── [sweep_sound]               (name of the source signal)
#    │               └── [angle]                 (agent's orientation)
#    │                   └── location_index.wav  (agent's location)

# echolocation room impulse response for all navigable locations
#    ├── echolocation_RIRs_navigable                          
#    │       └── [scene]                         (scene name)
#    │           └── [angle]                     (agent's orientation)
#    │               └── location_index.wav      (agent's location)
  2. We also provide the echoes and binaural echolocation room impulse responses for all locations of the 18 environments in Replica.
# echo responses for the 3ms sweep signal at all locations

# echolocation room impulse response for all locations
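As a rough guide to navigating the downloaded data, the sketch below builds file paths that follow the directory trees listed above and walks a dictionary in the documented `{scene: {(location, orientation): {'rgb': ..., 'depth': ...}}}` format. The helper names, the `root` argument, the assumption that each file is named by its location index, and the toy values are hypothetical and only for illustration.

```python
from pathlib import Path


def echo_path(root, scene, sweep_sound, angle, location_index):
    """Path of one echo file: echoes_navigable/[scene]/[sweep_sound]/[angle]/<location>.wav"""
    return (Path(root) / "echoes_navigable" / scene / sweep_sound
            / str(angle) / f"{location_index}.wav")


def rir_path(root, scene, angle, location_index):
    """Path of one RIR file: echolocation_RIRs_navigable/[scene]/[angle]/<location>.wav"""
    return (Path(root) / "echolocation_RIRs_navigable" / scene
            / str(angle) / f"{location_index}.wav")


def iter_rgbd(data):
    """Yield (scene, location, orientation, rgb, depth) from the nested dictionary."""
    for scene, states in data.items():
        for (location, orientation), frames in states.items():
            yield scene, location, orientation, frames["rgb"], frames["depth"]


# Toy dictionary with placeholder values instead of real image arrays.
toy = {"apartment_0": {(12, 90): {"rgb": "rgb-placeholder", "depth": "depth-placeholder"}}}
records = list(iter_rgbd(toy))
example_echo = echo_path("data", "apartment_0", "3ms_sweep", 90, 12).as_posix()
example_rir = rir_path("data", "apartment_0", 90, 12).as_posix()
```

Pairing `iter_rgbd` with `echo_path` gives one (RGB, depth, echo file) triple per agent state, provided the keys of the dictionary match the directory names on disk.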


The data/sweep_audio/ directory contains the 3 ms sweep signal we use for echolocation in our paper, as well as some other types of sweep signals. To generate echo responses for other 1 s signals of your choice, place the directory of echolocation room impulse responses under your data directory and use the script to perform echolocation.
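If you want to try a source signal of your own, a 1 s clip with a short sweep at the start could be synthesized along these lines. This is only a sketch: it uses a linear frequency sweep and a 44.1 kHz sample rate as assumptions, and it is not claimed to reproduce the provided 3ms_sweep.wav. The function name `make_sweep_clip` and its parameters are hypothetical.

```python
import numpy as np


def make_sweep_clip(f_start=20.0, f_end=20000.0, sweep_seconds=0.003,
                    clip_seconds=1.0, sample_rate=44100):
    """Return a clip that starts with a linear sine sweep and is silent afterwards."""
    n_sweep = int(round(sweep_seconds * sample_rate))
    n_clip = int(round(clip_seconds * sample_rate))
    t = np.arange(n_sweep) / sample_rate
    # Instantaneous frequency rises linearly from f_start to f_end,
    # so the phase is the integral of that frequency over time.
    rate = (f_end - f_start) / sweep_seconds
    phase = 2.0 * np.pi * (f_start * t + 0.5 * rate * t ** 2)
    clip = np.zeros(n_clip)
    clip[:n_sweep] = np.sin(phase)
    return clip


clip = make_sweep_clip()
n_sweep_samples = int(round(0.003 * 44100))
```

The resulting array can be saved as a WAV file with the audio library of your choice and then convolved with the echolocation RIRs.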



The RGB and depth data are generated using habitat-sim on the Replica dataset. The binaural RIRs we use for echolocation are a subset of the binaural RIRs from sound-spaces.


The VisualEchoes dataset is CC BY 4.0 licensed, as found in the LICENSE file. Please also refer to the license files for habitat-sim, Replica, and sound-spaces.

