Simone Mellace, Jerome Guzzi, Alessandro Giusti, Luca M. Gambardella
Dalle Molle Institute for Artificial Intelligence, USI-SUPSI, Lugano (Switzerland)
We showcase a model that generates a soundscape from a camera stream in real time. The approach relies on a training video with a meaningful associated audio track. A granular synthesizer generates novel sound by randomly sampling and mixing audio data from that video, favoring timestamps whose frame is similar to the current camera frame; the semantic similarity between frames is computed by a pre-trained neural network. The demo is interactive: a user points a mobile phone at different objects and hears how the generated sound changes.
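The sampling idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. We assume each training timestamp already has a feature vector (embedding) produced by some pre-trained network, which is not shown. We turn cosine similarity to the current frame's embedding into sampling probabilities with a softmax and a hypothetical `temperature`. We then build the output by overlap-adding Hann-windowed grains cut from the training audio at the sampled timestamps. All function and parameter names are hypothetical.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def grain_weights(frame_embeddings, query_embedding, temperature=0.1):
    """Assumed weighting: softmax over cosine similarities, so frames similar
    to the current camera frame are sampled more often."""
    sims = [cosine_similarity(e, query_embedding) for e in frame_embeddings]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def hann(n):
    """Hann window of length n, used to fade each grain in and out."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def synthesize(audio, frame_times, weights, sample_rate, out_len, grain_len, hop, rng):
    """Overlap-add granular synthesis: every `hop` output samples, pick a training
    timestamp at random (proportionally to `weights`) and add a windowed grain
    of `grain_len` samples starting there. Assumes len(audio) >= grain_len."""
    out = [0.0] * out_len
    window = hann(grain_len)
    for start_out in range(0, out_len - grain_len + 1, hop):
        t = rng.choices(frame_times, weights=weights, k=1)[0]
        # Convert the timestamp (seconds) to a sample index, clamped so the grain fits.
        src = min(max(int(t * sample_rate), 0), len(audio) - grain_len)
        for i in range(grain_len):
            out[start_out + i] += window[i] * audio[src + i]
    return out
```

In a live setting, `grain_weights` would be recomputed for each new camera frame, and `synthesize` would fill the next audio buffer. A lower `temperature` makes the output follow the most similar training frames more closely. A higher one mixes in more of the training audio.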
See the proceedings of AAAI 2019 (not yet online).
Poster: PDF
Video: VIDEO
Coming soon. Please inquire by email.