# Tutorial 6: Variation of Room Setups and Their Impact on Acoustics

<p align="right" style="margin-right: 8px;">
    <a target="_blank" href="https://colab.research.google.com/github/idiap/sdialog/blob/main/tutorials/01_audio/6.accoustics_variations.ipynb">
        <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
    </a>
</p>

## Getting started

### Environment Setup

Let's first check if our environment is all set up.

In [None]:
# Set up the environment depending on whether we are running in Google Colab or Jupyter Notebook
import os
from IPython import get_ipython
from IPython.display import Audio, display

if "google.colab" in str(get_ipython()):
    print("Running on CoLab")

    !sudo apt-get install sox ffmpeg
    !sudo apt-get -qq -y install espeak-ng > /dev/null 2>&1
    # Quote the requirement so the shell does not treat ">=" as a redirection
    %pip install -q "kokoro>=0.9.4"
    # Installing sdialog
    !git clone https://github.com/idiap/sdialog.git
    %cd sdialog
    %pip install -e .[audio]
    %cd ..
else:
    print("Running in Jupyter Notebook")

### Load the example dialogue

To keep the next steps fast, we start from an existing dialog generated in the previous tutorials:

In [None]:
from sdialog import Dialog

path_dialog = "../../tests/data/demo_dialog_doctor_patient.json"

local_path_dialog = "./demo_dialog_doctor_patient.json"

if not os.path.exists(path_dialog):
    # Fall back to a local copy, downloading it first if it is missing
    if not os.path.exists(local_path_dialog):
        !wget https://raw.githubusercontent.com/idiap/sdialog/refs/heads/main/tests/data/demo_dialog_doctor_patient.json
    path_dialog = local_path_dialog

original_dialog = Dialog.from_file(path_dialog)
original_dialog.print()

# Tutorial

The idea behind this tutorial is to demonstrate how different room configurations and their acoustic properties can influence the quality and characteristics of generated dialogue audio.

By comparing the audio results generated with different room configurations, you will be able to hear and understand how the acoustic environment affects the perception and quality of synthetic dialogues.

### Instantiate the voice database

In [None]:
from sdialog.audio.voice_database import HuggingfaceVoiceDatabase
kokoro_voice_database = HuggingfaceVoiceDatabase("sdialog/voices-kokoro")

### Instantiate the TTS model

In [None]:
from sdialog.audio.tts import KokoroTTS
tts_engine = KokoroTTS()

## Setup stage: Audio Dialog and Audio Pipeline

In [None]:
from sdialog.audio.dialog import AudioDialog
from sdialog.audio.pipeline import AudioPipeline

Convert the original dialog into an audio-enhanced dialog:

In [None]:
dialog: AudioDialog = AudioDialog.from_dialog(original_dialog)

Instantiate the audio pipeline to use `Kokoro` (`tts_engine`) as the TTS model and save the audio outputs of all dialogs in the `./audio_outputs_variations` directory.

The voices are sampled from the `kokoro_voice_database` based on the persona attributes `age`, `gender` and `language`, as assigned during the original textual dialog.
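
To make the idea of attribute-based voice selection concrete, here is a minimal, self-contained sketch. It is not how `HuggingfaceVoiceDatabase` is implemented: the `Voice` record, the `pick_voice` helper, and the voice entries are all hypothetical, and the real database may select differently (for example by sampling randomly among matching voices). The sketch simply filters on `gender` and `language`, then keeps the voice with the closest `age`.

```python
from dataclasses import dataclass


@dataclass
class Voice:
    """Hypothetical voice record; not sdialog's actual schema."""
    name: str
    gender: str
    age: int
    language: str


def pick_voice(voices, gender, age, language):
    """Return the voice matching gender and language whose age is closest."""
    candidates = [
        v for v in voices
        if v.gender == gender and v.language == language
    ]
    if not candidates:
        raise ValueError(f"No voice for gender={gender!r}, language={language!r}")
    # The closest age wins; ties are resolved by list order.
    return min(candidates, key=lambda v: abs(v.age - age))


# Made-up entries for illustration only
voices = [
    Voice("voice_a", "female", 30, "english"),
    Voice("voice_b", "male", 45, "english"),
    Voice("voice_c", "male", 25, "english"),
]

doctor_voice = pick_voice(voices, gender="male", age=50, language="english")
print(doctor_voice.name)  # voice_b
```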

In [None]:
os.makedirs("./audio_outputs_variations", exist_ok=True)
audio_pipeline = AudioPipeline(
    voice_database=kokoro_voice_database,
    tts_engine=tts_engine,
    dscaper_data_path="./dscaper_data_variations",
    dir_audio="./audio_outputs_variations",
)

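If you run into timeout issues while the pipeline downloads its data, you can fetch the datasets manually instead (remove the `%%script false` line to actually run the cell):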

In [None]:
%%script false --no-raise-error
!hf download sdialog/background --repo-type dataset
!hf download sdialog/foreground --repo-type dataset

Now let's import the room generators and helper types that we will use to build and configure the rooms:

In [None]:
from IPython.display import Audio, display
from sdialog.audio.room import DirectivityType
from sdialog.audio.utils import SourceVolume, SourceType
from sdialog.audio.room_generator import BasicRoomGenerator
from sdialog.audio.jsalt import MedicalRoomGenerator, RoomRole
from sdialog.audio.room import SpeakerSide, Role, RoomPosition, MicrophonePosition

Run the initial steps of the pipeline first, since their output is shared by all of our simulations:

In [None]:
dialog: AudioDialog = audio_pipeline.inference(
    dialog,
    dialog_dir_name="demo_dialog_room_accoustic",
    audio_file_format="mp3"
)

print("dialog.audio_step_1_filepath:", dialog.audio_step_1_filepath)
display(Audio(dialog.audio_step_1_filepath, rate=24000))

Then, run the acoustics simulation for each room configuration. Since we reuse the same `dialog_dir_name` as before (`demo_dialog_room_accoustic`), the data produced by steps 1 and 2 is already available and only the 3rd step needs to run.

We start with basic rooms of increasing size, then repeat the simulation for several medical room roles:

In [None]:
for _size in range(20, 40, 5):

    room = BasicRoomGenerator().generate(args={"room_size": _size})

    room.place_speaker_around_furniture(speaker_name=Role.SPEAKER_1, furniture_name="center", max_distance=5.0, side=SpeakerSide.FRONT)
    room.place_speaker_around_furniture(speaker_name=Role.SPEAKER_2, furniture_name="center", max_distance=5.0, side=SpeakerSide.BACK)

    room.set_directivity(direction=DirectivityType.OMNIDIRECTIONAL)

    room.set_mic_position(MicrophonePosition.CEILING_CENTERED)

    dialog: AudioDialog = audio_pipeline.inference(
        dialog,
        perform_tts=False,
        perform_room_acoustics=True,
        environment={
            "room": room, # Need to provide a room object to trigger the 3rd step of the audio pipeline
            "background_effect": "white_noise",
            "foreground_effect": "ac_noise_minimal",
            "foreground_effect_position": RoomPosition.TOP_LEFT,
            "source_volumes": {
                SourceType.ROOM: SourceVolume.HIGH,
                SourceType.BACKGROUND: SourceVolume.VERY_LOW
            },
            "kwargs_pyroom": {
                "ray_tracing": True,
                "air_absorption": True
            }
        },
        dialog_dir_name="demo_dialog_room_accoustic",
        room_name=f"my_room_config_BasicRoom_{_size}",
        audio_file_format="mp3",
        override_tts_audio=False
    )

    print(f"Done with {_size} basic room configuration!")
    print("\n"*3)
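
To build some intuition for why the larger rooms tend to sound more reverberant, the sketch below estimates the reverberation time with Sabine's formula, RT60 ≈ 0.161 · V / A, where V is the room volume and A the total absorption (surface area times absorption coefficient). This is only a back-of-the-envelope illustration and is independent of the pipeline's own simulation (configured through `kwargs_pyroom`). It assumes that `room_size` is a floor area in m², that the floor is square, that the ceiling is 3 m high, and that all surfaces share an absorption coefficient of 0.3; none of these values come from sdialog.

```python
import math


def sabine_rt60(floor_area_m2, height_m=3.0, absorption=0.3):
    """Estimate RT60 (seconds) of a square-floored box room with Sabine's formula."""
    side = math.sqrt(floor_area_m2)
    volume = floor_area_m2 * height_m
    # Floor + ceiling + four walls
    surface = 2 * floor_area_m2 + 4 * side * height_m
    return 0.161 * volume / (surface * absorption)


# Same sizes as in the loop above (assumed to be floor areas in square meters)
for size in range(20, 40, 5):
    print(f"{size} m^2 -> RT60 ~ {sabine_rt60(size):.2f} s")
```

Under these assumptions the estimated RT60 grows with the floor area, which is the trend you should be able to hear when comparing the generated audio files.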

In [None]:
print("-"*25)
print("- Room Configurations")
print("-"*25)
for config_name in dialog.audio_step_3_filepaths:
    print(f"> Room Configuration: {config_name}")
    display(Audio(dialog.audio_step_3_filepaths[config_name]["audio_path"], rate=24000))

In [None]:
# Use the first three room roles
for _role in list(RoomRole)[:3]:

    room = MedicalRoomGenerator().generate(args={"room_type": _role})

    room.place_speaker_around_furniture(speaker_name=Role.SPEAKER_1, furniture_name="desk", max_distance=1.0, side=SpeakerSide.FRONT)
    room.place_speaker_around_furniture(speaker_name=Role.SPEAKER_2, furniture_name="desk", max_distance=1.5, side=SpeakerSide.BACK)

    room.set_directivity(direction=DirectivityType.OMNIDIRECTIONAL)

    room.set_mic_position(MicrophonePosition.CHEST_POCKET_SPEAKER_1)

    dialog: AudioDialog = audio_pipeline.inference(
        dialog,
        perform_tts=False,
        perform_room_acoustics=True,
        environment={
            "room": room, # Need to provide a room object to trigger the 3rd step of the audio pipeline
            "background_effect": "white_noise",
            "foreground_effect": "ac_noise_minimal",
            "foreground_effect_position": RoomPosition.TOP_LEFT,
            "source_volumes": {
                SourceType.ROOM: SourceVolume.HIGH,
                SourceType.BACKGROUND: SourceVolume.VERY_LOW
            },
            "kwargs_pyroom": {
                "ray_tracing": True,
                "air_absorption": True
            }
        },
        dialog_dir_name="demo_dialog_room_accoustic",
        room_name=f"my_room_config_{_role}",
        audio_file_format="mp3",
        override_tts_audio=False
    )

    print(f"Done with {_role} room configuration!")
    print("\n"*3)

In [None]:
print("-"*25)
print("- Room Configurations")
print("-"*25)
for config_name in dialog.audio_step_3_filepaths:
    print(f"> Room Configuration: {config_name}")
    display(Audio(dialog.audio_step_3_filepaths[config_name]["audio_path"], rate=24000))
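
Because every simulation is stored under its `room_name`, the listing above may contain both the basic and the medical room configurations. A small helper can narrow it down to one family. The sketch below assumes that `dialog.audio_step_3_filepaths` behaves like a dictionary keyed by the room name string, with an `"audio_path"` entry per configuration, as the loop above suggests; `select_configs` and the example paths are hypothetical.

```python
def select_configs(filepaths, prefix):
    """Return the entries of `filepaths` whose configuration name starts with `prefix`."""
    return {
        name: entry for name, entry in filepaths.items()
        if name.startswith(prefix)
    }


# Stand-in for dialog.audio_step_3_filepaths, with made-up paths
example_filepaths = {
    "my_room_config_BasicRoom_20": {"audio_path": "basic_20.mp3"},
    "my_room_config_BasicRoom_25": {"audio_path": "basic_25.mp3"},
    "my_room_config_other": {"audio_path": "other.mp3"},
}

basic_only = select_configs(example_filepaths, "my_room_config_BasicRoom_")
for name, entry in basic_only.items():
    print(name, "->", entry["audio_path"])
```

In the notebook, you could pass `dialog.audio_step_3_filepaths` instead of `example_filepaths` and call `display(Audio(...))` on each selected `audio_path`.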