## Prepare environment

In [None]:
!git clone https://github.com/podmabsterio/dla_avss.git
%cd dla_avss
!pip install -r requirements.txt

## Download pretrained weights

- **ConvTasNet** – baseline audio separation model.
- **RTFSNet** – our main speech separation network.
- **Video Encoder** – generates mouth embeddings used as input to RTFSNet.


In [None]:
out_dir = "weights"  # change this if you want the weights stored elsewhere
!bash scripts/download_convtasnet.sh $out_dir
!bash scripts/download_rtfsnet.sh $out_dir
!bash scripts/download_video_encoder.sh $out_dir
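
To confirm that the downloads finished, it can help to list what actually landed in the output directory. The sketch below is only an illustration: the helper name `list_downloaded_files` is hypothetical, and apart from `weights/rtfsnet.pth` (used later in this notebook) the file names the scripts produce are not assumed.

```python
from pathlib import Path


def list_downloaded_files(out_dir="weights"):
    """Return the relative paths of all files under out_dir, sorted.

    Hypothetical helper for a quick sanity check after the download scripts.
    """
    root = Path(out_dir)
    if not root.is_dir():
        # Nothing was downloaded (or a different directory was used).
        return []
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


print(list_downloaded_files("weights"))
```

An empty list suggests that the scripts failed or wrote to a different directory.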

## Example: running inference


If your dataset already contains ground-truth signals, you can run inference and compute all metrics in the same run by specifying a metric configuration in `metrics`.  
Make sure that the dataset parameter `expect_target` is set to `True`.

If your dataset is not split into train/val/test partitions, set `partition` to `None` during inference; on the command line this is written as `datasets.inf.partition=null`.

Results are saved in the `predictions` directory. Before running inference, the cell below removes any existing `predictions` directory so that old results do not mix with new ones.

In [None]:
import shutil, os
if os.path.exists("predictions"):
    shutil.rmtree("predictions")

!python inference.py \
    -cn=inf_rtfsnet.yaml \
    metrics=pit \
    datasets.inf.partition=train \
    datasets.inf.expect_target=True \
    datasets.inf.dataset_path=example_data \
    video_encoder.dataset_path=example_data \
    inferencer.save_path=predictions \
    inferencer.from_pretrained=weights/rtfsnet.pth
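
The `key.subkey=value` arguments in the command above are dotted overrides: each one targets a nested entry in the configuration. The sketch below illustrates that mapping in plain Python. It is not the parser the config library uses; the function name `apply_overrides` is hypothetical, and it covers only plain values such as strings, `null`, `True` and `False`. Arguments like `metrics=pit`, which presumably select a whole configuration file, are outside its scope.

```python
def apply_overrides(overrides):
    """Turn dotted 'a.b.c=value' strings into a nested dict (illustration only)."""
    literals = {"null": None, "True": True, "False": False}
    config = {}
    for item in overrides:
        key, _, raw = item.partition("=")
        # Map the few literal spellings used in this notebook; keep the rest as text.
        value = literals.get(raw, raw)
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            # Walk down, creating intermediate levels as needed.
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


print(apply_overrides([
    "datasets.inf.partition=null",
    "datasets.inf.expect_target=True",
    "inferencer.save_path=predictions",
]))
```

This also shows why an unpartitioned dataset is passed as `partition=null`: `null` is the command-line spelling of `None`.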

If your dataset contains only mixed audio (without ground-truth sources), you cannot compute separation metrics during inference.  
In this case, use the `empty` metric configuration (an empty list of metrics) and set `expect_target=False`.

In [None]:
if os.path.exists("predictions"):
    shutil.rmtree("predictions")

!python inference.py \
    -cn=inf_rtfsnet.yaml \
    metrics=empty \
    datasets.inf.partition=inf \
    datasets.inf.expect_target=False \
    datasets.inf.dataset_path=example_data \
    video_encoder.dataset_path=example_data \
    inferencer.save_path=predictions \
    inferencer.from_pretrained=weights/rtfsnet.pth

If you have already saved model predictions to disk and later obtained the corresponding ground-truth sources, you can compute the separation metrics afterwards using the `calc_metrics.py` script.

In [None]:
!python calc_metrics.py \
    -cn=calc_metrics_rtfsnet.yaml \
    metric_calculator.pred_path=predictions/inf \
    metric_calculator.gt_path=example_data/audio/train
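
To make concrete what a separation metric computed from saved predictions and ground-truth sources involves, here is a minimal sketch of a permutation-invariant score. It assumes that `pit` stands for permutation-invariant evaluation and that SI-SNR is one of the reported metrics; this notebook confirms neither, and the code is not taken from `calc_metrics.py`. The function names are hypothetical, and signals are plain lists of floats so that the example has no dependencies.

```python
import math
from itertools import permutations


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals."""
    n = len(target)
    # Remove the mean so that a constant offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target; the projection is the "signal" part.
    scale = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    signal = [scale * b for b in t]
    noise = [a - s for a, s in zip(e, signal)]
    p_signal = sum(s * s for s in signal)
    p_noise = sum(x * x for x in noise)
    return 10 * math.log10((p_signal + eps) / (p_noise + eps))


def pit_si_snr(estimates, targets):
    """Return the best mean SI-SNR over all estimate-to-target assignments."""
    best_score, best_perm = -math.inf, None
    for perm in permutations(range(len(targets))):
        # perm[i] is the index of the target assigned to estimate i.
        score = sum(
            si_snr(estimates[i], targets[j]) for i, j in enumerate(perm)
        ) / len(targets)
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm
```

Trying every assignment matters because a separation model has no fixed output order: its first output may correspond to either speaker.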

## Load a custom dataset

You can try the model with your **own dataset stored on Yandex Disk**.  
Paste a public link to your dataset; it is downloaded as a zip archive and extracted into `data/datasets`, ready for inference.


In [None]:
import os
import yadisk
import zipfile

!mkdir -p data/datasets/

dataset_link = input("Enter the public link to your dataset (Yandex Disk): ")
y = yadisk.YaDisk()
y.download_public(dataset_link, "custom_dataset.zip")
with zipfile.ZipFile("custom_dataset.zip", 'r') as zip_ref:
    zip_ref.extractall("data/datasets")
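
The next cell needs the name of the extracted dataset folder. One way to find it is to look at the top-level entries of the archive. The sketch below uses only the standard `zipfile` module; the helper name `top_level_entries` is hypothetical.

```python
import zipfile


def top_level_entries(zip_path):
    """Return the sorted first path components found inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Names inside a zip always use forward slashes.
        return sorted(
            {name.split("/", 1)[0] for name in archive.namelist() if name}
        )
```

For example, `top_level_entries("custom_dataset.zip")` returning `['my_dataset']` would mean the dataset lives in `data/datasets/my_dataset`.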

In [None]:
dataset_dir = "data/datasets/my_dataset"  # replace my_dataset with your dataset folder name
# Ground-truth sources are treated as present when the audio/s1 folder exists
if os.path.isdir(os.path.join(dataset_dir, "audio", "s1")):
    expect_target = True
    metrics_conf = "pit"
else:
    expect_target = False
    metrics_conf = "empty"


In [None]:
if os.path.exists("predictions"):
    shutil.rmtree("predictions")

!python inference.py \
    -cn=inf_rtfsnet.yaml \
    metrics=$metrics_conf \
    datasets.inf.partition=null \
    datasets.inf.expect_target=$expect_target \
    datasets.inf.dataset_path=$dataset_dir \
    video_encoder.dataset_path=$dataset_dir \
    inferencer.save_path=predictions \
    inferencer.from_pretrained=weights/rtfsnet.pth
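
Outside a notebook, the `!` prefix and `$variable` interpolation are unavailable. The sketch below builds the same command as an argument list that can be passed to `subprocess.run`. It reuses the `audio/s1` check and the overrides shown above; the function name `build_inference_command` and its default arguments are hypothetical conveniences.

```python
import os
import sys


def build_inference_command(dataset_dir,
                            weights="weights/rtfsnet.pth",
                            save_path="predictions"):
    """Assemble the inference.py call for a custom dataset (illustrative helper)."""
    # Same rule as in the notebook: audio/s1 signals that ground truth is present.
    has_targets = os.path.isdir(os.path.join(dataset_dir, "audio", "s1"))
    return [
        sys.executable, "inference.py",
        "-cn=inf_rtfsnet.yaml",
        f"metrics={'pit' if has_targets else 'empty'}",
        "datasets.inf.partition=null",
        f"datasets.inf.expect_target={has_targets}",
        f"datasets.inf.dataset_path={dataset_dir}",
        f"video_encoder.dataset_path={dataset_dir}",
        f"inferencer.save_path={save_path}",
        f"inferencer.from_pretrained={weights}",
    ]
```

A typical call would be `subprocess.run(build_inference_command("data/datasets/my_dataset"), check=True)`, run from the repository root after removing any old `predictions` directory.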