# Extract Narrations and Audio Captions Tutorial

In this tutorial, you will extract video narrations with the auto-narration model LaViLa and audio captions with the speech-to-text model WhisperX. Finally, you will interact with the extracted narrations and captions using LangChain.

### Notebook stuck?
Because of Jupyter issues, the code may sometimes get stuck at visualization. If that happens, we recommend **restarting the kernel** and trying again.

## Step 1. Install Project Aria Tools
Run the following cell to install Project Aria Tools, which is used to read Aria recordings in the .vrs format.

In [None]:
# Specifics for Google Colab: install only when actually running there
google_colab_env = 'google.colab' in str(get_ipython())
if google_colab_env:
    print("Running from Google Colab, installing projectaria_tools")
    !pip install projectaria-tools
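
To skip the install when the package is already present, one option is to check whether it can be imported first. This is a minimal sketch, assuming the import name is `projectaria_tools` (as used in Step 3); `is_installed` is a hypothetical helper and not part of Project Aria Tools.

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None

# Only a report here; in the notebook, the install line could be guarded by this check.
print("projectaria_tools installed:", is_installed("projectaria_tools"))
```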

## Step 2. Prepare an Aria recording


### Prepare your collected Aria recording
We will set the `vrsfile` path to your collected Aria recording.

Upload your Aria recording to your Google Drive before running the cell.

Here, we assume it is uploaded to **`My Drive/aria/recording.vrs`**

*(You can check the content of the mounted drive by running `!ls "/content/drive/My Drive/"` in a cell.)*



In [None]:
from google.colab import drive
import os
drive.flush_and_unmount()
drive.mount('/content/drive/')
my_vrs_file_path = 'aria/recording.vrs'
vrsfile = "/content/drive/My Drive/" + my_vrs_file_path
print(f"INFO: vrsfile set to {vrsfile}")
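
A typo in the Drive path only surfaces later, when the data provider fails to open the file. The sketch below checks the path up front; `check_recording_path` is a hypothetical helper introduced here for illustration.

```python
import os

def check_recording_path(path: str) -> str:
    """Return `path` unchanged if it is an existing .vrs file, else raise."""
    if not path.endswith(".vrs"):
        raise ValueError(f"Expected a .vrs file, got: {path}")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Recording not found: {path}")
    return path

# Example with the path assumed in this step (raises if Drive is not mounted):
# vrsfile = check_recording_path("/content/drive/My Drive/aria/recording.vrs")
```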

## Tip: Avoid re-installation of packages
Follow the steps below to avoid reinstalling packages after Colab shuts down or restarts.

(1) Create a folder called “ColabNotebooks” manually in your Drive (under "My Drive").

(2) Then, run the below cell to add the symlink path to the system path.

In [None]:
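# Make the Drive folder importable so that packages stored there can be reused
import sys
nb_path = '/content/notebooks'
if not os.path.lexists(nb_path):  # os.symlink raises if the link already exists
    os.symlink('/content/drive/My Drive/ColabNotebooks', nb_path)
sys.path.insert(0, nb_path)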
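
Because this cell may run more than once per session, it helps if the linking step is safe to repeat. The sketch below wraps it in a hypothetical helper, `link_package_dir`, that creates the symlink only once and avoids duplicate `sys.path` entries. Note that packages land in the linked folder only if they are installed there explicitly, for example with pip's `--target` option; that extra step is an assumption about the intended workflow and is not shown in this notebook.

```python
import os
import sys

def link_package_dir(src: str, dst: str) -> None:
    """Symlink `dst` -> `src` once and put `dst` first on sys.path."""
    if not os.path.lexists(dst):
        os.symlink(src, dst)
    if dst not in sys.path:
        sys.path.insert(0, dst)

# In the notebook this would be called as:
# link_package_dir('/content/drive/My Drive/ColabNotebooks', '/content/notebooks')
```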


#### (Optional) Download sample data for debugging
Use this small sample recording to test that the dependencies work.

In [None]:
# !curl -O -J -L "https://github.com/facebookresearch/projectaria_tools/raw/main/data/mps_sample/sample.vrs"
# vrsfile = "sample.vrs"
# print(f"INFO: vrsfile set to {vrsfile}")

## Step 3. Create data provider

Create a Project Aria data provider so you can load the content of the .vrs file.

In [None]:
from projectaria_tools.core import data_provider, calibration
from projectaria_tools.core.sensor_data import TimeDomain, TimeQueryOptions
from projectaria_tools.core.stream_id import RecordableTypeId, StreamId
import numpy as np
from matplotlib import pyplot as plt

print(f"Creating data provider from {vrsfile}")
provider = data_provider.create_vrs_data_provider(vrsfile)
if not provider:
    print("Invalid vrs data provider")

Creating data provider from /content/drive/My Drive/aria/recording_long.vrs


## Step 4. Display VRS RGB content as thumbnail images

Goals:
- Summarize a VRS file using 10 images side by side, to visually inspect the collected data.

Key learnings:
- Image streams are identified with a Unique Identifier: stream_id
- Image frames are identified with timestamps
- PIL images can be created from Numpy array

Customization:
- To change the number of sampled images, change the variable `sample_count` to a desired number.
- To change the thumbnail size, change the variable `resize_ratio` to a desired value.
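
The effect of `sample_count` and `resize_ratio` on the output can be worked out without loading any images. The sketch below reproduces the canvas-size and paste-offset arithmetic used in the cell for this step; `thumbnail_layout` is a hypothetical helper, and the 1408x1408 frame size is only an illustrative value.

```python
def thumbnail_layout(width, height, sample_count, resize_ratio):
    """Return (canvas_size, paste_offsets) for a horizontal strip of thumbnails."""
    canvas_size = (int(width * sample_count / resize_ratio), int(height / resize_ratio))
    offsets = []
    current_width = 0
    for _ in range(sample_count):
        offsets.append(current_width)
        # Same update as in the notebook loop: advance by one downscaled frame width
        current_width = int(current_width + width / resize_ratio)
    return canvas_size, offsets

canvas, offsets = thumbnail_layout(width=1408, height=1408, sample_count=10, resize_ratio=10)
print(canvas)       # (1408, 140)
print(offsets[:3])  # [0, 140, 280]
```

A larger `resize_ratio` therefore gives smaller thumbnails, and `sample_count` sets how many of them fit side by side on the canvas.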

In [None]:
from PIL import Image
from tqdm import tqdm

sample_count = 10
resize_ratio = 10

rgb_stream_id = StreamId("214-1")

# Query settings used when looking up frames by timestamp
time_domain = TimeDomain.DEVICE_TIME  # query data based on device time
option = TimeQueryOptions.CLOSEST # get data whose time [in TimeDomain] is CLOSEST to query time

# Retrieve Start and End time for the given Sensor Stream Id
start_time = provider.get_first_time_ns(rgb_stream_id, time_domain)
end_time = provider.get_last_time_ns(rgb_stream_id, time_domain)

image_config = provider.get_image_configuration(rgb_stream_id)
width = image_config.image_width
height = image_config.image_height

thumbnail = Image.new(
    "RGB", (int(width * sample_count / resize_ratio), int(height / resize_ratio))
)
current_width = 0


# Sample `sample_count` evenly spaced timestamps
sample_timestamps = np.linspace(start_time, end_time, sample_count)
for sample in tqdm(sample_timestamps):
    image_tuple = provider.get_image_data_by_time_ns(rgb_stream_id, int(sample), time_domain, option)
    image_array = image_tuple[0].to_numpy_array()
    image = Image.fromarray(image_array)
    new_size = (
        int(image.size[0] / resize_ratio),
        int(image.size[1] / resize_ratio),
    )
    image = image.resize(new_size).rotate(-90)
    thumbnail.paste(image, (current_width, 0))
    current_width = int(current_width + width / resize_ratio)

# Import `display` only; importing IPython's Image here would shadow PIL's Image
from IPython.display import display
display(thumbnail)

## Step 5. Prepare PyTorch Data Loader for Auto-Narration

Here, we will create a PyTorch data loader that outputs batches of video snippets in order to run the LaViLa auto-narration model.

A **snippet** consists of a series of frames captured over a brief time span, which we will refer to as **snippet duration**.
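
The following sketch shows how a recording is split into snippets and how frame timestamps are spaced within each snippet. It mirrors the arithmetic of the dataset class defined in Step 5-1 but uses plain Python ranges; `snippet_frame_timestamps` is a hypothetical name used only for this illustration.

```python
NS_PER_SEC = 1_000_000_000

def snippet_frame_timestamps(start_ns, end_ns, snippet_dur_sec, frames_per_snippet):
    """Return one list of frame timestamps (ns) per full snippet in [start_ns, end_ns]."""
    snippet_dur = snippet_dur_sec * NS_PER_SEC
    stride_ns = snippet_dur // frames_per_snippet
    num_snippets = (end_ns - start_ns) // snippet_dur  # a trailing partial snippet is dropped
    snippets = []
    for i in range(num_snippets):
        snippet_start = start_ns + i * snippet_dur
        snippets.append(list(range(snippet_start, snippet_start + snippet_dur, stride_ns)))
    return snippets

# A 5-second recording with 2-second snippets and 4 frames each yields 2 snippets.
for frames in snippet_frame_timestamps(0, 5 * NS_PER_SEC, snippet_dur_sec=2, frames_per_snippet=4):
    print(frames)
```

As in the Step 5-1 class, the number of frames per snippet is exact only when `frames_per_snippet` divides the snippet duration in nanoseconds evenly.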

#### Step 5-1. Define Dataset

In [None]:
import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import torchvision.transforms as transforms
import torchvision.transforms._transforms_video as transforms_video
import torch.nn as nn

class RGBSnippetDataset(Dataset):
    def __init__(self,
                 start_time: int,  # first timestamp to sample, in nanoseconds
                 end_time: int,  # last timestamp to sample, in nanoseconds
                 snippet_dur_sec: int,  # snippet duration in seconds
                 frames_per_snippet: int,  # number of frames per snippet
                 transform=None  # optional transform applied to each (T, H, W, C) snippet
    ):
        self.start_time = start_time
        self.end_time = end_time
        self.snippet_dur = snippet_dur_sec * 1000000000  # duration of a snippet in nanoseconds
        self.frames_per_snippet = frames_per_snippet # number of frames per snippet
        self.stride_ns = int(self.snippet_dur//frames_per_snippet)
        self.num_snippets = int((end_time - start_time) // self.snippet_dur)
        self.snippet_starts = np.arange(start_time, start_time + self.snippet_dur * self.num_snippets, self.snippet_dur)

        # Precompute timestamps for each snippet
        self.all_frame_timestamps = [np.arange(snippet_start, snippet_start + self.snippet_dur, self.stride_ns) for snippet_start in self.snippet_starts]

        self.rgb_stream_id = rgb_stream_id
        self.time_domain = time_domain
        self.option = option
        self.transform = transform

    def __len__(self):
        return self.num_snippets

    def __getitem__(self, idx):
        # returns a snippet

        # get timestamps of frames that belong to the current snippet idx
        frame_timestamps = self.all_frame_timestamps[idx]

        # read frames from the data provider and append to frame_list
        frame_list = []
        for timestamp in frame_timestamps:
            image_tuple = provider.get_image_data_by_time_ns(self.rgb_stream_id, int(timestamp), self.time_domain, self.option)
            image_array = image_tuple[0].to_numpy_array()
            frame_list.append(image_array)

        # stack the frames into a single snippet tensor of shape (T, H, W, C)
        frames = [torch.tensor(frame, dtype=torch.float32) for frame in frame_list]
        frames = torch.stack(frames, dim=0)

        if self.transform:
            frames = self.transform(frames)

        # return snippet start time and end time
        snippet_start = self.snippet_starts[idx]
        snippet_end = snippet_start + self.snippet_dur

        return frames, snippet_start, snippet_end

class Permute(nn.Module):
    """
    Permutation as an op
    """
    def __init__(self, ordering):
        super().__init__()
        self.ordering = ordering

    def forward(self, frames):
        """
        Args:
            frames in some ordering, by default (C, T, H, W)
        Returns:
            frames in the ordering that was specified
        """
        return frames.permute(self.ordering)

#### Step 5-2. Construct Data Loader
Here you can set the batch size (`batch_size`) and customize `start_time` and `end_time`, the time range over which auto-narration runs.

In [None]:
# Retrieve Start and End time for the given Sensor Stream Id
start_time = provider.get_first_time_ns(rgb_stream_id, time_domain)
end_time = provider.get_last_time_ns(rgb_stream_id, time_domain)

batch_size = 2 # batch size in dataloader (Decrease/increase based on the GPU memory)
image_size = 224  # image size after resizing (Do not change for LaViLa)
snippet_dur_sec = 2  # duration of a snippet (We recommend values between 1-10.)
frames_per_snippet = 4  # number of frames per snippet (Do not change for LaViLa)

val_transform = transforms.Compose([
    Permute([3, 0, 1, 2]),  # T H W C -> C T H W
    transforms.Resize(image_size),
    transforms_video.NormalizeVideo(mean=[108.3272985, 116.7460125, 104.09373615000001], std=[68.5005327, 66.6321579, 70.32316305]),
])
rgb_snippet_dataset = RGBSnippetDataset(start_time, end_time, snippet_dur_sec=snippet_dur_sec, frames_per_snippet=frames_per_snippet, transform=val_transform)
snippet_dataloader = DataLoader(rgb_snippet_dataset, batch_size=batch_size, shuffle=False)
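
Two details of this transform are easy to miss: the permutation moves the channel axis to the front, and the mean and standard deviation are given on the 0-255 pixel scale, so frames are normalized without first being divided by 255. The sketch below illustrates both in plain Python; `permuted_shape` and `normalize_pixel` are hypothetical helpers, and 1408 is only an illustrative frame size.

```python
# Per-channel statistics copied from the transform in Step 5-2 (0-255 pixel scale).
MEAN = [108.3272985, 116.7460125, 104.09373615000001]
STD = [68.5005327, 66.6321579, 70.32316305]

def permuted_shape(shape, ordering):
    """Shape after a permute: output axis i takes input axis ordering[i]."""
    return tuple(shape[axis] for axis in ordering)

def normalize_pixel(rgb):
    """Apply (value - mean) / std to one RGB pixel given in the 0-255 range."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]

# (T, H, W, C) -> (C, T, H, W) for a 4-frame snippet
print(permuted_shape((4, 1408, 1408, 3), [3, 0, 1, 2]))  # (3, 4, 1408, 1408)
print(normalize_pixel(MEAN))                              # [0.0, 0.0, 0.0]
```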

## Step 6. Install LaViLa auto-narration library
Now that the data is prepared, let's install the LaViLa library.

LaViLa (Language augmented Video Language Pretraining) is a video narration model that is trained on Ego4D.
It is used for generating text descriptions for your captured recordings in this tutorial.
- Paper: https://arxiv.org/abs/2212.04501
- Code: https://github.com/facebookresearch/LaViLa

In [None]:
# install LaViLa as dependency
!pip install git+https://github.com/zhaoyang-lv/LaViLa

## Step 7. Define helper functions for LaViLa

Run the following cell to define helper functions for (1) loading pre-trained models and tokenizers, (2) decoding generated tokens, and (3) running the model on a batch of snippets.

In [None]:
import os
import urllib.request
from collections import OrderedDict
import torch

from lavila.models.models import VCLM_OPENAI_TIMESFORMER_LARGE_336PX_GPT2_XL, VCLM_OPENAI_TIMESFORMER_BASE_GPT2
from lavila.models.tokenizer import MyGPT2Tokenizer

DEFAULT_CHECKPOINT = 'vclm_openai_timesformer_base_gpt2_base.pt_ego4d.jobid_319630.ep_0002.md5sum_68a71f.pth'
# DEFAULT_CHECKPOINT = 'vclm_openai_timesformer_large_336px_gpt2_xl.pt_ego4d.jobid_246897.ep_0003.md5sum_443263.pth'

def load_models_and_transforms(num_frames=4, ckpt_name=DEFAULT_CHECKPOINT, device='cpu'):
    '''
    Helper function for loading pre-trained models and tokenizers
    '''
    ckpt_path = os.path.join('lavila/modelzoo/', ckpt_name)
    print(f"ckpt_path: {os.path.abspath(ckpt_path)}")
    os.makedirs('lavila/modelzoo/', exist_ok=True)
    if not os.path.exists(ckpt_path):
        print('downloading model to {}'.format(ckpt_path))
        urllib.request.urlretrieve('https://dl.fbaipublicfiles.com/lavila/checkpoints/narrator/{}'.format(ckpt_name), ckpt_path)
    ckpt = torch.load(ckpt_path, map_location='cpu')
    state_dict = OrderedDict()
    for k, v in ckpt['state_dict'].items():
        state_dict[k.replace('module.', '')] = v

    # instantiate the model, and load the pre-trained weights
    # model = VCLM_OPENAI_TIMESFORMER_LARGE_336PX_GPT2_XL(
    model = VCLM_OPENAI_TIMESFORMER_BASE_GPT2(
        text_use_cls_token=False,
        project_embed_dim=256,
        gated_xattn=True,
        timesformer_gated_xattn=False,
        freeze_lm_vclm=False,      # we use model.eval() anyway
        freeze_visual_vclm=False,  # we use model.eval() anyway
        num_frames=num_frames,
        drop_path_rate=0.
    )
    model.load_state_dict(state_dict, strict=True)

    device_type_str = device.type if isinstance(device, torch.device) else device
    if device_type_str != 'cpu':
        model = model.to(device)
    model.eval()

    tokenizer = MyGPT2Tokenizer('gpt2', add_bos=True)
    #tokenizer = MyGPT2Tokenizer('gpt2-xl', add_bos=True)

    return model, tokenizer


def decode_one(generated_ids, tokenizer):
    '''
    Helper function for decoding generated tokens.
    '''
    # get the index of <EOS>
    if tokenizer.eos_token_id == tokenizer.bos_token_id:
        if tokenizer.eos_token_id in generated_ids[1:].tolist():
            eos_id = generated_ids[1:].tolist().index(tokenizer.eos_token_id) + 1
        else:
            eos_id = len(generated_ids.tolist()) - 1
    elif tokenizer.eos_token_id in generated_ids.tolist():
        eos_id = generated_ids.tolist().index(tokenizer.eos_token_id)
    else:
        eos_id = len(generated_ids.tolist()) - 1
    generated_text_str = tokenizer.tokenizer.decode(generated_ids[1:eos_id].tolist())
    return generated_text_str


def run_model_on_snippets(
    frames, model, tokenizer, device="cpu", narration_max_sentences=5
):
    '''
    Function for running the LaViLa model on batches of snippets.
    '''
    with torch.no_grad():
        image_features = model.encode_image(frames)
        generated_text_ids, ppls = model.generate(
            image_features,
            tokenizer,
            target=None,  # free-form generation
            max_text_length=77,
            top_k=None,
            top_p=0.95,   # nucleus sampling
            num_return_sequences=narration_max_sentences,  # number of candidate sentences per snippet
            temperature=0.7,
            early_stopping=True,
        )
    output_narration = []
    for j in range(generated_text_ids.shape[0] // narration_max_sentences):
        cur_output_narration = []
        for k in range(narration_max_sentences):
            jj = j * narration_max_sentences + k
            generated_text_str = decode_one(generated_text_ids[jj], tokenizer)
            generated_text_str = generated_text_str.strip()
            generated_text_str = generated_text_str.replace("#c c", "#C C")
            if generated_text_str in cur_output_narration:
                continue
            if generated_text_str.endswith('the'):
                # skip incomplete sentences
                continue
            cur_output_narration.append(generated_text_str)
        output_narration.append(cur_output_narration) # list of size B (batch size)
    return output_narration
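
The post-processing at the end of `run_model_on_snippets` can be tried without a model. The sketch below applies the same grouping and filtering rules to a flat list of candidate sentences; `group_and_filter` is a hypothetical stand-in, and the sentences are made up for illustration.

```python
def group_and_filter(candidates, per_snippet):
    """Split a flat list of candidate sentences into per-snippet lists and clean them."""
    grouped = []
    for j in range(len(candidates) // per_snippet):
        kept = []
        for text in candidates[j * per_snippet:(j + 1) * per_snippet]:
            text = text.strip().replace("#c c", "#C C")
            if text in kept or text.endswith("the"):
                continue  # drop duplicates and sentences cut off after "the"
            kept.append(text)
        grouped.append(kept)
    return grouped

flat = ["#c c looks around", "#C C looks around ", "#C C opens the",
        "#C C walks", "#C C walks", "#C C sits down"]
print(group_and_filter(flat, per_snippet=3))
# [['#C C looks around'], ['#C C walks', '#C C sits down']]
```

This is why the number of narrations per snippet can be smaller than `narration_max_sentences`.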

## Step 8. Run LaViLa inference over vrs file
Let's load the pre-trained model and tokenizer, then run inference over every batch of snippets.


In [None]:
# load the pre-trained model and tokenizer.
model, tokenizer = load_models_and_transforms(num_frames=4)

# this is where the generated narration will be stored
narrations_dict = {
    'start_time_ns': [],
    'end_time_ns': [],
    'narration': [],
}

# use gpu if available
if torch.cuda.is_available():
  model = model.cuda()

for idx, (frames, st_ns, ed_ns) in enumerate(snippet_dataloader):
  if torch.cuda.is_available():
    frames = frames.cuda()
  # run inference over a batch of snippets
  output_narration = run_model_on_snippets(frames, model, tokenizer)
  # store results
  narrations_dict['start_time_ns'].extend(st_ns.numpy().tolist())
  narrations_dict['end_time_ns'].extend(ed_ns.numpy().tolist())
  narrations_dict['narration'].extend(output_narration)

## Step 9. Display the auto-narration results and save to csv file
Make sure to change `narration_save_path` to your desired location!

In [None]:
narration_save_path = os.path.join(os.path.dirname(vrsfile), 'auto_narration.csv')

import pandas as pd
df = pd.DataFrame(narrations_dict)
df.to_csv(narration_save_path)
display(df)

| | start_time_ns | end_time_ns | narration |
|---|---|---|---|
| 0 | 148502526450 | 150502526450 | [#C C stares at the ceiling, #C C looks around... |
| 1 | 150502526450 | 152502526450 | [#C C looks around the house, #C C looks aroun... |
| 2 | 152502526450 | 154502526450 | [#C C adjusts the camera, #C C looks around] |
| 3 | 154502526450 | 156502526450 | [#C C looks around] |
| 4 | 156502526450 | 158502526450 | [#C C stands beside the door, #C C looks around] |
| 5 | 158502526450 | 160502526450 | [#C C looks at the wall, #C C looks around, #C... |
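
When the CSV file is read back later, each `narration` cell is a string, because pandas writes a Python list as its string representation. The sketch below parses such cells back into lists with the standard library and converts the timestamps to seconds. The two rows are rows 2 and 3 from the output above, written the way `to_csv` would store them.

```python
import ast
import csv
import io

# Two rows in the same layout as auto_narration.csv (index column first, unnamed).
csv_text = '''\
,start_time_ns,end_time_ns,narration
2,152502526450,154502526450,"['#C C adjusts the camera', '#C C looks around']"
3,154502526450,156502526450,['#C C looks around']
'''

rows = []
for row in csv.DictReader(io.StringIO(csv_text)):
    rows.append({
        "start_s": int(row["start_time_ns"]) / 1e9,
        "end_s": int(row["end_time_ns"]) / 1e9,
        "narration": ast.literal_eval(row["narration"]),  # string -> list of str
    })

print(rows[0]["narration"])  # ['#C C adjusts the camera', '#C C looks around']
```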


# Optional Steps for Speech2Text
Proceed with Steps 10-14 if you have speech in your recording and would like to use it.

## Step 10. Build VRS Tool to extract .wav file for audio captioning
Whisper X runs on .wav files, so we need to build VRSTool to extract a .wav file from the .vrs file.

In [None]:
!sudo apt-get update
# Install VRS build dependencies (-y skips the interactive confirmation prompt)
!sudo apt-get install -y cmake git ninja-build ccache libgtest-dev libfmt-dev libturbojpeg-dev libpng-dev
!sudo apt-get install -y liblz4-dev libzstd-dev libxxhash-dev
!sudo apt-get install -y libboost-system-dev libboost-filesystem-dev libboost-thread-dev libboost-chrono-dev libboost-date-time-dev

#clone and build
!git clone https://github.com/facebookresearch/vrs.git
!cmake -S vrs -B vrs/build -G Ninja
!cd vrs/build; ninja vrs


## Step 11. Extract .wav file from VRS file
Now that VRSTool is built, let's extract the .wav file from the .vrs file.

Here, the extracted .wav file is saved to the current working directory.

If you expect to reuse this file, change the output path using the argument `--to <google_drive_path>`.

*(You can ignore the error '[AudioExtractor][ERROR]: os::makeDirectories(folderPath_) failed: 22, Invalid argument'.)*

In [None]:
!./vrs/build/tools/vrs/vrs extract-audio "{vrsfile}" --to .
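
The name of the extracted file carries the audio start time, which Step 13 reads back to place the words on the Aria timeline. The sketch below shows that parsing on the example file name used in Step 14. `parse_start_timestamp_s` is a hypothetical helper, and the name pattern is inferred from that single example.

```python
import os

def parse_start_timestamp_s(wav_path: str) -> float:
    """Read the start time (seconds) from the last '-'-separated field of the file name."""
    name = os.path.basename(wav_path)  # ignore hyphens in directory names
    return float(name.split("-")[-1].replace(".wav", ""))

print(parse_start_timestamp_s("231-1-0000-743.444.wav"))  # 743.444
```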

## Step 12. Install Whisper X
We have input data ready for Whisper X. Let's install the library.

Whisper X is an automatic speech recognition method that provides word-level timestamps and speaker diarization.
- Paper: https://arxiv.org/abs/2303.00747
- Code: https://github.com/m-bain/whisperX

In [None]:
!pip install git+https://github.com/m-bain/whisperx.git

## Step 13. Define helper functions for Whisper X
Let's define some helper functions for Whisper X: a postprocessing function that writes the word-level output to CSV, and a function that transcribes the audio and aligns the output to timestamps.

In [None]:
import logging
import os.path as osp
import numpy as np
import glob
import os
import pandas as pd
import whisperx
import tqdm

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

device = "cuda"

def asr_tokens_to_csv(
    word_segments,
    token_csv_folder: str,
    starting_timestamp_s: float = 0.0,
):
    # post process the output asr file to extract only the minimal needed content

    df = pd.DataFrame.from_dict(word_segments)
    os.makedirs(token_csv_folder, exist_ok=True)

    # write to wav domain:
    s_to_ms = int(1e3)
    df = df.fillna(-1)
    df["start"] = (df["start"] * s_to_ms).astype("int64")
    df["end"] = (df["end"] * s_to_ms).astype("int64")
    df_speech_wav = df.rename(
        columns={"start": "startTime_ms", "end": "endTime_ms", "text": "written"},
    )
    df_speech_wav.to_csv(
        osp.join(token_csv_folder, "speech.csv"), index=False, header=True
    )

    # Update ASR ms time to Aria ns time
    s_to_ns = int(1e9)
    ms_to_ns = int(1e6)
    df["start"] = (df["start"] * ms_to_ns + starting_timestamp_s * s_to_ns).astype(
        "int64"
    )
    df["end"] = (df["end"] * ms_to_ns + starting_timestamp_s * s_to_ns).astype("int64")

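    # NOTE: the column names keep the "_ms" suffix, but in this file the values are
    # Aria-domain timestamps in nanoseconds.
    df_aria_domain = df.rename(
        columns={"start": "startTime_ms", "end": "endTime_ms", "text": "written"},
    )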
    df_aria_domain.to_csv(
        osp.join(token_csv_folder, "speech_aria_domain.csv"), index=False, header=True
    )

    logging.info(f"Generate speech.csv & speech_aria_domain.csv to {token_csv_folder}")


def run_whisperx_aria_wav(
    model,
    file_path: str,
    output_folder: str = "",
    batch_size = None,
):
    """
    Run whisperx model on .wav file extracted from VRS file
    """
    starting_timestamp = file_path.split("-")[-1].replace(".wav", "")
    starting_timestamp = float(starting_timestamp)
    logging.info("Aria Starting timestamp: {:0.3f}".format(starting_timestamp))

    logging.info(f"Transcribe the speech from wav file {file_path}.")
    result = model.transcribe(file_path, batch_size=batch_size)
    print("Transcription done.")


    model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
    logging.info("Alignment model loaded.")
    result_aligned = whisperx.align(
        result["segments"], model_a, metadata, file_path, device
    )
    print("Alignment done.")

    try:
        asr_tokens_to_csv(
            word_segments=result_aligned["word_segments"],
            token_csv_folder=output_folder,
            starting_timestamp_s=starting_timestamp,
        )
    except Exception as err:
        logging.warning(f"Cannot process {file_path} because {err}. Skip this recording.")
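
The time conversion inside `asr_tokens_to_csv` takes two steps: word times relative to the start of the .wav file are truncated to whole milliseconds, then shifted by the recording's start time and expressed in nanoseconds. The sketch below shows the same arithmetic for a single value. `wav_seconds_to_aria_ns` is a hypothetical helper, and it uses `round` for the final step where the notebook truncates with `astype("int64")`.

```python
def wav_seconds_to_aria_ns(word_time_s: float, starting_timestamp_s: float) -> int:
    """Convert a word time relative to the .wav start into an Aria-domain timestamp (ns)."""
    time_ms = int(word_time_s * 1_000)  # seconds -> whole milliseconds (truncated)
    return round(time_ms * 1_000_000 + starting_timestamp_s * 1_000_000_000)

# A word spoken 1.5 s into a .wav that starts at 743.5 s on the Aria timeline:
print(wav_seconds_to_aria_ns(1.5, 743.5))  # 745000000000
```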


## Step 14. Run Whisper X
Finally, let's run Whisper X on the .wav file that we extracted.

Make sure to:
- Change `audio_file` to the .wav file that we extracted in Step 11.
- Set `whisper_x_output_folder` to the desired location. The resulting file name is `speech_aria_domain.csv`.

In [None]:
audio_file = '231-1-0000-743.444.wav'
whisper_x_output_folder = "."
compute_type = "float16" # change to "int8" if low on GPU mem (may reduce accuracy)
model = whisperx.load_model("large-v2", device, compute_type=compute_type,) #  language='en'
batch_size = 16 # reduce if low on GPU mem, or keep it None
# The function returns nothing, so do not assign its result to `provider` (the VRS data provider)
run_whisperx_aria_wav(model, audio_file, output_folder=whisper_x_output_folder, batch_size=batch_size)

In [None]:
asr_df = pd.read_csv(osp.join(whisper_x_output_folder, "speech_aria_domain.csv"))
display(asr_df)

| | word | startTime_ms | endTime_ms | score |
|---|---|---|---|---|
| 0 | Awesome. | 744915000000 | 745275000000 | 0.535 |
| 1 | Would | 745836000000 | 745936000000 | 0.451 |
| 2 | you | 745956000000 | 746016000000 | 0.958 |
| 3 | like | 746056000000 | 746156000000 | 0.932 |
| 4 | some | 746176000000 | 746296000000 | 0.759 |
| ... | ... | ... | ... | ... |
| 222 | Thank | 927758000000 | 927978000000 | 0.609 |
| 223 | you. | 927998000000 | 928159000000 | 0.912 |
| 224 | Big | 934170000000 | 934330000000 | 0.652 |
| 225 | long | 934370000000 | 934671000000 | 0.950 |
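
Since both the narrations and the word-level captions carry nanosecond timestamps, they can be lined up by time. The sketch below is one possible way to do this and is not part of the original pipeline. It assumes both sets of timestamps are in the same time domain, `words_per_snippet` is a hypothetical helper, and the values are made up for illustration.

```python
def words_per_snippet(snippets, words):
    """Attach to each (start_ns, end_ns) snippet the words whose start time falls inside it."""
    aligned = []
    for start_ns, end_ns in snippets:
        spoken = [w for w, t_ns in words if start_ns <= t_ns < end_ns]
        aligned.append((start_ns, end_ns, " ".join(spoken)))
    return aligned

# Made-up timestamps for illustration; real values come from the two CSV files.
snippets = [(0, 2_000_000_000), (2_000_000_000, 4_000_000_000)]
words = [("Would", 500_000_000), ("you", 900_000_000), ("like", 2_100_000_000)]
print(words_per_snippet(snippets, words))
# [(0, 2000000000, 'Would you'), (2000000000, 4000000000, 'like')]
```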


# Optional steps for summarization example
Proceed with Steps 15-17 if you would like to try summarizing the narration with an LLM (via LangChain).

## Step 15. Install Langchain

In [None]:
!pip install langchain

## Step 16. Install OpenAI to use with Langchain

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"
!pip install openai

## Step 17. Summarize the narration result


In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.document_loaders.csv_loader import CSVLoader
from langchain import PromptTemplate


prompt_template = """ Write a concise summary (between 5 and 10 sentences) of the following text.
The text is about my exhaustive timeline, where I am referred to as '#C C' or 'C C' or 'C'.
Please use the first-person pronoun (I) in the summary, instead of 'C' or 'C C'.
Please keep in mind that some observations may be incorrect as the timeline was machine-generated.

Timeline:

{text}


TL;DR: """

#os.environ["OPENAI_API_KEY"] = "sk-your-key"
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")
chain = load_summarize_chain(llm, chain_type="stuff", prompt=PROMPT)

loader = CSVLoader(file_path=narration_save_path)
docs = loader.load()

chain.run(docs)

'The summary describes a series of events with timestamps and corresponding narrations. The events involve a character named C C who is observed looking around, adjusting the camera, staring at the ceiling, standing beside the door, and looking at various objects in the room and house.'
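
With `chain_type="stuff"`, all loaded documents are placed into a single prompt. The sketch below approximates that step in plain Python to show what the model receives. It is not LangChain's implementation: `stuff_prompt` is a hypothetical helper, the shortened template is illustrative, and the blank-line separator is an assumption.

```python
PROMPT_TEMPLATE = "Write a concise summary of the following text.\n\nTimeline:\n\n{text}\n\nSummary: "

def stuff_prompt(template: str, documents: list[str], separator: str = "\n\n") -> str:
    """Join all documents and substitute them for {text} in a single prompt."""
    return template.format(text=separator.join(documents))

docs = ["narration: ['#C C looks around']", "narration: ['#C C adjusts the camera']"]
print(stuff_prompt(PROMPT_TEMPLATE, docs))
```

Because every row of the CSV file ends up in one prompt, a long recording can exceed the model's context window; in that case the time range or the number of rows would need to be reduced.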

## (Optional) RGBDataset

The RGBDataset is a simple PyTorch image dataset designed for image-based models that operate on individual frames rather than snippets. Use it when your model processes single frames.

In [None]:
# import torch
# from torch.utils.data import Dataset, DataLoader
# from PIL import Image
# import torchvision.transforms as transforms

# class RGBDataset(Dataset):
#     def __init__(self, start_time, end_time, sample_count, transform=None):
#         self.timestamps = np.linspace(start_time, end_time, sample_count)
#         self.rgb_stream_id = StreamId("214-1")
#         self.time_domain = TimeDomain.DEVICE_TIME
#         self.option = TimeQueryOptions.CLOSEST
#         self.transform = transform

#     def __len__(self):
#         return len(self.timestamps)

#     def __getitem__(self, idx):
#         timestamp = self.timestamps[idx]
#         image_tuple = provider.get_image_data_by_time_ns(self.rgb_stream_id, int(timestamp), self.time_domain, self.option)
#         image_array = image_tuple[0].to_numpy_array()
#         image = Image.fromarray(image_array).rotate(-90)
#         if self.transform:
#           image = self.transform(image)
#         return timestamp, image

# val_transform = transforms.Compose([
#     transforms.Resize(224),
#     transforms.ToTensor(),
#   ])

# rgb_dataset = RGBDataset(start_time, end_time, sample_count, transform=val_transform)
# image_dataloader = DataLoader(rgb_dataset, batch_size=2, shuffle=False)
# # Get the next batch of data
# timestamp, image = next(iter(image_dataloader))