# Scene Audio
In `scene_boundary_partitioning.ipynb`, in `unifying_features/`, we demonstrated how to identify individual two-character dialogue scenes. The next step is to analyze each scene's audio.

We could work with the film's *entire* audio track, but loading a feature-length track into memory is impractical for most audio analyses. Instead, we'll extract just the portion we need: we take the saved audio file from `/input_audio/`, create a new audio file containing only the scene's audio, and save it to `/extracted_audio/(film_name)/`.
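To make the memory concern concrete, here is a rough back-of-the-envelope estimate of uncompressed audio sizes. The sample rate, bit depth, and 100-minute running time are assumptions for illustration, not values read from the actual file, and `pcm_wav_bytes` is a hypothetical helper. The 154-second duration is that of the scene extracted later in this notebook.

```python
def pcm_wav_bytes(seconds, sample_rate=44100, channels=2, bytes_per_sample=2):
    """Approximate size of uncompressed PCM audio, ignoring the small WAV header."""
    return int(seconds * sample_rate * channels * bytes_per_sample)

full_film = pcm_wav_bytes(100 * 60)  # assumed running time of about 100 minutes
one_scene = pcm_wav_bytes(154)       # a 154-second scene

print(f"full film: {full_film / 1e6:.0f} MB")
print(f"one scene: {one_scene / 1e6:.0f} MB")
```

Under these assumptions the full track is on the order of a gigabyte, while a single scene is a few tens of megabytes.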

We'll use the ffmpeg-python library, which is a Python wrapper for the ffmpeg suite of audio/video tools.

In [1]:
import os
import sys
import ffmpeg
sys.path.append('../unifying_features')
sys.path.append('../data_serialization')
from serialization_preprocessing_io import *
from scene_identification_io import *

We'll start by identifying the film's scenes.

In [2]:
film = 'lost_in_translation_2003'
srt_df, subtitle_df, sentence_df, vision_df, face_df = read_pickle(film)
scene_dictionaries = generate_scenes(vision_df, face_df, substantial_minimum=4, anchor_search=6)

In [3]:
scene_dict = scene_dictionaries[1]
scene_dict

{'scene_id': 1,
 'first_frame': 1929,
 'last_frame': 2082,
 'scene_duration': 154,
 'left_anchor_shot_cluster': 207,
 'left_anchor_face_cluster': 6.0,
 'matching_left_face_clusters': [],
 'right_anchor_shot_cluster': 66,
 'right_anchor_face_cluster': 17.0,
 'matching_right_face_clusters': [18.0, 12.0],
 'cutaway_shot_clusters': [7, 29, 142]}

The input file is the film's entire audio track, located in `/input_audio/`. We pass its path to `ffmpeg.input()` to create `input_stream`.

In [4]:
input_audio_file = os.path.join('../input_audio', film + '.wav')
input_audio_file

'../input_audio/lost_in_translation_2003.wav'

In [5]:
input_stream = ffmpeg.input(input_audio_file)

We'll extract audio for the scene selected above (`scene_id` 1) and name the file after its first and last frame numbers, in a folder named for the film in `/extracted_audio/`.

In [6]:
first = str(scene_dict['first_frame'])
last = str(scene_dict['last_frame'])

extracted_file_name = os.path.join('../extracted_audio', film, first + '_' + last + '.wav')
extracted_file_name

'../extracted_audio/lost_in_translation_2003/1929_2082.wav'
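ffmpeg does not create missing output folders, so the film's folder under `/extracted_audio/` needs to exist before the extraction runs. Below is a minimal sketch of one way to guarantee that; `ensure_parent_dir` is a hypothetical helper, not part of this project's code, and the demonstration uses a throwaway temporary directory rather than the real output location.

```python
import os
import tempfile

def ensure_parent_dir(file_path):
    """Create the folder that will hold file_path if it doesn't exist yet."""
    parent = os.path.dirname(file_path)
    if parent:  # an empty string means the current directory, nothing to create
        os.makedirs(parent, exist_ok=True)
    return parent

# Demonstration in a throwaway temporary directory
demo_root = tempfile.mkdtemp()
demo_file = os.path.join(demo_root, 'extracted_audio',
                         'lost_in_translation_2003', '1929_2082.wav')
created = ensure_parent_dir(demo_file)
print(os.path.isdir(created))
```

In this notebook, the equivalent call would be `ensure_parent_dir(extracted_file_name)`. Calling it again is harmless because of `exist_ok=True`.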

Once the input and output files are designated, we can begin the extraction. ffmpeg is typically run from the command line, where arguments supplied alongside the input and output files specify exactly how the input should be manipulated or converted. Luckily, ffmpeg-python lets us pass the same arguments from Python. Below is the bash command we would normally use on the command line.

In [7]:
# bash command-line command
# ffmpeg -ss 1929 -i lost_in_translation_2003.wav -ac 2 -t 154 lost_in_translation_2003/1929_2082.wav

Here's a breakdown of those arguments:
- `-ss 1929`: start position of 1929 seconds (the scene's beginning)
- `-ac 2`: two-channel (stereo) audio, necessary for processing by pyAudioAnalysis
- `-t 154`: duration of 154 seconds (the scene's duration)
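As an illustration of how these flags fit together, the sketch below assembles the same command as a Python argument list. `build_extract_command` is a hypothetical helper, not part of this project or of ffmpeg-python. Its `fps` parameter is an assumption: the default of 1 treats frame numbers as seconds, which is what the command above does.

```python
def build_extract_command(input_file, output_file, first_frame, duration,
                          fps=1.0, channels=2):
    """Build an ffmpeg argument list that cuts one scene out of a longer audio file."""
    start_seconds = first_frame / fps   # frame number -> start position in seconds
    duration_seconds = duration / fps   # frame count -> duration in seconds
    return [
        'ffmpeg',
        '-ss', f'{start_seconds:.3f}',    # where the scene begins
        '-i', input_file,                 # the full-film audio track
        '-ac', str(channels),             # number of output channels
        '-t', f'{duration_seconds:.3f}',  # how much audio to keep
        output_file,
    ]

cmd = build_extract_command(
    '../input_audio/lost_in_translation_2003.wav',
    '../extracted_audio/lost_in_translation_2003/1929_2082.wav',
    first_frame=1929,
    duration=154,
)
print(' '.join(cmd))

# To actually run it (requires ffmpeg on the PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a film name ever contains spaces.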

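The scene-specific arguments (position and duration) come directly from `scene_dict`: `first_frame` supplies the start position and `scene_duration` the length, with frame numbers treated as seconds, as in the command above. Keyword arguments passed to `ffmpeg.output()` are applied as output options, so `ss` lands after `-i` here rather than before it as in the bash command; the same stretch of audio is selected either way.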

In [8]:
out = ffmpeg.output(input_stream, extracted_file_name, ss=scene_dict['first_frame'], ac=2, t=scene_dict['scene_duration'])

In [9]:
ffmpeg.run(out)

(None, None)

We now have a .wav file in `/extracted_audio/lost_in_translation_2003/` titled `1929_2082.wav`, containing just the two-channel audio of the selected scene. This process is available as the function `extract_scene_audio()` in `audio_processing.py`.
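A quick sanity check on an extracted file is to read back its channel count and duration with Python's built-in `wave` module. This is a minimal sketch: `wav_summary` is a hypothetical helper, and it assumes the file holds plain PCM audio, which is the only kind `wave` can read. So that the snippet runs without the film's audio, the demonstration writes a one-second silent stereo file to a temporary directory.

```python
import os
import tempfile
import wave

def wav_summary(path):
    """Return (channels, duration_in_seconds) for a PCM .wav file."""
    with wave.open(path, 'rb') as wav_file:
        channels = wav_file.getnchannels()
        duration = wav_file.getnframes() / wav_file.getframerate()
    return channels, duration

# Synthetic one-second, two-channel, 16-bit silent file for demonstration
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.wav')
with wave.open(demo_path, 'wb') as wav_file:
    wav_file.setnchannels(2)
    wav_file.setsampwidth(2)      # 2 bytes per sample = 16-bit
    wav_file.setframerate(8000)
    wav_file.writeframes(b'\x00' * (8000 * 2 * 2))  # 8000 frames of silence

print(wav_summary(demo_path))
```

For the scene extracted here, `wav_summary(extracted_file_name)` would be expected to report 2 channels and a duration of roughly 154 seconds.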