## Convert video to text with a speech-to-text model and a sentence embedding model

In this notebook, we extract information from video/audio files with the [Whisper model](https://github.com/openai/whisper). Because Whisper supports multiple languages, we can extract transcripts from video files in different languages, even when a single video mixes several languages. We provide the following options for Whisper inference:
- Batch inference with a SageMaker Processing job: process large volumes of data and store the results in a vector database for a RAG solution.
- Real-time inference with a SageMaker endpoint: summarize or run QA on a short video/audio file (less than 6MB).
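
As a rough illustration of how one might pick between the two options, the sketch below routes a file by its size. It assumes the 6MB limit means 6 * 1024 * 1024 bytes; the helper names `choose_inference_mode` and `choose_inference_mode_for_file` are hypothetical and not part of this notebook.

```python
import os

# Assumption: "less than 6MB" is interpreted as 6 * 1024 * 1024 bytes.
REALTIME_LIMIT_BYTES = 6 * 1024 * 1024


def choose_inference_mode(size_bytes: int) -> str:
    """Return 'realtime' for payloads under the limit, otherwise 'batch'."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "realtime" if size_bytes < REALTIME_LIMIT_BYTES else "batch"


def choose_inference_mode_for_file(path: str) -> str:
    """Hypothetical convenience wrapper that looks up the file size on disk."""
    return choose_inference_mode(os.path.getsize(path))
```

For example, `choose_inference_mode_for_file("test_raw_data/test.webm")` would suggest whether that file is small enough for the endpoint path.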

In [None]:
!pip install -U sagemaker -q

## Set up

In [1]:
from sagemaker.huggingface import HuggingFaceProcessor
from sagemaker import get_execution_role
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.huggingface import HuggingFaceModel
import sagemaker
import boto3
import json

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

sess = sagemaker.session.Session()
bucket = sess.default_bucket()
prefix = "sagemaker/rag_video"
folder_name = "genai_workshop"
s3_input = f"s3://{bucket}/{prefix}/raw_data/{folder_name}" # Directory for video files
s3_output_clips = f"s3://{bucket}/{prefix}/clips" # Directory for video clips
s3_output_transcript = f"s3://{bucket}/{prefix}/transcript" # Directory for transcripts

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


In [2]:
%store s3_output_transcript

Stored 's3_output_transcript' (str)


## Upload test data to S3 bucket

Download data from YouTube.

In [3]:
# Download data from YouTube
!pip install pytube



In [4]:
from pytube import YouTube

VIDEO_SAVE_DIRECTORY = "./videos"
AUDIO_SAVE_DIRECTORY = "./audio"
video_name = "genai_interview.mp4"
def download(video_url):
    # Pick the highest-resolution progressive stream (video + audio)
    video = YouTube(video_url)
    video = video.streams.get_highest_resolution()

    try:
        video.download(VIDEO_SAVE_DIRECTORY, filename=video_name)
    except Exception as e:
        print(f"Failed to download video: {e}")
        return

    print("video was downloaded successfully")

def download_audio(video_url):
    # Pick the first audio-only stream
    video = YouTube(video_url)
    audio = video.streams.filter(only_audio=True).first()

    try:
        audio.download(AUDIO_SAVE_DIRECTORY)
    except Exception as e:
        print(f"Failed to download audio: {e}")
        return

    print("audio was downloaded successfully")

In [5]:
# JAWS-UG AI/ML (Japanese) #16 Generative AI: https://www.youtube.com/watch?v=PkZenNAXtYs
# New York Summit 2023 AIML: https://www.youtube.com/watch?v=1PkABWCJINM (36 minutes in total)

In [6]:
download("https://www.youtube.com/watch?v=dBzCGcwYCJo")

video was downloaded successfully


In [7]:
!aws s3 cp videos/{video_name} {s3_input}/

upload: videos/genai_interview.mp4 to s3://sagemaker-us-east-1-822507008821/sagemaker/rag_video/raw_data/genai_workshop/genai_interview.mp4


## Batch inference with SageMaker Processing

In [8]:
hfp = HuggingFaceProcessor(
    role=get_execution_role(), 
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    transformers_version='4.28.1',
    pytorch_version='2.0.0', 
    base_job_name='frameworkprocessor-hf',
    py_version="py310"
)

In [9]:
hfp.run(
    code='preprocessing.py',
    source_dir="data_preparation",
    inputs=[
        ProcessingInput(source=s3_input, destination="/opt/ml/processing/input")
    ], 
    outputs=[
        ProcessingOutput(source='/opt/ml/processing/output_clips', destination=s3_output_clips),
        ProcessingOutput(source='/opt/ml/processing/transcripts', destination=s3_output_transcript),
    ],
    arguments=[
        "--whisper-model", "whisper-large-v2",
        "--target-language", "en",
        "--sentence-embedding-model", "all-mpnet-base-v2",
        "--clips_s3uri", s3_output_clips,
        "--transcripts_s3uri", s3_output_transcript,
        "--order", "5"
    ]
)

INFO:sagemaker:Creating processing-job with name frameworkprocessor-hf-2024-03-10-03-43-32-911


Collecting git+https://github.com/openai/whisper.git (from -r requirements.txt (line 3))
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-nc6px2l6
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-nc6px2l6
  Resolved https://github.com/openai/whisper.git to commit ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting tiktoken (from -r requirements.txt (line 1))
  Downloading tiktoken-0.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting moviepy (from -r requirements.txt (line 2))
  Downloading moviepy-1.0.3.tar.
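
The `preprocessing.py` script is not shown in this notebook. As a hedged sketch only, the arguments passed to `hfp.run` above could be received with `argparse` roughly as follows. The defaults, types, and the `parse_args` name are assumptions; in particular, the meaning of `--order` is not stated here, so it is simply parsed as an integer.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser mirroring the flags passed to the processing job."""
    parser = argparse.ArgumentParser(
        description="Sketch of argument parsing for a preprocessing script"
    )
    # Hyphenated flags become attributes with underscores (e.g. args.whisper_model).
    parser.add_argument("--whisper-model", type=str, default="whisper-large-v2")
    parser.add_argument("--target-language", type=str, default="en")
    parser.add_argument("--sentence-embedding-model", type=str, default="all-mpnet-base-v2")
    # These two flags use underscores in the notebook, so they are kept as-is.
    parser.add_argument("--clips_s3uri", type=str, required=True)
    parser.add_argument("--transcripts_s3uri", type=str, required=True)
    # Assumption: treated as an integer; its semantics are defined by the real script.
    parser.add_argument("--order", type=int, default=5)
    return parser.parse_args(argv)
```

Note that the flag names mix hyphens and underscores, so the names given in `arguments=[...]` have to match the script's parser exactly.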

In [11]:
!mkdir -p video-scripts
!aws s3 sync $s3_output_transcript/ video-scripts

download: s3://sagemaker-us-east-1-822507008821/sagemaker/rag_video/transcript/genai_interview/0.txt to video-scripts/genai_interview/0.txt
download: s3://sagemaker-us-east-1-822507008821/sagemaker/rag_video/transcript/genai_interview/5.txt to video-scripts/genai_interview/5.txt
download: s3://sagemaker-us-east-1-822507008821/sagemaker/rag_video/transcript/genai_interview/1.txt to video-scripts/genai_interview/1.txt
download: s3://sagemaker-us-east-1-822507008821/sagemaker/rag_video/transcript/genai_interview/2.txt to video-scripts/genai_interview/2.txt
download: s3://sagemaker-us-east-1-822507008821/sagemaker/rag_video/transcript/genai_interview/4.txt to video-scripts/genai_interview/4.txt
download: s3://sagemaker-us-east-1-822507008821/sagemaker/rag_video/transcript/genai_interview/6.txt to video-scripts/genai_interview/6.txt
download: s3://sagemaker-us-east-1-822507008821/sagemaker/rag_video/transcript/genai_interview/genai_interview_0_20.24.txt to video-scripts/genai_interview/gena
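
Once the transcripts are synced locally, they can be read back in order. The sketch below is a minimal example that assumes the chunk files are the ones named `<integer>.txt` (as in the listing above) and skips any other file names; `load_numbered_chunks` is a hypothetical helper, not part of the processing code.

```python
from pathlib import Path


def load_numbered_chunks(directory):
    """Return [(index, text), ...] for files named <integer>.txt, sorted by index."""
    chunks = []
    for path in Path(directory).glob("*.txt"):
        # Keep only purely numeric stems such as "0" or "12"; other files are ignored.
        if path.stem.isdigit():
            chunks.append((int(path.stem), path.read_text(encoding="utf-8")))
    # Sort numerically so that 10.txt comes after 2.txt.
    return sorted(chunks)
```

For instance, `load_numbered_chunks("video-scripts/genai_interview")` would return the numbered transcript chunks in numeric order.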

## Summarization and tagging of chunks with an LLM

In [12]:
from sagemaker.sklearn.processing import SKLearnProcessor

sklearn_processor = SKLearnProcessor(
    framework_version='1.2-1',
    role=get_execution_role(),
    instance_type='ml.m5.xlarge',
    instance_count=1,
    #python_version="py310",
    base_job_name='summarization-bedrock',
)

INFO:sagemaker.image_uris:Defaulting to only available Python version: py3


In [17]:
summary_s3uri = f"s3://{bucket}/{prefix}/summaries"

hfp.run(
    code='summarization.py',
    source_dir="summarization",
    inputs=[
        ProcessingInput(source=s3_output_transcript, destination="/opt/ml/processing/input/transcripts/")
    ], 
    outputs=[
        ProcessingOutput(source='/opt/ml/processing/summaries', destination=summary_s3uri),
    ],
    arguments=[
        "--model-id", "anthropic.claude-v2:1"
    ]
)

INFO:sagemaker.processing:Uploaded summarization to s3://sagemaker-us-east-1-822507008821/frameworkprocessor-hf-2024-03-10-11-25-27-191/source/sourcedir.tar.gz
INFO:sagemaker.processing:runproc.sh uploaded to s3://sagemaker-us-east-1-822507008821/frameworkprocessor-hf-2024-03-10-11-25-27-191/source/runproc.sh
INFO:sagemaker:Creating processing-job with name frameworkprocessor-hf-2024-03-10-11-25-27-191


Collecting langchain (from -r requirements.txt (line 1))
  Downloading langchain-0.1.11-py3-none-any.whl.metadata (13 kB)
Collecting unstructured (from -r requirements.txt (line 2))
  Downloading unstructured-0.12.6-py3-none-any.whl.metadata (83 kB)
Collecting tiktoken (from -r requirements.txt (line 3))
  Downloading tiktoken-0.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting anthropic (from -r requirements.txt (line 4))
  Downloading anthropic-0.19.1-py3-none-any.whl.metadata (15 kB)
Collecting SQLAlchemy<3,>=1.4 (from langchain->-r requirements.txt (line 1))
  Downloading SQLAlchemy-2.0.28-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.6 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain->-r requirements.txt (line 1))
  Downloading dataclasses_json-0.6.4-py3-none-any.whl.metadata 

In [18]:
!aws s3 ls $summary_s3uri/

2024-03-10 11:40:22      17570 metadata.json


In [19]:
!aws s3 cp $summary_s3uri/metadata.json .

download: s3://sagemaker-us-east-1-822507008821/sagemaker/rag_video/summaries/metadata.json to ./metadata.json
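
The schema of `metadata.json` is not described in this notebook, so a cautious first step is to inspect its top-level structure without assuming any field names. The `describe_json` helper below is a hypothetical sketch for that purpose.

```python
import json


def describe_json(path):
    """Summarize the top-level structure of a JSON file without assuming a schema."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        # Show at most five keys, sorted, to get a feel for the layout.
        return {"type": "dict", "size": len(data), "keys": sorted(data)[:5]}
    if isinstance(data, list):
        return {"type": "list", "size": len(data)}
    return {"type": type(data).__name__}
```

Calling `describe_json("metadata.json")` after the download above gives a quick overview before writing code that depends on specific fields.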


## Deploy Whisper model to SageMaker for real-time inference

In [None]:
endpoint_name = "whisper-large-v2"
# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'openai/whisper-large-v2',
    'HF_TASK': 'automatic-speech-recognition',
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
    env=hub,
    role=role
)

In [None]:
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    endpoint_name=endpoint_name,
    initial_instance_count=1, # number of instances
    instance_type='ml.g5.xlarge' # ec2 instance type
)

In [None]:
client = boto3.client('runtime.sagemaker')
file = "test_raw_data/test.webm"
with open(file, "rb") as f:
    data = f.read()

In [None]:
response = client.invoke_endpoint(EndpointName=endpoint_name, ContentType='audio/x-audio', Body=data)
output = json.loads(response['Body'].read())
print(f"Extracted text from the audio file:\n {output['text']}")
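
The invocation above can be wrapped in a small function so the response handling is reusable and can be exercised without a live endpoint. This is a sketch under assumptions: `transcribe_bytes` and `FakeRuntimeClient` are hypothetical names, and the fake client only imitates the `{"text": ...}` response shape used in the cell above.

```python
import io
import json


def transcribe_bytes(client, endpoint_name, audio_bytes, content_type="audio/x-audio"):
    """Send raw audio bytes to an endpoint and return the 'text' field of the reply."""
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=audio_bytes,
    )
    payload = json.loads(response["Body"].read())
    if "text" not in payload:
        raise KeyError("endpoint response has no 'text' field")
    return payload["text"]


class FakeRuntimeClient:
    """Offline stand-in for a SageMaker runtime client; for local testing only."""

    def invoke_endpoint(self, **kwargs):
        body = json.dumps({"text": "hello world"}).encode("utf-8")
        return {"Body": io.BytesIO(body)}
```

With the real client from the cells above, the call would look like `transcribe_bytes(client, endpoint_name, data)`.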

You can follow the section `Example - Build a multi-functional chatbot with Amazon SageMaker` in the [README](./README.md) to build a multi-functional chatbot with the Whisper endpoint.
<span style="color: red">Please delete the endpoint once you no longer need it.</span>

In [None]:
predictor.delete_endpoint()