# Project

In this Project, you will bring together many of the tools and techniques that you have learned throughout this course into a final project. You can choose from many different paths to get to the solution. 

### Business scenario

You work for a training organization that recently developed an introductory course about machine learning (ML). The course includes more than 40 videos that cover a broad range of ML topics. You have been asked to create an application that will students can use to quickly locate and view video content by searching for topics and key phrases.

You have downloaded all of the videos to an Amazon Simple Storage Service (Amazon S3) bucket. Your assignment is to produce a dashboard that meets your supervisor’s requirements.

## Project steps

To complete this project, you will follow these steps:

1. [Viewing the video files](#1.-Viewing-the-video-files)
2. [Transcribing the videos](#2.-Transcribing-the-videos)
3. [Normalizing the text](#3.-Normalizing-the-text)
4. [Extracting key phrases and topics](#4.-Extracting-key-phrases-and-topics)
5. [Creating the dashboard](#5.-Creating-the-dashboard)

## Useful information

The following cell contains some information that might be useful as you complete this project.

In [1]:
bucket = "c56161a939430l3396553t1w744137092661-labbucket-rn642jaq01e9"
job_data_access_role = 'arn:aws:iam::744137092661:role/service-role/c56161a939430l3396553t1w7-ComprehendDataAccessRole-1P24MSS91ADHP'

## 1. Viewing the video files
([Go to top](#Capstone-8:-Bringing-It-All-Together))


The source video files are located in the following shared Amazon Simple Storage Service (Amazon S3) bucket.

In [2]:
!aws s3 ls s3://aws-tc-largeobjects/CUR-TF-200-ACMNLP-1/video/

2021-04-26 20:17:33  410925369 Mod01_Course Overview.mp4
2021-04-26 20:10:02   39576695 Mod02_Intro.mp4
2021-04-26 20:31:23  302994828 Mod02_Sect01.mp4
2021-04-26 20:17:33  416563881 Mod02_Sect02.mp4
2021-04-26 20:17:33  318685583 Mod02_Sect03.mp4
2021-04-26 20:17:33  255877251 Mod02_Sect04.mp4
2021-04-26 20:23:51   99988046 Mod02_Sect05.mp4
2021-04-26 20:24:54   50700224 Mod02_WrapUp.mp4
2021-04-26 20:26:27   60627667 Mod03_Intro.mp4
2021-04-26 20:26:28  272229844 Mod03_Sect01.mp4
2021-04-26 20:27:06  309127124 Mod03_Sect02_part1.mp4
2021-04-26 20:27:06  195635527 Mod03_Sect02_part2.mp4
2021-04-26 20:28:03  123924818 Mod03_Sect02_part3.mp4
2021-04-26 20:31:28  171681915 Mod03_Sect03_part1.mp4
2021-04-26 20:32:07  285200083 Mod03_Sect03_part2.mp4
2021-04-26 20:33:17  105470345 Mod03_Sect03_part3.mp4
2021-04-26 20:35:10  157185651 Mod03_Sect04_part1.mp4
2021-04-26 20:36:27  187435635 Mod03_Sect04_part2.mp4
2021-04-26 20:36:40  280720369 Mod03_Sect04_part3.mp4
2021-04-26 20:40:01  443479

In [3]:
#Copying videos to S3 Bucket
!aws s3 sync s3://aws-tc-largeobjects/CUR-TF-200-ACMNLP-1/video/ s3://nlp-project-w2024/

copy: s3://aws-tc-largeobjects/CUR-TF-200-ACMNLP-1/video/Mod02_Sect01.mp4 to s3://nlp-project-w2024/Mod02_Sect01.mp4
copy: s3://aws-tc-largeobjects/CUR-TF-200-ACMNLP-1/video/Mod02_Sect02.mp4 to s3://nlp-project-w2024/Mod02_Sect02.mp4
copy: s3://aws-tc-largeobjects/CUR-TF-200-ACMNLP-1/video/Mod02_Intro.mp4 to s3://nlp-project-w2024/Mod02_Intro.mp4
copy: s3://aws-tc-largeobjects/CUR-TF-200-ACMNLP-1/video/Mod02_WrapUp.mp4 to s3://nlp-project-w2024/Mod02_WrapUp.mp4
copy: s3://aws-tc-largeobjects/CUR-TF-200-ACMNLP-1/video/Mod03_Intro.mp4 to s3://nlp-project-w2024/Mod03_Intro.mp4
copy: s3://aws-tc-largeobjects/CUR-TF-200-ACMNLP-1/video/Mod02_Sect03.mp4 to s3://nlp-project-w2024/Mod02_Sect03.mp4
copy: s3://aws-tc-largeobjects/CUR-TF-200-ACMNLP-1/video/Mod03_Sect02_part1.mp4 to s3://nlp-project-w2024/Mod03_Sect02_part1.mp4
copy: s3://aws-tc-largeobjects/CUR-TF-200-ACMNLP-1/video/Mod03_Sect01.mp4 to s3://nlp-project-w2024/Mod03_Sect01.mp4
copy: s3://aws-tc-largeobjects/CUR-TF-200-ACMNLP-1/video

In [4]:
#Filtering only the videos to different folder
!aws s3 sync s3://nlp-project-w2024/ s3://nlp-project-w2024/videos/ --exclude "*" --include "*.mp4"

copy: s3://nlp-project-w2024/Mod02_Sect01.mp4 to s3://nlp-project-w2024/videos/Mod02_Sect01.mp4
copy: s3://nlp-project-w2024/Mod02_Sect02.mp4 to s3://nlp-project-w2024/videos/Mod02_Sect02.mp4
copy: s3://nlp-project-w2024/Mod02_Sect05.mp4 to s3://nlp-project-w2024/videos/Mod02_Sect05.mp4
copy: s3://nlp-project-w2024/Mod02_Intro.mp4 to s3://nlp-project-w2024/videos/Mod02_Intro.mp4
copy: s3://nlp-project-w2024/Mod03_Intro.mp4 to s3://nlp-project-w2024/videos/Mod03_Intro.mp4
copy: s3://nlp-project-w2024/Mod03_Sect01.mp4 to s3://nlp-project-w2024/videos/Mod03_Sect01.mp4
copy: s3://nlp-project-w2024/Mod03_Sect02_part1.mp4 to s3://nlp-project-w2024/videos/Mod03_Sect02_part1.mp4
copy: s3://nlp-project-w2024/Mod03_Sect02_part2.mp4 to s3://nlp-project-w2024/videos/Mod03_Sect02_part2.mp4
copy: s3://nlp-project-w2024/Mod03_Sect02_part3.mp4 to s3://nlp-project-w2024/videos/Mod03_Sect02_part3.mp4
copy: s3://nlp-project-w2024/Mod02_Sect04.mp4 to s3://nlp-project-w2024/videos/Mod02_Sect04.mp4
copy: s3

## 2. Transcribing the videos
 ([Go to top](#Capstone-8:-Bringing-It-All-Together))

Use this section to implement your solution to transcribe the videos. 

In [2]:
# Write your answer/code here
!pip install boto3



In [3]:
!pip install moviepy

Collecting moviepy
  Downloading moviepy-1.0.3.tar.gz (388 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m388.3/388.3 kB[0m [31m21.9 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25ldone
[?25hCollecting decorator<5.0,>=4.0.2 (from moviepy)
  Downloading decorator-4.4.2-py2.py3-none-any.whl.metadata (4.2 kB)
Collecting proglog<=1.0.0 (from moviepy)
  Downloading proglog-0.1.10-py3-none-any.whl.metadata (639 bytes)
Collecting imageio_ffmpeg>=0.2.0 (from moviepy)
  Downloading imageio_ffmpeg-0.4.9-py3-none-manylinux2010_x86_64.whl.metadata (1.7 kB)
Downloading decorator-4.4.2-py2.py3-none-any.whl (9.2 kB)
Downloading imageio_ffmpeg-0.4.9-py3-none-manylinux2010_x86_64.whl (26.9 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m26.9/26.9 MB[0m [31m23.0 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[?25hDownloading proglog-0.1.10-py3-none-any.whl (6.1 kB)
Building wheels for collected packages: moviepy
  Building wheel 

In [7]:
import os
import boto3

bucket_name = 'nlp-project-w2024'

bucket_prefix = 'videos/'

s3_client = boto3.client('s3')
response = s3_client.list_objects_v2(Bucket=bucket_name)
if 'Contents' in response:
   
    print("Downloading video files from the bucket...")

    destination_folder = 'downloaded_videos'
    if not os.path.exists(destination_folder):
        os.makedirs(destination_folder)

    # Download and save videos to the destination folder
    for obj in response['Contents']:
    
        object_key = obj['Key']

        # Check if the object is a video file
        if object_key.endswith('.mp4'):
            
            local_video_file = os.path.join(destination_folder, os.path.basename(object_key))
            s3_client.download_file(bucket_name, object_key, local_video_file)
            print(f"Downloaded: {object_key} -> {local_video_file}")

    print("Download completed.")
else:
    print("No video files found in the bucket.")


Downloading video files from the bucket...
Downloaded: Mod01_Course Overview.mp4 -> downloaded_videos/Mod01_Course Overview.mp4
Downloaded: Mod02_Intro.mp4 -> downloaded_videos/Mod02_Intro.mp4
Downloaded: Mod02_Sect01.mp4 -> downloaded_videos/Mod02_Sect01.mp4
Downloaded: Mod02_Sect02.mp4 -> downloaded_videos/Mod02_Sect02.mp4
Downloaded: Mod02_Sect03.mp4 -> downloaded_videos/Mod02_Sect03.mp4
Downloaded: Mod02_Sect04.mp4 -> downloaded_videos/Mod02_Sect04.mp4
Downloaded: Mod02_Sect05.mp4 -> downloaded_videos/Mod02_Sect05.mp4
Downloaded: Mod02_WrapUp.mp4 -> downloaded_videos/Mod02_WrapUp.mp4
Downloaded: Mod03_Intro.mp4 -> downloaded_videos/Mod03_Intro.mp4
Downloaded: Mod03_Sect01.mp4 -> downloaded_videos/Mod03_Sect01.mp4
Downloaded: Mod03_Sect02_part1.mp4 -> downloaded_videos/Mod03_Sect02_part1.mp4
Downloaded: Mod03_Sect02_part2.mp4 -> downloaded_videos/Mod03_Sect02_part2.mp4
Downloaded: Mod03_Sect02_part3.mp4 -> downloaded_videos/Mod03_Sect02_part3.mp4
Downloaded: Mod03_Sect03_part1.mp4 -

In [10]:
import os
from moviepy.editor import VideoFileClip

# Directory containing downloaded video files
downloaded_videos_directory = "downloaded_videos"

# Directory to save converted audio files
converted_audios_directory = "converted_audios"

# Create the directory if it doesn't exist
if not os.path.exists(converted_audios_directory):
    os.makedirs(converted_audios_directory)

# Iterate through downloaded video files
for root, dirs, files in os.walk(downloaded_videos_directory):
    for file in files:
        
        if file.endswith(".mp4"):
            
            video_path = os.path.join(root, file)
            
            
            video_clip = VideoFileClip(video_path)
            
            audio_clip = video_clip.audio
            
            
            output_audio_path = os.path.join(converted_audios_directory, os.path.splitext(file)[0] + ".wav")
            
            audio_clip.write_audiofile(output_audio_path, codec='mp3')  # Specify the codec as mp3
            
            video_clip.close()

print("Conversion of video files to audio files completed.")


MoviePy - Writing audio in converted_audios/Mod05_Sect02_part2.wav


                                                                        

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_Sect08.wav


                                                                       

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_Sect03_part1.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod06_Sect02.wav


                                                                        

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod07_Sect01.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_Sect07_part1.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod05_Sect03_part3.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod04_Sect02_part3.wav


                                                                        

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_Sect02_part1.wav


                                                                        

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod04_Intro.wav


                                                                     

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod05_Sect01_ver2.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_Sect04_part3.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_Sect03_part3.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_Sect01.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod04_Sect02_part2.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_Intro.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod06_Sect01.wav


                                                                        

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_Sect07_part2.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_Sect04_part2.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod02_Sect05.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod02_Sect01.wav


                                                                        

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_Sect02_part3.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_Sect02_part2.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod02_Intro.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod01_Course Overview.wav


                                                                        

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_Sect03_part2.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_Sect05.wav


                                                                        

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod02_Sect02.wav


                                                                        

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod05_WrapUp_ver2.wav


                                                                     

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_Sect04_part1.wav


                                                                     

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod04_Sect01.wav


                                                                     

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod05_Sect02_part1_ver2.wav


                                                                        

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod02_Sect03.wav


                                                                       

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_WrapUp.wav


                                                                     

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod04_WrapUp.wav


                                                                    

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod05_Sect03_part1.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod06_WrapUp.wav


                                                                    

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_Sect06.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod02_Sect04.wav


                                                                     

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod02_WrapUp.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod03_Sect07_part3.wav


                                                                        

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod05_Intro.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod04_Sect02_part1.wav


                                                                       

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod05_Sect03_part4_ver2.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod05_Sect03_part2.wav


                                                                      

MoviePy - Done.
MoviePy - Writing audio in converted_audios/Mod06_Intro.wav


                                                                      

MoviePy - Done.
Conversion of video files to audio files completed.




In [9]:
!pip install pydub

Collecting pydub
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Installing collected packages: pydub
Successfully installed pydub-0.25.1


In [11]:
!pip install google-cloud-speech



In [21]:
!pip install SpeechRecognition

Collecting SpeechRecognition
  Downloading SpeechRecognition-3.10.3-py2.py3-none-any.whl.metadata (29 kB)
Downloading SpeechRecognition-3.10.3-py2.py3-none-any.whl (32.8 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m32.8/32.8 MB[0m [31m18.8 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[?25hInstalling collected packages: SpeechRecognition
Successfully installed SpeechRecognition-3.10.3


In [14]:
!conda install -c conda-forge ffmpeg -y
!ffmpeg -version

Retrieving notices: ...working... done
Collecting package metadata (current_repodata.json): done
Solving environment: - 
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - conda-forge/noarch::autopep8==2.0.4=pyhd8ed1ab_0
  - conda-forge/linux-64::black==24.2.0=py310hff52083_0
  - conda-forge/noarch::bleach==6.1.0=pyhd8ed1ab_0
  - conda-forge/noarch::plotly==5.19.0=pyhd8ed1ab_0
  - conda-forge/noarch::pytest==8.0.1=pyhd8ed1ab_1
  - conda-forge/noarch::qtpy==2.4.1=pyhd8ed1ab_0
  - conda-forge/linux-64::sip==6.7.12=py310hc6cd4ac_0
  - conda-forge/noarch::tqdm==4.66.2=pyhd8ed1ab_0
  - conda-forge/noarch::flask==3.0.2=pyhd8ed1ab_0
  - conda-forge/noarch::importlib_metadata==7.0.1=hd8ed1ab_0
  - conda-forge/noarch::nltk==3.8.1=pyhd8ed1ab_0
  - conda-forge/linux-64::pyqt5-sip==12.12.2=py310hc6cd4ac_5
  - conda-forge/noarch::pytoolconfig==1.2.5=pyhd8ed1ab_0
  - conda-forge/noarch::qdarkstyle==3.1=pyhd8ed1ab_0
  - 

In [15]:
from moviepy.editor import AudioFileClip

def get_audio_duration(audio_file_path):
    """
    Get the duration of an audio file.
    
    Parameters:
    - audio_file_path (str): Path to the audio file.
    
    Returns:
    - duration (float): Duration of the audio file in seconds.
    """
    # Load the audio file clip
    audio_clip = AudioFileClip(audio_file_path)
    
    # Get the duration of the audio clip
    duration = audio_clip.duration
    
    # Close the audio clip
    audio_clip.close()
    
    return duration


In [16]:
audio_file_path = "converted_audios/Mod02_Intro.wav"
duration = get_audio_duration(audio_file_path)
print("Duration of the audio file:", duration, "seconds")


Duration of the audio file: 60.94 seconds


In [17]:
#Function to extract a segment of audio from whole audio file

from pydub import AudioSegment

def extract_audio_segment(audio_file, start_time, end_time):
    # Load the audio file
    audio = AudioSegment.from_file(audio_file)

    # Extract the desired segment
    audio_segment = audio[start_time * 1000:end_time * 1000]  # Convert time to milliseconds

    return audio_segment

# # Example usage: Extract audio segment from 10 to 20 seconds
audio_file = "converted_audios/Mod02_Sect02.wav"
start_time = 120 
end_time = 300    
audio_segment = extract_audio_segment(audio_file, start_time, end_time)


In [None]:
import os

audio_directory = "converted_audios"
output_directory = "transcribed_texts"

# Create the output directory if it doesn't exist
os.makedirs(output_directory, exist_ok=True)

def transcribe_all_files():
    for audio_file in os.listdir(audio_directory):
        audio_name = os.path.splitext(audio_file)[0]
        output_text = os.path.join(output_directory, f"{audio_name}.txt")
        transcribe_single_file(audio_file, output_text)

def transcribe_single_file(audio_file, output_text):
    duration = get_audio_duration(os.path.join(audio_directory, audio_file))
    duration_used = math.floor(duration)
    segment_duration = 120  # Duration of each segment in seconds

    for start_duration in range(0, duration_used, segment_duration):
        end_duration = min(start_duration + segment_duration, duration_used)
        
        audio_segment = extract_audio_segment(os.path.join(audio_directory, audio_file), start_duration, end_duration)
        
        output_file = f"output_segment_{start_duration}-{end_duration}.wav"
        audio_segment.export(output_file, format="wav")
        
        # Initialize recognizer 
        r = sr.Recognizer()

        # Load the audio file 
        with sr.AudioFile(output_file) as source:
            data = r.record(source)

        # Perform speech recognition
        text = r.recognize_google(data)
        
        # Write the transcription result to the text file
        with open(output_text, 'a') as file:
            file.write(' ' + text)
            
    return 'Transcription Success'

transcribe_all_files()


## 3. Normalizing the text
([Go to top](#Capstone-8:-Bringing-It-All-Together))

Use this section to perform any text normalization steps that are necessary for your solution.

In [19]:
# Write your answer/code here
import os

def read_text_files(directory_path):

    text_list = []
    for filename in os.listdir(directory_path):
    
        if filename.endswith(".txt"):
            file_path = os.path.join(directory_path, filename)
            with open(file_path, 'r') as file:
                text_list.append(file.read())
    return text_list

directory_path = "transcribed_texts"  
text_data = read_text_files(directory_path)


In [20]:
import nltk
from nltk.stem import WordNetLemmatizer
import re
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
nltk.download('stopwords')


[nltk_data] Downloading package punkt to /home/ec2-user/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/ec2-user/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package wordnet to /home/ec2-user/nltk_data...
[nltk_data] Downloading package stopwords to
[nltk_data]     /home/ec2-user/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [21]:
def lowercase_and_strip(text):
    text = text.lower().strip()
    return text

def remove_html_tags(text):
    text = re.compile('<.*?>').sub('', text)
    return text

def replace_mentions_and_hashtags(text):
    # Improved regex to handle usernames with underscores
    text = re.sub(r'@[\w_]+', '', text)
    hashtags = re.findall(r'#(\w+)', text)
    text = re.sub(r'#(\w+)', r'\1', text)
    return text

def remove_links(text):
    # Remove URLs
    text = re.sub(r'http\S+', '', text)
    return text

def replace_punctuation_with_space(text):
    text = re.sub('\s+', ' ', text)
    return text

def remove_special_characters(text):
    text = re.compile('[^A-Za-z0-9@#]+').sub(' ', text)
    return text


def remove_stopwords(text):
    stop_words = set(stopwords.words('english'))
    words = word_tokenize(text)
    filtered_sentence = [w for w in words if w.lower() not in stop_words]
    text = " ".join(filtered_sentence)
    return text

def lemmatize_text(lemmatizer, text):
    # lemmatizer = WordNetLemmatizer()
    words = word_tokenize(text)
    lemmatized_words = [lemmatizer.lemmatize(word) for word in words]
    text = " ".join(lemmatized_words)
    return text

def remove_numericals(text):
    text = re.compile('[^A-Za-z]+').sub(' ', text)
    return text
def remove_short_words(text):
    words = word_tokenize(text)
    filtered_sentence = [w for w in words if len(w)>2]
    text = " ".join(filtered_sentence)
    return text    


normalized_texts = []


lemmatizer = WordNetLemmatizer()
for text in text_data:
    processed_text = lowercase_and_strip(text)
    processed_text = remove_html_tags(processed_text)
    processed_text = replace_mentions_and_hashtags(processed_text)
    processed_text = remove_links(processed_text)
    processed_text = remove_stopwords(processed_text)
    processed_text = remove_special_characters(processed_text)
    processed_text = remove_numericals(processed_text)
    processed_text = replace_punctuation_with_space(processed_text)
    
    processed_text = lemmatize_text(lemmatizer, processed_text)
    processed_text = remove_short_words(processed_text)
    
    normalized_texts.append(processed_text)



In [22]:
import os
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
import re

# Function to preprocess text
def preprocess_text(text):
    processed_text = lowercase_and_strip(text)
    processed_text = remove_html_tags(processed_text)
    processed_text = replace_mentions_and_hashtags(processed_text)
    processed_text = remove_links(processed_text)
    processed_text = remove_stopwords(processed_text)
    processed_text = remove_special_characters(processed_text)
    processed_text = remove_numericals(processed_text)
    processed_text = replace_punctuation_with_space(processed_text)
    processed_text = lemmatize_text(lemmatizer, processed_text)
    processed_text = remove_short_words(processed_text)
    return processed_text


output_dir = "preprocessed_texts"
os.makedirs(output_dir, exist_ok=True)


for filename in os.listdir("transcribed_texts"):
    # Skip directories
    if os.path.isdir(os.path.join("transcribed_texts", filename)):
        continue

    input_file_path = os.path.join("transcribed_texts", filename)
    output_file_path = os.path.join(output_dir, filename)
    
    with open(input_file_path, "r") as file:
        text = file.read()
    
    processed_text = preprocess_text(text)
    
    with open(output_file_path, "w") as file:
        file.write(processed_text)

print("Text preprocessing completed.")



Text preprocessing completed.


## 4. Extracting key phrases and topics
([Go to top](#Capstone-8:-Bringing-It-All-Together))

Use this section to extract the key phrases and topics from the videos.

In [24]:
# Write your answer/code here
!pip install gensim

Collecting gensim
  Downloading gensim-4.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (8.4 kB)
Collecting smart-open>=1.8.1 (from gensim)
  Downloading smart_open-7.0.4-py3-none-any.whl.metadata (23 kB)
Downloading gensim-4.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.5 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m26.5/26.5 MB[0m [31m57.4 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[?25hDownloading smart_open-7.0.4-py3-none-any.whl (61 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m61.2/61.2 kB[0m [31m11.1 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: smart-open, gensim
Successfully installed gensim-4.3.2 smart-open-7.0.4


In [25]:
!pip install yake

Collecting yake
  Downloading yake-0.4.8-py2.py3-none-any.whl.metadata (4.0 kB)
Collecting segtok (from yake)
  Downloading segtok-1.5.11-py3-none-any.whl.metadata (9.0 kB)
Downloading yake-0.4.8-py2.py3-none-any.whl (60 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m60.2/60.2 kB[0m [31m702.4 kB/s[0m eta [36m0:00:00[0m [36m0:00:01[0m
[?25hDownloading segtok-1.5.11-py3-none-any.whl (24 kB)
Installing collected packages: segtok, yake
Successfully installed segtok-1.5.11 yake-0.4.8


In [26]:
import os
import yake

def extract_key_phrases_from_file(file_path):
    with open(file_path, "r") as file:
        text = file.read()
    language = "en"
    max_ngram_size = 3
    deduplication_threshold = 0.9
    deduplication_algo = 'seqm'
    window_size = 1
    num_keywords = 20

   
    custom_kw_extractor = yake.KeywordExtractor(lan=language, n=max_ngram_size, dedupLim=deduplication_threshold,
                                                dedupFunc=deduplication_algo, windowsSize=window_size, top=num_keywords, features=None)

    
    keywords = custom_kw_extractor.extract_keywords(text)

    key_phrases = [keyphrase for keyphrase, _ in keywords]

    return key_phrases

preprocessed_texts_dir = "preprocessed_texts"

key_phrases_list = []
for file_name in os.listdir(preprocessed_texts_dir):
    file_path = os.path.join(preprocessed_texts_dir, file_name)
    if os.path.isfile(file_path):
        key_phrases = extract_key_phrases_from_file(file_path)
        key_phrases_list.append(key_phrases)

# Print the key phrases
for i, key_phrases in enumerate(key_phrases_list, start=1):
    print(f"Key Phrases for Text {i}:")
    for phrase in key_phrases:
        print("-", phrase)
    print()


Key Phrases for Text 1:
- time series data
- working time series
- series data list
- challenge working time
- aws academy machine
- academy machine learning
- business problem solved
- list step required
- step required create
- problem solved amazon
- data list step
- forecasting start introduction
- start introduction forecasting
- describe business problem
- describe challenge working
- module aws academy
- machine learning module
- simplify building forecast
- building forecast end
- required create forecast

Key Phrases for Text 2:
- feature data set
- data set information
- data set made
- scatter plot matrix
- data set imbalance
- imbalance data set
- set imbalance data
- relationship scatter plot
- plot box plot
- set information give
- column data set
- set made credit
- variable scatter plot
- set information relates
- instance data set
- scatter plot good
- car data set
- numerical variable feature
- descriptive statistic data
- create scatter plot

Key Phrases for Text 3:


In [27]:
from gensim import corpora, models
import os

def extract_topics_from_folder(folder_path, num_topics=5, num_words=10):
    preprocessed_texts = []
    for file_name in os.listdir(folder_path):
        file_path = os.path.join(folder_path, file_name)
        if os.path.isfile(file_path):
            with open(file_path, "r") as file:
                preprocessed_text = file.read().split() 
                preprocessed_texts.append(preprocessed_text)

    dictionary = corpora.Dictionary(preprocessed_texts)
    corpus = [dictionary.doc2bow(text) for text in preprocessed_texts]
    
    # Training the LDA model
    lda_model = models.LdaModel(corpus, num_topics=num_topics, id2word=dictionary)
    topics = lda_model.print_topics(num_topics=num_topics, num_words=num_words)
    return topics

# Directory where preprocessed texts are saved
preprocessed_texts_dir = "preprocessed_texts"

num_topics = 46
num_words = 10
topics = extract_topics_from_folder(preprocessed_texts_dir, num_topics=num_topics, num_words=num_words)

# Printing the extracted topics
for i, topic in enumerate(topics):
    print(f"Topic {i+1}:")
    print(topic)
    print()


Topic 1:
(0, '0.038*"model" + 0.031*"label" + 0.022*"image" + 0.020*"recognition" + 0.019*"data" + 0.013*"amazon" + 0.013*"test" + 0.012*"set" + 0.012*"custom" + 0.011*"metric"')

Topic 2:
(1, '0.051*"data" + 0.018*"model" + 0.012*"use" + 0.011*"example" + 0.011*"value" + 0.011*"amazon" + 0.010*"feature" + 0.009*"set" + 0.009*"also" + 0.008*"need"')

Topic 3:
(2, '0.026*"amazon" + 0.018*"video" + 0.017*"image" + 0.015*"recognition" + 0.014*"use" + 0.014*"service" + 0.009*"detection" + 0.009*"object" + 0.007*"computer" + 0.007*"vision"')

Topic 4:
(3, '0.020*"learning" + 0.020*"data" + 0.013*"machine" + 0.011*"problem" + 0.009*"amazon" + 0.009*"example" + 0.007*"model" + 0.007*"time" + 0.007*"forecast" + 0.007*"use"')

Topic 5:
(4, '0.031*"machine" + 0.029*"learning" + 0.015*"data" + 0.013*"section" + 0.012*"model" + 0.011*"problem" + 0.008*"set" + 0.008*"computer" + 0.007*"module" + 0.007*"image"')

Topic 6:
(5, '0.028*"learning" + 0.025*"data" + 0.022*"machine" + 0.017*"amazon" + 0.01

In [28]:
import os
import yake
from gensim import corpora, models

# Define function to extract key phrases using YAKE
def extract_key_phrases(text):
    
    language = "en"
    max_ngram_size = 3
    deduplication_threshold = 0.9
    deduplication_algo = 'seqm'
    window_size = 1
    num_keywords = 10 
    custom_kw_extractor = yake.KeywordExtractor(lan=language, n=max_ngram_size, dedupLim=deduplication_threshold,
                                                dedupFunc=deduplication_algo, windowsSize=window_size, top=num_keywords, features=None)


    keywords = custom_kw_extractor.extract_keywords(text)
    key_phrases = [keyphrase for keyphrase, _ in keywords]
    return key_phrases

# Defining function to extract topics using Gensim LDA
def extract_topics(texts, num_topics=5, num_words=5):
    tokenized_texts = [text.split() for text in texts]
    dictionary = corpora.Dictionary(tokenized_texts)
    corpus = [dictionary.doc2bow(text) for text in tokenized_texts]
    lda_model = models.LdaModel(corpus, num_topics=num_topics, id2word=dictionary)
    topics = lda_model.print_topics(num_topics=num_topics, num_words=num_words)
    return topics

directory_path = "preprocessed_texts"

def read_text_files(directory_path):
    text_data = []
    for filename in os.listdir(directory_path):
        filepath = os.path.join(directory_path, filename)
        if os.path.isfile(filepath):  # Check if the path is a file
            with open(filepath, 'r') as file:
                text = file.read()
                text_data.append(text)
    return text_data

text_data = read_text_files(directory_path)

key_phrases_list = []
topics_list = []

for text in text_data:
    
    key_phrases = extract_key_phrases(text)
    key_phrases_list.append(key_phrases)
   
    topics = extract_topics([text])[0]  # Assuming only one text per iteration
    topics_list.append(topics)

for i, (key_phrases, topics) in enumerate(zip(key_phrases_list, topics_list), start=1):
    print(f"Text {i} Key Phrases:", key_phrases)
    print(f"Text {i} Topics:")
    for j, topic in enumerate(topics, start=1):
        print(f"Topic {j}: {topic}")
    print()


Text 1 Key Phrases: ['time series data', 'working time series', 'series data list', 'challenge working time', 'aws academy machine', 'academy machine learning', 'business problem solved', 'list step required', 'step required create', 'problem solved amazon']
Text 1 Topics:
Topic 1: 0
Topic 2: 0.025*"forecast" + 0.025*"amazon" + 0.025*"module" + 0.025*"data" + 0.025*"look"

Text 2 Key Phrases: ['feature data set', 'data set information', 'data set made', 'scatter plot matrix', 'data set imbalance', 'imbalance data set', 'set imbalance data', 'relationship scatter plot', 'plot box plot', 'set information give']
Text 2 Topics:
Topic 1: 0
Topic 2: 0.033*"data" + 0.019*"value" + 0.014*"feature" + 0.011*"variable" + 0.011*"might"

Text 3 Key Phrases: ['distributed training job', 'sagemaker hyperparameter tuning', 'metric training job', 'training job performs', 'result training job', 'training tuning machine', 'training job data', 'training job time', 'tuning job improves', 'process training 

In [29]:
topic_key_phrases = {}

for i, key_phrases in enumerate(key_phrases_list, start=1):
    topic_key_phrases[f"Topic {i}"] = key_phrases

# Print the dictionary
print(topic_key_phrases)


{'Topic 1': ['time series data', 'working time series', 'series data list', 'challenge working time', 'aws academy machine', 'academy machine learning', 'business problem solved', 'list step required', 'step required create', 'problem solved amazon'], 'Topic 2': ['feature data set', 'data set information', 'data set made', 'scatter plot matrix', 'data set imbalance', 'imbalance data set', 'set imbalance data', 'relationship scatter plot', 'plot box plot', 'set information give'], 'Topic 3': ['distributed training job', 'sagemaker hyperparameter tuning', 'metric training job', 'training job performs', 'result training job', 'training tuning machine', 'training job data', 'training job time', 'tuning job improves', 'process training tuning'], 'Topic 4': ['amazon recognition custom', 'recognition custom label', 'train amazon recognition', 'machine learning process', 'learning amazon recognition', 'training data set', 'amazon recognition', 'training computer vision', 'amazon recognition de

In [30]:

from os.path import splitext

key_phrases_dict = {}

for filename, key_phrases in zip(os.listdir(directory_path), key_phrases_list):
    video_name = splitext(filename)[0] 
    key_phrases_dict[video_name] = key_phrases

# Print the dictionary
print(key_phrases_dict)



{'Mod04_Intro': ['time series data', 'working time series', 'series data list', 'challenge working time', 'aws academy machine', 'academy machine learning', 'business problem solved', 'list step required', 'step required create', 'problem solved amazon'], 'Mod03_Sect03_part2': ['feature data set', 'data set information', 'data set made', 'scatter plot matrix', 'data set imbalance', 'imbalance data set', 'set imbalance data', 'relationship scatter plot', 'plot box plot', 'set information give'], 'Mod03_Sect08': ['distributed training job', 'sagemaker hyperparameter tuning', 'metric training job', 'training job performs', 'result training job', 'training tuning machine', 'training job data', 'training job time', 'tuning job improves', 'process training tuning'], 'Mod05_Sect03_part1': ['amazon recognition custom', 'recognition custom label', 'train amazon recognition', 'machine learning process', 'learning amazon recognition', 'training data set', 'amazon recognition', 'training computer 

In [31]:
from gensim.models.ldamodel import LdaModel

def extract_topics(texts, num_topics=5, num_words=10):
    
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(text) for text in texts]
    lda_model = LdaModel(corpus, num_topics=num_topics, id2word=dictionary)
    topics_words_list = []
    for i in range(num_topics):
        topic_words = [word for word, _ in lda_model.show_topic(i, topn=num_words)]
        topics_words_list.append(topic_words)
    return topics_words_list

normalized_texts_list = [text.split() for text in normalized_texts]

# Calling the function to extract topics
topics = extract_topics(normalized_texts_list, num_topics=num_topics, num_words=num_words)

topic_words_dict = {}
for i, topic_words in enumerate(topics, start=1):
    topic_words_dict[f"Topic {i}"] = topic_words

print(topic_words_dict)


{'Topic 1': ['learning', 'machine', 'data', 'use', 'amazon', 'aws', 'model', 'problem', 'module', 'also'], 'Topic 2': ['model', 'data', 'word', 'metric', 'need', 'learning', 'use', 'machine', 'also', 'could'], 'Topic 3': ['amazon', 'video', 'learning', 'machine', 'recognition', 'data', 'face', 'image', 'also', 'feature'], 'Topic 4': ['data', 'model', 'set', 'use', 'time', 'also', 'learning', 'example', 'image', 'feature'], 'Topic 5': ['image', 'label', 'data', 'model', 'amazon', 'use', 'set', 'custom', 'object', 'recognition'], 'Topic 6': ['data', 'learning', 'machine', 'correlation', 'use', 'aws', 'amazon', 'one', 'section', 'module'], 'Topic 7': ['machine', 'learning', 'data', 'model', 'problem', 'set', 'module', 'also', 'time', 'use'], 'Topic 8': ['data', 'amazon', 'use', 'model', 'label', 'also', 'machine', 'image', 'need', 'forecast'], 'Topic 9': ['data', 'amazon', 'learning', 'use', 'machine', 'image', 'example', 'model', 'set', 'label'], 'Topic 10': ['data', 'model', 'feature', 

In [32]:
from os.path import splitext

# Create a dictionary to store topic words
topic_words_dict = {}


for filename, topic_words in zip(os.listdir(directory_path), topics):
    file_name_without_extension = splitext(filename)[0]
    topic_words_dict[file_name_without_extension] = topic_words


print(topic_words_dict)


{'Mod04_Intro': ['learning', 'machine', 'data', 'use', 'amazon', 'aws', 'model', 'problem', 'module', 'also'], 'Mod03_Sect03_part2': ['model', 'data', 'word', 'metric', 'need', 'learning', 'use', 'machine', 'also', 'could'], 'Mod03_Sect08': ['amazon', 'video', 'learning', 'machine', 'recognition', 'data', 'face', 'image', 'also', 'feature'], 'Mod05_Sect03_part1': ['data', 'model', 'set', 'use', 'time', 'also', 'learning', 'example', 'image', 'feature'], 'Mod04_WrapUp': ['image', 'label', 'data', 'model', 'amazon', 'use', 'set', 'custom', 'object', 'recognition'], 'Mod06_Sect02': ['data', 'learning', 'machine', 'correlation', 'use', 'aws', 'amazon', 'one', 'section', 'module'], 'Mod03_WrapUp': ['machine', 'learning', 'data', 'model', 'problem', 'set', 'module', 'also', 'time', 'use'], 'Mod04_Sect02_part1': ['data', 'amazon', 'use', 'model', 'label', 'also', 'machine', 'image', 'need', 'forecast'], 'Mod02_Intro': ['data', 'amazon', 'learning', 'use', 'machine', 'image', 'example', 'model

In [33]:
import os
from os.path import splitext, join

# Define the directory paths to save key phrases and topics
key_phrases_dir = "key_phrases"
topics_dir = "topics"

# Create directories if they don't exist
os.makedirs(key_phrases_dir, exist_ok=True)
os.makedirs(topics_dir, exist_ok=True)

# Save key phrases for each text
for filename, key_phrases in zip(os.listdir(directory_path), key_phrases_list):
    file_name_without_extension = splitext(filename)[0]
    key_phrases_file_path = join(key_phrases_dir, f"{file_name_without_extension}.txt")
    with open(key_phrases_file_path, 'w') as f:
        for phrase in key_phrases:
            f.write(f"{phrase}\n")

# Save topics for each text
for filename, topics_tuple in zip(os.listdir(directory_path), topics_list):
    file_name_without_extension = splitext(filename)[0]
    topics_file_path = join(topics_dir, f"{file_name_without_extension}.txt")
    with open(topics_file_path, 'w') as f:
        for topic_id, topic in enumerate(topics_tuple, start=1):
            f.write(f"Topic {topic_id}:\n{topic}\n\n")



## 5. Creating the dashboard
([Go to top](#Capstone-8:-Bringing-It-All-Together))

Use this section to create the dashboard for your solution.

In [34]:
!pip install gradio



In [39]:
import os
import ipywidgets as widgets
from IPython.display import display, Video, HTML

# Function to search for matching videos based on query
def search_videos(query):
    key_phrases_dir = "key_phrases"
    topics_dir = "topics"
    matching_videos = set()  

    for file_name in os.listdir(key_phrases_dir):
        file_path = os.path.join(key_phrases_dir, file_name)
        if os.path.isfile(file_path):  
            with open(file_path, "r") as file:
                text_name = os.path.splitext(file_name)[0]
                for line in file:
                    if query in line:
                        video_path = os.path.join("downloaded_videos", f"{text_name}.mp4")
                        matching_videos.add((video_path, text_name))

    for file_name in os.listdir(topics_dir):
        file_path = os.path.join(topics_dir, file_name)
        if os.path.isfile(file_path):  
            with open(file_path, "r") as file:
                text_name = os.path.splitext(file_name)[0]
                for line in file:
                    if query in line:
                        video_path = os.path.join("downloaded_videos", f"{text_name}.mp4")
                        matching_videos.add((video_path, text_name))

    return list(matching_videos)

# Function to display video players
def display_video_players(matching_videos):
    if matching_videos:
        for video_path, video_name in matching_videos:
            display(HTML(f"<h3>{video_name}</h3>"))
            display(Video(video_path))
    else:
        print("No matching videos found.")

# Callback function for the search button
def search_and_display_videos(btn):
    output_videos.clear_output()  # Clear previous search results
    query = text_input.value.strip()
    if query:
        matching_videos = search_videos(query)
        with output_videos:
            display_video_players(matching_videos)

text_input = widgets.Text(placeholder="Enter keyword or topic")

search_button = widgets.Button(description="Search")

output_videos = widgets.Output()

search_button.on_click(search_and_display_videos)

display(widgets.VBox([text_input, search_button, output_videos]))


VBox(children=(Text(value='', placeholder='Enter keyword or topic'), Button(description='Search', style=Button…

In [40]:
import gradio as gr
import glob
import os

# Function to search for matching videos based on query
def search_videos(query):

    key_phrases_dir = "key_phrases"
    topics_dir = "topics"
    matching_videos = []

    # Searching for matching videos based on key phrases
    for file_path in glob.glob(os.path.join(key_phrases_dir, "*.txt")):
        with open(file_path, "r") as file:
            text_name = os.path.splitext(os.path.basename(file_path))[0]
            for line in file:
                if query in line:
                    video_path = os.path.join("downloaded_videos", f"{text_name}.mp4")
                    matching_videos.append(video_path)

    # Searching for matching videos based on topics
    for file_path in glob.glob(os.path.join(topics_dir, "*.txt")):
        with open(file_path, "r") as file:
            text_name = os.path.splitext(os.path.basename(file_path))[0]
            for line in file:
                if (query in line) and text_name not in matching_videos:
                    video_path = os.path.join("downloaded_videos", f"{text_name}.mp4")
                    matching_videos.append(video_path)

    return matching_videos

# Function to display videos in Gradio interface
def display_videos(matching_videos):
    if matching_videos:
        videos_data = []
        for video_path in matching_videos:
            video_name = os.path.basename(video_path)
            videos_data.append((video_name, video_path))
        return videos_data
    else:
        return "No matching videos found."

# Create a Gradio interface for searching and displaying videos
search = gr.Interface(fn=search_videos, inputs="text", outputs="text", title="Video Search", description="Enter a keyword or topic to search for related videos.")
search.launch()


Running on local URL:  http://127.0.0.1:7872
Sagemaker notebooks may require sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Running on public URL: https://2cfd777c36aa34d3b0.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




In [42]:
import gradio as gr
import glob
import os
from IPython.display import HTML, display


def search_videos(query):
   
    key_phrases_dir = "key_phrases"
    topics_dir = "topics"
    matching_videos = set()  

    # Searching for matching videos based on key phrases
    for file_path in glob.glob(os.path.join(key_phrases_dir, "*.txt")):
        with open(file_path, "r") as file:
            text_name = os.path.splitext(os.path.basename(file_path))[0]
            for line in file:
                if query in line:
                    video_path = os.path.join("downloaded_videos", f"{text_name}.mp4")
                    matching_videos.add(video_path)

    # Searching for matching videos based on topics
    for file_path in glob.glob(os.path.join(topics_dir, "*.txt")):
        with open(file_path, "r") as file:
            text_name = os.path.splitext(os.path.basename(file_path))[0]
            for line in file:
                if query in line:
                    video_path = os.path.join("downloaded_videos", f"{text_name}.mp4")
                    matching_videos.add(video_path)

    return list(matching_videos)

# Function to display videos in HTML format
def display_videos(matching_videos):
    if matching_videos:
        html = "<div>"
        for video in matching_videos:
            html += f'<video width="320" height="240" controls><source src="{video}" type="video/mp4"></video>'
        html += "</div>"
        display(HTML(html))
    else:
        print("No matching videos found.")

# Gradio interface for searching and displaying videos
def search_and_display_videos(query):
    matching_videos = search_videos(query)
    display_videos(matching_videos)

iface = gr.Interface(fn=search_and_display_videos, inputs="text", outputs=None, title="Video Search", description="Enter a keyword or topic to search for related videos.")
iface.launch(inline=True)


Running on local URL:  http://127.0.0.1:7874
Sagemaker notebooks may require sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Running on public URL: https://ed47ef647bef22a60e.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


