<a href="https://colab.research.google.com/github/mogbuehi/chat_with_yt/blob/main/Chat_with_YT_version_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chat with YT

A notebook app powered by the GPT-3.5 Turbo LLM and the Whisper speech-to-text model. It transcribes a short (under 10 minutes) YouTube video and lets you ask an AI chatbot questions about the transcript.

# 1. Mount drive and find current directory

In [None]:
from google.colab import drive
drive.mount('/content/drive') # Mounting only gives the notebook access to your Google Drive folders; it does not change the working directory

In [None]:
# Change directory now that Drive is mounted
%cd '/content/drive/MyDrive/Lonely Octopus/Lunch and Learn'
!pwd

# 2. Install Dependencies

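Install and import libraries:


*   Pytube - download YouTube videos
*   Moviepy - convert the downloaded audio to `.wav`
*   Wave - read and write `.wav` audio files (part of the Python standard library)
*   OpenAI - access the `Whisper` model for transcribing audio and the `ChatCompletion` model for chat
*   Tiktoken - count tokens for memory management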
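
Before running the rest of the notebook, it can help to confirm that each dependency is importable. The sketch below is one way to do that with the standard library; the helper name `missing_modules` is hypothetical and not part of the original notebook.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level module names used by this notebook; `wave` and `os` ship with Python.
required = ["pytube", "moviepy", "openai", "tiktoken", "wave", "os"]
print("Missing:", missing_modules(required))
```

An empty list means every module was found; any name printed here needs a `pip install` first.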



In [None]:
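# `wave` ships with Python, so it is not installed here.
# The code below uses the pre-1.0 `openai` interface (`openai.ChatCompletion`, `openai.Audio`)
# and `moviepy.editor`, hence the version pins.
!pip install pytube "moviepy<2" "openai<1" tiktoken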

In [None]:
# Importing the necessary modules
from pytube import YouTube
from moviepy.editor import VideoFileClip, AudioFileClip
import wave
import openai
import os


# 3. Define Functions

## Set up formatting for the notebook

In [None]:
# Display markdown format
from IPython.display import display, Markdown
from datetime import datetime
def print_md(string):
    display(Markdown(string))

# Example usage:
print_md("**Bold text**")
print_md("_Italic text_")

current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
print(current_time)

## Download the audio from a YouTube Video

Uses PyTube to download the audio stream of a YouTube video and saves it as a `.wav` file in the `downloads` folder. Returns the `audio_path` and `status_message` strings.

With Drive mounted, files are saved under `content/MyDrive/...`


In [None]:
def download_audio(url):
    # Load the YouTube URL into a `YouTube` object
    yt = YouTube(url)
    video = yt.streams.filter(only_audio=True, file_extension='mp4').first()
    if video is None:
        raise ValueError(f'No audio-only mp4 stream found for {url}')

    # Create the `downloads/` folder if it doesn't exist
    folder = 'downloads/'
    os.makedirs(folder, exist_ok=True)

    # Download only the audio from the YT video and save it as an `.mp4`
    video.download(output_path=folder)
    video_path = os.path.join(folder, video.default_filename)

    # Convert the audio to .wav format using moviepy
    audio = AudioFileClip(video_path)
    audio_path = os.path.splitext(video_path)[0] + ".wav"
    audio.write_audiofile(audio_path, codec='pcm_s16le') # wav format codec
    audio.close() # release the file handle before deleting the source

    # Delete the original mp4 file
    os.remove(video_path)

    # Status message for users
    status_message = 'Audio downloaded successfully'

    return audio_path, status_message





## Break audio file into clips
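
`split_audio` walks the file in fixed-size windows that overlap, so words cut off at the end of one clip reappear at the start of the next. With the defaults (`clip_size=30`, `overlap=3`), a new clip starts every 27 seconds. The sketch below reproduces only that window arithmetic, without touching any audio; `clip_windows` is a hypothetical helper and the frame rate of 10 is chosen purely to keep the numbers readable.

```python
def clip_windows(num_frames, framerate, clip_size=30, overlap=3):
    """Return (start_frame, end_frame) pairs for overlapping clips."""
    if overlap >= clip_size:
        raise ValueError("overlap must be smaller than clip_size")
    frames_per_clip = int(clip_size * framerate)
    step = int((clip_size - overlap) * framerate)  # distance between clip starts
    windows = []
    for start in range(0, num_frames, step):
        end = min(start + frames_per_clip, num_frames)  # last clip may be shorter
        windows.append((start, end))
    return windows


# A 70-second file at a hypothetical 10 frames per second
for start, end in clip_windows(num_frames=700, framerate=10):
    print(f"{start / 10:.0f}s to {end / 10:.0f}s")
```

Each window begins 3 seconds before the previous one ends, and the final window is truncated at the end of the file.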

In [None]:
def split_audio(audio_path, clip_size=30, overlap=3):
  """Split a .wav file into overlapping clips and return the list of clip paths."""
  if overlap >= clip_size:
    raise ValueError('overlap must be smaller than clip_size')

  clip_list = []
  clip_name = os.path.splitext(os.path.basename(audio_path))[0]
  folder = 'downloads/clips'
  os.makedirs(folder, exist_ok=True)

  with wave.open(audio_path, 'rb') as wf:
    # Used for dividing the audio into pieces
    num_frames = wf.getnframes()
    framerate = wf.getframerate()
    # Needed to write clips in the same format as the source
    num_channels = wf.getnchannels()
    sample_width = wf.getsampwidth()

    # Whole numbers of frames, so clips never start on a fractional frame
    frames_per_clip = int(clip_size * framerate)
    frame_interval = int((clip_size - overlap) * framerate)

    for clip_number, start in enumerate(range(0, num_frames, frame_interval), start=1):
      # Read one clip; the last clip may be shorter than `clip_size`
      wf.setpos(start)
      clip_frames = wf.readframes(frames_per_clip)
      clip_path = f'{folder}/{clip_name}_{clip_number}.wav'

      # Write the clip with the same number of channels, sample width, and framerate
      with wave.open(clip_path, 'wb') as wf_clip:
        wf_clip.setnchannels(num_channels)
        wf_clip.setsampwidth(sample_width)
        wf_clip.setframerate(framerate)
        wf_clip.writeframes(clip_frames)

      clip_list.append(clip_path)

  return clip_list




## Transcribe Audio Clips

In [None]:
def transcribe(audio_path):
  clip_list = split_audio(audio_path) # helper function
  file_name = os.path.splitext(os.path.basename(audio_path))[0]
  folder = 'transcripts/'
  os.makedirs(folder, exist_ok=True)
  transcribed_clips = []

  for clip in clip_list:
    # Measure the clip itself, not the full audio file
    with wave.open(clip, 'rb') as wf:
      clip_duration = wf.getnframes() / wf.getframerate()
    if clip_duration < 0.1:
      print(f'Skipping the following clip (less than 0.1 seconds): {clip}')
      continue

    with open(clip, 'rb') as file: # audio model
      transcript = openai.Audio.transcribe(
          file=file,
          model='whisper-1',
          response_format='text',
          temperature=0
      )

    transcribed_clips.append(transcript + '\n')

  # Join and save once, after every clip has been transcribed
  full_transcript = ''.join(transcribed_clips)
  full_transcript_path = f'{folder}{file_name}.txt'

  with open(full_transcript_path, 'w') as f:
    f.write(full_transcript)

  return full_transcript, full_transcript_path

# # # Test functions, check folder location (Be sure to mount drive)
# url = input('Insert YouTube URL here--> ')
# audio_path, message = download_audio(url)

# with open('openai.txt', 'r') as file:
#     api_key = file.readline().strip()  # Read the key and remove any trailing spaces or newlines

# openai.api_key = api_key

# transcript, transcript_path = transcribe(audio_path)
# print_md(transcript)

## Compress transcript so that it fits in the context window (less than 4000 tokens ~ 3000 words)
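
Whether compression is needed depends on the transcript's token count. Exact counts come from a tokenizer such as `tiktoken` (used later in this notebook), but a rough estimate is often enough for a pre-check. The sketch below assumes the common rule of thumb that one token is about 0.75 English words; `estimate_tokens` and `needs_compression` are hypothetical helpers, and the ratio is an approximation, not a guarantee.

```python
import math

WORDS_PER_TOKEN = 0.75  # rough rule of thumb for English text, not an exact figure


def estimate_tokens(text):
    """Estimate the token count from the whitespace-separated word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def needs_compression(text, token_limit=4000):
    """Return True when the estimated token count exceeds `token_limit`."""
    return estimate_tokens(text) > token_limit


sample = "word " * 3500
print(estimate_tokens(sample), needs_compression(sample))
```

Under this estimate, a transcript of about 3000 words sits right at the 4000-token limit, so anything longer should be compressed first.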

In [None]:
with open('openai.txt', 'r') as file:
    api_key = file.readline().strip()  # Read the key and remove any trailing spaces or newlines

openai.api_key = api_key

# txt_path = 'transcripts/How to think of GPT as a developer.txt'
# with open(txt_path, 'r') as f:
#   transcript = f.read()

def compress_transcript(transcript='', context=''):
  response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[
      {
        "role": "system",
        "content": """You are a text compression robot that is an expert in regex.
        All you do is output a compressed piece of text that retains the semantic meaning of the text.
      Only output the compressed text. Also be sure to infer the number of speakers in the text and give that info.
      Follow these rules below to generate a compressed transcript and DO NOT under any circumstances show your work.
      Again only output the final compressed text.

      *Extractive Summarization:* Select the most informative sentences or fragments based on scoring methods (e.g., TF-IDF, sentence embedding similarities).
      *Keyword Extraction:* Identify the most relevant terms and ensure they are included in the compressed text to maintain its topical focus.
      *Sentence Simplification:* Rewrite sentences to be more concise, removing subordinate clauses and passive constructions where they are not necessary for understanding.
      *Pruning Redundant Information:* Remove any repetitions or paraphrased information that does not add new facts or insight.
      *Named Entity Recognition (NER):* Highlight and retain named entities such as people, organizations, and locations to ensure that the compressed text maintains its reference points.
      *Coreference Resolution:* Ensure that pronouns and referring expressions are resolved to explicit entities, to prevent loss of meaning through the removal of preceding context.
      *Discourse Structure Analysis:* Evaluate the logical structure of the text to prioritize sentences that convey the main narrative or argument."""
      },
      {
        "role": "user",
        "content": f"Summarize the following transcript. {context} Remember, ONLY output the final compressed transcript. Transcript: {transcript}"
      }
    ],
    temperature=0,
    max_tokens=500,
  )

  compressed_transcript = response['choices'][0]['message']['content']
  return compressed_transcript





# # Test function
# text = compress_transcript()
# print(text)


## Chatbot to ask questions about the transcript



### Memory Management

How to count tokens: [GitHub link](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb)

The linked notebook is slightly out of date. Run the following to list the encodings currently available:
```
import tiktoken

# List the names of the available encodings
available_encodings = tiktoken.list_encoding_names()
print("Available encodings:", available_encodings)
```

Helper functions allow us to summarize just the user and AI messages while keeping the system message intact.
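
The idea can be shown without calling the API: collect the user and assistant turns, summarize them, and fold the summary into the original system message. In this sketch `compact_history` is a hypothetical name and `first_words` is a stand-in for the LLM summarizer, so the block runs offline.

```python
def compact_history(messages, summarize):
    """Fold user/assistant turns into the first system message."""
    system = next((m for m in messages if m.get("role") == "system"), None)
    turns = [m["content"] for m in messages if m.get("role") in ("user", "assistant")]
    summary = summarize("\n".join(turns))
    base = system["content"] if system else ""
    return [{"role": "system", "content": f"{base}\nThis is the chat history: {summary}"}]


def first_words(text, limit=5):
    # Hypothetical stand-in for an LLM summarizer: keep only the first few words
    return " ".join(text.split()[:limit])


history = [
    {"role": "system", "content": "Answer questions about the transcript."},
    {"role": "user", "content": "Who is speaking?"},
    {"role": "assistant", "content": "Two people, a host and a guest."},
]
print(compact_history(history, first_words))
```

The result is a single system message, so the instructions survive while the conversation shrinks to its summary.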


In [None]:
# If `messages` gets too big, summarize the previous messages before appending
# !pip install tiktoken
import tiktoken

# "cl100k_base" is the encoding used by gpt-3.5-turbo
encoding_name = "cl100k_base"
encoding = tiktoken.get_encoding(encoding_name)

def num_tokens(string: str, encoding_name: str) -> int:
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(string))

# Troubleshooting-----------------------------------------------
# import tiktoken
# from tiktoken.registry import _available_plugin_modules
# # List available plugin modules
# available_plugin_modules = _available_plugin_modules()
# print("Available plugin modules:", available_plugin_modules)
# print("Available encodings:", tiktoken.list_encoding_names())

#--------------------------------------------------------------
def user_assistant_content(messages):
    keys = ['role', 'content']
    extracted = []
    for message in messages:
        # Check if the role is 'user' or 'assistant'
        if message.get('role') in ['user', 'assistant']:
            # Extract only the 'role' and 'content' keys
            extracted_item = {key: message[key] for key in keys if key in message}
            extracted.append(extracted_item)
    # Join the contents of the extracted messages with a newline, ensuring 'content' exists
    user_assistant = '\n'.join(msg['content'] for msg in extracted if 'content' in msg)
    # print(type(user_assistant))
    return user_assistant


#--------------------------------------------------------------
def all_content(messages):
  # Initialize an empty list to hold the content of each message
  contents = []
  # Loop through each message in the messages list
  for message in messages:
      # Extract the 'content' key and append it to the contents list
      contents.append(message.get('content', ''))  # Default to an empty string if 'content' key is not found
  # Join all the contents with a newline character to separate them
  all_contents_text = '\n'.join(contents)
  return all_contents_text
#--------------------------------------------------------------
def manage_memory(messages):
    # Get the complete chat content
    raw_chat_all = all_content(messages=messages)
    # Get the user and assistant chat content
    raw_chat_ua = user_assistant_content(messages=messages)

    # Check the token count of the entire chat
    encoding_name = "cl100k_base"
    if num_tokens(string=raw_chat_all, encoding_name=encoding_name) > 1500:
        # Summarize the user and assistant messages with compress_transcript()
        summarized_content = compress_transcript(raw_chat_ua, context='Keep response to less than 200 words and extract key phrases')

        # Search for the first system message and keep it
        first_system_message = next((msg for msg in messages if msg.get('role') == 'system'), None)

        # Clear the messages list, then restore the system message with the summary appended
        messages.clear()
        if first_system_message:
            messages.append({'role': 'system', 'content': first_system_message['content'] + f'\nThis is the chat history: {summarized_content}'})

    # Always return the (possibly modified) list so the caller never receives None
    return messages
#--------------------------------------------------------------
# if user prompt makes system message too long, ask user to make a shorter prompt

### Chatbot

This function runs until the user types `exit`. Memory is managed by the `manage_memory()` function.
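
The control flow of the loop can be tried without an API key by passing in the input source and the reply function. The sketch below is a simplified illustration under that assumption: `run_chat`, `read_input`, and `get_reply` are hypothetical names, and memory management is left out to keep the loop visible.

```python
def run_chat(system_content, read_input, get_reply):
    """Collect user/assistant turns until the user types 'exit'."""
    messages = [{"role": "system", "content": system_content}]
    while True:
        user_input = read_input()
        if user_input.strip().lower() == "exit":
            break
        messages.append({"role": "user", "content": user_input})
        reply = get_reply(messages)  # in the notebook this is the API call
        messages.append({"role": "assistant", "content": reply})
    return messages


# Scripted inputs and an echo reply stand in for input() and the API call
scripted = iter(["What is the video about?", "exit"])
log = run_chat(
    "You answer questions about a transcript.",
    read_input=lambda: next(scripted),
    get_reply=lambda msgs: f"You asked: {msgs[-1]['content']}",
)
for m in log:
    print(m["role"], "->", m["content"])
```

Injecting the two callables makes the loop easy to test; in the notebook they correspond to `input()` and the chat completion request.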

In [None]:
# Chatbot with memory management (uses `manage_memory()` defined above)

def chatbot(system_content):
    messages = [{
        "role": "system",
        "content": system_content
    }]

    while True:
        # Get user input
        user_input = input(
            """Input your question (type 'exit' to stop)-->: """)

        # Break the loop if the user types 'exit'
        if user_input.lower() == 'exit':
            break

        # Prepare the message for the API call
        api_messages = messages + [{"role": "user", "content": user_input}]

        # Make the API call to OpenAI
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=api_messages,
            temperature=0.1,
            max_tokens=500,
        )

        # Extract the AI's response
        ai_output = response['choices'][0]['message']['content']
        token_count = response['usage']['total_tokens']
        # print(f"AI: {ai_output}")

        # Add the user input and AI response to the messages
        messages.append({"role": "user", "content": user_input})
        messages.append({"role": "assistant", "content": ai_output})

        # Timestamp
        current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

        # Manage memory if needed
        messages = manage_memory(messages)

        # Display chat log
        print_md(f'**Chatlog** {current_time}')
        print_md('\n---')
        print_md(f'**User:** {user_input}')
        print_md(f'**AI:** {ai_output}')
        print_md(f'**token count:** {token_count}')
        # At this point, the loop will continue, and the user can enter another message

# Example system content
# transcript=''
# system_content = f"""You are a nice chatbot having a conversation with a human. Help them answer questions about
# the following transcribed audio from a video. Analyze the text and take on the role of an expert in the relevant field.
# Here is the transcript: {transcript}.

# Take your time and think this out"""

# # Call the function to start the chat
# chatbot(system_content)


# 4. Demo in Notebook

In [None]:
# This cell runs an instance of the application

# Select Youtube URL------------------------------------------------------------
url = input('Insert YouTube URL here--> ')

# Downloading audio from the video----------------------------------------------
audio_path, message = download_audio(url)

# Transcribing the audio--------------------------------------------------------
with open('openai.txt', 'r') as file:
    api_key = file.readline().strip()

openai.api_key = api_key # Loading API key

transcript, transcript_path = transcribe(audio_path)
print_md(f'**Transcript**: {transcript}')

# Compressing the transcript----------------------------------------------------
with open(transcript_path, 'r') as f:
  transcript = f.read()

transcript = compress_transcript(transcript=transcript) # comment out for language lesson
print_md(f'**Compressed:** {transcript}') # comment out for language lesson

# Starting the chatbot----------------------------------------------------------
system_content = f"""You are a nice chatbot having a conversation with a human.
Help them answer questions about the following transcribed audio from
a video.
Analyze the text and take on the role of an expert in the relevant field.
Here is the transcript: {transcript}.

Take your time and think this out"""

# Call the function to start the chat-------------------------------------------
chatbot(system_content)

List of YouTube links to try:

---

Gurren Lagann (2 mins): https://www.youtube.com/watch?v=s65n0PmvIm4

Max Tegmark x Lex Fridman (4 mins): https://www.youtube.com/watch?v=53zaaRp5cfo

Videos should be shorter than 10 minutes to avoid errors.
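
The length limit can be enforced before downloading anything. The sketch below assumes the video length is already known in seconds; pytube is believed to expose it as `YouTube(url).length`, but that call is deliberately left out so the block runs offline. `check_duration` and `MAX_SECONDS` are hypothetical names.

```python
MAX_SECONDS = 10 * 60  # the 10-minute limit suggested above


def check_duration(length_seconds, limit=MAX_SECONDS):
    """Return the length as m:ss, or raise ValueError if it reaches the limit."""
    if length_seconds >= limit:
        raise ValueError(
            f"Video is {length_seconds} s long; choose one shorter than {limit} s"
        )
    minutes, seconds = divmod(int(length_seconds), 60)
    return f"{minutes}:{seconds:02d}"


print(check_duration(245))  # a video of 4 minutes 5 seconds
```

Calling this right after reading the URL gives a clear message up front instead of a failure partway through transcription.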