Goal: Summarise my weekly podcasts for quick reading

Resources:

1. https://www.elithecomputerguy.com/2023/10/scraping-youtube-with-openai-python-chatgpt-youtube-transcript-api/
2. https://huggingface.co/spaces/SteveDigital/free-fast-youtube-url-video-to-text-using-openai-whisper/blob/main/app.py
3. https://github.com/jxnl/instructor/blob/main/examples/chain-of-density/chain_of_density.py
4. https://pytube.io/en/latest/api.html - for downloading YouTube videos before transcribing them

Tasks:

- [ ] Download the YouTube video and send it to Whisper to get a transcript
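
A minimal sketch of the open task above, assuming `pytube` (resource 4) for the download and OpenAI's hosted `whisper-1` model for the transcription. The helper names `watch_url`, `download_audio` and `transcribe` are hypothetical, not part of either library, and the third-party imports are done lazily inside the functions so the cell can be defined before those packages are installed.

```python
from pathlib import Path


def watch_url(video_id: str) -> str:
    """Build the standard watch URL for a YouTube video ID."""
    return f"https://www.youtube.com/watch?v={video_id}"


def download_audio(video_id: str, out_dir: Path) -> Path:
    """Download the audio-only stream with pytube and return the saved file path."""
    from pytube import YouTube  # lazy import: only needed when a download is requested

    stream = YouTube(watch_url(video_id)).streams.get_audio_only()
    saved = stream.download(output_path=str(out_dir), filename=f"{video_id}.mp4")
    return Path(saved)


def transcribe(audio_path: Path) -> str:
    """Send the audio file to the hosted Whisper model and return the transcript text."""
    from openai import OpenAI  # lazy import: reads OPENAI_API_KEY from the environment

    with audio_path.open("rb") as audio_file:
        result = OpenAI().audio.transcriptions.create(model="whisper-1", file=audio_file)
    return result.text
```

Usage would be along the lines of `transcribe(download_audio(video_id, Path(".")))`. The hosted transcription endpoint limits upload size, so a long podcast episode would probably need to be split or re-encoded first; that step is not covered here.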


In [26]:
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import JSONFormatter, TextFormatter
import instructor
from openai import OpenAI
from pydantic import BaseModel
from pathlib import Path
from dotenv import load_dotenv
import os

In [86]:
# load API key

dotenv_path = Path(r"C:\Storage\python_projects\ashvin\.env")
image_folderpath = Path(r"C:\Storage\python_projects\ashvin\sandbox\pydantic")
load_dotenv(dotenv_path=dotenv_path)

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")

# main constants

GPT_MODEL_TEXT_ALIAS = "gpt-4-turbo-2024-04-09"  # dated GPT-4 Turbo snapshot (pinned, not a rolling alias)
GPT_MODEL_TEXT = "gpt-4-0125-preview"
GPT_MODEL_35_TEXT_ALIAS = "gpt-3.5-turbo"  # rolling alias for the GPT-3.5 Turbo family
DALL_E_3 = "dall-e-3"

# instantiate client (OpenAI() reads OPENAI_API_KEY from the environment loaded above)
client = instructor.patch(OpenAI())

In [78]:
# config

video_id = "cnTwXzjksOY"

In [52]:
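# text wrapper function

def wrapper(prompt: str, data: str | list, response_model: type[BaseModel] | None = None):
    """Send a system prompt plus user data to the chat model and return the completion.

    With response_model=None the raw OpenAI completion is returned; otherwise
    instructor parses the reply into the given Pydantic model class.
    """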
    return client.chat.completions.create(
        model=GPT_MODEL_TEXT_ALIAS,
        response_model=response_model,
        max_retries=5,
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": data},
        ]
    )

In [79]:
# get transcript as list of dictionaries

transcript = YouTubeTranscriptApi.get_transcript(video_id=video_id)

# serialise the transcript to a JSON string

formatter = JSONFormatter()
json_formatted = formatter.format_transcript(transcript)
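
The JSON string carries `start` and `duration` for every caption segment, which the summary prompt never uses. A minimal sketch of flattening the segment list to plain text before sending it to the model follows; `transcript_to_text` and its `drop_cues` flag are hypothetical names, and the sample segments are copied from the printed transcript. The already imported `TextFormatter` is another option for getting text-only output.

```python
import re

# Matches segments that are only a bracketed cue such as "[Music]".
CUE_PATTERN = re.compile(r"^\[[^\]]*\]$")


def transcript_to_text(segments: list[dict], drop_cues: bool = True) -> str:
    """Join caption segments into one space-separated string.

    Segments that are empty, or (when drop_cues is True) consist only of a
    bracketed cue, are skipped.
    """
    parts = []
    for segment in segments:
        text = segment.get("text", "").strip()
        if not text:
            continue
        if drop_cues and CUE_PATTERN.match(text):
            continue
        parts.append(text)
    return " ".join(parts)


sample = [
    {"text": "[Music]", "start": 20.26, "duration": 3.229},
    {"text": "hi and welcome to crypto law I am Jeremy", "start": 31.279, "duration": 5.8},
    {"text": "Hogan and we are live I have hijacked", "start": 34.12, "duration": 5.72},
]
plain_text = transcript_to_text(sample)
```

In the notebook this would be called as `transcript_to_text(transcript)` and the result passed to `wrapper` in place of `json_formatted`.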

In [80]:
print(json_formatted)

[{"text": "[Music]", "start": 20.26, "duration": 3.229}, {"text": "hi and welcome to crypto law I am Jeremy", "start": 31.279, "duration": 5.8}, {"text": "Hogan and we are live I have hijacked", "start": 34.12, "duration": 5.72}, {"text": "the channel for one last day but I know", "start": 37.079, "duration": 4.081}, {"text": "why you are here because you are", "start": 39.84, "duration": 3.559}, {"text": "interested in xrp and want to know", "start": 41.16, "duration": 4.52}, {"text": "what's about to happen in the SEC law", "start": 43.399, "duration": 5.041}, {"text": "suitcase and I am about to tell you and", "start": 45.68, "duration": 4.519}, {"text": "then we are going to speculate on the", "start": 48.44, "duration": 4.48}, {"text": "outcome all just for fun and all within", "start": 50.199, "duration": 3.641}, {"text": "15", "start": 52.92, "duration": 3.799}, {"text": "minutes promise so first I'm going to", "start": 53.84, "duration": 5.28}, {"text": "let you in on a little 

In [83]:
prompt = """
As a professional summarizer, create a concise and comprehensive summary of the provided text:

- Extract all key and pertinent information.
- Craft a summary that is detailed, thorough, in-depth, and complex, while maintaining clarity and conciseness.
- Incorporate main ideas and essential information, eliminating extraneous language and focusing on critical aspects.
- Rely strictly on the provided text, without including external information.
- Format the summary in markdown with bullet points, signposting and any other tools for accessible reading.
- Write in the style of Raymond Carver: simple, clear and direct.
- Ensure you give me a TL;DR upfront.
"""

In [70]:
initial_summary = """
Write an initial summary which should be long (4-5 sentences, ~80 words) yet highly non-specific, containing little information beyond the entities marked as missing.
Use overly verbose language and fillers (e.g. "This article discusses") to reach ~80 words.
"""

denser_summary = """
You are going to generate an increasingly concise, entity-dense summary.

Perform the following two tasks:
- Identify 1-3 informative entities from the original text which are missing from the previous summary
- Write a new denser summary of identical length which covers every entity and detail from the previous summary plus the Missing Entities

Guidelines
- Make every word count: re-write the previous summary to improve flow and make space for additional entities
- Make space with fusion, compression, and removal of uninformative phrases like "the article discusses".
- The summaries should become highly dense and concise yet self-contained, e.g., easily understood without the Article.
- Missing entities can appear anywhere in the new summary
- Never drop entities from the previous summary. If space cannot be made, add fewer new entities.

An Entity is a real-world object that's assigned a name - for example, a person, a country, a product or a book title.
"""
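
The two prompts above are defined but not yet wired into a loop. A minimal sketch of a chain-of-density pass is given here, loosely following the idea in resource 3 rather than reproducing it. The function `chain_of_density`, its `summarise` callable and the `rounds` parameter are hypothetical; `summarise` is assumed to take a system prompt and a user message and return the summary text.

```python
from typing import Callable


def chain_of_density(
    article: str,
    summarise: Callable[[str, str], str],
    initial_prompt: str,
    denser_prompt: str,
    rounds: int = 3,
) -> list[str]:
    """Return the initial summary followed by `rounds` progressively denser rewrites."""
    summaries = [summarise(initial_prompt, f"Article:\n{article}")]
    for _ in range(rounds):
        # Each rewrite sees the full article plus the most recent summary.
        user_message = f"Article:\n{article}\n\nPrevious summary:\n{summaries[-1]}"
        summaries.append(summarise(denser_prompt, user_message))
    return summaries


# Stand-in for the LLM call so the loop can be exercised offline.
calls = []

def fake_summarise(system_prompt: str, user_message: str) -> str:
    calls.append((system_prompt, user_message))
    return f"summary {len(calls)}"

demo = chain_of_density("some article", fake_summarise, "INITIAL", "DENSER", rounds=2)
```

In the notebook, `summarise` could be a thin adapter over the existing helper, for example `lambda p, d: wrapper(p, d).choices[0].message.content`, with `initial_summary` and `denser_summary` passed as the two prompts.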

In [84]:
response = wrapper(prompt=prompt, data=json_formatted)
response_content = response.choices[0].message.content
print(response_content)

**Summary of Ripple's Ongoing Legal Battle with the SEC**

- **Introduction and Context:**
  - Jeremy Hogan, an attorney specializing in crypto law, discusses the ongoing legal case between Ripple and the U.S. Securities and Exchange Commission (SEC).
  - The SEC previously held a perfect record in crypto-related cases until facing Ripple and experienced unexpected challenges during the proceedings.

- **Judge Assignment and Initial Developments:**
  - Ripple was seen as fortunate to have Judge Torres assigned to their case. Under other judges, results might have differed significantly.

- **The 'Goil' Case and its Impact:**
  - A pivotal case, referred to as the 'Goil' case, emerged from the second DCA Court, which significantly favored Ripple's stance. This case played a decisive role in shaping the legal arguments and strategies for Ripple.

- **SEC’s Demands:**
  - The SEC seeks an injunction preventing further violations, a disgorgement (return of illicit gains) of approximately $

Notification of Video from Google

In [93]:
from googleapiclient.discovery import build

def get_latest_videos(api_key, channel_id):
    youtube = build('youtube', 'v3', developerKey=api_key)
    request = youtube.search().list(
        part="id,snippet",
        channelId=channel_id,
        maxResults=5,
        order="date"
    )
    response = request.execute()
    
    # Debugging: Print the raw response
    print("API Response:", response)

    videos = []  # List to store video information
    for item in response.get('items', []):
        if item['id']['kind'] == "youtube#video":
            video_info = {
                'video_id': item['id']['videoId'],
                'title': item['snippet']['title'],
                'description': item['snippet']['description'],
                'thumbnail_url': item['snippet']['thumbnails']['high']['url']
            }
            videos.append(video_info)
        else:
            # Debugging: Print out non-video items
            print("Non-video item found:", item)

    return videos

# Example usage:
api_key = GOOGLE_API_KEY
channel_id = ''  # left blank here; set this to the target channel's ID before running
video_data = get_latest_videos(api_key, channel_id)

# Debugging: Check if video_data is empty
if not video_data:
    print("No new videos found.")
else:
    for video in video_data:
        print(f"Video ID: {video['video_id']}, Title: {video['title']}")


API Response: {'kind': 'youtube#searchListResponse', 'etag': 'PLk22xd7gUSOrZbmKdslywO3xBQ', 'regionCode': 'AU', 'pageInfo': {'totalResults': 0, 'resultsPerPage': 0}, 'items': []}
No new videos found.
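
For the notification step, the list returned by `get_latest_videos` would need to be compared against videos already processed. A minimal sketch under that assumption follows, keeping the seen video IDs in a small JSON file. The helper names and the file layout are hypothetical and not part of the Google API client.

```python
import json
from pathlib import Path


def load_seen_ids(path: Path) -> set[str]:
    """Read previously processed video IDs; an absent file means nothing seen yet."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def select_new_videos(videos: list[dict], seen_ids: set[str]) -> list[dict]:
    """Keep only videos whose 'video_id' has not been processed before."""
    return [video for video in videos if video["video_id"] not in seen_ids]


def save_seen_ids(path: Path, seen_ids: set[str]) -> None:
    """Persist the processed IDs as a sorted JSON list."""
    path.write_text(json.dumps(sorted(seen_ids)), encoding="utf-8")
```

A run would load the seen IDs, call `select_new_videos(video_data, seen)`, summarise each new video, and then save the updated set.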
