# Step-by-Step Guide for Creating a Content Idea Generator

## Introduction
In this Python notebook, we'll create a Content Idea Generator using LangChain and the OpenAI API. 

This tool will summarize YouTube videos and then generate content ideas based on those summaries, taking into account:
- specific information about the user
- the target audience.

## Prerequisites
To run this notebook, make sure you've installed the required packages:

`pip install langchain openai gradio youtube-transcript-api pytube python-dotenv`


## Step 1: Load API Keys from .env

First, let's load the .env file to get `OPENAI_API_KEY`.

In [1]:

from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())
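
`load_dotenv` does not raise when no `.env` file is found, so a quick check that the key actually reached the environment can save a confusing failure later. Below is a minimal sketch using only the standard library. The helper name `require_env` is hypothetical and not part of python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return value


# Usage, after load_dotenv(...) has run:
# api_key = require_env("OPENAI_API_KEY")
```

Failing early here keeps the error close to its cause, rather than surfacing later as an authentication error from the first model call.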


## Step 2: YouTube Transcript Loader Function
Create a function that takes a YouTube URL, extracts the video transcript and title, and returns them.

First, we need to extract the video ID from the YouTube URL because LangChain's `YoutubeLoader` requires a video ID to load the transcript.

In [2]:
from urllib.parse import urlparse, parse_qs

def extract_video_id_from_url(url):
    """
    Extract the YouTube video ID using urllib.parse.
    """
    video_id = None
    parsed_url = urlparse(url)
    
    if "youtube.com" in parsed_url.netloc:
        parsed_query = parse_qs(parsed_url.query)
        video_id = parsed_query.get("v", [None])[0]
    elif "youtu.be" in parsed_url.netloc:
        video_id = parsed_url.path[1:]
        
    return video_id
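
To see why the function branches on the host name, it may help to look at how `urlparse` and `parse_qs` split the two common URL shapes. This is a small standalone sketch. The `&t=42s` parameter on the long-form URL is an assumption added for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Long-form watch URL: the ID is the "v" query parameter.
watch = urlparse("https://www.youtube.com/watch?v=Z6sCl6abJj4&t=42s")
watch_id = parse_qs(watch.query).get("v", [None])[0]

# Short-form share URL: the ID is the path without the leading slash.
short = urlparse("https://youtu.be/Z6sCl6abJj4?si=627FWCed9VtYTcbR")
short_id = short.path[1:]

print(watch.netloc, watch_id)  # www.youtube.com Z6sCl6abJj4
print(short.netloc, short_id)  # youtu.be Z6sCl6abJj4
```

Other URL shapes, such as embed links, are not covered by either branch and would yield `None` from `extract_video_id_from_url`.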


In [3]:
from typing import Optional, Tuple
from langchain.document_loaders import YoutubeLoader

def get_transcript_and_metadata(url: str) -> Tuple[Optional[str], Optional[str]]:
    """
    Returns the transcript and title from a YouTube URL.

    Parameters:
    url (str): The YouTube URL from which the transcript and title will be extracted.

    Returns:
    transcript (str): The transcript of the video.
    title (str): The title of the video.
    """
    try:
        vid_id = extract_video_id_from_url(url)
        loader = YoutubeLoader(vid_id, add_video_info=True)
        docs = loader.load()
        if docs:
            doc = docs[0]
            transcript = doc.page_content
            title = doc.metadata["title"]
            return transcript, title
        else:
            return None, None
    except Exception as e:
        print(f"Failed to load transcript and title from URL {url}: {e}")
        return None, None


Testing the function

In [4]:
# get_transcript_and_metadata("https://youtu.be/Z6sCl6abJj4?si=627FWCed9VtYTcbR")
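
When calling the function, note that it always returns a two-element tuple, even on failure. A non-empty tuple is truthy in Python, so the result should be unpacked before it is tested. The sketch below uses a hypothetical stand-in, `fake_loader`, that only mimics the return shape and makes no network call.

```python
from typing import Optional, Tuple


def fake_loader(ok: bool) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical stand-in that mimics the (transcript, title) return shape."""
    return ("some transcript text", "Some Title") if ok else (None, None)


# Testing the tuple itself: always truthy, even when both values are None.
result = fake_loader(ok=False)
tuple_is_truthy = bool(result)

# Unpacking first: the transcript on its own is None, which is falsy.
transcript, title = fake_loader(ok=False)
transcript_is_truthy = bool(transcript)

print(tuple_is_truthy, transcript_is_truthy)  # True False
```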

## Step 3: Create Chain for Summary
Use LangChain's `LLMChain` to build a chain that summarizes the YouTube transcript.

- For summaries, we'll use the `gpt-3.5-turbo-16k` model, whose larger context window handles longer transcripts.
- We'll also use a lower temperature (0.3) so the summary stays focused and more consistent between runs.
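
The effect of temperature can be illustrated with a temperature-scaled softmax, which is how temperature is commonly applied when sampling from a language model. This is a general sketch, not a description of the OpenAI API's internals, and the logits are made-up numbers.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

for t in (0.3, 0.7):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lower temperature, more of the probability mass lands on the top-scoring token, so repeated runs tend to pick the same continuation. At the higher temperature, the distribution is flatter and outputs vary more.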


<img src="images/Chains_seq.png" alt="Image Alt Text" width="600">

In [5]:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI

llm_summary = ChatOpenAI(model_name='gpt-3.5-turbo-16k', temperature=.3)
summary_template = """Please summarize the following transcript as a list of key takeaways. \
Tailor the summary for the person who is {info_about_me}.

Transcript: {transcript}
"""

summary_prompt_template = PromptTemplate(input_variables=["transcript", "info_about_me"], template=summary_template)
summary_chain = LLMChain(llm=llm_summary, prompt=summary_prompt_template, output_key="summary")
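
One detail worth knowing when writing multi-line prompt templates: a backslash at the end of a line inside a triple-quoted string removes the newline, so the next line is joined on with no space unless one precedes the backslash. The sketch below shows this, and also previews how placeholders get filled, using plain `str.format` as a stand-in. This assumes `PromptTemplate` is left on its default f-string style template format. The sample values are invented.

```python
# Without a space before the backslash, the two sentences run together.
glued = """key takeaways.\
Tailor the summary."""

# With a space before the backslash, the sentences stay separated.
spaced = """key takeaways. \
Tailor the summary."""

print(repr(glued))
print(repr(spaced))

# Filling placeholders by name, as the prompt template does for each chain input.
template = "Tailor the summary for the person who is {info_about_me}.\n\nTranscript: {transcript}"
rendered = template.format(
    info_about_me="an NLP engineer",       # invented sample value
    transcript="Hello and welcome...",     # invented sample value
)
print(rendered)
```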


## Step 4: Create Chain for Idea Generation
Create another Chain that will take the summary, your info, and your target audience info to generate content ideas.

In [6]:
llm_idea = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=.7)

idea_template = """Given the summarized content, \
and knowing that the creator is specialized in {info_about_me} and \
the target audience is interested in {info_about_audience}, \
what are some content ideas that can be generated?

Summary: {summary}"""

idea_prompt_template = PromptTemplate(input_variables=["summary", "info_about_me", "info_about_audience"], template=idea_template)
idea_chain = LLMChain(llm=llm_idea, prompt=idea_prompt_template, output_key="content_ideas")


## Step 5: Sequential Chain
Create a SequentialChain that combines both the summary and idea generation Chains.

Although `SequentialChain` supports multiple inputs and outputs per step, our pipeline is simple: the summary chain feeds directly into the idea chain.

<img src="images/Chains_simple_seq.png" alt="Image Alt Text" width="600">

In [7]:
from langchain.chains import SequentialChain

overall_chain = SequentialChain(
    chains=[summary_chain, idea_chain],
    input_variables=["transcript", "info_about_me", "info_about_audience"],
    output_variables=["summary", "content_ideas"],
    verbose=True
)
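
Conceptually, the sequential chain keeps a shared dictionary of named values: each step reads what it needs and writes its result under its output key, where later steps can find it. The plain-Python sketch below illustrates that data flow only. It is not LangChain's implementation, and `run_sequential`, `fake_summary`, and `fake_ideas` are hypothetical stand-ins for the chain and the two LLM calls.

```python
from typing import Callable, Dict, List, Tuple

# Each step is (function, output_key); the function receives all values known so far.
Step = Tuple[Callable[[Dict[str, str]], str], str]


def run_sequential(steps: List[Step], inputs: Dict[str, str]) -> Dict[str, str]:
    """Run steps in order, adding each step's result to the shared dictionary."""
    state = dict(inputs)
    for func, output_key in steps:
        state[output_key] = func(state)
    return state


def fake_summary(state: Dict[str, str]) -> str:
    # Stand-in for the summary LLM call.
    return f"summary of {len(state['transcript'])} chars for {state['info_about_me']}"


def fake_ideas(state: Dict[str, str]) -> str:
    # Stand-in for the idea LLM call; it reads the "summary" written by the previous step.
    return f"ideas from [{state['summary']}] for {state['info_about_audience']}"


result = run_sequential(
    [(fake_summary, "summary"), (fake_ideas, "content_ideas")],
    {
        "transcript": "hello world",
        "info_about_me": "an engineer",
        "info_about_audience": "students",
    },
)
print(result["summary"])
print(result["content_ideas"])
```

This mirrors why `info_about_me` only needs to be supplied once: both steps read it from the same set of inputs.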


## Step 6: Gradio Interface with Additional Inputs
Build a Gradio interface with fields for the video URL, information about you, and information about your target audience.

In [8]:
ABOUT_ME = """An NLP Engineer with a background in Full-Stack Development, \
specialized in Large Language Models and Generative AI. \
Creates educational content and shares it on LinkedIn, YouTube and Medium."""

TARGET_AUDIENCE = """Aspiring NLP engineers, data scientists, and tech enthusiasts who are interested in leveraging cutting-edge AI technologies. \
They look for practical guides and insights into building projects with Large Language Models."""

In [9]:
import gradio as gr

def execute_chain(url: str, info_about_me: str, info_about_audience: str):
    # get_transcript_and_metadata returns a (transcript, title) tuple, so unpack it.
    # Testing the tuple itself would always pass, even for (None, None).
    transcript, _title = get_transcript_and_metadata(url)
    if transcript:
        inputs = {
            "transcript": transcript,
            "info_about_me": info_about_me,
            "info_about_audience": info_about_audience,
        }
        output = overall_chain(inputs)
        return output["summary"], output["content_ideas"]
    else:
        return "Failed to load transcript.", "Cannot generate content ideas without transcript."

demo = gr.Interface(
    fn=execute_chain,
    inputs=[
        "text",
        gr.Textbox(lines=4, value=ABOUT_ME, label="About Me"),
        gr.Textbox(lines=2, value=TARGET_AUDIENCE, label="Target Audience"),
    ],
    outputs=[
        gr.Textbox(label="Video Summary"), 
        gr.Textbox(label="Content Ideas"),
    ],
)

demo.launch(debug=True)




Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.


Keyboard interruption in main thread... closing server.


