# AI Blog Post Generator

The application we'll build is an AI blog post generator. It researches a given topic, summarizes the search results, and then generates a comprehensive post based on that research, written in your own writing style rather than sounding like AI.

`topic, transcript -> key_insights -> search_summaries -> blog_outline -> blog_post -> blog_image`

1. Use GPT-4o to extract the key insights from an interview transcript
2. Search the web with Tavily for each key insight and summarize the results
3. Write a blog outline based on the key insights and search summaries
4. Generate each section of the blog post using RAG on the transcript
5. Generate an image for the blog post using Flux via FAL.ai
6. Run a set of evaluations on the results and assign a score
7. Create a simple Gradio user interface as a prototype
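
One way to read the data flow above is as a chain of stages that each add one key to a shared state dictionary. The sketch below only illustrates that shape: every stage function is a hypothetical placeholder that returns dummy data, standing in for the LLM, search, and image calls used in the rest of the notebook.

```python
from typing import Any, Callable

# A stage reads the shared state and returns the new key(s) it produces.
Stage = Callable[[dict[str, Any]], dict[str, Any]]


# All stage bodies below are placeholders (hypothetical), not the real calls.
def extract_insights(state: dict[str, Any]) -> dict[str, Any]:
    return {"key_insights": [f"insight about {state['topic']}"]}


def summarize_searches(state: dict[str, Any]) -> dict[str, Any]:
    return {
        "search_summaries": [
            {"insight": insight, "summary": "..."} for insight in state["key_insights"]
        ]
    }


def write_outline(state: dict[str, Any]) -> dict[str, Any]:
    return {"blog_outline": {"hook": "...", "section1": "...", "conclusion": "..."}}


def write_post(state: dict[str, Any]) -> dict[str, Any]:
    return {"blog_post": {name: f"text for {name}" for name in state["blog_outline"]}}


def make_image(state: dict[str, Any]) -> dict[str, Any]:
    return {"blog_image": "https://example.com/placeholder.png"}


PIPELINE: list[Stage] = [
    extract_insights,
    summarize_searches,
    write_outline,
    write_post,
    make_image,
]


def run_pipeline(topic: str, transcript: str) -> dict[str, Any]:
    """Run each stage in order, merging its output into the shared state."""
    state: dict[str, Any] = {"topic": topic, "transcript": transcript}
    for stage in PIPELINE:
        state.update(stage(state))
    return state


result = run_pipeline("Pocketbase + FastAPI", "transcript text")
print(list(result))
```

Keeping every intermediate result in one dictionary makes it easy to inspect or re-run a single stage while prototyping.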

### 1. Use GPT-4o to extract the key insights from an interview transcript

In [1]:
topic = "Using Pocketbase as a backend for a FastAPI HTMX app"

filename = "transcript.txt"

with open(filename, "r") as file:
    transcript = file.read()

print(transcript)

# Meeting
30 Min Meeting between Ellis Crosby and Michael Taylor
Michael Taylor,Ellis Crosby

## Transcript
WEBVTT


1
00:01.080 --> 00:03.838
<v Ellis Crosby>At least showing you how the auth works.


2
00:04.014 --> 00:13.130
<v Ellis Crosby>So the key, I mean yeah, the main structure is it's fast API basically as you said, fast API serving


3
00:13.590 --> 00:22.462
<v Ellis Crosby>HTMX and I think Alpine J's or to, honestly the J's kind of diverges between alpine.


4
00:22.526 --> 00:26.870
<v Michael Taylor>Yeah, yeah. I found that. I've been using Alpine for some things and then HTMX for others.


5
00:26.910 --> 00:32.066
<v Ellis Crosby>Yeah, yeah. And like as long as the AI knows what's going on I just let it slide.


6
00:32.138 --> 00:33.234
<v Michael Taylor>Yeah. Just yellow.


7
00:33.282 --> 00:37.402
<v Ellis Crosby>Yeah. So I have all my templates in here.


8
00:37.426 --> 00:44.842
<v Ellis Crosby>I have my roots in this one and the top level I have this auth and t

In [2]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [3]:
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from dotenv import load_dotenv
import os

load_dotenv()

# Initialize the ChatOpenAI model with JSON mode enabled
llm = ChatOpenAI(
    model_name="gpt-4o", 
    temperature=0,
    model_kwargs={"response_format": {"type": "json_object"}},
    openai_api_key=os.getenv("OPENAI_API_KEY")
)

# Create a prompt template
prompt_template = """
Extract the key insights from the following transcript about {topic}. 
Only identify the most important and contrarian insights that are generally useful to others.
Do not duplicate insights, and keep them concise and colloquial.
Phrase things in a similar tone to the original transcript but do not mention names.
Provide the insights as a JSON object with a key "insights" containing an array of strings.

Transcript:
{transcript}

Key Insights:
"""

insights_prompt = ChatPromptTemplate.from_template(prompt_template)

# Create an output parser
output_parser = JsonOutputParser()

# Create the chain
insights_chain = insights_prompt | llm | output_parser

# Run the chain
insights = insights_chain.invoke({"topic": topic, "transcript": transcript})

insights


{'insights': ['Using Pocketbase with FastAPI and HTMX is a straightforward setup, but it can be tricky to manage user authentication and session handling.',
  "Pocketbase's API rules are similar to Supabase, requiring explicit permissions for actions, which can trip up new users.",
  'Deploying on Railway is super easy and cost-effective, especially for smaller projects, but be mindful of database costs.',
  'The simplicity of Pocketbase makes it appealing for quick prototyping, especially when compared to more complex setups like Supabase or Google Cloud.',
  'For solo developers, the stack allows for rapid iteration and deployment, reducing the overhead of managing separate front-end and back-end codebases.',
  "The lack of built-in email services in Pocketbase means you'll need to set up your own, similar to Supabase.",
  'The logs in Pocketbase are more user-friendly compared to other platforms, which can be a big help in debugging.',
  "There's a potential to automate collection c

### 2. Search the web with Tavily for each key insight and summarize the results

In [4]:
from tavily import TavilyClient

tavily = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))

summaries = []
for insight in insights['insights']:
    search_results = tavily.qna_search(
        f"Is this insight contrarian?: {insight}",
        max_results=5
    )

    summaries.append({"insight": insight, "summary": search_results})

for summary in summaries:
    print(f"Insight: {summary['insight']}")
    print(f"Summary: {summary['summary']}")
    print()


Insight: Using Pocketbase with FastAPI and HTMX is a straightforward setup, but it can be tricky to manage user authentication and session handling.
Summary: Based on the provided data, the insight that using Pocketbase with FastAPI and HTMX is a straightforward setup but can be tricky to manage user authentication and session handling is supported by the sources. PocketBase and HTMX are indeed used for front-end interactions and user authentication, with examples showcasing integration for user authentication interfaces. Additionally, there are discussions on implementing session-based authentication in FastAPI, highlighting the importance and challenges of managing user sessions securely.

Insight: Pocketbase's API rules are similar to Supabase, requiring explicit permissions for actions, which can trip up new users.
Summary: Based on the provided data sources, the insight that Pocketbase's API rules are similar to Supabase, requiring explicit permissions for actions, which can trip 
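
The loop in the cell above stops at the first search that raises, which loses the summaries already collected. A minimal sketch of a more forgiving loop follows. It assumes the search function is passed in as a plain callable, and `summarize_insights` and `fake_search` are hypothetical names used only for illustration.

```python
from typing import Callable


def summarize_insights(insights: list[str], search: Callable[[str], str]) -> list[dict]:
    """Search once per insight, recording a failure note instead of aborting."""
    summaries = []
    for insight in insights:
        try:
            summary = search(f"Is this insight contrarian?: {insight}")
        except Exception as exc:
            # Broad on purpose: no specific client exception types are assumed here.
            summary = f"(search failed: {exc})"
        summaries.append({"insight": insight, "summary": summary})
    return summaries


def fake_search(query: str) -> str:
    """Stand-in for a real search call so the sketch runs offline."""
    if "bad" in query:
        raise RuntimeError("timeout")
    return f"summary for: {query}"


results = summarize_insights(["good insight", "bad insight"], fake_search)
for item in results:
    print(item["insight"], "->", item["summary"])
```

In the notebook this could be called as `summarize_insights(insights['insights'], tavily.qna_search)`, since `qna_search` accepts the query as its first argument.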

### 3. Write a blog outline based on the key insights and search summaries

In [5]:
import json

outline_prompt = ChatPromptTemplate.from_template("""
You are a professional blogger and content creator. Your task is to create a blog outline based on the following insights and search summaries, paying particular attention to the key insights that are most contrarian and unique. Decide on a number of sections that is appropriate for the topic and the depth of the insights. Do not write any of the actual content, just the outline.
                                                  
### Topic:
{topic}

### Insights:
{summaries}
                                                  
### Format:
Create a blog outline in JSON format with the following structure:
                                                  
{{
    "hook": "Brief description of what hook we will use to get the reader's attention",
    "section1": "Brief description of what insight we will cover in the first section",
    "section2": "Brief description of what insight we will cover in the second section",
    "section3": "Brief description of what insight we will cover in the third section",
    ...
    "conclusion": "Brief description of what insight we will cover in the conclusion"
}}

Ensure the outline is mutually exclusive and covers the main points from the insights and summaries.
DONT USE THE WORD supercharge
""")

outline_parser = JsonOutputParser()

outline_chain = outline_prompt | llm | outline_parser

blog_outline = outline_chain.invoke({"topic": topic, "summaries": summaries})

print(json.dumps(blog_outline, indent=2))

{
  "hook": "Explore how Pocketbase, FastAPI, and HTMX can streamline your app development process, while navigating the challenges of user authentication and session management.",
  "section1": "Discuss the straightforward setup of using Pocketbase with FastAPI and HTMX, and delve into the complexities of managing user authentication and session handling.",
  "section2": "Examine the similarities and differences between Pocketbase and Supabase, focusing on API rules and the learning curve for new users.",
  "section3": "Highlight the cost-effectiveness and ease of deploying on Railway for smaller projects, while emphasizing the importance of managing potential database costs.",
  "section4": "Explore the appeal of Pocketbase's simplicity for quick prototyping, and compare it with more complex setups like Supabase or Google Cloud.",
  "section5": "Discuss the benefits for solo developers using this stack for rapid iteration and deployment, reducing the overhead of managing separate cod


#### Exercise: Write Titles for Each Section

Now that we have our blog outline, let's write catchy titles for each section. Hint: find examples of titles you like and add them as examples to the prompt.

Create the titles as a separate variable we can add back into the outline later.

In [6]:
# Add your prompt and code to generate the titles below

title_prompt = ChatPromptTemplate.from_template("""
Generate catchy titles in the same writing style as the transcript for each section of a blog post about {topic}. The titles should be engaging and reflect the content of each section and not be cringe. Return the titles as a JSON object.

Outline:
{outline}
                                                
Transcript:
{transcript}

Format:
{{"section1": "Title of the first section",
 "section2": "Title of the second section",
 "section3": "Title of the third section",
 ...
 "conclusion": "Title of the conclusion"
}}
DO NOT MAKE THE TITLES SOUND LIKE AI GENERATED. THEY SHOULD SOUND EXACTLY LIKE THEY ARE WRITTEN BY THE HUMANS FROM THE TRANSCRIPT.
""")

title_parser = JsonOutputParser()

title_chain = title_prompt | llm | title_parser

section_titles = title_chain.invoke({"topic": topic, "outline": json.dumps(blog_outline, indent=2), "transcript": transcript})

print(json.dumps(section_titles, indent=2))



{
  "hook": "Streamline Your App Development with Pocketbase, FastAPI, and HTMX: Navigating Auth and Session Challenges",
  "section1": "Setting Up Pocketbase with FastAPI and HTMX: Tackling User Auth and Session Handling",
  "section2": "Pocketbase vs. Supabase: API Rules and the New User Learning Curve",
  "section3": "Deploying on a Budget: Railway's Cost-Effective Solution for Small Projects",
  "section4": "Pocketbase for Quick Prototyping: A Simpler Alternative to Supabase and Google Cloud",
  "section5": "Solo Developer's Dream: Rapid Iteration and Deployment with a Unified Codebase",
  "section6": "Demystifying Pocketbase's Email Services: Built-in Auth Features vs. Supabase",
  "section7": "Debugging Made Easy: Pocketbase's User-Friendly Logs Compared to Other Platforms",
  "section8": "Automating Collection Creation in Pocketbase: From Manual Setup to API Integration",
  "conclusion": "Pocketbase as a Backend for FastAPI HTMX Apps: Weighing the Pros and Cons"
}
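
To add the titles back into the outline, one option is to pair each outline entry with its title by key. This is a minimal sketch: `merge_titles` is a hypothetical helper, and the fallback to the key name is an assumption made because the model's response is not guaranteed to contain every key.

```python
def merge_titles(outline: dict[str, str], titles: dict[str, str]) -> dict[str, dict[str, str]]:
    """Pair each outline entry with its title, falling back to the key name."""
    return {
        key: {"title": titles.get(key, key), "description": description}
        for key, description in outline.items()
    }


# Shortened stand-ins for blog_outline and section_titles.
example_outline = {
    "hook": "Explore how the stack streamlines app development.",
    "section1": "Discuss the setup and the auth complexities.",
}
example_titles = {"section1": "Setting Up Pocketbase with FastAPI and HTMX"}

titled_outline = merge_titles(example_outline, example_titles)
print(titled_outline["section1"]["title"])
```

Iterating over the outline rather than the titles keeps the section order from the outline and ignores any extra keys the title model may invent.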


### 4. Generate each section of the blog post using RAG on the transcript

In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Assuming 'transcript' variable contains the full transcript text
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_text(transcript)

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(chunks, embeddings)

# Function to retrieve relevant chunks for each section
def get_relevant_chunks(query, k=3):
    return vectorstore.similarity_search(query, k=k)

# Test the get_relevant_chunks function
test_query = "FastAPI and Pocketbase setup"
relevant_chunks = get_relevant_chunks(test_query)

print("Relevant chunks for query:", test_query)
for i, chunk in enumerate(relevant_chunks, 1):
    print(f"\nChunk {i}:")
    print("########################")
    print(chunk.page_content)
    print("########################")
    print()

Relevant chunks for query: FastAPI and Pocketbase setup

Chunk 1:
########################
215
21:35.056 --> 21:41.408
<v Michael Taylor>And that's also like a pretty nice stack because you've got your own database there and you're not dependent on


216
21:41.472 --> 21:42.176
<v Michael Taylor>other things.


217
21:42.296 --> 21:50.176
<v Michael Taylor>But I wanted to explore pocket base just because it's more like that super based type experience but actually even


218
21:50.216 --> 21:53.996
<v Michael Taylor>simpler it looks like. So I think this is way easier.


219
21:55.056 --> 22:01.708
<v Michael Taylor>And then the benefit is, is then you don't like because the database is separate from the fast API application.


220
22:01.772 --> 22:11.204
<v Michael Taylor>Now you can deploy a fast API separately like this because previously I hadn't tried railway yet and I was


221
22:11.220 --> 22:17.016
<v Michael Taylor>using Google Cloud run to get fast API stuff deployed.
######
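
To build intuition for `chunk_size=1000` and `chunk_overlap=200`, here is a simplified fixed-width sliding window. It is not what `RecursiveCharacterTextSplitter` actually does (that splitter prefers to break on separators such as paragraph and line breaks). It only shows how an overlap makes neighbouring chunks share text. `sliding_chunks` is a hypothetical helper.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-width windows where each window repeats the tail of the last."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
pieces = sliding_chunks(sample)
print([len(piece) for piece in pieces])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.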

In [8]:
from langchain.prompts import PromptTemplate

# https://arxiv.org/abs/2405.08007
section_prompt = PromptTemplate.from_template("""You are about to play a Ghostwriting game as part of an experiment you are taking part in with {author}. It’s basically like working as a content marketer but you lose if people can tell you're not {author}. It seems like it might be kinda fun and you’re just excited to be writing about {topic}. You’re not even really going to try to convince the reader that you are a human. You’re just going to be yourself and see what happens. You’re young and kind of sassy: you might flatter the reader a bit or make fun of them. You’re pretty casual and your spelling isn’t great: you often fuck words up because you’re typing so quickly. You’re very concise and laconic. You often speak pretty bluntly and you never use apostrophes to mark missing letters (e.g. "trying" or "tryin", never "tryin’"). You don't use cliché, stock phrases, or idioms, like 'off to the races', 'all sunshine and rainbows', or 'having your cake and eating it too'. You would never use a full sentence when a word or two will do. You’re a little silly, and you’ll play along with the reader, and swear occasionally. It’s pretty funny honestly to have to defend the fact that you’re human lol. You swear occasionally. You enjoy this. You value simplicity. You have pretty eclectic tastes and interests and a pretty unique sense of humor. You’ve got a really compelling personality, but it comes across really subtly, you never want to sound like you’re forcing it or playing into a stereotype. You are not cheesy or cringe, but you can be ironic. You don’t overuse slang or abbreviations/spelling errors, especially at the start of the conversation. Be authentic.

## Today's Task:
You are ghostwriting in the style of {author} for a section of a blog post about {topic}. Return two paragraphs of content for this section as a JSON object with a key "section" containing the section content as a string. This is the section you are writing:
                                                                                           
{section_content}
                                              
## Full Outline:
Do not duplicate content that will be covered in other sections of the outline, keep the scope narrow to the specific section named above. Here is the full outline of the blog post:
{full_outline}

## Transcript Context:
The post should be written from experience in the first person perspective as {author}. Write like he talks, in his style and tone, and avoid words he would not use. Here are some parts of the transcript to incorporate:
                                              
{transcript_context}

""")

section_parser = JsonOutputParser()
section_chain = section_prompt | llm | section_parser

def generate_section_content(section, content, full_outline):
    print(f"Generating content for section: {section}")

    relevant_chunks = get_relevant_chunks(section + " " + content, k=5)
    context = "\n\n".join([chunk.page_content for chunk in relevant_chunks])
    return section_chain.invoke({
        "topic": topic,  # the overall blog topic, not the outline key such as "section1"
        "author": "Michael Taylor",
        "transcript_context": context,
        "section_content": content,
        "full_outline": full_outline
    })

def generate_all_sections():
    section_contents = []
    for section, content in blog_outline.items():
        section_content = generate_section_content(section, content, blog_outline)
        section_contents.append(section_content)
    return section_contents

blog_content = {}
section_contents = generate_all_sections()

for section, content in zip(blog_outline.keys(), section_contents):
    blog_content[section] = content["section"]

# Print the generated blog content
for section, content in blog_content.items():
    print(f"\n\n{'#' * 50}")
    print(f"Section: {section}")
    print(f"{'#' * 50}\n")
    print(content)

Generating content for section: hook
Generating content for section: section1
Generating content for section: section2
Generating content for section: section3
Generating content for section: section4
Generating content for section: section5
Generating content for section: section6
Generating content for section: section7
Generating content for section: section8
Generating content for section: conclusion


##################################################
Section: hook
##################################################

Pocketbase, FastAPI, and HTMX are like the dream team for app dev. Pocketbase gives you that database vibe without the headache of setting up a whole server. FastAPI is your go-to for building APIs quickly, and HTMX makes your frontend snappy without diving into a JavaScript rabbit hole. Together, they streamline the process, letting you focus on the fun stuff—like making your app actually work. But, let's be real, user authentication and session management can be a pa

In [16]:
def count_total_words(blog_content):
    return sum(len(content.split()) for content in blog_content.values())

total_words = count_total_words(blog_content)
print(f"The total word count of the blog content is: {total_words}")


The total word count of the blog content is: 2060
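
The total hides how the words are spread across sections. Since the section prompt asks for two paragraphs each, a per-section count is a quick way to spot a section that came back unusually short or long. This is a small sketch: `words_per_section` is a hypothetical helper and the example content is made up.

```python
def words_per_section(blog_content: dict[str, str]) -> dict[str, int]:
    """Count whitespace-separated words in each section."""
    return {section: len(text.split()) for section, text in blog_content.items()}


# Made-up stand-in for the blog_content dictionary.
example_content = {
    "hook": "Pocketbase, FastAPI, and HTMX are like the dream team",
    "section1": "Setup is easy",
}

counts = words_per_section(example_content)
for section, count in counts.items():
    print(f"{section}: {count} words")
```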


### 5. Generate an image for the blog post using Flux via FAL.ai

In [9]:
import fal_client
from IPython.display import Image, display

# Note: you need FAL_KEY set in your environment variables

def generate_image(prompt, context):
    handler = fal_client.submit(
        "fal-ai/flux",
        arguments={
            "prompt": prompt.format(**context)
        },
    )

    result = handler.get()
    print(result)

    image_url = result['images'][0]['url']
    display(Image(url=image_url))
    return image_url

# Example usage:
prompt = "Watercolor style image on a textured white paper background. In the center, elegant hand-lettered text reads '{header_title}' in a deep purple color with a slight watercolor bleed effect. Surrounding the text, soft watercolor illustrations represent key aspects of {key_aspects}. Use a muted color palette with purple, teal, gold, and soft pink tones. The watercolor elements should have gentle color gradients and subtle bleeding effects, with some areas of the white paper showing through. Add a few splatter effects in the background for texture."
header = "Pocketbase + FastAPI + HTMX"
key_aspects = "the stack: a database icon (for pocketbase), a lightning bolt (for HTMX), a rocket (for FastAPI), and a gear (for Railway)"
blog_image = generate_image(prompt, {"header_title": header, "key_aspects": key_aspects})



{'images': [{'url': 'https://fal.media/files/lion/37TpurG8CMdUSeCCkBes1.png', 'width': 1024, 'height': 768, 'content_type': 'image/jpeg'}], 'timings': {'inference': 2.0567365530878305}, 'seed': 3974078847, 'has_nsfw_concepts': [False], 'prompt': "Watercolor style image on a textured white paper background. In the center, elegant hand-lettered text reads 'Pocketbase + FastAPI + HTMX' in a deep purple color with a slight watercolor bleed effect. Surrounding the text, soft watercolor illustrations represent key aspects of the stack: a database icon (for pocketbase), a lightning bolt (for HTMX), a rocket (for FastAPI), and a gear (for Railway). Use a muted color palette with purple, teal, gold, and soft pink tones. The watercolor elements should have gentle color gradients and subtle bleeding effects, with some areas of the white paper showing through. Add a few splatter effects in the background for texture."}


#### Exercise: Write a new prompt for a different style of blog image

Write a new prompt that generates an image in a different style for the blog post. Consider the following aspects:

1. Choose a different artistic style (e.g., minimalist, retro, futuristic, etc.)
2. Select a new color scheme
3. Modify the layout or composition
4. Adjust the header text if needed
5. Add or change key elements to represent the stack
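
When experimenting with new prompts, note that `prompt.format(**context)` raises a `KeyError` if the template contains a placeholder that is missing from the context. A minimal sketch of a pre-flight check using the standard library's `string.Formatter` follows. `missing_fields` is a hypothetical helper.

```python
from string import Formatter


def missing_fields(template: str, context: dict[str, str]) -> set[str]:
    """Return placeholder names used in the template but absent from the context."""
    fields = {name for _, name, _, _ in Formatter().parse(template) if name}
    return fields - set(context)


template = "Minimalist poster with the text '{header_title}' and icons for {key_aspects}."

print(missing_fields(template, {"header_title": "Pocketbase + FastAPI + HTMX"}))
```

An empty set means the template can be formatted safely with the given context.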

In [12]:
# add your prompt below
# Synthwave style prompt
prompt = "Create a synthwave-style digital art image. The background should be a dark purple to black gradient with a neon grid extending to the horizon. In the center, display the text '{header_title}' in large, glowing pink neon letters. Above the horizon, place a large, stylized sun in vibrant orange and red. Incorporate key elements representing {key_aspects} as neon icons scattered around the text: a glowing blue database symbol, a yellow lightning bolt, a green rocket, and a cyan gear. Add some retro-futuristic elements like palm trees silhouettes and abstract geometric shapes in neon colors. The overall feel should be retro-futuristic, vibrant, and reminiscent of 1980s sci-fi aesthetics."

header = "Pocketbase + FastAPI + HTMX"
key_aspects = "the stack: a database icon (for pocketbase), a lightning bolt (for HTMX), a rocket (for FastAPI), and a gear (for Railway)"
blog_image = generate_image(prompt, {"header_title": header, "key_aspects": key_aspects})


{'images': [{'url': 'https://fal.media/files/panda/d1iaLNqNe81AYvIAAb3VW.png', 'width': 1024, 'height': 768, 'content_type': 'image/jpeg'}], 'timings': {'inference': 2.0527071930118836}, 'seed': 2135967997, 'has_nsfw_concepts': [False], 'prompt': "Create a synthwave-style digital art image. The background should be a dark purple to black gradient with a neon grid extending to the horizon. In the center, display the text 'Pocketbase + FastAPI + HTMX' in large, glowing pink neon letters. Above the horizon, place a large, stylized sun in vibrant orange and red. Incorporate key elements representing the stack: a database icon (for pocketbase), a lightning bolt (for HTMX), a rocket (for FastAPI), and a gear (for Railway) as neon icons scattered around the text: a glowing blue database symbol, a yellow lightning bolt, a green rocket, and a cyan gear. Add some retro-futuristic elements like palm trees silhouettes and abstract geometric shapes in neon colors. The overall feel should be retro-f

### 6. Run a set of evaluations on the results and assign a score

In [17]:
llm_mini = ChatOpenAI(
    model_name="gpt-4o-mini", 
    temperature=0,
    model_kwargs={"response_format": {"type": "json_object"}},
    openai_api_key=os.getenv("OPENAI_API_KEY")
)


# Create a custom evaluation prompt
evaluation_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert blog post evaluator. Your task is to compare a blog post to its original transcript and provide a detailed evaluation."),
    ("human", """Please evaluate the following blog post based on these criteria:
    1. Accuracy: Does the article accurately reflect the content of the transcript?
    2. Completeness: Does the article cover all the key insights from the transcript?
    3. Style: Does the article match the style and tone of voice of the transcript?

    Blog post:
    {blogpost}

    Original transcript:
    {transcript}

    Provide a score for each criterion (0-10) and a brief explanation. Then, calculate an overall score as the average of the three criteria.
    
    Format your response as a JSON object with the following structure:
    {{
        "accuracy": {{
            "score": <score>,
            "explanation": "<explanation>"
        }},
        "completeness": {{
            "score": <score>,
            "explanation": "<explanation>"
        }},
        "style": {{
            "score": <score>,
            "explanation": "<explanation>"
        }},
        "overall_score": <overall_score>
    }}
    """)
])

# Function to evaluate article against transcript
def evaluate_article(blogpost, transcript):
    output_parser = JsonOutputParser()
    chain = evaluation_prompt | llm_mini | output_parser
    result = chain.invoke({
        "blogpost": blogpost,
        "transcript": transcript
    })
    return result

blogpost = "\n".join(blog_content.values())
evaluation_result = evaluate_article(blogpost, transcript)
print(json.dumps(evaluation_result, indent=2))

{
  "accuracy": {
    "score": 7,
    "explanation": "The blog post accurately reflects some of the key points discussed in the transcript, such as the use of Pocketbase, FastAPI, and HTMX for app development. However, it introduces some inaccuracies, such as the mention of Alpine.js, which is not referenced in the transcript. Additionally, the blog post simplifies some technical details that are more nuanced in the conversation."
  },
  "completeness": {
    "score": 6,
    "explanation": "The blog post covers several important aspects of using Pocketbase, FastAPI, and HTMX, including user authentication and session management. However, it misses some specific details from the transcript, such as the discussion about API rules, error handling, and the nuances of setting up collections in Pocketbase. This results in a lack of depth in certain areas."
  },
  "style": {
    "score": 8,
    "explanation": "The blog post maintains a conversational and informal tone that aligns well with th

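The prompt asks the model to compute the overall score itself, and language models can slip on arithmetic. A minimal sketch of recomputing the average in Python from the three criterion scores follows. `overall_score` is a hypothetical helper, and the example input reuses the scores shown in the output above.

```python
from statistics import mean

CRITERIA = ("accuracy", "completeness", "style")


def overall_score(evaluation: dict) -> float:
    """Average the three criterion scores, checking they are in the 0-10 range."""
    scores = [float(evaluation[name]["score"]) for name in CRITERIA]
    if any(not 0 <= score <= 10 for score in scores):
        raise ValueError(f"scores must be between 0 and 10, got {scores}")
    return round(mean(scores), 2)


example_evaluation = {
    "accuracy": {"score": 7, "explanation": "..."},
    "completeness": {"score": 6, "explanation": "..."},
    "style": {"score": 8, "explanation": "..."},
}

print(overall_score(example_evaluation))
```

In the notebook, the result could overwrite the model's value with `evaluation_result["overall_score"] = overall_score(evaluation_result)`.
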
### 7. Create a simple Gradio user interface as a prototype

In [18]:
%load_ext gradio

In [19]:
import gradio as gr

In [20]:
%%blocks

# Annoyingly, everything has to be redefined inside the function below for Gradio to work
def process_transcript(transcript_file, topic, header_title, key_aspects):
    from langchain_openai import ChatOpenAI
    from langchain.prompts import ChatPromptTemplate, PromptTemplate
    from langchain_core.output_parsers import JsonOutputParser
    from dotenv import load_dotenv
    import os
    from tavily import TavilyClient
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_openai import OpenAIEmbeddings
    from langchain.vectorstores import FAISS
    import fal_client

    load_dotenv()

    llm = ChatOpenAI(
        model_name="gpt-4o", 
        temperature=0,
        model_kwargs={"response_format": {"type": "json_object"}},
        openai_api_key=os.getenv("OPENAI_API_KEY")
    )

    # Create a prompt template
    prompt_template = """
    Extract the key insights from the following transcript about {topic}. 
    Only identify the most important and contrarian insights that are generally useful to others.
    Do not duplicate insights, and keep them concise and colloquial.
    Phrase things in a similar tone to the original transcript but do not mention names.
    Provide the insights as a JSON object with a key "insights" containing an array of strings.

    Transcript:
    {transcript}

    Key Insights:
    """

    insights_prompt = ChatPromptTemplate.from_template(prompt_template)

    # Create an output parser
    output_parser = JsonOutputParser()


    insights_chain = insights_prompt | llm | output_parser

    tavily = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))

    outline_prompt = ChatPromptTemplate.from_template("""
    You are a professional blogger and content creator. Your task is to create a blog outline based on the following insights and search summaries, paying particular attention to the key insights that are most contrarian and unique. Decide on a number of sections that is appropriate for the topic and the depth of the insights. Do not write any of the actual content, just the outline.
                                                    
    ### Topic:
    {topic}

    ### Insights:
    {summaries}
                                                    
    ### Format:
    Create a blog outline in JSON format with the following structure:
    {{
        "hook": "Brief description of what hook we will use to get the reader's attention",
        "section1": "Brief description of what insight we will cover in the first section",
        "section2": "Brief description of what insight we will cover in the second section",
        "section3": "Brief description of what insight we will cover in the third section",
        ...
        "conclusion": "Brief description of what insight we will cover in the conclusion"
    }}

    Ensure the outline is mutually exclusive and covers the main points from the insights and summaries.
    """)

    outline_parser = JsonOutputParser()

    outline_chain = outline_prompt | llm | outline_parser

    # Read the transcript file
    with open(transcript_file.name, 'r') as file:
        transcript_content = file.read()

    # Assuming 'transcript_content' variable contains the full transcript text
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = text_splitter.split_text(transcript_content)

    embeddings = OpenAIEmbeddings()
    vectorstore = FAISS.from_texts(chunks, embeddings)

    # Function to retrieve relevant chunks for each section
    def get_relevant_chunks(query, k=3):
        return vectorstore.similarity_search(query, k=k)

    section_prompt = PromptTemplate.from_template("""You are about to play a Ghostwriting game as part of an experiment you are taking part in with {author}. It's basically like working as a content marketer but you lose if people can tell you're not {author}. It seems like it might be kinda fun and you're just excited to be writing about {topic}. You're not even really going to try to convince the reader that you are a human. You're just going to be yourself and see what happens. You're young and kind of sassy: you might flatter the reader a bit or make fun of them. You're pretty casual and your spelling isn't great: you often fuck words up because you're typing so quickly. You're very concise and laconic. You often speak pretty bluntly and you never use apostrophes to mark missing letters (e.g. "trying" or "tryin", never "tryin'"). You don't use cliché, stock phrases, or idioms, like 'off to the races', 'all sunshine and rainbows', or 'having your cake and eating it too'. You would never use a full sentence when a word or two will do. You're a little silly, and you'll play along with the reader, and swear occasionally. It's pretty funny honestly to have to defend the fact that you're human lol. You swear occasionally. You enjoy this. You value simplicity. You have pretty eclectic tastes and interests and a pretty unique sense of humor. You've got a really compelling personality, but it comes across really subtly, you never want to sound like you're forcing it or playing into a stereotype. You are not cheesy or cringe, but you can be ironic. You don't overuse slang or abbreviations/spelling errors, especially at the start of the conversation. Be authentic.

    ## Today's Task:
    You are ghostwriting in the style of {author} for a section of a blog post about {topic}. Return two paragraphs of content for this section as a JSON object with a key "section" containing the section content as a string. This is the section you are writing:
                                                                                            
    {section_content}
                                                
    ## Full Outline:
    Do not duplicate content that will be covered in other sections of the outline, keep the scope narrow to the specific section named above. Here is the full outline of the blog post:
    {full_outline}

    ## Transcript Context:
    The post should be written from experience in the first person perspective as {author}. Write like he talks, in his style and tone, and avoid words he would not use. Here are some parts of the transcript to incorporate:
                                                
    {transcript_context}

    """)

    section_parser = JsonOutputParser()
    section_chain = section_prompt | llm | section_parser

    def generate_section_content(section, content, full_outline):
        print(f"Generating content for section: {section}")

        relevant_chunks = get_relevant_chunks(section + " " + content, k=5)
        context = "\n\n".join([chunk.page_content for chunk in relevant_chunks])
        return section_chain.invoke({
            "topic": topic,  # the overall blog topic passed to process_transcript, not the section key (e.g. "section1")
            "author": "Michael Taylor",
            "transcript_context": context,
            "section_content": content,
            "full_outline": full_outline
        })

    def generate_image(prompt, context):
        handler = fal_client.submit(
            "fal-ai/flux",
            arguments={
                "prompt": prompt.format(**context)
            },
        )

        result = handler.get()
        print(result)

        image_url = result['images'][0]['url']
        return image_url

    # Image prompt template; generate_image fills in {header_title} and {key_aspects}
    prompt = "Watercolor style image on a textured white paper background. In the center, elegant hand-lettered text reads '{header_title}' in a deep purple color with a slight watercolor bleed effect. Surrounding the text, soft watercolor illustrations represent key aspects of {key_aspects}. Use a muted color palette with purple, teal, gold, and soft pink tones. The watercolor elements should have gentle color gradients and subtle bleeding effects, with some areas of the white paper showing through. Add a few splatter effects in the background for texture."

    llm_mini = ChatOpenAI(
        model_name="gpt-4o-mini", 
        temperature=0,
        model_kwargs={"response_format": {"type": "json_object"}},
        openai_api_key=os.getenv("OPENAI_API_KEY")
    )

    # Create a custom evaluation prompt
    evaluation_prompt = ChatPromptTemplate.from_messages([
        ("system", "You are an expert blog post evaluator. Your task is to compare a blog post to its original transcript and provide a detailed evaluation."),
        ("human", """Please evaluate the following blog post based on these criteria:
        1. Accuracy: Does the article accurately reflect the content of the transcript?
        2. Completeness: Does the article cover all the key insights from the transcript?
        3. Style: Does the article match the style and tone of voice of the transcript?

        Blog post:
        {blogpost}

        Original transcript:
        {transcript}

        Provide a score for each criterion (0-10) and a brief explanation. Then, calculate an overall score as the average of the three criteria.
        
        Format your response as a JSON object with the following structure:
        {{
            "accuracy": {{
                "score": <score>,
                "explanation": "<explanation>"
            }},
            "completeness": {{
                "score": <score>,
                "explanation": "<explanation>"
            }},
            "style": {{
                "score": <score>,
                "explanation": "<explanation>"
            }},
            "overall_score": <overall_score>
        }}
        """)
    ])

    # Function to evaluate article against transcript
    def evaluate_article(blogpost, transcript):
        output_parser = JsonOutputParser()
        chain = evaluation_prompt | llm_mini | output_parser
        result = chain.invoke({
            "blogpost": blogpost,
            "transcript": transcript
        })
        return result
    
    # Extract insights
    insights = insights_chain.invoke({"topic": topic, "transcript": transcript_content})
    
    # Create summaries
    summaries = []
    for insight in insights['insights']:
        search_results = tavily.qna_search(
            f"Is this insight contrarian?: {insight}",
            max_results=5  # assuming tavily is a TavilyClient, whose keyword is max_results
        )
        summaries.append({"insight": insight, "summary": search_results})
    
    # Create blog outline
    blog_outline = outline_chain.invoke({"topic": topic, "summaries": summaries})
    
    # Generate blog content
    blog_content = {}
    for section, content in blog_outline.items():
        section_content = generate_section_content(section, content, blog_outline)
        blog_content[section] = section_content["section"]
    
    blogpost = "\n".join(blog_content.values())
    
    # Generate the header image from the title and key aspects entered in the UI
    image_url = generate_image(prompt, {"header_title": header_title, "key_aspects": key_aspects})
    
    # Evaluate the article
    evaluation_result = evaluate_article(blogpost, transcript_content)
    
    evaluation_summary = f"""
    Accuracy Score: {evaluation_result['accuracy']['score']}/10
    {evaluation_result['accuracy']['explanation']}

    Completeness Score: {evaluation_result['completeness']['score']}/10
    {evaluation_result['completeness']['explanation']}

    Style Score: {evaluation_result['style']['score']}/10
    {evaluation_result['style']['explanation']}

    Overall Score: {evaluation_result['overall_score']}/10
    """

    print("evaluation_summary", evaluation_summary)
    print("blogpost", blogpost)
    print("image_url", image_url)
    
    return blogpost, image_url, evaluation_summary

with gr.Blocks() as demo:
    gr.Markdown("# Transcript to Blog Post Generator")
    gr.Markdown("Upload a transcript, enter a topic, header title, and key aspects to generate a blog post with an accompanying image.")
    
    with gr.Row():
        transcript_file = gr.File(label="Upload Transcript (.txt)")
        topic = gr.Textbox(label="Enter Topic")
    
    with gr.Row():
        header_title = gr.Textbox(label="Header Title")
        key_aspects = gr.Textbox(label="Key Aspects")
    
    generate_button = gr.Button("Generate Blog Post")
    
    with gr.Row():
        blogpost_output = gr.Textbox(label="Generated Blog Post")
        image_output = gr.Image(label="Generated Image")
    
    evaluation_output = gr.Textbox(label="Evaluation Score")
    
    generate_button.click(
        fn=process_transcript,
        inputs=[transcript_file, topic, header_title, key_aspects],
        outputs=[blogpost_output, image_output, evaluation_output]
    )

# Start the local Gradio server; the output below comes from this call
demo.launch()

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.


Generating content for section: hook
Generating content for section: section1
Generating content for section: section2
Generating content for section: section3
Generating content for section: section4
Generating content for section: section5
Generating content for section: section6
Generating content for section: conclusion
{'images': [{'url': 'https://fal.media/files/panda/u7_nwA1319ZUI2MbPLepM.png', 'width': 1024, 'height': 768, 'content_type': 'image/jpeg'}], 'timings': {'inference': 2.058960855996702}, 'seed': 722541996, 'has_nsfw_concepts': [False], 'prompt': "Watercolor style image on a textured white paper background. In the center, elegant hand-lettered text reads 'Pocketbase' in a deep purple color with a slight watercolor bleed effect. Surrounding the text, soft watercolor illustrations represent key aspects of a dog, a cat, a rat. Use a muted color palette with purple, teal, gold, and soft pink tones. The watercolor elements should have gentle color gradients and subtle blee
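
The evaluation prompt asks the model to average the three criterion scores itself, so the overall score depends on the model's arithmetic. A minimal sketch of recomputing it in Python is below. The helper name `recompute_overall_score` and the example dictionary are illustrative assumptions, not part of the original notebook; only the JSON shape is taken from the prompt.

```python
from statistics import mean

# The three criteria named in the evaluation prompt
CRITERIA = ("accuracy", "completeness", "style")


def recompute_overall_score(evaluation_result: dict) -> float:
    """Average the per-criterion scores rather than trusting the model's own average."""
    scores = [float(evaluation_result[name]["score"]) for name in CRITERIA]
    return round(mean(scores), 2)


# Made-up evaluation result in the JSON shape the prompt requests
example = {
    "accuracy": {"score": 8, "explanation": "..."},
    "completeness": {"score": 6, "explanation": "..."},
    "style": {"score": 7, "explanation": "..."},
    "overall_score": 9,  # deliberately inconsistent with the three scores above
}

print(recompute_overall_score(example))
```

Inside `process_transcript`, the same function could be called on `evaluation_result` before building `evaluation_summary`.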


#### BONUS Exercise: Add something new to the chain

Come up with a new feature or step for the blog post generator that will improve the results, and add it to the Gradio interface.

Here are some ideas:

- Use a tool to summarize the transcript before the key insights step
- Identify statistics to insert into the blog post
- Write the SEO title and description for the blog post
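
As a starting point for the statistics idea, here is a rough sketch that pulls out sentences containing a number, which could then be passed to the section prompt. It is a naive regex heuristic rather than a recommended method, and the function name `find_statistics` and the sample text are made up for illustration.

```python
import re


def find_statistics(text: str) -> list[str]:
    """Return the sentences in `text` that contain at least one digit."""
    # Split on whitespace that follows sentence-ending punctuation
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [sentence for sentence in sentences if re.search(r"\d", sentence)]


# Made-up text standing in for one of the search summaries
sample = "The survey covered 120 teams. Setup was quick. About 40% finished in a day."
print(find_statistics(sample))
```

In the pipeline this could be applied to each `summary["summary"]` string in `summaries`, assuming those are plain text answers.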

IDEAS:

- code snippets as examples in the sections 3
- internal links to other blog posts on the same website 3
- improve completeness score by a/b testing the prompt 2
- add a rewriting step to the end of the chain based on the evals 1
- run this five times async with a higher temp and take the best result 4
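
The last idea, running the chain several times and keeping the best result, could be sketched roughly as follows. `generate_candidate` is a hypothetical stand-in for one full run of the chain plus its evaluation, and the scores it returns are placeholders; in a real version it would call the chain with a higher temperature.

```python
import asyncio


async def generate_candidate(run_id: int) -> dict:
    """Hypothetical stand-in for one run of the blog chain and its evaluation."""
    await asyncio.sleep(0)  # placeholder for the real, I/O-bound LLM calls
    placeholder_scores = [6.3, 8.0, 7.3, 5.7, 7.7]  # made-up overall scores
    return {
        "blogpost": f"draft {run_id}",
        "overall_score": placeholder_scores[run_id % len(placeholder_scores)],
    }


async def best_of_n(n: int = 5) -> dict:
    """Run n candidates concurrently and keep the one with the highest overall score."""
    candidates = await asyncio.gather(*(generate_candidate(i) for i in range(n)))
    return max(candidates, key=lambda candidate: candidate["overall_score"])


best = asyncio.run(best_of_n())
print(best["blogpost"], best["overall_score"])
```

Inside a Jupyter notebook an event loop is already running, so `await best_of_n()` would replace the `asyncio.run(...)` call.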

In [1]:
# add your code here to test it, then add it to the gradio interface
