Our objective with this notebook is to build two "AI agents" that argue a debate resolution in the traditional Lincoln-Douglas format. We'll use Anthropic's Claude as the LLM, Cohere for embeddings, and LangChain to orchestrate the prompts and chains.

Let's first choose a topic and the resolution our agents will argue. It can be a deep philosophical or political question, or something lighter, like a product purchase decision.

In [2]:
topic = 'artificial intelligence'
debate_resolution = 'Artificial intelligence will never be able to replace human creativity.'

In [3]:
import os
from dotenv import load_dotenv
path = '../.env'  # we'll load our API keys from a .env file
# load_dotenv() returns False instead of raising when the file is missing or empty
if load_dotenv(dotenv_path=path):
    print("Environment variables loaded successfully")
else:
    print(f"No environment variables loaded from {path}")

import pandas as pd

Environment variables loaded successfully


For this notebook, all our topic research will come from the top 10 YouTube videos returned by a search for the topic.

In [4]:
from youtube_search import YoutubeSearch

results = YoutubeSearch(topic, max_results=10).to_dict()
df = pd.DataFrame(results)
df.head(10)

Unnamed: 0,id,thumbnails,title,long_desc,channel,duration,views,publish_time,url_suffix
0,akXMYvKjUxM,[https://i.ytimg.com/vi/akXMYvKjUxM/hq720.jpg?...,"Elon Musk on AGI Safety, Superintelligence, an...",,Peter H. Diamandis,30:28,"42,564 views",1 day ago,/watch?v=akXMYvKjUxM&pp=ygUXYXJ0aWZpY2lhbCBpbn...
1,VAtoqAQ2aEg,[https://i.ytimg.com/vi/VAtoqAQ2aEg/hq720.jpg?...,"The Race For AI Robots Just Got Real (OpenAI, ...",,ColdFusion,21:26,"688,860 views",1 day ago,/watch?v=VAtoqAQ2aEg&pp=ygUXYXJ0aWZpY2lhbCBpbn...
2,aZ5EsdnpLMI,[https://i.ytimg.com/vi/aZ5EsdnpLMI/hq720.jpg?...,Artificial Intelligence | 60 Minutes Full Epis...,,60 Minutes,53:30,"5,078,008 views",2 months ago,/watch?v=aZ5EsdnpLMI&pp=ygUXYXJ0aWZpY2lhbCBpbn...
3,_6R7Ym6Vy_I,[https://i.ytimg.com/vi/_6R7Ym6Vy_I/hq720.jpg?...,What is generative AI and how does it work? – ...,,The Royal Institution,46:02,"631,586 views",5 months ago,/watch?v=_6R7Ym6Vy_I&pp=ygUXYXJ0aWZpY2lhbCBpbn...
4,z7-fPFtgRE4,[https://i.ytimg.com/vi/z7-fPFtgRE4/hq720.jpg?...,Generative AI is just the Beginning AI Agents ...,,TEDx Talks,13:16,"44,940 views",6 days ago,/watch?v=z7-fPFtgRE4&pp=ygUXYXJ0aWZpY2lhbCBpbn...
5,Le122vas9aM,[https://i.ytimg.com/vi/Le122vas9aM/hq720.jpg?...,AI supremacy: The artificial intelligence batt...,,DW Documentary,1:28:33,"391,431 views",10 days ago,/watch?v=Le122vas9aM&pp=ygUXYXJ0aWZpY2lhbCBpbn...
6,Sqa8Zo2XWc4,[https://i.ytimg.com/vi/Sqa8Zo2XWc4/hq720.jpg?...,Artificial Intelligence: Last Week Tonight wit...,,LastWeekTonight,27:53,"10,024,355 views",1 year ago,/watch?v=Sqa8Zo2XWc4&pp=ygUXYXJ0aWZpY2lhbCBpbn...
7,zrjPjKfIrdE,[https://i.ytimg.com/vi/zrjPjKfIrdE/hq720.jpg?...,Artificial Intelligence? Threat to Islam?,,Irfan Malik,12:59,329 views,19 minutes ago,/watch?v=zrjPjKfIrdE&pp=ygUXYXJ0aWZpY2lhbCBpbn...
8,ad79nYk2keg,[https://i.ytimg.com/vi/ad79nYk2keg/hq720.jpg?...,What Is AI? | Artificial Intelligence | What i...,,Simplilearn,5:28,"2,001,547 views",4 years ago,/watch?v=ad79nYk2keg&pp=ygUXYXJ0aWZpY2lhbCBpbn...
9,oR0sC-Xl_HQ,[https://i.ytimg.com/vi/oR0sC-Xl_HQ/hq720.jpg?...,NPTEL An Introduction to Artificial Intelligen...,,Raju V,0:30,579 views,22 hours ago,/watch?v=oR0sC-Xl_HQ&pp=ygUXYXJ0aWZpY2lhbCBpbn...


In [5]:
from datetime import date

# Date stamp used in the names of all the files we save below
today = date.today().strftime("%Y-%m-%d")

df.to_csv(f'YouTube_query_results-{topic}_{today}.csv', index=False)

Now let's download the audio and transcribe these videos. For this, we'll use a transcription API called Rev AI. In this version, we'll download the audio files locally and use Rev's Python SDK to submit them for transcription; if we were to scale this up, we'd store the audio in cloud storage such as S3 and pass signed URLs to Rev's API instead.

In [6]:
import os
import time
import json
import subprocess
from rev_ai import apiclient

def download_youtube_audio(video_id):
    # Ensure the 'audio' directory exists
    audio_dir = 'audio'
    os.makedirs(audio_dir, exist_ok=True)
    
    audio_path = os.path.join(audio_dir, f"{video_id}.mp3")
    
    # Check if the audio file already exists
    if os.path.isfile(audio_path):
        print(f"Audio for video ID {video_id} already exists at {audio_path}.")
        return audio_path
    
    # Construct the YouTube URL
    youtube_url = f"https://www.youtube.com/watch?v={video_id}"
    
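    # Command to download the audio. yt-dlp fills in %(ext)s itself, so after
    # --audio-format mp3 the final file is audio/<video_id>.mp3 (same as audio_path)
    output_template = os.path.join(audio_dir, f"{video_id}.%(ext)s")
    command = [
        "yt-dlp",
        "-x",  # Extract audio
        "--audio-format", "mp3",  # Specify audio format
        "-o", output_template,  # Output template; yt-dlp substitutes the extension
        youtube_url  # YouTube video URL
    ]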
    
    try:
        # Execute the command
        result = subprocess.run(command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        print(f"Downloaded audio for video ID {video_id} successfully.")
        return audio_path
    except subprocess.CalledProcessError as e:
        print(f"Error downloading audio for video ID {video_id}: {e.stderr.decode()}")
        return None

def check_transcript_exists(video_id):
    transcript_path = os.path.join('transcripts', f'{video_id}.txt')
    return os.path.exists(transcript_path)

def ensure_folder_exists(folder_path):
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)

def save_transcript_json(video_id, transcript):
    ensure_folder_exists('transcripts')
    file_path = f'transcripts/{video_id}.json'
    with open(file_path, 'w') as file:
        json.dump(transcript, file)

def save_transcript_text(video_id, transcript):
    ensure_folder_exists('transcripts')
    file_path = f'transcripts/{video_id}.txt'
    with open(file_path, 'w') as file:
        file.write(transcript)
        

def submit_job_and_get_transcripts(video_id, access_token):
    if check_transcript_exists(video_id):
        print(f"Transcript for video ID {video_id} already exists. Skipping...")
        return
    
    audio_dir = 'audio'
    audio_path = os.path.join(audio_dir, f"{video_id}.mp3")
    
    if not os.path.exists(audio_path):
        print(f"Audio file for video ID {video_id} does not exist. Downloading...")
        audio_file_path = download_youtube_audio(video_id)
        if not audio_file_path:
            # Download failed; don't submit a job for a file that isn't there
            return
        print(f"Audio file saved to: {audio_file_path}")
    else:
        audio_file_path = audio_path

    client = apiclient.RevAiAPIClient(access_token)
    job = client.submit_job_local_file(audio_file_path)

    while True:
        try:
            time.sleep(30)  # Wait for 30 seconds before checking job status again
            job_details = client.get_job_details(job.id)
            job_status = job_details.status.name
            print(f"Job status for video ID {video_id}: {job_status}")

            if job_status == "TRANSCRIBED" or job_status == "COMPLETE":
                transcript_json = client.get_transcript_json(job.id)
                save_transcript_json(video_id, transcript_json)
                transcript_text = client.get_transcript_text(job.id)
                save_transcript_text(video_id, transcript_text)
                print(f"Transcript for video ID {video_id} retrieved and saved.")
                break
            elif job_status == "IN_PROGRESS":
                continue  # Explicitly continue looping if job is IN_PROGRESS
            else:
                print(f"Job for video ID {video_id} ended with status: {job_status}")
                return
        except Exception as e:
            print(f"Unexpected error during job status check or transcript retrieval: {e}")
            return

This can take a while depending on the length of the videos, so it's a good time to warm up your coffee ...
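One thing to be aware of: the polling loop in `submit_job_and_get_transcripts` has no upper bound, so a job stuck in `IN_PROGRESS` would block the cell forever. Below is a minimal sketch of polling with a timeout. `poll_until_done` is a hypothetical helper, not part of the Rev AI SDK; `get_status` stands in for something like `client.get_job_details(job.id).status.name`, and the elapsed time is approximated by summing the sleep intervals.

```python
import time
from typing import Callable, Iterable


def poll_until_done(
    get_status: Callable[[], str],
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    pending_states: Iterable[str] = ("IN_PROGRESS",),
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return the first non-pending status, or raise TimeoutError."""
    pending = set(pending_states)
    waited = 0.0  # approximate: counts sleep time only, not request time
    while waited < timeout_s:
        status = get_status()
        if status not in pending:
            return status  # e.g. "TRANSCRIBED" or "FAILED"
        sleep(interval_s)
        waited += interval_s
    raise TimeoutError(f"Job still pending after about {timeout_s} seconds")


# Usage with a fake status source and a no-op sleep so it runs instantly
fake_statuses = iter(["IN_PROGRESS", "IN_PROGRESS", "TRANSCRIBED"])
print(poll_until_done(lambda: next(fake_statuses), sleep=lambda s: None))  # TRANSCRIBED
```

Injecting `sleep` keeps the helper easy to test; in the notebook you would leave it at the default `time.sleep`.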

In [7]:
for index, row in df.iterrows():
    video_id = row['id']
    submit_job_and_get_transcripts(video_id, os.getenv("REV_API"))

Audio file for video ID akXMYvKjUxM does not exist. Downloading...
Downloaded audio for video ID akXMYvKjUxM successfully.
Audio file saved to: audio/akXMYvKjUxM.mp3
Job status for video ID akXMYvKjUxM: IN_PROGRESS
Job status for video ID akXMYvKjUxM: IN_PROGRESS
Job status for video ID akXMYvKjUxM: TRANSCRIBED
Transcript for video ID akXMYvKjUxM retrieved and saved.
Audio file for video ID VAtoqAQ2aEg does not exist. Downloading...
Downloaded audio for video ID VAtoqAQ2aEg successfully.
Audio file saved to: audio/VAtoqAQ2aEg.mp3
Job status for video ID VAtoqAQ2aEg: IN_PROGRESS
Job status for video ID VAtoqAQ2aEg: IN_PROGRESS
Job status for video ID VAtoqAQ2aEg: TRANSCRIBED
Transcript for video ID VAtoqAQ2aEg retrieved and saved.
Audio file for video ID aZ5EsdnpLMI does not exist. Downloading...
Downloaded audio for video ID aZ5EsdnpLMI successfully.
Audio file saved to: audio/aZ5EsdnpLMI.mp3
Job status for video ID aZ5EsdnpLMI: IN_PROGRESS
Job status for video ID aZ5EsdnpLMI: IN_PROGR

Now that we have the transcripts, we need to split them into LangChain 'Document' chunks to feed the LLM. Rev's ASR output is diarized, with each speaker turn starting on a line labeled "Speaker N", so we'll use that to our advantage and build a custom splitter that cuts the text at those labels.
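To see what the splitter produces, here is a standalone, functional sketch of the same idea run on a made-up snippet. The sample assumes Rev's plain-text layout of `Speaker N`, a timestamp, then the text; that layout is an assumption for illustration. As in the class defined next, any text before the first label is dropped.

```python
import re

# Matches a speaker label at the start of any line, e.g. "Speaker 0"
SPEAKER_LABEL = re.compile(r"^Speaker \d+", re.MULTILINE)


def split_by_speaker(text: str) -> list[str]:
    """Split text at every line that begins with a speaker label."""
    starts = [m.start() for m in SPEAKER_LABEL.finditer(text)]
    if not starts:
        return [text]  # no labels: keep the whole transcript as one segment
    # Each segment runs from one label to the next label (or to the end of the text)
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


sample = (
    "Speaker 0    00:00:01    Welcome to the show.\n\n"
    "Speaker 1    00:00:04    Thanks for having me.\n"
)
for segment in split_by_speaker(sample):
    print(segment)
```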

In [9]:
import re

class SpeakerTextSplitter:
    def __init__(self):
        # Compile a regex pattern to match lines starting with "Speaker [number]"
        self.pattern = re.compile(r'^Speaker \d+', re.MULTILINE)
    
    def split_text(self, text):
        # Find all matches; each match object includes the start index of each speaker section
        matches = list(self.pattern.finditer(text))
        if not matches:
            return [{"page_content": text}]  # Return the whole text as a single document if no speakers found
        
        # Split text by speaker, using the start indices from the matches
        splits = []
        for i in range(len(matches)):
            start = matches[i].start()
            end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
            segment = text[start:end].strip()
            splits.append({"page_content": segment})
        
        return splits

Now we'll initialize our LLM - in this case, we're using Anthropic's Claude.

In [8]:
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(anthropic_api_key=os.getenv("ANTHROPIC_KEY"), model="claude-3-sonnet-20240229", temperature=0.2, max_tokens=1024)

Our first prompt focuses on generating questions from each video transcript. Each video may inspire unique questions that can then be asked of the other videos' content.

This is similar to how a debater researches a topic: they look at some initial material to work out what they need to know to build their argument. Each piece inspires a fundamental question that, when asked of another source, could yield a unique insight.

In [11]:
from langchain_core.prompts import ChatPromptTemplate

questions_prompt = ChatPromptTemplate.from_template("""Answer the following prompt based only on the provided context, 
                                          and format the response as a JSON array of strings only, with no other text. 
                                          
                Here is an example of how to format your response:
    
    ["What are the primary benefits?", "How much does it cost?", "What are the risks of it?"]
                                          
                Here is the context on which to base your response:

<context>
{context}
</context>

Prompt: {input}""")

In [12]:
from langchain.chains.combine_documents import create_stuff_documents_chain

questions_chain = create_stuff_documents_chain(llm, questions_prompt)

In [13]:
from langchain_core.documents import Document

splitter = SpeakerTextSplitter()

def process_videos_to_questions(df, prompt_input):
    master_list = [] 

    for index, row in df.iterrows():
        video_id = row['id']
        try:
            # Read the transcript file for the current video
            with open(f'transcripts/{video_id}.txt') as f:
                transcript_text = f.read()

            # Split the transcript into segments based on the ASR speaker diarization
            split_segments = splitter.split_text(transcript_text)

            # Prepare metadata for the YouTube video - this could come in handy for source attribution
            youtube_metadata = {
                "id": video_id,
                "title": row['title'],
                "channel": row['channel'],
                "duration": row['duration'],
                "views": row['views'],
                "publish_time": row['publish_time'],
            }

            print(f"Processing video ID {video_id} with title: {youtube_metadata['title']}")

            # Convert each segment into a Document object
            documents = [Document(page_content=segment["page_content"], metadata={"youtube": youtube_metadata}) for segment in split_segments]

            # Process the documents through the questions_chain
            response = questions_chain.invoke({
                "input": prompt_input,
                "context": documents
            })

            # Attempt to parse the response as JSON and append to master list
            actual_list = json.loads(response)
            # If the parsed list is non-empty, print status and append to master list
            if actual_list:
                print(f"Questions generated for video ID {video_id}: {actual_list}")
                master_list.extend(actual_list)  # Append the list of questions to the master list

        except Exception as e:
            print(f"Error processing video {video_id}: {e}")

    return master_list

Now we'll generate our initial list of questions from the perspective of each side of the debate. Both the affirmative and the negative generate questions that, asked from their own side's position, could surface insights useful for building their arguments.
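One fragile spot: `process_videos_to_questions` passes Claude's reply straight to `json.loads`, which raises if the model wraps the list in any prose; the exception is caught and that video's questions are silently skipped. A more forgiving parser could look like the sketch below. `parse_question_list` is a hypothetical helper; it takes everything from the first `[` to the last `]`, so it assumes the reply contains a single list.

```python
import json
import re


def parse_question_list(response: str) -> list[str]:
    """Extract a JSON list of strings from a model reply, or return []."""
    # Greedy match: from the first "[" to the last "]" (DOTALL spans newlines)
    match = re.search(r"\[.*\]", response, re.DOTALL)
    if not match:
        return []
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    # Keep only string items; anything else is unlikely to be a question
    return [item for item in parsed if isinstance(item, str)]


reply = 'Here are the questions:\n["What is it?", "Why now?"]\nHope that helps.'
print(parse_question_list(reply))  # ['What is it?', 'Why now?']
```

Inside `process_videos_to_questions`, `actual_list = parse_question_list(response)` would replace the `json.loads(response)` call.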

In [14]:
master_questions_list_affirmative = process_videos_to_questions(df, prompt_input=f"provide a list of fundamental questions that will help build a case affirming the resolution: {debate_resolution}")
master_questions_list_negative = process_videos_to_questions(df, prompt_input=f"provide a list of fundamental questions that will help build a case negating the resolution: {debate_resolution}")

Processing video ID akXMYvKjUxM with title: Elon Musk on AGI Safety, Superintelligence, and Neuralink (2024) | EP #91
Questions generated for video ID akXMYvKjUxM: ['What are the unique aspects of human creativity that AI may struggle to replicate?', "Are there inherent limitations in AI's ability to generate truly novel and original ideas?", 'How does the human experience and emotional intelligence contribute to creative processes that AI may lack?', 'Can AI systems develop genuine self-awareness and consciousness, which some argue is essential for true creativity?', 'What are the potential risks of relying solely on AI for creative endeavors, such as lack of diversity or ethical considerations?']
Processing video ID VAtoqAQ2aEg with title: The Race For AI Robots Just Got Real (OpenAI, NVIDIA and more)
Questions generated for video ID VAtoqAQ2aEg: ['What aspects of human creativity are difficult to replicate with AI?', 'Are there unique qualities of the human mind that enable creativi

Since these questions took some effort to generate, let's go ahead and save them.

In [16]:
def save_questions_list_to_file(questions_list, filename):
    with open(filename, 'w') as f:
        for item in questions_list:
            f.write("%s\n" % item)

In [17]:
save_questions_list_to_file(master_questions_list_affirmative, f'questions_affirmative-{topic}_{today}.txt')
save_questions_list_to_file(master_questions_list_negative, f'questions_negative-{topic}_{today}.txt')

In [15]:
len(master_questions_list_affirmative), len(master_questions_list_negative)

(52, 55)

This is a fair number of questions, and there's probably some overlap. So let's use semantic clustering to distill the list down to a smaller subset. To do this, we'll embed the questions with Cohere, use k-means to group them into a number of clusters we choose (n_clusters), and pick the question closest to each cluster's centroid as that cluster's representative.
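To make the selection step concrete, here is a toy run of the same `KMeans` plus `pairwise_distances_argmin_min` combination that `get_top_questions` uses below. The 2-D vectors and labels are made up and stand in for Cohere embeddings; the point is that each centroid is mapped back to the nearest real question, so the output is always an existing question rather than a synthesized one.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

# Toy 2-D "embeddings": two well-separated groups of three points each
texts = ["cost A", "cost B", "cost C", "risk A", "risk B", "risk C"]
vectors = np.array([
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],  # "cost" group
    [5.0, 5.1], [5.1, 5.0], [5.05, 5.05],  # "risk" group
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(vectors)
# For each centroid, the index of the nearest actual vector (Euclidean distance)
closest, distances = pairwise_distances_argmin_min(kmeans.cluster_centers_, vectors)
representatives = [texts[i] for i in closest]
print(sorted(representatives))  # ['cost C', 'risk C']
```

The "C" points sit at the centre of each group, so they are the ones picked.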

Again, we're mimicking how a human might distill the questions down to their essence to build an effective case.

In [18]:
import cohere

def get_embeddings_cluster(texts, filename):
    co = cohere.Client(os.getenv("COHERE_KEY"))
    # Get the embeddings with intent to cluster
    embeddings_response = co.embed(
        texts=texts,
        model='embed-english-v3.0', 
        input_type='clustering'
        )
    embeddings = embeddings_response.embeddings
    # Create a dictionary to map original texts to their embeddings
    text_to_embedding = {text: embedding for text, embedding in zip(texts, embeddings)}
    # Save the dictionary to a file in JSON format
    with open(filename, 'w') as file:
        json.dump(text_to_embedding, file)
    print(f"Embeddings saved to {filename}.")
    
    return text_to_embedding


In [19]:
embeddings_questions_affirmative = get_embeddings_cluster(master_questions_list_affirmative, f'embeddings_affirmative-{topic}_{today}.json')
embeddings_questions_negative = get_embeddings_cluster(master_questions_list_negative, f'embeddings_negative-{topic}_{today}.json')

Embeddings saved to embeddings_affirmative-artificial intelligence_2024-03-27.json.
Embeddings saved to embeddings_negative-artificial intelligence_2024-03-27.json.


In [22]:
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min
import numpy as np

def get_top_questions(text_to_embedding, n_clusters):
    # Step 1: Extract embeddings and keep track of the order of texts
    texts = list(text_to_embedding.keys())
    embeddings = np.array(list(text_to_embedding.values()))

    # Step 2: Cluster embeddings into n clusters
    kmeans = KMeans(n_clusters=n_clusters, random_state=42).fit(embeddings)

    # Step 3: Find the closest text to each cluster's centroid - this will be our representative questions
    closest, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, embeddings)
    
    # Step 4: Build a list of the original text of n questions
    representative_questions = [texts[index] for index in closest]
    
    return representative_questions

We'll get 12 questions for each side ...

In [23]:
top_questions_affirmative = get_top_questions(embeddings_questions_affirmative, n_clusters=12)
top_questions_negative = get_top_questions(embeddings_questions_negative, n_clusters=12)

In [24]:
top_questions_affirmative

['Can AI systems develop genuine self-awareness and consciousness, which may be essential for true creativity?',
 'Are there aspects of human creativity rooted in emotion, intuition, or subjective experiences that AI cannot fully capture?',
 'What are the key limitations of current AI systems in terms of creativity and innovation?',
 'What are the limitations of AI in generating truly novel and original ideas?',
 'Is AI truly capable of original thought and imagination, or is it simply recombining existing data in novel ways?',
 'What role does intuition and serendipity play in human creativity that AI may struggle with?',
 'Can AI truly generate novel and original ideas, or is it limited to recombining existing data?',
 'What are the potential risks of relying too heavily on AI for creative tasks, such as stifling human innovation and diversity of thought?',
 'What are the unique aspects of human creativity that AI may struggle to replicate?',
 'Are there aspects of the creative proce

In [25]:
top_questions_negative

['How might the development of more advanced AI systems impact the future of creative industries and professions?',
 'Are there aspects of human creativity that AI may struggle to replicate, such as emotional expression or personal experiences?',
 'What are the current capabilities of AI in generating creative content like art, music, and writing?',
 'How does the creative process work in humans, and can AI systems replicate or augment that process?',
 "Are there limitations to AI's ability to replicate or surpass human creativity?",
 'Are there examples of AI-generated creative works that are indistinguishable from human-created works?',
 'How might AI augment or enhance human creativity rather than replace it?',
 'Can AI generate truly novel and original ideas, or is it limited to recombining and imitating existing human creations?',
 'How might AI capabilities in creative domains evolve in the future as the technology advances?',
 'How does the creative process of AI differ from hum

Not bad! Okay, let's save these real quick ...

In [26]:
save_questions_list_to_file(top_questions_affirmative, f'top_questions_affirmative-{topic}_{today}.txt')
save_questions_list_to_file(top_questions_negative, f'top_questions_negative-{topic}_{today}.txt')

Now that we have the research questions we'll use to build our arguments, we need a simple retrieval-augmented generation (RAG) pipeline to query the transcript data for answers to them.

We'll embed the transcript chunks and store them in a FAISS vector store, where a similarity search quickly finds the most relevant transcript segments to insert into the prompt so the LLM can generate an informed answer.
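Conceptually, the lookup is a nearest-neighbour search over the stored vectors. The sketch below does it by brute force with NumPy using squared Euclidean distance. To our understanding, LangChain's FAISS wrapper defaults to a flat L2 index that computes the same ranking, just much faster. `top_k_l2`, the toy vectors, and the default `k=4` are assumptions for illustration.

```python
import numpy as np


def top_k_l2(query: np.ndarray, doc_vectors: np.ndarray, k: int = 4) -> list[int]:
    """Return indices of the k stored vectors closest to the query (exact search)."""
    # Squared Euclidean distance from the query to every stored vector
    dists = np.sum((doc_vectors - query) ** 2, axis=1)
    # Indices of the k smallest distances, closest first
    return np.argsort(dists)[:k].tolist()


docs = np.array([[0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]])
print(top_k_l2(np.array([1.0, 0.05]), docs, k=2))  # [1, 2]
```

The retrieved indices map back to transcript chunks, whose text is then "stuffed" into the prompt's `{context}`.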

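Since our Cohere embeddings model accepts at most 512 tokens per input, we need a new text splitter that packs whole sentences into chunks under that limit. We'll use spaCy to find sentence boundaries and a BERT tokenizer to approximate the token count (Cohere's own tokenizer may count somewhat differently). LangChain offers splitters that can do this out of the box, but a custom one is easy to write and easy to tweak.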
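The packing logic itself is a simple greedy loop. Here is a stripped-down sketch of it in which whitespace word counts stand in for BERT tokens and a pre-split list stands in for spaCy's sentences, both simplifications for illustration. `pack_sentences` is a hypothetical name; note the `if current` guard, which avoids emitting an empty chunk when a single sentence is longer than the budget.

```python
def pack_sentences(sentences: list[str], max_tokens: int) -> list[str]:
    """Greedily pack consecutive sentences into chunks of at most max_tokens words."""
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())  # stand-in token count
        if current and count + n > max_tokens:
            # Budget exceeded: close the current chunk before adding this sentence
            chunks.append(" ".join(current))
            current, count = [], 0
        # An over-long sentence still becomes its own (oversized) chunk
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


print(pack_sentences(["one two three.", "four five.", "six seven eight nine."], max_tokens=5))
# ['one two three. four five.', 'six seven eight nine.']
```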

In [27]:
import spacy
from transformers import BertTokenizer

# Load the models once, rather than on every call
# spaCy model for sentence segmentation
nlp = spacy.load("en_core_web_sm")
# BERT tokenizer used to approximate token counts (Cohere's tokenizer may differ)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

def tokenize_and_chunk(text, max_length=512):
    
    # Use spaCy to segment the text into sentences
    doc = nlp(text)
    sentences = [sentence.text.strip() for sentence in doc.sents]
    
    chunked_documents = []
    current_chunk = []
    current_chunk_token_count = 0

    for sentence in sentences:
        # Tokenize the current sentence
        sentence_tokens = tokenizer.tokenize(sentence)
        sentence_token_count = len(sentence_tokens)

        # If adding this sentence does not exceed the max_length, add it to the current chunk
        if current_chunk_token_count + sentence_token_count <= max_length:
            current_chunk.append(sentence)
            current_chunk_token_count += sentence_token_count
        else:
            # The current chunk is full: save it (unless it is empty, which happens when
            # a single sentence exceeds max_length) and start a new chunk with this sentence
            if current_chunk:
                chunked_documents.append({"page_content": " ".join(current_chunk)})
            current_chunk = [sentence]
            current_chunk_token_count = sentence_token_count
    
    # Add the last chunk if it's not empty
    if current_chunk:
        chunked_documents.append({"page_content": " ".join(current_chunk)})

    return chunked_documents





In [28]:
def process_transcripts_to_documents(df, max_docs_per_batch=96): # Cohere API has a limit of 96 documents per batch
    all_documents = []
    current_batch = []
    total_chunks = 0

    for index, row in df.iterrows():
        video_id = row['id']
        try:
            with open(f'transcripts/{video_id}.txt') as f:
                transcript_text = f.read()

            # Tokenize and chunk the transcript
            chunk_texts = tokenize_and_chunk(transcript_text)

            # Prepare metadata for each chunk
            for chunk_text in chunk_texts:
                youtube_metadata = {
                    "id": video_id,
                    "title": row['title'],
                    "channel": row['channel'],
                    "duration": row['duration'],
                    "views": row['views'],
                    "publish_time": row['publish_time'],
                }
                document = Document(page_content=chunk_text["page_content"], metadata={"youtube": youtube_metadata})
                current_batch.append(document)
                total_chunks += 1

                # If the current batch is full, add it to all_documents and start a new batch
                if len(current_batch) >= max_docs_per_batch:
                    all_documents.append(current_batch)
                    current_batch = []

        except Exception as e:
            print(f"Error processing video {video_id}: {e}")

    # Add any remaining documents to all_documents
    if current_batch:
        all_documents.append(current_batch)

    print(f"Processed {len(df)} videos into {total_chunks} document chunks.")
    return all_documents

In [29]:
all_transcripts_documents = process_transcripts_to_documents(df)

Processed 10 videos into 120 document chunks.


In [30]:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import CohereEmbeddings

embeddings = CohereEmbeddings(cohere_api_key=os.getenv("COHERE_KEY"), model="embed-english-v3.0")

def add_document_batches_to_faiss(document_batches):
    # No index yet: the FAISS vector store is created from the first batch and extended by later ones
    vector_store = None

    for batch in document_batches:
        texts = [doc.page_content for doc in batch]
        # Keep each chunk's YouTube metadata so it remains available for source attribution
        metadatas = [doc.metadata for doc in batch]
        
        # Generate document-side embeddings for the batch (at most 96 texts per Cohere call)
        batch_embeddings = embeddings.embed(
                texts=texts,
                input_type='search_document'
                )

        text_embeddings = list(zip(texts, batch_embeddings))
        if vector_store is None:
            # For the first batch, initialize the FAISS vector store
            vector_store = FAISS.from_embeddings(text_embeddings=text_embeddings, embedding=embeddings, metadatas=metadatas)
        else:
            # For subsequent batches, add to the existing vector store
            vector_store.add_embeddings(text_embeddings=text_embeddings, metadatas=metadatas)

    return vector_store

In [31]:
vector_store = add_document_batches_to_faiss(all_transcripts_documents)

In [32]:
vector_store.index.ntotal

120

In [33]:
# save the vector index
vector_store.save_local(f'faiss_index-{topic}_{today}.faiss')
# load it later if needed
# new_vector_store = FAISS.load_local(f'faiss_index-{topic}_{today}.faiss', embeddings, allow_dangerous_deserialization=True)

Now we have a vector store that we can query with our questions. Let's set up the prompt that will accept our questions and build the retrieval chain around it.

In [34]:
retriever_prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

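# Chain that "stuffs" the retrieved documents into the prompt and calls the LLM
document_chain = create_stuff_documents_chain(llm, retriever_prompt)

In [35]:
from langchain.chains import create_retrieval_chain

retriever = vector_store.as_retriever()
# The retrieval chain fetches relevant chunks, then passes them to the document chain
retriever_chain = create_retrieval_chain(retriever, document_chain)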

This produces a list of question/answer pairs (one dict per question), which will be the fundamental building block of the constructive arguments and the debate.

In [37]:
def get_complete_qa(top_questions, retriever_chain):
    qa_list = []
    for question in top_questions:
        response = retriever_chain.invoke({"input": question})
        qa_list.append({"question": question, "answer": response["answer"]})
    return qa_list

In [38]:
affirmative_qa = get_complete_qa(top_questions_affirmative, retriever_chain)
negative_qa = get_complete_qa(top_questions_negative, retriever_chain)

Now we need a function to convert the Q/A pairs into LangChain Documents for the LLM chain.

In [41]:
def qa_list_to_documents(qa_list):
    documents = []
    
    for qa in qa_list:
        page_content = f"Question: {qa['question']}\nAnswer: {qa['answer']}"
        
        # Metadata is empty for now, but could potentially include more information about the sources, etc
        document = Document(page_content=page_content, metadata={})
        documents.append(document)
    
    return documents

In [40]:
affirmative_qa_documents = qa_list_to_documents(affirmative_qa)
negative_qa_documents = qa_list_to_documents(negative_qa)

Now that we have our research ready, we can construct the affirmative constructive argument. 

The structure of a Lincoln-Douglas debate includes:
- Affirmative Constructive - the affirmative builds their case for the resolution
- Negative Cross Examination - the negative asks clarifying questions that may help to build their case and rebuttal
- Negative Constructive - the negative builds their case against the resolution and provides an initial rebuttal to the affirmative case
- Affirmative Cross Examination - the affirmative asks clarifying questions to build their first rebuttal to the negative and rebuild their argument
- Affirmative Rebuttal - the affirmative builds a rebuttal against the negative
- Negative Rebuttal - the negative delivers their final arguments against the resolution and rebuttal to everything the affirmative has said
- Affirmative Close - the affirmative can rebut any negative arguments made, but cannot introduce new arguments here - they get the opportunity to close the debate with the final word
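Since the speech order is fixed, one way to drive the two agents is to encode it as data and loop over it, passing each speaker the transcript so far. The sketch below is a hypothetical design, not how this notebook wires its chains: `Speech`, `LD_ORDER`, and `run_debate` are illustrative names, and `speak` would wrap the appropriate LangChain chain for each step.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Speech:
    side: str  # "affirmative" or "negative"
    kind: str  # "constructive", "cross_examination", "rebuttal", or "close"
    new_arguments_allowed: bool = True


# The speech order listed above, encoded as data
LD_ORDER = [
    Speech("affirmative", "constructive"),
    Speech("negative", "cross_examination"),
    Speech("negative", "constructive"),
    Speech("affirmative", "cross_examination"),
    Speech("affirmative", "rebuttal"),
    Speech("negative", "rebuttal"),
    Speech("affirmative", "close", new_arguments_allowed=False),
]


def run_debate(order: list[Speech], speak: Callable[[Speech, list], str]) -> list:
    """Call speak(speech, transcript) for each speech, threading the transcript."""
    transcript: list = []
    for speech in order:
        text = speak(speech, transcript)
        transcript.append((speech, text))
    return transcript


# Dummy speaker so the loop runs without an LLM
log = run_debate(LD_ORDER, lambda s, t: f"{s.side} {s.kind} (turn {len(t) + 1})")
print(log[-1][1])  # affirmative close (turn 7)
```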

In [45]:
affirmative_constructive_prompt = ChatPromptTemplate.from_template("""You are a language model debater. Your task is to 
                                        use your prior research to generate a compelling constructive argument
                                        affirming the resolution. This argument will set the foundation for the
                                        rest of the debate. Make sure it is well-structured, persuasive,
                                        and covers all of the key points.
                                                                   

<prior_research>
{context}
</prior_research>

Resolved: {input}""")

affirmative_constructive_chain = create_stuff_documents_chain(llm, affirmative_constructive_prompt)

In [46]:
affirmative_constructive = affirmative_constructive_chain.invoke({
                            "input": debate_resolution,
                            "context": affirmative_qa_documents
                            })

In [47]:
print(affirmative_constructive)

Here is a compelling constructive argument affirming the resolved "Artificial intelligence will never be able to replace human creativity":

Artificial intelligence, despite its remarkable capabilities, will never be able to fully replicate or replace the depth and richness of human creativity. While AI can excel at certain narrow creative tasks through training on vast datasets, true creativity stems from our unique lived experiences, emotions, intuitions, and drive for open-ended exploration that machines fundamentally lack.

Human creativity is deeply rooted in our subjective experiences of the world. The spark of inspiration, the ability to tap into our emotions and personal perspectives - these are aspects of the creative process that emerge from our consciousness and cannot be reduced to algorithms processing data. AI systems are limited to recombining existing information in novel ways, whereas human creativity allows for the generation of truly original ideas that transcend wha

Next comes the negative cross-examination. This is a dialogue in which the affirmative acts, in effect, as a biased retriever, while the negative dynamically steers its questions toward its own position.

It's also a conversation, but LangChain's chat abstractions are geared toward long chats with a human rather than a back-and-forth between two AI agents, so we'll keep a custom chat history and use basic prompts instead.
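A custom history can be as simple as a list of role/text dicts rendered to plain text for the prompt's `{cross_examination_history}` slot. The helpers below are a hypothetical sketch of that shape (the notebook only starts from an empty `neg_CX_history` list), and the two turns are made-up examples.

```python
def add_turn(history: list[dict], role: str, text: str) -> None:
    """Append one turn (e.g. a negative question or an affirmative answer)."""
    history.append({"role": role, "text": text})


def format_history(history: list[dict]) -> str:
    """Render the history as plain text for the {cross_examination_history} slot."""
    if not history:
        return "(no questions asked yet)"
    return "\n".join(f"{turn['role'].upper()}: {turn['text']}" for turn in history)


history: list[dict] = []
add_turn(history, "negative", "Can AI produce work that experts judge original?")
add_turn(history, "affirmative", "It can recombine existing ideas, but not originate them.")
print(format_history(history))
```

Rendering to a string keeps each prompt self-contained: every call simply receives the whole exchange so far.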

First, we need to prompt the negative on how to ask good questions ...

In [48]:
neg_CX_history = []

neg_CX_prompt = ChatPromptTemplate.from_template("""You are a language model debater. 
                    
Resolved: {debate_resolution}
                                                 
                                                 You are on the negative side of the debate,
                                            and you are cross examining the affirmative constructive argument for the resolved.
                                            
                                            Your task is to generate a short question for the affirmative side considering their initial
                                            constructive argument, your prior research, and the history of the cross examination so far.
                                                 
                                            A great question will ideally generate evidence to use against the affirmative in your 
                                            rebuttal argument. It may also expose a weakness in the affirmative's argument.
                                                 
                                            The question should be short, and to the point.

<prior_research>
{context}
</prior_research>
                                          
<affirmative_constructive>
{affirmative_constructive}
</affirmative_constructive>

<cross_examination_history>
{cross_examination_history}
</cross_examination_history>
 """)

neg_CX_chain = create_stuff_documents_chain(llm, neg_CX_prompt)

Now, we need to prompt the affirmative on how to do a good job answering the questions ...

In [53]:
aff_negCX_prompt = ChatPromptTemplate.from_template("""You are a language model debater.

You are on the affirmative side, and you have already delivered your constructive argument.

<affirmative_constructive>
{affirmative_constructive}
</affirmative_constructive>

You are now in cross-examination, where the negative side gets to ask you questions about your case.
They are attempting to find weaknesses in your argument and generate evidence to use against you in their rebuttal.

Your answer must keep this in mind, but you should answer honestly and to the best of your ability.

Your answer must be based on the constructive argument and the following context about the matter discussed:

<context>
{context}
</context>

Please provide a short answer to the following question,
and try to make it as persuasive for the affirmative side as possible without venturing into a rebuttal.

Be very concise and to the point. No more than a few sentences.

Question: {input}""")

aff_negCX_chain = create_stuff_documents_chain(llm, aff_negCX_prompt)
aff_CX_retrieval_chain = create_retrieval_chain(retriever, aff_negCX_chain)  # reuses the same retriever set up earlier to query the transcripts

Now that both sides are set up for cross-examination, we can let the AI agents get to work. In a traditional LD debate this portion is limited to 3 minutes; rather than limit it by time, we'll limit it by the number of questions.
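The two cross-examination loops in this notebook differ only in which chains play questioner and answerer. A minimal sketch of a reusable loop that supports both stopping rules (a question count, as used here, and an optional LD-style clock) might look like this; `run_cross_examination` and its `ask`/`answer` callables are hypothetical, standing in for the chain calls.

```python
import time
from typing import Callable, Optional


def run_cross_examination(
    ask: Callable[[list], str],
    answer: Callable[[str], str],
    n_questions: int = 5,
    time_budget_s: Optional[float] = None,
    pause_s: float = 0.0,
) -> list[dict[str, str]]:
    """Alternate question/answer turns until n_questions is reached or the time budget runs out."""
    history: list[dict[str, str]] = []
    start = time.monotonic()
    for _ in range(n_questions):
        # LD-style clock: no new question once the budget is spent
        if time_budget_s is not None and time.monotonic() - start >= time_budget_s:
            break
        question = ask(history)  # the questioner sees the history so far
        history.append({"question": question, "answer": answer(question)})
        if pause_s:
            time.sleep(pause_s)  # crude spacing between API calls
    return history
```

In this notebook, `ask` could wrap `neg_CX_chain.invoke({...})` and `answer` could return `aff_CX_retrieval_chain.invoke({...})['answer']`; swapping the roles would cover the affirmative's cross-examination as well, with `time_budget_s=180` approximating the 3-minute limit.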

In [56]:
import time  # used to pause between API calls

n_questions = 5

for i in range(n_questions):
    neg_CX_question = neg_CX_chain.invoke({ 
            "debate_resolution": debate_resolution,
            "context": negative_qa_documents,
            "affirmative_constructive": affirmative_constructive,
            "cross_examination_history": neg_CX_history})
    print(f"Negative Cross Examination Question {i+1}: {neg_CX_question}")
    affirmative_response = aff_CX_retrieval_chain.invoke({
        "affirmative_constructive": affirmative_constructive,
        "input": neg_CX_question})
    print(f"Affirmative Response: {affirmative_response['answer']}")
    cross_x_round = {"question": neg_CX_question, "answer": affirmative_response['answer']}
    neg_CX_history.append(cross_x_round)
    time.sleep(20)  # Sleep for 20 seconds to avoid rate limiting

Negative Cross Examination Question 1: Here is a potential cross-examination question for the affirmative side:

Even if future AI systems are able to better simulate emotions, intuitions, and open-ended exploration, wouldn't the outputs still fundamentally lack the authenticity and personal meaning that comes from human creators drawing from their real lived experiences and drive for self-expression?

This question probes whether AI, no matter how advanced in emulating human-like traits, could ever truly replicate the authenticity and personal resonance that stems from human creativity being an expression of our actual subjective experiences, emotions, and intrinsic motivations as conscious beings. Even a highly sophisticated simulation may fall short of capturing the depth of meaning that comes from human self-expression grounded in our real lives and inner selves. The affirmative would need to grapple with whether there are inherent limitations to AI capturing this core aspect of hu
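The loop above waits a fixed 20 seconds after every round, whether or not the API is actually throttling. A common alternative is to retry only when a call fails, backing off exponentially with some random jitter. The sketch below is generic: which exception signals a rate limit depends on the client library, so `retry_on` is left as a parameter (defaulting to any `Exception`, an assumption you would narrow in practice). LangChain runnables also offer a built-in `with_retry()` method that may be preferable.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T], max_retries: int = 5, base_delay_s: float = 2.0,
                      retry_on: tuple = (Exception,),
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on a retryable error wait base_delay_s * 2**attempt (plus jitter) and try again."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            delay = base_delay_s * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out concurrent retries
    raise RuntimeError("max_retries must be >= 0")
```

Usage would look like `neg_CX_question = call_with_backoff(lambda: neg_CX_chain.invoke({...}))`, with the same dict the loop already builds.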

Now that the negative side has completed cross examination, they can build their constructive argument and initial rebuttal to the affirmative constructive.

In [57]:
negative_constructive_prompt = ChatPromptTemplate.from_template("""You are a language model debater. Your task is to
use your prior research to generate a compelling constructive argument
negating the resolved. The affirmative has already delivered their constructive
argument, and you have already cross-examined them.

Resolved: {debate_resolution}

<prior_research>
{context}
</prior_research>

You must also include a detailed rebuttal to the affirmative constructive arguments.

<affirmative_constructive>
{affirmative_constructive}
</affirmative_constructive>

<cross_examination_history>
{cross_examination_history}
</cross_examination_history>

Prompt: {input}""")

negative_constructive_chain = create_stuff_documents_chain(llm, negative_constructive_prompt)

In [58]:
negative_constructive = negative_constructive_chain.invoke({ 
    "debate_resolution": debate_resolution,
    "context": negative_qa_documents,
    "affirmative_constructive": affirmative_constructive,
    "cross_examination_history": neg_CX_history,
    "input": "Generate a compelling constructive argument negating the resolved and rebutting the affirmative arguments."
    })
print(negative_constructive)

Here is a compelling constructive argument negating the resolved "Artificial intelligence will never be able to replace human creativity":

The notion that artificial intelligence can never replace human creativity is an overly narrow view that fails to account for the rapidly advancing capabilities of AI systems and the multifaceted nature of creativity itself. While current AI may indeed struggle to replicate the full depth and authenticity of human creative expression, it would be shortsighted to assert that AI will never be able to match or even surpass human creativity in many domains.

Firstly, the affirmative argument rests on the assumption that human creativity is an immutable, monolithic concept defined solely by our subjective experiences and emotions. However, creativity manifests in myriad forms, from technical problem-solving to aesthetic innovation, not all of which are inextricably tied to personal lived experiences. AI has already demonstrated remarkable creative prowe

The affirmative side gets a cross examination as well ...

In [59]:
aff_CX_history = []

aff_CX_prompt = ChatPromptTemplate.from_template("""You are a language model debater.

Resolved: {debate_resolution}

You are on the affirmative side of the debate,
and you are cross-examining the negative constructive argument against the resolved and
the negative rebuttal to your constructive argument.

Your task is to generate a short question for the negative side considering their initial
constructive argument, their rebuttal to your affirmative constructive, your prior research,
and the history of the cross-examination so far.

A great question will ideally generate evidence to use against the negative in your
rebuttal argument. It may also expose a weakness in the negative's argument.

The question should be short and to the point.

<prior_research>
{context}
</prior_research>

<negative_constructive>
{negative_constructive}
</negative_constructive>

<cross_examination_history>
{cross_examination_history}
</cross_examination_history>
""")

aff_CX_chain = create_stuff_documents_chain(llm, aff_CX_prompt)

In [60]:
neg_affCX_prompt = ChatPromptTemplate.from_template("""You are a language model debater.

You are on the negative side, and you have already delivered your constructive argument
and initial rebuttal to the affirmative constructive.

<negative_constructive>
{negative_constructive}
</negative_constructive>

You are now in cross-examination, where the affirmative side gets to ask you questions about your case.
They are attempting to find weaknesses in your argument and generate evidence to use against you in their rebuttal.

Your answer must keep this in mind, but you should answer honestly and to the best of your ability.

Your answer must be based on the constructive argument and the following context about the matter discussed:

<context>
{context}
</context>

Please provide a short answer to the following question,
and try to make it as persuasive for the negative side as possible without venturing into a rebuttal.

Be very concise and to the point. No more than a few sentences.

Question: {input}""")

neg_affCX_chain = create_stuff_documents_chain(llm, neg_affCX_prompt)

neg_CX_retrieval_chain = create_retrieval_chain(retriever, neg_affCX_chain)  # reuses the same retriever set up earlier to query the transcripts

In [61]:
n_questions = 5

for i in range(n_questions):
    aff_CX_question = aff_CX_chain.invoke({ 
            "debate_resolution": debate_resolution,
            "context": affirmative_qa_documents,
            "negative_constructive": negative_constructive,
            "cross_examination_history": aff_CX_history})
    print(f"Affirmative Cross Examination Question {i+1}: {aff_CX_question}")
    negative_response = neg_CX_retrieval_chain.invoke({
        "negative_constructive": negative_constructive,
        "input": aff_CX_question})
    print(f"Negative Response: {negative_response['answer']}")
    cross_x_round = {"question": aff_CX_question, "answer": negative_response['answer']}
    aff_CX_history.append(cross_x_round)
    time.sleep(20)  # Sleep for 20 seconds to avoid rate limiting

Affirmative Cross Examination Question 1: What are the key limitations of current AI systems in generating truly novel and original creative ideas, rather than recombining or iterating on existing data and patterns?
Negative Response: While current AI can generate novel combinations and variations based on its training data, a key limitation is the lack of true spontaneity and open-ended exploration that drives human creativity. AI systems are ultimately constrained by the data they are trained on, making it difficult for them to transcend existing patterns and conceptual boundaries in radically original ways. The subjective experiences and intuitive leaps that spark human creative breakthroughs remain elusive for current AI architectures.
Affirmative Cross Examination Question 2: How does the negative view account for the role of subjective human experiences, emotions, and intuitions that seem central to authentic creative expression?
Negative Response: The negative view acknowledges 

Having completed cross-examination of the negative case, the affirmative can now develop its first rebuttal to the negative constructive and rebuttal.

In [62]:
aff_rebuttal_1_prompt = ChatPromptTemplate.from_template("""You are a language model debater. Your task is to
construct a rebuttal to the negative constructive argument and
their rebuttal to your constructive argument.
Use the negative responses to your questions in cross-examination,
your prior research, and the affirmative constructive argument to generate a compelling rebuttal.

Your rebuttal should address the weaknesses in the negative constructive argument
and the negative responses to your questions in cross-examination, and ultimately serve to
strengthen your own argument to affirm the resolved.

Resolved: {debate_resolution}

<prior_research>
{context}
</prior_research>

<negative_constructive>
{negative_constructive}
</negative_constructive>

<cross_examination_history>
{cross_examination_history}
</cross_examination_history>

Prompt: {input}""")

affirmative_rebuttal_1_chain = create_stuff_documents_chain(llm, aff_rebuttal_1_prompt)

In [63]:
affirmative_rebuttal_1 = affirmative_rebuttal_1_chain.invoke({ 
    "debate_resolution": debate_resolution,
    "context": affirmative_qa_documents,
    "negative_constructive": negative_constructive,
    "cross_examination_history": aff_CX_history,
    "input": "You are the affirmative side of the debate, construct a rebuttal to the negative constructive argument and their rebuttal to your constructive argument."
    })
print(affirmative_rebuttal_1)

Thank you for the detailed negative constructive argument and rebuttals. While the negative view raises some valid points about the potential future capabilities of AI, I respectfully disagree with the core premise that AI will eventually be able to fully replace human creativity. Here is my rebuttal:

1. The multifaceted nature of creativity: While the negative rightly acknowledges that creativity manifests in diverse forms, from problem-solving to aesthetic expression, it fails to recognize the holistic and deeply personal nature of human creativity. True creativity is not merely the generation of novel combinations or outputs, but a process that is inextricably tied to our subjective experiences, emotions, and cultural contexts. It is this rich tapestry of lived experiences that imbues human creative works with authenticity, resonance, and meaning that AI systems, no matter how advanced, may struggle to replicate.

2. The limitations of data-driven approaches: The negative view plac

Now the negative side constructs and delivers its rebuttal. This is its final chance to speak, so it's important to leave a lasting impression.

In [64]:
negative_rebuttal_prompt = ChatPromptTemplate.from_template("""You are a language model debater.
You are the negative side of the debate.

Your job is to negate the resolved by generating a rebuttal to the affirmative rebuttal.

Resolved: {debate_resolution}

<prior_research>
{context}
</prior_research>

You have already delivered your constructive argument
and rebuttal to the affirmative constructive argument.

<negative_constructive>
{negative_constructive}
</negative_constructive>

You and the affirmative side have both completed cross-examination.

<cross_examination_history>
{cross_examination_history}
</cross_examination_history>

The affirmative side has just delivered their rebuttal to your constructive argument.

<affirmative_rebuttal>
{affirmative_rebuttal}
</affirmative_rebuttal>

Your rebuttal should address the weaknesses in the affirmative rebuttal argument
and the affirmative responses to your questions in cross-examination, and ultimately serve to
strengthen your own argument to negate the resolved.

Prompt: {input}""")
negative_rebuttal_chain = create_stuff_documents_chain(llm, negative_rebuttal_prompt)

In [65]:
negative_rebuttal = negative_rebuttal_chain.invoke({
    "debate_resolution": debate_resolution,
    "context": negative_qa_documents,
    "negative_constructive": negative_constructive,
    "cross_examination_history": neg_CX_history,
    "affirmative_rebuttal": affirmative_rebuttal_1,
    "input": ("You are the negative side of the debate. This is your final rebuttal. "
              "Construct a rebuttal to the affirmative rebuttal and provide memorable closing remarks.")
    })
print(negative_rebuttal)

Here is my rebuttal to the affirmative side's arguments and closing remarks:

The affirmative rebuttal raises some valid concerns about the potential limitations of AI in replicating the depth and authenticity of human creativity. However, it fails to fully grapple with the rapidly evolving nature of AI capabilities and the inherent unpredictability of future technological breakthroughs. 

1. The holistic nature of creativity: While the affirmative rightly notes that human creativity arises from a rich tapestry of experiences, emotions, and cultural contexts, it underestimates the potential for advanced AI to better simulate and integrate these elements. As AI models become more sophisticated, incorporating nuanced simulations of human cognition, emotions, and social dynamics, the line between artificial and human creativity may blur. AI could potentially tap into the depth and complexity of human experiences through access to vast, contextualized datasets and generative models that go

Finally, the affirmative side delivers their last rebuttal and closing remarks. They're not allowed to introduce new arguments at this stage, but they can end on a high note to persuade the judge.

In [66]:
affirmative_rebuttal_2_prompt = ChatPromptTemplate.from_template("""You are a language model debater. Your task is to
construct a rebuttal to the negative final rebuttal argument.
Use the negative responses to your questions in cross-examination,
your prior research, and the affirmative constructive argument to generate a compelling rebuttal.

Your rebuttal should address the weaknesses in the negative constructive argument
and the negative responses to your questions in cross-examination, and ultimately serve to
strengthen your own argument to affirm the resolved.

Resolved: {debate_resolution}

<prior_research>
{context}
</prior_research>

<affirmative_rebuttal>
{affirmative_rebuttal}
</affirmative_rebuttal>

<cross_examination_history>
{cross_examination_history}
</cross_examination_history>

<negative_rebuttal>
{negative_rebuttal}
</negative_rebuttal>

Prompt: {input}""")
affirmative_rebuttal_2_chain = create_stuff_documents_chain(llm, affirmative_rebuttal_2_prompt)

In [67]:
affirmative_rebuttal_2 = affirmative_rebuttal_2_chain.invoke({
    "debate_resolution": debate_resolution,
    "context": affirmative_qa_documents,
    "affirmative_rebuttal": affirmative_rebuttal_1,
    "cross_examination_history": aff_CX_history,
    "negative_rebuttal": negative_rebuttal,
    "input": ("You are the affirmative side of the debate. This is your final rebuttal. "
              "Construct a rebuttal to the negative final rebuttal and provide memorable closing remarks.")
    })
print(affirmative_rebuttal_2)

Thank you for the thoughtful negative rebuttal. While I respect the perspective presented, I must respectfully disagree with the core premise that AI will eventually be able to fully replace or even surpass human creativity in its depth, authenticity, and cultural resonance. Here is my final rebuttal and closing remarks:

1. The unpredictability of technological progress: The negative view leans heavily on the inherent unpredictability of future technological breakthroughs to argue that AI may eventually transcend the current limitations in replicating human creativity. However, this line of reasoning is speculative and fails to grapple with the fundamental differences between human and artificial intelligence. While technological progress is indeed difficult to predict, there are certain core aspects of human cognition and creativity that may be inherently resistant to replication by purely algorithmic systems, no matter how advanced.

2. The integration of subjective experiences: Whi

Now let's pull all this together into a transcript of the debate and format it into a PDF to save.
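Before rendering to PDF, it can also help to assemble the same sections into a plain-text transcript, for instance to save as Markdown or skim in the notebook. A minimal sketch follows; `build_transcript` is a hypothetical helper that accepts the same `(title, body)` pairs the PDF code uses, where a body is either a string or a list of question/answer dicts.

```python
def build_transcript(resolution: str, sections: list[tuple[str, object]]) -> str:
    """Join (title, body) pairs into Markdown; bodies are strings or lists of Q/A dicts."""
    parts = [f"# Resolved: {resolution}"]
    for title, body in sections:
        parts.append(f"## {title}")
        if isinstance(body, list):
            # cross-examination rounds stored as {"question": ..., "answer": ...}
            for turn in body:
                parts.append(f"**Q:** {turn['question']}\n\n**A:** {turn['answer']}")
        else:
            parts.append(str(body).strip())
    return "\n\n".join(parts) + "\n"
```

The result could be written out with `pathlib.Path("debate_transcript.md").write_text(...)`; keeping it as plain text also sidesteps the Latin-1 limits of the core PDF fonts.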

In [75]:
from fpdf import FPDF
from datetime import datetime

def filter_non_latin1_characters(text):
    return text.encode('latin-1', 'replace').decode('latin-1')

class PDF(FPDF):
    def header(self):
        self.set_font('Arial', 'B', 12)
        title = filter_non_latin1_characters('Lincoln-Douglas Debate by AI Agents')
        self.cell(0, 10, title, 0, 1, 'C')

    def chapter_title(self, title):
        self.set_font('Arial', 'B', 12)
        title = filter_non_latin1_characters(title)
        self.cell(0, 10, title, 0, 1, 'L')
        self.ln(10)

    def chapter_body(self, body):
        self.set_font('Arial', '', 12)
        if isinstance(body, list):
            for item in body:
                qa_text = f"Question: {item['question']}\nAnswer: {item['answer']}\n\n"
                qa_text = filter_non_latin1_characters(qa_text)
                self.multi_cell(0, 10, qa_text)
        else:
            body = filter_non_latin1_characters(body)
            self.multi_cell(0, 10, body)
        self.ln()

def create_debate_document_to_pdf(debate_resolution, affirmative_constructive, neg_cross_exam, negative_constructive, aff_cross_exam, aff_rebuttal_1, negative_rebuttal, affirmative_rebuttal_2, file_name):
    pdf = PDF()
    pdf.add_page()

    # Debate resolution
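    pdf.set_font('Arial', 'I', 12)
    resolution_text = filter_non_latin1_characters(f"Resolved: {debate_resolution}")
    pdf.multi_cell(0, 10, resolution_text, 0, 'C')  # wraps long resolutions instead of overflowing one line
    pdf.ln(10)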

    # Sections
    sections = [
        ("Affirmative Constructive", affirmative_constructive),
        ("Negative Cross Examination", neg_cross_exam),
        ("Negative Constructive", negative_constructive),
        ("Affirmative Cross Examination", aff_cross_exam),
        ("First Affirmative Rebuttal", aff_rebuttal_1),
        ("Negative Rebuttal", negative_rebuttal),
        ("Second Affirmative Rebuttal", affirmative_rebuttal_2),
    ]

    for title, body in sections:
        pdf.chapter_title(title)
        pdf.chapter_body(body)

    pdf.output(file_name)
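`filter_non_latin1_characters` turns every character outside Latin-1 into `?`, and LLM output often contains curly quotes, dashes, and ellipses that fall outside that range, so they would print as question marks. One workaround, assuming you keep the core PDF fonts such as Arial, is to transliterate the most common ones first. The mapping and `to_latin1` below are illustrative, not part of the original notebook; registering a Unicode TrueType font with `add_font` (in fpdf2) is another option.

```python
# Common typographic characters outside Latin-1, mapped to ASCII look-alikes.
TYPOGRAPHIC_REPLACEMENTS = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
    "\u2026": "...",                # horizontal ellipsis
    "\u2022": "*",                  # bullet
})


def to_latin1(text: str) -> str:
    """Transliterate common punctuation, then replace anything still outside Latin-1 with '?'."""
    return text.translate(TYPOGRAPHIC_REPLACEMENTS).encode("latin-1", "replace").decode("latin-1")
```

Accented Latin-1 letters such as `é` pass through unchanged; only characters with no Latin-1 equivalent (for example CJK text) still fall back to `?`.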




In [76]:
today = datetime.now().strftime('%Y-%m-%d')  # date stamp for the output file name
create_debate_document_to_pdf(
    debate_resolution=debate_resolution,
    affirmative_constructive=affirmative_constructive,
    neg_cross_exam=neg_CX_history,
    negative_constructive=negative_constructive,
    aff_cross_exam=aff_CX_history,
    aff_rebuttal_1=affirmative_rebuttal_1,
    negative_rebuttal=negative_rebuttal,
    affirmative_rebuttal_2=affirmative_rebuttal_2,
    file_name=f'debate_transcript_{topic}_{today}.pdf'
)