# AI-Generated Podcast
Here we will see how we can use an LLM to generate the content of an interactive and engaging podcast. The podcast can be about any interesting topic from the internet.


## Gather Content:
First, we need to obtain the corpus of text that will be the basis of our podcast. Here we will use Wikipedia data to build the text corpus, but any other source of text would work as well.

### Source Type: Wikipedia
Wikipedia is a great source for gathering information about history, current events, celebrities and so much more! It's a good starting point for our text corpus.

In [2]:
!pip install wikipedia

Collecting wikipedia
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11680 sha256=b39cd01418c056e49c1ee9cdb5a9fd100561afd0abfb6b141fc5dd0d1d25c083
  Stored in directory: /root/.cache/pip/wheels/5e/b6/c5/93f3dec388ae76edc830cb42901bb0232504dfc0df02fc50de
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0


Let's confirm we can extract the content using Wikipedia's page about itself!

In [3]:
import wikipedia
print (wikipedia.summary("Wikipedia"))

The Wikimedia Foundation, Inc. (WMF) is an American 501(c)(3) nonprofit organization headquartered in San Francisco, California, and registered as a charitable foundation under local laws. Best known as the hosting platform for Wikipedia, a crowdsourced online encyclopedia, it also hosts other related projects and MediaWiki, a wiki software.The Wikimedia Foundation was established in 2003 in St. Petersburg, Florida, by Jimmy Wales as a nonprofit way to fund Wikipedia, Wiktionary, and other crowdsourced wiki projects that had until then been hosted by Bomis, Wales's for-profit company. The Foundation finances itself mainly through millions of small donations from Wikipedia readers, collected through email campaigns and annual fundraising banners placed on Wikipedia and its sister projects. These are complemented by grants from philanthropic organizations and tech companies, and starting in 2022, by services income from Wikimedia Enterprise.
The Foundation has grown rapidly throughout it

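Note that the summary printed above is for the Wikimedia Foundation rather than for Wikipedia itself. This is likely because the library's auto-suggest feature can resolve a query to a different page title; passing `auto_suggest=False`, as in the next cell, looks up the exact title instead.

Using the Wikipedia library, we can get all the details of a certain Wiki page based on the title or the Wiki ID.

Feel free to pick a topic of your choice and retrieve the corresponding information from its Wiki page.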

In [4]:
# Here we will create a podcast around Lionel Messi, one of the greatest footballers of all time.
# The variable is named `page` rather than `input` to avoid shadowing Python's built-in input().
page = wikipedia.page("Lionel_Messi", auto_suggest=False)
wiki_input = page.content
wiki_input



In [5]:
print (len(wiki_input))

131765


### Preprocessing and Data Chunking
Here we will use the tiktoken library from OpenAI to count tokens, so that we can split the corpus into pieces that fit within the model's token limit.

In [6]:
!pip install tiktoken

Collecting tiktoken
  Downloading tiktoken-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 17.1 MB/s eta 0:00:00
Installing collected packages: tiktoken
Successfully installed tiktoken-0.4.0


In [6]:
## Let's check the total number of tokens present in the above text
import tiktoken
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
print ("Number of tokens in corpus ", len(enc.encode(wiki_input)))

Number of tokens in corpus  29762


As we can see, the number of tokens is large, so we will need to divide the corpus into batches before sending it to OpenAI.

We will start by splitting the input text into logical sub-parts - sentences with the help of the NLTK library.

In [7]:
# Let's see token count for a sample string
print ("Number of tokens present in the string ", len(enc.encode("Hello World!")))

Number of tokens present in the string  3


In [8]:
import nltk
nltk.download('punkt')
from nltk.tokenize import sent_tokenize
# Return a sentence-tokenized copy of text, using NLTK’s recommended sentence tokenizer (currently PunktSentenceTokenizer for the specified language).
def split_text (input_text):
  split_texts = sent_tokenize(input_text)
  return split_texts

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [9]:
def create_chunks(split_sents, max_token_len=2500):
  current_token_len = 0
  input_chunks = []
  current_chunk = ""
  for sents in split_sents:
    sent_token_len = len(enc.encode(sents))
    if (current_token_len + sent_token_len) > max_token_len:
      input_chunks.append(current_chunk)
      current_chunk = ""
      current_token_len = 0
    current_chunk = current_chunk + sents
    current_token_len = current_token_len + sent_token_len
  if current_chunk != "":
    input_chunks.append(current_chunk)
  return input_chunks
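
NLTK's `sent_tokenize` returns sentences without the whitespace that separated them, so concatenating them directly glues neighbouring sentences together (for example `team.Widely`). The variant below is a minimal sketch, not a replacement for the function above: it joins sentences with a single space and skips the empty chunk that would otherwise be emitted when the very first sentence exceeds the limit. The names `create_chunks_spaced` and `count_tokens` are hypothetical; the counter is passed in so the sketch runs without tiktoken, and in this notebook it could be `lambda s: len(enc.encode(s))`.

```python
from typing import Callable, List


def create_chunks_spaced(
    sentences: List[str],
    count_tokens: Callable[[str], int],
    max_token_len: int = 2500,
) -> List[str]:
    """Group sentences into chunks of roughly max_token_len tokens, joined by spaces."""
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the current chunk when this sentence would overflow it.
        if current and current_len + n > max_token_len:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


# Stand-in counter (assumption): one token per whitespace-separated word.
word_count = lambda s: len(s.split())
demo = create_chunks_spaced(["One two three.", "Four five.", "Six."], word_count, max_token_len=5)
print(demo)
```

As in the original helper, a single sentence longer than the limit still ends up as its own oversized chunk, and the joining spaces are not counted, so the limit is approximate.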

We split the sentences into chunks of at most 2,500 tokens each.

In [10]:
split_sents = split_text(wiki_input)
input_chunks = create_chunks(split_sents, max_token_len=2500)
print(f'Batch size: {len(input_chunks)}')

Batch size: 13
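
As a quick sanity check on that number, the short calculation below (a sketch that simply reuses the figures printed earlier) gives the fewest chunks that could hold the corpus if every chunk were filled exactly to the limit. The observed count of 13 is one higher, which is expected because chunks are closed at sentence boundaries and therefore rarely reach the full 2,500 tokens.

```python
import math

total_tokens = 29762   # token count printed for the corpus above
max_token_len = 2500   # chunk limit passed to create_chunks

# Fewest chunks possible if every chunk were filled exactly to the limit.
min_chunks = math.ceil(total_tokens / max_token_len)
print(min_chunks)  # 12
```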


In [11]:
input_chunks[0]

'Lionel Andrés Messi (Spanish pronunciation: [ljoˈnel anˈdɾes ˈmesi] (listen); born 24 June 1987), also known as Leo Messi, is an Argentine professional footballer who plays as a forward for and captains both Major League Soccer club Inter Miami and the Argentina national team.Widely regarded as one of the greatest players of all time, Messi has won a record seven Ballon d\'Or awards and a record six European Golden Shoes, and in 2020 he was named to the Ballon d\'Or Dream Team.Until leaving the club in 2021, he had spent his entire professional career with Barcelona, where he won a club-record 34 trophies, including ten La Liga titles, seven Copa del Rey titles and the UEFA Champions League four times.With his country, he won the 2021 Copa América and the 2022 FIFA World Cup.A prolific goalscorer and creative playmaker, Messi holds the records for most goals in La Liga (474), most hat-tricks in La Liga (36) and the UEFA Champions League (eight), and most assists in La Liga (192) and t

## Data Summarization
In this section we will design a prompt for the LLM to take in the input text corpus and create a summary. We do this to identify the key themes and highlights of the chosen topic so that we can subsequently generate the podcast script based on that.

Keep in mind that a generic summary will probably be too short and not have enough context to explore in a podcast discussion.

In [7]:
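# Let's now install the openai package and also test whether it works with our API key.
# The cells below use the pre-1.0 interface (openai.Model, openai.ChatCompletion), so we pin the version.
!pip install openai==0.27.9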

Collecting openai
  Downloading openai-0.27.9-py3-none-any.whl (75 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 75.5/75.5 kB 1.7 MB/s eta 0:00:00
Installing collected packages: openai
Successfully installed openai-0.27.9


In [10]:
import openai
from getpass import getpass

openai.api_key = getpass('Enter your OPENAI_API_KEY')

Enter your OPENAI_API_KEY··········


In [12]:
# we can confirm that the API key works by listing all the OpenAI models
# we will be using the gpt-3.5-turbo version for this project
models = openai.Model.list()
for model in models["data"]:
  print (model["root"])

davinci
text-davinci-001
text-search-curie-query-001
gpt-3.5-turbo
babbage
text-babbage-001
curie-instruct-beta
davinci-similarity
code-davinci-edit-001
text-similarity-curie-001
ada-code-search-text
gpt-3.5-turbo-0613
text-search-ada-query-001
gpt-3.5-turbo-16k-0613
babbage-search-query
ada-similarity
text-curie-001
gpt-3.5-turbo-16k
text-search-ada-doc-001
text-search-babbage-query-001
code-search-ada-code-001
curie-search-document
davinci-002
text-search-davinci-query-001
text-search-curie-doc-001
babbage-search-document
babbage-002
babbage-code-search-text
text-embedding-ada-002
davinci-instruct-beta
davinci-search-query
text-similarity-babbage-001
text-davinci-002
code-search-babbage-text-001
text-davinci-003
text-search-davinci-doc-001
code-search-ada-text-001
ada-search-query
text-similarity-ada-001
ada-code-search-code
whisper-1
text-davinci-edit-001
davinci-search-document
curie-search-query
babbage-similarity
ada
ada-search-document
text-ada-001
text-similarity-davinci-001
cu

Let's proceed to instruct ChatGPT to craft a summary from our text corpus. We've reached the stage where we've segmented our source text into chunks, and now it's time to collaborate with ChatGPT for generating the podcast's summary.


In [15]:
topic = "Sports"

instructPrompt = f"""
You are a {topic} enthusiast who is doing research for a podcast. Your task is to extract relevant information from the Result delimited by triple backticks.
Please identify 2 interesting questions and answers which can be used for a podcast discussion.
The identified discussions should be returned in the following format.
- Highlight 1 from the text
- Highlight 2 from the text
"""

In [16]:
requestMessages = []
for text in input_chunks:
  requestMessage = instructPrompt + f"Result: ```{text}```"
  requestMessages.append(requestMessage)

In [17]:
requestMessages[0]

'\nYou are a Sports enthusiast who is doing research for a podcast. Your task is to extract relevant information from the Result delimited by triple backticks.\nPlease identify 2 interesting questions and answers which can be used for a podcast discussion.\nThe identified discussions should be returned in the following format.\n- Highlight 1 from the text\n- Highlight 2 from the text\nResult: ```Lionel Andrés Messi (Spanish pronunciation: [ljoˈnel anˈdɾes ˈmesi] (listen); born 24 June 1987), also known as Leo Messi, is an Argentine professional footballer who plays as a forward for and captains both Major League Soccer club Inter Miami and the Argentina national team.Widely regarded as one of the greatest players of all time, Messi has won a record seven Ballon d\'Or awards and a record six European Golden Shoes, and in 2020 he was named to the Ballon d\'Or Dream Team.Until leaving the club in 2021, he had spent his entire professional career with Barcelona, where he won a club-record 

In [18]:
chatOutputs = []
for request in requestMessages:
  chatOutput = openai.ChatCompletion.create(model="gpt-3.5-turbo",
                                            messages=[{"role": "system", "content": "You are a helpful assistant."},
                                                      {"role": "user", "content": request}
                                                      ]
                                            )
  chatOutputs.append(chatOutput)

In [19]:
chatOutputs[0]

<OpenAIObject chat.completion id=chatcmpl-7qieq0qjm8dW05n5lAlkeY7FFk0HB at 0x7beb7fcae3e0> JSON: {
  "id": "chatcmpl-7qieq0qjm8dW05n5lAlkeY7FFk0HB",
  "object": "chat.completion",
  "created": 1692799076,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "- Highlight 1: Messi's record-breaking achievements include winning a record seven Ballon d'Or awards and a record six European Golden Shoes, as well as holding the records for most goals in La Liga and the UEFA Champions League.\n- Highlight 2: Messi's journey to Barcelona involved a trial at a young age, during which the club's director, Charly Rexach, famously offered him a contract on a paper napkin. Messi then moved to Barcelona with his family and joined the club's youth academy, La Masia, where he thrived and became an integral part of Barcelona's greatest-ever youth side."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    

In [20]:
podcastFacts = ""
for chats in chatOutputs:
  podcastFacts = podcastFacts + chats.choices[0].message.content

In [21]:
# Check the summary created by ChatGPT
podcastFacts

'- Highlight 1: Messi\'s record-breaking achievements include winning a record seven Ballon d\'Or awards and a record six European Golden Shoes, as well as holding the records for most goals in La Liga and the UEFA Champions League.\n- Highlight 2: Messi\'s journey to Barcelona involved a trial at a young age, during which the club\'s director, Charly Rexach, famously offered him a contract on a paper napkin. Messi then moved to Barcelona with his family and joined the club\'s youth academy, La Masia, where he thrived and became an integral part of Barcelona\'s greatest-ever youth side.- Highlight 1: On his 18th birthday, Messi signed his first contract as a senior team player and his breakthrough came two months later, during the Joan Gamper Trophy, where he gave a well-received performance against Juventus.\n\n- Highlight 2: In the 2008-09 season, Messi played mainly on the right-wing but also as a false nine, dropping deep into midfield to link up with Xavi and Iniesta. During a Clá

The output above should present a list of significant facts, themes, or highlights from our chosen topic. These are the aspects we'd like to address in the podcast conversation.
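
In the combined string above, the last highlight of one chunk runs straight into the first highlight of the next (`youth side.- Highlight 1`), because the replies are concatenated without a separator. A minimal sketch of one way to avoid that is shown below. The `chunk_replies` list holds stand-in strings; in this notebook the values would come from `chats.choices[0].message.content`.

```python
# Stand-in strings (assumption) for the per-chunk replies returned by the model.
chunk_replies = [
    "- Highlight 1: first fact\n- Highlight 2: second fact",
    "- Highlight 1: third fact\n\n- Highlight 2: fourth fact",
]

# Strip stray whitespace and separate replies with a newline so that each
# highlight starts on its own line in the combined text.
podcast_facts = "\n".join(reply.strip() for reply in chunk_replies)
print(podcast_facts)
```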

## Generate the Podcast Script

In [32]:
podcast_name = "Sport 101"
podcastPrompt = f"""
You are a writer creating the script for another episode of the podcast {podcast_name} hosted by \"Tom\" and \"Jerry\".
Use \"Tom\" as the person asking questions and \"Jerry\" as the person providing interesting insights to those questions.
Always specify the speaker name as \"Tom\" or \"Jerry\" to identify who is speaking.
Make the conversation casual and interesting.
Extract relevant information for the podcast conversation from the Result delimited by triple backticks.
Use the below format for the podcast conversation.
1. Introduction about the topic and welcome everyone to another episode of the podcast {podcast_name}.
2. Tom is the main host.
3. Introduce both the speakers in brief.
4. Then start the conversation.
5. Start the conversation with some casual discussion like what they are doing right now at this moment.
6. End the conversation with a thank-you speech to everyone.
7. Do not use the word \"conversation\" in the response.
8. Do not use the word \"Introduction\" in the response.

"""

In [33]:
requestMessage = podcastPrompt + f"Result: ```{podcastFacts}```"
requestMessage

'\nYou are a writer creating the script for another episode of the podcast Sport 101 hosted by "Tom" and "Jerry".\nUse "Tom" as the person asking questions and "Jerry" as the person providing interesting insights to those questions.\nAlways specify the speaker name as "Tom" or "Jerry" to identify who is speaking.\nMake the conversation casual and interesting.\nExtract relevant information for the podcast conversation from the Result delimited by triple backticks.\nUse the below format for the podcast conversation.\n1. Introduction about the topic and welcome everyone to another episode of the podcast Sport 101.\n2. Tom is the main host.\n3. Introduce both the speakers in brief.\n4. Then start the conversation.\n5. Start the conversation with some casual discussion like what they are doing right now at this moment.\n6. End the conversation with a thank-you speech to everyone.\n7. Do not use the word "conversation" in the response.\n8. Do not use the word "Introduction" in the response.\n\nResult: ```- High

In [34]:
finalOutput = openai.ChatCompletion.create(model="gpt-3.5-turbo",
                                            messages=[{"role": "system", "content": "You are a helpful assistant."},
                                                      {"role": "user", "content": requestMessage}
                                                      ],
                                           temperature = 0.7
                                            )

In [35]:
podcastScript = finalOutput.choices[0].message.content

In [36]:
print (podcastScript)

Host: Welcome to another episode of Sport 101! Today, we're going to dive into the incredible career of Lionel Messi. I'm your host, Tom, and joining me is our resident sports expert, Jerry. 

Jerry: Thanks, Tom! Excited to talk about Messi's journey and achievements in the world of football.

Tom: Before we get started, Jerry, what have you been up to lately? Any interesting sports news catch your attention?

Jerry: Well, Tom, I've been closely following Lionel Messi's record-breaking season with PSG. It's been quite a ride for him. How about you? Any sports updates on your end?

Tom: Absolutely, Jerry! I've been keeping an eye on Messi's international career and his recent triumph with the Argentina national team. It's incredible to see how he's evolved as a player. 

Jerry: Definitely, Tom. Messi's journey has been nothing short of extraordinary. He started off at a young age and quickly made a name for himself at Barcelona's youth academy, La Masia. 

Tom: That's right, Jerry. And 

## Add Voice to the Podcast Script
Here we generate the voices with ElevenLabs (11Labs, 11.ai), a startup that provides realistic voices with natural intonation.


In [27]:
!pip install elevenlabs

Collecting elevenlabs
  Downloading elevenlabs-0.2.24-py3-none-any.whl (16 kB)
Collecting pydantic<2.0,>=1.10 (from elevenlabs)
  Downloading pydantic-1.10.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.1/3.1 MB 37.7 MB/s eta 0:00:00
Collecting websockets>=11.0 (from elevenlabs)
  Downloading websockets-11.0.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (129 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 129.9/129.9 kB 13.9 MB/s eta 0:00:00
Collecting jedi>=0.16 (from ipython>=7.0->elevenlabs)
  Downloading jedi-0.19.0-py2.py3-none-any.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 57.7 MB/s eta 0:00:00
Installing collected packages: websockets, pydantic, jedi, elevenlabs
  Attempting uninstall: pydantic
    Found existi

In [28]:
from elevenlabs import set_api_key

set_api_key(getpass('Enter your ELEVEN_LABS_API_KEY  '))

Enter your ELEVEN_LABS_API_KEY  ··········


In ElevenLabs, we will find a variety of speakers listed in their demo dropdown. They offer a playground feature where we can select different speakers to listen to their voices. Simply choose a speaker and click the play button to hear their voice. For our interactive podcast, we'll need to select two speakers.

After finalizing our choice of speakers, make a note of their names. We'll need this information for the next steps.

To facilitate voice generation, we'll create a helper method. It splits the script on blank lines and assumes that the resulting paragraphs alternate between the two speakers, starting with the first one; the speaker's name prefix (for example `Tom:`) is stripped before the text is sent for voice generation. It's important to ensure that the podcast script adheres to this style. If needed, feel free to modify the method accordingly.

In [29]:
from elevenlabs import generate

def createPodcast(podcastScript, speakerName1, speakerChoice1, speakerName2, speakerChoice2):
  genPodcast = []
  podcastLines = podcastScript.split('\n\n')
  podcastLineNumber = 0
  for line in podcastLines:
    if podcastLineNumber % 2 == 0:
      speakerChoice = speakerChoice1
      line = line.replace(speakerName1+":", '')
    else:
      speakerChoice = speakerChoice2
      line = line.replace(speakerName2+":", '')
    genVoice = generate(text=line, voice=speakerChoice, model="eleven_monolingual_v1")
    genPodcast.append(genVoice)
    podcastLineNumber += 1
  return genPodcast
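
The helper above picks the voice by paragraph position, which breaks if the script ever has two consecutive paragraphs from the same speaker or opens with a different label (the script printed earlier starts with `Host:`). The sketch below shows one possible alternative that reads the speaker label of each paragraph instead. The function name `assign_voices`, the `voices` mapping, and the `default_voice` fallback are hypothetical; the sketch only prepares (voice, text) pairs and makes no ElevenLabs call itself.

```python
from typing import Dict, List, Tuple


def assign_voices(script: str, voices: Dict[str, str], default_voice: str) -> List[Tuple[str, str]]:
    """Return (voice, text) pairs, choosing the voice from each paragraph's speaker label."""
    pairs: List[Tuple[str, str]] = []
    for paragraph in script.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue  # skip empty paragraphs instead of sending them for synthesis
        label, sep, rest = paragraph.partition(":")
        if sep and label.strip() in voices:
            # Known speaker: use the mapped voice and drop the "Name:" prefix.
            pairs.append((voices[label.strip()], rest.strip()))
        else:
            # Unknown or missing label: keep the text as-is with the fallback voice.
            pairs.append((default_voice, paragraph))
    return pairs


script = "Host: Welcome to the show!\n\nJerry: Thanks, Tom!\n\nTom: Let's begin."
voices = {"Tom": "Adam", "Host": "Adam", "Jerry": "Domi"}  # "Host" treated as an alias for Tom
for voice, text in assign_voices(script, voices, default_voice="Adam"):
    print(voice, "->", text)
```

Each pair could then be passed to `generate(text=text, voice=voice, model="eleven_monolingual_v1")` in the same way as in the helper above.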

In [30]:
speakerName1 = "Tom"
speakerChoice1 = "Adam"
speakerName2 = "Jerry"
speakerChoice2 = "Domi"
genPodcast = createPodcast(podcastScript, speakerName1, speakerChoice1, speakerName2, speakerChoice2)

In [31]:
with open("/content/sample_data/genPodcast.mpeg", "wb") as f:
  for pod in genPodcast:
    f.write(pod)

## Generate Podcast Intro Music
We can leverage [MusicGen](https://huggingface.co/spaces/facebook/MusicGen) to generate the introduction music. Download the intro music and upload it to the Colab notebook.

The pydub package is versatile and allows us to load in audio from various formats - MP3 and WAV files in our case and combine them in ways that we like.

In [1]:
!pip install pydub

Collecting pydub
  Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Installing collected packages: pydub
Successfully installed pydub-0.25.1


In [3]:
from pydub import AudioSegment

# Import your respective saved Audio files
music_intro = AudioSegment.from_file("/content/sample_data/intro_music.mp4","mp4")
podcast_content = AudioSegment.from_file("/content/sample_data/genPodcast.mpeg")

In [4]:
final_version = music_intro.append(podcast_content, crossfade=1500)
#Save your final audio file
final_version.export("/content/finalPodcast.mp3", format="mp3")

<_io.BufferedRandom name='/content/finalPodcast.mp3'>

### Enjoy your Podcast !!

Until now, we've harnessed the power of the LLM as a valuable tool. We've employed it to extract podcast content and even to tidy up that content. Moreover, we can envision the LLM as a reasoning engine capable of handling the logic we would have previously coded.

This idea is still being explored and validated. One route in this direction is a framework called LangChain, a relatively new library that has attracted considerable interest as a way of building applications on top of LLMs.

## Using LangChain
LangChain is an open-source framework designed to simplify the creation of applications using large language models (LLMs). It provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.

In [1]:
!pip install langchain

Collecting langchain
  Downloading langchain-0.0.272-py3-none-any.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 7.8 MB/s eta 0:00:00
Collecting dataclasses-json<0.6.0,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.5.14-py3-none-any.whl (26 kB)
Collecting langsmith<0.1.0,>=0.0.21 (from langchain)
  Downloading langsmith-0.0.26-py3-none-any.whl (34 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.6.0,>=0.5.7->langchain)
  Downloading marshmallow-3.20.1-py3-none-any.whl (49 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.4/49.4 kB 5.5 MB/s eta 0:00:00
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.6.0,>=0.5.7->langchain)
  Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)
Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.6.0,>=0.5.7->langchain)
  Downloading mypy_extensions-1.0.0-py3-none-a

Reference : https://python.langchain.com/docs/use_cases/summarization
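
The cells that follow build a map-reduce summarization chain. As a rough mental model, the pattern can be sketched in plain Python without LangChain: one call per chunk (map), then a single call over the combined results (reduce). The function `map_reduce` and the two stand-in callables are hypothetical and only mimic where the LLM calls would happen.

```python
from typing import Callable, List


def map_reduce(chunks: List[str], map_fn: Callable[[str], str], reduce_fn: Callable[[str], str]) -> str:
    """Apply map_fn to every chunk, then pass the combined results to reduce_fn once."""
    mapped = [map_fn(chunk) for chunk in chunks]   # map step: one call per chunk
    combined = "\n".join(mapped)                   # put the partial results together
    return reduce_fn(combined)                     # reduce step: one final call


# Stand-ins (assumptions) for the two LLM calls.
fake_map = lambda chunk: f"- highlight from: {chunk}"
fake_reduce = lambda notes: f"SCRIPT based on {len(notes.splitlines())} highlights"

print(map_reduce(["chunk A", "chunk B", "chunk C"], fake_map, fake_reduce))
```

This sketch leaves out one thing the chain below handles: when the combined map results exceed `token_max`, they are first collapsed in smaller groups before the final reduce call.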

In [24]:
from langchain.chains.mapreduce import MapReduceChain
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import ReduceDocumentsChain, MapReduceDocumentsChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain


llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo",openai_api_key=openai.api_key)

# Map
map_template = """
You are a Sports enthusiast who is doing research for a podcast. Your task is to extract relevant information from the Result delimited by triple backticks.
Please identify 2 interesting questions and answers which can be used for a podcast discussion.
The identified discussions should be returned in the following format.
- Highlight 1 from the text
- Highlight 2 from the text
Result: ```{docs}```"""
map_prompt = PromptTemplate.from_template(map_template)

map_chain = LLMChain(llm=llm, prompt=map_prompt)


In [30]:
from langchain.chains.combine_documents.stuff import StuffDocumentsChain

# Reduce
reduce_template = """
You are a writer creating the script for another episode of the podcast Sport 101 hosted by \"Tom\" and \"Jerry\".
Use \"Tom\" as the person asking questions and \"Jerry\" as the person providing interesting insights to those questions.
Always specify the speaker name as \"Tom\" or \"Jerry\" to identify who is speaking.
Make the conversation casual and interesting.
Extract relevant information for the podcast conversation from the Result delimited by triple backticks.
Use the below format for the podcast conversation.
1. Introduction about the topic and welcome everyone to another episode of the podcast Sport 101.
2. Tom is the main host.
3. Introduce both the speakers in brief.
4. Then start the conversation.
5. Start the conversation with some casual discussion like what they are doing right now at this moment.
6. End the conversation with a thank-you speech to everyone.
7. Do not use the word \"conversation\" in the response.
8. Do not use the word \"Introduction\" in the response.

Result: ```{doc_summaries}```
"""
reduce_prompt = PromptTemplate.from_template(reduce_template)
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

# Takes a list of documents, combines them into a single string, and passes this to an LLMChain
combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain, document_variable_name="doc_summaries"
)

# Combines and iteratively reduces the mapped documents
reduce_documents_chain = ReduceDocumentsChain(
    # This is final chain that is called.
    combine_documents_chain=combine_documents_chain,
    # If documents exceed context for `StuffDocumentsChain`
    collapse_documents_chain=combine_documents_chain,
    # The maximum number of tokens to group documents into.
    token_max=3000,
)

In [31]:
from langchain.document_loaders import WikipediaLoader

# Combining documents by mapping a chain over them, then combining results
map_reduce_chain = MapReduceDocumentsChain(
    # Map chain
    llm_chain=map_chain,
    # Reduce chain
    reduce_documents_chain=reduce_documents_chain,
    # The variable name in the llm_chain to put the documents in
    document_variable_name="docs",
    # Return the results of the map steps in the output
    return_intermediate_steps=False,
)

text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=2500, chunk_overlap=0
)

docs = WikipediaLoader(query="Lionel_Messi").load()

split_docs = text_splitter.split_documents(docs)

In [29]:
print(map_reduce_chain.run(split_docs))

Tom: Welcome everyone to another episode of our podcast "Sport 101"! I'm your host Tom, and joining me today is our resident sports expert, Jerry. How are you doing today, Jerry?

Jerry: Hey Tom! I'm doing great, excited to dive into some interesting sports topics with you today. How about you?

Tom: I'm doing well, thanks for asking. Just finished watching a thrilling football match. It's always a great way to unwind. Speaking of football, let's talk about one of the greatest players of all time, Lionel Messi. He has achieved some incredible milestones throughout his career, hasn't he?

Jerry: Absolutely, Tom! Messi's list of achievements is truly remarkable. He has won a record seven Ballon d'Or awards and six European Golden Shoes, establishing himself as one of the greatest players in history. He holds numerous records, including the most goals in La Liga, most hat-tricks in La Liga and the UEFA Champions League, and most assists in La Liga and the Copa América. He's also the playe

As shown above, the LangChain framework can simplify the process of creating our podcast. This serves as an introduction to Gen AI and LangChain, illustrating how we can leverage them to develop captivating products.