# Chat Podcast

Author: Kenneth Leung

## 04. LangChain Conversational Retrieval

#### References
- https://python.langchain.com/en/latest/modules/chains/index_examples/chat_vector_db.html
- https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/pinecone.py
- https://github.com/hwchase17/langchain/blob/9a5268dc5feab0d9e7f67b569014d30b716622f8/langchain/chains/question_answering/__init__.py#L187
- https://github.com/hwchase17/langchain/blob/master/langchain/chains/conversational_retrieval/base.py

In [1]:
import os

import pinecone
from dotenv import load_dotenv
from langchain import LLMChain
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    # AIMessagePromptTemplate,
    HumanMessagePromptTemplate)
from langchain.vectorstores import FAISS, Pinecone


In [22]:
# Placeholder: replace with your own key, or call load_dotenv() to read it from a .env file
os.environ['OPENAI_API_KEY'] = 'your_key_here'

In [11]:
# Config settings
AUDIO_PATH = '../audio'
TRANSCRIPT_PATH = '../transcripts'
VECTORSTORE_PATH = '../vectorstore'

In [24]:
embeddings = OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY'])

In [7]:
llm = ChatOpenAI(
    model_name='gpt-3.5-turbo',
    temperature=0
)

#### Load from Pinecone

In [6]:
# # Initialize pinecone instance
# pinecone.init(
#     api_key=os.environ['PINECONE_API_KEY'],
#     environment=os.environ['PINECONE_ENV'])

# index_name = "chat-podcast"

# index = pinecone.Index(index_name)
# index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 2342}},
 'total_vector_count': 2342}

In [7]:
# docsearch = Pinecone.from_existing_index(index_name=index_name,
#                                          embedding=embeddings)

#### Load from FAISS

In [12]:
docsearch = FAISS.load_local(f'{VECTORSTORE_PATH}/all_podcasts', embeddings)

___
## Retrieval Methods
### (1) Vanilla LLMChain method

In [13]:
query = "What is the name of the guest in the L'Oreal episode?"

#### Cosine Similarity

In [26]:
docs = docsearch.similarity_search_with_score(query, k=2)
docs[0][0].page_content

"Thank you so much for this. Thank you for having me. Yeah, great to talk with you. Next time, Shervin and I talk with Stéphane Lanizot, Beauty Tech Program Director at L'Oreal. I'm always up for a good episode about cosmetics."

In [27]:
texts = [doc[0].page_content for doc in docs]
texts

["Thank you so much for this. Thank you for having me. Yeah, great to talk with you. Next time, Shervin and I talk with Stéphane Lanizot, Beauty Tech Program Director at L'Oreal. I'm always up for a good episode about cosmetics.",
 "Next time, Shervin and I talk with Stéphane Lanizot, Beauty Tech Program Director at L'Oreal. I'm always up for a good episode about cosmetics. Please join us. Thanks for listening to Me, Myself and AI. We believe, like you, that the conversation"]

In [28]:
scores = [doc[1] for doc in docs]
scores

[0.38260984, 0.39937183]
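
The best hit above has the lower score, which suggests these values are distances rather than cosine similarities. The sketch below is a rough conversion under two assumptions that this notebook does not verify: that the FAISS index uses a squared-L2 metric, and that the embeddings are unit-length. In that case `cos = 1 - d²/2`. The helper name `l2_squared_to_cosine` is hypothetical.

```python
import math

def l2_squared_to_cosine(distance_sq: float) -> float:
    """Convert a squared L2 distance between two unit vectors to cosine similarity."""
    # For unit vectors a and b: ||a - b||^2 = 2 - 2 * cos(a, b)
    return 1.0 - distance_sq / 2.0

def _normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Sanity check of the identity on two toy unit vectors
a = _normalize([1.0, 2.0, 3.0])
b = _normalize([2.0, 1.0, 0.5])
dist_sq = sum((x - y) ** 2 for x, y in zip(a, b))
cos_direct = sum(x * y for x, y in zip(a, b))
assert abs(l2_squared_to_cosine(dist_sq) - cos_direct) < 1e-9

# Applied to the scores above (meaningful only if both assumptions hold)
cosines = [l2_squared_to_cosine(s) for s in [0.38260984, 0.39937183]]
```

If the assumptions hold, a lower score means a closer match, so results should be ranked in ascending score order.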

In [29]:
# Concatenate the relevant chunks into context
context = " ".join(texts)
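
The two retrieved chunks overlap, so a plain join repeats the shared sentences in the context. One possible way to trim the repetition is sketched below. `merge_overlapping` and its `min_overlap` threshold are hypothetical, not part of LangChain, and the sample strings are shortened stand-ins for the real chunks.

```python
def merge_overlapping(chunks, min_overlap=10):
    """Join chunks, dropping text that a chunk repeats from the end of the previous one."""
    if not chunks:
        return ""
    merged = chunks[0]
    for chunk in chunks[1:]:
        max_len = min(len(merged), len(chunk))
        # Longest prefix of `chunk` (at least min_overlap chars) that `merged` already ends with
        overlap = next(
            (n for n in range(max_len, min_overlap - 1, -1) if merged.endswith(chunk[:n])),
            0,
        )
        # Insert a space only when nothing was trimmed
        merged += (" " if overlap == 0 else "") + chunk[overlap:]
    return merged

sample = [
    "Thanks for having me. Next time, we talk with a guest.",
    "Next time, we talk with a guest. Please join us.",
]
merged = merge_overlapping(sample)
```

The `min_overlap` threshold guards against trimming on a coincidental one- or two-character match.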

In [30]:
context

"Thank you so much for this. Thank you for having me. Yeah, great to talk with you. Next time, Shervin and I talk with Stéphane Lanizot, Beauty Tech Program Director at L'Oreal. I'm always up for a good episode about cosmetics. Next time, Shervin and I talk with Stéphane Lanizot, Beauty Tech Program Director at L'Oreal. I'm always up for a good episode about cosmetics. Please join us. Thanks for listening to Me, Myself and AI. We believe, like you, that the conversation"

#### Maximum Marginal Relevance
- For this query, MMR performs worse than plain similarity search: the second chunk it returns (shown below) is unrelated to L'Oreal.
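
To illustrate why MMR can surface a less relevant chunk, here is a minimal pure-Python sketch of the general technique. It is not LangChain's implementation; the function names and the `lambda_mult` weight are assumptions made for illustration. Each step picks the candidate with the best trade-off between relevance to the query and redundancy with the chunks already selected.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return the indices of k documents chosen by maximal marginal relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Highest similarity to anything already picked (0 for the first pick)
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy data: docs 0 and 1 are near-duplicates, doc 2 is less relevant but different
query_vec = [1.0, 0.0]
doc_vecs = [[1.0, 0.1], [1.0, 0.11], [0.5, 0.9]]

diverse = mmr(query_vec, doc_vecs, k=2, lambda_mult=0.3)   # favours diversity
relevant = mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0)  # pure relevance ranking
```

With a low `lambda_mult`, the near-duplicate is skipped in favour of the dissimilar document, which resembles the behaviour observed here.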

In [80]:
# docs = docsearch.max_marginal_relevance_search(query, k=3)
# docs[0].page_content

In [82]:
# texts = [doc.page_content for doc in docs]
# texts

["Thank you so much for this. Thank you for having me. Yeah, great to talk with you. Next time, Shervin and I talk with Stéphane Lanizot, Beauty Tech Program Director at L'Oreal. I'm always up for a good episode about cosmetics.",
 "But maybe we miss it someday. How's that working? How's that working out for you? I'll have to come back to that later episode and see how that plays out. But at least it's real time. You know, I'm trying to use the dog food of the things that we talk about on the show. Speaking of things we talk about on the show. How's that for a segue? So we have a segment where we ask our guests a series of rapid fire questions. And so the idea is you just hear this question and you give the first response that comes to your mind.",
 "that are being well received at L'Oreal? We are developing a solution to detect beauty trends. It's called Transporter. When you look at what is happening in the academic world, the research world, the macro influencer worlds, we are readi

In [83]:
# # Concatenate the relevant chunks into context
# context = " ".join(texts)

#### Prompt Setup

In [92]:
# Prompts
# Keep {context} and {question} as template variables (no f-string) so that
# curly braces in the retrieved text are not misread as placeholders.
system_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Helpful answer:"""

messages = [
    SystemMessagePromptTemplate.from_template(system_template),
    HumanMessagePromptTemplate.from_template('{question}')
]

prompt = ChatPromptTemplate.from_messages(messages)

In [93]:
llm_chain = LLMChain(prompt=prompt,
                     llm=llm,
                     verbose=False)

llm_chain.predict(context=context, question=query)

"The guest in the L'Oreal episode is Stéphane Lanizot, Beauty Tech Program Director at L'Oreal."

In [38]:
# Print the source episode URL of each retrieved chunk
for doc, _score in docs:
    print(doc.metadata['url'])

https://open.spotify.com/episode/4v6mkOECXaX4WUReVgAk65
https://open.spotify.com/episode/4v6mkOECXaX4WUReVgAk65


In [33]:
test = 'https://open.spotify.com/episode/4v6mkOECXaX4WUReVgAk65'

In [39]:
test2 = test.replace('.com/episode/', '.com/embed/episode/') + 'htest'

In [40]:
test2

'https://open.spotify.com/embed/episode/4v6mkOECXaX4WUReVgAk65htest'

In [None]:
# Example of the target embed URL format:
# https://open.spotify.com/embed/episode/3IagkEUqw4SGzeO4XzAbXd?utm_source=generator&t=60
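
The scratch cells above work toward turning an episode URL into an embed link with a start offset. The sketch below is one way to do that. It assumes the `t` query parameter takes whole seconds and that each chunk's metadata carries `url` and `start` keys, as in the source documents printed in this notebook. `to_embed_url` is a hypothetical helper.

```python
def to_embed_url(episode_url: str, start_seconds: float = 0) -> str:
    """Build a Spotify embed URL that starts at the given offset (assumed to be in seconds)."""
    embed_url = episode_url.replace('.com/episode/', '.com/embed/episode/')
    # Truncate to whole seconds; the `t` parameter format is an assumption
    return f'{embed_url}?utm_source=generator&t={int(start_seconds)}'

embed = to_embed_url('https://open.spotify.com/episode/3vYhtQxhoVY9jJfzDExafY', 64.08)
```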

___
### (2) RetrievalQA method

In [76]:
# qa = RetrievalQA.from_chain_type(llm=llm, 
#                                  chain_type="stuff", 
#                                  retriever=docsearch.as_retriever(),
#                                  return_source_documents=True)

# query = "What is the name of the guest in the Starbucks episode?"
# result = qa({"query": query})
# result

{'query': 'What is the name of the guest in the Starbucks episode?',
 'result': 'The guest in the Starbucks episode is Jerry Martin Flitinger.',
 'source_documents': [Document(page_content="Today we're talking with Jerry Martin Flitinger, former executive vice president and chief technology officer at Starbucks. Jerry, thanks for taking the time to talk with us. Welcome. Really great to have you here, Jerry. Thanks for having me.", metadata={'date': 'Jan-22', 'end': 78.44, 'start': 64.08, 'title': "Transforming a Technology Organization for the Future - Starbucks' Gerri Martin-Flickinger", 'url': 'https://open.spotify.com/episode/3vYhtQxhoVY9jJfzDExafY'}),
  Document(page_content="If you have a moment, please consider leaving us an Apple podcast review or a rating on Spotify and share our show with others you think might find it interesting and helpful. That's tricky because actually when you say latte, I know exactly what you mean because you mean exactly the one that I would drink. Y

___
### (3) ConversationalRetrievalChain method

In [78]:
# qa = ConversationalRetrievalChain.from_llm(llm=llm, 
#                                            retriever=docsearch.as_retriever(),
#                                            return_source_documents=True)

# chat_history = []

# query = "What is the name of the guest in the Starbucks episode?"
# result = qa({"question": query, "chat_history": chat_history})

# result

{'question': 'What is the name of the guest in the Starbucks episode?',
 'chat_history': [],
 'answer': 'The guest in the Starbucks episode is Jerry Martin Flitinger.',
 'source_documents': [Document(page_content="Today we're talking with Jerry Martin Flitinger, former executive vice president and chief technology officer at Starbucks. Jerry, thanks for taking the time to talk with us. Welcome. Really great to have you here, Jerry. Thanks for having me.", metadata={'date': 'Jan-22', 'end': 78.44, 'start': 64.08, 'title': "Transforming a Technology Organization for the Future - Starbucks' Gerri Martin-Flickinger", 'url': 'https://open.spotify.com/episode/3vYhtQxhoVY9jJfzDExafY'}),
  Document(page_content="If you have a moment, please consider leaving us an Apple podcast review or a rating on Spotify and share our show with others you think might find it interesting and helpful. That's tricky because actually when you say latte, I know exactly what you mean because you mean exactly the o
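
The call above passes an empty `chat_history`, so it behaves like a single-turn query. Follow-up questions depend on earlier turns being passed back in. The sketch below shows that bookkeeping, assuming the history is a list of `(question, answer)` tuples. `fake_qa` is a hypothetical stand-in for the real chain so that the block runs without an API key, and `ask` is a hypothetical helper.

```python
def fake_qa(inputs):
    """Hypothetical stand-in for the chain: reports how many prior turns it received."""
    turns = len(inputs["chat_history"])
    return {"answer": f"(answer to {inputs['question']!r} given {turns} prior turn(s))"}

def ask(qa, question, chat_history):
    """Query the chain, then record the turn so the next question can refer back to it."""
    result = qa({"question": question, "chat_history": chat_history})
    chat_history.append((question, result["answer"]))
    return result["answer"]

chat_history = []
first = ask(fake_qa, "What is the name of the guest in the Starbucks episode?", chat_history)
second = ask(fake_qa, "What was their role at the company?", chat_history)
```

Swapping `fake_qa` for the real `qa` object should leave the loop unchanged, provided the chain accepts the same input keys.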