# Chat Podcast

Author: Kenneth Leung

## 04. LangChain Conversational Retrieval

#### References
- https://python.langchain.com/en/latest/modules/chains/index_examples/chat_vector_db.html
- https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/pinecone.py
- https://github.com/hwchase17/langchain/blob/9a5268dc5feab0d9e7f67b569014d30b716622f8/langchain/chains/question_answering/__init__.py#L187
- https://github.com/hwchase17/langchain/blob/master/langchain/chains/conversational_retrieval/base.py

In [1]:
from dotenv import load_dotenv
from langchain import LLMChain
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.vectorstores import Pinecone
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    # AIMessagePromptTemplate,
    HumanMessagePromptTemplate)
from langchain.vectorstores import FAISS
import pinecone 
import os


In [2]:
load_dotenv(dotenv_path='../.env', verbose=True)

True

In [5]:
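# Placeholder only: skip this cell if OPENAI_API_KEY is already set in ../.env,
# otherwise it overwrites the key loaded by load_dotenv above
os.environ['OPENAI_API_KEY'] = 'your_key_here'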

In [3]:
# Config settings
AUDIO_PATH = '../audio'
TRANSCRIPT_PATH = '../transcripts'

In [6]:
embeddings = OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY'])

In [7]:
llm = ChatOpenAI(
                model_name='gpt-3.5-turbo',
                temperature=0
               )

#### Load from Pinecone

In [6]:
# Optional alternative to FAISS: uncomment this cell and the next one to load from Pinecone
# # Initialize pinecone instance
# pinecone.init(
#     api_key=os.environ['PINECONE_API_KEY'],
#     environment=os.environ['PINECONE_ENV'])

# index_name = "chat-podcast"

# index = pinecone.Index(index_name)
# index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 2342}},
 'total_vector_count': 2342}

In [7]:
# docsearch = Pinecone.from_existing_index(index_name=index_name,
#                                          embedding=embeddings)

#### Load from FAISS

In [8]:
docsearch = FAISS.load_local(f'{AUDIO_PATH}/vectorstore', embeddings)

___
## Retrieval Methods
### (1) Vanilla LLMChain method

In [9]:
query = "What is the name of the guest in the L'Oreal episode?"

#### Cosine Similarity

In [43]:
docs = docsearch.similarity_search_with_score(query, k=3)

In [44]:
docs[0][0].page_content

"Yeah, great to talk with you. Next time, Shervin and I talk with Stéphane Lanizot, Beauty Tech Program Director at L'Oreal. I'm always up for a good episode about cosmetics. Please join us. Thanks for listening to Me, Myself and AI."

In [45]:
texts = [doc[0].page_content for doc in docs]
texts

["Yeah, great to talk with you. Next time, Shervin and I talk with Stéphane Lanizot, Beauty Tech Program Director at L'Oreal. I'm always up for a good episode about cosmetics. Please join us. Thanks for listening to Me, Myself and AI.",
 "and I'm sure very valuable for all of our listeners. Thank you so much for this. Thank you for having me. Yeah, great to talk with you. Next time, Shervin and I talk with Stéphane Lanizot, Beauty Tech Program Director at L'Oreal.",
 "on the digital services of our solution that we develop. And it's really key to develop services that make an impact for the consumers. If you're enjoying today's episode, have a listen to last season's conversation with Somya Gautapati from L'Oreal. It's called The Beauty of AI, and you'll find it in our feed. That's fantastic. Sam, do you want to move to five questions?"]

In [46]:
scores = [doc[1] for doc in docs]
scores

[0.40004855, 0.40739563, 0.42649555]
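
The three scores come back in ascending order, so they look like distances (lower is closer) rather than cosine similarities. A minimal sketch, assuming the FAISS index stores unit-length embeddings and reports squared L2 distance (neither is confirmed in this notebook), of how such a score would map to cosine similarity via `cos = 1 - d / 2`. The helper names `cosine_from_squared_l2` and `normalize` are hypothetical.

```python
import math

def cosine_from_squared_l2(distance: float) -> float:
    """Convert a squared L2 distance between two unit vectors to cosine similarity.

    For unit vectors a and b: ||a - b||^2 = 2 - 2 * cos(a, b).
    """
    return 1.0 - distance / 2.0

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Sanity check of the identity on two toy unit vectors
a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.5])
squared_l2 = sum((x - y) ** 2 for x, y in zip(a, b))
cosine = sum(x * y for x, y in zip(a, b))
assert math.isclose(cosine_from_squared_l2(squared_l2), cosine)

# Scores copied from the output above (assumed to be squared L2 distances)
scores = [0.40004855, 0.40739563, 0.42649555]
similarities = [round(cosine_from_squared_l2(s), 3) for s in scores]
print(similarities)
```

If the index was built with a different metric, the conversion does not apply and the raw scores should be read as-is.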

#### Maximum Marginal Relevance

In [36]:
docs = docsearch.max_marginal_relevance_search(query, k=2)
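
Maximal marginal relevance trades relevance to the query against redundancy with chunks already picked. The call above delegates this to the vector store. The sketch below is a generic, pure-Python version of the greedy selection, not LangChain's implementation; the function names, the `lambda_mult` weighting, and the toy 2-D vectors are all illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return indices of up to k documents chosen greedily by MMR."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: similarity to the closest already-selected document
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy embeddings: docs 0 and 1 are near-duplicates, doc 2 points elsewhere
query_vec = [1.0, 0.0]
doc_vecs = [[1.0, 0.1], [1.0, 0.12], [0.8, -0.6]]

picked = mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5)
relevance_only = mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0)
print(picked, relevance_only)
```

With `lambda_mult=1.0` the redundancy term vanishes and the selection reduces to plain top-k by similarity, which is why the two results differ on the near-duplicate pair.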

In [37]:
docs[0].page_content

"Yeah, great to talk with you. Next time, Shervin and I talk with Stéphane Lanizot, Beauty Tech Program Director at L'Oreal. I'm always up for a good episode about cosmetics. Please join us. Thanks for listening to Me, Myself and AI."

In [38]:
texts = [doc.page_content for doc in docs]
texts

["Yeah, great to talk with you. Next time, Shervin and I talk with Stéphane Lanizot, Beauty Tech Program Director at L'Oreal. I'm always up for a good episode about cosmetics. Please join us. Thanks for listening to Me, Myself and AI.",
 "Maybe it's just the kinds of people that we're perhaps attracted to on the show, but it does seem to be showing up a lot. Absolutely. Okay. So is this time for the five questions? Should we do that or do that?"]

In [39]:
# Concatenate the relevant chunks into context
context = " ".join(texts)

In [40]:
# Prompts
system_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Helpful answer:"""

messages = [
    SystemMessagePromptTemplate.from_template(system_template),
    HumanMessagePromptTemplate.from_template('{question}')
]

prompt = ChatPromptTemplate.from_messages(messages)

In [41]:
llm_chain = LLMChain(prompt=prompt, 
                     llm=llm, 
                     verbose=False)

# Pass the context as a template variable instead of baking it in with an f-string,
# so curly braces inside a transcript chunk cannot break the prompt template
llm_chain.predict(context=context, question=query)

"The name of the guest in the L'Oreal episode is Stéphane Lanizot."

In [42]:
# Get metadata
for doc in docs:
    metadata = doc.metadata
    print(metadata)

{'text': "Yeah, great to talk with you. Next time, Shervin and I talk with Stéphane Lanizot, Beauty Tech Program Director at L'Oreal. I'm always up for a good episode about cosmetics. Please join us. Thanks for listening to Me, Myself and AI.", 'start': 1284.8400000000001, 'end': 1302.7199999999998, 'url': 'https://open.spotify.com/episode/4v6mkOECXaX4WUReVgAk65', 'date': 'Aug-22', 'title': "Precision Medicine in Pharma - Sanofi's Frank Nestle"}
{'text': "Maybe it's just the kinds of people that we're perhaps attracted to on the show, but it does seem to be showing up a lot. Absolutely. Okay. So is this time for the five questions? Should we do that or do that?", 'start': 1289.4, 'end': 1303.3200000000002, 'url': 'https://open.spotify.com/episode/1xaHE5Zu1ZN7ERdHmf9yZg', 'date': 'May-22', 'title': "The Collaboration Muscle - LinkedIn's Ya Xu"}
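
Each chunk's metadata carries `start` and `end` offsets plus the episode `title`, `date`, and `url`, which is enough to cite where an answer came from. A small sketch of a citation formatter, assuming the offsets are seconds from the start of the episode; `format_timestamp` and `format_source` are hypothetical helpers, not part of LangChain.

```python
def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as m:ss, or h:mm:ss past one hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"

def format_source(metadata: dict) -> str:
    """Build a one-line citation from a chunk's metadata dict."""
    start = format_timestamp(metadata["start"])
    end = format_timestamp(metadata["end"])
    return f'{metadata["title"]} ({metadata["date"]}), {start}-{end}: {metadata["url"]}'

# Metadata copied from the first chunk printed above (the 'text' field is omitted)
metadata = {
    "start": 1284.8400000000001,
    "end": 1302.7199999999998,
    "url": "https://open.spotify.com/episode/4v6mkOECXaX4WUReVgAk65",
    "date": "Aug-22",
    "title": "Precision Medicine in Pharma - Sanofi's Frank Nestle",
}
citation = format_source(metadata)
print(citation)
```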


___
### (2) RetrievalQA method

In [76]:
qa = RetrievalQA.from_chain_type(llm=llm, 
                                 chain_type="stuff", 
                                 retriever=docsearch.as_retriever(),
                                 return_source_documents=True)

query = "What is the name of the guest in the Starbucks episode?"
result = qa({"query": query})
result

{'query': 'What is the name of the guest in the Starbucks episode?',
 'result': 'The guest in the Starbucks episode is Jerry Martin Flitinger.',
 'source_documents': [Document(page_content="Today we're talking with Jerry Martin Flitinger, former executive vice president and chief technology officer at Starbucks. Jerry, thanks for taking the time to talk with us. Welcome. Really great to have you here, Jerry. Thanks for having me.", metadata={'date': 'Jan-22', 'end': 78.44, 'start': 64.08, 'title': "Transforming a Technology Organization for the Future - Starbucks' Gerri Martin-Flickinger", 'url': 'https://open.spotify.com/episode/3vYhtQxhoVY9jJfzDExafY'}),
  Document(page_content="If you have a moment, please consider leaving us an Apple podcast review or a rating on Spotify and share our show with others you think might find it interesting and helpful. That's tricky because actually when you say latte, I know exactly what you mean because you mean exactly the one that I would drink. Y

___
### (3) ConversationalRetrievalChain method

In [78]:
qa = ConversationalRetrievalChain.from_llm(llm=llm, 
                                           retriever=docsearch.as_retriever(),
                                           return_source_documents=True)

chat_history = []

query = "What is the name of the guest in the Starbucks episode?"
result = qa({"question": query, "chat_history": chat_history})

result

{'question': 'What is the name of the guest in the Starbucks episode?',
 'chat_history': [],
 'answer': 'The guest in the Starbucks episode is Jerry Martin Flitinger.',
 'source_documents': [Document(page_content="Today we're talking with Jerry Martin Flitinger, former executive vice president and chief technology officer at Starbucks. Jerry, thanks for taking the time to talk with us. Welcome. Really great to have you here, Jerry. Thanks for having me.", metadata={'date': 'Jan-22', 'end': 78.44, 'start': 64.08, 'title': "Transforming a Technology Organization for the Future - Starbucks' Gerri Martin-Flickinger", 'url': 'https://open.spotify.com/episode/3vYhtQxhoVY9jJfzDExafY'}),
  Document(page_content="If you have a moment, please consider leaving us an Apple podcast review or a rating on Spotify and share our show with others you think might find it interesting and helpful. That's tricky because actually when you say latte, I know exactly what you mean because you mean exactly the o
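
The call above passes an empty `chat_history`, so it behaves like a single-turn query. For follow-up questions the history has to grow after each turn. A minimal sketch, assuming the chain accepts `chat_history` as a list of `(question, answer)` tuples and returns a dict with an `answer` key as in the output above; `ask` is a hypothetical helper, and `fake_qa` is a stand-in so the block runs without an API key or vector store.

```python
def ask(qa, question, chat_history):
    """Query the chain, then record the turn so later questions can refer back to it."""
    result = qa({"question": question, "chat_history": chat_history})
    chat_history.append((question, result["answer"]))
    return result["answer"]

def fake_qa(inputs):
    """Stand-in for the real chain: echoes the question with a turn counter."""
    turn = len(inputs["chat_history"]) + 1
    return {"answer": f"answer {turn} to: {inputs['question']}"}

chat_history = []
ask(fake_qa, "What is the name of the guest in the Starbucks episode?", chat_history)
ask(fake_qa, "What was their role at the company?", chat_history)
print(chat_history)
```

Swapping `fake_qa` for the real `qa` object should work unchanged as long as the chain keeps the same input and output keys.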