# LangChain Modules Walkthrough

This notebook walks through examples of several [LangChain modules](https://python.langchain.com/docs/modules): Model I/O, Chains, Retrieval, and Memory.

## Setup

In [38]:
pip install ragstack-ai openai cassio tiktoken

Collecting tiktoken
  Downloading tiktoken-0.5.1-cp38-cp38-macosx_11_0_arm64.whl (924 kB)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.5.1
Note: you may need to restart the kernel to use updated packages.


## Model I/O

### Language Models

In [2]:
import os
from getpass import getpass

try:
    from google.colab import files
    IS_COLAB = True
except ModuleNotFoundError:
    IS_COLAB = False

In [4]:
OPENAI_API_KEY = getpass("Please enter your OpenAI API Key: ")

In [5]:
from langchain.llms import OpenAI

llm = OpenAI(openai_api_key=OPENAI_API_KEY)

In [6]:
# Generate text from the prompt "Write a poem about DataStax's Vector DB"
prediction = llm("Write a poem about DataStax's Vector DB")

# Print the prediction
print(prediction)



DataStax Vector DB is here for you
It's fast and efficient, that's no lie
It's a tool that will help you
And make sure that your work is done on time

It's a cloud-based database
One that is easy to use
It's a tool that helps you store
And manage all of your data in its database

It's a reliable and secure source
Where you can store your data
It's a tool that will help you
Organize all of your data with ease

DataStax Vector DB will help you
To manage and store all your data
It's an efficient, reliable tool
That you can trust with your data.


In [7]:
llm_result = llm.generate(["Tell me a joke about other Vector DBs", "Tell me a poem about DataStax's Vector DB"]*2)

In [8]:
llm_result.llm_output

{'token_usage': {'total_tokens': 349,
  'completion_tokens': 309,
  'prompt_tokens': 40},
 'model_name': 'text-davinci-003'}
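
The `token_usage` figures above appear to be totals across the whole batch of four prompts. As a minimal sketch (the `summarize_usage` helper is hypothetical, and the dictionary simply copies the values printed above), the numbers can be checked for consistency and averaged per prompt:

```python
# Values copied from the llm_output printed above.
llm_output = {
    "token_usage": {
        "total_tokens": 349,
        "completion_tokens": 309,
        "prompt_tokens": 40,
    },
    "model_name": "text-davinci-003",
}


def summarize_usage(output, num_prompts):
    """Hypothetical helper: check that the totals add up and average them per prompt."""
    usage = output["token_usage"]
    consistent = (
        usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
    )
    return {
        "consistent": consistent,
        "avg_prompt_tokens": usage["prompt_tokens"] / num_prompts,
        "avg_total_tokens": usage["total_tokens"] / num_prompts,
    }


# The generate() call above was given four prompts (two prompts, repeated twice).
summary = summarize_usage(llm_output, num_prompts=4)
print(summary)
```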

In [9]:
for result in llm_result.generations:
    print(result)

[Generation(text='\n\nQ: How do you make a Vector DB smile?\nA: With a SQL query!', generation_info={'finish_reason': 'stop', 'logprobs': None})]
[Generation(text="\n\nDataStax, the way to go\nFor your Vector DB needs to grow\nIt's the best in the market, they say\nIt's the solution that will lead the way\n\nFrom fast query speeds to high availability\nDataStax is here to set the table\nFor applications that need more than one\nDataStax Vector DB is the one\n\nWith its strong scalability and strong consistency\nDataStax Vector DB is the best you can see\nFor data that needs to be fresh and secure\nDataStax Vector DB is the answer for sure\n\nSo get your Vector DB up and running\nAnd your data will be humming\nDataStax Vector DB will make sure\nThat your data is always pure", generation_info={'finish_reason': 'stop', 'logprobs': None})]
[Generation(text='\n\nQ: What did the SQL say to the other Vector DB?\nA: "Let\'s join forces!"', generation_info={'finish_reason': 'stop', 'logprobs': 

## Chains

### LLM Chain

In [14]:
from langchain import LLMChain
from langchain import PromptTemplate

In [19]:
prompt_template = "Tell me some example use cases to leverage {technology} in the {industry} industry?"

llm_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template(prompt_template)
)

llm_chain(
    {
        "technology": "vector databases",
        "industry": "fintech"
    }
)

{'technology': 'vector databases',
 'industry': 'fintech',
 'text': '\n\n1. Fraud Detection: Vector databases can be used to detect financial fraud in real-time by analyzing large volumes of data quickly to identify unusual patterns and anomalies. \n\n2. Risk Management: Vector databases can be used to manage risk by analyzing historical data and making predictions about future trends.\n\n3. Customer Segmentation: Vector databases can be used to segment customers based on their financial behavior and preferences, allowing banks and other financial institutions to target specific customer groups and better understand their needs. \n\n4. Investment Analysis: Vector databases can be used to analyze investment portfolios, enabling financial institutions to quickly identify high-risk investments and make better decisions about where to allocate their funds. \n\n5. Trading: Vector databases can be used to analyze market trends, predict price movements, and make automated trading decisions.'}
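
Conceptually, the chain fills the template's placeholders from the input dictionary before the text is sent to the model. The following is a rough sketch of that step using only Python's standard library. It is not LangChain's own implementation, and `template_variables` is a hypothetical helper:

```python
from string import Formatter

prompt_template = "Tell me some example use cases to leverage {technology} in the {industry} industry?"


def template_variables(template):
    """Hypothetical helper: list the placeholder names found in a format string."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


inputs = {"technology": "vector databases", "industry": "fintech"}

# Discover which inputs the template expects, then substitute them.
variables = template_variables(prompt_template)
formatted_prompt = prompt_template.format(**inputs)

print(variables)
print(formatted_prompt)
```

This also explains why the result dictionary echoes `technology` and `industry` alongside `text`: the inputs are kept next to the generated output.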

## Retrieval

### Vector Stores

In [35]:
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

In [22]:
from getpass import getpass

astra_token = getpass("Please enter your Astra token ('AstraCS:...')")
database_id = input("Please enter your database id ('3df2a5b6-...')")

In [None]:
from langchain.vectorstores import Cassandra

In [24]:
# Your database's Secure Connect Bundle zip file is needed:
if IS_COLAB:
    print('Please upload your Secure Connect Bundle zipfile: ')
    uploaded = files.upload()
    if uploaded:
        astraBundleFileTitle = list(uploaded.keys())[0]
        ASTRA_DB_SECURE_BUNDLE_PATH = os.path.join(os.getcwd(), astraBundleFileTitle)
    else:
        raise ValueError(
            'Cannot proceed without Secure Connect Bundle. Please re-run the cell.'
        )
else:
    # Running in a local Jupyter notebook:
    ASTRA_DB_SECURE_BUNDLE_PATH = input("Please provide the full path to your Secure Connect Bundle zipfile: ")

ASTRA_DB_APPLICATION_TOKEN = getpass("Please provide your Database Token ('AstraCS:...' string): ")
ASTRA_DB_KEYSPACE = input("Please provide the Keyspace name for your Database: ")

In [36]:
# Don't mind the "Closing connection" error after "downgrading protocol..." messages,
# it is really just a warning: the connection will work smoothly.
cluster = Cluster(
    cloud={
        "secure_connect_bundle": ASTRA_DB_SECURE_BUNDLE_PATH,
    },
    auth_provider=PlainTextAuthProvider(
        "token",
        ASTRA_DB_APPLICATION_TOKEN,
    ),
)

session = cluster.connect()
keyspace = ASTRA_DB_KEYSPACE

In [25]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Cassandra
from langchain.document_loaders import TextLoader

In [42]:
!curl -o state_of_the_union.txt https://raw.githubusercontent.com/hwchase17/chroma-langchain/master/state_of_the_union.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 39027  100 39027    0     0   631k      0 --:--:-- --:--:-- --:--:--  668k


In [43]:
from langchain.document_loaders import TextLoader

SOURCE_FILE_NAME = "state_of_the_union.txt"

loader = TextLoader(SOURCE_FILE_NAME)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embedding_function = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
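
To give a feel for what the splitting step does, here is a simplified sketch that splits on blank lines and greedily merges the pieces into chunks of at most `chunk_size` characters. It illustrates the idea only, assuming a `"\n\n"` separator and no overlap; `split_text` is a hypothetical helper, not the `CharacterTextSplitter` code.

```python
def split_text(text, chunk_size=1000, separator="\n\n"):
    """Hypothetical helper: greedily merge separator-delimited pieces into chunks."""
    chunks = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Keep growing the current chunk. A single oversized piece is
            # kept whole rather than cut in the middle.
            current = candidate
        else:
            # The next piece would overflow the chunk, so start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc"
chunks = split_text(sample, chunk_size=10)
print(chunks)
```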

In [44]:
table_name = "astra_vector_rules"

docsearch = Cassandra.from_documents(
    documents=docs,
    embedding=embedding_function,
    session=session,
    keyspace=keyspace,
    table_name=table_name,
)

query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query)

In [45]:
print(docs[0].page_content)

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.


In [46]:
retriever = docsearch.as_retriever(search_type="mmr")
matched_docs = retriever.get_relevant_documents(query)
for i, d in enumerate(matched_docs):
    print(f"\n## Document {i}\n")
    print(d.page_content)


## Document 0

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.

## Document 1

We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. 

I recently visited the New York City Police Depa

In [47]:
found_docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10)
for i, doc in enumerate(found_docs):
    print(f"{i + 1}.", doc.page_content, "\n")

1. Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. 

2. We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. 

I recently visited the New York City Police Department days after the fu
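
Maximal marginal relevance (MMR) trades off relevance to the query against similarity to results that were already selected, which is why the second document above differs in topic from the first. Below is a minimal pure-Python sketch of the general technique on toy 2-D vectors. It is not the vector store's implementation; the `mmr` and `cosine` helpers, the `lambda_mult` weighting, and the example vectors are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Hypothetical helper: return indices of k documents chosen by MMR."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy is the highest similarity to any already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query_vec = [1.0, 0.0]
doc_vecs = [
    [0.9, 0.1],    # most relevant
    [0.9, 0.12],   # nearly a duplicate of the first
    [0.5, -0.5],   # less relevant, but different
]

diverse = mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5)
relevance_only = mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With `lambda_mult=1.0` the redundancy term vanishes and the selection reduces to plain similarity ranking; lower values push the selection toward diversity.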

In [48]:
metadata_filter = {"source": SOURCE_FILE_NAME}
filtered_docs = docsearch.similarity_search(query, filter=metadata_filter, k=5)
print(f"{len(filtered_docs)} documents retrieved.")
print(f"{filtered_docs[0].page_content[:64]} ...")

5 documents retrieved.
Tonight. I call on the Senate to: Pass the Freedom to Vote Act.  ...


In [49]:
metadata_filter = {"source": "nonexisting_file.txt"}
filtered_docs2 = docsearch.similarity_search(query, filter=metadata_filter)
print(f"{len(filtered_docs2)} documents retrieved.")

0 documents retrieved.


## Memory

### Chat Message History

In [50]:
from langchain.memory import ChatMessageHistory

history = ChatMessageHistory()

history.add_user_message("hi!")

history.add_ai_message("whats up?")

In [51]:
history.messages

[HumanMessage(content='hi!'), AIMessage(content='whats up?')]

### Conversation Buffer

In [52]:
from langchain.memory import ConversationBufferMemory

In [53]:
memory = ConversationBufferMemory()
memory.save_context({"input": "hi"}, {"output": "whats up"})

In [54]:
memory.load_memory_variables({})

{'history': 'Human: hi\nAI: whats up'}
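
The `history` string above is just the saved turns flattened into prefixed lines. A minimal sketch of that formatting, assuming the `Human`/`AI` prefixes seen in the output (`render_history` is a hypothetical helper, not part of LangChain):

```python
def render_history(turns, human_prefix="Human", ai_prefix="AI"):
    """Hypothetical helper: flatten (human, ai) turns into a prompt-ready string."""
    lines = []
    for human_text, ai_text in turns:
        lines.append(f"{human_prefix}: {human_text}")
        lines.append(f"{ai_prefix}: {ai_text}")
    return "\n".join(lines)


# The single turn saved with save_context above.
history_text = render_history([("hi", "whats up")])
print(repr(history_text))
```

Because the buffer keeps every turn, this string grows with each exchange, as the verbose prompts in the following cells show.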

In [56]:
from langchain.chains import ConversationChain

conversation = ConversationChain(
    llm=llm, 
    verbose=True, 
    memory=ConversationBufferMemory()
)

In [57]:
conversation.predict(input="Hi there!")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi there!
AI:

> Finished chain.


" Hi there! It's nice to meet you. I'm an AI programmed to provide information about my context. How can I help you?"

In [58]:
conversation.predict(input="I'm doing well! Just having a conversation with an AI.")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!
AI:  Hi there! It's nice to meet you. I'm an AI programmed to provide information about my context. How can I help you?
Human: I'm doing well! Just having a conversation with an AI.
AI:

> Finished chain.


" That's great to hear! It's an honor to have a conversation with you. What would you like to know about my context?"

In [59]:
conversation.predict(input="Tell me about yourself.")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!
AI:  Hi there! It's nice to meet you. I'm an AI programmed to provide information about my context. How can I help you?
Human: I'm doing well! Just having a conversation with an AI.
AI:  That's great to hear! It's an honor to have a conversation with you. What would you like to know about my context?
Human: Tell me about yourself.
AI:

> Finished chain.


" Sure. I'm an AI programmed to provide information about my context. I'm knowledgeable about a variety of topics, but I specialize in providing facts and details about my specific environment. I'm also able to answer questions, but if I can't find the answer, I'll let you know."

### Cassandra as Chat Memory Store

In [60]:
from langchain.memory import CassandraChatMessageHistory

message_history = CassandraChatMessageHistory(
    session_id='test-session',
    session=session,
    keyspace=keyspace,
    ttl_seconds=3600,
)
message_history.clear()

In [61]:
message_history.messages

[]

In [62]:
message_history.add_user_message('Hello, I am human.')

In [63]:
message_history.add_ai_message('Hello, I am the bot.')

In [64]:
message_history.messages

[HumanMessage(content='Hello, I am human.'),
 AIMessage(content='Hello, I am the bot.')]

In [65]:
template = """You are a quirky chatbot having a
conversation with a human, riddled with puns and silly jokes.

{chat_history}
Human: {human_input}
AI:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "human_input"], 
    template=template
)

In [66]:
f_message_history = CassandraChatMessageHistory(
    session_id='conversation-funny-a001',
    session=session,
    keyspace=keyspace,
)
f_message_history.clear()

In [67]:
f_memory = ConversationBufferMemory(
    memory_key="chat_history",
    chat_memory=f_message_history,
)

In [68]:
llm_chain = LLMChain(
    llm=llm, 
    prompt=prompt, 
    verbose=True, 
    memory=f_memory,
)

In [69]:
llm_chain.predict(human_input="Tell me about springs")



> Entering new LLMChain chain...
Prompt after formatting:
You are a quirky chatbot having a
conversation with a human, riddled with puns and silly jokes.


H: Tell me about springs
AI:

> Finished chain.


" Springs are great! They keep us bouncing with joy! It's a great way to stay active and have fun. Plus, they come in all shapes and sizes, so you can always find the perfect one for you."