## What is LangChain?

**LangChain** is a framework for developing applications powered by large language models (LLMs).

`TL;DR` 

LangChain makes the complicated parts of working & building with language models easier. It helps do this in two ways:

1. **Integration** - Bring external data, such as your files, other applications, and API data, to LLMs
2. **Agents** - Allows LLMs to interact with its environment via decision making and use LLMs to help decide which action to take next

![Langchain Image](../assets/langchain.png)

To build effective Generative AI applications, it is key to enable LLMs to interact with external systems. This makes models data-aware and agentic, meaning they can understand, reason, and use data to take action in a meaningful way. The external systems could be _public data corpus_, _private knowledge repositories_, _databases_, _applications_, _APIs_, or _access to the public internet via Google Search_.

Here are a few patterns where LLMs can be augmented with other systems:

* Convert natural language to SQL, executing the SQL on database, analyze and present the results
* Calling an external webhook or API based on the user query
* Synthesize outputs from multiple models, or chain the models in a specific order

It may look trivial to plumb these calls together and orchestrate them but it becomes a mundane task to write glue code again and again e.g. for every different data connector or a new model. That’s where LangChain comes in!


## Why LangChain?

LangChain’s modular implementation of components and common patterns combining these components makes it easier to build complex applications based on LLMs. LangChain enables these models to connect to data sources and systems as agents to take action.

1. **Components** are abstractions that works to bring external data, such as your documents, databases, applications,APIs to language models. LangChain makes it easy to swap out abstractions and components necessary to work with LLMs.

2. **Agents** enable language models to communicate with its environment, where the model then decides the next action to take. LangChain provides out of the box support for using and customizing 'chains' - a series of actions strung together.

Though LLMs can be straightforward (text-in, text-out) you'll quickly run into friction points that LangChain helps with once you develop more complicated applications.

## LangChain & Vertex AI

[Vertex AI PaLM foundational models](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview) — Text, Chat, and Embeddings — are officially integrated with the [LangChain Python SDK](https://python.langchain.com/en/latest/index.html), making it convenient to build applications on top of Vertex AI PaLM models. You can now create Generative AI applications by combining the power of Vertex PaLM models with the ease of use and flexibility of LangChain.

* [LangChain with Vertex AI PaLM for LLMs](https://python.langchain.com/en/latest/modules/models/llms/integrations/google_vertex_ai_palm.html)
* [LangChain with Vertex AI PaLM for Chat](https://python.langchain.com/en/latest/modules/models/chat/integrations/google_vertex_ai_palm.html)
* [LangChain with Vertex AI Embedding API for Text](https://python.langchain.com/en/latest/modules/models/text_embedding/examples/google_vertex_ai_palm.html)


## Objectives
This notebook provides an introductory understanding of Langchain components and use cases of LangChain with Vertex PaLM APIs.

* Introduce LangChain components
* Showcase LangChain + Vertex PaLM API - Text, Chat and Embedding
* Summarizing a large text
* Question/Answering from PDF (retrieval based)
* Chain LLMs with Google Search

In [9]:
# import libraries
from langchain.schema import HumanMessage, SystemMessage
from langchain.llms import VertexAI
from langchain.embeddings import VertexAIEmbeddings
from langchain.chat_models import ChatVertexAI
from google.cloud import aiplatform
from google.oauth2 import service_account
from dotenv import load_dotenv
from IPython.display import Markdown, display
import time
import vertexai
import os
from typing import List

# LangChain
import langchain
from pydantic import BaseModel

In [2]:
# check Langchain
print(f"LangChain version: {langchain.__version__}")

# check Vertex AI

print(f"Vertex AI SDK version: {aiplatform.__version__}")

LangChain version: 0.0.352
Vertex AI SDK version: 1.38.1


In [4]:
# initiate service account (authentication)
json_path = '../llm-ai.json' # replace with your own service account
credentials = service_account.Credentials.from_service_account_file(json_path)

In [6]:
# start Vertex AI
load_dotenv()
vertexai.init(project=os.environ["PROJECT_ID"], # replace with your own project
              credentials=credentials)

## LangChain Components

Let’s take a quick tour of LangChain framework and concepts to be aware of. LangChain offers a variety of modules that can be used to create language model applications. These modules can be combined to create more complex applications, or can be used individually for simpler applications.

![Langchain Component](../assets/langchain-component.png)

* **Models** are the building block of LangChain providing an interface to different types of AI models. Large Language Models (LLMs), Chat and Text Embeddings models are supported model types.
* **Prompts** refers to the input to the model, which is typically constructed from multiple components. LangChain provides interfaces to construct and work with prompts easily - Prompt Templates, Example Selectors and Output Parsers.
* **Memory** provides a construct for storing and retrieving messages during a conversation which can be either short term or long term.
* **Indexes** help LLMs interact with documents by providing a way to structure them. LangChain provides Document Loaders to load documents, Text Splitters to split documents into smaller chunks, Vector Stores to store documents as embeddings, and Retrievers to fetch relevant documents.
* **Chains** let you combine modular components (or other chains) in a specific order to complete a task.
* **Agents** are a powerful construct in LangChain allowing LLMs to communicate with external systems via Tools and observe and decide on the best course of action to complete a given task.

#### 1. Working with LLMs

In [7]:
# LLM model for text using langchain
llm = VertexAI(
    model_name="text-bison@001",
    max_output_tokens=256,
    temperature=0.1,
    top_p=0.8,
    top_k=40,
    verbose=True,
)


* **Text**

Text is the natural language way to interact with LLMs.

In [12]:
# We'll be working with simple strings (that'll soon grow in complexity!)
my_text = "What is the capital city of Azerbaijan?"

response = llm(my_text)

# you may use print or display
display(Markdown
    (response)
)

Baku is the capital and largest city of Azerbaijan. It is located on the Caspian Sea and has a population of over 2.1 million people. Baku is a major economic and cultural center in Azerbaijan and is home to many historical and cultural sites, including the Maiden Tower and the Shirvanshahs' Palace.

* **Chat Messages**

Chat is like text, but specified with a message type (System, Human, AI)

* System - Helpful context that tells the AI what to do
* Human - Messages intended to represent the user
* AI - Messages showing what the AI responded with

For more information, see [LangChain Documentation for Chat Models](https://python.langchain.com/en/latest/modules/models/chat/getting_started.html).

In [13]:
# Chat using Langchain
chat = ChatVertexAI()

In [14]:
# example to use Chat
chat([HumanMessage(content="Hello")])

AIMessage(content=' Hello! How can I help you today?')

In [15]:
# add more example by using conversation-like style
res = chat(
    [
        SystemMessage(
            content="You are a nice AI bot that helps a user figure out what to eat in one short sentence"
        ),
        HumanMessage(content="I like potato, what should I eat?"),
    ]
)

# you may use print or display
print(res.content)

 You could try making a potato-based dish like mashed potatoes, potato soup, or potato salad.


We can also pass more chat history w/ responses from the AI (memory)

In [17]:
res = chat(
    [
        HumanMessage(
            content="What are the ingredients required for making a mashed potato?"
        )
    ]
)
print(res.content)

 To make mashed potatoes, you will need the following ingredients:

- Potatoes: Russet potatoes are the best choice for mashed potatoes because they have a high starch content, which makes them creamy and fluffy.
- Milk: Whole milk is best for mashed potatoes because it adds richness and flavor.
- Butter: Butter adds flavor and richness to mashed potatoes.
- Salt and pepper: To taste.
- Optional ingredients: You can also add other ingredients to your mashed potatoes, such as sour cream, cheese, chives, or bacon.


In [18]:
res = chat([HumanMessage(content="How many people could enjoy the ingredients you said?")])
print(res.content)

 The ingredients I mentioned can serve approximately 4-6 people. This is based on the assumption that each person will have a moderate serving of each ingredient. However, the actual number of people who can enjoy the ingredients may vary depending on individual appetites and portion sizes.


### 2. Models - The interface to the AI brains
LangChain supports 3 model primitives:

* LLMs
* Chat Models
* Text Embedding Models

* **Language Model (LLM)**

Language model does text in ➡️ text out!

[LangChain](https://python.langchain.com/en/latest/modules/models/llms/integrations/google_vertex_ai_palm.html) LLMs are integrated with [Vertex AI PaLM API for Text](https://cloud.google.com/vertex-ai/docs/generative-ai/text/text-overview).

In [20]:
llm("What season comes after spring?")

'Summer is the season that comes after spring. It is the warmest season of the year and is characterized by long days and warm weather. Summer is the time for vacations, outdoor activities, and summer fun.'

* **Chat Model**

Chat model that takes a series of messages and returns a message output.

[LangChain Chat Model](https://python.langchain.com/en/latest/modules/models/chat/integrations/google_vertex_ai_palm.html) is integrated with [Vertex AI PaLM API for Chat](https://cloud.google.com/vertex-ai/docs/generative-ai/chat/chat-prompts).

In [23]:
res = chat(
    [
        SystemMessage(content="You are a helpful AI bot to figure out travel plans."),
        HumanMessage(content="I would like to go to Yogyakarta, how should I do this?"),
    ]
)
# you may use print or display
display(Markdown(res.content))

 To travel to Yogyakarta, you can take a flight from your current location to Yogyakarta International Airport (YIA). Several airlines operate direct flights to YIA, including Garuda Indonesia, Batik Air, and Lion Air. The flight duration varies depending on your departure city, but it typically takes around 1-2 hours.

Once you arrive at YIA, you can take a taxi or ride-hailing service to your hotel or destination in Yogyakarta. The airport is located about 10 kilometers from the city center, and the journey takes approximately 20-30 minutes.

Alternatively, you can also travel to Yogyakarta by

* **Text Embedding Model**

Embeddings are a way of representing data–almost any kind of data, like text, images, videos, users, music, whatever–as points in space where the locations of those points in space are semantically meaningful. Embeddings transform your text into a vector (a series of numbers that hold the semantic 'meaning' of your text). Vectors are often used when comparing two pieces of text together. An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors.

[LangChain Text Embedding Model](https://python.langchain.com/en/latest/modules/models/text_embedding/examples/google_vertex_ai_palm.html) is integrated with [Vertex AI Embedding API for Text](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings).

BTW: `Semantic` means _'relating to meaning in language or logic.'_

In [25]:
# Utility functions for Embeddings API with rate limiting
def rate_limit(max_per_minute):
    period = 60 / max_per_minute
    print("Waiting")
    while True:
        before = time.time()
        yield
        after = time.time()
        elapsed = after - before
        sleep_time = max(0, period - elapsed)
        if sleep_time > 0:
            print(".", end="")
            time.sleep(sleep_time)

In [29]:
class CustomVertexAIEmbeddings(VertexAIEmbeddings):
    requests_per_minute: int
    num_instances_per_batch: int

    # Overriding embed_documents method
    def embed_documents(self, texts: List[str]):
        limiter = rate_limit(self.requests_per_minute)
        results = []
        docs = list(texts)

        while docs:
            # Working in batches because the API accepts maximum 5
            # documents per request to get embeddings
            head, docs = (
                docs[: self.num_instances_per_batch],
                docs[self.num_instances_per_batch :],
            )
            chunk = self.client.get_embeddings(head)
            results.extend(chunk)
            next(limiter)

        return [r.values for r in results]

In [35]:
# Embedding
EMBEDDING_QPM = 100
EMBEDDING_NUM_BATCH = 5
embeddings = VertexAIEmbeddings(
    requests_per_minute=EMBEDDING_QPM,
    num_instances_per_batch=EMBEDDING_NUM_BATCH,
)

Model_name will become a required arg for VertexAIEmbeddings starting from Feb-01-2024. Currently the default is set to textembedding-gecko@001


In [36]:
text = "Hi! It's time for playing soccer"

In [37]:
text_embedding = embeddings.embed_query(text)
print(f"Your embedding is length {len(text_embedding)}")
print(f"Here's a sample: {text_embedding[:5]}...")

Your embedding is length 768
Here's a sample: [0.01769391819834709, 0.0001060579888871871, 0.01302630640566349, 0.03995654731988907, 0.03265838697552681]...


### 3. Prompt

Prompts are text used as instructions to your model. For more details have a look at the notebook [Intro to prompt engineering](https://github.com/ridwanspace/vertex-ai-gcp-notebook/blob/main/prompt/01%20-%20Intro%20to%20Prompt%20Engineering.ipynb).

In [38]:
prompt = """
Today is Monday, tomorrow is Wednesday.

What is wrong with that statement?
"""

llm(prompt)

'The statement is wrong because Wednesday comes after Tuesday, not Monday.'

* **Prompt Template**

[Prompt Template](https://python.langchain.com/en/latest/modules/prompts/prompt_templates.html) is an object that helps to create prompts based on a combination of user input, other non-static information and a fixed template string.

Think of it as an `f-string` in python but for prompts

In [39]:
from langchain import PromptTemplate

# Notice "location" below, that is a placeholder for another value later
template = """
I really want to travel to {location}. What should I do there?

Respond in one short sentence
"""

prompt = PromptTemplate(
    input_variables=["location"],
    template=template,
)

final_prompt = prompt.format(location="Rome")

print(f"Final Prompt: {final_prompt}")
print("-----------")
print(f"LLM Output: {llm(final_prompt)}")

Final Prompt: 
I really want to travel to Rome. What should I do there?

Respond in one short sentence

-----------
LLM Output: You should visit the Colosseum, the Pantheon and the Trevi Fountain.


* **Example Selectors**

[Example selectors](https://python.langchain.com/en/latest/modules/prompts/example_selectors.html) are an easy way to select from a series of examples to dynamically place in-context information into our prompt. Often used when the task is nuanced or has a large list of examples.

Check out different types of example selectors [here](https://python.langchain.com/docs/modules/model_io/prompts/example_selectors/)

In [40]:
from langchain.prompts import FewShotPromptTemplate, PromptTemplate
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import FAISS

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Example Input: {input}\nExample Output: {output}",
)

# Examples of locations that nouns are found
examples = [
    {"input": "pirate", "output": "ship"},
    {"input": "pilot", "output": "plane"},
    {"input": "driver", "output": "car"},
    {"input": "tree", "output": "ground"},
    {"input": "bird", "output": "nest"},
]

In [42]:
# SemanticSimilarityExampleSelector will select examples that are similar to your input by semantic meaning

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples,
    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.
    embeddings,
    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.
    FAISS,
    # This is the number of examples to produce.
    k=2,
)

In [43]:
similar_prompt = FewShotPromptTemplate(
    # The object that will help select examples
    example_selector=example_selector,
    # Your prompt
    example_prompt=example_prompt,
    # Customizations that will be added to the top and bottom of your prompt
    prefix="Give the location an item is usually found in",
    suffix="Input: {noun}\nOutput:",
    # What inputs your prompt will receive
    input_variables=["noun"],
)

In [44]:
# Select a noun!
my_noun = "student"

print(similar_prompt.format(noun=my_noun))

Give the location an item is usually found in

Example Input: driver
Example Output: car

Example Input: tree
Example Output: ground

Input: student
Output:


In [46]:
llm(similar_prompt.format(noun=my_noun))

'classroom'

* **Output Parsers**

[Output Parsers](https://python.langchain.com/en/latest/modules/prompts/output_parsers.html) help to format the output of a model. Usually used for structured output.

Two main ideas:

1. **Format Instructions**: An autogenerated prompt that tells the LLM how to format it's response based off desired result

2. **Parser**: A method to extract model's text output into a desired structure (usually `json`)

In [47]:
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

# How you would like your reponse structured. This is basically a fancy prompt template
response_schemas = [
    ResponseSchema(
        name="bad_string", description="This a poorly formatted user input string"
    ),
    ResponseSchema(
        name="good_string", description="This is your response, a reformatted response"
    ),
]

# How you would like to parse your output
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [48]:
# See the prompt template you created for formatting
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This a poorly formatted user input string
	"good_string": string  // This is your response, a reformatted response
}
```


In [49]:
template = """
You will be given a poorly formatted string from a user.
Reformat it and make sure all the words are spelled correctly including country, city and state names

{format_instructions}

% USER INPUT:
{user_input}

YOUR RESPONSE:
"""

prompt = PromptTemplate(
    input_variables=["user_input"],
    partial_variables={"format_instructions": format_instructions},
    template=template,
)

promptValue = prompt.format(user_input="welcom to dbln!")

print(promptValue)


You will be given a poorly formatted string from a user.
Reformat it and make sure all the words are spelled correctly including country, city and state names

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This a poorly formatted user input string
	"good_string": string  // This is your response, a reformatted response
}
```

% USER INPUT:
welcom to dbln!

YOUR RESPONSE:



In [51]:
# without parsing
llm_output = llm(promptValue)
llm_output

'```json\n{\n\t"bad_string": "welcom to dbln!",\n\t"good_string": "Welcome to Dublin!"\n}\n```'

In [52]:
# with parsing
output_parser.parse(llm_output)

{'bad_string': 'welcom to dbln!', 'good_string': 'Welcome to Dublin!'}

### 4. Indexes

[Indexes](https://docs.langchain.com/docs/components/indexing/) refer to ways to structure documents for LLMs to work with them.


* **Document Loaders**

Dcoument loaders are ways to import data from other sources. See the [growing list](https://python.langchain.com/en/latest/modules/indexes/document_loaders.html) of document loaders here. There are more on [Llama Index](https://llamahub.ai/) as well that work with LangChain Document Loaders.

In [53]:
from langchain.document_loaders import WebBaseLoader

In [54]:
loader = WebBaseLoader("http://www.paulgraham.com/worked.html")

In [56]:
data = loader.load()

In [57]:
print(f"Found {len(data)} comments")
print(f"Here's a sample:\n\n{''.join([x.page_content[:150] for x in data[:2]])}")

Found 1 comments
Here's a sample:

What I Worked On

February 2021Before college the two main things I worked on, outside of school,
were writing and programming. I didn't write essays.


* **Text Splitters**

[Text Splitters](https://python.langchain.com/docs/modules/data_connection/document_transformers/) are a way to deal with input token limits of LLMs by splitting text into chunks.

There are many ways you could split your text into chunks, experiment with [different ones](https://python.langchain.com/docs/modules/data_connection/document_transformers/) to see which is best for your use case.

In [58]:
loader = WebBaseLoader("http://www.paulgraham.com/worked.html")
pg_work = loader.load()

In [59]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=1000,
    chunk_overlap=20,
)

texts = text_splitter.split_documents(pg_work)

In [60]:
print(f"You have {len(texts)} documents")

You have 79 documents


In [61]:
print("Preview:")
print(texts[0].page_content, "\n")
print(texts[1].page_content)

Preview:
What I Worked On 

February 2021Before college the two main things I worked on, outside of school,
were writing and programming. I didn't write essays. I wrote what
beginning writers were supposed to write then, and probably still
are: short stories. My stories were awful. They had hardly any plot,
just characters with strong feelings, which I imagined made them
deep.The first programs I tried writing were on the IBM 1401 that our
school district used for what was then called "data processing."
This was in 9th grade, so I was 13 or 14. The school district's
1401 happened to be in the basement of our junior high school, and
my friend Rich Draves and I got permission to use it. It was like
a mini Bond villain's lair down there, with all these alien-looking
machines — CPU, disk drives, printer, card reader — sitting up
on a raised floor under bright fluorescent lights.The language we used was an early version of Fortran. You had to
type programs on punch cards, then stack them in

* **Retrievers**

[Retrievers](https://python.langchain.com/en/latest/modules/indexes/retrievers.html) are a way of storing data such that it can be queried by a language model. Easy way to combine documents with language models.

There are [many different types of retrievers](https://python.langchain.com/en/latest/modules/indexes/retrievers.html), the most widely supported is the VectoreStoreRetriever.

In [62]:
loader = WebBaseLoader("http://www.paulgraham.com/worked.html")
documents = loader.load()

Here we use [Facebook AI Similarity Search (FAISS)](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/), a library and a vector database for similarity search and clustering of dense vectors. To generate dense vectors, a.k.a. embeddings, we use LangChain text embeddings model with [Vertex AI Embeddings for Text](https://python.langchain.com/en/latest/modules/models/text_embedding/examples/google_vertex_ai_palm.html).

In [63]:
# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=50)

# Split your docs into texts
texts = text_splitter.split_documents(documents)

# Embed your texts (save the text as embeddings)
db = FAISS.from_documents(texts, embeddings)

Retrying langchain_community.embeddings.vertexai.VertexAIEmbeddings._get_embeddings_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised ServiceUnavailable: 503 Connection reset.


In [64]:
# Init your retriever. Asking for just 1 document back
retriever = db.as_retriever()
retriever

VectorStoreRetriever(tags=['FAISS', 'VertexAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x000002747FC0DDC0>)

In [65]:
# query the retriever to find similarity search
docs = retriever.get_relevant_documents(
    "what types of things did the author want to develop or build?"
)

print("\n\n".join([x.page_content[:200] for x in docs[:2]]))

Retrying langchain_community.embeddings.vertexai.VertexAIEmbeddings._get_embeddings_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised ServiceUnavailable: 503 Connection reset.


did.By then there was a name for the kind of company Viaweb was, an
"application service provider," or ASP. This name didn't last long
before it was replaced by "software as a service," but it was cur

not build a web app for making web apps? Why not let people edit
code on our server through the browser, and then host the resulting
applications for them?
[9]
You could run all sorts of services
on t


* **VectorStores**

[Vector Store](https://python.langchain.com/en/latest/modules/indexes/vectorstores.html) is a common type of index or a database to store vectors (numerical embeddings). Conceptually, think of them as tables with a column for embeddings (vectors) and a column for metadata.

Example

![Embedding Example](../assets/embedding-example.png)

[Chroma](https://www.trychroma.com/) & [FAISS](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/) are easy to work with locally.

[Vertex AI Matching Engine](https://cloud.google.com/blog/products/ai-machine-learning/vertex-matching-engine-blazing-fast-and-massively-scalable-nearest-neighbor-search) is fully managed vector store on Google Cloud, developers can just add the embeddings to its index and issue a search query with a key embedding for the blazingly fast vector search.

[**LangChain VectorStore is integrated with Vertex AI Matching Engine.**](https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/matchingengine.html)

In [66]:
loader = WebBaseLoader("http://www.paulgraham.com/worked.html")
documents = loader.load()

# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=50)

# Split your docs into texts
texts = text_splitter.split_documents(documents)

In [67]:
print(f"You have {len(texts)} documents")

You have 52 documents


In [68]:
embedding_list = embeddings.embed_documents([text.page_content for text in texts])

Retrying langchain_community.embeddings.vertexai.VertexAIEmbeddings._get_embeddings_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised ServiceUnavailable: 503 Connection reset.
Retrying langchain_community.embeddings.vertexai.VertexAIEmbeddings._get_embeddings_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised ServiceUnavailable: 503 Connection reset.
Retrying langchain_community.embeddings.vertexai.VertexAIEmbeddings._get_embeddings_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised ServiceUnavailable: 503 Connection reset.
Retrying langchain_community.embeddings.vertexai.VertexAIEmbeddings._get_embeddings_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised ServiceUnavailable: 503 Connection reset.


In [69]:
print(f"You have {len(embedding_list)} embeddings")
print(f"Here's a sample of one: {embedding_list[0][:3]}...")

You have 52 embeddings
Here's a sample of one: [-0.015958987176418304, -0.014001290313899517, 0.04511090740561485]...


`Vectorstore` stores your embeddings (☝️) and make them easily searchable.

### 5. Memory

[Memory](https://python.langchain.com/en/latest/modules/memory/getting_started.html) is the concept of storing and retrieving data in the process of a conversation. Memory helps LLMs remember information you've chatted about in the past or more complicated information retrieval.

There are many types of memory, explore [the documentation](https://python.langchain.com/en/latest/modules/memory/how_to_guides.html) to see which one fits your use case.


* **ConversationBufferMemory**

Memory keeps conversation state throughout a user’s interactions with an language model. `ConversationBufferMemory` memory allows for storing of messages and then extracts the messages in a variable.

We'll use `ConversationChain` to have a conversation and load context from memory. We will look into Chains in the next section.

In [71]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

conversation = ConversationChain(
    llm=llm, verbose=True, memory=ConversationBufferMemory()
)

conversation.predict(input="Hi there!")



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hi there!
AI:[0m

[1m> Finished chain.[0m


'Hi there! How can I help you today?'

In [72]:
conversation.predict(input="What is the capital of France?")



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!
AI: Hi there! How can I help you today?
Human: What is the capital of France?
AI:[0m

[1m> Finished chain.[0m


'Paris is the capital and largest city of France.'

In [73]:
conversation.predict(input="What are some popular places I can see in France?")



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!
AI: Hi there! How can I help you today?
Human: What is the capital of France?
AI: Paris is the capital and largest city of France.
Human: What are some popular places I can see in France?
AI:[0m

[1m> Finished chain.[0m


'The Eiffel Tower, the Louvre Museum, and the Palace of Versailles are some of the most popular tourist destinations in France.'

In [74]:
conversation.predict(input="What question did I ask first?")



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!
AI: Hi there! How can I help you today?
Human: What is the capital of France?
AI: Paris is the capital and largest city of France.
Human: What are some popular places I can see in France?
AI: The Eiffel Tower, the Louvre Museum, and the Palace of Versailles are some of the most popular tourist destinations in France.
Human: What question did I ask first?
AI:[0m

[1m> Finished chain.[0m


'You asked me what the capital of France is.'

### 6. Chains ⛓️⛓️⛓️

Chains are a generic concept in LangChain allowing to combine different LLM calls and action automatically.

In simple words:
```python
Summary #1, Summary #2, Summary #3 --> Final Summary
```


There are [many applications of chains](https://python.langchain.com/en/latest/modules/chains/how_to_guides.html) search to see which are best for your use case.

We'll cover a few of them:

* **Simple Sequential Chains**

[Sequential chains](https://python.langchain.com/en/latest/modules/chains/generic/sequential_chains.html) are a series of chains, called in deterministic order. `SimpleSequentialChain` are easy chains where each step uses the output of an LLM as an input into another. Good for breaking up tasks (and keeping the LLM focused).

In [75]:
from langchain.chains import LLMChain, SimpleSequentialChain
from langchain.prompts import PromptTemplate

In [76]:
template = """Your job is to come up with a classic dish from the area that the users suggests.
% USER LOCATION
{user_location}

YOUR RESPONSE:
"""
prompt_template = PromptTemplate(input_variables=["user_location"], template=template)

# Holds my 'location' chain
location_chain = LLMChain(llm=llm, prompt=prompt_template)

In [77]:
template = """Given a meal, give a short and simple recipe on how to make that dish at home.
% MEAL
{user_meal}

YOUR RESPONSE:
"""
prompt_template = PromptTemplate(input_variables=["user_meal"], template=template)

# Holds my 'meal' chain
meal_chain = LLMChain(llm=llm, prompt=prompt_template)

In [78]:
overall_chain = SimpleSequentialChain(chains=[location_chain, meal_chain], verbose=True)

In [80]:
review = overall_chain.run("Paris")



[1m> Entering new SimpleSequentialChain chain...[0m
[36;1m[1;3mCoq au vin[0m
[33;1m[1;3mIngredients:

* 1 cup dry red wine
* 1 cup chicken broth
* 1/2 cup brandy
* 1/4 cup chopped shallots
* 1/4 cup chopped garlic
* 1 bay leaf
* 1 teaspoon dried thyme
* 1 teaspoon dried rosemary
* 1/2 teaspoon salt
* 1/4 teaspoon freshly ground black pepper
* 1 3-pound chicken, cut into 8 pieces
* 2 tablespoons olive oil
* 1 tablespoon unsalted butter

Instructions:

1. In a large bowl, combine the wine, broth, brandy, shallots, garlic, bay leaf, thyme, rosemary, salt, and pepper. Add the chicken pieces and stir to coat. Cover and refrigerate for at least 30 minutes, or up to overnight.
2. Heat the olive oil and butter in a large skillet over medium-high heat. Add the chicken pieces and cook until browned on all sides. Transfer the chicken to a plate.
3. Add the marinade to the skillet and bring to a boil. Reduce heat to low and simmer for 15 minutes. Return the chicken to the skillet and cook

* **Summarization Chain**

[Summarization Chain](https://python.langchain.com/docs/modules/chains/popular/summarize) easily runs through a long numerous documents and get a summary.

There are multiple chain types such as Stuffing, Map-Reduce, Refine, Map-Rerank. Check out [documentation](https://python.langchain.com/docs/modules/chains/how_to/) for other chain types besides `map-reduce`.

In [83]:
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = WebBaseLoader(
    "https://cloud.google.com/blog/products/ai-machine-learning/get-ai-help-on-networking-tasks"
)
documents = loader.load()

print(f"# of words in the document = {len(documents[0].page_content)}")

# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=50)

# Split your docs into texts
texts = text_splitter.split_documents(documents)

# There is a lot of complexity hidden in this one line. I encourage you to check out the video above for more detail
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
chain.run(texts)

# of words in the document = 3333


[1m> Entering new MapReduceDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mWrite a concise summary of the following:


"Get ai help on networking tasks | Google Cloud BlogJump to ContentCloudBlogContact sales Get started for free CloudBlogSolutions & technologyAI & Machine LearningAPI ManagementApplication DevelopmentApplication ModernizationChrome EnterpriseComputeContainers & KubernetesData AnalyticsDatabasesDevOps & SREMaps & GeospatialSecurity & IdentityInfrastructureInfrastructure ModernizationNetworkingProductivity & CollaborationSAP on Google CloudStorage & Data TransferSustainabilityEcosystemIT LeadersIndustriesFinancial ServicesHealthcare & Life SciencesManufacturingMedia & EntertainmentPublic SectorRetailSupply ChainTelecommunicationsPartnersStartups & SMBTraining & CertificationsInside Google CloudGoogle Cloud Next & EventsGoogle Maps PlatformGoogle WorkspaceDevelopers & Practit

'Duet AI is a tool that can help you create networks in Google Cloud. It can answer your questions about how the tech works, how to use it, and what you need to get started. It can also provide implementation advice and commands to get you started.'

* **Question/Answering Chain**

[Question Answering Chains](https://python.langchain.com/docs/use_cases/question_answering/) easily do QA over a set of documents using QA chain. There are multiple ways to do this with LangChain. We use [RetrievalQA chain](https://python.langchain.com/en/latest/modules/chains/index_examples/vector_db_qa_with_sources.html) which uses `load_qa_chain` under the hood.

![QA](../assets/qa.png)

In [85]:
# Ingest PDF files
from langchain.document_loaders import PyPDFLoader

# Load GOOG's 10K annual report (92 pages).
url = "https://abc.xyz/investor/static/pdf/20230203_alphabet_10K.pdf"
loader = PyPDFLoader(url)
documents = loader.load()

In [86]:
# split the documents into chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
print(f"# of documents = {len(docs)}")

# of documents = 263


In [87]:
# select embedding engine - we use Vertex PaLM Embeddings API
embeddings

VertexAIEmbeddings(project=None, location='us-central1', request_parallelism=5, max_retries=6, stop=None, model_name='textembedding-gecko@001', client=<vertexai.language_models.TextEmbeddingModel object at 0x000002747E18C850>, client_preview=None, temperature=0.0, max_output_tokens=128, top_p=0.95, top_k=40, credentials=None, n=1, streaming=False, instance={'max_batch_size': 250, 'batch_size': 250, 'min_batch_size': 5, 'min_good_batch_size': 18, 'lock': <unlocked _thread.lock object at 0x000002747DB70B70>, 'batch_size_validated': False, 'task_executor': <concurrent.futures.thread.ThreadPoolExecutor object at 0x000002747DBEB1C0>, 'embeddings_task_type_supported': False})

In [89]:
# Store docs in local vectorstore as index
# it may take a while since API is rate limited
from langchain.vectorstores import Chroma

# you may use FAISS or Chroma
db = Chroma.from_documents(docs, embeddings)

Retrying langchain_community.embeddings.vertexai.VertexAIEmbeddings._get_embeddings_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised ServiceUnavailable: 503 Connection reset.
Retrying langchain_community.embeddings.vertexai.VertexAIEmbeddings._get_embeddings_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised ResourceExhausted: 429 Quota exceeded for aiplatform.googleapis.com/online_prediction_requests_per_base_model with base model: textembedding-gecko. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/quotas..
Retrying langchain_community.embeddings.vertexai.VertexAIEmbeddings._get_embeddings_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised ResourceExhausted: 429 Quota exceeded for aiplatform.googleapis.com/online_prediction_requests_per_base_model with base model: textembedding-gecko. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/quotas..
Retrying langchain_

In [90]:
# Expose index to the retriever
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 2})

In [91]:
# Create chain to answer questions
from langchain.chains import RetrievalQA

# Uses LLM to synthesize results from the search index.
# We use Vertex PaLM Text API for LLM
qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True
)

In [99]:
query = "What was Alphabet's net income in 2022?"
result = qa({"query": query})

print(result)


{'query': "What was Alphabet's net income in 2022?", 'result': "Alphabet's net income in 2022 was $59,972.", 'source_documents': [Document(page_content='Alphabet Inc.\nCONSOLIDATED STATEMENTS OF INCOME\n(in millions, except per share amounts)\n Year Ended December 31,\n 2020 2021 2022\nRevenues $ 182,527 $ 257,637 $ 282,836 \nCosts and expenses:\nCost of revenues  84,732  110,939  126,203 \nResearch and development  27,573  31,562  39,500 \nSales and marketing  17,946  22,912  26,567 \nGeneral and administrative  11,052  13,510  15,724 \nTotal costs and expenses  141,303  178,923  207,994 \nIncome from operations  41,224  78,714  74,842 \nOther income (expense), net  6,858  12,020  (3,514) \nIncome before income taxes  48,082  90,734  71,328 \nProvision for income taxes  7,813  14,701  11,356 \nNet income $ 40,269 $ 76,033 $ 59,972 \nBasic net income per share of Class A, Class B, and Class C stock $ 2.96 $ 5.69 $ 4.59 \nDiluted net income per share of Class A, Class B, and Class C sto

Check the result

![Income](../assets/alphabet.png)

In [100]:
query = "How much office space reduction took place in 2023?"
result = qa({"query": query})
print(result)

{'query': 'How much office space reduction took place in 2023?', 'result': 'The office space reduction in 2023 was approximately $0.5 billion.', 'source_documents': [Document(page_content='For revenues by geography see Note 2 .\nThe following table presents long-lived assets by geographic area, which includes property and equipment, net \nand operating lease assets (in millions):\nAs of December 31,\n 2021 2022\nLong-lived assets:\nUnited States $ 80,207 $ 93,565 \nInternational  30,351  33,484 \nTotal long-lived assets $ 110,558 $ 127,049 \nNote 16.    Subsequent Event  \nIn January 2023, we announced a reduction of our workforce of approximately 12,000  roles. We expect to \nincur employee severance and related charges of $1.9 billion  to $2.3 billion , the majority of which will be recognized in \nthe first quarter of 2023.\nIn addition, we are taking actions to optimize our global office space. As a result we expect to incur exit costs \nrelating to office space reductions of appro

Check the result

![Office reduction](../assets/alphabet2.png)