# Sample: Building a search system that combines the PaLM API and Matching Engine with LangChain

For instructions on using this notebook, see the following blog post.

[Let's build generative AI applications on Google Cloud! Part 7: Building a practical app by combining multiple services](https://zenn.dev/google_cloud_jp/articles/generative-retrieval-augmented-generation)


## Prerequisites

Install the LangChain library.

!pip install langchain==0.0.260 --user

Note: Restart the kernel right after installing the library.

## Inspecting the product catalog data

Download the product catalog CSV and convert it into a format that LangChain can handle.

In [1]:
!curl -OL https://github.com/GoogleCloudPlatform/python-docs-samples/raw/main/cloud-sql/postgres/pgvector/data/retail_toy_dataset.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 1351k  100 1351k    0     0  1895k      0 --:--:-- --:--:-- --:--:-- 1895k


In [2]:
from langchain.document_loaders import CSVLoader

loader = CSVLoader(
    file_path='./retail_toy_dataset.csv',
    csv_args={'delimiter': ','}
)

documents = loader.load()

In [3]:
print(documents[0].page_content)

product_id: 7e8697b5b7cdb5a40daf54caf1435cd5
crawl_timestamp: 2020-01-24 20:51:13 +0000
product_url: https://www.walmart.com/ip/Koplow-Games-Set-of-2-D12-12-Sided-Rock-Paper-Scissors-Game-Dice-White-with-Pink-Letters-13060/125007983
product_name: Koplow Games Set of 2 D12 12-Sided Rock, Paper, Scissors Game Dice - White with Pink Letters #13060
list_price: 3.56
sale_price: 3.56
brand: Koplow Games
item_number: 
gtin: 799443797461
package_size: 
category: Toys | Shop Toys by Price
postal_code: 
available: False
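
The `page_content` printed above is a series of `column: value` lines, one per CSV column. As a rough sketch of that conversion (not LangChain's actual implementation), the standard `csv` module can produce the same layout. The sample rows and the `row_to_page_content` helper are hypothetical and exist only for illustration.

```python
import csv
import io


def row_to_page_content(row):
    """Join one CSV row (a dict) into 'column: value' lines."""
    return "\n".join(f"{key}: {value}" for key, value in row.items())


# Hypothetical two-column sample standing in for retail_toy_dataset.csv.
sample_csv = "product_name,list_price\nGame Dice,3.56\nYouth Helmet,19.99\n"

reader = csv.DictReader(io.StringIO(sample_csv), delimiter=",")
pages = [row_to_page_content(row) for row in reader]
print(pages[0])
```

Each row becomes one text document, which is why the later similarity search returns whole product records.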


Convert the first record into an embedding vector and upload it to a storage bucket.

In [4]:
from langchain.embeddings import VertexAIEmbeddings

embeddings = VertexAIEmbeddings(model_name='textembedding-gecko')
initial_vector = embeddings.embed_documents(documents[:1])

In [5]:
len(initial_vector[0]), initial_vector[0][:5]

(768,
 [0.0005408113356679678,
  -0.03913634270429611,
  0.022778663784265518,
  0.013153254054486752,
  -0.011897163465619087])

In [6]:
import json
with open('initial_vector.json', 'w') as f:
    json.dump({'id': 'initial', 'embedding': initial_vector[0]}, f)
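
The cell above writes a single record with `id` and `embedding` keys. If several vectors had to be written, one JSON object per line would be a natural extension of the same layout. The sketch below assumes that layout; the `to_json_lines` helper and the 3-dimensional vectors are hypothetical.

```python
import json


def to_json_lines(records):
    """Serialize (id, vector) pairs as one JSON object per line."""
    return "\n".join(
        json.dumps({"id": rec_id, "embedding": vector})
        for rec_id, vector in records
    )


# Hypothetical 3-dimensional vectors; the notebook's vectors have 768 values.
records = [("initial", [0.1, 0.2, 0.3]), ("second", [0.4, 0.5, 0.6])]
payload = to_json_lines(records)
print(payload)
```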

In [7]:
%%bash
PROJECT_ID=$(gcloud config get project)
BUCKET=$PROJECT_ID-embeddings
gsutil mb -l us-central1 gs://$BUCKET
gsutil cp initial_vector.json gs://$BUCKET

Creating gs://etsuji-genai3-embeddings/...
Copying file://initial_vector.json [Content-Type=application/json]...
/ [1 files][ 16.6 KiB/ 16.6 KiB]                                                
Operation completed over 1 objects/16.6 KiB.                                     


## Storing data in Matching Engine

Note that the Matching Engine initial setup must be completed before you start this step.

In [10]:
from langchain.vectorstores import MatchingEngine

PROJECT_ID = !gcloud config get project

INDEX_ID= !gcloud ai indexes list \
  --region=us-central1 --format="value(name)" \
  --filter="displayName=retail-data" 2>/dev/null \
  | rev | cut -d "/" -f 1 | rev

INDEX_ENDPOINT_ID= !gcloud ai index-endpoints list --region=us-central1 \
  --format="value(name)" \
  --filter="displayName=retail-endpoint" 2>/dev/null \
  | rev | cut -d "/" -f 1 | rev

PROJECT_ID = PROJECT_ID[0]
INDEX_ID = INDEX_ID[0]
INDEX_ENDPOINT_ID = INDEX_ENDPOINT_ID[0]

vector_store = MatchingEngine.from_components(
    embedding=VertexAIEmbeddings(model_name='textembedding-gecko'),
    project_id=PROJECT_ID,
    region='us-central1',
    gcs_bucket_name='gs://{}-embeddings'.format(PROJECT_ID),
    index_id=INDEX_ID,
    endpoint_id=INDEX_ENDPOINT_ID
)
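
The `rev | cut -d "/" -f 1 | rev` pipeline in the cell above keeps only the last path segment of each resource name. As a sketch, the same extraction can be written in plain Python. The `resource_id` helper is hypothetical, and the sample name is the index resource name that appears in this notebook's update log.

```python
def resource_id(resource_name):
    """Return the last '/'-separated segment of a resource name."""
    return resource_name.rstrip("/").rsplit("/", 1)[-1]


# Index resource name as printed in this notebook's update log.
name = "projects/320807823234/locations/us-central1/indexes/1535085358142717952"
index_id = resource_id(name)
print(index_id)
```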

In [11]:
%%time
vector_store.add_documents(documents)

Updating MatchingEngineIndex index: projects/320807823234/locations/us-central1/indexes/1535085358142717952
Update MatchingEngineIndex index backing LRO: projects/320807823234/locations/us-central1/indexes/1535085358142717952/operations/4789337067064655872
MatchingEngineIndex index Updated. Resource name: projects/320807823234/locations/us-central1/indexes/1535085358142717952
CPU times: user 8.3 s, sys: 513 ms, total: 8.81 s
Wall time: 43min 25s


['313d0fef-e342-4897-9718-8a35bc657d33',
 '235f89ef-7df4-45ee-bc43-cebba71e777f',
 '397010d9-2a8b-4a69-89d0-512b74e298e1',
 'de8ce07f-5faa-4880-a2c5-84dfd36a1cf9',
 '7353212b-fb7b-44ba-bccc-9fe64b343d42',
 'eed3e6e7-3c8e-47a1-80f5-3ec5b5b18cf6',
 '9f721299-b527-477f-8de7-92ae0b30930d',
 '4bdfa42e-c796-4c3a-bcbf-47920ba05718',
 'd22085c3-03cb-4439-8030-b5047b463160',
 'c35867b8-0b41-49ed-a1bb-0779e0a4eff0',
 '5e19f4fb-2cab-47ae-a582-a93871683df1',
 '20e5ed37-c640-4d37-883b-c91e163752ab',
 'eb746930-1c2f-4d39-9760-33b82635585a',
 'b2f16629-276a-437a-ad6c-8ab96e85b62f',
 '0dc3940f-06bf-48d6-bcd7-dd09076c2406',
 '85b8c10e-677d-4a79-b81c-e990a2098546',
 '7b59190e-95c7-46a8-9408-f7aa62867a27',
 'fd53d6a7-77c5-41ce-9c35-934ecefa1986',
 '70a6d194-e490-47d3-a7f0-d13ed91b61d1',
 'a4ec2603-e37a-4082-89ee-c04d3b2af33e',
 '6a6d3ba8-81a7-41a7-b0a7-8d424249d21f',
 '44126ace-e6d8-4d75-ba40-6c9ef61dfccc',
 '5c0dd089-5035-4dfd-9a86-082080d670b3',
 'bb0459c5-e8ff-4f4a-be23-eb88ae2232f9',
 'cde95f3e-3b97-

In [185]:
result = vector_store.similarity_search('soccer')
print(result[0].page_content)

product_id: 39cee39aee26c1f821836ec38efe4666
crawl_timestamp: 2020-03-10 11:54:48 +0000
product_url: https://www.walmart.com/ip/Sport-Squad-Flux-Magnetic-Reversible-Soccer-Hockey-Tabletop-Multi-Sport-Game-Set-2ct-Soccer-Foosballs-2ct-Hockey-Pucks/554346135
product_name: Sport Squad Flux Magnetic Reversible Soccer & Hockey Tabletop Multi Sport Game Set, 2ct Soccer Foosballs, 2ct Hockey Pucks
description: The Sport Squad Flux is the latest innovation to combination gaming! Play soccer or reverse the playfield and play hockey! Control your players using the magnetic handles under the board and deliver perfect shots against your opponent! The Sport Squad Flux is the perfect miniature table top game for the entire family and measures 20" long, 12" wide, and 6" high. The Sport Squad Flux is also an excellent portable travel game for kids and adults! Make sure you grab a friend because the Flux requires two players for maximum fun! You will need to have a quick reaction time and focus to crus
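
`similarity_search` embeds the query and asks the index for the nearest stored vectors. Matching Engine does this with approximate nearest-neighbor search; the brute-force sketch below only illustrates the idea. The 3-dimensional vectors, the document IDs, and the choice of cosine similarity as the measure are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vector, store, k=1):
    """Return the IDs of the k stored vectors most similar to the query."""
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical toy index; real embeddings have 768 dimensions.
store = {
    "soccer-game": [0.9, 0.1, 0.0],
    "dice-set": [0.0, 0.8, 0.6],
    "helmet": [0.5, 0.5, 0.1],
}
query_vector = [1.0, 0.0, 0.0]
top = nearest(query_vector, store, k=2)
print(top)
```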

In [213]:
from langchain.llms import VertexAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

prompt_template = """The string enclosed in the backquotes (```) is product information.
From that information, please extract the product_name field, product_url field, and description field.
Your output should be in the following format.

Name: the product_name field
URL: the product_url field
Summary: short summary of the description field less than 100 words

Here's the product information: ```{context}```
"""

toy_search_llm = VertexAI(temperature=0.2, max_output_tokens=1024)
retriever = vector_store.as_retriever(search_kwargs={'include_metadata': True})
prompt = PromptTemplate(input_variables=['context'], template=prompt_template)

toy_search = RetrievalQA.from_llm(llm=toy_search_llm, prompt=prompt, retriever=retriever)

In [214]:
print(toy_search.run('sports'))

Name: Sport Squad Flux Magnetic Reversible Soccer & Hockey Tabletop Multi Sport Game Set, 2ct Soccer Foosballs, 2ct Hockey Pucks
URL: https://www.walmart.com/ip/Sport-Squad-Flux-Magnetic-Reversible-Soccer-Hockey-Tabletop-Multi-Sport-Game-Set-2ct-Soccer-Foosballs-2ct-Hockey-Pucks/554346135
Summary: The Sport Squad Flux is the latest innovation to combination gaming! Play soccer or reverse the playfield and play hockey!
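
Because the prompt asks for a fixed `Name:` / `URL:` / `Summary:` layout, the response can be turned into a dictionary for downstream use. The following is a minimal sketch under that assumption; the `parse_product` helper and the sample text are hypothetical, and real model output may not always follow the format.

```python
def parse_product(text):
    """Parse 'Name:', 'URL:', and 'Summary:' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so URLs keep their 'https:' part.
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("Name", "URL", "Summary"):
            fields[key.strip().lower()] = value.strip()
    return fields


# Hypothetical response in the format requested by the prompt.
sample = (
    "Name: Example Toy\n"
    "URL: https://example.com/toy\n"
    "Summary: A short summary."
)
product = parse_product(sample)
print(product)
```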


In [215]:
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType

tools = [
    Tool(
        name='Retail Toy QA System',
        func=toy_search.run,
        description='useful for when you need to answer questions about the toys. Input should be comma-separated words, do not input a fully formed question.',
    ),
]

agent_llm = VertexAI(temperature=0.4, max_output_tokens=1024)
agent = initialize_agent(
    tools, llm=agent_llm, 
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)
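
A zero-shot ReAct agent has the LLM emit `Action:` and `Action Input:` lines, then calls the named tool with that input and feeds the result back as an observation. The sketch below imitates one such step with a regular expression. It is not LangChain's parser; `parse_action`, `toy_lookup`, and `tool_registry` are hypothetical names used only for illustration.

```python
import re


def parse_action(llm_output):
    """Extract (tool name, tool input) from ReAct-style text, or None."""
    match = re.search(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", llm_output)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


def toy_lookup(keywords):
    """Hypothetical stand-in for toy_search.run."""
    return f"results for: {keywords}"


tool_registry = {"Retail Toy QA System": toy_lookup}

llm_output = (
    "I need to find a toy that is suitable for a 9-year-old girl who loves sports.\n"
    "Action: Retail Toy QA System\n"
    "Action Input: 9-year-old, girl, sports"
)
tool_name, tool_input = parse_action(llm_output)
observation = tool_registry[tool_name](tool_input)
print(observation)
```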

In [216]:
prompt_template = """You are a recommendation system that introduces the product that best matches the user's request.
The string enclosed in the backquotes (```) is the user's request. You must output only the product name and the url in the format as shown below.

Name: PRODUCT_NAME
URL: PRODUCT_URL

Begin!

Request: ```{query}```
"""

In [225]:
query = 'Can you suggest a birthday present for my 9-year-old daughter who loves sports?'
result = agent.run(prompt_template.format(query=query))



> Entering new AgentExecutor chain...
I need to find a toy that is suitable for a 9-year-old girl who loves sports.
Action: Retail Toy QA System
Action Input: 9-year-old, girl, sports
Observation: Name: Razor V17 Child Multi Sport Skateboard Scooter Kids Safety Helmet, Pink (2 Pack)
URL: https://www.walmart.com/ip/Razor-V17-Child-Multi-Sport-Skateboard-Scooter-Kids-Safety-Helmet-Pink-2-Pack/791564401
Summary: Your young, adventurous fashionista deserves to look cool and stylish but also needs top-notch protection.

Name: Razor V17 Multi-Sport Youth Helmet, Satin Pink
URL: https://www.walmart.com/ip/Razor-V17-Multi-Sport-Youth-Helmet-Satin-Pink/8457151
Summary: Ride your scooter, bike, or skateboard and rock this sporty, satin pink Razor V17 Youth Helmet to stay safe and protected.

Name: Bell Axle Shifter Bike Helmet, Pink, Youth 8+ (54-58cm)
URL: https://www.walmart.com/ip/Bell-Axle-Shifter-Bike-Helmet-Pink-Youth-8-54-58cm/49707954
Summary: The 

In [223]:
query = 'Can you suggest a birthday present for my 16-year-old son who loves techno music?'
result = agent.run(prompt_template.format(query=query))



> Entering new AgentExecutor chain...
I need to find a product that is techno related and suitable for a 16 year old
Action: Retail Toy QA System
Action Input: techno, 16 year old
Observation: Name: Razor V17 Youth Skateboard Scooter Unisex Adjustable Protective Helmet (2 Pack)
URL: https://www.walmart.com/ip/Razor-V17-Youth-Skateboard-Scooter-Unisex-Adjustable-Protective-Helmet-2-Pack/302831684
Summary: Top-of-the-line youth safety helmet for biking, scootering, skating, and more. Fits head sizes 22-23.5 inches. Fits child ages 8-14. Features ergonomic interior padding that is both comfortable and secure. 17 top and side vents keep riders heads cool during activity. Adjustable side release buckles help you tailor the fit to your needs. Complies with CPSC standards. Color: Purple. Manufacturer warranty: 6-month warranty. Quantity: 2

Name: Razor V17 Multi-Sport Youth Helmet, Satin Pink
URL: https://www.walmart.com/ip/Razor-V17-Multi-Sport-Youth-H