Notes:

- This notebook is based on the [LangChain Tutorial](https://github.com/gkamradt/langchain-tutorials) (by Greg Kamradt) and the Official LangChain [Documentation](https://docs.langchain.com/docs/).
- The goal is to provide an introductory understanding of the use cases of LangChain.
- The default models (`text-davinci-003` and `gpt-3.5-turbo`) are used throughout the notebook; GPT-4 would likely give better results.
- A <mark>paid</mark> `OPENAI_API_KEY` is required throughout the notebook.



## Setup






In [None]:
!pip install -q langchain==0.0.309
!pip install -q openai
!pip install -q python-dotenv tiktoken chromadb  p_tqdm tqdm bs4 weaviate-client


In [None]:
from dotenv import load_dotenv
import os

load_dotenv()

# assumes you have a .env file in the root directory
openai_api_key = os.getenv('OPENAI_API_KEY', 'PutYourAPIkeyHere_IfYouDontHaveEnvfile')

# Hands-on cases for today
* **QuickStart**
* **Retrieving data from Weaviate**
* **Question Answering over Context**
* **Agents**
* **Summarization**

## Quickstart in LangChain

Employing LLM, PromptTemplate, and OutputParser

**LLM**

In [None]:
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

llm = OpenAI()
chat_model = ChatOpenAI()

In [None]:
text = "What would be a cute company name for a company that makes dumplings?"

p = llm.predict(text)
print(p)

p = chat_model.predict(text)
print(p)



Dumpling Delights
"Dumpling Delights"


**PromptTemplate**

In [None]:
from langchain.prompts.chat import ChatPromptTemplate

template = "You are a helpful assistant that translates {input_language} to {output_language}."
human_template = "{text}"

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", template),
    ("human", human_template),
])

chat_prompt.format_messages(input_language="English", output_language="French", text="I love programming.")

[SystemMessage(content='You are a helpful assistant that translates English to French.'),
 HumanMessage(content='I love programming.')]

**OutputParser**

In [None]:
from langchain.schema import BaseOutputParser

class CommaSeparatedListOutputParser(BaseOutputParser):
    """Parse the output of an LLM call to a comma-separated list."""

    def parse(self, text: str):
        return text.strip().split(", ")

CommaSeparatedListOutputParser().parse("hi, bye")

['hi', 'bye']
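
The parser above splits on the exact string `", "`, so a reply such as `hi,bye` would come back as a single item. A minimal sketch of a more forgiving variant in plain Python follows; `parse_comma_separated` is a hypothetical helper name, and the same body could be placed inside `parse`.

```python
def parse_comma_separated(text: str) -> list[str]:
    """Split on commas, trim whitespace around each item, and drop empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


print(parse_comma_separated("hi,bye ,  see you, "))
```

Trimming each item separately means the result does not depend on whether the model puts a space after each comma.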

**LLM + PromptTemplate + OutputParser**

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import ChatPromptTemplate
from langchain.schema import BaseOutputParser

class CommaSeparatedListOutputParser(BaseOutputParser):
    def parse(self, text: str):
        return text.strip().split(", ")

template = """
You are a helpful assistant who generates comma separated lists.
A user will pass in a category, and you should generate 5 objects in that category in a comma separated list.
ONLY return a comma separated list, and nothing more.
"""

human_template = "{text}"

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", template),
    ("human", human_template),
])

chain = chat_prompt | ChatOpenAI() | CommaSeparatedListOutputParser()

In [None]:
chain.invoke( {"text": "colors"} )

['red', 'blue', 'green', 'yellow', 'orange']
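
The `|` operator in the chain above feeds each component's output into the next one. The toy class below sketches that idea with Python's `__or__` hook. `Step` is a hypothetical stand-in, not LangChain's actual implementation, and the "model" returns a canned reply so that no API call is made.

```python
class Step:
    """Hypothetical stand-in for a chain component (not LangChain's Runnable)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that feeds a's output into b
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


format_prompt = Step(lambda inputs: f"List 5 {inputs['text']}, comma separated.")
fake_model = Step(lambda prompt: "red, blue, green, yellow, orange")  # canned reply
parse_list = Step(lambda reply: reply.strip().split(", "))

toy_chain = format_prompt | fake_model | parse_list
print(toy_chain.invoke({"text": "colors"}))
```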

**Load a defined chain from LangChain's [chainhub](https://github.com/hwchase17/langchain-hub)**

In [None]:
from langchain.chains import load_chain

chain = load_chain("lc://chains/llm-math/chain.json")
chain.run("whats 2 raised to .12")





> Entering new LLMMathChain chain...
whats 2 raised to .12
Answer: 1.0791812460476249
> Finished chain.


'Answer: 1.0791812460476249'

**Save a chain**

In [None]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=OpenAI(temperature=0), verbose=True)
llm_chain.save("llm_chain.json")

In [None]:
%cat llm_chain.json

{
    "memory": null,
    "verbose": true,
    "tags": null,
    "metadata": null,
    "prompt": {
        "input_variables": [
            "question"
        ],
        "input_types": {},
        "output_parser": null,
        "partial_variables": {},
        "template": "Question: {question}\n\nAnswer: Let's think step by step.",
        "template_format": "f-string",
        "validate_template": true,
        "_type": "prompt"
    },
    "llm": {
        "model_name": "text-davinci-003",
        "temperature": 0.0,
        "max_tokens": 256,
        "top_p": 1,
        "frequency_penalty": 0,
        "presence_penalty": 0,
        "n": 1,
        "request_timeout": null,
        "logit_bias": {},
        "_type": "openai"
    },
    "output_key": "text",
    "output_parser": {
        "_type": "default"
    },
    "return_final_only": true,
    "llm_kwargs": {},
    "_type": "llm_chain"
}

## Vectorize Data to Weaviate

In [None]:
# !pip install bs4 python-dotenv langchain p_tqdm tqdm aiohttp lxml openai
from dataclasses import dataclass, asdict
import json
import os
from pathlib import Path
from typing import Literal

from bs4 import BeautifulSoup as bs
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from p_tqdm import p_map
import weaviate
from weaviate.util import generate_uuid5

load_dotenv("../../.env", override=True)

True

**Set path to files**

In [None]:
LIMIT = 100
ptt_files = list(Path("../../other_data/ptt/").glob("**/*.xml"))[:LIMIT]
print(f"Number of files: {len(ptt_files)}")

Number of files: 100


**Inspect a post**

In [None]:
with ptt_files[7].open() as f:
    soup = bs(f.read(), "xml")
print(soup.prettify())

<?xml version="1.0" encoding="utf-8"?>
<TEI.2>
 <teiHeader>
  <metadata name="media">
   ptt
  </metadata>
  <metadata name="author">
   wupaul (捷派陣線聯盟)
  </metadata>
  <metadata name="post_id">
   M.1617787676.A.06E
  </metadata>
  <metadata name="year">
   2021
  </metadata>
  <metadata name="board">
   HatePolitics-ptt
  </metadata>
  <metadata name="title">
   [公告] wupaul 違反政黑板規2-16 前科*2 水桶14天
  </metadata>
 </teiHeader>
 <text>
  <body author="wupaul (捷派陣線聯盟)">
   <s>
    <w type="Na">
     當事人
    </w>
    <w type="COLONCATEGORY">
     :
    </w>
    <w type="FW">
     wupaul
    </w>
   </s>
   <s>
    <w type="VE">
     判決
    </w>
    <w type="Na">
     依據
    </w>
    <w type="COLONCATEGORY">
     ︰
    </w>
   </s>
   <s>
    <w type="Neu">
     16
    </w>
    <w type="PERIODCATEGORY">
     .
    </w>
    <w type="Nb">
     每
    </w>
    <w type="Nf">
     日
    </w>
    <w type="VC">
     發文
    </w>
    <w type="FW">
     /
    </w>
    <w type="Na">
     回文
    </w>
   

**Extract data**

In [None]:
SPLITTER = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=8000, chunk_overlap=0
)


# https://realpython.com/python-data-classes/
# https://realpython.com/python-type-checking/
@dataclass
class ContentItem:
    media: Literal["ptt"]  # media source of the post or comment
    content_type: Literal["post", "comment"]  # post or comment
    author: str  # author of the post or comment
    post_id: str  # id of the post (comments share id with the post)
    year: str  # year of the post
    board: str  # board of the post (NTU-ptt, gossiping, etc.)
    title: str  # title of the post
    text: str  # text of the post or comment
    rating: Literal[
        "pos", "neu", "neg", ""
    ]  # rating of the comment (positive, neutral, negative)
    order: int  # 0 for post, 1, 2, 3, ... for comments
    chunk: int  # if text too long, split into chunks
    total_chunks: int  # total number of chunks


def get_comments(parent: ContentItem, soup: bs) -> list[ContentItem]:
    """
    Get comments from a post.

    Args:
        parent: ContentItem object of the post
        soup: BeautifulSoup object of the post

    Returns:
        List of ContentItem objects
    """
    res = []
    comments = soup.find_all("comment")
    content_type = "comment"

    for comment_idx, comment in enumerate(comments, 1):
        author = comment["author"]
        rating = comment["c_type"]
        text = comment.get_text().replace("\n", "")
        chunks = SPLITTER.split_text(text)
        if not chunks:
            chunks = [""]
        for chunk_idx, chunk in enumerate(chunks, 1):
            res.append(
                ContentItem(
                    media=parent.media,
                    content_type=content_type,
                    post_id=parent.post_id,
                    author=author,
                    rating=rating,
                    text=chunk,
                    year=parent.year,
                    board=parent.board,
                    title=parent.title,
                    order=comment_idx,  # 0 for post, 1, 2, 3, ... for comments
                    chunk=chunk_idx,
                    total_chunks=len(chunks),
                )
            )
    return res


def get_post_info(path: Path) -> list[ContentItem]:
    """
    Get post information from a post

    Args:
        path: path to the post

    Returns:
        List of ContentItem objects
    """
    content_type = "post"

    with path.open() as f:
        soup = bs(f.read(), "xml")

    media = soup.find("metadata", attrs={"name": "media"}).get_text().replace("\n", "")
    author = (
        soup.find("metadata", attrs={"name": "author"}).get_text().replace("\n", "")
    )
    post_id = (
        soup.find("metadata", attrs={"name": "post_id"}).get_text().replace("\n", "")
    )
    year = soup.find("metadata", attrs={"name": "year"}).get_text().replace("\n", "")
    board = soup.find("metadata", attrs={"name": "board"}).get_text().replace("\n", "")
    title = soup.find("metadata", attrs={"name": "title"}).get_text().replace("\n", "")
    text = soup.find("body").get_text().replace("\n", "")
    chunks = SPLITTER.split_text(text)
    if not chunks:
        chunks = [""]

    posts = []
    for idx, chunk in enumerate(chunks, 1):
        posts.append(
            ContentItem(
                media=media,
                author=author,
                post_id=post_id,
                year=year,
                board=board,
                title=title,
                text=chunk,
                rating="",
                content_type=content_type,
                order=0,  # 0 for post, 1, 2, 3, ... for comments
                chunk=idx,
                total_chunks=len(chunks),
            )
        )
    if not posts:
        print(f"Empty post: {path}")
        raise ValueError(path)  # shouldn't happen

    comments = get_comments(posts[0], soup)

    return posts + comments


def dedupe(items: list[ContentItem]) -> list[ContentItem]:
    """
    Dedupe items

    Args:
        items: list of ContentItem objects

    Returns:
        List of ContentItem objects
    """
    res = []
    seen = set()
    for item in items:
        dumps = json.dumps(asdict(item))  # strings are hashable
        if dumps not in seen:
            seen.add(dumps)
            res.append(item)
    return res
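
A dataclass declared with the default settings is not hashable, which is why `dedupe` serializes each item to a JSON string before putting it in a set. The sketch below shows the same technique on a much smaller, hypothetical `Item` class. `sort_keys=True` is an addition here to make the key independent of field order.

```python
from dataclasses import asdict, dataclass
import json


@dataclass
class Item:  # hypothetical, much smaller than ContentItem
    author: str
    text: str


def dedupe(items: list[Item]) -> list[Item]:
    """Keep the first occurrence of each item, preserving order."""
    seen: set[str] = set()
    unique = []
    for item in items:
        key = json.dumps(asdict(item), sort_keys=True)  # stable, hashable key
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


items = [Item("a", "hello"), Item("b", "hi"), Item("a", "hello")]
print(len(dedupe(items)))
```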

**Use multiprocessing**

- Use multiprocessing when you are CPU-bound

- More info: https://realpython.com/python-concurrency/#multiprocessing-version

In [None]:
res = p_map(get_post_info, ptt_files)  # default uses all cores
res = dedupe([item for sublist in res for item in sublist])  # flatten list of lists

print(f"Number of posts/comments: {len(res)}")


Number of posts/comments: 2577


**Initialize Vector DB Weaviate.Client**

- The `url` depends on your IP/url if self-hosted. See also: [Weaviate installation](https://weaviate.io/developers/weaviate/installation)

- You can also use Weaviate's services: [here](https://weaviate.io/products)

In [None]:
client = weaviate.Client(
    url=os.environ["WEAVIATE_URL"],
    auth_client_secret=weaviate.AuthApiKey(api_key=os.environ["WEAVIATE_ADMIN_PASS"]),
    timeout_config=(5, 30),  # (connect timeout, read timeout) # type: ignore
    additional_headers={"X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]},
)

**Creating a schema**

- Each Weaviate class requires a `schema` that defines the data structure in a formal language.
  
- `schema` is a blueprint of how the data is to be organized and stored.

- `schema` defines:
  - data classes (i.e., collections of objects),
  
  - properties within each class (e.g., name, type, description, settings),
  
  - possible graph links between data objects (cross-references),
  
  - the vectorizer module (if any) to be used for the class,
  
  - other settings, such as index configurations.


- More info: https://weaviate.io/developers/weaviate/tutorials/schema

In [None]:
# Our schema for the content items (posts and comments from PTT)

schema = {
    "class": "TestContentItem",
    "description": "General content item",
    "moduleConfig": {"text2vec-openai": {"vectorizeClassName": False}},
    "vectorizer": "text2vec-openai",  # This could be any vectorizer
    "properties": [
        {
            "name": "media",
            "description": "Source of the content",
            "dataType": ["text"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": True,
                    "vectorizePropertyName": False,
                }
            },
        },
        {
            "name": "content_type",
            "description": "Type of the content",
            "dataType": ["text"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": True,
                    "vectorizePropertyName": False,
                }
            },
        },
        {
            "name": "author",
            "description": "Author of the content",
            "dataType": ["text"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": True,
                    "vectorizePropertyName": False,
                }
            },
        },
        {
            "name": "post_id",
            "description": "Post id of the content",
            "dataType": ["text"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": True,
                    "vectorizePropertyName": False,
                }
            },
        },
        {
            "name": "year",
            "description": "Year of the content",
            "dataType": ["text"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": True,
                    "vectorizePropertyName": False,
                }
            },
        },
        {
            "name": "board",
            "description": "Board of the content",
            "dataType": ["text"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": False,
                    "vectorizePropertyName": True,
                }
            },
        },
        {
            "name": "title",
            "description": "Title of the content",
            "dataType": ["text"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": False,
                    "vectorizePropertyName": True,
                }
            },
        },
        {
            "name": "text",
            "description": "Text of the content",
            "dataType": ["text"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": False,
                    "vectorizePropertyName": True,
                }
            },
        },
        {
            "name": "rating",
            "description": "Rating of the content",
            "dataType": ["text"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": False,
                    "vectorizePropertyName": True,
                }
            },
        },
        {
            "name": "order",
            "description": "0 for post, 1, 2, 3, ... for comments",
            "dataType": ["int"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": True,
                    "vectorizePropertyName": False,
                }
            },
        },
        {
            "name": "chunk",
            "description": "Chunk of the current content",
            "dataType": ["int"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": True,
                    "vectorizePropertyName": False,
                }
            },
        },
        {
            "name": "total_chunks",
            "description": "Total chunks of the content",
            "dataType": ["int"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": True,
                    "vectorizePropertyName": False,
                }
            },
        },
    ],
}

In [None]:
client.schema.create_class(schema)
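
Every property in the schema above pairs `skip: True` with `vectorizePropertyName: False`, or the reverse, so the entries are highly repetitive. A small helper can generate them. This is only a sketch: `make_property` and its `vectorize` flag are hypothetical names, and the dictionary layout is copied from the schema in this notebook.

```python
def make_property(name: str, description: str, data_type: str, vectorize: bool) -> dict:
    """Build one property entry in the shape used by the schema in this notebook."""
    return {
        "name": name,
        "description": description,
        "dataType": [data_type],
        "moduleConfig": {
            "text2vec-openai": {
                "skip": not vectorize,  # skipped properties are not embedded
                "vectorizePropertyName": vectorize,
            }
        },
    }


properties = [
    make_property("author", "Author of the content", "text", vectorize=False),
    make_property("title", "Title of the content", "text", vectorize=True),
    make_property("order", "0 for post, 1, 2, 3, ... for comments", "int", vectorize=False),
]
print(properties[1]["moduleConfig"])
```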

**Upload to Weaviate**

The code below will automatically generate a vector for each item using the `vectorizer` specified in the schema.

In [None]:
client.batch.configure(
    num_workers=16,
    batch_size=100,
    dynamic=True,
)
with client.batch as batch:
    for item in res:
        batch.add_data_object(
            data_object=asdict(item),
            class_name="TestContentItem",
            uuid=generate_uuid5(asdict(item)),
        )
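
The upload cell above passes `generate_uuid5(asdict(item))` as the object ID, so an item's ID depends only on its content, and re-uploading the same item should not create a duplicate. The idea can be illustrated with the standard library's `uuid.uuid5`. This is a sketch of the principle, not Weaviate's implementation, and `content_uuid` is a hypothetical helper.

```python
import json
import uuid


def content_uuid(data: dict) -> str:
    """Hypothetical helper: the same content always maps to the same UUID."""
    canonical = json.dumps(data, sort_keys=True, ensure_ascii=False)
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, canonical))


first = content_uuid({"post_id": "M.1617787676.A.06E", "order": 0, "chunk": 1})
second = content_uuid({"chunk": 1, "order": 0, "post_id": "M.1617787676.A.06E"})
print(first == second)
```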

**Use weaviate-client to query Weaviate**

In [None]:
from dataclasses import fields

attributes = [field.name for field in fields(ContentItem)]
response = (
    client.query.get("TestContentItem", attributes)
    .with_hybrid(
        query="放假"
    )
    .with_additional("vector")
    .with_limit(1)
    .do()
)
response = response["data"]["Get"]["TestContentItem"]
for key, val in response[0].items():
    print(f"{key}: {val}")
print(f"Vector length: {len(response[0]['_additional']['vector'])}")

_additional: {'vector': [-0.012226269, -0.01181336, 0.002815121, -0.033452526, -0.020701375, -0.01902175, -0.0187978, -0.0018160943, -0.008167176, -0.01450076, 0.016320353, -0.0035779506, -0.014402782, 0.014073855, -0.0029218472, 0.010336691, 0.011169504, 0.01373793, 0.0013524478, -0.005290818, -0.0053713, -0.0073553566, -0.038015507, 0.021765137, -0.0018633336, 0.0002869359, 0.015354569, -0.007649291, -0.019441657, -0.011141511, 0.014297806, -0.011169504, -0.013269035, -0.0017294886, -0.0109805465, -0.0009237936, 0.009566862, 0.016936217, 0.0059381733, 0.0003282704, 0.008398123, -0.010028759, 0.017790025, 0.0085031, -0.0088110315, 0.009839801, 0.025670264, -0.03496419, 0.00505637, 0.024228586, 0.007908233, 0.026818007, -0.009916784, -0.0017067436, 0.0021362726, 0.010441667, 0.0117013855, 0.009741823, -0.011498431, -0.003208783, 0.017552078, -0.006477053, -0.031269014, 0.005903181, -0.005826198, -0.014038864, -0.02582423, 0.005259325, -0.009244935, 0.0014897921, 0.029421426, 0.02865159

## Retrieving SoMe data from Weaviate

* **Taiwan Social Media Corpus (SoMe)**: A large-scale, diverse, and linguistically enriched social media corpus of Mandarin in Taiwan.
* *Note: For efficiency, we've vectorized and uploaded the SoMe data to Weaviate. Please check out [this notebook](https://colab.research.google.com/drive/16IqjfMMy2k1JpOlyr6KUIx6U0tP_d5lJ?usp=sharing) for more details on vectorizing text data to Weaviate.*

In [None]:
import dataclasses
from dataclasses import dataclass
from pprint import pprint
import os
import weaviate
from langchain.retrievers.weaviate_hybrid_search import WeaviateHybridSearchRetriever


@dataclass
class ContentItem:
    media: str          # media source of the PTT post/comment
    content_type: str   # post/comment
    author: str         # author of the post/comment
    post_id: str        # id of the post
    year: str           # year of the post
    board: str          # board of the post
    title: str          # title of the post
    text: str           # content text of the post/comment
    rating: str         # rating of the comment
    order: int          # 0 for post, 1, 2, 3, ... for comments
    chunk: int          # if text too long, split into chunks
    total_chunks: int   # total number of chunks


os.environ['WEAVIATE_ADMIN_PASS'] = "weaviate-ultimate-forever-pass"

In [None]:
client = weaviate.Client(
    url="http://140.112.147.128:8000",
    auth_client_secret=weaviate.AuthApiKey(api_key=os.environ["WEAVIATE_ADMIN_PASS"]),
    # (connect timeout, read timeout) # type: ignore
    timeout_config=(5, 30),
    additional_headers={'X-OpenAI-Api-Key': openai_api_key}
)

attributes = [field.name for field in dataclasses.fields(ContentItem)]

Define a function with **WeaviateHybridSearchRetriever** for searching keywords

In [None]:
def retrieve_docs(keyword, count=8):
    retriever = WeaviateHybridSearchRetriever(
        client=client,
        index_name="ContentItem",
        text_key="text",
        alpha=0.5,              # Balance between keyword (BM25, alpha=0) and vector (alpha=1) search.
        attributes=attributes,  # The attributes to return in the results.
        k=count,                # The number of documents to return.
    )
    r = retriever.get_relevant_documents(keyword)
    return r

In [None]:
docs = retrieve_docs('筆電')
pprint(docs)

[Document(page_content='載下來了，沒想到多年後，手機、筆電、iPad換新後，', metadata={'author': 'maggiekiki', 'board': 'movie-ptt', 'chunk': 1, 'content_type': 'comment', 'media': 'ptt', 'order': 30, 'post_id': 'M.1641378653.A.A1F', 'rating': 'neu', 'title': 'Re: [討論] 目前的DVD與藍光是否漸漸走向淘汰了？', 'total_chunks': 1, 'year': '2022'}),
 Document(page_content="明天MBP 14'就到惹啦紀念一下最後一天用這台破爛筆電修他的錢早就超過當初買他的錢= =我這輩子再買雙A的產品我就是狗:)", metadata={'author': 'aa871220 (NTU網美所阿肥)', 'board': 'NTU-ptt', 'chunk': 1, 'content_type': 'post', 'media': 'ptt', 'order': 0, 'post_id': 'M.1641531094.A.BD2', 'rating': '', 'title': '[廢文] acer筆電', 'total_chunks': 1, 'year': '2022'}),
 Document(page_content='每個人收藏實體的原因都不同像是畫質音效，喜歡實體感覺，方便轉賣或借人等等都非常有道理推文有人提到串流會下架，這也沒錯不過既然是收藏，也可以購買數位收藏像我自己是用iTunes收藏電影，自己喜歡的應該也買了一百多部比較新的電影，或是修復版，也都有4K畫質好處是隨點隨看，不限平台，想用電視，平板, 筆電,手機都可以蘋果也夠大，不太會倒，當然當年也沒有人認為諾基亞會倒真的擔心的話，購買完也是可以下載下來的我自己是數位派，實體這塊市場應該還會繼續萎縮', metadata={'author': 'A1bertPujols (The Machine)', 'board': 'movie-ptt', 'chunk': 1, 'content_type': 'post', 'media': 'ptt

## Q&A using context


To use LLMs for question answering, we must:

1. Pass the LLM relevant context it needs to answer a question
2. Pass it our question that we want answered

The process is like: `llm(context + question) ==> answer`
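
The cells in this section simply concatenate `context + question`. A slightly more explicit way to assemble the same input is sketched below. The labels (`Context:`, `Question:`, `Answer:`) and both function names are assumptions for illustration, and `fake_llm` is a stand-in that makes no API call.

```python
def build_prompt(context: str, question: str) -> str:
    """Join context and question with explicit labels so the model can tell them apart."""
    return f"Context:\n{context.strip()}\n\nQuestion: {question.strip()}\nAnswer:"


def fake_llm(prompt: str) -> str:
    # hypothetical stand-in for a real model call; it only reports the prompt size
    return f"(model reply to a {len(prompt)}-character prompt)"


prompt = build_prompt("Rachel is 33\nBob is 55", "Who is younger than 40?")
print(prompt)
print(fake_llm(prompt))
```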

**Short text**

In [None]:
from langchain.llms import OpenAI

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

context = """
Rachel 是 33歲
Bob 是 55歲
Kevin 是 66歲
"""

question = "誰的年齡小於40歲?"

In [None]:
answer = llm(context + question)
print(answer)


Rachel 33歲


**Longer text**

In [None]:
docs = retrieve_docs('筆電')
text = docs[2].page_content

context = text
question = "數位收藏的優點是什麼?"

output = llm(context + question)

print(output.strip())

數位收藏的優點是可以隨時隨地觀看，不限平台，可以在電視、平板、筆電、手機等設備上觀看，而且可以下載下來，可以收藏更多的影片，而且不用擔心影片會下架，另外也可以購買4K畫質的影片，讓觀看體驗更加棒。


## Summarization

Typical inputs include articles, transcripts, chat history, Slack/Discord messages, customer interactions, legal documents, podcasts, tweets, code bases, and reviews.



In [None]:
'''Initialize a WeaviateHybridSearchRetriever for searching keywords'''

def retrieve_docs(keyword, count=8):
    retriever = WeaviateHybridSearchRetriever(
        client=client,
        k=count,
        # weighting for each search algorithm (alpha = 0 (sparse, BM25), alpha = 1 (dense), alpha = 0.5 (equal weight for sparse and dense))
        alpha=0.5,
        index_name="ContentItem",
        text_key="text",
        attributes=attributes,
    )
    r = retriever.get_relevant_documents(keyword)
    return r
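
The comment on `alpha` above describes how hybrid search weighs keyword (BM25) results against vector results. The sketch below shows the effect with a plain weighted sum. This is a simplification: Weaviate's actual fusion algorithm may differ, and the scores are made-up values assumed to be normalized to [0, 1].

```python
def hybrid_score(sparse: float, dense: float, alpha: float) -> float:
    """Blend a keyword (BM25) score and a vector score; alpha=0 is pure keyword, alpha=1 pure vector."""
    return (1 - alpha) * sparse + alpha * dense


# hypothetical (sparse, dense) scores for two documents
candidates = {"doc_keyword_match": (0.9, 0.2), "doc_semantic_match": (0.3, 0.9)}

for alpha in (0.0, 0.5, 1.0):
    best = max(candidates, key=lambda name: hybrid_score(*candidates[name], alpha))
    print(alpha, best)
```

With these made-up scores, the keyword match ranks first only when `alpha` is 0; at 0.5 and 1.0 the semantic match ranks first.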

In [None]:
docs = retrieve_docs('筆電')
text = docs[2].page_content

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import ChatPromptTemplate

template = """
您是生成文字摘要的助手
使用者將傳入一段文本，請您產生此文本的摘要
摘要的token數量必須少於100
只輸出摘要
"""

human_template = "{text}"

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", template),
    ("human", human_template),
])

llm = ChatOpenAI(openai_api_key=openai_api_key, verbose=True,)

chain = chat_prompt | llm

In [None]:
# summarization
output = chain.invoke( {"text": text} )
output

AIMessage(content='實體收藏的原因各有不同，如畫質音效、喜歡實體感覺、方便轉賣或借人等。但數位收藏也有其優勢，例如隨點隨看、不限平台、可下載等。市場上實體收藏可能會繼續萎縮。')

In [None]:
# compare with the original text
text

'每個人收藏實體的原因都不同像是畫質音效，喜歡實體感覺，方便轉賣或借人等等都非常有道理推文有人提到串流會下架，這也沒錯不過既然是收藏，也可以購買數位收藏像我自己是用iTunes收藏電影，自己喜歡的應該也買了一百多部比較新的電影，或是修復版，也都有4K畫質好處是隨點隨看，不限平台，想用電視，平板, 筆電,手機都可以蘋果也夠大，不太會倒，當然當年也沒有人認為諾基亞會倒真的擔心的話，購買完也是可以下載下來的我自己是數位派，實體這塊市場應該還會繼續萎縮'

**Longer Text**

- Longer texts are harder to manage and can exceed the model's token limit.
- LangChain has out-of-the-box support for several summarization methods via [load_summarize_chain](https://python.langchain.com/en/latest/use_cases/summarization.html).

In [None]:
# Load a long text

docs = retrieve_docs('學校', 150)
text = docs[147].page_content
len(text)  # number of characters

4421

In [None]:
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

In [None]:
# Load a long text
texts = retrieve_docs('學校',150)
text = texts[147].page_content

num_tokens = llm.get_num_tokens(text)

print(f"There are {num_tokens} tokens in the text")

There are 7995 tokens in the text



> First, we need to split the text into smaller pieces (chunking). [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html) is easy to control; you can also check out [other available splitters](https://python.langchain.com/en/latest/modules/indexes/text_splitters.html).

```
    length_function: how the length of chunks is calculated. Defaults to just counting number of characters, but it's pretty common to pass a token counter here.
    chunk_size: the maximum size of your chunks (as measured by the length function).
    chunk_overlap: the maximum overlap between chunks. It can be nice to have some overlap to maintain some continuity between chunks (e.g. do a sliding window).
    add_start_index: whether to include the starting position of each chunk within the original document in the metadata.
```
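
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sliding-window splitter in plain Python. It is a simplified sketch that cuts at fixed character positions. `RecursiveCharacterTextSplitter` additionally tries to break on separators such as newlines and spaces, so its chunks will generally differ. The function name is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-width windows; consecutive windows share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
```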

In [None]:
# Split text into multiple small documents
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 100,   # we set a small chunk size, just to show.
    chunk_overlap  = 20,
    length_function = len,
    add_start_index = True,
)

docs = text_splitter.create_documents([text])
print(docs[0])
print(docs[1])

page_content='繼上次Part1   https://moptt.tw/p/movie.M.1643728037.A.FF5介紹了幾部比較有名的歐洲片之後 Part2要來介紹一些在台灣較冷門的電影' metadata={'start_index': 0}
page_content='大部分在台灣沒有上映 不過在各大影音網站還是找得到 還是要強調所有推薦的電影都是憑我個人喜好 所以如果覺得不好看不要罵我XDDD一樣會有我認為的闔家觀賞程度(一到五顆星)' metadata={'start_index': 91}


> Second, we need to load up a chain which will make successive calls to the LLM for us.
> Check out [this video](https://www.youtube.com/watch?v=f9_BWhCI4Zo) for other chain types besides `map_reduce`.
> <div>
<img src="https://drive.google.com/uc?id=1ifElnWJ9xrKWpB95YLQR_mpjImhQwB5x" width="450"/>
</div>
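
The map-reduce idea can be sketched without any LLM: summarize every chunk on its own (map), then summarize the concatenation of those partial summaries (reduce). In the sketch below, `first_sentence` is a hypothetical stand-in for the model call, and both function names are assumptions. The real chain sends prompts to the LLM at each step.

```python
from typing import Callable


def map_reduce_summarize(chunks: list[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk (map), then summarize the joined partial summaries (reduce)."""
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize(" ".join(partial_summaries))


def first_sentence(text: str) -> str:
    # hypothetical stand-in for an LLM call: keep only the first sentence
    return text.split(". ")[0].rstrip(".") + "."


chunks = [
    "The film is set in Germany. It follows a family.",
    "The second film is Russian. It is a comedy.",
]
print(map_reduce_summarize(chunks, first_sentence))
```

Because each chunk is summarized independently, the map step needs one model call per chunk, which is why a small `chunk_size` leads to many LLM calls.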


In [None]:
# Load the chain
chain = load_summarize_chain(llm=llm,
                             chain_type='map_reduce',
                             ) # verbose: True => to see what is getting sent to the LLM

In [None]:
# Run through the split documents, summarize the chunks, then get a summary of the summary.
output = chain.run(docs)
pprint(output)

(' This post reviews three European films - a German, Russian, and Bulgarian '
 'movie - all rated on a scale of one to five stars. The films explore themes '
 'of love, family, and generational differences, and are recommended for '
 'family movie night, but may not be suitable for elderly people. The German '
 'film, Thirst, is a tragedy set in a rural area during a summer drought, and '
 'follows a family living on a mountain top. It is not recommended for family '
 'viewing, and is rated three stars.')


# More

Other main use cases in LangChain

## Agents

- The core idea of agents is to use an LLM to choose a sequence of actions to take.
- In `chains`, a sequence of actions is hardcoded (in code); whereas in `agents`, a language model is used as a reasoning engine to determine which actions to take and in which order.
- An agent is powered by a language model and a prompt; its inputs are:

  - List of available tools
  - User input
  - Any previously executed steps (intermediate_steps)

- The Agent chain returns either the next action to take or the final response to send to the user (AgentAction or AgentFinish).

- For a full list of agent types see [agent types](https://python.langchain.com/docs/modules/agents/agent_types/)

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.agents import tool
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.tools.render import format_tool_to_openai_function   # let the agent know what tools it can use
from langchain.agents.format_scratchpad import format_to_openai_functions  # format intermediate steps to messages
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser   # convert the output message into an agent action/agent finish

In [None]:
# Load a LLM to control the agent
llm = ChatOpenAI(temperature=0)

In [None]:
# Create a simple tool
@tool
def get_word_length(word: str) -> int:
    """Returns the length of a word."""
    return len(word)

tools = [get_word_length]

Because OpenAI models that support function calling are fine-tuned for tool usage, we hardly need any instructions on how to reason or how to format the output. We will have just two input variables: `input` (for the user question) and `agent_scratchpad` (for any previous steps taken).

In [None]:
# Create the prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are very powerful assistant, but bad at calculating lengths of words."),
    ("user", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

In [None]:
# Bind the tools to the LLM
llm_with_tools = llm.bind(
    functions=[format_tool_to_openai_function(t) for t in tools]
)

In [None]:
# Create the agent
agent = {
    "input": lambda x: x["input"],
    "agent_scratchpad": lambda x: format_to_openai_functions(x['intermediate_steps'])
    } | prompt | llm_with_tools | OpenAIFunctionsAgentOutputParser()

In [None]:
# Ask
output = agent.invoke({
    "input": "how many letters in the word, language?",
    "intermediate_steps": []
})

output

AgentActionMessageLog(tool='get_word_length', tool_input={'word': 'language'}, log="\nInvoking: `get_word_length` with `{'word': 'language'}`\n\n\n", message_log=[AIMessage(content='', additional_kwargs={'function_call': {'name': 'get_word_length', 'arguments': '{\n  "word": "language"\n}'}})])

> ⬆️⬆️⬆️ The agent responds with an `AgentAction` to take (it's actually an `AgentActionMessageLog`, a subclass of `AgentAction` which also tracks the full message log). This is just the first step: we still need a runtime for the agent. The simplest runtime is a loop that calls the agent, takes the action it asks for, and repeats until an `AgentFinish` is returned.
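
To make that loop concrete, here is a minimal sketch that runs without an API key. Everything in it is a stand-in: `Action`, `Finish`, `fake_agent`, and `run_agent` are hypothetical names that only mimic the roles of `AgentAction`, `AgentFinish`, the agent, and the runtime; they are not LangChain classes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Action:
    """Hypothetical stand-in for an AgentAction: which tool to call, and with what."""
    tool: str
    tool_input: dict


@dataclass
class Finish:
    """Hypothetical stand-in for an AgentFinish: the final answer."""
    output: str


Step = Tuple[Action, str]  # (action taken, observation returned by the tool)


def fake_agent(question: str, steps: List[Step]) -> Union[Action, Finish]:
    """Toy policy replacing the LLM: call the tool once, then answer from its result."""
    if not steps:
        word = question.rstrip("?").split()[-1]
        return Action(tool="get_word_length", tool_input={"word": word})
    action, observation = steps[-1]
    word = action.tool_input["word"]
    return Finish(output=f'There are {observation} letters in the word "{word}".')


def run_agent(question: str, tools: Dict[str, Callable[..., object]], max_steps: int = 5) -> str:
    """Call the agent, run the tool it asks for, and repeat until it finishes."""
    steps: List[Step] = []
    for _ in range(max_steps):
        decision = fake_agent(question, steps)
        if isinstance(decision, Finish):
            return decision.output
        observation = tools[decision.tool](**decision.tool_input)
        steps.append((decision, str(observation)))
    raise RuntimeError("agent did not finish within max_steps")


tools_by_name = {"get_word_length": lambda word: len(word)}
answer = run_agent("how many letters in the word language?", tools_by_name)
print(answer)
```

The `max_steps` cap is the kind of safeguard a hand-written loop has to add for itself.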

> That said, `AgentExecutor` bundles up all of the above and adds error handling, early stopping, tracing, and other quality-of-life improvements, which reduces the safeguards we need to write ourselves ⬇️⬇️⬇️


In [None]:
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

output = agent_executor.invoke({"input": "how many letters in the word educa?"})
output



> Entering new AgentExecutor chain...

Invoking: `get_word_length` with `{'word': 'educa'}`


5
There are 5 letters in the word "educa".

> Finished chain.


{'input': 'how many letters in the word educa?',
 'output': 'There are 5 letters in the word "educa".'}

*For defining advanced custom tools, please check out [LangChain Tools](https://python.langchain.com/docs/modules/agents/tools/custom_tools).*

## VectorStore

In [None]:
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

# Load the document in the repo folder, split it into chunks, embed each chunk and load it into the vector store.
raw_documents = TextLoader('./state_of_the_union.txt').load()
text_splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
db = Chroma.from_documents(documents, OpenAIEmbeddings())

In [None]:
# Ask a question
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)
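
To get a feel for what `similarity_search` is doing, here is a toy sketch of the idea: embed the query, embed each chunk, and return the closest chunk. It is only an illustration. The word-count "embedding", the helper names, and the sample sentences are all made up for this example; Chroma and `OpenAIEmbeddings` use dense vectors from a model and their own index.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real model returns dense floats)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_similarity_search(query: str, chunks: List[str], k: int = 1) -> List[str]:
    """Return the k chunks whose vectors are closest to the query vector."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


# Made-up chunks, standing in for the split document
chunks = [
    "The library opens at nine on weekdays.",
    "Parking is free after six in the evening.",
    "The cafe serves soup and sandwiches.",
]
top = toy_similarity_search("When does the library open?", chunks)
print(top[0])
```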

## Extraction

Extraction is the process of parsing data from a piece of text. This is commonly used with output parsing in order to *structure* our data.

* **Deep Dive** - [Use LLMs to Extract Data From Text (Expert Level Text Extraction)](https://youtu.be/xZzvwR9jdPA), [Structured Output From OpenAI (Clean Dirty Data)](https://youtu.be/KwAXfey-xQk)
* **Examples** - [OpeningAttributes](https://twitter.com/GregKamradt/status/1646500373837008897)
* **Use Cases:** Extract a structured row from a sentence to insert into a database, extract multiple rows from a long document to insert into a database, extract parameters from a user query to make an API call

- A popular library for advanced extraction is [Kor](https://eyurtsev.github.io/kor/).

In [None]:
# To help construct our Chat Messages
from langchain.schema import HumanMessage
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate

# We will be using a chat model, defaults to gpt-3.5-turbo
from langchain.chat_models import ChatOpenAI

# To parse outputs and get structured data back
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

chat_model = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo', openai_api_key=openai_api_key)

### Vanilla Extraction

In [None]:
instructions = """
You will be given a sentence with fruit names, extract those fruit names and assign an emoji to them
Return the fruit name and emojis in a python dictionary
"""

fruit_names = """
Apple, Pear, this is an kiwi
"""

In [None]:
# Make the prompt which combines the instructions w/ the fruit names
prompt = (instructions + fruit_names)

# Call the LLM
output = chat_model([HumanMessage(content=prompt)])

print(output.content)
print(type(output.content))

{'Apple': '🍎', 'Pear': '🍐', 'kiwi': '🥝'}
<class 'str'>


In [None]:
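import ast

# literal_eval only accepts Python literals, so unlike eval() it cannot
# execute arbitrary code that might appear in the model's reply
output_dict = ast.literal_eval(output.content)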

print(output_dict)
print(type(output_dict))

{'Apple': '🍎', 'Pear': '🍐', 'kiwi': '🥝'}
<class 'dict'>


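While this worked this time, it is not a reliable long-term method: nothing guarantees that the model will return a clean Python dictionary on every call, and more advanced use cases make that even less likely.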
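
One way this breaks in practice: the model wraps the dictionary in a friendly sentence and the parsing step throws. A minimal sketch of a more defensive parse, assuming a hypothetical helper named `parse_dict_reply` (not part of LangChain) built on the standard library's `ast.literal_eval`:

```python
import ast
from typing import Dict, Optional


def parse_dict_reply(reply: str) -> Optional[Dict[str, str]]:
    """Return the dict literal in `reply`, or None if the reply is not a plain dict."""
    try:
        value = ast.literal_eval(reply.strip())
    except (ValueError, SyntaxError):
        # The reply was not a bare Python literal (e.g. it had extra prose around it)
        return None
    return value if isinstance(value, dict) else None


good = parse_dict_reply("{'Apple': '🍎', 'Pear': '🍐', 'kiwi': '🥝'}")
bad = parse_dict_reply("Sure! Here is your dictionary: {'Apple': '🍎'}")

print(good)
print(bad)
```

Returning `None` lets the caller decide whether to retry the request or fall back to something else.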

### LangChain's Response Schema

- LangChain's response schema does two things for us:

  1. Autogenerates a prompt with bona fide format instructions. This is great because I don't need to worry about the prompt engineering side, I'll leave that up to LangChain!

  2. Reads the output from the LLM and turns it into a proper Python object for me.

- We are going to pull out the song and artist that a user wants to play from a pseudo chat message.

In [None]:
# The schema we want to output
response_schemas = [
    ResponseSchema(name="artist", description="The name of the musical artist"),
    ResponseSchema(name="song", description="The name of the song that the artist plays")
]

# The parser that reads the LLM output according to the schema and returns it as a dict
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [None]:
# The format instructions provided by LangChain
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"artist": string  // The name of the musical artist
	"song": string  // The name of the song that the artist plays
}
```


In [None]:
# The prompt template that brings it all together
# Note: This is a different prompt template than before because we are using a Chat Model

prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template("Given a command from the user, extract the artist and song names \n \
                                                    {format_instructions}\n{user_prompt}")
    ],
    input_variables=["user_prompt"],
    partial_variables={"format_instructions": format_instructions}
)

In [None]:
my_query = prompt.format_prompt(user_prompt="I really like Sugar by Maroon 5")
print(my_query.messages[0].content)

Given a command from the user, extract the artist and song names 
                                                     The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"artist": string  // The name of the musical artist
	"song": string  // The name of the song that the artist plays
}
```
I really like Sugar by Maroon 5


In [None]:
chat_model_output = chat_model(my_query.to_messages())
output = output_parser.parse(chat_model_output.content)

print(output)

{'artist': 'Maroon 5', 'song': 'Sugar'}


Now we have a dictionary that we can use later down the line.

<span style="background:#fff5d6">Warning:</span> The parser looks for an output from the LLM in a specific format. Your model may not output the same format every time, so make sure to handle parsing errors. GPT-4 and later models tend to follow format instructions more reliably.
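
To show what handling those errors can look like, here is a rough sketch of the same idea in plain Python: find the fenced JSON block in the reply, load it, and check that the expected keys are present. This is not LangChain's implementation; `parse_fenced_json` and its error messages are hypothetical, written only for this example.

```python
import json
import re
from typing import Dict, List

FENCE = "`" * 3  # three backticks, built this way so the example fits in a Markdown cell
JSON_BLOCK = re.compile(FENCE + r"json\s*(.*?)" + FENCE, re.DOTALL)


def parse_fenced_json(reply: str, required_keys: List[str]) -> Dict[str, str]:
    """Extract the fenced JSON object from `reply` and check that its keys are present."""
    match = JSON_BLOCK.search(reply)
    if match is None:
        raise ValueError("no fenced json block found in the model reply")
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError as exc:
        raise ValueError(f"model reply is not valid JSON: {exc}") from exc
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"model reply is missing keys: {missing}")
    return data


# A well-formed reply parses into a dict
reply = f'{FENCE}json\n{{"artist": "Maroon 5", "song": "Sugar"}}\n{FENCE}'
parsed = parse_fenced_json(reply, ["artist", "song"])
print(parsed)

# A reply that ignores the format instructions raises, so the caller can retry
try:
    parse_fenced_json("Maroon 5 - Sugar", ["artist", "song"])
    failed = False
except ValueError:
    failed = True
print(failed)
```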

For more advanced parsing, please check out [Kor](https://eyurtsev.github.io/kor/).

## Evaluation

Evaluation is the process of doing quality checks on the output of your applications. Normal, deterministic code has tests we can run, but judging the output of LLMs is more difficult because of the unpredictability and variability of natural language. LangChain provides tools that aid us in this journey.

* **Examples** - [Lance Martin's Advanced](https://twitter.com/RLanceMartin) [Auto-Evaluator](https://github.com/rlancemartin/auto-evaluator)
* **Use Cases:** Run quality checks on the output of your summarization or Question & Answer pipelines

In [None]:
# Embeddings, store, and retrieval
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

# Model and doc loader
from langchain import OpenAI
from langchain.document_loaders import TextLoader

# Eval
from langchain.evaluation.qa import QAEvalChain

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

In [None]:
# Our long essay from before
loader = TextLoader('data/PaulGrahamEssays/worked.txt')
doc = loader.load()

print (f"You have {len(doc)} document")
print (f"You have {len(doc[0].page_content)} characters in that document")

You have 1 document
You have 74663 characters in that document


Build a vector store so we can do question answering.

In [None]:
# Split the essay into chunks for the vector store
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
docs = text_splitter.split_documents(doc)

num_total_characters = sum([len(x.page_content) for x in docs])

print (f"Now you have {len(docs)} documents that have an average of {num_total_characters / len(docs):,.0f} characters (smaller pieces)")

Now you have 29 documents that have an average of 2,930 characters (smaller pieces)


In [None]:
# Embeddings and docstore
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
docsearch = FAISS.from_documents(docs, embeddings)

Make a retrieval chain.
- The `input_key` parameter tells the chain which key of the input dictionary holds the prompt/query.
- We specify `question` to match the `question` key in the dicts below.

In [None]:
chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever(), input_key="question")

In [None]:
question_answers = [
    {'question' : "Which company sold the microcomputer kit that his friend built himself?", 'answer' : 'Healthkit'},
    {'question' : "What was the small city he talked about in the city that is the financial capital of USA?", 'answer' : 'Yorkville, NY'}
]

We use `chain.apply` to run the questions one by one.

We'll get our list of question-and-answer dictionaries back, each with an extra `result` key that holds the output from the LLM.

*Note: we specifically made the 2nd question ambiguous and tough to answer in one pass so the LLM would get it incorrect.*

In [None]:
predictions = chain.apply(question_answers)
predictions

[{'question': 'Which company sold the microcomputer kit that his friend built himself?',
  'answer': 'Healthkit',
  'result': ' The microcomputer kit was sold by Heathkit.'},
 {'question': 'What was the small city he talked about in the city that is the financial capital of USA?',
  'answer': 'Yorkville, NY',
  'result': ' The small city he talked about is New York City, which is the financial capital of the United States.'}]

We then have the LLM compare the ground truth answer (the `answer` key) with the result from the LLM (`result` key).

Or simply, we are asking the LLM to grade itself.

In [None]:
# Start eval chain
eval_chain = QAEvalChain.from_llm(llm)

# Have it grade itself. The code below helps the eval_chain know where the different parts are
graded_outputs = eval_chain.evaluate(question_answers,
                                     predictions,
                                     question_key="question",
                                     prediction_key="result",
                                     answer_key='answer')

In [None]:
graded_outputs

[{'text': ' CORRECT'}, {'text': ' INCORRECT'}]



For #1, the ground truth was "Healthkit" (a misspelling of Heathkit) and the prediction was "The microcomputer kit was sold by Heathkit."

The LLM still judged that the answer and the result meant the same thing and gave us the label `CORRECT`.

For #2, it judged that they were not the same and gave us the label `INCORRECT`.
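
With more than a couple of questions you will want a summary rather than reading labels by eye. A small sketch that pairs each prediction with its grade and computes accuracy; the two lists are copied from the cell outputs above, and `summarize` is a hypothetical helper, not part of LangChain.

```python
from typing import Dict, List

# Copied from the outputs of chain.apply(...) and eval_chain.evaluate(...) above
predictions = [
    {"question": "Which company sold the microcomputer kit that his friend built himself?",
     "answer": "Healthkit",
     "result": " The microcomputer kit was sold by Heathkit."},
    {"question": "What was the small city he talked about in the city that is the financial capital of USA?",
     "answer": "Yorkville, NY",
     "result": " The small city he talked about is New York City, which is the financial capital of the United States."},
]
graded_outputs = [{"text": " CORRECT"}, {"text": " INCORRECT"}]


def summarize(preds: List[Dict[str, str]], grades: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Pair each prediction with a boolean grade parsed from the grader's text label."""
    rows = []
    for pred, grade in zip(preds, grades):
        is_correct = grade["text"].strip().upper() == "CORRECT"
        rows.append({"question": pred["question"], "correct": is_correct})
    return rows


rows = summarize(predictions, graded_outputs)
accuracy = sum(row["correct"] for row in rows) / len(rows)
print(f"accuracy: {accuracy:.0%}")
```

The label is compared after stripping whitespace because the grader's text comes back with a leading space.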

## Querying Tabular Data

Being able to query tabular data with LangChain and pass the results through to an LLM is super powerful.

* **Use Cases:** Use LLMs to query data about users, do data analysis, get real time information from your DBs
* For further reading, please check out **Agents + Tabular Data** ([Pandas](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/pandas.html), [SQL](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/sql_database.html), [CSV](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/csv.html))

Let's query an SQLite DB with natural language. We'll look at the [San Francisco Trees](https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq) dataset.

In [None]:
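# Note: in recent LangChain releases (including the version pinned in Setup),
# SQLDatabaseChain lives in the separate langchain-experimental package:
#   pip install langchain-experimental
from langchain import OpenAI, SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain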

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

In [None]:
# Set the db path
sqlite_db_path = 'data/San_Francisco_Trees.db'
db = SQLDatabase.from_uri(f"sqlite:///{sqlite_db_path}")

In [None]:
# Create a chain that takes the LLM and DB
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)



In [None]:
db_chain.run("How many Species of trees are there in San Francisco?")



> Entering new SQLDatabaseChain chain...
How many Species of trees are there in San Francisco?
SQLQuery: SELECT COUNT(DISTINCT "qSpecies") FROM "SFTrees";
SQLResult: [(578,)]
Answer: There are 578 Species of trees in San Francisco.
> Finished chain.


'There are 578 Species of trees in San Francisco.'

There are actually a few steps going on here:
1. Find which table to use
2. Find which column to use
3. Construct the correct SQL query
4. Execute that query
5. Get the result
6. Return a natural language response
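
The steps above can be walked through by hand on a tiny in-memory database. This is only a sketch: the three rows are made up, and the two places where the chain would call the LLM (writing the query and phrasing the answer) are hard-coded here.

```python
import sqlite3

# Tiny in-memory stand-in for the real database; the rows are made up
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "SFTrees" ("TreeID" INTEGER, "qSpecies" TEXT)')
conn.executemany(
    'INSERT INTO "SFTrees" VALUES (?, ?)',
    [(1, "Species A"), (2, "Species B"), (3, "Species A")],
)

# Steps 1-2: discover the tables and columns the query can use
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
columns = [row[1] for row in conn.execute('PRAGMA table_info("SFTrees")')]

# Step 3: the LLM would write this query from the question and schema; hard-coded here
sql = 'SELECT COUNT(DISTINCT "qSpecies") FROM "SFTrees"'

# Steps 4-5: execute the query and read the result
(count,) = conn.execute(sql).fetchone()

# Step 6: the LLM would phrase the answer; a plain template stands in for it
answer = f"There are {count} species of trees in the table."
conn.close()

print(tables, columns)
print(answer)
```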

In [None]:
import sqlite3
import pandas as pd

# Connect to the SQLite database
connection = sqlite3.connect(sqlite_db_path)

# Define SQL query
query = "SELECT count(distinct qSpecies) FROM SFTrees"

# Read the SQL query into a Pandas DataFrame
df = pd.read_sql_query(query, connection)

# Close connection
connection.close()

In [None]:
# Display the result in the first column first cell
print(df.iloc[0,0])

578


## Interacting with APIs

If the data or action you need is behind an API, the LLM will need to interact with that API.

* **Use Cases:** Understand a request from a user and carry out an action, be able to automate more real-world workflows

- This topic is closely related to Agents and Plugins, though we'll look at a simple use case for this section. For more information, check out [LangChain + plugins](https://python.langchain.com/en/latest/use_cases/agents/custom_agent_with_plugin_retrieval_using_plugnplai.html).

LangChain's `APIChain` has the ability to read API documentation and understand which endpoint it needs to call.

Below is some purposefully sloppy API documentation to demonstrate how this works.

In [None]:
from langchain.chains import APIChain
from langchain.llms import OpenAI

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

In [None]:
api_docs = """

BASE URL: https://restcountries.com/

API Documentation:

The API endpoint /v3.1/name/{name} Used to find informatin about a country. All URL parameters are listed below:
    - name: Name of country - Ex: italy, france

The API endpoint /v3.1/currency/{currency} Uesd to find information about a region. All URL parameters are listed below:
    - currency: 3 letter currency. Example: USD, COP

Woo! This is my documentation
"""

chain_new = APIChain.from_llm_and_api_docs(llm, api_docs, verbose=True)

In [None]:
# make an API call for the country endpoint
chain_new.run('Can you tell me information about france?')



> Entering new APIChain chain...
https://restcountries.com/v3.1/name/france
[{"name":{"common":"France","official":"French Republic","nativeName":{"fra":{"official":"République française","common":"France"}}},"tld":[".fr"],"cca2":"FR","ccn3":"250","cca3":"FRA","cioc":"FRA","independent":true,"status":"officially-assigned","unMember":true,"currencies":{"EUR":{"name":"Euro","symbol":"€"}},"idd":{"root":"+3","suffixes":["3"]},"capital":["Paris"],"altSpellings":["FR","French Republic","République française"],"region":"Europe","subregion":"Western Europe","languages":{"fra":"French"},"translations":{"ara":{"official":"الجمهورية الفرنسية","common":"فرنسا"},"bre":{"official":"Republik Frañs","common":"Frañs"},"ces":{"official":"Francouzská republika","common":"Francie"},"cym":{"official":"French Republic","common":"France"},"deu":{"official":"Französische Republik","common":"Frankreich"},"est":{"official":"Prantsuse Vabariik","common":"Prantsusmaa"},"f

' France is an officially-assigned, independent country located in Western Europe. Its capital is Paris and its official language is French. Its currency is the Euro (€). It has a population of 67,391,582 and its borders are with Andorra, Belgium, Germany, Italy, Luxembourg, Monaco, Spain, and Switzerland.'

In [None]:
# make an API call for the currency COP
chain_new.run('Can you tell me about the currency COP?')



> Entering new APIChain chain...
https://restcountries.com/v3.1/currency/COP
[{"name":{"common":"Colombia","official":"Republic of Colombia","nativeName":{"spa":{"official":"República de Colombia","common":"Colombia"}}},"tld":[".co"],"cca2":"CO","ccn3":"170","cca3":"COL","cioc":"COL","independent":true,"status":"officially-assigned","unMember":true,"currencies":{"COP":{"name":"Colombian peso","symbol":"$"}},"idd":{"root":"+5","suffixes":["7"]},"capital":["Bogotá"],"altSpellings":["CO","Republic of Colombia","República de Colombia"],"region":"Americas","subregion":"South America","languages":{"spa":"Spanish"},"translations":{"ara":{"official":"جمهورية كولومبيا","common":"كولومبيا"},"bre":{"official":"Republik Kolombia","common":"Kolombia"},"ces":{"official":"Kolumbijská republika","common":"Kolumbie"},"cym":{"official":"Gweriniaeth Colombia","common":"Colombia"},"deu":{"official":"Republik Kolumbien","common":"Kolumbien"},"est":{"official":"Colom

' The currency of Colombia is the Colombian peso (COP), symbolized by the "$" sign.'

In both cases the APIChain read the instructions and understood which API call it needed to make.

Once the response was returned, it was parsed and my question was answered. Awesome 🐒
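
For intuition, the URLs in the two traces above are just the base URL plus one of the documented endpoint templates with the parameter filled in. The sketch below builds them deterministically; in the real chain it is the LLM that picks the endpoint and the parameter value from the docs and the question. `ENDPOINTS` and `build_url` are hypothetical names for this illustration, and nothing here touches the network.

```python
from urllib.parse import quote, urljoin

BASE_URL = "https://restcountries.com/"

# The two endpoints described in api_docs, keyed by their URL parameter
ENDPOINTS = {
    "name": "/v3.1/name/{name}",
    "currency": "/v3.1/currency/{currency}",
}


def build_url(endpoint: str, value: str) -> str:
    """Fill the endpoint template with a URL-escaped value and join it to the base URL."""
    path = ENDPOINTS[endpoint].format(**{endpoint: quote(value)})
    return urljoin(BASE_URL, path)


country_url = build_url("name", "france")
currency_url = build_url("currency", "COP")

print(country_url)
print(currency_url)
```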

## Chatbots


Chatbots use many of the tools we've already looked at, with the addition of an important topic: memory.

* **Examples** - [ChatBase](https://www.chatbase.co/?via=greg) (Affiliate link), [NexusGPT](https://twitter.com/achammah1/status/1649482899253501958?s=20), [ChatPDF](https://www.chatpdf.com/)
* **Use Cases:** Have a real time interaction with a user, provide an approachable UI for users to ask natural language questions.
- There are a ton of different [types of memory](https://python.langchain.com/en/latest/modules/memory/how_to_guides.html).

In [None]:
from langchain.llms import OpenAI
from langchain import LLMChain
from langchain.prompts.prompt import PromptTemplate

# Chat specific components
from langchain.memory import ConversationBufferMemory

For this use case, we will customize the context that is given to a chatbot.

We could pass instructions on how the bot should respond, but also any additional relevant information it needs.

In [None]:
template = """
You are a chatbot that is unhelpful.
Your goal is to not help the user but only make jokes.
Take what the user is saying and make a joke out of it

{chat_history}
Human: {human_input}
Chatbot:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "human_input"],
    template=template
)
memory = ConversationBufferMemory(memory_key="chat_history")

In [None]:
llm_chain = LLMChain(
    llm=OpenAI(openai_api_key=openai_api_key),
    prompt=prompt,
    verbose=True,
    memory=memory
)

In [None]:
llm_chain.predict(human_input="Is an pear a fruit or vegetable?")



> Entering new LLMChain chain...
Prompt after formatting:

You are a chatbot that is unhelpful.
Your goal is to not help the user but only make jokes.
Take what the user is saying and make a joke out of it


H: Is an pear a fruit or vegetable?
Chatbot:

> Finished chain.


" Well, it depends on if you're a fruit or vegetable person!"

In [None]:
llm_chain.predict(human_input="What was one of the fruits I first asked you about?")



> Entering new LLMChain chain...
Prompt after formatting:

You are a chatbot that is unhelpful.
Your goal is to not help the user but only make jokes.
Take what the user is saying and make a joke out of it

H: Is an pear a fruit or vegetable?
AI:  Well, it depends on if you're a fruit or vegetable person!
Human: What was one of the fruits I first asked you about?
Chatbot:

> Finished chain.


' An apple a day keeps the doctor away!'
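
Notice how the second formatted prompt contains the first exchange where `{chat_history}` used to be. A minimal sketch of that buffering, assuming a hypothetical `BufferMemory` class that only imitates the role `ConversationBufferMemory` plays here (the template is shortened, and no LLM is called):

```python
from typing import List, Tuple

TEMPLATE = """You are a chatbot that is unhelpful.

{chat_history}
Human: {human_input}
Chatbot:"""


class BufferMemory:
    """Hypothetical, simplified stand-in for a conversation buffer."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []

    def save(self, human: str, ai: str) -> None:
        """Record one completed exchange."""
        self.turns.append((human, ai))

    def as_text(self) -> str:
        """Render every stored exchange as the text that fills {chat_history}."""
        return "\n".join(f"Human: {human}\nAI: {ai}" for human, ai in self.turns)


memory = BufferMemory()

# First turn: the history is empty, so the placeholder renders as an empty line
first_prompt = TEMPLATE.format(
    chat_history=memory.as_text(),
    human_input="Is an pear a fruit or vegetable?",
)
memory.save(
    "Is an pear a fruit or vegetable?",
    "Well, it depends on if you're a fruit or vegetable person!",
)

# Second turn: the saved exchange is now part of the prompt
second_prompt = TEMPLATE.format(
    chat_history=memory.as_text(),
    human_input="What was one of the fruits I first asked you about?",
)
print(second_prompt)
```

Because the whole history is pasted into every prompt, a plain buffer like this grows with each turn.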