# Vicuna Langchain Demo

An example of running [`Vicuna`](https://github.com/lm-sys/FastChat), a LLaMA-based large language model (LLM), locally through *llama.cpp* and [`langchain`](https://github.com/hwchase17/langchain) in a Jupyter notebook with a Python 3.10 kernel.

In [1]:
!pip install -qU chromadb langchain==0.0.142 tiktoken tqdm load_dotenv ipywidgets pinecone-client pyllama llama-index llama-cpp-python html2text pyllamacpp
!pip list | grep langchain
!pip list | grep torch
!pip list | grep llama

langchain                     0.0.142
torch                         2.0.0
torchvision                   0.15.1
llama-cpp-python              0.1.43
llama-index                   0.5.27
pyllama                       0.0.9
pyllamacpp                    2.1.2


## Data preparation

In [2]:
def file_metadata(filename):
    d = dict()
    d["source"] = filename.replace('../data/docs/', 'https://').replace('index.html', '').replace('.html', '')
    return d
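To make the path-to-URL mapping concrete, here is a minimal sketch that repeats the function and applies it to two example paths. The on-disk file names are assumptions chosen for illustration; only the resulting URLs appear later in this notebook.

```python
def file_metadata(filename):
    """Map a mirrored file path back to the URL it was downloaded from."""
    source = (
        filename.replace("../data/docs/", "https://")
        .replace("index.html", "")
        .replace(".html", "")
    )
    return {"source": source}


# Assumed local paths (hypothetical), mirroring the site layout under ../data/docs/
examples = [
    "../data/docs/www.priceless.com/m/filter/options/category/506.html",
    "../data/docs/www.priceless.com/m/filter/options/location/9716/trk/20211/index.html",
]

for path in examples:
    print(file_metadata(path)["source"])
```

A page saved as `index.html` keeps its trailing slash, while a page saved as `<name>.html` loses only the extension.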

In [3]:
from pathlib import Path
from typing import Dict

from llama_index.readers.file.base_parser import BaseParser

class HtmlParser(BaseParser):
    """Html parser."""

    def _init_parser(self) -> Dict:
        """Init parser."""
        return {}

    def parse_file(self, file: Path, errors: str = "ignore") -> str:
        """Parse file."""
        try:
            import html2text
        except ImportError:
            raise ImportError(
                "html2text is required to read html files: `pip install html2text`"
            )
        with open(file, "r", errors=errors) as fp:
            text = fp.read()
            text_maker = html2text.HTML2Text()
            text_maker.ignore_links = True
            text_maker.ignore_images = True
            text_maker.bypass_tables = False
            text = text_maker.handle(text)
            # Remove extra white space
            text = ' '.join(text.split())

        return text
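The final step of `parse_file` collapses every run of whitespace into a single space. The sketch below shows that step in isolation on a made-up string (the sample text is an assumption, not taken from the dataset).

```python
# Made-up Markdown-like text, similar in shape to html2text output.
raw = "# Title\n\n  * item one\n  * item   two\n\n\tTrailing text \n"

# str.split() with no arguments splits on any whitespace run and drops
# empty pieces, so joining with a single space normalizes the text.
flat = " ".join(raw.split())
print(flat)
```

Because newlines are removed as well, headings and list items end up on one line, which is why the `page_content` shown later reads as a single long string.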

In [4]:
file_extractor: Dict[str, BaseParser] = {
    ".htm": HtmlParser(),
    ".html": HtmlParser(),
}

In [5]:
%%time
from llama_index import SimpleDirectoryReader
docs = SimpleDirectoryReader(input_dir='../data/docs/', recursive=True, file_extractor=file_extractor, file_metadata=file_metadata).load_langchain_documents()
len(docs)

CPU times: user 10.4 s, sys: 275 ms, total: 10.7 s
Wall time: 11.5 s


171

In [7]:
%%time
from tqdm.auto import tqdm

urls = ['https://www.priceless.com/m/filter/options/category/506', 
        'https://www.priceless.com/m/filter/options/category/510',
        'https://www.priceless.com/m/filter/options/location/9716/trk/20211/']
documents = []

for doc in tqdm(docs):
    src = doc.metadata['source']
    url = src.replace('rtdocs/', 'https://').replace('index.html', '').replace('.html', '')
    if url not in urls:
        continue

    documents.append(doc)

len(documents), documents[0]

  0%|          | 0/171 [00:00<?, ?it/s]

CPU times: user 6.7 ms, sys: 3.82 ms, total: 10.5 ms
Wall time: 9.84 ms


(3,
 Document(page_content="### Add to wishlist Follow ADD TO CART Waitlist 0 Log in Sign up Language Currency Interests Locations * * * Mastercard.com About Priceless Contact us ADVANCED SEARCH All Experiences Interests Entertainment Arts and Culture Sports Culinary Travel More... Shopping Less... Locations Argentina Australia Austria Brazil Bulgaria More... Canada Chile China Colombia Croatia Czechia Fiji France Germany Greece Hong Kong India Indonesia Ireland Italy Japan Kenya Macau Malaysia Maldives Mexico Morocco Netherlands New Zealand Nigeria Puerto Rico Romania Serbia Singapore South Africa Spain Sweden Thailand Turkey United Arab Emirates United Kingdom United States Uruguay Clear Selection ADVANCED SEARCH ADVANCED SEARCH All Experiences Interests Entertainment Arts and Culture Sports Culinary Travel More... Shopping Less... Locations Argentina Australia Austria Brazil Bulgaria More... Canada Chile China Colombia Croatia Czechia Fiji France Germany Greece Hong Kong India Indon
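The filtering loop above can also be written as a set lookup plus a list comprehension. This is a sketch only: the `Doc` class is a hypothetical stand-in for the langchain `Document` type, and the second document is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a langchain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# A set gives constant-time membership checks.
wanted = {
    "https://www.priceless.com/m/filter/options/category/506",
    "https://www.priceless.com/m/filter/options/category/510",
    "https://www.priceless.com/m/filter/options/location/9716/trk/20211/",
}

docs = [
    Doc("kept", {"source": "https://www.priceless.com/m/filter/options/category/506"}),
    Doc("dropped", {"source": "https://www.priceless.com/some/other/page"}),  # invented
]

# Keep only the documents whose source URL is in the wanted set.
selected = [doc for doc in docs if doc.metadata["source"] in wanted]
print(len(selected), selected[0].page_content)
```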

## Model preparation

Download the [`vicuna` model](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1), choosing between the q5_0 and q5_1 quantizations:

> q5_1 or 5_0 are the latest and most performant implementations. The former is slightly more accurate at the cost of a bit of performance. Most users should use one of the two. 

- import libs

In [8]:
from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain

- set up prompt template:

In [9]:
template = """
Question: {question}
Answer: 
"""

prompt = PromptTemplate(template=template, input_variables=["question"])
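To see what text the model actually receives, the template can be filled in with plain `str.format`. This is a stand-in for illustration and assumes `PromptTemplate` substitutes `{question}` in the same way; it does not call langchain.

```python
# Same template text as in the cell above, written as an explicit string.
template = "\nQuestion: {question}\nAnswer: \n"

# Plain-Python substitution, used here instead of PromptTemplate.format().
filled = template.format(question="Can you recommend any entertainment in New York?")
print(filled)
```

The prompt ends right after `Answer:`, so the model is expected to continue the text from that point.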

- create language chain using prompt template and q5_0 model:

In [18]:
%%time
MODEL_PATH = "../../models/ggml-vic13b-q5_0.bin"
llm = LlamaCpp(model_path=MODEL_PATH)
llm_chain = LLMChain(prompt=prompt, llm=llm)

CPU times: user 12.2 ms, sys: 34.9 ms, total: 47.1 ms
Wall time: 111 ms


llama.cpp: loading model from ../../models/ggml-vic13b-q5_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 8 (mostly Q5_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  85.08 KB
llama_model_load_internal: mem required  = 10583.26 MB (+ 3216.00 MB per state)
llama_init_from_file: kv self size  =  800.00 MB
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 


- run prompt:

In [19]:
%%time
question = "Can you recommend any entertainment in New York?"

llm_chain.run(question)

CPU times: user 4min 43s, sys: 10min 37s, total: 15min 20s
Wall time: 5h 2min 23s



llama_print_timings:        load time = 17435.00 ms
llama_print_timings:      sample time =    83.14 ms /   104 runs   (    0.80 ms per run)
llama_print_timings: prompt eval time = 52439.86 ms /    20 tokens ( 2621.99 ms per token)
llama_print_timings:        eval time = 18083661.28 ms /   103 runs   (175569.53 ms per run)
llama_print_timings:       total time = 18143542.49 ms


'\n| Main Course | Sub Categories |\n| --- | --- |\n| Museums | Art, History, Science & Technology, Natural History |\n| Broadway Shows | Musicals, Plays, Off-Broadway |\n| Sports | Baseball, Basketball, Football, Hockey, Soccer |\n| Food | American, Italian, Chinese, Japanese, Mexican, French, Indian |\n| Music | Jazz, Rock, Pop, Classical, Hip Hop, R&B |'

In [20]:
print('\n| Main Course | Sub Categories |\n| --- | --- |\n| Museums | Art, History, Science & Technology, Natural History |\n| Broadway Shows | Musicals, Plays, Off-Broadway |\n| Sports | Baseball, Basketball, Football, Hockey, Soccer |\n| Food | American, Italian, Chinese, Japanese, Mexican, French, Indian |\n| Music | Jazz, Rock, Pop, Classical, Hip Hop, R&B |')


| Main Course | Sub Categories |
| --- | --- |
| Museums | Art, History, Science & Technology, Natural History |
| Broadway Shows | Musicals, Plays, Off-Broadway |
| Sports | Baseball, Basketball, Football, Hockey, Soccer |
| Food | American, Italian, Chinese, Japanese, Mexican, French, Indian |
| Music | Jazz, Rock, Pop, Classical, Hip Hop, R&B |



- create language chain using prompt template and q5_1 model:

In [14]:
%%time
MODEL_PATH = "../../models/ggml-vic13b-q5_1.bin"
llm = LlamaCpp(model_path=MODEL_PATH)
llm_chain = LLMChain(prompt=prompt, llm=llm)

CPU times: user 12 ms, sys: 31.7 ms, total: 43.7 ms
Wall time: 110 ms


llama.cpp: loading model from ../../models/ggml-vic13b-q5_1.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  85.08 KB
llama_model_load_internal: mem required  = 11359.04 MB (+ 3216.00 MB per state)
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 
llama_init_from_file: kv self size  =  800.00 MB


- run prompt:

In [16]:
%%time
question = "Can you recommend any entertainment in New York?"

llm_chain.run(question)

Llama.generate: prefix-match hit


CPU times: user 2min 54s, sys: 6min 39s, total: 9min 33s
Wall time: 1h 16min 7s



llama_print_timings:        load time = 19284.29 ms
llama_print_timings:      sample time =    45.58 ms /    61 runs   (    0.75 ms per run)
llama_print_timings: prompt eval time = 39975.31 ms /    15 tokens ( 2665.02 ms per token)
llama_print_timings:        eval time = 4523198.70 ms /    60 runs   (75386.65 ms per run)
llama_print_timings:       total time = 4567377.50 ms


'\n| No. | Recommendation |\n| --- | --- |\n| 1 | Broadway shows, museums, Central Park, Times Square, Empire State Building, Statue of Liberty, shopping, restaurants, comedy clubs, bars and pubs, nightlife |'

In [17]:
print('\n| No. | Recommendation |\n| --- | --- |\n| 1 | Broadway shows, museums, Central Park, Times Square, Empire State Building, Statue of Liberty, shopping, restaurants, comedy clubs, bars and pubs, nightlife |')


| No. | Recommendation |
| --- | --- |
| 1 | Broadway shows, museums, Central Park, Times Square, Empire State Building, Statue of Liberty, shopping, restaurants, comedy clubs, bars and pubs, nightlife |
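The `llama_print_timings` lines make the generation speed of the two runs easy to compare. The sketch below parses the two `eval time` lines copied from the outputs above and reports seconds per generated token; the regular expression and the function name are assumptions made for this example.

```python
import re

# "eval time" lines copied from the q5_0 and q5_1 runs above.
LOGS = {
    "q5_0": "llama_print_timings:        eval time = 18083661.28 ms /   103 runs   (175569.53 ms per run)",
    "q5_1": "llama_print_timings:        eval time = 4523198.70 ms /    60 runs   (75386.65 ms per run)",
}

# Captures the total milliseconds and the number of runs (generated tokens).
PATTERN = re.compile(r"eval time\s*=\s*([\d.]+) ms\s*/\s*(\d+) runs")


def seconds_per_token(line):
    """Return the average generation time per token, in seconds."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"not an eval-time line: {line!r}")
    total_ms = float(match.group(1))
    runs = int(match.group(2))
    return total_ms / runs / 1000.0


for name, line in LOGS.items():
    print(f"{name}: {seconds_per_token(line):.1f} s per generated token")
```

Both runs take more than a minute per token, and the wall time far exceeds the CPU time, so the process was mostly waiting. One possible cause is memory paging, but the logs do not confirm it. The speed difference between the two runs should therefore not be read as a property of the quantization formats.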



## Generating Embeddings

Since the q5_1 model gives a better answer to the question `Can you recommend any entertainment in New York?`, let us use it to generate embeddings.

In [21]:
%%time
from langchain.embeddings import LlamaCppEmbeddings
MODEL_PATH = "../../models/ggml-vic13b-q5_1.bin"
llama_embeddings = LlamaCppEmbeddings(model_path=MODEL_PATH)

CPU times: user 10.7 ms, sys: 10.8 ms, total: 21.5 ms
Wall time: 86.6 ms


llama.cpp: loading model from ../../models/ggml-vic13b-q5_1.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  85.08 KB
llama_model_load_internal: mem required  = 11359.04 MB (+ 3216.00 MB per state)
llama_init_from_file: kv self size  =  800.00 MB
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 


In [22]:
%%time
text = "This is a test document."

query_result = llama_embeddings.embed_query(text)

CPU times: user 6.83 s, sys: 7.05 s, total: 13.9 s
Wall time: 18.7 s



llama_print_timings:        load time = 18721.40 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 18720.98 ms /     7 tokens ( 2674.43 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 18723.87 ms


In [24]:
%%time
doc_result = llama_embeddings.embed_documents([text])

CPU times: user 7.04 s, sys: 7.15 s, total: 14.2 s
Wall time: 19.9 s



llama_print_timings:        load time = 18721.40 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 19905.06 ms /     7 tokens ( 2843.58 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 19908.13 ms
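Both `embed_query` and `embed_documents` return plain lists of floats, and a vector store ranks documents by comparing such vectors. The sketch below shows one common comparison, cosine similarity, on toy three-dimensional vectors. The vectors are invented, and the metric is an illustrative choice; Chroma may use a different distance by default.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
query_vec = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]
orthogonal = [0.0, 3.0, 0.0]

print(cosine_similarity(query_vec, same_direction))
print(cosine_similarity(query_vec, orthogonal))
```

With real data, `query_result` and `doc_result[0]` from the cells above could be passed to the same function.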


## Example Query Supported by a Document Based Knowledge Source

This section follows the question-answering example from the [`langchain` docs](https://python.langchain.com/en/latest/use_cases/question_answering.html).

The idea is to run the query against a document source to retrieve some relevant context, and use that as part of the prompt context.
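As a rough illustration of that idea, the sketch below "stuffs" already-retrieved chunks into a single prompt. The function name, the instruction wording, and the sample chunks are all assumptions for this example; they are not langchain's actual prompt.

```python
def build_stuffed_prompt(question, chunks):
    """Concatenate retrieved chunks and place them ahead of the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Invented chunks standing in for retriever results.
retrieved = [
    "Broadway shows run nightly in the Theater District.",
    "Comedy clubs are spread across Manhattan.",
]

prompt_text = build_stuffed_prompt("Entertainment in New York", retrieved)
print(prompt_text)
```

Every retrieved chunk adds to the prompt length, so the number of chunks has to fit within the model's context window (`n_ctx = 512` in the logs above).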

Now let's try with some source documents.

In [25]:
%%time
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
len(texts), texts[0]

CPU times: user 2.51 ms, sys: 568 µs, total: 3.08 ms
Wall time: 3.88 ms


(60,
 Document(page_content='### Add to wishlist Follow ADD TO CART Waitlist 0 Log in Sign up Language Currency Interests Locations * * * Mastercard.com About Priceless Contact us ADVANCED SEARCH All Experiences Interests Entertainment Arts and Culture Sports Culinary Travel More... Shopping Less... Locations Argentina Australia Austria Brazil Bulgaria More... Canada Chile China Colombia Croatia Czechia Fiji France Germany Greece Hong Kong India Indonesia Ireland Italy Japan Kenya Macau Malaysia Maldives Mexico Morocco', metadata={'source': 'https://www.priceless.com/m/filter/options/category/506'}))
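Note that `chunk_size=500` is measured in characters. The sketch below is a simplified greedy splitter that packs whole words into chunks of at most 500 characters. It is not the algorithm `RecursiveCharacterTextSplitter` uses, only an illustration of a character-based limit.

```python
def split_by_chars(text, chunk_size=500):
    """Greedily pack whitespace-separated words into chunks of limited length.

    A single word longer than chunk_size is kept whole in its own chunk.
    """
    chunks = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


# Synthetic text: 300 four-letter words.
sample = "word " * 300
chunks = split_by_chars(sample, chunk_size=500)
print(len(chunks), [len(c) for c in chunks])
```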

In [26]:
%%time
from langchain.vectorstores import Chroma

docsearch = Chroma.from_documents(texts, llama_embeddings, persist_directory='../data/vicuna')
len(texts), docsearch

INFO:chromadb.telemetry.posthog:Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
INFO:chromadb:Running Chroma using direct local API.
INFO:clickhouse_connect.driver.ctypes:Successfully imported ClickHouse Connect C data optimizations
INFO:clickhouse_connect.driver.ctypes:Successfully import ClickHouse Connect C/Numpy optimizations
INFO:clickhouse_connect.json_impl:Using orjson library for writing JSON byte strings
INFO:chromadb.db.duckdb:No existing DB found in ../data/vicuna, skipping load
INFO:chromadb.db.duckdb:No existing DB found in ../data/vicuna, skipping load

llama_print_timings:        load time = 18721.40 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 302484.61 ms /   114 tokens ( 2653.37 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 302547.35 ms


CPU times: user 2h 26min 12s, sys: 2h 15min 11s, total: 4h 41min 24s
Wall time: 18h 8min 21s



llama_print_timings:        load time = 18721.40 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 400448.93 ms /    95 tokens ( 4215.25 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 400495.80 ms


(60, <langchain.vectorstores.chroma.Chroma at 0x135d27b80>)

In [27]:
%%time
from langchain.chains import RetrievalQA

MIN_DOCS = 1

qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff",
                                 retriever=docsearch.as_retriever(search_kwargs={"k": MIN_DOCS}))

CPU times: user 531 µs, sys: 2.52 ms, total: 3.05 ms
Wall time: 6.85 ms


What do we get in response to our original query now?

In [28]:
%%time
question = "Entertainment in New York"

print(question)
qa.run(question)

Entertainment in New York



llama_print_timings:        load time = 18721.40 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 19865.81 ms /     7 tokens ( 2837.97 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 19870.24 ms
Llama.generate: prefix-match hit


CPU times: user 8min 41s, sys: 16min 47s, total: 25min 29s
Wall time: 46min 41s



llama_print_timings:        load time = 17435.00 ms
llama_print_timings:      sample time =   100.11 ms /   134 runs   (    0.75 ms per run)
llama_print_timings: prompt eval time = 484905.52 ms /   194 tokens ( 2499.51 ms per token)
llama_print_timings:        eval time = 2287258.68 ms /   133 runs   (17197.43 ms per run)
llama_print_timings:       total time = 2781126.80 ms


' Some of the popular entertainment options in New York City include Broadway shows, concerts at Madison Square Garden or the Barclays Center, comedy clubs like the Comedy Cellar, and sporting events at Yankee Stadium or Citi Field. Additionally, there are numerous museums and art galleries to explore throughout the city, such as the Metropolitan Museum of Art or the Museum of Modern Art. For those interested in outdoor activities, Central Park offers a variety of walking trails, gardens, and sports facilities, while the High Line is an elevated park built on an old railway line that provides stunning views of the city skyline.'

## Comparison with OpenAI Embeddings

In [72]:
import os
from dotenv import load_dotenv
from langchain.embeddings import OpenAIEmbeddings

load_dotenv('../.env', override=True)
openai_embeddings = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"])

In [73]:
%%time
from langchain.vectorstores import Chroma

docsearch2 = Chroma.from_documents(texts, openai_embeddings, persist_directory='../data/openai')
len(texts), docsearch2

INFO:chromadb.telemetry.posthog:Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
INFO:chromadb:Running Chroma using direct local API.
INFO:chromadb.db.duckdb:No existing DB found in rtdocs/openai, skipping load
INFO:chromadb.db.duckdb:No existing DB found in rtdocs/openai, skipping load
INFO:chromadb.db.duckdb:PersistentDuckDB del, about to run persist
INFO:chromadb.db.duckdb:Persisting DB to disk, putting it in the save folder: db


CPU times: user 325 ms, sys: 124 ms, total: 448 ms
Wall time: 6.13 s


(60, <langchain.vectorstores.chroma.Chroma at 0x12f8bf340>)

In [86]:
from langchain.llms import OpenAI

qa2 = RetrievalQA.from_chain_type(llm=OpenAI(temperature=0), chain_type="stuff",
                                 retriever=docsearch2.as_retriever(search_kwargs={"k": MIN_DOCS}))

In [92]:
%%time

print(question)
qa2.run(question)

Entertainment in New York
CPU times: user 5.38 ms, sys: 7.78 ms, total: 13.2 ms
Wall time: 1.32 s


' Entertainment in New York includes Broadway shows, comedy clubs, music venues, museums, and more.'

In [89]:
from langchain.llms import OpenAI
from langchain.vectorstores import Pinecone
import pinecone

PINECONE_API_KEY = os.environ.get("PINECONE_API_KEY")
PINECONE_ENVIRONMENT = os.environ.get("PINECONE_ENVIRONMENT")
PINECONE_INDEX = os.environ.get("PINECONE_INDEX")
PINECONE_NAME_SPACE = os.environ.get("PINECONE_NAME_SPACE")

pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_ENVIRONMENT)
index = pinecone.Index(PINECONE_INDEX)
print(index.describe_index_stats())

test_cone = Pinecone.from_documents(documents=[],
                                    embedding=openai_embeddings,
                                    index_name=PINECONE_INDEX)

qa3 = RetrievalQA.from_chain_type(llm=OpenAI(temperature=0), chain_type="stuff",
                                 retriever=test_cone.as_retriever(search_kwargs={"k": 3}))

{'dimension': 1536,
 'index_fullness': 0.1,
 'namespaces': {'priceless-docs': {'vector_count': 201558},
                'priceless-docs-v2': {'vector_count': 174012},
                'priceless-docs-v3': {'vector_count': 148815}},
 'total_vector_count': 524385}


In [91]:
%%time

print(question)
qa3.run(question)

Entertainment in New York
CPU times: user 7.04 ms, sys: 1.97 ms, total: 9.02 ms
Wall time: 3.09 s


'\n\nNew York City is known for its vibrant entertainment scene. There are countless theaters, music venues, comedy clubs, and other entertainment options to choose from. From Broadway shows to stand-up comedy to live music, there is something for everyone in New York.'

We used Vicuna (q5_1), GPT4ALL, and OpenAI to generate embeddings for the 3 HTML files that the Priceless Chatbot returns when someone asks "Entertainment in New York". The files are split into 60 chunks with a chunk size of 500 characters. We then ask each model to use its generated embedding vectors to run RetrievalQA on the same question. For comparison, we also ran the same query with OpenAI against the pre-generated priceless-docs-v3 embedding vectors stored in Pinecone. Finally, we append the answer from the Priceless Chatbot.

| Approach | Time to generate embeddings | Time to run query | Query result |
| --- | --- | --- | --- |
| Vicuna + Chroma (local) | 1088m 21.3s | 46m 41s | ' Some of the popular entertainment options in New York City include Broadway shows, concerts at Madison Square Garden or the Barclays Center, comedy clubs like the Comedy Cellar, and sporting events at Yankee Stadium or Citi Field. Additionally, there are numerous museums and art galleries to explore throughout the city, such as the Metropolitan Museum of Art or the Museum of Modern Art. For those interested in outdoor activities, Central Park offers a variety of walking trails, gardens, and sports facilities, while the High Line is an elevated park built on an old railway line that provides stunning views of the city skyline.' |
| GPT4ALL + Chroma (local) | 95m 6.4s | 10m 59.2s | ' You may enjoy a variety of entertainment options such as Broadway shows, concerts and performances at Lincoln Center, theaters across town, or comedy clubs throughout the city. Additionally, there are many museums and galleries to visit for cultural experiences.' |
| OpenAI + Chroma (local) | 6.1s | 1.3s | ' Entertainment in New York includes Broadway shows, comedy clubs, music venues, museums, and more.' |
| OpenAI + Pinecone (remote) | N/A | 3.1s | '\n\nNew York City is known for its vibrant entertainment scene. There are countless theaters, music venues, comedy clubs, and other entertainment options to choose from. From Broadway shows to stand-up comedy to live music, there is something for everyone in New York.' |
| OpenAI/GPT-4 + Pinecone (remote) | N/A | N/A | 'Entertainment in New York includes an ultra-glamorous, intimate 150-seat theater that showcases talent across various forms of entertainment such as magic, music, comedy, and Broadway cabarets. Additionally, there is an elegant and lively restaurant and bar called Hidden Leaf, as well as the Midnight Cafe, a cocktail bar with a beverage program directed by Giuseppe Santochirico.' |
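The embedding times in the table come from the `Wall time` lines reported by `%%time`. The sketch below converts the Vicuna and OpenAI wall times to seconds and compares them; the helper name is an assumption for this example.

```python
def to_seconds(hours=0, minutes=0, seconds=0.0):
    """Convert an h/min/s wall time to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Wall times copied from the Chroma.from_documents cells above.
vicuna_embed = to_seconds(hours=18, minutes=8, seconds=21)  # 18h 8min 21s
openai_embed = to_seconds(seconds=6.13)                     # 6.13 s

# Express the Vicuna time in minutes and seconds, as in the table.
whole_minutes, rest_seconds = divmod(vicuna_embed, 60)
print(f"Vicuna: {whole_minutes}m {rest_seconds}s")

# Ratio between the two embedding runs on this machine.
print(f"Vicuna / OpenAI: {vicuna_embed / openai_embed:,.0f}x")
```

The ratio describes this particular machine and run only. It compares a local CPU run with a hosted API and is not a general benchmark.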