# Trying different methods of using an LLM with embeddings

In [1]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())  # read the local .env file (e.g. the OpenAI API key)

In [2]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown


In [3]:
file = 'drivers_full.csv'
loader = CSVLoader(file_path=file, encoding='utf-8')

In [4]:
from langchain.indexes import VectorstoreIndexCreator
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

In [5]:
queries = [
    "Tell me the youngest driver.",
    "Tell me every driver that is German",
    "how many german drivers are?",
    "Tell me the oldest driver.",
    "which driver uses the number 14?",
    "which driver has the oldest birthdate?",
]
# Query the index once per question, then render each answer as Markdown.
responses = [index.query(query) for query in queries]
for r in responses:
    display(Markdown(r))

 The youngest driver is Pascal Wehrlein, born in 1994.

 Bernd Schneider, Pascal Wehrlein, Ralf Schumacher, and Manfred Winkelhock are all German drivers.

 4

 The oldest driver is Peter Walker, with a date of birth of 1912-10-07.

 David Coulthard.

 Ron Flockhart, with a birthdate of 1923-06-16.

In [6]:
docs = loader.load()

In [7]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [8]:
db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)

In [9]:
for query in queries:
    print(query)
    print(db.similarity_search(query))
    print()


Tell me the youngest driver.
[Document(page_content='driverId: 845\nname: sirotkin\nnumber: 35\ncode: SIR\nforename: Sergey\nsurname: Sirotkin\ndate of birth: 1995-08-27\nnationality: Russian\nurl: http://en.wikipedia.org/wiki/Sergey_Sirotkin_(racing_driver)', metadata={'source': 'drivers_full.csv', 'row': 843}), Document(page_content='driverId: 836\nname: wehrlein\nnumber: 94\ncode: WEH\nforename: Pascal\nsurname: Wehrlein\ndate of birth: 1994-10-18\nnationality: German\nurl: http://en.wikipedia.org/wiki/Pascal_Wehrlein', metadata={'source': 'drivers_full.csv', 'row': 835}), Document(page_content='driverId: 12\nname: piquet_jr\nnumber: \ncode: PIQ\nforename: Nelson\nsurname: Piquet Jr.\ndate of birth: 1985-07-25\nnationality: Brazilian\nurl: http://en.wikipedia.org/wiki/Nelson_Piquet,_Jr.', metadata={'source': 'drivers_full.csv', 'row': 11}), Document(page_content='driverId: 126\nname: toshio_suzuki\nnumber: \ncode: \nforename: Toshio\nsurname: Suzuki\ndate of birth: 1955-03-10\nnatio
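
What `similarity_search` does can be pictured as a nearest-neighbour lookup: embed the query, score every stored row against it, and return the best few. The sketch below is an illustration under assumptions, not the code of `DocArrayInMemorySearch`: it assumes cosine similarity as the metric, uses hypothetical three-dimensional toy vectors in place of OpenAI embeddings, and defaults to `k=4`, which appears to be LangChain's default number of results. The names `cosine_similarity`, `top_k`, `toy_doc_vecs` and `toy_query_vec` are made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors most similar to query_vec."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy "embeddings"; real embeddings have far more dimensions.
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]
toy_query_vec = [1.0, 0.05, 0.0]

nearest = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(nearest)  # [0, 1]
```

Because only the top few rows are returned, any chain built on this retriever sees a small slice of the file. That is one plausible reason why questions such as "youngest driver" are answered relative to the retrieved rows rather than the whole CSV.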

In [10]:
retriever = db.as_retriever()

In [11]:
llm = ChatOpenAI(temperature=0.0)  # temperature 0 keeps answers as repeatable as possible

In [12]:
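# Only send every row to the model when using the small sample file;
# the full file is presumably too large to fit in a single prompt.
if file == 'drivers.csv':
    # Join all rows into one context string, separated by blank lines.
    qdocs = "\n\n".join(doc.page_content for doc in docs)

    for query in queries:
        print(query)
        response = llm.call_as_llm(f"{qdocs}\n\nQuestion: {query}")
        display(Markdown(response))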


In [13]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)

for query in queries:
    print(query)
    response = qa_stuff.run(query)
    display(Markdown(response))


Tell me the youngest driver.


> Entering new RetrievalQA chain...

> Finished chain.


I'm sorry, I don't have enough information to answer that question. I have information about the drivers' birth dates, but I don't have their ages or the date of the race.

Tell me every driver that is German


> Entering new RetrievalQA chain...

> Finished chain.


The drivers that are German are:
- Josef Peters
- Bernd Schneider
- Pascal Wehrlein
- Ralf Schumacher

how many german drivers are?


> Entering new RetrievalQA chain...

> Finished chain.


There are four German drivers listed in the given context.

Tell me the oldest driver.


> Entering new RetrievalQA chain...

> Finished chain.


I'm sorry, but I don't have enough information to determine the oldest driver. The birth dates of the drivers are provided, but I would need to calculate their ages to determine who is the oldest.

which driver uses the number 14?


> Entering new RetrievalQA chain...

> Finished chain.


I'm sorry, but none of the drivers listed uses the number 14.

which driver has the oldest birthdate?


> Entering new RetrievalQA chain...

> Finished chain.


Bob Anderson has the oldest birthdate, which is May 19, 1931.
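
As a rough picture of the `"stuff"` chain type used above: all retrieved rows are concatenated into one prompt and the model is called once. The sketch below is a simplification under assumptions, not LangChain's implementation. `fake_llm`, `stuff_answer` and the prompt wording are hypothetical, and the sample rows are shortened versions of the rows shown in the `similarity_search` output.

```python
calls = []  # every prompt the stand-in model receives


def fake_llm(prompt):
    """Hypothetical stand-in for the chat model; records the prompt it was given."""
    calls.append(prompt)
    return f"answer #{len(calls)}"


def stuff_answer(llm, docs, question):
    """Put every retrieved document into a single prompt and ask once."""
    context = "\n\n".join(docs)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)


retrieved = [
    "forename: Sergey\nsurname: Sirotkin\ndate of birth: 1995-08-27",
    "forename: Pascal\nsurname: Wehrlein\ndate of birth: 1994-10-18",
    "forename: Nelson\nsurname: Piquet Jr.\ndate of birth: 1985-07-25",
]

stuff_result = stuff_answer(fake_llm, retrieved, "Tell me the youngest driver.")
print(len(calls))  # 1
```

One call per question keeps this approach cheap, but the answer can only draw on the handful of rows that were retrieved.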

In [14]:
qa_map_reduce = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="map_reduce",
    retriever=retriever,
    verbose=True
)

for query in queries:
    print(query)
    response = qa_map_reduce.run(query)
    display(Markdown(response))

Tell me the youngest driver.


> Entering new RetrievalQA chain...

> Finished chain.


I'm sorry, but I cannot determine the youngest driver as the given portion of the document does not provide information about the age or birthdate of any drivers.

Tell me every driver that is German


> Entering new RetrievalQA chain...

> Finished chain.


The drivers that are German are:
- Josef Peters
- Pascal Wehrlein
- Sebastian Vettel
- Nico Hulkenberg
- Nico Rosberg
- Adrian Sutil
- Michael Schumacher
- Bernd Schneider
- Ralf Schumacher

how many german drivers are?


> Entering new RetrievalQA chain...

> Finished chain.


I'm sorry, but I cannot provide an answer to your question as none of the given portions of the document provide information about the total number of German drivers.

Tell me the oldest driver.


> Entering new RetrievalQA chain...

> Finished chain.


I'm sorry, I cannot provide a definitive answer as there are conflicting statements in the extracted parts of the document. One statement says that Paddy Driver is the oldest driver with a birthdate of 1934-05-13, while another statement says that Josef Peters is the oldest driver with a birthdate of 1914-09-16. The third statement says that there is not enough information to determine the oldest driver, and the fourth statement mentions a birthdate of 1934-04-24 but does not specify if this driver is the oldest.

which driver uses the number 14?


> Entering new RetrievalQA chain...

> Finished chain.


There is no information in the given portion of the document about which driver uses the number 14.

which driver has the oldest birthdate?


> Entering new RetrievalQA chain...

> Finished chain.


There are multiple answers to this question based on the given information:
- If we only consider the provided portion of the document, it is not possible to determine which driver has the oldest birthdate.
- If we consider a different source of information, the driver with the oldest birthdate is Paddy Driver, born on May 13, 1934, or Bob Anderson, born on May 19, 1931.
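
The "conflicting statements" in the answers above fit how `"map_reduce"` works in outline: each retrieved row is put to the model on its own (map), and a final call combines those partial answers (reduce). The sketch below illustrates that flow under assumptions and is not LangChain's implementation. `fake_llm`, `map_reduce_answer` and the prompt wording are hypothetical.

```python
calls = []  # every prompt the stand-in model receives


def fake_llm(prompt):
    """Hypothetical stand-in for the chat model; records the prompt it was given."""
    calls.append(prompt)
    return f"answer #{len(calls)}"


def map_reduce_answer(llm, docs, question):
    """Answer per document first, then combine the partial answers in one call."""
    # Map step: one call per document, each seeing only that document.
    partials = [llm(f"Context:\n{doc}\n\nQuestion: {question}") for doc in docs]
    # Reduce step: one final call that sees only the partial answers.
    summaries = "\n".join(partials)
    return llm(f"Partial answers:\n{summaries}\n\nQuestion: {question}")


retrieved = [
    "surname: Sirotkin\ndate of birth: 1995-08-27",
    "surname: Wehrlein\ndate of birth: 1994-10-18",
    "surname: Piquet Jr.\ndate of birth: 1985-07-25",
]

map_reduce_result = map_reduce_answer(fake_llm, retrieved, "Tell me the oldest driver.")
print(len(calls))  # 4
```

Since every map call sees a single row, each one can declare its own row the "oldest", which leaves the reduce step with answers it cannot reconcile.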

In [15]:
qa_refine = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="refine",
    retriever=retriever,
    verbose=True
)

for query in queries:
    print(query)
    response = qa_refine.run(query)
    display(Markdown(response))

Tell me the youngest driver.


> Entering new RetrievalQA chain...

> Finished chain.


The new context is not useful for answering the original question about the youngest driver. The original answer of Pascal Wehrlein being the youngest driver still stands.

Tell me every driver that is German


> Entering new RetrievalQA chain...

> Finished chain.


Thank you for providing additional context. Based on the new information, there is another driver mentioned who is German: Ralf Schumacher.

how many german drivers are?


> Entering new RetrievalQA chain...

> Finished chain.


I apologize for the confusion. Based on the new context provided, I still don't have access to information about the total number of German drivers. The context only provides details about one specific German driver named Kurt Kuhnke.

Tell me the oldest driver.


> Entering new RetrievalQA chain...

> Finished chain.


Based on the new context, the oldest driver is Mike Taylor, who was born on April 24, 1934.

which driver uses the number 14?


> Entering new RetrievalQA chain...

> Finished chain.


Based on the new context, the driver with the number 14 is not mentioned. Therefore, the original answer still stands.

which driver has the oldest birthdate?


> Entering new RetrievalQA chain...

> Finished chain.


Based on the new context, the original answer needs to be updated. The driver with the oldest birthdate is Bob Anderson, born on May 19, 1931.
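
Phrases such as "Based on the new context" and "the original answer still stands" in the answers above match the general shape of `"refine"`: an initial answer is produced from the first row, then the model is asked to revise it once per remaining row. The sketch below shows that loop under assumptions and is not LangChain's implementation. `fake_llm`, `refine_answer` and the prompt wording are hypothetical.

```python
calls = []  # every prompt the stand-in model receives


def fake_llm(prompt):
    """Hypothetical stand-in for the chat model; records the prompt it was given."""
    calls.append(prompt)
    return f"answer #{len(calls)}"


def refine_answer(llm, docs, question):
    """Answer from the first document, then revise once per remaining document."""
    answer = llm(f"Context:\n{docs[0]}\n\nQuestion: {question}")
    for doc in docs[1:]:
        # Each later call sees the previous answer plus exactly one new document.
        answer = llm(
            f"Existing answer: {answer}\n"
            f"New context:\n{doc}\n\n"
            f"Refine the existing answer to the question: {question}"
        )
    return answer


retrieved = [
    "surname: Sirotkin\ndate of birth: 1995-08-27",
    "surname: Wehrlein\ndate of birth: 1994-10-18",
    "surname: Piquet Jr.\ndate of birth: 1985-07-25",
]

refine_result = refine_answer(fake_llm, retrieved, "Tell me the youngest driver.")
print(len(calls))  # 3
```

What gets displayed is the output of the last refinement step only, which would explain why some answers refer to an "original answer" that the reader never sees.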

In [16]:
qa_map_rerank = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='map_rerank',
    retriever=retriever,
    verbose=True
)

for query in queries:
    print(query)
    response = qa_map_rerank.run(query)
    display(Markdown(response))

Tell me the youngest driver.


> Entering new RetrievalQA chain...

> Finished chain.


Pascal Wehrlein was born on October 18, 1994, making him the youngest driver listed in this context.

Tell me every driver that is German


> Entering new RetrievalQA chain...

> Finished chain.


Bernd Schneider

how many german drivers are?


> Entering new RetrievalQA chain...

> Finished chain.


This document does not answer the question

Tell me the oldest driver.


> Entering new RetrievalQA chain...

> Finished chain.


Paddy Driver, born on May 13, 1934.

which driver uses the number 14?


> Entering new RetrievalQA chain...

> Finished chain.


No driver in this context uses the number 14.

which driver has the oldest birthdate?


> Entering new RetrievalQA chain...

> Finished chain.


Wilson Fittipaldi
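
The single-name answers above ("Bernd Schneider", "Wilson Fittipaldi") are what one would expect from `"map_rerank"` in outline: each retrieved row is answered separately together with a confidence score, and only the highest-scoring answer is returned. The sketch below is a simplification under assumptions and is not LangChain's implementation. In particular, the stand-in model returns an `(answer, score)` tuple directly, whereas the real chain has to parse a score out of the model's text. `fake_scoring_llm`, `map_rerank_answer` and the scoring rule are hypothetical.

```python
def fake_scoring_llm(prompt):
    """Hypothetical stand-in: answers with the row's surname and a made-up score."""
    # Parse "key: value" lines from the prompt into a dict.
    fields = dict(
        line.split(": ", 1) for line in prompt.splitlines() if ": " in line
    )
    score = 100 if fields.get("nationality") == "German" else 0
    return fields["surname"], score


def map_rerank_answer(llm, docs, question):
    """Answer each document separately and keep only the best-scored answer."""
    scored = [llm(f"Context:\n{doc}\n\nQuestion: {question}") for doc in docs]
    best_answer, _best_score = max(scored, key=lambda pair: pair[1])
    return best_answer


retrieved = [
    "surname: Sirotkin\nnationality: Russian",
    "surname: Wehrlein\nnationality: German",
    "surname: Piquet Jr.\nnationality: Brazilian",
]

rerank_result = map_rerank_answer(
    fake_scoring_llm, retrieved, "Tell me every driver that is German"
)
print(rerank_result)  # Wehrlein
```

Because only one per-row answer survives, this chain type seems poorly suited to questions that need a list or a count across several rows.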

In [17]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])

In [18]:
for query in queries:
    print(query)
    response = index.query(query, llm=llm)
    print(response)

Tell me the youngest driver.
I'm sorry, I don't have enough information to answer that question. I need more context about the drivers you are referring to.
Tell me every driver that is German
The drivers that are German are:
- Josef Peters
- Bernd Schneider
- Pascal Wehrlein
- Ralf Schumacher
how many german drivers are?
Based on the given context, there are four German drivers mentioned.
Tell me the oldest driver.
The oldest driver is Peter Walker, born on October 7th, 1912.
which driver uses the number 14?
I'm sorry, but none of the drivers listed uses the number 14.
which driver has the oldest birthdate?
Bob Anderson has the oldest birthdate, which is May 19, 1931.
