## Data Ingestion
Load the F1 rulebook PDF and ingest it as LangChain documents (one per page).


In [2]:
!pip install --q unstructured langchain langchain-community pypdf


In [8]:
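# PyPDFLoader reads a local PDF (requires pypdf);
# OnlinePDFLoader downloads a PDF from a URL and parses it with unstructured
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.document_loaders import OnlinePDFLoader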

In [10]:
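import os

data_path = "../Data/F1RuleBook.pdf"
# Load the PDF only if the local file actually exists
if os.path.exists(data_path):
    loader = PyPDFLoader(file_path=data_path)
    data = loader.load()  # one Document per PDF page
else:
    print(f"Path is invalid: {data_path}")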
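The path check can be made stricter by confirming that the path points to an existing `.pdf` file and failing loudly otherwise, so that later cells do not run against an undefined `data`. A minimal sketch using only the standard library; `resolve_pdf_path` is a hypothetical helper, not part of LangChain.

```python
from pathlib import Path


def resolve_pdf_path(path_str: str) -> Path:
    """Return the path if it is an existing .pdf file, otherwise raise.

    Hypothetical helper: a plain `if data_path:` only tests that the string
    is non-empty, which is always true for a hard-coded path.
    """
    path = Path(path_str)
    if not path.is_file():
        raise FileNotFoundError(f"PDF not found: {path.resolve()}")
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {path.name}")
    return path


# Possible use in the notebook (PyPDFLoader returns one Document per page):
# loader = PyPDFLoader(file_path=str(resolve_pdf_path("../Data/F1RuleBook.pdf")))
# data = loader.load()
```

Raising instead of printing stops the notebook at the cell that actually failed.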

In [11]:
data[0].page_content

' \n  \n2022 Formula 1 Sporting Regulations  1/107 30 Septem ber 2022  \n©20 22 Fédération Internationale de l’Automobile   Issue 2  2023 FORMULA ONE SPORTING REGULATIONS  \nPUBLISHED ON 30 S EPTEMBER  2022  \nISSUE 2 \nConvention:  \nBlack text: As approved for 2022 by the WMSC up to and including 19/7/22  \nPink text: Changes for 2023 approved by the WMSC up to and including 19/7/22 \nPink highlighted text:  Changes for 2023 approved by the WMSC  on 27/9/22 \n \nART CONTENTS  PAGE  \n1 REGULATIONS  2 \n2 GENERAL UNDERTAKING  2 \n3 GENERAL CONDITIONS  2 \n4 LICENCES  3 \n5 CHAMPIONSHIP COMPE TITION S 3 \n6 WORLD CHAMPIONSHIP  4 \n7 DEAD HEAT  6 \n8 COMPETITORS APPLICATIONS  6 \n9 CAR LIVERY  7 \n10 TRACK RUNNING TIME OUTSIDE A  \nCOMPETITION  8 \n11 PROMOTER  13 \n12 ORGANISATION OF A COMPETITION  13 \n13 INSURANCE  13 \n14 FIA DELEGATES  14 \n15 OFFICIALS  14 \n16 INSTRUCTIONS AND COMMUNICATIONS  \n TO COMPETITORS  15 \n17 PROTESTS AND APPEALS  15 \n18 SANCTIONS  16 \n19 PRESS CONFER

## Vector Embeddings and Database Configuration



### Ollama configuration

In [12]:
!ollama pull nomic-embed-text

pulling manifest
pulling 970aa74c0a90... 100% ▕████████████████▏ 274 MB                         
pulling c71d239df917... 100% ▕████████████████▏  11 KB                         
pulling ce4a164fc046... 100% ▕██████

In [13]:
!ollama list

NAME                   	ID          	SIZE  	MODIFIED     
nomic-embed-text:latest	0a109f422b47	274 MB	1 second ago	
mistral:latest         	f974a74358d6	4.1 GB	31 hours ago	
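Before the slow indexing step, it can help to confirm that the pulled embedding model answers requests. The sketch below talks to Ollama's REST API directly with the standard library. It assumes Ollama is listening on its default local address `http://localhost:11434` and uses the `/api/embeddings` endpoint (body fields `model` and `prompt`, response field `embedding`); newer Ollama releases also document an `/api/embed` endpoint, so check the version you have installed.

```python
import json
import urllib.request

# Assumption: Ollama runs locally on its default port
OLLAMA_URL = "http://localhost:11434"


def build_embedding_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build a POST request asking Ollama to embed a single piece of text."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Send the request and return the embedding vector (needs a running Ollama server)."""
    with urllib.request.urlopen(build_embedding_request(text, model)) as resp:
        return json.loads(resp.read())["embedding"]


# Example (only works while `ollama serve` is running):
# vector = embed("pit lane speed limit")
# print(len(vector))
```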


### Splitting and chunking the data

In [14]:
!pip install --q chromadb


In [15]:
!pip install --q langchain-text-splitters

In [16]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

In [17]:
# Split the page documents into chunks of at most 7500 characters,
# with 100 characters of overlap between consecutive chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_documents(data)
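To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window chunker. It is an illustration, not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to break on its separators (by default `"\n\n"`, `"\n"`, `" "`, then `""`) and only falls back to raw characters, so its chunks are usually shorter than `chunk_size`. Both measure length in characters by default.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified sketch: ignores separators, so words can be cut mid-way.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

With `chunk_size=7500` and `chunk_overlap=100`, each window advances 7,400 characters. Because `split_documents` splits each page `Document` independently, a page shorter than 7,500 characters stays a single chunk; the 107/107 progress bar during embedding of this 107-page PDF is consistent with one chunk per page.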

In [18]:
print(chunks)



### Adding embeddings to vector DB

In [19]:
import inspect

# Inspect the parameters accepted by Chroma.from_documents
print(inspect.signature(Chroma.from_documents))

(documents: 'List[Document]', embedding: 'Optional[Embeddings]' = None, ids: 'Optional[List[str]]' = None, collection_name: 'str' = 'langchain', persist_directory: 'Optional[str]' = None, client_settings: 'Optional[chromadb.config.Settings]' = None, client: 'Optional[chromadb.Client]' = None, collection_metadata: 'Optional[Dict]' = None, **kwargs: 'Any') -> 'Chroma'


In [21]:
# Embed every chunk with nomic-embed-text and persist the collection to disk
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    persist_directory="../VectorStore/",
    collection_name="ChatPDF_LocalLLM",
)

OllamaEmbeddings: 100%|██████████| 107/107 [21:16<00:00, 11.93s/it]
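At roughly 12 seconds per chunk (107 × 11.93 s ≈ 21 min 16 s), embedding is the slowest step. At query time the store embeds the question and returns the stored chunks whose vectors are closest to it. The sketch below shows that idea with an exhaustive cosine-similarity scan. It is a simplification: Chroma uses an approximate nearest-neighbour (HNSW) index, and its distance metric depends on the collection configuration (L2 unless set otherwise). The default `k=4` mirrors LangChain's default number of results for similarity search.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query (exhaustive scan)."""
    ranked = sorted(
        chunk_vecs,
        key=lambda cid: cosine_similarity(query_vec, chunk_vecs[cid]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-D vectors; real nomic-embed-text vectors have hundreds of dimensions
store = {"page_1": [1.0, 0.0], "page_2": [0.0, 1.0], "page_3": [0.9, 0.1]}
print(top_k([1.0, 0.0], store, k=2))  # ['page_1', 'page_3']
```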


## Retrieval

In [22]:
from langchain.prompts import ChatPromptTemplate,PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

In [23]:
#LLM from Ollama
local_model="mistral"
llm=ChatOllama(model=local_model)

In [24]:
# Query-rewriting prompt: MultiQueryRetriever sends this to the LLM to
# generate alternative phrasings of the user's question
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
different versions of the given user question to retrieve relevant documents from
a vector database. By generating multiple perspectives on the user question, your
goal is to help the user overcome some of the limitations of the distance-based
similarity search. Provide these alternative questions separated by newlines.
Original question: {question}""",
)

In [25]:
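# Generate alternative queries with the LLM, search the vector store with each, and merge the results
retriever = MultiQueryRetriever.from_llm(vector_db.as_retriever(), llm, prompt=QUERY_PROMPT)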
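The multi-query step can be pictured as: split the LLM's newline-separated answer into queries, run a similarity search for each, and keep the unique union of the hits. This is likely why each `chain.invoke` below prints five `OllamaEmbeddings` progress bars, one per generated query. The sketch roughly mirrors that flow in plain Python; `retrieve` is a hypothetical stand-in for the vector-store search and documents are plain strings.

```python
from typing import Callable


def parse_query_lines(llm_output: str) -> list[str]:
    """Turn the LLM's newline-separated answer into a list of non-empty queries."""
    return [line.strip() for line in llm_output.split("\n") if line.strip()]


def multi_query_retrieve(llm_output: str, retrieve: Callable[[str], list[str]]) -> list[str]:
    """Retrieve for every generated query and keep the unique union, in first-seen order."""
    seen: set[str] = set()
    results: list[str] = []
    for query in parse_query_lines(llm_output):
        for doc in retrieve(query):
            if doc not in seen:  # the same chunk is often found by several queries
                seen.add(doc)
                results.append(doc)
    return results


# Hypothetical index mapping each query to the chunks it retrieves
fake_index = {
    "Which tyres are allowed?": ["art30", "art24"],
    "Which tyre compounds exist?": ["art30", "art31"],
}
llm_answer = "Which tyres are allowed?\n\nWhich tyre compounds exist?\n"
print(multi_query_retrieve(llm_answer, lambda q: fake_index[q]))  # ['art30', 'art24', 'art31']
```

Deduplicating keeps the context passed to the LLM from repeating the same chunk several times.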

In [26]:
# RAG prompt
template="""Answer the question based ONLY on the following context:
{context}
Question:{question}
""" 

In [27]:
prompt=ChatPromptTemplate.from_template(template)

In [28]:
chain=(
    {"context":retriever,"question":RunnablePassthrough()}
    |prompt # RAG prompt
    |llm
    |StrOutputParser()
)
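In this chain the retriever's output, a Python list of `Document` objects, is inserted into `{context}` as-is, so the prompt is filled with the list's string representation, metadata included. A common alternative is to join the chunk texts first. The sketch below uses a hypothetical `Doc` dataclass in place of LangChain's `Document` (PyPDFLoader stores a 0-based `page` number in `metadata`) and reproduces the RAG template to show what the model would receive.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs: list[Doc]) -> str:
    """Join retrieved chunks into one plain-text context, tagging each with its page."""
    parts = []
    for doc in docs:
        page = doc.metadata.get("page", "?")  # "?" when no page number is stored
        parts.append(f"[page {page}]\n{doc.page_content}")
    return "\n\n".join(parts)


# Same text as the notebook's RAG template
TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question:{question}
"""

context = format_docs([Doc("Rule A", {"page": 0}), Doc("Rule B")])
print(TEMPLATE.format(context=context, question="What is Rule A?"))

# In the notebook, the function could be piped after the retriever:
# {"context": retriever | format_docs, "question": RunnablePassthrough()}
```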

In [41]:
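# Prompt for a question interactively and run it through the RAG chain
question = input("Ask a question about the F1 rulebook: ")
chain.invoke(question)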

OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.71s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.13s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.14s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.14s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.18s/it]


' This document appears to be a section from the rules (Article) of the Fédération Internationale de l’Automobile (FIA) regarding sprint sessions or races, specifically focusing on safety car procedures and restarting a session or race in wet conditions. It details what drivers should do when track conditions are suitable or unsuitable to resume the sprint session or race from a standing start or rolling start, as well as the use of wet-weather tyres under certain circumstances.'

In [29]:
chain.invoke("What are the tyres specifications for a Formula 1 car?")

OllamaEmbeddings: 100%|██████████| 1/1 [00:03<00:00,  3.22s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.51s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.45s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.43s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.41s/it]


" According to the provided document (F1RuleBook), the information regarding which tyre specifications will be made available by the appointed tyre supplier is not explicitly stated in the excerpt you've provided. However, it mentions that cars must use components of a specification that have been used in at least one race or TCC during the year preceding the year of the Championship. This suggests that the specifications are determined based on the tires used in the previous season.\n\nFor the mandatory dry-weather race tyre specifications, the document states that up to two (2) can be provided, but no specifics are given in this excerpt.\n\nFor wet-weather tyre tests, it is mentioned that they can be carried out using cars which were designed and constructed to comply with the Technical Regulations of any of the three calendar years falling immediately prior to the Championship, appropriately modified to fit 18” wheels (Mule Cars). This suggests that there might be multiple wet-weath

In [43]:
chain.invoke("What is the fine or penalty if team fails to use prescribed tyre set during qualifying?")

OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.78s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.16s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.19s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.20s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.19s/it]


' According to the provided document, there is no explicit mention of a fine or penalty if a team fails to use the prescribed tyre set during qualifying. However, it is stated that all drivers must use the same number of laps per driver and the only dry weather tyres that may be used are those allocated under Article 30.1a)iii). If a team chooses not to follow these regulations, it could potentially impact their qualifying performance.'