# Developing the RAG model 

This script loads the FAA Engine Maintenance manual, splits it into chunks, and stores the chunks in a vector database for retrieval.

In [6]:
import os
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders import PyPDFLoader

openai_api_key = os.environ["OPENAI_API_KEY"]



### Step 1: Loading the documents 

In [7]:
loader = PyPDFLoader("docs/FAA_Engine-Maintainance.pdf")

docs = loader.load()

In [8]:
docs[0]

Document(metadata={'producer': 'Adobe PDF Library 15.0', 'creator': 'Adobe InDesign CC 13.1 (Windows)', 'creationdate': '2023-07-11T10:37:28-05:00', 'author': 'FAA', 'moddate': '2023-07-13T09:49:40-04:00', 'title': 'AMT Handbook - Powerplant (FAA-H-8083-32B)', 'source': 'docs/FAA_Engine-Maintainance.pdf', 'total_pages': 65, 'page': 0, 'page_label': '1'}, page_content='10-1\nReciprocating Engine Overhaul\nBoth maintenance and complete engine overhauls are \nperformed normally at specified intervals. This interval is \nusually governed by the number of hours the powerplant \nhas been in operation. The actual overhaul period for a \nspecific engine is generally determined by the manufacturer’s \nrecommendations. Each engine manufacturer sets a total \ntime in service when the engine should be removed from \nservice and overhauled. Depending upon how the engine \nis used in service, the overhaul time can be mandatory. The \noverhaul time is listed in hours and is referred to as time \nbefo

### Step 2: Splitting

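We chose recursive character text splitting because it tries to keep semantically related text together: it splits on paragraph breaks first, then line breaks, then spaces, and only cuts mid-word as a last resort. As a rule of thumb for technical documents, we set the chunk size to 800 characters with an overlap of 15-20%, which is about 150 characters.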
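To illustrate the recursive strategy, here is a simplified pure-Python sketch. It is not LangChain's implementation: `recursive_split` is a hypothetical helper that ignores `chunk_overlap` and the finer merging rules of `RecursiveCharacterTextSplitter`. It does show the core idea: try the coarsest separator first and fall back to a finer one only for pieces that are still longer than `chunk_size`.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified recursive character splitting (sketch only, no chunk overlap)."""
    if len(text) <= chunk_size:
        return [text]
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard-cut the text into fixed-size character windows
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    if sep not in text:
        # This separator does not occur, so try the next, finer one
        return recursive_split(text, chunk_size, finer)

    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # the piece still fits: keep growing this chunk
            continue
        if current:
            chunks.append(current)  # close the chunk that is full
        if len(piece) <= chunk_size:
            current = piece
        else:
            # A single piece is too long on its own: split it with finer separators
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaa bbb\n\nccc ddd eee", chunk_size=8))
# prints ['aaa bbb', 'ccc ddd', 'eee']
```

The paragraph break is tried first; only the second paragraph, which is longer than 8 characters, is split further on spaces.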

In [9]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

rc_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " ", ""],
    chunk_size=800,
    chunk_overlap=150
)

splitted_docs = rc_splitter.split_documents(docs)
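With `chunk_size=800` and `chunk_overlap=150`, the overlap is 150 / 800 = 18.75% of a chunk, inside the 15-20% target, so text near a chunk boundary appears in both neighboring chunks. The sketch below shows the effect with a hypothetical fixed-stride `sliding_window_chunks` helper on dummy text. It is only an approximation: `RecursiveCharacterTextSplitter` does not cut at fixed offsets but carries whole pieces over into the next chunk, so its actual overlap varies.

```python
def sliding_window_chunks(text, chunk_size=800, chunk_overlap=150):
    """Fixed-stride chunking: each window starts chunk_size - chunk_overlap after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # 800 - 150 = 650 characters
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "".join(chr(ord("a") + i % 26) for i in range(2000))  # 2,000 dummy characters
chunks = sliding_window_chunks(text)
print(len(chunks))                          # 3 windows: [0:800], [650:1450], [1300:2000]
print(chunks[0][-150:] == chunks[1][:150])  # True: consecutive windows share 150 characters
print(150 / 800)                            # 0.1875, i.e. about 19% overlap
```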

In [10]:
# print(len(splitted_docs[0].page_content))
# print(len(splitted_docs[1].page_content))
# print(splitted_docs[0].page_content)
# print(splitted_docs[1].page_content)


### Step 3: Storage and retrieval

We embed the documents using OpenAI's text-embedding-3-small model and store the resulting vectors locally in ChromaDB.
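Conceptually, the vector store keeps one vector per stored text and answers a query by ranking the stored vectors by similarity to the query's vector; the retriever configured later with `k=2` returns the top two. The offline sketch below mimics that flow. `toy_embed` (a sparse bag-of-words count standing in for text-embedding-3-small, which needs an API call), `cosine`, and `ToyVectorStore` are hypothetical illustrations, not ChromaDB's or OpenAI's API, and the three sample sentences are made-up placeholders rather than text from the manual.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: sparse bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Keeps (text, vector) pairs in memory and ranks them by cosine similarity."""

    def __init__(self, embed):
        self.embed = embed
        self.items = []

    def add_texts(self, texts):
        self.items.extend((text, self.embed(text)) for text in texts)

    def similarity_search(self, query, k=2):
        query_vec = self.embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vec, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore(toy_embed)
store.add_texts([
    "Inspect turbine blades for nicks and cracks on the leading edge.",
    "Overhaul intervals are set by the engine manufacturer.",
    "Blend out nicks on the trailing edge of the blade.",
])
# Prints the turbine-blade sentence first, then the trailing-edge sentence
for text in store.similarity_search("how to inspect turbine blade nicks", k=2):
    print(text)
```

The toy embedding only matches exact words ("blade" does not match "blades"); a learned embedding model is meant to capture such near-matches and paraphrases as well.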

In [12]:
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma


embedding_function = OpenAIEmbeddings(api_key=openai_api_key, model="text-embedding-3-small")

# Embed and store the split chunks from Step 2 rather than the full pages in `docs`
vector_store = Chroma.from_documents(splitted_docs, embedding=embedding_function, persist_directory="../../../Documents/ChromaDB")

In [17]:
# vector_store._collection.count()
vector_store._collection.get(
    include=['metadatas']
)

{'ids': ['032e072c-745a-4c36-922c-1d59c6871117',
  '2d20b84a-8b8f-4576-aa5e-9b1f45e9fda2',
  '2d639e29-f924-48fe-962e-85106898f0c2',
  'f7a47c18-9925-49bf-83cd-c4d67ba4147d',
  '32b79ed8-0033-4b33-9cd3-8b1baedaabf8',
  '931569eb-b693-4cbc-b04b-6fa87cba9402',
  '586eca77-84fa-41e2-8815-efdd8ebdf5a4',
  '3767cf44-4c37-4d25-b5bc-fca3eb1eade4',
  '37871dff-bfd0-4462-960a-477fc2a53a0d',
  '6773a20b-74e4-4796-8de3-618f6c4fb2ec',
  '12c83087-7353-451c-8a3c-45349663205c',
  '35623af5-e669-4c8d-bcde-9bb9d377738c',
  '1757e3b4-b1a7-44fc-86ad-e4a975c6a2cf',
  '781d946f-6c37-430b-b439-2879aa3afbc3',
  '0a15593d-8210-44d5-af61-e1b1f04209d4',
  '41db8882-c550-4226-a1ec-16667d3456c4',
  'd103ae6f-5896-4eca-bfbd-c09e299bfda1',
  '5ef346d4-386b-44cf-a849-b8f711ec4858',
  '351c5fa8-3524-4674-922e-2398a0161c0f',
  'ce0d05f6-2cb1-4919-be8e-fd2af9279844',
  'd7f94512-8130-477f-ac45-40be98b096c6',
  '140677fe-0bdb-48ce-a9c8-bc44039ba10a',
  '6c662dad-7a9a-43b1-9b71-7440fedc6197',
  '5864585e-1403-4ac9-9a8c-

In [None]:
# Expose the vector store as a LangChain retriever that returns the 2 most similar documents
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={'k':2})

### Step 4: Designing the RAG prompt

In [33]:
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model='gpt-4o-mini')

message = """
Assume you are a professional aircraft technician with years of experience. Based on the documents provided below, answer the query.

Documents:
{documents}

Query:
{query}

Answer:
"""

prompt_template = ChatPromptTemplate.from_messages([('human', message)])
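Filling the template is plain `{name}` substitution, and the chain in the next cell wires the pieces together: the query goes to the retriever, the same query is passed through unchanged to the `{query}` slot, and the filled prompt goes to the model. The sketch below mirrors that data flow in plain Python so it runs offline. `fake_retriever` and `fake_llm` are hypothetical stubs standing in for the Chroma retriever and gpt-4o-mini, and `run_rag` only illustrates the data flow; it is not how LangChain executes runnables internally.

```python
# Same template text as `message` in the cell above
template_text = """
Assume you are a professional aircraft technician with years of experience. Based on the documents provided below, answer the query.

Documents:
{documents}

Query:
{query}

Answer:
"""


def fake_retriever(query, k=2):
    """Hypothetical stub for the Chroma retriever: returns k placeholder chunk texts."""
    return [f"chunk {i + 1} for: {query}" for i in range(k)]


def fake_llm(prompt):
    """Hypothetical stub for gpt-4o-mini: only reports the prompt length."""
    return f"<answer generated from a {len(prompt)}-character prompt>"


def run_rag(query):
    # 1. Fan out: retrieve chunks for the query and pass the query through unchanged
    inputs = {"documents": "\n\n".join(fake_retriever(query)), "query": query}
    # 2. Fill the template placeholders
    prompt = template_text.format(**inputs)
    # 3. Send the filled prompt to the model
    return prompt, fake_llm(prompt)


prompt, answer = run_rag("How to inspect a turbine fan blade?")
print(prompt)
print(answer)
```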



In [36]:
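from langchain_core.runnables import RunnablePassthrough

# Join retrieved chunks into plain text instead of passing Document reprs (with metadata) to the prompt
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"documents": retriever | format_docs, "query": RunnablePassthrough()}
    | prompt_template
    | llm
)

response = rag_chain.invoke("How to inspect a turbine fan blade?")
print(response.content)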

To inspect a turbine fan blade, follow these guidelines based on the information from the provided documents:

1. **Visual Inspection**:
   - Begin with a thorough visual inspection of the turbine blades for any visible damage, including cracks, nicks, or deformation.
   - Pay special attention to the leading and trailing edges, as these are critical areas.

2. **Check for Nicks and Damage**:
   - Nicks deeper than 0.008 inches are grounds for rejection. If nicks exceed 0.006 inches but do not go over 0.012 inches and are away from the leading or trailing edges, they may be acceptable.
   - Nicks at the outer tip end are generally less critical than those near the root.

3. **Leading and Trailing Edge Repairs**:
   - Nicks on the leading edge must be completely blended out; if too much material needs to be removed, reject the blade.
   - Nicks on the trailing edge also must be removed fully, as this is considered a critical area.

4. **Crack Examination**:
   - Carefully examine the ed