# Ollama PDF RAG Notebook

## Import Libraries


In [38]:
# Set the protobuf implementation before importing libraries that may load protobuf;
# the variable is read when protobuf is first imported, so setting it afterwards may have no effect
import os
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Imports
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

# Jupyter-specific imports
from IPython.display import display, Markdown

## Load PDF

In [39]:
# Load PDF
local_path = "HRPolicy_Manual2023.pdf"
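# Check that the file exists; a non-empty path string alone is always truthy
if os.path.isfile(local_path):
    loader = UnstructuredPDFLoader(file_path=local_path)
    data = loader.load()
    print(f"PDF loaded successfully: {local_path}")
else:
    # Fail early so later cells do not run with `data` undefined
    raise FileNotFoundError(f"PDF not found: {local_path}. Upload a PDF file first.")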

PDF loaded successfully: HRPolicy_Manual2023.pdf


## Split text into chunks

In [40]:
# Split text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(data)
print(f"Text split into {len(chunks)} chunks")

Text split into 512 chunks
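To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-width sliding window over characters. It is a simplification for illustration only: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences, which this sketch ignores. The function name `sliding_window_chunks` and the sample text are hypothetical.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only; not the splitter's actual algorithm.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return windows


# Hypothetical sample: 2500 characters of repeating digits
sample_text = "".join(str(i % 10) for i in range(2500))
demo_windows = sliding_window_chunks(sample_text)
print([len(w) for w in demo_windows])  # → [1000, 1000, 900]
```

With these settings, the last 200 characters of each window reappear at the start of the next one, so a sentence that straddles a boundary is still seen whole in at least one chunk.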


## Create vector database

In [41]:
# Create vector database
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
    collection_name="local-rag"
)
print("Vector database created successfully")

Vector database created successfully
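Conceptually, the vector database stores one embedding per chunk and answers a query by returning the chunks whose embeddings are closest to the query embedding. The sketch below shows that idea in plain Python with hand-written 3-dimensional vectors. It assumes cosine similarity as the scoring function; the metric Chroma actually applies depends on how the collection is configured, and real embeddings from `nomic-embed-text` have far more dimensions. All names and vectors here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the texts of the k entries most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy index with made-up vectors (assumption: not real embeddings)
toy_index = [
    ("leave policy", [1.0, 0.0, 0.0]),
    ("travel policy", [0.0, 1.0, 0.0]),
    ("maternity leave", [0.9, 0.1, 0.0]),
]
print(top_k([1.0, 0.0, 0.0], toy_index, k=2))  # → ['leave policy', 'maternity leave']
```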


## Set up LLM and Retrieval

In [42]:
# Set up LLM and retrieval
local_model = "llama3.2" 
llm = ChatOllama(model=local_model)

In [43]:
# Query prompt template
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate 2
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)

# Set up retriever
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(), 
    llm,
    prompt=QUERY_PROMPT
)
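The multi-query idea can be sketched without any LLM: generate several phrasings of the question, search with each, and keep the unique union of the hits. The code below is a rough illustration under that assumption, not the `MultiQueryRetriever` implementation. `generate_queries` and `search` are hypothetical stand-ins for the LLM call and the vector search, and the chunk IDs are made up.

```python
def unique_union(result_lists):
    """Merge several result lists, dropping duplicates while keeping first-seen order."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


def multi_query_retrieve(question, generate_queries, search):
    """Search once per generated query and merge the hits."""
    queries = generate_queries(question)
    return unique_union(search(q) for q in queries)


# Hypothetical stand-ins: canned rephrasings and canned search hits
toy_hits = {
    "How is earned leave credited?": ["chunk-12", "chunk-40"],
    "What is the accrual rate for earned leave?": ["chunk-40", "chunk-7"],
}


def fake_generate_queries(question):
    return list(toy_hits)


def fake_search(query):
    return toy_hits.get(query, [])


merged_hits = multi_query_retrieve(
    "Explain Earned leave type.", fake_generate_queries, fake_search
)
print(merged_hits)  # → ['chunk-12', 'chunk-40', 'chunk-7']
```

Because `chunk-40` matches both phrasings, it appears only once in the merged list, which keeps the context passed to the model free of repeated passages.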

## Create chain

In [44]:
# RAG prompt template
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

In [45]:
# Create chain
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
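In the chain above, whatever the retriever returns is substituted into `{context}`. A common refinement is to join only the chunk text before it reaches the prompt. The sketch below shows that formatting step in plain Python. `Doc` is a hypothetical stand-in for a retrieved document with a `page_content` attribute, and `format_docs` and `build_prompt` are illustrative helper names that the notebook does not define.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


RAG_TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""


def format_docs(docs):
    """Join the text of each retrieved chunk, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(docs, question):
    """Fill the RAG template with the joined chunk text and the user question."""
    return RAG_TEMPLATE.format(context=format_docs(docs), question=question)


# Placeholder chunk text, not taken from any real PDF
retrieved = [Doc("first chunk text"), Doc("second chunk text")]
rag_prompt_text = build_prompt(retrieved, "Explain Earned leave type.")
print(rag_prompt_text)
```

In a LangChain pipeline, a helper like `format_docs` could be piped after the retriever (for example `retriever | format_docs`), since plain functions are coerced to runnables when composed with `|`.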

## Chat with PDF

In [46]:
def chat_with_pdf(question):
    """
    Run the RAG chain on a question and render the answer as Markdown.
    """
    # display() returns None, so there is nothing useful to return
    display(Markdown(chain.invoke(question)))

In [47]:
# Example 1
chat_with_pdf("Can you explain how the different types of leave (e.g., earned, casual, half-pay, maternity/paternity) are structured and approved at IIMA?")

Based on the provided context, here's an explanation of the structure and approval process for different types of leave at IIMA:

1. **Half-Pay Leave**: The policy doesn't explicitly define "earned" or regular leave in this context, but it mentions half-pay leave as a separate type. This leave can be taken on medical grounds, with no maximum limit.
2. **Casual Leave**: Casual leave is not mentioned as one of the types of leave that requires approval or has specific rules governing its use.
3. **Maternity/Paternity Leave**: Maternity leave is available for 180 days immediately after the date of adoption (not necessarily childbirth). Paternity leave may be granted, but it's not explicitly linked to a specific duration or process in this context.
4. **Half-Pay Leave on Medical Grounds**: This type of leave can be taken without a medical certificate for up to 90 days during the entire service. If commuted leave is taken, twice the number of days availed should be debited in the half-pay leave account.
5. **Commuted Leave**: Commuted leave can be taken up to:
	* 90 days during the entire service without a medical certificate for an approved course of study certified to be in the public interest.
	* 60 days without a medical certificate for a female employee with less than two living children on the adoption of a child less than one-year-old.
6. **Child Adoption Leave**: This leave is available for 180 days immediately after the date of adoption, and during this period, the female employee will be paid leave salary equal to the pay drawn immediately before proceeding on leave.

It's worth noting that the policy doesn't provide detailed information about the approval process for these leaves. However, it does mention that certain types of leave (like child adoption leave) require an authority competent to grant leave.

In [48]:
# Example 2
chat_with_pdf("Explain Earned leave type.")

Earned Leave (EL) is a type of leave that an employee can apply for, subject to approval by their head of department. The key characteristics of Earned Leave are:

* It will be reduced by 1/10th of EOL availed and/or the period of * during the previous half year, subject to a maximum of 15 days.
* The EL application must be submitted through ESS (Employee Self Service) to the head of the department for approval, at least 15 days prior to the start of the leave.
* Credit for earned leave is allowed at a specific rate:
	+ For the half-year in which an employee is due to retire or resigns from service: two & half days per completed calendar month up to the date of retirement or resignation.
	+ For employees who are removed or dismissed from service: two & half days per completed calendar month up to the end of the calendar month preceding the calendar month in which they are removed or dismissed.

In [49]:
# Example 3
# chat_with_pdf("How are the petroleum products classified?")

In [50]:
# Example 4
# chat_with_pdf("What should be the minimum headroom under vessels,pipes,cable racks,etc.?")

In [51]:
# Example 5
# chat_with_pdf("Where should the High Tension sub-stations be located? ")

In [52]:
# Example 6
# chat_with_pdf("The Aggregate capacity of tanks located in one Dyked enclosure should not exceed? ")

In [53]:
# Example 7
# chat_with_pdf("Where is an area deemed to be hazardous? ")

In [54]:
# Example 8
# chat_with_pdf("On what aspects should information be collected to prepare a layout? ")

In [55]:
# Example 9
# chat_with_pdf("Explain firewalls in layout of storage tanks. ")

In [56]:
# Example 1
# chat_with_pdf("What should be the minimum slope of the return line from compressor back to reservoir?")

In [57]:
# Example 2
# chat_with_pdf("What are the major auxiliary equipments used in a compressor?")

In [58]:
# Example 3
chat_with_pdf("List the significant factors to be considered while deciding location of pumps?")

KeyboardInterrupt: 

In [22]:
# Example 4
chat_with_pdf("Which compressor has a top suction, top discharge configuration or a side suction, side discharge configuration?")

According to the provided context, it is mentioned that "Compressor may be electric driven or turbine driven (Gas/Steam). In case of electric driven compressors, it is necessary to have physical dimensions of motor and its cooling system if any to allocate space required in the layout for installation and maintenance."

There is no specific information about the compressor configuration provided. However, further in the text:

"Grade mounted horizontal split case compressor may have increased downtime and other complications during a general maintenance as the suction and discharge piping may have to be dismantled to remove the casing."

And also 

"In Vertical split case type, the maintenance space required for removal of casing shall be provided at the side."

It is mentioned that "Grade mounted horizontal split case compressor" has a configuration that requires:

* Increased downtime
* Suction and discharge piping dismantling during maintenance

But no top suction or side discharge are explicitly described.

In [23]:
# Example 5
chat_with_pdf("What API and/or ASME Standards are considered while piping is designed?")

According to the provided context, the following API and/or ASME Standards are considered when designing piping:

1. ASME B31.1 - Power Piping
2. ASME B31.3 - Process Piping (most commonly applied)
3. ASME B31.4 - Pipeline Transportation Systems for Liquids and Slurries
4. ASME B31.8 - Gas Transmission and Distribution Piping Systems
5. API RP-14E - Practice for Design and Installation of Offshore Production Platform Piping Systems (Applicable for Offshore applications)

In [24]:
# Example 6
chat_with_pdf("where are the block valves in pump suction lines located?")

According to the provided context, block valves in pump suction lines shall generally be full port and shall be located upstream of any strainer.

In [25]:
# Example 7
chat_with_pdf("How are the eccentric reducers installed to minimize cavitation?")

According to the document, eccentric reducers are preferred at pump suction and should be installed "TOP FLAT" (flat on top) to minimize cavitation issues.

In [26]:
# Example 8
chat_with_pdf("What shall the piping not obstruct?")

According to the text, the piping should not obstruct the following:

* Process flow requirements (specifically, gravity flow, no pockets, self-draining, slope, and relative elevation of equipment)
* Safety gates
* Handrails (there should be a clearance of at least 75mm between the outside of the handrail and the nearest object)

In general, piping should not obstruct walkways, platforms, ramps, or floors, but the specific wording of "obstruct" may vary depending on the context.

In [27]:
# Example 9
chat_with_pdf("What should be considered while placing a strainer in the pipeline?")

According to the provided text, while placing a strainer in the pipeline, the following considerations should be made:

1. Strainers can be mounted in Horizontal or vertical runs, but correct orientation is required.
2. Sufficient space should be allowed for removal of filter without having to remove additional equipment.
3. Strainers should not be orientated so that solids can fall back into process line.
4. Strainer should only be mounted in horizontal lines with cover facing upwards for access.

Additionally, it's mentioned that In Line Basket Strainers are designed for installation in horizontal lines and should be oriented such that the basket is facing downwards to allow for easy cleaning.

In [28]:
# Example 10
chat_with_pdf("What are the significant factors that shall be accounted while locating exchangers in a plant layout?")

Based on the provided text, here are some significant factors to consider when locating exchangers in a plant layout:

1. **Accessibility and Space**: Ensure that the location of exchangers allows for easy access, maintenance, and repair.
2. **Temperature and Pressure Drop**: Consider the temperature and pressure drop across the exchanger to ensure that it can handle the required flow rates and pressures.
3. **Heat Transfer Area**: Choose a location with sufficient heat transfer area to achieve the desired heat exchange performance.
4. **Corrosion Protection**: Select a location that provides adequate protection from corrosion, considering factors such as exposure to chemicals, moisture, and extreme temperatures.
5. **Drainage and Flooding**: Ensure that the location has adequate drainage to prevent flooding and ensure safe operation of the exchanger.
6. **Fire Safety and Sprinkler Systems**: Locate exchangers in an area that is not directly adjacent to fire zones or areas with sprinkler systems, while also ensuring that emergency shutdown procedures are easily accessible.
7. **Electrical and Instrumentation Access**: Choose a location that provides convenient access to electrical power supplies and instrumentation for monitoring and control of the exchanger.
8. **Cable Management and Routing**: Consider the routing of cables and piping to minimize interference and ensure efficient operation of the facility.
9. **Noise and Vibration**: Select a location that minimizes noise and vibration from adjacent equipment or operations, ensuring a safe working environment.
10. **Regulatory Compliance**: Ensure that the chosen location complies with relevant local, national, and international regulations and standards.

By carefully considering these factors, plant designers and operators can optimize the layout of exchangers to achieve efficient, safe, and reliable operation within the plant.

## Clean up (optional)

In [29]:
# # Optional: Clean up when done 
# vector_db.delete_collection()
# print("Vector database deleted successfully")