### Problem Statement

The Indian Penal Code (IPC) is the primary criminal law framework of India, containing numerous sections, definitions, and legal provisions. However, understanding and navigating the IPC can be challenging for students, legal professionals, and the general public due to its length, technical language, and complex structure.

There is a need for an intelligent, user-friendly system that enables users to easily search, interpret, and understand IPC provisions without manually scanning through the entire document. Traditional keyword search methods are often insufficient for answering contextual or conversational queries related to specific sections, punishments, or legal terms.
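
The limitation of keyword search can be made concrete with a minimal sketch. The two headings below are taken from the IPC's arrangement of sections; the function names `tokenize` and `keyword_search` and the sample queries are illustrative assumptions, not part of this project's code.

```python
import re

# Two headings taken from the IPC's arrangement of sections.
PROVISIONS = {
    "151": "Knowingly joining or continuing in assembly of five or more "
           "persons after it has been commanded to disperse.",
    "153": "Wantonly giving provocation, with intent to cause riot",
}

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_search(query: str) -> list[str]:
    """Return section numbers whose heading contains every query word."""
    words = tokenize(query)
    return [num for num, heading in PROVISIONS.items()
            if words <= tokenize(heading)]

print(keyword_search("riot"))                       # ['153']: the exact word is present
print(keyword_search("refusing to leave a crowd"))  # []: a paraphrase shares no wording
```

The second query describes the situation covered by the first heading, yet strict word matching returns nothing because the user's wording differs from the statute's.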

To address this challenge, the objective of this project is to develop an AI-powered chatbot that can interact with the official Indian Penal Code document. The chatbot should:

- Allow users to ask natural language questions about IPC sections and legal concepts.
- Retrieve relevant provisions from the IPC document.
- Provide clear, concise, and accurate explanations.
- Support multi-turn conversations for deeper understanding.

The proposed solution aims to simplify access to legal knowledge and make the Indian Penal Code more approachable for students, lawyers, researchers, and citizens seeking legal information.

In [1]:
!pip install langchain langchain_community langchain-openai langchain_text_splitters
!pip install "unstructured[all-docs]"



In [1]:
import os
from google.colab import userdata
ok=userdata.get('openai')
os.environ['OPENAI_API_KEY']=ok


In [2]:
# Step 1: Load the PDF; PyPDFLoader returns one Document per page
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("/content/THE_INDIAN_PENAL_CODE.pdf")
document = loader.load()

In [3]:
# Step 2: Extract text from all pages
full_text = ""

for doc in document:
    full_text += doc.page_content + "\n"

# Step 3: Print first 1000 characters
print("First 1000 characters:\n")
print(full_text[:1000])

# Step 4: Count lines, words, and characters
num_lines = len(full_text.split("\n"))
num_words = len(full_text.split())
num_characters = len(full_text)

print("\nText Statistics:")
print("Number of lines:", num_lines)
print("Number of words:", num_words)
print("Number of characters:", num_characters)

First 1000 characters:

1 
 
THE INDIAN PENAL CODE 
___________ 
ARRANGEMENT OF SECTIONS  
__________ 
CHAPTER I  
INTRODUCTION  
PREAMBLE 
SECTIONS 
1. Title and extent of operation of the Code.  
2. Punishment of offences committed within India.  
3. Punishment of offences committed beyond, but which by law may be tried within, India. 
4. Extension of Code to extra-territorial offences. 
5. Certain laws not to be affected by this Act. 
CHAPTER II 
GENERAL EXPLANATIONS 
6. Definitions in the Code to be understood subject to exceptions.  
7. Sense of expression once explained.  
8. Gender. 
9. Number. 
10. “Man”.  “Woman”.  
11. “Person”. 
12.  “Public”.  
13. [Omitted .]. 
14. “Servant of Government”.  
15. [Repealed. ]. 
16. [Repealed .] . 
17. “Government”.  
18. “India”.  
19. “Judge”.  
20. “Court of Justice”.  
21. “Public  servant”.  
22. “Moveable property”.  
23. “Wrongful gain”. 
“Wrongful loss”. 
Gaining wrongfully/ Losing wrongfully. 
24.  “Dishonestly”.  
25. “Fraudulently
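
The arrangement of sections printed above follows a regular "number. heading" pattern, which could be parsed into a lookup table, for example to label retrieved chunks with section titles. The following is only a sketch under that assumption: the name `parse_section_headings` and the regular expression are hypothetical, and headings that wrap onto several lines (such as section 23 above) would be captured only up to the first line break.

```python
import re

# Matches lines such as '12.  “Public”.' and also lettered numbers like '120A. ...'.
HEADING = re.compile(r"^\s*(\d+[A-Z]*)\.\s+(.+?)\s*$")

def parse_section_headings(text: str) -> dict[str, str]:
    """Map section number -> heading for every line shaped like 'N. Heading'."""
    headings = {}
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            headings[match.group(1)] = match.group(2)
    return headings

# A few lines copied from the printed output above.
sample = (
    "CHAPTER II \n"
    "GENERAL EXPLANATIONS \n"
    "8. Gender. \n"
    "12.  “Public”.  \n"
    "22. “Moveable property”.  \n"
)
print(parse_section_headings(sample))
```

Chapter titles and bare page numbers do not match the pattern, so they are skipped.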

In [4]:
# Split the pages into chunks of about 500 characters with a 10-character overlap
from langchain_text_splitters import RecursiveCharacterTextSplitter

rts = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " "],
    chunk_size=500,
    chunk_overlap=10,
)

chunks = rts.split_documents(document)

In [5]:
print(len(chunks))

1064
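
To illustrate what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified sketch that cuts fixed-width character windows. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter first breaks on the listed separators and then merges pieces up to the size limit), and the function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters; each window starts
    chunk_size - chunk_overlap characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

pieces = sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=2)
print(pieces)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz', 'yz']
```

Each window repeats the last two characters of the previous one. The overlap exists so that a sentence cut at a chunk boundary still appears partly in both neighbouring chunks.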


In [6]:
pip install chromadb

In [30]:
!pip install langchain langchain_community langchain-openai langchain_classic



In [7]:
# Steps 3 and 4: embed the chunks and store them in a Chroma vector database
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

embed = OpenAIEmbeddings(model="text-embedding-3-small")

vdb = Chroma.from_documents(documents=chunks, embedding=embed, persist_directory="db")
# Chroma 0.4+ writes to persist_directory automatically, so this call is likely redundant
vdb.persist()



In [8]:
retriver=vdb.as_retriever(search_kwargs={"k":3})
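
The `k` setting asks the retriever for the three closest chunks. The idea can be sketched in plain Python with toy vectors. Cosine similarity is an assumption chosen for illustration (the metric the vector store actually uses depends on its configuration), and the names `cosine_similarity`, `top_k`, and `toy_index` are hypothetical.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the labels of the k vectors most similar to query_vec."""
    ranked = sorted(indexed,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [label for label, _ in ranked[:k]]

# Hand-made 2-D vectors standing in for real embeddings (purely illustrative).
toy_index = [
    ("chunk_a", [1.0, 0.0]),
    ("chunk_b", [0.8, 0.6]),
    ("chunk_c", [0.0, 1.0]),
    ("chunk_d", [-1.0, 0.0]),
]
print(top_k([1.0, 0.1], toy_index, k=3))  # ['chunk_a', 'chunk_b', 'chunk_c']
```

A larger `k` gives the model more context at the cost of a longer prompt; a smaller `k` risks leaving out the relevant provision.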

In [9]:
query = """
Which section relates to taking part in an unlawful assembly or riot?
"""

In [12]:
result = retriver.invoke(query)

In [13]:
result

[Document(metadata={'creationdate': '2019-04-08T12:45:27-07:00', 'creator': 'Microsoft® Word 2010', 'page_label': '4', 'moddate': '2019-04-08T12:45:27-07:00', 'author': 'Admin', 'page': 3, 'source': '/content/THE_INDIAN_PENAL_CODE.pdf', 'total_pages': 112, 'producer': 'Microsoft® Word 2010'}, page_content='149. Every member of unlawful assembly guilty of offence committed in prosecution of common object. \n150. Hiring, or conniving at hiring, of persons to join unlawful assembly.  \n151. Knowingly joining or continuing in assembly of five or more persons after it has been commanded to disperse. \n152. Assaulting or obstructing public servant when suppressing riot, etc. \n153. Wantonly giving provocation, with intent to cause riot— \nif rioting be committed; if not committed.'),
 Document(metadata={'producer': 'Microsoft® Word 2010', 'creationdate': '2019-04-08T12:45:27-07:00', 'creator': 'Microsoft® Word 2010', 'page': 39, 'total_pages': 112, 'page_label': '40', 'moddate': '2019-04-08T

In [14]:
def f1(data):
    """Join the page_content of the retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in data)
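
A quick way to check this formatting step without calling the retriever is to feed it stand-in objects. In the sketch below, `join_page_contents` mirrors the helper above under a descriptive, hypothetical name, and `SimpleNamespace` objects imitate retrieved documents (only the `page_content` attribute is needed).

```python
from types import SimpleNamespace

def join_page_contents(docs) -> str:
    """Join each document's page_content, separated by a blank line."""
    return "\n\n".join(doc.page_content for doc in docs)

# Stand-ins for retrieved Document objects.
fake_docs = [
    SimpleNamespace(page_content="8. Gender."),
    SimpleNamespace(page_content="9. Number."),
]
print(join_page_contents(fake_docs))
```

The blank line between chunks keeps provisions visually separate when the joined string is placed into the prompt's context slot.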

In [15]:
from langchain_core.runnables import RunnableLambda,RunnableParallel,RunnablePassthrough,RunnableSequence

In [16]:
r2=RunnableLambda(f1)

In [17]:
chain1=RunnableSequence(retriver,r2)

In [62]:
from operator import itemgetter

In [54]:
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
    MessagesPlaceholder
)

In [56]:
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# Retriever chain
retrieval_chain = retriver | RunnableLambda(f1)


In [75]:
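# Prepare the prompt inputs in parallel from the incoming dict.
# chat_history is read here, from the original input, because .assign() would
# only see the output of the parallel step, which has no chat_history key.
rag_input_chain = RunnableParallel({
    "context": itemgetter("question") | retrieval_chain,
    "question": itemgetter("question"),
    "chat_history": itemgetter("chat_history"),
})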

In [71]:
# Prompt
prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(
        "You are an AI-powered legal assistant specialized in the Indian Penal Code (IPC). "
        "Use the provided context to answer clearly and accurately."
    ),
    MessagesPlaceholder(variable_name="chat_history"),
    HumanMessagePromptTemplate.from_template(
        """Answer the question based on the below context.
If context is missing, say 'I don't know'.

Context:
{context}

Question:
{question}"""
    )
])

# LLM
model = ChatOpenAI(model="gpt-5.2-2025-12-11")

parser = StrOutputParser()

rag_chain = RunnableSequence(rag_input_chain, prompt, model, parser)


In [72]:

from langchain_classic.memory import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# Store session histories
store = {}

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

rag_with_memory = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="chat_history"
)
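
The session store is what keeps conversations apart: each `session_id` gets its own history object, created on first use. A minimal plain-Python sketch of that pattern follows. `SimpleHistory`, `toy_store`, and `get_history` are hypothetical stand-ins, not LangChain classes.

```python
class SimpleHistory:
    """Minimal stand-in for a chat history object: just a list of messages."""

    def __init__(self) -> None:
        self.messages: list[tuple[str, str]] = []

    def add_message(self, role: str, text: str) -> None:
        self.messages.append((role, text))

toy_store: dict[str, SimpleHistory] = {}

def get_history(session_id: str) -> SimpleHistory:
    """Create the history on first use, then return the same object each time."""
    if session_id not in toy_store:
        toy_store[session_id] = SimpleHistory()
    return toy_store[session_id]

get_history("user1").add_message("human", "first question")
get_history("user1").add_message("ai", "first answer")
get_history("user2").add_message("human", "unrelated question")

print(len(get_history("user1").messages))  # 2
print(len(get_history("user2").messages))  # 1
```

Because the store is an in-memory dict, histories built this way are lost when the Python process restarts.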


In [74]:
# Interactive chatbot loop

session_id = "user1"

print("\nIndian Penal Code Chatbot Ready!")
print("Type 'exit' to stop.\n")

while True:
    user_input = input("You: ")

    if user_input.lower() == "exit":
        break

    response = rag_with_memory.invoke(
        {"question": user_input},
        config={"configurable": {"session_id": session_id}}
    )

    print("Bot:", response)


Indian Penal Code Chatbot Ready!
Type 'exit' to stop.

You: exit
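
The loop above can only be exercised by typing into it. One possible way to make the same control flow testable is to pass the inputs and the answering function in as arguments, as in this sketch. `run_chat` and the echo function are hypothetical; in the notebook the answering function would wrap the chain call.

```python
from typing import Callable, Iterable

def run_chat(answer: Callable[[str], str], user_inputs: Iterable[str]) -> list[str]:
    """Feed inputs to `answer` until 'exit' is seen; return the replies."""
    replies = []
    for raw in user_inputs:
        text = raw.strip()
        if text.lower() == "exit":
            break
        if not text:  # skip empty lines instead of sending them to the model
            continue
        replies.append(answer(text))
    return replies

# A fake answer function stands in for the real chain call.
replies = run_chat(lambda q: f"echo: {q}",
                   ["What is Section 8?", "", "  EXIT ", "ignored"])
print(replies)  # ['echo: What is Section 8?']
```

Stripping whitespace before the comparison means that input such as `"  EXIT "` also ends the session, and blank lines never trigger a model call.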
