## Data Ingestion 


In [5]:
import os
from dotenv import load_dotenv
from pathlib import Path
from typing import List, Dict, Any
from langchain_core.documents import Document
from langchain_text_splitters import(
    RecursiveCharacterTextSplitter,
    CharacterTextSplitter,
    TokenTextSplitter
)
from langchain_community.document_loaders import (
    TextLoader, 
    DirectoryLoader,
    PyPDFLoader,
    PyMuPDFLoader,
    UnstructuredPDFLoader
)
from langchain_experimental.text_splitter import SemanticChunker
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_groq import ChatGroq
from langchain_community.vectorstores import FAISS

from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.messages import HumanMessage, AIMessage
# from langchain_classic.chains import create_retrieval_chain
# from langchain_classic.chains.combine_documents import create_stuff_documents_chain

print("Setup Completed!")

Setup Completed!


### TextLoader from langchain.text_loaders to load data from text files.

In [6]:


loader = TextLoader("data/text_files/python_intro.txt", encoding = "utf-8")
documents = loader.load()
print(type(documents))
print(documents)

<class 'list'>
[Document(metadata={'source': 'data/text_files/python_intro.txt'}, page_content='Python Programming Introduction\n\nPython is a high-level, interpreted programming language known for its simplicity and readability.\nCreated by Guido van Rossum and first released in 1991, Python has become one of the most popular\nprogramming languages in the world.\n\nPython has various levels, I learnt python using the book "Byte of Python" in 2012 on Python 2, though this was a fantastic book my use of python\nremained confined to writing short scripts in DevOps and I never delved deeper in modular programming. With use of Jupyter notebooks its so different now. \nI remember one of my old scripts where I would take decisions based on IP address structure, where the logic behind each octet was different. I used python extensively on\nDELL iDRAC project to write a lot of middleware in Python, that used other libraries written in C to communicate with hardware. \nWith ML the use of Python

#### Load multiple text files from a directory and create Document objects for each file.

In [7]:
dir_loader=DirectoryLoader(
    "data/text_files",
    glob="**/*.txt",
    loader_cls = TextLoader,
    loader_kwargs = {'encoding': 'utf=8'},
    show_progress = True
)
documents=dir_loader.load()

print(f"Loaded {len(documents)} documents")
for i, doc in enumerate(documents):
    print(f"\nDocument {i+1}: ")
    print(f" Source: {doc.metadata['source']}")
    print(f" Length: {len(doc.page_content)} characters")

100%|██████████| 2/2 [00:00<00:00, 2985.27it/s]

Loaded 2 documents

Document 1: 
 Source: data/text_files/python_intro.txt
 Length: 1223 characters

Document 2: 
 Source: data/text_files/machine_learning.txt
 Length: 715 characters





In [8]:
# Here I am splitting the document using newline and checking the output. 
text = documents[0].page_content
print("Character Text Splitter")
char_splitter = CharacterTextSplitter(
    separator = "\n",
    chunk_size = 200,
    chunk_overlap = 20,
    length_function = len
)

char_chunks = char_splitter.split_text(text)
print(f"Created {len(char_chunks)} chunks")
for chunk in char_chunks:
    print(chunk)
    print("-----------")

Character Text Splitter
Created 8 chunks
Python Programming Introduction
Python is a high-level, interpreted programming language known for its simplicity and readability.
-----------
Created by Guido van Rossum and first released in 1991, Python has become one of the most popular
programming languages in the world.
-----------
Python has various levels, I learnt python using the book "Byte of Python" in 2012 on Python 2, though this was a fantastic book my use of python
-----------
remained confined to writing short scripts in DevOps and I never delved deeper in modular programming. With use of Jupyter notebooks its so different now.
-----------
I remember one of my old scripts where I would take decisions based on IP address structure, where the logic behind each octet was different. I used python extensively on
-----------
DELL iDRAC project to write a lot of middleware in Python, that used other libraries written in C to communicate with hardware.
-----------
With ML the use of Pyt

In [9]:
# Recursive character text splitter does this recursively using different separator in each iteration. 
print("Recursive Character Text Splitter")
recursive_splitter = RecursiveCharacterTextSplitter(
    separators = ["\n\n", "\n", " ", ""],
    chunk_size = 200,
    chunk_overlap = 20,
    length_function = len
)

recursive_chunks = recursive_splitter.split_text(text)
print(f"Created {len(recursive_chunks)} chunks")
[print(chunk+"\n---") for chunk in recursive_chunks]

Recursive Character Text Splitter
Created 10 chunks
Python Programming Introduction
---
Python is a high-level, interpreted programming language known for its simplicity and readability.
Created by Guido van Rossum and first released in 1991, Python has become one of the most popular
---
programming languages in the world.
---
Python has various levels, I learnt python using the book "Byte of Python" in 2012 on Python 2, though this was a fantastic book my use of python
---
remained confined to writing short scripts in DevOps and I never delved deeper in modular programming. With use of Jupyter notebooks its so different now.
---
I remember one of my old scripts where I would take decisions based on IP address structure, where the logic behind each octet was different. I used python extensively on
---
DELL iDRAC project to write a lot of middleware in Python, that used other libraries written in C to communicate with hardware.
---
With ML the use of Python has skyrocketted, Its like in

[None, None, None, None, None, None, None, None, None, None]

#### Load PDF files using PyMuPDF and create Document objects for each page in the PDF.

In [10]:
# Loading a PDF file from local

try:
    pymupdf_loader = PyMuPDFLoader("data/pdf/BikeSecurity.pdf")
    pymupdf_docs = pymupdf_loader.load()

    print(f" Loaded {len(pymupdf_docs)} pages")
    print(f" Includes detailed metadata")
    print(pymupdf_docs)
except Exception as e:
    print(f" Error: {e}")

 Loaded 20 pages
 Includes detailed metadata
[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2025-07-10T15:23:25+08:00', 'source': 'data/pdf/BikeSecurity.pdf', 'file_path': 'data/pdf/BikeSecurity.pdf', 'total_pages': 20, 'format': 'PDF 1.7', 'title': 'Sustainable Daily Mobility and Bike Security', 'author': 'Sergej Gričar, Christian Stipanović and Tea Baldigara', 'subject': 'As climate change concerns, urban congestion, and environmental degradation intensify, cities prioritise cycling as a sustainable transport option to reduce CO2 emissions and improve quality of life. However, rampant bicycle theft and poor security infrastructure often deter daily commuters and tourists from cycling. This study explores how advanced security measures can bolster sustainable urban mobility and tourism by addressing these challenges. A mixed-methods approach is utilised, incorporating primary survey data from Slovenia and secondary data on bicycle 

In [11]:
# SmartPDFLoader class to load the pdf file and cleanup before chunking

class SmartPDFProcessor:
    """Advanced PDF Processing with error handling"""
    def __init__(self, chunk_size=1000, chunk_overlap=100):
        self.chunk_size = chunk_size,
        self.chunk_overlap = chunk_overlap,
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size = chunk_size,
            chunk_overlap = chunk_overlap,
            separators = [" "]
        )

    def process_pdf(self, pdf_path:str) -> List[Document]:
        """Process PDF with smart chunking and metadata enhancement"""

        #Load PDF
        loader = PyMuPDFLoader(pdf_path)
        pages = loader.load()

        #Process each page
        processed_chunks = []

        for page_num, page in enumerate(pages):
            ## clean text
            cleaned_text = self._clean_text(page.page_content)

            # Skip nearly empty pages
            if len(cleaned_text.strip()) < 50:
                continue

            # Create chunks with enhanced metadata
            chunks = self.text_splitter.create_documents(
                texts = [cleaned_text],
                metadatas = [{
                    **page.metadata,
                    "page": page_num + 1,
                    "total_pages": len(pages),
                    "chunk_method": "smart_pdf_processor",
                    "char_count": len(cleaned_text)
                }]
            )

            processed_chunks.extend(chunks)

        return processed_chunks
    
    def _clean_text(self, text:str) -> str:
        # Remove excessive whitespace
        text = " ".join(text.split())
        
        # Fix ligatures
        text = text.replace("ﬁ", "fi")
        text = text.replace("ﬂ", "fl")
        
        return text

In [12]:
preprocessor = SmartPDFProcessor()
try:
    smart_chunks = preprocessor.process_pdf("data/pdf/BikeSecurity.pdf")
    print(f"Processed into {len(smart_chunks)} smart chunks")

    # Show enhanced metadata
    if smart_chunks:
        print("\nSample chunk metadata:")
        for key, value in smart_chunks[0].metadata.items():
            print(f" {key}: {value}")

except Exception as e:
    print(f"Processing error: {e}")

Processed into 80 smart chunks

Sample chunk metadata:
 producer: pdfTeX-1.40.25
 creator: LaTeX with hyperref
 creationdate: 2025-07-10T15:23:25+08:00
 source: data/pdf/BikeSecurity.pdf
 file_path: data/pdf/BikeSecurity.pdf
 total_pages: 20
 format: PDF 1.7
 title: Sustainable Daily Mobility and Bike Security
 author: Sergej Gričar, Christian Stipanović and Tea Baldigara
 subject: As climate change concerns, urban congestion, and environmental degradation intensify, cities prioritise cycling as a sustainable transport option to reduce CO2 emissions and improve quality of life. However, rampant bicycle theft and poor security infrastructure often deter daily commuters and tourists from cycling. This study explores how advanced security measures can bolster sustainable urban mobility and tourism by addressing these challenges. A mixed-methods approach is utilised, incorporating primary survey data from Slovenia and secondary data on bicycle sales, imports and thefts from 2015 to 2024. F

#### Read multiple PDF files from a directory and create Document objects for each page in each PDF.

In [13]:
# Update SmartPDFProcessor to scan multiple PDF files 
class SmartPDFProcessor:
    """Advanced PDF Processing with error handling and multi-file support"""
    def __init__(self, chunk_size=1000, chunk_overlap=100):
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size,
            chunk_overlap=chunk_overlap,
            separators=[" "]
        )

    def process_pdf(self, pdf_path: str) -> List[Document]:
        """Process a single PDF with smart chunking and metadata enhancement"""
        loader = PyMuPDFLoader(pdf_path)
        pages = loader.load()
        processed_chunks = []

        for page_num, page in enumerate(pages):
            cleaned_text = self._clean_text(page.page_content)

            if len(cleaned_text.strip()) < 50:
                continue

            chunks = self.text_splitter.create_documents(
                texts=[cleaned_text],
                metadatas=[{
                    **page.metadata,
                    "page": page_num + 1,
                    "total_pages": len(pages),
                    "chunk_method": "smart_pdf_processor",
                    "char_count": len(cleaned_text),
                    "source_file": Path(pdf_path).name,
                    "file_type": "pdf"
                }]
            )
            processed_chunks.extend(chunks)
        return processed_chunks

    def process_all_pdfs(self, pdf_directory: str) -> List[Document]:
        """Process all PDF files in a directory recursively"""
        all_chunks = []
        pdf_dir = Path(pdf_directory)
        pdf_files = list(pdf_dir.glob("**/*.pdf"))
        print(f"Found {len(pdf_files)} PDF files to process")

        for pdf_file in pdf_files:
            print(f"\nProcessing: {pdf_file.name}")
            try:
                chunks = self.process_pdf(str(pdf_file))
                all_chunks.extend(chunks)
                print(f"--> Created {len(chunks)} chunks from {pdf_file.name}")
            except Exception as e:
                print(f"Error processing {pdf_file.name}: {e}")

        print(f"\nTotal chunks created from all PDFs: {len(all_chunks)}")
        return all_chunks

    def _clean_text(self, text: str) -> str:
        text = " ".join(text.split())
        text = text.replace("ﬁ", "fi").replace("ﬂ", "fl")
        return text


# Example usage:
preprocessor = SmartPDFProcessor()
all_chunks = preprocessor.process_all_pdfs("data/pdf")


Found 2 PDF files to process

Processing: BikeSecurity.pdf
--> Created 80 chunks from BikeSecurity.pdf

Processing: Project-Proposal.pdf
--> Created 8 chunks from Project-Proposal.pdf

Total chunks created from all PDFs: 88


### Create chunking using SemanticChunking 

In [14]:
loader=TextLoader("data/text_files/python_intro.txt")
docs=loader.load()
embedding=HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
chunker=SemanticChunker(embedding)
chunks=chunker.split_documents(docs)
for i,chunk in enumerate(chunks):
    print(f"\n chunk {i+1}:\n{chunk.page_content}")


 chunk 1:
Python Programming Introduction

Python is a high-level, interpreted programming language known for its simplicity and readability. Created by Guido van Rossum and first released in 1991, Python has become one of the most popular
programming languages in the world. Python has various levels, I learnt python using the book "Byte of Python" in 2012 on Python 2, though this was a fantastic book my use of python
remained confined to writing short scripts in DevOps and I never delved deeper in modular programming. With use of Jupyter notebooks its so different now. I remember one of my old scripts where I would take decisions based on IP address structure, where the logic behind each octet was different. I used python extensively on
DELL iDRAC project to write a lot of middleware in Python, that used other libraries written in C to communicate with hardware.

 chunk 2:
With ML the use of Python has skyrocketted, Its like in this decade I am learning it all over again offcourse wi

#### Interestingly the context is presevered much better using SemanticChunker than RecursiveCharacterTextSplitter.

In [15]:
# Update the SmartPDFProcessor for SemanticChunking

class SmartPDFProcessorWithSemanticChunking:
    """Process multiple PDFs with semantic chunking"""

    def __init__(self, embedding_model_name="sentence-transformers/all-MiniLM-L6-v2"):
        self.embedding = HuggingFaceEmbeddings(model_name=embedding_model_name)
        self.chunker = SemanticChunker(self.embedding)

    def process_pdf(self, pdf_path: str) -> List[Document]:
        loader = PyMuPDFLoader(pdf_path)
        pages = loader.load()
        all_page_docs = []

        for page_num, page in enumerate(pages):
            cleaned_text = self._clean_text(page.page_content)
            if len(cleaned_text.strip()) < 50:
                continue

            # Create a Document for each cleaned page with metadata
            doc = Document(
                page_content=cleaned_text,
                metadata={
                    **page.metadata,
                    "page": page_num + 1,
                    "total_pages": len(pages),
                    "source_file": Path(pdf_path).name,
                    "file_type": "pdf"
                }
            )
            all_page_docs.append(doc)

        # Use semantic chunker to split all page documents semantically
        sem_chunks = self.chunker.split_documents(all_page_docs)
        return sem_chunks

    def process_all_pdfs(self, pdf_directory: str) -> List[Document]:
        all_chunks = []
        pdf_dir = Path(pdf_directory)
        pdf_files = list(pdf_dir.glob("**/*.pdf"))
        print(f"Found {len(pdf_files)} PDF files to process")

        for pdf_file in pdf_files:
            print(f"\nProcessing: {pdf_file.name}")
            try:
                chunks = self.process_pdf(str(pdf_file))
                all_chunks.extend(chunks)
                print(f"--> Created {len(chunks)} semantic chunks from {pdf_file.name}")
            except Exception as e:
                print(f"Error processing {pdf_file.name}: {e}")

        print(f"\nTotal semantic chunks created from all PDFs: {len(all_chunks)}")
        return all_chunks
    
    def create_update_vectorstore(self, chunks: List[Document], save_dir: str = "faiss_index") -> FAISS:
        """
        Create or update a FAISS vector store with new chunks.
        If a vectorstore exists in save_dir, load it and add new chunks,
        otherwise create a new vectorstore from chunks.
        Args:
            chunks: List of Document chunks to add.
            save_dir: Directory path to save/load vectorstore.
        Returns:
            FAISS vectorstore instance.
        """
        if os.path.exists(save_dir):
            print(f"Loading existing vectorstore from '{save_dir}'")
            vectorstore = FAISS.load_local(save_dir, self.embedding, allow_dangerous_deserialization=True)
            print(f"Adding {len(chunks)} new chunks to existing vectorstore")
            vectorstore.add_documents(chunks)
        else:
            print("Creating new vectorstore from chunks")
            vectorstore = FAISS.from_documents(chunks, self.embedding)
        
        vectorstore.save_local(save_dir)
        print(f"Vectorstore saved to '{save_dir}' with total {vectorstore.index.ntotal} vectors")
        return vectorstore

    def _clean_text(self, text: str) -> str:
        text = " ".join(text.split())
        text = text.replace("ﬁ", "fi").replace("ﬂ", "fl")
        return text

processor = SmartPDFProcessorWithSemanticChunking()
all_semantic_chunks = processor.process_all_pdfs("data/pdf")

print(f"Total semantic chunks processed: {len(all_semantic_chunks)}")

# print sample chunks
for i, chunk in enumerate(all_semantic_chunks[:3]):
    print(f"\nChunk {i+1} content:\n{chunk.page_content[:500]}...\nMetadata: {chunk.metadata}")


Found 2 PDF files to process

Processing: BikeSecurity.pdf
--> Created 62 semantic chunks from BikeSecurity.pdf

Processing: Project-Proposal.pdf
--> Created 9 semantic chunks from Project-Proposal.pdf

Total semantic chunks created from all PDFs: 71
Total semantic chunks processed: 71

Chunk 1 content:
Academic Editor: Jianming Cai Received: 13 June 2025 Revised: 3 July 2025 Accepted: 5 July 2025 Published: 8 July 2025 Citation: Griˇcar, S.; Stipanovi´c, C.; Baldigara, T. Sustainable Daily Mobility and Bike Security. Sustainability 2025, 17, 6262. https://doi.org/10.3390/ su17146262 Copyright: © 2025 by the authors....
Metadata: {'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2025-07-10T15:23:25+08:00', 'source': 'data/pdf/BikeSecurity.pdf', 'file_path': 'data/pdf/BikeSecurity.pdf', 'total_pages': 20, 'format': 'PDF 1.7', 'title': 'Sustainable Daily Mobility and Bike Security', 'author': 'Sergej Gričar, Christian Stipanović and Tea Baldigara', 'subjec

#### Save those vectors in FAISS vector store for retrieval.

In [16]:
vector_store = processor.create_update_vectorstore(all_semantic_chunks, save_dir="faiss_index")

Loading existing vectorstore from 'faiss_index'
Adding 71 new chunks to existing vectorstore
Vectorstore saved to 'faiss_index' with total 213 vectors


### Querying the vector store and retrieving relevant documents.

In [17]:
## Similarity Search 
query="How QR code verification works"

results_with_scores=vector_store.similarity_search_with_score(query,k=3)

print("\n\nSimilarity search with scores:")
for doc, score in results_with_scores:
    print(f"\nScore: {score:.3f}")
    print(f"Source: {doc.metadata['source']}")
    print(f"Content preview: {doc.page_content[:100]}...")



Similarity search with scores:

Score: 0.878
Source: data/pdf/BikeSecurity.pdf
Content preview: Sustainability 2025, 17, 6262 4 of 20 as QR codes can significantly strengthen the security framewor...

Score: 0.878
Source: data/pdf/BikeSecurity.pdf
Content preview: Sustainability 2025, 17, 6262 4 of 20 as QR codes can significantly strengthen the security framewor...

Score: 0.878
Source: data/pdf/BikeSecurity.pdf
Content preview: Sustainability 2025, 17, 6262 4 of 20 as QR codes can significantly strengthen the security framewor...


In [18]:
# Load environment variables
load_dotenv()

# Initialize the llm
groq_api_key = os.getenv("GROQ_API_KEY")
llm=ChatGroq(groq_api_key= groq_api_key, model_name="llama-3.1-8b-instant", temperature=0.1, max_tokens=1024)

#create retriever
retriever=vector_store.as_retriever(search_type="similarity", search_kwargs={"k":3})

def format_docs(docs: List[Document]) -> str:
    """Format documents for insertion into prompt"""
    formatted = []
    for i, doc in enumerate(docs):
        source = doc.metadata.get('source', 'Unknown')
        formatted.append(f"Document {i+1} (Source: {source}):\n{doc.page_content}")
    return "\n\n".join(formatted)

simple_prompt = ChatPromptTemplate.from_template("""Answer the question based only on the following context:
Context: {context}

Question: {question}

Answer:""")

simple_rag_chain=(
    {"context":retriever | format_docs,"question":RunnablePassthrough() }
    | simple_prompt
    | llm
    |StrOutputParser()

)
# Maintain history and that could be used as context
conversational_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant. Use the provided context to answer questions."),
    ("placeholder", "{chat_history}"),
    ("human", "Context: {context}\n\nQuestion: {input}"),
])

def create_conversational_rag():
    """Create a conversational RAG chain with memory"""
    return (
        RunnablePassthrough.assign(
            context=lambda x: format_docs(retriever.invoke(x["input"]))
        )
        | conversational_prompt
        | llm
        | StrOutputParser()
    )

conversational_rag = create_conversational_rag()

In [19]:
simple_rag_chain

{
  context: VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x156d3c2f0>, search_kwargs={'k': 3})
           | RunnableLambda(format_docs),
  question: RunnablePassthrough()
}
| ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='Answer the question based only on the following context:\nContext: {context}\n\nQuestion: {question}\n\nAnswer:'), additional_kwargs={})])
| ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x308648980>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x3086496a0>, model_name='llama-3.1-8b-instant', temperature=0.1, model_kwargs={}, groq_api_key=SecretStr('**********'), max_tokens=1024)
| StrOutputParser()

In [20]:
conversational_rag

RunnableAssign(mapper={
  context: RunnableLambda(lambda x: format_docs(retriever.invoke(x['input'])))
})
| ChatPromptTemplate(input_variables=['context', 'input'], optional_variables=['chat_history'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag(tag='HumanMessageChunk')], typing.Annotated[langchain_core.messages.chat.ChatMessageChunk, Tag(tag=

In [21]:
def test_rag_chains(question: str):
    """Test all RAG chain variants"""
    print(f"Question: {question}")
    print("=" * 80)
    
    # 1. Simple RAG
    print("\n1. Simple RAG Chain:")
    answer = simple_rag_chain.invoke(question)
    print(f"Answer: {answer}")

test_rag_chains("How are QR codes used?")

Question: How are QR codes used?

1. Simple RAG Chain:
Answer: According to the provided context, QR codes are used to significantly strengthen the security framework for bike initiatives and individual bike ownership.


In [22]:
## Conversational example
print("\n3. Conversational RAG:")
chat_history = []

# First question
q1 = "Why do we need to secure the bikes?"
a1 = conversational_rag.invoke({
    "input": q1,
    "chat_history": chat_history
})

print(f"Q1: {q1}")
print(f"A1: {a1}")


3. Conversational RAG:
Q1: Why do we need to secure the bikes?
A1: According to the provided documents, the main reason to secure the bikes is due to the insufficient safety measures, which are causing a significant portion of individuals to remain hesitant about cycling despite their interest in it. This is because there is a high bike ownership rate but a relatively low percentage of frequent cyclists. By improving physical security infrastructure, such as secure parking facilities, the opportunity to increase cycling participation arises.


In [23]:
# Update history
chat_history.extend([
    HumanMessage(content=q1),
    AIMessage(content=a1)
])
# Follow-up question
q2 = "how is it related to CO2?"
a2 = conversational_rag.invoke({
    "input": q2,
    "chat_history": chat_history
})
print(f"\nQ2: {q2}")
print(f"A2: {a2}")


Q2: how is it related to CO2?
A2: Based on the provided documents, there is no direct information linking bicycle sales or theft incidents to CO2 levels. However, it can be inferred that reducing bicycle theft incidents might encourage more people to cycle, which in turn could lead to a reduction in CO2 emissions due to increased cycling as a mode of transportation.

But to establish a direct relationship, we would need more information about the correlation between bicycle sales, theft incidents, and CO2 emissions.


In [24]:
print(conversational_rag.invoke({"input": "When does my project complete? ", "chat_history": chat_history}))

Unfortunately, the provided documents do not contain any information about the project's completion date. They appear to be authorship and contributor lists for a research paper or study titled "BikeSecurity.pdf." The documents list the authors, their roles, and the contributors but do not mention the project's completion date.


In [26]:
chat_history = []

while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]:
        break
    
    # Run conversational RAG with current history
    response = conversational_rag.invoke({"input": user_input, "chat_history": chat_history})
    print(f"You: {user_input}")
    print(f"Assistant: {response}")
    
    # Update chat history by appending user question and assistant answer
    chat_history += f"User: {user_input}\nAssistant: {response}\n"

You: How to secure bikes?
Assistant: Based on the provided context, it seems that the documents are discussing the importance of improving physical security infrastructure to encourage more people to cycle. While the documents don't provide a direct answer to how to secure bikes, we can infer some possible solutions from the context:

1. **Secure parking facilities**: The documents mention the need for secure parking facilities, which suggests that providing safe and secure places to park bikes can help alleviate concerns about bike theft and vandalism.
2. **Physical security infrastructure**: This phrase implies that there are other physical measures that can be taken to secure bikes, such as bike locks, bike racks, or other types of bike storage solutions.

To secure bikes, some possible solutions could include:

* Using high-quality bike locks, such as U-locks or chain locks, to prevent theft.
* Parking bikes in well-lit, secure areas, such as bike racks or lockers.
* Using bike sto

APIStatusError: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-8b-instant` in organization `org_01k6vh7r4gfayak0q4tyfkhcbq` service tier `on_demand` on tokens per minute (TPM): Limit 6000, Requested 9386, please reduce your message size and try again. Need more tokens? Upgrade to Dev Tier today at https://console.groq.com/settings/billing', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}