In [2]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [5]:
OPENAI_API_KEY=os.getenv("OPENAI_API_KEY")
# OPENAI_API_KEY
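
If the key is missing from `.env`, `os.getenv` returns `None` without complaint, and the problem only shows up later as an authentication error. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper introduced here for illustration, not part of `python-dotenv` or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file."
        )
    return value


# Usage (assumes load_dotenv() has already run):
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```

This only confirms that a value is present; it cannot tell whether the key is actually valid.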

In [None]:
# OpenAI model (fails here: invalid API key)
from langchain_openai.chat_models import ChatOpenAI

MODEL = "gpt-3.5-turbo"
model = ChatOpenAI(api_key=OPENAI_API_KEY, model=MODEL)
model.invoke("YO, whats up?")

In [22]:
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings

# Local DeepSeek-R1 model served by Ollama; the same model is used for embeddings
MODEL = "deepseek-r1:1.5b"
model = Ollama(model=MODEL)
embeddings = OllamaEmbeddings(model=MODEL)

In [9]:
model.invoke("YO, tell me a joke")

"<think>\n\n</think>\n\nSure! Here's a light one:\n\nWhy don’t skeletons fight each other?  \nBecause they don’t have the *gym*!"

In [10]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = model | parser 
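
The `|` operator above composes two steps so that the output of the first becomes the input of the second. The sketch below imitates that idea with plain Python operator overloading; `Step`, `fake_model`, and `fake_parser` are hypothetical stand-ins, and this is not how LangChain implements its runnables internally.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|` chaining."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # Compose: feed this step's output into the next step.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Pretend LLM that returns a message-like dict, and a pretend parser
# that extracts the text from it.
fake_model = Step(lambda prompt: {"content": prompt.upper()})
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_model | fake_parser
print(toy_chain.invoke("tell me a joke"))  # prints "TELL ME A JOKE"
```

Note that the `Ollama` LLM used in this notebook already returns a plain string, as the earlier cell output shows, so the parser mainly matters when the model is swapped for a chat model that returns message objects.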


In [12]:
print(chain.invoke("YO, tell me a joke"))

<think>

</think>

Why did the scarecrow win an award?  
Because he was outstanding in his field.
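
DeepSeek-R1 wraps its reasoning in `<think>...</think>` tags, and `StrOutputParser` passes them through untouched, as the output above shows. A minimal sketch for removing them with a regular expression; `strip_think` is a hypothetical helper name, and it assumes the tags are always well-formed and closed.

```python
import re

# Non-greedy match so multiple <think> blocks are removed separately;
# DOTALL lets "." span the newlines inside the block.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_think(text: str) -> str:
    """Remove <think>...</think> blocks and any surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


raw = "<think>\n\n</think>\n\nWhy did the scarecrow win an award?"
print(strip_think(raw))  # prints "Why did the scarecrow win an award?"
```

One way to use it would be to append it to the chain, for example wrapped in `RunnableLambda` from `langchain_core.runnables`.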


In [13]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(file_path="thesis.pdf")
pages = loader.load_and_split()
pages

[Document(metadata={'producer': 'pdfTeX-1.40.26', 'creator': 'LaTeX with hyperref', 'creationdate': '2025-08-08T14:14:32+00:00', 'author': '', 'keywords': '', 'moddate': '2025-08-08T14:14:32+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.26 (TeX Live 2024) kpathsea version 6.4.0', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'thesis.pdf', 'total_pages': 65, 'page': 0, 'page_label': 'a'}, page_content='FAKULTÄT FÜR INFORMATIK\nTECHNISCHE UNIVERSITÄT MÜNCHEN\nMaster’s Thesis in Informatik\nPlanning with Large Language Models\nAkanksha Chawla, 03766791'),
 Document(metadata={'producer': 'pdfTeX-1.40.26', 'creator': 'LaTeX with hyperref', 'creationdate': '2025-08-08T14:14:32+00:00', 'author': '', 'keywords': '', 'moddate': '2025-08-08T14:14:32+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.26 (TeX Live 2024) kpathsea version 6.4.0', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'thesis.pdf', 'total_pages': 65, '

In [15]:
from langchain_core.prompts import ChatPromptTemplate

template = '''Answer the question based on the given context. In case you
cannot answer the question based SOLELY on the context, reply "I don't know".

Context: {context}
Question: {question}
'''
prompt = ChatPromptTemplate.from_template(template)
print(prompt.format(
    context="Hi I am susan.",
    question="What is my name?"
))

H: Answer the question based on the given context. In case you
cannot answer the question based SOLELY on the context, reply "I don't know".

Context: Hi I am susan.
Question: What is my name?



In [19]:
chain = prompt | model | parser
print(chain.invoke({
    "context":"Hi I am susan.",
    "question": "Was ist meine name?"
}))

<think>
Okay, so I need to figure out what Susan's name is based on the given context. The user provided a conversation where they introduced themselves as Susan and asked, "Was ist meine name?" in German. 

First, I'll break down the question. It seems like it's asking for the user's name in German. Since the user mentioned they're called "Susan," that should be straightforward.

Wait, but sometimes names can have accents or different spellings. In this case, Susan is already a common spelling, so the answer should be Susan without any changes.

I don't think there are other context clues because the previous message was about a response based on the given context, which only includes that conversation. So, I should stick to that information.

Also, since it's a direct question and the context is clear, I can confidently say Susan.
</think>

Susan


In [23]:
from langchain_community.vectorstores import DocArrayInMemorySearch

vectorstore = DocArrayInMemorySearch.from_documents(pages, embedding=embeddings)
retriever = vectorstore.as_retriever()
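
Conceptually, the retriever embeds the query and returns the stored pages whose embeddings are closest to it. The sketch below shows that idea with toy 2-D vectors and cosine similarity; the vectors, the `top_k` helper, and the choice of cosine similarity are assumptions for illustration, not the actual internals of `DocArrayInMemorySearch`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), idx)
        for idx, vec in enumerate(doc_vecs)
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]


# Toy "embeddings" for three documents and one query.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.1], doc_vecs, k=2))  # prints [0, 2]
```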

In [28]:
retriever.invoke("reinforcement learning")

[Document(metadata={'producer': 'pdfTeX-1.40.26', 'creator': 'LaTeX with hyperref', 'creationdate': '2025-08-08T14:14:32+00:00', 'author': '', 'keywords': '', 'moddate': '2025-08-08T14:14:32+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.26 (TeX Live 2024) kpathsea version 6.4.0', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'thesis.pdf', 'total_pages': 65, 'page': 20, 'page_label': '14'}, page_content='3 Methodology\na custom dataset within the Blocksworld domain. Usually, problem statements from\nBlocksworlds domain consist of a set of named blocks that can be stacked on top of one\nanother or placed on a table. The goal is to move the blocks from an initial configuration\nto a target configuration using a sequence of valid actions, namely:pick-up, stack,\nput-down, unstack. Any action can only be performed after required pre-conditions\nhave been met, upon performing the action the relevant state variables are updated\naccordingly to reflect t

In [29]:
question = "How and why do I use reinforcement learning in the project?"

In [30]:
from operator import itemgetter

chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | parser
)
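
In the chain above, the retriever returns a list of `Document` objects, and as far as I can tell the prompt template fills `{context}` with their string representation, metadata included. A minimal sketch of joining only the page text before it reaches the prompt; `FakeDocument` is a hypothetical stand-in for LangChain's `Document`, used so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Hypothetical stand-in exposing the same `page_content` attribute as a Document."""
    page_content: str


def format_docs(docs) -> str:
    """Join the text of each retrieved document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


docs = [FakeDocument("3 Methodology"), FakeDocument("Blocksworld domain")]
print(format_docs(docs))
```

In the notebook this could be slotted in as `itemgetter("question") | retriever | format_docs`, assuming the pipe operator accepts a plain function at that position.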

In [31]:
print(chain.invoke({"question": question}))

<think>
Okay, so I need to figure out how and why I used reinforcement learning (RL) in my project. Let's break this down step by step.

First, looking at the context provided, it seems that the project was about using small models like GPT-3, which are known for their inference speeds but sometimes struggle with reasoning tasks because they don't have much internal structure or mechanisms to think through problems themselves.

I remember seeing in the document that they used an instruction-fine-tuned 12B parameter model. This model was likely being trained on a dataset called Blocksworld. The key thing about this setup is that it's a smaller model compared to larger ones like OpenAI's o1 or DeepSeek-R1, which are more powerful because they have more parameters and a more complex architecture.

The user mentioned that their goal was to improve reasoning without relying heavily on external verifiers. They were using the Blocksworld domain as their benchmark. In the literature survey sec