## Load Llama3-8B LLM

In [1]:
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings

In [2]:
MODEL = "llama3:8b"

model = Ollama(model=MODEL)
embeddings = OllamaEmbeddings(model=MODEL)

print(model.invoke("Tell me a joke"))

Here's one:

Why couldn't the bicycle stand up by itself?

(Wait for it...)

Because it was two-tired!

Hope that made you smile! Do you want to hear another one?


## Create the chain and set the prompt

In [3]:
from langchain_core.output_parsers import StrOutputParser

In [4]:
# StrOutputParser converts the model output into a clean string.
# llama3 via Ollama already returns a string, so this step is optional here.
parser = StrOutputParser()

chain = model | parser
# print(chain.invoke("Tell me a joke"))

In [5]:
from langchain.prompts import PromptTemplate

In [6]:
template = """
Answer the question based on the context below. If you can't 
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
print(prompt.format(context="Here is some context", question="Here is a question"))


Answer the question based on the context below. If you can't 
answer the question, reply "I don't know".

Context: Here is some context

Question: Here is a question



In [7]:
chain = prompt | model | parser

chain.invoke({"context": "My name is Fahri", "question": "What's your name?"})

'My name is Fahri.'
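
To make the `prompt | model | parser` pipeline less opaque, here is a minimal plain-Python sketch of how a `|` operator can compose steps. The `Step` class, `fake_model`, and the shortened `TEMPLATE` are hypothetical stand-ins invented for illustration. This is not LangChain's implementation and no LLM is called.

```python
class Step:
    """Hypothetical stand-in for a chain component: wraps a plain function."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Shortened version of the notebook's template (assumption, for brevity).
TEMPLATE = "Context: {context}\n\nQuestion: {question}\n"

# Each stage mirrors one link of `prompt | model | parser`.
fill_prompt = Step(lambda inputs: TEMPLATE.format(**inputs))
# Not a real LLM: it only reports how much text it received.
fake_model = Step(lambda text: f"  [model saw {len(text)} characters]  ")
strip_parser = Step(lambda text: text.strip())

toy_chain = fill_prompt | fake_model | strip_parser
result = toy_chain.invoke(
    {"context": "My name is Fahri", "question": "What's your name?"}
)
print(result)
```

The input dictionary flows left to right: it is turned into a prompt string, passed to the model stand-in, and the final step cleans up the returned text.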

## Load PDF files and split them into chunks

In [8]:
from langchain_community.document_loaders import DirectoryLoader

In [9]:
loader = DirectoryLoader("sample-pdf", glob="*.pdf", show_progress=True)
documents = loader.load()
print(len(documents), "documents loaded")

100%|██████████| 2/2 [00:17<00:00,  8.85s/it]

2 documents loaded





In [10]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=50)
texts = text_splitter.split_documents(documents)

In [11]:
# texts
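
To see how `chunk_size=200` and `chunk_overlap=50` interact, the following is a simplified sliding-window splitter. It is only a sketch: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, which this version ignores. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size=200, chunk_overlap=50):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Synthetic 500-character input, just to inspect the window boundaries.
sample = "".join(str(i % 10) for i in range(500))
chunks = split_with_overlap(sample)

for index, chunk in enumerate(chunks):
    print(index, len(chunk))
```

With these settings each new chunk starts 150 characters after the previous one, so the last 50 characters of one chunk reappear at the start of the next. The overlap reduces the chance that a sentence relevant to a question is cut in half at a chunk boundary.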

## Create vector database and store the texts

In [12]:
from langchain_community.vectorstores import DocArrayInMemorySearch

In [13]:
db = DocArrayInMemorySearch.from_documents(texts, embedding=embeddings)
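
Conceptually, an in-memory vector store keeps one vector per chunk and ranks chunks by similarity to the query vector. The sketch below illustrates that idea with the standard library only. `toy_embed` (word counts) is a hypothetical replacement for `OllamaEmbeddings`, `ToyVectorStore` is an invented class, and cosine similarity is used as one common metric. It is not how `DocArrayInMemorySearch` is implemented.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical embedding: a sparse vector of word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Keeps (text, vector) pairs in a list and searches them by brute force."""

    def __init__(self, texts):
        self.entries = [(text, toy_embed(text)) for text in texts]

    def search(self, query, k=2):
        query_vector = toy_embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(query_vector, entry[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


# Toy sentences unrelated to the PDFs, used only to exercise the search.
store = ToyVectorStore([
    "apples are red and sweet",
    "the bicycle has two tires",
    "python is a programming language",
])
top = store.search("how many tires does a bicycle have", k=1)
print(top)
```

A real embedding model maps text with similar meaning to nearby vectors even when the words differ, which word counts cannot do. The ranking step, however, follows the same pattern.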



## Create retriever and chain it with LLM

In [14]:
retriever = db.as_retriever()

In [15]:
from operator import itemgetter

chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | parser
)
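
The dictionary at the head of the chain builds the two inputs the prompt expects from a single `{"question": ...}` input: `itemgetter("question")` extracts the question, which is passed through unchanged and also sent to the retriever. The plain-Python sketch below traces that data flow. `fake_retriever`, `format_chunks`, and `build_prompt_inputs` are hypothetical helpers. Joining the chunks into one string is an optional extra step that the chain above does not perform (there the retriever's list of documents is inserted into the prompt as-is).

```python
from operator import itemgetter


def fake_retriever(question):
    """Hypothetical stand-in for the retriever: returns fixed chunk texts."""
    return ["chunk one", "chunk two"]


def format_chunks(chunks):
    """Join chunk texts with blank lines so the prompt contains plain text."""
    return "\n\n".join(chunks)


def build_prompt_inputs(inputs):
    """Mimic the mapping step: one input dict in, prompt variables out."""
    question = itemgetter("question")(inputs)
    return {
        "context": format_chunks(fake_retriever(question)),
        "question": question,
    }


prompt_inputs = build_prompt_inputs({"question": "What is in the chunks?"})
print(prompt_inputs)
```

The resulting dictionary has exactly the `context` and `question` keys that the prompt template consumes.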

In [30]:
questions = [
    "What is the main focus of the course HL 2090 Special Topic in Literature, I: Literature and Economics?",
    "What's the website that provide NTU Academic Integrity Guidelines?",
    "What if you wish to use the materials for your assignments?",
    "Can you list the books mentioned in the course?",
    "Can you provide the English Language requirement?",
    "What is the minimum acceptable score for the TOEFL iBT test?",
    "How long is the validity of the MUET score?",
]

for question in questions:
    print(f"Question: {question}")
    print(f"Answer: {chain.invoke({'question': question})}")
    print()

Question: What is the main focus of the course HL 2090 Special Topic in Literature, I: Literature and Economics?
Answer: The main focus of the course HL 2090 Special Topic in Literature, I: Literature and Economics is an introduction to economic concepts and ideas through their dramatization in literature.

Question: What's the website that provide NTU Academic Integrity Guidelines?
Answer: http://www.ntu.edu.sg/ai/Pages/academic-integrity-policy.aspx

Question: What if you wish to use the materials for your assignments?
Answer: You must cite them accordingly.

Question: Can you list the books mentioned in the course?
Answer: Based on the context, here are the books mentioned in the course:

1. Fitzgerald's The Great Gatsby
2. Hamid's How to Get Filthy Rich in Rising Asia
3. Kwan's Crazy Rich Asians
4. Norris's McTeague (specifically mentioning Signet, Penguin, Dover, or Norton edition)

Let me know if you have any further questions!

Question: Can you provide the English Language requ

In [29]:
for s in chain.stream({"question": "What's the NUS institution code for TOEFL?"}):
    print(s, end="", flush=True)

I don't know. The provided context does not contain any information about the NUS institution code for TOEFL. It appears to be a collection of documents related to an academic course, including pages from PDF files that mention English proficiency and plagiarism guidelines, but do not provide specific information on TOEFL codes.