# RAG-based text documents Q&A chat with LangChain, ChatGPT and Mistral 7B

I have built a simple RAG demo in which an LLM answers questions about a set of external text files. RAG stands for retrieval-augmented generation: relevant passages are retrieved from external documents and passed to the LLM together with the query. This lets us ask our language model questions that are specific to the content of those documents.

We will build two versions of this RAG in parallel: one using the ChatGPT model and the other using the open-source Mistral 7B model. I do this mostly to compare closed-source and open-source solutions for such tasks. The second approach is especially interesting because it is completely self-contained: once the model weights are downloaded, it can run even offline, without calling any kind of API as we do with ChatGPT.

We'll use the syllabus and lecture transcript text files from Stanford's excellent CS224N course, Natural Language Processing with Deep Learning, as the external data we want to ask questions about.

In this experiment I used:

* Data source: text files
* Model 1: gpt-3.5-turbo with OpenAI embeddings (`OpenAIEmbeddings`)
* Model 2: Mistral 7B (Mistral-7B-Instruct-v0.2) with e5-large embeddings
* Vector store: Chroma
* RAG: LangChain

In [1]:
!pip install accelerate
!pip install langchain
!pip install langchain-community
!pip install openai
!pip install langchain-openai
!pip install sentence-transformers
!pip install chromadb

## 1. Loading documents

We'll start by loading the data we want to ask our LLM about.

In [2]:
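import os
import openai

# LangChain's OpenAI classes (ChatOpenAI, OpenAIEmbeddings) also read OPENAI_API_KEY from the environment
openai.api_key = os.environ['OPENAI_API_KEY']

DATA_PATH = './data/'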

In [3]:
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cpu'

In [4]:
from langchain.document_loaders import TextLoader

loaders = [
    TextLoader(DATA_PATH + "CS224n_Syllabus.txt"),
    TextLoader(DATA_PATH + "CS224N_NLP_with_Deep_Learning_Winter_2021_Lecture_1.txt"),
    TextLoader(DATA_PATH + "CS224N_NLP_with_Deep_Learning_Winter_2021_Lecture_2.txt"),
    TextLoader(DATA_PATH + "CS224N_NLP_with_Deep_Learning_Winter_2021_Lecture_3.txt"),
    TextLoader(DATA_PATH + "CS224N_NLP_with_Deep_Learning_Winter_2021_Lecture_4.txt")
]
pages = []
for loader in loaders:
    pages.extend(loader.load())

In [6]:
len(pages)

5

In [7]:
pages[0].page_content[0:200]

'CS224n: Natural Language Processing with Deep Learning\nStanford / Winter 2021\n\nNatural language processing (NLP) is a crucial part of artificial intelligence (AI), modeling how people share informatio'

In [8]:
pages[0].metadata

{'source': './data/CS224n_Syllabus.txt'}

## 2. Document splitting

Our next step is splitting the data. An LLM's input context has a limited token length, which is why we need to split our input data into chunks.

Later down the line, each chunk will be represented as an embedding vector, and the chunks most semantically similar to the posted question or problem will be selected as input for the model.

We will use one of LangChain's simplest and most essential splitters, `RecursiveCharacterTextSplitter`, and run it on the text content of our loaded files.

In [9]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
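    chunk_size=1500,    # target max chunk length in characters (alternative value: 1000)
    chunk_overlap=150   # characters shared by neighbouring chunks (alternative value: 100)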
)
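`RecursiveCharacterTextSplitter` first tries to cut at paragraph breaks, then line breaks, then spaces, so its chunks follow the structure of the text. To isolate what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified sketch: a hypothetical `sliding_window_chunks` helper (not part of LangChain) that cuts fixed-size character windows, each starting `chunk_size - chunk_overlap` characters after the previous one.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into overlapping fixed-size character windows.

    Simplified illustration only: unlike RecursiveCharacterTextSplitter,
    it ignores separators such as paragraphs, lines and words.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


chunks = sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each pair of neighbouring chunks shares 3 characters, so a short passage cut by one chunk boundary still appears complete in the next chunk.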

In [10]:
splits = text_splitter.split_documents(pages)

In [11]:
len(splits)

203

In [12]:
len(pages)

5

In [13]:
splits[0]

Document(page_content='CS224n: Natural Language Processing with Deep Learning\nStanford / Winter 2021\n\nNatural language processing (NLP) is a crucial part of artificial intelligence (AI), modeling how people share information. In recent years, deep learning approaches have obtained very high performance on many NLP tasks. In this course, students gain a thorough introduction to cutting-edge neural networks for NLP.\n\nInstructors:\nChris Manning\nJohn Hewitt\n\nTA:\nCourse Coordinator\nAmelie Byun\nTeaching Assistants\nDaniel Do\nRachel Gardner\nDavide Giovanardi\nAlvin Hou\nPrerna Khullar\nGita Krishna\nMegan Leszczynski\nElissa Li\nMandy Lu\nShikhar Murty\nAkshay Smit\nDilara Soylu\nAngelica Sun\nChris Waites\nAndrew Wang\nRui Wang\nYuyan Wang\nZihan Wang\nLingjue Xie\nRui Yan\nAnna Yang\nLauren Zhu\nLogistics', metadata={'source': './data/CS224n_Syllabus.txt'})

## 3. Embeddings and vector storage

As mentioned above, we will represent the meaning of each chunk as an embedding vector: a point in a high-dimensional space where semantically similar texts lie close to each other.

We need to perform this step separately for each model, since they use different types of embeddings.

### 3.1 ChatGPT embeddings

Since we are using ChatGPT as our LLM, we will use the matching `OpenAIEmbeddings` for a good fit.

In [48]:
from langchain_openai import OpenAIEmbeddings
chatgpt_embedding = OpenAIEmbeddings()

To use the generated chunk embeddings, we need to store them in a persistent, easily accessible way. This is exactly what vector stores do: a vector store is a database that holds our embeddings and lets us look up the ones closest to a query when we later query our LLM.

Chroma serves our embedding storage and retrieval needs well. Its `from_documents` method also takes care of converting the text chunks into embeddings.
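To make the idea concrete, here is a toy in-memory store that keeps texts and their vectors side by side and returns the `k` texts whose vectors are closest to the query by cosine similarity. `TinyVectorStore` and `toy_embed` are hypothetical names; `toy_embed` is a crude keyword-count stand-in for a real embedding model, and a real vector database like Chroma adds persistence, metadata and an approximate nearest-neighbour index on top of this basic idea.

```python
import numpy as np


class TinyVectorStore:
    """Toy in-memory vector store (illustration only, not Chroma's implementation)."""

    def __init__(self, embed):
        self.embed = embed   # function: str -> 1-D list/array of floats
        self.texts = []
        self.vectors = []

    def add_texts(self, texts):
        for text in texts:
            self.texts.append(text)
            self.vectors.append(np.asarray(self.embed(text), dtype=float))

    def similarity_search(self, query, k=3):
        query_vec = np.asarray(self.embed(query), dtype=float)
        matrix = np.vstack(self.vectors)
        # Cosine similarity of the query with every stored vector (epsilon avoids 0/0)
        norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query_vec) + 1e-12
        scores = matrix @ query_vec / norms
        best = np.argsort(-scores)[:k]
        return [self.texts[i] for i in best]


VOCAB = ["instructor", "manning", "assignment", "grade", "word", "vector"]


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: counts words starting with each VOCAB stem."""
    words = text.lower().replace("?", " ").replace(".", " ").split()
    return [sum(w.startswith(stem) for w in words) for stem in VOCAB]


store = TinyVectorStore(toy_embed)
store.add_texts([
    "Chris Manning is the main instructor.",
    "Each assignment counts toward the final grade.",
    "Word vectors represent word meaning.",
])
print(store.similarity_search("Who is the course instructor?", k=1))
# ['Chris Manning is the main instructor.']
```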

In [15]:
# Remove the previous version if it exists
!rm -rf ./data/chroma_chatgpt/

In [16]:
from langchain.vectorstores import Chroma

In [17]:
chatgpt_persist_directory = DATA_PATH + "chroma_chatgpt/"

In [18]:
chatgpt_persist_directory

'./data/chroma_chatgpt/'

In [19]:
chatgpt_chroma_db = Chroma.from_documents(
    documents=splits,
    embedding=chatgpt_embedding,
    persist_directory=chatgpt_persist_directory
)

In [20]:
chatgpt_chroma_db.persist()

In [21]:
chatgpt_chroma_db._collection.count()

203

Let's use a simple embedding similarity search to find the text chunks most likely to contain information related to each question. No LLM yet: just a nearest-neighbour search comparing the question embedding with the chunk embeddings.
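A note on the metric: Chroma's default distance is squared L2 rather than cosine. For unit-length vectors the two produce the same ranking, because `||d - q||^2 = ||d||^2 + ||q||^2 - 2 * d·q = 2 - 2 * cos(d, q)`. OpenAI's documentation states that its embeddings are normalized to length 1, and the e5 embeddings later in this notebook are normalized via `normalize_embeddings=True`, so the ranking here matches a cosine-similarity ranking. A quick numerical check with random unit vectors (not real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for chunk and question embeddings, scaled to unit length
doc_vecs = rng.normal(size=(5, 8))
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)
query_vec = rng.normal(size=8)
query_vec /= np.linalg.norm(query_vec)

cosine = doc_vecs @ query_vec                        # higher = more similar
sq_l2 = np.sum((doc_vecs - query_vec) ** 2, axis=1)  # lower = more similar

# For unit vectors ||d - q||^2 = 2 - 2 * cos(d, q), so both rank chunks identically
print(np.allclose(sq_l2, 2 - 2 * cosine))                      # True
print(np.array_equal(np.argsort(-cosine), np.argsort(sq_l2)))  # True
```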

In [22]:
question = "Who is the course instructor?"
docs = chatgpt_chroma_db.similarity_search(question, k=3)
docs[0].page_content

"Lectures: are on Tuesday/Thursday 4:30-5:50pm Pacific Time (Remote, Zoom link is posted on Canvas).\nLecture videos for enrolled students: are posted on Canvas (requires login) shortly after each lecture ends. Unfortunately, it is not possible to make these videos viewable by non-enrolled students.\nPublicly available lecture videos and versions of the course: Complete videos from the 2019 edition are available (free!) on the Stanford Online Hub and on the CS224N YouTube channel. Anyone is welcome to enroll in XCS224N: Natural Language Processing with Deep Learning, the Stanford Artificial Intelligence Professional Program version of this course, throughout the year (medium fee, community TAs and certificate). You can enroll in CS224N via Stanford online in the (northern hemisphere) Autumn to do the course in the Winter (high cost, gives Stanford credit). The lecture slides and assignments are updated online each year as the course progresses. We are happy for anyone to use these reso

In [23]:
question = "In which lecture - give the number and date - are word embeddings described?"
docs = chatgpt_chroma_db.similarity_search(question, k=3)
docs[0].page_content

'Gensim word vectors example:\n[code] [preview] \tSuggested Readings:\n\nEfficient Estimation of Word Representations in Vector Space (original word2vec paper)\nDistributed Representations of Words and Phrases and their Compositionality (negative sampling paper)\n\nAssignment 1 out\n[code]\n[preview] \t\nThu Jan 14 \tWord Vectors 2 and Word Window Classification\n[slides] [notes] \tSuggested Readings:\n\nGloVe: Global Vectors for Word Representation (original GloVe paper)\nImproving Distributional Similarity with Lessons Learned from Word Embeddings\nEvaluation methods for unsupervised word embeddings\n\nAdditional Readings:\n\nA Latent Variable Model Approach to PMI-based Word Embeddings\nLinear Algebraic Structure of Word Senses, with Applications to Polysemy\nOn the Dimensionality of Word Embedding\n\n\t\nFri Jan 15 \tPython Review Session\n[code] [preview] \t10:00am - 11:20am \t\t\nTue Jan 19 \tBackprop and Neural Networks\n[slides] [notes] \tSuggested Readings:\n\nmatrix calculus 

In [24]:
question = "How many assignments are there in the course?"
docs = chatgpt_chroma_db.similarity_search(question, k=3)
docs[0].page_content

'Credit:\n    Assignment 1 (6%): Introduction to word vectors\n    Assignment 2 (12%): Derivatives and implementation of word2vec algorithm\n    Assignment 3 (12%): Dependency parsing and neural network foundations\n    Assignment 4 (12%): Neural Machine Translation with sequence-to-sequence, attention, and subwords\n    Assignment 5 (12%): Self-supervised learning and fine-tuning with Transformers\nDeadlines: All assignments are due on either a Tuesday or a Thursday before class (i.e. before 4:30pm). All deadlines are listed in the schedule.\nSubmission: Assignments are submitted via Gradescope. If you need to sign up for a Gradescope account, please use your @stanford.edu email address. Further instructions are given in each assignment handout. Do not email us your assignments.\nLate start: If the result gives you a higher grade, we will not use your assignment 1 score, and we will give you an assignment grade based on counting each of assignments 2–5 at 13.5%.\nCollaboration: Study 

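For embeddings alone, this works reasonably well: for the lecture and assignment questions, the top chunk is exactly the relevant part of the syllabus. For the instructor question, the top chunk covers lecture logistics rather than the instructors, so the search is not perfect. Still, the high-dimensional vector representation of each chunk's meaning is enough to connect a question with the part of the text that can help answer it.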

### 3.2 Mistral 7B embeddings

We now follow similar steps but this time we prepare embeddings for Mistral 7B.

The first major change we will make in order to use the Mistral 7B model instead of ChatGPT and keep using only open-source tools in the process is switching the embeddings. Here we will use open-source e5-large embeddings ("Text Embeddings by Weakly-Supervised Contrastive Pre-training", Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022).

Besides changing the embeddings, the steps we follow for generating them are the same.

We start by loading the new embedding model.

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings

llama_embedding = HuggingFaceEmbeddings(
    model_name="intfloat/e5-large",
    model_kwargs={"device": device},
    encode_kwargs={"normalize_embeddings": True},  # return unit-length vectors
)
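The e5 model card on Hugging Face recommends prefixing every input with `query: ` or `passage: `. The cell above passes raw text, and the results in this section were obtained that way. If you want to follow the model card, one option is a thin wrapper that adds the prefixes before delegating to the real embedding object. `E5PrefixedEmbeddings` below is a hypothetical helper, not part of LangChain; it assumes the vector store only calls `embed_documents` and `embed_query`, which is worth checking for your LangChain version (subclassing `langchain_core.embeddings.Embeddings` is the more formal route). Whether it changes the retrieval results here is untested.

```python
class E5PrefixedEmbeddings:
    """Hypothetical wrapper adding the e5 input prefixes before delegating to `base`.

    `base` is any object with LangChain-style embed_documents(list[str]) and
    embed_query(str) methods, e.g. the HuggingFaceEmbeddings instance above.
    """

    def __init__(self, base):
        self.base = base

    def embed_documents(self, texts):
        # Stored chunks are "passages" in e5 terms
        return self.base.embed_documents(["passage: " + t for t in texts])

    def embed_query(self, text):
        # Questions are "queries" in e5 terms
        return self.base.embed_query("query: " + text)


class RecordingEmbeddings:
    """Fake base model for the demo: records its inputs and returns dummy vectors."""

    def __init__(self):
        self.seen = []

    def embed_documents(self, texts):
        self.seen.extend(texts)
        return [[0.0] for _ in texts]

    def embed_query(self, text):
        self.seen.append(text)
        return [0.0]


fake = RecordingEmbeddings()
e5 = E5PrefixedEmbeddings(fake)
e5.embed_documents(["Lecture 1 covers word vectors."])
e5.embed_query("Which lecture covers word vectors?")
print(fake.seen)
# ['passage: Lecture 1 covers word vectors.', 'query: Which lecture covers word vectors?']
```

With the real model, the idea would be to pass `E5PrefixedEmbeddings(llama_embedding)` as the `embedding` argument of `Chroma.from_documents`.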

We create a separate vector store for these embeddings.

In [None]:
# Remove the previous version if it exists
!rm -rf ./data/chroma_llama/

In [None]:
from langchain.vectorstores import Chroma

In [None]:
llama_persist_directory = DATA_PATH + "chroma_llama/"

In [None]:
llama_persist_directory

'./data/chroma_llama/'

In [None]:
llama_chroma_db = Chroma.from_documents(
    documents=splits,
    embedding=llama_embedding,
    persist_directory=llama_persist_directory
)

In [None]:
llama_chroma_db.persist()

In [None]:
llama_chroma_db._collection.count()

203

Again, let's run a plain similarity search comparing the question embedding with the chunk embeddings.

In [None]:
question = "Who is the course instructor?"
docs = llama_chroma_db.similarity_search(question, k=3)
docs[0].page_content

"Hi, everybody.\n\nWelcome to Stanford's\nCS224N, also known\n\nas Ling284, Natural Language\nProcessing with Deep Learning.\n\nI'm Christopher Manning,\nand I'm the main instructor\n\nfor this class.\n\nSo what we hope to do\ntoday is to dive right in.\n\nSo I'm going to spend\nabout 10 minutes talking\n\nabout the course,\nand then we're going\n\nto get straight into\ncontent for reasons\n\nI'll explain in a minute.\n\nSo we'll talk about human\nlanguage and word meaning,\n\nI'll then introduce the ideas\nof the word2vec algorithm\n\nfor learning word meaning.\n\nAnd then going from there\nwe'll kind of concretely\n\nwork through how you can\nwork out objective function\n\ngradients with respect to\nthe word2vec algorithm,\n\nand say a teeny bit about\nhow optimization works.\n\nAnd then right at\nthe end of the class\n\nI then want to spend\na little bit of time\n\ngiving you a sense of how\nthese word vectors work,\n\nand what you can do with them.\n\nSo really the key\nlearning fo

In [None]:
question = "In which lecture - give the number and date - are word embeddings described?"
docs = llama_chroma_db.similarity_search(question, k=3)
docs[0].page_content

'Gensim word vectors example:\n[code] [preview] \tSuggested Readings:\n\nEfficient Estimation of Word Representations in Vector Space (original word2vec paper)\nDistributed Representations of Words and Phrases and their Compositionality (negative sampling paper)\n\nAssignment 1 out\n[code]\n[preview] \t\nThu Jan 14 \tWord Vectors 2 and Word Window Classification\n[slides] [notes] \tSuggested Readings:\n\nGloVe: Global Vectors for Word Representation (original GloVe paper)\nImproving Distributional Similarity with Lessons Learned from Word Embeddings\nEvaluation methods for unsupervised word embeddings\n\nAdditional Readings:\n\nA Latent Variable Model Approach to PMI-based Word Embeddings\nLinear Algebraic Structure of Word Senses, with Applications to Polysemy\nOn the Dimensionality of Word Embedding\n\n\t\nFri Jan 15 \tPython Review Session\n[code] [preview] \t10:00am - 11:20am \t\t\nTue Jan 19 \tBackprop and Neural Networks\n[slides] [notes] \tSuggested Readings:\n\nmatrix calculus 

In [None]:
question = "How many assignments are there in the course?"
docs = llama_chroma_db.similarity_search(question, k=3)
docs[0].page_content

'Michael A. Nielsen. Neural Networks and Deep Learning\nEugene Charniak. Introduction to Deep Learning\n\n\nCoursework\nAssignments (54%)\n\nThere are five weekly assignments, which will improve both your theoretical understanding and your practical skills. All assignments contain both written questions and programming parts. In office hours, TAs may look at students’ code for assignments 1, 2 and 3 but not for assignments 4 and 5.'

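Again, the results look reasonable. The top chunk for the instructor question is now the start of lecture 1, where Christopher Manning introduces himself as the main instructor (an improvement over the OpenAI embeddings result), and the top chunk for the assignments question states directly that there are five weekly assignments.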

## 4. Question answering RAG chat - simple case

Retrieval, the R in RAG, now feeds the generation step: the `RetrievalQA` chain retrieves the relevant chunks and passes them, together with the question, to the loaded target LLM.

Here we build the simplest question-answering RAG. Later on, we will extend it with a memory unit to make it more human-like.

Apart from the embeddings, the vector store and the model, the steps are mostly the same for both models. What we focus on here are the outputs and how they compare.



### 4.1 Simple RAG with ChatGPT

We start by loading our model.

In [25]:
from langchain_openai import ChatOpenAI
chatgpt_llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

We then prepare a prompt template for the model.

In [27]:
chatgpt_template = """Use this context to answer the question. Keep the answer short and concise with up to three sentences. Say you don't know if you don't know the answer.
{context}
Question: {question}
Helpful Answer:"""

In [28]:
from langchain.prompts import PromptTemplate
CHATGPT_QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"], template=chatgpt_template)

In [29]:
context = "This is an example context."
question = "Explain what are Deep Neural Networks in 2-3 sentences."
print(CHATGPT_QA_CHAIN_PROMPT.format(context=context, question=question))

Use this context to answer the question. Keep the answer short and concise with up to three sentences. Say you don't know if you don't know the answer.
This is an example context.
Question: Explain what are Deep Neural Networks in 2-3 sentences.
Helpful Answer:


To combine the LLM with the database, we'll use the `RetrievalQA` chain.
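Conceptually, with `chain_type` left at its default (`"stuff"`), `RetrievalQA` fetches the top chunks from the retriever (four by default for `as_retriever()`), joins their text into the `{context}` slot of the prompt, and sends the filled-in prompt to the LLM in a single call. The sketch below mimics that flow in plain Python. `Doc`, `build_stuffed_prompt` and `answer_question` are hypothetical names, and the two-newline separator between chunks is an assumption modeled on LangChain's default.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (hypothetical)."""
    page_content: str


TEMPLATE = """Use this context to answer the question. Keep the answer short and concise with up to three sentences. Say you don't know if you don't know the answer.
{context}
Question: {question}
Helpful Answer:"""


def build_stuffed_prompt(question, retrieved_docs, template=TEMPLATE):
    # "Stuff" strategy: concatenate all retrieved chunks into one context string
    context = "\n\n".join(doc.page_content for doc in retrieved_docs)
    return template.format(context=context, question=question)


def answer_question(question, retrieve, llm):
    """retrieve: str -> list[Doc]; llm: str -> str (both hypothetical callables)."""
    prompt = build_stuffed_prompt(question, retrieve(question))
    return llm(prompt)


found = [Doc("Instructors:\nChris Manning\nJohn Hewitt"), Doc("Lectures: Tuesday/Thursday")]
print(build_stuffed_prompt("Who is the course instructor?", found))
```

Because everything goes into one prompt, the number and size of retrieved chunks must fit in the model's context window, which is one reason the chunk size chosen earlier matters.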

In [30]:
from langchain.chains import RetrievalQA
chatgpt_qa_chain = RetrievalQA.from_chain_type(chatgpt_llm,
                                               retriever=chatgpt_chroma_db.as_retriever(),
                                               return_source_documents=True,
                                               chain_type_kwargs={"prompt": CHATGPT_QA_CHAIN_PROMPT})

In [31]:
question = "Who is the course instructor?"
result = chatgpt_qa_chain.invoke({"query": question})
result["result"]

'Christopher Manning'

In [32]:
question = "In which lecture are word embeddings described? Please return just lecture number."
result = chatgpt_qa_chain.invoke({"query": question})
print(result["result"])

Lecture 1


In [33]:
question = "How many assignments are there in the course?"
result = chatgpt_qa_chain.invoke({"query": question})
print(result["result"])

There are five assignments in the course.


All the answers are short and crisp. Nothing to complain about, apart from the fact that generating them requires sending our data to the OpenAI API.

### 4.2 Simple RAG with Mistral 7B

We now follow similar steps to build a Q&A chat, but this time with Mistral 7B.

The main difference is that instead of connecting to the OpenAI API, we simply load a trained model checkpoint.

This solution is fully open-source and self-hosted, meaning that at no point are you sending any of your data outside your environment.

We start by loading a model from Hugging Face hub.
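Before running the cell below, it helps to estimate how much memory the weights alone will take. The cell calls `from_pretrained` without a `torch_dtype` argument, which in transformers has traditionally meant float32 weights (4 bytes per parameter), and here they sit in CPU RAM since `device` is `'cpu'`. A back-of-the-envelope sketch, assuming roughly 7.24 billion parameters for Mistral 7B and ignoring activations and the KV cache:

```python
def weights_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for the model weights only, in GiB."""
    return num_params * bytes_per_param / 1024**3


MISTRAL_7B_PARAMS = 7.24e9  # approximate parameter count (assumption)

for dtype_name, nbytes in [("float32", 4), ("float16/bfloat16", 2)]:
    print(f"{dtype_name}: ~{weights_memory_gib(MISTRAL_7B_PARAMS, nbytes):.1f} GiB")
# float32: ~27.0 GiB
# float16/bfloat16: ~13.5 GiB
```

Passing a half-precision dtype such as `torch_dtype=torch.bfloat16` to `from_pretrained` would roughly halve the footprint.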

In [None]:
from langchain_community.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# The Mistral tokenizer has no pad token: pad with EOS during generation and <unk> in the tokenizer
model.generation_config.pad_token_id = model.generation_config.eos_token_id
tokenizer.pad_token = tokenizer.unk_token

generation_config = GenerationConfig.from_pretrained(MODEL_NAME)
generation_config.max_new_tokens = 256
generation_config.temperature = 0.0001  # near-zero temperature: almost deterministic sampling
generation_config.top_p = 0.75
generation_config.do_sample = True
generation_config.repetition_penalty = 1.15  # discourage repeated phrases

text_pipeline = pipeline(
    "text-generation",
    model=model,
    device=device,
    tokenizer=tokenizer,
    generation_config=generation_config,
)

# Wrap the transformers pipeline so LangChain chains can call it like any other LLM
llama_llm = HuggingFacePipeline(pipeline=text_pipeline)

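The generation config above keeps `do_sample=True` but sets `temperature` to 0.0001. Dividing the next-token logits by such a tiny temperature makes the softmax put essentially all probability on the highest-scoring token, so sampling behaves almost like greedy decoding. A small numeric illustration with made-up logits (the helper is a generic temperature-scaled softmax, not transformers' own code):

```python
import numpy as np


def softmax_with_temperature(logits, temperature):
    """Token probabilities after temperature scaling (numerically stable softmax)."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract the max to avoid overflow in exp
    probs = np.exp(scaled)
    return probs / probs.sum()


logits = [2.0, 1.5, 0.3]  # made-up next-token scores
for t in (1.0, 0.0001):
    print(t, np.round(softmax_with_temperature(logits, t), 4))
```

At temperature 1.0 all three tokens keep noticeable probabilities; at 0.0001 the first token gets practically all of the mass.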

We then prepare the prompt template.

In [None]:
llama_template = """<s>[INST] You are a helpful assistant. Your task is to answer the following question using below context. Use up to four sentences. Keep the answer concise. Say you don't know if you don't know the answer.

Context: {context}

Question: {question}[/INST]
</s>"""
print(llama_template.format(context="xxx", question="yyy"))

<s>[INST] You are a helpful assistant. Your task is to answer the following question using below context. Use up to four sentences. Keep the answer concise. Say you don't know if you don't know the answer.

Context: xxx

Question: yyy[/INST]
</s>


In [None]:
from langchain.prompts import PromptTemplate

LLAMA_QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"], template=llama_template)

Then we set up a `RetrievalQA` chain for the Mistral 7B model.

In [None]:
from langchain.chains import RetrievalQA

llama_qa_chain = RetrievalQA.from_chain_type(llama_llm,
                                             retriever=llama_chroma_db.as_retriever(),
                                             return_source_documents=True,
                                             chain_type_kwargs={"prompt": LLAMA_QA_CHAIN_PROMPT})

Finally, we can now query the model about our data.
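Each query cell below extracts the answer with `.split("</s>")[-1].strip()`. This works because the prompt template ends with a literal `</s>` and the text returned by the pipeline apparently still contains the prompt before the generated answer, so everything after the last `</s>` is the answer. A hypothetical `extract_answer` helper keeps that logic in one place. An alternative worth trying, untested here, is passing `return_full_text=False` to the transformers `pipeline` so that only newly generated text is returned.

```python
def extract_answer(generated: str, marker: str = "</s>") -> str:
    """Return the text after the last `marker`, without surrounding whitespace.

    Equivalent to generated.split(marker)[-1].strip(); if the marker is absent,
    the whole string is returned (stripped).
    """
    return generated.rsplit(marker, 1)[-1].strip()


raw = "<s>[INST] ... Question: Who is the course instructor?[/INST]\n</s> The course instructor is Christopher Manning.\n"
print(extract_answer(raw))
# The course instructor is Christopher Manning.
```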

In [47]:
question = "Who is the course instructor?"
result = llama_qa_chain.invoke({"query": question})

In [None]:
print(result["result"].split("</s>")[-1].strip())

The course instructor is Christopher Manning. He is the main instructor for Stanford's CS224N, also known as Ling284, Natural Language Processing with Deep Learning.


In [46]:
question = "In which lecture - give the number and date - are word embeddings described?"
result = llama_qa_chain.invoke({"query": question})

In [None]:
print(result["result"].split("</s>")[-1].strip())

In the first lecture on January 14, 2021, Chris Manning introduces the idea of word embeddings using the word2vec algorithm. This lecture sets the foundation for understanding word representations as vectors.


In [45]:
question = "How many assignments are there in the course?"
result = llama_qa_chain.invoke({"query": question})

In [None]:
print(result["result"].split("</s>")[-1].strip())

There are a total of five assignments in the course, worth 54% of the final grade. The first assignment was due recently, and the second assignment has been released. Late submission of assignments is permitted, but it may lower the overall grade. Collaboration is allowed in study groups, but each student must submit their own assignment. The final project makes up the remaining 43% of the grade.


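We now have answers to our data-specific questions from both the ChatGPT and the Mistral 7B RAGs.

Both answer the core questions correctly. The main difference is style: Mistral 7B's answers are longer, add details that were not asked for, and are not entirely precise. For example, it places the first lecture on January 14, while the syllabus chunk retrieved earlier lists January 14 as the date of "Word Vectors 2 and Word Window Classification", and its "remaining 43%" for the final project does not add up to 100% with the 54% for assignments. Apart from that, a relatively small model like Mistral 7B works very well in this RAG setting.

As a bonus, all the data stays within our environment. Here it does not matter, but imagine building a RAG chat on top of a medical institution's patient data or a consulting firm's contract details. In this respect, the open-source Mistral 7B based solution is simple and secure.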

## 5. Question answering RAG chat - robust case

The last problem we need to address is continuity: subsequent questions should be able to build on previous ones, as in a natural conversation.

### 5.1 Robust RAG with ChatGPT

The problem becomes visible when we ask a follow-up question that implicitly refers to the previous one.

In [34]:
question = "What are their topics? Present them as numbered list."
result = chatgpt_qa_chain.invoke({"query": question})
print(result["result"])

1. Structure of sentences and conveying meaning in human language.
2. Introduction to PyTorch framework for deep learning.
3. Final project choices and considerations.


As we can see, in the follow-up question the LLM lost track of the course assignments mentioned in the previous question. That is expected: the `RetrievalQA` chain is stateless, so each call sees only the current question and the retrieved chunks, with no dialogue memory.

To make the Q&A more natural, we will use LangChain's `ConversationBufferMemory` conversation memory. We will also replace the original `RetrievalQA` with the more complex `ConversationalRetrievalChain`, which handles the conversation history and feeds it to the LLM along with each query.
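In outline, `ConversationalRetrievalChain` handles a follow-up in two LLM steps: if there is chat history, it first asks the LLM to rewrite the follow-up into a standalone question, then retrieves chunks with that standalone question and answers it from the retrieved context, while the memory stores each question and answer. The plain-Python sketch below mimics this flow with fake components. `BufferMemory`, `conversational_turn` and `CONDENSE_TEMPLATE` are hypothetical, and the template wording is only modeled on LangChain's default condense-question prompt.

```python
CONDENSE_TEMPLATE = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""


class BufferMemory:
    """Hypothetical minimal buffer memory: keeps every (question, answer) turn."""

    def __init__(self):
        self.turns = []

    def save(self, question, answer):
        self.turns.append((question, answer))

    def as_text(self):
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.turns)


def conversational_turn(question, memory, llm, retrieve, answer_from_docs):
    """One ConversationalRetrievalChain-style turn (simplified sketch)."""
    if memory.turns:
        # 1. Condense chat history + follow-up into a standalone question
        prompt = CONDENSE_TEMPLATE.format(chat_history=memory.as_text(), question=question)
        standalone = llm(prompt)
    else:
        standalone = question  # first turn: nothing to condense
    # 2. Retrieve with the standalone question, 3. answer from the retrieved chunks
    answer = answer_from_docs(standalone, retrieve(standalone))
    memory.save(question, answer)
    return answer


# Fake components standing in for the LLM and the retriever
def fake_llm(prompt):
    return "What are the topics of the course assignments?"


def fake_retrieve(query):
    return ["Assignment 1 (6%): Introduction to word vectors"]


def fake_answer(query, docs):
    return f"{query} -> {docs[0]}"


demo_memory = BufferMemory()
conversational_turn("How many assignments are there in the course?", demo_memory,
                    fake_llm, fake_retrieve, fake_answer)
second = conversational_turn("What are their topics?", demo_memory,
                             fake_llm, fake_retrieve, fake_answer)
print(second)
# What are the topics of the course assignments? -> Assignment 1 (6%): Introduction to word vectors
```

The key design point is that retrieval runs on the rewritten question, so a vague follow-up like "What are their topics?" still finds the assignment chunks.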

In [35]:
from langchain.memory import ConversationBufferMemory
chatgpt_memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

In [36]:
from langchain.chains import ConversationalRetrievalChain
chatgpt_qa_chain_memory = ConversationalRetrievalChain.from_llm(
    chatgpt_llm,
    retriever=chatgpt_chroma_db.as_retriever(),
    memory=chatgpt_memory,
    combine_docs_chain_kwargs={"prompt": CHATGPT_QA_CHAIN_PROMPT}
)

In [44]:
question = "How many assignments are there in the course?"
result = chatgpt_qa_chain_memory.invoke({"question": question})
print(result['answer'])

In [38]:
question = "What are their topics? Present them as numbered list."
result = chatgpt_qa_chain_memory.invoke({"question": question})
print(result['answer'])

1. Introduction to word vectors
2. Derivatives and implementation of word2vec algorithm
3. Dependency parsing and neural network foundations
4. Neural Machine Translation with sequence-to-sequence, attention, and subwords
5. Self-supervised learning and fine-tuning with Transformers


In [39]:
question = "What is the third one?"
result = chatgpt_qa_chain_memory.invoke({"question": question})
print(result['answer'])

The topic of the third assignment in the course is Dependency parsing and neural network foundations.


Now, with the conversation memory buffer in place, not only are the answers correct, but the follow-up questions no longer lose the context of previous answers, and the conversation stays continuous.

### 5.2 Robust RAG with Mistral 7B

The same problem occurs with the Mistral 7B based RAG: the context of previously asked questions is lost.

In [43]:
question = "What are their topics? Present them as numbered list."
result = llama_qa_chain.invoke({"query": question})

In [None]:
print(result["result"].split("</s>")[-1].strip())

1. Math details of neural network learning, including working out gradients by hand and the backpropagation algorithm
2. Linguistics and natural language processing, specifically dependency parsing
3. Syntactic structure of languages, introducing constituency and dependency
4. Dependency grammars and dependency treebanks
5. Building natural language processing systems, discussing transition-based dependency parsing
6. Developing a simple and highly effective neural dependency parser
7. Assignments, including information about submission deadlines and discount options.


Again, to make the Q&A more natural, we use LangChain's `ConversationBufferMemory` conversation memory and the more complex `ConversationalRetrievalChain`.

In [None]:
from langchain.memory import ConversationBufferMemory
llama_memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

In [None]:
from langchain.chains import ConversationalRetrievalChain
llama_qa_chain_memory = ConversationalRetrievalChain.from_llm(
    llama_llm,
    retriever=llama_chroma_db.as_retriever(),
    memory=llama_memory,
    combine_docs_chain_kwargs={"prompt": LLAMA_QA_CHAIN_PROMPT}
)

In [42]:
question = "How many assignments are there in the course?"
result = llama_qa_chain_memory.invoke({"question": question})

In [None]:
print(result["answer"].split("</s>")[-1].strip())

There are a total of five assignments in the course, worth 54% of the final grade. The first assignment was due recently, and the second assignment has been released. Late submission of assignments is permitted, but it may lower the overall grade. Collaboration is allowed in study groups, but each student must submit their own assignment. The final project makes up the remaining 43% of the grade.


In [41]:
question = "What are their topics? List them as numbered list."
result = llama_qa_chain_memory.invoke({"question": question})

In [None]:
print(result["answer"].split("</s>")[-1].strip())

The five assignments cover various topics related to natural language processing and deep learning. Here's a brief overview of each assignment:

1. Assignment 1: Introduction to word vectors
2. Assignment 2: Derivatives and implementation of word2vec algorithm
3. Assignment 3: Dependency parsing and neural network foundations
4. Assignment 4: Neural Machine Translation with sequence-to-sequence, attention, and subwords
5. Assignment 5: Self-supervised learning and fine-tuning with Transformers

These assignments aim to improve both theoretical understanding and practical skills, with both written questions and programming parts. Office hours are available for assistance, and submissions are made via Gradescope. Late submissions may affect the overall grade, while collaboration is allowed within study groups.


In [40]:
question = "What is the third one?"
result = llama_qa_chain_memory.invoke({"question": question})

In [None]:
print(result["answer"].split("</s>")[-1].strip())

The third assignment is called "Dependency parsing and neural network foundations." It focuses on dependency parsing, which is a common technique used in natural language processing to analyze the grammatical structure of sentences. Additionally, it covers the basics of neural networks and their foundational concepts.


Again, with the conversation memory buffer in place, the answers to the follow-up questions are correct and no longer lose the context.

Both of our text-file Q&A RAG chats are now fully functional and work quite well.

This experiment shows that, with tools like LangChain, a free, open-source, self-hosted model like Mistral 7B can give answers comparable to those of a commercial model like ChatGPT (gpt-3.5-turbo) on this kind of document Q&A task, without sharing your data with any external API.

Of course, this project is very simple. I will probably explore the topic further in a follow-up experiment, applying a similar setup to a more realistic, high-volume and high-complexity body of documents and data. Stay tuned.