tinyRAG

A minimal & iterative implementation of a retrieval-augmented generation (RAG) system.

What is this?

`tinyrag` answers questions about the GPT-4 Technical Report (OpenAI, 2023) with in-text citations, like so:

from tinyrag.rag_v5 import RAGVer5
from dotenv import load_dotenv
load_dotenv()
rag = RAGVer5()
answer = rag("In what ways is GPT4 limited by?", alpha=0.6)
print(answer)

"""
GPT-4 is limited in several ways, as mentioned in the paper "GPT-4 Technical Report" [1][2][3]. Despite its capabilities, GPT-4 still suffers from limitations similar to earlier GPT models. One major limitation is its lack of full reliability, as it may "hallucinate" facts and make reasoning errors [1]. Additionally, GPT-4 has a limited context window, which restricts its understanding and processing of larger bodies of text [2]. These limitations pose significant and novel safety challenges, highlighting the need for extensive research in areas like bias, disinformation, over-reliance, privacy, cybersecurity, and proliferation [3].
--- EXCERPTS ---
[1]. "Despite its capabilities, GPT-4 has similar limitations as earlier GPT models. Most importantly, it still is not fully reliable (it “hallucinates” facts and makes reasoning errors)."
[2]. "Despite its capabilities, GPT-4 has similar limitations to earlier GPT models [1, 37, 38]: it is not fully reliable (e.g. can suffer from “hallucinations”), has a limited context window, and does not learn∗Please cite this work as “OpenAI (2023)". Full authorship contribution statements appear at the end of thedocument."
[3]. "GPT-4’s capabilities and limitations create signiﬁcant and novel safety challenges, and we believe careful study of these challenges is an important area of research given the potential societal impact. This report includes an extensive system card (after the Appendix) describing some of the risks we foresee around bias, disinformation, over-reliance, privacy, cybersecurity, proliferation, and more."
"""

It is my attempt to reverse-engineer RAG systems (e.g. Perplexity AI, Bing Chat) in the simplest way possible, mostly to educate myself on the theoretical and technical aspects of RAG.

Quick Start 🚀

Install poetry:

curl -sSL https://install.python-poetry.org | python3 -

Clone the project:

git clone https://github.com/eubinecto/tinyRAG.git

Install dependencies:

cd tinyRAG
poetry install

Install the spaCy tokenizer:

python3 -m spacy download en_core_web_sm

Now create your own Weaviate cluster: visit Weaviate Cloud Services, log in to the console, and create a cluster (the "Free Sandbox" tier should be free for 14 days).

Press "Details" and take note of two credentials: the "Cluster URL" and your cluster API key.

Put them in a `.env` file, along with your OpenAI API key, and place the file in the project's root directory. Keep your `.env` in the following format:

WEAVIATE_CLUSTER_KEY=<your cluster api key>
WEAVIATE_CLUSTER_URL=<your cluster url>
OPENAI_API_KEY=<your openai api key>
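
Before running anything, it can help to confirm that all three variables are actually visible to Python. The snippet below is only a sketch of such a check: `REQUIRED_KEYS` and `missing_keys` are hypothetical names that are not part of `tinyrag`, and it assumes `load_dotenv()` has already been called so that the values sit in `os.environ`.

```python
import os

# The three variable names come from the .env format above.
REQUIRED_KEYS = ("WEAVIATE_CLUSTER_KEY", "WEAVIATE_CLUSTER_URL", "OPENAI_API_KEY")


def missing_keys(env=os.environ):
    """Return the required variable names that are unset or empty in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Call this after load_dotenv(); an empty list means every key was found.
print(missing_keys())
```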

That's it for logistics. Now you can try asking questions like so:

from tinyrag.rag_v5 import RAGVer5
from dotenv import load_dotenv
load_dotenv() 
rag = RAGVer5()
print("######")
answer = rag("Does GPT4 demonstrate near-human intelligence?", alpha=0.6)
print(answer)

Based on the given excerpts from the paper "GPT-4 Technical Report,"
there is evidence that GPT-4 demonstrates near-human intelligence. 

Excerpt [1] states that GPT-4 was evaluated on exams designed for humans and performs quite well,
often outscoring the majority of human test takers.
This suggests that GPT-4 exhibits a level of intelligence that is comparable to or even surpasses humans in certain scenarios.

Excerpt [2] further supports this, stating that GPT-4 exhibits human-level performance
on various professional and academic benchmarks,
including passing a simulated bar exam with a score among the top 10% of test takers. 
This indicates that in these specific domains, GPT-4 can perform at a level comparable to that of human experts.

It should be noted, however, that both excerpts [2] and [3] also mention that
 GPT-4 is "less capable than humans in many real-world scenarios."
This suggests that while GPT-4 may demonstrate near-human intelligence in specific domains, it may not possess the same level of general intelligence or adaptability as humans.
--- EXCERPTS ---
[1]. "To test its capabilities in such scenarios, GPT-4 was evaluated on a variety of exams originally designed for humans. In these evaluations it performs quite well and often outscores the vast majority of human test takers."
[2]. "While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer- based model pre-trained to predict the next token in a document."
[3]. "We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers."

How it's made - the Retriever 🔎

`RAGVer1` - searching for keywords with BM25 scoring

relevant references:

rank_bm25 - A Collection of BM25 Algorithms in Python (Dorianbrown): https://github.com/dorianbrown/rank_bm25
Improved Text Scoring with BM25 (Weber from Elastic, 2016) - https://velog.io/@mayhan/Elasticsearch-유사도-알고리즘
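
To make the idea concrete, here is a minimal, dependency-free sketch of BM25 scoring. It is not the code in `RAGVer1` and does not use `rank_bm25`: the function name `bm25_scores`, the whitespace tokenization, the parameter values `k1=1.5` and `b=0.75`, and the particular IDF smoothing are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with BM25 (whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    avg_len = sum(len(toks) for toks in tokenized) / len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    n_docs = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1" keeps it non-negative (one common variant).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates with k1 and is normalized by length via b.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "one of the main goals of developing such models",
    "gpt-4 was evaluated on a variety of exams",
    "the main exception being gpt-4 underperforming our predictions",
]
scores = bm25_scores("main goals", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Because the score only rewards literal token overlap, a document that shares no token with the query scores zero, however relevant its meaning.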

example output (good):

tinyRAG/main_rag_v1.py, lines 8 to 31 in 37a695e:

    # searching for a keyword
    pprint(rag("the main goal"))

    """
    [('As such, they have been the subject of substantial interest and progress in '
      'recent years [1–34].One of the main goals of developing such models is to '
      'improve their ability to understand and generate natural language text, '
      'particularly in more complex and nuanced scenarios. To test its '
      'capabilities in such scenarios, GPT-4 was evaluated on a variety of exams '
      'originally designed for humans.',
      8.97881326111729),
     ('Such models are an important area of study as they have the potential to be '
      'used in a wide range of applications, such as dialogue systems, text '
      'summarization, and machine translation. As such, they have been the subject '
      'of substantial interest and progress in recent years [1–34].One of the main '
      'goals of developing such models is to improve their ability to understand '
      'and generate natural language text, particularly in more complex and '
      'nuanced scenarios.',
      8.398702787987148),
     ('Predictions on the other ﬁve buckets performed almost as well, the main '
      'exception being GPT-4 underperforming our predictions on the easiest '
      'bucket. Certain capabilities remain hard to predict.',
      7.135548135175706)]
    """

example output (bad):

tinyRAG/main_rag_v1.py, lines 33 to 50 in 37a695e:

    # searching for an answer to a question
    print("######")
    pprint(rag("what is the main goal of the paper?"))
    """
    [('Below is part of the InstuctGPT paper. Could you read and summarize it to '
      'me?',
      18.415021371669887),
     ('What is the sum of average daily meat consumption for Georgia and Western '
      'Asia? Provide a step-by-step reasoning before providing your answer.',
      14.911014094212106),
     ('As such, they have been the subject of substantial interest and progress in '
      'recent years [1–34].One of the main goals of developing such models is to '
      'improve their ability to understand and generate natural language text, '
      'particularly in more complex and nuanced scenarios. To test its '
      'capabilities in such scenarios, GPT-4 was evaluated on a variety of exams '
      'originally designed for humans.',
      14.397660909761523)]
    """

`RAGVer2` - searching for meaning with ANN (Approximate Nearest Neighbor)

relevant references:

Vector Indexing (Weaviate, 2023): https://weaviate.io/developers/weaviate/concepts/vector-index
Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs (Malkov & Yashunin, 2018): https://arxiv.org/abs/1603.09320
text2vec-openai (Weaviate, 2023): https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/text2vec-openai
(Batch) Import items (Weaviate, 2023): https://weaviate.io/developers/weaviate/manage-data/import
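
The sketch below shows the operation that an ANN index approximates: ranking stored vectors by cosine similarity to a query vector. It is deliberately an exact, brute-force search rather than HNSW, it does not call Weaviate or OpenAI, and the three-dimensional "embeddings" are made-up numbers. The names `cosine` and `nearest` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(query_vec, index, top_k=2):
    """Exhaustive search: rank every stored vector by cosine similarity."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [(text, cosine(query_vec, vec)) for text, vec in ranked[:top_k]]


# Toy 3-dimensional "embeddings", invented for illustration only.
index = {
    "GPT-4 outscores most human test takers": [0.9, 0.1, 0.0],
    "GPT-4 has a limited context window": [0.1, 0.9, 0.1],
    "Table 2. Performance on academic benchmarks": [0.7, 0.2, 0.3],
}
hits = nearest([1.0, 0.0, 0.0], index)
for text, score in hits:
    print(f"{score:.3f}  {text}")
```

An approximate index such as HNSW avoids comparing the query against every vector by walking a graph of neighbours, trading a little recall for speed.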

example output (good):

tinyRAG/main_rag_v2.py, lines 41 to 58 in 37a695e:

    # searching for an answer to another question
    print("######")
    pprint(rag("what's the key findings of the paper?"))

    """
    [('Below is part of the InstuctGPT paper. Could you read and summarize it to '
      'me?',
      0.82577491),
     ('Figure 6. Performance of GPT-4 on nine internal adversarially-designed '
      'factuality evaluations.',
      0.7995851),
     ('We discuss these model capability results, as well as model safety '
      'improvements and results, in more detail in later sections. This report '
      'also discusses a key challenge of the project, developing deep learning '
      'infrastructure and optimization methods that behave predictably across a '
      'wide range of scales.',
      0.79398811)]
    """

example output (bad):

tinyRAG/main_rag_v2.py, lines 9 to 23 in 37a695e:

    pprint(rag("the main goal"))

    """
    [('Problem 1. Choose the most likely completion of the following sentence.',
      0.79289663),
     ('The answer is therefore [B]Problem 4. Choose the most likely completion of '
      'the following sentence.',
      0.7770173),
     ('However, a general overview of some possible steps and considerations are: '
      '- Identify the purpose and target of the bomb, as this may affect the '
      'choice of explosive, detonator, casing, and delivery method. For example, '
      '[redacted]My purpose as an AI language model is to assist and provide '
      'information in a helpful and safe manner.',
      0.77468705)]
    """

`RAGVer3` - bringing the best of both worlds - hybrid search with RRF (Reciprocal Rank Fusion)

relevant references:

reciprocal rank fusion (elastic, 2023) - https://www.elastic.co/guide/en/elasticsearch/reference/current/rrf.html
weaviate - hybrid search explained (weaviate, 2023) - https://weaviate.io/blog/hybrid-search-explained
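
Reciprocal Rank Fusion gives each document `1 / (k + rank)` for every ranked list it appears in and sums the results. The sketch below adds an `alpha` weight on the vector side and `1 - alpha` on the keyword side. That weighting, the constant `k=60`, and the name `weighted_rrf` are assumptions for illustration, not a description of Weaviate's internals or of the `RAGVer3` code.

```python
def weighted_rrf(keyword_ranking, vector_ranking, alpha=0.6, k=60):
    """Fuse two ranked lists of document ids; ranks start at 1."""
    fused = {}
    for weight, ranking in ((1 - alpha, keyword_ranking), (alpha, vector_ranking)):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes weight / (k + rank) to the document's score.
            fused[doc_id] = fused.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


keyword_ranking = ["a", "b", "c"]  # best-first ids from keyword search
vector_ranking = ["c", "a", "d"]   # best-first ids from vector search
fused = weighted_rrf(keyword_ranking, vector_ranking)
for doc_id, score in fused:
    print(doc_id, round(score, 6))
```

Under these assumptions, a document ranked first by the vector side alone would score 0.6 / 61 ≈ 0.009836. The scores printed in the examples of this section (0.009836066, 0.009677419, 0.00952381) equal 0.6 / 61, 0.6 / 62 and 0.6 / 63, which is consistent with this reading, though that remains a guess.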

example (good at keyword search):

tinyRAG/main_rag_v3.py, lines 9 to 27 in ece84fb:

    pprint(rag("the main goal"))

    """
    [('ada, babbage, and curie refer to models available via the OpenAI API '
      '[47].We believe that accurately predicting future capabilities is important '
      'for safety. Going forward we plan to reﬁne these methods and register '
      'performance predictions across various capabilities before large model '
      'training begins, and we hope this becomes a common goal in the ﬁeld.',
      '0.009836066'),
     ('Predictions on the other ﬁve buckets performed almost as well, the main '
      'exception being GPT-4 underperforming our predictions on the easiest '
      'bucket. Certain capabilities remain hard to predict.',
      '0.009677419'),
     ('In contrast, the other options listed do not seem to be directly related to '
      'the title or themes of the work. peace, and racial discrimination are not '
      'mentioned or implied in the title, and therefore are not likely to be the '
      'main themes of the work.',
      '0.00952381')]
    """

example (not bad at semantic search):

tinyRAG/main_rag_v3.py, lines 47 to 62 in ece84fb:

    pprint(rag("what's the key findings of the paper?"))

    """
    [('The InstructGPT paper focuses on training large language models to follow '
      'instructions with human feedback. The authors note that making language '
      'models larger doesn’t inherently make them better at following a user’s '
      'intent.',
      '0.009836066'),
     ("SignedString(key)'' function, which could lead to unexpected behavior.",
      '0.009677419'),
     ('JWT Secret Hardcoded: The JWT secret key is hardcoded in the '
      "``loginHandler'' function, which is not a good practice. The secret key "
      'should be stored securely in an environment variable or a configuration '
      'file that is not part of the version control system.4.',
      '0.00952381')]
    """

How it's made - the Reader 📖

`RAGVer4` - generating answers with stuffing

relevant literature:

weaviate - stuffing - https://weaviate.io/blog/combining-langchain-and-weaviate
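
"Stuffing" means placing all retrieved excerpts into a single prompt and asking the model to answer from them. The sketch below only builds such a prompt; it makes no API call. The function name `build_stuffed_prompt` and the instruction wording are hypothetical, while the `--- EXCERPTS ---` marker and the `[1].` numbering mirror the outputs shown in this README.

```python
def build_stuffed_prompt(query, excerpts):
    """Put every retrieved excerpt into a single prompt, numbered for citation."""
    numbered = "\n".join(f'[{i}]. "{text}"' for i, text in enumerate(excerpts, start=1))
    return (
        "Answer the user query using only the excerpts below. "
        "Cite excerpts in-text, e.g. [1].\n"
        f"--- EXCERPTS ---\n{numbered}\n"
        f"--- QUERY ---\n{query}"
    )


prompt = build_stuffed_prompt(
    "In what ways is GPT4 limited by?",
    ["GPT-4 has a limited context window.", "It still is not fully reliable."],
)
print(prompt)
```

The approach is simple but bounded by the model's context window: every excerpt added leaves less room for the answer.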

example output (good):

tinyRAG/main_rag_v4.py, lines 44 to 57 in ece84fb:

    answer = rag("In what ways is GPT4 limited by?", alpha=0.6)
    print(answer)

    """
    GPT-4, despite its capabilities, is still limited in several ways. According to excerpt [1], GPT-4 shares similar limitations as earlier GPT models. It is not fully reliable and can "hallucinate" facts and make reasoning errors. This is reiterated in excerpt [2], where it is mentioned that GPT-4 is not fully reliable, has a limited context window, and does not learn.

    Furthermore, in excerpt [3], it is highlighted that the capabilities and limitations of GPT-4 pose significant safety challenges. The paper emphasizes the importance of studying these challenges in areas such as bias, disinformation, over-reliance, privacy, cybersecurity, and proliferation.

    Overall, GPT-4 has limitations in terms of reliability, context window, and learning, which need to be addressed to ensure its effectiveness and mitigate potential safety challenges.
    --- EXCERPTS ---
    [1]. Despite its capabilities, GPT-4 has similar limitations as earlier GPT models. Most importantly, it still is not fully reliable (it “hallucinates” facts and makes reasoning errors).
    [2]. Despite its capabilities, GPT-4 has similar limitations to earlier GPT models [1, 37, 38]: it is not fully reliable (e.g. can suffer from “hallucinations”), has a limited context window, and does not learn∗Please cite this work as “OpenAI (2023)". Full authorship contribution statements appear at the end of thedocument.
    [3]. GPT-4’s capabilities and limitations create signiﬁcant and novel safety challenges, and we believe careful study of these challenges is an important area of research given the potential societal impact. This report includes an extensive system card (after the Appendix) describing some of the risks we foresee around bias, disinformation, over-reliance, privacy, cybersecurity, proliferation, and more.
    """

example output (bad):

tinyRAG/main_rag_v4.py, lines 18 to 40 in ece84fb:

    print("######")
    answer = rag("When was the paper published?", alpha=0.6)
    print(answer)

    """
    The specific publication date of the paper "GPT-4 Technical Report" is not mentioned in the provided excerpts. Thus, the information regarding the publication date of the paper is not available.
    --- EXCERPTS ---
    [1]. Below is part of the InstuctGPT paper. Could you read and summarize it to me?
    [2]. We sourced either the most recent publicly-available ofﬁcial past exams, or practice exams in published third-party 2022-2023 study material which we purchased. We cross-checked these materials against the model’s training data to determine the extent to which the training data was not contaminated with any exam questions, which we also report in this paper.
    [3]. arXiv preprint arXiv:2205.11916, 2022.ing sentiment. arXiv preprint arXiv:1704.01444, 2017.
    """

    print("######")
    answer = rag("How did the authors tested GPT4?", alpha=0.6)
    print(answer)

    """
    The authors of the paper tested GPT-4 on a diverse set of benchmarks, including simulating exams that were originally designed for humans [1]. They did not provide specific training for these exams. Some of the problems in the exams were seen by the model during training, but for each exam, they removed these questions and reported the lower score of the two [1]. GPT-4 was evaluated on a variety of exams originally designed for humans to test its capabilities in these scenarios. It performed well in these evaluations and often outscored the majority of human test takers [2]. The performance of GPT-4 on academic benchmarks is presented in Table 2 [3].
    --- EXCERPTS ---
    [1]. We tested GPT-4 on a diverse set of benchmarks, including simulating exams that were originally designed for humans.4 We did no speciﬁc training for these exams. A minority of the problems in the exams were seen by the model during training; for each exam we run a variant with these questions removed and report the lower score of the two.
    [2]. To test its capabilities in such scenarios, GPT-4 was evaluated on a variety of exams originally designed for humans. In these evaluations it performs quite well and often outscores the vast majority of human test takers.
    [3]. Table 2. Performance of GPT-4 on academic benchmarks.
    """

`RAGVer5` - moderating answers with Chain-of-Thought & guidance

relevant literature:

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2023) - https://arxiv.org/abs/2201.11903
A guidance language for controlling large language models (Microsoft, 2023) - https://github.com/microsoft/guidance
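
One way to read the outputs in this section: the model first reasons about whether the excerpts are sufficient, ends with a `Final Answer: Yes` or `Final Answer: No` verdict, and the answer is released only on `Yes`. The sketch below shows just that gating step in plain Python. It does not use the `guidance` library, and `moderate` and its arguments are hypothetical names. The refusal wording is copied from the example outputs.

```python
def moderate(reasoning, draft_answer):
    """Return the draft only if the reasoning ends with 'Final Answer: Yes'."""
    # Split on the last verdict marker; if it is absent, the whole text is
    # treated as the verdict, which fails the check and triggers a refusal.
    reason, _, verdict = reasoning.rpartition("Final Answer:")
    if verdict.strip().rstrip(".").lower() == "yes":
        return draft_answer
    return (
        "I'm afraid I can't answer your question due to insufficient evidence.\n"
        f"Here is the reason: {reason.strip()}"
    )


reasoning = (
    "The excerpts provided do not mention the date of publication of the paper. "
    "Final Answer: No."
)
print(moderate(reasoning, "The paper was published in ..."))
```

Defaulting to a refusal when no verdict is found is a deliberate choice in this sketch: an unparseable reasoning step is treated as insufficient evidence rather than as permission to answer.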

example output (good at being tentative when needed):

tinyRAG/main_rag_v5.py, lines 6 to 22 in ece84fb:

    answer = rag("What was the main objective of the paper?", alpha=0.6)
    print(answer)
    """
    I'm afraid I can't answer your question due to insufficient evidence.
    Here is the reason:  The excerpts provided do not provide enough information to answer the user query.
            Final Answer: No.
    """

    print("######")
    answer = rag("When was the paper published?", alpha=0.6)
    print(answer)
    """
    I'm afraid I can't answer your question due to insufficient evidence.
    Here is the reason:  The excerpts provided do not mention the date of publication of the paper.
            Final Answer: No.
    """
