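# The Power of Retrieval-Augmented Generation: Comparing RAG with Fine-Tuned LLMs
Here we explore how to use RAG to make an LLM accurately answer questions about recent OpenAI news, without fine-tuning the model on that data.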

### Import dependencies

In [1]:
from langchain.docstore.document import Document
from langchain.document_loaders import HuggingFaceDatasetLoader

from encoder.encoder import Encoder
from generator.generator import Generator
from retriever.vector_db import VectorDatabase

### Define global variables

In [2]:
TEMPLATE = """
Use the following pieces of context to answer the question at the end taking in consideration the dates. 
{context}
Question: {question}
Answer:
"""

QUERY = "What happened to the CEO of OpenAI?"
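
The `Generator` class is not shown in this notebook, so how it fills `TEMPLATE` is an assumption. A minimal sketch, assuming plain `str.format` substitution of the two placeholders (the helper `build_prompt` and the name `PROMPT_TEMPLATE` are hypothetical and only used for illustration):

```python
# Hypothetical illustration: shows how the {context} and {question}
# placeholders could be filled. The real Generator may work differently.
PROMPT_TEMPLATE = """
Use the following pieces of context to answer the question at the end taking in consideration the dates.
{context}
Question: {question}
Answer:
"""


def build_prompt(context: str, question: str, template: str = PROMPT_TEMPLATE) -> str:
    """Substitute the retrieved context and the user question into the template."""
    return template.format(context=context, question=question)


example_prompt = build_prompt(
    context="2023-11-22 - Sam Altman returns to OpenAI as CEO.",
    question="What happened to the CEO of OpenAI?",
)
print(example_prompt)
```

Passing an empty string as `context` yields a prompt that contains only the instruction and the question, which is how the base-model comparison later in the notebook is run.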

### Load and preprocess the dataset

In [3]:
# Collect some OpenAI news to add to the final dataset
openai_news = [
    "2023-11-22 - Sam Altman returns to OpenAI as CEO with a new initial board of Bret Taylor (Chair), Larry Summers, and Adam D'Angelo.",
    "2023-11-21 - Ilya and the board's decision to fire Sam from OpenAI caught everyone off guard, with no prior information shared.",
    "2023-11-21 - In a swift response, Sam was welcomed into Microsoft by Satya Nadella himself.",
    "2023-11-21 - Meanwhile, a staggering 500+ OpenAI employees made a bold move, confronting the board with a letter: either step down or they will defect to Sam's new team at Microsoft.",
    "2023-11-21 - In a jaw-dropping twist, Ilya, integral to Sam's firing, also put his name on that very letter. Talk about an unexpected turn of events!",
    "2023-11-20 - BREAKING: Sam Altman and Greg Brockman Join Microsoft, Emmett Shear Appointed CEO of OpenAI",
    "2023-11-20 - Microsoft CEO Satya Nadella announced a major shift in their partnership with OpenAI. Sam Altman and Greg Brockman, key figures at OpenAI, are now joining Microsoft to lead a new AI research team. This move marks a significant collaboration and potential for AI advancements. Additionally, Emmett Shear, former CEO of Twitch, has been appointed as the new CEO of OpenAI, signaling a new chapter in AI leadership and innovation.",
    "2023-11-20 - Leadership Shakeup at OpenAI - Sam Altman Steps Down!",
    "2023-11-20 - Just a few days after presenting at OpenAI's DevDay, CEO Sam Altman has unexpectedly departed from the company, and Mira Murati, CTO of the company, steps in as Interim CEO. This is a huge surprise and speaks volumes about the dynamic shifts in tech leadership today.",
    """2023-11-20 - What's Happening at OpenAI?
    - Sam Altman, the face of OpenAI, is leaving not just the CEO role but also the board of directors.
    - Mira Murati, an integral part of OpenAI's journey and a tech visionary, is taking the helm as interim CEO.
    - The board is now on a quest to find a permanent successor.""",
    "2023-11-20 - The transition raises questions about the future direction of OpenAI, especially after the board's statement about losing confidence in Altman's leadership.",
    """2023-11-20 - With a board consisting of AI and tech experts like Ilya Sutskever, Adam D’Angelo, Tasha McCauley, and Helen Toner, OpenAI is poised to continue its mission. Can they do it without Sam?
    - Greg Brockman, stepping down as chairman, will still play a crucial role, reporting to the new CEO."""
]

In [4]:
# load dataset with some news
loader = HuggingFaceDatasetLoader("cnn_dailymail", "highlights", name='3.0.0')
docs = loader.load()[:10000] # get a sample of news

# add openai news to our list of docs
docs.extend([
    Document(page_content=x) for x in openai_news
])



### Initialize the RAG modules

In [5]:
# Instantiate our Encoder, Retriever and Generator classes
encoder = Encoder()
faiss_db = VectorDatabase()
generator = Generator(TEMPLATE)

llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /Users/minp/AI_LLM/large-language-models/rag/../model/nous-hermes-llama-2-7b.Q4_0.gguf (version GGUF V2)

### Create passages and store them in the vector database

In [7]:
# Create passages and store them in a vector DB
passages = faiss_db.create_passages_from_documents(docs)
faiss_db.store_passages_db(passages, encoder.encoder)

### Retrieve the most similar documents

In [8]:
# Retrieve the documents most similar to our query
context = faiss_db.retrieve_most_similar_document(QUERY, k=8)
print(context)

2023-11-20 - What's Happening at OpenAI?
    - Sam Altman, the face of OpenAI, is leaving not just the CEO role but also the board of directors.
    - Mira Murati, an integral part of OpenAI's journey and a tech visionary, is taking the helm as interim CEO.
    - The board is now on a quest to find a permanent successor.
2023-11-20 - Microsoft CEO Satya Nadella announced a major shift in their partnership with OpenAI. Sam Altman and Greg Brockman, key figures at OpenAI, are now joining Microsoft to lead a new AI research team. This move marks a significant collaboration and potential for AI advancements. Additionally, Emmett Shear, former CEO of Twitch, has been appointed as the new CEO of OpenAI, signaling a new chapter in AI leadership and innovation.
2023-11-20 - With a board consisting of AI and tech experts like Ilya Sutskever, Adam D’Angelo, Tasha McCauley, and Helen Toner, OpenAI is poised to continue its mission. Can they do it without Sam?
    - Greg Brockman, stepping down as
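
The internals of `VectorDatabase` and `Encoder` are not shown here. As a conceptual sketch only, similarity-based retrieval can be illustrated with a toy bag-of-words "embedding" and cosine similarity. This is an assumption made for illustration: the notebook's encoder presumably produces dense vectors and the search is handled by the vector store, and the names `embed`, `cosine` and `retrieve_top_k` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real encoder returns dense vectors)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages whose toy embeddings are closest to the query."""
    query_vec = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(query_vec, embed(p)), reverse=True)
    return ranked[:k]


toy_passages = [
    "Emmett Shear appointed CEO of OpenAI",
    "Heavy rain expected across the coast this weekend",
    "Sam Altman returns to OpenAI as CEO",
]
top_passages = retrieve_top_k("What happened to the CEO of OpenAI?", toy_passages, k=2)
print("\n".join(top_passages))
```

The two OpenAI passages rank above the unrelated weather passage because they share more terms with the query. A dense encoder plays the same role but can also match passages that use different wording.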

In [9]:
# Llama with retrieved context (RAG)
print(generator.get_answer(context, QUERY))

Sam Altman first stepped down from his role at OpenAI in November 2021 due to personal reasons. In April 2023, he returned to the company as its new CEO with a new initial board of Bret Taylor (Chair), Larry Summers, and Adam D'Angelo.



llama_print_timings:        load time =    5149.68 ms
llama_print_timings:      sample time =       5.55 ms /    66 runs   (    0.08 ms per token, 11891.89 tokens per second)
llama_print_timings: prompt eval time =   30928.97 ms /   600 tokens (   51.55 ms per token,    19.40 tokens per second)
llama_print_timings:        eval time =    4039.96 ms /    66 runs   (   61.21 ms per token,    16.34 tokens per second)
llama_print_timings:       total time =   35149.65 ms


In [10]:
# Base Llama (no retrieved context)
print(generator.get_answer('', QUERY))

Llama.generate: prefix-match hit


The CEO of OpenAI is Sam Altman, and as per our knowledge, there has been no recent news or reports about any changes or departures in his role within the company or any plans for him to leave the position. Therefore, it's hard to determine what exactly happened without any additional information provided by a reliable source. Could you please provide us with more details so that we can assist you better?



llama_print_timings:        load time =    5149.68 ms
llama_print_timings:      sample time =       7.44 ms /    85 runs   (    0.09 ms per token, 11423.20 tokens per second)
llama_print_timings: prompt eval time =     755.74 ms /    16 tokens (   47.23 ms per token,    21.17 tokens per second)
llama_print_timings:        eval time =    4423.77 ms /    85 runs   (   52.04 ms per token,    19.21 tokens per second)
llama_print_timings:       total time =    5306.89 ms
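
The two timing logs also show what the retrieved context costs: the RAG prompt is 600 tokens long, against 16 tokens for the bare question, so most of the RAG call is spent evaluating the prompt. The sketch below recomputes throughput and the prompt-evaluation share from the logged figures. The numbers are copied from the `llama_print_timings` output above; the dictionary layout and helper names are illustrative.

```python
# Figures copied from the llama_print_timings output of the two runs.
runs = {
    "rag": {"prompt_ms": 30928.97, "prompt_tokens": 600,
            "eval_ms": 4039.96, "eval_tokens": 66, "total_ms": 35149.65},
    "base": {"prompt_ms": 755.74, "prompt_tokens": 16,
             "eval_ms": 4423.77, "eval_tokens": 85, "total_ms": 5306.89},
}


def tokens_per_second(tokens: int, elapsed_ms: float) -> float:
    """Convert a token count and an elapsed time in milliseconds to tokens/s."""
    return tokens / (elapsed_ms / 1000.0)


def prompt_share(run: dict) -> float:
    """Fraction of the total wall time spent evaluating the prompt."""
    return run["prompt_ms"] / run["total_ms"]


for name, run in runs.items():
    prompt_tps = tokens_per_second(run["prompt_tokens"], run["prompt_ms"])
    eval_tps = tokens_per_second(run["eval_tokens"], run["eval_ms"])
    print(f"{name}: prompt {prompt_tps:.2f} tok/s, "
          f"generation {eval_tps:.2f} tok/s, "
          f"prompt share of total {prompt_share(run):.0%}")
```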
