### 1. Fetching web page content with BeautifulSoup (bs4)

In [55]:
from dotenv import load_dotenv
from langchain_community.document_loaders import WebBaseLoader
import nest_asyncio

# Allow nested event loops so the async fetch can run inside Jupyter
nest_asyncio.apply()
# Load environment variables from a local .env file, if present
load_dotenv()

urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]
loader = WebBaseLoader(urls)
loader.requests_per_second = 1  # limit the request rate
docs = loader.aload()  # fetch the pages asynchronously

# Keep only the raw text of each page
texts = [data.page_content for data in docs]

Fetching pages: 100%|##########| 3/3 [00:00<00:00,  5.89it/s]
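
Text extracted from HTML usually carries long runs of blank lines left over from the page layout, and those runs end up in the chunks produced later. A minimal sketch of a cleanup step that could run before splitting; `normalize_whitespace` is a hypothetical helper, not part of LangChain, and the sample string is made up to resemble the page header.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse layout whitespace left over from HTML extraction."""
    # Drop trailing spaces and tabs at the end of each line
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Collapse runs of three or more newlines into a single blank line
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Collapse runs of spaces and tabs into one space
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()


# Made-up sample resembling the top of a fetched page
sample = "LLM Powered Autonomous Agents | Lil'Log\n\n\n\n\n\nLil'Log\n\n\n\nPosts"
print(repr(normalize_whitespace(sample)))
```

Under that assumption, the list of page texts could be built as `[normalize_whitespace(d.page_content) for d in docs]`.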


### 2. Splitting the web page content with a text splitter

In [56]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=50,
    length_function=len,
    is_separator_regex=False,
)
texts = text_splitter.create_documents(texts)
print(texts[0])
print(texts[1])

page_content="LLM Powered Autonomous Agents | Lil'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nLil'Log"
page_content="Lil'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nPosts\n\n\n\n\nArchive\n\n\n\n\nSearch"
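
To see what `chunk_size=100` and `chunk_overlap=50` mean, here is a simplified sliding-window sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter first breaks on separators and then merges the pieces), so real chunks will not line up exactly like this. `sliding_chunks` is a hypothetical helper written only for illustration.

```python
def sliding_chunks(text: str, chunk_size: int = 100, chunk_overlap: int = 50) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# 230 characters of dummy text
text = "".join(str(i % 10) for i in range(230))
chunks = sliding_chunks(text)
print(len(chunks), [len(c) for c in chunks])
# The last 50 characters of one chunk are the first 50 of the next
print(chunks[0][50:] == chunks[1][:50])
```

With an overlap of half the chunk size, every character (apart from the edges) lands in two chunks, which roughly doubles the number of stored vectors.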


### 3. Embedding into a vector store

In [57]:
import os
from langchain_openai import OpenAIEmbeddings
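with open("../api_key.txt", 'r') as file:
    # Read the entire file and drop surrounding whitespace (e.g. a trailing newline)
    api_key = file.read().strip()

# OpenAI embedding model (created here, but the Chroma store below uses the open-source model)
embeddings_model = OpenAIEmbeddings(openai_api_key=api_key)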

from langchain_chroma import Chroma
from langchain_community.embeddings.sentence_transformer import (
    SentenceTransformerEmbeddings,
)

# create the open-source embedding function
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

# load it into Chroma
db = Chroma.from_documents(texts, embedding_function)

# query it
query = "agent memory"
docs = db.similarity_search(query)

# print results
print(docs[0].page_content)



Long-term memory: This provides the agent with the capability to retain and recall (infinite)
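
`similarity_search` embeds the query and returns the stored chunks whose vectors are closest to it. A minimal sketch of that ranking on toy 2-D vectors, assuming cosine similarity as the metric (the metric Chroma actually applies depends on how the collection is configured). `cosine_similarity` and `top_k` are hypothetical helpers, and the vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return (score, index) pairs for the k most similar vectors."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


# Toy embeddings: index 0 points the same way as the query, index 1 is orthogonal
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vec = [1.0, 0.0]
for score, idx in top_k(query_vec, doc_vecs):
    print(idx, round(score, 3))
```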


### 4. Computing similarity

In [76]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder, PromptTemplate

retriever = db.as_retriever()
model = ChatOpenAI(model="gpt-3.5-turbo-0125",openai_api_key=api_key)
template = """
        You are an AI system that evaluates the relevance between a user query and a retrieved chunk of text. Your task is to determine if the retrieved chunk is relevant to the user's query based on the following criteria:

        1. **Keyword Matching**: Check if the key terms in the user query appear in the retrieved chunk.
        2. **Semantic Similarity**: Assess whether the meaning of the retrieved chunk aligns with the intent of the user query, even if exact keywords are not present.
        3. **Context Appropriateness**: Ensure the context of the retrieved chunk makes sense in relation to the user's query.

        Provide a relevance score between 0 and 1, where 0 means completely irrelevant and 1 means highly relevant. Additionally, provide a brief explanation for your score.
        If the relevance score is 0.5 or higher, output [relevance: yes]; otherwise, output [relevance: no].
        
        ### Example:

        **User Query**: "What are the health benefits of green tea?"

        **Retrieved Chunk**: "Green tea is known for its high antioxidant content, which can help reduce inflammation and improve heart health."

        **Relevance Score**: 1.0

        **Explanation**: The retrieved chunk directly addresses the health benefits of green tea by mentioning its antioxidant content and positive effects on inflammation and heart health.

        Now, evaluate the following pair:

        **User Query**: {user_query}

        **Retrieved Chunk**: {context}

        **Relevance Score**:

        **Explanation**:
    """
prompt = ChatPromptTemplate.from_template(template)


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "user_query": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

user_query = "agent memory"
# user_query = "I like an apple"

result = rag_chain.invoke(user_query)

print(result)

**Relevance Score**: 0.7

**Explanation**: The retrieved chunk discusses long-term memory, which is related to memory, one of the key terms in the user query. While the chunk doesn't directly mention "agent memory," it does cover the concept of memory retention and recall, which aligns with the user query's topic. However, the repetition of the same sentence multiple times may reduce the overall relevance score. 

[relevance: yes]
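
The chain returns free text, so downstream code has to pull the score and the yes/no tag back out of it. A minimal sketch using regular expressions, assuming the model keeps the `**Relevance Score**` and `[relevance: ...]` layout shown above; `parse_relevance` is a hypothetical helper, and falling back to `score >= threshold` when the tag is missing is an assumption about how ties at 0.5 should be handled.

```python
import re


def parse_relevance(output: str, threshold: float = 0.5) -> dict:
    """Extract the numeric score and the yes/no verdict from the model output."""
    score_match = re.search(r"Relevance Score\W*(\d(?:\.\d+)?)", output, re.IGNORECASE)
    tag_match = re.search(r"\[relevance:\s*(yes|no)\]", output, re.IGNORECASE)

    score = float(score_match.group(1)) if score_match else None
    if tag_match:
        relevant = tag_match.group(1).lower() == "yes"
    elif score is not None:
        # Assumption: without an explicit tag, fall back to the threshold
        relevant = score >= threshold
    else:
        relevant = None  # nothing usable in the output
    return {"score": score, "relevant": relevant}


# Shortened copy of the output above
sample_output = "**Relevance Score**: 0.7\n\n**Explanation**: ...\n\n[relevance: yes]"
print(parse_relevance(sample_output))
```

Returning `None` for missing fields, rather than raising, lets the caller decide whether to retry the model call or discard the chunk.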
