## Gemini PDF chatbot

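- Models used
    - text-embedding-004 (embedding model)
        - 1,500 requests per minute
        - Free of charge
    - gemini-pro (natural-language generation model). This alias always resolves to the latest stable version of Gemini 1.0 Pro; it currently points to gemini-1.0-pro-002.
        - Total token limit: 32,760 tokens
        - Output token limit: 8,192 tokens
        - Free of charge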
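
The two token limits above determine how much retrieved context can be sent with a question. The following is a minimal sketch of that arithmetic, assuming the total limit covers prompt and answer together and that the full output allowance is reserved; the helper names `max_input_tokens` and `fits_budget` are hypothetical and not part of any library.

```python
# Limits quoted in the list above for gemini-pro (gemini-1.0-pro-002).
TOTAL_TOKEN_LIMIT = 32_760
OUTPUT_TOKEN_LIMIT = 8_192


def max_input_tokens(total_limit: int = TOTAL_TOKEN_LIMIT,
                     reserved_output: int = OUTPUT_TOKEN_LIMIT) -> int:
    """Tokens left for the prompt if the full output allowance is reserved.

    Assumption: the total limit is shared between prompt and answer.
    """
    if reserved_output > total_limit:
        raise ValueError("reserved output exceeds the total limit")
    return total_limit - reserved_output


def fits_budget(prompt_tokens: int) -> bool:
    """True if a prompt of this size leaves room for a maximum-length answer."""
    return prompt_tokens <= max_input_tokens()


print(max_input_tokens())  # 24568
```

The prompt size itself would have to come from the model's tokenizer, for example through the `count_tokens` method of a `genai.GenerativeModel`; character counts are only a rough proxy.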

In [72]:
# Imports
import google.generativeai as genai
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
import os
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_google_genai import ChatGoogleGenerativeAI

In [None]:
genai.configure(api_key='your-api-key')
os.environ["GOOGLE_API_KEY"] = "your-api-key"

In [74]:
# Load every PDF file in the data/ directory
loader = PyPDFDirectoryLoader("data/")
docs = loader.load()

In [75]:
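# Split the document text into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,      # maximum chunk length, measured by length_function
    chunk_overlap=50,    # characters shared between neighboring chunks
    length_function=len,
    separators=["\n\n", "\n", " ", ""],
)

text_chunks = []
for doc in docs:
    chunks = text_splitter.split_text(doc.page_content)
    # Wrap each chunk in a new Document (source metadata is not carried over)
    text_chunks.extend([Document(page_content=chunk) for chunk in chunks])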
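
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that class also tries to break on the listed separators); the function name `sliding_window_chunks` is hypothetical and only illustrates how consecutive chunks share their boundary characters.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500,
                          chunk_overlap: int = 50) -> list[str]:
    """Cut text into fixed-size windows whose edges overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1000))
pieces = sliding_window_chunks(sample)
print([len(p) for p in pieces])         # [500, 500, 100]
print(pieces[0][-50:] == pieces[1][:50])  # True
```

The overlap means a sentence that straddles a chunk boundary is still fully present in at least one chunk, at the cost of embedding some text twice.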

In [None]:
# Embedding model used to vectorize the chunks
embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

# Embed the chunks and build a FAISS vector store from them
vectorstore = FAISS.from_documents(
    documents=text_chunks,
    embedding=embeddings
)

In [77]:
# Define the retriever (MMR search; a low lambda_mult favors diverse results)
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 6, "lambda_mult": 0.25},
)
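
The `mmr` search type refers to maximal marginal relevance, which trades relevance to the query against redundancy among the results already chosen. The following is a minimal pure-Python sketch of that idea, not the vector store's actual implementation; `cosine` and `mmr_select` are hypothetical names, and the toy vectors stand in for real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query, candidates, k=6, lambda_mult=0.25):
    """Return indices of up to k candidates picked by maximal marginal relevance."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected item (0 if none yet)
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(mmr_select(query, vectors, k=2, lambda_mult=0.25))  # [0, 2]
print(mmr_select(query, vectors, k=2, lambda_mult=1.0))   # [0, 1]
```

With `lambda_mult=0.25` the second pick skips the near-duplicate of the first result in favor of a different one; with `lambda_mult=1.0` the selection reduces to plain similarity ranking.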

In [90]:
# Create the prompt template
prompt = PromptTemplate.from_template(
    """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Answer in Korean.

#Question: 
{question} 
#Context: 
{context} 

#Answer:"""
)

llm = ChatGoogleGenerativeAI(model="gemini-pro")

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
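
In the chain above, the retriever's output (a list of `Document` objects) is passed directly into the `{context}` slot, so the prompt receives the string form of that list, metadata included. One possible refinement, sketched here under the assumption that only the chunk text is wanted, is a small formatting helper. `format_docs` is a hypothetical name, and `FakeDocument` is a stand-in used only so the example runs without LangChain.

```python
from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Hypothetical stand-in exposing only the page_content attribute."""
    page_content: str


def format_docs(docs) -> str:
    """Join the text of each retrieved document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


print(format_docs([FakeDocument("first chunk"), FakeDocument("second chunk")]))
```

If this helper were adopted, the context entry of the chain could plausibly become `retriever | format_docs`, since LCEL accepts plain functions as pipeline steps; whether it improves answers for this dataset has not been checked here.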

In [92]:
question = "기후 공시가 뭐야?"  # "What is climate disclosure?"
response = chain.invoke(question)
print(response)

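 기후 공시는 기업이 기후 변화가 사업에 미치는 위험과 기회를 공개하는 것입니다.
 (English: Climate disclosure is when a company discloses the risks and opportunities that climate change poses to its business.)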
