In [None]:
!pip install unstructured

In [None]:
!pip install sentence-transformers

In [None]:
!pip install chromadb

In [3]:
from dotenv import load_dotenv
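from langchain_community.document_loaders import TextLoader  # moved out of langchain.document_loaders; needs the langchain-community package
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma  # moved out of langchain.vectorstores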
from langchain_openai import ChatOpenAI
from langchain.chains.question_answering import load_qa_chain

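# Read environment variables (e.g. OPENAI_API_KEY, which ChatOpenAI looks up) from a local .env file.
load_dotenv()

# Load the source text as a list of LangChain Document objects.
documents = TextLoader("./AI.txt").load()

def split_docs(documents, chunk_size=1000, chunk_overlap=20):
    """Split documents into chunks of roughly chunk_size characters that share chunk_overlap characters."""
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    return text_splitter.split_documents(documents)

docs = split_docs(documents)

# Embed each chunk locally and index the vectors in a Chroma collection (in memory, since no persist directory is given).
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = Chroma.from_documents(docs, embeddings)
# temperature=0 makes the answers as repeatable as the API allows.
llm = ChatOpenAI(
    temperature=0,
    model_name='gpt-4-turbo',
)
# The 'stuff' chain places every retrieved chunk into a single prompt; verbose=True prints that prompt.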
chain = load_qa_chain(llm, chain_type='stuff', verbose=True)
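
The effect of `chunk_size` and `chunk_overlap` is easier to see on a tiny input. The sketch below is a simplified fixed-width sliding window, not the algorithm `RecursiveCharacterTextSplitter` actually uses (that one prefers to break on separators such as paragraphs and sentences). The helper name `chunk_text` and the sample string are hypothetical and exist only for illustration.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 20) -> list[str]:
    """Hypothetical helper: fixed-width sliding window over characters.

    Simplification for illustration only; LangChain's recursive splitter
    tries separators first and only falls back to hard cuts.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters of toy input
pieces = chunk_text(sample, chunk_size=12, chunk_overlap=4)
for piece in pieces:
    print(piece)
```

Each piece starts with the last four characters of the previous one, which is what the overlap buys: a sentence cut at a chunk boundary still appears partly intact in the neighbouring chunk.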

In [4]:
query = "AI란?"  # Korean for "What is AI?"
# Retrieve the chunks closest to the query in embedding space.
matching_docs = db.similarity_search(query)
answer = chain.invoke({"input_documents": matching_docs, "question": query})
print(answer['output_text'])

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
System: Use the following pieces of context to answer the user's question. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
Artificial intelligence (AI) is the intelligence of machines or software, as opposed to the intelligence of humans or animals. It is a field of study in computer science that develops and studies intelligent machines. Such machines may be called AIs.

AI technology is widely used throughout industry, government, and science. Some high-profile applications are: advanced web search engines (e.g., Google Search), recommendation systems (used by YouTube, Amazon, and Netflix), understanding human speech (such as Google Assistant, Siri, and Alexa), self-driving cars (e.g., Waymo), generative and creative tools (ChatGPT and AI art), and superhuman play and analysis in strategy g

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

> Finished chain.

> Finished chain.
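AI(인공 지능)는 인간이나 동물의 지능과 대비되는 기계나 소프트웨어의 지능을 의미합니다. 컴퓨터 과학의 한 분야로, 지능적인 기계를 개발하고 연구하는 것을 목표로 합니다. 이러한 기계들은 AI라고 불릴 수 있습니다. AI 기술은 산업, 정부, 과학 전반에 걸쳐 널리 사용되며, 구글 검색, 유튜브, 아마존, 넷플릭스의 추천 시스템, 구글 어시스턴트, 시리, 알렉사와 같은 인간의 말을 이해하는 시스템, 웨이모와 같은 자율 주행 차, 창작 도구(예: ChatGPT, AI 아트), 체스와 바둑과 같은 전략 게임에서의 초인간적인 플레이 및 분석 등 고도의 응용 프로그램에서 사용됩니다.

[English translation of the answer] AI (artificial intelligence) refers to the intelligence of machines or software, as opposed to the intelligence of humans or animals. It is a field of computer science that aims to develop and study intelligent machines. Such machines may be called AIs. AI technology is widely used across industry, government, and science, in high-profile applications such as Google Search; the recommendation systems of YouTube, Amazon, and Netflix; systems that understand human speech, such as Google Assistant, Siri, and Alexa; self-driving cars such as Waymo; creative tools (e.g., ChatGPT and AI art); and superhuman play and analysis in strategy games such as chess and Go.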
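
The verbose log above shows the two steps behind the answer: pick the chunks nearest to the query, then paste them under a fixed system instruction. The sketch below mimics both steps in plain Python on toy data. It is an illustration under assumptions, not how Chroma or `load_qa_chain` are implemented: the 2-D vectors, the toy chunks, the helper names `cosine`, `top_k`, and `stuff_prompt`, and the choice of cosine similarity are all hypothetical (Chroma's real index and distance metric may differ). Only the instruction text and the blank-line separator between chunks are taken from the printed prompt.

```python
import math

# Instruction text copied from the verbose output above; {context} is where the chunks go.
SYSTEM_TEMPLATE = (
    "Use the following pieces of context to answer the user's question. \n"
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n"
    "----------------\n"
    "{context}"
)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors most similar to query_vec, best first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def stuff_prompt(chunks):
    """'Stuff' strategy: join every retrieved chunk into one context block."""
    return SYSTEM_TEMPLATE.format(context="\n\n".join(chunks))


# Toy stand-ins for real chunks and their embeddings (hypothetical values).
toy_chunks = [
    "AI is the intelligence of machines.",
    "Bread is baked from flour.",
    "AI is used in search engines.",
]
toy_vecs = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2]]
query_vec = [1.0, 0.1]

best = top_k(query_vec, toy_vecs, k=2)
prompt = stuff_prompt([toy_chunks[i] for i in best])
print(prompt)
```

Because the stuff strategy sends every retrieved chunk in one request, the prompt grows with `k` and `chunk_size`; keeping both modest is one way to stay inside the model's context window.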
