<a href="https://colab.research.google.com/github/rypotter/depo/blob/master/LangchainQuickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **LangChain Quickstart**

https://python.langchain.com/docs/use_cases/question_answering/quickstart

## **A typical RAG application**

**Indexing:** a pipeline for ingesting data from a source and indexing it. This usually happens offline.

*Load:* First we need to load our data. We’ll use DocumentLoaders for this.

*Split:* Text splitters break large Documents into smaller chunks. This helps both indexing and generation: large chunks are harder to search over and may not fit in a model’s finite context window.
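
To make the splitting step concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplified stand-in, not the recursive, separator-aware algorithm that LangChain's `RecursiveCharacterTextSplitter` uses; the function name `split_text` is hypothetical, and the sizes are counted in characters.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size character windows that overlap their neighbours."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Small sizes so the overlap is easy to inspect by eye.
chunks = split_text("abcdefghij" * 3, chunk_size=12, chunk_overlap=4)
for chunk in chunks:
    print(repr(chunk))
```

The overlap means the tail of each chunk is repeated at the head of the next one, so a sentence that straddles a boundary is still likely to appear whole in at least one chunk.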

*Store:* We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a VectorStore and Embeddings model.
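
As an illustration of what "store and index" means, the sketch below keeps vectors in memory and ranks them by cosine similarity. Everything in it is an assumption made for illustration: `embed` is a toy bag-of-words counter standing in for a real Embeddings model, and `ToyVectorStore` is a hypothetical class, not Chroma's or LangChain's API.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Keeps (vector, text) pairs in memory and ranks them against a query."""

    def __init__(self) -> None:
        self._entries = []  # list of (Counter, str) pairs

    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self._entries.append((embed(text), text))

    def search(self, query: str, k: int = 2) -> list[str]:
        query_vec = embed(query)
        ranked = sorted(self._entries, key=lambda entry: cosine(query_vec, entry[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = ToyVectorStore()
store.add_texts([
    "task decomposition breaks a complex task into smaller steps",
    "a vector store indexes embeddings for similarity search",
    "the weather today is sunny",
])
top = store.search("how does task decomposition work", k=1)
print(top)
```

A real embedding model maps text to dense vectors that capture meaning rather than exact word overlap, but the store-then-rank-by-similarity shape is the same.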

**Retrieval and generation:** the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.

*Retrieve:* Given a user input, relevant splits are retrieved from storage using a Retriever.

*Generate:* A ChatModel / LLM produces an answer using a prompt that includes the question and the retrieved data.
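
The generation step boils down to placing the retrieved text and the question into one prompt string. The sketch below shows that assembly only; `build_prompt` is a hypothetical helper, and the template wording is an assumption for illustration, not the text of any published prompt.

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Join retrieved chunks into a context block and wrap it with the question."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# The chunks here are placeholders for whatever the retriever returned.
prompt_text = build_prompt("What is Task Decomposition?", ["Chunk one.", "Chunk two."])
print(prompt_text)
```

The resulting string is what gets sent to the ChatModel / LLM, which is why retrieval quality bounds answer quality: the model only sees the chunks that were retrieved.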



In [7]:
import getpass
import os

# Prompt for the key at run time; never paste a real key into the notebook.
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")

# Alternative: load the key from a local .env file instead.
# import dotenv
# dotenv.load_dotenv()



In [None]:
# These are basically the only packages needed
%pip install --upgrade --quiet  langchain langchain-community langchainhub langchain-openai chromadb bs4

In [None]:
!pip install langchain langchain_community

In [2]:
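# Earlier attempts, kept for reference. Iterator and AsyncIterator are names
# from Python's standard typing module, so they need no pip install.
#!pip install Iterator AsyncIterator
#!python -m pip install Iterator
#!pip search iterator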

In [None]:
#!pip install langchain_openai
!pip install langchain_openai --upgrade

In [None]:
!pip install typing_extensions --upgrade

In [5]:
import bs4
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

In [None]:
!pip install chromadb

In [None]:
!pip install langchainhub

In [12]:
# Load, chunk and index the contents of the blog.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# Retrieve and generate using the relevant snippets of the blog.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
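
The data flow through `rag_chain` can be sketched in plain Python. In LCEL, the leading dict passes the same input (the question) to each of its values, and each `|` feeds one step's output into the next. The functions below are hypothetical stand-ins (no retriever, prompt hub, or model is called), so this only illustrates the wiring under those assumptions.

```python
def retrieve(question: str) -> list[str]:
    """Hypothetical stand-in for `retriever`: returns fixed chunks."""
    return [
        "Task decomposition splits a task into steps.",
        "Chain of Thought prompts step-by-step thinking.",
    ]


def join_chunks(chunks: list[str]) -> str:
    """Plays the role of `format_docs`, but on plain strings."""
    return "\n\n".join(chunks)


def render_prompt(inputs: dict) -> str:
    """Hypothetical stand-in for the pulled prompt template."""
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}"


def echo_llm(prompt_text: str) -> str:
    """Hypothetical stand-in for the chat model: reports the prompt size."""
    return f"(model saw {len(prompt_text)} characters)"


def run_chain(question: str) -> str:
    # The dict step: every value receives the same input, the raw question.
    inputs = {
        "context": join_chunks(retrieve(question)),  # retriever | format_docs
        "question": question,                        # RunnablePassthrough()
    }
    # The remaining steps run left to right, each consuming the previous output.
    return echo_llm(render_prompt(inputs))


result = run_chain("What is Task Decomposition?")
print(result)
```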



In [13]:
rag_chain.invoke("What is Task Decomposition?")

'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done through prompting techniques like Chain of Thought (CoT) or Tree of Thoughts, which guide the model to think step by step and explore multiple reasoning possibilities. Task decomposition can also involve task-specific instructions or human inputs.'

In [None]:
# Clean up: delete the Chroma collection created above
vectorstore.delete_collection()