<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Extraction" data-toc-modified-id="Extraction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Extraction</a></span></li><li><span><a href="#Loading" data-toc-modified-id="Loading-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Loading</a></span></li><li><span><a href="#Splitting" data-toc-modified-id="Splitting-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Splitting</a></span></li><li><span><a href="#Storing" data-toc-modified-id="Storing-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Storing</a></span></li><li><span><a href="#Retrieval" data-toc-modified-id="Retrieval-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Retrieval</a></span></li><li><span><a href="#Question-Answering" data-toc-modified-id="Question-Answering-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Question Answering</a></span></li></ul></div>

In [1]:
# !pip install langchain openai chromadb python-dotenv

In [2]:
# Set env var OPENAI_API_KEY or load from a .env file
import dotenv
dotenv.load_dotenv()

True

# Extraction

In [3]:
from extract_urls import crawl

In [4]:
base_url = "https://python.langchain.com/docs/get_started/introduction"

In [5]:
urls = crawl(base_url, max_urls=10)

https://python.langchain.com/docs/get_started/introduction
https://python.langchain.com/docs/get_started/installation
https://python.langchain.com/docs/community
https://python.langchain.com/docs/additional_resources/dependents
https://python.langchain.com/docs/additional_resources/tutorials
https://python.langchain.com/docs/use_cases
https://python.langchain.com/docs/guides
https://python.langchain.com/docs/guides/evaluation
https://python.langchain.com/docs/guides/evaluation/trajectory
https://python.langchain.com/docs/guides/evaluation/trajectory/custom
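
The `extract_urls` module is local to this notebook and its source is not shown here, so how `crawl` works internally is unknown. As a rough, hypothetical sketch of the core step such a crawler needs (collecting same-site links from one page), the snippet below uses only the standard library. The names `LinkCollector` and `same_site_links` are invented for illustration and are not part of `extract_urls`.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(html, page_url):
    """Return absolute, de-duplicated links that stay on page_url's host."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(page_url).netloc
    seen, links = set(), []
    for href in parser.hrefs:
        # Resolve relative links and drop any #fragment part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


# Hypothetical page content, used only to exercise the helper offline.
sample_html = (
    '<a href="/docs/guides">Guides</a> '
    '<a href="https://example.org/x">External</a> '
    '<a href="/docs/guides#top">Top</a>'
)
links = same_site_links(
    sample_html, "https://python.langchain.com/docs/get_started/introduction"
)
print(links)
```

A full crawler would fetch each discovered link in turn and stop once a limit such as `max_urls` is reached; that loop is left out here because it needs network access.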


# Loading

In [6]:
from langchain.document_loaders import WebBaseLoader

In [7]:
loader = WebBaseLoader(urls)

In [8]:
data = loader.load()

In [9]:
print(data[6].page_content)






Guides | 🦜️🔗 Langchain





Skip to main content🦜️🔗 LangChainDocsUse casesIntegrationsAPILangSmithJS/TS DocsCTRLKGet startedIntroductionInstallationQuickstartModulesModel I/​ORetrievalChainsMemoryAgentsCallbacksModulesLangChain Expression LanguageGuidesAdaptersDebuggingDeploymentEvaluationFallbacksLangSmithRun LLMs locallyModel comparisonPrivacyPydantic compatibilitySafetyAdditional resourcesCommunity navigatorGuidesGuidesDesign guides for key parts of the development process🗃️ Adapters1 items📄️ DebuggingIf you're building with LLMs, at some point something will break, and you'll need to debug. A model call will fail, or the model output will be misformatted, or there will be some nested model calls and it won't be clear where along the way an incorrect output was created.🗃️ Deployment1 items🗃️ Evaluation4 items📄️ FallbacksWhen working with language models, you may often encounter issues from the underlying APIs, whether these be rate limiting or downtime. Therefore, as you go to 

# Splitting

In [10]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [11]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

In [12]:
all_splits = text_splitter.split_documents(data)

In [13]:
len(all_splits)

56
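
To make the `chunk_size` and `chunk_overlap` settings concrete, here is a simplified sketch of overlapping chunking with a fixed-width sliding window. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter tries a list of separators so that chunks break at natural boundaries); the function name `sliding_chunks` is an assumption made for this example.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the same idea as the 100-character overlap configured above: text near a chunk boundary stays visible in both neighbours.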

In [14]:
print(all_splits[10].page_content)

in the loop🦜 Contribute to LangChainLangChain is the product of over 5,000+ contributions by 1,500+ contributors, and there is **still** so much to do together. Here are some ways to get involved:Open a pull request: We’d appreciate all forms of contributions–new features, infrastructure improvements, better documentation, bug fixes, etc. If you have an improvement or an idea, we’d love to work on it with you.Read our contributor guidelines: We ask contributors to follow a "fork and pull request" workflow, run a few local checks for formatting, linting, and testing before submitting, and follow certain documentation and testing conventions.First time contributor? Try one of these PRs with the “good first issue” tag.Become an expert: Our experts help the community by answering product questions in Discord. If that’s a role you’d like to play, we’d be so grateful! (And we have some special experts-only goodies/perks we can tell you more about). Send us an email to introduce yourself at


# Storing

In [15]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

In [16]:
vectorstore = Chroma.from_documents(documents=all_splits, 
                                    embedding=OpenAIEmbeddings(), 
                                    persist_directory='langchain_embeddings')

In [17]:
vectorstore._collection.count()

56
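
The count matches the 56 chunks produced by the splitter, so every chunk has one stored embedding. As an illustration of what a vector store does with those embeddings at query time, the sketch below ranks a few toy vectors by cosine similarity in plain Python. The two-dimensional vectors and the names `cosine_similarity`, `top_k`, and `toy_store` are made up for this example; Chroma's own index and distance metric may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k):
    """Return the names of the k stored vectors most similar to query_vec."""
    ranked = sorted(
        stored.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _vec in ranked[:k]]


# Toy "embeddings"; real ones have hundreds or thousands of dimensions.
toy_store = {
    "intro": [1.0, 0.0],
    "install": [0.0, 1.0],
    "guides": [0.7, 0.7],
}
nearest = top_k([1.0, 0.1], toy_store, k=2)
print(nearest)  # ['intro', 'guides']
```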

# Retrieval

In [18]:
from langchain.chat_models import ChatOpenAI

In [19]:
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

In [20]:
from langchain.chains import RetrievalQA

In [21]:
qa_chain = RetrievalQA.from_chain_type(llm,
                                       retriever=vectorstore.as_retriever(),
                                       return_source_documents=True)
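
Since no `chain_type` is passed, `RetrievalQA.from_chain_type` falls back to its default, the "stuff" strategy: the retrieved chunks are concatenated into a single prompt together with the question. The sketch below shows that idea only. The prompt wording and the helper name `build_stuffed_prompt` are assumptions for illustration and do not reproduce LangChain's actual template.

```python
def build_stuffed_prompt(question, chunks):
    """Concatenate retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical retrieved chunks, standing in for the retriever's output.
example_chunks = [
    "LangChain is a framework for developing applications powered by language models.",
    "Off-the-shelf chains make it easy to get started.",
]
prompt = build_stuffed_prompt("what is langchain framework?", example_chunks)
print(prompt)
```

Because every retrieved chunk ends up in one prompt, this approach only works while the combined text fits in the model's context window, which is one reason the chunks are kept to about 1000 characters.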

# Question Answering

In [22]:
question = "what is langchain framework?"

In [24]:
result = qa_chain({"query": question})

In [26]:
print(result['result'])

LangChain is a framework for developing applications powered by language models. It provides components and off-the-shelf chains that make it easy to work with language models and build context-aware applications. The framework allows applications to connect language models to various sources of context, reason based on provided context, and construct sequences of calls. It also offers modules for model I/O, retrieval, chains, agents, memory, and callbacks. LangChain is part of a larger ecosystem of tools and has a community of developers and resources for support and learning.


In [27]:
result['source_documents']

[Document(page_content='are using the rest of the LangChain framework or notOff-the-shelf chains: a structured assembly of components for accomplishing specific higher-level tasksOff-the-shelf chains make it easy to get started. For complex applications, components make it easy to customize existing chains and build new ones.Get started\u200bHere’s how to install LangChain, set up your environment, and start building.We recommend following our Quickstart guide to familiarize yourself with the framework by building your first LangChain application.Note: These docs are for the LangChain Python package. For documentation on LangChain.js, the JS/TS version, head here.Modules\u200bLangChain provides standard, extendable interfaces and external integrations for the following modules, listed from least to most complex:Model I/O\u200bInterface with language modelsRetrieval\u200bInterface with application-specific dataChains\u200bConstruct sequences of callsAgents\u200bLet chains choose which t
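
The raw list of `Document` objects is hard to scan. A small helper can reduce it to the pages the answer drew on. The sketch below assumes each document records its origin URL under `metadata["source"]`, which is how `WebBaseLoader` is understood to tag pages; check one document's `metadata` before relying on it. `FakeDocument` and `unique_sources` are hypothetical names used so the example runs without LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in with the two attributes this helper reads from a Document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_sources(documents):
    """Return each distinct metadata['source'] value in first-seen order."""
    sources = []
    for doc in documents:
        source = doc.metadata.get("source")
        if source and source not in sources:
            sources.append(source)
    return sources


# Hypothetical documents; in the notebook you would pass result['source_documents'].
docs = [
    FakeDocument("chunk a", {"source": "https://python.langchain.com/docs/get_started/introduction"}),
    FakeDocument("chunk b", {"source": "https://python.langchain.com/docs/get_started/introduction"}),
    FakeDocument("chunk c", {"source": "https://python.langchain.com/docs/guides"}),
    FakeDocument("chunk d"),  # no source recorded
]
sources = unique_sources(docs)
print(sources)
```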