### New Agent Framework with LangChain Release v0.1.0

<p>
<img src="https://blog.langchain.dev/content/images/size/w1248/format/webp/2024/01/V0.1.0_Export--1-.png"
     width="35%" height="auto"
     style="display: block; margin: 0 auto" />
</p>

[Update notes](https://blog.langchain.dev/langchain-v0-1-0/)

# Agents

In [26]:
import os
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_community.document_loaders import WebBaseLoader, Docx2txtLoader, UnstructuredWordDocumentLoader, PyPDFLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

In [2]:
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = ""
os.environ["TAVILY_API_KEY"] = ""


### Tools are the mini-engines that make the Agent work

Load Tavily, a built-in LangChain tool, for live online search.

In [3]:
online_search = TavilySearchResults()

In [7]:
# Query in Spanish: "What was the final score of Boys vs. Cristal?"
online_search.invoke("Cuanto quedo el Boys contra el Cristal?")

[{'url': 'https://www.alaskacommons.com/sporting-cristal-vs-sport-boys-live-rimenses-win-3-0-in-liga-1-apertura-2024/123235/',
  'content': 'Sporting Cristal vs Sport Boys LIVE: ‘Rimenses’ win 3-0 in Liga 1 Apertura 2024 February 4, 2024 // News Team  duel being between Sporting Cristal and Sport Boys. Both teams have a rich history and aim to be protagonists in League  in defense for Cristal.  Sporting Cristal is leading 3-0 against Sport Boys at the end of the first half. Cazonatti scored a goal for SportingFebruary 4, 2024 // News Team Sporting Cristal is leading 3-0 against Sport Boys at the end of the first half. Cazonatti scored a goal for Sporting Cristal, establishing their lead in Callao. Quispe almost made a mistake by failing to catch a cross from Ignácio. Sport Boys are struggling to put together passes and create offensive plays.'},
 {'url': 'https://www.vsstats.com/football/2024-02-04/sport-boys-vs-sporting-cristal',
  'content': 'Sport Boys VS Sporting Cristal Team Stats
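As the output above shows, each search result is a dictionary with `url` and `content` keys. Here is a minimal sketch of condensing such a list into one line per result, assuming only that shape. The helper `summarize_results` and the sample data are hypothetical and are not part of LangChain or Tavily.

```python
def summarize_results(results, max_chars=80):
    """Return one 'url: snippet' line per search result dictionary."""
    lines = []
    for item in results:
        # Collapse runs of whitespace and newlines into single spaces.
        snippet = " ".join(item["content"].split())
        if len(snippet) > max_chars:
            snippet = snippet[:max_chars].rstrip() + "..."
        lines.append(f"{item['url']}: {snippet}")
    return lines


# Hypothetical sample in the same shape as the search output.
sample_results = [
    {"url": "https://example.com/a", "content": "First   result\ntext."},
    {"url": "https://example.com/b", "content": "x" * 200},
]

for line in summarize_results(sample_results):
    print(line)
```

In the notebook, the same helper could be given the list returned by `online_search.invoke(...)`, provided the results keep this shape.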

### Set up the retrieval part of the tool

Recall that retrievers need:
1. Source text
2. Document loader
3. Text splitter
4. Embedding model
5. Vector store
6. Actual retriever
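To show how these six pieces fit together, here is a toy sketch in plain Python. It deliberately avoids LangChain: the sample text, the sentence splitter, the bag-of-words "embedding", and the list-based "vector store" are all simplified stand-ins chosen for illustration, not the library's components.

```python
import math
from collections import Counter

# 1. Source text (hypothetical sample, not the notebook's document).
source_text = (
    "The stem table holds intermediate records. "
    "Drug claims are mapped to the drug exposure table. "
    "Visits are derived from medical claims."
)


# 2-3. "Loader" and splitter: the text is already in memory, so split it into sentences.
def split_text(text):
    return [s.strip() for s in text.split(".") if s.strip()]


# 4. Toy embedding: a bag-of-words count vector.
def embed(text):
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# 5. "Vector store": a list of (chunk, vector) pairs.
chunks = split_text(source_text)
store = [(chunk, embed(chunk)) for chunk in chunks]


# 6. Retriever: rank stored chunks by similarity to the query.
def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(store, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


print(retrieve("where are drug claims mapped"))
```

A real pipeline would replace each stand-in with the corresponding component: a document loader, a text splitter, an embedding model, and a vector store that exposes a retriever.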

In [14]:
loader = UnstructuredWordDocumentLoader("sample_docs/WebMD_PBM_ETL_5.0.1_20170606.docx", mode="elements")
docs = loader.load()

In [16]:
print(len(docs))
docs

216


[Document(page_content='Source Data Mapping Approach to CDMV5.0.1', metadata={'source': 'sample_docs/WebMD_PBM_ETL_5.0.1_20170606.docx', 'category_depth': 0, 'last_modified': '2024-02-04T20:19:13', 'page_number': 1, 'languages': ['eng'], 'file_directory': 'sample_docs', 'filename': 'WebMD_PBM_ETL_5.0.1_20170606.docx', 'filetype': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'category': 'Title'}),
 Document(page_content='', metadata={'source': 'sample_docs/WebMD_PBM_ETL_5.0.1_20170606.docx', 'languages': ['eng'], 'file_directory': 'sample_docs', 'filename': 'WebMD_PBM_ETL_5.0.1_20170606.docx', 'filetype': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'category': 'PageBreak'}),
 Document(page_content='Table name: stem_table', metadata={'source': 'sample_docs/WebMD_PBM_ETL_5.0.1_20170606.docx', 'category_depth': 0, 'last_modified': '2024-02-04T20:19:13', 'page_number': 2, 'languages': ['eng'], 'file_directory': 'sample_docs', 'fil

In [25]:
sum(len(doc.page_content) for doc in docs)

47031
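The element listing above includes entries with an empty `page_content`, such as the `PageBreak` element. Here is a minimal sketch of dropping those before counting characters. `Doc` is a hypothetical stand-in for LangChain's `Document`, so the snippet runs without the library, and the sample entries reuse the text visible in the output above.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in exposing the two attributes used here."""
    page_content: str
    metadata: dict = field(default_factory=dict)


sample_docs = [
    Doc("Source Data Mapping Approach to CDMV5.0.1", {"category": "Title"}),
    Doc("", {"category": "PageBreak"}),
    Doc("Table name: stem_table", {"category": "Title"}),
]

# Keep only elements that carry visible text.
non_empty = [d for d in sample_docs if d.page_content.strip()]

# Total characters across the remaining elements.
total_chars = sum(len(d.page_content) for d in non_empty)
print(len(non_empty), total_chars)
```

Applied to the notebook's `docs`, the same list comprehension would remove blank elements before they reach the splitter or the embedding step.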

In [27]:
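pdf_loader = PyPDFLoader("sample_docs/WebMD_PBM_ETL_5.0.1_20170606.pdf")
# Split the PDF with pdf_loader, not the .docx `loader` defined earlier.
# The recorded output below shows .docx metadata because it came from a run that used `loader`.
pdf_pages = pdf_loader.load_and_split()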

In [29]:
print(len(pdf_pages))
pdf_pages

187


[Document(page_content='Source Data Mapping Approach to CDMV5.0.1', metadata={'source': 'sample_docs/WebMD_PBM_ETL_5.0.1_20170606.docx', 'category_depth': 0, 'last_modified': '2024-02-04T20:19:13', 'page_number': 1, 'languages': ['eng'], 'file_directory': 'sample_docs', 'filename': 'WebMD_PBM_ETL_5.0.1_20170606.docx', 'filetype': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'category': 'Title'}),
 Document(page_content='Table name: stem_table', metadata={'source': 'sample_docs/WebMD_PBM_ETL_5.0.1_20170606.docx', 'category_depth': 0, 'last_modified': '2024-02-04T20:19:13', 'page_number': 2, 'languages': ['eng'], 'file_directory': 'sample_docs', 'filename': 'WebMD_PBM_ETL_5.0.1_20170606.docx', 'filetype': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'category': 'Title'}),
 Document(page_content='Reading from sample_medical_claims_20170502.csv', metadata={'source': 'sample_docs/WebMD_PBM_ETL_5.0.1_20170606.docx', 'category_depth'

In [32]:
pdf_pages[-1].page_content

