# Hello World! with LangChain

### Install required dependencies

In [1]:
%%capture
%pip install langchain langchain-community langchain-ollama langchain-chroma pypdf

### Download documents

In [2]:
from platzi_langchain.extraction import fetch_and_load_papers

urls = [
    'https://arxiv.org/pdf/2306.06031v1.pdf',
    'https://arxiv.org/pdf/2306.12156v1.pdf',
    'https://arxiv.org/pdf/2306.14289v1.pdf',
    'https://arxiv.org/pdf/2305.10973v1.pdf',
    'https://arxiv.org/pdf/2306.13643v1.pdf'
]

ml_papers = fetch_and_load_papers(urls)

Downloading paper1.pdf in ../data/external/paper1.pdf
Downloading paper2.pdf in ../data/external/paper2.pdf
Downloading paper3.pdf in ../data/external/paper3.pdf
Downloading paper4.pdf in ../data/external/paper4.pdf
Downloading paper5.pdf in ../data/external/paper5.pdf
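
`fetch_and_load_papers` comes from the course's own `platzi_langchain` package, so its body is not shown in this notebook. Below is a minimal sketch of what such a helper might do. It assumes each URL is saved as `paperN.pdf` under `../data/external/` (as the log above suggests) and that each file is read with `PyPDFLoader`, which yields one `Document` per page. The names `target_paths` and `fetch_and_load_sketch` are hypothetical.

```python
from pathlib import Path
from urllib.request import urlretrieve


def target_paths(urls, data_dir="../data/external"):
    """Map each URL to a local file name: paper1.pdf, paper2.pdf, ..."""
    return [Path(data_dir) / f"paper{i}.pdf" for i, _ in enumerate(urls, start=1)]


def fetch_and_load_sketch(urls, data_dir="../data/external"):
    """Hypothetical stand-in for fetch_and_load_papers (not the course code)."""
    # Imported lazily so target_paths works without LangChain installed.
    from langchain_community.document_loaders import PyPDFLoader

    pages = []
    for url, path in zip(urls, target_paths(urls, data_dir)):
        path.parent.mkdir(parents=True, exist_ok=True)
        print(f"Downloading {path.name} in {path}")
        urlretrieve(url, path)
        # PyPDFLoader.load() returns one Document per PDF page.
        pages.extend(PyPDFLoader(str(path)).load())
    return pages
```

Returning one `Document` per page would explain why five papers produce 57 entries in `ml_papers`.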


### Validate **ml_papers** content

In [3]:
type(ml_papers), len(ml_papers), ml_papers[3]

(list,
 57,
 Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-06-12T00:32:18+00:00', 'author': '', 'keywords': '', 'moddate': '2023-06-12T00:32:18+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'templateversion': 'IJCAI.2023.0', 'title': '', 'trapped': '/False', 'source': '../data/external/paper1.pdf', 'total_pages': 7, 'page': 3, 'page_label': '4'}, page_content='Figure 1: FinGPT Framework.\n4.1 Data Sources\nThe first stage of the FinGPT pipeline involves the collec-\ntion of extensive financial data from a wide array of online\nsources. These include, but are not limited to:\n• Financial news: Websites such as Reuters, CNBC, Yahoo\nFinance, among others, are rich sources of financial news\nand market updates. These sites provide valuable informa-\ntion on market trends, company earnings, macroeconomic\nindicators, and other financial events.\n•

### Split documents

In [4]:
from platzi_langchain.transformation import split_documents

documents = split_documents(ml_papers)

In [5]:
len(documents), documents[10]

(211,
 Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-06-12T00:32:18+00:00', 'author': '', 'keywords': '', 'moddate': '2023-06-12T00:32:18+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'templateversion': 'IJCAI.2023.0', 'title': '', 'trapped': '/False', 'source': '../data/external/paper1.pdf', 'total_pages': 7, 'page': 2, 'page_label': '3'}, page_content='highly volatile, changing rapidly in response to news events\nor market movements.\nTrends, often observable through websites like Seeking\nAlpha, Google Trends, and other finance-oriented blogs and\nforums, offer critical insights into market movements and in-\nvestment strategies. They feature:\n• Analyst perspectives: These platforms provide access to\nmarket predictions and investment advice from seasoned\nfinancial analysts and experts.\n• Market sentiment: The discourse on these platform
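
`split_documents` is also a course helper. The 57 pages became 211 chunks, which is consistent with a character-based splitter that cuts each page into overlapping pieces. The pure-Python sketch below illustrates only the core idea of fixed-size chunks with overlap. The `chunk_size` and `overlap` defaults are assumptions, and LangChain's own splitters additionally try to break on separators such as paragraphs and line breaks.

```python
def split_text(text, chunk_size=1000, overlap=200):
    """Cut text into chunks of at most chunk_size characters, each sharing
    `overlap` characters with the previous chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.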

### Create embeddings and load data into the vector database

In [None]:
from platzi_langchain.load import get_vectordb_retriever

retriever = get_vectordb_retriever(documents)

Processing 211 document chunks...
Loading existing vector store...
Vector store operation took 0.09 seconds
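
Behind `get_vectordb_retriever` (another course helper), a vector store keeps one embedding per chunk and returns the chunks whose embeddings are closest to the query embedding. The toy sketch below shows that lookup with cosine similarity. The three-dimensional vectors are made up for illustration; a real setup would obtain them from an embedding model and keep them in a store such as Chroma.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Made-up embeddings: (chunk text, vector)
store = [
    ("FinGPT data sources", [0.9, 0.1, 0.0]),
    ("FastSAM prompt modes", [0.0, 0.2, 0.9]),
    ("FinGPT training cost", [0.8, 0.3, 0.1]),
]

print(top_k([1.0, 0.0, 0.0], store, k=2))
# ['FinGPT data sources', 'FinGPT training cost']
```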


### Chat models and QA chains

In [None]:
from platzi_langchain.models import get_chat_model

qa_chain = get_chat_model(retriever)
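
The `["result"]` key used in the calls that follow matches the output of LangChain's `RetrievalQA` chain, so `get_chat_model` presumably wraps one. Conceptually, such a chain retrieves the relevant chunks, pastes them into a prompt, and sends that prompt to the chat model. Below is a minimal sketch of the prompt-building step only. The template wording and the name `build_qa_prompt` are assumptions, not LangChain's actual prompt.

```python
def build_qa_prompt(question, chunks):
    """Combine retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_qa_prompt(
    "what is FinGPT?",
    ["Figure 1: FinGPT Framework.", "4.1 Data Sources"],
)
print(prompt)
```

Because the model only sees the retrieved chunks, the quality of an answer depends on whether the retriever surfaces the right passages for the question.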

In [8]:
qa_chain.invoke("what is FinGPT?")["result"]

'FinGPT doesn\'t seem to be a widely recognized term in the context provided. However, based on the information given, it appears to be related to "Financial Graphical Processing Tool" or possibly "Financial Graph Processing Tool".\n\nGiven the mention of "hands-on tutorials and demo applications for financial tasks", I would guess that FinGPT is an application or tool designed specifically for working with financial data, perhaps in a graphical interface. It seems to offer practical demonstrations and guides for users to explore and learn about various aspects of finance.\n\nIf you could provide more context or information about what FinGPT actually does, I may be able to provide a more accurate answer.'

In [9]:
qa_chain.invoke("what makes training a model like fingpt difficult?")["result"]

"According to the text, training a model like FinGPT is considered difficult for several reasons:\n\n1. **High computational cost**: Training models like BloombergGPT requires approximately 1.3 million GPU hours of computation, which translates to a staggering cost of around $3 million per training session.\n2. **Expensive and lengthy retraining from scratch**: Re-training a model from scratch is expensive and time-consuming, making it impractical for large-scale applications.\n3. **Lack of open-source alternatives**: Unlike models like BloombergGPT, which are based on top open-source Large Language Models (LLMs), there aren't many alternative options available for training similar models.\n\nThese challenges make it difficult to train a model like FinGPT, but the text suggests that the benefits of using an accessible and cost-effective solution outweigh these drawbacks."

In [10]:
qa_chain.invoke("what is fast segment?")["result"]

"FastSegment (also known as FastSAM) is an object segmentation algorithm proposed by researchers. It's a type of deep learning-based approach that uses a combination of point-prompt, box-prompt, and everything modes to generate high-quality segmentation masks for objects in images.\n\nHere's a brief overview:\n\n**Key components:**\n\n1. **Point-prompt**: A point-prompt is a small region around an object that is used as a prompt to generate the segmentation mask.\n2. **Box-prompt**: A box-prompt is a larger region around an object that is used to refine the segmentation mask.\n3. **Everything mode**: Everything mode generates a complete segmentation mask for an entire image, including background and foreground regions.\n\n**How it works:**\n\n1. The algorithm starts by identifying objects in the image using a pre-trained model (e.g., YOLOv8).\n2. For each object, FastSegment uses point-prompt to generate a small region around the object that is used as a prompt.\n3. The box-prompt is t

In [11]:
qa_chain.invoke("what is the difference between FastSAM and MobileSAM?")["result"]

'According to the text, the main differences between FastSAM and MobileSAM are:\n\n1. **Speed**: MobileSAM is faster than FastSAM, with a processing time of 10ms compared to 40ms for FastSAM.\n2. **Size**: MobileSAM is smaller in terms of parameters (less 10M vs. 68M) and has fewer parameters overall.\n3. **Inference speed**: On a single GPU, MobileSAM takes 10ms to process an image, while FastSAM takes 40ms.\n\nAdditionally, the text mentions that:\n\n* FastSAM often fails to predict objects when the foreground prompt point is set too close to the background prompt point, which is not a problem for MobileSAM.\n* The mask proposal generated by FastSAM is sometimes difficult to interpret, whereas the mask proposal generated by MobileSAM is more accurate and easier to understand.\n\nOverall, it seems that MobileSAM is designed to be faster and more efficient than FastSAM, while still achieving good performance in terms of object detection and segmentation.'