<a href="https://colab.research.google.com/github/kunalom/stock_analysis_RAG/blob/main/stock_predictor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!python --version

Python 3.11.13


# Installing dependencies

In [None]:
%pip install langchain langchain_huggingface langchain_community faiss-cpu pypdf beautifulsoup4 requests python-dotenv


In [None]:
import os

os.environ["HUGGINGFACEHUB_API_TOKEN"] = "your_hugging_face_api_token"  # replace with your own token
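Hardcoding the token in a cell makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the token is either already set in the environment or typed at a hidden prompt. `get_hf_token` is a hypothetical helper written for this example, not part of LangChain or the Hugging Face libraries:

```python
import os
from getpass import getpass


def get_hf_token(env=os.environ, prompt=getpass):
    """Return the Hugging Face token from `env`, prompting only if it is missing.

    Hypothetical helper for illustration; `env` and `prompt` are injectable
    so the function can be exercised without real input.
    """
    token = env.get("HUGGINGFACEHUB_API_TOKEN", "").strip()
    if not token:
        # getpass hides the typed value so it never shows up in the cell output
        token = prompt("Hugging Face API token: ").strip()
    if not token:
        raise ValueError("No Hugging Face API token provided")
    return token


# Usage in the notebook (commented out because it waits for input):
# os.environ["HUGGINGFACEHUB_API_TOKEN"] = get_hf_token()
```

With this approach the token value itself never has to be saved in the notebook file.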

## Checking that the model works

In [None]:
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
from dotenv import load_dotenv

load_dotenv()

llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Llama-3.1-8B-Instruct",
    task="text-generation"
)

model = ChatHuggingFace(llm=llm)

result = model.invoke("What is the capital of India?")

print(result.content)

The capital of India is New Delhi.


In [None]:
import requests
from bs4 import BeautifulSoup
# Integrations now live in langchain_community / langchain_huggingface;
# the old langchain.* paths are deprecated.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
# HuggingFaceEndpoint and ChatHuggingFace were already imported from
# langchain_huggingface in the model check above.

## Function to scrape Screener.in

In [None]:
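def scrape_screener_data(url):
    """Fetch a Screener.in page and return its visible text."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail early instead of parsing an error page
    soup = BeautifulSoup(response.text, "html.parser")

    # Drop elements that carry no visible text
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()

    text = soup.get_text(separator="\n", strip=True)
    return text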
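
`scrape_screener_data` returns the whole page as text, and that text is later pasted into the prompt, so a long page may exceed the model's context window. A minimal sketch of one way to cap it, assuming a simple character budget is good enough. `truncate_at_line` and its default of 6000 characters are hypothetical, not values taken from this notebook:

```python
def truncate_at_line(text, max_chars=6000):
    """Cut `text` to at most `max_chars` characters, ending on a whole line.

    Hypothetical helper; the 6000-character default is an arbitrary
    illustrative budget, not a limit of any particular model.
    """
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    last_newline = cut.rfind("\n")
    # Fall back to a hard cut when there is no line break inside the budget
    return cut[:last_newline] if last_newline > 0 else cut


sample = "ab\ncd\nef"
print(truncate_at_line(sample, max_chars=6))
```

Cutting at a line boundary avoids handing the model a half-written table row or label.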

## Function to create augmented prompt

In [None]:
def create_augmented_prompt(stock_data, question):
    return f"""You are a stock market expert trained on high-quality investment material.

Given the following stock information scraped from Screener.in and the investment strategies you've learned from the PDFs:

---
STOCK DATA:
{stock_data}

QUESTION:
{question}
"""

## Function to load and index PDFs

In [None]:
def load_and_index_pdfs(pdf_paths):
    documents = []
    for path in pdf_paths:
        loader = PyPDFLoader(path)
        docs = loader.load()
        documents.extend(docs)

    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_documents(documents)

    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    vectorstore = FAISS.from_documents(chunks, embeddings)
    return vectorstore
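
To illustrate what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-window chunker in plain Python. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph and line breaks). It is only a sketch of how overlap makes consecutive chunks share text, and `sliding_chunks` is a hypothetical name:

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split `text` into fixed-size windows that overlap by `chunk_overlap` characters.

    Simplified illustration only; hypothetical helper, not LangChain code.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
```

In this simplified model, the notebook's settings of 1000 and 200 would mean each chunk repeats the last 200 characters of the previous one, which helps keep a sentence that straddles a boundary retrievable from either side.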

## Function to create retriever

In [None]:
def create_retriever(vectorstore):
    return vectorstore.as_retriever()
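
Conceptually, the retriever embeds the query and returns the chunks whose vectors lie closest to it. A toy sketch of that nearest-neighbour step in plain Python, assuming squared Euclidean distance and tiny hand-written vectors. The real search is done by FAISS over the MiniLM embeddings; `top_k_nearest` and the value of `k` are hypothetical, chosen only for illustration:

```python
def top_k_nearest(query_vec, doc_vecs, k=4):
    """Return the indices of the `k` vectors in `doc_vecs` closest to `query_vec`.

    Hypothetical helper; brute-force search for illustration only.
    """
    def sq_dist(a, b):
        # Squared Euclidean distance between two equal-length vectors
        return sum((x - y) ** 2 for x, y in zip(a, b))

    ranked = sorted(range(len(doc_vecs)), key=lambda i: sq_dist(query_vec, doc_vecs[i]))
    return ranked[:k]


# Three made-up 2-D "embeddings"; index 1 matches the query exactly
docs = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]
print(top_k_nearest([1.0, 0.0], docs, k=2))
```

The indices returned correspond to the chunks that would be passed to the model as context.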


## Function to run RAG analysis

In [None]:
def run_rag_analysis(vectorstore, stock_url, model):
    stock_text = scrape_screener_data(stock_url)
    retriever = create_retriever(vectorstore)

    prompt = create_augmented_prompt(
        stock_data=stock_text,
        question="Should I invest in this stock? What are the pros and cons based on your knowledge?"
    )

    qa_chain = RetrievalQA.from_chain_type(llm=model, retriever=retriever, chain_type="stuff")
    answer = qa_chain.invoke(prompt)
    return answer

In [None]:
stock_url = "https://www.screener.in/company/SBILIFE/"

## Load PDFs and run RAG

In [None]:
pdfs = [
    "/content/Module 1_Introduction to Stock Markets.pdf",
    "/content/Module 3_Fundamental Analysis.pdf",
]
vectorstore = load_and_index_pdfs(pdfs)
result = run_rag_analysis(vectorstore, stock_url, model)

## Result

In [None]:
print(result['result'])


Based on the provided financial data and analysis tools, I can offer some insights on the pros and cons of investing in SBI Life Insurance Company Ltd. Please note that this is not personalized investment advice, and it's essential to do your own research and consider your financial goals and risk tolerance before making any investment decisions.

**Pros:**

1. **Debt-free company**: SBI Life Insurance Company Ltd has minimal debt, which reduces its financial risk and allows it to allocate more resources to investing and growing its business.
2. **Strong brand and presence**: As a joint venture between State Bank of India and BNP Paribas Cardif S.A., SBI Life Insurance has a well-established brand and wide reach, which can help attract customers and increase sales.
3. **Diversified product portfolio**: The company offers a range of life insurance products, including individual and group plans, which can help spread risk and provide opportunities for growth.
4. **Growing premium income*