## Webpage Loaders
- Load webpages and extract their text with `WebBaseLoader`, which parses the HTML using `BeautifulSoup`.
- Use an LLM to extract meaningful data from the page text.

### Project 1: Share Market Data Analysis Based on Global Cues
- We will extract data from stock market news websites and analyze it to understand the impact of global cues on the Indian share market.

#### Stock Market Data Extraction

In [23]:
from dotenv import load_dotenv

load_dotenv('../env')

True

In [1]:
from langchain_community.document_loaders import WebBaseLoader

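# Keep a comma after every entry: adjacent string literals without one
# are silently concatenated into a single URL.
urls = ['https://economictimes.indiatimes.com/markets/stocks/news',
        'https://www.livemint.com/latest-news',
        'https://www.livemint.com/latest-news/page-2',
        'https://www.livemint.com/latest-news/page-3',
        'https://www.moneycontrol.com/']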

USER_AGENT environment variable not set, consider setting it to identify your requests.
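
The warning above appears when `langchain_community` is imported without a `USER_AGENT` environment variable. A minimal sketch of one way to silence it, assuming the variable is read when the loader module is imported (the timing of the warning suggests so). The agent string is a made-up placeholder, not a value from this project.

```python
import os

# Hypothetical identifier; replace it with something that describes your project.
# setdefault leaves an existing value (for example one loaded from .env) untouched.
os.environ.setdefault("USER_AGENT", "market-news-notebook/0.1")

# Import the loader only after the variable is set, e.g.:
# from langchain_community.document_loaders import WebBaseLoader
```

Adding a `USER_AGENT=...` line to the env file loaded by `load_dotenv` should work as well, provided that cell runs before the import.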


In [2]:
loader = WebBaseLoader(web_paths=urls)

In [3]:
docs = []
async for doc in loader.alazy_load():
    docs.append(doc)

Fetching pages: 100%|##########| 4/4 [00:01<00:00,  3.62it/s]
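
Top-level `async for` works in a notebook because Jupyter already runs an event loop; in a plain Python script the same cell is a syntax error. The sketch below shows one way the collection step could be wrapped for a script. `collect` is a hypothetical helper, and `fake_pages` is a stand-in for `loader.alazy_load()` so the example runs without network access.

```python
import asyncio
from typing import AsyncIterator, List, TypeVar

T = TypeVar("T")


async def collect(items: AsyncIterator[T]) -> List[T]:
    """Drain an async iterator into a list (hypothetical helper)."""
    return [item async for item in items]


async def fake_pages() -> AsyncIterator[str]:
    """Stand-in for loader.alazy_load(); yields placeholder strings."""
    for name in ("page-1", "page-2"):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield name


pages = asyncio.run(collect(fake_pages()))
print(pages)  # ['page-1', 'page-2']
```

In a script, the equivalent of the cell above would then be `docs = asyncio.run(collect(loader.alazy_load()))`.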


In [4]:
def format_docs(docs):
    return "\n\n".join([x.page_content for x in docs])

In [5]:
context = format_docs(docs)

In [6]:
# print(context)
# context

import re

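def text_clean(text):
    # Collapse runs of blank lines and runs of tabs.
    text = re.sub(r'\n\n+', '\n\n', text)
    text = re.sub(r'\t+', '\t', text)
    # \s also matches newlines and tabs, so this last step turns every
    # whitespace run into one space and the result is a single line.
    text = re.sub(r'\s+', ' ', text)
    return text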
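
Because `\s` matches newlines and tabs as well as spaces, the last substitution in `text_clean` flattens the whole context onto one line. If paragraph breaks are worth keeping for the prompt (an assumption, not something the notebook requires), a variant could collapse only spaces and tabs. `text_clean_keep_paragraphs` is a hypothetical name used for illustration.

```python
import re


def text_clean_keep_paragraphs(text: str) -> str:
    """Variant of text_clean that keeps one blank line between paragraphs."""
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)    # drop spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # allow at most one blank line in a row
    return text.strip()


sample = "Nifty  up\t\t1%\n\n\n\nSensex \n flat"
cleaned = text_clean_keep_paragraphs(sample)
print(repr(cleaned))  # 'Nifty up 1%\n\nSensex\nflat'
```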

In [7]:
context = text_clean(context)

In [8]:
print(context)



#### Stock Market Data Processing with LLM

In [9]:
from scripts import llm
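
`scripts.llm` is a local module that is not shown here. Purely as an illustration of the shape a function like `ask_llm(context, question)` might have, the sketch below builds a prompt and passes it to a model callable. `build_prompt`, `ask_llm_sketch`, the prompt wording, and the `complete` parameter are all assumptions; the real module may work differently.

```python
from typing import Callable


def build_prompt(context: str, question: str) -> str:
    """Combine the scraped text and the instruction into one prompt."""
    return (
        "Answer using only the context below.\n\n"
        f"### Context\n{context}\n\n"
        f"### Question\n{question}\n"
    )


def ask_llm_sketch(context: str, question: str,
                   complete: Callable[[str], str]) -> str:
    """Hypothetical stand-in for llm.ask_llm; `complete` calls the model."""
    return complete(build_prompt(context, question)).strip()


# A fake model so the sketch runs without any LLM backend.
def echo_model(prompt: str) -> str:
    return f"received {len(prompt)} characters "


answer = ask_llm_sketch("Gold hit a record high.", "Extract the news.", echo_model)
print(answer)
```

Passing the model in as a callable keeps the prompt logic testable without a running LLM.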

In [10]:
# response = llm.ask_llm(context, "What is today's news?")
# This call can take 7 to 8 minutes.
response = llm.ask_llm(context, "Extract stock market news from the given text.")


In [11]:
print(response)

Here are the extracted stock market news:

1. SEBI’s market call, IT slowdown, Starlink’s pricing puzzle, and more | Moneycontrol Editor's Picks
2. Flipkart’s Super.Money reshapes UPI race, Narayana Murthy criticizes shallow AI & Starlink’s space sector impact | MC Tech3
3. Will softer-than-expected US and domestic CPI bring cheer to Nifty, Sensex? | Market Minutes

These news articles appear to be related to the Indian stock market, specifically discussing trends and developments in the IT sector, e-commerce, and inflation.


In [12]:
response = llm.ask_llm(context[:10_000], "Extract stock market news from the given text.")

In [13]:
print(response)

Here are some of the stock market news mentioned in the text:

1. **IndiGo (InterGlobe Aviation) is a top pick for 2025**: Market expert Hemang Jani is bullish on IndiGo, citing infrastructure growth, stable crude prices, and operational efficiency as key drivers.

2. **JaiPrakash Associates and Gensol Engineering are stocks to sell**: These two smallcap stocks saw massive declines of nearly 50% or more during market corrections, making them vulnerable stocks for investors to consider selling.

3. **Market corrections offer opportunities for quality assets**: Seasoned investor Anshul Saigal views recent market corrections as valuable opportunities to buy quality assets at reasonable valuations during downturns.

4. **SpiceJet is a stock with growth potential**: Market expert Hemang Jani is optimistic on SpiceJet, citing infrastructure growth, stable crude prices, and operational efficiency as key drivers.

5. **India stocks are expected to outperform US peers**: Asia hedge funds perfor

In [14]:
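def chunk_text(text, chunk_size, overlap=100):
    # Split text into windows of chunk_size characters; consecutive
    # windows share `overlap` characters.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i + chunk_size])
    return chunks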
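
To see how the overlap behaves, it helps to run the same sliding window on a short string. The function is repeated here only so the example runs on its own.

```python
def chunk_text(text, chunk_size, overlap=100):
    # Same sliding-window logic as the cell above.
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i + chunk_size])
    return chunks


demo = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(demo)  # ['abcd', 'defg', 'ghij', 'j']
```

Each chunk starts with the last character of the previous one. Note the final `'j'`: the window can end with a short fragment that the previous chunk already covers, which costs one extra LLM call per document.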

In [15]:
chunks = chunk_text(context, 10_000)

In [16]:
question = "Extract stock market news from the given text."

chunk_summary = []
for chunk in chunks:
    response = llm.ask_llm(chunk, question)
    chunk_summary.append(response)
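
Each call is slow, so an exception partway through this loop would discard every summary collected so far. A sketch of a more forgiving loop, assuming `ask` takes `(context, question)` like `llm.ask_llm` does; `summarize_chunks` and `flaky_ask` are hypothetical names, and the fake backend exists only so the example runs offline.

```python
from typing import Callable, List, Tuple


def summarize_chunks(chunks: List[str], question: str,
                     ask: Callable[[str, str], str]) -> Tuple[List[str], List[int]]:
    """Ask the same question of every chunk; record failures instead of stopping."""
    summaries: List[str] = []
    failed: List[int] = []
    for index, chunk in enumerate(chunks):
        try:
            summaries.append(ask(chunk, question))
        except Exception as error:  # broad on purpose: any backend error is skipped
            print(f"chunk {index} failed: {error}")
            failed.append(index)
    return summaries, failed


# Fake backend that fails on empty input.
def flaky_ask(context: str, question: str) -> str:
    if not context:
        raise ValueError("empty chunk")
    return context.upper()


summaries, failed = summarize_chunks(["nifty up", "", "gold high"], "Extract news.", flaky_ask)
print(summaries, failed)  # ['NIFTY UP', 'GOLD HIGH'] [1]
```

The returned indices make it possible to retry only the chunks that failed.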

In [17]:
for chunk in chunk_summary:
    print(chunk)
    print("\n\n")
    break

Here are the extracted stock market news:

1. Asia shares rose on Friday, driven by the likely aversion of a U.S. government shutdown which boosted market sentiment.
2. Gold hit a record high as trade tensions escalated, prompting investors to seek safe-haven assets.
3. U.S. stock futures surged in response to the positive news from Congress.
4. India’s RAC industry is witnessing strong demand in FY25 despite compressor shortages.
5. Manufacturers have managed supply issues through alternative sourcing and inventory management.
6. The Indian AC market is expected to grow at a 19% CAGR, with industry players seeking BIS certification extensions for key components.
7. Asia hedge funds performed better than U.S. counterparts during the March market selloff due to the strong performance of Chinese stocks.
8. Global and U.S. hedge funds faced significant losses, while Asia funds experienced smaller declines.
9. India’s stock market is cautious despite slide, with Bernstein predicting no ups

In [18]:
summary = "\n\n".join(chunk_summary)

In [19]:
# print(summary)

In [20]:
# question = "Write a detailed report in Markdown from the given context."
question = """Write a detailed market news report in markdown format. Think carefully then write the report."""
response = llm.ask_llm(summary, question)

In [21]:
import os

os.makedirs("data", exist_ok=True)

# Write as UTF-8 so non-ASCII characters in the LLM output are saved correctly.
with open("data/report.md", "w", encoding="utf-8") as f:
    f.write(response)

In [22]:
with open("data/summary.md", "w", encoding="utf-8") as f:
    f.write(summary)