### 4. WebBaseLoader
* WebBaseLoader is a LangChain document loader that fetches web pages by URL and extracts their text content.
* Under the hood it uses BeautifulSoup to parse the HTML and pull out the text.
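
To make the "parse HTML, keep the text" step concrete, here is a minimal sketch that uses only Python's standard-library `html.parser` as a stand-in for BeautifulSoup. It is not WebBaseLoader's actual implementation: the `TextExtractor` class, the `html_to_record` helper, the sample HTML, and the `example.com` URL are all hypothetical, and skipping `<script>`/`<style>` is a choice made for this sketch. The returned dict only mimics the `page_content` / `metadata` shape of a LangChain Document.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_record(html, source):
    """Hypothetical helper: return a Document-like dict for one page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {"page_content": "\n".join(parser.chunks), "metadata": {"source": source}}


# Inline sample so the sketch runs without network access.
sample_html = """
<html>
  <head><title>Demo product page</title><style>p { color: red; }</style></head>
  <body>
    <h1>Example Laptop</h1>
    <p>16 GB RAM, 256 GB SSD</p>
    <script>console.log("not part of the text");</script>
  </body>
</html>
"""

record = html_to_record(sample_html, source="https://example.com/demo")
print(record["page_content"])
```

The tags are discarded and only the text survives, which is roughly what ends up in `page_content` when a real page is loaded.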

### When to Use:
* Use it for blogs, news articles, and other public websites whose content is mostly static text.

In [3]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_groq import ChatGroq
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from dotenv import load_dotenv

# Load environment variables (e.g. GROQ_API_KEY, which ChatGroq reads) from a .env file
load_dotenv()

True

In [4]:
model = ChatGroq(model='llama-3.1-8b-instant')

In [5]:
prompt = PromptTemplate(
    template='Answer the following question \n {question} from the following text - \n {text}',
    input_variables=['question','text']
)

In [6]:
parser = StrOutputParser()

In [7]:
url = 'https://www.flipkart.com/apple-macbook-air-m2-16-gb-256-gb-ssd-macos-sequoia-mc7x4hn-a/p/itmdc5308fa78421'
loader = WebBaseLoader(url)

In [9]:
# load() returns a list of Document objects; a single URL yields a single Document
docs = loader.load()

In [10]:
# Fill the prompt, send it to the model, then parse the reply into a plain string
chain = prompt | model | parser

In [12]:
print(chain.invoke({'question':'What is the product that we are talking about?', 'text':docs[0].page_content}))

The product that we are talking about is the Apple MacBook AIR Apple M2 - (8 GB/256 GB SSD/Mac OS Monterey) MLY33HN/A.


### Limitations:
* Loads only static content: what is in the HTML the server returns, not what JavaScript adds after the page renders.
* Doesn’t handle JavaScript-heavy pages well (use SeleniumURLLoader for those).
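
The static-content limitation can be illustrated with a small sketch. It again uses the standard-library `html.parser` as a stand-in for a static HTML parser, so it approximates the behaviour rather than calling WebBaseLoader itself. The `BodyText` class, the `static_text` helper, and both sample pages are hypothetical.

```python
from html.parser import HTMLParser


class BodyText(HTMLParser):
    """Gather text outside <script>/<style>, as a static parser would see it."""

    def __init__(self):
        super().__init__()
        self.in_code = False  # True while inside <script> or <style>
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.in_code = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.in_code = False

    def handle_data(self, data):
        if not self.in_code and data.strip():
            self.parts.append(data.strip())


def static_text(html):
    """Return the text present in the raw HTML, without running any JavaScript."""
    parser = BodyText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical page that ships its content in the HTML itself.
static_page = "<html><body><h1>Example Laptop</h1><p>In stock</p></body></html>"

# Hypothetical page that ships an empty shell and fills it in with JavaScript.
js_page = """
<html><body>
  <div id="root"></div>
  <script>
    document.getElementById("root").textContent = "Example Laptop";
  </script>
</body></html>
"""

static_result = static_text(static_page)
js_result = static_text(js_page)
print(repr(static_result))  # text is found
print(repr(js_result))      # empty: the content only exists after the script runs
```

When a loaded page comes back with little or no text, this is a likely cause, and a browser-driven loader such as SeleniumURLLoader is the usual alternative.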