In [3]:
import os
import pickle
import time
import langchain
from langchain.document_loaders import UnstructuredURLLoader

In [4]:
loaders = UnstructuredURLLoader(urls=[
    "https://www.techtarget.com/whatis/definition/Amazon",
    "https://en.wikipedia.org/wiki/Flipkart"
])

data = loaders.load()
len(data)

2

#### Split the documents into chunks

In [11]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=300
)

# `data` is a list of Document objects, so split_documents (rather than split_text) can chunk it directly.
docs = text_splitter.split_documents(data)
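
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line breaks, so its real chunks will differ. The function name `sliding_chunks` and the sample string are hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
```

In this simplified model, the notebook's settings (`chunk_size=1000`, `chunk_overlap=300`) would advance each window by 700 characters, so neighbouring chunks repeat roughly the last 300 characters of the previous one.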

In [12]:
len(docs)

135

In [13]:
docs[0]

Document(metadata={'source': 'https://www.techtarget.com/whatis/definition/Amazon'}, page_content="WhatIs\n\nBrowse Definitions :\n\nA\n\nB\n\nC\n\nD\n\nE\n\nF\n\nG\n\nH\n\nI\n\nJ\n\nK\n\nL\n\nM\n\nN\n\nO\n\nP\n\nQ\n\nR\n\nS\n\nT\n\nU\n\nV\n\nW\n\nX\n\nY\n\nZ\n\n#\n\nHome\n\nPersonal computing\n\nDefinition\n\nWhat is Amazon? Definition and company history of Amazon.com\n\nShare this item with your network:\n\nBy\n\nRahul Awati\n\nKinza Yasar, Technical Writer\n\nPublished: Apr 23, 2025\n\nAmazon (Amazon.com) is the world's largest online retailer and one of the largest providers of cloud services. As of 2025, it is considered a giant in both e-commerce and cloud computing.\n\nHeadquartered in Seattle, Amazon has individual websites, software development centers, customer service centers, data centers and fulfillment centers around the world. The company was founded by Jeff Bezos in 1994; he remained its CEO and president until 2021. As of 2025, he remains Amazon's executive chair and 

In [14]:
docs[5]

Document(metadata={'source': 'https://www.techtarget.com/whatis/definition/Amazon'}, page_content='The early 2000s\n\nIn October 2000, Amazon launched its first overseas operations in Japan. The Japanese Amazon marketplace remains accessible at www.amazon.co.jp.\n\nIn the last quarter of 2001, Amazon turned a profit for the first time in its history. Then, 2003 became its first profitable year.\n\nThroughout the 2000s, Amazon expanded its original e-commerce bookstore to include hundreds of other products in many other categories, such as software, personal care, music goods, gourmet foods, sporting goods, and photography items like cameras. The expansion of its product range continued over the next few years, with the addition of newer categories like jewelry, baby products and vehicles.')

In [15]:
docs[83]

Document(metadata={'source': 'https://en.wikipedia.org/wiki/Flipkart'}, page_content="In November 2021, Flipkart Health acquired Sastasundar Healthbuddy for undisclosed sum.[205][206]\n\nFunding\n\n[edit]\n\nThey received ₹33.4 crore funding from Japan-based pharmaceutical company Rohto Pharmaceutical.[207] In Aug 2019, Mitsubishi Corporation invested ₹100 crore in the company.[208]\n\nCriticism\n\n[edit]\n\nOn 13 September 2014, a Flipkart deliveryman allegedly molested a housemaid in Hyderabad.[209] The housemaid's employer sued Flipkart for this incident, citing the need for regulations to make offline delivery services safer.[210]\n\nIn 2014, competitors such as Future Group (owner of retail chain Big Bazaar at that time) filed complaints with India's Ministry of Commerce and Industry, alleging that Flipkart's Big Billion Days discounts undercut prices in a manner predatory to other retailers. The ministry stated that it would look into the complaints.[211][212][213]")

In [16]:
docs[99]

Document(metadata={'source': 'https://en.wikipedia.org/wiki/Flipkart'}, page_content='^ "Flipkart acquires gaming startup Mech Mocha". The Economic Times. Retrieved 16 August 2022.\n\n^ Tiwary, Avanish (4 November 2020). "Flipkart acquires gaming startup Mech Mocha to expand customer base". KrASIA. Retrieved 16 August 2022.\n\n^ Srivastava, Moulishree (17 November 2020). "Flipkart\'s acquisition of augmented reality startup Scapic aimed at enhancing customer experience". KrASIA. Retrieved 16 August 2022.\n\n^ "Flipkart adds Gujarati; app now available in eight Indian languages". Business Today. 9 April 2021. Retrieved 11 September 2022.\n\n^ "Flipkart acquires Cleartrip as part of its diversification drive". The Economic Times. Retrieved 21 June 2022.\n\n^ Abrar, Peerzada (7 September 2022). "Flipkart launches Flipkart Hotels for domestic, international market". www.business-standard.com. Retrieved 8 February 2023.\n\n^ BL Mumbai Bureau (21 February 2022). "Wego to Acquire Cleartrip\'s

#### Create embeddings for the chunks and store them in a FAISS index

In [17]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings()

# Pass the documents and the embedding model to build the FAISS vector index
vectorindex = FAISS.from_documents(docs, embeddings)
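
A vector index answers a query by comparing the query's embedding with the stored chunk embeddings. The sketch below illustrates the idea with tiny hand-written vectors and cosine similarity. The vectors, chunk labels, and the `top_k` helper are made up for illustration; a real FAISS index uses its own configured distance metric and an optimized search rather than this loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the labels of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), label)
              for label, vec in indexed.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _, label in scored[:k]]


# Hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions.
toy_index = {
    "chunk about cloud services": [0.9, 0.1, 0.0],
    "chunk about online retail": [0.1, 0.9, 0.1],
    "chunk about funding": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]
nearest = top_k(query_vec, toy_index, k=2)
print(nearest)
```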




In [18]:
# Store the vector index in a local file
file_path = "vector_index.pkl"

with open(file_path, "wb") as f:
    pickle.dump(vectorindex, f)

In [19]:
if os.path.exists(file_path):
    with open(file_path, "rb") as f:
        vectorIndex = pickle.load(f) 
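
If the pickle file is missing, the cell above silently skips the load and `vectorIndex` stays undefined, which only surfaces later as a `NameError`. Below is a minimal sketch of a loader that fails early instead. `load_pickled` is a hypothetical helper, and the demonstration pickles a plain dictionary as a stand-in for the index. Only unpickle files you created yourself, since `pickle.load` can execute arbitrary code.

```python
import os
import pickle
import tempfile


def load_pickled(path):
    """Load a pickled object, raising a clear error if the file is absent."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"No saved index at {path!r}; build and save it first")
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in object; in the notebook this would be the FAISS vector index.
    demo_path = os.path.join(tmp_dir, "demo_index.pkl")
    with open(demo_path, "wb") as f:
        pickle.dump({"stand_in": "index"}, f)
    restored = load_pickled(demo_path)

    # A missing file now fails immediately with a descriptive message.
    try:
        load_pickled(os.path.join(tmp_dir, "missing.pkl"))
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

print(restored, missing_raised)
```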

### Retrieve similar chunks for a given question and call the LLM to generate the final answer

In [77]:
from langchain_huggingface.llms.huggingface_endpoint import HuggingFaceEndpoint
import os
from dotenv import load_dotenv

load_dotenv()
hf_token = os.getenv("HF_TOKEN")

HUGGINGFACEHUB_API_TOKEN = hf_token

hf_endpoint = HuggingFaceEndpoint(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    huggingfacehub_api_token=HUGGINGFACEHUB_API_TOKEN,
    timeout=60,
    temperature=0.7,
    max_new_tokens=512
)
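
`os.getenv` returns `None` when `HF_TOKEN` is not set, and a missing token can then show up as a less obvious authentication error later on. A small sketch of an early check follows; `require_env` is a hypothetical helper, and the `DEMO_TOKEN` variable with its placeholder value exists only for this demonstration.

```python
import os


def require_env(name):
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:  # covers both an unset variable and an empty string
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demonstration with a placeholder value, not a real token.
os.environ["DEMO_TOKEN"] = "hf_example"
token = require_env("DEMO_TOKEN")

os.environ.pop("DEMO_TOKEN_MISSING", None)
try:
    require_env("DEMO_TOKEN_MISSING")
    unset_raised = False
except RuntimeError:
    unset_raised = True

print(token, unset_raised)
```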


In [70]:
from langchain_huggingface.chat_models.huggingface import ChatHuggingFace

chat_llm = ChatHuggingFace(
    llm=hf_endpoint,
    verbose=True
)


In [71]:
from langchain.chains import RetrievalQAWithSourcesChain

chain = RetrievalQAWithSourcesChain.from_llm(
    llm=chat_llm,
    retriever=vectorIndex.as_retriever()
)


### Test the chain with some queries

In [73]:
query = "which product categories contributed most to its gross merchandise volume during the sale?"

# Run the query
print(f"\nQuery: {query}")
result = chain({"question": query}, return_only_outputs=True)

print("\n=== RESULT ===")
print("Answer:", result.get('answer', 'No answer found'))
print("Sources:", result.get('sources', 'No sources found'))


Query: which product categories contributed most to its gross merchandise volume during the sale?

=== RESULT ===
Answer: There is no final answer provided as the given text does not contain sufficient information to determine which product categories contributed most to Flipkart's gross merchandise volume during a sale. Further context or external sources may be necessary to answer the question accurately.
Sources: 
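
Note that `result.get('sources', 'No sources found')` only falls back to the default when the key is missing. Here the chain apparently returned the key with an empty value, so nothing is printed after `Sources:`. The sketch below shows a formatter that treats empty values as missing too; `format_result` is a hypothetical helper, and the dictionaries are hand-made stand-ins for the chain output.

```python
def format_result(result):
    """Format a result dict, falling back when a value is missing or empty."""
    answer = (result.get("answer") or "").strip() or "No answer found"
    sources = (result.get("sources") or "").strip() or "No sources found"
    return f"Answer: {answer}\nSources: {sources}"


# Stand-in results: one with an empty sources string, one with a source URL.
empty_sources = format_result({"answer": "example answer", "sources": ""})
with_sources = format_result({
    "answer": "example answer",
    "sources": "https://en.wikipedia.org/wiki/Flipkart",
})
print(empty_sources)
print(with_sources)
```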


In [75]:
query = "How many cloud services provided?"

# Run the query
print(f"\nQuery: {query}")
result = chain({"question": query}, return_only_outputs=True)

print("\n=== RESULT ===")
print("Answer:", result.get('answer', 'No answer found'))
print("Sources:", result.get('sources', 'No sources found'))


Query: How many cloud services provided?

=== RESULT ===
Answer: 1. Based on the provided portions of the documents, it can be inferred that AWS offers at least three cloud services, including Amazon S3, Amazon Simple Queue Service, and Amazon EC2. However, it is unclear how many cloud services are provided in total as the information provided may not be exhaustive.

2. According to the text, as of 2025, the AWS ecosystem includes more than 200 cloud-based services used by millions of customers worldwide. Therefore, it can be inferred that AWS offers over 200 cloud services.

3. Based on the provided question and text, the cloud service provided for data archiving and storage is Amazon S3 Glacier. Other storage classes are also available for different access patterns and storage durations.
Sources: 


#### *In both queries I did not mention the company name; I just asked the question directly. The chain still recognized which company each question referred to and answered accordingly.*