In [1]:
import streamlit as st
from langchain_ollama import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.text_splitter import CharacterTextSplitter
from langchain import OpenAI
from langchain.document_loaders import SeleniumURLLoader
from langchain import PromptTemplate



In [2]:
urls = ['https://www.pcmag.com/how-to/how-to-build-a-pc-the-ultimate-beginners-guide',
        'https://www.pcmag.com/how-to/how-to-see-your-frames-per-second-fps-in-games',
        'https://www.pcmag.com/how-to/how-to-download-youtube-videos',
        'https://www.pcmag.com/how-to/how-to-connect-airpods-to-your-laptop',
        'https://www.pcmag.com/how-to/how-to-access-your-wi-fi-routers-settings',
        'https://www.geeksforgeeks.org/basic-linux-commands/',
       ]

In [3]:
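# load the pages with a Selenium-driven browser
loader = SeleniumURLLoader(urls=urls)
docs_unsplit = loader.load()

# split the documents into chunks of roughly 1000 characters;
# CharacterTextSplitter only splits on its separator ("\n\n" by default),
# so a paragraph longer than chunk_size is kept whole
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(docs_unsplit)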

Created a chunk of size 1598, which is longer than the specified 1000
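
The warning above appears because a splitter that cuts only at one separator cannot break a paragraph that contains no separator. The following is a minimal sketch of that behavior in plain Python. It is a simplified stand-in, not LangChain's actual implementation, and the function name `split_on_separator` and the sample text are hypothetical.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Toy splitter: greedily merge separator-delimited pieces up to chunk_size."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single piece longer than chunk_size is kept whole:
            # there is no separator inside it to split on.
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Hypothetical input: two short paragraphs around one 1598-character paragraph.
text = "short paragraph" + "\n\n" + "x" * 1598 + "\n\n" + "another short one"
chunk_lengths = [len(c) for c in split_on_separator(text, chunk_size=1000)]
print(chunk_lengths)
```

If strict size limits matter, LangChain's RecursiveCharacterTextSplitter, which falls back to finer separators, may be a better fit; whether it suits these pages is untested here.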


In [4]:
import os

from dotenv import load_dotenv
from langchain.embeddings import HuggingFaceEmbeddings

# load the Activeloop token from a local env file
load_dotenv("keys.env")
ACTIVELOOP_TOKEN = os.getenv("ACTIVELOOP_TOKEN")

# embedding model used for both indexing and querying
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# create (or load) the Deep Lake dataset
# TODO: use your organization id here (by default, the org id is your username)
my_activeloop_org_id = "akashghanathey"
my_activeloop_dataset_name = "langchain_course_customer_support"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"
# `embedding` replaces the deprecated `embedding_function` argument
db = DeepLake(dataset_path=dataset_path, embedding=embeddings)

# add documents to the Deep Lake dataset (only needed the first time)
#db.add_documents(docs)


Deep Lake Dataset in hub://akashghanathey/langchain_course_customer_support already exists, loading from the storage


In [5]:
query = "Can you give me few linux commands"
docs = db.similarity_search(query)
print(docs[0].page_content)

Here we have put all Basic Linux Commands that every Linux user (as a beginner in 2025) should know. These are not all that you should know, but these are the basic and most commonly used commands.
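
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below illustrates that ranking step with hand-made three-dimensional vectors and cosine similarity. Both the vectors and the choice of cosine similarity are assumptions for illustration; the metric Deep Lake actually applies depends on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the names of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical toy embeddings; real ones from all-MiniLM-L6-v2 have 384 dimensions.
toy_vectors = {
    "linux commands": [0.9, 0.1, 0.0],
    "build a pc": [0.1, 0.9, 0.1],
    "wifi router": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]
ranked = top_k(toy_query, toy_vectors, k=2)
print(ranked)
```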


In [6]:
template = """You are an exceptional customer support chatbot that gently answers questions.

You know the following context information.

{chunks_formatted}

Answer the following question from a customer. Use only information from the context above. Do not make anything up.

Question: {query}

Answer:"""

prompt = PromptTemplate(
    input_variables=["chunks_formatted", "query"],
    template=template,
)

In [7]:
query = "How to check disk usage in linux?"

# retrieve relevant chunks
docs = db.similarity_search(query)
retrieved_chunks = [doc.page_content for doc in docs]

# format the prompt
chunks_formatted = "\n\n".join(retrieved_chunks)
prompt_formatted = prompt.format(chunks_formatted=chunks_formatted, query=query)

# set up the local model; the answer is generated in the next cell
llm_engine = ChatOllama(
    model="deepseek-r1:14b",
    temperature=0.3,
)

In [8]:
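# invoke() returns a message object, so print() shows its repr (content='...');
# use answer.content to get only the generated text
answer = llm_engine.invoke(prompt_formatted)
print(answer)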

content='<think>\nOkay, so I need to figure out how to answer the question "How to check disk usage in Linux?" based on the provided context. Let me go through the context step by step.\n\nLooking at the context, there\'s a section about the \'df\' command. It says that df stands for "disk free" and it\'s used to get details about the file system. The example shows using \'df -h\', which makes the output more readable in human-readable format, like showing sizes in MB or GB instead of bytes.\n\nI remember seeing other commands like \'du\' mentioned elsewhere, but in this context, only \'df\' is discussed for disk usage. So I should focus on that. The answer should mention using \'df -h\' to display the disk space information in a user-friendly way.\n\nWait, does the context explicitly say that \'df\' checks disk usage? Yes, it says "df command in Linux gets the details of the file system." So that\'s exactly what is needed for checking disk usage. I should explain how using \'df -h\' p
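
The reply starts with the model's reasoning inside a `<think>` block, which a customer should not normally see. One possible way to remove it is sketched below. It assumes the model closes the block with `</think>` (the truncated output above does not show this), and the helper name `strip_reasoning` and the sample string are hypothetical.

```python
import re


def strip_reasoning(text):
    """Remove <think>...</think> blocks and trim the surrounding whitespace."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


# Hypothetical sample standing in for answer.content
sample = "<think>\nThe context mentions df -h.\n</think>\n\nUse `df -h` to check disk usage."
cleaned = strip_reasoning(sample)
print(cleaned)
```

In the notebook this would be applied to `answer.content` rather than to `answer` itself, since the latter is a message object and not a string.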

In [9]:
print(docs)

[Document(metadata={'source': 'https://www.geeksforgeeks.org/basic-linux-commands/', 'title': '25 Basic Linux Commands For Beginners [2025] - GeeksforGeeks', 'description': 'Basic Linux commands are essential for beginners to efficiently perform tasks and manage files in a Linux-based operating system.', 'language': 'en-US'}, page_content='25 Most-Commonly Used Linux Commands\n\n1. Is command in Linux\n\nThe ls command is commonly used to identify the files and directories in the working directory. This command is one of the many often-used Linux commands that you should know.\n\nThis command can be used by itself without any arguments and it will provide us the output with all the details about the files and the directories in the current working directory. There is a lot of flexibility offered by this command in terms of displaying data in the output. Check the below image for the output.\n\n2. pwd command in Linux\n\nThe pwd command is mostly used to print the current working direct

In [10]:
print(chunks_formatted)

25 Most-Commonly Used Linux Commands

1. Is command in Linux

The ls command is commonly used to identify the files and directories in the working directory. This command is one of the many often-used Linux commands that you should know.

This command can be used by itself without any arguments and it will provide us the output with all the details about the files and the directories in the current working directory. There is a lot of flexibility offered by this command in terms of displaying data in the output. Check the below image for the output.

2. pwd command in Linux

The pwd command is mostly used to print the current working directory on your terminal. It is also one of the most commonly used commands.

wc -m shows the number of characters present in a file

Let’s see one example of these options

Command:

Output:

Here we used the touch command to create a text file and then used the echo command to input a sentence that contains six words and we used the wc -w command to ca