In [None]:
%pip install langchain langchain-community langchain-chroma langchain-text-splitters pypdf python-dotenv

### LangSmith

Many applications built with LangChain contain multiple steps and several LLM calls. As these applications grow more complex, it becomes crucial to inspect exactly what is going on inside a chain or agent. LangSmith helps here: with tracing enabled, it records each run as a trace that shows every step along with its inputs and outputs.


In [24]:
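import os
from dotenv import load_dotenv

# Load variables such as LANGCHAIN_API_KEY from a local .env file into os.environ
load_dotenv()

# Turn on LangSmith tracing; the API key is read from the environment
os.environ["LANGCHAIN_TRACING_V2"] = "true"
LANGCHAIN_API_KEY = os.getenv("LANGCHAIN_API_KEY")
if not LANGCHAIN_API_KEY:
    print("Warning: LANGCHAIN_API_KEY is not set, so traces will not reach LangSmith.")

# Local Ollama model used for both generation and embeddings
MODEL = "llama3"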
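LangSmith tracing is switched on purely through environment variables. A minimal sketch, assuming the two variables used in the cell above are the ones that matter (a project name can optionally be set with `LANGCHAIN_PROJECT`), that reports which settings are missing before any chain runs; `missing_tracing_vars` and `tracing_enabled` are hypothetical helpers, not LangSmith APIs:

```python
import os
from typing import Mapping

# Environment variables this notebook relies on for LangSmith tracing (assumed set)
REQUIRED_TRACING_VARS = ("LANGCHAIN_TRACING_V2", "LANGCHAIN_API_KEY")


def missing_tracing_vars(env: Mapping[str, str]) -> list[str]:
    """Return the names of required tracing variables that are unset or empty."""
    return [name for name in REQUIRED_TRACING_VARS if not env.get(name)]


def tracing_enabled(env: Mapping[str, str]) -> bool:
    """Treat tracing as on only if the flag is "true" and every required variable is set."""
    flag_on = env.get("LANGCHAIN_TRACING_V2", "").lower() == "true"
    return flag_on and not missing_tracing_vars(env)


# Check the real environment of this notebook
missing = missing_tracing_vars(os.environ)
if tracing_enabled(os.environ):
    print("LangSmith tracing is configured")
else:
    print(f"Tracing is off or incomplete; missing: {missing}")
```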

### Define the model and embeddings

Let's create a simple indexing pipeline and RAG chain using llama3 running locally through Ollama.

In [27]:
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings

In [28]:
model = Ollama(model=MODEL)
embeddings = OllamaEmbeddings(model=MODEL)

In [29]:
# Test whether the model is working or not
model.invoke("tell me a joke in nepali")

'Here\'s a Nepali joke for you:\n\nको चाहिन्छ कि हामी सिधे? (Ko chaahinch kati haami sidhe?)\n\nTranslation: "Who wants to sleep when we can dream?"\n\n(Note: "सिधे" means "to sleep", and the joke is playing with the idea that it\'s not just about sleeping, but also about having dreams and aspirations)\n\nI hope you enjoyed it!'
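`model.invoke` sends the prompt to a locally running Ollama server. If the call hangs or fails, it can help to query the server directly. A minimal standard-library sketch, assuming Ollama's default address `http://localhost:11434` and its `/api/generate` endpoint with streaming disabled; `build_generate_request` and `generate` are hypothetical helper names:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default Ollama address


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text (needs a running server)."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        # With "stream": False the server replies with one JSON object
        return json.loads(resp.read())["response"]


# Usage (requires `ollama serve` and a pulled llama3 model):
# print(generate("llama3", "tell me a joke in nepali"))
```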

### Import necessary libraries

In [32]:
#@ Importing libraries
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter

### STEP 1: Load the document

In [42]:
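#@ STEP 1: Load the Document
parser = StrOutputParser()  # turns the model output into a plain string (used in STEP 3)
loader = PyPDFLoader("demo.pdf")

# loader.load() returns one Document per PDF page.
# loader.load_and_split() would additionally split the pages into chunks;
# here the chunking is done explicitly in STEP 2.
# pages = loader.load_and_split()
docs = loader.load()
docs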


[Document(page_content='AI ML DS Data Science Data Analysis Data Visualization Machine Learning Deep Learning NLP Compute\nNatural Language Processing (NLP) –\nOverview\nLast Updated : 26 May, 2024\nThe meaning of NLP is Natural Language Processing (NLP) which is a\nfascinating and rapidly evolving field that intersects computer science, artificial\nintelligence, and linguistics. NLP focuses on the interaction between computers\nand human language, enabling machines to understand, interpret, and\ngenerate human language in a way that is both meaningful and useful. With\nthe increasing volume of text data generated every day, from social media\nposts to research articles, NLP has become an essential tool for extracting\nvaluable insights and automating various tasks.\nNatural Language Processing\nIn this article, we will explore the fundamental concepts and techniques of\nNatural Language Processing, shedding light on how it transforms raw text\ninto actionable information. From tokeniz

### STEP 2: Split the texts and build the vector store

In [58]:
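#@ STEP 2: Split texts

# Chunks of up to 1000 characters, with up to 200 characters shared by neighboring chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
# print(splits)

# Embed each chunk with the Ollama embedding model and store it in a Chroma collection
vector_store = Chroma.from_documents(splits, embedding=embeddings)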
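`chunk_size=1000` caps each chunk at roughly 1,000 characters, and `chunk_overlap=200` lets consecutive chunks share up to 200 characters, so a sentence cut at a boundary can still appear whole in one of them. `RecursiveCharacterTextSplitter` is smarter than a fixed window (it prefers to break on paragraph, line, and word separators), so the following is only a simplified sliding-window sketch of the size/overlap arithmetic; `sliding_window_chunks` is a hypothetical helper:

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows whose starts are
    chunk_size - chunk_overlap characters apart. This is a simplification:
    RecursiveCharacterTextSplitter also tries to respect separators."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

With the notebook's settings the window advances 800 characters at a time, so a 2,600-character page yields three chunks under this simplified model.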

### STEP 3: Retrieve relevant snippets and generate an answer
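Conceptually, the retriever used in this step embeds the question with the same embedding model used for the chunks and returns the chunks whose vectors are closest to it. A minimal sketch of that idea, assuming cosine similarity and substituting toy bag-of-words counts for the Ollama embeddings (Chroma's actual index, distance metric, and defaults may differ); `embed`, `cosine`, and `top_k` are hypothetical helpers:

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for OllamaEmbeddings)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question, best first."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


chunks = [
    "NLP stands for Natural Language Processing.",
    "Speech processing converts audio into text.",
    "Topic modeling groups documents by theme.",
]
print(top_k("What is speech processing?", chunks, k=1))
# → ['Speech processing converts audio into text.']
```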

In [49]:
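#@ STEP 3: Retrieve relevant snippets from the PDF and generate an answer.

#@ Let's create a prompt for the model
from langchain_core.prompts import PromptTemplate

template = """
Answer the question based on the context below. If the context is not relevant, just reply "I don't know"

context: {context}

question: {question}
"""

# from_template infers the input variables ("context", "question") from the template
prompt = PromptTemplate.from_template(template)
# print(prompt.format(context="Here is the context", question="Here is the question"))

# Returns the chunks most similar to the question
retriever = vector_store.as_retriever()


def format_docs(docs):
    # Join the retrieved chunks into a single context string
    return "\n\n".join(doc.page_content for doc in docs)


# The question string goes both to the retriever (to build the context)
# and, unchanged, to the prompt; the filled prompt goes to the model,
# and the parser returns the output as a plain string.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)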
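In LangChain Expression Language, `|` feeds each step's output into the next step, and the leading dict runs both of its branches on the same input to build the prompt variables. The sketch below mirrors the data flow of `rag_chain` with plain functions so it runs without Chroma or Ollama; `fake_retriever`, `fake_model`, `build_prompt`, and `rag_chain_sketch` are hypothetical stand-ins, not LangChain APIs:

```python
def fake_retriever(question: str) -> list[str]:
    """Hypothetical stand-in for vector_store.as_retriever(): returns fixed snippets."""
    return ["NLP stands for Natural Language Processing.", "It combines linguistics and AI."]


def format_docs(docs: list[str]) -> str:
    """Join snippets with blank lines (like format_docs above, minus .page_content)."""
    return "\n\n".join(docs)


TEMPLATE = """
Answer the question based on the context below. If the context is not relevant, just reply "I don't know"

context: {context}

question: {question}
"""


def fake_model(prompt_text: str) -> str:
    """Hypothetical stand-in for the Ollama LLM: echoes the first context line."""
    for line in prompt_text.splitlines():
        if line.startswith("context: "):
            return line[len("context: "):]
    return "I don't know"


def build_prompt(question: str) -> str:
    # {"context": retriever | format_docs, "question": RunnablePassthrough()}:
    # both branches receive the same question string
    inputs = {
        "context": format_docs(fake_retriever(question)),
        "question": question,  # passed through unchanged
    }
    # | prompt: fill in the template variables
    return TEMPLATE.format(**inputs)


def rag_chain_sketch(question: str) -> str:
    # | model, then | parser (StrOutputParser hands back the text as a plain string)
    return str(fake_model(build_prompt(question)))


print(rag_chain_sketch("What is the meaning of NLP"))
# → NLP stands for Natural Language Processing.
```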

In [50]:
rag_chain.invoke("What is the meaning of NLP")

'The meaning of NLP is "Natural Language Processing", which is a fascinating and rapidly evolving field that intersects computer science, artificial intelligence, and linguistics.'

In [55]:
#@ Let's create a list of questions
questions = [
    "What is the meaning of NLP?",
    "What are the key NLP techniques?",
    "what do you understand by speech processing?",
]

for question in questions:
    print(f"Question: {question}")
    # Pass the plain question string: the chain's first step expects a string,
    # which is embedded by the retriever and forwarded to the prompt as-is
    print(f"Answer: {rag_chain.invoke(question)}")
    print()
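The loop above answers the questions one at a time. LCEL runnables also provide a `batch` method (e.g. `rag_chain.batch(questions)`) that runs several inputs concurrently and returns the answers in input order. A minimal standard-library sketch of the same idea, assuming a thread pool is reasonable because each call mostly waits on the Ollama server; `answer` is a hypothetical stand-in for `rag_chain.invoke`:

```python
from concurrent.futures import ThreadPoolExecutor


def answer(question: str) -> str:
    """Hypothetical stand-in for rag_chain.invoke(question)."""
    return f"Answer to: {question}"


def batch_answers(questions: list[str], max_workers: int = 4) -> list[str]:
    """Answer all questions concurrently; executor.map preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(answer, questions))


questions = [
    "What is the meaning of NLP?",
    "What are the key NLP techniques?",
    "what do you understand by speech processing?",
]
for question, reply in zip(questions, batch_answers(questions)):
    print(f"Question: {question}\nAnswer: {reply}\n")
```

Whether this is actually faster depends on how many requests the local Ollama server processes at once; if it handles one at a time, the requests simply queue.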

Question: What is the meaning of NLP?
Answer: Natural Language Processing (NLP) is the ability of a computer program to understand, interpret, and generate human language, with the goal of enabling computers to process and understand natural language data in much the same way that humans do.

Question: What are the key NLP techniques?
Answer: The key NLP techniques include:

1. Text Processing and Preprocessing
2. Word Sense Disambiguation (WSD)
3. Coreference Resolution
4. Information Extraction
5. Entity Extraction
6. Relation Extraction
7. Text Classification
8. Sentiment Analysis
9. Topic Modeling
10. Spam Detection

These techniques are categorized into several broad areas, each addressing different aspects of language processing, such as understanding natural language, identifying entities and relationships, classifying text, generating text, and more.

Question: what do you understand by speech processing?
Answer: Based on the context, I would answer that "Speech Processing" is 

In [56]:
#@ Delete the Chroma collection (and its stored embeddings) when it is no longer needed
vector_store.delete_collection()