<a href="https://colab.research.google.com/github/somesh-awasthi/NLP-PROJECT/blob/main/NLP_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%capture
!pip install langchain langchain-openai
!pip install ctransformers sentence-transformers langchain-chroma langchain
!pip install pandas nltk spacy PyPDF

In [2]:
#linking my drive with my notebook & giving the data path
# from google.colab import drive
# drive.mount("/content/drive")
path="../data"

In [3]:
from langchain.document_loaders import PyPDFLoader, DirectoryLoader
# Load documents from PDF
loader = DirectoryLoader(path, glob="Medical_book.pdf", loader_cls=PyPDFLoader)
documents = loader.load()

In [4]:
import re
import nltk
import spacy
import string
from nltk.corpus import stopwords

nltk.download('stopwords')

nlp = spacy.load("en_core_web_sm")

def preprocess_text(text):
    # Tokenization and POS tagging using SpaCy
    doc = nlp(text)

    # Filtering out tokens based on POS tags and dependency parsing
    filtered_tokens = [token.text.lower() for token in doc if token.pos_ not in ["SPACE", "X"] and token.dep_ not in ["det", "punct"]]

    # Stopword removal
    filtered_tokens = [token for token in filtered_tokens if token not in stopwords.words('english')]

    # Lemmatization
    lemmatized_tokens = [token.lemma_ for token in nlp(" ".join(filtered_tokens))]

    return " ".join(lemmatized_tokens)

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\somes\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [5]:
# Preprocess each document
for doc in documents:
    doc.page_content = preprocess_text(doc.page_content)

In [6]:

from langchain.text_splitter import RecursiveCharacterTextSplitter
# Split the preprocessed documents (chunking)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=20,
    length_function=len,
    add_start_index=True,
)
chunks = text_splitter.split_documents(documents)
print(f"Split {len(documents)} documents into {len(chunks)} chunks.")

Split 637 documents into 4325 chunks.


In [7]:
# how our chunk looks like
prints=chunks[10]
print(prints.page_content)
print(prints.metadata)

gale encyclopedia medicine 2 medical ref- erence product design inform educate readersabout wide variety disorder condition treatment diagnostic test gale group believe productto comprehensive necessarily definitive isintende supplement replace consultation aphysician healthcare practitioner galegroup make substantial effort provide informationthat accurate comprehensive date galegroup make representation warranty anykind include without limitation warranties mer- chantability fitness particular
{'source': '..\\..\\data\\Medical_book.pdf', 'page': 5, 'start_index': 0}


In [8]:
from langchain_chroma import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
#embedding model
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
# Load the document, split it into chunks, embed each chunk and load it into the vector store.
db = Chroma.from_documents(chunks, embedding, persist_directory="../chroma_db-v2")

  from .autonotebook import tqdm as notebook_tqdm


In [9]:

from langchain import PromptTemplate
prompt_template="""
Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer below and nothing else.
Helpful answer:
"""
PROMPT=PromptTemplate(template=prompt_template, input_variables=["context", "question"])
chain_type_kwargs={"prompt": PROMPT}

In [10]:
query = "What are Allergies"

# docs=db.similarity_search(query, k=3)

docs=db.similarity_search_with_relevance_scores(query, k=7)
print("Result", docs)

Result [(Document(page_content='non - allergic rhinitis clinical aspect ed n. mygund r. m. naclerio philadelphia w. b. saun ders co. 1993 lawlor g. j. jr . t. j. fischer d. c. adelman manual allergy immunology boston little brown co. 1995 novick n. l. something allergy new york macmillan 1994 weil a. natural health natural medicine comprehensive manual wellness self care new york houghton mifflin 1995 richard robinson allergie definition allergie abnormal reaction immune sys- tem occur response otherwise harmless sub - stance', metadata={'page': 127, 'source': '..\\..\\data\\Medical_book.pdf', 'start_index': 1924}), 0.5219200787054694), (Document(page_content='non - allergic rhinitis clinical aspect ed n. mygund r. m. naclerio philadelphia w. b. saun ders co. 1993 lawlor g. j. jr . t. j. fischer d. c. adelman manual allergy immunology boston little brown co. 1995 novick n. l. something allergy new york macmillan 1994 weil a. natural health natural medicine comprehensive manual wellness

In [11]:
from langchain.chains import RetrievalQA
from langchain.llms import CTransformers
# importing model
llm = CTransformers(model="TheBloke/Llama-2-7B-Chat-GGML",
                  model_type="llama",
                  config={'max_new_tokens':512,
                          'temperature':0.8})

qa=RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=db.as_retriever(search_kwargs={'k': 2}),
    return_source_documents=True, 
    chain_type_kwargs=chain_type_kwargs)

Fetching 1 files: 100%|██████████| 1/1 [00:00<00:00, 1002.22it/s]
Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]Error while downloading from https://cdn-lfs.huggingface.co/repos/30/e3/30e3aca7233f7337633262ff6d59dd98559ecd8982e7419b39752c8d0daae1ca/45833e0b59c8fe80676c664f556031fc411da8856e0716ac7b8ed201b7221c08?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27llama-2-7b-chat.ggmlv3.q2_K.bin%3B+filename%3D%22llama-2-7b-chat.ggmlv3.q2_K.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1714391814&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxNDM5MTgxNH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy8zMC9lMy8zMGUzYWNhNzIzM2Y3MzM3NjMzMjYyZmY2ZDU5ZGQ5ODU1OWVjZDg5ODJlNzQxOWIzOTc1MmM4ZDBkYWFlMWNhLzQ1ODMzZTBiNTljOGZlODA2NzZjNjY0ZjU1NjAzMWZjNDExZGE4ODU2ZTA3MTZhYzdiOGVkMjAxYjcyMjFjMDg%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qJnJlc3BvbnNlLWNvbnRlbnQtdHlwZT0qIn1dfQ__&Signature=BdC2CI4fohki

In [12]:
user_input=input(f"Input Prompt:")
result=qa({"query": user_input})
print("Response : ", result["result"])

  warn_deprecated(
Number of tokens (513) exceeded maximum context length (512).
Number of tokens (514) exceeded maximum context length (512).
Number of tokens (515) exceeded maximum context length (512).
Number of tokens (516) exceeded maximum context length (512).
Number of tokens (517) exceeded maximum context length (512).
Number of tokens (518) exceeded maximum context length (512).
Number of tokens (519) exceeded maximum context length (512).
Number of tokens (520) exceeded maximum context length (512).
Number of tokens (521) exceeded maximum context length (512).
Number of tokens (522) exceeded maximum context length (512).
Number of tokens (523) exceeded maximum context length (512).
Number of tokens (524) exceeded maximum context length (512).
Number of tokens (525) exceeded maximum context length (512).
Number of tokens (526) exceeded maximum context length (512).
Number of tokens (527) exceeded maximum context length (512).
Number of tokens (528) exceeded maximum context len

Response :  Fever is a common symptom of many illnesses, including respiratory infections such as pneumonia, bronchitis, and croup. Vomiting with watery eyes can also be caused by gastrointestinal issues like food poisoning or norovirus. However, based on the text provided, it seems that adenovirus infections may cause acute pharyngoconjunctival fever and occasionally pneumonia in children, sometimes manifesting as a sore throatypillioletereeperate 3-7 symptomt least 3 or more severe respirborning inflammimingle types of symptomethereular fever 3-feveral the common cold-type 3-fever and influenza 3-fevereinflammoti sympotentially serious lower respirborningitis, bacterious symptomited fever 3-7 symptomphб sore throat least week of the common cold or all of severe disease. conjunctivsore symptomophthletypepticophtha sore symptomiting symptomethereann flu-feveral the following symptom otim (presentation with fever 3-type 3-7 symptomited symptomoph throat least week-symptomphthena or more