# RAG on a single scientific article on Deep Learning
## Retrieving information from a single document for Q&A
Sebastian Vinther Jensen

In [None]:
!pip install accelerate --q
!pip install pypdf --q
!pip install -qqq chromadb==0.4.10 --progress-bar off
!pip install -Uqqq pip --progress-bar off
!pip install -qqq langchain==0.0.299 --progress-bar off
!pip install -qqq xformers==0.0.21 --progress-bar off
!pip install -qqq sentence_transformers==2.2.2 --progress-bar off
!pip install -qqq tokenizers==0.14.0 --progress-bar off
!pip install -qqq optimum==1.13.1 --progress-bar off
!pip install -qqq auto-gptq==0.4.2 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ --progress-bar off
!pip install -qqq unstructured==0.10.16 --progress-bar off

In [None]:
from langchain.document_loaders import UnstructuredMarkdownLoader
from langchain.document_loaders import PyPDFLoader
from langchain.llms import HuggingFaceHub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain import HuggingFacePipeline
from langchain.chains import RetrievalQA
from langchain import PromptTemplate

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline

from textwrap import fill
import torch

Load the PDF of the article "Deep learning review and discussion of its future". `PyPDFLoader` returns one document per page, so `len(docs)` is the number of pages.

In [None]:
loader = PyPDFLoader("/content/Deep_learning_review_and_discussion_of_its_future_.pdf")

docs = loader.load()
len(docs)

7

Split the document into smaller chunks so that retrieval stays efficient and each retrieved passage is focused. The chunk size is **1024** characters, chosen to be large enough to contain a meaningful text segment but small enough to process efficiently. The overlap is **64** characters, which is how much text adjacent chunks share; this helps preserve context across chunk boundaries, which is essential for building a QA RAG system.

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=64)
texts = text_splitter.split_documents(docs)
len(texts)

36
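
To make the roles of the chunk size and the overlap concrete, here is a minimal sliding-window sketch in plain Python. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally (the real splitter prefers to break on separators such as paragraphs and newlines before falling back to smaller units). The helper `split_with_overlap` and the sample string are hypothetical and only serve the illustration.

```python
def split_with_overlap(text: str, chunk_size: int = 1024, chunk_overlap: int = 64) -> list[str]:
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Hypothetical sample text of 2500 characters
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(sample)

print([len(c) for c in chunks])
# The last 64 characters of one chunk are the first 64 of the next
print(chunks[0][-64:] == chunks[1][:64])
```

A sentence that straddles a chunk boundary therefore appears, at least partly, in both neighbouring chunks, so it can still be retrieved with some of its context.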

This code initializes a text embedding model through LangChain's `HuggingFaceEmbeddings` wrapper. It runs the model on the GPU and normalizes the output vectors, then embeds the text content of the first document chunk and prints the length of the resulting vector.

In [None]:
embeddings = HuggingFaceEmbeddings(
    model_name="thenlper/gte-large",
    model_kwargs={"device": "cuda"},
    encode_kwargs={"normalize_embeddings": True},
)

query_result = embeddings.embed_query(texts[0].page_content)
print(len(query_result))

1024
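
The `normalize_embeddings` setting matters for the later similarity search: once vectors have unit length, their dot product equals their cosine similarity. The sketch below checks this with plain Python on two made-up 3-dimensional vectors that stand in for the 1024-dimensional embeddings; the helper names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity of two vectors of arbitrary length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Made-up vectors, not real embeddings
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

print(math.isclose(dot(a_unit, a_unit), 1.0))            # unit length after normalization
print(math.isclose(dot(a_unit, b_unit), cosine(a, b)))   # dot product equals cosine similarity
```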


This code creates a vector database (`db`) using the **Chroma library**, which stores the embeddings generated from the chunks in `texts` and persists them to the `db` directory. It then performs a similarity search for chunks similar to the query "Advantages", retrieves the top 2 results, and prints the first one.

In [None]:
db = Chroma.from_documents(texts, embeddings, persist_directory="db")
results = db.similarity_search("Advantages", k=2)
print(results[0].page_content)

2.2 Advantages and disad vantages of deep learning  
Deep learning has shown better performance than traditional neural networks. After a deep neural network is trained and properly adjusted for certain task like image classification, it saves a lot of calculations, and can complete  a lot of work in a short time. . Deep learning is 
also malleable. Usually, for traditional algorithms, if you need to adjust the model, you may 
need to make copious changes to the code. For the determined network framework used for 
deep learning, if you n eed to adjust the model, you only need to adjust the parameters, thus 
deep learning has great flexibility. The deep learning framework can be continuously 
improved and then reached the almost perfect state. Deep learning is also more general, it 
can be mod elled based on problems, not limited to a fixed problem.  
Deep learning has some shortcomings as well. First of all, its training cost is relatively
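
Conceptually, a top-k similarity search scores every stored vector against the query vector and returns the k best matches. The following is a brute-force sketch of that idea in plain Python; it is an assumption-laden toy, since Chroma uses its own index and distance metric, and the index contents, vectors, and the helper `similarity_search` below are made up for illustration.

```python
import heapq


def similarity_search(query_vec, index, k=2):
    """Return the k (score, text) pairs with the highest dot product against query_vec."""
    scored = (
        (sum(q * v for q, v in zip(query_vec, vec)), text)
        for text, vec in index
    )
    return heapq.nlargest(k, scored)


# Hypothetical toy index: (chunk text, made-up embedding)
toy_index = [
    ("advantages of deep learning", [0.9, 0.1, 0.0]),
    ("training cost and hardware", [0.1, 0.9, 0.0]),
    ("history of neural networks", [0.0, 0.2, 0.9]),
    ("flexibility of deep models", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.0, 0.0]

for score, text in similarity_search(query, toy_index, k=2):
    print(f"{score:.2f}  {text}")
```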


This code sets up a text generation pipeline from a pre-trained language model, its tokenizer, and a generation configuration, with specific settings that control properties of the generated text such as randomness and diversity.

In [None]:
MODEL_NAME = "TheBloke/Llama-2-7B-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, trust_remote_code=True, device_map="auto"
)

# Create a configuration for text generation based on the specified model name
generation_config = GenerationConfig.from_pretrained(MODEL_NAME)

# Set the maximum number of new tokens in the generated text to 1024.
# This limits the length of the generated output to 1024 tokens.
generation_config.max_new_tokens = 1024

# Set the temperature for text generation. Lower values (e.g., 0.0001) make the output more deterministic, following the most likely predictions.
# Higher values make the output more random.
generation_config.temperature = 0.0001

# Set the top-p (nucleus) sampling value. A value of 0.95 restricts sampling to the most likely tokens that together make up 95% of the probability mass.
generation_config.top_p = 0.95

# Enable sampling. When set to True, the model draws tokens according to their probabilities.
# With a temperature this low, the most likely token is picked almost every time.
generation_config.do_sample = True

# Set the repetition penalty. A value of 1.15 discourages the model from repeating the same words or phrases too frequently in the output.
generation_config.repetition_penalty = 1.15


# Create a text generation pipeline using the initialized model, tokenizer, and generation configuration
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    generation_config=generation_config,
)

# Wrap the text generation pipeline in a LangChain LLM object so that it can be used inside chains
llm = HuggingFacePipeline(pipeline=text_pipeline, model_kwargs={"temperature": 0})
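
To see how the temperature and top-p settings interact, the sketch below applies both to a made-up list of four logits in plain Python. It is a simplified illustration of the two ideas, not the `transformers` implementation, and it ignores the repetition penalty; the function names and the logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens

# Temperature 1.0: several tokens survive the top-p cut and can be sampled
print(top_p_filter(softmax_with_temperature(logits, 1.0), 0.95))

# Temperature 0.0001: all probability mass collapses onto the best token
print(top_p_filter(softmax_with_temperature(logits, 0.0001), 0.95))
```

With a temperature as low as the one used here, sampling is enabled in name only: the candidate set shrinks to a single token, so repeated runs should give near-identical answers.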



This code sets up a system for answering questions using the pre-trained language model and the database of document chunks. It defines a prompt template in the Llama 2 chat format, configures a `RetrievalQA` chain that retrieves the 2 most similar chunks and inserts them into the prompt as `{context}`, and answers the input question based on those retrieved chunks.

In [None]:
template = """
<s>[INST] <<SYS>>
You are a Data Scientist tasked with analyzing the following information to provide insights for the question at the end. Attempt to be less academic in your language and answer the question in language, which the general public will understand.
<</SYS>>

{context}

{question} [/INST]
"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])


qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},
)

result = qa_chain(
    "What are the advantages of Deep Learning?"
)
print(result["result"].strip())

Hey there! So, you want to know about the advantages of Deep Learning? Well, let me tell you - it's got some pretty cool benefits! 🤖
Firstly, Deep Learning can save a ton of calculations when it comes to tasks like image classification. It can finish a lot of work in a short amount of time, making it super efficient! ⏱️
Another awesome thing about Deep Learning is that it's really flexible. If you need to adjust the model, you don't have to make lots of changes to the code like you would with other algorithms. Instead, you just need to tweak a few parameters, and voila! Your model is ready to go! 🎉
Oh, and did I mention that Deep Learning can be improved over time? Yep, you heard that right! You can keep updating and refining your model until it reaches near perfection! 🔥
Last but not least, Deep Learning is super versatile. It can be applied to different types of problems, unlike some other algorithms that are only good for specific tasks. This means you can use Deep Learning for a wi
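
The "stuff" chain type can be pictured as plain string assembly: the retrieved chunks are joined into one context string, which is substituted into the template together with the question. The sketch below imitates that step with `str.format`; it assumes a blank-line separator between chunks, uses a shortened placeholder system message, and the helper `build_stuff_prompt` and the two example chunks are hypothetical.

```python
# Shortened placeholder template in the same Llama 2 chat layout
TEMPLATE = """
<s>[INST] <<SYS>>
You are a helpful assistant. Answer using only the context below.
<</SYS>>

{context}

{question} [/INST]
"""


def build_stuff_prompt(chunks, question, separator="\n\n"):
    """Join all retrieved chunks into one context block and fill the template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)


# Hypothetical retrieved chunks (k=2)
retrieved = [
    "Deep learning is malleable.",
    "Its training cost is relatively high.",
]
prompt_text = build_stuff_prompt(retrieved, "What are the advantages of Deep Learning?")
print(prompt_text)
```

Because every retrieved chunk ends up in a single prompt, the values of `k` and the chunk size together determine how much of the model's context window the retrieved text consumes.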

A different query, with the answer wrapped to a maximum line width of 80 characters using `textwrap.fill`.

In [None]:
result = qa_chain(
    "Summarize the future of deep learning in 2-3 sentences."
)
print(fill(result["result"].strip(), width=80))

The future of deep learning is expected to be shaped by the need for complete
theoretical support to explain its inner principles. Adjusting model parameters
to improve performance will not suffice, and continuous improvement of the
underlying theory is crucial. Areas like natural language processing, robotics,
and computer vision are likely to see further advancements in deep learning,
with potential applications in autonomous driving, intelligent dialogue robots,
and medical image processing.
