# GPU Information

In [2]:
!nvidia-smi

Tue Dec  5 13:52:07 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

# Install & Import Libraries

In [3]:
!pip install langchain gpt4all pypdf chromadb -Uqq

In [4]:
import torch
import langchain
from langchain.llms.gpt4all import GPT4All
from langchain import PromptTemplate, LLMChain
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings, GPT4AllEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

In [5]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


# Model

## Download

In [6]:
# Download the GPT4All Model
# !wget -q https://gpt4all.io/models/gguf/mistral-7b-openorca.Q4_0.gguf

## Initialize

In [7]:
# Pretrained Model's Path
model_checkpoint = "/content/mistral-7b-openorca.Q4_0.gguf"

model = GPT4All(
    model = model_checkpoint
)

In [8]:
# Generate text
# response = model("Once upon a time, ")
# print(response)

# Load Docs

## Download

In [9]:
!wget https://www.cse.iitk.ac.in/users/sigml/lec/Slides/Ram.pdf

--2023-12-05 13:52:40--  https://www.cse.iitk.ac.in/users/sigml/lec/Slides/Ram.pdf
Resolving www.cse.iitk.ac.in (www.cse.iitk.ac.in)... 202.3.77.10
Connecting to www.cse.iitk.ac.in (www.cse.iitk.ac.in)|202.3.77.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1024219 (1000K) [application/pdf]
Saving to: ‘Ram.pdf.1’


2023-12-05 13:52:43 (751 KB/s) - ‘Ram.pdf.1’ saved [1024219/1024219]
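The log above shows wget saving to `Ram.pdf.1`, which is what it does when `Ram.pdf` already exists in the working directory. A minimal sketch of a guard that skips the download when the file is already present; `download_if_missing` is a hypothetical helper, not part of the original notebook:

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_if_missing(url: str, dest: str) -> bool:
    """Download `url` to `dest` unless `dest` already exists.

    Returns True if a download happened, False if the file was already there.
    """
    path = Path(dest)
    if path.exists():
        # Nothing to do: reuse the existing copy instead of creating Ram.pdf.1
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(path))
    return True


# Example call (commented out so the cell does not hit the network by default):
# download_if_missing(
#     "https://www.cse.iitk.ac.in/users/sigml/lec/Slides/Ram.pdf",
#     "/content/Ram.pdf",
# )
```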



## Load

In [10]:
pdf_path = "/content/Ram.pdf"

# Load the pdf
loader = PyPDFLoader(pdf_path)

# PDF data: load() returns a list with one Document per page
pdf_data: list = loader.load()

print(f"Total number of pages: {len(pdf_data)}")
print(type(pdf_data))
print(pdf_data)

Total number of pages: 19
<class 'list'>
[Document(page_content='Introduction to Deep Learning\nM S Ram\nDept. of Computer Science & Engg .\nIndian Institute of Technology Kanpur\nReading of Chap. 1 from “Learning Deep Architectures for AI”; Yoshua Bengio ; FTML Vol. 2, No. 1 (2009) 1 –127\n1 Date: 12 Nov, 2015', metadata={'source': '/content/Ram.pdf', 'page': 0}), Document(page_content='A Motivational Task: Percepts\uf0e0Concepts\n•Create algorithms \n•that can understand scenes and describe \nthem in natural language\n•that can infer semantic concepts to allow \nmachines to interact with humans using these \nconcepts\n•Requires creating a series of abstractions\n•Image (Pixel Intensities) \uf0e0Objects in Image \uf0e0Object \nInteractions \uf0e0Scene Description\n•Deep learning aims to automatically learn these \nabstractions with little supervision\nCourtesy: Yoshua Bengio , Learning Deep Architectures for AI\n2', metadata={'source': '/content/Ram.pdf', 'page': 1}), Document(page_co

In [11]:
print(pdf_data[8].page_content)

Drawbacks of Back -propagation based Deep Neural 
Networks
•They are discriminative models
•Get all the information from the labels
•And the labels don’t give so much of information
•Need a substantial amount of labeled data
•Gradient descent with random initialization leads to poor local 
minima


## Split into Chunks

In [12]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1024,
    chunk_overlap = 64
)

texts: list = text_splitter.split_documents(pdf_data)

print(type(texts))
print(f"{len(texts)} {texts}")

<class 'list'>
19 [Document(page_content='Introduction to Deep Learning\nM S Ram\nDept. of Computer Science & Engg .\nIndian Institute of Technology Kanpur\nReading of Chap. 1 from “Learning Deep Architectures for AI”; Yoshua Bengio ; FTML Vol. 2, No. 1 (2009) 1 –127\n1 Date: 12 Nov, 2015', metadata={'source': '/content/Ram.pdf', 'page': 0}), Document(page_content='A Motivational Task: Percepts\uf0e0Concepts\n•Create algorithms \n•that can understand scenes and describe \nthem in natural language\n•that can infer semantic concepts to allow \nmachines to interact with humans using these \nconcepts\n•Requires creating a series of abstractions\n•Image (Pixel Intensities) \uf0e0Objects in Image \uf0e0Object \nInteractions \uf0e0Scene Description\n•Deep learning aims to automatically learn these \nabstractions with little supervision\nCourtesy: Yoshua Bengio , Learning Deep Architectures for AI\n2', metadata={'source': '/content/Ram.pdf', 'page': 1}), Document(page_content='Deep Visual -Sem

In [13]:
print(texts[3].page_content)

Challenge in Modelling Complex Behaviour
•Too many concepts to learn
•Too many object categories
•Too many ways of interaction between objects categories
•Behaviour is a highly varying function underlying factors
•f: L V
•L: latent factors of variation
•low dimensional latent factor space
•V: visible behaviour
•high dimensional observable space
•f: highly non -linear function
4
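The splitter above returned 19 chunks for 19 pages, presumably because each slide's text fits within `chunk_size = 1024` characters. To illustrate what `chunk_size` and `chunk_overlap` mean on longer text, here is a simplified fixed-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as newlines); `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

A text shorter than `chunk_size` comes back as a single chunk, which matches the one-chunk-per-page result seen above.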


# Embeddings

In [14]:
embeddings = GPT4AllEmbeddings(
    model = model_checkpoint
)
print(embeddings)

db = Chroma.from_documents(
    texts,
    embeddings,
    persist_directory = "db"
)
print(db)

# Demo
print(embeddings.embed_documents(["Subrata", "Mondal"]))

client=<gpt4all.gpt4all.Embed4All object at 0x797e5353b430>
<langchain.vectorstores.chroma.Chroma object at 0x797e5351c160>
[[-0.09711340069770813, 0.000904667773284018, -0.024697553366422653, -0.04399970918893814, 0.01615772396326065, -0.07538696378469467, -0.006384008564054966, 0.05518573895096779, -0.04213697835803032, -0.030894434079527855, -0.005064909812062979, -0.0455617792904377, -0.008511842228472233, -0.02269393764436245, -0.015313736163079739, 0.049883000552654266, 0.04793348163366318, 0.01194586418569088, 0.013956594280898571, -0.12195905297994614, 0.04416993260383606, 0.0912923738360405, -0.012974007055163383, -0.015094476751983166, 0.007220440078526735, -0.015183888375759125, -0.019292157143354416, -0.026059553027153015, -0.013971583917737007, -0.00977165438234806, -0.012909025885164738, 0.020741993561387062, 0.05236663669347763, -0.062396060675382614, 0.042088862508535385, 0.028679868206381798, -0.0708686113357544, -0.059478510171175, 0.04030084237456322, 0.0835859179496
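Each text becomes a vector of floats like the ones printed above, and retrieval amounts to finding the stored vectors closest to the query vector. A minimal sketch using cosine similarity as the closeness measure; this is one common choice, and the metric the vector store actually uses depends on its configuration. `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Indices of the k vectors most similar to the query, best first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-D vectors standing in for real embeddings
print(top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2))
# [1, 2]
```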

# Chains

In [15]:
qa = RetrievalQA.from_chain_type(
    llm = model,
    chain_type = "stuff",
    retriever = db.as_retriever(search_kwargs={"k": 3}),
    return_source_documents = True,
    verbose = False,
)

print(qa)

combine_documents_chain=StuffDocumentsChain(llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['context', 'question'], template="Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n{context}\n\nQuestion: {question}\nHelpful Answer:"), llm=GPT4All(model='/content/mistral-7b-openorca.Q4_0.gguf', client=<gpt4all.gpt4all.GPT4All object at 0x797e64292c80>)), document_variable_name='context') return_source_documents=True retriever=VectorStoreRetriever(tags=['Chroma', 'GPT4AllEmbeddings'], vectorstore=<langchain.vectorstores.chroma.Chroma object at 0x797e5351c160>, search_kwargs={'k': 3})
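The printed chain shows the prompt template that the "stuff" chain type fills in: the retrieved chunks are placed into `{context}` and the user's query into `{question}`. A rough sketch of that assembly in plain Python, using the template text from the output above. Joining chunks with a blank line is an assumption about the default separator, and `build_stuff_prompt` is a hypothetical helper, not a LangChain function.

```python
# Template text copied from the printed chain above
TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\nQuestion: {question}\nHelpful Answer:"
)


def build_stuff_prompt(chunks: list[str], question: str, separator: str = "\n\n") -> str:
    """Concatenate all retrieved chunks into one context and fill the template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)


prompt = build_stuff_prompt(
    ["First retrieved chunk.", "Second retrieved chunk."],
    "What is deep learning?",
)
print(prompt)
```

Because every retrieved chunk goes into a single prompt, the model only ever sees the `k` chunks returned by the retriever (here `k = 3`), not the whole document.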


# Ask Questions

In [16]:
%%time
response = qa("Can you give me a summary of the document. Use bullet points to mention the important points.")
print(response)

{'query': 'Can you give me a summary of the document. Use bullet points to mention the important points.', 'result': ' Multi-Task Learning is a machine learning technique that involves training an algorithm on multiple related tasks simultaneously, allowing it to learn from and transfer knowledge between these tasks. This approach can improve performance in various domains such as natural language processing, computer vision, and speech recognition.', 'source_documents': [Document(page_content='Multi -task Learning\n17\nSource: https://en.wikipedia.org/wiki/Multi -task_learning', metadata={'page': 16, 'source': '/content/Ram.pdf'}), Document(page_content='Multi -task Learning\n17\nSource: https://en.wikipedia.org/wiki/Multi -task_learning', metadata={'page': 16, 'source': '/content/Ram.pdf'}), Document(page_content='Multi -task Learning\n17\nSource: https://en.wikipedia.org/wiki/Multi -task_learning', metadata={'page': 16, 'source': '/content/Ram.pdf'})]}
CPU times: user 15min 7s, sys:

In [18]:
print(response["result"])

 Multi-Task Learning is a machine learning technique that involves training an algorithm on multiple related tasks simultaneously, allowing it to learn from and transfer knowledge between these tasks. This approach can improve performance in various domains such as natural language processing, computer vision, and speech recognition.
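The three `source_documents` in the response above are identical copies of page 16, so the model effectively saw one short chunk, and the "summary" covers only multi-task learning. One possible cause, which the notebook does not confirm, is that the persisted `db` directory already held the same chunks from an earlier run. A minimal sketch of filtering duplicates from a list of retrieved documents; `Doc` is a hypothetical stand-in for LangChain's `Document` and `unique_documents` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (same two attributes)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_documents(docs: list) -> list:
    """Drop documents whose source, page, and text were already seen, keeping order."""
    seen = set()
    unique = []
    for doc in docs:
        key = (doc.metadata.get("source"), doc.metadata.get("page"), doc.page_content)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


retrieved = [
    Doc("Multi -task Learning\n17", {"page": 16, "source": "/content/Ram.pdf"}),
    Doc("Multi -task Learning\n17", {"page": 16, "source": "/content/Ram.pdf"}),
    Doc("Multi -task Learning\n17", {"page": 16, "source": "/content/Ram.pdf"}),
]
print(len(unique_documents(retrieved)))
# 1
```

Filtering after retrieval only tidies the reported sources; if the store itself holds duplicates, rebuilding it from a clean directory would address the cause.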
