# Main Jupyter Notebook for the CS205 Final Project

## In this project we will explore a use case of Large Language Models
### We will start with how to use an LLM through HuggingFace, and explain some of the basic concepts behind an LLM. Once we have a good understanding of how to use an LLM to generate text, we will explore Retrieval Augmented Generation (RAG).

#### For this project we used the Llama 2 7-billion-parameter model with OpenAI's text-embedding-ada-002 embedding model. Llama 2 7B was served locally by Ollama. We used Llama Index and LangChain to interact with the LLM.

#### The data store is a directory named 'data' at the root of the project directory. Create this directory before running the indexing and query cells.

### Let's understand how to use an LLM using HuggingFace

In [6]:
import torch

#Import HuggingFace Transformer
from transformers import AutoModelForCausalLM, AutoTokenizer



In [None]:
#Fetch Meta's OPT LLM with 1.3 billion parameters. This is quite a small model compared to state-of-the-art models such as GPT-4V.

model = AutoModelForCausalLM.from_pretrained('facebook/opt-1.3b')
tokenizer = AutoTokenizer.from_pretrained('facebook/opt-1.3b')

##### Open Pre-trained Transformer (OPT) is a collection of decoder-only transformers developed by Meta.

In [None]:
input_text = 'I like CS205 Artificial Intelligence course, because'
tok_input = tokenizer(input_text, return_tensors='pt', add_special_tokens=True, truncation=True) #Create tokens from the given input and return PyTorch tensors

In [None]:
torch.manual_seed(123)

generated_output = model.generate(**tok_input, 
                                  max_new_tokens=200, 
                                  return_dict_in_generate=True, 
                                  do_sample=True) #do_sample=True enables sampling (e.g. top-k), so responses vary from run to run.
                                                  #The manual seed set above makes this particular run reproducible.
                                                  #Set do_sample=False for deterministic (greedy) generation each time
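
To make the difference between greedy decoding and sampling concrete, here is a minimal sketch of how one next token could be chosen from a list of scores. It is a toy illustration in plain Python, not the code that `model.generate` runs; the function name `sample_next_token` and the logit values are hypothetical.

```python
import math
import random

def sample_next_token(logits, top_k=None, do_sample=True, rng=None):
    """Pick the index of the next token from a list of raw scores (logits)."""
    if not do_sample:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])

    rng = rng or random.Random()
    # Rank candidates by score and keep only the top_k best (all if top_k is None).
    candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        candidates = candidates[:top_k]

    # Softmax weights over the surviving candidates (shifted by the max for stability).
    max_logit = logits[candidates[0]]
    weights = [math.exp(logits[i] - max_logit) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Hypothetical scores for a 5-token vocabulary.
toy_logits = [2.0, 0.5, 1.5, -1.0, 0.1]

greedy = sample_next_token(toy_logits, do_sample=False)
sampled = [sample_next_token(toy_logits, top_k=2, rng=random.Random(seed))
           for seed in range(20)]

print(greedy)                # index chosen by greedy decoding
print(sorted(set(sampled)))  # indices that top-2 sampling produced
```

With `top_k=2` only the two highest-scoring tokens can ever be drawn, and fixing the seed of the random generator makes a sampled run repeatable, which is the role `torch.manual_seed(123)` plays in the cell above.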


In [None]:
decoded_output = tokenizer.batch_decode(generated_output.sequences, skip_special_tokens=True)[0]
print(decoded_output)

#### We saw text generation in the previous subsection. Now let's explore text summarization, a critical use case of LLMs

#### What is an embedding?

An embedding is a numerical representation of text. Each word (token) in the vocabulary is first mapped to an integer id. Each id is then used to look up a dense vector of real numbers in an embedding table. For a sentence, the resulting 'embedding' is a matrix of shape (n_tokens x embedding_dimension), with one row per token.
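
The two steps (word to integer id, id to vector) can be sketched without any library. This is a toy illustration under stated assumptions: the vocabulary, the helper names `build_embedding_table` and `embed_tokens`, and the random untrained vectors are all hypothetical.

```python
import random

def build_embedding_table(vocab_size, dim, seed=0):
    """Create a vocab_size x dim table of random (untrained) vectors."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

def embed_tokens(token_ids, table):
    """Look up one row of the table per token id."""
    return [table[t] for t in token_ids]

# Hypothetical 4-word vocabulary mapping words to integer ids.
toy_vocab = {"ice": 0, "cream": 1, "is": 2, "sweet": 3}
toy_ids = [toy_vocab[w] for w in "ice cream is sweet".split()]

toy_table = build_embedding_table(vocab_size=len(toy_vocab), dim=3)
toy_embedded = embed_tokens(toy_ids, toy_table)

print(toy_ids)
print(len(toy_embedded), len(toy_embedded[0]))  # n_tokens, embedding_dimension
```

In a trained model the table rows are learned parameters rather than random numbers, but the lookup itself works the same way.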

In [15]:
sentence1 = "Life is sweet and sugary, but eating ice cream on Everest is long"
sentence2 = "The apple pie I eat is very sweet, it will take long to finish eating"

In [16]:
tokenize1 = {s: i for i, s in enumerate(sorted(sentence1.replace(',', '').split()))}
tokenize2 = {s: i for i, s in enumerate(sorted(sentence2.replace(',', '').split()))}

In [21]:
sentence_vec = torch.tensor([tokenize1[s] for s in sentence1.replace(',', '').split()])
print(sentence_vec)

tensor([ 1,  8, 12,  2, 11,  3,  5,  6,  4, 10,  0,  8,  9])


sentence1 is tokenized as shown in the output above: each word is mapped to an integer, and the repeated word "is" receives the same integer (8) both times.
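
One detail of the dictionary comprehension used above: the word list is sorted with its duplicates still in it, so the repeated word "is" occupies two positions and the dictionary keeps only the later one, leaving id 7 unused. A minimal sketch that removes duplicates first gives contiguous ids. The helper names `build_vocab` and `encode` are hypothetical, and because the ids shift, this variant does not reproduce the tensor printed above.

```python
def build_vocab(sentence):
    """Map each distinct word to a contiguous integer id (in sorted order)."""
    words = sentence.replace(",", "").split()
    return {word: idx for idx, word in enumerate(sorted(set(words)))}

def encode(sentence, vocab):
    """Convert a sentence into a list of token ids using the vocabulary."""
    return [vocab[word] for word in sentence.replace(",", "").split()]

example = "Life is sweet and sugary, but eating ice cream on Everest is long"
vocab = build_vocab(example)
ids = encode(example, vocab)

print(len(vocab))  # number of distinct words
print(ids)         # one id per word, repeated words share an id
```

With contiguous ids, the size of the vocabulary is exactly the number of rows an embedding table needs.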

In [18]:
torch.manual_seed(123)

<torch._C.Generator at 0x12823fb70>

In [19]:
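#Embedding table with 13 rows (enough for token ids 0..12) and 10-dimensional vectors; the weights are random (untrained)
embed = torch.nn.Embedding(len(sentence_vec), 10)
embedded_sentence = embed(sentence_vec).detach()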

The sentence "Life is sweet and sugary, but eating ice cream on Everest is long" now has the representation shown below. It has been converted from text into a numerical form that the model can process.

In [20]:
print(embedded_sentence.shape)
print(embedded_sentence)

torch.Size([13, 10])
tensor([[ 0.6984, -1.4097,  0.1794,  1.8951,  0.4954,  0.2692, -0.0770, -1.0205,
         -0.1690,  0.9178],
        [ 0.2553, -0.5496,  1.0042,  0.8272, -0.3948,  0.4892, -0.2168, -1.7472,
         -1.6025, -1.0764],
        [-1.4205, -0.2238, -0.2548,  1.1517, -0.0179,  0.4264, -0.7657, -0.0545,
         -0.7321,  1.2347],
        [ 1.5810,  1.3010,  1.2753, -0.2010,  0.4965, -1.5723,  0.9666, -1.1481,
         -1.1589,  0.3255],
        [-0.9896,  0.7016, -0.9405, -0.4681, -0.8016, -0.8183, -1.1820, -0.2877,
         -0.6043,  0.6002],
        [-0.6315, -2.8400, -1.3250,  0.1784, -2.1338,  1.0524, -0.3885, -0.9343,
         -0.4991, -1.0867],
        [-1.4779,  1.1331, -1.2203,  1.3139,  1.0533,  0.1388,  2.2473, -0.8036,
         -0.2808,  0.7697],
        [-0.6596, -0.7979,  0.1838,  0.2293,  0.5146,  0.9938, -0.2587, -1.0826,
         -0.0444,  1.6236],
        [ 0.8805,  1.5542,  0.6266, -0.1755,  0.0983, -0.0935,  0.2662, -0.5850,
          0.8768,  1.6221]

#### What is RAG?

#### Retrieval Augmented Generation is a technique in generative AI for extending the knowledge of an LLM. The LLM's parameters are fixed after training and do not reflect newer information, so a specialized knowledge base (which may be private) is created for the LLM to access. This is non-parametric memory, i.e., the information is not stored in the learned parameters of the LLM.
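
The retrieval step can be sketched in a few lines: embed the question, rank the stored passages by similarity to it, and prepend the best matches to the prompt. This is a toy sketch in plain Python, not what Llama Index or ChromaDB do internally; the 3-dimensional vectors, the passages, and the helper names `cosine_similarity`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, store, top_k=2):
    """Return the top_k (text, score) pairs most similar to the query."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

def build_prompt(question, passages):
    """Prepend the retrieved passages to the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Hypothetical embeddings; a real embedding model would produce these vectors.
toy_store = [
    ("The Turing test was proposed by Alan Turing in 1950.", [0.9, 0.1, 0.0]),
    ("Minimax searches a game tree for the best move.", [0.0, 0.2, 0.9]),
    ("A rational agent acts to maximize expected utility.", [0.1, 0.9, 0.2]),
]
toy_query_vec = [0.8, 0.2, 0.1]  # stands in for the embedded question

hits = retrieve(toy_query_vec, toy_store, top_k=1)
prompt = build_prompt("What is the Turing test?", [text for text, _ in hits])
print(prompt)
```

The prompt built this way carries the retrieved passage alongside the question, so the LLM can answer from the supplied context rather than from its learned parameters alone.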

#### Implementing RAG with Llama Index using ChromaDB as the vector database. Ollama is used to serve Llama 2 locally

The data can be accessed at this link. Add this folder to the root of the project directory

https://drive.google.com/drive/folders/10wzRErO4Zlj6L3bLSqh9QtTLDkaUOuxS?usp=drive_link

In [1]:
import openai

from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext, download_loader
from llama_index.vector_stores import ChromaVectorStore
from llama_index.storage.storage_context import StorageContext
from llama_index.embeddings import OpenAIEmbedding

from langchain.llms import Ollama

import chromadb

In [2]:
from dotenv import dotenv_values
from pathlib import Path

In [3]:
api_key = dotenv_values('../.env')["OPENAI_API_KEY"]
openai.api_key = api_key

In [4]:
#Set the LLM and the embedding model

llm = Ollama(model="llama2")
#embed_model = OllamaEmbeddings(base_url="http://localhost:11434", model="llama2") #Local Llama 2 embedding model
embed_model = OpenAIEmbedding() #Using OpenAI's text-embedding-ada-002

In [5]:
COLLECTION = "aiprof"
SLIDE_COLLECTION = 'slides'
PATH = '../chroma'

In [None]:
# create client and a new collection
db = chromadb.PersistentClient(path=PATH)
chroma_collection = db.get_or_create_collection(COLLECTION)

In [7]:
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)

In [49]:
# load documents
documents = SimpleDirectoryReader("../data/AIMA/").load_data()

In [None]:
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context
)

In [8]:
# load from disk
db2 = chromadb.PersistentClient(path=PATH)
chroma_collection = db2.get_or_create_collection(COLLECTION)

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

index2 = VectorStoreIndex.from_vector_store(
    vector_store,
    service_context=service_context,
)

In [10]:
query_engine = index2.as_query_engine()

In [11]:
resp = query_engine.query("What is the turing test?")
print(resp.response)

The Turing Test is a theoretical framework proposed by Alan Turing in 1950 to measure a machine's ability to exhibit intelligent behavior comparable to, or indistinguishable from, that of a human. The test involves interrogating a machine (a computer or a robot) via a teletype and assessing whether an human interrogator cannot tell if it is a machine or a human at the other end. The test requires the machine to pass certain cognitive tasks, such as natural language processing, knowledge representation, automated reasoning, and machine learning, sufiiciently well to fool the interrogator. The Turing Test is considered a benchmark for measuring the intelligence of machines and has been the subject of much research and debate in the fi eld of Artificial Intelligence (AI).


In [None]:
resp = query_engine.query("Generate 2 concise questions about rational agents")

In [None]:
resp.response.split('\n')

#### Querying Slides (PPTx)

In [37]:
# create client and a new collection
slides_db = chromadb.PersistentClient(path=PATH)
slides_chroma_collection = slides_db.get_or_create_collection(SLIDE_COLLECTION)

In [38]:
vector_store = ChromaVectorStore(chroma_collection=slides_chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)

In [12]:
slides_reader = download_loader("PptxReader")
loader = slides_reader()

print(type(SimpleDirectoryReader))

slides = loader.load_data(Path('../data/slides/4_Adversarial_Search.pptx'))

<class 'abc.ABCMeta'>


In [13]:
slides_index = VectorStoreIndex.from_documents(documents=slides, 
                                               storage_context=storage_context, 
                                               service_context=service_context)

In [14]:
slides_query = slides_index.as_query_engine()
resp = slides_query.query("What is The Minimax Algorithm?")
print(resp.response)

Based on the context information provided, the Minimax algorithm is a decision-making algorithm used in game theory and artificial intelligence. It is an optimal algorithm that helps players make the best move possible in a game by evaluating the utility function of each possible move and selecting the one with the highest value. The algorithm works by recursively exploring the game tree down to the terminal nodes, evaluating the utility function of each node, and choosing the move with the highest expected payoff.

The Minimax algorithm is guaranteed to find the optimal move in a game, but it has a time complexity that grows exponentially with the depth of the game tree. This means that as the depth of the tree increases, the time required to find the optimal move also increases dramatically. To address this issue, Alpha-Beta pruning is used to reduce the number of nodes that need to be evaluated, making the search process more efficient.

In summary, the Minimax algorithm is a decisi