# Main Jupyter Notebook for the CS205 Final Project

## In this project we will explore a use case of Large Language Models
### We will start with how to use an LLM through HuggingFace and explain some of the basic concepts behind an LLM. Once we have a good understanding of how to use an LLM for generating text, we will explore Retrieval Augmented Generation (RAG).

#### For this project we used the Llama 2 7-billion-parameter model together with OpenAI's text-embedding-ada-002 embedding model. Llama 2 7B was served locally by Ollama. We used Llama Index and LangChain to interact with the LLM.

#### The data store is a directory named 'data' at the root of the project. Create this directory before running the indexing and query cells.

### Let's understand how to use an LLM using HuggingFace

In [None]:
import torch

#Import the HuggingFace Transformers model and tokenizer classes
from transformers import AutoModelForCausalLM, AutoTokenizer

In [None]:
#Fetch Meta's OPT LLM with 1.3 billion parameters. This is a small model compared to state-of-the-art models such as GPT-4V.

model = AutoModelForCausalLM.from_pretrained('facebook/opt-1.3b')
tokenizer = AutoTokenizer.from_pretrained('facebook/opt-1.3b')

##### Open Pre-trained Transformer (OPT) is a collection of decoder-only transformers developed by Meta.

In [None]:
input_text = 'I like CS205 Artificial Intelligence course, because'
tok_input = tokenizer(input_text, return_tensors='pt', add_special_tokens=True, truncation=True) #Create tokens from the given input and return PyTorch tensors

In [None]:
torch.manual_seed(123)

generated_output = model.generate(**tok_input, 
                                  max_new_tokens=200, 
                                  return_dict_in_generate=True, 
                                  do_sample=True) #Sampling is enabled, so each run can give a different response unless the seed is fixed
                                                  #Set do_sample=False for deterministic (greedy) decoding
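
To make the difference between sampling and deterministic decoding concrete, here is a minimal pure-Python sketch of picking the next token from a list of scores. It is an illustration only, not how `model.generate` is implemented: the names `greedy_pick` and `top_k_sample` and the toy logits are made up for this example.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Deterministic choice: always return the index of the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_sample(logits, k, rng):
    """Keep the k highest-scoring tokens and sample one of them by probability."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top])
    return rng.choices(top, weights=probs, k=1)[0]


# Hypothetical scores for a 5-token vocabulary (not taken from a real model)
toy_logits = [2.0, 1.5, 0.3, -1.0, -3.0]
rng = random.Random(123)  # fixing the seed makes the sampled sequence repeatable

greedy_token = greedy_pick(toy_logits)
sampled_tokens = [top_k_sample(toy_logits, k=3, rng=rng) for _ in range(20)]

print(greedy_token)
print(sampled_tokens)
```

Greedy picking returns the same token every time, while top-k sampling can return any of the three highest-scoring tokens, which is why sampled generations differ between runs.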


In [None]:
decoded_output = tokenizer.batch_decode(generated_output.sequences, skip_special_tokens=True)[0]
print(decoded_output)

#### We saw text generation in the previous subsection. Now let's explore text summarization, a critical use case of LLMs

#### What is an embedding?

#### What is RAG?

#### Retrieval Augmented Generation is a technique in generative AI for extending the knowledge of an LLM. The LLM's parameters are fixed after training and are not updated with current information, so a specialized knowledge database (which may be private) is created for the LLM to access. This is a non-parametric memory, i.e., the information is not stored in the learned parameters of the LLM.
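
The retrieve-then-generate idea can be sketched without any external library. The example below is a toy under stated assumptions: `embed` is a bag-of-words stand-in for a real embedding model, the three "documents" are made up, and `build_prompt` only shows one plausible way to place retrieved text in front of the question. The notebook itself delegates these steps to Llama Index and ChromaDB.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=1):
    """Return the top_k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, context_chunks):
    """Put the retrieved text in front of the question before calling the LLM."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical knowledge base; in the notebook this role is played by the indexed files
knowledge_base = [
    "a rational agent acts to achieve the best expected outcome",
    "minimax search picks moves in two player games",
    "chroma stores embedding vectors on disk",
]

query = "what is a rational agent"
top_chunks = retrieve(query, knowledge_base, top_k=1)
prompt = build_prompt(query, top_chunks)
print(prompt)
```

The LLM never needs retraining here: the knowledge lives in `knowledge_base`, and only the most relevant chunk is passed along with the question.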

#### Implementing RAG with Llama Index using ChromaDB as the vector database. Ollama is used to serve Llama 2 locally

In [40]:
import openai

from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext, download_loader
from llama_index.vector_stores import ChromaVectorStore
from llama_index.storage.storage_context import StorageContext
from llama_index.embeddings import OpenAIEmbedding

from langchain.llms import Ollama

import chromadb

In [39]:
from dotenv import dotenv_values
from pathlib import Path

In [None]:
api_key = dotenv_values('../.env')["OPENAI_API_KEY"]
openai.api_key = api_key

In [None]:
#Set the LLM and the embedding model

llm = Ollama(model="llama2")
#embed_model = OllamaEmbeddings(base_url="http://localhost:11434", model="llama2") #Local Llama 2 embedding model
embed_model = OpenAIEmbedding() #Using OpenAI's text-embedding-ada-002

In [35]:
COLLECTION = "aiprof"
SLIDE_COLLECTION = 'slides'
PATH = '../chroma'

In [None]:
# create client and a new collection
db = chromadb.PersistentClient(path=PATH)
chroma_collection = db.get_or_create_collection(COLLECTION)

In [None]:
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)

In [49]:
# load documents
documents = SimpleDirectoryReader("../data/AIMA/").load_data()

In [None]:
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context
)

In [None]:
# load from disk
db2 = chromadb.PersistentClient(path=PATH)
chroma_collection = db2.get_or_create_collection(COLLECTION)

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

index2 = VectorStoreIndex.from_vector_store(
    vector_store,
    service_context=service_context,
)

In [None]:
query_engine = index2.as_query_engine()

In [None]:
resp = query_engine.query("What is the rational agent approach?")
print(resp.response)

In [None]:
resp = query_engine.query("Generate 2 concise questions about rational agents")

In [None]:
resp.response.split('\n')
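
Splitting on newlines usually leaves blank lines, list numbering, and introductory text mixed in with the questions. A small helper along the following lines could tidy that up. It is a sketch: `extract_questions` is a hypothetical name, and `sample_response` is an invented string standing in for `resp.response`, since the actual model output is not shown here.

```python
import re


def extract_questions(response_text):
    """Return the lines of an LLM response that look like questions."""
    questions = []
    for line in response_text.split("\n"):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Drop a leading list marker such as "1.", "2)", "-" or "*"
        line = re.sub(r"^(\d+[.)]|[-*])\s*", "", line)
        if line.endswith("?"):
            questions.append(line)
    return questions


# Invented example text; the real response format may differ
sample_response = (
    "Here are two questions:\n"
    "\n"
    "1. What is a rational agent?\n"
    "2) How does an agent choose actions?\n"
)

questions = extract_questions(sample_response)
print(questions)
```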

#### Querying Slides (PPTx)

In [37]:
# create client and a new collection
slides_db = chromadb.PersistentClient(path=PATH)
slides_chroma_collection = slides_db.get_or_create_collection(SLIDE_COLLECTION)

In [38]:
vector_store = ChromaVectorStore(chroma_collection=slides_chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)

In [50]:
slides_reader = download_loader("PptxReader")
loader = slides_reader()

print(type(SimpleDirectoryReader))

slides = loader.load_data(Path('../data/slides/4_Adversarial_Search.pptx'))

<class 'abc.ABCMeta'>


In [51]:
slides_index = VectorStoreIndex.from_documents(documents=slides, 
                                               storage_context=storage_context, 
                                               service_context=service_context)

In [52]:
slides_query = slides_index.as_query_engine()
resp = slides_query.query("What is Adversarial search?")
print(resp.response)

Based on the context information provided, Adversarial search is a game-playing search algorithm that assumes relaxation of assumptions where there is only one intelligent entity with explicit control over the "world." The algorithm considers the scenario where these assumptions are relaxed and explores the application of adversarial search to economics, business, politics, and war.

Adversarial search involves using a utility function to evaluate nodes in a game tree, which helps determine the best move for each player. The algorithm recursively applies the utility function to terminal nodes until backed-up values reach the initial state, providing the minimum score for Max.

The context information also highlights the limitations of depth-limited minimax search and the challenges of creating effective utility functions for games like Go, where even the most powerful supercomputers may require hundreds or thousands of years to achieve competitive performance.

Overall, Adversarial sea