# Main Jupyter Notebook for the CS205 Final Project

In this project we will explore a use case of Large Language Models (LLMs).
 
We will start with how to use an LLM through HuggingFace and explain some of the basic concepts behind an LLM. Once we have a good understanding of how to use an LLM for generating text, we will explore Retrieval Augmented Generation (RAG).

For this project we used the Llama 2 7-billion-parameter model with OpenAI's text-embedding-ada-002 embedding model. Llama 2 7B was served locally by Ollama. We used LlamaIndex and LangChain to interact with the LLM.

The data store is a directory named 'data' at the root of the project. Create this directory before running the indexing and query cells.

### Let's understand how to use an LLM using HuggingFace

In [None]:
import torch

#Import HuggingFace Transformer
from transformers import AutoModelForCausalLM, AutoTokenizer

In [None]:
#Fetch Meta's OPT LLM with 1.3 billion parameters. This is quite a small model compared to state-of-the-art models such as GPT-4V.

model = AutoModelForCausalLM.from_pretrained('facebook/opt-1.3b')
tokenizer = AutoTokenizer.from_pretrained('facebook/opt-1.3b')

##### Open Pre-trained Transformer (OPT) is a collection of decoder-only transformers developed by Meta.

In [None]:
input_text = 'I like CS205 Artificial Intelligence course, because'
tok_input = tokenizer(input_text, return_tensors='pt', add_special_tokens=True, truncation=True) #Create tokens from the given input and return PyTorch tensors

In [None]:
torch.manual_seed(123)

generated_output = model.generate(**tok_input,
                                  max_new_tokens=200,
                                  return_dict_in_generate=True,
                                  do_sample=True) #Sampling is enabled, so generation is stochastic; the seed set above makes this run reproducible
                                                  #Pass top_k (e.g. top_k=50) to restrict sampling to the k most likely tokens
                                                  #Set do_sample=False to have a deterministic (greedy) generation each time
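
The difference between greedy decoding and top-k sampling can be sketched without a model. The snippet below is a minimal illustration on made-up scores, not how `model.generate` is implemented internally; `sample_top_k`, `greedy`, and `toy_logits` are hypothetical names introduced only for this example.

```python
import math
import random

def greedy(logits):
    """Deterministic choice: always pick the highest-scoring token id."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_top_k(logits, k, rng):
    """Sample one token id from the k highest-scoring entries of `logits`."""
    # Keep the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only (subtract the max for numerical stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    # Draw one index in proportion to its renormalised probability.
    return rng.choices(top, weights=weights, k=1)[0]

toy_logits = [2.0, 1.5, 0.3, -1.0, -3.0]  # hypothetical scores for a 5-token vocabulary
rng = random.Random(123)                  # fixed seed so the draws are repeatable

print("greedy:", greedy(toy_logits))
samples = [sample_top_k(toy_logits, k=2, rng=rng) for _ in range(10)]
print("top-2 samples:", samples)
```

With `k=2` only the two most likely token ids can ever be drawn, while greedy decoding returns the same id on every call.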


In [None]:
decoded_output = tokenizer.batch_decode(generated_output.sequences, skip_special_tokens=True)[0]
print(decoded_output)

#### We saw text generation in the previous subsection. Now let's explore text summarization, a critical use case of LLMs

#### What is an embedding?
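
As a rough illustration: an embedding maps a piece of text to a fixed-length vector of numbers, so that texts with similar meaning end up close together. The sketch below uses hand-picked 3-dimensional vectors (an assumption for illustration only; real embedding models produce learned vectors with hundreds or thousands of dimensions) and compares them with cosine similarity. `cosine_similarity` and `toy_embeddings` are hypothetical names.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, hand-picked so that the two AI terms point the same way.
toy_embeddings = {
    "search":   [0.9, 0.1, 0.0],
    "planning": [0.8, 0.3, 0.1],
    "banana":   [0.0, 0.2, 0.9],
}

related = cosine_similarity(toy_embeddings["search"], toy_embeddings["planning"])
unrelated = cosine_similarity(toy_embeddings["search"], toy_embeddings["banana"])
print(f"search vs planning: {related:.3f}")
print(f"search vs banana:   {unrelated:.3f}")
```

A vector database stores such vectors and returns the entries whose vectors are most similar to the query's vector.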

#### What is RAG?

Retrieval Augmented Generation is a technique in generative AI for extending the knowledge of an LLM. The LLM's parameters are fixed after training and are not updated with current information, so a specialized knowledge database (which could be private) is created for the LLM to access. This is a non-parametric memory, i.e., the information is not stored in the learned parameters of the LLM.
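
The retrieve-then-generate flow can be sketched in a few lines. This is a toy under stated assumptions: `embed` is a bag-of-words stand-in rather than a learned embedding model, the passages are made up, and the final prompt would be sent to an LLM in a real pipeline. All function names here are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in embedding: a bag-of-words count vector (NOT a learned embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, passages, top_k=1):
    """Return the top_k passages most similar to the query."""
    q = embed(query)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:top_k]

def build_prompt(query, context):
    """Augment the question with the retrieved passages before calling the LLM."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nAnswer using only the context above.\nQuestion: {query}"

# Made-up knowledge base standing in for the non-parametric memory.
passages = [
    "A rational agent acts to maximize its expected utility.",
    "Hill climbing moves to the highest-valued successor state.",
]
query = "How does hill climbing choose the next state?"
context = retrieve(query, passages)
prompt = build_prompt(query, context)
print(prompt)
```

The LLM never needs retraining: updating the knowledge base changes what is retrieved and therefore what the model sees in its prompt.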

#### Implementing RAG with Llama Index using ChromaDB as the vector database. Ollama is used to serve Llama 2 locally

The data can be accessed at this link. Add this folder to the root of the project directory

https://drive.google.com/drive/folders/10wzRErO4Zlj6L3bLSqh9QtTLDkaUOuxS?usp=drive_link

In [1]:
import openai

from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext, download_loader
from llama_index.vector_stores import ChromaVectorStore
from llama_index.storage.storage_context import StorageContext
from llama_index.embeddings import OpenAIEmbedding
from langchain.llms import Ollama
from dotenv import dotenv_values
from pathlib import Path
import chromadb
import os

In [2]:
api_key = dotenv_values('../.env')["OPENAI_API_KEY"]
openai.api_key = api_key

In [3]:
#Set the LLM and the embedding model

llm = Ollama(model="llama2") #Llama 2 served locally by Ollama
#embed_model = OllamaEmbeddings(base_url="http://localhost:11434", model="llama2") #Local Llama 2 embedding model
embed_model = OpenAIEmbedding() #Using OpenAI's text-embedding-ada-002

In [4]:
COLLECTION = "aiprof"
SLIDE_COLLECTION = 'slides'
PATH = '../chroma'

In [5]:
# create client and a new collection
db = chromadb.PersistentClient(path=PATH)
chroma_collection = db.get_or_create_collection(COLLECTION)

In [6]:
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
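#Pass the LLM and embedding model defined above; with no arguments, ServiceContext falls back to its default OpenAI models
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)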

In [7]:
# load documents
documents = SimpleDirectoryReader("../data/AIMA", recursive=True).load_data()

In [8]:
slides_reader = download_loader("PptxReader")
loader = slides_reader()
slides = []

#Load every PowerPoint deck in the slides directory, skipping any other files
for file in sorted(Path("../data/slides/").glob("*.pptx")):
    slides += loader.load_data(file)

In [9]:
information = documents + slides

In [10]:
index = VectorStoreIndex.from_documents(
    information, storage_context=storage_context, service_context=service_context
)

In [11]:
# load from disk
db2 = chromadb.PersistentClient(path=PATH)
chroma_collection = db2.get_or_create_collection(COLLECTION)

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

index2 = VectorStoreIndex.from_vector_store(
    vector_store,
    service_context=service_context,
)

In [12]:
query_engine = index2.as_query_engine()

In [13]:
resp = query_engine.query("What is The rational agent approach?")
print(resp.response)

The rational agent approach refers to designing an intelligent agent that makes decisions and takes actions based on rationality. A rational agent is one that takes actions that maximize its expected utility, given its knowledge and beliefs about the world. This approach involves defining the agent's goals, specifying its knowledge and capabilities, and designing algorithms or mechanisms for decision-making and action selection. The rational agent approach is a fundamental concept in artificial intelligence and is used to develop intelligent systems that can solve complex problems and make optimal decisions.


In [14]:
resp = query_engine.query("Generate 2 concise questions about rational agents")

In [15]:
resp.response.split('\n')

['What are some key characteristics of rational agents?',
 'How do rational agents make decisions?']

In [16]:
resp = query_engine.query("write pseudo code for Hill Climbing Local Search")
print(resp.response)

function HILL-CLIMBING(problem) returns a solution state
     inputs: problem, a problem
     static: current, a node; next, a node
     current <- MAKE-NODE(INITIAL-STATE[problem])
     loop do
         next <- a highest-valued successor of current
         if VALUE[next] < VALUE[current] then return current
         current <- next
     end


In [18]:
resp = query_engine.query("what are the expectations for fall 2023 final project and when is the deadline")
print(resp.response)

The expectations for the Fall 2023 final project include preparing a tutorial lesson using Google CoLab or a similar platform to demonstrate a favorite Machine Learning technique or an advanced topic in AI. The project should include a motivating question, a real dataset related to the question, and a method to approach answering the question. The project should be unique and have the student's stamp of uniqueness. It should be structured like an end-to-end tutorial with clear illustrations and explanations. All code should be well-commented and understandable. Preparatory work should be clearly documented. The tutorial should be succinct, complete, and submitted via a Google Form. The deadline for the project is tentatively set for November 30, 2023.


In [20]:
resp = query_engine.query("why is pluto not a planet? is this information present in the database")
print(resp.response)

Yes, the information about why Pluto is not considered a planet is not present in the given context.


In [23]:
resp = query_engine.query("is this censored?")
print(resp.response)

There is no information in the given context that indicates whether the content is censored or not.
