# Interactive Introduction to ML and AI with a RAG-System

Based on a PDF containing a starter set of DND 5e character [sheets](https://dnd5echaractersheet.com/)


## System setup

Create a .env file with the following content:

`OPENAI_API_KEY="<API_KEY>"`

In [2]:
# required libraries for the tutorial
import openai
import os
from dotenv import load_dotenv
from langchain.document_loaders import PyPDFLoader
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

## Load PDF data
Load the PDF and split it into chunks.
Each chunk contains at most 1000 characters, and consecutive chunks overlap by at most 100 characters.

In [3]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
loader = PyPDFLoader("data/StarterSet_Charactersv2.pdf")
chunks = loader.load_and_split(text_splitter)
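
To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` also tries to split on separators such as paragraph and line breaks, so its chunks will look different. The function name `split_with_overlap` and the sample text are hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for piece in pieces:
    print(piece)
```

Each printed piece starts with the last three characters of the previous one. That shared tail is the overlap, which helps keep sentences that straddle a chunk boundary retrievable.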

### Check the chunks
Access the text of a chunk with `chunks[index].page_content`.

In [5]:
print(chunks[0])
print("The chunk contains " + str(len(chunks[0].page_content)) + " characters")

page_content='raceclass & level Player Name\ncharacter NameBackgrouNd\nexPerieNce PoiNts aligNmeNt\nSKILLSHit Point Maximum\nCP\neP\nPPgPsP\nEquIpmEntAttAcKS & SpELLcAStIng\nFEAturES & trAItSDeatH saves Hit DiCe\nNaMe atk BoNus DaMage/tyPetotal suCCesses\nFailuresiDeals\nBoNDs\nFlawsPersoNality traitsArmor\ncLASS\ncurrEnt hIt poIntS\ntEmporAry hIt poIntSInItIA tIvE SpEEd\n*see your equipment.  Strength\n  Dexterity\n  Constitution\n  Intelligence\n  Wisdom\n  Charisma\nPassive wisdom (PercePtion)\nOther prOficiencies & languages\n  Acrobatics (Dex)\n  Animal Handling (Wis)\n  Arcana (Int)\n  Athletics (Str)\n  Deception (Cha)\n  History (Int)\n  Insight (Wis)\n  Intimidation (Cha)\n  Investigation (Int)\n  Medicine (Wis)\n  Nature (Int)\n  Perception (Wis)\n  Performance (Cha)\n  Persuasion (Cha)\n  Religion (Int)\n  Sleight of Hand (Dex)\n  Stealth (Dex)\n  Survival (Wis)\nProficiency bonusstrength\ndexterity\nconstitution\nintelligence\nwisdom\ncharismasaving throwsinsPirationFighter

## Setup models

We need two models: an embedding model to vectorise our chunks before storing them in ChromaDB, and a language model to generate answers to our questions.

In [6]:

# Load environment variables from .env file
load_dotenv()

# Access the API key using the variable name defined in the .env file
openai.api_key = os.getenv("OPENAI_API_KEY")

# Initialize the OpenAI chat model
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.8)

# initialize the OpenAI embeddings model
embeddings = OpenAIEmbeddings()
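
If the `.env` file is missing or the variable name is misspelled, `os.getenv` silently returns `None` and the error only surfaces at the first API call. A minimal sketch of an early check, using only the standard library; the helper name `require_env` is hypothetical.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after `load_dotenv()` would then fail immediately with a readable message instead of a later authentication error.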

### Load / Create Chroma DB

We check whether the directory already exists for two reasons:
1) We use OpenAI embeddings and pay for every embedding we generate
2) Chroma does not overwrite an existing database, but allows updating it

In [7]:
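if os.path.exists("chroma"):
    print("Loading Chroma from disk...")
    # Reuse the persisted collection; the name must match the one used at creation
    chroma_db = Chroma(persist_directory="chroma",
                       embedding_function=embeddings,
                       collection_name="lc_chroma_demo")
else:
    # Embed the chunks once and persist them to disk
    chroma_db = Chroma.from_documents(documents=chunks,
                                      embedding=embeddings,
                                      persist_directory="chroma",
                                      collection_name="lc_chroma_demo")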

### Test Your Database

In [8]:
query = "What is this document about?"

Simple Similarity Search

In [10]:
result = chroma_db.similarity_search(query)
print(result)

[Document(page_content='Halfling\xadrogue\xad(criminal),\xadpage\xad2\xadof\xad2\xad TM\xad&\xad©\xad2014\xadWizards\xadof\xadthe\xadCoast\xadLLC.\xadPermission\xadis\xadgranted\xadto\xadphotocopy\xadthis\xaddocument\xadfor\xadpersonal\xaduse.', metadata={'page': 5, 'source': 'data/StarterSet_Charactersv2.pdf'}), Document(page_content='fine\xadclothes,\xadsignet\xadring,\xadscroll \xad \nof pedigree\n*While wearing this armor, you \nhave disadvantage on Dexterity (Stealth)\xadchecks.Lawful neutral\nHuman\xadfighter\xad(noble),\xadpage\xad1\xadof\xad2\xad TM\xad&\xad©\xad2014\xadWizards\xadof\xadthe\xadCoast\xadLLC.\xadPermission\xadis\xadgranted\xadto\xadphotocopy\xadthis\xaddocument\xadfor\xadpersonal\xaduse.', metadata={'page': 0, 'source': 'data/StarterSet_Charactersv2.pdf'}), Document(page_content='delivered in your trances, your god has called you to a new mission. A goblin tribe has made its lair in an ancient ruin now called Cragmaw Castle, where they have defiled a shrine once 
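
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below imitates this with a toy bag-of-words "embedding" and cosine similarity. It is not how OpenAI embeddings or Chroma work internally; all function names and example texts are hypothetical.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: count lowercase words."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_similarity_search(query: str, texts: list[str], k: int = 2) -> list[str]:
    """Return the k texts most similar to the query."""
    query_vec = toy_embed(query)
    ranked = sorted(
        texts,
        key=lambda t: cosine_similarity(query_vec, toy_embed(t)),
        reverse=True,  # highest similarity first
    )
    return ranked[:k]


docs = [
    "the fighter wears heavy armor",
    "the rogue is good at stealth",
    "the cleric serves a god",
]
top = toy_similarity_search("who is good at stealth", docs, k=1)
print(top)
```

A real embedding model captures meaning rather than exact word overlap, but the retrieval step follows the same idea: rank stored vectors by closeness to the query vector.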

Similarity Search with Scores

In [11]:
result_with_scores = chroma_db.similarity_search_with_score(query)
print(result_with_scores)

[(Document(page_content='Halfling\xadrogue\xad(criminal),\xadpage\xad2\xadof\xad2\xad TM\xad&\xad©\xad2014\xadWizards\xadof\xadthe\xadCoast\xadLLC.\xadPermission\xadis\xadgranted\xadto\xadphotocopy\xadthis\xaddocument\xadfor\xadpersonal\xaduse.', metadata={'page': 5, 'source': 'data/StarterSet_Charactersv2.pdf'}), 0.4932325241139848), (Document(page_content='fine\xadclothes,\xadsignet\xadring,\xadscroll \xad \nof pedigree\n*While wearing this armor, you \nhave disadvantage on Dexterity (Stealth)\xadchecks.Lawful neutral\nHuman\xadfighter\xad(noble),\xadpage\xad1\xadof\xad2\xad TM\xad&\xad©\xad2014\xadWizards\xadof\xadthe\xadCoast\xadLLC.\xadPermission\xadis\xadgranted\xadto\xadphotocopy\xadthis\xaddocument\xadfor\xadpersonal\xaduse.', metadata={'page': 0, 'source': 'data/StarterSet_Charactersv2.pdf'}), 0.5097612371207857), (Document(page_content='delivered in your trances, your god has called you to a new mission. A goblin tribe has made its lair in an ancient ruin now called Cragmaw C
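
With LangChain's Chroma wrapper the returned score is a distance, so a lower score means a closer match. This is why the results above appear in ascending score order. A minimal sketch, assuming you want to discard weak matches: filter the `(document, score)` pairs by a distance threshold. The helper name and the threshold of 0.5 are arbitrary, illustrative choices, and plain strings stand in for `Document` objects.

```python
def filter_by_distance(results, max_distance):
    """Keep only (item, score) pairs whose distance is at most max_distance."""
    return [(item, score) for item, score in results if score <= max_distance]


# Plain strings stand in for Document objects; scores are rounded from the output above.
example_results = [
    ("Halfling rogue (criminal), page 2 of 2", 0.493),
    ("Human fighter (noble), page 1 of 2", 0.510),
]
close_matches = filter_by_distance(example_results, max_distance=0.5)
print(close_matches)
```

A suitable threshold depends on the embedding model and the data, so it is best chosen by inspecting the scores of a few real queries.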

In [12]:
chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=chroma_db.as_retriever())

In [14]:
response = chain.invoke(query)
print(response)

{'query': 'What is this document about?', 'result': "The document appears to be a character sheet for a Human Fighter (noble) in a Dungeons & Dragons game. It includes details about the character's background, alignment, personal goals, and a mission involving a goblin tribe defiling a shrine sacred to a deity. The character is Chaotic Good and aims to civilize a town called Phandalin."}
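
The `stuff` chain type used above places ("stuffs") all retrieved chunks into a single prompt and sends it to the LLM together with the question. A minimal sketch of that step, with the chunks joined by blank lines; the prompt wording and the function name `build_stuff_prompt` are assumptions for illustration, not LangChain's actual template.

```python
def build_stuff_prompt(question: str, retrieved_texts: list[str]) -> str:
    """Combine all retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(retrieved_texts)  # chunks separated by blank lines
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "What is this document about?",
    ["Halfling rogue (criminal), page 2 of 2", "Human fighter (noble), page 1 of 2"],
)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the retrieved chunks fit into the model's context window.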


### Debug mode:

This mode displays the complete execution trace of the chain.

In [16]:
from langchain.globals import set_debug
set_debug(True)
chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=chroma_db.as_retriever())

In [17]:
response = chain.invoke(query)
print(response)

[chain/start] [1:chain:RetrievalQA] Entering Chain run with input:
{
  "query": "What is this document about?"
}
[chain/start] [1:chain:RetrievalQA > 3:chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [1:chain:RetrievalQA > 3:chain:StuffDocumentsChain > 4:chain:LLMChain] Entering Chain run with input:
{
  "question": "What is this document about?",
  "context": "Halfling­rogue­(criminal),­page­2­of­2­ TM­&­©­2014­Wizards­of­the­Coast­LLC.­Permission­is­granted­to­photocopy­this­document­for­personal­use.\n\nfine­clothes,­signet­ring,­scroll ­ \nof pedigree\n*While wearing this armor, you \nhave disadvantage on Dexterity (Stealth)­checks.Lawful neutral\nHuman­fighter­(noble),­page­1­of­2­ TM­&­©­2014­Wizards­of­the­Coast­LLC.­Permission­is­granted­to­photocopy­this­document­for­personal­use.\n\ndelivered in your trances, your god has called you to a new mission. A goblin tribe has 

### Test Some Queries Yourself

In [15]:
def get_response(query: str) -> str:
    # One possible solution: run the retrieval chain and return only the answer text
    response = chain.invoke(query)
    return response["result"]