# Part II: Building the AI Assistant

## Introduction

* [Attention Is All You Need](https://arxiv.org/pdf/1706.03762), Vaswani et al. (2017, Google Brain / Google Research)
* ChatGPT: 1 million users within 5 days of launch (OpenAI)
* ChatGPT: 1.8 billion monthly visits in March 2023 (OpenAI)

- **Agricultural Revolution**: Around 10,000 BCE, shift to settled farming.
- **Industrial Revolution**: Late 18th century, rise of industrialization.
- **Digital (Computer) Revolution**: Mid-20th century, advent of computers.
- **AI Revolution**: Early 21st century, integration of artificial intelligence.

### Agenda

* Introduction to LLMs
* Starting Docker Containers
* Build AI Assistant (Walkthrough & HandsOn)
    * Ingestion (Load, Split, Embed, Store)
    * Similarity Search
    * Combine Context
    * Response Generation
* LangServe and Streamlit App

* We will build a simple retrieval-augmented generation (RAG) pipeline and complete HandsOn tasks.
* The notebook is based on LangChain's [RAG intro](https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_1_to_4.ipynb).
* We build towards a broader understanding of the RAG landscape covered in LangChain's [RAG from scratch](https://github.com/langchain-ai/rag-from-scratch/tree/main) series.

### Learning Objectives

1. **Complete the Installation and Understand How the Essential Tools Work**
2. **Understand the Basics of Large Language Models (LLMs)**
3. **Understand on a Programmatic Level how AI Assistants are Built**

## Starting The Application

### Clone GitHub Repo

```bash
# Clone the repository from the given URL
git clone https://github.com/marioisbeck/ai_workshop_tmc.git

# Change directory to the cloned repository
cd ai_workshop_tmc
```

### Run Docker Containers

#### Mac/ Linux

```bash
# Copy the example environment file to a new .env file
cp .env.example .env
```

```bash
# Build and start the Docker containers in detached mode
docker-compose up -d --build
```

#### Windows

```powershell
# Copy the example environment file to a new .env file
Copy-Item -Path ".env.example" -Destination ".env"
```

```bash
# Build and start the Docker containers in detached mode
docker-compose up -d --build
```

### Download LLMs

#### Mac/ Linux

```bash
# Pull the mxbai-embed-large model in the ollama container
docker exec ollama ollama pull mxbai-embed-large && \
# Pull the all-minilm model in the ollama container
docker exec ollama ollama pull all-minilm && \
# Pull the wizardlm2:7b model in the ollama container
docker exec ollama ollama pull wizardlm2:7b && \
# Run the wizardlm2:7b model in the ollama container
docker exec ollama ollama run wizardlm2:7b
```

#### Windows

```powershell
# Pull the mxbai-embed-large model in the ollama container
docker exec ollama ollama pull mxbai-embed-large; `
# Pull the all-minilm model in the ollama container
docker exec ollama ollama pull all-minilm; `
# Pull the wizardlm2:7b model in the ollama container
docker exec ollama ollama pull wizardlm2:7b; `
# Run the wizardlm2:7b model in the ollama container
docker exec ollama ollama run wizardlm2:7b
```

### Test Application

Open `Docker Desktop` and go to `Containers`. You should see something like this (all container icons should be green):

![assets/imgs/docker-compose_success.png](assets/imgs/docker-compose_success.png)

* **Application** Go to [http://localhost:8501](http://localhost:8501), check that the frontend loads, upload a PDF (<5 MB) and ask a question about it. If everything goes well, it should answer 😊.
* **Jupyter** Go to [http://localhost:8888](http://localhost:8888), where you should see the option to run the ai_workshop_tmc.ipynb notebook. You can follow along in this notebook.

### Remove Containers

```bash
# Stop and remove all containers defined in the docker-compose.yml file
docker-compose down
```

## Client Journey 

* programming interest, proactive approach
* presented working chatbot prototype to client
* client invested in dedicated GPU server

## TMChampionship

* TEL organises (and has organised) a project journey towards a Shark Tank-style investor pitch in November
* Milan and I started the TMChampionship project Prometheon.ai to build a sustainable manufacturing knowledge expert
* TEL has many wonderful opportunities for you to try new things, learn, connect and especially grow

## Prometheon.ai

[www.prometheon.ai](https://www.prometheon.ai)

**Theon**
> Theon helps manufacturing companies manage their data securely on-site and integrates with existing systems to improve decision-making and efficiency.

**Rachel**
> Rachel offers expert guidance on sustainable manufacturing through an AI chat interface, integrating supplier details, company data, and resources to promote eco-friendly practices while ensuring efficiency and profitability.

## Introduction to LLMs

Slides are shamelessly taken from 3Blue1Brown's [But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning](https://www.youtube.com/watch?v=wjZofJX0v4M&ab_channel=3Blue1Brown)

![/overview.jpeg](assets/imgs/overview.jpeg)

![/tokens.jpeg](assets/imgs/tokens.jpeg)

![/giving_meaning.jpeg](assets/imgs/giving_meaning.jpeg)

## Build AI Assistant

![assets/imgs/simple_rag.png](assets/imgs/simple_rag.png)

- **Ingestion**: Load and preprocess documents for further processing.
    - **Load**: Upload documents to the backend.
    - **Split**: Split documents into manageable chunks using characters, sections, semantic meaning, and delimiters.
    - **Embed**: Convert document chunks (and query) into vector embeddings for representation.
    - **Store**: Store the embeddings in a vector database (Vectorstore) for efficient retrieval.
- **Similarity Search**: Use the query embedding to search and retrieve the most relevant document chunks from the Vectorstore.
- **Combine Context**: Combine retrieved document chunks with the query to provide context for the generation model.
- **Response Generation**: Use a language model to generate a response based on the query and retrieved context.

### Setup

#### Keep it Clean

The following is only to suppress output which we do not care about in this workshop.

```python
import warnings
import logging
import os
import numpy as np

# Set a USER_AGENT for web requests (e.g. by WebBaseLoader) made from the Jupyter notebook
os.environ['USER_AGENT'] = 'jovyan'

# Disable warnings
warnings.filterwarnings('ignore')

# Only show log messages of level WARNING and above (hides INFO messages)
logging.getLogger().setLevel(logging.WARNING)
```

#### Load Libraries

```python
import bs4  # Library for web scraping and parsing HTML/XML
from langchain import hub  # Access the LangChain Hub (e.g. shared prompts)
from langchain.text_splitter import RecursiveCharacterTextSplitter  # Splits text recursively by a list of separators
from langchain_core.output_parsers import StrOutputParser  # Parses model output into a string
from langchain_core.runnables import RunnablePassthrough  # Passes its input through unchanged in a chain
from langchain_community.chat_models import ChatOllama  # Chat model served by Ollama
from langchain_community.embeddings.ollama import OllamaEmbeddings  # Ollama embeddings for text representation
from langchain_community.document_loaders import WebBaseLoader  # Loads documents from the web
from langchain_community.vectorstores import Chroma  # Chroma vector store for efficient retrieval
from langchain_community.document_loaders import PyPDFLoader  # Reads in PDFs
from langchain.prompts import ChatPromptTemplate  # Class for prompt templates
```

#### Settings

```python
OLLAMA_LARGE_LANGUAGE_MODEL = "wizardlm2:7b"  # Specifies the large language model version
OLLAMA_SERVER = "http://ollama:11434"  # URL for the Ollama server
```

```python
question = "What is the TMC Entrepreneurial Lab?"
```

## Ingestion

### Load

![assets/imgs/simple_rag.png](assets/imgs/simple_rag.png)

[Document Loaders](https://python.langchain.com/docs/integrations/document_loaders/)

#### PDF Loader

```python
# Load the PDF; PyPDFLoader returns one Document per page
loader = PyPDFLoader(
    "./backend/tmc_tel_lab.pdf"
)
docs = loader.load()
```

```python
len(docs)
```

```
4
```

```python
docs[0].page_content[:300]
```

```
"Room to play\nAt the Entrepreneurial Lab, we believe in the power of multidisciplinary collaboration and\nthe freedom to explore. Here, diverse teams bringing together various skills to tackle the\nmultifaceted challenges of today. It's where individual competencies unite to solve\ncomplex problems, dri"
```

#### HandsOn --- Web Loader

> How can you use the WebBaseLoader to load the contents of the following website: "https://www.themembercompany.com/nl/employeneurship"?

> How long is the page_content of the resulting document?

```python
# # HandsOn: - Web Loader
# loader = WebBaseLoader(
#     web_paths=("https://www.themembercompany.com/nl/employeneurship",)
# )
# docs = loader.load()
```

```python
# # show character length page content
# len(docs[0].page_content)
```

### Split

![assets/imgs/simple_rag__load.png](assets/imgs/simple_rag__load.png)

[Splitter](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/recursive_text_splitter/)

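```python
# Split into chunks of about 200 tokens, with 20 tokens of overlap between neighbouring chunks
# (from_tiktoken_encoder measures length in tiktoken tokens, not characters)
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=200,
    chunk_overlap=20
)

# Make splits
splits = text_splitter.split_documents(docs)
```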

This text splitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split on them in order until the chunks are small enough. The default list is ["\n\n", "\n", " ", ""]. This has the effect of trying to keep all paragraphs (and then sentences, and then words) together as long as possible, as those would generically seem to be the strongest semantically related pieces of text.
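The strategy described above can be sketched in a few lines of plain Python. This is a simplified, hypothetical `recursive_split` helper, not LangChain's implementation: it measures length in characters rather than tokens, drops the separators, and adds no overlap between chunks.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters, preferring earlier separators.

    Simplified sketch: no overlap, separators are dropped, and neighbouring pieces
    are greedily merged back together as long as they fit into chunk_size.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if len(text) <= chunk_size:
        return [text] if text else []

    # Use the first separator that occurs in the text; "" (single characters) always applies.
    sep, finer = "", ()
    for index, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            sep, finer = candidate, separators[index + 1:]
            break
    pieces = text.split(sep) if sep else list(text)

    chunks, current = [], ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Still too long: flush the current chunk and recurse with the finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        merged = f"{current}{sep}{piece}" if current else piece
        if len(merged) <= chunk_size:
            current = merged
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

For example, `recursive_split("Room to play\n\nAt the Lab we explore.\nFailures are embraced.", 25)` first splits at the paragraph break and then at the line break, returning `["Room to play", "At the Lab we explore.", "Failures are embraced."]`.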

```python
len(splits)
```

```
6
```

```python
splits[0].page_content[:500]
```

```
"Room to play\nAt the Entrepreneurial Lab, we believe in the power of multidisciplinary collaboration and\nthe freedom to explore. Here, diverse teams bringing together various skills to tackle the\nmultifaceted challenges of today. It's where individual competencies unite to solve\ncomplex problems, driving innovative solutions through collective expertise.\nThrough hands-on, experienced-based learning, we create an environment where\nexperimentation is encouraged, failures are embraced as opportuniti"
```

```python
docs[0].page_content
```

```
"Room to play\nAt the Entrepreneurial Lab, we believe in the power of multidisciplinary collaboration and\nthe freedom to explore. Here, diverse teams bringing together various skills to tackle the\nmultifaceted challenges of today. It's where individual competencies unite to solve\ncomplex problems, driving innovative solutions through collective expertise.\nThrough hands-on, experienced-based learning, we create an environment where\nexperimentation is encouraged, failures are embraced as opportunities for growth, and\nbreakthroughs are celebrated.\nWhether it's pioneering technology or disruptive healthcare solutions, the Entrepreneurial\nLab provides the space, resources, and collaborative spirit to turn your vision into a\nsuccessful venture.\nHow it works\nSounds promising, doesn't it? But how exactly does the Entrepreneurial Lab operate?\n> Our employeneurs work on their own innovation if they desire, but they do this\nalongside their client project.\n> TMC supports with a physi
```

```python
splits[0].page_content
```

```
"Room to play\nAt the Entrepreneurial Lab, we believe in the power of multidisciplinary collaboration and\nthe freedom to explore. Here, diverse teams bringing together various skills to tackle the\nmultifaceted challenges of today. It's where individual competencies unite to solve\ncomplex problems, driving innovative solutions through collective expertise.\nThrough hands-on, experienced-based learning, we create an environment where\nexperimentation is encouraged, failures are embraced as opportunities for growth, and\nbreakthroughs are celebrated.\nWhether it's pioneering technology or disruptive healthcare solutions, the Entrepreneurial\nLab provides the space, resources, and collaborative spirit to turn your vision into a\nsuccessful venture.\nHow it works\nSounds promising, doesn't it? But how exactly does the Entrepreneurial Lab operate?\n> Our employeneurs work on their own innovation if they desire, but they do this\nalongside their client project."
```

```python
splits[1].page_content
```

```
'alongside their client project.\n> TMC supports with a physical lab and possibly even with financial resources.\n> The ownership of the innovation remains of our employeneurs; TMC has no interest in\nit.\nBUILDING THE FUTURE IN\nThe Entrepreneurial Lab\nThis is the place where you can let your technical dreams come true. If you\nhave a groundbreaking idea for a new innovation, you can initiate your\nproject and take it from concept to reality with the support of a vibrant\ncommunity of like-minded innovators.\x00 \x00'
```
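`splits[1]` starts with "alongside their client project.", which is also how `splits[0]` ends. This repeated text is the `chunk_overlap`: neighbouring chunks share some text so that context at a chunk boundary is not lost. The hypothetical `window_chunks` helper below shows the idea with a simple word-based sliding window. It is only an illustration: LangChain's splitter instead merges separator-delimited pieces and, with `from_tiktoken_encoder`, counts `chunk_size` and `chunk_overlap` in tokens.

```python
def window_chunks(words, chunk_size, chunk_overlap):
    """Group words into chunks of chunk_size words; consecutive chunks share chunk_overlap words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


demo_words = "TMC supports with a physical lab and possibly even with financial resources".split()
for chunk in window_chunks(demo_words, chunk_size=5, chunk_overlap=2):
    print(" ".join(chunk))
```

Each printed line starts with the last two words of the previous one.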

### Embed

![assets/imgs/simple_rag__embed.png](assets/imgs/simple_rag__embed.png)

[Text embedding models](https://python.langchain.com/docs/integrations/text_embedding/openai)

#### Text Embedding

```python
OLLAMA_EMBEDDING_MODEL = "mxbai-embed-large"
```

```python
# Embed the question and the first split with the same model so that their vectors can be compared
embedding = OllamaEmbeddings(model=OLLAMA_EMBEDDING_MODEL, base_url=OLLAMA_SERVER)
query_result = embedding.embed_query(question)
split_result = embedding.embed_query(splits[0].page_content)
```

```python
len(query_result)
```

```
1024
```

```python
len(split_result)
```

```
1024
```

```python
split_result[:4]
```

```
[0.6084602475166321,
 0.10699772834777832,
 -0.6222756505012512,
 -0.09857344627380371]
```
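Both vectors have 1024 dimensions, even though the question is much shorter than the split: an embedding model maps text of any length to a vector of fixed size. As a rough illustration only, the hypothetical `toy_embed` below uses the "hashing trick" on word counts. Unlike `mxbai-embed-large`, which is a trained neural network, it captures nothing beyond shared words.

```python
import hashlib
import math


def toy_embed(text, dim=16):
    """Hypothetical bag-of-words embedding via the hashing trick (illustration only)."""
    vector = [0.0] * dim
    for word in text.lower().split():
        # A stable hash (unlike Python's salted hash()) maps each word to one of `dim` buckets.
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vector[bucket] += 1.0
    # Scale to unit length so that the vector's length does not depend on the text's length.
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


short_vec = toy_embed("What is the TMC Entrepreneurial Lab?")
long_vec = toy_embed("At the Entrepreneurial Lab, we believe in the power of multidisciplinary collaboration")
print(len(short_vec), len(long_vec))  # 16 16
```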

#### Cosine Similarity

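[Cosine similarity](https://platform.openai.com/docs/guides/embeddings/frequently-asked-questions) is recommended for comparing embeddings. It measures the angle between two vectors and ranges from -1 to 1, where 1 means they point in the same direction.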

```python
def cosine_similarity(vec1, vec2, print_output=False):
    """Return the cosine similarity of two vectors (1 = same direction, 0 = orthogonal, -1 = opposite)."""
    dot_product = np.dot(vec1, vec2)
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)

    similarity = dot_product / (norm_vec1 * norm_vec2)

    if print_output:
        print("Cosine Similarity:", similarity)

    return similarity
```

```python
similarity = cosine_similarity(query_result, split_result, True)
```

```
Cosine Similarity: 0.7252169697102998
```
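The function above computes cos(θ) = (a · b) / (‖a‖ ‖b‖). Only the direction of the vectors matters, not their length. The pure-Python sketch below (a stand-in for the NumPy version, with made-up vectors) shows the three reference cases:

```python
import math


def cosine(a, b):
    """Cosine similarity dot(a, b) / (|a| * |b|) for two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


v = [1.0, 2.0, 3.0]
print(cosine(v, [2.0, 4.0, 6.0]))      # same direction, twice as long: 1.0 up to rounding
print(cosine([1.0, 0.0], [0.0, 1.0]))  # orthogonal: 0.0
print(cosine(v, [-1.0, -2.0, -3.0]))   # opposite direction: -1.0 up to rounding
```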


#### HandsOn --- Better Embedding

> So far we have used `mxbai-embed-large`. Write code that uses the smaller `all-minilm` embedding model with the local Ollama instance instead, and compare the cosine similarities. `mxbai-embed-large` is the more sophisticated of the two and is expected to give more accurate results.

```python
# OLLAMA_EMBEDDING_MODEL = "all-minilm"
```

```python
# embedding = OllamaEmbeddings(model=OLLAMA_EMBEDDING_MODEL, base_url=OLLAMA_SERVER)
# query_result = embedding.embed_query(question)
# split_result = embedding.embed_query(splits[0].page_content)
```

```python
# similarity = cosine_similarity(query_result, split_result, True)
```

#### HandsOn --- Stroopwafel

> Similar to the `Japan - Germany` example from `Introduction to LLMs`, we now calculate the difference between the Netherlands and Germany in vector space. We can then use it to find out which German item corresponds to what the stroopwafel is in the Netherlands.

```python
OLLAMA_EMBEDDING_MODEL = "mxbai-embed-large"
embedding = OllamaEmbeddings(model=OLLAMA_EMBEDDING_MODEL, base_url=OLLAMA_SERVER)
```

```python
# Candidate German counterparts of the stroopwafel
words = ["Bratwurst", "Mercedes", "Schwarzwälder Kirschtorte", "Berliner", "Lebkuchen"]
```

```python
# getting vectors of tokens/ words
stroopwafel_embedding = np.array(embedding.embed_query("Stroopwafel"))
netherlands_embedding = np.array(embedding.embed_query("Netherlands"))
germany_embedding = np.array(embedding.embed_query("Germany"))
```

```python
# Comparison vector: Stroopwafel - Netherlands + Germany
comparison_embedding = stroopwafel_embedding - (netherlands_embedding - germany_embedding)
```

```python
# initiating variables
highest_similarity = -1
closest_word = None
```

```python
# Compare every candidate with the comparison vector and keep the most similar one
for word in words:

    # embedding the query word
    word_embedding = np.array(embedding.embed_query(word))

    # generating output
    print(word)
    similarity = cosine_similarity(comparison_embedding, word_embedding, True)
    print("")

    # capturing highest similarity
    if similarity > highest_similarity:
        highest_similarity = similarity
        closest_word = word
```

```
Bratwurst
Cosine Similarity: 0.7104238314729429

Mercedes
Cosine Similarity: 0.5495463777308662

Schwarzwälder Kirschtorte
Cosine Similarity: 0.6412197961550299

Berliner
Cosine Similarity: 0.7296874018682962

Lebkuchen
Cosine Similarity: 0.6988265639666382
```

```python
# final evaluation
print(f"The word closest to 'stroopwafel' is '{closest_word}' with a cosine similarity of {highest_similarity}.")
```

```
The word closest to 'stroopwafel' is 'Berliner' with a cosine similarity of 0.7296874018682962.
```
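The same analogy arithmetic can be tried on hypothetical hand-made vectors whose axes are chosen to be interpretable (Dutch, German, sweet, car). Real embeddings from `mxbai-embed-large` are not this cleanly interpretable. The sketch also shows that `max()` with a `key` can replace the manual bookkeeping of the loop above.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical hand-made vectors with the axes (Dutch, German, sweet, car)
toy_netherlands = [1.0, 0.0, 0.0, 0.0]
toy_germany = [0.0, 1.0, 0.0, 0.0]
toy_stroopwafel = [0.9, 0.0, 1.0, 0.0]
toy_candidates = {
    "Bratwurst": [0.0, 0.9, 0.1, 0.0],
    "Mercedes": [0.0, 0.9, 0.0, 1.0],
    "Berliner": [0.0, 0.9, 1.0, 0.0],
}

# Stroopwafel - (Netherlands - Germany): remove the "Dutch" direction, add the "German" one
toy_comparison = [s - (n - g) for s, n, g in zip(toy_stroopwafel, toy_netherlands, toy_germany)]

# Pick the candidate with the highest cosine similarity to the comparison vector
toy_closest = max(toy_candidates, key=lambda word: cosine(toy_comparison, toy_candidates[word]))
print(toy_closest)  # Berliner
```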


### Store

![assets/imgs/simple_rag__store.png](assets/imgs/simple_rag__store.png)

[Vectorstores](https://python.langchain.com/docs/integrations/vectorstores/)

```python
# Embed all splits and store them in a Chroma collection named after the embedding model
vectorstore = Chroma.from_documents(
    collection_name=OLLAMA_EMBEDDING_MODEL,
    documents=splits,
    embedding=OllamaEmbeddings(model=OLLAMA_EMBEDDING_MODEL, base_url=OLLAMA_SERVER)
)
```

```python
# Delete the existing collection (if any) and rebuild it,
# so that re-running the notebook does not leave stale or duplicate entries
try:
    vectorstore.delete_collection()
except Exception:
    # e.g. `vectorstore` is not defined yet or the collection is already gone
    pass

vectorstore = Chroma.from_documents(
    collection_name=OLLAMA_EMBEDDING_MODEL,
    documents=splits,
    embedding=OllamaEmbeddings(model=OLLAMA_EMBEDDING_MODEL, base_url=OLLAMA_SERVER)
)
```

```python
# Inspect one stored entry
vectorstore.get(limit=1)
```

```
{'ids': ['6ba9d156-ab8e-4b7b-b81e-a4a02f0b897a'],
 'embeddings': None,
 'metadatas': [{'page': 3, 'source': './backend/tmc_tel_lab.pdf'}],
 'documents': ['Careers\nGraduate programs\nVIE Program\nCorporate vacancies\nAll vacancies\nService areas\nTechnology & Engineering\nDigital & IT\nEnergy & Renewables\nLife Sciences & Pharma\nDiscover TMC\nAbout us\nUpdates\nFAQ\nContact\n© 2024 TMC |Terms and conditions |Privacy statement |Cookie Statement |Settings'],
 'uris': None,
 'data': None,
 'included': ['metadatas', 'documents']}
```

### Similarity Search

![assets/imgs/simple_rag__similarity_search.png](assets/imgs/simple_rag__similarity_search.png)

```python
# Retrieve the 2 splits most similar to the question, together with a relevance score
returned_docs = vectorstore.similarity_search_with_relevance_scores(question, k=2)
```

```python
returned_docs
```

```
[(Document(page_content='alongside their client project.\n> TMC supports with a physical lab and possibly even with financial resources.\n> The ownership of the innovation remains of our employeneurs; TMC has no interest in\nit.\nBUILDING THE FUTURE IN\nThe Entrepreneurial Lab\nThis is the place where you can let your technical dreams come true. If you\nhave a groundbreaking idea for a new innovation, you can initiate your\nproject and take it from concept to reality with the support of a vibrant\ncommunity of like-minded innovators.\x00 \x00', metadata={'page': 0, 'source': './backend/tmc_tel_lab.pdf'}),
  -111.32334465887976),
 (Document(page_content="Room to play\nAt the Entrepreneurial Lab, we believe in the power of multidisciplinary collaboration and\nthe freedom to explore. Here, diverse teams bringing together various skills to tackle the\nmultifaceted challenges of today. It's where individual competencies unite to solve\ncomplex problems, driving innovative solutions through 
```

The relevance scores here are negative. LangChain derives them from Chroma's default L2 distance with a formula that assumes unit-length embeddings, so treat them only as a ranking: the first result is the closest match.
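Conceptually, a similarity search compares the query embedding with every stored embedding and returns the k closest ones. The sketch below does exactly that (brute force, cosine similarity) on a hypothetical three-entry store with made-up 3-D embeddings. Chroma itself uses an approximate nearest-neighbour index (HNSW) and, by default, L2 distance, which lets it scale to large collections.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vector, store, k=2):
    """Return the k (text, score) pairs from store with the highest cosine similarity to query_vector."""
    scored = ((text, cosine(query_vector, vector)) for text, vector in store)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical vector store: (split text, embedding) pairs with made-up 3-D embeddings
toy_store = [
    ("TMC supports with a physical lab", [0.9, 0.1, 0.0]),
    ("Room to play at the Entrepreneurial Lab", [0.8, 0.3, 0.1]),
    ("Careers, vacancies and cookie statement", [0.0, 0.2, 0.9]),
]
toy_query = [1.0, 0.2, 0.0]  # made-up embedding of the question

for text, score in top_k(toy_query, toy_store, k=2):
    print(f"{score:.3f}  {text}")
```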

### Combine Context

![assets/imgs/simple_rag__combine_context.png](assets/imgs/simple_rag__combine_context.png)

```python
# Prompt
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
prompt
```

```
ChatPromptTemplate(input_variables=['context', 'question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template='Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n'))])
```

```python
# LLM
llm = ChatOllama(model=OLLAMA_LARGE_LANGUAGE_MODEL, base_url=OLLAMA_SERVER)
```

```python
# Chain
chain = prompt | llm
```

### Response Generation

![assets/imgs/simple_rag__response_generation.png](assets/imgs/simple_rag__response_generation.png)

[RAG chains](https://python.langchain.com/docs/expression_language/get_started#rag-search-example)

```python
# Invoke the chain with the text of the most relevant split as context
# (each entry of returned_docs is a (Document, score) tuple)
response = chain.invoke({"context": returned_docs[0][0].page_content, "question": question})
response.content
```

```
' Based on the provided context, the TMC Entrepreneurial Lab is a facility or program that supports innovative projects initiated by its client project teams, which include employeneurs (self-employed professionals who work within an organization like TMC). The lab provides a physical space (a "vibrant community of like-minded innovators") where these teams can develop and build their ideas from conception to reality. It offers both the necessary infrastructure (a lab) and possibly financial support. Notably, TMC does not claim ownership over the innovations that are developed within its Entrepreneurial Lab; the intellectual property remains with the employeneurs who initiate these projects. The lab\'s purpose is to help build the future by fostering technological advancements and entrepreneurial spirit.'
```

In the `chain.invoke()` call above, we passed the result of a similarity search on the vector store directly into the prompt. LangChain offers a more convenient approach for this: retrievers.

#### Retriever

```python
# Create a retriever from the vectorstore that performs a similarity search and returns one document
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 1})
```

```python
# Note: here the retriever object itself is passed as `context`. The chain does not call it,
# so the prompt receives the retriever's string representation instead of a relevant split,
# which is why the answer below talks about `Chroma` and `OllamaEmbeddings`.
response = chain.invoke({"context": retriever, "question": question})
response.content
```

```
'Based on the context provided, which includes \'Chroma\' and \'OllamaEmbeddings\' within tags and a reference to a `Chroma` object from the `langchain_community.vectorstores` module, it seems like you are referring to a system or application that uses Chroma, which is a vector store for similarity retrieval, possibly in the context of a larger system or service named "TMC Entrepreneurial Lab."\n\nHowever, without additional context specific to "TMC Entrepreneurial Lab," it\'s not possible to provide a definitive answer about what it is. The term "TMC Entrepreneurial Lab" does not directly relate to the provided code or tags. It could be a separate entity, a project, or a program that utilizes similar technologies for search or data retrieval tasks, potentially leveraging vector embeddings (like those from \'OllamaEmbeddings\') stored in a Chroma vector store.\n\nIf "TMC Entrepreneurial Lab" is indeed a specific initiative, organization, or product that uses these technologies, you wou
```
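The `{context}` placeholder expects text. LangChain's RAG-from-scratch notebooks therefore run the retriever on the question and join the page contents of the returned documents, using a chain of the form `{"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | llm | StrOutputParser()` (`RunnablePassthrough` and `StrOutputParser` are already imported above). The stdlib-only sketch below imitates just the prompt-filling step. It uses hypothetical stand-ins `Doc` and `StubRetriever` for LangChain's `Document` and retriever, and `str.format` in place of `ChatPromptTemplate`, which uses the same `{...}` placeholder syntax by default.

```python
from dataclasses import dataclass, field

# Same template text as in the Combine Context step
demo_template = """Answer the question based only on the following context:
{context}

Question: {question}
"""


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class StubRetriever:
    """Hypothetical stand-in for a retriever: returns the first k stored docs for any query."""

    def __init__(self, docs, k=1):
        self.docs = docs
        self.k = k

    def invoke(self, query):
        return self.docs[: self.k]


def format_docs(docs):
    """Join the text of the retrieved documents into a single context string."""
    return "\n\n".join(doc.page_content for doc in docs)


demo_question = "What is the TMC Entrepreneurial Lab?"
stub_retriever = StubRetriever([Doc("TMC supports with a physical lab and possibly even with financial resources.")])

# Passing the object itself: str.format() inserts its default string representation, not document text
wrong_prompt = demo_template.format(context=stub_retriever, question=demo_question)

# Running the retrieval first and formatting the documents puts the split's text into the prompt
right_prompt = demo_template.format(
    context=format_docs(stub_retriever.invoke(demo_question)), question=demo_question
)
print(wrong_prompt)
print(right_prompt)
```

Because `format_docs` only reads `page_content`, it should also work on the `Document` objects returned by `retriever.invoke(question)` in the notebook.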

#### HandsOn --- Answer not in Splits

> What happens if the answer is not contained in any of the retrieved splits?

```python
# As in the cell above, the retriever object itself is passed as `context`
response = chain.invoke({"context": retriever, "question": "What is a large language model?"})
response.content
```

```
' A large language model (LLM) is an artificial intelligence system designed to understand, generate, and work with human language at a level that can often seem coherent and contextually relevant. These models are typically trained on vast amounts of text data and have the capability to perform a wide range of language-related tasks, such as translation, question answering, summarization, and more.\n\nThe term "large" refers to the model\'s size in terms of the number of parameters it contains—ranging from millions to hundreds of billions. Examples of large language models include OpenAI\'s GPT (Generative Pre-trained Transformer) series, Google\'s BERT (Bidirectional Encoder Representations from Transformers), and others like T5 (Text-to-Text Transfer Transformer).\n\nThe context you provided mentions \'Chroma\' and \'OllamaEmbeddings\' within a technical setting that suggests the use of these tools or libraries for handling vector data. Chroma is a vector store that allows for effic
```

## HandsOn

1. Rerun the whole pipeline with a question that relates to the [website](https://www.themembercompany.com/nl/employeneurship). You will first have to load the website with `WebBaseLoader`.
2. Ask the LLM something directly, without retrieval or added context, for example:

```python
# LLM
llm = ChatOllama(model=OLLAMA_LARGE_LANGUAGE_MODEL, base_url=OLLAMA_SERVER)
```

```python
answer = llm.invoke("Tell me a joke about the weather.")
```

```python
answer.content
```

```
'Sure, here\'s a light-hearted weather joke for you:\n\nWhy don\'t clouds ever get cold?\n\nBecause most of their sweat evaporates before it reaches their feet! (A play on words with "evaporation" and "sweat")\n\nRemember, jokes about the weather are often as changeable as the forecast itself!'
```

## Summary

Implementation of Simple Retrieval-Augmented Generation (RAG) from Scratch
- **Ingestion Phase**:
    - **Load**: Loading documents into the system.
    - **Split**: Splitting documents into manageable chunks.
    - **Embed**: Embedding document chunks into vector representations.
    - **Store**: Storing the embedded documents in a vector store.
- **Similarity Search**: Searching for relevant documents using embedded query.
- **Combine Context**: Combining the retrieved document context with the query.
- **Response Generation**: Generating the final response using a Large Language Model (LLM).

## Next Steps

There is plenty more to discover on [LangChain's GitHub](https://github.com/langchain-ai) and many other websites! In particular, check out [YouTube](https://www.youtube.com/watch?v=sVcwVQRHIc8&ab_channel=freeCodeCamp.org) and [GitHub](https://github.com/langchain-ai/rag-from-scratch/tree/main). Here is an overview:

![assets/imgs/langchain_rag_overview.png](assets/imgs/langchain_rag_overview.png)

## Thank You

- **Technical Support**: Milan and Raul
- **Organisational Support**: Marlies, Wendy, and Varsha
- **Motivational Support**: TMChampionship/ TEL/ Pepijn

## Feedback

![assets/imgs/ai_workshop_tmc__feedback.png](assets/imgs/ai_workshop_tmc__feedback.png)

https://forms.office.com/e/CwRvint3LY?origin=lprLink

## Appendix

- **Credits**: this notebook borrows heavily from LangChain's [rag_from_scratch_1_to_4.ipynb](https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_1_to_4.ipynb)

* **GPT**: Generative Pre-trained Transformer, a type of language model developed by OpenAI that generates human-like text using transformer architecture.
* **LLM**: Large Language Model, a machine learning model trained on vast amounts of text data to understand and generate human language.
* **Transformer**: Deep learning model using attention mechanism for context understanding and parallel processing, introduced in the "Attention is All You Need" paper.
* **Embedding Models**: Convert text to vector representations (e.g., BERT).
* **Generation Models**: Generate text from prompts (e.g., GPT-3).
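* **Softmax Function**: Converts a vector of scores into probabilities that sum to 1; used in classification models and, in LLMs, to turn the output scores (logits) into next-token probabilities.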
* **Fine-Tuning vs. Retrieval-Augmented Generation**
    * **Fine-Tuning an LLM**: Adapts model to specific tasks using labeled data.
    * **RAG (Retrieval-Augmented Generation)**: Combines retrieval with generation for context-specific responses.
* **Micro Timeline**
    * **2017**: "Attention is All You Need" paper.
    * **2018**: GPT (GPT-1), BERT.
    * **2019**: GPT-2.
    * **2020**: GPT-3.
* **Quantization**: Reduces precision of model parameters.
    * **Benefits**: Smaller size, faster inference, lower power consumption.
    * **Types**: Static, Dynamic, Quantization-Aware Training.
    * **Challenges**: Accuracy loss, hardware support needed.
* **Not all LLMs are GPTs**: Other models include BERT, T5, XLNet, RoBERTa.
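* **Not all LLMs use transformers**: Other architectures include RNNs, CNNs, and Memory-Augmented Networks. Mixture of Experts (MoE) is a technique rather than a separate architecture and is often used inside transformers.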
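The softmax listed above turns scores z into probabilities: softmax(z)_i = exp(z_i) / Σ_j exp(z_j). A minimal, numerically stable sketch follows. Subtracting the largest score first does not change the result but avoids overflow. The "logits" are made-up values for three candidate next tokens.

```python
import math


def softmax(scores):
    """Numerically stable softmax: exp(s - max) / sum(exp(s - max))."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores (logits) for three candidate next tokens
logits = [2.0, 1.0, 0.1]
probabilities = softmax(logits)
print([round(p, 3) for p in probabilities])  # [0.659, 0.242, 0.099]
```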