<a href="https://colab.research.google.com/github/leakydishes/AppTruckSharing/blob/main/legacy_files/embeddings_qdrant_MiniLM_L6_v2_384.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div class="chatbot-title"><b>AI Chatbot: </b> Alcohol and Drug Foundation (ADF)</div>

<div class="chatbot-authors"><b>Project Manager: </b>Dotahn</div>

<div class="chatbot-authors"><b>Authored by (interns): </b>Te' Claire and Khuzaima Jamil</div>

<div class="chatbot-dates"><b>Dates: </b>December 2023/ January, February 2024</div>

<div class="chatbot-github"><b>GitHub Repo: </b>
<a href="https://github.com/Dotahn/ADFAIChatbot-Internship/tree/main"> Github Link</a></div>

<br>

<div class="chatbot-section-header"><b>Overview: </b>The aim of this project is to use an API to seamlessly integrate natural language capabilities (LLMs) into a chat application customised to the Alcohol and Drug Foundation (ADF) website (https://adf.org.au/), while respecting the ADF Artificial Intelligence Ethical Framework.
</div>

<br>

<div class="chatbot-sub-section"><b>embeddings_qdrant.ipynb: </b></div>

<div class="chatbot-sub-section">
  <i>Python script #3.5</i>
  <br>
  <ol>
    <li>Embeddings and tokenizing: splits the text into chunks, embeds them and uploads the vectors to Qdrant</li>
    <li>Creates Knowledge Base for Generative Component</li>
  </ol>
</div>



---



##### Overview:

##Database:
######Qdrant [12], [13], [14]
*   Qdrant functions as a database and search engine for vectors, storing neural embeddings together with their metadata (payload).
*   It exposes an API to store, search, and manage vectors with an additional payload (metadata). This enables faster and more accurate retrieval of unstructured data.
*   It is an open-source alternative to Pinecone.
*   Qdrant stores payload data in document/JSON format.
*   Qdrant can be used with FlowiseAI [24].

##RAG:
1. **Embedding:** Embed your documents with an embedding model. Embedding a document means transforming its sentences or chunks of words into vectors of numbers. The idea is that sentences that are similar to each other should be close in terms of the distance between their vectors, and sentences that are different should be further apart.
2. **Vector Store:** Once you have these vectors, you can store them in a vector store like ChromaDB, FAISS, Pinecone, or Qdrant (used in this notebook). A vector store is like a database but, as the name says, it indexes and stores vector embeddings for fast retrieval and similarity search.
3. **Query:** Now that your documents are embedded and stored, when you ask a question, the query is embedded with the same embedding model and the vector store returns the sentences closest to it, for example in terms of cosine similarity.
4. **Answering Your Question:** Once the closest sentences have been found, they are injected into the prompt so the LLM can answer using them as context.
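
The query step can be sketched in plain Python. This is a minimal illustration, assuming tiny hand-written 3-dimensional vectors in place of real embeddings; `cosine_similarity`, `top_k` and the sample `store` are hypothetical names made up for this example and are not part of any library used in this notebook.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy "vector store": (text, embedding) pairs with made-up values
store = [
    ("alcohol and sleep", [0.9, 0.1, 0.0]),
    ("caffeine facts", [0.1, 0.9, 0.1]),
    ("alcohol guidelines", [0.8, 0.2, 0.1]),
]

query = [1.0, 0.0, 0.0]  # stand-in for an embedded question
closest = top_k(query, store, k=2)
print(closest)
```

A real vector store replaces the linear scan in `top_k` with an index, but the ranking idea is the same.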

##### Embedding Models:
1. **GPT4 (OpenAI)**
  - Model: text-embedding-ada-002
  - Output file name: embeddings_text_embedding_ada_002_1536
  - Vector Dimensions (1536)

2. **Mixtral vanilla**
  - Model: mixtral-7b-8expert [27], [28], [29]
  - Output file name: embeddings_mixtral_7b_8expert_1024
  - Vector Dimensions (1024)
  - Text Generation: the model is too large to load onto the free Inference API. To try the model, launch it on Inference Endpoints instead.
  - Mixtral 8x7B is an open-weights model reported to outperform well-known models such as GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and Llama 2 70B on human benchmarks [24].
  - What is MoE? A Mixture of Experts (MoE) model routes each token to a small subset of expert sub-networks instead of running every parameter. Mixtral 8x7B employs a router to assign 2 of its 8 experts to each token, giving access to about 47 billion parameters in total while actively using only about 13 billion during inference. Mixtral 8x7B outpaces Llama 2 70B on most benchmarks with six times faster inference.
  - Mixtral 8x7B [25] is an LLM that is more complex than Mistral 7B [26].
Mixtral AX7 (additional)
https://miro.medium.com/v2/resize:fit:640/format:webp/1*Vr0GjhpAlAZ8oLZjqImBNQ.png

3. **Claude 2.1**
  - Model: all-mpnet-base-v2 (a sentence-transformers model) https://huggingface.co/sentence-transformers/all-mpnet-base-v2
  - Output file name: embeddings_mpnet_base_v2_384
  - Vector Dimensions: all-mpnet-base-v2 outputs 768-dimensional vectors. The 384 in the file name matches all-MiniLM-L6-v2, the model that is loaded in the embedding cell of this notebook.

MPNet Base V2 with Sentence Embeddings, model all-mpnet-base-v2 [25], [27]
*   Performance Sentence Embeddings (14 Datasets): 69.57
*   Performance Semantic Search (6 Datasets): 57.02
*   Avg. Performance: 63.30
*   Speed (sentences/sec): 2800
*   Model Size: 420 MB
<br>

### Embeddings
An analysis of Google's C4 dataset by The Washington Post indicates that the quality and the quantity of the data are equally important to LLM training [3], [4]. When fine-tuning models, the industry typically uses high-quality datasets to protect users from some unwanted content.

*   The size of the embedding depends on the model that we choose.
*   Larger, more expensive models generally produce embeddings with more dimensions, which can give more accurate results, e.g. Ada (1024), Babbage (2048), Curie (4096), Davinci (12288) [5].
<br>

- Vector embeddings are model-specific.
- Embeddings produced by different models are not interchangeable.
- Techniques: whether built as word embeddings, sentence embeddings, or contextual embeddings, vector embeddings provide a compact and meaningful representation of textual data.


#### Additional Research
2. BGE Embedding [21], [22]
- bge-large-en-v1.5 https://huggingface.co/BAAI/bge-large-en-v1.5
- BGE embedding is a general embedding model.
- Unlike embedding models that use mean pooling, BGE uses the last hidden state of [CLS] as the sentence embedding: sentence_embeddings = model_output[0][:, 0]. Using mean pooling instead causes a significant decrease in performance.
- In practice, you first pass your input through the transformer model, then select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding.
- You can also use the bge models with sentence-transformers.
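
The difference between the two pooling strategies can be shown on a toy example. This is a sketch only: it uses nested Python lists in place of the tensor returned by a transformer, and `cls_pooling`, `mean_pooling` and the sample values are hypothetical names and numbers chosen for illustration.

```python
def cls_pooling(last_hidden_state):
    """Take the hidden state of the first token of every sequence.

    Plain-list equivalent of model_output[0][:, 0].
    """
    return [sequence[0] for sequence in last_hidden_state]

def mean_pooling(last_hidden_state, attention_mask):
    """Average the hidden states of the non-padding tokens of every sequence."""
    pooled = []
    for sequence, mask in zip(last_hidden_state, attention_mask):
        kept = [token for token, m in zip(sequence, mask) if m == 1]
        dim = len(sequence[0])
        pooled.append([sum(token[d] for token in kept) / len(kept) for d in range(dim)])
    return pooled

# One sentence, three token positions (the last one is padding), hidden size 2
last_hidden_state = [[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]]
attention_mask = [[1, 1, 0]]

cls_vec = cls_pooling(last_hidden_state)
mean_vec = mean_pooling(last_hidden_state, attention_mask)
print(cls_vec)
print(mean_vec)
```

The two functions give different vectors for the same hidden states, which is why a model trained with one pooling strategy should be used with that same strategy.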


##### Notes:
- Data is represented as vectors in a high-dimensional space. Each point has an 'id', a vector and a 'payload', and points are stored in a 'Collection' inside a vector database such as Qdrant.
- Each element of the vector corresponds to a specific (learned) feature or attribute of the object.
- Vectors turn data into numbers that can be compared with distance metrics for similarity search, e.g. Euclidean distance or cosine similarity. A vector database that performs similarity search natively is efficient at returning the points with the best distance scores.
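
The two distance metrics are closely related. For unit-length vectors, squared Euclidean distance equals 2 minus twice the cosine similarity, so both metrics rank neighbours in the same order. The sketch below checks this on made-up 2-dimensional vectors; `normalize`, `euclidean`, `cosine` and the sample `points` are hypothetical and only serve the illustration. Whether a given embedding model returns unit-length vectors is an assumption that has to be checked per model.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine similarity; a plain dot product because inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))

query = normalize([1.0, 0.2])
points = {
    1: normalize([0.9, 0.3]),
    2: normalize([0.1, 1.0]),
    3: normalize([1.0, -0.5]),
}

# Smaller distance is better for Euclidean, larger score is better for cosine
by_euclid = sorted(points, key=lambda pid: euclidean(query, points[pid]))
by_cosine = sorted(points, key=lambda pid: cosine(query, points[pid]), reverse=True)
print(by_euclid, by_cosine)

# Identity for unit vectors: d^2 = 2 - 2 * cos
gaps = [
    abs(euclidean(query, vec) ** 2 - (2 - 2 * cosine(query, vec)))
    for vec in points.values()
]
print(max(gaps) < 1e-9)
```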

##### Qdrant Cloud
*   API https://cloud.qdrant.io/
*   Create a new 'cluster' using the basic free tier version.
*   Cluster: 'adf_chatbot_embeddings'
*   Note: a cluster can have several Collections, each collection can contain one or more points (vectors).
*   Authorised with GitHub; the API key is added to the JSON secrets file (qdrantKey).
<br>

##### Pre-requisites
1. Qdrant server instance (local Docker container)
2. The qdrant-client library to interact with the vector database.
3. A model API key or the Hugging Face library
<br>

##### RAM Usage Rate Google Colab
- Usage rate: approximately 5.53 compute units per hour, V100 (High-RAM)
- Usage rate: approximately 1.96 compute units per hour, T4 GPU (Low-RAM)
- Usage rate: approximately 2.05 compute units per hour, T4 GPU (High-RAM)

---



References
<br>
[1] https://blog.apify.com/what-is-data-labeling-in-ai/
<br>
[2] https://blog.apify.com/what-is-retrieval-augmented-generation/
<br>
[3] https://www.washingtonpost.com/technology/interactive/2023/ai-chatbot-learning/
<br>
[4] https://www.semanticscholar.org/paper/Documenting-the-English-Colossal-Clean-Crawled-Dodge-Sap/40c3327a6ddb0603b6892344509c7f428ab43d81?itid=lk_inline_enhanced-template
<br>
[5] https://huggingface.co/spaces/mteb/leaderboard
<br>
[6] https://neon.tech/blog/mistral-7b-and-baai-on-workers-ai-vs-openai-models-for-rag
<br>
[7] https://neon.tech/blog/mistral-7b-and-baai-on-workers-ai-vs-openai-models-for-rag
<br>
[8] https://arxiv.org/pdf/2310.06825.pdf
<br>
[9] https://docs.mistral.ai/
<br>
[10] https://blog.stackademic.com/building-a-multidocument-chatbot-using-mistral-7b-qdrant-and-langchain-1d9982186736
<br>
[11] https://github.com/openai/openai-cookbook/blob/main/examples/vector_databases/qdrant/Getting_started_with_Qdrant_and_OpenAI.ipynb
<br>
[12] https://qdrant.tech/documentation/overview/
<br>
[13] https://sidgraph.medium.com/building-a-youtube-chatbot-using-langchain-qdrant-and-mistral-7b-in-depth-guide-d46a7ad2af61
<br>
[14] https://medium.com/@karanshingde/power-your-rag-application-using-qdrantdb-mistral-8x7b-moe-langchain-and-streamlit-15cd90ad4d49
<br>
[15] https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1
<br>
[16] https://mistral.ai/news/announcing-mistral-7b/
<br>
[17] https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac
<br>
[18] https://docs.flowiseai.com/integrations/vector-stores/qdrant
<br>
[19] https://huggingface.co/sentence-transformers/all-mpnet-base-v2
<br>
[20] https://colab.research.google.com/github/qdrant/examples/blob/master/qdrant_101_getting_started/getting_started.ipynb
<br>
[21] https://github.com/FlagOpen/FlagEmbedding
<br>
[22] https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/baai_general_embedding
<br>
[23] https://medium.com/@ryanntk/choosing-the-right-embedding-model-a-guide-for-llm-applications-7a60180d28e3
<br>
[24] https://ai.plainenglish.io/mixture-of-experts-comprehensive-exploration-of-mixtral-8x7b-973184c1de27
<br>
[25] https://arxiv.org/pdf/2401.04088.pdf
<br>
[26] https://arxiv.org/pdf/2310.06825.pdf
<br>
[27] https://towardsdatascience.com/mistral-ai-vs-meta-comparing-top-open-source-llms-565c1bc1516e
<br>
[28] https://huggingface.co/DiscoResearch/mixtral-7b-8expert
<br>
[29] https://huggingface.co/docs/transformers/model_doc/mixtral

## Set Up
#### Install Dependencies

In [None]:
import warnings
warnings.filterwarnings('ignore')

!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q -U einops
!pip install -q -U safetensors
!pip install -q torch==2.1.0
!pip install -q -U xformers
!pip install -q -U langchain
!pip install -q -U ctransformers[cuda]
!pip install sentence-transformers

In [None]:
# Vector database
!pip install qdrant_client # Qdrant

Collecting qdrant_client
  Downloading qdrant_client-1.7.1-py3-none-any.whl (205 kB)

In [None]:
# Check GPU
!nvidia-smi -L

GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-2b602f55-cf7b-0a18-a5e5-39a308c246f8)


#### Import Libraries

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import os, warnings, json, uuid, requests, torch
from sentence_transformers import SentenceTransformer
from huggingface_hub import notebook_login
from google.colab import drive
import datetime as dt

In [None]:
# Import RAG modules for RAG set up
from langchain.llms import HuggingFacePipeline
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain import PromptTemplate, LLMChain
from langchain.chains import RetrievalQA


# Vector database dependencies
from qdrant_client import QdrantClient # Qdrant

#### Drive

In [None]:
# Mount Google Drive in Colab
from google.colab import drive
drive.mount('/content/drive',force_remount=True)
%cd /content/drive/MyDrive/ADFAIChatbot
!ls

Mounted at /content/drive
/content/drive/.shortcut-targets-by-id/1OP0r8cO6DFjxo5iF0yrlyXaRPCme93DI/ADFAIChatbot
Docker	models	model_train  output_stats  python_scripts  secrets


#### Load API Keys


In [None]:
file_path = "/content/drive/MyDrive/ADFAIChatbot/secrets/secrets.json"
with open(file_path, "r") as file: # Read JSON
      keys = json.load(file)
      huggingfaceKey = keys["huggingfaceKey"] # Hugging Face
      supabase_token = keys["supabaseKey"] # Supabase
      supabase_url = keys["supabaseUrl"] # Supabase
      supabase_db = keys["supabaseDBPooler"] # Supabase
      qdrant_key_token = keys["qdrantKey"] # Qdrant
      qdrant_domain = keys["qdrantUrl"] # Qdrant

### Supabase Set Up
- API request
- GET Request (retrieve the data from table)
- Convert Data to Dataframe

In [None]:
# Supabase: reuse the URL and key loaded from the secrets file
supabase_api_key = supabase_token
table_name = "cleandata"

# Set up Headers
headers = {
    "Content-Type": "application/json",
    "apikey": supabase_api_key,
    "Authorization": f"Bearer {supabase_api_key}"
}

api_endpoint = f"{supabase_url}/rest/v1/{table_name}" # API endpoint
response = requests.get(api_endpoint, headers=headers) # Fetch data

# Request Check
if response.status_code == 200:
    data = response.json()
    # Convert JSON data to dataFrame
    cleaned_data = pd.DataFrame(data)
    print("Data loaded into DataFrame successfully.")
else:
    print(f"Error: {response.status_code} - {response.text}")

Data loaded into DataFrame successfully.


### Hugging Face

In [None]:
# Hugging Face
# Authentication
from huggingface_hub import notebook_login
!huggingface-cli login

# Print user
!huggingface-cli whoami


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful
LeakyDishes
orgs:  Alcohol-and-Drug-Foundation


---

### Test Using ['text'] column from database for embeddings

In [None]:
# Use ['text'] column for embeddings
texts = cleaned_data['text'].tolist()  # Extract text from column
cleaned_data.head()

|   | id | created_at | markdown | text | url | last_updated | title | description | url_reduced |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2971 | 2024-01-24T05:15:47.010268+00:00 | # Amyl nitrite\n\nLast published: November 23,... | amyl nitrite last published november 23, 2023 ... | https://adf.org.au/drug-facts/amyl-nitrite/ | 2023-11-23T00:00:00 | amyl nitrite | amyl nitrites effects, a depressant known for ... | adf.org.au/drug-facts |
| 1 | 2972 | 2024-01-24T05:15:47.010268+00:00 | # Anabolic steroids\n\nLast published: Novembe... | anabolic steroids last published november 22, ... | https://adf.org.au/drug-facts/steroids/ | 2023-11-22T00:00:00 | steroids | anabolic steroids, their medical uses, and non... | adf.org.au/drug-facts |
| 2 | 2973 | 2024-01-24T05:15:47.010268+00:00 | # Aspirin\n\nLast published: December 21, 2023... | aspirin last published december 21, 2023 what ... | https://adf.org.au/drug-facts/aspirin/ | 2023-12-21T00:00:00 | asprin | aspirin acetylsalicylic acid is a pharmaceutic... | adf.org.au/drug-facts |
| 3 | 2974 | 2024-01-24T05:15:47.010268+00:00 | # Benzodiazepines\n\nLast published: November ... | benzodiazepines last published november 22, 20... | https://adf.org.au/drug-facts/benzodiazepines/ | 2023-11-22T00:00:00 | benzodiazepines | understand benzodiazepines, their effects, ris... | adf.org.au/drug-facts |
| 4 | 2975 | 2024-01-24T05:15:47.010268+00:00 | # Ayahuasca\n\nLast published: December 07, 20... | ayahuasca last published december 07, 2023 wha... | https://adf.org.au/drug-facts/ayahuasca/ | 2023-12-07T00:00:00 | ayahuasca | ayahuascas psychedelic effects, its traditiona... | adf.org.au/drug-facts |


In [None]:
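# Document is not among the earlier imports, so import it here
from langchain.schema import Document

# Alternative: text only, with a fixed source label
# documents = [Document(page_content=text, metadata={"source": "local"}) for text in texts]

# Use ['text'] as the page content and additional columns as metadata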
documents = [Document(
    page_content=record['text'],
    metadata={
        "url": record['url'],
        "title": record['title'],
        "description": record['description'],
        "last_updated": record['last_updated']
    }
) for index, record in cleaned_data.iterrows()]

####Set up RAG Components

#### Embeddings and Vectors
1.   Text Splitting, Embeddings (using documents module)
2.   Vector Store (using embeddings)

##### Change chunk_size and chunk_overlap values depending on embeddings output preferred

In [None]:
# Text is split into smaller chunks
# Chunk_size, chunk_overlap parameters define how text is split up
def get_chunks(documents):
    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len
    )
    chunks = text_splitter.split_documents(documents)
    return chunks

text_chunks = get_chunks(documents)
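
To see what chunk_size and chunk_overlap control, the sketch below splits a string with a fixed sliding window. It is a simplified illustration, not the algorithm used by CharacterTextSplitter, which first splits on the separator and then merges the pieces up to chunk_size, so its chunk boundaries will differ. `sliding_chunks` is a hypothetical helper written for this example.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "abcdefghij"
chunks = sliding_chunks(sample, chunk_size=4, chunk_overlap=2)
print(chunks)
```

A larger overlap repeats more text across neighbouring chunks, which keeps sentences that straddle a boundary retrievable at the cost of storing more vectors.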

In [None]:
# Inspect the first Document object
print(vars(text_chunks[0]))

{'page_content': 'amyl nitrite last published november 23, 2023 what is amyl nitrite? amyl nitrite is a depressant which means it slows down the messages travelling between the brain and body. classified as an inhalant, it belongs to a class of drugs known as alkyl nitrites, which also includes butyl nitrite, isobutyl nitrite and isopropyl nitrite. amyl nitrite is a vasodilator. vasodilators are medicines that cause the blood vessels in the body to dilate and the involuntary smooth muscles to relax, lowering blood pressure. what does it look like? amyl nitrite is an extremely flammable and highly volatile oil that is clear or yellowish in colour and comes in a small glass bottle. it typically has a distinct smell similar to dirty socks. other names poppers, jungle juice, liquid gold, rush, purple haze and buzz. how is it used? amyl nitrite has been used medically in the past for the treatment of angina chest pain, and has been used for the treatment of cyanide poisoning. recreationally

## Embedding Models
1. **Claude 2.1**
  - Model: all-MiniLM-L6-v2 (loaded in the cell below); all-mpnet-base-v2 is kept there as a commented-out alternative
  - Outputfile name: embeddings_mpnet_base_v2_384
  - Vector Dimensions (384), matching all-MiniLM-L6-v2


In [None]:
from sentence_transformers import SentenceTransformer

# Alternative model kept for reference (768-dimensional vectors):
# model_name = "sentence-transformers/all-mpnet-base-v2"
# Reference https://huggingface.co/sentence-transformers/all-mpnet-base-v2

# Active model: all-MiniLM-L6-v2 (384-dimensional vectors)
model_name = "sentence-transformers/all-MiniLM-L6-v2"
embedding_model = SentenceTransformer(model_name)

# text_chunks is a list of Document objects, each with 'page_content' and 'metadata'.
# Pair every chunk embedding with the metadata of the chunk it came from.
chunk_embeddings = [(embedding_model.encode(chunk.page_content), chunk.metadata) for chunk in text_chunks]
print(chunk_embeddings[0])

(array([-7.55856931e-02, -4.98172343e-02,  4.40601818e-03,  7.41399406e-03,
        8.92868638e-03, -8.50624442e-02,  9.43308547e-02, -1.19197657e-02,
        1.30912796e-01, -7.72167593e-02, -7.58649781e-02,  2.29869448e-02,
       -8.12717080e-02,  8.66935030e-02, -8.45631361e-02,  5.80974072e-02,
        2.80641112e-02, -4.32347804e-02,  1.84224620e-02,  9.98330042e-02,
        3.31367701e-02,  6.14176132e-02, -2.84463298e-02,  1.28190508e-02,
       -4.67867702e-02,  8.00715294e-03, -2.51103137e-02, -1.94146372e-02,
        5.87008707e-03, -3.46222743e-02,  8.65346715e-02, -1.20327994e-02,
       -8.57718587e-02,  1.78273432e-02, -2.91611031e-02,  2.27504503e-02,
       -8.88954625e-02,  3.31952460e-02, -4.91235964e-02,  3.13298590e-02,
        3.85471396e-02, -5.50916381e-02, -7.59686604e-02,  2.50346400e-03,
        3.44036520e-03,  1.46877477e-02, -5.49393408e-02, -5.94010623e-03,
        1.21789286e-02, -5.25474995e-02, -2.71602478e-02, -8.16488788e-02,
        1.55356934e-03, 

In [None]:
# chunk_embeddings: list of tuples where the first element is the embedding
print(f"Type of first embedding: {type(chunk_embeddings[0][0])}")
print(f"Sample embedding: {chunk_embeddings[0][0]}")


Type of first embedding: <class 'numpy.ndarray'>
Sample embedding: [-7.55856931e-02 -4.98172343e-02  4.40601818e-03  7.41399406e-03
  8.92868638e-03 -8.50624442e-02  9.43308547e-02 -1.19197657e-02
  1.30912796e-01 -7.72167593e-02 -7.58649781e-02  2.29869448e-02
 -8.12717080e-02  8.66935030e-02 -8.45631361e-02  5.80974072e-02
  2.80641112e-02 -4.32347804e-02  1.84224620e-02  9.98330042e-02
  3.31367701e-02  6.14176132e-02 -2.84463298e-02  1.28190508e-02
 -4.67867702e-02  8.00715294e-03 -2.51103137e-02 -1.94146372e-02
  5.87008707e-03 -3.46222743e-02  8.65346715e-02 -1.20327994e-02
 -8.57718587e-02  1.78273432e-02 -2.91611031e-02  2.27504503e-02
 -8.88954625e-02  3.31952460e-02 -4.91235964e-02  3.13298590e-02
  3.85471396e-02 -5.50916381e-02 -7.59686604e-02  2.50346400e-03
  3.44036520e-03  1.46877477e-02 -5.49393408e-02 -5.94010623e-03
  1.21789286e-02 -5.25474995e-02 -2.71602478e-02 -8.16488788e-02
  1.55356934e-03  1.00651328e-02  4.41658758e-02 -7.46629983e-02
 -4.01593745e-02 -5.830

In [None]:
from qdrant_client import QdrantClient
from qdrant_client.http.models import VectorParams, Distance
import os

qdrant_client = QdrantClient(qdrant_domain, api_key=qdrant_key_token)

# Define the collection name
collection_name = "bot_all_MiniLM_L6_v2"
os.environ["qdrant_embeddings"] = collection_name

# Create Collection
qdrant_client.create_collection(
    collection_name=collection_name,
    # vectors_config=vectors_config
    vectors_config=VectorParams(size=384, distance=Distance.EUCLID),
)

True

### Data Insertion

In [None]:
def insert_into_qdrant(client, collection_name, embedded_chunks, batch_size=100):
    for i in range(0, len(embedded_chunks), batch_size):
        batch = embedded_chunks[i:i+batch_size]
        points = [{
            "id": i + j,
            "vector": embedding.tolist() if isinstance(embedding, np.ndarray) else embedding,
            "payload": {k: (str(v) if not pd.isna(v) else "") for k, v in metadata.items()}  # Handle NaN values and ensure all values are strings
        } for j, (embedding, metadata) in enumerate(batch)]

        # Debugging print
        print(f"Inserting batch {i // batch_size} with starting index {i}")
        # Uncomment the next line to see the format of the first point in the batch
        # print(points[0])

        try:
            client.upsert(collection_name=collection_name, points=points)
        except Exception as e:
            print(f"Error in batch starting at index {i}: {e}")
            break

insert_into_qdrant(client=qdrant_client, collection_name=collection_name, embedded_chunks=chunk_embeddings)


Inserting batch 0 with starting index 0
Inserting batch 1 with starting index 100
Inserting batch 2 with starting index 200
Inserting batch 3 with starting index 300
Inserting batch 4 with starting index 400
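
The batching and payload clean-up performed by insert_into_qdrant can be followed without a Qdrant connection. The sketch below is an assumption-labelled rewrite for illustration: `clean_payload`, `build_batches` and `fake_chunks` are hypothetical names, `math.isnan` stands in for `pd.isna`, and no request is sent anywhere.

```python
import math

def clean_payload(metadata):
    """Replace missing values with "" and convert everything else to str."""
    cleaned = {}
    for key, value in metadata.items():
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        cleaned[key] = "" if missing else str(value)
    return cleaned

def build_batches(embedded_chunks, batch_size):
    """Group (vector, metadata) pairs into batches of point dictionaries.

    The id is the position of the chunk in the full list, so ids stay
    unique across batches.
    """
    batches = []
    for i in range(0, len(embedded_chunks), batch_size):
        batch = embedded_chunks[i:i + batch_size]
        batches.append([
            {"id": i + j, "vector": list(vector), "payload": clean_payload(metadata)}
            for j, (vector, metadata) in enumerate(batch)
        ])
    return batches

# Five made-up chunks with 2-dimensional vectors and one missing field each
fake_chunks = [
    ([0.1 * n, 0.2 * n], {"title": f"page {n}", "description": float("nan")})
    for n in range(5)
]

batches = build_batches(fake_chunks, batch_size=2)
ids = [point["id"] for batch in batches for point in batch]
print(ids)
print(batches[0][0]["payload"])
```

Because the ids are positional, re-running the upload with the same chunk order overwrites the existing points instead of duplicating them.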


### Qdrant Check data
- Count points in collection
- Retrieve sample point
- Run query/ retrieve and inspect the vector data

In [None]:
# Count points in collection
def count_points_in_collection(client, collection_name):
    response = client.count(collection_name=collection_name)
    if hasattr(response, 'count'):
        return response.count  # If the count is directly accessible
    elif hasattr(response, 'result') and hasattr(response.result, 'count'):
        return response.result.count  # If the count is inside a 'result' attribute
    else:
        raise AttributeError("Unable to find the count attribute in the response")

# Now, try to get the count again
try:
    point_count = count_points_in_collection(qdrant_client, collection_name)
    print(f"Number of points in the collection: {point_count}")
except AttributeError as e:
    print(f"Error: {e}")


Number of points in the collection: 495


In [None]:
# Retrieve sample points
def retrieve_sample_point(client, collection_name, point_id):
    try:
        # Call the retrieve method
        response = client.retrieve(collection_name=collection_name, ids=[point_id])

        # Directly check and return the response
        if response:
            return response
        else:
            return "No response received from Qdrant."
    except Exception as e:
        print(f"Error retrieving point: {e}")
        return "An error occurred while retrieving the point."

# Example usage
sample_point_id = 100  # Replace with a valid ID from your collection
sample_point = retrieve_sample_point(qdrant_client, collection_name, sample_point_id)
print(f"Sample point retrieved: {sample_point}")


Sample point retrieved: [Record(id=100, payload={'description': 'celebrate finishing school safely at schoolies. stay with friends, pace drinking, watch belongings, know limits, and call 000 in emergencies.', 'last_updated': '2023-11-06T00:00:00', 'title': 'staying safe on schoolies', 'url': 'https://adf.org.au/insights/staying-safe-on-schoolies/'}, vector=None, shard_key=None)]


---

#### Retrieval Augmented Question Answering (QA) Setup:
##### Supabase: Custom function that interacts with Supabase to fetch vectors and perform retrieval operations.
*   RetrievalQA object: Combines a language model (llm) with the retriever to answer queries.
*   chain_type: Specifies how the language model and retriever interact.

Note: The exact nature of this interaction (how the retriever's results influence the language model's responses) depends on the chain_type.

### Model (LLM)
*   mistralai/Mistral-7B-Instruct-v0.1

In [None]:
# # Set Model
# # mistralai/Mistral-7B-Instruct-v0.1
# import torch
# from transformers import BitsAndBytesConfig
# quantization_config = BitsAndBytesConfig(
#     load_in_4bit=True,
#     bnb_4bit_compute_dtype=torch.float16,
#     bnb_4bit_quant_type="nf4",
#     bnb_4bit_use_double_quant=True,
# )

### Test LLM Model
*   Import components
*   Create pipeline (text-generation) for model


In [None]:
# # LLM model
# model_id = "mistralai/Mistral-7B-Instruct-v0.1"

# from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
# model_4bit = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", quantization_config=quantization_config)
# tokenizer = AutoTokenizer.from_pretrained(model_id)
# # Use a distinct name so the imported pipeline() function is not overwritten
# text_generation_pipeline = pipeline(
#         "text-generation",
#         model=model_4bit,
#         tokenizer=tokenizer,
#         use_cache=True,
#         device_map="auto",
#         max_length=500,
#         do_sample=True,
#         top_k=5,
#         num_return_sequences=1,
#         eos_token_id=tokenizer.eos_token_id,
#         pad_token_id=tokenizer.eos_token_id,
# )

# # Pipeline for model
# llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

In [None]:
# print(llm)  # llm is only defined once the commented-out LLM cells above are enabled

### Templates for Queries
*   Template for responses (context and question)
*   Specific question and context for response generation stored in variable

Note: This template reflects the ADF's focus on providing evidence-based information about minimising alcohol and drug harm. The chatbot is designed to assist users in navigating the website, offering information on the ADF's programs and educational content, and aligning with the Australian National Drug Strategy.

In [None]:
# template = """<s>[INST] You are a helpful, respectful, and well-informed assistant, specifically designed to guide users on the Alcohol and Drug Foundation website. Your responses should be concise, accurate, and relevant to the ADF's mission of inspiring positive change and delivering evidence-based approaches to minimize alcohol and drug harm. Answer the question below based on the context provided, ensuring the information aligns with ADF's principles and content available on the website.
# {context}
# {question} [/INST] </s>
# """
# # Test template
# question_p = """What can you do?"""
# context_p = """As an AI assistant tailored for ADF, I focus on providing information and support related to harm minimisation and advocacy for people who use drugs. For YouGov poll data or insights, I would recommend reaching out directly to YouGov or accessing their official website where they publish their findings. If you have any questions related to the services ADF provides or if there's information I can offer you about harm minimisation, feel free to ask."""

# prompt = PromptTemplate(template=template, input_variables=["question","context"])
# llm_chain = LLMChain(prompt=prompt, llm=llm)
# response = llm_chain.invoke({"question":question_p,"context":context_p})
# response

In [None]:
# # Setup QA Chain
# qa_chain = RetrievalQA(
#     llm=llm,  # model
#     retriever=retriever,  # retriever
#     verbose=True
# )

# # Qdrant
# from langchain.chains import RetrievalQA
# qa = RetrievalQA.from_chain_type(
#     llm=llm,
#     chain_type="stuff",
#     retriever=retriever,
#     verbose=True
# )

### Queries
#### RetrievalQA chain
1. Generates an embedding for the query
2. Retrieves the closest stored data based on the query embedding
3. Feeds the retrieved context and the query into the LLM (model)
4. Produces the response
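
With chain_type "stuff", the retrieved chunks are placed together into a single prompt before the model is called. The sketch below shows only that prompt-assembly step in plain Python. The shortened `TEMPLATE`, the `stuff_prompt` helper and the two sample chunks are assumptions made for illustration; no retriever or LLM is involved.

```python
# Shortened, hypothetical template in the same [INST] style as the one in this notebook
TEMPLATE = (
    "<s>[INST] Answer the question below based on the context provided.\n"
    "{context}\n"
    "{question} [/INST]"
)

def stuff_prompt(question, retrieved_chunks):
    """Join all retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(retrieved_chunks)
    return TEMPLATE.format(context=context, question=question)

# Stand-ins for the chunks a retriever would return for this question
retrieved = [
    "amyl nitrite is a depressant.",
    "amyl nitrite is a vasodilator.",
]

prompt = stuff_prompt("What is amyl nitrite?", retrieved)
print(prompt)
```

Since every retrieved chunk ends up in the same prompt, the number and size of chunks have to fit within the context window of the model.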


In [None]:
# def run_my_rag(qa, query):
#     print(f"Query: {query}\n")
#     result = qa.invoke(query)
#     print("\nResult: ", result)

In [None]:
# query =""" What is this company? """
# run_my_rag(qa, query)