## Chat with `Ismail Ouahbi` using [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) based on `textual data` from [his portfolio](https://ismailouahbi.github.io/)
* Applying `4-bit loading` (quantized weights) to reduce GPU memory usage.
* Using the [LangChain](https://python.langchain.com/docs/get_started/introduction) framework and [FAISS](https://ai.meta.com/tools/faiss/) (Facebook AI Similarity Search) to quickly find the document embeddings most similar to a query.
* By: [Ismail](https://www.linkedin.com/in/ismail-ouahbi/) (30/08/2023)

#### Introduction to LLMs

`Large language models (LLMs)` are natural language processing computer programs that use artificial neural networks to generate text.
Some notable ones are `GPT-3`, `GPT-4`, `LaMDA`, `BLOOM`, and `LLaMA`.
LLMs power many applications, such as `AI chatbots` and `AI search engines`.


* Wikipedia

#### Why LLaMA2?

<figure>
<center>
<img src='https://raw.githubusercontent.com/ismailouahbi/Supercharging_LLAMA2-7b-chat-hf/main/assets/llama2.png' />
<figcaption>Why Llama2</figcaption></center>
</figure>

* [credit](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf#llama-2)



---



#### Introduction to LangChain and why use it?

<figure>
<center>
<img src='https://raw.githubusercontent.com/ismailouahbi/Supercharging_LLAMA2-7b-chat-hf/main/assets/langchain.jpg' />
<figcaption>Introduction to LangChain</figcaption></center>
</figure>

* [credit](https://python.langchain.com/docs/get_started/introduction.html)

#### Chatbot as an LLM use-case

<figure>
<center>
<img src='https://raw.githubusercontent.com/ismailouahbi/Supercharging_LLAMA2-7b-chat-hf/main/assets/chatbot_lllms.jpg' />
<figcaption>Chatbots using LLMs</figcaption></center>
</figure>

* [credit](https://python.langchain.com/docs/use_cases/chatbots)



---



**Note: Before starting let us clarify one thing,**

> ***What is the difference between fine-tuning Llama 2 and making it able to chat with your own documents and data?***

> ***Answer:***

`Fine-tuning` adapts the model itself so that it better understands and generates content for a specific domain, while `making it chat with your own documents and data` integrates your data into the model's workflow at query time (here, by retrieving relevant passages and adding them to the prompt) to provide more accurate and personalized responses.
The two approaches can be combined to create a highly customized and efficient chatbot based on LLaMA 2.

credit: Bing
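To make the second approach concrete, here is a minimal sketch of how retrieved text can be placed into a prompt without touching the model's weights. The function name `build_prompt`, the prompt wording, and the placeholder chunks are assumptions made for illustration; LangChain uses its own prompt templates internally.

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved passages and the user's question into one prompt string."""
    # Join the retrieved passages into a single context block
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder passages (hypothetical); in this notebook they would come from the vector store
chunks = ["<text of retrieved chunk 1>", "<text of retrieved chunk 2>"]
prompt = build_prompt("Who is the author of this portfolio?", chunks)
print(prompt)
```

The model only ever sees the final string, which is why this approach needs no training step.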









---



#### Start the process

The image below depicts the project's overall workflow.

<figure>
<center>
<img src='https://github.com/ismailouahbi/Supercharging_LLAMA2-7b-chat-hf/blob/main/assets/workflow.png?raw=true' />
<figcaption>Overall workflow design</figcaption></center>
</figure>



---



#### Make sure you are using a GPU

<figure>
<center>
<img src='https://raw.githubusercontent.com/ismailouahbi/Supercharging_LLAMA2-7b-chat-hf/main/assets/gpu_steps.png' />
<figcaption>Activate Colab's GPU</figcaption></center>
</figure>
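Before installing anything, it can help to confirm that the runtime really exposes a GPU. The sketch below is one possible check using only the standard library; it assumes the NVIDIA driver tools (`nvidia-smi`) are on the `PATH`, which is typical on GPU runtimes but not guaranteed. The helper name `gpu_visible` is hypothetical.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi -L` runs successfully and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The tool is not installed, so no NVIDIA GPU is usable from this runtime
        return False
    try:
        result = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False`, switch the runtime type to GPU as shown in the figure before continuing.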

In [1]:
# Create a virtual environment to hold the project dependencies

# install virtualenv library
!pip install virtualenv

# create a virtual environment named env
!virtualenv env

Collecting virtualenv
  Downloading virtualenv-20.25.0-py3-none-any.whl (3.8 MB)
Collecting distlib<1,>=0.3.7 (from virtualenv)
  Downloading distlib-0.3.8-py2.py3-none-any.whl (468 kB)
Installing collected packages: distlib, virtualenv
Successfully installed distlib-0.3.8 virtualenv-20.25.0
created virtual environment CPython3.10.12.final.0-64 in 1270ms
  creator CPython3Posix(dest=/content/env, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
    added seed packages: pip==23.3.1, setuptools==69.0.2, wheel==0.42.0
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,Py

In [2]:
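# activate the env virtual environment
# (note: each `!` command runs in its own subshell, so this activation does not persist to later cells)
!source env/bin/activate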

In [4]:
# Install the required libraries (in quiet mode using -q)
!pip install -qU transformers torch==2.1.0 accelerate einops langchain xformers bitsandbytes faiss-gpu sentence_transformers



---



#### Initializing Hugging Face's Pipeline

In [5]:
# import necessary libraries
import torch
from torch import cuda, bfloat16
import transformers

**Note**


> The [NVIDIA® CUDA® Toolkit](https://developer.nvidia.com/cuda-toolkit) provides a development environment for creating high performance GPU-accelerated applications.
With the CUDA Toolkit, you can develop, optimize, and deploy your applications on GPU-accelerated embedded systems, desktop workstations, enterprise data centers, cloud-based platforms, and HPC supercomputers.

In [6]:
# get model id (from hugging face website)
model_id = 'meta-llama/Llama-2-7b-chat-hf'

# use the current CUDA device if one is available, otherwise fall back to the CPU
device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

In [8]:
# Make sure the GPU runtime is enabled
print(device)

cuda:0


In [9]:
# The bitsandbytes library provides quantization techniques that reduce the memory footprint of deep learning models on NVIDIA GPUs
bitsandbytes_config = transformers.BitsAndBytesConfig(
    # Load model weights in a 4-bit format (to reduce memory costs)
    load_in_4bit=True,
    # Use the 4-bit NormalFloat (NF4) quantization data type
    bnb_4bit_quant_type='nf4',
    # Apply double (nested) quantization: the quantization constants are quantized too, which saves extra memory
    bnb_4bit_use_double_quant=True,
    # Data type used for computation; the 4-bit weights are dequantized to it during the forward pass
    bnb_4bit_compute_dtype=torch.float16
)
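A rough back-of-the-envelope calculation shows why 4-bit loading matters on a single Colab GPU. The sketch below estimates weight memory only, assuming a nominal 7 billion parameters (the exact count differs slightly) and ignoring quantization constants, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to store the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7e9  # assumption: nominal "7B"; the real parameter count is slightly different

fp16_gb = weight_memory_gb(N_PARAMS, 16)
nf4_gb = weight_memory_gb(N_PARAMS, 4)
print(f"16-bit weights: ~{fp16_gb:.1f} GB, 4-bit weights: ~{nf4_gb:.1f} GB")
```

Under these assumptions the 16-bit weights need about 14 GB, in line with the roughly 13.5 GB of checkpoint shards this notebook downloads, while the 4-bit weights need about a quarter of that.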

In [10]:
# Colab's secret store, used here to read the private Hugging Face token
from google.colab import userdata

# Load the model configuration from the Hugging Face Hub
model_config = transformers.AutoConfig.from_pretrained(
    # model id
    model_id,
    # replace the userdata.get('hf_auth') with your HuggingFace's private token
    use_auth_token=userdata.get('hf_auth')
)



config.json:   0%|          | 0.00/614 [00:00<?, ?B/s]

#### Download the model

In [11]:
# download the model from HF using predefined configs
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bitsandbytes_config,
    device_map='auto',
    use_auth_token=userdata.get('hf_auth')
)



model.safetensors.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]

In [12]:
# Download the tokenizer, which converts human-readable text into the token IDs the LLM consumes
# (it is loaded from the same Llama 2 7B chat repository as the model)

tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    use_auth_token=userdata.get('hf_auth')
)



tokenizer_config.json:   0%|          | 0.00/1.62k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

In [13]:
# Implement the final HuggingFace pipeline

generate_text = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    # we pass model parameters here too
    temperature=0.02,  # 'randomness' of outputs: values close to 0 are nearly deterministic, higher values are more random
    max_new_tokens=1000,  # max number of tokens to generate in the output
    repetition_penalty=1.1  # without this output begins repeating
)
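To see what a temperature of `0.02` does, the sketch below applies temperature scaling to a softmax over a few made-up token scores. This is a simplified illustration of the general technique, not the `transformers` implementation; the logits are invented for the example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide the logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum to avoid overflow in exp()
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.5, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.5, 0.02):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

At a low temperature almost all of the probability mass lands on the highest-scoring token, so sampling behaves almost like greedy decoding.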

#### Simple test (without LangChain)

In [14]:
# Test with some text

res = generate_text("What is BI?")
print(res[0]["generated_text"])

What is BI? Business Intelligence (BI) refers to the process of collecting, analyzing, and reporting data to support decision-making activities in an organization. everybody knows that data is important for businesses, but not everyone understands how to use it effectively. That's where BI comes in. BI helps organizations make better decisions by providing them with timely, relevant, and accurate information about their operations, customers, and competitors.

BI tools provide a range of features and functionalities that enable users to analyze and visualize data in various ways. Some common features of BI tools include:

1. Data integration: The ability to bring together data from multiple sources and systems into a single, unified view.
2. Data warehousing: A centralized repository for storing and managing large amounts of data.
3. OLAP (Online Analytical Processing): A technology that enables fast and interactive analysis of large datasets.
4. Data mining: The process of discovering



---



#### Same implementation using LangChain for more capabilities

In [15]:
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=generate_text)

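# checking again that everything is working fine
# (calling the object directly is deprecated in recent LangChain releases, hence the warning below; llm.invoke(...) is the replacement)
llm(prompt="What is BI?")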

  warn_deprecated(


" Business Intelligence (BI) refers to the process of collecting, analyzing, and reporting data to support decision-making activities in an organization. everybody knows that data is important for businesses, but not everyone understands how to use it effectively. That's where BI comes in. BI helps organizations make better decisions by providing them with timely, relevant, and accurate information about their operations, customers, and competitors.\n\nBI tools provide a range of features and functionalities that enable users to analyze and visualize data in various ways. Some common features of BI tools include:\n\n1. Data integration: The ability to bring together data from multiple sources and systems into a single, unified view.\n2. Data warehousing: A centralized repository for storing and managing large amounts of data.\n3. OLAP (Online Analytical Processing): A technology that enables fast and interactive analysis of large datasets.\n4. Data mining: The process of discovering pa

#### Ingesting data using a document loader (for retrieval, not fine-tuning)

##### We will be using my official [portfolio](https://ismailouahbi.github.io/), my [LinkedIn](https://www.linkedin.com/in/ismail-ouahbi/), and my [GitHub](https://github.com/ismailouahbi) for more data.

In [16]:
# Ingest the data using a document loader

from langchain.document_loaders import WebBaseLoader


# links to scrape
links = ["https://ismailouahbi.github.io/", "https://www.linkedin.com/in/ismail-ouahbi/", "https://ismailouahbi.github.io/projects.html"
,"https://ismailouahbi.github.io/projects/ml/mobilenetv2/Fine-Tuned%20MobileNetV2.html", "https://ismailouahbi.github.io/projects/bi/BikeSales/business.html",
"https://ismailouahbi.medium.com/unlocking-the-secrets-of-casablancas-stock-market-part-i-605fe487f0ee","https://ismailouahbi.medium.com/unlocking-the-secrets-of-casablancas-stock-market-part-ii-a5c6bd00ea2d",
"https://ismailouahbi.github.io/resume.html","https://www.linkedin.com/in/ismail-ouahbi/details/experience/","https://ismailouahbi.github.io/index1.html#projects",
"https://github.com/ismailouahbi", "https://github.com/ismailouahbi?tab=repositories", "https://www.datascienceportfol.io/ismailouahbi"]

# initialize the webloader
loader = WebBaseLoader(links)
# get the documents
documents = loader.load()

In [17]:
# The data is fed to the LLM in small chunks, so let us divide it

from langchain.text_splitter import RecursiveCharacterTextSplitter

# make sure to tune these parameters (chunk_size and chunk_overlap) for optimal results
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=20)
all_splits = text_splitter.split_documents(documents)
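The roles of `chunk_size` and `chunk_overlap` can be illustrated with a much simpler splitter. The sketch below cuts fixed-size character windows that share a few characters with their neighbor; it is not the algorithm `RecursiveCharacterTextSplitter` uses (which prefers to break on separators such as paragraphs and sentences), and the function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=2)
print(chunks)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

The overlap repeats the end of one chunk at the start of the next, so a sentence that straddles a boundary is less likely to be lost to retrieval.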

#### Create `embeddings` (or encoded vectors) and store them in a `vector store` (FAISS in this case) for fast retrieval

In [18]:
# import necessary libraries
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# using the sentence-transformers for encoding
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda"}

# specify embedding type
embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.6k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/438M [00:00<?, ?B/s]

  return self.fget.__get__(instance, owner)()


tokenizer_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

In [19]:
# storing embeddings in the vector store (expects as arguments: the text split and embedding type)
vectorstore = FAISS.from_documents(all_splits, embeddings)
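Conceptually, querying the vector store means finding the stored vectors closest to the query vector. The sketch below does this by brute force with Euclidean distance in pure Python, which is the kind of exact search a flat FAISS index performs far more efficiently. The three-dimensional vectors and the `chunk-N` identifiers are made up for illustration; real embeddings from `all-mpnet-base-v2` have 768 dimensions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(index, query, k=2):
    """Return the k (doc_id, distance) pairs closest to `query`, nearest first."""
    scored = sorted((l2_distance(vec, query), doc_id) for doc_id, vec in index.items())
    return [(doc_id, dist) for dist, doc_id in scored[:k]]


# Toy "vector store": hypothetical chunk ids mapped to made-up embeddings
index = {
    "chunk-0": [1.0, 0.0, 0.0],
    "chunk-1": [0.9, 0.1, 0.0],
    "chunk-2": [0.0, 0.0, 1.0],
}
query = [1.0, 0.05, 0.0]
results = search(index, query, k=2)
print([doc_id for doc_id, _ in results])  # ['chunk-0', 'chunk-1']
```

The retriever built from the vector store follows the same idea: embed the question, then return the chunks whose embeddings are nearest to it.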

#### Start the ConversationalRetrievalChain


> This chain allows you to have a chatbot with memory while relying on a vector store to find relevant information from your document.



In [20]:
# import libraries
from langchain.chains import ConversationalRetrievalChain

# initialize the object [return_source_documents=True to get original documents used to answer a question]
chain = ConversationalRetrievalChain.from_llm(llm, vectorstore.as_retriever(), return_source_documents=True)
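The "memory" in this chain comes from rewriting each follow-up question using the chat history before retrieval. The sketch below mimics that flow with plain functions: rephrase the question if there is history, retrieve passages for the rephrased question, then answer from them. All names, prompt wordings, and the `fake_llm` / `fake_retrieve` stand-ins are assumptions for illustration and are not LangChain's internals.

```python
def conversational_retrieval_turn(question, chat_history, llm, retrieve):
    """One chat turn: condense the question, retrieve passages, answer from them."""
    if chat_history:
        history_text = "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)
        # Step 1: turn the follow-up into a question that makes sense on its own
        standalone = llm(
            "Rephrase the follow-up question as a standalone question.\n\n"
            f"Chat history:\n{history_text}\n\nFollow-up: {question}\nStandalone question:"
        )
    else:
        standalone = question
    # Step 2: fetch the passages most relevant to the standalone question
    docs = retrieve(standalone)
    # Step 3: answer using only the retrieved passages as context
    context = "\n\n".join(docs)
    answer = llm(f"Context:\n{context}\n\nQuestion: {standalone}\nAnswer:")
    return {"answer": answer, "source_documents": docs}


# Stand-ins so the sketch runs without a model or a vector store
calls = []

def fake_llm(prompt):
    calls.append(prompt)
    return f"<llm output {len(calls)}>"

def fake_retrieve(query):
    return [f"<chunk retrieved for: {query}>"]


first = conversational_retrieval_turn("who is X?", [], fake_llm, fake_retrieve)
history = [("who is X?", first["answer"])]
second = conversational_retrieval_turn("what did they build?", history, fake_llm, fake_retrieve)
print(len(calls))  # 3: one call for the first turn, two for the follow-up
```

A follow-up such as "can you give me some projects of him" only retrieves useful passages once "him" has been resolved, which is what the rephrasing step is for.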

#### The fun part (Interacting with the Chatbot)

In [21]:
# to enable chat history
chat_history = []

while True:
    query = input('Prompt: ')
    if query in ("exit", "quit", "q"):
        print('Exiting')
        break
    result = chain({'question': query, 'chat_history': chat_history})
    print('Answer: ' + result['answer'] + '\n')
    chat_history.append((query, result['answer']))

Prompt: who is Ismail ouahbi


  warn_deprecated(


Answer:  Ismail Ouahbi is a Moroccan Data Scientist and blogger. He has a master's degree in Data Science and Decision Support from Cady Ayyad University and over 2 years of experience in applying data science, data analytics, machine learning, and business intelligence to solve real-world problems.

Prompt: can you give me some projects of him
Answer:   Of course! As a Data Scientist, Ismail Ouahbi has extensive experience in applying data science, data analytics, machine learning, and business intelligence to solve real-world problems. He has worked on various projects, including ingredient prediction using CNNs, and has achieved high accuracy through his freelancing project. Additionally, he has skills in AI, APIs, big data, business intelligence, computer science, critical thinking, decision support systems, deep learning, effective time management, ETL development, Excel, GCP, GCP BigQuery, GCP Cloud Storage, GCP Dataflow, generative AI, and more. He is also a writer who shares hi
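Because the chain was created with `return_source_documents=True`, each result also carries the chunks the answer was based on, which is useful for checking an answer against its sources. The helper below is a hypothetical sketch: it assumes each document exposes `page_content` and a `metadata` dictionary, and that the loader stored the page URL under a `source` key. Stand-in objects are used so the example runs without the chain.

```python
from types import SimpleNamespace


def summarize_sources(source_documents, max_chars=80):
    """Return one line per source document: its origin and a short text snippet."""
    lines = []
    for i, doc in enumerate(source_documents, start=1):
        source = doc.metadata.get("source", "unknown")  # assumed metadata key
        snippet = " ".join(doc.page_content.split())[:max_chars]  # collapse whitespace
        lines.append(f"[{i}] {source}: {snippet}")
    return lines


# Stand-in for result['source_documents'] (placeholder text, not real scraped content)
fake_docs = [
    SimpleNamespace(
        page_content="Placeholder   chunk text\nfor illustration.",
        metadata={"source": "https://ismailouahbi.github.io/"},
    )
]
for line in summarize_sources(fake_docs):
    print(line)
```

Inside the chat loop, the same helper could be called on `result['source_documents']` after printing the answer.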