<a href="https://colab.research.google.com/github/nestorsgarzonc/intelligent-systems-project-un/blob/main/llm_model_with_langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using HuggingFace Embeddings in LangChain
Authored by: [Fabio A. González](https://github.com/fagonzalezo)


Code partially based on: https://python.langchain.com/en/latest/modules/chains/index_examples/chat_vector_db.html


Install and load the necessary libraries:

In [1]:
!pip install fire
!pip install gradio
!pip install transformers
!pip install git+https://github.com/huggingface/peft.git
!pip install sentencepiece
!pip install accelerate
#!pip install git+https://github.com/TimDettmers/bitsandbytes.git
!pip install bitsandbytes
!pip -q install langchain
!pip install sentence_transformers
!pip install chromadb


In [2]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import ConversationalRetrievalChain

Download the file we are going to use as an example (a CSV of tourist attractions):

In [3]:
!wget https://raw.githubusercontent.com/nestorsgarzonc/intelligent-systems-project-un/main/attractions.csv

--2023-06-17 17:46:28--  https://raw.githubusercontent.com/nestorsgarzonc/intelligent-systems-project-un/main/attractions.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 515025 (503K) [text/plain]
Saving to: ‘attractions.csv’


2023-06-17 17:46:28 (13.2 MB/s) - ‘attractions.csv’ saved [515025/515025]



In [4]:
CSV_PATH = '/content/attractions.csv'

Load the document. `CSVLoader` creates one `Document` per CSV row:

In [5]:
from langchain.document_loaders import CSVLoader
loader = CSVLoader(CSV_PATH)
documents = loader.load()
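The following is a minimal stdlib sketch of the kind of conversion `CSVLoader` performs. It assumes the `column: value` layout and the `source`/`row` metadata that appear in the search results further down. This is not LangChain's code. `rows_to_documents` and `SimpleDocument` are hypothetical names.

```python
import csv
from typing import Iterable, NamedTuple


class SimpleDocument(NamedTuple):
    """Hypothetical stand-in for LangChain's Document."""
    page_content: str
    metadata: dict


def rows_to_documents(lines: Iterable[str], source: str) -> list[SimpleDocument]:
    """Turn each CSV row into one document made of 'column: value' lines."""
    docs = []
    for i, row in enumerate(csv.DictReader(lines)):
        # One "column: value" line per column, in header order
        content = "\n".join(f"{k.strip()}: {(v or '').strip()}" for k, v in row.items())
        docs.append(SimpleDocument(content, {"source": source, "row": i}))
    return docs


sample = ["name,city,duration", "Das Haus,Bogota,2-3 horas", "La Negra Bar,Bogota,"]
docs = rows_to_documents(sample, "attractions.csv")
print(docs[0].page_content)
```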

Split the documents into chunks:

In [6]:
text_splitter = CharacterTextSplitter(chunk_size=256, chunk_overlap=0)
documents = text_splitter.split_documents(documents)
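`CharacterTextSplitter` first cuts the text on a separator (by default `"\n\n"`). It then greedily merges the pieces into chunks of at most `chunk_size` characters. A single piece longer than that is kept whole. Each CSV row is joined with single `"\n"` characters, so rows probably stay intact here. That fits the full-length row returned by the search test below.

The simplified sketch below illustrates only the merge step. `split_text` is a hypothetical helper and ignores overlap. Passing `separator="\n"` to `CharacterTextSplitter` would be one way to actually split long rows.

```python
def split_text(text: str, separator: str = "\n\n", chunk_size: int = 256) -> list[str]:
    """Simplified character splitter: split on `separator`, then greedily
    merge pieces while the merged chunk stays within `chunk_size` characters.
    A single piece longer than `chunk_size` becomes its own (oversized) chunk."""
    pieces = [p for p in text.split(separator) if p]
    chunks: list[str] = []
    current: list[str] = []
    for piece in pieces:
        candidate = separator.join(current + [piece])
        if current and len(candidate) > chunk_size:
            # Adding this piece would overflow: close the current chunk
            chunks.append(separator.join(current))
            current = [piece]
        else:
            current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


row = "name: " + "x" * 150 + "\ndescription: " + "y" * 150
print([len(c) for c in split_text(row)])                  # → [320]
print([len(c) for c in split_text(row, separator="\n")])  # → [156, 163]
```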

Define and load the embedding model, `all-MiniLM-L6-v2`, a sentence-transformers model from Hugging Face:

In [7]:
embedding_model_name = "all-MiniLM-L6-v2"
embeddings = HuggingFaceEmbeddings(model_name=embedding_model_name)


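`all-MiniLM-L6-v2` maps each text to a single 384-dimensional vector. As far as I know, its sentence-transformers configuration does two things. It averages the token embeddings over the non-padding tokens (mean pooling) and then L2-normalizes the result.

Below is a minimal pure-Python sketch of those two steps on toy 2-dimensional vectors. The helper names are hypothetical.

```python
import math


def mean_pool(token_embeddings: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors whose attention-mask entry is 1 (skip padding)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vector):
                total[i] += x
            count += 1
    return [x / max(count, 1) for x in total]


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


tokens = [[1.0, 5.0], [5.0, 3.0], [100.0, 100.0]]  # last row plays the role of padding
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)  # → [3.0, 4.0]
print(l2_normalize(pooled))       # → [0.6, 0.8]
```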

Embed every chunk and store the vectors in a Chroma vector store:

In [8]:
vectorstore = Chroma.from_documents(documents, embeddings)
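Conceptually, a similarity query returns the stored chunks whose vectors lie closest to the query vector. Chroma uses an approximate (HNSW) index with an L2 distance by default. For unit-length embeddings, ranking by L2 distance gives the same order as ranking by cosine similarity, since ||a - b||^2 = 2 - 2 cos(a, b).

Below is a brute-force sketch on toy vectors. The store and helper names are hypothetical; this is not Chroma's API.

```python
import math


def normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b)) / (
        math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    )


def search_l2(query: list[float], store: dict[str, list[float]], k: int = 4) -> list[str]:
    """Brute-force k nearest neighbours by squared Euclidean distance."""
    def dist(v: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(query, v))
    return sorted(store, key=lambda key: dist(store[key]))[:k]


# Toy "vector store": chunk id -> unit-length embedding
store = {
    "bar": normalize([1.0, 0.0]),
    "museum": normalize([0.0, 1.0]),
    "food tour": normalize([1.0, 1.0]),
}
query = normalize([1.0, 0.1])
by_l2 = search_l2(query, store, k=3)
by_cosine = sorted(store, key=lambda key: cosine(query, store[key]), reverse=True)
print(by_l2)  # → ['bar', 'food tour', 'museum']
```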

Let's test the database with a query in Spanish, "Que puedo hacer en bogota?" ("What can I do in Bogotá?"):

In [12]:
vectorstore.search("Que puedo hacer en bogota?", 'similarity')

[Document(page_content='name: 229. Das Haus\nurl: https://www.tripadvisor.co/Attraction_Review-g294074-d10200960-Reviews-Das_Haus-Bogota.html\ncity: Bogota\ntitle: Das Haus\ndescription: Das Haus es el lugar para probar algunas de las mejores cervezas artesanales y sidras locales. ¡También puedes probar nuestras pizzas hamburguesas y sánduches (tenemos opciones vegetarianas) así como también nuestros cocteles! El primer sábado de cada mes se realiza el Reggae Corner Bogotá con Djs locales poniendo música jamaiquina de los 60s-80s en vinilo; y el último sábado de cada mes hacemos el Das Haus 80s con los mejores hits de los 80s (italo disco electro etc) en vinilo.\nduration: 2-3 horas\nlocation: Carrera 4 A # 53- 15 Bogotá 110231 Colombia', metadata={'source': '/content/attractions.csv', 'row': 1053}),
 Document(page_content='name: 237. La Negra Bar\nurl: https://www.tripadvisor.co/Attraction_Review-g294074-d7052967-Reviews-La_Negra-Bogota.html\ncity: Bogota\ntitle: La Negra Bar\ndescrip

In [13]:
# Build a second copy of the store that is also persisted to disk in ./db
db = Chroma.from_documents(documents, embeddings, persist_directory="db")

In [15]:
import os
from getpass import getpass

# Read the Hugging Face API token interactively instead of hard-coding it
os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass("Hugging Face API token: ")

from langchain import PromptTemplate, HuggingFaceHub, LLMChain

In [17]:
from transformers import GenerationConfig, LlamaForCausalLM, LlamaTokenizer, pipeline

In [18]:
base_model_path = 'decapoda-research/llama-7b-hf'
# LoRA adapter weights for LLaMA-7B, fine-tuned for Spanish
weights_path = "plncmm/guanaco-lora-7b"

# Load the tokenizer for the base model
tokenizer = LlamaTokenizer.from_pretrained(base_model_path)

# Load the base model in 8-bit (requires bitsandbytes and a GPU),
# letting accelerate place the layers on the available devices
base_model = LlamaForCausalLM.from_pretrained(
    base_model_path,
    load_in_8bit=True,
    device_map="auto",
)


The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. 
The class this function is called from is 'LlamaTokenizer'.




Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...


Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.


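Loading with `load_in_8bit=True` stores most weights in one byte instead of two. Below is a back-of-the-envelope estimate of the weight memory alone. It assumes roughly 6.7 billion parameters for LLaMA-7B and ignores activations, the KV cache, and bitsandbytes' higher-precision outlier handling.

The checkpoint is split into 33 shards (32 × 405 MB plus 524 MB, about 13.5 GB). That size is consistent with 16-bit weights.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 6.7e9  # assumption: approximate LLaMA-7B parameter count

for name, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{name}: ~{weight_memory_gb(N_PARAMS, nbytes):.1f} GB")
# fp32: ~26.8 GB
# fp16: ~13.4 GB
# int8: ~6.7 GB

# Size of the downloaded shards (32 x 405 MB + 524 MB), in GB
print((32 * 405 + 524) / 1000)  # → 13.484
```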

In [21]:
from peft import PeftModel
# Load the Spanish LoRA adapter weights on top of the 8-bit base model
model = PeftModel.from_pretrained(
    base_model,
    weights_path,
)

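LoRA keeps the base weight matrix W frozen and learns a low-rank update. For a layer with input x, the adapted output is W x + (alpha / r) · B A x. Here A has shape r × d_in, B has shape d_out × r, and the rank r is small.

The plain-Python sketch below (hypothetical helper names) checks that this equals applying the merged matrix W + (alpha / r) · B A, and counts the extra parameters. The adapter's actual rank and target modules are not shown in this notebook. Assume a typical configuration: r = 8 on the `q_proj` and `v_proj` projections of LLaMA-7B's 32 layers (hidden size 4096), stored in fp32. That gives about 16.8 MB, which would match the 16.8 MB `adapter_model.bin` this cell downloads.

```python
def matvec(M: list[list[float]], x: list[float]) -> list[float]:
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def matmul(P: list[list[float]], Q: list[list[float]]) -> list[list[float]]:
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]


def lora_forward(W, A, B, x, alpha: float, r: int) -> list[float]:
    """W x + (alpha / r) * B (A x), without ever forming B A."""
    scale = alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


def merged_weight(W, A, B, alpha: float, r: int) -> list[list[float]]:
    """W + (alpha / r) * B A: the single matrix equivalent to the adapted layer."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, BA)]


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters of one LoRA pair: A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


# Tiny example: d_in = d_out = 2, rank r = 1
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]    # r x d_in
B = [[1.0], [2.0]]  # d_out x r
x = [1.0, 2.0]
print(lora_forward(W, A, B, x, alpha=2.0, r=1))           # → [7.0, 14.0]
print(matvec(merged_weight(W, A, B, alpha=2.0, r=1), x))  # → [7.0, 14.0]

# Assumed adapter config: r = 8 on q_proj and v_proj in each of 32 layers, d = 4096
n = 32 * 2 * lora_params(4096, 4096, 8)
print(n, n * 4 / 1e6)  # → 4194304 16.777216  (fp32 size in MB)
```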

In [22]:
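from langchain.llms import HuggingFacePipeline
from langchain.chains.question_answering import load_qa_chain

# Text-generation pipeline around the 8-bit LLaMA model. PeftModel injected
# the LoRA layers into `base_model` in place, so the adapter is used here.
pipe = pipeline(
    "text-generation",
    model=base_model,
    tokenizer=tokenizer,
    max_length=256,  # overridden by max_new_tokens (see the warning below)
    # temperature/top_p/top_k only apply when sampling (do_sample=True)
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    max_new_tokens=256,
    repetition_penalty=1.2,
)
# Wrap the pipeline as a LangChain LLM and build a "stuff" QA chain
local_llm = HuggingFacePipeline(pipeline=pipe)
chain = load_qa_chain(local_llm, chain_type="stuff")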

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
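The `temperature`, `top_k`, and `top_p` arguments only take effect when sampling is enabled (`do_sample=True`). Otherwise Hugging Face generation decodes greedily. The simplified sketch below shows what the three filters do, applied in the order temperature, top-k, then top-p. It uses hypothetical helpers and toy logits over four tokens.

With the notebook's settings (temperature 0.1, top-k 40, top-p 0.75), only the most likely toy token survives. This suggests that such a low temperature makes sampling close to greedy decoding.

```python
import math
import random


def softmax(logits: dict[str, float]) -> dict[str, float]:
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def filtered_distribution(logits: dict[str, float], temperature: float = 1.0,
                          top_k: int = 0, top_p: float = 1.0) -> dict[str, float]:
    # 1) Temperature: divide the logits; values < 1 sharpen the distribution
    scaled = {tok: v / temperature for tok, v in logits.items()}
    # 2) Top-k: keep only the k highest-scoring tokens (0 disables it)
    ranked = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    probs = softmax(dict(ranked))
    # 3) Top-p: keep the smallest high-probability set whose mass reaches top_p
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


def sample_token(logits: dict[str, float], rng: random.Random, **kwargs) -> str:
    dist = filtered_distribution(logits, **kwargs)
    tokens, weights = zip(*dist.items())
    return rng.choices(tokens, weights=weights, k=1)[0]


logits = {"museums": 2.0, "parks": 1.0, "bars": 0.0, "zoo": -1.0}
print(filtered_distribution(logits, temperature=0.1, top_k=40, top_p=0.75))
# → {'museums': 1.0}
print(filtered_distribution(logits, temperature=1.0, top_k=2))
print(sample_token(logits, random.Random(0), temperature=1.0, top_k=2))
```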


In [23]:
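import torch
import gc

# Free cached GPU memory before running generation
gc.collect()
torch.cuda.empty_cache()

# Retrieve the most similar chunks (4 by default) and answer with the "stuff" chain
query = "Que puedo hacer en bogota?"  # "What can I do in Bogotá?"
docs = db.similarity_search(query)
chain.run(input_documents=docs, question=query)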


Both `max_new_tokens` (=256) and `max_length`(=256) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


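' Hay muchas cosas que pueden hacerse en Bogotá. Puede visitar museos, parques, centros comerciales o incluso ir a alguna de las numerosas discotecas. También puede explorar la gastronomía local, tomar clases de danza o aprender español.'

(English translation: "There are many things to do in Bogotá. You can visit museums, parks, shopping malls, or even go to one of the many nightclubs. You can also explore the local cuisine, take dance classes, or learn Spanish.")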
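With `chain_type="stuff"`, the chain "stuffs" all retrieved documents into a single prompt together with the question and makes one LLM call. The sketch below shows that prompt assembly. The template wording and the `build_stuff_prompt` name are hypothetical, and LangChain's default prompt differs in its exact text. Because everything goes into one prompt, the retrieved chunks plus the question must fit in the model's context window. The optional `max_chars` budget is an illustrative addition.

```python
from typing import Optional

PROMPT_TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)  # hypothetical wording, for illustration only


def build_stuff_prompt(docs: list[str], question: str, max_chars: Optional[int] = None) -> str:
    """Concatenate all retrieved documents into one prompt ("stuff" strategy).

    If max_chars is given, documents are dropped from the end (the least
    similar ones, assuming docs are sorted by similarity) until the context fits.
    """
    kept = list(docs)
    context = "\n\n".join(kept)
    while max_chars is not None and kept and len(context) > max_chars:
        kept.pop()
        context = "\n\n".join(kept)
    return PROMPT_TEMPLATE.format(context=context, question=question)


docs = ["name: Das Haus\ncity: Bogota", "name: La Negra Bar\ncity: Bogota"]
print(build_stuff_prompt(docs, "Que puedo hacer en bogota?"))
```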

In [None]:
# STRATEGY 2 (alternative): answer with a local GPT4All model instead of LLaMA + LoRA
# from langchain.chains.question_answering import load_qa_chain
# from langchain.llms import GPT4All

# chain = load_qa_chain(GPT4All(model="/path/to/model.bin", n_threads=2), chain_type="stuff")