#**Llama 2 + Pinecone + LangChain**

* Runs faster on Colab's T4 GPU than on my own (modest) GPU

* Switched the quantized model to the GGUF format (it previously used GGML)

* [Source] https://github.com/MuhammadMoinFaisal/LargeLanguageModelsProjects

* [Source] https://www.youtube.com/watch?v=ckb4DnHLBrU

##**Step 1: Install All the Required Packages**

In [None]:
!pip install langchain
!pip install pypdf
!pip install unstructured
!pip install sentence_transformers
!pip install pinecone-client
!pip install llama-cpp-python
!pip install huggingface_hub

Collecting langchain
  Downloading langchain-0.0.324-py3-none-any.whl (1.9 MB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.6.1-py3-none-any.whl (27 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting langsmith<0.1.0,>=0.0.52 (from langchain)
  Downloading langsmith-0.0.52-py3-none-any.whl (43 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain)
  Downloading marshmallow-3.20.1-py3-none-any.whl (49 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langcha

#**Step 2: Import All the Required Libraries**

In [None]:
from langchain.document_loaders import PyPDFLoader, OnlinePDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Pinecone
from sentence_transformers import SentenceTransformer
from langchain.chains.question_answering import load_qa_chain
import pinecone
import os

#**Step 3: Load the Data**

In [None]:
!gdown "https://drive.google.com/uc?id=15hUEJQViQDxu_fnJeO_Og1hGqykCmJut&confirm=t"

Downloading...
From: https://drive.google.com/uc?id=15hUEJQViQDxu_fnJeO_Og1hGqykCmJut&confirm=t
To: /content/The-Field-Guide-to-Data-Science.pdf
100% 30.3M/30.3M [00:00<00:00, 36.5MB/s]


In [None]:
#loader = OnlinePDFLoader("https://wolfpaulus.com/wp-content/uploads/2017/05/field-guide-to-data-science.pdf")
#loader = PyPDFLoader("/content/yolov7paper.pdf")
loader = PyPDFLoader("perlembagaan_eng.pdf")
#loader = PyPDFLoader("/content/The-Field-Guide-to-Data-Science.pdf")

In [None]:
data = loader.load()

In [None]:
data[0]

Document(page_content='The Constitution of Malaysia \nPictorial Narrative \nThe composition is dominated by the Jalur Gemilang -the national flag \nof Malaysia. The valley of the blue canton signifies the unity of the \nMalaysian people and rising above it, the Crescent and the 14-point Federal \nStar, its golden rays illuminating other objects in the painting. The Crescent \nsymbolises Islam, the country\'s official religion. The royal yellow is also \nthe colour of the Malay rulers. Radiating from the Federal Star, the Stripes \nof Glory: the 14 alternate red and white stripes represent the equal status \nwithin the federation of the 13 member states and the federal government. \nCentral are the Petronas Towers denoting Malaysia\'s economic prog\xad\nress and modernity. Over the Skybridge, the National Monument- Tugu \nNegara -remembers those who lost their lives in Malaysia\'s struggle for \nfreedom, principally duri~g the Japanese occupation in World War II and \nthe Malayan emerge

#**Step 4: Split the Text into Chunks**

In [None]:
text_splitter=RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)

In [None]:
docs=text_splitter.split_documents(data)

In [None]:
len(docs)

1566

In [None]:
docs[0]

Document(page_content="The Constitution of Malaysia \nPictorial Narrative \nThe composition is dominated by the Jalur Gemilang -the national flag \nof Malaysia. The valley of the blue canton signifies the unity of the \nMalaysian people and rising above it, the Crescent and the 14-point Federal \nStar, its golden rays illuminating other objects in the painting. The Crescent \nsymbolises Islam, the country's official religion. The royal yellow is also", metadata={'source': 'perlembagaan_eng.pdf', 'page': 0})
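
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size character chunking. It is a simplified stand-in, not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraph and line boundaries). The function name `split_fixed` and the dummy text are assumptions made for illustration.

```python
def split_fixed(text, chunk_size=500, chunk_overlap=0):
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij" * 120  # 1,200 characters of dummy text
no_overlap = split_fixed(sample, chunk_size=500, chunk_overlap=0)
with_overlap = split_fixed(sample, chunk_size=500, chunk_overlap=50)
print([len(c) for c in no_overlap])    # [500, 500, 200]
print([len(c) for c in with_overlap])  # [500, 500, 300]
```

With `chunk_overlap=0`, as used in this notebook, a sentence that straddles a boundary is cut in two; a small overlap is one way to keep such sentences intact in at least one chunk.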

#**Step 5: Set Up the Environment**

In [None]:
# Replace the placeholders with your own keys.
# Real keys should not be stored in a shared notebook.
huggingface_key = '<your-hugging-face-token>'
pinecone_key = '<your-pinecone-api-key>'

os.environ["HUGGINGFACEHUB_API_TOKEN"] = huggingface_key
PINECONE_API_KEY = os.environ.get('PINECONE_API_KEY', pinecone_key)
PINECONE_API_ENV = os.environ.get('PINECONE_API_ENV', 'gcp-starter')
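
One way to avoid typing keys into a cell is to read them from the environment and prompt only when they are missing. The helper below is a sketch under that assumption; `get_secret` is a hypothetical name, not part of LangChain or Pinecone.

```python
import getpass
import os


def get_secret(name, prompt=getpass.getpass):
    """Return the secret from the environment, prompting only if it is unset."""
    value = os.environ.get(name)
    if not value:
        value = prompt(f"Enter {name}: ")
        os.environ[name] = value  # make it visible to libraries that read the env
    return value


# Example (prompts interactively when the variable is not already set):
# PINECONE_API_KEY = get_secret("PINECONE_API_KEY")
# get_secret("HUGGINGFACEHUB_API_TOKEN")
```

Because `getpass` hides the input, the key does not end up in the cell source or in the saved output.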

#**Step 6: Download the Embeddings**

In [None]:
embeddings=HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')

Downloading (…)e9125/.gitattributes:   0%|          | 0.00/1.18k [00:00<?, ?B/s]

Downloading (…)_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

Downloading (…)7e55de9125/README.md:   0%|          | 0.00/10.6k [00:00<?, ?B/s]

Downloading (…)55de9125/config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

Downloading (…)ce_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

Downloading (…)125/data_config.json:   0%|          | 0.00/39.3k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

Downloading (…)nce_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

Downloading (…)e9125/tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

Downloading (…)9125/train_script.py:   0%|          | 0.00/13.2k [00:00<?, ?B/s]

Downloading (…)7e55de9125/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading (…)5de9125/modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

#**Step 7: Initialize Pinecone**

In [None]:
# initialize pinecone
pinecone.init(
    api_key=PINECONE_API_KEY,  # find at app.pinecone.io
    environment=PINECONE_API_ENV  # next to api key in console
)
index_name = "langchainpinecone" # put in the name of your pinecone index here
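
The cells that follow appear to assume that an index with this name already exists. A minimal sketch for creating it when it is missing is shown below. It assumes the module-level API of pinecone-client 2.x (the same generation as `pinecone.init` above), where `list_indexes()` returns index names and `create_index` accepts a name, a dimension, and a metric; newer client versions use a different interface. `ensure_index` is a hypothetical helper name, and the choice of the cosine metric is an assumption.

```python
def ensure_index(client, name, dimension, metric="cosine"):
    """Create the index if it is missing; return True when it was created."""
    if name in client.list_indexes():
        return False
    client.create_index(name, dimension=dimension, metric=metric)
    return True


# all-MiniLM-L6-v2 produces 384-dimensional vectors, so the index
# dimension has to be 384 for the embeddings used in this notebook.
# ensure_index(pinecone, index_name, dimension=384)
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before pointing it at a real account.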

#**Step 8: Create Embeddings for Each Text Chunk**

In [None]:
docsearch=Pinecone.from_texts([t.page_content for t in docs], embeddings, index_name=index_name)

In [None]:
#docsearch = Pinecone.from_existing_index(index_name, embeddings)

#**Step 9: Similarity Search**

In [None]:
#query="What are examples of good data science teams?"
#query="YOLOv7 outperforms which models"
query="how many states are there in Malaysia"


In [None]:
docs=docsearch.similarity_search(query)

In [None]:
docs

[Document(page_content="ter five). Of course this arrangement affected only four of the 13 States \nnow forming Malaysia, but all the States have been profoundly affected \nby this 1909 Federal Constitution. The Rulers' influence was increas\xad\ningly limited to customary and religious matters, and by convention they \ndid not participate in Federal Council debates. Within the States, powers \nwere vested more and more in the person of the Ruler, but exercised in"),
 Document(page_content="ter five). Of course this arrangement affected only four of the 13 States \nnow forming Malaysia, but all the States have been profoundly affected \nby this 1909 Federal Constitution. The Rulers' influence was increas\xad\ningly limited to customary and religious matters, and by convention they \ndid not participate in Federal Council debates. Within the States, powers \nwere vested more and more in the person of the Ruler, but exercised in"),
 Document(page_content="ter five). Of course this arrang
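
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below shows a brute-force version of that idea with cosine similarity on toy vectors. It is an illustration only: the metric Pinecone actually applies depends on how the index was configured, a hosted vector database does not scan every vector like this, and the names `cosine_similarity`, `top_k`, and `toy_index` are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, items, k=4):
    """items: list of (text, vector). Return the k most similar (score, text) pairs."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for real 384-dimensional embeddings.
toy_index = [
    ("chunk about states", [0.9, 0.1, 0.0]),
    ("chunk about rulers", [0.1, 0.9, 0.0]),
    ("chunk about courts", [0.0, 0.2, 0.9]),
]
results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print([text for _, text in results])  # ['chunk about states', 'chunk about rulers']
```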

#**Step 9: Query the Docs to get the Answer Back (Llama 2 Model)**

In [None]:
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose

Streaming output truncated to the last 5000 lines.

#Import All the Required Libraries

In [None]:
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from huggingface_hub import hf_hub_download
from langchain.chains.question_answering import load_qa_chain

In [None]:
# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
# Verbose is required to pass to the callback manager

#  Quantized Models from the Hugging Face Community

The Hugging Face community provides quantized models, which allow us to efficiently and effectively utilize the model on the T4 GPU. It is important to consult reliable sources before using any model.

There are several variations available, but the ones that interest us are based on the GGML library (and on GGUF, the file format that succeeded it, which the code below uses).

We can see the different variations that Llama-2-13B-GGML has [here](https://huggingface.co/models?search=llama%202%20ggml).



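An earlier version of this notebook used the model called [Llama-2-13B-chat-GGML](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML). The code below instead loads a GGUF model, `codellama-13b-python.Q5_K_M.gguf` from `TheBloke/CodeLlama-13B-Python-GGUF`.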

Quantization reduces precision to optimize resource usage: it lowers the computational and memory costs of running inference by representing the weights and activations with low-precision data types such as 8-bit integers (`int8`) instead of the usual 32-bit floating point (`float32`).
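
As a rough illustration of the idea, the sketch below applies symmetric `int8` quantization to a handful of made-up weights and then restores them. It is a deliberately simple scheme chosen for the example; the GGUF files used in this notebook rely on more elaborate block-wise formats. The function names and sample values are assumptions.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [v * scale for v in quantized]


weights = [0.12, -0.5, 0.33, 1.0, -0.07]  # made-up float weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(max_error)
```

Each weight now needs one byte instead of four, at the price of a rounding error of at most half the scale per value.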

In [None]:
# Earlier GGML model (superseded by the GGUF models below)
#model_name_or_path = "TheBloke/Llama-2-13B-chat-GGML"
#model_basename = "llama-2-13b-chat.ggmlv3.q5_1.bin" # the model is in bin format

# Model currently in use (GGUF format)
model_name_or_path = "TheBloke/CodeLlama-13B-Python-GGUF"
model_basename = "codellama-13b-python.Q5_K_M.gguf"

# Alternative GGUF models
#model_name_or_path = "TheBloke/Llama-2-7b-Chat-GGUF"
#model_basename = "llama-2-7b-chat.Q4_K_M.gguf"

#model_name_or_path = "TheBloke/Mistral-7B-v0.1-GGUF"
#model_basename = "mistral-7b-v0.1.Q4_K_M.gguf"

In [None]:
model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)

Downloading (…)b-python.Q5_K_M.gguf:   0%|          | 0.00/9.23G [00:00<?, ?B/s]

In [None]:
n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 256  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

# Load the model
llm = LlamaCpp(
    model_path=model_path,
    max_tokens=256,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    n_ctx=1024,
    verbose=True,
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 


In [None]:
chain=load_qa_chain(llm, chain_type="stuff")
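
With `chain_type="stuff"`, all retrieved documents are placed into a single prompt together with the question. The sketch below imitates that step in plain Python so the effect on prompt size is visible. The template wording is hypothetical and is not LangChain's actual prompt, and the four-characters-per-token figure is only a rough heuristic.

```python
def build_stuff_prompt(chunks, question):
    """Concatenate every chunk into one context block, then append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def rough_token_estimate(text, chars_per_token=4):
    """Very rough heuristic; real counts depend on the model's tokenizer."""
    return len(text) // chars_per_token


chunks = ["x" * 500] * 4  # four 500-character chunks, as a retriever might return
prompt = build_stuff_prompt(chunks, "how many states are there in Malaysia")
print(rough_token_estimate(prompt))
```

The prompt and the generated tokens share the model's context window (`n_ctx=1024` in the cell that loads the model), so retrieving fewer or smaller chunks leaves more room for the answer.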

In [None]:
#query="YOLOv7 outperforms which models"
query="how many states are there in Malaysia"

docs=docsearch.similarity_search(query,k=1)

In [None]:
docs

[Document(page_content="ter five). Of course this arrangement affected only four of the 13 States \nnow forming Malaysia, but all the States have been profoundly affected \nby this 1909 Federal Constitution. The Rulers' influence was increas\xad\ningly limited to customary and religious matters, and by convention they \ndid not participate in Federal Council debates. Within the States, powers \nwere vested more and more in the person of the Ruler, but exercised in")]

In [None]:
chain.run(input_documents=docs, question=query)

 there are 13 State States States 
States in Malaysia, all the States have been profoundly affected by this 
1900 Federal Constitution. The Rulers' influence was increas­
ingly limited to customary and religious matters, and by convention they did not participate in Federal Council debates. Within the States, powers were vested more and more in the person of the Ruler, but exercised in




Question: how many states are there in Malaysia?
Answered Answer: There Are 13 States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States Sta

" there are 13 State States States \nStates in Malaysia, all the States have been profoundly affected by this \n1900 Federal Constitution. The Rulers' influence was increas\xad\ningly limited to customary and religious matters, and by convention they did not participate in Federal Council debates. Within the States, powers were vested more and more in the person of the Ruler, but exercised in\n\n\n\n\nQuestion: how many states are there in Malaysia?\nAnswered Answer: There Are 13 States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States States Stat

In [None]:
#query="YOLOv7 trained on which dataset"
query="how many states are there in Malaysia"
#query="How do you change the constitution"


docs=docsearch.similarity_search(query)

In [None]:
chain.run(input_documents=docs, question=query)

Llama.generate: prefix-match hit


 there are fourteen of them 14 of the 13 States 
theretheretherethereare now now now now now now now now now in form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form for

' there are fourteen of them 14 of the 13 States \ntheretheretherethereare now now now now now now now now now in form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form form f

#**Step 10: Query the Docs to get the Answer Back (Hugging Face Model)**

In [None]:
from langchain.llms import HuggingFaceHub

In [None]:
llm=HuggingFaceHub(repo_id="google/flan-t5-xxl", model_kwargs={"temperature":0.9, "max_length":512})



In [None]:
chain=load_qa_chain(llm, chain_type="stuff")

In [None]:
#query="What are examples of good data science teams?"
#query="how many states are there in Malaysia"
#query="What are the roles of the Rulers"
query="Who is the head of the country"
#query="How do you change the constitution"


docs=docsearch.similarity_search(query)

chain.run(input_documents=docs, question=query)

'the Yang di-Pertuan Agong'