<a href="https://colab.research.google.com/github/poo5zan/llm_public/blob/main/Extract_Information.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook extracts information from a company's private data using open-source Large Language Models. In this document, I use the Llama 3 8-billion-parameter model, an open-source model from Meta (Facebook).

Important Note: Make sure the connected runtime has a GPU; by default, it might not. To change the runtime, click the 'Runtime' menu, then 'Change runtime type', and select 'T4 GPU'. The runtime type should be 'Python 3'.

# Ollama Installation

In [None]:
# Download and start Ollama.
# 'ollama serve' is started in a separate background process (see Popen below).
# Running '!ollama serve' directly would block the notebook cell until the server stops.
# You can try that too, and then revert to this background-process approach.
# -L follows redirects and -f fails on HTTP errors instead of piping an error page to sh.
!curl -fsSL https://ollama.ai/install.sh | sh
import subprocess
process_serve = subprocess.Popen("ollama serve", shell=True)
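
Because `ollama serve` starts in the background, later cells may run before the server accepts connections. A minimal sketch of a readiness check follows, assuming Ollama listens on its default address `http://localhost:11434`. The helper name `wait_for_server` and its parameters are illustrative and not part of Ollama.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:11434",
                    timeout_s: float = 30.0,
                    interval_s: float = 0.5) -> bool:
    """Poll `url` until it answers an HTTP request or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status, so it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_for_server()` right after the `Popen` line would let the notebook stop early with a clear signal if the server never comes up.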

# Google Drive

Files uploaded to the Colab runtime are deleted with each new session, and uploading the documents again and again is tedious. So I prefer to store the documents in Google Drive. The following command opens a new window to connect to Google Drive.

In [13]:
# connect google drive and mount the root folder
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


I created a folder named "llm_data" in the root of my Google Drive. You can name it anything.

TIP: After you use up the free-tier compute resources, you can try again with a new Gmail account. For Google Drive, share the llm_data folder with the new Gmail account, then add a shortcut to it in the root folder (i.e. 'My Drive') of the new Google account. This way you don't need to re-upload your documents.

In [14]:
import os
llm_data_folder = "/content/drive/My Drive/llm_data"
print('llm_data_folder ', llm_data_folder)

llm_data_folder  /content/drive/My Drive/llm_data
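
Before indexing, it can help to confirm that the folder is reachable and actually contains documents, since a wrong path or an unmounted drive otherwise only shows up later as a loader error. The sketch below is one way to do that; `list_documents` is a hypothetical helper name, and skipping hidden files is an assumption of this example.

```python
import os
from typing import List


def list_documents(folder: str) -> List[str]:
    """Return the sorted names of regular, non-hidden files directly inside `folder`."""
    if not os.path.isdir(folder):
        raise FileNotFoundError(f"Data folder not found: {folder}")
    files = sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name)) and not name.startswith(".")
    )
    if not files:
        raise ValueError(f"Data folder is empty: {folder}")
    return files
```

For example, `print(list_documents(llm_data_folder))` would show which files are about to be indexed.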


# Install Python Dependencies

While the dependencies install, you might see a pip dependency error that asks you to restart the session. If so, click the 'Restart Session' button at the end of the error message.

Then run the command once again. It should finish without errors.

In [None]:
%pip install llama-index llama-parse ollama llama-index-llms-ollama  llama-index-llms-huggingface \
llama-index-embeddings-huggingface llama-index-extractors-entity transformers torch accelerate bitsandbytes joblib

In this document, we use Llama 3 from both Ollama and HuggingFace. The Llama 3 model on HuggingFace is not public, so you need to request access from the model's owner. It's a simple process: create a HuggingFace account and request access to this model, https://huggingface.co/meta-llama/Meta-Llama-3-8B

In the following command, we log in to the HuggingFace CLI. This part is optional if you are only going to use Llama 3 from Ollama.

Create your personal access token in the profile settings of the HuggingFace portal, https://huggingface.co/settings/tokens


In [None]:
# Login to Huggingface cli
# uncomment the following for the login
# !huggingface-cli login --token your-huggingface-access-token-goes-here
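
Pasting a token into a notebook cell makes it easy to leak when the notebook is shared. As an alternative sketch, the token could be read from an environment variable and passed to `huggingface_hub.login`. The helper names `read_hf_token` and `login_to_huggingface` are hypothetical, and the example assumes the token has been stored in the `HF_TOKEN` environment variable beforehand.

```python
import os


def read_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read the access token from an environment variable, failing loudly if unset."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your access token")
    return token


def login_to_huggingface() -> None:
    """Log in with the token from the environment instead of a hard-coded string."""
    # Imported lazily so reading the token works even without huggingface_hub installed.
    from huggingface_hub import login
    login(token=read_hf_token())
```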

# Code

In [16]:
# define some config classes
from enum import Enum

class ModelSource(Enum):
    Default = 0
    Ollama = 1
    HuggingFace = 2

class QuantizationBit(Enum):
    Default = 0
    Four = 4
    Eight = 8

class OllamaConfig():
    def __init__(self, model_name: str, request_timeout: int):
        self.model_name = model_name
        self.request_timeout = request_timeout

class HuggingFaceConfig():
    def __init__(self, model_name:str, quantize_model: bool, quantization_bit: QuantizationBit):
        self.model_name = model_name
        self.quantize_model = quantize_model
        self.quantization_bit = quantization_bit

class ModelConfig():
    def __init__(self, model_source: ModelSource,
                 ollama_config: OllamaConfig,
                 huggingface_config: HuggingFaceConfig):
        self.model_source = model_source
        self.ollama_config = ollama_config
        self.huggingface_config = huggingface_config


In [26]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
import json
from collections import Counter
from tqdm import tqdm
import ollama
import torch
from transformers import BitsAndBytesConfig
from llama_index.core.prompts import PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM
from ollama import Client
from joblib import Parallel, delayed

class ExtractInformation():
    def __init__(self, model_config: ModelConfig):
        self.model_config = model_config

    def get_llm_model(self):
        if self.model_config.model_source == ModelSource.Ollama:
            print(f'Using Ollama model:{self.model_config.ollama_config.model_name}')
            return Ollama(model = self.model_config.ollama_config.model_name,
                          request_timeout = self.model_config.ollama_config.request_timeout,
                          json_mode=True)

        elif self.model_config.model_source == ModelSource.HuggingFace:
            # Only build a quantization config when quantization is requested;
            # None loads the model without quantization.
            quantization_config = None
            if self.model_config.huggingface_config.quantize_model:
                if self.model_config.huggingface_config.quantization_bit == QuantizationBit.Four:
                    quantization_config = BitsAndBytesConfig(
                        load_in_4bit=True,
                        bnb_4bit_compute_dtype=torch.float16,
                        bnb_4bit_quant_type="nf4",
                        bnb_4bit_use_double_quant=True,
                    )
                elif self.model_config.huggingface_config.quantization_bit == QuantizationBit.Eight:
                    quantization_config = BitsAndBytesConfig(
                        load_in_8bit=True,
                        llm_int8_enable_fp32_cpu_offload=True
                    )
                else:
                    # Adjacent string literals avoid embedding the indentation
                    # that a backslash continuation would put into the message.
                    raise ValueError(
                        "Invalid Huggingface quantization bit: "
                        f"{self.model_config.huggingface_config.quantization_bit}")

            print(f'Using Huggingface model:{self.model_config.huggingface_config.model_name}, '
                  f'quantize:{self.model_config.huggingface_config.quantize_model}, '
                  f'quantization bit:{self.model_config.huggingface_config.quantization_bit}')
            return HuggingFaceLLM(
                model_name = self.model_config.huggingface_config.model_name,
                tokenizer_name = self.model_config.huggingface_config.model_name,
                context_window=4096,
                max_new_tokens=256,
                model_kwargs={"quantization_config": quantization_config},
                tokenizer_kwargs={"max_length": 4096},
                # do_sample=True is required for temperature, top_k and top_p to take effect
                generate_kwargs={"do_sample": True, "temperature": 0.7, "top_k": 50, "top_p": 0.95},
                device_map="auto"
            )
        else:
            # Fail fast instead of silently returning None for an unsupported source.
            raise ValueError(f"Unsupported model source: {self.model_config.model_source}")


    def call_llm(self, query_engine, query_text: str, retry_on_missing_model: bool = True):
        try:
            return query_engine.query(query_text)
        except Exception as ex:
            model_missing = (self.model_config.model_source == ModelSource.Ollama
                             and "Client error '404 Not Found'" in str(ex))
            if model_missing and retry_on_missing_model:
                print("LLM not found. Pulling model:", self.model_config.ollama_config.model_name)
                # Ollama is installed on localhost here, so the module-level pull works.
                # If Ollama is installed somewhere else, use the commented code below instead.
                ollama.pull(self.model_config.ollama_config.model_name)

                # from ollama import Client
                # client = Client(host=ollama_url_goes_here)
                # client.pull(self.model_config.ollama_config.model_name)
                print("LLM pull completed. Retrying the LLM call once.")
                # Retry only once so that a persistent 404 cannot recurse forever.
                return self.call_llm(query_engine, query_text, retry_on_missing_model=False)
            print("Exception in calling llm ", ex)
            # A bare raise re-raises the original exception with its traceback intact.
            raise

    def find_value_from_document(self, query_engine, query_text: str):
        query_result = self.call_llm(query_engine, query_text)
        if not query_result:
            raise ValueError('No response from llm call')
        query_response = ''
        if query_result.response:
            query_response = query_result.response

        return {
            'query_text': query_text,
            'response': query_response
        }

    def extract(self, input_data_folder: str, query_text: str):
        Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
        Settings.llm = self.get_llm_model()

        documents = SimpleDirectoryReader(input_data_folder).load_data()
        vector_index = VectorStoreIndex.from_documents(documents)
        query_engine = vector_index.as_query_engine()

        return self.find_value_from_document(query_engine, query_text)
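
Note that `extract` reloads the embedding model, the LLM, and the vector index on every call, so each query in an interactive loop pays the full indexing cost again. One possible way to avoid that is to cache the query engine per data folder. The sketch below shows only the caching pattern: `QueryEngineCache` and `build_engine` are hypothetical names, and `build_engine` stands in for the loading and indexing steps inside `extract`.

```python
from typing import Any, Callable, Dict


class QueryEngineCache:
    """Build a query engine once per folder and reuse it for later queries."""

    def __init__(self, build_engine: Callable[[str], Any]):
        self._build_engine = build_engine
        self._engines: Dict[str, Any] = {}

    def get(self, folder: str) -> Any:
        # Build lazily on first use, then return the cached engine.
        if folder not in self._engines:
            self._engines[folder] = self._build_engine(folder)
        return self._engines[folder]

    def invalidate(self, folder: str) -> None:
        # Drop the cached engine, e.g. after documents in the folder change.
        self._engines.pop(folder, None)
```

The trade-off is freshness: documents added to the folder after the first query are not seen until `invalidate` is called for that folder.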




In [27]:
continue_execution = True
ollama_config = OllamaConfig("llama3", 60)
model_config = ModelConfig(ModelSource.Ollama, ollama_config, None)
extract_information = ExtractInformation(model_config)
while continue_execution:
    input_text = input("Enter your query (Enter exit to exit the program): ")
    if input_text == "exit":
        continue_execution = False
    else:
        print("Query: ", input_text)
        response = extract_information.extract(llm_data_folder, input_text)
        print('Response:', response['response'])


Enter your query (Enter exit to exit the program): What is interesting point here
Query:  What is interesting point here




Using Ollama model:llama3
Response: {"It was as weird as it sounds." : "I resumed all my old patterns, except now there were doors where there hadn't been. Now when I was tired of walking, all I had to do was raise my hand, and (unless it was raining) a taxi would stop to pick me up. Now when I walked past charming little restaurants I could go in and order lunch."}
Enter your query (Enter exit to exit the program): exit
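
Because the Ollama model is created with `json_mode=True`, the response text is a JSON string, as in the output above. A small sketch of parsing it defensively follows; `parse_json_response` is a hypothetical helper, and wrapping unparseable text under a `raw_response` key is a choice made for this example.

```python
import json
from typing import Any


def parse_json_response(response_text: str) -> Any:
    """Parse the LLM response as JSON, falling back to the raw text on failure."""
    try:
        return json.loads(response_text)
    except json.JSONDecodeError:
        # Models do not always return valid JSON, so keep the raw text available.
        return {"raw_response": response_text}
```

With this helper, `parse_json_response(response['response'])` yields a Python object that the rest of a pipeline can inspect instead of a plain string.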
