## Preparing all the libraries, imports and data

In [1]:
# Install necessary libraries
!pip install llama-index
!pip install llama-index-experimental
!pip install llama-index-llms-huggingface
!pip install llama-index-embeddings-huggingface
!pip install transformers accelerate bitsandbytes


In [2]:
# Download "medium.csv"
!wget https://raw.githubusercontent.com/giciq/tensorflow_keras_notebooks/main/medium.csv

--2024-04-24 11:56:47--  https://raw.githubusercontent.com/giciq/tensorflow_keras_notebooks/main/medium.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7880045 (7.5M) [text/plain]
Saving to: ‘medium.csv’


2024-04-24 11:56:48 (89.9 MB/s) - ‘medium.csv’ saved [7880045/7880045]



In [3]:
# All the necessary imports
import os
import pandas as pd
from llama_index.core import Settings
import torch
from transformers import BitsAndBytesConfig
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core import ServiceContext, set_global_service_context
from llama_index.core import SimpleDirectoryReader
from llama_index.core import StorageContext
from llama_index.core import VectorStoreIndex, SimpleKeywordTableIndex
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core import QueryBundle
from llama_index.core.schema import NodeWithScore
from llama_index.core.prompts.prompts import SimpleInputPrompt
from llama_index.core.retrievers import (BaseRetriever, VectorIndexRetriever, KeywordTableSimpleRetriever,)
from typing import List

In [43]:
# Hugging Face token
hf_token = "<hugging_face_token>"
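
Hardcoding the token in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the token has been exported in an environment variable named `HF_TOKEN`. The variable name and the `load_hf_token` helper are assumptions made for illustration and are not part of the original notebook.

```python
import os


def load_hf_token(env=None, var_name="HF_TOKEN"):
    """Return the token stored in `var_name`, or raise if it is missing."""
    # `env` is injectable so the helper can be tested without touching os.environ.
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return token


# Usage in the notebook (only works once the variable is set):
# hf_token = load_hf_token()
```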

## Defining LLM model (Llama-2-7b-chat-hf)

In [5]:
# System prompt that tells the assistant how to respond to user queries.
# In the Llama-2 chat format, system instructions sit between <<SYS>> and <</SYS>>.
SYSTEM_PROMPT = """[INST] <<SYS>>
- You are a helpful assistant that finds answers to the user's questions in the file you are given.
- Use only the tool named retriever!
- DON'T SPREAD FALSE INFORMATION.
- If you can't find an answer, say that you can't answer with the provided file.
- Be kind and helpful.
<</SYS>>

"""

# This template wraps the user query within the context of the assistant's guidelines.
query_wrapper_prompt = SimpleInputPrompt(
    "{query_str}[/INST] "
)
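
To see roughly what the model receives, the two templates can be combined by hand. The sketch below assumes the LLM wrapper first fills `{query_str}` and then prepends the system prompt separated by a space; that ordering is an assumption about the wrapper's behavior, not something the notebook states. The shortened system prompt (written with the Llama-2 chat markers `<<SYS>>` and `<</SYS>>`) and the `build_prompt` helper are stand-ins for illustration.

```python
# Shortened stand-in for SYSTEM_PROMPT, using the Llama-2 chat markers.
SYSTEM_PROMPT_DEMO = "[INST] <<SYS>>\n- Be kind and helpful.\n<</SYS>>\n\n"
QUERY_WRAPPER_DEMO = "{query_str}[/INST] "


def build_prompt(query, system_prompt=SYSTEM_PROMPT_DEMO, wrapper=QUERY_WRAPPER_DEMO):
    """Fill the query into the wrapper, then prepend the system prompt."""
    wrapped = wrapper.format(query_str=query)
    return f"{system_prompt} {wrapped}"


print(build_prompt("Describe Logistic Regression"))
```

The single `[INST]` opened in the system prompt is closed by the `[/INST]` in the wrapper, which is why the wrapper contains only the closing marker.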

In [6]:
# Define the quantization configuration for the model.
# Quantization reduces the model's memory footprint so it fits on a smaller GPU.
# Load the weights in 4-bit precision, run computation in 16-bit floating point,
# use the "nf4" quantization type, and enable double quantization.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Instantiate the HuggingFaceLLM with the specified parameters.
llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    # Setting the maximum number of new tokens that can be generated in a single inference.
    max_new_tokens=400,
    # Providing a system prompt that will be used during inference.
    system_prompt=SYSTEM_PROMPT,
    # Providing a wrapper prompt for query processing.
    query_wrapper_prompt=query_wrapper_prompt,
    # Specifying the size of the context window to be used during inference.
    context_window=3900,
    model_kwargs={"token": hf_token, "quantization_config": quantization_config},
    tokenizer_kwargs={"token": hf_token},
    device_map="auto",
)

# Create a service context with the LLM above, a local embedding model, and a chunk size of 1024.
service_context = ServiceContext.from_defaults(
    chunk_size=1024,
    llm=llm,
    embed_model="local:BAAI/bge-small-en-v1.5"
)

# Set the global service context with the newly created service context.
set_global_service_context(service_context)
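
A back-of-the-envelope estimate shows why 4-bit loading matters on a single GPU. The sketch assumes about 7 billion parameters and counts only the weights; activations, the KV cache, and quantization metadata are ignored, so real usage will be higher. `weight_gib` is a hypothetical helper.

```python
def weight_gib(n_params, bits_per_param):
    """Approximate size of the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 7e9  # assumption: roughly 7 billion parameters

for label, bits in [("float16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_gib(N_PARAMS, bits):.1f} GiB")
```

Under these assumptions the 4-bit weights take a quarter of the space of the float16 weights.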



## Loading data, extracting nodes and initializing storage context

In [7]:
# Load data
documents = SimpleDirectoryReader(input_files=["medium.csv"]).load_data()

In [8]:
# Split the loaded documents into nodes (text chunks) using the configured node parser.
nodes = Settings.node_parser.get_nodes_from_documents(documents)
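
The node parser splits each document into overlapping chunks before indexing. The following is a deliberately simplified sketch of that idea: it counts whitespace-separated words rather than tokens and ignores sentence boundaries, so it is not the library's splitter. The function name and the default sizes are illustrative assumptions.

```python
def chunk_words(text, chunk_size=8, overlap=2):
    """Split text into word windows of `chunk_size` that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(20))
for chunk in chunk_words(sample):
    print(chunk)
```

The overlap means that text near a chunk boundary appears in both neighboring chunks, which reduces the chance of cutting a relevant passage in half.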

In [9]:
# Initialize storage context (by default it's in-memory)
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

## Creating custom retriever

In [10]:
# Create a vector index for the documents.
# This index allows for efficient retrieval of documents based on vector representations.
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)

# Create a keyword index for the documents.
# This index allows for efficient retrieval of documents based on keyword matches.
keyword_index = SimpleKeywordTableIndex(nodes, storage_context=storage_context)
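
Conceptually, a keyword table index is an inverted index: each keyword maps to the set of nodes that contain it. Below is a minimal sketch with a regex-based keyword extractor. The stop-word list, node ids, and helper names are made up for illustration and do not mirror the library's internals.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption, not the library's list).
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "and", "in", "for"}


def extract_keywords(text):
    """Lowercase the text and keep alphanumeric tokens that are not stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def build_keyword_table(nodes):
    """Map each keyword to the ids of the nodes whose text contains it."""
    table = defaultdict(set)
    for node_id, text in nodes.items():
        for keyword in extract_keywords(text):
            table[keyword].add(node_id)
    return table


def lookup(table, query):
    """Return ids of nodes that share at least one keyword with the query."""
    hits = set()
    for keyword in extract_keywords(query):
        hits |= table.get(keyword, set())
    return hits


demo_nodes = {
    "n1": "Logistic regression is a classification algorithm.",
    "n2": "Linear regression predicts a real-valued output.",
    "n3": "Decision trees split the feature space.",
}
table = build_keyword_table(demo_nodes)
print(sorted(lookup(table, "Describe logistic regression")))
```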

In [11]:
# Custom retriever class
class CustomRetriever(BaseRetriever):
    """Custom retriever that combines semantic (vector) search with keyword search."""

    def __init__(
        self,
        vector_retriever: VectorIndexRetriever,
        keyword_retriever: KeywordTableSimpleRetriever,
        mode: str = "AND",
    ) -> None:

        self._vector_retriever = vector_retriever
        self._keyword_retriever = keyword_retriever
        if mode not in ("AND", "OR"):
            raise ValueError("Invalid mode.")
        self._mode = mode
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve nodes given query."""

        vector_nodes = self._vector_retriever.retrieve(query_bundle)
        keyword_nodes = self._keyword_retriever.retrieve(query_bundle)

        vector_ids = {n.node.node_id for n in vector_nodes}
        keyword_ids = {n.node.node_id for n in keyword_nodes}

        combined_dict = {n.node.node_id: n for n in vector_nodes}
        combined_dict.update({n.node.node_id: n for n in keyword_nodes})

        if self._mode == "AND":
            retrieve_ids = vector_ids.intersection(keyword_ids)
        else:
            retrieve_ids = vector_ids.union(keyword_ids)

        retrieve_nodes = [combined_dict[rid] for rid in retrieve_ids]
        return retrieve_nodes
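
The two modes differ only in how the id sets are combined. A toy illustration with made-up node ids (the `combine_ids` helper is hypothetical and simply isolates the set logic from the class above):

```python
def combine_ids(vector_ids, keyword_ids, mode="AND"):
    """Intersect the id sets for "AND", unite them for "OR"."""
    if mode not in ("AND", "OR"):
        raise ValueError("Invalid mode.")
    if mode == "AND":
        return vector_ids & keyword_ids
    return vector_ids | keyword_ids


vector_ids = {"n1", "n2"}
keyword_ids = {"n2", "n3"}
print(sorted(combine_ids(vector_ids, keyword_ids, "AND")))  # ['n2']
print(sorted(combine_ids(vector_ids, keyword_ids, "OR")))   # ['n1', 'n2', 'n3']
```

With `AND`, a node must be returned by both retrievers, so a small `similarity_top_k` can leave the result empty. `OR` trades precision for recall.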

In [12]:
# Define retrievers
vector_retriever = VectorIndexRetriever(index=vector_index, similarity_top_k=2)
keyword_retriever = KeywordTableSimpleRetriever(index=keyword_index)

custom_retriever = CustomRetriever(vector_retriever, keyword_retriever)

# Assemble query engine
custom_query_engine = RetrieverQueryEngine(
    retriever=custom_retriever,
)
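
`similarity_top_k=2` keeps only the two nodes whose embeddings score highest against the query embedding. Here is a self-contained sketch of that ranking step, assuming cosine similarity as the scoring function. Real embeddings come from the embedding model; the two-dimensional vectors and helper names here are invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, node_vecs, k=2):
    """Return the k (node_id, score) pairs with the highest similarity."""
    scored = [(node_id, cosine(query_vec, vec)) for node_id, vec in node_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


node_vecs = {"n1": [1.0, 0.0], "n2": [0.8, 0.6], "n3": [0.0, 1.0]}
for node_id, score in top_k([1.0, 0.1], node_vecs):
    print(node_id, round(score, 3))
```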

## Testing the retriever that uses Llama as an LLM

In [44]:
# Make a prompt
prompt = "Describe Logistic Regression"

In [45]:
# Response from the custom query engine backed by Llama-2-7b-chat
response = custom_query_engine.query(prompt)
print(response)

Logistic Regression is a type of supervised learning algorithm used for classification problems. It is a generalization of linear regression to classify the output into two or more categories. In logistic regression, the output is a real-valued function that maps the input features to a probability between 0 and 1. The goal is to find the optimal set of weights and bias that maximizes the likelihood of the correct class given the input features.

The logistic regression model is represented as:

p(y=1|x) = 1 / (1 + e^(-wx-b))

where x is the input feature vector, w is the weight vector, b is the bias term, and y is the target variable. The output of the model is the probability of the input example belonging to the positive class.

The logistic regression model is a non-linear extension of the linear regression model, where the output of the linear regression model is mapped to a probability between 0 and 1 using the sigmoid function. The sigmoid function has an S-shaped curve that ran

In [46]:
# Sources with which the response was generated
for node in response.source_nodes:
    print("Source:\n" + node.text + "\n\n\n")

Source:
Special thanks to the major contributors — In Visal, Yin Seng, Choung Chamnab who make this work possible.
Logistic Regression, Logistic Regression

Contrary to its name logistic regression is a classification algorithm. Given an input example, a logistic regression model assigns the example to a relevant class.

A note on the notation. x_{i} means x subscript i and x_{^th} means x superscript th.

Quick Review of Linear Regression

Linear Regression is used to predict a real-valued output anywhere between +∞ and -∞.

Each example used to train a linear regression model is defined by its properties or features which are collectively called as the feature vector. Your name, age, contact number, gender, et-cetera correspond to a feature vector describing you.

A linear regression model f(x), is a linear combination of the features of the input examples x, and is represented by f(x) = wx+b.

Transforming the original features (consider a 1-dimensional feature vector x) by squaring