<a href="https://colab.research.google.com/github/jayozer/advanced_llm/blob/main/Jay_Module_1b_Advanced_LLMs_BigQuery_Palm_Redis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**If you use our code, please cite:**

@misc{2024<br>
  title = {LLM Reference Architecture using Redis & Google Cloud Platform},<br>
  author = {Hamza Farooq, Darshil Modi, Kanwal Mehreen, Nazila Shafiei},<br>
  keywords = {Semantic Cache},<br>
  year = {2024},<br>
  copyright = {APACHE 2.0 license}<br>
}



# LLM Reference Architecture using Redis & Google Cloud Platform

<a href="https://colab.research.google.com/github/RedisVentures/redis-google-llms/blob/main/BigQuery_Palm_Redis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook serves as a getting started guide for working with LLMs on Google Cloud Platform with Redis Enterprise.

## Intro
Google's Vertex AI has expanded its capabilities by introducing [Generative AI](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview). This technology comes with a specialized [in-console studio experience](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/quickstart), a [dedicated API](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/api-quickstart), and a [Python SDK](https://cloud.google.com/vertex-ai/docs/python-sdk/use-vertex-ai-python-sdk) designed for deploying and managing instances of Google's PaLM language models. With a distinct focus on text generation, summarization, chat completion, and embedding creation, PaLM models are reshaping the boundaries of natural language processing and machine learning.

Redis Enterprise offers robust vector database features, with an efficient API for vector index creation, management, distance metric selection, similarity search, and hybrid filtering. When coupled with its versatile data structures - including lists, hashes, JSON, and sets - Redis Enterprise shines as the optimal solution for crafting high-quality Large Language Model (LLM)-based applications. It embodies a streamlined architecture and exceptional performance, making it an instrumental tool for production environments.

![](https://github.com/RedisVentures/redis-google-llms/blob/main/assets/GCP_RE_GenAI.drawio.png?raw=true)

Below we will work through several design patterns with Vertex AI LLMs and Redis Enterprise that will ensure optimal production performance.

___
## Contents
- Setup
    1. Prerequisites
    2. Create BigQuery Table
    3. Generate Embeddings
        
        a. Embed Text Data

    4. Load Embeddings to Redis
    5. Create Index
- Build LLM Applications
- LLM Design Patterns
    1. Semantic Search
    2. Retrieval Augmented Generation (RAG)
    3. Caching
    4. Memory
- Cleanup

___

# Setup

## 1. Prerequisites
Before we begin, we must install some required libraries, authenticate with Google, create a Redis database, and initialize other required components.

### Install required libraries

In [2]:
!pip install redis "google-cloud-aiplatform==1.25.0" --upgrade --user



In [1]:
!pip install huggingface datasets

Collecting huggingface
  Using cached huggingface-0.0.1-py3-none-any.whl (2.5 kB)
Collecting datasets
  Using cached datasets-2.20.0-py3-none-any.whl (547 kB)
Collecting pyarrow>=15.0.0 (from datasets)
  Using cached pyarrow-16.1.0-cp310-cp310-manylinux_2_28_x86_64.whl (40.8 MB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Using cached dill-0.3.8-py3-none-any.whl (116 kB)
Collecting requests>=2.32.2 (from datasets)
  Using cached requests-2.32.3-py3-none-any.whl (64 kB)
Collecting xxhash (from datasets)
  Using cached xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
Collecting multiprocess (from datasets)
  Using cached multiprocess-0.70.16-py310-none-any.whl (134 kB)
Installing collected packages: huggingface, xxhash, requests, pyarrow, dill, multiprocess, datasets
  Attempting uninstall: requests
    Found existing installation: requests 2.31.0
    Uninstalling requests-2.31.0:
      Successfully uninstalled requests-2.31.0
  Attempting uninstall: p

^^^ If prompted press the Restart button to restart the kernel. ^^^

### Install Redis locally (optional)
If you have a Redis db running elsewhere with [Redis Stack](https://redis.io/docs/about/about-stack/) installed, you don't need to run it on this machine. You can skip to the "Connect to Redis server" step.

In [None]:
# %%sh
# curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
# echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
# sudo apt-get update  > /dev/null 2>&1
# sudo apt-get install redis-stack-server  > /dev/null 2>&1
# redis-stack-server --daemonize yes

### Using Free Redis Cloud account on GCP
You can also use a Forever Free instance of Redis Cloud. To activate it:
- Head to https://redis.com/try-free/
- Register (Gmail-based registration is the easiest)
- Create a new subscription
- Use the following options:
    - Fixed plan, Google Cloud
    - New 30 MB free database
- Create a new Redis Stack database

If you are registering with Redis Cloud for the first time, the last few steps are performed for you by default. Capture the host, port, and default password of the new database. You can use these instead of the default `localhost`-based values in the following code block.

### Connect to Redis server
Replace the connection params below with your own if you are connecting to an external Redis instance.

In [3]:
from google.colab import userdata
import redis

# Redis connection params
REDIS_HOST = userdata.get('REDIS_HOST')
REDIS_PORT = int(userdata.get('REDIS_PORT'))  # Colab secrets are strings; the port should be an int
REDIS_PASSWORD = userdata.get('REDIS_PASSWORD')

# Create Redis client
redis_client = redis.Redis(
  host=REDIS_HOST,
  port=REDIS_PORT,
  password=REDIS_PASSWORD)

# Test connection
redis_client.ping()

True

In [4]:
# Clear Redis database (optional)
redis_client.flushdb()

True

### Authenticate to Google Cloud

In [5]:
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Authenticated


In [6]:
from getpass import getpass

# input your GCP project ID and region for Vertex AI
PROJECT_ID = getpass("PROJECT_ID:") #acrobatllm
REGION = input("REGION:") #us-central1

PROJECT_ID:··········
REGION:us-central1


### Initialize Vertex AI Components



In [7]:
import vertexai

vertexai.init(project=PROJECT_ID , location=REGION)

In [8]:
from google.colab import auth

PROJECT_ID = "acrobatllm"  # @param {type:"string"}

auth.authenticate_user(project_id=PROJECT_ID)

In [9]:
! gcloud services enable compute.googleapis.com aiplatform.googleapis.com storage.googleapis.com bigquery.googleapis.com --project {PROJECT_ID}

Operation "operations/acat.p2-670255580476-1b644dbf-47d1-460e-9316-bcb1debcb8d1" finished successfully.


In [10]:
!gcloud config get-value project
!gcloud config get-value region

acrobatllm
ERROR: (gcloud.config.get-value) Section [core] has no property [region].


In [37]:
!gcloud compute regions list


NAME                     CPUS  DISKS_GB  ADDRESSES  RESERVED_ADDRESSES  STATUS  TURNDOWN_DATE
africa-south1            0/24  0/4096    0/8        0/8                 UP
asia-east1               0/24  0/4096    0/8        0/8                 UP
asia-east2               0/24  0/4096    0/8        0/8                 UP
asia-northeast1          0/24  0/4096    0/8        0/8                 UP
asia-northeast2          0/24  0/4096    0/8        0/8                 UP
asia-northeast3          0/24  0/4096    0/8        0/8                 UP
asia-south1              0/24  0/4096    0/8        0/8                 UP
asia-south2              0/24  0/4096    0/8        0/8                 UP
asia-southeast1          0/24  0/4096    0/8        0/8                 UP
asia-southeast2          0/24  0/4096    0/8        0/8                 UP
australia-southeast1     0/24  0/4096    0/8        0/8                 UP
australia-southeast2     0/24  0/4096    0/8        0/8                 UP
europe

In [12]:
# Set the desired region - I had to do it this way since I could not figure out from the console
!gcloud config set compute/region us-central1

Updated property [compute/region].


In [13]:
# Checking Region - Learned why we choose us-central1 the hard way. The limits for west are low
!gcloud config configurations list

NAME     IS_ACTIVE  ACCOUNT  PROJECT     COMPUTE_DEFAULT_ZONE  COMPUTE_DEFAULT_REGION
default  True                acrobatllm                        us-central1


## Test Vertex AI

In [14]:
!pip3 install "google-cloud-aiplatform>=1.25"




In [15]:
from vertexai.preview.language_models import TextEmbeddingModel
model = TextEmbeddingModel.from_pretrained("textembedding-gecko")

embeddings = model.get_embeddings(["Dinner in New York City"])
for embedding in embeddings:
  vector = embedding.values
  print(vector)

[0.03092864528298378, -0.029868636280298233, -0.029459740966558456, -0.0009445197647437453, 0.002139341551810503, 0.003000705735757947, 0.013629520311951637, 0.04873180389404297, 0.010373848490417004, 0.026051972061395645, 0.016042839735746384, -0.0191232617944479, -0.01023846585303545, -0.04757988452911377, -0.006645353976637125, 0.015495737083256245, -0.01545269787311554, 0.015853025019168854, 0.0457160621881485, -0.03305236995220184, -0.04611865431070328, 0.008199586533010006, -0.028266873210668564, -0.023114023730158806, -0.019202029332518578, 0.004720576573163271, 0.053389158099889755, -0.06605908274650574, -0.011999640613794327, 0.0033735090401023626, -0.04246285557746887, 0.04036254063248634, -0.0734783336520195, 0.041490502655506134, 0.035548970103263855, -0.04976203665137291, -0.0034386806655675173, 0.03629659116268158, -0.006216620095074177, 0.04435154050588608, -0.024378718808293343, -0.029397232457995415, -0.0809723362326622, -0.05687682703137398, 0.029550012201070786, -0.0

## 2. Create BigQuery Table
The second step involves preparing the dataset for our LLM applications. We use a public hotel reviews dataset from Hugging Face (`traversaal-ai-hackathon/hotel_datasets`) and load it into **Google BigQuery**.

*Leveraging BigQuery is a common pattern for building ML applications because of its powerful query and analytics capabilities.*

We will start by creating our own BigQuery table for the dataset. If you have a different dataset to work with, you can follow a similar pattern, or even load a CSV from a Google Cloud Storage bucket into BigQuery.

### Create source table
The first step is to download the public dataset and load it into a pandas DataFrame.

In [16]:
from datasets import load_dataset
import pandas as pd

dataset = load_dataset("traversaal-ai-hackathon/hotel_datasets")

Generating train split:   0%|          | 0/5997 [00:00<?, ? examples/s]

In [17]:
df=pd.DataFrame(dataset['train'])

In [18]:
df.head()

Unnamed: 0,hotel_name,hotel_description,review_title,review_text,rate,tripdate,hotel_url,hotel_image,price_range,rating_value,review_count,street_address,locality,country
0,Romance Istanbul Hotel,Romance Istanbul Hotel has 39 rooms.Every room...,"An exceptional boutique hotel, great value for...",,,February 2020,https://www.tripadvisor.com/Hotel_Review-g2939...,https://media-cdn.tripadvisor.com/media/photo-...,$ (Based on Average Nightly Rates for a Standa...,5.0,4023,Hudavendigar Cd. No:5 Sirkeci,Istanbul,Turkiye
1,Romance Istanbul Hotel,Romance Istanbul Hotel has 39 rooms.Every room...,You can’t get better than this.,,,March 2021,https://www.tripadvisor.com/Hotel_Review-g2939...,https://media-cdn.tripadvisor.com/media/photo-...,$ (Based on Average Nightly Rates for a Standa...,5.0,4023,Hudavendigar Cd. No:5 Sirkeci,Istanbul,Turkiye
2,Romance Istanbul Hotel,Romance Istanbul Hotel has 39 rooms.Every room...,Exceeds all expectations,,,March 2021,https://www.tripadvisor.com/Hotel_Review-g2939...,https://media-cdn.tripadvisor.com/media/photo-...,$ (Based on Average Nightly Rates for a Standa...,5.0,4023,Hudavendigar Cd. No:5 Sirkeci,Istanbul,Turkiye
3,Romance Istanbul Hotel,Romance Istanbul Hotel has 39 rooms.Every room...,"Great Location, Fantastic Accommodations",,,August 2021,https://www.tripadvisor.com/Hotel_Review-g2939...,https://media-cdn.tripadvisor.com/media/photo-...,$ (Based on Average Nightly Rates for a Standa...,5.0,4023,Hudavendigar Cd. No:5 Sirkeci,Istanbul,Turkiye
4,Romance Istanbul Hotel,Romance Istanbul Hotel has 39 rooms.Every room...,Perfection. It is all in the details.,,,June 2021,https://www.tripadvisor.com/Hotel_Review-g2939...,https://media-cdn.tripadvisor.com/media/photo-...,$ (Based on Average Nightly Rates for a Standa...,5.0,4023,Hudavendigar Cd. No:5 Sirkeci,Istanbul,Turkiye


### Add an index column

In [19]:
df["id"] = df.index + 1

In [20]:
from google.cloud import bigquery

# Create bigquery client
bq = bigquery.Client(project=PROJECT_ID)

TABLE_NAME = input("Input a Big Query TABLE_NAME:") #hotel_data, hotel_reviews
DATASET_ID = f"{PROJECT_ID}.google_redis_llms"

# Create dataset
dataset = bigquery.Dataset(DATASET_ID)
dataset.location = "US"
dataset = bq.create_dataset(dataset, timeout=30, exists_ok=True)

# Define table ID
TABLE_ID = f"{DATASET_ID}.{TABLE_NAME}"

Input a Big Query TABLE_NAME:hotel_reviews


In [21]:
# client = bigquery.Client()
job = bq.load_table_from_dataframe(
    df, TABLE_ID
)  # Make an API request.
job.result()

LoadJob<project=acrobatllm, location=US, id=32a44a0c-bb7d-459c-87a7-11c7aff4aee5>

In [22]:
table = bq.get_table(TABLE_ID)

In [23]:
table

Table(TableReference(DatasetReference('acrobatllm', 'google_redis_llms'), 'hotel_reviews'))

In [24]:
table.schema

[SchemaField('hotel_name', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('hotel_description', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('review_title', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('review_text', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('rate', 'FLOAT', 'NULLABLE', None, None, (), None),
 SchemaField('tripdate', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('hotel_url', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('hotel_image', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('price_range', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('rating_value', 'FLOAT', 'NULLABLE', None, None, (), None),
 SchemaField('review_count', 'INTEGER', 'NULLABLE', None, None, (), None),
 SchemaField('street_address', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('locality', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('country', 'STRING', 'NULLABLE', None, None

Make sure the BigQuery API is enabled for your project; see https://cloud.google.com/bigquery/docs/explore-data-colab

## Load data from BigQuery - Hotel reviews.csv

In [25]:
from google.cloud import bigquery

PROJECT_ID = "acrobatllm"
TABLE_NAME = "hotel_reviews"

query = f"""
SELECT * FROM `{PROJECT_ID}.google_redis_llms.{TABLE_NAME}`
"""

# Initialize the BigQuery client
client = bigquery.Client(project=PROJECT_ID)

results = client.query(query).to_dataframe()

In [28]:
results.head(3)

Unnamed: 0,hotel_name,hotel_description,review_title,review_text,rate,tripdate,hotel_url,hotel_image,price_range,rating_value,review_count,street_address,locality,country,id
0,Citadines Tour Eiffel Paris,,No pride of ownership,If you’ve ever stayed at a hotel which owners ...,2.0,November 2023,https://www.tripadvisor.com/Hotel_Review-g1871...,https://media-cdn.tripadvisor.com/media/photo-...,$$ (Based on Average Nightly Rates for a Stand...,4.0,471,132 boulevard de Grenelle 15th Arr.,Paris,France,5598
1,Citadines Tour Eiffel Paris,,Location Location!,"Citadines for is located in a great place, clo...",4.0,April 2023,https://www.tripadvisor.com/Hotel_Review-g1871...,https://media-cdn.tripadvisor.com/media/photo-...,$$ (Based on Average Nightly Rates for a Stand...,4.0,471,132 boulevard de Grenelle 15th Arr.,Paris,France,5599
2,Citadines Tour Eiffel Paris,,Amazing stay!,We absolutely loved this hotel! The staff was ...,5.0,November 2023,https://www.tripadvisor.com/Hotel_Review-g1871...,https://media-cdn.tripadvisor.com/media/photo-...,$$ (Based on Average Nightly Rates for a Stand...,4.0,471,132 boulevard de Grenelle 15th Arr.,Paris,France,5600


## 3. Generate Embeddings

### Create text embeddings with Vertex AI embedding model
Use the [Vertex AI API for text embeddings](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings), developed by Google.

> Text embeddings are a dense vector representation of a piece of content such that, if two pieces of content are semantically similar, their respective embeddings are located near each other in the embedding vector space. This representation can be used to solve common NLP tasks, such as:
> - **Semantic search**: Search text ranked by semantic similarity.
> - **Recommendation**: Return items with text attributes similar to the given text.
> - **Classification**: Return the class of items whose text attributes are similar to the given text.
> - **Clustering**: Cluster items whose text attributes are similar to the given text.
> - **Outlier Detection**: Return items where text attributes are least related to the given text.

The `textembedding-gecko` model accepts a maximum of 3,072 input tokens (tokens are sub-word units, not whole words) and outputs 768-dimensional vector embeddings.
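To make "located near each other" concrete, here is a minimal sketch that compares made-up 3-dimensional vectors with cosine similarity. The vectors and the `cosine_similarity` helper are illustrative assumptions, not model output; real `textembedding-gecko` vectors have 768 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (assumed values, for illustration only)
hotel_near_louvre = [0.9, 0.1, 0.0]
hotel_by_museum = [0.8, 0.2, 0.1]
pizza_recipe = [0.0, 0.2, 0.9]

sim_related = cosine_similarity(hotel_near_louvre, hotel_by_museum)
sim_unrelated = cosine_similarity(hotel_near_louvre, pizza_recipe)

# Semantically related texts should score higher than unrelated ones
print(sim_related > sim_unrelated)
```

A similarity close to 1 means the vectors point in nearly the same direction; a value close to 0 means they are nearly unrelated.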

### Define embedding helper function
We define a helper function, `embed_text`, to create embeddings from a list of texts while making it resilient to [Vertex AI API quotas](https://cloud.google.com/vertex-ai/docs/quotas) via [exponential backoff](https://en.wikipedia.org/wiki/Exponential_backoff).
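The notebook relies on the `tenacity` library for retries. As a rough illustration of the idea only (not tenacity's implementation), the sketch below retries a failing call with a random delay whose upper bound doubles on each attempt. The `retry_with_backoff` and `flaky` names are hypothetical.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=3, base=1.0, cap=20.0, sleep=time.sleep):
    """Call fn(); on failure wait a random, exponentially growing delay and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # The upper bound doubles with each attempt, capped at `cap` seconds
            delay = random.uniform(0, min(cap, base * 2 ** attempt))
            sleep(delay)


# Hypothetical flaky call that fails twice (e.g. quota errors) and then succeeds
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("quota exceeded")
    return "ok"


# Skip real sleeping in this demo by passing a no-op sleep function
result = retry_with_backoff(flaky, sleep=lambda seconds: None)
print(result, calls["n"])
```

Randomizing the delay spreads out retries from many clients so they do not all hit the API again at the same moment.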

We also define a method to convert an array of floats to a byte string for efficient storage in Redis (later on).
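As a small, standard-library sketch of what that conversion does, the example below packs floats as 32-bit values and unpacks them again. The `floats_to_bytes` and `bytes_to_floats` names are hypothetical; the notebook's own helper uses NumPy (`astype(np.float32).tobytes()`), which should produce the same bytes on little-endian machines.

```python
import struct


def floats_to_bytes(values):
    """Pack floats as little-endian 32-bit values (4 bytes each)."""
    return struct.pack(f"<{len(values)}f", *values)


def bytes_to_floats(blob):
    """Inverse of floats_to_bytes."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


# These values are exactly representable in float32, so the round trip is lossless
vector = [0.5, -0.25, 0.125]
blob = floats_to_bytes(vector)

print(len(blob))                        # 3 floats * 4 bytes each
print(bytes_to_floats(blob) == vector)  # round trip recovers the values
```

At 4 bytes per dimension, a 768-dimensional embedding occupies 3,072 bytes. Values that are not exactly representable in float32 lose a little precision in the conversion.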



In [29]:
vertexai.init()

In [30]:
!pip install -U sentence-transformers

Collecting sentence-transformers
  Downloading sentence_transformers-3.0.1-py3-none-any.whl (227 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.11.0->sentence-transformers)
  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch>=1.11.0->sentence-transformers)
  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch>=1.11.0->sentence-transformers)
  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (

In [31]:
from typing import Generator, List, Any

import numpy as np
from tenacity import retry, stop_after_attempt, wait_random_exponential
from vertexai.preview.language_models import TextEmbeddingModel

# Embedding model from the Vertex AI PaLM API
embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
VECTOR_DIMENSIONS = 768

# Retry up to 3 times, waiting a random, exponentially growing delay (capped at 20 s) between attempts
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(3))
def embed_text(text: List[str]) -> List[List[float]]:
    embeddings = embedding_model.get_embeddings(text)
    return [each.values for each in embeddings]

# Convert embeddings to bytes for Redis storage
def convert_embedding(emb: List[float]) -> bytes:
    return np.array(emb).astype(np.float32).tobytes()

### Embed text data
At the moment, our table in BigQuery (created above), contains records of hotel review posts that we wish to embed and make available for LLMs.

In order to conserve RAM usage of this machine, we will iterate over batches of posts from BigQuery, create embeddings, and write them to Redis, which is being used as a [vector database](https://redis.com/solutions/use-cases/vector-database).

In [32]:
import pandas as pd
import numpy as np

QUERY_TEMPLATE = f"""
SELECT id,review_title, review_text, hotel_name
FROM `{PROJECT_ID}.google_redis_llms.{TABLE_NAME}`
LIMIT {{limit}} OFFSET {{offset}};
"""

def query_bigquery_batches(
    max_rows: int,
    rows_per_batch: int,
    start_batch: int = 0
) -> Generator[pd.DataFrame, Any, None]:
    # Generate batches from a table in big query
    for offset in range(start_batch, max_rows, rows_per_batch):
        query = QUERY_TEMPLATE.format(limit=rows_per_batch, offset=offset)
        query_job = bq.query(query)
        rows = query_job.result()
        df = rows.to_dataframe()
        # Join title and text fields (missing values become empty strings)
        df["content"] = "Title: " + df.review_title.fillna("") + ". Content: " + df.review_text.fillna("")
        yield df


Below we define a few helper functions for processing a single row of data, writing batches to **Redis**, querying source data from **BigQuery**, and creating text embeddings with **Vertex AI**.

In [33]:
import math
from tqdm.auto import tqdm


# Redis key helper function
def redis_key(key_prefix: str, id: str) -> str:
  return f"{key_prefix}:{id}"

# Process a single dataset record
def process_record(record: dict) -> dict:
  return {
      'id': record['id'],
      'embedding': record['embedding'],
      'text': record['review_text'],
      'title': record['review_title']
  }

# Load batch of data into Redis as HASH objects
def load_redis_batch(
    redis_client: redis.Redis,
    dataset: list,
    key_prefix: str = "doc",
    id_column: str = "id",
):
    pipe = redis_client.pipeline()
    for i, record in enumerate(tqdm(dataset)):
        record = process_record(record)
        key = redis_key(key_prefix, record[id_column])
        pipe.hset(key, mapping=record)
    pipe.execute()

# Run the entire process
def create_embeddings_bigquery_redis(redis_client):
    # Create generator from BigQuery
    max_rows = 1000
    rows_per_batch = 100
    bq_content_query = query_bigquery_batches(max_rows, rows_per_batch)

    for batch in tqdm(bq_content_query):
      # Split batch into chunks of about 5 rows to keep each embedding request small
      batch_splits = np.array_split(batch, math.ceil(rows_per_batch/5))
      # Create embeddings
      batch["embedding"] = [
          convert_embedding(embedding)
          for split in batch_splits
          for embedding in embed_text(split.content.tolist())
      ]
      # Write batch to Redis
      batch = batch.to_dict("records")
      load_redis_batch(redis_client, batch)


## 4. Load Embeddings
Now that we have a function to generate BigQuery batches, create text embeddings, and write batches to Redis, we can run the single function to process our entire dataset:

If the Vertex AI API is not yet enabled for your project, enable it here: https://console.cloud.google.com/apis/library/aiplatform.googleapis.com?project=maven-advanced-llm

In [34]:
create_embeddings_bigquery_redis(redis_client)


In [35]:
# Validate how many records are stored in Redis
redis_client.dbsize()

1000

## 5. Create Vector Index

Now that we have created embeddings that represent the text in our dataset and stored them in Redis, we will create a secondary index that enables efficient search over the embeddings. To learn more about the vector similarity features in Redis, [check out these docs](https://redis.io/docs/interact/search-and-query/search/vectors/) and [these Redis AI resources](https://github.com/RedisVentures/redis-ai-resources).

**Why do we need to enable search?**

Using Redis for vector similarity search allows us to retrieve chunks of text data that are **similar** or **relevant** to an input question or query. This will be extremely helpful for our sample generative AI / LLM application.
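Conceptually, an exact (brute-force) nearest-neighbor search compares the query vector against every stored vector and keeps the `k` closest. The sketch below shows that idea in plain Python with made-up 2-dimensional vectors; it is an illustration under those assumptions, not how Redis implements its index, and the `cosine_distance` and `knn` helpers are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; smaller values mean more similar vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def knn(query_vec, docs, k):
    """Return the k (key, distance) pairs closest to query_vec, nearest first."""
    scored = [(key, cosine_distance(query_vec, vec)) for key, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Hypothetical stored vectors keyed like the Redis hashes in this notebook
docs = {
    "doc:1": [1.0, 0.0],
    "doc:2": [0.7, 0.7],
    "doc:3": [0.0, 1.0],
}

top = knn([1.0, 0.1], docs, k=2)
print([key for key, _ in top])
```

Scanning every vector gives exact results but costs time proportional to the number of stored vectors, which is why approximate index types exist for larger datasets.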

In [36]:
from redis.commands.search.field import (
    NumericField,
    TagField,
    TextField,
    VectorField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query


INDEX_NAME = "google:idx"
PREFIX = "doc:"
VECTOR_FIELD_NAME = "embedding"

# Store vectors in redis and create index
def create_redis_index(
    redis_client: redis.Redis,
    vector_field_name: str = VECTOR_FIELD_NAME,
    index_name: str = INDEX_NAME,
    prefix: list = [PREFIX],
    dim: int = VECTOR_DIMENSIONS
  ):

    # Construct index
    try:
        redis_client.ft(index_name).info()
        print("Existing index found. Dropping and recreating the index", flush=True)
        redis_client.ft(index_name).dropindex(delete_documents=False)
    except redis.exceptions.ResponseError:
        # FT.INFO raises a ResponseError when the index does not exist yet
        print("Creating new index", flush=True)

    # Create new index
    redis_client.ft(index_name).create_index(
        (
            VectorField(
                vector_field_name, "FLAT",
                {
                    "TYPE": "FLOAT32",
                    "DIM": dim,
                    "DISTANCE_METRIC": "COSINE",
                }
            )
        ),
        definition=IndexDefinition(prefix=prefix, index_type=IndexType.HASH)
    )

In [37]:
# Create index
create_redis_index(redis_client)

Creating new index


In [38]:
# Inspect index attributes
redis_client.ft(INDEX_NAME).info()

{'index_name': 'google:idx',
 'index_options': [],
 'index_definition': [b'key_type',
  b'HASH',
  b'prefixes',
  [b'doc:'],
  b'default_score',
  b'1'],
 'attributes': [[b'identifier',
   b'embedding',
   b'attribute',
   b'embedding',
   b'type',
   b'VECTOR',
   b'algorithm',
   b'FLAT',
   b'data_type',
   b'FLOAT32',
   b'dim',
   768,
   b'distance_metric',
   b'COSINE']],
 'num_docs': '1000',
 'max_doc_id': '1000',
 'num_terms': '0',
 'num_records': '1000',
 'inverted_sz_mb': '0',
 'vector_index_sz_mb': '3.0295867919921875',
 'total_inverted_index_blocks': '0',
 'offset_vectors_sz_mb': '0',
 'doc_table_size_mb': '0.0705718994140625',
 'sortable_values_size_mb': '0',
 'key_table_size_mb': '0.03100299835205078',
 'geoshapes_sz_mb': '0',
 'records_per_doc_avg': '1',
 'bytes_per_record_avg': '0',
 'offsets_per_term_avg': '0',
 'offset_bits_per_record_avg': 'nan',
 'hash_indexing_failures': '0',
 'total_indexing_time': '17.514999389648438',
 'indexing': '0',
 'percent_indexed': '1',


In [39]:
# Retrieve a single HASH from Redis
key = redis_client.keys()[1]
redis_client.hgetall(key)

{b'text': b'After paying in full at check in I was told the room would be ready \xe2\x80\x9cin 20 minutes\xe2\x80\x9d. Rather than standing in a cramped reception we went across the road for a coffee (husband and 2 small Kids). On our return we were given our keys.. go to our 1 bed apartment to find dirty towels on floor of bathroom and a suitcase and food wrappers in bedroom !!! We go back to reception and complete chaos. No one knew what was going on or who was in our room. 4 ladies in reception making many phone calls and looking intently at computers and refusing to give me any information, apologies or assurances. My 2 children extremely tired and becoming increasingly upset. After 2 hours from (from start of check in) manager finally appears and confirms no hotel room available for us. I had to frantically',
 b'embedding': b'W\xfe[=:\xacf\xbd\xc2\xee{\xbd*\x83\x90\xbc\xb7\xe1\x8f=\xfb\nn<\xa9\x83\xc4<^Y#\xbb{\xad\xdd<\xf0g*=W,V\xbcr\xc8#<\xb5\xb7\xeb<O\xbc\xd5\xbcv\xbf\xc9\xbc\x0

At this point, our **Redis** datastore is completely loaded with a subset of data from **BigQuery** including text embeddings created with **Vertex AI** PaLM APIs.

# Build LLM applications
With Redis fully loaded as a vector database and powerful PaLM APIs at our disposal, we can build a number of AI applications on this stack. Below we briefly describe each of these applications and use cases:

- **Document Retrieval** - search through documents to return only the most relevant to a given query.
- **Product Recommendations** - recommend products with similar attributes and descriptions to a product the shopper likes.
- **Chatbots** - provide a conversational interface for information retrieval or customer service.
- **Text Summarization & Generation** - Generate new copy from sources of relevant information to accelerate team output.
- **Fraud/Anomaly Detection** - identify anomalous and potentially fraudulent events, transactions, or items based on attribute similarity of other known entities.

# LLM Design Patterns

In order to build these kinds of apps, below we highlight 4 technical design patterns and techniques where Redis Enterprise comes in handy to boost LLM performance:

- **Semantic Search**
- **Retrieval Augmented Generation (RAG)**
- **Caching**
- **Memory**

Leveraging some combination of these patterns is recommended best practice, derived from enterprise use cases and open source users all over the world.

### Simple Semantic Search


**Semantic Search**, in the context of Large Language Models (LLMs), is a sophisticated search technique that goes beyond *literal* keyword matching to understand the contextual meaning and intent behind user queries. Leveraging the power of Google's Vertex AI platform and Redis' vector database capabilities, semantic search can map and extract deep-level knowledge from vast text datasets, including nuanced relationships and hidden patterns.

This allows applications to return search results that are contextually relevant, enhancing user experience by offering meaningful responses, even to complex or ambiguous search terms. Thus, semantic search not only boosts the accuracy and relevancy of search results but also empowers applications to interact with users in a more human-like, intuitive manner.

The general process of semantic search includes 3 steps:
1. Create query vector
2. Perform vector search
3. Review and return results

In [40]:
# 1. Create query vector
query = "What is the best hotel close to the Louvre?"
query_vector = embed_text([query])[0]

# Our query has been converted to a list of floats (this is a truncated view)
query_vector[:10]

[0.08164466172456741,
 -0.05301200598478317,
 -0.01196212973445654,
 -0.05362908914685249,
 0.0456140898168087,
 -0.020608795806765556,
 0.003969392739236355,
 0.038530223071575165,
 4.5475069782696664e-05,
 0.06444840133190155]

In [41]:
# Helper method to perform KNN similarity search in Redis
def similarity_search(query: str, k: int, return_fields: tuple, index_name: str = INDEX_NAME) -> list:
    # create embedding from query text
    query_vector = embed_text([query])[0]
    # create redis query object
    redis_query = (
        Query(f"*=>[KNN {k} @{VECTOR_FIELD_NAME} $embedding AS score]")
            .sort_by("score")
            .return_fields(*return_fields)
            .paging(0, k)
            .dialect(2)
    )
    # execute the search
    results = redis_client.ft(index_name).search(
        redis_query, query_params={"embedding": convert_embedding(query_vector)}
    )
    return pd.DataFrame([t.__dict__ for t in results.docs ]).drop(columns=["payload"])


In [42]:
# 2. Perform vector similarity search with given query
results = similarity_search(query, k=5, return_fields=("score", "title", "text"))

In [43]:
# 3. Review and return the results
display(results)

| | id | score | title | text |
|---|---|---|---|---|
| 0 | doc:4923 | 0.165503203869 | Wonderful hotel near the Louvre | We stayed at the Maison Favart for three days.... |
| 1 | doc:4810 | 0.17958265543 | 5 stars | Super helpful staff. Nice lobby. Super helpful... |
| 2 | doc:4940 | 0.187327742577 | Weekend in Paris | Great location - peaceful and very central - o... |
| 3 | doc:4926 | 0.190664052963 | Fantastic location | We stayed for a short break for 3 nights . Fan... |
| 4 | doc:5498 | 0.191933095455 | Great location near The Louvre | Upon arrival, the staff at the hotel were very... |


The results above show that our search for hotels near the Louvre returned hotel reviews that are likely to help answer the question. The `score` column is the vector distance between the query and each review, so lower values indicate closer matches.

**Interested in tuning the search results?**
- Try using a different [Distance Metric](https://redis.io/docs/interact/search-and-query/search/vectors/#creation-attributes-per-algorithm)
- Try using a different [Index Type](https://redis.io/docs/interact/search-and-query/search/vectors/#flat)
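
To make the distance-metric choice more concrete, the following is a small, self-contained sketch in plain Python (no Redis involved). It computes textbook versions of the three metrics usually offered for vector indexes (cosine distance, Euclidean/L2 distance, and inner product) on made-up two-dimensional vectors. The vectors and document names are invented for illustration, and the exact score values a search engine reports for each metric may be scaled differently from these definitions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity: 0 means same direction, larger means less similar
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2_distance(a, b):
    # Euclidean distance: sensitive to both direction and vector length
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    # Raw dot product: larger means more similar, and longer vectors score higher
    return dot(a, b)

# Toy 2-d "embeddings" (made up for illustration)
query_vec = [1.0, 0.0]
docs = {
    "same_direction_longer": [3.0, 0.0],
    "slightly_rotated": [0.9, 0.1],
    "orthogonal": [0.0, 1.0],
}

def rank(metric, reverse=False):
    # Order document names from best match to worst under the given metric
    return sorted(docs, key=lambda name: metric(query_vec, docs[name]), reverse=reverse)

cosine_ranking = rank(cosine_distance)           # smaller distance is better
l2_ranking = rank(l2_distance)                   # smaller distance is better
ip_ranking = rank(inner_product, reverse=True)   # larger product is better

print(cosine_ranking, l2_ranking, ip_ranking, sep="\n")
```

Cosine distance ignores vector length, so the longer vector pointing in the same direction ranks first, while L2 ranks it last because it is far away in absolute terms. Which behavior is preferable depends on whether the embedding model's vector lengths carry meaning.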

### Retrieval Augmented Generation (RAG)

**Retrieval Augmented Generation** (RAG), within the scope of Large Language Models (LLMs), is a technique that combines domain-specific data with a generative model to produce context-rich answers to questions. In essence, *RAG* retrieves relevant information from a knowledge base of documents or data before generating a response. This gives general-purpose foundation models access to these data sources at runtime; it is NOT the same thing as fine-tuning, because the model's weights are not changed.

RAG exploits the strengths of Redis as a low-latency vector database for efficient retrieval operations and Google's Vertex AI to generate a coherent text response. In LLM applications, RAG enables a deeper comprehension of context, returning highly nuanced responses, even to intricate queries. This pattern enhances the interactive capability of applications, delivering more precise and informative responses, thereby significantly enriching the user experience.


To build a RAG pipeline for question answering, we use the Vertex PaLM API for text generation (`text-bison@001`).

In [44]:
from vertexai.preview.language_models import TextGenerationModel

# Define generation model
generation_model = TextGenerationModel.from_pretrained("text-bison@001")

response = generation_model.predict(prompt="What is a large language model?")

print("Example response:\n", response.text)


Example response:
 A large language model (LLM) is a type of artificial intelligence (AI) model that can understand and generate human language. LLMs are trained on massive datasets of text and code, and they can learn to perform a wide variety of tasks, such as translating languages, writing different kinds of creative content, and answering your questions in an informative way.

LLMs are still under development, but they have the potential to revolutionize many industries. For example, LLMs could be used to create more accurate and personalized customer service experiences, to help doctors diagnose and treat diseases, and to even write entire books and movies.




To answer questions **while referencing domain-specific sources** (like our sample hotel reviews dataset), we must build a RAG pipeline:

1. First perform **Semantic Search** with the user query on the knowledge base (stored in Redis) to find relevant sources that will help the language model answer and respond intelligently.

2. The sources (called context) are "stuffed" into the prompt (input).

3. Lastly, the full prompt is passed on to the language model for text generation.

In [45]:
def create_prompt(prompt_template: str, **kwargs) -> str:
    # Fill the {sources} and {query} placeholders of the template
    return prompt_template.format(**kwargs)

def rag(query: str, prompt: str, verbose: bool = True) -> str:
    """
    Simple pipeline for performing retrieval augmented generation with
    Google Vertex PaLM API and Redis Enterprise.
    """
    # Perform a vector similarity search in Redis
    if verbose:
        print("Pulling relevant data sources from Redis", flush=True)
    relevant_sources = similarity_search(query, k=3, return_fields=("text",))
    if verbose:
        print("Relevant sources found!", flush=True)
    # Combine the relevant sources and inject into the prompt
    sources_text = "-" + "\n-".join([source for source in relevant_sources.text.values])
    full_prompt = create_prompt(
        prompt_template=prompt,
        sources=sources_text,
        query=query
      )
    if verbose:
        print("\nFull prompt:\n\n", full_prompt, flush=True)
    # Perform text generation to get a response from PaLM API
    response = generation_model.predict(prompt=full_prompt)
    return response.text



Below is an example prompt template. Feel free to edit the initial sentence, which sets the context for the language model to perform the action we expect. The process of tuning and iterating on prompt design is widely referred to as "*prompt engineering*".

In [46]:
PROMPT = """You are a helpful virtual technology and IT assistant. Use the hotel reviews below as relevant context and sources to help answer the user question. Don't blindly make things up.

SOURCES:
{sources}

QUESTION:
{query}?

ANSWER:"""



In [47]:
query = "Best hotel near the Louvre in Paris?"
response = rag(query=query, prompt=PROMPT)
print(response)

Pulling relevant data sources from Redis
Relevant sources found!

Full prompt:

 You are a helpful virtual technology and IT assistant. Use the hotel reviews below as relevant context and sources to help answer the user question. Don't blindly make things up.

SOURCES:
-We stayed at the Maison Favart for three days. First of all, the hotel is close to the Louvre, so it is perfect for a visit there. The rooms and lobby of the hotel are very nicely decorated. The staff were very friendly and catered for all our needs. Would definitely come back next time!
-Great location - peaceful and very central - opposite the Opera House.  Fifteen minutes walk down to the Louvre and many good restaurants close by.  Very comfortable beds and pretty breakfast room.  Front of House staff couldn't be more helpful especially with directions and great restaurants ! 
-We stayed for a short break for 3 nights . Fantastic little boutique French hotel with a great location, Louve museum 15 mins, ritz 18 mins a

In [48]:
query = "What are some amazing hotels near Big ben?"
response = rag(query=query, prompt=PROMPT)
print(response)

Pulling relevant data sources from Redis
Relevant sources found!

Full prompt:

 You are a helpful virtual technology and IT assistant. Use the hotel reviews below as relevant context and sources to help answer the user question. Don't blindly make things up.

SOURCES:
-We had a fantastic stay! This was our second time in this wonderful hotel where everything has this extra something. From the beds , the food and the friendly people at work. We really enjoyed ourselves! 
-Excellent hotel, beds were so comfortable and rooms were clean and of a good size. My daughter and I also loved the free hot chocolate which we enjoyed in the outdoor patio area every evening. Very quiet peaceful hotel, and would highly recommend.
-We had a really wonderful stay the Grand Hotel.  The staff was always friendly and they went out of their way to make our experience lovely and memorable.  Our room was spacious and well-appointed.  The hotel was perfectly located for all the sights and easy access to the m

Clearly this example dataset (hotel reviews) is not the only one we could work with, and it's certainly not "production" ready out of the gate. This is also only utilizing a subset (1000 records) of the actual data for teaching purposes.

However, this example demonstrates how you can combine external sources of data and LLMs to surface more useful information.

### LLM Caching

**LLM Caching** is an advanced strategy used to optimize the performance of Large Language Model (LLM) applications. Utilizing the ultra-fast, in-memory data store of Redis, LLM Caching enables the storage and quick retrieval of pre-computed responses generated by Google's Vertex AI (PaLM). This means the computationally expensive process of response generation, especially for repetitive queries, is significantly reduced, resulting in faster response times and efficient resource utilization. This pairing of Google's powerful generative AI capabilities with Redis' high-performance caching system thus facilitates a more scalable and performant architecture for LLM applications, improving overall user experience and application reliability.

There are primarily two modes of caching for LLMs:
- Standard Caching
- Semantic Caching

#### Standard Caching

Standard caching for LLMs involves simply matching an exact phrase or prompt that has been provided before. We can return the previously used response from the LLM in order to speed up the throughput of the system overall and reduce redundant computation.

In [49]:
# Some boilerplate helper methods
import hashlib

def hash_input(prefix: str, _input: str) -> str:
    # Build a deterministic Redis key from the exact input text
    return prefix + hashlib.sha256(_input.encode("utf-8")).hexdigest()

def standard_check(key: str):
    # Perform a standard (exact-match) cache check; returns None on a miss
    res = redis_client.hgetall(key)
    if res:
        return res[b'response'].decode('utf-8')

def cache_response(query: str, response: str):
    key = hash_input("llmcache:", query)
    redis_client.hset(key, mapping={"prompt": query, "response": response})

# LLM Cache wrapper / decorator function
def standard_llmcache(llm_callable):
    # The wrapped callable must take the query text as its first positional argument
    def wrapper(query: str, *args, **kwargs):
        # Check LLM Cache first
        key = hash_input("llmcache:", query)
        response = standard_check(key)
        # Check if we have a cached response we can use
        if response:
            return response
        # Otherwise execute the llm callable here
        response = llm_callable(query, *args, **kwargs)
        # Cache under the query that was actually passed in, not a global variable
        cache_response(query, response)
        return response

    return wrapper

In [50]:
# Define a function that invokes the PaLM API wrapped with a cache check

@standard_llmcache
def ask_palm(query: str):
  prompt = PROMPT
  response = rag(query, prompt, verbose=False)
  return response

In [51]:
%%time

query = "What are some amazing hotels near Big ben?"

ask_palm(query)

CPU times: user 18.8 ms, sys: 1.99 ms, total: 20.8 ms
Wall time: 989 ms


'The Grand Hotel is a great option for those looking for a hotel near Big Ben. The hotel is located just a short walk from the iconic landmark, and offers a variety of amenities, including a free breakfast buffet, a fitness center, and a rooftop terrace with views of the city. The rooms are spacious and well-appointed, and the staff is friendly and helpful.'

Now if we ask the same question again -- we should get the same response in near real-time.

In [52]:
%%time

ask_palm(query)

CPU times: user 2.06 ms, sys: 2 µs, total: 2.06 ms
Wall time: 39.5 ms


'The Grand Hotel is a great option for those looking for a hotel near Big Ben. The hotel is located just a short walk from the iconic landmark, and offers a variety of amenities, including a free breakfast buffet, a fitness center, and a rooftop terrace with views of the city. The rooms are spacious and well-appointed, and the staff is friendly and helpful.'

#### Semantic Caching - Assignment
Implement semantic caching, then try writing the cached data to GCP BigQuery and retrieving it from there. You can also use a local JSON file instead.
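
As a possible starting point, here is a minimal in-memory sketch of the idea. Where the standard cache above only hits on an identical prompt, a semantic cache embeds the prompt and returns a stored response when a previously seen prompt is close enough in vector space. Everything here is hypothetical and not part of the notebook: the `SemanticCache` class, the `threshold` value, and the `toy_embed` function, which is a crude word-count stand-in for a real embedding call such as `embed_text([query])[0]`.

```python
import math
from typing import Callable, List, Optional

def cosine_distance(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as maximally distant instead of dividing by zero
    return 1.0 if norm == 0 else 1.0 - dot / norm

class SemanticCache:
    """Hypothetical in-memory semantic cache (illustration only)."""

    def __init__(self, embed: Callable[[str], List[float]], threshold: float = 0.1):
        self.embed = embed          # function mapping text to a vector
        self.threshold = threshold  # max cosine distance counted as a hit (assumed value)
        self.entries = []           # list of {"prompt", "vector", "response"} dicts

    def check(self, query: str) -> Optional[str]:
        # Return the response of the closest stored prompt if it is close enough
        if not self.entries:
            return None
        vector = self.embed(query)
        best = min(self.entries, key=lambda e: cosine_distance(vector, e["vector"]))
        if cosine_distance(vector, best["vector"]) <= self.threshold:
            return best["response"]
        return None

    def store(self, query: str, response: str) -> None:
        self.entries.append(
            {"prompt": query, "vector": self.embed(query), "response": response}
        )

# Crude stand-in for a real embedding model: word counts over a tiny vocabulary
VOCAB = ["hotel", "hotels", "big", "ben", "louvre", "near"]

def toy_embed(text: str) -> List[float]:
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]

cache = SemanticCache(toy_embed, threshold=0.3)
cache.store("What are some amazing hotels near Big Ben?", "cached answer")

hit = cache.check("Any amazing hotels near Big Ben?")   # paraphrase of the stored prompt
miss = cache.check("Best hotel near the Louvre?")       # different topic
print(hit, miss)
```

Because `entries` is a list of plain dictionaries, it can be written to and read from a local JSON file with the standard `json` module. The threshold is the main tuning knob: too loose and unrelated questions receive the wrong cached answer, too tight and the cache behaves like an exact-match cache.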


### Memory

Giving your application access to "memory" for chat history is a common technique to improve the model's ability to reason through recent or past conversations, gain context from previous answers, and thus provide a more accurate and acceptable response.

Below we set up simple helper functions to persist and load conversation history in a Redis List data structure.

In [53]:
import json

def add_message(prompt: str, response: str):
    msg = {
        "prompt": prompt,
        "response": response
    }
    redis_client.lpush("chat-history", json.dumps(msg))

def get_messages(k: int = 5):
    # LRANGE's stop index is inclusive, so 0..k-1 returns the k most recent messages
    return [json.loads(msg) for msg in redis_client.lrange("chat-history", 0, k - 1)]

In [54]:

query = "Do you have any advice for getting started in the tech field as a software dev?"
response = rag(query, PROMPT, verbose=False)

print(response)

add_message(query, response)

There are a few things you can do to get started in the tech field as a software developer. First, you need to have a strong foundation in computer science. This includes knowledge of data structures, algorithms, and operating systems. You can learn these concepts by taking courses at a university or online. Second, you need to have experience developing software. You can get this experience by working on personal projects or by contributing to open source projects. Third, you need to be able to demonstrate your skills to potential employers. This can be done by creating a portfolio of your work or by taking part in coding competitions.


In [55]:
query = "What if I am still in college, any tips there?"
response = rag(query, PROMPT, verbose=False)

print(response)

add_message(query, response)

If you are still in college, you may want to consider staying in a hostel. Hostels are typically more affordable than hotels, and they are a great way to meet other travelers. However, it is important to do your research before booking a hostel, as some hostels are better than others.


In [56]:
get_messages()

[{'prompt': 'What if I am still in college, any tips there?',
  'response': 'If you are still in college, you may want to consider staying in a hostel. Hostels are typically more affordable than hotels, and they are a great way to meet other travelers. However, it is important to do your research before booking a hostel, as some hostels are better than others.'},
 {'prompt': 'Do you have any advice for getting started in the tech field as a software dev?',
  'response': 'There are a few things you can do to get started in the tech field as a software developer. First, you need to have a strong foundation in computer science. This includes knowledge of data structures, algorithms, and operating systems. You can learn these concepts by taking courses at a university or online. Second, you need to have experience developing software. You can get this experience by working on personal projects or by contributing to open source projects. Third, you need to be able to demonstrate your skills
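
Note that the helpers above only store and load history. The `rag` function never reads it, which is why the follow-up question about college was answered without reference to the earlier turn. One hedged way to use the stored turns is to fold them into the question text before generation. The sketch below is self-contained; `format_history`, `build_query_with_history`, and the sample messages are hypothetical and made up for illustration.

```python
from typing import Dict, List

def format_history(messages: List[Dict[str, str]]) -> str:
    """Render stored turns oldest-first as plain text (hypothetical helper)."""
    lines = []
    # get_messages() returns newest-first because add_message uses LPUSH,
    # so reverse to restore chronological order
    for msg in reversed(messages):
        lines.append(f"USER: {msg['prompt']}")
        lines.append(f"ASSISTANT: {msg['response']}")
    return "\n".join(lines)

def build_query_with_history(query: str, messages: List[Dict[str, str]]) -> str:
    """Prefix the new question with earlier turns, if there are any."""
    history = format_history(messages)
    if not history:
        return query
    return f"Conversation so far:\n{history}\n\nFollow-up question: {query}"

# Stand-in for the output of get_messages() (newest first); invented sample data
sample_messages = [
    {"prompt": "Is it close to the Louvre?", "response": "Yes, about 15 minutes on foot."},
    {"prompt": "Can you suggest a hotel in Paris?", "response": "Maison Favart is well reviewed."},
]

contextual_query = build_query_with_history("Does it have breakfast?", sample_messages)
print(contextual_query)
```

Whether the combined text should also drive the vector search, or only the final generation prompt, is a design choice: embedding the whole conversation can pull retrieval toward earlier topics, while embedding only the latest question can lose the referent of words like "it".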

# Clean up

In [66]:
# Clean up bigquery
bq.delete_table(TABLE_ID, not_found_ok=True)

bq.delete_dataset(
    DATASET_ID, delete_contents=True, not_found_ok=True
)


In [67]:
# Clean up redis
!redis-stack-server stop

/bin/bash: line 1: redis-stack-server: command not found
