# Google LLMs Reference Architecture with Redis Enterprise

<a href="https://colab.research.google.com/github/RedisVentures/redis-google-llms/blob/main/BigQuery_Palm_Redis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook serves as a getting started guide for working with LLMs on Google Cloud Platform with Redis Enterprise.

## Intro
Google's Vertex AI now includes [Generative AI](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview). It comes with an [in-console studio experience](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/quickstart), a [dedicated API](https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/api-quickstart), and a [Python SDK](https://cloud.google.com/vertex-ai/docs/python-sdk/use-vertex-ai-python-sdk) for deploying and managing instances of Google's PaLM language models. PaLM models focus on text generation, summarization, chat completion, and embedding creation.

Redis Enterprise offers vector database features: an API for vector index creation and management, distance metric selection, similarity search, and hybrid filtering. Combined with its versatile data structures (lists, hashes, JSON, and sets), a streamlined architecture, and low latency, this makes Redis Enterprise a strong fit for building Large Language Model (LLM)-based applications in production environments.

___
## Contents
- Setup
    1. Prerequisites
    2. Create BigQuery Tables
    3. Generate Embeddings
        
        a. Embed Text

    4. Load Embeddings to Redis
    5. Create Index
- LLM Application
    1. Basic Semantic Search
    2. Retrieval Augmented Generation (RAG)
    3. Caching
    4. Memory
- Cleanup

___

# Setup

## 1. Prerequisites
Before we begin, we must install some required libraries, authenticate with Google, create a Redis database, and initialize other required components.

### Install required libraries

In [4]:
!pip install redis "google-cloud-aiplatform==1.25.0" --upgrade --user

Collecting redis
  Downloading redis-4.6.0-py3-none-any.whl (241 kB)
Collecting google-cloud-aiplatform==1.25.0
  Downloading google_cloud_aiplatform-1.25.0-py2.py3-none-any.whl (2.6 MB)
Collecting google-cloud-resource-manager<3.0.0dev,>=1.3.3 (from google-cloud-aiplatform==1.25.0)
  Downloading google_cloud_resource_manager-1.10.2-py2.py3-none-any.whl (321 kB)
Collecting shapel

^^^ If prompted press the Restart button to restart the kernel. ^^^

### Install Redis locally (optional)
If you have a Redis db running elsewhere with [Redis Stack](https://redis.io/docs/about/about-stack/) installed, you don't need to run it on this machine. You can skip to the "Connect to Redis server" step.

In [1]:
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb focal main
Starting redis-stack-server, database path /var/lib/redis-stack


gpg: cannot open '/dev/tty': No such device or address
(23) Failed writing body


### Connect to Redis server
Replace the connection params below with your own if you are connecting to an external Redis instance.

In [2]:
import os
import redis

# Redis connection params
REDIS_HOST = os.getenv("REDIS_HOST", "localhost") #"redis-12110.c82.us-east-1-2.ec2.cloud.redislabs.com"
REDIS_PORT = os.getenv("REDIS_PORT", "6379")      #12110
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "") #"pobhBJP7Psicp2gV0iqa2ZOc1WdXXXXX"

# Create Redis client
redis_client = redis.Redis(
  host=REDIS_HOST,
  port=REDIS_PORT,
  password=REDIS_PASSWORD
)

# Test connection
redis_client.ping()

True

In [3]:
# Clear Redis database (optional)
redis_client.flushdb()

True

### Authenticate to Google Cloud

In [4]:
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Authenticated


In [6]:
from getpass import getpass

# input your GCP project ID and region for Vertex AI
PROJECT_ID = getpass("PROJECT_ID:") #'central-beach-194106'
REGION = input("REGION:") #'us-central1'

PROJECT_ID:··········
REGION:us-central1


### Initialize Vertex AI Components



In [15]:
import vertexai

vertexai.init(project=PROJECT_ID, location=REGION)

## 2. Create BigQuery Table
The second step is preparing the dataset for our LLM applications. We use a free, public Hacker News dataset from **Google BigQuery**.

*BigQuery is a common choice for building ML applications because of its powerful query and analytics capabilities.*

We will start by creating our own BigQuery table for the dataset. If you have a different dataset to work with, you can follow a similar pattern, or load a CSV from a Google Cloud Storage bucket into BigQuery.

### Create source table
The first step is to create a new table from the public data source.

In [7]:
from google.cloud import bigquery

# Create bigquery client
bq = bigquery.Client(project=PROJECT_ID)

TABLE_NAME = input("Input a Big Query TABLE_NAME:") #hackernews
DATASET_ID = f"{PROJECT_ID}.google_redis_llms"

# Create dataset
dataset = bigquery.Dataset(DATASET_ID)
dataset.location = "US"
dataset = bq.create_dataset(dataset, timeout=30, exists_ok=True)

# Define table ID
TABLE_ID = f"{DATASET_ID}.{TABLE_NAME}"

Input a Big Query TABLE_NAME:hackernews


In [8]:
# Create source table
def create_source_table(table_id: str):
    create_job = bq.query(f'''
      CREATE OR REPLACE TABLE `{table_id}` AS (
        SELECT
          title, text, time, timestamp, id
        FROM `bigquery-public-data.hacker_news.full`
        WHERE
          type ='story'
        LIMIT 1000
      )
    ''')
    res = create_job.result() # Make an API request
    table = bq.get_table(table_id)
    return table

In [9]:
# Create table
table = create_source_table(TABLE_ID)

# List schema
table.schema

[SchemaField('title', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('text', 'STRING', 'NULLABLE', None, None, (), None),
 SchemaField('time', 'INTEGER', 'NULLABLE', None, None, (), None),
 SchemaField('timestamp', 'TIMESTAMP', 'NULLABLE', None, None, (), None),
 SchemaField('id', 'INTEGER', 'NULLABLE', None, None, (), None)]

The dataset above contains records of Hacker News posts, including the **title**, **text**, **time**, **id**, and **timestamp**. Let's pull a few sample rows and inspect them.

In [13]:
# Query a sample record from BQ
query_job = bq.query(f'''
SELECT *
FROM `{TABLE_ID}`
LIMIT 5
''')

query_job.result().to_dataframe()

|   | title | text | time | timestamp | id |
|---|-------|------|------|-----------|----|
| 0 | D | ZiKoNWuK6Bu0 | 1611707194 | 2021-01-27 00:26:34+00:00 | 25922721 |
| 1 | Ads | asddddddddddddddd | 1585224810 | 2020-03-26 12:13:30+00:00 | 22692739 |
| 2 | Help | Is there any one who can help me better under... | 1490188004 | 2017-03-22 13:06:44+00:00 | 13930560 |
| 3 | Hola | Como Estas | 1570682296 | 2019-10-10 04:38:16+00:00 | 21210935 |
| 4 | Advice | I am New to HN. What is that 1 advice you woul... | 1667047291 | 2022-10-29 12:41:31+00:00 | 33383542 |


## 3. Generate Embeddings

### Create text embeddings with Vertex AI embedding model
Use the [Vertex AI API for text embeddings](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings), developed by Google.

> Text embeddings are a dense vector representation of a piece of content such that, if two pieces of content are semantically similar, their respective embeddings are located near each other in the embedding vector space. This representation can be used to solve common NLP tasks, such as:
> - **Semantic search**: Search text ranked by semantic similarity.
> - **Recommendation**: Return items with text attributes similar to the given text.
> - **Classification**: Return the class of items whose text attributes are similar to the given text.
> - **Clustering**: Cluster items whose text attributes are similar to the given text.
> - **Outlier Detection**: Return items where text attributes are least related to the given text.

The `textembedding-gecko` model accepts a maximum of 3,072 input tokens (sub-word units, so usually somewhat fewer words) and outputs 768-dimensional vector embeddings.
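To make "located near each other" concrete, here is a minimal sketch of cosine similarity and cosine distance in plain Python. The three-dimensional vectors and their names are made up for illustration; real embeddings from the model have 768 dimensions. The `COSINE` metric used for the Redis index in this notebook reports a distance, so lower scores mean closer matches.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar."""
    return 1.0 - cosine_similarity(a, b)

# Hypothetical 3-dimensional "embeddings" (illustrative values only)
linux_laptop = [0.9, 0.1, 0.0]
ubuntu_desktop = [0.8, 0.2, 0.1]
honda_civic = [0.0, 0.1, 0.9]

# The two computing-related vectors are much closer to each other
print(cosine_distance(linux_laptop, ubuntu_desktop))
print(cosine_distance(linux_laptop, honda_civic))
```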

### Define embedding helper function
We define a helper function, `embed_text`, to create embeddings from a list of texts while making it resilient to [Vertex AI API quotas](https://cloud.google.com/vertex-ai/docs/quotas) via [exponential backoff](https://en.wikipedia.org/wiki/Exponential_backoff).
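As a rough illustration of the retry idea, the sketch below implements exponential backoff with random jitter using only the standard library. It is a simplified stand-in, not the implementation of the `tenacity` decorator used in this notebook; `call_with_backoff` and `flaky` are hypothetical names, and the `sleep` parameter exists only so the example can record waits instead of pausing.

```python
import random
import time

def call_with_backoff(fn, max_attempts=3, min_wait=1.0, max_wait=20.0, sleep=time.sleep):
    """Call fn(); on failure wait a random, exponentially growing delay and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The upper bound doubles with each attempt, capped at max_wait
            upper = min(max_wait, min_wait * 2 ** attempt)
            sleep(random.uniform(min_wait, upper))

# Hypothetical flaky function that fails twice, then succeeds
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("quota exceeded")
    return "ok"

waits = []
result = call_with_backoff(flaky, sleep=waits.append)  # record waits instead of sleeping
print(result, len(waits))
```

The random component spreads retries out so that many clients hitting the same quota do not all retry at the same instant.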

We also define a function to convert an array of floats to a byte string for efficient storage in Redis (later on).
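The byte-string conversion can be sketched with the standard library's `struct` module. This assumes little-endian 32-bit floats, which is what NumPy's `float32` `tobytes()` produces on typical x86 and ARM machines; the function names here are illustrative and not part of the notebook.

```python
import struct

def floats_to_float32_bytes(values):
    """Pack floats as consecutive little-endian 32-bit floats."""
    return struct.pack(f"<{len(values)}f", *values)

def float32_bytes_to_floats(blob):
    """Inverse of floats_to_float32_bytes."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))

vector = [0.5, -0.25, 0.125]  # values exactly representable in float32
blob = floats_to_float32_bytes(vector)

print(len(blob))                      # 4 bytes per dimension
print(float32_bytes_to_floats(blob))  # round trip back to floats
```

At 4 bytes per dimension, a 768-dimensional embedding occupies 3,072 bytes in a Redis hash field.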



In [25]:
from typing import Generator, List, Any

import numpy as np
from tenacity import retry, stop_after_attempt, wait_random_exponential
from vertexai.preview.language_models import TextEmbeddingModel

# Embedding model definition from VertexAI PaLM API
embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")
VECTOR_DIMENSIONS = 768

# Retry up to 3 times, waiting a random, exponentially growing interval (between 1 and 20 seconds)
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(3))
def embed_text(text: List[str]) -> List[List[float]]:
    embeddings = embedding_model.get_embeddings(text)
    return [each.values for each in embeddings]

# Convert embeddings to bytes for Redis storage
def convert_embedding(emb: List[float]) -> bytes:
    return np.array(emb).astype(np.float32).tobytes()

### Embed text
At the moment, our table in BigQuery (created above) contains records of the Hacker News posts that we wish to embed and make available for LLMs.

In order to conserve RAM usage of this machine, we will iterate over batches of posts from BigQuery, create embeddings, and write them to Redis, which is being used as a [vector database](https://redis.com/solutions/use-cases/vector-database).
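The batching pattern can be sketched without BigQuery. The first generator yields the `LIMIT`/`OFFSET` pairs that page through a table; the second splits one batch into small chunks for embedding requests. Both helper names are hypothetical, and the chunk size of five mirrors the split used in this notebook. Because these are generators, only one page or chunk needs to be held in memory at a time.

```python
from typing import Iterator, List, Tuple

def page_offsets(max_rows: int, rows_per_batch: int) -> Iterator[Tuple[int, int]]:
    """Yield (limit, offset) pairs that cover max_rows without overlap."""
    for offset in range(0, max_rows, rows_per_batch):
        yield min(rows_per_batch, max_rows - offset), offset

def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

pages = list(page_offsets(max_rows=250, rows_per_batch=100))
print(pages)

batch = [f"post {i}" for i in range(12)]
print([len(chunk) for chunk in chunked(batch, 5)])
```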

In [17]:
import pandas as pd
import numpy as np

QUERY_TEMPLATE = f"""
SELECT id, title, text
FROM `{TABLE_ID}`
ORDER BY id
LIMIT {{limit}} OFFSET {{offset}};
"""

def query_bigquery_batches(
    max_rows: int,
    rows_per_batch: int,
    start_batch: int = 0
) -> Generator[pd.DataFrame, Any, None]:
    # Generate batches from a table in big query
    for offset in range(start_batch, max_rows, rows_per_batch):
        query = QUERY_TEMPLATE.format(limit=rows_per_batch, offset=offset)
        query_job = bq.query(query)
        rows = query_job.result()
        df = rows.to_dataframe()
        # Join title and text fields (both are NULLABLE, so replace missing values first)
        df[["title", "text"]] = df[["title", "text"]].fillna("")
        df["content"] = "Title: " + df["title"] + ". Content: " + df["text"]
        yield df


Below we define a few helper functions for processing a single row of data, writing batches to **Redis**, querying source data from **BigQuery**, and creating text embeddings with **Vertex AI**.

In [18]:
import math
from tqdm.auto import tqdm


# Redis key helper function
def redis_key(key_prefix: str, id: str) -> str:
  return f"{key_prefix}:{id}"

# Process a single dataset record
def process_record(record: dict) -> dict:
  return {
      'id': record['id'],
      'embedding': record['embedding'],
      'text': record['text'],
      'title': record['title']
  }

# Load batch of data into Redis as HASH objects
def load_redis_batch(
    redis_client: redis.Redis,
    dataset: list,
    key_prefix: str = "doc",
    id_column: str = "id",
):
    pipe = redis_client.pipeline()
    for i, record in enumerate(tqdm(dataset)):
        record = process_record(record)
        key = redis_key(key_prefix, record[id_column])
        pipe.hset(key, mapping=record)
    pipe.execute()

# Run the entire process
def create_embeddings_bigquery_redis(redis_client):
    # Create generator from BigQuery
    max_rows = 1000
    rows_per_batch = 100
    bq_content_query = query_bigquery_batches(max_rows, rows_per_batch)

    for batch in tqdm(bq_content_query):
      # Split batch into smaller chunks for embedding generation
      batch_splits = np.array_split(batch, math.ceil(rows_per_batch/5))
      # Create embeddings
      batch["embedding"] = [
          convert_embedding(embedding)
          for split in batch_splits
          for embedding in embed_text(split.content.tolist())
      ]
      # Write batch to Redis
      batch = batch.to_dict("records")
      load_redis_batch(redis_client, batch)



## 4. Load embeddings to Redis
Now that we have a function to generate BigQuery batches, create text embeddings, and write batches to Redis, we can run the single function to process our entire dataset:

In [19]:
create_embeddings_bigquery_redis(redis_client)

In [20]:
# Validate how many records are stored in Redis
redis_client.dbsize()

1000

## 5. Create vector index

Now that we have created embeddings that represent the text in our dataset and stored them in Redis, we will create a secondary index that enables efficient search over the embeddings. To learn more about the vector similarity features in Redis, [check out these docs](https://redis.io/docs/interact/search-and-query/search/vectors/) and [these Redis AI resources](https://github.com/RedisVentures/redis-ai-resources).

**Why do we need to enable search?**
Using Redis for vector similarity search allows us to retrieve chunks of text data that are **similar** or **relevant** to an input question or query. This will be extremely helpful for our sample generative AI / LLM application.
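Conceptually, a k-nearest-neighbour (KNN) query scores stored vectors against the query vector and returns the closest ones. The sketch below does this exhaustively in plain Python, which is the idea behind the `FLAT` index type used in this notebook; it is not how Redis implements it. The keys and two-dimensional vectors are made up for illustration.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def knn(query, docs, k):
    """Exhaustive search: score every document, keep the k closest."""
    scored = [(cosine_distance(query, vec), key) for key, vec in docs.items()]
    return sorted(scored)[:k]

# Hypothetical keys and 2-dimensional vectors
docs = {
    "doc:1": [1.0, 0.0],
    "doc:2": [0.7, 0.7],
    "doc:3": [0.0, 1.0],
}

print(knn([1.0, 0.1], docs, k=2))
```

Exhaustive search is exact but its cost grows linearly with the number of stored vectors, which is why approximate index types exist for larger datasets.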

In [21]:
from redis.commands.search.field import (
    NumericField,
    TagField,
    TextField,
    VectorField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query


INDEX_NAME = "google:idx"
PREFIX = "doc:"
VECTOR_FIELD_NAME = "embedding"

# Create a vector index over the hashes already stored in Redis
def create_redis_index(
    redis_client: redis.Redis,
    id_column: str = "id",
    vector_field_name: str = VECTOR_FIELD_NAME,
    index_name: str = INDEX_NAME,
    prefix: list = [PREFIX],
    dim: int = VECTOR_DIMENSIONS
  ):

    # Drop the index if it already exists (the stored documents are kept)
    try:
        redis_client.ft(index_name).info()
        print("Existing index found. Dropping and recreating the index", flush=True)
        redis_client.ft(index_name).dropindex(delete_documents=False)
    except redis.exceptions.ResponseError:
        # FT.INFO raises a ResponseError when the index does not exist yet
        print("Creating new index", flush=True)

    # Create new index
    redis_client.ft(index_name).create_index(
        [
            VectorField(
                vector_field_name, "FLAT",
                {
                    "TYPE": "FLOAT32",
                    "DIM": dim,
                    "DISTANCE_METRIC": "COSINE",
                }
            )
        ],
        definition=IndexDefinition(prefix=prefix, index_type=IndexType.HASH)
    )

In [22]:
# Create index
create_redis_index(redis_client)

Creating new index


In [23]:
# Inspect index attributes
redis_client.ft(INDEX_NAME).info()

{'index_name': 'google:idx',
 'index_options': [],
 'index_definition': [b'key_type',
  b'HASH',
  b'prefixes',
  [b'doc:'],
  b'default_score',
  b'1'],
 'attributes': [[b'identifier',
   b'embedding',
   b'attribute',
   b'embedding',
   b'type',
   b'VECTOR']],
 'num_docs': '221',
 'max_doc_id': '221',
 'num_terms': '0',
 'num_records': '221',
 'inverted_sz_mb': '0',
 'vector_index_sz_mb': '17592186044416',
 'total_inverted_index_blocks': '0',
 'offset_vectors_sz_mb': '0',
 'doc_table_size_mb': '0.016265869140625',
 'sortable_values_size_mb': '0',
 'key_table_size_mb': '0.0088224411010742188',
 'records_per_doc_avg': '1',
 'bytes_per_record_avg': '0',
 'offsets_per_term_avg': '0',
 'offset_bits_per_record_avg': '-nan',
 'hash_indexing_failures': '0',
 'indexing': '1',
 'percent_indexed': '0.221',
 'gc_stats': [b'bytes_collected',
  b'0',
  b'total_ms_run',
  b'0',
  b'total_cycles',
  b'0',
  b'average_cycle_time_ms',
  b'-nan',
  b'last_run_time_ms',
  b'0',
  b'gc_numeric_trees_mi

In [24]:
# Retrieve single HASH from Redis
key = redis_client.keys()[1]
redis_client.hgetall(key)

{b'text': b'Honda Civic is a small family car which is manufactured by Japanese corporation called Honda. This car has gone from different generations and then becomes up market, large in size and more comfortable. In Pakistan this car is very popular and mostly peoples like buy and drive. find more information about it by visiting our website.',
 b'id': b'3347264',
 b'title': b'Honda Civic',
 b'embedding': b'F\x02\x15\xbc&\xbe?\xbdI\x0f\x12=\r\xae\xee\xbb\n\xff\xfc<S\x19\xe6\xbc\xe8e4=VW1=C\xe0\t\xbd\x15\xde \xbd5\x04\x85<\x8b\xe5$\xbd\x06`\xf6;F\xe5\x1a=\xbe5\x18\xbd\x9a5\'<\x8f\x15\xc0\xbdp\xb9\xd2\xbbf\x8a\x10=\tt\xb1:\xff\x88\xcf\xbdz{\xca<{?\x90\xbb\xc6A\xa0\xbb\xfci\xf6\xbc\x889k\xbdY\x9f:\xbbh_\x13=\xa1\xdf\x0b\xbd\x06\xe1\x92<\x9b\xa38<\x0cD\x11<=\x1c\x9a\xbc\xda\xc0\xad\xbc\xb3A\x8e\xbc-\x16\xd4<\xb6\x99\xaa\xbc\xc4\xc0\xc5<Q`\x86=\xa63\xfe<\x0f\xc6\x8a<\xe0-\xdc\xbc\xd1\r\xa9<\xc8\xe0:=\x89\xcb1\xbc\x0c%\x0c\xb8\xa9j\x9b\xbc\x8d\x9e\xbf<\x81\xa63\xbc\xe0w\xa8\xbc \xeb\xf4\xb

At this point, our **Redis** datastore is completely loaded with a subset of data from **BigQuery** including text embeddings created with **Vertex AI** PaLM APIs.

# LLM Application
With Redis fully loaded as a vector database and powerful PaLM APIs at our disposal, we can build a number of sample LLM applications, including:

- **Simple Semantic Search**
- **RAG** (Retrieval Augmented Generation)

We will also highlight two important use cases for leveraging Redis within production LLM apps.

- **Caching**
- **Memory**

### Simple Semantic Search


**Semantic Search**, in the context of Large Language Models (LLMs) and end-user applications, is a search technique that goes beyond *literal* keyword matching to capture the contextual meaning and intent behind user queries. Here, Google's Vertex AI platform turns both the documents and the query into embeddings, and Redis' vector database capabilities find the stored documents whose embeddings are closest to the query embedding.

This allows applications to return results that are contextually relevant, enhancing user experience by offering meaningful responses, even to complex or ambiguous search terms. Thus, semantic search not only boosts the accuracy and relevancy of search results but also empowers applications to interact with users in a more human-like, intuitive manner.

In [26]:
query = "What is the best computer operating system for software dev?"
query_vector = embed_text([query])[0]

In [27]:
# Our query has been converted to a list of floats (this is a truncated view)
query_vector[:10]

[0.019742609933018684,
 -0.006285817362368107,
 0.057011738419532776,
 0.0630318894982338,
 0.019907860085368156,
 -0.045217469334602356,
 0.0657828152179718,
 0.027572786435484886,
 -0.056532640010118484,
 0.003752157324925065]

In [2]:
# Helper method to perform KNN similarity search in Redis
def similarity_search(query: str, k: int) -> list:
  query_vector = embed_text([query])[0]
  redis_query = (
      Query(f"*=>[KNN {k} @{VECTOR_FIELD_NAME} $embedding AS score]")
          .sort_by("score")
          .return_fields("score", "title", "text")
          .paging(0, k)
          .dialect(2)
  )

  results = redis_client.ft(INDEX_NAME).search(
      redis_query, query_params={"embedding": convert_embedding(query_vector)}
  )

  return pd.DataFrame([t.__dict__ for t in results.docs ]).drop(columns=["payload"])


In [29]:
# Test it out!

results = similarity_search(query, k=5)

display(results)

|   | id | score | title | text |
|---|----|-------|-------|------|
| 0 | doc:23429219 | 0.276380896568 | Ask HN: Laptop and OS setup for total web deve... | As a web developer I was handed a MacBook a co... |
| 1 | doc:14396292 | 0.326441407204 | Ask HN: Recommend me a linux compatible laptop | Looking for a new laptop. I do full stack deve... |
| 2 | doc:9320563 | 0.329460144043 | Ask HN: Best server setup for running machine ... | I&#x27;m into applying machine learning for la... |
| 3 | doc:24626052 | 0.332834243774 | What is the best accounting software for a sma... | I just stumblupon this topic, so like to ask y... |
| 4 | doc:6132351 | 0.345210790634 | Ask HN: Software Stack for WebApp Python vs. R... | I have been seriously out of the hard core tec... |


Results above indicate that our search for recommended operating systems for software devs yielded some posts from Hacker News that might be helpful in answering this question.

**Interested in tuning the search results?**
- Try using a different [Distance Metric](https://redis.io/docs/interact/search-and-query/search/vectors/#creation-attributes-per-algorithm)
- Try using a different [Index Type](https://redis.io/docs/interact/search-and-query/search/vectors/#flat)
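As a starting point for that tuning, the attribute dictionaries below sketch what the two index types might look like. The attribute names follow the Redis vector search documentation linked above; the specific `M` and `EF_CONSTRUCTION` values are illustrative, not recommendations, and the variable names are hypothetical.

```python
VECTOR_DIMENSIONS = 768  # matches the embedding size used in this notebook

# FLAT: exhaustive search, exact results
flat_attributes = {
    "TYPE": "FLOAT32",
    "DIM": VECTOR_DIMENSIONS,
    "DISTANCE_METRIC": "L2",  # alternatives: "IP", "COSINE"
}

# HNSW: approximate search over a graph index; tuning values are illustrative
hnsw_attributes = {
    "TYPE": "FLOAT32",
    "DIM": VECTOR_DIMENSIONS,
    "DISTANCE_METRIC": "COSINE",
    "M": 16,
    "EF_CONSTRUCTION": 200,
}

print(sorted(hnsw_attributes))
```

Either dictionary can be passed as the attributes argument of `VectorField`, with `"FLAT"` or `"HNSW"` as the algorithm name. The index has to be dropped and recreated for a change to take effect.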

### Retrieval Augmented Generation (RAG)

**Retrieval Augmented Generation** (RAG), within the scope of Large Language Models (LLMs), is a technique that combines the knowledge of domain-specific data and generative methods to enhance the production of contextually-rich question responses. In essence, RAG functions by retrieving relevant information from a knowledge base of documents or data before proceeding to generate a response. It exploits the strengths of Redis as a low latency vector database for efficient retrieval operations and Google's Vertex AI to generate a coherent and relevant text response. In LLM applications, RAG thus enables a deeper comprehension of context, returning highly nuanced responses, even to intricate queries. This approach enhances the interactive capability of applications, delivering more precise and informative responses, thereby significantly enriching the user experience.


In order to build a RAG pipeline for question answering, we need to use Vertex PaLM API for text generation (`text-bison@001`).

In [1]:
from vertexai.preview.language_models import TextGenerationModel

# Define generation model
generation_model = TextGenerationModel.from_pretrained("text-bison@001")

simple_prompt = "What is a large language model?"
response = generation_model.predict(prompt=simple_prompt)

print(response.text)



In order to answer questions that reference domain-specific sources (like our sample Hacker News dataset), we must build a RAG pipeline:

1. This technique requires we perform **Semantic Search** with the user query on the knowledge base (stored in Redis) to find relevant sources that will help the language model answer or respond intelligently.

2. The sources (called context) are "stuffed" into the prompt (input).

3. Lastly, the full prompt is passed on to the language model for text generation.
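The three steps above can be sketched independently of Redis and Vertex AI by passing the retrieval and generation steps in as functions. Everything named here (`build_prompt`, `answer`, `fake_retrieve`, `fake_generate`, and the template text) is a hypothetical stand-in so the flow can be run and inspected locally.

```python
from typing import Callable, List

PROMPT_TEMPLATE = """Use the sources below to answer the question.

SOURCES:
{sources}

QUESTION:
{query}

ANSWER:"""

def build_prompt(query: str, sources: List[str], template: str = PROMPT_TEMPLATE) -> str:
    """Step 2: stuff the retrieved sources into the prompt."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(sources, start=1))
    return template.format(sources=numbered, query=query)

def answer(query: str,
           retrieve: Callable[[str, int], List[str]],
           generate: Callable[[str], str],
           k: int = 3) -> str:
    sources = retrieve(query, k)           # step 1: semantic search
    prompt = build_prompt(query, sources)  # step 2: prompt stuffing
    return generate(prompt)                # step 3: text generation

# Stand-ins so the sketch runs without Redis or Vertex AI
def fake_retrieve(query: str, k: int) -> List[str]:
    return ["Linux is popular with developers.", "macOS ships with a Unix shell."][:k]

def fake_generate(prompt: str) -> str:
    return f"(model output for a {len(prompt)}-character prompt)"

print(build_prompt("Which OS should I use?", fake_retrieve("", 2)))
print(answer("Which OS should I use?", fake_retrieve, fake_generate))
```

Keeping retrieval and generation behind plain function arguments makes it easy to test prompt construction on its own before wiring in the real services.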

In [None]:
def rag(query: str, prompt: str) -> str:
    """
    Simple pipeline for performing retrieval augmented generation with
    Google Vertex PaLM API and Redis Enterprise.
    """
    # Perform a vector similarity search in Redis
    print("Pulling relevant data sources from Redis", flush=True)
    relevant_sources = similarity_search(query, k=3)
    print("Relevant sources found!", flush=True)
    # Join the retrieved post texts into a single block of context
    sources = "\n\n".join(relevant_sources["text"].astype(str))
    full_prompt = prompt.format(sources=sources, query=query)
    print("Full prompt:", full_prompt, flush=True)
    # Generate a response grounded in the retrieved sources
    response = generation_model.predict(prompt=full_prompt)
    return response.text



Below is an example prompt. Feel free to edit and tweak the initial sentence that sets the context for the language model to perform the action we are anticipating. The process of tuning and iterating on prompt design is widely referred to as "*prompt engineering*".

In [None]:
prompt = """You are a helpful technology and IT assistant. Use the relevant content and sources in the Hacker News posts below to answer the following user question.

SOURCES:
{sources}

QUESTION:
{query}

ANSWER:"""



In [None]:
response = rag(query=query, prompt=prompt)

### LLM Caching

**LLM Caching** is a strategy for optimizing the performance of Large Language Model (LLM) applications. Using Redis as a fast, in-memory data store, an application can store and quickly retrieve responses previously generated by Google's Vertex AI (PaLM). Repeated queries then skip the computationally expensive generation step, which lowers response times and resource usage and makes the overall architecture more scalable.

There are primarily two modes of caching:
- Standard Caching
- Semantic Caching
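The difference between the two modes can be sketched in plain Python. Standard caching looks up the exact prompt string (hashed here into a key), while semantic caching reuses a response when a previously seen prompt's embedding is close enough. The dictionary and the `SemanticCache` class are in-memory stand-ins for Redis, the `llmcache:` key prefix and the 0.1 distance threshold are assumptions, and the two-dimensional embeddings are made up.

```python
import hashlib
import math

def cache_key(prompt: str) -> str:
    """Standard caching: identical prompts map to the same key."""
    return "llmcache:" + hashlib.sha256(prompt.encode("utf-8")).hexdigest()

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

class SemanticCache:
    """Semantic caching: reuse a response when a stored prompt embedding is close enough."""

    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, embedding):
        best = min(self.entries, key=lambda e: cosine_distance(embedding, e[0]), default=None)
        if best is not None and cosine_distance(embedding, best[0]) <= self.threshold:
            return best[1]
        return None

    def put(self, embedding, response):
        self.entries.append((embedding, response))

# Standard cache backed by a dict (stand-in for a Redis client)
store = {}
store[cache_key("What is Redis?")] = "An in-memory data store."
print(store.get(cache_key("What is Redis?")))  # exact match: hit
print(store.get(cache_key("what is redis")))   # different string: miss

# Semantic cache with hypothetical 2-dimensional embeddings
cache = SemanticCache(threshold=0.1)
cache.put([1.0, 0.0], "An in-memory data store.")
print(cache.get([0.99, 0.05]))  # near-duplicate question: hit
print(cache.get([0.0, 1.0]))    # unrelated question: miss
```

With Redis, the standard cache would map to string `SET`/`GET` calls (optionally with an expiry), and the semantic lookup would be a KNN query with `k=1` against a vector index of prompt embeddings.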

### Memory

TODO

## Hello PaLM!

A large language model (LLM) is a type of artificial intelligence (AI) model that can understand and generate human language. LLMs are trained on massive datasets of text and code, and they can learn to perform a wide variety of tasks, such as translating languages, writing different kinds of creative content, and answering your questions in an informative way.

LLMs are still under development, but they have the potential to revolutionize many industries. For example, LLMs could be used to create more accurate and personalized customer service experiences, to help doctors diagnose and treat diseases, and to even write entire books and movies.




## Hello Chat!

In [None]:

from vertexai.preview.language_models import ChatModel

chat_model = ChatModel.from_pretrained("chat-bison@001")

chat = chat_model.start_chat()

print(
    chat.send_message(
        """
Hello! Can you write a 300 word abstract for a research paper I need to write about the impact of generative AI on society?
"""
    )
)


print(
    chat.send_message(
        """
Could you give me a catchy title for the paper?
"""
    )
)

Generative AI (GAN) is a type of machine learning that uses artificial neural networks to create new, original content. This can include images, text, music, and even videos. GANs have the potential to revolutionize many industries, from healthcare to advertising. However, there are also concerns about the potential negative impacts of GANs, such as the creation of fake news and deepfakes.

In this paper, we explore the potential impact of GANs on society. We first discuss the benefits of GANs, such as their ability to create new and original content, to solve real-world problems, and to democratize creativity
Generative AI: The Future of Creativity
