## Install required libraries

<a href="https://colab.research.google.com/github/RedisVentures/redis-google-llms/blob/main/BigQuery_Palm_Redis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install redis "google-cloud-aiplatform==1.25.0" --upgrade --user



^^^ If prompted, press the Restart button to restart the kernel. ^^^

In [2]:
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb focal main
Starting redis-stack-server, database path /var/lib/redis-stack


## Connect to Redis server

In [3]:
import os
import redis
REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = int(os.getenv("REDIS_PORT", "6379"))
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")
#Replace values above with your own if using Redis Cloud instance
#REDIS_HOST="redis-12110.c82.us-east-1-2.ec2.cloud.redislabs.com"
#REDIS_PORT=12110
#REDIS_PASSWORD="pobhBJP7Psicp2gV0iqa2ZOc1WdXXXXX"

#shortcut for redis-cli $REDIS_CONN command
if REDIS_PASSWORD!="":
  os.environ["REDIS_CONN"]=f"-h {REDIS_HOST} -p {REDIS_PORT} -a {REDIS_PASSWORD} --no-auth-warning"
else:
  os.environ["REDIS_CONN"]=f"-h {REDIS_HOST} -p {REDIS_PORT}"
# Note: this rebinds the name `redis` from the module to the client instance
redis = redis.Redis(
  host=REDIS_HOST,
  port=REDIS_PORT,
  password=REDIS_PASSWORD)
redis.ping()

True

## Authenticate to Google Cloud

In [4]:
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Authenticated


In [5]:
PROJECT_ID = 'central-beach-194106'

## BigQuery SQL to DataFrame

In [6]:
from google.cloud import bigquery

client = bigquery.Client(project=PROJECT_ID)

df = client.query('''
SELECT
  title, text, time, timestamp, id
FROM `bigquery-public-data.hacker_news.full`
WHERE
  type ='story'
LIMIT 1000
''').to_dataframe()

display(df)

Unnamed: 0,title,text,time,timestamp,id
0,Ultimate Guide of Twitter Tips & Tricks,Ultimate Guide of Twitter Tips &#38; Tricks,1269407340,2010-03-24 05:09:00+00:00,1215015
1,Placeholder,Mind the gap.,1401561740,2014-05-31 18:42:20+00:00,2100665
2,Placeholder,Mind the gap.,1401561740,2014-05-31 18:42:20+00:00,4774206
3,Listen it u will be lost,for soul of ur body,1394988993,2014-03-16 16:56:33+00:00,7410144
4,Ejendomsmægler,Nice nice artikel thanks.,1369741942,2013-05-28 11:52:22+00:00,5779378
...,...,...,...,...,...
995,Watch Haye vs Chisora live boxing on box nation,"Hai dude, i think you already have known that ...",1342145150,2012-07-13 02:05:50+00:00,4237879
996,Ask YC: Do you process bounced emails from you...,"Hi, we're in the process of building the bounc...",1200258277,2008-01-13 21:04:37+00:00,97960
997,Ask HN: What to Do with £100K ($132k),Hi HN!<p>I have a main account but am using a ...,1641073281,2022-01-01 21:41:21+00:00,29763133
998,How Q&A Model can help your business?,Q&#38;A sites -in my opinion- is a very simple...,1290499427,2010-11-23 08:03:47+00:00,1932980


## Redis Helper functions

In [7]:
from tqdm.auto import tqdm
import numpy as np
import pandas as pd
tqdm.pandas()


# Load a pandas DataFrame into Redis, one HASH per row
def load_dataframe(redis, df, key_prefix="tweet", id_column="id", pipe_size=100):
    records = df.to_dict(orient="records")
    pipe = redis.pipeline()
    for i, record in enumerate(tqdm(records), start=1):
        # redis-py only accepts bytes, str, int and float values, so convert
        # <NA> to an empty string and Timestamp/bool to their string form
        record = {
            key: '' if pd.isna(value) else str(value) if isinstance(value, (pd.Timestamp, bool)) else value
            for key, value in record.items()
        }
        key = f"{key_prefix}:{record[id_column]}"
        pipe.hset(key, mapping=record)
        # Send the queued commands every pipe_size records
        if i % pipe_size == 0:
            pipe.execute()
    # Send whatever is still queued
    pipe.execute()


from redis.commands.search.field import (
    NumericField,
    TagField,
    TextField,
    VectorField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

# Create Redis Vector Index
def create_redis_index(redis, vector_field_name = "text_embedding", idxname = "google:idx", prefix = ["google:"], dim = 384):
  try:
    redis.ft(idxname).dropindex()
    print("Existing index found. Dropping and recreating the index")
  except Exception:
    print("creating index")

  # Create an index
  indexDefinition = IndexDefinition(prefix=prefix, index_type=IndexType.HASH)
  redis.ft(idxname).create_index(
      (
          VectorField(vector_field_name, "HNSW", {  "TYPE": "FLOAT32",
                                                    "DIM": dim,
                                                    "DISTANCE_METRIC": "COSINE",
                                                  })
      ),
      definition=indexDefinition
  )
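
The batching idea behind `load_dataframe` can be seen in isolation with a stand-in pipeline. This is only a sketch, not redis-py itself: `FakePipeline` and `load_records` are hypothetical names, and the fake class merely counts queued commands so the flush pattern is visible without a running server.

```python
class FakePipeline:
    """Hypothetical stand-in for a Redis pipeline; it only counts commands."""

    def __init__(self):
        self.queued = 0      # commands waiting to be sent
        self.flushes = []    # size of each batch sent by execute()

    def hset(self, key, mapping):
        self.queued += 1

    def execute(self):
        self.flushes.append(self.queued)
        self.queued = 0


def load_records(pipe, records, key_prefix, id_column, pipe_size):
    """Queue one HSET per record and flush every pipe_size records."""
    for i, record in enumerate(records, start=1):
        pipe.hset(f"{key_prefix}:{record[id_column]}", mapping=record)
        if i % pipe_size == 0:
            pipe.execute()
    pipe.execute()  # final flush for the remainder


pipe = FakePipeline()
records = [{"id": n, "text": "example"} for n in range(250)]
load_records(pipe, records, key_prefix="google", id_column="id", pipe_size=100)
print(pipe.flushes)
```

Larger values of `pipe_size` mean fewer round trips to the server at the cost of more commands buffered on the client.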

## Init Vertex AI

In [8]:
import vertexai

vertexai.init(project=PROJECT_ID, location="us-central1")

In [9]:
from vertexai.preview.language_models import (ChatModel, InputOutputTextPair,
                                              TextEmbeddingModel,
                                              TextGenerationModel)

### Init VertexAI embeddings.

Vertex AI imposes rate limits on API calls, so exponential backoff may be required.
This embedding path is currently not used.
TODO: add the Redis-specific type conversion `.astype(np.float32).tobytes()`.
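
The conversion named in the TODO packs the embedding values as raw 4-byte floats, back to back, which is the layout a `FLOAT32` vector field stores. Below is a minimal sketch of that packing using only the standard library. `floats_to_redis_bytes` and `redis_bytes_to_floats` are hypothetical helper names, and the three-element list stands in for a real embedding. For a plain list of floats, `array("f", values).tobytes()` should produce the same bytes as `np.asarray(values, dtype=np.float32).tobytes()`.

```python
from array import array


def floats_to_redis_bytes(values):
    """Pack a sequence of Python floats as contiguous float32 bytes."""
    return array("f", values).tobytes()


def redis_bytes_to_floats(blob):
    """Unpack float32 bytes back into a list, e.g. to inspect a stored vector."""
    unpacked = array("f")
    unpacked.frombytes(blob)
    return list(unpacked)


vector = [0.5, -1.0, 0.25]           # stand-in for real embedding values
blob = floats_to_redis_bytes(vector)
print(len(blob))                      # 4 bytes per dimension
print(redis_bytes_to_floats(blob))
```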

In [10]:
embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")

from tenacity import retry, stop_after_attempt, wait_random_exponential

@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(3))
def embedding_model_with_backoff(text):
    # `text` is a list of strings; only the first embedding is returned
    embeddings = embedding_model.get_embeddings(text)
    return [each.values for each in embeddings][0]

### Init HuggingFace embeddings

Here we are using `sentence-transformers/all-MiniLM-L6-v2` from HuggingFace. https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

This embedding library benefits from running on a GPU-enabled notebook.

If possible, use VertexAI embeddings instead.

In [11]:
!pip install -q sentence_transformers


In [12]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

def text_to_embedding(text):
  return model.encode(text).astype(np.float32).tobytes()


## Build embeddings and Load Dataframe to Redis

In [13]:
# clear Redis database (optional)
redis.flushdb()

True

In [14]:
df["text_embedding"] = df["text"].progress_apply(text_to_embedding)
df.head()


Unnamed: 0,title,text,time,timestamp,id,text_embedding
0,Ultimate Guide of Twitter Tips & Tricks,Ultimate Guide of Twitter Tips &#38; Tricks,1269407340,2010-03-24 05:09:00+00:00,1215015,b'\n\x14\xfe\xbcn}\xde\xbc\xbfx\xbe<0\x85\x8b\...
1,Placeholder,Mind the gap.,1401561740,2014-05-31 18:42:20+00:00,2100665,b'\xc0\xb0\x1b\xbd:\xec\xd0\xbc\x94\xfe5\xbd$s...
2,Placeholder,Mind the gap.,1401561740,2014-05-31 18:42:20+00:00,4774206,b'\xc0\xb0\x1b\xbd:\xec\xd0\xbc\x94\xfe5\xbd$s...
3,Listen it u will be lost,for soul of ur body,1394988993,2014-03-16 16:56:33+00:00,7410144,b'%\xd2>\xbd\x00n\xf7<\x0cM\xc1\xbc\x18!\xdb\x...
4,Ejendomsmægler,Nice nice artikel thanks.,1369741942,2013-05-28 11:52:22+00:00,5779378,b'l^\x9e\xbd\n\xba\xbe<\x8ai\xc2<;\xe7\xea\xbc...


In [15]:
# load data from Dataframe to Redis HASH
load_dataframe(redis,df,key_prefix="google", id_column="id", pipe_size=100)


In [16]:
#!redis-cli $REDIS_CONN KEYS "*"

In [17]:
# retrieve a single HASH from Redis
redis.hgetall(f"google:{df.loc[1, 'id']}")

{b'time': b'1401561740',
 b'title': b'Placeholder',
 b'timestamp': b'2014-05-31 18:42:20+00:00',
 b'text': b'Mind the gap.',
 b'text_embedding': b'\xc0\xb0\x1b\xbd:\xec\xd0\xbc\x94\xfe5\xbd$s=\xbc\xc5c.==\x8b\x9e\xbb\x99{\xda<{\x9bQ=\xe1\xf9\xa9<\xa4\xd4\x0f\xbd\xfe\x1b#=\x01\xb3T\xbc\x90i\x86\xbd\x1a\xc9b\xbdL\x99\xf6=\x10N\xf2<\xef\xa7Z\xbd:Q\x81\xbdP\x98\x95\xbd\xe9\x9fd\xbc\x88`e\xbc\x00\xb4\xf9<\xddA)\xbc~6\x11\xbd\x8a\xa6\xc3\xbc\x8f\xb4Y=S \xe5<\x11\xde\xaa\xbb\xda44=\xa7\xa5\xdb;K\xb2\x1a\xbc9v\xfb<\x9d\x8c\x95\xbd\xefv\x8f\xbcb\xc6\xc2<Q\xb0\x81;\xc1\xde\xbf<\xc8\xe1\n>\xa0b\x00=\xe37O\xbd\xe5F\xd5\xbc\xb2o\x12\xbd\xa5\xa8\x93;5\x08\n=\xdbn\xe4\xbc%\x12\x91=Q\xffR\xbaOr*<\xf1\xd68\xbc\x8b\x8d\x97\xbd\x8d\xb3g\xbd*H\xd3\xbc\xe1~$\xbd\xb2\xef3=\x08\xbd`={m\xe5\xba\xec\xc5\xc0<\x99_\x0c<l-\xaf\xbc\xa0\xaaU;y4z\xbd\x01J\x91<tt\x04\xbe\x02\xa9#=\x9f\'\xc7\xbb\x17G\x8f\xbb\x91\x07r\xbc\xef7\xb9;\xa9\xeep\xbd\t\x96E=\xf2\xff|=\x96\xc7A=FlZ\xbd\x92b\x0c\xbd\xb0\x0fH=\\\xeaI=,\xa4@=Z\x

### Create Vector Index in Redis

In [18]:
# Dataframe field with Vector Embeddings
VECTOR_FIELD_NAME = "text_embedding"

# Embedding dimension
# - HuggingFace all-MiniLM-L6-v2 - 384
# - VertexAI textembedding-gecko@001 - 768
DIM = 384

INDEX_NAME = "google:idx"
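
Because each `FLOAT32` component occupies 4 bytes, a stored embedding should be exactly `DIM * 4` bytes long. A mismatch between `DIM` and the model's output size is an easy mistake to make when switching between the two embedding models listed above. A small sanity check, offered as a sketch: `check_embedding_size` is a hypothetical helper, and the zero-filled blob stands in for a real embedding.

```python
def check_embedding_size(blob, dim, bytes_per_value=4):
    """Return True if blob holds exactly `dim` float32 values."""
    return len(blob) == dim * bytes_per_value


DIM = 384
fake_embedding = bytes(DIM * 4)   # stand-in for text_to_embedding(...) output

matches_minilm = check_embedding_size(fake_embedding, DIM)   # 384 dimensions
matches_gecko = check_embedding_size(fake_embedding, 768)    # 768 dimensions
print(matches_minilm, matches_gecko)
```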


In [19]:
create_redis_index(redis, vector_field_name = VECTOR_FIELD_NAME, idxname = INDEX_NAME, prefix = ["google:"], dim = DIM)

creating index


### Vector Similarity Search

In [20]:
#using Vector Similarity Index

user_query="facebook rant"

query_vector=text_to_embedding(user_query)
q = Query(f"*=>[KNN 10 @{VECTOR_FIELD_NAME} $vector AS result_score]")\
                .return_fields("result_score","text")\
                .dialect(2)\
                .sort_by("result_score", True)
res = redis.ft(INDEX_NAME).search(q, query_params={"vector": query_vector})
#print(res)
res_df = pd.DataFrame([t.__dict__ for t in res.docs ]).drop(columns=["payload"])
res_df

Unnamed: 0,id,result_score,text
0,google:7271663,0.726075947285,I can&#x27;t figure out how to make an &quot;A...
1,google:27663839,0.741255402565,So I want to find a bf for my bsf 11-13 but I ...
2,google:3870393,0.757247984409,Mashing up classic literature with popular twi...
3,google:631700,0.770403802395,"In the New Year, qeep continues to experience ..."
4,google:4691054,0.772511005402,"Beta promo code is ""hackernews"" and limited to..."
5,google:1738138,0.785915493965,"Ben Lerer, cofounder of Thrillist, talks about..."
6,google:1246706,0.792101979256,Do you feel more intimately/emotionally connec...
7,google:17636858,0.812070071697,Metronews24.com keeps you update with latest o...
8,google:2893042,0.818111002445,I know i might be a bit late with this news bu...
9,google:10182770,0.830830395222,Do you find these daily posts (https:&#x2F;&#x...
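
With `DISTANCE_METRIC` set to `COSINE`, `result_score` is a cosine distance (1 minus the cosine similarity), so smaller values mean closer matches. That is why the results are sorted in ascending order. A pure-Python sketch of that quantity follows; `cosine_distance` is a hypothetical helper and the 2-D vectors are toy inputs, not real embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


same_direction = cosine_distance([1.0, 0.0], [2.0, 0.0])   # identical direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])       # unrelated
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])        # opposite direction
print(same_direction, orthogonal, opposite)
```

Vector length does not affect the score, only direction does, so the distance ranges from 0 (same direction) to 2 (opposite direction).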


## Hello PaLM!

In [21]:
generation_model = TextGenerationModel.from_pretrained("text-bison@001")

prompt = "What is a large language model?"

response = generation_model.predict(prompt=prompt)

print(response.text)

A large language model (LLM) is a type of artificial intelligence (AI) model that can understand and generate human language. LLMs are trained on massive datasets of text and code, and they can learn to perform a wide variety of tasks, such as translating languages, writing different kinds of creative content, and answering your questions in an informative way.

LLMs are still under development, but they have the potential to revolutionize many industries. For example, LLMs could be used to create more accurate and personalized customer service experiences, to help doctors diagnose and treat diseases, and to even write entire books and movies.




## Hello Chat!

In [22]:

chat_model = ChatModel.from_pretrained("chat-bison@001")

chat = chat_model.start_chat()

print(
    chat.send_message(
        """
Hello! Can you write a 300 word abstract for a research paper I need to write about the impact of generative AI on society?
"""
    )
)


print(
    chat.send_message(
        """
Could you give me a catchy title for the paper?
"""
    )
)

Generative AI (GAN) is a type of machine learning that uses artificial neural networks to create new, original content. This can include images, text, music, and even videos. GANs have the potential to revolutionize many industries, from healthcare to advertising. However, there are also concerns about the potential negative impacts of GANs, such as the creation of fake news and deepfakes.

In this paper, we explore the potential impact of GANs on society. We first discuss the benefits of GANs, such as their ability to create new and original content, to solve real-world problems, and to democratize creativity
Generative AI: The Future of Creativity
