# Pipeline 3: A RAG Pinecone vector store for Generative AI

Copyright 2024, Denis Rothman

**Pipeline Overview:**
1. Connect to existing Pinecone index (from Pipeline 2)
2. Query vector store with customer profiles
3. Retrieve similar customer records
4. Generate personalized retention emails using GPT-4o
5. Demonstrate production-scale RAG with 50K vectors

**Local Jupyter Setup:** Uses `.env` file for API keys

# Installing the environment

# Environment Setup

Required API keys in `.env` file:
```
OPENAI_API_KEY=sk-proj-...
PINECONE_API_KEY=pcsk_...
```

In [1]:
# Import required modules
import os
from dotenv import load_dotenv

# Load API keys from .env file
load_dotenv()

# Verify that both API keys were loaded before using them
# (assigning None to os.environ would raise a TypeError instead of a clear message)
if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY not found in .env file")
if not os.getenv("PINECONE_API_KEY"):
    raise ValueError("PINECONE_API_KEY not found in .env file")

# Set API keys
os.environ['OPENAI_API_KEY'] = os.getenv("OPENAI_API_KEY")
os.environ['PINECONE_API_KEY'] = os.getenv("PINECONE_API_KEY")

print("✓ Environment configured")
print(f"  OpenAI API Key: {os.getenv('OPENAI_API_KEY')[:10]}...")
print(f"  Pinecone API Key: {os.getenv('PINECONE_API_KEY')[:10]}...")

✓ Environment configured
  OpenAI API Key: sk-proj-lq...
  Pinecone API Key: pcsk_28h4H...
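
Printing even the first characters of a key leaves secret material in the saved notebook output. A minimal sketch of an alternative check that reports only presence and length follows; `require_env` and `DEMO_API_KEY` are hypothetical names used for illustration and are not part of the original notebook.

```python
import os

def require_env(names):
    """Return {name: length} for each variable; raise if any is unset or empty."""
    missing = [n for n in names if not os.getenv(n)]
    if missing:
        raise ValueError(f"Missing in environment/.env: {', '.join(missing)}")
    # Report only the length, never the value itself
    return {n: len(os.environ[n]) for n in names}

# Example with a placeholder value (hypothetical variable, not a real key)
os.environ["DEMO_API_KEY"] = "placeholder-value"
print(require_env(["DEMO_API_KEY"]))
```

In the notebook, the same helper could be called with `["OPENAI_API_KEY", "PINECONE_API_KEY"]` right after `load_dotenv()`.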


In [2]:
# API clients configured from Environment Setup
import openai
from pinecone import Pinecone, ServerlessSpec

openai.api_key = os.getenv("OPENAI_API_KEY")
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")

print("✓ Ready to query Pinecone index")

✓ Ready to query Pinecone index


# The Pinecone index

In [3]:
import os
from pinecone import Pinecone

# initialize connection to pinecone (get API key at app.pinecone.io)
pc = Pinecone(api_key=PINECONE_API_KEY)

In [4]:
from pinecone import ServerlessSpec

index_name = 'bank-index-50000'
cloud = os.environ.get('PINECONE_CLOUD') or 'aws'
region = os.environ.get('PINECONE_REGION') or 'us-east-1'

spec = ServerlessSpec(cloud=cloud, region=region)

In [5]:
import time

# Create the index only if it is missing; after Pipeline 2 has run, it should already exist
if index_name not in pc.list_indexes().names():
    pc.create_index(
        index_name,
        dimension=1536,  # must match the embedding size (1536 for text-embedding-3-small)
        metric='cosine',
        spec=spec
    )
    # wait until the index reports that it is ready
    while not pc.describe_index(index_name).status['ready']:
        time.sleep(1)

# connect to index
index = pc.Index(index_name)
# view index stats
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'metric': 'cosine',
 'namespaces': {'': {'vector_count': 50000}},
 'total_vector_count': 50000,
 'vector_type': 'dense'}
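
Before querying, it can help to confirm that the index matches what the rest of the pipeline expects: 1536 dimensions and the 50,000 vectors upserted in Pipeline 2. The sketch below runs on a plain dictionary copied from the output above; `check_index_stats` is a hypothetical helper, and applying it to the live object returned by `describe_index_stats()` assumes that object supports the same key access.

```python
def check_index_stats(stats, expected_dim, min_vectors=1):
    """Return a list of human-readable problems; an empty list means the index looks usable."""
    problems = []
    if stats["dimension"] != expected_dim:
        problems.append(
            f"dimension is {stats['dimension']}, expected {expected_dim}"
        )
    if stats["total_vector_count"] < min_vectors:
        problems.append(
            f"only {stats['total_vector_count']} vectors, expected at least {min_vectors}"
        )
    return problems

# Values copied from the describe_index_stats() output shown above
stats = {
    "dimension": 1536,
    "index_fullness": 0.0,
    "metric": "cosine",
    "namespaces": {"": {"vector_count": 50000}},
    "total_vector_count": 50000,
    "vector_type": "dense",
}
print(check_index_stats(stats, expected_dim=1536, min_vectors=50000))
```

An empty list indicates that both checks passed; a dimension mismatch would otherwise surface only later, as an error on the first query.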

# RAG with GPT-4o

# Query the dataset

In [7]:
import openai

embedding_model = "text-embedding-3-small"

# Initialize the OpenAI client
client = openai.OpenAI()

def get_embedding(text, model=embedding_model):
    text = text.replace("\n", " ")
    response = client.embeddings.create(input=[text], model=model)
    embedding = response.data[0].embedding
    return embedding

## Querying a target vector

In [8]:
import time
start_time = time.time()  # Start timing before the request
# Target vector
query_text = "Customer Henderson CreditScore 599 Age 37Tenure 2Balance 0.0NumOfProducts 1HasCrCard 1IsActiveMember 1EstimatedSalary 107000.88Exited 1Complain 1Satisfaction Score 2Card Type DIAMONDPoint Earned 501"
query_embedding = get_embedding(query_text,model=embedding_model)
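
The timer started in this cell keeps running through the Pinecone query in the next cell, so the reported response time covers both the embedding call and the retrieval. A minimal sketch for timing each stage separately is shown below; `timed` is a hypothetical helper, and the `time.sleep` calls are stand-ins for `get_embedding(...)` and `index.query(...)`.

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(label, store=timings):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[label] = time.perf_counter() - start

with timed("embedding"):
    time.sleep(0.01)  # stand-in for get_embedding(query_text)

with timed("retrieval"):
    time.sleep(0.01)  # stand-in for index.query(...)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} seconds")
```

`time.perf_counter()` is used here because it is intended for measuring short intervals; `time.time()` as in the notebook also works for coarse measurements.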

In [9]:
# Perform the query using the embedding
query_results = index.query(
    vector=query_embedding,
    include_metadata=True,
    top_k=1
)
# Print the query results along with metadata
print("Query Results:")
for match in query_results['matches']:
    print(f"ID: {match['id']}, Score: {match['score']}")
    if 'metadata' in match and 'text' in match['metadata']:
        print(f"Text: {match['metadata']['text']}")
    else:
        print("No metadata available.")

response_time = time.time() - start_time              # Measure response time
print(f"Querying response time: {response_time:.2f} seconds")  # Print response time

Query Results:
ID: 1690, Score: 0.855045319
Text: CustomerId: 15648064 CreditScore: 649 Age: 33 Tenure: 2 Balance: 0.0 NumOfProducts: 2 HasCrCard: 1 IsActiveMember: 0 EstimatedSalary: 2010.98 Exited: 0 Complain: 0 Satisfaction Score: 3 Card Type: DIAMOND Point Earned: 720
Querying response time: 7.55 seconds
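
Because the index was created with `metric='cosine'`, the score of about 0.855 should correspond to the cosine similarity between the query embedding and the stored vector. The sketch below computes that quantity in plain Python on toy vectors, purely to illustrate what the score measures; it is not how Pinecone computes it internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Toy 2-D vectors: identical direction, orthogonal, and same direction at a different scale
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

The measure ignores vector length and depends only on direction, which is why the first and third pairs score the same.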


## Extract Relevant Texts

In [10]:
relevant_texts = [match['metadata']['text'] for match in query_results['matches'] if 'metadata' in match and 'text' in match['metadata']]

# Join the retrieved texts into a single newline-separated string for readability
combined_text = '\n'.join(relevant_texts)
print(combined_text)

CustomerId: 15648064 CreditScore: 649 Age: 33 Tenure: 2 Balance: 0.0 NumOfProducts: 2 HasCrCard: 1 IsActiveMember: 0 EstimatedSalary: 2010.98 Exited: 0 Complain: 0 Satisfaction Score: 3 Card Type: DIAMOND Point Earned: 720
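
The retrieved text is a flat `Field: value` string. If individual fields are needed downstream, for example to compare the retrieved customer with the target, it can be parsed into a dictionary. The sketch below assumes the field names are exactly those visible in the record above; `parse_record` is a hypothetical helper.

```python
import re

# Field names as they appear in the retrieved record (some contain spaces)
FIELDS = [
    "CustomerId", "CreditScore", "Age", "Tenure", "Balance", "NumOfProducts",
    "HasCrCard", "IsActiveMember", "EstimatedSalary", "Exited", "Complain",
    "Satisfaction Score", "Card Type", "Point Earned",
]
FIELD_PATTERN = re.compile(
    r"(" + "|".join(re.escape(f) for f in FIELDS) + r"):\s*(\S+)"
)

def parse_record(text):
    """Map each known field name to its value, kept as a string."""
    return dict(FIELD_PATTERN.findall(text))

record_text = (
    "CustomerId: 15648064 CreditScore: 649 Age: 33 Tenure: 2 Balance: 0.0 "
    "NumOfProducts: 2 HasCrCard: 1 IsActiveMember: 0 EstimatedSalary: 2010.98 "
    "Exited: 0 Complain: 0 Satisfaction Score: 3 Card Type: DIAMOND Point Earned: 720"
)
record = parse_record(record_text)
print(record["CreditScore"], record["Card Type"], record["Point Earned"])
```

Values are left as strings; converting them to `int` or `float` would be a separate step once the expected type of each field is fixed.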


## Augmented prompt

In [14]:
# Combine texts into a single string, separated by new lines
combined_context = "\n".join(relevant_texts)
#prompt
query_prompt="I have this customer bank record with interesting information on age, credit score and more and similar customers. What could I suggest to keep them in my bank in an email with an url to get new advantages based on the fields for each Customer ID:\n"
itext=query_prompt+ query_text+combined_context
# Augmented input
print("Prompt for the Generative AI model:\n", itext)

Prompt for the Generative AI model:
 I have this customer bank record with interesting information on age, credit score and more and similar customers. What could I suggest to keep them in my bank in an email with an url to get new advantages based on the fields for each Customer ID:
Customer Henderson CreditScore 599 Age 37Tenure 2Balance 0.0NumOfProducts 1HasCrCard 1IsActiveMember 1EstimatedSalary 107000.88Exited 1Complain 1Satisfaction Score 2Card Type DIAMONDPoint Earned 501CustomerId: 15648064 CreditScore: 649 Age: 33 Tenure: 2 Balance: 0.0 NumOfProducts: 2 HasCrCard: 1 IsActiveMember: 0 EstimatedSalary: 2010.98 Exited: 0 Complain: 0 Satisfaction Score: 3 Card Type: DIAMOND Point Earned: 720
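
The prompt printed above runs the target record straight into the retrieved record (`...Point Earned 501CustomerId: 15648064...`), which leaves the model to guess where one customer ends and the next begins. A minimal sketch of a variant that labels each part is given below; `build_prompt` and the section labels are assumptions for illustration, not the notebook's method, and changing the prompt would change the generated email.

```python
def build_prompt(instruction, target_record, retrieved_records):
    """Assemble an augmented prompt with clearly separated, labeled sections."""
    lines = [
        instruction.strip(),
        "",
        "Target customer:",
        target_record.strip(),
        "",
        "Similar customers:",
    ]
    # One bullet per retrieved record keeps the boundaries explicit
    lines.extend(f"- {record.strip()}" for record in retrieved_records)
    return "\n".join(lines)

# Shortened, illustrative inputs
prompt = build_prompt(
    "Suggest how to retain this customer in an email.",
    "Customer Henderson CreditScore 599 Age 37",
    ["CustomerId: 15648064 CreditScore: 649 Age: 33"],
)
print(prompt)
```

In the notebook, the arguments would be `query_prompt`, `query_text`, and `relevant_texts`.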


## Augmented generation

In [19]:
import time
from IPython.display import display, Markdown

gpt_model = "gpt-4o"

start_time = time.time()  # Start timing before the request

response = client.chat.completions.create(
  model=gpt_model,
  messages=[
    {
      "role": "system",
      "content": "You are the community manager can write engaging email based on the text you have. Do not use a surname but simply Dear Valued Customer instead."
    },
    {
      "role": "user",
      "content": itext
    }
  ],
  temperature=0,
  max_tokens=300,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0
)

# Get the response content
email_content = response.choices[0].message.content

# Display in markdown format for better readability
print("="*70)
print("GENERATED RETENTION EMAIL (Markdown Format)")
print("="*70)
print()
display(Markdown(email_content))

response_time = time.time() - start_time              # Measure response time
print()
print("="*70)
print(f"✓ Email generated in {response_time:.2f} seconds")
print(f"✓ Model: {gpt_model}")
print(f"✓ Tokens used: {response.usage.total_tokens}")
print("="*70)

GENERATED RETENTION EMAIL (Markdown Format)



Subject: Unlock Exclusive Benefits with Our New Customer Loyalty Program!

Dear Valued Customer,

We hope this message finds you well. At [Bank Name], we are constantly striving to enhance your banking experience and provide you with the best services tailored to your needs. We noticed that you are a valued member of our community, and we want to ensure you continue to enjoy the benefits of banking with us.

Based on your profile, we have some exciting opportunities that we believe will be of great interest to you:

1. **Exclusive Rewards Program**: As a DIAMOND cardholder, you can now earn even more points on your everyday transactions. Redeem these points for exciting rewards, travel perks, and more.

2. **Personalized Financial Advice**: Our team of financial experts is here to help you make the most of your finances. Whether you're looking to save more, invest wisely, or plan for the future, we have the resources to guide you.

3. **Enhanced Digital Banking Features**: Enjoy seamless banking with our upgraded mobile app and online services. Manage your accounts, pay bills, and transfer funds with ease, anytime and anywhere.

4. **Special Offers on Products and Services**: As a token of our appreciation, we are offering exclusive discounts and offers on a range of products and services. Don't miss out on these limited-time deals!

To explore these new advantages and more, please visit our dedicated page: [Insert URL Here]

We value your feedback and are committed to ensuring


✓ Email generated in 5.38 seconds
✓ Model: gpt-4o
✓ Tokens used: 531
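
The email above stops mid-sentence ("committed to ensuring"), which is consistent with the completion reaching the `max_tokens=300` limit. The Chat Completions response reports this through `finish_reason`, which is `"length"` when the output was cut off by the token limit. The sketch below checks for that case; `was_truncated` is a hypothetical helper, and the `SimpleNamespace` object is only a stand-in for a real response so that the example runs without an API call.

```python
from types import SimpleNamespace

def was_truncated(response):
    """True if the first choice stopped because it reached the token limit."""
    return response.choices[0].finish_reason == "length"

# Stand-in for a real chat completion response (no API call is made here)
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(finish_reason="length")]
)

if was_truncated(fake_response):
    print("Warning: output was cut off; consider raising max_tokens.")
```

With the real `response` from the cell above, a `True` result would suggest raising `max_tokens` or asking for a shorter email in the system message.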


In [17]:
print(query_text)

Customer Henderson CreditScore 599 Age 37Tenure 2Balance 0.0NumOfProducts 1HasCrCard 1IsActiveMember 1EstimatedSalary 107000.88Exited 1Complain 1Satisfaction Score 2Card Type DIAMONDPoint Earned 501
