# Environment setup

In [None]:
# Install necessary libraries
!pip install -q openai langchain langchain-openai langchain-community openai-whisper sentence-transformers pdf2image
!apt-get install -y poppler-utils
!pip install --upgrade Pillow

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
poppler-utils is already the newest version (22.02.0-2ubuntu0.5).
0 upgraded, 0 newly installed, 0 to remove and 49 not upgraded.


In [None]:
%cd /content/drive/MyDrive/GenAI/RAG/CAPSTONE PROJECT - MultiModal Data

/content/drive/MyDrive/GenAI/RAG/CAPSTONE PROJECT - MultiModal Data


In [None]:
from google.colab import userdata
api_key = userdata.get('genai_course')

In [None]:
# Import libraries
from langchain_openai import ChatOpenAI
from openai import OpenAI
from IPython.display import display, Markdown
from sentence_transformers import SentenceTransformer
import whisper
import pandas as pd
import base64
from pdf2image import convert_from_path
from PIL import Image
from sklearn.metrics.pairwise import cosine_similarity
import os
import torch

# Audio Transcription

In [None]:
# Check if a CUDA GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

Using device: cpu


In [None]:
# Transcribe an audio file using Whisper
model = whisper.load_model("base", device=device)
input_file = "starbucks-q3.mp3"
result = model.transcribe(input_file)
transcription_text = result['text']



In [None]:
# Check transcription output
print("Transcription Output:")
print(transcription_text)

Transcription Output:
 2024. And with that, I'll now tell Nicole over to Luxembourg. Thank you Tiffany and thank you for joining us this afternoon. Let me start by laying out our results for the squatter. Our Q3 total company revenue was $9.1 billion up 1% year over year and 6% over Q2. Our global comparable store sales declined 3% year over year driven by a negative 2% Comcro in North America and a negative 14% Comcro in China and partially offset by strong performance in Japan. Our global operating margins contracted by 70 basis points to 16.7% and overall earnings per share for the quarter was 93 cents. Our total company results were in line with guidance but international performance particularly in China was challenged. We are not satisfied with the results but our actions are making an impact. Leading business and operational indicators are trending in the right direction ahead of our financial results and our runway for improvement is long. We see green shoots in our US business

In [None]:
# Save the transcription to a text file (UTF-8, matching the encoding used when it is read back)
os.makedirs("transcript", exist_ok=True)
with open("transcript/transcription.txt", "w", encoding="utf-8") as f:
    f.write(transcription_text)

# Embed Transcription

In [None]:
# Load transcription text and split into chunks of 100 characters
with open('transcript/transcription.txt', 'r', encoding='utf-8') as file:
    text = file.read()

audio_chunks = [text[i:i+100] for i in range(0, len(text), 100)]

# Print chunks to verify splitting
print(f"\nNumber of Chunks: {len(audio_chunks)}")
print("First few chunks:", audio_chunks[:3])


Number of Chunks: 142
First few chunks: [" 2024. And with that, I'll now tell Nicole over to Luxembourg. Thank you Tiffany and thank you for j", 'oining us this afternoon. Let me start by laying out our results for the squatter. Our Q3 total comp', 'any revenue was $9.1 billion up 1% year over year and 6% over Q2. Our global comparable store sales ']
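
Fixed 100-character slices cut words in half, as the first chunks show ("thank you for j" / "oining us"). The following is a minimal sketch of one alternative: splitting on word boundaries with a small overlap between chunks. The helper name `chunk_words` and its parameters `max_chars` and `overlap_words` are illustrative and not part of the original notebook.

```python
def chunk_words(text, max_chars=100, overlap_words=2):
    """Split text into chunks of roughly max_chars, breaking only between words.

    Hypothetical helper. Consecutive chunks share `overlap_words` words.
    A chunk can exceed max_chars if a single word or the overlap is very long.
    """
    chunks, current = [], []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Carry the last few words over so context spans the boundary
            current = current[-overlap_words:] if overlap_words else []
        current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "Our Q3 total company revenue was $9.1 billion up 1% year over year and 6% over Q2"
chunks = chunk_words(sample, max_chars=30, overlap_words=1)
for chunk in chunks:
    print(repr(chunk))
```

Whether overlap helps retrieval here would need to be checked against the actual transcript; the chunk size of 100 is kept only to mirror the cell above.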


In [None]:
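# Load the CLIP model through Sentence Transformers and embed the text chunks.
# CLIP maps text and images into the same 512-dimensional space, which is what
# allows one query embedding to be compared with both modalities later.
# Note: this rebinds `model`, which previously held the Whisper model.
model = SentenceTransformer('clip-ViT-B-32')
audio_embeddings = model.encode(audio_chunks)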

# Check shape of embeddings
print(f"\nAudio Embeddings Shape: {audio_embeddings.shape}")




Audio Embeddings Shape: (142, 512)


# Convert PDF to Images

In [None]:
# Set the PDF path and create the output folder for the page images
pdf_path = '3Q24-Earnings-Release.pdf'
output_folder = 'images'
os.makedirs(output_folder, exist_ok=True)

In [None]:
# Render every PDF page to a PIL image and save it as page_<n>.png
images = convert_from_path(pdf_path)
image_paths = []

for i, image in enumerate(images):
    image_path = os.path.join(output_folder, f'page_{i + 1}.png')
    image.save(image_path, 'PNG')
    image_paths.append(image_path)

print(f"PDF pages have been converted to images and saved in '{output_folder}'")
print(f"\nNumber of Images: {len(image_paths)}")
print(f"Image Paths: {image_paths[:3]}")  # Print first 3 image paths to check

PDF pages have been converted to images and saved in 'images'

Number of Images: 17
Image Paths: ['images/page_1.png', 'images/page_2.png', 'images/page_3.png']


# Embed Images

In [None]:
# Load the CLIP model and embed each page image.
# Iterate over image_paths (not os.listdir) so that image_embeddings[i]
# corresponds to image_paths[i]; os.listdir returns files in arbitrary order.
image_model = SentenceTransformer('clip-ViT-B-32')
image_embeddings = []

for image_path in image_paths:
    image = Image.open(image_path)
    embedding = image_model.encode(image)
    image_embeddings.append(embedding)
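
When page files are listed from disk instead of being taken from `image_paths`, their order matters: `os.listdir` makes no ordering guarantee, and a plain `sorted()` places `page_10.png` before `page_2.png`. Any index computed on the embeddings would then point at the wrong page. The sketch below shows a numeric sort key for this naming scheme; `page_number` is a hypothetical helper and the file list is made up for illustration.

```python
import re


def page_number(filename):
    """Return the integer N from names like 'page_N.png' (hypothetical helper)."""
    match = re.search(r"page_(\d+)\.png$", filename)
    # Names that do not match the pattern sort first
    return int(match.group(1)) if match else -1


files = ["page_10.png", "page_2.png", "page_1.png", "notes.txt"]

lexicographic = sorted(f for f in files if f.endswith(".png"))
numeric = sorted((f for f in files if f.endswith(".png")), key=page_number)

print(lexicographic)  # page_10 comes before page_2
print(numeric)        # pages in reading order
```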



In [None]:
# Check image embeddings
print(f"\nNumber of Image Embeddings: {len(image_embeddings)}")
print(f"First Image Embedding Shape: {image_embeddings[0].shape}")


Number of Image Embeddings: 17
First Image Embedding Shape: (512,)


# Similarity Retrieval

In [None]:
# Define a query and embed it with the CLIP model held in `model`
query = "How is the company doing financially?"
query_embeddings = model.encode(query)

In [None]:
# Print query embedding shape
print(f"\nQuery Embedding Shape: {query_embeddings.shape}")


Query Embedding Shape: (512,)


In [None]:
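# Compute cosine similarity between the query and every transcription chunk,
# then keep the indices of the 20 best-matching chunks, most similar first
audio_similarities = cosine_similarity([query_embeddings], audio_embeddings)[0]
top_k_audio_indices = audio_similarities.argsort()[-20:][::-1]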

In [None]:
# Print top 5 similar audio chunks
print("\nTop 5 Most Similar Audio Chunks (Indices):", top_k_audio_indices[:5])
print("Top 5 Audio Similarities:", audio_similarities[top_k_audio_indices[:5]])


Top 5 Most Similar Audio Chunks (Indices): [125 103  93  13  83]
Top 5 Audio Similarities: [0.90979403 0.9083904  0.90681845 0.90532017 0.9050327 ]
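
Cosine similarity is the dot product of two vectors divided by the product of their lengths, and the ranking step simply orders chunk indices by that score. As a rough illustration of what `cosine_similarity` followed by `argsort()[-k:][::-1]` computes, here is a dependency-free sketch on made-up 3-dimensional vectors; the helpers `cosine` and `top_k` are illustrative names, not library functions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k most similar vectors (best first) and all scores."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k], scores


# Toy data: the third vector points the same way as the query, the first is orthogonal
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [2.0, 0.0, 0.0]]

indices, scores = top_k(query_vec, doc_vecs, k=2)
print(indices)  # [2, 1]
```

Because the score depends only on direction, the vector `[2.0, 0.0, 0.0]` scores exactly as high as a unit vector along the same axis would.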


In [None]:
# Compute cosine similarity between the query embedding and each image embedding
image_similarities = [(idx, cosine_similarity(query_embeddings.reshape(1, -1), embed.reshape(1, -1))[0][0])
                      for idx, embed in enumerate(image_embeddings)]

In [None]:
# Sort by similarity and get top 2 most similar images
top_2_images = sorted(image_similarities, key=lambda x: x[1], reverse=True)[:2]

In [None]:
# Print top 2 similar images
print("\nTop 2 Most Similar Images (Indices):", [i[0] for i in top_2_images])
print("Top 2 Image Similarities:", [i[1] for i in top_2_images])


Top 2 Most Similar Images (Indices): [8, 9]
Top 2 Image Similarities: [0.26850927, 0.26751676]


# Prepare Context

In [None]:
# Combine the top transcription chunks into a single text context
text_context = ' '.join([audio_chunks[idx] for idx in top_k_audio_indices])

In [None]:
# Print combined text context
print(f"\nText Context (First 300 characters): {text_context[:300]}...")


Text Context (First 300 characters):  one of our most notable international challenges and an area I'd like to talk about in more detail. raging our brand and our ability to intercept customers while demonstrating value, not just in price  growth of the program because the average active member spends materially more annually and drive...


In [None]:
# Convert the top 2 images to base64 format
base64frames = []
for idx, _ in top_2_images:
    image_path = image_paths[idx]
    with open(image_path, "rb") as img_file:
        base64frames.append(base64.b64encode(img_file.read()).decode('utf-8'))

In [None]:
# Print base64 encoding of first image
print(f"\nBase64 of First Image (First 100 characters): {base64frames[0][:100]}...")


Base64 of First Image (First 100 characters): iVBORw0KGgoAAAANSUhEUgAABqQAAAiYCAIAAAA+NVHkAAEAAElEQVR4nOzddVxUWRsH8Gdm6C7pUFDBQDBRDBBFsbu7u3Vd27W7...
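
The Base64 string begins with `iVBORw0KGgo` because every PNG file starts with the same 8-byte signature. The sketch below encodes just that signature and wraps it in a data URL of the form used for inline images; `to_data_url` is a hypothetical helper, not part of the notebook.

```python
import base64


def to_data_url(raw_bytes, mime_type="image/png"):
    """Encode raw bytes as a Base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(raw_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# The fixed 8-byte signature that opens every PNG file
png_signature = b"\x89PNG\r\n\x1a\n"

url = to_data_url(png_signature)
print(url)  # data:image/png;base64,iVBORw0KGgo=
```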


# Generate Answer

In [None]:
# Create the OpenAI client and define the system prompt
client = OpenAI(api_key=api_key)
system_prompt = """
You are a financial adviser. Answer based on the provided financial data.
"""

In [None]:
# Prepare the list of images
image_data_list = [{"type": "image_url",
                    "image_url": {"url": f'data:image/png;base64,{img}', "detail": "high"}} for img in base64frames]

# Prepare message content: the question and retrieved text first, then the images.
# The content must be a flat list of parts, so the image parts are unpacked
# with * rather than nested as a list inside the list.
user_message_content = [
    {"type": "text", "text": f"Question: {query}\n\nContext:\n{text_context}"},
    *image_data_list
]

In [None]:
# Generate response from OpenAI
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message_content}
    ],
    temperature=0.3
)
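
The `content` of a multimodal user message is a flat list of parts, each a dict with a `type` key (`"text"` or `"image_url"`). A small builder makes that shape easy to check before any request is sent. This is a sketch only: `build_user_content`, its parameters, and the "Question / Context" text layout are assumptions made for illustration.

```python
def build_user_content(question, context, base64_images, detail="high"):
    """Build a flat list of content parts: one text part, then one part per image.

    Hypothetical helper; the text layout is an illustrative choice.
    """
    content = [{"type": "text", "text": f"Question: {question}\n\nContext:\n{context}"}]
    for img in base64_images:
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{img}", "detail": detail},
        })
    return content


# Placeholder strings stand in for real Base64 image data
content = build_user_content(
    "How is the company doing financially?",
    "Our Q3 total company revenue was $9.1 billion",
    ["AAA", "BBB"],
)

# Every part must be a dict; a nested list here would be an invalid payload
print([part["type"] for part in content])  # ['text', 'image_url', 'image_url']
```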

In [None]:
# Display the generated response
display(Markdown(response.choices[0].message.content))

Based on the financial data provided, here are some key insights and considerations for your business strategy:

### Current Performance Overview
1. **Revenue Trends**:
   - North America shows a slight increase in total net revenues from $6,737.8 million to $6,816.7 million (1.2% growth).
   - International revenues decreased by 6.6% in the most recent quarter, indicating potential challenges in that market.

2. **Operating Income**:
   - North America’s operating income has decreased slightly by 2.1%, while the international segment saw a significant drop of 23.2%. This suggests that cost management is critical, especially in international operations.

3. **Cost Management**:
   - Store operating expenses as a percentage of revenues have improved in North America (from 51.0% to 49.2%) but worsened in the international segment (from 50.5% to 47.9%). This indicates better efficiency in North America but highlights a need for improved cost control in international markets.

### Strategic Focus Areas
1. **Customer Engagement**:
   - Emphasize enhancing customer experience and engagement, particularly in North America where the average active member spends significantly more. Consider loyalty programs or exclusive offers to drive higher spending.

2. **Product Innovation**:
   - Accelerate the introduction of new products that align with customer preferences. The focus on integrating exciting products with relevant marketing can help capture new customers and retain existing ones.

3. **Market Expansion**:
   - While facing challenges in international markets, identify specific regions or demographics that show potential for growth. Tailoring strategies to local preferences may improve performance.

4. **Cost Efficiency**:
   - Continue to streamline operations and reduce costs, particularly in the international segment. This could involve reviewing supply chain efficiencies and administrative expenses.

5. **Long-term Opportunities**:
   - Focus on building a sustainable business model that can withstand short-term market fluctuations. This includes investing in technology and innovation to enhance productivity and customer engagement.

### Conclusion
While the North American segment shows resilience, the international market presents challenges that need to be addressed through targeted strategies. By focusing on customer engagement, product innovation, and cost efficiency, the business can position itself for long-term growth and stability.