<div id="singlestore-header" style="display: flex; background-color: rgba(209, 153, 255, 0.25); padding: 5px;">
    <div id="icon-image" style="width: 90px; height: 90px;">
        <img width="100%" height="100%" src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/vector-circle.png" />
    </div>
    <div id="text" style="padding: 5px; margin-left: 10px;">
        <div id="badge" style="display: inline-block; background-color: rgba(0, 0, 0, 0.15); border-radius: 4px; padding: 4px 8px; align-items: center; margin-top: 6px; margin-bottom: -2px; font-size: 80%">LLMs</div>
        <h1 style="font-weight: 500; margin: 8px 0 0 4px;">Large-Context Prompting with Gemini 1.5 Pro
        </h1>
    </div>
</div>

In [12]:
import os
os.environ["GEMINI_API_KEY"] = 'Key'  # Replace 'Key' with your own Gemini API key

## Setup
First, install the required libraries.

In [13]:
!pip install PyPDF2



## Text input into Gemini
We'll be prompting Gemini with multiple modalities. Let's start with text:

In [14]:
import requests
from io import BytesIO
from PyPDF2 import PdfReader

# Function to load content from a PDF

In [15]:
def load_pdf_from_url(url):
    """
    Reads the text content from a PDF file at a specified URL and returns it as a single string.

    Parameters:
    - url (str): The URL to the PDF file.

    Returns:
    - str: The concatenated text content of all pages in the PDF.

    Raises:
    - requests.exceptions.RequestException: If the request to the URL fails.
    - PyPDF2.errors.PdfReadError: If the PDF file is encrypted or malformed.

    Example:
    >>> pdf_text = load_pdf_from_url("https://example.com/example.pdf")
    >>> print(pdf_text)
    "This is the text content extracted from the PDF file."
    """
    # Fetch the PDF content from the URL
    response = requests.get(url, timeout=30)  # Fail instead of hanging on a stalled connection
    response.raise_for_status()  # Ensure that the request was successful

    # Create a file-like object from the downloaded PDF content
    pdf_file = BytesIO(response.content)

    # Read the PDF using PyPDF2
    reader = PdfReader(pdf_file)
    text = ""
    for page in reader.pages:
        text += page.extract_text() or ""

    return text

# Example usage
link = "https://github.com/saurabhgssingh/RAG/blob/f1aefa5e969a559e88d9a4f03a12d1504e5580c4/state_of_the_union.pdf"
pdf_url = (link.replace("github.com", "raw.githubusercontent.com").replace("/blob/", "/"))
#print(pdf_url)
pdf_text = load_pdf_from_url(pdf_url)
print(pdf_text)


Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress 
and the Cabinet. Justices of the Supreme Court. My fellow Americans.  
Last year COVID -19 kept us apart. This year we are finally together again. Tonight, we meet as 
Democrats, Republicans, and Independents. But most importantly as Americans. With a duty to one 
another, to the American people, to the Constitution. And with an  unwavering resolve that freedom will 
always triumph over tyranny.  
 
Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world, thinking he could 
make it bend to his menacing ways. But he badly miscalculated. He thought he could roll into Ukraine 
and the world would roll over. Instead, he met a  wall of strength he never imagined. He met the 
Ukrainian people. From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their 
determination, inspires the world.  
 
Groups of citizens blocking tanks with their bodie
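
The example above turns a GitHub "blob" page link into a raw-file link with two string replacements. A slightly more defensive version of the same idea is sketched below; the helper name `github_blob_to_raw` is hypothetical and not part of the notebook, and the sketch assumes the usual `/<owner>/<repo>/blob/<ref>/<path>` layout of GitHub file links.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url: str) -> str:
    """Convert a github.com '/blob/' URL to its raw.githubusercontent.com form.

    Hypothetical helper for illustration; it assumes the path layout
    /<owner>/<repo>/blob/<ref>/<path...>.
    """
    parts = urlsplit(url)
    if parts.netloc != "github.com":
        raise ValueError(f"Not a github.com URL: {url}")

    segments = parts.path.lstrip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"Not a GitHub blob URL: {url}")

    # Drop only the 'blob' segment that follows <owner>/<repo>.
    raw_path = "/" + "/".join(segments[:2] + segments[3:])
    return urlunsplit((parts.scheme, "raw.githubusercontent.com", raw_path, parts.query, ""))


link = "https://github.com/saurabhgssingh/RAG/blob/f1aefa5e969a559e88d9a4f03a12d1504e5580c4/state_of_the_union.pdf"
print(github_blob_to_raw(link))
```

Unlike a plain `str.replace`, this only rewrites the host and the `blob` segment in its expected position, so a file path that happens to contain `/blob/` further down is left alone.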

# Few-shot Learning with 10 SST2 examples

In [16]:
# A few-shot learning prompt for sentiment analysis, modeled after the SST-2 (Stanford Sentiment Treebank 2) dataset structure. 
# The format consists of the sentence followed by its sentiment label.

prompt = '''Task: Sentiment Analysis
Given a sentence, classify its sentiment as either positive or negative.

Examples:

Example 1:
Sentence: "The movie was a fantastic experience!"
Sentiment: Positive

Example 2:
Sentence: "I really enjoyed the storyline and the characters."
Sentiment: Positive

Example 3:
Sentence: "This was a waste of my time."
Sentiment: Negative

Example 4:
Sentence: "I wouldn't recommend this to anyone."
Sentiment: Negative

Example 5:
Sentence: "The plot was intriguing and kept me engaged."
Sentiment: Positive

Example 6:
Sentence: "The acting was terrible and the script was even worse."
Sentiment: Negative

Example 7:
Sentence: "An absolute masterpiece that I will remember for a long time."
Sentiment: Positive

Example 8:
Sentence: "I was bored throughout the entire film."
Sentiment: Negative

Example 9:
Sentence: "The special effects were stunning and added a lot to the movie."
Sentiment: Positive

Example 10:
Sentence: "It had potential but ended up being disappointing."
Sentiment: Negative

Now, analyze the sentiment of the following sentences:

Sentence: "The visuals were impressive, but the story was lackluster."
Sentiment: 

Sentence: "A brilliant piece of cinema that left me speechless."
Sentiment: 

Sentence: "The film was too long and uninteresting."
Sentiment: 

Sentence: "An engaging and well-crafted narrative."
Sentiment: 
'''

# Generating 10K samples for Gemini's Large-Context Prompt

In [17]:
# Code to generate the samples dynamically
# See Many-Shot In-Context Learning (https://arxiv.org/abs/2404.11018)
# Paper review: https://aman.ai/papers/#many-shot-in-context-learning

import random

# Define lists of roughly 100 positive and 100 negative example sentences
positive_sentiments = [
    "The movie was a fantastic experience!",
    "I really enjoyed the storyline and the characters.",
    "The plot was intriguing and kept me engaged.",
    "An absolute masterpiece that I will remember for a long time.",
    "The special effects were stunning and added a lot to the movie.",
    "A brilliant piece of cinema that left me speechless.",
    "An engaging and well-crafted narrative.",
    "A thoroughly enjoyable and entertaining movie.",
    "The direction and performances were top-notch.",
    "A must-watch for any movie lover.",
    "The film was a visual treat with great performances.",
    "A heartwarming story that resonated with me.",
    "Excellent cinematography and a captivating plot.",
    "A great blend of action, drama, and humor.",
    "The soundtrack added so much to the experience.",
    "A moving and beautifully told story.",
    "The actors delivered outstanding performances.",
    "The film exceeded all my expectations.",
    "A beautifully crafted piece of cinema.",
    "A gripping and intense film from start to finish.",
    "A wonderful depiction of the story with great depth.",
    "A perfect mix of suspense and emotion.",
    "A refreshing take on a well-known genre.",
    "The film's pacing was perfect, keeping me engaged.",
    "A delightful movie with a lot of heart.",
    "The plot twists were unexpected and thrilling.",
    "The movie was both entertaining and thought-provoking.",
    "A visually stunning and emotionally rich film.",
    "An inspiring and uplifting story.",
    "The character development was exceptional.",
    "A beautiful portrayal of human emotions.",
    "A film that keeps you on the edge of your seat.",
    "A masterful performance by the lead actor.",
    "A heartfelt and sincere movie experience.",
    "The film had a perfect blend of humor and drama.",
    "A touching story that left a lasting impression.",
    "A brilliant adaptation of the original story.",
    "A film that I would gladly watch again.",
    "An epic tale told with great skill and passion.",
    "The dialogue was sharp and witty.",
    "A magical journey from start to finish.",
    "The film had a powerful and meaningful message.",
    "A must-see for fans of the genre.",
    "The director's vision was executed perfectly.",
    "A poignant and emotional story.",
    "The film's realism was both shocking and beautiful.",
    "A deeply moving and thought-provoking film.",
    "An incredible journey that I enjoyed thoroughly.",
    "The cinematography was breathtaking.",
    "A compelling narrative that kept me hooked.",
    "An absolute delight for the senses.",
    "The movie's themes were explored beautifully.",
    "A stunning achievement in filmmaking.",
    "A film that touched my heart.",
    "A timeless story told in a unique way.",
    "The chemistry between the leads was fantastic.",
    "A film that is both entertaining and profound.",
    "A remarkable and unforgettable movie.",
    "The visual effects were top-notch.",
    "A deeply engaging and rewarding experience.",
    "An outstanding piece of storytelling.",
    "A film that offers both thrills and heart.",
    "A touching and beautifully crafted movie.",
    "The movie's message was powerful and clear.",
    "An excellent example of its genre.",
    "A captivating and emotional rollercoaster.",
    "The film was executed with great precision.",
    "An unforgettable and moving film.",
    "A powerful story told with great sensitivity.",
    "A film that is both smart and entertaining.",
    "The movie had a lot of heart and soul.",
    "A masterclass in acting and direction.",
    "A beautifully written and directed film.",
    "The film's emotional impact was profound.",
    "A perfect film in every way.",
    "The movie was a beautiful experience.",
    "An inspiring tale of resilience and hope.",
    "A film that was both entertaining and insightful.",
    "A rich and textured story.",
    "An unforgettable cinematic experience.",
    "The film's attention to detail was remarkable.",
    "A touching and heartfelt story.",
    "A film that left me with a smile.",
    "The movie was a joy to watch.",
    "An impressive and captivating film.",
    "A story that was both unique and universal.",
    "A film that will stay with me for a long time.",
    "A powerful and emotional journey.",
    "The movie was a perfect blend of art and entertainment.",
    "A thoroughly enjoyable film experience.",
    "A movie that was both engaging and thought-provoking.",
    "An expertly crafted film.",
    "A deeply affecting and beautifully told story.",
    "A film that was as entertaining as it was moving.",
    "An excellent film that I highly recommend.",
    "A beautiful and inspiring story."
]

negative_sentiments = [
    "This was a waste of my time.",
    "I wouldn't recommend this to anyone.",
    "The acting was terrible and the script was even worse.",
    "I was bored throughout the entire film.",
    "It had potential but ended up being disappointing.",
    "The film was too long and uninteresting.",
    "The story was predictable and unoriginal.",
    "I found the movie to be extremely dull.",
    "The characters were poorly developed.",
    "An overhyped movie that didn't live up to expectations.",
    "The plot was a mess and hard to follow.",
    "The special effects couldn't save the bad script.",
    "The movie lacked any real substance.",
    "A cliched and uninspired film.",
    "The performances were wooden and unconvincing.",
    "The film felt disjointed and poorly paced.",
    "A forgettable and bland movie.",
    "The dialogue was cringe-worthy.",
    "A poorly executed film with little to offer.",
    "I regret watching this movie.",
    "The direction was amateurish.",
    "The film was a complete letdown.",
    "A boring and uneventful movie.",
    "The movie failed to keep my interest.",
    "A lackluster and uninspired film.",
    "The story was weak and unengaging.",
    "The film's pacing was all over the place.",
    "A dull and lifeless movie.",
    "The characters were one-dimensional and boring.",
    "The film was poorly written and directed.",
    "A tedious and monotonous movie.",
    "The film's plot was full of holes.",
    "A movie that was a chore to sit through.",
    "The acting was subpar and unconvincing.",
    "The film lacked any real excitement.",
    "A movie that fell flat in every way.",
    "The ending was unsatisfying and abrupt.",
    "The film was poorly edited.",
    "A movie that tried too hard and failed.",
    "The plot was convoluted and confusing.",
    "A disappointing and forgettable film.",
    "The movie was overly long and boring.",
    "A film that lacked any real emotion.",
    "The performances were lackluster.",
    "A movie that was all style and no substance.",
    "The film was a major disappointment.",
    "A poorly acted and directed movie.",
    "The story was unoriginal and boring.",
    "A film that was difficult to get through.",
    "The movie had no real plot.",
    "A film that was completely uninteresting.",
    "The direction was sloppy.",
    "The movie was a letdown in every way.",
    "The film lacked any real tension.",
    "A boring and uninspired movie.",
    "The movie was a disaster from start to finish.",
    "The plot was dull and unengaging.",
    "A film that was hard to sit through.",
    "The acting was terrible and unconvincing.",
    "The film was a total waste of time.",
    "The movie was boring and lifeless.",
    "A film that failed to deliver.",
    "The plot was weak and unoriginal.",
    "A movie that was difficult to watch.",
    "The film had no redeeming qualities.",
    "A dull and uninspired movie.",
    "The movie was a mess.",
    "The film was poorly made.",
    "A boring and forgettable movie.",
    "The acting was bad and the story was worse.",
    "A film that was a complete waste of time.",
    "The movie was uninteresting and dull.",
    "A poorly executed and boring film.",
    "The story was predictable and boring.",
    "A film that was lacking in every way.",
    "The movie was a complete failure.",
    "The film was a major disappointment.",
    "A movie that was boring and unoriginal.",
    "The plot was uninteresting and dull.",
    "A film that was a waste of time.",
    "The movie was poorly directed and acted.",
    "A boring and uninspired film.",
    "The movie was a complete letdown.",
    "The film was a chore to watch.",
    "A movie that was devoid of any real emotion.",
    "The film was a total bore.",
    "The story was dull and predictable.",
    "A movie that was poorly made and uninteresting.",
    "The film was a waste of my time.",
    "A boring and lifeless movie.",
    "The movie was a total disappointment.",
    "I didn't like it.",
    "Nothing to see here.",
    "See something else.",
    "Bad movie."
]

# Generate 10,000 labeled samples
samples = []
for i in range(10000):
    if random.random() < 0.5:
        sentiment = "Positive"
        sentence = random.choice(positive_sentiments)
    else:
        sentiment = "Negative"
        sentence = random.choice(negative_sentiments)
    # Include the label so each sample is a complete few-shot example
    samples.append(f"Example {i+1}:\nSentence: \"{sentence}\"\nSentiment: {sentiment}")
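
The generated samples still need to be joined into a single prompt before they can be sent to the model. The sketch below shows one way to do that, reusing the layout of the 10-example prompt from the previous section. The function name `build_many_shot_prompt` and its parameters are assumptions made for illustration, and the tiny stand-in lists only exist so the block runs on its own.

```python
import random


def build_many_shot_prompt(positive, negative, n_examples, queries, seed=0):
    """Assemble a many-shot sentiment prompt (hypothetical helper, not from the notebook)."""
    rng = random.Random(seed)  # Seeded so the prompt is reproducible
    lines = [
        "Task: Sentiment Analysis",
        "Given a sentence, classify its sentiment as either positive or negative.",
        "",
        "Examples:",
        "",
    ]
    for i in range(n_examples):
        if rng.random() < 0.5:
            label, sentence = "Positive", rng.choice(positive)
        else:
            label, sentence = "Negative", rng.choice(negative)
        lines += [f"Example {i + 1}:", f'Sentence: "{sentence}"', f"Sentiment: {label}", ""]

    lines += ["Now, analyze the sentiment of the following sentences:", ""]
    for query in queries:
        lines += [f'Sentence: "{query}"', "Sentiment: ", ""]
    return "\n".join(lines)


# Stand-in lists; in the notebook, positive_sentiments and negative_sentiments
# would be passed instead, with n_examples=10000.
demo_prompt = build_many_shot_prompt(
    positive=["A must-watch for any movie lover."],
    negative=["Bad movie."],
    n_examples=4,
    queries=["The film was too long and uninteresting."],
)
print(demo_prompt)
```

The resulting string can be passed to the model in place of the 10-example `prompt`.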

## Installing Vertex AI and Generative AI 

In [18]:
!pip install google-generativeai
!pip install vertexai



## Gemini Text Output

In [19]:
import google.generativeai as genai
def generate_response(model, prompt):
    gemini_api_key = os.getenv("GEMINI_API_KEY")
    if not gemini_api_key:
        raise ValueError("Gemini API Key not provided. Please provide GEMINI_API_KEY as an environment variable")
    genai.configure(api_key=gemini_api_key)
    answer = model.generate_content([prompt])
    return answer.text

In [None]:
model = genai.GenerativeModel('gemini-1.5-pro-latest')
print(generate_response(model, prompt))
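
The model returns free-form text, so the predicted labels have to be pulled out of the response before they can be compared or counted. The following is a minimal sketch that assumes the response echoes the `Sentiment: <label>` pattern used in the prompt; the actual output format is not guaranteed, and `parse_sentiment_labels` is a hypothetical helper, not part of any library.

```python
import re


def parse_sentiment_labels(response_text):
    """Return the Positive/Negative labels found in the text, in order of appearance.

    Assumes lines of the form 'Sentiment: Positive', optionally wrapped in
    Markdown bold markers such as 'Sentiment: **Positive**'.
    """
    pattern = re.compile(r"Sentiment:\s*\**\s*(Positive|Negative)", re.IGNORECASE)
    return [match.group(1).capitalize() for match in pattern.finditer(response_text)]


# Made-up response text, used only to exercise the parser.
sample_response = (
    'Sentence: "The visuals were impressive, but the story was lackluster."\n'
    "Sentiment: Negative\n\n"
    'Sentence: "A brilliant piece of cinema that left me speechless."\n'
    "Sentiment: **Positive**\n"
)
print(parse_sentiment_labels(sample_response))
```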

# Additional code to experiment with multimodal prompting and RAG with Gemini
The sections below contain extra code for trying out prompting with modalities beyond text, as well as RAG.

# Image Input to Gemini

In [None]:
!curl -o image.jpg https://storage.googleapis.com/generativeai-downloads/images/jetpack.jpg
!curl -O https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4



  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  349k  100  349k    0     0   632k      0 --:--:-- --:--:-- --:--:--  647k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4835k  100 4835k    0     0  5474k      0 --:--:-- --:--:-- --:--:-- 5501k


In [None]:

sample_file = genai.upload_file(path="image.jpg", display_name="Sample drawing")
print(f"Uploaded file '{sample_file.display_name}' as: {sample_file.uri}")
file = genai.get_file(name=sample_file.name)
print(f"Retrieved file '{file.display_name}' as: {file.uri}")
# Set the model to Gemini 1.5 Pro.
model = genai.GenerativeModel(model_name="models/gemini-1.5-pro-latest")
response = model.generate_content(["Describe the image with a creative description.", sample_file])
print(response.text, end="")
# Markdown(">" + response.text)
genai.delete_file(sample_file.name)
print(f'Deleted {sample_file.display_name}.')

# Video Input to Gemini

In [None]:
video_file_name = "https://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4"

### Feeding in Video to Gemini

In [None]:
import cv2
import os
import shutil

# Create or cleanup existing extracted image frames directory.
FRAME_EXTRACTION_DIRECTORY = "content/frames"
FRAME_PREFIX = "_frame"
def create_frame_output_dir(output_dir):
  if not os.path.exists(output_dir):
    os.makedirs(output_dir)
  else:
    shutil.rmtree(output_dir)
    os.makedirs(output_dir)

def extract_frame_from_video(video_file_path):
  print(f"Extracting {video_file_path} at 1 frame per second. This might take a bit...")
  create_frame_output_dir(FRAME_EXTRACTION_DIRECTORY)
  vidcap = cv2.VideoCapture(video_file_path)
  fps = vidcap.get(cv2.CAP_PROP_FPS)
  print(fps)
  frame_duration = 1 / fps  # Time interval between frames (in seconds)
  output_file_prefix = os.path.basename(video_file_path).replace('.', '_')
  frame_count = 0
  count = 0
  while vidcap.isOpened():
      success, frame = vidcap.read()
      if not success: # End of video
          break
      if int(count / fps) == frame_count: # Extract a frame every second
          minutes = frame_count // 60  # Avoid shadowing the built-in min()
          seconds = frame_count % 60
          time_string = f"{minutes:02d}:{seconds:02d}"
          image_name = f"{output_file_prefix}{FRAME_PREFIX}{time_string}.jpg"
          output_filename = os.path.join(FRAME_EXTRACTION_DIRECTORY, image_name)
          cv2.imwrite(output_filename, frame)
          frame_count += 1
      count += 1
  vidcap.release() # Release the capture object
  print(f"Completed video frame extraction!\n\nExtracted: {frame_count} frames")

extract_frame_from_video(video_file_name)

Extracting https://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4 at 1 frame per second. This might take a bit...
24.0
Completed video frame extraction!

Extracted: 597 frames
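
The sampling rule in the loop, `int(count / fps) == frame_count`, keeps the first frame of every elapsed second, so 597 extracted frames correspond to a clip of just under ten minutes. The sketch below mirrors that rule without OpenCV so it can be checked on small numbers; `sampled_frame_indices` and `timestamp_label` are hypothetical names introduced for illustration.

```python
def sampled_frame_indices(total_frames, fps):
    """Mirror the extraction loop: keep the first frame of each elapsed second."""
    kept = []
    next_second = 0
    for index in range(total_frames):
        if int(index / fps) == next_second:
            kept.append(index)
            next_second += 1
    return kept


def timestamp_label(second):
    """Format a whole number of seconds as MM:SS, as used in the frame filenames."""
    return f"{second // 60:02d}:{second % 60:02d}"


# Three seconds of video at 24 frames per second.
indices = sampled_frame_indices(total_frames=72, fps=24.0)
print(indices)
print([timestamp_label(second) for second in range(len(indices))])

# The last of 597 extracted frames carries this label.
print(timestamp_label(596))
```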


In [None]:
import os

class File:
  def __init__(self, file_path: str, display_name: str = None):
    self.file_path = file_path
    # Always set display_name so the attribute exists even when no name is given
    self.display_name = display_name
    self.timestamp = get_timestamp(file_path)

  def set_file_response(self, response):
    self.response = response

def get_timestamp(filename):
  """Extracts the timestamp string (for example '00:00') from a filename with the format
     'output_file_prefix_frame00:00.jpg'.
  """
  parts = filename.split(FRAME_PREFIX)
  if len(parts) != 2:
      return None  # Indicates the filename might be incorrectly formatted
  return parts[1].split('.')[0]

# Process each frame in the output directory
files = os.listdir(FRAME_EXTRACTION_DIRECTORY)
files = sorted(files)
files_to_upload = []
for file in files:
  files_to_upload.append(
      File(file_path=os.path.join(FRAME_EXTRACTION_DIRECTORY, file)))

# Upload the files to the API
# Only upload a 10 second slice of files to reduce upload time.
# Change full_video to True to upload the whole video.
full_video = False

uploaded_files = []
print(f'Uploading {len(files_to_upload) if full_video else 10} files. This might take a bit...')

for file in files_to_upload if full_video else files_to_upload[40:50]:
  print(f'Uploading: {file.file_path}...')
  response = genai.upload_file(path=file.file_path)
  file.set_file_response(response)
  uploaded_files.append(file)

print(f"Completed file uploads!\n\nUploaded: {len(uploaded_files)} files")

In [None]:
## List files uploaded in the API
for n, f in zip(range(len(uploaded_files)), genai.list_files()):
 print(f.uri)

## Multimodal input to Gemini: Text and Video

In [None]:
## Create the prompt.
prompt = "Summarize the above text and then describe this video."
# Separate the document text from the instruction with a blank line
prompt = pdf_text + "\n\n" + prompt

# Set the model to Gemini 1.5 Pro.
model = genai.GenerativeModel(model_name="models/gemini-1.5-pro-latest")

# Make GenerateContent request with the structure described above.
def make_request(prompt, files):
  request = [prompt]
  for file in files:
    request.append(file.timestamp)
    request.append(file.response)
  return request

# Make the LLM request.
#request = make_request([pdf_text, prompt], uploaded_files)
request = make_request(prompt, uploaded_files)
response = model.generate_content(request,
                                  request_options={"timeout": 600})
print(response.text)
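
The request built by `make_request` is a flat list: the text prompt first, then a timestamp string followed by the matching uploaded frame, repeated for each frame. The sketch below reproduces that layout with placeholder objects so the ordering is easy to inspect; `FakeFrame` is a hypothetical stand-in for the notebook's `File` class, and its `response` strings replace the objects returned by `genai.upload_file`.

```python
from dataclasses import dataclass


@dataclass
class FakeFrame:
    """Stand-in for the notebook's File objects (hypothetical, for illustration only)."""
    timestamp: str
    response: str  # In the notebook this is the uploaded-file object


def make_request(prompt, files):
    # Same interleaving as in the notebook: prompt, then (timestamp, frame) pairs.
    request = [prompt]
    for file in files:
        request.append(file.timestamp)
        request.append(file.response)
    return request


frames = [FakeFrame("00:40", "<frame 40>"), FakeFrame("00:41", "<frame 41>")]
request = make_request("Describe this video.", frames)
print(request)
```

Placing each timestamp directly before its frame gives the model a textual anchor for where in the video that image belongs.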

In [None]:
print(f'Deleting {len(uploaded_files)} images. This might take a bit...')
for file in uploaded_files:
 genai.delete_file(file.response.name)
 print(f'Deleted {file.file_path} at URI {file.response.uri}')
print(f"Completed deleting files!\n\nDeleted: {len(uploaded_files)} files")

# Retrieval Augmented Generation

In [None]:
!pip install wget --quiet
!pip install openai==1.3.3 --quiet

In [None]:
import json
import os
import pandas as pd
import wget

In [None]:
## Import the library for vectorizing the data (Up to 2 minutes)
!pip install sentence-transformers --quiet


In [None]:

from sentence_transformers import SentenceTransformer

modelRAG = SentenceTransformer('flax-sentence-embeddings/all_datasets_v3_mpnet-base')

In [None]:
## Download the AG News samples CSV file
csv_file_path = 'https://raw.githubusercontent.com/openai/openai-cookbook/main/examples/data/AG_news_samples.csv'
file_path = 'AG_news_samples.csv'

if not os.path.exists(file_path):
    wget.download(csv_file_path, file_path)
    print('File downloaded successfully.')
else:
    print('File already exists in the local file system.')

In [None]:
df = pd.read_csv('AG_news_samples.csv')
df

In [None]:
data = df.to_dict(orient='records')
data[0]

In [None]:
shared_tier_check = %sql show variables like 'is_shared_tier'
if not shared_tier_check or shared_tier_check[0][1] == 'OFF':
    %sql DROP DATABASE IF EXISTS news;
    %sql CREATE DATABASE news;

In [None]:
%%sql
DROP TABLE IF EXISTS news_articles;
CREATE TABLE IF NOT EXISTS news_articles (
    title TEXT,
    description TEXT,
    genre TEXT,
    embedding BLOB,
    FULLTEXT (title, description)
);

In [None]:
# Will take around 3.5 minutes to get embeddings for all 2000 rows

descriptions = [row['description'] for row in data]
all_embeddings = modelRAG.encode(descriptions)
all_embeddings.shape

In [None]:
for row, embedding in zip(data, all_embeddings):
    row['embedding'] = embedding

In [None]:
data[0]

In [None]:
%sql TRUNCATE TABLE news_articles;

import sqlalchemy as sa
from singlestoredb import create_engine

# Use create_engine from singlestoredb since it uses the notebook connection URL
conn = create_engine().connect()

statement = sa.text('''
        INSERT INTO news_articles (
            title,
            description,
            genre,
            embedding
        )
        VALUES (
            :title,
            :description,
            :label,
            :embedding
        )
    ''')

conn.execute(statement, data)

In [None]:
# Semantic search with a single keyword as the query
search_query = 'Aussie'
search_embedding = modelRAG.encode(search_query)

query_statement = sa.text('''
    SELECT
        title,
        description,
        genre,
        DOT_PRODUCT(embedding, :embedding) AS score
    FROM news_articles
    ORDER BY score DESC
    LIMIT 10
    ''')


# Execute the SQL statement.
results = pd.DataFrame(conn.execute(query_statement, dict(embedding=search_embedding)))
print(results)
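
Ranking by `DOT_PRODUCT` matches ranking by cosine similarity only when the stored vectors and the query vector all have unit length. The notebook does not state whether this embedding model normalizes its output, so that is worth checking before relying on the scores. The pure-Python sketch below uses made-up two-dimensional vectors to show how the two rankings can differ when lengths vary.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Made-up vectors for illustration only.
query = [1.0, 0.0]
short_doc = [0.9, 0.1]   # Points almost the same way as the query
long_doc = [3.0, 3.0]    # Larger magnitude, but 45 degrees away from the query

# The raw dot product favors the longer vector...
print(dot(query, short_doc), dot(query, long_doc))
# ...while cosine similarity favors the better-aligned one.
print(cosine(query, short_doc), cosine(query, long_doc))
# After normalization, the dot product equals the cosine similarity.
print(dot(normalize(query), normalize(long_doc)))
```

If the embeddings turn out not to be unit-length, normalizing them before the insert (and normalizing the query embedding) would make `DOT_PRODUCT` behave like cosine similarity.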

In [None]:
# Hybrid search for "Articles about Aussie captures"
hyb_query = 'Articles about Aussie captures'
hyb_embedding = modelRAG.encode(hyb_query)

# Create the SQL statement.
hyb_statement = sa.text('''
    SELECT
        title,
        description,
        genre,
        DOT_PRODUCT(embedding, :embedding) AS semantic_score,
        MATCH(title, description) AGAINST (:query) AS keyword_score,
        (semantic_score + keyword_score) / 2 AS combined_score
    FROM news_articles
    ORDER BY combined_score DESC
    LIMIT 10
    ''')

# Execute the SQL statement.
hyb_results = pd.DataFrame(conn.execute(hyb_statement, dict(embedding=hyb_embedding, query=hyb_query)))
hyb_results
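
The hybrid query averages the semantic and keyword scores directly. The two scores are not guaranteed to share a scale, in which case one of them can dominate the average. The sketch below shows an alternative, not the notebook's method: rescale each score list to the range 0 to 1 before combining. The function names, the `weight` parameter, and the score values are all assumptions made up for illustration.

```python
def min_max(values):
    """Rescale values linearly to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combine(semantic, keyword, weight=0.5):
    """Weighted average of rescaled scores; weight applies to the semantic score."""
    sem_n, key_n = min_max(semantic), min_max(keyword)
    return [weight * s + (1 - weight) * k for s, k in zip(sem_n, key_n)]


# Made-up scores for three candidate rows.
semantic_scores = [0.62, 0.58, 0.40]
keyword_scores = [0.0, 4.0, 1.0]

combined = combine(semantic_scores, keyword_scores)
best = max(range(len(combined)), key=combined.__getitem__)
print(combined, best)
```

In practice this rescaling would be applied to the rows returned by the query, for example on the `semantic_score` and `keyword_score` columns of `hyb_results`.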

In [None]:
import google.generativeai as genai

def generate_response(model, prompt):
    gemini_api_key = os.getenv("GEMINI_API_KEY")
    if not gemini_api_key:
        raise ValueError("Gemini API Key not provided. Please provide GEMINI_API_KEY as an environment variable")
    genai.configure(api_key=gemini_api_key)
    # pdf_text (loaded earlier) is sent along with the prompt
    answer = model.generate_content([pdf_text, prompt])
    return answer.text

# Use a separate name so the SentenceTransformer stored in modelRAG is not overwritten
gemini_model = genai.GenerativeModel('gemini-1.5-pro-latest')
results_string = hyb_results.to_string()
print(generate_response(gemini_model, hyb_query + "\n\n" + results_string))
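
Passing the output of `DataFrame.to_string()` works, but the model may follow the retrieved articles more easily when they are laid out as numbered passages with the question stated explicitly. The sketch below is one possible layout under that assumption; `build_rag_prompt`, its `max_chars` parameter, and the instruction wording are hypothetical, and the rows are made-up stand-ins for `hyb_results.to_dict(orient="records")`.

```python
def build_rag_prompt(question, rows, max_chars=500):
    """Format retrieved rows as numbered passages followed by the question.

    Each row is a dict with 'title', 'description' and 'genre' keys, matching
    the columns selected by the hybrid search query.
    """
    context_blocks = []
    for i, row in enumerate(rows, start=1):
        description = row["description"][:max_chars]  # Truncate long passages
        context_blocks.append(f"[{i}] {row['title']} ({row['genre']})\n{description}")
    context = "\n\n".join(context_blocks)
    return (
        "Answer the question using only the articles below.\n\n"
        f"Articles:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up rows for illustration only.
demo_rows = [
    {"title": "Title A", "description": "First description.", "genre": "Sports"},
    {"title": "Title B", "description": "Second description.", "genre": "World"},
]
rag_prompt = build_rag_prompt("Articles about Aussie captures", demo_rows)
print(rag_prompt)
```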

# Clean up

In [None]:
# shared_tier_check = %sql show variables like 'is_shared_tier'
# if not shared_tier_check or shared_tier_check[0][1] == 'OFF':
#     %sql DROP DATABASE IF EXISTS news;