
# Load a PDF from a dataset on Kaggle

## Step 1: Import Python Packages

In [1]:
import os
import time
import google.generativeai as genai
from kaggle_secrets import UserSecretsClient

## Step 2: Authenticate with Google Generative AI

In [2]:
user_secrets = UserSecretsClient()
ai_studio_token = user_secrets.get_secret("GEMINI_API_KEY_SECOND")
genai.configure(api_key=ai_studio_token)
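
Outside a Kaggle session the `kaggle_secrets` module cannot be imported, so the cell above fails when the notebook is run elsewhere. A minimal sketch of a fallback follows. It assumes the key is also exported as an environment variable under the same name; the helper name `load_api_key` is hypothetical and not part of either library.

```python
import os


def load_api_key(secret_name, env=None):
    """Return the key from Kaggle Secrets if available, else from the environment."""
    env = os.environ if env is None else env
    try:
        # Only importable inside a Kaggle notebook session.
        from kaggle_secrets import UserSecretsClient
    except ImportError:
        UserSecretsClient = None

    if UserSecretsClient is not None:
        return UserSecretsClient().get_secret(secret_name)

    # Assumption: outside Kaggle the key lives in an environment variable.
    key = env.get(secret_name)
    if not key:
        raise RuntimeError(f"No API key found for {secret_name!r}")
    return key
```

The result could then be passed to `genai.configure(api_key=...)` exactly as in the cell above.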

## Step 3: Define helper functions

In [3]:
def upload_to_gemini(path, mime_type=None):
  """Uploads the given file to Gemini.

  See https://ai.google.dev/gemini-api/docs/prompting_with_media
  """
  file = genai.upload_file(path, mime_type=mime_type)
  print(f"Uploaded file '{file.display_name}' as: {file.uri}")
  return file

def wait_for_files_active(files):
  """Waits for the given files to be active.

  Some files uploaded to the Gemini API need to be processed before they can be
  used as prompt inputs. The status can be seen by querying the file's "state"
  field.

  This implementation uses a simple blocking polling loop. Production code
  should probably employ a more sophisticated approach.
  """
  print("Waiting for file processing...")
  for name in (file.name for file in files):
    file = genai.get_file(name)
    while file.state.name == "PROCESSING":
      print(".", end="", flush=True)
      time.sleep(10)
      file = genai.get_file(name)
    if file.state.name != "ACTIVE":
      raise RuntimeError(f"File {file.name} failed to process (state: {file.state.name})")
  print("...all files ready")
  print()
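
The docstring of `wait_for_files_active` notes that production code should use something more robust than a fixed 10-second loop. One possible shape is sketched below, with exponential backoff and an overall deadline. The function name `wait_until_active` and all of its parameters are hypothetical; the state is read through a caller-supplied callable, so the sketch makes no assumption about the Gemini client beyond the `"PROCESSING"` and `"ACTIVE"` state names used above.

```python
import time


def wait_until_active(get_state, timeout_s=600.0, initial_delay_s=2.0,
                      max_delay_s=30.0, sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns "ACTIVE".

    get_state is any zero-argument callable returning the state name as a string.
    Raises RuntimeError on any state other than PROCESSING/ACTIVE and
    TimeoutError once the deadline has passed.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        state = get_state()
        if state == "ACTIVE":
            return
        if state != "PROCESSING":
            raise RuntimeError(f"File processing ended in state {state}")
        if clock() >= deadline:
            raise TimeoutError(f"File still processing after {timeout_s} s")
        sleep(delay)
        # Double the wait after each attempt, up to max_delay_s.
        delay = min(delay * 2, max_delay_s)
```

With the helpers above, a call might look like `wait_until_active(lambda: genai.get_file(file.name).state.name)`.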

## Step 4: Load the Gemini 1.5 model

In [4]:
# Create the model
generation_config = {
  "temperature": 1,
  "top_p": 0.95,
  "top_k": 64,
  "max_output_tokens": 8192,
  "response_mime_type": "text/plain",
}

model = genai.GenerativeModel(
  model_name="gemini-1.5-flash",
  generation_config=generation_config,
)

## Step 5: Upload your large file to Gemini 1.5

In [5]:
romanian_book = "/kaggle/input/scrisori-ctre-vasile-alecsandri-de-ion-ghica/Scrisori_catre_Vasile_Alecsandri.pdf"

In [6]:
files = [
  upload_to_gemini(romanian_book, mime_type="application/pdf"),
]

wait_for_files_active(files)

chat_session = model.start_chat(
  history=[
    {
      "role": "user",
      "parts": [
        files[0],
      ],
    }
  ]
)

Uploaded file 'Scrisori_catre_Vasile_Alecsandri.pdf' as: https://generativelanguage.googleapis.com/v1beta/files/xxpnawn5uhzf
Waiting for file processing...
...all files ready



## Step 6: Ask Gemini 1.5 questions about your large file

In [7]:
response = chat_session.send_message("Make a summary of this book, and write this summary in French. Do it in less than 1000 typographical signs")
print(response.text)
print(response.usage_metadata)

Voici un résumé du livre "Scrisori către Vasile Alecsandri" d'Ion Ghica, en français et en moins de 1000 signes :

Ce recueil de lettres, adressées à Vasile Alecsandri, couvre une période de la vie de l'auteur, notamment la vie politique tumultueuse des Principautés roumaines au XIXe siècle. Ghica décrit avec précision les conditions sociales, les luttes politiques, les personnages clés, et les évènements marquants de cette période.  Il évoque  la corruption, la pauvreté, les tensions entre les différentes communautés, ainsi que l'influence des grandes puissances européennes.  Le ton est souvent anecdotique et nostalgique,  témoignant d'une grande connaissance de l'époque.

prompt_token_count: 421698
candidates_token_count: 164
total_token_count: 421862



In [8]:
response = chat_session.send_message("Make the summary of the first chapter in the book. Do it in maximum 3 phrases. Write this answer in German.")
print(response.text)
print(response.usage_metadata)

Das erste Kapitel bietet eine Einleitung zu Ion Ghicas Briefen an Vasile Alecsandri.  Es beschreibt den gesellschaftlichen und politischen Wandel Rumäniens im 19. Jahrhundert und die Bedeutung der Briefe als historische Quelle.  Ghica betont dabei den Einfluss des aufkommenden Nationalismus und die Herausforderungen der Zeit.

prompt_token_count: 421889
candidates_token_count: 68
total_token_count: 421957
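
The usage metadata printed for the two questions shows that the whole book is counted in the prompt on every turn of the chat. The short check below uses only the numbers shown in the outputs above. Reading the 27-token remainder as the text of the second question is an assumption, not something the API reports.

```python
# Usage figures copied from the two outputs above.
turns = [
    {"prompt": 421698, "candidates": 164, "total": 421862},
    {"prompt": 421889, "candidates": 68, "total": 421957},
]

for turn in turns:
    # The reported total is the sum of prompt and response tokens.
    assert turn["prompt"] + turn["candidates"] == turn["total"]

# Growth of the prompt between the first and the second question.
prompt_growth = turns[1]["prompt"] - turns[0]["prompt"]

# Part of the growth not explained by the first answer (164 tokens);
# presumably the second question itself.
remainder = prompt_growth - turns[0]["candidates"]

print(prompt_growth, remainder)  # 191 27
```

Since each follow-up question is billed against roughly 420,000 prompt tokens, long chats over a large file add up quickly.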



Credit:
 - Adapted from https://aistudio.google.com/app/prompts/video-qa