
https://cloud.google.com/use-cases/ocr?hl=en#ocr-optical-character-recognition-with-world-class-google-cloud-ai 

In [1]:
%%capture --no-stderr
%pip install -U -q google-genai

In [2]:
import os
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
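The cell above only works inside Kaggle, because `kaggle_secrets` is not available elsewhere. A minimal sketch of a fallback, assuming the key may already be exported as an environment variable when the notebook runs locally. The helper name `load_api_key` is hypothetical and is not part of any library.

```python
import os


def load_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment, else from Kaggle secrets.

    `load_api_key` is an illustrative helper, not part of google-genai
    or kaggle_secrets.
    """
    key = os.environ.get(name)
    if key:
        return key
    try:
        # Only importable inside a Kaggle kernel.
        from kaggle_secrets import UserSecretsClient
    except ImportError as exc:
        raise RuntimeError(
            f"{name} is not set and kaggle_secrets is unavailable"
        ) from exc
    return UserSecretsClient().get_secret(name)


# Usage (commented out so that defining the helper has no side effects):
# os.environ["GOOGLE_API_KEY"] = load_api_key()
```

Checking the environment first means the same notebook can run unchanged on Kaggle and on a local machine.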

https://ai.google.dev/gemini-api/docs/document-processing?lang=python

In [4]:
from google import genai
from google.genai import types
import httpx

client = genai.Client()

doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"  # Replace with the actual URL of your PDF

# Download the PDF as raw bytes; raise if the request fails
doc_response = httpx.get(doc_url)
doc_response.raise_for_status()
doc_data = doc_response.content

# prompt = "Summarize this document"
# prompt = "Extract abstract of this document"
prompt = "summarize the results in figure 1b"
response = client.models.generate_content(
    model="gemini-1.5-flash",
    contents=[
        types.Part.from_bytes(
            data=doc_data,
            mime_type="application/pdf",
        ),
        prompt,
    ],
)
print(response.text)

Figure 1b shows AlphaFold's TM-score compared to other groups for six new protein folds identified by CASP13 assessors.  AlphaFold achieves significantly higher TM-scores than other groups for each of these folds, demonstrating its ability to predict novel protein structures with high accuracy.  One fold (T1017s2-D1) is not shown due to unavailability for publication.
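The download step in the cell above passes whatever the server returned straight to the model. If the URL served an HTML error page instead of the paper, the request would still be sent with `mime_type='application/pdf'`. A small sanity check, sketched under the assumption that a valid PDF begins with the `%PDF-` header bytes. The function name `ensure_pdf` is hypothetical.

```python
def ensure_pdf(data: bytes) -> bytes:
    """Return `data` unchanged if it looks like a PDF, else raise ValueError.

    This only inspects the leading bytes; it does not validate the
    full file structure.
    """
    if not data:
        raise ValueError("download returned no data")
    if not data.startswith(b"%PDF-"):
        # Show a short prefix to make HTML error pages easy to spot.
        raise ValueError(
            f"payload does not look like a PDF; starts with {data[:20]!r}"
        )
    return data


# Usage with the bytes downloaded earlier (commented out: needs network):
# doc_data = ensure_pdf(httpx.get(doc_url).content)
```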


In [10]:
prompt = "print out the results of AlphaFold (or AF) in figure 1c as a table"
response = client.models.generate_content(
    model="gemini-1.5-flash",
    contents=[
        types.Part.from_bytes(
            data=doc_data,
            mime_type="application/pdf",
        ),
        prompt,
    ],
)
print(response.text)

Here's a table summarizing the contact precisions from Figure 1c of the AlphaFold paper:

| Set       | FM (N=31) | FM/TBM (N=12) | TBM (N=61) |  AF 498 (032) | AF 498 (032) | AF 498 (032) |
|-----------|------------|-----------------|-------------|----------------|----------------|----------------|
|           | L long      | L/2 long        | L/5 long     | L long          | L/2 long        | L/5 long        |
| **Contact precisions** |            |                 |             |                |                |                |
| **AF**     | 45.5       | 42.9            | 39.8        | 58.0           | 55.1           | 51.7           |
| **498**    | 59.1       | 53.0            | 48.9        | 74.2           | 64.5           | 64.2           |
| **032**    | 68.3       | 65.5            | 61.9        | 82.4           | 80.3           | 76.4           |
| **AF**     |            |                 |             | 90.6           | 90.5           | 87.1           |


**Note:**  "L" 
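Because the model returns the table as Markdown text, it can help to turn it into plain Python lists before checking the numbers against the figure. The following is a rough sketch, not a full Markdown parser. It assumes pipe-delimited rows that start and end with `|`, and it does not handle escaped pipes inside cells. The function name `parse_markdown_table` is hypothetical. It returns lists rather than dicts because model output can repeat column names, as the header row above does.

```python
def parse_markdown_table(text: str) -> tuple[list[str], list[list[str]]]:
    """Extract the first column layout and all rows of pipe tables in `text`.

    Returns (header, rows). Lines that are not pipe-delimited are skipped,
    and separator rows such as |---|---| are dropped.
    """
    table_rows = []
    for line in text.splitlines():
        line = line.strip()
        if len(line) < 2 or not (line.startswith("|") and line.endswith("|")):
            continue
        # Drop the outer pipes, then split the remainder into cells.
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        table_rows.append(cells)

    if not table_rows:
        return [], []

    header, body = table_rows[0], table_rows[1:]

    def is_separator(row: list[str]) -> bool:
        # A separator row has only dashes and colons in every cell.
        return all(cell and set(cell) <= set("-:") for cell in row)

    rows = [row for row in body if not is_separator(row)]
    return header, rows


# Usage with a model response (commented out: needs an API call):
# header, rows = parse_markdown_table(response.text)
```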

### <font color='red'> Multiple PDFs </font>

In [7]:
from google import genai
import io
import httpx

client = genai.Client()

doc_url_1 = "https://arxiv.org/pdf/2312.11805" # Replace with the URL to your first PDF
doc_url_2 = "https://arxiv.org/pdf/2403.05530" # Replace with the URL to your second PDF

# Retrieve and upload both PDFs using the File API
doc_data_1 = io.BytesIO(httpx.get(doc_url_1).content)
doc_data_2 = io.BytesIO(httpx.get(doc_url_2).content)

sample_pdf_1 = client.files.upload(
    file=doc_data_1,
    config=dict(mime_type="application/pdf"),
)
sample_pdf_2 = client.files.upload(
    file=doc_data_2,
    config=dict(mime_type="application/pdf"),
)

prompt = "What is the difference between each of the main benchmarks between these two papers? Output these in a table."

response = client.models.generate_content(
    model="gemini-1.5-flash",
    contents=[sample_pdf_1, sample_pdf_2, prompt],
)
print(response.text)

Here's a table summarizing the differences between the main benchmarks used in the two papers (Gemini 1.0 and Gemini 1.5).  Note that a complete comparison is difficult because the papers don't always use the same metrics and reporting methods and there is some overlap in benchmarks.

| Benchmark Category | Gemini 1.0 Benchmarks | Gemini 1.5 Benchmarks | Key Differences |
|---|---|---|---|
| **Text-based Reasoning & Language Modeling** | MMLU, GSM8K, MATH, BIG-Bench-Hard, HellaSwag, DROP, HumanEval, Natural2Code | MMLU, GSM8K, MATH, BIG-Bench-Hard, HellaSwag, DROP, HumanEval, Natural2Code,  WMT23, MGSM,  Plus internal benchmarks  |  Gemini 1.5 adds WMT23 and MGSM for multilingual performance.  Internal benchmarks introduced, likely held out for more robust evaluation. MMLU, GSM8K etc. are updated versions with possibly different prompt engineering. |
| **Image Understanding** | VQAv2, TextVQA, DocVQA, ChartQA, InfographicVQA, Ai2D, MathVista | VQAv2, TextVQA, DocVQA, ChartQA, Infograph