<a href="https://colab.research.google.com/github/rahiakela/general-utility-notebooks/blob/main/arabic_to_eng_with_gemini.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup

**Reference**:

https://colab.research.google.com/github/google/generative-ai-docs/blob/main/site/en/gemini-api/docs/vision.ipynb

In [None]:
!pip install -q -U google-generativeai

In [None]:
%%shell

pip install pillow
pip install pdf2image

In [None]:
!sudo apt-get install -y build-essential libpoppler-cpp-dev pkg-config python3-dev
!sudo apt-get install -y tesseract-ocr
!sudo apt-get install -y poppler-utils

In [None]:
import os
import tempfile
from pdf2image import convert_from_path
from PIL import Image
import base64
from IPython.display import Markdown

import google.generativeai as genai

In [None]:
!wget "https://github.com/rahiakela/genai-research-and-practice/raw/main/gemini-projects/dataset.zip?raw=true" -O dataset.zip
!unzip -o dataset.zip

In [None]:
from google.colab import userdata
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

genai.configure(api_key=GOOGLE_API_KEY)

## PDF to Image

In [None]:
def convert_pdf(file_path, output_path):
    """Render every page of a PDF and stack the pages vertically into one image."""
    # render pages into a temp dir that is deleted after we are finished
    with tempfile.TemporaryDirectory() as temp_dir:

        # convert the PDF into one PIL image per page
        pages = convert_from_path(file_path, output_folder=temp_dir)

        # copy pixel data into memory before the temp files are deleted;
        # RGB is also the mode JPEG output expects
        imgs = [page.convert("RGB") for page in pages]

    # find maximum width of images
    max_img_width = max(img.width for img in imgs)

    # find total height of all images
    total_height = sum(img.height for img in imgs)

    # create new image object with width and total height
    merged_image = Image.new("RGB", (max_img_width, total_height))

    # paste images together one by one, top to bottom
    y = 0
    for img in imgs:
        merged_image.paste(img, (0, y))
        y += img.height

    # save merged image
    merged_image.save(output_path)

    return output_path

In [None]:
!mkdir -p img_output

In [None]:
output_path = convert_pdf("Input_arabic.pdf", "img_output/input_arabic.jpg")
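
Stacking every page into one canvas produces a very tall image for multi-page PDFs. As a rough sanity check, the merged size can be computed from the page sizes before saving. This is only a minimal sketch: `merged_size` and `fits_in_jpeg` are hypothetical helpers, the page sizes are made-up values, and the 65,500-pixel default is an assumption based on the per-side limit that libjpeg-based encoders typically enforce.

```python
def merged_size(page_sizes):
    """Return (width, height) of a canvas that stacks pages vertically."""
    widths = [w for w, _ in page_sizes]
    heights = [h for _, h in page_sizes]
    return max(widths), sum(heights)


def fits_in_jpeg(size, max_side=65500):
    """Check both sides against an assumed per-side encoder limit."""
    width, height = size
    return width <= max_side and height <= max_side


# Hypothetical example: 40 pages of 1654 x 2339 px each
# (roughly A4 rendered at 200 dpi).
sizes = [(1654, 2339)] * 40
canvas = merged_size(sizes)
print(canvas, fits_in_jpeg(canvas))
```

If the check fails, rendering at a lower resolution or keeping the pages as separate images may be safer than one merged file.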

## Image bytes

In [None]:
from PIL import Image

# Specify the file path
file_path = 'img_output/input_arabic.jpg'
image = Image.open(file_path)

In [None]:
display(image)

In [None]:
# Convert the image to bytes
import io
buffered = io.BytesIO()
image.save(buffered, format="JPEG")
img_bytes = buffered.getvalue()
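
The saved JPEG can also be read straight from disk, which avoids decoding and re-encoding it. The following is a minimal sketch, assuming the same inline-part layout used in the request cells of this notebook (a dict with `mime_type` and base64-encoded `data`). `make_image_part` and `image_part_from_file` are hypothetical helpers, not part of `google.generativeai`.

```python
import base64
from pathlib import Path


def make_image_part(data: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as an inline part with base64-encoded data."""
    return {
        "mime_type": mime_type,
        "data": base64.b64encode(data).decode("utf-8"),
    }


def image_part_from_file(path: str, mime_type: str = "image/jpeg") -> dict:
    """Read an already-encoded image file without re-compressing it."""
    return make_image_part(Path(path).read_bytes(), mime_type)


# Round-trip check with placeholder bytes (not a real image).
part = make_image_part(b"\xff\xd8\xff\xe0 placeholder")
assert base64.b64decode(part["data"]) == b"\xff\xd8\xff\xe0 placeholder"
```

With the file from the previous section, `image_part_from_file('img_output/input_arabic.jpg')` would yield a dict that can be placed in the list passed to `generate_content`.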

## Translation

In [None]:
# Choose a Gemini model
model = genai.GenerativeModel(model_name="gemini-1.5-pro")

# Create a prompt
prompt = "You are an expert Arabic-to-English translator. Translate the Arabic text in the provided image into English."
response = model.generate_content(
    [
        {
            "mime_type": "image/jpeg",
            "data": base64.b64encode(img_bytes).decode("utf-8"),
        },
        prompt,
    ]
)

In [None]:
Markdown(">" + response.text)

>This document appears to be a financial report in Arabic, likely audited by Deloitte.  Due to the image quality and the narrow, elongated format, it is extremely difficult to provide an accurate and complete translation.  The blurriness makes many of the numbers and even some of the words illegible.  A clearer image, or ideally a text version of the document, would be required for a proper translation.

However, I can provide a general idea of what some sections *likely* contain based on common financial document structure and the few legible words:

* **Deloitte header/cover:** This clearly identifies Deloitte as the firm involved. The Arabic text likely refers to the report's title and perhaps the client's name.
* **Tables of numbers:**  These likely represent financial data.  Common elements that might appear include:
    * **Assets, Liabilities, and Equity:**  (موجودات, مطلوبات, حقوق الملكية)
    * **Income Statement:** (قائمة الدخل) Showing revenues, expenses, and profit/loss.
    * **Cash Flow Statement:** (قائمة التدفقات النقدية) Detailing cash inflows and outflows.
* **Arabic text descriptions:**  These sections likely explain the figures in the tables, provide context, and offer analysis. They may also include footnotes and disclosures.
* **"Scanned by CamScanner" footer:** This simply indicates the method used to create the digital copy.

To get a usable translation, you will need to provide a clearer image or a text version of the document. If you have specific sections you are most interested in, please provide cropped, higher-resolution images of those sections and I will do my best to translate them.


```log
This document appears to be a financial report or audit statement prepared by Deloitte, likely for a client in a region where Arabic is used. Because the image is blurry and fragmented, providing a completely accurate translation is impossible. However, I can give you a general idea of what some sections likely contain:

Cover Page: This shows the Deloitte logo and likely includes information like the report title, client name (redacted in this case), and date. The Arabic phrase likely translates to something similar to "Independent Auditor's Report."

Subsequent Pages: These pages contain financial data presented in tables. Typical elements that can be inferred, though the numbers are unreadable:

Amounts in Arabic numerals: These are financial figures, likely in the local currency.
Column Headings: Likely represent periods (e.g., "Current Year," "Prior Year," possibly quarters or months). Other columns might indicate "Description" or "Account Name."
Row Labels (Arabic text): These would be the names of accounts (e.g., "Cash and Cash Equivalents," "Accounts Receivable," "Revenue," "Expenses," "Net Income," etc.). Due to blurriness, providing specific translations is impossible.
Footnotes (Arabic text at page bottoms): These provide further explanations or details regarding the figures presented in the tables. They often explain accounting policies or significant events.
"Scanned by CamScanner": Indicates the document was digitally scanned.
Narrative Sections (Arabic Text): These sections, also too blurry to read, would contain explanations and analysis of the financial data. They'd likely cover topics like:

Basis of Presentation: Explains the accounting standards followed (e.g., IFRS).
Key Performance Indicators: Discussion of important financial metrics.
Risk Factors: Potential issues that could affect the company's financial performance.
Auditor's Opinion: Deloitte's formal statement on the fairness and accuracy of the financial statements.
To provide a more useful translation, you would need to provide a clearer image of the document. If you can provide a sharper image of specific sections you are most interested in, I can attempt a more precise translation.
```

In [None]:
model = genai.GenerativeModel(model_name="gemini-1.5-flash")

In [None]:
# Create a prompt
prompt = "You are an expert Arabic-to-English translator. Translate the Arabic text in the provided image into English."
response = model.generate_content(
    [
        {
            "mime_type": "image/jpeg",
            "data": base64.b64encode(img_bytes).decode("utf-8"),
        },
        prompt,
    ]
)

In [None]:
Markdown(">" + response.text)

>I cannot provide a complete translation of the provided document because the image quality is poor and blurry.  Much of the text is unreadable.  To get an accurate translation, I need a clearer image.


However, I can offer some observations:

* **The document appears to be a financial report or audit.**  There are numerous tables with numbers, likely representing financial data.  There are also sections that appear to be explanatory text.
* **The language is Arabic.**  The text is written from right to left.
* **Deloitte is mentioned.** This suggests the document is related to an audit or financial review conducted by the accounting firm.
* **There are page numbers visible.** Indicating the length of the original document.

To receive an accurate translation, please provide a clear and high-resolution image of the document.
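
The responses above blame image quality (and, for `gemini-1.5-pro`, the narrow, elongated format) and ask for cropped sections. One option is to cut the merged image back into page-sized tiles and send them one at a time. The following is a sketch under stated assumptions: `tile_boxes` and `split_image` are hypothetical helpers, and the tile height of 2,000 pixels is an arbitrary illustrative value, not something tuned for this document.

```python
def tile_boxes(width, height, tile_height):
    """Return (left, upper, right, lower) crop boxes covering the image top to bottom."""
    if tile_height <= 0:
        raise ValueError("tile_height must be positive")
    boxes = []
    for top in range(0, height, tile_height):
        # the last tile is shortened so it never runs past the bottom edge
        boxes.append((0, top, width, min(top + tile_height, height)))
    return boxes


def split_image(path, tile_height=2000):
    """Crop the image at `path` into tiles; requires Pillow."""
    from PIL import Image  # imported lazily so tile_boxes works without Pillow

    with Image.open(path) as img:
        tiles = []
        for box in tile_boxes(img.width, img.height, tile_height):
            tile = img.crop(box)
            # make sure pixel data is loaded before the source file is closed
            tile.load()
            tiles.append(tile)
    return tiles


print(tile_boxes(1654, 5000, 2000))
```

Each tile returned by `split_image('img_output/input_arabic.jpg')` could then be converted to bytes and sent in its own request, in the same way as the single merged image above. Whether this actually improves the translation depends on the quality of the original scan.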
