<a href="https://colab.research.google.com/github/javatcoding1/MRI-Prediction/blob/main/site/en/gemini-api/docs/vision.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Copyright 2024 Google LLC.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Explore vision capabilities with the Gemini API

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://ai.google.dev/gemini-api/docs/vision"><img src="https://ai.google.dev/static/site-assets/images/docs/notebook-site-button.png" height="32" width="32" />View on ai.google.dev</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google/generative-ai-docs/blob/main/site/en/gemini-api/docs/vision.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/google/generative-ai-docs/blob/main/site/en/gemini-api/docs/vision.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

The Gemini API is able to process images and videos, enabling a multitude of
 exciting developer use cases. Some of Gemini's vision capabilities include
 the ability to:

*   Caption and answer questions about images
*   Transcribe and reason over PDFs, including long documents that fit within the 2 million token context window
*   Describe, segment, and extract information from videos,
including both visual frames and audio, up to 90 minutes long
*   Detect objects in an image and return bounding box coordinates for them

This tutorial demonstrates some possible ways to prompt the Gemini API with
images and video input, provides code examples,
and outlines prompting best practices with multimodal vision capabilities.
All output is text-only.

## Setup

Before you use the File API, you need to install the Gemini API SDK package and configure an API key. This section describes how to complete these setup steps.

### Install the Python SDK and import packages

The Python SDK for the Gemini API is contained in the [google-generativeai](https://pypi.org/project/google-generativeai/) package. Install the dependency using pip.

In [1]:
!pip install -q -U google-generativeai

Import the necessary packages.

In [2]:
import google.generativeai as genai
from IPython.display import Markdown

### Set up your API key

The File API uses API keys for authentication and access. Uploaded files are associated with the project linked to the API key. Unlike other Gemini APIs that use API keys, your API key also grants access to data you've uploaded to the File API, so take extra care in keeping your API key secure. For more on keeping your keys
secure, see [Best practices for using API
keys](https://support.google.com/googleapi/answer/6310037).

Store your API key in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or are unfamiliar with Colab Secrets, refer to the [Authentication quickstart](https://github.com/google-gemini/gemini-api-cookbook/blob/main/quickstarts/Authentication.ipynb).

In [5]:
from google.colab import userdata
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

genai.configure(api_key=GOOGLE_API_KEY)
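
Outside Colab, `google.colab` cannot be imported, so the cell above fails. A minimal sketch of one possible fallback, assuming the key is also exported as an environment variable named `GOOGLE_API_KEY`. The helper name `load_api_key` and its `env` parameter are hypothetical and not part of the SDK:

```python
import os


def load_api_key(name="GOOGLE_API_KEY", env=None):
    """Return the key from Colab Secrets when available, else from the environment."""
    env = os.environ if env is None else env
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None
    if userdata is not None:
        try:
            key = userdata.get(name)
            if key:
                return key
        except Exception:
            # Secret missing or notebook access not granted: fall back to env.
            pass
    key = env.get(name)
    if not key:
        raise RuntimeError(f"Set {name} as a Colab Secret or an environment variable.")
    return key
```

With this helper, the configuration call would become `genai.configure(api_key=load_api_key())`.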

## Prompting with images

In this tutorial, you will upload images using the File API or as inline data and generate content based on those images.

### Technical details (images)
Gemini 1.5 Pro and Flash support a maximum of 3,600 image files.

Images must be in one of the following image data [MIME types](https://developers.google.com/drive/api/guides/ref-export-formats):

-   PNG - `image/png`
-   JPEG - `image/jpeg`
-   WEBP - `image/webp`
-   HEIC - `image/heic`
-   HEIF - `image/heif`
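
The list above can be turned into a quick pre-upload check. This is a sketch only: it assumes the file extension reflects the real image format, and `supported_mime_type` is a hypothetical helper name.

```python
from pathlib import Path

# Extension -> MIME type, limited to the formats listed above.
SUPPORTED_IMAGE_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".heic": "image/heic",
    ".heif": "image/heif",
}


def supported_mime_type(path):
    """Return the MIME type for a supported image extension, or None."""
    return SUPPORTED_IMAGE_TYPES.get(Path(path).suffix.lower())
```

A `None` result signals that the file should be converted to one of the supported formats before it is sent.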

Each image is equivalent to 258 tokens.

Apart from the model’s context window, there is no specific limit on the number of pixels in an image. Larger images are scaled down to fit within 3072x3072 pixels, preserving their original aspect ratio, and smaller images are scaled up to 768x768 pixels. Smaller images do not cost less, apart from bandwidth, and higher-resolution images do not improve performance.
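
The figures in this section can be combined into a rough budget estimate. The sketch below assumes a flat 258 tokens per image and models only the downscaling step. How the service rounds dimensions, and how it upscales small images, is not specified here, so `round` is an assumption and both function names are hypothetical.

```python
IMAGE_TOKENS = 258  # tokens per image, as stated in this section
MAX_SIDE = 3072     # larger images are scaled down to fit 3072x3072


def image_token_cost(num_images):
    """Total tokens consumed by num_images images."""
    return num_images * IMAGE_TOKENS


def downscaled_size(width, height, max_side=MAX_SIDE):
    """Fit (width, height) inside max_side x max_side, keeping the aspect ratio."""
    scale = min(max_side / width, max_side / height)
    if scale >= 1:
        # Already within bounds; upscaling of small images is not modelled.
        return width, height
    return round(width * scale), round(height * scale)
```

For example, the maximum of 3,600 images corresponds to `image_token_cost(3600)`, which is 928,800 tokens, and a 6144x3072 image would be scaled to 3072x1536.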

For best results:

*   Rotate images to the correct orientation before uploading.
*   Avoid blurry images.
*   If using a single image, place the text prompt after the image.

## Image input

When the total image payload is smaller than 20 MB, it's recommended to either send
Base64-encoded images or directly upload locally stored image files.
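
A small size check can help decide between inline data and the File API. The sketch below makes two assumptions that the text does not confirm: that "20 MB" means 20 MiB, and that the Base64-encoded size is what counts toward the limit (the conservative reading). The helper names are hypothetical.

```python
INLINE_LIMIT_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" read as 20 MiB


def base64_size(num_bytes):
    """Length in bytes of the Base64 encoding of num_bytes raw bytes."""
    return 4 * ((num_bytes + 2) // 3)


def fits_inline(file_sizes, prompt_bytes=0, limit=INLINE_LIMIT_BYTES):
    """True if the encoded images plus the prompt stay under the limit."""
    total = prompt_bytes + sum(base64_size(n) for n in file_sizes)
    return total < limit
```

Base64 inflates data by about one third, so a single 16 MiB image already exceeds the limit under these assumptions.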

### Base64 encoded images

You can send an image inline by encoding its bytes as a Base64 payload.
To send a public image URL, first fetch the image bytes, for example with the httpx library.
The following code example reads a locally stored image file and sends it this way:

In [27]:
import httpx
import base64
import mimetypes

# Path to the image
image_path = "/content/SOB_M_DC-14-2523-400-005.png"  # Replace with any image

# Detect MIME type dynamically
mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"  # Default to JPEG if unknown

# Read image file as binary data
with open(image_path, "rb") as image_file:
    image_data = image_file.read()

# Choose a Gemini model
model = genai.GenerativeModel(model_name="gemini-1.5-pro")

# Prompt requesting structured output (sent as a regular content part, not a system instruction)
prompt = """You are analyzing a medical scan image. Provide the following structured output:

- **Scan Type:** (e.g., MRI, CT Scan, X-ray)
- **Organ:** (e.g., Brain, Lung, Heart, Breast)
- **Tumor Type:** (If detected, specify type)
- **Tumor Subclass:** (If applicable)
- **Detailed Description:** (Size, shape, and location if possible)
- **Possible Causes:** (List genetic factors, environmental exposure, lifestyle)
- **Clinical Insights:** (Any relevant medical observations)

Ensure the output is structured in plain text without any extra information or unrelated content."""

# Generate response
response = model.generate_content(
    [
        {
            "mime_type": mime_type,
            "data": base64.b64encode(image_data).decode("utf-8"),
        },
        prompt,
    ]
)

# Extract response text
response_text = response.text

# Print the structured response
print(response_text)

# Extract specific values (optional)
lines = response_text.split("\n")
scan_type = lines[0].replace("Scan Type:", "").strip() if len(lines) > 0 else "Unknown"
organ_type = lines[1].replace("Organ:", "").strip() if len(lines) > 1 else "Unknown"

# Print extracted values (for further use)
print("Scan Type:", scan_type)
print("Organ Type:", organ_type)


- **Scan Type:** Microscopic image (likely H&E stain)
- **Organ:**  Placenta
- **Tumor Type:** Not evident
- **Tumor Subclass:** Not applicable
- **Detailed Description:**  Image shows placental villi and intervillous space. No obvious abnormalities are readily apparent. 
- **Possible Causes:** Not applicable (no pathology identified)
- **Clinical Insights:** Normal placental tissue. Further clinical correlation and potentially additional sections would be needed for a full diagnosis.  The white spaces suggest tissue separation artifact, a common occurrence in slide preparation.

Scan Type: - **** Microscopic image (likely H&E stain)
Organ Type: - ****  Placenta
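
The positional `lines[0]` and `lines[1]` lookup above leaves the Markdown bullet and bold markers in the extracted values, as the last two output lines show. A minimal sketch of a key-based alternative, assuming the model keeps the `- **Field:** value` layout requested in the prompt. `parse_fields` is a hypothetical helper:

```python
import re

# Matches lines such as "- **Scan Type:** MRI"; bullet and bold markers are optional.
FIELD_PATTERN = re.compile(r"^\s*[-*]?\s*(?:\*\*)?([^*:]+):(?:\*\*)?\s*(.*)$")


def parse_fields(response_text):
    """Map each field name in the response to its value, without Markdown markers."""
    fields = {}
    for line in response_text.splitlines():
        match = FIELD_PATTERN.match(line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields
```

With this helper, `parse_fields(response_text).get("Organ", "Unknown")` would return `Placenta` for the response shown above, regardless of which line the field appears on.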


### Multiple images

To prompt with multiple images in Base64 encoded format, you can do the
following:
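
A minimal sketch, reusing the inline-data dictionary format from the single-image example: one part per image, followed by the text prompt. The helper names and the file paths in the usage comment are hypothetical.

```python
import base64


def inline_image_part(image_bytes, mime_type):
    """Build one inline-data part in the same shape as the single-image example."""
    return {
        "mime_type": mime_type,
        "data": base64.b64encode(image_bytes).decode("utf-8"),
    }


def build_multi_image_request(images, prompt):
    """images is an iterable of (raw_bytes, mime_type) pairs; the prompt goes last."""
    parts = [inline_image_part(data, mime) for data, mime in images]
    parts.append(prompt)
    return parts


# Usage (needs the configured Gemini `model` from the earlier cell):
# images = [(open(p, "rb").read(), "image/png") for p in ("scan_a.png", "scan_b.png")]
# response = model.generate_content(
#     build_multi_image_request(images, "Compare these scans."))
```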

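The remaining cells in this section take a different step: they inspect the `organ_type` value parsed above and use it to choose which locally trained Keras classifier to run on the image.

In [23]: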
organ_type

'- **** Lung'

In [29]:
if "brain" in organ_type.lower():  # substring match tolerates the Markdown residue left in organ_type
  import tensorflow as tf
  import numpy as np
  from tensorflow.keras.models import load_model
  from tensorflow.keras.preprocessing import image

  # Load the trained model
  model = load_model("brain_model.h5")

  # Model summary (optional, to verify)
  model.summary()

  # Function to preprocess an input image
  def preprocess_image(img_path):
      img_size = (299, 299)  # Match model's input shape
      img = image.load_img(img_path, target_size=img_size)
      img_array = image.img_to_array(img)  # Convert to numpy array
      img_array = np.expand_dims(img_array, axis=0)  # Add batch dimension
      img_array = tf.keras.applications.xception.preprocess_input(img_array)  # Apply Xception preprocessing
      return img_array

  # Load and preprocess an MRI image
  img_path = "/content/men.jpeg"  # Replace with your image path
  processed_img = preprocess_image(img_path)

  # Perform prediction
  predictions = model.predict(processed_img)

  # Decode predictions (assuming 4 classes)
  class_labels = ["Class 1", "Class 2", "Class 3", "Class 4"]  # Replace with actual class names
  predicted_class = class_labels[np.argmax(predictions)]

  # Print results
  print("Predicted Class:", predicted_class)
  print("Prediction Probabilities:", predictions)
else:
  # Note: every organ other than brain falls through to the breast-tumor model.
  import tensorflow as tf
  from tensorflow.keras.models import load_model
  from tensorflow.keras.preprocessing import image
  import numpy as np

  # Load the trained model
  model = load_model("/content/breast_tumor.h5")  # Ensure the correct file path

  # Function to preprocess input image
  def preprocess_image(img_path):
      img = image.load_img(img_path, target_size=(244, 244))  # Resize to match model input
      img_array = image.img_to_array(img)
      img_array = np.expand_dims(img_array, axis=0)  # Add batch dimension
      img_array = img_array / 255.0  # Normalize pixel values
      return img_array

  # Path to the test image
  img_path = "/content/SOB_M_DC-14-2523-400-005.png"  # Change this to your actual image path

  # Preprocess and predict
  img_array = preprocess_image(img_path)
  prediction = model.predict(img_array)

  # Interpret the results
  class_labels = ["Benign", "Malignant"]  # Adjust labels if different
  predicted_class = np.argmax(prediction)  # Get class index

  print(f"Predicted Class: {class_labels[predicted_class]}")
  print(f"Confidence Scores: {prediction}")


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 218ms/step
Predicted Class: Malignant
Confidence Scores: [[0.19755012 0.8024499 ]]
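
The cell above sends every non-brain result to the breast-tumor model, so an organ such as Placenta or Lung would still be labelled Benign or Malignant. A sketch of a stricter routing table, assuming the model paths, input sizes, and labels used in the cell above. `CLASSIFIERS`, `select_classifier`, and `top_prediction` are hypothetical names:

```python
# Per-organ settings copied from the cell above.
CLASSIFIERS = {
    "brain": {
        "path": "brain_model.h5",
        "input_size": (299, 299),
        "labels": ["Class 1", "Class 2", "Class 3", "Class 4"],
    },
    "breast": {
        "path": "/content/breast_tumor.h5",
        "input_size": (244, 244),
        "labels": ["Benign", "Malignant"],
    },
}


def select_classifier(organ_type):
    """Return the config whose organ name appears in organ_type, else None."""
    text = organ_type.lower()
    for organ, config in CLASSIFIERS.items():
        if organ in text:
            return config
    return None


def top_prediction(probabilities, labels):
    """Return (label, probability) for the highest score in a one-row batch."""
    row = list(probabilities[0])
    best = max(range(len(row)), key=row.__getitem__)
    return labels[best], row[best]
```

A `None` result from `select_classifier` lets the notebook skip classification for organs that have no trained model, rather than reporting a result from the wrong model.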


### Upload one or more locally stored image files

Alternatively, you can upload one or more locally stored image files.

You can download and use our drawings of [piranha-infested waters](https://storage.googleapis.com/generativeai-downloads/images/piranha.jpg) and a [firefighter with a cat](https://storage.googleapis.com/generativeai-downloads/images/firefighter.jpg). First, save these files to your local directory.

Then click **Files** on the left sidebar. For each file, click the **Upload** button, then navigate to that file's location and upload it:

<img width=400 src="https://ai.google.dev/tutorials/images/colab_upload.png">

When the combination of files and system instructions that you intend to send is larger than 20 MB in size, use the File API to upload those files. Smaller files can instead be read from local storage and sent to the Gemini API directly:
