In [None]:
# STEP 1: Install the Google Gen AI SDK (the package that provides `from google import genai`)
!pip install -q google-genai

In [None]:
import os
# Replace the placeholder with your own key; genai.Client() reads it from this variable
os.environ["GOOGLE_API_KEY"] = "ENTER API KEY"

# Document Understanding Capability

Gemini models can process documents, using 'native vision' to understand entire document contexts. This goes beyond 'simple text extraction'.

## 1. Simple Text Extraction

Simple text extraction means pulling out text (written words) from a source — usually an image, PDF, or document — without understanding or analyzing it.

It’s a basic OCR (Optical Character Recognition) process.

a) The system looks at an image or scanned page.

b) It detects characters (letters, numbers, punctuation).

c) Then it converts them into machine-readable text.

Example:
You upload an image of a bill that says:

“Total: ₹450”

Simple text extraction will just output:
Total: ₹450

It doesn’t interpret or analyze meaning — it only extracts raw text.
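
To make the limitation concrete, here is a minimal sketch of what has to happen after simple text extraction. It assumes the OCR step has already produced the raw string (`ocr_text` is a hypothetical stand-in, and no OCR library is called). Any meaning, such as which number is the total, has to be recovered with hand-written rules like the hypothetical `parse_total` helper.

```python
import re
from typing import Optional

# Hypothetical raw output of an OCR step for the bill in the example above.
ocr_text = "Total: ₹450"


def parse_total(text: str) -> Optional[int]:
    """Pull the rupee amount after 'Total:' out of raw text, or None if absent."""
    match = re.search(r"Total:\s*₹\s*([\d,]+)", text)
    if match is None:
        return None
    # Drop thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))


print(parse_total(ocr_text))
```

A rule like this works only for the layouts it was written for; a bill that prints "Amount due" instead of "Total" would need another rule.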

## 2. Native Vision

Native Vision refers to built-in visual understanding capabilities of an AI model — meaning the model can directly “see” and reason about images, not just extract text.

This includes:

a) Recognizing objects, people, or scenes.

b) Describing images in natural language.

c) Reading charts, graphs, or handwritten notes.

d) Combining image and text understanding (multimodal reasoning).

It’s called native because the vision capability is built into the model itself rather than delegated to an external OCR or image-analysis tool.

Example:
You upload the same image of a bill.
A native vision model could say:

“This is a restaurant bill showing a total of ₹450 for two items: Pizza and Coke.”

Here, it not only reads the text — it understands the context.

## 3. Benefits of Native Vision

A) Holistic content analysis: it interprets and synthesizes information from all elements of a document at once. Because it processes each page's visual structure, it understands the relationships between text, images, tables, and diagrams, and comprehends the document as a whole instead of only recognizing text.

B) Large documents: it can process PDFs of up to 1,000 pages in a single request, without you needing to split or shrink them.

C) Structured data extraction: it can turn messy PDF content into clean, organized formats such as CSV, JSON, or database-ready tables. The model uses its understanding of visual layout (labels, fields, columns, and rows) to map unstructured data accurately to a predefined schema.

D) Context-aware summarization and question answering: it can generate summaries or answer questions by synthesizing information from multiple, often widely separated, parts of a document. This lets it draw conclusions that require combining a paragraph of text, a data table, and a diagram.

E) Format preservation for reuse: it can convert PDFs into HTML or other formats while keeping the original styling, for use in websites, apps, or reports.
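
As a sketch of benefit C, structured extraction can be requested by asking for JSON output that follows a schema. The `response_mime_type` and `response_schema` config keys come from the google-genai SDK; the invoice schema, the prompt, and the helper names `parse_invoice` and `extract_invoice` are hypothetical and chosen only for illustration.

```python
import json
import pathlib

# Hypothetical schema: the fields we want pulled out of an invoice PDF.
INVOICE_SCHEMA = {
    "type": "OBJECT",
    "properties": {
        "vendor": {"type": "STRING"},
        "total": {"type": "NUMBER"},
        "items": {"type": "ARRAY", "items": {"type": "STRING"}},
    },
    "required": ["vendor", "total"],
}


def parse_invoice(json_text: str) -> dict:
    """Parse the model's JSON reply and check that required fields are present."""
    data = json.loads(json_text)
    missing = [key for key in INVOICE_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


def extract_invoice(pdf_path: str) -> dict:
    """Send a PDF to Gemini and return the extracted fields (needs an API key)."""
    # Imported here so parse_invoice can be used without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[
            types.Part.from_bytes(
                data=pathlib.Path(pdf_path).read_bytes(),
                mime_type="application/pdf",
            ),
            "Extract the vendor, total, and item names from this invoice.",
        ],
        config={
            "response_mime_type": "application/json",
            "response_schema": INVOICE_SCHEMA,
        },
    )
    return parse_invoice(response.text)
```

Validating the reply in `parse_invoice` keeps a malformed or incomplete answer from silently flowing into a CSV file or database table.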


In [None]:
from google import genai
from google.genai import types
import httpx  # HTTP client used to download the PDF

client = genai.Client()  # picks up the API key from the GOOGLE_API_KEY environment variable

# Document URL
doc_url = "https://www.ril.com/sites/default/files/2025-08/RIL-Integrated-Annual-Report-2024-25.pdf"

# Send a GET request to the URL and keep the raw PDF bytes
doc_data = httpx.get(doc_url).content

prompt = "Summarize the document."

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(
            data=doc_data,
            mime_type="application/pdf",
        ),
        prompt,
    ],
)
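
The download step above trusts that the URL returns a PDF. A minimal sketch of a more defensive version follows. `looks_like_pdf` and `download_pdf` are hypothetical helper names, and the check relies only on the fact that well-formed PDF files begin with the `%PDF-` marker.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the PDF header marker."""
    return data.startswith(b"%PDF-")


def download_pdf(url: str) -> bytes:
    """Download a URL and fail early if the response is an error or not a PDF."""
    # Imported here so looks_like_pdf can be used without httpx or network access.
    import httpx

    response = httpx.get(url, follow_redirects=True, timeout=60.0)
    # Raise on 4xx/5xx instead of passing an error page on to the model.
    response.raise_for_status()
    if not looks_like_pdf(response.content):
        raise ValueError(f"{url} did not return a PDF")
    return response.content
```

With this in place, `doc_data = download_pdf(doc_url)` could stand in for the plain `httpx.get(doc_url).content` call.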

In [None]:
print(response.text)

The document is the **Integrated Annual Report 2024-25 of Reliance Industries Limited (RIL)**, providing a comprehensive overview of the company's financial, operational, governance, and sustainability performance.

Here's a summary of its key sections:

**1. Cover Page (Page 1):**
*   **Theme:** "Realising Aspirations"
*   **Highlights:** Visuals and keywords like Sustainability, Responsibility, Connectivity, Mobility, Accessibility, Reliability, and Variety, indicating RIL's diverse operations and commitment to sustainable growth.
*   **Reporting Period:** 2024-25.

**2. Table of Contents & Company Overview (Page 2):**
*   **Vision:** Inspired by Founder Chairman Shri Dhirubhai H. Ambani's philosophy "Pursue your goals even in the face of difficulties, and convert adversities into opportunities."
*   **Chairman's Message (Shri Mukesh D. Ambani):** Emphasizes customer-centric innovation, operational discipline, resilience, and progression in fulfilling India's growth.
*   **Company Pr

In [None]:
# Check whether the model can relate a table to the text printed beside it

from google import genai
from google.genai import types
import httpx

client = genai.Client()

# Document URL
doc_url = "https://www.ril.com/sites/default/files/2025-08/RIL-Integrated-Annual-Report-2024-25.pdf"

doc_data = httpx.get(doc_url).content

prompt = "On page 5 there is a table, with content written to the right of it. Are the table and that content related to each other?"

response_1 = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(
            data=doc_data,
            mime_type="application/pdf",
        ),
        prompt,
    ],
)

In [None]:
print(response_1.text)

Yes, the table "10-YEAR FINANCIAL HIGHLIGHTS" on page 5 and the content written on the right side under "MANAGEMENT DISCUSSION AND ANALYSIS" are **directly related** to each other.

Here's how:

1.  **Financial Data Source:** The "10-YEAR FINANCIAL HIGHLIGHTS" table provides the raw, historical financial figures (Revenue, EBITDA, Net Profit, etc.) for Reliance Industries Limited over a decade, culminating in FY 2024-25.

2.  **Interpretation and Explanation:** The "MANAGEMENT DISCUSSION AND ANALYSIS" section, particularly the "Performance Overview" subsection on the right, *interprets and explains* these financial figures, especially for the most recent fiscal year (FY 2024-25).
    *   It discusses the **Consolidated revenue**, **EBITDA**, and **PAT** figures for FY 2024-25, which are directly presented in the table.
    *   It then breaks down the **Segment Performance** (Digital Services, Retail, Oil to Chemicals, Oil & Gas) and explains *how* each segment contributed to the overall

In [None]:
from google import genai
from google.genai import types
import pathlib  # builds paths to files on the local filesystem

client = genai.Client()

filepath = pathlib.Path("/content/DISSERTATION.pdf")

prompt = "Summarize this document"

response_2 = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(
            data=filepath.read_bytes(),  # read_bytes() returns the file's raw contents as bytes
            mime_type="application/pdf",
        ),
        prompt,
    ],
)

print(response_2.text)

This document is the **Integrated Annual Report 2024-25** for **Reliance Industries Limited (RIL)**, themed "Realising Aspirations" and driven by the philosophy "Growth is Life." It details RIL's performance, strategic direction, and commitment to sustainable growth across its diversified businesses.

**Key Highlights & Overall Performance (FY 2024-25):**

*   **India's Largest Private Sector Enterprise:** A Fortune Global 500 leader, it aims to fulfill India's growth requirements through customer-centric innovation and operational discipline.
*   **Financial Performance (Consolidated):**
    *   **Revenue:** ₹10,71,174 Crore (+7.1% Y-o-Y)
    *   **EBITDA:** ₹1,83,422 Crore (+2.9% Y-o-Y)
    *   **Net Profit:** ₹81,309 Crore (+2.9% Y-o-Y)
    *   **Total Equity:** First Indian company to cross ₹10,00,000 Crore in consolidated total equity.
*   **Significant Contributions:** ₹2,83,719 Crore in exports, ₹2,156 Crore in CSR contribution, employing 403,303 people.
*   **Total Value Added:
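
Reading a local file with `read_bytes()` sends the whole PDF inline with the request. For large files, the Gemini documentation recommends uploading through the Files API instead. The sketch below assumes an inline limit of 20 MB (the limit may differ, so check the current documentation) and uses the hypothetical helpers `should_upload` and `pdf_part`.

```python
import pathlib

# Assumed inline request limit; verify against the current Gemini documentation.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def should_upload(size_bytes: int, limit: int = INLINE_LIMIT_BYTES) -> bool:
    """Return True when a file is too large to send inline."""
    return size_bytes >= limit


def pdf_part(client, path: pathlib.Path):
    """Return an item usable in `contents`: an uploaded file or an inline Part."""
    # Imported here so should_upload can be used without the SDK installed.
    from google.genai import types

    if should_upload(path.stat().st_size):
        # The Files API stores the document server-side and returns a reference to it.
        return client.files.upload(file=path)
    return types.Part.from_bytes(data=path.read_bytes(), mime_type="application/pdf")
```

The result can be passed in place of the `types.Part.from_bytes(...)` entry, for example `contents=[pdf_part(client, filepath), prompt]`.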

In [None]:
response_2

GenerateContentResponse(
  automatic_function_calling_history=[],
  candidates=[
    Candidate(
      content=Content(
        parts=[
          Part(
            text="""This document presents a comprehensive multifactorial analysis of cancer patient outcomes, drawing insights from a global dataset of 50,000 patients spanning 2015-2024. The study aims to understand how genetic, lifestyle, environmental, and economic factors collectively influence cancer severity, treatment costs, and survival years.

**1. Introduction & Background:**
Cancer is identified as a complex global health challenge, causing nearly 10 million deaths in 2020. Its multifactorial nature, driven by genetics, lifestyle, environment, and socioeconomic conditions, necessitates an integrated approach. The study highlights the need to move beyond isolated factor research to understand complex interdependencies for better personalized care and public health strategies.

**2. Methodology:**
The research employed a **retr
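
Displaying `response_2` directly prints the whole `GenerateContentResponse` object. Below is a minimal sketch for pulling out the commonly needed fields instead. `summarize_response` is a hypothetical helper, and it assumes the response exposes `candidates`, `finish_reason`, and `usage_metadata` in the way the google-genai SDK does.

```python
def summarize_response(response) -> dict:
    """Collect the text, finish reason, and token counts from a response object."""
    candidate = response.candidates[0]
    usage = response.usage_metadata
    return {
        # Join the text of every part; parts without text are skipped.
        "text": "".join(part.text for part in candidate.content.parts if part.text),
        "finish_reason": str(candidate.finish_reason),
        # Token counts are read defensively in case usage metadata is missing.
        "prompt_tokens": getattr(usage, "prompt_token_count", None),
        "output_tokens": getattr(usage, "candidates_token_count", None),
    }
```

Calling `summarize_response(response_2)` would then give a small dictionary that is easier to log or compare across prompts than the full object.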

In [None]:

# The prompt below does not say which page or section holds the relevant text; the model has to locate it.

prompt = """
From the early-stage diagnosis percentages for different cancer types, identify the type with the lowest early-stage detection rate.
Then, using the literature review, explain at least two environmental or behavioral factors that could contribute to this lower detection.
"""

response_3= client.models.generate_content(
    model= "gemini-2.5-flash",
    contents= [
        types.Part.from_bytes(
            data=filepath.read_bytes(),
            mime_type="application/pdf"
        ), prompt
    ])

print(response_3.text)


# The response cites the section (and, where available, the page) that the answer came from

Based on the early-stage diagnosis percentages provided in Section 4.2.2 ("Proportion of Early-Stage Diagnoses by Cancer Type and"), the cancer type with the **lowest early-stage detection rate** is:

*   **Lung Cancer: 38.43%** of cases are diagnosed at Stage 0 and Stage I.

Using the literature review (Chapter 2), at least two environmental or behavioral factors that could contribute to this lower early-stage detection of lung cancer include:

1.  **Smoking (Behavioral Factor - Section 2.3):**
    The literature review explicitly states that "Smoking... Strongly linked to lung, throat, bladder, and pancreatic cancers." For lung cancer, smoking can contribute to lower early detection in several ways:
    *   **Masking Symptoms:** Smokers often experience chronic coughs or shortness of breath due to smoking itself (e.g., "smoker's cough," COPD). These symptoms are also early indicators of lung cancer, but they may be dismissed by the patient or even initially by healthcare providers as

In [None]:
# Comparing two PDFs

from google import genai
from google.genai import types
import pathlib

client = genai.Client()

# File paths for both strategies
strategy1_path = pathlib.Path('/content/Streategy 1.pdf')
strategy2_path = pathlib.Path('/content/Streategy 2.pdf')

prompt = "Tell me the difference between both the strategies"


# Generate content with both PDFs passed as separate inputs
response_4 = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(
            data=strategy1_path.read_bytes(),
            mime_type='application/pdf'
        ),
        types.Part.from_bytes(
            data=strategy2_path.read_bytes(),
            mime_type='application/pdf'
        ),
        prompt
    ]
)

print(response_4.text)

The two strategies, **Growth Through Product-Led Strategy (PLG)** and **Growth Through Strategic Partnerships**, represent fundamentally different approaches to achieving business growth:

Here's a breakdown of their core differences:

| Feature           | Product-Led Strategy (PLG)                                   | Strategic Partnerships                                       |
| :---------------- | :----------------------------------------------------------- | :----------------------------------------------------------- |
| **Core Idea**     | The product itself is the primary driver of customer acquisition, conversion, and expansion. It sells itself. | Collaborate with established players or complementary businesses to gain access to their audience, credibility, and resources. |
| **Source of Growth** | **Internal:** Driven by the inherent value, user experience, and virality built *into the product itself*. | **External:** Driven by leveraging the existing audience, trust, and re

In [None]:
from google import genai
from google.genai import types
import pathlib

client = genai.Client()

# File paths for both strategies
strategy1_path = pathlib.Path('/content/Streategy 1.pdf')
strategy2_path = pathlib.Path('/content/Streategy 2.pdf')


prompt = "Tell me the similarity between both the strategies"

# Generate content with both PDFs passed as separate inputs
response_5 = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(
            data=strategy1_path.read_bytes(),
            mime_type='application/pdf'
        ),
        types.Part.from_bytes(
            data=strategy2_path.read_bytes(),
            mime_type='application/pdf'
        ),
        prompt
    ]
)

print(response_5.text)

The primary similarity between Growth Through Product-Led Strategy (PLG) and Growth Through Strategic Partnerships is that **both are scalable strategies aimed at acquiring customers, expanding market reach, and driving business growth, often by leveraging existing assets or channels rather than solely relying on direct, heavy sales or marketing spend.**

While their *methods* differ significantly (one focusing on the product itself, the other on external relationships), their ultimate *goal* is the same: to grow the business by efficiently reaching and converting new users.
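
Both comparison cells pass the two PDFs without saying which is which, so the model has to infer names from the content. One option, sketched below, is to put a short label before each document. It assumes that plain strings and PDF parts can be interleaved in `contents`, which the cells above already do with one part and a prompt string. `build_labeled_contents` and its `make_part` parameter are hypothetical names.

```python
from typing import Any, Callable, Dict, List


def _pdf_part(data: bytes) -> Any:
    """Default part factory: wrap PDF bytes in a google-genai Part."""
    # Imported here so the builder can be exercised without the SDK installed.
    from google.genai import types

    return types.Part.from_bytes(data=data, mime_type="application/pdf")


def build_labeled_contents(
    docs: Dict[str, bytes],
    prompt: str,
    make_part: Callable[[bytes], Any] = _pdf_part,
) -> List[Any]:
    """Return [label, part, label, part, ..., prompt] for generate_content."""
    contents: List[Any] = []
    for label, data in docs.items():
        contents.append(f"Document: {label}")
        contents.append(make_part(data))
    # The question goes last, after every document has been introduced.
    contents.append(prompt)
    return contents
```

For the two strategy files, the call might look like `build_labeled_contents({"Strategy 1": strategy1_path.read_bytes(), "Strategy 2": strategy2_path.read_bytes()}, prompt)`, with the result passed as `contents`.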
