<a href="https://colab.research.google.com/github/jorisjose/ai-home-assignment/blob/main/notebooks/Overview_of_Colaboratory_Features.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Cells
A notebook is a list of cells. Cells contain either explanatory text or executable code and its output. Click a cell to select it.

## Code cells
Below is a **code cell**. Once the toolbar button indicates CONNECTED, click in the cell to select it and execute the contents in the following ways:

* Click the **Play icon** in the left gutter of the cell;
* Type **Cmd/Ctrl+Enter** to run the cell in place;
* Type **Shift+Enter** to run the cell and move focus to the next cell (adding one if none exists); or
* Type **Alt+Enter** to run the cell and insert a new code cell immediately below it.

There are additional options for running some or all cells in the **Runtime** menu.


In [None]:
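# NOTE: this cell assumes `df` (columns 'category' and 'text') is already loaded;
# run the data-loading cell below first.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Turn the raw text into TF-IDF features (English stop words removed, top 5000 terms)
vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)
X = vectorizer.fit_transform(df['text'])
y = df['category']

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)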

clf = MultinomialNB()
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))



              precision    recall  f1-score   support

    negative       1.00      0.08      0.15       115
     neutral       0.69      0.95      0.80       567
    positive       0.61      0.39      0.47       287

    accuracy                           0.68       969
   macro avg       0.77      0.47      0.47       969
weighted avg       0.70      0.68      0.62       969



In [None]:
!pip -q install gcsfs

from google.colab import auth
auth.authenticate_user()


import pandas as pd

# First try with ISO-8859-1 (Latin-1)
df = pd.read_csv("gs://ai-home-assignment-datasets/all-data.csv", encoding="ISO-8859-1")
# rename columns for clarity
df.columns = ['category', 'text']

print(df.head())
print(df['category'].value_counts())




   category                                               text
0   neutral  Technopolis plans to develop in stages an area...
1  negative  The international electronic industry company ...
2  positive  With the new production plant the company woul...
3  positive  According to the company 's updated strategy f...
4  positive  FINANCING OF ASPOCOMP 'S GROWTH Aspocomp is ag...
category
neutral     2878
positive    1363
negative     604
Name: count, dtype: int64


In [None]:
# Clean slate + install the right client
!pip -q uninstall -y vertex-ai google-cloud-aiplatform google-generativeai || true
!pip -q install google-genai


In [4]:
from google.colab import userdata
from google import genai

# Fetch your saved secret
api_key = userdata.get('GOOGLE_API_KEY')

# Use it with Gemini client
client = genai.Client(api_key=api_key)

resp = client.models.generate_content(
    model="gemini-1.5-flash",
    contents="Hello from Colab using Gemini API securely!"
)

print(resp.text)



That's great!  It's impressive to be able to securely access the Gemini API from Google Colab.  Is there anything specific you'd like to know or discuss about your setup or application?  For example, are you facing any challenges, or do you have questions about best practices for secure API access?



In [None]:
# 1) Auth to your Google account in Colab
from google.colab import auth
auth.authenticate_user()


In [None]:
# 2) Point gcloud to your project and ensure Vertex AI API is enabled
!gcloud config set project ai-home-assignment
!gcloud services enable aiplatform.googleapis.com


Updated property [core/project].


In [None]:
from google.colab import auth
auth.authenticate_user()

# 1) Point gcloud to the right project & enable Vertex AI API
!gcloud config set project ai-home-assignment
!gcloud services enable aiplatform.googleapis.com

# 2) Initialize Vertex AI in a region that has the models
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="ai-home-assignment", location="us-east1")  # try us-east1

# 3) Use the versioned model ID
model = GenerativeModel("gemini-2.0-flash-001")  # or: "gemini-1.5-pro-002"
resp = model.generate_content("Hello from Vertex AI in us-east1")
print(resp.text)


Updated property [core/project].
Hello to you too from a Google AI model! How can I help you today?



In [None]:
from google.colab import auth
auth.authenticate_user()

import vertexai
from vertexai.generative_models import GenerativeModel

# Init your project + region
vertexai.init(project="ai-home-assignment", location="us-east1")

# Use Gemini model (try 2.0, fallback to 1.5 if 404)
model = GenerativeModel("gemini-2.0-flash-001")

# Prompt
prompt = "Summarize this text: hello, how are you and hope you are doing good"

# Call model
response = model.generate_content(prompt)

# Print output text
print(response.candidates[0].content.parts[0].text)




The text is a friendly greeting, asking "hello, how are you?" and expressing the hope that the recipient is doing well.



In [8]:
# 1) Install GCS support
!pip -q install gcsfs

# 2) Sign in to your Google account (popup appears)
from google.colab import auth
auth.authenticate_user()


In [13]:
# 3) Read the CSV from GCS using ADC (google_default)
import pandas as pd
import gcsfs
import vertexai
from vertexai.generative_models import GenerativeModel

path = "gs://ai-home-assignment-datasets/all-data.csv"

fs = gcsfs.GCSFileSystem(token="google_default")  # <- key line
with fs.open(path, "rb") as f:
    df = pd.read_csv(f, encoding="latin1")         # use the encoding that worked for you

# Take the first 5 rows of text
texts = df.iloc[:5, 1].tolist()   # assumes 2nd column is text


# Init Vertex AI
vertexai.init(project="ai-home-assignment", location="us-east1")

# Load model
model = GenerativeModel("gemini-2.0-flash-001")

# Summarize each row
for i, text in enumerate(texts, 1):
    prompt = f"Summarize in 10 words or less:\n\n{text}"
    response = model.generate_content(prompt)
    summary = response.candidates[0].content.parts[0].text
    print(f"\n--- Row {i} ---")
    print("Original:", text[:200], "...")   # show first 200 chars
    print("Summary:", summary)






--- Row 1 ---
Original: Technopolis plans to develop in stages an area of no less than 100,000 square meters in order to host companies working in computer technologies and telecommunications , the statement said . ...
Summary: Technopolis to build tech hub, minimum 100,000 square meters.


--- Row 2 ---
Original: The international electronic industry company Elcoteq has laid off tens of employees from its Tallinn facility ; contrary to earlier layoffs the company contracted the ranks of its office workers , th ...
Summary: Elcoteq Tallinn office workers laid off; more cuts made.


--- Row 3 ---
Original: With the new production plant the company would increase its capacity to meet the expected increase in demand and would improve the use of raw materials and therefore increase the production profitabi ...
Summary: New plant: more capacity, efficient materials, better profits.


--- Row 4 ---
Original: According to the company 's updated strategy for the years 2009-2012 , Basware targ

In [15]:
# =========================
# Setup (run this cell once)
# =========================
!pip -q install google-genai spacy pytextrank gcsfs pandas

# Download a small English model for spaCy
import spacy, sys, subprocess, importlib
try:
    spacy.load("en_core_web_sm")
except OSError:
    subprocess.run([sys.executable, "-m", "spacy", "download", "en_core_web_sm"], check=True)

# =========================
# Imports & clients
# =========================
import os
import pandas as pd
import gcsfs
import spacy
import pytextrank as ptx

from google.colab import auth, userdata
from google import genai

# --- Auth for GCS (only needed if reading gs://) ---
auth.authenticate_user()

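# --- Gemini API key from Colab Secrets ---
# userdata.get raises if the secret is missing or access is not granted, so catch that.
try:
    api_key = userdata.get("G_API_KEY")  # set this in the Colab sidebar (🔑)
except Exception:
    api_key = None
if not api_key:
    # fallback: prompt once (won't be saved to repo)
    api_key = input("Enter your Gemini API key: ").strip()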

gemini_client = genai.Client(api_key=api_key)

# --- spaCy pipeline with PyTextRank for extractive summarization ---
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textrank")  # adds TextRank to the pipeline


# =========================
# Data loading
# =========================
# Change this if you uploaded file locally instead of GCS:
USE_GCS = True
GCS_PATH = "gs://ai-home-assignment-datasets/all-data.csv"  # <- your path
LOCAL_PATH = "all-data.csv"  # if you upload manually

def load_dataset():
    if USE_GCS:
        fs = gcsfs.GCSFileSystem(token="google_default")  # use Colab auth
        with fs.open(GCS_PATH, "rb") as f:
            df = pd.read_csv(f, encoding="latin1")
    else:
        df = pd.read_csv(LOCAL_PATH, encoding="latin1")

    # Your CSV appears to have two columns: label + text.
    # Normalize the column names for convenience.
    if len(df.columns) >= 2:
        df = df.rename(columns={df.columns[0]: "category", df.columns[1]: "text"})
    else:
        raise ValueError("Expected at least two columns (label + text).")
    return df

df = load_dataset()
df10 = df.head(10).copy()


# =========================
# Summarizers
# =========================
def gemini_summary(text: str, max_words: int = 15) -> str:
    """
    Abstractive, very short summary using Gemini.
    """
    prompt = f"Summarize the following in ONE concise sentence, max {max_words} words:\n\n{text}"
    try:
        resp = gemini_client.models.generate_content(
            model="gemini-1.5-flash",
            contents=prompt
        )
        return (resp.text or "").strip()
    except Exception as e:
        return f"[Gemini error: {e}]"

def spacy_textrank_summary(text: str, sent_limit: int = 2) -> str:
    """
    Extractive summary using spaCy + PyTextRank (top-ranked sentences).
    """
    try:
        doc = nlp(text)
        sents = [str(s) for s in doc._.textrank.summary(limit_sentences=sent_limit)]
        return " ".join(sents).strip()
    except Exception as e:
        return f"[spaCy error: {e}]"


# =========================
# Compare on the first 10 rows
# =========================
rows = []
for i, row in df10.iterrows():
    txt = str(row.get("text", ""))
    cat = row.get("category", "")
    gsum = gemini_summary(txt, max_words=15)           # tight, headline-style
    esum = spacy_textrank_summary(txt, sent_limit=2)   # 1–2 extractive sentences

    rows.append({
        "row": i + 1,
        "category": cat,
        "text_snippet": (txt[:180] + "…") if len(txt) > 180 else txt,
        "gemini_summary_≤15w": gsum,
        "spacy_textrank_summary": esum,
        "orig_words": len(txt.split()),
        "gemini_words": len(gsum.split()),
        "spacy_words": len(esum.split())
    })

compare_df = pd.DataFrame(rows)
compare_df


Unnamed: 0,row,category,text_snippet,gemini_summary_≤15w,spacy_textrank_summary,orig_words,gemini_words,spacy_words
0,1,neutral,Technopolis plans to develop in stages an area...,"Technopolis will develop a 100,000+ sq m tech ...",Technopolis plans to develop in stages an area...,31,11,31
1,2,negative,The international electronic industry company ...,Elcoteq laid off dozens of Tallinn facility em...,The international electronic industry company ...,36,11,36
2,3,positive,With the new production plant the company woul...,Increased production capacity improves profita...,With the new production plant the company woul...,33,12,33
3,4,positive,According to the company 's updated strategy f...,Basware aims for 20-40% net sales growth and 1...,According to the company 's updated strategy f...,41,13,41
4,5,positive,FINANCING OF ASPOCOMP 'S GROWTH Aspocomp is ag...,Aspocomp's growth strategy focuses on high-dem...,FINANCING OF ASPOCOMP 'S GROWTH Aspocomp is ag...,25,8,25
5,6,positive,"For the last quarter of 2010 , Componenta 's n...",Componenta's Q4 2010 net sales doubled to EUR1...,"For the last quarter of 2010 , Componenta 's n...",39,12,39
6,7,positive,"In the third quarter of 2010 , net sales incre...",Q3 2010 saw 5.2% net sales growth (EUR 205.5mn...,"In the third quarter of 2010 , net sales incre...",29,16,29
7,8,positive,Operating profit rose to EUR 13.1 mn from EUR ...,Operating profit increased by 50% to EUR 13.1 ...,Operating profit rose to EUR 13.1 mn from EUR ...,24,14,24
8,9,positive,"Operating profit totalled EUR 21.1 mn , up fro...",Operating profit rose 9.7% to EUR 21.1 million.,"Operating profit totalled EUR 21.1 mn , up fro...",22,8,22
9,10,positive,TeliaSonera TLSN said the offer is in line wit...,TeliaSonera's offer strengthens Eesti Telekom'...,TeliaSonera TLSN said the offer is in line wit...,30,14,30


In [19]:
# --- Setup: installs ---
!pip -q install google-genai gcsfs pandas spacy pytextrank
!python -m spacy download en_core_web_sm

import pandas as pd
import gcsfs
import os
import spacy, pytextrank
from google.colab import auth

# --- Authenticate Colab with GCP for GCS access ---
auth.authenticate_user()

# --- Load CSV from GCS ---
fs = gcsfs.GCSFileSystem(token="google_default")
csv_path = "gs://ai-home-assignment-datasets/all-data.csv"

with fs.open(csv_path, "rb") as f:
    df = pd.read_csv(f, encoding="latin1")

# Standardize column names (first col = label, second col = text)
df = df.iloc[:, :2].copy()
df.columns = ["category", "original_text"]

# Take first 5 rows
subset = df.head(5).copy()

# --- spaCy + PyTextRank setup ---
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textrank")

def spacy_summary_10w(text: str) -> str:
    """Extractive summary using PyTextRank, compressed to ≤10 words."""
    doc = nlp(text)
    # Take top 1 sentence from textrank
    sents = [str(s) for s in doc._.textrank.summary(limit_sentences=1)]
    if not sents:
        return ""
    # Keep only first 10 words
    return " ".join(sents[0].split()[:10])

# --- Gemini client setup ---
try:
    from google.colab import userdata
    GEMINI_KEY = userdata.get('GOOGLE_API_KEY')
except Exception:
    GEMINI_KEY = os.getenv("GEMINI_API_KEY")

if not GEMINI_KEY:
    raise RuntimeError("Gemini API key not found. Add it in Colab Secrets as 'GOOGLE_API_KEY'.")

from google import genai
genai_client = genai.Client(api_key=GEMINI_KEY)

def gemini_summary_10w(text: str, model_name="gemini-1.5-flash") -> str:
    """Abstractive summary from Gemini, max 10 words."""
    prompt = f"Summarize the following text in 10 words or less:\n\n{text}"
    resp = genai_client.models.generate_content(model=model_name, contents=prompt)
    return resp.text.strip() if hasattr(resp, "text") and resp.text else ""

# --- Build comparison dataframe ---
rows = []
for idx, r in subset.iterrows():
    gem = gemini_summary_10w(r["original_text"])
    spx = spacy_summary_10w(r["original_text"])
    rows.append({
        "row_index": idx,
        "category": r["category"],
        "original_text": r["original_text"],  # full text, no cut
        "gemini_summary": gem,
        "spacy_textrank_summary": spx
    })

comparison_df = pd.DataFrame(rows, columns=[
    "row_index", "category", "original_text", "gemini_summary", "spacy_textrank_summary"
])

# --- Display the table (full text visible) ---
pd.set_option("display.max_colwidth", None)
display(comparison_df)


Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


Unnamed: 0,row_index,category,original_text,gemini_summary,spacy_textrank_summary
0,0,neutral,"Technopolis plans to develop in stages an area of no less than 100,000 square meters in order to host companies working in computer technologies and telecommunications , the statement said .",Technopolis plans large tech campus development in stages.,Technopolis plans to develop in stages an area of no
1,1,negative,"The international electronic industry company Elcoteq has laid off tens of employees from its Tallinn facility ; contrary to earlier layoffs the company contracted the ranks of its office workers , the daily Postimees reported .",Elcoteq laid off dozens of Tallinn office workers.,The international electronic industry company Elcoteq has laid off tens
2,2,positive,With the new production plant the company would increase its capacity to meet the expected increase in demand and would improve the use of raw materials and therefore increase the production profitability .,"New plant boosts capacity, improves efficiency, increases profitability.",With the new production plant the company would increase its
3,3,positive,"According to the company 's updated strategy for the years 2009-2012 , Basware targets a long-term net sales growth in the range of 20 % -40 % with an operating profit margin of 10 % -20 % of net sales .","Basware targets 20-40% net sales growth, 10-20% operating profit margin (2009-2012).",According to the company 's updated strategy for the years
4,4,positive,FINANCING OF ASPOCOMP 'S GROWTH Aspocomp is aggressively pursuing its growth strategy by increasingly focusing on technologically more demanding HDI printed circuit boards PCBs .,Aspocomp's growth fueled by high-demand HDI PCB focus.,FINANCING OF ASPOCOMP 'S GROWTH Aspocomp is aggressively pursuing its


In [21]:
# Assumes you already built `comparison_df` with:
# ['row_index','category','original_text','gemini_summary','spacy_textrank_summary']

import pandas as pd

# If gold_summary is missing, create it interactively
if 'gold_summary' not in comparison_df.columns:
    gold = []
    for _, r in comparison_df.iterrows():
        print("\n--- Row", r["row_index"], "| category:", r["category"], "---")
        print(r["original_text"])
        ref = input("\nType your GOLD summary (<=10 words), then press Enter:\n> ")
        gold.append(ref.strip())
    comparison_df["gold_summary"] = gold

# Now compute ROUGE
!pip -q install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1","rouge2","rougeLsum"], use_stemmer=True)

def rouge_scores(pred, ref):
    s = scorer.score(ref, pred)
    return {k: s[k].fmeasure for k in s}

rows = []
for _, r in comparison_df.iterrows():
    g = rouge_scores(r["gemini_summary"], r["gold_summary"])
    s = rouge_scores(r["spacy_textrank_summary"], r["gold_summary"])
    rows.append({
        "row_index": r["row_index"],
        "rouge1_gem": g["rouge1"], "rouge1_spa": s["rouge1"],
        "rouge2_gem": g["rouge2"], "rouge2_spa": s["rouge2"],
        "rougeL_gem": g["rougeLsum"], "rougeL_spa": s["rougeLsum"],
    })

rouge_df = pd.DataFrame(rows)
print("\nPer-row ROUGE (F1):")
display(rouge_df)

print("\nAverages:")
display(rouge_df.mean(numeric_only=True).to_frame("mean").T)



--- Row 0 | category: neutral ---
Technopolis plans to develop in stages an area of no less than 100,000 square meters in order to host companies working in computer technologies and telecommunications , the statement said .

Type your GOLD summary (<=10 words), then press Enter:
> develop for companies

--- Row 1 | category: negative ---
The international electronic industry company Elcoteq has laid off tens of employees from its Tallinn facility ; contrary to earlier layoffs the company contracted the ranks of its office workers , the daily Postimees reported .

Type your GOLD summary (<=10 words), then press Enter:
> laid off

--- Row 2 | category: positive ---
With the new production plant the company would increase its capacity to meet the expected increase in demand and would improve the use of raw materials and therefore increase the production profitability .

Type your GOLD summary (<=10 words), then press Enter:
> production incresae

--- Row 3 | category: positive ---
Accor

Unnamed: 0,row_index,rouge1_gem,rouge1_spa,rouge2_gem,rouge2_spa,rougeL_gem,rougeL_spa
0,0,0.181818,0.153846,0.0,0.0,0.181818,0.153846
1,1,0.4,0.333333,0.25,0.2,0.4,0.333333
2,2,0.0,0.166667,0.0,0.0,0.0,0.166667
3,3,0.25,0.0,0.142857,0.0,0.25,0.0
4,4,0.142857,0.0,0.0,0.0,0.142857,0.0



Averages:


Unnamed: 0,row_index,rouge1_gem,rouge1_spa,rouge2_gem,rouge2_spa,rougeL_gem,rougeL_spa
mean,2.0,0.194935,0.130769,0.078571,0.04,0.194935,0.130769


In [23]:
# ====== SETUP (Vertex AI only) ======
!pip -q install gcsfs pandas vertexai "google-cloud-aiplatform>=1.68.0" spacy

from google.colab import auth
auth.authenticate_user()  # login popup

import vertexai
from vertexai.generative_models import GenerativeModel
import pandas as pd, gcsfs, re, spacy

# ---- Init Vertex AI (use a region that has access; us-east1 is reliable) ----
PROJECT_ID = "ai-home-assignment"        # <-- your GCP project
LOCATION   = "us-east1"                  # try us-east1; if not, europe-west1 / us-central1

vertexai.init(project=PROJECT_ID, location=LOCATION)

# ---- Load CSV from GCS (first 10 rows) ----
CSV_PATH = "gs://ai-home-assignment-datasets/all-data.csv"
fs = gcsfs.GCSFileSystem(token="google_default")

with fs.open(CSV_PATH, "rb") as f:
    df = pd.read_csv(f, encoding="latin1")

# Normalize cols: assume first col = label, second col = text
assert df.shape[1] >= 2, "Expected at least 2 columns (label + text)"
df = df.iloc[:, :2].copy()
df.columns = ["category", "original_text"]
subset = df.head(10).copy()  # change to 5 if you only want 5

# ---- Generate Gemini summaries (Vertex AI) ----
# Use versioned IDs; 1.5-* -002 are widely available. If 404, try the other model below.
MODEL_ID = "gemini-2.0-flash-001"
# MODEL_ID = "gemini-1.5-pro-002"

model = GenerativeModel(MODEL_ID)

def summarize_10_words(text: str) -> str:
    prompt = (
        "Summarize the following text in 10 words or fewer. "
        "Be faithful to the source and avoid adding facts:\n\n"
        f"{text}"
    )
    resp = model.generate_content(prompt)
    # Robust fetch across SDK versions
    try:
        return resp.text.strip()
    except Exception:
        return resp.candidates[0].content.parts[0].text.strip()

subset["gemini_summary"] = subset["original_text"].map(summarize_10_words)

# ====== QUALITATIVE EVALUATION (proxy metrics) ======
# 1) Word-limit compliance (≤10 words)
def word_limit_ok(s: str, limit=10) -> int:
    return int(len((s or "").split()) <= limit)

# 2) Compression ratio (shorter is more concise)
def compression_ratio(src: str, summ: str) -> float:
    return len(summ or "") / max(1, len(src or ""))

# 3) Faithfulness proxy: support ratio of entities & numbers in source
nlp = spacy.load("en_core_web_sm")

def support_ratio(source: str, summary: str) -> float:
    if not summary:
        return 0.0
    # Extract entities + numbers from the SUMMARY and check if source contains them (case/space normalized)
    doc = nlp(summary)
    ents = [e.text for e in doc.ents]
    nums = re.findall(r"\b\d+(?:\.\d+)?\b", summary)
    probes = [*ents, *nums]
    if not probes:
        return 1.0  # no entities/numbers to check; treat as fully supported
    src_norm = " ".join((source or "").lower().split())
    hits = 0
    for p in probes:
        p_norm = " ".join((p or "").lower().split())
        if p_norm and p_norm in src_norm:
            hits += 1
    return hits / len(probes)

subset["words_ok"] = subset["gemini_summary"].map(word_limit_ok)
subset["compression"] = subset.apply(lambda r: compression_ratio(r["original_text"], r["gemini_summary"]), axis=1)
subset["support_ratio"] = subset.apply(lambda r: support_ratio(r["original_text"], r["gemini_summary"]), axis=1)

# ====== OPTIONAL: Human rubric (enter scores 1..5) ======
# Faithfulness, Coverage, Readability, Non-redundancy, Overall
ADD_HUMAN_RUBRIC = False   # <-- set True if you want to type scores interactively

if ADD_HUMAN_RUBRIC:
    human_scores = []
    for i, r in subset.iterrows():
        print("\n====================================================")
        print(f"Row {i} | Category: {r['category']}")
        print("Original:\n", r["original_text"], "\n")
        print("Gemini Summary:\n", r["gemini_summary"])
        print("----------------------------------------------------")
        faith = float(input("Faithfulness (1-5): "))
        cover = float(input("Coverage (1-5): "))
        readb = float(input("Readability/Clarity (1-5): "))
        nonred= float(input("Non-redundancy (1-5): "))
        overall=float(input("Overall (1-5): "))
        human_scores.append((faith, cover, readb, nonred, overall))
    cols = ["human_faithfulness","human_coverage","human_readability","human_nonredundancy","human_overall"]
    subset[cols] = pd.DataFrame(human_scores, index=subset.index)

# ====== OPTIONAL: ROUGE if you add gold summaries later ======
# If you later create subset["gold_summary"], you can uncomment this to compute ROUGE:
ADD_ROUGE = False  # switch to True if you add a 'gold_summary' column

if ADD_ROUGE:
    assert "gold_summary" in subset.columns, "Add subset['gold_summary'] first."
    !pip -q install rouge-score
    from rouge_score import rouge_scorer
    scorer = rouge_scorer.RougeScorer(["rouge1","rouge2","rougeLsum"], use_stemmer=True)
    def rouge_f1(pred, ref):
        s = scorer.score(ref, pred)
        return s["rouge1"].fmeasure, s["rouge2"].fmeasure, s["rougeLsum"].fmeasure
    rouge_vals = subset.apply(lambda r: pd.Series(rouge_f1(r["gemini_summary"], r["gold_summary"]),
                                                  index=["rouge1_f1","rouge2_f1","rougeL_f1"]), axis=1)
    subset = pd.concat([subset, rouge_vals], axis=1)

# ====== PRESENT RESULTS ======
pd.set_option("display.max_colwidth", None)

print("=== Gemini summaries (first 10 rows) with qualitative proxies ===")
display(subset[[
    "category","original_text","gemini_summary","words_ok","compression","support_ratio"
]])

print("\n=== Aggregated metrics ===")
agg = {
    "word_limit_ok(%)": 100*subset["words_ok"].mean(),
    "avg_compression":  subset["compression"].mean(),
    "avg_support_ratio":subset["support_ratio"].mean(),
}
print(agg)

if ADD_HUMAN_RUBRIC:
    print("\n=== Human rubric (means) ===")
    print(subset[["human_faithfulness","human_coverage","human_readability","human_nonredundancy","human_overall"]].mean())

if ADD_ROUGE:
    print("\n=== ROUGE (means) ===")
    print(subset[["rouge1_f1","rouge2_f1","rougeL_f1"]].mean())



=== Gemini summaries (first 10 rows) with qualitative proxies ===


Unnamed: 0,category,original_text,gemini_summary,words_ok,compression,support_ratio
0,neutral,"Technopolis plans to develop in stages an area of no less than 100,000 square meters in order to host companies working in computer technologies and telecommunications , the statement said .","Technopolis will develop 100,000+ sqm for tech/telecom companies.",1,0.342105,1.0
1,negative,"The international electronic industry company Elcoteq has laid off tens of employees from its Tallinn facility ; contrary to earlier layoffs the company contracted the ranks of its office workers , the daily Postimees reported .","Elcoteq Tallinn facility laid off office workers, Postimees reported.",1,0.302632,0.5
2,positive,With the new production plant the company would increase its capacity to meet the expected increase in demand and would improve the use of raw materials and therefore increase the production profitability .,"New plant increases capacity, efficiency, and profitability.",1,0.291262,1.0
3,positive,"According to the company 's updated strategy for the years 2009-2012 , Basware targets a long-term net sales growth in the range of 20 % -40 % with an operating profit margin of 10 % -20 % of net sales .","Basware targets 20-40% sales growth, 10-20% profit margin.",1,0.285714,0.666667
4,positive,FINANCING OF ASPOCOMP 'S GROWTH Aspocomp is aggressively pursuing its growth strategy by increasingly focusing on technologically more demanding HDI printed circuit boards PCBs .,Aspocomp focuses on technologically demanding HDI PCBs to drive growth.,1,0.398876,1.0
5,positive,"For the last quarter of 2010 , Componenta 's net sales doubled to EUR131m from EUR76m for the same period a year earlier , while it moved to a zero pre-tax profit from a pre-tax loss of EUR7m .","Componenta's sales doubled, profit increased from loss, Q4 2010.",1,0.331606,0.666667
6,positive,"In the third quarter of 2010 , net sales increased by 5.2 % to EUR 205.5 mn , and operating profit by 34.9 % to EUR 23.5 mn .","Q3 2010: Net sales, operating profit increased significantly.",1,0.488,0.5
7,positive,Operating profit rose to EUR 13.1 mn from EUR 8.7 mn in the corresponding period in 2007 representing 7.7 % of net sales .,"Operating profit increased to EUR 13.1 mn, 7.7% of net sales.",0,0.5,0.8
8,positive,"Operating profit totalled EUR 21.1 mn , up from EUR 18.6 mn in 2007 , representing 9.7 % of net sales .","Operating profit increased to EUR 21.1 mn, 9.7% of net sales.",0,0.592233,0.8
9,positive,TeliaSonera TLSN said the offer is in line with its strategy to increase its ownership in core business holdings and would strengthen Eesti Telekom 's offering to its customers .,TeliaSonera's offer will increase ownership and strengthen Eesti Telekom.,1,0.410112,1.0



=== Aggregated metrics ===
{'word_limit_ok(%)': np.float64(80.0), 'avg_compression': np.float64(0.3942541255112141), 'avg_support_ratio': np.float64(0.7933333333333332)}


## Text cells
This is a **text cell**. You can **double-click** to edit this cell. Text cells
use markdown syntax. To learn more, see our [markdown
guide](/notebooks/markdown_guide.ipynb).

You can also add math to text cells using [LaTeX](http://www.latex-project.org/)
to be rendered by [MathJax](https://www.mathjax.org). Just place the statement
within a pair of **\$** signs. For example `$\sqrt{3x-1}+(1+x)^2$` becomes
$\sqrt{3x-1}+(1+x)^2.$


## Adding and moving cells
You can add new cells by using the **+ CODE** and **+ TEXT** buttons that show when you hover between cells. These buttons are also in the toolbar above the notebook where they can be used to add a cell below the currently selected cell.

You can move a cell by selecting it and clicking **Cell Up** or **Cell Down** in the top toolbar.

Select consecutive cells with "lasso selection": drag from outside one cell through the group. Select non-adjacent cells by clicking one and then holding down Ctrl while clicking another. Holding Shift instead of Ctrl selects all intermediate cells.

# Working with Python
Colaboratory is built on top of [Jupyter Notebook](https://jupyter.org/). Below are some examples of the convenience functions it provides.

Long-running Python processes can be interrupted. Run the following cell and select **Runtime -> Interrupt execution** (*hotkey: Cmd/Ctrl-M I*) to stop execution.
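
A cell along the following lines gives you something to interrupt. This is a minimal sketch: the function name `slow_task` and its `seconds` parameter are illustrative assumptions, not part of Colab.

```python
import time


def slow_task(seconds: float = 30.0) -> str:
    """Sleep for `seconds`, printing before and after, so there is time to interrupt."""
    print("Sleeping")
    time.sleep(seconds)  # an interrupt arrives here as KeyboardInterrupt
    print("Done sleeping")
    return "finished"
```

Calling `slow_task()` in a cell and then interrupting should stop at the `time.sleep` call with a `KeyboardInterrupt`, so the second message never prints.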

## System aliases

Jupyter includes shortcuts for common operations, such as ls:
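
In a notebook cell the listing is written as `!ls /bin`. Since `!` is notebook syntax rather than Python, the sketch below shows a roughly equivalent plain-Python call; the use of `subprocess` and the `/bin` path are assumptions made for illustration.

```python
import subprocess

# Notebook shorthand for the same listing:  !ls /bin
result = subprocess.run(["ls", "/bin"], capture_output=True, text=True, check=True)
entries = result.stdout.splitlines()
print(result.stdout)
```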

That `!ls` probably generated a large output. You can select the cell and clear the output by either:

1. Clicking on the clear output button (x) in the toolbar above the cell; or
2. Right clicking the left gutter of the output area and selecting "Clear output" from the context menu.

Execute any other process using `!` with string interpolation from Python variables, and note the result can be assigned to a variable:
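
In a notebook, the pattern looks like `foo = !echo $message`, where `$message` is filled in from the Python variable. The plain-Python sketch below approximates that with `subprocess`; the variable names `message` and `foo` are illustrative.

```python
import subprocess

message = "Colaboratory is great!"

# Notebook shorthand:  foo = !echo $message
# IPython substitutes the Python variable into the command and returns the output lines.
completed = subprocess.run(["echo", message], capture_output=True, text=True, check=True)
foo = completed.stdout.splitlines()
print(foo)
```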

## Magics
Colaboratory shares the notion of magics from Jupyter. These are shorthand annotations that change how a cell's text is executed. To learn more, see [Jupyter's magics page](http://nbviewer.jupyter.org/github/ipython/ipython/blob/1.x/examples/notebooks/Cell%20Magics.ipynb).


## Automatic completions and exploring code

Colab provides automatic completions to explore attributes of Python objects, as well as to quickly view documentation strings. As an example, first run the following cell to import the [`numpy`](http://www.numpy.org) module.
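
The import cell is a single line; `np` is the conventional alias that the completion examples assume. The `dir(np)` call is an added illustration that gives a rough idea of the names a completion list can draw on.

```python
import numpy as np

# Listing the module's public attribute names programmatically approximates
# what the completion pop-up shows after typing `np.`
public_names = [name for name in dir(np) if not name.startswith("_")]
print(len(public_names), public_names[:5])
```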

If you now insert your cursor after `np` and press **Period** (`.`), you will see the list of available completions within the `np` module. Completions can be opened again by using **Ctrl+Space**.

If you type an open parenthesis after any function or class in the module, you will see a pop-up of its documentation string:

The documentation can be opened again using **Ctrl+Shift+Space**, or you can view the documentation for a method by hovering over the method name.

When hovering over the method name the `Open in tab` link will open the documentation in a persistent pane. The `View source` link will navigate to the source code for the method.

## Exception Formatting

Exceptions are formatted nicely in Colab outputs:
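
For instance, a division by zero produces a traceback. The sketch below is illustrative: the helper name `divide` is hypothetical, and the error is caught and printed here only so the cell finishes; leaving it uncaught in a cell lets Colab render the traceback itself.

```python
import traceback


def divide(y: float, x: float) -> float:
    """Return y / (1 - x); fails when x == 1."""
    return y / (1 - x)


try:
    divide(4, 1)
except ZeroDivisionError:
    # Capture the traceback text that an uncaught error would display.
    tb_text = traceback.format_exc()
    print(tb_text)
```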

## Rich, interactive outputs
Until now all of the generated outputs have been text, but they can be more interesting, like the chart below.

# Integration with Drive

Colaboratory is integrated with Google Drive. It allows you to share, comment, and collaborate on the same document with multiple people:

* The **SHARE** button (top-right of the toolbar) allows you to share the notebook and control permissions set on it.

* **File->Make a Copy** creates a copy of the notebook in Drive.

* **File->Save** saves the file to Drive. **File->Save and checkpoint** pins the version so it doesn't get deleted from the revision history.

* **File->Revision history** shows the notebook's revision history.

## Commenting on a cell
You can comment on a Colaboratory notebook like you would on a Google Document. Comments are attached to cells, and are displayed next to the cell they refer to. If you have **comment-only** permissions, you will see a comment button on the top right of the cell when you hover over it.

If you have edit or comment permissions you can comment on a cell in one of three ways:

1. Select a cell and click the comment button in the toolbar above the top-right corner of the cell.
2. Right click a text cell and select **Add a comment** from the context menu.
3. Use the shortcut **Ctrl+Shift+M** to add a comment to the currently selected cell.

You can resolve and reply to comments, and you can target comments to specific collaborators by typing *+[email address]* (e.g., `+user@domain.com`). Addressed collaborators will be emailed.

The Comment button in the top-right corner of the page shows all comments attached to the notebook.