<a href="https://colab.research.google.com/github/parshvak26/GENAI/blob/main/Complete_GenAI_Base2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# !pip install -U "google-genai>=1.37.0"


In [None]:
from google import genai
from google.genai import types

from IPython.display import HTML, Markdown, display

In [None]:
from google.api_core import retry

is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

genai.models.Models.generate_content = retry.Retry(
    predicate=is_retriable)(genai.models.Models.generate_content)


In [None]:
import os
os.environ["GOOGLE_API_KEY"] = "YourAPIKEY"


In [None]:
from google import genai

client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain AI to me like I'm a kid."
)

print(response.text)


In [None]:
Markdown(response.text)


In [None]:
chat = client.chats.create(model='gemini-2.0-flash', history=[])
response = chat.send_message('Hello! My name is Parshva. I am just starting with GENAI. Any tips?')


In [None]:
Markdown(response.text)

In [None]:
for model in client.models.list():
  print(model.name)

In [None]:
from pprint import pprint

for model in client.models.list():
  if model.name == 'models/gemini-2.5-flash':
    pprint(model.to_json_dict())
    break

### **Temperature**
Temperature controls the degree of randomness in token selection. Higher temperatures result in a higher number of candidate tokens from which the next output token is selected, and can produce more diverse results, while lower temperatures have the opposite effect, such that a temperature of 0 results in greedy decoding, selecting the most probable token at each step.

Temperature doesn't provide any guarantees of randomness, but it can be used to "nudge" the output somewhat.
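
To make the effect of temperature concrete, here is a minimal sketch in plain Python. It is not how the Gemini API is implemented internally; it only illustrates the common technique of dividing token scores (logits) by the temperature before applying softmax. The function name and the logits are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all mass on the top-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, the probabilities move closer together, so less likely tokens are sampled more often.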

In [None]:
high_temp_config = types.GenerateContentConfig(temperature=2.0)
# For comparison, swap in a low temperature and observe how the output changes:
# high_temp_config = types.GenerateContentConfig(temperature=0.0)





# for _ in range(5):
#   response = client.models.generate_content(
#       model='gemini-2.5-flash',
#       config=high_temp_config,
#       contents='Pick a random colour... (respond in a single word)')

#   if response.text:
#     print(response.text, '-' * 25)

### **Top-P**
Like temperature, the top-P parameter is also used to control the diversity of the model's output.

Top-P defines the probability threshold that, once cumulatively exceeded, tokens stop being selected as candidates. A top-P of 0 is typically equivalent to greedy decoding, and a top-P of 1 typically selects every token in the model's vocabulary.

You may also see top-K referenced in LLM literature. Top-K is not configurable in the Gemini 2.0 series of models, but can be changed in older models. Top-K is a positive integer that defines the number of most probable tokens from which to select the output token. A top-K of 1 selects a single token, performing greedy decoding.
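
The two filters described above can be sketched in plain Python. This is an illustration of the general technique under simple assumptions (a small, made-up probability table), not the Gemini implementation; `top_k_filter` and `top_p_filter` are hypothetical helper names.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}

def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

# Hypothetical next-token distribution.
probs = {"red": 0.5, "blue": 0.3, "green": 0.15, "mauve": 0.05}
print(top_p_filter(probs, 0.7))
print(top_k_filter(probs, 1))
```

With these numbers, a top-P of 0.7 keeps only the two most likely tokens, and a top-K of 1 reduces to greedy decoding.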

Run this example a number of times, change the settings and observe the change in output.

In [None]:
# model_config = types.GenerateContentConfig(
#     # These are the default values for gemini-2.0-flash; other models may use different defaults.
#     temperature=1.0,
#     top_p=0.95,
# )

# story_prompt = "You are a creative writer. Write a short story about a cat who goes on an adventure."
# response = client.models.generate_content(
#     model='gemini-2.5-flash',
#     config=model_config,
#     contents=story_prompt)

# print(response.text)

## **Prompting**

### **Zero Shot**

Zero-shot prompts are prompts that describe the request for the model directly.

**EXAMPLE**- Classify movie reviews as POSITIVE, NEUTRAL or NEGATIVE.
Review: "Her" is a disturbing study revealing the direction
humanity is headed if AI is allowed to keep evolving,
unchecked. I wish there were more movies like this masterpiece.
Sentiment:

## **Enum mode**
The models are trained to generate text, and while the Gemini 2.0 models are great at following instructions, other models can sometimes produce more text than you may wish for. In the preceding example, the model will output the label, but sometimes it can include a preceding "Sentiment" label, and without an output token limit, it may also add explanatory text afterwards.

**ENUM**, short for Enumeration, is basically a fancy way of saying “a list of named options you can choose from.” It’s not some secret AI spell — it’s just a data structure used in programming (and yes, also in GenAI frameworks) to define a fixed set of values that something can take.

Think of it like giving names to a few specific choices instead of letting someone type random junk. It’s like saying:

“You can pick from {Cat, Dog, Hamster}, but not ‘DragonWithWiFi’.”

Imagine you’re designing a GenAI pipeline where the model can only act in certain modes:

    from enum import Enum

    class AgentAction(Enum):
        SUMMARIZE = "summarize"
        TRANSLATE = "translate"
        CODE = "code"


So, if the model gets “SUMMARIZE”, it knows exactly what operation to perform. Keeps things neat, predictable, and stops the AI from hallucinating another mode called “make memes”.

In [None]:
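# import enum

# # The zero-shot prompt from the example above.
# zero_shot_prompt = """Classify movie reviews as POSITIVE, NEUTRAL or NEGATIVE.
# Review: "Her" is a disturbing study revealing the direction
# humanity is headed if AI is allowed to keep evolving,
# unchecked. I wish there were more movies like this masterpiece.
# Sentiment: """

# class Sentiment(enum.Enum):
#     POSITIVE = "positive"
#     NEUTRAL = "neutral"
#     NEGATIVE = "negative"


# response = client.models.generate_content(
#     model='gemini-2.5-flash',
#     config=types.GenerateContentConfig(
#         response_mime_type="text/x.enum",
#         response_schema=Sentiment
#     ),
#     contents=zero_shot_prompt)

# print(response.text)

# The output would typically be "positive".

# # response.parsed holds the response converted to the schema type.
# enum_response = response.parsed
# print(enum_response)
# print(type(enum_response))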

**ENUM** = a controlled vocabulary for your GenAI system.
It ensures structure and sanity in a world full of chaotic data and even more chaotic humans.

As Seneca said, “Order is what keeps the universe from sliding into chaos.”
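
When enum mode is not available, a controlled vocabulary can also be enforced on the client side. The sketch below is one possible approach, not part of the Gemini SDK: `parse_sentiment` is a hypothetical helper that maps free-form model text onto the `Sentiment` enum and rejects anything outside it.

```python
from enum import Enum

class Sentiment(Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"

def parse_sentiment(raw_text):
    """Map model text onto a Sentiment member, or return None if it does not match."""
    cleaned = raw_text.strip().lower()
    # Drop a leading "sentiment:" label if the model echoed it back.
    if cleaned.startswith("sentiment:"):
        cleaned = cleaned[len("sentiment:"):].strip()
    try:
        return Sentiment(cleaned)
    except ValueError:
        return None

print(parse_sentiment("Sentiment: POSITIVE\n"))
print(parse_sentiment("DragonWithWiFi"))
```

Returning `None` for unknown values lets the caller decide whether to retry the request or flag the row as an error.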

### **One-shot and few-shot**
Providing an example of the expected response is known as a "one-shot" prompt. When you provide multiple examples, it is a "few-shot" prompt.

In [None]:
few_shot_prompt = """Parse a customer's pizza order into valid JSON:

EXAMPLE:
I want a small pizza with cheese, tomato sauce, and pepperoni.
JSON Response:
```
{
"size": "small",
"type": "normal",
"ingredients": ["cheese", "tomato sauce", "pepperoni"]
}
```

EXAMPLE:
Can I get a large pizza with tomato sauce, basil and mozzarella
JSON Response:
```
{
"size": "large",
"type": "normal",
"ingredients": ["tomato sauce", "basil", "mozzarella"]
}
```

ORDER:
"""

# customer_order = "Give me a large with cheese & pineapple"

# response = client.models.generate_content(
#     model='gemini-2.5-flash',
#     config=types.GenerateContentConfig(
#         temperature=0.1,
#         top_p=1,
#         max_output_tokens=250,
#     ),
#     contents=[few_shot_prompt, customer_order])

# print(response.text)
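
Because the few-shot examples wrap the JSON in triple-backtick fences, the model's reply will usually be fenced too. The following is a minimal sketch of how the reply could be turned into a Python dict; `extract_json` is a hypothetical helper and `sample_reply` is a made-up response, not real model output.

```python
import json

FENCE = "`" * 3  # the triple-backtick marker used in the few-shot examples

def extract_json(response_text):
    """Parse a JSON object from model text, tolerating surrounding fences."""
    text = response_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (it may carry a tag such as "json").
        lines = text.splitlines()[1:]
        # Drop the closing fence if present.
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines)
    return json.loads(text)

# Made-up reply for the order "Give me a large with cheese & pineapple".
sample_reply = (
    FENCE + "json\n"
    '{"size": "large", "type": "normal", "ingredients": ["cheese", "pineapple"]}\n'
    + FENCE
)
order = extract_json(sample_reply)
print(order["size"], order["ingredients"])
```

`json.loads` raises `json.JSONDecodeError` on malformed output, which is a useful signal that the model drifted from the requested format.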

### **Chain of Thought (CoT)**
Direct prompting on LLMs can return answers quickly and (in terms of output token usage) efficiently, but it can be prone to hallucination. The answer may "look" correct (in terms of language and syntax) but be incorrect in terms of factuality and reasoning.

Chain-of-Thought prompting is a technique where you instruct the model to output intermediate reasoning steps, and it typically gets better results, especially when combined with few-shot examples. It is worth noting that this technique doesn't completely eliminate hallucinations, and that it tends to cost more to run, due to the increased token count.

Models like the Gemini family are trained to be "chatty" or "thoughtful" and will provide reasoning steps without prompting, so for this simple example you can ask the model to be more direct in the prompt to force a non-reasoning response. Try re-running this step if the model gets lucky and gets the answer correct on the first try.

In [None]:
# prompt = """When I was 4 years old, my partner was 3 times my age. Now, I
# am 20 years old. How old is my partner? Return the answer directly."""

# response = client.models.generate_content(
#     model='gemini-2.0-flash',
#     contents=prompt)

# print(response.text)

# Output (a direct answer like this is wrong; the correct answer is 28) -
# 52

In [None]:
# prompt = """When I was 4 years old, my partner was 3 times my age. Now,
# I am 20 years old. How old is my partner? Let's think step by step."""

# response = client.models.generate_content(
#     model='gemini-2.0-flash',
#     contents=prompt)

# Markdown(response.text)

# Output-
# Here's how to solve this:

# Find the age difference: When you were 4, your partner was 3 times your age, meaning they were 4 * 3 = 12 years old.

# Calculate the age difference: The age difference between you and your partner is 12 - 4 = 8 years.

# Determine partner's current age: Since the age difference remains constant, your partner is currently 20 + 8 = 28 years old.

# Answer: Your partner is currently 28 years old.
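
The reasoning steps in the response above can be checked with a few lines of arithmetic. This is only a verification of the worked example; the variable names are chosen for illustration.

```python
my_age_then = 4
partner_age_then = 3 * my_age_then          # "3 times my age" at that time
age_gap = partner_age_then - my_age_then    # the gap stays constant over time
my_age_now = 20
partner_age_now = my_age_now + age_gap
print(partner_age_now)
```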

### **ReAct: Reason and act**
In this example you will run a ReAct prompt directly in the Gemini API and perform the searching steps yourself. As this prompt follows a well-defined structure, there are frameworks available that wrap the prompt into easier-to-use APIs that make tool calls automatically, such as the LangChain example from the "Prompting" whitepaper.

Instead of an AI model just thinking (reasoning) or just doing (acting), ReAct makes it do both — in turns.

The model reasons about the problem step-by-step, takes an action (like calling a tool, searching, or retrieving info), observes the result, and then continues reasoning.
Think of it like a detective alternating between thinking out loud and doing stuff until the mystery’s solved.

#### *Step-by-step*

Reasoning → the model thinks: “Hmm, what do I need to solve this?”

Acting → it takes an action: “Let me look that up in the database.”

Observation → it gets the result: “Okay, here’s what I found.”

Repeat → it continues reasoning with the new info until it reaches the answer.

So instead of blindly guessing, the model becomes something like a thoughtful agent that mixes logic with action — like a data scientist who actually tests their hypothesis instead of just tweeting it.

Question: “What’s the current temperature in New York?”

The AI’s internal process might look like this:

    Thought: I need real-time weather data.
    Action: call_weather_api("New York")
    Observation: The API returns 12°C.
    Thought: The temperature in New York is 12°C.
    Final Answer: It’s 12°C in New York.

In [None]:
# model_instructions = """
# Solve a question answering task with interleaving Thought, Action, Observation steps. Thought can reason about the current situation,
# Observation is understanding relevant information from an Action's output and Action can be one of three types:
#  (1) <search>entity</search>, which searches the exact entity on Wikipedia and returns the first paragraph if it exists. If not, it
#      will return some similar entities to search and you can try to search the information from those topics.
#  (2) <lookup>keyword</lookup>, which returns the next sentence containing keyword in the current context. This only does exact matches,
#      so keep your searches short.
#  (3) <finish>answer</finish>, which returns the answer and finishes the task.
# """

# example1 = """Question
# Musician and satirist Allie Goertz wrote a song about the "The Simpsons" character Milhouse, who Matt Groening named after who?

# Thought 1
# The question simplifies to "The Simpsons" character Milhouse is named after who. I only need to search Milhouse and find who it is named after.

# Action 1
# <search>Milhouse</search>
# Observation 1
# Milhouse Mussolini Van Houten is a recurring character in the Fox animated television series The Simpsons voiced by Pamela Hayden and created by Matt Groening.

# Thought 2
# The paragraph does not tell who Milhouse is named after, maybe I can look up "named after".

# Action 2
# <lookup>named after</lookup>

# Observation 2
# Milhouse was named after U.S. president Richard Nixon, whose middle name was Milhous.

# Thought 3
# Milhouse was named after U.S. president Richard Nixon, so the answer is Richard Nixon.

# Action 3
# <finish>Richard Nixon</finish>
# """

# example2 = """Question
# What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
# Thought 1
# I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.

# Action 1
# <search>Colorado orogeny</search>

# Observation 1
# The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.

# Thought 2
# It does not mention the eastern sector. So I need to look up eastern sector.

# Action 2
# <lookup>eastern sector</lookup>

# Observation 2
# The eastern sector extends into the High Plains and is called the Central Plains orogeny.

# Thought 3
# The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.

# Action 3
# <search>High Plains</search>

# Observation 3
# High Plains refers to one of two distinct land regions

# Thought 4
# I need to instead search High Plains (United States).

# Action 4
# <search>High Plains (United States)</search>

# Observation 4
# The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130m).

# Thought 5
# High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.

# Action 5
# <finish>1,800 to 7,000 ft</finish>
# """
# # Take a look through https://github.com/ysymyth/ReAct/


In [None]:
# question = """Question
# Who was the youngest author listed on the transformers NLP paper?
# """

# # You will perform the Action; so generate up to, but not including, the Observation.
# react_config = types.GenerateContentConfig(
#     stop_sequences=["\nObservation"],
#     system_instruction=model_instructions + example1 + example2,
# )

# # Create a chat that has the model instructions and examples pre-seeded.
# react_chat = client.chats.create(
#     model='gemini-2.0-flash',
#     config=react_config,
# )

# resp = react_chat.send_message(question)
# print(resp.text)
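
Since the stop sequence halts generation just before the Observation, your own code has to read the requested action, carry it out, and send the result back. A minimal sketch of the parsing half is shown below. `parse_action` and `format_observation` are hypothetical helpers, and `sample_output` is a made-up model turn that follows the tag format from the instructions.

```python
import re

# Matches <search>...</search>, <lookup>...</lookup> or <finish>...</finish>.
ACTION_PATTERN = re.compile(r"<(search|lookup|finish)>(.*?)</\1>", re.DOTALL)

def parse_action(model_text):
    """Return (action_type, argument) for the last action tag, or None if absent."""
    matches = ACTION_PATTERN.findall(model_text)
    if not matches:
        return None
    kind, argument = matches[-1]
    return kind, argument.strip()

def format_observation(step, text):
    """Build the Observation message to send back to the chat."""
    return f"Observation {step}\n{text}"

# Made-up model turn, for illustration only.
sample_output = (
    "Thought 1\n"
    "I need to find the authors of the transformers NLP paper.\n\n"
    "Action 1\n"
    "<search>Attention Is All You Need</search>"
)
print(parse_action(sample_output))
print(format_observation(1, "(paste the search result here)"))
```

In a full loop you would run the search yourself, pass the formatted observation to `react_chat.send_message`, and repeat until the action type is `finish`.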

### **Thinking mode**
The experimental Gemini 2.0 Flash "Thinking" model has been trained to generate the "thinking process" it goes through as part of its response. As a result, the Flash Thinking model offers stronger reasoning in its responses.

Using a "thinking mode" model can provide you with high-quality responses without needing specialised prompting like the previous approaches. One reason this technique is effective is that you induce the model to generate relevant information ("brainstorming", or "thoughts") that is then used as part of the context in which the final response is generated.

In [None]:
# import io
# from IPython.display import Markdown, clear_output


# response = client.models.generate_content_stream(
#     model='gemini-2.0-flash-thinking-exp',
#     contents='Who was the youngest author listed on the transformers NLP paper?',
# )

# buf = io.StringIO()
# for chunk in response:
#     # Some chunks may carry no text, so skip those.
#     if not chunk.text:
#         continue
#     buf.write(chunk.text)
#     # Display the response as it is streamed
#     print(chunk.text, end='')

# # And then render the finished response as formatted markdown.
# clear_output()
# Markdown(buf.getvalue())

# Document Q&A with RAG using Chroma


Two big limitations of LLMs are 1) that they only "know" the information that they were trained on, and 2) that they have limited input context windows. A way to address both of these limitations is to use a technique called Retrieval Augmented Generation, or RAG. A RAG system has three stages:

    Indexing
    Retrieval
    Generation

1. Indexing → Break documents into chunks, turn them into embeddings (numerical meaning), and store them in a vector database.

2. Retrieval → When asked a question, find the most relevant chunks from that database using semantic similarity.

3. Generation → Feed those chunks + the question to the LLM so it can craft a grounded, natural-language answer.




  - Documents → [Indexing] → Vector DB

  - User Query → [Retrieval] → Top Relevant Chunks

  - Query + Chunks → [Generation] → Final Answer

RAG, short for Retrieval-Augmented Generation, is just a fancy combo move that helps AI answer questions based on real documents.

Normal LLMs are good at language but terrible at memory.
Ask about some obscure company policy PDF and the model can’t know it, because it’s not in its training data.

So, RAG fixes that by giving the model access to your specific documents.

It works like this:

1. Store your documents somewhere searchable (usually as chunks of text in a “vector database” — basically a big mathy filing cabinet).

2. When you ask a question, the system:

    - Searches for the most relevant parts of those docs.

    - Pulls out the top matches.

3. Then it feeds both your question and those retrieved snippets into the language model.

4. The model uses that context to generate a smart, grounded answer — not a wild guess.
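
The steps above can be sketched end to end in plain Python. To stay self-contained, this toy version swaps the embedding model and vector database for a bag-of-words count vector and an in-memory list, so it only illustrates the index, retrieve and prompt-assembly flow. The helper names and the sample chunks are made up for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, index, k=1):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# 1. Indexing: store each chunk alongside its vector.
chunks = [
    "The temperature knob controls the temperature inside the car.",
    "Touch the Music icon on the touchscreen to play your favorite songs.",
    "Move the shift lever to Park when you are parked.",
]
index = [(c, embed(c)) for c in chunks]

# 2. Retrieval: find the best-matching chunk for the question.
toy_query = "How do I play music on the touchscreen?"
top = retrieve(toy_query, index, k=1)

# 3. Generation: the question and retrieved text form the prompt sent to the LLM.
toy_prompt = f"QUESTION: {toy_query}\n" + "".join(f"PASSAGE: {p}\n" for p in top)
print(toy_prompt)
```

Word counts only match exact words, whereas real embeddings capture meaning, which is why the notebook uses an embedding model and Chroma for the actual pipeline.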





In [None]:
for m in client.models.list():
    if "embedContent" in m.supported_actions:
        print(m.name)

### Sample Data

In [None]:
DOCUMENT1 = "Operating the Climate Control System  Your Googlecar has a climate control system that allows you to adjust the temperature and airflow in the car. To operate the climate control system, use the buttons and knobs located on the center console.  Temperature: The temperature knob controls the temperature inside the car. Turn the knob clockwise to increase the temperature or counterclockwise to decrease the temperature. Airflow: The airflow knob controls the amount of airflow inside the car. Turn the knob clockwise to increase the airflow or counterclockwise to decrease the airflow. Fan speed: The fan speed knob controls the speed of the fan. Turn the knob clockwise to increase the fan speed or counterclockwise to decrease the fan speed. Mode: The mode button allows you to select the desired mode. The available modes are: Auto: The car will automatically adjust the temperature and airflow to maintain a comfortable level. Cool: The car will blow cool air into the car. Heat: The car will blow warm air into the car. Defrost: The car will blow warm air onto the windshield to defrost it."
DOCUMENT2 = 'Your Googlecar has a large touchscreen display that provides access to a variety of features, including navigation, entertainment, and climate control. To use the touchscreen display, simply touch the desired icon.  For example, you can touch the "Navigation" icon to get directions to your destination or touch the "Music" icon to play your favorite songs.'
DOCUMENT3 = "Shifting Gears Your Googlecar has an automatic transmission. To shift gears, simply move the shift lever to the desired position.  Park: This position is used when you are parked. The wheels are locked and the car cannot move. Reverse: This position is used to back up. Neutral: This position is used when you are stopped at a light or in traffic. The car is not in gear and will not move unless you press the gas pedal. Drive: This position is used to drive forward. Low: This position is used for driving in snow or other slippery conditions."

documents = [DOCUMENT1, DOCUMENT2, DOCUMENT3]

In [None]:
# !pip install chromadb


In [None]:
from chromadb import Documents, EmbeddingFunction, Embeddings
from google.api_core import retry

from google.genai import types


# Define a helper to retry when per-minute quota is reached.
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})


class GeminiEmbeddingFunction(EmbeddingFunction):
    # Specify whether to generate embeddings for documents, or queries
    document_mode = True

    @retry.Retry(predicate=is_retriable)
    def __call__(self, input: Documents) -> Embeddings:
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"

        response = client.models.embed_content(
            model="models/text-embedding-004",
            contents=input,
            config=types.EmbedContentConfig(
                task_type=embedding_task,
            ),
        )
        return [e.values for e in response.embeddings]

In [None]:
import chromadb

DB_NAME = "googlecardb"

embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True

chroma_client = chromadb.Client()
db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)

db.add(documents=documents, ids=[str(i) for i in range(len(documents))])

In [None]:
db.count()


In [None]:
# Switch to query mode when generating embeddings.
embed_fn.document_mode = False

# Search the Chroma DB using the specified query.
query = "How do you use the touchscreen to play music?"

result = db.query(query_texts=[query], n_results=1)
[all_passages] = result["documents"]

Markdown(all_passages[0])

In [None]:
all_passages

**Augmented generation: Answer the question**

Now that we’ve found a relevant passage from our document set during the retrieval step, we can move on to assembling a generation prompt and use the Gemini API to produce the final answer.

In this example, we only retrieved a single passage. But in real-world scenarios — especially when dealing with a large collection of data — we’d typically retrieve multiple passages. That way, the Gemini model can decide which pieces of text are actually useful for answering the question.

It’s perfectly fine if a few of those retrieved passages aren’t directly relevant; the model’s generation process is designed to filter out the noise and focus on what truly matters.

In [None]:
query_oneline = query.replace("\n", " ")

# This prompt is where you can specify any guidance on tone, or what topics the model should stick to, or avoid.
prompt = f"""You are a helpful and informative bot that answers questions using text from the reference passage included below.
Be sure to respond in a complete sentence, being comprehensive, including all relevant background information.
However, you are talking to a non-technical audience, so be sure to break down complicated concepts and
strike a friendly and conversational tone. If the passage is irrelevant to the answer, you may ignore it.

QUESTION: {query_oneline}
"""

# Add the retrieved documents to the prompt.
for passage in all_passages:
    passage_oneline = passage.replace("\n", " ")
    prompt += f"PASSAGE: {passage_oneline}\n"

print(prompt)

In [None]:
answer = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=prompt)

Markdown(answer.text)

## Fine Tuning Model

Fine-tuning means taking a pretrained model (like Gemini, GPT, or Llama) and teaching it new tricks — not from scratch, but by giving it extra, specific examples so it learns your tone, domain, or task.

Think of it like this:

 - The base model = a smart intern who knows everything in general.

 - Fine-tuning = teaching that intern how your company does things.

You’re not retraining their whole brain — you’re just nudging their habits.

1. **Start with a pretrained model**

    You don’t build from zero — you take an existing model like:

    text-bison, gemini, gpt-3.5-turbo, etc.
    These models already know grammar, logic, reasoning — basically the “universal stuff.”

2. **Prepare your custom data**

    You create a dataset that shows the model how you want it to behave.
    Usually looks like this:

        {"input": "What is your refund policy?", "output": "Refunds are processed in 14 days."}
  
    OR
  
        {"messages": [
        {"role": "user", "content": "Summarize this report."},
        {"role": "assistant", "content": "Here’s a short summary..."}
        ]}
    
    You collect hundreds or thousands of such examples. The cleaner and more consistent they are, the better.

3. **Train the model**

    You feed that dataset into the fine-tuning API (e.g., Google’s Vertex AI, OpenAI’s fine-tuning endpoint, etc.).

    Under the hood:

    - The base weights are slightly adjusted to prefer your examples.

    - The model’s general knowledge stays intact.

    This step takes anywhere from minutes to hours depending on your data and compute.

4. **Deploy your fine-tuned model**

    You’ll get a new model ID, something like:

        projects/your-id/locations/us-central1/models/fine-tuned-customer-support

    Now you use that instead of the base model in your API calls.


**Example Use Cases**

 - Customer support: Model learns your company’s tone, product facts, policies.

 - Code generation: Teach it your team’s code style, naming conventions.

 - Legal / medical text: Make it speak your industry’s language.

 - Creative writing: Train it to write like you, not a Wikipedia article.

**What Fine-Tuning isn’t**

 - It’s not for teaching brand-new world knowledge.
(You can’t fine-tune GPT on tomorrow’s news — that’s RAG’s job.)

 - It’s not for fixing hallucinations.
It’s for improving style and consistency.

 - And it’s definitely not cheap — those GPUs are thirsty.

In [None]:
from sklearn.datasets import fetch_20newsgroups

newsgroups_train = fetch_20newsgroups(subset="train")
newsgroups_test = fetch_20newsgroups(subset="test")

# View list of class names for dataset
newsgroups_train.target_names

In [None]:
print(newsgroups_train.data[0])


In [None]:
# newsgroups_train

# Truncated preview of the object's contents (pasted output, kept as a comment so the cell still runs):
# {'data': ["From: lerxst@wam.umd.edu (where's my thing)\nSubject: WHAT car is this!?\nNntp-Posting-Host: rac3.wam.umd.edu\nOrganization: University of Maryland, College Park\nLines: 15\n\n I was wondering if anyone out there could enlighten me on this car I saw\nthe other day. It was a 2-door sports car, looked to be from the late 60s/\nearly 70s. It was called a Bricklin. The doors were really small. In addition,\nthe front bumper was separate from the rest of the body. This is \nall I know. If anyone can tellme a model name, engine specs, years\nof production, where this car is made, history, or whatever info you\nhave on this funky looking car, please e-mail.\n\nThanks,\n- IL\n   ---- brought to you by your neighborhood Lerxst ----\n\n\n\n\n",

In [None]:
import re
import pandas as pd

def preprocess_newsgroup_row(data):
    # Split headers from body manually
    parts = data.split("\n\n", 1)
    headers = parts[0] if len(parts) > 0 else ""
    body = parts[1] if len(parts) > 1 else ""

    # Extract the Subject field manually
    subject = ""
    for line in headers.split("\n"):
        if line.lower().startswith("subject:"):
            subject = line[len("subject:"):].strip()
            break

    # Build text
    text = f"{subject}\n\n{body}"

    # Strip email addresses
    text = re.sub(r"[\w\.-]+@[\w\.-]+", "", text)

    # Truncate
    return text[:40000]


def preprocess_newsgroup_data(newsgroup_dataset):
    df = pd.DataFrame({
        "Text": newsgroup_dataset.data,
        "Label": newsgroup_dataset.target
    })

    df["Text"] = df["Text"].apply(preprocess_newsgroup_row)
    df["Class Name"] = df["Label"].map(lambda l: newsgroup_dataset.target_names[l])

    return df


In [None]:
df_train = preprocess_newsgroup_data(newsgroups_train)
df_test = preprocess_newsgroup_data(newsgroups_test)

df_train.head()

In [None]:
def sample_data(df, num_samples, classes_to_keep):
    # Sample rows, selecting num_samples of each Label.
    df = (
        df.groupby("Label")[df.columns]
        .apply(lambda x: x.sample(num_samples))
        .reset_index(drop=True)
    )

    # Copy the filtered frame so the column assignment below does not act on a slice.
    df = df[df["Class Name"].str.contains(classes_to_keep)].copy()
    df["Class Name"] = df["Class Name"].astype("category")

    return df


TRAIN_NUM_SAMPLES = 50
TEST_NUM_SAMPLES = 10
# Keep rec.* and sci.*
CLASSES_TO_KEEP = "^rec|^sci"

df_train = sample_data(df_train, TRAIN_NUM_SAMPLES, CLASSES_TO_KEEP)
df_test = sample_data(df_test, TEST_NUM_SAMPLES, CLASSES_TO_KEEP)


In [None]:
sample_idx = 0
sample_row = preprocess_newsgroup_row(newsgroups_test.data[sample_idx])
sample_label = newsgroups_test.target_names[newsgroups_test.target[sample_idx]]

print(sample_row)
print('---')
print('Label:', sample_label)

In [None]:
response = client.models.generate_content(
    model="gemini-2.5-flash", contents=sample_row)


In [None]:
Markdown(response.text)

In [None]:
prompt = "From what newsgroup does the following message originate?"
baseline_response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[prompt, sample_row])
print(baseline_response.text)

In [None]:
from google.api_core import retry

# You can use a system instruction to do more direct prompting, and get a
# more succinct answer.

system_instruct = """
You are a classification service. You will be passed input that represents
a newsgroup post and you must respond with the newsgroup from which the post
originates.
"""

# Define a helper to retry when per-minute quota is reached.
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

# If you want to evaluate your own technique, replace this body of this function
# with your model, prompt and other code and return the predicted answer.
@retry.Retry(predicate=is_retriable)
def predict_label(post: str) -> str:
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        config=types.GenerateContentConfig(
            system_instruction=system_instruct),
        contents=post)

    rc = response.candidates[0]

    # Any errors, filters, recitation, etc we can mark as a general error
    if rc.finish_reason.name != "STOP":
        return "(error)"
    else:
        # Clean up the response.
        return response.text.strip()


prediction = predict_label(sample_row)

print(prediction)
print()
print("Correct!" if prediction == sample_label else "Incorrect.")

In [None]:
import tqdm
from tqdm.rich import tqdm as tqdmr
import warnings

# Enable tqdm features on Pandas.
tqdmr.pandas()

# But suppress the experimental warning
warnings.filterwarnings("ignore", category=tqdm.TqdmExperimentalWarning)


# Further sample the test data to be mindful of the free-tier quota.
df_baseline_eval = sample_data(df_test, 2, '.*')

# Make predictions using the sampled data.
df_baseline_eval['Prediction'] = df_baseline_eval['Text'].progress_apply(predict_label)

# And calculate the accuracy.
accuracy = (df_baseline_eval["Class Name"] == df_baseline_eval["Prediction"]).sum() / len(df_baseline_eval)
print(f"Accuracy: {accuracy:.2%}")
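
The accuracy above relies on exact string equality, so a prediction that differs only in case, whitespace or stray quoting counts as wrong. One possible refinement, sketched here with made-up labels, is to normalize both sides before comparing. `normalize_label` and `accuracy_score` are hypothetical helper names, not part of pandas or the Gemini SDK.

```python
def normalize_label(text):
    """Lower-case a label and strip whitespace, quotes and backticks around it."""
    return text.strip().strip("`'\"").strip().lower()

def accuracy_score(expected, predicted):
    """Fraction of predictions that match the expected label after normalization."""
    pairs = list(zip(expected, predicted))
    if not pairs:
        return 0.0
    correct = sum(normalize_label(e) == normalize_label(p) for e, p in pairs)
    return correct / len(pairs)

# Made-up labels and predictions, for illustration only.
expected = ["rec.autos", "sci.space", "sci.med", "rec.sport.hockey"]
predicted = ["rec.autos\n", "`sci.space`", "sci.electronics", "(error)"]
print(f"Accuracy: {accuracy_score(expected, predicted):.2%}")
```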

In [None]:
df_baseline_eval


In [None]:
from collections.abc import Iterable
import random


input_data = {'examples':
    df_train[['Text', 'Class Name']]
      .rename(columns={'Text': 'textInput', 'Class Name': 'output'})
      .to_dict(orient='records')
 }

model_id = None

if not model_id:
  queued_model = None
  for m in reversed(client.tunings.list()):
    if m.name.startswith('tunedModels/newsgroup-classification-model'):
      if m.state.name == 'JOB_STATE_SUCCEEDED':
        model_id = m.name
        print('Found existing tuned model to reuse.')
        break
      elif m.state.name == 'JOB_STATE_RUNNING' and not queued_model:
        # If there's a model still queued, remember the most recent one.
        queued_model = m.name
  else:
    if queued_model:
      model_id = queued_model
      print('Found queued model, still waiting.')


if not model_id:
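    # Note: base_model must name a model that supports tuning; check the
    # current model list if this name is rejected.
    tuning_op = client.tunings.tune(
        base_model="models/gemini-2.5-flash-001",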
        training_dataset=input_data,
        config=types.CreateTuningJobConfig(
            tuned_model_display_name="Newsgroup classification model",
            batch_size=16,
            epoch_count=2,
        ),
    )

    print(tuning_op.state)
    model_id = tuning_op.name

print(model_id)
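
The reuse logic above relies on Python's `for`/`else`: the `else` branch runs only when the loop finishes without hitting `break`. A minimal sketch of the same selection rule as a standalone function follows. It uses stand-in job objects built with `SimpleNamespace` (with a plain string `state`, unlike the real objects, which expose `state.name`), and it assumes the job list is ordered oldest to newest. The function name `pick_tuned_model` and the `tunedModels/demo-*` names are hypothetical.

```python
from types import SimpleNamespace


def pick_tuned_model(jobs, prefix):
    """Prefer the newest succeeded job; otherwise fall back to the newest running one."""
    queued = None
    for job in reversed(jobs):  # Newest first, assuming `jobs` is oldest to newest.
        if not job.name.startswith(prefix):
            continue
        if job.state == "JOB_STATE_SUCCEEDED":
            return job.name
        if job.state == "JOB_STATE_RUNNING" and queued is None:
            queued = job.name
    # No succeeded job was found; return the running one, or None.
    return queued


# Stand-in jobs, invented for this example.
demo_jobs = [
    SimpleNamespace(name="tunedModels/demo-a", state="JOB_STATE_SUCCEEDED"),
    SimpleNamespace(name="tunedModels/demo-b", state="JOB_STATE_RUNNING"),
    SimpleNamespace(name="tunedModels/other-c", state="JOB_STATE_SUCCEEDED"),
]

chosen = pick_tuned_model(demo_jobs, "tunedModels/demo")
print(chosen)
```

Returning early from a function replaces the `break`/`else` pair, which some readers find easier to follow than the loop form.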

In [None]:
# Colab-ready: convert df to JSONL and optionally upload to GCS
import json
import os

# Step 1 — create JSONL locally
records = (
    df_train[['Text', 'Class Name']]
    .rename(columns={'Text': 'textInput', 'Class Name': 'output'})
    .to_dict(orient='records')
)

jsonl_path = "train.jsonl"
with open(jsonl_path, "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")

print(f"Wrote {len(records)} records to {jsonl_path}")

# ---------- Option A: Pass a local file path (probably unsupported) ----------
# In google-genai, `training_dataset` is typed as a `types.TuningDataset` (or an
# equivalent dict), so a bare path string is likely to be rejected. The
# try/except below reports the error instead of stopping the cell.
try:
    if not model_id:
        tuning_op = client.tunings.tune(
            base_model="models/gemini-2.5-flash",
            training_dataset=jsonl_path,  # <-- local path (only if supported)
            config=types.CreateTuningJobConfig(
                tuned_model_display_name="Newsgroup classification model",
                batch_size=16,
                epoch_count=2,
            ),
        )
        model_id = tuning_op.name  # Record the job so the fallbacks below don't resubmit.
        print("Tuning job submitted (using local path).")
except Exception as e:
    print("Local-path submission failed (likely unsupported). Error:", e)

# ---------- Option B: Upload JSONL to Google Cloud Storage (recommended) ----------
# Many tuning endpoints expect a cloud URI (gs://...). This uses google-cloud-storage.
# You need to run `pip install --upgrade google-cloud-storage` in Colab if missing,
# and have authenticated Colab (gcloud auth login or use Colab's auth).
#
# Replace YOUR_BUCKET with your GCS bucket name.

try:
    from google.colab import auth
    auth.authenticate_user()
    from google.cloud import storage

    storage_client = storage.Client()
    bucket_name = "YOUR_BUCKET"   # <<-- change this
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(os.path.basename(jsonl_path))
    blob.upload_from_filename(jsonl_path)
    gcs_uri = f"gs://{bucket_name}/{blob.name}"
    print("Uploaded to:", gcs_uri)

    # Now call the tuning API with the GCS URI.
    # `types.TuningDataset` exposes a `gcs_uri` field, which is intended for Vertex AI
    # clients; a client created with only an API key may reject it.
    # Two call shapes are tried below. Check the API docs for the one your client expects.

    # Pattern 1: pass the GCS URI as a string
    try:
        if not model_id:
            tuning_op = client.tunings.tune(
                base_model="models/gemini-2.5-flash",
                training_dataset=gcs_uri,  # some clients accept a gs:// path
                config=types.CreateTuningJobConfig(
                    tuned_model_display_name="Newsgroup classification model",
                    batch_size=16,
                    epoch_count=2,
                ),
            )
            model_id = tuning_op.name  # Record the job so Pattern 2 doesn't resubmit.
            print("Tuning job submitted (using GCS URI).")
    except Exception as e:
        print("Submission with plain GCS URI failed:", e)

    # Pattern 2: pass an object referencing the file (API-specific)
    try:
        if not model_id:
            # Dict form of a dataset reference; this mirrors `types.TuningDataset(gcs_uri=...)`.
            training_dataset_ref = {"gcs_uri": gcs_uri}
            tuning_op = client.tunings.tune(
                base_model="models/gemini-2.5-flash",
                training_dataset=training_dataset_ref,
                config=types.CreateTuningJobConfig(
                    tuned_model_display_name="Newsgroup classification model",
                    batch_size=16,
                    epoch_count=2,
                ),
            )
            model_id = tuning_op.name  # Record the submitted job's name.
            print("Tuning job submitted (using training_dataset object).")
    except Exception as e:
        print("Submission with training_dataset object failed:", e)

except Exception as e:
    print("GCS upload block failed. If you don't want to use GCS, skip Option B. Error:", e)
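
Before uploading a JSONL file anywhere, it can help to confirm that it parses back into the same records, since embedded newlines or non-ASCII text are common sources of malformed lines. The sketch below writes records the same way as the cell above and reads them back. The helper `read_jsonl`, the file name `check.jsonl`, and the sample records are hypothetical and used only for illustration.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Parse a JSONL file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Sample records, invented for this example. One contains an embedded newline
# and one contains a non-ASCII character.
sample_records = [
    {"textInput": "Line one\nline two", "output": "sci.space"},
    {"textInput": "Café prices", "output": "rec.autos"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "check.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        for rec in sample_records:
            # json.dumps escapes the embedded newline, so each record stays on one line.
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    loaded = read_jsonl(path)

print("Round trip OK" if loaded == sample_records else "Round trip mismatch")
```

The same check could be run against `train.jsonl` by comparing `read_jsonl(jsonl_path)` with `records`.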


In [None]:
models = client.models.list()
for model in models:
  print(model.name)