# Set up the Gemini API

In [None]:
from google.colab import userdata
from google import genai

client = genai.Client(api_key=userdata.get('GOOGLE_API_KEY')) # store your key as a Colab secret named GOOGLE_API_KEY, or paste it here

response = client.models.generate_content(
    model="gemini-2.0-flash", # or a newer model, e.g. "gemini-2.5-flash-preview-05-20"
    contents="Explain how AI works in a few words"
)

print(response.text)


Check the documentation here: https://pypi.org/project/google-genai/

Quick in-class exercise: set up your own Gemini API key.
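For the exercise, a small helper can make the notebook work both inside and outside Colab. This is a sketch: it tries Colab Secrets first and falls back to an environment variable. The function name `get_api_key` is hypothetical; only `userdata.get` and `genai.Client(api_key=...)` come from the cell above.

```python
import os


def get_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from Colab Secrets if available, else from an environment variable."""
    try:
        # google.colab only exists inside a Colab runtime
        from google.colab import userdata
        key = userdata.get(name)
        if key:
            return key
    except Exception:
        # Not running in Colab, or the secret is missing / notebook access not granted
        pass
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(
            f"No API key found: add a Colab secret or set the {name} environment variable."
        )
    return key
```

The client can then be created as before with `client = genai.Client(api_key=get_api_key())`.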

# Load some data

In [3]:
import pandas as pd

lead_paragraphs = pd.read_csv("http://farys.org/daten/nyt_chatgpt.csv")
full_articles = pd.read_csv("http://farys.org/daten/nyt_chatgpt_fulltexts.csv")


In [None]:
lead_paragraphs.head()

**Some minor cleaning:** keep only articles and drop newsletter blurbs and letters to the editor

In [5]:
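import re

# Phrases that mark newsletters and letters rather than regular articles
filters = [
    "Every Tuesday and Friday",
    "Want to get this newsletter in your inbox",
    "To the Editor:"
]

# Escape each phrase so it is matched literally, then join them into one regex alternation
pattern = "|".join(re.escape(f) for f in filters)

# Keep articles whose abstract contains none of the phrases (missing abstracts are kept)
lead_paragraphs = lead_paragraphs[
    (lead_paragraphs["document_type"] == "article") &
    (~lead_paragraphs["abstract"].str.contains(pattern, na=False))
]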
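The filter joins the phrases with `|` into a single regular expression, so a row is dropped as soon as its abstract contains any one of them. Below is a minimal plain-Python sketch of the same logic on hypothetical toy rows. `re.escape` is used as a precaution so that characters such as `?` or `.` in a phrase are matched literally rather than read as regex syntax.

```python
import re

filters = [
    "Every Tuesday and Friday",
    "Want to get this newsletter in your inbox",
    "To the Editor:",
]
# Escape each phrase, then OR them together into one pattern
pattern = re.compile("|".join(re.escape(f) for f in filters))

# Hypothetical toy rows standing in for the NYT data
rows = [
    {"document_type": "article", "abstract": "A chatbot passes a law exam."},
    {"document_type": "article", "abstract": "To the Editor: I disagree with your piece."},
    {"document_type": "multimedia", "abstract": "Watch: how ChatGPT works."},
    {"document_type": "article", "abstract": None},
]


def keep(row):
    """Keep articles whose abstract matches no filter phrase (missing abstracts are kept, like na=False)."""
    abstract = row["abstract"] or ""
    return row["document_type"] == "article" and not pattern.search(abstract)


kept = [row for row in rows if keep(row)]
print(len(kept), "rows kept")
```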


**Narrow down to the first 50 articles**

In [6]:
lead_paragraphs = lead_paragraphs.head(50)

**Prepare a prompt for a single example article**

In [None]:
example_text = full_articles["fulltext"].iloc[1]

prompt = f"""Extract the 5 main topics of the following text. Then, categorize the text according to how strongly it is described by each topic. The output should look like this:

NameOfTopic1: 50%
NameOfTopic2: 20%
NameOfTopic3: 15%
NameOfTopic4: 15%
NameOfTopic5: 0%

Here is the text:
{example_text}
"""
print(prompt)


**Lower the temperature to get more consistent results.**

Illustration of the temperature concept: https://poloclub.github.io/transformer-explainer/
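Roughly, temperature rescales the model's raw next-token scores (logits) before they are turned into probabilities with softmax: `p_i = exp(z_i / T) / sum_j exp(z_j / T)`. The sketch below uses made-up logits for three candidate tokens. A low `T` concentrates probability on the top token (as `T` approaches 0 the choice approaches always picking it), while a high `T` flattens the distribution and makes less likely tokens more frequent.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, scaled by a temperature > 0."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}:", [round(p, 3) for p in probs])
```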

In [8]:
from google.genai import types # to control temperature

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-05-20",
    contents=prompt,
    config=types.GenerateContentConfig(
        temperature=0.0 # lower = less random; 0.0 gives the most reproducible output (though not guaranteed identical)
    )
)

In [None]:
print(response.text)
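The requested format (`NameOfTopic: 50%`, one topic per line) is easy to turn into data. Here is a minimal sketch, assuming the model mostly sticks to that format. Lines that do not match are ignored, and bullet or bold markers are stripped from topic names. The sum is checked because models do not always respect such constraints. The sample reply is made up.

```python
import re

# One "Name: NN%" pair per line
LINE_RE = re.compile(r"^\s*(.+?)\s*:\s*(\d+(?:\.\d+)?)\s*%\s*$")


def parse_topic_shares(text):
    """Return {topic_name: percentage} for every line of the form 'Name: NN%'."""
    shares = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            name = match.group(1).strip("-* ")  # drop bullet/bold markers around the name
            shares[name] = float(match.group(2))
    return shares


# Made-up model reply in the requested format
sample_reply = """Here are the topics:
AI in Education: 50%
**Cheating Concerns**: 20%
Teacher Responses: 15%
School Policy: 15%
Other: 0%"""

shares = parse_topic_shares(sample_reply)
print(shares)
print("Sums to 100%:", sum(shares.values()) == 100)
```

In the notebook, `parse_topic_shares(response.text)` would be applied to the actual reply.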

Maybe it's better to derive a handful of topics from more than one text?
- could give the complete corpus to the model to generate topics
- could ask for "topics around ChatGPT/LLMs"
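One way to try the first idea without sending the entire corpus is to draw a random sample of texts, join them with clear delimiters, and ask for a small set of topics that covers the whole sample. The function name, sample size, and prompt wording below are illustrative assumptions. The resulting string would be sent with `client.models.generate_content` as above.

```python
import random


def build_topic_discovery_prompt(texts, n_topics=5, sample_size=20, seed=42):
    """Build a prompt asking for n_topics topics that describe a random sample of texts."""
    rng = random.Random(seed)  # fixed seed so the same sample is drawn each time
    sample = rng.sample(texts, k=min(sample_size, len(texts)))
    delimited = "\n".join(f"## Text {i + 1}\n[{t}]" for i, t in enumerate(sample))
    return (
        f"Below are {len(sample)} texts from the same corpus, each marked with '## Text [Number]'.\n\n"
        f"{delimited}\n\n"
        f"Propose {n_topics} topics that together describe this corpus well. "
        "Return one short topic name per line, without numbering."
    )


# Hypothetical toy corpus
texts = [f"Lead paragraph number {i} about ChatGPT." for i in range(50)]
prompt = build_topic_discovery_prompt(texts, sample_size=3)
print(prompt)
```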

**Batch processing** (50 texts at once)

In [25]:
mytexts_with_delimiters = "\n".join(
    f"## Paragraph {i+1}\n[{text}]\n" for i, text in enumerate(lead_paragraphs["abstract"])
)

batch_prompt = f"""
Below are lead paragraphs from New York Times articles. Each paragraph is marked with '## Paragraph [Number]'. Here are the paragraphs:

{mytexts_with_delimiters}

Based on these lead paragraphs, perform the following 3 tasks for each paragraph:

1) Categorize the paragraph by how strongly it is described by these 5 topics (total must be 100%):
1. Applications and Use Cases
2. Technological Advancements
3. Social and Ethical Implications
4. Business and Economic Impact
5. Public Perception and Cultural Influence

2) Is it about 'ChatGPT'? ('yes' or 'no')

3) Is the paragraph in favor of ChatGPT technology? (-1 = against, 0 = neutral, 1 = in favor)

Output the response as a JSON list with one object per paragraph, each shaped like this:
{{
    "paragraph_id": ##,
    "topic_applications": xx,
    "topic_advancements": xx,
    "topic_social": xx,
    "topic_business": xx,
    "topic_cultural": xx,
    "about_chatgpt": "yes/no",
    "in_favor": -1/0/1
}}

'xx' are percentages and must total 100%.
"""


In [None]:
print(batch_prompt)

**Send to LLM**

In [26]:
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-05-20",
    contents=batch_prompt,
    config=types.GenerateContentConfig(
        temperature=0.0,
        response_mime_type='application/json' # ask the API to return valid JSON directly
    )
)


In [None]:
print(response.text[:570])

**Extract JSON and convert to dataframe**

In [27]:
import json

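records = json.loads(response.text)
# The model should return one object per paragraph; pair rows by position after sorting by id
assert len(records) == len(lead_paragraphs), f"expected {len(lead_paragraphs)} records, got {len(records)}"
df = pd.DataFrame(records).sort_values("paragraph_id").reset_index(drop=True)
df["originaltext"] = lead_paragraphs["abstract"].values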

from google.colab import data_table
data_table.DataTable(df)

# index 11 is interesting!

|   | paragraph_id | topic_applications | topic_advancements | topic_social | topic_business | topic_cultural | about_chatgpt | in_favor | originaltext |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 20 | 0 | 40 | 40 | 0 | no | 0 | Artificial intelligence is confronting white-c... |
| 1 | 2 | 10 | 60 | 20 | 10 | 0 | yes | 0 | The company unveiled new technology called GPT... |
| 2 | 3 | 70 | 0 | 30 | 0 | 0 | no | 0 | Journalists may have a new semi-reliable source. |
| 3 | 4 | 10 | 0 | 70 | 0 | 20 | yes | -1 | With the rise of the popular new chatbot ChatG... |
| 4 | 5 | 0 | 0 | 80 | 20 | 0 | no | -1 | The action by Italy’s data protection agency i... |
| 5 | 6 | 50 | 20 | 0 | 30 | 0 | no | 0 | Powerful new artificial-intelligence software ... |
| 6 | 7 | 70 | 30 | 0 | 0 | 0 | no | 1 | Large language models are already good at a wi... |
| 7 | 8 | 0 | 0 | 0 | 40 | 60 | yes | 1 | The San Francisco artificial intelligence lab ... |
| 8 | 9 | 0 | 20 | 30 | 50 | 0 | no | -1 | The promised “live” demonstration of the bot h... |
| 9 | 10 | 0 | 0 | 0 | 30 | 70 | yes | 1 | Even inside the company, the chatbot’s popular... |
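Pairing the model's rows with the original paragraphs only works if the model returned exactly one valid object per paragraph. Below is a minimal standard-library sketch of checks that can be run on the parsed output before trusting the table. The field names follow the prompt above, every key starting with `topic_` is treated as a percentage, and the sample records are made up (with two deliberate problems).

```python
import json


def validate_records(records, n_expected, id_key="paragraph_id"):
    """Return a list of problems found in the parsed model output (empty list = all checks passed)."""
    problems = []
    ids = [r.get(id_key) for r in records]
    if ids != list(range(1, n_expected + 1)):
        problems.append(f"expected ids 1..{n_expected} in order, got {ids}")
    for r in records:
        rid = r.get(id_key)
        # Every key starting with "topic_" is treated as a percentage share
        shares = [v for k, v in r.items() if k.startswith("topic_")]
        if not all(isinstance(v, (int, float)) for v in shares):
            problems.append(f"id {rid}: non-numeric topic share")
        elif sum(shares) != 100:
            problems.append(f"id {rid}: topic shares sum to {sum(shares)}")
        if r.get("about_chatgpt") not in ("yes", "no"):
            problems.append(f"id {rid}: about_chatgpt = {r.get('about_chatgpt')!r}")
        if r.get("in_favor") not in (-1, 0, 1):
            problems.append(f"id {rid}: in_favor = {r.get('in_favor')!r}")
    return problems


# Made-up model output: record 2 has shares summing to 80 and an invalid about_chatgpt value
raw = """[
  {"paragraph_id": 1, "topic_applications": 60, "topic_social": 40, "about_chatgpt": "yes", "in_favor": 1},
  {"paragraph_id": 2, "topic_applications": 50, "topic_social": 30, "about_chatgpt": "maybe", "in_favor": 0}
]"""
records = json.loads(raw)
for problem in validate_records(records, n_expected=2):
    print(problem)
```

In the notebook, `validate_records(json.loads(response.text), n_expected=len(lead_paragraphs))` would be the corresponding call.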


Potential problems:

- are results consistent?
- does the LLM have enough information to classify?
- what about more than 50 articles? (context window size matters)
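For more than 50 texts, one option is to split the corpus into chunks and send one request per chunk. This is a minimal sketch that uses a crude character budget as a stand-in for the real token limit (check the model's documented context window). `chunk_texts` is a hypothetical helper. IDs keep counting across chunks so the results can be merged back afterwards.

```python
def chunk_texts(texts, max_items=50, max_chars=100_000):
    """Split texts into consecutive chunks limited by item count and total characters.

    Returns a list of chunks; each chunk is a list of (global_id, text) pairs,
    with ids starting at 1 so they can be matched back after the requests.
    """
    chunks, current, current_chars = [], [], 0
    for i, text in enumerate(texts, start=1):
        too_many = len(current) >= max_items
        too_long = bool(current) and current_chars + len(text) > max_chars
        if too_many or too_long:
            chunks.append(current)
            current, current_chars = [], 0
        current.append((i, text))
        current_chars += len(text)
    if current:
        chunks.append(current)
    return chunks


texts = ["x" * 30] * 7  # hypothetical toy corpus
for chunk in chunk_texts(texts, max_items=3, max_chars=1000):
    print([i for i, _ in chunk])
```

Each chunk can then be formatted with the same `## Paragraph {i}` delimiters, using the global id `i`, so that `paragraph_id` stays unique across requests.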

**For the full texts:**

In [15]:
mytexts_with_delimiters = "\n".join(
    f"## Article {i+1}\n[{text}]\n" for i, text in enumerate(full_articles["fulltext"])
)

batch_prompt = f"""
Below are articles from the New York Times. Each article is marked with '## Article [Number]'. Here are the articles:

{mytexts_with_delimiters}

Based on these articles, perform the following 3 tasks for each article:

1) Categorize the article by how strongly it is described by these 5 topics (total must be 100%):
1. Applications and Use Cases
2. Technological Advancements
3. Social and Ethical Implications
4. Business and Economic Impact
5. Public Perception and Cultural Influence

2) Is it about 'ChatGPT'? ('yes' or 'no')

3) Is the article in favor of ChatGPT technology? (-1 = against, 0 = neutral, 1 = in favor)

Output the response as a JSON list with one object per article, each shaped like this:
{{
    "article_id": ##,
    "topic_applications": xx,
    "topic_advancements": xx,
    "topic_social": xx,
    "topic_business": xx,
    "topic_cultural": xx,
    "about_chatgpt": "yes/no",
    "in_favor": -1/0/1
}}

'xx' are percentages and must total 100%.
"""


In [None]:
print(batch_prompt)
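The full-text prompt is much longer than the lead-paragraph one, so it is worth checking its size (and cost) before sending it. Here is a rough sketch, assuming the common rule of thumb of about 4 characters per token for English text. This is an approximation, not Gemini's tokenizer; the `google-genai` client also offers `client.models.count_tokens(model=..., contents=...)` for exact counts. The price below is a hypothetical placeholder, not a real rate.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token count from the number of characters (rule of thumb, not Gemini's tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def estimate_input_cost(text, usd_per_million_tokens):
    """Rough input cost in USD for sending `text` once (output tokens are billed separately)."""
    return estimate_tokens(text) * usd_per_million_tokens / 1_000_000


# Hypothetical placeholder price; look up the current rate for the model you use
PRICE_PER_MILLION_INPUT_TOKENS = 0.15

demo_prompt = "word " * 2000  # stand-in for batch_prompt (10,000 characters)
print(estimate_tokens(demo_prompt), "tokens (approx.)")
print(f"about ${estimate_input_cost(demo_prompt, PRICE_PER_MILLION_INPUT_TOKENS):.6f} per request")
```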

In [19]:
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-05-20",
    contents=batch_prompt,
    config=types.GenerateContentConfig(
        temperature=0.0,
        response_mime_type='application/json' # request valid JSON instead of free text
    )
)


In [None]:
print(response.text[:500])

**Extract JSON and convert to dataframe**

In [None]:
import json

df = pd.DataFrame(json.loads(response.text))

def truncate_text(s, length=50):
    return s if len(s) <= length else s[:length] + "..."

df["shorttext"] = full_articles["fulltext"].astype(str).apply(lambda x: truncate_text(x, 50))
#df["originaltext"] = full_articles["fulltext"].values

from google.colab import data_table
data_table.DataTable(df)

In [None]:
import textwrap
print(textwrap.fill(full_articles["fulltext"][4], width=80)) # negative article -> prediction seems pretty good

In [None]:
print(textwrap.fill(full_articles["fulltext"][9], width=80)) # business topic -> also pretty good

**Summary**

- pros of using LLMs:
  + can prompt/ask whatever is of interest
  + not a lot of cleaning required
- cons:
  + hallucinations (e.g. sentiments for rubbish texts)
  + results might vary between requests (temperature parameter!)
  + costs
  + bias

**Where to go from here?**

- improve prompt
  + better parsing
  + more accurate instructions and delimiters
  + examples (few shot prompting)
  + give the model time to think (e.g. ask it to reason step by step before answering)
  + ask for reasoning (why is it "not about ChatGPT", why the sentiment?)
  + calculate costs beforehand
- reflect about potential biases
  + in training data
  + in algorithms / fine-tuning / RLHF
  + in the prompt itself
- potential problems:
  + model being trusted too much
  + hallucinations
  + exposing sensitive data
- how to structure your code for larger requests
  + chunking of batches (pros/cons, e.g. calibration)
  + repeating requests for stability of results?
  + comparing models
  + checking a sample of results against human coding (is the LLM better or not?)
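For repeated requests or model comparisons, a simple first measure is the share of items on which two runs assign the same label. A chance-corrected alternative is Cohen's kappa, `(p_o - p_e) / (1 - p_e)`, where `p_o` is the observed agreement and `p_e` is the agreement expected from each run's label frequencies. This is a minimal sketch on made-up `in_favor` labels keyed by paragraph id; `sklearn.metrics.cohen_kappa_score` is a ready-made alternative.

```python
from collections import Counter


def percent_agreement(run_a, run_b):
    """Share of shared ids on which two runs assigned the same label."""
    shared = run_a.keys() & run_b.keys()
    if not shared:
        raise ValueError("runs share no ids")
    return sum(run_a[i] == run_b[i] for i in shared) / len(shared)


def cohens_kappa(run_a, run_b):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    shared = sorted(run_a.keys() & run_b.keys())
    n = len(shared)
    p_o = sum(run_a[i] == run_b[i] for i in shared) / n
    counts_a = Counter(run_a[i] for i in shared)
    counts_b = Counter(run_b[i] for i in shared)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


# Made-up in_favor labels from two runs of the same prompt
run_1 = {1: 0, 2: 0, 3: 1, 4: -1, 5: 1}
run_2 = {1: 0, 2: 1, 3: 1, 4: -1, 5: 1}
print("agreement:", percent_agreement(run_1, run_2))
print("kappa:", round(cohens_kappa(run_1, run_2), 3))
```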

## Appendix

**Example of a highly structured prompt using XML**

In [None]:
# Format reviews as XML, numbered so the output can refer back to them
review_texts = [...]  # your list of reviews
mycontext = "\n".join(f'<review number="{i}">{review}</review>' for i, review in enumerate(review_texts, start=1))

# Create the XML prompt
xmlprompt = f"""
<purpose>
Analyse the sentiment of Amazon reviews for a Tamagotchi.
</purpose>

<instructions>
<instruction>Below are Amazon reviews for a Tamagotchi. For each review, write a summary of around 10 words that states whether its sentiment is Negative, Neutral, or Positive and briefly explains why.</instruction>
<instruction>Provide a sentiment score between -1 and 1 for each review. Negative values indicate negative sentiment, positive values indicate positive sentiment, and 0 is neutral.</instruction>
</instructions>

<example-output>
<review>
    <number>1</number>
    <summary>Neutral - Mixed feelings about visibility and setup issues.</summary>
    <score>0.0</score>
</review>
</example-output>

<reviews>
{mycontext}
</reviews>
"""

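Asking for XML also makes the answer machine-readable. The sketch below uses Python's built-in `xml.etree.ElementTree` and assumes the model replies with one or more `<review>` blocks in the example-output format. The reply is wrapped in a root element because several top-level elements do not form a valid XML document on their own, and a surrounding ```` ```xml ```` fence is stripped if present (`str.removeprefix` needs Python 3.9+). The sample reply is made up. Real replies may contain extra prose or unescaped characters such as `&`, which would make parsing fail.

```python
import xml.etree.ElementTree as ET


def parse_review_xml(reply_text):
    """Parse <review> blocks (number, summary, score) from a model reply into a list of dicts."""
    # Drop a surrounding ```xml ... ``` fence if the model added one
    cleaned = reply_text.strip().removeprefix("```xml").removesuffix("```")
    # Several top-level <review> elements are not one XML document, so wrap them in a root
    root = ET.fromstring(f"<root>{cleaned}</root>")
    return [
        {
            "number": int(review.findtext("number")),
            "summary": review.findtext("summary", default="").strip(),
            "score": float(review.findtext("score")),
        }
        for review in root.iter("review")
    ]


# Made-up model reply following the example-output format
sample_reply = """
<review>
    <number>1</number>
    <summary>Neutral - Mixed feelings about visibility and setup issues.</summary>
    <score>0.0</score>
</review>
<review>
    <number>2</number>
    <summary>Positive - Kids enjoy caring for the virtual pet.</summary>
    <score>0.8</score>
</review>
"""

for row in parse_review_xml(sample_reply):
    print(row)
```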