# Topic Detection
The goal of this notebook is to detect what a passage is talking about using an LLM. Each user needs to be authenticated individually in order to query the model.

![Input, authentication, and evaluation flow](./vertex_flow.png)

In [1]:
%pip install google-cloud-aiplatform

Note: you may need to restart the kernel to use updated packages.


In [2]:
import vertexai
from vertexai.preview.generative_models import GenerativeModel, Part
import vertexai.preview.generative_models as generative_models


In [3]:
# !gcloud auth application-default login 
# !gcloud auth application-default set-quota-project pic-semantic-search

## Authentication
Vertex AI requires authentication before the model can be queried. For this notebook, test accounts will be provided.

In [4]:
# TODO: Change the authentication
configs = {"project": "pic-gen-ai-project",
            "region": "asia-northeast1"}
vertexai.init(
    project=configs["project"],
    location=configs["region"]
)
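One way to approach the TODO above is to read the project and region from environment variables instead of hard-coding them, so each user can point the notebook at their own test account. The sketch below is a minimal illustration under stated assumptions: `load_vertex_config` and the variable name `VERTEX_REGION` are hypothetical, and it is only an assumption that `GOOGLE_CLOUD_PROJECT` is set in your environment. It covers the project and region settings only, not the credentials themselves.

```python
import os
from typing import Mapping, Optional

# Fallback values copied from the cell above.
DEFAULT_PROJECT = "pic-gen-ai-project"
DEFAULT_REGION = "asia-northeast1"


def load_vertex_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the configs dict from environment variables, with fallbacks.

    GOOGLE_CLOUD_PROJECT is assumed to be set by the user or the platform;
    VERTEX_REGION is a hypothetical name chosen for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "project": env.get("GOOGLE_CLOUD_PROJECT", DEFAULT_PROJECT),
        "region": env.get("VERTEX_REGION", DEFAULT_REGION),
    }


configs = load_vertex_config()
print(configs)
```

The resulting `configs` dict has the same keys as the one above, so it could be passed to `vertexai.init()` in the same way.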


In [5]:
def generate(prompt:str, model:GenerativeModel):
  """Pass a prompt to the generative model, print the response."""
  responses = model.generate_content(
    prompt,
    generation_config={
        "max_output_tokens": 2048,
        "temperature": 0.9,
        "top_p": 1
    },
    safety_settings={
          generative_models.HarmCategory.HARM_CATEGORY_HATE_SPEECH: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
          generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
          generative_models.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
          generative_models.HarmCategory.HARM_CATEGORY_HARASSMENT: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
    stream=True,
  )
  
  # Print streamed chunks as they arrive, then end the response with a blank line.
  for response in responses:
    print(response.text, end="")
  print("\n")
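If the detected topic needs to be stored or compared rather than only printed, the streamed chunks can be joined into a single string. The following is a minimal sketch, not part of the original notebook: `collect_text` is a hypothetical helper, and it assumes each streamed chunk exposes a `.text` attribute, as the loop in `generate()` does. The stand-in chunks let the example run without calling the model.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_text(chunks: Iterable[Any]) -> str:
    """Join the .text of every streamed chunk and strip surrounding whitespace."""
    parts = []
    for chunk in chunks:
        parts.append(chunk.text)
    return "".join(parts).strip()


# Stand-in objects that mimic streamed chunks (an assumption for illustration).
fake_stream = [SimpleNamespace(text="Stock "), SimpleNamespace(text="market\n")]
topic = collect_text(fake_stream)
print(topic)
```

With a real model, the iterable returned by `model.generate_content(..., stream=True)` would take the place of `fake_stream`.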

## Testing samples
Our simple dataset consists of sentences on topics ranging from news to entertainment. We want the model to infer the topic of each sentence and return only that topic. The `generate()` function takes the prompt with the target sentence appended and queries the model instance.

In [6]:
PROMPT = "Given a sentence or a short passage, identify the topic being discussed and return your guess as a single word or phrase. Sentence: "

# These sentences are mostly news headlines. Requests are normally limited to 10 per minute.
dataset = ["UK economy slips into recession",
           "Here's Why It Might Seem Like You've Gained Weight In Your Face",
           "The quick brown fox jumps over the lazy dog.",
           "S&P 500 sets new record as it hits 5000.",
           "Five recipes to start off the Chiense New Year.",
           "Germany overtakes Japan as world's third largest economy.",
           "The 2022 world cup final drew more than 1.5 billion viewers"
        ]

# These are excerpts from news articles, each still condensed to a few lines.
longer_dataset = ["The Insatiable Ambition of LeBron James: The basketball legend is building a business that includes movies and TV, advertising and a grooming line. It’s all part of a new model for how athletes can cash in on their fame.",
                "Part of an ancient fossil prized for what its mummified flesh revealed about a reptile that predated the first dinosaurs has been shown to be a forgery. While the fossil’s legs and scales are genuine, portions of what appeared to be an intact 8-inch-long reptile are painted rock.",
                "OpenAI has introduced new technology that uses artificial intelligence to create high-quality videos from text descriptions. The company released short clips showcasing vivid, seemingly realistic videos, including woolly mammoths trekking across a snowy field, ocean waves crashing against a cliff’s shoreline and people doing everyday things like reading a book or walking down a city street. OpenAI calls the new system Sora. It takes a written prompt and, through AI, renders a richly detailed video. OpenAI is one of many companies like Alphabet’s Google and Meta Platforms seeking to capitalize on new AI-video developments.",
]

# Create an instance of a gemini-pro model in our project
model = GenerativeModel("gemini-1.0-pro")
for item in dataset:
    print(item)
    generate(PROMPT + item, model)

UK economy slips into recession
United Kingdom's economy

Here's Why It Might Seem Like You've Gained Weight In Your Face
Facial weight gain

The quick brown fox jumps over the lazy dog.
English alphabet

S&P 500 sets new record as it hits 5000.
Stock market

Five recipes to start off the Chiense New Year.
Chinese New Year cooking

Germany overtakes Japan as world's third largest economy.
Economy

The 2022 world cup final drew more than 1.5 billion viewers
FIFA World Cup
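The comment in the cell above mentions a limit of roughly 10 requests per minute. One way to stay under such a limit is to space the calls evenly. The sketch below is a minimal illustration, not the notebook's own method: `throttled_map` is a hypothetical helper, and the `sleep` and `clock` parameters exist only so the pacing can be checked without actually waiting.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(
    fn: Callable[[T], R],
    items: Iterable[T],
    max_per_minute: int = 10,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[R]:
    """Call fn on each item, leaving at least 60 / max_per_minute seconds between call starts."""
    min_interval = 60.0 / max_per_minute
    results: List[R] = []
    last_call = None
    for item in items:
        if last_call is not None:
            # Wait out whatever remains of the interval since the previous call started.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.append(fn(item))
    return results
```

For example, `throttled_map(lambda s: generate(PROMPT + s, model), longer_dataset)` would run the longer passages at that pace. Because `generate()` prints its output rather than returning it, the resulting list would hold `None` values.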

