# Blog Title Generation

This notebook is a small, standalone experiment in writing generative AI code, and it is also part of a larger example.

It is safe to run multiple times as you add topics. It writes its results to the filesystem and skips any topic that already has an output directory, so titles are only generated once per topic.

To regenerate titles for a topic after changing parameters or prompts, delete the `data/gen/blog-titles` directory or one of its child directories that corresponds to the topic you want to run again.
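
The deletion step described above could be scripted instead of done by hand. The following is a minimal sketch, assuming the same directory layout and topic-cleaning rule used later in this notebook; `reset_topic` is a hypothetical helper name, not part of the project's `helpers` package.

```python
import os
import shutil


def reset_topic(blog_titles_dir, topic):
    """Delete the generated-output directory for one topic, if present.

    Returns True if a directory was removed, False if there was nothing to delete.
    """
    # Same cleaning rule as get_output_directory() in this notebook.
    topic_cleaned = topic.replace(" ", "-").replace("'", "").replace(":", "")
    topic_dir = os.path.join(blog_titles_dir, topic_cleaned)

    if not os.path.isdir(topic_dir):
        return False

    # Removes the directory and everything inside it (input.txt, output.txt).
    shutil.rmtree(topic_dir)
    return True
```

Calling `reset_topic(blog_titles_dir, "learning to knit")` before rerunning the generation cell would cause that one topic to be generated again while leaving the others untouched.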

Strategy overview:
1. Use few-shot prompting to generate titles for blog posts.
2. Use few-shot prompting to generate the blog post content.
3. Run supervised tuning with the outputs to fine-tune a model to generate blog posts in the format we want given a minimal prompt.
4. Use the tuned model to generate blog posts.

## Example generated files

You can find the generated files and training/evaluation sets from when this example was run at:
- [data/gen/blog-titles-example](../../data/gen/blog-titles-example)
- [data/gen/blog-articles-example](../../data/gen/blog-articles-example)
- [data/training-sets/blog-generation/training.jsonl](../../data/training-sets/blog-generation/training.jsonl)
- [data/training-sets/blog-generation/evaluation.jsonl](../../data/training-sets/blog-generation/evaluation.jsonl)
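
For readers unfamiliar with the `.jsonl` files listed above, the sketch below shows one plausible way to serialize prompt/response pairs as JSON Lines. The `input_text` and `output_text` field names follow the format Vertex AI documents for supervised tuning of text models; that the example files use exactly these fields is an assumption, and `to_jsonl` is a hypothetical helper.

```python
import json


def to_jsonl(pairs):
    """Serialize (prompt, completion) pairs as JSON Lines, one record per line."""
    lines = []
    for input_text, output_text in pairs:
        # Field names are assumed; check them against the actual training files.
        record = {"input_text": input_text, "output_text": output_text}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"
```

Each line of the result is an independent JSON object, which is what makes the format easy to stream and to split into training and evaluation sets.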

## Configure variables

In [1]:
import os

# ****************** [START] Google Cloud project settings ****************** #
project = os.getenv('GCP_PROJECT')
location = os.environ.get('GCP_REGION', 'us-central1')
# ******************* [END] Google Cloud project settings ******************* #


# ********************** [START] data directory config ********************** #
from helpers.files import get_data_dir
data_dir = get_data_dir()
# *********************** [END] data directory config *********************** #


# ******************* [START] generated blog title config ******************* #
# directory to write files to for persisting generated responses
blog_titles_dir = os.path.join(data_dir, 'gen', 'blog-titles')

# topics to generate blog titles for
topics = [
  "applications of generative ai",
  "linkedin",
  "starting a business",
  "learning to knit",
  "learning to cook healthy meals",
]
# ******************** [END] generated blog title config ******************** #


# *********************** [START] LLM parameter config ********************** #
# Vertex AI model to use for the LLM
model_name = 'text-bison@002'

# maximum number of model responses generated per prompt
candidate_count = 1

# determines the maximum amount of text output from one prompt.
# a token is approximately four characters.
max_output_tokens = 256

# temperature controls the degree of randomness in token selection.
# lower temperatures are good for prompts that expect a true or
# correct response, while higher temperatures can lead to more
# diverse or unexpected results. With a temperature of 0 the highest
# probability token is always selected. for most use cases, try
# starting with a temperature of 0.2.
temperature = 0.2

# top-p changes how the model selects tokens for output. Tokens are
# selected from most probable to least until the sum of their
# probabilities equals the top-p value. For example, if tokens A, B, and C
# have a probability of .3, .2, and .1 and the top-p value is .5, then the
# model will select either A or B as the next token (using temperature).
# the default top-p value is .8.
top_p = 0.8

# top-k changes how the model selects tokens for output.
# a top-k of 1 means the selected token is the most probable among
# all tokens in the model’s vocabulary (also called greedy decoding),
# while a top-k of 3 means that the next token is selected from among
# the 3 most probable tokens (using temperature).
top_k = 40
# *********************** [END] LLM parameter config ************************ #


# ********************** [START] Configuration Checks *********************** #
if not project:
    raise Exception('GCP_PROJECT environment variable not set')
# *********************** [END] Configuration Checks ************************ #
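
To make the top-k and top-p comments in the configuration cell more concrete, here is a toy sketch of the filtering they describe. It is not the model's implementation: it assumes top-k is applied first and top-p second, works on a three-token dictionary rather than a full vocabulary, and leaves out the final temperature-based sampling step. `filter_candidates` is a hypothetical name.

```python
def filter_candidates(probs, top_k, top_p):
    """Return the tokens that survive top-k, then top-p, filtering.

    probs maps each token to its probability.
    """
    # Rank tokens from most to least probable and keep the top_k best.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]

    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        # Stop once the running total reaches top_p (tolerance for float rounding).
        if cumulative >= top_p - 1e-9:
            break
    return kept


toy_probs = {"A": 0.3, "B": 0.2, "C": 0.1}
print(filter_candidates(toy_probs, top_k=40, top_p=0.5))  # ['A', 'B']
print(filter_candidates(toy_probs, top_k=1, top_p=0.8))   # ['A']
```

The first call reproduces the A/B/C example from the top-p comment. The second shows why a top-k of 1 amounts to greedy decoding: only the most probable token is left to choose from.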


## Import and Initialize Vertex AI Client

Importing the SDK prints TensorFlow warnings about missing CUDA drivers and the GPU not being used. You can safely ignore them. Using the GPU is possible on Linux with Docker, but on macOS you'll need to set up a non-containerized development environment.

In [2]:
from google.cloud import aiplatform
import vertexai

vertexai.init(project=project, location=location)

print(f"Vertex AI SDK version: {aiplatform.__version__}")


2023-12-13 04:27:43.575508: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-13 04:27:43.577462: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-13 04:27:43.596969: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-13 04:27:43.597007: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-13 04:27:43.597023: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to regi

Vertex AI SDK version: 1.36.0


## Few-Shot Prompt Configuration

This is where we show a few examples of the input we're going to give to the model, and what we expect the output to look like.

We break the prompt into `Context:` and `Examples:` sections, and separate the examples from one another with two blank lines.

In [3]:
def create_prompt(topic):
  prompt = f"""
Context:
Assume the role of a Buzzfeed journalist and provide blog article titles related to the given prompt.

Examples:
Article subject: software engineering
Article titles:
1. 5 Ways to Improve Your Software Engineering Skills
2. 3 Tips for Writing Clean Code
3. 5 Tools Every Software Engineer Should Know
4. 5 Ways to Improve Your Software Development Process
5. 3 Tips for Debugging Software
6. 5 Ways to Improve Your Software Testing Skills
7. 3 Tips for Designing Scalable Software
8. 5 Ways to Improve Your Software Security
9. 3 Tips for Managing Software Projects
10. 5 Ways to Improve Your Software Documentation


Article subject: generative ai
Article titles:
1.  What is Generative AI and How is it Changing the World?
2.  The 5 Most Important Things You Need to Know About Generative AI
3.  5 Ways Generative AI is Revolutionizing the Way We Create Content
4.  The Future of Generative AI: What to Expect in the Next 5 Years
5.  The Ethical Implications of Generative AI: 3 Important Questions to Ask
6.  3 Ways Generative AI is Changing the Way We Work
7.  The 5 Biggest Challenges Facing Generative AI Today
8.  The 5 Most Promising Applications of Generative AI
9.  3 Ways Generative AI is Making the World a Better Place
10.  The 5 Biggest Risks Associated with Generative AI


Article subject: skiing
Article titles:
1. 5 Tips for Beginners to Learn How to Ski
2. 3 Ways to Improve Your Skiing Skills
3. 5 Best Ski Resorts for Beginners
4. 3 Tips for Choosing the Right Skis
5. 5 Ways to Stay Safe While Skiing
6. 3 Tips for Getting in Shape for Skiing
7. 5 Best Ski Resorts for Families
8. 3 Tips for Planning a Ski Trip
9. 5 Best Ski Resorts for Après-Ski
10. 3 Tips for Making the Most of Your Ski Trip


Article subject: {topic}
Article titles:"""

  return prompt
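
If the examples change often, the same prompt layout could be assembled from data rather than kept in one long string. This is only a sketch of that idea, not how the notebook does it; `build_few_shot_prompt` and its parameters are hypothetical names.

```python
def build_few_shot_prompt(context, examples, topic):
    """Assemble a few-shot prompt from a context string and example titles.

    examples maps each example subject to its list of titles.
    """
    blocks = []
    for subject, titles in examples.items():
        numbered = "\n".join(f"{i}. {title}" for i, title in enumerate(titles, start=1))
        blocks.append(f"Article subject: {subject}\nArticle titles:\n{numbered}")

    # Two blank lines separate the examples, matching the layout used above.
    examples_text = "\n\n\n".join(blocks)

    return (
        f"\nContext:\n{context}\n\n"
        f"Examples:\n{examples_text}\n\n\n"
        f"Article subject: {topic}\nArticle titles:"
    )
```

Ending the prompt with an open `Article titles:` line is what invites the model to continue the pattern for the new subject.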


## Write some functions to help persist generated text in files

We could just print the output and look at it here, but writing the responses to files lets other notebooks and tools pick them up.

These article titles are intended to be used to generate real article content, so we'll make sure to save them.

In [4]:
from helpers.files import file_exists, make_dir_if_not_exists

def get_output_directory(topic):
    # replace spaces with dashes and remove apostrophes and colons
    topic_cleaned = topic.replace(" ", "-").replace("'", "").replace(":", "")

    return os.path.join(blog_titles_dir, topic_cleaned)


def prepare_output_directory(topic):
  """
  Creates the output directory for the given topic if it doesn't already exist.

  Returns the output directory path and a boolean indicating whether the directory
  was created or not
  """

  output_dir = get_output_directory(topic)

  # skip if the directory already exists
  if file_exists(output_dir):
      return output_dir, False

  make_dir_if_not_exists(output_dir)

  return output_dir, True


def persist_generated_response(output_dir, prompt, response):
  """
  Persists the given prompt and response to the given output directory
  """

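  # write as UTF-8 explicitly; the prompt contains non-ASCII characters
  with open(os.path.join(output_dir, "input.txt"), "w", encoding="utf-8") as f:
      f.write(prompt)

  with open(os.path.join(output_dir, "output.txt"), "w", encoding="utf-8") as f:
      f.write(response.text.strip())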
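
Since the saved titles are meant to feed article generation, a later step will need them as a list rather than as raw text. The sketch below assumes the model's output follows the numbered format of the few-shot examples (`1. Title`); `parse_titles` is a hypothetical helper, and lines that do not match the pattern are skipped.

```python
import re


def parse_titles(text):
    """Extract titles from a numbered list such as '1. Some Title'."""
    titles = []
    for line in text.splitlines():
        # Match a leading number, a period, and any spacing before the title.
        match = re.match(r"^\s*\d+\.\s+(.*\S)\s*$", line)
        if match:
            titles.append(match.group(1))
    return titles
```

Reading an `output.txt` file and passing its contents to `parse_titles` would give one string per title, with the list numbering removed.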


## Write a function to print attributes returned by the model

This helper is just for understanding what the model returns. The rest of the notebook only uses the response text; the other attributes are printed for inspection.

If you were generating content for a family-friendly blog, you might want to check the toxicity score or other safety attributes of the output and reject it if a score is too high.

In [5]:
def print_verbose_response(response):
  """
  Prints the response from the LLM alongside various metadata
  """

  print(f"\n---\nResponse:\n\n{response.text.strip()}")
  print(f"\n---\nResponse is_blocked:\n\n{response.is_blocked}")
  print(f"\n---\nResponse safety_attributes:\n\n{response.safety_attributes}")
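
The rejection idea mentioned above could look something like the following. This is a sketch under assumptions: it treats `safety_attributes` as a plain dictionary mapping category names to scores between 0 and 1, and the `"Toxic"` category name and `0.5` threshold are illustrative values, not recommendations. `flagged_categories` is a hypothetical helper.

```python
def flagged_categories(safety_attributes, threshold=0.5, categories=("Toxic",)):
    """Return the watched categories whose score is above the threshold."""
    return {
        category: score
        for category, score in safety_attributes.items()
        # Only categories we care about count; others are ignored.
        if category in categories and score > threshold
    }
```

An empty result means none of the watched categories crossed the threshold, so a caller could skip persisting a response whenever the returned dictionary is non-empty.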

## Content Generation Logic

This configures the LLM prediction parameters and runs the logic to generate the content.

In [6]:

from vertexai.language_models import TextGenerationModel

parameters = {
    "candidate_count": candidate_count,
    "max_output_tokens": max_output_tokens,
    "temperature": temperature,
    "top_p": top_p,
    "top_k": top_k,
}

model = TextGenerationModel.from_pretrained(model_name)

for topic in topics:
    output_dir, created = prepare_output_directory(topic)

    # skip if the directory already exists
    if not created:
        print(f"Skipping {topic} because it already exists.")
        continue

    prompt = create_prompt(topic)

    print(f"\n---\nPrompt:\n\n{prompt}")

    # generate response
    response = model.predict(prompt=prompt, **parameters)

    # print some metadata alongside the response
    print_verbose_response(response)

    # persist the prompt and response to the output directory
    # in input.txt and output.txt files respectively
    persist_generated_response(output_dir, prompt, response)



---
Prompt:


Context:
Assume the role of a Buzzfeed journalist and provide blog article titles related to the given prompt.

Examples:
Article subject: software engineering
Article titles:
1. 5 Ways to Improve Your Software Engineering Skills
2. 3 Tips for Writing Clean Code
3. 5 Tools Every Software Engineer Should Know
4. 5 Ways to Improve Your Software Development Process
5. 3 Tips for Debugging Software
6. 5 Ways to Improve Your Software Testing Skills
7. 3 Tips for Designing Scalable Software
8. 5 Ways to Improve Your Software Security
9. 3 Tips for Managing Software Projects
10. 5 Ways to Improve Your Software Documentation


Article subject: generative ai
Article titles:
1.  What is Generative AI and How is it Changing the World?
2.  The 5 Most Important Things You Need to Know About Generative AI
3.  5 Ways Generative AI is Revolutionizing the Way We Create Content
4.  The Future of Generative AI: What to Expect in the Next 5 Years
5.  The Ethical Implications of Generative A