<a href="https://colab.research.google.com/github/jonasmue/natural-language-generation/blob/master/Generating_Cover_Letters_with_GPT_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Writing Cover Letters with GPT-2
---

The following notebook lets you write cover letters with a fine-tuned version of [OpenAI's GPT-2 model](https://openai.com/blog/better-language-models/). For more information on the model and the data it has been fine-tuned on, please refer to [this article](https://jonasportfol.io/posts/cover-letters-gpt2).

## Setup and Imports

First, we need to install Hugging Face's **transformers** library.

In [None]:
!pip install transformers

In the following cell, we download and initialize the [model](https://huggingface.co/jonasmue/cover-letter-distilgpt2) together with its tokenizer.

In [None]:
from transformers import AutoTokenizer, GPT2LMHeadModel

# The same token is passed as the BOS, EOS, and unknown token of the tokenizer.
SPECIAL_TOKEN = "<|endoftext|>"
# One of ["jonasmue/cover-letter-gpt2", "jonasmue/cover-letter-distilgpt2"]
MODEL_NAME = "jonasmue/cover-letter-distilgpt2"
# Download the fine-tuned weights and the matching tokenizer.
model = GPT2LMHeadModel.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, bos_token=SPECIAL_TOKEN, eos_token=SPECIAL_TOKEN, unk_token=SPECIAL_TOKEN)

## Text Generation

Each cover letter is generated **starting from a human-written prompt**, which steers the model toward the topic your cover letter should be about. **Try experimenting with different prompts** from different job fields. 🧪

In [None]:
PROMPT = "Dear Sir or Madam, I want to apply as software engineer in your company." # <-- Experiment with this 🧪!
# Convert the prompt into a PyTorch tensor of token IDs.
input_ids = tokenizer.encode(PROMPT, return_tensors="pt")

Finally, let's **generate a cover letter**, starting with your prompt! ⚙️

In [None]:
NUM_SEQUENCES = 1 # <-- Adjust this if you want to generate more than one cover letter 🔢

In [None]:
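sample_outputs = model.generate(
    input_ids,
    do_sample=True,  # sample the next token instead of always taking the most likely one
    max_length=512,  # maximum total length in tokens, prompt included
    top_k=50,  # consider only the 50 most likely next tokens
    top_p=0.92,  # of those, keep the smallest set whose cumulative probability reaches 0.92
    num_return_sequences=NUM_SEQUENCES,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS to avoid a warning
)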
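To make the `top_k` and `top_p` arguments above more concrete, here is a minimal, pure-Python sketch of how top-k filtering followed by top-p (nucleus) filtering can narrow down a next-token distribution before sampling. It is a simplified illustration and not the code that transformers runs internally; the function names `top_k_top_p_filter` and `sample_token` and the toy distribution are hypothetical and only serve this example.

```python
import random


def top_k_top_p_filter(probs, top_k=50, top_p=0.92):
    """Return {token_index: renormalized probability} after top-k, then top-p filtering.

    Hypothetical helper for illustration; `probs` is a list of next-token probabilities.
    """
    # Rank token indices from most to least probable and keep the top k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalize so the top-p threshold applies to the truncated distribution.
    k_total = sum(probs[i] for i in ranked)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_total
        # Stop once the kept tokens cover at least top_p of the probability mass.
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample_token(probs, top_k=50, top_p=0.92, rng=random):
    """Sample one token index from the filtered, renormalized distribution."""
    filtered = top_k_top_p_filter(probs, top_k, top_p)
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy distribution over a five-token vocabulary (made up for this example).
toy_probs = [0.5, 0.3, 0.1, 0.06, 0.04]
print(top_k_top_p_filter(toy_probs, top_k=3, top_p=0.8))
print(sample_token(toy_probs, top_k=3, top_p=0.8))
```

Lower values of `top_k` or `top_p` leave fewer candidate tokens and tend to make the text more predictable, while higher values allow more variety.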

Voilà, we're done! 🥳 Now let's take a look at our **result(s)**! 📨

In [None]:
print("Output\n" + "=" * 80)
for i, sample_output in enumerate(sample_outputs, start=1):
    # Turn the generated token IDs back into text, dropping special tokens.
    text = tokenizer.decode(sample_output, skip_special_tokens=True)
    print(f"\nCover Letter {i}:\n\n{text}")