# Before we begin

In this tutorial, you will learn to:
1. Use [KerasNLP](https://keras.io/keras_nlp/) to load a pre-trained Large Language Model (LLM), the [GPT-2 model](https://openai.com/research/better-language-models) originally developed by OpenAI.
2. Fine-tune it to a specific text style.
3. Generate text based on a user's input (also known as a prompt).

You will also learn how GPT-2 adapts quickly to non-English languages, such as Chinese.

# Install KerasNLP, Choose Backend and Import Dependencies

This example uses [Keras Core](https://keras.io/keras_core/) to work in any of "tensorflow", "jax" or "torch". Support for Keras Core is baked into KerasNLP; simply change the "KERAS_BACKEND" environment variable to select the backend of your choice. We select the TensorFlow backend below.

Source tutorial: https://keras.io/examples/generative/gpt2_text_generation_with_kerasnlp/

In [None]:
%pip install pip -U -q
%pip install -r ../requirements.txt -q

In [None]:
import os

# Select the backend before importing keras_core or keras_nlp.
os.environ["KERAS_BACKEND"] = "tensorflow"  # alternatives: "jax" or "torch"

import keras_nlp
import tensorflow as tf
import keras_core as keras
import time

# Introduction to Generative Large Language Models (LLMs)

Large language models (LLMs) are a type of machine learning model trained on a large corpus of text data to generate outputs for various natural language processing (NLP) tasks, such as text generation, question answering, and machine translation.

Generative LLMs are typically based on deep learning neural networks, such as the [Transformer architecture](https://arxiv.org/abs/1706.03762) invented by Google researchers in 2017, and are trained on massive amounts of text data, often involving billions of words. These models, such as Google [LaMDA](https://blog.google/technology/ai/lamda/) and [PaLM](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html), are trained on a large dataset drawn from various data sources, which allows them to generate output for many tasks. The core of Generative LLMs is predicting the next word in a sentence, often referred to as Causal LM Pre-training. In this way, LLMs can generate coherent text based on user prompts. For a more pedagogical discussion of language models, you can refer to the [Stanford CS324 LLM class](https://stanford-cs324.github.io/winter2022/lectures/introduction/).
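
To make the next-word objective concrete, here is a toy sketch. It is not how GPT-2 is implemented: a hypothetical bigram table stands in for the neural network, and generation simply appends the most likely next word over and over. The vocabulary and probabilities are invented for illustration.

```python
# Toy illustration of autoregressive (causal) text generation.
# The table below is made up; a real LLM computes these probabilities
# with a neural network conditioned on the whole preceding text.
NEXT_WORD_PROBS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}


def generate_greedy(prompt, max_new_words=5):
    """Append the most probable next word until no continuation is known."""
    words = prompt.split()
    if not words:
        return ""
    for _ in range(max_new_words):
        candidates = NEXT_WORD_PROBS.get(words[-1])
        if not candidates:
            break  # the last word has no known continuation
        words.append(max(candidates, key=candidates.get))
    return " ".join(words)


print(generate_greedy("the"))  # prints "the cat sat down"
```

Always picking the single most likely word is called greedy decoding; sampling strategies trade some of that determinism for variety.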

# Introduction to KerasNLP

## Pre-trained LLMs
Large Language Models are complex to build and expensive to train from scratch. Luckily, there are pre-trained LLMs available for use right away. [KerasNLP](https://keras.io/keras_nlp/) provides a large number of pre-trained checkpoints that allow you to experiment with state-of-the-art (SOTA) models without needing to train them yourself.

KerasNLP is a natural language processing library that supports users through their entire development cycle. KerasNLP offers both pre-trained models and modularized building blocks, so developers can easily reuse pre-trained models or stack their own LLM.

## KerasNLP

- Pre-trained models with a `generate()` method, e.g., [keras_nlp.models.GPT2CausalLM](https://keras.io/api/keras_nlp/models/gpt2/gpt2_causal_lm#gpt2causallm-class) and [keras_nlp.models.OPTCausalLM](https://keras.io/api/keras_nlp/models/opt/opt_causal_lm#optcausallm-class).
- A Sampler class that implements generation algorithms such as Top-K, Beam, and contrastive search. These samplers can be used to generate text with custom models.
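
To give a feel for what a sampler does, the sketch below implements the idea behind Top-K sampling in plain Python. It is an illustrative assumption of how such an algorithm can be written, not KerasNLP's implementation; `top_k_sample` and the example scores are hypothetical.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample one token id from the k highest-scoring entries of `logits`."""
    # Token ids sorted by score, highest first; keep only the top k.
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights over the kept ids (shifted by the max for stability).
    max_logit = logits[top_ids[0]]
    weights = [math.exp(logits[i] - max_logit) for i in top_ids]
    return rng.choices(top_ids, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [2.0, 0.5, 1.5, -1.0, 0.1]  # made-up scores for a 5-token vocabulary
samples = [top_k_sample(logits, k=2, rng=rng) for _ in range(100)]
print(sorted(set(samples)))  # only token ids 0 and 2 can ever be drawn
```

With `k=1` this reduces to greedy decoding; larger values of `k` allow more varied output at the cost of occasionally picking less likely tokens.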

# Load a pre-trained GPT-2 model and generate some text

KerasNLP provides a number of pre-trained models, such as [Google BERT](https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html) and [GPT-2](https://openai.com/research/better-language-models). You can see the list of models available in the [KerasNLP repository](https://github.com/keras-team/keras-nlp/tree/master/keras_nlp/models).

It's easy to load the GPT-2 model and to swap in any of the other presets listed below.

| Preset name | Parameters | Description | Download time | Model size |
|-----------|-----------|-----------|-----------|-----------|
| gpt2_base_en | 124.44M | 12-layer GPT-2 model where case is maintained. Trained on WebText. | ~13.5s | 477M |
| gpt2_medium_en | 354.82M | 24-layer GPT-2 model where case is maintained. Trained on WebText. | ~39s | 1.4G |
| gpt2_large_en | 774.03M | 36-layer GPT-2 model where case is maintained. Trained on WebText. | | |
| gpt2_extra_large_en | 1.56B | 48-layer GPT-2 model where case is maintained. Trained on WebText. | | |
| gpt2_base_en_cnn_dailymail | 124.44M | 12-layer GPT-2 model where case is maintained. Finetuned on the CNN/DailyMail summarization dataset. | | |

*download speed from hotel WiFi at 25 Mbps*
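
The two sizes listed in the table are close to what you get if each parameter is stored as a 32-bit float. That dtype is an assumption, since the table does not state it. The sketch below applies that rule of thumb to all four base presets; the results are rough estimates, not measured download sizes.

```python
BYTES_PER_PARAM = 4  # assumption: weights are stored as float32

# Parameter counts taken from the preset table above.
PRESET_PARAMS = {
    "gpt2_base_en": 124.44e6,
    "gpt2_medium_en": 354.82e6,
    "gpt2_large_en": 774.03e6,
    "gpt2_extra_large_en": 1.56e9,
}


def estimated_size_mib(num_params):
    """Estimate checkpoint size in MiB from the parameter count alone."""
    return num_params * BYTES_PER_PARAM / 2**20


for name, params in PRESET_PARAMS.items():
    print(f"{name}: ~{estimated_size_mib(params):,.0f} MiB")
```

An estimate like this can help you decide whether a preset fits in your accelerator memory before you start the download.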

In [None]:
# Copy and paste a preset name from the table above

base_model = "gpt2_medium_en"

In [None]:
# To speed up training and generation, we use a preprocessor with a
# sequence length of 128 instead of the full length of 1024.
preprocessor = keras_nlp.models.GPT2CausalLMPreprocessor.from_preset(
    base_model,
    sequence_length=128,
)
gpt2_lm = keras_nlp.models.GPT2CausalLM.from_preset(
    base_model, preprocessor=preprocessor
)

Once the model is loaded, you can use it to generate some text right away. Run the cells below to give it a try. It's as simple as calling a single function, `generate()`:

In [None]:
prompt = "Can kubernetes run LLM training in the context of AI"

In [None]:
start = time.time()

# Generate a continuation of the prompt; max_length caps the length of the generated sequence.
output = gpt2_lm.generate(prompt, max_length=200)
print("\nGPT-2 output:")
print(output)

end = time.time()
print(f"TOTAL TIME ELAPSED: {end - start:.2f}s")

Try another prompt:

In [None]:
prompt = "How do I check if you are hallucinating in my jupyter notebook in the context of AI LLM"

In [None]:
start = time.time()

output = gpt2_lm.generate(prompt, max_length=200)
print("\nGPT-2 output:")
print(output)

end = time.time()
print(f"TOTAL TIME ELAPSED: {end - start:.2f}s")

Notice how much faster the second call is. This is because the computational graph is [XLA compiled](https://www.tensorflow.org/xla) during the first run and reused behind the scenes in the second.
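
The two timing cells above repeat the same start/stop boilerplate. One way to factor it out is a small helper, sketched here with a stand-in function so it runs without a model. `timed` and `fake_generate` are hypothetical names, not part of KerasNLP.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_generate(prompt, max_length=200):
    """Stand-in for gpt2_lm.generate so the sketch runs without a model."""
    return (prompt + " ...")[:max_length]


# In the notebook you would pass gpt2_lm.generate instead of fake_generate.
for attempt in (1, 2):
    output, seconds = timed(fake_generate, "Hello", max_length=200)
    print(f"call {attempt}: {seconds:.4f}s")
```

Timing the same call twice in a loop makes the compile-then-reuse effect easy to see side by side.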

The quality of the generated text looks OK, but we can improve it via fine-tuning.

# Save the pre-trained GPT-2 model to storage

You can save the model in different formats depending on how you intend to serve it. Saving the pre-trained model now enables early online experimentation with it.

In [None]:
# Local storage

gpt2_lm.save("../models/gpt2_lm.keras")
# gpt2_lm.save('../models/gpt2_lm.h5')

## Save to S3 Object Storage (Minio)

Let's use the NVIDIA Triton model folder structure to store the saved models.

Triton model folder structure:
```
models (provide this dir as source / MODEL_REPOSITORY )
└─ [ model name ]
    └─ 1 (version)
        └── model.savedmodel (we will use .keras)
            ├── saved_model.pb
```
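
As a sketch of how that layout maps to S3 object keys, the hypothetical helper below builds the key for a file stored under `[ model name ]/[ version ]/`. The function name and the example file name are illustrative assumptions; only the directory convention comes from the structure above.

```python
from pathlib import PurePosixPath


def triton_object_key(model_name, version, filename):
    """Return '<model_name>/<version>/<filename>' with forward slashes."""
    # int() rejects non-numeric versions; version directories are numbered.
    return str(PurePosixPath(model_name) / str(int(version)) / filename)


print(triton_object_key("gpt2", 1, "gpt2_lm.keras"))  # prints "gpt2/1/gpt2_lm.keras"
```

Using `PurePosixPath` keeps the separator as `/` regardless of the operating system, which is what object stores expect.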

In [None]:
# install requirements

%pip install -U boto3 python-dotenv -q

In [None]:
# assuming Minio is deployed, populate the environment variables

!  echo "AWS_S3_BUCKET=${AWS_S3_BUCKET:-models}" > .env
!  echo "AWS_S3_ENDPOINT=${AWS_S3_ENDPOINT:-http://minio.minio.svc:9000}" >> .env
!  echo "AWS_ACCESS_KEY_ID=$(oc -n minio extract secret/minio-root-user --keys=MINIO_ROOT_USER --to=-)" >> .env
!  echo "AWS_SECRET_ACCESS_KEY=$(oc -n minio extract secret/minio-root-user --keys=MINIO_ROOT_PASSWORD --to=-)" >> .env

In [None]:
# import the packages

import os

import boto3
from dotenv import load_dotenv

load_dotenv()

In [None]:
# upload the models from local storage to S3

local_path = "../models"
remote_path = "gpt2/1"

bucket = os.getenv("AWS_S3_BUCKET", "models")

s3 = boto3.client(
    "s3",
    endpoint_url=os.getenv("AWS_S3_ENDPOINT", "http://minio.minio.svc:9000"),
    aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID", "minioadmin"),
    aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY", "minioadmin"),
)


if bucket not in [bu["Name"] for bu in s3.list_buckets()["Buckets"]]:
    s3.create_bucket(Bucket=bucket)


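def upload_directory(path, bucketname, prefix):
    """Upload every file under `path` to `bucketname`, keeping relative paths under `prefix`."""
    for root, dirs, files in os.walk(path):
        for file in files:
            local_file = os.path.join(root, file)
            # Keep subdirectories in the object key and use "/" on every platform.
            rel_path = os.path.relpath(local_file, path).replace(os.sep, "/")
            key = f"{prefix}/{rel_path}"
            print(f"uploading: {local_file} to {bucketname}/{key}")
            s3.upload_file(local_file, bucketname, key)
            print("[ok]")


upload_directory(path=local_path, bucketname=bucket, prefix=remote_path)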

# Please clear all outputs and close this notebook before running the 02_ notebook.